NullSpend Docs

Cost Tracking

NullSpend calculates the cost of every LLM request automatically. No SDK or code changes beyond the initial proxy setup.

NullSpend calculates the cost of every LLM request automatically. No SDK or code changes beyond the initial proxy setup.

How It Works

Your App ──► NullSpend Proxy ──► Provider (OpenAI / Anthropic)

                ◄── response ───────────┘

                ├─ Extract token counts from response
                ├─ Look up model pricing
                ├─ Calculate cost in microdollars
                └─ Log cost event (async, never blocks response)

The proxy extracts usage data from the provider's response — token counts for input, output, cached, and reasoning tokens — then calculates cost using the model's pricing rates. Cost is expressed in microdollars (1 microdollar = $0.000001, so $1.00 = 1,000,000 microdollars).

Cost logging is asynchronous. It never adds latency to your response.

OpenAI Cost Formula

normalInputTokens = prompt_tokens - cached_tokens
cost = (normalInputTokens × inputRate)
     + (cachedTokens × cachedInputRate)
     + (completionTokens × outputRate)
  • Reasoning tokens (from o3, o4-mini, o1) are a subset of completion_tokens. They are tracked separately for attribution but are not double-counted — the output cost already includes them.
  • The proxy tries model from the request first, then falls back to the model in the response (handles aliases).
  • Each component is rounded individually, then a residual correction is applied to the largest component to guarantee the parts sum to the total.

Example

A gpt-4o request with 1,000 input tokens (200 cached) and 500 output tokens:

normalInput = 1000 - 200 = 800
inputCost   = 800 × 2.50 / 1,000,000 = 0.002 → 2 microdollars
cachedCost  = 200 × 1.25 / 1,000,000 = 0.00025 → 0 microdollars
outputCost  = 500 × 10.00 / 1,000,000 = 0.005 → 5 microdollars
total       = 7 microdollars ($0.000007)

Anthropic Cost Formula

cost = (input_tokens × inputRate)
     + (cache_creation_tokens × cacheWriteRate)
     + (cache_read_tokens × cachedInputRate)
     + (output_tokens × outputRate)

Key differences from OpenAI:

  • input_tokens is already uncached. No subtraction needed — Anthropic separates input and cache read tokens in the response.
  • Cache write has two TTL tiers. If the response includes ephemeral_5m_input_tokens and ephemeral_1h_input_tokens, each is priced at its respective rate. Otherwise, all cache creation tokens use the 5-minute rate.
  • Long context pricing. When input_tokens + cache_creation_tokens + cache_read_tokens > 200,000, multipliers apply:
    • Input, cached input, cache write: 2x
    • Output: 1.5x
  • Thinking tokens (from Claude with extended thinking) are included in output_tokens and priced at the output rate.

Example

A claude-sonnet-4-5 request with 5,000 input tokens, 1,000 cache read tokens, and 2,000 output tokens:

inputCost  = 5000 × 3.00 / 1,000,000 = 0.015 → 15 microdollars
cachedCost = 1000 × 0.30 / 1,000,000 = 0.0003 → 0 microdollars
outputCost = 2000 × 15.00 / 1,000,000 = 0.03 → 30 microdollars
total      = 45 microdollars ($0.000045)

Pre-Request Cost Estimation

Before forwarding a request, the proxy estimates the maximum possible cost. This estimate is used for budget enforcement — if the estimate exceeds the remaining budget, the request is blocked before reaching the provider.

Formula:

inputTokenEstimate  = ceil(JSON.stringify(requestBody).length / 4)
outputTokenEstimate = max_completion_tokens ?? max_tokens ?? modelCap ?? defaultCap

estimatedCost = round((inputCost + outputCost) × 1.1)
  • The ÷ 4 ratio approximates 4 characters per token.
  • The 1.1× safety margin accounts for estimation imprecision.
  • If the model is unknown, the fallback estimate is $1.00 (1,000,000 microdollars).

Default output caps by model:

ModelsDefault Cap
o3, o3-mini, o4-mini, o1100,000 tokens
claude-opus-4-6, claude-opus-4-5128,000 tokens
claude-sonnet-4-6, claude-sonnet-4-5, claude-opus-4-1, claude-haiku-4-564,000 tokens
claude-haiku-3.58,000 tokens
claude-haiku-34,000 tokens
All other OpenAI models16,384 tokens
All other Anthropic models64,000 tokens

What's Recorded

Every request produces a cost event with these fields:

FieldTypeDescription
requestIdstringUnique request identifier
providerstring"openai" or "anthropic"
modelstringModel used (e.g., gpt-4o, claude-sonnet-4-5)
inputTokensintegerTotal input tokens (OpenAI: prompt_tokens; Anthropic: input + cache creation + cache read)
outputTokensintegerOutput/completion tokens
cachedInputTokensintegerCached input tokens (OpenAI: cached_tokens; Anthropic: cache_read_tokens)
reasoningTokensintegerReasoning/thinking tokens (OpenAI o-series, Anthropic extended thinking)
costMicrodollarsintegerTotal cost in microdollars
costBreakdownobjectPer-component costs: { input, cached, output, reasoning }. Available in the summary aggregation (GET /api/cost-events/summary) but not in individual event responses.
durationMsintegerTotal request duration in milliseconds
sessionIdstring?Session ID from X-NullSpend-Session header
traceIdstring?Trace ID (from traceparent, X-NullSpend-Trace-Id, or auto-generated)
sourcestring"proxy", "api", or "mcp"
tagsobjectKey-value pairs from X-NullSpend-Tags header
apiKeyIdstringAPI key that made the request
createdAttimestampWhen the event was recorded

Cost events are deduplicated by (requestId, provider) — reprocessing the same request is safe.

Request / Response Body Logging

Pro and Enterprise plans can capture the full request and response bodies for every proxied LLM call. This is useful for debugging agent behavior, auditing prompts, and replaying requests.

AspectDetail
ActivationAutomatic when your org has a Pro or Enterprise subscription
StorageCloudflare R2, scoped by org ID and request ID
Size cap1 MB per object (request body and response body are stored separately)
StreamingStreaming (SSE) responses are captured in full — the raw SSE text is accumulated during streaming and stored after the stream completes
Cancelled streamsPartial response bodies are stored for cancelled streams (valuable for debugging why clients abort)
Non-streamingJSON request and response bodies are stored as-is
ViewingRequest and response bodies are visible on the cost event detail page in the dashboard
Latency impactZero — the accumulator passes chunks through immediately; storage happens asynchronously after the response is delivered

Body logging is an observability feature. It does not affect cost calculation, budget enforcement, or webhook dispatch. If R2 storage fails, the cost event and budget reconciliation proceed normally.

Session Replay

Group cost events by session to see the full sequence of LLM calls your agent made. This is the primary tool for debugging agent behavior in production.

How to Use

  1. Set the session header on your agent's requests:
X-NullSpend-Session: research-task-47

Any string up to 256 characters. Typically a conversation ID, task ID, or run ID from your agent framework.

  1. View in the dashboard. Session IDs appear as clickable links in the Activity table and on individual cost event detail pages. Click to open the session replay page.

  2. Session replay page shows:

    • Summary stats — total cost, event count, duration, total tokens
    • Chronological timeline — every LLM call in the session, ordered by timestamp
    • Expandable events — click any event to load the full request and response bodies (requires Pro/Enterprise body logging)

What You See

Session: research-task-47
Total cost: $4.30 | Events: 12 | Duration: 2m 34s | Tokens: 18,400

Timeline:
  14:21:05  GPT-4o        100 → 50 tokens    $0.15    680ms
  14:21:08  GPT-4o        900 → 380 tokens   $0.12    450ms
  14:21:15  Claude Sonnet 600 → 200 tokens   $0.08    320ms
  14:21:22  GPT-4o        2,100 → 1,200      $0.45    1.2s
  ...

Click any row to expand and see the full request/response bodies — what the agent sent and what the provider returned.

API Access

Query session events programmatically:

  • List events for a session: GET /api/cost-events?sessionId=research-task-47
  • Full session with summary: GET /api/cost-events/sessions/research-task-47

See Cost Events API for details.

Cost Sources

SourceHow It Works
proxyAutomatic. Every request through proxy.nullspend.dev generates a cost event.
apiManual. POST to /api/cost-events with token counts. Used by the SDK or direct HTTP.
mcpMCP tool calls. Budget check and cost event ingestion via /v1/mcp/budget/check and /v1/mcp/events.

Cancelled Streams

When a streaming response is cancelled before completion (client disconnects), the proxy cannot extract final token counts from the response. Instead, it logs an estimated cost event with two system tags:

  • _ns_estimated: "true" — cost is an estimate, not exact
  • _ns_cancelled: "true" — the stream was cancelled

The estimate uses the pre-request estimation formula. These events are included in budget spend tracking to prevent cancelled streams from creating a gap between tracked and actual spend.

Where to See Costs

  • Supported Models — full pricing table for all 38 models
  • Tracing — correlate requests across a multi-step agent run
  • Tags — attribute costs to teams, environments, and features
  • Budgets — enforce spending ceilings based on cost tracking
  • Webhooks — get notified on every cost event

On this page