NullSpend

How NullSpend calculates per-request cost for every model — input, output, cached, and reasoning tokens — across the proxy and SDK paths.

NullSpend calculates the cost of every LLM request automatically. No SDK or code changes beyond the initial proxy setup.

How It Works

Your App ──► NullSpend Proxy ──► Provider (OpenAI / Anthropic)
                                        │
                ◄── response ───────────┘
                │
                ├─ Extract token counts from response
                ├─ Look up model pricing
                ├─ Calculate cost in microdollars
                └─ Log cost event (async, never blocks response)

The proxy extracts usage data from the provider's response — token counts for input, output, cached, and reasoning tokens — then calculates cost using the model's pricing rates. Cost is expressed in microdollars (1 microdollar = $0.000001, so $1.00 = 1,000,000 microdollars).

Cost logging is asynchronous. It never adds latency to your response.

OpenAI Cost Formula

normalInputTokens = prompt_tokens - cached_tokens
cost = (normalInputTokens × inputRate)
     + (cachedTokens × cachedInputRate)
     + (completionTokens × outputRate)

Reasoning tokens (from o3, o4-mini, o1) are a subset of completion_tokens. They are tracked separately for attribution but are not double-counted — the output cost already includes them.
The proxy tries model from the request first, then falls back to the model in the response (handles aliases).
Each component is rounded individually, then a residual correction is applied to the largest component to guarantee the parts sum to the total.

Example

A gpt-4o request with 1,000 input tokens (200 cached) and 500 output tokens:

normalInput = 1000 - 200 = 800
inputCost   = 800 × 2.50 / 1,000,000 = 0.002 → 2 microdollars
cachedCost  = 200 × 1.25 / 1,000,000 = 0.00025 → 0 microdollars
outputCost  = 500 × 10.00 / 1,000,000 = 0.005 → 5 microdollars
total       = 7 microdollars ($0.000007)

Anthropic Cost Formula

cost = (input_tokens × inputRate)
     + (cache_creation_tokens × cacheWriteRate)
     + (cache_read_tokens × cachedInputRate)
     + (output_tokens × outputRate)

Key differences from OpenAI:

input_tokens is already uncached. No subtraction needed — Anthropic separates input and cache read tokens in the response.
Cache write has two TTL tiers. If the response includes ephemeral_5m_input_tokens and ephemeral_1h_input_tokens, each is priced at its respective rate. Otherwise, all cache creation tokens use the 5-minute rate.
Long context pricing. When input_tokens + cache_creation_tokens + cache_read_tokens > 200,000, multipliers apply:
- Input, cached input, cache write: 2x
- Output: 1.5x
Thinking tokens (from Claude with extended thinking) are included in output_tokens and priced at the output rate.

Example

A claude-sonnet-4-5 request with 5,000 input tokens, 1,000 cache read tokens, and 2,000 output tokens:

inputCost  = 5000 × 3.00 / 1,000,000 = 0.015 → 15 microdollars
cachedCost = 1000 × 0.30 / 1,000,000 = 0.0003 → 0 microdollars
outputCost = 2000 × 15.00 / 1,000,000 = 0.03 → 30 microdollars
total      = 45 microdollars ($0.000045)

Pre-Request Cost Estimation

Before forwarding a request, the proxy estimates the maximum possible cost. This estimate is used for budget enforcement — if the estimate exceeds the remaining budget, the request is blocked before reaching the provider.

Formula:

inputTokenEstimate  = ceil(JSON.stringify(requestBody).length / 4)
outputTokenEstimate = max_completion_tokens ?? max_tokens ?? modelCap ?? defaultCap

estimatedCost = round((inputCost + outputCost) × 1.1)

The ÷ 4 ratio approximates 4 characters per token.
The 1.1× safety margin accounts for estimation imprecision.
If the model is unknown, the fallback estimate is $1.00 (1,000,000 microdollars).

Default output caps by model:

Models	Default Cap
OpenAI o3, o3-mini, o4-mini, o1	100,000 tokens
All other OpenAI models	16,384 tokens
Anthropic claude-opus-4-6, claude-opus-4-5 (and dated variants)	128,000 tokens
Anthropic claude-haiku-3.5	8,000 tokens
Anthropic claude-haiku-3	4,000 tokens
All other Anthropic models (sonnet-4-6, sonnet-4-5, sonnet-4, opus-4-1, opus-4, haiku-4-5)	64,000 tokens
Gemini 2.5 family + 3.x previews	65,536 tokens
Gemini 2.0 family	8,192 tokens

What's Recorded

Every request produces a cost event with these fields:

Field	Type	Description
`requestId`	string	Unique request identifier
`provider`	string	`"openai"`, `"anthropic"`, or `"google"`
`model`	string	Model used (e.g., `gpt-4o`, `claude-sonnet-4-6`, `gemini-2.5-flash`)
`inputTokens`	integer	Total input tokens (OpenAI: prompt_tokens; Anthropic: input + cache creation + cache read)
`outputTokens`	integer	Output/completion tokens
`cachedInputTokens`	integer	Cached input tokens (OpenAI: cached_tokens; Anthropic: cache_read_tokens)
`reasoningTokens`	integer	Reasoning/thinking tokens (OpenAI o-series, Anthropic extended thinking)
`costMicrodollars`	integer	Total cost in microdollars
`costBreakdown`	object	Per-component costs: `{ input, cached, output, reasoning }`. Available in the summary aggregation (`GET /api/cost-events/summary`) but not in individual event responses.
`durationMs`	integer	Total request duration in milliseconds
`sessionId`	string?	Session ID from `X-NullSpend-Session` header
`traceId`	string?	Trace ID (from `traceparent`, `X-NullSpend-Trace-Id`, or auto-generated)
`source`	string	`"proxy"`, `"sdk"`, `"api"`, or `"mcp"`
`tags`	object	Key-value pairs from `X-NullSpend-Tags` header
`apiKeyId`	string	API key that made the request
`createdAt`	timestamp	When the event was recorded

Cost events are deduplicated by (requestId, provider) — reprocessing the same request is safe.

Request / Response Body Logging

Pro, Scale, and Enterprise plans can capture the full request and response bodies for every proxied LLM call. This is useful for debugging agent behavior, auditing prompts, and replaying requests.

Aspect	Detail
Activation	Automatic when your org has a Pro, Scale, or Enterprise subscription
Storage	Cloudflare R2, scoped by org ID and request ID
Size cap	1 MB per object (request body and response body are stored separately)
Streaming	Streaming (SSE) responses are captured in full — the raw SSE text is accumulated during streaming and stored after the stream completes
Cancelled streams	Partial response bodies are stored for cancelled streams (valuable for debugging why clients abort)
Non-streaming	JSON request and response bodies are stored as-is
Viewing	Request and response bodies are visible on the cost event detail page in the dashboard
Latency impact	Zero — the accumulator passes chunks through immediately; storage happens asynchronously after the response is delivered

Body logging is an observability feature. It does not affect cost calculation, budget enforcement, or webhook dispatch. If R2 storage fails, the cost event and budget reconciliation proceed normally.

Session Replay

Group cost events by session to see the full sequence of LLM calls your agent made. This is the primary tool for debugging agent behavior in production.

How to Use

Set the session header on your agent's requests:

X-NullSpend-Session: research-task-47

Any string up to 256 characters. Typically a conversation ID, task ID, or run ID from your agent framework.

View in the dashboard. Session IDs appear as clickable links in the Activity table and on individual cost event detail pages. Click to open the session replay page.
Session replay page shows:
- Summary stats — total cost, event count, duration, total tokens
- Chronological timeline — every LLM call in the session, ordered by timestamp
- Expandable events — click any event to load the full request and response bodies (requires Pro / Scale / Enterprise body logging)

What You See

Session: research-task-47
Total cost: $4.30 | Events: 12 | Duration: 2m 34s | Tokens: 18,400

Timeline:
  14:21:05  GPT-4o        100 → 50 tokens    $0.15    680ms
  14:21:08  GPT-4o        900 → 380 tokens   $0.12    450ms
  14:21:15  Claude Sonnet 600 → 200 tokens   $0.08    320ms
  14:21:22  GPT-4o        2,100 → 1,200      $0.45    1.2s
  ...

Click any row to expand and see the full request/response bodies — what the agent sent and what the provider returned.

API Access

Query session events programmatically:

List events for a session: GET /api/cost-events?sessionId=research-task-47
Full session with summary: GET /api/cost-events/sessions/research-task-47

See Cost Events API for details.

Cost Sources

Source	How It Works
proxy	Automatic. Every request through `proxy.nullspend.dev` generates a cost event.
api	Manual. POST to `/api/cost-events` with token counts. Used by the SDK or direct HTTP.
mcp	MCP tool calls. Budget check and cost event ingestion via `/v1/mcp/budget/check` and `/v1/mcp/events`.

Cancelled Streams

When a streaming response is cancelled before completion (client disconnects), the proxy cannot extract final token counts from the response. Instead, it logs an estimated cost event with two system tags:

_ns_estimated: "true" — cost is an estimate, not exact
_ns_cancelled: "true" — the stream was cancelled

The estimate uses the pre-request estimation formula. These events are included in budget spend tracking to prevent cancelled streams from creating a gap between tracked and actual spend.

Where to See Costs

Dashboard — Daily spend chart, model breakdown, per-key breakdown at nullspend.dev/app/analytics
API — GET /api/cost-events with filters for model, provider, API key, tags, trace ID, and more. Ingest custom events with POST /api/cost-events.
Webhooks — cost_event.created fires for every cost event. Supports full and thin payload modes.

Supported Models — full pricing table for all 56 models
Tracing — correlate requests across a multi-step agent run
Tags — attribute costs to teams, environments, and features
Budgets — enforce spending ceilings based on cost tracking
Webhooks — get notified on every cost event

Cost Tracking

How It Works

OpenAI Cost Formula

Example

Anthropic Cost Formula

Example

Pre-Request Cost Estimation

What's Recorded

Request / Response Body Logging

Session Replay

How to Use

What You See

API Access

Cost Sources

Cancelled Streams

Where to See Costs

On this page