Cost Tracking
NullSpend calculates the cost of every LLM request automatically. No SDK or code changes beyond the initial proxy setup.
NullSpend calculates the cost of every LLM request automatically. No SDK or code changes beyond the initial proxy setup.
How It Works
Your App ──► NullSpend Proxy ──► Provider (OpenAI / Anthropic)
│
◄── response ───────────┘
│
├─ Extract token counts from response
├─ Look up model pricing
├─ Calculate cost in microdollars
└─ Log cost event (async, never blocks response)The proxy extracts usage data from the provider's response — token counts for input, output, cached, and reasoning tokens — then calculates cost using the model's pricing rates. Cost is expressed in microdollars (1 microdollar = $0.000001, so $1.00 = 1,000,000 microdollars).
Cost logging is asynchronous. It never adds latency to your response.
OpenAI Cost Formula
normalInputTokens = prompt_tokens - cached_tokens
cost = (normalInputTokens × inputRate)
+ (cachedTokens × cachedInputRate)
+ (completionTokens × outputRate)- Reasoning tokens (from o3, o4-mini, o1) are a subset of
completion_tokens. They are tracked separately for attribution but are not double-counted — the output cost already includes them. - The proxy tries
modelfrom the request first, then falls back to the model in the response (handles aliases). - Each component is rounded individually, then a residual correction is applied to the largest component to guarantee the parts sum to the total.
Example
A gpt-4o request with 1,000 input tokens (200 cached) and 500 output tokens:
normalInput = 1000 - 200 = 800
inputCost = 800 × 2.50 / 1,000,000 = 0.002 → 2 microdollars
cachedCost = 200 × 1.25 / 1,000,000 = 0.00025 → 0 microdollars
outputCost = 500 × 10.00 / 1,000,000 = 0.005 → 5 microdollars
total = 7 microdollars ($0.000007)Anthropic Cost Formula
cost = (input_tokens × inputRate)
+ (cache_creation_tokens × cacheWriteRate)
+ (cache_read_tokens × cachedInputRate)
+ (output_tokens × outputRate)Key differences from OpenAI:
input_tokensis already uncached. No subtraction needed — Anthropic separates input and cache read tokens in the response.- Cache write has two TTL tiers. If the response includes
ephemeral_5m_input_tokensandephemeral_1h_input_tokens, each is priced at its respective rate. Otherwise, all cache creation tokens use the 5-minute rate. - Long context pricing. When
input_tokens + cache_creation_tokens + cache_read_tokens > 200,000, multipliers apply:- Input, cached input, cache write: 2x
- Output: 1.5x
- Thinking tokens (from Claude with extended thinking) are included in
output_tokensand priced at the output rate.
Example
A claude-sonnet-4-5 request with 5,000 input tokens, 1,000 cache read tokens, and 2,000 output tokens:
inputCost = 5000 × 3.00 / 1,000,000 = 0.015 → 15 microdollars
cachedCost = 1000 × 0.30 / 1,000,000 = 0.0003 → 0 microdollars
outputCost = 2000 × 15.00 / 1,000,000 = 0.03 → 30 microdollars
total = 45 microdollars ($0.000045)Pre-Request Cost Estimation
Before forwarding a request, the proxy estimates the maximum possible cost. This estimate is used for budget enforcement — if the estimate exceeds the remaining budget, the request is blocked before reaching the provider.
Formula:
inputTokenEstimate = ceil(JSON.stringify(requestBody).length / 4)
outputTokenEstimate = max_completion_tokens ?? max_tokens ?? modelCap ?? defaultCap
estimatedCost = round((inputCost + outputCost) × 1.1)- The
÷ 4ratio approximates 4 characters per token. - The
1.1×safety margin accounts for estimation imprecision. - If the model is unknown, the fallback estimate is $1.00 (1,000,000 microdollars).
Default output caps by model:
| Models | Default Cap |
|---|---|
| o3, o3-mini, o4-mini, o1 | 100,000 tokens |
| claude-opus-4-6, claude-opus-4-5 | 128,000 tokens |
| claude-sonnet-4-6, claude-sonnet-4-5, claude-opus-4-1, claude-haiku-4-5 | 64,000 tokens |
| claude-haiku-3.5 | 8,000 tokens |
| claude-haiku-3 | 4,000 tokens |
| All other OpenAI models | 16,384 tokens |
| All other Anthropic models | 64,000 tokens |
What's Recorded
Every request produces a cost event with these fields:
| Field | Type | Description |
|---|---|---|
requestId | string | Unique request identifier |
provider | string | "openai" or "anthropic" |
model | string | Model used (e.g., gpt-4o, claude-sonnet-4-5) |
inputTokens | integer | Total input tokens (OpenAI: prompt_tokens; Anthropic: input + cache creation + cache read) |
outputTokens | integer | Output/completion tokens |
cachedInputTokens | integer | Cached input tokens (OpenAI: cached_tokens; Anthropic: cache_read_tokens) |
reasoningTokens | integer | Reasoning/thinking tokens (OpenAI o-series, Anthropic extended thinking) |
costMicrodollars | integer | Total cost in microdollars |
costBreakdown | object | Per-component costs: { input, cached, output, reasoning }. Available in the summary aggregation (GET /api/cost-events/summary) but not in individual event responses. |
durationMs | integer | Total request duration in milliseconds |
sessionId | string? | Session ID from X-NullSpend-Session header |
traceId | string? | Trace ID (from traceparent, X-NullSpend-Trace-Id, or auto-generated) |
source | string | "proxy", "api", or "mcp" |
tags | object | Key-value pairs from X-NullSpend-Tags header |
apiKeyId | string | API key that made the request |
createdAt | timestamp | When the event was recorded |
Cost events are deduplicated by (requestId, provider) — reprocessing the same request is safe.
Request / Response Body Logging
Pro and Enterprise plans can capture the full request and response bodies for every proxied LLM call. This is useful for debugging agent behavior, auditing prompts, and replaying requests.
| Aspect | Detail |
|---|---|
| Activation | Automatic when your org has a Pro or Enterprise subscription |
| Storage | Cloudflare R2, scoped by org ID and request ID |
| Size cap | 1 MB per object (request body and response body are stored separately) |
| Streaming | Streaming (SSE) responses are captured in full — the raw SSE text is accumulated during streaming and stored after the stream completes |
| Cancelled streams | Partial response bodies are stored for cancelled streams (valuable for debugging why clients abort) |
| Non-streaming | JSON request and response bodies are stored as-is |
| Viewing | Request and response bodies are visible on the cost event detail page in the dashboard |
| Latency impact | Zero — the accumulator passes chunks through immediately; storage happens asynchronously after the response is delivered |
Body logging is an observability feature. It does not affect cost calculation, budget enforcement, or webhook dispatch. If R2 storage fails, the cost event and budget reconciliation proceed normally.
Session Replay
Group cost events by session to see the full sequence of LLM calls your agent made. This is the primary tool for debugging agent behavior in production.
How to Use
- Set the session header on your agent's requests:
X-NullSpend-Session: research-task-47Any string up to 256 characters. Typically a conversation ID, task ID, or run ID from your agent framework.
-
View in the dashboard. Session IDs appear as clickable links in the Activity table and on individual cost event detail pages. Click to open the session replay page.
-
Session replay page shows:
- Summary stats — total cost, event count, duration, total tokens
- Chronological timeline — every LLM call in the session, ordered by timestamp
- Expandable events — click any event to load the full request and response bodies (requires Pro/Enterprise body logging)
What You See
Session: research-task-47
Total cost: $4.30 | Events: 12 | Duration: 2m 34s | Tokens: 18,400
Timeline:
14:21:05 GPT-4o 100 → 50 tokens $0.15 680ms
14:21:08 GPT-4o 900 → 380 tokens $0.12 450ms
14:21:15 Claude Sonnet 600 → 200 tokens $0.08 320ms
14:21:22 GPT-4o 2,100 → 1,200 $0.45 1.2s
...Click any row to expand and see the full request/response bodies — what the agent sent and what the provider returned.
API Access
Query session events programmatically:
- List events for a session:
GET /api/cost-events?sessionId=research-task-47 - Full session with summary:
GET /api/cost-events/sessions/research-task-47
See Cost Events API for details.
Cost Sources
| Source | How It Works |
|---|---|
| proxy | Automatic. Every request through proxy.nullspend.dev generates a cost event. |
| api | Manual. POST to /api/cost-events with token counts. Used by the SDK or direct HTTP. |
| mcp | MCP tool calls. Budget check and cost event ingestion via /v1/mcp/budget/check and /v1/mcp/events. |
Cancelled Streams
When a streaming response is cancelled before completion (client disconnects), the proxy cannot extract final token counts from the response. Instead, it logs an estimated cost event with two system tags:
_ns_estimated: "true"— cost is an estimate, not exact_ns_cancelled: "true"— the stream was cancelled
The estimate uses the pre-request estimation formula. These events are included in budget spend tracking to prevent cancelled streams from creating a gap between tracked and actual spend.
Where to See Costs
- Dashboard — Daily spend chart, model breakdown, per-key breakdown at nullspend.dev/app/analytics
- API —
GET /api/cost-eventswith filters for model, provider, API key, tags, trace ID, and more. Ingest custom events withPOST /api/cost-events. - Webhooks —
cost_event.createdfires for every cost event. Supports full and thin payload modes.
Related
- Supported Models — full pricing table for all 38 models
- Tracing — correlate requests across a multi-step agent run
- Tags — attribute costs to teams, environments, and features
- Budgets — enforce spending ceilings based on cost tracking
- Webhooks — get notified on every cost event