Supported Models
NullSpend has pricing data for 38 models across OpenAI, Anthropic, and Google. The proxy routes requests to OpenAI and Anthropic. Google models are supported for cost calculation (via API ingest or the cost-engine package) but are not routable through the proxy.
Cost Formula
cost_microdollars = Math.round(Σ(tokens × rate_per_million_tokens))

Rates are in dollars per million tokens. The result is in microdollars (1 microdollar = $0.000001).
For the full calculation logic including cached tokens, cache writes, and long context multipliers, see Cost Tracking.
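As a minimal sketch of the formula above: because rates are in dollars per million tokens and a microdollar is one millionth of a dollar, multiplying a token count by its rate gives microdollars directly. The `CostComponent` type and `costMicrodollars` function are illustrative names, not part of the NullSpend API.

```typescript
// Hypothetical shape: one billable component (input, cached input, output, ...).
interface CostComponent {
  tokens: number;      // number of tokens billed at this rate
  ratePerMTok: number; // rate in dollars per million tokens
}

// tokens × ($ / 1,000,000 tokens) = (tokens × rate) millionths of a dollar,
// so the sum of tokens × rate is already expressed in microdollars.
function costMicrodollars(components: CostComponent[]): number {
  const total = components.reduce(
    (sum, c) => sum + c.tokens * c.ratePerMTok,
    0,
  );
  return Math.round(total); // round once, over the whole sum
}

// 1,000 input tokens and 500 output tokens at the gpt-4o rates listed below.
const exampleCost = costMicrodollars([
  { tokens: 1000, ratePerMTok: 2.5 }, // input: 2,500 microdollars
  { tokens: 500, ratePerMTok: 10 },   // output: 5,000 microdollars
]);
console.log(exampleCost); // 7500 microdollars, i.e. $0.0075
```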
OpenAI Models
14 models. Rates in $/MTok.
| Model | Input | Cached Input | Output |
|---|---|---|---|
| gpt-4o | 2.50 | 1.25 | 10.00 |
| gpt-4o-mini | 0.15 | 0.075 | 0.60 |
| gpt-4.1 | 2.00 | 0.50 | 8.00 |
| gpt-4.1-mini | 0.40 | 0.10 | 1.60 |
| gpt-4.1-nano | 0.10 | 0.025 | 0.40 |
| o4-mini | 1.10 | 0.275 | 4.40 |
| o3 | 2.00 | 0.50 | 8.00 |
| o3-mini | 1.10 | 0.55 | 4.40 |
| o1 | 15.00 | 7.50 | 60.00 |
| gpt-5 | 1.25 | 0.125 | 10.00 |
| gpt-5-mini | 0.25 | 0.025 | 2.00 |
| gpt-5-nano | 0.05 | 0.005 | 0.40 |
| gpt-5.1 | 1.25 | 0.125 | 10.00 |
| gpt-5.2 | 1.75 | 0.175 | 14.00 |
OpenAI cost formula: (prompt_tokens - cached_tokens) × input + cached_tokens × cached + completion_tokens × output. Reasoning tokens are a subset of completion tokens — not double-counted.
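The OpenAI formula can be sketched as follows. The `OpenAIUsage` and `OpenAIRates` types and the function name are assumptions made for illustration; only the arithmetic comes from the formula above.

```typescript
// Hypothetical shapes for illustration; field names are not NullSpend's.
interface OpenAIRates {
  input: number;  // $/MTok for uncached prompt tokens
  cached: number; // $/MTok for cached prompt tokens
  output: number; // $/MTok for completion tokens
}

interface OpenAIUsage {
  promptTokens: number;     // includes cached tokens
  cachedTokens: number;     // subset of promptTokens served from cache
  completionTokens: number; // already includes reasoning tokens
}

function openaiCostMicrodollars(usage: OpenAIUsage, rates: OpenAIRates): number {
  // Cached tokens are billed at the cached rate, the remainder at the input rate.
  const uncachedTokens = usage.promptTokens - usage.cachedTokens;
  return Math.round(
    uncachedTokens * rates.input +
      usage.cachedTokens * rates.cached +
      // Reasoning tokens are part of completionTokens, so they are not added again.
      usage.completionTokens * rates.output,
  );
}

// gpt-4o rates from the table above.
const gpt4oRates: OpenAIRates = { input: 2.5, cached: 1.25, output: 10 };

const gpt4oCost = openaiCostMicrodollars(
  { promptTokens: 10000, cachedTokens: 4000, completionTokens: 2000 },
  gpt4oRates,
);
// 6,000 × 2.50 + 4,000 × 1.25 + 2,000 × 10.00 = 40,000 microdollars ($0.04)
console.log(gpt4oCost);
```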
Anthropic Models
22 models (10 aliases + 12 dated variants). Rates in $/MTok.
Aliases
| Model | Input | Cached Input | Cache Write (5m) | Cache Write (1h) | Output |
|---|---|---|---|---|---|
| claude-opus-4-6 | 5.00 | 0.50 | 6.25 | 10.00 | 25.00 |
| claude-opus-4-5 | 5.00 | 0.50 | 6.25 | 10.00 | 25.00 |
| claude-opus-4-1 | 15.00 | 1.50 | 18.75 | 30.00 | 75.00 |
| claude-opus-4 | 15.00 | 1.50 | 18.75 | 30.00 | 75.00 |
| claude-sonnet-4-6 | 3.00 | 0.30 | 3.75 | 6.00 | 15.00 |
| claude-sonnet-4-5 | 3.00 | 0.30 | 3.75 | 6.00 | 15.00 |
| claude-sonnet-4 | 3.00 | 0.30 | 3.75 | 6.00 | 15.00 |
| claude-haiku-4-5 | 1.00 | 0.10 | 1.25 | 2.00 | 5.00 |
| claude-haiku-3.5 | 0.80 | 0.08 | 1.00 | 1.60 | 4.00 |
| claude-haiku-3 | 0.25 | 0.03 | 0.30 | 0.50 | 1.25 |
Dated Variants
Dated variants share the exact same rates as their alias:
| Model | Same Rates As |
|---|---|
| claude-opus-4-6-20260205 | claude-opus-4-6 |
| claude-sonnet-4-6-20260217 | claude-sonnet-4-6 |
| claude-sonnet-4-5-20250929 | claude-sonnet-4-5 |
| claude-opus-4-5-20251101 | claude-opus-4-5 |
| claude-haiku-4-5-20251001 | claude-haiku-4-5 |
| claude-opus-4-1-20250805 | claude-opus-4-1 |
| claude-opus-4-20250514 | claude-opus-4 |
| claude-sonnet-4-20250514 | claude-sonnet-4 |
| claude-3-5-haiku-20241022 | claude-haiku-3.5 |
| claude-3-haiku-20240307 | claude-haiku-3 |
| claude-opus-4-0 | claude-opus-4 |
| claude-sonnet-4-0 | claude-sonnet-4 |
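One way to use this table in client code is a lookup that maps each variant to the alias whose rates it shares. This is a sketch only: `VARIANT_TO_ALIAS` and `resolvePricingAlias` are hypothetical names, and how NullSpend resolves variants internally is not described here.

```typescript
// Variant -> alias pairs copied from the table above.
const VARIANT_TO_ALIAS = new Map<string, string>([
  ["claude-opus-4-6-20260205", "claude-opus-4-6"],
  ["claude-sonnet-4-6-20260217", "claude-sonnet-4-6"],
  ["claude-sonnet-4-5-20250929", "claude-sonnet-4-5"],
  ["claude-opus-4-5-20251101", "claude-opus-4-5"],
  ["claude-haiku-4-5-20251001", "claude-haiku-4-5"],
  ["claude-opus-4-1-20250805", "claude-opus-4-1"],
  ["claude-opus-4-20250514", "claude-opus-4"],
  ["claude-sonnet-4-20250514", "claude-sonnet-4"],
  ["claude-3-5-haiku-20241022", "claude-haiku-3.5"],
  ["claude-3-haiku-20240307", "claude-haiku-3"],
  ["claude-opus-4-0", "claude-opus-4"],
  ["claude-sonnet-4-0", "claude-sonnet-4"],
]);

// Returns the alias whose rates apply; names that are not listed variants
// (including the aliases themselves) pass through unchanged.
function resolvePricingAlias(model: string): string {
  return VARIANT_TO_ALIAS.get(model) ?? model;
}

console.log(resolvePricingAlias("claude-3-5-haiku-20241022")); // claude-haiku-3.5
```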
Long Context Pricing
When total input tokens (input + cache creation + cache read) exceed 200,000 tokens, multipliers apply:
| Component | Multiplier |
|---|---|
| Input | 2× |
| Cached Input (read) | 2× |
| Cache Write (5m and 1h) | 2× |
| Output | 1.5× |
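The threshold check and multipliers above can be sketched like this. The types and function name are illustrative assumptions, and for brevity all cache creation tokens are priced at the 5-minute write rate; the rounding follows the general cost formula.

```typescript
const LONG_CONTEXT_THRESHOLD = 200000; // total input tokens

// Hypothetical shapes for illustration; rates are in $/MTok.
interface AnthropicRates {
  input: number;
  cachedInput: number;
  cacheWrite5m: number;
  output: number;
}

interface AnthropicUsage {
  inputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
  outputTokens: number;
}

function anthropicCostMicrodollars(u: AnthropicUsage, r: AnthropicRates): number {
  // Total input = input + cache creation + cache read.
  const totalInput = u.inputTokens + u.cacheCreationTokens + u.cacheReadTokens;
  const isLongContext = totalInput > LONG_CONTEXT_THRESHOLD; // strictly "exceeds"
  const inputMultiplier = isLongContext ? 2 : 1;    // input, cache read, cache write
  const outputMultiplier = isLongContext ? 1.5 : 1; // output
  return Math.round(
    inputMultiplier *
      (u.inputTokens * r.input +
        u.cacheReadTokens * r.cachedInput +
        u.cacheCreationTokens * r.cacheWrite5m) +
      outputMultiplier * u.outputTokens * r.output,
  );
}

// claude-sonnet-4-5 rates from the alias table.
const sonnetRates: AnthropicRates = { input: 3, cachedInput: 0.3, cacheWrite5m: 3.75, output: 15 };

// 210,000 total input tokens, so the long context multipliers apply:
// 2 × (150,000 × 3.00 + 60,000 × 0.30) + 1.5 × 1,000 × 15.00 = 958,500 microdollars
const longContextCost = anthropicCostMicrodollars(
  { inputTokens: 150000, cacheReadTokens: 60000, cacheCreationTokens: 0, outputTokens: 1000 },
  sonnetRates,
);
console.log(longContextCost);
```

With 100,000 input tokens instead of 150,000, the total (160,000) stays under the threshold and the same call yields 333,000 microdollars.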
Cache Write TTLs
Anthropic offers two cache write tiers:
| Tier | TTL | Rate Column |
|---|---|---|
| Ephemeral (5-minute) | 5 minutes | Cache Write (5m) |
| Extended (1-hour) | 1 hour | Cache Write (1h) |
If the response includes ephemeral_5m_input_tokens and ephemeral_1h_input_tokens, each is priced at its respective rate. Otherwise, all cache creation tokens use the 5-minute rate.
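The fallback rule can be sketched as below. The usage shape is an assumption: it reuses the two field names mentioned above plus a `cache_creation_input_tokens` total, and the function covers only the cache write component (a full calculation would round once over the sum of all components).

```typescript
// Hypothetical usage shape; the per-tier fields may be absent from a response.
interface CacheCreationUsage {
  cache_creation_input_tokens: number; // total cache creation tokens
  ephemeral_5m_input_tokens?: number;
  ephemeral_1h_input_tokens?: number;
}

interface CacheWriteRates {
  cacheWrite5m: number; // $/MTok, "Cache Write (5m)" column
  cacheWrite1h: number; // $/MTok, "Cache Write (1h)" column
}

function cacheWriteCostMicrodollars(u: CacheCreationUsage, r: CacheWriteRates): number {
  const fiveMinute = u.ephemeral_5m_input_tokens;
  const oneHour = u.ephemeral_1h_input_tokens;
  if (fiveMinute !== undefined && oneHour !== undefined) {
    // Per-tier breakdown present: price each tier at its own rate.
    return Math.round(fiveMinute * r.cacheWrite5m + oneHour * r.cacheWrite1h);
  }
  // No breakdown: all cache creation tokens use the 5-minute rate.
  return Math.round(u.cache_creation_input_tokens * r.cacheWrite5m);
}

// claude-sonnet-4-5 cache write rates from the alias table.
const sonnetWriteRates: CacheWriteRates = { cacheWrite5m: 3.75, cacheWrite1h: 6 };

const tieredWriteCost = cacheWriteCostMicrodollars(
  { cache_creation_input_tokens: 3000, ephemeral_5m_input_tokens: 1000, ephemeral_1h_input_tokens: 2000 },
  sonnetWriteRates,
); // 1,000 × 3.75 + 2,000 × 6.00 = 15,750 microdollars

const fallbackWriteCost = cacheWriteCostMicrodollars(
  { cache_creation_input_tokens: 3000 },
  sonnetWriteRates,
); // 3,000 × 3.75 = 11,250 microdollars

console.log(tieredWriteCost, fallbackWriteCost);
```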
Google Models (cost calculation only)
2 models. Rates in $/MTok. These models are supported for cost calculation via API ingest and the cost-engine package, but the proxy does not route requests to Google.
| Model | Input | Cached Input | Output |
|---|---|---|---|
| gemini-2.5-pro | 1.25 | 0.3125 | 10.00 |
| gemini-2.5-flash | 0.15 | 0.0375 | 0.60 |
Unknown Models
If a request uses a model not in the pricing catalog, the proxy returns 400 with error code invalid_model. See the error reference for details.
To request a new model, contact support.
Related
- Cost Tracking: full cost calculation formulas for each provider
- Error Reference: invalid_model error details