NullSpend Docs

Supported Models

NullSpend supports 56 models across OpenAI (26), Anthropic (22), and Google Gemini (8) with full proxy routing, cost tracking, and budget enforcement.

Cost Formula

cost_microdollars = Math.round(Σ(tokens × rate_per_million_tokens))

Rates are in dollars per million tokens. The result is in microdollars (1 microdollar = $0.000001).
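As a minimal sketch of the formula above: multiplying a token count by a rate in dollars per million tokens already yields microdollars, so the only remaining step is rounding the sum. The `CostComponent` type and `costMicrodollars` function are illustrative names, not part of NullSpend's API.

```typescript
// One billable component: a token count and its rate in $/MTok.
// (Hypothetical type, used only for this illustration.)
interface CostComponent {
  tokens: number;
  ratePerMTok: number;
}

// tokens × ($ per 1,000,000 tokens) is already a microdollar amount,
// so the products are summed and rounded once at the end.
function costMicrodollars(components: CostComponent[]): number {
  const total = components.reduce(
    (sum, c) => sum + c.tokens * c.ratePerMTok,
    0,
  );
  return Math.round(total);
}

// 1,000 input tokens and 500 output tokens at the gpt-4o rates listed below
// (2.50 input, 10.00 output).
const exampleCost = costMicrodollars([
  { tokens: 1000, ratePerMTok: 2.5 },
  { tokens: 500, ratePerMTok: 10.0 },
]);
// exampleCost is 7500 microdollars, i.e. $0.0075
```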

For the full calculation logic including cached tokens, cache writes, and long context multipliers, see Cost Tracking.

OpenAI Models

26 models. Rates in $/MTok.

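| Model | Input | Cached Input | Output |
| --- | --- | --- | --- |
| gpt-4o | 2.50 | 1.25 | 10.00 |
| gpt-4o-mini | 0.15 | 0.075 | 0.60 |
| gpt-4.1 | 2.00 | 0.50 | 8.00 |
| gpt-4.1-mini | 0.40 | 0.10 | 1.60 |
| gpt-4.1-nano | 0.10 | 0.025 | 0.40 |
| o4-mini | 1.10 | 0.275 | 4.40 |
| o3 | 2.00 | 0.50 | 8.00 |
| o3-mini | 1.10 | 0.55 | 4.40 |
| o3-pro | 20.00 | 20.00 | 80.00 |
| o1 | 15.00 | 7.50 | 60.00 |
| o1-pro | 150.00 | 150.00 | 600.00 |
| o1-mini | 1.10 | 0.55 | 4.40 |
| gpt-5 | 1.25 | 0.125 | 10.00 |
| gpt-5-mini | 0.25 | 0.025 | 2.00 |
| gpt-5-nano | 0.05 | 0.005 | 0.40 |
| gpt-5-pro | 15.00 | 15.00 | 120.00 |
| gpt-5.1 | 1.25 | 0.125 | 10.00 |
| gpt-5.2 | 1.75 | 0.175 | 14.00 |
| gpt-5.2-pro | 21.00 | 21.00 | 168.00 |
| gpt-5.4 | 2.50 | 0.25 | 15.00 |
| gpt-5.4-mini | 0.75 | 0.075 | 4.50 |
| gpt-5.4-nano | 0.20 | 0.02 | 1.25 |
| gpt-5.4-pro | 30.00 | 30.00 | 180.00 |
| o3-deep-research | 10.00 | 2.50 | 40.00 |
| o4-mini-deep-research | 2.00 | 0.50 | 8.00 |
| computer-use-preview | 3.00 | 3.00 | 12.00 |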

OpenAI cost formula: (prompt_tokens - cached_tokens) × input + cached_tokens × cached + completion_tokens × output. Reasoning tokens are a subset of completion tokens, so they are not double-counted.
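The OpenAI formula can be sketched as follows. This is an illustration only: the `OpenAIUsage` and `OpenAIRates` shapes and the function name are assumptions made for the example, and the rates are the gpt-4o values from the table above.

```typescript
// Hypothetical shapes for this sketch; rates are in $/MTok.
interface OpenAIRates {
  input: number;
  cached: number;
  output: number;
}

interface OpenAIUsage {
  promptTokens: number;     // includes cached tokens
  cachedTokens: number;     // subset of promptTokens
  completionTokens: number; // includes reasoning tokens
}

function openAICostMicrodollars(usage: OpenAIUsage, rates: OpenAIRates): number {
  // Cached tokens are billed at the cached rate, the remainder at the input rate.
  const uncachedTokens = usage.promptTokens - usage.cachedTokens;
  return Math.round(
    uncachedTokens * rates.input +
      usage.cachedTokens * rates.cached +
      usage.completionTokens * rates.output,
  );
}

const gpt4oRates: OpenAIRates = { input: 2.5, cached: 1.25, output: 10.0 };

// 10,000 prompt tokens (4,000 of them cached) and 2,000 completion tokens:
// 6,000 × 2.50 + 4,000 × 1.25 + 2,000 × 10.00 = 40,000 microdollars ($0.04).
const openAIExample = openAICostMicrodollars(
  { promptTokens: 10000, cachedTokens: 4000, completionTokens: 2000 },
  gpt4oRates,
);
```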

Anthropic Models

22 models (10 aliases + 12 dated variants). Rates in $/MTok.

Aliases

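| Model | Input | Cached Input | Cache Write (5m) | Cache Write (1h) | Output |
| --- | --- | --- | --- | --- | --- |
| claude-opus-4-6 | 5.00 | 0.50 | 6.25 | 10.00 | 25.00 |
| claude-opus-4-5 | 5.00 | 0.50 | 6.25 | 10.00 | 25.00 |
| claude-opus-4-1 | 15.00 | 1.50 | 18.75 | 30.00 | 75.00 |
| claude-opus-4 | 15.00 | 1.50 | 18.75 | 30.00 | 75.00 |
| claude-sonnet-4-6 | 3.00 | 0.30 | 3.75 | 6.00 | 15.00 |
| claude-sonnet-4-5 | 3.00 | 0.30 | 3.75 | 6.00 | 15.00 |
| claude-sonnet-4 | 3.00 | 0.30 | 3.75 | 6.00 | 15.00 |
| claude-haiku-4-5 | 1.00 | 0.10 | 1.25 | 2.00 | 5.00 |
| claude-haiku-3.5 | 0.80 | 0.08 | 1.00 | 1.60 | 4.00 |
| claude-haiku-3 | 0.25 | 0.03 | 0.30 | 0.50 | 1.25 |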

Dated Variants

Dated variants share the exact same rates as their alias:

| Model | Same Rates As |
| --- | --- |
| claude-opus-4-6-20260205 | claude-opus-4-6 |
| claude-sonnet-4-6-20260217 | claude-sonnet-4-6 |
| claude-sonnet-4-5-20250929 | claude-sonnet-4-5 |
| claude-opus-4-5-20251101 | claude-opus-4-5 |
| claude-haiku-4-5-20251001 | claude-haiku-4-5 |
| claude-opus-4-1-20250805 | claude-opus-4-1 |
| claude-opus-4-20250514 | claude-opus-4 |
| claude-sonnet-4-20250514 | claude-sonnet-4 |
| claude-3-5-haiku-20241022 | claude-haiku-3.5 |
| claude-3-haiku-20240307 | claude-haiku-3 |
| claude-opus-4-0 | claude-opus-4 |
| claude-sonnet-4-0 | claude-sonnet-4 |

Long Context Pricing

When total input tokens (input + cache creation + cache read) exceed 200,000 tokens, multipliers apply:

| Component | Multiplier |
| --- | --- |
| Input | |
| Cached Input (read) | |
| Cache Write (5m and 1h) | |
| Output | 1.5× |
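The threshold check described above might look like the sketch below. The usage field names and function names are hypothetical. Only the 200,000-token threshold and the 1.5× output multiplier come from this page, so the sketch covers just the output rate.

```typescript
// Hypothetical usage shape; field names are illustrative only.
interface AnthropicInputUsage {
  inputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
}

const ANTHROPIC_LONG_CONTEXT_THRESHOLD = 200000;

// Long-context pricing applies when the sum of all three input components
// exceeds the threshold (strictly greater than, per "exceed").
function isAnthropicLongContext(usage: AnthropicInputUsage): boolean {
  const totalInput =
    usage.inputTokens + usage.cacheCreationTokens + usage.cacheReadTokens;
  return totalInput > ANTHROPIC_LONG_CONTEXT_THRESHOLD;
}

// Applies only the output multiplier (1.5×) stated in the table above.
function anthropicOutputRate(
  usage: AnthropicInputUsage,
  baseOutputRate: number,
): number {
  return isAnthropicLongContext(usage) ? baseOutputRate * 1.5 : baseOutputRate;
}

// 150,000 + 40,000 + 20,000 = 210,000 total input tokens, so the
// claude-sonnet-4-5 output rate of 15.00 becomes 22.50 $/MTok.
const longContextOutputRate = anthropicOutputRate(
  { inputTokens: 150000, cacheCreationTokens: 40000, cacheReadTokens: 20000 },
  15.0,
);
```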

Cache Write TTLs

Anthropic offers two cache write tiers:

| Tier | TTL | Rate Column |
| --- | --- | --- |
| Ephemeral (5-minute) | 5 minutes | Cache Write (5m) |
| Extended (1-hour) | 1 hour | Cache Write (1h) |

If the response includes ephemeral_5m_input_tokens and ephemeral_1h_input_tokens, each is priced at its respective rate. Otherwise, all cache creation tokens use the 5-minute rate.
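A minimal sketch of this fallback rule, assuming a simplified usage object: the `CacheCreationUsage` shape and its camel-cased field names are hypothetical stand-ins for the response fields named above. The result is left unrounded because rounding happens once over the full cost sum.

```typescript
// Cache write rates in $/MTok, as in the alias table above.
interface CacheWriteRates {
  write5m: number;
  write1h: number;
}

// Hypothetical, simplified view of the cache-creation part of a response.
interface CacheCreationUsage {
  cacheCreationTokens: number; // total cache creation tokens
  ephemeral5mTokens?: number;  // stands in for ephemeral_5m_input_tokens
  ephemeral1hTokens?: number;  // stands in for ephemeral_1h_input_tokens
}

// Returns the unrounded cache write cost in microdollars.
function cacheWriteCost(usage: CacheCreationUsage, rates: CacheWriteRates): number {
  const has5m = usage.ephemeral5mTokens !== undefined;
  const has1h = usage.ephemeral1hTokens !== undefined;
  if (has5m && has1h) {
    // Breakdown present: price each tier at its own rate.
    return (
      (usage.ephemeral5mTokens as number) * rates.write5m +
      (usage.ephemeral1hTokens as number) * rates.write1h
    );
  }
  // No breakdown: all cache creation tokens use the 5-minute rate.
  return usage.cacheCreationTokens * rates.write5m;
}

const sonnetCacheRates: CacheWriteRates = { write5m: 3.75, write1h: 6.0 };

// With a breakdown: 1,000 × 3.75 + 2,000 × 6.00 = 15,750 microdollars.
const splitCost = cacheWriteCost(
  { cacheCreationTokens: 3000, ephemeral5mTokens: 1000, ephemeral1hTokens: 2000 },
  sonnetCacheRates,
);

// Without a breakdown: 3,000 × 3.75 = 11,250 microdollars.
const fallbackCost = cacheWriteCost({ cacheCreationTokens: 3000 }, sonnetCacheRates);
```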

Google Gemini Models

8 models. Rates in $/MTok. Proxy routes natively via /v1beta/models/{model}:generateContent.

| Model | Input | Cached Input | Output |
| --- | --- | --- | --- |
| gemini-2.5-pro | 1.25 | 0.125 | 10.00 |
| gemini-2.5-flash | 0.30 | 0.03 | 2.50 |
| gemini-2.5-flash-lite | 0.10 | 0.01 | 0.40 |
| gemini-2.0-flash | 0.10 | 0.025 | 0.40 |
| gemini-2.0-flash-lite | 0.075 | | 0.30 |
| gemini-3-flash-preview | 0.50 | 0.05 | 3.00 |
| gemini-3.1-pro-preview | 2.00 | 0.20 | 12.00 |
| gemini-3.1-flash-lite-preview | 0.25 | 0.025 | 1.50 |

Gemini cost formula: (promptTokenCount - cachedContentTokenCount) × input + cachedContentTokenCount × cached + candidatesTokenCount × output. Thinking tokens (thoughtsTokenCount) are a subset of output tokens and are tracked as the _ns_thinking_tokens tag.
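The Gemini formula can be sketched in the same way. The usage field names follow those quoted above, but the `GeminiCostResult` shape, including how the tag is attached, is an assumption made for illustration. Thinking tokens are reported in the tag without being added to the cost, matching the statement that they are a subset of output.

```typescript
// Usage fields as named in the formula above; optional ones default to 0.
interface GeminiUsageMetadata {
  promptTokenCount: number;
  cachedContentTokenCount?: number;
  candidatesTokenCount: number;
  thoughtsTokenCount?: number;
}

// Rates in $/MTok.
interface GeminiModelRates {
  input: number;
  cached: number;
  output: number;
}

// Hypothetical result shape for this sketch.
interface GeminiCostResult {
  costMicrodollars: number;
  tags: Record<string, number>;
}

function geminiCost(
  usage: GeminiUsageMetadata,
  rates: GeminiModelRates,
): GeminiCostResult {
  const cached = usage.cachedContentTokenCount ?? 0;
  const uncached = usage.promptTokenCount - cached;
  const costMicrodollars = Math.round(
    uncached * rates.input +
      cached * rates.cached +
      usage.candidatesTokenCount * rates.output,
  );
  // Thinking tokens are recorded as a tag only; they are not billed again.
  return {
    costMicrodollars,
    tags: { _ns_thinking_tokens: usage.thoughtsTokenCount ?? 0 },
  };
}

// gemini-2.5-flash rates: 8,000 × 0.30 + 2,000 × 0.03 + 1,000 × 2.50
// = 4,960 microdollars.
const geminiExample = geminiCost(
  {
    promptTokenCount: 10000,
    cachedContentTokenCount: 2000,
    candidatesTokenCount: 1000,
    thoughtsTokenCount: 400,
  },
  { input: 0.3, cached: 0.03, output: 2.5 },
);
```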

Model aliases: Dated model names (e.g., gemini-2.5-flash-preview-04-17) are resolved to their base model for pricing via prefix matching.
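One way prefix matching could work is sketched below. The longest-match rule is an assumption, not something this page specifies. It is included because a name such as gemini-2.5-flash-lite-preview-06-17 (a made-up example) would otherwise match both gemini-2.5-flash and gemini-2.5-flash-lite.

```typescript
// Base model names from the Gemini table above.
const GEMINI_BASE_MODELS: readonly string[] = [
  "gemini-2.5-pro",
  "gemini-2.5-flash",
  "gemini-2.5-flash-lite",
  "gemini-2.0-flash",
  "gemini-2.0-flash-lite",
  "gemini-3-flash-preview",
  "gemini-3.1-pro-preview",
  "gemini-3.1-flash-lite-preview",
];

// Resolves a requested model name to its base model, or undefined if no
// base model matches. Assumption: the longest matching prefix wins, and a
// prefix must be followed by "-" so that partial name segments never match.
function resolveGeminiBaseModel(model: string): string | undefined {
  let best: string | undefined;
  for (const base of GEMINI_BASE_MODELS) {
    const matches = model === base || model.startsWith(base + "-");
    if (matches && (best === undefined || base.length > best.length)) {
      best = base;
    }
  }
  return best;
}

// The example name from the text resolves to "gemini-2.5-flash".
const resolvedExample = resolveGeminiBaseModel("gemini-2.5-flash-preview-04-17");
```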

Tiered pricing: gemini-2.5-pro is the only Gemini model with a long-context tier — prompts exceeding 200K tokens are billed at 2× input, 2× cached input, and 1.5× output. NullSpend applies these multipliers automatically. All other Gemini models (flash family, 2.0 series, 3.x previews) price flat regardless of prompt length.
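The tier selection can be sketched as a function that returns the effective rates for a request. The function and type names are hypothetical, and the sketch assumes the model name has already been resolved to its base model and that "exceeding 200K" means a prompt token count strictly greater than 200,000.

```typescript
// Rates in $/MTok.
interface GeminiTierRates {
  input: number;
  cached: number;
  output: number;
}

const GEMINI_LONG_CONTEXT_THRESHOLD = 200000;

// Returns the rates to bill at. Only gemini-2.5-pro has a long-context tier;
// every other model keeps its flat base rates regardless of prompt length.
function effectiveGeminiRates(
  baseModel: string,
  promptTokenCount: number,
  base: GeminiTierRates,
): GeminiTierRates {
  const tiered =
    baseModel === "gemini-2.5-pro" &&
    promptTokenCount > GEMINI_LONG_CONTEXT_THRESHOLD;
  if (!tiered) {
    return base;
  }
  // 2× input, 2× cached input, 1.5× output.
  return {
    input: base.input * 2,
    cached: base.cached * 2,
    output: base.output * 1.5,
  };
}

// A 250,000-token prompt on gemini-2.5-pro is billed at 2.50 / 0.25 / 15.00.
const tieredRates = effectiveGeminiRates("gemini-2.5-pro", 250000, {
  input: 1.25,
  cached: 0.125,
  output: 10.0,
});
```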
