Architecture
How NullSpend enforces budgets in-path with sub-20ms overhead, attributes cost across four surfaces, and stays open source from end to end.
NullSpend has four control surfaces, one shared budget engine, and a thin reporting plane. Every piece is Apache-2.0 and runs on your infrastructure if you want it to.
The four surfaces
The same budget enforcement applies regardless of how a request enters NullSpend.
```
┌─────────────────────────────────────────────────────────────────────┐
│                          YOUR APP / AGENT                           │
└──────┬──────────────┬─────────────────┬──────────────────┬─────────┘
       │              │                 │                  │
   [PROXY]          [SDK]        [CLAUDE AGENT]          [MCP]
   in-path         per-call         adapter         server + proxy
   drop-in       env control    (Anthropic SDK)      agent-native
       │              │                 │                  │
       └──────┬───────┴────────┬────────┴────────┬─────────┘
              ▼                ▼                 ▼
              ┌─────────────────────────┐
              │      BUDGET ENGINE      │
              │    check → reserve →    │
              │    spend → reconcile    │
              └────────────┬────────────┘
                           ▼
              OpenAI · Anthropic · Gemini
```

Proxy — in-path enforcement
A Cloudflare Workers proxy wraps OpenAI, Anthropic, and Google endpoints. One environment variable swaps you in:
```
OPENAI_BASE_URL=https://proxy.nullspend.dev/v1
```

Every request goes through check-and-reserve against the budget engine before reaching the upstream provider. If the budget is exhausted, the proxy returns a 429 with a structured envelope before any provider tokens are spent. Sub-20ms overhead at p50.
Unbypassable. Your application cannot route around it without rewriting the base URL, which is a config change, not a code change.
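Client code can tell a budget stop apart from an ordinary provider rate limit by inspecting the envelope. A minimal TypeScript sketch; the field names used here (`error`, `limitUsd`, `spentUsd`) are illustrative assumptions, not the proxy's documented schema:

```typescript
// Hypothetical shape of the budget-exhausted envelope; real field
// names may differ from these assumed ones.
interface BudgetExhaustedEnvelope {
  error: string; // e.g. "budget_exhausted"
  budgetId: string;
  limitUsd: number;
  spentUsd: number;
}

// Decide whether a 429 came from budget enforcement or from the
// upstream provider's own rate limiting.
function classifyProxyResponse(
  status: number,
  body: unknown
): "ok" | "budget_exhausted" | "rate_limited" {
  if (status !== 429) return "ok";
  const env = body as Partial<BudgetExhaustedEnvelope> | null;
  return env?.error === "budget_exhausted" ? "budget_exhausted" : "rate_limited";
}
```

Distinguishing the two matters in practice: a provider rate limit is worth retrying with backoff, while an exhausted budget is not.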
SDK — per-call control
When the proxy can't see a call (embedded inference, custom transport, third-party agent framework), the SDK gives you the same accounting in TypeScript or Python:
```typescript
import { NullSpend } from "@nullspend/sdk";

const ns = new NullSpend({ apiKey });
const fetch = ns.createTrackedFetch("openai");
```

`createTrackedFetch` wraps the standard `fetch`, calculates cost from response token counts, and reports to the same budget engine the proxy uses. Same enforcement guarantees, applied client-side.
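The wrapping pattern is simple to picture. Here is a self-contained sketch of how a tracked fetch could be structured, not the actual SDK internals; the rate constants and reporter signature are invented for illustration:

```typescript
type Usage = { prompt_tokens: number; completion_tokens: number };
type Reporter = (provider: string, costUsd: number) => void;

// Illustrative rates only; the real engine reads published per-model pricing.
const RATE = { inputPerMTok: 2.5, outputPerMTok: 10 };

// Wrap a fetch-like function so every response's token usage is
// converted to cost and reported toward the budget.
function createTrackedFetch(
  provider: string,
  baseFetch: (url: string) => Promise<{ json(): Promise<{ usage: Usage }> }>,
  report: Reporter
) {
  return async (url: string) => {
    const res = await baseFetch(url);
    const body = await res.json();
    const costUsd =
      (body.usage.prompt_tokens / 1e6) * RATE.inputPerMTok +
      (body.usage.completion_tokens / 1e6) * RATE.outputPerMTok;
    report(provider, costUsd);
    return res;
  };
}
```

The caller keeps using a plain `fetch`-shaped function; accounting rides along on every response.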
Claude Agent adapter
@nullspend/claude-agent wraps the Claude Agent SDK so every model call routes through the NullSpend proxy with no other code changes. Same accounting and enforcement as the proxy, surfaced through the agent runtime your code already uses.
```typescript
import { withNullSpend } from "@nullspend/claude-agent";
import { ClaudeAgent } from "@anthropic-ai/claude-agent-sdk";

const agent = withNullSpend(new ClaudeAgent({ /* ... */ }), { apiKey });
```

MCP — server and proxy
For autonomous agents that need to ask for their own budget, @nullspend/mcp-server exposes three tools any MCP client can call:
- `check_budget` — am I allowed to spend $X?
- `request_budget` — increase my budget, routing to a human if needed
- `get_cost` — what have I spent so far this session?
@nullspend/mcp-proxy complements it by gating upstream MCP tool calls through budget enforcement and HITL approval — so an agent invoking an external MCP tool is governed the same way as one invoking an LLM.
Works with Claude, Cursor, or any MCP client. The agent self-governs against the same budget state the proxy, SDK, and Claude Agent adapter enforce.
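To make the self-governing loop concrete, here is a toy in-memory stand-in for the three tools. The class and method shapes are illustrative, not the MCP server's real interface:

```typescript
// Toy stand-in for the three MCP tools, showing the loop an agent
// runs before spending: check, request more if denied, report cost.
class BudgetTools {
  constructor(private limitUsd: number, private spentUsd = 0) {}

  // check_budget: am I allowed to spend this much?
  check_budget(amountUsd: number): boolean {
    return this.spentUsd + amountUsd <= this.limitUsd;
  }

  // request_budget: in the real flow this can route to a human for approval.
  request_budget(extraUsd: number, approve: (usd: number) => boolean): boolean {
    if (!approve(extraUsd)) return false;
    this.limitUsd += extraUsd;
    return true;
  }

  // get_cost: what have I spent so far this session?
  get_cost(): number {
    return this.spentUsd;
  }

  // Spend is recorded by the enforcement layer, not by the agent itself.
  record(amountUsd: number): void {
    this.spentUsd += amountUsd;
  }
}
```

An agent would call `check_budget` before each expensive step, fall back to `request_budget` when headroom runs out, and use `get_cost` to report progress.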
Budget engine
The core invariant: a single state owner per (org, key) pair. State lives in a Cloudflare Durable Object so writes serialize without distributed locks.
The lifecycle of every request:
- Check — does the budget have headroom for the estimated cost?
- Reserve — atomically subtract the estimated cost from available budget
- Spend — provider returns; compute exact cost from real token counts
- Reconcile — release the reservation, charge the actual cost, emit a cost event
Reservations expire if a request hangs, so a stuck call cannot permanently lock budget. Reconciliation is idempotent (replays don't double-charge), so retries and partial failures don't corrupt accounting.
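The four-step lifecycle can be sketched as a single state owner in TypeScript. This is an illustrative model of the invariant, not the Durable Object implementation; identifiers and the TTL default are invented:

```typescript
type Reservation = { estimateUsd: number; expiresAt: number };

// Single-owner budget state: in production one such owner exists
// per (org, key), so writes serialize without distributed locks.
class Budget {
  private reservations = new Map<string, Reservation>();
  private reconciled = new Set<string>();
  constructor(private limitUsd: number, private spentUsd = 0) {}

  get spent(): number {
    return this.spentUsd;
  }

  // Expired holds stop counting, so a hung request can't lock budget forever.
  private reservedUsd(now: number): number {
    let sum = 0;
    for (const [id, r] of this.reservations) {
      if (r.expiresAt > now) sum += r.estimateUsd;
      else this.reservations.delete(id);
    }
    return sum;
  }

  // Check + reserve: fail closed when the estimate exceeds headroom.
  reserve(id: string, estimateUsd: number, now: number, ttlMs = 60_000): boolean {
    const headroom = this.limitUsd - this.spentUsd - this.reservedUsd(now);
    if (estimateUsd > headroom) return false;
    this.reservations.set(id, { estimateUsd, expiresAt: now + ttlMs });
    return true;
  }

  // Spend + reconcile: release the hold, charge the actual cost.
  // Idempotent on id, so replays never double-charge.
  reconcile(id: string, actualUsd: number): void {
    if (this.reconciled.has(id)) return;
    this.reconciled.add(id);
    this.reservations.delete(id);
    this.spentUsd += actualUsd;
  }
}
```

Because one owner sees every reserve and reconcile in order, the headroom check never races against a concurrent spend.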
Cost engine
Pricing data is a JSON file checked into the repo (packages/cost-engine/src/pricing-data.json). 56+ models across OpenAI, Anthropic, and Google. Input tokens, output tokens, cached tokens, reasoning tokens — each priced at the model's published rate. No third-party feeds. No estimates. The arithmetic is in source.
When a new model ships, you add a row to the JSON. The same engine runs in the proxy worker, the SDK, the dashboard, and the Python package. One source of truth.
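The arithmetic reduces to one multiply-add per token class. A sketch under assumed field names (the real pricing-data.json schema may differ):

```typescript
// Hypothetical per-model pricing row, in USD per million tokens.
type PricingRow = {
  inputPerMTok: number;
  outputPerMTok: number;
  cachedInputPerMTok: number;
  reasoningPerMTok: number;
};

type TokenCounts = {
  input: number;
  output: number;
  cachedInput: number;
  reasoning: number;
};

// Each token class priced at its own published rate, then summed.
function costUsd(p: PricingRow, t: TokenCounts): number {
  return (
    (t.input / 1e6) * p.inputPerMTok +
    (t.output / 1e6) * p.outputPerMTok +
    (t.cachedInput / 1e6) * p.cachedInputPerMTok +
    (t.reasoning / 1e6) * p.reasoningPerMTok
  );
}
```

Because the function is pure arithmetic over a static table, the identical logic can run in the proxy worker, the SDK, the dashboard, and the Python package.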
Storage
| Plane | Where | What lives there |
|---|---|---|
| Budget state | Cloudflare Durable Objects | Live counters, reservations, plan counters |
| Cost events | Postgres (Supabase) | Per-request cost rows, tag attribution, customer mapping |
| Request bodies (opt-in) | Cloudflare R2 | Encrypted full bodies for Pro/Scale/Enterprise debugging |
| Audit log | Postgres | Org-scoped action history |
The proxy is the only writer to budget state. The dashboard reads cost events directly from Postgres, never the Durable Object — this keeps the hot path fast and the analytical path independent.
Open source
Everything is Apache-2.0:
- `apps/proxy` — Cloudflare Workers proxy
- `packages/sdk` — TypeScript SDK
- `packages/sdk-python` — Python SDK
- `packages/cost-engine` — pricing arithmetic
- `packages/mcp-server` / `packages/mcp-proxy` — MCP integration
- `packages/claude-agent` — Claude Agent SDK adapter
- `packages/db` — Drizzle ORM schema and migrations
Run the same code we run, on your own infrastructure. Read the exact formula that decides your margin. No black-box billing.