Proxy Endpoints
Routes the NullSpend proxy exposes for upstream provider calls.
The NullSpend proxy sits between your agents and upstream providers. It authenticates requests, tracks costs, and enforces budgets transparently.
Base URL: https://proxy.nullspend.dev
Provider Routes
| Method | Path | Provider | Default Upstream |
|---|---|---|---|
| POST | /v1/chat/completions | OpenAI | https://api.openai.com |
| POST | /v1/messages | Anthropic | https://api.anthropic.com |
| POST | /v1beta/models/{model}:generateContent | Google Gemini | https://generativelanguage.googleapis.com |
| POST | /v1beta/models/{model}:streamGenerateContent | Google Gemini (streaming) | https://generativelanguage.googleapis.com |
| POST | /v1/mcp/budget/check | MCP | None (local) |
| POST | /v1/mcp/events | MCP | None (local) |
Per-Customer Budget Routes
The bind, gate, and unit-economics endpoints share the proxy worker for low-latency enforcement.
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/bind | Link a customer to a plan + budget cap |
| POST | /v1/gate | Per-action enforcement decision (200-always envelope) |
| GET | /v1/customers/:customerId/unit-economics | Per-customer aggregate state |
See the Unit Economics API reference for full request/response shapes and the Per-Customer Budgets guide for the conceptual model.
All provider routes require an X-NullSpend-Key header. Unsupported paths return 404 not_found. Non-POST methods return 404.
OpenAI (/v1/chat/completions)
Forwards to https://api.openai.com/v1/chat/completions (or custom upstream). Headers forwarded to the upstream provider:
authorization— Your OpenAI API keyopenai-organizationopenai-projecttraceparent,tracestate— W3C trace context
Supports both streaming and non-streaming responses.
Anthropic (/v1/messages)
Forwards to https://api.anthropic.com/v1/messages (or custom upstream). Headers forwarded:
x-api-keyorauthorization— Your Anthropic API keyanthropic-version— Defaults to2023-06-01if not providedanthropic-betatraceparent,tracestate
Supports both streaming and non-streaming responses.
Google Gemini (/v1beta/models/{model}:generateContent, :streamGenerateContent)
Forwards to Google's Gemini API using the same URL structure as the native API. Replace the base URL and add your NullSpend key. The request body passes through unmodified in native Gemini format.
Authentication: The proxy extracts the Google API key from either:
Authorization: Bearer <key>(NullSpend convention, works with all providers)x-goog-api-key: <key>(Google's native header)
Headers forwarded:
x-goog-api-key— Your Google API key (extracted fromAuthorizationor forwarded directly)traceparent,tracestate— W3C trace context
Streaming: The proxy automatically appends ?alt=sse to the upstream URL for :streamGenerateContent requests. You don't need to include it yourself.
Model in URL: Unlike OpenAI/Anthropic where the model is in the request body, Gemini puts the model in the URL path. The proxy extracts it automatically for cost tracking. Dated model aliases (e.g., gemini-2.5-flash-preview-04-17) are resolved to their base model for pricing.
Thinking tokens: Gemini 2.5 models report thoughtsTokenCount in usage metadata. These are tracked as _ns_thinking_tokens in cost event tags, similar to OpenAI reasoning tokens.
Response ID: Since Gemini doesn't return a request ID in response headers, the proxy generates a UUID and sets x-request-id on the response. Google's responseId from the response body is captured as the _ns_google_response_id tag for cross-referencing.
# Non-streaming
curl "https://proxy.nullspend.dev/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GOOGLE_API_KEY" \
-H "X-NullSpend-Key: $NULLSPEND_API_KEY" \
-H "Content-Type: application/json" \
-d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
# Streaming
curl "https://proxy.nullspend.dev/v1beta/models/gemini-2.5-flash:streamGenerateContent" \
-H "x-goog-api-key: $GOOGLE_API_KEY" \
-H "X-NullSpend-Key: $NULLSPEND_API_KEY" \
-H "Content-Type: application/json" \
--no-buffer \
-d '{"contents":[{"parts":[{"text":"Count to 5"}]}]}'MCP (/v1/mcp/budget/check, /v1/mcp/events)
Local endpoints for MCP server integrations. /budget/check performs a pre-request budget check. /events ingests cost events from MCP tool calls.
Policy Endpoint
GET /v1/policy returns the calling key's enforcement policy: current budget remaining/limit/spend, allowed models and providers, and the cheapest model per provider (filtered to the allow list). Used by SDKs to render upgrade prompts before a request is sent.
Auth: API key (same as provider routes). Returns 200 with a JSON body shaped:
{
"budget": { "remaining_microdollars": 9_000_000, "max_microdollars": 10_000_000, "spend_microdollars": 1_000_000, "period_end": "2026-05-01T00:00:00.000Z", "entity_type": "api_key", "entity_id": "..." },
"allowed_models": ["gpt-4o-mini", "claude-haiku-4-5"],
"allowed_providers": ["openai", "anthropic"],
"cheapest_per_provider": { "openai": { "model": "gpt-4o-mini", "input_per_mtok": 150_000, "output_per_mtok": 600_000 } },
"cheapest_overall": { "model": "gpt-4o-mini", "provider": "openai", "input_per_mtok": 150_000, "output_per_mtok": 600_000 },
"restrictions_active": true
}The dashboard mirror of this endpoint is GET /api/policy (same auth, same shape).
Health Endpoints
No authentication required.
| Method | Path | Response |
|---|---|---|
| GET | /health | { "status": "ok", "service": "nullspend-proxy" } |
| GET | /health/metrics | Analytics Engine metrics (JSON or Prometheus, based on Accept header) |
| GET | /health/ready | { "status": "ok", "service": "nullspend-proxy" } — simple readiness check |
| GET | /health/feature-flags | Live runtime values of PLAN_COUNTER_ENABLED, NULLSPEND_CLOUD, CACHE_SCHEMA_VERSION, and build_sha. Used by the launch-watcher and shadow-mode alerts. |
Internal Endpoints
These use shared secret authentication (not API keys) and are not for external use.
| Method | Path | Purpose |
|---|---|---|
| POST | /internal/budget/invalidate | Invalidate budget cache for a user |
| GET | /internal/budget/velocity-state | Query velocity limit state |
Upstream Allowlist
When overriding the upstream provider with the X-NullSpend-Upstream header, only these URLs are accepted:
| URL | Provider |
|---|---|
https://api.openai.com | OpenAI (default) |
https://api.groq.com/openai | Groq |
https://api.together.xyz | Together AI |
https://api.fireworks.ai/inference | Fireworks AI |
https://api.mistral.ai | Mistral |
https://openrouter.ai/api | OpenRouter |
https://generativelanguage.googleapis.com | Google Gemini (default) |
Invalid upstream URLs return 400 invalid_upstream. Entries must not include API version path segments (/v1, /v1beta), as the proxy appends these automatically.
Body Size Limit
Maximum request body: 1 MB (1,048,576 bytes).
Enforced in two places:
- Pre-read —
Content-Lengthheader checked before reading the body - Post-read — Actual byte count verified after reading
Exceeding either check returns 413 payload_too_large.
Response body logging (Pro/Enterprise) also caps at 1 MB. Streaming responses exceeding 1 MB are truncated in the stored body; the client receives the full response regardless.
Request Processing Pipeline
Every request follows this exact order:
- Trace ID resolution — Always runs, even on errors. Sets
X-NullSpend-Trace-Idon the response. - Health routes — No auth. Returns immediately for
/health,/health/metrics,/health/ready. - Internal routes — Shared secret auth. Returns immediately for
/internal/*. - Route lookup — POST only. Unknown
/v1/*paths return404. - Rate limiting + API key authentication — Run in parallel via
Promise.all. Rate limiting checks IP then key limits (Rate Limits). Auth does SHA-256 hash lookup (Authentication). - Body parsing — JSON validation and size check. Runs sequentially after auth and rate limiting complete.
- Context construction — Resolves webhooks, API version, session ID, tags, and trace context.
- Route handler — Budget check → upstream call → cost tracking → reconciliation.
Related
- Custom Headers — request and response headers
- Authentication — key lifecycle and validation
- Rate Limits — enforcement order and failure modes
- Errors — error codes and response format