Budgets
Budgets are spending ceilings enforced by the proxy. When the estimated cost of a request would push spend over the limit, the proxy returns 429 — the request never reaches the provider and you are not charged.
For a task-oriented setup guide, see Budget Configuration.
How Budgets Work
Request arrives
│
├─ 1. Estimate cost (input tokens + max output tokens × 1.1 safety margin)
│
├─ 2. Period reset ────────── due? ───────► Reset spend to 0, start new period
│
├─ 3. Session limit check ─── exceeds? ──► 429 session_limit_exceeded
│
├─ 4. Velocity check ──────── tripped? ───► 429 velocity_exceeded + Retry-After
│
├─ 4.5 Finalization reserve ── finalize header + in zone? ► unlock reserve
│
├─ 5. Budget check ────────── exceeds? ──► 429 budget_exceeded
│
├─ 6. Reserve estimated cost (30s TTL)
│
├─ 7. Forward request to provider
│
├─ 8. Receive response, calculate actual cost
│
└─ 9. Reconcile: apply actual cost, release reservation
Budget enforcement uses a Cloudflare Durable Object with embedded SQLite. All checks and mutations are serialized, so there are no race conditions, even under concurrent load.
Budget Entity Types
| Entity Type | What It Scopes To |
|---|---|
| `user` | All requests from a user account (across all their API keys) |
| `api_key` | Requests from a specific API key |
| `tag` | Requests carrying a specific tag key-value pair (e.g., `env=production`) |
| `customer` | Requests carrying a specific customer ID via the `X-NullSpend-Customer` header |
A single request can match multiple budgets (e.g., a user budget + an API key budget + a customer budget). All matching budgets must have sufficient remaining balance for the request to proceed.
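The all-must-pass rule can be sketched as follows. This is an illustrative model rather than the proxy's implementation: the `Budget` class and `first_blocking_budget` function are hypothetical names, and reservations are ignored for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Budget:
    entity_type: str  # "user", "api_key", "tag", or "customer"
    entity_id: str
    max_microdollars: int
    spend_microdollars: int


def first_blocking_budget(budgets: Iterable[Budget], estimate: int) -> Optional[Budget]:
    """Return the first matching budget that cannot cover the estimate, or None."""
    for budget in budgets:
        if budget.spend_microdollars + estimate > budget.max_microdollars:
            return budget
    return None


# Hypothetical example: a user budget with headroom and a nearly spent key budget.
matching = [
    Budget("user", "user-1", 100_000_000, 10_000_000),
    Budget("api_key", "key-1", 50_000_000, 49_000_000),
]
blocker = first_blocking_budget(matching, 2_000_000)
print(blocker.entity_type if blocker else "allowed")  # api_key
```

Here the user budget has plenty of room, but the request is still denied because the API key budget cannot cover the $2 estimate.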
Configuration
| Field | Type | Description |
|---|---|---|
| `maxBudgetMicrodollars` | integer (required) | Spending ceiling in microdollars. $50 = 50,000,000 |
| `policy` | string | Enforcement policy: `"strict_block"` (default, denies requests), `"soft_block"` (logs but allows), or `"warn"` (tracks only). |
| `resetInterval` | string or null | `"daily"`, `"weekly"`, `"monthly"`, or `null` (no reset; manual only) |
| `thresholdPercentages` | integer[] | Webhook alert thresholds. Default: `[50, 80, 90, 95]`. Max 10 values, must be ascending, each 1–100. |
| `velocityLimitMicrodollars` | integer or null | Max spend per velocity window. See Velocity Limits. |
| `velocityWindowSeconds` | integer | Sliding window size. Range: 10–3600. Default: 60. |
| `velocityCooldownSeconds` | integer | Block duration after velocity trip. Range: 10–3600. Default: 60. |
| `sessionLimitMicrodollars` | integer or null | Per-session spending cap. See Session Limits. |
| `finalizationReserveMicrodollars` | integer or null | Portion of budget held back for graceful agent shutdown. When set, normal requests are denied when they'd eat into the reserve. Requests with `X-NullSpend-Finalize: 1` can use the reserve once spend reaches the reserve zone. See Finalization Reserve. |
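A minimal client-side sketch of the constraints in the table above, useful for catching mistakes before a budget is submitted. The helper names are hypothetical and not part of any NullSpend SDK, and "ascending" is assumed here to mean strictly ascending.

```python
from typing import List

MICRODOLLARS_PER_DOLLAR = 1_000_000


def dollars_to_microdollars(dollars: float) -> int:
    """Convert dollars to microdollars ($50 -> 50,000,000)."""
    return round(dollars * MICRODOLLARS_PER_DOLLAR)


def validate_budget_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the config passes these checks."""
    errors = []
    if not isinstance(config.get("maxBudgetMicrodollars"), int):
        errors.append("maxBudgetMicrodollars is required and must be an integer")
    if config.get("policy", "strict_block") not in ("strict_block", "soft_block", "warn"):
        errors.append("policy must be strict_block, soft_block, or warn")
    if config.get("resetInterval") not in (None, "daily", "weekly", "monthly"):
        errors.append("resetInterval must be daily, weekly, monthly, or null")
    thresholds = config.get("thresholdPercentages", [50, 80, 90, 95])
    if len(thresholds) > 10:
        errors.append("thresholdPercentages allows at most 10 values")
    if any(not 1 <= t <= 100 for t in thresholds):
        errors.append("each threshold must be between 1 and 100")
    # Assumption: equal neighbours are rejected (strictly ascending).
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        errors.append("thresholdPercentages must be ascending")
    for name in ("velocityWindowSeconds", "velocityCooldownSeconds"):
        if not 10 <= config.get(name, 60) <= 3600:
            errors.append(f"{name} must be between 10 and 3600")
    return errors


config = {"maxBudgetMicrodollars": dollars_to_microdollars(50), "resetInterval": "monthly"}
print(validate_budget_config(config))  # []
```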
Enforcement Lifecycle
The proxy checks budgets in this exact order. A denial at any step stops the pipeline — later steps are not evaluated.
1. Period Reset
If the budget has a resetInterval and the current period has elapsed, spend resets to 0 and a new period starts. A budget.reset webhook fires.
2. Session Limit Check
If the budget has a sessionLimitMicrodollars and the request includes an X-NullSpend-Session header, the proxy checks cumulative spend for that session. If currentSessionSpend + estimatedCost > sessionLimit, the request is denied.
3. Velocity Check (Circuit Breaker)
If the budget has a velocityLimitMicrodollars, the proxy checks spend within the sliding window. The velocity check uses a circuit breaker pattern:
- Closed (normal): requests pass through, velocity spend is tracked
- Open (tripped): all requests are denied until the cooldown expires
- Recovery: after cooldown, the breaker resets and a `velocity.recovered` webhook fires
If estimatedSpend + estimate > velocityLimit, the breaker trips.
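The closed, open, and recovery states can be modeled with a small in-memory sketch. `VelocityBreaker` is a hypothetical name, timestamps are passed in explicitly so the behavior is easy to follow, and keeping the window history across a recovery is an assumption; the proxy's actual sliding-window bookkeeping may differ.

```python
import math
from collections import deque


class VelocityBreaker:
    def __init__(self, limit: int, window_seconds: int = 60, cooldown_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self.events = deque()   # (timestamp, microdollars) pairs inside the window
        self.open_until = None  # set while the breaker is open (tripped)

    def _window_spend(self, now: float) -> int:
        # Drop spend that has slid out of the window, then total the rest.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()
        return sum(cost for _, cost in self.events)

    def check(self, estimate: int, now: float):
        """Return (allowed, retry_after_seconds)."""
        if self.open_until is not None:
            if now < self.open_until:
                return False, math.ceil(self.open_until - now)
            self.open_until = None  # recovery point: cooldown has expired
        if self._window_spend(now) + estimate > self.limit:
            self.open_until = now + self.cooldown_seconds  # trip the breaker
            return False, self.cooldown_seconds
        return True, 0

    def record(self, cost: int, now: float) -> None:
        self.events.append((now, cost))
```

With a $10 limit, two recorded $4 requests followed by a $4 estimate trip the breaker, and every later check is denied until the cooldown has passed.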
4. Budget Exhaustion Check
If currentSpend + reservations + estimatedCost > maxBudget, the request is denied. Only budgets with strict_block policy deny requests (this is the default).
5. Reservation
The estimated cost is reserved for 30 seconds. Reservations prevent concurrent requests from collectively exceeding the budget. If a reservation expires (upstream timeout, crash), it is automatically cleaned up.
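A simplified sketch of how the budget check and reservations interact, assuming expired reservations are swept lazily at check time. `BudgetLedger` is a hypothetical name, and the real enforcement runs inside the Durable Object described earlier.

```python
class BudgetLedger:
    RESERVATION_TTL_SECONDS = 30

    def __init__(self, max_budget: int):
        self.max_budget = max_budget
        self.spend = 0
        self.reservations = {}  # reservation id -> (amount, expires_at)
        self._next_id = 0

    def _active_reserved(self, now: float) -> int:
        # Remove reservations whose TTL has elapsed, then total what is left.
        expired = [rid for rid, (_, expires_at) in self.reservations.items() if expires_at <= now]
        for rid in expired:
            del self.reservations[rid]
        return sum(amount for amount, _ in self.reservations.values())

    def try_reserve(self, estimate: int, now: float):
        """Reserve the estimate and return a reservation id, or None if denied."""
        if self.spend + self._active_reserved(now) + estimate > self.max_budget:
            return None
        self._next_id += 1
        self.reservations[self._next_id] = (estimate, now + self.RESERVATION_TTL_SECONDS)
        return self._next_id


ledger = BudgetLedger(10_000_000)
first = ledger.try_reserve(6_000_000, now=0)    # accepted
second = ledger.try_reserve(6_000_000, now=1)   # denied: 6M reserved + 6M > 10M
third = ledger.try_reserve(6_000_000, now=31)   # accepted: first reservation expired
```

The second request is denied even though no spend has been recorded yet, which is how reservations stop concurrent requests from collectively overshooting the ceiling.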
6. Reconciliation
After the provider responds, the proxy calculates the actual cost and reconciles:
- Adds actual cost to cumulative spend
- Removes the reservation
- Adjusts session spend by `actualCost - estimatedCost`
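The reconciliation arithmetic can be sketched like this. The sketch assumes session spend was increased by the estimate when the reservation was made, which is why only the difference is applied afterwards; `BudgetState` and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class BudgetState:
    spend: int = 0
    # reservation id -> (estimated cost, session id or None)
    reservations: Dict[str, Tuple[int, Optional[str]]] = field(default_factory=dict)
    session_spend: Dict[str, int] = field(default_factory=dict)


def reconcile(state: BudgetState, reservation_id: str, actual_cost: int) -> None:
    # Remove the reservation and replace the estimate with the real cost.
    estimated_cost, session_id = state.reservations.pop(reservation_id)
    state.spend += actual_cost
    if session_id is not None:
        current = state.session_spend.get(session_id, 0)
        state.session_spend[session_id] = current + actual_cost - estimated_cost


state = BudgetState(
    reservations={"r1": (8_000_000, "conv_abc123")},
    session_spend={"conv_abc123": 8_000_000},
)
reconcile(state, "r1", 5_500_000)
print(state.spend, state.session_spend["conv_abc123"])  # 5500000 5500000
```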
429 Response Bodies
Budget Exceeded
{
  "error": {
    "code": "budget_exceeded",
    "message": "Request blocked: estimated cost exceeds remaining budget.",
    "details": {
      "entity_type": "user",
      "entity_id": "ns_usr_...",
      "budget_limit_microdollars": 100000000,
      "budget_spend_microdollars": 95000000,
      "estimated_cost_microdollars": 8000000,
      "finalization_reserve_microdollars": 10000000,
      "finalization_remaining_microdollars": 0
    }
  }
}
The `finalization_reserve_microdollars` and `finalization_remaining_microdollars` fields are only present when the budget has a finalization reserve configured. `finalization_remaining_microdollars` is the budget remaining after subtracting both spend and reserve.
Velocity Exceeded
{
  "error": {
    "code": "velocity_exceeded",
    "message": "Request blocked: spending rate exceeds velocity limit. Retry after cooldown.",
    "details": {
      "limitMicrodollars": 10000000,
      "windowSeconds": 60,
      "currentMicrodollars": 9500000
    }
  }
}
The response includes a `Retry-After` header with the cooldown duration in seconds.
Session Limit Exceeded
{
  "error": {
    "code": "session_limit_exceeded",
    "message": "Request blocked: session spend exceeds session limit. Start a new session.",
    "details": {
      "session_id": "conv_abc123",
      "session_spend_microdollars": 4800000,
      "session_limit_microdollars": 5000000
    }
  }
}
There is no `Retry-After` header because the session is done. Start a new session (a new `X-NullSpend-Session` value) to continue.
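A client can branch on `error.code` to decide what to do next. The following is a rough sketch for the three codes shown above; `plan_after_429` and its return values are hypothetical, the 60-second fallback simply mirrors the default cooldown, and any other code is treated as a hard stop.

```python
def plan_after_429(headers: dict, body: dict) -> tuple:
    """Map a 429 response to an (action, wait_seconds) pair for the caller."""
    code = body.get("error", {}).get("code")
    if code == "velocity_exceeded":
        # Header names are case-insensitive, so normalize before the lookup.
        normalized = {name.lower(): value for name, value in headers.items()}
        # Assumption: fall back to 60 seconds if the header is missing.
        return ("wait", int(normalized.get("retry-after", 60)))
    if code == "session_limit_exceeded":
        # The session is finished; the caller should pick a new session ID.
        return ("new_session", 0)
    # budget_exceeded and anything unrecognized: retrying will not help.
    return ("stop", 0)


action = plan_after_429(
    {"Retry-After": "45"},
    {"error": {"code": "velocity_exceeded"}},
)
print(action)  # ('wait', 45)
```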
Tag Budget Exceeded
{
  "error": {
    "code": "tag_budget_exceeded",
    "message": "Request blocked: tag budget exceeded",
    "details": {
      "tag_key": "team",
      "tag_value": "billing",
      "budget_limit_microdollars": 50000000,
      "budget_spend_microdollars": 49500000
    }
  }
}
Velocity Limits
Velocity limits catch runaway loops — an agent stuck in a retry cycle can burn through a budget in seconds.
How it works:
- The proxy tracks spend within a sliding window (e.g., $10 in 60 seconds)
- When spend exceeds the limit, a circuit breaker trips
- All requests are blocked for the cooldown period
- After cooldown, the breaker resets and requests resume
- A `velocity.recovered` webhook fires on recovery
Configuration:
| Field | Range | Default | Description |
|---|---|---|---|
| `velocityLimitMicrodollars` | > 0 | `null` (disabled) | Max spend per window |
| `velocityWindowSeconds` | 10–3600 | 60 | Sliding window size |
| `velocityCooldownSeconds` | 10–3600 | 60 | Block duration after trip |
Example: $10 velocity limit with 60s window and 60s cooldown means: if your agents spend more than $10 within any 60-second sliding window, all requests are blocked for 60 seconds.
For the full reference — sliding window algorithm, circuit breaker states, and webhook payloads — see Velocity Limits.
Session Limits
Session limits cap how much a single agent conversation can spend, regardless of the overall budget.
How it works:
- Your agent sets `X-NullSpend-Session: conv_abc123` on each request
- The proxy tracks cumulative spend per session ID
- When a session's spend exceeds the limit, the request is blocked
- The agent should start a new session (new ID) to continue
Key behaviors:
- No header = no enforcement. Session limits only apply when `X-NullSpend-Session` is present.
- Client-defined sessions. The proxy does not manage session lifecycle; your agent decides when to start a new session.
- Independent of budget resets. Session spend does NOT reset when the budget period resets.
- Always strict. Session limits are hard caps regardless of the budget policy.
- 24-hour cleanup. Stale session data is automatically cleaned up after 24 hours of inactivity.
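Because sessions are client-defined, the agent needs some way to mint and rotate session IDs. A small sketch of one approach follows; `SessionTracker` is a hypothetical helper and the `conv_` prefix simply follows the example ID used in this page.

```python
import uuid


class SessionTracker:
    """Holds the current session ID and builds the header for each request."""

    def __init__(self) -> None:
        self.session_id = self._new_id()

    @staticmethod
    def _new_id() -> str:
        return f"conv_{uuid.uuid4().hex[:12]}"

    def headers(self) -> dict:
        return {"X-NullSpend-Session": self.session_id}

    def rotate(self) -> str:
        # Call this after a session_limit_exceeded response to start fresh.
        self.session_id = self._new_id()
        return self.session_id


tracker = SessionTracker()
request_headers = tracker.headers()  # merge into the headers of each proxied request
```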
For the full reference — session tracking internals, header usage, and webhook payloads — see Session Limits.
Finalization Reserve
Without a reserve, an agent that reaches its budget limit is hard-killed mid-task with a 429. A finalization reserve holds back a configurable portion of the budget so the agent can finish gracefully.
How it works:
- You set a `finalizationReserveMicrodollars` on the budget (e.g., $5 reserve on a $100 budget)
- Normal requests are denied when `spend + reservations + estimate > limit - reserve` (effective limit = $95)
- When the agent detects it's near the wall (via response headers), it sets `X-NullSpend-Finalize: 1` on its final request
- The proxy checks if the entity is in the "reserve zone" (`spend + reservations >= limit - reserve`). If yes, the reserve is unlocked for that request
- If the entity is NOT in the reserve zone, the finalize header is ignored (prevents premature reserve spending)
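The zone gate described in these steps reduces to a few comparisons. The function below is a sketch of that logic under the two inequalities given above; `allows_request` is a hypothetical name, and it models `strict_block` budgets only.

```python
def allows_request(
    limit: int,
    reserve: int,
    spend: int,
    reservations: int,
    estimate: int,
    finalize: bool = False,
) -> bool:
    committed = spend + reservations
    # The finalize header only counts once the entity is in the reserve zone.
    in_reserve_zone = committed >= limit - reserve
    effective_limit = limit if (finalize and in_reserve_zone) else limit - reserve
    return committed + estimate <= effective_limit


LIMIT, RESERVE = 100_000_000, 5_000_000  # $100 budget, $5 reserve

# Normal request well below the $95 effective limit: allowed.
print(allows_request(LIMIT, RESERVE, 90_000_000, 0, 4_000_000))  # True
# Finalize header sent before reaching the reserve zone: ignored, so denied.
print(allows_request(LIMIT, RESERVE, 94_000_000, 0, 2_000_000, finalize=True))  # False
# Finalize header sent inside the reserve zone: reserve unlocked, allowed.
print(allows_request(LIMIT, RESERVE, 95_000_000, 0, 3_000_000, finalize=True))  # True
```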
Key behaviors:
- Server-enforced zone gate. The `X-NullSpend-Finalize` header only works when the entity has actually reached the reserve zone. Setting it on every request does nothing until you're near the limit.
- Only applies to `strict_block` budgets. `soft_block` and `warn` budgets don't enforce the reserve (they don't enforce limits at all).
- Response headers show remaining. Every response includes `X-NullSpend-Budget-Effective-Remaining` and `X-NullSpend-Budget-Finalization-Reserve` headers when a reserve is configured.
- Requests-Remaining estimate. `X-NullSpend-Budget-Requests-Remaining` shows approximately how many more requests fit in the effective remaining, based on a rolling average of recent request costs.
Dashboard:
The budget form includes a collapsible "Finalization reserve" section. When set, the budget list shows a two-zone progress bar: green for normal spend and amber for the reserve zone. A shield icon indicates budgets with active reserves.
SDK support:
Both the TypeScript and Python SDKs support finalization reserve:
// TypeScript: finalize a request via the proxy
const trackedFetch = ns.createTrackedFetch("openai", {
  finalize: true, // Injects X-NullSpend-Finalize: 1
});

# Python: finalize a request via the proxy
tracked = ns.create_tracked_client("openai", finalize=True)

The SDK's cooperative budget check also subtracts the reserve from remaining (for `strict_block` budgets only), and skips the subtraction when `finalize: true`.
Threshold Alerts
When spend crosses a threshold percentage, a webhook fires:
- Thresholds ≥ 90% fire as `budget.threshold.critical`
- Thresholds < 90% fire as `budget.threshold.warning`
Default thresholds are [50, 80, 90, 95]. Customize per budget with up to 10 values (ascending, each 1–100).
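To make the crossing rule concrete, here is a small sketch that works out which events a single spend update would produce. `crossed_thresholds` is a hypothetical helper, and treating "spend exactly at the threshold" as crossed is an assumption.

```python
def crossed_thresholds(previous_spend, new_spend, max_budget, thresholds=(50, 80, 90, 95)):
    """Return (percentage, event name) for each threshold crossed by this update."""
    events = []
    for pct in thresholds:
        # Compare in integer space to avoid floating-point rounding.
        boundary_x100 = max_budget * pct
        if previous_spend * 100 < boundary_x100 <= new_spend * 100:
            name = "budget.threshold.critical" if pct >= 90 else "budget.threshold.warning"
            events.append((pct, name))
    return events


# Spend jumps from $45 to $92 on a $100 budget, crossing 50%, 80%, and 90%.
for pct, name in crossed_thresholds(45_000_000, 92_000_000, 100_000_000):
    print(pct, name)
# 50 budget.threshold.warning
# 80 budget.threshold.warning
# 90 budget.threshold.critical
```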
See Webhook Event Types for payload details.
Creating a Budget
Dashboard
- Go to Budgets → Set Budget
- Choose entity (your account or a specific API key)
- Set the spending ceiling
- Optionally configure reset interval, velocity limits, session limits, finalization reserve, and alert thresholds
- Click Set Budget — takes effect immediately
API
Budget creation and management uses session authentication (dashboard). See the Budgets API for full endpoint documentation.
# Requires dashboard session cookie
curl -X POST "https://nullspend.dev/api/budgets" \
  -H "Cookie: session=..." \
  -H "Content-Type: application/json" \
  -d '{
    "entityType": "api_key",
    "entityId": "ns_key_11223344-5566-7788-99aa-bbccddeeff00",
    "maxBudgetMicrodollars": 50000000,
    "resetInterval": "monthly",
    "velocityLimitMicrodollars": 10000000,
    "velocityWindowSeconds": 60,
    "velocityCooldownSeconds": 60,
    "sessionLimitMicrodollars": 5000000
  }'
To check budget status programmatically (with an API key), use `GET /api/budgets/status`.
Best Practices
- Start generous, tighten later. Set initial budgets higher than expected. Once you have cost data, tighten with confidence.
- One budget per concern. Separate API keys (and budgets) for different agents, environments, or teams.
- Use session limits for multi-step agents. Cap each task's cost so a single stuck agent can't consume the entire budget.
- Monitor before enforcing. Use the analytics dashboard to understand spending patterns before setting tight ceilings.
- Combine velocity + session limits. Velocity catches sudden spikes; session limits catch slow accumulation over a long conversation.
- Use finalization reserve for multi-step agents. Set a reserve large enough for one cleanup request so agents can save state, send notifications, or close connections before shutting down.
Related
- Budget Configuration Guide — step-by-step setup walkthrough
- Cost Tracking — how costs are calculated and recorded
- Tags — tag-based cost attribution and tag budgets
- Webhook Event Types — budget.exceeded, velocity.exceeded, threshold alerts
- Error Reference — all 429 error codes and response shapes