Budget Configuration
Step-by-step guide to creating budgets, configuring velocity / session / loop limits, choosing enforcement policies, and using finalization reserves.
Set spending ceilings to prevent cost overruns from runaway agents.
How budgets work
A budget is a spending ceiling attached to a user account, an API key, a tag, or a customer. When the estimated cost of a request would push cumulative spend over the budget, the proxy blocks the request with a 429 Too Many Requests response (under the default strict_block policy).
Budget enforcement happens before the request is forwarded to the LLM provider. The proxy checks the current spend against the budget atomically using a Cloudflare Durable Object (single-threaded, no race conditions even under concurrent load).
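To see why a single-threaded check avoids races, here is a minimal in-memory sketch. It is an illustration only, not NullSpend's Durable Object code: the `BudgetLedger` class and its method names are hypothetical. The check and the spend update happen in one synchronous step, so no other request can run between them.

```typescript
// Hypothetical in-memory model of an atomic budget check (illustration only).
class BudgetLedger {
  private spentMicrodollars = 0;
  private readonly limitMicrodollars: number;

  constructor(limitMicrodollars: number) {
    this.limitMicrodollars = limitMicrodollars;
  }

  // Check and reserve in one synchronous step: returns true if the
  // estimated cost fits, false if it would push spend over the limit.
  tryReserve(estimatedMicrodollars: number): boolean {
    if (this.spentMicrodollars + estimatedMicrodollars > this.limitMicrodollars) {
      return false; // this is where a proxy would answer 429
    }
    this.spentMicrodollars += estimatedMicrodollars;
    return true;
  }

  get remainingMicrodollars(): number {
    return this.limitMicrodollars - this.spentMicrodollars;
  }
}

const ledger = new BudgetLedger(50_000_000); // $50.00 expressed in microdollars
const first = ledger.tryReserve(30_000_000); // fits within the budget
const second = ledger.tryReserve(30_000_000); // would exceed the budget, rejected
```

The second call is rejected without changing the recorded spend, which mirrors the rule that a blocked request is never forwarded.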
Creating a budget
- Open the NullSpend dashboard
- Click Set Budget
- Configure:
  - Budget for: your account, a specific API key, a tag (e.g., env=production), or a customer
  - Budget limit: spending ceiling in dollars (e.g., $50.00)
  - Reset interval: none (manual reset), daily, weekly, or monthly
- Optionally configure advanced guardrails (expand each section):
  - Velocity limit: max spend per sliding time window (triggers a cooldown if exceeded)
  - Alert thresholds: custom percentage thresholds for webhook alerts
  - Session limit: max spend per agent session (see below)
- Click Set Budget
The budget takes effect immediately. No proxy restart or redeployment needed.
Budget enforcement behavior
When a request would exceed the budget:
- The proxy returns HTTP 429 Too Many Requests
- The response body includes a machine-readable error:
{
  "error": {
    "code": "budget_exceeded",
    "message": "Request blocked: estimated cost exceeds remaining budget",
    "details": null
  }
}
- Your application receives a standard HTTP error. Handle it like any other rate limit or quota error
The blocked request is never forwarded to the LLM provider. You are not charged by OpenAI/Anthropic for blocked requests.
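On the client side, a budget denial can be told apart from other 429 responses by its error code. The following is a minimal sketch, assuming the parsed JSON body has the shape shown above; the helper name `isBudgetExceeded` and the `NullSpendErrorBody` type are hypothetical and not part of any SDK.

```typescript
// Shape of the error body shown above; `details` may be null.
interface NullSpendErrorBody {
  error: { code: string; message: string; details: unknown };
}

// Returns true when a response is a budget denial rather than some other 429.
function isBudgetExceeded(status: number, body: unknown): boolean {
  if (status !== 429 || typeof body !== "object" || body === null) return false;
  const err = (body as Partial<NullSpendErrorBody>).error;
  return typeof err === "object" && err !== null && err.code === "budget_exceeded";
}

const denial = {
  error: {
    code: "budget_exceeded",
    message: "Request blocked: estimated cost exceeds remaining budget",
    details: null,
  },
};
const blocked = isBudgetExceeded(429, denial);
const otherLimit = isBudgetExceeded(429, { error: { code: "velocity_exceeded", message: "", details: null } });
```

Checking the code instead of only the status lets an application stop retrying on a spent budget while still backing off on other limits.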
Enforcement policies
Each budget has a policy that controls what happens when the limit is exceeded:
| Policy | Behavior | Use case |
|---|---|---|
| strict_block (default) | Blocks the request with 429. The request is never forwarded to the provider. | Production cost control. You want hard spending caps. |
| soft_block | Logs the overage and allows the request through. Cost events are tagged with budget status denied. | Transitioning from tracking to enforcement. See the impact before turning on hard blocks. |
| warn | Tracks spend but never blocks. Threshold webhooks still fire. | Monitoring-only mode. Useful during initial rollout or for customer budgets where you want visibility without disruption. |
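The table can be read as a small decision function. The sketch below is a hedged restatement of the three rows, not the proxy's implementation; the `Decision` type and the `decideOverLimit` function are hypothetical names introduced for illustration.

```typescript
type Policy = "strict_block" | "soft_block" | "warn";

interface Decision {
  forward: boolean; // is the request sent on to the provider?
  status?: number; // HTTP status returned when the request is blocked
  budgetStatus?: "denied"; // tag on the cost event (soft_block only)
}

// What happens to a request that would exceed the budget, per policy.
function decideOverLimit(policy: Policy): Decision {
  switch (policy) {
    case "strict_block":
      return { forward: false, status: 429 };
    case "soft_block":
      return { forward: true, budgetStatus: "denied" };
    case "warn":
      return { forward: true };
  }
}

const strict = decideOverLimit("strict_block");
const soft = decideOverLimit("soft_block");
const warn = decideOverLimit("warn");
```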
Set the policy when creating a budget via the dashboard or API:
{
"entityType": "customer",
"entityId": "acme-corp",
"maxBudgetMicrodollars": 50000000,
"policy": "warn"
}
Session limits and velocity limits always enforce strict_block regardless of the budget policy.
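The JSON payload above expresses the limit in microdollars (1 dollar = 1,000,000 microdollars). A small helper can build it from a dollar amount. This is a sketch under that assumption: the field names come from the example above, while `dollarsToMicrodollars` and `buildBudgetPayload` are hypothetical helpers, and `entityType` is typed as a plain string because only "customer" appears in this guide.

```typescript
type Policy = "strict_block" | "soft_block" | "warn";

interface BudgetPayload {
  entityType: string;
  entityId: string;
  maxBudgetMicrodollars: number;
  policy: Policy;
}

// Math.round guards against floating-point drift in the multiplication.
function dollarsToMicrodollars(dollars: number): number {
  return Math.round(dollars * 1_000_000);
}

function buildBudgetPayload(
  entityType: string,
  entityId: string,
  limitDollars: number,
  policy: Policy = "strict_block",
): BudgetPayload {
  return {
    entityType,
    entityId,
    maxBudgetMicrodollars: dollarsToMicrodollars(limitDollars),
    policy,
  };
}

const payload = buildBudgetPayload("customer", "acme-corp", 50, "warn");
```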
Velocity limits
Velocity limits catch runaway loops — an agent stuck in a retry cycle can burn through a budget in seconds. When spend in a sliding window exceeds the velocity limit, the proxy trips a circuit breaker and blocks requests for a cooldown period.
Configure in the budget dialog:
- Velocity limit — dollar amount per window (e.g., $10)
- Window — sliding window in seconds (10-3600, default 60)
- Cooldown — block duration after tripping (10-3600, default 60)
Velocity denial returns 429 with "code": "velocity_exceeded" and a Retry-After header indicating when the cooldown expires.
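The interaction between the window, the limit, and the cooldown can be modeled in a few lines. This is an illustrative model only, not the proxy's code: the `VelocityBreaker` class is hypothetical, and it assumes the breaker trips when a request would push the window's spend over the limit.

```typescript
interface SpendEvent {
  atMs: number;
  microdollars: number;
}

// Illustrative sliding-window circuit breaker (not NullSpend's implementation).
class VelocityBreaker {
  private events: SpendEvent[] = [];
  private blockedUntilMs = 0;
  private readonly limitMicrodollars: number;
  private readonly windowSeconds: number;
  private readonly cooldownSeconds: number;

  constructor(limitMicrodollars: number, windowSeconds = 60, cooldownSeconds = 60) {
    this.limitMicrodollars = limitMicrodollars;
    this.windowSeconds = windowSeconds;
    this.cooldownSeconds = cooldownSeconds;
  }

  // Returns 0 if the request is allowed, otherwise the number of seconds
  // to wait (the value a Retry-After header would carry).
  check(nowMs: number, costMicrodollars: number): number {
    if (nowMs < this.blockedUntilMs) {
      return Math.ceil((this.blockedUntilMs - nowMs) / 1000);
    }
    // Drop events that have slid out of the window, then total the rest.
    const windowStart = nowMs - this.windowSeconds * 1000;
    this.events = this.events.filter((e) => e.atMs > windowStart);
    const windowSpend = this.events.reduce((sum, e) => sum + e.microdollars, 0);
    if (windowSpend + costMicrodollars > this.limitMicrodollars) {
      this.blockedUntilMs = nowMs + this.cooldownSeconds * 1000; // trip the breaker
      return this.cooldownSeconds;
    }
    this.events.push({ atMs: nowMs, microdollars: costMicrodollars });
    return 0;
  }
}

const breaker = new VelocityBreaker(10_000_000); // $10 per 60 s, 60 s cooldown
const allowed = breaker.check(0, 6_000_000);
const tripped = breaker.check(1_000, 6_000_000);
const duringCooldown = breaker.check(31_000, 1);
const afterCooldown = breaker.check(61_000, 1_000_000);
```

In this model the second request trips the breaker, requests during the cooldown are told how long to wait, and traffic resumes once both the cooldown and the window have passed.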
Session limits
Session limits cap how much a single agent session can spend, regardless of the overall budget. This prevents a single long-running agent task from consuming the entire budget.
How it works
- Your agent sets a session ID via the X-NullSpend-Session header on each request
- The proxy tracks cumulative spend per session ID per budget entity
- When a session's spend would exceed the session limit, the request is blocked
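Attaching the session ID is a matter of adding one header to every request in a task. A minimal sketch follows; the header name comes from the steps above, while the `withSession` helper is a hypothetical convenience and not an SDK function.

```typescript
const SESSION_HEADER = "X-NullSpend-Session";

// Returns a copy of `headers` with the session header set, leaving the
// caller's object untouched.
function withSession(
  headers: Record<string, string>,
  sessionId: string,
): Record<string, string> {
  return { ...headers, [SESSION_HEADER]: sessionId };
}

const baseHeaders = { "Content-Type": "application/json" };
const taskHeaders = withSession(baseHeaders, "agent-task-abc");
```

Reusing the same ID for every request in a task is what lets the proxy attribute spend to that task.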
Configuration
Set the session limit in the budget dialog under "Session limit (optional)". Enter a dollar amount (e.g., $5.00).
Session denial response
{
"error": {
"code": "session_limit_exceeded",
"message": "Request blocked: session spend exceeds session limit. Start a new session.",
"details": {
"session_id": "agent-task-abc",
"session_spend_microdollars": 4800000,
"session_limit_microdollars": 5000000
}
}
}
No Retry-After header is sent because the session is done. The agent should start a new session (new session ID) to continue.
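An agent can react to this denial by switching to a fresh session ID instead of retrying. The sketch below assumes the parsed body has the shape shown above; `sessionAfterResponse` and the ID generator passed to it are hypothetical.

```typescript
interface ErrorBody {
  error?: { code?: string };
}

// Returns the session ID to use for the next request: a new one if the
// session limit was hit, otherwise the current one.
function sessionAfterResponse(
  currentId: string,
  status: number,
  body: ErrorBody | null,
  newId: () => string,
): string {
  const hitLimit = status === 429 && body?.error?.code === "session_limit_exceeded";
  return hitLimit ? newId() : currentId;
}

const makeId = () => "agent-task-def"; // stand-in for a real ID generator
const rotated = sessionAfterResponse(
  "agent-task-abc",
  429,
  { error: { code: "session_limit_exceeded" } },
  makeId,
);
const unchanged = sessionAfterResponse("agent-task-abc", 200, null, makeId);
```

Whether starting a new session is appropriate depends on the task; a stuck agent that rotates IDs automatically would defeat the purpose of the cap.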
Key behaviors
- No session header = no enforcement. Session limits only apply when the X-NullSpend-Session header is present.
- Sessions are client-defined. The proxy does not manage session lifecycle. Your agent decides when to start a new session by sending a new session ID.
- Independent of budget resets. Session spend does NOT reset when the budget period resets. A session spans calendar boundaries.
- Always strict block. Session limits are hard caps regardless of the budget policy (strict_block, soft_block, or warn).
- 24-hour cleanup. Stale session data is automatically cleaned up after 24 hours of inactivity.
Webhook event
When a session limit is exceeded, a session.limit_exceeded webhook event is dispatched (if webhooks are configured).
Finalization reserve
Without a reserve, an agent that hits its budget limit is cut off mid-task. A finalization reserve holds back a portion of the budget so agents can finish gracefully.
Setting a finalization reserve
- In the budget dialog, expand Finalization reserve
- Enter the dollar amount to hold back (e.g., $5.00 on a $100 budget)
- Click Set Budget
The budget list shows a two-zone progress bar when a reserve is active: green for the normal zone, amber for the reserve.
How agents use it
Your agent watches the response headers:
const response = await trackedFetch("https://proxy.nullspend.dev/v1/chat/completions", {
  // ... normal request
});
// Missing headers fall back to 0; radix 10 makes the parse explicit.
const remaining = parseInt(response.headers.get("X-NullSpend-Budget-Effective-Remaining") || "0", 10);
const reserve = parseInt(response.headers.get("X-NullSpend-Budget-Finalization-Reserve") || "0", 10);
if (remaining <= 0 && reserve > 0) {
  // We're in the reserve zone: do one final cleanup request
  const finalResponse = await trackedFetch("https://proxy.nullspend.dev/v1/chat/completions", {
    headers: { "X-NullSpend-Finalize": "1" },
    // ... cleanup request
  });
}
Or with the SDK:
const finalFetch = ns.createTrackedFetch("openai", { finalize: true });
Recommendations
- Set the reserve to the cost of 1-2 cleanup requests (typically $1-5 for GPT-4o)
- Use the X-NullSpend-Budget-Requests-Remaining header to know when to start winding down
- The reserve only unlocks when the agent has actually reached the zone. Setting finalize early has no effect
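The header check can be factored into a small pure function so it is easy to test. This is a sketch, assuming both headers carry integer values as in the snippet above; the `budgetZone` function, the `Zone` labels, and the `HeaderGetter` type are hypothetical.

```typescript
type HeaderGetter = (name: string) => string | null;
type Zone = "normal" | "reserve" | "exhausted";

// Parses an integer header; missing or malformed values count as 0.
function readIntHeader(get: HeaderGetter, name: string): number {
  const parsed = Number.parseInt(get(name) ?? "", 10);
  return Number.isNaN(parsed) ? 0 : parsed;
}

// "reserve" means the normal budget is used up but a finalization
// reserve is configured, so one cleanup request may still go through.
function budgetZone(get: HeaderGetter): Zone {
  const remaining = readIntHeader(get, "X-NullSpend-Budget-Effective-Remaining");
  const reserve = readIntHeader(get, "X-NullSpend-Budget-Finalization-Reserve");
  if (remaining > 0) return "normal";
  return reserve > 0 ? "reserve" : "exhausted";
}

// Test helper: build a getter from a plain object.
const fromMap = (m: Record<string, string>): HeaderGetter => (name) => m[name] ?? null;

const inReserve = budgetZone(
  fromMap({
    "X-NullSpend-Budget-Effective-Remaining": "0",
    "X-NullSpend-Budget-Finalization-Reserve": "5000000",
  }),
);
```

With a fetch Response, the getter would be `(name) => response.headers.get(name)`.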
Budget tracking
Budget spend is tracked in real-time using a Cloudflare Durable Object with embedded SQLite. This means:
- Spend updates are instantaneous (no batch processing delay)
- Concurrent requests are serialized (no race conditions)
- Budget state is durable and survives proxy restarts
Viewing budget status
In the dashboard, go to Budgets to see:
- Current spend vs. ceiling for each budget
- Percentage utilization with color-coded health indicators
- Velocity limit indicator (lightning icon) and session limit indicator (clock icon)
- Reset interval and days remaining
Best practices
Start generous, tighten later
Set your initial budget higher than you expect to need. Once you have a few days of cost data in the analytics dashboard, you can tighten the budget to a reasonable ceiling with confidence.
One budget per concern
Create separate API keys (and budgets) for different agents, environments, or teams:
- production-agent-alpha: $200/month
- production-agent-beta: $100/month
- staging: $20/month
- development: $5/month
Use session limits for long-running agents
If your agents run multi-step tasks (research, code generation, data analysis), set a session limit to cap each task's cost. This prevents a single stuck agent from consuming the entire budget while still allowing other tasks to proceed.
Monitor before enforcing
Use the analytics dashboard to understand your spending patterns before setting tight budgets. The daily spend chart and model breakdown help you identify which models and agents drive the most cost.
Pricing tier limits
| Tier | Budgets | API Keys | Webhooks | Team Members |
|---|---|---|---|---|
| Free | 3 | 10 | 2 | Unlimited |
| Pro ($49/mo) | Unlimited | Unlimited | Unlimited | Unlimited |
| Scale ($199/mo) | Unlimited | Unlimited | Unlimited | Unlimited |
| Enterprise | Unlimited | Unlimited | Unlimited | Unlimited |
The Free tier includes 3 budgets — enough for separate production, staging, and development keys. Upgrade to Pro for unlimited budgets across all keys and environments. See the pricing page for the full feature matrix.
API Reference
For programmatic budget management, see the Budgets API. Key endpoints:
- POST /api/budgets: create or update a budget (session auth)
- GET /api/budgets/status: check remaining balance (API key auth)