Gemini Quickstart
Get cost tracking for your Google Gemini calls in under 2 minutes.
Prerequisites
- A NullSpend account (sign up)
- An existing app that calls the Google Gemini API
- A Google AI Studio API key (get one)
Step 1: Create an API Key
- Log in to the NullSpend dashboard
- Go to Settings → Create API Key
- Copy the key (starts with ns_live_sk_); you won't see it again
Step 2: Point Your Requests at the Proxy
NullSpend supports Google's native Gemini REST API. Change the base URL from generativelanguage.googleapis.com to proxy.nullspend.dev and add your NullSpend key. No SDK wrapper needed, no body format changes.
cURL
# Non-streaming
curl "https://proxy.nullspend.dev/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GOOGLE_API_KEY" \
  -H "X-NullSpend-Key: $NULLSPEND_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Hello, what can you do?"}]}],
    "generationConfig": {"maxOutputTokens": 200}
  }'
# Streaming
curl "https://proxy.nullspend.dev/v1beta/models/gemini-2.5-flash:streamGenerateContent" \
  -H "x-goog-api-key: $GOOGLE_API_KEY" \
  -H "X-NullSpend-Key: $NULLSPEND_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "contents": [{"parts": [{"text": "Count from 1 to 10"}]}],
    "generationConfig": {"maxOutputTokens": 200}
  }'
TypeScript (fetch)
const response = await fetch(
  "https://proxy.nullspend.dev/v1beta/models/gemini-2.5-flash:generateContent",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": process.env.GOOGLE_API_KEY!,
      "X-NullSpend-Key": process.env.NULLSPEND_API_KEY!,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: "Hello" }] }],
    }),
  },
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
TypeScript (Google GenAI SDK)
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({
  apiKey: process.env.GOOGLE_API_KEY!,
  httpOptions: {
    baseUrl: "https://proxy.nullspend.dev/v1beta",
    headers: {
      "X-NullSpend-Key": process.env.NULLSPEND_API_KEY!,
    },
  },
});
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Hello, what can you do?",
});
console.log(response.text);
Python (requests)
import os
import requests
response = requests.post(
    "https://proxy.nullspend.dev/v1beta/models/gemini-2.5-flash:generateContent",
    headers={
        "Content-Type": "application/json",
        "x-goog-api-key": os.environ["GOOGLE_API_KEY"],
        "X-NullSpend-Key": os.environ["NULLSPEND_API_KEY"],
    },
    json={
        "contents": [{"parts": [{"text": "Hello"}]}],
    },
)
print(response.json()["candidates"][0]["content"]["parts"][0]["text"])
Python (Google GenAI SDK)
Note: The Python GenAI SDK currently only supports a custom base_url with vertexai=True. For Gemini API (non-Vertex) proxying, use the requests example above or set the HTTPS_PROXY environment variable.
Note on maxOutputTokens: Gemini 2.5 models are "thinking" models that use part of the output token budget for internal reasoning. If you set maxOutputTokens too low (e.g., 20), the model may spend all tokens on thinking and return no visible output. Use at least 200 for short responses.
Step 3: Check the Dashboard
Open the NullSpend dashboard. Cost events appear within seconds. You'll see:
- Daily spend chart: cost over time
- Model breakdown: Gemini models with provider: google
- Thinking tokens: tagged as _ns_thinking_tokens for Gemini 2.5 models
What Gets Tracked
| Field | Source |
|---|---|
| Input tokens | usageMetadata.promptTokenCount |
| Output tokens | usageMetadata.candidatesTokenCount |
| Cached tokens | usageMetadata.cachedContentTokenCount |
| Thinking tokens | usageMetadata.thoughtsTokenCount (tag: _ns_thinking_tokens) |
| Google response ID | responseId (tag: _ns_google_response_id) |
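To compare your own numbers against the dashboard, the same fields can be read from a parsed response body. The following is a minimal sketch, not NullSpend's implementation: the function name `extract_usage` and the output key names are hypothetical, and missing counters are assumed to mean zero.

```python
from typing import Any


def extract_usage(response_body: dict[str, Any]) -> dict[str, Any]:
    """Read the tracked fields from a parsed generateContent response.

    The output key names are illustrative; only the source field names
    (promptTokenCount, responseId, ...) come from the table above.
    """
    usage = response_body.get("usageMetadata", {})
    return {
        "input_tokens": usage.get("promptTokenCount", 0),
        "output_tokens": usage.get("candidatesTokenCount", 0),
        "cached_tokens": usage.get("cachedContentTokenCount", 0),
        "thinking_tokens": usage.get("thoughtsTokenCount", 0),
        "google_response_id": response_body.get("responseId"),
    }
```

With the requests example from Step 2, this would be called as `extract_usage(response.json())`.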
How It Differs from OpenAI/Anthropic
| Aspect | OpenAI/Anthropic | Gemini |
|---|---|---|
| Model location | In request body ("model": "gpt-4o") | In URL path (/models/gemini-2.5-flash:generateContent) |
| Streaming | Body field "stream": true | Different endpoint (:streamGenerateContent) |
| Auth header | Authorization: Bearer | x-goog-api-key (or Authorization: Bearer) |
| SSE format | Delta-based (partial chunks) | Complete response per event |
| Request body | NullSpend convention | Native Gemini format (passthrough, no transformation) |
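Because the model and the streaming choice both live in the URL rather than the body, client code usually ends up building the endpoint string itself. A small sketch of that, assuming the proxy host and `/v1beta` prefix shown in Step 2 (the helper name `gemini_url` is hypothetical):

```python
PROXY_BASE = "https://proxy.nullspend.dev/v1beta"


def gemini_url(model: str, stream: bool = False) -> str:
    """Build the proxy endpoint: model and streaming are selected in the path."""
    method = "streamGenerateContent" if stream else "generateContent"
    return f"{PROXY_BASE}/models/{model}:{method}"
```

The request body stays identical for both variants; only the method suffix after the colon changes.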
What's Next
- Set a budget: the proxy blocks Gemini requests with 429 when the budget ceiling is hit.
- Add tags: attribute Gemini costs to teams or features with the X-NullSpend-Tags header.
- Configure webhooks: get notified on cost events and budget thresholds.
- OpenAI too? See the OpenAI Quickstart.
- Anthropic? See the Anthropic Quickstart.
Troubleshooting
401 Unauthorized
Your X-NullSpend-Key header is missing or invalid. Your Google API key (x-goog-api-key) is a separate credential that the proxy forwards to Google unchanged.
404 Not Found
Check the URL path. Gemini endpoints must match /v1beta/models/{model}:generateContent or :streamGenerateContent exactly. Other Gemini methods (e.g., :countTokens, :embedContent) are not yet supported.
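A quick local check can catch an unsupported path before the request is sent. This is only a sketch of the rule as described above; the proxy's real matching logic is not documented here, and the pattern assumes model names contain no `/` or `:` characters. The function name is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumed shape: /v1beta/models/{model}:generateContent or :streamGenerateContent
_SUPPORTED_PATH = re.compile(
    r"/v1beta/models/[^/:]+:(?:generateContent|streamGenerateContent)"
)


def is_supported_gemini_url(url: str) -> bool:
    """Return True if the URL path matches one of the two supported methods."""
    path = urlsplit(url).path  # drops any query string such as ?alt=sse
    return _SUPPORTED_PATH.fullmatch(path) is not None
```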
429 Too Many Requests
Either a NullSpend budget was exceeded (error.code is budget_exceeded or velocity_exceeded) or you hit the rate limit. Google-side 429s (quota exceeded) pass through with the original error message.
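Telling these cases apart in code might look like the sketch below. It assumes the 429 body is JSON with a top-level `error` object whose `code` holds the NullSpend codes named above, and that Google's pass-through errors carry `status: "RESOURCE_EXHAUSTED"`; the exact body shapes are assumptions, and the function name and return labels are hypothetical.

```python
NULLSPEND_BUDGET_CODES = {"budget_exceeded", "velocity_exceeded"}


def classify_429(body: dict) -> str:
    """Classify a parsed 429 response body (body shape is an assumption)."""
    error = body.get("error")
    if not isinstance(error, dict):
        return "other"
    if error.get("code") in NULLSPEND_BUDGET_CODES:
        return "nullspend_budget"  # do not retry until the budget changes
    if error.get("status") == "RESOURCE_EXHAUSTED":
        return "google_quota"  # passed through from Google
    return "other"  # e.g. a rate limit; retrying with backoff may help
```

Budget errors are worth separating because retrying them cannot succeed, whereas rate-limit and quota errors often clear after a delay.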
Empty response (no visible output)
Gemini 2.5 models use thinking tokens from the output budget. Increase maxOutputTokens in generationConfig (try 500+). Check usageMetadata.thoughtsTokenCount to see how many tokens went to thinking.
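The check described above can be automated. The sketch below assumes the response shape used in the Step 2 examples (`candidates[].content.parts[].text`) and treats a candidate with no `parts` as having no visible output; both function names are hypothetical.

```python
from typing import Optional


def visible_text(body: dict) -> str:
    """Concatenate the text parts of the first candidate, if any."""
    candidates = body.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


def diagnose_empty(body: dict) -> Optional[str]:
    """Return a hint when the response has no visible text, else None."""
    if visible_text(body):
        return None
    thinking = body.get("usageMetadata", {}).get("thoughtsTokenCount", 0)
    return (
        f"No visible output; {thinking} tokens went to thinking. "
        "Try a larger maxOutputTokens."
    )
```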
Streaming returns JSON array instead of SSE
The proxy automatically appends ?alt=sse to streaming requests. If you're hitting Google directly (not through the proxy), you need to add ?alt=sse to the URL yourself.
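Once the stream arrives as SSE, each event carries one complete JSON response object on a `data:` line. A minimal parsing sketch, assuming single-line `data:` payloads (multi-line SSE events are not handled) and a hypothetical helper name:

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per SSE `data:` line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)
```

With requests, the lines could come from `requests.post(url, ..., stream=True)` followed by `response.iter_lines(decode_unicode=True)`; concatenating the text of each event's first candidate rebuilds the full reply.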