
1. Use environment variables

export NINJACHAT_API_KEY="nj_sk_YOUR_API_KEY"
import os
HEADERS = {"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"}
Never hardcode keys. Add .env to .gitignore.
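A missing variable should fail loudly at startup, not as a cryptic KeyError mid-request. A minimal sketch (the `build_headers` helper name is ours, not part of the API):

```python
import os

def build_headers() -> dict:
    # Fail fast with a clear message if the key was never exported
    key = os.environ.get("NINJACHAT_API_KEY")
    if not key:
        raise RuntimeError("NINJACHAT_API_KEY is not set; export it or load your .env first")
    return {"Authorization": f"Bearer {key}"}
```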

2. Pick the right model for the job

Don’t use a $0.015 model for simple tasks:
import requests

def smart_chat(task: str, message: str) -> str:
    models = {
        "classify": "deepseek-v3",        # $0.003 — simple tasks
        "chat":     "gpt-5",              # $0.006 — general use
        "code":     "claude-sonnet-4.6",  # $0.015 — when quality matters
        "fast":     "gemini-2.5-flash",   # $0.003 — lowest latency
    }
    r = requests.post("https://ninjachat.ai/api/v1/chat",
        headers=HEADERS,
        json={"model": models[task], "messages": [{"role": "user", "content": message}]}
    )
    r.raise_for_status()
    return r.json()["message"]["content"]
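If you want a rough pre-flight estimate before committing to an expensive model, you can sketch one. Two loud assumptions here: that the prices above are per 1K tokens, and that ~4 characters per token is a usable rule of thumb; adjust both to your actual pricing.

```python
# Assumed per-1K-token prices, mirroring the table above
PRICES_PER_1K = {
    "deepseek-v3": 0.003,
    "gpt-5": 0.006,
    "claude-sonnet-4.6": 0.015,
    "gemini-2.5-flash": 0.003,
}

def estimate_cost(model: str, text: str) -> float:
    tokens = max(1, len(text) // 4)  # crude character-based token estimate
    return PRICES_PER_1K[model] * tokens / 1000
```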

3. Use streaming for chat UIs

Users hate waiting for a full response. Stream it:
import json

import requests

r = requests.post("https://ninjachat.ai/api/v1/chat",
    headers=HEADERS,
    json={
        "model": "gpt-5",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)
for raw in r.iter_lines():
    if not raw:
        continue  # skip SSE keep-alive blank lines
    line = raw.decode()
    if not line.startswith("data: ") or line == "data: [DONE]":
        continue
    token = json.loads(line[6:]).get("choices", [{}])[0].get("delta", {}).get("content") or ""
    print(token, end="", flush=True)
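Pulling the parsing out of the loop makes it testable without a live stream. A small sketch, assuming the OpenAI-style `choices[0].delta.content` event shape shown above (the `extract_token` name is ours):

```python
import json

def extract_token(raw: bytes) -> str:
    """Return the text delta from one SSE line; '' for keep-alives and [DONE]."""
    line = raw.decode()
    if not line.startswith("data: ") or line == "data: [DONE]":
        return ""
    payload = json.loads(line[len("data: "):])
    return payload.get("choices", [{}])[0].get("delta", {}).get("content") or ""
```

Then the streaming loop body reduces to `print(extract_token(raw), end="", flush=True)`.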

4. Retry on 429s and 5xx

import time

import requests

def reliable_call(url, headers, payload, retries=3):
    for i in range(retries):
        r = requests.post(url, headers=headers, json=payload)
        if r.status_code == 200:
            return r.json()
        if r.status_code in (429, 500, 502, 503):
            # Honor Retry-After when the server sends it, else back off exponentially
            time.sleep(int(r.headers.get("Retry-After", 2 ** i)))
            continue
        r.raise_for_status()  # other 4xx errors are not retryable
    raise RuntimeError(f"reliable_call: gave up after {retries} attempts")
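If many clients retry on the same schedule they stampede the server in sync. A common refinement, not specific to this API, is "full jitter" backoff; the helper below is a sketch you could swap in for the fixed `2 ** i` delay:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Full jitter: sleep a random amount up to the capped exponential,
    # so concurrent clients spread their retries out
    return random.uniform(0, min(cap, base * 2 ** attempt))
```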

5. Monitor costs

Every response tells you what it cost. Log it:
data = r.json()
print(f"Model: {data['model']}, Cost: {data['cost']['this_request']}, Latency: {data['metadata']['latency_ms']}ms")

# React to low-balance warnings before hitting $0
if "balance_warning" in data:
    print(f"⚠️ {data['balance_warning']['threshold']}")
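To act on costs rather than just print them, you can keep a running total per process. A hypothetical helper (the `SpendTracker` class is ours), assuming the `cost.this_request` field shown above:

```python
class SpendTracker:
    """Accumulate per-request cost and flag when a budget is reached."""

    def __init__(self, budget: float):
        self.budget = budget
        self.total = 0.0

    def record(self, data: dict) -> bool:
        # Returns True once cumulative spend reaches the budget
        self.total += float(data["cost"]["this_request"])
        return self.total >= self.budget
```

Call `tracker.record(r.json())` after each request and stop, downgrade models, or alert when it returns True.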
Check your balance and usage programmatically:
# Current balance
r = requests.get("https://ninjachat.ai/api/api-keys", headers=HEADERS)
print(r.json()["billing"]["balanceFormatted"])

# Per-request logs (filterable by model, date, status)
r = requests.get("https://ninjachat.ai/api/api-keys/logs",
    headers=HEADERS, params={"group": "chat", "limit": 50})
for log in r.json()["logs"]:
    print(f"{log['model']} — {log['latencyMs']}ms — {log['statusCode']}")

# Export usage as CSV for accounting
r = requests.get("https://ninjachat.ai/api/api-keys/logs?format=csv&from=2026-03-01",
    headers=HEADERS)
with open("usage.csv", "w") as f:
    f.write(r.text)

# Purchase history
r = requests.get("https://ninjachat.ai/api/api-keys/transactions", headers=HEADERS)
for t in r.json()["transactions"]:
    print(f"{t['timestamp']} — {t['amount']}")

6. Proxy for browser apps

Never put your API key in frontend code:
# Your backend holds the API key; the browser never sees it
from flask import Flask, request

app = Flask(__name__)

@app.post("/api/chat")
def proxy():
    return requests.post("https://ninjachat.ai/api/v1/chat",
        headers=HEADERS,  # Server-side only
        json={"model": "gpt-5", "messages": request.json["messages"]}
    ).json()
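A proxy that forwards browser input verbatim also forwards abuse. A minimal sketch of server-side validation before forwarding (the `sanitize_messages` helper and the allowed-role set are our assumptions, not part of the API):

```python
ALLOWED_ROLES = {"user", "assistant", "system"}

def sanitize_messages(messages) -> list:
    """Keep only well-formed messages: a known role and a string content."""
    clean = []
    for m in messages if isinstance(messages, list) else []:
        if (isinstance(m, dict)
                and m.get("role") in ALLOWED_ROLES
                and isinstance(m.get("content"), str)):
            clean.append({"role": m["role"], "content": m["content"]})
    return clean
```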

7. Use separate keys per environment

  • dev-key for development
  • prod-key for production
  • Revoke one without affecting the other
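One way to wire this up is a single lookup keyed on the deployment environment. The environment-variable names below (`APP_ENV`, `NINJACHAT_API_KEY_PROD`, `NINJACHAT_API_KEY_DEV`) are a convention we are assuming, not something the API mandates:

```python
import os

def api_key_for_env() -> str:
    # Pick the dev or prod key based on an APP_ENV deployment flag
    env = os.environ.get("APP_ENV", "dev")
    var = "NINJACHAT_API_KEY_PROD" if env == "production" else "NINJACHAT_API_KEY_DEV"
    return os.environ[var]
```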