
1. Use environment variables

export NINJACHAT_API_KEY="nj_sk_YOUR_API_KEY"
import os
HEADERS = {"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"}
Never hardcode keys. Add .env to .gitignore.
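A missing variable should fail loudly at startup, not as a cryptic KeyError mid-request. A minimal sketch (the `build_headers` helper name is ours, not part of the API):

```python
import os

def build_headers() -> dict:
    # Fail fast with a clear message if the key was never exported
    key = os.environ.get("NINJACHAT_API_KEY")
    if not key:
        raise RuntimeError("NINJACHAT_API_KEY is not set; export it or load your .env first")
    return {"Authorization": f"Bearer {key}"}
```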

2. Pick the right model for the job

Don’t use a $0.015 model for simple tasks:
import requests

def smart_chat(task: str, message: str) -> str:
    models = {
        "classify": "deepseek-v3",        # $0.003 — simple tasks
        "chat":     "gpt-5",              # $0.006 — general use
        "code":     "claude-sonnet-4.6",  # $0.015 — when quality matters
        "fast":     "gemini-2.5-flash",   # $0.003 — lowest latency
    }
    r = requests.post("https://ninjachat.ai/api/v1/chat",
        headers=HEADERS,
        json={"model": models[task], "messages": [{"role": "user", "content": message}]}
    )
    r.raise_for_status()
    return r.json()["message"]["content"]
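If you want a rough pre-flight estimate before committing to an expensive model, you can sketch one. Two loud assumptions here: that the prices above are per 1K tokens, and that ~4 characters per token is a usable rule of thumb; adjust both to your actual pricing.

```python
# Assumed per-1K-token prices, mirroring the table above
PRICES_PER_1K = {
    "deepseek-v3": 0.003,
    "gpt-5": 0.006,
    "claude-sonnet-4.6": 0.015,
    "gemini-2.5-flash": 0.003,
}

def estimate_cost(model: str, text: str) -> float:
    tokens = max(1, len(text) // 4)  # crude character-based token estimate
    return PRICES_PER_1K[model] * tokens / 1000
```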

3. Use streaming for chat UIs

Users hate waiting for a full response. Stream it:
import json

import requests

r = requests.post("https://ninjachat.ai/api/v1/chat",
    headers=HEADERS,
    json={
        "model": "gpt-5",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)
for raw in r.iter_lines():
    if not raw:
        continue  # skip SSE keep-alive blank lines
    line = raw.decode()
    if not line.startswith("data: ") or line == "data: [DONE]":
        continue
    token = json.loads(line[6:]).get("choices", [{}])[0].get("delta", {}).get("content") or ""
    print(token, end="", flush=True)
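Pulling the parsing out of the loop makes it testable without a live stream. A small sketch, assuming the OpenAI-style `choices[0].delta.content` event shape shown above (the `extract_token` name is ours):

```python
import json

def extract_token(raw: bytes) -> str:
    """Return the text delta from one SSE line; '' for keep-alives and [DONE]."""
    line = raw.decode()
    if not line.startswith("data: ") or line == "data: [DONE]":
        return ""
    payload = json.loads(line[len("data: "):])
    return payload.get("choices", [{}])[0].get("delta", {}).get("content") or ""
```

Then the streaming loop body reduces to `print(extract_token(raw), end="", flush=True)`.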

4. Retry on 429s and 5xx

import time

import requests

def reliable_call(url, headers, payload, retries=3):
    for i in range(retries):
        r = requests.post(url, headers=headers, json=payload)
        if r.status_code == 200:
            return r.json()
        if r.status_code in (429, 500, 502, 503):
            # Honor Retry-After when the server sends it, else back off exponentially
            time.sleep(int(r.headers.get("Retry-After", 2 ** i)))
            continue
        r.raise_for_status()  # other 4xx errors are not retryable
    raise RuntimeError(f"reliable_call: gave up after {retries} attempts")
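If many clients retry on the same schedule they stampede the server in sync. A common refinement, not specific to this API, is "full jitter" backoff; the helper below is a sketch you could swap in for the fixed `2 ** i` delay:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Full jitter: sleep a random amount up to the capped exponential,
    # so concurrent clients spread their retries out
    return random.uniform(0, min(cap, base * 2 ** attempt))
```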

5. Monitor costs

Every response tells you what it cost. Log it:
data = r.json()
print(f"Model: {data['model']}, Cost: {data['cost']['this_request']}, Latency: {data['metadata']['latency_ms']}ms")

# React to low-balance warnings before hitting $0
if "balance_warning" in data:
    print(f"⚠️ {data['balance_warning']['threshold']}")
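To act on costs rather than just print them, you can keep a running total per process. A hypothetical helper (the `SpendTracker` class is ours), assuming the `cost.this_request` field shown above:

```python
class SpendTracker:
    """Accumulate per-request cost and flag when a budget is reached."""

    def __init__(self, budget: float):
        self.budget = budget
        self.total = 0.0

    def record(self, data: dict) -> bool:
        # Returns True once cumulative spend reaches the budget
        self.total += float(data["cost"]["this_request"])
        return self.total >= self.budget
```

Call `tracker.record(r.json())` after each request and stop, downgrade models, or alert when it returns True.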
Check your balance and usage programmatically:
# Current balance
r = requests.get("https://ninjachat.ai/api/api-keys", headers=HEADERS)
print(r.json()["billing"]["balanceFormatted"])

# Per-request logs (filterable by model, date, status)
r = requests.get("https://ninjachat.ai/api/api-keys/logs",
    headers=HEADERS, params={"group": "chat", "limit": 50})
for log in r.json()["logs"]:
    print(f"{log['model']} — {log['latencyMs']}ms — {log['statusCode']}")

# Export usage as CSV for accounting
r = requests.get("https://ninjachat.ai/api/api-keys/logs?format=csv&from=2026-03-01",
    headers=HEADERS)
with open("usage.csv", "w") as f:
    f.write(r.text)

# Purchase history
r = requests.get("https://ninjachat.ai/api/api-keys/transactions", headers=HEADERS)
for t in r.json()["transactions"]:
    print(f"{t['timestamp']} — {t['amount']}")

6. Proxy for browser apps

Never put your API key in frontend code:
# Your backend holds the API key; the browser never sees it
from flask import Flask, request

app = Flask(__name__)

@app.post("/api/chat")
def proxy():
    return requests.post("https://ninjachat.ai/api/v1/chat",
        headers=HEADERS,  # Server-side only
        json={"model": "gpt-5", "messages": request.json["messages"]}
    ).json()
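A proxy that forwards browser input verbatim also forwards abuse. A minimal sketch of server-side validation before forwarding (the `sanitize_messages` helper and the allowed-role set are our assumptions, not part of the API):

```python
ALLOWED_ROLES = {"user", "assistant", "system"}

def sanitize_messages(messages) -> list:
    """Keep only well-formed messages: a known role and a string content."""
    clean = []
    for m in messages if isinstance(messages, list) else []:
        if (isinstance(m, dict)
                and m.get("role") in ALLOWED_ROLES
                and isinstance(m.get("content"), str)):
            clean.append({"role": m["role"], "content": m["content"]})
    return clean
```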

7. Use separate keys per environment

  • dev-key for development
  • prod-key for production
  • Revoke one without affecting the other
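One way to wire this up is a single lookup keyed on the deployment environment. The environment-variable names below (`APP_ENV`, `NINJACHAT_API_KEY_PROD`, `NINJACHAT_API_KEY_DEV`) are a convention we are assuming, not something the API mandates:

```python
import os

def api_key_for_env() -> str:
    # Pick the dev or prod key based on an APP_ENV deployment flag
    env = os.environ.get("APP_ENV", "dev")
    var = "NINJACHAT_API_KEY_PROD" if env == "production" else "NINJACHAT_API_KEY_DEV"
    return os.environ[var]
```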