
Request

POST https://ninjachat.ai/api/v1/chat
Authorization: Bearer nj_sk_YOUR_API_KEY
Content-Type: application/json
{
  "model": "gpt-5",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in one paragraph."}
  ],
  "temperature": 0.7,
  "max_tokens": 2048
}

Response

{
  "id": "chatcmpl-1749584400000",
  "object": "chat.completion",
  "model": "gpt-5",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing harnesses quantum mechanical phenomena..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 87,
    "total_tokens": 111
  },
  "cost": {"this_request": "$0.006"},
  "balance": "$4.820",
  "metadata": {"latency_ms": 1243}
}
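As a minimal sketch of calling this endpoint from Python, the snippet below builds the request shown above and pulls the reply text and cost out of a response with the documented shape. The helper names (`build_chat_payload`, `extract_reply`, `send_chat`) are illustrative and not part of any NinjaChat SDK; the `NINJACHAT_API_KEY` environment variable follows the streaming example later on this page.

```python
import os

API_URL = "https://ninjachat.ai/api/v1/chat"


def build_chat_payload(user_text, model="gpt-5", system=None,
                       temperature=0.7, max_tokens=2048):
    """Assemble a request body from the documented standard parameters."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages,
            "temperature": temperature, "max_tokens": max_tokens}


def extract_reply(body):
    """Return (reply text, cost string) from a parsed response body."""
    text = body["choices"][0]["message"]["content"]
    cost = body.get("cost", {}).get("this_request")
    return text, cost


def send_chat(payload, timeout=30):
    """POST the payload; needs the third-party `requests` package."""
    import requests  # imported here so the helpers above work without it
    r = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"},
        json=payload,
        timeout=timeout,
    )
    r.raise_for_status()
    return r.json()
```

A typical call would be `text, cost = extract_reply(send_chat(build_chat_payload("Explain quantum computing in one paragraph.")))`.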

Parameters

Standard parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | No | ninja-1 | Model ID, auto* variant, ensemble* variant, or fallback chain (a>b>c). |
| messages | array | Yes | (none) | 1–50 messages. Each has role (system/user/assistant) and content. |
| temperature | number | No | 0.7 | 0 = deterministic, 2 = creative. |
| max_tokens | integer | No | 2048 | Max response length (1–16,384). |
| stream | boolean | No | false | Stream tokens via SSE. |
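The limits in this table can be checked on the client before a request is sent. The sketch below is one way to do that; `validate_chat_params` is a hypothetical helper, the 0 to 2 temperature range is inferred from the description column, and the server's own validation and error messages may differ.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_chat_params(messages, temperature=0.7, max_tokens=2048):
    """Return a list of problems; an empty list means the values are in range."""
    problems = []
    if not 1 <= len(messages) <= 50:
        problems.append("messages must contain 1-50 items")
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"messages[{i}] has an unknown role")
        if "content" not in msg:
            problems.append(f"messages[{i}] is missing content")
    if not 0 <= temperature <= 2:  # range assumed from "0 = deterministic, 2 = creative"
        problems.append("temperature must be between 0 and 2")
    if not 1 <= max_tokens <= 16384:
        problems.append("max_tokens must be between 1 and 16384")
    return problems
```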

Power feature parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| session_id | string | (none) | Attach to a persistent session. History is auto-injected. See the sessions guide. |
| cache | boolean | true | Cache non-streaming responses. Returns "cached": true on hits (free). |
| include_routing | boolean | false | Include routing decision in response (which model was chosen and why). |
| include_quality | boolean | false | Include quality score (0–1 confidence) in response. |
| budget_cents | number | (none) | Auto-select best model within this cost ceiling (cents). Overrides model. |
| min_quality | number | (none) | If response quality score < this threshold (0–1), auto-retry with a better model. |
| fallback_on_error | boolean | true | In fallback chains: continue to next model on error. |

Power features

Smart routing — auto, auto-fast, auto-cheap, auto-quality

NinjaChat detects task type (code, math, creative, analysis, quick, general) and routes to the optimal model automatically.
{
  "model": "auto",
  "messages": [{"role": "user", "content": "Write a binary search in Python"}],
  "include_routing": true
}
Response includes:
"routing": {
  "requested": "auto",
  "resolved": "claude-sonnet-4.6",
  "task_type": "code",
  "reasoning": "Detected task type: code"
}
Full smart routing guide →
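A short sketch of how a client might request an auto variant and report which model was chosen, based only on the `routing` fields shown above. The function names are illustrative, and the routing block is assumed to be absent when `include_routing` is not set.

```python
AUTO_VARIANTS = ("auto", "auto-fast", "auto-cheap", "auto-quality")


def build_auto_payload(user_text, variant="auto"):
    """Request body that asks for smart routing and the routing decision."""
    if variant not in AUTO_VARIANTS:
        raise ValueError(f"unknown auto variant: {variant}")
    return {"model": variant,
            "messages": [{"role": "user", "content": user_text}],
            "include_routing": True}


def describe_routing(body):
    """Summarize the routing block, or return None when it is absent."""
    routing = body.get("routing")
    if not routing:
        return None
    return f"{routing['requested']} -> {routing['resolved']} ({routing['task_type']})"
```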

Fallback chains — model: "a>b>c"

Try models in order. If one fails or scores below min_quality, automatically fall back to the next.
{
  "model": "claude-opus-4.6>gpt-5>gemini-3.1-pro",
  "messages": [{"role": "user", "content": "Review this contract clause..."}],
  "min_quality": 0.85
}
Response:
{
  "model": "gpt-5",
  "fallback": {
    "chain": ["claude-opus-4.6", "gpt-5", "gemini-3.1-pro"],
    "triggered": true,
    "attempts": [
      {"model": "claude-opus-4.6", "success": false, "error": "timeout"},
      {"model": "gpt-5", "success": true, "quality_score": 0.94}
    ]
  }
}
  • Up to 4 models in a chain
  • Billing: charged per successful attempt
  • Combine with min_quality to trigger fallback on low-quality responses
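The chain syntax and the `fallback` block above could be handled on the client roughly as follows. This is a sketch with hypothetical helper names; it relies only on the fields shown in the sample response and on the four-model limit stated in the list.

```python
MAX_CHAIN_LENGTH = 4  # "Up to 4 models in a chain"


def build_chain(models):
    """Join model IDs into the a>b>c form used by the model parameter."""
    if not models or len(models) > MAX_CHAIN_LENGTH:
        raise ValueError("a chain takes 1 to 4 models")
    return ">".join(models)


def summarize_fallback(body):
    """Return (serving model, [(failed model, error), ...]) from a response body."""
    attempts = body.get("fallback", {}).get("attempts", [])
    failed = [(a["model"], a.get("error")) for a in attempts if not a["success"]]
    return body["model"], failed
```

Logging the failed attempts makes it easier to notice when the first model in a chain is failing often enough that the chain order should change.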

Ensemble — model: "ensemble" or "ensemble-quality"

Runs 3 models in parallel, then synthesizes the best answer with a 4th call. The goal is to reduce hallucinations by building a consensus across models.
{
  "model": "ensemble",
  "messages": [{"role": "user", "content": "Is microservices worth it for a 3-person startup?"}]
}
Response:
{
  "ensemble": {
    "models": ["gpt-5", "claude-sonnet-4.6", "gemini-3.1-pro"],
    "synthesis": "consensus"
  },
  "cost": {"this_request": "$0.040"}
}
| Variant | Models | Cost |
|---|---|---|
| ensemble | GPT-5 + Claude Sonnet 4.6 + Gemini 3.1 Pro | $0.04/req |
| ensemble-quality | GPT-5 + Claude Opus 4.6 + Gemini 3.1 Pro | $0.05/req |
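To put the ensemble prices in context, the arithmetic below compares each variant with the sum of its member models' per-request rates from the "Models and pricing" table. The difference is only a derived number; this page does not say how the synthesis call itself is priced.

```python
from decimal import Decimal

# Per-request rates copied from the "Models and pricing" table.
RATE = {
    "gpt-5": Decimal("0.006"),
    "claude-sonnet-4.6": Decimal("0.015"),
    "claude-opus-4.6": Decimal("0.030"),
    "gemini-3.1-pro": Decimal("0.006"),
}
ENSEMBLES = {
    "ensemble": (Decimal("0.040"),
                 ["gpt-5", "claude-sonnet-4.6", "gemini-3.1-pro"]),
    "ensemble-quality": (Decimal("0.050"),
                         ["gpt-5", "claude-opus-4.6", "gemini-3.1-pro"]),
}


def ensemble_premium(variant):
    """Ensemble price minus the sum of its members' individual rates."""
    price, members = ENSEMBLES[variant]
    return price - sum(RATE[m] for m in members)


for name in ENSEMBLES:
    print(name, ensemble_premium(name))
```

By this calculation, `ensemble` costs $0.013 more than calling its three members separately, and `ensemble-quality` costs $0.008 more.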

Budget routing — budget_cents

Pick the best available model within a cost ceiling. Overrides model.
{
  "budget_cents": 0.5,
  "messages": [{"role": "user", "content": "Translate to Spanish: Hello world"}],
  "include_routing": true
}
Response includes:
"routing": {
  "budget_routing": {
    "requested_budget_cents": 0.5,
    "resolved": "deepseek-v3",
    "reason": "Best model within 0.5¢: DeepSeek V3 (free tier, 0.3¢/req)"
  }
}
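A budget-routed request can be built without a `model` key, since `budget_cents` overrides it. The sketch below uses hypothetical helper names and assumes the budget must be a positive number, which this page does not state explicitly.

```python
def build_budget_payload(user_text, budget_cents):
    """Request body that lets the service pick a model under a cost ceiling."""
    if budget_cents <= 0:  # assumption: a zero or negative ceiling is meaningless
        raise ValueError("budget_cents must be positive")
    # No "model" key: the docs say budget_cents overrides it anyway.
    return {"budget_cents": budget_cents,
            "messages": [{"role": "user", "content": user_text}],
            "include_routing": True}


def budget_choice(body):
    """Return the model chosen by budget routing, or None if not reported."""
    return body.get("routing", {}).get("budget_routing", {}).get("resolved")
```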

Quality scoring — include_quality + min_quality

Every response can include a heuristic quality score. Use min_quality to auto-retry with a better model if the score is too low.
{
  "model": "gpt-5",
  "messages": [{"role": "user", "content": "Write a poem about the ocean"}],
  "include_quality": true,
  "min_quality": 0.8
}
Response includes:
"quality": {
  "confidence": 0.94,
  "flags": [],
  "suggested_retry": false
}
If confidence < min_quality, the system automatically retries with a higher-quality model and adds auto_retry to the response:
"auto_retry": {
  "triggered": true,
  "original_model": "gpt-5-mini",
  "retry_model": "gpt-5",
  "original_quality": 0.52
}
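The `quality` and `auto_retry` blocks can be folded into one summary on the client. The following is a sketch under the assumption that both blocks are simply absent when not applicable; `quality_report` is an illustrative name.

```python
def quality_report(body, min_quality):
    """Summarize quality fields from a response requested with include_quality."""
    quality = body.get("quality", {})
    retry = body.get("auto_retry", {})
    confidence = quality.get("confidence")
    return {
        "confidence": confidence,
        "meets_threshold": confidence is not None and confidence >= min_quality,
        "retried": bool(retry.get("triggered")),
        "retried_from": retry.get("original_model"),
    }
```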

Response caching — cache

Non-streaming, non-session responses are cached by default (TTL: 1 hour). Identical requests return instantly with no credit charge.
"cached": true
Disable with "cache": false. Cache is automatically disabled when using session_id.
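The caching rules above can be restated as a small predicate, which is handy when estimating how many requests might be served for free. This is a client-side sketch of the documented conditions, not the service's actual cache logic, and both function names are made up for illustration.

```python
def is_cache_eligible(payload):
    """Apply the documented rules: non-streaming, no session, cache not disabled."""
    return bool(
        not payload.get("stream", False)
        and "session_id" not in payload
        and payload.get("cache", True)
    )


def was_cache_hit(body):
    """True when the response carries "cached": true."""
    return body.get("cached") is True
```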

Persistent sessions — session_id

Attach conversation history to a named session. All prior messages are automatically prepended.
{
  "model": "gpt-5",
  "messages": [{"role": "user", "content": "What did I ask you before?"}],
  "session_id": "user-123"
}
Response includes:
"session": {
  "id": "user-123",
  "message_count": 6
}
Full sessions guide →
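With a session, each request only needs to carry the new message. A minimal sketch, using hypothetical helper names and the `session` block shown above:

```python
def build_session_turn(session_id, user_text, model="gpt-5"):
    """Send only the new message; earlier turns come from the stored session."""
    return {"model": model,
            "session_id": session_id,
            "messages": [{"role": "user", "content": user_text}]}


def session_message_count(body):
    """Return the session's message count, or None if no session block is present."""
    return body.get("session", {}).get("message_count")
```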

Free tier

Users with $0 balance get 50 free requests/month (resets on the 1st). Free requests include free_tier in the response:
"free_tier": {
  "used": 3,
  "remaining": 47
}
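A client might watch the `free_tier` block to warn before the monthly allowance runs out. In this sketch the helper name and the warning threshold of 5 are arbitrary choices, and the block is assumed to be missing from paid responses.

```python
def free_tier_warning(body, threshold=5):
    """Return a warning string when few free requests remain, else None."""
    tier = body.get("free_tier")
    if tier is None:  # assumed: paid responses carry no free_tier block
        return None
    if tier["remaining"] <= threshold:
        return f"Only {tier['remaining']} free requests left this month"
    return None
```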

Streaming

Set stream: true for token-by-token SSE output. Works with all model variants.
import json
import os

import requests

r = requests.post(
    "https://ninjachat.ai/api/v1/chat",
    headers={"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Write a haiku about coding"}],
        "stream": True,
        "include_routing": True,
    },
    stream=True,
)
r.raise_for_status()  # fail early on HTTP errors instead of parsing an error body

for line in r.iter_lines():
    if not line:
        continue  # skip blank separator lines between SSE events
    text = line.decode("utf-8")
    if not text.startswith("data: "):
        continue
    if text == "data: [DONE]":
        break  # end-of-stream marker
    chunk = json.loads(text[6:])
    # First chunk may contain routing info (no content delta)
    if "routing" in chunk:
        print(f"Routed to: {chunk['routing']['resolved']}", flush=True)
    choices = chunk.get("choices") or [{}]  # tolerate a missing or empty list
    token = choices[0].get("delta", {}).get("content") or ""
    print(token, end="", flush=True)
print()

Multi-turn conversations

Without sessions, pass the full history manually:
import os

import requests

messages = [
    {"role": "system", "content": "You are a helpful tutor."},
    {"role": "user", "content": "What is photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis converts sunlight into energy..."},
    {"role": "user", "content": "How does it compare to solar panels?"},
]
r = requests.post(
    "https://ninjachat.ai/api/v1/chat",
    headers={"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"},
    json={"model": "gpt-5", "messages": messages},
)
r.raise_for_status()
# Print the assistant's reply to the latest user message
print(r.json()["choices"][0]["message"]["content"])
Or use sessions to have NinjaChat manage history for you.
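When history is managed manually, it has to stay within the 1–50 message limit from the parameters table. The sketch below shows one possible trimming policy: keep a leading system message and drop the oldest turns. That policy and the `add_turn` name are assumptions for illustration, not NinjaChat behavior.

```python
MAX_MESSAGES = 50  # documented upper bound for the messages array


def add_turn(history, role, content, limit=MAX_MESSAGES):
    """Return a new history with the turn appended, trimmed to `limit` messages.

    A leading system message is kept; the oldest other messages are dropped.
    """
    updated = history + [{"role": role, "content": content}]
    if len(updated) <= limit:
        return updated
    if updated[0]["role"] == "system":
        # Keep the system prompt plus the newest (limit - 1) messages.
        return [updated[0]] + updated[len(updated) - (limit - 1):]
    return updated[len(updated) - limit:]
```

Dropping by count is the simplest policy. It can leave an assistant message as the first turn after the system prompt, so a production client might prefer to drop user and assistant messages in pairs.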

Models and pricing

| Tier | Cost | Models |
|---|---|---|
| Ultra | $0.030/req | claude-opus-4.6 |
| Premium | $0.015/req | claude-sonnet-4.6 |
| Standard | $0.006/req | gpt-5, o3-mini, claude-sonnet-4.5, claude-haiku-4.5, gemini-2.5-pro, gemini-3-pro, gemini-3.1-pro, grok-4, kimi-k2, kimi-k2.5, mistral-large, llama-4-maverick |
| Free | $0.003/req | gpt-5-mini, gemini-2.5-flash, gemini-3-flash, llama-4-scout, deepseek-v3, qwq-32b, glm-5, minimax-m2.5, ninja-1, uncensored-ai |
| Ensemble | $0.040/req | ensemble (3 models + synthesis) |
| Ensemble Quality | $0.050/req | ensemble-quality |
Auto variants (auto, auto-fast, auto-cheap, auto-quality) are billed at the resolved model’s rate. All model details →
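Because pricing is per request, a rough spend estimate is a sum over request counts. The sketch below covers only a few models from the table and assumes every request is billed at the flat tier rate; cached hits, free-tier requests, and failed fallback attempts would lower the real figure.

```python
from decimal import Decimal

TIER_RATE = {
    "ultra": Decimal("0.030"),
    "premium": Decimal("0.015"),
    "standard": Decimal("0.006"),
    "free": Decimal("0.003"),
}
# Subset of the pricing table, for illustration only.
MODEL_TIER = {
    "claude-opus-4.6": "ultra",
    "claude-sonnet-4.6": "premium",
    "gpt-5": "standard",
    "gemini-3.1-pro": "standard",
    "gpt-5-mini": "free",
    "deepseek-v3": "free",
}


def estimate_cost(request_counts):
    """Total dollars for a {model: number_of_requests} mapping."""
    return sum(
        (TIER_RATE[MODEL_TIER[model]] * count
         for model, count in request_counts.items()),
        Decimal("0"),
    )
```

For example, `estimate_cost({"gpt-5": 100, "claude-opus-4.6": 10})` works out to $0.90: 100 × $0.006 plus 10 × $0.030.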