## Request
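A minimal request can be sketched in Python. The endpoint URL below is a placeholder (this excerpt does not give the real one); the field names follow the parameter tables later in this document.

```python
import json

# Placeholder endpoint -- the real URL is not specified in this excerpt.
ENDPOINT = "https://api.example.com/v1/chat"

# Minimal request body: only "messages" is required; everything else
# falls back to the documented defaults.
payload = {
    "model": "ninja-1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain SSE in one sentence."},
    ],
    "temperature": 0.7,
    "max_tokens": 2048,
}

body = json.dumps(payload)
```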
## Response
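A plausible response shape, parsed from a sample. Only the fields named elsewhere in this document (cached, routing, quality, free_tier, auto_retry) are documented; the surrounding envelope here is an assumption.

```python
import json

# Illustrative response body; the envelope is an assumption, not the
# documented wire format.
raw = """
{
  "model": "ninja-1",
  "content": "Server-Sent Events stream tokens over one HTTP connection.",
  "cached": false
}
"""
response = json.loads(raw)
```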
## Parameters
### Standard parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | No | ninja-1 | Model ID, auto* variant, ensemble* variant, or fallback chain (a>b>c). |
| messages | array | Yes | — | 1–50 messages. Each has role (system/user/assistant) and content. |
| temperature | number | No | 0.7 | 0 = deterministic, 2 = creative. |
| max_tokens | integer | No | 2048 | Max response length (1–16,384). |
| stream | boolean | No | false | Stream tokens via SSE. |
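The documented limits above can be checked client-side before sending. A small sketch (the helper name is mine, not part of the API):

```python
def validate_request(payload):
    """Check the documented limits for the standard parameters."""
    msgs = payload.get("messages")
    if not msgs or not (1 <= len(msgs) <= 50):
        raise ValueError("messages must contain 1-50 entries")
    if not (0 <= payload.get("temperature", 0.7) <= 2):
        raise ValueError("temperature must be within 0-2")
    if not (1 <= payload.get("max_tokens", 2048) <= 16384):
        raise ValueError("max_tokens must be within 1-16384")
    return True

ok = validate_request({"messages": [{"role": "user", "content": "hi"}]})
```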
### Power feature parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| session_id | string | — | Attach to a persistent session. History is auto-injected. Sessions → |
| cache | boolean | true | Cache non-streaming responses. Returns "cached": true on hits (free). |
| include_routing | boolean | false | Include routing decision in response (which model was chosen and why). |
| include_quality | boolean | false | Include quality score (0–1 confidence) in response. |
| budget_cents | number | — | Auto-select best model within this cost ceiling (cents). Overrides model. |
| min_quality | number | — | If response quality score < this threshold (0–1), auto-retry with a better model. |
| fallback_on_error | boolean | true | In fallback chains: continue to next model on error. |
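A request exercising several of the power parameters together (values are illustrative):

```python
# Power-feature fields ride alongside the standard ones in the same body.
payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarize this log."}],
    "include_routing": True,   # report which model was chosen and why
    "include_quality": True,   # report the 0-1 confidence score
    "min_quality": 0.7,        # retry on a better model below this score
    "cache": True,             # the default; shown here for clarity
}
```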
## Power features
### Smart routing — auto, auto-fast, auto-cheap, auto-quality
NinjaChat detects task type (code, math, creative, analysis, quick, general) and routes to the optimal model automatically.
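Choosing a variant is just a matter of the model string; the task detection and routing happen server-side. A sketch of mapping a goal to a variant (the mapping dictionary is mine):

```python
# Each auto* variant trades cost against quality; the server decides
# which concrete model to run.
goal_to_model = {
    "balanced": "auto",
    "latency":  "auto-fast",
    "cost":     "auto-cheap",
    "quality":  "auto-quality",
}

payload = {
    "model": goal_to_model["cost"],
    "messages": [{"role": "user", "content": "What is 2+2?"}],
}
```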
### Fallback chains — model: "a>b>c"
Try models in order. If one fails or scores below min_quality, automatically fall back to the next.
- Up to 4 models in a chain
- Billing: charged per successful attempt
- Combine with min_quality to trigger fallback on low-quality responses
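A chain is written directly into the model string. A sketch of specifying one and splitting it client-side (the server does the actual fallback; the model names come from the pricing table below):

```python
payload = {
    "model": "gpt-5>claude-haiku-4.5>ninja-1",  # tried in this order
    "min_quality": 0.6,         # low scores also trigger fallback
    "fallback_on_error": True,  # the default
    "messages": [{"role": "user", "content": "Prove 1+1=2."}],
}

# Chains are ">"-separated and limited to 4 models.
chain = payload["model"].split(">")
assert len(chain) <= 4
```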
### Ensemble — model: "ensemble" or "ensemble-quality"
Runs 3 models in parallel, then synthesizes the best answer with a 4th call. Reduces hallucinations by cross-checking the models against each other and surfacing their consensus.
| Variant | Models | Cost |
|---|---|---|
| ensemble | GPT-5 + Claude Sonnet 4.6 + Gemini 3.1 Pro | $0.04/req |
| ensemble-quality | GPT-5 + Claude Opus 4.6 + Gemini 3.1 Pro | $0.05/req |
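Selecting an ensemble is again just the model string; the per-request costs below are taken from the table above.

```python
payload = {
    "model": "ensemble-quality",
    "messages": [{"role": "user", "content": "Is P equal to NP?"}],
}

# Flat per-request pricing for the ensemble variants, in cents.
COST_CENTS = {"ensemble": 4.0, "ensemble-quality": 5.0}
cost = COST_CENTS[payload["model"]]
```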
### Budget routing — budget_cents
Pick the best available model within a cost ceiling. Overrides model.
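Client-side, budget routing is one extra field. The selection sketch below is my guess at the server-side idea, using the tier prices from the pricing table at the end of this document (in cents):

```python
payload = {
    "budget_cents": 1.0,  # best available model costing at most 1 cent/request
    "messages": [{"role": "user", "content": "hi"}],
}

# Illustrative selection: pick the priciest tier that fits the budget.
# Tier costs mirror the pricing table ($0.030, $0.015, $0.006, $0.003).
TIER_COST_CENTS = {"ultra": 3.0, "premium": 1.5, "standard": 0.6, "free": 0.3}
affordable = {t: c for t, c in TIER_COST_CENTS.items()
              if c <= payload["budget_cents"]}
best_tier = max(affordable, key=affordable.get)
```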
### Quality scoring — include_quality + min_quality
Every response can include a heuristic quality score. Use min_quality to auto-retry with a better model if the score is too low.
If confidence < min_quality, the system automatically retries with a higher-quality model and adds auto_retry to the response.
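The retry surfaces in the response body. The exact shape of auto_retry is not shown in this excerpt, so the structure below is a guess; only the presence of the key and the quality field are documented.

```python
import json

# Guessed response after an automatic retry; the inner auto_retry
# structure is illustrative only.
raw = """
{
  "content": "...",
  "quality": 0.82,
  "auto_retry": {"from": "ninja-1", "reason": "quality_below_min"}
}
"""
resp = json.loads(raw)
retried = "auto_retry" in resp
```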
### Response caching — cache
Non-streaming, non-session responses are cached by default (TTL: 1 hour). Identical requests return instantly with no credit charge.
Disable caching by sending "cache": false. Caching is automatically disabled when using session_id.
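A hit can be detected from the documented cached flag; a small sketch (the helper name is mine):

```python
def is_free_cache_hit(response):
    """Cached responses carry "cached": true and cost no credits."""
    return response.get("cached") is True

hit = is_free_cache_hit({"cached": True, "content": "hello"})
```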
### Persistent sessions — session_id
Attach conversation history to a named session. All prior messages are automatically prepended.
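With a session, each request carries only the new message, since prior history is injected server-side. A sketch of two turns (the session name and contents are illustrative):

```python
# Both turns name the same session; the server prepends turn_1's
# history before answering turn_2.
turn_1 = {
    "session_id": "support-4821",
    "messages": [{"role": "user", "content": "My order is late."}],
}
turn_2 = {
    "session_id": "support-4821",
    "messages": [{"role": "user", "content": "It was placed last Tuesday."}],
}
```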
## Free tier
Users with a $0 balance get 50 free requests/month (resets on the 1st). Free requests include free_tier in the response.
## Streaming
Set stream: true for token-by-token SSE output. Works with all model variants.
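Consuming the stream amounts to reading SSE "data:" lines. A minimal parser over a canned sample; the per-event payload format and the "[DONE]" terminator are assumptions (they are common in SSE chat APIs but not documented in this excerpt).

```python
import json

# Canned SSE lines standing in for a live HTTP stream.
sample_stream = [
    'data: {"token": "Hel"}',
    'data: {"token": "lo"}',
    "data: [DONE]",
]

tokens = []
for line in sample_stream:
    body = line[len("data: "):]
    if body == "[DONE]":   # assumed end-of-stream marker
        break
    tokens.append(json.loads(body)["token"])

text = "".join(tokens)
```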
## Multi-turn conversations
Without sessions, pass the full history manually in the messages array.

## Models and pricing
| Tier | Cost | Models |
|---|---|---|
| Ultra | $0.030/req | claude-opus-4.6 |
| Premium | $0.015/req | claude-sonnet-4.6 |
| Standard | $0.006/req | gpt-5, o3-mini, claude-sonnet-4.5, claude-haiku-4.5, gemini-2.5-pro, gemini-3-pro, gemini-3.1-pro, grok-4, kimi-k2, kimi-k2.5, mistral-large, llama-4-maverick |
| Free | $0.003/req | gpt-5-mini, gemini-2.5-flash, gemini-3-flash, llama-4-scout, deepseek-v3, qwq-32b, glm-5, minimax-m2.5, ninja-1, uncensored-ai |
| Ensemble | $0.040/req | ensemble (3 models + synthesis) |
| Ensemble Quality | $0.050/req | ensemble-quality |
Smart routing variants (auto, auto-fast, auto-cheap, auto-quality) are billed at the resolved model's rate.
All model details →