Request
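A minimal request sketch in Python, assuming the endpoint lives at https://api.ninjachat.ai/v1/compare and accepts a bearer API key; the path, host, auth header, and role/content message shape are assumptions, while the body fields follow the parameters documented below.

```python
import os
import requests

# Assumed endpoint and auth scheme; the exact path and header are not given on this page.
API_URL = "https://api.ninjachat.ai/v1/compare"
headers = {"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"}

payload = {
    # Same message format as /chat (role/content shape assumed); 1-20 messages allowed.
    "messages": [{"role": "user", "content": "Write a one-line product tagline for a note-taking app."}],
    # 2-8 explicit models; omit this field to use the default top-5 selection.
    "models": ["gpt-5", "claude-sonnet-4.6", "deepseek-v3", "gemini-3-flash"],
    "rank_by": "balanced",
    "max_tokens": 512,
    "temperature": 0.7,
    "include_full_responses": True,
}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=120)
resp.raise_for_status()
```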
Response
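The full response schema is not reproduced on this page. Continuing from the request above, the sketch below reads only the fields mentioned in the Billing and use-case notes (total_cost, per-model cost_cents, summary.best_value); the results list and its entry shape are assumptions.

```python
data = resp.json()

print(data["total_cost"])             # total billed across all successful model calls
print(data["summary"]["best_value"])  # model with the highest quality-to-cost ratio

# Assumed shape: one entry per model, each with its own cost_cents breakdown.
for result in data.get("results", []):
    print(result["model"], result["cost_cents"])
```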
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | array | Yes | — | Same format as /chat. 1–20 messages. |
| models | array | No | Top 5 across tiers | Which models to run. 2–8 models. Cannot include auto* or ensemble*. |
| rank_by | string | No | "balanced" | How to rank results: quality, speed, cost, or balanced. |
| max_tokens | integer | No | 1024 | Max response length per model (1–8,192). |
| temperature | number | No | 0.7 | Sampling temperature. |
| include_full_responses | boolean | No | true | Include full response text in results. Set to false to get 200-char previews. |
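If you want to catch limit violations locally before paying for a request, a small pre-check mirroring the table above could look like this; the API enforces its own validation, so this helper is purely illustrative.

```python
def validate_compare_payload(payload: dict) -> None:
    """Local pre-check based on the documented limits; the API performs the real validation."""
    msgs = payload.get("messages", [])
    if not 1 <= len(msgs) <= 20:
        raise ValueError("messages must contain 1-20 entries")

    models = payload.get("models")
    if models is not None:
        if not 2 <= len(models) <= 8:
            raise ValueError("models must list 2-8 models")
        if any(m.startswith(("auto", "ensemble")) for m in models):
            raise ValueError("auto* and ensemble* models cannot be compared")

    if not 1 <= payload.get("max_tokens", 1024) <= 8192:
        raise ValueError("max_tokens must be between 1 and 8,192")

    if payload.get("rank_by", "balanced") not in {"quality", "speed", "cost", "balanced"}:
        raise ValueError("rank_by must be quality, speed, cost, or balanced")
```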
rank_by modes
| Mode | Weights |
|---|---|
| balanced | 50% quality + 30% speed + 20% cost |
| quality | 100% quality score |
| speed | 100% speed (lowest latency wins) |
| cost | 100% cost (cheapest wins) |
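To make the weighting concrete, here is a sketch of how a composite score could combine the three components. How the API normalizes quality, speed, and cost is not documented here, so the 0-to-1 scaling (higher is better) is an assumption.

```python
RANK_WEIGHTS = {
    "balanced": {"quality": 0.5, "speed": 0.3, "cost": 0.2},
    "quality":  {"quality": 1.0, "speed": 0.0, "cost": 0.0},
    "speed":    {"quality": 0.0, "speed": 1.0, "cost": 0.0},
    "cost":     {"quality": 0.0, "speed": 0.0, "cost": 1.0},
}

def composite_score(scores: dict[str, float], rank_by: str = "balanced") -> float:
    """scores holds assumed 0-1 normalized values where higher is better
    (speed = lower latency, cost = cheaper)."""
    weights = RANK_WEIGHTS[rank_by]
    return sum(weights[k] * scores[k] for k in weights)

# Example: strong quality, middling speed and cost.
print(composite_score({"quality": 0.9, "speed": 0.6, "cost": 0.5}))  # 0.73
```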
Default models (when models not specified)
gpt-5, claude-sonnet-4.6, gemini-3.1-pro, deepseek-v3, gemini-3-flash
Billing
You are charged for every successful model call; a 4-model compare costs the sum of each model's per-request rate. The response includes total_cost and a per-model cost_cents breakdown.
If a model fails, you are not charged for that model; you are still charged for every model that succeeds.
Pre-flight balance check: if your balance is less than the estimated total cost, the request fails before any models run.
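Because a compare is billed as the sum of per-model rates, the estimate behind the pre-flight check is simple addition. The rates below are hypothetical placeholders, not published pricing.

```python
# Hypothetical per-request rates in cents; real rates come from your pricing page.
RATE_CENTS = {"gpt-5": 3.0, "claude-sonnet-4.6": 2.5, "deepseek-v3": 0.4, "gemini-3-flash": 0.3}

def estimated_cost_cents(models: list[str]) -> float:
    """A compare costs the sum of each requested model's per-request rate."""
    return sum(RATE_CENTS[m] for m in models)

balance_cents = 5.0
estimate = estimated_cost_cents(["gpt-5", "claude-sonnet-4.6", "deepseek-v3", "gemini-3-flash"])  # 6.2

# Mirrors the documented pre-flight check: insufficient balance fails before any model runs.
if balance_cents < estimate:
    raise RuntimeError(f"Insufficient balance: need {estimate} cents, have {balance_cents} cents")
```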
Code examples
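The original snippets for this section are not included above; as a stand-in, this end-to-end sketch reuses the assumed endpoint and auth from the request example, lets the model list fall back to the defaults, and prints the cost breakdown (the results list shape is an assumption).

```python
import os
import requests

API_URL = "https://api.ninjachat.ai/v1/compare"  # assumed path, as in the request sketch
headers = {"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"}

# Omitting "models" falls back to the default top-5 selection; previews keep the payload small.
payload = {
    "messages": [{"role": "user", "content": "Explain what a pre-flight balance check is."}],
    "rank_by": "cost",
    "include_full_responses": False,  # 200-char previews instead of full text
}

data = requests.post(API_URL, json=payload, headers=headers, timeout=120).json()

print("Total cost:", data["total_cost"])
print("Best value:", data["summary"]["best_value"])
for result in data.get("results", []):  # assumed: one entry per model
    print(result["model"], result["cost_cents"])
```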
Common use cases
Choose a model for production — Compare 4–5 models on a representative sample of your actual prompts before committing to one.
Verify quality across models — Run the same benchmark prompt monthly to see if model updates changed behavior.
Find the best value — summary.best_value shows the model with the highest quality-to-cost ratio.
Regression testing — Run your golden test prompts against a new model before switching.
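For the regression-testing use case, a sketch along these lines runs each golden prompt through a two-model compare (current model versus candidate) and reports which one the ranking prefers; the assumption that results come back ordered by the requested ranking is mine, not documented above.

```python
import os
import requests

API_URL = "https://api.ninjachat.ai/v1/compare"  # assumed path, as in the earlier sketches
headers = {"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"}

# Golden prompts you would normally keep in version control.
GOLDEN_PROMPTS = [
    "Summarize this changelog in two sentences: v2.3 adds exports and fixes login timeouts.",
    "Extract the total from: Invoice #118, subtotal $90, tax $9, total $99.",
]

for prompt in GOLDEN_PROMPTS:
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "models": ["gpt-5", "deepseek-v3"],  # current model vs. candidate
        "rank_by": "quality",
    }
    data = requests.post(API_URL, json=payload, headers=headers, timeout=120).json()
    # Assumption: results are returned in ranked order.
    top = data.get("results", [{}])[0]
    print(prompt[:40], "->", top.get("model"))
```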