TL;DR: Set model to auto and NinjaChat routes each request to the best model for your prompt, balancing quality, speed, and cost. Add "include_routing": true to get a routing object that shows exactly what happened.
The problem smart routing solves
Picking the right model is hard. You need to know that o3-mini is best for math, claude-sonnet-4.6 for code, gemini-3.1-pro for creative writing, and gemini-3-flash for quick factual answers. Smart routing does this automatically.
The four auto variants
Model ID | Optimizes for | Best when…
auto | Quality + speed balance | You want the best model without thinking about it
auto-fast | Lowest latency | Real-time apps, chatbots, low-latency pipelines
auto-cheap | Lowest cost | High-volume jobs, batch processing, cost-sensitive apps
auto-quality | Highest quality | Enterprise use, critical decisions, best possible output
Request
curl -X POST https://ninjachat.ai/api/v1/chat \
-H "Authorization: Bearer nj_sk_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Solve: x² + 5x + 6 = 0"}],
"include_routing": true
}'
Response
{
  "id": "chatcmpl-1749584400000",
  "object": "chat.completion",
  "model": "o3-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Factoring: x² + 5x + 6 = (x + 2)(x + 3) = 0\nSolutions: x = -2 and x = -3"
    },
    "finish_reason": "stop"
  }],
  "routing": {
    "requested": "auto",
    "resolved": "o3-mini",
    "task_type": "math",
    "reasoning": "Detected task type: math"
  },
  "cost": { "this_request": "$0.006" },
  "balance": "$4.820"
}
The routing object shows which model was chosen (resolved) and why (task_type, reasoning). It is only returned when you set "include_routing": true; the default is false.
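If you log routing decisions, it helps to handle both response shapes. The sketch below is a hypothetical helper, routing_summary, that reads the routing object when it is present and otherwise falls back to the top-level model field. That fallback is an assumption based on the sample response above, where model already holds the resolved model.

```python
def routing_summary(response: dict) -> dict:
    """Return which model served a response and, if available, why."""
    routing = response.get("routing")
    if routing is None:
        # No routing object (include_routing was false or omitted).
        # Assumption: the top-level "model" field holds the resolved model,
        # as it does in the sample response.
        return {
            "requested": None,
            "resolved": response.get("model"),
            "task_type": None,
            "reasoning": None,
        }
    return {
        "requested": routing.get("requested"),
        "resolved": routing.get("resolved"),
        "task_type": routing.get("task_type"),
        "reasoning": routing.get("reasoning"),
    }
```

Returning the same keys in both cases lets downstream logging code stay unconditional; missing details simply show up as None.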
How task detection works
NinjaChat analyzes your last 3 user messages using keyword pattern matching:
Task type | Detected keywords | auto routes to
code | function, debug, implement, algorithm, TypeScript, SQL… | claude-sonnet-4.6
math | equation, solve, calculate, integral, probability… | o3-mini
creative | write, story, poem, imagine, fiction, lyrics… | gemini-3.1-pro
analysis | analyze, compare, evaluate, research, summarize… | gpt-5
quick | Short prompts under 80 chars, “what is”, “define”… | gemini-3-flash
general | Everything else | gpt-5
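To make the description concrete, here is a minimal sketch of keyword-based task detection. It is not NinjaChat's implementation: the keyword lists are only the partial ones from the table, and two details are assumptions. Keyword categories are checked in table order with the first match winning (consistent with the example above, where the short "Solve: ..." prompt was classified as math rather than quick), and the 80-character limit is applied to the latest user message only.

```python
import re

# Partial keyword lists from the docs table; the real lists are longer.
TASK_KEYWORDS = {
    "code": ["function", "debug", "implement", "algorithm", "typescript", "sql"],
    "math": ["equation", "solve", "calculate", "integral", "probability"],
    "creative": ["write", "story", "poem", "imagine", "fiction", "lyrics"],
    "analysis": ["analyze", "compare", "evaluate", "research", "summarize"],
}
QUICK_PHRASES = ["what is", "define"]


def has_word(text: str, word: str) -> bool:
    """True if `word` occurs in `text` as a whole word or phrase."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


def detect_task(messages: list[dict]) -> str:
    """Guess a task type from the last 3 user messages (illustrative only)."""
    user_texts = [m["content"] for m in messages if m.get("role") == "user"][-3:]
    if not user_texts:
        return "general"
    text = " ".join(user_texts).lower()
    # Assumption: categories are tried in table order; first match wins.
    for task, keywords in TASK_KEYWORDS.items():
        if any(has_word(text, kw) for kw in keywords):
            return task
    # Assumption: the length check uses only the latest user message.
    latest = user_texts[-1].lower()
    if len(latest) < 80 or any(has_word(latest, p) for p in QUICK_PHRASES):
        return "quick"
    return "general"
```

Whole-word matching keeps "history" from triggering the creative keyword "story"; a plain substring check would misclassify it.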
Full routing table
Task | auto (balanced) | auto-fast | auto-cheap | auto-quality
code | claude-sonnet-4.6 | claude-haiku-4.5 | deepseek-v3 | claude-opus-4.6
math | o3-mini | gpt-5-mini | qwq-32b | o3-mini
creative | gemini-3.1-pro | gemini-3-flash | gemini-3-flash | gemini-3.1-pro
analysis | gpt-5 | gpt-5-mini | deepseek-v3 | claude-opus-4.6
quick | gemini-3-flash | gpt-5-mini | gemini-2.5-flash | claude-opus-4.6
general | gpt-5 | gpt-5-mini | llama-4-maverick | claude-opus-4.6
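If you want to check routing decisions in tests or monitoring, the table can be mirrored as a lookup. ROUTING_TABLE, expected_model, and routing_matches_table are hypothetical names; the data is a snapshot of the table on this page and may drift as the model lineup changes.

```python
# Snapshot of the full routing table: variant -> task type -> model.
ROUTING_TABLE = {
    "auto": {
        "code": "claude-sonnet-4.6", "math": "o3-mini", "creative": "gemini-3.1-pro",
        "analysis": "gpt-5", "quick": "gemini-3-flash", "general": "gpt-5",
    },
    "auto-fast": {
        "code": "claude-haiku-4.5", "math": "gpt-5-mini", "creative": "gemini-3-flash",
        "analysis": "gpt-5-mini", "quick": "gpt-5-mini", "general": "gpt-5-mini",
    },
    "auto-cheap": {
        "code": "deepseek-v3", "math": "qwq-32b", "creative": "gemini-3-flash",
        "analysis": "deepseek-v3", "quick": "gemini-2.5-flash",
        "general": "llama-4-maverick",
    },
    "auto-quality": {
        "code": "claude-opus-4.6", "math": "o3-mini", "creative": "gemini-3.1-pro",
        "analysis": "claude-opus-4.6", "quick": "claude-opus-4.6",
        "general": "claude-opus-4.6",
    },
}


def expected_model(variant: str, task_type: str) -> str:
    """Look up the documented model for an auto variant and task type."""
    try:
        return ROUTING_TABLE[variant][task_type]
    except KeyError:
        raise ValueError(f"unknown variant/task: {variant!r}/{task_type!r}") from None


def routing_matches_table(response: dict) -> bool:
    """True if a response's routing object agrees with ROUTING_TABLE."""
    r = response["routing"]
    return expected_model(r["requested"], r["task_type"]) == r["resolved"]
```

A mismatch is not necessarily an error: budget_cents overrides auto selection, and min_quality can trigger a retry with a better model.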
Billing
Auto variants are billed at the resolved model's rate, not a flat fee. If auto routes to o3-mini, you pay $0.006; if it routes to claude-sonnet-4.6, you pay $0.015. The resolved field of the routing object always names the model you were billed for.
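Because the rate depends on the resolved model, per-model spend is easy to track by grouping cost.this_request on routing.resolved. This sketch assumes costs arrive as dollar strings like "$0.006", as in the sample response, and falls back to the top-level model field when no routing object is present. It uses Decimal so that adding many small amounts does not accumulate floating-point rounding error.

```python
from collections import defaultdict
from decimal import Decimal


def parse_usd(amount: str) -> Decimal:
    """Convert a dollar string such as "$0.006" into Decimal("0.006")."""
    return Decimal(amount.strip().lstrip("$"))


def spend_by_model(responses: list[dict]) -> dict:
    """Sum cost.this_request per billed (resolved) model."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for resp in responses:
        routing = resp.get("routing") or {}
        # Prefer routing.resolved; otherwise use the top-level model field.
        model = routing.get("resolved", resp["model"])
        totals[model] += parse_usd(resp["cost"]["this_request"])
    return dict(totals)
```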
Parameters
The auto variants accept all standard Chat parameters. Two extra fields are relevant:
Parameter | Type | Default | Description
include_routing | boolean | false | Include a routing object in the response showing the resolved model and task type
budget_cents | number | (none) | Override auto selection with a cost ceiling. See Budget Routing.
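A small request builder keeps these fields straight. build_auto_request is a hypothetical helper: it always sends include_routing (defaulting to the documented false), adds budget_cents only when you pass one since it has no default, and passes any other standard Chat parameters, such as session_id, through unchanged.

```python
AUTO_VARIANTS = ("auto", "auto-fast", "auto-cheap", "auto-quality")


def build_auto_request(messages, model="auto", include_routing=False,
                       budget_cents=None, **extra):
    """Build a chat request body for one of the auto variants."""
    if model not in AUTO_VARIANTS:
        raise ValueError(f"not an auto variant: {model!r}")
    body = {"model": model, "messages": messages, "include_routing": include_routing}
    # budget_cents has no default, so omit it unless the caller sets it.
    if budget_cents is not None:
        body["budget_cents"] = budget_cents
    # Other standard Chat parameters are passed through as-is.
    body.update(extra)
    return body
```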
Code examples
Python
import os

import requests

r = requests.post(
    "https://ninjachat.ai/api/v1/chat",
    headers={"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Write a merge sort in Python"}],
        "include_routing": True,  # ask for the routing object in the response
    },
    timeout=60,
)
r.raise_for_status()  # fail loudly on HTTP errors instead of on a missing key
data = r.json()
print(data["choices"][0]["message"]["content"])
print("Routed to:", data["routing"]["resolved"])  # claude-sonnet-4.6
print("Task type:", data["routing"]["task_type"])  # code
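For high-volume batch jobs, the variant list recommends auto-cheap. The sketch below sends prompts one at a time using the standard library's urllib.request rather than requests, so it has no third-party dependency. batch_payloads, send, and run_batch are hypothetical names, and the sketch deliberately omits retries, rate limiting, and concurrency.

```python
import json
import os
import urllib.request

API_URL = "https://ninjachat.ai/api/v1/chat"


def batch_payloads(prompts, model="auto-cheap"):
    """Yield one single-turn request body per prompt."""
    for prompt in prompts:
        yield {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "include_routing": True,  # so each result records the resolved model
        }


def send(payload, api_key):
    """POST one request body and return the decoded JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def run_batch(prompts, model="auto-cheap"):
    """Send prompts sequentially; return (prompt, resolved model) pairs."""
    api_key = os.environ["NINJACHAT_API_KEY"]
    results = []
    for payload in batch_payloads(prompts, model):
        data = send(payload, api_key)
        prompt = payload["messages"][0]["content"]
        results.append((prompt, data["routing"]["resolved"]))
    return results
```

With NINJACHAT_API_KEY set, run_batch(["first prompt", "second prompt"]) returns one (prompt, model) pair per prompt, which shows how auto-cheap resolved each one.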
Combine with other features
Smart routing works with every other chat feature:
{
  "model": "auto",
  "messages": [ ... ],
  "session_id": "user-123",
  "include_routing": true,
  "include_quality": true,
  "min_quality": 0.8
}
If quality falls below min_quality, the system automatically retries with a better model, on top of smart routing.