Set stream: true to receive tokens over Server-Sent Events (SSE) as they're generated, instead of waiting for the full response.

Request

{
  "model": "gpt-5",
  "messages": [{"role": "user", "content": "Write a haiku about coding"}],
  "stream": true
}

Code examples

import json
import os

import requests

resp = requests.post(
    "https://ninjachat.ai/api/v1/chat",
    headers={"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Write a haiku about coding"}],
        "stream": True,
        "include_routing": True,
    },
    stream=True,  # tell requests not to buffer the whole body
)

for line in resp.iter_lines():
    if not line:
        continue  # SSE events are separated by blank lines
    text = line.decode("utf-8")
    if not text.startswith("data: ") or text == "data: [DONE]":
        continue
    chunk = json.loads(text[len("data: "):])
    # With model "auto", the first chunk carries the routing decision.
    if "routing" in chunk:
        print(f"Routed to: {chunk['routing']['resolved']}", flush=True)
    # Guard against chunks with a missing or empty choices list.
    token = (chunk.get("choices") or [{}])[0].get("delta", {}).get("content", "")
    print(token, end="", flush=True)
print()
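The parsing logic above can be factored into a generator and exercised against a simulated stream without hitting the network. This is a minimal sketch; iter_tokens is a hypothetical helper, not part of any client library:

```python
import json

def iter_tokens(lines):
    """Yield content tokens from an iterable of raw SSE byte lines.

    Mirrors the loop in the example above: skips blank keep-alive
    lines, stops ignoring at [DONE], and extracts delta content.
    """
    for line in lines:
        if not line:
            continue  # blank separator between SSE events
        text = line.decode("utf-8")
        if not text.startswith("data: ") or text == "data: [DONE]":
            continue
        chunk = json.loads(text[len("data: "):])
        yield (chunk.get("choices") or [{}])[0].get("delta", {}).get("content", "")

# Simulated stream in the SSE format described below.
fake = [
    b'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    b"",
    b'data: {"choices":[{"delta":{"content":" world"}}]}',
    b"data: [DONE]",
]
print("".join(iter_tokens(fake)))  # Hello world
```

Because the generator takes any iterable of byte lines, the same function can wrap resp.iter_lines() in production and a list of fixtures in tests.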

SSE format

Each event is a data: line containing a JSON chunk. The stream ends with data: [DONE].
data: {"choices":[{"delta":{"content":"Lines"}}]}
data: {"choices":[{"delta":{"content":" of"}}]}
data: [DONE]
When using smart routing (model: "auto"), the first chunk includes a routing object with the resolved model — before any content tokens arrive.
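Assuming the shape implied by the code example above (only the resolved field is confirmed; the empty delta is illustrative), that first chunk might look like:

data: {"routing":{"resolved":"gpt-5"},"choices":[{"delta":{}}]}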

Notes

  • Streaming works with all models, smart routing variants, and sessions.
  • Response caching is automatically disabled for streaming requests.
  • Billing is the same as non-streaming — you’re charged at the resolved model’s rate.