Text Models

Streaming

Set stream: true to receive tokens as they are generated. The response is a Server-Sent Events (SSE) stream with data: chunks. Each chunk contains a delta with partial content, and the stream ends with a [DONE] sentinel.

streaming.py

response = client.chat.completions.create(
    model="<model-id>",
    messages=[{#60a5fa]">class="text-emerald-400">"role": class="text-emerald-400">"user", class="text-emerald-400">"content": class="text-emerald-400">"Write a haiku."}],
    stream=True,
)

#60a5fa]">for chunk in response:
    delta = chunk.choices[0].delta.content or #60a5fa]">class="text-emerald-400">""
    #60a5fa]">print(delta, end=class="text-emerald-400">"", flush=True)

If you cancel the stream mid-response, billing stops at the tokens already produced — you are not charged for tokens the model would have generated. A final usage chunk may arrive with prompt and completion token totals; availability depends on the model provider.