Text Models
Streaming
Set stream: true to receive tokens as they are generated. The response is a Server-Sent Events (SSE) stream with data: chunks. Each chunk contains a delta with partial content, and the stream ends with a [DONE] sentinel.
streaming.py
response = client.chat.completions.create(
model="<model-id>",
messages=[{#60a5fa]">class="text-emerald-400">"role": class="text-emerald-400">"user", class="text-emerald-400">"content": class="text-emerald-400">"Write a haiku."}],
stream=True,
)
#60a5fa]">for chunk in response:
delta = chunk.choices[0].delta.content or #60a5fa]">class="text-emerald-400">""
#60a5fa]">print(delta, end=class="text-emerald-400">"", flush=True)If you cancel the stream mid-response, billing stops at the tokens already produced — you are not charged for tokens the model would have generated. A final usage chunk may arrive with prompt and completion token totals; availability depends on the model provider.