# Streaming

Receive responses in real time with server-sent events (SSE).
## Overview
Streaming allows you to receive partial responses as they are generated, providing a much better user experience for real-time applications like chatbots.
## Enable Streaming

Set `stream: true` in your request:
```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Tell me a story."}],
  "stream": true
}
```

## Server-Sent Events (SSE) Format
The API returns a stream of `data:` events:
```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"index":0}]}

data: [DONE]
```

Each chunk contains a `delta` object with the incremental content; the stream ends with the `data: [DONE]` sentinel.
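If you are not using an SDK, the `data:` lines can be decoded by hand. Below is a minimal sketch in Python, assuming the chunk format shown above; the `parse_sse_chunks` and `join_content` helpers are illustrative, not part of any client library.

```python
import json


def parse_sse_chunks(raw: str) -> list[dict]:
    """Parse the data: lines of an SSE response body into chunk dicts.

    Stops at the [DONE] sentinel. Blank lines (event separators) and
    anything that is not a data: line are skipped.
    """
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip event separators and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunks.append(json.loads(payload))
    return chunks


def join_content(chunks: list[dict]) -> str:
    """Concatenate the incremental delta content of each chunk."""
    return "".join(
        c["choices"][0]["delta"].get("content", "") for c in chunks
    )
```

Feeding the sample stream above through these helpers yields the assembled text `"Once upon a"`. In production, prefer an SSE library or the SDK, which also handle reconnection and multi-line event payloads.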
## Examples
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.yuhuanstudio.com/v1",
)

stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.yuhuanstudio.com/v1",
});

const stream = await client.chat.completions.create({
  model: "model-id",
  messages: [{ role: "user", content: "Tell me a story." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

```bash
curl https://api.yuhuanstudio.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -N \
  -d '{
    "model": "model-id",
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "stream": true
  }'
```

## Streaming with Reasoning Models
When using reasoning/thinking models, the stream may include thinking tokens:
```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Solve: What is 15! / 13!?"}],
    stream=True,
)

for chunk in stream:
    # Thinking content (reasoning process)
    if hasattr(chunk.choices[0].delta, "reasoning_content"):
        thinking = chunk.choices[0].delta.reasoning_content
        if thinking:
            print(f"[Thinking] {thinking}", end="")
    # Final content
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")
```

## Stream Options
### Include Usage
Request token usage information in the final stream event:
```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```

The final chunk before `[DONE]` will include:
```json
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  }
}
```
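With `include_usage` enabled, earlier chunks typically report `usage` as `null`, and only the final chunk carries the token totals. If you collect the parsed chunks into a list, a small helper can pick the totals out; this is a minimal sketch over plain chunk dicts, and the `extract_usage` name is illustrative, not part of the API.

```python
def extract_usage(chunks: list[dict]) -> dict | None:
    """Return the usage dict from the last chunk that carries one.

    Assumes chunks parsed from a stream where include_usage was set:
    earlier chunks report usage as null, and the final chunk before
    [DONE] holds the token totals.
    """
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage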