# Streaming (/docs/streaming)


## Overview [#overview]

Streaming lets you receive partial responses as they are generated, so real-time applications such as chatbots can display output immediately instead of waiting for the full completion.

## Enable Streaming [#enable-streaming]

Set `stream: true` in your request:

```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Tell me a story."}],
  "stream": true
}
```

## Server-Sent Events (SSE) Format [#server-sent-events-sse-format]

The API returns a stream of `data:` events:

```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"index":0}]}

data: [DONE]
```

Each chunk contains a `delta` object with the incremental content.
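If you consume the raw SSE stream yourself (for example over plain HTTP rather than an SDK), the `data:` lines can be parsed with a few lines of Python. This is a minimal sketch; the helper name `iter_sse_payloads` is illustrative, not part of the API:

```python
import json

def iter_sse_payloads(lines):
    """Yield parsed JSON payloads from raw SSE lines.

    Skips blank keep-alive lines, stops at the `[DONE]` sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```

Concatenating each delta's `content` (when present) rebuilds the full message text.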

## Examples [#examples]

<Tabs items={["Python", "JavaScript", "cURL"]}>
  <Tab value="Python">
    ```python
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.yuhuanstudio.com/v1"
    )

    stream = client.chat.completions.create(
        model="model-id",
        messages=[{"role": "user", "content": "Tell me a story."}],
        stream=True
    )

    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
    ```
  </Tab>

  <Tab value="JavaScript">
    ```javascript
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "YOUR_API_KEY",
      baseURL: "https://api.yuhuanstudio.com/v1",
    });

    const stream = await client.chat.completions.create({
      model: "model-id",
      messages: [{ role: "user", content: "Tell me a story." }],
      stream: true,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) process.stdout.write(content);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl https://api.yuhuanstudio.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -N \
      -d '{
        "model": "model-id",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": true
      }'
    ```
  </Tab>
</Tabs>

## Streaming with Reasoning Models [#streaming-with-reasoning-models]

When using reasoning/thinking models, the stream may include thinking tokens:

```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Solve: What is 15! / 13!?"}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta

    # Thinking content (reasoning process); not every model emits it
    thinking = getattr(delta, "reasoning_content", None)
    if thinking:
        print(f"[Thinking] {thinking}", end="", flush=True)

    # Final content
    if delta.content:
        print(delta.content, end="", flush=True)
```

## Stream Options [#stream-options]

### Include Usage [#include-usage]

Request token usage information in the final stream event:

```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```

When `include_usage` is set, one extra chunk is streamed before `[DONE]`; its `choices` list is empty and its `usage` field reports token counts for the entire request:

```json
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  }
}
```
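Because the usage chunk arrives with an empty `choices` list, a streaming loop should guard both fields before accessing them. A sketch that accumulates text and captures usage from already-parsed chunk dicts (the helper name `consume_stream` is illustrative):

```python
def consume_stream(events):
    """Accumulate streamed content and capture the final usage block.

    `events` is an iterable of parsed chunk dicts (the JSON payloads of
    the `data:` events). With `include_usage` enabled, the final chunk
    carries a `usage` object and an empty `choices` list, so both
    fields are checked before use.
    """
    text_parts = []
    usage = None
    for chunk in events:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

The same guards apply when using the SDK objects directly: check `chunk.choices` is non-empty before indexing, and read `chunk.usage` only when it is set.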
