# Best Practices (/docs/best-practices)


## Use Streaming for Long Responses [#use-streaming-for-long-responses]

For chat completions and text generation, enable streaming to reduce time-to-first-token:

```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Write a detailed analysis..."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
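If you also need the complete text after streaming (for logging or caching), accumulate the deltas as they arrive. A minimal sketch; `collect_stream` is an illustrative helper name, and `stream` is any iterable of chat-completion chunks like the one above:

```python
def collect_stream(stream):
    """Print streamed deltas as they arrive and return the full text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta is typically None
            print(delta, end="")
            parts.append(delta)
    return "".join(parts)
```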

## Handle Errors Gracefully [#handle-errors-gracefully]

Always implement retry logic with exponential backoff:

```python
import time
from openai import OpenAI, RateLimitError, APIConnectionError

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.yuhuanstudio.com/v1"
)

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="model-id",
                messages=messages
            )
        except (RateLimitError, APIConnectionError):
            # Exponential backoff: wait 1s, 2s, 4s, ... between attempts
            time.sleep(2 ** attempt)
    raise RuntimeError("Max retries exceeded")
```

## Optimize Token Usage [#optimize-token-usage]

1. **Be specific in prompts** — shorter, clearer prompts reduce input tokens
2. **Set `max_tokens`** — prevent runaway responses
3. **Use appropriate models** — use smaller models for simpler tasks

```python
# For simple classification, use a smaller model
response = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Classify: positive or negative? 'Great product!'"}],
    max_tokens=10
)

# For complex reasoning, use a capable model
response = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Analyze the economic implications of..."}],
    max_tokens=4096
)
```

## Secure Your API Keys [#secure-your-api-keys]

* **Never hardcode keys** in source code
* **Use environment variables** or secret managers
* **Rotate keys regularly** via the dashboard
* **Set per-key rate limits** to prevent abuse

```bash
# Use environment variables
export YUNXIN_API_KEY="yx-..."
```

```python
import os
client = OpenAI(
    api_key=os.environ["YUNXIN_API_KEY"],
    base_url="https://api.yuhuanstudio.com/v1"
)
```

## Use Batch API for Bulk Operations [#use-batch-api-for-bulk-operations]

For non-time-sensitive tasks, use the Batch API for higher throughput and lower cost:

```python
# Create a batch of requests
batch = client.batches.create(
    input_file_id="file-abc123",
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
```
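The input file is a JSONL file with one request per line. A sketch of preparing one, assuming the OpenAI-compatible batch request format (`custom_id`, `method`, `url`, `body` fields); upload the file with `client.files.create(file=..., purpose="batch")` to obtain the `input_file_id`:

```python
import json

def write_batch_input(prompts, path="batch_input.jsonl"):
    """Write one chat-completion request per line in JSONL batch format."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"request-{i}",  # used to match results back to inputs
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "model-id",
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")
    return path
```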

## Monitor Usage [#monitor-usage]

* Check the **Analytics** dashboard regularly for usage patterns
* Set up **budget alerts** to avoid unexpected costs
* Review **error rates** to identify integration issues
* Use **request logging** for debugging
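Request logging can be as simple as a wrapper that records latency on success and the exception on failure. A minimal sketch using the standard `logging` module (the logger name and helper are illustrative, not part of the SDK):

```python
import logging
import time

logger = logging.getLogger("api.requests")

def logged_call(fn, *args, **kwargs):
    """Call fn, logging latency on success and the full traceback on failure."""
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
        logger.info("request ok in %.2fs", time.monotonic() - start)
        return result
    except Exception:
        logger.exception("request failed after %.2fs", time.monotonic() - start)
        raise
```

Wrap your API calls, e.g. `logged_call(client.chat.completions.create, model="model-id", messages=messages)`.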

## Choose the Right Model [#choose-the-right-model]

<Callout>
  Use the Models API (`GET /v1/models`) to discover and compare available models. Consider factors like context length, capabilities, and pricing when selecting a model for your use case.
</Callout>

```bash
curl "https://api.yuhuanstudio.com/v1/models?capability=chat" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
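From Python, the same listing is available via `client.models.list()`. The `capability` filter in the curl example is applied server-side; a client-side equivalent might look like the sketch below. The `capabilities` field name is an assumption inferred from this API's `capability=chat` query parameter, not part of the base SDK types:

```python
def chat_models(models):
    """Filter model records down to those that support chat.

    `models` is any iterable of dicts; the `capabilities` field name
    is assumed from this API's `capability=chat` filter.
    """
    return [m for m in models if "chat" in m.get("capabilities", [])]

# Usage (requires a configured client):
#   records = [m.model_dump() for m in client.models.list().data]
#   for m in chat_models(records):
#       print(m["id"])
```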
