Best Practices
Recommendations for building reliable applications with the Yunxin API.
Use Streaming for Long Responses
For chat completions and text generation, enable streaming to reduce time-to-first-token:
```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Write a detailed analysis..."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Handle Errors Gracefully
Always implement retry logic with exponential backoff:
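Exponential backoff works best with jitter, so that many clients hitting the same rate limit do not all retry in lockstep. A minimal delay helper (a sketch; the base and cap values are illustrative):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: return a random delay
    between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

The retry loop below can call `time.sleep(backoff_delay(attempt))` in place of a fixed `2 ** attempt` wait.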
```python
import time

from openai import OpenAI, RateLimitError, APIConnectionError

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.yuhuanstudio.com/v1"
)

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="model-id",
                messages=messages
            )
        except (RateLimitError, APIConnectionError):
            # Back off 1s, 2s, 4s, ... before the next attempt
            time.sleep(2 ** attempt)
    raise Exception("Max retries exceeded")
```

Optimize Token Usage
- Be specific in prompts — shorter, clearer prompts reduce input tokens
- Set `max_tokens` — prevent runaway responses
- Use appropriate models — use smaller models for simpler tasks
```python
# For simple classification, use a smaller model
response = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Classify: positive or negative? 'Great product!'"}],
    max_tokens=10
)

# For complex reasoning, use a capable model
response = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Analyze the economic implications of..."}],
    max_tokens=4096
)
```

Secure Your API Keys
- Never hardcode keys in source code
- Use environment variables or secret managers
- Rotate keys regularly via the dashboard
- Set per-key rate limits to prevent abuse
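A small guard that fails fast when the key is missing makes misconfiguration obvious at startup rather than on the first request (a sketch; the variable name matches the example below):

```python
import os

def require_api_key(var="YUNXIN_API_KEY"):
    """Read the API key from the environment, failing fast if unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the app")
    return key
```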
```shell
# Use environment variables
export YUNXIN_API_KEY="yx-..."
```

```python
import os

client = OpenAI(
    api_key=os.environ["YUNXIN_API_KEY"],
    base_url="https://api.yuhuanstudio.com/v1"
)
```

Use Batch API for Bulk Operations
For non-time-sensitive tasks, use the Batch API for higher throughput and lower cost:
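The input file is JSONL with one request per line. Assuming Yunxin follows the OpenAI batch input format (`custom_id`, `method`, `url`, `body`), each line can be built like this:

```python
import json

def batch_line(custom_id, messages, model="model-id"):
    """Serialize one chat request as a line of the batch input file."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    })
```

Upload the assembled file to obtain the `input_file_id` used below.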
```python
# Create a batch of requests
batch = client.batches.create(
    input_file_id="file-abc123",
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
```

Monitor Usage
- Check the Analytics dashboard regularly for usage patterns
- Set up budget alerts to avoid unexpected costs
- Review error rates to identify integration issues
- Use request logging for debugging
Choose the Right Model
Use the Models API (GET /v1/models) to discover and compare available models. Consider factors like context length, capabilities, and pricing when selecting a model for your use case.
```shell
curl "https://api.yuhuanstudio.com/v1/models?capability=chat" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
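A helper that picks a model from the returned metadata (a sketch; the `capability` and `context_length` fields are assumptions based on the query parameter above — check the actual `/v1/models` payload for your account):

```python
def pick_model(models, capability, min_context=0):
    """Return the id of the first model with the given capability and
    at least `min_context` tokens of context, or None if none match."""
    for m in models:
        if m.get("capability") == capability and m.get("context_length", 0) >= min_context:
            return m["id"]
    return None
```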