Best Practices
Recommendations for building reliable applications with the Yunxin API.
Use Streaming for Long Responses
For chat completions and text generation, enable streaming to reduce time-to-first-token:
```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Write a detailed analysis..."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Handle Errors Gracefully
Always implement retry logic with exponential backoff:
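Exponential backoff works best with jitter, so that many clients hitting the same rate limit do not all retry in lockstep. A minimal delay helper (a sketch; the base and cap values are illustrative):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: return a random delay
    between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

The retry loop below can call `time.sleep(backoff_delay(attempt))` in place of a fixed `2 ** attempt` wait.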
```python
import time

from openai import OpenAI, RateLimitError, APIConnectionError

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.yuhuanstudio.com/v1"
)

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="model-id",
                messages=messages
            )
        except (RateLimitError, APIConnectionError):
            # Back off 1s, 2s, 4s, ... before the next attempt
            time.sleep(2 ** attempt)
    raise Exception("Max retries exceeded")
```

Optimize Token Usage
- Be specific in prompts — shorter, clearer prompts reduce input tokens
- Set `max_tokens` — prevent runaway responses
- Use appropriate models — use smaller models for simpler tasks
```python
# For simple classification, use a smaller model
response = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Classify: positive or negative? 'Great product!'"}],
    max_tokens=10
)

# For complex reasoning, use a capable model
response = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Analyze the economic implications of..."}],
    max_tokens=4096
)
```

Secure Your API Keys
- Never hardcode keys in source code
- Use environment variables or secret managers
- Rotate keys regularly via the dashboard
- Set per-key rate limits to prevent abuse
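A small guard that fails fast when the key is missing makes misconfiguration obvious at startup rather than on the first request (a sketch; the variable name matches the example below):

```python
import os

def require_api_key(var="YUNXIN_API_KEY"):
    """Read the API key from the environment, failing fast if unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the app")
    return key
```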
```shell
# Use environment variables
export YUNXIN_API_KEY="yx-..."
```

```python
import os

client = OpenAI(
    api_key=os.environ["YUNXIN_API_KEY"],
    base_url="https://api.yuhuanstudio.com/v1"
)
```

Use Batch API for Bulk Operations
For non-time-sensitive tasks, use the Batch API for higher throughput and lower cost:
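The input file is JSONL with one request per line. Assuming Yunxin follows the OpenAI batch input format (`custom_id`, `method`, `url`, `body`), each line can be built like this:

```python
import json

def batch_line(custom_id, messages, model="model-id"):
    """Serialize one chat request as a line of the batch input file."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    })
```

Upload the assembled file to obtain the `input_file_id` used below.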
```python
# Create a batch of requests
batch = client.batches.create(
    input_file_id="file-abc123",
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
```

Monitor Usage
- Check the Analytics dashboard regularly for usage patterns
- Set up budget alerts to avoid unexpected costs
- Review error rates to identify integration issues
- Use request logging for debugging
Choose the Right Model
Use the Models API (GET /v1/models) to discover and compare available models. Consider factors like context length, capabilities, and pricing when selecting a model for your use case.
```shell
curl "https://api.yuhuanstudio.com/v1/models?capability=chat" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
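A helper that picks a model from the returned metadata (a sketch; the `capability` and `context_length` fields are assumptions based on the query parameter above — check the actual `/v1/models` payload for your account):

```python
def pick_model(models, capability, min_context=0):
    """Return the id of the first model with the given capability and
    at least `min_context` tokens of context, or None if none match."""
    for m in models:
        if m.get("capability") == capability and m.get("context_length", 0) >= min_context:
            return m["id"]
    return None
```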