# Rate Limits

Understanding and working with Yunxin API rate limits.

## Overview
Rate limits protect the API from abuse and ensure fair access for all users. Limits are applied per API key.
## Rate Limit Tiers
| Tier | Requests/Min (RPM) | Tokens/Min (TPM) | Requests/Day |
|---|---|---|---|
| Free | 10 | 40,000 | 500 |
| Basic | 50 | 200,000 | 10,000 |
| Pro | 100 | 1,000,000 | 100,000 |
| Enterprise | Unlimited | Unlimited | Unlimited |
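For illustration, the RPM column above can drive a client-side limiter so you stay under your tier's ceiling before the server ever returns a 429. A minimal sliding-window sketch (the class and its name are illustrative, not part of any official SDK):

```python
import time
from collections import deque

class RequestRateLimiter:
    """Client-side sliding-window limiter sized to a tier's requests-per-minute."""

    def __init__(self, rpm):
        self.rpm = rpm
        self.sent = deque()  # timestamps of requests sent in the last 60 seconds

    def acquire(self):
        """Block until sending one more request stays within the RPM budget."""
        now = time.monotonic()
        # Drop timestamps that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Sleep until the oldest request in the window expires.
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())

# Example: the Basic tier allows 50 requests per minute.
limiter = RequestRateLimiter(rpm=50)
```

Call `limiter.acquire()` before each API request; it returns immediately while you are under budget and sleeps only when the window is full.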
## Rate Limit Headers
Every API response includes rate limit information in the headers:
```
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1709251200
X-RateLimit-Resource: chat
X-RateLimit-Used: 3
X-Request-Priority: 2
```

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per minute |
| X-RateLimit-Remaining | Remaining requests in the current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit resets |
| X-RateLimit-Resource | The resource being rate limited (e.g., chat, images, audio) |
| X-RateLimit-Used | Requests used in the current window |
| X-Request-Priority | Your priority level (1=Low, 2=Normal, 3=High, 4=Critical) |
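These headers let you throttle proactively instead of waiting for a 429. A hedged sketch that inspects a response's header mapping (the helper name and threshold are illustrative; pass the `headers` attribute of whatever HTTP client you use):

```python
import time

def throttle_from_headers(headers, min_remaining=2, clock=time.time, sleep=time.sleep):
    """Pause until the window resets when few requests remain.

    `headers` is the response-header mapping from any HTTP client.
    Returns True if the call slept, False otherwise.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", min_remaining + 1))
    reset = int(headers.get("X-RateLimit-Reset", 0))
    if remaining <= min_remaining:
        # X-RateLimit-Reset is a Unix timestamp; sleep until it passes.
        sleep(max(0, reset - clock()))
        return True
    return False
```

For example, after each request you might call `throttle_from_headers(resp.headers)` so the client pauses before the last couple of requests in the window are consumed.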
## Handling Rate Limits
When you exceed the rate limit, the API returns a 429 Too Many Requests response:
```json
{
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit exceeded. Please retry after 12 seconds.",
    "code": "rate_limit_exceeded"
  }
}
```

## Retry Strategy
Implement exponential backoff with jitter:
```python
import random
import time

def call_with_retry(fn, max_retries=5):
    """Call fn, retrying rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if "rate_limit" not in str(e):
                raise  # not a rate-limit error; don't retry
            # Exponential backoff (1, 2, 4, ... seconds) plus up to 1 s of jitter
            # so that concurrent clients don't retry in lockstep.
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```

## Best Practices
- Batch requests — Combine multiple prompts into fewer requests where possible
- Cache responses — Store and reuse responses for identical queries
- Use streaming — Streaming responses count as a single request regardless of output length
- Monitor usage — Use the Dashboard analytics to track your consumption patterns
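The caching practice above can be sketched with a small in-memory memoizer (illustrative only; `call_api` stands in for your actual request function, and a production cache would also bound its size and expire entries):

```python
import hashlib
import json

_cache = {}

def cached_call(call_api, model, prompt):
    """Reuse the stored response for an identical (model, prompt) query."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(model=model, prompt=prompt)
    return _cache[key]
```

Repeated identical queries then cost a single request against your rate limit instead of one per call.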