# Rate Limits (/docs/rate-limits)


Overview [#overview]

Rate limits protect the API from abuse and ensure fair access for all users. Limits are applied per API key.

Rate Limit Tiers [#rate-limit-tiers]

| Tier       | Requests/Min (RPM) | Tokens/Min (TPM) | Requests/Day |
| ---------- | ------------------ | ---------------- | ------------ |
| Free       | 10                 | 40,000           | 500          |
| Basic      | 50                 | 200,000          | 10,000       |
| Pro        | 100                | 1,000,000        | 100,000      |
| Enterprise | Unlimited          | Unlimited        | Unlimited    |

Rate Limit Headers [#rate-limit-headers]

Every API response includes rate limit information in the headers:

```http
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1709251200
X-RateLimit-Resource: chat
X-RateLimit-Used: 3
X-Request-Priority: 2
```

| Header                  | Description                                                       |
| ----------------------- | ----------------------------------------------------------------- |
| `X-RateLimit-Limit`     | Maximum requests allowed per minute                               |
| `X-RateLimit-Remaining` | Remaining requests in the current window                          |
| `X-RateLimit-Reset`     | Unix timestamp when the rate limit resets                         |
| `X-RateLimit-Resource`  | The resource being rate limited (e.g., `chat`, `images`, `audio`) |
| `X-RateLimit-Used`      | Requests used in the current window                               |
| `X-Request-Priority`    | Your priority level (1=Low, 2=Normal, 3=High, 4=Critical)         |
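
As a rough illustration of how a client might read these headers, the sketch below parses them into a small structure and computes how long remains until the window resets. The names `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical and not part of any SDK. The code assumes the headers arrive as a plain string-to-string mapping, as most HTTP clients provide.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    """Hypothetical container for the documented rate limit headers."""
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp from X-RateLimit-Reset
    used: int

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets, never negative."""
        current = time.time() if now is None else now
        return max(0.0, self.reset_at - current)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
        used=int(lowered["x-ratelimit-used"]),
    )


# Values taken from the example response headers.
info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "50",
    "X-RateLimit-Remaining": "47",
    "X-RateLimit-Reset": "1709251200",
    "X-RateLimit-Used": "3",
})
```

A client could pause for `info.seconds_until_reset()` seconds once `info.remaining` reaches zero, rather than sending a request that is likely to be rejected.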

Handling Rate Limits [#handling-rate-limits]

When you exceed the rate limit, the API returns a `429 Too Many Requests` response:

```json
{
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit exceeded. Please retry after 12 seconds.",
    "code": "rate_limit_exceeded"
  }
}
```
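
One way to surface this response in client code is to convert it into a dedicated exception, so callers can catch it by type instead of inspecting strings. The sketch below is a minimal illustration. `RateLimitError` and `raise_for_rate_limit` are hypothetical names, and the function assumes you already have the status code and the raw response body as text.

```python
import json


class RateLimitError(Exception):
    """Hypothetical exception raised for a 429 response."""


def raise_for_rate_limit(status_code: int, body: str) -> None:
    """Raise RateLimitError if the response is a 429; otherwise do nothing."""
    if status_code != 429:
        return
    message = "Rate limit exceeded."
    try:
        # Prefer the server's own message when the body has the documented shape.
        message = json.loads(body)["error"]["message"]
    except (json.JSONDecodeError, KeyError, TypeError):
        pass  # Unexpected body: keep the generic message.
    raise RateLimitError(message)


# Body copied from the example error response.
sample_body = (
    '{"error": {"type": "rate_limit_error", '
    '"message": "Rate limit exceeded. Please retry after 12 seconds.", '
    '"code": "rate_limit_exceeded"}}'
)

try:
    raise_for_rate_limit(429, sample_body)
except RateLimitError as exc:
    caught_message = str(exc)
```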

Retry Strategy [#retry-strategy]

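Retry rate-limited requests using exponential backoff with jitter, so that the wait grows after each failed attempt and concurrent clients do not all retry at the same moment: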

```python
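import random
import time


def call_with_retry(fn, max_retries=5):
    """Call fn(), retrying with exponential backoff when it is rate limited."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            # Re-raise anything that does not look like a rate limit error.
            # This check relies on the error text containing "rate_limit".
            if "rate_limit" not in str(e):
                raise
            # Out of attempts: fail now instead of sleeping one more time.
            if attempt == max_retries - 1:
                raise RuntimeError("Max retries exceeded") from e
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)
    # Only reached when max_retries is zero or negative.
    raise RuntimeError("Max retries exceeded")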
```
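
To make the delay schedule easier to inspect, the delay calculation can be pulled out into its own function. The sketch below uses the same formula and adds an upper bound on the exponential part so that long retry sequences do not wait indefinitely. The `cap` parameter and its default of 60 seconds are assumptions for illustration, not documented API behavior, and `backoff_delay` is a hypothetical helper name.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retrying after failed attempt `attempt` (0-based)."""
    # Exponential growth: base, 2*base, 4*base, ... limited to `cap` (assumed value).
    exponential = min(cap, base * (2 ** attempt))
    # Up to one second of jitter spreads out retries from concurrent clients.
    return exponential + random.uniform(0, 1)


# Delays for the first five attempts: roughly 1s, 2s, 4s, 8s, 16s plus jitter.
delays = [backoff_delay(n) for n in range(5)]
```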

Best Practices [#best-practices]

* **Batch requests** — Combine multiple prompts into fewer requests where possible
* **Cache responses** — Store and reuse responses for identical queries
* **Use streaming** — Streaming responses count as a single request regardless of output length
* **Monitor usage** — Use the Dashboard analytics to track your consumption patterns
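
The caching practice above can be sketched as a small in-memory cache keyed by a hash of the request parameters. This is a minimal illustration under stated assumptions. `ResponseCache` and `fake_send` are hypothetical names, request parameters are assumed to be JSON-serializable, and there is no expiry or size limit, which a production cache would likely need.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class ResponseCache:
    """Hypothetical in-memory cache for responses to identical requests."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(params: Dict[str, Any]) -> str:
        # Sort keys so logically identical requests produce the same hash.
        canonical = json.dumps(params, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(self, params: Dict[str, Any],
                    send: Callable[[Dict[str, Any]], Any]) -> Any:
        key = self._key(params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: this is the only path that spends a real request.
        self.misses += 1
        response = send(params)
        self._store[key] = response
        return response


calls = []


def fake_send(params: Dict[str, Any]) -> str:
    # Stand-in for a real API call; records each invocation.
    calls.append(params)
    return "response for " + params["prompt"]


cache = ResponseCache()
first = cache.get_or_call({"prompt": "hello", "model": "x"}, fake_send)
second = cache.get_or_call({"model": "x", "prompt": "hello"}, fake_send)
```

Here the second call is served from the cache even though the parameters are given in a different order, so only one request counts against the limit.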
