# Rate Limits

Understanding and working with Yunxin API rate limits.

## Overview
Rate limits protect the API from abuse and ensure fair access for all users. Limits are applied per API key.
## Rate Limit Tiers
| Tier | Requests/Min (RPM) | Tokens/Min (TPM) | Requests/Day |
|---|---|---|---|
| Free | 10 | 40,000 | 500 |
| Basic | 50 | 200,000 | 10,000 |
| Pro | 100 | 1,000,000 | 100,000 |
| Enterprise | Unlimited | Unlimited | Unlimited |
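For illustration, the RPM column above can drive a client-side limiter so you stay under your tier's ceiling before the server ever returns a 429. A minimal sliding-window sketch (the class and its name are illustrative, not part of any official SDK):

```python
import time
from collections import deque

class RequestRateLimiter:
    """Client-side sliding-window limiter sized to a tier's requests-per-minute."""

    def __init__(self, rpm):
        self.rpm = rpm
        self.sent = deque()  # timestamps of requests sent in the last 60 seconds

    def acquire(self):
        """Block until sending one more request stays within the RPM budget."""
        now = time.monotonic()
        # Drop timestamps that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Sleep until the oldest request in the window expires.
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())

# Example: the Basic tier allows 50 requests per minute.
limiter = RequestRateLimiter(rpm=50)
```

Call `limiter.acquire()` before each API request; it returns immediately while you are under budget and sleeps only when the window is full.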
## Rate Limit Headers
Every API response includes rate limit information in the headers:
```
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1709251200
X-RateLimit-Resource: chat
X-RateLimit-Used: 3
X-Request-Priority: 2
```

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per minute |
| X-RateLimit-Remaining | Remaining requests in the current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit resets |
| X-RateLimit-Resource | The resource being rate limited (e.g., chat, images, audio) |
| X-RateLimit-Used | Requests used in the current window |
| X-Request-Priority | Your priority level (1=Low, 2=Normal, 3=High, 4=Critical) |
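These headers let you throttle proactively instead of waiting for a 429. A hedged sketch that inspects a response's header mapping (the helper name and threshold are illustrative; pass the `headers` attribute of whatever HTTP client you use):

```python
import time

def throttle_from_headers(headers, min_remaining=2, clock=time.time, sleep=time.sleep):
    """Pause until the window resets when few requests remain.

    `headers` is the response-header mapping from any HTTP client.
    Returns True if the call slept, False otherwise.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", min_remaining + 1))
    reset = int(headers.get("X-RateLimit-Reset", 0))
    if remaining <= min_remaining:
        # X-RateLimit-Reset is a Unix timestamp; sleep until it passes.
        sleep(max(0, reset - clock()))
        return True
    return False
```

For example, after each request you might call `throttle_from_headers(resp.headers)` so the client pauses before the last couple of requests in the window are consumed.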
## Handling Rate Limits
When you exceed the rate limit, the API returns a 429 Too Many Requests response:
```json
{
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit exceeded. Please retry after 12 seconds.",
    "code": "rate_limit_exceeded"
  }
}
```

## Retry Strategy
Implement exponential backoff with jitter:
```python
import random
import time

def call_with_retry(fn, max_retries=5):
    """Call fn, retrying rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if "rate_limit" not in str(e):
                raise  # not a rate-limit error; don't retry
            # Exponential backoff (1, 2, 4, ... seconds) plus up to 1 s of jitter
            # so that concurrent clients don't retry in lockstep.
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```

## Best Practices
- Batch requests — Combine multiple prompts into fewer requests where possible
- Cache responses — Store and reuse responses for identical queries
- Use streaming — Streaming responses count as a single request regardless of output length
- Monitor usage — Use the Dashboard analytics to track your consumption patterns
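The caching practice above can be sketched with a small in-memory memoizer (illustrative only; `call_api` stands in for your actual request function, and a production cache would also bound its size and expire entries):

```python
import hashlib
import json

_cache = {}

def cached_call(call_api, model, prompt):
    """Reuse the stored response for an identical (model, prompt) query."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(model=model, prompt=prompt)
    return _cache[key]
```

Repeated identical queries then cost a single request against your rate limit instead of one per call.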