Rate limits
Per-key RPM and concurrency limits, the RateLimit-* response headers, and how to handle a 429 response.
Each API key is capped at 120 requests per minute (RPM) and 10 concurrent generations by default; both values are configurable. When you exceed the RPM cap, the request is rejected with 429 rate_limit_exceeded and a Retry-After header telling you how many seconds to wait before retrying.
Default limits
Requests per minute (RPM): 120 per API key
Concurrent generations: 10 per API key
Response headers
The RateLimit-* headers are returned on every response, so you can track your remaining budget without waiting for a 429.
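RateLimit-Limit: the RPM cap for the key (120 by default).
RateLimit-Remaining: requests left in the current window.
RateLimit-Reset: seconds until the window resets.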
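To track the remaining budget, the headers can be read off any response. The following is a minimal sketch, not an official client: `RateLimitBudget`, `read_budget`, and `should_pause` are hypothetical names, the header names are taken from the 429 example on this page, and treating RateLimit-Reset as a number of seconds is an assumption based on that example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitBudget:
    """Hypothetical container for the RateLimit-* header values."""
    limit: Optional[int]          # RateLimit-Limit
    remaining: Optional[int]      # RateLimit-Remaining
    reset_seconds: Optional[int]  # RateLimit-Reset (assumed to be seconds)


def _to_int(value: Optional[str]) -> Optional[int]:
    # Missing or malformed header values become None instead of raising.
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def read_budget(headers: Mapping[str, str]) -> RateLimitBudget:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitBudget(
        limit=_to_int(lowered.get("ratelimit-limit")),
        remaining=_to_int(lowered.get("ratelimit-remaining")),
        reset_seconds=_to_int(lowered.get("ratelimit-reset")),
    )


def should_pause(budget: RateLimitBudget, threshold: int = 5) -> bool:
    # True when the remaining budget is known and at or below the threshold.
    return budget.remaining is not None and budget.remaining <= threshold
```

A caller could check `should_pause(read_budget(response_headers))` after each request and slow down before the budget reaches zero; the threshold of 5 is an arbitrary illustration.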
The 429 response
HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 120
RateLimit-Remaining: 0
RateLimit-Reset: 37
Retry-After: 37

{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry later."
  }
}

Honour Retry-After
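On a 429, wait at least the number of seconds given in Retry-After before retrying; RateLimit-Reset carries the same value in the example above. This is more reliable than a fixed delay, which either retries too early and is rejected again or waits longer than necessary.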
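One way to honour Retry-After is sketched below. It is transport-agnostic and makes several assumptions: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, the header value is a number of seconds as in the example above, and the one-second fallback and five-attempt limit are arbitrary choices.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(headers: Mapping[str, str], fallback: float = 1.0) -> float:
    """Seconds to wait: Retry-After first, then RateLimit-Reset, then fallback."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("retry-after", "ratelimit-reset"):
        try:
            return max(0.0, float(lowered[name]))
        except (KeyError, ValueError):
            # Header missing or not a plain number of seconds: try the next one.
            continue
    return fallback


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        response = send()
        status, headers = response[0], response[1]
        if status != 429 or attempt >= max_attempts:
            # Success, a different error, or the final 429: hand it back.
            return response
        sleep(retry_delay(headers))
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.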
Concurrency is capped separately
The 10 concurrent-generation limit is independent of the RPM cap. If you run many long video jobs, queue them on your side so you don't hit the concurrency ceiling.
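Client-side queuing can be as simple as a bounded worker pool. The sketch below assumes a hypothetical `start_generation` callable that submits one job and blocks until it finishes; `run_queued` and `MAX_CONCURRENT_GENERATIONS` are illustrative names, and the value 10 is the default cap stated on this page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Default per-key concurrency cap from this page; adjust if yours differs.
MAX_CONCURRENT_GENERATIONS = 10


def run_queued(
    jobs: Iterable[T],
    start_generation: Callable[[T], R],
    max_concurrent: int = MAX_CONCURRENT_GENERATIONS,
) -> List[R]:
    """Run every job, with at most max_concurrent in flight at any time.

    The pool has max_concurrent worker threads, so extra jobs wait in the
    pool's internal queue until a worker frees up. Results come back in
    the same order as the input jobs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(start_generation, jobs))
```

A thread pool suits this case because each job spends most of its time waiting on the network. Setting `max_concurrent` a little below the cap leaves headroom for other processes that share the same key.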