Rate limits
Breeze applies plan-based concurrent generation limits to protect long-running synthesis capacity. Standard API requests are not currently throttled by a published request-per-minute quota.
Concurrent generations
Concurrent generations are synthesis jobs that have been admitted or are running. Queued generations are jobs that wait briefly in the background for admitted capacity to become available. Studio and Developer API jobs share the same per-user pool. Text-to-speech, streaming text-to-speech, async text-to-speech, voice design, and voice cloning all count toward these limits. Voice design counts each requested preview as one generation. When preview_count is omitted, Breeze generates one preview.
| Plan | Concurrent generations | Queued generations |
|---|---|---|
| Free | 3 | 0 |
| Starter | 6 | 2 |
| Creator | 10 | 3 |
| Pro | 20 | 5 |
When both admitted and queued generation capacity are full, the API returns 429 GENERATION_CONCURRENCY_EXCEEDED. When Breeze's shared generation capacity is temporarily unavailable, the API returns 503 GENERATION_CAPACITY_EXCEEDED.
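The limits above can be modeled on the client to estimate how a new request is likely to be handled. The sketch below is an illustration only, not Breeze's scheduler: `PLAN_LIMITS`, `generations_for_voice_design`, and `expected_outcome` are hypothetical names, and the code assumes a request is evaluated as a whole against the plan's combined admitted and queued capacity.

```python
from typing import Optional

# Limits copied from the plan table: (concurrent generations, queued generations).
PLAN_LIMITS = {
    "Free": (3, 0),
    "Starter": (6, 2),
    "Creator": (10, 3),
    "Pro": (20, 5),
}


def generations_for_voice_design(preview_count: Optional[int] = None) -> int:
    """Each requested preview counts as one generation; an omitted count means one."""
    return 1 if preview_count is None else preview_count


def expected_outcome(plan: str, in_flight: int, new_generations: int = 1) -> str:
    """Hypothetical client-side estimate of how a new request would be handled.

    in_flight is the number of generations already admitted, running, or queued
    for this user across Studio and the Developer API.
    Assumption: the whole request must fit; partial admission is not modeled.
    """
    concurrent, queued = PLAN_LIMITS[plan]
    total = in_flight + new_generations
    if total <= concurrent:
        return "admitted"
    if total <= concurrent + queued:
        return "queued"
    return "429 GENERATION_CONCURRENCY_EXCEEDED"
```

For example, under these assumptions a Starter user with six generations in flight would expect the seventh to be queued, while a Free user with three in flight would expect a 429 because the Free plan has no queue. A 503 GENERATION_CAPACITY_EXCEEDED cannot be predicted this way, since it depends on shared capacity rather than the per-user pool.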
Best practices
- Use exponential backoff with jitter for 429 GENERATION_CONCURRENCY_EXCEEDED and transient 5xx responses.
- Group text into natural requests instead of sending one word at a time.
- Treat async text-to-speech as a background job and poll for completion; async delivery does not bypass concurrent generation limits.
- Cache responses: identical inputs are billed again on every request.
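The backoff recommendation can be sketched as a small retry helper. This is a minimal example under stated assumptions, not an official client: `call_with_retry`, `backoff_delay`, and `is_retryable` are hypothetical names, `send` stands in for whatever function performs the HTTP request and returns a status code and body, and the 0.5 second base delay, 30 second cap, and five attempts are arbitrary illustrative values.

```python
import random
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    # 429 means the per-user pool and queue are full; 5xx covers transient
    # server errors, including 503 GENERATION_CAPACITY_EXCEEDED.
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status, body = send()
    for attempt in range(max_attempts - 1):
        if not is_retryable(status):
            break
        # Wait a randomized, exponentially growing interval before retrying.
        sleep(backoff_delay(attempt))
        status, body = send()
    return status, body
```

Randomizing the delay keeps many clients that were rejected at the same moment from retrying in lockstep. Non-retryable statuses such as other 4xx errors are returned immediately so that the caller can fix the request instead of waiting.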