Rate limits

Breeze applies plan-based concurrent generation limits to protect long-running synthesis capacity. Standard API requests are not currently throttled by a published request-per-minute quota.

Concurrent generations

Concurrent generations are synthesis jobs that have been admitted or are running. Queued generations are jobs held briefly in the background until admitted capacity frees up. Studio and Developer API jobs draw from the same per-user pool. Text-to-speech, streaming text-to-speech, async text-to-speech, voice design, and voice cloning all count toward these limits. Voice design counts each requested preview as one generation; when preview_count is omitted, Breeze generates one preview.
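As a rough illustration of the counting rules above, the sketch below computes how many generations one request uses. The job-type strings and the `generations_consumed` helper are hypothetical names, not part of the Breeze API, and it assumes that every counted job type other than voice design uses exactly one generation.

```python
from typing import Optional

# Hypothetical job-type labels; Breeze's actual endpoint names may differ.
COUNTED_JOB_TYPES = {
    "text_to_speech",
    "streaming_text_to_speech",
    "async_text_to_speech",
    "voice_design",
    "voice_cloning",
}


def generations_consumed(job_type: str, preview_count: Optional[int] = None) -> int:
    """Return how many generations one request counts against the limits.

    Voice design counts each requested preview as one generation and
    defaults to a single preview when preview_count is omitted. Other job
    types are assumed (not confirmed) to count as one generation each.
    """
    if job_type not in COUNTED_JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type}")
    if job_type == "voice_design":
        count = 1 if preview_count is None else preview_count
        if count < 1:
            raise ValueError("preview_count must be at least 1")
        return count
    return 1
```

For example, a voice design request with preview_count set to 3 would occupy three generations, the same as three separate text-to-speech jobs.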

Plan      Concurrent generations   Queued generations
Free      3                        0
Starter   6                        2
Creator   10                       3
Pro       20                       5

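One way to avoid most 429 responses is to cap in-flight jobs on the client at your plan's concurrent limit. The sketch below is a minimal illustration using Python's asyncio.Semaphore. `PLAN_LIMITS` transcribes the table above; `GenerationLimiter` and the job callables it runs are hypothetical and not part of any Breeze SDK.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Per-plan limits from the table: (concurrent generations, queued generations).
PLAN_LIMITS = {
    "Free": (3, 0),
    "Starter": (6, 2),
    "Creator": (10, 3),
    "Pro": (20, 5),
}


class GenerationLimiter:
    """Hypothetical client-side cap on in-flight generations for one plan."""

    def __init__(self, plan: str) -> None:
        concurrent, _queued = PLAN_LIMITS[plan]
        self.max_in_flight = concurrent
        self._semaphore = asyncio.Semaphore(concurrent)

    async def run(self, job: Callable[[], Awaitable[T]]) -> T:
        # Wait locally for a free slot instead of relying on the server's
        # short queue, which is small (or zero on Free) and rejects with 429 when full.
        async with self._semaphore:
            return await job()
```

Because Studio jobs share the same per-user pool and a voice design request can use several generations, this local cap counts one slot per job and reduces, but does not eliminate, 429 responses. Keep retry handling in place as well.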
When both admitted and queued generation capacity are full, the API returns 429 GENERATION_CONCURRENCY_EXCEEDED. When Breeze's shared generation capacity is temporarily unavailable, the API returns 503 GENERATION_CAPACITY_EXCEEDED.
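The admission rule in the previous paragraph can be modeled as a simple decision. The sketch below is a simplified model, not Breeze's server logic: it assumes each request uses one generation and ignores the 503 shared-capacity case. `PoolState` and `admission_outcome` are hypothetical names for illustration.

```python
from dataclasses import dataclass

# Per-plan limits from the table: (concurrent generations, queued generations).
PLAN_LIMITS = {
    "Free": (3, 0),
    "Starter": (6, 2),
    "Creator": (10, 3),
    "Pro": (20, 5),
}


@dataclass
class PoolState:
    admitted: int  # generations currently admitted or running
    queued: int    # generations waiting for admitted capacity


def admission_outcome(plan: str, state: PoolState) -> str:
    """Classify one new generation request against a user's pool (simplified)."""
    concurrent_limit, queue_limit = PLAN_LIMITS[plan]
    if state.admitted < concurrent_limit:
        return "admitted"
    if state.queued < queue_limit:
        return "queued"
    # Both admitted and queued capacity are full.
    return "429 GENERATION_CONCURRENCY_EXCEEDED"
```

On the Free plan, which has no queued capacity, a fourth simultaneous generation goes straight to 429 in this model, while a Starter user at six running jobs still has two queue slots.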

Best practices

  • Use exponential backoff with jitter for 429 GENERATION_CONCURRENCY_EXCEEDED and transient 5xx responses.
  • Group text into natural units, such as full sentences or paragraphs, instead of sending one word at a time.
  • Treat async text-to-speech as a background job and poll for completion; async delivery does not bypass concurrent generation limits.
  • Cache responses for repeated inputs; identical inputs are billed again each time they are sent.
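The backoff and caching practices above can be sketched together as follows, assuming a synchronous client. `BreezeAPIError`, `with_backoff`, `cached_synthesize`, and the request dictionary shape are hypothetical; a real Breeze client may report errors differently. The backoff uses "full jitter" (a random delay between zero and an exponentially growing cap), and the cache is an unbounded in-memory dictionary suitable only for illustration.

```python
import hashlib
import json
import random
import time
from typing import Any, Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class BreezeAPIError(Exception):
    """Hypothetical error carrying the HTTP status and Breeze error code."""

    def __init__(self, status: int, code: Optional[str] = None) -> None:
        super().__init__(f"{status} {code or ''}".strip())
        self.status = status
        self.code = code


def is_retryable(err: BreezeAPIError) -> bool:
    # Retry concurrency rejections and 5xx responses (including 503 capacity errors).
    concurrency = err.status == 429 and err.code == "GENERATION_CONCURRENCY_EXCEEDED"
    return concurrency or 500 <= err.status < 600


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.5,
                 max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying retryable errors with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except BreezeAPIError as err:
            if not is_retryable(err) or attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # random delay in [0, cap]
    raise ValueError("max_attempts must be at least 1")


_cache: Dict[str, bytes] = {}


def cached_synthesize(request: Dict[str, Any],
                      synthesize: Callable[[Dict[str, Any]], bytes]) -> bytes:
    # Key on canonical JSON so identical inputs map to the same cache entry.
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = with_backoff(lambda: synthesize(request))
    return _cache[key]
```

Jitter spreads retries from many clients over time so they do not all hit the pool at once. Caching by input only makes sense when a repeated identical request should return the same audio.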