ElevenLabs as Voice Agent Infrastructure

Diving deeper into the claim from "ElevenLabs at $90M ARR" that its premium-priced, human-sounding text-to-speech model will become core infrastructure for the rising wave of AI voice agents.

ElevenLabs is trying to become the voice layer that every AI agent plugs into, not just a tool creators use to make voiceovers. That matters because voice agents need speech that sounds natural enough for a real customer call and is generated fast enough for live back-and-forth conversation. ElevenLabs already had premium positioning in voice cloning and text-to-speech, then expanded into a broader agents stack with speech, telephony, and multimodal agent tooling, which makes it easier to stay inside the workflow as voice agents move from demos to production.

  • The clearest proof point is that AI agent companies are already building on top of it. Decagon launched voice support with ElevenLabs in February 2025, using it inside customer service agents that handle account access, returns, and charge disputes across phone, chat, and email.
  • The wedge is quality, but the tradeoff is price. In late 2024, ElevenLabs was estimated to cost roughly 5x as much as Cartesia per minute, so buyers keep paying the premium only if the better voice quality lifts call completion, customer satisfaction, or brand trust enough to offset the higher usage cost.
  • The market is converging from separate speech modules into full voice stacks. Cartesia now bundles text-to-speech, speech-to-text, and agent tools under one pricing surface, while ElevenLabs has expanded from speech generation into a full agents platform with phone, web, and app channels and support for both text and voice.
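
The price-versus-quality tradeoff in the second bullet can be made concrete with a rough break-even sketch. Only the roughly 5x price multiple comes from the estimate above; the base per-minute price, the call length, the value of a completed call, and both function names are hypothetical placeholders chosen for illustration, not figures or terms from either vendor.

```python
def extra_cost_per_call(base_price_per_min, price_multiple, call_minutes):
    """Added spend per call from choosing the premium voice vendor.

    base_price_per_min: cheaper vendor's per-minute price (hypothetical input)
    price_multiple: premium price as a multiple of the cheaper one (about 5 per the estimate)
    call_minutes: average call length in minutes (hypothetical input)
    """
    return (price_multiple - 1) * base_price_per_min * call_minutes


def break_even_completion_lift(base_price_per_min, price_multiple,
                               call_minutes, value_per_completed_call):
    """Increase in call completion rate (as a fraction) needed to pay for the premium.

    Assumes the only benefit of better voice quality is more completed calls,
    each worth value_per_completed_call. This is a deliberate simplification.
    """
    added_cost = extra_cost_per_call(base_price_per_min, price_multiple, call_minutes)
    return added_cost / value_per_completed_call


if __name__ == "__main__":
    # Hypothetical inputs: $0.02/min base price, 4-minute calls, $8 per completed call.
    lift = break_even_completion_lift(0.02, 5, 4, 8.0)
    print(f"Break-even completion lift: {lift:.1%}")
```

Under these placeholder inputs the premium adds about $0.32 per call, so completion would need to rise by about four percentage points to break even. The same framing applies if the benefit is measured in customer satisfaction or retention instead: the premium holds only while the measurable uplift per call exceeds the added per-call cost.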

The next step is a fight over who owns the whole voice agent stack. If ElevenLabs keeps winning on realism while adding more workflow, routing, and deployment features, it can move from premium model vendor to default infrastructure for enterprise voice agents. That would shift it closer to the center of customer support, sales, healthcare, and scheduling software.