Cartesia's Cost Edge Under Threat
The real threat is not just cheaper pricing; it is that bundlers are turning price into a feature of distribution. Cartesia used to win by being the fast, low-cost speech layer that developers plugged into a separate agent stack. As ElevenLabs and Deepgram package speech, orchestration, and deployment together, buyers increasingly compare one monthly voice agent bill against another, not one TTS endpoint against another.
- ElevenLabs now sells voice agents starting at $0.10 per minute, down about 50%, and explicitly says the cut comes from owning both the model research and the product layer. That matters because it narrows the gap between Cartesia as a specialist component and ElevenLabs as a bundled stack.
- Deepgram is pushing the same bundling logic from the STT side. Its Voice Agent API rolls STT, TTS, orchestration, and deployment choices into one product at $4.50 per hour, or $0.075 per minute, which makes single-vendor procurement easier for contact center and regulated enterprise buyers.
- Cartesia is answering by climbing the stack itself. Sonic handles low-latency speech generation, Ink handles live transcription on noisy calls, and Line adds telephony, logging, evaluations, and deployment. That turns Cartesia from a cheaper voice model into a fuller operating layer for teams that still want best-in-class component performance and private deployment.
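
The bundled prices quoted above for ElevenLabs and Deepgram can be put on the same per-minute footing with a few lines of arithmetic. The sketch below is illustrative only: the two prices come from the text, while the monthly volume of 100,000 minutes and the helper names `per_minute_from_hourly` and `monthly_bill` are assumptions introduced here. It uses list prices and ignores volume tiers, discounts, and any separately billed components.

```python
# Prices are the figures quoted in the text. The monthly call volume is a
# hypothetical input chosen only to illustrate the comparison.
ELEVENLABS_USD_PER_MINUTE = 0.10  # ElevenLabs voice agents, starting price
DEEPGRAM_USD_PER_HOUR = 4.50      # Deepgram Voice Agent API


def per_minute_from_hourly(usd_per_hour: float) -> float:
    """Convert an hourly rate to a per-minute rate."""
    return usd_per_hour / 60


def monthly_bill(usd_per_minute: float, minutes: int) -> float:
    """List-price bill for a month of agent minutes, rounded to cents."""
    return round(usd_per_minute * minutes, 2)


deepgram_usd_per_minute = per_minute_from_hourly(DEEPGRAM_USD_PER_HOUR)

if __name__ == "__main__":
    minutes = 100_000  # hypothetical monthly volume
    print(f"Deepgram per minute: ${deepgram_usd_per_minute:.3f}")
    for vendor, rate in [
        ("ElevenLabs", ELEVENLABS_USD_PER_MINUTE),
        ("Deepgram", deepgram_usd_per_minute),
    ]:
        print(f"{vendor}: ${monthly_bill(rate, minutes):,.2f} per month")
```

At the assumed volume, the two bundled list prices differ by $0.025 per minute, which is the kind of whole-bill comparison buyers are now making instead of comparing standalone TTS rates.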
The next phase of this market is a fight over who owns the whole real-time voice loop. Pricing pressure will keep collapsing standalone model premiums, so Cartesia's path is to make its latency, on-prem deployment, and enterprise control valuable enough that customers buy the integrated Cartesia stack, not just the cheapest minute of speech.