Third-Party AI Dependence Caps Margins
Retell AI
Retell’s biggest strategic risk is that it sits in the thinnest layer of the stack, the orchestration layer, where customer value is real but the core intelligence and much of the cost base still belong to someone else. Retell charges a platform fee of $0.055 per minute on top of pass-through costs for the model, speech-to-text, and text-to-speech. That makes growth efficient today, but it caps how much gross margin Retell can capture unless it owns more of the voice pipeline or adds higher-value workflow software around it.
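A minimal sketch of the per-minute arithmetic behind that margin cap, assuming pass-through costs are billed at cost and counted in revenue (an accounting assumption not stated in the source). Only the $0.055 fee comes from the text; the pass-through values, the `own_cost_per_min` parameter, and the function name `gross_margin_share` are hypothetical and chosen for illustration.

```python
PLATFORM_FEE_PER_MIN = 0.055  # USD per minute, the platform fee cited in the text


def gross_margin_share(pass_through_per_min: float,
                       own_cost_per_min: float = 0.0) -> float:
    """Gross profit as a share of total billed revenue for one call minute.

    Assumption: pass-through provider costs are billed to the customer at
    cost, so they add to revenue and to cost of goods sold in equal measure.
    """
    revenue = PLATFORM_FEE_PER_MIN + pass_through_per_min
    gross_profit = PLATFORM_FEE_PER_MIN - own_cost_per_min
    return gross_profit / revenue


if __name__ == "__main__":
    # Hypothetical pass-through bills, not actual provider prices.
    for pass_through in (0.03, 0.06, 0.12):
        share = gross_margin_share(pass_through)
        print(f"pass-through ${pass_through:.2f}/min: margin share {share:.0%}")
```

Under these assumptions the fee per minute stays fixed, so the margin share falls as the pass-through bill grows. Capturing more margin requires either taking over part of the pass-through cost base or charging for something beyond the per-minute fee.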
- The product works by routing each call through separate providers for transcription, reasoning, and speech generation. That gives customers the flexibility to swap in OpenAI, Anthropic, Gemini, Deepgram, Cartesia, or ElevenLabs, but it also means Retell cannot fully control latency, reliability, or input costs the way a vertically integrated rival can.
- Vapi has a nearly identical business model, with a per-minute platform fee layered on top of third-party telephony and AI costs. The dependency is therefore not unique to Retell; it is a structural feature of the model-agnostic voice platform category, which makes durable differentiation harder unless one player builds proprietary infrastructure or stronger application software.
- The clearest contrast is Bland AI, which is described as hosting its own speech recognition, language model, and speech systems in-house. That kind of integration can improve real-time performance and preserve more economics per call, while Retell’s model-agnostic approach is better for speed, flexibility, and developer adoption early on.
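The per-call routing across transcription, reasoning, and speech generation described above can be sketched as a three-stage pipeline with swappable providers. This is an illustrative model, not Retell's implementation: the `Stage` and `Pipeline` types, the vendor labels, and all cost and latency figures are hypothetical, and summing stage latencies assumes the stages run strictly in sequence, which streaming systems may not do.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stage:
    provider: str        # placeholder vendor label; no real API is called
    cost_per_min: float  # hypothetical USD per call minute
    latency_ms: float    # hypothetical contribution to per-turn latency


@dataclass(frozen=True)
class Pipeline:
    stt: Stage  # speech-to-text (transcription)
    llm: Stage  # language model (reasoning)
    tts: Stage  # text-to-speech (speech generation)

    def stages(self) -> tuple:
        return (self.stt, self.llm, self.tts)

    def pass_through_cost(self) -> float:
        """Total third-party cost per minute across all stages."""
        return sum(s.cost_per_min for s in self.stages())

    def turn_latency_ms(self) -> float:
        """Per-turn latency, assuming the stages run one after another."""
        return sum(s.latency_ms for s in self.stages())


if __name__ == "__main__":
    base = Pipeline(
        stt=Stage("stt-vendor-a", 0.01, 100.0),
        llm=Stage("llm-vendor-a", 0.05, 400.0),
        tts=Stage("tts-vendor-a", 0.03, 150.0),
    )
    # Swapping one provider changes cost and latency for the whole call,
    # but both figures are still set by outside vendors.
    swapped = replace(base, llm=Stage("llm-vendor-b", 0.02, 300.0))
    for name, p in (("base", base), ("swapped", swapped)):
        print(name, round(p.pass_through_cost(), 3), p.turn_latency_ms())
```

The design point is that the orchestrator chooses among stages but does not set their price or speed. A vertically integrated rival would instead own the values inside each `Stage`.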
The next phase of competition is likely to shift from who can connect the most APIs to who can own more of the stack or bundle enough workflow, routing, QA, testing, and analytics to become the system a contact center actually runs on. If Retell succeeds there, it can turn a thin infrastructure margin into a thicker software margin and make provider dependency matter less.