Vapi moving up stack for defensibility

suggesting a strategy of gradually moving up the stack to improve margins and defensibility.

This points to a classic infrastructure climb: start as the control layer, then internalize the expensive, performance-critical pieces once usage is large enough to justify it. Vapi began by stitching together telephony, speech, and model APIs, which kept the product flexible and fast to ship but left gross margin and reliability exposed to outside vendors. Building its own concurrency and audio transport layer is the first step toward owning more of the real-time call path.

  • Today, Vapi mainly monetizes with a $0.05 per minute platform fee while passing through telephony, transcription, model, and voice costs. That makes adoption easy for developers, but it also means much of the spend on each call still flows to suppliers. Owning more of the stack lets Vapi keep a larger share of each call.
  • The competitive benchmarks are companies like Bland AI and Deepgram, which already push deeper into owned speech infrastructure. Deepgram now sells speech-to-text, text-to-speech, and a Voice Agent API, showing how voice vendors expand from one layer into a fuller stack to capture more budget and reduce dependence on partners.
  • There is also a defensive reason to move up the stack. If Vapi remains only an orchestration layer, core providers can add orchestration themselves, as Cartesia has done with Line. By owning more of the runtime, deployment, and observability layers, Vapi becomes harder to replace than a simple router between third-party APIs.
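
To make the take-rate argument in the first bullet concrete, here is a rough sketch of the arithmetic. Only the $0.05 per minute platform fee comes from the text; the vendor costs below are hypothetical placeholders, not Vapi's actual pass-through prices, and the function measures share of customer spend, not margin, since it ignores what it would cost Vapi to run an internalized layer.

```python
# Only this figure comes from the text above.
PLATFORM_FEE_PER_MIN = 0.05

# HYPOTHETICAL per-minute pass-through costs, for illustration only.
HYPOTHETICAL_PASS_THROUGH = {
    "telephony": 0.01,
    "transcription": 0.01,
    "model": 0.02,
    "voice": 0.03,
}


def platform_share(fee, pass_through, internalized=()):
    """Return the fraction of total per-minute customer spend that the
    platform collects for itself.

    Layers named in `internalized` are billed by the platform instead of
    being passed through to an outside vendor. The platform's own cost of
    running those layers is deliberately ignored here.
    """
    total_spend = fee + sum(pass_through.values())
    kept = fee + sum(pass_through[layer] for layer in internalized)
    return kept / total_spend


if __name__ == "__main__":
    baseline = platform_share(PLATFORM_FEE_PER_MIN, HYPOTHETICAL_PASS_THROUGH)
    with_transport = platform_share(
        PLATFORM_FEE_PER_MIN, HYPOTHETICAL_PASS_THROUGH, internalized=("telephony",)
    )
    print(f"orchestration only: {baseline:.0%} of spend")
    print(f"owning telephony:   {with_transport:.0%} of spend")
```

Under these assumed numbers the platform fee is well under half of what the customer pays per minute, and each layer brought in-house raises that share. Whether it also raises margin depends on the cost of operating the layer, which this sketch does not model.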

The likely next step is not for Vapi to build every model itself, but to keep absorbing the layers that most affect latency, uptime, enterprise deployment, and take rate. That would turn it from a flexible developer wrapper into a real voice infrastructure platform, with better margins, stronger switching costs, and more leverage against both model vendors and contact center incumbents.