Vapi control plane for voice

developers can use default providers or "bring their own" for each component

This modularity is Vapi's real wedge, because it lets customers keep Vapi's call orchestration while swapping the expensive and sensitive parts of the stack underneath. In practice, a team can use Vapi for turn-taking, interruptions, call routing, and tool calling, while choosing its own speech-to-text, model, voice, telephony, and storage setup to hit a specific cost target, latency target, or compliance rule such as HIPAA data handling.
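As a rough illustration of that split, the sketch below models a voice stack as a record whose orchestration field stays fixed while individual components are swapped. The `VoiceStack` class, its field names, and the provider labels are hypothetical placeholders chosen for this example; they are not Vapi's actual configuration schema.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class VoiceStack:
    """Hypothetical model of a modular voice stack (not Vapi's real schema)."""
    transcriber: str
    model: str
    voice: str
    telephony: str
    storage: str
    orchestration: str = "vapi"  # the layer that stays put in every variant


# Placeholder provider labels, assumed for illustration only.
default_stack = VoiceStack(
    transcriber="default-stt",
    model="default-llm",
    voice="default-tts",
    telephony="default-telephony",
    storage="default-storage",
)

# A regulated deployment swaps only the sensitive components.
regulated_stack = replace(
    default_stack, transcriber="byo-stt", storage="byo-storage"
)


def changed_components(a: VoiceStack, b: VoiceStack) -> list[str]:
    """Return the names of the fields that differ between two stacks."""
    return sorted(
        f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)
    )
```

Comparing the two stacks with `changed_components(default_stack, regulated_stack)` lists only the swapped components, which is the point of the design: the orchestration field never appears in the difference.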

  • The money flow is unusually transparent. Vapi adds a platform fee of $0.05 per minute, then layers in telephony, transcription, model, and voice costs from whatever providers power the call. That makes provider choice a direct lever on gross margin and customer pricing, not just a technical preference.
  • This also makes Vapi more of a control plane than a model vendor. Its own docs show bring-your-own-key support for transcriber, LLM, voice, transport, and storage, while the orchestration layer stays on Vapi. That means switching from one voice or speech provider to another does not require rebuilding the whole agent.
  • The contrast with the market is clear. Deepgram now offers a bundled voice agent API with optional bring-your-own LLM and TTS, and OpenAI's realtime architecture pushes even further toward one model handling voice end to end. Vapi is betting many teams still want the freedom to mix providers instead of accepting a single vendor stack.
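The pricing point in the first bullet can be made concrete with a small calculation. The sketch below takes the $0.05 per minute platform fee from the text and adds provider costs on top. Every provider rate and the customer price are invented for illustration and are not quoted prices from any vendor.

```python
PLATFORM_FEE_PER_MIN = 0.05  # Vapi platform fee per minute, as stated above


def cost_per_minute(provider_costs: dict[str, float]) -> float:
    """Total cost of one call minute: platform fee plus all provider costs."""
    return PLATFORM_FEE_PER_MIN + sum(provider_costs.values())


def gross_margin(price_per_min: float, provider_costs: dict[str, float]) -> float:
    """Gross margin as a fraction of the price charged to the end customer."""
    return (price_per_min - cost_per_minute(provider_costs)) / price_per_min


# Assumed, illustrative per-minute provider rates (not real price lists).
baseline = {
    "telephony": 0.01,
    "transcription": 0.01,
    "model": 0.02,
    "voice": 0.04,
}

# Same stack with a cheaper voice provider swapped in.
cheaper_voice = {**baseline, "voice": 0.02}

PRICE_PER_MIN = 0.20  # assumed price charged to the end customer

for label, costs in [("baseline", baseline), ("cheaper voice", cheaper_voice)]:
    print(
        f"{label}: cost ${cost_per_minute(costs):.2f}/min, "
        f"margin {gross_margin(PRICE_PER_MIN, costs):.0%}"
    )
```

Under these assumed numbers, changing a single provider moves cost from about $0.13 to about $0.11 per minute and margin from roughly 35% to roughly 45%, which is why provider choice works as a pricing lever and not only as a technical preference.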

The next step is a split market. Simpler use cases will move toward bundled speech-to-speech stacks, while larger and regulated deployments will keep favoring orchestration layers that let teams choose each component separately. If Vapi keeps owning the orchestration layer while providers compete underneath it, its position gets stronger as the neutral hub for production voice agents.