AI Avatar Infrastructure Wins
Chris Savage, CEO of Wistia, on the economics of AI avatars
The real power in AI avatars sits below the app, because the winning layer is the one every sales tool, training tool, support tool, and video platform can plug into. In practice, that means selling avatar generation, voice, translation, streaming, and consent workflows as metered infrastructure, then letting dozens of application companies bundle those capabilities into their own products and push unit costs down with scale.
- The market is already splitting this way. Synthesia and HeyGen package avatars into end-user video apps with editing, hosting, and templates, while Tavus is going API-first so avatars can be embedded inside software like HubSpot, Intercom, or Shopify instead of forcing users into a separate creation tool.
- Infrastructure wins when usage fragments across many workflows. A training team needs cheap batch video generation, a sales team needs personalized outbound clips, and a website agent needs real-time conversational streaming. Tavus prices around minutes and streams, and HeyGen also sells separate API plans, which shows the stack is becoming a usage-based input rather than a standalone app purchase.
- This follows the same pattern as other API layers. Recall.ai turned meeting bots into a universal API sold per hour of meeting data, abstracting away dozens of brittle integrations. Avatar infrastructure can do the same for consent capture, rendering, lip sync, translation, and delivery across many surfaces, while application companies keep the customer relationship.
As avatar quality rises and cost per minute falls, more software companies will treat video faces the way they treat payments, maps, or transcription today: as embedded infrastructure bought wholesale and marked up inside their own product. The companies that own that layer will compound fastest because every new use case adds volume, data, and distribution without requiring them to win the end application each time.