Infrastructure wins in AI avatars
Chris Savage, CEO of Wistia, on the economics of AI avatars
The winning layer in AI avatars is likely to look less like software sold seat by seat and more like metered compute sold by the minute. The reason is simple. As dubbing, voice cloning, and avatar rendering get cheaper, the provider with the deepest API usage, the most volume, and the thinnest target margin can keep dropping price and pull more demand onto its rails, the same way AWS and Twilio expanded markets by making usage cheap and easy.
- Infrastructure players monetize repeated usage, not one-time creation. Twilio prices voice, messaging, and functions in pay-as-you-go units, and Tavus sells developer plans by minutes, interactions, and concurrency, with volume discounts. That model makes lower prices a growth lever, because cheaper units create more total traffic.
- In AI video, this favors API-first suppliers that can sit underneath many apps at once. Tavus is positioned as an avatar API, HeyGen mixes a creation app with developer APIs, and Synthesia has pushed upward into hosting, analytics, and publishing. The more product surface a company owns, the less purely it behaves like infrastructure.
- The market is already moving this way. The cost of creating an AI avatar video collapsed from roughly $10,000 for a production workflow to about $30 and minutes of generation time, which opened up onboarding, compliance, sales outreach, and translation use cases that were previously too expensive to justify. Lower unit cost did not just win share; it created entirely new demand.
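The claim that cheaper units can grow the business rests on demand being price-elastic: volume has to rise faster than price falls. A minimal sketch of that arithmetic follows, assuming a constant-elasticity demand curve. Every number here (the per-minute price, the monthly volume, the elasticity of 1.5) is hypothetical and chosen only for illustration; none of it comes from Twilio, Tavus, or any other vendor's actual pricing.

```python
def demand(price, base_price, base_volume, elasticity):
    """Constant-elasticity demand: volume scales as (price / base_price) ** -elasticity."""
    return base_volume * (price / base_price) ** (-elasticity)


def revenue(price, base_price, base_volume, elasticity):
    """Metered revenue is simply price per unit times units consumed."""
    return price * demand(price, base_price, base_volume, elasticity)


# Hypothetical starting point, for illustration only.
BASE_PRICE = 1.00        # dollars per rendered minute (assumed)
BASE_VOLUME = 1_000_000  # rendered minutes per month (assumed)
ELASTICITY = 1.5         # assumed; any value above 1 means demand is elastic

for price in (1.00, 0.50, 0.25):
    minutes = demand(price, BASE_PRICE, BASE_VOLUME, ELASTICITY)
    dollars = revenue(price, BASE_PRICE, BASE_VOLUME, ELASTICITY)
    print(f"price ${price:.2f}/min -> {minutes:,.0f} min, revenue ${dollars:,.0f}")
```

Under these assumptions each price cut raises total revenue, because volume grows by more than the price shrinks. If the elasticity were below 1, the same cuts would shrink revenue, so the infrastructure thesis depends on cheap avatar minutes unlocking new use cases and not merely discounting existing ones. The sketch also ignores unit cost; a provider still needs its cost per minute to fall at least as fast as its price.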
Over time, the category should split cleanly in two. A few high-volume infrastructure layers will compete on cost, reliability, and APIs, while application companies bundle avatars into broader video workflows for marketing, training, and sales. As prices keep falling, the largest opportunity shifts from selling premium avatar magic to powering millions of routine video interactions invisibly in the background.