Background
Chris Savage is the CEO and co-founder of Wistia. We talked with Chris about the gross margins and pricing models we can expect from AI video companies like HeyGen and Synthesia, the commoditization of the infrastructure players enabling AI video, and how AI-generated video will change the volume and frequency with which businesses create video.
Questions
- What’s the big use case that you see driving adoption of AI avatars or talking head videos?
- How do you expect AI talking heads, text-to-video and video editing via editing text to impact the overall volume and frequency of video content creation by businesses? How does that affect the video hosting business?
- Being able to have your videos instantly translated into dozens of languages has become a key selling point for AI avatar platforms. Can you talk about how big of an impact translation will have as a feature? What use case do you see it being most useful for—marketing, sales, learning & development, etc?
- What do you anticipate the gross margins to be for AI talking head video platforms? How do the margins compare to traditional video hosting, factoring in the additional cost of AI models and compute for the AI avatars, plus translation, transcription and more?
You're saying that it makes the most sense for us to end up in a world where, e.g., the sales and marketing AI avatar experience lives in HubSpot, the customer service experience lives in Intercom, and maybe the ecommerce one lives in Shopify.
- How do you think about Wistia’s place in this? All the video being created with AI has to be hosted somewhere—do you think about Wistia mainly as the place for all these AI videos to be hosted?
- How do you think AI talking head platforms will think about pricing to balance the value they’re delivering with creating sustainable economics? Will it be per avatar, per video duration, per render, or some other metric?
Interview
What’s the big use case that you see driving adoption of AI avatars or talking head videos?
It's hard for me to say. The answer to this question can be different depending on whether you're the creator of these models. You talk to the people who are making this stuff, and they're like, "It's ready for prime time. It looks real."
My estimation is it's really close. I think there are certain cases where it looks real, but I don't think that's the default. Still, there are certain applications of video where it doesn't have to look real, and where entertaining someone and hooking them to stick with you aren't required.
At least today, the applications where businesses are using AI avatars are things like training content. A lot of corporate learning and training is already robotic, and it's not entertaining. You write a script of exactly what people need to know; then the regulations change, you tweak the script, and you regenerate the video. You don't have to reshoot. I think that's probably where the product-market fit is today.
How do you expect AI talking heads, text-to-video and video editing via editing text to impact the overall volume and frequency of video content creation by businesses? How does that affect the video hosting business?
What a lot of these AI avatar companies are really doing is increasing quality. When it's done perfectly, it helps you increase the quality of the videos you're making in a lot of cases. It’s giving you lighting that looks really good, or it's hiding edits, or it's fixing flubs.
The upshot is that there's going to be way, way more video in general. Because of COVID, we're already on a trend toward recognizing that everyone learns differently.
Some people want to read. Some people want to listen. Some people want to watch. Well, if you're not giving people ways to watch, you are going to be missing out big time because the world around you is changing so quickly. And so I think about that a lot. That trend's already happening. This is only going to accelerate that.
On the flip side, we always want to know who to hold responsible for things. If someone overpromises and underdelivers, you remember that that person overpromised. If someone says they're going to give you a great product and it isn't great, you remember that.
AI avatars are going to make creating a connection to the individual person and understanding that relationship even more important.
Being able to have your videos instantly translated into dozens of languages has become a key selling point for AI avatar platforms. Can you talk about how big of an impact translation will have as a feature? What use case do you see it being most useful for—marketing, sales, learning & development, etc?
I think the AI translation behind the scenes of all this, specifically, is getting so good that it's going to change how companies think about going international. There are countries where, before, you would have thought, "Man, to translate all of this, I'd need a huge team constantly updating it."
Now, I think you'll be able to use AI translation tools, but you'll still need native speakers to check that the result actually makes sense and to tweak it. This is going to allow for much more internationalization, and checking and tweaking are going to become much more important.
And this is a bit of an aside, but when you think about the changes here, I ask myself what percentage of content today is translated into different languages—say, business content versus infotainment. It's got to be unbelievably low.
Generously, let's say it's one percent. When you can trust AI to do this, it changes the job entirely. In a corporate setting, people are still going to want to make sure the messages they're putting out are accurate, that they're not offending people, and that they're not doing things that shouldn't be done.
You might be increasing the demand for translation services by like 10x, but it's a different type of work. It's not zero to 100; it's like 99 to 100. The amount of translated content might be 10 times more, and the number of people checking it might dramatically increase.
It's just a funny little example of how something like this might shake out, and I think it's a pretty likely scenario: AI basically creates much more work, but work that's only a slight tweak on what people are doing today.
What do you anticipate the gross margins to be for AI talking head video platforms? How do the margins compare to traditional video hosting, factoring in the additional cost of AI models and compute for the AI avatars, plus translation, transcription and more?
I think that's going to be similar to how CDNs have worked. The first generation of it was very expensive, very capital intensive to get the servers in place. They could charge a fair amount to be like, "Hey, your content's actually fast. Can we deliver it everywhere?"
As the technology has evolved and there's much more competition, it ultimately becomes an infrastructure play. It ends up being the cost of goods for other people. The lowest-price, most efficient option wins.
There are so many players in the avatar API space, similar to the large language models, where the cost is being driven to zero. We should expect a small number of big players and probably open source versions. I don't see why this would be that different.
The challenge is doing two miracles instead of one—having a massive technical breakthrough and building an application that's actually easy and solves the right problems in the right places.
I think the cost per minute of creating these videos will be driven down over time, and the quality will be pushed up. They'll be incorporated everywhere they make sense, which is going to be really broad. For example, personalized sales videos tied into your marketing automation platform (MAP) will differ from videos used for training, which will differ from live avatars on your site that you can talk to for help. Each use case will need a different approach.
If you think about how broad these problems are, once you have a solution that works well, like a fake human, it will be everywhere. And so I think the companies that figure out how to be the infrastructure behind it all are the ones most likely to win.
They're the ones who can get to scale and align themselves with all these partner applications, saying, "This is going to be part of your cost of goods. It's going to truly enable something new, and we're going to drive down the price as we get to scale." I think that's going to be what takes the day here.
You're saying that it makes the most sense for us to end up in a world where, e.g., the sales and marketing AI avatar experience lives in HubSpot, the customer service experience lives in Intercom, and maybe the ecommerce one lives in Shopify.
Absolutely, they're all going to do it without a doubt. They're going to look at all the API providers, play them against each other on price, and because they're so similar, they're going to be able to drive the price down significantly.
I think it's really scary if you’re trying to be the infrastructure layer. To win, you have to be the most trusted and deliver the highest quality.
If you're trying to be both the infrastructure business and the application business at the same time, that’s very tricky, especially for a startup.
How do you think about Wistia’s place in this? All the video being created with AI has to be hosted somewhere—do you think about Wistia mainly as the place for all these AI videos to be hosted?
Yes, we already see tons of AI videos coming into Wistia. We are the application layer, so we're not just an infrastructure provider. We aim to give insights into how videos are performing, provide tools to edit and improve videos, and offer the ability to record and to run live events.
We're for the marketer who needs things to be simple, fast, easy, and reliable. Will we integrate AI features into our platform over time? Absolutely. We'll be one of those applications: if you're making marketing videos and need to tweak things in your editor, we'll evaluate these tools, and when the fidelity is there, we'll incorporate them.
How do you think AI talking head platforms will think about pricing to balance the value they’re delivering with creating sustainable economics? Will it be per avatar, per video duration, per render, or some other metric?
I think it's going to push adoption toward the highest-value uses of these implementations first. People who use this to make a lot of training content will see their production costs drop dramatically, and they'll happily pay for it. But will someone use this for all their support videos right out of the gate? I don't know.
It depends on the pricing, and actually, translation and other services can get expensive. It's going to be a while until those costs come down.
What I expect to see is similar to what we've done at Wistia, where we translate your captions into other languages. Companies will have to decide whether to actually dub videos into different languages and when to do it.
If the pricing comes down, we'll see more people doing it. That's why I believe that the players in this space that are truly positioned as infrastructure players are going to be in the best position to keep cutting prices, like AWS or Twilio. They'll find the margin they want and keep lowering prices to enable more use cases. The first company to do so will be in a very unique position.
Disclaimers
This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.