
What is the per-query cost and usage comparison between video-based generative AI and ChatGPT's $0.05 per query with 2M users?

Cristóbal Valenzuela

Co-founder & CEO at Runway

It depends. Not all models are created equal, and not all models are used in the same way. In particular, video has a more complex inference process than the per-token queries of language models. More importantly, you need to factor in what type of video task the user is requesting. Video editing and video generation are more than just generating a series of frames, especially if you want to optimize for user expression and controllability. In contrast, large language models, like the one behind ChatGPT, take advantage of zero-shot or few-shot techniques, which can generate good results with little to no new data. These models can more easily generalize to a wide spectrum of downstream tasks with no additional training required, so one model can potentially solve multiple problems: copywriting, code generation, chat applications, and so on. The per-query costs of all those tasks will always be in the same ballpark, with inference optimizations being, for the most part, applicable to all tasks since it's the same model. But in the case of video, given the nature of the medium, the universe of transformations that can be made to all or parts of a video frame is a far more complex problem.
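To make the "different ballpark" point concrete, here is a back-of-envelope sketch of how per-query video cost scales with clip length and sampling time. The $0.05 LLM figure comes from the question above; every video-side number (clip length, frame rate, GPU seconds per frame, GPU price) is a hypothetical assumption for illustration, not Runway's actual economics:

```python
# Back-of-envelope per-query cost comparison: LLM vs. video generation.
# Only the $0.05 LLM figure comes from the question; all video-side
# numbers below are hypothetical assumptions for illustration.

LLM_COST_PER_QUERY = 0.05  # from the question above

# Hypothetical video-generation parameters
CLIP_SECONDS = 4
FPS = 24
GPU_SECONDS_PER_FRAME = 2.0   # assumed sampling time per frame
GPU_COST_PER_SECOND = 0.001   # assumed ~$3.60/hr cloud GPU

frames = CLIP_SECONDS * FPS
video_cost = frames * GPU_SECONDS_PER_FRAME * GPU_COST_PER_SECOND

print(f"frames per clip: {frames}")                                   # 96
print(f"video cost per query: ${video_cost:.3f}")                     # $0.192
print(f"ratio vs. LLM query: {video_cost / LLM_COST_PER_QUERY:.2f}x")  # 3.84x
```

The key structural difference the sketch captures: an LLM query's cost is roughly flat per query, while a video query's cost multiplies with frame count and per-frame compute, so longer clips or heavier models move the cost by whole multiples rather than marginal amounts.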

For Runway, the bet early on was to build a full-stack pipeline and uncover cost-effective ways to deploy those models for creative use cases. That also translates into finding the right unit economics to deploy these kinds of models to millions of users, considering all the quirks and nuances of how creatives work. It's a long-term investment, and one that has not necessarily been easy.

There are short-term products and long-term products. We're focused on building long-term products. We're not building a product that offers marginal innovations or incremental improvements to an existing system; we're interested in inventing and leapfrogging the current stack. That also needs to translate into our cost structure, making it cost-effective to use these models in production, which is something we've spent a lot of time doing.

Find this answer in Cristóbal Valenzuela, CEO of Runway, on the state of generative AI in video