Velvet
Tool for creative teams to instantly generate and edit cinematic AI videos

Funding

$500.00K

2025

Details
Headquarters
San Francisco, CA
CEO
Lucas Mantovani
Milestones
FOUNDING YEAR
2025

Valuation

Velvet raised approximately $500,000 in a pre-seed round as part of Y Combinator's Fall 2025 batch.

Product

As AI collapsed the time and cost to create video content from weeks and $10,000+ per video to minutes and $30/month by automating the entire production stack, a new bottleneck emerged: stitching together 20+ separate AI media creation tools—Google Veo for generation, ElevenLabs for voice, Midjourney for images, Runway for effects—each creating temporary files that users drag into professional editors like Adobe Premiere or DaVinci Resolve.

Velvet consolidates this fragmented workflow into a single browser-based video studio that integrates leading AI models behind a unified timeline editor, allowing users to generate, edit, and export videos without leaving the platform.

The platform aggregates multiple best-in-class AI video models including Google Veo, OpenAI Sora, Runway Gen-3, Tencent Kling, and ByteDance Seedance for video generation, plus ElevenLabs for voice synthesis and other specialized models for specific effects.

Users input text prompts or upload product photos, select their preferred AI model and video length (typically 5-10 seconds per clip), then Velvet queues the generation job on GPU backends and returns 2-4 candidate clips within 30-90 seconds.

These clips appear in a storyboard interface where users drag preferred shots onto a familiar non-linear timeline editor mirroring Premiere Pro's interaction model, with split and merge functions, cross-fades, trim tools, text overlays, logo insertion, aspect ratio presets (16:9, 9:16, 1:1), and color grading.

Enterprise customers get additional brand safety features that scan every frame for watermarks and compliance with brand guidelines before export, addressing a critical pain point as companies shift from internal training videos (where quality bars are lower) to external marketing content.

Collaboration features include Slack integration that automatically posts draft videos for team review and accepts regeneration commands directly through chat—"make the sunset more dramatic" or "replace the voiceover with a British accent"—without requiring users to log back into the web app.

The platform exports videos in multiple aspect ratios simultaneously, ready for different social media platforms and advertising slots, solving the repurposing workflow that typically requires separate tools.

Velvet's team demonstrated the platform's capabilities by producing their own YC launch video in approximately 5 hours for $50 in generation credits. The same task would have required $5,000-10,000 and 2-3 weeks using traditional video production, illustrating the 100-200x cost reduction and 10-50x speed improvement that AI video enables.

Business Model

Velvet monetizes high-velocity video creation via consumption-based credits rather than the slower expansion mechanics of storage and bandwidth that legacy video platforms like Vimeo and Wistia rely on, albeit at lower gross margins due to pass-through GPU compute costs.

Customers subscribe to monthly plans with predetermined credit allocations: the Picasso tier includes 140,000 credits (~120 Veo 3.1 video generations at ~1,167 credits per 5-10 second clip) while the Michelangelo tier offers 600,000 credits (~500 Veo 3.1 generations at ~1,200 credits per clip), plus 24/7 Slack support and brand compliance scanning.

The credit system abstracts away the complexity of different AI model pricing structures—a Veo generation might cost 1,200 credits while Runway Gen-3 costs 800 credits and Kling costs 1,000 credits—presenting unified pricing to customers while Velvet manages the underlying API relationships and usage fees with each model provider.
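
The credit arithmetic described here can be sketched in a few lines. This is a minimal illustration, not Velvet's actual billing logic: the per-model credit prices are the illustrative figures quoted in the text, and the names `CREDITS_PER_GENERATION`, `generations_available`, and `credits_for_job` are hypothetical.

```python
# Illustrative per-generation credit prices, taken from the example figures
# in the text. These are assumptions, not a published price list.
CREDITS_PER_GENERATION = {
    "veo": 1200,
    "runway_gen3": 800,
    "kling": 1000,
}


def generations_available(plan_credits: int, model: str) -> int:
    """Return how many whole generations a credit balance covers for one model."""
    return plan_credits // CREDITS_PER_GENERATION[model]


def credits_for_job(requests: dict[str, int]) -> int:
    """Return the total credits consumed by a mix of generations across models."""
    return sum(CREDITS_PER_GENERATION[model] * count for model, count in requests.items())


if __name__ == "__main__":
    for plan_name, plan_credits in (("Picasso", 140_000), ("Michelangelo", 600_000)):
        print(plan_name, generations_available(plan_credits, "veo"))
    print(credits_for_job({"veo": 2, "kling": 1}))
```

At a flat 1,200 credits per Veo clip, the 600,000-credit tier covers 500 generations, while the 140,000-credit tier covers 116, slightly under the ~120 quoted at ~1,167 credits per clip.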

This aggregation creates margin opportunities: if Velvet pays $0.10 per Veo generation through Google's API but charges customers $0.15-0.20 in credit equivalents, the platform captures a 50-100% markup over cost on the spread (equivalent to a 33-50% gross margin), though margins compress as competition intensifies and model providers lower prices.
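
The spread arithmetic can be checked with a short sketch that separates markup (profit relative to cost) from gross margin (profit relative to price). It uses the hypothetical $0.10 cost and $0.15-0.20 price from the example above; the function names are illustrative.

```python
def markup(cost: float, price: float) -> float:
    """Profit as a fraction of cost."""
    return (price - cost) / cost


def gross_margin(cost: float, price: float) -> float:
    """Profit as a fraction of the price charged."""
    return (price - cost) / price


if __name__ == "__main__":
    cost = 0.10  # hypothetical API cost per generation, from the example
    for price in (0.15, 0.20):
        print(
            f"price=${price:.2f} "
            f"markup={markup(cost, price):.0%} "
            f"gross_margin={gross_margin(cost, price):.0%}"
        )
```

At $0.15 the markup is 50% and the gross margin about 33%; at $0.20 they are 100% and 50%.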

The business model faces margin pressure. Chris Savage, CEO of Wistia, predicts: "AI avatar providers will become commoditized, with prices driven down by competition similar to how CDNs evolved... the most successful players will partner with high-volume customers and continuously reduce costs as they scale."

This dynamic explains why Velvet can't rely solely on reselling third-party models but must add value through workflow integration, collaboration features, and eventually hosting and distribution capabilities.

For customers who lack in-house video production capabilities, Velvet offers an end-to-end managed service where Velvet's team produces full AI videos for brands, presents drafts via Slack, and iterates based on client feedback—essentially functioning as an AI-native video production agency that charges premium rates for hands-on creative direction while leveraging the same underlying platform tools that self-serve customers access.

Competition

The AI video market is bifurcating along whether products serve marketers (who need hosting, analytics & lead capture) or developers (who need APIs, SDKs & webhooks) and whether they're focused on generating net-new content or editing & distributing it, creating four battlegrounds: avatar generators (Synthesia at $146M ARR, HeyGen at $95M ARR), video AI APIs (Tavus, D-ID), AI-native editors (Descript at $55M ARR, CapCut), and hosting platforms (Vimeo at $450M revenue, Wistia at ~$70M ARR).

Velvet positions itself as a horizontal aggregator that combines model access with editing workflows, competing most directly with other aggregation platforms while facing encroachment from vertically integrated players moving in both directions.

Vertically integrated platforms

Adobe integrated Firefly Video into Premiere Pro for Creative Cloud's 30M+ subscribers, adding native AI video generation within the existing editing interface that professionals already use daily—representing the "innovator's dilemma" risk where incumbents bolt AI onto their installed base.

Google offers Veo-powered video creation through Workspace Vids (targeting enterprise teams on Google Workspace) and direct YouTube integration, while Meta launched a Vibes feed enabling in-app AI video remixing for Instagram and Facebook advertisers, potentially reducing reliance on external video tools as social platforms internalize creation.

Canva ($3.3B ARR in August 2025) represents the most formidable competitive threat in Velvet's target market. By incorporating AI avatar generation (through HeyGen's API integration), transcription, translation, and basic video editing into its all-in-one creative platform, which has 150M monthly active users, Canva can bundle video features into existing $14.99/month subscriptions rather than requiring separate $89/month video tool contracts, driving adoption through distribution rather than superior features.

Model-centric creative suites

Runway ($90M ARR) operates Gen-3 and Gen-4 proprietary models with accompanying creative tools, targeting independent filmmakers and high-end creators with longer-form, character-consistent storytelling capabilities that command premium pricing.

Pika Labs offers 12-second video generation with effects like lip-sync and interactive editing, while Luma's Dream Machine provides high-fidelity physics simulation with mobile apps and enterprise distribution through Adobe and AWS Bedrock partnerships.

PixVerse has scaled to 100M global users with template-based editing agents and backing from Alibaba for creator outreach programs, demonstrating massive consumer traction though unclear monetization.

These model-first companies face commoditization risk as open-source alternatives emerge and hyperscalers (Google, OpenAI, Meta) release free or low-cost video generation APIs—potentially forcing them to move up-stack into applications (like Velvet) or down-stack into specialized infrastructure.

Horizontal aggregators

Multiple platforms compete directly with Velvet's aggregation approach, combining multiple AI models behind unified interfaces with varying emphasis on professional editing workflows.

Traditional video editing platforms like CapCut (ByteDance) and DaVinci Resolve are incorporating AI generation features through partnerships and native development, potentially capturing users before they consider dedicated AI video platforms—CapCut's 200M+ mobile users and free pricing create a formidable wedge into casual creators who may graduate to paid AI video tools only after exhausting free options.

The strategic question for Velvet is whether aggregation alone provides sustainable differentiation, or whether the platform must add proprietary models, vertical-specific templates, hosting infrastructure, or distribution features to defend margins as the core aggregation layer commoditizes.

TAM Expansion

Multimodal content creation

As companies now produce an average of 3 videos per week (up 3x from 1 per week in 2023) and YouTube uploads surged from 7M/day in 2022 to 20M/day by April 2025, the bottleneck is shifting downstream from creation to post-production workflows including push-button quality enhancement, cross-medium repurposing, asset management, and marketing automation integration.

Velvet could expand beyond generation into comprehensive video operations by adding DAM-type features (AI-assisted tagging, video indexing, semantic search within footage), automated quality enhancement (eye contact correction, super-resolution upscaling, soundtrack-to-footage beat sync), and distribution integrations with marketing automation platforms (HubSpot, Marketo) that personalize video variables based on CRM data.

This expansion would position Velvet to capture a larger share of the $500-2,000/month video workflow budget rather than just the $50-200/month generation component. Spectrum Reach produced nearly 7,000 AI-generated commercial projects in 2023 using the Waymark platform, demonstrating enterprise demand for high-volume video production that requires robust management and orchestration beyond just model access.

Enterprise brand customization and compliance

Large enterprises require video content that adheres to strict brand guidelines and avoids intellectual property risks. Hour One reported users generated over 3,248 days (nearly 9 years) of video content in a single year, creating massive compliance surface area.

Velvet could develop fine-tunable AI models trained on specific brand assets, creating private style libraries for Fortune 500 clients that generate on-brand content automatically while avoiding the "generic AI look" that commoditizes creative output across competitors using the same base models.

Brand compliance scanning that checks every frame for watermarks, logo placement, color accuracy, and style guide adherence would reduce legal review bottlenecks that currently gate high-volume advertising creative production.

This capability addresses the fact that 41% of companies now use AI in video production (up from 18% in 2023), creating urgent demand for tools that maintain quality control and brand consistency at scale.

Real-time variant generation for performance marketing

Social media advertising relies on rapid A/B testing of creative variants to optimize engagement and conversion—the average marketing video length dropped from 168 seconds in 2016 to 76 seconds in 2023, with companies adopting a bimodal content strategy of high-production long-form content alongside scrappy short-form clips flooding channels.

Velvet could automate the generation of dozens of aspect ratio, messaging, hook, and visual style variants from single prompts, then integrate with ad platforms (Meta Ads Manager, Google Ads, TikTok Ads) to automatically test variants and iteratively generate new creative based on performance data.
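
The combinatorial nature of this kind of variant generation can be sketched as a Cartesian product over creative dimensions. This is a hypothetical illustration: the function name `build_variants`, the sample dimension values, and the dictionary layout are assumptions, not part of Velvet's product.

```python
from itertools import product


def build_variants(aspect_ratios, messages, hooks, styles):
    """Enumerate every combination of the four creative dimensions."""
    return [
        {"aspect_ratio": ratio, "message": message, "hook": hook, "style": style}
        for ratio, message, hook, style in product(aspect_ratios, messages, hooks, styles)
    ]


if __name__ == "__main__":
    # Sample values chosen only for illustration.
    variants = build_variants(
        aspect_ratios=["16:9", "9:16", "1:1"],
        messages=["free trial", "limited offer"],
        hooks=["question", "bold claim"],
        styles=["cinematic", "hand-held"],
    )
    print(len(variants))  # 3 * 2 * 2 * 2 combinations
    print(variants[0])
```

Even small option sets multiply quickly (3 x 2 x 2 x 2 = 24 variants here), which is why a real system would likely need to prune combinations using performance data rather than render all of them.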

This closed-loop workflow would align with digital advertising budgets where brands need frequent creative refreshes to maintain engagement across platforms—companies tripled output from less than 1 video per week in 2022 to ~3 per week by 2024, with AI-powered tools enabling this velocity increase. Automated variant generation could compress the time and cost of creative testing cycles from weeks and thousands of dollars to hours and tens of dollars.

Risks

Model commoditization: As AI video generation commoditizes, with providers such as Google and OpenAI lowering prices, Velvet's aggregation model faces margin pressure. Differentiation beyond reselling third-party AI models becomes harder as similar capabilities become widely available at lower cost.

Platform integration: Incumbent platforms, including Adobe, Google, and Meta, are integrating AI video generation into existing workflows and user bases. Their distribution and embedded customer relationships could reduce adoption of standalone tools like Velvet as native features improve within familiar editing environments.

Enterprise sales complexity: While Velvet targets enterprise customers with brand compliance and collaboration features, selling to large organizations typically requires longer sales cycles, extensive security reviews, and custom integrations. The company's current product-led growth model may not translate effectively to enterprise procurement, which could limit expansion into higher-value customer segments and revenue.

DISCLAIMERS

This report is for information purposes only and is not to be used or considered as an offer or the solicitation of an offer to sell or to buy or subscribe for securities or other financial instruments. Nothing in this report constitutes investment, legal, accounting or tax advice or a representation that any investment or strategy is suitable or appropriate to your individual circumstances or otherwise constitutes a personal trade recommendation to you.

This research report has been prepared solely by Sacra and should not be considered a product of any person or entity that makes such report available, if any.

Information and opinions presented in the sections of the report were obtained or derived from sources Sacra believes are reliable, but Sacra makes no representation as to their accuracy or completeness. Past performance should not be taken as an indication or guarantee of future performance, and no representation or warranty, express or implied, is made regarding future performance. Information, opinions and estimates contained in this report reflect a determination at its original date of publication by Sacra and are subject to change without notice.

Sacra accepts no liability for loss arising from the use of the material presented in this report, except that this exclusion of liability does not apply to the extent that liability arises under specific statutes or regulations applicable to Sacra. Sacra may have issued, and may in the future issue, other reports that are inconsistent with, and reach different conclusions from, the information presented in this report. Those reports reflect different assumptions, views and analytical methods of the analysts who prepared them and Sacra is under no obligation to ensure that such other reports are brought to the attention of any recipient of this report.

All rights reserved. All material presented in this report, unless specifically indicated otherwise is under copyright to Sacra. Sacra reserves any and all intellectual property rights in the report. All trademarks, service marks and logos used in this report are trademarks or service marks or registered trademarks or service marks of Sacra. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any report is strictly prohibited. None of the material, nor its content, nor any copy of it, may be altered in any way, transmitted to, copied or distributed to any other party, without the prior express written permission of Sacra. Any unauthorized duplication, redistribution or disclosure of this report will result in prosecution.