
Funding: $47.50M (2025)
Valuation
Sesame AI raised $47.5 million in a Series A round led by Andreessen Horowitz in February 2025. Other investors in the round included Spark Capital and Matrix Partners. The company is reportedly in talks to raise over $200 million in a follow-on round, though this financing has not yet closed.
Product

Sesame AI is a conversational AI company that has built emotionally intelligent voice assistants called Maya and Miles that can interrupt, laugh, change tone mid-sentence, and respond to emotional cues in real-time conversations. Unlike traditional voice assistants that follow a rigid speech-to-text-to-speech pipeline, Sesame's Conversational Speech Model processes both text and audio tokens simultaneously, enabling natural conversational flow with 200-300 millisecond response times.
Users interact with Maya and Miles through a web browser or mobile app, where they can have extended conversations that feel remarkably human-like. The AI companions remember previous parts of the conversation, reference earlier topics, and modulate their delivery based on the user's emotional state. If a user mentions being a coffee lover, Maya might later ask about their favorite roast, demonstrating contextual memory that spans up to two minutes of dialogue history.
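The two-minute memory described above can be pictured as a rolling time window over the dialogue. The sketch below is purely illustrative and is not Sesame's implementation: the `Turn` and `RollingContext` names, the text-only representation, and the eviction rule are all assumptions made for the example (the real model conditions on audio and text tokens, which this toy version ignores).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Turn:
    t: float       # seconds since the conversation started
    speaker: str
    text: str


class RollingContext:
    """Hypothetical helper: keep only turns from the last `window_s` seconds."""

    def __init__(self, window_s: float = 120.0):
        self.window_s = window_s
        self.turns = deque()

    def add(self, turn: Turn) -> None:
        self.turns.append(turn)
        # Evict turns older than the window, measured from the newest turn.
        while self.turns and turn.t - self.turns[0].t > self.window_s:
            self.turns.popleft()

    def mentions(self, keyword: str) -> bool:
        # True if any turn still inside the window contains the keyword.
        return any(keyword.lower() in t.text.lower() for t in self.turns)


ctx = RollingContext()
ctx.add(Turn(0.0, "user", "I'm a big coffee lover."))
ctx.add(Turn(90.0, "user", "Anyway, work was busy today."))
coffee_in_window_at_90s = ctx.mentions("coffee")

ctx.add(Turn(150.0, "user", "What should I do this weekend?"))
coffee_in_window_at_150s = ctx.mentions("coffee")
```

In this toy version the coffee remark is still available 90 seconds later but has been evicted by the 150-second mark, which is the practical meaning of a context that spans about two minutes.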
The underlying technology uses a transformer architecture trained on over one million hours of filtered audio data, with explicit modeling of emotional and prosodic elements like pitch, rhythm, and energy. The system can run on consumer-grade hardware through model sizes ranging from 1 billion to 8 billion parameters, making it feasible for on-device deployment in the company's planned smart glasses hardware.
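To see why the 1 billion to 8 billion parameter range matters for on-device use, a rough estimate of weight storage helps. The sketch below multiplies parameter count by bytes per parameter. The precisions listed are assumptions chosen for illustration (the source does not say which numeric format Sesame ships), and the estimate covers weights only, leaving out activations, attention caches, and the audio codec.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for params in (1e9, 8e9):
    for precision in ("fp16", "int8", "int4"):
        gb = weight_memory_gb(params, precision)
        print(f"{params / 1e9:.0f}B params @ {precision}: {gb:.1f} GB")
```

Under these assumptions a 1 billion parameter model needs about 2 GB for weights at 16-bit precision, while an 8 billion parameter model needs about 16 GB at 16-bit and about 4 GB at 4-bit, so the smaller sizes or aggressive quantization are the realistic candidates for a wearable device.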
Business Model
Sesame AI operates a B2C model for its consumer AI companions while building B2B revenue streams through enterprise licensing and developer APIs. The company's go-to-market strategy centers on creating an audio-first computing platform that moves beyond screen-based interactions to always-available voice companions.
The consumer business model revolves around subscription access to Maya and Miles, with the long-term vision of bundling these AI companions with proprietary smart glasses hardware. This vertical integration approach allows Sesame to control the entire user experience while capturing both software subscription revenue and higher-margin hardware sales.
On the enterprise side, Sesame monetizes through hosted inference services, premium model fine-tuning, and SLA-backed enterprise tiers for companies wanting to embed emotionally intelligent voices in their products. The open-source release of CSM-1B creates a freemium funnel where developers can experiment with the base model before upgrading to paid hosting or custom voice solutions.
The business model benefits from network effects as more conversations improve the emotional modeling capabilities, while the audio-first form factor creates switching costs once users integrate the AI companions into their daily routines. The company's approach of building both the AI models and the hardware creates multiple monetization touchpoints and defensibility against pure software competitors.
Competition
Big tech platform integration
OpenAI, Google, Amazon, and Apple are rapidly integrating conversational AI into their existing ecosystems, creating formidable competition through distribution advantages. OpenAI's GPT-4o voice mode reaches over 200 million monthly active users through ChatGPT, while Google is rebuilding Assistant with generative AI capabilities across its Pixel hardware ecosystem. Amazon's generative Alexa relaunch targets its 100 million Echo install base, and Apple's integration of on-device LLMs into Siri on iOS gives it a strong privacy position that resonates with health and family use cases.
These platform players can amortize model development costs across massive user bases and device ecosystems, potentially commoditizing natural voice interaction through aggressive pricing. Their vertical integration allows them to bundle voice AI as a loss leader while monetizing through other services, creating pricing pressure that independent voice AI companies struggle to match.
Voice infrastructure specialists
ElevenLabs dominates voice cloning and real-time voice APIs with over $300 million in funding and strong adoption across gaming, media, and enterprise applications. PlayAI and other infrastructure providers are building multi-layer APIs that let developers create custom voice agents, competing directly for the developer mindshare that Sesame needs to build its enterprise ecosystem.
These specialists often have deeper technical capabilities in specific areas like voice cloning or multi-language support, while maintaining asset-light business models that can scale rapidly. Their focus on developer tools and APIs creates competitive pressure on Sesame's enterprise monetization strategy, particularly as voice quality and latency continue to improve across the industry.
Vertical AI companions
Companies like Character.AI and Replika have built large user bases around AI companionship, while newer entrants focus on specific use cases like healthcare, eldercare, or professional assistance. These vertical players often have deeper domain expertise and can optimize their models for specific conversational contexts that general-purpose assistants struggle to match.
The emergence of specialized AI companions in areas like mental health, education, and customer service creates fragmentation in the market that could limit Sesame's ability to build a dominant horizontal platform. Each vertical requires its own conversation patterns, regulatory compliance, and user experience design, which favors focused competitors over generalist approaches.
TAM Expansion
Hardware integration
Sesame's development of audio-first smart glasses represents a significant TAM expansion opportunity beyond pure software licensing. The wearable computing market is projected to reach $185 billion by 2030, with audio-first devices avoiding the privacy concerns and social acceptance issues that plague camera-enabled AR glasses.
By controlling the full hardware and software stack, Sesame can capture higher-margin device revenue while creating a more defensible competitive position. The glasses form factor enables always-on AI interaction without the friction of pulling out a phone, potentially creating new usage patterns around continuous AI assistance throughout the day.
Enterprise voice solutions
The open-sourcing of CSM-1B creates pathways into enterprise markets worth over $50 billion annually across customer service, gaming, and interactive media. Companies building call center automation, in-game NPCs, and branded voice experiences need emotionally intelligent speech that current text-to-speech solutions cannot provide.
Sesame's ability to generate voices that can interrupt, express emotion, and maintain conversational context addresses key limitations in existing enterprise voice solutions. The automotive industry alone represents a $5 billion opportunity by 2028 as car manufacturers seek more natural in-vehicle assistants that can handle complex, multi-turn conversations while driving.
Global market expansion
The planned expansion to over 20 languages extends Sesame's addressable market beyond English-speaking consumers to include 3.4 billion speakers of other languages. This expansion is particularly valuable in markets where audio-first interaction is preferred due to lower smartphone penetration or cultural preferences for voice communication.
International expansion also opens business process outsourcing opportunities in regions like Asia-Pacific and Latin America, where emotionally intelligent voice AI can transform call center operations and customer service delivery. The audio-first approach requires less localization than screen-based interfaces, making global scaling more efficient than traditional software products.
Risks
Model commoditization: The rapid improvement in voice AI capabilities across OpenAI, Google, and other well-funded competitors threatens to commoditize natural conversation as a differentiator. As latency decreases and emotional modeling becomes standard across the industry, Sesame's core technical advantages may erode faster than the company can build sustainable competitive moats through hardware integration or specialized use cases.
Hardware execution: Sesame's transition from software to consumer hardware introduces significant operational complexity, supply chain risk, and capital requirements that the company has not yet proven it can manage. The smart glasses market has seen multiple high-profile failures from better-resourced companies, and consumer adoption of new wearable form factors remains unpredictable despite technological advances.
Privacy backlash: Always-on AI companions that process intimate conversations create substantial privacy risks that could trigger regulatory restrictions or consumer backlash. As AI companions become more emotionally sophisticated and integrated into daily life, concerns about data collection, emotional manipulation, and psychological dependency may limit adoption or force costly compliance measures that undermine the business model.
DISCLAIMERS
This report is for information purposes only and is not to be used or considered as an offer or the solicitation of an offer to sell or to buy or subscribe for securities or other financial instruments. Nothing in this report constitutes investment, legal, accounting or tax advice or a representation that any investment or strategy is suitable or appropriate to your individual circumstances or otherwise constitutes a personal trade recommendation to you.
This research report has been prepared solely by Sacra and should not be considered a product of any person or entity that makes such report available, if any.
Information and opinions presented in the sections of the report were obtained or derived from sources Sacra believes are reliable, but Sacra makes no representation as to their accuracy or completeness. Past performance should not be taken as an indication or guarantee of future performance, and no representation or warranty, express or implied, is made regarding future performance. Information, opinions and estimates contained in this report reflect a determination at its original date of publication by Sacra and are subject to change without notice.
Sacra accepts no liability for loss arising from the use of the material presented in this report, except that this exclusion of liability does not apply to the extent that liability arises under specific statutes or regulations applicable to Sacra. Sacra may have issued, and may in the future issue, other reports that are inconsistent with, and reach different conclusions from, the information presented in this report. Those reports reflect different assumptions, views and analytical methods of the analysts who prepared them and Sacra is under no obligation to ensure that such other reports are brought to the attention of any recipient of this report.
All rights reserved. All material presented in this report, unless specifically indicated otherwise, is under copyright to Sacra. Sacra reserves any and all intellectual property rights in the report. All trademarks, service marks and logos used in this report are trademarks or service marks or registered trademarks or service marks of Sacra. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any report is strictly prohibited. None of the material, nor its content, nor any copy of it, may be altered in any way, transmitted to, copied or distributed to any other party, without the prior express written permission of Sacra. Any unauthorized duplication, redistribution or disclosure of this report will result in prosecution.