Filip Kozera, CEO of Wordware, on the rise of vibe doing

Background

We recently talked to Warp CEO Zach Lloyd about the shift from IDE to agentic terminal in software development.

To understand how the same wave is moving into broader knowledge work, we reached out to Filip Kozera, co-founder and CEO of Wordware ($30M raised, Y Combinator S24), the creator of Sauna.

Key points from our conversation via Sacra AI:

Vibe coding for software engineering & building apps has opened the door to “vibe doing” for all knowledge workers, which applies the AI agent delegation model of Claude Code, Codex (OpenAI) and Cursor Agents to the work of chiefs-of-staff (email & chat), managers (project management, slides & docs) and executive assistants (scheduling, calendar, meeting prep), with startups like Sauna ($30M seed, Spark), Lindy ($10M raised, A16Z), Fyxer ($9M ARR in May 2025) & Zo Computer ($8M seed, Lightspeed) going up against platform players like Claude Cowork ($19B annualized revenue, up 1,167% YoY), Perplexity Computer ($148M ARR in June 2025), and open source project OpenClaw (acquihired by OpenAI in February 2026). "2025 was the era and the war of vibe coding. We had a bunch of different players, and in the end we emerged with full end-to-end systems like Cursor... At Sauna, we believe 2026 will be the war of vibe doing, and what it means to 'do things' breaks down into essentially three categories... chief-of-staff work: maintaining your relationships, checking your email, making sure hiring is on point... project management... [and] artifact and report creation."
As prosumer AI agents move beyond the one-off, prompt-based tasks pioneered by Manus ($90M ARR in July 2025), long-running agents that retain memory & adapt to individual work styles face a central challenge: the lack of core infrastructure for maintaining state & coordinating work across a sprawling & fragmented multi-tool environment consisting of project management (Asana, Monday.com), knowledge base (Notion), productivity (Office, Google Workspace), email (Outlook, Gmail) and team chat (Slack), unlike in coding, where AI agents share a centralized, version-controlled repository like GitHub. "In the world of coding tasks, there are far fewer infrastructure problems because the main source of truth is git. Once you start a job, the agent does something, you interact with it, you approve it, it merges to main, and you can throw the whole thing away. But in knowledge tasks, the product of the work is just the sandbox itself. There's no main branch for 'what would you like to eat'... You can say four different things in four different meetings, and the agent has to make sense of it based on context."
Rather than selling top-down enterprise licenses like Glean ($208M ARR, up 89% YoY) or competing head-on with horizontal AI chat like ChatGPT ($25B annualized revenue, up 194% YoY), prosumer AI agents are betting on bottom-up adoption by founders and individual knowledge workers who adopt independently and pay with a company card as a wedge into the company, with switching costs built from compounding memory and context that deepens with every use. "'Prosumer' means you have to treat these users as consumers, because in the end they decide by themselves whether they want to use something or not. But the main target of the work they're doing with your tool is work... the standards of your product need to meet consumer-grade standards, which are high, and then they pay with a company card... There's also a compounding value to using this product versus something like ChatGPT, because the memory and the connections actually grow over time."

Questions

We noticed you had a startup before Wordware and Sauna, focused on using an early version of GPT and BERT with an always-on listening device to augment human memory. Can you tell us how that experience shaped your vision and to what extent it helped inform what you're doing now?
Recently you had an internal shift where you started Wordware, which was originally infrastructure for building agents, and over roughly the last eight months you've pivoted to go after what you call the "prosumer." Can you take us through that shift and what made you decide the bigger opportunity was there?
So just to recap, the bottom-up shift is more of a strategic decision based on growth paths rather than a fundamental change in how you approach the technology?
Tell us what Sauna looks like today. What do you consider your initial product-market fit, what do your best customers look like, and how do they use the platform?
Are those three categories, chief of staff, project management, and artifact creation, explicitly broken out in the product?
What triggers a user to go directly to Sauna versus ChatGPT, Gmail, or Notion? What's the driver for working in Sauna rather than going directly to those other tools?
The idea of compounding context is central to what you're building. Can you connect that to how AI memory has evolved, and explain how Sauna's approach differentiates from other approaches to agentic memory? Who does memory well today, and how does Sauna compare?
RAG was long thought to be made obsolete by models with massive context windows. Was that a mistake? It sounds like retrieval is still very useful for Sauna.
There have been fairly divergent approaches to tool use and tool calling: structured function calling, standards like MCP, and OpenClaw, which was interesting because it used CLIs and CLI wrappers rather than MCP. What do you think the stack looks like and how do you think it evolves?
How do you map the competitive landscape? On the Wordware website you mentioned having to compete with Codex and Claude Code, but there are also startups like Lindy that customers are implementing. Where does Sauna fit in?
What about at the startup level? Companies like General Intelligence Company, Zo Computer, and Lindy come to mind. Do you consider them competitors? Do you worry more about them or about Claude Code?
Manus is another computer use agent. How do you think about them?
I'm interested in the choice between an embedded agent inside apps people already use, like what Poke does, versus a standalone app. It's clear you're going the standalone route. Can you talk about that?
You've written about AI interactivity moving from turn-based chat to a "delegation dashboard." What does that look like in practice?
And importantly, the agent can generate tasks for that board. It doesn't all have to come from you.
Fast forward five years, everything goes right. What do you see Sauna becoming, and what does knowledge work look like?
When you say "prosumer," you basically mean that you're targeting bottom-up adoption by professionals. Is that fair?

Interview

We noticed you had a startup before Wordware and Sauna, focused on using an early version of GPT and BERT with an always-on listening device to augment human memory. Can you tell us how that experience shaped your vision and to what extent it helped inform what you're doing now?

Some entrepreneurs have multiple ideas in their life which they can put to fruition. I'm the opposite. I probably have one idea, and that idea is extremely simple: context plus intelligence can equal action in the world. A lot of the actions that humans take right now just suck. They don't really dial in on what humanity could be good at, and what we are good at tends to be boring stuff.

Since I was a kid, I heard the story of an ape that could communicate with 600 sign language signs. I kept thinking that, even though it can express emotion, there are some ideas, like quantum physics, which that ape will just never understand. Part of my mission became figuring out what humans could be pushing themselves to understand, but basically don't have time for, because we're dealing with the equivalent of those 600 sign language signs. What came out of that was a dream to reduce the mundane and give people more time for deep, intuition-based work. Even in science, a lot of people agree that the last mile of any given discovery is often based on intuition, not just direct thinking.

Since those early days, I thought: I can't work on the intelligence part because I don't have enough resources to train huge models, but with my background in LSTMs, which I researched at Cambridge, I wanted to focus on the context side of doing things. So I started this company in 2018. I flew to San Francisco, my first time in America, and ended up living in the Tenderloin because I thought it was pretty central.

I didn't know what the hell the Tenderloin was. America was very good to me. We quickly got some investment and started working, but sometimes being too early is not good. GPT-2 and BERT were the first transformer architecture models, and even though I could see that something was there on the intelligence side of the equation, they were just very difficult to work with.

The idea behind Sauna is that our memory is very limited. The only time memory really works is when you immediately know what you're going to use some information for at the moment you hear it. A lot of the time, even when you read a book, you have to reread it because the circumstances of your life have changed. That's frustrating. We already ingested that information, so why can't we just apply it to new parts of our life? The way memory works in Sauna is that it's not only listening to everything you say and reading all your emails, it's also looking proactively at where you can be better based on the core principles you've chosen for your life. The main takeaway is that context is everything. I've been working on this problem for a long time, and it's still not solved.

Recently you had an internal shift where you started Wordware, which was originally infrastructure for building agents, and over roughly the last eight months you've pivoted to go after what you call the "prosumer." Can you take us through that shift and what made you decide the bigger opportunity was there?

It's a very interesting shift across the whole economy. The ideas behind Wordware were very similar to the ideas behind Sauna. It was just focused on creating the ability for companies to have context, have intelligence, and collect that context in a better way than the individual could. We were doing pretty well. We were part of Y Combinator's summer batch at the end of 2024, from which we raised $30 million. We got to a million ARR in two weeks and had 10 million people try our agents.

But we realized that the models are getting smarter at such a fast pace that it's not wise to power the layer just below them. What you end up having to do is deploy engineers to bring these solutions to companies from top to bottom. What might actually win is the bottom-up approach. That was a big revelation for me. When I looked at how companies are growing in this era, the Lovables, the ElevenLabs, they've set a completely different bar. I was confident I could get to $10 million ARR in 18 months with the previous Wordware iteration. But then I thought: we're at a $200 million-plus valuation and the next round should be at a billion-plus. The old 100x revenue-to-valuation multiples no longer apply. Those numbers have dropped. At the peak, Perplexity was raising at 150x revenue. That has completely changed.

So the bar shifted, and I realized that in order to actually be part of the game, we had to think bigger. Our main competition is Anthropic and we are trying to play in the trillion-dollar game. There are going to be startups that reach a trillion by trying to automate knowledge work. 2026 is the era of "vibe doing," when people realize that utility is more important than a non-deterministic approach to things.

So just to recap, the bottom-up shift is more of a strategic decision based on growth paths rather than a fundamental change in how you approach the technology?

The technology is still very similar, to be honest. We run on sandboxes, and so on. It's more about growing a pair and accepting that there might be a world where in five or ten years everything is done by 10 companies. Maybe even 15 years out, but I live in San Francisco so everything seems a little accelerated here. You're sitting there thinking: I'm going to be working on this for 10 years, I've got great people in this office, I really like my work, and money is not that much of an issue in the valley if you're well-funded. So you might as well go for the big one.

Tell us what Sauna looks like today. What do you consider your initial product-market fit, what do your best customers look like, and how do they use the platform?

2025 was the era and the war of vibe coding. We had a bunch of different players, and in the end we emerged with full end-to-end systems like Cursor, which help you across the entire product lifespan. At Sauna, we believe 2026 will be the war of vibe doing, and what it means to "do things" breaks down into essentially three categories.

The first is chief-of-staff work: maintaining your relationships, checking your email, making sure hiring is on point, reviewing your ATS, reviewing your LinkedIn connections, which is surprisingly difficult because LinkedIn doesn't have an API. The second is project management. A real first-time user example is uploading a bunch of transcripts and Notion documents, which we connect to natively, and then trying to figure out each day, on the right channel, whether anything is drifting from a document you wrote four days ago. Is it a good drift or a bad drift? People upload all of their meetings and you can actually see what is becoming truth in a given project. It's almost as if the project has its own main branch in an engineering sense. It can also be something more personal: one of our customers is spending thousands of dollars a month on a project around wedding planning, and Sauna maintains all the relationships with flower suppliers, creates a website, and so on. Chief-of-staff tasks, CRM, executive assistant work, all of that.

The third pillar is artifact and report creation. Whenever you do research and want to publish the results, or whenever you're creating a PowerPoint presentation, you can do all of that with Sauna. At the end of the day, Sauna is a very good assistant with 3,000 connections and access to a file system and a memory system that learns how you do things, proactively does that work, schedules things, and becomes your main command center for commanding your fleet of AI agents out in the world.

Are those three categories, chief of staff, project management, and artifact creation, explicitly broken out in the product?

No. Horizontal AI agents get used for those three things, but they're not structured that way in the product. The biggest magic and the biggest drawback of horizontal products is that you sometimes only find out what they're used for once users get their hands on them. The first-time user experience can be tough with horizontal products, but the most amazing moments are when you discover that a user has applied the product to something you never even imagined was possible.

There's also a compounding value to using this product versus something like ChatGPT, because the memory and the connections actually grow over time.

What triggers a user to go directly to Sauna versus ChatGPT, Gmail, or Notion? What's the driver for working in Sauna rather than going directly to those other tools?

The memory and connections actually grow. The way you do things is learned, and skills that embody that knowledge about how you like things done get spawned automatically. When I was preparing for today, I dropped a voice note on iMessage. We have integrations with iMessage and Slack. I had actually forgotten about this interview, and Sauna knew I hadn't prepared. She dropped me a voice note to get me ready. I didn't even know I needed to prepare, but looking at my daily briefing, it's quite something. These magical moments appear once you start to understand the system. Also, the most powerful agents are not necessarily the most autonomous. They still ask for your review because the actions they take can be very consequential.

The idea of compounding context is central to what you're building. Can you connect that to how AI memory has evolved, and explain how Sauna's approach differentiates from other approaches to agentic memory? Who does memory well today, and how does Sauna compare?

Memory is something you need to approach as an overlapping multitude of solutions that work together. In our particular approach, it essentially starts with the file system. Sauna knows what my five biggest priorities are and is able to move the right stuff to archive throughout the night. I have personal areas, which are ongoing things that keep happening, and projects, which have a deadline. Personal areas include things like poetry, my relationship, taxes, therapy, and workouts. Work areas include one-on-ones with everyone on the team, founders-only meetings, investors, hiring, and marketing. Then I've got resources for random miscellaneous stuff. That's tier one of the file structure.

We are opinionated about how you should run your life. There's a book called "Building a Second Brain" that outlines how you should take notes, and we borrowed a lot from that concept, but now you don't have to do the heavy lifting. The AI does it for you, capturing the things you would normally forget. Then you have identity files: rules, user preferences, relationships, tools. We're quite opinionated about these seven files. Everyone I know lives in the relationships file, and Sauna knows to pull from it at the right times. The user file contains detailed information about me: priorities, time preferences, key quotes I've noted, notes, rules, and so on.

On top of that, we have semantic memory. You can just ask something like "what's the way I like to work out?" and it pulls from the right place without you having to find the right folder. This is for the stuff you can't really capture in a file. It's an overlapping system that is managed proactively to get the best possible performance.
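The layering described above, explicit files first with a looser search on top, can be sketched roughly as follows. This is a toy illustration, not Sauna's implementation: the note paths, the `NOTES` contents, and the `recall` helper are all hypothetical, and plain word overlap stands in for real semantic search, which would typically use embeddings.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical notes; the top-level folders mirror the areas/projects split described above.
NOTES: Dict[str, str] = {
    "areas/workouts.md": "Prefers morning strength workouts three times a week.",
    "areas/taxes.md": "Quarterly estimates are due in April.",
    "projects/hiring.md": "Shortlist for the design role closes Friday.",
}


def tokens(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def recall(query: str, notes: Dict[str, str]) -> Optional[str]:
    """Return the note at an exact path, else the note sharing the most words with the query."""
    if query in notes:  # layer 1: direct file lookup
        return notes[query]
    query_words = tokens(query)
    scores = {path: len(query_words & tokens(body)) for path, body in notes.items()}
    best = max(scores, key=scores.get)
    # layer 2: fuzzy fallback (word overlap here; assumed stand-in for semantic search)
    return notes[best] if scores[best] > 0 else None


print(recall("areas/taxes.md", NOTES))
print(recall("morning workouts schedule", NOTES))
```

The point of the sketch is only the ordering: a known location is read directly, and the fuzzy layer is consulted when the caller does not know which folder holds the answer.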

We're actually only 1% lower on the most well-known knowledge retrieval benchmark than providers that focus solely on retrieval. Our approach scored 83.6 and the top player scored 84.6, with everyone else below that. Retrieval is really important to us as a company.

RAG was long thought to be made obsolete by models with massive context windows. Was that a mistake? It sounds like retrieval is still very useful for Sauna.

At some stage people confused the term RAG with vector databases. RAG, if you spell it out, is Retrieval Augmented Generation. What that basically means is you have to populate the context window with something. A million tokens is really not that much. The whole secret sauce of your AI company is how you construct both your system prompt and the rest of the context window as you're using tools.

The system prompt used to be just something static. Now it's considered one of the most important things you do, and it has multiple dynamic elements that change every day. The simplest example is injecting the time and date into the system prompt so it's no longer static. We probably have hundreds of these things we inject into the system prompt, and in a way that is retrieval augmented generation, because we retrieve something to augment the generation.
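A minimal sketch of the date-and-time example above, assuming a hypothetical `BASE_PROMPT` and a hypothetical `build_system_prompt` helper. The priorities list is an invented second dynamic element added purely for illustration; nothing here is taken from Sauna's code.

```python
from datetime import datetime, timezone
from typing import List

BASE_PROMPT = "You are a personal assistant."  # hypothetical static portion


def build_system_prompt(now: datetime, priorities: List[str]) -> str:
    """Assemble a system prompt from a static base plus values looked up at call time."""
    lines = [
        BASE_PROMPT,
        f"Current time (UTC): {now.strftime('%Y-%m-%d %H:%M')}",
    ]
    if priorities:
        # Assumed extra dynamic element: whatever the user currently cares about most.
        lines.append("User priorities: " + "; ".join(priorities))
    return "\n".join(lines)


prompt = build_system_prompt(
    datetime(2026, 3, 1, 9, 30, tzinfo=timezone.utc),
    ["hiring", "fundraising"],
)
print(prompt)
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the prompt builder deterministic and easy to test.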

As you move forward, there are tool calls, and these tool calls try to pull the right information or learn from previous failed attempts in a reflection loop. If something hasn't worked, you inject the error, retrieve it from the code execution, and create a new generation. RAG is just such a wide term that it's at the core of everything right now. It's just not the vector database approach that everyone latched onto for a while. We do use vector databases as one of a million tools, but they are just one form of retrieval among many.
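The reflection loop described above can be sketched as follows, under loud assumptions: `fake_generate` is a stand-in for a model call, `run_with_reflection` is a hypothetical helper, and the bare `exec` is for illustration only, since a real system would run generated code inside a sandbox.

```python
from typing import Any, Callable, Dict, Optional


def run_with_reflection(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    max_attempts: int = 3,
) -> Any:
    """Generate code, execute it, and feed any error back into the next generation."""
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(task, last_error)
        scope: Dict[str, Any] = {}
        try:
            exec(code, scope)  # illustration only; never exec untrusted code unsandboxed
            return scope["result"]  # convention assumed here: generated code sets `result`
        except Exception as exc:
            # "Retrieve" the failure from code execution so the next attempt can see it.
            last_error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"gave up after {max_attempts} attempts; last error: {last_error}")


def fake_generate(task: str, error: Optional[str]) -> str:
    """Stand-in for a model: the first attempt is buggy, the retry sees the error and fixes it."""
    if error is None:
        return "result = 1 / 0"
    return "result = 'ok'"


answer = run_with_reflection(fake_generate, "demo task")
print(answer)
```

The only state carried between attempts is the error string, which is the retrieved material that augments the next generation.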

There have been fairly divergent approaches to tool use and tool calling: structured function calling, standards like MCP, and OpenClaw, which was interesting because it used CLIs and CLI wrappers rather than MCP. What do you think the stack looks like and how do you think it evolves?

Most agents use MCP by directly exposing the tools to the LLM, but what works better in our opinion is converting the MCP tools into an API and then asking the LLM to write the code that calls that API. MCP in its nature is relatively obfuscated and takes a long time to do anything. If you compare sending a message via Slack MCP versus the way we do it, which is writing code to call the API directly, ours is far faster. We very rarely use raw MCP. When we do use MCPs, we just grab the OAuth token and use them in code.

This has been our approach for the last six months. The difficulty of being in the AI race is that there's always something more you can add to your product. We've been in beta for a long time, and some competitors, who maybe don't have the same valuation pressure, have released quicker but thinner versions. OpenClaw is one of those tools. Their main thing was running on a separate computer, giving it full access, and allowing code execution. We've had all of those things and we've kept it secure, but we were paralyzed a bit by trying to do everything on too many fronts at once. Now we have an open sign-up on Sauna, and even though we're not pushing it in terms of marketing yet, everyone can sign up and use it.

How do you map the competitive landscape? On the Wordware website you mentioned having to compete with Codex and Claude Code, but there are also startups like Lindy that customers are implementing. Where does Sauna fit in?

The patterns follow the same trajectory as vibe coding. This year you need to keep close tabs on all your competition, but you can't let your team freak out every time a competitor ships something new. A lot of their stuff doesn't work. One competitor pushed scheduled tasks to a huge number of people and you couldn't set up a single scheduled task. It just threw errors. Then they fixed it, but people were treating it like it was incredible. We've seen this before: ChatGPT plugins didn't work, then MCPs became everything, and now MCPs are back to being seen as not that useful. As a founder you need to learn that there are waves to this, and some waves should be ignored. The job is to know which ones to ignore. But at the same time, keep tabs on all your competition and selectively borrow good things while doubling down on your core competencies.

This is difficult because it's a war. You have to borrow some things, but you have to borrow faster and make sure that users trust you about what's worth borrowing. My lead investor put it well: in the world of vibe coding, Cursor is one of those products that steals the right things and creates trust in the user's mind. Whatever is out there in the market, Cursor will have it five days later, only if it's worth having. And if it's worth having, they will have it. That balance is hard to maintain, and it's also hard to keep the team rallying in the right direction and trusting that you're going to figure it out.

What about at the startup level? Companies like General Intelligence Company, Zo Computer, and Lindy come to mind. Do you consider them competitors? Do you worry more about them or about Claude Code?

It's everyone, really. I'm a little less worried about Lindy because I think they've already settled into a certain way of doing things. I'll try their product and see if there are any UX paradigms worth borrowing. But in the end, you just have to do your job. The biggest moat you can have is execution. The agent harness underneath is really important and not actually that easy to copy. Making things actually work is harder than it looks. Zo Computer has its own cloud computer for each person, so that's probably the closest comparison. And there's now Perplexity Computer. Perplexity and Claude are the ones we worry about most.

Manus is another computer use agent. How do you think about them?

Manus is a very good agent, but it's not persistent and it doesn't really have that idea of a knowledge system. Most tasks are one-off, and there are real difficulties in keeping sandboxes live for weeks when some processes are waiting to be reawakened. Take an agent that handles ordering food on Uber Eats: it has to be somewhat live and present. Manus simplified this for good reasons. Now it's important that we do it a little better architecturally.

When you're dealing with coding tasks, there are far fewer infrastructure problems because the main source of truth is git. Once you start a job, the agent does something, you interact with it, you approve it, it merges to main, and you can throw the whole thing away. You don't have to keep it alive. But in knowledge tasks, the product of the work is just the sandbox itself. There's no main branch for "what would you like to eat."

Imagine a codebase where every email you receive can write to it and make changes, with no PR process. That's essentially what's happening: a living codebase with no PR system and no clear way to manage what gets written to main. You can say four different things in four different meetings, and the agent has to make sense of it based on context, like who you were talking with. If I'm talking with a candidate, that conversation is not going to merge to my main truth tree. But the architecture of these long-running processes puts a lot of weight on our whole system, because you have to keep sandboxes alive, ideally cheaply, forever.

Manus never did these kinds of long-running processes. They killed your sandbox and did a snapshot of it. You'd have to wait 30 seconds to a minute to restart it, and you couldn't have multiple back-and-forth conversations. You'd have to wait for it to come back alive. For us, many of these conversations get interacted with on a sub-one-second basis. There's a lot more architecture that goes into how we run Sauna.

I'm interested in the choice between an embedded agent inside apps people already use, like what Poke does, versus a standalone app. It's clear you're going the standalone route. Can you talk about that?

The standalone approach. The way I think about it is: you delegate work like you would with a chief of staff. You often delegate through a WhatsApp voice note. I can drop an iMessage voice note to my Sauna. But when I actually sit down to review the work that chief of staff has done, maybe six or seven tasks they've completed throughout my day, I sit down with a standalone app that works better as a mission control. Think of it as fire and forget: come back when you're ready to review the results of seven different tasks.

You've written about AI interactivity moving from turn-based chat to a "delegation dashboard." What does that look like in practice?

Right now our delegation dashboard is basically a kanban with things the agent proactively suggests it can do. That's also very important: the agent searches through all your connections, because people are famously bad at knowing what can be done with AI. Even as the CEO of an AI-first company, I'll procrastinate on a task, eventually send a voice note, and Sauna does it end to end perfectly. Then I think: why have I been procrastinating on this for weeks?

The kanban has "to do," "working," and "review." "To do" is things you fire off. "Working" is what the agent is actively handling. "Review" is where you actually go in, look at the output, click the right button, and make sure the agent isn't messing up your life.
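The three columns above can be modeled as a small state machine. This is a sketch under stated assumptions: the `Task` and `Column` names are hypothetical, and sending a reviewed task back to "to do" is an assumed transition that the interview does not describe.

```python
from dataclasses import dataclass
from enum import Enum


class Column(Enum):
    TODO = "to do"
    WORKING = "working"
    REVIEW = "review"


# Allowed moves between columns. REVIEW -> TODO (rework) is an assumption.
ALLOWED = {
    Column.TODO: {Column.WORKING},
    Column.WORKING: {Column.REVIEW},
    Column.REVIEW: {Column.TODO},
}


@dataclass
class Task:
    title: str
    suggested_by_agent: bool = False  # tasks may come from the agent, not only the user
    column: Column = Column.TODO

    def move(self, target: Column) -> None:
        """Move the task to another column, rejecting transitions that skip a step."""
        if target not in ALLOWED[self.column]:
            raise ValueError(f"cannot move from {self.column.value} to {target.value}")
        self.column = target


task = Task("Draft investor update", suggested_by_agent=True)
task.move(Column.WORKING)
task.move(Column.REVIEW)
print(task.column.value)
```

Forcing every task through "review" before it can leave the board reflects the point that a human checks the agent's output before anything consequential happens.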

And importantly, the agent can generate tasks for that board. It doesn't all have to come from you.

Yes, that is very important. Because we went cloud-first, which came with its own difficulties architecturally, the agent can do a bunch of work at night while you're sleeping. That's really powerful, because Claude Code cannot do that. You'd have to leave your computer open, or basically buy a Mac Mini to get it done. Because of the architectural work we went through, which not many others have tackled yet, we're able to run things in the background and monitor all your connections and every email, around the clock.

Fast forward five years, everything goes right. What do you see Sauna becoming, and what does knowledge work look like?

The truth is I don't really know. These models have been moving at such a quick pace that human brains are not good at extrapolating exponential graphs, and I'm both scared and excited. When I think about the next one to three years, I'm paraphrasing someone whose name I can't remember, but they said: I don't know what's going to get a human to Mars, but I'm pretty sure there's going to be a ton of spreadsheets, emails, and documents along the way.

Go back to the "main branch of truth" idea I described earlier. People spend so much time communicating and trying to merge their different contexts in order to do a job they don't fully understand because the context was never properly set. If you can accelerate that, it's just really exciting to complete those mundane jobs first.

That means making sure no human ever has to go into email, review a message, download a PDF, copy-paste it into a Google doc, update the doc based on comments, copy-paste it into DocuSign, send it for signature, and then wait three days for the other side to do the same. Our kids are going to look back at this like we were idiots. So part one is: get rid of all the mundane stuff.

Part two is that a great AI assistant doesn't only do the mundane work, it also helps people with creative work, based on augmenting human memory, knowing things about you that you didn't even realize you once accepted as true, knowing you better than you know yourself, and filling in those gaps to get people to actual intuitive creative work.

What does that mean for the world? My best hope is that it doesn't wreck the global economy. We're at risk. When you start seeing how many white-collar jobs these systems are eliminating, it's scary. We have three times more debt in the system than we had in 2009, and these white-collar workers are living in even more expensive homes. If things start failing at scale, we're going to print more money. That's not my job to solve. I'm just trying to enable people to do less mundane work and help people be the best versions of themselves. I have to approach it from the lens of technology. Historically, technology enabled humanity to improve. Even in the industrial revolution, when machines took work out of people's lives, it ended up being a net benefit.

I do sincerely hope that human nature is good enough that we're going to spin this in the right way. I'm in awe of Dario Amodei's response to pressure from the Department of Defense and the government. It's incredible to be able to hold to your own ideals so firmly against that kind of pressure. But we are in a weird moment in human history given the current politics and wars. It's a very complex question. I would love to tell you we're just going to have AI agents and we're going to be on the beach, but it's a complex issue and I'm hopeful while also understanding a lot of the difficulties that come with making this transition.

When you say "prosumer," you basically mean that you're targeting bottom-up adoption by professionals. Is that fair?

That's very fair. "Prosumer" is a made-up term for something that means you have to treat these users as consumers, because in the end they decide by themselves whether they want to use something or not. But the main target of the work they're doing with your tool is work. They might have personal stuff they do in the same system, like that wedding planning example, but fundamentally it's work, and they pay with a business credit card. So the standards of your product need to meet consumer-grade standards, which are high, and then they pay with a company card.

Disclaimers

This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.
