Head of Product at SaaS startup on building a personal AI OS with Codex automations and Claude Cowork

Background

We spoke with a power user and early adopter who has built an extensive personal AI stack across Codex and Claude Code, running roughly twenty daily automations that span email, Slack, calendar, customer bugs, and career decisions.

The conversation covers how he wires together custom integrations, where he still has to step in, and what it actually takes to get agentic workflows reliable enough to trust.

Key points via Sacra AI:

AI personal operating systems are emerging from the bottom up—power users like this operator are hand-wiring four email accounts, four Slack workspaces, iMessage, Grain, Linear, PostHog, and custom Google Workspace OAuth flows into a single Codex environment, spending ~14% of their AI usage just maintaining the plumbing. "Every skill is still software: it needs to be tested, edge cases need to be figured out, and it needs to be refined. The same goes for automations. I did a scan of all my Codex and Claude Code usage recently, and about fourteen percent of it—measured by chats—was maintaining, fixing, and setting up automation and skill stuff."
The emerging workflow pattern isn't one model doing everything. It's orchestration across specialized agents, with Codex handling cross-tool execution and memory-intensive daily automations while Claude Code handles strategic reasoning, connected via a lightweight CLI bridge (`claude -p`) that routes specific prompts for a second opinion and returns synthesized recommendations. "I have a skill set up in Codex. I invoke it and say, 'Get Claude's second opinion on this.' It constructs a prompt based on what we're talking about, sends it over with `claude -p`, the noninteractive mode for Claude, waits for the response, digests it, and gives me suggestions on what to implement and what not to."
The ceiling on full agent delegation isn't trust or capability in isolation—it's voice fidelity and context persistence: even with a dedicated writing skill encoding style rules across email, Slack, LinkedIn, and long-form writing, AI-sounding artifacts (em dashes, "it's not this, it's that" constructions) keep surfacing, and high-level goals like career direction or company migration context have to be re-injected repeatedly because the agent doesn't string them together across sessions. "I sometimes have to repeat higher-level goals for things I'm working on… It also doesn't infer high-level things that are going on from my meetings, or string things together across meetings. I have to actively push it to do that."

Questions

Which tools do you use regularly, and which have you tried between Claude Cowork and OpenAI's Codex app?
When did you first try each of them, and how often are you using Claude Cowork versus Codex now?
What made Codex take over that much of your usage? Walk me through the last few things you delegated to Codex end to end: what were they, what did you give it, and what came back?
Can you walk me through the last time one of those daily automations ran—say the email drafting, customer-call insights, or Reddit competitor scan—what context Codex had access to, what steps it took, and what you actually did with the output?
When it finishes that "What I Promised Yesterday" run, what does the output look like—is it a list, calendar events, draft emails, Slack reminders—and where do you review or approve it before anything gets sent or scheduled?
When you added that disposition step, how did that change the usefulness of the workflow? What can Codex actually do after you mark something "schedule it" or "handle this," versus what still requires you to manually jump into another tool?
Walk me through one recent item from that list where you told Codex to "handle this." What was the promise, what did Codex do across email or Slack or calendar, and what did you check before letting it go out?
Can you give me an example where you did let Codex take the next step itself—like drafting or sending an email, creating a calendar event, updating a backlog, or posting somewhere?
Before you let Codex add the card details, what made you comfortable giving it that level of access—and did it ask for confirmation or show you exactly what it was going to do before submitting?
When Codex is doing these cross-tool tasks—email, Slack, calendar, vendors, files—how did you wire all that up? What does it actually have access to, and what was the setup process like?
Walk me through one of those custom integrations end to end—maybe Google Workspace or iMessage. What did you have to build, how does Codex invoke it, and what still breaks or needs manual babysitting?
When Codex actually uses that Google Workspace integration, is it mostly reading context—like email and calendar—or is it also creating and modifying things, like calendar events, drafts, slides, docs, or sending emails?
Could you walk me through the last Google Workspace action where it actually changed something—like sent an email, made a calendar event, or created a doc or slide? What did you ask for, what did Codex do, and what did you review before it finalized it?
Walk me through the last calendar change you made that way. What did you type into Codex, what did it infer from your existing schedule, and did it create the event directly or come back with options first?
When it picked the first bad time, was that because the drive to Salt Lake wasn't in your calendar, or because it didn't understand travel time? And after you corrected it, does it remember that kind of constraint going forward?
What kinds of things do you find yourself repeatedly re-explaining to Codex—like locations, preferences, work priorities, writing style, company context—because they don't persist well enough yet?
Can you walk me through the last time you dumped that higher-level context in? What were you trying to decide, what did you give Codex, and what kind of output or guidance did it produce?
When you went back and forth between Codex and Claude Code for that career decision, what did each one do better? Why not just stay in one tool for the whole thread?
Can you walk me through a task where that split was really clear—where you used Claude Code to think through the plan, then Codex to actually carry it out?
Can you walk me through how that "Codex pings Claude for a second opinion" setup works in practice? What does the skill send to Claude, what comes back, and how do you decide whether to follow Claude's advice or keep Codex moving?
When you use that for a big code or UX change, what does your review process look like before you trust it? What are you checking yourself versus what are you comfortable letting Codex and Claude debate and resolve?
Walk me through the last big code or UX change where you used Codex, maybe with Claude as a second opinion—what was the change, what did the agents produce, and what did you personally have to fix or overrule before shipping it?
Can you walk me through that CloudFluent onboarding change end to end—what the flow needed to do, what you asked Codex to produce, where Claude reviewed the plan, and what you actually shipped after review?
When that Claude handoff broke, what did Codex do? Did it clearly tell you "I can't reach Claude," or did it fail in a more confusing way? And how did that affect the growth-plan workflow?
When something like that breaks, do you usually stop and fix the plumbing right away, or route around it? And more broadly, how often do these agent workflows depend on you maintaining custom glue versus the app just working out of the box?
Can you walk me through one automation or skill that started out flaky but became genuinely useful after refinement? What broke at first, what tests or guardrails did you add, and what does it reliably do now?
Can you walk me through a recent bug or feedback report it handled—where the report came from, what context Codex pulled in, what it tried to do automatically, and where you still had to step in?
When it sent the list of affected customers back, what did that handoff look like? Did Codex draft the Slack or Linear update itself, and what did you check before sharing it with the customer success team?
On that bug workflow, how much of the value was time saved versus getting a deeper impact analysis that might not have happened otherwise?
For that CS bug workflow, what would need to change for you to let Codex close the loop fully—update Linear, notify CS, maybe even draft customer-facing language—without you reviewing every time?
If you imagine that full agent chatbot for CS, what would it need access to, and what guardrails would it need before you'd let CS rely on it directly?
You mentioned a "Boostly brain" context layer. What would actually go into that—docs, recent releases, customer segments, migration notes, internal terminology—and how would you keep it current enough that the agent doesn't reason from stale context?
Walk me through what happens today when Codex needs that company context. Does it pull from Notion directly, do you paste things in, or do you maintain separate context files or skills for it?
When you put context into docs or Notion for the agent, has that changed how you write or organize information? Are you now documenting things differently because you expect Codex to use it later?
Roughly what share of your workweek touches Codex now, either directly or through automations? And what are the categories of work you still avoid putting into it?
What changed in your habits to make that possible? What did you have to start recording, routing, or delegating differently so that 80–90% of your week could flow through it?
Walk me through your morning. When you start the day, what do you open first, what does Codex summarize or prioritize for you, and where do you still go directly into Slack, calendar, or email yourself?
On the email side, when Codex scans and drafts for you, what makes you decide "yes, draft this" versus "I'll just handle it myself"? And can you walk me through the last email thread where Codex's draft actually saved you meaningful time?
When you say you had to iterate quite a few times, what was wrong with the first drafts—was it tone, accuracy about what each person asked, the level of personalization, or something else?
You've mentioned a few times that you still tweak messages so they sound like you. What have you tried to improve that—writing samples, style guides, examples—and where does Codex still miss your voice?
When you catch those issues, do you usually fix them manually, or do you feed the correction back into the Travis writing skill? And has that skill actually gotten better over time, or does it keep making the same mistakes?
When you think about what would break through that ceiling, is it better memory of your edits, tighter integration with the actual send surfaces, more controllable style constraints, or something else?

Interview

Which tools do you use regularly, and which have you tried between Claude Cowork and OpenAI's Codex app?

I use both.

When did you first try each of them, and how often are you using Claude Cowork versus Codex now?

I use Codex probably ninety percent of the time now. I started using Cowork the moment it came out—I'm a super early adopter. I've been trying both from the beginning, but mostly use Codex now.

What made Codex take over that much of your usage? Walk me through the last few things you delegated to Codex end to end: what were they, what did you give it, and what came back?

A huge part of it is that it just works. Number one, it's a much better-built desktop app—there's a lot of polish, a lot of edge cases that work way better than Cowork. At the same time, the model is more reliable for the kind of actions Cowork handles, like checking your email and running automations. I have a huge portion of my life set up in Codex now—around twenty automations that run every day. That covers everything from checking my email and writing drafts, to summarizing podcasts, scanning Reddit for competitor information, and pulling insights from customer calls. Then there's some really specific stuff: I own part of a business we're trying to sell, and Codex gives me daily updates on that. It categorizes all my personal expenses, prioritizes my day, syncs skills across Codex and Claude Code so I can switch between the two, and writes release notes for features we've shipped so I can market them internally.

Can you walk me through the last time one of those daily automations ran—say the email drafting, customer-call insights, or Reddit competitor scan—what context Codex had access to, what steps it took, and what you actually did with the output?

I have one called "What I Promised Yesterday." It has access to the call recording software I use for Boostly and for personal stuff, plus all my email accounts, my texts, Slack, and my calendar. It goes in, checks everything I said the previous day, and flags the things I committed to people so I remember to follow through and get them on my calendar.

When it finishes that "What I Promised Yesterday" run, what does the output look like—is it a list, calendar events, draft emails, Slack reminders—and where do you review or approve it before anything gets sent or scheduled?

It's a big list, basically asking me to disposition each item as either done, schedule it on my calendar, tell Codex to handle it, put it in a backlog, or just skip it. I recently added that disposition step—it used to just give me a list of things. This is all happening in Codex, just in a chat.
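
The disposition step described in this answer can be pictured as a small routing table. What follows is only a sketch under assumptions: the five `Disposition` values mirror the choices named above, while `Promise`, `route`, and the action strings are hypothetical names made up for illustration, not the actual Codex skill.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    # The five choices named in the interview.
    DONE = "done"
    SCHEDULE = "schedule"
    HANDLE = "handle"
    BACKLOG = "backlog"
    SKIP = "skip"


@dataclass
class Promise:
    text: str    # what was committed to
    source: str  # hypothetical field: "email", "slack", "call", ...


# Hypothetical mapping from a disposition to the next action label.
NEXT_ACTION = {
    Disposition.DONE: "close",
    Disposition.SCHEDULE: "create calendar event",
    Disposition.HANDLE: "delegate to agent",
    Disposition.BACKLOG: "add to backlog",
    Disposition.SKIP: "ignore",
}


def route(promise: Promise, disposition: Disposition) -> str:
    """Describe what happens to one flagged promise after review."""
    return f"{NEXT_ACTION[disposition]}: {promise.text}"


if __name__ == "__main__":
    item = Promise(text="add card details for the SEO vendor", source="call")
    print(route(item, Disposition.HANDLE))
```

Keeping the choices in an enum means an unrecognized disposition fails loudly instead of being silently dropped.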

When you added that disposition step, how did that change the usefulness of the workflow? What can Codex actually do after you mark something "schedule it" or "handle this," versus what still requires you to manually jump into another tool?

I almost never jump into another tool. Codex can handle everything I ask it to do. The only time I go into another tool is to tweak the language. If it's replying to an email or a Slack message, I still don't fully trust it to sound like me—even though I have stuff trained on my writing style and voice. I just barely started doing the disposition step this way, only a couple of days ago.

Walk me through one recent item from that list where you told Codex to "handle this." What was the promise, what did Codex do across email or Slack or calendar, and what did you check before letting it go out?

One item was that I had said I was going to visit someone—I'm in a bishopric in an LDS ward—but my schedule filled up. I got reminded of that through this workflow, but I didn't have Codex write the text to them. I just opened Messages myself and wrote to them to reschedule.

Can you give me an example where you did let Codex take the next step itself—like drafting or sending an email, creating a calendar event, updating a backlog, or posting somewhere?

For Boostly, I was working with a vendor on our SEO offering and Codex reminded me that I needed to add card details. Once I had a card ready, I just told Codex to go add those card details to the vendor so they could kick off the trial for me.

Before you let Codex add the card details, what made you comfortable giving it that level of access—and did it ask for confirmation or show you exactly what it was going to do before submitting?

It was pretty careful, since it was related to a credit card. I created a secure area on my computer to store those details. I'm fairly trusting with the frontier models from OpenAI and Anthropic because I think they're incentivized not to screw things up. I wouldn't do this with something like OpenClaw or Hermes—I think it's too possible for those to go off the rails.

When Codex is doing these cross-tool tasks—email, Slack, calendar, vendors, files—how did you wire all that up? What does it actually have access to, and what was the setup process like?

This is probably due to me being an early adopter and putting a lot of work in over time, first with Claude Code and then with Codex, to have everything available. I do use some of Codex's plugins that work pretty nicely out of the box—Vercel and a couple of others. But a lot of stuff I built with custom API connections, because it's important for it to work across so many accounts. For example, I have four email accounts and four Slack accounts and four calendar accounts I wanted it to pay attention to, so I created custom scripts and a CLI with a specific app and API connection for Google Workspace to connect all of those. I've connected my local iMessage, calendar, Chrome, Figma, Linear, a call recording tool called Grain, Telegram, and a whole bunch of Chrome automations. I use Codex's computer use and browser use to log in to different sites as me.

Walk me through one of those custom integrations end to end—maybe Google Workspace or iMessage. What did you have to build, how does Codex invoke it, and what still breaks or needs manual babysitting?

I set up the Google Workspace integration by creating an app in Google Cloud, enabling the Gmail, Calendar, Slides, and all other Google-related APIs, and then connecting all my accounts—my personal one, my day job, and two side projects. I also have a small automation that runs to check whether the connection is still healthy. For a long time I had issues with auth breaking until I figured out how to make the token last much longer. Now it stays pretty well connected.
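
A connection health check like the one mentioned here could look roughly like the sketch below. It is an assumption-heavy illustration: it presumes each account's token metadata is available as a dictionary with an `expires_at` epoch timestamp, and the account names, the `unhealthy_accounts` helper, and the one-day threshold are all hypothetical. It makes no Google API calls.

```python
import time


def unhealthy_accounts(tokens, now=None, min_remaining_s=24 * 3600):
    """Return the accounts whose token is missing or expires too soon.

    `tokens` maps an account name to a dict that may contain `expires_at`
    (epoch seconds). This layout is an assumption for illustration.
    """
    now = time.time() if now is None else now
    flagged = []
    for account, meta in tokens.items():
        expires_at = meta.get("expires_at")
        # No expiry recorded, or less than the required time remaining.
        if expires_at is None or expires_at - now < min_remaining_s:
            flagged.append(account)
    return sorted(flagged)


if __name__ == "__main__":
    current = time.time()
    sample = {
        "personal": {"expires_at": current + 7 * 86400},
        "day_job": {"expires_at": current + 3600},
        "side_project": {},
    }
    print(unhealthy_accounts(sample, now=current))
```

Running a check like this on a schedule surfaces an expiring connection before a downstream automation fails on it.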

When Codex actually uses that Google Workspace integration, is it mostly reading context—like email and calendar—or is it also creating and modifying things, like calendar events, drafts, slides, docs, or sending emails?

It's both—all of the above.

Could you walk me through the last Google Workspace action where it actually changed something—like sent an email, made a calendar event, or created a doc or slide? What did you ask for, what did Codex do, and what did you review before it finalized it?

I have a running thread where I ask it to create or modify calendar events. I basically just use that as my interface for my calendar rather than ever creating anything manually.

Walk me through the last calendar change you made that way. What did you type into Codex, what did it infer from your existing schedule, and did it create the event directly or come back with options first?

It does both. The most recent one: I sent a screenshot of a calendar event and said, "Go find another time for this on Monday or Tuesday—I won't get to it today." It moved the event to a time it thought would work on Monday. I said that wouldn't work because I had to drive to Salt Lake during that time, and then it moved it to another block on Tuesday.

When it picked the first bad time, was that because the drive to Salt Lake wasn't in your calendar, or because it didn't understand travel time? And after you corrected it, does it remember that kind of constraint going forward?

It was because the drive wasn't in my calendar—there's no way it would have known. And it's not great at remembering those constraints going forward.

What kinds of things do you find yourself repeatedly re-explaining to Codex—like locations, preferences, work priorities, writing style, company context—because they don't persist well enough yet?

I sometimes have to repeat higher-level goals for things I'm working on. For example, I'm evaluating options between going full time on a side project, starting a different company, or staying at my current job—and I often have to dump a lot of context into Codex to get it up to speed. It also doesn't infer high-level things that are going on from my meetings, or string things together across meetings. I have to actively push it to do that.

Can you walk me through the last time you dumped that higher-level context in? What were you trying to decide, what did you give Codex, and what kind of output or guidance did it produce?

I was trying to think through what the ideal career outcome would be over the next couple of years. I ended up writing a huge prompt and going back and forth between Codex and Claude Code to get both of their opinions. Even then, it got a little caught up in the details rather than staying focused on high-level execution.

When you went back and forth between Codex and Claude Code for that career decision, what did each one do better? Why not just stay in one tool for the whole thread?

Claude Code is still better at strategic and high-level reasoning—I just get better results there. I've heard people say that Claude Code, and specifically the latest version of Opus, infers your intent better, and I think that's probably true. Codex takes things a bit more literally.

Can you walk me through a task where that split was really clear—where you used Claude Code to think through the plan, then Codex to actually carry it out?

I haven't been using Claude Code as much recently because of the friction of switching, but I have a skill where Codex pings Claude for a second opinion. I do it that way, or sometimes I fully spin up a separate chat in my terminal with Claude Code. That's usually for strategic or personal things—figuring out what I want to do with my career, or doing marketing strategy for my side project, CloudFluent.

Can you walk me through how that "Codex pings Claude for a second opinion" setup works in practice? What does the skill send to Claude, what comes back, and how do you decide whether to follow Claude's advice or keep Codex moving?

I have a skill set up in Codex. I invoke it and say, "Get Claude's second opinion on this." It constructs a prompt based on what we're talking about, sends it over with `claude -p`, the noninteractive mode for Claude, waits for the response, digests it, and gives me suggestions on what to implement and what not to. I'll do this for really big code changes or UX changes when I'm building something.
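
The handoff described in this answer might be sketched as follows. This is a minimal illustration, not the interviewee's skill: the prompt wording and the `build_second_opinion_prompt` and `ask_claude` helpers are hypothetical, and the only detail taken from the interview is that the prompt is passed to `claude -p`. It assumes the `claude` CLI is installed and authenticated.

```python
import subprocess


def build_second_opinion_prompt(topic, context_summary):
    """Assemble a prompt from the current thread (wording is hypothetical)."""
    return (
        "You are giving a second opinion on a plan drafted by another agent.\n"
        f"Topic: {topic}\n"
        f"Context so far:\n{context_summary}\n"
        "List what you would keep, what you would change, and why."
    )


def ask_claude(prompt, run=subprocess.run):
    """Send the prompt through `claude -p` and return the printed response.

    `run` is injectable so the call can be exercised without the real CLI.
    """
    result = run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout.strip()
```

The calling agent would then summarize the returned text and decide which suggestions to adopt, which is the part the interviewee still reviews himself.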

When you use that for a big code or UX change, what does your review process look like before you trust it? What are you checking yourself versus what are you comfortable letting Codex and Claude debate and resolve?

It depends on the issue. If it's a bug, I'm going to verify that it works and also evaluate the architectural decision it's suggesting. If it's a UI or UX change, I'm going to review the actual suggestions or changes it makes. It really just depends on the context.

Walk me through the last big code or UX change where you used Codex, maybe with Claude as a second opinion—what was the change, what did the agents produce, and what did you personally have to fix or overrule before shipping it?

The most recent example was actually that career discussion—it was giving synthesis based on what I had told it, and then I would say, "What does Claude think about this? Go grab that and let me know." Another example was when I was redoing the onboarding flow for CloudFluent, my AI training tool. It was a relatively large change and I wanted Claude to review the plan.

Can you walk me through that CloudFluent onboarding change end to end—what the flow needed to do, what you asked Codex to produce, where Claude reviewed the plan, and what you actually shipped after review?

I'll give you a different example. One was a growth plan for CloudFluent—basically, given all this context, what should I do to grow it? In that case, the Claude connection actually broke. I think I need to set up API billing since Claude Code recently changed how they bill for noninteractive sessions, so that's what happened in the most recent one.

When that Claude handoff broke, what did Codex do? Did it clearly tell you "I can't reach Claude," or did it fail in a more confusing way? And how did that affect the growth-plan workflow?

It clearly told me it couldn't reach Claude. It didn't know why; it just knew that `claude -p` hung and wouldn't return anything.
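
One way a calling script can tell a hang apart from other failures is to put a timeout on the subprocess. The sketch below is a generic illustration under assumptions: `run_with_timeout` and its messages are hypothetical, and nothing here reflects how Codex itself detects the failure.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd` and return (ok, message), separating hangs from errors."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return False, f"no response after {timeout_s}s"
    except FileNotFoundError:
        return False, f"command not found: {cmd[0]}"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}: {result.stderr.strip()}"
    return True, result.stdout.strip()


if __name__ == "__main__":
    # Stand-in command so the demo does not depend on any external CLI.
    print(run_with_timeout([sys.executable, "-c", "print('pong')"], 5))
```

A timeout only reports that nothing came back. The underlying cause, such as a billing or auth change, still has to be diagnosed separately.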

When something like that breaks, do you usually stop and fix the plumbing right away, or route around it? And more broadly, how often do these agent workflows depend on you maintaining custom glue versus the app just working out of the box?

There's a lot of effort that goes into the maintenance and setup because there's a lot of exploration going on—it's not like everything is straightforward. A lot of it is me learning and figuring out what's possible and trying to push the boundary. Things break and need to be tweaked constantly. Something I'm learning is that every skill is still software: it needs to be tested, edge cases need to be figured out, and it needs to be refined. The same goes for automations. I did a scan of all my Codex and Claude Code usage recently, and about fourteen percent of it—measured by chats—was maintaining, fixing, and setting up automation and skill stuff.

Can you walk me through one automation or skill that started out flaky but became genuinely useful after refinement? What broke at first, what tests or guardrails did you add, and what does it reliably do now?

The clearest one is the Google Workspace integration. The auth would break on different accounts at different times, so there was always something broken and I'd have to re-authenticate. It was really frustrating. I also didn't have enough API access and had to go enable a bunch of APIs in Google Cloud Console. Once I made the auth last longer, set up an automation to check it every day, and enabled the APIs I needed, that one calmed down. Another one that started flaky but turned useful is an automation that tries to answer bug and feedback reports filed internally. That one took a lot of back-and-forth prompting to ingest information in the right way, on a good schedule, and then troubleshoot bugs automatically. Frankly, I'm still tweaking that one.

Can you walk me through a recent bug or feedback report it handled—where the report came from, what context Codex pulled in, what it tried to do automatically, and where you still had to step in?

The report came in through Slack and Linear. Our customer success team uses a Linear integration in Slack, and they created a bug report about a customer whose embedded form wasn't getting filled in correctly. My automation picked up that bug and looked into why it was happening. It came back with really good research—it scanned our codebase and our database, and since we had recently done a migration from our legacy system to a new system, it had to figure out that some of the broken behavior was because of a poorly migrated form. It came back with a solid root cause for that specific customer. Where I still had to push it was telling it to find the total impact—which other customers were affected and why. It did the "why" pretty well, but I had to prompt it back and forth to identify the other affected customers. We ended up with a list of around five customers with the issue, which I sent back to the CS reps, since it mostly came down to customers needing to reimplement a new form that couldn't be migrated.

When it sent the list of affected customers back, what did that handoff look like? Did Codex draft the Slack or Linear update itself, and what did you check before sharing it with the customer success team?

It drafted the update, and I did double-check its work—still just through prompting Codex. I asked it why it affected those specific customers and in what time frame. Once I was confident in the list, I copied it from Codex and pasted it into the ticket. I could have had Codex send it directly, but sometimes when it sends messages like that it adds little phrases that don't sound like me.

On that bug workflow, how much of the value was time saved versus getting a deeper impact analysis that might not have happened otherwise?

The value is mostly time saved, because I would just use Codex manually if the automation didn't exist. The analysis would happen either way—the purpose of the automation is to save me time and increase our CS response rate.

For that CS bug workflow, what would need to change for you to let Codex close the loop fully—update Linear, notify CS, maybe even draft customer-facing language—without you reviewing every time?

Probably just giving it more context, and I'm not even sure exactly what context, because there's a lot of nitty-gritty detail about what happened with the migration, why it happened, and what was going on in the legacy codebase. That would be pretty hard to put fully on autopilot. The other thing is that it's just a scheduled automation running through Codex right now. To really hand it off, it would need to be a full agent chatbot so it could respond in real time as CS is asking questions.

If you imagine that full agent chatbot for CS, what would it need access to, and what guardrails would it need before you'd let CS rely on it directly?

It would need read-only access to the production database, access to our codebase—or really just GitHub so it could see recent releases. It would need access to Linear and Slack, and a Boostly brain context layer so it knows who we are, what we're doing, and what's recent. It would also need access to HubSpot, our call recording tool called Ask Elephant, and our texting tool called Quo so it could see exactly what customers have requested. Access to PostHog would be nice too, so it could see recent interactions customers had with the product. The main guardrails: I'd start with internal-facing only, not customer-facing, and keep it mostly read-only until it produced really good output. I'd also build in confidence scoring—if it was highly certain it found a bug, I'd have it write a fix and submit it as a PR, but it wouldn't merge the PR on its own. If confidence was lower, it would just gather information from CS, the database, and the codebase and tee it up for an engineer to review.
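
The confidence-scoring guardrail described in this answer can be sketched as a simple gate. Everything specific below is an assumption for illustration: the `Finding` type, the `plan_action` helper, the 0.0 to 1.0 score, and the 0.9 threshold are hypothetical. What comes from the interview is the policy: high confidence opens a PR that is never auto-merged, and lower confidence only prepares material for an engineer.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    summary: str
    confidence: float  # hypothetical score between 0.0 and 1.0


def plan_action(finding, pr_threshold=0.9):
    """Decide what the agent may do with a suspected bug."""
    if not 0.0 <= finding.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if finding.confidence >= pr_threshold:
        # High confidence: write a fix and open a PR, but never merge it.
        return {"action": "open_pr", "auto_merge": False, "summary": finding.summary}
    # Lower confidence: gather context and hand it to an engineer.
    return {"action": "brief_engineer", "auto_merge": False, "summary": finding.summary}


if __name__ == "__main__":
    print(plan_action(Finding("embedded form fields not mapped after migration", 0.95)))
    print(plan_action(Finding("possible duplicate submissions", 0.4)))
```

Setting `auto_merge` to `False` on both branches keeps a human reviewer in the loop regardless of the score.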

You mentioned a "Boostly brain" context layer. What would actually go into that—docs, recent releases, customer segments, migration notes, internal terminology—and how would you keep it current enough that the agent doesn't reason from stale context?

All of the above, plus strategy and an overarching sense of who Boostly is and what it does. We have some of that in Notion that we're testing out, but it's not fully implemented yet.

Walk me through what happens today when Codex needs that company context. Does it pull from Notion directly, do you paste things in, or do you maintain separate context files or skills for it?

The problem is more of an internal organization issue than the tools not being ready. One of the cofounders has done work along those lines but hasn't fully released it, so everyone does a bit of their own thing. I put stuff in either Notion or just in documents that the agent has access to.

When you put context into docs or Notion for the agent, has that changed how you write or organize information? Are you now documenting things differently because you expect Codex to use it later?

Not really. I do record a lot more things: transcripts, output from Codex or Claude Code, automations, and so on. But the structure hasn't changed dramatically. That's probably an area where I could improve my workflows by having more of it ingested automatically. But things are changing so fast that over-structuring it gets brittle. When I set up a new automation or system, I just point it at the context I want it to look at, rather than relying on standard agent configuration files.

Roughly what share of your workweek touches Codex now, either directly or through automations? And what are the categories of work you still avoid putting into it?

I'd say eighty to ninety percent. There's very little I avoid putting into it, and hardly anything it doesn't touch, because I record everything now. Any in-person or live conversations end up touching Codex. The only real exception is my own writing, which is just an extension of thinking and can't really be automated. But basically everything else goes through Codex.

What changed in your habits to make that possible? What did you have to start recording, routing, or delegating differently so that 80–90% of your week could flow through it?

I'll clarify—I don't do everything *through* Codex. I still check Slack directly. Increasingly I read my email through Codex, but I still use my calendar's own UI because it's just a nice interface. To get to this point, I just had to connect a lot of things over time, and Codex had to get good enough to handle all the connections and context.

Walk me through your morning. When you start the day, what do you open first, what does Codex summarize or prioritize for you, and where do you still go directly into Slack, calendar, or email yourself?

Codex runs a bunch of scans of my email, calendar, Slack, and messages, and comes back with different views on how I should prioritize my day, what's on my schedule, and what's going on with my email. For email, I'll usually just ask it things there, and if there are drafts to write, I'll have it create them and then do final tweaking and send from my actual email client. For Slack, I mostly go straight into Slack because it's just faster to respond there. I also check my calendar manually.

On the email side, when Codex scans and drafts for you, what makes you decide "yes, draft this" versus "I'll just handle it myself"? And can you walk me through the last email thread where Codex's draft actually saved you meaningful time?

The last time it saved me meaningful time was when I needed to ask slightly personalized follow-up questions to a bunch of people who had taken my AI training. I wanted to follow a basic template but pull in specific things each person had asked during the class. So I had Codex create those drafts, iterated on them quite a few times, and then went in and sent them with maybe a small tweak.
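As a rough illustration of the pattern described here (a shared template filled in with something each person asked), a minimal sketch might look like the following. The template wording, the `Attendee` type, and the `build_drafts` helper are all hypothetical; the interview does not show the actual prompts or data Codex worked from.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical template text; the real wording is not given in the interview.
FOLLOW_UP = Template(
    "Hi $name,\n\n"
    "Thanks for joining the AI training. You asked about $question. "
    "Have you had a chance to try it since the class?\n"
)


@dataclass
class Attendee:
    name: str
    question: str  # something this person asked during the class


def build_drafts(attendees: list[Attendee]) -> dict[str, str]:
    """Return one unsent draft per attendee, keyed by name."""
    return {
        a.name: FOLLOW_UP.substitute(name=a.name, question=a.question)
        for a in attendees
    }


# Example data is made up for illustration.
drafts = build_drafts([
    Attendee("Ana", "scheduling automations"),
    Attendee("Raj", "connecting Slack"),
])
```

The drafts stay unsent on purpose: in the workflow described, the final tweak and the send still happen by hand in the email client.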

When you say you had to iterate quite a few times, what was wrong with the first drafts—was it tone, accuracy about what each person asked, the level of personalization, or something else?

Mostly length and tone. But part of that is just how I write anyway—even if I write something from scratch, I'll often iterate on it. It wasn't that it did a terrible job; seeing the draft just gave me ideas on how to improve it.

You've mentioned a few times that you still tweak messages so they sound like you. What have you tried to improve that—writing samples, style guides, examples—and where does Codex still miss your voice?

I have a big skill called the Travis writing skill that summarizes how I should write for messages, Slack, email, LinkedIn, and long-form writing. But AI-sounding elements still creep in—em dashes, "it's not this, it's that" constructions—even though I've instructed it not to do that. It does pretty well on drafts, but it still isn't quite right.
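One way to catch these two patterns before a draft goes out is a small post-check that runs outside the model. The sketch below is an assumption about how such a check could work, not a description of the Travis writing skill itself; `lint_draft` and the regular expression are hypothetical, and the pattern only covers the simplest "it's not X, it's Y" shape.

```python
import re

EM_DASH = "\u2014"
APOS = "['\u2019]"  # straight or curly apostrophe
ITS = rf"(?:it{APOS}s|it is)"

# Rough match for "it's not X, it's Y" within a single sentence.
CONTRAST = re.compile(rf"\b{ITS} not\b[^.!?\n]*?[,;]\s*{ITS}\b", re.IGNORECASE)


def lint_draft(text: str) -> list[str]:
    """Return a list of style issues found in a draft (empty if none)."""
    issues = []
    dash_count = text.count(EM_DASH)
    if dash_count:
        issues.append(f"em dash x{dash_count}")
    for match in CONTRAST.finditer(text):
        issues.append(f"contrast construction: {match.group(0)!r}")
    return issues
```

A check like this only flags surface patterns. It says nothing about length or tone, which were the main reasons for iterating on the drafts.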

When you catch those issues, do you usually fix them manually, or do you feed the correction back into the Travis writing skill? And has that skill actually gotten better over time, or does it keep making the same mistakes?

I do both, and it has gotten better over time, but it also keeps making a lot of mistakes. It feels like we're asymptoting toward what it can do right now.

When you think about what would break through that ceiling, is it better memory of your edits, tighter integration with the actual send surfaces, more controllable style constraints, or something else?

Probably better memory of my edits and more controllable style constraints.

Disclaimers

This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright and intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.
