Voice and Browser Agents Integration

Diving deeper into the interview with David Mlcoch, co-founder & CEO of Asteroid, on browser automation and the last mile problem of AI

Voice agents and browser agents need to work in tandem

The strategic point is that voice AI only replaces labor when it can finish the job inside the system of record. In many industries, the phone call is just the intake step. The real work is logging into an old portal, clicking through brittle forms, and saving the data where the business actually runs. That keeps browser agents important even as voice models get better, because many legacy systems still have no usable API layer.

  • This is why healthcare, insurance, and supply chain show up first. A caller may speak naturally to a voice bot, but the company still depends on an old scheduling, quoting, or claims portal. Asteroid is built around that last mile, using browser agents to sign in, navigate screens, and enter the structured data the call collected.
  • The split in the market is concrete. Voice platforms like Vapi focus on the live conversation layer: speech in, model orchestration, speech out. Browserbase and Stagehand focus on hosted browsers and developer tooling. Asteroid is pushing the workflow layer for non-technical operations teams that need reliable, repeated task execution across legacy web apps.
  • The likely end state is not agent-first software replacing every UI overnight. It is a transition in which APIs handle clean, modern systems and browser agents sit on top of old web software as a compatibility layer. Over time, the human-facing UI becomes less important for routine work because the agent is the main user of the interface.
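
The handoff described in these points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `IntakeRecord`, `TargetSystem`, and the two `submit_via_*` functions are hypothetical names invented here, and they return descriptions of steps rather than calling any real voice platform, API, or browser. The sketch shows the routing idea: use an API when the system of record has one, and fall back to browser-style steps when it does not.

```python
from dataclasses import dataclass, field


@dataclass
class IntakeRecord:
    """Structured data a voice agent collected on a call (hypothetical shape)."""
    caller_name: str
    request_type: str
    details: dict[str, str] = field(default_factory=dict)


@dataclass
class TargetSystem:
    """A system of record; has_api is False for a legacy portal."""
    name: str
    has_api: bool


def submit_via_api(system: TargetSystem, record: IntakeRecord) -> list[str]:
    # Stand-in for a single structured call to a modern system.
    return [f"POST {system.name}/records ({record.request_type})"]


def submit_via_browser(system: TargetSystem, record: IntakeRecord) -> list[str]:
    # Stand-in for the UI steps a browser agent would perform on a legacy portal.
    steps = [f"sign in to {system.name}", f"open {record.request_type} form"]
    steps.append(f"fill caller_name={record.caller_name}")
    steps += [f"fill {key}={value}" for key, value in record.details.items()]
    steps.append("save")
    return steps


def write_to_system_of_record(system: TargetSystem, record: IntakeRecord) -> list[str]:
    # Prefer the API; treat the browser agent as the compatibility layer.
    if system.has_api:
        return submit_via_api(system, record)
    return submit_via_browser(system, record)


if __name__ == "__main__":
    record = IntakeRecord("Jane Doe", "appointment", {"date": "2025-03-04"})
    legacy_portal = TargetSystem("scheduling-portal", has_api=False)
    for step in write_to_system_of_record(legacy_portal, record):
        print(step)
```

The point of the design is that the caller of `write_to_system_of_record` does not need to know which path was taken, so a legacy portal can later be swapped for an API without changing the voice side.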

Going forward, the winning enterprise stack will combine conversation, tool calling, and browser execution into one supervised workflow. The biggest opportunity is in high-volume, repetitive operations where a company already knows the script, but the software underneath is too fragmented or too old to integrate cleanly. That is where end-to-end automation starts to look like a real labor substitute instead of a demo.
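
One way to picture such a supervised workflow is as a short pipeline with a human checkpoint before anything is written. The sketch below is a toy model, not any vendor's implementation: `WorkflowRun`, `REQUIRED_FIELDS`, the "key: value" transcript format, and the `approve` callback are all assumptions made for illustration, and each stage is a stand-in for the real conversation, tool-calling, and browser-execution layers.

```python
from dataclasses import dataclass, field
from typing import Callable

# Fields the downstream form is assumed to require (illustrative only).
REQUIRED_FIELDS = ("name", "policy")


@dataclass
class WorkflowRun:
    """State carried through one end-to-end run (hypothetical shape)."""
    transcript: str
    fields: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)
    status: str = "pending"


def extract_fields(run: WorkflowRun) -> None:
    # Conversation stage stand-in: parse "key: value" pairs separated by ";".
    for part in run.transcript.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            run.fields[key.strip()] = value.strip()
    run.log.append("conversation: fields extracted")


def validate_fields(run: WorkflowRun) -> None:
    # Tool-calling stage stand-in: check that required fields are present.
    missing = [key for key in REQUIRED_FIELDS if key not in run.fields]
    if missing:
        run.status = "needs_human"
        run.log.append(f"tool call: missing {missing}")
    else:
        run.log.append("tool call: fields validated")


def execute_in_browser(run: WorkflowRun) -> None:
    # Browser stage stand-in: a real agent would sign in, fill the form, and save.
    run.log.append(f"browser: entered {len(run.fields)} fields and saved")
    run.status = "completed"


def run_workflow(transcript: str, approve: Callable[[WorkflowRun], bool]) -> WorkflowRun:
    run = WorkflowRun(transcript)
    extract_fields(run)
    validate_fields(run)
    if run.status == "needs_human":
        return run  # escalate instead of writing incomplete data
    if not approve(run):  # supervision checkpoint before any write
        run.status = "rejected"
        run.log.append("supervisor: rejected")
        return run
    run.log.append("supervisor: approved")
    execute_in_browser(run)
    return run
```

A run would be started with something like `run_workflow("name: Jane Doe; policy: A-123", approve=lambda run: True)`, where the lambda stands in for a human reviewer or a policy check. Keeping a `log` on every run is what makes the workflow auditable when the same script is executed thousands of times.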