Hybrid Browser Automation for Back Office
David Mlcoch, co-founder & CEO of Asteroid, on browser automation and the last mile problem of AI
The key bottleneck in browser agents is not whether the model can click. It is whether the workflow is fast enough to trust and cheap enough to repeat. OpenAI launched Operator as a standalone browser product and then folded it into ChatGPT agent, which explicitly combines a visual browser for GUI actions with a text-based browser for faster web work. That shift shows the market moving toward hybrid systems, where HTML-based methods do most of the routine work and vision handles the edge cases.
-
In practice, a vision browser means the model looks at screenshots and moves a cursor like a person. That is flexible, but slower than reading page structure directly. Anthropic also documents computer use as a screenshot-driven tool and notes that its latency may still be too slow for many interactions where a person would otherwise act directly.
-
Asteroid is built around the opposite requirement: repeated back-office jobs in legacy portals. It runs cloud Chromium sessions, uses Playwright at the interaction layer, and mixes DOM understanding with screenshot-based control. The product is designed for insurance, healthcare, and supply chain teams that need the same workflow run reliably at scale, not a one-time consumer task.
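The DOM-first, vision-as-fallback pattern described above can be sketched in a few lines. This is an illustrative sketch only, not Asteroid's implementation: `run_step`, `StepResult`, and the two stub strategies are hypothetical names, and in a real system the DOM strategy would wrap a browser driver such as Playwright while the vision strategy would wrap a screenshot-driven model.

```python
from dataclasses import dataclass
from typing import Callable

# A strategy receives a step name and returns True if it completed the step.
# Both strategies are assumptions for this sketch, not a real product API.
Strategy = Callable[[str], bool]


@dataclass
class StepResult:
    ok: bool
    method: str       # which path handled the step: "dom" or "vision"
    detail: str = ""  # why the fast path was skipped, if it was


def run_step(step: str, dom_action: Strategy, vision_action: Strategy) -> StepResult:
    """Try the fast structured (DOM) path first, then fall back to vision."""
    try:
        if dom_action(step):
            return StepResult(True, "dom")
        reason = "dom found no match"
    except Exception as exc:  # e.g. a selector that no longer exists
        reason = f"dom error: {exc}"

    try:
        if vision_action(step):
            return StepResult(True, "vision", reason)
        reason += "; vision found no match"
    except Exception as exc:
        reason += f"; vision error: {exc}"
    return StepResult(False, "vision", reason)


# Stub strategies standing in for real browser control.
KNOWN_SELECTORS = {"submit_claim"}
VISIBLE_ON_SCREEN = {"submit_claim", "close_legacy_popup"}


def fake_dom(step: str) -> bool:
    if step not in KNOWN_SELECTORS:
        raise LookupError(f"no selector for {step}")
    return True


def fake_vision(step: str) -> bool:
    return step in VISIBLE_ON_SCREEN


if __name__ == "__main__":
    for name in ("submit_claim", "close_legacy_popup", "missing_button"):
        print(name, run_step(name, fake_dom, fake_vision))
```

Recording which path handled each step is the useful part of the design: if routine steps start falling through to the slower vision path, that is a signal the portal changed and the structured selectors need attention.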
-
That creates a clear stack split. Browserbase sells hosted browser infrastructure to developers, while Asteroid packages browser automation for operations teams that would otherwise pay developers or an RPA vendor like UiPath to wire brittle scripts into old web portals. The value lies less in a single smart click than in turning messy human procedures into monitored, repeatable runs.
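To make "monitored, repeatable runs" concrete, here is a minimal sketch of a workflow runner that retries each step and keeps a structured record per step. Everything in it is an assumption for illustration (`run_workflow`, `RunLog`, the record fields, and the retry count are hypothetical); it is not how any named vendor implements supervision.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RunLog:
    workflow: str
    records: List[Dict] = field(default_factory=list)

    @property
    def succeeded(self) -> bool:
        # A run succeeds only if at least one step ran and every step passed.
        return bool(self.records) and all(r["ok"] for r in self.records)


def run_workflow(
    name: str,
    steps: List[Tuple[str, Callable[[], None]]],
    max_attempts: int = 2,
) -> RunLog:
    """Run steps in order, retrying each one, and log the outcome of each."""
    log = RunLog(name)
    for step_name, action in steps:
        ok, error, attempts = False, "", 0
        start = time.monotonic()
        while attempts < max_attempts and not ok:
            attempts += 1
            try:
                action()
                ok, error = True, ""
            except Exception as exc:
                error = str(exc)
        log.records.append({
            "step": step_name,
            "ok": ok,
            "attempts": attempts,
            "error": error,
            "seconds": time.monotonic() - start,
        })
        if not ok:
            break  # stop here and leave the run for human review
    return log
```

A caller would pass in step functions that drive the browser, then inspect `log.records` to see which step failed, how many attempts it took, and how long it ran, which is the kind of audit trail an operations team needs before trusting a run at volume.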
The next phase of browser automation will be hybrid by default and increasingly verticalized. Foundation model companies will keep improving native computer use, but the winners in enterprise workflow automation will be the products that add speed, supervision, logging, and domain-specific playbooks for high-volume tasks inside old systems that still have no API.