Fleet Becomes Core RL Infrastructure
Fleet’s jump shows that the bottleneck in frontier AI has moved from acquiring more text to giving models safe places to practice real work. Labs are buying simulated copies of tools like CRMs and spreadsheets because computer-use models improve by attempting tasks over and over, observing what changed on screen and in the underlying data, then getting scored on whether the result was actually correct. That turns Fleet from a services shop into core post-training infrastructure.
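To make that loop concrete, here is a minimal sketch. It assumes a hypothetical agent with an `act()` method and a resettable environment exposing `reset()`, `step()`, and `grade()` (like the toy one sketched after the next paragraph); the names are illustrative, not any vendor’s API.

```python
def collect_episode(env, agent, max_steps: int = 10):
    """Run one attempt at a task and score it against the end state."""
    task, observation = env.reset()            # fresh, identical starting state
    trajectory = []
    for _ in range(max_steps):
        action = agent.act(task, observation)  # e.g., click/type/update chosen from a screenshot
        trajectory.append((observation, action))
        if action.get("type") == "done":
            break
        observation = env.step(action)         # what changed on screen and in the data
    reward = env.grade()                       # 1.0 only if the result is actually correct
    return trajectory, reward

# Batches of (trajectory, reward) pairs then drive a reinforcement-learning
# update, pushing the model toward action sequences that verifiably finished
# the job rather than ones that merely looked plausible.
```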
The product is not a generic sandbox. Fleet builds resettable versions of real business workflows, where an agent can update a record, move through multiple apps, and be checked against structured state such as a database. That is much closer to the ways enterprise work actually breaks than a simple benchmark or chat prompt.
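A toy sketch of what such a resettable, state-checked environment could look like, matching the loop above. This is an assumption-laden illustration, not Fleet’s implementation: a real environment would wrap a full application and its UI, while this one keeps only the structured state the grader needs.

```python
import sqlite3


class CRMTaskEnv:
    """Toy resettable CRM-style environment graded against database state."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Rebuild the backing store so every episode starts from identical state.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE contacts (name TEXT PRIMARY KEY, stage TEXT)")
        self.db.executemany(
            "INSERT INTO contacts VALUES (?, ?)",
            [("Acme Corp", "lead"), ("Globex", "lead"), ("Initech", "customer")],
        )
        self.db.commit()
        task = "Move Globex from 'lead' to 'customer' without touching other records."
        return task, self._observe()

    def step(self, action: dict):
        # Structured actions for brevity; a computer-use agent would instead
        # click and type against a rendered interface.
        if action.get("type") == "update_stage":
            self.db.execute(
                "UPDATE contacts SET stage = ? WHERE name = ?",
                (action["stage"], action["name"]),
            )
            self.db.commit()
        return self._observe()

    def grade(self) -> float:
        # Score against ground-truth database state, not the chat transcript.
        target = self.db.execute(
            "SELECT stage FROM contacts WHERE name = 'Globex'"
        ).fetchone()[0]
        others_untouched = self.db.execute(
            "SELECT COUNT(*) FROM contacts WHERE name != 'Globex' AND stage = 'lead'"
        ).fetchone()[0] == 1
        return 1.0 if target == "customer" and others_untouched else 0.0

    def _observe(self):
        return self.db.execute("SELECT name, stage FROM contacts").fetchall()
```

A scripted agent that emits `{"type": "update_stage", "name": "Globex", "stage": "customer"}` followed by `{"type": "done"}` would earn a reward of 1.0 under this grader; anything that leaves the database in the wrong state scores 0.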
The closest comparables come from data vendors moving up the stack. Surge AI, Mercor, and Turing can bundle environments into existing lab relationships, while ServiceNow, through WorkArena and BrowserGym, can build a native training and benchmark layer inside software it already owns.
The spending signal lines up with a broader market shift. OpenAI now offers reinforcement fine-tuning with grader-based workflows, Anthropic has shipped computer-use infrastructure, and enterprise software platforms are increasingly framing simulation and hands-on learning as the path to reliable agents.
The next step is for environment vendors to own more of the loop around training, evaluation, and deployment gating. As labs and software companies race to make agents reliable inside real applications, the winning layer will be the one that can supply realistic tasks, trusted scoring, and reusable workflows faster than customers can build them internally.