Agents Control Execution Stack
Cognition
This release shows that coding agents win by controlling the whole execution stack, not just swapping in a smarter model. For Devin and Windsurf, faster tokens matter because the agent is constantly reading files, planning edits, running tests, and retrying failed steps. When Cognition tunes the model, serves it on Cerebras at up to 950 tok/s, and adjusts the agent harness together, the result is shorter task loops and fewer breakdowns in real software workflows.
-
Devin is not a chatbot bolted onto an editor. It opens a dev environment in Windsurf, modifies code, runs tests, iterates until checks pass, and opens pull requests. In that workflow, reliability comes from how the agent sequences tools and recovers from errors, not only from benchmark quality.
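That read-edit-test-retry sequence can be sketched as a simple loop. This is a toy illustration of the pattern, not Cognition's actual harness; `generate_patch` and `run_tests` are hypothetical callables standing in for the model call and the test runner.

```python
def run_task(task, generate_patch, run_tests, max_attempts=5):
    """Toy agent loop: propose an edit, run checks, retry until they pass.

    generate_patch and run_tests are hypothetical stand-ins for the
    model call and the test harness; this sketches the control flow only.
    """
    history = []
    for attempt in range(1, max_attempts + 1):
        # Model call: read context plus prior failures, emit a candidate edit.
        patch = generate_patch(task, history)
        # Execute checks against the candidate edit.
        passed, log = run_tests(patch)
        history.append((patch, log))
        if passed:
            # A passing run is where a real agent would open a pull request.
            return {"status": "pass", "attempts": attempt, "patch": patch}
    return {"status": "fail", "attempts": max_attempts}
```

The point of the sketch is that every retry multiplies the cost of a model call, which is why per-step token speed compounds across a whole task.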
-
The Cerebras tie-in explains the speed claim in practical terms. At 950+ tokens per second, Cognition can use a frontier-size in-house model for high-volume codebase reading and generation without paying frontier-model latency on every step, which improves both responsiveness and gross margin.
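The latency math is simple to make concrete. Assuming a 2,000-token generation step (an illustrative figure, not from the source), the per-step wall-clock cost at the stated 950 tok/s versus a slower endpoint looks like this:

```python
def step_latency_s(tokens_per_step: float, tokens_per_s: float) -> float:
    """Seconds spent generating one agent step at a given throughput."""
    return tokens_per_step / tokens_per_s

# Illustrative 2,000-token step (assumed size, for comparison only):
fast = step_latency_s(2_000, 950)   # ~2.1 s at 950 tok/s
slow = step_latency_s(2_000, 100)   # 20.0 s at 100 tok/s
```

Multiplied across dozens of read-edit-test iterations per task, that gap is the difference between an interactive agent and one users abandon.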
-
This is also a competitive response to the broader IDE market. Cursor has made agent mode, parallel agents, terminal access, and web search central to its product, while Windsurf had already reached $40M ARR by February 2025. The market is shifting from autocomplete to full task execution inside the IDE.
The next phase is tighter vertical integration, where the best coding products train models on their own editor and agent data, run them on optimized inference infrastructure, and ship them inside proprietary workflows. That should push Devin and Windsurf toward faster multi-step execution, lower unit costs, and more enterprise-grade automation inside daily software development.