Cerebras Growth Fueled by Coding Agents
OpenAI's side chip
Cerebras’s shift from selling a few $2M machines to renting speed by the token turned a narrow hardware business into an infrastructure supplier for AI coding products. In 2023, revenue was still tied to big hardware deals with labs and enterprises. By 2025, Cerebras Cloud reached $152M, or 30% of revenue, because coding agents need answers almost instantly while they loop through edits, tests, and subagents across a codebase.
-
The old business was lumpy and concentrated. Cerebras sold wafer-scale systems into national labs, pharma, and energy customers, with long sales cycles and very large deal sizes. Moving to an API replaced occasional box sales with continuous usage revenue from software companies and enterprises.
-
Coding agents are an especially good fit because latency changes the product itself. A model that streams at roughly 950 to over 1,000 tokens per second feels interactive enough for rapid retries, codebase scans, and parallel subagents, which is why coding products like Cognition, Windsurf, and Codex became meaningful demand drivers.
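The arithmetic behind this is simple but compounds quickly: an agent run is many sequential generation calls, so decode speed multiplies across the whole loop. A minimal sketch, where the step count, tokens per step, and throughput figures are hypothetical illustration values, not Cerebras or vendor numbers:

```python
# Illustrative arithmetic: decode speed compounds across a sequential agent loop.
# STEPS, TOKENS_PER_STEP, and the throughput values are assumed for illustration.

STEPS = 20             # sequential model calls in one agent run (edits, tests, retries)
TOKENS_PER_STEP = 500  # tokens generated per call

def loop_seconds(tokens_per_second: float) -> float:
    """Total generation time for one agent run at a given decode speed."""
    return STEPS * TOKENS_PER_STEP / tokens_per_second

for tps in (60, 1000):
    print(f"{tps:>5} tok/s -> {loop_seconds(tps):6.1f} s per run")
# At 60 tok/s the run takes ~167 s; at 1,000 tok/s it takes 10 s.
```

The point is that the same workload moves from "walk away and wait" to "watch it finish", which is the user-experience difference the paragraph above describes.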
-
This also explains why Cerebras is positioned differently from Nvidia. Nvidia mostly reaches inference demand through GPU clouds, while Cerebras is selling a more opinionated stack aimed at the small but growing slice of workloads where every second matters and faster responses let customers improve both user experience and gross margin.
-
The next leg is less about winning generic AI compute and more about becoming the default backend for agentic software. As coding tools, research agents, and reasoning workflows make many more short inference calls, the vendors that can turn low latency into a visibly better product will send a growing share of spend toward specialized inference platforms like Cerebras.