Cerebras secures multiyear OpenAI inference agreement
Cerebras
This deal makes Cerebras more than a niche chip vendor: it makes the company part of OpenAI’s production inference stack. The important shift is from selling a few $2M systems to labs toward selling always-on compute capacity for live products. OpenAI is using Cerebras for a latency-first tier where speed matters more than standardizing entirely on Nvidia, and the agreement runs long enough to underwrite major infrastructure buildout through 2028.
- The agreement covers 750 megawatts of inference capacity, delivered in phases through 2028, and external reporting pegged its value at more than $10B. That is infrastructure procurement at cloud scale, not an experimental hardware purchase, and it gives Cerebras multiyear revenue visibility and a much larger deployment footprint.
- OpenAI has already put a model into production on Cerebras chips, GPT-5.3-Codex-Spark, as a low-latency serving tier. That fits Cerebras’ core advantage: keeping more of the model on a single wafer-scale chip so responses arrive fast enough for coding agents and other interactive workflows where delay breaks the product experience (see the latency sketch after this list).
- The broader context is OpenAI diversifying away from a single supplier. Internal research also shows OpenAI pairing Cerebras with AMD, Broadcom, and Arm-related efforts while keeping GPUs central. Cerebras is therefore not replacing Nvidia across the stack; it is winning the slice of inference where ultra-fast token generation changes unit economics and user experience.
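Why token speed is a product constraint becomes clearer with rough arithmetic. Here is a minimal Python sketch; every number in it (call counts, token counts, generation rates, per-call overhead) is an illustrative assumption, not a figure from the agreement or from either company.

```python
# Illustrative only: how per-token generation speed compounds across an
# agentic coding loop. All numbers are assumptions for the sketch.

def loop_latency_s(model_calls: int, tokens_per_call: int,
                   tokens_per_s: float, overhead_s: float = 0.3) -> float:
    """Total wall-clock time an agent spends waiting on the model."""
    per_call = tokens_per_call / tokens_per_s + overhead_s
    return model_calls * per_call

# A coding agent making 25 model calls of ~400 output tokens each:
for rate in (100, 1000):  # assumed tok/s for a GPU tier vs. a fast tier
    total = loop_latency_s(model_calls=25, tokens_per_call=400,
                           tokens_per_s=rate)
    print(f"{rate:>5} tok/s -> {total:6.1f} s of model wait time")
```

Under these assumptions the same agent waits about 108 seconds per task at 100 tok/s but about 18 seconds at 1,000 tok/s. The model does not change; the serving tier alone decides whether the workflow feels interactive, which is the argument for a dedicated fast lane.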
Going forward, the biggest consequence is that inference infrastructure will split into tiers. Nvidia remains the general-purpose default, but Cerebras is positioned to own the premium low-latency lane for coding, agentic, and real-time products. If that lane keeps growing, Cerebras can evolve from a hardware company with concentrated buyers into a scaled inference utility for frontier-model labs.
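If that tier split materializes, a serving stack would express it as a routing decision. The sketch below is purely hypothetical: the tier names, workload labels, Request shape, and the one-second threshold are invented for illustration and describe no actual OpenAI or Cerebras system.

```python
# Hypothetical sketch of tiered inference routing: interactive,
# latency-sensitive traffic goes to a fast lane, everything else to
# the general-purpose default. All names and thresholds are invented.

from dataclasses import dataclass

@dataclass
class Request:
    workload: str           # e.g. "coding_agent", "batch_summarize"
    latency_budget_ms: int  # response delay the product can tolerate

LOW_LATENCY_TIER = "wafer-scale-lane"   # premium, Cerebras-style
GENERAL_TIER = "gpu-default-lane"       # general-purpose, Nvidia-style

INTERACTIVE_WORKLOADS = {"coding_agent", "realtime_voice", "autocomplete"}

def pick_tier(req: Request) -> str:
    """Send sub-second or interactive work to the fast lane."""
    if req.workload in INTERACTIVE_WORKLOADS or req.latency_budget_ms < 1000:
        return LOW_LATENCY_TIER
    return GENERAL_TIER

print(pick_tier(Request("coding_agent", 500)))       # wafer-scale-lane
print(pick_tier(Request("batch_summarize", 60000)))  # gpu-default-lane
```

The design point is that routing, not hardware replacement, is how a premium lane would coexist with the GPU default inside one product.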