API inference broadens Cerebras' customer base
Cerebras vs Nvidia
This shifts Cerebras from selling a few giant machines to selling a fast utility that scales with customer traffic. A national lab might buy one system after months of procurement, but an API lets a startup plug in the same day, test one workflow, and ramp usage as token volume grows. Demand then depends less on a handful of big hardware deals and more on everyday product usage in coding, search, and enterprise AI apps.
-
The old business was concentrated and slow. Cerebras disclosed that G42 made up about 83% of 2023 revenue, and the company had historically sold systems priced around $2M into labs and specialized buyers. API inference broadens the buyer base from procurement-driven institutions to software teams that buy with a credit card or a lightweight enterprise contract.
-
The new buyer cares less about chip architecture and more about what shows up in the app. Coding products like Cognition and Windsurf can run fine-tuned open models at roughly 950+ tokens per second, which means autocomplete, code edits, and agent steps return fast enough to feel interactive, while also costing less than sending every task to a frontier model.
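The interactivity claim is simple arithmetic: response time is output length divided by decode rate. A minimal sketch, where the ~950 tokens/sec figure comes from the text but the GPU baseline and response size are illustrative assumptions:

```python
# Rough latency sketch: how decode speed changes perceived responsiveness.
# Only the ~950 tok/s figure is from the text; the GPU baseline rate and
# the 300-token response size are illustrative assumptions.

def response_time(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream a response of `tokens` at a given decode rate."""
    return tokens / tokens_per_sec

# A typical agent step or code edit might emit a few hundred tokens.
for label, rate in [("~950 tok/s", 950.0), ("assumed GPU baseline, ~80 tok/s", 80.0)]:
    t = response_time(300, rate)
    print(f"{label}: 300-token response in {t:.2f}s")
```

At these assumed rates, the same 300-token edit feels instant (~0.3s) at 950 tok/s and sluggish (~3.8s) at the baseline, which is the behavioral gap the paragraph describes.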
-
This is the same go-to-market logic that helped inference platforms like Together AI expand beyond raw GPU rental. Usage-priced APIs pull in many smaller customers, then expand with workload volume. In that model, the valuable asset is not just the chip; it is the ability to turn low latency into repeatable token spend across many applications.
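The usage-priced model above can be made concrete with back-of-envelope numbers. In this sketch, only the ~$2M system price point comes from the text; the API price, per-customer token volume, and customer count are illustrative assumptions, not disclosed figures:

```python
# Illustrative unit economics: one hardware deal vs usage-priced token spend.
# Only the ~$2M system price is from the text; everything else is assumed.

system_sale = 2_000_000  # one-time hardware deal, dollars (from the text)

price_per_million_tokens = 0.60               # assumed blended API price, dollars
tokens_per_customer_per_month = 2_000_000_000  # assumed heavy app workload
customers = 200                                # assumed API customer count

monthly_api_revenue = (
    customers * tokens_per_customer_per_month / 1_000_000 * price_per_million_tokens
)
print(f"Monthly API revenue: ${monthly_api_revenue:,.0f}")
print(f"Months to match one system sale: {system_sale / monthly_api_revenue:.1f}")
```

Under these assumptions the API book matches a single system sale within months, and unlike the one-time sale it compounds as each customer's token volume grows.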
Going forward, the center of gravity moves toward latency sensitive AI products where speed changes user behavior and unit economics. If Cerebras keeps turning hardware advantages into easy API adoption, it can carve out a durable slice of inference even while Nvidia remains the default training and general purpose compute platform.