Cerebras Inference API Drives Growth

Cerebras launched a cloud inference API in summer 2024 that has since become the primary growth driver.

This marks Cerebras's shift from a lumpy hardware vendor to a usage-based AI infrastructure company. Selling $2M systems to a handful of labs meant long procurement cycles and concentrated revenue. Launching Cerebras Inference in August 2024 turned the product into an API that developers could wire into an app in hours and pay for per token, which made speed itself the product and opened a far broader buyer base.
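As a minimal sketch of what "wire into an app and pay per token" looks like from a developer's side, assuming an OpenAI-compatible chat-completions endpoint (the base URL, model name, and price used here are illustrative placeholders, not Cerebras's documented values):

```python
import json

# Hypothetical OpenAI-compatible endpoint; the URL and model name are
# illustrative assumptions, not documented Cerebras values.
BASE_URL = "https://api.example-inference.com/v1/chat/completions"

def build_request(prompt: str, model: str = "llama-3.3-70b") -> dict:
    """Build the JSON payload a developer would POST to a
    chat-completions-style inference API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

def usage_cost(usage: dict, price_per_million_tokens: float) -> float:
    """Per-token billing: the bill scales with tokens consumed,
    not with owning and running hardware."""
    total = usage["prompt_tokens"] + usage["completion_tokens"]
    return total / 1_000_000 * price_per_million_tokens

payload = build_request("Summarize this ticket.")
print(json.dumps(payload, indent=2))

# 2,000 tokens at an assumed $0.60 per million tokens.
cost = usage_cost({"prompt_tokens": 800, "completion_tokens": 1200}, 0.60)
print(f"${cost:.4f}")
```

The point of the sketch is the business-model contrast: the buyer's commitment is a JSON request and a metered bill, not a procurement cycle.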

  • The launch timing matters because, before the shift, the revenue base was still heavily tied to a few large hardware buyers. Cerebras had an estimated $78.7M of 2023 revenue, with G42 accounting for about 83 percent of that year's total, so an API business gave it a path away from single-customer dependence.
  • The product changed how money flows. Instead of selling a machine and waiting for the next procurement cycle, Cerebras now charges for inference usage, similar to model APIs. That fits customers such as Perplexity, Notion, Windsurf, and Cognition, which need fast responses but do not want to buy and run specialized hardware themselves.
  • The strongest proof point is in coding agents, where latency directly affects user experience and gross margin. Cerebras says Cognition and Windsurf run fine-tuned open models at 950-plus tokens per second, fast enough to route some tasks off expensive frontier models while keeping the interaction feeling immediate.
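A back-of-the-envelope comparison shows why that throughput figure translates directly into felt latency. The 950 tokens-per-second rate is the figure cited above; the slower rate is an assumed ballpark for a typical frontier-model API, not a measured one:

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream a completion at a given decode rate."""
    return tokens / tokens_per_second

# A 400-token coding-agent step at the cited 950 tok/s
# versus an assumed ~60 tok/s for a frontier-model API.
fast = generation_seconds(400, 950)  # under half a second: feels immediate
slow = generation_seconds(400, 60)   # several seconds: a visible wait

print(f"{fast:.2f}s vs {slow:.2f}s")
```

An agent loop runs many such steps back to back, so the gap compounds: what is a pause per step on a slow backend becomes minutes over a full task.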

The next step is that inference clouds like Cerebras will compete less on raw benchmark bragging rights and more on owning latency sensitive workloads such as coding, search, and agent loops. If Cerebras keeps turning hardware advantages into easy API adoption, it can carve out a durable layer beside Nvidia rather than trying to replace it head on.