H100 Pricing Drives Cloud Split
Samiur Rahman, CEO of Heyday, on building a production-grade AI stack
This pricing gap is why GPU clouds like CoreWeave broke open the market. For a company like Heyday, H100s are not a nice-to-have expense; they are the machines that run fine-tuning and model serving. If AWS charges roughly $3.93 per H100-hour on p5 instances while CoreWeave has been quoted around $2 to $4 per H100-hour depending on configuration, the hyperscaler premium is large enough to outweigh AWS reliability for many startup workloads.
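To make that premium concrete, here is a back-of-the-envelope sketch using the rates quoted above. The job size (64 GPUs held for 72 hours) and the $2.50 midpoint for CoreWeave are illustrative assumptions, not figures from either provider.

```python
# Back-of-the-envelope hyperscaler premium for a fine-tuning run.
# Rates come from the quotes in the text; the 64-GPU, 72-hour job
# is a hypothetical workload for illustration.
AWS_P5_RATE = 3.93       # $/H100-hour, approximate on-demand p5 rate
COREWEAVE_RATE = 2.50    # $/H100-hour, midpoint of the quoted $2-$4 range

def run_cost(rate_per_gpu_hour: float, gpus: int, hours: float) -> float:
    """Total cost of a job that holds `gpus` H100s for `hours`."""
    return rate_per_gpu_hour * gpus * hours

gpus, hours = 64, 72
aws = run_cost(AWS_P5_RATE, gpus, hours)
cw = run_cost(COREWEAVE_RATE, gpus, hours)
print(f"AWS p5:    ${aws:,.0f}")
print(f"CoreWeave: ${cw:,.0f}")
print(f"Premium:   {aws / cw - 1:.0%}")
```

At these assumed rates the same 4,608 GPU-hour run costs roughly half again as much on the hyperscaler, which is the kind of line-item gap that moves training workloads.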
-
The practical split is cheap training on NeoClouds, reliable inference on AWS. Another ML team described AWS as the place to pay the "AWS tax" for mature deployment, while Lambda and CoreWeave won training clusters on lower per-GPU pricing and a greater willingness to customize interconnect and cluster setup.
-
CoreWeave did not win by having a different H100. It won by wrapping H100s in production features that looked enough like AWS, including Kubernetes, autoscaling, networking, and public APIs, while still undercutting hyperscaler pricing. That let teams move Docker workloads over without rebuilding their stack.
-
Groq points to the next step in the market. Instead of renting the same Nvidia GPU more cheaply, it sells inference on custom chips through a token-priced API. That matters because once buyers care more about tokens per second and cost per token than raw GPU access, Nvidia cloud pricing becomes less central.
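The shift from GPU-hour pricing to token pricing can be sketched as a unit-economics conversion. The throughput figure and API rate below are hypothetical placeholders chosen for illustration, not quotes from Groq or any provider.

```python
# Sketch of the unit-economics shift: at serving scale, buyers compare
# cost per token, not the GPU-hour rate. All numbers are assumptions.

def cost_per_million_tokens(gpu_rate_per_hour: float,
                            tokens_per_second: float) -> float:
    """Effective $ per 1M output tokens for a GPU rented by the hour."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_rate_per_hour / tokens_per_hour * 1_000_000

# A rented H100 at $2.50/hr sustaining 1,000 tok/s (assumed figures):
rented = cost_per_million_tokens(2.50, 1_000)

# A token-priced API quoting $0.60 per 1M tokens (assumed figure):
api = 0.60

print(f"Rented GPU: ${rented:.2f} per 1M tokens")
print(f"Token API:  ${api:.2f} per 1M tokens")
```

Once the comparison is framed this way, the underlying silicon stops mattering to the buyer: a custom chip that delivers more tokens per second at a competitive per-token price wins even if it never shows up in a GPU-hour price list.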
The market is heading toward a clean split. Hyperscalers will keep the most reliability-sensitive workloads, NeoClouds will keep winning whenever GPU-hours are compared line by line, and specialized inference clouds like Groq will peel off the highest-volume serving jobs if they keep delivering much faster output at competitive unit cost.