GPU Shortages Threaten Baseten Scaling

Shortages of H100/B200 GPUs could restrict the company's ability to scale customer workloads and sustain competitive pricing.

GPU scarcity matters because Baseten sells reliability and speed, not just raw model hosting. If H100 and B200 capacity tightens, the immediate problem is not an abstract supply chain risk. It shows up as slower customer onboarding, less room to absorb traffic spikes, and less ability to keep per-minute pricing low while still protecting margin. That pressure is sharper for an inference platform, because enterprise customers expect endpoints to come up fast and stay available during unpredictable bursts.

  • The market is still concentrated around a small number of premium NVIDIA SKUs. In an interview, one cloud GPU customer described the market as being mainly about how many H100s or B200s are on offer, and said reserved capacity can take 3 to 6 months to stand up. That lead time is manageable for planned clusters but painful for an inference platform that needs to add capacity quickly.
  • Baseten does have a pricing ladder from older T4s up to B200s, which helps it route some workloads to cheaper hardware. But the highest-value inference use cases (larger models, lower latency targets, and memory-hungry deployments) increasingly need top-tier GPUs. When those are scarce, Baseten must either pay more for supply, queue customers, or steer workloads onto less suitable chips and accept lower performance.
  • Larger GPU clouds can turn hardware access into a competitive weapon. CoreWeave had B200 instances generally available by May 29, 2025, and AWS announced general availability of EC2 P6 B200 instances on May 15, 2025. Providers with earlier or broader Blackwell access can win customers simply by being able to say yes faster, while also spreading their fixed infrastructure costs across more volume.

Going forward, the winners in inference will be the companies that pair good developer tooling with the deepest and most flexible hardware supply. That pushes Baseten toward securing more durable reserved capacity, broadening workload support across more GPU types, and using software scheduling to squeeze more output from every scarce H100 or B200 before hyperscalers and larger GPU clouds turn hardware access into pricing power.