RunPod Undercuts Serverless API Pricing

RunPod's sub-$0.001 per second pricing puts pressure on serverless API margins across the industry.

This pricing gap means serverless inference is turning into a scale business in which simple model hosting is no longer very profitable. Segmind charges about $0.0108 per GPU-second on its serverless API tier, while RunPod now lists several serverless GPUs below $0.001 per second, including the A100 at $0.00076 and 16GB GPUs at $0.00016, roughly a 14x gap at the A100 rate. That leaves room for infrastructure buyers to build their own endpoints much more cheaply, especially for standardized open-source models.
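As a rough illustration of that gap, the back-of-the-envelope sketch below applies only the per-second rates quoted above; the monthly workload size is a hypothetical assumption for illustration, not a figure from the report.

    # Rough monthly cost comparison using the per-GPU-second rates quoted above.
    # The workload size below is a hypothetical assumption, not a reported number.
    SEGMIND_PER_SEC = 0.0108        # Segmind serverless API tier, $/GPU-second
    RUNPOD_A100_PER_SEC = 0.00076   # RunPod serverless A100, $/second
    RUNPOD_16GB_PER_SEC = 0.00016   # RunPod serverless 16GB GPU, $/second

    monthly_gpu_seconds = 2_000_000  # hypothetical: roughly 23 GPU-days of busy workers

    for name, rate in [("Segmind", SEGMIND_PER_SEC),
                       ("RunPod A100", RUNPOD_A100_PER_SEC),
                       ("RunPod 16GB", RUNPOD_16GB_PER_SEC)]:
        print(f"{name:12s} ${rate * monthly_gpu_seconds:>10,.2f} / month")

At the hypothetical volume above, the same GPU-seconds cost about $21,600 on the Segmind rate versus about $1,520 on the RunPod A100 rate, which is the margin pressure the report describes.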

  • The workflow difference matters. On Segmind, a developer picks from a catalog of 150-plus hosted models and calls a single API. On RunPod, the team packages its own model in a container, gets a REST endpoint, and pays only while a worker is running (see the sketch after this list). That makes RunPod attractive for teams that already know which model they want and care most about unit cost.
  • The interview evidence shows why this pressure is real in practice. Segmind uses RunPod serverless for both inference and fine-tuning, monitors per-second pricing closely, and chose RunPod partly because it was among the cheapest options with a broad GPU selection. The team also found RunPod easier than Modal for endpoint monitoring.
  • Modal sits at the opposite end of the market. It offers B200 and H200 GPUs serverlessly, plus multi-node clusters with high-speed interconnect, which is a better fit for very large models and high-performance training. That supports premium dedicated workloads, but it does not remove the low-end pricing pressure that RunPod creates for commodity inference.
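The sketch below contrasts the two call patterns from the first bullet. The endpoint URLs, payload fields, and model name are illustrative assumptions rather than exact API specifications; the point is that Segmind exposes a ready-made hosted model, while a RunPod serverless endpoint fronts a container the team packaged itself.

    import requests

    # Segmind-style call: pick a hosted model and send a prompt.
    # (URL, header, and payload fields are assumptions for illustration.)
    segmind_resp = requests.post(
        "https://api.segmind.com/v1/sdxl1.0-txt2img",
        headers={"x-api-key": "SEGMIND_API_KEY"},
        json={"prompt": "a photo of a red bicycle"},
    )

    # RunPod-style call: the endpoint fronts a container you packaged yourself,
    # so the "input" payload is whatever your own handler expects.
    # (Endpoint ID and payload are assumptions for illustration.)
    runpod_resp = requests.post(
        "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/runsync",
        headers={"Authorization": "Bearer RUNPOD_API_KEY"},
        json={"input": {"prompt": "a photo of a red bicycle"}},
    )

    print(segmind_resp.status_code, runpod_resp.status_code)

The trade-off is visible in the payloads: the Segmind call is model-specific but requires no setup, while the RunPod call is generic and cheap per second but assumes the team has already built and deployed its own worker image.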

The next step is a split market. Low-complexity inference will keep getting cheaper and move toward raw infrastructure pricing, while companies like Segmind will have to win with packaged workflows, tuned vertical models, and tools like PixelFlow that save customers setup work instead of just reselling GPU time.