Cloud Providers Subsidize AI Inference

Hyperscale cloud providers can offer AI inference at or below cost, as a loss leader, to promote adoption of their broader cloud offerings.

The key advantage hyperscalers have is that inference does not need to make money on its own. AWS, Google Cloud, and Azure can treat model serving as one more feature inside a much larger cloud contract, where the real profit may come from storage, databases, networking, security, and long-term spend commitments. That makes it hard for a standalone platform built on GPU markup to win purely on price, especially with enterprise buyers that already run inside those clouds.

  • Replicate bills for GPU usage and earns a markup on the underlying compute. That works best when customers value simplicity, model discovery, and easy packaging through Cog. It is more exposed when cloud vendors use AI endpoints to deepen broader account relationships rather than to maximize margin on each inference call.
  • The bundling is concrete, not theoretical. Google lets Vertex AI online prediction usage draw on committed use discounts, Azure lets AI services charges use Azure prepayment credits, and AWS Bedrock offers lower pricing for batch and commitment-based usage. A customer already committed to a cloud can see AI inference as discounted incremental spend.
  • Other specialists face the same pressure, but the stronger ones offset it with enterprise controls or performance gains. Baseten leans on compliance and multi-cloud routing, while Fireworks leans on custom inference optimization and enterprise deployment options. Replicate remains strongest where fast self-serve access to thousands of open models matters more than procurement standardization.
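
The pricing pressure described above comes down to simple arithmetic. The sketch below is illustrative only: every figure in it (GPU cost, markup, list price, discount) is a hypothetical assumption, not taken from any provider's price list, and the function names are invented for this example. It compares a standalone platform's marked-up GPU price with a cloud list price after a commitment discount, and computes the discount at which the bundled price matches the standalone one.

```python
def standalone_price(gpu_cost_per_hour: float, markup: float) -> float:
    """Hourly price when a platform resells GPU time at a fractional markup."""
    return gpu_cost_per_hour * (1.0 + markup)


def bundled_price(list_price_per_hour: float, commitment_discount: float) -> float:
    """Hourly price when usage draws on a committed-spend discount."""
    return list_price_per_hour * (1.0 - commitment_discount)


def break_even_discount(list_price_per_hour: float, standalone_per_hour: float) -> float:
    """Smallest discount at which the bundled price matches the standalone price."""
    return max(0.0, 1.0 - standalone_per_hour / list_price_per_hour)


# Hypothetical inputs, chosen only to illustrate the comparison.
GPU_COST = 2.00          # assumed underlying GPU cost per hour (USD)
MARKUP = 0.30            # assumed standalone markup (30%)
CLOUD_LIST_PRICE = 3.00  # assumed hyperscaler list price per hour (USD)
DISCOUNT = 0.20          # assumed commitment discount (20%)

standalone = standalone_price(GPU_COST, MARKUP)
bundled = bundled_price(CLOUD_LIST_PRICE, DISCOUNT)
threshold = break_even_discount(CLOUD_LIST_PRICE, standalone)

print(f"standalone: ${standalone:.2f}/h")
print(f"bundled:    ${bundled:.2f}/h")
print(f"break-even discount: {threshold:.1%}")
```

Under these assumed numbers, the cloud's higher list price still ends up cheaper once the commitment discount applies, which is the sense in which a committed customer sees inference as discounted incremental spend. With different inputs the comparison can go the other way.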

This is pushing the market toward a split. Hyperscalers will absorb the default enterprise workloads that fit neatly inside existing cloud estates, while independents will need to own a sharper wedge, such as the best developer workflow, better open model coverage, or meaningfully better speed and cost on specific workloads. For Replicate, the path forward is to become more than a convenient GPU reseller.