CoreWeave outages compared to AWS

Samiur Rahman, CEO of Heyday, on building a production-grade AI stack

Interview
CoreWeave’s definitely had more outages than AWS.
Analyzed 6 sources

This is the core tradeoff that created the NeoCloud category: cheaper AI compute now in exchange for less battle-tested uptime. Heyday was willing to run all of its ML compute on CoreWeave because it offered AWS-like Kubernetes workflows, public endpoints, VPC support, and autoscaling at materially lower cost than AWS. But when inference failures hit, the product breaks in visible ways, which is why AWS still carries a reliability premium for production workloads.

  • For Heyday, the difference was not raw model speed: H100s on CoreWeave and Lambda performed comparably. The real difference was operational maturity. CoreWeave let the team move Docker- and Kubernetes-based workloads over from AWS with minimal code changes, while Lambda was kept mostly for cheaper experiments and training runs.
  • A second customer described the same market split more explicitly. NeoClouds like CoreWeave and Lambda were cheaper and more configurable for training, sometimes by roughly 2x per GPU-hour, while AWS was used for inference because teams expected mature storage, programmable infrastructure, and higher uptime.
  • The strategic point for CoreWeave is that every outage matters more than it does for AWS. AWS has formal SLAs across core services like EC2 and S3, and customers already treat it as default infrastructure. CoreWeave is building toward that standard, but its wedge has been AI-specific GPUs plus AWS-like tooling, not AWS-level trust yet.
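The portability claim in the first bullet can be made concrete: a standard GPU workload spec runs on either provider's managed Kubernetes with little more than cluster-specific tweaks, which is why migration needs few code changes. A minimal sketch, with hypothetical names (the deployment name, image, and registry below are illustrative, not Heyday's actual config):

```yaml
# Hypothetical inference Deployment; the manifest is the same on AWS (EKS)
# and CoreWeave's Kubernetes, apart from cluster-specific details such as
# the container registry and any node labels used for GPU scheduling.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server            # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
    spec:
      containers:
        - name: model
          image: registry.example.com/model-server:latest  # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: 1     # same GPU resource API on both clouds
```

Because the Kubernetes API surface is shared, the switching cost sits mostly in cluster setup and CI plumbing rather than application code, which is what makes the AWS-to-NeoCloud move low-friction in the first place.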

Going forward, CoreWeave’s next leg of competition is less about securing GPU supply and more about proving that an AI-native cloud can be as boring and dependable as traditional cloud. If it closes that trust gap while keeping its cost edge, it becomes much harder for startups to justify moving inference back to AWS.