Unbundling Training and Inference Clouds
A Lambda customer at Iambic Therapeutics on GPU infrastructure choices for ML training and inference
This split shows that the GPU cloud is not one market but two jobs with different winners. Training rewards cheap reserved clusters with strong interconnect; deployment rewards boring reliability, on-demand capacity, and infrastructure that can be scripted, versioned, and kept up around the clock. At Iambic, training on Lambda runs roughly $500K to $1M per month, while AWS inference spend is far smaller and valued more for uptime and tooling than raw GPU price.
For training, the deciding factor was not just cheaper H100 hours but a fixed cluster with the right InfiniBand-style networking and hardware layout. Iambic evaluated Lambda, CoreWeave, AWS, and Oracle, and found the neo clouds more willing to customize the cluster while still coming in cheaper.
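A quick back-of-envelope shows what a bill in that range buys in reserved H100 capacity. The per-GPU-hour rate below is an illustrative assumption, not a quoted price from Lambda or any provider:

```python
# Back-of-envelope: implied H100 cluster size for a given monthly spend.
# ASSUMED_RATE is a hypothetical reserved price, for illustration only.
ASSUMED_RATE = 2.50      # assumed $/H100-hour, reserved
HOURS_PER_MONTH = 730    # average hours in a month

def implied_gpus(monthly_spend: float, rate: float = ASSUMED_RATE) -> float:
    """Number of H100s running 24/7 that the monthly spend would cover."""
    return monthly_spend / (rate * HOURS_PER_MONTH)

for spend in (500_000, 1_000_000):
    print(f"${spend:,}/mo ≈ {implied_gpus(spend):.0f} H100s reserved full-time")
```

At that assumed rate, the stated spend corresponds to a cluster on the order of a few hundred H100s running continuously, which is why cluster customization and interconnect quality dominate the decision at this scale.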
For deployment, the hard part is everything around the GPU. AWS brings EKS, EC2, S3, Terraform-friendly workflows, and easy access to smaller inference GPUs like the A10G and L40S. That matters when a model has to answer requests quickly, survive traffic spikes, and be managed like normal software infrastructure.
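"Managed like normal software infrastructure" concretely means the serving layer is declared in code rather than clicked together. A minimal sketch of that idea, building the parameter dict for EC2's `run_instances` call (the `g5.xlarge` type carries one A10G GPU; the AMI ID and tags are placeholders, not values from the source):

```python
# Hypothetical sketch: an inference GPU node declared as data, so it can
# be versioned, reviewed, and recreated on demand.
def inference_node_spec(ami_id: str, instance_type: str = "g5.xlarge") -> dict:
    """Build run_instances parameters for a small inference GPU node."""
    return {
        "ImageId": ami_id,            # placeholder AMI, supplied by caller
        "InstanceType": instance_type,  # g5.xlarge = 1x A10G GPU
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": "inference"}],
        }],
    }

spec = inference_node_spec("ami-PLACEHOLDER")
# In practice: boto3.client("ec2").run_instances(**spec)
print(spec["InstanceType"])
```

The same spec-as-data pattern is what Terraform or EKS manifests formalize: capacity becomes a reviewable artifact instead of a one-off console action.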
This pattern shows up elsewhere. Heyday used Lambda for cheaper experimentation but kept production on CoreWeave, which already behaved more like an AWS-style Kubernetes environment with autoscaling, networking, and production features. Across the market, startups often begin on lower-cost GPU specialists, then move toward platforms with stronger serving layers as workloads mature.
The next step is a fuller unbundling of training clouds from inference clouds. Neo clouds are moving up from raw GPU rental into one-click clusters and better researcher tooling, while hyperscalers and production-focused GPU platforms keep winning the serving layer. Over time, the strongest providers will be the ones that package the surrounding workflow, not just the chip hour.