Lambda for training, AWS for inference

Diving deeper with a Lambda customer at Iambic Therapeutics on GPU infrastructure choices for ML training and inference

Interview
Lambda Labs wins on price and flexibility for training workloads, while AWS provides reliability for inference despite higher costs.

This split shows the GPU cloud market separating into two jobs: cheap, customizable clusters for building models, and mature general-purpose cloud for serving them. At Iambic, training means reserving symmetric A100, H100, and B200 clusters with strong InfiniBand interconnects for months at a time, where Lambda could customize the setup and beat hyperscaler pricing. Inference is the smaller spend, but it needs predictable spin-up, storage, and uptime, which made AWS worth the premium.

  • The real training bottleneck was not just GPU access; it was cluster design. Iambic needed HGX-style systems with high-quality interconnect, and in late 2023 Lambda and CoreWeave were willing to rework specs around that need, while AWS and Oracle were not ready on the same timeline.
  • The budget split makes the tradeoff concrete. Iambic spends roughly $500K to $1M a month on Lambda training clusters, versus about $50K to $100K on AWS inference on average. When training is the larger line item, a 2x per-GPU-hour price difference matters far more than extra cloud convenience; the sketch after this list works through the arithmetic.
  • This matches the broader market shape. CoreWeave is pushing upmarket into large reserved enterprise clusters, Lambda is winning more flexible growth-stage workloads, and software layers like Together AI resell capacity with easier APIs for startups that do not want to manage raw infrastructure themselves.
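
To see why, here is a minimal cost sketch in Python. The monthly spend figures come from the interview, taken at the midpoints of the stated ranges; the per-GPU-hour rates are illustrative assumptions standing in for a 2x hyperscaler premium, not quoted Lambda or AWS prices.

```python
# Rough cost model for the training-vs-inference split described above.
# Monthly spend figures are interview midpoints; the per-GPU-hour rates
# are illustrative assumptions, not quoted Lambda or AWS prices.

training_spend = 750_000   # Lambda training clusters ($500K-$1M/month)
inference_spend = 75_000   # AWS inference ($50K-$100K/month)

lambda_rate = 2.50         # $/GPU-hour, assumed reserved-cluster rate
hyperscaler_rate = 5.00    # $/GPU-hour, assumed 2x hyperscaler premium

# GPU-hours the same training budget buys each month at each rate.
gpu_hours_on_lambda = training_spend / lambda_rate
gpu_hours_at_premium = training_spend / hyperscaler_rate
print(f"GPU-hours/month at reserved rate: {gpu_hours_on_lambda:,.0f}")
print(f"GPU-hours/month at 2x pricing:    {gpu_hours_at_premium:,.0f}")

# Paying the 2x premium on each line item costs very different amounts.
extra_training_cost = training_spend * 2 - training_spend
extra_inference_cost = inference_spend * 2 - inference_spend
print(f"Extra cost of 2x pricing on training:  ${extra_training_cost:,.0f}")
print(f"Extra cost of 2x pricing on inference: ${extra_inference_cost:,.0f}")
```

At these assumed rates, the same training budget buys half the GPU-hours at 2x pricing, while eating the identical premium on inference costs only about $75K more a month, an order of magnitude less.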

The next step is a cleaner middle layer that gives training teams Lambda-style economics with much less manual infrastructure work. If NeoClouds can package reserved multi-GPU clusters with better scheduling, storage, and Kubernetes primitives, more inference and even mixed training-and-deployment workflows will move off hyperscalers over time.
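
As a sketch of what such a primitive might look like, the snippet below uses the official kubernetes Python client to submit a multi-GPU training Job against a reserved node pool. Everything concrete in it, the image name, node-selector label, worker counts, and namespace, is a hypothetical placeholder, not a detail from the interview.

```python
# Minimal sketch: submitting a multi-GPU training Job via the Kubernetes
# Python client. Image, labels, and counts are hypothetical placeholders.
from kubernetes import client, config

def submit_training_job():
    config.load_kube_config()  # assumes a configured kubeconfig

    container = client.V1Container(
        name="trainer",
        image="registry.example.com/train:latest",  # hypothetical image
        command=["python", "train.py"],
        resources=client.V1ResourceRequirements(
            # Request a full 8-GPU HGX-style node per pod.
            limits={"nvidia.com/gpu": "8"},
        ),
    )
    pod_spec = client.V1PodSpec(
        restart_policy="Never",
        containers=[container],
        # Pin pods to the reserved GPU pool via a node selector.
        node_selector={"gpu-pool": "reserved-hgx"},  # hypothetical label
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="multi-gpu-train"),
        spec=client.V1JobSpec(
            completions=4,   # e.g. four 8-GPU workers
            parallelism=4,
            template=client.V1PodTemplateSpec(spec=pod_spec),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

if __name__ == "__main__":
    submit_training_job()
```

A packaged offering would layer gang scheduling and shared high-throughput storage on top of a primitive like this, which is exactly the work training teams currently do by hand.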