InfiniBand-driven Custom Cluster Lock-in

Diving deeper into

Lambda customer at Iambic Therapeutics on GPU infrastructure choices for ML training and inference

Interview
That flexibility was necessary in order for us to sign any contract at all because we needed this InfiniBand.
Analyzed 4 sources

The real moat in training cloud is not raw GPU access, it is willingness to reshape the cluster around a customer’s exact workload. At Iambic, InfiniBand grade interconnect was a hard requirement for synchronous multi GPU training, and AWS and Oracle could not deliver it on time even at higher prices. Lambda and CoreWeave won because they would custom spec the network, reserve the cluster for 12 to 18 months, and still come in cheaper.

  • This is why GPU hours are only partly fungible. Once a team builds around a provider’s Kubernetes setup, storage design, security model, and reserved cluster shape, switching means retraining people, rewriting infra, and risking downtime. Iambic says Lambda deepened that lock in by helping with custom Kubernetes components, storage setup, and even an air gapped cluster.
  • The comparison with CoreWeave shows where Lambda sits in the market. CoreWeave is built for the biggest enterprise reservations and very large HGX deployments, while Lambda has historically won smaller and mid market buyers with lower per GPU pricing and more flexibility. In Iambic’s late 2023 bake off, the two were technically close, and Lambda won mainly on lower final price.
  • Hyperscalers still matter, but more for inference than training in this workflow. Iambic keeps inference on AWS because it wants mature services, on demand instance availability, S3, and infrastructure as code. That split shows how NeoClouds can win the high performance training cluster, while AWS keeps the always on production serving layer.

The next phase of competition moves from simply having GPUs to packaging them into repeatable, high performance clusters with better software around scheduling, storage, and observability. As Lambda standardizes its one click clusters and expands InfiniBand connected capacity, it can turn what began as custom engineering for customers like Iambic into a broader product advantage for research teams that need top tier training performance without CoreWeave scale or hyperscaler overhead.