Modular AI Infrastructure vs Lock-in
The real moat in cloud AI stacks is convenience, but that convenience comes from wiring the compiler, runtime, chips, storage, and orchestration into a single vendor-specific path. AWS Neuron ships as the developer stack for Trainium and Inferentia with its own compiler and runtime, while Google’s TPU stack ties JAX and PyTorch/XLA into XLA and LibTPU on Cloud TPU. These systems are fast inside one cloud, but every additional optimization pushes teams deeper into that cloud’s tooling and hardware path.
In practice, portability breaks at the point where an ML team has tuned kernels, profiling, distributed training, and deployment around a single accelerator family. Moving from Trainium to TPU, or from TPU to another cloud, is not just a matter of moving model weights: it usually means changing runtimes, revalidating performance, and rebuilding ops workflows.
The tradeoff looks much like Databricks’ position in cloud data: multi-cloud support is valuable because buyers want leverage and flexibility, especially when infrastructure costs are large and long-lived. Modular is extending that logic lower in the stack, into the compiler and execution layer where hardware-specific lock-in usually begins.
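To make the idea of a neutral compiler-and-execution layer concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Accelerator` interface and the two backend classes are stand-ins for vendor paths like Neuron or XLA, not real APIs. The point is only the shape of the abstraction: the model graph and the call site stay the same, and only the backend object changes.

```python
from typing import Protocol


class Accelerator(Protocol):
    """Hypothetical interface a neutral layer would target instead of one vendor runtime."""
    def compile(self, graph: str) -> str: ...
    def run(self, binary: str, batch: list[float]) -> list[float]: ...


class TrainiumBackend:
    """Stand-in for a Neuron-style compile/run path (illustrative only)."""
    def compile(self, graph: str) -> str:
        return f"neuron::{graph}"

    def run(self, binary: str, batch: list[float]) -> list[float]:
        return [x * 2 for x in batch]


class TPUBackend:
    """Stand-in for an XLA/LibTPU-style path (illustrative only)."""
    def compile(self, graph: str) -> str:
        return f"xla::{graph}"

    def run(self, binary: str, batch: list[float]) -> list[float]:
        return [x * 2 for x in batch]


def execute(backend: Accelerator, graph: str, batch: list[float]) -> list[float]:
    """Same model graph, same call site; only the backend object differs."""
    binary = backend.compile(graph)
    return backend.run(binary, batch)


print(execute(TrainiumBackend(), "matmul", [1.0, 2.0]))  # [2.0, 4.0]
print(execute(TPUBackend(), "matmul", [1.0, 2.0]))       # [2.0, 4.0]
```

In a vertically integrated stack, the equivalent of `execute` is fused to one vendor’s compiler and runtime; a neutral layer keeps that seam open, which is exactly where the bargaining power described above comes from.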
This is why cloud vendors still hold a distribution advantage: they can bundle silicon, managed services, notebooks, storage, monitoring, and procurement into one contract. But that same bundling makes them weakest in on-premises, hybrid, and cross-cloud setups, where customers care more about running the same workload in multiple places than about squeezing out one cloud-specific optimization.
The next phase of AI infrastructure will split between vertically integrated cloud stacks for customers who want the simplest in-cloud path, and neutral software layers for customers who want bargaining power across chips and clouds. As enterprises spread training and inference across more environments, portability becomes a buying criterion, not just a technical preference.