LLM Deployment as Data Movement


Towaki Takikawa, CEO and co-founder of Outerport, on the rise of DevOps for LLMs

Interview
LLM deployment differs fundamentally from traditional software deployment because models are massive files.

The key change is that LLM deployment is becoming a data movement problem, not just a code-shipping problem. In normal software delivery, the hard part is rolling out new code safely. In LLM systems, teams also have to fetch, stage, cache, and swap model weight files that can be tens of gigabytes, then fit them across CPU RAM and GPU memory before any request can run. That turns startup time, memory pressure, and artifact handling into first-order infrastructure concerns.
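The scale of the problem is easy to estimate with back-of-envelope arithmetic. The sketch below sums per-stage transfer times for a 17GB weight file; the stage names and bandwidth figures are illustrative assumptions (not measurements from the interview), but they show why a cold start can stretch toward a minute while a warm start pays only the final GPU copy.

```python
# Back-of-envelope cold-start model for a 17 GB weight file.
# All bandwidth numbers are hypothetical placeholders; real figures
# depend on your network, disk, and PCIe generation.
STAGES_GB_PER_S = {
    "object_storage_to_disk": 1.0,   # e.g. S3 over a ~10 Gbit link
    "disk_to_cpu_ram": 3.0,          # NVMe sequential read
    "cpu_ram_to_gpu": 20.0,          # PCIe 4.0 x16, effective
}

def cold_start_seconds(model_gb: float) -> float:
    """Every stage is paid when loading from scratch."""
    return sum(model_gb / bw for bw in STAGES_GB_PER_S.values())

def warm_start_seconds(model_gb: float) -> float:
    """If weights are already resident in CPU RAM, only the GPU copy remains."""
    return model_gb / STAGES_GB_PER_S["cpu_ram_to_gpu"]

if __name__ == "__main__":
    print("cold start:", cold_start_seconds(17), "s")
    print("warm start:", warm_start_seconds(17), "s")
```

Under these assumed numbers the cold path is dominated by the fetch from object storage, which is exactly the stage a CPU-memory cache removes.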

  • A small computer vision model like ResNet might be around 170 MB, while a 7B LLM can be roughly 17 GB, a jump of about two orders of magnitude. That size jump breaks assumptions built into normal container and rollout workflows, where images are expected to be relatively lightweight and fast to start.
  • The bottleneck is not only storage. Model weights often move from object storage to CPU memory, then again into GPU memory. Outerport’s wedge is a daemon that keeps weights warm in CPU memory and swaps them into GPUs faster, which cuts cold starts that can otherwise stretch toward a minute.
  • This is why the winners in LLM infrastructure split into layers. vLLM and TensorRT-LLM optimize runtime performance once a model is loaded, while tools like Outerport and Modal focus on orchestrating where models live, when they load, and how multi-model systems avoid paying that startup cost over and over.
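The "keep weights warm" idea from the second bullet can be sketched as an LRU cache over CPU RAM: a hit means the GPU swap skips the slow fetch from object storage. This is a hypothetical illustration of the concept, not Outerport's actual daemon or API; `WeightCache` and `ensure_resident` are invented names.

```python
from collections import OrderedDict

class WeightCache:
    """Toy model of a daemon keeping model weights resident in CPU RAM."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self._resident = OrderedDict()  # model name -> size in GB, LRU order

    def ensure_resident(self, name: str, size_gb: float) -> str:
        """Return 'hit' if weights are already in CPU RAM; otherwise stage
        them, evicting least-recently-used models until the new one fits."""
        if name in self._resident:
            self._resident.move_to_end(name)      # refresh LRU position
            return "hit"
        while self._resident and sum(self._resident.values()) + size_gb > self.capacity_gb:
            self._resident.popitem(last=False)    # evict the coldest model
        self._resident[name] = size_gb            # simulate fetch + pin in RAM
        return "miss"

cache = WeightCache(capacity_gb=40)
print(cache.ensure_resident("llama-7b", 17))   # first load pays the full fetch
print(cache.ensure_resident("llama-7b", 17))   # now only the GPU copy remains
```

In a multi-model system this is where the layering shows up: the runtime (vLLM, TensorRT-LLM) only sees weights that the orchestration layer has already made resident.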

The next phase of MLOps will look more like DevOps for fleets of large artifacts. As more enterprises run custom and self-hosted models, the control point will shift toward systems that can do live model swaps, progressive rollouts, and compound AI orchestration without forcing every deployment to reload huge files from scratch.
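A progressive rollout in this world looks like a routing decision rather than a redeploy: both versions' weights stay resident, a growing fraction of traffic goes to the new one, and rollback is a pointer flip instead of a multi-gigabyte reload. The sketch below is a hedged illustration under those assumptions; `pick_version` and the 10% canary split are invented for this example.

```python
import random

def pick_version(canary_fraction: float, rng: random.Random) -> str:
    """Route one request: canary_fraction of traffic goes to the new model.
    Both "v1" and "v2" are assumed to already be resident in memory."""
    return "v2" if rng.random() < canary_fraction else "v1"

# Simulate 10,000 requests at a 10% canary split (seeded for repeatability).
rng = random.Random(0)
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_version(0.1, rng)] += 1
print(counts)  # roughly 90/10 between v1 and v2
```

The interesting engineering is not the routing itself but what makes it cheap: because neither version has to be loaded from scratch, shifting the fraction, or reverting it, costs nothing in startup time.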