Live Model Swapping Becomes Infrastructure
Towaki Takikawa, CEO and co-founder of Outerport, on the rise of DevOps for LLMs
Outerport is getting pulled in first where model switching is already a direct production cost, not where orchestration is still mostly an app-logic problem. In diffusion workflows, users regularly chain several local models on one machine, so every extra reload burns GPU minutes and slows creative iteration. In LLMs, many teams still call hosted APIs or run one main model at a time, so the pain is less frequent and less visible.
-
The strongest early demand is about money, with better developer experience as a side effect. In ComfyUI-style image pipelines, cutting a run from about a minute to seconds means fewer paid GPU minutes and many more creative attempts per session, which makes the savings obvious to both artists and teams.
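As a rough illustration of why the savings are so visible, here is the arithmetic with assumed numbers (the GPU rate, run times, and session size are hypothetical, not figures from the article):

```python
# Cost sketch with assumed numbers: an on-demand GPU at $2.00/hour, and a
# diffusion pipeline whose per-run wall time drops from ~60s (cold model
# reloads between chained models) to ~6s (models kept warm in memory).

GPU_COST_PER_HOUR = 2.00   # assumed on-demand rate
COLD_RUN_SECONDS = 60      # reloading each chained model from storage
WARM_RUN_SECONDS = 6       # models already resident, swap is near-instant
RUNS_PER_SESSION = 100     # a heavy creative-iteration session

def session_cost(seconds_per_run: float, runs: int) -> float:
    """Dollar cost of one session at the assumed hourly GPU rate."""
    return (seconds_per_run * runs / 3600) * GPU_COST_PER_HOUR

cold = session_cost(COLD_RUN_SECONDS, RUNS_PER_SESSION)
warm = session_cost(WARM_RUN_SECONDS, RUNS_PER_SESSION)
print(f"cold: ${cold:.2f}, warm: ${warm:.2f}, saved: ${cold - warm:.2f}")
# → cold: $3.33, warm: $0.33, saved: $3.00
```

The per-session dollar amounts are small, but the same ratio applies to fleet-wide GPU spend, and the 10x difference in iteration speed is what artists feel directly.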
-
LLM teams do have multi-model workflows, but today they are more often stitched together at the API layer than by hot-swapping several local model files in GPU memory. That pushes spending toward routing, observability, and provider selection tools like OpenRouter, rather than low-level model loading infrastructure.
-
Outerport sits lower in the stack than developer platforms like Modal and inference runtimes like vLLM or TensorRT-LLM. Modal abstracts where code runs, and vLLM or TensorRT-LLM makes one model run faster. Outerport is focused on the memory-movement problem between storage, CPU, and GPU when many large models need to be kept warm and swapped quickly.
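The memory-movement problem can be pictured as a tiered cache: a bounded number of models stay "hot" in GPU memory, evictions are demoted to a warm CPU-RAM tier, and only a full miss pays the slow load from storage. The sketch below is a hypothetical illustration of that idea in plain Python (the class name, tier sizes, and string stand-ins for weights are all assumptions, not Outerport's actual API):

```python
from collections import OrderedDict

class WarmModelCache:
    """Toy LRU cache over two memory tiers: GPU (hot) and CPU RAM (warm)."""

    def __init__(self, gpu_slots: int = 2, cpu_slots: int = 4):
        self.gpu = OrderedDict()   # model name -> weights, LRU order
        self.cpu = OrderedDict()
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots

    def _load_from_disk(self, name: str) -> str:
        # Stand-in for the slow path: deserializing weights from storage.
        return f"weights::{name}"

    def get(self, name: str) -> tuple[str, str]:
        """Return (weights, tier the request was served from)."""
        if name in self.gpu:                      # hot hit: no data movement
            self.gpu.move_to_end(name)
            return self.gpu[name], "gpu"
        if name in self.cpu:                      # warm hit: CPU -> GPU copy
            weights, tier = self.cpu.pop(name), "cpu"
        else:                                     # miss: disk -> CPU -> GPU
            weights, tier = self._load_from_disk(name), "disk"
        self._promote(name, weights)
        return weights, tier

    def _promote(self, name: str, weights: str) -> None:
        if len(self.gpu) >= self.gpu_slots:       # evict LRU GPU model to CPU
            old_name, old_weights = self.gpu.popitem(last=False)
            self.cpu[old_name] = old_weights
            while len(self.cpu) > self.cpu_slots: # drop coldest from CPU
                self.cpu.popitem(last=False)
        self.gpu[name] = weights

cache = WarmModelCache(gpu_slots=1, cpu_slots=2)
print(cache.get("sdxl")[1])        # first use: served from disk
print(cache.get("sdxl")[1])        # repeat: served from GPU
print(cache.get("controlnet")[1])  # evicts sdxl from GPU to CPU RAM
print(cache.get("sdxl")[1])        # comes back from the warm CPU tier
```

In a real system each demotion and promotion is an expensive transfer of gigabytes of weights, which is why the policy for what stays resident in each tier, rather than the cache bookkeeping itself, is the hard part.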
As LLM products move from single-model chat to agentic systems that mix specialist models, retrieval, and private deployment, the diffusion world's workflow pattern should spread into language. That shift will make cost control and complexity management converge, and it is where model loading and live swapping become core infrastructure rather than a niche optimization.