New DevOps Layer for LLMs

Towaki Takikawa, CEO and co-founder of Outerport, on the rise of DevOps for LLMs


This points to a new infrastructure layer where the hard part is no longer shipping code, but keeping giant model files warm, movable, and live on GPU-backed systems. Traditional DevOps assumes small containers that start fast. LLM systems instead require moving 10 GB to 20 GB artifacts from storage into CPU memory and then into GPU memory, often with minute-long startup times, which breaks normal rollout, autoscaling, and zero-downtime update patterns.
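A back-of-envelope calculation makes the cold-start gap concrete. The bandwidth figures below are illustrative assumptions (NVMe read ~2 GB/s, PCIe ~16 GB/s), and real loads add deserialization and initialization on top of raw transfer, which is how total startup stretches toward a minute:

```python
def cold_start_seconds(model_gb: float,
                       disk_gbps: float = 2.0,
                       pcie_gbps: float = 16.0) -> float:
    """Raw transfer time to stream a model from disk into CPU RAM,
    then copy it over PCIe into GPU RAM. Bandwidths are assumed,
    not measured; deserialization overhead is ignored."""
    disk_to_cpu = model_gb / disk_gbps   # storage -> CPU memory
    cpu_to_gpu = model_gb / pcie_gbps    # CPU memory -> GPU memory
    return disk_to_cpu + cpu_to_gpu

resnet = cold_start_seconds(0.17)   # ~170 MB vision model
llm_7b = cold_start_seconds(17.0)   # ~17 GB 7B LLM

print(f"ResNet-scale: {resnet:.2f}s, 7B LLM: {llm_7b:.1f}s")
```

Even under these optimistic numbers, the 7B model pays roughly a hundred times the transfer cost of the vision model, so any rollout strategy that reloads weights from disk on every replica or every swap compounds that penalty.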

  • Earlier MLOps tools were built for researchers logging experiments through simple Python APIs. The stack is now shifting toward ops teams that want OpenTelemetry-style telemetry, durable dashboards, alerting, and deployment controls for production inference and training traffic.
  • The practical bottleneck is cold start. A small computer vision model like ResNet was about 170 MB, while a 7B LLM can be about 17 GB. That hundredfold size jump turns model rollout into a memory orchestration problem across disk, CPU RAM, and GPU RAM, not just a container shipping problem.
  • This is why tools like Outerport and Modal matter. The opportunity is in making multi-model systems feel more like microservices, where developers can swap models, keep state warm, and run chained workflows without paying repeated one-minute load penalties on every step.
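The "keep state warm" idea above can be sketched as an LRU cache over CPU RAM: recently used model weights stay pinned in host memory, so putting one back on the GPU costs only a PCIe copy rather than a full disk reload. This is a minimal illustrative sketch, not Outerport's or Modal's actual API; the class and method names are invented for the example:

```python
from collections import OrderedDict

class WarmModelCache:
    """Hypothetical sketch: an LRU pool of model weights held in CPU RAM.
    A 'warm' hit means the weights can be copied straight to the GPU;
    a 'cold' miss means paying the full disk load once."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        # Maps model name -> size in GB, ordered from least to most recently used.
        self._cache: OrderedDict[str, float] = OrderedDict()

    def acquire(self, name: str, size_gb: float) -> str:
        if name in self._cache:
            self._cache.move_to_end(name)    # refresh recency: cheap GPU copy
            return "warm"
        # Evict least recently used models until the new one fits.
        while self._cache and sum(self._cache.values()) + size_gb > self.capacity_gb:
            self._cache.popitem(last=False)
        self._cache[name] = size_gb          # cold path: full load from disk
        return "cold"

cache = WarmModelCache(capacity_gb=40)
print(cache.acquire("llama-7b", 17))    # cold: first load from disk
print(cache.acquire("mistral-7b", 17))  # cold: both now warm in CPU RAM
print(cache.acquire("llama-7b", 17))    # warm: PCIe copy only
print(cache.acquire("qwen-7b", 17))     # cold: evicts mistral-7b to make room
```

In a chained multi-model workflow, this is the difference between each step costing a PCIe copy and each step costing a full minute-long reload.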

The category is moving toward a split market. High-level tools will keep serving teams that orchestrate model APIs, while a new DevOps-for-self-hosted-AI stack grows around model registries, live model swapping, GPU memory management, and production observability. As compound AI systems spread, that lower layer becomes core infrastructure rather than an edge case.