MLOps Shifting From Research to Ops
Towaki Takikawa, CEO and co-founder of Outerport, on the rise of DevOps for LLMs
The center of gravity in MLOps is moving from notebook convenience to production reliability. Earlier tools won by letting researchers log metrics with a few lines of Python, but LLM deployment now looks more like running a busy distributed system, with 10 to 20GB model files, GPU memory handoffs, alerting, and live updates across Kubernetes clusters. That shift changes the buyer from the model builder to the team on call for uptime and cost.
-
The old stack was built for experimentation. Researchers used tools like Weights & Biases because they could push training metrics into a web dashboard quickly. Ops teams instead want OpenTelemetry, time-series databases, and Grafana, because those fit their existing incident-response and monitoring workflows.
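To make the contrast concrete, here is a toy sketch of the shape ops-side tooling consumes: a serving metric rendered in the Prometheus text exposition format that Grafana dashboards and alerting rules are built on. The metric name and labels are illustrative assumptions, not any product's actual schema.

```python
# Toy sketch: a model-serving metric in Prometheus text exposition format,
# the kind of output that plugs into existing scrape/alert pipelines.
# Metric and label names here are hypothetical.

def prometheus_line(name: str, labels: dict, value: float) -> str:
    """Render one sample in Prometheus text format: name{k="v",...} value"""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

line = prometheus_line(
    "model_load_seconds",
    {"model": "llama-8b", "tier": "gpu"},
    0.85,
)
print(line)  # model_load_seconds{model="llama-8b",tier="gpu"} 0.85
```

The point is not the formatting itself but the integration: a metric in this shape rides the same scrape, storage, and alerting path as every other service the on-call team already watches.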
-
The technical trigger is model size and runtime complexity. A ResNet-era model weighed in at hundreds of megabytes, while even a small LLM can be around 17GB. That makes deployment less about serving Python code and more about moving huge artifacts through storage, CPU RAM, and GPU memory without long cold starts.
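A back-of-envelope calculation shows why cold starts dominate at this scale. The bandwidth figures below are illustrative assumptions, not benchmarks, but the asymmetry they capture is the real issue: pulling the artifact from storage costs far more than the final hop into GPU memory.

```python
# Rough cold-start arithmetic for a ~17 GB model artifact.
# Bandwidths are illustrative assumptions, not measurements.
MODEL_GB = 17

# (tier, assumed effective bandwidth in GB/s)
tiers = [
    ("object storage -> CPU RAM", 1.0),   # e.g. a cloud blob download
    ("CPU RAM -> GPU memory", 20.0),      # e.g. a PCIe-class transfer
]

def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Seconds to move size_gb at bandwidth_gbps GB/s."""
    return size_gb / bandwidth_gbps

total = 0.0
for name, bw in tiers:
    t = transfer_seconds(MODEL_GB, bw)
    total += t
    print(f"{name}: {t:.1f}s at {bw} GB/s")

print(f"total cold start (transfers only): {total:.1f}s")
```

Under these assumptions the download step alone takes 17 seconds while the RAM-to-GPU hop takes under a second, which is why keeping weights resident in CPU RAM between deploys, rather than re-fetching from storage, is where most of the cold-start win lives.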
-
This is creating a split market. Modal abstracts GPUs behind a simple developer experience for smaller projects, while tools like Outerport are being built around daemon processes, gRPC, and Kubernetes-friendly memory management for teams running self-hosted models and compound AI systems at higher scale.
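One way to picture the daemon approach, as a toy sketch rather than Outerport's actual design (all names here are hypothetical, and a real system would use shared memory and gRPC rather than an in-process dict): a long-lived process keeps deserialized weights resident in host memory, so a restart of the serving code attaches to an already-warm copy instead of re-downloading the artifact.

```python
import time

class ModelCacheDaemon:
    """Toy stand-in for a long-lived process that keeps model weights
    resident in host memory across serving-process restarts.
    Hypothetical sketch, not a real product's API."""

    def __init__(self):
        self._cache = {}  # model name -> (weights, load time in seconds)

    def _load_from_storage(self, name: str) -> bytes:
        # Placeholder for the slow download/deserialize step.
        time.sleep(0.01)
        return b"weights-for-" + name.encode()

    def get(self, name: str) -> bytes:
        """Return weights; only the first call pays the storage load cost."""
        if name not in self._cache:
            start = time.perf_counter()
            weights = self._load_from_storage(name)
            self._cache[name] = (weights, time.perf_counter() - start)
        return self._cache[name][0]

daemon = ModelCacheDaemon()
daemon.get("llama-8b")  # cold: pays the load from storage
daemon.get("llama-8b")  # warm: served from resident memory
```

The design choice this illustrates is separating the model's lifetime from the serving process's lifetime, which is what lets live updates and redeploys happen without eating a full cold start each time.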
The next phase of MLOps will look more like classic infrastructure software, but rebuilt around models instead of containers. The winners will be the tools that fit cleanly into enterprise observability, CI/CD, and security stacks, while still hiding enough of the GPU and model loading complexity that AI teams can ship faster.