Control Layer for Multi-Model AI

Towaki Takikawa, CEO and co-founder of Outerport, on the rise of DevOps for LLMs

Interview
Compound AI systems merge multiple custom AI models, LLMs, and diffusion models with databases and external APIs.

Outerport is building the control layer for self-hosted AI systems, where the hard part is not calling a model but keeping many large models ready so a workflow can run without long pauses or wasted GPU time. Its software runs as a daemon on a machine, manages model weights across storage, CPU memory, and GPU memory, and lets developers swap models through a simple API instead of rebuilding their inference stack.
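As a rough mental model, the daemon's job is to track which tier each model's weights currently occupy and to promote them toward the GPU on demand. The sketch below is a toy illustration of that pattern; the class and method names (`WeightDaemon`, `promote`, `ensure_on_gpu`) and the model names are hypothetical, not Outerport's real API.

```python
# Toy sketch of daemon-managed weight staging: disk -> CPU RAM -> GPU memory.
# All names here are illustrative assumptions, not Outerport internals.
from dataclasses import dataclass

TIERS = ["disk", "cpu", "gpu"]  # staging order, slowest to fastest


@dataclass
class Model:
    name: str
    tier: str = "disk"  # every model starts as weights on disk


class WeightDaemon:
    """Tracks which memory tier each model's weights live in."""

    def __init__(self, models):
        self.models = {m: Model(m) for m in models}

    def promote(self, name: str) -> Model:
        """Move a model one tier closer to the GPU (disk -> cpu -> gpu)."""
        m = self.models[name]
        i = TIERS.index(m.tier)
        if i < len(TIERS) - 1:
            m.tier = TIERS[i + 1]
        return m

    def ensure_on_gpu(self, name: str) -> Model:
        """Stage a model all the way into GPU memory; cheap if pre-staged."""
        m = self.models[name]
        while m.tier != "gpu":
            self.promote(name)
        return m


daemon = WeightDaemon(["llm-8b", "diffusion-xl"])
daemon.promote("llm-8b")            # pre-stage into CPU RAM ahead of demand
m = daemon.ensure_on_gpu("llm-8b")  # only one hop left: cpu -> gpu
print(m.tier)                       # -> gpu
```

The point of the pre-staging step is that the slow disk read happens before a request arrives, so the only work left at swap time is the fast CPU-to-GPU copy.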

  • In practice, Outerport tackles cold starts. Loading even a small LLM can take up to a minute, because weights must move from disk to CPU memory and then into GPU memory. Outerport keeps multiple model weights staged in memory and speeds swapping, which matters when one workflow chains several models together.
  • The product fits compound AI workflows, where one system may call an LLM, a diffusion model, a database, and an external API in sequence. Without a scheduler for large model files, teams either keep separate GPUs running for each step, which is expensive, or accept long waits while each model loads.
  • Relative to Modal or Baseten, Outerport sits lower in the stack. Modal makes GPU execution feel like a Python function in the cloud, and Baseten focuses on model hosting and inference. Outerport is the memory management and deployment layer for custom, self-hosted, multi-model systems that need to run across mixed hardware.
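The cold-start economics of a chained workflow come down to where a model's weights are when it is next needed. The toy scheduler below makes that visible: GPU slots are limited, and evicted models are demoted to a warm pool in CPU RAM rather than all the way to disk, so the next swap-in skips the slow disk read. Capacities, tier names, and the LRU policy are illustrative assumptions, not Outerport internals.

```python
# Toy GPU-slot scheduler: LRU eviction demotes models to CPU RAM ("warm"),
# so re-loading them avoids a cold read from disk. Names are hypothetical.
from collections import OrderedDict


class GpuSlots:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.on_gpu = OrderedDict()  # model name -> True, ordered by recency
        self.in_cpu = set()          # warm pool staged in CPU RAM

    def use(self, name: str) -> str:
        """Run a model; return where its weights were found (the cost signal)."""
        if name in self.on_gpu:
            self.on_gpu.move_to_end(name)  # refresh recency, no copy needed
            return "gpu-hit"
        source = "cpu-warm" if name in self.in_cpu else "disk-cold"
        self.in_cpu.discard(name)
        if len(self.on_gpu) >= self.capacity:
            evicted, _ = self.on_gpu.popitem(last=False)  # demote LRU model
            self.in_cpu.add(evicted)                      # keep it warm, not cold
        self.on_gpu[name] = True
        return source


slots = GpuSlots(capacity=1)
workflow = ["llm", "diffusion", "llm"]   # a chained compound workflow
print([slots.use(m) for m in workflow])  # -> ['disk-cold', 'disk-cold', 'cpu-warm']
```

With a single GPU slot, the second call to the LLM is a warm CPU-to-GPU copy instead of a second cold disk load, which is the saving the daemon approach targets without paying for a dedicated GPU per step.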

The next step goes beyond faster model swapping toward full DevOps for AI models, including live updates, version control, and deployment across on-premises, cloud, and edge hardware. If multi-model agents become normal enterprise software, the companies that own model movement and orchestration will become core infrastructure.