Kong Enforces Runtime AI Cost Control

The governance layer between agents and enterprise systems becomes a prerequisite for controlling AI spend.
Analyzed 10 sources

Kong is trying to make AI cost control happen before the model call, not after the bill arrives. In agent workflows, token spend climbs quickly when agents keep calling tools, pulling extra context, and bouncing across models. A governance layer sits in the request path, deciding which APIs an agent can use, which model gets each request, and what traffic should be cached, blocked, or rerouted. That turns cost control into runtime infrastructure, not finance cleanup.
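To make the "decide before the model call" idea concrete, here is a minimal sketch of a gateway that blocks, serves from cache, or routes each agent request. Every name in it (`Policy`, `Gateway`, `decide`, the model labels, the prompt-length threshold) is hypothetical and is not Kong's API. The cache is a plain exact-match dictionary standing in for semantic caching, and prompt length stands in for whatever signal a real router would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical policy object; field names are illustrative, not a Kong schema.
@dataclass
class Policy:
    allowed_tools: set[str]
    cheap_model: str = "small-model"
    strong_model: str = "large-model"
    max_cheap_prompt_chars: int = 200  # assumed routing threshold


@dataclass
class Gateway:
    policy: Policy
    cache: dict[str, str] = field(default_factory=dict)  # exact-match stand-in

    def decide(self, request: dict) -> dict:
        """Return a block / cache_hit / route decision before anything is billed."""
        tool = request.get("tool")
        prompt = request["prompt"]

        # 1. Tool governance: refuse tools that are not on the allowlist.
        if tool is not None and tool not in self.policy.allowed_tools:
            return {"action": "block", "reason": f"tool {tool!r} not allowed"}

        # 2. Caching: a repeated prompt never reaches a model.
        if prompt in self.cache:
            return {"action": "cache_hit", "response": self.cache[prompt]}

        # 3. Routing: short prompts go to the cheaper model.
        if len(prompt) <= self.policy.max_cheap_prompt_chars:
            return {"action": "route", "model": self.policy.cheap_model}
        return {"action": "route", "model": self.policy.strong_model}
```

The ordering is the design point: the allowlist and cache checks run first, so a blocked or repeated request costs nothing downstream, and only the remainder is priced by model choice.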

  • Context Mesh matters because most enterprise context already lives behind APIs and event streams, not inside the model stack. Kong is extending from proxying LLM calls to discovering existing APIs, turning them into agent-ready tools, and applying runtime governance on top, which makes every tool call part of the same control plane as the model request.
  • The practical spend problem is multiplicative. A typical chatbot turn makes one model call. An agent can call a model, hit a CRM API, fetch inventory, ask another model, and repeat that loop several times, so spend scales with calls per loop times the number of loops. Kong’s semantic routing, caching, MCP traffic controls, and A2A observability are all aimed at cutting unnecessary repeats and sending simpler work to cheaper models.
  • This is also a buyer expansion story. The original API gateway buyer cared about uptime and security for app traffic. The AI gateway buyer is often a platform or governance team that now needs one place to enforce auth, rate limits, audit trails, and provider controls across LLM, MCP, and agent-to-agent traffic. Databricks is pushing a similar control point inside its data platform.
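The multiplicative spend point in the bullets above can be worked through with a small back-of-the-envelope model. This is a sketch under stated assumptions: the function name, the parameters, and every number in the usage lines are made up for illustration and are not Kong pricing or measurements. It also ignores tool-call costs and the way context grows across loops.

```python
def workflow_cost(loops, calls_per_loop, tokens_per_call, price_per_1k_tokens,
                  cache_hit_rate=0.0, cheap_share=0.0,
                  cheap_price_per_1k_tokens=0.0):
    """Estimate model spend for one workflow run (illustrative model only)."""
    total_calls = loops * calls_per_loop                # the multiplicative part
    billed_calls = total_calls * (1.0 - cache_hit_rate)  # cache hits cost nothing
    cheap_calls = billed_calls * cheap_share            # routed to the cheaper model
    strong_calls = billed_calls - cheap_calls
    kilo_tokens = tokens_per_call / 1000.0
    return (strong_calls * kilo_tokens * price_per_1k_tokens
            + cheap_calls * kilo_tokens * cheap_price_per_1k_tokens)


# Illustrative inputs only: 2,000 tokens per call at a made-up $0.01 per 1k tokens.
chatbot = workflow_cost(loops=1, calls_per_loop=1,
                        tokens_per_call=2000, price_per_1k_tokens=0.01)
agent = workflow_cost(loops=5, calls_per_loop=2,
                      tokens_per_call=2000, price_per_1k_tokens=0.01)
governed = workflow_cost(loops=5, calls_per_loop=2,
                         tokens_per_call=2000, price_per_1k_tokens=0.01,
                         cache_hit_rate=0.2, cheap_share=0.5,
                         cheap_price_per_1k_tokens=0.001)
```

With these made-up inputs, the ungoverned agent run costs about ten times the single chatbot call, and the assumed 20% cache hit rate plus routing half the remaining calls to a cheaper model brings the governed run to less than half of the ungoverned figure. The ratios depend entirely on the assumed inputs.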

The next step is for enterprises to treat governed agent traffic the way they already treat governed API traffic. If that shift holds, the winning layer will not just route prompts. It will decide which tools agents may touch, how much context they can pull, and which model is worth paying for on each step of a workflow.