Semantic Routing Saves Enterprises Millions

Augusto Marietti, CEO of Kong, on the end of tokenmaxxing

Interview
“an enterprise can save tens of millions of dollars by having the gateway semantically route those requests to a less expensive, less powerful model”

Semantic routing turns the AI gateway into a FinOps control point, not just a plumbing layer. In practice, the gateway sits between employees or internal apps and a company’s mix of models. It classifies each prompt, decides whether the prompt needs a top-tier model or can be handled by a cheaper one, caches repeat requests, and enforces token limits. That matters because large enterprises increasingly run several LLMs at once, and without a routing layer the default behavior is to send everything to the most capable and most expensive endpoint.
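The classify-then-route step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kong's implementation: the tier names, prices, token limit, and keyword list are all hypothetical, and the keyword check stands in for the semantic classifier a real gateway would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float


# Hypothetical tiers and prices, chosen only for illustration.
CHEAP = ModelTier("small-model", 0.0005)
PREMIUM = ModelTier("frontier-model", 0.0150)

# Stand-in for semantic classification: a real gateway would compare
# prompt embeddings or run a classifier, not match keywords.
REASONING_HINTS = ("prove", "derive", "multi-step", "architecture", "debug")

MAX_PROMPT_TOKENS = 4000  # hypothetical per-request token limit


def estimate_tokens(prompt: str) -> int:
    """Rough whitespace-based estimate; real tokenizers count differently."""
    return len(prompt.split())


def route(prompt: str) -> ModelTier:
    """Enforce the token limit, then pick the cheapest adequate tier."""
    if estimate_tokens(prompt) > MAX_PROMPT_TOKENS:
        raise ValueError("prompt exceeds the configured token limit")
    text = prompt.lower()
    needs_reasoning = any(hint in text for hint in REASONING_HINTS)
    return PREMIUM if needs_reasoning else CHEAP
```

Under these assumptions, a routine summarization prompt lands on the cheap tier, while a prompt that asks the model to debug something is sent to the premium tier. The design point is that the decision happens in the gateway, so individual apps never choose an endpoint themselves.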

  • Kong is positioning this as a behind-the-firewall enterprise product, distinct from OpenRouter-style token brokerage and from public gateways like Vercel or Cloudflare. The buyer is a central platform or security team that wants one control plane for internal AI traffic, not a developer looking for the cheapest public API.
  • The savings come from task matching. A retrieval question, summarization job, or routine support prompt often does not need the strongest reasoning model. Kong added semantic routing and semantic caching in 2024 so the gateway can inspect the request, pick an adequate lower-cost model, and avoid repeated calls when similar prompts recur.
  • This is becoming more important as AI usage shifts from occasional chat to higher-volume internal workflows. Kong says most enterprise AI gateway deployments today use routing, caching, and prompt compression, and that enterprises commonly operate multiple LLMs internally, which creates the setup needed to trade model quality against cost on each request.
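Semantic caching, mentioned in the list above, can be illustrated with a toy sketch. It assumes a bag-of-words vector as a stand-in for a real embedding model and a hypothetical similarity threshold of 0.8; the class and function names are invented for this example and do not reflect Kong's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real gateway would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a stored response when a new prompt is similar enough to an old one."""

    def __init__(self, threshold: float = 0.8):  # hypothetical threshold
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, prompt: str):
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

A cache hit skips the model call entirely, which is why caching compounds with routing: the cheapest request is the one that never reaches a model. The threshold is the main tuning knob, since a value set too low risks returning an answer to a different question.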

The likely next step is for model routing to become a standard enterprise policy layer, much as API rate limiting did in the last cloud cycle. As companies move from employee chatbots to agents that make many model calls per task, the winners will be the gateways that combine routing, governance, and billing, because that is where AI cost, security, and operational control converge.
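To see why agent workloads raise the stakes, the arithmetic can be written as a small cost model. Every number passed to it is a hypothetical input, not a figure from Kong or from any vendor's price list; the function only shows how spend scales with calls per task and with the share of traffic routed to a cheaper model.

```python
def monthly_cost_usd(
    tasks_per_month: int,
    calls_per_task: int,
    tokens_per_call: int,
    routed_fraction: float,
    cheap_usd_per_1k: float,
    premium_usd_per_1k: float,
) -> float:
    """Blended monthly spend when a fraction of tokens goes to the cheap model."""
    if not 0.0 <= routed_fraction <= 1.0:
        raise ValueError("routed_fraction must be between 0 and 1")
    total_tokens = tasks_per_month * calls_per_task * tokens_per_call
    cheap_tokens = total_tokens * routed_fraction
    premium_tokens = total_tokens - cheap_tokens
    return (cheap_tokens * cheap_usd_per_1k + premium_tokens * premium_usd_per_1k) / 1000


# Hypothetical workload: agents making 20 calls per task.
baseline = monthly_cost_usd(1_000_000, 20, 2_000, 0.0, 0.0005, 0.015)
routed = monthly_cost_usd(1_000_000, 20, 2_000, 0.7, 0.0005, 0.015)
savings = baseline - routed
```

Because total tokens are the product of tasks, calls per task, and tokens per call, moving from single-call chat to multi-call agents multiplies both the baseline bill and the absolute savings from routing.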