Kong AI Gateway Chooses Cheaper Models
Cost-aware model routing is really a buying wedge into AI infrastructure budgets, because it turns model choice from an app-level guess into a centrally enforced cost policy. In practice, one gateway sits in front of OpenAI, Anthropic, Azure OpenAI, and self-hosted models, inspects each request, and sends easy work like summarization or classification to a cheaper model while reserving frontier models for harder prompts. That matters most in large companies running many models at once.
- The product works like a traffic cop for LLM calls. App teams keep using one endpoint, while the gateway handles token rate limits, prompt guards, PII scrubbing, caching, logging, and model routing without adding that logic inside every application.
- The routing piece is not simple failover. Kong added semantic routing in AI Gateway 3.8, which chooses a model at runtime based on prompt similarity thresholds, and later added body-based model routing in 3.14, making model selection a policy layer rather than a hard-coded app decision.
- Competition is splitting two ways. AI-native routers like Portkey and LiteLLM compete on routing intelligence and developer ergonomics, while MuleSoft, IBM API Connect, Gravitee, and WSO2 compete by bundling LLM governance into broader integration suites for buyers trying to reduce vendor count.
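
To make the idea of routing on prompt similarity thresholds concrete, here is a minimal sketch in Python. It is not Kong's implementation or configuration format. The route table, the model names (`cheap-model`, `frontier-model`), the example prompts, and the 0.5 threshold are all hypothetical, and a toy bag-of-words vector stands in for the embedding model a real gateway would presumably use.

```python
import math
import re
from collections import Counter

# Hypothetical route table: model names, example prompts, and the
# threshold are illustrative assumptions, not Kong configuration.
ROUTES = [
    {
        "model": "cheap-model",
        "examples": ["summarize this article", "classify this support ticket"],
        "threshold": 0.5,
    },
]
FALLBACK_MODEL = "frontier-model"  # used when no cheap route is similar enough


def embed(text):
    """Toy stand-in for an embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def choose_model(prompt):
    """Return the best-matching route that clears its threshold, else the fallback."""
    vec = embed(prompt)
    best_model, best_score = FALLBACK_MODEL, 0.0
    for route in ROUTES:
        score = max(cosine(vec, embed(example)) for example in route["examples"])
        if score >= route["threshold"] and score > best_score:
            best_model, best_score = route["model"], score
    return best_model


if __name__ == "__main__":
    print(choose_model("Summarize this article about API gateways"))
    print(choose_model("Prove that this distributed locking protocol is deadlock free"))
```

Because the decision lives in one function behind one endpoint, changing the threshold or the route table changes the cost policy for every application at once, which is the sense in which routing becomes a policy layer rather than an app decision.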
This pushes API gateways toward becoming the control plane for all machine traffic, not just REST APIs. As more enterprises run many models, agents, and tools at the same time, the winning platform will be the one that can decide, in real time, which request needs the best model and which one only needs the cheapest acceptable answer.