Augusto Marietti, CEO of Kong, on the end of tokenmaxxing

Background

We first covered OpenRouter at $10M annualized revenue (July 2025) and then at $50M annualized revenue (June 2026). To learn more about how model routing is evolving as a category across developers, startups, and enterprises, we reached out to Augusto Marietti, co-founder & CEO of Kong.

Key points via Sacra AI:

What started as a single "model routing" category has diverged into three distinct categories of business, with Costco-for-tokens marketplaces like OpenRouter for developers routing requests to the cheapest model with a ~5% brokerage fee, public gateways like Cloudflare and Vercel allowing startups to access many LLMs via one API key and one billing relationship, and behind-the-firewall enterprise gateways like Kong to manage internal LLM consumption. "OpenRouter is essentially the Costco of LLMs and will likely keep growing in that lane… People call all of these 'AI gateways,' but they're genuinely different markets with different sales motions and different use cases."
As companies move away from tokenmaxxing leaderboards and aim to encourage more cost-efficient LLM usage, the primary strategy has become using a gateway layer that sits between employees and their AI apps to semantically route simpler requests to cheaper models, enforce security policies, and cache repeated queries. "If 10,000 employees inside a company send a simple prompt, how do you route that to the cheapest appropriate LLM in your portfolio so you're not burning expensive tokens unnecessarily? Can you compress prompts to reduce token usage? Can you cache common prompts at the gateway layer so you never even hit the LLM?... Without that routing layer, everything defaults to the most expensive model, and costs scale proportionally with usage."
While the cost of top-end frontier model tokens has fallen 25% since 2022, total spend is up 700x with agents consuming 10x more tokens per task due to large context windows, reasoning chains, and multi-step tool loops, making model routing key to keeping AI budgets from scaling as the biggest enterprises shift from chatbots to agents over the next 2-3 years. "One of the main reasons people use our enterprise AI gateway is exactly to manage that, since end users typically don't think about which model they're calling, they just want an answer. At scale, with 10,000 employees asking mid-complexity questions, an enterprise can save tens of millions of dollars by having the gateway semantically route those requests to a less expensive, less powerful model when the task doesn't need the frontier model. Without that routing layer, everything defaults to the most expensive model, and costs scale proportionally with usage."

Questions

To get started, maybe we could take the historical route and talk about the evolution of Kong, coming up in the early API marketplace era, through microservices, to now the AI gateway. How do you think about the history of APIs in this market?
Is that long-tail SaaS dynamic, similar to what Zapier built around connecting all these apps, still relevant when it comes to AI and agents, or has that changed?
On the topic of MCP, do you see it as the best long-term solution, or more of an intermediate step?
What do you make of foundation model labs getting into API middleware, like Anthropic acquiring Stainless?
Let's talk about billing. You mentioned it earlier, could you talk about OpenMeter and why Kong acquired it?
From your perspective, how do you think about the different pricing model flavors out there, pay-as-you-go, tiered credits, and other usage-based schemes companies are using?
From what you're seeing, are people coming to OpenMeter primarily for monetization or for cost control? And has that shifted since the acquisition in September?
You mentioned Metronome being acquired by Stripe. How do customers decide between Kong, OpenMeter, Metronome, or Orb? Is the API integration a major differentiator?
How do you think about "model routing" as a market?
What's the split you're seeing between enterprises enforcing internal guardrails and throttling versus building customer-facing AI features?
How much are you seeing open source models gain ground, are they reaching parity with the frontier labs?
I'm curious what you're seeing on cost. The recent Opus models, paired with something like the Cursor/Sonnet combination, were the first time I'd really had to think seriously about cost. What are you seeing internally as frontier models get more expensive at the high end, more throttling, more routing to cheaper models?
What about security? Has that come up more given things like the LiteLLM breach?
What about the shift toward agents? Companies used to run single LLM calls for chatbots, now agents make many calls over the course of a run. Is that changing what you're seeing from enterprises around cost or restrictions?
Looking ahead, can you talk a bit about what you’re thinking about going forward with the evolution of agents and marketplaces?

Interview

To get started, maybe we could take the historical route and talk about the evolution of Kong, coming up in the early API marketplace era, through microservices, to now the AI gateway. How do you think about the history of APIs in this market?

Kong today is about a thousand people, operating in 28 countries, roughly 80% enterprise (the global 2,000 and global 5,000) and 20% SMB, mid-market, and AI-native companies in the long tail. But Kong started as an API marketplace.

In 2010, the original thesis was that cloud computing was becoming the electricity of the industrial revolution, metered, switched on and off, with the big hyperscalers like AWS in their early days. What was missing was the assembly line. The two biggest innovations of the second industrial revolution were the assembly line and electricity. We saw cloud computing as the electricity, but there was no assembly line in software. We thought APIs would become that assembly line.

So we built a GitHub for APIs and monetized it. We raised capital, had plenty of near-death moments running out of money, especially as an immigrant with no visa. I actually slept on Travis Kalanick's couch at one point, that's a whole separate story. But the thesis was: build the assembly line.

We got to 300,000 developers and a couple million dollars in gross merchandise volume. Some companies, like an image processing API that was very popular at the time, were monetizing entirely through Kong. It was long tail, not enterprise at all, and there were real liquidity and supply issues. The APIs weren't very reliable since they came from the long tail.

You could have a text processing API charging $50 a month, and if the provider went on vacation and the API went down, we were blamed but couldn't reach them. We also rarely had infrastructure exclusivity, so people would still need to get an API key on the provider's website and come back to integrate it. All of this friction kept the marketplace from working well, and growth started to tank about halfway through our Series A.

So we looked at what we'd built internally. One thing we'd built was an API router, gateway, and proxy doing rate limiting, throttling, caching, and billing. We realized every company in the world would eventually need this as they became API companies. This was around 2015, 2016, when Docker, Terraform, and Elasticsearch were all coming up. So we decided to open source the API Gateway and make that infrastructure available to everyone.

We launched as Kong in mid-2015 and got about 5,000 GitHub stars in thirty days, which was a big deal at the time. That's when we knew this was a real business. We sold Mashape to an Andreessen Horowitz portfolio company, raised our Series B, and became Kong, Inc. in August of 2017. That was the start of this new journey.

Is that long-tail SaaS dynamic, similar to what Zapier built around connecting all these apps, still relevant when it comes to AI and agents, or has that changed?

I'll bring up MuleSoft too. They were good at integrating systems that didn't talk to each other, enterprise service bus style. Then APIs came along with the cloud-native revolution, and workloads started speaking the same language, which removed the need for that kind of enterprise data mapping. That's when we built a cloud-native API gateway and took the position that everything is an API, let's just make those run faster, more efficiently, more securely, and more governed, especially internally. Some enterprises have 20,000 or even 30,000 APIs internally, especially the global 2,000.

In this new AI world, with much bigger workloads, what worked before doesn't work as well now. In the cloud-native world, it was fine to be siloed: a business unit running a mobile loyalty program would have its own database and stack, central data teams would use something like Databricks or Teradata, different teams would run AWS or GCP, and you'd only integrate cross-functionally for things like ServiceNow, Workday, or Salesforce. Everything else stayed in its own silo.

In an agentic world, that doesn't work, because agents need to operate end-to-end across the enterprise to be valuable, and right now each system is siloed, with no shared authentication, authorization, or system of record. Enterprises are struggling to unblock AI in production for exactly this reason, that's why forward-deployed engineering is having a moment. I believe one of the biggest unlock problems in AI right now is to build “digital railroads” so agentic workflows can actually move across systems. That's part of what our AI gateway is trying to solve.

To your question about SaaS: every SaaS company will need to become an API key or die. A lot of SaaS products will disappear because there's no reason to keep buying, say, a standalone HR dashboard. But SaaS companies with proprietary data or critical workflows, like ServiceNow, will survive, they'll just need to become headless, exposed through an MCP server, an API key, or whatever protocol comes next. I think SaaS interoperability becomes the default status quo going forward.

On the topic of MCP, do you see it as the best long-term solution, or more of an intermediate step?

I describe MCP as Duolingo for APIs, it makes them speak natural language. So if you're interacting through chat or natural language, you need MCP. If you're machine-to-machine, you don't, you'd use an API or SDK directly. But anytime natural language is involved, MCP is needed.

We're actually involved with the MCP spec through Anthropic. In it, there are some authorization and authentication issues we're seeing show up in the enterprise. MCP is great for governance of traffic. Plain CLI access is more one-to-one, but for the enterprise, MCP gives you authentication, governance, logging, and analytics. We offer an MCP gateway, essentially a managed layer over a fleet of MCP servers, that lets the CIO, CISO, and CTO sleep at night. I think MCP is here to stay in the enterprise.

There's also A2A for agents, which is newer, and I expect more protocols to emerge, including new payment protocols from various providers. At the core, though, the world is moving from GUIs (for humans) to programmatic interfaces (for agents). That's the throughline, whatever you call the specific protocol.

Our AI gateway covers three areas: the LLM gateway, which was our original product, the MCP gateway, and the agent gateway, which is the newest. The biggest inflection point for us was about a year ago when enterprises started adopting the MCP gateway specifically, because they needed to manage and govern MCP traffic. Our business grew 11x in eleven months around that. MCP was really the killer use case for the AI gateway. I think it'll stay essential for the natural-language layer of AI and will keep improving over time.

What do you make of foundation model labs getting into API middleware, like Anthropic acquiring Stainless?

I know that team, they're phenomenal. Stainless really nailed something nobody expected you could charge for, high-quality SDKs. It's almost funny that even companies doing AI-assisted coding still need to buy an SDK company's output because the quality bar is so high.

I think there are two threads here. AI started with chat as the interface, but I think for humans the long-term interface becomes voice, once it's fast and accurate with no lag. For machines, it becomes APIs or whatever the prevailing programmatic protocol is. I think the labs know that, which is why they're moving into SDKs, though I don't think it's about SDK revenue itself, that's probably only a few million dollars as a business. I think it's about the data: they get visibility into how people actually use the SDK, which helps them improve products like Claude Code using that metadata, since Stainless was used by OpenAI and Grok as well. My read is it's a strategic move to capture SDK usage data to improve their own agentic coding tools.

Let's talk about billing. You mentioned it earlier, could you talk about OpenMeter and why Kong acquired it?

We acquired OpenMeter in September. I'd actually been an angel investor in Metronome back in 2020, though I came to think Metronome was too expensive and was built more for product managers tweaking monetization features than for engineers. I'd been watching this space for a while for two reasons.

First, our large enterprise infrastructure customers wanted chargeback capabilities, this business unit gets charged this much, and so on, the classic chargeback use case. Second, once we started seeing token usage and volume grow with our AI traffic, it became clear that the future of AI monetization wasn't primarily a billing problem, it was a metering problem: tokens per millisecond, tokens per watt on the margin side, tokens per second on throughput. I became convinced metering would be the foundation for billing in the age of AI.

Since we were already proxying the traffic, the question became: what if people could just route that traffic into a metering layer, set a price, and connect to a payment gateway with one click? I looked at five companies, and OpenMeter had the best metering technology and the most engineering-driven approach. It didn't have things like trial refills or some other features the others had, but it was the strongest for engineers, which is what I valued.

We went with them and acquired the company. Now we have an end-to-end solution: you manage, govern, secure, and accelerate API and AI traffic, and now you can also monetize it with a click. If you don't want to monetize, you can use it for internal chargebacks, or for managing gross margin and token cost management, which is a big topic right now. It gives us a feature set that will only become more important in a token-based economy.
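To make the metering-first idea concrete, here is a minimal sketch of how metered token events could roll up into an internal chargeback. It is an illustration only: the `MeterEvent` schema, the `chargeback` function, and the prices are all hypothetical and are not the OpenMeter or Kong API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MeterEvent:
    """One usage record emitted by the gateway (hypothetical schema)."""
    business_unit: str
    model: str
    tokens: int


# Hypothetical internal prices in dollars per 1,000 tokens.
PRICE_PER_1K_TOKENS = {"frontier": 0.03, "small": 0.002}


def chargeback(events):
    """Sum metered tokens per business unit and convert them to dollars."""
    totals = defaultdict(float)
    for e in events:
        totals[e.business_unit] += e.tokens / 1000 * PRICE_PER_1K_TOKENS[e.model]
    return {unit: round(cost, 4) for unit, cost in totals.items()}


events = [
    MeterEvent("marketing", "frontier", 10_000),
    MeterEvent("marketing", "small", 50_000),
    MeterEvent("support", "small", 200_000),
]
print(chargeback(events))
```

The point of the sketch is that pricing is applied after metering: the same event stream could feed external billing, internal chargeback, or margin reporting by swapping the price table.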

From your perspective, how do you think about the different pricing model flavors out there, pay-as-you-go, tiered credits, and other usage-based schemes companies are using?

I think eventually all revenue becomes token revenue, directly or indirectly. The most common monetizable unit will be tokens. API calls remain a unit too, and eventually outcomes will be as well, we're already seeing early forms of that. Voice will likely be its own dimension, billed by the minute. Storage businesses bill on compute and gigabytes.

Pretty much everything comes down to an API call at the base layer, but tokens sit as the layer above that, and outcome-based pricing sits above tokens. Outcome-based pricing already works reasonably well for simpler use cases, like an AI support agent resolving a ticket for a fixed cost, but it's not yet quantifiable for more complex business use cases. I think in five to ten years we'll see real outcome-based pricing become standard.

From what you're seeing, are people coming to OpenMeter primarily for monetization or for cost control? And has that shifted since the acquisition in September?

It's mostly monetization. We can sell it to a company like Docker to monetize containers per minute, which has nothing to do with APIs specifically, or any other custom usage object. It's a full suite.

So as I was saying, we can sell the metering and billing product on its own, even to companies that aren't doing anything with APIs as a unit of measure, we go after that market directly too.

The key insight is that one plus one equals three: you manage your APIs, set up a gateway, and can monetize traffic through plans, but we also go after billing and metering independently. About half of our billing and metering customers aren't API-centric businesses at all, they're monetizing entirely different things.

We're doubling the engineering team on that side and we're moving as fast as we can on the roadmap: credits, auto-refill, and other features for that classic billing and metering segment. We're pushing hard there because it also lets us cross-sell into APIs, though it doesn't have to start with APIs.

You mentioned Metronome being acquired by Stripe. How do customers decide between Kong, OpenMeter, Metronome, or Orb? Is the API integration a major differentiator?

For non-API companies, customers choose OpenMeter because, to this day, we have the best metering technology, better than Orb, better than the others, and it's genuinely built for engineers. You can use the Go SDK, send your meter events, and start monetizing, it's very simple and engineering-driven.

We started from a metering-first approach. Orb and Metronome started more from a FinOps or product-manager mindset, monetizing specific features, it's a different vibe. Engineering-led teams tend to prefer OpenMeter, while FinOps-led teams tend to prefer Orb. Other players like Amberflo are smaller.

For API-centric companies, the decision is more straightforward: you can meter directly on the API, authenticate it, monetize it, and enforce limits, none of the standalone billing tools can do enforcement the way we can. You can block usage once someone hits a token limit or gets throttled, and then publish pricing plans through a developer portal, all integrated in one place.

For that use case, we are clearly the right fit since everything's already connected. For non-API use cases, if you want a metering-based approach rather than a feature-flag approach, OpenMeter is usually the right solution. On payment gateways, we mainly support Stripe today, with Adyen support coming on the roadmap.
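The enforcement idea described in this answer (blocking usage once a consumer hits a token limit) can be sketched roughly as follows. This is a simplified, in-memory illustration; the `TokenQuota` class and the consumer name are hypothetical, and a production gateway would persist counters and reset them per billing period.

```python
class TokenQuota:
    """Tracks token usage per consumer and blocks requests over the plan limit."""

    def __init__(self, limits):
        self.limits = dict(limits)  # consumer -> max tokens per billing period
        self.used = {}              # consumer -> tokens consumed so far

    def allow(self, consumer, tokens):
        """Record usage and return True, or return False if the limit would be exceeded."""
        limit = self.limits.get(consumer, 0)  # unknown consumers get no quota
        spent = self.used.get(consumer, 0)
        if spent + tokens > limit:
            return False
        self.used[consumer] = spent + tokens
        return True


quota = TokenQuota({"team-a": 1_000})
print(quota.allow("team-a", 600))  # True: 600 of 1,000 used
print(quota.allow("team-a", 600))  # False: would reach 1,200
print(quota.allow("team-a", 400))  # True: exactly reaches the limit
```

Because the check runs in the request path, a rejected call never reaches the upstream model, which is the difference between enforcement and after-the-fact billing.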

How do you think about "model routing" as a market?

There's a lot of misconception about what "model routing" or "AI gateway" actually means as a market, understandably, given how the term gets used loosely. There are really three distinct flavors.

The first is what we started building in February 2024, when we were among the first to open source this category. It's the behind-the-firewall use case: roughly 60% of enterprises now use seven or more LLMs internally, and the sprawl is significant. Our thesis was that just as enterprises adopted different clouds and microservices frameworks, the same thing would happen with AI, so we built an AI gateway to manage dozens of LLMs internally. That's a true internal use case: AI security, governance, acceleration, token limits, and so on. It doesn't get much attention on social media, but it's very lucrative, closer to a Palantir-style approach selling into large enterprise. That's our market.

The second flavor, emerging around the same time, was OpenRouter, essentially a wholesale marketplace for LLMs at a discount, similar to Costco. A model like Grok gets relatively little usage on its own, so putting it on OpenRouter at a discount drives volume because people chase cheap tokens. That's a fundamentally different business, they charge something like a 5% brokerage fee. It still needs a proxy, but the actual business is arbitrage, not the proxy itself.

The third flavor is public AI gateways like Cloudflare's or Vercel's, which let you consume multiple LLMs through one API key. They're similar to OpenRouter in that the business is arbitrage with a cut taken on top, rather than the proxy infrastructure itself.

Then there's open source, like LiteLLM and Portkey. LiteLLM started as a Python SDK and later added a proxy, but the proxy was never really the core product, and Python is relatively slow and prone to breaking under heavy load. Our customers who need real scale come to Kong because we can handle quadrillions of tokens without error rates for any internal enterprise use cases. Open source tools like these can get adopted internally or externally, but they tend to stay fairly small as standalone businesses.

So to summarize: OpenRouter is essentially the Costco of LLMs and will likely keep growing in that lane. We're more the Palantir of API and LLM token infrastructure, and that business has become quite material for us. The public consumption side is about finding the best model and routing for general internet use.

The enterprise side is different: if 10,000 employees inside a company send a simple prompt, how do you route that to the cheapest appropriate LLM in your portfolio so you're not burning expensive tokens unnecessarily? Can you compress prompts to reduce token usage? Can you cache common prompts at the gateway layer so you never even hit the LLM? On top of that, you get visibility into all API and AI traffic inside the enterprise: who's using what, what the margin looks like per token, and you can rate-limit specific parts of the organization.

You can also sanitize sensitive information, like medical records, before it reaches an LLM. It functions as a kind of AI control tower inside the enterprise, with gateways acting as the checkpoints. People call all of these "AI gateways," but they're genuinely different markets with different sales motions and different use cases.
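As a rough sketch of the enterprise pattern described above (route each prompt to the cheapest appropriate model, and answer repeated prompts from a gateway cache), consider the following. Everything here is an assumption for illustration: the model names, prices, and the word-count heuristic in `required_tier` are hypothetical, and a real gateway would use semantic classification rather than prompt length.

```python
import hashlib

# Hypothetical portfolio, cheapest first: (name, dollars per 1K tokens, capability tier).
PORTFOLIO = [("small-model", 0.002, 1), ("mid-model", 0.01, 2), ("frontier-model", 0.03, 3)]


def required_tier(prompt):
    """Toy complexity heuristic (an assumption): longer prompts need stronger models."""
    words = len(prompt.split())
    if words <= 20:
        return 1
    if words <= 200:
        return 2
    return 3


def pick_model(prompt):
    """Return the cheapest model whose tier meets the prompt's required tier."""
    need = required_tier(prompt)
    for name, _price, tier in PORTFOLIO:
        if tier >= need:
            return name
    return PORTFOLIO[-1][0]


class Gateway:
    """Sits between callers and models: exact-match cache first, then routing."""

    def __init__(self, call_llm):
        self.call_llm = call_llm  # function (model, prompt) -> answer
        self.cache = {}

    def handle(self, prompt):
        # Normalize whitespace and case so trivially different prompts share a key.
        key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
        if key in self.cache:
            return self.cache[key], "cache"
        model = pick_model(prompt)
        answer = self.call_llm(model, prompt)
        self.cache[key] = answer
        return answer, model


# Stub model call so the sketch runs without any provider.
gateway = Gateway(lambda model, prompt: f"{model}: ok")
print(gateway.handle("What is our PTO policy?"))
print(gateway.handle("what is our PTO policy?  "))
```

The second call is served from the cache and never reaches a model, which is the "never even hit the LLM" case from the interview.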

What's the split you're seeing between enterprises enforcing internal guardrails and throttling versus building customer-facing AI features?

It's overwhelmingly internal right now. We went from about 10 enterprise customers a year ago to about 100 today, and the vast majority of that growth is enterprises using us as a tollgate to unlock AI internally without things going wrong.

About 90% use semantic caching, semantic routing, and prompt compression, essentially what OpenRouter does for arbitrage, but applied internally, plus additional governance features layered on top.
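Semantic caching differs from exact-match caching in that a new prompt can reuse a stored answer when it is similar enough to an earlier one. The sketch below shows the mechanism under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and the `SemanticCache` class and its 0.8 threshold are hypothetical, not Kong's implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored answer when a new prompt is close enough to an old one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (vector, answer)

    def get(self, prompt):
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt, answer):
        self.entries.append((embed(prompt), answer))


cache = SemanticCache()
cache.put("how do i reset my vpn password", "Use the self-service portal.")
print(cache.get("how do i reset my vpn password please"))
print(cache.get("what is the cafeteria menu today"))
```

The threshold is the main design choice: set too low, users get answers to questions they did not ask; set too high, the cache rarely hits.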

How much are you seeing open source models gain ground, are they reaching parity with the frontier labs?

There's real usage, but not parity, not even close. We’re working on a State of Enterprise AI report using anonymized data from our AI gateway across a year of traffic, showing which LLMs enterprises actually use.

Anthropic is clearly ahead of everyone in that data. But enterprises do typically use five or six different LLMs, and usually one of those is open source.

I'm curious what you're seeing on cost. The recent Opus models, paired with something like the Cursor/Sonnet combination, were the first time I'd really had to think seriously about cost. What are you seeing internally as frontier models get more expensive at the high end, more throttling, more routing to cheaper models?

One of the main reasons people use our enterprise AI gateway is exactly to manage that, since end users typically don't think about which model they're calling, they just want an answer.

At scale, with 10,000 employees asking mid-complexity questions, an enterprise can save tens of millions of dollars by having the gateway semantically route those requests to a less expensive, less powerful model when the task doesn't need the frontier model.

Without that routing layer, everything defaults to the most expensive model, and costs scale proportionally with usage. That smart routing between your own open-source models and various commercial LLMs, letting you essentially arbitrage cost versus capability, is one of the biggest use cases alongside security and governance.
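A back-of-the-envelope calculation shows why routing matters at this scale. Only the 10,000-employee figure comes from the interview; the request volume, tokens per request, prices, and the share of requests that can be routed to a cheaper model are illustrative assumptions, not Kong data.

```python
def annual_spend(employees, requests_per_day, tokens_per_request,
                 frontier_price, cheap_price, cheap_share, workdays=250):
    """Yearly token spend in dollars when cheap_share of requests use the cheaper model.

    Prices are dollars per 1,000 tokens; cheap_share is a fraction between 0 and 1.
    """
    tokens = employees * requests_per_day * tokens_per_request * workdays
    blended_price = cheap_share * cheap_price + (1 - cheap_share) * frontier_price
    return tokens / 1000 * blended_price


# Assumed workload: 10,000 employees, 40 requests a day, 5,000 tokens per request.
baseline = annual_spend(10_000, 40, 5_000, frontier_price=0.03, cheap_price=0.003, cheap_share=0.0)
routed = annual_spend(10_000, 40, 5_000, frontier_price=0.03, cheap_price=0.003, cheap_share=0.7)

print(f"everything on the frontier model: ${baseline:,.0f}")
print(f"70% routed to the cheaper model:  ${routed:,.0f}")
print(f"difference:                       ${baseline - routed:,.0f}")
```

With `cheap_share=0.0` every request pays the frontier price, which is the "everything defaults to the most expensive model" case; the savings scale linearly with token volume, so agent-heavy workloads widen the gap.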

What about security? Has that come up more given things like the LiteLLM breach?

Yes, actually a lot of customers came to us specifically because they didn't trust LiteLLM for security reasons. It's a strong framework, but it's not built to enterprise security standards; it's fundamentally a Python tool.

We saw a meaningful number of inbound inquiries, somewhere around fifty, in the week after that incident from companies worried about LiteLLM's security. LiteLLM is a great Python framework, and it became the default proxy when there weren't many AI Python SDKs available back in 2023, so it spread everywhere. But it's not built for scale, once you start handling a billion calls, it breaks down with elevated error rates and latency.

Below that volume, it's fine. Portkey has similar limits, it also starts showing error rates and latency issues around a billion calls a day. They're solid tools for simpler, lower-volume use cases, but enterprise customers with serious security and regulatory requirements generally won't go with Python-based middleware at that scale.

What about the shift toward agents? Companies used to run single LLM calls for chatbots, now agents make many calls over the course of a run. Is that changing what you're seeing from enterprises around cost or restrictions?

Enterprises are genuinely far behind on agents, surprisingly so, and we see this constantly. There's agentic coding, where people run multiple agents through tools like Replit or Cursor, but even there, enterprise adoption lags. For true agentic workflows across the enterprise, I think we're about three years away, because SaaS systems are siloed and authorization is locked down across them.

Think of internal APIs the way you'd think about something like Daintree or Neville systems internally. If you look at the latest agentic benchmarks that came out last week, intelligence scores are improving rapidly across the board, but multi-step API tool calling was still the weakest area, only around 17% accuracy. Calling the right APIs, knowing what to do with the results, and having proper access, that's still the hardest part, roughly 80% of the difficulty, and it hasn't really improved in two or three years.

That's the real unlock needed for enterprise: true end-to-end agentic workflows across proprietary internal APIs, with a universal system of record an agent can be pointed at, to retrieve whatever it needs, essentially "food for agents." I think that's still two or three years away.

That's part of why frontier labs are pushing so hard into forward-deployed engineering. Even companies like Ramp are doing this now, everyone wants to be Palantir and hire forward-deployed engineers, because they've realized you can't just hand an enterprise a smart model and expect it to work.

There's too much internal politics, too many people, 55,000 offices worldwide, 75 different data warehouses, that complexity doesn't go away just because you have a powerful model. It'll take forward-deployed engineering teams, ours included, a couple of years to unblock enough business processes through straightforward engineering work that agents can then actually run reliably.

In the meantime, what's happening is every SaaS category is building its own agent: Ashby for recruiting, Pigment for FP&A, and so on. But those remain vertical silos, there's no real cross-functional workflow connecting HR, sales, and engineering, because the underlying systems stay isolated.

Looking ahead, can you talk a bit about what you’re thinking about going forward with the evolution of agents and marketplaces?

Our strategy has two parts. First is what we call the mountain pass: starting with API calls and now AI tokens, and building the technology layer, the API gateway, MCP gateway, AI gateway, LLM gateway, agent gateway. We sit directly in that traffic and add value by making it faster, more efficient, and more secure. That's the first phase.

If you can do that at scale across enough companies, you effectively end up owning visibility into that traffic, and you can build a system of record of all the active APIs, MCP servers, and the consumers, clients, and agents using them. That's what we're building inside our platform.

Once you've accumulated enough supply of APIs, you can turn that supply into a marketplace and open it up to agentic demand. At that point, you don't need separate API key provisioning or separate billing relationships for every service, you give an agent a single wallet pointed at the system of record, and we handle API keys, billing, and provisioning underneath, since we're already running the supply side directly.

That's the long-term vision, something like an eBay for APIs and Agents, but it only works if you're already running the underlying API infrastructure, which we weren't originally set up to do, we were a proxy sitting in front of a proxy. That's why building up enough supply mattered first.

Now we manage millions of APIs, enough critical mass across enough categories that we're ready to build a marketplace layer on top of that infrastructure, a separate product, but powered by everything we already run behind the firewall. That's the long-term goal: unleashing AI as an assembly line for agents to go get the building blocks for their next creation.

Disclaimers

This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.
