Turbopuffer Bridges Serverless and Reserved
The key move is that turbopuffer has turned its storage architecture into a pricing ladder: the same namespace can start as cheap, mostly cold infrastructure and later become reserved hot capacity without a migration. That matters because AI retrieval traffic is usually uneven. Many workspaces sit idle for hours or days, then a small set becomes busy enough that paying per query stops making sense and always-warm reserved nodes become the cheaper option.
- Pinning is not a separate product. It is a metadata setting on an existing namespace. Data still lives durably in object storage, but queries route to reserved query nodes with an SSD cache, and billing flips from TB queried to GB-hours. turbopuffer says the break-even point is typically around 10 QPS.
- That closes the classic gap between serverless and provisioned search. Pinecone, Vespa, Elasticsearch, and Postgres-based stacks usually force a harder up-front choice between always-on capacity and lower-cost cold data. Interviews here describe turbopuffer winning when traffic is spiky and corpus size is large, while Vespa wins when custom ranking matters more.
- In practice, this fits namespace-per-tenant or namespace-per-workspace products. Cursor uses a separate namespace for each codebase, and large operators describe keeping each user's or tenant's mostly irrelevant data cold until a specific session needs it. Namespace pinning extends that model upmarket to the few namespaces that stay consistently hot.
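The break-even logic behind pinning can be made concrete with a small cost model. The sketch below is illustrative only: the function names, the parameters, and every number in `EXAMPLE` are hypothetical placeholders, not turbopuffer's actual prices or API. The placeholder values were chosen so the result lands near the roughly 10 QPS figure quoted above. The model assumes usage-based cost scales with data queried and pinned cost scales with reserved GB-hours.

```python
SECONDS_PER_HOUR = 3600


def serverless_cost_per_hour(qps, tb_scanned_per_query, price_per_tb_queried):
    """Hourly cost under usage-based billing (scales with query volume)."""
    return qps * SECONDS_PER_HOUR * tb_scanned_per_query * price_per_tb_queried


def pinned_cost_per_hour(pinned_gb, price_per_gb_hour):
    """Hourly cost under reserved billing (independent of query volume)."""
    return pinned_gb * price_per_gb_hour


def break_even_qps(pinned_gb, price_per_gb_hour,
                   tb_scanned_per_query, price_per_tb_queried):
    """Sustained QPS at which both billing modes cost the same per hour."""
    cost_per_query = tb_scanned_per_query * price_per_tb_queried
    if cost_per_query <= 0:
        raise ValueError("cost per query must be positive")
    reserved = pinned_cost_per_hour(pinned_gb, price_per_gb_hour)
    return reserved / (SECONDS_PER_HOUR * cost_per_query)


def cheaper_mode(qps, **pricing):
    """Return which billing mode is cheaper at a sustained query rate."""
    return "pinned" if qps > break_even_qps(**pricing) else "serverless"


# HYPOTHETICAL placeholder values, not real turbopuffer pricing.
EXAMPLE = dict(
    pinned_gb=100.0,              # size of the pinned namespace
    price_per_gb_hour=0.0036,     # assumed reserved price
    tb_scanned_per_query=0.0001,  # assumed data queried per request
    price_per_tb_queried=0.1,     # assumed usage price
)

if __name__ == "__main__":
    print("break-even QPS:", break_even_qps(**EXAMPLE))
    for qps in (1, 50):
        print(qps, "QPS ->", cheaper_mode(qps, **EXAMPLE))
```

Under this model the reserved cost is flat, so a namespace that is idle most of the day stays cheaper on usage-based billing, while one with sustained traffic above the break-even rate is cheaper pinned.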
Next, retrieval vendors will converge on mixed pricing, but turbopuffer has an early advantage because pinning is built into the namespace model rather than bolted on as a separate deployment tier. If that model keeps absorbing hotter workloads, turbopuffer expands from archival and bursty retrieval into more core production search budgets.