TurboPuffer for sparse multi-tenant search
AI engineer at Indeed on TurboPuffer vs. Vespa vs. Elasticsearch at scale
That's where the TurboPuffer advantage really shows up—you only need the relevant data available in RAM as hot data, and all the irrelevant data can stay in cold storage
TurboPuffer wins when the corpus is mostly sleeping. In the Indeed workload, each session usually touches one tenant's records while the rest of the corpus stays cold, so the system avoids paying RAM costs for data that will not be queried. The best fit is therefore a multi-tenant corpus with lots of cold data and uneven traffic, not a uniformly hot index where everything must stay ready all the time.
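The hot/cold split described above can be illustrated with a toy cache. This is a minimal sketch, not TurboPuffer's implementation: `TenantCache`, `cold_store`, and `max_hot_tenants` are hypothetical names, the cold tier is modeled as a plain dictionary, and least-recently-used eviction is an assumed policy chosen only to keep the example short.

```python
from collections import OrderedDict


class TenantCache:
    """Toy model: a bounded hot tier in RAM in front of a large cold store."""

    def __init__(self, cold_store, max_hot_tenants):
        self.cold_store = cold_store            # hypothetical: tenant_id -> list of records
        self.max_hot_tenants = max_hot_tenants  # assumed RAM budget, counted in tenants
        self.hot = OrderedDict()                # tenant_id -> records, oldest first
        self.cold_loads = 0                     # number of times the cold path was taken

    def records_for(self, tenant_id):
        if tenant_id in self.hot:
            # Hot path: mark the tenant as most recently used.
            self.hot.move_to_end(tenant_id)
            return self.hot[tenant_id]
        # Cold path: load only this tenant's slice into the hot tier.
        self.cold_loads += 1
        records = self.cold_store[tenant_id]
        self.hot[tenant_id] = records
        if len(self.hot) > self.max_hot_tenants:
            # Evict the least recently used tenant (assumed policy).
            self.hot.popitem(last=False)
        return records


# 1,000 tenants sit in cold storage; only the queried ones are ever promoted.
cold_store = {f"tenant-{i}": [f"doc-{i}-{j}" for j in range(3)] for i in range(1000)}
cache = TenantCache(cold_store, max_hot_tenants=2)

for tenant in ["tenant-7", "tenant-7", "tenant-42", "tenant-7"]:
    cache.records_for(tenant)

hot_tenants = list(cache.hot)
```

In this run only two of the 1,000 tenants ever occupy the hot tier, and repeat queries for an already-promoted tenant skip the cold path entirely, which is the cost shape the Indeed workload relies on.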
- The practical shape is per-customer retrieval. The team said the cost win came from a large volume of cold-storage data combined with spiky traffic, where a user mainly needs their own records. In that setup, relevant data gets promoted to hot RAM only when needed.
- Isolation was handled with metadata filters and namespaces, not fully separate systems per tenant. That helps narrow search to the right slice of data, but permission logic and cross-region privacy rules add operational complexity, especially when policy cannot live entirely in the vector layer.
- This is different from heavy fan-out agent workloads. In a second large-scale evaluation, rare lookups across very large corpora could trigger warm starts from cold storage, creating unpredictable tail latency. That makes always-on systems better when many parallel searches must all return quickly.
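The fan-out point can be made concrete with a little arithmetic. The sketch below assumes each parallel search independently hits cold data with the same probability; that independence, the function name `p_any_cold`, and the 2% figure are illustrative assumptions, not measurements from either evaluation.

```python
def p_any_cold(p_cold: float, fan_out: int) -> float:
    """Probability that at least one of `fan_out` independent searches hits cold data."""
    return 1.0 - (1.0 - p_cold) ** fan_out


# Assumed, illustrative number: 2% of individual lookups land on cold data.
single = p_any_cold(0.02, 1)   # one tenant-scoped query
wide = p_any_cold(0.02, 50)    # an agent step that fans out to 50 parallel searches
```

Under these assumptions a single query pays the cold-start penalty about 2% of the time, while a 50-way fan-out that must wait for every result pays it about 64% of the time. A rare per-query event becomes the common case once many searches have to finish together, which is why always-on systems fit that workload better.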
The direction is clear: TurboPuffer is strongest as retrieval infrastructure for massive, sparse, multi-tenant corpora where most data is rarely touched. If it adds deeper policy-aware filtering and keeps reducing cold-start pain, it can expand from cost-optimized tenant search into a broader serving layer for production AI workloads.