Cheaper Retrieval Drives Index Expansion
Turbopuffer
The core implication is that cheaper retrieval changes demand, not just pricing. When a search system can leave most data in cheap object storage and pull only the hot slices into a faster cache, teams stop treating search as a scarce feature and start indexing long-tail data, old records, and per-customer namespaces that were previously too expensive to keep live. This is the Jevons paradox applied to retrieval: lower unit cost creates more total usage, not less.
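To make the Jevons argument concrete, here is a minimal sketch that assumes a constant-elasticity demand curve for indexed data. The curve, the function names, and every number are hypothetical illustrations, not figures from turbopuffer or any customer. Under this assumption, indexed volume grows whenever unit cost falls, and total spend also grows once elasticity exceeds 1.

```python
# Hypothetical constant-elasticity demand: volume = k * unit_cost ** (-elasticity).
# Every number here is an illustrative assumption, not vendor or customer data.

def indexed_volume(unit_cost: float, elasticity: float, k: float = 100.0) -> float:
    """Amount of data a team chooses to index at a given cost per unit."""
    return k * unit_cost ** (-elasticity)


def total_spend(unit_cost: float, elasticity: float, k: float = 100.0) -> float:
    """Total retrieval spend: cost per unit times the volume indexed at that cost."""
    return unit_cost * indexed_volume(unit_cost, elasticity, k)


old_cost, new_cost = 1.0, 0.1  # assumed 10x drop in cost per unit indexed

for elasticity in (0.5, 1.0, 1.5):
    volume_growth = indexed_volume(new_cost, elasticity) / indexed_volume(old_cost, elasticity)
    spend_growth = total_spend(new_cost, elasticity) / total_spend(old_cost, elasticity)
    print(f"elasticity={elasticity}: volume x{volume_growth:.1f}, spend x{spend_growth:.2f}")
```

In this toy model the paradox in its strong form, where total spend rises even though each unit is cheaper, appears only in the elastic case. Whether real retrieval demand is that elastic is an empirical question the sketch does not answer.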
- Turbopuffer is built around object storage as the source of truth, with SSD and memory caches layered on top. That makes it a natural fit for workloads with huge corpora, many namespaces, and spiky access patterns, where most data is rarely queried but still needs to be searchable.
- In practice, this expands retrieval into datasets that companies previously left dark. Cursor used turbopuffer for a namespace-per-codebase design and said the architecture cut costs dramatically while supporting 80M+ namespaces and 1T+ vectors, exactly the kind of indexing expansion that cheaper storage unlocks.
- The tradeoff is that expanded indexing does not suit every workload equally well. Interviews show retrieval quality can be comparable to Vespa or Elasticsearch for generic agent and search tasks, but products that need heavy personalization, custom ranking, strict latency guarantees, or stronger hybrid search still lean toward Vespa, Elasticsearch, or other always-on systems.
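The storage layout described above, with object storage as the source of truth and SSD and memory caches on top, can be sketched as a read-through cache keyed by namespace. This is an illustrative toy, not turbopuffer's implementation: `LRUTier` and `TieredNamespaceStore` are hypothetical names, a plain dict stands in for object storage, and the tier sizes are arbitrary assumptions.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class LRUTier:
    """A fixed-capacity cache tier that evicts the least recently used entry."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


class TieredNamespaceStore:
    """Read-through store: memory first, then SSD, then object storage (hypothetical)."""

    def __init__(self, object_store: Dict[str, bytes], memory_slots: int, ssd_slots: int) -> None:
        self.object_store = object_store  # stand-in for the durable source of truth
        self.tiers = [LRUTier("memory", memory_slots), LRUTier("ssd", ssd_slots)]
        self.served_from: List[str] = []  # records which layer answered each read

    def read(self, namespace: str) -> bytes:
        for i, tier in enumerate(self.tiers):
            value = tier.get(namespace)
            if value is not None:
                for faster in self.tiers[:i]:  # promote into any faster tiers
                    faster.put(namespace, value)
                self.served_from.append(tier.name)
                return value
        value = self.object_store[namespace]  # cold read; KeyError if unknown
        for tier in self.tiers:
            tier.put(namespace, value)
        self.served_from.append("object_storage")
        return value


# 100 namespaces live in object storage; only a few fit in the caches at once.
cold_data = {f"tenant-{i}": f"index-{i}".encode() for i in range(100)}
store = TieredNamespaceStore(cold_data, memory_slots=1, ssd_slots=2)
for ns in ["tenant-1", "tenant-1", "tenant-2", "tenant-1"]:
    store.read(ns)
print(store.served_from)
```

The point of the design is that cache capacity scales with the hot working set while the cheap durable layer scales with the full corpus, so a rarely queried namespace costs little until someone actually searches it.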
The next step is for retrieval infrastructure to stop being sized around the hot working set and start being sized around the total knowledge base. As that happens, more products will make every tenant, archive, and log stream searchable by default, and competition will shift from basic vector storage toward ranking, freshness, policy controls, and workflow-specific retrieval layers.