Turbopuffer weak fit for code search

AI engineer at Meta on evaluating Turbopuffer vs. Pinecone vs. Weaviate

Code search is a weak fit for Turbopuffer


This points to a core limit in Turbopuffer's product positioning: it is strongest when semantic similarity matters more than exact lookup, but code search usually needs both exact symbol matching and very fresh indexes. In practice, developers search for things like class names, file paths, function names, and recent PR changes, which pushes the workload toward hybrid systems that blend vector search with BM25-style keyword retrieval and keep updates flowing continuously.

1 sacra 2 sacra 3 turbopuffer 4 weaviate 5 weaviate 6 cursor
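
To make the exact-lookup side concrete, here is a minimal sketch of a code-aware keyword index. It is an illustration only, not how Turbopuffer, Weaviate, or Cursor tokenize code. The names `code_tokens`, `build_index`, and `lookup` are hypothetical, and the splitting rules (break on non-alphanumerics, then on camelCase humps) are an assumption chosen for the example.

```python
import re
from collections import defaultdict

# Runs of letters/digits; underscores, slashes, and dots act as separators.
_WORD = re.compile(r"[A-Za-z0-9]+")
# camelCase humps: acronyms ("HTTP"), capitalized or lowercase words, digit runs.
_HUMP = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def code_tokens(text: str) -> list[str]:
    """Lowercased tokens: each alphanumeric run, plus its camelCase parts."""
    tokens = []
    for word in _WORD.findall(text):
        tokens.append(word.lower())
        parts = _HUMP.findall(word)
        if len(parts) > 1:
            tokens.extend(p.lower() for p in parts)
    return tokens


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index from token to the set of file paths containing it."""
    index = defaultdict(set)
    for path, source in files.items():
        # Index the path too, since developers often search by file path.
        for tok in code_tokens(path) + code_tokens(source):
            index[tok].add(path)
    return index


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Paths that contain every token of the query (exact match, AND semantics)."""
    toks = code_tokens(query)
    if not toks:
        return set()
    result = set(index.get(toks[0], set()))
    for tok in toks[1:]:
        result &= index.get(tok, set())
    return result
```

With an index like this, a query such as `UserSessionManager` matches only files that contain that exact identifier, which is the behavior a pure embedding search cannot guarantee.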

The interview draws a clean line between archival retrieval and code retrieval. Turbopuffer fits large, infrequently queried corpora where storage cost matters most, but code search cares more about freshness and sparse relevance than cheap cold storage.

1 sacra 2 sacra
Weaviate is built more directly for this pattern. Its docs describe hybrid search as vector plus BM25 run in parallel, with configurable weighting, and keyword search is explicitly meant for exact tokens and domain-specific terms, which is closer to how codebases are actually searched.

4 weaviate 5 weaviate
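
The weighted blend of keyword and vector scores can be sketched in a few lines. This is a self-contained illustration under stated assumptions, not Weaviate's implementation: the BM25 parameters (`k1`, `b`), the min-max normalization, and the `alpha` weight (1.0 means vector only, 0.0 means keyword only) are choices made for this example, and the two scorers run one after the other here rather than in parallel.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """BM25 score of each tokenized document for the tokenized query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each token.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def normalize(scores):
    """Min-max scale to [0, 1] so the two score types are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Document indices ordered by alpha * vector + (1 - alpha) * keyword."""
    kw = normalize(bm25_scores(query_tokens, docs_tokens))
    vec = normalize([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * v + (1 - alpha) * k for v, k in zip(vec, kw)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Lowering `alpha` pushes exact token hits to the top, which suits symbol lookups. Raising it favors semantically similar code that shares no tokens with the query.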
Cursor's own indexing flow shows why freshness matters. It computes embeddings for each file, starts indexing automatically when a project opens, indexes new files incrementally, and also indexes merged PR history, so the retrieval layer is expected to track a codebase that changes every day.

6 cursor 7 cursor
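
One way to keep an index fresh without re-embedding a whole repository is to track a content hash per file and re-embed only what changed. The sketch below illustrates that general idea and is not Cursor's actual implementation. `IncrementalIndex` and its `sync` method are hypothetical names, and `embed` stands in for whatever embedding call a real system would make.

```python
import hashlib
from typing import Callable


class IncrementalIndex:
    """Keeps one embedding per file and re-embeds only new or changed files."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self._embed = embed  # hypothetical embedding function, supplied by the caller
        self._hashes: dict[str, str] = {}
        self.vectors: dict[str, list[float]] = {}

    def sync(self, files: dict[str, str]) -> dict[str, list[str]]:
        """Bring the index in line with `files` (path -> content)."""
        embedded, removed = [], []
        for path, content in files.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            # Skip files whose content hash has not changed since the last sync.
            if self._hashes.get(path) != digest:
                self.vectors[path] = self._embed(content)
                self._hashes[path] = digest
                embedded.append(path)
        # Drop entries for files that no longer exist.
        for path in list(self._hashes):
            if path not in files:
                del self._hashes[path]
                del self.vectors[path]
                removed.append(path)
        return {"embedded": sorted(embedded), "removed": sorted(removed)}
```

Calling `sync` on every save or merge keeps the cost proportional to the size of the change rather than the size of the codebase, which is the property a daily-changing repository needs.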

The likely path forward is that code retrieval stacks keep separating into two layers. A cheap object-storage engine can still work as background candidate generation, but winning developer workflows will center on engines that combine exact token search, semantic recall, and fast incremental updates in one always-warm system.

1 sacra 3 turbopuffer 4 weaviate 6 cursor