Turbopuffer weak fit for code search
AI engineer at Meta on evaluating Turbopuffer vs. Pinecone vs. Weaviate
This points to a core limit in Turbopuffer's product position: it is strongest when semantic similarity matters more than exact lookup, but code search usually needs both exact symbol matching and very fresh indexes. In practice, developers search for class names, file paths, function names, and recent PR changes, which pushes the workload toward hybrid systems that blend vector search with BM25-style keyword retrieval and keep updates flowing continuously.
-
The interview draws a clean line between archival retrieval and code retrieval. Turbopuffer fits large, infrequently queried corpora where storage cost matters most, but code search cares more about freshness and sparse relevance than cheap cold storage.
-
Weaviate is built more directly for this pattern. Its docs describe hybrid search as vector search plus BM25 run in parallel, with configurable weighting between the two. Keyword search is explicitly meant for exact tokens and domain-specific terms, which is closer to how codebases are actually searched.
-
Cursor's own indexing flow shows why freshness matters. It computes embeddings for each file, starts indexing automatically when a project opens, indexes new files incrementally, and also indexes merged PR history, so the retrieval layer is expected to track a codebase that changes every day.
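The incremental pattern described in the last point can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cursor's implementation: the `IncrementalIndex` class, the `toy_embed` placeholder, and the use of SHA-256 content hashes to detect changed files are all hypothetical choices made for the example.

```python
import hashlib
from typing import Callable, Dict, List


def toy_embed(text: str) -> List[float]:
    # Hypothetical placeholder: a real system would call an embedding model here.
    return [float(len(text)), float(text.count("\n"))]


class IncrementalIndex:
    """Re-embeds only files whose content hash changed since the last sync."""

    def __init__(self, embed: Callable[[str], List[float]] = toy_embed) -> None:
        self.embed = embed
        self.hashes: Dict[str, str] = {}          # path -> content hash
        self.vectors: Dict[str, List[float]] = {}  # path -> embedding

    def sync(self, files: Dict[str, str]) -> List[str]:
        """Align the index with `files` (path -> content); return re-embedded paths."""
        changed: List[str] = []
        for path, content in files.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self.hashes.get(path) != digest:
                # New or modified file: recompute its embedding.
                self.hashes[path] = digest
                self.vectors[path] = self.embed(content)
                changed.append(path)
        # Drop entries for files that no longer exist in the project.
        for path in list(self.hashes):
            if path not in files:
                del self.hashes[path]
                del self.vectors[path]
        return changed
```

Calling `sync` a second time with one edited file re-embeds only that file, which is the property that keeps a daily-changing codebase cheap to track.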
The likely path forward is that code retrieval stacks keep separating into two layers. A cheap object storage engine can still work as background candidate generation, but winning developer workflows will center on engines that combine exact token search, semantic recall, and fast incremental updates in one always-warm system.
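To make the combination of exact token search and semantic recall concrete, here is a small sketch of weighted hybrid scoring. It is an illustration only and does not reproduce any specific engine's internals: the function names, the `alpha` weight, the min-max normalization, and the hand-written vectors are assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Keep identifiers whole so exact symbol names stay searchable.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def bm25_scores(query: str, docs: Dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = (sum(len(t) for t in tokens.values()) / n_docs) or 1.0
    scores: Dict[str, float] = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in tokens.values() if term in t)
            if df == 0:
                continue  # term appears nowhere in the corpus
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (
                freq + k1 * (1 - b + b * len(toks) / avg_len))
        scores[name] = score
    return scores


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(query: str, query_vec: List[float], docs: Dict[str, str],
                doc_vecs: Dict[str, List[float]], alpha: float = 0.5) -> List[str]:
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    keyword = min_max(bm25_scores(query, docs))
    vector = min_max({n: cosine(query_vec, v) for n, v in doc_vecs.items()})
    combined = {n: alpha * vector[n] + (1 - alpha) * keyword[n] for n in docs}
    return sorted(combined, key=combined.get, reverse=True)
```

With `alpha` near 0 the ranking follows exact token matches, so a query for a function name surfaces the file that defines it. With `alpha` near 1 the ranking follows vector similarity alone. Tuning that weight per workload is the kind of control a code search engine needs to expose.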