TurboPuffer enables instant filtered search

AI engineer at Indeed on TurboPuffer vs. Vespa vs. Elasticsearch at scale

Interview
TurboPuffer vectors are incrementally indexed—with something like an LSM-fresh vector index—which supports filtering plus immediate visibility of writes in search results.

TurboPuffer is trying to solve the hardest tradeoff in vector search: making fresh writes searchable without giving up cheap storage. Teams can stream in a new vector, attach metadata filters such as tenant or document type, and have that record show up in search immediately instead of waiting for a full rebuild. That makes it usable for live agent workflows while still keeping most of the corpus in lower-cost object storage.
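To make "incrementally indexed with immediate visibility" concrete, here is a minimal sketch of an LSM-style layout in Python. It is not TurboPuffer's implementation or API: `Record`, `FreshIndex`, and `flush_at` are hypothetical names, and brute-force cosine scoring stands in for a real approximate nearest neighbor index. The only point it illustrates is that a search which reads the mutable write buffer together with the flushed segments sees a new record as soon as it is written, and that metadata filters apply to both.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    vector: list[float]
    attrs: dict[str, str]  # metadata used for filtering, e.g. tenant


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class FreshIndex:
    """Toy LSM-style index: one mutable buffer plus immutable flushed segments."""

    flush_at: int = 2  # hypothetical flush threshold, chosen only for the demo
    buffer: dict[str, Record] = field(default_factory=dict)
    segments: list[dict[str, Record]] = field(default_factory=list)

    def upsert(self, rec: Record) -> None:
        # A write lands in the buffer; no existing segment is rebuilt.
        self.buffer[rec.id] = rec
        if len(self.buffer) >= self.flush_at:
            self.segments.append(self.buffer)
            self.buffer = {}

    def search(self, query: list[float], filters: dict[str, str], k: int = 3) -> list[str]:
        seen: set[str] = set()
        hits: list[tuple[float, str]] = []
        # Scan newest data first so the latest version of an id shadows older ones.
        for layer in [self.buffer, *reversed(self.segments)]:
            for rec in layer.values():
                if rec.id in seen:
                    continue
                seen.add(rec.id)
                if all(rec.attrs.get(key) == val for key, val in filters.items()):
                    hits.append((cosine(query, rec.vector), rec.id))
        hits.sort(reverse=True)
        return [rec_id for _, rec_id in hits[:k]]


index = FreshIndex(flush_at=2)
index.upsert(Record("a", [1.0, 0.0], {"tenant": "t1"}))
index.upsert(Record("b", [0.0, 1.0], {"tenant": "t2"}))  # second write triggers a flush
index.upsert(Record("c", [0.9, 0.1], {"tenant": "t1"}))  # lives only in the buffer
results = index.search([1.0, 0.0], {"tenant": "t1"})
print(results)  # "c" is returned even though it has not been flushed
```

In this toy version the cost of freshness is that every query has to consult the buffer as well as the segments. A production system would bound that cost with background compaction, which the sketch leaves out.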

  • At Indeed, updates are handled with a hybrid model: batch writes for bulk refreshes, plus real-time indexing for records that need to become searchable right away. The team said this supports filtering and immediate write visibility, which is the practical definition of freshness in production search.
  • This is a real advantage over systems that are painful to reindex quickly. Another engineer working at large scale contrasted TurboPuffer with Elasticsearch, which they described as particularly bad at fast reindexing, and said update behavior and cache consistency are often where production migrations get messy.
  • The tradeoff is that immediate visibility does not remove cold-start behavior. TurboPuffer can expose fresh results quickly, but if the relevant partition is cold, latency can still spike while data is fetched from object storage. That is why users separate freshness from tail latency when evaluating it against Vespa or other always-on systems.
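The distinction between freshness and tail latency in the last point can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of TurboPuffer internals: `TieredStore` is a hypothetical class, the cache is a plain LRU over partition names, and the `warm_ms` and `cold_ms` values are arbitrary placeholders rather than measured numbers.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: an LRU of warm partitions in front of slow object storage."""

    def __init__(self, cache_size: int, warm_ms: float = 1.0, cold_ms: float = 200.0):
        self.cache_size = cache_size
        self.warm_ms = warm_ms    # placeholder latency for a cached partition
        self.cold_ms = cold_ms    # placeholder latency for an object storage fetch
        self.data: dict[str, list[str]] = {}          # stands in for object storage
        self.warm: OrderedDict[str, bool] = OrderedDict()  # which partitions are cached

    def write(self, partition: str, doc_id: str) -> None:
        # The write is visible to the next query, but it does not warm the cache.
        self.data.setdefault(partition, []).append(doc_id)

    def query(self, partition: str) -> tuple[list[str], float]:
        if partition in self.warm:
            self.warm.move_to_end(partition)
            latency = self.warm_ms
        else:
            latency = self.cold_ms
            self.warm[partition] = True
            if len(self.warm) > self.cache_size:
                self.warm.popitem(last=False)  # evict the least recently used partition
        return list(self.data.get(partition, [])), latency


store = TieredStore(cache_size=1)
store.write("tenant-a", "doc-1")
docs_cold, first_ms = store.query("tenant-a")    # fresh result, cold latency
store.write("tenant-a", "doc-2")
docs_warm, second_ms = store.query("tenant-a")   # fresh result, warm latency
store.query("tenant-b")                          # evicts tenant-a from the cache
docs_again, third_ms = store.query("tenant-a")   # still fresh, cold again
print(first_ms, second_ms, third_ms)
```

Every query in the sketch returns the latest writes, yet latency depends only on whether the partition happens to be warm. That is the sense in which freshness and tail latency are separate evaluation axes.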

The direction is toward search systems that behave more like databases, where writes become queryable right away while storage is still tiered aggressively in the background. If TurboPuffer keeps improving freshness, filtering, and predictable tail latency together, it moves from a cheap retrieval layer toward a more complete production serving system for large-scale AI search.