Exa's Curated AI Search Index
Will Bryk, CEO of Exa, on building search for AI agents:

"We try to crawl the highest quality subset that we can."
This is a product strategy about precision, not just an infrastructure cost decision. Exa is betting that an AI search engine gets better by refusing to index huge amounts of low-value web clutter, because bad pages do not just waste compute; they actively poison retrieval. That makes the index feel less like a mirror of the whole public web and more like a curated working set of useful pages, people, companies, and papers.
The practical reason is that agent search is far more sensitive to junk than consumer search: when an LLM retrieves spammy SEO pages, answer quality drops with them. Exa describes its system as targeting high-quality pages, and even Google has spent heavily on spam policy updates to suppress scaled low-quality content.
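As a concrete illustration of filtering at ingest rather than ranking at query time, here is a minimal Python sketch. The Page fields, the quality_score heuristics, and the 0.8 threshold are all invented for illustration, not Exa's pipeline; the point is that a page rejected at crawl time can never surface in retrieval.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str
    inbound_links: int

def quality_score(page: Page) -> float:
    """Toy proxy for information density and spam signals (illustrative only)."""
    words = page.text.split()
    unique_ratio = len(set(words)) / max(len(words), 1)
    score = unique_ratio                        # repetitive SEO boilerplate scores low
    score += min(page.inbound_links, 50) / 50   # cap link signal so link farms cannot dominate
    if "buy now" in page.text.lower():          # crude spam marker, illustration only
        score -= 0.5
    return score

def should_index(page: Page, threshold: float = 0.8) -> bool:
    # Reject at crawl time: cheaper than ranking junk away at query time,
    # and it keeps bad pages out of the retrieval pool entirely.
    return quality_score(page) >= threshold
```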
This also shows how Exa differs from Google-style search. Google can afford broad coverage, then rank and filter on top of it. Exa runs embeddings over a smaller but more intentional corpus, because its goal is to return the exact set of matching entities or documents for a complex query, not just a page of likely links.
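As a rough sketch of that retrieval shape, the snippet below runs a similarity threshold over a tiny curated corpus. The embed() function is a hypothetical placeholder for a real embedding model, and the corpus, URLs, and 0.3 threshold are invented; the point is that thresholded matching returns the full set of documents that qualify, where top-k ranking would return a fixed number of likely ones.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder: stands in for a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

# Tiny invented corpus standing in for a curated index.
corpus = {
    "https://example.com/exa": "Exa: embedding-based search for AI agents",
    "https://example.com/paper": "Neural retrieval over curated web corpora",
}
doc_vectors = {url: embed(text) for url, text in corpus.items()}

def matching_set(query: str, threshold: float = 0.3) -> list[str]:
    q = embed(query)
    # Cosine similarity is a dot product on unit vectors. Returning every
    # document above the threshold yields a set of matches, not a top-k
    # page of likely links.
    return [url for url, v in doc_vectors.items() if float(q @ v) >= threshold]
```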
There is a roadmap embedded in the comment. Exa says comprehensiveness still matters over time, but the startup move is to begin with the open web pages that carry the most information density, then improve quality scoring and retrieval so the index can widen without flooding results with garbage.
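One plausible mechanism for widening coverage without flooding results, sketched here as an assumption rather than Exa's actual ranking: store a per-document quality prior at ingest and blend it with query-time similarity, so newly admitted lower-density pages only surface when nothing better matches. The alpha weight and the 0-to-1 scales are invented for illustration.

```python
def blended_score(similarity: float, quality: float, alpha: float = 0.7) -> float:
    """Both inputs assumed in [0, 1]; alpha weights relevance over quality."""
    return alpha * similarity + (1 - alpha) * quality

def rank(results: list[tuple[str, float, float]]) -> list[str]:
    # results: (url, similarity, quality) triples from the retrieval layer.
    # A fixed quality prior means adding millions of mediocre pages to the
    # index shifts rankings only where relevance alone cannot decide.
    ordered = sorted(results, key=lambda r: blended_score(r[1], r[2]), reverse=True)
    return [url for url, _, _ in ordered]
```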
Over the next few years, the winners in AI search will likely be the companies that can expand coverage without losing taste. As more of the web gets polluted by scaled content and more valuable sources sit behind access controls, quality selection and quality ranking become the core moat, not simple crawl volume.