Ex-employee at Exa on building search infrastructure for AI data pipelines
Jan-Erik Asplund
Background
We spoke with a former engineering lead at Exa who built their Websets product and the infrastructure for retrieving and serving search results. The conversation explores how they use Exa's search API to power a data pipeline that processes 5,000 daily queries yielding 50,000-100,000 results, and compares Exa with alternatives like Parallel for different use cases.
Key points via Sacra AI:
- Exa's high result volume and content extraction capabilities are critical for daily data creation pipelines that process 5,000 queries yielding 50-100k results per day with ~80% relevance rate. "We use Exa to do a lot of searches for high-quality articles about technical implementations such as Python, React, and Swift. We have a list of 5,000 prompts that we search through every single day to find new articles that surface up... The number of results returned is one big thing. Second is the amount of information we get from Exa in terms of the full content."
- While Parallel excels at agentic research and sentiment analysis with reliable summaries, Exa's comprehensive coverage and ability to return up to 10,000 paginated results per query make it indispensable for data-intensive workflows, despite occasional extraction issues with paywalled or JavaScript-heavy content. "Parallel is better at doing agentic runs. It can search for everything like the website does. It can do tasks and research a little bit better, depending on different cases. But search-wise, they are pretty much the same as other API-based solutions. They don't provide the high number of results that we can get from Exa's API."
- As an engineering lead making purchasing decisions in the "tens of thousands" range, the interviewee sees AI-native search as a distinct industry category that won't be absorbed by foundation model providers, with plans to expand Exa usage for more data creation needs. "We'll keep using Exa, that's for sure. We're going to expand our usage and do more and more scraping. We have more data scraping needs that definitely require Exa's help... AI is going to search way more than humans do because of how agentic we are now. So yes, it will be a separate industry."
Questions
- Could you tell me a bit about your company? What you do, and what your role is?
- Which of these products do you use? Exa, Parallel, or both? And roughly how long have you been using them?
- Can you walk me through the problem you were trying to solve when you first adopted Exa? What was the need and what were you doing before?
- At a high level, how central are Exa and Parallel to your workflow today? Are they core infrastructure you rely on daily or more of a nice-to-have that you use occasionally?
- Can you walk me through a specific workflow where you use Exa? Step by step, what it looks like in practice from query to final output.
- When you run those 5,000 daily queries and get results back, what happens next in the pipeline? Are engineers reviewing them? Are agents automatically qualifying them? Are you scraping full content and enriching it? Walk me through what happens after the initial search response comes back.
- Who on your team is actually interacting with Exa directly? Is it mostly engineers building and maintaining this pipeline, do analysts or other roles ever touch it? And roughly how many people are involved?
- How has your usage of Exa changed since you first adopted it? Are you running more queries now, using more features, or applying it to new use cases compared to when you started?
- When you say it's more reliable now, what specifically improved? Was it result quality, uptime and latency, fewer missed or duplicate results, better filters like published dates, or something else?
- How does Exa integrate into your broader stack? Is it called by internal services, AI agents, scheduled jobs, or is there any human-in-the-loop tooling on top of it?
- If Exa disappeared tomorrow, what would you do? How painful would it be to replace? What would you fall back on?
- You mentioned that for about half your queries, you could potentially use something like SerpApi or Parallel. What specifically makes the other half only viable with Exa? What's different about those niche queries in terms of structure, intent, or required depth?
- When you were originally evaluating solutions, what other products or approaches did you seriously consider alongside Exa? What was the shortlist at the time?
- When you compared Exa to other search APIs, what stood out?
- What specifically did Exa do better in that comparison? Was it result relevance, the semantic search capability, the number of results returned, API design, pricing, or something else?
- On search result quality specifically, how did you evaluate that? Did you run side-by-side tests across providers to measure recall or precision in some systematic way, or was it more of a qualitative review of the outputs?
- When you use an LLM to do that first-pass relevance screening, what input do you feed it from Exa? Is it just the title and snippet, or are you pulling full text and having the model judge based on the content?
- How important is full text access from Exa to your overall workflow? If you only had titles and snippets and had to scrape everything separately, how would that change things for you?
- Switching over to Parallel for a moment. What was the problem you adopted Parallel for a few months ago in your other project?
- In that audit, what have you found so far is the biggest difference between Parallel and Exa in practice?
- When you say Parallel is better at agentic runs and deeper research, can you walk me through a concrete example of a task where Parallel outperformed Exa? What was the input, and what did it produce that was meaningfully different?
- In that sentiment-style workflow with Parallel, what do you actually get back that you use? A written summary, a set of cited sources, extracted quotes, or something else?
- How do you assess whether that summary is correct and representative versus cherry-picked or hallucinated? What's your validation step?
- In practice, how often do you find issues in those Parallel summaries? Like mismatched citations, missing nuance, or wrong conclusions?
- When there are issues, what tends to be the failure mode?
- How do you decide when to use Parallel versus Exa for a given task today? What's the rule of thumb?
- When you use Exa for the data creation process, what does it do so well that keeps you using it? If you had to pick the single most important strength, what would it be?
- What's the biggest frustration or limitation you've hit with Exa so far?
- When Exa misses or only partially extracts page content, what types of pages does that usually happen on? Paywalled sites, heavy JavaScript rendering, weird HTML structures, PDFs, or something else?
- When you hit those paywalled or heavy JS pages, what does your fallback system look like? Are you using headless Chrome/Playwright-style rendering, third-party scrapers, residential proxies, or something else?
- On the economic side, can you give me a sense of what you pay for Exa today? Roughly what tier or spend level you're at?
- What drives that spend the most for you? Query volume, the full text extraction, number of results per query, or something else?
- Do you feel like Exa is priced fairly for the value it delivers at that spend level?
- What would have to change for it to feel not worth it? What kind of price increase or degradation in performance would make you reconsider?
- On data freshness and coverage, how up-to-date does Exa feel today for your needs? Are you typically finding new articles within hours, days, or weeks? What's your expectation versus what you observe?
- When you say 'good amount,' about how many truly new articles per day are you seeing across the 5,000 queries?
- For those 50-100k results per day, roughly what ends up being genuinely new and useful versus duplicates, near-duplicates, or irrelevant content?
- How are you defining 'useful' in that context? What criteria has to be true for a result to count as useful in your pipeline?
- When the LLM judges relevance, roughly what percentage of the 50,000 to 100,000 daily results pass that filter and make it into your final dataset?
- That's quite high. Do you think that 80 percent pass rate is mostly because your 5,000 queries are already very well-tuned, or because Exa's recall is broad but still fairly precise? Which is doing more of the work?
- That's interesting. Can you give me an example of one of those relatively simple queries that still yield high-precision results from Exa? Just so I can understand the level of specificity.
- When you run that kind of query in Exa, are you using any special operators or settings? Like semantic versus keyword mode, domain filters, or only the published date filter you mentioned?
- When you say you push the number of results to the maximum limit per query, what does that look like in practice? How many results are you requesting per query, and do you paginate beyond the first page?
- At that scale, do you ever run into rate limits, latency spikes, or reliability issues from Exa when paginating that deep?
- How painful are those rate limits for you? Are they a minor inconvenience you can work around?
- On Parallel's pricing and value, how does it compare in your mind to Exa? Does it feel cheap, expensive, or just different because of the agentic layer?
- Who makes the purchasing decision for tools like Exa and Parallel in your organization? Is it you, engineering leadership, or someone on the business side?
- When you justify that spend internally, what's the main value story you tell? Like time saved, better coverage, quality, or enabling a product you couldn't build otherwise?
- Have you considered building your own search and retrieval pipeline with crawling, indexing, and embeddings? If not, what's the main reason you never seriously considered it?
- Looking forward, what's on your wish list for Exa? If you could wave a magic wand and add one capability that would materially improve your workflows, what would it be?
- On the published date issue, how does it typically fail for you? Is it missing dates entirely, wrong by a few days, or totally incorrect like years off?
- How do you handle those fallback dates in your pipeline today? Do you drop the result, rescrape and parse, or override with another signal?
- How do you see your usage of Exa and Parallel evolving over the next year or two? Do you expect to expand significantly, stay roughly flat, or potentially move off one of them?
- What's the main new use case or expansion area that you expect will drive that increased usage?
- Do you think foundation model providers like OpenAI or Google will eventually offer built-in search capabilities that are good enough to replace something like Exa for your use case? How do you think about that platform risk?
- More broadly, how do you see this market evolving over the next few years? AI-native search APIs, agentic research systems, the infrastructure layer for giving models access to the web—do you think this becomes a large standalone category or gets absorbed into bigger platforms?
Interview
Could you tell me a bit about your company? What you do, and what your role is?
Exa is a search company designed for AI. We provide an information layer for all AI applications, including agents, work, and model training. I was an engineering lead for one of our products called Websets, which is a comprehensive search of the internet with AI agents doing qualification and enrichment around the results. I was also involved in building out some of our infrastructure for retrieving and serving search results, as well as the developer experience around our dashboard and documentation. I covered everything from front end to back end to infrastructure for our search products.
Which of these products do you use? Exa, Parallel, or both? And roughly how long have you been using them?
Both. I've been using Exa since I started in 2024, and I started using Parallel a few months ago for another project of mine.
Can you walk me through the problem you were trying to solve when you first adopted Exa? What was the need and what were you doing before?
We were actually using Exa internally for the Websets product at first, because we were trying to build a perfect search engine where, given an arbitrary query, you're able to find every result on the internet that qualifies and fits that query. Using Exa, we're able to execute many queries, return a lot of results, and then drill further into those results to find additional results that qualify for that query.
At a high level, how central are Exa and Parallel to your workflow today? Are they core infrastructure you rely on daily or more of a nice-to-have that you use occasionally?
They are core infrastructure that I use daily. Without search, we're not able to do any data scraping and data curation.
Can you walk me through a specific workflow where you use Exa? Step by step, what it looks like in practice from query to final output.
We use Exa to do a lot of searches for high-quality articles about technical implementations such as Python, React, and Swift. We have a list of 5,000 prompts that we search through every single day to find new articles that surface up. We run through those 5,000 queries using Exa's API and get the results. Then we match against the results we already have in our database to find new articles. We also use the published-date filter that Exa provides to narrow our searches to within the last week, and then to the last few days, so we can more efficiently narrow down the results.
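For context, a single run of one of those daily prompts could look roughly like the sketch below, using Exa's Python SDK. This is a minimal illustration, not the team's actual pipeline; the date window, result count, and exact parameter and attribute names are assumptions that should be checked against Exa's current API documentation.

```python
# Illustrative sketch of one daily Exa query run (not the interviewee's actual code).
# Assumes the exa-py SDK; verify parameter names against Exa's documentation.
from datetime import date, timedelta

from exa_py import Exa

exa = Exa(api_key="YOUR_EXA_API_KEY")  # placeholder; load from the environment in real use

def run_daily_query(query: str, days_back: int = 7, num_results: int = 100):
    """Search for recent articles matching one of the daily prompts."""
    start_date = (date.today() - timedelta(days=days_back)).isoformat()
    response = exa.search_and_contents(
        query,
        type="auto",                      # let Exa choose neural vs. keyword search
        num_results=num_results,          # the real pipeline pushes this much higher
        start_published_date=start_date,  # only articles published within the window
        text=True,                        # return full page text, not just snippets
    )
    return response.results

if __name__ == "__main__":
    for result in run_daily_query("What are the best articles to learn about React as a beginner?"):
        print(result.url, result.published_date)
```

In the pipeline described here, a call like this would be repeated across all 5,000 prompts, with the results matched against the existing database.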
When you run those 5,000 daily queries and get results back, what happens next in the pipeline? Are engineers reviewing them? Are agents automatically qualifying them? Are you scraping full content and enriching it? Walk me through what happens after the initial search response comes back.
We have agents with a few different qualifications. We start by looking at the publish date to see if articles are new to our database, which contains all the articles we've previously scraped. We also look at the full contents to see if there's a delta between the content we got from this search and the snapshots we got in previous searches. That's how we qualify the results.
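A minimal sketch of that qualification step, under the assumption that each result carries its full text and that a snapshot store is keyed by URL; the helper names below are hypothetical.

```python
# Hypothetical sketch of the new/updated/duplicate qualification described above.
import hashlib

def qualify_result(result, db):
    """Keep a result if it is new to the database or its content has changed."""
    content_hash = hashlib.sha256((result.text or "").encode("utf-8")).hexdigest()
    previous = db.get_snapshot(result.url)  # hypothetical lookup of the last snapshot

    if previous is None:
        return "new"        # never seen this URL before
    if previous["content_hash"] != content_hash:
        return "updated"    # delta versus the earlier snapshot
    return "duplicate"      # nothing new; skip it
```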
Who on your team is actually interacting with Exa directly? Is it mostly engineers building and maintaining this pipeline, do analysts or other roles ever touch it? And roughly how many people are involved?
Engineers mostly. It's primarily me and another colleague.
How has your usage of Exa changed since you first adopted it? Are you running more queries now, using more features, or applying it to new use cases compared to when you started?
We're applying it to more use cases because Exa has become more powerful and comprehensive in its coverage. We also use it more often because it's more reliable now. We're able to run more queries reliably in a cost-effective and efficient way. That's why we count on Exa more nowadays for our search data creation and run it more frequently than before.
When you say it's more reliable now, what specifically improved? Was it result quality, uptime and latency, fewer missed or duplicate results, better filters like published dates, or something else?
It's two things. The search quality has become better because the coverage is better; there are more websites indexed in the Exa search engine. The second thing is that reliability is better. Error rates have dropped significantly since we first started using it.
How does Exa integrate into your broader stack? Is it called by internal services, AI agents, scheduled jobs, or is there any human-in-the-loop tooling on top of it?
There's generally no human involvement. It's mostly just automated jobs that run every single day. We do review the articles and data that we create periodically, but not every day, while Exa is running daily.
If Exa disappeared tomorrow, what would you do? How painful would it be to replace? What would you fall back on?
We would need to choose another data provider or search engine to replace Exa. For some of our use cases, that wouldn't make sense, because we have many niche requests and queries that only Exa can serve given its embeddings-based search. But perhaps half of our queries could be served through basic search engines like SERP API providers or competitors like Parallel. The main concern is the number of results we can get from these searches. Exa provides a much more generous number of results compared to other providers. While we could run additional API requests to get more exhaustive results from alternatives, it would still be less exhaustive than what Exa can provide.
You mentioned that for about half your queries, you could potentially use something like SerpApi or Parallel. What specifically makes the other half only viable with Exa? What's different about those niche queries in terms of structure, intent, or required depth?
The query complexity is higher, and the depth we need to go is greater because of the limited results these queries get. We want to dig deeper into finding new things from those complex queries. Exa is better at searching for complex and more vague topics, like 'great articles about React' versus simply 'articles about React,' or a more specific query like 'useEffect in React.' Exa is better at dealing with vague queries and at handling queries with very few results that require deeper digging.
When you were originally evaluating solutions, what other products or approaches did you seriously consider alongside Exa? What was the shortlist at the time?
We looked at two different types of products. We looked at data brokers, like Bright Data, and we looked at other search APIs like Parallel, Tavily, etc. Data brokers couldn't meet our needs because they don't update as frequently as we wanted, and they also give you much less customization of what you want to get from the data dump. We would have to do a lot of filtering from the data dumps they provide to get to the data we want to create. So data brokers did not work for us.
When you compared Exa to other search APIs, what stood out?
For other search APIs, a lot of them do essentially the same thing. But Exa just does it better.
What specifically did Exa do better in that comparison? Was it result relevance, the semantic search capability, the number of results returned, API design, pricing, or something else?
The number of results returned is one big thing. Second is the amount of information we get from Exa in terms of the full content. And the third thing is search result quality, which is a lot better than what we've seen from the other platforms we've audited.
On search result quality specifically, how did you evaluate that? Did you run side-by-side tests across providers to measure recall or precision in some systematic way, or was it more of a qualitative review of the outputs?
It was more of a qualitative evaluation. We use AI to help with our qualitative analysis to do a first screening of relevance to the query using LLMs. We're not using any relevancy scores or formal benchmarks. So it's LLMs doing a first pass on whether the content is relevant to the query, and then we do manual quality assessment.
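The first-pass LLM screen described here could look something like the following sketch. The prompt wording, model name, and use of the OpenAI client are illustrative assumptions, not the team's actual setup.

```python
# Illustrative first-pass relevance screen with an LLM judge (assumes the openai SDK).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_relevant(query: str, full_text: str) -> bool:
    """First-pass yes/no judgment on whether a document is relevant to a query."""
    prompt = (
        f"Query: {query}\n\n"
        f"Document:\n{full_text[:8000]}\n\n"  # truncate very long pages
        "Is this document relevant to the query? Answer only YES or NO."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")
```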
When you use an LLM to do that first-pass relevance screening, what input do you feed it from Exa? Is it just the title and snippet, or are you pulling full text and having the model judge based on the content?
The full text.
How important is full text access from Exa to your overall workflow? If you only had titles and snippets and had to scrape everything separately, how would that change things for you?
Very important because we need the data from each website for our data creation process. If we don't have that, we'd need to rely on another provider or build our own browser automation tool to scrape the content, which is difficult because different websites have different content restrictions and structures.
Switching over to Parallel for a moment. What was the problem you adopted Parallel for a few months ago in your other project?
In that audit, what have you found so far is the biggest difference between Parallel and Exa in practice?
Parallel is better at doing agentic runs. It can search for everything like the website does. It can do tasks and research a little bit better, depending on different cases. But search-wise, they are pretty much the same as other API-based solutions. They don't provide the high number of results that we can get from Exa's API.
When you say Parallel is better at agentic runs and deeper research, can you walk me through a concrete example of a task where Parallel outperformed Exa? What was the input, and what did it produce that was meaningfully different?
For example, we frequently ask for people's sentiments about different products. We might ask for people's sentiments about different programming languages or frameworks for front-end development. Parallel was able to crawl and search for different results and then create a solid snippet of the sentiments around different frameworks. It did that better than Exa, even though Exa also has a research API that does similar things. Parallel just does it a little better.
In that sentiment-style workflow with Parallel, what do you actually get back that you use? A written summary, a set of cited sources, extracted quotes, or something else?
The entire summary.
How do you assess whether that summary is correct and representative versus cherry-picked or hallucinated? What's your validation step?
We look at the citations and review them. Then we run them through LLMs to determine if the citations match with the summary that was provided by Parallel.
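That citation check could be sketched in the same way as the relevance screen, asking a model whether the cited sources actually support the summary. Again, the prompt, model, and function shape are illustrative assumptions rather than the team's real validation code.

```python
# Illustrative citation-vs-summary check (assumes the openai SDK).
from openai import OpenAI

client = OpenAI()

def citations_support_summary(summary: str, cited_texts: list[str]) -> bool:
    """Ask a model whether the cited sources actually support the summary."""
    sources = "\n\n---\n\n".join(text[:4000] for text in cited_texts)
    prompt = (
        f"Summary:\n{summary}\n\n"
        f"Cited sources:\n{sources}\n\n"
        "Is every claim in the summary supported by the cited sources? "
        "Answer only YES or NO."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")
```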
In practice, how often do you find issues in those Parallel summaries? Like mismatched citations, missing nuance, or wrong conclusions?
It's very few, actually.
When there are issues, what tends to be the failure mode?
Citations being quoted incorrectly and hallucinated content.
How do you decide when to use Parallel versus Exa for a given task today? What's the rule of thumb?
For our specific use cases, we just use Exa because we need the results more than the agentic capabilities. Whenever I use Parallel, it's probably when I need a quick summary of results, a synthesis, or an AI summary of multiple results. But for the data creation process, we mostly use Exa.
When you use Exa for the data creation process, what does it do so well that keeps you using it? If you had to pick the single most important strength, what would it be?
Number of results. It can return many, many results.
What's the biggest frustration or limitation you've hit with Exa so far?
I think the content extraction quality is not as good as we want it to be because we have high requirements for the data itself since we're creating data for our clients. Sometimes it might miss content that was on the webpage or not be able to extract the entire content—it may extract parts of it or even very little of it. We run our own separate browser instances that scrape these edge cases. If Exa could provide better content or more reliable content extraction, that would dramatically reduce what we need to do.
When Exa misses or only partially extracts page content, what types of pages does that usually happen on? Paywalled sites, heavy JavaScript rendering, weird HTML structures, PDFs, or something else?
It's mostly paywalled and JavaScript-heavy content. That's what we see most often.
When you hit those paywalled or heavy JS pages, what does your fallback system look like? Are you using headless Chrome/Playwright-style rendering, third-party scrapers, residential proxies, or something else?
We're using a combination of residential proxies and headless browsers. Proxies allow us to scrape without getting blocked, and then we use browsers to deal with JavaScript content. We're also running AI to manage these headless browsers to avoid JavaScript blocks.
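A minimal sketch of that fallback path, using Playwright for headless rendering with a proxy configured at browser launch; the proxy endpoint and credentials below are placeholders, not a real provider configuration.

```python
# Sketch of a fallback scraper: headless Chromium routed through a residential proxy.
from playwright.sync_api import sync_playwright

def scrape_with_fallback(url: str) -> str:
    """Render a JavaScript-heavy page in headless Chromium and return its HTML."""
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                "server": "http://proxy.example.com:8000",  # placeholder proxy endpoint
                "username": "PROXY_USER",
                "password": "PROXY_PASS",
            },
        )
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for JS-rendered content to settle
        html = page.content()
        browser.close()
        return html
```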
On the economic side, can you give me a sense of what you pay for Exa today? Roughly what tier or spend level you're at?
We pay in the tens of thousands right now.
What drives that spend the most for you? Query volume, the full text extraction, number of results per query, or something else?
Query volume and the number of results that we need to get.
Do you feel like Exa is priced fairly for the value it delivers at that spend level?
Yes, definitely.
What would have to change for it to feel not worth it? What kind of price increase or degradation in performance would make you reconsider?
I think if it starts to return the same results over and over again and doesn't get new results or refresh its index, then I would have second thoughts about using it.
On data freshness and coverage, how up-to-date does Exa feel today for your needs? Are you typically finding new articles within hours, days, or weeks? What's your expectation versus what you observe?
We do our scrapes daily, and it seems to be pretty good. We can get a good amount of new information every day.
When you say 'good amount,' about how many truly new articles per day are you seeing across the 5,000 queries?
For 5,000 queries, we're looking at roughly 10 to 20 times that number in results. That's about 50,000 to 100,000.
For those 50-100k results per day, roughly what ends up being genuinely new and useful versus duplicates, near-duplicates, or irrelevant content?
They are useful.
How are you defining 'useful' in that context? What criteria has to be true for a result to count as useful in your pipeline?
They have to be new compared to our previously scraped data, and they have to be relevant to the query, which is determined by the LLM judge.
When the LLM judges relevance, roughly what percentage of the 50,000 to 100,000 daily results pass that filter and make it into your final dataset?
I haven't looked into the data yet, but it's more like 80 percent.
That's quite high. Do you think that 80 percent pass rate is mostly because your 5,000 queries are already very well-tuned, or because Exa's recall is broad but still fairly precise? Which is doing more of the work?
I think Exa is pretty precise. The queries themselves are very simplistic; they're not really engineered.
That's interesting. Can you give me an example of one of those relatively simple queries that still yield high-precision results from Exa? Just so I can understand the level of specificity.
One query that keeps giving good results is 'What are the best articles to learn about React as a beginner?' That one always works.
When you run that kind of query in Exa, are you using any special operators or settings? Like semantic versus keyword mode, domain filters, or only the published date filter you mentioned?
Published date filter mostly. I set it to auto. We also push the number of results to the maximum limit.
When you say you push the number of results to the maximum limit per query, what does that look like in practice? How many results are you requesting per query, and do you paginate beyond the first page?
We find that requesting about 10,000 results gets us a good enough number. And we do look into every page.
At that scale, do you ever run into rate limits, latency spikes, or reliability issues from Exa when paginating that deep?
We do run into rate limits, but we implement progressive timeouts for that.
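The "progressive timeouts" approach described here is essentially exponential backoff on rate-limit errors. A minimal sketch follows; the exception handling and retry limits are illustrative assumptions, since the actual error type depends on the client being used.

```python
# Illustrative exponential backoff for rate-limited requests.
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a request with progressively longer waits when it is rate-limited."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice, catch the SDK's specific rate-limit error
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s, ...
```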
How painful are those rate limits for you? Are they a minor inconvenience you can work around?
Not really painful. They're minor inconveniences that we can engineer solutions to avoid.
On Parallel's pricing and value, how does it compare in your mind to Exa? Does it feel cheap, expensive, or just different because of the agentic layer?
I think the pricing is basically the same as Exa, but the agentic part of it is different. Parallel prices differently on the agentic side. You put in more money for more compute, which is different from how Exa prices. Exa prices in terms of the sheer number of results. So in a way, Parallel is a little bit more expensive because the agentic part requires more money to run.
Who makes the purchasing decision for tools like Exa and Parallel in your organization? Is it you, engineering leadership, or someone on the business side?
I make those decisions because I'm the engineering lead.
When you justify that spend internally, what's the main value story you tell? Like time saved, better coverage, quality, or enabling a product you couldn't build otherwise?
It's simply that we cannot build a reliable data creation process without having search, and we're not building search ourselves. So we need a search provider for this.
Have you considered building your own search and retrieval pipeline with crawling, indexing, and embeddings? If not, what's the main reason you never seriously considered it?
No, we never considered it. We cannot build a solution without constantly needing to reindex. The ongoing maintenance would be huge.
Looking forward, what's on your wish list for Exa? If you could wave a magic wand and add one capability that would materially improve your workflows, what would it be?
Having more consistent content extraction would be better. I know they have a monitoring system I haven't tried out yet. Having a more robust workflow for qualifying each result against the dataset we've already created would be helpful, just so we can get the newest content. I also notice that published dates sometimes are not really robust. There are websites we get from Exa that have weird or wrong published dates. So extracting the right published date would be great as well.
On the published date issue, how does it typically fail for you? Is it missing dates entirely, wrong by a few days, or totally incorrect like years off?
It's just wrong. It defaults back to the first of January of this year, which is incorrect.
How do you handle those fallback dates in your pipeline today? Do you drop the result, rescrape and parse, or override with another signal?
We have fallbacks for whether the content has been scraped or not in our system. We handle that, so it's fine.
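One simple way to implement the kind of fallback described here is to treat a missing date, or the known bad default of January 1 of the current year, as untrustworthy and defer to the pipeline's own scrape signal. A small sketch, assuming ISO-formatted date strings:

```python
# Sketch of flagging the suspicious default published date described above.
from datetime import date, datetime

def published_date_is_trustworthy(published: str | None) -> bool:
    """Treat a missing date, or the January 1 default, as untrustworthy."""
    if not published:
        return False
    parsed = datetime.fromisoformat(published.replace("Z", "+00:00")).date()
    known_bad_default = date(date.today().year, 1, 1)
    return parsed != known_bad_default
```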
How do you see your usage of Exa and Parallel evolving over the next year or two? Do you expect to expand significantly, stay roughly flat, or potentially move off one of them?
We'll keep using Exa, that's for sure. We're going to expand our usage and do more and more scraping. We have more data scraping needs that definitely require Exa's help.
What's the main new use case or expansion area that you expect will drive that increased usage?
It's just the different data creation needs that we have. Each data creation process we need to do has its own set of queries that we need to run.
Do you think foundation model providers like OpenAI or Google will eventually offer built-in search capabilities that are good enough to replace something like Exa for your use case? How do you think about that platform risk?
AI companies might not provide as much customizability and raw information because most will default to generating content rather than giving you raw results. We need those raw results. And Google would likely not do it because that would jeopardize their ad business, as the AI would be doing searches without users looking at any of the links. So I don't think Google would do it.
More broadly, how do you see this market evolving over the next few years? AI-native search APIs, agentic research systems, the infrastructure layer for giving models access to the web—do you think this becomes a large standalone category or gets absorbed into bigger platforms?
It looks like it's a separate industry, given how the industry has evolved from when I first started in 2024 until now. I'm pretty confident this will be its own category of businesses. AI is going to search way more than humans do because of how agentic we are now. So yes, it will be a separate industry.
Disclaimers
This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.

