Stuart Kearney, co-founder of Vetted, on AI agents in shopping

Jan-Erik Asplund

Background

OpenAI’s new shopping features, announced on April 28th, follow Perplexity’s November 2024 launch of its own embedded commerce experience, as both compete to own the shopping research layer.

We reached out to Stuart Kearney, co-founder of Vetted ($14M Series A, Insight Partners), an AI-powered shopping assistant, about how AI models are reshaping ecommerce.

Questions

  1. Tell us about the journey from Slant to Lustre to Vetted, what you learned at each stage and what inspired you to build & focus on Vetted?
  2. What is Vetted in short?
  3. Tell us about the categories of products that people use Vetted for and how it breaks down. Are there a handful of major categories? What does the long tail look like?
  4. How do you think about the shopper’s journey from awareness, research to checkout? How do you think about each step and where it’s important for Vetted to play and reduce friction?
  5. You decided to build Vetted as a horizontal shopping platform, versus a vertical shopping assistant around travel, automotive, or some specific category. How did you think about the tradeoffs and why did you choose to go horizontal?
  6. Online reviews have been gamed to the point of being unhelpful, which is why so many people use Reddit for product reviews and recommendations. How do you think about Reddit as core data infrastructure for products like Vetted and what do you see as the future of truthful, helpful & human online reviews & recommendations given the current incentive structures that are in place?
  7. OpenAI recently announced shopping features for ChatGPT. What do you think OpenAI can learn from Google’s lack of success in shopping via Google Shopping?
  8. Perplexity has a shopping product that has embedded product listings and one-click checkout. How do you think about the tradeoffs of surfacing products directly in answers versus routing users to external merchants?
  9. As one of the big early players/successes in the space, I'm really curious to hear your take on what Honey got right and maybe where they were more limited.
  10. Do you have a sense of how often people ask a question on Vetted and it's one-and-done versus how often people are having a real conversation with the bot?
  11. Amazon has its shopping assistant Rufus AI. Shopify’s Shop App is building AI-powered recommendations into its marketplace experience. How do you see AI shopping assistant experiences in marketplaces playing out?
  12. What role does a trusted reviewer & publisher like Wirecutter have in the world of AI? Does its human powered reviews & recommendations make it higher signal in a world of AI slop, does it become a data provider like a Reddit to companies like Vetted and OpenAI or something else?
  13. Is there a fair amount of manual or semi-manual ranking involved to set which sources should be more reliable than others?
  14. What do you make of agentic commerce and where are the opportunities and what are the current limitations in building agentic shopping experiences?
  15. When you say there are a lot of models in between search and checkout that are vulnerable, are you thinking about all these e-commerce site optimization tools designed to get people to click or buy specific things? Is that what you're thinking about, or is it a broader scope?
  16. Booking travel is one of the only parts of shopping I enjoy—looking at these places and thinking about them and imagining them.
  17. How do you see brands and merchants adjusting to this? We've already seen publishers thinking about AI SEO content, but how do you think brands and merchants will adjust more broadly?
  18. If everything goes right for Vetted in 5 years, what does it become and how is the world changed?

Interview

Tell us about the journey from Slant to Lustre to Vetted, what you learned at each stage and what inspired you to build & focus on Vetted?

All of them effectively had the same North Star. We've always wanted to be the best place on the Internet to get recommendations for what to buy: a trustworthy consumer platform. Slant was a Q&A community approach to solving that problem. It was kind of like a mix between Quora and Wikipedia. We scaled that to millions of people a month using it. But manually curating a knowledge graph of all the products on the Internet did not scale. Dealing with SEO as a traffic source was challenging as well.

Lessons learned from Slant: we took a lot of that data and started training our neural nets. We asked, can we scale what the community was doing by automating it? So we built our own product knowledge graphs and search tech from scratch and that was Lustre.

With Lustre, we learned just how difficult handling long-tail commerce really was. The NLP tech at the time just failed to enable a product reliable enough to answer all these questions.

Once LLMs came along, everything changed. What they unlocked was the ability to structure unstructured data. We've always wanted to look at YouTube transcripts and Reddit comments, but until LLMs came along, you couldn't really use them. Named entity recognition was terrible for extracting products being discussed, and even sentiment analysis was really unreliable.

So same North Star, but three very different technologies enabling that experience. LLMs have unlocked the ability to actually build a reliable, high-quality product recommendation UX for the first time. I feel quite blessed that Vetted is built on top of that.

What is Vetted in short?

For us, we want Vetted to be the most simple and trustworthy way of knowing what to buy. For me, I'm always the person my family goes to for purchasing advice. A lot of nerds grew up like this—you get a lot of DMs asking for recommendations.

The inspiration was: what if everyone had access to a nerdy, intelligent friend who's read the entire Internet? And you could talk to that person and get that expertise channeled back to you, personalized just for you, from somebody who's effectively read everything.

Tell us about the categories of products that people use Vetted for and how it breaks down. Are there a handful of major categories? What does the long tail look like?

Our target customer is the head of household with a couple of kids. The AOV is around $40—that's really where we're targeting. We're solving for Larry Page’s “toothbrush test”—how do you build a platform that people would use frequently enough to build a relationship with?

That's why we didn't focus on expensive one-off purchases like TVs, which our previous products did. You can't really build a relationship that way unless you're building an SEO-based business model. We're looking for things people buy multiple times. Skincare is good for us. Buying the best bath towels—things you wouldn't really think about doing a lot of research on. But if you want the best $40 bath towels, there are great ones out there. Vetted is a really great way of enhancing everything you buy to be high quality for the money.

In traditional search technology, most people's queries are quite short—just "moisturizer." Then they'll go hunting for reviews to see the one that solved their problem. We now see longer queries like, "I have dry skin. This made me break out in the past. What's the best one for me?" These longer queries create an amazing user experience, but would have been impossible to handle previously. You'd need to scale your NLP to solve these challenges. We've seen a lot of that behavior, which is why there's no one category—it's incredibly long-tail and distributed.

How do you think about the shopper’s journey from awareness, research to checkout? How do you think about each step and where it’s important for Vetted to play and reduce friction?

For us, research is the messiest place today in the journey. It has the most overwhelming choice and is where all the ads are. That's the hard problem to solve. Once you've solved that, you build trust with the consumer, and from there, it's easier to go up and down the funnel.

It's very hard to build a checkout product and go upstream from that because you're dealing with very different problem sets. Upstream, you're not just dealing with technology—it's consumer psychology, how you lay out information, how you help users go through the decision-making process, and how many products to recommend. These are nuanced problems. But if you can nail that, you can earn the right to be the default shopping home and then move up-funnel to discovery and recommendations. Long-term, you can move down-funnel toward checkout itself.

You decided to build Vetted as a horizontal shopping platform, versus a vertical shopping assistant around travel, automotive, or some specific category. How did you think about the tradeoffs and why did you choose to go horizontal?

Depending on your viewpoint, we are a verticalized play. If you're Perplexity or ChatGPT, shopping is just one of the verticals they're tackling. And within shopping, there are tons of more verticalized plays.

From our perspective, we're looking for a vertical you can take on that earns the right to be on someone's home screen for direct use. Because SEO is a hard game these days. Having a business model saying "I'm going to be the best TV recommendation platform on the Internet" is really hard when Google is trying to own that themselves.

Shopping is the only vertical in the history of search that has ever been siphoned away from Google—Amazon has managed to own shopping search. Shopping is deep enough, interesting enough, and frequent enough to warrant its own platform. That's why we sliced that off and focused there.

We want to be that default home for all those behaviors. Beyond just buying bath towels, many of our users ask us what game to play or book to read—any sort of purchasing decisions. We want to own that "what should I buy" habit in our users' minds.

Online reviews have been gamed to the point of being unhelpful, which is why so many people use Reddit for product reviews and recommendations. How do you think about Reddit as core data infrastructure for products like Vetted and what do you see as the future of truthful, helpful & human online reviews & recommendations given the current incentive structures that are in place?

I view Reddit, alongside YouTube, as the last bastion of authentic knowledge on the web. Everything else has gone away. They are an incredibly important component of how the web works today. Reddit's data licensing deal with OpenAI shows that. GPT-2 was basically trained just by looking at the links from Reddit, making it the first version of these models to use Reddit as a signal of quality on the web.

They're incredibly important and will become more important. What I'm hoping is AI systems will help them fight spam, similar to how Google managed to stay one step ahead in the cat-and-mouse spam game. Reddit and YouTube should be able to develop good adversarial spam systems to help keep high-quality conversations alive. They'll find themselves more and more important as this plays out.

OpenAI recently announced shopping features for ChatGPT. What do you think OpenAI can learn from Google’s lack of success in shopping via Google Shopping?

We have a thesis around rethinking shopping from the ground up with an AI edge, which is what we're building at Vetted. OpenAI has made a good product with really fast execution. They'll run into similar challenges as Google, in that shopping is this unique vertical with its own set of dynamics. They'll have a broad focus and have done a good job across queries like Google did, but I think there's an opportunity for a more focused play. There's a lot of nuance in that journey that's hard to get right—the difference between a fashion search versus a commodity search, and the full journey across them.

They're in a good position, obviously becoming the new default starting point for queries. Shopping is a natural behavior there. But I think it also warrants its own standalone approach.

What happened with Google Shopping is interesting. The product has gotten better recently—they've tightened it up a lot. Why did Amazon beat Google in shopping? It's not just that shopping is complicated with its niche, but also the reliability and consistency of the user experience. You open Amazon and know you'll get two-day shipping, good returns, and you know where the user reviews are. It's consistent and expected—you can go there, get it done, and leave.

On Google, you get random merchants from places like mikeshardwareshop.com you've never heard of before. You have to figure all that stuff out. The layout changes all the time. It's confusing.

OpenAI will have a similar challenge because their ambitions are broader than just shopping. So it'll be hard to keep a consistent, deep user experience, and LLMs are naturally non-deterministic. Building a consistent, reliable shopping experience for the whole journey is a hard problem while also trying to build AGI.

Perplexity has a shopping product that has embedded product listings and one-click checkout. How do you think about the tradeoffs of surfacing products directly in answers versus routing users to external merchants?

It's the right solution for the future. Native commerce has been talked about for a long time, and people have tried to make it happen. The challenge is that the bar for execution is incredibly high. It's very difficult to keep your stock status synced, pricing data synced, return data synced—to move all that merchant data onto your experience seamlessly and reliably. Really hard to do. As soon as you start making mistakes—because your vision is for the product to show up at the doorstep the next day—you lose user trust quickly.

Every time the wrong product gets shipped, users lose confidence. To get through those technical complexities, they start making the user experience worse. Perplexity doesn't even give good price comparison across merchants for shopping questions and they no longer support Amazon. They're trying to simplify the space to make one-click checkout work with today's approaches, but it makes the UX worse.

For us, I think it's the right vision. We'd like to get there as well. But for now, I'd rather send really high-intent traffic to a merchant who already knows exactly what SKU the customer wants to buy. They can double-check everything, verify shipping, and make the purchase themselves.

Stripe just announced their Order API yesterday. That looks like a pretty good solution. People have been trying to do this for a long time. Honey bought a company called Two Tap back in the day to do automatic checkout technology. It's been a North Star for many people, but it's an incredibly nuanced and difficult thing to get right.

For example, your system may think an order went through, but an anti-fraud system detected it was an automated bot and rejected it in the back end without telling you. You've told your user they bought that J.Crew order, but it never actually went through. There are many nuances, which is why we think it's safer to build a UX that sends users to the merchant today and then slowly move that functionality in-house.

As one of the big early players/successes in the space, I'm really curious to hear your take on what Honey got right and maybe where they were more limited.

Honey had the right team, right time, and right product. They grew at a time when affiliate rates were strong and it was cheap to buy traffic. They were basically printing money immediately and had a really good growth loop.

As I alluded to earlier, they struggled to go up-funnel. When they released their iOS app, the vision was to build something like Vetted—open an app, do searches, buy products, have them checked out, have your coupons all in one place. Good team, good vision, but the technology just wasn't there at the time. The search didn't quite work well enough.

When you're going up-funnel from checkout, so many things need to go right. One little weak link in the chain—whether search being wrong or merchant data being out of sync—and the whole thing breaks. They executed well with their vision, but were too early on the tech side.

Do you have a sense of how often people ask a question on Vetted and it's one-and-done versus how often people are having a real conversation with the bot?

There's a lot of multi-turn conversation. People who become more familiar with Vetted will use longer queries upfront to get to a good outcome in one shot. But many people are used to searching like they're searching Amazon—they come in and search for "electric scooter."

Then Vetted will ask, "Well, what do you want to do with it? Are you commuting? What's your budget?" So Vetted goes back and forth, refines the query, and moves the user through that journey. We see a lot of multi-turn conversations. People ask how these two products compare against each other or about the price history of an item. Multi-turn is interesting, and we're seeing a lot of behavior move in that direction.

Amazon has its shopping assistant Rufus AI. Shopify’s Shop App is building AI-powered recommendations into its marketplace experience. How do you see AI shopping assistant experiences in marketplaces playing out?

I've had some hit-and-miss experiences with Rufus. Amazon recently announced they've had some good success with it, but my personal experience has been pretty hit-and-miss.

Long-term, I don't see these being two competitive ways of using Amazon—the main search bar versus talking to an assistant in an intercom thing in the bottom corner. I think eventually these experiences will be unified. Traditional search technologies for navigation queries will always be superior—it's faster and easier when I know what I want. Help me navigate to it and let me go, which is a lot of the site traffic.

They'll probably do some kind of query analysis—"Hey, this is an informational or research query"—and then embed that into the same experience you're seeing today.
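As a rough illustration of the query analysis described here, the sketch below routes a query to either a navigational or a research experience. The cue words, the length threshold, and the function name `classify_query` are all hypothetical and chosen for illustration; a production system would more likely use a trained classifier than hand-written rules.

```python
# Hypothetical cue words that hint at a research-style query.
RESEARCH_WORDS = {"best", "vs", "compare", "which", "what", "how", "should", "recommend"}


def classify_query(query: str) -> str:
    """Label a shopping query as 'research' or 'navigational' (toy heuristic)."""
    # Split question marks off as their own tokens so they are easy to detect.
    words = query.lower().replace("?", " ? ").split()
    looks_like_question = "?" in words
    has_research_cue = bool(RESEARCH_WORDS & set(words))
    is_long = len(words) > 6  # assumed threshold: long queries tend to be research
    if looks_like_question or has_research_cue or is_long:
        return "research"
    return "navigational"


for q in ("airpods pro 2", "I have dry skin. What's the best one for me?"):
    print(q, "->", classify_query(q))
```

Under these assumptions, a short product-name query stays on the fast traditional search path, while a longer question is handed to the assistant-style experience.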

The challenge for Amazon is they have the most well-oiled search machine in commerce ever built. It's very hard to mess with that and try new stuff. So they're keeping it separate for now to give it time to catch up, but I see them combining it back together eventually.

Another challenge when you're Amazon or Google is you have so much product UX tech debt to deal with—how people expect the platform to work. So you have to merge these two worlds together, and like with Google, they now have AI tabs and AI overviews all over the place.

If you go to ChatGPT or even Perplexity to some degree, it's a relatively consistent user experience. That's the challenge when you have this huge money-making search infrastructure—how to move it in this new direction. It's obviously the future, but it probably performs worse for most queries today than their existing infrastructure.

What role does a trusted reviewer & publisher like Wirecutter have in the world of AI? Does its human powered reviews & recommendations make it higher signal in a world of AI slop, does it become a data provider like a Reddit to companies like Vetted and OpenAI or something else?

There are only a handful of them that have the brand and earned trust: Wirecutter, RTINGS.com, Consumer Reports, and maybe four or five others that actually take the time to buy multiple products and really test them. They've earned the right to direct traffic, and I think they'll continue having that. As AI slop happens and people start seeing questionable overall quality, that brand signal becomes more valuable.

But I see their incidental SEO traffic going away. The scenario where someone doesn't know Wirecutter but ends up on their page because it's the first result on Google—that doesn't really happen anymore. They'll need to move to some sort of licensing model, like Reddit's approach, where their data gets incorporated into other platforms.

I also think AI will get better at identifying the right creators. At the end of the day, if you're buying a new coffee machine, your recommendations should be powered by James Hoffmann, not a random Reddit thread or even Wirecutter. He's the expert on what coffee machine you should buy. As these systems get better at routing to the right data sources, Wirecutter and good creators will see their traffic routed correctly to them.

Is there a fair amount of manual or semi-manual ranking involved to set which sources should be more reliable than others?

I wouldn't go as far as saying there's manual ranking. There's a lot of our experience being expressed in the prompts and system architecture. In modern systems, if you're doing it right, there are many evaluations for determining if an answer is good or how it should work. We have evaluations to determine what kind of question should be routed where, and if there's an issue, we've probably made a mistake somewhere.

This is what I was alluding to earlier—shopping is a difficult niche. We really want to get this right and earn consumer trust. It's easy to get something working, but ensuring the exact right creator is recommended for the exact right category at the exact right time takes a lot of work—that last 20%, which is 80% of the effort.

It's less about manual data curation and more about extensive evaluations, plus a decade of obsession in this space expressed in how the system works. When you ask Vetted a question, we actually make over a hundred LLM calls. It's not just one model spitting out an answer. Each of those steps encodes how we think people should research products and consider what's relevant. When you pull them all together, that's how you get high-quality outputs.

What do you make of agentic commerce and where are the opportunities and what are the current limitations in building agentic shopping experiences?

It's going to be incredibly disruptive. If you buy this vision of coming in, asking something, and having an agent that actually checks out for you, there are a lot of existing business models at risk of being disintermediated—being quite literally bypassed by this agentic system. I think it's the future of how this will work.

That said, there are companies today in a position of having great data on people, owning parts of the checkout—PayPal, Stripe, Visa, Mastercard. They've all announced their own agentic shopping initiatives in the last few weeks. The future is probably one of them figuring out how to build an end-to-end working solution for their customers.

If you're PayPal, you understand my purchasing history and you've solved a lot of the checkout problems that are really hard for someone new to address. There's a chance they could build something quite special. It would benefit not only them and their business model but also their merchants who can now get routed additional traffic.

It's hard to be successful with just a slice of the shopping experience at this point. You either want to nail that OpenAI integration and relationship long-term, or you want to make sure you own the relationship with the end consumer.

When you say there are a lot of models in between search and checkout that are vulnerable, are you thinking about all these e-commerce site optimization tools designed to get people to click or buy specific things? Is that what you're thinking about, or is it a broader scope?

There's that, and there are ads on various platforms along the way that agents won't be reading anymore. But I also think e-commerce site optimizers still have a role.

A big limitation of agentic commerce today is this idea that you can say "Go book me a flight to Thailand next week" and it just goes off in one shot. I think that's folly. It won't happen for a while because there are so many micro-decisions being made on that journey today that no agent can personalize.

Take buying coat hangers. You go to Amazon, type "coat hanger," and get options for a 1-pack, 30-pack, 10-pack, plastic, wooden, fabric, two-day shipping, etc. It takes milliseconds to load that page and another second to look at them and decide "I want the 10-pack of wooden ones." There's no personalization data that ChatGPT has about me to know I want wooden ones, not plastic, and I want them in bulk.

So the idea of me asking ChatGPT for coat hangers and having it one-shot the purchase is silly. Instead, you need the real-time ability to construct an interface—a more traditional one—to solve the problem in front of the consumer. If you're looking for coat hangers, it renders five different types, you pick one, and then it proceeds. I think you'll see new industries and optimizations inside that flow. People underestimate how many little things current UIs solve quickly and efficiently.

Booking travel is one of the only parts of shopping I enjoy—looking at these places and thinking about them and imagining them.

There's that, and also you don't know what you don't know until you see the options available. You may get inspired and realize you want to go to a different part of the country, or you discover a hotel you didn't know about before. You don't want that read out to you in a Siri message. Traditional interfaces are still superior in various ways. I think you'll see a blend between the two approaches.

How do you see brands and merchants adjusting to this? We've already seen publishers thinking about AI SEO content, but how do you think brands and merchants will adjust more broadly?

It's super early days, so it's hard to know exactly. In general, the philosophy will be "I have a new consumer reading my content and I need to write for them as well." On the merchant side, that may mean ensuring that when an agent visits my product page, it can validate that my product is a good fit for my target consumer and has all the right information—delivery speed, return rate, etc.—quantifiable and exposed to the agent.

We've had frameworks like schema.org for merchants to put metadata about their products, but I know from experience it's very poorly filled out and often out of sync. It's an afterthought for many people today. Making sure that data is well-organized will become more important to ensure pages are effectively machine-readable.
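To make the point about poorly filled-out metadata concrete, here is a minimal sketch that checks a schema.org `Product` JSON-LD blob for fields an agent might want. The property names (`sku`, `offers`, `availability`, `shippingDetails`, `hasMerchantReturnPolicy`) are real schema.org properties, but the choice of which ones to check, and the helper `missing_product_fields`, are assumptions made for illustration.

```python
import json

# Hypothetical checklist: which schema.org properties matter is an assumption,
# not something the standard requires.
PRODUCT_FIELDS = ("name", "sku", "offers")
OFFER_FIELDS = ("price", "priceCurrency", "availability",
                "shippingDetails", "hasMerchantReturnPolicy")


def missing_product_fields(json_ld: str) -> list[str]:
    """Return dotted names of checklist fields absent from a Product JSON-LD blob."""
    data = json.loads(json_ld)
    missing = [f for f in PRODUCT_FIELDS if not data.get(f)]
    offer = data.get("offers") or {}
    if isinstance(offer, list):  # schema.org allows one Offer or a list of them
        offer = offer[0] if offer else {}
    missing += [f"offers.{f}" for f in OFFER_FIELDS if not offer.get(f)]
    return missing


# Made-up product page metadata with only the basics filled in.
example = json.dumps({
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Wooden coat hangers, 10-pack",
    "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"},
})
print(missing_product_fields(example))
```

A merchant could run a check like this over its product pages to see which listings leave an agent without availability, shipping, or return information.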

If everything goes right for Vetted in 5 years, what does it become and how is the world changed?

In five years, we want to earn that default place you turn to when making these decisions. That could be an app on a home screen. What I really hope happens is Siri gets its act together and becomes the routing layer for consumers and the apps they should use to answer questions. We want to be the app people install to Siri, so Siri uses us to tell people what to buy. That would be an exciting future for app developers—a new distribution mechanism to load a skill into Siri. We would love to be the shopping skill for people.

How this impacts the industry—what has always frustrated me about e-commerce is how inefficient it really is. If I go to Amazon right now and buy a hundred-dollar blender, I'm not getting a hundred-dollar blender; I'm getting a $14 blender with a bunch of margin passed on to me. Amazon now takes over 50% of GMV as their internal revenue because ads are a huge part of product sales. So $30 of that blender is basically a charge to me because it was so hard to convince me to buy that blender in the first place. That margin is our opportunity.

If Vetted becomes this highly efficient routing layer from consumers to the best products for them, that's a win for everybody. Brands and merchants can focus on building the best possible product for consumers rather than differentiating themselves through advertising. It becomes our job to route them to the best consumer for their product. The whole thing becomes far more efficient and that’s a win for everybody.

Disclaimers

This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.
