Charles Chretien, co-founder of Prequel, on the modern data stack’s ROI problem

Jan-Erik Asplund

Background

We've previously covered the evolution of the modern data stack through interviews with Prequel co-founder Conor McCarter, dbt CEO Tristan Handy, and Census co-founder Sean Lynch.

To learn more about where data infrastructure is heading now, we reached out to Charles Chretien, co-founder of Prequel ($5M seed, NextView).

Key points from our conversation via Sacra AI:

  • The 2023 data budget crunch exposed a fundamental ROI problem with the modern data stack movement: horizontal tools that process terabytes of data often struggle to show business impact, while verticalized data platforms like CDPs can point directly to lift in CTR, conversion rates, and paid media efficiency. "One of the big problems that the space has encountered is it can be really hard to measure the ROI of data and to show the value. You had companies investing in it and spending crazy data budgets. Then when budgets got tighter in 2023, those were some of the first tools to get cut just because it was hard for anyone to look at it and be like, 'Hey, this is driving meaningful ROI for our business.'"
  • Databricks is growing nearly 2x faster than Snowflake because it started upstream with managed Spark (launched 2016), expanding downstream into warehousing (launched Databricks SQL in December 2021) and bundling adjacent products, versus Snowflake, which started downstream with the warehouse (launched 2015) and must shift left. "Databricks started with their data pipeline tool, as managed Spark. You look at everything that could be downstream of a Spark job, and there's a data warehouse, but there's a bunch of other things too. They can keep on adding to those to grow their business... On the other side, you've got Snowflake, which started further downstream. They started with the data warehousing piece. If they want to keep on expanding, they've got to try and shift left, which is just going to be harder for them to pull off."
  • AI is driving a new wave of data infrastructure demand as companies race to power AI experiences with customer data, which at mid-market and enterprise lives primarily in data warehouses, creating tailwinds for companies like Prequel ($5M seed, NextView) that can help SaaS vendors pull data from customer warehouses into their apps. "Companies are trying to power more AI experiences. To feed those AI experiences, they need data from their customers, whether it's context or other inputs. If you're talking about mid-market plus and enterprise, [that data] is already centralized in the data warehouse."

Questions

  1. The “modern data stack” seems to have peaked in ~2021 and has been rolling back since. We’ve seen a wave of consolidation in the space. How would you describe the history of the modern data stack, what has happened over the last ~5 years and where are we today?
  2. How do buyers think about constructing their data stacks today? Are they looking for more consolidation / all-in-one of their tools and if so, what is driving that?
  3. What effect is AI having on how companies think about their data infrastructure? Does the team construction look the same or is that changing as well?
  4. Fivetran has been rolling up a lot of companies in the modern data stack including Census and dbt. How do you think about what Fivetran is building and becoming by virtue of these acquisitions?
  5. Help us understand what has happened in CDP over the last 3-5 years. Hightouch entered the space via reverse ETL. mParticle verticalized for ecommerce before its acquisition by Rokt. Segment merged CDP with messaging via Twilio Engage. Customer.io added CDP to bundle it with email automation. RudderStack appears to still be a pure-play CDP. What does CDP mean and what is the opportunity today?
  6. Reverse ETL as a category appears to be dead. Census was acquired by Fivetran and Hightouch has become a CDP. What learning should we take away from the demise of reverse ETL?
  7. You’ve called Databricks a data platform with a data warehouse business vs. Snowflake which is a data warehouse trying to become a data platform. What does that mean? Why is it easier for Databricks to add more lines of business and drive attach to those businesses vs. Snowflake doing the same?
  8. Prequel has launched data import in addition to data export. What’s the customer need that drove that decision and why should import & export be part of one offering?
  9. How do you think about how Prequel relates to other “enterprise readiness” platforms like WorkOS (SSO) and Vanta (SOC-2)? Are they potential channel partners or future competitors?
  10. Are niche databases continuing to proliferate, and why? Has fragmentation increased or declined in the database and data warehouse space over time, and if so, what’s happening there?
  11. What about something like Supabase, what do you make of their rise in the wake of all these AI coding apps?
  12. Do you see agentic, autonomous database creation and management as important here?
  13. Is AI a tailwind for Prequel as companies look to consolidate their SaaS data into one place to build internal AI experiences on top? How does Prequel intend to position more strongly for AI?
  14. If everything goes right for Prequel over the next five years, what does the company look like in 2031? How is the world different?
  15. Anything we didn't talk about or any hot takes, LinkedIn posts you want to rehearse?

Interview

The “modern data stack” seems to have peaked in ~2021 and has been rolling back since. We’ve seen a wave of consolidation in the space. How would you describe the history of the modern data stack, what has happened over the last ~5 years and where are we today?

Back around 2020, the concept of a data stack was still fairly new. It really started with Redshift in the early 2010s, and then Snowflake came fairly rapidly after that. You had Fivetran gaining steam into the late 2010s. 2020 is sort of the first time that you've got this full-fledged suite of tools that you can use to do a bunch of data analysis. It had all this promise and these expectations tied to it. You saw massive investment, and coupled with low interest rates and a ton of capital from the VC side, you had all these tools pop up.

One of the big problems that the space has encountered is that it can be hard to measure the ROI of data and to show the value. You had companies investing in it and spending big on data budgets. Then when budgets got tighter in 2023, those were some of the first tools to get cut, simply because it was hard for anyone to look at it and be like, "Hey, this is driving meaningful ROI for our business." You had this return to sanity of sorts, with consolidation around the few main tools in the space: the data warehouse and the ETL tool, to name a few. But a lot of the more add-on-type tools got a lot less popular at that point in time.

How do buyers think about constructing their data stacks today? Are they looking for more consolidation / all-in-one of their tools and if so, what is driving that?

I think they are, and for what it's worth, that theme holds even outside of the data stack. We're in a bundling cycle across the board, where tools are trying to become platforms and are being forced to do more and more for their users. In data, that's definitely the case.

You've got Fivetran, which recently bought dbt. It's probably the most high-profile acquisition we've seen. But across the board, you're seeing these tools try to offer more and more capabilities so they don't become obsolete, and so they can command higher spend from their customers.

What effect is AI having on how companies think about their data infrastructure? Does the team construction look the same or is that changing as well?

There are a couple of effects. One is that people are realizing that data is a core driver of the AI experiences they can power. In this weird way, it's more important today than it ever was. You're seeing more investment in core data infrastructure, core data engineering, those types of things. The other dynamic is that a lot of people seem to be interested in text-to-SQL, i.e., you can ask a question in English and get an answer back from your data.

That's not a trend that we're super bullish on at Prequel, or rather one that we haven't seen play out yet.

There are a lot of headwinds there. Specifically, the hard part is that the nuance of data lies in exactly how it's defined, and that can be very hard for a model to grasp. Even if the model can write a SQL query for you, it might not realize that a given column has a very specific meaning. A really great data analyst will help you refine the question you want to ask because they know the dataset, and they're trying to help you bridge the gap between your ask and your business need. I don't know that models are quite able to do that yet. But it's something that a lot of people are working on, so you might see a lot of improvement there as well.

Fivetran has been rolling up a lot of companies in the modern data stack including Census and dbt. How do you think about what Fivetran is building and becoming by virtue of these acquisitions?

I don't have any inside intel—this is all just me observing the market and various incentives. But what it looks like from the outside is two things. One, Fivetran is facing some competitive pressure on their core business.

Their core business is moving data from your production database to your data warehouse or from your SaaS tools to your data warehouse. You have data warehouses that are offering the "replicate your production DB to your data warehouse" part. BigQuery famously does this on GCP. You're seeing others start to catch up.

That line of business is getting attacked. At the same time, you're seeing SaaS tools offer their own data pipelines, like Stripe Data Pipeline, which replicates your Stripe data from Stripe to your data warehouse. I'm a little biased here because that's what we enable software vendors to do, but you're seeing more and more of that happen. So their main line of business is under attack.

At the same time, they're a growth-stage company. Their last valuation was somewhere around $5.5 billion. Clearly, the next step for them is to IPO. To do that, they have to show that they can deliver a lot of growth, and the best path for them to do that is to become a platform.

If I'm Fivetran, I'm asking, "Okay, I help you move the data. What else can I do around that that's going to add value?" The first obvious step is, "Well, I'll help you transform the data too." It's kind of a no-brainer that you would look at dbt or that type of capability. If you look at ELT (extract, load, transform), they're doing the E and then the L, and now they're also doing the T. It's great. Makes sense. Great synergy. Good for the customer.
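For readers less familiar with the ELT split, here is a minimal Python sketch of the three steps, assuming Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse. The source records, the `raw_charges` and `paid_revenue` tables, and the `extract`/`load`/`transform` functions are all hypothetical; the only point being illustrated is the ordering, where raw data lands in the warehouse first and is transformed afterwards, inside the warehouse, with SQL.

```python
import sqlite3

# Hypothetical source records, standing in for rows pulled from a SaaS API
SOURCE_ROWS = [
    {"id": 1, "amount_cents": 1200, "status": "paid"},
    {"id": 2, "amount_cents": 500, "status": "refunded"},
    {"id": 3, "amount_cents": 3000, "status": "paid"},
]


def extract():
    """E: pull raw records from the source system (here, an in-memory list)."""
    return [dict(row) for row in SOURCE_ROWS]


def load(conn, rows):
    """L: land the records in the warehouse as-is, in a raw table."""
    conn.execute(
        "CREATE TABLE raw_charges (id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_charges VALUES (:id, :amount_cents, :status)", rows
    )


def transform(conn):
    """T: model the raw data inside the warehouse with SQL (the dbt-style step)."""
    conn.execute(
        """
        CREATE TABLE paid_revenue AS
        SELECT COUNT(*) AS paid_charges, SUM(amount_cents) / 100.0 AS revenue_usd
        FROM raw_charges
        WHERE status = 'paid'
        """
    )
    return conn.execute(
        "SELECT paid_charges, revenue_usd FROM paid_revenue"
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
    load(conn, extract())
    print(transform(conn))
```

Running it prints `(2, 42.0)`. In the framing of this answer, an ingestion tool owns `extract` and `load`, while a transformation layer like dbt owns the SQL inside `transform`; both steps run on warehouse compute, which is the spend the next paragraph refers to.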

But if I'm them, I probably wouldn't stop there. I would ask, "What else can I help you do?" What's funny is Fivetran and dbt use up all this data warehouse compute. They run a ton of workloads on the data warehouse. They're by far the biggest driver of compute on places like Snowflake. So, hey, if we get into the business of compute and we make it a default option for our customers, we could probably drastically expand the piece of the pie that we own. If I'm them, that's probably what I'm looking to do, either buying an existing compute engine or building one in-house and starting to sell compute to grow my business for the next 10x step.

Help us understand what has happened in CDP over the last 3-5 years. Hightouch entered the space via reverse ETL. mParticle verticalized for ecommerce before its acquisition by Rokt. Segment merged CDP with messaging via Twilio Engage. Customer.io added CDP to bundle it with email automation. RudderStack appears to still be a pure-play CDP. What does CDP mean and what is the opportunity today?

I have less of a take about whether every SaaS app is going to be a CDP and more about why CDPs seem to be doing well and what data warehouses will do next. What's really interesting about the CDP is it's effectively a data platform, but it's been verticalized. You have a finite set of inputs, and then you've got to massage the data in some way, and then you're going to push it back to a few places like your ad platform, your messaging platform, and so on to activate that data and act on it.

You've got these very finite integrations on both sides, and you've got a fairly finite set of workflows you might want to do in the middle too. That allows those companies to offer a more compelling experience and a much easier experience. You can take all those flows, template them out, and now as a marketing team, you can do this without spending a ton of money on a data team, data expertise, and so on.

You're driving a really clean ROI. You can point to the lift in CTR, the paid media efficiency, the conversion rate, whatever your KPI is. It's a very compelling sell compared to a more horizontal data platform that might be like, "Hey, we process two terabytes of data for you." It's like, "Great, but what did that actually accomplish for me?" CDPs have this compelling story they can tell, and I suspect that you will see data platforms continuing to verticalize.

People are realizing it's hard to drive value out of horizontal data. Instead, it's useful to ask, "What are the niches in which the data is the most valuable?" Marketing and customer acquisition is clearly one, and that's the one that's seen the most progress to date, but others will follow.

Reverse ETL as a category appears to be dead. Census was acquired by Fivetran and Hightouch has become a CDP. What learning should we take away from the demise of reverse ETL?

The gist there is it's a really important capability. Your data is only as good as how you activate it, and so getting it into the tools where you can use it is really important.

But what the players probably realized is that it's a fairly small TAM for venture-backed companies.

There was probably not enough room for them to grow into tens-of-billions-of-dollars-type businesses. You had Census exit to Fivetran, which I'm sure was a good outcome for them. You had Hightouch pivot into being a more full-fledged CDP because they're already working with marketing teams. It was one of those that just didn't make sense as a standalone tool.

You’ve called Databricks a data platform with a data warehouse business vs. Snowflake which is a data warehouse trying to become a data platform. What does that mean? Why is it easier for Databricks to add more lines of business and drive attach to those businesses vs. Snowflake doing the same?

There are a couple of core dynamics at play. One goes back to this idea that it's easier to add product lines downstream from where you sit. If you take Databricks, they started with their data pipeline tool: initially, managed Spark. It's really about moving data around, processing data, and so on.

You look at everything that could be downstream of a Spark job, and there's a data warehouse, but there's a bunch of other things too. They can keep on adding to those to grow their business. They can also offer adjacent features. For example, they bought Tecton, the ML feature store. It lets them expand into a ton of white space.

On the other side, you've got Snowflake, which started further downstream. They started with the data warehousing piece. Their initial move was to expand downstream of that into more BI. They bought Streamlit. They are pushing Cortex (their AI analyst) and so on. But that market is, by definition, smaller, so it's harder for them to grow into. If they want to keep on expanding, they've got to try and shift left, which is going to be harder for them to pull off. You're seeing that start to happen. They bought Crunchy Data to try and offer an application database, to be upstream of the data warehouse, things like that.

I think that shows up in their revenue lines. Databricks is at $4 billion in run rate, give or take, based on available numbers. Only about $1 billion is from data warehousing. It's clearly a line of business that they're growing well, but one that they've added to an existing business. Snowflake's revenue, by contrast, is around $5 billion, something like that, but the lion's share of that is data warehousing. They're clearly having a harder time going multi-product and multi-business line.

Prequel has launched data import in addition to data export. What’s the customer need that drove that decision and why should import & export be part of one offering?

Absolutely. Companies are trying to power more AI experiences. To feed those AI experiences, they need data from their customers, whether it's context or other inputs. They want to pull data in from their customers.

On the customer side, this data lives mostly in data warehouses. It can live in a bunch of SaaS tools, but if you're talking about mid-market plus and enterprise, it's already centralized in the data warehouse. Really, companies want to pull in data from their customers' data warehouses. That's exactly what we're enabling, what we're powering.

For us, it's very much an AI tailwind type of play. We're at this unique moment in time where everyone's trying to ship AI experiences. They need data to do this, and we're going to be the pipe that gets them that data. As far as why it makes sense for import and export to coexist, from a technical standpoint, it's a very adjacent problem space. We can take all the expertise we have from moving large volumes of data in one direction and leverage that to move data in the other direction and do that incredibly well.

How do you think about how Prequel relates to other “enterprise readiness” platforms like WorkOS (SSO) and Vanta (SOC-2)? Are they potential channel partners or future competitors?

We think of them as great channel partners. We have a great relationship, for example, with Michael Grinich at WorkOS. It's this nice thing where we have very similar ICPs, but our value props are so different and our domains of expertise are so different. We're all tackling such huge markets that it doesn't feel like in a year we're going to be going toe-to-toe with WorkOS on deals. It's more of a nice synergy, if anything.

Are niche databases continuing to proliferate, and why? Has fragmentation increased or declined in the database and data warehouse space over time, and if so, what’s happening there?

To touch on the M&A piece briefly, both Snowflake and Databricks realized it was essential for them to have an application database offering, and so that's why you saw the Neon and the Crunchy Data acquisitions. As far as the market more broadly, I would argue you're seeing more consolidation around standards in an interesting way. I'll distinguish briefly between two types of databases, which you might be familiar with.

You've got OLAP, which is mostly your analytical database, used for running big analytical queries like sums. And then you've got OLTP, which is your transactional database, more like your application database: Postgres, MySQL, whatever.
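To make the OLTP/OLAP distinction concrete, here is a small Python sketch that contrasts the two query shapes, assuming Python's built-in `sqlite3` module. SQLite is a row-oriented embedded database, so this only illustrates the access patterns, not the storage engines; dedicated analytical systems typically use columnar storage tuned for the aggregate-style query. The `orders` table and all function names are hypothetical.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical application table, the kind an OLTP database would hold
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )


# OLTP-style access: small transactions that write or read a single row by key
def place_order(conn, order_id, customer, total):
    with conn:  # sqlite3 commits on success and rolls back on an exception
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer, total)
        )


def get_order(conn, order_id):
    return conn.execute(
        "SELECT customer, total FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


# OLAP-style access: one query that scans every row and aggregates them
def revenue_by_customer(conn):
    return conn.execute(
        "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    for order in [(1, "acme", 10.0), (2, "acme", 5.5), (3, "globex", 7.0)]:
        place_order(conn, *order)
    print(get_order(conn, 2))  # point lookup by primary key
    print(revenue_by_customer(conn))  # aggregate over the whole table
```

The first query touches one row through the primary-key index; the second has to read the whole table, which is the workload columnar OLAP engines are built to make cheap at scale.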

In OLAP, what's interesting is that you're seeing consolidation around the file format. There are these things called open table formats. You've probably heard of them: Iceberg, Delta, Hudi, and they're all mostly based on Parquet as the storage format. Even though you're seeing a bunch of different compute engines (Snowflake, Databricks, ClickHouse, and whatever else for different use cases), a lot of the storage is consolidating around those very open formats. That's great for the ecosystem because there's more interoperability. There's less vendor lock-in. It sort of makes the story nicer for teams.

On the transactional side, there's definitely a lot of new databases popping up. You know, we can talk about Tiger, CockroachDB, FerretDB, Neon, Vitess, PlanetScale. There's a lot. What's interesting, though, is they're also mostly coalescing around a set of standards. For example, most of the ones I just talked about are Postgres-based. They're forks of Postgres or just Postgres under the hood.

My high-level take would be that it's a good time to be a data consumer because you've got all these choices, and they all play pretty nicely with each other. You can kind of specialize depending on your query pattern and your need.

As for whether we'll continue to see more proliferation, I don't have a strong take there, but it'll be interesting to follow.

The last piece I'll mention is that you're also seeing a new crop of databases for more AI-type applications, like LanceDB and so on. The main open question for me there is whether these formats are going to win or whether those use cases will be well served by extensions of existing formats. The canonical example is when people got into vectors, Pinecone was all the rage. And then pgvector, the Postgres extension, started doing pretty well. Now it's much less clear that Pinecone has a ton of value-add, and I wonder if you're going to see the same dynamic with LanceDB and so on.

What about something like Supabase, what do you make of their rise in the wake of all these AI coding apps?

You've got this confluence of factors. A, it's Postgres, so it's very well understood. The LLMs are pretty good at working with it because there's a ton of data. The training dataset is huge. It also connects easily to a bunch of different tools.

But also what Supabase has done, which is awesome, is you can easily spin up new instances. It's very well-suited to those agentic or AI coding-type workflows because your cost of creating a new database and running something is super low.

Actually, you mentioned LinkedIn earlier. One of the posts I want to write at some point is about why Supabase is the biggest existential risk to Snowflake or Databricks. I haven't fully fleshed out the argument yet, but the gist of it is that they sit upstream from Snowflake and Databricks. If they continue capturing the market and they get into, "Hey, we're going to offer a data warehouse solution," they might not replace Snowflake and Databricks at current accounts, but the new crop of companies that are popping up might never need to purchase Snowflake or Databricks, and that would be a big problem for them.

Do you see agentic, autonomous database creation and management as important here?

Definitely. That's part of what made Neon so appealing to Databricks. A lot of Neon's business was coming from those agentic workflows because they had this unique architecture where it was really easy to spin up a new database on top of existing storage. Yeah, that definitely seems to be where the puck is going. Like a lot of the AI space, the question is when is it going to graduate from the vibe coding piece to the more standard way of doing things? But there's a lot of potential there.

Is AI a tailwind for Prequel as companies look to consolidate their SaaS data into one place to build internal AI experiences on top? How does Prequel intend to position more strongly for AI?

Absolutely. What we're currently doing is we're getting incredibly good at being this connective data infrastructure between our customers and their customers. We're sort of like the data pipeline, the data highway between all these software companies, and that's what we want to double down on.

AI has made the demand for that data and those data integrations just so much more important. Our near-to-medium-term vision is to continue being the default player enabling that, and then expanding out from there into a few more use cases that we have in mind.

If everything goes right for Prequel over the next five years, what does the company look like in 2031? How is the world different?

That feels like forever away. Are we going to have jobs in 2031?

At a high level, similar answer. We're really interested in being that connective tissue between all those software companies, enabling interoperability and really easy data flow and data movement, and then just powering experiences on top of that.

Anything we didn't talk about or any hot takes, LinkedIn posts you want to rehearse?

Oh man, I have so many hot takes. We have a post coming out soon about this, so I'll just briefly tease it. A hot take we have is that data warehouses are going to go vertical and start powering vertical experiences instead of just being horizontal platforms. You've got context clues here and there to help you foresee this.

Snowflake just bought Observe, which is an observability tool. ClickHouse had previously made an observability acquisition. In a different vertical, you have Databricks, which recently brought on the former founder of ActionIQ, a CDP. I wouldn't be surprised if Databricks went into the CDP game. Either way, I would expect to continue seeing fireworks in the battle between the warehouses, with them duking it out on specific verticals.

Disclaimers

This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.
