
Tristan Handy, CEO of dbt Labs, on dbt’s multi-cloud tailwinds

Jan-Erik Asplund

Background

After we published our report on dbt Labs last month, we had the opportunity to speak with CEO Tristan Handy and the executive team at dbt, get feedback on our revenue model, and learn about the tailwinds driving growth at dbt.

Based on those conversations, we revised our revenue model—which previously showed significant deceleration in 2024 ARR growth—to reflect ARR growth holding in the mid-60% range in 2023 with acceleration through 2024.

Key points from our conversation via Sacra AI:

  • The modern data stack represents an unbundling cycle where ETL (Fivetran), the data warehouse (Snowflake), data modeling (dbt) and dashboarding & BI (Looker) broke out into separate categories with a large number of tools that had to be held together with spit & baling wire. “There was this era in 2020 and 2021 where enterprises were very proud of having sophisticated modern data stacks: ‘We have 12 different products, and our data engineers have stitched them all together, and we have a best-of-breed solution.’ But that was a short-lived moment in time. Very quickly, we've moved back into a world where enterprises don't want to spend all their time duct-taping together a bunch of different vendor solutions… They want stuff to actually work together.”
  • Today, with companies consolidating tools to save on cost and reduce complexity, a rebundling cycle is in effect centered around the data warehouse, with Snowflake, Databricks and the hyperscalers simultaneously integrating with dbt and launching cloud-native data products that compete with dbt. “We believe that cross-cloud solutions like ours will eventually prevail because customers will not concentrate all their investments in a single data cloud. You're absolutely correct that every vendor in this space — Snowflake, Databricks, and others — is expanding their platform offerings. The same patterns are unfolding within the hyperscalers as well. Everyone is broadening their platform, but what none of these companies can truly offer is a cross-cloud solution. That's why this capability is such a significant advantage for us.”
  • In 2024, data clouds had their "VHS moment" as Apache Iceberg emerged as the open table standard—particularly with Databricks, the company behind competing Delta Lake, acquiring Tabular for over $1B to bring the founders of Iceberg in under one roof—making multi-cloud a tangible reality and throwing tailwinds behind cross-cloud solutions like dbt that abstract away the underlying infrastructure and offer an orchestration layer on top. “[T]he data control plane is . . . separate from the underlying data clouds. Increasingly, CDOs [Chief Data Officers] recognize that the world is a multi-cloud world, and they're not going to standardize on just one cloud. What needs to be true, then, is that your business logic, your metadata, all of this stuff, has to live a layer up from the data clouds because you're fundamentally going to have multiple investments there.”

Questions

  1. How did dbt Labs find product-market fit for its commercial product, coming from its open source software origins? What was that journey like and what were the key inflection points?
  2. Tell me about the vision around "One dbt" — making sure the dbt Core and dbt Cloud communities are unified and not split. How does that work?
  3. Focusing on the commercial product, what does it mean to talk about dbt Cloud as a “data control plane” that includes orchestration, observability, and cataloging? It sounds like more than making it easier to transform data?
  4. We’ll come back to multicloud and Iceberg. You mentioned how we're seeing consolidation in the data stack. Companies like Snowflake and Databricks, for example, have end-to-end data stack aspirations. Does dbt Cloud as one of the rebundlers put you on a collision course with other data stack integrators?
  5. In terms of Apache Iceberg adoption, you mentioned it as one of the enablers of multicloud. We’ve seen Databricks acquire Tabular, founded by the creators of Apache Iceberg, and embrace interoperability. Can you talk about Iceberg adoption specifically and how that has been a tailwind for your business?
  6. In terms of dbt Labs’s role in that multicloud data world, can you put those next steps together for us? Is it that dbt Labs as a transformation tool plays a key role as connective tissue — sitting as it does between the data ingestion and BI/analytics layers?
  7. I've heard tools like dbt and dbt Cloud described in terms of different user types: you have hardcore data engineers, less technical data analysts, and in the middle, you have analytics engineers. When you talk about expanding access with features like visual editing, are we talking about dbt becoming a general business tool for nontechnical users as well?

Interview

How did dbt Labs find product-market fit for its commercial product, coming from its open source software origins? What was that journey like and what were the key inflection points?

It's a different journey for every single company. What's a little bit unique about our journey is that most products with open source roots are sold to software developers, so they have a different set of commercial opportunities.

These companies initially commercialize around a product that needs to be run in an extremely demanding production environment. It's very hard to achieve availability, uptime, etc. So a company hosting on your behalf and doing it at a really high level of quality has a lot of value to software engineers.

For dbt, because it's primarily batch-based, that's not as significant a problem for our target market. Data practitioners have a totally different problem set — they don't have the interest in, or patience for, all the technical complexity that comes with using open source. They have an ease-of-use problem, especially in large enterprises where you've got dozens to many hundreds of folks building dbt pipelines. It's a very challenging operational burden to make sure all those folks have working dbt environments.

Many of the less technical people, like data analysts, aren't familiar with maintaining local development environments at all — they want to do their work in a browser or in a GUI.

Those are the problems we latched onto very early: making this thing accessible to more and more people. That's been the first step in our journey.

We've done a lot to make dbt accessible to many people — and there's always more to do there. But what you’ll see from us going forward is less focus on packaging, integration, and making everything easy to use, and more fundamental innovation around cost optimization and next-generation capabilities.

Tell me about the vision around "One dbt" — making sure the dbt Core and dbt Cloud communities are unified and not split. How does that work?

The One dbt message is very simple and straightforward. There's a very large set of humans in the world that use dbt. Most of them use our open source product, which is common for open source companies. You see a massive installed base of your open source product, and then some subset of that uses the commercial product.

The problem that we are solving with One dbt is that these two sets of humans need to be able to interact well together. Within a given company, you might see some folks using Core and some folks using Cloud, and right now it's actually very hard for those two things to work together. But there are great ways to make these products work much better together, which is very important for the long-term health of the company.

In the past, our answer had been, "Oh well, you should migrate everyone over to Cloud," and that isn't a very realistic answer. We actually have to do a better job meeting folks where they are and taking them on a journey. This is important to us because we're a very community-focused company. No one wants to hear in a functional community that they're using the bad thing or the wrong thing, or that their thing doesn't work with the better thing. The bridge there needs to be really healthy to make the community healthy.

Focusing on the commercial product, what does it mean to talk about dbt Cloud as a “data control plane” that includes orchestration, observability, and cataloging? It sounds like more than making it easier to transform data?

On the data control plane, there are two big trends going on. One is the consolidation of the data infrastructure space, and the other is standards in Iceberg. On the first one, there was this era in 2020 and 2021 where companies — enterprises — were very proud of having sophisticated modern data stacks: "We have 12 different products, and our data engineers have stitched them all together, and we have a best-of-breed solution." But that was a short-lived moment in time.

Very quickly, we've moved back into a world where enterprises don't want to spend all their time duct-taping together a bunch of different vendor solutions. They don't want to maintain a bunch of vendor contracts. They want stuff to actually work together. So the data control plane, in part, is just an integrated vision of how these things should work together.

The way I like to talk about it is that our bar for quality is the Apple ecosystem. You get a new iPhone and you hold it up to your old iPhone, and it just transfers everything over magically. It just works. Those are the types of experiences that we should expect in data engineering too.

The other part of the data control plane is that it is separate from the underlying data clouds. Increasingly, CDOs [Chief Data Officers] recognize that the world is a multi-cloud world, and they're not going to standardize on just one cloud. What needs to be true, then, is that your business logic, your metadata, all of this stuff, has to live a layer up from the data clouds because you're fundamentally going to have multiple investments there.

We’ll come back to multicloud and Iceberg. You mentioned how we're seeing consolidation in the data stack. Companies like Snowflake and Databricks, for example, have end-to-end data stack aspirations. Does dbt Cloud as one of the rebundlers put you on a collision course with other data stack integrators?

In the software infrastructure space, coopetition is pervasive. It's not a secret. Think about the relationship between Snowflake and AWS. Snowflake operates on top of AWS and utilizes a significant amount of AWS compute and storage, so AWS appreciates Snowflake's business. At the same time, AWS sells a product called Redshift that competes directly with Snowflake. This is the reality that everyone operates in.

You could observe the same dynamic with Confluent Cloud and GCP's competing service, MSK [Managed Service for Apache Kafka]. GCP sellers actually enjoy selling Confluent Cloud because it's excellent and customers love it. The outcome is that customers ultimately get what they want.

We believe that cross-cloud solutions like ours will eventually prevail because customers will not concentrate all their investments in a single data cloud. You're absolutely correct that every vendor in this space — Snowflake, Databricks, and others — is expanding their platform offerings. The same patterns are unfolding within the hyperscalers as well. Everyone is broadening their platform, but what none of these companies can truly offer is a cross-cloud solution. That's why this capability is such a significant advantage for us.

In terms of Apache Iceberg adoption, you mentioned it as one of the enablers of multicloud. We’ve seen Databricks acquire Tabular, founded by the creators of Apache Iceberg, and embrace interoperability. Can you talk about Iceberg adoption specifically and how that has been a tailwind for your business?

Before Iceberg, you could talk about multi-cloud, but there was no great way to execute on it. It involved a lot of double-writing of data, and whenever you double-write data, it costs a lot and it also introduces opportunities for error. Fundamentally, the multi-cloud world doesn't really work if you don't have some way for all these different data clouds to share access to the same data. You didn't actually see us talk about multi-cloud a year ago at this time. 

Sorry to get too meta on you — but everything ultimately comes back to physics. Data in these clouds is literally stored on drives in physical data centers, and you can look at how long it takes for data to move between data centers. If you don't have a way for different data clouds to read from the same data, you're just not going to be able to overcome the physics.

And now, as of June 2024, we have public commitments from really all of the large data clouds to a standard file format. This is like a VHS moment or a Blu-ray moment.

I believe it creates significant upside to the business because it transforms multi-cloud from being merely aspirational into a tangible reality. Now we can focus on the tactical next steps to achieve the vision.

In terms of dbt Labs’s role in that multicloud data world, can you put those next steps together for us? Is it that dbt Labs as a transformation tool plays a key role as connective tissue — sitting as it does between the data ingestion and BI/analytics layers?

If you've seen this data control plane graphic that we have, there are three big layers of that graphic: the data cloud layer, the pipeline layer, and the governance/control plane layer.

So, the first step in realizing the multi-cloud world has to come from the data clouds themselves. If these data clouds don't support common file formats, then everything's dead, and that's why the commitments to a single standard were such a big deal.

But this change has to progress up the stack. The next step has to come from the pipeline layer, which is where we sit. If your pipelines don't work in a multi-cloud way, then again, everything grinds to a halt. Pipeline companies like Fivetran [data ingestion] and dbt Labs — between us we write some ungodly percentage of all the datasets in all of the cloud providers. If our products can write data natively to Iceberg, as they now do, then that data becomes accessible across this multi-cloud world.

I've heard tools like dbt and dbt Cloud described in terms of different user types: you have hardcore data engineers, less technical data analysts, and in the middle, you have analytics engineers. When you talk about expanding access with features like visual editing, are we talking about dbt becoming a general business tool for nontechnical users as well?

This is a meaty question — it's a good one — it depends on the time frame that you're talking about.

The fundamental thing that we have always been about and continue to be about is bringing the best practices of software engineering to data. The core insight that the company was founded on is that production data systems are software systems. We know how to build production software systems and we just need to borrow many of those practices and make them feel native to data practitioners.

The challenge with that vision is that there are a lot of humans that use data and they use it very differently for different purposes, using different tooling. We focused originally on this persona we call the analytics engineer, which was essentially a data analyst who really wanted to be able to self-serve and build their own data pipelines. They got more technical in order to unblock themselves, but they didn't really want to go build Spark jobs or something like that. That's how dbt grew into a thing.

Interestingly, the next persona that got religion on dbt was actually the data engineer. We went more technical first, because dbt is highly configurable, fits nicely into CLI-based workflows, and fits nicely with the other tooling in the space like Airflow. So you saw data engineers go all in on dbt, then the next thing we're doing now is starting to shift a little bit more towards the less technical folks.

One of the things that many people get wrong about this space is that less technical doesn't mean less capable — it means focused on different priorities. People in this space have historically really underestimated the data analyst, much to their detriment. They have thought things like "data analysts can't participate in software engineering best practices," but that's wrong.

The visual editor is a big deal for us. It brings software engineering best practices — source code management, testing, PRs — into the analyst's workflow while making them feel native rather than inaccessible. This means that everything they do is not just generating tech debt and fragile data systems but actually contributing to production-grade code over time.

If you're talking about the long arc of history of the company, how many people use data in businesses every day? It's hard to know. Out of 8 billion humans, we're talking about mid-hundreds of millions or high hundreds of millions — I don't know, it's a shitload of people. A lot of those people today still use spreadsheets.

If we’ve learned a lesson from all the workflow improvements that we have helped people with, it’s that the improvements are independent of the specific user interface. For example, you could totally imagine a spreadsheet-based interface that’s backed by Git. The point is that over time this innovation will continue to spiral out, and I don't think there are any particular limitations on this in terms of which data users will eventually get their hands on it.

Disclaimers

This transcript is for information purposes only and does not constitute advice of any type or trade recommendation and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts and they are not endorsed by, nor do they represent the opinion of Sacra. Sacra reserves all copyright, intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any transcript is strictly prohibited.
