dbt as System of Record

dbt Labs Company Report
dbt's existing integration with major data platforms positions it well to become the system of record for data assets.

dbt has the best shot at becoming the metadata home for data teams because it already sits in the exact workflow where tables, tests, lineage, and metric definitions get created. Analysts and analytics engineers write transformation logic in dbt; dbt compiles and runs it on Snowflake, BigQuery, Databricks, and Redshift; and dbt Cloud layers docs, scheduling, cataloging, and semantic definitions on top of those warehouses. That puts dbt one layer above the storage engines, where cross-platform metadata can live in one place.
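That workflow can be illustrated with a minimal dbt model. The model and column names here are hypothetical; the point is that one definition, written once, compiles to warehouse-specific SQL through dbt's adapters:

```sql
-- models/marts/customer_orders.sql (hypothetical model)
-- {{ ref() }} resolves to the right schema and table on whichever
-- warehouse the project targets, so the same logic runs unchanged
-- on Snowflake, BigQuery, Databricks, or Redshift.
select
    c.customer_id,
    count(o.order_id)  as order_count,
    sum(o.amount)      as lifetime_revenue
from {{ ref('stg_customers') }} as c
left join {{ ref('stg_orders') }} as o
    on o.customer_id = c.customer_id
group by c.customer_id
```

Because the model is defined against `ref()` rather than hard-coded table names, the lineage dbt records is warehouse-neutral by construction.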

  • The practical advantage is vendor neutrality. A large company may run Snowflake for one business unit and Databricks for another. dbt Cloud already spans multiple warehouses within a single customer, which makes it a natural place to store shared business logic and asset definitions instead of recreating them in each warehouse catalog.
  • dbt is moving from a transformation tool into a control plane. Its commercial product already bundles collaboration, CI, scheduling, docs, governance, orchestration, observability, and cataloging. That is the same bundle a system of record needs, because metadata only matters if it is tied to how data gets built, tested, refreshed, and consumed every day.
  • The competition is real but narrower than it looks. Snowflake Horizon Catalog and Databricks Unity Catalog are strong inside their own clouds, while dbt can sit across them. Databricks explicitly recommends dbt-databricks for Unity Catalog-enabled projects, which shows the relationship is both cooperative and competitive.
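The "metadata tied to how data gets built and tested" point is concrete in dbt: tests and documentation live in YAML next to the models they describe, so the catalog entry and the quality checks are the same artifact. A minimal sketch (model and column names are hypothetical):

```yaml
# models/marts/schema.yml (hypothetical)
version: 2

models:
  - name: customer_orders
    description: "One row per customer with order counts and lifetime revenue."
    columns:
      - name: customer_id
        description: "Primary key; one row per customer."
        tests:
          - unique
          - not_null
```

Every `dbt build` run executes these tests, and dbt's docs and catalog surfaces render the same descriptions, so the metadata cannot silently drift from the pipeline that produces the data.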

The next step is for dbt to turn its semantic layer and catalog into the default place where companies define what a customer, revenue table, or trusted KPI actually means, defined once and reused everywhere. If that happens, warehouses remain the systems of storage, but dbt becomes the system that tells every tool and team which data asset is trusted and how it should be used.
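The "define once, reuse everywhere" idea already has a concrete shape in dbt's semantic layer, where metrics are declared in YAML against a model rather than rewritten in each BI tool. A rough sketch of such a definition, with hypothetical names and simplified to the common fields:

```yaml
# models/marts/semantic.yml (hypothetical semantic-layer sketch)
semantic_models:
  - name: orders
    model: ref('stg_orders')
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: revenue
        agg: sum
        expr: amount

metrics:
  - name: revenue
    label: "Revenue"
    description: "Total order amount; the single trusted definition."
    type: simple
    type_params:
      measure: revenue
```

Any downstream tool that queries the semantic layer gets this one definition of revenue, which is exactly the system-of-record behavior the report describes: the warehouse stores the rows, while dbt states what they mean.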