Warehouses Not for Production Applications


Julia Schottenstein, Product Manager at dbt Labs, on the business model of open source

Interview
warehouses are not a good choice to build production applications on top of.

This draws a hard line between software that records transactions and software that analyzes them. Warehouses are built to scan huge tables, run scheduled transforms, and answer broad queries across large volumes of data. Production apps usually need the opposite: small writes, fast row lookups, strict transaction handling, and predictable response times every time a user clicks a button. dbt sits on the analytics side, turning raw warehouse data into clean tables and shared logic that apps can read from when freshness requirements are loose enough.
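
The two workload shapes can be made concrete with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in (SQLite is neither a warehouse nor a production transactional system), and the `orders` table and all function names are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database used only to illustrate the two query shapes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
)


def place_order(conn, customer, amount_cents):
    """Transactional shape: one small write, committed or rolled back as a unit."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, amount_cents) VALUES (?, ?)",
            (customer, amount_cents),
        )
    return cur.lastrowid


def get_order(conn, order_id):
    """Transactional shape: fetch a single row by primary key."""
    return conn.execute(
        "SELECT id, customer, amount_cents FROM orders WHERE id = ?",
        (order_id,),
    ).fetchone()


def revenue_by_customer(conn):
    """Analytical shape: scan the whole table and aggregate it."""
    rows = conn.execute(
        "SELECT customer, SUM(amount_cents) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)


first_id = place_order(conn, "acme", 1200)
place_order(conn, "acme", 800)
place_order(conn, "globex", 500)
```

The first two functions touch one row each and must be fast and correct on every call. The third reads every row, which is the kind of query a warehouse is optimized for at much larger scale.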

  • The useful edge case is not replacing the app database but serving read-heavy features from warehouse data. That includes customer dashboards, internal admin tools, and near-real-time status views where data can refresh every few minutes instead of every few milliseconds.
  • The stack around dbt was built as specialized layers (ingestion, warehouse, transformation, BI) because each job has different performance needs. That separation is why dbt became valuable in the first place: it lets teams define business logic once without forcing the warehouse to become the system of record for application state.
  • The market is now trying to close this gap from the warehouse side. Snowflake added Hybrid Tables for lightweight transactional workloads, and Databricks added Lakebase Postgres for OLTP inside the lakehouse. Even so, both moves effectively admit that classic warehouses alone were not enough for production application workloads.
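
The read-heavy edge case in the first bullet can be sketched as a cache that serves warehouse-prepared data and reloads it only once it is older than a tolerated staleness. This is a minimal illustration, not a description of how dbt or any warehouse works: `FreshnessCache`, `max_age_seconds`, and the `load` callable (standing in for a query against a dbt-built table) are all hypothetical names.

```python
import time
from typing import Any, Callable, Optional


class FreshnessCache:
    """Serve a cached result, reloading only when it exceeds max_age_seconds."""

    def __init__(
        self,
        load: Callable[[], Any],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load  # hypothetical: runs the warehouse query
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so staleness can be tested
        self._value: Any = None
        self._loaded_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        is_stale = (
            self._loaded_at is None
            or now - self._loaded_at >= self._max_age
        )
        if is_stale:
            # Only this branch touches the warehouse; every other
            # request is answered from memory.
            self._value = self._load()
            self._loaded_at = now
        return self._value
```

With `max_age_seconds=300`, a dashboard would hit the warehouse at most once every five minutes regardless of how many users load the page. Writes are deliberately absent: in this pattern they still go to the application's transactional database.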

The next wave is more warehouse-adjacent applications, not a world where one engine does everything. More customer-facing software will read warehouse-prepared data for analytics, AI context, and operational dashboards, while dedicated transactional databases keep handling the core write path. That leaves dbt in a strong position as the layer that defines the data those applications consume.