What are the underlying causes of the creation of multiple definitions of metrics, and why can't this be solved within the data warehouse?

The way that BI tools traditionally talk to data and get data is they run SQL queries against your data warehouse, something like Redshift or Snowflake. There may be a code-based SQL editor, or it may be a drag-and-drop tool that generates a SQL query that they execute against the warehouse. Either way, you have a lot of flexibility there. All the aggregation logic for what “revenue” means and how revenue is defined is done in the BI tool itself. The challenge of that is you might define that SQL query differently in two different tools. You might even define it differently in two different dashboards in the same tool. We've seen that happen a lot at Lyft.

Why isn’t that just done in the warehouse? Why not create a definition of revenue, put it into some table that lives inside your data warehouse, and query that table directly? Well, the challenge with that is flexibility. Any time you materialize a metric into a table in a database, you are defining the ways you can slice that data. You are defining the grain of that table.

Maybe you have that metric cut by day or by week, and you want that metric by month, or you want it by product line, or you want it by city. Then you need to create a new table, or you need to run a separate SQL query that queries another raw data set that gives you the flexibility that you need. You run into the same problem over and over again.