Warehouses Poor for ML and Streaming

George Xing, co-founder and CEO of Supergrain, on the future of business intelligence

From the interview: "It's not great for machine learning, and it doesn't support real-time or streaming analytics as a first-class citizen yet."

The key strategic point is that the warehouse was built first for batch SQL analytics, so newer workloads such as model training and sub-second event processing opened the door for adjacent platforms to become control points. In practice, Snowflake became the place dashboards query, while Databricks and stream-processing systems grew around notebook-based ML, feature pipelines, Kafka-style ingestion, and always-on event handling.

  • Inside the BI workflow, the warehouse is where teams store cleaned tables and run SQL for dashboards. George Xing describes the problem one layer above: metric logic often lives separately in each BI tool, which is why semantic-layer products emerged even though the warehouse remains the data hub.
  • Databricks was designed around Spark workloads that mix batch, streaming, and ML in one environment. Its lakehouse architecture explicitly positions real-time processing and machine learning alongside BI, which is why it became the natural comparison when warehouse vendors stretched beyond reporting.
  • Snowflake has since moved toward both gaps. Its current docs highlight Snowflake ML, including feature-store and model-registry capabilities, and Snowpipe Streaming for low-latency ingestion and live dashboards. That confirms the economic logic George Xing pointed to: more workloads on the platform mean more compute consumed.

The next phase is warehouse rebundling. Core data platforms are pulling semantics, ML, and streaming into the platform so more applications can run against one governed data layer. That shifts competition away from raw storage and toward which platform best supports operational analytics, AI workflows, and application building on the same underlying data.