Storage Consolidates Around Open Formats
Charles Chretien, co-founder of Prequel, on the modern data stack’s ROI problem
The key shift is that data warehouse competition is moving up the stack, from who stores the data to who can do the most useful work on top of shared data files. In practice, teams can keep raw data in Parquet-based table formats like Iceberg, Delta Lake, and Hudi, then point different engines at the same lake. That makes it easier to swap query engines, mix vendors, and avoid rewriting pipelines every time a new analytics tool wins budget.
-
Open table formats separate storage from compute. Iceberg adds table metadata, snapshots, schema changes, and transactions on top of files in object storage, and Snowflake can query Iceberg tables stored outside Snowflake. That means the storage layer can stay put while compute choices expand.
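A toy model of that layering, in plain Python. This is not the PyIceberg API and elides the real spec's manifest files, Avro encoding, and catalog; it only shows the core idea that the data files are immutable and "the table" is metadata pointing at a current snapshot, so a commit is an atomic metadata swap.

```python
# Illustrative sketch only: Iceberg-style metadata on top of immutable files.
metadata = {
    "schema": {"fields": [{"name": "user_id", "type": "long"}]},
    "snapshots": [],
    "current_snapshot_id": None,
}

def commit(md, snapshot_id, data_files):
    """Append a snapshot and move the current pointer.
    Swapping this pointer atomically is what gives the table
    transactional semantics without touching the data files."""
    md["snapshots"].append({"id": snapshot_id, "files": data_files})
    md["current_snapshot_id"] = snapshot_id

commit(metadata, 1, ["s3://lake/data/a.parquet"])
commit(metadata, 2, ["s3://lake/data/a.parquet", "s3://lake/data/b.parquet"])

current = next(s for s in metadata["snapshots"]
               if s["id"] == metadata["current_snapshot_id"])
print(len(current["files"]))  # 2

# Older snapshots stay addressable, which is what enables time travel.
print(metadata["snapshots"][0]["files"])  # ['s3://lake/data/a.parquet']
```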
-
Delta Lake follows the same pattern from Databricks' side. It extends Parquet with a transaction log, stores data and metadata in cloud object storage, and serves as Databricks' default table format. The common ingredient is Parquet files underneath, with open metadata layers on top.
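The transaction-log mechanism can be sketched the same way. This is a simplified stand-in, not Delta's actual protocol (real logs live in `_delta_log/` as ordered JSON commit files with periodic checkpoints): each commit records add/remove actions, and the live table is whatever survives replaying the log in order.

```python
import json

# Illustrative log entries; real Delta commits carry more fields
# (stats, partition values, schema) per action.
log = [
    json.dumps({"add": "part-000.parquet"}),
    json.dumps({"add": "part-001.parquet"}),
    json.dumps({"remove": "part-000.parquet"}),  # e.g. after compaction
    json.dumps({"add": "part-002.parquet"}),
]

def live_files(entries):
    """Replay the ordered log to compute the current set of data files."""
    files = set()
    for line in entries:
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"])
        if "remove" in action:
            files.discard(action["remove"])
    return sorted(files)

print(live_files(log))  # ['part-001.parquet', 'part-002.parquet']
```

Because the log is just files in object storage next to the Parquet data, any engine that understands the protocol can reconstruct the same table state.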
-
The strategic consequence is lower switching cost. Related research on ClickHouse and Firebolt notes that once data sits in Iceberg, the barrier to trying a different query engine drops sharply, even if some engines still deliver better performance on their own managed storage.
This points toward a market where storage standards keep hardening while differentiation shifts to speed, governance, vertical workflows, and AI features. The winners are likely to be the platforms that treat open formats as the default substrate, then layer the best developer experience and highest-value workloads on top.