ClickHouse reduces headcount from 100 to 25
AI program manager at AstraZeneca on running self-hosted ClickHouse
The 100 to 25 comparison shows that ClickHouse is not just a cheaper database; it changes how much human process the stack demands. In this case the savings come from replacing a broader Databricks workflow, where teams manage notebooks, ML tooling, Delta Lake pipelines, and governance layers, with a narrower speed layer tuned for one job: fast retrieval and analytics on petabyte-scale clinical data.
-
The headcount gap is concrete inside the workflow. The interview breaks it down as 15 data engineers versus 3 to 4; 4 to 5 DBAs and storage engineers versus 1 to 2; and 25 to 30 ML engineers and data scientists versus 2 to 3, for the same program output.
-
That works because the workload moved from a general-purpose lakehouse to a specialized OLAP engine. AstraZeneca reports simple aggregations in 30 to 40 milliseconds and complex groupings in under 200 milliseconds on billions of rows in ClickHouse, versus minutes for the equivalent Databricks dashboards.
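To make the two latency tiers concrete, here is a minimal sketch of what those query classes typically look like in ClickHouse SQL. The `events` table, its columns, and the filter windows are illustrative assumptions, not details from the interview; only the shape of the queries (a single-key rollup versus a filtered multi-key grouping) reflects the distinction above.

```python
# Sketch of the two query classes: a "simple aggregation" (the reported
# 30-40 ms tier) and a "complex grouping" (the sub-200 ms tier).
# Table and column names are hypothetical examples.

def simple_aggregation_sql(table: str = "events") -> str:
    # One key, one aggregate: the cheapest rollup class.
    return f"SELECT toDate(ts) AS day, count() AS n FROM {table} GROUP BY day"


def complex_grouping_sql(table: str = "events") -> str:
    # Multi-key grouping with a time filter and heavier aggregates
    # (uniqExact and quantile are standard ClickHouse aggregate functions).
    return (
        f"SELECT site_id, cohort, uniqExact(patient_id) AS patients, "
        f"quantile(0.95)(latency_ms) AS p95 "
        f"FROM {table} "
        f"WHERE ts >= now() - INTERVAL 30 DAY "
        f"GROUP BY site_id, cohort "
        f"ORDER BY patients DESC"
    )


if __name__ == "__main__":
    print(simple_aggregation_sql())
    print(complex_grouping_sql())
```

Both are plain SQL strings here so the sketch stands alone; in practice they would be sent through any ClickHouse client.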
-
The trade is not no ops; it is different ops. Databricks bundles managed pieces such as Delta Lake, MLflow, notebooks, and autoscaling patterns, while ClickHouse asks for more cluster tuning, indexing, sharding, backups, and release testing, but with far less platform sprawl once the workload is well defined.
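Much of the tuning, indexing, and sharding work named above lives in the table definition rather than in a separate service. A hypothetical MergeTree DDL, held here as a string; the table name, columns, and key choices are illustrative assumptions only.

```python
# Illustrative ClickHouse DDL: in MergeTree tables, ORDER BY is the primary
# index (choose it for the hottest filters) and PARTITION BY sets the coarse
# pruning and backup granularity. All names below are hypothetical.

DDL = """
CREATE TABLE IF NOT EXISTS events
(
    ts         DateTime,
    site_id    UInt32,
    patient_id UInt64,
    latency_ms Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (site_id, ts)
"""

if __name__ == "__main__":
    print(DDL)
```

This is the kind of decision a ClickHouse team revisits per workload, where a lakehouse would instead expose managed knobs.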
The likely next step is that more enterprises split the stack this way: Snowflake or Databricks stays as the broad governance and transformation layer, while ClickHouse takes the latency-critical serving path for AI agents, search, observability, and user-facing analytics, where every second of delay and every extra engineer shows up immediately in cost and product quality.