AstraZeneca moves AI retrieval to ClickHouse

Diving deeper into

AI program manager at AstraZeneca on running self-hosted ClickHouse

Interview
AstraZeneca moved agentic AI workloads from Databricks to self-hosted ClickHouse because real-time retrieval from petabytes of patient records required sub-200ms latency that Databricks couldn't deliver
Analyzed 4 sources

This migration shows that agentic AI turns the database choice into an application latency decision, not a data platform standardization decision. AstraZeneca kept Databricks for large scale ML and broader data work, but moved the retrieval loop for AI agents onto self hosted ClickHouse because an agent that has to fetch patient history, group billions of rows, and ground responses for clinicians cannot wait minutes between steps. In that setup, sub second retrieval becomes the product itself.

  • The practical split is speed layer versus governance layer. ClickHouse handles real time retrieval, vector search, logs, and user facing analytics. Snowflake handles complex transformations, regulatory reporting, and cross domain integration. That makes ClickHouse additive inside enterprises, not a full warehouse replacement.
  • The appeal is not just faster queries. AstraZeneca described simple aggregations in 30 to 40 milliseconds, complex groupings across several petabytes in under 200 milliseconds, and a drop from roughly 100 people to 25 at the program level. That is a product architecture win and an operating model win at the same time.
  • This fits ClickHouse's broader lane in the market. It wins where teams need very fast SQL on huge append heavy datasets, like observability, embedded analytics, and now retrieval for AI systems. Databricks remains far larger and broader, but its strength is the full lakehouse and AI platform, not the last mile serving path for every latency sensitive workload.

Going forward, more enterprises will run this kind of two tier architecture, with a heavyweight system for governed transformation and a separate serving engine for AI retrieval and real time analytics. That shift expands the market for ClickHouse and similar engines, especially as AI agents make every slow query visible to end users and every extra second feel like product failure.