Retrieval as compliance at AstraZeneca

Diving deeper into

AI program manager at AstraZeneca on running self-hosted ClickHouse

Interview
As a regulatory organization, all information passed to scientists, surgeons, or healthcare professionals must be vetted through compliance and meet regulatory standards.

The key point is that in pharma AI, retrieval is part of the compliance system, not just a search feature. AstraZeneca uses ClickHouse to fetch the exact approved internal and external passages that an LLM can cite back to oncologists and other healthcare professionals (HCPs), which turns speed and vector search accuracy into a regulatory requirement, not a nice-to-have. Keeping embeddings and metadata in one database also makes it easier to trace which source was used for each answer.

  • AstraZeneca is self-hosting ClickHouse because regulated workloads need tighter control over infrastructure and release timing. The team tests stable releases on a fixed cadence, keeps transactional systems separate from analytics, and uses Snowflake as the governance layer for reporting and cross-domain controls.
  • The practical value of the unified setup is that the same system stores the vector embedding, the source document metadata, and the fast analytical filters used to narrow results. That matters when an HCP query needs only oncology-relevant, policy-cleared, current material instead of a semantically similar but non-approved answer.
  • This also shows where ClickHouse is winning. It is not replacing Snowflake for enterprise governance, and it is not being judged only against dedicated vector databases. It is being picked for latency-sensitive retrieval jobs where one engine can do vector search, structured filtering, and audit-friendly grounding at production speed.
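The filter-then-rank pattern described in the bullets above can be sketched in a few lines of plain Python. This is an illustration only, not AstraZeneca's implementation and not ClickHouse code: the Passage fields, the source IDs, the sample embeddings, and the retrieve function are all hypothetical, and a real deployment would presumably express the same logic as a single SQL query inside the database. The point it shows is that compliance filters run before similarity ranking, so a closer but non-approved passage can never be returned, and every hit carries the source ID needed for an audit trail.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Passage:
    # All field names are hypothetical, chosen for illustration.
    source_id: str                 # document identifier kept for the audit trail
    text: str
    embedding: Tuple[float, ...]   # vector stored next to the metadata
    therapeutic_area: str          # e.g. "oncology"
    approved: bool                 # cleared by compliance review
    valid_until: date              # passage counts as current through this date


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat a zero vector as unrelated
    return 1.0 - dot / norm


def retrieve(query_embedding: Sequence[float], passages: List[Passage],
             area: str, today: date, k: int = 3) -> List[Passage]:
    """Apply metadata filters first, then rank the survivors by similarity."""
    eligible = [
        p for p in passages
        if p.approved and p.therapeutic_area == area and p.valid_until >= today
    ]
    ranked = sorted(
        eligible, key=lambda p: cosine_distance(query_embedding, p.embedding)
    )
    return ranked[:k]


# Made-up sample data: three of these are semantically closest to the query
# but are unapproved, off-topic, or expired.
PASSAGES = [
    Passage("doc-001", "Approved oncology dosing guidance.",
            (0.9, 0.1, 0.0), "oncology", True, date(2030, 1, 1)),
    Passage("doc-002", "Draft oncology note, not yet cleared.",
            (1.0, 0.0, 0.0), "oncology", False, date(2030, 1, 1)),
    Passage("doc-003", "Approved cardiology guidance.",
            (0.95, 0.05, 0.0), "cardiology", True, date(2030, 1, 1)),
    Passage("doc-004", "Expired oncology guidance.",
            (0.99, 0.01, 0.0), "oncology", True, date(2020, 1, 1)),
    Passage("doc-005", "Approved oncology safety summary.",
            (0.2, 0.9, 0.1), "oncology", True, date(2030, 1, 1)),
]

if __name__ == "__main__":
    hits = retrieve((1.0, 0.0, 0.0), PASSAGES, "oncology", date(2025, 1, 1))
    # The source IDs are what an answer would cite and what an audit log would store.
    print([p.source_id for p in hits])
```

In this toy data, the unapproved draft is the exact match for the query vector, yet it is excluded before ranking, which is the behavior the compliance requirement calls for.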

Going forward, regulated AI stacks are likely to split more clearly into a governance layer and a real-time retrieval layer. That favors databases like ClickHouse that can combine vector search with very fast SQL filtering, while enterprise adoption will keep depending on stronger compliance controls, predictable upgrades, and clear audit trails around every generated answer.