ClickHouse wins AI agent observability

ClickHouse has become the go-to database for AI agent observability

ClickHouse is winning AI agent observability because the workload looks less like classic app monitoring and more like a giant stream of append-only events that must be sliced quickly and cheaply. Prompts, tool calls, traces, evals, and cost records all pile up the way clickstream data does. That matches ClickHouse's core design, which is why teams are using it as the storage and query layer underneath debugging and analytics for agent systems.
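To make the "append-only events, sliced many ways" shape concrete, here is a minimal in-memory sketch in Python. It is not ClickHouse code or any vendor's schema: the `AgentEvent` fields and the `EventLog` class are hypothetical names chosen for illustration, and a real deployment would store these rows in a columnar table rather than a Python list.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical event shape; field names are illustrative assumptions.
@dataclass(frozen=True)
class AgentEvent:
    ts: float          # event time, seconds since epoch
    trace_id: str      # groups all events from one agent run
    kind: str          # e.g. "prompt", "tool_call", "eval", "cost"
    name: str          # model or tool name
    latency_ms: float
    cost_usd: float


class EventLog:
    """Append-only log: events are added, never updated in place."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def cost_by(self, key, kind=None):
        """Sum cost grouped by one field, optionally filtered by event kind."""
        totals = defaultdict(float)
        for e in self._events:
            if kind is None or e.kind == kind:
                totals[getattr(e, key)] += e.cost_usd
        return dict(totals)

    def trace(self, trace_id):
        """Return one run's events in time order, for step-by-step debugging."""
        matching = (e for e in self._events if e.trace_id == trace_id)
        return sorted(matching, key=lambda e: e.ts)
```

The two query methods mirror the two access patterns the paragraph describes: aggregate slices across all events (cost by model, by tool, by run) and point lookups that replay a single agent run in order.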

  • In practice, observability teams use ClickHouse to keep far more history for less money. One operator cut cluster CPU and memory to one third of what OpenSearch required, extended retention from 7 days to 30 days, and still paid about half as much. Another enterprise said ClickHouse was 10 to 20 times cheaper than Datadog at its scale.
  • The reason AI agents fit especially well is latency. At AstraZeneca, agentic AI queries on petabyte-scale records ran in under 200 milliseconds on ClickHouse, versus minutes on Databricks dashboards. That makes it usable for tracing what an agent did, why it failed, and how retrieval or prompting changed the outcome.
  • This does not make ClickHouse a full Datadog replacement. Datadog still bundles alerting, routing, remediation, and integrations with systems like Slack and Jira, while ClickHouse mainly provides the database, query engine, and real-time analytics layer. The wedge is that many AI teams first need cheap storage and fast debugging before they need a full operations suite.
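The OpenSearch comparison in the first bullet combines two changes, lower spend and longer retention, so a rough way to read it is as cost per retained day. The sketch below is back-of-envelope arithmetic only: it treats "about half as much" as exactly 0.5 and assumes a steady ingest rate so that days of retention stand in for data volume. The function name is hypothetical.

```python
def cost_per_retained_day_ratio(cost_ratio, old_days, new_days):
    """New cost per retained day, relative to the old system.

    cost_ratio: new total spend divided by old total spend (assumed 0.5 here).
    old_days, new_days: retention windows before and after the migration.
    """
    return cost_ratio / (new_days / old_days)


ratio = cost_per_retained_day_ratio(0.5, old_days=7, new_days=30)
print(round(ratio, 3))      # 0.117
print(round(1 / ratio, 1))  # 8.6
```

Under those assumptions, each retained day costs a little under 12 percent of what it did before, or roughly 8.6 times less.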

The next step is a stack split, where ClickHouse becomes the default event store for agent traces and evals, while workflow and alerting products build on top. As agent traffic explodes, the winning observability platforms will be the ones that can keep every interaction, query it in milliseconds, and do it cheaply enough that developers never have to sample away the evidence.