AI Agents Creating Infinite Storage Demand

Renen Hallak, CEO of VAST Data, on AI agents creating infinite storage demand

Interview
It is three compounding exponents of data that has to be stored forever with very fast access.

This claim points to why AI storage is shifting from a back-office system to a core bottleneck in the stack. The data load grows in three ways at once: more agents exist, each agent observes more multimodal data, and each agent also creates new outputs such as code, video, checkpoints, and memory. VAST is built around that pattern, using one shared all-flash system for files, objects, database queries, and GPU pipelines so customers do not keep copying data across separate tools.
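The "three compounding exponents" can be made concrete with a toy model: total retained data is the product of agent count, input per agent, and output per agent, each growing on its own. All rates and the starting capacity below are illustrative assumptions, not figures from the interview.

```python
# Toy model of three compounding growth factors from the quote:
# agent count, observed input per agent, and generated output per agent.
# Every rate here is a hypothetical assumption for illustration.

def projected_storage(years: int,
                      base_tb: float = 100.0,
                      agent_growth: float = 2.0,    # agents double yearly (assumed)
                      input_growth: float = 1.5,    # multimodal input per agent (assumed)
                      output_growth: float = 1.5    # generated artifacts per agent (assumed)
                      ) -> float:
    """Retained data if all three factors compound and nothing is ever deleted."""
    yearly_factor = agent_growth * input_growth * output_growth  # 4.5x combined
    return base_tb * yearly_factor ** years

for y in range(4):
    print(f"year {y}: {projected_storage(y):,.1f} TB")
```

Even with modest individual rates, the multiplied factor dominates quickly: under these assumed numbers the footprint grows 4.5x per year, which is the sense in which the exponents "compound."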

  • The practical issue is not just storing more bytes. AI clusters need many GPU servers reading the same training set, embeddings, checkpoints, and model outputs at once. VAST's architecture separates compute from shared flash, so one pool can feed large parallel jobs without splitting data across many isolated arrays.
  • The "forever" part matters because agent systems create audit trails and reusable memory. If a robot, coding agent, or research agent acts on text, images, audio, or web data, operators want the inputs, outputs, and intermediate state preserved for tuning, debugging, and governance. That turns storage from an archive into active working memory.
  • This is also where VAST tries to win against Pure Storage, NetApp, and other classic storage vendors. Incumbents are adding NVIDIA-certified AI storage, but their products are still mainly storage systems. VAST is trying to bundle storage, metadata catalog, database access, and in-place processing into one purchase, which raises wallet share as AI moves from experiments to production.
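The shared-pool pattern in the first bullet can be sketched in a few lines: many compute workers read disjoint shards of one stored copy, rather than each node holding a private replica. This is a generic stand-in for GPU nodes reading disaggregated flash, using local files and threads; it does not model any actual VAST API.

```python
# Minimal sketch of disaggregated compute over one shared dataset:
# four "nodes" (threads) each read their own shard of a single stored copy.
# Paths, sizes, and worker counts are illustrative assumptions.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def make_shared_dataset(path: str, size_bytes: int = 1000) -> None:
    # One copy of the data, written once to the shared pool.
    with open(path, "wb") as f:
        f.write(b"x" * size_bytes)

def worker_read(path: str, offset: int, length: int) -> int:
    # Each worker reads its shard directly; no per-node replica is made.
    with open(path, "rb") as f:
        f.seek(offset)
        return len(f.read(length))

path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
make_shared_dataset(path)
shards = [(path, i * 250, 250) for i in range(4)]  # 4 parallel readers, 1 copy
with ThreadPoolExecutor(max_workers=4) as pool:
    total_read = sum(pool.map(lambda s: worker_read(*s), shards))
print(total_read)  # all bytes served from a single stored copy
```

The design point the bullet makes is exactly this separation: the readers scale independently of the single shared copy, whereas per-array silos would force each cluster to hold and synchronize its own duplicate.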

The implication is that storage vendors will be judged less on raw capacity and more on whether they can act as the data plane for AI factories. As agent fleets grow, the winning systems will be the ones that keep hot data cheap enough to retain, fast enough for GPUs to use immediately, and unified enough that teams stop moving data between separate storage, analytics, and inference stacks.