Prompt Caching Cuts Agent Costs

Diving deeper into

Tasklet

Company Report
Prompt caching, a technique the team developed during the Shortwave years, is a key lever for keeping per-run costs manageable as session context grows longer.
Analyzed 5 sources

Prompt caching is what turns long running agents from a margin trap into a software business. As Tasklet sessions get longer, the expensive part is often not the latest user instruction, but the repeated context, system rules, tool definitions, and prior turns that must be sent back to the model. Caching lets Tasklet pay full price once for that stable context, then reuse it at a much lower read cost on later steps, which matters because Tasklet is built on Anthropic and sells higher intelligence tiers with larger context windows.

  • Anthropic prices cached reads at 10 percent of normal input token cost, while cache writes cost more up front. That means a long agent run gets cheaper after the first step if the same prompt frame is reused, which is exactly the pattern in multi step automation.
  • What is being cached is concrete, not abstract. Anthropic allows caching of system instructions, tool definitions, and message content. For Tasklet, that can include the agent rules, active tools, and accumulated session history that would otherwise be retransmitted on every turn.
  • The Shortwave connection matters because email is an unusually context heavy workflow. Inbox state, prior threads, user style, and task instructions all persist across actions. The same team is now applying that lesson to Tasklet, where longer lived agents create the same token repetition problem at a bigger scale.

Going forward, the winners in agent software will not just have better models, they will have better cost architecture around those models. If Tasklet keeps pushing more work into long lived, multi tool sessions, caching becomes a core product advantage because it supports richer context, steadier gross margins, and a cleaner upsell from Basic to Genius without inference costs rising one for one with session length.