AI Engineering
Claude Prompt Caching
Prompt caching lets you mark parts of your prompt as cacheable so Anthropic stores the processed prefix server-side between requests. Subsequent calls reuse the cached prefix instead of re-processing it — cached input tokens cost **90% less**, and time to first token (TTFT) drops significantly.
How It Works
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code reviewer. Here is the full codebase: ...",
            "cache_control": {"type": "ephemeral"}  # ← cache the prefix up to and including this block
        }
    ],
    messages=[{"role": "user", "content": "Review the auth module"}]
)

# The usage object reports cache activity for this request:
# cache_creation_input_tokens > 0 → the prefix was written to the cache
# cache_read_input_tokens > 0     → a cached prefix was reused
print(response.usage.cache_creation_input_tokens)
print(response.usage.cache_read_input_tokens)
```
  • Cache write: 25% MORE than base input price (one-time cost to store)
  • Cache read: 90% LESS than base input price (the payoff)
  • TTL: 5 minutes by default (refreshed on each cache hit)
  • Minimum cacheable prefix: 1,024 tokens (Sonnet/Opus) or 2,048 tokens (Haiku)
  • Breakeven: 2 requests with the same prefix within the TTL window — the one-time 25% write premium is recouped by the first cache hit (1.25× + 0.1× = 1.35× vs 2× base)
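The pricing arithmetic above can be sketched as a quick back-of-envelope calculation. A minimal sketch: the 1.25× write and 0.1× read multipliers come from the bullets above, while the $3-per-million-token base input price is an assumed example value, not a quoted rate.

```python
def cost_without_cache(prefix_tokens: int, n_requests: int,
                       base_price_per_mtok: float = 3.00) -> float:
    """Every request re-processes the full prefix at the base input price."""
    return n_requests * prefix_tokens / 1_000_000 * base_price_per_mtok


def cost_with_cache(prefix_tokens: int, n_requests: int,
                    base_price_per_mtok: float = 3.00) -> float:
    """First request writes the cache at 1.25x; later requests read at 0.1x."""
    prefix_cost = prefix_tokens / 1_000_000 * base_price_per_mtok
    return prefix_cost * (1.25 + 0.10 * (n_requests - 1))


# 100k-token prefix, 10 requests within the TTL window:
print(cost_without_cache(100_000, 10))  # 3.00
print(cost_with_cache(100_000, 10))     # 0.645 — roughly a 4.6x saving
```

Note that even at 2 requests, 1.25× + 0.1× = 1.35× beats 2× uncached, which is why the breakeven sits at the first cache hit.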