AI Engineering
Claude Prompt Caching
Prompt caching lets you mark parts of your prompt as cacheable so Anthropic stores the processed prefix server-side between requests. Subsequent calls reuse the cached prefix instead of re-processing it — cached input tokens cost **90% less**, and time to first token (TTFT) drops significantly.
How It Works
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code reviewer. Here is the full codebase: ...",
            "cache_control": {"type": "ephemeral"}  # ← cache the prefix up to and including this block
        }
    ],
    messages=[{"role": "user", "content": "Review the auth module"}]
)

# The usage object reports cache activity for this request:
# cache_creation_input_tokens > 0 → the prefix was written to the cache
# cache_read_input_tokens > 0     → a cached prefix was reused
print(response.usage.cache_creation_input_tokens)
print(response.usage.cache_read_input_tokens)
```
  • Cache write: 25% MORE than base input price (one-time cost to store)
  • Cache read: 90% LESS than base input price (the payoff)
  • TTL: 5 minutes by default (refreshed on each cache hit)
  • Minimum cacheable prefix: 1,024 tokens (Sonnet/Opus) or 2,048 tokens (Haiku)
  • Breakeven: 2 requests with the same prefix within the TTL window — the one-time 25% write premium is recouped by the first cache hit (1.25× + 0.1× = 1.35× vs 2× base)
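The pricing arithmetic above can be sketched as a quick back-of-envelope calculation. A minimal sketch: the 1.25× write and 0.1× read multipliers come from the bullets above, while the $3-per-million-token base input price is an assumed example value, not a quoted rate.

```python
def cost_without_cache(prefix_tokens: int, n_requests: int,
                       base_price_per_mtok: float = 3.00) -> float:
    """Every request re-processes the full prefix at the base input price."""
    return n_requests * prefix_tokens / 1_000_000 * base_price_per_mtok


def cost_with_cache(prefix_tokens: int, n_requests: int,
                    base_price_per_mtok: float = 3.00) -> float:
    """First request writes the cache at 1.25x; later requests read at 0.1x."""
    prefix_cost = prefix_tokens / 1_000_000 * base_price_per_mtok
    return prefix_cost * (1.25 + 0.10 * (n_requests - 1))


# 100k-token prefix, 10 requests within the TTL window:
print(cost_without_cache(100_000, 10))  # 3.00
print(cost_with_cache(100_000, 10))     # 0.645 — roughly a 4.6x saving
```

Note that even at 2 requests, 1.25× + 0.1× = 1.35× beats 2× uncached, which is why the breakeven sits at the first cache hit.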