⚡ AI Engineering
Claude Prompt Caching
Prompt caching lets you mark parts of your prompt as cacheable so Anthropic stores them server-side between requests. Subsequent calls reuse the cached prefix instead of re-processing it — **90% cheaper** on cached input tokens and significantly faster (lower time-to-first-token, TTFT).
1. How It Works
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code reviewer. Here is the full codebase: ...",
            "cache_control": {"type": "ephemeral"}  # ← cache this block
        }
    ],
    messages=[{"role": "user", "content": "Review the auth module"}]
)

# The usage object reports whether this call wrote or read the cache
print(response.usage.cache_creation_input_tokens)
print(response.usage.cache_read_input_tokens)
```

- Cache write: 25% MORE than base input price (one-time cost to store)
- Cache read: 90% LESS than base input price (the payoff)
- TTL: 5 minutes (refreshed on each cache hit)
- Minimum cacheable: 1,024 tokens (Sonnet/Opus) or 2,048 tokens (Haiku)
- Breakeven: the 25% write premium is recouped by the first cache hit (which saves 90%), so caching pays off from the 2nd request with the same prefix within the TTL window
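The pricing bullets above can be turned into a back-of-envelope cost model. This is an illustrative sketch, not Anthropic's billing code: prices are normalized so the base input cost of the cacheable prefix is 1.0, and the function names are made up for this example.

```python
# Normalized prices per the bullets above (base input price = 1.0)
CACHE_WRITE = 1.25  # cache write: 25% more than base
CACHE_READ = 0.10   # cache read: 90% less than base
BASE = 1.00         # uncached input price

def cost_with_cache(n_requests: int) -> float:
    """First request writes the cache; the rest hit it (all within the TTL)."""
    if n_requests <= 0:
        return 0.0
    return CACHE_WRITE + CACHE_READ * (n_requests - 1)

def cost_without_cache(n_requests: int) -> float:
    return BASE * n_requests

for n in range(1, 5):
    print(f"{n} requests: cached={cost_with_cache(n):.2f} "
          f"uncached={cost_without_cache(n):.2f}")
```

Running this shows the crossover: one request costs 1.25 cached vs 1.00 uncached, but by the second request it is 1.35 vs 2.00 — caching is already ahead, and every further hit widens the gap by 0.90.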