What to cache
Long system prompts. Few-shot examples. RAG documents reused across queries. Tool definitions.
Advertisement
Cache breakpoints
Anthropic: up to 4 cache breakpoints per prompt. Content up to each breakpoint cached separately.
Advertisement
TTL
Typically 5 minutes since last hit. Some providers offer longer TTL for higher price.
Cost math
Cache write: ~1.25x normal input cost. Cache read: ~0.1x. Break-even after 2-3 hits.