What to cache

Long system prompts. Few-shot examples. RAG documents reused across queries. Tool definitions.

Advertisement

Cache breakpoints

Anthropic: up to 4 cache breakpoints per prompt. Content up to each breakpoint cached separately.

Advertisement

TTL

Typically 5 minutes since last hit. Some providers offer longer TTL for higher price.

Cost math

Cache write: ~1.25x normal input cost. Cache read: ~0.1x. Break-even after 2-3 hits.