Where to put critical info
Start or end. Middle attention drops sharply. Repeat critical instructions at both ends for insurance.
Advertisement
Cost
1M input tokens ~ $3-10 depending on model. Cache aggressively. Batch similar queries.
Advertisement
Needle in haystack
Retrieve fact planted at random position. Modern models pass simple needle tests. Fail on multi-fact reasoning across long context.
When RAG still wins
Long-context tries end-to-end. RAG selective. RAG cheaper + often more accurate. Long context for exploratory, RAG for production.