Where to put critical info

Start or end. Middle attention drops sharply. Repeat critical instructions at both ends for insurance.

Advertisement

Cost

1M input tokens ~ $3-10 depending on model. Cache aggressively. Batch similar queries.

Advertisement

Needle in haystack

Retrieve fact planted at random position. Modern models pass simple needle tests. Fail on multi-fact reasoning across long context.

When RAG still wins

Long-context tries end-to-end. RAG selective. RAG cheaper + often more accurate. Long context for exploratory, RAG for production.