Cost Optimization — Cheap Models + Cached Prefixes

Model routing

Cheap classifier picks: easy → Haiku/mini. Hard → Opus/GPT-4. 80% of traffic routes cheap. Bulk savings.

Advertisement

Long system prompts + few-shot: cache once, reuse. 90% cost reduction on cached prefix.

Advertisement

Non-real-time work → batch endpoints. 50% cheaper on Anthropic/OpenAI. 24-hour SLA.

LLMLingua compresses instructions/context 2-10x preserving semantics. Trade quality for cost.