Model routing
Cheap classifier picks: easy → Haiku/mini. Hard → Opus/GPT-4. 80% of traffic routes cheap. Bulk savings.
Advertisement
Prompt caching
Long system prompts + few-shot: cache once, reuse. 90% cost reduction on cached prefix.
Advertisement
Batch API
Non-real-time work → batch endpoints. 50% cheaper on Anthropic/OpenAI. 24-hour SLA.
Prompt compression
LLMLingua compresses instructions/context 2-10x preserving semantics. Trade quality for cost.