Setup

Curate a bank of 100-10k (input, output) examples. Embed the inputs and store them in a vector DB. At inference: embed the query → retrieve the top-K most similar examples → include them in the prompt.
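
The setup steps can be sketched as below. This is a minimal illustration, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the `INDEX` list stands in for a vector DB, and the example bank, `retrieve`, and `build_prompt` are hypothetical names chosen for this sketch.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical example bank of (input, output) pairs.
EXAMPLES = [
    ("refund for damaged item", "category: refunds"),
    ("reset my account password", "category: account"),
    ("refund not received yet", "category: refunds"),
    ("change shipping address", "category: shipping"),
]

# Embed inputs once, up front; a real system would store these in a vector DB.
INDEX = [(embed(inp), inp, out) for inp, out in EXAMPLES]

def retrieve(query, k=2):
    """Embed the query and return the k most similar (input, output) pairs."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda row: cosine(q, row[0]), reverse=True)
    return [(inp, out) for _, inp, out in ranked[:k]]

def build_prompt(query, k=2):
    """Place the retrieved examples ahead of the query as few-shot demos."""
    shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in retrieve(query, k))
    return f"{shots}\n\nInput: {query}\nOutput:"
```

Calling `build_prompt("refund for item not received")` yields a prompt whose two demonstrations are the refund examples, followed by the query itself.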

Diversity

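Naive kNN picks the examples most similar to the query, which are often near-duplicates of each other → less diverse prompt. MMR (Maximal Marginal Relevance) balances similarity to the query against diversity among the examples already selected, which tends to generalize better.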
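
A minimal sketch of greedy MMR selection, assuming dense embedding vectors given as plain lists of floats. The function name `mmr` and the trade-off parameter `lam` are choices made for this illustration: `lam=1.0` reduces to plain similarity ranking, and lower values penalize redundancy more heavily.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, cand_vecs, k, lam=0.5):
    """Return indices of up to k candidates, chosen greedily by MMR."""
    remaining = list(range(len(cand_vecs)))
    selected = []
    while remaining and len(selected) < k:
        def score(i):
            # Relevance: similarity of the candidate to the query.
            relevance = cosine(query_vec, cand_vecs[i])
            # Redundancy: similarity to the closest already-selected example.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Candidates 0 and 1 are near-duplicates; candidate 2 is less similar but distinct.
query = [1.0, 0.0]
cands = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]
print(mmr(query, cands, k=2, lam=1.0))  # pure similarity: picks both near-duplicates
print(mmr(query, cands, k=2, lam=0.5))  # MMR: swaps the second pick for the distinct one
```

In practice the candidate pool passed to MMR is usually a larger top-N from the kNN step, so the diversity pass only reorders a short list.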

Cost

One extra embedding call + one retrieval per query. Both are small next to the LLM call itself. Retrieval typically takes < 50 ms.

When it helps most

Long-tail tasks: many niche subcategories, each with its own pattern. A static few-shot set can't cover them all. Dynamic selection retrieves the examples that match each query.