Setup
Curate a bank of 100 to 10k (input, output) examples. Embed the inputs and store the vectors in a vector DB. At inference: embed the query → retrieve the top-K nearest examples → include them in the prompt as few-shot demonstrations.
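A minimal sketch of the setup above, kept in memory so it runs without external services. The `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, and `ExampleBank` and `build_prompt` are hypothetical names introduced here for illustration; a real system would use a proper embedding model and a vector DB in their place.

```python
import math
import zlib

DIM = 64  # toy embedding size (assumption, not a recommendation)


def embed(text):
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    # Vectors from embed() are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class ExampleBank:
    """In-memory stand-in for a vector DB holding (input, output) examples."""

    def __init__(self, examples):
        self.examples = list(examples)
        # Only the inputs are embedded; outputs ride along as payload.
        self.vectors = [embed(inp) for inp, _ in self.examples]

    def retrieve(self, query, k=3):
        q = embed(query)
        ranked = sorted(
            range(len(self.examples)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.examples[i] for i in ranked[:k]]


def build_prompt(bank, query, k=3):
    """Format the retrieved examples as few-shot demonstrations, then append the query."""
    shots = bank.retrieve(query, k)
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in shots]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


bank = ExampleBank([
    ("refund for damaged item", "category: returns"),
    ("reset my password", "category: account"),
    ("where is my package", "category: shipping"),
    ("change my email address", "category: account"),
])
prompt = build_prompt(bank, "I forgot my password", k=2)
print(prompt)
```

The brute-force sort is fine for a small bank; at the upper end of the size range, an approximate nearest-neighbor index would normally replace it.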
Diversity
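Naive kNN picks the examples most similar to the query, which are often near-duplicates of each other, so the prompt ends up less diverse. MMR (Maximal Marginal Relevance) balances similarity to the query against diversity among the selected examples, which tends to generalize better.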
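A sketch of greedy MMR selection over precomputed vectors, to make the trade-off concrete. The function name `mmr_select`, the weight `lam`, and the toy 2-D vectors are assumptions made for illustration; the scoring rule is the usual form, `lam * sim(candidate, query) - (1 - lam) * max sim(candidate, already selected)`.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(query_vec, cand_vecs, k, lam=0.5):
    """Greedily pick k candidate indices by Maximal Marginal Relevance.

    lam=1.0 reduces to plain kNN; lower values weight diversity more.
    """
    relevance = [cosine(query_vec, v) for v in cand_vecs]
    selected = []
    remaining = list(range(len(cand_vecs)))

    while remaining and len(selected) < k:
        def score(i):
            # Redundancy: similarity to the closest example already selected.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.10],   # 0: very close to the query
    [1.0, 0.12],   # 1: near-duplicate of candidate 0
    [0.7, -0.7],   # 2: less similar to the query, but a different direction
]

knn_pick = mmr_select(query, candidates, k=2, lam=1.0)
mmr_pick = mmr_select(query, candidates, k=2, lam=0.5)
print(knn_pick, mmr_pick)
```

In this toy setup, `lam=1.0` selects the two near-duplicates (indices 0 and 1), while `lam=0.5` keeps index 0 and replaces the second pick with index 2. The right value of `lam` depends on the task and is worth tuning on held-out queries.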
Cost
Each query costs one extra embedding call plus one retrieval. Both are small next to the LLM call itself; retrieval typically takes under 50 ms.
When it helps most
Long-tail tasks: many niche subcategories, each with a different pattern. A static few-shot prompt can't cover them all; dynamic selection retrieves the right examples for each query.