Chunking
Split into overlapping windows. Process each. Combine. Overlap prevents cutting mid-idea. Sentence/paragraph-aware splitting.
Advertisement
Map-reduce
Map: process each chunk. Reduce: combine intermediate outputs. Classic for summarization at scale.
Advertisement
Retrieval (RAG)
Embed corpus. Query embeds top-K chunks. Feed only relevant chunks to LLM. Standard for KB Q&A.
Long-context gotchas
Even 200K windows suffer 'lost in the middle': model attends more to start + end. Put critical info at boundaries.