Pipeline
Indexing: ingest → chunk → embed → store in vector DB. Query time: embed query → similarity search → top-K chunks → prompt with retrieved context → generate.
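A minimal sketch of both halves of the pipeline, using only the standard library. The `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector DB, and all function names are hypothetical. The generate step is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Indexing side: embed every chunk and keep (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Query side: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Place the retrieved chunks ahead of the question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "Embeddings for each chunk are stored in the vector database.",
    "A cross-encoder reranks the retrieved candidates.",
    "Bananas contain potassium.",
]
index = build_index(chunks)
query = "Where are chunk embeddings stored?"
top = retrieve(index, query, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

In a real system the prompt would then be sent to the generator model, and `embed` would call an actual embedding model.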
Chunk size
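Trade-off: small chunks (150-300 tokens) are precise but lose surrounding context. Large chunks (500-1000 tokens) carry richer context but add noise. Overlapping windows help: content that straddles a chunk boundary stays intact in at least one chunk.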
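Overlapping windows can be sketched as a sliding window whose step is the chunk size minus the overlap. The function name `chunk_tokens` is hypothetical, and whitespace-split words stand in for tokens here. A real pipeline would count tokens with the embedding model's tokenizer.

```python
def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(tokens):
            break
    return chunks


tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, size=4, overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

With `size=4` and `overlap=2`, each window repeats the last two tokens of the previous one. The same function applies at realistic sizes, for example a few hundred tokens per chunk.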
Embedding models
Hosted: text-embedding-3-small (OpenAI), Voyage AI, Cohere Embed. Multilingual variants exist. Open-source: BAAI/bge.
Reranking
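Retrieve the top 50 candidates, rerank them with a cross-encoder (Cohere Rerank, BGE reranker), then feed only the top 5 into the prompt. Reranking often gives a big quality gain because the cross-encoder scores each query-document pair jointly instead of comparing precomputed embeddings.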
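The two-stage shape (cheap retrieval over everything, expensive scoring over a shortlist) can be sketched as below. Both scoring functions are toy stand-ins and all names are hypothetical: `cheap_score` plays the role of embedding similarity, and `precise_score` plays the role of a cross-encoder. A real setup would call an actual reranker model in its place.

```python
def retrieve_then_rerank(query, docs, first_stage_score, rerank_score,
                         n_candidates=50, n_final=5):
    """Shortlist with a cheap scorer, then reorder the shortlist with a costly one."""
    candidates = sorted(
        docs, key=lambda d: first_stage_score(query, d), reverse=True
    )[:n_candidates]
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query, d), reverse=True
    )
    return reranked[:n_final]


def cheap_score(query, doc):
    """Toy first stage: number of distinct query words found in the doc."""
    return len(set(query.split()) & set(doc.split()))


def precise_score(query, doc):
    """Toy reranker: word overlap normalized by doc length, penalizing noisy docs."""
    words = doc.split()
    return cheap_score(query, doc) / len(words) if words else 0.0


docs = [
    "a long page that mentions reset and password among many other unrelated topics",
    "password policy for new hires and contractors in the company",
    "reset the router",
    "lunch menu",
    "reset password steps",
]
result = retrieve_then_rerank(
    "reset password", docs, cheap_score, precise_score,
    n_candidates=3, n_final=2,
)
print(result)
```

Here the first stage ties the long, noisy page with the short, focused one, and the second stage moves the focused document to the top. The defaults of 50 and 5 mirror the numbers above.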