Indexing pipeline

Source changes (database CDC, document uploads) → enrichment (text extraction, embedding) → indexer → search backend (Elasticsearch, OpenSearch, Vespa). The index is eventually consistent with the source: new content becomes searchable seconds to minutes after the change.
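A minimal in-memory sketch of the stages above may make the flow concrete. Everything named here (ChangeEvent, extract, embed, InMemoryBackend, run_pipeline) is hypothetical and stands in for real components; the embedding is a toy function, not a model, and the backend is a dict rather than a search engine.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    """One source change, as a CDC stream or upload handler might emit it."""
    doc_id: str
    raw: str
    op: str = "upsert"  # "upsert" or "delete"


def extract(raw):
    # Stand-in for text extraction: collapse runs of whitespace.
    return " ".join(raw.split())


def embed(text, dims=4):
    # Toy embedding (assumption): sums character codes into `dims` buckets.
    vec = [0.0] * dims
    for i, ch in enumerate(text):
        vec[i % dims] += ord(ch)
    return vec


@dataclass
class InMemoryBackend:
    """Stand-in for the search backend: a dict keyed by document id."""
    docs: dict = field(default_factory=dict)

    def apply(self, event, doc):
        if event.op == "delete":
            self.docs.pop(event.doc_id, None)
        else:
            self.docs[event.doc_id] = doc


def run_pipeline(events, backend):
    # Indexer loop: enrich each upsert, then write it to the backend.
    for ev in events:
        doc = None
        if ev.op != "delete":
            text = extract(ev.raw)
            doc = {"text": text, "vector": embed(text)}
        backend.apply(ev, doc)
    return backend
```

In a real deployment each stage would usually sit behind a queue, which is where the seconds-to-minutes delay comes from.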

Retrieval vs ranking

Retrieval: fetch a candidate set (e.g. the top 1000) from the index using cheap matching. Ranking: re-rank those candidates with richer features (user signals, ML scores). These are often separate services; ranking costs more per document and is the side that gets iterated on most.
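The two stages can be sketched as follows, assuming a toy inverted index for retrieval and per-document click counts as the "richer" ranking signal. The function names and the click-count feature are illustrative assumptions, not a specific system's API.

```python
from collections import defaultdict


def build_inverted_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query, k=1000):
    # Cheap stage: count matching query terms per document, keep the top k.
    counts = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    ranked = sorted(counts, key=lambda d: (-counts[d], d))
    return ranked[:k]


def rerank(candidates, click_counts, n=10):
    # Richer stage: order by a hypothetical user signal (click counts).
    # sorted() is stable, so retrieval order breaks ties.
    return sorted(candidates, key=lambda d: -click_counts.get(d, 0))[:n]
```

Because rerank only sees the candidates that retrieve returns, a relevant document that misses the candidate set cannot be recovered by a better ranking model.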

Freshness vs quality

Real-time indexing (every doc indexed immediately): great freshness, slower queries. Bulk indexing (e.g. daily): faster queries, stale results. Hybrid: real-time for hot content, bulk for the archive. Most production systems land on the hybrid.
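One way to picture the hybrid setup is a router that sends recent documents to a real-time tier and older ones to a batch build, with queries reading both. This is a sketch under stated assumptions: the HybridIndex class, the 7-day HOT_WINDOW cutoff, and the age-based definition of "hot" are all hypothetical choices for illustration.

```python
from datetime import timedelta

HOT_WINDOW = timedelta(days=7)  # assumed cutoff for "hot" content


class HybridIndex:
    def __init__(self):
        self.realtime = {}      # searchable as soon as a doc is ingested
        self.pending_bulk = []  # waits for the next batch build
        self.bulk = {}          # searchable only after a batch build

    def ingest(self, doc_id, text, updated_at, now):
        # Route by age: recent docs go to the real-time tier.
        if now - updated_at <= HOT_WINDOW:
            self.realtime[doc_id] = text
        else:
            self.pending_bulk.append((doc_id, text))

    def run_bulk_build(self):
        # Stand-in for the periodic (e.g. daily) bulk indexing job.
        for doc_id, text in self.pending_bulk:
            self.bulk[doc_id] = text
        self.pending_bulk.clear()

    def search(self, term):
        # Query both tiers; the real-time copy wins if a doc is in both.
        merged = {**self.bulk, **self.realtime}
        term = term.lower()
        return sorted(d for d, t in merged.items() if term in t.lower().split())
```

Archive documents stay invisible until run_bulk_build has run, which is the staleness cost the bulk tier accepts in exchange for cheaper queries.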