Vector databases are how RAG systems retrieve relevant chunks. The market is crowded — Pinecone, Weaviate, Qdrant, Chroma, Milvus, plus 'add-on' vectors in Postgres (pgvector) and Elasticsearch. The differences matter when you go past prototype.
Just-vectors vs full-text-augmented
Pure vector DBs (Pinecone, Chroma, Qdrant) excel at semantic search. Hybrid systems (Weaviate, Elasticsearch with kNN, OpenSearch) combine vector + BM25 full-text scoring — often more accurate for technical queries with exact-match terms. For most RAG use cases, hybrid is the right default.
Filter performance
Real queries combine vector similarity + metadata filter (e.g., 'similar docs WHERE author = X AND year > 2024'). Filter performance varies wildly. Pinecone struggles with high-cardinality filters; Qdrant and Weaviate are designed for it; pgvector inherits Postgres index quality.
Operational footprint
| DB | Self-host | Hosted | Best when |
|---|---|---|---|
| Pinecone | No | Yes | Want zero ops, low scale |
| Qdrant | Yes (Rust) | Yes | Filter-heavy queries |
| Weaviate | Yes | Yes | Hybrid search built in |
| Chroma | Yes (Python) | Beta | Prototyping, embedded |
| pgvector | Yes (extension) | RDS/Supabase | Already have Postgres |
Index choices
HNSW: O(log N) lookup, requires ~3-5× memory. IVF: cluster-based, faster build, slightly slower query. Most production deployments use HNSW for query latency. Configure ef_construction ~200, M ~16-32. Tune based on recall@10 measurements on your data.
Cost reality
Pinecone managed pod: $70/mo for ~1M vectors. Self-hosted Qdrant on a $50/mo VM: ~10M vectors. The hosted markup is 5-20× — fine for prototypes, painful at scale. Plan migration path early.