MinHash for Jaccard
H(S) = min over hash functions of hash values in set S. P[H(S1) = H(S2)] = |S1 ∩ S2| / |S1 ∪ S2|.
Advertisement
SimHash for cosine
Random hyperplane. h(x) = sign(w·x). Angle between vectors ↔ P[different hash].
Advertisement
Multi-probe
Query multiple nearby buckets. Combines low false-negative rate with reasonable index size.
Complexity
Preprocess: O(N × L). Query: O(N^ρ) where ρ < 1 depending on approximation factor c.