MinHash for Jaccard

H(S) = min over hash functions of hash values in set S. P[H(S1) = H(S2)] = |S1 ∩ S2| / |S1 ∪ S2|.

Advertisement

SimHash for cosine

Random hyperplane. h(x) = sign(w·x). Angle between vectors ↔ P[different hash].

Advertisement

Multi-probe

Query multiple nearby buckets. Combines low false-negative rate with reasonable index size.

Complexity

Preprocess: O(N × L). Query: O(N^ρ) where ρ < 1 depending on approximation factor c.