Structure
class BloomFilter {
BitSet bits;
int size, k; // k hash functions
void add(String x) {
for (int i = 0; i < k; i++)
bits.set(hash(x, i) % size);
}
boolean mightContain(String x) {
for (int i = 0; i < k; i++)
if (!bits.get(hash(x, i) % size)) return false;
return true;
}
}Advertisement
False positive rate
~(1 - e^(-kn/m))^k where n = items inserted, m = bit array size, k = hash functions. Design for target FPR.
Advertisement
Optimal k
k = (m/n) × ln(2). ~9-10 bits per item gives 1% FPR with k=7.
Applications
Bigtable (skip disk reads). Chrome (malicious URL check). Cassandra (SSTable lookup). Anywhere disk-latency matters + false positives OK.