Structure

class BloomFilter {
  BitSet bits;
  int size, k;  // k hash functions
  
  void add(String x) {
    for (int i = 0; i < k; i++)
      bits.set(hash(x, i) % size);
  }
  boolean mightContain(String x) {
    for (int i = 0; i < k; i++)
      if (!bits.get(hash(x, i) % size)) return false;
    return true;
  }
}
Advertisement

False positive rate

~(1 - e^(-kn/m))^k where n = items inserted, m = bit array size, k = hash functions. Design for target FPR.

Advertisement

Optimal k

k = (m/n) × ln(2). ~9-10 bits per item gives 1% FPR with k=7.

Applications

Bigtable (skip disk reads). Chrome (malicious URL check). Cassandra (SSTable lookup). Anywhere disk-latency matters + false positives OK.