Algorithm

int[] reservoir = new int[k];
int i = 0;
for (int x : stream) {
  if (i < k) reservoir[i] = x;
  else {
    int j = random(0, i);  // 0 to i inclusive
    if (j < k) reservoir[j] = x;
  }
  i++;
}
Advertisement

Why uniform

Item at position i has probability k/i of being in reservoir. Inductive proof.

Advertisement

Weighted reservoir sampling

A-Res algorithm. Assign each item random^(1/weight). Keep k largest keys.

Distributed reservoir

Each worker samples its shard. Combine with weights.