Algorithm
int[] reservoir = new int[k];
int i = 0;
for (int x : stream) {
if (i < k) reservoir[i] = x;
else {
int j = random(0, i); // 0 to i inclusive
if (j < k) reservoir[j] = x;
}
i++;
}Advertisement
Why uniform
Item at position i has probability k/i of being in reservoir. Inductive proof.
Advertisement
Weighted reservoir sampling
A-Res algorithm. Assign each item random^(1/weight). Keep k largest keys.
Distributed reservoir
Each worker samples its shard. Combine with weights.