Reservoir Sampling Explained
Problem Given a stream of elements too large to store in memory, pick a random element from the stream with uniform probability. Solution This is a classic problem known as Reservoir Sampling. The goal is to randomly select an element from a large stream of data with uniform probability, even when the entire stream cannot be stored in memory. Introduction to Problem Reservoir sampling, often referred to as Algorithm R as described by Jeffrey Vitter in Random Sampling with a Reservoir, is a widely used technique in data processing. It allows for the random selection of k samples from a set S containing n items, where n is very large or unknown. Each of the chosen k items forms a “reservoir,” ensuring that every item is selected with an equal probability of 1/n. ...