Here’s a simple probability problem with an answer you may find surprising. (The statement of the problem and solution are simple. The proof is

Random sample overlap

submited by

Style Pass

2024-04-25 20:30:04

Here’s a simple probability problem with an answer you may find surprising. (The statement of the problem and solution are simple. The proof is not as simple.)

Suppose you draw 1,000 serial numbers at random from a set of 1,000,000. Then you make another random sample of 1,000. How likely is it that no numbers will be the same on both lists?

The probability of no overlap in the two samples is approximately 1/e. That is, in the limit as n approaches infinity, the probability converges to 1/e.

There are Binomial(n², n) ways to choose a sample of size n from a population of size n². After we’ve drawn the first sample, there are Binomial(n² – n, n) ways to draw a new sample that doesn’t overlap with the first one. The probability of such a sample is

The fractions on the last line are in decreasing order, and so their product is less than the first one raised to the nth power, and greater than the last one raised to the nth power.