When performing A/B testing, we're measuring the mean of a metric (such as spend or conversion) on two distinct subsets, and then compare the means to

Five ways to reduce variance in A/B testing

submited by

Style Pass

2024-09-28 13:00:03

When performing A/B testing, we're measuring the mean of a metric (such as spend or conversion) on two distinct subsets, and then compare the means to each other. Because of the Central Limit Theorem, the measurement of the two means follows a normal distribution, as does the difference of the two (if A and B are normal, so is A - B).

Per the above, the variance of the lift (defined here as the difference) is $ s^2 = s_A^2 + s_B^2 $, where $s_A^2$ and $s_B^2$ are the variances of the A and B measurements, respectively, and $ s_i^2 = \frac{ \sigma^2 }{ N_i } $. So $ s^2 \approx \frac{\sigma^2}{N} $, where $\sigma^2$ is is the variance of the metric itself over the entire population.

The simplest way to reduce variance is to collect more samples, if possible. For every 4x increase of $N$, we get a 2x decrement of $s$. In code:

Note: in real life, collecting more samples is not always possible, since the amount of samples may be limited, or there may be a time limit by which we want to conclude the experiment. Also, in many situations, the relationship of the length of time $t$ that the experiment runs and sample size $N$ is sub-linear, because over time users tend to return. In other words, if we run an experiment for 2 weeks, and get $2N$ users, running it for another 2 weeks will probably yield less than $2N$ additional users.