Measuring results in A/B tests
The improvement (or lack thereof) observed is what matters in an A/B test.
Your goal is to be able to make statements like this: “the group that received the new version of the email performed 2-7% better than the control group.”
You should always report a range of improvement/worsening.
Here’s how to calculate open and click rates:
Click rate
$$\text{click rate} = \frac{\text{unique clicks}}{\text{emails delivered}}$$

$$\text{improvement} = \frac{\text{rate}_{\text{test}} - \text{rate}_{\text{control}}}{\text{rate}_{\text{control}}}$$

| Click rate and improvement equations |
Always use unique counts (unique email opens and/or clicks) and the number of successfully delivered emails.
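As a concrete sketch, here’s what these calculations look like in Python (the `rate` and `improvement` helpers are our own illustrative names, not part of any email platform’s API):

```python
def rate(unique_events: int, delivered: int) -> float:
    """Open or click rate: unique opens/clicks over successfully delivered emails."""
    return unique_events / delivered


def improvement(rate_control: float, rate_test: float) -> float:
    """Relative improvement of the test group over the control group."""
    return (rate_test - rate_control) / rate_control


# Example: control got 150 unique opens, test got 180, each out of 1,000 delivered.
print(rate(180, 1000))                                # 0.18
print(improvement(rate(150, 1000), rate(180, 1000)))  # ~0.20 -> 20% better
```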
Margin of error (MOE)
All computed averages/rates have inherent variance/error.
Each observation in an A/B test (did this recipient open or click: yes or no?) is what is more generally known as a “Bernoulli trial”: a random experiment with two possible outcomes. The “standard error” calculation is very straightforward and involves p (the probability of an open or a click, the same number as the rate you calculated above) and N (the number of observations in the group).
The way to calculate standard error is
$$SE = \sqrt{\frac{p(1-p)}{N}}$$

| Standard error equation |

$$p = \frac{\text{number of opens (or clicks)}}{N}$$

| p, the probability of observing an event |
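In code, assuming `p` is the rate from above and `n` is the group size, this is a one-liner:

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a Bernoulli proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


print(standard_error(0.18, 1000))  # ~0.0121
```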
For a given statistical confidence level, the standard error needs to be multiplied by a critical value z:

$$z = z_{1-\alpha/2} \approx 1.96 \text{ for } \alpha = 0.05$$

| Z score for a confidence level of 1 − α |

$$MOE = z \times SE$$

| Margin of error equation |
For a 95% confidence interval, you report your individual open and click rates as
$$p \pm MOE = p \pm 1.96 \times \sqrt{\frac{p(1-p)}{N}}$$

| 95% confidence interval equation |
(95% is the recommended confidence level for most A/B tests)
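Here’s a minimal sketch that combines these pieces into a confidence interval for a single rate (hard-coding z ≈ 1.96 for 95% rather than computing the exact quantile):

```python
import math

Z_95 = 1.96  # two-sided z score for a 95% confidence level


def confidence_interval(p: float, n: int, z: float = Z_95) -> tuple[float, float]:
    """Return the (low, high) bounds of the confidence interval for a rate."""
    moe = z * math.sqrt(p * (1 - p) / n)
    return p - moe, p + moe


low, high = confidence_interval(0.18, 1000)
print(f"open rate: 18% (95% CI: {low:.1%} - {high:.1%})")  # ~15.6% - 20.4%
```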
When you report improvement rates (the difference of two numbers with MOEs), the equation for the standard error is
$$SE_{\text{diff}} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

| Standard error equation for comparing the difference between two trials/experiments |
where $p_1$ and $p_2$ are the rates for each group, respectively, and $n_1$ and $n_2$ are the number of samples in each group.
Thus, you report the improvement rate as
$$(p_2 - p_1) \pm z \times SE_{\text{diff}}$$

| Improvement rate equation |
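And a sketch for the improvement case, computing the difference between two groups together with its margin of error (the function name is again illustrative):

```python
import math


def improvement_ci(p1: float, n1: int, p2: float, n2: int,
                   z: float = 1.96) -> tuple[float, float]:
    """95% CI for the absolute difference p2 - p1 between two groups."""
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff - z * se_diff, diff + z * se_diff


# Control: 15% open rate on 1,000 emails; test: 18% on 1,000 emails.
low, high = improvement_ci(0.15, 1000, 0.18, 1000)
print(f"difference: {low:+.1%} to {high:+.1%}")  # roughly -0.3% to +6.3%
# The interval crosses zero, so this example is inconclusive at 95% confidence.
```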
As can be seen in the graph below, the standard error peaks when p = 0.5. A/B tests where the conversion rate isn’t on the high or low end tend to require many more samples.
In general, more data lowers the error rate.
| More data lowers the error rate |
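A small sketch makes both effects visible, printing the standard error across values of p for a few sample sizes:

```python
import math


def standard_error(p: float, n: int) -> float:
    return math.sqrt(p * (1 - p) / n)


for n in (100, 1000, 10000):
    row = ", ".join(f"p={p:.1f}: {standard_error(p, n):.4f}"
                    for p in (0.1, 0.3, 0.5, 0.7, 0.9))
    print(f"N={n:>5} -> {row}")
# The error is largest at p = 0.5 and shrinks as N grows.
```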
In the next article, we will address how long an A/B test should run.








