01 March 2016

An Introduction to A/B Testing (Part 3) - Measuring results.

This is a continuation of our A/B testing series. The previous article can be found at An Introduction to A/B Testing (Part 2) - Designing an email A/B test.


Measuring results in A/B tests

The improvement (or lack thereof) observed is what matters in an A/B test.
Your goal is to be able to make statements like this: “the group that received the new version of the email performed 2-7% better than the control group.”
You should always report a range of improvement/worsening.
Here’s how to calculate open and click rates:

Open rate

$$\text{open rate} = \frac{\text{unique opens}}{\text{emails delivered}}, \qquad \text{improvement} = \text{open rate}_B - \text{open rate}_A$$

Click rate

$$\text{click rate} = \frac{\text{unique clicks}}{\text{emails delivered}}, \qquad \text{improvement} = \text{click rate}_B - \text{click rate}_A$$


Always use unique counts (unique email opens and/or clicks) and the number of successfully delivered emails.
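
To make these concrete, here is a minimal Python sketch of both calculations. The counts (412 opens out of 5000 delivered, and so on) are made-up numbers for illustration:

    def rate(unique_events, delivered):
        # Open or click rate: unique events divided by successfully delivered emails.
        return unique_events / delivered

    # Hypothetical counts for a control group (A) and a variant group (B).
    open_rate_a = rate(412, 5000)            # 0.0824, i.e. 8.24%
    open_rate_b = rate(465, 5000)            # 0.0930, i.e. 9.30%
    improvement = open_rate_b - open_rate_a  # 0.0106, i.e. 1.06 percentage points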


Margin of error (MOE)

Every computed average or rate has inherent variance, i.e. error.
Each observation in an A/B test is what is more generally known as a “Bernoulli trial” (a random experiment with two possible outcomes). The “standard error” calculation is very straightforward and uses p (the probability of an open or a click, the same number as the rate you calculated above) and N (the number of observations in the group).
The way to calculate standard error is

$$SE = \sqrt{\frac{p\,(1-p)}{N}}$$

where $p$ is the rate (or probability) at which the observed event happened and $N$ is the number of observations in the group.
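
The same formula as a short Python sketch (math is the standard library module; the rate is the hypothetical 8.24% from above):

    import math

    def standard_error(p, n):
        # Standard error of a Bernoulli rate p observed over n samples.
        return math.sqrt(p * (1 - p) / n)

    se = standard_error(0.0824, 5000)  # ~0.0039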
For a given statistical confidence level, the standard error needs to be multiplied by a z-score: the one-tail z-score at $1-\alpha$, where $\alpha$ is the significance level you choose. This gives the margin of error:

$$MOE = z_{1-\alpha} \times SE$$
For a 95% confidence interval, you report your individual open and click rates as

$$p \pm MOE$$

(95% is the recommended confidence level for most A/B tests.)
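
Putting those pieces together, here is a sketch that computes the MOE and the interval. It uses scipy.stats.norm.ppf for the z-score and follows the one-tail convention above; swap in 1 - alpha/2 if you prefer a two-tailed interval:

    import math
    from scipy.stats import norm

    def confidence_interval(p, n, alpha=0.05):
        # One-tail z-score at 1 - alpha (~1.645 for alpha = 0.05).
        z = norm.ppf(1 - alpha)
        moe = z * math.sqrt(p * (1 - p) / n)
        return p - moe, p + moe

    low, high = confidence_interval(0.0824, 5000)
    print("open rate: 8.24% (95% CI: {:.2%} to {:.2%})".format(low, high))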
When you report improvement rates (the difference of two numbers with MOEs), the equation for the standard error is
$$SE_{\text{diff}} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

where $p_1$ and $p_2$ are the rates for each group, respectively, and $n_1$ and $n_2$ are the number of samples in each group.
Thus, you report the improvement rate as
$$(p_2 - p_1) \pm z_{1-\alpha} \times SE_{\text{diff}}$$
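
And a sketch of the improvement calculation, using the two-sample standard error above (again with made-up counts):

    import math
    from scipy.stats import norm

    def improvement_interval(p1, n1, p2, n2, alpha=0.05):
        # Confidence interval for the difference p2 - p1 between two Bernoulli rates.
        se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        z = norm.ppf(1 - alpha)
        diff = p2 - p1
        return diff - z * se_diff, diff + z * se_diff

    low, high = improvement_interval(0.0824, 5000, 0.0930, 5000)
    print("improvement: {:+.2%} to {:+.2%}".format(low, high))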

As can be seen in the graph below, the standard error peaks when $p = 0.5$. A/B tests where the conversion rate isn’t on the high or low end tend to require many more samples.
In general, more data lowers the error rate.
[Figure: standard error vs. $p$ for several sample sizes, showing that more data lowers the error rate]
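
If you want to reproduce the shape of that graph numerically, a quick sketch:

    import math

    def standard_error(p, n):
        return math.sqrt(p * (1 - p) / n)

    # SE peaks at p = 0.5 for a fixed n, and shrinks as n grows.
    for n in (1000, 10000, 100000):
        row = ", ".join("p={:.1f}: {:.4f}".format(p, standard_error(p, n))
                        for p in (0.1, 0.3, 0.5))
        print("n={:>6}: {}".format(n, row))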

In the next article, we will address how long an A/B test should run.
