01 March 2016

An Introduction to A/B Testing (Part 3) - Measuring results.

This is a continuation of our A/B testing series. The previous article can be found at An Introduction to A/B Testing (Part 2) - Designing an email A/B test.


Measuring results in A/B tests

The improvement (or lack thereof) observed is what matters in an A/B test.
Your goal is to be able to make statements like this: “the group that received the new version of the email performed 2-7% better than the control group.”
You should always report a range of improvement/worsening.
Here’s how to calculate open and click rates:

Open rate

$$\text{open rate} = \frac{\text{unique opens}}{\text{emails delivered}}, \qquad \text{improvement} = \text{open rate}_B - \text{open rate}_A$$

Click rate

$$\text{click rate} = \frac{\text{unique clicks}}{\text{emails delivered}}, \qquad \text{improvement} = \text{click rate}_B - \text{click rate}_A$$


Always use unique counts (unique email opens and/or clicks) and the number of successfully delivered emails.
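
To make these concrete, here is a minimal Python sketch of both calculations. The counts (412 opens out of 5000 delivered, and so on) are made-up numbers for illustration:

    def rate(unique_events, delivered):
        # Open or click rate: unique events divided by successfully delivered emails.
        return unique_events / delivered

    # Hypothetical counts for a control group (A) and a variant group (B).
    open_rate_a = rate(412, 5000)            # 0.0824, i.e. 8.24%
    open_rate_b = rate(465, 5000)            # 0.0930, i.e. 9.30%
    improvement = open_rate_b - open_rate_a  # 0.0106, i.e. 1.06 percentage points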


Margin of error (MOE)

Every computed average or rate has inherent variance, i.e. error.
Each observation in an A/B test is what is more generally known as a “Bernoulli trial” (a random experiment with two possible outcomes). The “standard error” calculation is very straightforward and uses p (the probability of an open or a click, the same number as the rate you calculated above) and N (the number of observations in the group).
The way to calculate standard error is

$$SE = \sqrt{\frac{p\,(1-p)}{N}}$$

where $p$ is the rate (or probability) at which the observed event happened and $N$ is the number of observations in the group.
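
The same formula as a short Python sketch (math is the standard library module; the rate is the hypothetical 8.24% from above):

    import math

    def standard_error(p, n):
        # Standard error of a Bernoulli rate p observed over n samples.
        return math.sqrt(p * (1 - p) / n)

    se = standard_error(0.0824, 5000)  # ~0.0039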
For a given statistical confidence level, the standard error needs to be multiplied by a z-score: the one-tail z-score at $1-\alpha$, where $\alpha$ is the significance level you choose. This gives the margin of error:

$$MOE = z_{1-\alpha} \times SE$$
For a 95% confidence interval, you report your individual open and click rates as

$$p \pm MOE$$

(95% is the recommended confidence level for most A/B tests.)
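
Putting those pieces together, here is a sketch that computes the MOE and the interval. It uses scipy.stats.norm.ppf for the z-score and follows the one-tail convention above; swap in 1 - alpha/2 if you prefer a two-tailed interval:

    import math
    from scipy.stats import norm

    def confidence_interval(p, n, alpha=0.05):
        # One-tail z-score at 1 - alpha (~1.645 for alpha = 0.05).
        z = norm.ppf(1 - alpha)
        moe = z * math.sqrt(p * (1 - p) / n)
        return p - moe, p + moe

    low, high = confidence_interval(0.0824, 5000)
    print("open rate: 8.24% (95% CI: {:.2%} to {:.2%})".format(low, high))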
When you report improvement rates (the difference of two numbers with MOEs), the equation for the standard error is
$$SE_{\text{diff}} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

where $p_1$ and $p_2$ are the rates for each group, respectively, and $n_1$ and $n_2$ are the number of samples in each group.
Thus, you report the improvement rate as
$$(p_2 - p_1) \pm z_{1-\alpha} \times SE_{\text{diff}}$$
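
And a sketch of the improvement calculation, using the two-sample standard error above (again with made-up counts):

    import math
    from scipy.stats import norm

    def improvement_interval(p1, n1, p2, n2, alpha=0.05):
        # Confidence interval for the difference p2 - p1 between two Bernoulli rates.
        se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        z = norm.ppf(1 - alpha)
        diff = p2 - p1
        return diff - z * se_diff, diff + z * se_diff

    low, high = improvement_interval(0.0824, 5000, 0.0930, 5000)
    print("improvement: {:+.2%} to {:+.2%}".format(low, high))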

As can be seen in the graph below, the standard error peaks when $p = 0.5$. A/B tests where the conversion rate isn’t on the high or low end tend to require many more samples.
In general, more data lowers the error rate.
[Figure: standard error vs. $p$ for several sample sizes, showing that more data lowers the error rate]
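
If you want to reproduce the shape of that graph numerically, a quick sketch:

    import math

    def standard_error(p, n):
        return math.sqrt(p * (1 - p) / n)

    # SE peaks at p = 0.5 for a fixed n, and shrinks as n grows.
    for n in (1000, 10000, 100000):
        row = ", ".join("p={:.1f}: {:.4f}".format(p, standard_error(p, n))
                        for p in (0.1, 0.3, 0.5))
        print("n={:>6}: {}".format(n, row))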

In the next article, we will address how long an A/B test should run.
