Do not make this A/B Testing Mistake – The importance of picking the right test
A/B Testing and Its Constituent Parts
A/B testing compares two versions of a webpage or an app to determine which one performs better for a specific goal or outcome. Typically, the variants are shown at random to a subset of a specific customer segment, to study how the members of the segment interact with each variant. The performance of the variants is measured against a primary KPI. These findings are then generalized to a larger population and decisions are made. A/B tests are run until they achieve the desired confidence level.
Let’s break this down.
The first part is the goal that the A/B test needs to be optimized for. In order for an A/B test to be successful, it needs to be optimized for a single or primary KPI (key performance indicator).
The optimization goals vary from enterprise to enterprise. They are arrived at by formulating a hypothesis. The alternative hypothesis – the initial hypothesis, which predicts a relationship between the variables – is tested against the null hypothesis – which predicts no relationship between the variables.
When an experience is created, be it on a website or app, it can be created for a specific audience segment or for all traffic to a website. Understanding who the experience is for, and ensuring that a subset of the right audience is exposed to the A/B test, is the next critical part of the success of the A/B test. Remember that the findings from the A/B test are going to be generalized to a larger population. It is therefore essential to agree on the definition of the "target audience or segment".
A/B tests run for a pre-determined time period. If it is too long, you risk losing money as you delay exposing your customers to the higher-converting experience. If it is too short, the findings of the A/B test are not statistically significant and therefore cannot be generalized to a larger audience. So should all tests run at >98% confidence? A trade-off needs to be made between statistical confidence and the time it will take to achieve that confidence. For less strategic decisions, a lower confidence threshold may be acceptable. Of course, the time it takes is also influenced by how much traffic comes to your website.
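This trade-off can be estimated before the test starts. As a rough sketch (the function name and defaults are ours), the standard normal-approximation formula for comparing two proportions gives the per-variant sample size needed to detect a given lift, which, combined with your daily traffic, gives the test duration:

```python
from statistics import NormalDist

def required_sample_size(p_base, lift, alpha=0.05, power=0.80):
    """Approximate per-variant sample size for a two-proportion z-test.

    p_base : baseline conversion rate (e.g. 0.05 for 5%)
    lift   : absolute improvement to detect (e.g. 0.01 for +1 point)
    alpha  : significance level (0.05 -> 95% confidence, two-sided)
    power  : probability of detecting the lift if it is real
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    p1, p2 = p_base, p_base + lift
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2) / lift ** 2
    return int(n) + 1

# Detecting a 1-point lift over a 5% baseline at 95% confidence and 80% power
# requires roughly 8,000+ visitors per variant.
n = required_sample_size(0.05, 0.01)
```

Note how the required sample size falls sharply as the detectable lift grows: small effects at high confidence are what make tests run long.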
An obvious, yet common mistake analysts make when designing A/B tests is not limiting the number of variations. If you want to know whether a specific piece of content should appear in the first scroll or the third, create two journeys that differ in only that way. Then, any difference in performance can be attributed to just that change. When the two variants differ in multiple ways, you are back to guessing which change made the impact, or assuming both did.
The below sections explain each of these in greater detail.
Why picking the right test is important:
A common mistake during A/B tests arises from not picking the right statistical technique, leading to skewed results.
Choosing the right test for the experiment is as important as the test itself because it ensures that the right kind of data is being compared for the hypothesis.
This depends on various factors such as the distribution of the data, the data type, the number of groups, and the variance in the data.
Statistical tests are conducted to ensure that the lift between the two groups validates the hypothesis, instead of leaving the result to chance. These statistical tests have inherent assumptions which should be validated before the results are trusted.
How to choose the right Statistical test:
Before we dive in, we need to touch upon a couple of things. Quite a few factors come into play while picking a test, such as the nature of the variable. Here’s a quick dive into the terms you will come across:
- Predictor Variable: The predictor variable is the independent variable that provides information about the dependent variable with respect to a particular outcome.
- Outcome Variable: The outcome variable is the dependent variable, which is a function of the predictor variable(s).
- Quantitative Variable: Quantitative Variables, as the name suggests, are quantifiable and measurable. Example: number of units sold
- Categorical Variable: Also known as Qualitative Variables, categorical variables hold values that are usually names or labels. They are not quantifiable. Example: Color of a shirt, breed of dog, etc
The variable type plays a huge role in deciding what test to pick for accurate results.
Let’s get testing!
- Test for normality: Normality refers to the statistical distribution called Normal Distribution, also known as Gaussian Distribution. A normal distribution is symmetrical and continuous.
For continuous data, testing for normality is very important because the normality status determines both the appropriate measure of central tendency and the type of statistical test to use (parametric or non-parametric). When the data follows a normal distribution, parametric tests apply. A number of statistical tests, such as Student's t-test and one-way and two-way ANOVA, require a normally distributed sample. If the data is not normal, we can either transform it towards normality and then apply a t-test, or use a non-parametric alternative such as the Mann–Whitney U test. Non-parametric tests are distribution-free, but that generality comes at the cost of somewhat lower statistical power than their parametric counterparts.
- Normality can be tested statistically – for example, with the Shapiro–Wilk test – or graphically, with Q–Q plots and histograms.
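As a small sketch of the statistical route, SciPy implements the Shapiro–Wilk test; the two samples below are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=100, scale=15, size=500)  # e.g. order values
skewed_sample = rng.exponential(scale=30, size=500)      # e.g. session durations

# Shapiro-Wilk null hypothesis: the sample comes from a normal distribution.
# A small p-value (< 0.05) suggests the data is NOT normal.
for name, sample in (("normal", normal_sample), ("skewed", skewed_sample)):
    w_stat, p = stats.shapiro(sample)
    print(f"{name}: W={w_stat:.3f}, p={p:.4f}")
```

For the graphical counterpart, `scipy.stats.probplot` produces the points for a Q–Q plot against the normal distribution.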
- Check the type of predictor and outcome variable. These variables can either be categorical or quantitative. The statistical technique needs to be picked based on the type of predictor and outcome variables.
If the Predictor variable is Categorical:
If you know that the predictor variable is categorical, there are two ways to proceed depending on the type of the outcome variable.
If the outcome variable is categorical:
- If the outcome variable is categorical, you can proceed with the Chi-Square test.
- Chi-Square Test: The chi-square test compares the size of the discrepancies between the observed results and the results expected under the null hypothesis, given the size of the sample and the number of categories in the relationship.
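As a sketch with made-up conversion counts, SciPy's `chi2_contingency` runs this test on a contingency table of outcomes per variant:

```python
from scipy.stats import chi2_contingency

# Illustrative 2x2 table: [converted, not converted] per variant
observed = [[120, 880],   # variant A: 12.0% conversion
            [165, 835]]   # variant B: 16.5% conversion

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")
# A small p-value means the conversion rate plausibly differs between variants.
```

Note that for 2x2 tables SciPy applies Yates' continuity correction by default, which makes the test slightly more conservative.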
If the outcome variable is quantitative:
If the outcome variable is quantitative, you need to run a comparison test on the means of the groups, which further splits your method based on the number of groups and the number of outcome variables within them.
If there are two groups:
- If there are two groups, you should run a two-sample T-Test to compare the means of the two groups.
- Two-Sample T-Test: A two-sample t-test assesses whether the means of two populations are equal. You can use the test when your data values are independent, are randomly sampled from two normal populations, and the two independent groups have equal variances; when the variances are unequal, Welch's t-test is the usual alternative.
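A minimal sketch using SciPy, with illustrative order-value data for the two variants:

```python
from scipy.stats import ttest_ind

# Illustrative average order values ($) from two independent variants
variant_a = [52.1, 48.3, 55.0, 50.2, 47.8, 53.4, 49.9, 51.6, 54.2, 50.8]
variant_b = [58.4, 55.1, 60.2, 57.3, 54.9, 59.0, 56.5, 58.8, 55.7, 57.9]

# equal_var=True assumes equal variances; pass equal_var=False for Welch's t-test
t_stat, p_value = ttest_ind(variant_a, variant_b, equal_var=True)
print(f"t={t_stat:.2f}, p={p_value:.4f}")
```

A negative t-statistic here simply reflects that variant A's mean is below variant B's; the p-value tells you whether that gap is plausibly due to chance.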
If there are more than two groups:
If there are more than two groups, you must check the number of outcome variables.
- If there is only one outcome variable, proceed with the ANOVA test.
Analysis of Variance (ANOVA): ANOVA tells us whether the means of two or more groups differ, based on the F-statistic – the ratio of the variance between the groups to the variance within them.
- If there’s more than one outcome variable, proceed with the MANOVA test.
Multivariate Analysis of Variance (MANOVA): MANOVA is an extension of ANOVA that compares the groups on two or more outcome variables simultaneously, taking the covariance between those outcomes into account.
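As a sketch of the single-outcome case, SciPy's `f_oneway` runs a one-way ANOVA across any number of groups (the time-on-page figures below are made up); for the multi-outcome case, statsmodels provides a MANOVA implementation in `statsmodels.multivariate.manova`:

```python
from scipy.stats import f_oneway

# Illustrative time-on-page (seconds) for three layout variants
layout_a = [34, 36, 33, 35, 37, 34, 36]
layout_b = [41, 43, 40, 42, 44, 41, 43]
layout_c = [35, 34, 36, 35, 33, 36, 34]

f_stat, p_value = f_oneway(layout_a, layout_b, layout_c)
print(f"F={f_stat:.2f}, p={p_value:.4f}")
```

Keep in mind that a significant F-statistic only says at least one group differs; a post-hoc test such as Tukey's HSD is needed to identify which pairs differ.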
Following these methods ensures that your test results are statistically sound and correctly interpreted, which leads to constructive, data-driven decision-making.