Statistical significance explained like I'm five
Even PhDs misunderstand statistical significance. Let me explain it in layman's terms.
Almost everyone on DTC X misinterprets statistical significance. And I get it. It’s funky, and I’ve seen plenty of PhDs misunderstand statistical significance, too.
I have a Master's degree in quantitative economics. So I figured I’d explain it to you in layman’s terms.
Sample variance
The first thing to understand is that any time you A/B test something, you do so with a sample, not the full data set.
For example, if you randomly weighed 5 men and 5 women on the street and took the average, you may end up with data that says that men and women weigh the same. But, in reality, if you weighed all men and women in the world, the average man would weigh more than the average woman.
So, in samples, you can observe patterns that wouldn't hold up if you had the whole data set. It’s called sample variance. But, as you could imagine, if you weighed more and more men and women, the variance decreases and you get closer and closer to the "truth". Sample size matters.
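Here is a minimal sketch of that idea in Python. The population numbers (men averaging 80 kg, women 70 kg, both with a 15 kg spread) are made up for illustration, and the function names are my own. It repeats the "weigh people on the street" survey many times and checks how much the observed gap jumps around for different sample sizes.

```python
import random
import statistics

def average_gap(n_per_group, rng):
    """Weigh n men and n women at random; return mean(men) - mean(women) in kg."""
    # Hypothetical populations (assumed, not real data): men ~ N(80, 15), women ~ N(70, 15).
    men = [rng.gauss(80, 15) for _ in range(n_per_group)]
    women = [rng.gauss(70, 15) for _ in range(n_per_group)]
    return sum(men) / n_per_group - sum(women) / n_per_group

def spread_of_gap(n_per_group, repeats=1000, seed=1):
    """Repeat the survey many times; return (std dev of the gap, share of misleading surveys)."""
    rng = random.Random(seed)
    gaps = [average_gap(n_per_group, rng) for _ in range(repeats)]
    # A survey is "misleading" here if women come out as heavy as men, or heavier.
    misleading = sum(g <= 0 for g in gaps) / repeats
    return statistics.stdev(gaps), misleading

for n in (5, 50, 500):
    sd, misleading = spread_of_gap(n)
    print(f"n={n:>3} per group: gap varies by about {sd:.1f} kg, "
          f"misleading in {misleading:.1%} of surveys")
```

With only 5 people per group, a noticeable share of surveys point the wrong way. With 500 per group, the observed gap stays close to the true 10 kg.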
Testing landing pages
Now, let's say that we test conversion rates of two landing pages, A and B. Let's say the CRs are 1.3% for A and 2% for B in a sample of 10,000 visitors (5,000 per variant).
To figure out whether B is significantly better than A, or whether the difference is just random sample variance like in the weights example, we use a tool called p-values. The current practice in many industries is to calculate a p-value and check whether it's below 5%. If it is, B is significantly better than A at a 5% significance level.
P-values
P-values assume that the conversion rates of A and B are IDENTICAL. That B isn’t better than A.
Then, p-values ask "how likely is it that we would observe results AT LEAST AS EXTREME as in our sample even though A and B are actually identical in performance?" That's p-values.
So, a 4% p-value means that there's a 4% probability of observing a difference AT LEAST AS EXTREME as your data in a sample of 10,000 visitors EVEN THOUGH THE LANDING PAGES ARE IDENTICAL.
And then the logic is that because we consider 4% low, we conclude that B is better than A. Otherwise, why would we observe something so extreme that it only happens 4% of the time?
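To make this concrete, here is a rough sketch of how you could calculate the p-value for the landing page example (1.3% vs 2% with 5,000 visitors each). It uses a two-proportion z-test, which is one common choice, not the only one. The function name is my own, and I'm assuming a two-sided test, meaning "at least as extreme" counts differences in either direction.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value, assuming A and B truly have the same conversion rate."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # If the pages are identical they share one rate, so pool the data to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # "At least as extreme" in either direction: both tails of the normal curve.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 1.3% of 5,000 = 65 conversions for A; 2% of 5,000 = 100 conversions for B.
p = two_proportion_p_value(65, 5000, 100, 5000)
print(f"p-value: {p:.4f}")
```

For these numbers the p-value comes out around 0.006, well below 5%. So if the pages really were identical, a gap this big would show up in well under 1% of samples.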
Type 1 errors
This also means that there’s a risk of concluding that B is better than A even though it really isn’t. That risk is called a type 1 error. When using a 5% significance level, you accept that in cases where A and B are actually identical, you'll wrongly say that B is better than A 5% of the time.
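You can see this risk by simulating A/A tests, where both "variants" are the exact same page. The sketch below assumes a true conversion rate of 2% and 2,000 visitors per variant (both numbers are arbitrary, as are the function names) and uses a two-sided z-test. It counts how often the test declares a significant difference that isn't there.

```python
import random
from math import sqrt
from statistics import NormalDist

def p_value(conv_a, conv_b, n):
    """Two-sided p-value for equal conversion rates, with n visitors per variant."""
    pooled = (conv_a + conv_b) / (2 * n)
    std_err = sqrt(pooled * (1 - pooled) * 2 / n)
    if std_err == 0:  # nobody (or everybody) converted: no evidence of a difference
        return 1.0
    z = (conv_b - conv_a) / n / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(true_rate=0.02, n=2000, tests=1000, alpha=0.05, seed=7):
    """Run many A/A tests (identical pages) and return the share called significant."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(tests):
        # Each visitor converts with the same probability on both pages.
        conv_a = sum(rng.random() < true_rate for _ in range(n))
        conv_b = sum(rng.random() < true_rate for _ in range(n))
        if p_value(conv_a, conv_b, n) < alpha:
            false_alarms += 1
    return false_alarms / tests

print(f"Share of A/A tests declared significant: {false_positive_rate():.1%}")
```

The share should land somewhere near 5%. That's the type 1 error rate you signed up for by picking a 5% significance level.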
An intuitive way to think about this is someone flipping a coin 10 times. It's entirely possible to flip heads 10 times in a row. But it's very unlikely, so if it happens, you're willing to conclude that the coin is not a normal coin. But even so, there's still a small chance that it's actually a normal coin — you just flipped 10 heads out of pure coincidence.
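The coin example is easy to calculate exactly. A small sketch (the function name is my own) that gives the chance of a fair coin landing on heads at least a given number of times:

```python
from math import comb

def prob_at_least(heads, flips=10):
    """Chance that a fair coin gives `heads` or more heads in `flips` flips."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

print(f"10 heads out of 10:         {prob_at_least(10):.4%}")
print(f"9 or more heads out of 10:  {prob_at_least(9):.4%}")
print(f"8 or more heads out of 10:  {prob_at_least(8):.4%}")
```

Ten heads in a row happens 1 time in 1,024 with a normal coin, so about 0.1%. Notice that 8 or more heads happens about 5.5% of the time, so it would just miss a 5% cutoff, while 9 or more (about 1.1%) would pass it.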
1% vs 5% significance levels
That's also why in medicine they often work with 1% significance levels when testing new medicine, treatments, etc. Because all medicine has side effects, we want to be even more sure that “B is better than A” - that the medicine works - before telling people to take something with side effects. So we’re only willing to wrongly conclude that the medicine works 1% of the time.
Sample size and asymptotic theory
It’s important to say that the standard math around calculating p-values is based on what’s called asymptotic theory. Asymptotic theory is the theory of what happens with statistical tests such as p-values when sample size increases towards infinity.
It’s a little abstract - okay, a lot - but it’s important. The important thing to understand is that the math behind p-values etc. only holds when the sample size is sufficiently large. In other words, you need a lot of data for the p-value to even make sense. If you don’t have enough data, it’s like calculating 2+2 and getting 4, but it’s really 5. You just don’t know it.
How large, you may think? That’s impossible to say. If the effect is large, e.g., if your B variant really is 3x better than A, you need a lot less data to determine that. But if it’s only slightly better, you need a lot more data to separate out the signal from the noise.
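A quick sketch of how effect size and sample size play together. The scenarios are made up, the function name is my own, and I'm again assuming a two-sided z-test on the observed conversion rates with equal traffic per variant.

```python
from math import sqrt
from statistics import NormalDist

def p_value_from_rates(rate_a, rate_b, n_per_variant):
    """Two-sided p-value for observed rates, with equal traffic per variant."""
    pooled = (rate_a + rate_b) / 2  # equal group sizes, so a plain average
    std_err = sqrt(pooled * (1 - pooled) * 2 / n_per_variant)
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical scenarios: (description, observed rate A, observed rate B, visitors per variant)
scenarios = [
    ("B is 3x better (1.3% vs 3.9%)", 0.013, 0.039, 500),
    ("B is slightly better (1.3% vs 1.4%)", 0.013, 0.014, 5_000),
    ("B is slightly better (1.3% vs 1.4%)", 0.013, 0.014, 500_000),
]
for label, a, b, n in scenarios:
    p = p_value_from_rates(a, b, n)
    print(f"{label}, {n:,} visitors per variant: p = {p:.4f}")
```

The 3x difference clears the 5% bar with only 500 visitors per variant. The small difference doesn't come close at 5,000 and only becomes significant with a sample in the hundreds of thousands. Keep in mind that at 500 visitors and a 1.3% conversion rate you only have a handful of conversions, which is exactly the kind of situation where the asymptotic math is shaky.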
I hope this helps you understand statistical significance. Please ask any questions and tell me if there's something unclear!
Mathias