The Plot Thickens: How to Make Better Decisions with Your Data
The mean, the median and even a statistical test may only tell part of the story. To get the most out of your new campaigns, you should always aspire to analyze your data with more sophisticated methods. We’ll show you which ones, and why.
Most of our day consists of decision making. From the moment we wake up, to choosing our outfit of the day and our coffee at Starbucks, to the road we’ll take to work, we constantly make decisions. In our professional life, our considerations revolve around which campaigns to send, which creative is better, which offering will produce the highest ROI, and many others that impact the revenue and the success of our company.
So, what does it take to make a good decision? Is good intuition sufficient? How about past experiences? While intuition is great, and using past experience works more often than not, what we should use to make the most informed decision is our data.
We’ll use the following example throughout this blog: We’d like to increase the spending of a particular group of customers, a target group. After we’ve done some deep thinking that results in a great offering – how can we decide if this offering will increase the spending?
If you’re already an Optimove customer, you know the answer: run a test/control experiment, which is what we’ll walk through below. Now assume that the following numbers are the results of your experiment. Based on these numbers, can we say that our campaign is doing well and that we should send it again tomorrow?
At first it seems that the test group outperformed the control group in the average. Note that in this example, we pay no attention to the response proportion. The offering did cause people to spend more on average. So, would you like to keep this campaign running?
Remember Them Outliers
A savvy analyst like yourself realizes that making a decision based only on the mean is insufficient, so you decide to conduct a statistical test (e.g., a t-test). Surprisingly, the test isn’t significant: it finds no reliable difference between the groups’ means. Now we have a contradiction. On the one hand, we see an actual difference between the means; on the other hand, a statistical test indicates that the difference may simply be random.
Moreover, we know that the mean is very sensitive to extreme values. Think of the following scenario: out of the 866 users who responded in the test group, 860 users spent $10 and the other 6 spent $9830.44. The average is still the same, but we know that it was “dragged” up because we had a small group of big spenders in the test group. If we run the campaign tomorrow based on today’s results, we might get different means. The t-test is insignificant due to outliers in the sample.
Your colleague suggests that you look at the median. Do the following figures convince you to keep the campaign running for another day? We can see that the difference between the medians in the table below is smaller than the difference between the means. Which difference should we believe? What method can we use to make the right decision? What can we learn from these numbers?
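To make the scenario above concrete, here is a minimal Python sketch that rebuilds the 866 responses exactly as described (860 users at $10, 6 users at $9830.44) and compares the mean with the median. The variable names are ours, and the "drop the big spenders" check at the end is an illustrative extra, not part of the original analysis.

```python
from statistics import mean, median

# The scenario from the text: 860 responders spent $10, 6 spent $9830.44.
spend = [10.0] * 860 + [9830.44] * 6

avg = mean(spend)
mid = median(spend)

print(f"mean:   ${avg:,.2f}")
print(f"median: ${mid:,.2f}")

# Illustrative check: remove the six big spenders and recompute the mean.
typical = [x for x in spend if x < 1000]
print(f"mean without the 6 big spenders: ${mean(typical):,.2f}")
```

Six users out of 866 are enough to pull the mean far away from what a typical user spent, while the median does not move at all.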
To answer that last question, we first need to answer the following:
- What is causing the difference between the means?
- Why does the t-test come out insignificant even though we have a large distance between the means?
- Why is the difference between the medians smaller than the difference between the means?
Data, Can You Keep It Down?
One key term we need to understand in order to get the answers to our questions is variance. It measures how much the data is dispersed: roughly, the average squared distance of the data from the mean. Let’s add the variance into our table and use it to answer one of the questions (note that I’ve used the standard deviation, the square root of the variance, which is expressed in the same units as the data):
We can see that the standard deviation in the test group is higher than the standard deviation in the control group, which might indicate that some users in the test group spend extreme sums (high or low), while this happens less in the control group. Now we can answer the first question: our means are so far apart because of the high variance in the groups, primarily in the test group.
Using the same argument, we can answer the second question. The t-test came out insignificant because the high variance in the groups means that the mean we see today is not necessarily the mean we’ll see tomorrow or the day after. The mean will move up or down, so we cannot claim that the difference is real and not random. Mathematically the explanation is a bit more complex but uses the same arguments.
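As a quick illustration of the definition, here is a small Python sketch that computes the variance and the standard deviation by hand. The two lists are made-up numbers chosen so that both have a mean of $50; they are not the campaign data. We divide by n (the population variance); statistical tests usually divide by n - 1 instead, which makes little difference for large groups.

```python
from math import sqrt

def variance(values):
    """Population variance: the average squared distance from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def std_dev(values):
    """Standard deviation: the square root of the variance."""
    return sqrt(variance(values))

# Made-up spend figures (assumption), both with a mean of $50.
steady = [40, 45, 50, 55, 60]
spread = [5, 10, 50, 90, 95]

for name, group in (("steady", steady), ("spread", spread)):
    print(f"{name}: variance={variance(group):.0f}, std dev={std_dev(group):.2f}")
```

Both groups have the same mean, yet the second one is far more dispersed, which is exactly the information the mean alone hides.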
But the variance alone cannot explain the minor difference between the medians. To understand why the medians are relatively closer than the means, we must think in terms of distributions. You can think of the median as the “center” of the distribution: half of the observations are to the left and the other half are to the right. If the medians are close to one another but the means are far apart, what can we learn?
We know that the median is the center of the data, and we know that the mean is very sensitive to extreme values, so one can assume that the test group has more extreme values (both in the number of those values and in their size) than the control group. If I asked you to draw the distribution of each group, what would it look like?
This is a histogram of the data. Note that the red group is the control group and the green one is the test group. Each bar represents the number of people who spent a certain amount of money. For example, we can see that almost 75 users in the control group spent between $35 and $50, and almost 25 users in the test group spent the same sum (in the red circle).
A few things pop right up:
- Most of the data, in both groups, is centered between $25 and $100. This explains why the medians are relatively close.
- We can see (in the blue circle) that there are some extreme values in the test group. This explains the difference in the means between the groups, and in turn explains why the t-test came out insignificant.
- We can see that in the test group, many users spent more than $300. All in all, we can see that the test mean is highly affected by 15 users or so (out of 866, that’s less than 2%). We can actually see the effect that a small subgroup has on the total.
The main takeaway from these three points is that prior to looking at the plot, we had to rely on abstract numbers, such as the variance or the standard deviation, to answer our questions. After looking at the plot, not only did we get a visual explanation for our contradiction (the means are far apart but the t-test is not significant), but we also learned an easy and simple method for examining our data.
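If you want to see what a histogram does under the hood, the following Python sketch counts how many values fall into each spending range and prints a rough text version of the bars. The spend amounts and the $25 bin width are made up for illustration and are not the campaign data behind the plot above.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up spend amounts (assumption); NOT the data from the article.
spend = [12, 30, 38, 41, 44, 47, 52, 58, 63, 71, 88, 95, 310, 640]

for start, count in histogram(spend, bin_width=25).items():
    print(f"{start:>4} to {start + 25:<4} {'#' * count}")
```

Even in this tiny example, the bulk of the users sits in the low bins while a couple of isolated bars appear far to the right, which is the pattern that inflates the mean.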
The Sea of Dots
While histograms are extremely useful and very common as a visualization tool, they can sometimes be overly specific and we could get lost within tons of data. All we care to see is how much our data is scattered, if there are areas of high density, and how many extreme values we have in each group.
For that task, the recommended plot is the box plot:
The box plot maps histogram values into a simpler representation. The grey box represents 50% of the data, the left line is the 1st quartile (the 25% mark) and the right line is the 3rd quartile (75% mark). The thick line is the median and the red cross is the mean.
We can already see that in the test group the box is wider, which implies that we have higher variance in the test group. The lines stretching to the left and right of the box represent the 5th and 95th percentiles (the 5% and 95% marks). Again, we can see that the test group contains higher values than the control group (this is one of the reasons why the test group’s mean is higher).
The last thing in the plot is the dots to the right. These dots represent values above the 95th percentile: the extreme values. We can draw the same conclusions here as in the histogram. We see small clusters of big spenders in the test group, and they inflate the variance.
In conclusion: We began with a problem. The raw numbers drew a confusing picture: the means were far apart while the medians were close, a t-test came out insignificant, and we had a decision to make: send this campaign tomorrow or not. We saw that adding simple measures of spread (the variance and the standard deviation) gave us a better picture, but we still had three open questions we couldn’t answer. We then used the histogram to plot our data and it proved to be helpful. The final step was to use the box plot for visual learning, and we used it to answer all of our questions with ease. Our take-home message: numbers can only tell part of the story, we can learn a lot from plots, and we should always look at a plot of the data before deciding.
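The ingredients of the box plot described above can be computed directly. This is a minimal sketch using Python’s standard `statistics` module; the function name `box_plot_summary` and the 20 spend values are our own illustrative assumptions. Note that plotting libraries often draw the whiskers differently by default (for example at 1.5 times the box width), so the 5%/95% convention here simply follows the description in this post.

```python
from statistics import mean, median, quantiles

def box_plot_summary(values):
    """Return the numbers a 5%/95%-whisker box plot is drawn from."""
    # method="inclusive" interpolates between observed values only.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    twentieths = quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    p5, p95 = twentieths[0], twentieths[-1]
    return {
        "p5": p5, "q1": q1, "median": median(values), "q3": q3, "p95": p95,
        "mean": mean(values),
        "outliers": [v for v in values if v > p95],  # the dots on the right
    }

# Made-up spend amounts (assumption): 19 ordinary users and one big spender.
spend = list(range(10, 101, 5)) + [900]

summary = box_plot_summary(spend)
for key, value in summary.items():
    print(f"{key:>8}: {value}")
```

In this toy data the single big spender shows up as a dot beyond the right whisker and pulls the mean above the 3rd quartile, while the median stays inside the box.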
P.S. Why the t-test is not significant when the variance is high
This is a mathematical explanation of why the t-test can come out insignificant when we have outliers in the data, even if the distance between the group means is large. Roughly speaking, the t-test measures the ratio between the distance between the means and the combined variability of the groups (their variances, scaled by the group sizes). If the distance is “large” relative to that variability, we say that the distance isn’t due to randomness and the test is significant. If the distance is small relative to it, we claim the opposite.
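One common way to write this ratio is shown below. We are assuming the unequal-variance (Welch) form of the test here, since the post does not say which variant was used; the symbols are ours: the group means are written with bars, s² is the sample variance and n the group size, with subscripts T for test and C for control.

```latex
t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\dfrac{s_T^2}{n_T} + \dfrac{s_C^2}{n_C}}}
```

The numerator is the distance between the means and the denominator grows with the variances, so a few outliers that inflate either variance shrink t even when the numerator stays the same.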
When do we have a large variance? When we have outliers in the data (recall that the variance measures the distance of the observations from the mean). Here’s a short numerical example:
Assume we ran a campaign and got the following results:
t-test result: significant. The test mean is bigger than the control mean.
Now, let’s change the values of the minimum and maximum observations in the test group, without changing the mean:
Now the t-test is not significant, despite the fact that the means are the same as before.
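The same effect can be reproduced with a few lines of Python. This is a sketch under our own assumptions: the numbers are made up (they are not the figures from the example above), and we compute the unequal-variance (Welch) t statistic by hand rather than a full test with a p-value. As a rough rule of thumb, an absolute t value well above 2 tends to be significant at the 5% level for moderately sized groups; for an exact verdict you would use a statistics library, for example `scipy.stats.ttest_ind` with `equal_var=False`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic: the mean difference over its standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Made-up spend figures (assumption), not the article's data.
control = [8, 9, 10, 10, 11, 12]
test = [11, 12, 13, 13, 14, 15]

# Push the test group's minimum down and its maximum up by the same amount,
# so the mean stays put while the variance grows.
stretched = sorted(test)
stretched[0] -= 10
stretched[-1] += 10

print(f"means: test={mean(test)}, stretched={mean(stretched)}, control={mean(control)}")
print(f"t (original test vs control):  {welch_t(test, control):.2f}")
print(f"t (stretched test vs control): {welch_t(stretched, control):.2f}")
```

The difference between the means is identical in both comparisons, but the t statistic drops sharply once the two extreme values inflate the test group’s variance.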