How to Treat Every Customer Campaign as a Marketing Experiment
As a marketer or retention expert, you are spending plenty of time and money on your customer retention campaigns. This article will discuss how to measure the effectiveness of your campaigns – in monetary terms – in order to optimize future campaigns and maximize the revenues they generate. In other words, we’re going to explore how to treat every campaign as a “marketing experiment.”
Many marketers use email open rates and click rates as their primary means of measuring campaign effectiveness. These metrics may provide important insight into brand strength and customer engagement, as well as aspects of the actual email campaign (e.g., the offer, the subject line and the template visuals). However, these metrics provide no indication of what customers did on your site after they clicked on the campaign. Even more importantly, the standard response metrics tell you nothing about the monetary uplift generated by the campaign – and this should be the most important metric you’re looking at.
Control Groups for a Marketing Experiment
The key to determining the effectiveness of any customer marketing campaign is the proper use of a control group. A control group is a subset of the customers you’re targeting with a particular campaign who you decide will not receive the campaign.
The members of the control group are randomly selected to represent the entire target group of customers. In other words, they should be similar to the members of the entire group and thus be exposed to the same set of conditions, except for the particular marketing campaign being tested.
For example, let’s say that an online retail marketer has performed some customer segmentation to select 10,000 customers to receive a particular offer. He will send this campaign to only 9,500 of them (the “test group”), setting aside a randomly selected subgroup of 500 customers (the “control group”) who will not receive it. Once the campaign is over, the marketer will determine the effectiveness of the campaign by comparing the additional revenues generated by the test group with those generated by the control group.
Determining the Ideal Size of your Control Groups
It is very important that the control group be a representative sample of the overall campaign population. When the control group is large enough, random selection will reliably produce a control group that effectively represents the entire group.
The sample size you need depends on the size of the overall campaign population. For 10,000 customers, as in the example above, 5% is sufficient. As a rule of thumb, smaller campaigns require a larger percentage to generate a valid control group. So, for campaigns targeting fewer than a couple of thousand customers, it’s a good idea to use 10%-20% instead.
There is one additional factor to consider when deciding upon the size of your control group: your expected response rate. When you expect a particularly low response rate for a particular campaign (for example, when sending an offer to long-dormant “churn” customers), you will need a larger control group in order to obtain statistically significant results. On the other hand, if you expect a particularly high response rate (for example, sending a bonus to your best customers), a smaller control group will be sufficient.
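To make the size-versus-response-rate point concrete, here is a rough, hypothetical sketch using the standard normal approximation for a proportion (generic statistics, not a formula prescribed by this post):

```python
from math import ceil

def min_control_size(expected_response_rate, margin_of_error, z=1.96):
    """Minimum control-group size so the observed response rate lands
    within +/- margin_of_error of the true rate at ~95% confidence
    (standard normal approximation for a proportion)."""
    p = expected_response_rate
    return ceil(z**2 * p * (1 - p) / margin_of_error**2)

# A low-response churn campaign needs a tight absolute margin (and thus
# a larger control group) than a high-response VIP campaign:
print(min_control_size(0.02, 0.01))  # ~2% responders, +/- 1 point -> 753
print(min_control_size(0.30, 0.05))  # ~30% responders, +/- 5 points -> 323
```

The exact thresholds are judgment calls; the point is only that detecting an effect on a rare behavior requires proportionally more control customers.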
Selecting a Control Group using Excel
You can use Excel to easily extract a control group from any list of customers. See the Appendix to this post below for detailed instructions.
Analyzing the Results of your Marketing Experiment
Let’s imagine that, before starting to use control groups, you sent a marketing campaign to all 1,000 of your best customers. You offered them a 10% discount on every product in your store for one full week.
The campaign’s metrics might have looked like this:

Target group: 1,000 customers
Responders: 200 (a 20% response rate)
Average spend per responder: $200
Total revenue: $40,000
At first glance, the campaign results look terrific! 20% is a high response rate, and 200 customers spending an average of $200 apparently generated an additional $40,000 of revenue in one week!
But wait: since you didn’t run this campaign as a marketing experiment, you really have no way of knowing how many of these 1,000 customers would have made a purchase this week anyway, or how much of this $40,000 would have been spent by these customers even without the campaign.
One might think that simply comparing the purchase rates and spend amounts of a similar group of customers from a previous time period (when no campaign was run) to the current period (in which the campaign ran) can reveal how much additional revenue the campaign generated. This comparison is not valid because numerous other factors always affect customer behavior from one period to the next. It is crucial that the comparison between the test and control groups be made during the same time period.
Running the same Campaign as a Marketing Experiment
If you had run this campaign as a marketing experiment, using 10% of the target customers as a control group (who did not receive the campaign), the campaign’s results might have looked like this:

Test group: 900 customers, 180 responders (20%), average spend $200, total revenue $36,000
Control group: 100 customers, 14 responders (14%), average spend $150, total revenue $2,100
Let’s take a look at the control group. Even without receiving any offer, 14% of your best customers made a purchase from your store during the week in question anyway! They spent an average of $150 each. Since the control group represents the entire target group, we can extend the control group’s buying behavior to represent the scenario that no campaign had been run at all. Thus, we could have expected that, absent any campaign at all, the entire target group would have made purchases totaling $21,000 (1,000 customers x 14% x $150).
So the question we need to answer is: how much additional revenues resulted from the marketing campaign? (For the sake of simplicity, we are not considering any costs to the company of offering the 10% discount, although including this in the calculations would be straightforward.)
In actuality, the full set of 1,000 best customers spent $38,100 this week: the 180 responders in the test group spent a total of $36,000, and the 14 responders in the control group spent an additional $2,100.
Thus, the actual gain generated by this marketing campaign is $17,100 ($38,100–$21,000), a far cry from the $40,000 apparent gain we concluded before using a control group.
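The whole calculation above can be reproduced in a few lines of Python (all figures are the hypothetical ones from the worked example):

```python
# All figures are the hypothetical ones from the worked example above.
n_total = 1_000                     # best customers targeted
n_test, n_control = 900, 100        # 90% test, 10% control

test_buyers, test_avg_spend = 180, 200.0        # 20% response rate
control_buyers, control_avg_spend = 14, 150.0   # 14% response rate

# Baseline: what the whole group would have spent with no campaign at
# all, extrapolated from the control group's behavior.
baseline = n_total * control_buyers / n_control * control_avg_spend  # 21,000

# What all 1,000 customers actually spent (test plus control responders)
actual = test_buyers * test_avg_spend + control_buyers * control_avg_spend  # 38,100

uplift = actual - baseline  # 17,100
print(f"baseline ${baseline:,.0f}, actual ${actual:,.0f}, uplift ${uplift:,.0f}")
```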
Recap
In conclusion, treating this campaign as a marketing experiment allowed us to obtain an accurate understanding of the campaign’s actual success.
Using control groups, you can determine the actual monetary uplift of every marketing campaign. By testing many campaigns, and keeping close tabs on the true effectiveness of each, you will be able to gradually optimize all your marketing campaigns for maximum financial results (as well as improved customer retention and customer engagement).
Running a Clean Marketing Experiment
To retain the integrity of the results, it is important to ensure that no additional factors under your control are influencing customer behavior. In other words, during the measurement period of a particular campaign (usually a number of days), the test and control groups should not be exposed to any other targeted offers or incentives. If customers are receiving multiple simultaneous campaigns, it becomes impossible to measure the effectiveness of any one campaign. In our experience, many marketers fail to realize the importance of isolating their marketing experiments.
Thus, keeping track of which customers are within the duration period of other campaigns and excluding them from any additional concurrent campaigns is a crucial aspect of selecting target customer groups for any marketing campaign.
Automating the Process
Selecting target groups for campaigns, extracting valid control groups and analyzing all of the results is not an easy process to perform manually (e.g., using Excel). This is especially true for companies running dozens of marketing campaigns every month, including regularly recurring ones for particular target groups (e.g., new customers, VIPs, customers at risk of churn).
It is far more advisable to use software that can automate this process. Our Optimove customer modeling and campaign management system can do this for you. Optimove automatically selects lists of customer IDs based on defined targeting criteria and then can automatically select a valid random control group of the ideal size. After the campaign measurement period concludes, the software will calculate the financial uplift generated by each campaign (as well as other key metrics). Ultimately, Optimove makes every campaign into a measurable marketing experiment which feeds the software’s self-learning recommendation engine.
More Ways to Analyze Campaign Performance
The analysis methodology discussed in this post is the basis of all good campaign analysis. In my next post, I will delve deeper into additional approaches of analyzing and optimizing campaign performance. Stay tuned…
Appendix: Selecting a Control Group using Excel
You can use Excel to easily extract a control group as follows (click here to open an Excel file demonstrating this approach):
- Copy the entire list of customer IDs you’ve selected for a campaign into column A of an Excel spreadsheet.
- Enter the =RAND() function into column B next to each customer ID. This assigns a random number between 0 and 1 to each customer ID. (A fast way to do this is to type the function into the first cell and then double-click the fill handle, the small “+” at the bottom-right corner of the cell.)
- Select and copy all the numbers in column B. Then, use Paste Values (in Paste Special) to paste the actual random numbers into column C.
- Sort the table by column C. This will randomly scramble the list of customer IDs.
- Remove the top 5%-20% of the customer IDs in the list to be your control group – these randomly selected customers will not receive the campaign.
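For readers who prefer scripting to spreadsheets, the same randomize-sort-split procedure can be sketched in Python (a hypothetical helper, not part of the post’s Excel file):

```python
import random

def extract_control_group(customer_ids, control_fraction=0.10, seed=None):
    """Mirror the Excel steps: tag each ID with a random number, sort by
    it, and peel off the top slice as the control group."""
    rng = random.Random(seed)
    tagged = [(rng.random(), cid) for cid in customer_ids]  # steps 2-3
    tagged.sort()                                           # step 4
    n_control = round(len(tagged) * control_fraction)       # step 5
    control = [cid for _, cid in tagged[:n_control]]
    test = [cid for _, cid in tagged[n_control:]]
    return test, control

# 1,000 hypothetical customer IDs with a 10% control group
ids = list(range(1, 1_001))
test, control = extract_control_group(ids, control_fraction=0.10)
print(len(test), len(control))  # 900 100
```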
Great refresher on this process Pini. I’m very interested in measuring how social media campaigns can affect loyalty, but I’m not sure how you can apply control groups in this channel. Do you have any experience measuring social media’s impact on customer retention?
Hi Dave,
Thanks for your comment.
You can use control groups only when you can control who sees the message and who doesn’t. So, if you post something on your Facebook fan page, you are not in control, and potentially any fan might get the message. There is no way to use control groups in this type of activity. However, on Facebook, you can use their custom audience tool and set up a PPC campaign for specific people (based on their email addresses). Then, you can isolate some of them and make them the control group by not including them in the PPC campaign. I hope this helps!
Cheers,
Pini
Hi Pini,
Quite an insightful article. Is there any way we can apply this kind of marketing experiment to a new-to-company prospect base? For example, if we had an email campaign over a month with a series of emails to around 4k recipients for a free trial, followed by 2-3 conversion emails during the course of the trial, how would you apply this methodology?
You cannot apply this methodology to prospecting campaigns. Control groups are used to measure incremental improvements in customer retention as a result of a campaign. In prospecting campaigns, all conversions are incremental, since the control performance is effectively zero.
Don’t hold back a control sample on prospects… you’ll only be missing out on potential sales!
You could do a creative test… try sending a different creative to a sample of prospects. Then measure which one works better to move to the new one if it outperforms the old.
Can you elaborate on your statement – don’t hold back a control for prospecting campaigns? I’ve used this practice in the past and need additional insight to support not doing it. Is there something I can refer to that supports your statement?
Hi Alok,
I’m glad that you found the post insightful.
As to your question, the campaign you would like to test should be separated into two semi-campaigns: (1) the initial email sent to your prospect base – introducing your service for the first time, and (2) the follow-up emails to those who registered for the free trial.
As Josh explained in his response to you, there is no point in setting up a control group when sending out the initial intro email: your prospects do not know of your service in the first place, and would have had no opportunity to register had the email not been sent.
As a side note, in order to examine effectiveness of such campaigns, you can use split testing (aka A/B testing), in which you target a specific group through a given channel with two (or more) different marketing messages simultaneously, and see which works better on the group.
When executing the second semi-campaign (following up on the prospects who registered for the free trial), you should definitely set aside a control group, as you expect some of your prospects to convert even without receiving the follow-up email; the control group in such a campaign represents what would have happened if you hadn’t executed the campaign in the first place. By comparing the behavior of the test group (who received the campaign) to the behavior of the control group (who did not receive the campaign), you’ll be able to calculate the actual uplift generated by the campaign.
Cheers,
Pini
Hi Pini,
That was really helpful. I’m interested in knowing whether there could be cases of the control group outperforming the test group?
How should such situations be handled? Do we need to address the control group with offers made to test group? Will it help?
Thanks and regards,
Amod
Hi Amod,
Sorry for the delay in my response, I’ve been traveling for the past couple of weeks.
Yes, it does sometimes happen that the control group’s desirable behavior (deposits, wagers, purchases, etc.) exceeds that of the test group. In our experience, this most often happens when the campaign was not set up sufficiently well for the test. Examples of poor setup include targeting too few customers in the campaign, a control group that is not a good representation of the test group, too short a measurement period or overlapping campaign activity (i.e., the control group customers received a different campaign during some or all of the measurement period of the campaign we’re testing).

When the campaign setup is valid, it can still happen that the control group does better than the test group. This type of result indicates that the campaign was a failure: not only did it fail to encourage the recipients of the offer/incentive to spend more than the control group (who spent what they did without receiving any incentive), but it may even have “turned off” some customers, i.e., prevented some of them from making purchases that they might have made otherwise.

Sometimes such a result is very valuable, because it gives the marketer a strong indication of what not to do! Of course, before reaching such a definite conclusion, it’s always a good idea to run a test like this again to eliminate the possibility that some kind of one-time anomaly affected the results.
Cheers,
Pini
Hi Pini,
Many thanks for your insightful article.
Could we use the same approach to select a representative control group for mobile phone services? Please take into consideration that we already have a clear sense of customer usage levels, lifecycle, etc. Could we use this kind of historical data to choose the representative control group?
Could you also advise on possible revenue uplift calculation scenarios for a telecom company?
Thank You,
Elchin
Hi Elchin,
I’m glad that you found this post insightful and are looking into implementing this methodology for your business.
Assuming your customer base is fairly large, as your business is mobile phone services, once you isolate a control group using pure randomization (the statistical term is a randomized experiment), this group will serve as a representative sample. To get the best results, you might want to run a few iterations, each time comparing the sample’s statistics to those of the population. Once the results are similar, you know your sample represents the population (a valid representative sample).
Using this methodology, you cannot draw conclusions about historical events. Rather, this method helps you draw conclusions about an action you take on your population after isolating a control group. The control group, which did not receive the action, represents what would have happened to the entire population had you not taken the action in the first place. By comparing the test and control groups, you’ll be able to accurately calculate the impact of the action you took.
If you’d like further information, please contact us using the Contact page on this site.
Cheers,
Pini
Regarding isolating the experiment, you say “during the measurement period of a particular campaign (usually a number days), the test and control groups should not be exposed to any other targeted offers or incentives”.
If a marketing email is sent during the measurement period of another email, and the mailed and control groups selected for the first email should in theory receive the second communication in equal proportions, would this interfere with the uplift estimate of the first email? I’ve had differences of opinion on this question. I suspect that it could interfere with the uplift calculation on the first email; e.g., customers who are in the control group for the first email might be more likely to be incremental responders to the second email because they haven’t just received and responded to an email offer, which would boost the control on the first email and underestimate revenue. Any thoughts?
Hi A, thanks for your question!
This is only one of many reasons why customers receiving a marketing campaign (an “experiment”) should not be exposed to any other targeted offers or incentives. Even if equal proportions are used for the second email, there might be a correlation between the effects of the different offers which will affect the campaign results observed. Furthermore, the theoretical assumption that customers are more responsive to the first offer they received (more so than the second one) is questionable. It depends on numerous factors, including at what lifecycle stage the customer is and the type of offer received. In any case, to design a valid experiment which isolates the impact of a single campaign (as much as possible), we must design measurements to rely on facts rather than on assumptions which may or may not be correct.
Also, if you send more than one email, even if it’s distributed evenly between test and control groups, you won’t be able to determine the exact effect of the campaign you are measuring, or which of the campaigns drove the customers to action.
Regards,
Pini
Pini,
Having a randomized control group for each email allows you to determine the incremental impact of each email, even if they are sent around the same time to roughly the same audience. There is no need to just trust my opinion – you can run the numbers yourself. The whole point of a control group is to control for all outside factors, including other communications the groups are exposed to.
I also believe it is dangerous advice to hold back all other marketing while measuring against a control. In real life, all kinds of marketing happen, and your “clean” test results are likely not to be applicable. If you will be using your marketing in business-as-usual (BAU) situations, you should be measuring in BAU situations.
Dear Tanya,
You’re right that the marketer cannot possibly control every outside factor that might influence the test and control groups differently. The main idea behind using test and control groups in marketing is to create a state in which both groups are in environments as identical as the marketer can possibly control, with the one exception of the campaign itself: the test group receives the campaign, and the control group does not.
Cheers,
Pini
More interesting is to measure whether the indicators really are statistically different. Therefore, I suggest formulating hypotheses and then, when an increase in the variables is observed, running a proper experimental design.
-Misael
Great point Misael! For the sake of simplicity in this blog article, we didn’t mention that. However, that is exactly the approach we use in Optimove to calculate a campaign’s uplift and report statistically significant results.
Cheers,
Pini
Hi Pini,
How could a campaign’s effectiveness be determined without using the Test Vs. Control methodology? I’ve been using Test Vs. Control methodology for a while (90% Test and 10% Control to date) and now we don’t want to lose out on communicating to any customers. How can I connect with all the customers and still be able to measure the effectiveness of the campaign?
Thanks in advance,
Skanda
Hi Skanda,
Unfortunately, there’s no way to estimate the campaign’s actual effectiveness without isolating a control group that is representative of the test group. However, if you must communicate with all the customers all the time and there’s no way whatsoever to keep some out of the messaging loop, you could use A/B testing to try to find the best match between the campaign and the customer persona. But this approach doesn’t allow you to evaluate the financial outcome of your campaigns.
Moreover, if you use a correct randomization mechanism, you reduce the chance that a certain customer would be included in the control group too many times (using a 10% control means that correct randomization will result in a 10% chance that a customer is included in the control group the first time, and a 1% chance that he will be included in the control twice in a row). Customers change — it is very important to validate your marketing plan all the time, and make sure it always remains relevant and engaging.
Thanks for your question,
Pini
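The repeated-inclusion arithmetic above generalizes: with a fixed control fraction and independent random draws, the number of times a given customer lands in the control group follows a binomial distribution. A quick sketch:

```python
from math import comb

def prob_in_control_k_times(n_campaigns, k, p=0.10):
    """Probability that one customer lands in the control group exactly
    k times across n independent campaigns (Binomial(n, p))."""
    return comb(n_campaigns, k) * p**k * (1 - p)**(n_campaigns - k)

print(prob_in_control_k_times(1, 1))  # in the control the first time: 10%
print(prob_in_control_k_times(2, 2))  # in the control twice in a row: ~1%
print(prob_in_control_k_times(5, 0))  # never in the control across 5 campaigns
```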
What happens in the case when you want to measure the control group using 2 scenarios for a single promotion? Let’s say you have a campaign with 2,000 homogeneous customers, where in one scenario you offer a 10% discount (like your example) and in another scenario you offer a cash-back rebate.
How would you conduct the test? Would you have one group receive the discount, another group receive the rebate and compare both against one control group? Would you have 2 control groups one for the customers receiving the discount the other for the customers receiving the rebate?
Hi Gabe,
The most accurate way to measure the uplift of each offer (action) in the scenario you describe is to create three different groups, such as: 45% offer A, 45% offer B and 10% control, and make sure each group is a representative sample of the target group population (in your example, the group of 2,000 homogeneous customers). Afterwards, the results should be compared using ANOVA and post hoc (e.g. Tukey’s) tests. Another, less recommended, option is to apply A/B testing (50% offer A and 50% offer B) to compare the results of these two offers. This will indicate which offer is more successful, but will not allow estimating the actual financial uplift of either offer.
Cheers,
Pini
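As a rough illustration of the three-group comparison described above (synthetic data only, not real campaign results; assumes SciPy 1.8 or newer for `tukey_hsd`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic spend-per-responder figures for a hypothetical three-way test:
# 45% offer A, 45% offer B, 10% control, out of 2,000 customers.
offer_a = rng.normal(200, 40, size=180)  # responders to the 10% discount
offer_b = rng.normal(210, 40, size=190)  # responders to the cash-back rebate
control = rng.normal(150, 40, size=30)   # responders with no offer

# One-way ANOVA: do the three group means differ at all?
f_stat, p_value = stats.f_oneway(offer_a, offer_b, control)
print(f"ANOVA p-value: {p_value:.4g}")

if p_value < 0.05:
    # Post hoc: which specific pairs differ? (Tukey's HSD, SciPy >= 1.8)
    print(stats.tukey_hsd(offer_a, offer_b, control))
```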
Can we build two separate models for test and control group to predict loyalty and perform sensitivity analysis using the new scores?
If I understand correctly, you are asking if one can estimate the impact of a given campaign on the future value (FV) of the customers in a campaign (and not just the immediate uplift) by analyzing the differences between the behavior of the test and control groups.
The answer is absolutely! You will need to calculate the expected future value of each individual customer in the data warehouse every day (like the Optimove software does). Armed with this data, one can analyze the before-and-after FV differences among the test and control groups (i.e., perform a sensitivity analysis) in order to determine if a particular campaign made a statistically significant impact on the FV of the test group versus the control group.
Cheers,
Pini
Great article!
We send out multiple campaigns every quarter with assigned test and control groups for each campaign. We want to know the net incremental revenue impact of all the campaigns as well as the confidence interval of the uplift. We can add all the net revenue uplift but not sure how to find the confidence interval of combined uplift of all campaigns. any suggestions/ thoughts?
Thanks
Hi, thanks for your question.
When discussing uplift calculations, we don’t usually talk about confidence intervals.
If you have a number of identical campaigns (meaning, the same offer with the same measurement period duration, with the same split ratio of test/control recipients, etc.) run at different times, then you can combine the parameters of all these campaigns (even if some of the individual results were not statistically significant) and analyze the uplift as if they were all a single campaign. This, by the way, is exactly what Optimove’s “recurring campaign analysis” feature does.
However, it would not be valid to do this for a number of different campaigns; there is no statistically valid way to analyze different campaigns in the aggregate. That said, if all of the individual results of a set of different campaigns were statistically significant, then it is valid to sum the uplift figures, and the total will also be statistically significant.
Cheers,
Pini
Great read, I have two questions:
1. I know you said 10% as the control size, but do you have a control-size calculation to ensure enough control is in place for valid results, but not too much (lost revenue)? I.e., if the target group is 200,000, then 10% of that is 20,000, whereas most statisticians would suggest around 500 as a control?
2. How do you check if the uplift between control and treatment is significant? If response rate is low, say 1 out of 500 customers responded, would this still be a valid campaign? or should I disregard the campaign and increase sample size for the next one?
Thanks,
Mark
Hi Mark,
Good questions!
To your first question, the sample size you need certainly depends on the size of the overall campaign population. If you have a target group of 1M customers, even 1% would be sufficient for a control group. There are two factors to consider when isolating a control group: (1) The customers in the control group must be a good representation of the overall group. For instance, both groups should contain the same percentage of VIPs, mid-tier and low-tier customers. (2) The absolute number of customers in the control group should be at least 30, as a rule of thumb. If that’s not possible, I recommend repeating the campaign/experiment multiple times in order to aggregate more observations and analyze them together.
To your second question, we examine the uplift of a campaign via two parameters: (1) The difference in the “response” rates. In other words, what is the ratio of customers between the test and control groups who made a purchase, placed a deposit, played the game, etc. during the measurement period? (2) The difference in the activity amount (e.g., purchase amount, deposit amount). In other words, how big is the difference between the average purchase amount of responding customers in the control group and responding customers in the test group during the measurement period? In the situation you mentioned, we would first check for a statistically significant uplift in terms of response rate, so your campaign might certainly produce a valid uplift result. However, with only one respondent in the control group, we cannot perform the statistical test on the purchase amounts.
Cheers,
Pini
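For the response-rate comparison in point (1) above, a standard two-proportion z-test is one common choice (generic statistics; the post does not specify which test Optimove actually uses). A sketch using only the Python standard library, with the figures from the worked example earlier in the post:

```python
from math import sqrt
from statistics import NormalDist

def response_rate_uplift_z(test_buyers, n_test, control_buyers, n_control):
    """Two-proportion z-test for the difference in response rates.

    Returns (rate difference, z statistic, two-sided p-value)."""
    p1 = test_buyers / n_test
    p2 = control_buyers / n_control
    p_pool = (test_buyers + control_buyers) / (n_test + n_control)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_control))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value

# Figures from the worked example earlier in the post:
# 180/900 test responders vs. 14/100 control responders
diff, z, p = response_rate_uplift_z(180, 900, 14, 100)
print(f"uplift in response rate: {diff:.2%}, z = {z:.2f}, p = {p:.3f}")
```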
In the example you mentioned that you would test for statistical significance of the uplift for the response rates. Would you also have to test for significant difference in the average $ amounts? My concern is that if there is large variance in the $ figures the results could be misleading.
Thanks for your comment, Peter. Average revenue amounts should definitely be tested for statistical significance. We recommend performing a statistical test which takes into account the variance of the values. Our software’s built-in mechanism uses the t-test to determine the significance of the uplift for each campaign.
Cheers,
Pini
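A minimal sketch of such a t-test on purchase amounts, using Welch’s variant (`equal_var=False`, which does not assume equal variances) and synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic purchase amounts for the responders in each group
test_spend = rng.normal(200, 60, size=180)     # test-group responders
control_spend = rng.normal(150, 60, size=14)   # control-group responders

# Welch's t-test (equal_var=False) does not assume the two groups have
# the same variance, which is the safer default for spend data.
t_stat, p_value = stats.ttest_ind(test_spend, control_spend, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```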
Hi Pini,
I need some suggestions for a Test vs. Control study which I need to work on, related to the area of fraud/risk. We have identified around 23,000 people as having a high chance of committing fraud online, and we want to run a Test vs. Control study where the test treatment is: people in the Test group will be reviewed and acted upon, whereas no action will be taken against people in the Control group. The success of the action will be determined by measuring and comparing certain performance metrics between the test and control groups. Here, the maximum number of people we can keep in the Control group is between 5% and 10% of the 23,000 people; the remaining people will go to the Test group. For example, 5% of 23,000 in the Control group and 95% of 23,000 in the Test group.
The suggestion I need from you is: what percentage of people should I keep in the Control group so that I can get a statistically significant read on those metrics after the test, and run a hypothesis test of significance between the two groups to compare performance with a certain degree of confidence? Please note that these 23,000 people are new customers and we don’t have any historical transaction data for them.
To add further, the reason we are not doing a 50%-50% split of the 23,000 people between Test and Control is that we can’t afford to leave 50% of customers (the Control group) free to commit fraud with no action taken. So, we want to keep the percentage of customers in the control cell to a minimum.
Thanks for your advice,
Mona
Hi Mona,
Since you don’t have any historical information about these customers, the only way to isolate a control group is by random isolation. In this case, there’s no way to ensure that the control and the test groups are balanced.
Given a total group size of 23,000 with random control group isolation, a control group containing 5% of the entire group should be sufficient to obtain a representative control group that will yield statistically significant results.
Cheers,
Pini
Hi Pini, a great article, thanks.
Hi Pini,
Thanks for this valuable reminder.
I have a few questions about testing using fallow / holdout groups, i.e. a group of customers who never receive any (email) marketing campaign (even if they subscribed to marketing campaigns).
1) What percentage of the customer base should be used as a holdout group?
2) What’s the purpose of this form of testing? To understand whether email campaigns as a whole are creating an uplift (and how much)?
Thanks for all your articles, I enjoy reading them.
Francesco
Hi Francesco,
I’m glad you found this article valuable.
To your first question, there truly isn’t an absolute value. You want to aim for a control group that effectively represents the entire segment you are targeting. Therefore, the sample size you need depends on the size of the overall campaign population. For example, for a campaign targeting 10,000 customers, 5% is sufficient. As a rule of thumb, smaller campaigns require a larger percentage to generate a valid control group. So, for campaigns targeting fewer than 2,000 customers, it’s a good idea to use 10%-20% instead.
Solutions might also include an auto-calculation mechanism (as Optimove includes). In the case of a recurring campaign, this mechanism adjusts the control group as results are obtained. For example, it would lower the control group percentage when the campaign is having a positive impact and results remain statistically valid.
Regarding the purpose of this form of testing, you are correct in your assumption. The key to accurately determining the effectiveness of any customer marketing campaign is the proper use of a control group. Since the control group is a representative subset of the customers you’re targeting with a particular campaign who will not receive the campaign, it can provide an indication as to whether your campaign is having an impact on the customers. By comparing response patterns between those customers actually receiving the campaign and those in the control group, you can determine the uplift of your campaigns.
This approach is not restricted to only a single campaign or channel (as in your example regarding email). By testing many campaigns or series of multiple campaigns (e.g., an onboarding journey), and tracking the true effectiveness of each series/strategy, you will be able to gradually optimize your entire marketing strategy for maximum response and uplift.
Cheers,
Pini