Optimization: Test, Improve, Repeat
An Optimove professional workshop about marketing plan optimization as an ongoing learning process
– [Yohai] I am Yohai, Optimove’s Chief Data Scientist. And here with me is Tal Kedar, the CTO.
Now, it’s a big topic and we are very short on time, so let’s dive right in. I’m sure that, as marketers, you all wish to create the most effective and meaningful interaction with each of your customers, and you also want to get better at it over time. In this session, I would like to share with you how we, as scientists, view and address that challenge. And the main insight, the headline that we’d like you to take away, is that a marketing optimization process is actually an ongoing learning process. You learn, and that allows you to improve step by step, incrementally.
Okay. So, any such learning process contains three main components: learning units, a collection of actions or offers, and a scientific environment that allows you to measure the effect of each match between an action and a customer. Now, let’s elaborate on each of the components. First, the learning units, your customer groups. By customer groups, I mean VIP customers, or active big spenders, or churn customers who have potential, or any other customer group that can be defined by demographic attributes, by behavioral attributes, or by any other information that you have about your customers. And the question here is, what makes a customer group a good and valid learning unit? First and foremost, it has to be meaningful, and it must be clear and well understood. And if we zoom out and look at all of our learning units together, they have to be distinct, each one different from the others. And why is that? Because when we send a campaign to a group, aggregate the results, and see whether the campaign was effective, then if this group is meaningful and we can understand it, we can learn something from that interaction.
For instance, we can learn that a 20% discount is very effective for active big spenders at risk of churn, but not effective for new, young high-rollers. On the other hand, if this group is just a meaningless bunch of random customers with no logic in it, we cannot learn anything. It becomes a black box, and we don’t want that. And I’ll tell you more than that: in many smart marketing teams that I got to meet, these units actually dictate a jargon inside the department, and that streamlines the insights of the learning process. Then you can hear sentences like, “We must switch the campaign for the active big spenders because it’s not effective anymore,” or, “The campaign for the churn customers is really effective.” The marketers in your team start to speak this language, and it streamlines all the information, okay?
So, we have our learning units, and we gain information about them and try to improve their performance. But how many customer groups should we have? How granular should we go? Now, it’s clear that if we treat all our customers as a single group, only one learning unit, it would be too heterogeneous and too crude, and we won’t be able to find one treatment that fits all our customers. On the other hand, if we treat each customer as a single learning unit, that also has a price, because it’s very hard to manage. And that’s not the only price: it might also damage our ability to generalize our insights. This concept is a bit tricky, so let me simplify with an example. Let’s say that one of my customer groups, one of my learning units, is defined as a 40-year-old male who lives in London, jogs every day at 6 a.m., works in a bank, and, after he gets back from work, reads fiction. I know him very well, so I might give him a tremendous treatment, but I cannot generalize it to other customers.
Now, on the other hand, if I zoom out a little bit and define my group as males between 37 and 40 who live in a big city and exercise, then it would be much easier for me to apply what I learned today to the customers who will belong to that group tomorrow. And in case it sounds familiar, that’s because it’s a very well-known concept, generalization versus overfitting, which is relevant to many domains like machine learning, statistics, mathematics, and many others, okay? But we have to make a decision, so how granular should we go? There’s no sweet spot here, but what you should have in mind is that you don’t want your groups to be too heterogeneous, you want each group to be homogeneous. And you must have a number of groups that you can manage, that fits your resources, okay?
Next component, the actions bank. I won’t elaborate on this one, because Itai and Dana are giving a fascinating session on exactly how to create and maintain a rich and varied actions bank. But I do want to make a short comment regarding these two components. Many times, you will see that the process of defining the learning units and the process of establishing the actions bank hold marketers, our customers, back from starting. They don’t know how to start, so they just freeze, and that’s a bad option. It’s a learning process we’re talking about, so it is very important to kick it off and start learning.
And this process can be kicked off with a simple definition of learning units, like lifecycle stages, or RFM states, or any other simple business logic that you have in your company. And with only a few actions in your bank, too: you don’t have to have dozens or a hundred actions. Even if you have only a few actions in your bank, you can kick off the process and start learning. As I said, it’s a learning process, so start learning, okay? Last but not least, the third component: the scientific environment that allows you to measure the effect of each match between an action and a customer. Now, at this point, you already understand that optimization is primarily based on learning, which is based on the ability to measure. If you can’t measure, you can’t learn. And if you can’t learn, you can’t optimize. It’s that simple, and there’s no way to cheat here.
So, in order to be able to measure the effect of each interaction, you should follow a few simple yet tremendously important rules. Rule number one: think about what you are trying to achieve. Are you trying to win back churn customers, or to increase loyalty, or maybe to increase the order value? According to that campaign objective, you should define a set of KPIs. Rule number two is: use a control group. That is very important, and yet many marketers tend to neglect it. We can clarify it with a very simple example. Let’s say that last Sunday you sent a coupon to your entire database, so all of your customers got a coupon from you. At the end of the week, you aggregated the results and you saw an increase of 10% in the order value relative to the previous week, before you sent the campaign. Can we say that the campaign was successful? Did it increase the order value? We can’t tell. Unless we have a comparable control group, we don’t know what would have happened had no campaign been sent. A comparable control group gives us just that information: it gives us the baseline of what would have happened. Now, I’ve repeated that term, “comparable control group.” What do I mean by comparable? Take a look at this slide.
So, let’s say that I have a magic pill that can boost the performance of sumo wrestlers. And I will start a fight between these two guys and give my pill to the big one. Is that a valid experiment? Can I rely on the results of this experiment? It goes without saying: of course I can’t. And the same goes for your customers and your campaigns. When you isolate a control group, you should use random allocation, and you must verify in advance that the customers in test and control are equivalent, at least on a few main parameters. We call them covariates, and let’s give an example. If you have VIP customers in your group, you should verify that their proportion is similar in both the test and the control group. And if you think that gender might affect the results of the campaign, you should balance it in advance.
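The random allocation and the balance check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Optimove's implementation: the customer records, the `vip` and `gender` fields, the 20% control share, and the helper names `split_test_control` and `share` are all hypothetical.

```python
import random

def split_test_control(customers, control_share=0.2, seed=0):
    """Randomly allocate customers to a test group and a control group."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = customers[:]            # copy, so the input list is left untouched
    rng.shuffle(shuffled)
    n_control = round(len(shuffled) * control_share)
    return shuffled[n_control:], shuffled[:n_control]

def share(group, key, value):
    """Proportion of customers in `group` whose attribute `key` equals `value`."""
    return sum(1 for c in group if c[key] == value) / len(group)

# Hypothetical customer base: 10% VIPs, roughly half of each gender.
customers = [
    {"id": i, "vip": i % 10 == 0, "gender": "F" if i % 2 else "M"}
    for i in range(1000)
]

test, control = split_test_control(customers)

# Compare the main covariates between the two groups before sending anything.
for key, value in [("vip", True), ("gender", "F")]:
    gap = abs(share(test, key, value) - share(control, key, value))
    print(f"{key}={value}: test/control gap = {gap:.3f}")
```

If a gap looks too large for a covariate that matters, one option is to reshuffle, or to allocate within each stratum (for example, VIPs and non-VIPs separately) so the proportions match by construction.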
Okay. So, we defined objectives, we isolated a control group, we sent the campaign, we aggregated the results. Now we have to measure the net effect: test minus control. That simple. But wait a minute, how do we do that? Do we take the average of the test minus the average of the control? Many marketers tend to do that, and it’s an issue. It’s an issue because averages are highly sensitive, and they might be corrupted by a single observation. One observation is all it takes to make the average non-representative. That’s why we, as scientists, never look only at the average; we always look at the entire distribution. You can do it using a box plot, or any other plot that shows your distribution. You can look at percentiles, the median, and the standard deviation, but don’t rely solely on the average. And if you want to be more sophisticated and more certain that your results are not coincidental, you should use statistical testing. That’s a very wide subject which is beyond the scope of this session, but if you have questions, you are most welcome to stop by after the session, okay?
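A small Python sketch shows how a single observation can distort "test minus control" when only averages are compared. The order values below are made up for illustration; the only point is the contrast between the mean and the median.

```python
import statistics

# Hypothetical order values per customer (same currency, same week).
control = [20, 22, 19, 21, 23, 20, 22, 21]
test = [21, 23, 20, 22, 24, 21, 23, 2000]   # one extreme order in the test group

# Net effect measured with averages: dominated by the single 2000 order.
mean_uplift = statistics.mean(test) - statistics.mean(control)

# Net effect measured with medians: barely affected by the outlier.
median_uplift = statistics.median(test) - statistics.median(control)

print(mean_uplift)     # 248.25
print(median_uplift)   # 1.5
```

With the averages, the campaign appears to add about 248 per customer; with the medians, the typical customer moved by 1.5. Looking at both, or at the full distribution, makes the corrupted average easy to spot.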
So, we’ve completed all the steps and now we have a figure in our hand, a new piece of information. We know the net effect, the uplift of the interaction between action A and group B. How do we use it? Obviously, there are many sophisticated and advanced methodologies that can do that, but for the sake of simplicity, I would like to discuss three simple options, which I like to think of as a divide-and-conquer strategy. Option number one is a successful match. The campaign is effective, it works, so what are we going to do? We’re going to keep sending it. We can, though, reduce the proportion of the control group; we can move from exploration to exploitation. But we don’t stop measuring, because things are dynamic, and what works today might not work tomorrow. And if it stops working, we want to be able to figure that out, so don’t stop measuring.
Okay. The second option is an ineffective match. Simple again. The campaign is not effective, it doesn’t work, so what are we going to do? We’re going to go to our actions bank, pick another action, and switch it with the current one. That’s it. The third option is a bit more tricky: here, we have a partial match. What does that mean? It means that the campaign is effective only on a subgroup of the initial customer group. So here we have to split the former learning unit: we keep sending the campaign to one subgroup, and we switch the action for the other subgroup. And in case you didn’t notice, we’ve just added a new learning unit to the process. This whole process is actually an iterative process. As you can see, we send campaigns, we aggregate the results, we draw insights, and then we inject them back into the process in order to improve and get better results.
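The three options can be written down as a simple decision rule. The Python sketch below is hypothetical: the function name `classify_match`, the subgroup names, and the threshold `min_uplift` are assumptions made for illustration, and in practice the decision would also account for whether the measured uplift is statistically reliable.

```python
def classify_match(uplift_by_subgroup, min_uplift=0.0):
    """Classify a campaign/group match from the net uplift of each subgroup.

    Returns the decision plus the subgroups that keep the action and the
    subgroups that should get a different action from the actions bank.
    """
    keep = [g for g, u in uplift_by_subgroup.items() if u > min_uplift]
    switch = [g for g, u in uplift_by_subgroup.items() if u <= min_uplift]
    if not switch:
        decision = "successful"    # keep sending, keep measuring
    elif not keep:
        decision = "ineffective"   # pick another action for the whole group
    else:
        decision = "partial"       # split the learning unit in two
    return decision, keep, switch

# Hypothetical net uplift (test minus control) per subgroup of one learning unit.
result = classify_match({"big-city joggers": 3.2, "everyone else": -0.4})
print(result)   # ('partial', ['big-city joggers'], ['everyone else'])
```

In the partial case, the two returned lists are the new learning units: one keeps the current action, the other goes back to the actions bank.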
Now, before I give the mic to Tal, I want to give you some homework to do when you get back to your office next week. First, make sure your learning units are meaningful, distinct, and homogeneous. Two, verify that each of your campaigns uses a comparable control group. Three, take a look at the distribution of your results, and make sure that you don’t rely on averages that are corrupted by one or two observations. And number four, make sure you have an iterative learning process in place. Please, Tal.
– [Tal] Thanks. So what I’d like to do is give you a concrete example of implementing the methodology that Yohai described, by taking a detour to the casino. In American slang, the slot machines, those machines where you put in a dollar, pull a lever, and either win or, as in my case, don’t, are called one-armed bandits, because they take away your money and they have only one arm. But imagine walking into a room where you have multiple such machines of different makes.
So, each machine has a different probability of winning when you pull the lever. And if you do win, each machine pays out a different amount. None of those things do we know in advance, and the aim is to find a strategy that, over a long, long series of games, maximizes your winnings or, equivalently, minimizes your regret. This is actually a marketing problem, right? Let’s assume we have multiple variants of an email: one with a cute dog picture, one with a cute kitten, and one with a horse, whatever it is. When you send out those variants, each will have its own conversion rate, right? And when people convert based on the offering, you might get a different transaction amount or deposit amount, or whatever it is that the emails promote. And we’re looking for a strategy that, given past results, allows us to converge on the best variant to send, optimizing whatever KPI we’re looking at, so in this case, revenue attributed to this campaign, right?
There is a family of optimal strategies for this well-known problem, which is called the multi-armed bandit problem. It’s a family, not a single solution. And what’s common to all solutions in the optimal family is that they strike some kind of optimal balance between using the information about the best option encountered thus far, and trying out other options in the hope that they eventually prove to be superior, so that we would want to migrate to those. And there is a frontier, in a sense, an optimal balance between exploitation and exploration. You can be more conservative or less conservative, but you can’t do better than that frontier, and you don’t want to be below it. One such simple strategy is called Thompson sampling, and it boils down to choosing each variant in proportion to its probability of being the best. It’s a very, very simple strategy, but still one of the optimal strategies. When we start, we have four variants, and each of them has an equal probability of proving to be the best.
So, we’re going to send each of the variants to 25% of the audience. After the first batch of emails, we have a little bit more information. Say the third variant now has a 40% chance of proving to be the best, so we’re going to send it to 40% of the audience, right? A little bit more data changes the picture again. Now the fourth variant has a 49% chance of being the best, so we’re going to send it to 49% of the audience. This very simple strategy can be enhanced by running the experiment in parallel over many segments of the audience; those are the learning units that Yohai described, right? Males of that age who jog, a really boring segment, but we can come up with better ones. And this very, very simple system that I described is a way to automatically scale out the kind of experiments that follow the methodology, in a very easy way, accessible to marketing teams of all sizes and sophistication levels. That’s it.
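The allocation rule described here, sending each variant to a share of the audience equal to its probability of being the best, can be sketched in Python. This is a minimal illustration under stated assumptions, not the system described in the talk: it models only conversion (converted or not) with a Beta(1, 1) prior per variant, it ignores the transaction amount, and the function name `probability_of_being_best` and all the counts are hypothetical.

```python
import random

def probability_of_being_best(conversions, sends, draws=10_000, seed=0):
    """Estimate, by simulation, each variant's probability of being the best.

    Each variant's conversion rate gets a Beta posterior built from a
    Beta(1, 1) prior plus the observed conversions and non-conversions.
    """
    rng = random.Random(seed)
    wins = [0] * len(sends)
    for _ in range(draws):
        # Draw one plausible conversion rate per variant from its posterior.
        sampled = [
            rng.betavariate(1 + c, 1 + (n - c))
            for c, n in zip(conversions, sends)
        ]
        wins[sampled.index(max(sampled))] += 1   # credit the best draw
    return [w / draws for w in wins]

# Before any emails are sent: four variants, no data, roughly 25% each.
start = probability_of_being_best([0, 0, 0, 0], [0, 0, 0, 0])

# After a hypothetical first batch of 100 emails per variant.
later = probability_of_being_best([5, 5, 30, 5], [100, 100, 100, 100])

print([round(p, 2) for p in start])
print([round(p, 2) for p in later])
```

The returned shares are used as the split for the next batch. Running the same function separately for each learning unit gives the parallel, per-segment version of the experiment.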