Prediction Isn’t Prophecy! Reverse-Engineer to Predict Customer Behavior

Video Transcript

So, hello again and welcome to our session, “Prediction is not a Prophecy: Reverse-Engineer to Predict Customer Behavior.” It has always been the dream of marketers to predict their customers’ behavior. Marketers are interested in questions like, “What is the next product my customers would buy? Who has the potential to become a top-tier customer?”

Or questions like, “Who is at higher risk to churn and to end his relationship with our brand?” Today, with the rise of digital and the wealth of new data, coupled with advancements in machine learning and AI technologies, we are able to address those challenges and give those questions an answer. Now, prediction might seem a little bit overwhelming and daunting, so for those of you who don’t have any kind of experience in a coding language such as R or Python, in today’s session, we are going to show you how you can get started, overcome the perception of a barrier that some of you might have, and be independent in creating your own predictive model.

Now, for those of you who have already taken their first step in creating a prediction model, we will also dedicate part of this session to talk about how these models can be taken to the next level. My name is Aviram Khayat. I’m a marketing data scientist at Optimove as part of their strategic services department. Here with me today is Jonathan Inbar, who is a team leader in the strategic services department as well.

Both Jonathan and I work with machine learning algorithms and predictive modeling in our day-to-day work. And today, we would like to share with you some of our experience related to the question, “How do you predict customer behavior?” So, we will start by talking about some of the key challenges in predicting customer behavior.

Then we will talk about some common use cases and popular models that we face in our day-to-day work. And today, as I mentioned before, you will walk out of this workshop with not only one, but two practical step-by-step guides that will help you to create your first predictive models. The first one will be for the question of how to predict which product your customers are more likely to purchase.

And the second one, how to identify your VIP customers at an early stage of their engagement with your brand. To finalize our workshop, we will talk about how those models can be taken to the next level using some more advanced applications and sophisticated tools. So, let’s start with the question, “What makes predicting customer behavior a challenge?”

And I think the best way to understand that is by a simple example. Now, let’s say that you are Netflix. Okay? And your best friend just watched “The Titanic”. Now, God knows why, but he decided he wanted to watch “The Titanic”. What would you recommend your friend watch next? Now, there are several options over here.

Option number one can be that you would recommend to your friend to see another romantic movie like “The Titanic” was, right? Or maybe you would assume that your friend would like to watch another movie that describes a historical event like “The Titanic” was. Or maybe you would assume that because your friend is such a DiCaprio fan, he would like to see another one of DiCaprio’s movies such as “The Wolf of Wall Street”, right?

Now you get it, right? This is not an easy question and there is no one definitive right answer to this challenge, right? Now, in this Netflix case, the reason this is a challenging prediction question is because there are, first of all, many options that you can choose from, but also there are multiple variables that you need to consider in order to make a data-driven decision. For example, what is your friend’s age?

What is your friend’s gender, his favorite actor, his favorite director, and so on and so forth. Now, when you think about it, no wonder that Netflix created the contest offering a $1 million prize to the team that would come up with the most accurate and sophisticated product recommendation model in order to predict their viewers’ preferences.

Now, what probably most of you did in order to answer this question is to make a reasonable assumption, but making a reasonable assumption is not enough. And the truth is that you don’t need to rely on assumptions today because we have the tools and the methods to address these kinds of challenges and to make a data-driven decision, and more about that in our DIY later on. Now, the Netflix case is just one out of many prediction challenges out there.

And now we would like to talk about some common use cases that are more relevant to your industries. So, let’s start with our first use case, which is churn prediction. Now, yeah, this is a funny one. Yeah. So, defining who will be your churn customer can be a little bit tricky, right?

Because it varies from business to business, but once it has been decided, we would like to better understand who our high-risk customers are and to engage with them in order to increase retention. Now, the ability to predict that a particular customer is at high risk of churning when there is still time to do something about it represents a high potential revenue source for any kind of online business out there.

Two additional common use cases that we would like to talk about and dive into a bit deeper are product recommendation, which we already mentioned in the Netflix case, and VIP propensity, or the ability to identify your VIP customers at an early stage of engagement.

So, I think it’s a good time to move on to our first DIY for today, how to give the best recommendations to your customers with your own recommendation model, and Jonathan will take the lead from here.

– Thank you, Aviram. So, I want to start with the question of, “Why?” Why do we need this model in our life?

Well, we know that today, the customer’s attention span is ever decreasing, right? The customers are overloaded with tons of information on a daily basis and sometimes they only need to be pointed towards the right direction in order to make a decision. Well, I want to give you some interesting numbers. So, with Amazon, 35% of their sales are coming from their site recommendation, right?

It’s pretty amazing. Or, like Aviram mentioned, Netflix had the most famous contest with a $1 million prize for the team that would create the best recommendation model for them. And with Google News, this annoying app that you have on your phone, there is a 40% higher click-through rate for articles that are coming from their recommendation model as well.

So, we can definitely see that there is value in smarter recommendations out there. Now, what are we trying to achieve here in the model? Well, we pointed out three main things. The first one is to be able to predict how much you may like a certain product or a service that I want to recommend to you.

Second thing is to be able to compose the list of the N-best items for you, and then I can show the items via email or the website or whatever. Third thing is to be able to compose a list of the N-best users for a specific product. So, in case you now want to promote a specific item or do an upsell or something like that, this could help you identify the best customers for that.

How are we going to create those types of models? Well, today I’m going to talk about a method called collaborative filtering. It’s pretty fundamental and easy to implement. And it includes two different approaches, a user-based approach and an item-based approach. So, let’s start with the user-based approach just to understand what it is very quickly.

So, in user-based approach, we’re taking people that are similar to each other, could be based on demographic information, purchase history, and so on. And because those people are similar based on characteristics and behavior, they’re probably most likely to also purchase similar items or services. So, using this visualization here, let’s say, for example, that those two people are highly correlated to each other based on the data that we have.

This person purchased products A, B, and C, but this person only purchased product B and C and didn’t purchase yet product A. And because those two people are similar to each other, we can recommend to him product A because we have high confidence that he would purchase this product or like this product.
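To make the user-based idea concrete, here is a minimal sketch. It assumes each customer's history is a set of product IDs and measures similarity by set overlap; the customer names, the data, and the choice of overlap as the similarity measure are all illustrative assumptions, not something shown in the session.

```python
def jaccard(a, b):
    """Overlap between two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_user_based(target, purchases):
    """Suggest products bought by the most similar other customer but not by `target`."""
    others = [c for c in purchases if c != target]
    nearest = max(others, key=lambda c: jaccard(purchases[target], purchases[c]))
    return sorted(purchases[nearest] - purchases[target])


# Hypothetical purchase histories.
purchases = {
    "customer_1": {"A", "B", "C"},
    "customer_2": {"B", "C"},
    "customer_3": {"D"},
}
suggestions = recommend_user_based("customer_2", purchases)
print(suggestions)
```

In practice you would probably look at several similar customers rather than only the nearest one, but the single-neighbor version keeps the logic easy to follow.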

In an item-based approach, instead of looking at similar people, we’re looking at similar or highly-correlated products to one another. Now, what does it mean? It means that from the data that we have, we saw that, for example, all the people that purchase product C also purchased product A, right? Now, because those two products are correlated and this person, again, purchased product C but didn’t purchase yet product A, we can recommend to him product A because he has a high likelihood to purchase it.

Now, in our DIY, I’m going to use an item-based approach. It’s an easier way to walk you through it, but you can definitely use either of those methods when you create your own recommendation model. So, let’s start with step number one. Here, we are going to create a matrix called the co-occurrence matrix, what a big name.

And the numbers in the matrix show the number of customers that purchased two specific products, right? So, in our example here, we have 3000 customers that purchased both product one and product number five. So, this basically gives us the first indication of which two products are associated with each other.
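Step one can be sketched in a few lines. This assumes the raw data is a mapping from customer ID to the list of products that customer purchased; the function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count, for every pair of products, how many customers purchased both."""
    counts = Counter()
    for products in baskets.values():
        # Deduplicate and sort so each pair is counted once per customer,
        # always under the same (a, b) key.
        for a, b in combinations(sorted(set(products)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical purchase data: customer ID -> products purchased.
baskets = {
    "c1": ["P1", "P5"],
    "c2": ["P1", "P5", "P3"],
    "c3": ["P3"],
}
pairs = co_occurrence(baskets)
print(pairs[("P1", "P5")])
```

The same counts can be produced with a spreadsheet pivot or a SQL self-join; the dictionary of pairs is just a compact stand-in for the matrix.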

And in step two, we’re going to normalize the matrix. We’re going to use a method called Jaccard similarity. It’s very easy to calculate. Basically, for each of the numbers in the matrix, we are going to divide it by the total number of customers who purchased either of the two products. So, going back to my example of those 3000 customers, you can see that we are going to divide them by the sum of the column, right?

So, all the customers that purchased product number five, plus the sum of the row, all the customers that purchased product number one. Doing so gives us a ratio of 0.08. Now, why the hell are we doing this normalization? Well, we are going to treat cases such as a company that is now giving all of its promotion to product number one, for example.

Or that specific product is already being displayed on the company’s homepage and therefore it’s probably already the most popular product out there and people are purchasing it anyway. And this is why we are doing this normalization, to treat those cases, because they can skew the results of the model. In step three, which is kind of optional, here we’re doing a deep dive into a customer level now.

We can reduce the recommendations on products based on each customer’s purchase frequency, right? So, we’re taking this matrix that we have from the previous step, we’re adding now a vector of all the products and the number of times the customer purchased those products. So, in the bottom line, the more you purchase a specific product, the less likely I’m going to recommend it to you, which kind of makes sense.
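Steps two and three can be sketched together. The standard Jaccard similarity divides the co-purchasers by the customers who bought either product, subtracting the co-purchasers once so they are not counted twice; that may differ slightly from the arithmetic on the slide. The frequency rule in step three is not spelled out in the session, so dividing by one plus the purchase count is purely an assumed rule for illustration. All counts below are hypothetical.

```python
def jaccard_similarity(both, bought_a, bought_b):
    """Co-purchasers divided by customers who bought either product."""
    either = bought_a + bought_b - both
    return both / either if either else 0.0


def damp_by_frequency(similarity, times_purchased):
    """Optional step: shrink the score for products the customer already buys often.

    Dividing by (1 + count) is an assumed rule, chosen only for illustration.
    """
    return similarity / (1 + times_purchased)


# Hypothetical counts: 3,000 bought both, 20,000 bought product 1,
# 15,000 bought product 5.
sim = jaccard_similarity(both=3000, bought_a=20000, bought_b=15000)
print(round(sim, 4))
print(round(damp_by_frequency(sim, times_purchased=2), 4))
```

A very popular product has a large purchase count of its own, so its similarity to everything else shrinks, which is the effect the normalization is meant to have.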

In step four, we then calculate, for each of the customers, his favorite product. Okay? So, in my case, let’s say my favorite product is now product number one, it’s something that is pretty easy to calculate using either a spreadsheet or a SQL query or something like that. So, let’s say my favorite product now is product number one. I’ve created the matrix, done the calculations, done the normalizations, and we got a final ratio for each one of those products.

Now, I can take the product with the highest ratio, in this case, product number three, and recommend it to me as the top best product for me out there. But I can also take, like, the top five products, the top 10 products, the top 50 products. It all depends on the use case that I need right now. So, this concludes the first DIY for today.
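The final ranking step might look like the sketch below. It assumes the normalized ratios are stored per product pair, and that the customer's favorite product is already known; the product names and ratios are hypothetical.

```python
def top_n(similarities, favorite, n=5):
    """Rank other products by their similarity to the customer's favorite product."""
    scores = {
        (b if a == favorite else a): score
        for (a, b), score in similarities.items()
        if favorite in (a, b)
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical normalized ratios between product pairs.
similarities = {
    ("P1", "P2"): 0.05,
    ("P1", "P3"): 0.21,
    ("P1", "P5"): 0.08,
    ("P2", "P3"): 0.30,
}
best = top_n(similarities, favorite="P1", n=2)
print(best)
```

Changing `n` gives the top five, top 10, or top 50 list, depending on the use case.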

In DIY number two, I’m going to discuss how we can identify customers with high VIP potential at an early stage. So, again, I’m starting with the question of why do we need this model? All right. So, we know that VIPs are generating the highest value to the business and early identification can lead to higher VIP conversion.

And we also want to be able to nurture those VIPs-to-be from an early stage and guide them in a way that best fits our business, right? Here, we’re going to identify the likelihood to become a VIP in a given timeframe, either one month from now, three months from now, six months from now.

And we’re going to discuss how we make this decision of choosing those periods in a few slides. We’re going to create the model using a weighted scoring method. While there are many methods out there, this one is something that all of you can do yourself. We are going to look at how VIPs behaved in the past before they became VIPs and use that data to create a prediction model for those high VIP potential customers.

So, let’s start with the first step. We need to select our prediction point. We have two periods that we need to decide on, a learning period, which is the timeframe in which we examine how our customers behaved in the past, let’s say one month since they first purchased from us.

Okay? And then we need to choose a prediction period, which is how far into the future I’m going to look and check if our customers became VIPs or not, let’s say the following three months after or six months, right? How are we going to decide those two periods? We can pull a distribution of the percentage of VIPs by the days between first purchase and the first time they actually became VIPs to our company.

So, in here, in this example, we have 3% that became VIPs for the first time 1 week after they first purchased, or 30% that became VIPs in the second month and so on. And we can understand from this distribution that we have a high concentration of people becoming VIPs for the first time after, let’s say three months since first purchase, right?

And we have a very low concentration at an early stage such as first week, second week and so on. So, we can decide that with this example, our learning period will be the first two weeks since first purchase and we are going to predict for our customers for the following three months into the future.
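The distribution used to pick the two periods can be sketched as follows. This assumes you have, for each past VIP, the date of first purchase and the date the customer first reached VIP status; the 30-day bucket size and the sample dates are assumptions for illustration.

```python
from collections import Counter
from datetime import date


def days_to_vip_distribution(customers, bucket_days=30):
    """Share of VIPs per bucket of days between first purchase and first VIP date."""
    buckets = Counter(
        (first_vip - first_purchase).days // bucket_days
        for first_purchase, first_vip in customers
    )
    total = sum(buckets.values())
    return {bucket: count / total for bucket, count in sorted(buckets.items())}


# Hypothetical (first purchase date, first VIP date) pairs.
customers = [
    (date(2023, 1, 1), date(2023, 1, 6)),
    (date(2023, 1, 1), date(2023, 2, 20)),
    (date(2023, 1, 10), date(2023, 3, 1)),
    (date(2023, 2, 1), date(2023, 4, 25)),
]
distribution = days_to_vip_distribution(customers)
print(distribution)
```

Buckets with a very low share are candidates for the learning period, and the point by which most VIPs have converted suggests how long the prediction period should be.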

In step three, we start to create our dataset of customers, right? So, we obviously have our customers. We are going to have a result variable that actually says whether a customer became a VIP or not afterwards. And then we’re going to start adding all the other attributes we think are relevant to the model, like total purchase in the last two weeks, increase in activity days, demographic attributes such as gender, age, and so on.

And in step number four, we are going to decide which attributes I’m going to keep in my model. So, we can pull for all those attributes a distribution based on our result variable, right, and check if we see a very significant separation between those that became VIPs and those that didn’t become VIPs. So, let’s see two examples.

First one is for a total purchase amount in the last two weeks, right? We can see all the different ranges that we have here and we see that we have a very clear separation. If I, like, draw a line right in the middle between those that did become VIPs in the end, in the dark blue bars, and those that didn’t become VIPs in the light blue bars, right?

So, this attribute, I can say with high confidence, can give me a lot of value in my model. So, I should probably keep it. Now, the second example is for gender. Here, we can see that no matter if you’re a male or a female, the probability to become a VIP doesn’t change that much. So, it’s probably not giving me that much value in my model. So, I should just remove it, right?
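One simple way to put a number on that separation is sketched here. It assumes the dataset is a list of records with a `became_vip` flag; the field names, the sample rows, and the use of the gap between the highest and lowest VIP rate as the measure are assumptions for illustration, since the session judges the separation visually from the charts.

```python
def vip_rate_by_value(rows, attribute):
    """Share of customers who became VIPs, for each value of one attribute."""
    totals, vips = {}, {}
    for row in rows:
        value = row[attribute]
        totals[value] = totals.get(value, 0) + 1
        vips[value] = vips.get(value, 0) + (1 if row["became_vip"] else 0)
    return {value: vips[value] / totals[value] for value in totals}


def separation(rates):
    """Gap between the highest and lowest VIP rate; a small gap suggests a weak attribute."""
    return max(rates.values()) - min(rates.values())


# Hypothetical customer records.
rows = [
    {"gender": "F", "spend_range": "high", "became_vip": True},
    {"gender": "M", "spend_range": "high", "became_vip": True},
    {"gender": "F", "spend_range": "low", "became_vip": False},
    {"gender": "M", "spend_range": "low", "became_vip": False},
]
gender_gap = separation(vip_rate_by_value(rows, "gender"))
spend_gap = separation(vip_rate_by_value(rows, "spend_range"))
print(gender_gap, spend_gap)
```

In this toy data the spend range separates VIPs from non-VIPs completely while gender does not separate them at all, mirroring the two examples.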

And this is what we can do for all the rest of the attributes. In step five, we need to give a score for each range in every attribute. So, I’m taking, as an example, this distribution of total purchase amount in the last two weeks, turning it into a table form, right? We have the ranges from the previous slide, the percentage that became VIPs, the percentage that didn’t become VIPs, and for each one of them, I’m creating a ratio which is basically dividing the percentage of yes by the percentage of no, like in this example here that we get 3.57.

For each one of the ratios then, we create a scale of score between 0 and 10. We give 10 to the highest ratio out there, which makes sense, right? If a customer did over $1,000 of purchase in the last two weeks, he probably should deserve the highest score. And then for the rest of the ratios, we are calculating a weighted score relative to the 10 that we gave to the highest one.
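Step five can be sketched like this. The ratio is the share of VIPs in a range divided by the share of non-VIPs in that range, and the highest ratio gets a 10. Scaling the other ratios linearly against the highest one is an assumption about what "relative to the 10" means; the ranges and percentages are hypothetical.

```python
def range_scores(ranges):
    """Map each range to a 0-10 score from its ratio of %VIP to %non-VIP."""
    # Assumes every range contains at least some non-VIPs (pct_no > 0).
    ratios = {name: pct_yes / pct_no for name, (pct_yes, pct_no) in ranges.items()}
    top = max(ratios.values())
    return {name: round(10 * ratio / top, 1) for name, ratio in ratios.items()}


# Hypothetical shares (% of VIPs, % of non-VIPs) falling in each spend range.
ranges = {
    "$0-100": (10.0, 50.0),
    "$100-1000": (40.0, 40.0),
    "$1000+": (50.0, 10.0),
}
scores = range_scores(ranges)
print(scores)
```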

Then in step six, we give a weight to each attribute that we decided to keep in our model, right? So, the weight basically reflects the importance of each attribute to my model. We can decide on the weight using the distributions that we pulled in previous steps. And then after we give a weight to each one of them, we can calculate a layered score by multiplying the score of each attribute by the weight of each attribute.

So, in this example, we have a customer here that got all those different-layer scores, and in the next step, we are calculating his final weighted score, right? So, in this example, he got a final score of 8.2 by summing all those layer scores up. Now, the question is, is this enough for him to be considered as a potential VIP or not?
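The layered and final scores amount to a weighted sum. In this sketch the attribute names, scores, and weights are hypothetical, and the weights are assumed to sum to 1 so that the final score stays on the same 0 to 10 scale.

```python
def final_score(attribute_scores, weights):
    """Sum of each attribute's 0-10 score multiplied by its weight."""
    return sum(attribute_scores[name] * weights[name] for name in weights)


# Hypothetical scores and weights for one customer.
attribute_scores = {"spend": 10.0, "activity_days": 8.0, "age": 4.0}
weights = {"spend": 0.5, "activity_days": 0.25, "age": 0.25}
score = final_score(attribute_scores, weights)
print(score)
```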

Right? So, we can play with different thresholds, right? Take all the customers that received a score over 8 or over 9 or 9.5, and we can see how it affects the last step, which is validating our model. Okay? So, we have three measurements here that we use for checking our model performance. Accuracy is if I’m taking a random customer, what is the probability to flag him correctly, either yes or no.

Precision is if I flag you as a potential VIP already, right? So, meaning yes, what is the probability that I’m correct and you are actually going to become a VIP in the following X months that I decided as my prediction period? And the last thing here is the recall, which means that from all of those that actually became VIPs at the end, how many did we manage to flag looking back using our model and didn’t miss.
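The three measurements can be computed directly from the flags and the actual outcomes. The scores, outcomes, and threshold below are hypothetical; trying a different threshold and re-running the evaluation shows how the trade-off moves.

```python
def evaluate(predicted, actual):
    """Accuracy, precision, and recall for yes/no VIP flags."""
    tp = sum(p and a for p, a in zip(predicted, actual))      # flagged, became VIP
    fp = sum(p and not a for p, a in zip(predicted, actual))  # flagged, did not
    fn = sum(not p and a for p, a in zip(predicted, actual))  # missed VIPs
    correct = sum(p == a for p, a in zip(predicted, actual))
    return {
        "accuracy": correct / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical final scores and what actually happened in the prediction period.
scores = [9.1, 8.4, 7.9, 6.0, 8.8]
actual = [True, False, True, False, True]
threshold = 8.0
predicted = [s >= threshold for s in scores]
metrics = evaluate(predicted, actual)
print(metrics)
```

Raising the threshold usually improves precision at the cost of recall, so the right value depends on whether missing a future VIP or flagging the wrong customer is more costly for the business.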

So, this concludes two DIYs that were pretty basic and I’m going to hand it over to Aviram to explain how we can take this to the next level by applying some more advanced methods.

– Thank you, Jonathan. So, what Jonathan just presented were basic methodologies in order for you to start creating your own predictive models. I would like to dedicate the last part of the session to talk about how you can take predictive modeling to the next level using some more advanced applications. So, in our day-to-day work, in order to handle large datasets and to be able to create robust predictive models, we are using statistical programming languages such as R or Python and also different kinds of machine learning algorithms such as logistic regression, decision tree, random forest, and many others.

These algorithms and many others help us to increase the performance of our model and to improve the accuracy that Jonathan just mentioned when he talked about the validation part. Another thing that we are doing using those algorithms is to improve the feature selection process, the phase where we choose the right parameters for our model. Now, this is a really crucial process because it will define if our model will be accurate or not.

Another cool thing that we can do using those machine learning algorithms is to build a self-optimizing model. A self-optimizing model means that we can create a predictive model that will be updated automatically on a periodical basis and it will be based on the newly available data that we have on our system. Now, the motivation to create a model which is updated is because in most businesses today, the customer’s behavior is dynamic and can change and evolve over time.

So, we would like to create a predictive model that can change according to the changes in the business. Now, let’s see together how this process works. Let’s say that we created a predictive model and implemented the model on the customer’s marketing software system at the beginning of the year, right? Now, this model runs every day and each customer is assigned a predictive value.

In the case of the VIP, it can be, “Yes, the customer has a high potential to become a VIP,” or, “No, the customer has low potential to become a VIP.” Okay, so the model runs every day and then after a few months, we decide we would like to refresh the model. Now, refreshing the model means that we will create a new predictive model based on the data that we already collected, including the newly available data that we have during this period.

Now, if the new model turns out to be more accurate than the current one, we will replace the current one. Okay? But if not, we will keep the current model in our system. Now, the decision will be based on predefined metrics that we need to decide on. Now again, as I mentioned, once this procedure is set, afterwards, everything is done automatically with no manual effort.
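The replace-or-keep decision can be sketched as a small comparison. This is only an outline under stated assumptions: the function name, the `min_gain` margin, and the stand-in "models" are hypothetical, and `evaluate` stands for whatever predefined metric you agreed on, computed on the same validation data for both models.

```python
def refresh_model(current, candidate, evaluate, min_gain=0.0):
    """Keep whichever model scores better on the agreed metric.

    `current` and `candidate` are any model objects; `evaluate` is a
    caller-supplied function returning one number where higher is better.
    """
    if evaluate(candidate) > evaluate(current) + min_gain:
        return candidate
    return current


# Hypothetical stand-ins: each "model" is just a name with a stored validation score.
validation_scores = {"model_january": 0.81, "model_june": 0.84}
chosen = refresh_model("model_january", "model_june", validation_scores.get)
print(chosen)
```

Scheduling this comparison to run on a periodical basis is what makes the model self-optimizing: the new model only goes live when it beats the one already in the system.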

And that leads us to the last part of this workshop. And I would like to go back to where we started. Now, as I mentioned, prediction might seem a little bit overwhelming and daunting. So, we hope that after this session you will be able to be independent in creating your own predictive model.

And if some of you already have taken the first steps, we encourage you to think about how you can take your predictive modeling to the next level. Thank you very much for listening. I hope that you enjoyed our session. Please feel free to reach out if you have any questions and enjoy the rest of your day.

A step-by-step guide to building product recommendation and potential VIP models. It’s not magic, it’s science.
