Detect Your VIPs Early in the Game
At Optimove, we use machine learning to identify gaming VIPs as early as possible. In this article we propose a method you can try yourselfPosted in Customer Retention, Gaming, Data Analysis on 30 March 2017 by:
VIPs exist in every industry, whether a company defines them as such or not. Mostly, they are a relatively small group of players which generates a large portion of the company’s revenue. Across different business verticals the top 5% of customers are responsible for an average of 60% of revenues, and in iGaming – up to 80%.
What if you could identify a customer as a VIP, even before he knows it?
Most companies use pre-defined rules for defining players as VIPs. These rules can be simple or complex, and based on fixed or dynamic thresholds - each method has its pros and cons.
Flagging a player as “VIP to be”, or scaling his VIP potential while he’s still new (two weeks from first deposit), can be valuable to the company. Although it can be tricky to identify the diamonds early on, treating these players as VIPs from their first steps can ensure a long term and very beneficial relationship.
Harvest your data to detect potential VIPs
So how to identify a VIP player fast? The idea is to use supervised machine learning to identify early behavior patterns which hint at future VIP status. These patterns can later be coded into rules which will flag players as potential VIPs as early as two weeks from first deposit, and enable marketers to provide them with the appropriate treatment that will ensure this result or give the right push to bring it to fruition.
This can be done using different methods and tools, including, but not limited to, correlation analysis, decision trees, random forests and logistic regression. Below is an outline of one of the methods we use at Optimove. It involves data prep, running decision tree algorithms in R and extracting appropriate rules.
- Create a data set based on past customer’s activity in their first 14 days after first deposit. You can use data from different time points to enlarge your sample.
- Check which players became VIPs in the following 3 months – the number of months should be dictated by the industry and company strategy. Mark each player in the data set as VIP (1) or not (0). The table now contains the player's behavior in his first 14 days (deposits/wagers/bonuses etc.) and a dependent column, IsVip, indicating if he turned into a VIP after 3 months.
- For the next step we use decision tree algorithms, which might require some work on your data:
- Remove highly correlated X variables to avoid random selection of variables into the tree. For example, the attribute "Number of Deposit Days" will probably be highly correlated to the attribute "Number of Deposits".
- Feature selection – find the features that have the strongest effect on the Y variable.
- split the data into a training set (70%-85% from the entire set) and a test set. There are several, more sophisticated ways to do this than a random split (such as k-fold).
Obviously, sometimes the data set requires additional preprocessing, like variable standardization scaling. We will not elaborate on that process here.
Run the algorithm on the test set – you can fine-tune the training set according to the results, just refrain from over fitting. The screenshot below will easily guide you how to do it in R using existing packages. We start with loading packages and importing the data into R:
Now we have our table stored in "TreeInput". The next step is to split it into training set and test set:
In this case, the training set will consist of 75% of the data, and the test set - of 25%.
Now, all that’s left to do is to create the tree:
After running the first 3 lines of code, your model is stored in tree.mod. The next step is pruning the tree. Pruning helps us focus on the more important parts of the tree by removing redundant parts. This is a result of a tree we generated (only the first level):
Let's look at the first node: the first variable the tree chose was deposit amount. We can also see that the sample size is 45e+3 (45K new players), and that 3.6% of them became VIPs. When splitting this sample by a deposit amount lower or higher than $192, as the tree suggests, we get two new groups:
- On the right, we can see a group of 3,879 players (which qualified for the criteria deposit amount>$192). 23% of them eventually became VIPs.
- On the left, we have a group of around 41K players. Of them, only 1.8% became VIPs.
- After generating 100 trees, we’ve found that the average deposit amount for the first split is $200, and that 25% of the players who deposited over $200 became VIPs. Now, we can decide on a rule as follows: all players who deposited over $200 in the past 2 weeks (new players in our case), will be flagged as potential VIPs.
- Test your model on the test set to check type I and type II errors –
First, we need to create a confusion matrix in R:
The above code inserts the predicted results on the test set, using the model tree.prune we created, into pred. Then, it inserts a comparison between the prediction to reality to confusion. Notice that now you predict on the test set, using the model built on the training set.
These are the numbers:
- The model identified 205 VIPs, of which 41 were successfully identified -
20% positive predictive rate
- The model identified 1,346 non-VIPs, of which 1,319 were successfully identified- 98% negative predictive rate
In order to decide whether a model answers our needs, we can measure additional parameters such as sensitivity, specificity and type I & II errors. In our case, we would prefer type II errors (non-VIP identified as VIP) over type I errors (VIP identified as non-VIP). There are two reasons for this:
- Even though a player hasn’t turned into a VIP, he still meets the criteria of depositing over $200. That means he is stronger than the average player, and should be treated accordingly.
- The "cost" of flagging a player as VIP, even if he won't become one eventually, is small or non-existent. On the other hand, the revenue lost by overlooking a potential VIP and thus losing him as a customer, has an adverse effect on the company’s revenue.
Looking at the numbers, we can see that despite using a simple rule (the first split of the tree), it identifies 60% percent of the players who eventually turned into VIPs.
Keep in mind that this number might go up, since you can now motivate these players to increase their activity.
Successful VIP detection
There’s no doubt that a player’s initial activity (first deposit, first day purchases etc.) can imply his chances of becoming a VIP in the future. Sometimes, surprising connections can be found between future VIPs and certain games in the product, specific sport disciplines, favorite platform or even demographic attributes like origin country. Finding a connection like this can help you flag a potential VIP even on his first day.
At Optimove, the process outlined above helps us discover hidden VIPs and communicate with them in a way that ensures VIPs are nurtured from the very beginning.