This page provides a summary of the results from running a lead scoring use case.
Use case understanding
Companies around the globe spend a large amount of resources on acquiring and retaining customers. Lead scoring is essentially the process of creating a model that ranks prospects by their propensity to become customers in the near future.
It used to be that these models were created (if they existed) through subject matter experts who would attach “points” to different actions from customers. For example, a customer looking at the website could be worth +5 points while a customer clicking on an email link could be worth +8 points. In the end, you would have customers with cumulative points attached and you could prioritize communication accordingly.
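The points-based approach described above can be sketched in a few lines of Python. The point values (+5 for a website visit, +8 for an email click) come from the example; the action names, prospect identifiers, and event log are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Points per action. Values are from the example above; action names are hypothetical.
POINTS = {"website_visit": 5, "email_click": 8}


def score_prospects(events):
    """Sum points per prospect and return (prospect, score) pairs, highest score first."""
    totals = defaultdict(int)
    for prospect, action in events:
        totals[prospect] += POINTS.get(action, 0)  # unknown actions add nothing
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical event log: (prospect id, action)
events = [
    ("anna", "website_visit"),
    ("ben", "email_click"),
    ("anna", "email_click"),
    ("ben", "email_click"),
]

ranking = score_prospects(events)
print(ranking)
```

The sorted list is the order in which prospects would be contacted. The weakness of this approach is that the point values themselves are hand-picked rather than learned from data.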
Industries have since shifted to machine learning models for lead scoring because these models are easy to retrain, provide more accurate results, and can find deeper interactions between features.
The dataset we will be using comes from a company that sells online courses to industry professionals.
Each row represents a person who landed on the website, and each column represents some information about this prospect. There are columns such as “lead origin,” “lead source,” and “total time spent on the website.”
This dataset can be found and downloaded through Kaggle.
The target column is “Converted,” which indicates whether or not this prospect bought a course.
To create a lead scoring model using DataRobot, you just need to drag and drop the CSV file onto the new project page in DataRobot and specify “Converted” as the target column (Figure 1).
Figure 1. Initiating Autopilot
Then click Start and let the Autopilot process run to completion.
NOTE: The “Tags” column was a potential candidate for target leakage so it was removed from the modeling procedure. (This article explains more about target leakage.)
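If you prefer to remove a suspected leakage column from the file before uploading it, a minimal sketch using only the Python standard library could look like the following. The column names follow this article, but the three sample rows and the drop_columns helper are made up for illustration; in practice you would read the downloaded CSV file instead of the inline string.

```python
import csv
import io

# Hypothetical three-row sample; column names follow the article, values are made up.
SAMPLE_CSV = """Lead Origin,Total Time Spent on Website,Tags,Converted
API,120,Ringing,0
Landing Page Submission,950,Closed,1
API,40,Ringing,0
"""


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = drop_columns(rows, {"Tags"})  # remove the suspected leakage column
print(list(cleaned[0].keys()))
```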
Autopilot takes some time to complete, depending on the number of workers available to you. You can then review the model-building results, starting with Feature Impact.
Figure 2. Feature Impact
As shown in Figure 2, it seems that “Lead Quality” and “Total Time Spent on Website” are the most important predictors for the best model trained by DataRobot.
We can now take a closer look at how the specific values of these two features affect our target outcome. Figure 3 shows the impact of “Lead Quality.”
Figure 3. Feature Effects—Lead Quality
As shown, for specific values of Lead Quality like “High in Relevance,” the probability of a prospect actually buying an online course is far greater than when the lead quality equals “Not sure.”
Figure 4 shows the impact of “Total Time Spent on Website.”
Figure 4. Feature Effects—Total Time Spent on Website
For “Total Time Spent on Website,” it seems that the longer someone spends on the website, the higher the probability that they will actually buy an online course. There are some diminishing returns, though: after 800 seconds, the probability does not increase further.
Figure 5. Hotspot insights
Hotspots represent simple rules with high predictive performance. These rules are good predictors for the data and can easily be translated and implemented as business rules. Red dots show rules that correspond to a high conversion rate, and we can see one such rule in Figure 5.
Subject matter experts could take these rules and design a website based on them. For example, if a rule tells us that people who stay longer than 300 seconds tend to buy an online course, we could try to keep prospects occupied on the website for at least this long and see if that has any impact on conversion.
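Before acting on a rule like this, it can help to compare the conversion rate among prospects who match the rule against the overall rate. The sketch below does that for the 300-second example; the six prospect records and the field names are hypothetical, and this is not how DataRobot derives its Hotspots, only a way to sanity-check a single rule.

```python
def rule_conversion_rate(prospects, min_seconds):
    """Conversion rate among prospects whose time on site exceeds min_seconds.

    Returns None when no prospect matches the rule.
    """
    matched = [p for p in prospects if p["seconds_on_site"] > min_seconds]
    if not matched:
        return None
    return sum(p["converted"] for p in matched) / len(matched)


# Hypothetical prospects; field names and values are made up for illustration.
prospects = [
    {"seconds_on_site": 50, "converted": 0},
    {"seconds_on_site": 120, "converted": 0},
    {"seconds_on_site": 280, "converted": 1},
    {"seconds_on_site": 350, "converted": 1},
    {"seconds_on_site": 600, "converted": 1},
    {"seconds_on_site": 900, "converted": 0},
]

overall = sum(p["converted"] for p in prospects) / len(prospects)
rule_rate = rule_conversion_rate(prospects, 300)
print(f"overall: {overall:.2f}, rule (> 300 s): {rule_rate:.2f}")
```

A rule whose rate is clearly above the overall rate is a candidate for a business rule. Note that such a comparison shows association only; whether keeping prospects on the site longer actually raises conversion would still need to be tested.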
If you’re a licensed DataRobot customer, search the in-app documentation for Feature Impact or Feature Effects. You can also search the in-app documentation for DataRobot insights, then locate “Using Hotspot insights” for more information.