Lead Scoring

This page summarizes the results of running a lead scoring use case.

Use case understanding

Companies around the globe spend substantial resources on acquiring and retaining customers. Lead scoring is the process of building a model that ranks prospects by their propensity to become customers in the near future.

Historically, these models (where they existed) were built by subject matter experts who attached “points” to different customer actions. For example, a customer looking at the website could be worth +5 points, while a customer clicking on an email link could be worth +8 points. In the end, each customer would have a cumulative point total, and you could prioritize communication accordingly.
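As a minimal sketch of the points approach, the snippet below sums points per customer and ranks customers by total. The +5 and +8 values come from the example above; the action names, customer IDs, and the `score_leads` helper are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Point values from the example above; the action names are illustrative.
ACTION_POINTS = {
    "website_visit": 5,
    "email_link_click": 8,
}

def score_leads(events):
    """Sum points per customer from (customer_id, action) pairs."""
    totals = defaultdict(int)
    for customer_id, action in events:
        # Actions without an assigned value contribute no points.
        totals[customer_id] += ACTION_POINTS.get(action, 0)
    # Highest cumulative score first, so outreach can be prioritized.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up event log for demonstration.
events = [
    ("alice", "website_visit"),
    ("bob", "email_link_click"),
    ("alice", "email_link_click"),
]
ranked = score_leads(events)
```

The weakness of this approach is visible in the code: every point value is a hand-picked constant, which is what a trained model replaces.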

Industries have since shifted to machine learning models for lead scoring because these models are easy to retrain, provide more accurate results, and can find deeper interactions between features.

Python sample and data

We've created a notebook for an end-to-end demonstration of how Python can be used jointly with DataRobot to produce a lead scoring model. This notebook and the related dataset are available at the DataRobot Community GitHub.


The dataset we will be using comes from a company that sells online courses to industry professionals.

Each row represents a person who landed on the website, and each column represents some information about this prospect. There are columns such as “lead origin,” “lead source,” and “total time spent on the website.”

(The dataset is available from the DataRobot Community GitHub.) 

The target column is “Converted,” which indicates whether or not this prospect bought a course.

Initiating Autopilot

To create a lead scoring model using DataRobot, drag and drop the CSV file onto the new project page in DataRobot and specify “Converted” as the target column (Figure 1).

Figure 1. Initiating Autopilot

Then click Start and let the Autopilot process run to completion.

NOTE: The “Tags” column was a potential source of target leakage, so it was removed from the modeling procedure. (This article explains more about target leakage.)
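The article does not say how the “Tags” column was removed. One simple option, sketched here under that assumption, is to drop the column from the CSV before uploading it. The `drop_column` helper and the sample rows are hypothetical; only the “Tags” and “Converted” column names come from the text.

```python
import csv
import io

def drop_column(csv_text, column):
    """Return CSV text with the named column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped column in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Made-up sample rows; the real dataset has many more columns.
sample = (
    "Lead Source,Tags,Converted\n"
    "source_a,tag_x,1\n"
    "source_b,tag_y,0\n"
)
cleaned = drop_column(sample, "Tags")
```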

Feature Impact

Autopilot takes some time to complete, depending on the number of workers available to you. Once it finishes, Feature Impact is a good place to start reviewing the model building results.

Figure 2. Feature Impact

As shown in Figure 2, it seems that “Lead Quality” and “Total Time Spent on Website” are the most important predictors for the best model trained by DataRobot.
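This article does not describe how Feature Impact is calculated. One widely used, model-agnostic way to rank features is permutation importance: shuffle one feature's values and measure how much the model's accuracy drops. The sketch below illustrates that general technique only. The toy model, feature names, and data are all hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, feature, seed=0):
    """Accuracy drop after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return baseline - accuracy(model, permuted, labels)

def toy_model(row):
    # Hypothetical classifier: uses time on site, ignores page views.
    return 1 if row["time_on_site"] > 300 else 0

rows = [
    {"time_on_site": 100, "page_views": 3},
    {"time_on_site": 200, "page_views": 9},
    {"time_on_site": 400, "page_views": 1},
    {"time_on_site": 500, "page_views": 7},
]
labels = [toy_model(r) for r in rows]  # labels the toy model fits perfectly

time_importance = permutation_importance(toy_model, rows, labels, "time_on_site")
views_importance = permutation_importance(toy_model, rows, labels, "page_views")
```

A feature the model never reads, such as `page_views` here, always scores zero, while a feature the model depends on can only lose accuracy when shuffled.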

Feature Effects

We can now take a closer look at how the specific values of these two features affect our target outcome. Figure 3 shows the impact of “Lead Quality.”

Figure 3. Feature Effects: Lead Quality

As shown, for specific values of Lead Quality like “High in Relevance,” the probability of a prospect actually buying an online course is far greater than when the lead quality equals “Not sure.”

Figure 4 shows the impact of “Total Time Spent on Website.”

Figure 4. Feature Effects: Total Time Spent on Website

For "Total Time Spent on Website," the longer someone spends on the website, the higher the probability that they will buy an online course. The returns diminish, though: after 800 seconds, the probability does not increase further.
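A curve like the one described above can be produced with partial dependence, a standard technique: fix the feature at each value on a grid for every row, then average the model's predictions. Whether DataRobot computes Feature Effects exactly this way is not stated here, so treat the following as a sketch of the general idea. The toy model (with a plateau placed at 800 seconds to mirror the text), the `source` feature, and the data are hypothetical.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction over all rows with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve

def toy_model(row):
    # Hypothetical: probability rises with time on site and flattens at 800 s.
    base = 0.1 if row["source"] == "ad" else 0.2
    return base + 0.5 * min(row["time_on_site"], 800) / 800

rows = [
    {"source": "ad", "time_on_site": 120},
    {"source": "referral", "time_on_site": 950},
]
curve = partial_dependence(toy_model, rows, "time_on_site", [0, 400, 800, 1200])
```

In this toy setup the averaged prediction rises between 0 and 800 seconds and is identical at 800 and 1200 seconds, which is the diminishing-returns shape the figure shows.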

Hotspot insights

Figure 5. Hotspot insights

Hotspots represent simple rules with high predictive performance. These rules are good predictors of the target and can easily be translated into business rules. Red dots mark rules that correspond to a high conversion rate; one such rule is visible in Figure 5.

Subject matter experts could take these rules and design a website based on them. For example, if a rule tells you that people who stay longer than 300 seconds tend to buy an online course, you could try to keep prospects engaged on the website for at least that long and see whether it has any impact on conversion.
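Before acting on a rule like this, it helps to check how many prospects it covers and how their conversion rate compares with the overall rate. The sketch below does that for the 300-second example. The `evaluate_rule` helper, the `time_on_site` field name, and the sample rows are hypothetical; only the “Converted” target name comes from the dataset description.

```python
def conversion_rate(rows):
    """Share of rows with Converted == 1 (0.0 for an empty list)."""
    return sum(r["Converted"] for r in rows) / len(rows) if rows else 0.0

def evaluate_rule(rows, rule):
    """Compare conversion among rows matching `rule` with the overall rate."""
    matched = [r for r in rows if rule(r)]
    return {
        "coverage": len(matched) / len(rows),  # assumes rows is non-empty
        "rule_rate": conversion_rate(matched),
        "overall_rate": conversion_rate(rows),
    }

# Made-up prospects for demonstration.
rows = [
    {"time_on_site": 50, "Converted": 0},
    {"time_on_site": 320, "Converted": 1},
    {"time_on_site": 610, "Converted": 1},
    {"time_on_site": 90, "Converted": 0},
    {"time_on_site": 450, "Converted": 0},
]
result = evaluate_rule(rows, lambda r: r["time_on_site"] > 300)
```

A rule is worth turning into a business rule when `rule_rate` is well above `overall_rate` and `coverage` is large enough to matter.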

More information

If you’re a licensed DataRobot customer, search the in-app documentation for Feature Impact or Feature Effects. You can also search the in-app documentation for DataRobot insights, then locate “Using Hotspot insights” for more information.

DataRobot Employee

Hi!

It looks like the link to the Kaggle lead scoring dataset may be broken. Do you know where I can find the new link?

Community Team

All set! Thanks Craig - the link included an unnecessary and "dastardly" period.

Version history: revision 12 of 12, last updated 05-15-2020 01:22 PM.