
In this exercise, we use a strategy called “lead scoring” to predict the probability that a prospect will become a customer. This is a binary classification problem: each prospect either buys the product or does not.

You can also do this project programmatically with R or Python. The notebooks for this post are on our Community GitHub page: the R notebooks are here, and the Python notebooks are here.

### Dataset

The dataset was taken from the UCI Machine Learning Repository. It was published by Sérgio Moro and colleagues in a 2014 paper [Moro et al., 2014]. The dataset is attached to this article; see below.

[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

This dataset contains records from a direct telemarketing campaign run by a Portuguese bank. The target is the feature y, where “yes” means that the prospect purchased the product being offered and “no” means that they did not.

Figure 1. Data
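If you are following along in Python, the first step is to turn the y column into a 0/1 label. The sketch below uses only the standard library; it assumes the semicolon-delimited layout of the UCI download of bank-full.csv, and the function names `load_target` and `positive_rate` are hypothetical helpers, not part of any platform API.

```python
import csv
import io
from typing import List


def load_target(csv_text: str, target: str = "y", delimiter: str = ";") -> List[int]:
    """Read the target column and encode "yes" as 1 and "no" as 0."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    labels = []
    for row in reader:
        value = row[target].strip().lower()
        if value not in ("yes", "no"):
            raise ValueError(f"unexpected target value: {value!r}")
        labels.append(1 if value == "yes" else 0)
    return labels


def positive_rate(labels: List[int]) -> float:
    """Share of prospects who purchased (label 1)."""
    return sum(labels) / len(labels)
```

Checking `positive_rate` before modeling tells you how imbalanced the two classes are, which matters when you later read accuracy figures.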

### Start Projects

For the setup, import the dataset (bank-full.csv) and set the target as y. Change the modeling mode to Quick and press the Start button.

### Select Model to Evaluate

Select the version of the top model that was trained on 80% of the data (the 80% sample size) to evaluate, and star it so you can keep track of it. Then change the optimization metric from LogLoss to AUC, which is a little easier to interpret because it is bounded between 0 and 1. Now take a look at the Evaluate > ROC Curve tab.
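One way to see why AUC is bounded between 0 and 1 is that it equals the probability that a randomly chosen buyer is scored higher than a randomly chosen non-buyer. The sketch below computes it directly from that definition (ties count as half); the function name is hypothetical and the pairwise loop is for clarity, not speed.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

A value of 1.0 means every buyer is ranked above every non-buyer, and 0.5 is what random scores would give on average. For large datasets a sort-based computation is preferable to this quadratic loop.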

As you can see, the accuracy is quite high: over 90%. If you look at the confusion matrix, you will find many True Negatives (correct rejections) and more than three times as many True Positives (hits) as False Positives (false alarms). Overall, this model looks pretty good.

Figure 2. ROC Curve
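The confusion matrix counts depend on the probability cutoff used to turn scores into yes/no predictions. The sketch below assumes a 0.5 cutoff for illustration (the platform's own threshold may differ) and uses hypothetical helper names. It also includes a majority-class baseline, a quick sanity check for accuracy when one class dominates.

```python
from typing import NamedTuple, Sequence


class Confusion(NamedTuple):
    tp: int  # True Positives (hits)
    fp: int  # False Positives (false alarms)
    tn: int  # True Negatives (correct rejections)
    fn: int  # False Negatives (misses)


def confusion_matrix(labels: Sequence[int], probs: Sequence[float],
                     threshold: float = 0.5) -> Confusion:
    """Count outcomes after classifying rows with probability >= threshold as 'yes'."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def accuracy(cm: Confusion) -> float:
    """Share of rows classified correctly."""
    return (cm.tp + cm.tn) / (cm.tp + cm.fp + cm.tn + cm.fn)


def majority_baseline(labels: Sequence[int]) -> float:
    """Accuracy of a model that always predicts the more common class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)
```

Comparing `accuracy` with `majority_baseline` shows how much of the headline number comes from the model rather than from class imbalance.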

### Model Insights

After you've evaluated your model, take a look at the Understand > Feature Impact tab. Here you can see how much each feature contributes to the model's predictions. Feature Impact is calculated using permutation importance, which is a model-agnostic approach: it works with any model, because it only needs the model's predictions. You can see that duration, month, and day are the top three features.

Figure 3. Feature Impact
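To make the idea behind permutation importance concrete, here is a minimal sketch: shuffle one feature's column, re-score, and record how much the metric drops. It treats the model as a black-box `predict` function, which is what makes the approach model-agnostic. The helper names and the accuracy metric are illustrative assumptions, not the platform's implementation.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def accuracy_metric(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Share of rows where the predicted class equals the true class."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def permutation_importance(
    predict: Callable[[Row], int],
    rows: List[Row],
    labels: Sequence[int],
    features: Sequence[str],
    metric: Callable[[Sequence[int], Sequence[int]], float] = accuracy_metric,
    seed: int = 0,
) -> Dict[str, float]:
    """Drop in `metric` when each feature's column is shuffled (bigger = more important)."""
    rng = random.Random(seed)
    baseline = metric(labels, [predict(r) for r in rows])
    importance: Dict[str, float] = {}
    for feature in features:
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        permuted = [{**r, feature: v} for r, v in zip(rows, column)]
        importance[feature] = baseline - metric(labels, [predict(r) for r in permuted])
    return importance
```

A feature the model ignores scores exactly zero, because shuffling it leaves every prediction unchanged. In practice the shuffle is usually repeated several times and averaged to reduce noise.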

You can also go to the Understand > Feature Effects tab. This tool shows how changes in the value of each feature affect the predicted target. For example, duration has a positive effect on the target: as duration increases, so does the likelihood of purchasing.

Figure 4. Feature Effects
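A curve like the one on this tab can be produced with partial dependence: force one feature to each value on a grid for every row, and average the model's predictions. The sketch below is one standard way to compute such a curve, not necessarily the platform's exact method; `toy_model` is a hypothetical logistic model whose coefficients are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def partial_dependence(predict: Callable[[Row], float], rows: List[Row],
                       feature: str, grid: Sequence[float]) -> List[Tuple[float, float]]:
    """Mean prediction when `feature` is set to each grid value in every row."""
    curve = []
    for value in grid:
        preds = [predict({**r, feature: value}) for r in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


def toy_model(row: Row) -> float:
    """Hypothetical logistic model; coefficients are invented for this example."""
    z = 0.01 * row["duration"] - 0.02 * row["age"] - 1.0
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    sample = [{"duration": 100.0, "age": 30.0}, {"duration": 400.0, "age": 55.0}]
    for value, mean_prob in partial_dependence(toy_model, sample, "duration", [0, 200, 400, 600]):
        print(f"duration={value}: mean predicted probability {mean_prob:.3f}")
```

Because the toy model gives duration a positive coefficient, the averaged curve rises with duration, which is the kind of pattern described above.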

Finally, you can look at the local impact of features on individual predictions in the Understand > Prediction Explanations tab. Here you can see which features drive the predictions for a sample of rows from the top and bottom ends of the prediction distribution. These explanations should conceptually align with what you saw on the Feature Effects tab.

Figure 5. Prediction Explanations
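The sketch below illustrates the two ingredients of this view: picking rows from the top and bottom of the prediction distribution, and measuring each feature's local effect on one row. The local-effect method here (reset one feature to a baseline value and see how the prediction changes) is a deliberately simplified stand-in chosen for clarity; the platform's own explanation algorithm may differ. The helper names and the `baseline` values (for example, training-set means) are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def extreme_rows(probs: Sequence[float], k: int) -> Tuple[List[int], List[int]]:
    """Indices of the k highest-scored rows and the k lowest-scored rows."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    return order[::-1][:k], order[:k]


def local_effects(predict: Callable[[Row], float], row: Row,
                  baseline: Row) -> Dict[str, float]:
    """Change in this row's prediction when each feature is reset to its baseline value."""
    full = predict(row)
    return {f: full - predict({**row, f: baseline[f]}) for f in baseline}
```

A large positive effect means the row's value for that feature pushes its prediction up relative to the baseline, which you can compare with the direction shown on the Feature Effects tab.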

### Making Predictions

You can make predictions on the Predict > Make Predictions tab. From here, you can upload a new dataset that includes the features from your training dataset but not the target. Then, when you click Compute, the platform uses the model you created to score those new rows of data. You can then download the predictions directly from the GUI.

Figure 6. Make Predictions
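Before uploading a scoring file, it can help to check that it matches what the model expects. The sketch below is a hypothetical pre-flight step, not a platform API: it verifies that every training feature is present and keeps only those columns, so a stray target column or extra fields never reach the model. `predict` stands in for whatever model you are scoring with.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


def prepare_scoring_rows(rows: List[Row], features: Sequence[str]) -> List[Row]:
    """Check each row has every training feature; drop everything else (including the target)."""
    prepared = []
    for i, row in enumerate(rows):
        missing = [f for f in features if f not in row]
        if missing:
            raise ValueError(f"row {i} is missing features: {missing}")
        prepared.append({f: row[f] for f in features})
    return prepared


def score(predict: Callable[[Row], float], rows: List[Row],
          features: Sequence[str]) -> List[float]:
    """Score new rows after validating and trimming them."""
    return [predict(r) for r in prepare_scoring_rows(rows, features)]
```

Failing loudly on a missing feature is a design choice: it surfaces schema mismatches before scoring instead of producing predictions from incomplete inputs.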


Computer Board

Hi @lhaviland, I believe the links to the R and Python notebooks are broken. Any chance to get access to those?

Community Team

Okay @pipaloff, it looks like the R link works but the Python link is broken. Can you try the R link again and see if you get there? I'll track down the Python link. Thank you for bringing it to my attention!

Community Team

Again, thanks for letting me know.

Computer Board

Great, thanks! Appreciate the follow up
