Model Explainability with SHAP in DataRobot

Community Team

(Part of a model-building learning session series.)

State-of-the-art machine learning models have a reputation for being accurate but difficult to interpret. DataRobot’s explainable AI features help you understand not just what your model predicts, but how it arrives at its predictions. In this learning session we look at SHAP (SHapley Additive exPlanations) values for both Feature Impact and Prediction Explanations, newly integrated into DataRobot in release 6.1. SHAP is a model-explanation framework based on Shapley values from game theory; it tells you how much each feature contributes to each individual prediction. A wide variety of top-performing DataRobot blueprints now integrate SHAP, including linear models, trees and tree ensembles, and multistage combinations of these.

No matter how you interact with models, you will get some useful insights from SHAP values. Model developers can learn which features matter, which helps focus their development efforts. Model evaluators and regulators can sanity-check predictions against domain knowledge and business rules. Model consumers can learn which features were most important in individual cases, and use that as a guide for actionable next steps. Regardless of your role, seeing how the model makes its predictions can help you understand and trust it.

Hosts

  • Mark Romanowsky (DataRobot, Data Scientist—Explainable AI)
  • Rajiv Shah (DataRobot, Data Scientist—Customer Success)
  • Jack Jablonski (DataRobot, AI Success Manager)

More Information

  • DataRobot Community: SHAP Insights
  • If you’re a licensed DataRobot customer, search the in-app Platform Documentation for SHAP-based Prediction Explanations and SHAP reference. Within the SHAP reference topic, see the section "Additivity in Prediction Explanations" to learn why SHAP values sometimes do not add up exactly to the prediction.
5 Comments
DataRobot Employee

Q: Is the base value constant for the models or does it change for different prediction sets?

A: The base value is a property of a trained model: it is the model’s prediction averaged over the data used to train it. It won’t change across different prediction sets. However, if you retrain the model on a larger amount of training data, the result is a new and different model, which will have a different base value (see the sketch below).
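For intuition, here is a minimal sketch of the base value and of SHAP additivity using the open-source shap package with a scikit-learn model. This is outside DataRobot and purely illustrative; DataRobot computes the equivalent quantities for its own blueprints.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor().fit(X, y)

# The base value (expected_value) is the mean model prediction
# over the data the model was trained on.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

print("base value:     ", explainer.expected_value)
print("mean prediction:", model.predict(X).mean())

# Additivity: base value + sum of a row's SHAP values == that row's prediction.
reconstructed = explainer.expected_value + shap_values.sum(axis=1)
print(np.allclose(reconstructed, model.predict(X)))  # True
```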

DataRobot Employee

Q: SquareMeters is largely a continuous variable. Can we produce pseudo factor curves based on Shapley values? e.g., 100m = -20, 500m = +200, 1500m = +500 (from baseline)

A: You could do something like this by binning the variable and computing the mean SHAP value for each bin (see the sketch below). However, if this is what you want, consider using a GA2M model instead: it follows a similar line of thinking and can also reveal two-variable interaction effects. And yes, we support SHAP insights for GA2M models!
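A minimal sketch of that binning approach using pandas. The feature values and per-row SHAP values here are synthetic stand-ins; in practice you would use one column of a real SHAP matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: per-row feature values and the matching per-row
# SHAP values for that single feature (one column of a SHAP matrix).
rng = np.random.RandomState(0)
square_meters = rng.uniform(50, 2000, size=1000)
shap_for_feature = 0.5 * (square_meters - square_meters.mean()) + rng.normal(scale=20, size=1000)

# Bin the feature and average the SHAP values within each bin: a pseudo
# factor curve showing the typical contribution (from baseline) per bin.
df = pd.DataFrame({"SquareMeters": square_meters, "shap": shap_for_feature})
df["bin"] = pd.cut(df["SquareMeters"], bins=10)
curve = df.groupby("bin", observed=True)["shap"].mean()
print(curve)
```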


DataRobot Employee

Q: Can DeepSHAP be used with an ANN used as a binary classifier (not images, etc.)?

A: Yes, DeepSHAP can be used this way. In fact, you may see Keras models on the DataRobot Leaderboard in a binary classification or regression project, and the SHAP insights for these models are based on DeepSHAP.
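For reference, here is roughly what DeepSHAP looks like with the open-source shap package on a small Keras binary classifier trained on tabular data. This is a sketch outside DataRobot; exact behavior depends on your shap and TensorFlow versions.

```python
import numpy as np
import shap
import tensorflow as tf

# A small Keras binary classifier on tabular data (synthetic, for illustration).
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 8)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=5, verbose=0)

# DeepSHAP: a background sample approximates the expectation over the data.
background = X[rng.choice(len(X), 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X[:10])  # explanations for 10 rows
print(np.array(shap_values).shape)
```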


DataRobot Employee

Q: Can we surface the SHAP explanations via the DataRobot API at runtime (for use in an app UI that consumes the DataRobot models)?

A: Yes, you can access the SHAP explanations via our API (see the sketch below). This makes it easy to incorporate explanations into web applications, for example.
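As a rough illustration, retrieving SHAP explanations with the datarobot Python client might look like the following. The SHAP-specific calls (ShapMatrix.create, get_as_dataframe) are written from memory and may differ across client versions, so treat them as assumptions and confirm against the Platform Documentation.

```python
import datarobot as dr

# Connect to DataRobot (placeholder endpoint and token).
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

project = dr.Project.get("YOUR_PROJECT_ID")
model = dr.Model.get(project.id, "YOUR_MODEL_ID")

# Upload the rows to be explained, then request their SHAP matrix.
dataset = project.upload_dataset("rows_to_explain.csv")
shap_job = dr.ShapMatrix.create(project.id, model.id, dataset.id)  # assumed signature
shap_matrix = shap_job.get_result_when_complete()                  # standard client job pattern

# One SHAP value per feature per row, ready to render in an app UI.
print(shap_matrix.get_as_dataframe().head())                       # assumed method
```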


DataRobot Employee

Q: Does SHAP or XEMP handle categorical predictions, or only quantitative ones like pricing?

A: If you are asking about categorical target variables, as in a binary classification project, the answer for both SHAP and XEMP is "yes". Both methods show you which features matter most in determining the predicted probability of belonging to a particular class. Note that you may need to include the effect of the logistic function to demonstrate additivity in the SHAP values (see the sketch after this answer).

If you are asking about categorical features, like "US state" as an input column, the answer for both SHAP and XEMP is again "yes". Both methods handle all standard input variable types and preprocessing steps.
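To make the logistic-function point concrete, here is a sketch using the open-source shap package with a scikit-learn classifier (outside DataRobot, for illustration): tree SHAP values for a classifier add up in log-odds space, and applying the logistic function to that sum recovers the predicted probability.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary classification data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# For classifiers, tree SHAP values add up in log-odds (margin) space.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

log_odds = explainer.expected_value + shap_values.sum(axis=1)
prob = 1.0 / (1.0 + np.exp(-log_odds))  # logistic function recovers probability

print(np.allclose(prob, model.predict_proba(X)[:, 1]))  # True
```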