Recently I built a binary classification (1 vs 0) model and deployed the highest-ranking model, which was built with a Gradient Boosting classifier. My question is about the signs in the column Explanation_1_strength. Is it correct to say that if the sign of this number is positive, the feature is pushing the record towards the positive class (i.e. 1)? And if the sign is negative, it is pushing the record towards the negative class (i.e. 0)?
Can someone kindly clarify whether I am thinking about this correctly? Thanks in advance.
Yes, you are correct. Explanations 1-3 are ranked by the absolute value of their strengths. A positive value means the feature moves the needle towards the positive class, and a negative value moves it away from the positive class.
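A minimal sketch of that reading, assuming a record's explanations are available as a simple feature-to-strength mapping. The feature names, values, and the dictionary layout are hypothetical and made up for illustration; this is not DataRobot's actual output format.

```python
# Hypothetical explanation strengths for one scored record (feature -> strength).
explanations = {
    "income": -1.2,
    "loan_amount": 0.7,
    "age": 0.3,
    "num_accounts": -0.05,
}


def rank_explanations(expl, top_n=3):
    """Order explanations by absolute strength, strongest first."""
    ranked = sorted(expl.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_n]


def direction(strength):
    """Positive pushes towards class 1, negative towards class 0."""
    if strength > 0:
        return "towards 1"
    if strength < 0:
        return "towards 0"
    return "no effect"


for rank, (feature, strength) in enumerate(rank_explanations(explanations), start=1):
    print(f"Explanation_{rank}: {feature} strength={strength:+.2f} ({direction(strength)})")
```

Note that the ranking uses the absolute value, so a strongly negative feature such as the hypothetical `income` can be Explanation_1 even though it pushes the record towards class 0.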
I appreciate your prompt response. I have one additional question:
In my example, the data record is classified as 0 (i.e. safe), but none of the features have negative coefficients. I am looking at the top five features, and all five have positive coefficients. This looks quite mysterious to me.
I cannot understand why this is happening. Can you kindly clarify?
For a deployed model, you are provided not only the negative and positive class values but also the threshold at which a prediction is assigned the positive class. This threshold would have been chosen during your deployment, typically by evaluating the ROC curve and the true/false positive rates. DataRobot's output response provides the threshold as well as the prediction with the threshold applied. I can have all positive coefficients, yet still not have enough stacked up to reach the positive threshold. For example, I might set a high threshold to be very picky and minimize false positives: I'm very selective about whom I lend money to, and I have a limited amount, so I need a threshold of 0.95 to be met. Even if all the major factors of a score were positive, if the score "only" got up to 0.9, DataRobot would still return the negative class as the predicted value.
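The lending example can be sketched numerically. This is a simplified illustration under stated assumptions: the baseline probability, the strengths, and the idea that strengths add up on the log-odds scale are all hypothetical and not necessarily how DataRobot computes its explanations. Treating a score equal to the threshold as positive is also an assumed convention.

```python
import math


def sigmoid(x: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Map a probability to log-odds."""
    return math.log(p / (1.0 - p))


def predict(base_prob, strengths, threshold):
    """Hypothetical scoring: start from a baseline, add each strength on the
    log-odds scale (an assumption), then apply the classification threshold."""
    score = sigmoid(logit(base_prob) + sum(strengths))
    label = 1 if score >= threshold else 0  # assumed tie-breaking convention
    return score, label


# Hypothetical top-five explanation strengths, all positive.
strengths = [0.8, 0.6, 0.4, 0.3, 0.1]

score, label = predict(base_prob=0.5, strengths=strengths, threshold=0.95)
print(f"score={score:.2f}, predicted class={label}")
# score=0.90, predicted class=0
```

Every contribution pushes the score upwards, yet 0.90 falls short of the 0.95 threshold, so the record is still labelled 0. With a threshold of 0.5 the same record would be labelled 1.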