Does a negative feature impact score indicate how strongly the feature pushes the target outcome in a particular direction (i.e. higher/lower), or does it instead indicate how much the feature hurts the overall health (i.e. accuracy) of the model by being included?
I would start by checking the Importance score for the variable in the Data page. A value close to zero suggests a model built using that variable itself is unlikely to be of any use, and that should explain the negative feature impact.
If, instead, the importance score is modest, there are at least 3 other areas I personally would explore:
Low sample size
For larger datasets, DataRobot estimates impact based on a limited sample of observations. You may want to check whether DataRobot is using the maximum sample size available to compute feature impact for your project.
Computed independently for each feature, feature impact can be understated for correlated features. As an example, if two features in your model were height but in different units, removing one or the other should result in little to no change in performance. It is probably just fine to leave out one or the other, but probably not both! DataRobot's feature association matrix may give some clues here.
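To make the height example concrete, here is a minimal sketch in plain Python. It is not DataRobot's implementation: the `permutation_impact` helper, the two hand-written "models", and the target are all hypothetical, chosen only so the arithmetic is easy to follow. One model relies on `height_cm` alone; the other splits the same weight evenly across `height_cm` and `height_in`.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_impact(predict, rows, y, feature, seed=0):
    """Error with `feature` shuffled minus error with it intact (hypothetical helper)."""
    rng = random.Random(seed)  # same seed -> same permutation for both models
    baseline = mse(y, [predict(r) for r in rows])
    values = [r[feature] for r in rows]
    rng.shuffle(values)
    shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, values)]
    return mse(y, [predict(r) for r in shuffled]) - baseline

# Synthetic data: the same height recorded in two units.
data_rng = random.Random(42)
heights_cm = [data_rng.gauss(170, 10) for _ in range(300)]
rows = [{"height_cm": h, "height_in": h / 2.54} for h in heights_cm]
y = [0.5 * h for h in heights_cm]  # hypothetical target, proportional to height

def predict_single(r):
    # Stand-in model that relies on one height column only.
    return 0.5 * r["height_cm"]

def predict_split(r):
    # Stand-in model that spreads the same weight over both columns.
    return 0.25 * r["height_cm"] + 0.25 * 2.54 * r["height_in"]

impact_single = permutation_impact(predict_single, rows, y, "height_cm")
impact_split = permutation_impact(predict_split, rows, y, "height_cm")
print(impact_single, impact_split, impact_split / impact_single)
```

In this setup the split model reports roughly a quarter of the impact for `height_cm`, even though height is exactly as important to both models. The other column still carries half of the signal when one is shuffled.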
Perhaps you're skeptical of the idea that permuting features is the right way to go, and some out there would agree (for example, permuting features might result in nonsensical combinations of feature values). Consider a slightly modified version of feature impact: create a featurelist without the feature with negative feature impact, retrain the model with the new featurelist, and compare against your original model to check if there is a material deterioration in performance. This is known as leave-one-covariate-out feature importance.
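The leave-one-covariate-out steps above can be sketched as follows. This is an illustration under stated assumptions, not DataRobot code: the k-nearest-neighbours regressor is just a stand-in for "retrain the model", and the data, column names, and helper functions are hypothetical. In DataRobot itself, the equivalent would be creating the reduced featurelist and retraining from there.

```python
import random

def knn_predict(train_X, train_y, x, k=5):
    # Average the targets of the k closest training rows (squared Euclidean distance).
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return sum(train_y[i] for i in order[:k]) / k

def holdout_error(train_X, train_y, test_X, test_y, cols):
    # "Retrain" on the chosen columns only and score on the holdout rows.
    tr = [[row[c] for c in cols] for row in train_X]
    te = [[row[c] for c in cols] for row in test_X]
    preds = [knn_predict(tr, train_y, x) for x in te]
    return sum((t - p) ** 2 for t, p in zip(test_y, preds)) / len(test_y)

rng = random.Random(0)
names = ["signal", "noise"]  # hypothetical feature names
X = [[rng.random(), rng.random()] for _ in range(300)]
y = [3 * row[0] + rng.gauss(0, 0.1) for row in X]  # only "signal" drives the target
train_X, test_X = X[:200], X[200:]
train_y, test_y = y[:200], y[200:]

all_cols = list(range(len(names)))
full_error = holdout_error(train_X, train_y, test_X, test_y, all_cols)

loco = {}
for j, name in enumerate(names):
    reduced = [c for c in all_cols if c != j]  # featurelist without feature j
    loco[name] = holdout_error(train_X, train_y, test_X, test_y, reduced) - full_error

# Positive: the model gets worse without the feature.
# Near zero or negative: dropping it costs nothing on this holdout.
print(loco)
```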
Hope this helps!
Yes, he has validated my concern, but I would say the overall issue is still pending a broader discussion. I'll accept the solution; however, if any community member has anything to add to the discussion, that would be awesome.
Hi @DREnthusiast - Thanks for asking such great questions here in the community! It looks like IraWatt gave you the answer you needed? If that's right, please hit "Accept as Solution" on his response so others can find it too.
Precisely! That's the interesting statement here.
That is, whether the model presumably did "better" with the feature shuffled than in its original state, indicating that the feature was originally noise at best.
I guess that is still to be determined. I'll leave this question unanswered in the hope that someone with another perspective comes forward and sheds some more light. Nonetheless, you have validated my concerns about the topic!
Much appreciated @IraWatt !
It would be interesting to see what the characteristics of a "negative" impact feature look like. Is it possible to do worse than a list of random numbers? Probably very model dependent.
Decreased relative to the model's performance with that feature in its initial order. Feature impact is computed on a sample of the training data, so the error is measured by how closely the model predicts the actuals with the feature in its correct order vs. shuffled.
DataRobot automatically removes redundant features to a degree. EDA just creates feature lists of good quality (i.e. excluding IDs, columns full of missing values, etc.), and a redundant feature can still be of good quality. When modelling, DataRobot tries different feature lists, which could remove redundant features. Moreover, most modelling algorithms reduce the impact of useless features (though they would perform better without them). However, the docs state that one of the two main purposes of Feature Impact is to identify unimportant or redundant columns. Therefore, it is still a necessary thing to check.
Looking at impact vs. negative impact, the key difference is that with negative impact the model did better when that feature was shuffled, implying the feature is just noise, whereas all features with positive impact presumably had some signal.
Mhm, but when you say "that would mean that the error decreased", decreased relative to what?
I can understand quantifying how much of an increase in error the permutation has led to compared to using the actual values, and I assume this "increase in error" can go either way (higher/lower vs. the actual value). But as for the other side (i.e. negative impact), what would the decrease mean, or be relative to?
Besides, if we assume the feature is redundant, wouldn't DataRobot have picked up on that?
Edit: I've read that it "computes a drop in accuracy that resulted from shuffling", but that still doesn't really explain how it computes a "negative" impact score because you'll almost always have a delta decrease in accuracy. So what characterizes a negative impact vs just an impact?
By default DataRobot uses permutation to calculate feature impact (Generate the Feature Impact chart).
Permutation-based feature impact describes how much the error of a model would increase, based on a sample of the training data, if values in the column/feature are shuffled (feature-impact-methodologies).
If the feature impact is negative, that would mean the error decreased when that feature was shuffled, making it most likely a redundant feature that negatively impacts the overall health of the model.
SHAP-based and Tree-based are two other methods that DataRobot can use (set in advanced options) to calculate feature impact; how they are interpreted is described here in the docs.
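As a rough illustration of how the sign comes about, here is a minimal permutation-impact sketch in plain Python. It is not DataRobot's implementation, and it leaves out any normalization or sampling DataRobot applies. The `permutation_impact` helper, the feature names, and the hand-written `predict` function are hypothetical. The "misleading" feature is deliberately contrived: the stand-in model uses it with the wrong sign, which is one way shuffling can lower the error.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_impact(predict, rows, y, feature, rng):
    """Error with `feature` shuffled minus error with it intact (hypothetical helper)."""
    baseline = mse(y, [predict(r) for r in rows])
    values = [r[feature] for r in rows]
    rng.shuffle(values)
    shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, values)]
    return mse(y, [predict(r) for r in shuffled]) - baseline

data_rng = random.Random(0)
rows = [{"signal": data_rng.gauss(0, 1),
         "misleading": data_rng.gauss(0, 0.5),
         "unused": data_rng.gauss(0, 1)} for _ in range(500)]
# Relationship that actually holds in this evaluation sample.
y = [r["signal"] + r["misleading"] for r in rows]

def predict(r):
    # Stand-in for a fitted model: right about "signal", wrong sign on
    # "misleading", and it ignores "unused" entirely.
    return r["signal"] - r["misleading"]

impacts = {f: permutation_impact(predict, rows, y, f, random.Random(1))
           for f in ("signal", "misleading", "unused")}

# signal     -> positive: shuffling it increases the error
# misleading -> negative: shuffling it decreases the error
# unused     -> exactly zero: predictions do not depend on it
print(impacts)
```

A pure-noise feature that the model gives some weight to would typically score close to zero, and on a small sample that estimate can land on either side of zero by chance.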
Very interesting question; negative impact is not something I have encountered.