If two features change predictions in a similar way, DataRobot recognizes them as correlated and identifies the feature with lower feature impact as redundant.
How do we quantify or measure "similar way"?
If two features are highly correlated, their prediction differences (prediction before feature shuffle minus prediction after feature shuffle) should also be correlated, so the prediction difference can be used to evaluate pairwise feature correlation. For example, a pair of highly correlated features is selected first; the feature with the lower feature impact in that pair is then identified as the redundant feature.
Do we consider two features redundant when their prediction differences are the same, or within -x% and +x% of each other?
We look at the correlation coefficient between the prediction differences. If it is above a certain threshold, we call the less important feature (according to the model's feature impact) redundant.
1. Calculate the prediction difference before and after the feature shuffle (pred_diff[i] = pred_before[i] - pred_after[i])
2. Calculate pairwise feature correlation (top 50 features, according to model feature impact) based on pred_diff
3. Identify redundant features (high correlation based on our threshold) and test that removing them does not affect accuracy significantly.
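
The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not DataRobot's implementation: the function names, the `predict` callable, the `impact` array, and the 0.85 threshold are all hypothetical, and the sketch reuses one row permutation for every feature so that the per-feature differences are comparable. The final accuracy check from step 3 is left out.

```python
import numpy as np

def prediction_differences(predict, X, rng):
    """Step 1: pred_before - pred_after, one column per shuffled feature."""
    pred_before = predict(X)
    # Assumption of this sketch: the same row permutation is used for every feature.
    perm = rng.permutation(X.shape[0])
    pred_diff = np.empty(X.shape, dtype=float)
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        X_shuffled[:, j] = X[perm, j]  # shuffle only feature j
        pred_diff[:, j] = pred_before - predict(X_shuffled)
    return pred_diff

def find_redundant(pred_diff, impact, threshold=0.85, top_n=50):
    """Steps 2-3: correlate pred_diff columns, flag the lower-impact feature of each pair."""
    # Feature indices sorted by impact, highest first, limited to the top_n features.
    top = np.argsort(np.asarray(impact))[::-1][:top_n]
    corr = np.corrcoef(pred_diff[:, top], rowvar=False)
    redundant = set()
    for a in range(len(top)):
        for b in range(a + 1, len(top)):
            # top[b] never has higher impact than top[a] because of the sort order.
            if corr[a, b] > threshold:
                redundant.add(int(top[b]))
    return sorted(redundant)

# Toy data: feature 1 is a noisy copy of feature 0, feature 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = x0 + 0.05 * rng.normal(size=2000)
x2 = rng.normal(size=2000)
X = np.column_stack([x0, x1, x2])

def toy_predict(data):
    # Stand-in for a fitted model's predict function.
    return data[:, 0] + data[:, 1] + 0.5 * data[:, 2]

pred_diff = prediction_differences(toy_predict, X, rng)
toy_impact = [3.0, 2.0, 1.0]  # made-up impact scores for the three features
redundant = find_redundant(pred_diff, toy_impact)
print(redundant)
```

On this toy data, feature 1 is the one flagged, because its prediction differences track those of feature 0 and its (made-up) impact is lower. Whether an actual removal is safe would still need the accuracy check from step 3.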