Does DataRobot address the problem of multicollinear features and eliminate them before building a model?
Hi @devang ,
That's a great question! In general, the accuracy of most of the algorithms we use (neural networks, trees, etc.) is not impacted by multicollinear features. The only advantage of cleaning up multicollinear features for these algorithms is that it can improve interpretability with regard to feature impact (permutation importance), feature effects (partial dependence), etc.
Obviously, for linear models, eliminating multicollinear features is key. DataRobot automatically performs feature selection and attempts to identify redundant features, which helps here. Once that is done, you can also look at the "Feature Association" matrix, which shows the mutual information or Cramér's V for all pairs of variables (and also clusters similar variables). This should help you manually eliminate any multicollinear variables that DataRobot didn't automatically detect.
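If you want to sanity-check a pair of categorical features yourself, here is a minimal sketch of what a Cramér's V screen could look like. To be clear, this is not DataRobot's implementation or API: the function names, the toy columns, and the 0.9 threshold are all my own assumptions for illustration, and it uses the plain (uncorrected) form of Cramér's V.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0  # a constant column carries no association
    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, v) for column pairs whose V meets the threshold."""
    flagged = []
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        v = cramers_v(xs, ys)
        if v >= threshold:
            flagged.append((a, b, round(v, 3)))
    return flagged


# Hypothetical toy data: "state" and "state_code" encode the same information.
data = {
    "state": ["NY", "CA", "NY", "TX", "CA", "TX"],
    "state_code": [36, 6, 36, 48, 6, 48],
    "plan": ["a", "a", "b", "b", "a", "b"],
}
print(redundant_pairs(data))
```

On this toy data only the state / state_code pair gets flagged, so you would keep one of the two and drop the other before fitting a linear model. For numeric features you would want a different measure (correlation or mutual information), which this sketch does not cover.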
Does that answer your question?
Duncan