Multicollinearity in Regression

Blue LED

Does DataRobot address the problem of multicollinear features and eliminate them before building a model?

Data Scientist

Hi @devang ,

 

That's a great question! In general, the accuracy of most of the algorithms we use (neural networks, trees, etc.) is not impacted by multicollinear features. The main benefit of cleaning up multicollinear features for these algorithms is that it can improve interpretability with regard to feature impact (permutation importance), feature effects (partial dependence), and so on.
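To see why, here's a minimal sketch in plain NumPy (a hand-rolled permutation importance on an ordinary least-squares fit, not DataRobot's implementation): an exact duplicate of a feature leaves predictive accuracy untouched but dilutes each copy's apparent importance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(scale=0.1, size=n)

def permutation_importance(X, y, n_repeats=10, seed=1):
    """Mean increase in MSE when each column is shuffled (toy sketch)."""
    A = np.c_[np.ones(len(X)), X]                  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # min-norm OLS fit
    base_mse = np.mean((y - A @ coef) ** 2)
    perm_rng = np.random.default_rng(seed)
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = perm_rng.permutation(Xp[:, j])   # break column j
            Ap = np.c_[np.ones(len(Xp)), Xp]
            drops.append(np.mean((y - Ap @ coef) ** 2) - base_mse)
        importances.append(np.mean(drops))
    return np.array(importances)

solo = permutation_importance(x1[:, None], y)          # x1 on its own
dup = permutation_importance(np.c_[x1, x1.copy()], y)  # x1 plus an exact duplicate
# Shuffling either duplicated column hurts far less than shuffling x1 did
# when it stood alone, because its twin still carries the same signal --
# so both copies look less important, even though the fit is unchanged.
```

The same dilution effect is why feature impact and partial dependence plots get easier to read after redundant features are removed.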

For linear models, of course, eliminating multicollinear features is key. DataRobot automatically performs feature selection and attempts to identify redundant features, which will certainly help. Once that is done, you can also look at the "Feature Association" matrix, which shows the mutual information or Cramér's V for all pairs of variables (and also clusters similar variables). This should help you manually eliminate any multicollinear variables that DataRobot didn't automatically detect.
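If you want to sanity-check a categorical pair yourself, Cramér's V is easy to compute from scratch. Here's a self-contained NumPy sketch (my own illustration, not DataRobot's code); the variable names are made up for the example:

```python
import numpy as np

def cramers_v(x, y):
    """Cramer's V between two categorical arrays: 0 = independent, 1 = perfect association."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)            # contingency table of observed counts
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1)))

colour = np.array(["red", "red", "blue", "blue"] * 50)
proxy = np.where(colour == "red", "S", "L")      # a perfect proxy for colour
unrelated = np.array(["S", "L", "S", "L"] * 50)  # carries no information about colour

cramers_v(colour, proxy)      # -> 1.0: redundant pair, a candidate for removal
cramers_v(colour, unrelated)  # -> 0.0: no association
```

Pairs with a value near 1 are the ones worth pruning before fitting a linear model.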

 

Does that answer your question?

Duncan