Multicollinearity in Regression

Does DataRobot address the problem of multicollinear features and eliminate them before building a model?

Accepted Solutions

Hi @devang,

That's a great question! In general, the accuracy of most of the algorithms we use (neural networks, trees, etc.) is not affected by multicollinear features. For these algorithms, the main advantage of cleaning up multicollinear features is better interpretability: feature impact (permutation importance), feature effects (partial dependence), and similar insights are easier to read when redundant features are not splitting the credit between them.

For linear models, on the other hand, eliminating multicollinear features is key. DataRobot automatically performs feature selection and attempts to identify redundant features, which helps. Once that is done, you can also look at the "Feature Association" matrix, which shows the mutual information or Cramér's V for all pairs of variables (and also clusters similar variables). This should help you manually eliminate any multicollinear variables that DataRobot didn't automatically detect.
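
If you want to sanity-check a few categorical features yourself outside the platform, here is a minimal sketch of a pairwise Cramér's V check in plain Python. To be clear, this is not DataRobot's implementation or API: the function names (`cramers_v`, `redundant_pairs`), the 0.9 threshold, and the toy data are all assumptions made for illustration, and the statistic is the plain (not bias-corrected) Cramér's V.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(x, y):
    """Plain (uncorrected) Cramér's V between two sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))      # observed count for each (x, y) cell
    x_counts = Counter(x)
    y_counts = Counter(y)
    k = min(len(x_counts), len(y_counts))
    if n == 0 or k < 2:
        return 0.0                  # a constant column carries no association
    chi2 = 0.0
    for a, na in x_counts.items():
        for b, nb in y_counts.items():
            expected = na * nb / n  # count expected if x and y were independent
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, v) for every pair whose V meets the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        v = cramers_v(columns[a], columns[b])
        if v >= threshold:
            flagged.append((a, b, v))
    return flagged


# Hypothetical toy data: "state" is fully determined by "city".
data = {
    "city": ["NYC", "NYC", "LA", "LA", "SF", "SF"],
    "state": ["NY", "NY", "CA", "CA", "CA", "CA"],
    "plan": ["basic", "pro", "basic", "pro", "basic", "pro"],
}
pairs = redundant_pairs(data)
print(pairs)
```

On this toy data only the city/state pair should be flagged, since "plan" is independent of both. For a linear model you would then keep one feature from each flagged pair; the threshold is a judgment call, not a fixed rule.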

Does that answer your question?

Duncan 
