Tips for creating uncorrelated feature lists

Jaume Masip
Data Scientist

Let’s assume that you want to create an uncorrelated feature list from which you can build final models, for example by running DataRobot’s Autopilot modeling process. The goal is to reduce the feature set to features that are as uncorrelated as possible, while limiting any degradation in the performance metrics.

You can easily do that in DataRobot by following these steps:

 

Step 1. Find an elastic net model in the Repository, either before or after kicking off the Autopilot modeling process for the first time.

Step 2. Run this elastic net model on the Informative Features feature list (with all the features), with a 64% sample size and all cross-validation folds run.

Step 3. Apply Advanced Tuning to the elastic net model so that the alpha parameter is equal to 1, which is the lasso penalty. Tune the lambda parameter as well and set a higher value; e.g., if the current parameter value is lambda=0.01, try lambda=0.1.
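
To see why a pure lasso penalty with a larger lambda drops redundant features, here is a minimal, self-contained sketch of a lasso fit by coordinate descent. It is not DataRobot's implementation: the function names, the toy data, and the objective scaling (squared error divided by 2n, plus lambda times the L1 norm of the coefficients) are all assumptions made for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; anything within the band becomes 0.0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(columns, target, lam, sweeps=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    columns is a list of feature columns, each a list of floats (no intercept).
    """
    n = len(target)
    coefs = [0.0] * len(columns)
    residual = list(target)  # equals y - Xb, and b starts at zero
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(x * x for x in col) / n
            if scale == 0.0:
                continue
            # Covariance of feature j with the residual, ignoring its own contribution.
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(col, residual)) / n
            new_coef = soft_threshold(rho, lam) / scale
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                coefs[j] = new_coef
    return coefs


# Toy data (hypothetical): near_copy is signal plus a small perturbation.
signal = [-2.0, -1.0, 0.0, 1.0, 2.0, -2.0, -1.0, 0.0, 1.0, 2.0]
near_copy = [-2.1, -0.8, -0.3, 1.0, 2.0, -1.9, -1.2, 0.3, 1.0, 2.0]
other = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
target = [2.0 * s + o for s, o in zip(signal, other)]

for lam in (0.5, 5.0):
    coefs = lasso_coordinate_descent([signal, near_copy, other], target, lam)
    print(lam, dict(zip(["signal", "near_copy", "other"], coefs)))
```

On this toy data, the fit with lam=0.5 keeps signal and other and gives near_copy a coefficient of exactly zero, while lam=5.0 removes every feature. That is the trade-off to watch when raising lambda: a stronger penalty removes more redundancy, but too strong a penalty also removes useful features.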

Step 4. Run Feature Impact on this tuned model.

Step 5. Use the option to create a new feature list (Top N features) from Feature Impact, where N is the number of top variables that you would like to use. The lasso penalty shrinks the coefficients of redundant features to zero, so it tends to keep only one feature from each correlated group. You can keep tuning the lambda parameter value (see Step 3) until enough correlated features have been dropped that you are happy with the result.
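
The Top N selection itself is simple to express in code. The sketch below assumes Feature Impact results are available as a list of records with `featureName` and `impactNormalized` keys; the sample records are hypothetical, and skipping zero-impact features is a choice made here for illustration, not something the UI option is known to do.

```python
def top_n_features(feature_impact, n):
    """Return up to n feature names, ranked by normalised impact, skipping zero-impact ones."""
    ranked = sorted(feature_impact, key=lambda row: row["impactNormalized"], reverse=True)
    kept = [row["featureName"] for row in ranked if row["impactNormalized"] > 0]
    return kept[:n]


# Hypothetical Feature Impact records for a lasso-tuned model.
example_impact = [
    {"featureName": "monthly_charges", "impactNormalized": 0.62},
    {"featureName": "total_charges", "impactNormalized": 0.0},  # dropped by the penalty
    {"featureName": "tenure", "impactNormalized": 1.0},
    {"featureName": "contract_type", "impactNormalized": 0.31},
]

print(top_n_features(example_impact, 2))
```

Asking for more features than have non-zero impact simply returns the shorter list, which is a quick way to check how many features survived the current lambda.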

Step 6. Use the Feature Associations feature to review the correlation between the variables of the new “Top N” feature list that you have created.
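
If you also want a quick check outside the UI, the sketch below flags strongly correlated pairs of numeric columns. It uses plain Pearson correlation as a stand-in; Feature Associations may use different association measures, and the column names, sample values, and the 0.8 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(table, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical columns: age_scaled is a linear transform of age.
example_table = {
    "age": [1.0, 2.0, 3.0, 4.0],
    "age_scaled": [3.0, 5.0, 7.0, 9.0],
    "flag": [1.0, -1.0, -1.0, 1.0],
}

for name_a, name_b, r in highly_correlated_pairs(example_table):
    print(name_a, name_b, round(r, 3))
```

An empty result for your new feature list suggests that the remaining numeric features are not strongly linearly related, although it says nothing about non-linear or categorical associations.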

Step 7. Finally, use this new feature list to train other models on features with little correlation, either by re-running the Autopilot modeling process or by retraining a specific existing model from the Leaderboard.

Let me know if you have further questions about this tip!

@Jaume Masip 

1 Reply
shaz13
Data Scientist

This is creative and brilliant. One extra tip I can suggest:

All of this is also possible in code with the DataRobot Python client. If you want to automate this and scale it to many projects, it is all feasible with the Python API client.
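
As a rough sketch of what that could look like, the helper below covers the Feature Impact and feature list part of the workflow. It assumes `project` and `model` are objects from the `datarobot` package that expose `get_or_request_feature_impact()` and `create_featurelist()`, and that impact records carry `featureName` and `impactNormalized` keys; the helper name, the IDs, and the token in the commented usage are placeholders, so check the calls against the client version you have installed.

```python
def create_top_n_featurelist(project, model, n, name):
    """Create a feature list from the n highest-impact features of a (tuned) model."""
    # Assumption: returns a list of dicts with featureName / impactNormalized keys.
    impact = model.get_or_request_feature_impact()
    ranked = sorted(impact, key=lambda row: row["impactNormalized"], reverse=True)
    features = [row["featureName"] for row in ranked[:n]]
    return project.create_featurelist(name=name, features=features)


# Hypothetical usage against a real project (token and IDs are placeholders):
# import datarobot as dr
# dr.Client(token="YOUR_API_TOKEN", endpoint="https://app.datarobot.com/api/v2")
# project = dr.Project.get("YOUR_PROJECT_ID")
# model = dr.Model.get(project="YOUR_PROJECT_ID", model_id="YOUR_TUNED_MODEL_ID")
# featurelist = create_top_n_featurelist(project, model, n=20, name="Top 20 lasso")
# project.start_autopilot(featurelist.id)  # retrain on the reduced feature list
```

Keeping the helper free of any client setup makes it easy to loop over many projects once you have the project and model objects in hand.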