How does DataRobot reduce features?

MTL_new
Blue LED

As you already know, DataRobot can automatically reduce features. I wonder how it works? (For example, the "DR Reduced Features M48" list.)

3 Replies
Vinay
DataRobot Employee

Hi @MTL_new,

Welcome to the DataRobot community!

DataRobot automatically creates the DR Reduced Features list from the Feature Impact of the top non-blender model on the Leaderboard. In most cases, the list consists of the features that provide 95% of the accumulated impact for that model.

The same detail is provided in our docs here 
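As a rough illustration of the 95% rule described above, here is a minimal sketch in plain Python. It is not DataRobot's code: the function name `reduce_by_cumulative_impact`, the feature names, and the impact values are made up for the example, and clamping negative impacts to zero is an assumption.

```python
def reduce_by_cumulative_impact(impacts, threshold=0.95):
    """Keep the highest-impact features until they cover `threshold`
    of the total impact. `impacts` maps feature name -> impact score."""
    # Assumption: negative impacts count as zero.
    cleaned = {name: max(score, 0.0) for name, score in impacts.items()}
    total = sum(cleaned.values())
    if total <= 0:
        return []

    # Rank features from most to least impactful.
    ranked = sorted(cleaned.items(), key=lambda item: item[1], reverse=True)

    selected = []
    cumulative = 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score
        if cumulative / total >= threshold:
            break
    return selected


# Hypothetical impact scores for five features (they sum to 100).
example_impacts = {
    "age": 50,
    "income": 30,
    "tenure": 16,
    "zip_code": 3,
    "row_hash": 1,
}
reduced = reduce_by_cumulative_impact(example_impacts)
print(reduced)
```

With these made-up scores, the first three features already cover 96% of the total impact, so the last two are dropped.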

zouquncai
Data Scientist

DataRobot implements multiple approaches to feature reduction at different stages of the modeling life cycle:

1) During EDA1 (right after you upload the data to DataRobot), non-informative features, such as those with too many unique values, are excluded automatically from the Informative Features list.
2) After EDA2 (after you click "Start"), DataRobot removes features with target leakage, i.e. features highly correlated with the target, and features with an ACE score < 0.0005, i.e. features that are only marginally correlated with the target.
3) During the model training/analysis stage, the features that together account for 95% of the cumulative feature importance are retained, and redundant features are removed.
4) Some algorithms, such as LASSO and ENET, offer intrinsic feature reduction by shrinking coefficients to 0.
5) If you use the Automated Feature Discovery functionality to generate additional features from one or more secondary datasets, DataRobot also performs supervised feature reduction.

Hope the above helps. Always keep in mind that feature reduction or feature selection depends heavily on domain knowledge.
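To make the point about LASSO shrinking coefficients to 0 more concrete, here is a small self-contained sketch of LASSO fitted by coordinate descent with soft-thresholding. It is a textbook illustration, not what DataRobot runs internally. The function names, the toy data, and the penalty value `alpha=0.5` are all chosen for the example.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, n_iter=100):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n_rows, n_cols = len(X), len(X[0])
    weights = [0.0] * n_cols
    for _ in range(n_iter):
        for j in range(n_cols):
            rho = 0.0    # correlation of feature j with the partial residual
            scale = 0.0  # mean squared value of feature j
            for row, target in zip(X, y):
                others = sum(row[k] * weights[k] for k in range(n_cols) if k != j)
                rho += row[j] * (target - others)
                scale += row[j] ** 2
            rho /= n_rows
            scale /= n_rows
            # The L1 penalty sets weak coefficients to exactly zero.
            weights[j] = soft_threshold(rho, alpha) / scale if scale > 0 else 0.0
    return weights


# Toy data: column 0 drives the target, column 1 contributes very little.
X = [[1, 1], [2, -1], [3, 1], [4, -1], [-1, 1], [-2, -1], [-3, 1], [-4, -1]]
y = [2 * a + 0.1 * b for a, b in X]

penalized = lasso_fit(X, y, alpha=0.5)
unpenalized = lasso_fit(X, y, alpha=0.0)
kept = [j for j, w in enumerate(penalized) if w != 0.0]
print(penalized, unpenalized, kept)
```

With the penalty switched on, the weak second coefficient lands at exactly zero, so that feature drops out of the model. With `alpha=0.0` both coefficients stay non-zero.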

desmond_lim
Data Scientist

Dear @MTL_new,

As my colleagues @Vinay and @zouquncai have responded to your query, I hope it is now clearer how the Reduced Feature List was generated on the Leaderboard for the recommended model.

It uses permutation-based Feature Impact to rank the most useful features for the specific model ( https://docs.datarobot.com/en/docs/modeling/analyze-models/understand/feature-impact.html#permutatio... ).
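For readers unfamiliar with the idea, permutation-based impact can be sketched in a few lines of plain Python: shuffle one feature at a time and measure how much the model's error gets worse. This is only a generic illustration, not DataRobot's implementation. The `toy_predict` model, the data, and the choice of RMSE as the metric are assumptions made for the example.

```python
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return (sum(squared) / len(squared)) ** 0.5


def permutation_impact(predict, rows, y, n_repeats=5, seed=0):
    """Average increase in RMSE when each feature is shuffled.

    `predict` takes a list of row dicts and returns a list of predictions.
    """
    rng = random.Random(seed)
    baseline = rmse(y, predict(rows))
    impacts = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column, breaking its link to the target.
            shuffled = [row[name] for row in rows]
            rng.shuffle(shuffled)
            permuted = [{**row, name: value} for row, value in zip(rows, shuffled)]
            increases.append(rmse(y, predict(permuted)) - baseline)
        impacts[name] = sum(increases) / n_repeats
    return impacts


def toy_predict(rows):
    """Hypothetical fitted model: uses feature 'a' and ignores feature 'b'."""
    return [3 * row["a"] for row in rows]


rows = [{"a": i, "b": (7 * i) % 5} for i in range(10)]
y = [3 * row["a"] for row in rows]

impacts = permutation_impact(toy_predict, rows, y)
ranking = sorted(impacts, key=impacts.get, reverse=True)
print(ranking)
```

Shuffling `a` hurts the toy model while shuffling `b` changes nothing, so `a` ranks first. A ranking like this is what a cumulative-impact cutoff would then be applied to.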

 

The DataRobot University instructor-led AutoML I course ( https://university.datarobot.com/automl-i ) covers this in great detail, along with many other topics, processes, and best practices that the DataRobot platform incorporates into its automated model development.
