
Feature selection issue


Hello

 

I have a question about DataRobot's feature selection methodology.

I ran my analysis in the following order.

 

1) Create a project with 190 features.

2) Reduce the features from 190 to 12 using FIRE (Feature Importance Rank Ensembling). The best resulting feature list is named "Reduced FL by Median Rank, top12".

https://docs.datarobot.com/en/docs/api/guide/python/feat-select/Feature-Importance-Rank-Ensembling.h...

3) Get the best recommended model, with an RMSE of 1.7672 (validation) and 2.8646 (CV).

4) Create another feature list myself and retrain on it. The custom feature list is named "feature_17".

5) The model retrained on "feature_17" performs better, with an RMSE of 0.7199 (validation) and 1.0193 (CV).
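
For readers unfamiliar with step 2, the median-rank idea can be sketched roughly as follows. This is a minimal illustration, not DataRobot's implementation: the function name `median_rank_top_n`, the input shape (one dict of feature impact scores per model), and the sample feature names are all assumptions made for the example.

```python
from statistics import median


def median_rank_top_n(impacts_per_model, n):
    """Return the n features with the lowest median rank across models.

    impacts_per_model: list of dicts mapping feature name -> impact score
    (higher score = more important). Hypothetical input shape; this sketch
    assumes every model reports a score for every feature.
    """
    ranks = {}
    for impacts in impacts_per_model:
        # Rank 1 is the most important feature within this model.
        ordered = sorted(impacts, key=impacts.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            ranks.setdefault(feature, []).append(rank)

    median_rank = {feature: median(r) for feature, r in ranks.items()}
    # Sort by median rank, breaking ties by name so the result is deterministic.
    return sorted(median_rank, key=lambda f: (median_rank[f], f))[:n]


# Made-up impact scores from three models
example_impacts = [
    {"a": 0.9, "b": 0.5, "c": 0.1},
    {"a": 0.2, "b": 0.8, "c": 0.1},
    {"a": 0.7, "b": 0.6, "c": 0.3},
]
top_two = median_rank_top_n(example_impacts, 2)
```

Because the ranking only reflects what the contributing models found important, a hand-built list can still outperform it when it contains features those models weighted poorly.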

 

I'm very disappointed with these results, because the feature list I selected by hand has the best performance.

It seems to mean I don't need DataRobot's feature selection function.

"Reduced FL by Median Rank, top12" and "feature_17" do not overlap.

"DR Reduced Features M230", which was created by DataRobot, also does not overlap with "feature_17".
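
An overlap check like the one described above can be done with plain set operations once the feature names of each list are available as Python lists. The sketch below assumes exactly that; `featurelist_overlap` and the sample names are hypothetical and are not part of the DataRobot client.

```python
def featurelist_overlap(features_a, features_b):
    """Return the shared feature names and the Jaccard similarity of two lists."""
    a, b = set(features_a), set(features_b)
    shared = a & b
    union = a | b
    # Jaccard similarity: 0.0 means no overlap, 1.0 means identical lists.
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Made-up feature names standing in for two feature lists
shared, score = featurelist_overlap(["x1", "x2", "x3"], ["x3", "x4"])
```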

 

[Leaderboard results]

111.png

 

 

Why do these results come out this way?

I found that the Feature Impact for "feature_17" shows many redundancy alerts, while the Feature Impact for "Reduced FL by Median Rank, top12" shows just one.
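
One rough way to look for redundancy outside the platform is to flag pairs of numeric features with a very high absolute Pearson correlation. This is only a stand-in check under stated assumptions: it is not how DataRobot decides which features to mark as redundant, and the function names, the 0.95 threshold, and the sample columns are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.95):
    """Return feature pairs whose absolute correlation meets the threshold.

    columns: dict mapping feature name -> list of numeric values.
    """
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            pairs.append((a, b))
    return pairs


# Made-up columns: f2 is exactly 2 * f1, f3 is unrelated
example_columns = {
    "f1": [1, 2, 3, 4],
    "f2": [2, 4, 6, 8],
    "f3": [1, -1, 1, -1],
}
flagged = highly_correlated_pairs(example_columns)
```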

 

[Reduced FL by Median Rank, top12's Feature Impact]

222.png

 

[feature_17's Feature Impact]

333.png

 

Do redundant features affect how a feature list is selected?

I thought FIRE was the best way to select features in DataRobot.

How can I explain these results?

 

Help me.

Thank you.

2 Replies

I read the Q&A thread below.

https://community.datarobot.com/t5/platform/how-datarobot-reduce-features/m-p/15346

If I use the FIRE (Feature Importance Rank Ensembling) Python code, will redundant features and target leakage be removed automatically by create_modeling_featurelist(name, features)?

I ask because I saw this code in the FIRE Python example.

 

# Create a new featurelist
featurelist = project.create_modeling_featurelist(f'Reduced FL by Median Rank, top{n_feats}', top_ranked_feats)
featurelist_id = featurelist.id

 

If that is right, can I still use redundant features and leakage features?

Is there an option to turn off the removal of redundant features and leakage features?

Abdul.J
DataRobot Alumni

Hi,

Can you try running a project without this setting, found in the "Additional" tab of the project's advanced settings, and see if it helps?

 

AbdulJ_0-1661326462006.png

 
