I have a question about DataRobot's feature selection methodology.
My analysis proceeded in the following order:
1) Created a project with 190 features.
2) Reduced the features from 190 to 12 using FIRE (Feature Importance Rank Ensembling). The resulting feature list is named "Reduced FL by Median Rank, top12".
3) Got the recommended model, with RMSE of 1.7672 (validation) and 2.8646 (CV).
4) Built another feature list by hand, named "feature_17", and retrained with it.
5) The retrained model with "feature_17" performs much better: RMSE of 0.7199 (validation) and 1.0193 (CV).
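For context, my understanding of FIRE is that it ranks features by Feature Impact across several top Leaderboard models and keeps the features with the best median rank. A minimal sketch of that aggregation idea (the function and variable names are my own, not DataRobot's API):

```python
from statistics import median

def fire_top_n(impact_by_model, n):
    """Aggregate per-model Feature Impact rankings by median rank.

    impact_by_model: list of dicts mapping feature -> impact score,
                     one dict per model on the Leaderboard.
    Returns the n features with the best (lowest) median rank.
    """
    ranks = {}  # feature -> list of ranks across models
    for impacts in impact_by_model:
        # Rank features within this model: highest impact gets rank 1
        ordered = sorted(impacts, key=impacts.get, reverse=True)
        for rank, feat in enumerate(ordered, start=1):
            ranks.setdefault(feat, []).append(rank)
    # Keep the n features with the lowest median rank
    return sorted(ranks, key=lambda f: median(ranks[f]))[:n]
```

The point is that the selection is driven entirely by Feature Impact rankings, so anything that distorts Feature Impact (such as redundant features splitting their importance) can also distort the selection.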
I'm very disappointed with these results, because the feature list I selected myself has the best performance.
It suggests I don't need DataRobot's feature selection function at all.
"Reduced FL by Median Rank, top12" and "feature_17" do not overlap.
"DR Reduced Features M230", which was also generated by DataRobot, does not overlap with "feature_17" either.
[Leader board results]
I wonder why these results came out this way.
I noticed that "feature_17"'s Feature Impact shows many redundancy alerts,
while "Reduced FL by Median Rank, top12" has only one.
[Reduced FL by Median Rank, top12 's Feature Impact]
[feature_17's Feature Impact]
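One quick way to check whether redundancy could be distorting the Feature Impact rankings is to look at pairwise correlations within each feature list. A sketch (the threshold and data are hypothetical; this is not how DataRobot computes its redundancy alerts):

```python
import pandas as pd

def redundant_pairs(df, threshold=0.9):
    """Return feature pairs whose absolute Pearson correlation
    exceeds the given threshold."""
    corr = df.corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], corr.iloc[i, j]))
    return pairs
```

Highly correlated pairs tend to split their importance between them, so each member of the pair can look weaker in Feature Impact than the underlying signal really is.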
Do redundant features affect how the feature list is selected?
I thought FIRE was the best way to select features in DataRobot.
How can I explain these results?