When it comes to reducing a feature list, the "elbow method" is mentioned.
I understand the principle, but my question is: how do you define where the elbow is? Is it based on the feature impact value, or on the difference in impact between two consecutive features?
Thank you for your help!
You can download the Feature Impact table as a ".csv" file and analyze which features to keep for the reduced feature list.
The elbow method is designed to balance model complexity against model quality, but feature reduction doesn't always follow the elbow method's assumption that "more complexity means better model fit". Usually it is the opposite: at some point, more features make model performance worse, but that point is hard to identify before retraining the model on the reduced features.
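As a rough illustration of that kind of analysis, here is a minimal sketch that ranks the features from an exported table and computes the drop in impact from one feature to the next. The sample data and the column names `feature` and `impact_normalized` are assumptions made up for this example; check the header of your own exported file and adjust accordingly.

```python
import csv
import io

# Hypothetical export contents. The column names are assumptions,
# so adjust them to match the header of your own file.
SAMPLE_CSV = """feature,impact_normalized
age,1.00
income,0.62
tenure,0.35
region,0.08
plan,0.05
device,0.02
"""


def load_feature_impact(text, name_col="feature", impact_col="impact_normalized"):
    """Return (feature, impact) pairs sorted from most to least impactful."""
    rows = csv.DictReader(io.StringIO(text))
    pairs = [(row[name_col], float(row[impact_col])) for row in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def impact_drops(pairs):
    """Drop in impact between each feature and the previous one in the ranking."""
    return [
        (pairs[i + 1][0], pairs[i][1] - pairs[i + 1][1])
        for i in range(len(pairs) - 1)
    ]


ranked = load_feature_impact(SAMPLE_CSV)
drops = impact_drops(ranked)
for name, drop in drops:
    print(f"{name}: impact drops by {drop:.2f} compared with the previous feature")
```

Large drops followed by small ones suggest where the ranking flattens out, but as noted above, the only reliable check is to retrain on the reduced list and compare performance.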
Also, please check our article about the FIRE technique.
Thank you for your reply, @IraWatt!
I have a better understanding of the concept now.
In the DataRobot context, how do you read the 95% accumulated impact from "Feature Impact"?
If I want to create a custom reduced feature list using the elbow method, I need to be able to determine where the elbow is.
Interesting question. From what I understand, the elbow method is used more in unsupervised learning, where the elbow is the point at which adding more clusters no longer yields a meaningful increase in the variance explained between rows. (See: Elbow method)
You can identify the elbow just by looking at the curve, which typically resembles an exponential decay: it drops steeply at first and then flattens out.
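If you would rather not rely on eyeballing the curve, one common heuristic (not something DataRobot is stated to use) is to take the point farthest from the straight line joining the first and last points of the curve. A minimal sketch, assuming the values are already sorted in descending order and using made-up impact values:

```python
def find_elbow(values):
    """Return the index of the point farthest below the chord joining
    the first and last points, after scaling both axes to [0, 1].

    `values` must be sorted in descending order (largest first).
    This is one heuristic among several, not a definitive rule.
    """
    n = len(values)
    spread = values[0] - values[-1]
    if n < 3 or spread == 0:
        return 0  # too few points, or a flat curve: no elbow to find

    best_index, best_gap = 0, 0.0
    for i, value in enumerate(values):
        x = i / (n - 1)                    # position scaled to [0, 1]
        y = (value - values[-1]) / spread  # value scaled to [0, 1]
        gap = 1.0 - x - y                  # proportional to distance from the chord x + y = 1
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


# Hypothetical impact values, largest first.
impacts = [1.00, 0.62, 0.35, 0.08, 0.05, 0.02]
elbow = find_elbow(impacts)
print(f"Elbow at index {elbow} (value {impacts[elbow]})")
```

Whether to keep the feature at the elbow itself or cut just before it is a judgment call, so treat the index as a starting point rather than a fixed rule.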
In most cases, the automated reduced feature list in DataRobot consists of the features that account for 95% of the model's accumulated impact. (See: Feature lists, DataRobot docs)
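To reproduce a similar cut on your own, one plausible reading of "95% of the accumulated impact" is: sort the features by impact, then keep adding them until their running total reaches 95% of the sum of all impacts. The sketch below follows that reading; it is an assumption about the calculation, not DataRobot's actual implementation, and the feature names and values are made up.

```python
def features_for_cumulative_impact(impacts, threshold=0.95):
    """Return the top-ranked feature names whose combined impact first
    reaches `threshold` (as a fraction) of the total impact.

    `impacts` is a list of (feature_name, impact_value) pairs.
    """
    total = sum(value for _, value in impacts)
    if total <= 0:
        return []

    ranked = sorted(impacts, key=lambda pair: pair[1], reverse=True)
    kept, running = [], 0.0
    for name, value in ranked:
        kept.append(name)
        running += value
        if running / total >= threshold:
            break
    return kept


# Hypothetical feature impact values.
impacts = [
    ("age", 1.00),
    ("income", 0.62),
    ("tenure", 0.35),
    ("region", 0.08),
    ("plan", 0.05),
    ("device", 0.02),
]
reduced = features_for_cumulative_impact(impacts)
print(reduced)
```

With these sample values the first three features cover about 93% of the total, so a fourth is needed to cross the 95% line.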