Machine learning models built on small datasets are prone to bias, and some industries, such as healthcare and manufacturing, lack labeled data. In these settings, a good approach is to build models on robust features.

This accelerator introduces an approach to selecting robust features: run cross-validation under multiple random seeds, add dummy (random noise) features, compute each feature's median permutation importance across the seeds, and keep only the features that are more important than the dummy features.
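
The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the accelerator's code. It assumes that each seed's permutation importances have already been collected into a dict keyed by feature name, that dummy features share a hypothetical `dummy_` prefix, and that a real feature must beat the strongest dummy's median importance (a conservative reading of the rule; comparing against the average dummy would be looser). The feature names and importance values are made up for illustration.

```python
from statistics import median


def select_robust_features(importances_by_seed, dummy_prefix="dummy_"):
    """Keep features whose median permutation importance beats every dummy.

    importances_by_seed: list of dicts, one per seed/project, mapping
    feature name -> permutation importance. Assumption: a feature missing
    from a run is treated as having zero importance in that run.
    """
    features = set().union(*importances_by_seed)
    # Median across seeds damps the effect of any single lucky or unlucky split.
    medians = {
        f: median(run.get(f, 0.0) for run in importances_by_seed)
        for f in features
    }
    dummy_medians = [m for f, m in medians.items() if f.startswith(dummy_prefix)]
    if not dummy_medians:
        raise ValueError("no dummy features found")
    # Threshold = strongest dummy (assumed rule, see lead-in).
    threshold = max(dummy_medians)
    kept = sorted(
        f for f, m in medians.items()
        if not f.startswith(dummy_prefix) and m > threshold
    )
    return kept, medians, threshold


# Hypothetical importances from three seeds.
runs = [
    {"age": 0.30, "dose": 0.12, "site": 0.010, "dummy_1": 0.020, "dummy_2": 0.005},
    {"age": 0.28, "dose": 0.15, "site": 0.030, "dummy_1": 0.010, "dummy_2": 0.015},
    {"age": 0.33, "dose": 0.10, "site": 0.004, "dummy_1": 0.025, "dummy_2": 0.000},
]
kept, medians, threshold = select_robust_features(runs)
print(kept)       # ['age', 'dose']
print(threshold)  # 0.02
```

Here `site` beats both dummies in the second run, but its median (0.010) falls below the strongest dummy median (0.020), so it is dropped. Taking the median across seeds is what filters out such one-off results.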

This notebook outlines how to:

  1. Connect to DataRobot
  2. Create multiple projects, each with a different random seed and with dummy features added
  3. Create blend models from the top-performing models
  4. Retrieve permutation importance from the top-performing blend models
  5. Remove features whose permutation importance is lower than that of the dummy features
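
The steps above run inside DataRobot (projects, blenders, and its permutation-based importance), which cannot be reproduced offline. As a rough, standard-library-only stand-in, and explicitly not the accelerator's code or the DataRobot API, the sketch below mimics steps 2 and 4. For each seed it adds seed-specific dummy columns, makes a seed-dependent train/validation split, fits a plain least-squares model in place of a DataRobot blend, and measures permutation importance as the increase in validation mean squared error after shuffling one column. The toy data, the `x1`/`x2`/`dummy_k` names, and every helper function are hypothetical.

```python
import random
from statistics import median


def make_data(n, rng):
    """Hypothetical toy data: y depends strongly on x1 and weakly on x2."""
    rows, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append({"x1": x1, "x2": x2})
        y.append(3.0 * x1 + 1.0 * x2 + rng.gauss(0, 0.5))
    return rows, y


def add_dummies(rows, n_dummies, rng):
    """Step 2 stand-in: append pure-noise columns that carry no signal about y."""
    return [dict(r, **{f"dummy_{k}": rng.gauss(0, 1) for k in range(n_dummies)})
            for r in rows]


def fit_ols(rows, y, features):
    """Stand-in model: least squares with intercept via the normal equations."""
    X = [[1.0] + [r[f] for f in features] for r in rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    M = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(c + 1, p):
            factor = M[k][c] / M[c][c]
            for j in range(c, p + 1):
                M[k][j] -= factor * M[c][j]
    w = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        w[i] = (M[i][p] - sum(M[i][j] * w[j] for j in range(i + 1, p))) / M[i][i]
    return lambda row: w[0] + sum(wi * row[f] for wi, f in zip(w[1:], features))


def mse(model, rows, y):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def permutation_importance(model, rows, y, feature, rng, n_repeats=5):
    """Step 4 stand-in: mean rise in validation MSE when one column is shuffled."""
    base = mse(model, rows, y)
    total = 0.0
    for _ in range(n_repeats):
        col = [r[feature] for r in rows]
        rng.shuffle(col)
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, col)]
        total += mse(model, shuffled, y) - base
    return total / n_repeats


def run_seed(rows, y, seed, n_dummies=2):
    """One 'project': seed-specific dummies, split, model, and importances."""
    rng = random.Random(seed)
    data = add_dummies(rows, n_dummies, rng)
    idx = list(range(len(data)))
    rng.shuffle(idx)  # seed-dependent 70/30 train/validation split
    cut = int(0.7 * len(idx))
    features = sorted(data[0])
    model = fit_ols([data[i] for i in idx[:cut]], [y[i] for i in idx[:cut]], features)
    vrows, vy = [data[i] for i in idx[cut:]], [y[i] for i in idx[cut:]]
    return {f: permutation_importance(model, vrows, vy, f, rng) for f in features}


rows, y = make_data(300, random.Random(0))
runs = [run_seed(rows, y, seed) for seed in range(5)]
medians = {f: median(run[f] for run in runs) for f in runs[0]}
for f, m in sorted(medians.items(), key=lambda kv: -kv[1]):
    print(f"{f:8s} {m:.4f}")
```

Because the dummy columns carry no signal, their median importance should sit near zero, well below `x1` and `x2`; step 5 would then drop any real feature that fails to clear that level. A least-squares model keeps the sketch dependency-free; in the accelerator itself, the importances come from DataRobot's blend models instead.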
Last update: 03-01-2024 01:20 PM