How does DataRobot deal with unseen values when scoring?

DataRobot Employee
1 Reply
Data Scientist

At prediction time, how DataRobot handles a level of a categorical variable that did not appear in the training set depends on the categorical encoding task used in the blueprint:

For the "one-hot encoding" task: The new level is replaced by the category "All Others". The blueprint creates an "All Others" category under any of the following conditions:

- if there are fewer observations in the category than `min_support` (default 5)

- if there are more categories than `card_max` (default 100)

- if `max_features` is reached (not used by default)

If no "All Others" category exists, the new level is encoded as 0 in every binary flag of the encoded variable.
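To make the one-hot behavior concrete, here is a minimal plain-Python sketch of the rules described above. It is not DataRobot's implementation: the function names `fit_one_hot` and `encode_one_hot` are hypothetical, `max_features` is left out, and the exact way `card_max` picks which levels to keep (most frequent first) is an assumption.

```python
from collections import Counter

OTHER = "All Others"

def fit_one_hot(values, min_support=5, card_max=100):
    """Learn the output columns from the training values (illustrative only)."""
    counts = Counter(values)
    # Keep levels with at least min_support observations, most frequent first.
    kept = [lvl for lvl, n in counts.most_common() if n >= min_support]
    # Assumption: when there are too many levels, only the first card_max are kept.
    kept = kept[:card_max]
    # "All Others" is created only if some training level was folded away.
    if len(kept) < len(counts):
        kept.append(OTHER)
    return kept

def encode_one_hot(value, columns):
    """Encode one value as a list of binary flags, one per learned column."""
    # Unseen (or folded) levels are routed to "All Others" when it exists.
    target = value if value in columns else OTHER
    # If "All Others" was never created, no column matches and every flag is 0.
    return [1 if col == target else 0 for col in columns]

# "c" has only 2 observations, so it is folded into "All Others".
cols_with_other = fit_one_hot(["a"] * 6 + ["b"] * 5 + ["c"] * 2)
# Every level has enough support, so no "All Others" column is created.
cols_without_other = fit_one_hot(["a"] * 6 + ["b"] * 5)

print(cols_with_other, encode_one_hot("z", cols_with_other))
print(cols_without_other, encode_one_hot("z", cols_without_other))
```

In the first case the unseen level "z" lands in the "All Others" column; in the second it produces a row of zeros, matching the two outcomes described above.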


For the "ordinal encoding" task: The ordinal encoder encodes new levels either as the "missing" level or as a "low support" level. The "low support" level is created if a category has fewer observations than `min_support` or if there are more categories than `card_max`. If the training data contains neither a missing level nor a low support level, the new level is encoded as a new "low support" level, which is treated similarly to the least frequent category in the training data.
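The ordinal case can be sketched the same way. Again, this is only an illustration under stated assumptions, not DataRobot's code: `fit_ordinal` and `encode_ordinal` are hypothetical names, codes are assumed to be assigned in descending frequency order, `None` stands in for a missing value, and preferring "low support" over "missing" when both exist is a guess, since the answer only says "either".

```python
from collections import Counter

LOW_SUPPORT = "low support"
MISSING = "missing"

def fit_ordinal(values, min_support=5, card_max=100):
    """Learn a level -> integer code mapping from training values (illustrative)."""
    counts = Counter(v for v in values if v is not None)
    # Assumption: codes follow descending frequency, so 0 is the most common level.
    kept = [lvl for lvl, n in counts.most_common() if n >= min_support][:card_max]
    mapping = {lvl: code for code, lvl in enumerate(kept)}
    # A "low support" level exists only if some training level was folded away.
    if len(kept) < len(counts):
        mapping[LOW_SUPPORT] = len(mapping)
    # A "missing" level exists only if the training data had missing values.
    if any(v is None for v in values):
        mapping[MISSING] = len(mapping)
    return mapping

def encode_ordinal(value, mapping):
    """Encode one value, falling back for unseen or folded levels."""
    key = MISSING if value is None else value
    if key in mapping:
        return mapping[key]
    # Assumption: prefer "low support", then "missing", when they exist.
    for fallback in (LOW_SUPPORT, MISSING):
        if fallback in mapping:
            return mapping[fallback]
    # Neither exists: use a fresh code one past the least frequent kept level.
    return len(mapping)

with_low_support = fit_ordinal(["a"] * 6 + ["b"] * 5 + ["c"] * 2)
with_missing = fit_ordinal(["a"] * 6 + ["b"] * 5 + [None] * 3)
with_neither = fit_ordinal(["a"] * 6 + ["b"] * 5)

for mapping in (with_low_support, with_missing, with_neither):
    print(mapping, "-> unseen 'z' encoded as", encode_ordinal("z", mapping))
```

Placing the fresh code directly after the least frequent kept level is one way to make the new level "similar to the least frequent category", which is how the last sentence above is interpreted here.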