At prediction time, how DataRobot handles a level of a categorical variable that did not appear in the training set depends on the task the blueprint uses for categorical variable encoding:
For the "one-hot encoding" task: The new level is replaced by the category "All Others". The blueprint creates an "All Others" category under the following conditions:
- if there are fewer observations in the category than `min_support` (default 5)
- if there are more categories than `card_max` (default 100)
- if `max_features` is reached (not used by default)
If there is no "All Others" category, the new level gets 0 for every binary flag in the encoded variable.
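The one-hot behavior described above can be sketched in plain Python. This is an illustration only, not DataRobot's implementation: the function names `fit_one_hot` and `transform_one_hot` are hypothetical, `max_features` is left out, and the sketch assumes that when there are more categories than `card_max`, the most frequent ones are kept.

```python
from collections import Counter

ALL_OTHERS = "All Others"

def fit_one_hot(values, min_support=5, card_max=100):
    """Return the ordered list of output columns learned from training values."""
    counts = Counter(values)
    # Keep levels with at least min_support observations, most frequent first.
    frequent = [lvl for lvl, n in counts.most_common() if n >= min_support]
    # Assumption: when there are too many levels, the most frequent are kept.
    kept = frequent[:card_max]
    # "All Others" exists only if some training level was folded into it.
    if len(kept) < len(counts):
        kept.append(ALL_OTHERS)
    return kept

def transform_one_hot(value, columns):
    """Encode one value as a list of 0/1 flags aligned with columns."""
    # A level not seen as its own column falls back to "All Others" if present.
    if value not in columns and ALL_OTHERS in columns:
        value = ALL_OTHERS
    # With no "All Others" column, an unseen level yields all zeros.
    return [1 if col == value else 0 for col in columns]

# Training data where "c" has too few observations, so "All Others" is created.
cols_with_others = fit_one_hot(["a"] * 6 + ["b"] * 5 + ["c"] * 2)
# Training data where every level has enough support, so it is not created.
cols_without_others = fit_one_hot(["a"] * 6 + ["b"] * 5)
```

With these two fits, an unseen level such as `"z"` sets the "All Others" flag in the first case and produces all zeros in the second.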
For the "ordinal encoding" task: The ordinal encoder encodes new levels either as the "missing" level or as a "low support" level. The "low support" level is created if a category has fewer observations than `min_support` or if there are more categories than `card_max`. If the training data has neither a "missing" nor a "low support" level, the new level is encoded as a new "low support" level, which is treated similarly to the least frequent category in the training data.
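A comparable sketch for the ordinal case follows. Again, this is not DataRobot's code, and it rests on several assumptions that the text above does not settle: the hypothetical helpers `fit_ordinal` and `transform_ordinal` assign integer codes in descending frequency order, represent missing values as `None`, and prefer the "low support" level over the "missing" level when both exist.

```python
from collections import Counter

MISSING = "missing"
LOW_SUPPORT = "low support"

def fit_ordinal(values, min_support=5, card_max=100):
    """Map each training level to an integer code, most frequent first."""
    counts = Counter(v for v in values if v is not None)
    frequent = [lvl for lvl, n in counts.most_common() if n >= min_support]
    # Assumption: when there are too many levels, the most frequent are kept.
    levels = frequent[:card_max]
    # "low support" exists only if some training level was folded into it.
    if len(levels) < len(counts):
        levels.append(LOW_SUPPORT)
    # "missing" exists only if the training data contained missing values.
    if any(v is None for v in values):
        levels.append(MISSING)
    return {lvl: code for code, lvl in enumerate(levels)}

def transform_ordinal(value, codes):
    """Return the integer code for one value, handling unseen levels."""
    if value is None:
        value = MISSING
    if value in codes:
        return codes[value]
    # Assumption: prefer "low support", then "missing", when they exist.
    for fallback in (LOW_SUPPORT, MISSING):
        if fallback in codes:
            return codes[fallback]
    # Neither exists: use a fresh "low support" code placed right after
    # the least frequent training level.
    return len(codes)

codes_low = fit_ordinal(["a"] * 6 + ["b"] * 5 + ["c"] * 2)
codes_missing = fit_ordinal(["a"] * 6 + ["b"] * 5 + [None])
codes_plain = fit_ordinal(["a"] * 6 + ["b"] * 5)
```

Because codes follow frequency order in this sketch, the fresh "low support" code in the last case sits next to the least frequent training category, which is one way to read "similar to the least frequent category".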