When running Autopilot from the API, how can we know what will happen with missing data?
What techniques will DataRobot implement? How do you impute categorical variables?
Please also send any documentation pointers for this.
t_chandra
I just noticed this earlier post, which might help too.
FYI @t_chandra https://community.datarobot.com/t5/platform/encoding-of-categorical-variables-and-imputing/m-p/12260
For numerical values, the median is used to impute missing data. If you go to a trained model's blueprint and click on the "Missing Values Imputed" node, you will see how we perform the imputation. By default we just impute using the median, but if you click on Copy and Edit, you will be able to choose random imputation instead.
For categorical features, we add a missing class to represent the missing values.
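To make the two ideas above concrete, here is a minimal sketch in plain pandas. It is not DataRobot's internal code, only an illustration of median imputation for numeric columns and an extra "missing" class for categorical columns. The function name `impute_sketch`, the `"MISSING"` label, and the sample columns are all made up for this example.

```python
import pandas as pd


def impute_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative only: median for numeric columns, extra class for the rest."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric feature: replace missing values with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical feature: treat "missing" as its own class.
            # "MISSING" is a hypothetical label chosen for this sketch.
            out[col] = out[col].astype("object").fillna("MISSING")
    return out


# Hypothetical sample data with one missing value per column.
df = pd.DataFrame(
    {
        "age": [25.0, None, 40.0, 31.0],
        "color": ["red", None, "blue", "red"],
    }
)
result = impute_sketch(df)
print(result)
```

In this sketch the missing `age` is filled with the median of the observed values, and the missing `color` becomes a separate category that a model can learn from like any other level.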
By the way, if you click on Copy and Edit (orange, top right corner), you can not only edit the blueprint but also see all the additional pre-processing functionality we offer. Just click on a node and then the + sign to see the list of pre-processing approaches we have.
I hope this answers your question.