Please help: I have an Excel file with a lot of missing data, and I need DataRobot to impute values where they are missing so that I can run some regression analysis back in Excel. I know DataRobot handles missing values, but after uploading the dataset to DataRobot, how can I download the fixed dataset back to Excel so I can do some additional calculations there?
Hi Helen – Missingness is a deep topic. You are right that DataRobot deals with missing values, but it deals with them differently in different modeling blueprints. We can also use non-random missingness to find predictive signal in the data, so you are right to not simply throw out rows with missing data.
For numeric missing values, one of the simpler techniques a blueprint might use is to impute missing values with the median and then create an indicator variable that flags whether that feature was missing. For categorical missing values, models often treat them as a separate category level, called "=Missing=".
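As an illustration only, here is a minimal pandas sketch of those two simple strategies: median imputation plus a missing-value indicator for numeric columns, and a "=Missing=" level for categorical columns. It is not DataRobot's actual preprocessing code; the function name `impute_like_a_blueprint`, the `_missing` column suffix, and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_like_a_blueprint(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative only, not DataRobot's implementation.

    Numeric columns: fill missing values with the column median and add a
    hypothetical '<col>_missing' 0/1 indicator column.
    Other columns: fill missing values with the level "=Missing=".
    """
    out = df.copy()
    for col in df.columns:
        missing = df[col].isna()
        if not missing.any():
            continue  # nothing to impute in this column
        if pd.api.types.is_numeric_dtype(df[col]):
            # Flag which rows were originally missing, then median-impute.
            out[col + "_missing"] = missing.astype(int)
            out[col] = df[col].fillna(df[col].median())
        else:
            # Treat missingness as its own category level.
            out[col] = df[col].fillna("=Missing=")
    return out


# Hypothetical example data with one missing numeric and one missing categorical value.
df = pd.DataFrame({
    "income": [50000.0, np.nan, 72000.0, 61000.0],
    "region": ["North", None, "South", "North"],
})
print(impute_like_a_blueprint(df))
```

If you do something like this yourself, pandas' `DataFrame.to_excel` can write the result back to a workbook for further work in Excel (it requires an Excel writer engine such as openpyxl to be installed).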
However, you normally cannot download the preprocessed data from an intermediate step of a model. Do you only need to impute a few features? If they are numeric, I would recommend imputing the median yourself in Excel.
If you want to try something much more involved, you can model the missing values using the data you do have. You can use DataRobot to build models with one of your variables as the target, then use the predictions to impute the missing values. Note that the model would train only on rows where the target feature is not missing. Next, you would make predictions for the rows in your dataset where the target is missing. You could repeat this process with a separate project and target for each column with missing data.
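A rough sketch of that model-based approach, done outside DataRobot, is below. It swaps in ordinary least-squares regression (via NumPy) for the models DataRobot would build, and it assumes the predictor columns are numeric and have no missing values. The function `impute_column_by_regression` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_column_by_regression(df: pd.DataFrame, target: str, predictors: list) -> pd.DataFrame:
    """Fill missing values in `target` with least-squares predictions.

    The model is fit only on rows where `target` is present, then used to
    predict the rows where it is missing. Assumes `predictors` are numeric
    columns with no missing values (a simplification for this sketch).
    """
    out = df.copy()
    known = out[target].notna().to_numpy()
    if known.all():
        return out  # nothing to impute
    # Design matrix: an intercept column of ones plus the predictor columns.
    X = np.column_stack([np.ones(len(out)), out[predictors].to_numpy(dtype=float)])
    y = out[target].to_numpy(dtype=float)[known]
    # Train only on rows where the target is present.
    coef, *_ = np.linalg.lstsq(X[known], y, rcond=None)
    # Predict only for rows where the target is missing.
    out.loc[~known, target] = X[~known] @ coef
    return out


# Hypothetical data where y = 1 + 2*x1 + x2 and one y value is missing.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [0.0, 1.0, 0.0, 1.0, 0.0],
    "y": [3.0, 6.0, 7.0, np.nan, 11.0],
})
print(impute_column_by_regression(df, "y", ["x1", "x2"]))
```

To cover several columns, you would call it once per column with missing data, mirroring the one-project-per-target approach, making sure the predictors used in each call are themselves complete (for example, by filling them first).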