This guide walks you through setting up a prediction file for a DataRobot time series model. We will generate the predictions through the GUI, but you can also do this through the API.
We are going to use a model from a project we created here, which forecasted store sales.
To generate the prediction dataset, we need to take into account some settings that were defined during project setup and during the modeling process. First, identify the forecast point for your predictions. Then, add blank rows to your prediction dataset to cover the full Forecast Distance of your project. In this case, the forecast point is 6/14/14 and the forecast distance is one to seven days, so I have added seven time steps (blank rows), for 6/15/14 through 6/21/14.
Figure 2. Blank rows
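If you prepare the file in a script rather than a spreadsheet, the blank rows can be generated programmatically. The sketch below is a minimal Python example, not part of DataRobot: it assumes daily data held as a list of row dictionaries, and the helper `add_forecast_rows` and the column names `Date`, `Store`, and `Sales` are hypothetical, so adjust them to match your training data.

```python
from datetime import date, timedelta

def add_forecast_rows(rows, series_ids, forecast_point, max_distance,
                      date_col="Date", series_col="Store", target_col="Sales"):
    """Return rows plus one blank row per series for each forecast distance 1..max_distance.

    Assumes daily data, so forecast distance d falls on forecast_point + d days.
    """
    new_rows = []
    for series in series_ids:
        for distance in range(1, max_distance + 1):
            new_rows.append({
                series_col: series,
                date_col: forecast_point + timedelta(days=distance),
                target_col: None,  # blank target: this is the value to be forecast
            })
    return rows + new_rows

# Hypothetical single-series history ending at the forecast point (6/14/14).
history = [{"Store": "Store_A", "Date": date(2014, 6, 14), "Sales": 1200.0}]
prediction_rows = add_forecast_rows(history, ["Store_A"], date(2014, 6, 14), max_distance=7)
print([row["Date"].isoformat() for row in prediction_rows[1:]])
```

Run as is, this prints the seven dates from 2014-06-15 through 2014-06-21, one per forecast distance.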
Next, make sure the prediction dataset contains, for each series, at least as many previous records as the historical rows requirement. This requirement is calculated from the Feature Derivation Window plus any extra history needed by models that apply differencing. You can find the Historical rows requirement on the Predict tab, in the right column.
Figure 3. Historical rows
In our case, the historical rows requirement is -42, that is, 42 rows back from the forecast point, so we must include at least the previous 42 records for each series. Here, I have included the previous 51 records.
Figure 4. Previous Records added
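The same requirement can be checked in code. This minimal sketch uses a hypothetical helper, `trim_history`, with hypothetical `Date` and `Store` columns. It assumes that the row at the forecast point counts as history, keeps the most recent rows for each series, and raises an error if any series falls short of the minimum. The 42 and 51 below mirror the numbers in this example.

```python
from collections import defaultdict
from datetime import date, timedelta

def trim_history(rows, forecast_point, min_rows, keep_rows=None,
                 date_col="Date", series_col="Store"):
    """Keep the most recent keep_rows history rows (dated on or before the forecast
    point) for each series; raise ValueError if any series has fewer than min_rows."""
    if keep_rows is None:
        keep_rows = min_rows
    by_series = defaultdict(list)
    for row in rows:
        if row[date_col] <= forecast_point:  # history only; future rows are ignored
            by_series[row[series_col]].append(row)
    trimmed = []
    for series, series_rows in by_series.items():
        series_rows.sort(key=lambda r: r[date_col])
        if len(series_rows) < min_rows:
            raise ValueError(
                f"{series}: {len(series_rows)} history rows, need at least {min_rows}")
        trimmed.extend(series_rows[-keep_rows:])  # most recent rows, oldest first
    return trimmed

forecast_point = date(2014, 6, 14)
# Hypothetical 60 days of history for one store; Sales values are placeholders.
history = [{"Store": "Store_A", "Date": forecast_point - timedelta(days=i), "Sales": 0.0}
           for i in range(60)]
recent = trim_history(history, forecast_point, min_rows=42, keep_rows=51)
print(len(recent), recent[0]["Date"], recent[-1]["Date"])
```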
Now we need to fill in any Known In Advance (KIA) features. If your project does not have KIA features, you can skip to the next step. If it does, you must prepopulate their values for every forecast distance. In this project we had three KIA features: Store_Size, Marketing, and TouristEvent (see Figure 5).
Figure 5. Known In Advance features for the project
The Known In Advance features are populated in the image below.
Figure 6. Known In Advance features populated
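A scripted version of this step might look like the sketch below. It assumes you keep the planned KIA values in a dictionary keyed by (series, date); the helper `fill_known_in_advance`, the `Date` and `Store` column names, and the sample values are hypothetical placeholders, not values from the project. The function refuses to return rows with an empty KIA value in any forecast row, since those values must be prepopulated.

```python
from datetime import date

KIA_COLUMNS = ("Store_Size", "Marketing", "TouristEvent")

def fill_known_in_advance(rows, kia_plan, forecast_point, kia_cols=KIA_COLUMNS,
                          date_col="Date", series_col="Store"):
    """Copy planned KIA values from kia_plan, keyed by (series, date), into each row.

    Raises ValueError if any row after the forecast point is still missing a KIA value.
    """
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        plan = kia_plan.get((row[series_col], row[date_col]), {})
        for col in kia_cols:
            if col in plan:
                new_row[col] = plan[col]
            if row[date_col] > forecast_point and new_row.get(col) is None:
                raise ValueError(
                    f"{row[series_col]} on {row[date_col]}: missing value for {col}")
        filled.append(new_row)
    return filled

forecast_point = date(2014, 6, 14)
future_rows = [{"Store": "Store_A", "Date": date(2014, 6, 15), "Sales": None}]
# Hypothetical planned values; supply the values you already know for each future date.
plan = {("Store_A", date(2014, 6, 15)): {"Store_Size": 5000, "Marketing": 1, "TouristEvent": 0}}
filled = fill_known_in_advance(future_rows, plan, forecast_point)
print(filled[0])
```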
Finally, for multiseries projects, you need to stack the data in long format, the same way as the training dataset. Exclude any series you do not want predictions for, such as a series the model was not trained on.
Figure 7. Stacked prediction dataset
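If your raw data has one column per series (wide format), you can stack it with a sketch like the one below. It assumes every column other than the date is a series; the helper `stack_long`, the column names, and the sample numbers are hypothetical. The `trained_series` argument drops any series the model was not trained on, such as `Store_New` here.

```python
from datetime import date

def stack_long(wide_rows, date_col="Date", series_col="Store", value_col="Sales",
               trained_series=None):
    """Turn wide rows ({date_col: d, "<series>": value, ...}) into long rows
    ({series_col: series, date_col: d, value_col: value}), one per series per date.

    Every column other than date_col is treated as a series. Series not in
    trained_series (when given) are dropped.
    """
    long_rows = []
    for row in wide_rows:
        for col, value in row.items():
            if col == date_col:
                continue
            if trained_series is not None and col not in trained_series:
                continue  # e.g. a new store the model was never trained on
            long_rows.append({series_col: col, date_col: row[date_col], value_col: value})
    # Order by series, then date, so each series forms one contiguous block.
    long_rows.sort(key=lambda r: (r[series_col], r[date_col]))
    return long_rows

wide = [
    {"Date": date(2014, 6, 13), "Store_A": 10.0, "Store_B": 12.0, "Store_New": 3.0},
    {"Date": date(2014, 6, 14), "Store_A": 11.0, "Store_B": 13.0, "Store_New": 4.0},
]
stacked = stack_long(wide, trained_series={"Store_A", "Store_B"})
for row in stacked:
    print(row)
```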
Prediction file checklist
Have you included all the fields that were in the training data?
Have you included at least the minimum historical rows?
Have you added in the forecast time steps?
If your project has KIA features, have you prepopulated their values?
If your project is a multiseries, have you “stacked” your series in the long format, the same way the training dataset is structured?
If your project is a multiseries, have you removed any series you don’t want predictions for, such as new series the model wasn’t trained on?
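The checklist can also be run as a quick automated check before uploading. The sketch below is a hypothetical helper, `check_prediction_rows`, not a DataRobot feature: it assumes daily data already in long format (one row dictionary per series and date, with hypothetical `Date` and `Store` columns) and reports problems as strings rather than raising.

```python
from collections import defaultdict
from datetime import date, timedelta

def check_prediction_rows(rows, training_columns, forecast_point, min_history,
                          max_distance, kia_cols=(), trained_series=None,
                          date_col="Date", series_col="Store"):
    """Run the prediction file checklist on a list of row dicts; return a list of problems.

    Assumes daily data. An empty list means every check passed.
    """
    problems = []
    missing_cols = set()
    by_series = defaultdict(list)
    for row in rows:
        missing_cols |= set(training_columns) - set(row)  # fields absent from this row
        by_series[row[series_col]].append(row)
    if missing_cols:
        problems.append(f"missing training columns: {sorted(missing_cols)}")

    expected_dates = {forecast_point + timedelta(days=d) for d in range(1, max_distance + 1)}
    for series, series_rows in by_series.items():
        if trained_series is not None and series not in trained_series:
            problems.append(f"{series}: series was not in the training data")
        history = [r for r in series_rows if r[date_col] <= forecast_point]
        future = [r for r in series_rows if r[date_col] > forecast_point]
        if len(history) < min_history:
            problems.append(f"{series}: {len(history)} history rows, need {min_history}")
        missing_dates = expected_dates - {r[date_col] for r in future}
        if missing_dates:
            problems.append(f"{series}: missing forecast rows for {sorted(missing_dates)}")
        for r in future:
            for col in kia_cols:
                if r.get(col) is None:
                    problems.append(f"{series} on {r[date_col]}: {col} is empty")
    return problems

fp = date(2014, 6, 14)
# Hypothetical dataset: 42 history rows plus 7 forecast rows for one store.
rows = [{"Date": fp - timedelta(days=i), "Store": "Store_A", "Sales": 0.0, "Marketing": 0}
        for i in range(42)]
rows += [{"Date": fp + timedelta(days=d), "Store": "Store_A", "Sales": None, "Marketing": 1}
         for d in range(1, 8)]
columns = ["Date", "Store", "Sales", "Marketing"]
print(check_prediction_rows(rows, columns, fp, min_history=42, max_distance=7,
                            kia_cols=["Marketing"], trained_series={"Store_A"}))  # []
```

Returning a list instead of stopping at the first failure is a deliberate choice here: it lets you fix every issue in one pass before uploading.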
Now that we have the prediction file set up correctly, we can go to DataRobot and select the model we want to use, then go to the Predict tab. From here, simply drag and drop the prediction dataset and click Compute Predictions.
After predictions are computed, we can either download the forecasts now (as shown in Figure 8), or we can add in the prediction intervals and then download.
Figure 8. Upload dataset and computed predictions
If you would like to include the prediction intervals, click Preview, then select Options. You can adjust the Prediction Intervals if desired. In our case, we will stick with the defaults and click Update preview, then Download Predictions.
Figure 9. Prediction Intervals
When we open our prediction file, we see our predictions along with our prediction intervals.
Figure 10. Predictions
Using the API
You can do all of this through the API as well. Below is a Python snippet, using the datarobot client package, that selects the model with the best backtesting score, uploads a prediction dataset, requests predictions with 80% prediction intervals, and downloads them as a pandas DataFrame.
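import datarobot as dr

# Assumes the client is already authenticated (for example, via dr.Client(token=..., endpoint=...))
project = dr.Project.get('<PID>')  # replace <PID> with your project ID
models = project.get_datetime_models()
# Keep models that have a backtesting score and pick the lowest one
# (assumes a lower-is-better metric such as RMSE; use reverse=True otherwise)
best_model = sorted(
    [model for model in models if model.metrics[project.metric]['backtesting']],
    key=lambda m: m.metrics[project.metric]['backtesting'],
)[0]
dataset = project.upload_dataset('DR_Sales_prediction.csv')
pred_job = best_model.request_predictions(dataset_id=dataset.id,
                                          include_prediction_intervals=True,
                                          prediction_intervals_size=80)
preds = pred_job.get_result_when_complete()  # pandas DataFrame of forecasts and intervals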