Data Setup for Time Series Predictions

(Last updated July 2020)

This guide walks you through setting up a prediction file for a DataRobot time series model. We will be generating these predictions via the GUI, but you can do this via the API as well.

We are going to use a model from a project we created earlier, which forecasts store sales.


To generate the prediction dataset, we need to take into account some settings that were defined during project setup and some that were derived during the modeling process. First, identify the forecast point for your predictions. Then, add blank rows to the prediction dataset to cover the full Forecast Distance for your project. In this case, the forecast point is 6/14/14 and the forecast distance is one to seven days, so we have added time steps (blank rows) for 6/15/14 through 6/21/14.


Figure 2. Blank rows
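The blank rows can also be added programmatically. Below is a minimal sketch, assuming pandas, a single series, daily data, and hypothetical column names (Date, Sales); for a multiseries file the same step would be repeated for each series.

```python
import pandas as pd


def add_forecast_rows(history, date_col, forecast_point, max_distance_days):
    """Append one blank row per future time step after the forecast point."""
    forecast_point = pd.Timestamp(forecast_point)
    # Future timestamps: forecast point + 1 day up to + max_distance_days.
    future_dates = pd.date_range(
        start=forecast_point + pd.Timedelta(days=1),
        periods=max_distance_days,
        freq="D",
    )
    # Only the date column is filled; every other column ends up as NaN.
    future = pd.DataFrame({date_col: future_dates})
    return pd.concat([history, future], ignore_index=True)


# Hypothetical two-row history ending at the forecast point (6/14/14).
history = pd.DataFrame(
    {"Date": pd.to_datetime(["2014-06-13", "2014-06-14"]), "Sales": [100.0, 120.0]}
)
prediction_df = add_forecast_rows(history, "Date", "2014-06-14", 7)
print(prediction_df.tail(7))
```

The printed rows cover 6/15/14 through 6/21/14 with an empty Sales value, matching the blank rows described above.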


Next, make sure the prediction dataset contains, for each series, a number of previous records that equals or exceeds the historical rows requirement. This requirement is calculated from the Feature Derivation Window, plus any additional rows needed by models that apply differencing. You can find the Historical rows requirement on the Predict tab, in the right column.


Figure 3. Historical rows

In our case, the historical rows requirement is -42, which means we must include at least the 42 records preceding our forecast point for each series. Here, we have included the previous 51 records.


Figure 4. Previous Records added
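A quick way to confirm the history requirement before uploading is to count rows per series. The sketch below assumes pandas and hypothetical column names (Date, Store). It also assumes that the forecast point row itself counts toward the requirement, so adjust the comparison if your project counts differently.

```python
import pandas as pd


def series_short_on_history(df, date_col, series_col, forecast_point, required_rows):
    """Return the series IDs with fewer than required_rows rows up to the forecast point."""
    past = df[df[date_col] <= pd.Timestamp(forecast_point)]
    counts = (
        past.groupby(series_col)
        .size()
        # Series with no history at all would otherwise be missing from the counts.
        .reindex(df[series_col].unique(), fill_value=0)
    )
    return counts[counts < required_rows].index.tolist()


# Hypothetical data: Store_A has 51 days of history, Store_B only 10.
df = pd.concat(
    [
        pd.DataFrame(
            {"Date": pd.date_range(end="2014-06-14", periods=51, freq="D"), "Store": "Store_A"}
        ),
        pd.DataFrame(
            {"Date": pd.date_range(end="2014-06-14", periods=10, freq="D"), "Store": "Store_B"}
        ),
    ],
    ignore_index=True,
)
short = series_short_on_history(df, "Date", "Store", "2014-06-14", required_rows=42)
print(short)
```

Any series returned by the function needs more historical records added before the file is uploaded.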

Now we need to fill in any Known In Advance (KIA) features. If your project does not have KIA features, you can move to the next step. If it does, you must prepopulate them for every time step in your forecast distance. In this project we had three KIA features: Store_Size, Marketing, and TouristEvent (see Figure 5).

Figure 5. Known In Advance features for the project

Figure 6 shows the Known In Advance features populated for the forecast rows.

Figure 6. Known In Advance features populated
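If the future KIA values live in a separate table (for example, a marketing calendar), they can be copied into the blank rows by date. This is a minimal sketch, assuming pandas, a single series, one row per date in the lookup table, and made-up values; the column names follow the KIA features listed above, but the data is purely illustrative.

```python
import pandas as pd


def fill_known_in_advance(pred_df, known_future, date_col, kia_cols):
    """Fill missing KIA values from a date-keyed lookup table, leaving other columns alone."""
    out = pred_df.copy()
    known = known_future.set_index(date_col)
    for col in kia_cols:
        # Look up each row's date in the lookup table; only gaps are filled,
        # so historical values that are already present are kept.
        out[col] = out[col].fillna(out[date_col].map(known[col]))
    return out


# Hypothetical prediction file: two history rows and two blank forecast rows.
pred_df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2014-06-13", "2014-06-14", "2014-06-15", "2014-06-16"]),
        "Sales": [100.0, 120.0, None, None],
        "Marketing": ["No", "Yes", None, None],
        "Store_Size": [5000, 5000, None, None],
    }
)
# Hypothetical future values that are known ahead of time.
known_future = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2014-06-15", "2014-06-16"]),
        "Marketing": ["Yes", "No"],
        "Store_Size": [5000, 5000],
    }
)
filled = fill_known_in_advance(pred_df, known_future, "Date", ["Marketing", "Store_Size"])
print(filled)
```

The target (Sales) and any other features that are not Known In Advance stay blank for the future dates.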

Finally, for multiseries projects, you need to stack the data in long format, the same way as the training dataset. If the file contains series the model was not trained on and you do not want predictions for them, be sure to exclude those series from the dataset.

Figure 7. Stacked prediction dataset
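Stacking and filtering can be done in one pass. The following is a sketch under stated assumptions: pandas, one prepared DataFrame per series held in a dictionary, a hypothetical series ID column named Store, and a set of series IDs that you know were in the training data.

```python
import pandas as pd


def stack_series(frames_by_series, series_col, date_col, trained_series):
    """Stack per-series frames into one long-format frame, keeping only trained series."""
    stacked = []
    for series_id, frame in frames_by_series.items():
        if series_id not in trained_series:
            continue  # skip series the model was not trained on
        # Tag every row with its series ID so the rows can be told apart once stacked.
        stacked.append(frame.assign(**{series_col: series_id}))
    long_df = pd.concat(stacked, ignore_index=True)
    return long_df.sort_values([series_col, date_col]).reset_index(drop=True)


def make_frame(sales):
    """Build a tiny hypothetical frame: one history row and one blank forecast row."""
    return pd.DataFrame(
        {"Date": pd.to_datetime(["2014-06-14", "2014-06-15"]), "Sales": [sales, None]}
    )


frames = {
    "Store_A": make_frame(100.0),
    "Store_B": make_frame(80.0),
    "Store_New": make_frame(5.0),  # not in the training data
}
long_df = stack_series(frames, "Store", "Date", trained_series={"Store_A", "Store_B"})
print(long_df)
```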

Prediction file checklist

  • Have you included all the fields that were in the training data?
  • Have you included at least the minimum historical rows?
  • Have you added in the forecast time steps?
  • If your project has KIA features, have you prepopulated their values?
  • If your project is a multiseries project, have you “stacked” your series in long format, the same way the training dataset is structured?
  • If your project is a multiseries project, have you removed any series you don’t want predictions for, such as new series the model wasn’t trained on?
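The checklist can be turned into an automated pre-upload check. The function below is a rough sketch, not a DataRobot utility: it assumes pandas, that the forecast point row counts as history, and that you supply the training column list, KIA column names, and trained series IDs yourself. All names are hypothetical.

```python
import pandas as pd


def check_prediction_file(df, training_columns, date_col, forecast_point,
                          min_history_rows, n_forecast_steps,
                          kia_cols=(), series_col=None, trained_series=None):
    """Return a list of problems found; an empty list means every check passed."""
    missing = [c for c in training_columns if c not in df.columns]
    if missing:
        # The remaining checks rely on these columns, so stop here.
        return [f"missing columns: {missing}"]

    problems = []
    fp = pd.Timestamp(forecast_point)
    # Treat a single-series file as one group.
    groups = df.groupby(series_col) if series_col else [("all", df)]
    for name, group in groups:
        n_history = int((group[date_col] <= fp).sum())
        future = group[group[date_col] > fp]
        if n_history < min_history_rows:
            problems.append(f"{name}: only {n_history} historical rows")
        if len(future) < n_forecast_steps:
            problems.append(f"{name}: only {len(future)} forecast rows")
        for col in kia_cols:
            if future[col].isna().any():
                problems.append(f"{name}: {col} is blank in forecast rows")

    if series_col and trained_series is not None:
        unknown = sorted(set(df[series_col]) - set(trained_series))
        if unknown:
            problems.append(f"series not in training data: {unknown}")
    return problems


# Hypothetical single-series file: 42 history rows plus 7 forecast rows.
dates = pd.date_range(end="2014-06-21", periods=49, freq="D")
good = pd.DataFrame({"Date": dates, "Sales": 1.0, "Marketing": "No"})
good.loc[good["Date"] > "2014-06-14", "Sales"] = None  # target is unknown in the future

problems = check_prediction_file(
    good, ["Date", "Sales", "Marketing"], "Date", "2014-06-14",
    min_history_rows=42, n_forecast_steps=7, kia_cols=["Marketing"],
)
print(problems)
```

Returning a list of messages rather than raising on the first failure makes it easier to fix every issue in one pass.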

Now that we have the prediction file set up correctly, we can go to DataRobot and select the model we want to use, then go to the Predict tab. From here, simply drag and drop the prediction dataset and click Compute Predictions.

After predictions are computed, we can either download the forecasts now (as shown in Figure 8), or we can add in the prediction intervals and then download.

Figure 8. Upload dataset and computed predictions

If you would like to include the prediction intervals, click Preview, then select Options. You can adjust the Prediction Intervals if desired. In our case, we will stick with the defaults and click Update preview, then Download Predictions.

Figure 9. Prediction Intervals

When we open our prediction file, we see our predictions along with our prediction intervals.

Figure 10. Predictions

Using the API

You can do all of this via the API. Below is a Python code snippet that selects the model with the lowest backtesting score, uploads a prediction dataset, requests predictions with 80% prediction intervals, and returns the result as a pandas DataFrame.

 

 

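import datarobot as dr

# Assumes the client is already configured (e.g., via dr.Client). Replace <PID> with your project ID.
project = dr.Project.get('<PID>')

# Pick the model with the lowest backtesting score on the project metric
# (appropriate for error metrics, where lower is better).
models = project.get_datetime_models()
best_model = sorted(
    [model for model in models if model.metrics[project.metric]['backtesting']],
    key=lambda m: m.metrics[project.metric]['backtesting'],
)[0]

# Upload the prediction file prepared in the steps above.
dataset = project.upload_dataset('DR_Sales_prediction.csv')

# Request predictions with 80% prediction intervals.
pred_job = best_model.request_predictions(
    dataset_id=dataset.id,
    include_prediction_intervals=True,
    prediction_intervals_size=80,
)

# Wait for the job to finish; the result is a pandas DataFrame.
preds = pred_job.get_result_when_complete()
preds.head(5)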

 

 


If you have any questions, just click Comment (below) and let us know.

Comments
Mitch_Carmen
Data Scientist

Super helpful to see this @Tony! Thanks!

irene lin
Blue LED

How are the exogenous variables generated after the forecast point?

For example, Num_employees values are generated for 6/15/14 to 6/21/14.

Does it depend on the missing value imputation method, using just the last value (72 on 6/15/14), or something else?

Tony
Data Scientist

Hi @irene lin 

 

When setting up the prediction file, the only features we populate into the future are the variables that we set up as Known In Advance in the project. In Figure 6, that project had three Known In Advance variables: Store_size, Marketing, and Tourist_event. Since we know what those values will be for the next seven days, we are able to populate them for the model to use. The other variables in the dataset are left blank for the future dates because we don't know them yet.

 

To actually get those future dates and the values filled in for those, there are several different ways depending on the tools you are using. A common method is to just build a dataset of the timestamps you will need (both training and prediction), then left join your dataset(s) onto that to fill in the values. 

 

irene lin
Blue LED

Hi @Tony ,

Thanks for the reply.

What will DataRobot do if those variables (non-KIA variables) are left blank?

Tony
Data Scientist

Hi @irene lin 

 

DataRobot has relaxed Known In Advance features, so if you have missing Known In Advance features you can still get predictions. At a high level, the missing values would be imputed with a median value.  

Version history
Last update:
‎06-28-2021 05:38 PM