This article provides an end-to-end walkthrough of how to create a demand forecast with DataRobot Automated Time Series. Specifically, you’ll learn about importing data, selecting a target, configuring modeling options, and evaluating, interpreting, and deploying models. (If you want to see how you can use the API to do demand forecasting with multiseries data, see this notebook in the DataRobot Community GitHub.)
We are going to use this dataset from a company with ten stores to forecast demand for the next 30 days. In the dataset, the stores are stacked on top of each other in long format. As you can see, the data has a number of variables of different types, such as date, numeric, categorical, and text. Three variables need to be highlighted:
the Date column with days as the unit of analysis,
the Sales column, which is the target variable we want to forecast, and
the Store column, which contains the names of the different stores we will be forecasting.
Figure 1. Dataset in long format with stores “stacked” on top of each other, with a mixture of data types including date, text, categorical, and numeric
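To make the long format concrete, here is a minimal sketch of how such a dataset is laid out. The Date, Store, and Sales column names mirror the article; the Promo column and all values are invented for illustration.

```python
# Long format: one row per store per day, with the series stacked vertically.
import pandas as pd

dates = pd.date_range("2020-01-01", periods=3, freq="D")
frames = []
for store in ["Store_1", "Store_2"]:
    frames.append(pd.DataFrame({
        "Date": dates,
        "Store": store,                      # series ID: which store the row belongs to
        "Sales": [100, 120, 90],             # target variable to forecast
        "Promo": ["none", "bogo", "none"],   # illustrative categorical feature
    }))
long_df = pd.concat(frames, ignore_index=True)

# Each store contributes one block of rows -> "stacked" long format.
print(long_df.shape)  # (6, 4): 2 stores x 3 days, 4 columns
```

The same layout scales to ten stores (or thousands of SKUs): you append more blocks of rows rather than adding columns.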
Uploading dataset and setting options
To create a demand forecast model using DataRobot, you need to upload the dataset into DataRobot (new project page) and specify Sales as the target column. Then, tell DataRobot that this is a time series problem by setting up time-aware modeling: select the date field and choose time series modeling. DataRobot detects that this is a multiseries dataset and returns a list of potential variables to use as the series ID. In this case we will select Store and click Set series ID (Figure 2).
Figure 2. Set up Time Aware Modeling, and set series ID
We need to tell DataRobot how far into the future we want to forecast, and how far into the past to look when creating lag features and rolling statistics. We will set the forecast window to 1 to 30 days and, for now, use the default feature derivation window (Figure 3).
Figure 3. Set forecast distance
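To build intuition for what the feature derivation window produces, here is a hand-rolled sketch (not DataRobot’s internals) of per-series lag features and a trailing rolling statistic, assuming a pandas DataFrame in the long format described earlier:

```python
# Derive lag and rolling-statistic features per series (per store).
import pandas as pd

df = pd.DataFrame({
    "Date": list(pd.date_range("2020-01-01", periods=10)) * 2,
    "Store": ["A"] * 10 + ["B"] * 10,
    "Sales": list(range(10)) + list(range(10, 20)),
})

grp = df.groupby("Store")["Sales"]
df["Sales_lag_1"] = grp.shift(1)   # yesterday's sales, computed within each store
df["Sales_lag_7"] = grp.shift(7)   # sales one week earlier, within each store
# Trailing 7-day mean, shifted by 1 day so it never peeks at the day being predicted.
df["Sales_7d_mean"] = grp.transform(lambda s: s.shift(1).rolling(7).mean())
```

The key detail is the `groupby("Store")`: lags and rolling windows must stay inside each series so store B’s history never leaks into store A’s features.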
Automated Time Series has a number of modeling options that can be configured. Here, we will look at the most commonly used options.
Next is the option to partition the data (Show Advanced Settings > Date/Time tab). With time series, you can’t just randomly sample data into partitions. The correct approach is backtesting, which trains on historical data and validates on recent data. You can adjust the validation periods, as well as the number of backtests to suit your needs. We will use the defaults for this dataset, as shown in Figure 5.
Figure 5. Backtesting and validation length options
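The backtesting idea can be sketched as rolling-origin validation: each backtest trains on everything before a cutoff and validates on the next window, with the cutoff sliding back in time for earlier backtests. A minimal illustration (the 30-day validation length and three backtests are assumptions for the sketch, not DataRobot defaults):

```python
# Rolling-origin backtests: train on the past, validate on the window after it.
import pandas as pd

dates = pd.date_range("2020-01-01", "2020-06-30", freq="D")
n_backtests, validation_days = 3, 30

backtests = []
end = len(dates)
for i in range(n_backtests):
    val = dates[end - validation_days:end]       # most recent unseen window
    train = dates[:end - validation_days]        # everything before it
    backtests.append((train, val))
    end -= validation_days                       # slide the cutoff back in time

for i, (train, val) in enumerate(backtests):
    print(f"Backtest {i}: train through {train[-1].date()}, "
          f"validate {val[0].date()}..{val[-1].date()}")
```

Unlike random sampling, every validation window here sits strictly after its training data, which is what makes the evaluation honest for forecasting.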
Known In Advance Variables
For time series projects, you can indicate which features will be known in advance, so that DataRobot can also generate non-lagged features for these variables. In Advanced Options, under the Time Series tab, you can specify the columns that will be known at the forecast point (Figure 4).
Figure 4. Declaring known in advance variables
DataRobot also allows you to provide an event calendar that will allow it to generate forward-looking features so that the model will be able to better capture special events. The event calendar for this dataset (Figure 6) consists of two fields: the date and the name of the event.
Figure 6. Event Calendar
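A calendar like the one in Figure 6 is easy to build yourself: it is just a two-column file with a date and an event name. A sketch with pandas (the file name and events are illustrative):

```python
# Build a minimal event calendar CSV: one date column, one event-name column.
import pandas as pd

calendar = pd.DataFrame({
    "Date": ["2020-01-01", "2020-07-04", "2020-11-27", "2020-12-25"],
    "Event": ["New Year's Day", "Independence Day", "Black Friday", "Christmas"],
})
calendar.to_csv("event_calendar.csv", index=False)  # file to upload as the calendar
```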
To add the event calendar, scroll down a bit in the Time Series tab. Find the Calendar of holidays and special events section and add your calendar there (Figure 7). For non-multiseries projects: if you don’t have a calendar handy, you can have DataRobot generate one specific to a selected country code. The resulting calendar will include all relevant events during the time period of your dataset.
Figure 7. Adding the event calendar
There are many more options we could experiment with, but for now this is enough to get started.
When we hit Start, DataRobot will take the original features we gave it, and create hundreds of derived features for the numeric, categorical, and text variables. It will then reduce the newly created features down as shown in Figure 8.
Figure 8. DataRobot has created many derived features from the original features
After Autopilot (Full or Quick) completes we can examine the results of the Leaderboard, and evaluate the top-performing model across all backtests.
You’ll also see that we ran all backtests and unlocked the holdout data. To do this, select your model on the Leaderboard and click Run in the All Backtests column; then, from the worker panel, click Unlock project Holdout for all models.
Figure 9. Running all backtests and unlocking holdout
Accuracy Over Time
In Figure 10 we can see the actual and predicted values plotted over time. We can also change the backtest and forecast distances, so we can evaluate the accuracy at different forecast distances across the validation periods.
Figure 10. Accuracy over time
Figure 11 shows the option to see the accuracy over time for each series, or to see the average across all series.
Figure 11. Accuracy over time with drop down by series, or average
Forecast vs Actuals
On the Forecast vs Actuals tab (shown in Figure 12), we can see what the forecast would be for any given forecast point in the validation period. This lets you compare how predictions made from different forecast points behave at different times in the future.
Figure 12. Forecast vs Actuals
The Series Insights tab provides the accuracy of each series based on the metric we choose, in this case RMSE (Figure 13). This is a good way to quickly evaluate the accuracy of each individual series.
Figure 13. Series Insights
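The per-series accuracy shown on this tab boils down to computing the chosen metric within each series. A sketch with toy actuals and predictions (invented numbers, not model output):

```python
# Compute RMSE separately for each series (store).
import pandas as pd

scored = pd.DataFrame({
    "Store": ["A", "A", "B", "B"],
    "Sales": [100.0, 110.0, 50.0, 60.0],        # actuals
    "Prediction": [98.0, 113.0, 55.0, 60.0],    # forecasts (toy values)
})

rmse_by_series = (
    scored.assign(sq_err=(scored["Sales"] - scored["Prediction"]) ** 2)
          .groupby("Store")["sq_err"].mean()
          .pow(0.5)                             # sqrt of mean squared error
)
print(rmse_by_series)
```

Ranking series by this value quickly surfaces the stores the model struggles with most.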
The Stability tab provides a summary of how well a model performs on different backtests to determine if it is consistent across time (Figure 14).
Figure 14. Stability
The Forecasting Accuracy tab explains how accurate the model is for each forecast distance (Figure 15).
Figure 15. Forecasting Accuracy
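Accuracy by forecast distance is the same grouping idea applied along a different axis: bucket the prediction errors by how many days ahead each prediction was made. A sketch on invented numbers:

```python
# Mean absolute error grouped by forecast distance (days ahead).
import pandas as pd

preds = pd.DataFrame({
    "forecast_distance": [1, 1, 2, 2, 3, 3],      # days ahead of the forecast point
    "abs_error":         [1.0, 3.0, 2.0, 4.0, 5.0, 7.0],
})
mae_by_distance = preds.groupby("forecast_distance")["abs_error"].mean()
print(mae_by_distance)
```

In most real forecasts the error grows with distance, and this view shows you how quickly.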
In the Feature Impact tab (under the Understand division, Figure 16) you can see the relative impact of each feature on your specific model, including the derived features.
Figure 16. Feature Impact
The Feature Effects tab shows how changes to the value of each feature change model predictions. In Figure 17 you can see that as Sales (nonzero) (35 day average baseline) increases, Sales (actual) also increases proportionally.
Figure 17. Feature Effects
Prediction Explanations explain why your model assigned a value to a specific observation (Figure 18).
Figure 18. Prediction Explanations
Now that we have built and selected our demand forecast model, we want to get predictions. There are two ways to get time series predictions from DataRobot.
The first is the simplest: you can use the UI to drag-and-drop a prediction dataset (Figure 19). This is typically used for testing, or for small ad-hoc forecasting projects that don’t require frequent predictions.
Figure 19. Predictions, drag-and-drop
The second method is to deploy the model as a REST endpoint and request predictions via the API (Figure 20). This connects the model to a prediction server and creates a dedicated deployment object.
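For reference, a scoring request to a deployed model looks roughly like the sketch below. The URL pattern follows DataRobot’s prediction API, but the host, deployment ID, token, and DataRobot key are placeholders you must replace with the values from your own deployment’s integration snippet; here we only build the request rather than send it.

```python
# Build (but do not send) a prediction request against a deployed model.
import urllib.request

API_TOKEN = "YOUR_API_TOKEN"            # placeholder: your DataRobot API token
DATAROBOT_KEY = "YOUR_DATAROBOT_KEY"    # placeholder: per-organization prediction key
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"    # placeholder: from the deployment page
HOST = "https://example.orm.datarobot.com"  # placeholder prediction server host

url = f"{HOST}/predApi/v1.0/deployments/{DEPLOYMENT_ID}/predictions"
headers = {
    "Content-Type": "text/csv; charset=UTF-8",
    "Authorization": f"Bearer {API_TOKEN}",
    "DataRobot-Key": DATAROBOT_KEY,
}

# Rows to score: future dates per series, plus any known-in-advance columns.
csv_payload = "Date,Store\n2020-07-01,Store_1\n2020-07-01,Store_2\n"

# urllib.request.urlopen(req) would actually send it and return predictions.
req = urllib.request.Request(url, data=csv_payload.encode("utf-8"),
                             headers=headers, method="POST")
print(req.full_url)
```

Because the deployment tracks requests, this route also gives you service health and data drift monitoring that ad-hoc drag-and-drop scoring does not.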