This article presents an end-to-end walkthrough of how to create a demand forecast with the DataRobot Automated Time Series product. Specifically, you’ll learn about importing data and target selection, as well as options for evaluating, interpreting, and deploying models.
We are going to use a dataset from a single store to forecast demand for the next 7 days. The dataset contains several types of variables, including date, text, categoricals, and numerics. The most important are the Date column, which specifies our unit of analysis (daily), and the Sales column, which is the target variable we want to forecast. DataRobot can handle regular, semi-regular, and irregular time series data.
Figure 1. Dataset with a mixture of data types including date, text, categoricals, and numerics
Uploading dataset and setting options
To create a demand forecast model, upload the dataset to DataRobot (new project page) and specify Sales as the target column. Then tell DataRobot that this is a time series problem: select Set up time-aware modeling, select the Date field, and then select Time Series Modeling.
We need to tell DataRobot how far into the future we want to forecast, and how far into the past to look when creating lag features and rolling statistics. We can experiment with different settings, but for now we will use the defaults (Figure 2).
Figure 2. Dataset uploaded, time-aware modeling set up
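To make "lag features and rolling statistics" concrete, here is a minimal sketch in plain Python. It is not DataRobot's implementation; the function name derive_features, the chosen lags, the window length, and the sales values are all assumptions made for illustration.

```python
from statistics import mean


def derive_features(sales, lags=(1, 7), window=7):
    """Build one feature row per day using only values from the past.

    sales  -- list of daily sales, oldest first (illustrative data)
    lags   -- how many days back each lag feature looks
    window -- length of the rolling-mean window, in days
    """
    start = max(max(lags), window)  # first day that has a full history
    rows = []
    for t in range(start, len(sales)):
        row = {"target": sales[t]}
        for lag in lags:
            row[f"sales_lag_{lag}"] = sales[t - lag]
        # Rolling mean over the `window` days that end the day before t.
        row[f"sales_mean_{window}"] = mean(sales[t - window:t])
        rows.append(row)
    return rows


# Hypothetical daily sales for ten days.
daily_sales = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]
features = derive_features(daily_sales)
```

Each row pairs the value to predict with features computed strictly from earlier days, which is why the first seven days produce no rows: they lack a full look-back history.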
Automated Time Series has a number of modeling options that can be configured. Here, we will look at the most commonly used options.
Known In Advance Variables
In time series projects, we can indicate which features will be known in advance, so that DataRobot also generates non-lagged features for these variables. We have two columns that will be known at the forecast point: Marketing and TouristEvent. Check the box next to each, scroll up to Menu, and select the features as “known in advance” (Figure 3).
Figure 3. Declaring known in advance variables
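The difference between a lagged and a non-lagged feature can be sketched as follows. This is only an illustration of the idea, not how DataRobot builds its features; the helper build_row, the feature naming, the dates, and the values are hypothetical.

```python
def build_row(history, known_in_advance, forecast_date):
    """Assemble the features available for one future date.

    history          -- {date: {"Sales": value}} for dates already observed
    known_in_advance -- {date: {feature: value}} for future dates
    forecast_date    -- the date we want to predict
    Dates are ISO strings, so max() returns the most recent one.
    """
    last_date = max(history)
    # Sales on the forecast date is unknown, so only a past value is usable.
    row = {"Sales (latest known)": history[last_date]["Sales"]}
    # Known-in-advance features can be read at the forecast date itself.
    for name, value in known_in_advance[forecast_date].items():
        row[f"{name} (actual)"] = value
    return row


# Illustrative values only.
history = {
    "2014-06-29": {"Sales": 120},
    "2014-06-30": {"Sales": 135},
}
known_in_advance = {
    "2014-07-01": {"Marketing": "Flyer", "TouristEvent": "No"},
}
row = build_row(history, known_in_advance, "2014-07-01")
```

Sales enters the row only through a past value, while Marketing and TouristEvent enter with their values on the forecast date.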
Next is the option to partition the data. With time series, you can’t just randomly sample data into partitions, because the model would then be trained on observations that come after the ones it is validated on. The correct approach is backtesting, which trains on historical data and validates on more recent data. You can adjust the validation periods, as well as the number of backtests, to suit your needs.
Select Show Advanced Options > Partitioning > Date/Time and use the defaults for this dataset as seen in Figure 4.
Figure 4. Backtesting and validation length options
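The backtesting idea can be sketched in a few lines. This is a simplified illustration that ignores holdout and any gap between training and validation; backtest_splits and its parameters are hypothetical names, not DataRobot settings.

```python
def backtest_splits(n_rows, n_backtests, validation_length):
    """Return (training, validation) index ranges, most recent backtest first.

    Rows are assumed to be sorted by date, oldest first.
    """
    splits = []
    for k in range(n_backtests):
        val_end = n_rows - k * validation_length
        val_start = val_end - validation_length
        if val_start <= 0:
            raise ValueError("not enough history for the requested backtests")
        # Training always ends where validation begins: no future data leaks in.
        splits.append((range(0, val_start), range(val_start, val_end)))
    return splits


# 100 daily rows, 3 backtests, each validating on 7 days.
splits = backtest_splits(n_rows=100, n_backtests=3, validation_length=7)
```

Every training range ends strictly before its validation range starts, which is the property random sampling would break.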
DataRobot also lets you provide an event calendar, which it uses to generate forward-looking features so that the model can better capture special events. Figure 5 shows the event calendar for this dataset; it consists of two fields: the date and the name of the event.
Figure 5. Event Calendar
To add the event calendar, select Time Series (under Partitioning), scroll down to Calendar of holidays and special events, and drag and drop the calendar file as seen in Figure 6.
Figure 6. Adding the event calendar
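As a rough sketch of what a forward-looking calendar feature might look like, the snippet below derives two features from a date-to-event mapping. The feature names, the function calendar_features, and the calendar entries are assumptions for illustration; the features DataRobot actually derives may differ.

```python
from datetime import date


def calendar_features(day, calendar):
    """Derive event features for one date from a {date: event name} calendar."""
    upcoming = [d for d in calendar if d >= day]
    next_event = min(upcoming) if upcoming else None
    return {
        # Name of the event on this day, if any.
        "event_today": calendar.get(day, "none"),
        # Forward-looking: days remaining until the next listed event.
        "days_to_next_event": (next_event - day).days if next_event else None,
    }


# Hypothetical calendar with the same two fields: date and event name.
calendar = {
    date(2014, 7, 4): "Independence Day",
    date(2014, 12, 25): "Christmas",
}
before_event = calendar_features(date(2014, 7, 1), calendar)
on_event = calendar_features(date(2014, 7, 4), calendar)
```

Because the calendar extends into the future, such features are available at prediction time even for dates that have not happened yet.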
There are many more options we could experiment with, but for now this is enough to get started.
Once we hit Start, DataRobot will take the 10 original features we gave it and create hundreds of derived features for the numeric, categorical, and text variables. It will then reduce the 426 newly created features down to 220 as shown in Figure 7.
Figure 7. DataRobot has created 220 derived features from the original 10
After Autopilot completes, we can examine the results on the Leaderboard (Models tab) and evaluate the top-performing model across all backtests.
Accuracy Over Time
In Figure 8, we can see the actual and predicted values plotted over time for the selected model (Evaluate > Accuracy over Time). We can also change the backtest and the forecast distance to evaluate accuracy at different forecast distances across the validation periods.
Figure 8. Accuracy over time
The Stability tab (Evaluate > Stability) summarizes how well a model performs on each backtest, so we can judge whether its accuracy is consistent over time (Figure 9).
Figure 9. Stability
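One simple way to think about stability is to score each backtest separately and compare the scores. The sketch below uses RMSE as the metric, which is an assumption (the project may use a different one), and the actual and predicted values are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Illustrative (actual, predicted) pairs for three backtests.
backtests = {
    "Backtest 1": ([100, 110, 120], [98, 113, 119]),
    "Backtest 2": ([90, 95, 105], [93, 94, 101]),
    "Backtest 3": ([80, 85, 88], [79, 90, 86]),
}
scores = {name: rmse(a, p) for name, (a, p) in backtests.items()}
# A small spread between the best and worst score suggests a stable model.
spread = max(scores.values()) - min(scores.values())
```

A model whose score varies widely between backtests may be fitting patterns that hold only in some periods.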
The Forecast Accuracy tab (Evaluate > Forecast Accuracy) shows how accurate the model is at each forecast distance (Figure 10).
Figure 10. Forecast Accuracy
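Accuracy per forecast distance amounts to grouping errors by how many days ahead each prediction was made. The sketch below uses mean absolute error as an assumed metric; mae_by_distance and the records are hypothetical.

```python
from collections import defaultdict


def mae_by_distance(records):
    """Mean absolute error per forecast distance.

    records -- iterable of (forecast_distance, actual, predicted) tuples
    """
    errors = defaultdict(list)
    for distance, actual, predicted in records:
        errors[distance].append(abs(actual - predicted))
    return {d: sum(e) / len(e) for d, e in sorted(errors.items())}


# Illustrative predictions made 1 day and 7 days ahead.
records = [
    (1, 100, 98),
    (1, 110, 112),
    (7, 100, 90),
    (7, 110, 125),
]
accuracy = mae_by_distance(records)
```

In this made-up data the error grows with distance, which is the pattern one would typically look for when reading such a chart.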
In Figure 11 you can see the relative impact of each feature on this model, including the derived features (Understand > Feature Impact).
Figure 11. Feature Impact
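Permutation importance is one common way to measure the relative impact of a feature: shuffle its values and see how much the error grows. The sketch below illustrates that general technique under stated assumptions; it is not a reproduction of DataRobot's calculation, and the toy model and data are hypothetical.

```python
import random


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in mean absolute error when each feature is shuffled.

    predict -- function mapping one feature tuple to a prediction
    rows    -- list of feature tuples
    targets -- list of true values, aligned with rows
    """
    rng = random.Random(seed)

    def mae(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    baseline = mae(rows)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mae(shuffled) - baseline)
    return importances


# Toy setup: the target depends on feature 0 only; feature 1 is noise.
rows = [(float(i), float(i % 3)) for i in range(20)]
targets = [2.0 * r[0] for r in rows]
scores = permutation_importance(lambda r: 2.0 * r[0], rows, targets, n_features=2)
```

Shuffling the feature the model relies on raises the error, while shuffling the ignored feature leaves it unchanged.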
Feature Effects (Understand > Feature Effects) shows how changes to the value of each feature change model predictions. We see in Figure 12 that as Sales (7 day average baseline) increases, Sales (actual) increases proportionally.
Figure 12. Feature Effects
Prediction Explanations (Understand tab) tell you why your model assigned a value to a specific observation (Figure 13).
Figure 13. Prediction Explanations
Now that we have built and selected our demand forecast model, we want to get predictions. There are two ways to get Time Series predictions from DataRobot.
The first is the simplest: use the GUI to drag and drop a prediction dataset (Figure 14). This is typically used for testing, or for small, ad-hoc forecasting projects that don’t require frequent predictions.
Figure 14. Predictions, drag and drop
The second method is to deploy a REST endpoint and request predictions via API (Figure 15). This connects the model to a dedicated prediction server and creates a dedicated deployment object.
Figure 15. Deploy model to prediction server
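A request to a deployment can be sketched with the Python standard library. Treat every specific here as an assumption to verify against the deployment's own integration snippet: the URL path, the header names, the host, the deployment ID, and the keys are written from memory or are placeholders, not taken from this article.

```python
import urllib.request


def build_prediction_request(host, deployment_id, api_token, datarobot_key, csv_bytes):
    """Build (but do not send) a POST request carrying a CSV prediction dataset.

    The URL layout and header names are assumptions; check them against the
    integration details shown for your deployment.
    """
    url = f"{host}/predApi/v1.0/deployments/{deployment_id}/predictions"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "DataRobot-Key": datarobot_key,
        "Content-Type": "text/csv; charset=UTF-8",
    }
    return urllib.request.Request(url, data=csv_bytes, headers=headers, method="POST")


# Placeholder values; example.invalid is a reserved, non-resolving host.
request = build_prediction_request(
    host="https://example.invalid",
    deployment_id="DEPLOYMENT_ID",
    api_token="API_TOKEN",
    datarobot_key="DATAROBOT_KEY",
    csv_bytes=b"Date,Sales,Marketing,TouristEvent\n",
)
# To send it: urllib.request.urlopen(request) and read the JSON response.
```

Separating request construction from sending makes it easy to inspect the URL and headers before any credentials leave the machine.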
If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Time Series modeling.