Time Series—Classification

(Article updated October 2020)

This end-to-end walkthrough explains how to perform time series classification with DataRobot Automated Time Series (or AutoTS). Specifically, you’ll learn about importing data and target selection, as well as modeling options, evaluation, interpretation, and deployment.

We will use a dataset from a company with ten stores to forecast whether each store will be properly staffed over the next seven days. The dataset is in long format, with the stores stacked on top of each other. As Figure 1 shows, it contains a mix of variable types: date, numerical, categorical, and text. Three variables deserve attention:

  • the Date column with days as the unit of analysis,
  • the Correct_Num_Emp column, which is the target variable we want to forecast, and
  • the Store column, which contains the names of the different stores we will be forecasting.

Figure 1. Dataset in long format with stores “stacked” on top of each other, with a mixture of data types including date, text, categorical, and numeric
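To make the long format concrete, here is a minimal sketch in plain Python. The rows, store names, and the 0/1 encoding of Correct_Num_Emp are invented for illustration and are not taken from the actual dataset; the helper name `split_by_series` is also hypothetical. It groups the stacked rows by store and checks that each store has at most one row per date, which is the property a multiseries dataset needs.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows in long format: one row per (Store, Date) pair.
rows = [
    {"Store": "Store A", "Date": date(2020, 1, 2), "Correct_Num_Emp": 0},
    {"Store": "Store A", "Date": date(2020, 1, 1), "Correct_Num_Emp": 1},
    {"Store": "Store B", "Date": date(2020, 1, 1), "Correct_Num_Emp": 1},
    {"Store": "Store B", "Date": date(2020, 1, 2), "Correct_Num_Emp": 1},
]


def split_by_series(rows, series_col="Store", date_col="Date"):
    """Group long-format rows by series and reject duplicate (series, date) pairs."""
    series = defaultdict(list)
    seen = set()
    for row in rows:
        key = (row[series_col], row[date_col])
        if key in seen:
            raise ValueError(f"duplicate row for {key}")
        seen.add(key)
        series[row[series_col]].append(row)
    # Sort each series chronologically so later steps can rely on the order.
    for series_rows in series.values():
        series_rows.sort(key=lambda r: r[date_col])
    return dict(series)


by_store = split_by_series(rows)
```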

Uploading dataset and setting options

To create a time series classification model in DataRobot, upload the dataset through the new project page and specify Correct_Num_Emp as the target column. Then tell DataRobot that this is a time series problem: click Set up time-aware modeling, select the date field, and select Time Series Modeling.

DataRobot has detected that this is a multiseries dataset and returns a list of potential variables to use for the series ID. In this case we will select Store, and click Set series ID (Figure 2).  

Figure 2. Set up Time Aware Modeling, and set series ID

We need to tell DataRobot how far into the future we want to forecast, and how far into the past to go when creating lag features and rolling statistics. We will leave the forecast distance at 1 to 7 days and use the default feature derivation window (Figure 3).

Figure 3. Set forecast distance
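The relationship between the feature derivation window and the forecast distance can be sketched in a few lines of Python. This is an illustration of the general idea only, not DataRobot's feature engineering: the window length of 3, the feature names, and the numeric series are all assumptions made for the example. Each forecast point yields one training row per forecast distance from 1 to 7, and the features for that row use only values inside the window that ends at the forecast point.

```python
from statistics import mean


def derive_features(values, forecast_point, window=3):
    """Lag and rolling-mean features from the window ending at forecast_point."""
    if forecast_point + 1 < window:
        raise ValueError("not enough history to fill the derivation window")
    history = values[forecast_point - window + 1 : forecast_point + 1]
    return {
        "lag_0": history[-1],          # latest value known at the forecast point
        "lag_1": history[-2],          # value one step earlier
        "rolling_mean": mean(history),
    }


def training_rows(values, window=3, max_distance=7):
    """One row per (forecast point, forecast distance) pair that has a known target."""
    rows = []
    for forecast_point in range(window - 1, len(values)):
        features = derive_features(values, forecast_point, window)
        for distance in range(1, max_distance + 1):
            target_index = forecast_point + distance
            if target_index < len(values):
                rows.append({
                    **features,
                    "forecast_point": forecast_point,
                    "forecast_distance": distance,
                    "target": values[target_index],
                })
    return rows


# Hypothetical daily values for a single store.
values = [5, 7, 6, 8, 9, 4, 6, 7, 5, 8, 9, 6]
rows = training_rows(values)
```

Note that no feature ever reads a value from after the forecast point, which is what keeps the derived features free of leakage.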

Automated Time Series has a number of modeling options that can be configured. Here, we will look at the most commonly used options.

Known In Advance Variables

For time series projects, we can flag features whose values we will know in advance, so that DataRobot also generates non-lagged features for them. Three columns will be known at the forecast point: Store_Size, Marketing, and DestinationEvent. Check the box next to each, then click Menu and select toggle as “known in advance” (Figure 4).

Figure 4. Declaring known in advance variables
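The difference between ordinary and known-in-advance features is easiest to see in a small sketch. The code below is a conceptual illustration under stated assumptions, not DataRobot's implementation: Marketing comes from the dataset described above, while Foot_Traffic, the values, and the helper `build_row` are invented. An ordinary feature may only contribute its value at the forecast point, whereas a known-in-advance feature may also contribute its value on the day being predicted.

```python
# Hypothetical daily rows for one store; Foot_Traffic is an invented column.
days = [
    {"Marketing": "none", "Foot_Traffic": 120},
    {"Marketing": "flyer", "Foot_Traffic": 150},
    {"Marketing": "none", "Foot_Traffic": 110},
    {"Marketing": "tv_ad", "Foot_Traffic": 180},
]


def build_row(days, forecast_point, distance, known_in_advance):
    """Features available when predicting day forecast_point + distance."""
    target_day = forecast_point + distance
    features = {}
    # Every feature can use what was observed at the forecast point.
    for name, value in days[forecast_point].items():
        features[f"{name} (at forecast point)"] = value
    # Only known-in-advance features can also use the target day's value.
    for name in known_in_advance:
        features[f"{name} (at target day)"] = days[target_day][name]
    return features


row = build_row(days, forecast_point=1, distance=2, known_in_advance=["Marketing"])
```

Here the model would see the marketing activity planned for the target day, but never the foot traffic on that day, because foot traffic cannot be known ahead of time.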

Backtests

Next is the option to partition the data. With time series, you can’t just randomly sample data into partitions. The correct approach is backtesting, which trains on historical data and validates on recent data. You can adjust the validation periods, as well as the number of backtests to suit your needs. We will use the defaults for this dataset, as shown in Figure 5.

Figure 5. Backtesting and validation length options
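As a rough sketch of what backtesting means, the function below carves a date range into folds, newest first, where each fold validates on a recent window and trains only on the dates before it. The function name, the fold layout, and the example dates are assumptions for illustration; the sketch deliberately leaves out details such as holdout periods and gaps between training and validation, so it should not be read as DataRobot's partitioning logic.

```python
from datetime import date, timedelta


def make_backtests(first_day, last_day, n_backtests, validation_days):
    """Return folds, newest first; each fold trains strictly before it validates."""
    folds = []
    validation_end = last_day
    for number in range(1, n_backtests + 1):
        validation_start = validation_end - timedelta(days=validation_days - 1)
        train_end = validation_start - timedelta(days=1)
        if train_end < first_day:
            raise ValueError("not enough history for this many backtests")
        folds.append({
            "backtest": number,
            "train": (first_day, train_end),
            "validation": (validation_start, validation_end),
        })
        # The next (older) fold validates on the window just before this one.
        validation_end = train_end
    return folds


folds = make_backtests(date(2020, 1, 1), date(2020, 3, 31), n_backtests=3, validation_days=7)
```

Unlike random sampling, every validation window here lies entirely after the data used to train for it.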

Event Calendar

You can also provide an event calendar, which DataRobot uses to generate forward-looking features so that the model can better capture special events. The event calendar for this dataset (Figure 6) consists of two fields: the date and the name of the event.

Figure 6. Event Calendar

To add the event calendar, click the Time Series tab, scroll down to Calendar of holidays and special events, and drag and drop the calendar file (Figure 7).

Figure 7. Adding the event calendar
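To illustrate what a forward-looking calendar feature might look like, here is a small sketch. The two events, the feature names, and the function `calendar_features` are hypothetical; the features DataRobot actually derives from a calendar file may differ. The calendar has the same two fields as the one described above, a date and an event name.

```python
from datetime import date

# Hypothetical event calendar with two fields: date and event name.
calendar = [
    (date(2020, 11, 26), "Thanksgiving"),
    (date(2020, 12, 25), "Christmas"),
]


def calendar_features(day, calendar):
    """Forward-looking features for one day: today's event and the next event."""
    upcoming = [(event_day, name) for event_day, name in sorted(calendar) if event_day >= day]
    if not upcoming:
        return {"event_today": None, "next_event": None, "days_to_next_event": None}
    next_day, next_name = upcoming[0]
    return {
        "event_today": next_name if next_day == day else None,
        "next_event": next_name,
        "days_to_next_event": (next_day - day).days,
    }


features = calendar_features(date(2020, 12, 20), calendar)
```

Because event dates are fixed ahead of time, features like these can be computed for future dates without leaking information.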

There are many more options we could experiment with, but for now this is enough to get started.

Modeling

When we click Start, DataRobot takes the 13 original features we gave it and creates derived features for the numeric, categorical, and text variables (Figure 8 shows the derived features available when Autopilot starts).

Figure 8. DataRobot has created 163 derived features from the original 13

After Autopilot completes, we can examine the results on the Leaderboard and evaluate the top-performing model across all backtests.

ROC Curve

On the ROC Curve tab (shown in Figure 9), we can check how well the model separates the two classes. The Selection Summary box lists metrics such as the F1 score, recall, and precision. To its right is the confusion matrix. At the bottom is the ROC curve itself, followed by the Prediction Distribution, where you can adjust the probability threshold. Lastly, the Cumulative Gain and Lift charts show how much more effective the model is than a naive method.

Figure 9. ROC Curve
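The metrics on this tab all follow from the predicted probabilities and a chosen threshold. The sketch below shows the standard definitions in plain Python so that the effect of moving the threshold is easy to reproduce by hand. The labels, probabilities, and function names are invented for the example, and the lift calculation assumes the naive method is random selection; none of this is DataRobot's code.

```python
def confusion_at_threshold(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at a probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


def precision_recall_f1(counts):
    """Precision, recall, and F1 from confusion-matrix counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def lift_at_top(y_true, y_prob, fraction):
    """Positive rate among the top-scored rows divided by the overall positive rate.

    Assumes y_true contains at least one positive.
    """
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(actual for _, actual in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Hypothetical actual labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]

counts = confusion_at_threshold(y_true, y_prob, threshold=0.5)
precision, recall, f1 = precision_recall_f1(counts)
lift = lift_at_top(y_true, y_prob, fraction=0.25)
```

Raising the threshold generally trades recall for precision, which is the trade-off the Prediction Distribution slider lets you explore.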

Accuracy Over Time

In Figure 10 we can see the actual and predicted values plotted over time. We can also change the backtest and forecast distances so that we can evaluate the accuracy at different forecast distances across the validation periods.

Figure 10. Accuracy over time

Figure 11 shows the option to see the accuracy over time for each series, or see the average across all series.

Figure 11. Accuracy over time with dropdown by series, or average

Forecast vs Actuals

On the Forecast vs Actuals tab (Figure 12), we can see what the forecast would have been from any given forecast point in the validation period. This lets you compare how predictions made from different forecast points behave at different distances into the future.

Figure 12. Forecast vs Actuals

Series Accuracy

The Series Accuracy tab provides the accuracy of each series based on the metric we choose (Figure 13). This is a good way to quickly evaluate the accuracy of each individual series.

Figure 13. Series Accuracy
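Conceptually, a per-series score is just the chosen metric computed separately on each series' rows. The sketch below uses log loss as the example metric. The records, probabilities, and function names are made up for illustration, and the metric available in a given project may be different.

```python
from collections import defaultdict
from math import log


def log_loss(pairs, eps=1e-15):
    """Mean binary log loss over (actual, predicted probability) pairs."""
    total = 0.0
    for actual, prob in pairs:
        prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
        total += -(actual * log(prob) + (1 - actual) * log(1 - prob))
    return total / len(pairs)


def score_by_series(records, metric=log_loss):
    """Apply a metric separately to each series' predictions."""
    grouped = defaultdict(list)
    for series_id, actual, prob in records:
        grouped[series_id].append((actual, prob))
    return {series_id: metric(pairs) for series_id, pairs in grouped.items()}


# Hypothetical (store, actual label, predicted probability) records.
records = [
    ("Store A", 1, 0.9),
    ("Store A", 0, 0.2),
    ("Store B", 1, 0.4),
    ("Store B", 0, 0.7),
]

scores = score_by_series(records)
```

In this toy data the predictions for Store B are much worse than for Store A, which is the kind of difference a per-series view is meant to surface.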

Stability

The Stability tab provides a summary of how well a model performs on different backtests to determine if it is consistent across time (Figure 14).

Figure 14. Stability

The Forecast Accuracy tab explains how accurate the model is for each forecast distance (Figure 15).

Figure 15. Forecasting Accuracy

In the Feature Impact tab under the Understand tab (Figure 16) you can see the relative impact of each feature on your specific model, including the derived features.

Figure 16. Feature Impact

The Feature Effects tab shows how changes to the value of each feature change model predictions.

Figure 17. Feature Effects

Prediction Explanations

Prediction Explanations tell you why your model assigned a value to a specific observation (Figure 18).

Figure 18. Prediction Explanations

Predictions

Now that we have built and selected our classification model, we want to get predictions. There are three ways to get time series predictions from DataRobot.

The first is the simplest: you can use the GUI to drag-and-drop a prediction dataset (Figure 19). This is typically used for testing or for small ad-hoc forecasting projects that don’t require frequent predictions.

Figure 19. Predictions, drag and drop

The second method is to deploy a REST endpoint and request predictions via API (Figure 20). This connects the model to a dedicated prediction server and creates a dedicated deployment object.

Figure 20. Deploy model to prediction server
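For orientation, a request to a deployed model might be assembled along the following lines. This is a sketch under assumptions, not an official integration snippet: the URL path, the header names, and every placeholder value (host, deployment ID, API token, DataRobot key, CSV columns) reflect our understanding of a typical deployment and should be replaced with the snippet shown for your own deployment. The code only builds the request object and does not send it.

```python
import urllib.request


def build_prediction_request(host, deployment_id, api_token, datarobot_key, csv_bytes):
    """Build (but do not send) a POST request carrying CSV rows to score.

    The URL pattern and header names below are assumptions; verify them
    against the integration snippet for your deployment.
    """
    url = f"https://{host}/predApi/v1.0/deployments/{deployment_id}/predictions"
    headers = {
        "Content-Type": "text/csv; charset=UTF-8",
        "Authorization": f"Bearer {api_token}",
        "DataRobot-Key": datarobot_key,
    }
    return urllib.request.Request(url, data=csv_bytes, headers=headers, method="POST")


# Placeholder values only; none of these identify a real server or deployment.
csv_bytes = b"Date,Store,Marketing\n2020-10-01,Store A,none\n"
request = build_prediction_request(
    host="example.invalid",
    deployment_id="DEPLOYMENT_ID",
    api_token="API_TOKEN",
    datarobot_key="DATAROBOT_KEY",
    csv_bytes=csv_bytes,
)
# To send it: urllib.request.urlopen(request), then parse the response body.
```

For a time series deployment, the scoring data generally needs to include enough recent history to fill the feature derivation window, plus the future rows to be forecast.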

Finally, the third method is to deploy your model via Docker. This allows you to put the model closer to the data to reduce latency, as well as scale the scoring model as needed. (For help deploying models via Docker, see instructions for using AWS, AKS, and GCP.)

Figure 21. Portable Prediction Server

 

More Information

If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Time series modeling.

Comments
Sander

What about classification for multivariate time series, especially with a mixture of categorical and continuous values?

For example, multivariate time series with one table per label, like this:

Target is YES:

date     f1     f2   f3
dec      0.1    a    234
jan      -0.5   a    456
feb      3.4    b    123
march    0.6    b    678

Target is NO:

date     f1     f2   f3
dec      -0.1   c    1234
jan      0.5    a    4456
feb      2.4    g    2123
march    1.6    b    6678

Can you share a large dataset like this (train and test), along with the performance of your code? Then it would be possible to test your performance against other packages.

In any case, it seems you do not prove your capabilities this way. Why are you not open? Are you afraid to show how bad your abilities are?

Version history
Last update:
‎02-03-2021 07:24 PM