Time Series Anomaly Detection

In this article we discuss time series anomaly detection and show how to use DataRobot Automated Time Series to train a model that detects anomalies in time series data.

This use case identifies when a motor is about to experience a failure. The dataset (Figure 1) contains various IoT sensor readings from six different motors inside a manufacturing plant. Our goal is to build an anomaly detection model; our test data includes some labeled anomalies, which we'll use to help select and verify the best model.

Figure 1. Dataset

Uploading the dataset and setting options

To create an anomaly detection model using DataRobot, you first need to upload the dataset into DataRobot (new project screen). After the dataset has been uploaded, you need to tell DataRobot that this is actually an unsupervised time series project. To do that, select No target? and then Proceed in unsupervised mode.

Figure 2. Unsupervised modeling
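If you prefer to script these steps, here is a minimal sketch of the same project setup using the DataRobot Python client; the endpoint, API token, file name, and project name are placeholders.

```python
import datarobot as dr

# Connect to DataRobot (endpoint and token are placeholders)
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Upload the IoT sensor dataset and create a new project
project = dr.Project.create(
    sourcedata="motor_sensor_readings.csv",  # placeholder file name
    project_name="Motor anomaly detection",
)
```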

Next we set up time aware modeling, choose our date/time feature, and then select Time Series Modeling.

Figure 3. Set up time aware modeling

We need to tell DataRobot how far into the past to go when creating lag features and rolling statistics. For this example we will use the default feature derivation window of 120 minutes to 0.

Figure 4. Feature Derivation window for time aware modeling
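The same time-aware setup can be expressed as a datetime partitioning specification in the Python client. This is a sketch: the date/time column name ("timestamp") is an assumption, and the 0-to-0 forecast window is assumed here because anomaly detection scores each row as it arrives rather than forecasting ahead.

```python
# Time series partitioning spec with the default 120-minute feature derivation window
spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column="timestamp",   # assumed column name
    use_time_series=True,
    windows_basis_unit="MINUTE",
    feature_derivation_window_start=-120,
    feature_derivation_window_end=0,
    forecast_window_start=0,                 # assumed nowcasting setup
    forecast_window_end=0,
)
```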

Automated Time Series has a number of modeling options that can be configured. We'll look at the most commonly used options. 

Backtests

With time series, we can’t just randomly sample data into partitions. The correct approach is called backtesting, and DataRobot does this automatically. Backtesting ensures we train on historical data and validate on recent data, and then repeat that multiple times to ensure we have a stable model. You can adjust the validation periods and the number of backtests to suit your needs. 

To see the backtesting settings, navigate to Show Advanced Options > Partitioning > Date/Time. (For this article, we left the defaults for the dataset, as shown in Figure 5.)

Figure 5. Backtesting and validation length options
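In the Python client, the equivalent settings live on the same partitioning specification. The values below are examples only (we kept the defaults in this article); durations use ISO 8601 strings.

```python
# Example backtest configuration ("P7D" = a seven-day validation period)
spec.number_of_backtests = 2
spec.validation_duration = "P7D"
```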

Event Calendar

DataRobot also lets you provide an event calendar, which it uses to generate forward-looking features so the model can better capture special events. A calendar file consists of two fields: the date and the name of the event (Figure 6).

Figure 6. Event Calendar

To add an event calendar, select Time Series (under Advanced Options), scroll down to Calendar of holidays and special events, and drag and drop your calendar file as shown in Figure 7.

Figure 7. Adding the event calendar
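Programmatically, the calendar can be registered and attached to the partitioning specification as sketched below; the file name and calendar name are placeholders.

```python
# The calendar CSV has one event per row: a date column and an event-name column
calendar = dr.CalendarFile.create(
    "plant_calendar.csv",                  # placeholder file name
    calendar_name="Plant special events",  # placeholder calendar name
)
spec.calendar_id = calendar.id
```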

There are many more options we could experiment with, but for now this is enough to get started.

Modeling

Once we hit Start, DataRobot takes the original features we gave it (74 for our dataset) and creates hundreds of derived features from the numeric, categorical, and text variables. For the dataset used in this article, Automated Time Series created 456 new time series features, as shown in Figure 8.

Figure 8. DataRobot created many derived features from the original features
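If you are scripting the workflow, the equivalent of hitting Start is to pass the specification when kicking off Autopilot in unsupervised mode; this sketch keeps the defaults for everything not shown, and argument names can vary between client versions.

```python
# Start Autopilot in unsupervised mode using the spec built above, then wait
project.set_target(
    unsupervised_mode=True,
    partitioning_method=spec,
)
project.wait_for_autopilot()
```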

After Autopilot completes, we can examine the results on the Leaderboard (Models tab) and evaluate the top-performing models across all backtests.

Leaderboard and Synthetic AUC

The Leaderboard sorts the models by Synthetic AUC. This metric enables you to evaluate your models even if you don't have an external test set that identifies the anomalies.

Figure 9. Leaderboard sorted by Synthetic AUC
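You can also pull the Leaderboard programmatically; the metric key below is assumed to match the name shown in the UI.

```python
# List the time series models with their Synthetic AUC scores
models = project.get_datetime_models()
for model in models[:5]:
    print(model.model_type, model.metrics.get("Synthetic AUC"))
```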

Synthetic AUC is a good metric to help identify the best model(s) to use for your dataset; however, the anomalies it finds might be different from the actual anomalies in your dataset. For this article we are using an external test set: we select a model on the Leaderboard, navigate to the Predict tab, and drag and drop the test dataset into the Prediction Datasets section.

Once the test dataset is uploaded, we go to Forecast settings.

Figure 10. Upload External Test Set

Select Forecast Range Predictions, check the Use known anomalies column to generate scores checkbox, and select the name of the column to use when generating scores. Now we can click Compute Predictions.

Once the scores are computed, go to Menu and select Show External Test Column; a new column with that information appears in the Leaderboard. We can then compute the external test set scores for the other blueprints. Once they finish, the Leaderboard will look similar to Figure 11.

Figure 11. Leaderboard with External Test Set column
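Here is a sketch of scoring the external test set through the Python client instead of the GUI. The file name, the known-anomaly column name, and the prediction date range are placeholders, and actual_value_column is assumed to be available for unsupervised time series projects.

```python
from datetime import datetime

# Upload the labeled test set and request forecast range predictions
dataset = project.upload_dataset(
    "motor_test_set.csv",                         # placeholder file name
    actual_value_column="known_anomaly",          # placeholder label column
    predictions_start_date=datetime(2021, 6, 1),  # placeholder date range
    predictions_end_date=datetime(2021, 6, 8),
)
best_model = models[0]
pred_job = best_model.request_predictions(dataset.id)
predictions = pred_job.get_result_when_complete()
```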

Anomaly Over Time

One of the most popular visualizations for a time series anomaly detection project is the Anomaly Over Time chart (under the Evaluate tab). Here we can see the anomaly scores plotted over time, and we can switch between backtests to evaluate the anomaly scores across the validation periods.

Figure 12. Anomaly Over Time
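A similar view of the external test predictions we computed above can be plotted locally; the column names in the returned frame ("timestamp", "prediction") are assumptions worth checking against your client version.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Plot the predicted anomaly scores over time (column names assumed)
predictions["timestamp"] = pd.to_datetime(predictions["timestamp"])
predictions.plot(x="timestamp", y="prediction", figsize=(12, 4), legend=False)
plt.ylabel("Anomaly score")
plt.show()
```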

Anomaly Assessment

On the Anomaly Assessment tab (under the Evaluate tab), we can see which features are contributing to the anomaly score via the SHAP values. This is incredibly useful for gaining additional insight into your data and for explaining high scores.

Figure 13. Anomaly Assessment

ROC Curve

On the ROC Curve tab (under the Evaluate tab) we can check how well the prediction distribution separates normal from anomalous observations. In the Selection Summary box, you can find the F1 score, recall, precision, and other statistics. At the top right we have the well-known confusion matrix.

Now let’s examine the graphs at the bottom of the tab. The first graph on the left is the ROC curve. This is followed by the Prediction Distribution, where you can adjust and try out different probability thresholds for your target. Lastly, we have the Cumulative Charts (gain and lift charts), which tell you how much your effectiveness increases by using this model instead of a naive method.

Figure 14. ROC Curve tab

Feature Impact 

In Figure 15 you can see the relative impact of each feature on this model, including the derived features (Understand > Feature Impact tab).

Figure 15. Feature Impact
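Feature Impact is also available through the Python client; the result field names below are assumptions to verify against your client version.

```python
# Compute (or fetch, if already computed) Feature Impact for the selected model
impact = best_model.get_or_request_feature_impact()
for row in impact[:10]:
    print(row["featureName"], round(row["impactNormalized"], 3))
```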

Feature Effects 

Here we can see how changes to the value of each feature change model predictions (Understand > Feature Effects tab). Figure 16 shows that as motor_2_rpm (actual) increases or decreases, the anomaly score increases. 

Figure 16. Feature Effects

Prediction Explanations 

Prediction Explanations (from the Understand tab) tell you why your model assigned a value to a specific observation (Figure 17). 

Figure 17. Prediction Explanations

Predictions

Now that we have built and selected our anomaly detection model, we want to get predictions. There are three ways to get time series predictions from DataRobot.

The first is the simplest: use the GUI to drag and drop a prediction dataset (Figure 18). This method is typically used for testing, or for small, ad hoc projects that don't require frequent predictions.

Figure 18. Predictions, drag and drop

The second method is to create a deployment, which connects the model to a dedicated prediction server, creates a deployment object, and exposes a REST endpoint so you can request predictions via the API.

Figure 19. Deploy model to prediction server
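Here is a sketch of creating that deployment with the Python client, assuming at least one dedicated prediction server is available to your account; the label is a placeholder.

```python
# Deploy the selected model to the first available prediction server
server = dr.PredictionServer.list()[0]
deployment = dr.Deployment.create_from_learning_model(
    best_model.id,
    label="Motor anomaly detector",  # placeholder label
    default_prediction_server_id=server.id,
)
print(deployment.id)  # use this ID when calling the REST prediction endpoint
```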

The third method is to deploy your model via Docker using the Portable Prediction Server. This allows you to put the model closer to the data to reduce latency, and to scale scoring as needed, as shown in Figure 20.

Figure 20. Portable Prediction Server

If you want to try this out for yourself, go to DataRobot University and register for the Time Series Anomaly Detection Lab.

More Information

Check out these resources for more information on the various features we discussed here.

If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Unsupervised learning (anomaly detection) and Time series modeling.
