Automated Time Series Walkthrough

(Article updated October 2020.)

Overview

The DataRobot Automated Time Series product accelerates your AI success by combining cutting-edge machine learning and automation with the team you already have in place. Automated Time Series incorporates the knowledge, experience, and best practices of the world's leading data scientists, delivering unmatched levels of automation, accuracy, transparency, and collaboration to help your business become an AI-driven enterprise.

Use Case

This guide demonstrates the basics of how to build, select, deploy, and monitor a time series model using the automated machine learning capabilities of DataRobot. Time series forecasting is one of the most valuable yet difficult problems in data science that businesses face today; because it demands specialized expertise and resources, many organizations miss out on its benefits. DataRobot solves this challenge and puts this technology into the hands of both novice users and experienced data scientists.
Time series models learn from recent history to forecast future values. The data for time series use cases comes in many different shapes, ranging from daily data to individual transactions.

The use case that will be highlighted throughout these examples comes from the retail industry where we will forecast store sales for the next seven days. Accurately forecasting sales allows companies to do more than just prevent overstocking: it enables businesses to assess store performance while also managing staffing, inventory, and their supply chain. With the automated time series capabilities of DataRobot, we can quickly create an accurate forecasting model across thousands of different stores or product lines, evaluate the most important factors that impact sales, and get future predictions.

Automated Time Series Modeling

STEP 1: Prepare Your Data

Since we want to predict future daily sales for each store, we need to aggregate raw transactions into daily totals. An example file is shown below (Figure 1); it includes an identifier for the store, our date column, the daily number of sales, and many other attributes covering the stores, internal promotions, and external factors like holidays or the level of inflation. Adding relevant data often increases the predictive power of your model, but it's easiest to start with the data you have available today and experiment with other information once you have a working model.

Figure 1. Table
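If your raw data lives at the transaction level, a few lines of pandas can roll it up to the daily grain used here. This is a minimal sketch; the file name and column names (raw_transactions.csv, Store, Date, Sales) are placeholders for your own schema.

```python
import pandas as pd

# Hypothetical transaction-level extract; adjust names to your own schema.
transactions = pd.read_csv("raw_transactions.csv", parse_dates=["Date"])

# Aggregate individual transactions into one row per store per day.
daily_sales = (
    transactions
    .groupby(["Store", pd.Grouper(key="Date", freq="D")])["Sales"]
    .sum()
    .reset_index()
)

# Write the modeling-ready training file.
daily_sales.to_csv("daily_store_sales.csv", index=False)
```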

STEP 2: Load and Profile Your Data

To get started with DataRobot, log in and load a prepared training dataset. DataRobot currently supports .csv, .tsv, .dsv, .xls, .xlsx, .sas7bdat, .bz2, .gz, .zip, .tar, and .tgz file types, plus Apache Avro, Parquet, and ORC. (Note: If you wish to use Avro or Parquet data, contact your DataRobot representative for access to the feature.)

Directly loading data from production databases for model building allows you to quickly train and retrain models, and eliminates the need to export data to a file for ingestion.

DataRobot supports any database that provides a JDBC driver—meaning most databases on the market today can connect to DataRobot. Drivers for Postgres, Oracle, MySQL, Amazon Redshift, Microsoft SQL Server, Snowflake, kdb+, and Hadoop Hive are most commonly used (Figure 2).

Figure 2. Import
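The same step can be scripted with the DataRobot Python client (pip install datarobot). A minimal sketch, with the endpoint, API token, and file name as placeholders:

```python
import datarobot as dr

# Connect to DataRobot; the endpoint and token below are placeholders.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Upload the prepared training data; DataRobot starts EDA automatically.
project = dr.Project.create(
    sourcedata="daily_store_sales.csv",
    project_name="Store Sales Forecasting",
)
print(project.id)
```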

After you load your data, DataRobot performs exploratory data analysis (EDA), detecting the data types and showing the number of unique and missing values, along with the mean, median, standard deviation, and minimum and maximum for each feature. This information is helpful for getting a sense of the dataset's shape and distribution (Figure 3).

Figure 3. Distribution
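The same EDA statistics are exposed on each feature object in the client. A sketch, continuing from the snippet above (attribute availability can vary by feature type and client version):

```python
# Print the EDA summary DataRobot computed during upload.
for feature in project.get_features():
    print(
        feature.name,
        feature.feature_type,
        feature.na_count,      # number of missing values
        feature.unique_count,  # number of unique values
        feature.mean,
        feature.median,
    )
```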

STEP 3: Select a Prediction Target and Date/Time Feature

Next, select a prediction target (what you are trying to forecast) from the uploaded dataset. DataRobot will analyze your training dataset and automatically determine the type of analysis (in this case, regression).

DataRobot will automatically recognize date and/or time-based features and ask if you want to set up time-aware modeling. To do so, simply select the recommended feature (i.e., the “Date” in our dataset); DataRobot will display a chart of your prediction target across time (Figure 4).

Figure 4. Start
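In the Python client, these choices map to a datetime partitioning specification. A sketch, assuming Date is the detected date column and Store identifies each series; the forecast and derivation windows are filled in during the next step:

```python
# Configure time-aware modeling; 'Store' makes this a multiseries project
# that produces a separate forecast for every store.
spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column="Date",
    use_time_series=True,
    multiseries_id_columns=["Store"],
)
```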

STEP 4: Customize Forecast Windows and Set Information Known in Advance

DataRobot allows you to set the forecast point, i.e., the moment in time when you want to make a prediction. In our case, we want to predict sales for the next seven days because that’s how often we need to restock our stores. The setting on the right side of the window below ensures the model will generate predictions for each of the next seven days. In other words, if we make our forecast on Sunday, we'll get the predicted sales by store for the next day (Monday) and the coming week (Figure 5).

The left side of the window below will determine the types of features DataRobot will build. The defaults work well, but you can always adjust them for the problem at hand.

Two rules of thumb here:

  • The further back in time you go to derive features, the less data remains available to train your model.
  • If the data changes rapidly (as is often the case in finance), a shorter feature derivation window is usually better so that you don't incorporate stale information.

Figure 5. Feature Derivation Window

To prevent target leakage while still including valuable features in the model, DataRobot allows you to specify features whose values are known ahead of time (“known in advance” features). In our example, we know about upcoming marketing events and holidays, and we know that a store's square footage won't change, so we can select all of these variables and mark them as “known in advance.” These variables add a lot to the predictive power of the model; for example, telling DataRobot that there is a holiday coming up in the next seven days can improve the sales forecast. Holiday and special event calendars can also be uploaded separately.

The default modeling mode is “Quick,” which makes effective and efficient use of DataRobot’s automation capabilities. For more control over which algorithms DataRobot runs, there are Manual and full Autopilot options. If you want to customize the model building process further, you can modify a variety of advanced parameters, optimization metrics, feature lists, transformations, partitioning, and sampling options.
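The window settings, known in advance features, calendar, and modeling mode can all be supplied through the same specification. A sketch continuing the running example; the feature names (Marketing Event, Holiday, Store Size), the calendar file, and the 28-day derivation window are illustrative choices, not defaults:

```python
# Forecast window: predict 1 to 7 days ahead.
spec.forecast_window_start = 1
spec.forecast_window_end = 7

# Feature derivation window: build rolling features from the prior 28 days
# (a hypothetical choice -- tune it for your data).
spec.feature_derivation_window_start = -28
spec.feature_derivation_window_end = 0

# Mark variables whose future values we already know.
spec.feature_settings = [
    dr.FeatureSettings("Marketing Event", known_in_advance=True),
    dr.FeatureSettings("Holiday", known_in_advance=True),
    dr.FeatureSettings("Store Size", known_in_advance=True),
]

# Optionally attach a separately uploaded holiday/event calendar.
calendar = dr.CalendarFile.create("holiday_calendar.csv")
spec.calendar_id = calendar.id

# Start modeling in Quick mode against the Sales target.
project.set_target(
    target="Sales",
    partitioning_method=spec,
    mode=dr.AUTOPILOT_MODE.QUICK,
    worker_count=-1,  # use all modeling workers available to you
)
```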

STEP 5: Begin the Modeling Process

Click the Start button to begin training models. Once the modeling process begins, DataRobot analyzes the target and implements time series best practices. DataRobot also creates time-based features to use in the different blueprints.

You can easily see how many features contain useful information, and edit feature lists used for modeling (Figure 6).

Figure 6. Feature Lists
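Programmatically, you can wait for the run to finish and then inspect the feature lists built from the derived features. A sketch, continuing from the earlier snippets:

```python
# Block until the Quick mode run completes.
project.wait_for_autopilot()

# Feature lists assembled from the derived time series features.
for flist in project.get_modeling_featurelists():
    print(flist.name, len(flist.features))
```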

There are also options to drill down on variables to view distributions and trends (Figure 7).

Figure 7. Trends

In addition to traditional time series models like ARIMA, DataRobot automatically builds models based on modern algorithms such as XGBoost, LightGBM, Keras LSTM, TensorFlow, and DeepAR, as well as proprietary Eureqa models and even the open source Prophet model from Facebook, all of which can be compared directly to the traditional models. DataRobot prepares the data appropriately for each algorithm, performing operations like one-hot encoding, missing value imputation, text mining, and standardization to transform features for optimal results.

DataRobot streamlines model development by automatically ranking models (or ensembles of models) based on their performance on the backtesting and holdout partitions. By cost-effectively evaluating a vast number of combinations of data transformations, features, algorithms, and tuning parameters in parallel across a cluster of servers, DataRobot delivers the best predictive model in the shortest amount of time.

STEP 6: Evaluate the Results of Automated Modeling

After automated modeling is complete, the model Leaderboard (Figure 8) ranks each machine learning model so you can evaluate and select the one you want to use. Click on a model to see options to Evaluate, Understand, Describe, and Predict.

Figure 8. Leaderboard

To estimate likely model performance, the Evaluate options include the industry-standard Lift Chart, Feature Fit, Accuracy over Time (Figure 9), Forecast vs. Actual, and Advanced Tuning. There are also options for measuring models by Learning Curves, Speed versus Accuracy, and Comparisons. The interactive model evaluation charts are very detailed, but don't require a background in data science to understand what they convey.

Figure 9. Accuracy Over Time
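From the API, the Leaderboard of a time series project is a list of datetime models. A sketch continuing the running example; the list generally mirrors the Leaderboard ranking, so the first entry is the top-ranked model (worth verifying against the UI):

```python
# Fetch the Leaderboard models and take the top-ranked one.
models = project.get_datetime_models()
best_model = models[0]
print(best_model.model_type)

# Scores on the project's optimization metric, keyed by partition
# (e.g., validation, backtesting, holdout).
print(best_model.metrics[project.metric])
```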

STEP 7: Review How Your Chosen Model Works

DataRobot offers a high degree of transparency, interpretability, and explainability, so you can easily understand how models were built and confidently explain to others why a model made the prediction it did.

In the Describe tab, you can view the end-to-end model blueprint containing details of the specific feature engineering tasks and algorithms DataRobot uses to run the model (Figure 10).

Figure 10. Blueprint

In the Understand tab, popular exploratory capabilities include Feature Impact, Feature Effects, Prediction Explanations, and Word Cloud. These all help you understand what drives the model’s predictions.

Feature Impact measures how much each feature contributes to the overall accuracy of the model. In a hospital readmission use case, for example, the reason a patient was discharged has a direct relationship to the likelihood of that patient being readmitted. This insight can be invaluable for guiding an organization to focus on what matters most (Figure 11).

Figure 11. Feature Impact

The Feature Effects chart displays model details on a per-feature basis (a feature's effect on the overall prediction), depicting how the model understands the relationship between each variable and the target (Figure 12). It highlights the specific values within each column that are likely to be large factors in determining sales over the next seven days.

Figure 12. Feature Effects
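Both insights are also retrievable through the client. A sketch for Feature Impact, continuing from the earlier snippets; the result is a list of dicts keyed by feature name and normalized impact:

```python
# Compute Feature Impact if needed, otherwise fetch the cached result.
impact = best_model.get_or_request_feature_impact()
for row in impact[:10]:
    print(row["featureName"], round(row["impactNormalized"], 3))

# Feature Effects can be requested similarly, e.g.:
# effects = best_model.get_or_request_feature_effect(source="validation")
```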

STEP 8: Make Predictions

Every model built in DataRobot is immediately ready for deployment (Figure 13). You can:

  • Upload a new dataset to DataRobot to be scored in batch and downloaded.
  • Create a REST API endpoint to score data directly from applications. An independent prediction server is available to support low latency, high throughput prediction requirements.
  • Export the model for in-place scoring in Hadoop.
  • Download scoring code, either as editable source code or self-contained executables, to embed directly in applications to speed up computationally intensive operations.
  • Download model coefficients.
  • Export the model to the portable prediction server via Docker, and deploy anywhere in your environment (consider this example of deploying and monitoring on Google Cloud Platform).

Figure 13. Predictions

We can easily explore the sales predictions for the next seven days for each store (Figure 14), download the values, and gauge our confidence using the estimated prediction interval (the blue area).

Figure 14. Prediction Preview
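The first deployment option listed above, batch scoring, can be scripted through the Python client. A sketch continuing the running example; the file name and forecast point are hypothetical:

```python
from datetime import datetime

# Upload recent data and anchor the forecast at a specific point in time.
dataset = project.upload_dataset(
    "recent_store_data.csv",
    forecast_point=datetime(2020, 10, 4),
)

# Request and retrieve the 7-day-ahead predictions for every store.
predict_job = best_model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()
print(predictions.head())
```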

STEP 9: Monitor and Manage Deployed Models

With DataRobot you can proactively monitor and manage all deployed machine learning models (including models created outside of DataRobot) to maintain peak prediction performance (Figure 15). This ensures that the machine learning models driving your business are accurate and consistent throughout changing market conditions.

At a glance you can view a summary of metrics from all models in production, including the number of requests (predictions) and key health statistics:

  • Service Health looks at core performance metrics from an operations or engineering perspective: latency, throughput, errors, and usage.
  • Data Drift proactively looks for changes in the data characteristics over time to let you know if there are trends that could impact model reliability.
  • Accuracy compares predictions against the corresponding actual values (or ground truth) so you can assess model performance using standard machine learning metrics.

Figure 15. Deployments
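Deployments can be created and monitored from the client as well. A sketch, assuming at least one prediction server is available to your account:

```python
# Deploy the chosen model to a dedicated prediction server.
server = dr.PredictionServer.list()[0]
deployment = dr.Deployment.create_from_learning_model(
    model_id=best_model.id,
    label="Store Sales Forecast",
    default_prediction_server_id=server.id,
)

# Service health: latency, throughput, and error metrics.
stats = deployment.get_service_stats()
print(stats.metrics)
```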

From here you can apply the data science expertise embedded in DataRobot to review model performance and detect model decay. By clicking on a model you can see how its predictions have changed over time; dramatic changes here can indicate that your model has gone off track.

You can also analyze data drift (Figure 16) to assess whether the model is still reliable, even before the actual values come back. You're essentially analyzing the difference between the data the model is being scored on and the data it was trained on. DataRobot compares the most important features in the model (as measured by Feature Impact) and shows how different each feature's distribution is from the training data.

  • Green dots indicate features that haven't changed much.
  • Yellow dots indicate features that have changed but aren't very important. You should examine these, but changes in these features don't necessarily mandate action, especially if you have lots of models.
  • Red dots indicate important features that have drifted. The more red dots you have, the greater the likelihood that your model needs to be rebuilt or replaced.

Figure 16. Drift
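Recent versions of the client also expose drift programmatically. A sketch, assuming your client version provides get_feature_drift on the deployment:

```python
# Drift scores for the deployment's tracked features; high drift on
# high-impact features is the cue to consider retraining.
for feature_drift in deployment.get_feature_drift():
    print(feature_drift.name, feature_drift.drift_score)
```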

If you decide to replace a model that has drifted, simply paste the URL of a retrained DataRobot model (a model trained on more recent data from the same data source) or of another model with compatible features (Figure 17). After DataRobot validates that the model matches, you can select a reason for the replacement, which is kept in a permanent archive. From that point forward, new prediction requests go to the new model with no impact on downstream processes. If you ever decide to restore the previous model, you can easily do so through the same process.

Figure 17. Replace Model
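Model replacement is also scriptable. A sketch, where retrained_model stands in for a hypothetical model you've retrained on more recent data:

```python
from datarobot.enums import MODEL_REPLACEMENT_REASON

# Swap in the retrained model and record the reason; prediction traffic
# moves to the new model with no change to the deployment endpoint.
deployment.replace_model(
    new_model_id=retrained_model.id,  # hypothetical retrained model
    reason=MODEL_REPLACEMENT_REASON.DATA_DRIFT,
)
```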

Conclusion

DataRobot’s time series capabilities are available as part of a fully managed software service (SaaS) or in several Enterprise configurations to match your business needs and IT requirements. All configurations feature a constantly expanding set of diverse, best-in-class algorithms from R, Python, Spark, Eureqa, and other sources, giving you the best set of tools for your machine learning and AI challenges.

DataRobot can also automate the development of sophisticated regression and classification models when time series calculations are not required; a similar overview document describes how general regression and classification models can be built in DataRobot. In addition to using the GUI, you can achieve everything covered in this document with Python or R through the API. If you have any questions, leave a comment below.

Attachment: We've attached a PDF file of this article.
