(Updated July 2020)
This article presents a simple step-by-step guide to using DataRobot MLOps through a typical model deployment and monitoring lifecycle.
The lifecycle begins with uploading your model to the MLOps Model Registry as a package; specifically, a package created from a model built in DataRobot AutoML or DataRobot AutoTS. We’ll then see how to create a deployment from your model package, how to monitor model performance by comparing predictions to actual outcomes, and how to use data drift to see how incoming data changes across features over time. Finally, we’ll see how to replace the model when its performance degrades.
Creating a deployment begins with uploading a package into the Model Registry. The Model Registry is the central hub for all your model packages, and a package contains a file or set of files with information about your model; this varies depending on the type of model being deployed. Once the package is in the Model Registry you can create a deployment.
The Model Registry provides you with a consistent deployment, replacement, and management experience for all of your ML models. If the model was built in the DataRobot AutoML platform, the model package is automatically added to the Model Registry package list when the deployment is created from the Leaderboard; otherwise, packages are added to the package list manually by upload.
There are three kinds of model packages in the Model Registry: DataRobot models, custom models, and external models.
A key difference is that DataRobot models and custom models have prediction requests received and processed within MLOps through an API call, while external models handle predictions in an outside environment, after which the predictions are transferred back to MLOps for tracking.
The Model Registry provides the ability to register all of your models in one place to give you a consistent experience, regardless of the origin and location of a deployment. In all three cases, you use MLOps to track the predictions the model makes and assess the model accuracy just the same.
To add a model built with the DataRobot AutoML or AutoTS platform to the MLOps Model Registry, first download the model you want to deploy as a package file from the model Leaderboard. Navigate to your selected model on the Leaderboard and select Predict > Deploy.
Then simply click Generate and download model package.
Once processing completes, give the package a name and save it to your file system.
Next, navigate to the Model Registry. Under Model Packages select Add New Package and then select Import model package file from the dropdown menu. You can either drag and drop the model package file you just saved into the user interface, or you can use the file system browser to locate the file.
Since all the information about the model is already known from within DataRobot, such as the target, the source training data, and the label threshold (if a binary classification model), you don’t need to supply any more information. But you may wish to edit the default name and description.
With the package in the Model Registry, you can now use it to create a deployment.
Once your model is in the Model Registry, it’s easy to create a deployment from the model package that holds it; it takes just a few steps.
To start, select Deploy from the actions menu on the right.
On the following page you’ll enter the remaining information needed to track predictions and model accuracy.
The information you see in the Model section has already been supplied from the contents of the model package file.
Likewise, we see in the Learning section that the data used to train the model is also already known; DataRobot stored the information from when it created the AutoML or AutoTS project.
The Inference section contains information about capturing predictions, and we can see that it is only partially complete. DataRobot stores the incoming prediction data received via an API call at the URL endpoint provided. If your DataRobot instance is hosted on the Managed AI Cloud, the subdomain name is derived from your account; if you have an on-prem installation, your endpoint is hosted at your domain. Capturing the predictions allows DataRobot to assess how the nature of your incoming prediction data differs from your training data.
To capture those differences, click Enable Data Drift Tracking. Checking the button to perform segment analysis allows DataRobot to identify characteristics of the incoming data, such as the permission level of the user making the requests or the IP address where the request came from.
But if you want to track prediction accuracy, you need to be able to associate the predictions the model makes with the actual results, which we refer to simply as the “actuals.” Often the actual outcome isn’t known until days, weeks, or months later, so you need an identifier to associate the predictions with the actuals. The Association ID is an extra column that is appended to the rows of the request data and uniquely identifies each prediction. Then, when you upload the actuals dataset, you supply the Association ID and the actual value of what happened to tie the two together.
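To make the role of the Association ID concrete, here is a minimal sketch in plain Python. It does not use the DataRobot API; the function names and the column names `association_id`, `prediction`, and `actual_value` are hypothetical, chosen only to illustrate how a unique ID appended to each request row lets late-arriving actuals be matched back to predictions.

```python
import uuid


def attach_association_ids(rows, id_column="association_id"):
    """Return copies of the request rows with a unique ID appended to each.

    The column name is a hypothetical choice for this sketch.
    """
    return [{**row, id_column: str(uuid.uuid4())} for row in rows]


def join_predictions_to_actuals(predictions, actuals, id_column="association_id"):
    """Pair each prediction with its actual outcome by matching on the ID.

    Predictions whose actuals have not arrived yet are simply left out.
    """
    actual_by_id = {a[id_column]: a["actual_value"] for a in actuals}
    return [
        {
            id_column: p[id_column],
            "prediction": p["prediction"],
            "actual": actual_by_id[p[id_column]],
        }
        for p in predictions
        if p[id_column] in actual_by_id
    ]


# Two scoring requests, each tagged with its own ID before prediction time.
requests = attach_association_ids([{"loan_amount": 1000}, {"loan_amount": 2500}])
predictions = [
    {"association_id": r["association_id"], "prediction": score}
    for r, score in zip(requests, [0.2, 0.7])
]

# Weeks later, only the second outcome is known.
actuals = [{"association_id": requests[1]["association_id"], "actual_value": 1}]
matched = join_predictions_to_actuals(predictions, actuals)
```

Only the second request ends up in `matched`, because it is the only one whose actual has been reported so far.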
This brings us to the last section, Actuals. After the deployment is created and you acquire the actuals, you can click the Add Data link to upload them. Uploading actuals takes a few more steps, which are covered later in this guide.
All that’s left to do now is to give your deployment a name, click Create deployment, and indicate the level of importance for the deployment; this creates the new deployment. The deployed model is now ready to receive prediction requests, and MLOps will start tracking the predictions. To find out more about the model Importance settings, have a look at the MLOps Governance capabilities.
Let's assume you have a deployment and are actively making predictions, but now you want to see how well your model is performing. To do this, you need to upload the actual outcome data and associate it with the predictions that were made.
To track the model accuracy we need to first import the actuals data into the AI Catalog. The AI Catalog is your own dedicated storage resource and provides a centralized way to manage data sets from different data sources. We won't go into the many features it has except to say that you will upload and store your actuals data here.
To do so, click the AI Catalog tab at the top of the screen and then click Add to catalog. Select the source of the data you want to upload and then select the data; in this example we’re uploading a local file.
Navigate back to the Deployment dashboard (click the Deployments tab at the top of the screen) and select your deployment.
Now click the Settings tab to return to the page where we created the deployment; the Actuals section is now enabled. Click Add Data to locate the actuals data in the AI Catalog. From here, indicate the Actuals Response column (which holds your actual outcome results) and the Association ID column that links back to the predictions made. Optionally, indicate a column that records what action was taken given the result and a timestamp column if you want to keep track of when the actual values were obtained.
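As a rough illustration of what an actuals file with these four columns might look like, the sketch below writes one with Python's standard `csv` module. The column names (`association_id`, `actual_value`, `was_acted_on`, `timestamp`) and the helper `build_actuals_csv` are assumptions made for this example; use whatever column names your own data has and map them in the Add Data dialog.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column names; the two optional columns may be left blank.
ACTUALS_COLUMNS = ["association_id", "actual_value", "was_acted_on", "timestamp"]


def build_actuals_csv(records):
    """Serialize actuals records to CSV text, one row per prediction."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ACTUALS_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        timestamp = record.get("timestamp")
        writer.writerow(
            {
                "association_id": record["association_id"],
                "actual_value": record["actual_value"],
                "was_acted_on": record.get("was_acted_on", ""),
                "timestamp": timestamp.isoformat() if timestamp else "",
            }
        )
    return buffer.getvalue()


csv_text = build_actuals_csv(
    [
        {
            "association_id": "a-1",
            "actual_value": 1,
            "was_acted_on": True,
            "timestamp": datetime(2020, 7, 1, tzinfo=timezone.utc),
        },
        # The optional columns are omitted for this row.
        {"association_id": "a-2", "actual_value": 0},
    ]
)
```

The resulting text can be saved to a local file and added to the AI Catalog like any other dataset.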
Click the Accuracy tab and you’ll see how the predictions perform in comparison to the actual outcomes.
In the majority of cases, your models will degrade over time. The composition or type of data may change, or the way you collect and store it may change.
On the Accuracy page, we see the difference between the predictions made and the actual values; in the example shown here, the model is fairly consistently under-predicting the actuals. Most likely, the degraded predictions are a result of a change in the composition of the data.
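To show what "consistently under-predicting" means numerically, here is a small sketch that compares predictions to actuals for a regression-style target. The metrics chosen (mean error and RMSE) and the function name `accuracy_summary` are our own choices for illustration; they are not necessarily the metrics the Accuracy page displays for your deployment.

```python
import math


def accuracy_summary(pairs):
    """Summarize (prediction, actual) pairs.

    mean_error < 0 means the model under-predicts on average;
    rmse measures the typical size of the error regardless of sign.
    """
    if not pairs:
        raise ValueError("at least one (prediction, actual) pair is required")
    errors = [prediction - actual for prediction, actual in pairs]
    count = len(errors)
    return {
        "mean_error": sum(errors) / count,
        "rmse": math.sqrt(sum(e * e for e in errors) / count),
    }


# Every prediction falls short of the actual value.
summary = accuracy_summary([(90, 100), (45, 50), (70, 85)])
```

A negative `mean_error` whose magnitude is close to the `rmse`, as in this example, indicates that the errors are mostly in one direction rather than scattered around zero.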
The Data Drift page shows you how the prediction data has changed over time relative to the data you originally used to train the model. In the plot on the left, each green, yellow, or red dot represents a feature. Feature importance is on the X-axis, and the calculated severity of data drift is on the Y-axis. In the plot on the right, we see the range of values of each selected feature, with the original training data in dark blue and more recent prediction data in light blue. Looking at a few examples, we can see how the composition of the data has changed.
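One common way to put a number on drift for a single numeric feature is the Population Stability Index (PSI), which compares the binned distribution of the training data with that of recent prediction data. The sketch below is a plain-Python illustration under our own assumptions (equal-width bins taken from the training range, a small `epsilon` to avoid dividing by zero); it is not presented as the exact calculation behind the Data Drift page.

```python
import math


def population_stability_index(training, recent, bins=10, epsilon=1e-6):
    """Compute PSI between two samples of one numeric feature.

    Bins are equal-width over the training range; recent values outside
    that range are counted in the first or last bin.
    """
    if not training or not recent:
        raise ValueError("both samples must be non-empty")
    low, high = min(training), max(training)
    width = (high - low) / bins or 1.0  # guard against a constant feature

    def bin_fractions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # Replace empty bins with epsilon so the logarithm is defined.
        return [max(count / len(values), epsilon) for count in counts]

    expected = bin_fractions(training)
    actual = bin_fractions(recent)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


training_values = list(range(100))
psi_unchanged = population_stability_index(training_values, training_values)
psi_shifted = population_stability_index(
    training_values, [value + 30 for value in training_values]
)
```

An unchanged sample gives a PSI of zero, and the value grows as the recent data moves away from the training distribution. A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, though the right threshold depends on the use case.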
So inevitably you’ll want to retrain your model on the latest data and replace the model currently in the deployment with the new model. DataRobot MLOps makes this easy by providing a simple interface to swap out your model, all the while maintaining the lineage of models and all collected drift and accuracy data. And this occurs seamlessly, without any service disruption.
To replace a model, click the actions menu on the far right of the Deployment details page and select Replace model. This option is only available if you are a deployment owner. You can also select Replace model from the same menu on the deployment dashboard, on the row for your deployment.
Then simply point DataRobot to the model you want to use by uploading another model package file or referencing one in the Model Registry. DataRobot checks that the data types match and then prompts you to indicate a reason for the change, such as degradation seen in data drift. Then, just click Accept and Replace to submit the change.
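The article does not describe how the data type check works internally, but the idea can be sketched as a comparison of the features each model expects. In the hypothetical example below, each model is described by a dictionary mapping feature names to type labels; the function name, the schema format, and the type labels are all assumptions made for illustration.

```python
def find_type_mismatches(current_schema, candidate_schema):
    """List features whose type differs, or which the candidate lacks.

    An empty list means the candidate accepts the same inputs, with the
    same types, as the currently deployed model.
    """
    problems = []
    for feature, dtype in current_schema.items():
        if feature not in candidate_schema:
            problems.append(f"{feature}: missing from replacement model")
        elif candidate_schema[feature] != dtype:
            problems.append(f"{feature}: {dtype} -> {candidate_schema[feature]}")
    return problems


deployed = {"loan_amount": "numeric", "purpose": "categorical"}
retrained_ok = {"loan_amount": "numeric", "purpose": "categorical"}
retrained_bad = {"loan_amount": "text"}

compatible = find_type_mismatches(deployed, retrained_ok)
incompatible = find_type_mismatches(deployed, retrained_bad)
```

A check along these lines would let a replacement proceed only when it reports no problems, so that applications sending prediction requests do not need to change their payloads.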
If you have the governance workflow enabled, reviewers are notified that the pending change is ready for their review, and the update occurs once it has been approved. If you do not have the governance workflow enabled for model replacement, the deployment is updated immediately.
Now you’ll see the new model located in the History column on the deployment Overview page. Navigating through the Service Health, Data Drift, and Accuracy pages, you’ll find the same dropdown menu allowing you to select a version of the model you want to explore.
Look for more articles and video tutorials in the DataRobot Community.
If you’re a licensed DataRobot customer, search the in-app Platform Documentation for: Replace a deployed model, Create model packages, Using the Model Registry, Data Drift tab, Accuracy tab, Settings tab.