(Updated February 2021)
With DataRobot Machine Learning Operations (MLOps), you have a central hub to deploy, monitor, manage, and govern machine learning models in production to maximize your investments in data science teams and to manage risk and regulatory compliance.
In this article we're going to present a simple step-by-step guide to using DataRobot MLOps through a typical lifecycle. We start with a quick tour of the main pages where you will be spending most of your time as you utilize MLOps to monitor and manage your deployed models. Then, we present the steps for getting your models and data into MLOps.
We begin by uploading a model into MLOps as a Model Package into the Model Registry. Then we’ll see how to create a deployment from your model package. Next we show how to monitor incoming data for changes across all the model feature variables over time using data drift, along with assessing the model performance by comparing predictions made to actual outcomes. This includes a step to upload the actual results so that accuracy can be tracked. From there we show how to replace a model when its performance degrades. And finally, we discuss leveraging a process control framework for your model development and implementation workflows using MLOps Governance.
The Deployments dashboard is the first page you land on when you access the MLOps user interface. It presents an inventory of all of your deployments.
By deployment we are referring to the model you have deployed and made available for scoring or inference requests. The deployment is a separate entity from the model; because you access the model through the deployment, you can replace the model with a newer version without disrupting the way you request predictions. This also allows MLOps to monitor each underlying model version separately and keep track of the historical lineage of models for the deployment.
On the Deployments dashboard, across the top of the inventory, a summary of the usage and status of all active deployments is displayed, with color-coded health indicators.
Beneath the summary is an individual report for each deployment. Next to each deployment's name is its relative status across three core monitoring dimensions: Service Health, Data Drift, and Accuracy. In addition, the columns displayed can be switched to show information relevant to the governance perspective rather than prediction health. These views are referred to as “lenses.”
A few metrics on prediction activity traffic are shown as well as a menu of options available to manage the model.
To view all of this information in detail, select the deployment you want to view; you will land on the Overview tab, the first of several deployment details tabs that provide features for monitoring and managing the deployment.
The deployment Overview tab provides a model-specific summary that describes the deployment, including the information you supplied when creating the deployment and any model replacement and deployment governance activity.
The Service Health tab tracks metrics about a deployment’s ability to respond to prediction requests quickly and reliably. This helps identify any bottlenecks affecting speed and response time. It also helps you assess throughput and capacity, which is critical to proper resource provisioning in order to support good performance and latency levels.
Next is the Data Drift tab. By leveraging the training data (aka “learning data”) and prediction scoring data (aka “inference data”) added to your deployment, MLOps can assess data drift: a calculation of how the incoming prediction data differs from the data used to train the model.
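To make the idea concrete, here is a hedged sketch of one common way to quantify drift for a single feature: the Population Stability Index (PSI). DataRobot computes its own per-feature drift metric internally; the code below is an illustration of the general technique, not DataRobot's implementation, and the example values are made up.

```python
import math
from collections import Counter

def psi(training_values, scoring_values, bins):
    """Population Stability Index between two samples of a categorical
    (or pre-binned numeric) feature. Near 0 means little drift;
    larger values mean the distribution has shifted more."""
    def distribution(values):
        counts = Counter(values)
        total = len(values)
        # Tiny floor avoids log(0) when a bin is empty in one sample.
        return {b: max(counts.get(b, 0) / total, 1e-6) for b in bins}

    train = distribution(training_values)
    score = distribution(scoring_values)
    return sum((score[b] - train[b]) * math.log(score[b] / train[b])
               for b in bins)

baseline = ["cash"] * 70 + ["credit"] * 30   # training-time mix
recent = ["cash"] * 40 + ["credit"] * 60     # scoring-time mix has shifted
drift = psi(baseline, recent, ["cash", "credit"])
```

Plotting a score like this per feature, against each feature's importance, is the kind of view the Data Drift tab gives you at a glance.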
The Accuracy tab shows you how accurate the predictions are for the model. The actual outcomes for the predictions your model makes may be immediately apparent, or may take days, weeks, or even months to determine. In any case, once you have those actual results and upload them, MLOps will associate them with the predictions made and present the calculated accuracy for review and analysis. Information shown in the Accuracy tab can help identify if the model should be replaced.
The Humility tab is useful for configuring rules that enable models to recognize, in real-time, when they make uncertain predictions or receive data they haven't seen before. In these cases, you can specify actions to take for individual predictions, allowing you to have desired behaviors with rules that depend on different triggers. You use humility rules to associate trigger conditions with corresponding actions, then track humility data collected over time. Using humility rules helps to mitigate risk for model predictions in production. In addition, prediction warnings help you identify when predictions don’t match their expected result in production. If the Prediction Intervals feature is enabled, you can use it to help quantify the accuracy of predictions for regression projects.
Within the Challengers tab you can compare predictions, accuracy, and data errors of alternative models to the current production model. DataRobot will compare the other models to your “champion” model to determine if one of them would be a better fit for deployment.
Under Predictions, you’ll see Python code samples with the lines of code needed to make an API call to DataRobot to score new data (for real-time or batch predictions). In many cases, you can simply copy and paste this into your software program and, in a matter of minutes, you’re integrated and up and running with DataRobot and our API. Also under this tab, you can make batch predictions with a deployed model, configure predictions against enterprise databases (such as Microsoft SQL Server and Snowflake), or write predictions to Tableau Server. Finally, if this is an external deployment, you can configure it to be monitored with an MLOps agent.
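As a hedged sketch of the kind of snippet the Predictions tab generates, the scoring call is an HTTP request built from your deployment's endpoint and credentials. The host name, deployment ID, and all credential values below are placeholders: copy the exact, complete code from your own deployment's Predictions tab rather than this illustration.

```python
# Placeholder values -- substitute the ones shown in your Predictions tab.
API_TOKEN = "YOUR_API_TOKEN"
DATAROBOT_KEY = "YOUR_DATAROBOT_KEY"       # used on the Managed AI Cloud
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"
PREDICTION_SERVER = "https://example.datarobot.com"  # your prediction host

def build_prediction_request(csv_body: str):
    """Assemble the URL, headers, and body for a real-time scoring call."""
    url = (f"{PREDICTION_SERVER}/predApi/v1.0/deployments/"
           f"{DEPLOYMENT_ID}/predictions")
    headers = {
        "Content-Type": "text/csv; charset=UTF-8",
        "Authorization": f"Bearer {API_TOKEN}",
        "DataRobot-Key": DATAROBOT_KEY,
    }
    return url, headers, csv_body

url, headers, body = build_prediction_request("amount,region\n120.5,west\n")
# To actually score, POST it, e.g. with the requests library:
#   response = requests.post(url, headers=headers, data=body)
```

Because predictions flow through the deployment's endpoint rather than a specific model, this integration code does not change when the underlying model is later replaced.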
The last tab, Settings, provides an interface to upload and configure datasets associated with the deployment and underlying model. Notably, this allows you to add data to a deployment, set up notification settings to monitor deployments with alerts, set up a schedule for replaying predictions with challengers, and enable prediction warnings.
Creating a deployment begins with creating a model package and uploading it into the Model Registry.
The Model Registry is the central hub for all your model packages. A package contains a file, a set of files, and/or information about your model; the contents vary depending on the type of model being deployed. DataRobot MLOps is flexible enough to work with:
In all three cases, you create a model package, and once the package is in the Model Registry, from there you can create a deployment.
The Model Registry provides you with a consistent deployment, replacement, and management experience, regardless of the type of model you have deployed. If the model built is in DataRobot AutoML or AutoTS, the model package can be automatically added to the Model Registry package list when the deployment gets created from the Leaderboard; otherwise, packages are added to the package list manually.
Creating a model package is a simple process. The following procedures walk through creating a model package and deployment for each of the three model types:
A key difference is that DataRobot models and custom models have prediction requests received and processed within MLOps through an API call, while external models handle predictions in an outside environment and then those predictions are transferred back to MLOps. An MLOps Agent—the software you install that communicates from your environment back to the MLOps environment—tracks the predictions transferred to MLOps. For this reason the code sample displayed in the Predictions tab is different for the Agent software vs a DataRobot or custom model. However, in all three cases, you source a deployment from a Model Package, utilize MLOps to monitor the data drift and predictions the model makes, and manage the model just the same.
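The external-model reporting pattern can be sketched as follows. This is hypothetical and illustrative only: the field names and JSON-lines spool format below are not the agent's actual wire format; in practice you use the reporting library shipped with the agent tarball. The idea is simply that your scoring code records each prediction locally, and the agent forwards those records to MLOps.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path

def record_prediction(spool_dir, deployment_id, features, prediction):
    """Append one prediction record to a local spool file that an
    agent-like process could later forward to MLOps (illustrative)."""
    record = {
        "deploymentId": deployment_id,
        "associationId": str(uuid.uuid4()),  # lets actuals be joined later
        "timestamp": time.time(),
        "features": features,
        "prediction": prediction,
    }
    spool_path = Path(spool_dir) / "predictions.jsonl"
    with open(spool_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

spool = tempfile.mkdtemp()
rec = record_prediction(spool, "abc123", {"amount": 120.5}, 0.82)
```

Decoupling scoring from reporting in this way is what lets the external model keep serving predictions even if the link back to MLOps is temporarily unavailable.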
For a model built within DataRobot, navigate to the Leaderboard and click on the model you want to deploy. Then select Predict > Deploy. You have several deployment options available to you:
In all three cases, a model package is created in the Model Registry.
If you create a model package file, then after saving it to your file system, you upload it into the destination MLOps environment from the Model Registry.
MLOps allows you to bring your own pre-trained models into DataRobot and the MLOps environment. These models are called custom inference models; inference here means the model is implemented to service prediction requests. By uploading a custom inference model, you can specify the execution environment and which library versions are required to run and test it for readiness to accept prediction requests. Once it passes the test, you can either deploy it or add the package to the Model Registry (from where you can make any further edits and then deploy it when ready). DataRobot supports custom models built with a variety of coding languages, including Python, R, and Java.
Using custom models is beyond the scope of this document. DataRobot licensed customers can find more information in the in-app Platform Documentation, within the "Custom Model Workshop" topic.
The MLOps agent allows you to monitor and manage external models, i.e., those running outside of DataRobot MLOps. With this functionality, predictions and information from these models can be reported as part of DataRobot MLOps deployments. You can use the same model management tools to monitor accuracy, data drift, prediction distribution, latency, etc., regardless of where the model is running.
To create a model package for an external model that is monitored by the MLOps agent, navigate to Model Registry > Model Packages. Click Add New Package and select New external model package.
In the resulting dialog box, complete the fields related to the MLOps agent-monitored model from which you are retrieving statistics. (The agent software must be installed in your environment to act as a bridge between your model and the MLOps external model deployment. Complete information for setting up the agent is provided in other articles and in the documentation included with the MLOps agent tarball. If needed, search the in-app Platform Documentation for the Make Predictions tab for information about the MLOps agent tarball.)
Once the model package is in the Model Registry, you simply navigate to the menu at the far right of any model package and select Deploy.
On the following page you’ll enter the remaining information needed to track predictions and model accuracy.
The information you see in the Model section (such as name and target) has already been supplied from the contents of the model package file.
Likewise, we see in the Learning section that the data used to train the model is also already known; DataRobot stored the information from when it created the AutoML or AutoTS project.
The Inference section contains information about capturing predictions and we can see that it is only partially complete. DataRobot stores the incoming prediction data received via an API call at the URL endpoint provided. If your DataRobot platform is hosted on the Managed AI Cloud, the subdomain name will be derived from your account, and if you have an on-premise installation, your endpoint will be hosted at your domain.
Capturing the predictions allows DataRobot to assess how the nature of your incoming prediction data differs from your training data. To capture those differences, toggle on Enable data drift tracking. If you choose to track attributes for segmented analysis, DataRobot will also identify characteristics of the incoming requests, such as the permission level of the user making them or the IP address they came from.
In order to track prediction accuracy, you need to be able to associate the predictions the model makes with the actual results. Commonly, the actual outcome isn’t known until days, weeks, or months later. We refer to these actual results simply as the “actuals.” You then need an identifier to associate the predictions with the actuals. The Association ID uniquely identifies each prediction and appears in an extra column appended to the rows of the request data. When you upload the actuals dataset, you supply the Association ID and the actual value of what happened; this ties them together.
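The join that the Association ID enables can be sketched in a few lines. The column names and values below are invented for illustration; the point is that each scoring row carries a unique ID, and when the actuals arrive later they are matched back on that ID so accuracy can be computed.

```python
# Predictions recorded at scoring time, each tagged with an association ID.
predictions = [
    {"association_id": "order-1001", "prediction": 220.0},
    {"association_id": "order-1002", "prediction": 180.0},
]
# Actuals that arrive later, carrying the same IDs.
actuals = [
    {"association_id": "order-1001", "actual": 245.0},
    {"association_id": "order-1002", "actual": 205.0},
]

# Join actuals back to predictions on the association ID.
joined = {p["association_id"]: {"prediction": p["prediction"]}
          for p in predictions}
for a in actuals:
    joined[a["association_id"]]["actual"] = a["actual"]

# With matched pairs in hand, accuracy metrics follow, e.g. mean absolute error:
errors = [abs(v["actual"] - v["prediction"]) for v in joined.values()]
mae = sum(errors) / len(errors)
print(mae)  # 25.0
```

MLOps performs this matching for you once the actuals are uploaded; the sketch just shows why the ID must be unique per prediction.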
That brings us to the last section, Actuals. After the deployment is created and you acquire the actuals, you can drop the file onto the Settings > Actuals section to upload them.
All that’s left to do now is to give your deployment a name, indicate its level of importance, and click Create deployment; this creates the new deployment. The deployed model is now ready to receive prediction requests, and MLOps will start tracking the predictions. To find out more about the model importance settings, have a look at the MLOps Governance capabilities (Step 9: Governance).
As suggested above, there are some shortcuts for creating a deployment, depending on the type of model. For a DataRobot model, you can deploy it directly from the Leaderboard. For a custom model, you can deploy from the Custom Model Workshop.
You have a deployment and are making predictions, but now you want to see how well your model is performing. To do this, you need to upload the actual outcome data and associate it with the predictions that were made.
To track model accuracy, we first need to import the actuals data into DataRobot. The AI Catalog is your own dedicated storage resource and provides a centralized way to manage datasets from different data sources. We won't go into its many features, except to say that you will upload and store your actuals data here. To do so, add your local file under the Actuals section; behind the scenes, DataRobot also adds the file to the AI Catalog.
Once the actuals data is added you can specify the assigned features: Actuals Response column (which holds your actual outcome results), the Association ID column to link back to the predictions made, an optional column name to keep a record of what action was taken given the result, and an optional column name with a timestamp if you want to keep track of when the actual values were obtained.
Save the changes to the deployment. When you view the Accuracy tab, you’ll see how the predictions compare to the actual outcomes.
In the majority of cases, your models will degrade over time. The composition or type of data may change, or the way you collect and store it may change.
On the Accuracy tab, we see the difference between the predictions made and the actual values; in this example, the model fairly consistently under-predicts the actuals. Most likely, the degraded predictions are the result of a change in the composition of the data.
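A systematic under-prediction pattern like the one described shows up in the signed residuals: if the mean of (actual − predicted) stays positive over time, the model is consistently predicting low. The values below are made up purely to illustrate the check.

```python
pairs = [  # (actual, predicted) -- invented example values
    (245.0, 220.0),
    (205.0, 180.0),
    (310.0, 280.0),
    (150.0, 148.0),
]
# Mean signed residual: positive means under-prediction on average,
# negative means over-prediction.
bias = sum(actual - predicted for actual, predicted in pairs) / len(pairs)
print(bias)  # 20.5
```

A metric like mean absolute error would hide the direction of the miss, which is why the signed version is the useful one for spotting a one-sided bias.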
The Data Drift tab shows you how the prediction data changes over time from the data you originally used to train the model. In the plot on the left, each green, yellow, or red dot represents a feature. The degree of feature importance is shown on the X-axis, and a calculation of the severity of data drift is on the Y-axis. In the plot on the right, we see the range of values of each selected feature, with the original training data in dark blue and more recent prediction data in light blue. Looking at a few examples, we can see how the composition of the data has changed.
So inevitably you’ll want to retrain your model on the latest data and replace the model currently in the deployment with the new model. DataRobot MLOps makes this easy by providing a simple interface to swap out your model, all the while maintaining the lineage of models and all collected drift and accuracy data. And this occurs seamlessly, without any service disruption.
To replace a model, select Replace model from the actions menu at the far right of the row for your deployment on the Deployments dashboard, or from the same menu within the deployment itself. (You can only replace a model if you are the deployment owner.)
Then, simply point DataRobot to the model you want to use: a local .mlpkg file, a model from the Model Registry, or a model pasted from an AutoML URL. DataRobot will check that the data types and features match and then prompt you to indicate a reason for the change, such as degradation seen in data drift. After specifying the reason, just click Accept and replace to submit the change.
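The compatibility check that runs before a swap can be pictured as a comparison of the two models' feature schemas. This is a simplified, hypothetical sketch of that idea (feature names and types only), not DataRobot's actual validation logic.

```python
def compatible(current_features, candidate_features):
    """Compare two feature-name -> type mappings; return (ok, problems)."""
    problems = []
    for name, ftype in current_features.items():
        if name not in candidate_features:
            problems.append(f"missing feature: {name}")
        elif candidate_features[name] != ftype:
            problems.append(f"type mismatch for {name}: "
                            f"{ftype} vs {candidate_features[name]}")
    return (not problems, problems)

# A candidate that changed a feature's type fails the check.
ok, problems = compatible(
    {"amount": "numeric", "region": "categorical"},
    {"amount": "numeric", "region": "text"},
)
```

Because the deployment's endpoint and monitoring history survive the swap, this schema match is what guarantees that existing prediction requests keep working against the replacement model.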
With the governance workflow enabled, reviewers will be notified that the pending change is ready for their review, and the update will occur once it has been approved. If the governance workflow is not enabled for model replacement, the update takes effect immediately.
And you can see a history of the changes to this deployment, including replacing the model:
Now you’ll see the new model located in the History column on the deployment Overview tab. Navigating through the Service Health, Data Drift, and Accuracy tabs, you’ll find the same dropdown menu allowing you to select a version of the model you want to explore.
All machine learning models tend to degrade over time. While DataRobot does monitor your deployment in real-time, you can always check on it to review the model health. To further assist you, DataRobot provides automated monitoring with a notification system. You can configure notifications to alert you when the service health, incoming prediction data, or model accuracy exceed your defined acceptable levels.
To configure the conditions for notifications, navigate to the Deployment > Settings tab, and select Notifications.
You have three options for controlling notification delivery via email:
The Monitoring tab is where you set exactly which values trigger the notifications. Users who have the role of “Owner” can modify these settings; however, any user with whom the deployment has been shared can configure the level of notifications they want to receive, as shown on the Notifications tab. A user who isn’t an owner of the deployment can still view the same settings information.
Monitoring is available for Service Health, Data Drift, and Accuracy. The checkbox enables notification delivery at regularly scheduled intervals, ranging from as often as hourly (for service health) to as infrequently as once a quarter (available for all three performance monitors).
MLOps governance provides your organization with a rights management framework for your model development workflow and process. Certain users are designated to review and approve events related to your deployments. The types of controllable events include creating or deleting deployments, and replacing the underlying model in a deployment.
With the governance approval workflow enabled, before you deploy a model you’re prompted to assign an importance level to it: Critical, High, Moderate, or Low. The importance level helps you prioritize your deployments and how you manage them. How you specify importance for a deployment depends on the factors that drive the business value of where and how you’re applying the model. Typically this reflects a combination of factors such as prediction volume, potential financial impact, or regulatory exposure.
Once the deployment is created, reviewers are alerted via email that it requires review. Reviewers are users who are assigned the role of an MLOps deployment administrator; approving deployments is one of their primary functions. While awaiting review, the deployment will be flagged as “NEEDS APPROVAL” in the Deployments dashboard. When reviewers access a deployment that needs approval, they will see a notification and be prompted to begin the review process.
DataRobot’s MLOps platform provides you with one place to manage all your production models, regardless of where they are created or deployed. You can now deliver the value of AI by simplifying the deployment and management of models from multiple machine learning platforms in production. This allows you to proactively manage production models to prevent production issues, ensuring both model trust and performance. This includes live model health monitoring with real-time dashboards, automated monitoring alerts on data deviations, and key model metrics.
When your model is found to have degraded, MLOps model replacement makes your models “hot-swappable” to streamline the model update process without interrupting existing business processes. And with Governance applied, you can safely scale AI projects and maintain control over production models to minimize risk and comply with regulations.