With DataRobot Machine Learning Operations (MLOps), you have a central hub to deploy, monitor, manage, and govern machine learning models in production to maximize your investments in data science teams and to manage risk and regulatory compliance.
In this article we're going to present a simple step-by-step guide to using DataRobot MLOps through a typical lifecycle. We start with a quick tour of the main pages where you will be spending most of your time as you utilize MLOps to monitor and manage your deployed models. Then, we present the steps for getting your models and data into MLOps.
We begin by uploading a model into MLOps as a model package in the Model Registry. Then we’ll see how to create a deployment from your model package. Next, we show how to monitor incoming data for changes across all the model’s feature variables over time using data drift, and how to assess model performance by comparing the predictions made to actual outcomes; this includes a step to upload the actual results so that accuracy can be tracked. From there, we show how to replace a model when its performance degrades. Finally, we discuss leveraging a process control framework for your model development and implementation workflows using MLOps Governance.
The Deployments dashboard is the first page you land on when you access the MLOps user interface. It presents an inventory of all of your deployments.
By deployment we are referring to the model you have deployed that is available for scoring or inference requests. The deployment is a separate entity from the model: because you access the model through the deployment, you can replace the model with a newer version without disrupting the way you request predictions. This also allows MLOps to monitor each underlying model version separately and keep track of the historical lineage of models for the deployment.
On the Deployments dashboard, across the top of the inventory, a summary of the usage and status of all active deployments is displayed, with color-coded health indicators.
Beneath the summary is an individual report for each deployment. Next to the name of each deployment is its relative status across three core monitoring dimensions: Service Health, Data Drift, and Accuracy. In addition, the columns displayed can be switched to show governance-related information instead of prediction health information. These views are referred to as “lenses.”
A few metrics on prediction activity traffic are shown as well as a menu of options available to manage the model.
To view all of this information in detail, select the deployment you want to view; you will land on the Overview page, the first of several deployment details pages that provide features for monitoring and managing the deployment.
The deployment Overview page provides a model-specific summary that describes the deployment, including the information you supplied when creating the deployment and any model replacement activity.
The Service Health tab tracks metrics about a deployment’s ability to respond to prediction requests quickly and reliably. This helps identify any bottlenecks affecting speed and response time. It also helps you assess throughput and capacity, which is critical to proper resource provisioning in order to support good performance and latency levels.
Next is the Data Drift page. By leveraging the training data (aka “learning data”) and prediction scoring data (aka “inference data”) added to your deployment, MLOps can assess data drift: a calculation of how the incoming prediction data differs from the data used to train the model.
The Accuracy page shows you how accurate the model’s predictions are. The actual outcomes of the predictions your models make may be apparent immediately, or may take days, weeks, or even months to determine. In either case, once you have those actual results and upload them, MLOps will associate them with the predictions made and present the calculated accuracy for review and analysis.
Under Integrations, you’ll see a Python code sample containing the lines of code needed to make an API call to DataRobot to score new data. In many cases, you can simply copy and paste this into your software program, and in a matter of minutes you’re integrated and up and running with DataRobot and our API.
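As a minimal sketch of what that integration looks like, the snippet below assembles a scoring request. The endpoint host, deployment ID, and keys are placeholders; copy the real values from your deployment’s Integrations tab.

```python
import json
import urllib.request

def build_prediction_request(base_url, deployment_id, api_key, datarobot_key, rows):
    """Assemble a scoring request; urllib.request.urlopen(req) would send it."""
    url = "{}/predApi/v1.0/deployments/{}/predictions".format(base_url, deployment_id)
    req = urllib.request.Request(url, data=json.dumps(rows).encode("utf-8"), method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", "Bearer {}".format(api_key))
    # On the Managed AI Cloud, a DataRobot-Key header identifies your
    # dedicated prediction server; an on-premise install may not need it.
    req.add_header("DataRobot-Key", datarobot_key)
    return req

# Placeholder values -- substitute the ones shown on your Integrations page.
req = build_prediction_request(
    "https://example.datarobot.com", "DEPLOYMENT_ID",
    "YOUR_API_KEY", "YOUR_DATAROBOT_KEY",
    [{"feature_a": 1.2, "feature_b": "red"}],
)
```

The Integrations page generates the equivalent snippet with your real endpoint, deployment ID, and keys already filled in.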
And lastly, Settings provides an interface to upload and configure datasets associated with the deployment and underlying model. Namely, this allows you to add data to a deployment, set up notification settings to monitor deployments with alerts, and enable prediction warnings.
Creating a deployment begins with creating a model package and uploading it into the Model Registry.
The Model Registry is the central hub for all your model packages. A package contains a file, a set of files, and/or information about your model; this varies depending on the type of model being deployed. DataRobot MLOps is flexible enough to work with three types of models: models built within DataRobot (AutoML or AutoTS), custom inference models you upload, and external models running outside of DataRobot.
In all three cases, you create a model package, and once the package is in the Model Registry, you can create a deployment from it.
The Model Registry provides you with a consistent deployment, replacement, and management experience, regardless of the type of model you have deployed. If the model was built in DataRobot AutoML or AutoTS, the model package can be added to the Model Registry automatically when the deployment is created from the Leaderboard; otherwise, packages are added to the package list manually.
Creating a model package is a simple process. The following procedures walk through creating a model package and deployment for each of the three model types:
A key difference is that for DataRobot models and custom models, prediction requests are received and processed within MLOps through an API call, while external models handle predictions in an outside environment and those predictions are then transferred back to MLOps. An MLOps agent (the software you install that communicates from your environment back to the MLOps environment) tracks the predictions transferred to MLOps. For this reason, the code sample displayed on the Integrations page differs for the agent software versus a DataRobot or custom model. In all three cases, however, you source the deployment from a model package, use MLOps to monitor data drift and the predictions the model makes, and manage the model just the same.
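To make the external-model flow concrete, the sketch below builds the kind of monitoring record an agent forwards to MLOps: which deployment and model the predictions belong to, how many predictions were made, and how long scoring took. The field names here are purely illustrative, not the agent’s actual wire format; the agent library and its spool configuration handle that for you.

```python
import json
import time

def build_monitoring_record(deployment_id, model_id, predictions, scoring_ms):
    """Illustrative record an MLOps agent might forward for an external model.

    Field names are hypothetical; the real agent serializes its own format.
    """
    return {
        "deploymentId": deployment_id,
        "modelId": model_id,
        "timestamp": int(time.time()),
        "numPredictions": len(predictions),
        "executionTimeMs": scoring_ms,
        "predictions": predictions,
    }

# An external model scores rows in its own environment...
predictions = [0.82, 0.11, 0.64]
record = build_monitoring_record("DEPLOYMENT_ID", "MODEL_ID", predictions, scoring_ms=42)
# ...and the agent ships records like this back to MLOps for monitoring.
payload = json.dumps(record)
```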
For a model built within DataRobot, navigate to the Leaderboard and click on the model you want to deploy. Then select Predict > Deploy. You have three deployment options available to you:
In all three cases, a model package is created in the Model Registry.
If you use option 1 and create a model package, then after you save the model package file to your file system, you upload it into the destination MLOps environment from the Model Registry.
MLOps allows you to bring your own pre-trained models into DataRobot and the MLOps environment. These models are called custom inference models; inference here means the model is implemented to service prediction requests. By uploading a custom inference model, you can specify the execution environment and which library versions are required to run and test it for readiness to accept prediction requests. Once it passes the test, you can either deploy it or add the package to the Model Registry (from where you can make any further edits and then deploy it when ready). DataRobot supports custom models built with a variety of coding languages, including Python, Scala, and Java.
Using custom models is beyond the scope of this document. DataRobot licensed customers can find more information in the in-app Platform Documentation, within the "Custom Model Workshop" section.
The MLOps agent allows you to monitor and manage external models, i.e., those running outside of DataRobot MLOps. With this functionality, predictions and information from these models can be reported as part of DataRobot MLOps deployments. You can use the same model management tools to monitor accuracy, data drift, prediction distribution, latency, etc., regardless of where the model is running.
To create a model package for an external model that is monitored by the MLOps agent, navigate to Model Registry > Model Packages. Click Add New Package and select New external model package.
In the resulting dialog box, complete the fields pertaining to the MLOps agent-monitored model from which you are retrieving statistics. (The agent software must be installed in your environment to act as a bridge between your model and the MLOps external model deployment. Complete information for setting up the agent is provided in other articles and in the documentation included with the MLOps agent tarball. If needed, see the Integrations tab or search the in-app Platform Documentation for information about the MLOps agent tarball.)
Once the model package is in the Model Registry, you simply navigate to the menu at the far right of any model package and select Deploy.
On the following page you’ll enter the remaining information needed to track predictions and model accuracy.
The information you see in the Model section (such as name and target) has already been supplied from the contents of the model package file.
Likewise, we see in the Learning section that the data used to train the model is also already known; DataRobot stored the information from when it created the AutoML or AutoTS project.
The Inference section contains information about capturing predictions and we can see that it is only partially complete. DataRobot stores the incoming prediction data received via an API call at the URL endpoint provided. If your DataRobot instance is hosted on the Managed AI Cloud, the subdomain name will be derived from your account, and if you have an on-premise installation, your endpoint will be hosted at your domain.
Capturing the predictions allows DataRobot to assess how the nature of your incoming prediction data differs from your training data. To capture those differences, click Enable data drift tracking. Enabling segment analysis allows DataRobot to identify characteristics of the incoming data, such as the permission level of the user making the requests or the IP address the request came from.
If you want to track prediction accuracy, however, you need to be able to associate the predictions the model makes with the actual results. Commonly, the actual outcome isn’t known for days, weeks, or months. We refer to these actual results simply as the “actuals.” To associate predictions with actuals, you need an identifier: the Association ID uniquely identifies each prediction and appears in an extra column appended to the rows of the request data. When you upload the actuals dataset, you supply the Association ID alongside the actual value of what happened, and this ties them together.
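A minimal sketch of that association (the column names and IDs below are hypothetical): each scoring row carries a unique ID, the actuals obtained later carry the same ID next to the observed outcome, and joining the two yields an accuracy measure.

```python
# Predictions made at scoring time, each tagged with a unique association ID.
predictions = [
    {"association_id": "txn-001", "prediction": 120.0},
    {"association_id": "txn-002", "prediction": 95.5},
]

# Actuals obtained days or weeks later, keyed by the same IDs.
actuals = [
    {"association_id": "txn-001", "actual_value": 130.0},
    {"association_id": "txn-002", "actual_value": 96.0},
]

# Joining on the association ID pairs each prediction with its outcome;
# mean absolute error is shown here as one simple accuracy metric.
by_id = {row["association_id"]: row["prediction"] for row in predictions}
errors = [abs(by_id[a["association_id"]] - a["actual_value"]) for a in actuals]
mae = sum(errors) / len(errors)  # (10.0 + 0.5) / 2 = 5.25
```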
This brings us to the last section, Actuals. After the deployment is created and you acquire the actuals, you can click the Add Data link to upload them; this takes just a few more steps, covered below under Uploading Actuals.
All that’s left to do now is give your deployment a name, indicate its level of importance, and click Create deployment; this creates the new deployment. The deployed model is now ready to receive prediction requests, and MLOps will start tracking the predictions. To find out more about the Importance setting, have a look at the MLOps Governance capabilities (Step 9: Governance).
As suggested above, there are some shortcuts for creating a deployment, depending on the type of model. For a DataRobot model, you can deploy it directly from the Leaderboard. For a custom model, you can deploy from the Custom Model Workshop.
You have a deployment and are making predictions, but now you want to see how well your model is performing. To do this, you need to upload the actual outcome data and associate it with the predictions that were made.
To track the model accuracy we need to first import the actuals data into the AI Catalog. The AI Catalog is your own dedicated storage resource and provides a centralized way to manage data sets from different data sources. We won't go into the many features it has, except to say that you will upload and store your actuals data here. To do so, select AI Catalog and click Add to Catalog. Then, select the source of your data to upload it, which in this case is a local file.
Navigate back to the Deployments dashboard and select your deployment. Now, to return to the previous page where we created the deployment, click the Settings menu item, and we see the Actuals section is now enabled.
Click Add Data to locate the actuals data from the AI Catalog. From here you specify the following: the Actuals Response column (which holds your actual outcome results), the Association ID column to link back to the predictions made, an optional column name to keep a record of what action was taken given the result, and an optional column name with a timestamp if you want to keep track of when the actual values were obtained.
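For example, an actuals file with the columns described above might be assembled like this (the column names are hypothetical; you tell DataRobot which column is which in the dialog):

```python
import csv
import io

# Hypothetical column names: the actual outcome, the association ID that
# links back to the prediction, and the optional acted-on and timestamp columns.
rows = [
    {"association_id": "txn-001", "actual_value": 130.0,
     "was_acted_on": "yes", "timestamp": "2021-06-01T12:00:00Z"},
    {"association_id": "txn-002", "actual_value": 96.0,
     "was_acted_on": "no", "timestamp": "2021-06-02T09:30:00Z"},
]

# Write the dataset as CSV, ready to upload to the AI Catalog.
buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["association_id", "actual_value", "was_acted_on", "timestamp"]
)
writer.writeheader()
writer.writerows(rows)
actuals_csv = buf.getvalue()
```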
Click Upload when you’re finished specifying this information. Click the Accuracy tab and you’ll see how the predictions perform in comparison to the actual outcomes.
As described earlier, Service Health tracks metrics about a deployment’s ability to respond to prediction requests quickly and reliably; review it regularly to identify bottlenecks affecting speed and response time, and to confirm that throughput and capacity match your resource provisioning.
In the majority of cases, your models will degrade over time. The composition or type of data may change, or the way you collect and store it may change.
On the Accuracy page, we see the difference between the predictions made and the actual values; in the image below, the model is fairly consistently under-predicting the actuals. Most likely, the degraded predictions are the result of a change in the composition of the incoming data.
The Data Drift page shows you how the prediction data changes over time from the data you originally used to train the model. In the plot on the left, each green, yellow, or red dot represents a feature. The degree of feature importance is shown on the X-axis, and a calculation of the severity of data drift is on the Y-axis. In the plot on the right, we see the range of values of each selected feature, with the original training data in dark blue and more recent prediction data in light blue. Looking at a few examples, we can see how the composition of the data has changed.
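The drift severity on the Y-axis can be thought of as a distance between a feature’s training distribution and its recent scoring distribution. The exact metric DataRobot computes is beyond the scope of this article; as an illustration only, the Population Stability Index (PSI), a common drift measure, compares the share of values falling into each bin:

```python
import math

def psi(train_fracs, scoring_fracs, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Each argument is the fraction of values per bin (summing to 1);
    eps guards against empty bins. PSI near 0 means little drift;
    values above roughly 0.2 are commonly treated as significant.
    """
    total = 0.0
    for p, q in zip(train_fracs, scoring_fracs):
        p, q = max(p, eps), max(q, eps)
        total += (q - p) * math.log(q / p)
    return total

# Identical distributions show no drift; a shifted one scores much higher.
same = psi([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
shifted = psi([0.25, 0.25, 0.25, 0.25], [0.10, 0.20, 0.30, 0.40])
```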
So inevitably you’ll want to retrain your model on the latest data and replace the model currently in the deployment with the new model. DataRobot MLOps makes this easy by providing a simple interface to swap out your model, all the while maintaining the lineage of models and all collected drift and accuracy data. And this occurs seamlessly, without any service disruption.
To replace a model, open the actions menu at the far right of the row for your deployment on the Deployments dashboard (Deployments > Deployments List) and select Replace model. This option is only available to deployment owners.
Then, simply point DataRobot to the model you want to use by uploading another model package file or referencing one in the Model Registry. DataRobot will do a check that the data types match and then prompt you to indicate a reason for the change, such as degradation seen in data drift. Then, just click Accept and replace to submit the change.
With the governance workflow enabled, reviewers are notified that the pending change is ready for their review, and the update occurs once it has been approved. If the governance workflow is not enabled for model replacement, the update takes effect immediately.
Now you’ll see the new model located in the History column on the deployment Overview page. Navigating through the Service Health, Data Drift, and Accuracy pages, you’ll find the same dropdown menu allowing you to select a version of the model you want to explore.
All machine learning models tend to degrade over time. While DataRobot does monitor your deployment in real-time, you can always check on it to review the model health. To further assist you, DataRobot provides automated monitoring with a notification system. You can configure notifications to alert you when the service health, incoming prediction data, or model accuracy exceed your defined acceptable levels.
To configure the conditions for notifications, navigate to the Deployment Settings menu, and click Notifications.
You have three options for controlling notification delivery via email:
The Monitor tab is where you set exactly which values trigger notifications. Users with the “Owner” role can modify these settings; however, any user with whom the deployment has been shared can configure, on the Notifications tab, the level of notifications they want to receive. A user who isn’t an owner of the deployment can still view the same settings information.
Monitoring is available for Service Health, Data Drift, and Accuracy. The checkbox enables notification delivery at regularly scheduled intervals, from as often as hourly (for service health) to as infrequently as once a quarter (available for all three performance monitors).
MLOps governance provides your organization with a rights management framework for your model development workflow and process. Certain users are designated to review and approve events related to your deployments. The types of controllable events include creating or deleting deployments, and replacing the underlying model in a deployment.
With the governance approval workflow enabled, before you deploy a model you’re prompted to assign an importance level to it: Critical, High, Moderate, or Low. The importance level helps you prioritize your deployments and the way you manage them. How you specify importance for a deployment should be based on the factors that drive the business value of where and how you’re applying the model; typically this reflects a combination of factors such as prediction volume, potential financial impact, and regulatory exposure.
Once the deployment is created, reviewers are alerted via email that it requires review. Reviewers are users who are assigned the role of an MLOps deployment administrator; approving deployments is one of their primary functions. While awaiting review, the deployment will be flagged as “NEEDS APPROVAL” in the Deployments dashboard (Deployments List). When reviewers access a deployment that needs approval, they will see a notification and be prompted to begin the review process.
DataRobot’s MLOps platform provides you with one place to manage all your production models, regardless of where they are created or deployed. You can now deliver the value of AI by simplifying the deployment and management of models from multiple machine learning platforms in production. This allows you to proactively manage production models to prevent production issues, ensuring both model trust and performance. This includes live model health monitoring with real-time dashboards, automated monitoring alerts on data deviations, and key model metrics.
When your model is found to have degraded, MLOps model replacement makes your models “hot-swappable” to streamline the model update process without interrupting existing business processes. And with Governance applied, you can safely scale AI projects and maintain control over production models to minimize risk and comply with regulations.