You can find the latest information in the DataRobot public documentation. Click ? in-app to access the full platform documentation for your version of DataRobot.
(Article updated July 2020)
In this article we present a step-by-step guide to using MLOps with a model that runs in another environment. We describe registering the remote model with MLOps and then calling the MLOps agent API to send prediction data back to record it.
Figure 1. Model packages in Model Registry
MLOps agents provide monitoring and management capabilities for models that are remotely executed within your own execution environment. They collect information from your model, such as prediction output and baseline training data, and communicate it back to MLOps for analysis. You can use the same powerful tools to monitor accuracy, data drift, prediction distribution, and latency, regardless of where the model is running—with the added benefit that remote model deployments are managed within MLOps side-by-side with other supported deployment types.
Architecture and Process Flow
Figure 2. MLOps service/agent process flow
The process works as follows:
You request predictions from your models as you normally would. When the predictions are received back, you make a function call to the MLOps agent API to capture the prediction data. The MLOps agent software is currently available as Python and Java libraries.
The data you pass in the function call contains the prediction data itself, but can also include the prediction time, the number of predictions, and other metrics and deployment statistics.
The MLOps agent logs the prediction data by writing it to a buffer at a location you configure when you first set up the agent.
The agent detects that data is written to the buffer and reads it. If the agent is running as a service, it retrieves the data as soon as it’s available; otherwise, the agent retrieves prediction data when it is run manually.
The agent then reports the prediction data back to MLOps in DataRobot.
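The buffering step above can be sketched in plain Python. Note this is a minimal stand-in, not the real MLOps library: it mimics the pattern of a reporting call appending records to a spool location that the agent then drains, either continuously (service mode) or on demand (manual mode). All class and function names here are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the MLOps library's reporting call: the real
# library buffers records to a configured spool location that the agent
# watches; this sketch mimics that by appending JSON lines to a file.
class SpoolingReporter:
    def __init__(self, spool_dir):
        self.spool_file = Path(spool_dir) / "predictions.spool"

    def report_predictions_data(self, features, predictions):
        record = {"features": features, "predictions": predictions}
        with self.spool_file.open("a") as f:
            f.write(json.dumps(record) + "\n")

# An agent running as a service reads records as soon as they appear;
# run manually, it drains whatever is in the buffer at that moment.
def drain_buffer(spool_file):
    with open(spool_file) as f:
        return [json.loads(line) for line in f]

spool_dir = tempfile.mkdtemp()
reporter = SpoolingReporter(spool_dir)
reporter.report_predictions_data({"x": 1.0}, [0.87])
records = drain_buffer(reporter.spool_file)
print(records[0]["predictions"])  # [0.87]
```

The key design point this illustrates: the reporting call returns immediately after writing to the buffer, so prediction latency in your application is decoupled from the round trip to the MLOps service.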
Create a Remote Model Package
Figure 3. Create new model package
To get started, you first create a new model package and deployment in DataRobot to represent your remote model. A model package represents information about the model that MLOps will be tracking. This applies to a model built by DataRobot, a custom model uploaded into MLOps, or a remote model (as we are doing here). The model package gets registered in the Model Registry so that MLOps can process the prediction data. Navigate to the Model Registry and, under Model Packages, click Add New Package, and then New external model package.
Figure 4. Add new external model package
Give your package a name and, optionally, a description and the model's location. You should provide the learning data so that MLOps can measure how your prediction data drifts from the training data used to fit your model. Specifying a build environment lets MLOps provide a code example and download link in the right programming language. Indicate what the target variable is and whether the model is for regression or classification. If classification, also provide the positive and negative class labels and the label prediction threshold. Then click Create Package.
You will now see the new remote model package in the model registry.
Create a Deployment from the Remote Model Package
The next step is to create a deployment from the model package so that MLOps can track prediction data. To do so, select Deploy from the action menu on the right, which routes you to the Deployments page, Settings tab.
Figure 5. Create deployment for model package
Here you define the features that MLOps will monitor and keep track of. In the Inference section you can enable data drift tracking, prediction row storage, and segment analysis tracking for your deployment. Data drift compares the model’s learning data to its scoring data in order to analyze how the model’s input data changes over time. If you want to enable data drift tracking and did not upload learning data when you created the model package, you can upload that data now. Give the deployment a name, then click Create deployment.
Download and install the MLOps Agent
On the deployment Overview page, click the Integrations tab, which provides code examples in Python and Java. You can use one of these as a template to copy and paste into your workflow, fitting it in with the call to your model's predict function.
Figure 6. Example code template (shown here in Python)
Before you run the code, download the MLOps agent software archive file and install it. Alternatively, you can download the file from Developer Tools under your profile; see the External Monitoring Agent download link. The tarball contains all the MLOps components the agent needs and uses—the MLOps agent and library, some usage examples, and software documentation—as follows:
The MLOps agent operates by reporting prediction data and information to the MLOps service. It references a configuration file that specifies the location of the service and provides the API token for creating an authenticated connection.
The MLOps library provides interfaces for reporting metrics from remote models to the MLOps service, and is available in Python and Java.
The examples provide API examples for receiving predictions and reporting metrics using the MLOps library, along with scripts to start and stop the agent, get current agent status, upload a training dataset, and create files for storing environment variables.
Lastly, the docs provide HTML documentation with detailed explanations of the components and a quick start for using the MLOps agent.
Once the agent is installed, using it is a two-step process. First, start the MLOps agent service, and then call the API functions to send data back to MLOps. The examples we’re using here are for Python running inside a Jupyter notebook, but could just as easily be done from the command line, with some differences depending on the language you’re using.
To start the agent, execute the start script (start-agent.sh), which can be found in the bin directory of your agent installation:
Figure 7. Starting the MLOps agent (start-agent.sh)
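A minimal sketch of the start sequence from a terminal. The start script name comes from the article; the status and stop script names, the directory layout, and the config file path are typical of the agent tarball but are assumptions—check them against your installed version:

```shell
# From the agent install directory; the agent reads the MLOps service
# location and your API token from a config file (for example, a YAML
# file under conf/) before it can report data.
cd mlops-agent/bin

# Start the agent as a background service.
./start-agent.sh

# Hypothetical helper scripts shipped in the tarball: check the agent's
# current status, and stop it when you are finished.
./status-agent.sh
./stop-agent.sh
```

If the agent fails to start, the usual first check is that the service URL and API token in the configuration file are correct and reachable from the execution environment.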
Now let’s walk through some functions using the agent to capture prediction data. We’ll train a simple random forest classifier model to use for this example:
Figure 8. Training a model (random forest classifier)
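Since the figure shows the training code as an image, here is a minimal equivalent sketch. It uses scikit-learn's bundled iris data only so the example is self-contained; the article's own training dataset is not shown, so substitute your data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Any labeled dataset works; iris is used here only to keep the sketch
# self-contained and runnable.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# A simple random forest classifier, as in the article's example.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# These predictions are what we will later report to MLOps.
predictions = model.predict_proba(X_test)
print(model.score(X_test, y_test))
```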
We will register the model as a model package within MLOps using function calls on the MLOpsClient object to upload the data used to train the model, create the model package, deploy it, and enable data drift tracking:
Figure 9. Register the model as an MLOps model package
Next, let’s use the random forest model we just built to get some predictions and send them to MLOps. The MLOps object is initialized, report_deployment_stats is called to send the number of predictions made and the time, and then report_predictions_data is called to send the prediction data:
Figure 10. Initialize the model and send prediction information (number made and data)
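The reporting sequence just described can be sketched as follows. Because the real MLOps object comes from DataRobot's agent library (not installed here), this sketch substitutes a recording stand-in; the method names `report_deployment_stats`, `report_predictions_data`, and the final shutdown come from the article, while the stand-in class and its argument handling are assumptions:

```python
import time

# Hypothetical stand-in for the MLOps library object; it records the
# call sequence so the reporting pattern can be shown without the agent
# or its library installed.
class FakeMLOps:
    def __init__(self):
        self.calls = []

    def report_deployment_stats(self, num_predictions, execution_time_ms):
        self.calls.append("report_deployment_stats")

    def report_predictions_data(self, features_df, predictions):
        self.calls.append("report_predictions_data")

    def shutdown(self):
        self.calls.append("shutdown")

# The pattern: time the prediction call, report the stats, then the data.
mlops = FakeMLOps()
start = time.time()
predictions = [0.2, 0.9, 0.4]          # stand-in for model.predict() output
elapsed_ms = (time.time() - start) * 1000

mlops.report_deployment_stats(len(predictions), elapsed_ms)
mlops.report_predictions_data(features_df=None, predictions=predictions)
mlops.shutdown()
print(mlops.calls)
```

With the real library, these calls write to the agent's buffer and return quickly; the agent forwards the data to MLOps asynchronously.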
Then, to simulate submitting actuals, we write out a file that contains them. In reality, the actual result values for predictions usually aren't known until some time afterward; however, for the purposes of this example (and to show functionality), we'll output the actuals as a file using the test dataset found in the MLOps agent examples directory:
Figure 11. Get association IDs for predictions
Figure 12. Write actuals file to simulate submitting actuals
Note that the actuals contain an association ID; this allows MLOps to match the actuals with the predictions at the server.
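A minimal sketch of writing such an actuals file. The key idea from the article is that each row carries the association ID assigned at prediction time so MLOps can pair each actual with its prediction; the column names used here (`associationId`, `actualValue`) are typical for DataRobot actuals files but should be confirmed against your version's documentation:

```python
import csv
import io

# Each actual is keyed by the association ID used when the prediction
# was made; MLOps joins actuals to predictions on this ID at the server.
rows = [
    {"associationId": "record-001", "actualValue": 1},
    {"associationId": "record-002", "actualValue": 0},
]

# Write to an in-memory buffer here; in practice this would be a CSV
# file on disk that is later uploaded to MLOps.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["associationId", "actualValue"])
writer.writeheader()
writer.writerows(rows)
actuals_csv = buf.getvalue()
print(actuals_csv.splitlines()[0])  # associationId,actualValue
```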
Figure 13. Submit actuals
We then shut down the MLOps object:
Figure 14. Shut down the MLOps object
Lastly (presumably some time later), when the actuals are ready to be submitted, upload them by calling submit_actuals on the MLOpsClient object.