(Article updated October 2020)
This article presents a simple step-by-step guide to using your own custom, pre-trained model with DataRobot MLOps.
The MLOps suite of tools for monitoring model performance and managing the model lifecycle can be used with the custom, pre-trained models that you build in your development environment. As when managing a model built with DataRobot, you add your custom model to the Model Registry by creating a model package. A model package for a custom model has two unique pieces. In addition to providing information such as the target, the type of machine learning model, and the data used to train it, you also provide the serialized model file (such as a Python pickle file of your model) and information about the execution environment and the libraries needed to run the model.
Begin by navigating to Model Registry > Custom Model Workshop. The Models page lists all the custom models you’ve created. Click Add New Model to create a new custom model package, and then supply some information that describes it: give the model a name and indicate the target feature and the target type as either binary classification or regression. Additionally, there are optional fields to provide the programming language used to build the model, and a model description. When you’ve completed all of the desired fields, click Add Custom Model to add it to the list of models available in the Workshop.
Next, you need to add files to the new custom model package to tell MLOps how to process the prediction results. In the left pane, you upload individual files or a folder of files; which files you need depends on the language of your model and how you want it to run. You can upload local files or retrieve the files remotely from an Amazon S3 bucket or a GitHub repository.
At a minimum, you need just one file: the serialized model. You can also include a file with code for additional hooks that DataRobot uses, for example, to load the model, run preprocessing steps, or apply any transformations. This file is custom.py for Python models, or custom.R for R models. You can also include a requirements.txt file to specify which libraries are required to run your model.
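As a rough illustration of the hooks file described above, here is a minimal custom.py sketch. It assumes the hook names load_model and transform and their arguments as documented for DRUM; check the current documentation before relying on them. The file name model.pkl is a hypothetical placeholder for whatever serialized model you uploaded.

```python
# custom.py: a minimal sketch, not DataRobot's reference implementation.
import pickle
from pathlib import Path

# Hypothetical artifact name; use the name of the file you actually uploaded.
MODEL_FILENAME = "model.pkl"


def load_model(code_dir):
    """Hook assumed to receive the folder that holds the uploaded files.

    Returns the deserialized model object.
    """
    with open(Path(code_dir) / MODEL_FILENAME, "rb") as f:
        return pickle.load(f)


def transform(data, model):
    """Hook assumed to run before scoring.

    Returns the data the model should see. This placeholder passes the
    data through unchanged; put your preprocessing steps here.
    """
    return data
```

If the model needs no special loading or preprocessing, the hooks file can be left out entirely, since only the serialized model is required.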
Then, in the right pane, select an environment to use with the custom model. In the dropdown menu, you can choose between the two types of environments available: a pre-baked “Drop-in” environment or your own custom environment.
A “Drop-in” environment uses a preconfigured set of common libraries made to work with specific types of model algorithms. For example, for Python there are drop-in environments for the scikit-learn, XGBoost, and PyTorch libraries. There are also drop-in environments for R and Java. These environments are maintained and provided by MLOps in the Custom Model Workshop, and cover almost any environment you’d want to use with your custom model. However, if you have additional libraries or library versions to specify, you can add the requirements.txt file to the left pane to incorporate those. The specific libraries and versions then appear in the right pane, where you select Build Environment to build the new environment on top of the drop-in.
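One way to produce the lines for requirements.txt is to pin the versions installed in the environment where the model was trained. The sketch below uses only the Python standard library; the function name pinned_requirements and the package list are illustrative assumptions, not part of DataRobot.

```python
from importlib import metadata


def pinned_requirements(packages, lookup=metadata.version):
    """Return 'name==version' lines for the packages installed locally."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here: leave it out rather than guess a version.
            continue
    return lines


if __name__ == "__main__":
    # Hypothetical package list; replace with the libraries your model imports.
    print("\n".join(pinned_requirements(["scikit-learn", "pandas"])))
```

Pinning exact versions keeps the environment built on top of the drop-in consistent with the one used for training.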
By providing an environment separate from a custom model, MLOps can build the environment as a distinct entity for you. This allows you to reuse an environment with specific requirements defined, for any of your models that need it.
To create a new custom environment, navigate to the Environments menu and click Add New Environment. You give the new environment a name, optionally provide description text, and then upload a ZIP or TAR archive that contains the environment files. The archive file contains:
Note: You can find more specific information about customizing and configuring an environment on the MLOps GitHub repository and in the in-app Platform Documentation by searching Using environments with custom models.
When all fields are complete, click Add; the custom environment is now ready for use. Over time, you may want to add a new version of the environment, for example, to use newer versions of libraries. You can also see all the active deployments operating under the environment, and view the environment metadata.
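If you keep the environment files in a local folder, the archive for upload can be produced with a few lines of Python. This is a sketch only: the function name package_environment and the paths are hypothetical, and it says nothing about which files the archive must contain.

```python
import shutil
from pathlib import Path


def package_environment(env_dir, out_base):
    """Zip the contents of env_dir and return the path to the archive.

    out_base is the archive path without the extension. Keep it outside
    env_dir so the archive does not end up inside itself.
    """
    env_dir = Path(env_dir)
    if not env_dir.is_dir():
        raise NotADirectoryError(str(env_dir))
    # make_archive appends ".zip" to out_base and returns the full path.
    return shutil.make_archive(str(out_base), "zip", root_dir=str(env_dir))
```

For example, package_environment("my_env_files", "build/my_env") would write build/my_env.zip with the folder contents at the root of the archive.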
Returning to the custom model, let’s use the Python scikit-learn drop-in environment. Now, with the model paired with an environment, we’re ready to test it with a sample prediction dataset.
The test detects whether the model runs into any errors when making predictions; you want to make sure the model handles predictions successfully before you deploy it. Below you can see a trail of the recent tests you ran, and you can see all previous tests listed under the Test tab, along with a log file of any errors encountered.
Now let’s click Test Model, provide a test dataset of rows to make predictions for, and click Start Test.
A convenient alternative to using this test engine in the user interface is the DRUM tool (DataRobot User Model Runner). This tool allows you to test your custom model locally in your development environment and returns test results almost immediately, so you can iterate quickly. DRUM is available for Python models with a simple pip install command (https://pypi.org/project/datarobot-drum/).
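A local DRUM run can also be scripted. The sketch below assumes DRUM was installed with pip install datarobot-drum and that the drum score command accepts the --code-dir, --input, and --target-type flags, as described in the DRUM README at the time of writing; confirm with drum score --help for your installed version. The helper names and paths are hypothetical.

```python
import shutil
import subprocess


def drum_score_command(code_dir, input_csv, target_type):
    """Build the argument list for a local `drum score` run."""
    return [
        "drum", "score",
        "--code-dir", str(code_dir),   # folder with the model file and custom.py
        "--input", str(input_csv),     # CSV of rows to score
        "--target-type", target_type,  # assumed values include "regression"
    ]


def run_local_test(code_dir, input_csv, target_type="regression"):
    """Run DRUM if it is on PATH.

    Returns the process exit code, or None when DRUM is not installed.
    """
    if shutil.which("drum") is None:
        return None
    command = drum_score_command(code_dir, input_csv, target_type)
    return subprocess.run(command, check=False).returncode
```

A nonzero exit code from run_local_test suggests the model failed to load or score, which is the same class of problem the in-app test is meant to catch.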
If you want to update a model for any reason, such as the availability of new package versions, different preprocessing steps, or different hyperparameters, you can update the file contents to create a new version of the model, similar to updating an environment with a new version.
To do so, select the model from the Workshop to edit it and navigate to the Assemble tab. In the Model section, you can delete any existing files you may have in the window, or select Add Files and upload the new files or folders that you want to include.
When you update the individual contents of a model, a new minor version is created (1.1, 1.2, etc.). You can create a new major version of a model (1.0, 2.0, etc.) by selecting New version, and selecting either Copy contents of previous version (to the new version) or create empty version (and then add new files to use for the model).
You can see a list of all model versions under the Versions tab.
If you want to add learning data to the custom model (which allows you to deploy it), select the custom model and navigate to the Model Info tab, which lists attributes of the custom model.
Click Add Learning Data and a pop-up window appears, prompting you to upload the learning data used to train the model.
When you’ve added the learning data, MLOps is able to determine how new incoming predictions differ from, or drift apart from, the original training data. Optionally, you can specify a column name containing the partitioning information for your data (based on training/validation/holdout partitions). When the upload is complete, click Add Learning Data. The other information presented is the data you provided when the custom model was first created.
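To give a feel for what comparing incoming data with training data can look like, here is a sketch of the population stability index (PSI), a commonly used drift measure. The article does not state which metric MLOps computes, so treat this purely as an illustration; the function name psi and its parameters are assumptions.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two non-empty numeric samples.

    Bin edges come from the range of `expected` (the training data).
    Values of `actual` outside that range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of zero, and the value grows as the distribution of incoming data moves away from the training data.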
With your custom model now tested successfully, we’re ready to deploy it. This can be done simply by clicking the Deploy link in the middle of the screen. Alternatively, you can click View Registry Package if you just want to review the model package for the custom model but not yet deploy it, for example, if you have governance in place for deployment review and approval.
By clicking Deploy, we’re taken to the deployment information page, where some information for the custom model is automatically provided from when it was created. The items on this page are described fully in our other content on the Deployment Details, but in summary, from here you complete the rest of the information needed to deploy the model:
When you’ve added all the available data and your model is fully defined, your deployment is ready to be created. Give the deployment a name at the top of the screen and click Create deployment. Note that by creating the deployment, a model package is created and will appear under the Model Packages tab in the Model Registry.
You can view all the deployments created from this package at any time by clicking Current Deployments from the Custom Model Workshop in the Model Registry, or by navigating to Model Packages, finding your custom model package, and then clicking Deployments.
With the custom model package now in the Model Registry, if it hasn’t already been deployed—or to deploy it again as a new deployment—simply click to deploy from the menu on the far right, as can be done for any type of model.
Once deployed, you’re ready to make predictions via the API, and begin to monitor and manage the deployment with the full suite of MLOps capabilities.
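As a sketch of what a prediction request can look like, the code below builds (but does not send) an HTTP POST with CSV rows. The URL path, header set, and all argument values are assumptions for illustration; copy the exact endpoint and required headers from your own deployment, since some installations need additional headers.

```python
import urllib.request


def build_prediction_request(base_url, deployment_id, api_token, csv_bytes):
    """Build a POST request that carries CSV rows to score.

    Every argument is a placeholder supplied by the caller; nothing is sent.
    """
    url = (
        f"{base_url.rstrip('/')}/predApi/v1.0/"
        f"deployments/{deployment_id}/predictions"  # assumed path
    )
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "text/csv",
    }
    return urllib.request.Request(
        url, data=csv_bytes, headers=headers, method="POST"
    )
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body, which is expected to contain the predictions.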
Community: Introducing DRUM
If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Creating custom inference models and Using environments with custom models.