Machine learning models trained and deployed outside DataRobot can be monitored with the DataRobot MLOps agent (referred to here as "the agent"), which is included with the DataRobot MLOps package. The agent is a Java utility that runs in parallel with the deployed model and can monitor models developed in Java, Python, and R.
The agent communicates with the model through a spooler (e.g., the file system, GCP Pub/Sub, AWS SQS, or RabbitMQ) and sends model statistics (number of scored records, number of features, scoring time, data drift, and so on) back to the MLOps dashboard. The agent can be embedded into a Docker image and deployed on a Kubernetes cluster for scalability and robustness.
This article walks through an example of deploying the MLOps agent on Google Kubernetes Engine (GKE) with Pub/Sub as the spooler to monitor a custom Python model developed outside DataRobot. The custom model is scored on a local machine and sends its statistics to GCP Pub/Sub. Finally, the agent (deployed on GKE) consumes this data and forwards it to the DataRobot MLOps dashboard.
First, create an external deployment in the DataRobot platform. If you are not familiar with the process, see step 8 in this article. You will use the resulting model ID and deployment ID to configure communication with the agent, as explained in "Step 4. Run Docker container locally."
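If you prefer to script this step, below is a minimal sketch of creating the external model package and deployment through DataRobot's REST API. The endpoint paths and payload fields are assumptions based on the v2 API for external models; verify them against your DataRobot version before relying on them.

# Sketch only: endpoints and payload fields are assumptions; verify against
# your DataRobot version.
import requests

API_URL = "https://app.datarobot.com/api/v2"  # adjust for your installation
HEADERS = {"Authorization": "Bearer YOUR-DR-API-TOKEN"}

# Register an external model package (payload fields are illustrative)
resp = requests.post(
    f"{API_URL}/modelPackages/fromJSON/",
    headers=HEADERS,
    json={
        "name": "custom-python-model",
        "target": {"type": "Binary", "name": "target", "classNames": ["yes", "no"]},
    },
)
resp.raise_for_status()
model_package = resp.json()

# Create the external deployment from the model package
resp = requests.post(
    f"{API_URL}/deployments/fromModelPackage/",
    headers=HEADERS,
    json={"modelPackageId": model_package["id"], "label": "custom-python-model"},
)
resp.raise_for_status()
print(resp.json())  # note the deployment ID and model ID for the steps below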
At this step, we create a Docker image that embeds the agent. Start with the agent configuration file, mlops.agent.conf.yaml, filling in your MLOps URL, API token, and Pub/Sub channel settings:
mlopsUrl: "MLOPS-URL"
apiToken: "YOUR-DR-API-TOKEN"
channelConfigs:
  - type: "PUBSUB_SPOOL"
    details: {name: "pubsub", projectId: "YOUR-GOOGLE-PROJECT-ID", topicName: "YOUR-PUBSUB-TOPIC-ID-DEFINED-AT-STEP-2"}
Next, create a Dockerfile that copies the agent JAR, its configuration, and an entrypoint script into the image:
FROM openjdk:8
ENV AGENT_BASE_LOC=/opt/datarobot/ma
ENV AGENT_LOG_PROPERTIES=stdout.mlops.log4j2.properties
ENV AGENT_CONF_LOC=$AGENT_BASE_LOC/conf/mlops.agent.conf.yaml
COPY mlops-agent-*.jar ${AGENT_BASE_LOC}/mlops-agent.jar
COPY conf $AGENT_BASE_LOC/conf
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]
The entrypoint.sh script simply starts the agent:
#!/bin/sh
echo "######## STARTING MLOPS-AGENT ########"
echo
exec java -Dlog.file=$AGENT_BASE_LOC/logs/mlops.agent.log -Dlog4j.configurationFile=file:$AGENT_BASE_LOC/conf/$AGENT_LOG_PROPERTIES -cp $AGENT_BASE_LOC/mlops-agent.jar com.datarobot.mlops.agent.Agent --config $AGENT_CONF_LOC
Build the Docker image, substituting your own GCP project ID:
export PROJECT_ID=ai-XXXXXXX-111111
docker build -t gcr.io/${PROJECT_ID}/monitoring-agents:v1 .
This step is often considered optional, but our advice is to always test your image locally to save time and network bandwidth. To send statistics from the custom Python model back to MLOps, the corresponding Python library (along with the Java and R libraries) is provided in the monitoring agent tarball downloaded at Step 3; you can find them in the lib directory.
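For example, the Python library can be installed straight from the tarball's lib directory; the exact wheel filename below is illustrative and varies by MLOps version:

pip install lib/datarobot_mlops-*.whl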
docker run -it --rm --name ma \
  -v /path-to-your-directory/mlops.agent.conf.yaml:/opt/datarobot/ma/conf/mlops.agent.conf.yaml \
  -v /path-to-your-directory/your-google-application-credentials.json:/opt/datarobot/ma/conf/gac.json \
  -e GOOGLE_APPLICATION_CREDENTIALS="/opt/datarobot/ma/conf/gac.json" \
  gcr.io/ai-XXXXX-111111/monitoring-agents:v1
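To confirm the agent came up cleanly, tail the container logs (ma is the container name given above):

docker logs -f ma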
Here is an example of the Python code where your model is scored:
import time
import pandas as pd
from datarobot.mlops.mlops import MLOps

DEPLOYMENT_ID = "EXTERNAL-DEPLOYMENT-ID-DEFINED-AT-STEP-1"
MODEL_ID = "EXTERNAL-MODEL-ID-DEFINED-AT-STEP-1"
PROJECT_ID = "YOUR-GOOGLE-PROJECT-ID"
TOPIC_ID = "YOUR-PUBSUB-TOPIC-ID-DEFINED-AT-STEP-2"
# MLOPS: initialize the MLOps instance
mlops = MLOps() \
    .set_deployment_id(DEPLOYMENT_ID) \
    .set_model_id(MODEL_ID) \
    .set_pubsub_spooler(PROJECT_ID, TOPIC_ID) \
    .init()
# Read your custom model pickle file (model has been trained outside DataRobot)
model = pd.read_pickle('custom_model.pickle')
# Read scoring data
features_df_scoring = pd.read_csv('features.csv')
# Get predictions
start_time = time.time()
predictions = model.predict_proba(features_df_scoring)
predictions = predictions.tolist()
num_predictions = len(predictions)
end_time = time.time()
# MLOPS: report the number of predictions in the request and the execution time
mlops.report_deployment_stats(num_predictions, end_time - start_time)
# MLOPS: report the features and predictions
mlops.report_predictions_data(features_df=features_df_scoring, predictions=predictions)
# MLOPS: release MLOps resources when finished
mlops.shutdown()
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="your-google-application-credentials.json"
Note: You will need the credentials JSON file downloaded in the prerequisites ("Step 3. Download Google Cloud account credentials").
Score your data locally to test that the model works as expected:
python score-your-model.py
A new record appears in the monitoring agent log:
The statistics in the MLOps dashboard are updated as well:
Once the container image has been tested and validated locally, upload it to a registry so that your Google Kubernetes Engine (GKE) cluster can download and run it.
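For Container Registry, this means configuring Docker to authenticate through gcloud and pushing the image built earlier:

gcloud auth configure-docker
docker push gcr.io/${PROJECT_ID}/monitoring-agents:v1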
Now that the Docker image is stored in Container Registry, you need to create a GKE cluster. First, set your default project and compute zone:
gcloud config set project $PROJECT_ID
gcloud config set compute/zone europe-west1-b
Create a cluster.
Note: For simplicity, we create a private cluster with unrestricted access to the public endpoint. In a production environment, you should restrict access to the control plane for obvious security reasons. Detailed information about configuring the different types of GKE private clusters can be found here.
gcloud container clusters create monitoring-agents-cluster \
    --network default \
    --create-subnetwork name=my-subnet-0 \
    --no-enable-master-authorized-networks \
    --enable-ip-alias \
    --enable-private-nodes \
    --master-ipv4-cidr 172.16.0.32/28 \
    --no-enable-basic-auth \
    --no-issue-client-certificate
where:
--create-subnetwork name=my-subnet-0 causes GKE to automatically create a subnet named my-subnet-0.
--no-enable-master-authorized-networks disables authorized networks for the cluster.
--enable-ip-alias makes the cluster VPC-native.
--enable-private-nodes indicates that the cluster's nodes do not have external IP addresses.
--master-ipv4-cidr 172.16.0.32/28 specifies an internal address range for the control plane. This setting is permanent for this cluster.
--no-enable-basic-auth disables basic authentication for the cluster.
--no-issue-client-certificate disables issuing a client certificate.
The command finishes with output similar to the following:
Run the following command to see the cluster worker instances:
gcloud compute instances list
The MLOps agent running on the GKE private cluster must be able to reach the DataRobot MLOps service. To make that possible, give the private nodes outbound access to the internet through a Cloud NAT router. The official Google documentation is here.
gcloud compute routers create nat-router \
    --network default \
    --region europe-west1

gcloud compute routers nats create nat-config \
    --router-region europe-west1 \
    --router nat-router \
    --nat-all-subnet-ip-ranges \
    --auto-allocate-nat-external-ips
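Before moving on, you can verify that the NAT gateway is active by querying the router status (names as created above):

gcloud compute routers get-status nat-router --region europe-west1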
Now we can create Kubernetes ConfigMaps to hold the MLOps agent configuration and the Google credentials.
Note: For production use, store these files in Kubernetes Secrets instead.
Note: You will need the credentials JSON file downloaded as part of the prerequisites ("Step 3. Download Google Cloud account credentials").
kubectl create configmap ma-configmap --from-file=mlops.agent.conf.yaml=your-path/mlops.agent.conf.yaml
kubectl create configmap gac-configmap --from-file=gac.json=your-google-application-credentials.json
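For reference, the equivalent Secrets could be created as follows (the secret names are illustrative); you would then mount them in the Deployment below with secret volumes instead of configMap volumes:

kubectl create secret generic ma-secret --from-file=mlops.agent.conf.yaml=your-path/mlops.agent.conf.yaml
kubectl create secret generic gac-secret --from-file=gac.json=your-google-application-credentials.json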
Next, define a Kubernetes Deployment that runs three replicas of the agent image and mounts both ConfigMaps:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ma-deployment
  labels:
    app: ma
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ma
  template:
    metadata:
      labels:
        app: ma
    spec:
      containers:
      - name: ma
        image: gcr.io/ai-XXXXXXX-11111111/monitoring-agents:v1
        volumeMounts:
        - name: agent-conf-volume
          mountPath: /opt/datarobot/ma/conf/mlops.agent.conf.yaml
          subPath: mlops.agent.conf.yaml
        - name: gac-conf-volume
          mountPath: /opt/datarobot/ma/conf/gac.json
          subPath: gac.json
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /opt/datarobot/ma/conf/gac.json
        ports:
        - containerPort: 80
      volumes:
      - name: agent-conf-volume
        configMap:
          name: ma-configmap
          items:
          - key: mlops.agent.conf.yaml
            path: mlops.agent.conf.yaml
      - name: gac-conf-volume
        configMap:
          name: gac-configmap
          items:
          - key: gac.json
            path: gac.json
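Save the manifest (for example, as ma-deployment.yaml; the filename is arbitrary), apply it, and check that the agent pods are running:

kubectl apply -f ma-deployment.yaml
kubectl get pods -l app=ma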
DataRobot MLOps offers the ability to monitor all of your ML models, trained in DataRobot or outside of it, in a centralized dashboard. External models can be monitored using MLOps agents, which collect statistics from models developed in Java, Python, or R.
This article demonstrated how an external Python model, trained outside DataRobot, can be monitored in the MLOps dashboard by agents deployed on GKE, using Pub/Sub as the communication channel.
DataRobot Community resources:
Community GitHub repo: MLOps agent