MLOps Agent with GKE and Pub/Sub


  • Introduction
  • Prerequisites
    • Step 1. Install Google Cloud SDK
    • Step 2. Install Kubernetes command-line tool
    • Step 3. Download Google Cloud account credentials
  • Main steps
    • Step 1. Create external deployment in MLOps
    • Step 2. Create Pub/Sub topic and subscription
    • Step 3. Create Docker image with MLOps agent
    • Step 4. Run Docker container locally
    • Step 5. Push Docker image to Container Registry
    • Step 6. Create GKE cluster
    • Step 7. Create cloud router
    • Step 8. Create K8s ConfigMaps
    • Step 9. Create K8s Deployment
    • Step 10. Score model
  • Clean up
  • Conclusion


Introduction

Machine learning models trained and deployed outside DataRobot can be monitored with the DataRobot MLOps agent (referred to here as the agent), which is included in the DataRobot MLOps package. The agent is a Java utility that runs in parallel with the deployed model and can monitor models developed in Java, Python, and R.

The model and the agent communicate through a spooler (file system, GCP Pub/Sub, AWS SQS, or RabbitMQ): the model writes its statistics (e.g., number of scored records, number of features, scoring time, data drift) to the spooler, and the agent forwards them to the MLOps dashboard. The agent can be embedded into a Docker image and deployed on a Kubernetes cluster for scalability and robustness.

This article walks through an example of deploying the MLOps agent on Google Kubernetes Engine (GKE) with Pub/Sub as the spooler to monitor a custom Python model that was developed outside DataRobot. The custom model is scored on the local machine and sends its statistics to GCP Pub/Sub. The agent (deployed on GKE) then consumes this data and sends it to the DataRobot MLOps dashboard.
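
The division of labor between the model and the agent can be sketched in a few lines of Python. This is only an illustration of the spooler pattern: an in-memory queue stands in for Pub/Sub, and the names report_stats and drain_spool are hypothetical, not part of the DataRobot library.

```python
import json
import queue

# In-memory stand-in for the spooler (Pub/Sub in this article).
spool = queue.Queue()


def report_stats(num_predictions, seconds):
    """Model side: serialize the statistics and put them on the spool."""
    message = {"num_predictions": num_predictions, "execution_time_s": seconds}
    spool.put(json.dumps(message))


def drain_spool(forward):
    """Agent side: take every queued message and hand it to `forward`."""
    forwarded = 0
    while True:
        try:
            raw = spool.get_nowait()
        except queue.Empty:
            return forwarded
        forward(json.loads(raw))
        forwarded += 1


received = []               # stands in for the MLOps dashboard
report_stats(100, 0.25)     # the model reports two scoring requests
report_stats(50, 0.1)
count = drain_spool(received.append)
```

The model never talks to the dashboard directly, so it keeps working even when the agent is temporarily unavailable; messages simply wait in the spooler.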


Prerequisites

Step 1. Install Google Cloud SDK

  1. Install the Google Cloud SDK specific to your operating system. The details can be found here.
  2. Run the following at a command prompt: gcloud init. You will be asked to choose an existing project or create a new one, and to select the compute zone.

Step 2. Install Kubernetes command-line tool

  • Install the Kubernetes command-line tool:
    gcloud components install kubectl

Step 3. Download Google Cloud account credentials

  1. Retrieve the service account credentials used to call Google Cloud APIs. If you don’t have a default service account, you can create one by following this procedure.
  2. Once the service account exists, download the JSON file containing its credentials. There are two ways to pass these credentials to an application that calls Google Cloud APIs: through the GOOGLE_APPLICATION_CREDENTIALS environment variable or in code.
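
As a rough sketch of the two options, the helper below prefers an explicitly supplied path and falls back to the environment variable, then checks that the file looks like a service account key. The function names are hypothetical, and the check only covers fields that service account key files normally contain; it does not contact Google Cloud.

```python
import json
import os

# Fields normally present in a service account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def resolve_credentials_path(explicit_path=None, environ=None):
    """Return the key file path: explicit argument first, then the env var."""
    environ = os.environ if environ is None else environ
    path = explicit_path or environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError(
            "no credentials path given and GOOGLE_APPLICATION_CREDENTIALS is unset"
        )
    return path


def check_key_file(path):
    """Load the JSON key, confirm it is a service account key, return its project."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    missing = REQUIRED_KEYS - key.keys()
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {key['type']}")
    return key["project_id"]
```

Calling check_key_file(resolve_credentials_path()) before starting the agent or the scoring script catches a missing or wrong file early.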

Main steps

Step 1. Create external deployment in MLOps

First, create an external deployment in the DataRobot platform. If you are not familiar with the process, see step 8 in this article. You will use the resulting model ID and deployment ID to configure communications with the agent as explained in "Step 4. Run Docker container locally."

Step 2. Create Pub/Sub topic and subscription

  1. Go to the Pub/Sub service in your Google Cloud console and create a topic (i.e., a named resource to which publishers send messages).

  2. Next, create a subscription (a named resource representing the stream of messages from a single, specific topic, to be delivered to the subscribing application) using that Pub/Sub topic and delivery type Pull. This provides a Subscription ID. (As needed, you can configure message retention duration or other parameters.)
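
Pub/Sub identifies topics and subscriptions by fully qualified resource names built from the project ID and the short ID you chose in the console. The helpers below are a minimal sketch of that naming scheme; the function names are illustrative and the IDs in the example are placeholders.

```python
def topic_path(project_id, topic_id):
    """Fully qualified name of a Pub/Sub topic."""
    return f"projects/{project_id}/topics/{topic_id}"


def subscription_path(project_id, subscription_id):
    """Fully qualified name of a Pub/Sub subscription."""
    return f"projects/{project_id}/subscriptions/{subscription_id}"


# Placeholder IDs, for illustration only.
example_topic = topic_path("YOUR-GOOGLE-PROJECT-ID", "YOUR-PUBSUB-TOPIC-ID")
example_subscription = subscription_path("YOUR-GOOGLE-PROJECT-ID", "YOUR-SUBSCRIPTION-ID")
```

The agent configuration in the next step takes the short IDs (projectId and topicName), not the fully qualified names.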


Step 3. Create Docker image with MLOps agent

In this step, create a Docker image that embeds the agent.

  1. Create the working directory on the machine where you will prepare the necessary files.
  2. Create the directory, conf.
  3. Download the tarball file with the MLOps agent from DataRobot Developer Tools and unzip it.

  4. Copy file from the unzipped directory/conf to your working directory/conf.
  5. Copy mlops.agent.conf.yaml file to the working directory. Make sure the following parameters are provided (and then we’ll use the defaults for all other parameters):
    • mlopsUrl—for DataRobot Managed AI Cloud, this is; use the correct URL for your installation
    • apiToken
    • projectId (GCP ProjectId)
    • topicName (defined at “Step 2. Create Pub/Sub topic and subscription”)

      mlopsUrl: "MLOPS-URL"
      apiToken: "YOUR-DR-API-TOKEN"
      channelConfigs:
        - type: "PUBSUB_SPOOL"
          details: {name: "pubsub", projectId: "YOUR-GOOGLE-PROJECT-ID", topicName: "YOUR-PUBSUB-TOPIC-ID-DEFINED-AT-STEP-2"}

  6. Copy unzipped directory/lib/mlops-agent-X.X.X.jar file to the working directory.
  7. Create the Dockerfile with the following content in the working directory:

    FROM openjdk:8
    ENV AGENT_BASE_LOC=/opt/datarobot/ma
    ENV AGENT_CONF_LOC=$AGENT_BASE_LOC/conf/mlops.agent.conf.yaml
    COPY mlops-agent-*.jar ${AGENT_BASE_LOC}/mlops-agent.jar
    COPY conf $AGENT_BASE_LOC/conf
    COPY /
    RUN chmod +x /
    ENTRYPOINT ["./"]
  8. Create file with the following content:

    echo "######## STARTING MLOPS-AGENT ########"
    exec java -Dlog.file=$AGENT_BASE_LOC/logs/mlops.agent.log -Dlog4j.configurationFile=file:$AGENT_BASE_LOC/conf/$AGENT_LOG_PROPERTIES -cp $AGENT_BASE_LOC/mlops-agent.jar com.datarobot.mlops.agent.Agent --config $AGENT_CONF_LOC
  9. Create the Docker image (don’t forget the period at the end of the docker build command).

    export PROJECT_ID=ai-XXXXXXX-111111
    docker build -t ${PROJECT_ID}/monitoring-agents:v1 .
  10. Run the docker images command to verify that the build was successful.
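
Before building the image, it can help to confirm that none of the required parameters in mlops.agent.conf.yaml is missing or still set to a placeholder. The check below is a simple sketch that assumes the values are written in double quotes, as in the snippet above; it is a text scan, not a YAML parser, and the function name is hypothetical.

```python
import re

# Parameters this article requires in mlops.agent.conf.yaml.
REQUIRED = ("mlopsUrl", "apiToken", "projectId", "topicName")

# Prefixes of the placeholder values used in this article.
PLACEHOLDER_PREFIXES = ("YOUR-", "MLOPS-URL")


def unfilled_parameters(config_text):
    """Return required parameter names that are absent, empty, or placeholders."""
    problems = []
    for name in REQUIRED:
        match = re.search(rf'\b{name}:\s*"([^"]*)"', config_text)
        if match is None:
            problems.append(name)
            continue
        value = match.group(1)
        if not value or value.startswith(PLACEHOLDER_PREFIXES):
            problems.append(name)
    return problems
```

An empty list means every required parameter has a non-placeholder value; it does not prove that the values are correct.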

Step 4. Run Docker container locally

This step is often treated as optional, but we recommend always testing your image locally to save time and network bandwidth. To send statistics from the custom Python model back to MLOps, use the corresponding library, which is provided (along with the Java and R libraries) in the lib directory of the monitoring agent tarball downloaded in Step 3.

  1. Install the DataRobot MLOps library for Python:
    pip install datarobot_mlops-6.3.3-py2.py3-none-any.whl

  2. Run your Docker container image.

    Note: You will need your JSON file with credentials downloaded in the prerequisites (Step 3. Download Google Cloud account credentials).

    docker run -it --rm --name ma \
        -v /path-to-your-directory/mlops.agent.conf.yaml:/opt/datarobot/ma/conf/mlops.agent.conf.yaml \
        -v /path-to-your-directory/your-google-application-credentials.json:/opt/datarobot/ma/conf/gac.json \
        -e GOOGLE_APPLICATION_CREDENTIALS="/opt/datarobot/ma/conf/gac.json" \
        ${PROJECT_ID}/monitoring-agents:v1

    Here is an example of the Python code that scores your model (DEPLOYMENT_ID, MODEL_ID, PROJECT_ID, and TOPIC_ID must be defined beforehand):

        import time

        import pandas as pd
        from datarobot.mlops.mlops import MLOps

        # MLOPS: initialize the MLOps instance
        mlops = MLOps() \
            .set_deployment_id(DEPLOYMENT_ID) \
            .set_model_id(MODEL_ID) \
            .set_pubsub_spooler(PROJECT_ID, TOPIC_ID) \
            .init()
        # Read your custom model pickle file (model has been trained outside DataRobot)
        model = pd.read_pickle('custom_model.pickle')
        # Read scoring data
        features_df_scoring = pd.read_csv('features.csv')
        # Get predictions
        start_time = time.time()
        predictions = model.predict_proba(features_df_scoring)
        predictions = predictions.tolist()
        num_predictions = len(predictions)
        end_time = time.time()
        # MLOPS: report the number of predictions in the request and the execution time
        mlops.report_deployment_stats(num_predictions, end_time - start_time)
        # MLOPS: report the features and predictions
        mlops.report_predictions_data(features_df=features_df_scoring, predictions=predictions)
        # MLOPS: release MLOps resources when finished
        mlops.shutdown()
  3. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable:
    export GOOGLE_APPLICATION_CREDENTIALS="your-google-application-credentials.json"

    Note: You will need your JSON file with credentials downloaded in the prerequisites, "Step 3. Download Google Cloud account credentials."

  4. Score your data locally to test whether the model works as expected. A new record should appear in the monitoring agent log, and the statistics in the MLOps dashboard should be updated as well.
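
To exercise the timing and reporting logic without a running agent, the reporting object can be swapped for a stub. The sketch below is only an illustration: StubReporter and score_and_report are hypothetical names, and the stub simply records whatever it is given, mirroring how the scoring code above passes the elapsed time.

```python
import time


class StubReporter:
    """Hypothetical stand-in that records what would be reported to MLOps."""

    def __init__(self):
        self.stats = []

    def report_deployment_stats(self, num_predictions, execution_time):
        self.stats.append((num_predictions, execution_time))


def score_and_report(predict, rows, reporter):
    """Time one scoring call, then report the row count and elapsed time."""
    start = time.perf_counter()
    predictions = predict(rows)
    elapsed = time.perf_counter() - start
    reporter.report_deployment_stats(len(predictions), elapsed)
    return predictions


reporter = StubReporter()
# A trivial "model" that returns one probability pair per input row.
result = score_and_report(lambda rows: [[0.5, 0.5] for _ in rows], [1, 2, 3], reporter)
```

Because score_and_report only relies on a report_deployment_stats method, the same function can later be given the real MLOps instance.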


Step 5. Push Docker image to Container Registry

Once the container image has been tested and validated locally, you need to upload it to a registry so that your Google Kubernetes Engine (GKE) cluster can download and run it.

  1. Configure the Docker command-line tool to authenticate to Container Registry:
    gcloud auth configure-docker

  2. Push the Docker image you built at “Step 3. Create Docker image with MLOps agent” to the Container Registry:
    docker push ${PROJECT_ID}/monitoring-agents:v1
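
Container Registry addresses images as HOSTNAME/PROJECT-ID/IMAGE:TAG, where the hostname is gcr.io or one of its regional variants. The helper below is a small sketch of that convention; the function name is hypothetical, and the default host gcr.io is an assumption rather than something this article specifies.

```python
# Hostnames served by Container Registry.
CONTAINER_REGISTRY_HOSTS = ("gcr.io", "us.gcr.io", "eu.gcr.io", "asia.gcr.io")


def image_reference(project_id, image, tag, host="gcr.io"):
    """Build a Container Registry image reference (default host is an assumption)."""
    if host not in CONTAINER_REGISTRY_HOSTS:
        raise ValueError(f"not a Container Registry hostname: {host}")
    return f"{host}/{project_id}/{image}:{tag}"


reference = image_reference("ai-XXXXXXX-111111", "monitoring-agents", "v1")
```

An image must carry such a reference as its tag before docker push can send it to the registry.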

Step 6. Create GKE cluster

Now that the Docker image is stored in the Container Registry, you need to create a GKE cluster.

  1. Set your project ID and Compute Engine zone options for the gcloud tool:

    gcloud config set project $PROJECT_ID
    gcloud config set compute/zone europe-west1-b

  2. Create a cluster.

    Note: For simplicity, this example creates a private cluster with unrestricted access to the public endpoint. In a production environment, restrict access to the control plane. Detailed information about configuring different GKE private clusters can be found here.

    gcloud container clusters create monitoring-agents-cluster \
        --network default \
        --create-subnetwork name=my-subnet-0 \
        --no-enable-master-authorized-networks \
        --enable-ip-alias \
        --enable-private-nodes \
        --master-ipv4-cidr \
        --no-enable-basic-auth \
        --no-issue-client-certificate

    --create-subnetwork name=my-subnet-0 causes GKE to automatically create a subnet named my-subnet-0.

    --no-enable-master-authorized-networks disables authorized networks for the cluster.

    --enable-ip-alias makes the cluster VPC-native.

    --enable-private-nodes indicates that the cluster's nodes do not have external IP addresses.

    --master-ipv4-cidr specifies an internal address range for the control plane. This setting is permanent for this cluster.

    --no-enable-basic-auth disables basic authentication for the cluster.

    --no-issue-client-certificate disables issuing a client certificate.

    When the command completes, gcloud prints a summary of the new cluster.


  3. Run the following command to see the cluster worker instances:
    gcloud compute instances list
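
The range passed to --master-ipv4-cidr has to be a /28 block inside private (RFC 1918) address space. The check below is a minimal sketch of those two constraints using Python's ipaddress module; the function name is hypothetical, and the ranges in the usage note are illustrative values, not the ones used for this article's cluster.

```python
import ipaddress

# The three RFC 1918 private address blocks.
RFC1918_BLOCKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_valid_control_plane_cidr(cidr):
    """Return True if `cidr` is a well-formed private IPv4 /28 network."""
    try:
        network = ipaddress.ip_network(cidr)  # rejects ranges with host bits set
    except ValueError:
        return False
    if network.version != 4 or network.prefixlen != 28:
        return False
    return any(network.subnet_of(block) for block in RFC1918_BLOCKS)
```

For example, is_valid_control_plane_cidr("172.16.0.32/28") is accepted, while "10.0.0.0/24" (wrong size) and "8.8.8.0/28" (public range) are rejected. The check does not detect overlap with subnets already in use in your VPC.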


Step 7. Create cloud router

The MLOps agent running on a GKE private cluster needs access to the DataRobot MLOps service. To provide it, give the private nodes outbound access to the internet by configuring NAT on a cloud router. The official Google documentation is here.

  1. Create a cloud router:
    gcloud compute routers create nat-router \
        --network default \
        --region europe-west1

  2. Add a NAT configuration to the router:
    gcloud compute routers nats create nat-config \
        --router-region europe-west1 \
        --router nat-router \
        --nat-all-subnet-ip-ranges \
        --auto-allocate-nat-external-ips
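
The router must be created in the region that contains the cluster's zone (europe-west1 for the zone europe-west1-b used above). Compute Engine zone names consist of the region name followed by a hyphen and a zone suffix, so the region can be derived as in this small sketch; the function name is hypothetical.

```python
def region_of(zone):
    """Derive the region from a Compute Engine zone name such as 'europe-west1-b'."""
    region, separator, suffix = zone.rpartition("-")
    if not separator or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


router_region = region_of("europe-west1-b")
```

Deriving the region from a single zone setting avoids a mismatch between the cluster's zone and the router's region.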

Step 8. Create K8s ConfigMaps

Now we can create K8s ConfigMaps that contain the MLOps agent configuration and the Google credentials.

Note: In production, use K8s Secrets to store these configuration files.

Note: You will need your JSON file with credentials downloaded as part of prerequisites, "Step 3. Download Google Cloud account credentials."

  • Create ConfigMaps:
    kubectl create configmap ma-configmap --from-file=mlops.agent.conf.yaml=your-path/mlops.agent.conf.yaml
    kubectl create configmap gac-configmap --from-file=gac.json=your-google-application-credentials.json
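
For the production case mentioned in the note, the credentials would go into a Secret instead of a ConfigMap. The sketch below builds a Secret manifest as JSON, which kubectl apply -f accepts alongside YAML. The Secret name gac-secret and the sample key content are placeholders chosen for illustration.

```python
import base64
import json


def secret_manifest(name, files):
    """Build a Kubernetes Secret manifest from a {key: bytes} mapping."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # Secret data values must be base64-encoded strings.
        "data": {
            key: base64.b64encode(content).decode("ascii")
            for key, content in files.items()
        },
    }


# Placeholder content; in practice, read the bytes of your credentials file.
manifest = secret_manifest("gac-secret", {"gac.json": b'{"type": "service_account"}'})
manifest_json = json.dumps(manifest, indent=2)
```

A Deployment that mounts such a Secret uses a secret volume source in place of the configMap source. Note that base64 is an encoding, not encryption, so access to the Secret still has to be restricted.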

Step 9. Create K8s Deployment

  1. Create the file ma-deployment.yaml with the following content. (Note that we use three always-running replicas; if you need autoscaling, please use kubectl autoscale deployment.)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ma-deployment
      labels:
        app: ma
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: ma
      template:
        metadata:
          labels:
            app: ma
        spec:
          containers:
          - name: ma
            image: "YOUR-IMAGE-PUSHED-AT-STEP-5"
            volumeMounts:
            - name: agent-conf-volume
              mountPath: /opt/datarobot/ma/conf/mlops.agent.conf.yaml
              subPath: mlops.agent.conf.yaml
            - name: gac-conf-volume
              mountPath: /opt/datarobot/ma/conf/gac.json
              subPath: gac.json
            env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /opt/datarobot/ma/conf/gac.json
            ports:
            - containerPort: 80
          volumes:
          - name: agent-conf-volume
            configMap:
              items:
              - key: mlops.agent.conf.yaml
                path: mlops.agent.conf.yaml
              name: ma-configmap
          - name: gac-conf-volume
            configMap:
              items:
              - key: gac.json
                path: gac.json
              name: gac-configmap
  2. Create deployment:
    kubectl apply -f ma-deployment.yaml

  3. Check running pods:
    kubectl get pods
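
When the check needs to be scripted, the output of kubectl get pods -o json can be inspected programmatically. The sketch below lists the pods whose phase is not Running; the function name is hypothetical, and the usage note assumes the output was saved to a file first.

```python
import json


def pods_not_running(kubectl_json):
    """Return the names of pods whose status phase is not 'Running'."""
    pods = json.loads(kubectl_json)["items"]
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") != "Running"
    ]
```

For example, after running kubectl get pods -o json > pods.json, pass the file's contents to pods_not_running; an empty list means all replicas of the agent are up.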

Step 10. Score model

  1. Score the local model once again.

  2. Check the GKE Pod log: it shows that one record has been sent to DataRobot.


  3. Check the Pub/Sub log.
  4. Check the DataRobot MLOps dashboard.

Clean up

  1. Delete the NAT configuration from the router:
    gcloud compute routers nats delete nat-config --router=nat-router --router-region=europe-west1

  2. Delete the cloud router:
    gcloud compute routers delete nat-router --region=europe-west1

  3. Delete the cluster:
    gcloud container clusters delete monitoring-agents-cluster


Conclusion

DataRobot MLOps lets you monitor all your ML models, whether trained in DataRobot or outside it, in a centralized dashboard. External models are monitored through MLOps agents, which collect statistics from ML models developed in Java, Python, or R.

This article demonstrates how an external Python model (trained outside DataRobot) can be monitored in the MLOps dashboard by agents deployed on GKE, using Pub/Sub as the communication channel.

More information

DataRobot Community resources:

Community GitHub repo: MLOps agent

Version history: revision 8 of 8, last updated 12-02-2020 04:32 PM