MLOps Agent with GKE and Pub/Sub


Outline

  • Introduction
  • Prerequisites
    • Step 1. Install Google Cloud SDK
    • Step 2. Install Kubernetes command-line tool
    • Step 3. Download Google Cloud account credentials
  • Main steps
    • Step 1. Create external deployment in MLOps
    • Step 2. Create Pub/Sub topic and subscription
    • Step 3. Create Docker image with MLOps agent
    • Step 4. Run Docker container locally
    • Step 5. Push Docker image to Container Registry
    • Step 6. Create GKE cluster
    • Step 7. Create cloud router
    • Step 8. Create K8s ConfigMaps
    • Step 9. Create K8s Deployment
    • Step 10. Score model
  • Clean up
  • Conclusion

Introduction

Machine learning models trained and deployed outside DataRobot can be monitored with the DataRobot MLOps agent (also referred to simply as "the agent"), which is included in the DataRobot MLOps package. The agent is a Java utility that runs alongside the deployed model and can monitor models developed in Java, Python, and R.

The agent communicates with the model through a spooler (e.g., the file system, GCP Pub/Sub, AWS SQS, or RabbitMQ): the model writes statistics such as the number of scored records, number of features, scoring time, and data drift, and the agent forwards them to the MLOps dashboard. The agent can be embedded in a Docker image and deployed on a Kubernetes cluster for scalability and robustness.

This article walks through an example of deploying the MLOps agent on Google Kubernetes Engine (GKE), with Pub/Sub as the spooler, to monitor a custom Python model developed outside DataRobot. The custom model is scored on the local machine and sends its statistics to GCP Pub/Sub; the agent (deployed on GKE) then consumes this data and forwards it to the DataRobot MLOps dashboard.

Prerequisites

Step 1. Install Google Cloud SDK

  1. Install the Google Cloud SDK specific to your operating system. The details can be found here.
  2. Run the following at a command prompt: gcloud init. You will be asked to choose an existing project or create a new one, and to select the compute zone.
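
To confirm the project and compute zone you selected, you can print the active gcloud configuration:

    gcloud config list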

Step 2. Install Kubernetes command-line tool

  • Install the Kubernetes command-line tool:
    gcloud components install kubectl.

Step 3. Download Google Cloud account credentials

  1. Retrieve the service account credentials used to call Google Cloud APIs. If you don't have a default service account, you can create one by following this procedure.
  2. Once it's created, download the JSON file containing your credentials. There are two ways to pass your credentials to the application that calls Google Cloud APIs: via the GOOGLE_APPLICATION_CREDENTIALS environment variable or in code.
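
For reference, a minimal sketch of the environment-variable approach (the path below is illustrative), plus an optional gcloud check of the key file:

    # Make the key visible to applications that call Google Cloud APIs
    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-google-application-credentials.json"

    # Optional sanity check: activate the key with gcloud and list credentialed accounts
    # (note that this also makes the service account the active gcloud account)
    gcloud auth activate-service-account --key-file="$GOOGLE_APPLICATION_CREDENTIALS"
    gcloud auth list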

Main steps

Step 1. Create external deployment in MLOps

First, create an external deployment in the DataRobot platform. If you are not familiar with the process, see step 8 in this article. You will use the resulting model ID and deployment ID to configure communications with the agent as explained in "Step 4. Run Docker container locally."

Step 2. Create Pub/Sub topic and subscription

  1. Go to the Pub/Sub service in your Google Cloud console and create a topic (i.e., a named resource to which publishers send messages).

  2. Next, create a subscription (a named resource representing the stream of messages from a single, specific topic, to be delivered to the subscribing application) using that Pub/Sub topic and delivery type Pull. This provides a Subscription ID. (As needed, you can configure message retention duration or other parameters.)

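If you prefer the command line, the topic and subscription can also be created with gcloud (the IDs are placeholders; pull delivery is the default for subscriptions created this way):

    gcloud pubsub topics create YOUR-PUBSUB-TOPIC-ID
    gcloud pubsub subscriptions create YOUR-PUBSUB-SUBSCRIPTION-ID \
        --topic=YOUR-PUBSUB-TOPIC-ID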

Step 3. Create Docker image with MLOps agent

In this step, you create a Docker image that embeds the agent.

  1. Create the working directory on the machine where you will prepare the necessary files.
  2. Inside it, create a conf directory.
  3. Download the tarball file with the MLOps agent from DataRobot Developer Tools and unzip it.

  4. Copy the stdout.mlops.log4j2.properties file from the conf directory of the unzipped archive to the conf directory of your working directory.
  5. Copy the mlops.agent.conf.yaml file to the working directory. Make sure the following parameters are provided (the defaults are used for all other parameters):
    • mlopsUrl: for DataRobot Managed AI Cloud, this is https://app.datarobot.com; use the correct URL for your installation
    • apiToken
    • projectId (your GCP project ID)
    • topicName (defined in "Step 2. Create Pub/Sub topic and subscription")

      mlopsUrl: "MLOPS-URL"
      apiToken: "YOUR-DR-API-TOKEN"
      channelConfigs:
        - type: "PUBSUB_SPOOL"
          details: {name: "pubsub", projectId: "YOUR-GOOGLE-PROJECT-ID", topicName: "YOUR-PUBSUB-TOPIC-ID-DEFINED-AT-STEP-2"}

  6. Copy the mlops-agent-X.X.X.jar file from the lib directory of the unzipped archive to the working directory.
  7. Create the Dockerfile with the following content in the working directory:

    FROM openjdk:8
    
    ENV AGENT_BASE_LOC=/opt/datarobot/ma
    ENV AGENT_LOG_PROPERTIES=stdout.mlops.log4j2.properties
    ENV AGENT_CONF_LOC=$AGENT_BASE_LOC/conf/mlops.agent.conf.yaml
    
    COPY mlops-agent-*.jar ${AGENT_BASE_LOC}/mlops-agent.jar
    COPY conf $AGENT_BASE_LOC/conf
    COPY entrypoint.sh /
    
    RUN chmod +x /entrypoint.sh
    
    ENTRYPOINT ["./entrypoint.sh"]
  8. Create entrypoint.sh file with the following content:

    #!/bin/sh
    
    echo "######## STARTING MLOPS-AGENT ########"
    echo
    
    exec java -Dlog.file=$AGENT_BASE_LOC/logs/mlops.agent.log -Dlog4j.configurationFile=file:$AGENT_BASE_LOC/conf/$AGENT_LOG_PROPERTIES -cp $AGENT_BASE_LOC/mlops-agent.jar com.datarobot.mlops.agent.Agent --config $AGENT_CONF_LOC
  9. Create the Docker image (don't forget the period at the end of the docker build command).

    export PROJECT_ID=ai-XXXXXXX-111111
    
    docker build -t gcr.io/${PROJECT_ID}/monitoring-agents:v1 .
  10. Run the docker images command to verify that the build was successful.
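
For example, to list just the image you built (the repository filter is optional):

    docker images "gcr.io/${PROJECT_ID}/monitoring-agents"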

Step 4. Run Docker container locally

This step is often considered optional, but we advise always testing your image locally; it saves time and network bandwidth. To send statistics from the custom Python model back to MLOps, the corresponding Python library (along with Java and R libraries) is provided in the monitoring agent tarball downloaded in Step 3. You can find them in the lib directory.

  1. Install the DataRobot MLOps library for Python:
    pip install datarobot_mlops-6.3.3-py2.py3-none-any.whl

  2. Run your Docker container image.

    Note: You will need your JSON file with credentials downloaded in the prerequisites (Step 3. Download Google Cloud account credentials).

    docker run -it --rm --name ma \
        -v /path-to-your-directory/mlops.agent.conf.yaml:/opt/datarobot/ma/conf/mlops.agent.conf.yaml \
        -v /path-to-your-directory/your-google-application-credentials.json:/opt/datarobot/ma/conf/gac.json \
        -e GOOGLE_APPLICATION_CREDENTIALS="/opt/datarobot/ma/conf/gac.json" \
        gcr.io/ai-XXXXX-111111/monitoring-agents:v1


    Here is an example of the Python code that scores your model (all package imports are omitted):

    DEPLOYMENT_ID = "EXTERNAL-DEPLOYMENT-ID-DEFINED-AT-STEP-1"
    MODEL_ID = "EXTERNAL-MODEL-ID-DEFINED-AT-STEP-1"
    PROJECT_ID = "YOUR-GOOGLE-PROJECT-ID"
    TOPIC_ID = "YOUR-PUBSUB-TOPIC-ID-DEFINED-AT-STEP-2"

    # MLOPS: initialize the MLOps instance
    mlops = MLOps() \
        .set_deployment_id(DEPLOYMENT_ID) \
        .set_model_id(MODEL_ID) \
        .set_pubsub_spooler(PROJECT_ID, TOPIC_ID) \
        .init()

    # Read your custom model pickle file (the model has been trained outside DataRobot)
    model = pd.read_pickle('custom_model.pickle')

    # Read scoring data
    features_df_scoring = pd.read_csv('features.csv')

    # Get predictions
    start_time = time.time()
    predictions = model.predict_proba(features_df_scoring)
    predictions = predictions.tolist()
    num_predictions = len(predictions)
    end_time = time.time()

    # MLOPS: report the number of predictions in the request and the execution time
    mlops.report_deployment_stats(num_predictions, end_time - start_time)

    # MLOPS: report the features and predictions
    mlops.report_predictions_data(features_df=features_df_scoring, predictions=predictions)

    # MLOPS: release MLOps resources when finished
    mlops.shutdown()
  3. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable:
    export GOOGLE_APPLICATION_CREDENTIALS="your-google-application-credentials.json"

    Note: You will need your JSON file with credentials downloaded in the prerequisites, "Step 3. Download Google Cloud account credentials."

  4. Score your data locally to test that the model works as expected; a new record appears in the monitoring agent log:
    python score-your-model.py


    The statistics in the MLOps dashboard are updated as well.


Step 5. Push Docker image to Container Registry

Once the container image has been tested and validated locally, you need to upload it to a registry so that your Google Kubernetes Engine (GKE) cluster can download and run it.

  1. Configure the Docker command-line tool to authenticate to Container Registry:
    gcloud auth configure-docker

  2. Push the Docker image you built at “Step 3. Create Docker image with MLOps agent” to the Container Registry: 
    docker push gcr.io/${PROJECT_ID}/monitoring-agents:v1
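
To confirm the image is now available in the registry, you can list it with gcloud:

    gcloud container images list --repository=gcr.io/${PROJECT_ID}
    gcloud container images list-tags gcr.io/${PROJECT_ID}/monitoring-agents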

Step 6. Create GKE cluster

Now that the Docker image is stored in the Container Registry, you need to create a GKE cluster.

  1. Set your project ID and Compute Engine zone options for the gcloud tool:

    gcloud config set project $PROJECT_ID
    gcloud config set compute/zone europe-west1-b

  2. Create a cluster.

    Note: For simplicity, we create a private cluster with unrestricted access to the public endpoint. For a production environment, you should restrict access to the control plane for security reasons (a sketch follows at the end of this step). Detailed information about configuring different types of GKE private clusters can be found here.

    gcloud container clusters create monitoring-agents-cluster \
        --network default \
        --create-subnetwork name=my-subnet-0 \
        --no-enable-master-authorized-networks \
        --enable-ip-alias \
        --enable-private-nodes \
        --master-ipv4-cidr 172.16.0.32/28 \
        --no-enable-basic-auth \
        --no-issue-client-certificate

    where:

    --create-subnetwork name=my-subnet-0 causes GKE to automatically create a subnet named my-subnet-0.

    --no-enable-master-authorized-networks disables authorized networks for the cluster.

    --enable-ip-alias makes the cluster VPC-native.

    --enable-private-nodes indicates that the cluster's nodes do not have external IP addresses.

    --master-ipv4-cidr 172.16.0.32/28 specifies an internal address range for the control plane. This setting is permanent for this cluster.

    --no-enable-basic-auth disables basic authentication for the cluster.

    --no-issue-client-certificate disables issuing a client certificate.


  3. Run the following command to see the cluster worker instances:
    gcloud compute instances list

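For reference, here is a sketch of the same create command with control-plane access restricted to an authorized network, as recommended for production (the CIDR at the end is an illustrative placeholder for your office or VPN range):

    gcloud container clusters create monitoring-agents-cluster \
        --network default \
        --create-subnetwork name=my-subnet-0 \
        --enable-ip-alias \
        --enable-private-nodes \
        --master-ipv4-cidr 172.16.0.32/28 \
        --no-enable-basic-auth \
        --no-issue-client-certificate \
        --enable-master-authorized-networks \
        --master-authorized-networks 203.0.113.0/28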

Step 7. Create cloud router

The MLOps agent running on the GKE private cluster needs access to the DataRobot MLOps service. To provide it, give the private nodes outbound access to the internet using a Cloud NAT router. The official Google documentation is here.

  1. Create a cloud router:
    gcloud compute routers create nat-router \
        --network default \
        --region europe-west1

  2. Add configuration to the router. 
    gcloud compute routers nats create nat-config \
        --router-region europe-west1 \
        --router nat-router \
        --nat-all-subnet-ip-ranges \
        --auto-allocate-nat-external-ips

Step 8. Create K8s ConfigMaps

Now we can create K8s ConfigMaps that will contain MLOps agent configuration and Google credentials.

Note: For production use, you should store these configuration files in K8s Secrets instead (a sketch follows at the end of this step).

Note: You will need your JSON file with credentials downloaded as part of prerequisites, "Step 3. Download Google Cloud account credentials."

  • Create ConfigMaps:
    kubectl create configmap ma-configmap --from-file=mlops.agent.conf.yaml=your-path/mlops.agent.conf.yaml
    
    kubectl create configmap gac-configmap --from-file=gac.json=your-google-application-credentials.json
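
For production, a minimal sketch of the equivalent Secrets-based setup (the secret names are illustrative); in the Deployment in the next step you would then reference them with a secret volume source instead of configMap:

    kubectl create secret generic ma-secret --from-file=mlops.agent.conf.yaml=your-path/mlops.agent.conf.yaml

    kubectl create secret generic gac-secret --from-file=gac.json=your-google-application-credentials.json

    # In ma-deployment.yaml, the corresponding volume would then look like:
    #   volumes:
    #   - name: agent-conf-volume
    #     secret:
    #       secretName: ma-secret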

Step 9. Create K8s Deployment

  1. Create the file ma-deployment.yaml with the following content. (Note that we use three always-running replicas; if you need autoscaling, use kubectl autoscale deployment instead, as sketched after this list.)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ma-deployment
      labels:
        app: ma
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: ma
      template:
        metadata:
          labels:
            app: ma
        spec:
          containers:
          - name: ma
            image: gcr.io/ai-XXXXXXX-11111111/monitoring-agents:v1
            volumeMounts:
            - name:  agent-conf-volume
              mountPath: /opt/datarobot/ma/conf/mlops.agent.conf.yaml
              subPath: mlops.agent.conf.yaml
            - name:  gac-conf-volume
              mountPath: /opt/datarobot/ma/conf/gac.json
              subPath: gac.json
            env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /opt/datarobot/ma/conf/gac.json
            ports:
            - containerPort: 80
          volumes:
          - name:  agent-conf-volume
            configMap:
              items:
              - key: mlops.agent.conf.yaml
                path: mlops.agent.conf.yaml
              name: ma-configmap
          - name:  gac-conf-volume
            configMap:
              items:
              - key: gac.json
                path: gac.json
              name: gac-configmap
  2. Create deployment:
    kubectl apply -f ma-deployment.yaml

  3. Check running pods:
    kubectl get pods
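
If you prefer autoscaling over a fixed replica count, here is a sketch of the kubectl autoscale command mentioned above (the thresholds are illustrative; CPU-based autoscaling requires CPU resource requests on the container):

    kubectl autoscale deployment ma-deployment --min=3 --max=10 --cpu-percent=80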

Step 10. Score model

  1. Now we can score our local model once again:
    python score-your-model.py

  2. Check the GKE Pod log; it shows that one record has been sent to DataRobot. (A command-line sketch for viewing the logs follows this list.)


  3. Check the Pub/Sub log.
  4. Check the DataRobot MLOps dashboard.
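
To check the Pod logs from the command line instead of the Cloud console, you can use the app=ma label defined in Step 9:

    kubectl logs -l app=ma --tail=20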

Clean up

  1. Delete the NAT in router:
    gcloud compute routers nats delete nat-config --router=nat-router --router-region=europe-west1

  2. Delete the cloud router:
    gcloud compute routers delete nat-router --region=europe-west1

  3. Delete the cluster:
    gcloud container clusters delete monitoring-agents-cluster
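
If you no longer need them, you can also delete the Pub/Sub subscription and topic and the pushed container image (a sketch using the placeholder IDs from earlier steps):

    gcloud pubsub subscriptions delete YOUR-PUBSUB-SUBSCRIPTION-ID
    gcloud pubsub topics delete YOUR-PUBSUB-TOPIC-ID
    gcloud container images delete gcr.io/${PROJECT_ID}/monitoring-agents:v1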

Conclusion

DataRobot MLOps offers the ability to monitor all of your ML models (trained in DataRobot or outside it) in a centralized dashboard. External models are monitored using MLOps agents, which collect statistics from models developed in Java, Python, or R.

This article demonstrated how an external Python model, trained outside DataRobot, can be monitored in the MLOps dashboard by agents deployed on GKE, using Pub/Sub as the communication channel.

More information

DataRobot Community resources:

Community GitHub repo: MLOps agent
