Deploy and Monitor DataRobot Models on AWS


Outline

  • Introduction
  • Preliminary steps
    • Step 1. Check flag
    • Step 2. Install required resources
  • Main steps
    • Step 1. Create DataRobot MLOps model package
    • Step 2. Create Docker container image with MLOps package
    • Step 3. Run your Docker container image locally
    • Step 4. Push Docker image to Amazon Elastic Container Registry (ECR)
    • Step 5. Create external deployment in DataRobot MLOps
    • Step 6. Create Amazon EKS cluster
    • Step 7. Deploy MLOps package to Kubernetes
    • Step 8. Horizontal pod autoscaling
    • Step 9. Expose your model to the world (load balancing)
  • Clean up
  • Conclusion

Introduction

DataRobot MLOps (Machine Learning Operations) is a product that facilitates the routing of machine learning models to production and includes deployment, governance, and monitoring functionalities. DataRobot customers can deploy DataRobot models into their own Kubernetes clusters. In doing so, they still have the advantages of all the model monitoring provided by DataRobot’s model monitoring platform, such as service health, data drift, etc. These exportable DataRobot models are known as Portable Prediction Servers (PPSs). The models are embedded into Docker containers which provide flexibility and portability, making them suitable for container orchestration tools such as Kubernetes.

DataRobot is a cloud-agnostic platform that works with all three major cloud providers (AWS, GCP, and Azure). This tutorial covers the step-by-step process for deploying a DataRobot model on Amazon Elastic Kubernetes Service (EKS). (See other DataRobot Community tutorials that describe the deployment process for DataRobot models on Azure Kubernetes Service and on Google Kubernetes Engine.)

Preliminary steps

Step 1. Check flag

  • The flag “Enable MMM model package export” should be enabled for your DataRobot account. (If needed, contact your administrator or DataRobot representative for more information.)

Step 2. Install required resources

There are two approaches to spinning up an Amazon EKS cluster: using the eksctl tool (a CLI for Amazon EKS), or using the AWS Management Console. The eksctl tool is the simplest and fastest way to create an EKS cluster. If you need more fine-grained control (for example, over IAM role and VPC creation), use the AWS Management Console when spinning up the cluster.

To avoid overloading this tutorial with technical details on creating IAM roles, VPCs, subnets, internet gateways, and so on, the eksctl approach is the one described here.

Prerequisites

There are some prerequisites to interacting with AWS and underlying services. If any/all of these tools are already installed and configured for you, you can skip the corresponding steps. The detailed instructions for each step can be found here.

  1. Install the AWS CLI, version 2.
    aws --version

  2. Configure your AWS CLI credentials.
  3. Install eksctl.
    eksctl version

  4. Install and configure kubectl (CLI for Kubernetes clusters).
    kubectl version --short --client

  5. Check that you successfully installed all tools.

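The individual version checks above can be combined into one quick sanity check. Here is a minimal Python sketch; the helper name and the tool list are our own, based on the prerequisites in this section:

```python
# Verify that the CLI tools used in this tutorial are available on PATH.
import shutil

REQUIRED_TOOLS = ["aws", "eksctl", "kubectl", "docker"]  # from the prerequisites above

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are installed.")
```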

Main steps

Deploying DataRobot models on a Kubernetes infrastructure consists of three main parts:

  • Preparing and pushing the Docker container with the MLOps package to the container registry.
  • Creating the external deployment in DataRobot.
  • Creating the Kubernetes cluster.

Step 1. Create DataRobot MLOps model package

For this tutorial we’re using the Kaggle housing prices dataset, https://www.kaggle.com/c/home-data-for-ml-course/data. Once Autopilot finishes model building, you can create and download the MLOps model package. To do this, navigate to the Models tab, select the model you want, and click Predict > Downloads. In the MLOps Package section, select Generate & Download.

Figure 1. Generate & Download MLOps package

This generates and downloads the model package (.mlpkg file) which contains all the necessary information about the model.

Step 2. Create Docker container image with MLOps package

Now you are ready to create a Docker container image. 

Note: You will need to contact DataRobot Support for information on how you can access the PPS base image. 

Once you have the PPS base image, use the following Dockerfile to generate an image that includes the DataRobot model. The .mlpkg file will be copied into the Docker image so make sure the Dockerfile and .mlpkg file are in the same folder.

Figure 2. Dockerfile
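The Dockerfile itself appears only as an image (Figure 2), so here is a minimal sketch of what such a Dockerfile contains. The base image name/tag and the package destination path are assumptions for illustration; use the actual PPS base image you obtained from DataRobot Support:

```dockerfile
# Base image: the DataRobot Portable Prediction Server (name/tag are placeholders --
# use the image provided by DataRobot Support).
FROM datarobot/datarobot-portable-prediction-api:latest

# Copy the model package from Step 1 into the image so the PPS can load it on startup.
# (The destination path is an assumption for this sketch.)
COPY house-regression-model.mlpkg /opt/ml/model/
```

Keep this Dockerfile in the same folder as the .mlpkg file, as noted above.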
  1. Go to the folder containing Dockerfile and .mlpkg file.
  2. Build the Docker image:
    docker build -t house-regression-model .

  3. Tag your image:
    docker tag house-regression-model:latest 0000000000000.xxx.ecr.us-east-1.amazonaws.com/house-regression-model:latest

    (replacing 0000000000000 with your actual AWS account/repository ID; you can find it under Elastic Container Registry > View push commands in the AWS console).

  4. Run the docker images command to verify that the build was successful.
    Figure 3. Build was successful
    The generated image will contain the DataRobot model and the MLOps agent used to transfer the metrics about service and model health back to the DataRobot MLOps platform.

Step 3. Run your Docker container image locally

This step is often considered optional, but our advice is to always test your image locally: it saves time and network bandwidth, since container images can be tens of gigabytes in size.

  1. Run your Docker container image:
    docker run --rm --name house-regression -p 8080:8080 -it 00000000000.xxx.ecr.us-east-1.amazonaws.com/house-regression-model:latest

    Figure 4. Run Docker container locally
  2. Score your data locally to test if the model works as expected:
    curl -X POST http://localhost:8080/predictions -H "Content-Type: text/csv" --data-binary @kaggle_house_test_dataset.csv

    Figure 5. Score your data locally
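The same scoring call can be issued from Python using only the standard library. This is a sketch of the curl command above; the /predictions route and text/csv content type come from this tutorial, and the helper name is our own:

```python
# Build (and optionally send) the same POST request as the curl example above.
import urllib.request

def build_scoring_request(csv_path, host="http://localhost:8080"):
    """Create a POST request that sends a CSV file to the PPS /predictions endpoint."""
    with open(csv_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        url=f"{host}/predictions",
        data=body,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )

# Usage (requires the container from step 1 to be running locally):
#   with urllib.request.urlopen(build_scoring_request("kaggle_house_test_dataset.csv")) as resp:
#       print(resp.read().decode())
```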

Step 4. Push Docker image to Amazon Elastic Container Registry (ECR)

You need to upload the container image to a registry so that your Amazon EKS cluster can download and run it.

  1. Configure the Docker command-line tool to authenticate to Elastic Container Registry:
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 00000000000.xxx.ecr.us-east-1.amazonaws.com

  2. Push the Docker image you just built to ECR:
    docker push 00000000000.xxx.ecr.us-east-1.amazonaws.com/house-regression-model:latest

Step 5. Create external deployment in DataRobot MLOps

  1. Create an external deployment in MLOps. To do this, navigate to the Model Registry tab and click Model Packages. Select Add New Package and select New external model package.

    Figure 6. Create new external model package
  2. Configure the external model package as shown in Figure 7. Note that Target name is case-sensitive.

    Figure 7. New external package configuration (target is case sensitive; learning data is optional but required to identify data drift)
  3. Make note of the MLOps model ID from the URL as shown in Figure 8. (You’re going to need this in Step 7. Deploy MLOps package to Kubernetes.)

    Figure 8. Get MLOps Model ID
  4. Now, while still on the Model Registry page, select the Deployments tab (to the right of the Package Info tab) and click Deploy model package.

    The Deployments page is shown with the information prefilled for the model package you created.

  5. Finish specifying any information needed for the deployment and click Create deployment.
  6. Make note of the MLOps deployment ID from the URL as shown in Figure 9. (You’re going to need this in Step 7. Deploy MLOps package to Kubernetes.)

    Figure 9. Get MLOps Deployment ID
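Both IDs are typically the 24-character hexadecimal segments visible in the URLs (Figures 8 and 9). If you script your deployment, a small helper can pull them out; the URL shape in the comment is an assumption for illustration:

```python
# Extract a DataRobot object ID (24 hex characters) from a URL copied out of the UI.
import re

def extract_datarobot_id(url):
    """Return the first 24-character hex segment in the URL, or None if absent."""
    match = re.search(r"\b[0-9a-f]{24}\b", url)
    return match.group(0) if match else None

# e.g. a deployment URL such as (hypothetical):
#   https://app.datarobot.com/deployments/5f7f03bfe2b5d01b0f6e12ab/overview
# yields "5f7f03bfe2b5d01b0f6e12ab"
```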

      

Step 6. Create Amazon EKS cluster

Now that the Docker image is stored in ECR and the external deployment is created, you can spin up an Amazon EKS cluster. The EKS cluster needs a VPC with either:

  • two public subnets and two private subnets, or 
  • three public subnets. 

Amazon EKS requires subnets in at least two Availability Zones. A VPC with public and private subnets is recommended so that Kubernetes can create public load balancers in the public subnets that route traffic to pods running on nodes in private subnets.

  1. (Optional) Create or choose two public and two private subnets in your VPC. (Important: Make sure that “Auto-assign public IPv4 address” is enabled for the public subnets.)

    Note: The eksctl tool will create all necessary subnets behind the scenes if you don’t provide the corresponding --vpc-private-subnets and --vpc-public-subnets parameters.
  2. Create the cluster:

    eksctl create cluster \
    --name house-regression \
    --vpc-private-subnets=subnet-xxxxxxx,subnet-xxxxxxx \
    --vpc-public-subnets=subnet-xxxxxxx,subnet-xxxxxxx \
    --ssh-access \
    --ssh-public-key my-public-key.pub \
    --managed

    Note: The --managed parameter enables Amazon EKS-managed nodegroups (https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html). This feature automates the provisioning and lifecycle management of nodes (EC2 instances) for Amazon EKS clusters. You can provision optimized groups of nodes for your clusters, and EKS keeps the nodes up to date with the latest Kubernetes and host OS versions. The eksctl tool lets you choose the node size and instance type family via command-line flags or config files.

    Note: Although --ssh-public-key is optional, it is highly recommended that you specify it when you create your node group with a cluster. This option enables SSH access to the nodes in your managed node group. Enabling SSH access allows you to connect to your instances and gather diagnostic information if there are issues. You cannot enable remote access after the node group is created.

    Cluster provisioning usually takes between 10 and 15 minutes.

  3. When your cluster is ready, test that your kubectl configuration is correct:

    kubectl get svc


Step 7. Deploy MLOps package to Kubernetes

  1. Create a Kubernetes namespace:
    kubectl create namespace house-regression-namespace

  2. Save the following contents to a file named house-regression-service.yaml on your local machine.

    Note: You should provide the values of image, DataRobot API token, model ID, and deployment ID (both IDs were obtained at Step 5. Create external deployment in DataRobot MLOps).

    apiVersion: v1
    kind: Service
    metadata:
      name: house-regression-service
      namespace: house-regression-namespace
      labels:
        app: house-regression-app
    spec:
      selector:
        app: house-regression-app
      ports:
        - protocol: TCP
          port: 80
          targetPort: 8080
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: house-regression-deployment
      namespace: house-regression-namespace
      labels:
        app: house-regression-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: house-regression-app
      template:
        metadata:
          labels:
            app: house-regression-app
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: beta.kubernetes.io/arch
                    operator: In
                    values:
                    - amd64
          containers:
          - name: house-regression-model
            image: 0000000000.xxx.ecr.us-east-1.amazonaws.com/house-regression-model:latest
            env:
            - name: PORTABLE_PREDICTION_API_WORKERS_NUMBER
              value: "2"
            - name: PORTABLE_PREDICTION_API_MONITORING_ACTIVE
              value: "True"
            - name: PORTABLE_PREDICTION_API_MONITORING_SETTINGS
              value: output_type=spooler_type=filesystem;directory=/tmp;max_files=50;file_max_size=10240000;model_id=<your mlops_model_id_obtained_at_step_5>;deployment_id=<your mlops_deployment_id_obtained_at_step_5>
            - name: MONITORING_AGENT
              value: "True"
            - name: MONITORING_AGENT_DATAROBOT_APP_URL
              value: https://app.datarobot.com/
            - name: MONITORING_AGENT_DATAROBOT_APP_TOKEN
              value: <your_datarobot_api_token>
            ports:
            - containerPort: 80
  3. Create a Kubernetes service and deployment:
    kubectl apply -f house-regression-service.yaml

  4. View all resources that exist in house-regression-namespace:
    kubectl get all -n house-regression-namespace

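Rather than hand-editing the manifest, the placeholder tokens in the YAML above can be substituted programmatically before running kubectl apply. A minimal sketch (the helper is our own; the placeholder names mirror the manifest):

```python
# Fill <placeholder> tokens in the manifest template with concrete values.

def fill_manifest(template, values):
    """Replace each <key> token in the template with its value."""
    for key, value in values.items():
        template = template.replace(f"<{key}>", value)
    return template

# Usage sketch (substitute your real IDs and token from Step 5):
#   manifest = fill_manifest(
#       open("house-regression-service.yaml").read(),
#       {
#           "your mlops_model_id_obtained_at_step_5": model_id,
#           "your mlops_deployment_id_obtained_at_step_5": deployment_id,
#           "your_datarobot_api_token": api_token,
#       },
#   )
```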

 

Step 8. Horizontal pod autoscaling

The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on that resource's CPU utilization. This can help your applications scale out to meet increased demand or scale in when resources are not needed, thus freeing up your nodes for other applications. When you set a target CPU utilization percentage, the Horizontal Pod Autoscaler scales your application in or out to try to meet that target.

  1. Create a Horizontal Pod Autoscaler resource for the house-regression-deployment deployment:
    kubectl autoscale deployment house-regression-deployment -n house-regression-namespace --cpu-percent=80 --min=1 --max=5

  2. View all resources that exist in house-regression-namespace:
    kubectl get all -n house-regression-namespace

    Horizontal Pod Autoscaler appears in the resources list.

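The autoscaler's behavior can be reasoned about with the standard HPA scaling rule, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the --min/--max bounds given above. A small illustrative sketch (our own, not part of the tutorial's commands):

```python
# Sketch of the Horizontal Pod Autoscaler's core scaling rule.
import math

def desired_replicas(current_replicas, current_cpu_pct,
                     target_cpu_pct=80, min_replicas=1, max_replicas=5):
    """Replica count the HPA converges toward for a CPU-utilization target."""
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))

# e.g. 3 replicas running at 160% CPU against the 80% target would scale to
# ceil(3 * 160 / 80) = 6, capped at --max=5.
```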

Step 9. Expose your model to the world (load balancing)

Amazon EKS supports the Network Load Balancer and the Classic Load Balancer for pods running on Amazon EC2 instance nodes through a Kubernetes service of type LoadBalancer.

Note: You must tag the public subnets in your VPC so that Kubernetes knows to use only those subnets for external load balancers; otherwise, it chooses a public subnet in each Availability Zone in lexicographical order by subnet ID. 
kubernetes.io/role/elb = 1

Private subnets must be tagged as follows so that Kubernetes knows it can use them for internal load balancers. 
kubernetes.io/role/internal-elb = 1

Important: If you use an Amazon EKS AWS CloudFormation template to create your VPC after March 26, 2020, then the subnets created by the template are tagged when they're created (as explained here).

  1. Use the kubectl expose command to generate a Kubernetes service for house-regression-deployment:

    kubectl expose deployment house-regression-deployment -n house-regression-namespace --name=house-regression-external --type=LoadBalancer --port 80 --target-port 8080

    Where:
    --port is the port number configured on the Load Balancer
    --target-port is the port number that the house-regression-deployment container is listening on

  2. Run the following command to get the service details:
    kubectl get service -n house-regression-namespace

  3. Copy the EXTERNAL_IP address.

  4. Score your model using the EXTERNAL_IP address (copied above):
    curl -X POST http://<EXTERNAL_IP>/predictions -H "Content-Type: text/csv" --data-binary @kaggle_house_test_dataset.csv


  5. Check the service health of the external deployment created in Step 5. Create external deployment in DataRobot MLOps. We can see that our scoring request is now included in the statistics.
    Figure 10. DataRobot MLOps dashboard shows our scoring request to the model deployed on Amazon EKS

Clean up

  1. Remove the sample service, deployment, pods, and namespace:
    kubectl delete namespace house-regression-namespace

  2. Delete the cluster:

    eksctl delete cluster \
    --name house-regression


Conclusion

This tutorial explained how to deploy and monitor DataRobot models on the Amazon EKS platform via a Portable Prediction Server (PPS). A PPS is embedded into a Docker container alongside the MLOps agent, making it possible to collect the principal IT (service health, number of requests, etc.) and ML (accuracy, data drift, etc.) metrics in the cloud and monitor them on the centralized DataRobot MLOps dashboard.

Using DataRobot PPSs allows you to avoid vendor lock-in and easily migrate between cloud environments or deploy models across them simultaneously.
