(Updated October 2020)
DataRobot MLOps (Machine Learning Operations) is a product that streamlines the path of machine learning models to production and provides deployment, governance, and monitoring functionality. DataRobot customers can deploy DataRobot models into their own Kubernetes clusters while retaining all of the monitoring provided by the DataRobot MLOps platform, such as service health and data drift tracking. These exportable DataRobot models are known as Portable Prediction Servers (PPSs). The models are packaged into Docker containers, which makes them flexible and portable and therefore well suited to container orchestration tools such as Kubernetes.
DataRobot is a cloud-agnostic platform that works with all three major cloud providers (AWS, GCP, and Azure). This tutorial covers the step-by-step process for deploying a DataRobot model on Amazon Elastic Kubernetes Service (EKS). (See the other DataRobot Community tutorials that describe deploying DataRobot models on Azure Kubernetes Service and on Google Kubernetes Engine (GCP).)
There are two approaches to spinning up an Amazon EKS cluster: the eksctl tool (the CLI for Amazon EKS) or the AWS Management Console. Using eksctl is the simplest and fastest way to create an EKS cluster; if you need more fine-grained control (for example, over IAM role and VPC creation), use the AWS Management Console instead.
To avoid overloading this tutorial with the technical details of creating IAM roles, VPCs, subnets, internet gateways, and so on, it describes the eksctl approach.
There are some prerequisites for interacting with AWS and the underlying services. If any or all of these tools are already installed and configured for you, you can skip the corresponding steps. Detailed instructions for each step can be found here.
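As a quick sanity check (assuming, as the rest of this tutorial does, that you use the AWS CLI, eksctl, kubectl, and Docker), you can verify that the tools are installed and that your AWS credentials are configured:

aws --version
eksctl version
kubectl version --client
docker --version
# Confirms that your AWS credentials are configured correctly
aws sts get-caller-identity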
Deploying DataRobot models on a Kubernetes infrastructure consists of three main parts:
For this tutorial we’re using the Kaggle housing prices dataset, https://www.kaggle.com/c/home-data-for-ml-course/data. Once Autopilot finishes model building, you can create and download the MLOps model package. To do this, navigate to the Models tab, select the model you want, and click Predict > Downloads. In the MLOps Package section, select Generate & Download.
This generates and downloads the model package (.mlpkg file) which contains all the necessary information about the model.
Now you are ready to create a Docker container image.
Note: You will need to contact DataRobot Support for information on how you can access the PPS base image.
Once you have the PPS base image, use the following Dockerfile to generate an image that includes the DataRobot model. The .mlpkg file is copied into the Docker image, so make sure the Dockerfile and the .mlpkg file are in the same folder.
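The Dockerfile itself is only a few lines. As a minimal sketch (the base image tag, the .mlpkg file name, and the /opt/ml/model destination are assumptions; adjust them to match your PPS base image and model package), it can be created and built like this:

# Hypothetical base image tag and package name; adjust to your environment
cat > Dockerfile <<'EOF'
FROM datarobot/portable-prediction-api:latest
COPY house_regression_model.mlpkg /opt/ml/model/
EOF
docker build -t house-regression-model:latest .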
This step is often considered optional, but we advise always testing your image locally to save time and network bandwidth, since container images can be tens of gigabytes in size.
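For example, here is a minimal local smoke test (assuming the image tag from the sketch above and that the PPS listens on port 8080 inside the container):

# Start the container locally and send a test scoring request
docker run --rm -d -p 8080:8080 --name pps-local house-regression-model:latest
curl -X POST http://localhost:8080/predictions \
     -H "Content-Type: text/csv" \
     --data-binary @kaggle_house_test_dataset.csv
docker stop pps-local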
You need to upload the container image to a registry so that your Amazon EKS cluster can download and run it.
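A typical push to Amazon Elastic Container Registry (ECR) looks like the following sketch; the account ID, region, and repository name are placeholders and should match the image reference used in the Kubernetes manifest later in this tutorial:

# Create the repository (one-time step) and authenticate Docker to ECR
aws ecr create-repository --repository-name house-regression-model --region us-east-1
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 0000000000.dkr.ecr.us-east-1.amazonaws.com
# Tag and push the image
docker tag house-regression-model:latest \
    0000000000.dkr.ecr.us-east-1.amazonaws.com/house-regression-model:latest
docker push 0000000000.dkr.ecr.us-east-1.amazonaws.com/house-regression-model:latest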
Now that the Docker image is stored in ECR and the external deployment is created, you can spin up an Amazon EKS cluster. Amazon EKS requires a VPC with subnets in at least two Availability Zones. A VPC with both public and private subnets is recommended so that Kubernetes can create public load balancers in the public subnets that route traffic to pods running on nodes in the private subnets.
eksctl create cluster \
  --name house-regression \
  --vpc-private-subnets=subnet-xxxxxxx,subnet-xxxxxxx \
  --vpc-public-subnets=subnet-xxxxxxx,subnet-xxxxxxx \
  --ssh-access \
  --ssh-public-key my-public-key.pub \
  --managed
Note: The --managed parameter enables Amazon EKS-managed node groups (https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html). This feature automates the provisioning and lifecycle management of nodes (EC2 instances) for Amazon EKS clusters. You can provision optimized groups of nodes for your cluster, and EKS keeps the nodes up to date with the latest Kubernetes and host OS versions. The eksctl tool lets you choose the node group size and instance type family via command-line flags or a config file (see the example after these notes).
Note: Although --ssh-public-key is optional, it is highly recommended that you specify it when you create your node group with a cluster. This option enables SSH access to the nodes in your managed node group. Enabling SSH access allows you to connect to your instances and gather diagnostic information if there are issues. You cannot enable remote access after the node group is created.
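For example, here is a sketch with illustrative node settings (the instance type and node counts below are not from the original tutorial):

eksctl create cluster \
  --name house-regression \
  --node-type m5.xlarge \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 4 \
  --ssh-access \
  --ssh-public-key my-public-key.pub \
  --managed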
Cluster provisioning usually takes between 10 and 15 minutes; the eksctl output indicates when the cluster is ready.
When your cluster is ready, test that your kubectl configuration is correct:
kubectl get svc
apiVersion: v1
kind: Service
metadata:
  name: house-regression-service
  namespace: house-regression-namespace
  labels:
    app: house-regression-app
spec:
  type: LoadBalancer   # expose the service externally; provides the EXTERNAL-IP used for scoring later
  selector:
    app: house-regression-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: house-regression-deployment
  namespace: house-regression-namespace
  labels:
    app: house-regression-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: house-regression-app
  template:
    metadata:
      labels:
        app: house-regression-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: beta.kubernetes.io/arch
                    operator: In
                    values:
                      - amd64
      containers:
        - name: house-regression-model
          image: 0000000000.xxx.ecr.us-east-1.amazonaws.com/house-regression-model:latest
          env:
            - name: PORTABLE_PREDICTION_API_WORKERS_NUMBER
              value: "2"
            - name: PORTABLE_PREDICTION_API_MONITORING_ACTIVE
              value: "True"
            - name: PORTABLE_PREDICTION_API_MONITORING_SETTINGS
              value: output_type=spooler_type=filesystem;directory=/tmp;max_files=50;file_max_size=10240000;model_id=<your mlops_model_id_obtained_at_step_5>;deployment_id=<your mlops_deployment_id_obtained_at_step_5>
            - name: MONITORING_AGENT
              value: "True"
            - name: MONITORING_AGENT_DATAROBOT_APP_URL
              value: https://app.datarobot.com/
            - name: MONITORING_AGENT_DATAROBOT_APP_TOKEN
              value: <your_datarobot_api_token>
          ports:
            - containerPort: 8080   # the PPS listens on 8080, matching the service targetPort
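To deploy, save the manifest to a file (the file name below is illustrative), create the namespace it references, and apply it:

kubectl create namespace house-regression-namespace
kubectl apply -f house-regression.yaml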
The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on that resource's CPU utilization. This can help your applications scale out to meet increased demand or scale in when resources are not needed, thus freeing up your nodes for other applications. When you set a target CPU utilization percentage, the Horizontal Pod Autoscaler scales your application in or out to try to meet that target.
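The exact autoscaler definition isn't reproduced here; as a sketch (the target CPU percentage and replica bounds are illustrative), an HPA for this deployment could be created with:

kubectl autoscale deployment house-regression-deployment \
    --cpu-percent=80 --min=3 --max=10 \
    -n house-regression-namespace

Note that CPU-based scaling only works if the containers declare CPU resource requests and the Kubernetes Metrics Server is installed in the cluster to supply utilization data.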
View all resources that exist in house-regression-namespace:
kubectl get all -n house-regression-namespace
The Horizontal Pod Autoscaler appears in the resources list.
Amazon EKS supports the Network Load Balancer and the Classic Load Balancer for pods running on Amazon EC2 instance nodes through a Kubernetes service of type LoadBalancer.
Note: You must tag the public subnets in your VPC so that Kubernetes knows to use only those subnets for external load balancers instead of choosing a public subnet in each Availability Zone (in lexicographical order by subnet ID).
kubernetes.io/role/elb = 1
Private subnets must be tagged in the following way so that Kubernetes knows it can use the subnets for internal load balancers.
kubernetes.io/role/internal-elb = 1
Important: If you use an Amazon EKS AWS CloudFormation template to create your VPC after March 26, 2020, then the subnets created by the template are tagged when they're created (as explained here).
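If your subnets were created earlier or are missing these tags, you can add them with the AWS CLI, for example (the subnet IDs are placeholders):

# Tag a public subnet for external load balancers
aws ec2 create-tags --resources subnet-xxxxxxx --tags Key=kubernetes.io/role/elb,Value=1
# Tag a private subnet for internal load balancers
aws ec2 create-tags --resources subnet-yyyyyyy --tags Key=kubernetes.io/role/internal-elb,Value=1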
Run the following command to get the service details:
kubectl get service -n house-regression-namespace
Copy the EXTERNAL-IP value from the output.
Score your model using that external IP address (substituted for <EXTERNAL_IP> below):
curl -X POST http://<EXTERNAL_IP>/predictions -H "Content-Type: text/csv" --data-binary @kaggle_house_test_dataset.csv
Check the service health of the external deployment created in Step 5 (Create external deployment in DataRobot MLOps). You can see that the scoring request is now included in the statistics.
When you have finished, delete the cluster to avoid unnecessary charges:
eksctl delete cluster \
  --name house-regression
This tutorial explained how to deploy and monitor DataRobot models on Amazon EKS via a Portable Prediction Server (PPS). A PPS is embedded into a Docker container alongside the MLOps monitoring agent, making it possible to capture the key IT metrics (service health, number of requests, etc.) and ML metrics (accuracy, data drift, etc.) in the cloud and monitor them on the centralized DataRobot MLOps dashboard.
Using DataRobot PPSs allows you to avoid vendor lock-in and easily migrate between cloud environments or deploy models across them simultaneously.