How to Monitor Spark Models with DataRobot MLOps


fhuthmacher_0-1593211213725.png

In prior tutorials we discussed how easy it is to deploy and monitor models with DataRobot MLOps. But what about use cases where you don’t want to deploy a model within DataRobot MLOps, yet you still want to manage and monitor it from one central dashboard? This is the scenario where DataRobot’s Monitoring Agent (MLOps agent) comes to the rescue.

In this tutorial, we will explore how you can manage and monitor remote models, i.e., models that are not running within DataRobot MLOps. These are models that are deployed on your own infrastructure, literally anywhere you like. Common examples are serverless deployments (AWS Lambda, Azure Functions) or deployments on Spark clusters (Hadoop, Databricks, AWS EMR). 

To illustrate this process, we will take the example of a DataRobot model that is being deployed on a Databricks cluster, and walk through the process of monitoring this model with DataRobot MLOps in a central dashboard that covers all of your models, regardless of where they have been developed or deployed. This approach works for any model that runs within a Spark cluster.  

1) Create a Model

In this example we will create a model with DataRobot AutoML and subsequently import it into our Databricks cluster.

In this particular use case, we will use our public LendingClub dataset to predict the loan amount for each application. You can download the dataset here: https://s3.amazonaws.com/datarobot_public_datasets/10K_Lending_Club_Loans.csv.

Please note that if you already have a regression model that runs in your Spark cluster, then you can skip this step, and jump directly to 3) Install DataRobot’s MLOps monitoring agent and library.

Because we are using DataRobot AutoML, our model is just a few clicks away:

  1. Drop the training data into DataRobot

    Select URL import and then paste the above URL as shown below.

    fhuthmacher_1-1593211213728.png
    fhuthmacher_2-1593211213723.png
  2. Select the target & start Autopilot

    Specify the loan_amnt column as the target, and then select the Start button to kick off the Autopilot run.
    fhuthmacher_3-1593211213731.png
  3. Download Scoring Code
    Once the Autopilot is complete, switch to the Leaderboard and select a model that has a scoring code badge as shown below.
    fhuthmacher_4-1593211213829.png
    Now click the Predict tab, and then the Downloads tab, to get to the Scoring Code JAR download screen as shown below.
    fhuthmacher_5-1593211213827.png
    Then click Download. This will start the download of the JAR file.

For more information about Scoring Code, please refer to the community articles that cover this topic in more detail.

And if you’re a licensed DataRobot customer, search the in-app Platform Documentation for Scoring Code.

2) Deploy a Model

To deploy DataRobot’s Scoring Code, we have to install the previously downloaded JAR file, along with DataRobot’s Spark Wrapper, in the Databricks cluster as shown below.

To do this, select the cluster settings, navigate to the Libraries tab, and click Install New. Select the Scoring Code JAR file (e.g., 5ed68d70455df33366ce0508.jar) and click Install.

fhuthmacher_6-1593211213729.png


Once the install is complete, repeat the same steps and install DataRobot’s Spark Wrapper, which you can download from here or pull the latest version of directly from Maven here.

3) Install DataRobot’s MLOps Monitoring Agent and Library

fhuthmacher_7-1593211213734.png

Remote models do not communicate directly with DataRobot MLOps. Instead, the communication is handled via DataRobot MLOps monitoring agents, which support many different spooling mechanisms (e.g., flat files, AWS SQS, RabbitMQ). These agents are typically deployed in the customer environment where the model itself is running.

To simplify communication with the DataRobot MLOps monitoring agent, libraries are available for all of the common programming languages. In the end, the model is instructed to talk to the agent with the help of the MLOps library, and the agent collects all the metrics from the model and relays them to the MLOps server and dashboards.

In this tutorial, our runtime environment is Spark, so we will install the MLOps library to our Spark cluster (Databricks) in the same way we installed the model itself previously (see step 2, Deploy a Model).

Further, we will install the MLOps monitoring agent in an Azure Kubernetes Service (AKS) cluster alongside RabbitMQ, which we will use as our queuing system of choice.

  1. Install RabbitMQ in AKS cluster

    This tutorial assumes that you are familiar with AKS, as well as Azure’s CLI. For further information, please refer to Microsoft’s quick start tutorial here.

    If you don’t have a running AKS cluster, create one as shown below.
    RESOURCE_GROUP=ai_success_eng
    CLUSTER_NAME=AIEngineeringDemo

    az aks create \
      --resource-group $RESOURCE_GROUP \
      --name $CLUSTER_NAME \
      -s Standard_B2s \
      --node-count 1 \
      --generate-ssh-keys \
      --service-principal XXXXXX \
      --client-secret XXXX \
      --enable-cluster-autoscaler \
      --min-count 1 \
      --max-count 2

    Next we start the Kubernetes dashboard.

    az aks browse --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME

    There are many ways to deploy applications, and probably the easiest way is via the Kubernetes dashboard. To install RabbitMQ, we select + CREATE, then CREATE AN APP.

    In this screen, specify the app name (e.g., “rabbitmqdemo”), the container image “rabbitmq:latest”, and 1 pod (we are not concerned about high availability); mark the service as external, specify the port mapping for RabbitMQ’s default ports (5672 and 15672), and then click DEPLOY.
    fhuthmacher_8-1593211213738.png
    That’s it, we just installed RabbitMQ.

  2. Install MLOps Monitoring Agent

    You can install the agent anywhere, but for this tutorial, we are going to install it alongside RabbitMQ.

    The actual agent can be downloaded directly from within your DataRobot cluster as shown below.

    fhuthmacher_9-1593211213831.png

    Copy this tarball to the container on which RabbitMQ is running by executing the command below (if necessary, update the filename of the tarball).
    kubectl cp datarobot-mlops-agent-6.1.0.tar.gz default/rabbitmq-649ccbd8cb-qjb4l:/opt

    Now that we have copied the tarball to the container, we can connect to the CLI of the container, configure the agent, and start it.

    fhuthmacher_10-1593211213739.png

    fhuthmacher_11-1593211213741.png
    In the open CLI, review the tarball name and, if necessary, update the filename in the command. Then execute the below commands.

    cd /opt && tar -xvzf datarobot-mlops-agent-6.1.0.tar.gz &&
    cd mlops-agent-6.1.0/conf

    In this directory, you will find a couple of configuration files, and you need to update mlops.agent.conf.yaml to point it to your DataRobot MLOps instance and message queue.

    To update the configuration and run the agent, we install Vim and Java with the following command.

    apt-get update &&
    apt-get install vim &&
    apt-get install default-jdk


    In this example we are using RabbitMQ and the DataRobot Managed AI Cloud solution, so we configure mlopsURL, apiToken, and channelConfigs as shown below.

    fhuthmacher_12-1593211213753.png


    Before we start the agent, we can also enable the RabbitMQ Management UI and create a new user so that we can monitor queues more easily by running the following commands.

    ### enable RabbitMQ management UI
    rabbitmq-plugins enable rabbitmq_management &&

    ### add user via CLI
    rabbitmqctl add_user <username> <your password> &&
    rabbitmqctl set_user_tags <username> administrator &&
    rabbitmqctl set_permissions -p / <username> ".*" ".*" ".*"

    Now that RabbitMQ is configured and the updated agent configuration (mlops.agent.conf.yaml) is saved, we can switch to the bin directory and start the agent.

    cd ../bin &&
    ./start-agent.sh

     Confirm that the agent is running correctly by checking its status.

    ./status-agent.sh

    You can also check the logs, found in the log directory (/logs), to ensure that everything is running as expected.

    fhuthmacher_13-1593211213736.png

  3. Install MLOps Library in Spark cluster (Databricks)

    Download the library from here. Then select the cluster settings, navigate to the Libraries tab, and click Install New. Select the MLOps JAR file (e.g., MLOps.jar) and click Install.

    fhuthmacher_14-1593211213732.png

4) Run your Model

Now that we have all the prerequisites in place, it is time to actually run our model to get some predictions.
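
The scoring code below reads from a table named 10k_lending_club_loans_with_id_csv. If you haven't registered the LendingClub CSV as a table in Databricks yet, here is a minimal sketch of one way to do it (the DBFS path is hypothetical, and the extra id column is only needed if the file doesn't already contain one):

import org.apache.spark.sql.functions.monotonically_increasing_id

// Sketch: register the LendingClub CSV as a temporary view
// (adjust the DBFS path to wherever you uploaded the file)
val loansDF = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/FileStore/tables/10K_Lending_Club_Loans.csv")
  .withColumn("id", monotonically_increasing_id()) // row id, used later as the association ID

loansDF.createOrReplaceTempView("10k_lending_club_loans_with_id_csv")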

// Scala example (see also the PySpark example in the notebooks referenced at the bottom)

// 1) Use local DataRobot Model for Scoring
import com.datarobot.prediction.spark.Predictors

// referencing model_id, which is the same as the generated filename of the JAR file
val DataRobotModel = com.datarobot.prediction.spark.Predictors.getPredictor("5ed68d70455df33366ce0508") 

// 2) read the scoring data
val scoringDF = sql("select * from 10k_lending_club_loans_with_id_csv")

// 3) Score the data and save results to spark dataframe
val output = DataRobotModel.transform(scoringDF)

// 4) Review/consume scoring results 
output.show(1, false)

To track the actual scoring time, you can wrap the scoring command, so the updated code would look like the below.

// to track the actual scoring time

def time[A](f: => A): Double = {
  val s = System.nanoTime
  f
  val scoreTime = (System.nanoTime - s) / 1e9
  println("time: " + scoreTime + "s")
  scoreTime
}

// 1) Use local DataRobot Model for Scoring
import com.datarobot.prediction.spark.Predictors

// referencing model_id, which is the same as the generated filename of the JAR file
val DataRobotModel = com.datarobot.prediction.spark.Predictors.getPredictor("5ed708a8fca6a1433abddbcb") 

// 2) read the scoring data
val scoringDF = sql("select * from 10k_lending_club_loans_with_id_csv")

val scoreTime = time {

  // Score the data and save results to spark dataframe
  val scoring_output = DataRobotModel.transform(scoringDF)
  scoring_output.show(1,false)
  scoring_output.createOrReplaceTempView("scoring_output")
}
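
The scored results are now available via the scoring_output temporary view. As a quick sanity check you can query it directly; the snippet below assumes the Spark wrapper adds a PREDICTION column (the same column name referenced in the reportPredictions call later on) and that the data includes the id column:

// Inspect a few scored rows from the temporary view created above
val scored = sql("select * from scoring_output")
scored.select("id", "PREDICTION").show(5, false)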

5) Report Usage to DataRobot MLOps via Monitoring Agents

So far, we have used our model to predict the loan amount for each application. Now we want to report all the telemetry around these predictions to our DataRobot MLOps server and dashboards. To do this, we can leverage the methods illustrated below.

  1. Create external deployment

    Before we can report scoring details, we have to create an external deployment within DataRobot MLOps. This only has to be done once, and can be done via the UI in DataRobot MLOps as shown below.
    fhuthmacher_15-1593211213749.png

    Select Model Registry, then Model Packages, and New external model package from the menu. Then specify a name and description, upload the corresponding training data for drift tracking, and specify the model location, target, and prediction type as shown below. When finished, click Create package.

    fhuthmacher_16-1593211213751.png
    Once the external model package is created, take note of the model ID in the URL as shown below.

    fhuthmacher_17-1593211213745.png
    Next, click the Deployments tab and then select Create new deployment.
    fhuthmacher_18-1593211213740.png
    Once the deployment has been created, you will be forwarded to the Deployments page, where you can get the deployment ID from the URL.

    fhuthmacher_19-1593211213742.png
    Now that we have our model ID and deployment ID, we can actually report the predictions in the next step.
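
    The reportActuals example further below references deploymentId and modelId values. You can capture both IDs from the URLs noted above as Scala values; the ID strings below are placeholders, so substitute your own:

    // Placeholders: substitute the IDs captured from the Deployments and Model Registry URLs
    val deploymentId = "5ec3313XXXXXXXXX" // external deployment ID
    val modelId = "5ec3313XXXXXXXXX"      // external model ID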

  2. Report prediction details

    To actually report prediction details to DataRobot, you run the below code in your Spark environment. Make sure to update the input parameters accordingly.

    import com.datarobot.mlops.spark.MLOpsSparkUtils

    val channelConfig = "OUTPUT_TYPE=RABBITMQ;RABBITMQ_URL=amqp://<<RABBIT HOSTNAME>>:5672;RABBITMQ_QUEUE_NAME=mlopsQueue"

    MLOpsSparkUtils.reportPredictions(
      scoringDF,           // Spark dataframe with the actual scoring data
      "5ec3313XXXXXXXXX",  // external deployment ID
      "5ec3313XXXXXXXXX",  // external model ID
      channelConfig,       // RabbitMQ channel config
      scoreTime,           // actual scoring time (from the time {} wrapper above)
      Array("PREDICTION"), // target column
      "id"                 // association ID column
    )

     

  3. Report actuals

    At some point you might also get actual values, which you can then report to track accuracy over time. This can be accomplished with the function shown below.

    import com.datarobot.mlops.spark.MLOpsSparkUtils

    val actualsDF = spark.sql("select id as associationId, loan_amnt as actualValue, null as timestamp from actuals")

    MLOpsSparkUtils.reportActuals(
      actualsDF,
      deploymentId,
      modelId,
      channelConfig
    )
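
    If you don't have an actuals table yet, a minimal in-memory sketch of the expected DataFrame shape (column names taken from the query above) might look like this:

    import spark.implicits._
    import org.apache.spark.sql.functions.lit

    // Hypothetical sample actuals: associationId must match the "id" values reported with the predictions
    val sampleActualsDF = Seq(
      ("1", 10000.0),
      ("2", 25000.0)
    ).toDF("associationId", "actualValue")
      .withColumn("timestamp", lit(null).cast("string"))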

Final Thoughts

Even though we deployed a model outside of DataRobot on a Spark cluster (Databricks), we can monitor it like any other model and track service health and data drift as well as actuals in one central dashboard (see below).

fhuthmacher_20-1593211213755.png


Complete sample notebooks with all code snippets for Scala and PySpark can be found here.

More Information

Check out the community article Machine Learning Operations (MLOps) Walkthrough, which provides a complete, end-to-end tour of MLOps.

And if you’re a licensed DataRobot customer, search the in-app Platform Documentation for Scoring Code.
