Are you dealing with stacked predictions or pipelines where you have to score 10+ different models and consolidate results based on a predefined business logic? Or, are you dealing with frequently changing business logic?
These are just a few of the scenarios where a simple but versatile API wrapper can help turn raw predictions into actual decisions. This tutorial explains how you can implement a "decision engine" using an API wrapper. As you'll see, the process is quite straightforward.
For this tutorial it is assumed you have a basic knowledge of Docker, Python, and Django.
We will deploy our Docker image locally on a Linux server for this tutorial, but you could just as easily deploy it to AWS EKS or Azure AKS or, alternatively, take the same logic and implement it with a serverless architecture such as AWS Lambda or Azure Functions.
You can download Docker for your OS from here:
https://hub.docker.com/editions/community/docker-ce-server-centos and follow the install instructions for your respective Linux distribution (e.g., for CentOS you can follow the instructions here: https://docs.docker.com/engine/install/centos/).
To install the latest versions of Docker Engine and containerd, run the following command:
$ sudo yum install docker-ce docker-ce-cli containerd.io
In this tutorial we will create a model with DataRobot AutoML and subsequently deploy it to a DataRobot prediction server.
In this particular use case, we will use our public Lending Club dataset (10K_Lending_Club_Loans.csv) to predict the likelihood of default. You can download the dataset from here:
Because we are using DataRobot AutoML, our model is just a few clicks away.
Deploy the recommended model to a DataRobot prediction server.
Once Quick Autopilot is complete, switch to the Leaderboard and select any model from the top of the Leaderboard.
We will reference these credentials in the API wrapper later.
Now that we have deployed the model, you can pull the Docker image containing the API wrapper. Use the following command to pull and run the Decision Engine (i.e., API wrapper):
docker run -it -p 8000:8000 \
  -e DJANGO_SUPERUSER_USERNAME=<USERNAME> \
  -e DJANGO_SUPERUSER_PASSWORD=<PASSWORD> \
  -e DJANGO_SUPERUSER_EMAIL=<EMAIL> \
  felix85/datarobot_decisions_engine
Important: Before running the above command for the first time, please replace the <username>, <password>, and <email> with your respective credentials.
Also, if you want to keep this service running even after your console is closed, run the container in detached mode (-d) instead:

docker run -d -p 8000:8000 \
  -e DJANGO_SUPERUSER_USERNAME=<USERNAME> \
  -e DJANGO_SUPERUSER_PASSWORD=<PASSWORD> \
  -e DJANGO_SUPERUSER_EMAIL=<EMAIL> \
  felix85/datarobot_decisions_engine
Before you can use the Decision Engine, you need to complete the configuration. To do so, open a browser of your choice and navigate to http://127.0.0.1:8000.
You will see the DataRobot Decisions GUI.
Enter the previously specified username and password (from Install the Decision Engine) and click Log in.
In the displayed DataRobot Decisions - Decision Engine admin page, finalize the configuration as explained below.
Specify the name, for example “LoanALogic,” and paste the Python sample code shown below.
# -*- coding: utf-8 -*-
"""
Created on 2020/07/09

@author: Felix Huthmacher
"""
import pandas as pd
import datetime

## data preparation / pre-processing business logic
def data_prepare(features_df):
    # e.g., enrich the web service input with some other features
    features_df['FICO'] = '850'
    # Specify the deployment ID / model for scoring based on a certain
    # feature/business entity; you could specify a different deployment
    # for each row in the dataframe
    features_df['deployment_id'] = features_df['sub_grade'].apply(
        lambda x: '<REPLACE WITH YOUR DEPLOYMENT_ID>' if x == 'A2'
        else '<REPLACE WITH YOUR DEPLOYMENT_ID>')
    return features_df

## post-processing business logic
def business_logic(df_new):
    # e.g., different thresholds based on geographic area
    df_new['decision'] = df_new['addr_state'].apply(
        lambda x: '1' if x == 'CA' else '0')
    return df_new['decision'], df_new
Make sure to update the code snippet with the corresponding deployment ID that you created in step 4 (Create and Deploy a Model).
The above sample code includes two methods:
The method data_prepare allows you to add pre-processing steps such as data enrichment, feature engineering, or duplicating input rows for scoring against multiple models in parallel.
Each row in the dataframe can be pointed to a different deployment ID / model for scoring based on bespoke business logic.
The method business_logic allows you to consolidate scoring results based on predefined business logic. For example, you can define different probability thresholds based on geographic data or customer segments, or roll up results from multiple models before returning them. This way you can return decisions rather than just raw prediction results, which simplifies integration with downstream systems.
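To illustrate how the two hooks fit together, here is a minimal standalone sketch that runs data_prepare and business_logic back to back. The dataframe rows and the deployment IDs ('deployment-A', 'deployment-default') are made up for illustration; the column names follow the Lending Club sample above.

```python
import pandas as pd

# Toy input resembling a web service payload (hypothetical rows)
features_df = pd.DataFrame({
    'sub_grade':  ['A2', 'C1'],
    'addr_state': ['CA', 'TX'],
})

def data_prepare(features_df):
    # Enrich the input with an extra feature
    features_df['FICO'] = '850'
    # Route each row to a deployment based on sub_grade (placeholder IDs)
    features_df['deployment_id'] = features_df['sub_grade'].apply(
        lambda x: 'deployment-A' if x == 'A2' else 'deployment-default')
    return features_df

def business_logic(df_new):
    # State-specific decision rule: approve only CA applicants here
    df_new['decision'] = df_new['addr_state'].apply(
        lambda x: '1' if x == 'CA' else '0')
    return df_new['decision'], df_new

prepared = data_prepare(features_df)
decisions, enriched = business_logic(prepared)
print(enriched[['sub_grade', 'deployment_id', 'decision']])
```

In the real wrapper, the scoring call against each row's deployment_id happens between these two steps; here it is omitted so the sketch stays self-contained.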
Specify the prediction server instance
The last step is to specify the prediction server instance and credentials that we want to use for our predictions.
For this, click Change and then specify the name, server URL, DataRobot Key (only required for a DataRobot Managed AI Cloud deployment), username, and API token, as well as the default logic connector. (All connection details and credentials can be found in the sample code from step 4, Create and Deploy a Model.)
Click Save when done.
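Under the hood, the wrapper uses these connection details to call the DataRobot Prediction API. As a rough sketch of what such a call looks like: the server URL, deployment ID, token, and DataRobot Key below are placeholders, the route follows DataRobot's documented predApi pattern, and the request is only prepared (not sent) so the structure is easy to inspect.

```python
import requests

API_URL = 'https://example.dynamic.orm.datarobot.com'   # placeholder prediction server
DEPLOYMENT_ID = '<REPLACE WITH YOUR DEPLOYMENT_ID>'     # from step 4
API_TOKEN = '<YOUR API TOKEN>'
DATAROBOT_KEY = '<YOUR DATAROBOT KEY>'                  # Managed AI Cloud only

headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer {}'.format(API_TOKEN),
    'DataRobot-Key': DATAROBOT_KEY,
}
url = '{}/predApi/v1.0/deployments/{}/predictions'.format(API_URL, DEPLOYMENT_ID)

# Build (but do not send) the request; sending would be:
#   requests.Session().send(prepared)
req = requests.Request('POST', url, headers=headers,
                       json=[{'sub_grade': 'A2', 'addr_state': 'CA'}])
prepared = req.prepare()
print(prepared.url)
```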
Now that we have completed the configuration, we can use our Decision Engine. You can download the Postman collection that includes a sample REST and SOAP request from here.
By default the Decision Engine supports basic authentication.
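For example, a client could authenticate a decision request with HTTP Basic auth as follows. This is a sketch: the /decision endpoint path and the credentials are placeholders, and the request is only prepared rather than sent.

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholder credentials and endpoint for the locally running Decision Engine
req = requests.Request(
    'POST',
    'http://127.0.0.1:8000/decision',   # hypothetical route
    auth=HTTPBasicAuth('<USERNAME>', '<PASSWORD>'),
    json=[{'sub_grade': 'A2', 'addr_state': 'CA'}],
)
prepared = req.prepare()
print(prepared.headers['Authorization'].split()[0])  # Basic
```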
The username and password can be configured in the Django settings.py, as shown below.
DataRobot natively supports a REST API, but you can easily convert this Decision Engine (i.e., API wrapper) to a SOAP API as shown below. The input and output structure can be adjusted as needed.
Now that we have created our Decision Engine (i.e., API wrapper), we can turn raw predictions into actionable decisions. Additionally, we can encapsulate business logic and put governance around it through logging and versioning. Because security is important, the Decision Engine restricts who can change business logic, and both altering business logic and generating decisions require authentication.
Every change can be logged, and as soon as a particular business logic has been used to generate decisions, it cannot be altered; instead, users have to create new versions.
If you have followed any of my previous tutorials, then you already know what is coming next.
Because we are leveraging the DataRobot Prediction API, we automatically benefit from its built-in monitoring functionality; this enables us to monitor a model’s performance and benchmark it against other models. It also allows us to replace the model at any point in time without having to write or change a single line of code.
Finally, this sample code can also easily be modified to work with different scoring methods, such as portable prediction servers and scoring code, or to expose different API routes and protocols.
Full source code can be found in the Community GitHub here.