DataRobot Prediction API Usage & HTTP Status Interpretation


Let’s assume we've already uploaded the well-known Titanic dataset to DataRobot and used its automated machine learning capabilities to find a great model. We then deployed that model and now our goal is to make predictions on passengers to see if they would have survived or not.

The most common and well-integrated way to use a DataRobot model is to host it within the DataRobot platform as a deployment, and call it from the DataRobot Prediction API.  (Licensed DataRobot customers can search the in-app documentation for "Prediction API" to read more information.)

There are several ways to gather the necessary values to construct a call to a deployed model via the API. The Integrations tab in the GUI provides the easiest way to get the most useful information for a specific deployment. This tab provides a simple Python script to assemble one HTTP request.

[Screenshot: the Integrations tab for a deployment, with its generated scoring snippet]

Several values are needed to craft an HTTP request whose data payload contains one or more records to be scored; a minimal Python sketch of the assembled request follows this list. The values include:

  • Authentication and privileges are established via the USERNAME and API_KEY. The user must have access to use the deployed model.
  • DEPLOYMENT_ID is the unique ID of the DataRobot deployment, which sits in front of the model (the model behind the deployment can be swapped out, if desired, for a refresh).
  • DATAROBOT_KEY is an additional engine access key and is required only for the DataRobot Managed AI Cloud. (Customers with their own installation can omit this header.)
  • The appropriate Content-Type header must be specified for the input data: application/json for JSON or text/csv for CSV. UTF-8 compatible characters are supported.
  • The hostname URL typically points to a load balancer in front of one or more prediction engines.
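
For illustration, the following is a minimal Python sketch of how these values might be assembled into a request with the requests library. The hostname, deployment ID, username, API key, and DataRobot key are placeholders; the Integrations tab shows the actual values and a generated script for a specific deployment.

# Sketch only: substitute the values shown on your deployment's Integrations tab.
import requests

USERNAME = "mike.t@datarobot.com"
API_KEY = "YOUR_API_KEY"
DATAROBOT_KEY = "222_YOUR_CLOUD_DATAROBOT_KEY_bbb"  # Managed AI Cloud only
DEPLOYMENT_ID = "123_YOUR_DEPLOYMENTID_456"
URL = f"https://YOUR.HOST.datarobot.com/predApi/v1.0/deployments/{DEPLOYMENT_ID}/predictions"

headers = {
    "Content-Type": "text/csv",      # or application/json for a JSON payload
    "datarobot-key": DATAROBOT_KEY,  # omit on a customer-managed installation
}

# POST the CSV payload; one or more records may be included in the file.
with open("test1.csv", "rb") as f:
    response = requests.post(URL, data=f, headers=headers, auth=(USERNAME, API_KEY))

print(response.status_code)
print(response.json())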

A simple Titanic model will be used for demonstration purposes. An HTTP request can be constructed in virtually any programming language; the command line tool curl will be used in the examples below to demonstrate calls.  The endpoint accepts both JSON and CSV and, by default, returns data in JSON format.  All payloads demonstrated below will simply use a single input record to score.

 

$ cat test1.csv
PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q

 

The following is a basic, minimal scoring example. The request goes through the support host to a Titanic survival binary classification model and is authenticated as the user mike.t. The input type is specified as CSV via text/csv in the Content-Type header, along with the additional Managed AI Cloud-only DATAROBOT-KEY header. Lastly, the data payload of the request is loaded from the file test1.csv.

 

$ curl -X POST "https://YOUR.HOST.datarobot.com/predApi/v1.0/deployments/123_YOUR_DEPLOYMENTID_456/predictions" -u mike.t@datarobot.com:YOUR_API_KEY -H "Content-Type: text/csv" -H "datarobot-key: 222_YOUR_CLOUD_DATAROBOT_KEY_bbb" --data-binary "@test1.csv"
----------------
{"data":[{"predictionValues":[{"value":0.1234847921,"label":1.0},{"value":0.8765152079,"label":0.0}],"predictionThreshold":0.5,"prediction":0.0,"rowId":0}]}

 

Note: The JSON response can be copied and pasted into an online JSON viewer for ease of browsing and understanding the response structure.
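
For example, the prediction and class probabilities can be pulled out of the JSON response with a few lines of Python; this sketch parses the exact response shown above:

import json

raw = '{"data":[{"predictionValues":[{"value":0.1234847921,"label":1.0},{"value":0.8765152079,"label":0.0}],"predictionThreshold":0.5,"prediction":0.0,"rowId":0}]}'
result = json.loads(raw)

for row in result["data"]:
    # Map each class label to its predicted probability.
    probabilities = {pv["label"]: pv["value"] for pv in row["predictionValues"]}
    print(row["rowId"], row["prediction"], probabilities)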

The following shows a similar call that requests the data's surrogate key be returned in the payload, along with the three features that most impacted the score. This example introduces the passthroughColumns and maxCodes parameters and swaps the endpoint from predictions to predictionExplanations.

 

$ curl -X POST "https://YOUR.HOST.datarobot.com/predApi/v1.0/deployments/123_YOUR_DEPLOYMENTID_456/predictionExplanations?passthroughColumns=PassengerId&maxCodes=3" -u mike.t@datarobot.com:YOUR_API_KEY -H "Content-Type: text/csv" -H "datarobot-key: 222_YOUR_CLOUD_DATAROBOT_KEY_bbb" --data-binary "@test1.csv"
--------------------
{"data":[{"predictionExplanations":[{"featureValue":"Kelly, Mr. James","strength":-0.423789016,"feature":"Name","qualitativeStrength":"---","label":1.0},{"featureValue":null,"strength":-0.3177691377,"feature":"Cabin","qualitativeStrength":"---","label":1.0},{"featureValue":"male","strength":-0.2648695556,"feature":"Sex","qualitativeStrength":"--","label":1.0}],"rowId":0,"predictionValues":[{"value":0.1234847921,"label":1.0},{"value":0.8765152079,"label":0.0}],"predictionThreshold":0.5,"prediction":0.0,"passthroughValues":{"PassengerId":"892"}}]}

 

The following request uses inline JSON:

 

$ curl -X POST "https://YOUR.HOST.datarobot.com/predApi/v1.0/deployments/123_YOUR_DEPLOYMENTID_456/predictions" -u mike.t@datarobot.com:YOUR_API_KEY -H "Content-Type: application/json" -H "datarobot-key: 222_YOUR_CLOUD_DATAROBOT_KEY_bbb" --data '[{"PassengerId": 892, "Pclass": 3, "Name": "Kelly, Mr. James", "Sex": "male", "Age": 34.5, "SibSp": 0, "Parch": 0, "Ticket": 330911, "Fare":7.8292, "Cabin": null, "Embarked": "Q"}]'
--------------------------
{"data":[{"predictionValues":[{"value":0.1234847921,"label":1.0},{"value":0.8765152079,"label":0.0}],"predictionThreshold":0.5,"prediction":0.0,"rowId":0}]}

 

The default response for a request is in JSON; however, the Accept header can be specified to create a CSV response if desired. (This feature became available as of DataRobot 6.0 and is currently available in the Managed AI Cloud.)

 

curl -X POST "https://datarobot-support.orm.datarobot.com/predApi/v1.0/deployments/123_YOUR_DEPLOYMENTID_456/predictions" -u mike.t@datarobot.com:YOUR_API_KEY -H "Content-Type: text/csv" -H "datarobot-key: 222_YOUR_CLOUD_DATAROBOT_KEY_bbb" -H "Accept: text/csv" --data-binary "@test1.csv"
---------------------------
Survived_1_PREDICTION,Survived_0_PREDICTION,Survived_PREDICTION,THRESHOLD,POSITIVE_CLASS
0.123484792,0.876515208,0,0.5,1

 

An additional header can be specified to exclude a request from drift tracking within model management:

--header "X-DataRobot-Skip-Drift-Tracking: 1"

When working with batches, a client-side batch scoring script is available. That script shreds and parallelizes an input CSV file of any size into a scored, flattened output CSV file. (Note that the Python batch scoring script has been deprecated and replaced by the Batch Prediction API. Although the script can still function in some environments, the legacy Prediction API routes it relies on are disabled on the prediction servers in the Managed AI Cloud, meaning some commands won't work. DataRobot customers can search the in-app documentation for information on the Batch Prediction API.)

A Python script that can be used for scoring batches of data can be found under the Integrations tab of a specific deployment within DataRobot.

[Screenshot: the Python batch scoring script under a deployment's Integrations tab]

Decoding and debugging failed requests is most easily accomplished by querying the API directly and inspecting the HTTP status code and the JSON message returned. (The -i flag below tells curl to include the response status line and headers in its output.)

 

$ curl -i -X POST "https://YOUR.HOST.datarobot.com/predApi/v1.0/deployments/123_YOUR_DEPLOYMENTID_456/predictions" -u mike.t@datarobot.com:YOUR_API_KEY -H "Content-Type: text/csv" -H "datarobot-key: 222_YOUR_CLOUD_DATAROBOT_KEY_bbb" --data-binary "@test1.csv"
HTTP/1.1 200 OK
----------------
{"data":[{"predictionValues":[{"value":0.1234847921,"label":1.0},{"value":0.8765152079,"label":0.0}],"predictionThreshold":0.5,"prediction":0.0,"rowId":0}]}

 

Status Code   Meaning
200           Success.
400           Malformed request; check response.content. Possibly bad data or a bad header.
401           Check the API token and (Managed AI Cloud only) the DataRobot-Key.
403           Check privileges; the user appears unauthorized to access this deployment ID.
404           Check the URL path / deployment ID.
405           Confirm a POST request was sent rather than a GET.
422           Check the (case-sensitive) feature names and the response.content message for the list of issues.
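
When scripting, these checks can be automated. The following sketch builds on the requests example above; the function name and error handling are illustrative only:

import requests

def score(url, csv_path, headers, auth):
    # POST a CSV payload and return the parsed JSON, raising a descriptive
    # error when the Prediction API reports a failure.
    with open(csv_path, "rb") as f:
        response = requests.post(url, data=f, headers=headers, auth=auth)
    if response.status_code == 200:
        return response.json()
    # For 4xx errors the body usually carries a JSON message describing the
    # problem (bad data, bad header, unknown feature names, and so on).
    raise RuntimeError(f"Scoring failed with HTTP {response.status_code}: {response.text}")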
Comments

DataRobot Employee

All of the above code still works, though note a change in the documentation and the interface for sending credentials. Rather than basic user authentication (now deprecated), the currently preferred option is to send the authorization via a header field as a bearer token. For example:

curl -i -X POST "https://YOUR.HOST/predApi/v1.0/deployments/123_YOUR_DEPLOYMENTID_456/predictions?passthroughColumns=PassengerId&maxExplanations=3" -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: text/csv" -H "datarobot-key: 222_YOUR_CLOUD_DATAROBOT_KEY_bbb" --data-binary "@test1.csv"
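
The same bearer-token style can be used from Python; a sketch with placeholder values:

import requests

headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # replaces the basic-auth username/API key pair
    "Content-Type": "text/csv",
    "datarobot-key": "222_YOUR_CLOUD_DATAROBOT_KEY_bbb",
}

with open("test1.csv", "rb") as f:
    response = requests.post(
        "https://YOUR.HOST/predApi/v1.0/deployments/123_YOUR_DEPLOYMENTID_456/predictions",
        data=f,
        headers=headers,
    )

print(response.status_code, response.json())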

 
