This article introduces the Batch Prediction API, which you can use to score large datasets on deployed DataRobot models. The Python notebook available at the DataRobot Community GitHub provides instructions for getting predictions out of a deployed DataRobot model.
The Batch Prediction API allows you to score huge datasets by taking advantage of the prediction servers you have already deployed. The API is available through the DataRobot Public API and can be consumed by REST-enabled clients or the DataRobot Python Public API bindings. (View the full documentation for the DataRobot Python client package here.)
There are multiple advantages to using the Batch Prediction API:
Score large datasets from (and to) Amazon S3 buckets.
Score big datasets from the DataRobot AI Catalog.
Mix and match intake and output options. For example, you could ingest data from a local file and save the results to an S3 bucket.
Receive results while still uploading the file to be scored.
Protect against Prediction Server overload with a configurable concurrency level.
Include Prediction Explanations (with customizable thresholds).
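To make the options above concrete, here is a minimal sketch of how a job request combining them might be assembled. It only builds a dictionary and sends nothing. The helper name build_batch_job_payload is hypothetical, and the field names (deploymentId, intakeSettings, outputSettings, numConcurrent, maxExplanations, thresholdHigh, thresholdLow) as well as the bucket URL and IDs are assumptions for illustration; check them against the Batch Prediction API reference before use.

```python
import json


def build_batch_job_payload(deployment_id, intake, output, num_concurrent=4,
                            max_explanations=0, threshold_high=None,
                            threshold_low=None):
    """Assemble a job description as a plain dict (hypothetical helper).

    Field names are assumptions modeled on the options listed above;
    verify them against the official API reference.
    """
    if num_concurrent < 1:
        raise ValueError("num_concurrent must be at least 1")

    payload = {
        "deploymentId": deployment_id,
        "intakeSettings": intake,         # where the data comes from
        "outputSettings": output,         # where the results go
        "numConcurrent": num_concurrent,  # caps load on the prediction server
    }

    # Only request Prediction Explanations when asked for.
    if max_explanations > 0:
        payload["maxExplanations"] = max_explanations
        if threshold_high is not None:
            payload["thresholdHigh"] = threshold_high
        if threshold_low is not None:
            payload["thresholdLow"] = threshold_low
    return payload


# Mix and match: local file in, S3 bucket out (placeholder values).
payload = build_batch_job_payload(
    deployment_id="YOUR_DEPLOYMENT_ID",
    intake={"type": "localFile"},
    output={
        "type": "s3",
        "url": "s3://example-bucket/scored.csv",
        "credentialId": "YOUR_CREDENTIAL_ID",
    },
    num_concurrent=4,
    max_explanations=3,
    threshold_high=0.8,
    threshold_low=0.2,
)
print(json.dumps(payload, indent=2))
```

Keeping the intake and output settings as independent dictionaries mirrors the mix-and-match idea: either side can be swapped without touching the other.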
The Batch Prediction API can intake (ingest) data from the following:
The Batch Prediction API can output (save) data to the following:
When scoring huge datasets, the Batch Prediction API automatically splits the data into chunks and scores them concurrently on the prediction instance specified by the deployment. The level of concurrency is a parameter you can set directly, so you can tune it to your use case.
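The chunk-and-score idea can be illustrated with a small, generic sketch. This is not DataRobot's implementation: score_chunk is a hypothetical stand-in for a request to a prediction server, and the chunk size and concurrency values are arbitrary. It only shows how a concurrency limit bounds the number of chunks in flight while the results keep their original row order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def score_chunk(chunk):
    """Hypothetical scorer standing in for a prediction server call."""
    return [{"row_id": row["row_id"], "prediction": row["x"] * 2}
            for row in chunk]


def score_in_chunks(rows, chunk_size, num_concurrent):
    """Score chunks with at most num_concurrent workers at a time."""
    with ThreadPoolExecutor(max_workers=num_concurrent) as pool:
        # pool.map returns results in submission order, so the output
        # rows line up with the input rows.
        chunk_results = pool.map(score_chunk, iter_chunks(rows, chunk_size))
        return [pred for chunk in chunk_results for pred in chunk]


rows = [{"row_id": i, "x": i} for i in range(10)]
scored = score_in_chunks(rows, chunk_size=3, num_concurrent=2)
print(len(scored), "rows scored")
```

Raising num_concurrent increases throughput only as long as the prediction instance can absorb the extra parallel requests, which is why the setting is left to the user.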
The Batch Prediction API works with all of the model monitoring capabilities you are used to having when working with DataRobot:
If Data Drift is enabled in your deployment, any predictions passed through the Batch Prediction API are tracked as usual.
If Accuracy tracking is enabled, the output includes the Association ID, which you can use later to report actuals.
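As a rough illustration of how the Association ID ties scored rows to later outcomes, the sketch below matches a scored output file against known actuals. Everything here is an assumption for demonstration purposes: the column name transaction_id, the sample data, the helper build_actuals, and the association_id / actual_value record keys. It prepares records only and does not submit anything to DataRobot.

```python
import csv
import io

# Hypothetical scored output; "transaction_id" plays the role of the
# Association ID column configured on the deployment.
scored_csv = io.StringIO(
    "transaction_id,prediction\n"
    "A1,0.91\n"
    "A2,0.12\n"
    "A3,0.55\n"
)

# Outcomes that became known after scoring (A2 is still unknown).
actual_outcomes = {"A1": 1, "A3": 0}


def build_actuals(scored_file, outcomes, association_id_column):
    """Pair each scored row with its known outcome via the Association ID.

    Rows without a known outcome yet are skipped. The record keys are
    assumptions; adapt them to whatever your reporting step expects.
    """
    records = []
    for row in csv.DictReader(scored_file):
        association_id = row[association_id_column]
        if association_id in outcomes:
            records.append({
                "association_id": association_id,
                "actual_value": outcomes[association_id],
            })
    return records


actuals = build_actuals(scored_csv, actual_outcomes, "transaction_id")
print(actuals)
```

Because the join relies only on the Association ID, actuals can be reported in batches as outcomes arrive, independently of when the rows were scored.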
Consistent scoring with updated model
If you replace a deployed model after a job has been queued, DataRobot still uses the model that was deployed at the time of job creation. This ensures that every row is scored using the same model, so the results stay consistent.
Documentation for the latest DataRobot Python client is available here.
If you’re a licensed DataRobot customer:
Search the in-app Platform Documentation for Batch Prediction API, Intake Options, and Output Options.
Search the in-app API Documentation for Batch Predictions (within the API Reference).