Batch Prediction API

This article introduces the Batch Prediction API, which you can use to score large datasets with deployed DataRobot models. The Python notebook available at the DataRobot Community GitHub provides instructions for getting predictions from a deployed DataRobot model.

Basics

The Batch Prediction API allows you to score huge datasets by taking advantage of the prediction servers you have already deployed. The API is available through the DataRobot Public API and can be consumed by REST-enabled clients or the DataRobot Python Public API bindings. (View the full documentation for the DataRobot Python client package here.)

Advantages

There are multiple advantages to using the Batch Prediction API:

  • Score large datasets from (and to) Amazon S3 buckets.
  • Score big datasets from the DataRobot AI Catalog.
  • Mix and match intake and output options. For example, you could ingest data from a local file and save the results to an S3 bucket.
  • Receive results while still uploading the file to be scored.
  • Protect against Prediction Server overload with a concurrency control level option.
  • Include Prediction Explanations (with customizable thresholds).


Intake Options

The Batch Prediction API can intake (ingest) data from the following:

  • Local file
  • S3
  • AI Catalog
  • JDBC

Output Options

The Batch Prediction API can output (save) data to the following:

  • Local file
  • S3
  • JDBC
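
The intake and output options above can be combined freely in a single job. The sketch below builds the two settings dictionaries in the shape the DataRobot Python client's `BatchPredictionJob.score` is assumed to accept. The type strings (`localFile`, `s3`, `dataset`, `jdbc`), the key names, the file path, and the bucket URL are assumptions or placeholders for illustration, and `build_job_settings` is a hypothetical helper; check them against the client documentation for your version.

```python
# Sketch only: type strings and key names are assumptions, not confirmed by this article.
INTAKE_TYPES = {"localFile", "s3", "dataset", "jdbc"}  # "dataset" = AI Catalog (assumed)
OUTPUT_TYPES = {"localFile", "s3", "jdbc"}


def build_job_settings(intake, output):
    """Validate the type of each side and return copies of both settings dicts."""
    if intake.get("type") not in INTAKE_TYPES:
        raise ValueError(f"unsupported intake type: {intake.get('type')!r}")
    if output.get("type") not in OUTPUT_TYPES:
        raise ValueError(f"unsupported output type: {output.get('type')!r}")
    return dict(intake), dict(output)


# Mix and match: read a local CSV, write the results to an S3 bucket.
intake_settings, output_settings = build_job_settings(
    {"type": "localFile", "file": "to_score.csv"},  # hypothetical path
    {
        "type": "s3",
        "url": "s3://example-bucket/scored.csv",  # hypothetical bucket
        "credential_id": "<credential-id>",  # placeholder, not a real ID
    },
)


def submit(deployment_id):
    """Submit the job; needs the `datarobot` package and a configured client."""
    import datarobot as dr  # imported lazily so the sketch loads without it

    return dr.BatchPredictionJob.score(
        deployment_id,
        intake_settings=intake_settings,
        output_settings=output_settings,
    )
```

Calling `submit("<deployment-id>")` would start the job. The AI Catalog appears only in the intake set because the article lists it as an intake option but not as an output option.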

Concurrent Scoring

When scoring huge datasets, the Batch Prediction API automatically splits the data into chunks and scores them concurrently on the prediction instance specified by the deployment. The level of concurrency is a parameter that you can set directly, for example to avoid overloading the prediction server.
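
The chunk-and-score behavior described above can be pictured with a small local simulation. This is not DataRobot's implementation: `score_chunk` is a hypothetical stand-in for a request to the prediction server, and the chunk size and worker count are arbitrary values chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, chunk_size):
    """Yield consecutive slices of `rows`, each at most `chunk_size` long."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def score_chunk(chunk):
    """Stand-in for one scoring request (hypothetical model: doubles the input)."""
    return [2 * value for value in chunk]


def score_concurrently(rows, chunk_size, num_concurrent):
    """Score chunks with at most `num_concurrent` in flight, keeping row order."""
    with ThreadPoolExecutor(max_workers=num_concurrent) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        scored_chunks = pool.map(score_chunk, split_into_chunks(rows, chunk_size))
        return [prediction for chunk in scored_chunks for prediction in chunk]


predictions = score_concurrently(list(range(10)), chunk_size=3, num_concurrent=2)
print(predictions)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Lowering `num_concurrent` reduces the load on the server at the cost of a longer job. In the Python client, the corresponding setting is assumed to be the `num_concurrent` argument of `BatchPredictionJob.score`.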

Model monitoring

Predictions made through the Batch Prediction API work with all of the model monitoring capabilities you are used to in DataRobot.

  • If Data Drift is enabled in your deployment, any predictions passed through the Batch Prediction API will be tracked as usual.
  • If Accuracy tracking is enabled, the output will have the Association ID that can be used later on to report actuals.
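
To illustrate how the Association ID in the output can be used later, the sketch below pairs a batch prediction output with outcomes observed afterwards. The column names (`order_id`, `prediction`), the sample values, and the helper `pair_actuals` are hypothetical; the shape of the rows you report back should follow the DataRobot documentation for submitting actuals.

```python
import csv
import io

# Hypothetical batch prediction output; column names and values are made up.
PREDICTIONS_CSV = """order_id,prediction
A-1,0.91
A-2,0.12
A-3,0.55
"""

# Outcomes observed later, keyed by the same Association ID.
ACTUALS = {"A-1": 1, "A-3": 0}


def pair_actuals(predictions_csv, actuals, association_id_column):
    """Return one row per prediction whose actual outcome is already known."""
    paired = []
    for row in csv.DictReader(io.StringIO(predictions_csv)):
        key = row[association_id_column]
        if key in actuals:
            paired.append({
                "association_id": key,
                "prediction": float(row["prediction"]),
                "actual_value": actuals[key],
            })
    return paired


rows_to_report = pair_actuals(PREDICTIONS_CSV, ACTUALS, "order_id")
```

Rows without a known outcome yet (here `A-2`) are simply skipped and can be reported in a later run.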

Consistent scoring with updated model

If you replace a deployed model after a job has been queued, DataRobot still uses the model that was deployed at the time of job creation. This ensures that every row is scored using the same model, so the results stay consistent.
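
The pinning behavior can be pictured as a snapshot taken when the job is created. The classes below are a hypothetical illustration of that idea, not DataRobot's data model.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    model_id: str  # the model currently serving this deployment


@dataclass(frozen=True)
class QueuedJob:
    pinned_model_id: str  # captured once, when the job is created


def create_job(deployment):
    """Snapshot the deployment's current model at job creation time."""
    return QueuedJob(pinned_model_id=deployment.model_id)


deployment = Deployment(model_id="model-v1")
job = create_job(deployment)
deployment.model_id = "model-v2"  # model replaced while the job is still queued

print(job.pinned_model_id)  # model-v1
```

Because the job stores its own copy of the model ID, replacing the model on the deployment does not affect rows that the queued job has yet to score.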

More Information

Documentation for the latest DataRobot Python client is available here.

If you’re a licensed DataRobot customer:

  • Search the in-app Platform Documentation for Batch Prediction API, Intake Options, and Output Options.
  • Search the in-app API Documentation for Batch Predictions (within the API Reference).

Comments
Blue LED

@lhaviland I was wondering: where in the API docs can I find a reference to the file size limits for input/output and the timeout settings?

Data Scientist

Hey @chhay ,

Great question! If you have access to the platform documentation, which you can find in the top right-hand corner when you are logged into DataRobot, you can search for "Batch Prediction API" and you should get the full documentation. In the meantime, here are short answers to your questions:

- There is no file size limit for input or output.

- The maximum runtime (after which a job would time out) is 4 hours for AI Managed Cloud and unlimited for on-premise installations.

Hope this helps!

Blue LED

Hey @Thodoris ,

Can you provide a link, in case I am not looking in the right place? Here is where I have looked; I did not see limits referenced at all in the API docs (where I would have hoped to find them).

Batch Predictions API Link: https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.22.1/autodoc/api_reference.html?hig...

DataRobot App Docs: https://app.datarobot.com/docs/reference/large-preds-api.html?highlight=batchprediction

Thanks,

Community Team

Hi @chhay - I can answer on @Thodoris's behalf for this specific question. For his answer he reposted the information from this article (from the Limits table, above). Even though it's not in the product doc, it is accurate. Hope this helps.

Community Team

@chhay -- okay we were both looking at the wrong page in the API doc. Sorry about that. You can find the information in the in-app API doc here.

Blue LED

@lhaviland - that's the page I was looking for but couldn't find. Thanks!
