Using the Parameterized Batch Scoring Command Line Scripts


DataRobot's in-app Platform Documentation describes Command Line Interface (CLI) scripts that create batch prediction scoring jobs for local files. These scripts, found by searching the documentation for Batch Predictions Scripts, are available in both Python (generally for Linux and macOS) and PowerShell (for Windows).

(Note that the deprecated script “Batch Scoring Script” may show up in search results as well. Make sure you ignore that topic and instead select “Batch Predictions Scripts.” If you are a DataRobot Managed AI Cloud user, you can view this direct link to access the documentation for Batch Predictions Scripts.)

When viewing that page of the in-app Platform Documentation, pay attention to the important instructions at the top of the page. As explained there, when using the Python file you need to set the executable flag (for example, with chmod +x). Then, in either environment, the script can be placed in a directory on the environment path so that it can be run from any location. The scripts themselves are downloadable from the documentation page.
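
As a rough illustration of the executable-flag step, the sketch below does the equivalent of chmod +x from Python. The helper name make_executable is hypothetical, and the temporary file only stands in for the downloaded script; it assumes a Linux or macOS file system.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Add execute permission for user, group, and others (like `chmod +x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.stat(path).st_mode


# A throwaway file stands in for the downloaded batch_prediction.py.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "batch_prediction.py")
    with open(script, "w") as fh:
        fh.write("#!/usr/bin/env python\n")
    new_mode = make_executable(script)
    # True once the user-execute bit is set.
    is_executable = bool(new_mode & stat.S_IXUSR)
```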

You can use these scripts instead of the Batch code sample that DataRobot provides on the Integrations tab. For a specific deployment, select Predictions -> Prediction API -> Batch (toggle); this provides an example batch scoring script as a local file.

All necessary values (Deployment ID, API Token, etc.) for the example script are hardcoded into the file. This is in contrast to the downloadable CLI scripts (mentioned above), which are generic and expect these same values as command line inputs. When dealing with a large number of deployments or with automated tools, the parameterized CLI version is generally preferred because a single common script is easier to maintain and use. The deployment code (from the Integrations tab) is still a great reference for sourcing the values needed to construct the call from the command line.
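
To illustrate why a single parameterized script helps with many deployments, here is a minimal sketch that builds one command line per deployment. The helper build_command, the deployment IDs, the file names, and the DATAROBOT_API_TOKEN environment variable are all hypothetical; only the positional arguments and the --api_token and --keep_cols flags come from the examples in this article.

```python
import os

SCRIPT = "./batch_prediction.py"  # assumed location of the downloaded script


def build_command(input_csv, output_csv, deployment_id, api_token, **options):
    """Return the argument list for one batch scoring run.

    Extra options become --name=value flags; a value of True yields a bare flag.
    """
    cmd = [SCRIPT, input_csv, output_csv, deployment_id,
           f"--api_token={api_token}"]
    for name, value in options.items():
        cmd.append(f"--{name}" if value is True else f"--{name}={value}")
    return cmd


# Hypothetical deployments: deployment ID -> (input file, output file).
jobs = {
    "5becb1bc1234567890": ("input_a.csv", "output_a.csv"),
    "5becb1bc0987654321": ("input_b.csv", "output_b.csv"),
}

# Hypothetical environment variable, with a placeholder fallback token.
token = os.environ.get("DATAROBOT_API_TOKEN", "abcd1234efgh")

commands = [
    build_command(inp, out, dep_id, token, keep_cols="PassengerId")
    for dep_id, (inp, out) in jobs.items()
]

for cmd in commands:
    print(" ".join(cmd))
```

Each list could then be passed to subprocess.run(cmd, check=True) to launch the job; the sketch only prints the commands so that nothing is executed by accident.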

The command line utility requires at least three positional parameters (input file, output file, and deployment ID) and an API token; the default values for all other parameters are generally appropriate.

python batch_prediction.py <input-file.csv> <output-file.csv> <deployment_id>

For example:

./batch_prediction.py input.csv output.csv 5becb1bc1234567890 --api_token=abcd1234efgh

The following example specifies the three parameters and API token explained above, plus the following:

  • a surrogate ID column to keep (--keep_cols)
  • the DataRobot application server (--host)
  • an instruction not to verify the SSL certificate (--no_verify_ssl); note that skipping verification is not a recommended practice when specifying a verifiable host
  • a maximum of three prediction explanations (--max_prediction_explanations)

./batch_prediction.py input.csv output.csv 5becb1bc1234567890 --api_token=abcd1234efgh --keep_cols=PassengerId --host="https://app.datarobot.com" --no_verify_ssl --max_prediction_explanations=3

Although the data is returned in the order it was provided, it is a best practice to specify one or more columns with the keep_cols parameter so that they can serve as a join key back to the original dataset.
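
The join back to the original dataset might look like the sketch below, which uses only the Python standard library. The helper join_on_key and the sample data are hypothetical, and the prediction column name is illustrative; check the actual output file for the column names the script writes.

```python
import csv
import io


def join_on_key(original_csv, scored_csv, key):
    """Merge scored rows onto original rows by matching the key column."""
    scored = {row[key]: row for row in csv.DictReader(scored_csv)}
    merged = []
    for row in csv.DictReader(original_csv):
        match = scored.get(row[key], {})
        # Original columns win on name clashes; rows without a match
        # keep only their original columns.
        merged.append({**match, **row})
    return merged


# In-memory stand-ins for input.csv and output.csv.
original = io.StringIO("PassengerId,Age\n1,22\n2,38\n")
# The scored rows are deliberately in a different order to show that
# the join relies on the key rather than on row position.
scored = io.StringIO("PassengerId,prediction\n2,0.91\n1,0.12\n")

rows = join_on_key(original, scored, "PassengerId")
for row in rows:
    print(row)
```

Joining on an explicit key rather than on row position keeps the result correct even if either file is later sorted or filtered.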

More Information

If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Batch Predictions Scripts.
