dr.BatchPredictionJob.score_to_file Error

DataRobot Team,

Method in question: dr.BatchPredictionJob.score_to_file

Scenarios:

A)

Two scoring jobs are submitted against two different deployment IDs.

The first job goes through, but the second job takes about two minutes and then comes back with the message: "Timed out waiting for download to become available for job ID [JobId]. Other jobs may be occupying the queue. Consider raising the timeout."

When I try to look the job up with dr.Job.get([ProjectId], [JobId]), I get back a 404 client error: {'message': 'Could not find job [JobId] in project [ProjectId]'}

Based on the documentation, the second job should be queued, but I'm not seeing that happen either.
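
For reference, the call pattern is roughly the following (IDs and paths are placeholders, and download_timeout is my guess at the setting the error message wants raised):

import datarobot as dr

dr.Client(token="YOUR_API_TOKEN", endpoint="https://app.datarobot.com/api/v2")

# One of the two calls; the same pattern is used for each deployment ID.
# download_timeout appears to default to about two minutes, which matches
# the wait before the "Timed out waiting for download..." message.
dr.BatchPredictionJob.score_to_file(
    "DEPLOYMENT_ID",
    "input.csv",
    "scored_output.csv",
    download_timeout=600,  # wait up to 10 minutes for results to become downloadable
)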

B)

When the job errors on the client side but the server side (DataRobot) doesn't show any error and continues to score, how can I retrieve the job again? The call to score_to_file is blocking, and when it errors it doesn't return a job ID. I can see that the job is still running on DataRobot but can't get a handle to it. Am I out of luck in this case?

2 Replies

Hi @chhay - in a browser you're logged into, or via the API, you can check an endpoint that lists your recent batch prediction jobs. I believe they are held for 48 hours at this location: https://app.datarobot.com/api/v2/batchPredictions/. With a job ID you can get the metadata for that specific job at https://app.datarobot.com/api/v2/batchPredictions/1234567890/, replacing the ID with your own. These are the raw endpoints, so they also require a header passing your API token; I'm not sure whether the Python SDK wraps these particular endpoints at the moment.
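
A minimal sketch of hitting those endpoints directly with requests; the Bearer header format and the shape of the list response are assumptions on my part, so adjust to whatever your install returns:

import requests

API_TOKEN = "YOUR_API_TOKEN"
BASE = "https://app.datarobot.com/api/v2"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

# List recent batch prediction jobs (held for roughly 48 hours).
jobs = requests.get(f"{BASE}/batchPredictions/", headers=headers).json()
for job in jobs.get("data", []):
    print(job.get("id"), job.get("status"))

# Pull the metadata for one specific batch prediction job ID.
job_id = "1234567890"
meta = requests.get(f"{BASE}/batchPredictions/{job_id}/", headers=headers).json()
print(meta)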

It should be noted that dr.Job.get operates on a different kind of job and queue: those are project-level jobs that go through the project queue, such as training a model on the Leaderboard or calculating Feature Impact for a model.

Your scoring, however, works at the level of a deployment, where a job refers to a batch prediction job and the batch prediction queue (typically one job runs at a time in a FIFO queue, and all prediction resources are dedicated to scoring that job while it runs).
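
In SDK terms the two ID spaces look roughly like this (dr.BatchPredictionJob.get taking just the batch prediction job ID is my understanding of recent client versions, so check yours):

import datarobot as dr

# Project-level queue: model training, Feature Impact, and similar jobs.
project_job = dr.Job.get("PROJECT_ID", "PROJECT_JOB_ID")

# Deployment-level batch prediction queue: a separate ID space entirely,
# which is why dr.Job.get returns a 404 for a batch prediction job ID.
batch_job = dr.BatchPredictionJob.get("BATCH_PREDICTION_JOB_ID")
print(batch_job)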

For large files the connection is more sensitive to upload/download hiccups; if it's an option, object storage (such as AWS S3) might be worth considering. You can retrieve info about previous jobs as noted above, and for a file-to-file scoring job the metadata will contain a download link where the scored results can be retrieved. That link is valid for 48 hours, after which the result is purged.
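
If the job has already finished, something along these lines should fetch the scored file from that download link; the exact key holding the URL in the metadata is an assumption on my part, so inspect the response to confirm:

import requests

API_TOKEN = "YOUR_API_TOKEN"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

job_id = "1234567890"
meta = requests.get(
    f"https://app.datarobot.com/api/v2/batchPredictions/{job_id}/", headers=headers
).json()

# Assumption: completed file-output jobs expose a download URL in the metadata links.
download_url = meta["links"]["download"]
with requests.get(download_url, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    with open("scored_results.csv", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)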

I also wanted to note that you likely want a passthrough column so you can easily join your scored data back to the original dataset; consider adding this within the job details:

'passthrough_columns': ['CUSTOMER_ID'],
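
If you're going through the Python client rather than a raw job details payload, I believe the same thing goes in as a keyword argument, roughly:

import datarobot as dr

dr.BatchPredictionJob.score_to_file(
    "DEPLOYMENT_ID",
    "input.csv",
    "scored_output.csv",
    passthrough_columns=["CUSTOMER_ID"],  # copied into the output so you can join back to the source data
)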

@doyouevendata,

Thanks for the suggestion. I've since switched to using the .score method and it behaves as expected.
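
A .score call along these lines is the kind of thing that works here (deployment ID, file paths, and the passthrough column are placeholders):

import datarobot as dr

job = dr.BatchPredictionJob.score(
    "DEPLOYMENT_ID",
    intake_settings={"type": "localFile", "file": "input.csv"},
    output_settings={"type": "localFile", "path": "scored_output.csv"},
    passthrough_columns=["CUSTOMER_ID"],
)

# Unlike score_to_file, this hands back a job object up front, so even if the
# download step hiccups there is still job.id to come back to later.
job.wait_for_completion()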
