Worker Loss Error

azamrik
Image Sensor

I am using the DataRobot Python SDK to programmatically train and test models on new data to get candidates for scheduled model replacement.

 

It has been running fine for a while, but I recently received this error:

datarobot.errors.AsyncProcessUnsuccessfulError: The job did not complete successfully. Job Data: {'status': 'ERROR', 'code': 1, 'description': '', 'created': '2021-05-22T12:01:34.000901Z', 'message': 'Worker loss detected', 'statusType': '', 'statusId': '05853a89-5f2e-416a-8f2c-6b718e74591e'}

 

It would be great if anyone could explain what this error means, why it happens, and how to avoid it in the future.

 

Thanks.

4 Replies
Linda
DataRobot Alumni

Hi @azamrik, I'm hoping someone from the community can help you soon! Sorry you're having issues.

Linda
DataRobot Alumni

Hi @azamrik - Sorry you haven't received an answer yet.

 

Here's what I've got for you:

'message': 'Worker loss detected'

usually indicates that the job was running on a spot instance that was killed. 

 

Can you please try re-running the job and let me know if that succeeds?

Will look for your answer and try to keep you moving ahead.

Linda


I'm using DataRobot as a managed service; I'm not hosting it on my own cloud. Could the error still be caused by a spot instance loss?

 

Also, yes, the job ran successfully when I resubmitted it.

Linda
DataRobot Alumni

@azamrik Happy to hear you have the job running successfully now!

Also, I've been told that a spot instance loss can cause this error even when you're using DataRobot as a managed service.
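
Since resubmitting the job worked, one option for a scheduled script is to retry automatically when this specific failure shows up. Below is a minimal sketch, not an official DataRobot recipe. It assumes the error text contains the 'Worker loss detected' message, as in the traceback quoted above. The names retry_on_worker_loss and run_job are hypothetical: run_job stands for whatever function in your script submits the job and waits for it to finish.

```python
import time


def retry_on_worker_loss(run_job, max_attempts=3, wait_seconds=60, sleep=time.sleep):
    """Call run_job() and retry only when the failure mentions worker loss.

    run_job is a hypothetical zero-argument callable that submits the job
    and blocks until it finishes, returning its result.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job()
        except Exception as exc:
            # Assumption: the exception text includes the job's 'message'
            # field, as in the error quoted in this thread.
            transient = "Worker loss detected" in str(exc)
            # Re-raise anything else, and give up after the last attempt.
            if not transient or attempt == max_attempts:
                raise
            # Wait before resubmitting so a replacement worker can come up.
            sleep(wait_seconds)
```

In a real script you could narrow the except clause to datarobot.errors.AsyncProcessUnsuccessfulError, the exception class shown in the traceback, so that unrelated errors are never inspected. The attempt count and wait time here are arbitrary starting points.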

Hope this helps

Linda
