Worker Loss Error


I am using the DataRobot Python SDK to programmatically train and test models on new data, to get candidates for scheduled model replacement.

 

It's been running fine for a while now, but I recently received this error:

datarobot.errors.AsyncProcessUnsuccessfulError: The job did not complete successfully. Job Data: {'status': 'ERROR', 'code': 1, 'description': '', 'created': '2021-05-22T12:01:34.000901Z', 'message': 'Worker loss detected', 'statusType': '', 'statusId': '05853a89-5f2e-416a-8f2c-6b718e74591e'}

 

It would be great if anyone could explain what this error means, why it happens, and how to avoid it in the future.

 

Thanks.


Accepted Solutions
Linda
DataRobot Alumni

Hi @azamrik - Sorry you haven't received an answer yet.

 

Here's what I've got for you:

'message': 'Worker loss detected' usually indicates that the job was running on a spot instance that was killed before the job finished.

 

Can you please try re-running the job and let me know if that succeeds?

Will look for your answer and try to keep you moving ahead.

Linda

4 Replies
Linda
DataRobot Alumni

Hi @azamrik - I'm hoping someone from the community can help you soon! Sorry you're having issues.


I'm using DataRobot as a managed service; I'm not hosting it on my own cloud. Could the error still be caused by a spot instance loss?

 

Also, yes, the job ran successfully when I resubmitted it.

@azamrik Happy to hear you have the job running successfully now!

Also, I've been told a spot loss can be the cause of the issue even though you're using DataRobot as a managed service.
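
Since resubmitting worked for you, one way to make a scheduled script more tolerant of this is to resubmit automatically when the error text mentions worker loss. Below is a minimal sketch, not an official DataRobot recipe. The helper name retry_on_worker_loss, its parameters, and train_and_wait are all hypothetical and not part of the SDK. It assumes the exception text contains 'Worker loss detected', as in the message you posted, and that the function you pass in submits a fresh job each time rather than waiting again on the failed one.

```python
import time


def retry_on_worker_loss(run_job, max_attempts=3, delay_seconds=60.0,
                         marker="Worker loss detected", sleep=time.sleep):
    """Run run_job(), resubmitting only when the error text contains marker.

    All names here are hypothetical helpers, not part of the DataRobot SDK.
    run_job must submit a fresh job and wait for it on every call.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job()
        except Exception as exc:
            # Assumption: the marker appears in str(exc), as in the
            # AsyncProcessUnsuccessfulError message quoted in this thread.
            is_worker_loss = marker in str(exc)
            if not is_worker_loss or attempt == max_attempts:
                raise  # unrelated error, or out of attempts
            sleep(delay_seconds)  # brief pause before resubmitting


# Usage (train_and_wait is a placeholder for your own submit-and-wait code):
# model = retry_on_worker_loss(lambda: train_and_wait(project, blueprint))
```

Any other error is re-raised immediately, so real problems with the data or the project are not hidden by the retry loop.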

Hope this helps

Linda
