I am using the DataRobot Python SDK to programmatically train and test models on new data to get candidates for scheduled model replacement.
It's been running fine for a while now, but I recently received this error:
datarobot.errors.AsyncProcessUnsuccessfulError: The job did not complete successfully. Job Data: {'status': 'ERROR', 'code': 1, 'description': '', 'created': '2021-05-22T12:01:34.000901Z', 'message': 'Worker loss detected', 'statusType': '', 'statusId': '05853a89-5f2e-416a-8f2c-6b718e74591e'}
It would be great if anyone could explain what this error means, why it happens, and how to avoid it in the future.
Thanks.
@azamrik Happy to hear you have the job running successfully now!
Also, I've been told a spot instance loss can cause this issue even though you're using DataRobot as a managed service.
Hope this helps
Linda
I'm using DataRobot as a managed service, I'm not hosting it on my own cloud. Could the error still be caused by a spot instance loss?
Also, yes, the job ran successfully when I resubmitted it.
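Since resubmitting the job succeeded, a scheduled script could catch this failure and resubmit automatically. The following is a minimal sketch, assuming that retrying is acceptable for your pipeline. `run_with_retry` and `is_worker_loss` are hypothetical helpers, not part of the DataRobot SDK. Matching on the text 'Worker loss detected' is a heuristic based on the error message in the traceback above. This only works around a lost worker; it does not prevent one.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def is_worker_loss(exc: BaseException) -> bool:
    """Heuristic (assumption): treat the error as transient if its text
    contains the 'Worker loss detected' message seen in the job data."""
    return "Worker loss detected" in str(exc)


def run_with_retry(
    job: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    should_retry: Callable[[BaseException], bool] = is_worker_loss,
    max_attempts: int = 3,
    delay_seconds: float = 60.0,
) -> T:
    """Call `job`, resubmitting it when it fails with a transient error.

    `job` (hypothetical, your own code) should submit the work and block
    until it finishes or raises. Only exceptions that are instances of
    `retry_on` AND pass `should_retry` trigger another attempt; any other
    error, or the last failed attempt, is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except retry_on as exc:
            if attempt == max_attempts or not should_retry(exc):
                raise
            # Wait a bit longer after each failure before resubmitting.
            time.sleep(delay_seconds * attempt)
    raise AssertionError("unreachable: the loop always returns or raises")
```

With the SDK, you would pass `retry_on=(AsyncProcessUnsuccessfulError,)`, importing that class from `datarobot.errors` (the exception named in the traceback), and a `job` that wraps your own train-and-score step. Re-raising everything else keeps genuine failures, such as bad data, from being silently retried.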
Hi @azamrik - Sorry you haven't received an answer yet.
Here's what I've got for you:
The message 'Worker loss detected' usually indicates that the job was running on a spot instance that was killed.
Can you please try re-running the job and let me know if that succeeds?
Will look for your answer and try to keep you moving ahead.
Linda
Hi @azamrik, I'm hoping someone from the community can help you soon! Sorry you're having issues.