I'm trying to deploy a model that I've chosen from the model registry in DataRobot and I'm encountering a few issues that I've tried troubleshooting with the DataRobot docs as well as the resources available in the community. Some of the issues I'm having are:
Range in predictions tab:
Making predictions in deployment tab:
Sorry for the long post. Thanks for your time
Hey @Shai, I think I can answer these for you.
Association ID: There is no practical need to upload the association ID with your training data. The primary purpose of the association ID is to track model performance over time on new data. In production, you will feed your model new rows to predict every so often, but at prediction time you won't have the actual results yet. As actual results come in, you'll want to map them back onto the predictions you made earlier, and the association ID is the key for that mapping. Because the training data already has the outcome/target column, there's nothing to join to it later.
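To make the mapping concrete, here is a minimal sketch in plain Python of pairing late-arriving actuals with earlier predictions by association ID. The IDs, values, and the `join_actuals` helper are all made up for illustration; this is not how DataRobot does it internally, just the idea.

```python
# Hypothetical prediction log: association ID -> predicted value.
predictions = {"order-001": 0.82, "order-002": 0.17, "order-003": 0.64}

# Actual outcomes that arrive later, keyed by the same association ID.
# "order-002" has no outcome yet, so it cannot be scored for accuracy.
actuals = {"order-001": 1, "order-003": 0}


def join_actuals(predictions, actuals):
    """Pair each prediction with its actual, skipping IDs with no actual yet."""
    return [
        (assoc_id, predicted, actuals[assoc_id])
        for assoc_id, predicted in predictions.items()
        if assoc_id in actuals
    ]


matched = join_actuals(predictions, actuals)
for assoc_id, predicted, actual in matched:
    print(assoc_id, predicted, actual)
```

Only rows whose ID appears on both sides contribute to accuracy, which is why the ID has to be present at prediction time.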
As for the time series example, I'd be worried about a situation where you make multiple predictions each day, possibly one for series A and one for series B, or maybe you need to make predictions every few hours (for example). If you're confident that the date is a unique identifier, then I don't think it's a problem to use it.
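If you want to check that before committing to the date, a quick sketch like the one below can help. The rows, the column names (`date`, `series`, `association_id`), and the composite-ID idea are assumptions for illustration, not something DataRobot requires.

```python
from collections import Counter

# Hypothetical scoring rows: two series share the same date.
rows = [
    {"date": "2021-03-01", "series": "A"},
    {"date": "2021-03-01", "series": "B"},
    {"date": "2021-03-02", "series": "A"},
]


def duplicated_values(rows, key):
    """Return the sorted values of `key` that occur in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# The date alone is not unique here.
dupes = duplicated_values(rows, "date")

# One possible workaround: combine series and date into a single ID.
for row in rows:
    row["association_id"] = f"{row['series']}_{row['date']}"

composite_dupes = duplicated_values(rows, "association_id")
print(dupes, composite_dupes)
```

If the first check comes back empty on your real data (and you expect it to stay that way), the date alone should be fine.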
Range in predictions tab: You are correct. You cannot look at a granularity (like month) if you don't yet have a month of observations. You can adjust the slider to view different time periods, anything between deployment date and current date.
Making predictions in deployment tab: When you navigate to the Predictions > Make Predictions subtab of a deployment, you can drop in a prediction dataset (e.g., a CSV) right there to be scored. This can be, and typically will be, a different dataset from the one you dropped in on the Models tab (i.e., the model Leaderboard). It sounds like your question might be specific to time series, in which case your guess is correct: input a scoring dataset with enough filled-in rows of data for the time series model to "look back," plus the number of empty rows (with dates) you want to predict forward.
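For the time series case, the shape of that scoring file can be sketched as follows. This assumes daily data and hypothetical column names (`date`, `sales`); how many history rows you actually need depends on your model's settings, so the 5 below is just a placeholder.

```python
import csv
import io
from datetime import date, timedelta


def build_scoring_rows(history, forecast_steps):
    """history: list of (date, value) pairs in ascending date order.

    Returns the history rows followed by `forecast_steps` rows that have a
    date but an empty target, which are the rows to be predicted.
    """
    rows = [{"date": d.isoformat(), "sales": v} for d, v in history]
    last_date = history[-1][0]
    for step in range(1, forecast_steps + 1):
        future = last_date + timedelta(days=step)
        rows.append({"date": future.isoformat(), "sales": ""})
    return rows


# Five days of known history, then two empty rows to forecast.
history = [(date(2021, 3, 1) + timedelta(days=i), 100 + i) for i in range(5)]
scoring_rows = build_scoring_rows(history, forecast_steps=2)

# Write the rows as CSV text (in memory here; use a file in practice).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "sales"])
writer.writeheader()
writer.writerows(scoring_rows)
print(buffer.getvalue())
```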
Generally, the deployment itself doesn't need new scoring data to "run." The deployment is active, waiting for that new scoring data. But yes, to monitor drift and health, you would need to send it scoring data to predict. To monitor accuracy, you would have to separately submit actuals to the deployment. That requires an association ID field whose column name and values match the ones in the scoring datasets you already sent for prediction.
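Since the IDs have to match exactly, it can be worth checking your actuals against the IDs you scored before submitting them. Here is a small sketch; the IDs, the `association_id` / `actual_value` keys, and the `split_by_match` helper are assumptions for illustration, and the actual upload step is left out.

```python
# Hypothetical association IDs that were sent with earlier scoring requests.
scored_ids = {"A_2021-03-01", "B_2021-03-01", "A_2021-03-02"}

# Hypothetical actuals collected later.
actuals = [
    {"association_id": "A_2021-03-01", "actual_value": 120.0},
    {"association_id": "B_2021-03-01", "actual_value": 87.5},
    {"association_id": "a_2021-03-02", "actual_value": 131.0},  # wrong case
]


def split_by_match(actuals, scored_ids):
    """Separate actuals whose ID was scored from those with no matching ID."""
    matched = [r for r in actuals if r["association_id"] in scored_ids]
    unmatched = [
        r["association_id"]
        for r in actuals
        if r["association_id"] not in scored_ids
    ]
    return matched, unmatched


matched, unmatched = split_by_match(actuals, scored_ids)
print(len(matched), unmatched)
```

Anything that lands in `unmatched` (here a lowercase typo) would not line up with a prediction, so it is easier to catch it before upload.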
Let me know if there's anything I can clarify in there!
@matthias_kullowatz, thank you so so much for this, I really appreciate it.
Everything is clear to me from your response, but if I get stuck with anything I'll just post another question.
Again, thanks for the reply. It has cleared up the confusion I had.
Shai, here's an article on a few ways to upload actuals once you have them - as well as some considerations when choosing an Association ID value. https://community.datarobot.com/t5/resources/measuring-prediction-accuracy-uploading-actual-results/...