Model Deployment - TS - DataRobot Community

Shai · 06-24-2021

Hi

I have quite a few questions on this aspect of DR, which may seem trivial. I have done the DR University courses, looked through the docs, and previously posted a question (which was well answered), but I feel there are still a few things I'm missing in my understanding.

Predictions in the Predict tab vs the deployment tab

For the same model and the same input file, the predictions DR outputs in the Predict tab differ from those it outputs through the deployment. Is there a reason for this other than human error? Otherwise, I will reinspect the files I used.

Technical requirements for data drift and accuracy

I see that the model must make at least 100 predictions before both of these are enabled. My issue is that, because I'm doing monthly predictions, I'd need the model to predict 100 months into the future, unless I reformat the input file so that the rows with empty target cells in the Excel file carry earlier timestamps, reducing how far into the future it predicts. For example, instead of predicting from July 2021 to July 2021 + 99 months, I would predict from, say, July 2018 to July 2018 + 99 months. Are there any problems this may cause? (I think this is the biggest issue I have, and I suspect I'm missing some understanding here.)
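A small Python sketch of the window arithmetic above, assuming monthly timestamps on the first of each month and assuming, hypothetically, that June 2021 is the last month with a known actual. It only counts months and does not call any DataRobot API; the 100-row threshold and the two start months come from the post.

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    """Return the first day of the month that is n months after d's month."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)

def monthly_window(start: date, rows: int) -> list[date]:
    """First-of-month timestamps for `rows` consecutive months from `start`."""
    return [add_months(start, i) for i in range(rows)]

# Assumption (hypothetical): June 2021 is the last month with a known actual.
LAST_ACTUAL = date(2021, 6, 1)
MIN_PREDICTIONS = 100  # threshold quoted in the post

for start in (date(2021, 7, 1), date(2018, 7, 1)):
    window = monthly_window(start, MIN_PREDICTIONS)
    # Count rows whose timestamp lies after the last known actual.
    future = sum(1 for d in window if d > LAST_ACTUAL)
    print(f"{start:%b %Y} to {window[-1]:%b %Y}: "
          f"{future} of {len(window)} months after {LAST_ACTUAL:%b %Y}")
```

Under those assumptions, starting in July 2021 the window runs to October 2029 and all 100 rows are future months; starting in July 2018 it runs to October 2026, and 36 of the 100 rows fall in months whose actuals already exist.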

Further, I tried both of these methods, and each time I uploaded a dataset the total number of predictions only increased by 12 (I'm assuming this has something to do with the 12-month forecast window (FW) I set when setting up the model at the beginning). I don't understand how the two are linked: doesn't the count depend only on the file I'm feeding DR, which essentially "forces" it to make predictions for the empty target cells in the Excel file that correspond to the rows filled with timestamps?
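If each upload adds exactly one forecast window's worth of predictions (the observation above, not something confirmed by DataRobot documentation here), the number of uploads needed to reach the threshold is a ceiling division. A minimal Python sketch under that assumption:

```python
import math

FORECAST_WINDOW = 12   # months per forecast, as set when the model was set up
MIN_PREDICTIONS = 100  # threshold quoted for drift and accuracy

def uploads_needed(min_predictions: int, rows_per_upload: int) -> int:
    """Smallest number of uploads whose combined prediction count reaches the threshold."""
    return math.ceil(min_predictions / rows_per_upload)

n = uploads_needed(MIN_PREDICTIONS, FORECAST_WINDOW)
print(n, n * FORECAST_WINDOW)  # 9 108
```

With a 12-month window that comes to 9 uploads (108 predictions), since 8 uploads give only 96.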

Also, even when I uploaded the same Excel file (asking DR to predict the same 12 months), the total number of predictions still went up. I don't see how repeating the same output values helps enable drift and accuracy, since I could just upload the same dataset until I reach 100 total predictions. Or does DR not recognise this, and simply compute drift and accuracy from whatever input files it receives, producing unreliable results?
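To make the concern about repeated uploads concrete, here is a hypothetical Python sketch contrasting a raw prediction count with the number of distinct forecast months actually covered. Whether DataRobot counts or deduplicates repeated rows this way is an assumption for illustration only; the post reports just that the total went up.

```python
from datetime import date

def month_range(start: date, months: int) -> list[date]:
    """First-of-month dates for `months` consecutive months starting at `start`."""
    out = []
    y, m = start.year, start.month
    for _ in range(months):
        out.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out

# Hypothetical: the same 12-month scoring file uploaded nine times.
same_file = month_range(date(2021, 7, 1), 12)
uploads = [same_file] * 9

raw_count = sum(len(u) for u in uploads)          # what a simple running total reports
distinct = len({d for u in uploads for d in u})   # distinct forecast months covered
print(raw_count, distinct)  # 108 12
```

Nine uploads of the same file would clear a raw threshold of 100 while still covering only 12 distinct months.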

General process for deployment

Once you have met the technical requirements for drift, you get a drift value based on the data you have given the model to predict (the scoring data?), and an accuracy value based on the actuals you upload. Now, depending on the timeline I created implicitly through the timestamps I am predicting for, I will have to wait X months until I "catch up" to the latest predicted month and can upload the corresponding actual to update the accuracy. (I'm aware I can do this one month at a time as each new actual becomes available, but I think the question remains the same.) What does the model do in the meantime?
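The "X months" wait can be estimated with simple month arithmetic. This Python sketch assumes, hypothetically, that a month's actual becomes available once the following month begins, and uses an illustrative 12-month forecast scored in July 2021; neither detail comes from DataRobot's documentation.

```python
from datetime import date

def month_index(d: date) -> int:
    """Months since year 0, so differences between dates are whole months."""
    return d.year * 12 + (d.month - 1)

def months_until_actual(today: date, last_forecast_month: date) -> int:
    """Months to wait until the actual for `last_forecast_month` exists,
    assuming a month's actual is known once the following month starts."""
    return max(0, month_index(last_forecast_month) + 1 - month_index(today))

# Hypothetical example: a 12-month forecast scored in July 2021
# covers July 2021 to June 2022.
print(months_until_actual(date(2021, 7, 1), date(2022, 6, 1)))  # 12
```

Once the current month is past the last forecast month, the function returns 0, meaning every actual for that forecast could already be uploaded.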

I guess this sounds like a weird or trivial question, but to elaborate: I've noticed the service health status becomes "unknown" after a couple of days without receiving predictions or actuals. Why would it not just retain the status from its most recent job?

There are some other questions I have, but I'm struggling to articulate them, so I think these are a good start for a dialogue with experts on this. Regardless of whether the questions get answered, I think discussing these issues with someone else will help shed some light on them.

Sorry for the long post, and thank you to everyone who took the time to read this far 🙂
