Question for production/deployment

Abdalmahsan Al Firm · ‎05-15-2022

Dear DataRobot team,

We are working on an advanced analytics project in which we develop 20+ cross-sell models.

Given the objectives of the project, we have written scripts connecting to DataRobot API which will trigger every month to:

1. train models on monthly data,
2. test the models on a holdout set (the latest month) and get performance estimates (ROC, uplift, etc.),
3. finalise the best model,
4. retrain it on the complete monthly data (including the latest month),
5. generate predictions on the scoring set.

This implies (as per this discussion on the DataRobot community) that we will need to set up new projects and train new models every month.
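For readers following the thread, the monthly steps listed above can be sketched roughly as follows. This is only an illustration in plain Python and does not call the DataRobot API: `split_by_latest_month`, `run_monthly_cycle`, the `candidates` mapping and the `evaluate` callback are hypothetical names, and months are assumed to be sortable strings such as "2022-04". In a real script, the fitting and scoring steps would be replaced by the corresponding DataRobot client calls.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Model = Callable[[Row], object]        # a fitted model: row -> prediction
Fitter = Callable[[Sequence[Row]], Model]  # a training routine: rows -> model


def split_by_latest_month(
    rows: Sequence[Row], month_key: str = "month"
) -> Tuple[List[Row], List[Row]]:
    """Return (training rows, holdout rows); the holdout is the latest month."""
    latest = max(r[month_key] for r in rows)
    train = [r for r in rows if r[month_key] != latest]
    holdout = [r for r in rows if r[month_key] == latest]
    return train, holdout


def run_monthly_cycle(
    rows: Sequence[Row],
    scoring_rows: Sequence[Row],
    candidates: Dict[str, Fitter],
    evaluate: Callable[[Model, Sequence[Row]], float],
) -> Tuple[str, Dict[str, float], List[object]]:
    """One monthly run: train, test on the latest month, pick, refit, score."""
    train, holdout = split_by_latest_month(rows)

    # Train every candidate on the earlier months and test it on the holdout.
    scores = {}
    for name, fit in candidates.items():
        model = fit(train)
        scores[name] = evaluate(model, holdout)

    # Finalise the best model (higher score is assumed to be better).
    best = max(scores, key=scores.get)

    # Refit the winner on the complete data, including the latest month.
    final_model = candidates[best](rows)

    # Generate predictions on the scoring set.
    predictions = [final_model(r) for r in scoring_rows]
    return best, scores, predictions
```

Passing the fitting and evaluation routines in as arguments keeps the monthly loop independent of how models are actually built, so the same skeleton could be reused for each of the 20+ models.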

Question:

In this situation, the deployment functions do not seem relevant for us, as we build new models automatically via the API (instead of using the challenger option) and get predictions on the scoring set.

Is this correct? If not, how can we best utilise the deployment functions in DataRobot?

dalilaB · ‎05-16-2022

I'm working on something similar. I'm not finished yet, but I will be using MLOps to retrain my model within the same project. By the way, unless you freeze a model, its parameters will be tuned during model building. The reason for doing so is that a model that worked on previous data will work on the upcoming data.

You don't need to build a new project to do what you are doing; you just need to retrain the model, and this can be done in MLOps. I doubt that creating a new project will result in a different model recommendation.

desmond_lim · ‎05-18-2022

@Abdalmahsan Al Firm

The scenario you describe is periodic retraining of your models with new data, regardless of model decay or deterioration. You have specified a time granularity of one month. However, if a drastic change in your environment or data caused model decay, would you then update your models immediately?

Allow me to elaborate further:

1. Model development can be automated, but I usually find it more useful to understand the drivers of the model's predictions using the model Understand options in the platform.

2. Any environmental or data change that causes you to update or retrain your models could be significant to you, and the model Understand module could help direct the company's resources toward investigation or action.

3. DataRobot MLOps lets you track model performance at a much finer granularity than monthly, and hence react much more quickly.

4. DataRobot MLOps also lets you identify whether a performance problem is due to model decay or just a user, system, or communications issue.

5. DataRobot AutoML and MLOps serve two distinct functions of a holistic business process, and each can be used independently, but as you can see, DataRobot provides a comprehensive solution in one platform.
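To make the difference between calendar-based and decay-based retraining concrete, here is a minimal sketch of a drift check that could trigger retraining between monthly runs. It uses the Population Stability Index (PSI) as one common drift measure; this is an illustrative choice and not necessarily what MLOps computes internally. The function names, the equal-width binning, and the 0.2 threshold (a widely quoted rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a training-time sample and a recent sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant feature: all values fall into the first bin

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Equal-width bins over the training range; outliers are clamped.
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(psi: float, threshold: float = 0.2) -> bool:
    """Assumed rule of thumb: a PSI above the threshold signals notable drift."""
    return psi > threshold
```

A check like this could run daily or weekly on incoming scoring data, with the monthly schedule kept as a fallback when no drift is detected.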

Below are the links to the instructor-led DataRobot AutoML I and MLOps I classes for a better understanding of the two components:

https://university.datarobot.com/automl-i

https://university.datarobot.com/mlops-i

Hope that helps.