I have a model deployment that retrains the model on a schedule.
Is there any method to measure the data drift between the different model versions?
For example, DatasetV1 is used to train ModelV1 and DatasetV2 is used to train ModelV2.
As an abstract representation, the time ranges are:
DatasetV1: 1~10, DatasetV2: 11~20 or 1~20.
How can I tell whether the data has drifted between 1~10 and 11~20 (or 1~20)?
Hey @Rod, the key point is that it doesn't matter that it is 'training data'. All you need to do is take a trained model and have it make predictions on some new data (in this case, the training data used for the other model), and the Data Drift tab will show how the two datasets differ. I am assuming the target is the same for both models?
No worries 🙂 Yes, DataRobot can detect data drift between any two datasets. The difference between the two datasets will be shown in the 'Feature Drift vs. Feature Importance' chart and the 'Feature Details' chart.
@IraWatt Thanks for your reply.
That was a typo, I hope you don't mind 🙂
The dataset timeline is 1, 2, 3, ..., 18, 19, 20, from oldest to newest.
As stated above, ModelV1 predicts 10~20 and ModelV2 predicts 20~30.
So, to clarify the question: "detect data drift between two training datasets."
Is that feasible in DataRobot?
Hi @Rod, can you clarify what you mean by 'mothed'? Data Drift is calculated by comparing the training data to the prediction data (also known as inference data). Drift is specific to the training data used, so each model has to score the other model's dataset: for ModelV2 (trained on 11~20) to monitor drift it would need to predict on 1~10, and vice versa, ModelV1 (trained on 1~10) would need to predict on 11~20.
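If you want a rough sanity check outside the platform before wiring this up, here is a minimal sketch that compares one numeric feature across two datasets using the Population Stability Index (PSI), a common drift statistic. This is plain NumPy, not the DataRobot API, and it is not guaranteed to match the numbers in the Data Drift tab. The function name `psi`, the bin count, and the synthetic "V1"/"V2" arrays are all assumptions made up for illustration.

```python
import numpy as np

def psi(baseline, comparison, bins=10):
    """Population Stability Index between two 1-D numeric samples.

    `baseline` plays the role of the training data (e.g. DatasetV1) and
    `comparison` the role of the prediction data (e.g. DatasetV2).
    """
    baseline = np.asarray(baseline, dtype=float)
    comparison = np.asarray(comparison, dtype=float)

    # Interior bin edges are quantiles of the baseline, so each bin holds
    # roughly the same share of baseline rows. Duplicate edges are dropped.
    quantiles = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    inner_edges = np.unique(quantiles[1:-1])

    # Assign every value to a bin; values outside the baseline range fall
    # into the first or last bin.
    n_bins = len(inner_edges) + 1
    base_counts = np.bincount(
        np.searchsorted(inner_edges, baseline, side="right"), minlength=n_bins
    )
    comp_counts = np.bincount(
        np.searchsorted(inner_edges, comparison, side="right"), minlength=n_bins
    )

    # Clip to a small floor so empty bins do not produce log(0).
    eps = 1e-6
    base_frac = np.clip(base_counts / base_counts.sum(), eps, None)
    comp_frac = np.clip(comp_counts / comp_counts.sum(), eps, None)

    return float(np.sum((comp_frac - base_frac) * np.log(comp_frac / base_frac)))

# Hypothetical data standing in for one feature from periods 1~10 and 11~20.
rng = np.random.default_rng(0)
feature_v1 = rng.normal(loc=0.0, scale=1.0, size=5000)       # DatasetV1
feature_v2_same = rng.normal(loc=0.0, scale=1.0, size=5000)  # no drift
feature_v2_shift = rng.normal(loc=1.0, scale=1.0, size=5000) # mean shifted

print("no drift:", psi(feature_v1, feature_v2_same))
print("shifted: ", psi(feature_v1, feature_v2_shift))
```

The shifted sample should score noticeably higher than the unshifted one. Note that the bins come from the baseline only, which mirrors the point above that drift is measured relative to whichever dataset the model was trained on, so swapping the two arguments can give a different value.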