
Data Drift Showing Inaccurate results


Hello colleagues

I recently trained a classification model on the platform and am seeing a difference in data distribution between the training and scoring data.

Please see the screenshot.

Screen Shot 2021-10-04 at 3.39.03 PM.png 

But when I check my actual data for the COUNTRY variable, I do not see any new values. In the screenshot, the far right end shows NEW VALUES in the scoring set.

But here are the results from the actual data.

Training set values: PH, US, Other, GB, PK, IN, CN

Scoring set values: US, PH, Other, GB, PK, IN, CN
 
I cannot see any new levels/values for the COUNTRY variable in the scoring set. So why is DataRobot saying that new levels are showing up, and hence a drift?
Is this a bug? Kindly advise.
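
One way to double-check this outside the platform is to compare the distinct values directly. Below is a minimal sketch in plain Python using the COUNTRY values from this post. The function name `new_levels` and the `normalize` option are hypothetical, and this is not how DataRobot computes drift internally. The optional normalization is there because values that differ only by whitespace or case (for example `"US "` versus `"US"`) can look identical by eye yet count as separate levels.

```python
def new_levels(training_values, scoring_values, normalize=False):
    """Return levels present in the scoring data but absent from the training data."""
    def prep(value):
        # Optionally strip whitespace and upper-case so "us " and "US" compare equal.
        return value.strip().upper() if normalize else value

    train = {prep(v) for v in training_values}
    score = {prep(v) for v in scoring_values}
    return sorted(score - train)


# COUNTRY values as reported in this post.
training = ["PH", "US", "Other", "GB", "PK", "IN", "CN"]
scoring = ["US", "PH", "Other", "GB", "PK", "IN", "CN"]

print(new_levels(training, scoring))  # → []
```

An empty result on the raw values, and again with `normalize=True`, would suggest the two sets really do share the same levels.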
5 Replies
Linda
DataRobot Alumni

Hi @Jayant - Sorry you haven't received help yet with this question. I've elevated it to the DataRobot team and someone should answer momentarily.

jmbledsoe
DataRobot Alumni

How large was your training dataset? I know that the training baseline only consists of a sample of the training data, but it's fairly large (about 500 MB I think), so if your training dataset is small the baseline would encompass the entire thing.
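
To illustrate why sampling could matter: if a baseline is built from a random sample of the training rows, a rare level can be missing from that sample and would then look new when it appears in scoring data. The sketch below uses made-up data. The row counts, sample size, and function name are assumptions for illustration only and do not reflect DataRobot's actual sampling procedure.

```python
import random


def levels_missing_from_sample(training_values, sample_size, seed=0):
    """Return levels in the full training data that a random sample failed to capture."""
    rng = random.Random(seed)
    # A sample at least as large as the data covers every row, so nothing is missed.
    if sample_size >= len(training_values):
        return set()
    sample = rng.sample(training_values, sample_size)
    return set(training_values) - set(sample)


# Made-up data: 9,999 rows of a common level and a single row of a rare one.
training = ["US"] * 9_999 + ["PK"]

# Sample covers the whole dataset, so no level can be missed.
print(levels_missing_from_sample(training, sample_size=len(training)))  # → set()

# A small sample will usually miss the rare level.
print(levels_missing_from_sample(training, sample_size=100))
```

When the sample covers the whole dataset, as in the first call, no level can go missing, which is the case described above for small training sets.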


I appreciate your response. My final model (the one being used for predictions) is trained on the entire training data, consisting of 80K rows. Please see the screenshot. I don't think the size of the training data is the issue here, because the model is exposed to the entire training data. Let me know if more information is needed. It is crucial for me to know why this drift is occurring.

Screen Shot 2021-10-07 at 9.39.00 AM.png 


It sounds then like sampling is not the explanation. At this point I would recommend filing a support ticket so that our team can do a more detailed investigation.


Hi @Jayant - To create a ticket, you can simply send an email to support@datarobot.com with the information you shared here. If you’d prefer, I can create the ticket - just let me know.

Thanks

Linda
