
When do we unlock holdout data?

Stephem
Blue LED

Hi, folks.

With the default modelling options, 20% of the data is held back as a holdout.

I understand that once we unlock the holdout data, there is no going back.
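To make the "no going back" behaviour concrete, here is a minimal Python sketch of a locked 20% holdout. It assumes a simple random row split. `HoldoutSplit`, `unlock` and the `holdout` property are hypothetical names for illustration, not DataRobot's API or implementation.

```python
import random


class HoldoutSplit:
    """Hypothetical helper: reserves a fraction of rows as holdout and
    only returns them after a one-way unlock."""

    def __init__(self, rows, holdout_frac=0.2, seed=0):
        shuffled = list(rows)
        random.Random(seed).shuffle(shuffled)
        n_holdout = int(round(len(shuffled) * holdout_frac))
        self._holdout = shuffled[:n_holdout]
        # Everything else stays available for training, validation and CV.
        self.training = shuffled[n_holdout:]
        self._unlocked = False

    def unlock(self):
        # One-way switch: there is deliberately no method to re-lock.
        self._unlocked = True

    @property
    def holdout(self):
        if not self._unlocked:
            raise PermissionError("holdout is locked")
        return self._holdout
```

With `split = HoldoutSplit(range(100))`, `split.training` has 80 rows and reading `split.holdout` raises `PermissionError` until `split.unlock()` is called, after which it returns the other 20 rows for good.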

My questions are:

1. When do we unlock the holdout data? At the very end of the process? If so, does that mean we can no longer build any other models, because their results would be affected?

2. Or can new models be built after unlocking the holdout data? I'm asking because the validation, cross-validation and holdout scores are all different for the new model I built. Is there a reference you could point me to on how unlocking the holdout data affects models built afterwards?
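On the scores in question 2: each one is measured on a different set of rows, so the same model will generally get three different numbers. The Python sketch below illustrates this on synthetic data with an 80/20 holdout split and 5-fold cross-validation on the remaining rows. Treating fold 0 as the "validation" partition and scoring the holdout with that same fold-0 model are assumptions for illustration, not a description of how DataRobot computes its scores; `make_rows`, `fit_slope`, `rmse` and `partition_scores` are hypothetical helpers.

```python
import random
import statistics


def make_rows(n=500, seed=1):
    # Synthetic data (assumption): y = 2x + Gaussian noise with sd 1.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 2 * x + rng.gauss(0, 1)))
    return rows


def fit_slope(rows):
    # Least-squares slope for a line through the origin: sum(xy) / sum(x^2).
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def rmse(slope, rows):
    return statistics.fmean((slope * x - y) ** 2 for x, y in rows) ** 0.5


def partition_scores(rows, holdout_frac=0.2, k=5, seed=0):
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_holdout = int(len(rows) * holdout_frac)
    holdout, rest = rows[:n_holdout], rows[n_holdout:]
    folds = [rest[i::k] for i in range(k)]  # k disjoint CV folds over the rest
    fold_scores, slopes = [], []
    for i in range(k):
        # Train on every fold except fold i, then score on fold i.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        slope = fit_slope(train)
        slopes.append(slope)
        fold_scores.append(rmse(slope, folds[i]))
    validation = fold_scores[0]              # assumption: fold 0 acts as "validation"
    cross_validation = statistics.fmean(fold_scores)
    holdout_score = rmse(slopes[0], holdout)  # same fold-0 model, holdout rows
    return validation, cross_validation, holdout_score
```

Running `partition_scores(make_rows())` gives three RMSE values that are all close to the noise level yet not identical, simply because each comes from different rows.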

Many thanks,

sam632
Image Sensor

I don't have answers to everything you're asking, but I found this in the DataRobot docs:

"You should only unlock your holdout data after having made all your model-related decisions. Once your project's holdout has been unlocked, it cannot be re-locked."

(on this page: https://docs.datarobot.com/en/modeling/build-models/build-basic/unlocking-holdout.html#unlock-holdou...)

Sam

mcohen
Data Scientist

Unlocking the holdout set doesn't prevent you from building more models, but in general it is good data science practice not to peek at the holdout until you've settled on your model. There are various reasons and opinions behind this, and you can find many perspectives online. Here's one: https://neuroneurotic.net/2016/08/25/realistic-data-peeking-isnt-as-bad-as-you-thought-its-worse/
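A small simulation can show why peeking matters. It assumes coin-flip labels and "models" that guess at random, so no model has any real skill; `accuracy` and `peeking_demo` are hypothetical names for illustration. Picking the model with the best holdout accuracy makes that accuracy look better than 50%, while the same model scores about 50% on fresh rows it was never selected on. That gap is the optimism that creeps in once holdout scores start driving model choices.

```python
import random
import statistics


def accuracy(preds, labels):
    # Fraction of rows where prediction matches label (bools average as 0/1).
    return statistics.fmean(p == y for p, y in zip(preds, labels))


def peeking_demo(n_models=20, n_rows=100, trials=100, seed=0):
    """Average accuracy of the model picked by holdout score, measured on
    the holdout it was picked with and on fresh, never-seen rows."""
    rng = random.Random(seed)
    picked_holdout, picked_fresh = [], []
    for _ in range(trials):
        # Coin-flip labels: no model can genuinely beat 50% accuracy here.
        holdout_y = [rng.getrandbits(1) for _ in range(n_rows)]
        fresh_y = [rng.getrandbits(1) for _ in range(n_rows)]
        scores = []
        for _ in range(n_models):
            # Each "model" is a random guesser with zero real skill.
            holdout_pred = [rng.getrandbits(1) for _ in range(n_rows)]
            fresh_pred = [rng.getrandbits(1) for _ in range(n_rows)]
            scores.append((accuracy(holdout_pred, holdout_y),
                           accuracy(fresh_pred, fresh_y)))
        # "Peeking": choose the model with the best holdout accuracy.
        best = max(scores, key=lambda s: s[0])
        picked_holdout.append(best[0])
        picked_fresh.append(best[1])
    return statistics.fmean(picked_holdout), statistics.fmean(picked_fresh)
```

Calling `peeking_demo()` returns the picked models' average holdout accuracy and their average accuracy on fresh rows; with these defaults the first should sit clearly above 0.5 and the second close to 0.5.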