Fix validation set

Hi Team, I need help. I have a data set with only 51 rows/records. One of the columns is "Year", and the distribution of records by "Year" is shown below. I want to keep the 2019 records as the validation set. When I tried to do this with partitioning, it would not let me choose "Training-validation-holdout". I could only use CV with folds, which considerably reduces the number of records per fold. Is there any way to fix the validation set by record number, by group, or by any other method? Thanks in advance.

Year    Records
2013    3
2015    18
2016    2
2017    10
2018    8
2019    9
2020    1

Accepted Solutions
Thakur
DataRobot Alumni

Hi Manoj,

Thank you for reaching out. To choose TVH (training-validation-holdout) when partitioning by a variable, that variable should have only 3 unique values. Year has 7 unique values in your data, so it can't be used directly for a TVH partition. Instead, create a new variable with 3 unique values: one for the 2019 rows (validation), one for the 2020 rows (holdout), and one for all remaining rows, which form the training set.

A simpler alternative is to convert your Year variable to a date type before uploading to DataRobot and use OTV (out-of-time validation) as the model validation option. The OTV documentation is linked below.

https://app.datarobot.com/docs/time/date-time.html
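
For illustration, here is a minimal sketch of both preparation steps in pandas, done before uploading. The column names `partition` and `Date`, the label strings, and the two helper functions are assumptions made up for this example, not DataRobot requirements. The stand-in data only reproduces the year counts from the question, and using January 1st for the date is an arbitrary choice.

```python
import pandas as pd

# Row counts per year, copied from the question (51 rows in total).
YEAR_COUNTS = {2013: 3, 2015: 18, 2016: 2, 2017: 10, 2018: 8, 2019: 9, 2020: 1}


def add_partition_column(df, year_col="Year", out_col="partition"):
    """Return a copy of df with a 3-valued partition column derived from year_col."""
    # Hypothetical labels; any three distinct values would serve the same purpose.
    labels = {2019: "validation", 2020: "holdout"}
    out = df.copy()
    # Years not listed in `labels` map to NaN and are then filled with "training".
    out[out_col] = out[year_col].map(labels).fillna("training")
    return out


def add_date_column(df, year_col="Year", out_col="Date"):
    """Return a copy of df with year_col written as a YYYY-01-01 date string."""
    out = df.copy()
    out[out_col] = out[year_col].astype(int).astype(str) + "-01-01"
    return out


# Stand-in data with the same year distribution as the question.
years = [year for year, count in YEAR_COUNTS.items() for _ in range(count)]
df = pd.DataFrame({"Year": years})

prepared = add_date_column(add_partition_column(df))
partition_counts = prepared["partition"].value_counts().to_dict()
print(partition_counts)
```

With the real data, you would load your own file into `df` instead of building the stand-in, then save `prepared` (for example with `to_csv`) and upload that file. The `partition` column is meant for the TVH route and the `Date` column for the OTV route; you only need the one that matches the option you pick.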

Please let me know if you need further help.

Kind regards,

Thakur

2 Replies
Hi, thanks a lot. The partition approach worked.

I have yet to try OTV.
