
How to check out specific training data?


Hello.

 

I have a question regarding training and test data used for modeling.

Is it possible to download training or test data after modeling?

I need to identify which rows were used to train the model.

Or, at a minimum, I want to identify which rows are in the holdout.

 

Thank you.

3 Replies

Hi!

To map the cross-validation and holdout folds back to the original data, go to the Leaderboard, select any model, and click Predict > Make Predictions. There, choose to make predictions on all data. After the predictions are computed and you download the file, each row will have a row ID that refers to the corresponding row in the original file you provided for training. Along with the predictions, you can request up to 5 optional features, which may help you understand the partition stats. If 5 features are not enough, you can join the prediction file, which includes the partition information, to the original file.
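The join described above could be sketched with pandas roughly as follows. This is only an illustration under stated assumptions: the column names `row_id` and `Partition`, the label `Holdout`, and the zero-based row numbering are guesses, so check the header of your actual download. The two small frames stand in for the real files.

```python
import pandas as pd

# Stand-in for the original training file (rows A..J, as in the example in this thread).
original = pd.DataFrame({"name": list("ABCDEFGHIJ")})

# Stand-in for the downloaded prediction file.
# ASSUMPTION: the columns are called "row_id" and "Partition", holdout rows are
# labelled "Holdout", and row_id counts from 0 in the order of the original file.
predictions = pd.DataFrame({
    "row_id": range(10),
    "Partition": ["0.0", "1.0", "2.0", "Holdout", "3.0",
                  "4.0", "0.0", "Holdout", "1.0", "2.0"],
})


def attach_partition(original, predictions,
                     row_id_col="row_id", partition_col="Partition"):
    """Attach each row's partition label to the original data by row position."""
    labels = predictions.set_index(row_id_col)[partition_col]
    # reset_index gives a 0..n-1 index that lines up with the assumed row IDs.
    return original.reset_index(drop=True).join(labels)


merged = attach_partition(original, predictions)
holdout_rows = merged[merged["Partition"] == "Holdout"]
print(holdout_rows["name"].tolist())
```

With real files, you would replace the two stand-in frames with `pd.read_csv(...)` calls on the original file and the downloaded prediction file.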

Hope this answer finds you well,
Bogdan

Thank you for the answer.

 

But I don't mean downloading predictions for the training or test data.

I just want to identify which data was used as input for modeling.

When the Start button is clicked, DataRobot splits the data into training and holdout.

For example, suppose there are rows A, B, C, D, E, F, G, H, I, J.

DataRobot would randomly select training rows such as A, B, C, E, F, G, I, J for an 80% sample,

and the holdout would be rows D and H.

How can I tell which rows are in the holdout?

If I want to choose rows D and H as the holdout myself, do I need to separate the data before uploading it to DataRobot and then use the 'Make Predictions' menu?

 

I'm not sure my explanation makes sense.

 


Hi!

So there are two parts:
1) If you want to provide your own train/validation partition, use column-based partitioning (Partition Feature).

2) If you want to see how DataRobot partitioned the data automatically, use the approach from my previous answer. Which model you pick, and the predictions themselves, don't matter in that case: the partition is the same for all models, and Make Predictions is just one place where you can access both the partition split and the predictions on the training data.
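For option 1, a partition column can be added to the data before upload. The sketch below uses the A..J example from the question and marks rows D and H as holdout. The column name `partition` and the labels `training` and `holdout` are hypothetical; how each value maps to a partition is something you would set in the partitioning options when choosing the Partition Feature.

```python
import pandas as pd

# Stand-in for the dataset to upload (rows A..J from the question).
data = pd.DataFrame({"name": list("ABCDEFGHIJ")})

# Rows the user wants to keep as holdout.
holdout_names = ["D", "H"]

# HYPOTHETICAL column name and labels; any distinct values would do,
# as long as they are mapped to the right partitions at project setup.
is_holdout = data["name"].isin(holdout_names)
data["partition"] = is_holdout.map({True: "holdout", False: "training"})

print(data)

# To upload, the frame could then be written out, for example:
# data.to_csv("training_with_partition.csv", index=False)
```

Because the partition is defined in the file itself, there is no need to look it up afterwards: the holdout rows are exactly the ones you labelled.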