How to check out specific training data?

cookie_yamyam · ‎09-19-2022

Hello.

I have a question regarding training and test data used for modeling.

Is it possible to download the training or test data after modeling?

I need to identify which data was used to train the model.

Or, at least, I want to identify which data is in the holdout.

Thank you.

Bogdan Tsal-Tsalko · ‎09-19-2022

Hi!

To map the cross-validation and holdout folds back to the original data, go to the Leaderboard, select any model, and click Predict > Make Predictions. There, choose to make predictions on all data. After the predictions are computed and you download the file, you will find a rowID for each row, which refers to the corresponding row in the original file provided for training. Along with the predictions, you can request up to 5 optional features that might help you understand the partition stats. If 5 features are not enough, you can join the prediction file, with its partition information, to the original file.
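As a rough illustration of that last step, here is a minimal sketch in plain Python that attaches the partition label from a downloaded prediction file to the rows of the original file. The column names `row_id` and `Partition`, the labels `Holdout` and `Fold`, and the assumption that row IDs are 0-based positions in the uploaded file are all hypothetical; check the header and values of your own download before relying on them.

```python
import csv
import io

# Hypothetical stand-ins for the two files; in practice, open your real CSVs.
ORIGINAL_CSV = "name,value\n" + "".join(
    f"{name},{i}\n" for i, name in enumerate("ABCDEFGHIJ")
)
PREDICTIONS_CSV = "row_id,Partition\n" + "".join(
    f"{i},{'Holdout' if i in (3, 7) else 'Fold'}\n" for i in range(10)
)


def attach_partition(original_file, predictions_file,
                     row_id_col="row_id", partition_col="Partition"):
    """Return the original rows, each with its partition label added."""
    # Map row ID -> partition label from the prediction file.
    partition_by_id = {
        int(row[row_id_col]): row[partition_col]
        for row in csv.DictReader(predictions_file)
    }
    merged = []
    for row_id, row in enumerate(csv.DictReader(original_file)):
        # Assumption: the row ID is the 0-based position of the row
        # in the file that was uploaded for training.
        merged.append({**row, partition_col: partition_by_id.get(row_id)})
    return merged


merged = attach_partition(io.StringIO(ORIGINAL_CSV), io.StringIO(PREDICTIONS_CSV))
holdout_names = [row["name"] for row in merged if row["Partition"] == "Holdout"]
print(holdout_names)
```

With the stand-in data above, the rows labelled as holdout are D and H; with real files, replace the two `io.StringIO` objects with opened file handles.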

Hope this answer finds you well,
Bogdan

cookie_yamyam · ‎09-19-2022

Thank you for the answer.

But I don't mean downloading predictions for the training or test data.

I just want to identify which data was used as input for modeling.

When the Start button is clicked, DataRobot divides the data into training and holdout.

For example, say there are rows A, B, C, D, E, F, G, H, I, J.

DataRobot would randomly separate the training data, e.g. A, B, C, E, F, G, I, J for an 80% sample,

and the holdout data would be rows D and H.

How can I recognize which rows are holdout?

If I want to choose rows D and H as the holdout data, do I need to separate the data before uploading to DataRobot and use the 'Make Predictions' menu?

I'm not sure my explanation makes sense.

Bogdan Tsal-Tsalko · ‎09-20-2022

Hi!

So there are two parts:
1) If you want to provide your own train/validation partition, use column-based partitioning (Partition Feature).

2) If you want to see how DataRobot partitioned the data automatically, use my previous answer. In that case, which model you pick and the predictions themselves don't matter. The partition is the same for all models, and this is just one point of access to the partition split and the predictions on training data.
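For option 1, a minimal sketch of preparing the file before upload might look like the following. It adds a column that marks rows D and H as holdout, using the example rows from the question. The column name `partition` and the labels `holdout` and `training` are hypothetical; which value maps to which partition is something you would choose when selecting the Partition Feature, and a validation label could be added in the same way.

```python
import csv
import io


def add_partition_column(rows, holdout_keys,
                         key_col="name", partition_col="partition"):
    """Return copies of the rows with a partition label added to each."""
    labelled = []
    for row in rows:
        # Hypothetical labels; any distinct values would work.
        label = "holdout" if row[key_col] in holdout_keys else "training"
        labelled.append({**row, partition_col: label})
    return labelled


# Stand-in for the original data (rows A..J from the example).
rows = [{"name": name, "value": str(i)} for i, name in enumerate("ABCDEFGHIJ")]
labelled = add_partition_column(rows, holdout_keys={"D", "H"})

# Write the result as CSV; swap the buffer for a real file to upload it.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(labelled[0].keys()))
writer.writeheader()
writer.writerows(labelled)
print(buffer.getvalue())
```

The design choice here is to keep the partition assignment in the data itself, so the split is explicit and reproducible rather than chosen at random.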
