Dataset split

Hi,

Is there a setting that needs to be changed so that 20% of the dataset is used for testing and 80% for training in DataRobot? I have limited experience working with DataRobot, so any help would be appreciated!

1 Reply

Hey @Sreerag_k! In most cases when using random or stratified partitioning in DataRobot, the default is an 80% training sample and a 20% test, or "holdout", sample. That 80% is further split into five (by default) cross-validation partitions, each equal to 16% of the total data. Cases where the defaults differ include projects with very large datasets and Auto Time Series projects, to name a few.

You can adjust the partitioning with a lot of flexibility. The DataRobot documentation describes the five distinct partitioning methods available, and also covers how Autopilot uses those partitions in the default scenario.
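
To make the arithmetic behind the default split concrete, here is a minimal sketch in plain Python. It is not DataRobot code and does not call the DataRobot API; the function name `partition_rows` and its parameters are made up for illustration, and it only mimics a simple random partition (20% holdout, with the remaining 80% dealt into five folds).

```python
import random

def partition_rows(n_rows, holdout_pct=20, n_folds=5, seed=0):
    """Illustrative only: split row indices into a holdout set and CV folds.

    This is a hypothetical helper, not how DataRobot implements partitioning.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # random (non-stratified) assignment

    n_holdout = n_rows * holdout_pct // 100
    holdout = indices[:n_holdout]          # 20% set aside as holdout
    training = indices[n_holdout:]         # remaining 80% used for training

    # Deal the training rows into n_folds roughly equal cross-validation folds.
    folds = [training[i::n_folds] for i in range(n_folds)]
    return holdout, folds

holdout, folds = partition_rows(1000)
print(len(holdout))               # 200 rows, i.e. 20% of the data
print([len(f) for f in folds])    # five folds of 160 rows, i.e. 16% each
```

With 1,000 rows this gives 200 holdout rows and five folds of 160 rows each, which matches the 20% / 5 x 16% breakdown described above.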
