Is any sampling done in EDA2 when calculating the ACE score? I have two datasets that differ only in the order of rows. They were run separately with the same OTV settings (same date range for partitions, same number of rows in each partition), and there is a visible difference in the ACE scores. Does the ACE score depend on the order of rows?
The EDA1 sample will definitely vary with the order of rows. EDA2 starts from the EDA1 sample and then also removes rows that are in the holdout, so project settings can matter too.
The datasets are fairly small: 8k rows and 70 features.
ACE doesn’t need a large sample: it could be 1k rows or even 100. If the dataset is smaller than 500MB, all rows may end up in the sample, but their order may still differ.
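To illustrate why row order can matter, here is a rough sketch, not the actual EDA sampling code. It assumes a sample that picks rows by position with a fixed seed; `positional_sample` and the toy row ids are hypothetical and only stand in for whatever the platform does internally.

```python
import random

def positional_sample(rows, k, seed=0):
    """Pick k rows by position using a fixed seed.

    Hypothetical stand-in for an EDA sample; the real sampling logic
    is not shown in this thread and may differ.
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(rows)), min(k, len(rows))))
    return [rows[i] for i in positions]

# Toy dataset: row ids 0..7999, matching the 8k rows mentioned above.
original = list(range(8000))
reordered = original[:]
random.Random(42).shuffle(reordered)  # same rows, different order

# Case 1: the sample is smaller than the dataset.
# The same positions are chosen, but they now hold different rows.
sample_a = positional_sample(original, 1000)
sample_b = positional_sample(reordered, 1000)
same_rows = set(sample_a) == set(sample_b)

# Case 2: the sample covers the whole dataset.
# The set of rows matches, but the order still differs.
full_a = positional_sample(original, 8000)
full_b = positional_sample(reordered, 8000)
same_set_full = set(full_a) == set(full_b)
same_order_full = full_a == full_b

print(same_rows, same_set_full, same_order_full)
```

Under these assumptions, a partial sample contains different rows for the two orderings, and a full sample contains the same rows in a different order. Either could plausibly shift a score computed on the sample, though whether that explains the ACE difference here is not confirmed.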