Train partition performance issue

While testing the DataRobot platform, I ran into some uncertainty.
After the final model is built, I can easily get the Validation and Holdout performance metric values.
My question concerns the Train partition performance. As I understand it, the Train performance can be calculated in two different ways:

  1. Go to Model -> Predict -> Make Predictions and click "Compute Predictions" on the Training Data option. The whole sample with predictions can then be downloaded and split into partitions (mirroring DataRobot's split). Finally, the Train/Validation/Holdout performance metrics can be calculated manually (in Python, on a local machine).
  2. Perform an external test (Model -> Predict -> Make Predictions) by uploading the same (whole) dataset. After computing and downloading the predictions, the remaining steps are the same as in the first approach (split into partitions and calculate the metrics).
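The manual metric step shared by both approaches can be sketched roughly as follows. The column names ("Partition", "target", "prediction") and the toy data are assumptions on my side, since the real names depend on how the exported predictions file is laid out:

```python
# Rough sketch of the per-partition AUC computation described above.
# Column names ("Partition", "target", "prediction") are hypothetical;
# adjust them to match the actual downloaded file.
import pandas as pd
from sklearn.metrics import roc_auc_score

# Toy stand-in for the downloaded predictions file
preds = pd.DataFrame({
    "Partition":  ["Validation"] * 4 + ["Holdout"] * 4,
    "target":     [0, 1, 0, 1, 1, 0, 1, 0],
    "prediction": [0.2, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1],
})

results = {}
for partition in ["Validation", "Holdout"]:
    part = preds[preds["Partition"] == partition]
    results[partition] = roc_auc_score(part["target"], part["prediction"])
    print(f"{partition}: AUC = {results[partition]:.4f}")
```

The same loop would cover the Train rows once the partition labels are known.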

The problem is that the metric and the predicted values differ significantly on the Train partition (and only there) when I compare the two approaches.


In the example above I got the following AUC values:
Validation :
DataRobot UI - 0.7638
Approach 1 - 0.7638
Approach 2 - 0.7638
Holdout :
DataRobot UI - 0.7585
Approach 1 - 0.7585
Approach 2 - 0.7585
Train :
DataRobot UI - ???
Approach 1 - 0.7546
Approach 2 - 0.8313

DataRobot Alumni

So this is a great and careful analysis!  

DataRobot has tried to build some safeguards into getting predictions on training data. As you are aware, if you train on some data and then ask for predictions on that same data, you will get unrealistically high performance.

As a guardrail in DataRobot, when you perform predictions using the Training Data option, we use stacked predictions.  With stacked predictions, DataRobot builds multiple models on different subsets of the data. The prediction for any row is made using a model that excluded that data from training. In this way, each prediction is effectively an “out-of-sample” prediction.
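The same out-of-fold idea can be illustrated with scikit-learn's `cross_val_predict`; this is an analogy for how stacked predictions behave, not DataRobot's internal implementation:

```python
# Out-of-fold ("stacked") predictions: each row is scored by a model
# that never saw that row during training. Compare with naive in-sample
# predictions, which look misleadingly good.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

# In-sample: model scored on its own training rows (optimistic)
in_sample = model.fit(X, y).predict_proba(X)[:, 1]

# Stacked / out-of-fold: honest estimate of training performance
stacked = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]

print("in-sample AUC:", roc_auc_score(y, in_sample))
print("stacked   AUC:", roc_auc_score(y, stacked))
```

This is why Approach 1 (Training Data option) and Approach 2 (uploading the training data as an external file) disagree only on the Train rows: the first yields out-of-fold predictions, the second in-sample ones.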

If you instead upload the training data as an external file, you bypass that safeguard and end up with misleadingly accurate (overfit) predictions.

You can see more details about this in the documentation, in the section on the Make Predictions tab (this link works for trial users).