Under any given trained model, if I want to compute predictions for the training dataset, D.R performs something called Stacked Predictions.
I'm wondering if Stacked Predictions is just a fancy way of saying Cross Validation. Is that a correct assumption?
Because when you click "Compute Predictions", on the right-hand side I only saw one model being used by D.R to compute the predictions. But the problem I see is that in order for D.R to give you out-of-sample predictions for all training data, it would have to use, for each partition, the partition-specific model (one that treated that partition as out-of-sample) to compute the predictions for the rows in that partition.
Interestingly, if I upload a dataset, change the initial partition method to TVH rather than CV, and compute the predictions for all training data using a trained model, it somehow still comes up with partitions.
So this got me wondering whether it is re-partitioning and re-training multiple models on these various partitions behind the scenes, even after model training is done. Any ideas on whether this is the case?
According to the Docs on Stacked Predictions - "DataRobot builds multiple models on different subsets of the data. The prediction for any row is made using a model that excluded that data from training."
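That description matches what's usually called out-of-fold (cross-validated) predictions. A minimal sketch of the same idea using scikit-learn's `cross_val_predict` (this is an illustration of the concept, not DataRobot's actual implementation; the model choice and fold count are my own assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy data standing in for the training dataset.
X, y = make_classification(n_samples=200, random_state=0)

# cross_val_predict returns one prediction per training row, each made
# by a model whose training fold excluded that row -- the same property
# the Stacked Predictions docs describe.
stacked = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
print(stacked.shape)  # one out-of-sample prediction per row: (200,)
```

Every row gets exactly one prediction, and it always comes from a model that never saw that row during training.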
For TVH partitioning, although the UI makes it look like one model is being trained for stacked predictions, if you click through to the log (shown below) it says "CV started". So it is, in fact, multiple models being trained.
With CV partitioning for your original model, it wouldn't make sense to train any new models for stacked predictions: you already have the fold models from CV, and each of them can predict the rows it held out.
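That reuse of fold models can be sketched manually: fit one model per CV fold and let each fold's model predict only its own held-out rows (again a scikit-learn illustration under my own assumptions, not DataRobot internals):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)

stacked = np.empty_like(y)
for train_idx, holdout_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Each fold model plays the role of the "partition-specific" model:
    # it never saw the holdout rows, so its predictions for them are
    # out-of-sample.
    fold_model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    stacked[holdout_idx] = fold_model.predict(X[holdout_idx])

print(stacked.shape)  # (200,): the folds tile the full training set
```

If the original model was trained with CV, these per-fold models already exist, which is why no retraining is needed in that case.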
The documentation doesn't specifically say that Stacked Predictions performs CV, but the logs give it away. Hope this answers the question.