This is a bit of a long-winded "series" of questions, but they all revolve around the same topic and are related.
I remember reading somewhere, either in a D.R course or in the D.R documentation, that depending on how you select your FD, D.R will train multiple models on copies of your dataset corresponding to different forecast distances, taking into account the upper limit you set.
To validate this, I used the "Store_Sales_Daily_training.csv" dataset (942 records) in an AutoTS forecasting project and varied the Forecast Distance while keeping everything else the same (i.e. Holdout + Backtests, FDW, etc.).
This is how the # of rows in the derived dataset varied with varying FDs:
With a FDW of [-35,0], 90 day holdout set, 5 day gap, 2 backtests+Holdout:
FD: +6 to +6 = 870 rows
FD: +1 to +1 = 875 rows
FD: +1 to +30 = 25,380 rows
Having said all this, this raises a few questions.
Assuming D.R does in fact train multiple models on copies of the dataset for different FDs, and assuming you want to train for a FD of +1 to +5:
Is D.R testing multiple FDs per unit of time, incrementing by 1 unit of time as it goes on? (i.e. training on +1 to +1 | +2 to +2 | ... | +5 to +5)
Is D.R testing the FDs in a range format, incrementing by 1 unit of time as it goes on? (i.e. +1 to +2 | +1 to +3 | ... | +1 to +5)
After model training and into model deployment, are these extra "hypothetical" models different from each other in that each was optimized for a different FD range? P.S. (If they are different, how are these multiple models handled in deployment?)
Is D.R using the same model and sequentially training it on tougher ranges of FDs each time until it gets to your desired FD?
In the D.R course "Lab: Build a Time Series Model", they use the training dataset "Store_Sales_Daily_training.csv" mentioned above, which contains 942 records in total. When it is uploaded to D.R as an AutoTS project with a FDW of [-35, 0], a FD of [1, 30], a holdout set of 90 days and a gap length of 5, one would expect the available training data to be a total of: 942 - (1+35+90) = 821 (P.S. not sure whether Gap Length also affects the available training data).
According to D.R, however, under Derived Modeling Data -> View more info, the derived dataset contains a total of 25,380 rows across all forecast distances. D.R mentions that for each unit of forecast distance it creates a model and, consequently, a copy of the dataset to train with.
If we divide this value by the number of forecast distances (i.e. 30), we get the # of records per forecast distance that D.R is training on. I expected 821, but 25,380 / 30 = 846. Where did the extra 25 rows come from? What did I miss?
This is just one example, taking a FD of +1 to +30. If you calculate the expected number of rows for a FD of +1 to +1 or +6 to +6, it doesn't add up either.
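As a quick sanity check on the arithmetic above, here is a minimal Python sketch that uses only the row counts quoted in this post. It divides each total by the number of forecast distances and then adds the FD end back on. For these three data points the result is always 876, which suggests (as an observation only, not a documented D.R formula) that the rows per distance depend on the largest FD in the range. The sketch also evaluates the expression 942 - (1+35+90) as written; it comes out to 816 rather than 821, so the expected value may be worth rechecking as well.

```python
# Row counts reported in this post, keyed by (FD start, FD end).
reported = {(6, 6): 870, (1, 1): 875, (1, 30): 25380}

def rows_per_distance(fd_start, fd_end, total_rows):
    """Split the derived row count evenly across the forecast distances."""
    n_distances = fd_end - fd_start + 1
    rows, remainder = divmod(total_rows, n_distances)
    if remainder:
        raise ValueError("total rows are not evenly divisible by the FD count")
    return rows

per_distance = {
    fd: rows_per_distance(fd[0], fd[1], total) for fd, total in reported.items()
}

# Empirical pattern (an assumption that fits these three points only):
# rows per distance + FD end gives the same constant every time.
constants = {fd: rows + fd[1] for fd, rows in per_distance.items()}

# The expectation formula exactly as written in the post.
expected_by_post_formula = 942 - (1 + 35 + 90)

for fd, rows in per_distance.items():
    print(f"FD +{fd[0]} to +{fd[1]}: {rows} rows per distance, constant {constants[fd]}")
print("942 - (1+35+90) =", expected_by_post_formula)
```

If the pattern holds, a FD of +1 to +5 would give 871 rows per distance, but that would need to be confirmed in the project itself.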
Any help on the aforementioned questions would be much appreciated.
From the Time Series: Session A course, whether DataRobot creates 1 model or 5 separate models in forecasting is determined by the type of model. For instance, Elastic Nets create a separate model for each forecast distance, whereas XGBoost does not use multiple models; instead, a time parameter is passed to one model.
For Elastic Nets (Elastic Nets Docs) this would look like:
1. Is D.R testing multiple FDs per unit of time, incrementing by 1 unit of time as it goes on? (i.e. training on +1 to +1 | +2 to +2 | ... | +5 to +5)
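To make the difference between the two approaches concrete, here is a toy Python sketch. It is not DataRobot's implementation; the function name `stack_by_forecast_distance`, the `lag_mean` feature and the synthetic series are all made up for illustration. It builds one row per (forecast point, forecast distance) pair, which is why the derived dataset grows with the number of distances. The stacked rows can then either be split into one training set per distance (the per-distance style described for Elastic Nets) or kept together with `forecast_distance` as an input column (the single-model style described for XGBoost).

```python
def stack_by_forecast_distance(series, fdw_len, fd_start, fd_end):
    """Build one row per (forecast point, forecast distance) pair.

    Hypothetical helper: a forecast point t is kept only if it has
    fdw_len history values and a target at every distance up to fd_end.
    """
    rows = []
    for t in range(fdw_len - 1, len(series) - fd_end):
        history = series[t - fdw_len + 1 : t + 1]
        for fd in range(fd_start, fd_end + 1):
            rows.append(
                {
                    "forecast_distance": fd,
                    "lag_mean": sum(history) / fdw_len,  # toy derived feature
                    "target": series[t + fd],
                }
            )
    return rows

# Synthetic series standing in for daily sales.
series = list(range(100))
rows = stack_by_forecast_distance(series, fdw_len=7, fd_start=1, fd_end=5)

# Style A: one training set (and one model) per forecast distance.
per_distance = {}
for row in rows:
    per_distance.setdefault(row["forecast_distance"], []).append(row)

# Style B: a single training set where forecast_distance is just a feature.
single_model_features = [(r["forecast_distance"], r["lag_mean"]) for r in rows]

print(len(rows), {fd: len(subset) for fd, subset in per_distance.items()})
```

In this toy setup every distance gets the same number of rows, because a forecast point is only used when targets exist for all distances up to the largest one.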
I haven't yet done the maths on your rows, but I can confirm from testing that gap (COG) definitely does reduce the amount of training data.