The feature derivation window (FDW) defaults to 35 days when I am doing a daily prediction one to four weeks ahead using 13 months of data.
What are the best practices for fully harnessing all this available data? Of course, to throw a monkey wrench into it, there are a few complications:
Weekends will have structural zeros.
Seasonality is huge here, so roughly one month of a priori prediction data probably will not cut it.
I will incorporate holidays, when there will be no deliveries, into the mix.
Thank you, Shu, for the help you have already provided. We have already demonstrated, for the most part, excellent recovery of weekends where there is supposed to be no data.
From a best-practices perspective, 35 days for the feature derivation window seems too low. Maybe try 100 or even 365? I also assume the time DataRobot needs to build the models grows exponentially with that many more days?
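Before jumping to 100 or 365 days, it may help to do the arithmetic on how much history each window leaves for training. The sketch below assumes each training row needs a full FDW of history behind it and a full forecast window of actuals ahead of it; that is a simplification (it ignores backtest/holdout partitions and any DataRobot-specific rules), and the function name is hypothetical.

```python
def usable_training_days(history_days: int, fdw_days: int, forecast_days: int) -> int:
    """Rough count of forecast points with a full FDW of history behind them
    and a full forecast window of actuals ahead of them.

    Simplifying assumption: ignores backtest/holdout partitions and other
    DataRobot-specific rules, so the real number will be smaller.
    """
    return max(0, history_days - fdw_days - forecast_days)


HISTORY_DAYS = 395   # roughly 13 months of daily data
FORECAST_DAYS = 28   # predicting up to four weeks ahead

for fdw in (35, 100, 365):
    n = usable_training_days(HISTORY_DAYS, fdw, FORECAST_DAYS)
    print(f"FDW {fdw:>3} days -> {n:>3} usable training days")
# FDW  35 days -> 332 usable training days
# FDW 100 days -> 267 usable training days
# FDW 365 days ->   2 usable training days
```

Under these assumptions, a 365-day window would leave almost nothing to train on with 13 months of data, so an intermediate length may be a more realistic candidate to test.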
A binary weekday feature, set as known in advance, is likely the easiest way to address the structural zeros on weekends.
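A binary known-in-advance flag like this can be built in pandas before uploading the data. This is a minimal sketch: the column names (`date`, `is_weekend`, `is_holiday`, `no_delivery_day`) and the function name are hypothetical, and the flags would still need to be marked as known in advance in the project settings.

```python
import pandas as pd


def add_known_in_advance_flags(df: pd.DataFrame, date_col: str = "date", holidays=()) -> pd.DataFrame:
    """Return a copy of `df` with binary calendar flags derived from `date_col`.

    All column names here are hypothetical placeholders.
    """
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    # pandas numbers days Monday=0 ... Sunday=6, so >= 5 means Saturday or Sunday.
    out["is_weekend"] = (dates.dt.dayofweek >= 5).astype(int)
    # 1 on any date listed in `holidays` (no deliveries expected).
    out["is_holiday"] = dates.isin(pd.to_datetime(list(holidays))).astype(int)
    # 1 on any day with no expected deliveries, for either reason.
    out["no_delivery_day"] = (out["is_weekend"] | out["is_holiday"]).astype(int)
    return out


# 2024-07-01 is a Monday; 2024-07-04 is treated as a holiday for illustration.
frame = pd.DataFrame({"date": pd.date_range("2024-07-01", periods=7, freq="D")})
flags = add_known_in_advance_flags(frame, holidays=["2024-07-04"])
print(flags["no_delivery_day"].tolist())  # [0, 0, 0, 1, 0, 1, 1]
```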
The FDW requires experimentation for your specific use case. It can be tested most efficiently by building a model factory via the API, which loops through a range of shorter and longer window sizes and builds a corresponding project for each.
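The looping part of such a model factory can be sketched independently of the API. The dictionary keys below are named to mirror, as far as we know, the FDW and forecast-window arguments of the DataRobot Python client's datetime partitioning specification (check the client docs for your version). `build_project` is a hypothetical callable you would write yourself around the real client calls, which are deliberately not reproduced here, and the 1 to 28 day forecast window is an assumption.

```python
from typing import Callable, Dict, Iterable, List


def fdw_grid(window_sizes: Iterable[int], forecast_start: int = 1, forecast_end: int = 28) -> List[Dict[str, int]]:
    """One settings dict per candidate FDW length (in days).

    Offsets are relative to the forecast point: the FDW runs from -size to 0,
    and the forecast window from forecast_start to forecast_end days ahead.
    """
    return [
        {
            "feature_derivation_window_start": -size,
            "feature_derivation_window_end": 0,
            "forecast_window_start": forecast_start,
            "forecast_window_end": forecast_end,
        }
        for size in window_sizes
    ]


def run_model_factory(configs: List[Dict[str, int]], build_project: Callable[[Dict[str, int]], object]) -> Dict[int, object]:
    """Build one project per config, keyed by FDW length in days.

    `build_project` is a hypothetical callable you supply; it is where the
    real DataRobot client calls would go.
    """
    projects = {}
    for cfg in configs:
        fdw_days = -cfg["feature_derivation_window_start"]
        projects[fdw_days] = build_project(cfg)
    return projects


# Stand-in builder so the loop can be exercised without a DataRobot connection.
configs = fdw_grid([35, 100, 180])
projects = run_model_factory(configs, lambda cfg: f"fdw-{-cfg['feature_derivation_window_start']}")
print(projects)  # {35: 'fdw-35', 100: 'fdw-100', 180: 'fdw-180'}
```

Once the real projects finish, the idea would be to compare backtest accuracy across the FDW lengths and keep the best-performing window.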
Here are a couple of reference notebooks based on the Python API:
Could you recommend a good site for learning how to use the APIs?
For #2, it suggests using Python along with the API. I already have Python on my machine, installed via Anaconda. Should I install Python through DataRobot?
As previously mentioned, we got decent recovery of weekend structural zeros by including them in the a priori data. Not perfect, but decent for the most part. Would you recommend a different or complementary strategy for weekends, similar to how we incorporate holidays when there will be no deliveries? I feel we are already in OK shape there, and you cannot argue with feelings.
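One possible complementary step, offered as an assumption rather than as a DataRobot feature: post-process the forecasts so that days already known to have no deliveries are forced to exactly zero. It sits alongside the known-in-advance flags rather than replacing them. The function name and the 0/1 `no_delivery` series are hypothetical.

```python
import pandas as pd


def zero_out_no_delivery_days(predictions: pd.Series, no_delivery: pd.Series) -> pd.Series:
    """Set predictions to 0 on dates flagged as no-delivery days.

    Both Series are indexed by date. Dates missing from `no_delivery`
    are treated as normal delivery days and left unchanged.
    """
    flagged = no_delivery.reindex(predictions.index, fill_value=0).astype(bool)
    return predictions.mask(flagged, 0.0)


idx = pd.date_range("2024-07-01", periods=7, freq="D")
preds = pd.Series([12.0, 11.5, 13.0, 2.1, 12.2, 0.8, 0.3], index=idx)
no_delivery = pd.Series([0, 0, 0, 1, 0, 1, 1], index=idx)
print(zero_out_no_delivery_days(preds, no_delivery).tolist())
# [12.0, 11.5, 13.0, 0.0, 12.2, 0.0, 0.0]
```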
Lastly, I set up DataRobot to generate the holiday calendar automatically. The holidays are all designated as P1D (a one-day duration). As discussed in previous correspondence, we also feel it is important to examine target dates that correspond to our sales events. In other words, rather than holidays, when we will have zero deliveries, we might have quarterly dates when we expect a bump or increase in deliveries for certain periods of time. Could you recommend a way of coding this? Of course, it is an empirical question, and my hunch is that there might be a "lag delay" too. Again, that will be an empirical question.
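One way to encode the sales events, sketched under assumptions: add them as known-in-advance columns in the training data, with lagged copies so a model can pick up a delayed bump. The function name, column names, and default lag depth are hypothetical, and how many lags to include is exactly the empirical question raised here.

```python
import pandas as pd


def sales_event_features(dates, event_periods, max_lag_days=3, name="sales_event"):
    """Known-in-advance flags for sales events plus lagged copies.

    `event_periods` is a list of (start, end) date pairs, inclusive.
    `{name}` is 1 while an event is active; `{name}_lag_k` is 1 if an event
    was active exactly k days earlier, to capture a delayed bump.
    """
    idx = pd.DatetimeIndex(pd.to_datetime(dates))
    active = pd.Series(0, index=idx)
    for start, end in event_periods:
        in_period = (idx >= pd.Timestamp(start)) & (idx <= pd.Timestamp(end))
        active.loc[in_period] = 1
    out = pd.DataFrame({name: active})
    for k in range(1, max_lag_days + 1):
        # Move the flag k calendar days later, then realign to the original dates.
        out[f"{name}_lag_{k}"] = active.shift(k, freq="D").reindex(idx, fill_value=0)
    return out


dates = pd.date_range("2024-03-25", periods=10, freq="D")
feats = sales_event_features(dates, [("2024-03-28", "2024-03-29")], max_lag_days=2)
print(feats["sales_event_lag_1"].tolist())  # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

Because the lag flags are computed from scheduled event dates alone, they remain known in advance as long as future sales events are on the calendar.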