Hey Amodi - thanks for reaching out!
Have you seen some of the community content for setting/using forecast distance? Here are a few things that may help:
Time Series—Classification
Demand Forecast—Multiseries
Data Setup for Time Series Predictions
Hoping other community members can help you out too.
-linda
Hi Amodi!
Let's set up a few parameters and then talk through how to interpret forecast distance. First, let's pretend that in your setup you are predicting 1 to 7 days out, and we use today as the forecast point. If we make the prediction today, you will have a prediction for tomorrow, the day after, and so on, all the way to 7 days from today. Tomorrow has a forecast distance of 1, the day after has a forecast distance of 2, and 7 days from now has a forecast distance of 7. Now let's pretend we make the prediction again with tomorrow as the forecast point. The day after tomorrow now has a forecast distance of 1. In general, I think of forecast distance as how far into the future you are predicting, measured from the point in time at which you make the prediction.
When you predict over a forecast range, say last week, each day of last week can have predictions made from several different forecast points in the week before it. Monday of last week, for example, can have predictions at forecast distances 1 through 7, one from each of the 7 preceding days.
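To make the day-counting concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not DataRobot code; the function name `forecast_distances` and the example date are made up for this post.

```python
from datetime import date, timedelta


def forecast_distances(forecast_point, max_distance=7):
    """Map each predicted date to its forecast distance in days."""
    return {
        forecast_point + timedelta(days=d): d
        for d in range(1, max_distance + 1)
    }


today = date(2020, 6, 1)  # arbitrary example date standing in for "today"
from_today = forecast_distances(today)
from_tomorrow = forecast_distances(today + timedelta(days=1))

day_after_tomorrow = today + timedelta(days=2)
# Same calendar day, different distance depending on the forecast point.
print(from_today[day_after_tomorrow])     # predicted from today
print(from_tomorrow[day_after_tomorrow])  # predicted from tomorrow
```

The same calendar day shows up at distance 2 when predicted from today and at distance 1 when predicted from tomorrow.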
Hope this helps!
Hi Amodi!
What a thoughtful question! Inside DataRobot, when we create the time series derived features, quarterly and monthly seasonality is taken into account if you set the feature derivation window far enough back to span that time. When you make a range prediction, I think the question you are asking is which forecast points to use to determine the final trends; I hope I am understanding that correctly. Two things to keep in mind are the point in time at which you will make the predictions and when you will use them. If you make a prediction every week that forecasts 12 weeks ahead, wouldn't you just look at the first forecast distance from each weekly prediction, since it contains the most information about the following week? If you make a prediction this week for the next 12 weeks and will not make another one until 12 weeks from now, then you would use all 12 forecast distances. I think range predictions are good for historical data, but if you are truly predicting into the future I would use a single forecast point. Again, how often you will make predictions and when you will act on them should help you decide what to use.
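Here is a rough sketch of those two cases, assuming the predictions sit in a table with one row per (forecast point, forecast distance). The field names and dates are hypothetical and do not reflect DataRobot's actual output format.

```python
from datetime import date, timedelta

HORIZON_WEEKS = 12


def make_rows(forecast_points):
    """Build hypothetical rows: one per (forecast point, forecast distance)."""
    return [
        {
            "forecast_point": fp,
            "forecast_distance": d,
            "target_week": fp + timedelta(weeks=d),
        }
        for fp in forecast_points
        for d in range(1, HORIZON_WEEKS + 1)
    ]


start = date(2020, 1, 6)  # arbitrary example Monday
weekly_points = [start + timedelta(weeks=i) for i in range(4)]
rows = make_rows(weekly_points)

# Case 1: you re-forecast every week, so you act on distance 1 each time.
weekly_plan = [r for r in rows if r["forecast_distance"] == 1]

# Case 2: you forecast once and wait 12 weeks, so you use all 12 distances
# from that single forecast point.
one_shot_plan = [r for r in rows if r["forecast_point"] == start]

print(len(weekly_plan), len(one_shot_plan))
```

Either way, every target week you act on ends up covered by exactly one prediction; the difference is only which forecast point it came from.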
The historical / forecast range predictions are generally used as part of a model evaluation process. An example would be training the model on 2019 data and then seeing how it would have performed in the first half of 2020. Doing this requires stepping through 26 forecast points, assuming you make a forecast every week (i.e. W1 forecast out 12 weeks, W2 forecast out 12 weeks, W3 forecast out 12 weeks, etc.). We do that stepping automatically and generate, for every week, the 12 different forecasts (1 week out, 2 weeks out, 3 weeks out, etc.) so that you can do your own analysis (e.g. to compare to a different forecasting methodology, to look for issues, to estimate ROI, etc.).
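As a small illustration of that stepping, the sketch below enumerates 26 weekly forecast points with a 12-week horizon and groups the resulting forecasts by the week they target. It only mimics the bookkeeping; the starting date is an assumption and no real model or DataRobot API is involved.

```python
from collections import defaultdict
from datetime import date, timedelta

N_FORECAST_POINTS = 26  # one forecast per week for half a year
HORIZON_WEEKS = 12

first_point = date(2019, 12, 30)  # assumed first forecast point
by_target = defaultdict(list)

for i in range(N_FORECAST_POINTS):
    forecast_point = first_point + timedelta(weeks=i)
    for distance in range(1, HORIZON_WEEKS + 1):
        target_week = forecast_point + timedelta(weeks=distance)
        by_target[target_week].append((forecast_point, distance))

total_forecasts = sum(len(v) for v in by_target.values())
print(total_forecasts)  # 26 forecast points x 12 distances

# A week far enough into the period has one forecast per distance.
week_13 = first_point + timedelta(weeks=12)
print(sorted(d for _, d in by_target[week_13]))
```

Weeks near the start of the period have fewer than 12 forecasts because the earlier forecast points were never run, which is worth remembering when comparing accuracy across forecast distances.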
Thanks a lot for taking the time to answer this query.
Yes, I understand that historical / forecast range predictions are generally used as part of a model evaluation process and that the various forecast distances are used for historical analysis. But how does a forecast distance relate back to my forecast (the future, unseen prediction)? For example, say I set the derivation and forecast windows to 12, and in my analysis I find that forecast distance 7 captures the trend better, with a good metric score. How do I then use THAT (forecast distance 7) in making my prediction, and should I?
I am sorry, but I have a few scenarios where I would like more clarity:
> How would this apply to data that has two seasonalities, monthly and quarterly? Let's take sales as an example: what if sales on 30th June are autocorrelated with the previous quarter's sales and the previous month's sales? In this scenario, how should we choose a forecast point?
The forecast point marks the boundary between what is in the past and what is in the future. Say you are trying to make a forecast for the 30th from the 23rd. Then I would have the 23rd be the forecast point. You can then set the feature derivation window start and end dates if you want to leverage different points in the history (e.g. given a long enough history we will generate averages over the same week in the previous month, or the average over the previous month or quarter), or specify different seasonalities (e.g. 90 day seasonality) directly under the advanced options. That will make it easy to compare different approaches like-for-like, as the forecast windows will align, and also means you can compare to simple approaches (like using the latest known value as the forecast).
> What if there is no seasonality and no trend (a random time series)? Would it still make sense to use forecast points?
Yes. We still need to know which points you want forecasts for and what to use for them (even if just to know whether you want to predict into the same data that was used for training).
> What if there is a cyclic seasonality that is too long for our derivation window to catch?
Many of these are designed to be captured using the seasonal dummies in the models (e.g. by learning per-month effects to capture the yearly seasonal pattern). You can also specify a seasonality that is separate from and longer than the FDW (e.g. generate stats and lags for the past month but difference relative to the same value last year) in the advanced options.
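For intuition on "difference relative to the same value last year", here is a generic seasonal-differencing sketch. It is a textbook illustration on a made-up toy series, not a description of how DataRobot implements the option; `seasonal_difference` is a name invented for this example.

```python
def seasonal_difference(values, period):
    """Subtract the value one full season earlier from each observation."""
    return [values[i] - values[i - period] for i in range(period, len(values))]


# Toy series with a repeating pattern of length 4 plus a small upward shift.
values = [10, 20, 30, 40, 12, 22, 32, 42]
diffs = seasonal_difference(values, period=4)
print(diffs)
```

After differencing, the repeating pattern is gone and only the season-over-season change remains, so a short derivation window no longer needs to "see" the full cycle.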