I am running a multi-series model in which each series has approximately 30 rows of data. I noticed that the accuracy scores suggest that I run the two series in separate projects. On splitting the data and running the separate projects, I found that the volume of data does not meet the minimum requirement of 35 rows.
I'm just wondering why I can run a multi-series process even though the volume of data for each series is less than 35 rows. I'm guessing DataRobot is looking at the combined dataset, i.e. 60 rows, to derive "average features" that are applicable to both series, but some clarity on the process would help me better understand the problem I'm having.
Hi @Anonymous, I have seen this article previously, and although it is a really useful article that helped me through the first models I ran, I don't think there is anything in it that sheds light on the issue I'm currently having.
Hi @Shai ,
Time Series data size limitations consist of the following restrictions:
1. We need at least 20 data points to train models.
2. We need at least 4 data points to validate models.
3. We need at least some amount of history for feature derivation, hence we require an additional 11 points for that.
When you combine two series into a multi-series project with 60 rows, the training and validation data are pooled from both series, allowing you to have >=20 training rows and >=4 validation rows. DataRobot Time Series in multi-series mode doesn't train each series separately; most of our models take advantage of multi-series learning, so it is not a problem for a single series to have fewer than 20 training rows or fewer than 4 validation rows. It only becomes a problem when you try modeling your series separately.
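The arithmetic behind those restrictions can be sketched as follows. This is a minimal illustration of the pooling logic described above, not DataRobot's actual implementation; in particular, whether each series needs its own derivation history is an assumption made here for illustration.

```python
# Illustrative sketch of the row-count arithmetic from the thread.
# Thresholds (20 train, 4 validation, 11 derivation) come from the
# restrictions listed above; the checks themselves are simplified.

MIN_TRAIN = 20            # minimum rows to train models
MIN_VALIDATION = 4        # minimum rows to validate models
DERIVATION_HISTORY = 11   # extra history needed for feature derivation

def single_series_ok(rows: int) -> bool:
    """A standalone series must cover all three requirements itself."""
    return rows >= MIN_TRAIN + MIN_VALIDATION + DERIVATION_HISTORY  # 35

def multi_series_ok(rows_per_series: list[int]) -> bool:
    """In multi-series mode, training and validation rows are pooled
    across series. Assumption: each series still contributes its own
    derivation history before its rows become usable."""
    usable = [r - DERIVATION_HISTORY for r in rows_per_series]
    if any(u <= 0 for u in usable):
        return False
    return sum(usable) >= MIN_TRAIN + MIN_VALIDATION

print(single_series_ok(30))       # a lone 30-row series falls short
print(multi_series_ok([30, 30]))  # pooled, two 30-row series suffice
```

This matches the scenario in the question: each 30-row series fails the 35-row single-series minimum on its own, but the combined project clears the pooled training and validation thresholds.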
Really sorry for the late reply, I missed the email notification and didn't go back to the project because I managed to get a more enriched dataset for single series modelling.
Thanks for the clarification on the restrictions. I also saw somewhere that DataRobot uses cross-series training (although I thought you had to manually choose this option when setting up the project - unless cross-series learning and multi-series learning are two different tools).
I guess what I wanted to find out initially, when first asking the question, was some clarity on how multi-series learning works. Say, for example, you have a single-series project with 30 rows of monthly data and you want to perform time series modelling. You then have a multi-series project of, say, 10 series with 4 rows each. I'm struggling to see how the single-series project with 30 months of historical data is less informative than the multi-series project with only 4 months of historical data per series (assuming each series in the project covers the same 4 months).
If I'm looking too deeply into this please let me know
Thanks for your time
Hi @Shai ,
Just to clarify first: cross-series functionality enables additional modeling features aggregated across all series, or across a group of series defined by a key. For example, if you are modeling sales for a demand forecasting use case, a good feature might be the total number of sales across all series, or the average number of sales over the last month across all series in the same department. Hence cross-series features are meant to enrich the feature derivation space of a multi-series project with additional predictors.
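The kind of cross-series aggregation described above can be sketched with pandas. The column names and toy data here are made up for illustration; this is not DataRobot's API or feature derivation code, just the idea of aggregating across all series and across a department key.

```python
import pandas as pd

# Hypothetical multi-series sales data: three series (A, B, C) in two
# departments, observed on two dates. All names are illustrative.
df = pd.DataFrame({
    "date":   pd.to_datetime(["2023-01-01"] * 3 + ["2023-02-01"] * 3),
    "series": ["A", "B", "C"] * 2,
    "dept":   ["toys", "toys", "food"] * 2,
    "sales":  [10, 20, 30, 15, 25, 35],
})

# Cross-series feature 1: total sales across ALL series on each date.
df["total_sales_all_series"] = df.groupby("date")["sales"].transform("sum")

# Cross-series feature 2: mean sales on each date within the same
# department (the "group of series defined by a key" case).
df["mean_sales_in_dept"] = (
    df.groupby(["date", "dept"])["sales"].transform("mean")
)

print(df)
```

Each row keeps its own series identity but gains predictors computed from the other series, which is how cross-series features enrich the derivation space.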
Regarding multi-series modeling in general: you are certainly correct that modeling ten series with 4 rows of history each is rather inefficient. DataRobot does not restrict such scenarios explicitly, since it is still capable of modeling such datasets: multi-series models like ENET or XGBoost are trained on data from multiple series, so their combined training and validation sets can be large enough to meet our minimum technical requirements.
Hope this is helpful, let me know if you have more questions!
Thanks for this, I can now see the difference.
So a model such as ENET or XGBoost allows "reliable" results to be produced from multi-series models that have limited history? If so, I guess my confusion stemmed from my lack of knowledge about the mechanics of the models DataRobot uses; I will look into these (and other) models.
Again, thanks for all the help
It can produce reliable results if the relationships are relatively stationary/stable over time, with enough variation across the series for the models to learn those relationships.