Question about Partitioning Function in Time Series Modeling


I understand that the Partitioning function in Time Series modeling is needed for cross-validating the data.

 

I have the following questions about this function. I have been studying the documentation, but I could not find clear answers there.

 

1. What exactly are the roles of the Train, Validation, and Test divisions?
2. Using the Advanced Options, you can reduce the Test Set to a minimum, but you cannot eliminate it completely. Why is it impossible to remove the Test Set?
3. I expect the model to be tuned on the validation sets defined for backtesting. In that case, is it correct that the parameters with the best average score across all Backtests become the parameters of the final model?
4. If the most recent data matters most, I would like model training to treat the most recent validation period as the most important. Can I set weights per Backtest? Or is there another way to achieve this?

 

Thanks

1 Reply

Hello, Kim! Thanks for your question.

We have documentation covering our Time-series partitioning here and here. It is best illustrated by our setup screen:

 
 

[Image: otp-backtesting]

 

But I will try to answer your questions:
1) Holdout (the name "Test" is confusing) has the same meaning as in any data science project: it is data the model has not seen, so for Time-series it is the most recent partition of the data.
Validation is the data partition that precedes Holdout (in the image you can see three green parts for validation).

Training is the data that precedes Validation or Holdout and is used to create all the features needed.
2) You can remove the Holdout in the setting for the number of backtests.
3) Backtests are used to choose the best-performing blueprint among the trained blueprints. The final model is that blueprint retrained on the whole dataset, so it can tune its hyperparameters separately. It is still possible to set those hyperparameters yourself in the advanced options of the blueprint.

4) You can do that. But I should warn you that it will have little to no effect on model training: the validation and training partitions are separated in time, so the most important data will appear only in validation. We recommend using the validation length for this purpose, that is, selecting a model by its performance on the most important period of time.
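To make the points above a bit more concrete, here is a rough Python sketch of the idea, not the product's actual implementation. The function names (`make_partitions`, `pick_best`) and their parameters are hypothetical, the training window is assumed to be every row before the validation window, and gaps between partitions are ignored. It shows the Holdout as the most recent slice, each Validation window directly preceding it, and a blueprint being chosen by its backtest scores, either averaged or on the most recent backtest only.

```python
from statistics import mean


def make_partitions(n_rows, validation_len, n_backtests, holdout_len):
    """Split row indices 0..n_rows-1 (oldest to newest) into backtests and a holdout.

    Returns (backtests, holdout), where backtests is a list of
    (training, validation) index ranges ordered newest first.
    Assumption: training uses all rows before the validation window.
    """
    holdout_start = n_rows - holdout_len
    holdout = range(holdout_start, n_rows)  # the latest rows, never used in training
    backtests = []
    val_end = holdout_start
    for _ in range(n_backtests):
        val_start = val_end - validation_len
        if val_start <= 0:
            raise ValueError("not enough rows for the requested backtests")
        backtests.append((range(0, val_start), range(val_start, val_end)))
        val_end = val_start  # the next backtest validates on the preceding window
    return backtests, holdout


def pick_best(scores_by_blueprint, latest_only=False):
    """Return the blueprint name with the lowest error.

    scores_by_blueprint maps a name to its error per backtest, newest first.
    With latest_only=True only the most recent backtest is considered.
    """
    def score(name):
        errors = scores_by_blueprint[name]
        return errors[0] if latest_only else mean(errors)

    return min(scores_by_blueprint, key=score)


# Hypothetical numbers, purely for illustration.
backtests, holdout = make_partitions(n_rows=100, validation_len=10,
                                     n_backtests=3, holdout_len=10)
scores = {"blueprint_a": [1.0, 2.0, 3.0], "blueprint_b": [1.5, 1.5, 1.5]}
best_on_average = pick_best(scores)                   # "blueprint_b"
best_on_latest = pick_best(scores, latest_only=True)  # "blueprint_a"
```

Passing `holdout_len=0` gives an empty holdout range, which corresponds to removing the Holdout. The two `pick_best` calls show why selecting on the most recent period can favor a different blueprint than averaging across all backtests.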

Hope this helps.
