input the trend

I have an XGBoost-based model for time series forecasting. The training dataset consists of a datetime column and a daily production value.
When I predict the next few days, can I also feed the model the production values observed on the previous days to get a more accurate forecast?

5 Replies
shaz13
Data Scientist

Sure, here's an example.

Here we have Date, Sales as the target, and targetlag, which is the target lagged by one day (-1 day).

 

Screenshot 2022-06-27 at 8.54.47 PM.png

For 2022-06-27, you would send the prediction input as below:

 

Screenshot 2022-06-28 at 10.09.54 AM.png

 

Hope this helps. 

can you give me an example?

shaz13
Data Scientist

Yes. All you need to do is create a new feature called lag_target, which in this case lags the target by one day.


Then, in production, you pass a dataframe with the latest true value (the previous day's actual) in the lag_target column.
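A minimal sketch of that idea in pandas, in case it helps. The column names `date` and `production` and all the values are made up for illustration; only `lag_target` comes from the description above. The model itself is left out, since the point is just the shape of the training and prediction frames.

```python
import pandas as pd

# Hypothetical daily production history (values are invented for illustration).
history = pd.DataFrame(
    {
        "date": pd.date_range("2022-06-20", periods=7, freq="D"),
        "production": [120.0, 125.0, 118.0, 130.0, 128.0, 135.0, 131.0],
    }
)

# Training: lag_target holds the previous day's actual production.
train = history.copy()
train["lag_target"] = train["production"].shift(1)
train = train.dropna(subset=["lag_target"])  # the first day has no previous value

# Inference: to predict the next day, pass the latest known actual as lag_target.
next_day = history["date"].max() + pd.Timedelta(days=1)
predict_input = pd.DataFrame(
    {"date": [next_day], "lag_target": [history["production"].iloc[-1]]}
)
print(predict_input)
```

Note that this only works one day ahead as written: to predict several days out you would either need a longer lag (one you will actually have at prediction time) or feed each prediction back in as the next day's lag.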


Let me know if you need any further help. Thanks!

OK, but what I need to know is this: can I also pass the latest true values as input to the published model to get a finer forecast?

shaz13
Data Scientist

Hi @darkaulius
The rule of thumb is: any feature you think carries signal or is informative for prediction should be included in the model's features, as long as you have access to it at inference time.

In a time series problem we usually leave a gap before the prediction point. This gap is called the "Blind History Gap". It ensures that the model is not simply a function of the most recent value but learns from the historical signals in the time series.

I would suggest trying two projects, with (-30, -7) and (-30, -3) as your Feature Derivation Windows, to see whether there is a significant difference in your chosen metric. Choose the window that is practical, accounts for the delay in your data, and gives stable scores across the different partition sets.
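If you want a feel for what the two windows mean before setting up the projects, here is a rough hand-rolled sketch in pandas. It is not how the platform derives features internally: `window_mean` is a hypothetical helper, the data is synthetic, and the window mean is used directly as a naive forecast in place of a trained model, so the numbers only illustrate the comparison procedure.

```python
import numpy as np
import pandas as pd


def window_mean(series: pd.Series, start: int, end: int) -> pd.Series:
    """Mean of the values from `start` to `end` days before each row (inclusive).

    Both offsets are negative, e.g. (-30, -7) uses days t-30 .. t-7,
    which leaves a 7-day gap before the prediction point t.
    """
    width = end - start + 1  # number of days inside the window
    gap = -end               # days between the last value used and day t
    return series.shift(gap).rolling(width).mean()


# Synthetic daily production: trend + weekly cycle + noise (illustration only).
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=200, freq="D")
t = np.arange(200)
production = pd.Series(
    100 + 0.1 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, 200),
    index=dates,
)

# Score on the most recent 60 days so the split respects time order.
holdout = production.iloc[-60:]
scores = {}
for start, end in [(-30, -7), (-30, -3)]:
    feature = window_mean(production, start, end)
    # Stand-in for a model: use the window mean itself as the forecast.
    scores[(start, end)] = (holdout - feature.loc[holdout.index]).abs().mean()
    print(f"window ({start}, {end}): MAE = {scores[(start, end)]:.2f}")
```

With a real model you would feed these window features (plus any others) into training and compare your metric across several time-ordered partitions rather than a single holdout.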

Hope this helps.