Good afternoon all,
I will describe a scenario that I am facing currently.
I have a series of events (repetitive steps), and every time a step finishes, we reset the data to zero.
Metaphorically speaking, and to keep it simple, imagine I have a bucket. I need to fill the bucket with water each time, and once I think the bucket is full, I empty it. Between consecutive steps (between fillings) there is a gap of time, and this gap is not the same every time; it can change, since some preparation for the new filling must be done first.
Features during filling can be flow, pressure, volume of the water inside the bucket, etc., and these features may have some conditions/restrictions (e.g. volume values inside the bucket cannot decrease; they can only stay flat, meaning no filling, or increase).
Can Time Series discriminate the events (steps/fillings) independently? I mean, if I have 100 steps with their data, can TS analyze each step individually and come back with the prediction of the next one?
And can TS impose some conditions to the features like having only increments (cumulative features)?
Thanks for your time!
Firstly, a big thank you to Emily and yourself for your prompt answers.
Understood. Maybe OTV is the first approach to try, since each step is time-dependent across all its features, while the different steps are somewhat independent of each other.
Thank you, Chad, for clarifying some of this for us!
Thanks for engaging me on this, @emily! Your response is great, but I want to address a couple of things I'm hearing and one caveat that is easy to miss.
First, non-independence is part of what time series modeling addresses. Because past values of the target are used to predict future values, the setup inherently assumes some inter-row dependency. You can address that in DataRobot's AutoTS, but we suggest doing so with the guidance of a CFDS (Customer Facing Data Scientist). Alternatively, you can use OTV (out-of-time validation) to allow for time-awareness while preserving each row's independence in the approach. In general, however, any time you use past features to predict future values (whether you use the target or not), you are assuming the rows are not independent; otherwise, the past values would have little predictive power.
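To make the OTV idea concrete, here is a minimal sketch of an out-of-time split done by hand: rows are sorted by time and the most recent ones are held out for validation, while each row is still treated as an independent example. The row layout (`step_id`, `start`, `final_volume`) and the function name `out_of_time_split` are hypothetical, chosen only for illustration; this is not how DataRobot implements OTV internally.

```python
from datetime import datetime

# Hypothetical data: one row per fill step (deliberately not in time order).
rows = [
    {"step_id": 3, "start": datetime(2021, 3, 5, 9, 30), "final_volume": 10.1},
    {"step_id": 1, "start": datetime(2021, 3, 1, 8, 0), "final_volume": 9.8},
    {"step_id": 4, "start": datetime(2021, 3, 9, 14, 0), "final_volume": 9.9},
    {"step_id": 2, "start": datetime(2021, 3, 2, 20, 0), "final_volume": 10.0},
]


def out_of_time_split(rows, time_key, holdout_fraction=0.25):
    """Sort rows by time and hold out the most recent ones for validation."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda r: r[time_key])
    # Always keep at least one row for validation.
    n_holdout = max(1, int(round(len(ordered) * holdout_fraction)))
    return ordered[:-n_holdout], ordered[-n_holdout:]


train, valid = out_of_time_split(rows, "start")
print([r["step_id"] for r in train], [r["step_id"] for r in valid])
```

The point of the split is that every validation row is later in time than every training row, so the model is always evaluated on data from its "future".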
Second, regarding any gaps you may have between dates, there are multiple strategies for addressing them, and it is hard to recommend one without more context. Because DataRobot makes it easy to experiment, you could try adding the missing time steps back in as rows and leaving the target blank.
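One way to try the "add the missing rows back, leave the target blank" experiment before uploading the data is sketched below. It assumes the observations already sit on a regular grid (here, one-minute spacing) with some grid points missing; the function name `fill_missing_timestamps` and the sample values are made up for illustration.

```python
from datetime import datetime, timedelta


def fill_missing_timestamps(observations, step):
    """Return (timestamp, target) pairs on a regular grid.

    observations maps datetime -> target value. Grid points with no
    observation get a target of None (a blank target).
    Assumes the observed timestamps fall exactly on the grid.
    """
    if not observations:
        return []
    current, end = min(observations), max(observations)
    grid = []
    while current <= end:
        grid.append((current, observations.get(current)))
        current += step
    return grid


# Hypothetical readings: 08:02 and 08:03 are missing.
observed = {
    datetime(2021, 3, 1, 8, 0): 0.0,
    datetime(2021, 3, 1, 8, 1): 2.5,
    datetime(2021, 3, 1, 8, 4): 7.0,
}
filled = fill_missing_timestamps(observed, timedelta(minutes=1))
for timestamp, target in filled:
    print(timestamp.isoformat(), target)
```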
Finally, regarding monotonicity constraints, you cannot currently set those in AutoTS. If you require monotonicity constraints, I recommend framing your problem using OTV.
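Separately from model-level monotonic constraints, the specific condition in the question (volume within a fill can only stay flat or increase) can also be checked on the input data and enforced on predictions as a post-processing step. The sketch below is a generic running-maximum approach, not a DataRobot feature; the function names and the sample predictions are hypothetical.

```python
from itertools import accumulate


def is_non_decreasing(values):
    """True if every value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def enforce_non_decreasing(values):
    """Replace each value with the running maximum so the series never drops."""
    return list(accumulate(values, max))


# Hypothetical raw volume predictions within one fill step.
raw_predictions = [0.0, 1.2, 1.1, 2.4, 2.3, 3.0]
constrained = enforce_non_decreasing(raw_predictions)

print(is_non_decreasing(raw_predictions))  # the raw series dips twice
print(constrained)
```

A running maximum is the bluntest option: it flattens any dip rather than smoothing it, so it is best treated as a sanity guard on top of a model, not a substitute for a constrained model.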
Customer Facing Data Scientist and Inside Squad Team Captain
"Can Time Series discriminate the events (steps/fillings) independently? I mean, if I have 100 steps with their data, can TS analyze each step individually and come back with the prediction of the next one?"
The answer to this question, pending a few assumptions, is "yes". That is what the platform is designed to do.
Below you can see an example dataset; I think this is the easiest way to describe our approach. The target here is "Sales", and the features in the other columns vary with it. You can indicate within DataRobot whether these features are known in advance or only known at the time of measurement (in this case, the day). Notice that the values of the time feature (dates) are all equidistant. There is also a grouping column called "Store", which allows you to model multiple time series, one per store.
"And can TS impose some conditions to the features like having only increments (cumulative features)? (i.e. volume values inside the bucket cannot decrease, only be flat -no filling- or increase)"
DataRobot will automatically create hundreds of lagged features within a specified window relative to a forecast point to aid in the prediction. The platform will also look for longer-term cyclical trends. (Edit: You can set monotonic constraints on features only in OTV.)
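For readers unfamiliar with the term, a lagged feature is simply an earlier value of a column copied onto a later row. The hand-rolled sketch below shows the idea for a single fill step; it is only an illustration of the concept, not the feature derivation DataRobot actually performs, and the column name `flow` and function name `add_lag_features` are assumptions.

```python
def add_lag_features(rows, column, lags):
    """Return copies of rows with extra '<column>_lag<k>' entries.

    rows must be in time order and belong to a single step, so that lags
    never reach back into a previous fill. Lags that would fall before
    the first row are set to None.
    """
    result = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for k in lags:
            new_row[f"{column}_lag{k}"] = rows[i - k][column] if i - k >= 0 else None
        result.append(new_row)
    return result


# Hypothetical flow readings within one fill step.
step_rows = [{"flow": 1.0}, {"flow": 1.5}, {"flow": 2.0}, {"flow": 2.5}]
lagged = add_lag_features(step_rows, "flow", lags=(1, 2))
for row in lagged:
    print(row)
```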
Something to consider:
Generally, the data needs to have equal spacing between measurements, which may or may not be a problem in your case. I think if you artificially created dates for each observation and then included the days until the "next bucket fill" as a feature, you might be able to get around this. @chad is better versed in time series than I am, so please feel free to push back here, Chad.
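The workaround described above could look roughly like the sketch below: each fill gets an evenly spaced artificial date, and the real, irregular gap is kept as a numeric feature. The base date, the field names, and the function name `to_regular_series` are all hypothetical choices for illustration.

```python
from datetime import date, datetime, timedelta


def to_regular_series(fill_starts, base_date=date(2000, 1, 1)):
    """Assign one artificial day per fill and record the real gap to the next fill.

    fill_starts is a list of datetimes, one per fill. The last fill has no
    successor, so its gap is None.
    """
    ordered = sorted(fill_starts)
    rows = []
    for i, start in enumerate(ordered):
        if i + 1 < len(ordered):
            gap_days = (ordered[i + 1] - start).total_seconds() / 86400
        else:
            gap_days = None
        rows.append({
            "artificial_date": base_date + timedelta(days=i),
            "actual_start": start,
            "days_until_next_fill": gap_days,
        })
    return rows


# Hypothetical fill start times with irregular gaps.
starts = [
    datetime(2021, 3, 1, 8, 0),
    datetime(2021, 3, 2, 20, 0),
    datetime(2021, 3, 6, 8, 0),
]
series = to_regular_series(starts)
for row in series:
    print(row["artificial_date"], row["days_until_next_fill"])
```

Whether the gap to the next fill can be used as a feature depends on whether it is actually known at prediction time; if it is not, the gap since the previous fill is the safer variant.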
Do you know any of these features in advance? Meaning, at the time of prediction will some of the columns/data points be known? For example, in the scenario you explained will water pressure be something you control or strategically vary with each observation?
Does this answer your question?
Thanks for posting on our community!