I am trying to predict future company earnings using cross-sectional time-series data (panel data), and I do not see any DataRobot function supporting this type of data. I want to predict each company's earnings one quarter into the future, using the historical data for that company. When using time-series modeling, DataRobot does not appear to use a company's previous data as input when predicting that company's future earnings.
When creating a time-series model, the model performs very poorly, with a negative R². Using an out-of-time validation (OTV) model instead, the performance was better; however, DataRobot did not create lagged features. Is it possible to create lagged features for out-of-time modeling in DataRobot?
Creating separate models for each company is not an option, as this would result in more than 4,000 different models.
The table below gives an example of the layout of the data, using fictitious values. As shown, the same date occurs multiple times in the dataset.
| Date | EPS | Revenue | Net income | Total assets | 1-month stock return | Ticker |
|---|---|---|---|---|---|---|
Do you have any suggestions for how best to handle cross-sectional time-series data in DataRobot?
Your help is much appreciated.
Hi @Oliverdollerup - I've heard from some community members who are considering suggestions for your use case and question. Hopefully some folks can help you soon!
I do know that it will help to know which version of the DataRobot platform you're using: v5.3, v6.0, and so forth, or perhaps the AI Platform Trial? If you're using the Managed Cloud version, you can see the version number below the logo when you sign back in.
Hope you can reply with that info soon.
Firstly, time series models do not work well when predicting highly noisy or random events. Stock prices are a no-go without extensive expert knowledge and specialized data. Perhaps company earnings are easier, but I just wanted to set expectations. I'd expect a poor model score for such a difficult problem; anything better than the market baseline would be groundbreaking.
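To make the baseline point concrete: R² compares a forecast's squared error with that of always predicting the mean of the actuals, so it goes negative whenever the forecast is worse than that constant. The plain-Python sketch below uses hypothetical toy EPS values and a hypothetical model forecast. It shows that even the naive "last quarter's EPS" forecast can score a negative R², which is why comparing a model's MAE against such a baseline is often more informative for earnings. The last-value baseline and the helper names are my own choices for illustration, not necessarily what DataRobot uses.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot is the squared error of always predicting the mean of `actual`,
    so R^2 < 0 means the forecast is worse than that constant mean.
    """
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical quarterly EPS for one ticker (toy values, not real data).
eps = [1.00, 1.10, 1.05, 1.20, 1.15, 1.30]
actual = eps[1:]      # the quarters being forecast
baseline = eps[:-1]   # naive forecast: previous quarter's EPS

# Hypothetical model forecasts for the same five quarters.
model = [1.08, 1.07, 1.15, 1.17, 1.25]

baseline_r2 = r2_score(actual, baseline)  # negative on this toy series
model_r2 = r2_score(actual, model)
# A ratio below 1.0 means the model beats the naive baseline on MAE.
relative_mae = mae(actual, model) / mae(actual, baseline)
```

On this toy series the naive baseline has R² of about -0.62, so a negative R² on its own does not tell you whether a model is useless; the comparison against the baseline does.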
If you have the Automated Time Series product enabled, you can model both single-series datasets and multiseries datasets (time series panel data). For the example data table, EPS is the target, Date is the date column, and Ticker is the series ID. This means each stock forms its own series within a single project, rather than needing a separate model per company.
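Conceptually, setting Ticker as the series ID splits the panel into one date-ordered history per ticker, and, as I understand it, per-series derived features such as lags are computed from each ticker's own history. The plain-Python sketch below only illustrates that layout; it is not DataRobot code, and the helper names `split_into_series` and `has_fixed_quarterly_step` are hypothetical. It also checks the fixed quarterly spacing that makes this data a good fit for time series modeling. If you use the DataRobot Python client, I believe the same roles correspond to `DatetimePartitioningSpecification(datetime_partition_column='Date', use_time_series=True, multiseries_id_columns=['Ticker'])` with `EPS` as the target, but please check the client documentation for your version.

```python
from collections import defaultdict
from datetime import date

# Hypothetical panel rows in the layout from the question: the same Date
# appears once per Ticker. All values are made up.
rows = [
    {"Date": date(2019, 3, 31), "Ticker": "AAA", "EPS": 1.00},
    {"Date": date(2019, 3, 31), "Ticker": "BBB", "EPS": 0.40},
    {"Date": date(2019, 6, 30), "Ticker": "AAA", "EPS": 1.10},
    {"Date": date(2019, 6, 30), "Ticker": "BBB", "EPS": 0.45},
    {"Date": date(2019, 9, 30), "Ticker": "AAA", "EPS": 1.05},
    {"Date": date(2019, 9, 30), "Ticker": "BBB", "EPS": 0.50},
]


def split_into_series(rows, series_id="Ticker", date_col="Date"):
    """Group panel rows by series ID and sort each series by date."""
    series = defaultdict(list)
    for row in rows:
        series[row[series_id]].append(row)
    for history in series.values():
        history.sort(key=lambda r: r[date_col])
    return dict(series)


def month_index(d):
    """Months since year 0, so calendar-month gaps become simple differences."""
    return d.year * 12 + d.month


def has_fixed_quarterly_step(history, date_col="Date"):
    """True if consecutive rows of one series are exactly 3 months apart."""
    months = [month_index(r[date_col]) for r in history]
    return all(b - a == 3 for a, b in zip(months, months[1:]))


series = split_into_series(rows)
```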
For out-of-time validation (OTV) modeling, it would be very difficult to avoid overfitting for this type of use case. OTV projects do not automatically perform the feature derivation (such as lagged features) that time series projects do. However, this data suits a time series project, since the readings are at fixed time-step intervals (quarterly). That structure would capture information that is lost with OTV partitioning.
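If you do want to stay with OTV, one workaround is to derive the lags yourself before uploading the dataset. The sketch below is my own plain-Python illustration, not a DataRobot feature; `add_lag_features` and the column names are hypothetical. It assumes each ticker has exactly one row per quarter with no gaps, so "K rows earlier" means "K quarters earlier", and it never lets one ticker's values leak into another's.

```python
from collections import defaultdict


def add_lag_features(rows, columns, lags, series_id="Ticker", date_col="Date"):
    """Return copies of `rows` with per-series lagged columns added.

    `col_lagK` holds the value of `col` from K rows earlier in the same
    series (ticker), or None when that much history does not exist yet.
    Assumes one row per ticker per quarter with no gaps.
    """
    if any(k < 1 for k in lags):
        raise ValueError("lags must be >= 1 to avoid using same-quarter values")

    by_series = defaultdict(list)
    for row in rows:
        by_series[row[series_id]].append(row)

    out = []
    for history in by_series.values():
        history = sorted(history, key=lambda r: r[date_col])
        for i, row in enumerate(history):
            new_row = dict(row)  # copy so the input rows are not modified
            for col in columns:
                for k in lags:
                    new_row[f"{col}_lag{k}"] = history[i - k][col] if i >= k else None
            out.append(new_row)

    # Stable output order: by date, then by ticker.
    out.sort(key=lambda r: (r[date_col], r[series_id]))
    return out


# Hypothetical panel rows; ISO date strings sort chronologically.
rows = [
    {"Date": "2019-03-31", "Ticker": "AAA", "EPS": 1.00, "Revenue": 100.0},
    {"Date": "2019-03-31", "Ticker": "BBB", "EPS": 0.40, "Revenue": 50.0},
    {"Date": "2019-06-30", "Ticker": "AAA", "EPS": 1.10, "Revenue": 105.0},
    {"Date": "2019-06-30", "Ticker": "BBB", "EPS": 0.45, "Revenue": 52.0},
    {"Date": "2019-09-30", "Ticker": "AAA", "EPS": 1.05, "Revenue": 103.0},
    {"Date": "2019-09-30", "Ticker": "BBB", "EPS": 0.50, "Revenue": 55.0},
]

lagged = add_lag_features(rows, columns=["EPS", "Revenue"], lags=[1, 2])
```

The missing early lags (None) would simply become missing values in the uploaded file, with Date still used as the OTV partition column. For a genuine one-quarter-ahead forecast, I'd train only on the lagged columns and other information known before the quarter closes, since the same-quarter Revenue or Net income would leak the answer. This does not remove the overfitting risk mentioned above.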