Solved: Re: Chow test - In datarobot - DataRobot Community

MManojkumar · ‎06-11-2020

Hi everyone, Is there a way to perform a Chow test in DataRobot? Are there any alternatives or models that could substitute for such a test in DataRobot? Thanks

emily · ‎06-17-2020

Hi Mmanojkumar,

The Chow test is used primarily in two ways:

1. In time series, to see whether there is a break point or a change in the regression slope due to some event.
2. To test how independent variables affect different subpopulations within a dataset; for example, whether it is better to fit one linear model for the entire dataset or to split it into two.

Which goal are you trying to achieve? Or is there another outcome that this test addresses that I am missing?

Thanks,
Emily
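For readers unfamiliar with the test, here is a minimal sketch of the classic Chow F-test computed outside DataRobot with NumPy and SciPy. It fits one pooled linear regression and one regression per group, then compares the residual sums of squares. The function names (`rss`, `chow_test`) and the synthetic data are illustrative assumptions, not part of any DataRobot API.

```python
import numpy as np
from scipy import stats


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_test(X1, y1, X2, y2):
    """Chow F-test for equal regression coefficients in two groups.

    X1 and X2 must already contain an intercept column.
    Returns (F statistic, p-value).
    """
    k = X1.shape[1]                      # parameters per regression
    n1, n2 = len(y1), len(y2)
    rss_pooled = rss(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    rss_split = rss(X1, y1) + rss(X2, y2)
    dof = n1 + n2 - 2 * k                # residual degrees of freedom
    f_stat = ((rss_pooled - rss_split) / k) / (rss_split / dof)
    p_value = stats.f.sf(f_stat, k, dof)
    return f_stat, p_value


# Synthetic illustration (made-up data): the slope changes from 1.0 to 3.0 at x = 5.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = np.where(x < 5, 1.0 * x, 3.0 * x - 10.0) + rng.normal(0, 0.5, x.size)

before = x < 5
f_stat, p_value = chow_test(X[before], y[before], X[~before], y[~before])
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```

A small p-value suggests the two groups are better described by separate regressions. The same function covers both uses listed above: split by time for a break point, or split by subpopulation.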

MManojkumar · ‎06-29-2020

Hi Emily, Thanks for the response. You are right. We have been questioned about the lack of temporal and spatial variables in our DataRobot model. We explained that DataRobot found no significance for those variables, but they are asking us to do a Chow test to prove that both temporal and spatial variables are not important. That's why I was trying to understand whether there is a direct way to do such a test in DataRobot, or any alternatives.

emily · ‎06-29-2020

Hi NiCdBattery,

One statistical option is to train the models with and without those variables and export the error metric for each row; how you do this will depend on which metric you are using. You will then end up with two or three distributions: an error distribution for the model with spatial/temporal data and an error distribution for each model without it. You can then run a non-parametric comparison (such as a Wilcoxon test) to see whether those distributions are significantly different. If they are not, you have statistical support for the claim that the temporal/spatial features do not materially improve the model.
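The row-level comparison described here can be sketched in Python once the predictions are exported. This is a minimal sketch under assumptions: the arrays below are synthetic placeholders standing in for exported actuals and predictions, absolute error is used as the per-row metric, and `compare_row_errors` is a hypothetical helper name. Because both models are scored on the same rows, the paired Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) is used.

```python
import numpy as np
from scipy import stats


def compare_row_errors(actual, pred_a, pred_b):
    """Paired Wilcoxon signed-rank test on per-row absolute errors.

    Returns (test statistic, p-value).
    """
    err_a = np.abs(actual - pred_a)
    err_b = np.abs(actual - pred_b)
    statistic, p_value = stats.wilcoxon(err_a, err_b)
    return statistic, p_value


# Placeholder data; in practice, load the exported predictions of each model.
rng = np.random.default_rng(42)
n_rows = 500
actual = rng.normal(100.0, 15.0, n_rows)
pred_with = actual + rng.normal(0.0, 5.0, n_rows)      # model with spatial/temporal features
pred_without = actual + rng.normal(0.0, 5.0, n_rows)   # model without them

statistic, p_value = compare_row_errors(actual, pred_with, pred_without)
print(f"Wilcoxon statistic = {statistic:.1f}, p = {p_value:.3f}")
```

A large p-value means no difference in row-level error was detected between the two models; it is worth reporting the error distributions alongside it rather than the p-value alone.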

You can also support this finding by showing low feature impact and minimal partial dependence effects for those features in the model that includes them.

I hope this helps,

Emily

MManojkumar · ‎06-30-2020

Hi Emily,

Thanks for the suggestions, that makes sense. I will try that.
