What is the difference between Series ID and Segment ID?

Hello

 

What is the difference between Series ID and Segment ID?

I understand that if I choose a Segment ID, I get a combined model that consists of the best model for each segment.

If I choose a Series ID and don't choose a Segment ID, do I not get a combined model?

 

If I choose only a Series ID, does the champion model consist of independent models, one per series ID?

In that case, how can I find the Feature Impact for each model rather than for a combined one?

My guess is that choosing only a Series ID is not segmented modeling.

It looks like a single model.

What does it mean to choose only a Series ID, or not to choose one?

 

Thank you.

Accepted Solutions
jenD
DataRobot Employee

Hello @cookie_yamyam, thanks for the question and for exploring segmented modeling! You are correct: DataRobot does not create a champion model when you set only the Series ID. In that case, you run multiseries modeling, which creates a single model. From the documentation:

 

If you select segmented modeling, DataRobot creates individual sets of models for each segment (and then automatically combines the best model per segment to create a single deployment). If you don’t select segmented modeling, DataRobot creates, from the dataset, a single model representing all series.

 

To find Feature Impact for a multiseries model, use the same Understand > Feature Impact tab as for any regular time series model (or non-time-series model, for that matter). For segmented modeling, open the Combined Model’s Leaderboard to explore the individual segments, including their Feature Impact. Hope that helps!

 

jen

 

matt
Data Scientist

Just to add a bit to what Jen said:

 

I understand that if I choose a Segment ID, I get a combined model that consists of the best model for each segment.

If I choose a Series ID and don't choose a Segment ID, do I not get a combined model?

A combined model, using segmentation, is somewhat different from a typical multiseries modeling approach. Think of it with this example:

 

I have a dataset that contains 100 time series. These series may or may not be related to one another. I could potentially create a "model" in a few ways:

  • I could manually split my data into 100 single-series problems.
  • I could add the whole dataset to DataRobot and model it in a single project.
  • I could add the whole dataset to DataRobot and model it in a segmented project.

So, knowing these are my options, what is the difference? If I select only a multiseries_id, I will be using a single project within DataRobot. This builds one model that is capable of predicting all the series. If my series are related, this is a great choice because it's much faster to do and simpler to follow!

 

What if my data really doesn't have a lot in common? Say I am a grocery store and my dataset is just 100 different foods in the store. These might have similar sales patterns, but they might also be largely unrelated. Or say I am a bank and I have 100 customers. Some might have large credit spending and large transaction accounts, while others only ever add a small amount each month to savings. Depending on the use case, you may want to build a project that can predict each of these different groups very accurately. Can this be done with a single model? Possibly, but you may lose a lot of accuracy trying to fit a single time series model to such a diverse set of series.
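To make the accuracy point concrete, here is a toy sketch in plain Python. It is not DataRobot code: the two customer groups, their amounts, and the "model" (a single constant fitted by averaging) are all made-up assumptions, and the error is measured in-sample only to keep the example short.

```python
from statistics import mean

# Hypothetical monthly amounts for two very different customer groups.
history = {
    "savings": [10, 12, 11, 9],
    "credit": [1000, 1100, 900, 1000],
}


def mean_abs_error(actuals, prediction):
    """Average absolute gap between each actual value and one constant prediction."""
    return mean(abs(a - prediction) for a in actuals)


# One pooled "model": a single constant fitted to every series at once.
all_values = [v for values in history.values() for v in values]
pooled_prediction = mean(all_values)
pooled_mae = mean_abs_error(all_values, pooled_prediction)

# One "model" per group: a constant fitted to each group separately.
per_group_errors = [
    abs(v - mean(values)) for values in history.values() for v in values
]
segmented_mae = mean(per_group_errors)

print("pooled MAE:", pooled_mae)
print("per-group MAE:", segmented_mae)
```

The pooled constant lands between the two groups and fits neither, while the per-group constants track each group closely. A real multiseries model is far more flexible than a single constant, so the gap is usually much smaller in practice; the sketch only shows why very diverse series can pull a shared model in conflicting directions.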

 

This is where the Segment_ID and the Combined Model come in. You can have a single segmented modeling project, but each Segment_ID gets an entirely separate Autopilot run. The major benefit is that, from an accuracy and usability perspective, this acts like just ONE model. For predictions and deployments, you interact with just the one segmented model, and it handles everything else for you, even though in reality each segment is its own entire Autopilot run.
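The idea can be sketched in a few lines of plain Python. This is only a conceptual illustration and not how DataRobot implements it: the `CombinedModel` class, the two candidate models, and the one-point holdout are hypothetical stand-ins for a real Autopilot run per segment.

```python
from statistics import mean


def mean_model(train):
    """Candidate 1: always predict the average of the training values."""
    level = mean(train)
    return lambda: level


def last_value_model(train):
    """Candidate 2: always predict the most recent training value."""
    last = train[-1]
    return lambda: last


CANDIDATES = {"mean": mean_model, "last_value": last_value_model}


def pick_champion(series):
    """Stand-in for one Autopilot run: score each candidate on a one-point
    holdout, then refit the winner on the full series."""
    train, holdout = series[:-1], series[-1]
    scores = {name: abs(fit(train)() - holdout) for name, fit in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, CANDIDATES[best](series)


class CombinedModel:
    """One object to call; it routes each request to that segment's champion."""

    def __init__(self, data_by_segment):
        self.champions = {
            segment: pick_champion(series)
            for segment, series in data_by_segment.items()
        }

    def predict(self, segment_id):
        _, model = self.champions[segment_id]
        return model()


# Hypothetical segments with different behavior.
data = {
    "trending": [1, 2, 3, 4, 5],
    "flat": [10, 14, 10, 14, 12],
}
combined = CombinedModel(data)
for segment in data:
    name, _ = combined.champions[segment]
    print(segment, "champion:", name, "prediction:", combined.predict(segment))
```

Each segment ends up with whichever candidate suits it best, yet the caller only ever talks to `combined`, which mirrors the "acts like one model" behavior described above.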

I really appreciate your kind answers!

They really helped me understand.

 

Thank you very much.
