Re: tools to address low frequency high separatio... - DataRobot Community

elenaSP · ‎12-16-2019

Does DataRobot have tools to address low-frequency, high-separation variables? For example, there are 10 variables in my data that each appear less than 10 bps (0.1%) of the time, but when they appear they drastically change the averages. Is there a mechanism to handle such information, given that the model is still throwing these variables out because of their low frequency?

duncanrenfrow · ‎12-17-2019

Hi @elenaSP ,

Here's my initial thought: DataRobot heavily uses tree-based ensemble methods (random forests, gradient boosted trees, etc.). These trees split on variables to maximize information gain. If there are low-frequency variables that cleanly separate different classes, I suspect the tree-based methods will preferentially split on these variables regardless of their frequency.
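To make the information-gain argument concrete, here is a minimal sketch in plain Python. It is a generic illustration, not DataRobot's implementation: the dataset is synthetic, and the names `rare_flag`, `noise_flag`, and `target` are made up for the example. It assumes a binary classification target, a flag that fires in 10 of 10,000 rows (10 bps) and always coincides with the positive class, and a second flag that is unrelated to the target by construction.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(labels, mask):
    """Entropy reduction from splitting `labels` by a boolean `mask`."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


N = 10_000

# Hypothetical rare variable: present in only the first 10 rows (10 bps).
rare_flag = [i < 10 for i in range(N)]

# Hypothetical frequent variable with no designed relationship to the target:
# it alternates in blocks of ten rows.
noise_flag = [(i // 10) % 2 == 0 for i in range(N)]

# Target: always 1 when the rare flag fires, otherwise 1 in every tenth row.
target = [1 if (i < 10 or i % 10 == 0) else 0 for i in range(N)]

gain_rare = information_gain(target, rare_flag)
gain_noise = information_gain(target, noise_flag)

print(f"gain from rare flag:  {gain_rare:.6f} bits")
print(f"gain from noise flag: {gain_noise:.6f} bits")
```

In this setup the rare flag yields a larger gain than the uninformative frequent flag, which is the behavior described above. Note that the gain is weighted by how many rows land on each side of the split, so its absolute value stays small for a flag this rare; whether a given tree actually picks it depends on what other features it is competing against.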

I may need more context though: is this a regression or classification problem? And what is the use case?
