
Tools to address low-frequency, high-separation variables?

Blue LED

Does DataRobot have tools to address low-frequency, high-separation variables? For example, there are 10 variables in the data that appear less than 10 bps (0.1%) of the time, but when they do appear they drastically change the averages. Is there a mechanism to capture that information when the model would otherwise throw these variables out because of their low frequency?

Data Scientist

Hi @elenaSP,

Here's my initial thought: DataRobot relies heavily on tree-based ensemble methods (random forests, gradient-boosted trees, etc.). These trees choose each split to maximize information gain (or a similar impurity reduction). If there are low-frequency variables that cleanly separate the classes, I suspect the tree-based methods will still preferentially split on them, regardless of their frequency.
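To make the split argument concrete, here is a minimal sketch in plain Python. It is not DataRobot code, and the data and the names `rare_flag` and `common_flag` are hypothetical. It computes entropy-based information gain for a binary split, the criterion used by classic decision trees, on 10,000 rows where the positive class occurs only 10 times (10 bps).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Parent entropy minus the size-weighted entropy of the two children of a binary split."""
    left = [y for x, y in zip(feature, labels) if x]
    right = [y for x, y in zip(feature, labels) if not x]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


n = 10_000
# Hypothetical labels: rows 0-9 are positive (0.1% of rows), the rest negative.
labels = [1 if i < 10 else 0 for i in range(n)]

# Rare flag present on exactly the 10 positive rows: perfect separation.
rare_flag = [i < 10 for i in range(n)]

# Common flag on half the rows (5,000), catching only 6 of the 10 positives.
common_flag = [i < 6 or 10 <= i < 5_004 for i in range(n)]

gain_rare = information_gain(rare_flag, labels)
gain_common = information_gain(common_flag, labels)
print(f"rare flag:   {gain_rare:.6f} bits")
print(f"common flag: {gain_common:.6f} bits")
```

Because the rare flag isolates the rare class exactly, both children are pure and its gain equals the full parent entropy, the largest value any split can reach, so a greedy tree would prefer it over the common flag. The caveat is that child entropies are weighted by node size: a rare flag that only partly lines up with the outcome earns very little gain, which is why the argument depends on the variables separating the target cleanly.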

I may need more context, though: is this a regression or a classification problem? And what is the use case?
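If it turns out to be a regression problem (the question mentions averages), the tree analogue of information gain is variance reduction, the MSE criterion used by CART-style regression trees. Below is a minimal sketch under the same assumptions: hypothetical, deterministic data and plain Python, not DataRobot's implementation. Ten rare rows have a target of 1000 while the rest sit around 100.

```python
from statistics import pvariance


def variance_reduction(feature, target):
    """Drop in size-weighted target variance from splitting on a binary feature."""
    left = [y for x, y in zip(feature, target) if x]
    right = [y for x, y in zip(feature, target) if not x]
    n = len(target)
    # Population variance of each non-empty child, weighted by its share of the rows.
    weighted = sum(len(part) / n * pvariance(part) for part in (left, right) if part)
    return pvariance(target) - weighted


n = 10_000
# Hypothetical target: rows 0-9 are the rare, extreme cases; the rest alternate 99/101.
target = [1000.0 if i < 10 else (99.0 if i % 2 == 0 else 101.0) for i in range(n)]

rare_flag = [i < 10 for i in range(n)]        # present on 0.1% of rows
common_flag = [i < n // 2 for i in range(n)]  # half split that does not isolate the rare rows

vr_rare = variance_reduction(rare_flag, target)
vr_common = variance_reduction(common_flag, target)
print(f"rare flag:   {vr_rare:.3f}")
print(f"common flag: {vr_common:.3f}")
```

Even though the rare flag touches only 0.1% of rows, separating those extreme values removes almost all of the target variance, so a split on it dominates a split on an uninformative common flag.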
