What are the ‘entropy’ feature(s) that were auto created?

rking
Blue LED

My OTV project includes multiple datasets. After running autopilot I see I have many more features (looking at the Informative Features) and that some are named “entropy” … like “(60 days entropy)”
What are these ‘entropy’ feature(s) that DataRobot derived? How should I use them?

vyas.adhikari
Data Scientist
It's a measure of how messy and unpredictable an array of values is (e.g. how diverse a categorical column is). It is therefore a measure of unpredictability, or disorder, in the data: low entropy means the values are predictable, high entropy means they are diverse. It can be applied to categoricals where there's no ordering defined. Here's how we calculate it: for entropy features we use the Shannon entropy formula, H = -Σ p_i ln(p_i), where p_i is the fraction of observations that take the i-th unique value. Note we use the natural log (base e) instead of log base 2. Thus the unit is nats instead of bits, mainly because log2 requires one additional step in Java. The range of entropy depends on the number of unique values observed: the maximum is ln(k) for k unique values, so it can be >1 when there are more than 2 unique values (ln 3 ≈ 1.10).
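To make the calculation concrete, here is a minimal Python sketch of Shannon entropy with the natural log. It assumes the standard formula over the observed value frequencies; it is not DataRobot's actual implementation, and the function name `shannon_entropy` is just an illustrative choice.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy of the observed values, in nats (natural log).

    Illustrative helper only; not DataRobot's internal code.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum(p * ln(1/p)), which equals -sum(p * ln(p))
    return sum((c / total) * math.log(total / c) for c in counts.values())

# One unique value: fully predictable, entropy is 0
single = shannon_entropy(["a", "a", "a"])

# Two equally likely values: ln(2), which is below 1
two = shannon_entropy(["a", "b"])

# Three equally likely values: ln(3), which is above 1
three = shannon_entropy(["a", "b", "c"])
```

With two equally frequent values the result is ln(2), below 1; with three it is ln(3), above 1, which is why the feature can exceed 1 once a window contains more than two distinct values.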
doyouevendata
DataRobot Employee

Note that at this point you can use them like any other feature. Since DataRobot derived them, you will not need to provide them if you deploy a model that uses them: DataRobot will derive them during a scoring request, as long as you provide the input features they are built from.

rking
Blue LED

@vyas.adhikari @doyouevendata appreciate your help

yzhang4
Blue LED

Hi vyas, may I know why the entropy is larger than 1? How is it calculated and interpreted? Thanks.

vyas.adhikari
Data Scientist
Hi yzhang4, see updated answer above