Solved: What are the ‘entropy’ feature(s) that were auto created? - DataRobot Community

rking · ‎01-06-2021

My OTV project includes multiple datasets. After running autopilot I see I have many more features (looking at the Informative Features) and that some are named “entropy” … like “(60 days entropy)”
What are these ‘entropy’ feature(s) that DataRobot derived? How should I use them?

vyas.adhikari · ‎01-06-2021

Entropy is a measure of how messy and unpredictable a set of values is (e.g. how diverse a categorical column is). It's therefore a measure of unpredictability, or disorder, in the data: low entropy means the values are concentrated and predictable, high entropy means they are diverse. It can be applied to categoricals, where no ordering is defined. Here's how we calculate it: for entropy features we use the Shannon entropy formula. Note that we use the natural logarithm (base e) instead of log2, so the unit is nats rather than bits, mainly because log2 requires one additional step in Java. The range of entropy depends on the number of unique values observed, and it can be >1 when there are more than 2 unique values.
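As a rough illustration of the calculation described above, here is a minimal Python sketch of Shannon entropy with the natural logarithm. The function name `shannon_entropy_nats` and the sample values are made up for this example. How DataRobot builds the time windows (such as "60 days") or treats missing values is not covered in this thread, so this should not be read as DataRobot's actual implementation.

```python
import math
from collections import Counter


def shannon_entropy_nats(values):
    """Shannon entropy of the observed values, in nats (natural log)."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        # Assumption for this sketch: an empty window has zero entropy.
        return 0.0
    # Sum of p * ln(1 / p) over the distinct values, where p = count / total.
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hypothetical categorical values observed inside one window.
constant = shannon_entropy_nats(["a", "a", "a"])      # one unique value
two = shannon_entropy_nats(["a", "b", "a", "b"])      # two equally likely values
three = shannon_entropy_nats(["a", "b", "c"])         # three equally likely values

print(constant, two, three)
```

With a single unique value the entropy is 0, with two equally frequent values it equals ln 2 (below 1), and with three equally frequent values it equals ln 3 (above 1).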

doyouevendata · ‎01-07-2021

Note that at this point you can use them like any other feature - but since DataRobot derived them, you will not need to provide them should you choose to deploy a model that leverages them. DataRobot will derive them during a scoring request, as long as you provide the input features associated with them.

rking · ‎01-10-2021

@vyas.adhikari @doyouevendata appreciate your help

yzhang4 · ‎01-31-2021

Hi vyas, may I know why the entropy is larger than 1? How is it calculated and interpreted? Thanks.

vyas.adhikari · ‎02-01-2021

Hi yzhang4, see updated answer above
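For readers wondering about the range, the standard bound on Shannon entropy may help. This is a textbook property rather than anything specific to DataRobot, and the symbols $k$ (number of unique values) and $p_i$ (share of observations taking the $i$-th value) are introduced here only for illustration:

```latex
H = -\sum_{i=1}^{k} p_i \ln p_i, \qquad 0 \le H \le \ln k
```

The maximum $\ln k$ is reached when all $k$ values are equally frequent. Since $\ln 2 \approx 0.693$ and $\ln 3 \approx 1.099$, the entropy in nats can only exceed 1 once there are at least three unique values. Dividing by $\ln 2$ converts nats to bits.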
