Hello, I have a primary dataset and multiple secondary datasets to join with it. How can we make DataRobot keep all the original fields (regardless of whether they are informative) and create calculated fields from two or more features before modeling starts?
For example, I have an alert date in the primary data, and each row can be joined to multiple account opening dates in the secondary data. I would like to calculate the difference between the account opening date and the alert date as the account age; however, the account open date is discarded by DataRobot.
Please advise, thanks
Hey @Jieruide - Thanks for sharing that guidance back to the community. Hoping this means you're moving forward again to predictions!
Hi @joao, I talked to DataRobot, and they suggested that I add a secondary table that has a one-to-one relationship with the primary table, WITHOUT the time-aware setup. I had a similar thought to what you suggested, but then the lookback period would need to be set to accommodate the earliest open date.
Because my secondary table has a many-to-one relationship with the primary table, I ended up deriving the feature myself and summarizing it with avg, min, or max, e.g. avg account open date.
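In case it helps anyone else, here is a minimal sketch of that kind of manual derivation in pandas, done before uploading the data. The table layout and column names (`customer_id`, `alert_date`, `account_open_date`) and the helper `add_account_age_features` are made up for illustration and are not from my actual project. Note that it summarizes the account age in days rather than the raw open date.

```python
import pandas as pd

# Hypothetical primary table: one row per alert.
primary = pd.DataFrame({
    "customer_id": [1, 2],
    "alert_date": pd.to_datetime(["2023-06-01", "2023-07-15"]),
})

# Hypothetical secondary table: many accounts per customer.
secondary = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "account_open_date": pd.to_datetime(
        ["2020-06-01", "2022-06-01", "2023-07-05"]
    ),
})


def add_account_age_features(primary, secondary):
    """Join accounts to alerts, compute account age in days, and
    summarize it per alert with mean, min, and max."""
    joined = primary.merge(secondary, on="customer_id", how="left")
    joined["account_age_days"] = (
        joined["alert_date"] - joined["account_open_date"]
    ).dt.days
    summary = (
        joined.groupby(["customer_id", "alert_date"])["account_age_days"]
        .agg(["mean", "min", "max"])
        .add_prefix("account_age_days_")
        .reset_index()
    )
    # One row per alert again, so the result is one-to-one with the primary table.
    return primary.merge(summary, on=["customer_id", "alert_date"], how="left")


features = add_account_age_features(primary, secondary)
print(features)
```

The output has one row per alert, so it can be used as the primary dataset directly or as a one-to-one secondary table.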
Hi @Jieruide, I suggest you use the open date as the time-aware feature so that the date difference with the primary date (alert date) is computed. Note that you should set a feature derivation window large enough to ensure good coverage from that secondary dataset. Let me know if it works. Thanks
Hi joao, thanks for the answer. I did create a custom feature list and used it in the relationship editor. I also turned off supervised feature reduction. Overall I got more features, but the date features are still discarded. To provide more info:
Please advise, thanks
In the AI Catalog, you can create a custom feature list on your secondary dataset that includes the non-informative feature. You can then use that custom feature list in the relationship editor, and DataRobot will use the 'account opening' feature when searching for date differences. If the difference between 'account opening' and 'alert date' is informative, it will be available in your project before modeling. You can also inspect the feature derivation log to confirm that the features in your custom feature list have been included.