Selecting Target Variables in Modeling

Community Team

(Part of a model building learning session series.)

Many people who are new to data science struggle with the preparation of the data. This is not surprising, given the many tutorials available on how to build models using already prepared data (think: Kaggle).

In this learning session, we spend some time exploring an important piece that is often missing: creating the target variable.
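The session does not prescribe any code, but a small example helps show what "creating the target variable" means in practice. Below is a minimal Python sketch of deriving a churn target. It assumes a hypothetical list of (customer_id, purchase_date) records, a cutoff date, and a 30-day prediction window: a customer who was active before the cutoff is labeled 1 (churned) if they make no purchase in the 30 days that follow. The function name, record schema, and window length are illustrative assumptions, not DataRobot functionality.

```python
from datetime import date, timedelta


def make_churn_target(purchases, cutoff, prediction_window_days=30):
    """Label customers as churned (1) or retained (0).

    purchases: list of (customer_id, purchase_date) tuples (hypothetical schema).
    Only customers with a purchase before `cutoff` get a label; a customer is
    retained if they buy at least once in [cutoff, cutoff + window).
    """
    window_end = cutoff + timedelta(days=prediction_window_days)
    # Customers we already know about at prediction time.
    active_before = {c for c, d in purchases if d < cutoff}
    # Customers who bought again inside the prediction window.
    bought_in_window = {c for c, d in purchases if cutoff <= d < window_end}
    return {c: 0 if c in bought_in_window else 1 for c in sorted(active_before)}


purchases = [
    ("a", date(2020, 1, 5)),
    ("a", date(2020, 2, 10)),
    ("b", date(2020, 1, 20)),
    ("c", date(2020, 2, 15)),  # new customer, first seen after the cutoff
]
labels = make_churn_target(purchases, cutoff=date(2020, 2, 1))
print(labels)  # {'a': 0, 'b': 1}
```

Restricting labels to customers seen before the cutoff keeps brand-new customers (like "c") out of the target, and any features for this target should be built only from data before the cutoff.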

Hosts

  • Lukas Innig (DataRobot, Customer Facing Data Scientist)
  • Rajiv Shah (DataRobot, Customer Facing Data Scientist)

Now what? 

After watching the learning session, you should check out more information about target selection and preparing data.

  • Target Variable (DataRobot Wiki)
  • DataRobot licensed customers: search in-app Platform documentation for Modeling workflow, and locate the information "Setting the target feature."

Questions?

Also, if you have comments or questions that weren't answered in the learning session, click Comment (below) and post them now.

5 Comments
Computer Board

Great video, since this is an important topic in data science that isn't commonly discussed.

The customer churn example reminded me a lot of this blog post: https://blog.featurelabs.com/prediction-engineering-churn/.

They have a similar diagram to the ones you show in the video.

[Image: churn diagram (churn.png)]


Do you know of any other resources or tools that deal with this kind of prediction engineering?

Data Scientist
Hi Joe, thank you for your interest in this. There is more information specifically on churn prediction here: https://blog.datarobot.com/predicting-churn-how-data-can-help-with-customer-retention

Other than that, I can recommend the TWIML podcast. They recently had an episode on these types of challenges in ML: https://twimlai.com/twiml-talk-349-turning-ideas-into-ml-powered-products-with-emmanuel-ameisen/

Cheers,
Lukas
Ball Bearing

Excellent video. Many times we forget to ask the correct question in the design of our models.

I have a question regarding the FDW (Feature Derivation Window). Personally, I have only worked with one type of FDW: features aggregated/calculated from the last 12 months of data.
What do you think about having multiple FDWs (i.e., features at 6 months, 3 months, or 1 month)?

Data Scientist

I often recommend trying different Feature Derivation Windows (FDWs) based on your problem to see what works best. In our Time Series Starter course, we actually have you try different FDWs.

It's a bit more complicated, but I could see mixing features with different derivation windows. Some features, like weather, may not be relevant over a long feature derivation window. Other features, like job satisfaction, might be relevant over longer periods of time. If we are measuring your happiness at work, your satisfaction 6 months ago could be a useful factor, as could the weather yesterday. Does this make sense?
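To make the idea of mixing windows concrete, here is a minimal Python sketch that computes the same aggregates over several FDWs ending at a cutoff date. The (event_date, value) schema, the helper name `window_features`, and the use of 30/90/180 days as stand-ins for 1/3/6 months are assumptions for illustration. This is not DataRobot's own feature derivation; it only illustrates the concept of deriving one signal over multiple windows.

```python
from datetime import date, timedelta


def window_features(events, cutoff, windows_days=(30, 90, 180)):
    """Aggregate one numeric signal over several feature derivation windows.

    events: list of (event_date, value) for a single entity (hypothetical schema).
    Each window covers [cutoff - days, cutoff), so no data from on or after
    the cutoff leaks into the features.
    """
    features = {}
    for days in windows_days:
        start = cutoff - timedelta(days=days)
        values = [v for d, v in events if start <= d < cutoff]
        features[f"sum_{days}d"] = sum(values)
        features[f"count_{days}d"] = len(values)
    return features


events = [
    (date(2020, 1, 25), 10.0),
    (date(2020, 1, 1), 5.0),
    (date(2019, 9, 1), 2.0),
]
feats = window_features(events, cutoff=date(2020, 2, 1))
print(feats)
```

A short window (30 days) reacts to recent behavior, like yesterday's weather in the example above, while the longer windows capture slower-moving signals; a model can then pick up whichever windows turn out to be predictive.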

Community Team

@Carlos_Mariona Sorry, your question somehow got lost! Hopefully @rshah was able to help. Please let us know.
