Best Practices for Imbalanced Data and Partitioning

Community Team

(Part of a model building learning session series.)

In this two-part learning session we discuss best practices around data partitioning and working with imbalanced datasets.

Five-fold cross-validation is often treated as the silver bullet for partitioning data for validation, but it comes with some dangerous caveats that you need to be aware of to make sure you're building robust models. In part 1 of this learning session, we talk about those pitfalls and outline strategies for handling them.
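As an illustration of one commonly cited caveat (not necessarily the one the session focuses on): when rows are ordered in time, random folds can let a model train on data from after the period it is validated on. The sketch below shows a simple out-of-time split in plain Python. The function name `out_of_time_split`, the `day` field, and the 20% holdout are assumptions made for this example, not DataRobot settings.

```python
from datetime import date, timedelta


def out_of_time_split(rows, date_key, holdout_fraction=0.2):
    """Hold out the most recent rows for validation (hypothetical helper)."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    # Sort oldest to newest so the validation rows are strictly the latest ones.
    ordered = sorted(rows, key=lambda row: row[date_key])
    n_holdout = max(1, round(len(ordered) * holdout_fraction))
    cut = len(ordered) - n_holdout
    return ordered[:cut], ordered[cut:]


# Toy data: ten daily rows with an alternating binary target.
rows = [
    {"day": date(2020, 1, 1) + timedelta(days=i), "target": i % 2}
    for i in range(10)
]
train, valid = out_of_time_split(rows, "day")
```

Every validation row here is later than every training row, which is the property a random five-fold split does not guarantee.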

Binary target variables are very common in data science use cases, and many of them are severely imbalanced. When you're building models for infrequent events, such as predicting fraud or identifying product failures, it's important to watch out for imbalance in your data. In part 2 of this learning session, we discuss strategies for working with imbalanced datasets and provide some rules of thumb for these types of use cases.
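One way the two topics meet is stratified partitioning: with a rare positive class, purely random folds can end up with very few positives in some folds. The following is a minimal sketch in plain Python, assuming a 10% positive rate and five folds. The helper `stratified_folds` is hypothetical and is not the session's or DataRobot's implementation.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each row a fold number so every class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [0] * len(labels)
    for indices in by_class.values():
        # Shuffle within the class, then deal rows out to folds round-robin.
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[idx] = position % n_folds
    return folds


# Toy imbalanced target: 10 positives (e.g. fraud) among 100 rows.
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels)
positives_per_fold = [
    sum(1 for fold, label in zip(folds, labels) if fold == k and label == 1)
    for k in range(5)
]
fold_sizes = [folds.count(k) for k in range(5)]
```

With these toy numbers, each fold receives the same share of the rare class, so every validation fold has positives to score against.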

Hosts

  • Matt Marzillo (DataRobot, Customer Facing Data Scientist)
  • Mitch Carmen (DataRobot, Customer Facing Data Scientist)
  • Jack Jablonski (DataRobot, AI Success Manager)

Now what?

After watching this two-part learning session, you should check out these resources for more information.

 DataRobot Community:

DataRobot licensed customers: search the in-app Platform Documentation for "Show Advanced Options link", "Date/time partitioning (and out-of-time validation)", or "Early release: Profit curve and payoff matrices" (within the Version 6.0.0 release notes).

Questions?

Also, if you have comments or questions that weren't answered in the learning session, you can send email to aisuccess-learningsessions@datarobot.com or click Comment (below) and post them now. We're looking forward to hearing from you!
