May 26: Best Practices for Imbalanced Data and Partitioning

Community Team

(Part of a model building learning session series.)

In this two-part learning session we discuss best practices around data partitioning and working with imbalanced datasets.

Five-fold cross-validation is often treated as the silver bullet for partitioning your data for validation, but there are some dangerous caveats you need to be aware of to make sure that you're building robust models. In part 1 of this learning session, we talk about those pitfalls and outline strategies for handling them.

Binary target variables are very common in data science use cases, and many of them are severely imbalanced. When you're building models for infrequent events, such as predicting fraud or identifying product failures, it's important to watch out for imbalance in your data. (In part 2 of this learning session, we discuss strategies for working with imbalanced datasets and provide some rules of thumb for these types of use cases.)

When: Tuesday, May 26, 2020 11:00 AM - 12:00 PM EDT (Part 1)

Click here to register now.


  • Matt Marzillo (DataRobot, Customer Facing Data Scientist)
  • Mitch Carmen (DataRobot, Customer Facing Data Scientist)
  • Jack Jablonski (DataRobot, AI Success Manager)

After registering, you will receive a confirmation email containing information about joining the learning session.

Already have questions for the hosts?

If there are topics or questions you'd like our hosts to cover during the learning session, click Comment (below) and post them now, and we'll make sure to address them. We're looking forward to hearing from you!

Looking for live, instructor-led classes? See details: DataRobot University