Best Practices for Building ML “Learning Datasets”

The science of building datasets to teach your ML models

Did you know that, by a commonly cited estimate, about 80% of a data scientist’s time is spent finding, cleaning, and reorganizing data? Several factors contribute to this:

  • We have more data than ever to derive insights from, but it comes from more places and in more dissimilar shapes than ever before. We have best-of-breed SaaS apps for every part of our business and more cloud data sources on distinct cloud platforms. On top of that, we have data coming from sensors and devices, all in different shapes and sizes. Getting actionable insights from this data requires combining, cleaning, and standardizing it. For most organizations today, that means many people with various skill sets using multiple tools, languages, and platforms to get the job done.
  • To compound the situation, relying on highly skilled IT resources for data preparation is not feasible: it takes those resources away from other mission-critical application tasks and creates a bottleneck, with business requests for new datasets taking longer and longer to fulfill.
  • Lastly, not only is the pace of business driving a sharp increase in the number of requests, but there is often no longer a single view of the business. Many questions are specific to the issue the user is working on, so business context becomes paramount. Relying on scarce IT resources then often leads to multiple iterations before the right data is ready for model building.
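
To make "combined, cleaned, and standardized" concrete, here is a minimal sketch in plain Python. It is not DataRobot's data prep product or its API. The two sources (`crm_rows`, `billing_rows`), their field names, and the helper functions are all hypothetical, chosen only to show the kind of work involved when the same customer appears in differently shaped data.

```python
from datetime import datetime

# Hypothetical records from two sources with different shapes (illustrative only).
crm_rows = [
    {"Customer ID": " 001 ", "Signup": "2020-01-15", "Region": "emea"},
    {"Customer ID": "002", "Signup": "2020-02-01", "Region": "AMER"},
    {"Customer ID": "002", "Signup": "2020-02-01", "Region": "AMER"},  # duplicate row
]
billing_rows = [
    {"cust": "1", "total_usd": "120.50"},
    {"cust": "2", "total_usd": ""},  # missing value
]


def standardize_id(raw):
    """Normalize IDs such as ' 001 ' and '1' to the same integer key."""
    return int(raw.strip())


def clean_crm(rows):
    """Standardize field names and formats; later duplicates overwrite earlier ones."""
    cleaned = {}
    for row in rows:
        cid = standardize_id(row["Customer ID"])
        cleaned[cid] = {
            "customer_id": cid,
            "signup_date": datetime.strptime(row["Signup"], "%Y-%m-%d").date(),
            "region": row["Region"].strip().upper(),
        }
    return cleaned


def clean_billing(rows):
    """Convert amounts to floats; keep missing amounts as None."""
    return {
        standardize_id(r["cust"]): float(r["total_usd"]) if r["total_usd"] else None
        for r in rows
    }


def combine(crm, billing):
    """Left-join billing totals onto the CRM records by customer_id."""
    return [
        {**record, "total_usd": billing.get(cid)}
        for cid, record in sorted(crm.items())
    ]


dataset = combine(clean_crm(crm_rows), clean_billing(billing_rows))
for row in dataset:
    print(row)
```

Even in this toy case, three separate decisions had to be made: how to reconcile IDs, what to do with duplicates, and how to represent missing amounts. Each one depends on business context, which is why preparation tends to take several iterations.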

The great news: DataRobot has a data prep product that empowers you to significantly reduce the time you spend preparing your data. But before you can even begin your data prep work, you’ve got to know where to start your predictive analytics journey. For example:

  • How do you define or frame the business problem you need to solve?
  • How do you identify the kinds of data you require?
  • And how do you structure your data in order to successfully teach your ML models?

The articles in this series help you tackle these data science fundamentals so that you can jump-start your predictive analytics journey.

Last update: 02-12-2021 03:59 PM