Putting “Automation” in Automated Machine Learning
When you consider what an automated machine learning platform should provide so that you can take advantage of the latest developments in AI, it helps to look at the typical steps data scientists go through when they build machine learning models.
This article walks you through examples of the main steps in the modeling process and highlights the automation built into the DataRobot platform that gets you up and running with ML. (Note: this article is adapted from a session at DataRobot’s AI Experience Worldwide Conference.)
1. Format data inputs
Even after you assemble a machine learning dataset from various data sources, the data still needs additional processing before it can be used for model building. DataRobot automates this processing with blueprints that are generated dynamically for each project you create.
DataRobot can handle a wide range of data types, including text, images, and geospatial data. The blueprints automatically identify the type of each variable in your dataset and process it accordingly.
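To make automatic type detection more concrete, here is a minimal sketch of the kind of heuristic a tool might apply to a column of raw string values. It is not DataRobot’s actual logic: the function name `infer_column_type`, the type labels, and the thresholds are hypothetical assumptions chosen for illustration.

```python
from datetime import datetime

def infer_column_type(values):
    """Guess a coarse type ("numeric", "date", "categorical", "text") for raw strings.

    Hypothetical heuristic for illustration only; real platforms use far
    richer rules and also handle images, geospatial data, and more.
    """
    cleaned = [v.strip() for v in values if v is not None and v.strip()]
    if not cleaned:
        return "empty"

    def all_parse(parser):
        # True if every non-empty value can be parsed by `parser`.
        try:
            for v in cleaned:
                parser(v)
        except ValueError:
            return False
        return True

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):  # assumes ISO 8601 dates, e.g. 2021-03-14
        return "date"
    # Short values with many repeats look categorical; longer free-form strings look like text.
    avg_words = sum(len(v.split()) for v in cleaned) / len(cleaned)
    if avg_words <= 3 and len(set(cleaned)) <= len(cleaned) // 2:
        return "categorical"
    return "text"
```

For example, `["red", "blue", "red", "red"]` is classified as categorical, while a column of multi-word product reviews is classified as text.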
Features are the columns in your dataset that are used to detect patterns related to your target outcome. After you upload a dataset, DataRobot automatically creates feature lists that help you select the best features for model building.
To be successful, models need input features that carry potentially useful signal. The process of creating features to improve your models is known as feature engineering. In addition to the feature engineering automated within blueprints, DataRobot also generates extra features from date variables, such as “day of the week” and “day of the month.”
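As a rough illustration of date feature engineering, the sketch below expands a single date into a few derived columns using Python’s standard `datetime` module. The function `date_features` and its feature names are hypothetical and do not necessarily match the features DataRobot generates.

```python
from datetime import date

def date_features(d):
    """Derive simple calendar features from a single date (illustrative names)."""
    return {
        "day_of_week": d.weekday(),   # Monday = 0 ... Sunday = 6
        "day_of_month": d.day,
        "month": d.month,
        "year": d.year,
        "is_weekend": d.weekday() >= 5,
    }

features = date_features(date(2021, 3, 14))
```

A model can then learn weekly patterns (for example, different behavior on weekends) from `day_of_week`, even though the raw date value alone does not expose that pattern directly.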
2. Ensure that reliable patterns are found
Data partitioning & model validation
To ensure that models are learning reliable patterns in the data, they need to be built or “trained” on historical examples that are not the same as the examples they are tested or “validated” on. As part of its automated guardrails, DataRobot separates (or “partitions”) the rows in your uploaded dataset to prevent models from simply memorizing the examples they’re trained on.
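A minimal sketch of random partitioning follows, assuming rows can be shuffled independently of one another (time-ordered data would need a date-based split instead). The helper `partition_rows`, the 20% holdout, and the five folds are illustrative choices, not a statement of DataRobot’s settings.

```python
import random

def partition_rows(n_rows, holdout_frac=0.2, n_folds=5, seed=0):
    """Split row indices into a holdout set plus n_folds cross-validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # fixed seed keeps the split reproducible
    n_holdout = int(n_rows * holdout_frac)
    holdout = sorted(indices[:n_holdout])
    remaining = indices[n_holdout:]
    # Deal the remaining rows round-robin so fold sizes differ by at most one.
    folds = [sorted(remaining[i::n_folds]) for i in range(n_folds)]
    return holdout, folds

holdout, folds = partition_rows(100)
# Train on folds 1-4 and validate on fold 0; the holdout stays untouched until the very end.
train = [i for fold in folds[1:] for i in fold]
validation = folds[0]
```

Rotating which fold serves as validation yields one validation score per fold, which is the idea behind cross-validation.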
Once you have separate partitions within your dataset for building and validating, you can then measure how well a model is capturing patterns in the data and compare performance across different models.
3. Select and evaluate model options
Ranked models & evaluation metrics
After you start a DataRobot project in Autopilot mode, the automation runs a data science competition on your dataset to produce a Leaderboard of models ranked by your preferred evaluation metric.
When building models, you don’t usually know ahead of time which machine learning algorithms will work the best, so you want to try out a variety of different approaches and let the best options bubble up to the top.
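The “competition” idea can be sketched in a few lines: fit several candidate approaches on the training partition, score each on the validation partition, and sort by the metric. The three toy candidates (mean, median, and one-feature linear regression), the RMSE metric, and all helper names are assumptions chosen for brevity; DataRobot’s actual blueprints are far more varied.

```python
import math
import statistics

def rmse(y_true, y_pred):
    # Root mean squared error: lower is better.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_mean(xs, ys):
    m = statistics.fmean(ys)
    return lambda x: m                      # ignores x: a constant baseline

def fit_median(xs, ys):
    m = statistics.median(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # Ordinary least squares for y = a + b * x with a single feature.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def build_leaderboard(candidates, train, valid):
    """Fit each candidate on `train`, score it on `valid`, and rank by RMSE."""
    (x_tr, y_tr), (x_va, y_va) = train, valid
    board = []
    for name, fit in candidates.items():
        model = fit(x_tr, y_tr)
        board.append((name, rmse(y_va, [model(x) for x in x_va])))
    return sorted(board, key=lambda entry: entry[1])

candidates = {"mean": fit_mean, "median": fit_median, "linear": fit_linear}
train = ([0, 1, 2, 3, 4, 5, 6, 7], [1, 3, 5, 7, 9, 11, 13, 15])
valid = ([8, 9, 10], [17, 19, 21])
leaderboard = build_leaderboard(candidates, train, valid)
```

On this toy data, where the target is exactly linear in x, the linear model has the lowest validation RMSE and therefore ranks first.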
Every model on the Leaderboard is also “tuned” automatically: its settings are adjusted to improve how well it captures patterns between the input features and the target.
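At its simplest, automatic tuning is a search over a model’s settings, scored on validation data. The sketch below runs a plain grid search over the neighbor count `k` of a one-feature k-nearest-neighbors regressor; the model choice, the grid, and the function names `knn_predict` and `tune_k` are hypothetical, and DataRobot may use different search strategies.

```python
import math

def knn_predict(x_train, y_train, x, k):
    # Average the targets of the (up to) k training points closest to x.
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)

def tune_k(x_train, y_train, x_valid, y_valid, grid=(1, 3, 5)):
    """Grid search: return the k with the lowest validation RMSE, and that RMSE."""
    best_k, best_score = None, math.inf
    for k in grid:
        preds = [knn_predict(x_train, y_train, x, k) for x in x_valid]
        score = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, y_valid)) / len(y_valid))
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score

# A noisy spike at x = 2 rewards averaging over more neighbors.
best_k, best_rmse = tune_k([0, 1, 2, 3, 4], [5, 5, 50, 5, 5], [2], [5])
```

With a single validation row this is only a toy; in practice the score would be averaged over several validation folds so that the chosen setting does not simply fit noise.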
4. Interpret the results
Whether you want to evaluate how accurately a model captures patterns related to the target, or you just want to understand what the discovered relationships look like, DataRobot’s interpretation tools help you gather insights from your data and explain them to your colleagues.
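One widely used interpretation technique is permutation importance: shuffle one feature’s values, re-score the model, and treat the increase in error as that feature’s importance. The sketch below is a generic, hypothetical implementation for any `predict` function that takes a row as a list of feature values; it is not DataRobot’s code.

```python
import math
import random

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, rows, y, seed=0):
    """Return, per feature, the RMSE increase caused by shuffling that feature."""
    baseline = rmse(y, [predict(r) for r in rows])
    rng = random.Random(seed)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(rmse(y, [predict(r) for r in permuted]) - baseline)
    return importances

# Toy model that uses only the first feature.
rows = [[i, i % 2] for i in range(20)]
y = [3 * i for i in range(20)]
scores = permutation_importance(lambda r: 3 * r[0], rows, y)
```

Because the toy model ignores the second feature, shuffling it leaves the error unchanged and its importance is zero, while shuffling the first feature increases the error.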
5. Document the process
Compliance documentation & downloadable assets
All of the charts and visualizations in DataRobot can be downloaded, along with the data that was used to create them.
With DataRobot’s auto-generated compliance documentation, you can also download a full writeup of everything that went into the model building process. From there, you only need to fill in the details specific to you and your company, such as how a model will be used in your business operations and who the various stakeholders are.
This article has shown examples of what DataRobot automates to help you get started with machine learning.
Even with the level of automation that DataRobot provides, it’s still up to you to use your analytical skills to investigate what a model has learned and interpret the insights that were discovered.
If you don’t yet feel confident in those skills, or you would like to refresh or expand your ML/AI analytical skills, check out the DataRobot University library of educational resources.