An example workflow for building multiple one-vs-rest models with DataRobot.
Multiclass Classification in DataRobot
The simplest way to build a multiclass classification model in DataRobot is to set your multiclass target. When you hit Start, DataRobot compares the performance of multiple multiclass classification models. This approach is simple and results in helpful insights, including a confusion matrix, Feature Impact chart, Lift Charts, word cloud, and text mining chart for each individual class.
However, some insights, such as Feature Effects charts and Prediction Explanations, are not yet available for multiclass models because those charts are difficult to render with multiple targets. In some cases you may also get better accuracy by building multiple binary classification models. This approach is often referred to as one-vs-rest (or one-vs-all) modeling. Instead of a single multiclass model, you build a separate binary classification model for each class; each model predicts whether the target is that individual class or any of the other classes.
In this article, we walk through building a series of one-vs-rest models for a use case from the field of geology, where a more interpretable approach is preferred.
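As a minimal sketch of the one-vs-rest idea, the snippet below turns one multiclass dataset into one binary dataset per class. It uses plain Python records rather than a dataframe, and the column names (`feature_1`, `facies`, `target`) and the class labels `A`, `B`, `C` are placeholders chosen for illustration, not values from the actual dataset.

```python
def one_vs_rest_datasets(rows, target_col, binary_col="target"):
    """Return {class_name: rows}, where each copy has a 0/1 target for that class."""
    classes = sorted({row[target_col] for row in rows})
    datasets = {}
    for cls in classes:
        datasets[cls] = [
            {
                # Drop the original multiclass column so it cannot leak the answer.
                **{key: value for key, value in row.items() if key != target_col},
                # 1 if the row belongs to this class, 0 for any other class.
                binary_col: int(row[target_col] == cls),
            }
            for row in rows
        ]
    return datasets


# Placeholder records, for illustration only.
rows = [
    {"feature_1": 0.2, "facies": "A"},
    {"feature_1": 0.5, "facies": "B"},
    {"feature_1": 0.9, "facies": "A"},
    {"feature_1": 0.4, "facies": "C"},
]
datasets = one_vs_rest_datasets(rows, target_col="facies")
for cls, data in datasets.items():
    print(cls, [row["target"] for row in data])
```

Each resulting dataset has the same features and the same number of rows as the original; only the target differs.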
Example: A Lithofacies Classification Model
Lithofacies are characteristics of rock, such as chemical composition and petrophysical properties like permeability. Determining the facies of the rock formations in a reservoir is central to oil and gas development, as it helps characterize the reservoir and predict where recoverable oil or gas is likely to be. While core samples can be used to determine the facies types in a well directly, they are expensive to recover and are not always available. Wireline log data can be gathered more readily as an alternative. If log data is compared to labeled core samples, a statistical model can be built that predicts facies types from the log data.
The dataset contains 4,000 measurements taken from nine separate cores, and consists of:
9 Rock types to classify (Targets)
9 Predictive Features (columns)
Instead of building a single multiclass project, we’ll build nine separate binary classification models, one per rock type.
Building Nine Projects with the DataRobot Modeling API
The steps are:
Convert the original data to 9 one-vs-rest dataframes.
Build 9 one-vs-rest projects in DataRobot, and run Autopilot on each.
Compare the combined performance of the top models from our 9 projects to the top model from a single multiclass project.
Compare feature impact across the different top one-vs-rest models.
Generate Prediction Explanations from the one-vs-rest models and compare the explanations for individual samples.
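Comparing the one-vs-rest models to a single multiclass model requires one predicted class per sample. The article does not state how the nine binary predictions are combined; a common choice, assumed in this sketch, is to pick the class whose binary model gives the highest positive-class probability. The scores and labels below are made-up placeholders.

```python
def combine_one_vs_rest(scores):
    """scores maps class name -> list of positive-class probabilities, one per sample.

    Returns one predicted class per sample: the class with the highest score.
    """
    classes = list(scores)
    n_samples = len(scores[classes[0]])
    return [max(classes, key=lambda cls: scores[cls][i]) for i in range(n_samples)]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted class matches the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Placeholder probabilities from three hypothetical binary models.
scores = {
    "A": [0.9, 0.2, 0.4],
    "B": [0.3, 0.7, 0.5],
    "C": [0.1, 0.1, 0.6],
}
predicted = combine_one_vs_rest(scores)
combined_accuracy = accuracy(predicted, ["A", "B", "B"])
```

The same `accuracy` function can then be applied to the multiclass model's predictions on the same holdout rows, so both approaches are scored on equal terms.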
We will partition our data by well, leaving the Stuart & Crawford wells as holdouts. The samples from the seven other training wells are randomly shuffled into 5 partitions using a random number assignment in the partition column.
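A rough sketch of that partition assignment is shown below, again on plain Python records. The column name `well`, the label `holdout`, and the training well names are assumptions made for illustration; only the Stuart and Crawford holdout wells and the five random partitions come from the description above.

```python
import random


def assign_partitions(rows, well_col, holdout_wells, n_folds=5, seed=0):
    """Return copies of rows with a 'partition' column added.

    Rows from holdout wells get the label 'holdout'; every other row gets a
    random fold number from 0 to n_folds - 1, stored as a string so the
    column holds a single type.
    """
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    partitioned = []
    for row in rows:
        if row[well_col] in holdout_wells:
            partition = "holdout"
        else:
            partition = str(rng.randrange(n_folds))
        partitioned.append({**row, "partition": partition})
    return partitioned


# Placeholder rows; "Well_1" and "Well_2" are hypothetical training wells.
wells = ["Stuart", "Crawford", "Well_1", "Well_2", "Well_1", "Well_2"]
rows = [{"well": well} for well in wells]
partitioned = assign_partitions(rows, "well", holdout_wells={"Stuart", "Crawford"})
```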
Optionally, you could have DataRobot do cross-validation by well (e.g., train on six wells, validate on the seventh) by passing the well name rather than a random integer to the 'partition' column.
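With a partition column in place, the project-building loop could look roughly like the sketch below. It assumes the `datarobot` Python client's `Project.create`, `Project.set_target`, `AUTOPILOT_MODE.FULL_AUTO`, and `UserCV`; verify these against your installed client version, since newer releases may favor a different call for starting Autopilot. The function name, the project naming scheme, and the column names `target`, `partition`, and `holdout` are hypothetical.

```python
def build_one_vs_rest_projects(dr, datasets, target_col="target",
                               partition_col="partition", holdout_level="holdout"):
    """Create one DataRobot project per class and start Autopilot on each.

    dr       -- the imported datarobot module (import datarobot as dr),
                already configured with credentials
    datasets -- maps class name -> pandas DataFrame or CSV path holding the
                binary target column and the partition column
    """
    projects = {}
    for cls, data in datasets.items():
        # Hypothetical naming scheme: one project per class.
        project = dr.Project.create(sourcedata=data, project_name=f"one_vs_rest_{cls}")
        project.set_target(
            target=target_col,
            mode=dr.AUTOPILOT_MODE.FULL_AUTO,
            # Use the values in partition_col as folds; rows labeled
            # holdout_level are held out.
            partitioning_method=dr.UserCV(partition_col, holdout_level),
        )
        projects[cls] = project
    return projects
```

Passing the well name instead of a random integer in the partition column would need no change to this loop; only the contents of the column differ.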