In addition to the information available from the Leaderboard, the Models page provides other tabs to help compare model results. These are Learning Curves, Speed vs Accuracy, and Model Comparison.
Learning Curves shows how performance changes as the sample size increases. You can use learning curves to help determine whether it's worthwhile to increase the size of your dataset. Getting additional data can be expensive, so it is worthwhile only if it improves model accuracy.
Figure 1. Tools for comparing model results
In Figure 2, the graph on the left shows lines whose segments are connected by dots. Each dot represents a portion of the training data. By default, these are 16 percent, 32 percent, and 64 percent of the training data. If the holdout is unlocked, validation data performance is shown as well, for sample sizes up to 80 percent.
Hovering the mouse over any of the line segments highlights the name of the associated machine learning algorithm (listed to the right). Each line represents a machine learning algorithm and the feature list that was used to train it. Each line is therefore a group of models, one for each training sample size.
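To make the reading of a learning curve concrete, here is a minimal sketch in plain Python. It is not part of the product; the function names and the LogLoss values are hypothetical and chosen only for illustration. It converts the default sample-size percentages into row counts and reports how much the metric improves between consecutive sample sizes, assuming a lower-is-better metric.

```python
from typing import Dict, List, Tuple

# Default sample-size fractions mentioned in the text.
DEFAULT_FRACTIONS = (0.16, 0.32, 0.64)


def rows_at_fractions(
    n_rows: int, fractions: Tuple[float, ...] = DEFAULT_FRACTIONS
) -> Dict[float, int]:
    """Number of training rows used at each sample-size fraction."""
    return {f: round(n_rows * f) for f in fractions}


def marginal_gains(curve: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Metric improvement between consecutive sample sizes.

    Assumes a lower-is-better metric such as LogLoss, so a positive
    gain means the larger sample produced a better score.
    """
    return [(f2, round(s1 - s2, 6)) for (_, s1), (f2, s2) in zip(curve, curve[1:])]


# Hypothetical LogLoss values for one line on the graph; not real results.
example_curve = [(0.16, 0.52), (0.32, 0.47), (0.64, 0.45)]

if __name__ == "__main__":
    print(rows_at_fractions(10_000))
    print(marginal_gains(example_curve))
```

If the gains shrink quickly from one sample size to the next, as in these made-up numbers, collecting more data is less likely to pay for itself.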
Figure 2. Learning Curves graph
The Speed vs Accuracy analysis plot shows the tradeoff between prediction runtime and predictive accuracy, and helps you choose the model that offers the best balance of accuracy and prediction overhead. On the Y-axis we see the currently selected metric, which in this case is LogLoss. On the X-axis we see the prediction speed, expressed as the time in milliseconds to score 2,000 records. As with the Learning Curves display, we can hover the mouse over a dot or over the name of a machine learning algorithm to highlight its counterpart on the other side.
Figure 3. Speed vs Accuracy
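One way to reason about this tradeoff outside the UI is to keep only the models that no other model beats on both axes. The sketch below is an illustration under stated assumptions, not the platform's own logic: the `ModelResult` type, the model names, and all numbers are hypothetical, and LogLoss is treated as lower-is-better.

```python
from typing import List, NamedTuple


class ModelResult(NamedTuple):
    name: str
    ms_per_2000_rows: float  # X-axis: time to score 2,000 records
    logloss: float           # Y-axis: selected metric, lower is better


def pareto_frontier(models: List[ModelResult]) -> List[ModelResult]:
    """Return the models that are not beaten on both speed and LogLoss."""
    frontier: List[ModelResult] = []
    best_loss = float("inf")
    # Walk from fastest to slowest; a slower model earns its place
    # only if it is strictly more accurate than every faster one.
    for m in sorted(models, key=lambda m: (m.ms_per_2000_rows, m.logloss)):
        if m.logloss < best_loss:
            frontier.append(m)
            best_loss = m.logloss
    return frontier


# Hypothetical results; not taken from any real project.
results = [
    ModelResult("Model A", 50.0, 0.50),
    ModelResult("Model B", 120.0, 0.45),
    ModelResult("Model C", 200.0, 0.47),
    ModelResult("Model D", 400.0, 0.44),
]

if __name__ == "__main__":
    for m in pareto_frontier(results):
        print(m.name, m.ms_per_2000_rows, m.logloss)
```

In this made-up data, Model C drops out because Model B is both faster and more accurate; the remaining models are the candidates worth weighing against your latency budget.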
Model Comparison provides more detailed ways to compare two models in your project. Comparing models can help identify a model that more precisely meets your requirements. It can also help in selecting candidates for ensembling, or building blender models, as they are called. For example, two models may diverge considerably, but by blending them, you can improve your predictions. Or maybe you have two relatively strong models and, by blending them, you can create an even better result.
To create a model comparison, first select the two models you want to compare, shown at the top of the page in blue on the left and yellow on the right. Clicking either of those lets you select a model. Next, choose the chart type that you want to display for comparing the selected models.
The first option is the ROC curve, which helps you explore classification projects in terms of performance and statistics, namely the balance between the true positive rate and the false positive rate as a function of a cutoff threshold.
Figure 4. Model Comparison
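As a rough illustration of what an ROC curve plots, the following sketch computes the false positive rate and true positive rate at a few cutoff thresholds for two models. The function name, labels, and scores are hypothetical, and the code assumes binary labels encoded as 0 and 1, with a row predicted positive when its score is at or above the threshold.

```python
from typing import List, Sequence, Tuple


def roc_points(
    y_true: Sequence[int], y_score: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) tuples."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    points = []
    for t in thresholds:
        # A row is predicted positive when its score reaches the cutoff.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


# Hypothetical labels and scores for two models being compared.
labels = [0, 0, 1, 1]
scores_left = [0.1, 0.4, 0.35, 0.8]
scores_right = [0.2, 0.6, 0.7, 0.9]
cutoffs = [0.0, 0.5, 1.0]

if __name__ == "__main__":
    print("left: ", roc_points(labels, scores_left, cutoffs))
    print("right:", roc_points(labels, scores_right, cutoffs))
```

Plotting the false positive rate against the true positive rate for each model, across many thresholds, gives the two curves to compare.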
The Lift chart depicts how effective each model is at predicting the target at different value ranges. We can think of it as a distribution of the predictions that each model makes, ordered from lowest to highest and grouped into the number of bins that we select in the Number of Bins dropdown list.
Figure 5. Lift chart
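The binning behind a lift chart can be sketched as follows. This is a simplified illustration, not the product's implementation: the function name and data are hypothetical, and rows are split into bins of roughly equal size after sorting by predicted value.

```python
from typing import List, Sequence, Tuple


def lift_chart(
    y_true: Sequence[float], y_pred: Sequence[float], n_bins: int
) -> List[Tuple[float, float]]:
    """Return (mean predicted, mean actual) per bin, lowest predictions first."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    # Order rows from the lowest to the highest prediction.
    rows = sorted(zip(y_pred, y_true), key=lambda r: r[0])
    n = len(rows)

    bins = []
    for b in range(n_bins):
        chunk = rows[b * n // n_bins : (b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than rows: skip empty bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_actual = sum(a for _, a in chunk) / len(chunk)
        bins.append((mean_pred, mean_actual))
    return bins


# Hypothetical predictions and outcomes for one model.
predicted = [0.9, 0.1, 0.3, 0.7]
actual = [1, 0, 0, 1]

if __name__ == "__main__":
    for mean_pred, mean_actual in lift_chart(actual, predicted, n_bins=2):
        print(f"predicted={mean_pred:.2f} actual={mean_actual:.2f}")
```

A model is doing well when the mean actual value tracks the mean predicted value closely in every bin and rises steadily from the first bin to the last.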
The Dual Lift chart is a mechanism for visualizing how two competing models perform against each other; that is, their degree of divergence in relative performance. So, whereas the Lift chart sorts predictions from lowest to highest for a single model, the Dual Lift chart sorts the rows by the magnitude of the difference between the two models’ prediction scores. The color coding in the plot matches the colors of the models at the top, and the divergence between the two widens on the left, flips over at the midpoint, and then widens again on the right.
The Dual Lift chart is a good tool for assessing candidates for ensemble modeling. Pairs of models with large divergences in the target rate (as shown with the orange line) could be good pairs to blend. The question to ask is whether a model shows strength in a particular quadrant of the data. If it does, you might be able to create a strong ensemble by blending it with a model that is strong in an opposite quadrant.
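The sorting step of a dual lift chart can be sketched in a few lines. This is an approximation under stated assumptions rather than the product's algorithm: rows are ordered by the signed difference between the two models' predictions (which reproduces the crossover at the midpoint described above), the function name is hypothetical, and the data is made up.

```python
from typing import List, Sequence, Tuple


def dual_lift(
    y_true: Sequence[float],
    pred_a: Sequence[float],
    pred_b: Sequence[float],
    n_bins: int,
) -> List[Tuple[float, float, float]]:
    """Return (mean prediction A, mean prediction B, mean actual) per bin."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    # Assumption: sort by signed difference, A minus B, so that rows where
    # B predicts higher come first and rows where A predicts higher come last.
    rows = sorted(zip(pred_a, pred_b, y_true), key=lambda r: r[0] - r[1])
    n = len(rows)

    bins = []
    for b in range(n_bins):
        chunk = rows[b * n // n_bins : (b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than rows: skip empty bins
        size = len(chunk)
        bins.append(
            (
                sum(r[0] for r in chunk) / size,  # mean prediction, model A
                sum(r[1] for r in chunk) / size,  # mean prediction, model B
                sum(r[2] for r in chunk) / size,  # mean actual target
            )
        )
    return bins


# Hypothetical predictions from two models and the actual outcomes.
model_a = [0.2, 0.8, 0.5, 0.6]
model_b = [0.6, 0.3, 0.5, 0.4]
outcomes = [0, 1, 1, 0]

if __name__ == "__main__":
    for mean_a, mean_b, mean_actual in dual_lift(outcomes, model_a, model_b, 2):
        print(f"A={mean_a:.2f} B={mean_b:.2f} actual={mean_actual:.2f}")
```

In each bin, the model whose mean prediction lies closer to the mean actual value is the stronger one for that slice of the data, which is the kind of evidence to look for when choosing models to blend.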