Using R API to access the Repository and Leaderboard - DataRobot Community

dalilaB · ‎03-30-2022

In DataRobot, you can find blueprints that were already trained and other blueprints that you may want to train. Trained blueprints are available from the model Leaderboard, and blueprints you may want to train are available from the Repository.

When you run a project using a mode other than Comprehensive Autopilot, DataRobot first creates blueprints based on the characteristics of your data and puts them in the Repository. Then, it chooses a subset from these to train; when training completes, these are the blueprints you’ll find in the Leaderboard.

I’ve discovered that at times, after the Leaderboard is populated, it can be useful to train some of those blueprints that DataRobot skipped. For example, you can try a more complex Keras blueprint like Keras Residual AutoInt Classifier using Training Schedule (3 Attention Layers with 2 Heads, 2 Layers: 100, 100 Units). In some cases, I’ve decided I would like to directly access the trained model through R and retrain it with a different feature list or tune its hyperparameters.

This tip explains how you can use R to access blueprints from either the Leaderboard or from the Repository.

From the Leaderboard

Let’s assume you already have a project. The project ID is always in the URL and is always preceded by projects/.
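If you would rather pull the ID out of a copied URL than retype it, a minimal base R sketch could look like the following. The URL is a made-up example (the host and the trailing /models path are assumptions); only the projects/ prefix comes from the description above.

```r
# Hypothetical project URL; the host and trailing path are assumptions.
projectUrl <- "https://app.datarobot.com/projects/623c6e72bffdbd1f4a1ad8f3/models"

# Capture everything after "projects/" up to the next "/", "?" or "#".
projectId <- regmatches(
  projectUrl,
  regexpr("(?<=projects/)[^/?#]+", projectUrl, perl = TRUE)
)

print(projectId)
```

If the URL contains no projects/ segment, regmatches returns an empty character vector, so it is worth checking length(projectId) before using it.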

Use the code below to find all blueprints trained for the project.

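library(datarobot)
# projectId is the ID copied from the project URL (the part after projects/)
project <- GetProject(projectId)
modelsInLeaderboard <- ListModels(project)
# Convert the list of models to a data frame for easier inspection
modelsInLeaderboard_df <- as.data.frame(modelsInLeaderboard)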

The result is a table with information on every model built: the type of model (blueprint), the IDs of each model and its related blueprint, the feature list used to build the model, the sample size, and the score on the validation set for the chosen metric. In our case, the metric was LogLoss, so the scores you see are LogLoss values.
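Once the Leaderboard is in a data frame, ordinary data frame operations apply. The sketch below ranks models by their validation score. It runs on a small made-up data frame rather than a real project, and the column names (modelType, featurelistName, samplePct, validationMetric) are assumptions about what the converted Leaderboard contains, so check names(modelsInLeaderboard_df) on your own project first.

```r
# Made-up stand-in for modelsInLeaderboard_df; names and values are illustrative only.
leaderboard <- data.frame(
  modelType = c("Model A", "Model B", "Model C"),
  featurelistName = c("List 1", "List 1", "List 2"),
  samplePct = c(64, 64, 32),
  validationMetric = c(0.412, 0.388, 0.455),
  stringsAsFactors = FALSE
)

# LogLoss is an error metric, so a lower value means a better model.
ranked <- leaderboard[order(leaderboard$validationMetric), ]
bestModelType <- ranked$modelType[1]

print(ranked)
```

For a metric where higher is better, such as AUC, you would sort with decreasing = TRUE instead.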

From the Repository

You can check all blueprints for the project using the following function:

blueprintsInRepository <- ListBlueprints(project)

This function returns a list of blueprints with their modelType, expandedModel, and blueprintId. And because these blueprints are specifically created for this project, you’ll also get a projectId column. While the result is a list, it can be converted to a data frame with as.data.frame to make it easier to read.
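To pick out a particular kind of blueprint, such as the Keras blueprints mentioned earlier, you can filter the list on modelType. Here is a minimal sketch using a made-up list that mimics the shape of the Repository result; the IDs and the third model name are placeholders, not real DataRobot values.

```r
# Made-up stand-in for the output of ListBlueprints(project); IDs are placeholders.
blueprints <- list(
  list(projectId = "p1", modelType = "Majority Class Classifier", blueprintId = "bp-001"),
  list(projectId = "p1", modelType = "Keras Residual AutoInt Classifier", blueprintId = "bp-002"),
  list(projectId = "p1", modelType = "Some Other Classifier", blueprintId = "bp-003")
)

# Pull out the modelType of every blueprint, then keep the ones that mention Keras.
modelTypes <- vapply(blueprints, function(bp) bp$modelType, character(1))
kerasBlueprints <- blueprints[grepl("Keras", modelTypes, fixed = TRUE)]

# Collect the IDs of the matching blueprints.
kerasIds <- vapply(kerasBlueprints, function(bp) bp$blueprintId, character(1))
print(kerasIds)
```

Each element of kerasBlueprints keeps the same shape as an entry in the original list, so it should be usable wherever a single blueprint is expected.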

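Now, if you want to train a specific blueprint, you just need to call RequestNewModel. At a minimum, this function requires a project and a blueprint, where the blueprint is itself a list that includes the blueprintId and projectId. To train several blueprints, call it once per blueprint. In this example, we train the first 4 blueprints in the Repository:

# RequestNewModel takes one blueprint per call, so loop over the first four
for (blueprint in blueprintsInRepository[1:4]) RequestNewModel(project, blueprint)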

For instance, the first item, blueprintsInRepository[[1]], is a Majority Class Classifier with blueprint ID "069f2c55b2189eaedec4cdb16a8af9b8":

$projectId
[1] "623c6e72bffdbd1f4a1ad8f3"
$modelType
[1] "Majority Class Classifier"
$processes
[1] "Majority Class Classifier"
$blueprintCategory
[1] "DataRobot"
$supportsMonotonicConstraints
[1] FALSE
$monotonicIncreasingFeaturelistId
NULL
$monotonicDecreasingFeaturelistId
NULL
$blueprintId
[1] "069f2c55b2189eaedec4cdb16a8af9b8"

For a full solution, please check here for R code written by Theodoris Petropoulos. (If you want to do this with Python instead, you can see this other project.)

Have questions about this tip or other ideas for tips? Please let me know!

Dalila Benachenhou
