Using R API to access the Repository and Leaderboard

dalilaB
Data Scientist

In DataRobot, you can find blueprints that were already trained and other blueprints that you may want to train. Trained blueprints are available from the model Leaderboard:

Linda_1-1648685529058.png

and blueprints you may want to train are available from the Repository:  

Linda_2-1648685593063.png

When you run a project using a mode other than Comprehensive Autopilot, DataRobot first creates blueprints based on the characteristics of your data and puts them in the Repository. Then, it chooses a subset from these to train; when training completes, these are the blueprints you’ll find in the Leaderboard.

I’ve found that, after the Leaderboard is populated, it can be useful to train some of the blueprints that DataRobot skipped. For example, you can try a more complex Keras blueprint such as Keras Residual AutoInt Classifier using Training Schedule (3 Attention Layers with 2 Heads, 2 Layers: 100, 100 Units). In other cases, I’ve wanted to access a trained model directly through R and retrain it with a different feature list or tune its hyperparameters.

This tip explains how you can use R to access blueprints from either the Leaderboard or from the Repository.

From the Leaderboard

Let’s assume you already have a project. The project ID appears in the project’s URL, immediately after projects/.

Linda_2-1648674526517.png
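If you would rather not copy the ID by hand, the rule above (the ID follows projects/) can be sketched in base R. ExtractProjectId is a hypothetical helper, not part of the datarobot package, and the example URL is only an assumed shape built around the project ID that appears later in this tip.

```r
# Hypothetical helper: pull the project ID out of a DataRobot project URL.
ExtractProjectId <- function(url) {
  # The ID is the path segment that immediately follows "projects/".
  match <- regmatches(url, regexec("projects/([^/?#]+)", url))[[1]]
  if (length(match) < 2) {
    stop("No 'projects/<id>' segment found in URL")
  }
  match[2]
}

# Assumed URL shape, for illustration only.
projectUrl <- "https://app.datarobot.com/projects/623c6e72bffdbd1f4a1ad8f3/models"
projectId <- ExtractProjectId(projectUrl)
```

The resulting projectId can then be passed to GetProject.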

Use the code below to find all blueprints trained for the project.

 

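library(datarobot)  # assumes you have already authenticated, e.g. with ConnectToDataRobot()
# projectId is the ID taken from the project URL
project <- GetProject(projectId)
modelsInLeaderboard <- ListModels(project)
# Convert the list of models to a data frame for easier inspection
modelsInLeaderboard_df <- as.data.frame(modelsInLeaderboard)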

 

A sample result is shown below. The table provides information for all models built, including the type of model (blueprint) and the IDs for each model and its related blueprint,

Linda_3-1648674526513.png

as well as the feature lists used to build the models, the sample size, and the score on the validation set for the chosen metric. In our case, the metric was LogLoss, so the values you see here are LogLoss scores.

Linda_4-1648674526550.png
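Once the Leaderboard is in a data frame, ordinary data frame operations apply. Here is a minimal sketch that ranks models by LogLoss (lower is better). RankLeaderboard is a hypothetical helper, the column names modelId and validationMetric are assumptions based on the table described above, and the stand-in values are made up so the sketch runs without a DataRobot connection.

```r
# Hypothetical helper: sort a Leaderboard data frame so the lowest metric comes first.
# This ordering suits LogLoss; metrics where higher is better need decreasing = TRUE.
RankLeaderboard <- function(leaderboard_df, metricColumn = "validationMetric",
                            decreasing = FALSE) {
  rowOrder <- order(leaderboard_df[[metricColumn]], decreasing = decreasing)
  ranked <- leaderboard_df[rowOrder, , drop = FALSE]
  rownames(ranked) <- NULL
  ranked
}

# Stand-in data with made-up values; in practice, pass modelsInLeaderboard_df.
example_df <- data.frame(
  modelType = c("Model A", "Model B", "Model C"),
  modelId = c("id-a", "id-b", "id-c"),
  validationMetric = c(0.52, 0.41, 0.47),
  stringsAsFactors = FALSE
)
ranked <- RankLeaderboard(example_df)
bestModelId <- ranked$modelId[1]
```

With a real project, the best model ID could then be passed to GetModel(project, bestModelId) to retrieve the trained model.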

From the Repository

You can check all blueprints for the project using the following function:

 

blueprintsInRepository <- ListBlueprints(project)

 

This function returns a list of blueprints with their modelType, expandedModel, and blueprintId. And because these blueprints are specifically created for this project, you’ll also get a projectId column. The result is a list, but here it has been converted to a data frame to make it easier to read, as shown below.

Linda_5-1648674526517.png
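Repositories can be long, so it helps to search them by name, for example to find the Keras blueprints mentioned earlier. The following is a rough sketch: FindBlueprints is a hypothetical helper, it assumes each blueprint is a list with a modelType element (as in the output shown in this tip), and the stand-in list uses a placeholder ID for the Keras entry.

```r
# Hypothetical helper: keep only blueprints whose modelType contains the given text.
FindBlueprints <- function(blueprints, pattern) {
  Filter(function(bp) grepl(pattern, bp$modelType, fixed = TRUE), blueprints)
}

# Stand-in list so the sketch runs offline; in practice, pass blueprintsInRepository.
exampleBlueprints <- list(
  list(modelType = "Majority Class Classifier",
       blueprintId = "069f2c55b2189eaedec4cdb16a8af9b8"),
  list(modelType = paste("Keras Residual AutoInt Classifier using Training Schedule",
                         "(3 Attention Layers with 2 Heads, 2 Layers: 100, 100 Units)"),
       blueprintId = "placeholder-keras-blueprint-id")  # not a real ID
)

kerasBlueprints <- FindBlueprints(exampleBlueprints, "Keras")
kerasTypes <- vapply(kerasBlueprints, function(bp) bp$modelType, character(1))
```

Because the result is still a list of blueprints, each element can be used wherever a single blueprint is expected.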

Now, if you want to train a specific blueprint, you just need to call RequestNewModel. At a minimum, this function requires a project and a blueprint, and it trains one blueprint per call. In this example, we train the first 4 blueprints in the Repository by looping over them:

 

# RequestNewModel accepts a single blueprint, so submit the first four one at a time
modelJobIds <- lapply(blueprintsInRepository[1:4], function(bp) RequestNewModel(project, bp))

 

For instance, the first item, blueprintsInRepository[[1]], is a Majority Class Classifier with blueprint ID "069f2c55b2189eaedec4cdb16a8af9b8", as shown below:

 

$projectId
[1] "623c6e72bffdbd1f4a1ad8f3"
$modelType
[1] "Majority Class Classifier"
$processes
[1] "Majority Class Classifier"
$blueprintCategory
[1] "DataRobot"
$supportsMonotonicConstraints
[1] FALSE
$monotonicIncreasingFeaturelistId
NULL
$monotonicDecreasingFeaturelistId
NULL
$blueprintId
[1] "069f2c55b2189eaedec4cdb16a8af9b8"
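
RequestNewModel only queues the training job, so you usually want to wait for it and fetch the resulting model. Below is a minimal sketch, assuming the datarobot package is installed and a session is already authenticated. TrainBlueprint is a hypothetical wrapper, not part of the package; it also shows how a different feature list could be passed, as mentioned at the start of this tip.

```r
# Sketch only: calling this function needs the datarobot package and a live project.
# TrainBlueprint is a hypothetical wrapper, not a package function.
TrainBlueprint <- function(project, blueprint, featurelist = NULL, maxWait = 600) {
  # RequestNewModel queues the job and returns a model job ID.
  modelJobId <- datarobot::RequestNewModel(project, blueprint,
                                           featurelist = featurelist)
  # GetModelFromJobId waits up to maxWait seconds for the job to finish,
  # then returns the trained model.
  datarobot::GetModelFromJobId(project, modelJobId, maxWait = maxWait)
}

# Example usage, commented out because it needs a live project:
# featurelists <- ListFeaturelists(project)
# newModel <- TrainBlueprint(project, blueprintsInRepository[[1]],
#                            featurelist = featurelists[[1]])
```

Leaving featurelist as NULL lets DataRobot fall back to the project's default feature list.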

 

For a full solution, see the R code written by Theodoris Petropoulos. (If you want to do this with Python instead, a separate project covers that.) Have questions about this tip or other ideas for tips? Please let me know!

Dalila Benachenhou
