Hello. I have a model which I've built using the UI, and I now want to extract information about it via the R API. I can find the project ID easily enough, e.g. with:
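library(httr)
library(jsonlite)
library(data.table)

# List all projects visible to this API token
# (encode = "json" dropped: it only applies to request bodies, and a GET has none)
res <- GET(
  url = "https://app.eu.datarobot.com/api/v2/projects/",
  add_headers(Authorization = paste("Bearer", token)),
  verbose()  # prints request/response details; remove once it works
)
dt_projects <- data.table(fromJSON(content(res, as = "text", encoding = "UTF-8")))
# Pick out the ID of the project whose name matches projName
project_id <- dt_projects[projectName == projName, id]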
What I can't find anywhere is how to extract the dataset ID related to this project. Any ideas?
Thanks, Tom
A datasetId only applies to datasets in the AI Catalog.
If a project is created by uploading a dataset directly (via URL, drag and drop, local file upload, API project creation, ...), it won't have an associated AI Catalog entry, and so won't have an associated datasetId.
Interesting... I built the particular model I'm trying to interrogate using the platform, and my project has catalogId = NULL. Indeed every single ID listed on the page (https://app.eu.datarobot.com/api/v2/projects/61********31) is null, apart from the project ID.
It may be quickest for me to just re-upload the dataset, or a sample of it, via the API, which is practical for this task but obviously not always ideal!
Using the project_id, you can get details about the project.
For example, I have a project with projectId 615442d2c3b2a5d486280268:
https://app.datarobot.com/api/v2/projects/615442d2c3b2a5d486280268/
The response for that is:
{"id": "615442d2c3b2a5d486280268", "projectName": "10k_diabetes.csv", "fileName": "10k_diabetes.csv", "stage": "modeling", "autopilotMode": 0, "created": "2021-09-29T10:41:41.499215Z", "target": "readmitted", "metric": "LogLoss", "partition": {"cvMethod": "stratified", "validationType": "CV", "holdoutPct": 20.0, "reps": 5, "useTimeSeries": null, "validationLevel": null, "datetimeCol": null, "cvHoldoutLevel": null, "holdoutLevel": null, "trainingLevel": null, "userPartitionCol": null, "validationPct": null, "partitionKeyCols": null}, "recommender": {"isRecommender": null, "recommenderItemId": null, "recommenderUserId": null}, "advancedOptions": {"weights": null, "blueprintThreshold": 3, "responseCap": false, "seed": null, "scaleoutModelingMode": "disabled", "defaultMonotonicIncreasingFeaturelistId": null, "defaultMonotonicDecreasingFeaturelistId": null, "onlyIncludeMonotonicBlueprints": false, "shapOnlyMode": false, "runLeakageRemovedFeatureList": true, "smartDownsampled": false, "majorityDownsamplingRate": null, "downsampledMinorityRows": null, "downsampledMajorityRows": null, "blendBestModels": true, "prepareModelForDeployment": true, "considerBlendersInRecommendation": false, "scoringCodeOnly": false}, "positiveClass": 1.0, "maxTrainPct": 64.0, "maxTrainRows": 6400, "scaleoutMaxTrainPct": 64.0, "scaleoutMaxTrainRows": 6400, "holdoutUnlocked": false, "catalogId": "615326e4a8a592c8ff27ff81", "catalogVersionId": "615326e5a8a592c8ff27ff82", "externalTimeSeriesBaselineDatasetMetadata": null, "segmentation": null, "targetType": "Binary", "unsupervisedMode": false, "useFeatureDiscovery": false, "quickrun": true, "automodelDeploymentId": null}
If the project was built from the AI Catalog, the response will contain the datasetId in the catalogId field:
"catalogId": "615326e4a8a592c8ff27ff81"
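In R, the lookup described above could be sketched roughly as follows. This is only a minimal sketch: the helper names extract_catalog_id and get_project_catalog_id are hypothetical (they are not part of any DataRobot package), and the endpoint default and token are assumptions you would replace with your own. It uses the same httr and jsonlite calls as the original question.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: pull the AI Catalog dataset ID out of a parsed
# project-details response. Returns NA when catalogId is null or absent,
# which is what you would see for a directly uploaded dataset.
extract_catalog_id <- function(project) {
  id <- project$catalogId
  if (is.null(id)) NA_character_ else id
}

# Hypothetical helper: fetch the project details and return the catalog ID.
# `endpoint` is an assumption; use the base URL of your own installation.
get_project_catalog_id <- function(project_id, token,
                                   endpoint = "https://app.eu.datarobot.com/api/v2") {
  res <- GET(
    url = paste0(endpoint, "/projects/", project_id, "/"),
    add_headers(Authorization = paste("Bearer", token))
  )
  stop_for_status(res)  # fail loudly on 401/404 and similar
  project <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
  extract_catalog_id(project)
}
```

Calling get_project_catalog_id(project_id, token) should then give either the dataset ID or NA, so an NA result is a quick way to tell that the project was not created from the AI Catalog.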
The inverse is also available: if you have the datasetId, you can get its associated projects:
https://app.datarobot.com/api/v2/datasets/615326e4a8a592c8ff27ff81/projects/
{"count": 1, "next": null, "previous": null, "data": [{"id": "615442d2c3b2a5d486280268", "url": "https://app.datarobot.com/api/v2/projects/615442d2c3b2a5d486280268/"}], "totalCount": 1}
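The reverse lookup can be sketched in R along the same lines. Again, the helper names extract_project_ids and list_dataset_projects are hypothetical, and the endpoint default is an assumption. The sketch reads only the first page of results; if "next" is not null you would need to follow it as well.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: take the raw JSON text of a
# /datasets/{datasetId}/projects/ response and return the project IDs
# found in its "data" array (an empty character vector if there are none).
extract_project_ids <- function(response_text) {
  resp <- fromJSON(response_text)
  ids <- resp$data$id
  if (is.null(ids)) character(0) else ids
}

# Hypothetical helper: request the projects linked to a dataset.
# `endpoint` is an assumption; use the base URL of your own installation.
list_dataset_projects <- function(dataset_id, token,
                                  endpoint = "https://app.eu.datarobot.com/api/v2") {
  res <- GET(
    url = paste0(endpoint, "/datasets/", dataset_id, "/projects/"),
    add_headers(Authorization = paste("Bearer", token))
  )
  stop_for_status(res)  # fail loudly on 401/404 and similar
  extract_project_ids(content(res, as = "text", encoding = "UTF-8"))
}
```

For the example response shown above, extract_project_ids would return the single ID 615442d2c3b2a5d486280268.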
Thanks @IraWatt - Good shout and this might indeed be the intended solution but for me it is not: both are R functions with no arguments, which just return me an empty list.
Hi @Tom B,
I'm not an active user of the R API, but the R package docs list two functions, 'ListDataSources' and 'ListDataStores', one of which may be what you're looking for.
All the best,
Ira