Solved: Running same project N times with different random... - DataRobot Community

GiorgioC · 05-09-2022

Hi all,

I was wondering if there is a way to use the API to upload and run the same project N times with different random seeds. Basically, cloning and re-running a project N times with all the same parameters except the random seed.

The goal is to have N Autopilot results, each from a different seed, on the same input dataset.

Thank you in advance for any input.

Cheers,

Giorgio

@christian

IraWatt · 05-09-2022

Hey @GiorgioC,

Absolutely you can! Not sure if you're using Python or R, but in Python you can do something like this:

import datarobot as dr

dr.Client(token='example', endpoint='https://app.datarobot.com/api/v2')

project = dr.Project.get('example')

for x in range(6):
    # Clone the original project: same dataset, new project
    new_project = project.clone_project(new_project_name='This is my new project' + str(x))
    # Start modeling on the clone with a different random seed each time
    new_project.set_target(
        target="SHOT_NUMBER",
        advanced_options=dr.AdvancedOptions(seed=x),
    )

Essentially, you clone the project with 'clone_project' inside a loop and change the advanced options passed to the set_target function on each iteration.
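If you want to keep hold of the clones (for example, to check them afterwards), the loop can be wrapped in a small helper that returns them. This is a minimal sketch, not part of the DataRobot client: `clone_with_seeds`, `make_options` and `name_prefix` are hypothetical names, and the only client methods assumed are `clone_project` and `set_target`, called exactly as in the loop above. Passing the options in as a factory keeps the helper testable without a live DataRobot connection.

```python
from typing import Any, Callable, List, Sequence


def clone_with_seeds(
    project: Any,
    target: str,
    seeds: Sequence[int],
    make_options: Callable[[int], Any],
    name_prefix: str = "seed-run-",
) -> List[Any]:
    """Clone `project` once per seed and start modeling on each clone.

    Hypothetical helper. `project` is assumed to behave like a
    datarobot.Project (methods `clone_project` and `set_target`), and
    `make_options` maps a seed to an advanced-options object, for example
    `lambda s: dr.AdvancedOptions(seed=s)`.
    """
    clones = []
    for seed in seeds:
        # One new project per seed, named so the seed is visible in the UI
        clone = project.clone_project(new_project_name=f"{name_prefix}{seed}")
        # Same target for every clone; only the advanced options (seed) differ
        clone.set_target(target=target, advanced_options=make_options(seed))
        clones.append(clone)
    return clones
```

Usage would look like `clones = clone_with_seeds(project, "SHOT_NUMBER", range(6), lambda s: dr.AdvancedOptions(seed=s))`.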

Pretty sure that will do it, based on the documentation (API Reference, DataRobot Python Client 2.28.0 documentation, readthedocs-hosted.com). However, I'm not sure where to see the seed that was used once the project has started running, so I can't 100% confirm.

Any questions, make sure to ask 🙂

HTH!

Ira

Lukas · 05-10-2022

Hi @GiorgioC and @IraWatt

Great answer by Ira. The only thing to add: after a project has been set up, you can get the random seed via

project.advanced_options
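To check that each clone really picked up its own seed, something like the sketch below could read it back. It assumes `project.advanced_options` is either a dict with a `'seed'` key or an object with a `seed` attribute; which one you get may depend on the client version, so the hypothetical `read_seed` helper accepts both. It may also be safest to re-fetch each clone first (for example `dr.Project.get(p.id)`) so the attribute reflects what the server stored.

```python
from typing import Any, Iterable, List, Optional


def read_seed(project: Any) -> Optional[int]:
    """Return the seed stored in project.advanced_options, if any.

    Assumption: advanced_options is either a dict with a 'seed' key or an
    object exposing a `seed` attribute, depending on client version.
    """
    options = getattr(project, "advanced_options", None)
    if options is None:
        return None
    if isinstance(options, dict):
        return options.get("seed")
    return getattr(options, "seed", None)


def seeds_are_distinct(projects: Iterable[Any]) -> bool:
    """True if every project reports a seed and no two seeds are equal."""
    seeds: List[Optional[int]] = [read_seed(p) for p in projects]
    return None not in seeds and len(set(seeds)) == len(seeds)
```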

Cheers,

Lukas

GiorgioC · 05-12-2022

Hi @IraWatt and @Lukas,

Just wanted to add a small correction for the community. The previous code, while it did not raise an error, was not carrying over the right number of CV folds from the cloned project. The version below is tested and works for 10 CV folds (note the added "partitioning" variable).

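import datarobot as dr  # assumes dr.Client(...) is already configured, as above

project = dr.Project.get('6262NNNNNNNNNNN')
partitioning = dr.StratifiedCV(holdout_pct=20, reps=10)  # 20% holdout, 10 CV folds

for x in range(6):
    new_project = project.clone_project(new_project_name='test' + str(x))
    # Pass the partitioning explicitly so the clones also use 10 CV folds
    new_project.set_target(target="your_target",
                           advanced_options=dr.AdvancedOptions(seed=x),
                           partitioning_method=partitioning)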
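For reference, `holdout_pct=20, reps=10` means 20% of the rows go to the holdout and the remaining 80% are split into 10 CV folds, so each fold is roughly 8% of the data. The arithmetic below is only a sketch: `cv_split_sizes` is a hypothetical helper, and DataRobot's own rounding and stratification are not reproduced. Separately, the client's `StratifiedCV` also takes its own `seed` argument (default 0); if the aim is a different holdout in each clone, it may be worth passing `seed=x` there as well and checking the result, since this thread does not confirm whether the `AdvancedOptions` seed alone reshuffles the partition.

```python
from typing import Tuple


def cv_split_sizes(n_rows: int, holdout_pct: float, reps: int) -> Tuple[int, float]:
    """Approximate (holdout_rows, rows_per_fold) for a holdout + k-fold CV setup.

    Hypothetical helper for illustration: truncates the holdout size to a
    whole number of rows and divides the remaining rows evenly across the
    folds. The platform's exact rounding and stratification may differ.
    """
    if not 0 <= holdout_pct < 100:
        raise ValueError("holdout_pct must be in [0, 100)")
    if reps < 2:
        raise ValueError("need at least 2 folds")
    holdout_rows = int(n_rows * holdout_pct / 100)
    rows_per_fold = (n_rows - holdout_rows) / reps
    return holdout_rows, rows_per_fold
```

For example, 1,000 rows would give a 200-row holdout and about 80 rows per fold.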

IraWatt · 05-10-2022

Ah, thanks @Lukas! Knew there must have been a way :D

GiorgioC · 05-10-2022

Thank you so much @IraWatt and @Lukas. I tested it and it worked like a charm!

IraWatt · 05-10-2022

Great to hear, @GiorgioC, glad we could help :)

Linda · 05-10-2022

Nice job, @IraWatt and @Lukas! Thank you for sharing your knowledge and best practices with Giorgio and the community at large! And thank you @GiorgioC for making sure to select "Accept as Solution" so other members can find the right answers!

IraWatt · 05-10-2022

No worries, @Linda! :D

Andrew Nicholls · 05-11-2022

Hi @GiorgioC, I was wondering if you'd be willing to share your goals with this process. I'm curious what you're looking to discover.

GiorgioC · 05-11-2022

Hi @Andrew Nicholls,

The idea is to stress-test the holdout (HO) to confirm that the features under consideration are consistently predictive and not just a one-time result, which could be a statistical fluke of having picked a "good" HO. I usually repeat the same process as a negative stress test, where I randomly scramble the labels and retrain on each permutation. There, unlike above, I would expect to see a 50/50 result every time (a coin-flip situation). This tells me the features are really predictive of the real label and not just DR finding random patterns in the data.
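The negative stress test described here (scramble the labels, retrain, expect coin-flip performance) can be prepared before upload by permuting the target column with a fixed seed, so each scrambled dataset is reproducible. The sketch below uses only the standard library; `scramble_labels` and `summarize_scores` are hypothetical helpers, rows are assumed to be dicts (for example from `csv.DictReader`), and the scores passed to `summarize_scores` would be whatever holdout metric you collect from the N projects (for a binary target, an AUC near 0.5 is the coin-flip case).

```python
import random
import statistics
from typing import Any, Dict, List, Sequence


def scramble_labels(rows: List[Dict[str, Any]], target: str, seed: int) -> List[Dict[str, Any]]:
    """Return a copy of `rows` with the `target` column randomly permuted.

    Every other column is left untouched and the label distribution is
    preserved exactly; only the pairing between rows and labels is
    randomized. The same seed always produces the same permutation.
    """
    labels = [row[target] for row in rows]
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    rng.shuffle(labels)
    # Build new dicts so the caller's rows are not mutated
    return [{**row, target: label} for row, label in zip(rows, labels)]


def summarize_scores(scores: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, min and max of per-seed holdout scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to measure spread")
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }
```

A tight spread of real-label scores, alongside scrambled-label scores that stay near chance, is the pattern described above.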
