
Running same project N times with different random seeds

GiorgioC
Image Sensor


Hi all,

 

I was wondering if there is a way to use the API to upload and run the same project N times with different random seeds. Basically, cloning and re-running a project N times with all parameters the same except the random seed.

The goal is to get Autopilot results from different seeds on the same input dataset.

 

Thank you in advance for any input.

 

Cheers,

Giorgio

@christian 

IraWatt
Laser

Hey @GiorgioC,

Absolutely you can. I'm not sure if you're using Python or R, but in Python you can do something like this:

 

import datarobot as dr

dr.Client(token='example', endpoint='https://app.datarobot.com/api/v2')

project = dr.Project.get('example')

for x in range(6):
    # Clone the project, then set the target on the clone with a different seed
    new_project = project.clone_project(new_project_name='This is my new project' + str(x))
    new_project.set_target(
        target="SHOT_NUMBER",
        advanced_options=dr.AdvancedOptions(seed=x),
    )

 

Essentially, you clone the project with 'clone_project' inside a loop and change the seed in the advanced options passed to the set_target function.

 

I'm pretty sure that will do it, going by the API Reference in the DataRobot Python Client 2.28.0 documentation (readthedocs-hosted.com). However, I'm not sure where to see the seed that was used once the project has started running, so I can't confirm 100%.

If you have any questions, just ask.

HTH! 

Ira

 

Lukas
Data Scientist

Hi @GiorgioC and @IraWatt 

 

Great answer by Ira. The only thing to add: you can get the random seed after a project has been set up via

 

project.advanced_options
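
To check the seed on each clone, something like the sketch below could work. I'm not certain whether project.advanced_options comes back as a plain dict or as an object with attributes in every client version, so the helper handles both. The name read_seed and the stand-in values are made up for illustration and are not part of the DataRobot client.

```python
from types import SimpleNamespace


def read_seed(advanced_options):
    """Return the seed from a project's advanced options, or None if unset.

    Handles both a plain dict and an object with a `seed` attribute, because
    the return type is an assumption here and may differ by client version.
    """
    if advanced_options is None:
        return None
    if isinstance(advanced_options, dict):
        return advanced_options.get("seed")
    return getattr(advanced_options, "seed", None)


# Stand-ins for what project.advanced_options might return (hypothetical values).
as_dict = {"seed": 3}
as_object = SimpleNamespace(seed=3)
print(read_seed(as_dict), read_seed(as_object))
```

With a real project you would call read_seed(new_project.advanced_options) inside the loop.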


 

Cheers,

Lukas

IraWatt
Laser

Ah thanks @Lukas ! Knew there must have been a way : D 

GiorgioC
Image Sensor

Thank you so much @IraWatt and @Lukas. I tested it and it worked like a charm!

Great to hear @GiorgioC glad we could help : )

Linda
Community Manager

Nice job, @IraWatt and @Lukas! Thank you for sharing your knowledge and best practices with Giorgio and the community at large! And thank you @GiorgioC for making sure to select "Accept as Solution" so other members can find the right answers!


IraWatt
Laser

No worries @Linda  ! : D 

Andrew Nicholls
Data Scientist

Hi @GiorgioC, I was wondering if you'd be willing to share your goals with this process. I'm curious what you're looking to discover. 


Hi @Andrew Nicholls,

 

The idea is to stress-test the holdout (HO) to confirm that the features under consideration are consistently predictive and not just a one-time case, which could be due to the statistical fluke of having picked a "good" HO. I usually also run a negative stress test, where I randomly scramble the labels and retrain, permutation-style. There, unlike above, I want to see a consistent 50/50 (coin-flip) result. That tells me the features are really predictive of the real label and that DR is not just finding random patterns in the data.
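
For anyone who wants to try the negative stress test, here is a minimal sketch of the label-scrambling step in plain Python. It only shows the shuffle. Uploading the scrambled dataset and retraining are left out, and the function name permute_labels and the example labels are made up for illustration.

```python
import random


def permute_labels(labels, seed):
    """Return a shuffled copy of labels; the input list is left untouched."""
    rng = random.Random(seed)  # local RNG so the shuffle is reproducible per seed
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


labels = [0, 1] * 10  # hypothetical binary target column
for seed in range(3):
    scrambled = permute_labels(labels, seed)
    # The class balance is preserved; only the row-to-label pairing is broken.
    print(seed, sum(scrambled), scrambled[:8])
```

Because the class balance stays the same and only the pairing between rows and labels is broken, a model trained on the scrambled target should do no better than chance if the original signal was real.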

GiorgioC
Image Sensor

Hi @IraWatt  and @Lukas ,

 

Just wanted to add a small correction for the community. The previous code did not generate an error, but it did not carry over the number of CV folds from the project being cloned. The version below is tested and works for 10 CV folds (note the added variable "partitioning").

 

 

project = dr.Project.get('6262NNNNNNNNNNN')
# Stratified CV with 10 folds and a 20% holdout, passed explicitly to each clone
partitioning = dr.StratifiedCV(holdout_pct=20, reps=10)

for x in range(6):
    new_project = project.clone_project(new_project_name='test' + str(x))
    new_project.set_target(
        target="your_target",
        advanced_options=dr.AdvancedOptions(seed=x),
        partitioning_method=partitioning,
    )
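
Once the clones have finished, one way to judge consistency across seeds is to look at the spread of whichever score you collect per project. The sketch below assumes you have already gathered one score per seed into a dict. The function name summarize_scores and the numbers are placeholders for illustration, not real results or part of the DataRobot client.

```python
from statistics import mean, stdev


def summarize_scores(scores_by_seed):
    """Return (mean, sample standard deviation, min, max) of per-seed scores."""
    values = list(scores_by_seed.values())
    # stdev needs at least two values; report 0.0 for a single run
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread, min(values), max(values)


# Hypothetical placeholder holdout scores keyed by seed, not real results.
example = {0: 0.71, 1: 0.69, 2: 0.70}
avg, spread, lowest, highest = summarize_scores(example)
print(f"mean={avg:.3f} sd={spread:.3f} range=[{lowest}, {highest}]")
```

A small spread relative to the mean would suggest the result does not depend much on the seed.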