I was wondering if there is a way of using the API to upload and run the same project N times using different random seeds. Basically cloning and re-running a project N times with all the same parameters except random seeds.
The goal is having N autopilot results coming from different seeds of the same input dataset.
Thank you in advance for any input.
Absolutely you can. I'm not sure whether you're using Python or R, but in Python you can do something like this:
import datarobot as dr

dr.Client(token='example', endpoint='https://app.datarobot.com/api/v2')
project = dr.Project.get('example')

for x in range(6):
    new_project = project.clone_project(new_project_name='This is my new project' + str(x))
    new_project.set_target(target="SHOT_NUMBER", advanced_options=dr.AdvancedOptions(seed=x))
Essentially, you clone the project with 'clone_project', then loop through, changing the advanced options in the set_target call each time.
I'm pretty sure that will do it, going by the API Reference in the DataRobot Python Client 2.28.0 documentation (readthedocs-hosted.com). However, I'm not sure where to see the seed that was used once the project has started running, so I can't confirm it 100%.
If you have any questions, just ask.
Hi @Andrew Nicholls,
The idea is to stress-test the holdout (HO) to confirm that the features under consideration are consistently predictive and not just a one-time case, which could be due to the statistical fluke of having picked a "good" HO. I usually also run a negative stress-test, where I randomly scramble the labels and re-train, permutation style. There, unlike above, I would want to see a consistent 50/50 result (a coin-flip situation). That tells me the features are really predictive of the real label and that DR is not just finding random patterns in the data.
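For anyone wanting to try the negative stress-test, here is a minimal sketch of the label-scrambling step in plain Python. It only shuffles a label column with a fixed seed so each scrambled copy is reproducible; uploading and re-training is left out. The function name scramble_labels and the toy labels are made up for illustration and are not part of any DataRobot API.

```python
import random


def scramble_labels(labels, seed):
    """Return a shuffled copy of labels; the input list is left untouched."""
    rng = random.Random(seed)  # dedicated RNG so each seed is reproducible
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


# Toy balanced binary target (illustrative data only).
labels = [0, 1] * 50

# One scrambled copy per seed; class balance is preserved, but any link
# between the features and the label is broken.
scrambled_sets = {seed: scramble_labels(labels, seed) for seed in range(6)}
```

Each scrambled copy would then replace the target column before the dataset is uploaded as its own project. With a balanced binary target, models trained on these copies should perform at roughly chance level.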
Just wanted to add a small correction for the community. The previous code did not raise an error, but it did not set the right number of CV folds from the project being cloned. The version below is tested and works for 10 CV folds (note the added "partitioning" variable).
import datarobot as dr

project = dr.Project.get('6262NNNNNNNNNNN')
# 10-fold stratified CV with a 20% holdout, set explicitly for every clone
partitioning = dr.StratifiedCV(holdout_pct=20, reps=10)
for x in range(6):
    new_project = project.clone_project(new_project_name='test' + str(x))
    # a different seed per clone; all other settings stay the same
    new_project.set_target(target="your_target", advanced_options=dr.AdvancedOptions(seed=x), partitioning_method=partitioning)
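One possible variation, offered as a sketch rather than a tested recipe: wrap the loop in a helper and also vary the partition seed. As far as I can tell from the client docs, dr.StratifiedCV takes its own seed argument (default 0) that is separate from the AdvancedOptions seed, so if the aim is a different holdout per run, that is probably the one to change. Please check this against your client version. The names seed_plan and run_clones are hypothetical helpers, not part of the DataRobot client.

```python
def seed_plan(base_name, n_runs, first_seed=0):
    """Return (project_name, seed) pairs, one per planned clone."""
    return [(f"{base_name}_seed{seed}", seed)
            for seed in range(first_seed, first_seed + n_runs)]


def run_clones(project_id, target, plan, holdout_pct=20, reps=10):
    """Clone the project once per (name, seed) pair and start Autopilot."""
    import datarobot as dr  # imported here so seed_plan works without the client

    source = dr.Project.get(project_id)
    clones = []
    for name, seed in plan:
        clone = source.clone_project(new_project_name=name)
        clone.set_target(
            target=target,
            advanced_options=dr.AdvancedOptions(seed=seed),
            # Assumption: the partition seed should vary too, so that each
            # clone gets a different holdout/CV split.
            partitioning_method=dr.StratifiedCV(
                holdout_pct=holdout_pct, reps=reps, seed=seed
            ),
        )
        clones.append(clone)
    return clones


plan = seed_plan("test", 6)
# Needs dr.Client(...) configured first:
# run_clones("6262NNNNNNNNNNN", "your_target", plan)
```

Keeping the seed in the project name makes it easy to tell afterwards which run used which seed, since I have not found a confirmed place in the project itself to read it back.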