The only way that I know to add a blueprint to a project to be trained and then used in prediction is to select from the list project.get_blueprints(). This was working.
But just now, with some new data, my code complained that the blueprint list was empty. I also checked in the web GUI and found that there were 0 models on that project. The problem was that DR had not attached any blueprints at all.
I have been using this approach for a while using the Python API and it was working. And through the GUI I selected a blueprint using the add-blueprint option. It trained fine and got validation etc.
How can I select this model through the Python API?
As a work-around I have tried turning on the autopilot and waiting for it to finish, but it is sitting there for a long time going through other models. The potential that I see for this approach is to train one project on autopilot to get a collection of blueprints and then extract those into other projects. I am hoping there is a better way.
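For what it's worth, here is roughly what I have in mind for the "extract into other projects" idea. It is only a sketch: it assumes Project.train() accepts a blueprint id together with a source_project_id argument, as the client docs describe, and I have not confirmed that it behaves well when the two projects hold different data. The helper name and the stand-in classes are made up so the snippet runs without a DataRobot connection.

```python
from types import SimpleNamespace


def train_blueprint_from(source_blueprint, target_project, **train_kwargs):
    """Submit a blueprint that was generated in another project.

    Assumes target_project.train() takes a blueprint id plus
    source_project_id, and returns whatever train() returns.
    """
    return target_project.train(
        source_blueprint.id,
        source_project_id=source_blueprint.project_id,
        **train_kwargs,
    )


class RecordingProject:
    """Hypothetical stand-in that records the arguments passed to train()."""

    def __init__(self):
        self.calls = []

    def train(self, trainable, **kwargs):
        self.calls.append((trainable, kwargs))
        return "job-1"


# Stand-in for a blueprint taken from an autopilot project's get_blueprints().
donor_blueprint = SimpleNamespace(id="bp-abc", project_id="proj-donor")
target = RecordingProject()
job_id = train_blueprint_from(donor_blueprint, target)
```

In a real session, donor_blueprint would come from the autopilot project's get_blueprints() and target would be the new dr.Project.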
@Kreshnaa -- thanks for your help. I have accepted both your earlier comment and mine as I feel that they will both be important if anyone else follows this up with a similar problem.
That's great. A reduced blueprint list can greatly reduce your training time for quick experimentation.
Hi @Kreshnaa ,
It looks like I have a sequence that works.
import datarobot as dr
from datarobot.models.modeljob import wait_for_async_model_creation

project = dr.Project.create()
project.set_target( ... manual mode ...)
project.pause_autopilot()
# mess with blueprint list
job_id = project.train(blueprint)  # train() returns the model job id
project.unpause_autopilot()
model = wait_for_async_model_creation(project.id, job_id)
I am not getting much in the way of error messages in any of this. Rather they just don't physically do what I want. Now that I can at least get hold of the blueprints, I am still finding that there are models on the queue. I want to be able to create a project, load data, and have it train only the model that I want it to train. It takes a while to train the model on 4 million data points, and so I want to shorten that time by excluding stuff that I know I don't want.
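One thing I may try for the leftover queue entries is cancelling everything except the job I submitted. This is a sketch, not something I have verified end to end: it assumes project.get_model_jobs() lists the queued and running model jobs and that each job has an id and a cancel() method. The fake classes below are placeholders so the snippet runs on its own.

```python
def cancel_other_model_jobs(project, keep_job_id):
    """Cancel every listed model job except keep_job_id.

    Returns the ids of the jobs that were cancelled.
    """
    cancelled = []
    for job in project.get_model_jobs():
        # Compare as strings, on the assumption that the id may be an int
        # in one place and a str in another.
        if str(job.id) != str(keep_job_id):
            job.cancel()
            cancelled.append(job.id)
    return cancelled


class FakeJob:
    """Hypothetical stand-in for a model job."""

    def __init__(self, job_id):
        self.id = job_id
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class FakeQueueProject:
    """Hypothetical stand-in for a project with a fixed job queue."""

    def __init__(self, jobs):
        self._jobs = jobs

    def get_model_jobs(self):
        return self._jobs


jobs = [FakeJob(101), FakeJob(102), FakeJob(103)]
cancelled_ids = cancel_other_model_jobs(FakeQueueProject(jobs), keep_job_id="102")
```

With a real project, keep_job_id would be the value returned by project.train().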
Ideally, if Project.set_target() in manual mode or Project.start() with autopilot_on set to False executes successfully, it will leave suggested blueprints in the Repository and no models on the Leaderboard. If there are no blueprints in the Repository after set_target() or start(), that needs further investigation. What error message do you get when running the above commands?
I was using project.create for more or less that reason. And it was working on my original trial data. I switched to the more complicated data - and it stopped working.
I get the principle - can you tell me why the list was not populated in this case?
You can use Project.create and Project.set_target (refer to the code from my earlier comment). Project.start is limited in terms of advanced functionality (https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.26.1/autodoc/api_reference.html#dat...)
In Manual mode, autopilot is not run. It only generates the blueprints and stores them in the Repository. The manual mode suggested list (Repository) usually covers a wide range of models (including ones that are not run during autopilot) based on the dataset used.
Hi @Kreshnaa ,
Thanks for your response,
The core problem really is how to grab something that was not on the suggested list. If I create a project with autopilot_on=False, then how can I add a model to it without having to then run autopilot anyway?
I was originally doing what you suggested. And it was working. Then on some new and more complicated tables - it stopped working. But, I could still select from the drop down list on the GUI. When I tried doing a wait on the project in case the problem was that the list was not yet populated - I ran into trouble where no combination of options would work. Either I could not wait, or the list was not populated.
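For reference, the kind of wait I was attempting looks roughly like the sketch below: keep calling get_blueprints() until it returns something or a timeout passes. It only helps if the list is merely late, and it does not explain why the list was empty in my case. The function name, the defaults and the fake project are my own placeholders.

```python
import time


def wait_for_blueprints(project, timeout=300, poll_interval=5, sleep=time.sleep):
    """Poll project.get_blueprints() until it returns a non-empty list.

    Raises TimeoutError if the list is still empty after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        blueprints = project.get_blueprints()
        if blueprints:
            return blueprints
        if time.monotonic() >= deadline:
            raise TimeoutError("no blueprints appeared within %s seconds" % timeout)
        sleep(poll_interval)


class FakeProject:
    """Hypothetical stand-in: empty list twice, then one blueprint."""

    def __init__(self):
        self.calls = 0

    def get_blueprints(self):
        self.calls += 1
        return [] if self.calls < 3 else ["blueprint-placeholder"]


fake = FakeProject()
result = wait_for_blueprints(fake, timeout=10, poll_interval=0)
```

On a real project I would pass the dr.Project object and leave the default poll interval in place.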
I think you need to start the project on manual mode so that blueprints are suggested and do not run automatically. Once they're suggested (this can be found on the Repository tab next to the leaderboard) you can access that list of blueprints and run the required one. This way you don't have to run the autopilot every time.
Attaching a small code sample for the same.
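import datarobot as dr
proj = dr.Project.create(sourcedata=train, project_name='manual_mode')
proj.set_target(target='target_column', mode=dr.AUTOPILOT_MODE.MANUAL)  # 'target_column' is a placeholder for your target
bp_list = proj.get_blueprints()  # suggested blueprints from the Repository
first_blueprint = bp_list[0]
proj.train(first_blueprint)  # submit only this blueprint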
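If the blueprint you want is not the first one in the list, one option is to filter the list by model_type. Here is a minimal sketch of that, assuming each blueprint exposes id and model_type attributes. The helper name and the model type strings are illustrative only, and the stand-in objects take the place of proj.get_blueprints() so the snippet runs without a connection.

```python
from types import SimpleNamespace


def find_blueprints(blueprints, keyword):
    """Return blueprints whose model_type contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [bp for bp in blueprints if needle in (bp.model_type or "").lower()]


# Stand-in objects; in a real session use proj.get_blueprints() instead.
demo_blueprints = [
    SimpleNamespace(id="bp1", model_type="Elastic-Net Classifier"),
    SimpleNamespace(id="bp2", model_type="eXtreme Gradient Boosted Trees Classifier"),
    SimpleNamespace(id="bp3", model_type="RandomForest Classifier"),
]

matches = find_blueprints(demo_blueprints, "gradient boosted")
print([bp.id for bp in matches])  # prints ['bp2']
```

Any item in matches can then be passed to proj.train() in place of first_blueprint.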
Let me know if this answers your question.