From the Python API, how does one specify the optional features for a prediction run?
I mean the field labelled "Optional features (0 of 5)" in the web GUI.
"Best code is no code at all" (c) Jeff Atwood, co-founder of Stack Overflow and Stack Exchange.
We try to handle as many problems and edge cases as possible on our side, along with stability and reliability.
@Bogdan Tsal-Tsalko "best code is no code" is an oxymoron. It is not a deep concept. It just means the person who said it is no good at code. How about "The best GUI is no GUI". Fashions change but what you are telling me is that the DR API is intended to discourage people from using it at all.
@Eu Jin This was definitely not the message I got in my first meeting about the functionality of the DR API.
DataRobot is built around automation, so it is designed to reduce code, and the DR API is there to give you access to all of the in-app operations through code. Uploading prediction data is one of the routines we automated for our users. You asked about the trouble of aligning predictions back to the source rows; now you have two options: write the code the way you want it, or use what is already automated.
Hey @Bruce , I decided to come up with a solution https://github.com/calamarif/datarobot_gui_to_code (please read on before you click on it), because when I presented this to a talented colleague ( @Lukas ), he said "that's great, but hold my beer" (a slight misrepresentation of his words) and asked why I didn't just do this:
import datarobot as dr
import pandas as pd

p = dr.Project.get('6242f741974efc8f60e26fcf')
prediction_data = pd.read_excel('/Users/lukas.innig/DataRobot/Datasets/10k_diabetes.xlsx')[:100]
ds = p.upload_dataset(prediction_data)
m = p.get_models()[0]  # get_models() returns a list; pick the model to score with
passthrough_cols = ['admission_type_id', 'discharge_disposition_id', 'admission_source_id']
pred_job = m.request_predictions(ds.id)
preds = pred_job.get_result_when_complete()
# row_id lines up with the positions of the uploaded DataFrame
preds.set_index('row_id').join(prediction_data[passthrough_cols])
Very straightforward, and pretty much what I tried first.
If you try to join on row_id you have the problem that SQL makes no guarantee about row order, so if you run the query twice and just join on row number, the predictions can get attached to the wrong rows. At best you have to download to a CSV first and attach the predictions to that, but then I am forced to go through a CSV and handle the ensuing datatype problems. I did implement this, but it was unpopular at my workplace. My source is a Snowflake database, and I need to work directly against it.
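For reference, the version I implemented looked roughly like this (a minimal sketch, not my exact code: it assumes the Snowflake query result has already been exported once to scoring_snapshot.csv, and reuses the project and model lookup from the snippet above):

import datarobot as dr
import pandas as pd

# The frozen snapshot of the Snowflake query result; only this file is used from here on,
# so the row order cannot drift between scoring and joining.
snapshot = pd.read_csv('scoring_snapshot.csv')

p = dr.Project.get('6242f741974efc8f60e26fcf')
m = p.get_models()[0]

ds = p.upload_dataset(snapshot)  # upload the frozen snapshot, not a fresh query
preds = m.request_predictions(ds.id).get_result_when_complete()

# Attach predictions to the same snapshot by position; this is only safe because both
# sides come from the one frozen file (and you still inherit the CSV's datatype quirks).
result = snapshot.join(preds.set_index('row_id'))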
But the predictions download does not include any data fields except those listed by hand in the web GUI, and even then only when the download is done through the GUI, not when it is done through the API. Apart from this being my own experience, I also got official confirmation of it from my DataRobot contact, who said they will put in a request for a modification to the API that would allow me to do this through the API.
And, admittedly, I just don't like the idea of joining the two tables back together using row numbers. Any number of things could go wrong on the DataRobot end and cause massive havoc. Even relying on DataRobot preserving the query's row order at all is a problem (I could not find an assurance of this in the documentation). I would rather have a linking field that I supplied myself.
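If I do have to touch row_id, I would at least want to use it exactly once, against the very DataFrame I uploaded, and key everything after that on a column I control. A rough sketch (patient_id is a hypothetical linking column from my Snowflake table; project and model lookup as above):

import datarobot as dr
import pandas as pd

p = dr.Project.get('6242f741974efc8f60e26fcf')
m = p.get_models()[0]

# scoring_df is the frame pulled from Snowflake; 'patient_id' is a hypothetical
# linking column that I supply and trust.
scoring_df = pd.read_csv('scoring_snapshot.csv')

ds = p.upload_dataset(scoring_df)
preds = m.request_predictions(ds.id).get_result_when_complete()

# Use row_id once, against the same frame that was uploaded, to attach my own key.
assert len(preds) == len(scoring_df)
keyed_preds = preds.set_index('row_id').join(scoring_df[['patient_id']].reset_index(drop=True))

# Every subsequent join back to Snowflake is on patient_id, never on row order.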
I do acknowledge that I, myself, downloaded the predictions later rather than performing a blocking wait for the job, but another developer who mentioned the problem to me was using code that did wait for the job, so I assume that call produces the same download (without the extra fields).
Addendum - I just tried it, and can confirm that the "extra" fields were not supplied in the download using get_result_when_complete().
Ah got it, thanks for the explanation @Bruce.
I made some changes to the solution I posted on GitHub to use the AI Catalog (instead of a local file), which I think will solve your problem - https://github.com/calamarif/datarobot_gui_to_code
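Roughly, the change looks like this (a sketch rather than the exact code in the repo; it assumes the AI Catalog Dataset calls and Project.upload_dataset_from_catalog available in the current Python client):

import datarobot as dr
import pandas as pd

p = dr.Project.get('6242f741974efc8f60e26fcf')
m = p.get_models()[0]

# Register the scoring data in the AI Catalog instead of uploading a local file each time.
scoring_df = pd.read_excel('/Users/lukas.innig/DataRobot/Datasets/10k_diabetes.xlsx')[:100]
catalog_item = dr.Dataset.create_from_in_memory_data(data_frame=scoring_df)

# Attach the catalog item to the project as a prediction dataset and score it.
prediction_dataset = p.upload_dataset_from_catalog(catalog_item.id)
preds = m.request_predictions(prediction_dataset.id).get_result_when_complete()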
Please let me know how you go; I'd be keen to get your feedback.