From the Python API, how does one specify the optional features for a prediction run?
I mean the field labelled "Optional features (0 of 5)" in the web GUI.
Hi @calamari -- thanks, I appreciate your effort on this. I suspect that something like what you are doing could work. I will look into it in more detail in the next sprint. Bruce.
Ah got it, thanks for the explanation @Bruce.
I made some changes to the solution I posted on GitHub to use the AI Catalog (instead of a local file), which I think will solve your problem - https://github.com/calamarif/datarobot_gui_to_code
Please let me know how you go; I would be keen to get your feedback.
Very straightforward, and pretty much what I tried first.
If you try to join on row_id, you run into the problem that SQL is not deterministic about row order: if you run the query twice and just join on row number, the predictions can get attached to the wrong rows. At best you have to download to CSV first and join against that, but that forces me to use a CSV and handle the ensuing datatype problems. I did implement this, but it was unpopular at my workplace. My source is a Snowflake database, and I need to work directly through that.
But the predictions download does not include any data fields except those listed by hand in the web GUI, and even then only when the download is done through the GUI, not when done through the API. Apart from this being my experience, I also got official confirmation of it from my DataRobot contact, who said they will put in a request for a modification to the API that would allow this through the API.
And, admittedly, I just don't like the idea of joining the two tables back together using row numbers. Any number of things could go wrong on the DataRobot end and cause massive havoc. Even relying on DataRobot using the same order as the query at all is problematic (I could not find an assurance of this in the documentation). I would rather have a linking field that I supplied myself.
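To make concrete what I mean by a linking field, here is a minimal sketch (the link_id column and the toy data are purely illustrative, not anything DataRobot provides):

import pandas as pd

scoring_data = pd.DataFrame({'age': [34, 51, 47], 'bmi': [22.1, 30.4, 27.8]})
scoring_data['link_id'] = range(len(scoring_data))  # a key I control end to end

# If the predictions download carried link_id through, the join would be
# verifiable instead of order-dependent:
predictions = pd.DataFrame({'link_id': [2, 0, 1],  # arbitrary return order
                            'prediction': [0.9, 0.1, 0.4]})
merged = predictions.merge(scoring_data, on='link_id', validate='one_to_one')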
I do acknowledge that I myself have downloaded the predictions later rather than performing a blocking wait on the job, but another developer who mentioned the problem to me was using code that did wait for the job. So I assume that call produces the same download (without the extra fields).
Addendum - I just tried it, and can confirm that the "extra" fields were not supplied in the download using get_result_when_complete().
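For anyone who wants to reproduce the check, this is the gist of what I ran (project ID and file name are placeholders; your column names will vary):

import datarobot as dr

p = dr.Project.get('<project id>')
ds = p.upload_dataset('scoring.csv')  # hypothetical local scoring file
m = p.get_models()[0]
preds = m.request_predictions(ds.id).get_result_when_complete()
print(list(preds.columns))  # prediction output columns only; the GUI-ticked
                            # "optional features" do not appear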
Hey @Bruce , I decided to come up with a solution: https://github.com/calamarif/datarobot_gui_to_code (please read on before you click on it). On presenting this to a talented colleague ( @Lukas ), he said "that's great, but hold my beer" (a slight misrepresentation of his words) and asked why I didn't just do this:
import datarobot as dr
import pandas as pd

p = dr.Project.get('6242f741974efc8f60e26fcf')  # fetch an existing project by ID
prediction_data = pd.read_excel('/Users/lukas.innig/DataRobot/Datasets/10k_diabetes.xlsx')[:100]
ds = p.upload_dataset(prediction_data)  # upload the scoring data to the project
m = p.get_models()[0]  # take the first model from the leaderboard list
# Columns to carry through alongside the predictions
passthrough_cols = ['admission_type_id', 'discharge_disposition_id', 'admission_source_id']
pred_job = m.request_predictions(ds.id)
preds = pred_job.get_result_when_complete()  # block until scoring finishes
# Join the predictions back onto the source rows via row_id
preds.set_index('row_id').join(prediction_data[passthrough_cols])
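One caveat worth adding to the snippet above (my own note, not part of @Lukas's code): the join relies on row_id lining up with the positional index of prediction_data, so if your frame has been filtered or re-ordered, reset the index first:

prediction_data = prediction_data.reset_index(drop=True)  # index 0..n-1, matching row_id
result = preds.set_index('row_id').join(prediction_data[passthrough_cols])
assert len(result) == len(prediction_data)  # sanity check: one prediction per input row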
DataRobot is built around automation: it is designed to reduce the amount of code you write, and the DR API is there to give you access to all of the in-app operations through code. Uploading predictions is a routine we have automated for our users. You had a question about trouble aligning predictions back to your data; now you have two options - write the code the way you want it, or use what has already been automated.
@Bogdan Tsal-Tsalko "best code is no code" is an oxymoron. It is not a deep concept; it just means the person who said it is no good at code. How about "the best GUI is no GUI"? Fashions change, but what you are telling me is that the DR API is intended to discourage people from using it at all.
@Eu Jin This was definitely not the message I got in my first meeting about the functionality of the DR API.
"Best code is no code at all" (c) Jeff Atwood, co-founder of Stack Overflow and Stack Exchange.
On our side, we have tried to solve as many problems and edge cases as possible, along with providing stability and reliability.
I will look into this.
But why would I want my "experience" to be code-free? Code is easy. It's dealing with GUIs that is the problem.
Did you try the Snowflake manual on integration with DataRobot? We have put a lot of effort into integrating with the Snowflake environment to make your experience easy and code-free.
Please let us know if there are any issues with ensuring the order of predictions returned to Snowflake, or any trouble with your use case when using it.
I was using that basic approach when I uploaded from a CSV file.
But I am now using a dynamic query from Snowflake.
So, my concern is that previous developers at my workplace found that, since SQL can return results in an indeterminate order, associating predictions with rows by row order is unreliable. This was presented to me, when I started, as a big problem that was fixed by adding a de facto key to the prediction explanations data - the very key that I (apparently) cannot set using the API. That is still a weird oversight that should be fixed, regardless of whether there is a workaround.
I have modified the SQL queries so that they should return a deterministic order. But it is unclear to me that DataRobot will respect that order, and that is certainly not something I am comfortable taking DataRobot's word on. I would much rather be able to cross-check it by having my own key present in the data.
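To illustrate the cross-check I have in mind, here is a rough sketch. It assumes the snowflake-connector-python package, placeholder credentials, and a hypothetical SCORING_VIEW with a PATIENT_ID key - the key is entirely my own invention, not anything DataRobot provides:

import snowflake.connector

conn = snowflake.connector.connect(user='...', password='...', account='...')
query = """
    SELECT PATIENT_ID, AGE, BMI   -- PATIENT_ID is my own de facto key
    FROM SCORING_VIEW
    ORDER BY PATIENT_ID           -- deterministic order on my side
"""
scoring_data = conn.cursor().execute(query).fetch_pandas_all()

# After scoring (as in the earlier snippet), join on row_id and then verify
# that my key survives the round trip, rather than trusting row order:
#   joined = preds.set_index('row_id').join(scoring_data)
#   assert joined['PATIENT_ID'].is_unique and joined['PATIENT_ID'].notna().all()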
------
I am using the only Python library I am aware of for this purpose: "datarobot".