From the Python API, how does one specify the optional features for a prediction run?
I mean the field in the web GUI labeled "Optional features (0 of 5)".
To give concrete advice, I need to know which API you are using.
If you are using the real-time Python API (the most popular one), you build a pandas DataFrame for scoring, request the predictions, and assign them to a new column on that DataFrame, because DataRobot preserves the row order.
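A minimal sketch of that assignment, assuming the predictions come back as a plain sequence in request order. The column names and values are made up for illustration, and the `predictions` list stands in for whatever your prediction call returns; no DataRobot call is shown.

```python
import pandas as pd

# Scoring data already held in memory (hypothetical columns).
scoring_df = pd.DataFrame(
    {"customer_id": ["a17", "b42", "c03"], "amount": [120.0, 75.5, 310.0]}
)

# Stand-in for the values returned by the prediction request (made-up numbers).
predictions = [0.12, 0.87, 0.45]

# Positional assignment is only safe when the lengths match.
if len(predictions) != len(scoring_df):
    raise ValueError("prediction count does not match the number of scored rows")

# Attach the predictions as a new column, relying on preserved row order.
scored_df = scoring_df.assign(prediction=predictions)
print(scored_df)
```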
If you request predictions in one environment and receive them in another, then before predicting, save the data with every column you need plus a row index that starts at 0 and increases by 1. When you read the predictions back, join them to the saved data on "rowId".
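A rough sketch of that save-and-join flow using only pandas. The column names are hypothetical, an in-memory buffer stands in for the saved file, and the `predictions` frame is a stand-in for the downloaded results, assumed to carry a "rowId" column counting from 0.

```python
import io
import pandas as pd

# Data to be scored (hypothetical columns).
features = pd.DataFrame(
    {"customer_id": ["a17", "b42", "c03"], "amount": [120.0, 75.5, 310.0]}
)

# Before predicting: add an explicit 0, 1, 2, ... index column and save everything.
saved = features.reset_index(drop=True).rename_axis("rowId").reset_index()
buffer = io.StringIO()  # stands in for a file on disk
saved.to_csv(buffer, index=False)

# Later, possibly in another environment: stand-in for the downloaded predictions.
# The rows are deliberately shuffled to show that the join does not rely on order.
predictions = pd.DataFrame({"rowId": [2, 0, 1], "prediction": [0.45, 0.12, 0.87]})

# Reload the saved data and join on rowId; validate guards against duplicate keys.
buffer.seek(0)
restored = pd.read_csv(buffer)
joined = restored.merge(predictions, on="rowId", how="left", validate="one_to_one")
print(joined)
```

The `how="left"` join keeps every saved row, so a missing prediction shows up as NaN instead of silently dropping the row.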
In both situations, DataRobot doesn't need to send column values back to your environment.
So, I should upload the data into DR for prediction and then DR will attach rowid. And then DR will link the predictions to the data by rowid. So, I download the predictions and download the data, both of which have the DR row number, and then join on rowid.
That sounds plausible.
Is that your advice?
JFTR - That does not seem to lead to any saving on bandwidth.
As I mentioned above, that extra column with a single integer is "rowId", which guarantees your order. It applies to any data and any project, so it does not need to be specified separately.
The excuse that DataRobot is saving on transmission bandwidth is lame. Even just one extra column with a single integer that I control would be enough - and without that information it makes things rather hard. And that should be my choice, not DataRobot's. I could change the query so that I am absolutely certain that it has a very specific order - but then how can I be sure that DR will respect that order when it downloads it for the predictions? SQL relations have no inherent order except on output.
Also, you can do it through the Web GUI, so again - this is a lame excuse. I am only asking to do through the API what I can already do through the GUI.
But, I take it that you are actually saying that DR is that lame. Wow!
@dustin.burke can you confirm this limitation on the DR API ?
I see your concern about getting features back along with the predictions. The reason features are not returned is to reduce the amount of information transmitted over the internet and to reduce latency. You already have all the data in your code, so it should be easy to add one column with the predictions. If you are concerned about ordering, you can check it against the "rowId" returned with the predictions and join on it.
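As a starting point, here is a sketch of that check. `attach_predictions` is a hypothetical helper, not part of any DataRobot package, and it assumes the predictions arrive as a DataFrame with "rowId" and "prediction" columns where "rowId" counts rows from 0.

```python
import pandas as pd


def attach_predictions(data: pd.DataFrame, preds: pd.DataFrame) -> pd.DataFrame:
    """Attach predictions to data after checking rowId covers 0..n-1 exactly once."""
    if sorted(preds["rowId"].tolist()) != list(range(len(data))):
        raise ValueError("rowId values do not match the rows that were sent")
    # Put the predictions back into the original row order before attaching them.
    ordered = preds.sort_values("rowId")
    return data.reset_index(drop=True).assign(
        prediction=ordered["prediction"].to_numpy()
    )


# Hypothetical data and a shuffled stand-in for the returned predictions.
data = pd.DataFrame({"customer_id": ["a17", "b42", "c03"]})
preds = pd.DataFrame({"rowId": [2, 0, 1], "prediction": [0.45, 0.12, 0.87]})

result = attach_predictions(data, preds)
print(result)
```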
Let me know if you need support in coding that.
I did just find this: https://community.datarobot.com/t5/platform/python-api-get-predictions-with-some-features/m-p/10957#...
which says, essentially, that it absolutely cannot be done. Please do not tell me that the DataRobot interface is that stupid. What I need is an API call that allows me to specify the optional features because I am not in a position to rely on the order that DataRobot and SQL happen to think the data should be in this time. Do I have to have the code pop up the DR Web GUI and say "please human enter this code"? That would be enough for me to vote against renewal of the service contract. FWIW.