Hi, Community!
I'm continuing to experiment with Feature Discovery in DataRobot. My code is based on this learning session, but I'm using the REST API directly rather than the SDK. I managed to reproduce the whole process, from data load to scoring, but now I have a question.
I'm using the `POST /api/v2/projects/(projectId)/predictions/` endpoint with a dataset ID to run predictions in my POC app, and it works great. But as far as I know, those predictions are executed on the same workers used for model training, which might become a bottleneck in the future. In our production application we don't use Feature Discovery; we use the Prediction API so that predictions run on dedicated prediction workers and don't share resources with models that are still training. My goal is to use dedicated prediction workers in a project where Feature Discovery is enabled. But the Prediction API requires uploading CSV files rather than passing a dataset ID, which makes me think it would not use Feature Discovery, since Feature Discovery is based on datasets.
So, can I make predictions with the Prediction API for a model where Feature Discovery is enabled?
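For context, the call I'm making looks roughly like this. The endpoint path is the one above; the `modelId` field, the host, and all IDs are placeholders from my side, so treat them as assumptions and check your install's API docs:

```python
# Sketch of a modeling-worker prediction request against the v2 API.
# Only builds the URL and payload; send with any HTTP client.
API_BASE = "https://app.datarobot.com/api/v2"  # assumption: cloud install

def build_prediction_job(project_id: str, model_id: str, dataset_id: str):
    """Return (url, payload) for POST /projects/{projectId}/predictions/."""
    url = f"{API_BASE}/projects/{project_id}/predictions/"
    # Scores an AI Catalog dataset (by ID) with a model from the Leaderboard.
    payload = {"modelId": model_id, "datasetId": dataset_id}
    return url, payload

# Usage (hypothetical IDs), e.g. with the requests library:
#   url, payload = build_prediction_job("proj123", "model456", "dataset789")
#   requests.post(url, json=payload,
#                 headers={"Authorization": f"Token {API_KEY}"})
# The server replies 202 Accepted; poll the Location header for the result.
```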
Best regards,
Evgeni
Evgeni,
To use the dedicated resource engines (as well as get the associated service and model health tracking metrics), you'll have to register your model as a deployment. After that, you'll be able to use it for batch predictions.
For the features discovered by DataRobot, you have two options: replicate the logic outside of DataRobot and present the features ready to score, or have DataRobot create them. For the latter, DataRobot will leverage the secondary datasets associated with your project's primary dataset. This involves creating a secondary dataset configuration: for example, you might have a secondary dataset that takes snapshots of some data via a JDBC database connection, and you would specify which of those snapshots to use (the latest, a specific snapshot, or, in some cases, going back to the database when the prediction job starts). See this page for documentation.
Thanks a lot, @doyouevendata !
Did I understand correctly that to use the Prediction API on a project where Feature Discovery is enabled, I need to:
1. Deploy my model.
2. Use a Prediction API endpoint with a CSV file that contains prediction data in the same format as my primary dataset. DataRobot will then join data from the secondary datasets to my file and perform automated feature engineering the same way it does during model training.
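In other words, step 2 would look something like this. The `/predApi/v1.0/...` path and the `DataRobot-Key` header follow the documented Prediction API for managed cloud prediction servers, but the host, deployment ID, and keys here are placeholders I'd need to confirm:

```python
# Sketch of scoring a CSV against a deployed model on a dedicated
# prediction server. Builds only the URL and headers.
def build_scoring_request(pred_host: str, deployment_id: str,
                          api_token: str, datarobot_key: str):
    """Return (url, headers) for a raw-CSV POST to the Prediction API."""
    url = (f"https://{pred_host}/predApi/v1.0/deployments/"
           f"{deployment_id}/predictions")
    headers = {
        "Content-Type": "text/csv; charset=UTF-8",
        "Authorization": f"Bearer {api_token}",
        # Per-server key; required on DataRobot-managed prediction servers.
        "DataRobot-Key": datarobot_key,
    }
    return url, headers

# Usage (hypothetical values), e.g. with the requests library:
#   url, headers = build_scoring_request("example.dynamic.orm.datarobot.com",
#                                        "dep123", API_TOKEN, DR_KEY)
#   with open("to_score.csv", "rb") as f:
#       requests.post(url, data=f, headers=headers)
```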
Best regards,
Evgeni
@doyouevendata, thanks a lot. I will try it.