Hi, Community!
I'm continuing to experiment with Feature Discovery in DataRobot. My code is based on this learning session, but I'm using the REST API directly rather than the SDK. I managed to reproduce the whole process, from data load to scoring, but now I have a question.
I'm using the `POST /api/v2/projects/(projectId)/predictions/` endpoint with a dataset ID to run predictions in my POC app, and it works great. But as far as I know, those predictions are executed on the same workers used for model training, which might become a bottleneck in the future. In our production application we don't use Feature Discovery; we use the Prediction API so that predictions run on dedicated prediction workers and don't share resources with models that are still training. My goal is to use dedicated prediction workers in a project where Feature Discovery is enabled. But the Prediction API requires uploading CSV files rather than passing a dataset ID, which makes me think it would not use Feature Discovery, since Feature Discovery is based on datasets.
So, can I make predictions with the Prediction API for a model where Feature Discovery is enabled?
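For context, the call I'm making looks roughly like this. The endpoint path is the one above; the `modelId` field, the host, and all IDs are placeholders from my side, so treat them as assumptions and check your install's API docs:

```python
# Sketch of a modeling-worker prediction request against the v2 API.
# Only builds the URL and payload; send with any HTTP client.
API_BASE = "https://app.datarobot.com/api/v2"  # assumption: cloud install

def build_prediction_job(project_id: str, model_id: str, dataset_id: str):
    """Return (url, payload) for POST /projects/{projectId}/predictions/."""
    url = f"{API_BASE}/projects/{project_id}/predictions/"
    # Scores an AI Catalog dataset (by ID) with a model from the Leaderboard.
    payload = {"modelId": model_id, "datasetId": dataset_id}
    return url, payload

# Usage (hypothetical IDs), e.g. with the requests library:
#   url, payload = build_prediction_job("proj123", "model456", "dataset789")
#   requests.post(url, json=payload,
#                 headers={"Authorization": f"Token {API_KEY}"})
# The server replies 202 Accepted; poll the Location header for the result.
```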
Best regards,
Evgeni
Evgeni,
To use the dedicated resource engines (as well as get the associated service and model health tracking metrics), you'll have to register your model as a deployment. After that, you'll be able to use it for batch predictions.
For the features discovered by DataRobot, you have two options: replicate the logic outside of DataRobot and present the features ready to score, or have DataRobot create them. For the latter, DataRobot will leverage the secondary datasets associated with your project's primary dataset. This involves creating a secondary dataset configuration: for example, you might have a secondary dataset that takes snapshots of some data via a JDBC database connection, and you would specify which of those snapshots to use (the latest, a specific snapshot, or, in some cases, going back to the database when the prediction job starts). See this page for documentation.
Thanks a lot, @doyouevendata !
Did I understand correctly that to use the Prediction API on a project where Feature Discovery is enabled, I need to:
1. Deploy my model.
2. Use a Prediction API endpoint with a CSV file that contains prediction data in the same format as my primary dataset. DataRobot will then join data from the secondary datasets to my file and perform automated feature engineering the same way it does during model training.
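In other words, step 2 would look something like this. The `/predApi/v1.0/...` path and the `DataRobot-Key` header follow the documented Prediction API for managed cloud prediction servers, but the host, deployment ID, and keys here are placeholders I'd need to confirm:

```python
# Sketch of scoring a CSV against a deployed model on a dedicated
# prediction server. Builds only the URL and headers.
def build_scoring_request(pred_host: str, deployment_id: str,
                          api_token: str, datarobot_key: str):
    """Return (url, headers) for a raw-CSV POST to the Prediction API."""
    url = (f"https://{pred_host}/predApi/v1.0/deployments/"
           f"{deployment_id}/predictions")
    headers = {
        "Content-Type": "text/csv; charset=UTF-8",
        "Authorization": f"Bearer {api_token}",
        # Per-server key; required on DataRobot-managed prediction servers.
        "DataRobot-Key": datarobot_key,
    }
    return url, headers

# Usage (hypothetical values), e.g. with the requests library:
#   url, headers = build_scoring_request("example.dynamic.orm.datarobot.com",
#                                        "dep123", API_TOKEN, DR_KEY)
#   with open("to_score.csv", "rb") as f:
#       requests.post(url, data=f, headers=headers)
```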
Best regards,
Evgeni
@doyouevendata, thanks a lot. I will try it.