Joining Metadata and prediction result

Mali · 07-06-2022

Hi

To keep a history of all BatchPredictions (scheduled jobs), we need metadata. Let's say I have a table of Metadata and a table of Prediction. How can I merge these two tables and get the related prediction_id from the metadata?

Cheers

Mali

dalilaB · 07-06-2022

Is there one metadata record for each batch prediction? Does each batch prediction have a unique ID? For instance, batch prediction 1 would have batch_id 1, batch prediction 2 would have batch_id 2, and so on. I'm assuming that the metadata also has a field called batch_id that refers to the same batch_id in each batch. If so, you can simply perform an inner join on the key batch_id.
Where are the batches stored? Have you appended them to each other? If so, then you just need an inner join; otherwise, you need to append them first and then perform the inner join.
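
A minimal sketch of the append-then-inner-join approach, written in plain Python so it runs without extra libraries. It assumes, as above, that every prediction row and every metadata row carries a batch_id column; the table layouts, the row_index and created columns, and all values are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def append_batches(batches: List[List[Row]]) -> List[Row]:
    """Stack per-batch prediction tables into one table (like UNION ALL)."""
    combined: List[Row] = []
    for batch in batches:
        combined.extend(batch)
    return combined


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep only rows whose key appears in both tables and merge their fields."""
    index: Dict[object, List[Row]] = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    joined: List[Row] = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


# Hypothetical data: two prediction batches plus one metadata row per batch.
batch_1 = [
    {"batch_id": 1, "row_index": 0, "prediction": 0.91},
    {"batch_id": 1, "row_index": 1, "prediction": 0.12},
]
batch_2 = [{"batch_id": 2, "row_index": 0, "prediction": 0.47}]
metadata = [
    {"batch_id": 1, "created": "2022-07-06"},
    {"batch_id": 2, "created": "2022-07-07"},
]

predictions = append_batches([batch_1, batch_2])
result = inner_join(predictions, metadata, key="batch_id")
for row in result:
    print(row)
```

Each prediction row picks up the metadata of its own batch; a prediction whose batch_id has no metadata row is dropped, which is what an inner join does.
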
If your batches and metadata reside in the AI Catalog, you can use Workspace to set up a pipeline with Spark SQL.
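
The same logic expressed in SQL might look like the sketch below. It uses Python's built-in sqlite3 module as a local stand-in for Spark SQL; the query itself (UNION ALL to append the batches, then INNER JOIN on batch_id) is standard SQL, but the table names batch_1, batch_2, and metadata and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: one per prediction batch, plus a metadata table.
cur.execute("CREATE TABLE batch_1 (batch_id INTEGER, row_index INTEGER, prediction REAL)")
cur.execute("CREATE TABLE batch_2 (batch_id INTEGER, row_index INTEGER, prediction REAL)")
cur.execute("CREATE TABLE metadata (batch_id INTEGER, created TEXT)")
cur.executemany("INSERT INTO batch_1 VALUES (?, ?, ?)", [(1, 0, 0.91), (1, 1, 0.12)])
cur.executemany("INSERT INTO batch_2 VALUES (?, ?, ?)", [(2, 0, 0.47)])
cur.executemany("INSERT INTO metadata VALUES (?, ?)", [(1, "2022-07-06"), (2, "2022-07-07")])

# Append the batches with UNION ALL, then inner join the result to metadata.
query = """
SELECT p.batch_id, p.row_index, p.prediction, m.created
FROM (
    SELECT * FROM batch_1
    UNION ALL
    SELECT * FROM batch_2
) AS p
INNER JOIN metadata AS m ON p.batch_id = m.batch_id
ORDER BY p.batch_id, p.row_index
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```
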
I hope this answers your question.

Mali · 07-06-2022

Thanks, Dalila, for your reply.

Yes, each batch prediction has a batchPredictions_id, but there is no batchPredictions_id in the prediction_output.

[Image: Objects Diagram]

As you can see, we have metadata with BatchPrediction_id and, in a separate table, predictions_output (Binary Classification Models), but predictions_output has no unique ID such as BatchPrediction_id. The blue fields come from the input dataset; the rest are created automatically by DataRobot.

My question is: how can I join the predictions output with the related metadata?

Thanks

Mali

dalilaB · 07-07-2022

I see that Predictions_MetaData has output_dataStoreId. One possible solution is to add this field (output_dataStoreId) to the Binary Classification Models table alongside the blue columns.
If the blue fields come from the input dataset and the rest are created automatically by DataRobot, then add that dataset ID to your prediction output.
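
A sketch of that suggestion: before scoring, add the identifier as an extra column on every input row, so that, assuming the column is carried through to the output like the other blue input columns, each prediction row ends up with a key that matches the metadata. The column name dataset_id and the ID value are placeholders, not DataRobot names.

```python
import csv
import io


def add_key_column(csv_text: str, column: str, value: str) -> str:
    """Return a copy of a CSV with a constant key column appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = value  # same key on every row of this dataset
        writer.writerow(row)
    return out.getvalue()


# Hypothetical input dataset and placeholder dataset ID.
input_csv = "customer_id,age\n101,34\n102,51\n"
stamped = add_key_column(input_csv, "dataset_id", "example-dataset-id")
print(stamped)
```

This only helps if the key is known before the job runs and is unique per job, which is exactly the concern raised in the replies.
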

Mali · 07-07-2022

Yes, that's a good idea, but output_dataStoreId is created at the same time as the predictions, so we would need DataRobot to add this unique key to both (the predictions and the metadata) in order to link them.
If we want to add it after both datasets have been generated, how do we find the output_dataStoreId for a specific prediction?

Mali · 07-07-2022

FYI, I've found that output_dataStoreId is not unique: as you can see here, it is repeated across different batch predictions.
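
Before using a column as a join key, it is worth checking that it is unique on the metadata side; otherwise each prediction row matches every metadata row sharing that value, and the join silently multiplies rows. A small check of that kind, with hypothetical IDs:

```python
from collections import Counter
from typing import Dict, List


def duplicate_keys(rows: List[Dict[str, str]], key: str) -> Dict[str, int]:
    """Return each value of `key` that appears more than once, with its count."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical metadata: two batch predictions share the same output_dataStoreId.
metadata = [
    {"batchPrediction_id": "job-a", "output_dataStoreId": "store-1"},
    {"batchPrediction_id": "job-b", "output_dataStoreId": "store-1"},
    {"batchPrediction_id": "job-c", "output_dataStoreId": "store-2"},
]
dupes = duplicate_keys(metadata, "output_dataStoreId")
print(dupes)  # {'store-1': 2}
```

An empty result, as you would get for batchPrediction_id here, means the column is safe to use as a one-to-one key.
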

Mali · 07-07-2022

A batch prediction has a unique ID field, prediction_id, so if this field were included in the predictions CSV, the metadata and predictions could be joined on it.
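
Assuming a hypothetical predictions CSV that did include a prediction_id column, the join becomes a simple lookup. This sketch also reports prediction rows with no matching metadata, which makes missing or mismatched IDs easy to spot; all column names and values are illustrative.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def join_on_prediction_id(predictions_csv: str, metadata: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Attach metadata to each prediction row via prediction_id.

    Returns (joined_rows, unmatched_rows). Assumes prediction_id is unique in metadata.
    """
    by_id = {meta["prediction_id"]: meta for meta in metadata}
    joined: List[Row] = []
    unmatched: List[Row] = []
    for row in csv.DictReader(io.StringIO(predictions_csv)):
        meta = by_id.get(row["prediction_id"])
        if meta is None:
            unmatched.append(row)  # no metadata for this ID
        else:
            joined.append({**row, **meta})
    return joined, unmatched


# Hypothetical predictions output that includes a prediction_id column.
predictions_csv = (
    "prediction_id,row_index,prediction\n"
    "p1,0,0.91\np1,1,0.12\np2,0,0.47\np9,0,0.33\n"
)
metadata = [
    {"prediction_id": "p1", "created": "2022-07-06"},
    {"prediction_id": "p2", "created": "2022-07-07"},
]
joined, unmatched = join_on_prediction_id(predictions_csv, metadata)
print(len(joined), len(unmatched))
```
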

dalilaB · 07-08-2022

In the sheet, I see Input_dataStoreId but not Output_dataStoreId. Could you please take another screenshot of your sheet with output_dataStoreId included? Thanks

dalilaB · 07-08-2022

I checked with the MLOps team. Here is their answer: "It seems you want to include the batch prediction job ID itself in the output. That is not something that is supported."

Mali · 07-12-2022

Hi

So having BatchPrediction_id in the prediction result is not possible through the UI. My question is: if we use Zepl and create BatchPredictions with Python code, is it possible to add BatchPrediction_id to the prediction output?
