
Joining Metadata and prediction result

Mali
Image Sensor

Hi,

To keep a history of all BatchPredictions (scheduled jobs), we need metadata. Let's say I have a table of Metadata and a table of Prediction. How can I merge these two tables and get the related prediction_id from the metadata?

Cheers,

Mali

dalilaB
Data Scientist

Is there one metadata record for each batch prediction? Does each batch prediction have a unique ID? For instance, batch prediction 1 would have batch_id 1, batch prediction 2 would have batch_id 2, and so on. I'm assuming that the metadata also has a field called batch_id referring to the same batch_id in each batch. If so, you can just perform an inner join on batch_id.
Where do you have the batches? Have you appended them to each other? If so, then you just need an inner join; otherwise, you need to append them first and then perform the inner join.
If your batches and metadata reside in the AI Catalog, you can use Workspace to set up a pipeline with Spark SQL.
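A minimal sketch of the append-then-join approach, assuming each batch's prediction rows and the metadata table both carry a batch_id column. It uses Python's built-in sqlite3 only as a stand-in for Spark SQL; the tables (batch_1, batch_2, metadata), columns, and values are hypothetical. UNION ALL appends the batches, and INNER JOIN keeps only rows whose batch_id appears in the metadata.

```python
import sqlite3

# Hypothetical tables: two appended batches plus a metadata table,
# all sharing a batch_id column. Names and values are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE batch_1 (batch_id INTEGER, row_id INTEGER, prediction REAL)")
cur.execute("CREATE TABLE batch_2 (batch_id INTEGER, row_id INTEGER, prediction REAL)")
cur.execute("CREATE TABLE metadata (batch_id INTEGER, created_at TEXT)")
cur.executemany("INSERT INTO batch_1 VALUES (?, ?, ?)", [(1, 1, 0.82), (1, 2, 0.13)])
cur.executemany("INSERT INTO batch_2 VALUES (?, ?, ?)", [(2, 1, 0.47)])
cur.executemany("INSERT INTO metadata VALUES (?, ?)", [(1, "2022-07-01"), (2, "2022-07-02")])

# Append the batches (UNION ALL keeps every row), then inner-join on batch_id.
query = """
WITH all_predictions AS (
    SELECT * FROM batch_1
    UNION ALL
    SELECT * FROM batch_2
)
SELECT p.batch_id, p.row_id, p.prediction, m.created_at
FROM all_predictions AS p
INNER JOIN metadata AS m ON p.batch_id = m.batch_id
ORDER BY p.batch_id, p.row_id
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The same WITH / UNION ALL / INNER JOIN pattern is standard SQL, so it should translate to a Spark SQL pipeline with only table-name changes.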
I hope this answers your question.

Mali
Image Sensor

Thanks, Dalila for your reply.

Yes, each batch prediction has a batchPredictions_id, but there is no batchPredictions_id in the prediction_output.

[Image: Objects Diagram]

As you can see, the metadata has a BatchPrediction_id, and the predictions are in a separate table, predictions_output (Binary Classification Models), but that table has no unique ID such as BatchPrediction_id. The blue fields come from the input dataset; the rest are created automatically by DataRobot.

My question is: how can I join the prediction output with its related metadata?

Thanks,

Mali

dalilaB
Data Scientist

I see that Predictions_MetaData has an output_dataStoreId. One possible solution is to add this field (output_dataStoreId) to the Binary Classification Models output alongside the blue columns.
If the blue fields come from the input dataset and the rest are created automatically by DataRobot, then add that dataset ID to your prediction output.

Mali
Image Sensor

Yes, that's a good idea, but output_dataStoreId is created at the same time as the predictions, so we would need DataRobot to add this unique key to both (predictions and metadata) to connect them.
What would the output_dataStoreId be for a specific prediction if we want to add it after both datasets have been generated?

Mali
Image Sensor

FYI, I've found that output_dataStoreId is not unique: as you can see here, it is repeated for different batch predictions.

[Image: Screen Shot 2022-07-08 at 3.18.23 PM.png]
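A quick way to confirm whether a candidate join key is unique before relying on it, sketched in plain Python. The metadata rows and IDs below are hypothetical and only mirror the situation described; the point is that a non-unique key makes one prediction row match several batch predictions, so the join cannot tell them apart.

```python
from collections import Counter

# Hypothetical metadata rows: (batch_prediction_id, output_dataStoreId).
# Values are made up to mirror a key repeated across batch predictions.
metadata = [
    ("bp-001", "ds-A"),
    ("bp-002", "ds-A"),
    ("bp-003", "ds-B"),
]

def duplicated_keys(rows, key_index):
    """Return {key: count} for key values that appear in more than one row."""
    counts = Counter(row[key_index] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

dupes = duplicated_keys(metadata, 1)
print(dupes)  # {'ds-A': 2}

# Joining one prediction row on a non-unique key matches every
# metadata row that shares the key, so the batch is ambiguous.
prediction = {"output_dataStoreId": "ds-A", "score": 0.91}
matches = [bp for bp, ds in metadata if ds == prediction["output_dataStoreId"]]
print(matches)  # ['bp-001', 'bp-002']
```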

Mali
Image Sensor

Each batch prediction has a unique ID field, prediction_id, so if this field existed in the predictions CSV, the metadata and predictions could be joined.
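If the predictions CSV did contain a prediction_id column, the join would be a simple keyed lookup. A minimal sketch using only the standard library; the CSV content, the positive_probability column, and the metadata fields are hypothetical placeholders, not DataRobot's actual output format.

```python
import csv
import io

# Hypothetical inputs: a predictions CSV that (by assumption) already
# contains a prediction_id column, and metadata keyed by that same ID.
predictions_csv = """prediction_id,row_id,positive_probability
job-1,1,0.82
job-1,2,0.13
job-2,1,0.47
"""
metadata = {
    "job-1": {"created": "2022-07-01"},
    "job-2": {"created": "2022-07-02"},
}

def join_predictions(csv_text, metadata_by_id):
    """Inner join: keep only prediction rows whose prediction_id has metadata."""
    joined = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        meta = metadata_by_id.get(row["prediction_id"])
        if meta is not None:
            joined.append({**row, **meta})
    return joined

result = join_predictions(predictions_csv, metadata)
for r in result:
    print(r)
```

A dictionary lookup works here because prediction_id is unique per batch prediction; with a non-unique key the metadata side would need a list per key instead.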

dalilaB
Data Scientist

From the sheet, I can see Input_dataStoreId but not Output_dataStoreId. Could you please take another screenshot of your sheet with output_dataStoreId included? Thanks

dalilaB
Data Scientist

I checked with the MLOps team. Here is their answer: "It seems like you want to include the batch prediction job ID itself in the output. That is not something that is supported."

Mali
Image Sensor

Hi,

So it is not possible to include BatchPrediction_id in the prediction result through the UI. My question is: if we use Zepl and create batch predictions with Python code, is it possible to add BatchPrediction_id to the prediction output?
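One possible client-side workaround, sketched under loud assumptions: the Python code that submits the job can obtain its ID, and the job's output can be read as CSV text afterwards. This does not call DataRobot and does not change what DataRobot writes (per the MLOps answer, that is not supported); it only stamps a hypothetical batch_prediction_id column onto the downloaded rows so they can later be joined with the metadata.

```python
import csv
import io

def add_job_id_column(csv_text, job_id, column="batch_prediction_id"):
    """Return a copy of csv_text with a constant job-ID column appended.

    Hypothetical helper: job_id is assumed to come from whatever code
    submitted the batch prediction job; nothing here talks to DataRobot.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = job_id  # same ID on every row of this job's output
        writer.writerow(row)
    return out.getvalue()

# Hypothetical downloaded output and job ID, for illustration only.
downloaded = "row_id,positive_probability\n1,0.82\n2,0.13\n"
stamped = add_job_id_column(downloaded, "hypothetical-job-123")
print(stamped)
```

Stamping the ID immediately after each job finishes, in the same session that knows the ID, avoids having to infer the link later from a non-unique field such as output_dataStoreId.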
