
Joining Metadata and prediction result


Hi 

 

To keep a history of all BatchPredictions (scheduled jobs), we need metadata. Let's say I have a Metadata table and a Predictions table. How can I join these two tables so that each prediction gets its related prediction_id from the metadata?

 

Cheers

Mali

dalilaB
DataRobot Alumni

Is there one metadata record for each batch prediction? Does each batch prediction have a unique ID? For instance, batch prediction 1 has batch_id 1, batch prediction 2 has batch_id 2, and so on. I'm assuming the metadata also has a batch_id field referring to the same batch_id in each batch. If so, you can simply perform an inner join on batch_id.
Where are the batches stored? Have you appended them to each other? If so, you only need an inner join; otherwise, append them first and then perform the inner join.
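To illustrate the append-then-join step, here is a minimal Python sketch using only the standard library. It assumes each batch output and the metadata are lists of rows (dicts) sharing a `batch_id` column; the function names and sample values are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def append_batches(batches: List[List[Row]]) -> List[Row]:
    """Stack the rows of several batch prediction outputs into one table."""
    combined: List[Row] = []
    for batch in batches:
        combined.extend(batch)
    return combined


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join on `key`: keep only left rows that have a matching right row."""
    # Index the right-hand table (e.g. metadata) by key for constant-time lookups.
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined: List[Row] = []
    for lrow in left:
        # A left row with no match is dropped; several matches yield several rows.
        for rrow in index.get(lrow[key], []):
            # On a column-name clash, the left row's value wins.
            joined.append({**rrow, **lrow})
    return joined


# Hypothetical example: two appended batches plus one metadata row per batch.
predictions = append_batches([
    [{"batch_id": 1, "row_id": 0, "prediction": 0.91}],
    [{"batch_id": 2, "row_id": 0, "prediction": 0.12}],
])
metadata = [{"batch_id": 1, "job": "nightly"}, {"batch_id": 2, "job": "weekly"}]
for row in inner_join(predictions, metadata, "batch_id"):
    print(row)
```

Indexing the metadata first keeps the join linear in the number of rows, which matters once many batches have been appended.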
If your batches and metadata reside in the AI Catalog, you can use Workspace to set up a pipeline with Spark SQL.
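The join itself is ordinary SQL. As a hedged sketch, the query below uses Python's built-in `sqlite3` module as a local stand-in for Spark SQL, so it can be tried without a cluster; the same SELECT statement should be accepted by Spark SQL as well. The table and column names (`predictions`, `metadata`, `batch_id`, `job`) are assumptions, not DataRobot's actual schema.

```python
import sqlite3
from typing import List, Tuple

# Plain inner join; this syntax is also valid in Spark SQL.
JOIN_SQL = """
SELECT p.batch_id, p.row_id, p.prediction, m.job
FROM predictions AS p
INNER JOIN metadata AS m
    ON p.batch_id = m.batch_id
ORDER BY p.batch_id, p.row_id
"""


def join_with_sql(
    predictions: List[Tuple[int, int, float]],
    metadata: List[Tuple[int, str]],
) -> List[tuple]:
    """Load both tables into an in-memory SQLite database and run JOIN_SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE predictions (batch_id INTEGER, row_id INTEGER, prediction REAL)")
        conn.execute("CREATE TABLE metadata (batch_id INTEGER, job TEXT)")
        conn.executemany("INSERT INTO predictions VALUES (?, ?, ?)", predictions)
        conn.executemany("INSERT INTO metadata VALUES (?, ?)", metadata)
        return conn.execute(JOIN_SQL).fetchall()
    finally:
        conn.close()
```

Rows whose batch_id has no metadata entry are dropped, which is the defining behavior of an inner join.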
I hope this answers your question.


Thanks, Dalila, for your reply.

Yes, each batch prediction has a batchPredictions_id, but there is no batchPredictions_id in the prediction_output.

[Image: Objects Diagram]

As you can see, we have the metadata with BatchPrediction_id and, in a separate table, predictions_output (Binary Classification Models), but predictions_output has no unique ID such as BatchPrediction_id. The blue fields come from the input dataset; the rest are created automatically by DataRobot.

 

My question is: how can I join the predictions output with its related metadata?

 

Thanks

Mali

 


I see that Predictions_MetaData has an output_dataStoreId field. One possible solution is to add this field (output_dataStoreId) to the Binary Classification Models table alongside the blue columns.
If the blue fields come from the input dataset and the rest are created automatically by DataRobot, then add that dataset ID to your prediction output.


Yes, that's a good idea, but output_dataStoreId is created at the same time as the predictions, so we would need DataRobot to add this unique key to both (predictions and metadata) to link them.
If we want to add it after both have been generated, how do we find the output_dataStoreId for a specific prediction?


FYI, I've found that output_dataStoreId is not unique: as you can see here, it is repeated for different batch predictions.

[Screenshot: Screen Shot 2022-07-08 at 3.18.23 PM.png]
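A repeated key matters because a join on it fans out: each prediction row matches every metadata row with the same value, so the originating job can no longer be identified. A quick way to check a candidate key before joining, sketched in Python; the function name and the sample ID values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_keys(rows: Iterable[Dict[str, object]], key: str) -> Dict[object, int]:
    """Return every value of `key` that occurs more than once, with its count."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical metadata rows: two jobs share the same output data store.
metadata = [
    {"batchPrediction_id": "job-1", "output_dataStoreId": "store-A"},
    {"batchPrediction_id": "job-2", "output_dataStoreId": "store-A"},
    {"batchPrediction_id": "job-3", "output_dataStoreId": "store-B"},
]
print(duplicate_keys(metadata, "output_dataStoreId"))  # {'store-A': 2}
print(duplicate_keys(metadata, "batchPrediction_id"))  # {}
```

An empty result means the column can serve as a one-to-one join key on the metadata side.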


A batch prediction has a unique ID field, prediction_id, so if this field exists in the predictions CSV, the metadata and predictions can be joined on it.

dalilaB
DataRobot Alumni

From the sheet, I can see input_dataStoreId but not output_dataStoreId. Could you please share another screenshot of your sheet with output_dataStoreId included? Thanks

dalilaB
DataRobot Alumni

I checked with the MLOps team. Here is their answer: "It seems you want to include the batch prediction job ID itself in the output. That is not something that is supported."


Hi 

So having BatchPrediction_id in the prediction result is not possible through the UI. My question is: if we use Zepl and create BatchPredictions with Python code, is it possible to add BatchPrediction_id to the prediction output?
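Since the job ID is not written into the output automatically, one workaround when jobs are launched from your own Python code (for example in a Zepl notebook) is to add the ID yourself once the job has finished and its output has been downloaded. Below is a minimal standard-library sketch. It assumes the output is a CSV with a header row and that your script already holds the job's ID as a string; how you obtain that ID from the DataRobot client is not shown. The function `add_job_id_column` and its default column name are hypothetical.

```python
import csv
import io


def add_job_id_column(csv_text: str, job_id: str, column: str = "batchPrediction_id") -> str:
    """Return a copy of a prediction-output CSV with a constant job-ID column appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:
        raise ValueError("CSV has no header row")
    if column in reader.fieldnames:
        # Refuse to overwrite an existing column silently.
        raise ValueError(f"column {column!r} already exists")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*reader.fieldnames, column], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Every row from this job gets the same ID, which later serves as the join key.
        row[column] = job_id
        writer.writerow(row)
    return out.getvalue()


# Hypothetical usage with a tiny in-memory output file.
sample = "row_id,prediction\n0,0.91\n1,0.12\n"
print(add_job_id_column(sample, "job-123"))
```

Once every output file carries the ID, it can be inner-joined with the metadata on that column.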

 
