I access DataRobot through Python from Databricks for training, scoring, and evaluating models.
I was wondering if there's a pre-built DataRobot flavor for MLflow, or if anyone has worked on making an MLflow model for DataRobot?
I'm not seeing anything after a solid day of research. I've been making my own with pyfunc, but this is a Hail Mary to see if anyone else has also worked on it.
I'm not aware of any pre-built MLflow "flavor" specifically for DataRobot, but you may be able to integrate DataRobot with MLflow through the pyfunc flavor.
The pyfunc flavor lets you package any Python code that makes predictions on new data into a format that can be deployed with MLflow. You can define a predict method that loads the DataRobot model and returns its predictions, and use it as the predict method of the pyfunc model.
Here's an example code snippet that shows how you can define a Pyfunc model that uses a Datarobot model for predictions:
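import datarobot as dr
import mlflow
import mlflow.pyfunc

# Define the pyfunc model
class DataRobotModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Load the DataRobot project and model (assumes API credentials are already configured)
        project = dr.Project.get('YOUR_PROJECT_ID')
        model = dr.Model.get('YOUR_PROJECT_ID', 'YOUR_MODEL_ID')
        # Make predictions on the input data (a pandas DataFrame)
        dataset = project.upload_dataset(model_input)
        predict_job = model.request_predictions(dataset.id)
        return predict_job.get_result_when_complete()

# Log the pyfunc model with MLflow
with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path='datarobot_model', python_model=DataRobotModel())
In this example, you would replace "YOUR_PROJECT_ID" and "YOUR_MODEL_ID" with the actual IDs of your DataRobot project and model. I haven't run this against a live project, so treat it as a sketch. The model itself stays in DataRobot; the wrapper only calls the DataRobot API at prediction time, so credentials must be available wherever the model is loaded.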
After logging the model with mlflow, you can use the standard mlflow deployment tools to deploy the model to different target environments.
I hope this helps! Let me know if you have any more questions.
Sure Jonathan, will reach out to you on your email.
Yes, and I'd be interested in whatever you have, at whatever stage it is (reasons at the end). I'll preface this by saying that I'm getting familiar with MLflow right now; I'm by no means an expert and don't yet know everything it can do. The biggest reason is that MLflow is a core part of the recommended machine learning workflow on Databricks. DataRobot is a (if not the) core part of our machine learning stack, so I'm left trying to reconcile these tools.
This current project uses DataRobot time-aware modeling. I'm working with a business team at the local site, and with analysts on our team. The MLflow features I'm currently looking at are definitely metrics logging, parameter logging, associating those with a project ID and model ID, and independent trial and error among analysts on the team (and reconciling/housing/evaluating them in a central place).
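To make that concrete, here is a minimal sketch of recording one DataRobot model as one MLflow run, with the project ID and model ID stored as tags. The helper name `log_datarobot_model`, the shape of the `summary` dict, and the `datarobot.*` tag names are all hypothetical choices for illustration. The `tracker` argument is only there so the helper can be tried with a stand-in object; by default it uses `mlflow` itself.

```python
def log_datarobot_model(summary, tracker=None):
    """Record one DataRobot model as one MLflow run.

    `summary` is a hypothetical plain dict with the keys
    "project_id", "model_id", "params" (dict) and "metrics" (dict of floats).
    """
    if tracker is None:
        # Imported lazily so the helper can be exercised without MLflow installed.
        import mlflow as tracker

    with tracker.start_run(run_name=summary["model_id"]):
        # Tags tie the run back to the DataRobot project and model.
        tracker.set_tags({
            "datarobot.project_id": summary["project_id"],
            "datarobot.model_id": summary["model_id"],
        })
        tracker.log_params(summary["params"])
        tracker.log_metrics(summary["metrics"])
```

Each analyst could call this from their own notebook against a shared MLflow experiment, so the runs end up in one place and can be filtered by the project ID tag.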
Also, without a template for DataRobot with MLflow, I'm having to think through and implement what-I-think-is-a-good-workflow (e.g. what parameters to log and where, what diagnostics to save, what metadata should be saved by default, how do I represent the leaderboard, what does making the model available for other users look like, what does 'production' look like, ...).
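For the leaderboard question, one option is to flatten it into a list of plain rows that can be saved as a JSON artifact. This is a rough sketch, not a recommended layout: the helper name `leaderboard_rows` and the row fields are hypothetical, and it assumes each model object exposes `id`, `model_type`, `sample_pct`, and a nested `metrics` dict keyed by metric name and then partition, as I understand the datarobot client's model objects do. The stand-in object at the bottom has made-up values purely to show the shape.

```python
import json
from types import SimpleNamespace


def leaderboard_rows(models, metric, partition="validation"):
    """Flatten model objects into one plain dict per model, in input order."""
    rows = []
    for m in models:
        # A metric or partition may be missing for a given model; record None then.
        score = (m.metrics.get(metric) or {}).get(partition)
        rows.append({
            "model_id": m.id,
            "model_type": m.model_type,
            "sample_pct": m.sample_pct,
            "metric": metric,
            "partition": partition,
            "score": score,
        })
    return rows


# Stand-in for a DataRobot model object (illustrative values only).
example_models = [
    SimpleNamespace(id="m1", model_type="Example Blueprint", sample_pct=64.0,
                    metrics={"RMSE": {"validation": 1.5, "holdout": None}}),
]
example_rows = leaderboard_rows(example_models, "RMSE")
print(json.dumps(example_rows, indent=2))
```

With a real project, the list from `project.get_models()` would take the place of `example_models`, and the rows could be stored inside a run with `mlflow.log_dict({"leaderboard": rows}, "leaderboard.json")`.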
It would be nice if there were a framework that gently reinforced best practices with DataRobot and made them easy to follow.
We have notebooks with papermill and MLflow for tracking experiments on use cases. Please let us know what exactly you are looking for in using MLflow. Is it metric and artifact tracking?