I access DataRobot through Python for training, scoring, and evaluating models from Databricks.
I was wondering if there's a pre-built DataRobot flavor for MLflow, or if anyone has worked on making an MLflow model for DataRobot?
I'm not seeing anything after a solid day of research. I've been making my own with pyfunc, but this is a Hail Mary to see if anyone else has also worked on it.
Yes, and I'd be interested in whatever you have, at whatever stage it is (reasons at the end). I'll preface this by saying that I'm still getting familiar with MLflow; I'm by no means an expert and don't yet know everything it can do. The biggest reason is that MLflow is a core part of the recommended machine learning workflow on Databricks. DataRobot is a (if not the) core part of our machine learning stack, so I'm left trying to reconcile these tools.
This current project uses DataRobot time-aware modeling. I'm working with a business team at the local site, and with analysts on our team. The MLflow features I'm currently looking at are metrics logging, parameter logging, associating both with a project id and model id, and independent trial and error among analysts on the team (with the results reconciled, housed, and evaluated in a central place).
Also, without a template for DataRobot with MLflow, I'm having to think through and implement what I think is a good workflow (e.g. what parameters to log and where, what diagnostics to save, what metadata should be saved by default, how do I represent the leaderboard, what does making the model available for other users look like, what does 'production' look like, ...)
It would be nice if there were a framework that gently reinforced best practices with DataRobot and made them easy to follow.
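To make the logging part concrete, here is a rough sketch of tying metrics and parameters to a project id and model id in an MLflow run. It is only one possible layout, not an established DataRobot flavor. The helper names (build_run_record, log_to_mlflow) and the tag keys are made up for illustration, and the metrics input is assumed to be a nested dict of metric name to partition scores, similar to what the DataRobot Python client exposes on a model. The MLflow calls used (start_run, set_tags, log_params, log_metrics) are the standard tracking functions.

```python
from typing import Any, Dict, Optional


def build_run_record(
    project_id: str,
    model_id: str,
    model_type: str,
    metrics: Dict[str, Dict[str, Any]],
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Flatten DataRobot model info into MLflow-friendly tags, params, and metrics.

    `metrics` is assumed to look like {"RMSE": {"validation": 12.5, "holdout": None}}.
    """
    # Hypothetical tag names; pick whatever convention your team agrees on.
    tags = {"datarobot.project_id": project_id, "datarobot.model_id": model_id}

    flat_params: Dict[str, Any] = {"model_type": model_type}
    flat_params.update(params or {})

    flat_metrics: Dict[str, float] = {}
    for metric_name, partitions in metrics.items():
        for partition, value in partitions.items():
            # Keep only numeric scores; unscored partitions (None) and
            # per-backtest lists are skipped.
            if isinstance(value, (int, float)):
                flat_metrics[f"{metric_name}_{partition}"] = float(value)

    return {"tags": tags, "params": flat_params, "metrics": flat_metrics}


def log_to_mlflow(record: Dict[str, Dict[str, Any]], run_name: Optional[str] = None) -> str:
    """Write one record to a new MLflow run and return the run id."""
    import mlflow  # imported here so the flattening step works without MLflow installed

    with mlflow.start_run(run_name=run_name) as run:
        mlflow.set_tags(record["tags"])
        mlflow.log_params(record["params"])
        mlflow.log_metrics(record["metrics"])
        return run.info.run_id
```

Each analyst could call build_run_record for the models they care about and then log_to_mlflow against a shared experiment, so that independent trials end up in one place and can be filtered by the project and model tags. Keeping the flattening separate from the logging also makes it easy to change what gets saved by default without touching the MLflow calls.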
We have notebooks with papermill and MLflow for tracking experiments on use cases. Please let us know what exactly you are looking for in using MLflow. Is it metric and artifact tracking?