Streamlining for speed when testing/debugging an API script
I'm developing a script that cycles through 7 x 5 x b model creations (where b is the number of blueprints), evaluates them against our metric, identifies which features have high impact, and then retrains the best ones on a larger sample size, etc. It runs far too slowly for the quick iterate-and-test cycle I need while building out its complexity.
I'm looking for ways to streamline this just for the testing/debugging phase. I've already whittled it down to only 100 rows of data, only 3-5 features, a 32 percent sample size, and the fastest-training model types I've found so far (XGBoost and Naive Bayes). Is there anything else I can do to speed things up? Keep in mind that at this point I don't care at all how well the models perform; I'm just getting the plumbing in place.
1) Use the AI Catalog to reduce the number of uploads and the risk of dataset mix-ups.
2) Reuse the same project as much as possible. For example, to check the performance of another feature list, you can create it in the same project and rerun Autopilot on the new feature list, with limits on the partitions.
3) Name your projects and feature lists consistently, adding versions to their names, so you can track changes.
4) DataRobot doesn't guarantee the same validation split across different datasets, so you should preferably provide a consistent one via a partition feature.
5) For tracking your performance in notebooks, you could try the papermill and MLflow Python libraries.
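Tips (2) and (3) can be combined in a small sketch with the DataRobot Python client. The project ID, feature names, and naming scheme below are placeholders, and the client calls reflect my reading of the public `datarobot` package, so treat them as assumptions rather than a definitive recipe:

```python
def versioned_name(base, version):
    """Build a consistent, versioned name for projects/feature lists (tip 3)."""
    return f"{base}-v{version}"

def rerun_autopilot_on_new_featurelist(project_id, version, features):
    """Reuse an existing project and rerun Autopilot on a new, versioned
    feature list (tips 2 and 3). IDs, names, and features are placeholders."""
    import datarobot as dr  # assumes the `datarobot` client is installed and configured

    project = dr.Project.get(project_id)
    flist = project.create_featurelist(versioned_name("debug-flist", version), features)
    project.set_worker_count(2)        # a small worker pool is plenty for plumbing tests
    project.start_autopilot(flist.id)  # reruns Autopilot on just this feature list
    return flist
```

Since the project (and its partitioning) already exists, this avoids a fresh upload and a fresh target-setting step on every debug iteration.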
Actually, I'm already doing (2) and (3). I'm not seeing upload time as a bottleneck, although it could become a factor; if it does, I'll consider (1). In regard to (4), I'm using an outside holdout to measure all project models against. Within the projects, I'm thinking that CV within each project ensures that different splits in different projects won't matter much, but I'm looking at how the partition feature may help. In regard to (5), I'll look at papermill and MLflow, but could you elaborate on the specific application of these to my question?
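For what it's worth, one way tip (5) could apply here (a sketch under assumptions, not the original poster's setup): papermill executes one parameterized notebook per experiment, and MLflow logs each run's parameters and holdout metric so results stay comparable across throwaway debug projects. The notebook path, parameter names, and experiment name below are all hypothetical:

```python
def run_and_track(notebook, sample_pct, featurelist_name, metric_value=None):
    """Sketch of tip 5: run a parameterized notebook with papermill and
    record the run in MLflow. Paths and parameter names are hypothetical."""
    import papermill as pm  # assumes papermill is installed
    import mlflow           # assumes mlflow is installed

    mlflow.set_experiment("datarobot-debug-runs")
    with mlflow.start_run():
        mlflow.log_param("sample_pct", sample_pct)
        mlflow.log_param("featurelist", featurelist_name)
        # Execute a template notebook, injecting this run's parameters
        pm.execute_notebook(
            notebook,  # e.g. a "train_template.ipynb" you maintain
            f"runs/{featurelist_name}.ipynb",
            parameters={"sample_pct": sample_pct, "featurelist": featurelist_name},
        )
        if metric_value is not None:
            mlflow.log_metric("holdout_metric", metric_value)
```

The point is that each debug iteration leaves behind an executed notebook plus a logged row in MLflow, so you can compare runs against your outside holdout without keeping notes by hand.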