Training machine learning models on relatively large datasets (e.g., hundreds of gigabytes) can be challenging. In this blog post, I present two methods for addressing that challenge, each of which compares favorably to training on the full dataset.
Using the DataRobot R client package and historical data from the U.S., France, and Spain, I created an Automated Time Series model that illustrates the effectiveness of lockdowns. Check it out and try it yourself!
You can find a new tutorial here on using DataRobot to replicate the COVID-19 prediction model at the county level.
This post includes a video walkthrough on how to create these models in the GUI, as well as a Python notebook for doing the same programmatically. The dataset is also attached, along with some advice on data sources for COVID-19.
Please feel free to post any questions you have in the Community. We'd also love to hear how you are modeling COVID-19.
Here, I explain what target leakage is, show you an example, provide a few suggestions to avoid potential leakage, and then explain the target leakage detection support built into the DataRobot platform.
Community v1.5 Phase 3 is here!
Phase 3 of the DataRobot Community UI/UX v1.5 revamp is now live! When you get a chance, have a look at the style and layout changes in the Read category and in Search results. You'll also find improvements to search that make it easier to find just the answers you need. Look for more information about Phase 3 and about what's coming next.