Forecasting COVID-19 Reported Cases

dalilaB
Data Scientist

Introduction

Does a lockdown help reduce the spread of the virus and flatten the curve? Once a lockdown is imposed, how long does it take before it affects the spread of the virus or reduces the number of new cases? Unfortunately, the last pandemic of comparable scale was the Spanish Flu in 1918, long before the internet, easy data collection, or computers. Because there was no historical data to learn from, using machine learning to answer these questions meant starting from scratch: we first had to collect enough time series data to make a forecast.

Collecting this data was a challenge, at least in the beginning. In the U.S., the number of cases stayed in the single digits for over 22 days. By March 15th, 2020, cases had only reached the triple digits, leaving just 24 days of history to forecast with: not enough to forecast 15 days ahead.

On March 15th, 2020, I decided to answer these questions by building a 15-day “total confirmed cases” forecast model for the U.S., France, and Spain. I chose the U.S. because at the time it didn’t have a cohesive lockdown approach, while the other two countries had just begun their lockdown efforts. I waited and waited. Then, on March 31st, I compared my forecast to reality. The challenge was that 24 days of data were not enough to build a reliable 15-day-ahead forecast directly. My solution was to build a model that forecasts only one day ahead and then apply it repeatedly to extend the forecast out to 15 days. (This model assumes that no policy changes are implemented.)
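Extending a one-day-ahead model to a longer horizon is commonly called recursive (or iterated) multi-step forecasting. Each prediction is appended to the history and used as input for the next day. The sketch below illustrates only that idea. Its one-step model is a deliberately simple, hypothetical stand-in, not the DataRobot model used in this post: tomorrow’s total is today’s total times the geometric-mean daily growth factor of the last few days. The function names and the case counts are made up for illustration.

```python
import math
from typing import Callable, List


def one_step_growth_model(history: List[float], window: int = 3) -> float:
    """Hypothetical one-step model: apply the mean daily log-growth rate of the
    last `window` days to today's total to predict tomorrow's total."""
    if len(history) < window + 1:
        raise ValueError("need at least window + 1 observations")
    recent = history[-(window + 1):]
    rates = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    return history[-1] * math.exp(sum(rates) / len(rates))


def recursive_forecast(history: List[float], horizon: int,
                       step: Callable[[List[float]], float]) -> List[float]:
    """Extend a one-day-ahead model to `horizon` days by feeding each
    prediction back in as if it were an observation."""
    extended = list(history)  # copy so the caller's data is not modified
    forecasts = []
    for _ in range(horizon):
        nxt = step(extended)
        forecasts.append(nxt)
        extended.append(nxt)
    return forecasts


# Made-up cumulative case counts (20% daily growth), NOT real data.
cases = [100.0, 120.0, 144.0, 172.8]
print([round(x, 1) for x in recursive_forecast(cases, 3, one_step_growth_model)])
# prints [207.4, 248.8, 298.6]
```

Errors compound as predictions are fed back in, so forecasts built this way tend to become less reliable the further out they go.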

Does Lockdown Flatten the Curve?

On March 15th, I decided to forecast the total number of confirmed cases in the U.S., France, and Spain by March 30th. France and Spain had just started national lockdowns, but the U.S. didn’t have consistent or comprehensive social distancing and lockdown measures. On March 31st, I compared my U.S. forecast with the numbers on the ground. I was astonished at how close the forecast (170,002) was to the observed total number of confirmed cases (167,007): a difference of about 1.8%, comfortably within a 5% margin of error.
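For reference, here is the relative error of the U.S. forecast, using the observed total as the denominator. It is well within a 5% margin:

```latex
\frac{\lvert 170{,}002 - 167{,}007 \rvert}{167{,}007}
  = \frac{2{,}995}{167{,}007} \approx 0.0179 \approx 1.8\%
```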

[Figure: U.S. forecast vs. observed total confirmed cases]

This was impressive until you realize that many U.S. states and counties had not yet designed a strategy to deal with the spread of the virus (see table below). Most schools closed on March 16, 2020, but stay-at-home orders in most states didn’t take effect until March 30th. This suggests the model performed so well largely because no policies were yet in place to slow the virus.

Table: Formal Policies in Place

State      Stay at Home   School Closure   Non-Essential Closure
Arizona    3/30/20        3/16/20          No
Alabama    No             3/19/20          3/28/20
DC         3/30/20        3/16/20          3/25/20
Maryland   3/30/20        3/16/20          3/23/20
Virginia   3/30/20        3/16/20          No
Oklahoma   No             3/17/20          No

The models for the U.S., France, and Spain were built over the same time period. Unlike the U.S. model, the France and Spain models diverged significantly from what actually happened (see figures below). One explanation is that, unlike the U.S., these countries imposed lockdowns: Spain on March 15th, 2020, followed by France on March 16th, 2020. Both dates fell at or after the end of the training period, so the lockdowns’ effects were not reflected in the training data. In both countries, by March 20th the forecasted trend was much higher than the observed one, which supports the view that the lockdowns were effective. (Side note: the differences between the curves below reflect each country’s lockdown level and enforcement.)

[Figures: forecast vs. observed total confirmed cases for Spain and France]
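One simple way to pin down when a country’s observed curve starts to fall away from its forecast is to find the first day on which the observed total is more than some threshold below the forecast. Here, that separation was visible by March 20th. The sketch below does this scan. The 10% threshold, the function name, and the numbers in the example are hypothetical and are not taken from this post’s data.

```python
from datetime import date, timedelta
from typing import Optional, Sequence


def first_divergence(start: date, forecast: Sequence[float], observed: Sequence[float],
                     threshold: float = 0.10) -> Optional[date]:
    """Return the first date on which the observed total falls more than
    `threshold` (a fraction of the forecast) below the forecast, or None.
    Both sequences are daily values beginning on `start`."""
    for day, (f, o) in enumerate(zip(forecast, observed)):
        if f > 0 and (f - o) / f > threshold:
            return start + timedelta(days=day)
    return None


# Made-up daily totals for illustration only.
forecast = [100, 130, 170, 220, 290]
observed = [100, 125, 150, 180, 210]
print(first_divergence(date(2020, 3, 16), forecast, observed))  # prints 2020-03-18
```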

On April 1st, I decided to perform another 15-day forecast, this time for the U.S. only. After training, I used the model DataRobot recommended, Non-Seasonal Auto-ARIMA, to forecast the next 15 days. By then, many state governors had imposed lockdowns and social distancing, so I gave the forecast a twist: what would the total number of confirmed cases be on April 15th if the U.S. followed the Spanish or French model? I knew the lockdowns had slowed the growth of confirmed cases in Spain and France. Would the U.S. total end up closer to one of these countries than the other, or would the old U.S. model still hold? I plotted my forecasts and waited for the actual totals to come in.

[Figure: U.S. total confirmed cases forecast under the U.S., French, and Spanish models]
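This post does not describe how DataRobot’s Non-Seasonal Auto-ARIMA works internally. As a rough, hypothetical stand-in, the sketch below shows the core idea behind automatic ARIMA order selection: fit several candidate non-seasonal models and keep the one with the lowest AIC.

To stay dependency-free, it only searches ARIMA(p, 1, 0) models. Each candidate is an AR(p) with an intercept, fitted on the first differences by conditional least squares. Tools such as pmdarima’s `auto_arima` search the full (p, d, q) grid instead. All function names here are made up for illustration, and the series is synthetic. It could stand in for, say, log cumulative cases.

```python
import math
import random
from typing import List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small system a @ x = b by Gaussian elimination with partial
    pivoting (assumes the system is non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar_on_diffs(diff: List[float], p: int, start: int) -> Tuple[List[float], float, int]:
    """Least-squares fit of diff[t] = c + phi_1*diff[t-1] + ... + phi_p*diff[t-p]
    for t >= start. Returns ([c, phi_1, ..., phi_p], residual sum of squares, n)."""
    rows = [[1.0] + [diff[t - k] for k in range(1, p + 1)] for t in range(start, len(diff))]
    ys = diff[start:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    rss = sum((y - sum(bi * xi for bi, xi in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    return beta, rss, len(ys)


def select_arima_p10(y: List[float], max_p: int = 3) -> Tuple[int, List[float]]:
    """Choose p for ARIMA(p, 1, 0) by AIC. Every candidate is fitted on the
    same sample (t >= max_p) so that their AIC values are comparable."""
    diff = [b - a for a, b in zip(y, y[1:])]
    best = None
    for p in range(max_p + 1):
        beta, rss, n = fit_ar_on_diffs(diff, p, start=max_p)
        # Gaussian AIC up to a constant; max() guards against log(0) on a perfect fit.
        aic = n * math.log(max(rss, 1e-12) / n) + 2 * (p + 1)
        if best is None or aic < best[0]:
            best = (aic, p, beta)
    return best[1], best[2]


def forecast_arima_p10(y: List[float], p: int, beta: List[float], horizon: int) -> List[float]:
    """Forecast the differences recursively, then add them back onto the last level."""
    diff = [b - a for a, b in zip(y, y[1:])]
    level = y[-1]
    out = []
    for _ in range(horizon):
        next_diff = beta[0] + sum(beta[k] * diff[-k] for k in range(1, p + 1))
        diff.append(next_diff)
        level += next_diff
        out.append(level)
    return out


# Synthetic random-walk-with-drift series, for illustration only.
rng = random.Random(0)
series = [0.0]
for _ in range(40):
    series.append(series[-1] + 0.2 + rng.gauss(0, 0.05))
p, beta = select_arima_p10(series)
print(p, [round(v, 2) for v in forecast_arima_p10(series, p, beta, 3)])
```

Fitting every candidate on the same sample matters. AIC values computed on different numbers of observations are not directly comparable.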

On April 16th, 2020, the U.S. recorded 648,148 confirmed cases. The French model had forecast 540,701 and the Spanish model 1.4 million, while the U.S. model built before the lockdowns had forecast 6 million. Fortunately, the actual U.S. count was far below the forecast of that pre-lockdown model, which was built when lockdown policies were not yet widely in place. The recorded U.S. number was approximately 20 percent higher than the French model’s forecast. This suggests the U.S. significantly reduced the number of new cases by imposing a general lockdown in most states.
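The comparison above is simple percentage arithmetic against each model’s forecast. The short sketch below reproduces it from the numbers quoted in this post. The Spanish and pre-lockdown U.S. forecasts are only given rounded (1.4 million and 6 million), so those two percentages are approximate. The helper name `pct_diff` is just for illustration.

```python
def pct_diff(observed: float, forecast: float) -> float:
    """Percentage by which the observed value is above (+) or below (-) the forecast."""
    return (observed - forecast) / forecast * 100


observed = 648_148  # U.S. confirmed cases recorded on April 16th, 2020
forecasts = {
    "French model": 540_701,
    "Spanish model": 1_400_000,            # reported as "1.4 million"
    "pre-lockdown U.S. model": 6_000_000,  # reported as "6 million"
}
for name, forecast in forecasts.items():
    print(f"{name}: {pct_diff(observed, forecast):+.1f}%")
# French model: +19.9%
# Spanish model: -53.7%
# pre-lockdown U.S. model: -89.2%
```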

The models suggest that a lockdown works and, within days, effectively slows the spread of a virus. I shared my sample code and training data in this DataRobot Community GitHub repo. Try it out yourself and let me know what you get. Looking forward to hearing your results!
