As a contribution to the fight against COVID-19, the disease caused by SARS-CoV-2, this research first focuses on building a machine-learning model that predicts, faster than traditional testing, which suspected cases are confirmed COVID-19 cases. The prediction is based on laboratory exams commonly collected during a visit to a hospital's emergency room. The original dataset is from Kaggle #1.
Addressing this issue is important, particularly in an overwhelmed health system with possible limitations on testing for SARS-CoV-2. Traditional tests could be delayed even if only a target subpopulation were tested. Avoiding these delays is also important to reduce the risk of further spread of the disease.
The DataRobot platform is used to build the model, calculate prediction probabilities, Feature Impact, and Feature Effects, analyze Prediction Explanations, and so on. The model results are then displayed, together with other important information, on fully interactive, easy-to-digest dashboards in Tableau. The dashboards (and the model) can be updated as often as new data and key information become available.
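To illustrate the general idea behind feature-importance scores such as Feature Impact, the sketch below implements permutation importance, a widely used model-agnostic technique: shuffle one feature's values across patients and measure how much the model's accuracy drops. This is not DataRobot's API or a claim about how DataRobot computes Feature Impact internally; the column names `leukocytes` and `platelets`, the toy model, and the synthetic rows are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Model = Callable[[Row], float]  # maps one patient's features to P(positive)


def accuracy(model: Model, rows: Sequence[Row], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of rows whose thresholded probability matches the true label."""
    hits = sum(int(model(r) >= threshold) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model: Model, rows: Sequence[Row], labels: Sequence[int],
                           features: Sequence[str], n_repeats: int = 5,
                           seed: int = 0) -> Dict[str, float]:
    """Mean drop in accuracy when a single feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importance: Dict[str, float] = {}
    for feat in features:
        drops: List[float] = []
        for _ in range(n_repeats):
            column = [r[feat] for r in rows]
            rng.shuffle(column)  # break the link between this feature and the label
            shuffled = [{**r, feat: v} for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importance[feat] = sum(drops) / n_repeats
    return importance


# Hypothetical toy model and synthetic data (not the Kaggle dataset):
# the model only looks at "leukocytes", so "platelets" should score zero.
def toy_model(row: Row) -> float:
    return 0.9 if row["leukocytes"] < 0 else 0.1


rows = [{"leukocytes": -1.0 if i < 5 else 1.0, "platelets": float(i)} for i in range(10)]
labels = [1] * 5 + [0] * 5

print(permutation_importance(toy_model, rows, labels, ["leukocytes", "platelets"]))
```

A feature the model ignores gets a score of zero, while a feature the model relies on loses accuracy when shuffled; the larger the drop, the more important the feature.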
The final objective is to use the dashboards as an additional DIAGNOSTIC TOOL that could help healthcare professionals decide, for example, which patients to prioritize because they are more likely to be infected. Because the model itself reflects the particularities and nuances of distinct hospitals, cities, states, or countries, the proposed DIAGNOSTIC TOOL would be flexible and customizable.
To improve the existing model, a search has been carried out for additional datasets that include laboratory test results, physician comments, and similar information. Unfortunately, so far the search has been unsuccessful.
To fill this gap, and as an example, information derived from CT-scan images of anonymous patients already diagnosed with COVID-19 (Kaggle #2) has been manually blended with the original dataset. A snapshot of the resulting data table is shown above.
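The blending described above was done manually. Expressed in code, the same idea is a left join: every laboratory-exam row is kept, and CT-derived columns are attached where a matching record exists. The sketch below assumes both tables share a hypothetical `patient_id` key, which the two Kaggle datasets may not actually have; the rows and the `ct_finding` column are synthetic illustrations.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def left_join(lab_rows: Sequence[Record], ct_rows: Sequence[Record],
              key: str = "patient_id") -> List[Record]:
    """Attach CT-derived columns to every lab-exam row; keep rows with no CT record."""
    ct_by_key = {row[key]: row for row in ct_rows}
    ct_columns = sorted({col for row in ct_rows for col in row if col != key})
    blended: List[Record] = []
    for row in lab_rows:
        extra = ct_by_key.get(row[key], {})
        merged = dict(row)
        for col in ct_columns:
            merged[col] = extra.get(col)  # None when the patient has no CT information
        blended.append(merged)
    return blended


# Hypothetical, synthetic example rows (not taken from Kaggle #1 or #2).
lab = [{"patient_id": "p1", "leukocytes": -0.8},
       {"patient_id": "p2", "leukocytes": 0.4}]
ct = [{"patient_id": "p1", "ct_finding": "ground-glass opacity"}]

for row in left_join(lab, ct):
    print(row)
```

Filling missing CT columns with `None` keeps every row the same shape, so patients without CT information remain in the table instead of being silently dropped.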
The constructed data table was used to rebuild the original model. The basic idea was to analyze how the model's performance changes when additional key knowledge is taken into account. The results are very promising, and the derived insights could be highly actionable.
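One common way to quantify such a change in performance is to compare a ranking metric like ROC AUC for the model trained without and with the added CT columns. DataRobot reports metrics of this kind itself; the sketch below only shows what AUC measures, computed through the Mann-Whitney formulation (the probability that a randomly chosen positive case is scored above a randomly chosen negative one). The labels and both sets of scores are made-up illustrations, not results from this project.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative, made-up scores for the same four patients.
labels = [1, 1, 0, 0]
scores_lab_only = [0.6, 0.4, 0.5, 0.3]   # hypothetical model without CT columns
scores_with_ct = [0.9, 0.7, 0.2, 0.1]    # hypothetical model with CT columns

print(roc_auc(labels, scores_lab_only), roc_auc(labels, scores_with_ct))
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; a rank-based implementation scales better on large tables.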
The DASHBOARDS here depict the results obtained so far. They include an UNCERTAINTY area (zone), to be defined by the domain expert, that identifies the fraction of cases diagnosed as negative that could require more tests and deeper analysis. Chest CT-scan images of some patients with confirmed coronavirus disease (and physician comments) have also been included in the dashboards, so that these critical data points are readily available alongside the prediction probabilities and the features (variables) that the model found important for predicting confirmed COVID-19 cases among suspected cases.
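The UNCERTAINTY zone and the idea of prioritizing patients can be sketched as a simple triage rule on the predicted probability. In this sketch, cases at or above a decision threshold are labeled positive, predicted-negative cases whose probability falls in a band just below that threshold are flagged as uncertain, and patients are listed from most to least likely infected. The threshold values (0.5 and 0.3), the labels, and the patient IDs are hypothetical placeholders; in practice the band would be set by the domain expert.

```python
from typing import List, Tuple


def triage(prob: float, threshold: float = 0.5, uncertainty_floor: float = 0.3) -> str:
    """Label one prediction; the uncertainty band sits just below the decision threshold."""
    if not 0.0 <= uncertainty_floor <= threshold <= 1.0:
        raise ValueError("expected 0 <= uncertainty_floor <= threshold <= 1")
    if prob >= threshold:
        return "positive"
    if prob >= uncertainty_floor:
        return "uncertain"  # predicted negative, but flagged for more tests
    return "negative"


def prioritize(patients: List[Tuple[str, float]], threshold: float = 0.5,
               uncertainty_floor: float = 0.3) -> List[Tuple[str, float, str]]:
    """Sort patients by predicted probability (highest first) and attach a triage label."""
    ranked = sorted(patients, key=lambda p: p[1], reverse=True)
    return [(pid, prob, triage(prob, threshold, uncertainty_floor)) for pid, prob in ranked]


# Hypothetical patient IDs and probabilities, for illustration only.
for pid, prob, label in prioritize([("p1", 0.82), ("p2", 0.12), ("p3", 0.41)]):
    print(f"{pid}: {prob:.2f} -> {label}")
```

Keeping the band as two explicit parameters makes it easy for each hospital to widen or narrow the uncertainty zone without retraining the model.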
To continue improving the proposed DIAGNOSTIC TOOL, more anonymized real-world datasets (similar to the data table above) would be required to train and fine-tune models in DataRobot.
Data, suggestions, and comments would be greatly appreciated.