There is simply no issue more important, more pressing, or more inescapable than the ongoing COVID-19 outbreak caused by the novel coronavirus. As health care workers and public officials around the world grapple with a situation without precedent in contemporary history, it is only natural that we as data professionals view it through a familiar lens: analytics. Wading into a discussion of this magnitude carries its own considerations, but there are still things we can learn.

No discussion or practice of analytics, in any context, can proceed without first evaluating the existing data. For COVID-19, the number and scope of available datasets, both domestic and international, is overwhelming. These widely available datasets do a good job of capturing the number of reported cases, along with the state or regional location and high-level status of each. However, far more could be achieved if existing datasets were more granular or more clearly showed the connections between cases.

My colleagues and I have focused on Singapore, whose Ministry of Health has been tracking the links between cases and identifying known clusters, a practice known as contact tracing. While Singapore is far from the only country practicing contact tracing or widespread testing, its publication of anonymized cases by cluster and group provided a uniquely robust dataset showing the connections between known cases.
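Linked-case data of this kind lends itself naturally to network analysis: each case becomes a node, each traced link becomes an edge, and a cluster is a connected group of cases. The sketch below shows one common way to recover such clusters, a union-find (disjoint-set) structure over pairwise links. It is a minimal illustration under stated assumptions: the `find_clusters` function and the toy link list are hypothetical, and they do not reflect the format of the Ministry of Health's published data or the specific analysis described here.

```python
from collections import defaultdict


def find_clusters(links, cases=()):
    """Group cases into clusters with union-find (disjoint-set union).

    `links` is an iterable of (case_a, case_b) pairs, each meaning that contact
    tracing connected the two cases (a hypothetical input format). `cases`
    optionally lists cases with no known link, so they appear as singletons.
    Returns a list of clusters, each a sorted list of case IDs, largest first.
    """
    parent = {}

    def find(x):
        # Register unseen cases as their own root.
        parent.setdefault(x, x)
        # Path halving: re-point nodes toward the root while walking up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)
    for case in cases:
        find(case)

    # Collect every case under its root; each root identifies one cluster.
    groups = defaultdict(list)
    for case in list(parent):
        groups[find(case)].append(case)

    clusters = [sorted(members) for members in groups.values()]
    # Largest clusters first; ties broken by smallest case ID.
    clusters.sort(key=lambda c: (-len(c), c[0]))
    return clusters


# Hypothetical toy data, not real case records.
toy_links = [(1, 2), (2, 3), (4, 5)]
print(find_clusters(toy_links, cases=[6]))  # → [[1, 2, 3], [4, 5], [6]]
```

Union-find keeps the grouping step close to linear in the number of links, which matters little for a toy list but helps as case counts grow. A full graph library would add measures such as centrality, but connected components are the natural first step for identifying clusters.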

While analyzing the spread of and connections between cases is a common facet of epidemiology, additional insights can be gained from datasets that are currently underused. Mobile phone location data, combined with analysis of known COVID-19 cases, could support a highly proactive and accurate system that flags points of spread, potentially improving the accuracy of spread models and giving stakeholders tangible results for decision making. But such detail is not without its pitfalls: the Israeli government’s recent epidemiological study and monitoring have been met with harsh criticism from privacy advocates. Domestic modeling efforts could be similarly enhanced if public datasets were improved to include social factors such as income, employment, and ethnicity. Recent CDC reports have begun to release additional information but still fall short of the comprehensive detail this type of analysis requires.

The disease and the response to it further muddy the already murky waters of privacy and data governance in a world where technology often outpaces the layperson’s understanding of it. Advanced analytics techniques such as network analysis and disease modeling could be greatly enhanced with more accurate location data and more descriptive variables. Such data can and should become more available and more accurate as officials recognize the need for more advanced tracking and as the initial surge of cases comes under control. At the same time, data experts are responsible for recognizing that the unique circumstances of this outbreak demand additional care and consideration.