1
Stolerman LM, Clemente L, Poirier C, Parag KV, Majumder A, Masyn S, Resch B, Santillana M. Using digital traces to build prospective and real-time county-level early warning systems to anticipate COVID-19 outbreaks in the United States. Sci Adv 2023; 9:eabq0199. [PMID: 36652520 PMCID: PMC9848273 DOI: 10.1126/sciadv.abq0199] [Received: 03/12/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
Coronavirus disease 2019 (COVID-19) continues to affect the world, and designing strategies to curb disease outbreaks requires close monitoring of their trajectories. We present machine learning methods that leverage internet-based digital traces to anticipate sharp increases in COVID-19 activity in U.S. counties. Complementing the efforts led by the Centers for Disease Control and Prevention (CDC), our models are designed to detect the time when an uptrend in COVID-19 activity will occur. Motivated by the need for epidemiological insights at finer spatial resolution, we build upon previous efforts conceived at the state level. Our methods, tested out of sample as events were unfolding in 97 counties representing multiple population sizes across the United States, frequently anticipated increases in COVID-19 activity 1 to 6 weeks before local outbreaks, defined as the time when the effective reproduction number Rt becomes larger than 1 for a period of 2 weeks.
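The outbreak definition above (Rt larger than 1 for a period of 2 weeks) can be made concrete with a short sketch. It assumes a daily Rt series and reads "2 weeks" as 14 consecutive days; the function name `outbreak_onsets` is hypothetical and this is not the authors' code.

```python
from typing import List, Sequence


def outbreak_onsets(rt: Sequence[float], window: int = 14) -> List[int]:
    """Return the start index of every run of `window` consecutive days with Rt > 1.

    Assumption (not from the source code): `rt` is a daily series, and an
    outbreak is dated to the first day of a qualifying run.
    """
    onsets: List[int] = []
    run = 0  # length of the current run of days with Rt > 1
    for i, value in enumerate(rt):
        run = run + 1 if value > 1 else 0
        # A run hits the required length exactly once, so each run is flagged once.
        if run == window:
            onsets.append(i - window + 1)
    return onsets


if __name__ == "__main__":
    # Toy series: 16 days above 1 starting at day 5, then a 10-day run that is too short.
    series = [0.9] * 5 + [1.2] * 16 + [0.8] * 3 + [1.1] * 10
    print(outbreak_onsets(series))  # [5]
```

Flagging the start of the run, rather than the day the 14-day condition is first met, matches the reading that the outbreak begins when Rt first exceeds 1; an early warning "lead time" would then be measured against that start date.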
Affiliation(s)
- Lucas M. Stolerman
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Department of Mathematics, Oklahoma State University, Stillwater, OK, USA
- Leonardo Clemente
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
- Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
- Canelle Poirier
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Kris V. Parag
- NIHR Health Protection Research Unit, Behavioural Science and Evaluation, University of Bristol, Bristol, UK
- Serge Masyn
- Global Public Health, Janssen R&D, Beerse, Belgium
- Bernd Resch
- Department of Geoinformatics - Z-GIS, University of Salzburg, Salzburg, Austria
- Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Mauricio Santillana
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
- Harvard University, T.H. Chan School of Public Health, Boston, MA, USA
2
Miller AR, Charepoo S, Yan E, Frost RW, Sturgeon ZJ, Gibbon G, Balius PN, Thomas CS, Schmitt MA, Sass DA, Walters JB, Flood TL, Schmitt TA. Reliability of COVID-19 data: An evaluation and reflection. PLoS One 2022; 17:e0251470. [PMID: 36327273 PMCID: PMC9632841 DOI: 10.1371/journal.pone.0251470] [Received: 06/16/2021] [Accepted: 12/10/2021] [Indexed: 11/06/2022]
Abstract
IMPORTANCE The rapid proliferation of COVID-19 has left governments scrambling, and several data aggregators are now assisting in the reporting of county cases and deaths. The different variables affecting reporting (e.g., time delays in reporting) necessitate a well-documented reliability study that examines the data methods and discusses possible causes of differences between aggregators. OBJECTIVE To statistically evaluate the reliability of COVID-19 data across aggregators using case fatality rate (CFR) estimates and reliability statistics. DESIGN, SETTING, AND PARTICIPANTS Cases and deaths were collected daily by volunteers from state and local health departments as primary sources and from newspaper reports as secondary sources. To enable comparison in the reliability analysis, BroadStreet collected data from other COVID-19 aggregators, including USAFacts, Johns Hopkins University, the New York Times, and The COVID Tracking Project. MAIN OUTCOMES AND MEASURES COVID-19 case and death counts at the county and state levels. RESULTS Lower inter-rater agreement across aggregators was observed for death counts, which manifested in state-level Bayesian estimates of COVID-19 fatality rates. CONCLUSIONS AND RELEVANCE A national, publicly available data set is needed for current and future disease outbreaks and for improved reliability in reporting.
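The study compares aggregators through CFR estimates, and the results mention state-level Bayesian estimates. As a minimal illustration, assuming a beta-binomial model with a uniform Beta(1, 1) prior (a common choice; the abstract does not state the authors' model or prior), the sketch below shows the crude CFR and a conjugate posterior. The function names and the counts in the usage note are hypothetical toy values.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) posterior over the case fatality rate."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def naive_cfr(deaths: int, cases: int) -> float:
    """Crude CFR: reported deaths divided by reported cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


def bayesian_cfr(
    deaths: int, cases: int, prior_alpha: float = 1.0, prior_beta: float = 1.0
) -> BetaPosterior:
    """Conjugate update for deaths ~ Binomial(cases, CFR), CFR ~ Beta(a, b).

    Assumption: the Beta(1, 1) default prior is illustrative only.
    """
    if deaths < 0 or deaths > cases:
        raise ValueError("need 0 <= deaths <= cases")
    return BetaPosterior(prior_alpha + deaths, prior_beta + cases - deaths)
```

For example, if two aggregators both report 10,000 cases for a state but 180 versus 210 deaths, `naive_cfr` gives 0.018 versus 0.021, and comparing the two `bayesian_cfr` posteriors shows how disagreement in death counts propagates into the fatality-rate estimates.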
Affiliation(s)
- April R. Miller
- Department of Public Health, Simmons University, Boston, Massachusetts, United States of America
- Samin Charepoo
- Department of Data Science and Neuroscience, Simmons University, Boston, Massachusetts, United States of America
- Erik Yan
- Duke Global Health Institute, Duke University, Durham, North Carolina, United States of America
- Ryan W. Frost
- Department of Mathematics & Statistics, Boston University, Boston, Massachusetts, United States of America
- Zachary J. Sturgeon
- Department of Physical Sciences, University of California San Diego, La Jolla, California, United States of America
- Grace Gibbon
- Global School of Public Health, New York University, New York City, New York, United States of America
- Patrick N. Balius
- Division of Environmental Health Sciences, University of Minnesota, Minneapolis, Minnesota, United States of America
- Cedonia S. Thomas
- Department of Biology, Tougaloo College, Tougaloo, Mississippi, United States of America
- Melanie A. Schmitt
- Pediatric Ophthalmology and Adult Strabismus, Department of Ophthalmology and Visual Sciences, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Daniel A. Sass
- Department of Management Science and Statistics, The University of Texas at San Antonio, San Antonio, Texas, United States of America
- James B. Walters
- BroadStreet Health, Milwaukee, Wisconsin, United States of America
- Tracy L. Flood
- BroadStreet Health, Milwaukee, Wisconsin, United States of America
3
Mukka M, Pesälä S, Juutinen A, Virtanen MJ, Mustonen P, Kaila M, Helve O. Online searches of children’s oseltamivir in public primary and specialized care: Detecting influenza outbreaks in Finland using dedicated databases for health care professionals. PLoS One 2022; 17:e0272040. [PMID: 35930527 PMCID: PMC9355218 DOI: 10.1371/journal.pone.0272040] [Received: 04/12/2021] [Accepted: 07/12/2022] [Indexed: 11/18/2022]
Abstract
Introduction
Health care professionals working in primary and specialized care typically search for medical information from Internet sources. In Finland, Physician’s Databases are online portals aimed at professionals seeking medical information. As dosage errors may occur when prescribing medication to children, professionals’ need for reliable medical information has increased in public health care centers and hospitals. Influenza continues to be a public health threat, with young children at risk of developing severe illness and easily transmitting the virus. Oseltamivir is used to treat children with influenza. The objective of this study was to compare searches for children’s oseltamivir and influenza diagnoses in primary and specialized care, and to determine if the searches could aid detection of influenza outbreaks.
Methods
We compared searches in Physician’s Databases for children’s oral suspension of oseltamivir (6 mg/mL) with influenza diagnoses of children under 7 years and with laboratory findings of influenza A and B from the National Infectious Disease Register. Searches and diagnoses were assessed in primary and specialized care across Finland by season from 2012–2016. The Moving Epidemic Method (MEM) was used to calculate seasonal starts and ends, and paired mean differences were used to compare the two indicators. Correlation was tested to compare seasons.
Results
We found that searches and diagnoses in primary and specialized care showed visually similar annual patterns. The MEM-calculated starting weeks of searches mostly fell in the same week as those of diagnoses. Oseltamivir searches in primary care preceded diagnoses by −1.0 weeks (95% CI: −3.0, −0.3; p = 0.132) with very high correlation (τ = 0.913). In specialized care, oseltamivir searches and diagnoses correlated moderately (τ = 0.667).
Conclusion
Health care professionals’ searches for children’s oseltamivir in online databases were linked with the registers of children’s influenza diagnoses in primary and specialized care. Therefore, database searches should be considered as supplementary information in disease surveillance when detecting influenza epidemics.
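The results for this study report correlations as τ, which conventionally denotes Kendall's rank correlation. Assuming that is the statistic used, a minimal tau-a computation is sketched below. It applies no tie correction, so on tied data it can differ from the tau-b that statistical packages usually report; the function name is illustrative.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs.

    Pairs tied in x or y count as neither concordant nor discordant, and
    no tie correction is applied (that would be tau-b).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Same sign of change in both series -> concordant; opposite -> discordant.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

Applied to, say, weekly search counts and weekly diagnosis counts over one season, a value near 1 means the two indicators rise and fall in the same rank order, which is what a high τ between searches and diagnoses expresses.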
Affiliation(s)
- Milla Mukka
- University of Helsinki, Helsinki, Finland
- Samuli Pesälä
- University of Helsinki, Helsinki, Finland
- Epidemiological Operations Unit, City of Helsinki, Helsinki, Finland
- Aapo Juutinen
- Department of Health Security, Finnish Institute for Health and Welfare, Helsinki, Finland
- Mikko J. Virtanen
- Department of Health Security, Finnish Institute for Health and Welfare, Helsinki, Finland
- Minna Kaila
- Clinicum, University of Helsinki, Helsinki, Finland
- Otto Helve
- Department of Health Security, Finnish Institute for Health and Welfare, Helsinki, Finland
- Children’s Hospital, Pediatric Research Center, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
4
Stockham N, Washington P, Chrisman B, Paskov K, Jung JY, Wall DP. Causal Modeling to Mitigate Selection Bias and Unmeasured Confounding in Internet-Based Epidemiology of COVID-19: Model Development and Validation. JMIR Public Health Surveill 2022; 8:e31306. [PMID: 35605128 PMCID: PMC9307267 DOI: 10.2196/31306] [Received: 06/16/2021] [Revised: 02/22/2022] [Accepted: 05/17/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Selection bias and unmeasured confounding are fundamental problems in epidemiology that threaten the internal and external validity of studies. These phenomena are particularly dangerous in internet-based public health surveillance, where traditional mitigation and adjustment methods are inapplicable, unavailable, or out of date. Recent theoretical advances in causal modeling can mitigate these threats, but these innovations have not been widely deployed in the epidemiological community. OBJECTIVE The purpose of our paper is to demonstrate the practical utility of causal modeling both to detect unmeasured confounding and selection bias and to guide model selection to minimize bias. We implemented this approach in an applied epidemiological study of the COVID-19 cumulative infection rate in the New York City (NYC) spring 2020 epidemic. METHODS We collected primary data from Qualtrics surveys of Amazon Mechanical Turk (MTurk) crowd workers residing in New Jersey and New York State across 2 sampling periods: April 11-14 and May 8-11, 2020. The surveys asked the subjects about household health status and demographic characteristics. We constructed a set of possible causal models of household infection and survey selection mechanisms and ranked them by compatibility with the collected survey data. The most compatible causal model was then used to estimate the cumulative infection rate in each survey period. RESULTS There were 527 and 513 responses collected for the 2 periods, respectively. Response demographics were highly skewed toward a younger age in both survey periods. Despite the extremely strong relationship between age and COVID-19 symptoms, we recovered minimally biased estimates of the cumulative infection rate using only primary data and the most compatible causal model, with a relative bias of +3.8% and -1.9% from the reported cumulative infection rate for the first and second survey periods, respectively.
CONCLUSIONS We successfully recovered accurate estimates of the cumulative infection rate from an internet-based crowdsourced sample despite considerable selection bias and unmeasured confounding in the primary data. This implementation demonstrates how simple applications of structural causal modeling can be effectively used to determine falsifiable model conditions, detect selection bias and confounding factors, and minimize estimate bias through model selection in a novel epidemiological context. As the disease and social dynamics of COVID-19 continue to evolve, public health surveillance protocols must continue to adapt, with the emergence of Omicron variants and the shift to at-home testing as recent challenges. Rigorous and transparent methods to develop, deploy, and diagnose adapted surveillance protocols will be critical to their success.
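To see why a sample skewed toward younger respondents biases a naive infection-rate estimate when age is strongly related to the outcome, and how the reported relative bias is computed, consider the toy calculation below. It is deterministic (expected values, no sampling), every number is a made-up illustration, and it is not the structural causal modeling approach the authors describe; it only illustrates the selection mechanism and the relative-bias metric, (estimate - reference) / reference.

```python
from typing import Dict


def population_rate(share: Dict[str, float], rate: Dict[str, float]) -> float:
    """Target quantity: stratum rates weighted by population shares."""
    return sum(share[g] * rate[g] for g in share)


def respondent_rate(
    share: Dict[str, float], rate: Dict[str, float], response: Dict[str, float]
) -> float:
    """Expected naive estimate when stratum g responds with probability response[g]."""
    weights = {g: share[g] * response[g] for g in share}
    return sum(weights[g] * rate[g] for g in share) / sum(weights.values())


def relative_bias(estimate: float, reference: float) -> float:
    """Relative bias as a fraction, e.g. 0.038 corresponds to +3.8%."""
    return (estimate - reference) / reference


if __name__ == "__main__":
    # Hypothetical strata: younger people respond 4x as often but have a lower rate.
    share = {"younger": 0.5, "older": 0.5}
    rate = {"younger": 0.1, "older": 0.3}
    response = {"younger": 0.08, "older": 0.02}
    truth = population_rate(share, rate)              # about 0.20
    naive = respondent_rate(share, rate, response)    # about 0.14
    print(round(relative_bias(naive, truth), 3))      # -0.3
```

In this toy setting the naive estimate is about 30% too low, which is the kind of distortion the authors report reducing to +3.8% and -1.9% with their most compatible causal model.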
Affiliation(s)
- Nathaniel Stockham
- Neurosciences Interdepartmental Program, Stanford University, Palo Alto, CA, United States
- Peter Washington
- Department of Bioengineering, Stanford University, Stanford, CA, United States
- Brianna Chrisman
- Department of Bioengineering, Stanford University, Stanford, CA, United States
- Kelley Paskov
- Biomedical Informatics Program, Stanford University, Stanford, CA, United States
- Jae-Yoon Jung
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Dennis Paul Wall
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Department of Pediatrics, Stanford University, Stanford, CA, United States
5
Miliou I, Xiong X, Rinzivillo S, Zhang Q, Rossetti G, Giannotti F, Pedreschi D, Vespignani A. Predicting seasonal influenza using supermarket retail records. PLoS Comput Biol 2021; 17:e1009087. [PMID: 34252075 PMCID: PMC8297944 DOI: 10.1371/journal.pcbi.1009087] [Received: 11/19/2020] [Revised: 07/22/2021] [Accepted: 05/15/2021] [Indexed: 11/19/2022]
Abstract
Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecasting systems. In this paper, we propose the use of a novel data source, namely retail market data, to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates of influenza incidence in Italy up to 4 weeks ahead. We use a Support Vector Regression (SVR) model to produce predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results quantitatively show the value of incorporating retail market data in forecasting models as a proxy that can be used for the real-time analysis of epidemics.
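A nowcasting and forecasting framework of this kind needs training rows that pair recent observations with the incidence some weeks later. The abstract does not say how the features were arranged, so the sketch below is one plausible layout under stated assumptions: a single weekly sentinel-basket signal, past incidence available only up to the previous week, and a horizon of 0 (nowcast) to 4 weeks. The function name `make_supervised` is hypothetical.

```python
from typing import List, Sequence, Tuple


def make_supervised(
    incidence: Sequence[float],
    signal: Sequence[float],
    lags: int,
    horizon: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) rows for predicting incidence `horizon` weeks after week t.

    Features: incidence for weeks t-lags .. t-1 (assumed already reported)
    followed by the proxy signal for weeks t-lags+1 .. t (assumed available
    in real time). Target: incidence at week t + horizon, so horizon=0 is a
    nowcast of the current, not yet reported, week.
    """
    if len(incidence) != len(signal):
        raise ValueError("series must have equal length")
    if lags < 1 or horizon < 0:
        raise ValueError("need lags >= 1 and horizon >= 0")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(lags, len(incidence) - horizon):
        past_incidence = list(incidence[t - lags : t])
        recent_signal = list(signal[t - lags + 1 : t + 1])
        X.append(past_incidence + recent_signal)
        y.append(incidence[t + horizon])
    return X, y
```

The resulting rows could then be passed to a regressor such as scikit-learn's `sklearn.svm.SVR` with `SVR().fit(X, y)`, with kernel and hyperparameters tuned on held-out seasons. Dropping the signal columns gives a purely autoregressive baseline of the sort the authors compare against.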
Affiliation(s)
- Ioanna Miliou
- University of Pisa, Pisa, Italy
- ISTI-CNR, Pisa, Italy
- Xinyue Xiong
- Northeastern University, Boston, Massachusetts, United States of America
- Qian Zhang
- Northeastern University, Boston, Massachusetts, United States of America
6
Predicting regional influenza epidemics with uncertainty estimation using commuting data in Japan. PLoS One 2021; 16:e0250417. [PMID: 33886669 PMCID: PMC8062106 DOI: 10.1371/journal.pone.0250417] [Received: 07/01/2020] [Accepted: 04/06/2021] [Indexed: 11/19/2022]
Abstract
Obtaining an accurate prediction of the number of influenza patients in specific areas is a crucial task for medical institutions. Infections such as influenza spread from person to person, and people are rarely confined to a single area. Therefore, a regional influenza prediction model should account for the flow of people between areas. Although various regional flu prediction models have been proposed, they do not consider the flow of people among areas. In this study, we propose a method that predicts the geographical distribution of influenza patients using commuting data to represent the flow of people. To capture the complex spatial dependencies, our model uses an extension of the graph convolutional network (GCN). Additionally, we propose a prediction interval for medical institutions that is suitable for cyclic time series. We used weekly data on flu patients from health authorities as the ground truth to evaluate the prediction interval and the performance of influenza patient prediction in each prefecture in Japan. The results indicate that our GCN-based model, which uses commuting data, considerably improved predictive accuracy over the baselines both temporally and spatially and provided an appropriate prediction interval. The proposed model is valuable in practical settings, such as supporting decision making by public health authorities and addressing growth in vaccine demand and workload. This paper primarily presents a GCN as a useful means of predicting the spread of an epidemic.
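The abstract says the model extends a graph convolutional network but gives no layer details. For orientation only, the sketch below shows a standard GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W), in which the assumed adjacency A is built from commuting flows between prefectures (symmetrized, and assumed to be non-negative and pre-scaled, for example to fractions, so that the unit self-loops are comparable). This is the generic technique, not the authors' extension, and all names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; assumes len(a[0]) == len(b)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(commuting: Matrix) -> Matrix:
    """Symmetrize commuting flows, add self-loops, and apply D^-1/2 (A + I) D^-1/2."""
    n = len(commuting)
    a = [
        [(commuting[i][j] + commuting[j][i]) / 2 + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    deg = [sum(row) for row in a]  # > 0 thanks to the self-loops
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_hat @ H @ W), mixing each area's features with its neighbors'."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]
```

Here `h` would hold per-prefecture features such as recent weekly patient counts, and `w` is a learned weight matrix; stacking layers lets information propagate along commuting links over several hops.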
7
Kogan NE, Clemente L, Liautaud P, Kaashoek J, Link NB, Nguyen AT, Lu FS, Huybers P, Resch B, Havas C, Petutschnig A, Davis J, Chinazzi M, Mustafa B, Hanage WP, Vespignani A, Santillana M. An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time. Sci Adv 2021; 7:eabd6989. [PMID: 33674304 PMCID: PMC7935356 DOI: 10.1126/sciadv.abd6989] [Received: 07/07/2020] [Accepted: 01/19/2021] [Indexed: 05/18/2023]
Abstract
Given still-high levels of coronavirus disease 2019 (COVID-19) susceptibility and inconsistent transmission-containing strategies, outbreaks have continued to emerge across the United States. Until effective vaccines are widely deployed, curbing COVID-19 will require carefully timed nonpharmaceutical interventions (NPIs). A COVID-19 early warning system is vital for this. Here, we evaluate digital data streams as early indicators of state-level COVID-19 activity from 1 March to 30 September 2020. We observe that increases in digital data stream activity anticipate increases in confirmed cases and deaths by 2 to 3 weeks. Confirmed cases and deaths also decrease 2 to 4 weeks after NPI implementation, as measured by anonymized, phone-derived human mobility data. We propose a means of harmonizing these data streams to identify future COVID-19 outbreaks. Our results suggest that combining disparate health and behavioral data may help identify disease activity changes weeks before observation using traditional epidemiological monitoring.
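The abstract reports that increases in digital data streams anticipate increases in confirmed cases by 2 to 3 weeks. One generic way to quantify such a lead, which is not necessarily how the authors measured it, is to find the lag at which a signal correlates best with later case counts. The sketch assumes two equally spaced series of equal length (lags are in series steps, e.g. weeks) and non-constant overlapping windows; the function names are illustrative.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes equal length and non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lead(signal: Sequence[float], cases: Sequence[float], max_lag: int) -> int:
    """Return the lag L in 0..max_lag maximizing corr(signal[t], cases[t + L])."""
    if len(signal) != len(cases) or not 0 <= max_lag < len(signal) - 1:
        raise ValueError("need equal-length series and 0 <= max_lag < len - 1")
    best_lag, best_r = 0, -math.inf
    for lag in range(max_lag + 1):
        # Align the signal at time t with cases observed `lag` steps later.
        r = pearson(signal[: len(signal) - lag], cases[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag
```

With weekly series, a returned value of 2 or 3 would correspond to the 2-to-3-week lead described in the abstract. Correlation-based lags are sensitive to smoothing and trends, so in practice they are usually computed on detrended or growth-rate series.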
Affiliation(s)
- Nicole E Kogan
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Leonardo Clemente
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Parker Liautaud
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, USA
- Justin Kaashoek
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Nicholas B Link
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Andre T Nguyen
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- University of Maryland, Baltimore County, Baltimore, MD, USA
- Booz Allen Hamilton, Columbia, MD, USA
- Fred S Lu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
- Peter Huybers
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Bernd Resch
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Clemens Havas
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Andreas Petutschnig
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Backtosch Mustafa
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- William P Hanage
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
8
Liu D, Clemente L, Poirier C, Ding X, Chinazzi M, Davis J, Vespignani A, Santillana M. Real-Time Forecasting of the COVID-19 Outbreak in Chinese Provinces: Machine Learning Approach Using Novel Digital Data and Estimates From Mechanistic Models. J Med Internet Res 2020; 22:e20285. [PMID: 32730217 PMCID: PMC7459435 DOI: 10.2196/20285] [Received: 05/14/2020] [Revised: 07/24/2020] [Accepted: 07/24/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND The inherent difficulty of identifying and monitoring emerging outbreaks caused by novel pathogens can lead to their rapid spread, and if left unchecked, they may become major public health threats to the planet. The ongoing coronavirus disease (COVID-19) outbreak, which has infected over 2,300,000 individuals and caused over 150,000 deaths, is one such catastrophic event. OBJECTIVE We present a timely and novel methodology that combines disease estimates from mechanistic models and digital traces, via interpretable machine learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time. METHODS Our method uses the following as inputs: (a) official health reports, (b) COVID-19-related internet search activity, (c) news media activity, and (d) daily forecasts of COVID-19 activity from a metapopulation mechanistic model. Our machine learning methodology uses a clustering technique that exploits geospatial synchronicities of COVID-19 activity across Chinese provinces and a data augmentation technique to deal with the small number of historical disease observations characteristic of emerging outbreaks. RESULTS Our model produces stable and accurate forecasts 2 days ahead of the current time and outperforms a collection of baseline models in 27 out of 32 Chinese provinces. CONCLUSIONS Our methodology could be easily extended to other geographies currently affected by COVID-19 to aid decision makers with monitoring and possibly prevention.
Affiliation(s)
- Dianbo Liu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Leonardo Clemente
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Tecnologico de Monterrey, Monterrey, Mexico
- Canelle Poirier
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Xiyu Ding
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Harvard TH Chan School of Public Health, Boston, MA, United States
- Matteo Chinazzi
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States
- Jessica Davis
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States
- Alessandro Vespignani
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States
- ISI Foundation, Turin, Italy
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Harvard TH Chan School of Public Health, Boston, MA, United States