1
Wang D, Lentzen M, Botz J, Valderrama D, Deplante L, Perrio J, Génin M, Thommes E, Coudeville L, Fröhlich H. Development of an early alert model for pandemic situations in Germany. Sci Rep 2023; 13:20780. [PMID: 38012282 PMCID: PMC10682010 DOI: 10.1038/s41598-023-48096-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/22/2023] [Indexed: 11/29/2023]
Abstract
The COVID-19 pandemic has highlighted the need for new technical approaches to increase the preparedness of healthcare systems. One important measure is to develop innovative early warning systems. Along those lines, we first compiled a corpus of relevant COVID-19-related symptoms with the help of a disease ontology, text mining and statistical analysis. Subsequently, we applied statistical and machine learning (ML) techniques to time series data of symptom-related Google searches and tweets spanning the period from March 2020 to June 2022. We found that a long short-term memory (LSTM) network jointly trained on COVID-19 symptom-related Google Trends and Twitter data was able to accurately forecast up-trends in classical surveillance data (confirmed cases and hospitalization rates) 14 days ahead. F1 scores were above 98% for confirmed cases and above 97% for hospitalization rates, demonstrating the potential of using digital traces for building an early alert system for pandemics in Germany.
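As a reading aid for the F1 scores quoted in this abstract, the following minimal sketch shows how an F1 score could be computed for binary up-trend alerts. The function name `f1_score` and the label vectors are hypothetical illustrations, not the authors' code or data.

```python
def f1_score(actual, predicted):
    """F1 score for binary labels (1 = up-trend, 0 = no up-trend)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly raised alerts
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed up-trends
    if tp == 0:
        return 0.0  # no correct alert: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight forecast windows (illustration only).
actual = [0, 1, 1, 0, 1, 0, 1, 1]
predicted = [0, 1, 1, 0, 0, 1, 1, 1]
score = f1_score(actual, predicted)
print(round(score, 3))
```

Because F1 is the harmonic mean of precision and recall, a high value requires both few false alarms and few missed up-trends.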
Affiliation(s)
- Danqi Wang
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
- Manuel Lentzen
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, University of Bonn, Friedrich Hirzebruch-Allee 6, 53115, Bonn, Germany
- Jonas Botz
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, University of Bonn, Friedrich Hirzebruch-Allee 6, 53115, Bonn, Germany
- Diego Valderrama
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, University of Bonn, Friedrich Hirzebruch-Allee 6, 53115, Bonn, Germany
- Jules Perrio
  - Quinten Health, 8 Rue Vernier, 75017, Paris, France
- Marie Génin
  - Quinten Health, 8 Rue Vernier, 75017, Paris, France
- Holger Fröhlich
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, University of Bonn, Friedrich Hirzebruch-Allee 6, 53115, Bonn, Germany
2
Luca M, Campedelli GM, Centellegher S, Tizzoni M, Lepri B. Crime, inequality and public health: a survey of emerging trends in urban data science. Front Big Data 2023; 6:1124526. [PMID: 37303974 PMCID: PMC10248183 DOI: 10.3389/fdata.2023.1124526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 05/10/2023] [Indexed: 06/13/2023]
Abstract
Urban agglomerations are constantly and rapidly evolving ecosystems, with globalization and increasing urbanization posing new challenges in sustainable urban development well summarized in the United Nations' Sustainable Development Goals (SDGs). The advent of the digital age generated by modern alternative data sources provides new tools to tackle these challenges with spatio-temporal scales that were previously unavailable with census statistics. In this review, we present how new digital data sources are employed to provide data-driven insights to study and track (i) urban crime and public safety; (ii) socioeconomic inequalities and segregation; and (iii) public health, with a particular focus on the city scale.
Affiliation(s)
- Massimiliano Luca
  - Mobile and Social Computing Lab, Bruno Kessler Foundation, Trento, Italy
  - Faculty of Computer Science, Free University of Bolzano, Bolzano, Italy
- Michele Tizzoni
  - Department of Sociology and Social Research, University of Trento, Trento, Italy
- Bruno Lepri
  - Mobile and Social Computing Lab, Bruno Kessler Foundation, Trento, Italy
3
Wolken M, Sun T, McCall C, Schneider R, Caton K, Hundley C, Hopkins L, Ensor K, Domakonda K, Kalvapalle P, Persse D, Williams S, Stadler LB. Wastewater surveillance of SARS-CoV-2 and influenza in preK-12 schools shows school, community, and citywide infections. Water Research 2023; 231:119648. [PMID: 36702023 PMCID: PMC9858235 DOI: 10.1016/j.watres.2023.119648] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 09/22/2022] [Revised: 12/16/2022] [Accepted: 01/18/2023] [Indexed: 06/17/2023]
Abstract
Wastewater surveillance is a passive and efficient way to monitor the spread of infectious diseases in large populations and high transmission areas such as preK-12 schools. Infections caused by respiratory viruses in school-aged children are likely underreported, particularly because many children may be asymptomatic or mildly symptomatic. Wastewater monitoring of SARS-CoV-2 has been studied extensively and primarily by sampling at centralized wastewater treatment plants, and there are limited studies on SARS-CoV-2 in preK-12 school wastewater. Similarly, wastewater detections of influenza have only been reported in wastewater treatment plant and university manhole samples. Here, we present the results of a 17-month wastewater monitoring program for SARS-CoV-2 (n = 2176 samples) and influenza A and B (n = 1217 samples) in 51 preK-12 schools. We show that school wastewater concentrations of SARS-CoV-2 RNA were strongly associated with COVID-19 cases in schools and community positivity rates, and that influenza detections in school wastewater were significantly associated with citywide influenza diagnosis rates. Results were communicated back to schools and local communities to enable mitigation strategies to stop the spread, and direct resources such as testing and vaccination clinics. This study demonstrates that school wastewater surveillance is reflective of local infections at several population levels and plays a crucial role in the detection and mitigation of outbreaks.
Affiliation(s)
- Madeline Wolken
  - Department of Civil and Environmental Engineering, Rice University, 6100 Main Street MS-519, Houston, TX, USA; Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center, 1200 Pressler Street, Houston, TX, USA
- Thomas Sun
  - Department of Statistics, Rice University, 6100 Main Street MS 138, Houston, TX, USA
- Camille McCall
  - Department of Civil and Environmental Engineering, Rice University, 6100 Main Street MS-519, Houston, TX, USA
- Kelsey Caton
  - Houston Health Department, 8000 N. Stadium Dr., Houston, TX, USA
- Courtney Hundley
  - Houston Health Department, 8000 N. Stadium Dr., Houston, TX, USA
- Loren Hopkins
  - Department of Statistics, Rice University, 6100 Main Street MS 138, Houston, TX, USA; Houston Health Department, 8000 N. Stadium Dr., Houston, TX, USA
- Katherine Ensor
  - Department of Statistics, Rice University, 6100 Main Street MS 138, Houston, TX, USA
- Kaavya Domakonda
  - Houston Health Department, 8000 N. Stadium Dr., Houston, TX, USA
- David Persse
  - Houston Health Department, 8000 N. Stadium Dr., Houston, TX, USA; Department of Medicine and Surgery, Baylor College of Medicine, Houston, TX, USA; City of Houston Emergency Medical Services, Houston, TX, USA
- Stephen Williams
  - Houston Health Department, 8000 N. Stadium Dr., Houston, TX, USA
- Lauren B Stadler
  - Department of Civil and Environmental Engineering, Rice University, 6100 Main Street MS-519, Houston, TX, USA
4
Mavragani A, Fragkozidis G, Zarkogianni K, Nikita KS. Long Short-term Memory-Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation. J Med Internet Res 2023; 25:e42519. [PMID: 36745490 PMCID: PMC9941907 DOI: 10.2196/42519] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/06/2022] [Revised: 11/29/2022] [Accepted: 11/30/2022] [Indexed: 12/05/2022]
Abstract
BACKGROUND The potential to harness the plurality of available data in real time along with advanced data analytics for the accurate prediction of influenza-like illness (ILI) outbreaks has gained significant scientific interest. Different methodologies based on the use of machine learning techniques and traditional and alternative data sources, such as ILI surveillance reports, weather reports, search engine queries, and social media, have been explored with the ultimate goal of being used in the development of electronic surveillance systems that could complement existing monitoring resources. OBJECTIVE The scope of this study was to investigate for the first time the combined use of ILI surveillance data, weather data, and Twitter data along with deep learning techniques toward the development of prediction models able to nowcast and forecast weekly ILI cases. By assessing the predictive power of both traditional and alternative data sources on the use case of ILI, this study aimed to provide a novel approach for corroborating evidence and enhancing accuracy and reliability in the surveillance of infectious diseases. METHODS The model's input space consisted of information related to weekly ILI surveillance, web-based social (eg, Twitter) behavior, and weather conditions. For the design and development of the model, relevant data corresponding to the period of 2010 to 2019 and focusing on the Greek population and weather were collected. Long short-term memory (LSTM) neural networks were leveraged to efficiently handle the sequential and nonlinear nature of the multitude of collected data. The 3 data categories were first used separately for training 3 LSTM-based primary models. Subsequently, different transfer learning (TL) approaches were explored with the aim of creating various feature spaces combining the features extracted from the corresponding primary models' LSTM layers for the latter to feed a dense layer. 
RESULTS The primary model that learned from weather data yielded better forecast accuracy (root mean square error [RMSE]=0.144; Pearson correlation coefficient [PCC]=0.801) than the model trained with ILI historical data (RMSE=0.159; PCC=0.794). The best performance was achieved by the TL-based model leveraging the combination of the 3 data categories (RMSE=0.128; PCC=0.822). CONCLUSIONS The superiority of the TL-based model, which considers Twitter data, weather data, and ILI surveillance data, reflects the potential of alternative public sources to enhance accurate and reliable prediction of ILI spread. Despite its focus on the use case of Greece, the proposed approach can be generalized to other locations, populations, and social media platforms to support the surveillance of infectious diseases with the ultimate goal of reinforcing preparedness for future epidemics.
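The two metrics reported in the results above, root mean square error (RMSE) and the Pearson correlation coefficient (PCC), can be sketched as follows. The function names and the sample values are hypothetical; the abstract does not state how the study scaled its data, so the numbers here are not comparable to the reported ones.

```python
import math


def rmse(observed, forecast):
    """Root mean square error between observed and forecast values."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / n)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical weekly ILI values (illustration only).
observed = [0.10, 0.20, 0.40, 0.30]
forecast = [0.20, 0.20, 0.30, 0.40]
print(rmse(observed, forecast), pearson(observed, forecast))
```

A lower RMSE indicates smaller forecast errors, while a PCC closer to 1 indicates that the forecast tracks the shape of the observed curve more closely; the two can disagree, which is presumably why both are reported.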
Affiliation(s)
- Georgios Fragkozidis
  - School of Electrical and Computer Engineering, National Technical University of Athens, Zografos, Athens, Greece
- Konstantia Zarkogianni
  - School of Electrical and Computer Engineering, National Technical University of Athens, Zografos, Athens, Greece
- Konstantina S Nikita
  - School of Electrical and Computer Engineering, National Technical University of Athens, Zografos, Athens, Greece
5
Stolerman LM, Clemente L, Poirier C, Parag KV, Majumder A, Masyn S, Resch B, Santillana M. Using digital traces to build prospective and real-time county-level early warning systems to anticipate COVID-19 outbreaks in the United States. Science Advances 2023; 9:eabq0199. [PMID: 36652520 PMCID: PMC9848273 DOI: 10.1126/sciadv.abq0199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 12/19/2022] [Indexed: 06/17/2023]
Abstract
Coronavirus disease 2019 (COVID-19) continues to affect the world, and the design of strategies to curb disease outbreaks requires close monitoring of their trajectories. We present machine learning methods that leverage internet-based digital traces to anticipate sharp increases in COVID-19 activity in U.S. counties. In a complementary direction to the efforts led by the Centers for Disease Control and Prevention (CDC), our models are designed to detect the time when an uptrend in COVID-19 activity will occur. Motivated by the need for finer spatial resolution epidemiological insights, we build upon previous efforts conceived at the state level. Our methods were tested in an out-of-sample manner, as events were unfolding, in 97 counties representative of multiple population sizes across the United States. They frequently anticipated increases in COVID-19 activity 1 to 6 weeks before local outbreaks, where an outbreak is defined as the effective reproduction number Rt remaining larger than 1 for a period of 2 weeks.
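The outbreak definition in this abstract (Rt larger than 1 for a period of 2 weeks) can be illustrated with a short sketch. It assumes a daily Rt series and treats "2 weeks" as 14 consecutive days; both choices, the function name `outbreak_onsets`, and the sample series are assumptions made for illustration and are not taken from the paper.

```python
def outbreak_onsets(rt, min_days=14):
    """Return the start index of every run in which rt stays above 1
    for at least `min_days` consecutive entries."""
    onsets = []
    run_start = None
    for i, value in enumerate(rt):
        if value > 1.0:
            if run_start is None:
                run_start = i  # a new run above the threshold begins
            if i - run_start + 1 == min_days:
                onsets.append(run_start)  # run just reached the required length
        else:
            run_start = None  # run broken; start over
    return onsets


# Hypothetical daily Rt series: a 10-day excursion (too short),
# then a 16-day excursion that qualifies as an outbreak.
rt = [0.9] * 5 + [1.1] * 10 + [0.95] * 3 + [1.2] * 16 + [0.8] * 4
onsets = outbreak_onsets(rt)
print(onsets)
```

Each qualifying run is reported once, at the index where it began, so an early warning signal can be scored by how far ahead of that index it fired.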
Affiliation(s)
- Lucas M. Stolerman
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Mathematics, Oklahoma State University, Stillwater, OK, USA
- Leonardo Clemente
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
- Canelle Poirier
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Kris V. Parag
  - NIHR Health Protection Research Unit, Behavioural Science and Evaluation, University of Bristol, Bristol, UK
- Serge Masyn
  - Global Public Health, Janssen R&D, Beerse, Belgium
- Bernd Resch
  - Department of Geoinformatics - Z-GIS, University of Salzburg, Salzburg, Austria
  - Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Machine Intelligence Group for the Betterment of Health and the Environment, Network Science Institute, Northeastern University, Boston, MA, USA
  - Harvard University, T.H. Chan School of Public Health, Boston, MA, USA
6
Jing F, Li Z, Qiao S, Zhang J, Olatosi B, Li X. Using geospatial social media data for infectious disease studies: a systematic review. International Journal of Digital Earth 2023; 16:130-157. [PMID: 37997607 PMCID: PMC10664840 DOI: 10.1080/17538947.2022.2161652] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/04/2022] [Accepted: 12/17/2022] [Indexed: 11/25/2023]
Abstract
Geospatial social media (GSM) data has been increasingly used in public health due to its rich, timely, and accessible spatial information, particularly in infectious disease research. This review synthesized 86 research articles that use GSM data in infectious diseases published between December 2013 and March 2022. These articles cover 12 infectious disease types ranging from respiratory infectious diseases to sexually transmitted diseases with spatial levels varying from the neighborhood, county, state, and country. We categorized these studies into three major infectious disease research domains: surveillance, explanation, and prediction. With the assistance of advanced statistical and spatial methods, GSM data has been widely and deeply applied to these domains, particularly in surveillance and explanation domains. We further identified four knowledge gaps in terms of contextual information use, application scopes, spatiotemporal dimension, and data limitations and proposed innovation opportunities for future research. Our findings will contribute to a better understanding of using GSM data in infectious diseases studies and provide insights into strategies for using GSM data more effectively in future research.
Affiliation(s)
- Fengrui Jing
  - Geoinformation and Big Data Research Laboratory, Department of Geography, University of South Carolina, Columbia, SC, USA
  - Big Data Health Science Center, University of South Carolina, Columbia, SC, USA
- Zhenlong Li
  - Geoinformation and Big Data Research Laboratory, Department of Geography, University of South Carolina, Columbia, SC, USA
  - Big Data Health Science Center, University of South Carolina, Columbia, SC, USA
- Shan Qiao
  - Big Data Health Science Center, University of South Carolina, Columbia, SC, USA
  - Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Jiajia Zhang
  - Big Data Health Science Center, University of South Carolina, Columbia, SC, USA
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Banky Olatosi
  - Big Data Health Science Center, University of South Carolina, Columbia, SC, USA
  - Department of Health Services Policy and Management, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Xiaoming Li
  - Big Data Health Science Center, University of South Carolina, Columbia, SC, USA
  - Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
7
Amusa LB, Twinomurinzi H, Phalane E, Phaswana-Mafuya RN. Big data and infectious disease epidemiology: A bibliometric analysis and research agenda. Interact J Med Res 2022; 12:e42292. [PMID: 36913554 PMCID: PMC10071404 DOI: 10.2196/42292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 10/21/2022] [Accepted: 11/29/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND Infectious diseases represent a major challenge for health systems worldwide. With the recent global pandemic of COVID-19, the need to research strategies to treat these health problems has become even more pressing. Although the literature on big data and data science in health has grown rapidly, few studies have synthesized these individual studies, and none has identified the utility of big data in infectious disease surveillance and modeling. OBJECTIVE This paper aims to synthesize research and identify hotspots of big data in infectious disease epidemiology. METHODS Bibliometric data from 3054 documents that satisfied the inclusion criteria, retrieved from the Web of Science database and spanning 22 years (2000-2022), were analyzed and reviewed. The search retrieval occurred on October 17, 2022. Bibliometric analysis was performed to illustrate the relationships between research constituents, topics, and key terms in the retrieved documents. RESULTS The bibliometric analysis revealed internet searches and social media as the most utilized big data sources for infectious disease surveillance or modeling. It also placed US and Chinese institutions as leaders in this research area. Disease monitoring and surveillance, utility of electronic health (or medical) records, methodology framework for infodemiology tools, and machine/deep learning were identified as the core research themes. CONCLUSIONS Proposals for future studies are made based on these findings. This study will provide healthcare informatics scholars with a comprehensive understanding of big data research in infectious disease epidemiology.
Affiliation(s)
- Edith Phalane
  - University of Johannesburg, Auckland Park, Johannesburg, ZA
8
Maaß CH. Shedding light on dark figures: Steps towards a methodology for estimating actual numbers of COVID-19 infections in Germany based on Google Trends. PLoS One 2022; 17:e0276485. [PMID: 36288363 PMCID: PMC9605024 DOI: 10.1371/journal.pone.0276485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 10/02/2022] [Indexed: 11/07/2022]
Abstract
To shed light on unmeasurable real-world phenomena, we investigate, as an example, the actual number of COVID-19 infections in Germany based on big data. The true occurrence of infections is not visible, since not every infected person is tested. This paper demonstrates that coronavirus-related search queries issued on Google can depict true infection levels appropriately. We find a significant correlation between search volume and national as well as federal COVID-19 cases as reported by RKI. Additionally, we discover indications that the queries are indeed causal for infection levels. Finally, this approach can replicate varying dark figures throughout different periods of the pandemic and enables early insights into the true spread of future virus outbreaks. This is of high relevance for society in order to assess and understand the current situation during virus outbreaks and for decision-makers to take adequate and justifiable health measures.
9
Amusa LB, Twinomurinzi H, Okonkwo CW. Modeling COVID-19 incidence with Google Trends. Front Res Metr Anal 2022; 7:1003972. [PMID: 36186843 PMCID: PMC9520600 DOI: 10.3389/frma.2022.1003972] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/26/2022] [Accepted: 08/30/2022] [Indexed: 11/13/2022]
Abstract
Infodemiologic methods could be used to enhance the modeling of infectious diseases. It is of interest to verify the utility of these methods using a Nigerian case study. We used Google Trends (GT) data to track COVID-19 incidence and assessed whether they could complement traditional data based solely on reported case numbers. Data on Nigerian weekly COVID-19 cases from March 1, 2020, to May 31, 2021, were matched with internet search data from Google Trends. The reported weekly incidence numbers and the GT data were split into training and testing sets. ARIMA models were fitted to describe reported weekly COVID-19 cases using the training set. Several COVID-related search terms were theoretically and empirically assessed for initial screening. The selected GT variable was added to the ARIMA model as a regressor. Model forecasts, both with and without GT data, were compared with weekly cases in the test set over 13 weeks. Forecast accuracies were compared visually and using root mean square error (RMSE) and mean absolute error (MAE). Statistical significance of the difference in predictions was determined with the two-sided Diebold-Mariano test. Preliminary results of contemporaneous correlations between COVID-related search terms and weekly COVID-19 cases reveal “loss of smell,” “loss of taste,” and “fever” (in order of magnitude) as significantly associated with the official cases. Predictions of the ARIMA model using solely reported case numbers resulted in an RMSE of 411.4 and an MAE of 354.9. The GT-expanded model achieved better forecasting accuracy (RMSE = 388.7, MAE = 340.1). The corrected Akaike information criterion also favored the GT-expanded model (869.4 vs. 872.2). The difference in predictive performance was significant according to a two-sided Diebold-Mariano test (DM = 6.75, p < 0.001) over the 13 weeks. Google Trends data enhanced the predictive ability of a traditionally based model and should be considered a suitable method to enhance infectious disease modeling.
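To make the forecast comparison in this abstract concrete, the sketch below computes the mean absolute error and a simplified Diebold-Mariano (DM) statistic for two competing forecasts. It assumes squared-error loss, one-step-ahead forecasts with no autocorrelation correction, and a normal approximation for the two-sided p-value; the study's actual test settings are not given in the abstract, and all names and numbers here are hypothetical.

```python
import math


def mae(observed, forecast):
    """Mean absolute error between observed and forecast values."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def dm_test(observed, forecast_a, forecast_b):
    """Simplified Diebold-Mariano test with squared-error loss.

    Returns (statistic, two_sided_p). A positive statistic means
    forecast_a has the larger average loss. No autocorrelation
    correction is applied, which is an assumption of this sketch.
    """
    # Loss differential per period: loss of A minus loss of B.
    d = [(o - a) ** 2 - (o - b) ** 2
         for o, a, b in zip(observed, forecast_a, forecast_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Hypothetical weekly case counts and two forecasts (illustration only).
observed = [100, 120, 150, 130, 160]
baseline = [90, 135, 140, 150, 140]   # e.g. a model using case counts only
extended = [95, 125, 145, 135, 150]   # e.g. the same model plus a search regressor
stat, p_value = dm_test(observed, baseline, extended)
print(mae(observed, baseline), mae(observed, extended), stat, p_value)
```

With only a handful of test periods, as in this toy series, the normal approximation is rough; the sketch is meant to show the mechanics of comparing loss differentials rather than to reproduce the reported DM value.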
10
Sumner SA, Bowen D, Holland K, Zwald ML, Vivolo-Kantor A, Guy GP, Heuett WJ, Pressley DP, Jones CM. Estimating Weekly National Opioid Overdose Deaths in Near Real Time Using Multiple Proxy Data Sources. JAMA Netw Open 2022; 5:e2223033. [PMID: 35862045 PMCID: PMC9305381 DOI: 10.1001/jamanetworkopen.2022.23033] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/14/2022]
Abstract
IMPORTANCE Opioid overdose is a leading public health problem in the United States; however, national data on overdose deaths are delayed by several months or more. OBJECTIVES To build and validate a statistical model for estimating national opioid overdose deaths in near real time. DESIGN, SETTING, AND PARTICIPANTS In this cross-sectional study, signals from 5 overdose-related, proxy data sources encompassing health, law enforcement, and online data from 2014 to 2019 in the US were combined using a LASSO (least absolute shrinkage and selection operator) regression model, and weekly predictions of opioid overdose deaths were made for 2018 and 2019 to validate model performance. Results were also compared with those from a baseline SARIMA (seasonal autoregressive integrated moving average) model, one of the most used approaches to forecasting injury mortality. EXPOSURES Time series data from 2014 to 2019 on emergency department visits for opioid overdose from the National Syndromic Surveillance Program, data on the volume of heroin and synthetic opioids circulating in illicit markets via the National Forensic Laboratory Information System, data on the search volume for heroin and synthetic opioids on Google, and data on post volume on heroin and synthetic opioids on Twitter and Reddit were used to train and validate prediction models of opioid overdose deaths. MAIN OUTCOMES AND MEASURES Model-based predictions of weekly opioid overdose deaths in the United States were made for 2018 and 2019 and compared with actual observed opioid overdose deaths from the National Vital Statistics System. RESULTS Statistical models using the 5 real-time proxy data sources estimated the national opioid overdose death rate for 2018 and 2019 with an error of 1.01% and -1.05%, respectively. 
When considering the accuracy of weekly predictions, the machine learning-based approach possessed a mean error in its weekly estimates (root mean squared error) of 60.3 overdose deaths for 2018 (compared with 310.2 overdose deaths for the SARIMA model) and 67.2 overdose deaths for 2019 (compared with 83.3 overdose deaths for the SARIMA model). CONCLUSIONS AND RELEVANCE Results of this serial cross-sectional study suggest that proxy administrative data sources can be used to estimate national opioid overdose mortality trends to provide a more timely understanding of this public health problem.
Affiliation(s)
- Steven A. Sumner
  - National Center for Injury Prevention and Control, US Centers for Disease Control and Prevention, Atlanta, Georgia
- Daniel Bowen
  - Division of Violence Prevention, US Centers for Disease Control and Prevention, Atlanta, Georgia
- Kristin Holland
  - Division of Overdose Prevention, US Centers for Disease Control and Prevention, Atlanta, Georgia
- Marissa L. Zwald
  - Division of Violence Prevention, US Centers for Disease Control and Prevention, Atlanta, Georgia
- Alana Vivolo-Kantor
  - Division of Overdose Prevention, US Centers for Disease Control and Prevention, Atlanta, Georgia
- Gery P. Guy
  - Division of Overdose Prevention, US Centers for Disease Control and Prevention, Atlanta, Georgia
- William J. Heuett
  - Diversion Control Division, US Drug Enforcement Administration, Springfield, Virginia
- DeMia P. Pressley
  - Diversion Control Division, US Drug Enforcement Administration, Springfield, Virginia
- Christopher M. Jones
  - National Center for Injury Prevention and Control, US Centers for Disease Control and Prevention, Atlanta, Georgia
11
Khakimova A, Abdollahi L, Zolotarev O, Rahim F. Global interest in vaccines during the COVID-19 pandemic: Evidence from Google Trends. Vaccine X 2022; 10:100152. [PMID: 35291263 PMCID: PMC8915451 DOI: 10.1016/j.jvacx.2022.100152] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/06/2021] [Revised: 10/23/2021] [Accepted: 02/21/2022] [Indexed: 12/16/2022]
Abstract
COVID-19 (coronavirus disease 2019) vaccines have become available; now, everyone has the opportunity to get vaccinated. We used Google Trends (GT) data to assess the global public interest in COVID-19 vaccines during the pandemic. For the analysis, a period of 17 months was chosen (from Jan 19, 2020, to Jul 04, 2021). User interest was tracked by query keywords (corona vaccine, COVID-19 vaccine development, Sputnik V, Pfizer vaccine, AstraZeneca vaccine, etc.). A geographic analysis of queries was also carried out. User interest in vaccines is increasing significantly. It is focused on the side effects of vaccines, and users pay attention to vaccine developers from different countries. No correlation was found between scientific publications devoted to vaccine development and users' internet queries on the topic. This study shows that internet search patterns can be used to gauge public attitudes towards coronavirus vaccination. Consistently high safety concerns are accompanied by interest in vaccine side effects. These data can be used to track and predict attitudes towards COVID-19 vaccination in different countries before global vaccination becomes available, to help mitigate the adverse effects of the pandemic.
Affiliation(s)
- Aida Khakimova
  - Department of Development of Scientific and Innovation Activities, Russian New University, Moscow, Russia
- Leila Abdollahi
  - Department of Medical Library and Information Science, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Oleg Zolotarev
  - Department of Information Systems in Economics and Management, Russian New University, Moscow, Russia
- Fakher Rahim
  - Metabolomics and Genomics Research Center, Tehran University of Medical Sciences, Tehran, Iran
  - Health Research Institute, Thalassemia and Hemoglobinopathy Research Centre, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
12
Oto OA, Kardeş S, Guller N, Safak S, Dirim AB, Başhan Y, Demir E, Artan AS, Yazıcı H, Turkmen A. Impact of the COVID-19 pandemic on interest in renal diseases. Environmental Science and Pollution Research International 2022; 29:711-718. [PMID: 34341920 PMCID: PMC8328136 DOI: 10.1007/s11356-021-15675-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Accepted: 07/23/2021] [Indexed: 06/13/2023]
Abstract
There is an information gap about the public's interest in nephrological diseases in the COVID-19 era. The objective was to identify public interest in kidney diseases during the pandemic. In this infodemiology study, Google Trends was queried for a total of 50 search queries corresponding to a broad spectrum of nephrological diseases and the term "nephrologist." Two time intervals of 2020 (March 15-July 4 and July 5-October 31) were compared to similar time intervals of 2016-2019 to provide information on interest in different phases of the pandemic. Compared to the prior 4 years, analyses showed significant decreases in relative search volume (RSV) in the majority (76%) of search queries in the March 15-July 4, 2020 period. However, the RSV of the majority of search queries (≈70%) in the July 5-October 31, 2020 period was not significantly different from similar periods of the previous 4 years, with an increase for the search terms amyloidosis, kidney biopsy, hematuria, chronic kidney disease, hypertension, nephrolithiasis, acute kidney injury, and Fabry disease. During the early pandemic, there were significant decreases in search volumes for many nephrological diseases. However, this trend reversed in the period from July 5 to October 31, 2020, implying an increased need for information on kidney diseases. The results of this study help us understand how COVID-19 affected public interest in kidney diseases and the general public's information needs regarding kidney diseases during the pandemic.
Affiliation(s)
- Ozgur Akin Oto
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey.
| | - Sinan Kardeş
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
| | - Nurane Guller
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
| | - Seda Safak
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
| | - Ahmet Burak Dirim
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
| | - Yağmur Başhan
- Department of Nephrology, Haseki Education Research Hospital, Istanbul, Turkey
| | - Erol Demir
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
| | - Ayse Serra Artan
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
| | - Halil Yazıcı
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
| | - Aydın Turkmen
- Department of Nephrology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey
|
13
|
Trevino J, Malik S, Schmidt M. Integrating Google Trends Search Engine Query Data Into Adult Emergency Department Volume Forecasting: Infodemiology Study. JMIR INFODEMIOLOGY 2022; 2:e32386. [PMID: 37113800 PMCID: PMC10014085 DOI: 10.2196/32386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/05/2021] [Revised: 10/05/2021] [Accepted: 12/07/2021] [Indexed: 04/29/2023]
Abstract
Background: The search for health information from web-based resources raises opportunities to inform the service operations of health care systems. Google Trends search query data have been used to study public health topics, such as seasonal influenza, suicide, and prescription drug abuse; however, there is a paucity of literature using Google Trends data to improve emergency department patient-volume forecasting. Objective: We assessed the ability of Google Trends search query data to improve the performance of adult emergency department daily volume prediction models. Methods: Google Trends search query data related to chief complaints and health care facilities were collected from Chicago, Illinois (July 2015 to June 2017). We calculated correlations between Google Trends search query data and emergency department daily patient volumes from a tertiary care adult hospital in Chicago. A baseline multiple linear regression model of emergency department daily volume with traditional predictors was augmented with Google Trends search query data; model performance was measured using mean absolute error and mean absolute percentage error. Results: There were substantial correlations between emergency department daily volume and Google Trends "hospital" (r=0.54), combined terms (r=0.50), and "Northwestern Memorial Hospital" (r=0.34) search query data. The final Google Trends data-augmented model included the predictors Combined 3-day moving average and Hospital 3-day moving average and performed better (mean absolute percentage error 6.42%) than the final baseline model (mean absolute percentage error 6.67%), an improvement of 3.1%. Conclusions: The incorporation of Google Trends search query data into an adult tertiary care hospital emergency department daily volume prediction model modestly improved model performance. Further development of advanced models with comprehensive search query terms and complementary data sources may improve prediction performance and could be an avenue for further research.
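The abstract above names a 3-day moving average predictor and two error metrics (mean absolute error and mean absolute percentage error). A minimal sketch of how these quantities could be computed is given here for orientation. It assumes a trailing moving-average window, which the abstract does not specify, and all function names and numbers are hypothetical illustrations rather than the authors' code or data.

```python
def moving_average(values, window=3):
    """Trailing moving average (assumed form); first window-1 entries are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of the observed value."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical search-index values and daily volumes, for illustration only.
search_index = [40, 50, 60, 70]
smoothed = moving_average(search_index)

actual_volume = [200, 250]
predicted_volume = [190, 260]
mae = mean_absolute_error(actual_volume, predicted_volume)
mape = mean_absolute_percentage_error(actual_volume, predicted_volume)
```

In a workflow of this kind, the smoothed series would be added as a column to the regression design matrix, and the two metrics would be compared between the baseline and the augmented model on the same held-out days.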
Affiliation(s)
- Jesus Trevino
- Department of Emergency Medicine The George Washington University School of Medicine & Health Sciences Washington, DC United States
| | - Sanjeev Malik
- Department of Emergency Medicine Northwestern University Feinberg School of Medicine Chicago, IL United States
| | - Michael Schmidt
- Department of Emergency Medicine Northwestern University Feinberg School of Medicine Chicago, IL United States
|
14
|
Würschinger Q. Social Networks of Lexical Innovation. Investigating the Social Dynamics of Diffusion of Neologisms on Twitter. Front Artif Intell 2021; 4:648583. [PMID: 34790894 PMCID: PMC8591557 DOI: 10.3389/frai.2021.648583] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/31/2020] [Accepted: 07/13/2021] [Indexed: 11/13/2022] Open
Abstract
Societies continually evolve and speakers use new words to talk about innovative products and practices. While most lexical innovations soon fall into disuse, others spread successfully and become part of the lexicon. In this paper, I conduct a longitudinal study of the spread of 99 English neologisms on Twitter to study their degrees and pathways of diffusion. Previous work on lexical innovation has almost exclusively relied on usage frequency for investigating the spread of new words. To get a more differentiated picture of diffusion, I use frequency-based measures to study temporal aspects of diffusion and I use network analyses for a more detailed and accurate investigation of the sociolinguistic dynamics of diffusion. The results show that frequency measures manage to capture diffusion with varying success. Frequency counts can serve as an approximate indicator for overall degrees of diffusion, yet they miss important information about the temporal usage profiles of lexical innovations. The results indicate that neologisms with similar total frequency can exhibit significantly different degrees of diffusion. Analysing differences in their temporal dynamics of use with regard to their age, trends in usage intensity, and volatility contributes to a more accurate account of their diffusion. The results obtained from the social network analysis reveal substantial differences in the social pathways of diffusion. Social diffusion significantly correlates with the frequency and temporal usage profiles of neologisms. However, the network visualisations and metrics identify neologisms whose degrees of social diffusion are more limited than suggested by their overall frequency of use. These include, among others, highly volatile neologisms (e.g., poppygate) and political terms (e.g., alt-left), whose use almost exclusively goes back to single communities of closely-connected, like-minded individuals. 
I argue that the inclusion of temporal and social information is of particular importance for the study of lexical innovation since neologisms exhibit high degrees of temporal volatility and social indexicality. More generally, the present approach demonstrates the potential of social network analysis for sociolinguistic research on linguistic innovation, variation, and change.
|
15
|
Kiang MV, Chen JT, Krieger N, Buckee CO, Alexander MJ, Baker JT, Buckner RL, Coombs G, Rich-Edwards JW, Carlson KW, Onnela JP. Sociodemographic characteristics of missing data in digital phenotyping. Sci Rep 2021; 11:15408. [PMID: 34326370 PMCID: PMC8322366 DOI: 10.1038/s41598-021-94516-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/23/2021] [Accepted: 07/12/2021] [Indexed: 11/09/2022] Open
Abstract
The ubiquity of smartphones, with their increasingly sophisticated array of sensors, presents an unprecedented opportunity for researchers to collect longitudinal, diverse, temporally-dense data about human behavior while minimizing participant burden. Researchers increasingly make use of smartphones for "digital phenotyping," the collection and analysis of raw phone sensor and log data to study the lived experiences of subjects in their natural environments using their own devices. While digital phenotyping has shown promise in fields such as psychiatry and neuroscience, there are fundamental gaps in our knowledge about data collection and non-collection (i.e., missing data) in smartphone-based digital phenotyping. In this meta-study using individual-level data from six different studies, we examined accelerometer and GPS sensor data of 211 participants, amounting to 29,500 person-days of observation, using Bayesian hierarchical negative binomial regression with study- and user-level random intercepts. Sensitivity analyses including alternative model specification and stratified models were conducted. We found that iOS users had lower GPS non-collection than Android users. For GPS data, rates of non-collection did not differ by race/ethnicity, education, age, or gender. For accelerometer data, Black participants had higher rates of non-collection, but rates did not differ by sex, education, or age. For both sensors, non-collection increased by 0.5% to 0.9% per week. These results demonstrate the feasibility of using smartphone-based digital phenotyping across diverse populations, for extended periods of time, and within diverse cohorts. As smartphones become increasingly embedded in everyday life, the insights of this study will help guide the design, planning, and analysis of digital phenotyping studies.
Affiliation(s)
- Mathew V Kiang
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
| | - Jarvis T Chen
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Nancy Krieger
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Caroline O Buckee
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Monica J Alexander
- Department of Sociology, University of Toronto, Toronto, ON, Canada
- Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
| | - Justin T Baker
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Institute for Technology in Psychiatry, McLean Hospital, Belmont, MA, USA
| | - Randy L Buckner
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
| | - Garth Coombs
- Department of Psychology, Harvard University, Cambridge, MA, USA
| | - Janet W Rich-Edwards
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Division of Women's Health, Department of Medicine, Brigham and Women's Hospital and Harvard Medical, Boston, MA, USA
| | - Kenzie W Carlson
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Jukka-Pekka Onnela
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
|
16
|
Turtle J, Riley P, Ben-Nun M, Riley S. Accurate influenza forecasts using type-specific incidence data for small geographic units. PLoS Comput Biol 2021; 17:e1009230. [PMID: 34324487 PMCID: PMC8354478 DOI: 10.1371/journal.pcbi.1009230] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/25/2020] [Revised: 08/10/2021] [Accepted: 06/30/2021] [Indexed: 11/24/2022] Open
Abstract
Influenza incidence forecasting is used to facilitate better health system planning and could potentially be used to allow at-risk individuals to modify their behavior during a severe seasonal influenza epidemic or a novel respiratory pandemic. For example, the US Centers for Disease Control and Prevention (CDC) runs an annual competition to forecast influenza-like illness (ILI) at the regional and national levels in the US, based on a standard discretized incidence scale. Here, we use a suite of forecasting models to analyze type-specific incidence at the smaller spatial scale of clusters of nearby counties. We used data from point-of-care (POC) diagnostic machines over three seasons, in 10 clusters, capturing: 57 counties; 1,061,891 total specimens; and 173,909 specimens positive for Influenza A. Total specimens were closely correlated with comparable CDC ILI data. Mechanistic models were substantially more accurate when forecasting influenza A positive POC data than total specimen POC data, especially at longer lead times. Also, models that fit subpopulations of the cluster (individual counties) separately were better able to forecast clusters than were models that directly fit to aggregated cluster data. Public health authorities may wish to consider developing forecasting pipelines for type-specific POC data in addition to ILI data. Simple mechanistic models will likely improve forecast accuracy when applied at small spatial scales to pathogen-specific data before being scaled to larger geographical units and broader syndromic data. Highly local forecasts may enable new public health messaging to encourage at-risk individuals to temporarily reduce their social mixing during seasonal peaks and guide public health intervention policy during potentially severe novel influenza pandemics.
Affiliation(s)
- James Turtle
- Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
| | - Pete Riley
- Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
| | - Michal Ben-Nun
- Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
| | - Steven Riley
- Infectious Disease Group, Predictive Science Inc., San Diego, California, United States
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
|
17
|
Miliou I, Xiong X, Rinzivillo S, Zhang Q, Rossetti G, Giannotti F, Pedreschi D, Vespignani A. Predicting seasonal influenza using supermarket retail records. PLoS Comput Biol 2021; 17:e1009087. [PMID: 34252075 PMCID: PMC8297944 DOI: 10.1371/journal.pcbi.1009087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/19/2020] [Revised: 07/22/2021] [Accepted: 05/15/2021] [Indexed: 11/19/2022] Open
Abstract
Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates for influenza incidence in Italy up to 4 weeks ahead. We make use of the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results show quantitatively the value of incorporating retail market data in forecasting models, acting as a proxy that can be used for the real-time analysis of epidemics.
Affiliation(s)
- Ioanna Miliou
- University of Pisa, Pisa, Italy
- ISTI-CNR, Pisa, Italy
| | - Xinyue Xiong
- Northeastern University, Boston, Massachusetts, United States of America
| | | | - Qian Zhang
- Northeastern University, Boston, Massachusetts, United States of America
|
18
|
Abstract
Infectious disease control critically depends on surveillance and predictive modeling of outbreaks. We argue that routine mobile-phone use can provide a source of infectious disease information via the measurements of behavioral changes in call-detail records (CDRs) collected for billing. In anonymous CDR metadata linked with individual health information from the A(H1N1)pdm09 outbreak in Iceland, we observe that people moved significantly less and placed fewer, but longer, calls in the few days around diagnosis than normal. These results suggest that disease-transmission models should explicitly consider behavior changes during outbreaks and advance mobile-phone traces as a potential universal data source for such efforts. Epidemic preparedness depends on our ability to predict the trajectory of an epidemic and the human behavior that drives spread in the event of an outbreak. Changes to behavior during an outbreak limit the reliability of syndromic surveillance using large-scale data sources, such as online social media or search behavior, which could otherwise supplement healthcare-based outbreak-prediction methods. Here, we measure behavior change reflected in mobile-phone call-detail records (CDRs), a source of passively collected real-time behavioral information, using an anonymously linked dataset of cell-phone users and their date of influenza-like illness diagnosis during the 2009 H1N1v pandemic. We demonstrate that mobile-phone use during illness differs measurably from routine behavior: Diagnosed individuals exhibit less movement than normal (1.1 to 1.4 fewer unique tower locations; P < 3.2 × 10⁻³), on average, in the 2 to 4 d around diagnosis and place fewer calls (2.3 to 3.3 fewer calls; P < 5.6 × 10⁻⁴) while spending longer on the phone (41- to 66-s average increase; P < 4.6 × 10⁻¹⁰) than usual on the day following diagnosis. 
The results suggest that anonymously linked CDRs and health data may be sufficiently granular to augment epidemic surveillance efforts and that infectious disease-modeling efforts lacking explicit behavior-change mechanisms need to be revisited.
|
19
|
Agarwal A, Uniyal D, Toshniwal D, Deb D. Dense Vector Embedding Based Approach to Identify Prominent Disseminators From Twitter Data Amid COVID-19 Outbreak. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2021. [DOI: 10.1109/tetci.2021.3067661] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
|
20
|
Aiken EL, Nguyen AT, Viboud C, Santillana M. Toward the use of neural networks for influenza prediction at multiple spatial resolutions. SCIENCE ADVANCES 2021; 7:7/25/eabb1237. [PMID: 34134985 PMCID: PMC8208709 DOI: 10.1126/sciadv.abb1237] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/31/2020] [Accepted: 04/29/2021] [Indexed: 05/24/2023]
Abstract
Mitigating the effects of disease outbreaks with timely and effective interventions requires accurate real-time surveillance and forecasting of disease activity, but traditional health care-based surveillance systems are limited by inherent reporting delays. Machine learning methods have the potential to fill this temporal "data gap," but work to date in this area has focused on relatively simple methods and coarse geographic resolutions (state level and above). We evaluate the predictive performance of a gated recurrent unit neural network approach in comparison with baseline machine learning methods for estimating influenza activity in the United States at the state and city levels and experiment with the inclusion of real-time Internet search data. We find that the neural network approach improves upon baseline models for long time horizons of prediction but is not improved by real-time internet search data. We conduct a thorough analysis of feature importances in all considered models for interpretability purposes.
Affiliation(s)
- Emily L Aiken
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
| | - Andre T Nguyen
- Booz Allen Hamilton, Columbia, MD 21044, USA
- University of Maryland, Baltimore County, Baltimore, MD 21250, USA
| | - Cecile Viboud
- Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA
| | - Mauricio Santillana
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02215, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA 02215, USA
|
21
|
Kardeş S. Public interest in spa therapy during the COVID-19 pandemic: analysis of Google Trends data among Turkey. INTERNATIONAL JOURNAL OF BIOMETEOROLOGY 2021; 65:945-950. [PMID: 33442780 PMCID: PMC7805426 DOI: 10.1007/s00484-021-02077-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/07/2020] [Revised: 12/28/2020] [Accepted: 01/05/2021] [Indexed: 05/03/2023]
Abstract
In Turkey, spas are widely used and preferred by patients who are seeking relief from their disability and pain. The spa therapy program is partly reimbursed by the national health insurance system. The objective of the present study was to leverage Google Trends to elucidate the public interest in spas in Turkey during the COVID-19 pandemic. Google Trends was queried to analyze search trends within Turkey for the Turkish term representing a spa (i.e., kaplıca) from January 01, 2016, to September 30, 2020. The relative search volume of "kaplıca" was statistically significantly decreased in the March 15-May 30, 2020 (-73.04%; p < 0.001); May 31-July 25, 2020 (-41.38%; p < 0.001); and July 26-September 19, 2020 (-29.98%; p < 0.001) periods compared to similar periods of the preceding 4 years (2016-2019). After June 1, 2020, the relative search volume showed a moderate recovery, without reaching the level of 2016-2019. Public interest in spas showed an initial sharp decline between mid-March and May, with a moderate increase during the June-August period. This finding might be indicative of public preference in undertaking spa therapy during the COVID-19 period. In Turkey, spas might be used to increase places providing rehabilitation for both non-COVID-19 patients and survivors of COVID-19 with long-term symptoms during the pandemic.
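The percentage decreases reported above compare relative search volume (RSV) in a 2020 window with the same window in 2016-2019. The following sketch shows one plausible way such a percent change could be computed, assuming the baseline is the simple mean of the four earlier years; the abstract does not state the exact procedure or the significance test, and the function name and numbers below are hypothetical.

```python
def percent_change(current, baseline_values):
    """Percent change of `current` relative to the mean of `baseline_values`."""
    baseline = sum(baseline_values) / len(baseline_values)
    return 100.0 * (current - baseline) / baseline


# Hypothetical mean RSV for one calendar window in each baseline year.
baseline_2016_2019 = [80.0, 90.0, 70.0, 80.0]
# Hypothetical mean RSV for the same window in 2020.
rsv_2020 = 20.0

change = percent_change(rsv_2020, baseline_2016_2019)
```

A negative result indicates that search interest in the 2020 window fell below the multi-year baseline for the same calendar period.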
Affiliation(s)
- Sinan Kardeş
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Turkey.
|
22
|
Runkle JD, Sugg MM, Graham G, Hodge B, March T, Mullendore J, Tove F, Salyers M, Valeika S, Vaughan E. Participatory COVID-19 Surveillance Tool in Rural Appalachia: Real-Time Disease Monitoring and Regional Response. Public Health Rep 2021; 136:327-337. [PMID: 33601984 PMCID: PMC8580398 DOI: 10.1177/0033354921990372] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 01/06/2021] [Indexed: 01/19/2023] Open
Abstract
INTRODUCTION: Few US studies have examined the usefulness of participatory surveillance during the coronavirus disease 2019 (COVID-19) pandemic for enhancing local health response efforts, particularly in rural settings. We report on the development and implementation of an internet-based COVID-19 participatory surveillance tool in rural Appalachia. METHODS: A regional collaboration among public health partners culminated in the design and implementation of the COVID-19 Self-Checker, a local online symptom tracker. The tool collected data on participant demographic characteristics and health history. County residents were then invited to take part in an automated daily electronic follow-up to monitor symptom progression, assess barriers to care and testing, and collect data on COVID-19 test results and symptom resolution. RESULTS: Nearly 6500 county residents visited and 1755 residents completed the COVID-19 Self-Checker from April 30 through June 9, 2020. Of the 579 residents who reported severe or mild COVID-19 symptoms, COVID-19 symptoms were primarily reported among women (n = 408, 70.5%), adults with preexisting health conditions (n = 246, 42.5%), adults aged 18-44 (n = 301, 52.0%), and users who reported not having a health care provider (n = 131, 22.6%). Initial findings showed underrepresentation of some racial/ethnic and non-English-speaking groups. PRACTICAL IMPLICATIONS: This low-cost internet-based platform provided a flexible means to collect participatory surveillance data on local changes in COVID-19 symptoms and adapt to guidance. Data from this tool can be used to monitor the efficacy of public health response measures at the local level in rural Appalachia.
Affiliation(s)
- Jennifer D. Runkle
- North Carolina Institute for Climate Studies, North Carolina State University, Asheville, NC, USA
| | - Maggie M. Sugg
- Department of Geography and Planning, Appalachian State University, Boone, NC, USA
| | - Garrett Graham
- North Carolina Institute for Climate Studies, North Carolina State University, Asheville, NC, USA
| | - Bryan Hodge
- Mountain Area Health Education, Asheville, NC, USA
| | - Terri March
- Hendersonville Family Medicine Residency, Mountain Area Health Education, Asheville, NC, USA
| | | | - Fletcher Tove
- Buncombe County Health and Human Services, Asheville, NC, USA
| | - Martha Salyers
- Public Health and Human Services Division, Eastern Band of the Cherokee Indians, Cherokee, NC, USA
| | - Steve Valeika
- Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Ellis Vaughan
- Buncombe County Health and Human Services, Asheville, NC, USA
|
23
|
Nsoesie EO, Oladeji O, Abah ASA, Ndeffo-Mbah ML. Forecasting influenza-like illness trends in Cameroon using Google Search Data. Sci Rep 2021; 11:6713. [PMID: 33762599 PMCID: PMC7991669 DOI: 10.1038/s41598-021-85987-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/11/2020] [Accepted: 03/09/2021] [Indexed: 12/13/2022] Open
Abstract
Although acute respiratory infections are a leading cause of mortality in sub-Saharan Africa, surveillance of diseases such as influenza is mostly neglected. Evaluating the usefulness of influenza-like illness (ILI) surveillance systems and developing approaches for forecasting future trends is important for pandemic preparedness. We applied and compared a range of robust statistical and machine learning models, including random forest (RF) regression, support vector machine (SVM) regression, multivariable linear regression, and ARIMA models, to forecast 2012 to 2018 trends of reported ILI cases in Cameroon, using Google searches for influenza symptoms, treatments, and natural or traditional remedies, as well as infectious diseases with a high burden (i.e., AIDS, malaria, tuberculosis). The R² and RMSE (root mean squared error) were statistically similar across most of the methods; however, RF and SVM had the highest average R² (0.78 and 0.88, respectively) for predicting ILI per 100,000 persons at the country level. This study demonstrates the need for developing contextualized approaches when using digital data for disease surveillance and the usefulness of search data for monitoring ILI in sub-Saharan African countries.
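The two evaluation metrics named in this abstract can be written out in a few lines. The sketch below uses the common coefficient-of-determination form of R² (one minus the ratio of residual to total sum of squares); the abstract does not say which definition the authors used, so this is an assumption, and the function names and numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical ILI rates per 100,000 persons and model forecasts.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 5.0]

error = rmse(observed, forecast)
fit = r_squared(observed, forecast)
```

RMSE is expressed in the units of the target (here, cases per 100,000), while R² is unitless, which is why studies of this kind often report both.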
Affiliation(s)
- Elaine O Nsoesie
- Department of Global Health, Boston University School of Public Health, 801 Massachusetts Ave, Crosstown Center 3rd Floor, Boston, MA, 02119, USA.
| | - Olubusola Oladeji
- Department of Global Health, Boston University School of Public Health, 801 Massachusetts Ave, Crosstown Center 3rd Floor, Boston, MA, 02119, USA
| | - Aristide S Abah Abah
- Department of Epidemiological Surveillance, Ministry of Health, Yaoundé, Cameroon
| | - Martial L Ndeffo-Mbah
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A & M University, Texas, USA
|
24
|
Kogan NE, Clemente L, Liautaud P, Kaashoek J, Link NB, Nguyen AT, Lu FS, Huybers P, Resch B, Havas C, Petutschnig A, Davis J, Chinazzi M, Mustafa B, Hanage WP, Vespignani A, Santillana M. An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time. SCIENCE ADVANCES 2021; 7:eabd6989. [PMID: 33674304 PMCID: PMC7935356 DOI: 10.1126/sciadv.abd6989] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 07/07/2020] [Accepted: 01/19/2021] [Indexed: 05/18/2023]
Abstract
Given still-high levels of coronavirus disease 2019 (COVID-19) susceptibility and inconsistent transmission-containing strategies, outbreaks have continued to emerge across the United States. Until effective vaccines are widely deployed, curbing COVID-19 will require carefully timed nonpharmaceutical interventions (NPIs). A COVID-19 early warning system is vital for this. Here, we evaluate digital data streams as early indicators of state-level COVID-19 activity from 1 March to 30 September 2020. We observe that increases in digital data stream activity anticipate increases in confirmed cases and deaths by 2 to 3 weeks. Confirmed cases and deaths also decrease 2 to 4 weeks after NPI implementation, as measured by anonymized, phone-derived human mobility data. We propose a means of harmonizing these data streams to identify future COVID-19 outbreaks. Our results suggest that combining disparate health and behavioral data may help identify disease activity changes weeks before observation using traditional epidemiological monitoring.
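The claim that digital signals anticipate confirmed cases by some number of weeks can be illustrated with a simple lagged-correlation scan. This is a generic sketch and not the method used in the study, which the abstract does not describe in detail; the function names and the toy series are hypothetical, and both series are assumed to be equally long and non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_lead(signal, target, max_lag):
    """Return (lag, r) maximizing the correlation of signal[t] with target[t + lag]."""
    best = None
    for lag in range(max_lag + 1):
        x = signal[:len(signal) - lag]  # drop the last `lag` signal points
        y = target[lag:]                # drop the first `lag` target points
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Toy weekly series: the target repeats the signal two weeks later.
digital_signal = [1, 3, 2, 5, 4, 6, 3, 2]
confirmed_cases = [0, 0, 1, 3, 2, 5, 4, 6]

lag, r = best_lead(digital_signal, confirmed_cases, max_lag=3)
```

With these toy numbers the scan selects a lead of two weeks. On real surveillance data the correlation peak is usually much flatter, so a lead time found this way should be read as a rough indication rather than a precise estimate.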
Affiliation(s)
- Nicole E Kogan
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Leonardo Clemente
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA.
| | - Parker Liautaud
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, USA.
| | - Justin Kaashoek
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
| | - Nicholas B Link
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Andre T Nguyen
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- University of Maryland, Baltimore County, Baltimore, MD, USA
- Booz Allen Hamilton, Columbia, MD, USA
| | - Fred S Lu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
| | - Peter Huybers
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
| | - Bernd Resch
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Center for Geographic Analysis, Harvard University, Cambridge, MA, USA
| | - Clemens Havas
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
| | - Andreas Petutschnig
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
| | | | | | - Backtosch Mustafa
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - William P Hanage
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | | | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
|
25
|
Kardeş S, Kuzu AS, Raiker R, Pakhchanian H, Karagülle M. Public interest in rheumatic diseases and rheumatologist in the United States during the COVID-19 pandemic: evidence from Google Trends. Rheumatol Int 2021; 41:329-334. [PMID: 33070255 PMCID: PMC7568841 DOI: 10.1007/s00296-020-04728-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/10/2020] [Accepted: 10/08/2020] [Indexed: 12/13/2022]
Abstract
The aim was to evaluate the public interest in rheumatic diseases during the coronavirus disease 2019 (COVID-19) pandemic. Google Trends was queried to analyze search trends in the United States for numerous rheumatic diseases and also the interest in a rheumatologist. Three 8-week periods in 2020 (March 15-May 9, May 10-July 4, and July 5-August 29) were compared to similar periods of the prior 4 years (2016-2019). Compared to a similar time period between 2016 and 2019, a significant decrease was found in the relative search volume for more than half of the search terms during the initial March 15-May 9, 2020 period. However, this trend appeared to reverse during the July 5-August 29, 2020 period, when the relative volume for nearly half of the search terms was not significantly different from that in similar periods of the prior 4 years. In addition, this period showed a significant increase in relative volume for the terms: Axial spondyloarthritis, ankylosing spondylitis, psoriatic arthritis, rheumatoid arthritis, Sjögren's syndrome, antiphospholipid syndrome, scleroderma, Kawasaki disease, Anti-Neutrophil Cytoplasmic Antibody (ANCA)-associated vasculitis, and rheumatologist. There was a significant decrease in relative search volume for many rheumatic diseases between March 15 and May 9, 2020 when compared to similar periods during the prior 4 years. However, the trends reversed after the initial period ended. There was an increase in relative search for the term "rheumatologist" between July and August 2020, suggesting the need for rheumatologists during the COVID-19 pandemic. Policymakers and healthcare providers should address the informational demands on rheumatic diseases and needs for rheumatologists by the general public during pandemics like COVID-19.
Affiliation(s)
- Sinan Kardeş
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
| | - Ali Suat Kuzu
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
| | - Rahul Raiker
- West Virginia University School of Medicine, Morgantown, WV USA
| | - Haig Pakhchanian
- George Washington University School of Medicine & Health Science, Washington, DC USA
| | - Mine Karagülle
- Department of Medical Ecology and Hydroclimatology, Istanbul Faculty of Medicine, Istanbul University, Capa-Fatih, 34093 Istanbul, Turkey
|
26
|
Masthi R, Jahan A, Bharathi D, Abhilash P, Kaniyarakkal V, Tv S, Gowda G, Ts R, Goud R, Rao S, Hegde A. Postcode based participatory disease surveillance systems: a comparison with traditional risk-based surveillance and its application in the COVID-19 pandemic. JMIR Public Health Surveill 2021. [PMID: 33481758 DOI: 10.2196/20746] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The SARS-CoV-2 infection has rapidly saturated health systems and traditional surveillance networks are finding it hard to keep pace with its spread. We designed a participatory disease surveillance (PDS) system, to capture symptoms of Influenza-like illness (ILI) to estimate SARS-CoV-2 infection in the community. While data generated by these platforms can help public health organisations find community hotspots and effectively direct control measures, it has never been compared to traditional systems. METHODS AND OBJECTIVES A completely anonymised web based PDS system, www.trackcovid-19.org was developed. We evaluated the symptomatic responses received from the PDS system against the traditional risk based surveillance carried out by the Bruhat Bengaluru Mahanagara Palike over a period of 45 days in the South Indian city of Bengaluru. RESULTS The PDS system recorded 11062 entries from 106 Postal codes. A healthy response was obtained from 10863 users while 199 (1.8%) reported being symptomatic. Subgroup analysis of a 14 day symptomatic window recorded 33 (0.29%) responses. Risk based surveillance was carried out covering a population of 605,284 with 209 (0.03%) individuals identified as symptomatic. CONCLUSIONS Web PDS platforms provide better visualisation of community infection when compared to traditional risk based surveillance systems. They are extremely useful by providing real time information in the extended battle against this pandemic.
When integrated into national disease surveillance systems, they can provide long term community surveillance adding an important cost-effective layer to already available data sources.
Affiliation(s)
- Ramesh Masthi
- Kempegowda Institute of Medical Sciences, Bangalore, IN
| | - Afraz Jahan
- Kempegowda Institute of Medical Sciences, Bangalore, IN
| | | | | | | | - Sanjay Tv
- Kempegowda Institute of Medical Sciences, Bangalore, IN
| | | | - Ranganath Ts
- Bangalore Medical College & Research Institute, Bangalore, IN
| | | | | | - Ajay Hegde
- Trackcovid-19.org, 349, 4th Main, Sadashivananagr, Bangalore, IN
|
27
|
Tseng VS, Jia-Ching Ying J, Wong ST, Cook DJ, Liu J. Computational Intelligence Techniques for Combating COVID-19: A Survey. IEEE Comput Intell Mag 2020. [DOI: 10.1109/mci.2020.3019873] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/07/2022]
|
28
|
Koehlmoos TP, Janvrin ML, Korona-Bailey J, Madsen C, Sturdivant R. COVID-19 Self-Reported Symptom Tracking Programs in the United States: Framework Synthesis. J Med Internet Res 2020; 22:e23297. [PMID: 33006943 PMCID: PMC7584449 DOI: 10.2196/23297] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/06/2020] [Revised: 09/02/2020] [Accepted: 09/14/2020] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND With the continued spread of COVID-19 in the United States, identifying potential outbreaks before infected individuals cross the clinical threshold is key to allowing public health officials time to ensure local health care institutions are adequately prepared. In response to this need, researchers have developed participatory surveillance technologies that allow individuals to report emerging symptoms daily so that their data can be extrapolated and disseminated to local health care authorities. OBJECTIVE This study uses a framework synthesis to evaluate existing self-reported symptom tracking programs in the United States for COVID-19 as an early-warning tool for probable clusters of infection. This in turn will inform decision makers and health care planners about these technologies and the usefulness of their information to aid in federal, state, and local efforts to mobilize effective current and future pandemic responses. METHODS Programs were identified through keyword searches and snowball sampling, then screened for inclusion. A best fit framework was constructed for all programs that met the inclusion criteria by collating information collected from each into a table for easy comparison. RESULTS We screened 8 programs; 6 were included in our final framework synthesis. We identified multiple common data elements, including demographic information like race, age, gender, and affiliation (all were associated with universities, medical schools, or schools of public health). Dissimilarities included collection of data regarding smoking status, mental well-being, and suspected exposure to COVID-19. CONCLUSIONS Several programs currently exist that track COVID-19 symptoms from participants on a semiregular basis. Coordination between symptom tracking program research teams and local and state authorities is currently lacking, presenting an opportunity for collaboration to avoid duplication of efforts and more comprehensive knowledge dissemination.
Affiliation(s)
| | - Miranda Lynn Janvrin
- Uniformed Services University, Bethesda, MD, United States.,Health Services Research Program, Henry M Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD, United States
| | - Jessica Korona-Bailey
- Uniformed Services University, Bethesda, MD, United States.,Health Services Research Program, Henry M Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD, United States
| | - Cathaleen Madsen
- Uniformed Services University, Bethesda, MD, United States.,Health Services Research Program, Henry M Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD, United States
| | - Rodney Sturdivant
- Uniformed Services University, Bethesda, MD, United States.,Health Services Research Program, Henry M Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD, United States
|
29
|
Cheng HY, Wu YC, Lin MH, Liu YL, Tsai YY, Wu JH, Pan KH, Ke CJ, Chen CM, Liu DP, Lin IF, Chuang JH. Applying Machine Learning Models with An Ensemble Approach for Accurate Real-Time Influenza Forecasting in Taiwan: Development and Validation Study. J Med Internet Res 2020; 22:e15394. [PMID: 32755888 PMCID: PMC7439145 DOI: 10.2196/15394] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/08/2019] [Revised: 12/21/2019] [Accepted: 06/13/2020] [Indexed: 12/14/2022] Open
Abstract
Background Changeful seasonal influenza activity in subtropical areas such as Taiwan causes problems in epidemic preparedness. The Taiwan Centers for Disease Control has maintained real-time national influenza surveillance systems since 2004. Except for timely monitoring, epidemic forecasting using the national influenza surveillance data can provide pivotal information for public health response. Objective We aimed to develop predictive models using machine learning to provide real-time influenza-like illness forecasts. Methods Using surveillance data of influenza-like illness visits from emergency departments (from the Real-Time Outbreak and Disease Surveillance System), outpatient departments (from the National Health Insurance database), and the records of patients with severe influenza with complications (from the National Notifiable Disease Surveillance System), we developed 4 machine learning models (autoregressive integrated moving average, random forest, support vector regression, and extreme gradient boosting) to produce weekly influenza-like illness predictions for a given week and 3 subsequent weeks. We established a framework of the machine learning models and used an ensemble approach called stacking to integrate these predictions. We trained the models using historical data from 2008-2014. We evaluated their predictive ability during 2015-2017 for each of the 4-week time periods using Pearson correlation, mean absolute percentage error (MAPE), and hit rate of trend prediction. A dashboard website was built to visualize the forecasts, and the results of real-world implementation of this forecasting framework in 2018 were evaluated using the same metrics. 
Results All models could accurately predict the timing and magnitudes of the seasonal peaks in the then-current week (nowcast) (ρ=0.802-0.965; MAPE: 5.2%-9.2%; hit rate: 0.577-0.756), 1-week (ρ=0.803-0.918; MAPE: 8.3%-11.8%; hit rate: 0.643-0.747), 2-week (ρ=0.783-0.867; MAPE: 10.1%-15.3%; hit rate: 0.669-0.734), and 3-week forecasts (ρ=0.676-0.801; MAPE: 12.0%-18.9%; hit rate: 0.643-0.786), especially the ensemble model. In real-world implementation in 2018, the forecasting performance was still accurate in nowcasts (ρ=0.875-0.969; MAPE: 5.3%-8.0%; hit rate: 0.582-0.782) and remained satisfactory in 3-week forecasts (ρ=0.721-0.908; MAPE: 7.6%-13.5%; hit rate: 0.596-0.904). Conclusions This machine learning and ensemble approach can make accurate, real-time influenza-like illness forecasts for a 4-week period, and thus, facilitate decision making.
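Two of the evaluation metrics named in this abstract, MAPE and the hit rate of trend prediction, can be sketched as follows. The MAPE formula is standard; the hit-rate definition here (share of steps where the forecast moves in the same direction as the observation, relative to the previous observed value) is an assumption, since the abstract does not spell out the exact definition used in the study. The sample numbers are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

def hit_rate(actual, predicted):
    """Share of steps where predicted and observed direction of change agree.

    Direction is measured against the previous observed value
    (an assumed definition, not taken from the paper).
    """
    hits = 0
    for t in range(1, len(actual)):
        observed_up = actual[t] - actual[t - 1] > 0
        predicted_up = predicted[t] - actual[t - 1] > 0
        hits += observed_up == predicted_up
    return hits / (len(actual) - 1)

# Made-up weekly influenza-like illness counts and forecasts
actual = [100, 120, 110, 130]
predicted = [100, 110, 125, 125]

error_pct = mape(actual, predicted)
trend_hits = hit_rate(actual, predicted)  # 2 of 3 directions match
```

The two metrics answer different questions: MAPE measures how far off the magnitude is, while the hit rate only asks whether the forecast called the direction of change correctly.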
Affiliation(s)
| | | | - Min-Hau Lin
- Taiwan Centers for Disease Control, Taipei, Taiwan
| | - Yu-Lun Liu
- Taiwan Centers for Disease Control, Taipei, Taiwan
| | | | - Jo-Hua Wu
- Value Lab, Acer Inc., Taipei, Taiwan
| | | | - Chih-Jung Ke
- Taiwan Centers for Disease Control, Taipei, Taiwan
| | | | - Ding-Ping Liu
- Taiwan Centers for Disease Control, Taipei, Taiwan.,National Taipei University of Nursing and Health Sciences, Taipei, Taiwan
| | - I-Feng Lin
- Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
| | - Jen-Hsiang Chuang
- Taiwan Centers for Disease Control, Taipei, Taiwan.,Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
|
30
|
Aiken EL, McGough SF, Majumder MS, Wachtel G, Nguyen AT, Viboud C, Santillana M. Real-time estimation of disease activity in emerging outbreaks using internet search information. PLoS Comput Biol 2020; 16:e1008117. [PMID: 32804932 PMCID: PMC7451983 DOI: 10.1371/journal.pcbi.1008117] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/25/2019] [Revised: 08/27/2020] [Accepted: 07/01/2020] [Indexed: 11/18/2022] Open
Abstract
Understanding the behavior of emerging disease outbreaks in, or ahead of, real-time could help healthcare officials better design interventions to mitigate impacts on affected populations. Most healthcare-based disease surveillance systems, however, have significant inherent reporting delays due to data collection, aggregation, and distribution processes. Recent work has shown that machine learning methods leveraging a combination of traditionally collected epidemiological information and novel Internet-based data sources, such as disease-related Internet search activity, can produce meaningful "nowcasts" of disease incidence ahead of healthcare-based estimates, with most successful case studies focusing on endemic and seasonal diseases such as influenza and dengue. Here, we apply similar computational methods to emerging outbreaks in geographic regions where no historical presence of the disease of interest has been observed. By combining limited available historical epidemiological data with disease-related Internet search activity, we retrospectively estimate disease activity in five recent outbreaks weeks ahead of traditional surveillance methods. We find that the proposed computational methods frequently provide useful real-time incidence estimates that can help fill temporal data gaps resulting from surveillance reporting delays. However, the proposed methods are limited by issues of sample bias and skew in search query volumes, perhaps as a result of media coverage.
Affiliation(s)
- Emily L. Aiken
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
| | - Sarah F. McGough
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Maimuna S. Majumder
- Department of Healthcare Policy, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Gal Wachtel
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
| | - Andre T. Nguyen
- Booz Allen Hamilton, Columbia, Maryland, United States of America
- University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
| | - Cecile Viboud
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Mauricio Santillana
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
|
31
|
Syamsuddin M, Fakhruddin M, Sahetapy-Engel JTM, Soewono E. Causality Analysis of Google Trends and Dengue Incidence in Bandung, Indonesia With Linkage of Digital Data Modeling: Longitudinal Observational Study. J Med Internet Res 2020; 22:e17633. [PMID: 32706682 PMCID: PMC7414412 DOI: 10.2196/17633] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/30/2019] [Revised: 05/03/2020] [Accepted: 05/20/2020] [Indexed: 01/18/2023] Open
Abstract
Background The popularity of dengue can be inferred from Google Trends that summarizes Google searches of related topics. Both the disease and its Google Trends have a similar source of causation in the dengue virus, leading us to hypothesize that dengue incidence and Google Trends results have a long-run equilibrium. Objective This research aimed to investigate the properties of this long-run equilibrium in the hope of using the information derived from Google Trends for the early detection of upcoming dengue outbreaks. Methods This research used the cointegration method to assess a long-run equilibrium between dengue incidence and Google Trends results. The long-run equilibrium was characterized by their linear combination that generated a stationary process. The Dickey-Fuller test was adopted to check the stationarity of the processes. An error correction model (ECM) was then adopted to measure deviations from the long-run equilibrium to examine the short-term and long-term effects. The resulting models were used to determine the Granger causality between the two processes. Additional information about the two processes was obtained by examining the impulse response function and variance decomposition. Results The Dickey-Fuller test supported an implicit null hypothesis that the dengue incidence and Google Trends results are nonstationary processes (P=.01). A further test showed that the processes were cointegrated (P=.01), indicating that their particular linear combination is a stationary process. These results permitted us to construct ECMs. The model showed the direction of causality of the two processes, indicating that Google Trends results will Granger-cause dengue incidence (not in the reverse order). Conclusions Various hypothesis testing results in this research concluded that Google Trends results can be used as an initial indicator of upcoming dengue outbreaks.
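The cointegration step described in this abstract (a linear combination of two nonstationary series that is itself stationary) can be sketched with the first two steps of an Engle-Granger style procedure: regress one series on the other, then compute a Dickey-Fuller statistic on the residuals. This is an illustration on synthetic data, not the authors' analysis; the helper names `ols` and `df_stat` are hypothetical, no lag terms are included, and the error correction model and Granger causality steps are omitted.

```python
import random
from math import sqrt

def ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def df_stat(e):
    """Dickey-Fuller t-statistic for de_t = gamma * e_{t-1} + u_t (no constant, no lags)."""
    lagged = e[:-1]
    diffs = [e[t] - e[t - 1] for t in range(1, len(e))]
    sxx = sum(v * v for v in lagged)
    gamma = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    resid = [d - gamma * l for l, d in zip(lagged, diffs)]
    s2 = sum(r * r for r in resid) / (len(diffs) - 1)
    return gamma / sqrt(s2 / sxx)

# Synthetic data (assumption): x is a random walk, y tracks 1 + 2*x plus stationary noise
rng = random.Random(0)
x = [0.0]
for _ in range(199):
    x.append(x[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

a, b = ols(x, y)                                   # step 1: long-run relation
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
stat = df_stat(residuals)                          # step 2: stationarity of residuals
```

A strongly negative statistic points toward stationary residuals and hence cointegration. The statistic must be compared against cointegration-specific critical values (for example, MacKinnon's tables) rather than standard normal or t quantiles.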
Affiliation(s)
- Muhammad Syamsuddin
- Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Bandung, Indonesia
| | - Muhammad Fakhruddin
- Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Bandung, Indonesia
| | | | - Edy Soewono
- Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Bandung, Indonesia
|
32
|
Edo-Osagie O, De La Iglesia B, Lake I, Edeghere O. A scoping review of the use of Twitter for public health research. Comput Biol Med 2020; 122:103770. [PMID: 32502758 PMCID: PMC7229729 DOI: 10.1016/j.compbiomed.2020.103770] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 11/12/2019] [Revised: 04/01/2020] [Accepted: 04/17/2020] [Indexed: 11/25/2022]
Abstract
Public health practitioners and researchers have used traditional medical databases to study and understand public health for a long time. Recently, social media data, particularly Twitter, has seen some use for public health purposes. Every large technological development in history has had an impact on the behaviour of society. The advent of the internet and social media is no different. Social media creates public streams of communication, and scientists are starting to understand that such data can provide some level of access into people's opinions and situations. As such, this paper aims to review and synthesize the literature on Twitter applications for public health, highlighting current research and products in practice. A scoping review methodology was employed and four leading health, computer science and cross-disciplinary databases were searched. A total of 755 articles were retrieved, 92 of which met the criteria for review. From the reviewed literature, six domains for the application of Twitter to public health were identified: (i) Surveillance; (ii) Event Detection; (iii) Pharmacovigilance; (iv) Forecasting; (v) Disease Tracking; and (vi) Geographic Identification. From our review, we were able to obtain a clear picture of the use of Twitter for public health. We gained insights into interesting observations such as how the popularity of different domains changed with time, the diseases and conditions studied and the different approaches to understanding each disease, which algorithms and techniques were popular with each domain, and more.
Affiliation(s)
- Oduwa Edo-Osagie
- School of Computing Science, University of East Anglia, Norwich, NR4 7TJ, UK.
| | | | - Iain Lake
- School of Environmental Science, University of East Anglia, Norwich, NR4 7TJ, UK
| | - Obaghe Edeghere
- National Infection Service, Public Health England, Birmingham, B3 2PW, UK
|
33
|
Scarpino SV, Scott JG, Eggo RM, Clements B, Dimitrov NB, Meyers LA. Socioeconomic bias in influenza surveillance. PLoS Comput Biol 2020; 16:e1007941. [PMID: 32644990 PMCID: PMC7347107 DOI: 10.1371/journal.pcbi.1007941] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/12/2018] [Accepted: 05/11/2020] [Indexed: 11/18/2022] Open
Abstract
Individuals in low socioeconomic brackets are considered at-risk for developing influenza-related complications and often exhibit higher than average influenza-related hospitalization rates. This disparity has been attributed to various factors, including restricted access to preventative and therapeutic health care, limited sick leave, and household structure. Adequate influenza surveillance in these at-risk populations is a critical precursor to accurate risk assessments and effective intervention. However, the United States of America's primary national influenza surveillance system (ILINet) monitors outpatient healthcare providers, which may be largely inaccessible to lower socioeconomic populations. Recent initiatives to incorporate Internet-source and hospital electronic medical records data into surveillance systems seek to improve the timeliness, coverage, and accuracy of outbreak detection and situational awareness. Here, we use a flexible statistical framework for integrating multiple surveillance data sources to evaluate the adequacy of traditional (ILINet) and next generation (BioSense 2.0 and Google Flu Trends) data for situational awareness of influenza across poverty levels. We find that ZIP Codes in the highest poverty quartile are a critical vulnerability for ILINet that the integration of next generation data fails to ameliorate.
Affiliation(s)
- Samuel V. Scarpino
- Network Science Institute, Northeastern University, Boston, Massachusetts, United States of America
- Marine & Environmental Sciences, Northeastern University, Boston, Massachusetts, United States of America
- Physics, Northeastern University, Boston, Massachusetts, United States of America
- Health Sciences, Northeastern University, Boston, Massachusetts, United States of America
- ISI Foundation, Turin, Italy
| | - James G. Scott
- Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, Texas, United States of America
| | - Rosalind M. Eggo
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Bruce Clements
- Pediatric Healthcare Connection, Austin, Texas, United States of America
| | - Nedialko B. Dimitrov
- Department of Operations Research, The University of Texas at Austin, Austin, Texas, United States of America
| | - Lauren Ancel Meyers
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
|
34
|
Hegde A, Masthi R, Krishnappa D. Hyperlocal Postcode Based Crowdsourced Surveillance Systems in the COVID-19 Pandemic Response. Front Public Health 2020; 8:286. [PMID: 32582620 PMCID: PMC7296149 DOI: 10.3389/fpubh.2020.00286] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/19/2020] [Accepted: 06/01/2020] [Indexed: 11/18/2022] Open
Abstract
The SARS-CoV-2 pandemic has rapidly saturated healthcare resources across the globe and has led to a restricted screening process, hindering efforts at comprehensive case detection. This has not only facilitated community spread but has also resulted in an underestimation of the true incidence of disease, a statistic which is useful for policy making aimed at controlling the current pandemic and in preparing for future outbreaks. In this perspective, we present a crowdsourced platform developed by us for the true estimation of all SARS-CoV-2 infections in the community, through active self-reporting and layering other authentic datasets. The granularity of data captured by this system could prove to be useful in assisting governments to identify SARS-CoV-2 hotspots in the community facilitating lifting of restrictions in a controlled fashion.
Affiliation(s)
- Ajay Hegde
- Senior Clinical Fellow, Queen Elizabeth University Hospital, Glasgow, United Kingdom
| | - Ramesh Masthi
- Professor Head of Community Medicine and Public Health, Kempegowda Institute of Medical Sciences, Bangalore, India
| | - Darshan Krishnappa
- Department of Medicine, University of Minnesota Medical School, Minneapolis, MN, United States
|
35
|
Mavragani A. Infodemiology and Infoveillance: Scoping Review. J Med Internet Res 2020; 22:e16206. [PMID: 32310818 PMCID: PMC7189791 DOI: 10.2196/16206] [Citation(s) in RCA: 111] [Impact Index Per Article: 27.8] [Received: 09/10/2019] [Revised: 02/05/2020] [Accepted: 02/08/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Web-based sources are increasingly employed in the analysis, detection, and forecasting of diseases and epidemics, and in predicting human behavior toward several health topics. This use of the internet has come to be known as infodemiology, a concept introduced by Gunther Eysenbach. Infodemiology and infoveillance studies use web-based data and have become an integral part of health informatics research over the past decade. OBJECTIVE The aim of this paper is to provide a scoping review of the state-of-the-art in infodemiology along with the background and history of the concept, to identify sources and health categories and topics, to elaborate on the validity of the employed methods, and to discuss the gaps identified in current research. METHODS The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were followed to extract the publications that fall under the umbrella of infodemiology and infoveillance from the JMIR, PubMed, and Scopus databases. A total of 338 documents were extracted for assessment. RESULTS Of the 338 studies, the vast majority (n=282, 83.4%) were published with JMIR Publications. The Journal of Medical Internet Research features almost half of the publications (n=168, 49.7%), and JMIR Public Health and Surveillance has more than one-fifth of the examined studies (n=74, 21.9%). The interest in the subject has been increasing every year, with 2018 featuring more than one-fourth of the total publications (n=89, 26.3%), and the publications in 2017 and 2018 combined accounted for more than half (n=171, 50.6%) of the total number of publications in the last decade. The most popular source was Twitter with 45.0% (n=152), followed by Google with 24.6% (n=83), websites and platforms with 13.9% (n=47), blogs and forums with 10.1% (n=34), Facebook with 8.9% (n=30), and other search engines with 5.6% (n=19). 
As for the subjects examined, conditions and diseases with 17.2% (n=58) and epidemics and outbreaks with 15.7% (n=53) were the most popular categories identified in this review, followed by health care (n=39, 11.5%), drugs (n=40, 10.4%), and smoking and alcohol (n=29, 8.6%). CONCLUSIONS The field of infodemiology is becoming increasingly popular, employing innovative methods and approaches for health assessment. The use of web-based sources, which provide information that would not otherwise be accessible and tackle the issues arising from time-consuming traditional methods, shows that infodemiology plays an important role in health informatics research.
Affiliation(s)
- Amaryllis Mavragani
- Department of Computing Science and Mathematics, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
|
36
|
Leal Neto O, Cruz O, Albuquerque J, Nacarato de Sousa M, Smolinski M, Pessoa Cesse EÂ, Libel M, Vieira de Souza W. Participatory Surveillance Based on Crowdsourcing During the Rio 2016 Olympic Games Using the Guardians of Health Platform: Descriptive Study. JMIR Public Health Surveill 2020; 6:e16119. [PMID: 32254042 PMCID: PMC7175192 DOI: 10.2196/16119] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/03/2019] [Revised: 12/06/2019] [Accepted: 01/27/2020] [Indexed: 12/01/2022]
Abstract
Background With the evolution of digital media, areas such as public health are adding new platforms to complement traditional systems of epidemiological surveillance. Participatory surveillance and digital epidemiology have become innovative tools for the construction of epidemiological landscapes with citizens’ participation, improving traditional sources of information. Strategies such as these promote the timely detection of warning signs for outbreaks and epidemics in the region. Objective This study aims to describe the participatory surveillance platform Guardians of Health, which was used in a project conducted during the 2016 Olympic and Paralympic Games in Rio de Janeiro, Brazil, and officially used by the Brazilian Ministry of Health for the monitoring of outbreaks and epidemics. Methods This is a descriptive study carried out using secondary data from Guardians of Health available in a public digital repository. Based on syndromic signals, the information subsidy for decision making by policy makers and health managers becomes more dynamic and assertive. This type of information source can be used as an early route to understand the epidemiological scenario. Results The main result of this research was demonstrating the use of the participatory surveillance platform as an additional source of information for the epidemiological surveillance performed in Brazil during a mass gathering. The platform Guardians of Health had 7848 users who generated 12,746 reports about their health status. Among these reports, the following were identified: 161 users with diarrheal syndrome, 68 users with respiratory syndrome, and 145 users with rash syndrome. Conclusions It is hoped that epidemiological surveillance professionals, researchers, managers, and workers become aware of, and allow themselves to use, new tools that improve information management for decision making and knowledge production. 
This way, we may follow the path for a more intelligent, efficient, and pragmatic disease control system.
Affiliation(s)
- Onicio Leal Neto
- University of Zurich, Zurich, Switzerland; Epitrack, Recife, Brazil
- Oswaldo Cruz
- Scientific Computation Program, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Jones Albuquerque
- Epitrack, Recife, Brazil; Immunopathology Lab Keizo Asami, Recife, Brazil
- Marlo Libel
- Ending Pandemics, San Francisco, CA, United States
|
37
|
Bowen DA, Wang J, Holland K, Bartholow B, Sumner SA. Conversational topics of social media messages associated with state-level mental distress rates. J Ment Health 2020; 29:234-241. [DOI: 10.1080/09638237.2020.1739251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Daniel A. Bowen
- Division of Violence Prevention, National Center for Injury Prevention and Control, U.S. Centers for Disease Control and Prevention (CDC), Atlanta, GA, USA
- Jing Wang
- Division of Violence Prevention, National Center for Injury Prevention and Control, U.S. Centers for Disease Control and Prevention (CDC), Atlanta, GA, USA
- Kristin Holland
- Division of Violence Prevention, National Center for Injury Prevention and Control, U.S. Centers for Disease Control and Prevention (CDC), Atlanta, GA, USA
- Brad Bartholow
- Division of Violence Prevention, National Center for Injury Prevention and Control, U.S. Centers for Disease Control and Prevention (CDC), Atlanta, GA, USA
- Steven A. Sumner
- Office of Strategy and Innovation, National Center for Injury Prevention and Control, U.S. Centers for Disease Control and Prevention (CDC), Atlanta, GA, USA
|
38
|
Viboud C, Santillana M. Fitbit-informed influenza forecasts. Lancet Digit Health 2020; 2:e54-e55. [DOI: 10.1016/s2589-7500(19)30241-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/23/2019] [Accepted: 12/30/2019] [Indexed: 10/25/2022]
|
39
|
Darwish A, Rahhal Y, Jafar A. A comparative study on predicting influenza outbreaks using different feature spaces: application of influenza-like illness data from Early Warning Alert and Response System in Syria. BMC Res Notes 2020; 13:33. [PMID: 31948473 PMCID: PMC6964210 DOI: 10.1186/s13104-020-4889-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/17/2019] [Accepted: 01/03/2020] [Indexed: 11/10/2022]
Abstract
Objective Accurate forecasting of outbreaks of influenza-like illness (ILI) could help public health officials recommend public health actions earlier. We investigated the performance of three different feature spaces in different models to forecast the weekly ILI rate in Syria using EWARS data from the World Health Organization (WHO). The time series feature space was used first, with seven models applied to it: naïve, average, seasonal naïve, drift, dynamic harmonic regression (Dhr), seasonal and trend decomposition using loess (STL), and TBATS. The second feature space, which resembles some state-of-the-art approaches, we named the 53-weeks-before_52-first-order-difference feature space. The third, which we proposed, we named the n-years-before_m-weeks-around (YnWm) feature space. Machine learning (ML) and deep learning (DL) models were applied to the second and third feature spaces: generalized linear model (GLM), support vector regression (SVR), gradient boosting (GB), random forest (RF), and long short-term memory (LSTM). Results The four-layer LSTM model with the 1-year-before_4-weeks-around feature space gave more accurate results than the other models and reached the lowest MAPE of 3.52% and the lowest RMSE of 0.01662. We hope that this modelling methodology can be applied in other countries and thereby help prevent and control influenza worldwide.
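The abstract above reports accuracy as MAPE and RMSE. As a reading aid only, the following minimal Python sketch shows how these two error metrics are conventionally defined. The function names and the toy numbers are illustrative assumptions and are not taken from the cited study, whose exact evaluation code is not given here.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero (MAPE is undefined otherwise).
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the data."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(total / len(actual))


# Hypothetical observed and forecast values (made up for illustration).
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
print(f"MAPE = {mape(observed, predicted):.2f}%")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

MAPE is scale-free, which is why it can be compared across studies with different ILI rate units, whereas RMSE depends on the scale of the forecast series.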
Affiliation(s)
- Ali Darwish
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria
- Yasser Rahhal
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria
- Assef Jafar
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria
|
40
|
Samaras L, García-Barriocanal E, Sicilia MA. Syndromic surveillance using web data: a systematic review. Innovation in Health Informatics 2020. [PMCID: PMC7153324 DOI: 10.1016/b978-0-12-819043-2.00002-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
In recent years, there has been much debate about the evolution of Smart Healthcare systems, particularly about how these systems can help improve people's health by taking advantage of new Information and Communication Technologies (ICT) for early prediction and efficient treatment. The purpose of this study is to provide a systematic review of the current literature on information systems for syndromic surveillance using web data. The published items considered include articles, books, reviews, reports, conference announcements, and dissertations. We used a variation of the PRISMA Statement methodology to conduct the systematic review. The review identifies the relevant papers published from 2004 to 2018 and systematically explores them to extract similarities, gaps, and conclusions on the research done so far. The results presented concern the year, the examined disease, the web data source, the geographic location/country, and the data analysis method used. The results show that influenza is the most examined infectious disease. The internet tools most used are Twitter and Google. Regarding the geographical areas explored in the published papers, the most examined country is the United States, since many scientists come from this country. There has been significant growth in the number of articles since 2009. Various statistical methods are used to correlate the data retrieved from the internet with the data from national authorities. The common conclusion of the reviewed studies is that the Web can be a useful tool for detecting serious epidemics and for creating a syndromic surveillance system, since epidemics can be predicted from web data before they are officially detected in the population. 
With the advance of ICT, Smart Healthcare can benefit from the monitoring and early prediction of epidemics that such a system offers, improving national or international health strategies and policy decisions. This can be achieved through the provision of new technology tools to enhance health monitoring systems toward the new innovations of Smart Health or eHealth, even with the emerging technologies of the Internet of Things. The challenges and impacts of an electronic system based on internet data span the social, medical, and technological disciplines. These can be further extended to Smart Healthcare, as data streaming can provide real-time information, awareness of epidemics, and alerts for both patients and medical scientists. Finally, these new systems can help improve the standards of human life.
|
41
|
Lutz CS, Huynh MP, Schroeder M, Anyatonwu S, Dahlgren FS, Danyluk G, Fernandez D, Greene SK, Kipshidze N, Liu L, Mgbere O, McHugh LA, Myers JF, Siniscalchi A, Sullivan AD, West N, Johansson MA, Biggerstaff M. Applying infectious disease forecasting to public health: a path forward using influenza forecasting examples. BMC Public Health 2019; 19:1659. [PMID: 31823751 PMCID: PMC6902553 DOI: 10.1186/s12889-019-7966-8] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 04/26/2019] [Accepted: 11/19/2019] [Indexed: 01/23/2023]
Abstract
BACKGROUND Infectious disease forecasting aims to predict characteristics of both seasonal epidemics and future pandemics. Accurate and timely infectious disease forecasts could aid public health responses by informing key preparation and mitigation efforts. MAIN BODY For forecasts to be fully integrated into public health decision-making, federal, state, and local officials must understand how forecasts were made, how to interpret forecasts, and how well the forecasts have performed in the past. Since the 2013-14 influenza season, the Influenza Division at the Centers for Disease Control and Prevention (CDC) has hosted collaborative challenges to forecast the timing, intensity, and short-term trajectory of influenza-like illness in the United States. Additional efforts to advance forecasting science have included influenza initiatives focused on state-level and hospitalization forecasts, as well as other infectious diseases. Using CDC influenza forecasting challenges as an example, this paper provides an overview of infectious disease forecasting; applications of forecasting to public health; and current work to develop best practices for forecast methodology, applications, and communication. CONCLUSIONS These efforts, along with other infectious disease forecasting initiatives, can foster the continued advancement of forecasting science.
Affiliation(s)
- Chelsea S Lutz
- Influenza Division, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA
- Oak Ridge Institute for Science and Education, United States Department of Energy, Oak Ridge, TN, 37830, USA
- Department of International Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA
- Mimi P Huynh
- Infectious Disease Program, Council of State and Territorial Epidemiologists, Atlanta, GA, 30345, USA
- Monica Schroeder
- Infectious Disease Program, Council of State and Territorial Epidemiologists, Atlanta, GA, 30345, USA
- Sophia Anyatonwu
- PHI/CDC Global Health Fellowship Program, Public Health Institute, Oakland, CA, 94607, USA
- F Scott Dahlgren
- Influenza Division, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA
- Gregory Danyluk
- Florida Department of Health in Polk County, Bartow, FL, 33830, USA
- Danielle Fernandez
- Epidemiology, Disease Control, and Immunization Services, Florida Department of Health in Miami-Dade County, Miami, FL, 33126, USA
- Sharon K Greene
- Bureau of Communicable Disease, New York City Department of Health and Mental Hygiene, Queens, New York, NY, 11101, USA
- Leann Liu
- Office of Science, Surveillance, and Technology, Harris County Public Health, Houston, TX, 77027, USA
- Osaro Mgbere
- Disease Prevention and Control Division, Houston Health Department, Houston, TX, 77054, USA
- Lisa A McHugh
- Communicable Disease Service, New Jersey Department of Health, Trenton, NJ, 08608, USA
- Jennifer F Myers
- Infectious Diseases Branch, California Department of Public Health, Richmond, CA, 94804, USA
- Alan Siniscalchi
- Infectious Disease Section, Epidemiology & Emerging Infections Program, State of Connecticut Department of Health, Hartford, CT, 06134, USA
- Amy D Sullivan
- Division of Prevention and Community Health, Washington State Department of Health, Olympia, WA, 98504, USA
- Nicole West
- Acute and Communicable Disease Prevention, Oregon Health Authority, Portland, OR, 97232, USA
- Michael A Johansson
- Division of Vector-Borne Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, San Juan, PR, 00920, USA
- Matthew Biggerstaff
- Influenza Division, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA
|
42
|
Tideman S, Santillana M, Bickel J, Reis B. Internet search query data improve forecasts of daily emergency department volume. J Am Med Inform Assoc 2019; 26:1574-1583. [PMID: 31730701 PMCID: PMC7647136 DOI: 10.1093/jamia/ocz154] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/20/2019] [Revised: 07/25/2019] [Accepted: 08/06/2019] [Indexed: 11/15/2022]
Abstract
OBJECTIVE Emergency departments (EDs) are increasingly overcrowded. Forecasting patient visit volume is challenging. Reliable and accurate forecasting strategies may help improve resource allocation and mitigate the effects of overcrowding. Patterns related to weather, day of the week, season, and holidays have been previously used to forecast ED visits. Internet search activity has proven useful for predicting disease trends and offers a new opportunity to improve ED visit forecasting. This study tests whether Google search data and relevant statistical methods can improve the accuracy of ED volume forecasting compared with traditional data sources. MATERIALS AND METHODS Seven years of historical daily ED arrivals were collected from Boston Children's Hospital. We used data from the public school calendar, National Oceanic and Atmospheric Administration, and Google Trends. Multiple linear models using LASSO (least absolute shrinkage and selection operator) for variable selection were created. The models were trained on 5 years of data and out-of-sample accuracy was judged using multiple error metrics on the final 2 years. RESULTS All data sources added complementary predictive power. Our baseline day-of-the-week model recorded average percent errors of 10.99%. Autoregressive terms, calendar and weather data reduced errors to 7.71%. Search volume data reduced errors to 7.58% theoretically preventing 4 improperly staffed days. DISCUSSION The predictive power provided by the search volume data may stem from the ability to capture population-level interaction with events, such as winter storms and infectious diseases, that traditional data sources alone miss. CONCLUSIONS This study demonstrates that search volume data can meaningfully improve forecasting of ED visit volume and could help improve quality and reduce cost.
Affiliation(s)
- Sam Tideman
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Jonathan Bickel
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Ben Reis
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Predictive Medicine Group, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
|
43
|
Rangarajan P, Mody SK, Marathe M. Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data. PLoS Comput Biol 2019; 15:e1007518. [PMID: 31751346 PMCID: PMC6894887 DOI: 10.1371/journal.pcbi.1007518] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/07/2019] [Revised: 12/05/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022]
Abstract
Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world and it is estimated that more than half the world’s population is at risk for developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Data from multiple sources (such as dengue and ILI case counts, electronic health records and frequency of multiple internet search terms from Google Trends) can improve forecasts, but standard time series analysis methods are inadequate to estimate all the parameter values from the limited amount of data available when multiple sources are used. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines sparse representation of time series data, electronic health records data (for ILI) and Google Trends data to forecast dengue and ILI incidences. This sparse representation method uses an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states: Brazil, Mexico, Singapore, Taiwan, and Thailand and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time. Dengue and influenza-like illness (ILI) are leading causes of viral infection in the world and hence it is important to develop accurate methods for forecasting their incidence. 
We use the Autoregressive Likelihood Ratio method, which is a computationally efficient implementation of the variable selection method, in order to obtain a sparse (non-lasso) representation of time series, Google Trends and electronic health records (for ILI) data. This method is used to forecast dengue incidence in five countries/states and ILI incidence in the USA. We show that this method outperforms existing time series methods in forecasting these diseases. The method is general and can also be used to forecast other diseases.
Affiliation(s)
- Prashant Rangarajan
- Departments of Computer Science and Mathematics, Birla Institute of Technology and Science, Pilani, India
- Sandeep K. Mody
- Department of Mathematics, Indian Institute of Science, Bangalore, India
- Madhav Marathe
- Department of Computer Science, Network, Simulation Science and Advanced Computing Division, Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, United States of America
|
44
|
Estimating influenza incidence using search query deceptiveness and generalized ridge regression. PLoS Comput Biol 2019; 15:e1007165. [PMID: 31574086 PMCID: PMC6771994 DOI: 10.1371/journal.pcbi.1007165] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/23/2018] [Accepted: 05/31/2019] [Indexed: 11/22/2022]
Abstract
Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically-selected features perform as well or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates. While often considered a minor infection, seasonal flu kills many thousands of people every year and sickens millions more. The more accurate and up-to-date public health officials’ view of what the seasonal outbreak is, the more effectively the outbreak can be addressed. 
Currently, this knowledge is based on collating information on patients who enter the health care system. This approach is accurate, but it’s also expensive and slow. Researchers hope that new approaches based on examining what people do and share on the internet may work more cheaply and quickly. Some internet activity, however, has a history of correspondence with disease activity, but this relationship is coincidental rather than informative. For example, some prior work has found a correspondence between zombie-related social media messages and the flu season, so one could plausibly build accurate flu estimates using such messages that are then fooled by the appearance of a new zombie movie. We tested flu estimation models that incorporate information about this risk of deception, finding that knowledge of deceptiveness does indeed produce more accurate estimates; we also identified a method to estimate deceptiveness. Our results suggest that estimation models used in practice should use information about both how inputs map to disease activity and how likely each input is to be deceptive. This may get us one step closer to accurate, reliable disease estimates based on internet data, which would improve public health by making those estimates faster and cheaper.
|
45
|
Baltrusaitis K, Vespignani A, Rosenfeld R, Gray J, Raymond D, Santillana M. Differences in Regional Patterns of Influenza Activity Across Surveillance Systems in the United States: Comparative Evaluation. JMIR Public Health Surveill 2019; 5:e13403. [PMID: 31579019 PMCID: PMC6777281 DOI: 10.2196/13403] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/16/2019] [Revised: 07/02/2019] [Accepted: 07/19/2019] [Indexed: 01/30/2023]
Abstract
BACKGROUND The Centers for Disease Control and Prevention (CDC) tracks influenza-like illness (ILI) using information on patient visits to health care providers through the Outpatient Influenza-like Illness Surveillance Network (ILINet). As participation in this system is voluntary, the composition, coverage, and consistency of health care reports vary from state to state, leading to different measures of ILI activity between regions. The degree to which these measures reflect actual differences in influenza activity or systematic differences in the methods used to collect and aggregate the data is unclear. OBJECTIVE The objective of our study was to qualitatively and quantitatively compare national and region-specific ILI activity in the United States across 4 surveillance data sources, namely CDC ILINet, Flu Near You (FNY), athenahealth, and HealthTweets.org, to determine whether these data sources, commonly used as input in influenza modeling efforts, show geographical patterns that are similar to those observed in CDC ILINet's data. We also compared the yearly percentage of FNY participants who sought health care for ILI symptoms across geographical areas. METHODS We compared the national and regional 2018-2019 ILI activity baselines, calculated using noninfluenza weeks from previous years, for each surveillance data source. We also compared measures of ILI activity across geographical areas during 3 influenza seasons, 2015-2016, 2016-2017, and 2017-2018. Geographical differences in weekly ILI activity within each data source were also assessed using relative mean differences and time series heatmaps. National and regional age-adjusted health care-seeking percentages were calculated for each influenza season by dividing the number of FNY participants who sought medical care for ILI symptoms by the total number of ILI reports within an influenza season. 
Pearson correlations were used to assess the association between the health care-seeking percentages and baselines for each surveillance data source. RESULTS We observed consistent differences in ILI activity across geographical areas for CDC ILINet and athenahealth data. ILI activity for FNY displayed little variation across geographical areas, whereas differences in ILI activity for HealthTweets.org were associated with the total number of tweets within a geographical area. The percentage of FNY participants who sought health care for ILI symptoms differed slightly across geographical areas, and these percentages were positively correlated with CDC ILINet and athenahealth baselines. CONCLUSIONS Our findings suggest that differences in ILI activity across geographical areas as reported by a given surveillance system may not accurately reflect true differences in the prevalence of ILI. Instead, these differences may reflect systematic collection and aggregation biases that are particular to each system and consistent across influenza seasons. These findings are potentially relevant in the real-time analysis of the influenza season and in the definition of unbiased forecast models.
Affiliation(s)
- Kristin Baltrusaitis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
- Roni Rosenfeld
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States
- Josh Gray
- athenaResearch at athenahealth, Watertown, MA, United States
- Dorrie Raymond
- athenaResearch at athenahealth, Watertown, MA, United States
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
|
46
|
Su K, Xu L, Li G, Ruan X, Li X, Deng P, Li X, Li Q, Chen X, Xiong Y, Lu S, Qi L, Shen C, Tang W, Rong R, Hong B, Ning Y, Long D, Xu J, Shi X, Yang Z, Zhang Q, Zhuang Z, Zhang L, Xiao J, Li Y. Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing, China. EBioMedicine 2019; 47:284-292. [PMID: 31477561 PMCID: PMC6796527 DOI: 10.1016/j.ebiom.2019.08.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/24/2019] [Revised: 08/09/2019] [Accepted: 08/09/2019] [Indexed: 02/05/2023]
Abstract
Background Early detection of influenza activity followed by timely response is a critical component of preparedness for seasonal influenza epidemic and influenza pandemic. However, most relevant studies were conducted at the regional or national level with regular seasonal influenza trends. There are few feasible strategies to forecast influenza activity at the local level with irregular trends. Methods Multi-source electronic data, including historical percentage of influenza-like illness (ILI%), weather data, Baidu search index and Sina Weibo data of Chongqing, China, were collected and integrated into an innovative Self-adaptive AI Model (SAAIM), which was constructed by integrating Seasonal Autoregressive Integrated Moving Average model and XGBoost model using a self-adaptive weight adjustment mechanism. SAAIM was applied to ILI% forecast in Chongqing from 2017 to 2018, of which the performance was compared with three previously available models on forecasting. Findings ILI% showed an irregular seasonal trend from 2012 to 2018 in Chongqing. Compared with three reference models, SAAIM achieved the best performance on forecasting ILI% of Chongqing with the mean absolute percentage error (MAPE) of 11·9%, 7·5%, and 11·9% during the periods of the year 2014–2016, 2017, and 2018 respectively. Among the three categories of source data, historical influenza activity contributed the most to the forecast accuracy by decreasing the MAPE by 19·6%, 43·1%, and 11·1%, followed by weather information (MAPE reduced by 3·3%, 17·1%, and 2·2%), and Internet-related public sentiment data (MAPE reduced by 1·1%, 0·9%, and 1·3%). Interpretation Accurate influenza forecast in areas with irregular seasonal influenza trends can be made by SAAIM with multi-source electronic data.
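The abstract above reports accuracy as MAPE and describes combining two models through a self-adaptive weight adjustment, without giving the mechanism. As a minimal sketch only, the code below computes MAPE and blends two forecasts with inverse-error weights; the inverse-MAPE rule, the `window` parameter, the function names, and all numbers are illustrative assumptions, not the SAAIM method itself.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def adaptive_weights(actual, preds_a, preds_b, window=4):
    """Weight two models by the inverse of their MAPE over the last `window` points.

    Assumption: inverse-error weighting stands in for the paper's unspecified
    self-adaptive mechanism.
    """
    err_a = mape(actual[-window:], preds_a[-window:])
    err_b = mape(actual[-window:], preds_b[-window:])
    eps = 1e-9  # avoids division by zero when a model is exactly right
    inv_a, inv_b = 1.0 / (err_a + eps), 1.0 / (err_b + eps)
    total = inv_a + inv_b
    return inv_a / total, inv_b / total


def combine(next_a, next_b, weights):
    """Blend the two models' next-step forecasts with the given weights."""
    w_a, w_b = weights
    return w_a * next_a + w_b * next_b


# Hypothetical ILI% values, invented for illustration.
observed = [2.0, 2.5, 4.0, 5.0]
model_a = [2.2, 2.75, 4.4, 5.5]   # each point 10% too high
model_b = [2.6, 3.25, 5.2, 6.5]   # each point 30% too high

weights = adaptive_weights(observed, model_a, model_b)
forecast = combine(5.8, 6.4, weights)
```

With these invented inputs the more accurate model receives roughly three times the weight of the other, so the blended forecast sits closer to its prediction.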
Affiliation(s)
- Kun Su
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China; Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Liang Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Guanqiao Li
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Xiaowen Ruan
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xian Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Pan Deng
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xinmi Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Qin Li
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Xianxian Chen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Yu Xiong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Shaofeng Lu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Li Qi
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Chaobo Shen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Wenge Tang
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Rong Rong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Boran Hong
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Yi Ning
- Meinian Institute of Health, Beijing, People's Republic of China
- Dongyan Long
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Jiaying Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xuanling Shi
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Zhihong Yang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Qi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Ziqi Zhuang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Linqi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China.
- Jing Xiao
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China.
- Yafei Li
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China.
|
47
|
Soliman M, Lyubchich V, Gel YR. Complementing the power of deep learning with statistical model fusion: Probabilistic forecasting of influenza in Dallas County, Texas, USA. Epidemics 2019; 28:100345. [PMID: 31182294 DOI: 10.1016/j.epidem.2019.05.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/06/2018] [Revised: 03/08/2019] [Accepted: 05/06/2019] [Indexed: 02/06/2023]
Abstract
Influenza is one of the main causes of death, not only in the USA but worldwide. Its significant economic and public health impacts necessitate development of accurate and efficient algorithms for forecasting of any upcoming influenza outbreaks. Most currently available methods for influenza prediction are based on parametric time series and regression models that impose restrictive and often unverifiable assumptions on the data. In turn, more flexible machine learning models and, particularly, deep learning tools whose utility is proven in a wide range of disciplines, remain largely under-explored in epidemiological forecasting. We study the seasonal influenza in Dallas County by evaluating the forecasting ability of deep learning with feedforward neural networks as well as performance of more conventional statistical models, such as beta regression, autoregressive integrated moving average (ARIMA), least absolute shrinkage and selection operators (LASSO), and non-parametric multivariate adaptive regression splines (MARS) models for one week and two weeks ahead forecasting. Furthermore, we assess forecasting utility of Google search queries and meteorological data as exogenous predictors of influenza activity. Finally, we develop a probabilistic forecasting of influenza in Dallas County by fusing all the considered models using Bayesian model averaging.
Affiliation(s)
- Marwah Soliman
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Vyacheslav Lyubchich
- Chesapeake Biological Laboratory, University of Maryland Center for Environmental Science, Solomons, MD, USA.
- Yulia R Gel
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
|
48
|
Mavragani A, Ochoa G. Google Trends in Infodemiology and Infoveillance: Methodology Framework. JMIR Public Health Surveill 2019; 5:e13439. [PMID: 31144671 PMCID: PMC6660120 DOI: 10.2196/13439] [Citation(s) in RCA: 220] [Impact Index Per Article: 44.0] [Received: 01/18/2019] [Revised: 02/17/2019] [Accepted: 03/23/2019] [Indexed: 02/06/2023]
Abstract
Internet data are being increasingly integrated into health informatics research and are becoming a useful tool for exploring human behavior. The most popular tool for examining online behavior is Google Trends, an open tool that provides information on trends and the variations of online interest in selected keywords and topics over time. Online search traffic data from Google have been shown to be useful in analyzing human behavior toward health topics and in predicting disease occurrence and outbreaks. Despite the large number of Google Trends studies during the last decade, the literature on the subject lacks a specific methodology framework. This article aims at providing an overview of the tool and data and at presenting the first methodology framework in using Google Trends in infodemiology and infoveillance, including the main factors that need to be taken into account for a strong methodology base. We provide a step-by-step guide for the methodology that needs to be followed when using Google Trends and the essential aspects required for valid results in this line of research. At first, an overview of the tool and the data are presented, followed by an analysis of the key methodological points for ensuring the validity of the results, which include selecting the appropriate keyword(s), region(s), period, and category. Overall, this article presents and analyzes the key points that need to be considered to achieve a strong methodological basis for using Google Trends data, which is crucial for ensuring the value and validity of the results, as the analysis of online queries is extensively integrated in health research in the big data era.
Affiliation(s)
- Amaryllis Mavragani
- Department of Computing Science and Mathematics, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
- Gabriela Ochoa
- Department of Computing Science and Mathematics, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
|
49
|
Clemente L, Lu F, Santillana M. Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries. JMIR Public Health Surveill 2019; 5:e12214. [PMID: 30946017 PMCID: PMC6470460 DOI: 10.2196/12214] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/14/2018] [Revised: 02/11/2019] [Accepted: 02/15/2019] [Indexed: 01/18/2023]
Abstract
Background Novel influenza surveillance systems that leverage Internet-based real-time data sources including Internet search frequencies, social-network information, and crowd-sourced flu surveillance tools have shown improved accuracy over the past few years in data-rich countries like the United States. These systems not only track flu activity accurately, but they also report flu estimates a week or more ahead of the publication of reports produced by healthcare-based systems, such as those implemented and managed by the Centers for Disease Control and Prevention. Previous work has shown that novel flu surveillance systems, like Google Flu Trends (GFT), have not yet delivered acceptable flu estimates in developing countries in Latin America. Objective The aim of this study was to show that recent methodological improvements on the use of Internet search engine information to track diseases can lead to improved retrospective flu estimates in multiple countries in Latin America. Methods A machine learning-based methodology that uses flu-related Internet search activity and historical information to monitor flu activity, named ARGO (AutoRegression with Google search), was extended to generate flu predictions for 8 Latin American countries (Argentina, Bolivia, Brazil, Chile, Mexico, Paraguay, Peru, and Uruguay) for the period from January 2012 to December 2016. These retrospective (out-of-sample) influenza activity predictions were compared with historically observed suspected flu cases in each country, as reported by FluNet, an influenza surveillance database maintained by the World Health Organization. For a baseline comparison, retrospective (out-of-sample) flu estimates were produced for the same time period using autoregressive models that only leverage historical flu activity information. Results Our results show that ARGO-like models outperform autoregressive models in predictive power in 6 out of 8 countries in the 2012-2016 time period. Moreover, ARGO significantly improves on historical flu estimates produced by the now discontinued GFT for the time period of 2012-2015, where GFT information is publicly available. Conclusions We demonstrate here that a self-correcting machine learning method, leveraging Internet-based disease-related search activity and historical flu trends, has the potential to produce reliable and timely flu estimates in multiple Latin American countries. This methodology may prove helpful to local public health officials who design and implement interventions aimed at mitigating the effects of influenza outbreaks. Our methodology generally outperforms both the now-discontinued tool GFT, and autoregressive methodologies that exploit only historical flu activity to produce future disease estimates.
Affiliation(s)
- Leonardo Clemente
- School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Fred Lu
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
|
50
|
Talaei-Khoei A, Wilson JM, Kazemi SF. Period of Measurement in Time-Series Predictions of Disease Counts from 2007 to 2017 in Northern Nevada: Analytics Experiment. JMIR Public Health Surveill 2019; 5:e11357. [PMID: 30664479 PMCID: PMC6350093 DOI: 10.2196/11357] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/20/2018] [Revised: 10/23/2018] [Accepted: 10/30/2018] [Indexed: 12/28/2022]
Abstract
BACKGROUND The literature in statistics presents methods by which autocorrelation can identify the best period of measurement to improve the performance of a time-series prediction. The period of measurement plays an important role in improving the performance of disease-count predictions. However, from the operational perspective in public health surveillance, there is a limitation to the length of the measurement period that can offer meaningful and valuable predictions. OBJECTIVE This study aimed to establish a method that identifies the shortest period of measurement without significantly decreasing the prediction performance for time-series analysis of disease counts. METHODS The data used in this evaluation include disease counts from 2007 to 2017 in northern Nevada. The disease counts for chlamydia, salmonella, respiratory syncytial virus, gonorrhea, viral meningitis, and influenza A were predicted. RESULTS Our results showed that autocorrelation could not guarantee the best performance for prediction of disease counts. However, the proposed method with the change-point analysis suggests a period of measurement that is operationally acceptable and performance that is not significantly different from the best prediction. CONCLUSIONS The use of change-point analysis with autocorrelation provides the best and most practical period of measurement.
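The autocorrelation-based choice of measurement period mentioned in this abstract can be illustrated with the standard sample autocorrelation function. The sketch below assumes that the candidate period is the lag with the highest autocorrelation; the function names and the count series are hypothetical, and the study's change-point step is not reproduced here.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def best_lag(series, max_lag):
    """Return the lag in 1..max_lag with the highest sample autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))


# Hypothetical disease counts that repeat every four observations.
counts = [1, 5, 9, 5] * 6
period = best_lag(counts, 6)
```

For this invented series the strongest autocorrelation falls at lag 4, matching the period used to build it. Real surveillance counts are noisier, which is consistent with the abstract's finding that autocorrelation alone does not guarantee the best prediction performance.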
Affiliation(s)
- Amir Talaei-Khoei
- Department of Information Systems, University of Nevada Reno, Reno, NV, United States; School of Software, University of Technology Sydney, Sydney, Australia
- James M Wilson
- Nevada Medical Intelligence Center, School of Community Health Sciences and Department of Pediatrics, University of Nevada Reno, Reno, NV, United States
- Seyed-Farzan Kazemi
- Center for Research and Education in Advanced Transportation Engineering Systems, Rowan University, Glassboro, NJ, United States
|