1
Ab Rashid MA, Ahmad Zaki R, Wan Mahiyuddin WR, Yahya A. Forecasting New Tuberculosis Cases in Malaysia: A Time-Series Study Using the Autoregressive Integrated Moving Average (ARIMA) Model. Cureus 2023; 15:e44676. [PMID: 37809275 PMCID: PMC10552684 DOI: 10.7759/cureus.44676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/04/2023] [Indexed: 10/10/2023]
Abstract
Background: The Box-Jenkins autoregressive integrated moving average (ARIMA) model has been widely employed in predicting cases of infectious diseases. It has shown a positive impact on public health early warning surveillance because it produces reliable forecasts. This study aimed to develop a prediction model for new tuberculosis (TB) cases using time-series data from January 2013 to December 2018 in Malaysia and to forecast monthly new TB cases for 2019. Materials and methods: The ARIMA model was fitted to data gathered between January 2013 and December 2018 in Malaysia. Subsequently, the well-fitted model was employed to make projections for new TB cases in the year 2019. To assess the performance of the model, two key metrics were utilized: the mean absolute percentage error (MAPE) and stationary R-squared. Furthermore, the adequacy of the model was validated via the Ljung-Box test. Results: The ARIMA (2,1,1)(0,1,0)₁₂ model proved to be the most suitable choice, exhibiting the lowest MAPE value of 6.762. The new TB cases showed a clear seasonality with two peaks occurring in March and December. The proportion of variance explained by the model was 55.8% with a p-value (Ljung-Box test) of 0.356. Conclusions: Applying the ARIMA model yielded a simple, precise, and low-cost forecasting model that provides a warning six months in advance for monitoring the TB epidemic in Malaysia, which exhibits a seasonal pattern.
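For reference, the MAPE reported above averages the absolute forecast errors as a percentage of the observed values. The sketch below is a minimal illustration in Python, not the authors' code; the function name `mape` and the monthly counts are hypothetical and are not data from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up monthly case counts, purely illustrative (assumption).
observed = [2000, 2100, 1900, 2200]
predicted = [1900, 2205, 1995, 2090]

print(f"MAPE = {mape(observed, predicted):.3f}%")
```

Because each error is divided by the observed count, MAPE is scale-free, which is one reason it is often used to compare candidate ARIMA specifications fitted to the same series.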
Affiliation(s)
- Mohd Ariff Ab Rashid
  - Department of Social and Preventive Medicine, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, MYS
- Rafdzah Ahmad Zaki
  - Department of Social and Preventive Medicine, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, MYS
- Abqariyah Yahya
  - Department of Social and Preventive Medicine, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, MYS
2
Wang Y, Zhou H, Zheng L, Li M, Hu B. Using the Baidu index to predict trends in the incidence of tuberculosis in Jiangsu Province, China. Front Public Health 2023; 11:1203628. [PMID: 37533520 PMCID: PMC10390734 DOI: 10.3389/fpubh.2023.1203628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 07/05/2023] [Indexed: 08/04/2023]
Abstract
Objective: To analyze the time-series correlation between search terms related to tuberculosis (TB) and actual incidence data in China, to screen out the "leading" terms, and to construct a timely and efficient TB prediction model that can predict the next wave of the TB epidemic trend in advance. Methods: Monthly incidence data of tuberculosis in Jiangsu Province, China, were collected from January 2011 to December 2020. A scoping approach was used to identify TB search terms around common TB terms, prevention, symptoms and treatment. Search terms for Jiangsu Province, China, from January 2011 to December 2020 were collected from the Baidu index database. Correlation coefficients between search terms and actual incidence were calculated using Python 3.6. The multiple linear regression model was constructed using SPSS 26.0, which was also used to calculate the goodness of fit and prediction error of the model predictions. Results: A total of 16 keywords with correlation coefficients greater than 0.6 were screened, of which 11 were the leading terms. The R² of the prediction model was 0.67 and the MAPE was 10.23%. Conclusion: The TB prediction model based on Baidu Index data was able to predict the next wave of TB epidemic trends and intensity 2 months in advance. This forecasting model is currently only available for Jiangsu Province.
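One way to read "leading" terms is that search volume in month t correlates best with incidence in a later month t + k. The sketch below illustrates that idea with a lagged Pearson correlation; it is an assumption about the general technique, not the authors' procedure, and the helper names (`pearson`, `lagged_correlation`, `best_lead`) and both series are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def lagged_correlation(search, incidence, lag):
    """Correlate search volume at month t with incidence at month t + lag."""
    if lag < 0 or lag >= len(search):
        raise ValueError("lag must be between 0 and len(search) - 1")
    n = len(search) - lag
    return pearson(search[:n], incidence[lag:lag + n])


def best_lead(search, incidence, max_lag):
    """Lag (in months) at which the search term correlates best."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(search, incidence, k))


# Made-up series in which incidence follows search volume by two months.
search = [1, 3, 2, 5, 4, 6, 3, 7]
incidence = [25, 15, 10, 30, 20, 50, 40, 60]

print(best_lead(search, incidence, max_lag=3))
```

A term whose best lag is greater than zero moves ahead of reported incidence, which is what makes it usable as an early predictor.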
3
Mavragani A, Fragkozidis G, Zarkogianni K, Nikita KS. Long Short-term Memory-Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation. J Med Internet Res 2023; 25:e42519. [PMID: 36745490 PMCID: PMC9941907 DOI: 10.2196/42519] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/06/2022] [Revised: 11/29/2022] [Accepted: 11/30/2022] [Indexed: 12/05/2022]
Abstract
BACKGROUND: The potential to harness the plurality of available data in real time along with advanced data analytics for the accurate prediction of influenza-like illness (ILI) outbreaks has gained significant scientific interest. Different methodologies based on the use of machine learning techniques and traditional and alternative data sources, such as ILI surveillance reports, weather reports, search engine queries, and social media, have been explored with the ultimate goal of being used in the development of electronic surveillance systems that could complement existing monitoring resources. OBJECTIVE: The scope of this study was to investigate for the first time the combined use of ILI surveillance data, weather data, and Twitter data along with deep learning techniques toward the development of prediction models able to nowcast and forecast weekly ILI cases. By assessing the predictive power of both traditional and alternative data sources on the use case of ILI, this study aimed to provide a novel approach for corroborating evidence and enhancing accuracy and reliability in the surveillance of infectious diseases. METHODS: The model's input space consisted of information related to weekly ILI surveillance, web-based social (eg, Twitter) behavior, and weather conditions. For the design and development of the model, relevant data corresponding to the period of 2010 to 2019 and focusing on the Greek population and weather were collected. Long short-term memory (LSTM) neural networks were leveraged to efficiently handle the sequential and nonlinear nature of the multitude of collected data. The 3 data categories were first used separately for training 3 LSTM-based primary models. Subsequently, different transfer learning (TL) approaches were explored with the aim of creating various feature spaces combining the features extracted from the corresponding primary models' LSTM layers for the latter to feed a dense layer.
RESULTS: The primary model that learned from weather data yielded better forecast accuracy (root mean square error [RMSE]=0.144; Pearson correlation coefficient [PCC]=0.801) than the model trained with ILI historical data (RMSE=0.159; PCC=0.794). The best performance was achieved by the TL-based model leveraging the combination of the 3 data categories (RMSE=0.128; PCC=0.822). CONCLUSIONS: The superiority of the TL-based model, which considers Twitter data, weather data, and ILI surveillance data, reflects the potential of alternative public sources to enhance accurate and reliable prediction of ILI spread. Despite its focus on the use case of Greece, the proposed approach can be generalized to other locations, populations, and social media platforms to support the surveillance of infectious diseases with the ultimate goal of reinforcing preparedness for future epidemics.
Affiliation(s)
- Georgios Fragkozidis
  - School of Electrical and Computer Engineering, National Technical University of Athens, Zografos, Athens, Greece
- Konstantia Zarkogianni
  - School of Electrical and Computer Engineering, National Technical University of Athens, Zografos, Athens, Greece
- Konstantina S Nikita
  - School of Electrical and Computer Engineering, National Technical University of Athens, Zografos, Athens, Greece
4
Santangelo OE, Gianfredi V, Provenzano S. Wikipedia searches and the epidemiology of infectious diseases: A systematic review. Data Knowl Eng 2022. [DOI: 10.1016/j.datak.2022.102093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
5
Abdulkhaleq MT, Rashid TA, Alsadoon A, Hassan BA, Mohammadi M, Abdullah JM, Chhabra A, Ali SL, Othman RN, Hasan HA, Azad S, Mahmood NA, Abdalrahman SS, Rasul HO, Bacanin N, Vimal S. Harmony search: Current studies and uses on healthcare systems. Artif Intell Med 2022; 131:102348. [DOI: 10.1016/j.artmed.2022.102348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/03/2022] [Revised: 05/08/2022] [Accepted: 06/30/2022] [Indexed: 11/29/2022]
6
Beesley LJ, Osthus D, Del Valle SY. Addressing delayed case reporting in infectious disease forecast modeling. PLoS Comput Biol 2022; 18:e1010115. [PMID: 35658007 PMCID: PMC9200328 DOI: 10.1371/journal.pcbi.1010115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Revised: 06/15/2022] [Accepted: 04/18/2022] [Indexed: 11/18/2022]
Abstract
Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden. In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and to reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to address reporting delay. Some of these methods required knowledge about the reporting error or high quality external data, which may not always be available. Provided alternatives include excluding recently-reported data and performing sensitivity analysis. This work provides intuition and guidance for handling delay in disease case reporting and may serve as a useful resource to inform practical infectious disease forecasting efforts. The public health community and policymakers are interested in using models to predict future disease rates using information about disease rates in the past. However, our data about the recent past are less reliable than older data, due to a time lag between someone getting sick and their subsequent diagnosis being officially reported. 
In this paper, we describe strategies to correct reported disease rates from the recent past to account for disease diagnoses that haven’t yet been reported. Using more accurate information about the recent past, we can do a better job predicting what will happen in the future.
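As a rough illustration of correcting recent counts for reporting delay, the sketch below estimates, from historical data, the fraction of the final count that is typically known at each delay and scales the latest counts up accordingly. This simple multiplicative correction is an assumption chosen for illustration and is not presented as the authors' framework; the function names and all numbers are hypothetical.

```python
def reporting_fractions(history):
    """Average fraction of the final count already reported at each delay.

    history is a list of (counts_by_delay, final_count) pairs, where
    counts_by_delay[d] is the count known d periods after the period ended.
    """
    n_delays = min(len(counts) for counts, _ in history)
    return [
        sum(counts[d] / final for counts, final in history) / len(history)
        for d in range(n_delays)
    ]


def correct_recent(recent_counts, fractions):
    """Scale up recent counts, given oldest first.

    The last element has delay 0, the one before it delay 1, and so on.
    Periods older than the longest tracked delay are treated as complete.
    """
    last = len(recent_counts) - 1
    corrected = []
    for i, count in enumerate(recent_counts):
        delay = last - i
        fraction = fractions[delay] if delay < len(fractions) else 1.0
        corrected.append(count / fraction)
    return corrected


# Made-up history: half of cases are known immediately, 80% after one period.
history = [([50, 80, 100], 100), ([100, 160, 200], 200)]
fractions = reporting_fractions(history)

# Made-up recent counts as currently reported, oldest first.
print(correct_recent([120, 96, 60], fractions))
```

The corrected series can then be fed to a forecasting model in place of the raw, under-reported recent counts.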
Affiliation(s)
- Lauren J. Beesley
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Dave Osthus
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Sara Y. Del Valle
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
7
Said Abasse K, Toulouse Fournier A, Paquet C, Côté A, Smith PY, Bergeron F, Archambault P. Collaborative Writing Applications in Support of Knowledge Translation and Management during Pandemics: A Scoping Review. Int J Med Inform 2022; 165:104814. [DOI: 10.1016/j.ijmedinf.2022.104814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2022] [Revised: 04/17/2022] [Accepted: 06/05/2022] [Indexed: 11/28/2022]
8
AlRyalat SA, Al Oweidat K, Al-Essa M, Ashouri K, El Khatib O, Al-Rawashdeh A, Yaseen A, Toumar A, Alrwashdeh A. Influenza Altmetric Attention Score and its association with the influenza season in the USA. F1000Res 2022; 9:96. [PMID: 35465063 PMCID: PMC9021684 DOI: 10.12688/f1000research.22127.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/04/2022] [Indexed: 11/20/2022]
Abstract
Background: Altmetrics measure the impact of journal articles by tracking social media, Wikipedia, public policy documents, blogs, and mainstream news activity, after which an overall Altmetric attention score (AAS) is calculated for every journal article. In this study, we aim to assess the AAS for influenza-related articles and its relation to the influenza season in the USA. Methods: This study used the openly available Altmetric data from Altmetric.com. First, we retrieved all influenza-related articles using an advanced PubMed search query, then we entered the resulting query into Altmetric Explorer. We then calculated the average AAS for each month during the years 2012-2018. Results: A total of 24,964 PubMed documents were extracted; among them, 12,395 documents had received attention at least once. We found a significant difference in mean AAS between February and each of January and March (p < 0.001, mean difference of 117.4 and 460.7, respectively). We found a significant difference between June and each of May and July (p < 0.001, mean difference of 1221.4 and 162.7, respectively). We also found a significant difference between October and each of September and November (p < 0.001, mean difference of 88.8 and 154.8, respectively). Conclusion: We observed a seasonal trend in the attention toward influenza-related research, with three annual peaks that correlated with the beginning, peak, and end of influenza seasons in the USA, according to Centers for Disease Control and Prevention (CDC) data.
9
Query-based-learning mortality-related decoders for the developed island economy. Sci Rep 2022; 12:956. [PMID: 35046447 PMCID: PMC8770507 DOI: 10.1038/s41598-022-04855-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2021] [Accepted: 12/30/2021] [Indexed: 11/09/2022]
Abstract
Search volumes from Google Trends over clearly defined temporal and spatial scales have been reported to be beneficial in predicting influenza or other disease outbreaks. Recent studies showed that the Wiener model offers interpretability, ease of implementation, and adaptation to nonlinear fluctuation in real-time decoding. Previous work reported that Google Trends effectively predicts death-related trends for a continental economy, yet whether this applies to an island economy is unclear. To this end, a mortality-related model framework for a developed island economy, Taiwan, was built based on potential causes of death from Google Trends, aiming to provide new insights into death-related online search behavior at a population level. Our results showed that trends estimated with the Wiener model correlated significantly with actual trends and outperformed those estimated with multiple linear regression and seasonal autoregressive integrated moving average models. In addition to a model involving all possible features, two feature-selection strategies were proposed to optimize pre-trained models, based on either the weights or the waveform periodicity of features, resulting in estimated death-related dynamics along with spectra of risk factors. In general, high-weight features were beneficial to both "die" and "death", whereas features with clear periodic patterns contributed more to "death". Of note, normalization before modeling improved decoding performance.
10
He Y, Zhao Y, Chen Y, Yuan H, Tsui K. Nowcasting influenza‐like illness (ILI) via a deep learning approach using google search data: An empirical study on Taiwan ILI. Int J Intell Syst 2021. [DOI: 10.1002/int.22788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Yuxin He
  - College of Urban Transportation and Logistics, Shenzhen Technology University, Shenzhen, China
- Yang Zhao
  - School of Public Health (Shenzhen), Sun Yat‐Sen University, Guangzhou, China
- Yupeng Chen
  - Trial Retail Engineering (T. R. E. China), Yantai, China
- Hsiang‐Yu Yuan
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Kwok‐Leung Tsui
  - Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
11
Bannister A, Botta F. Rapid indicators of deprivation using grocery shopping data. R Soc Open Sci 2021; 8:211069. [PMID: 34950487 PMCID: PMC8692957 DOI: 10.1098/rsos.211069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Accepted: 11/29/2021] [Indexed: 06/14/2023]
Abstract
Measuring socio-economic indicators is a crucial task for policy makers who need to develop and implement policies aimed at reducing inequalities and improving the quality of life. However, traditionally this is a time-consuming and expensive task, which therefore cannot be carried out with high temporal frequency. Here, we investigate whether secondary data generated from our grocery shopping habits can be used to generate rapid estimates of deprivation in the city of London in the UK. We show the existence of a relationship between our grocery shopping data and the deprivation of different areas in London, and how we can use grocery shopping data to generate quick estimates of deprivation, albeit with some limitations. Crucially, our estimates can be generated very rapidly with the data used in our analysis, thus opening up the opportunity of having early access to estimates of deprivation. Our findings provide further evidence that new data streams contain accurate information about our collective behaviour and the current state of our society.
Affiliation(s)
- Adam Bannister
  - Department of Computer Science, University of Exeter, Exeter, UK
- Federico Botta
  - Department of Computer Science, University of Exeter, Exeter, UK
  - The Alan Turing Institute, British Library, London, UK
12
Marmara V, Marmara D, McMenemy P, Kleczkowski A. Cross-sectional telephone surveys as a tool to study epidemiological factors and monitor seasonal influenza activity in Malta. BMC Public Health 2021; 21:1828. [PMID: 34627201 PMCID: PMC8502089 DOI: 10.1186/s12889-021-11862-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2020] [Accepted: 09/27/2021] [Indexed: 11/29/2022]
Abstract
Background: Seasonal influenza has major implications for healthcare services as outbreaks often lead to high activity levels in health systems. Being able to predict when such outbreaks occur is vital. Mathematical models have extensively been used to predict epidemics of infectious diseases such as seasonal influenza and to assess effectiveness of control strategies. Availability of comprehensive and reliable datasets used to parametrize these models is limited. In this paper we combine a unique epidemiological dataset collected in Malta through General Practitioners (GPs) with a novel method using cross-sectional surveys to study seasonal influenza dynamics in Malta in 2014–2016, to include social dynamics and self-perception related to seasonal influenza. Methods: Two cross-sectional public surveys (n = 406 per survey) were performed by telephone across the Maltese population in 2014–15 and 2015–16 influenza seasons. Survey results were compared with incidence data (diagnosed seasonal influenza cases) collected by GPs in the same period and with Google Trends data for Malta. Information was collected on whether participants recalled their health status in past months, occurrences of influenza symptoms, hospitalisation rates due to seasonal influenza, seeking GP advice, and other medical information. Results: We demonstrate that cross-sectional surveys are a reliable alternative data source to medical records. The two surveys gave comparable results, indicating that the level of recollection among the public is high. Based on two seasons of data, the reporting rate in Malta varies between 14 and 22%. The comparison with Google Trends suggests that the online searches peak at about the same time as the maximum extent of the epidemic, but the public interest declines and returns to background level. We also found that the public intensively searched the Internet for influenza-related terms even when number of cases was low.
Conclusions: Our research shows that a telephone survey is a viable way to gain deeper insight into a population’s self-perception of influenza and its symptoms and to provide another benchmark for medical statistics provided by GPs and Google Trends. The information collected can be used to improve epidemiological modelling of seasonal influenza and other infectious diseases, thus effectively contributing to public health. Supplementary Information: The online version contains supplementary material available at 10.1186/s12889-021-11862-x.
Affiliation(s)
- V Marmara
  - Faculty of Economics, Management & Accountancy, University of Malta, Msida, MSD, 2080, Malta
- D Marmara
  - Faculty of Health Sciences, Mater Dei Hospital, Block A, Level 1, University of Malta, Msida, MSD, 2090, Malta
- P McMenemy
  - Department of Mathematics, University of Stirling, Stirling, FK94LA, Scotland, UK
- A Kleczkowski
  - Department of Mathematics and Statistics, University of Strathclyde, Rm. 1001, 26 Richmond Street, Glasgow, G1 1XH, Scotland
13
Li J, Sia CL, Chen Z, Huang W. Enhancing Influenza Epidemics Forecasting Accuracy in China with Both Official and Unofficial Online News Articles, 2019-2020. Int J Environ Res Public Health 2021; 18:6591. [PMID: 34207479 PMCID: PMC8296334 DOI: 10.3390/ijerph18126591] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/03/2021] [Revised: 06/05/2021] [Accepted: 06/15/2021] [Indexed: 11/16/2022]
Abstract
Real-time online data sources have contributed to timely and accurate forecasting of influenza activities while also suffering from instability and linguistic noise. Few previous studies have focused on unofficial online news articles, which are abundant in number, rich in information, and relatively low in noise. This study examined whether monitoring both official and unofficial online news articles can improve influenza activity forecasting accuracy during influenza outbreaks. Data were retrieved from a Chinese commercial online platform and the website of the Chinese National Influenza Center. We modeled weekly fractions of influenza-related online news articles and compared them against weekly influenza-like illness (ILI) rates using autoregression analyses. We retrieved 153,958,695 and 149,822,871 online news articles focusing on the south and north of mainland China, respectively, from 6 October 2019 to 17 May 2020. Our model based on online news articles significantly improved forecasting accuracy compared to other influenza surveillance models based on historical ILI rates (p = 0.002 in the south; p = 0.000 in the north) or adding microblog data as an exogenous input (p = 0.029 in the south; p = 0.000 in the north). Our findings also showed that influenza forecasting based on online news articles could be 1-2 weeks ahead of official ILI surveillance reports. The results revealed that monitoring online news articles could supplement traditional influenza surveillance systems, improve resource allocation, and offer models for surveillance of other emerging diseases.
Affiliation(s)
- Jingwei Li
  - School of Management, Xi’an Jiaotong University, Xi’an 710049, China
  - Department of Information Systems, City University of Hong Kong, Hong Kong 999077, China
- Choon-Ling Sia
  - Department of Information Systems, City University of Hong Kong, Hong Kong 999077, China
- Zhuo Chen
  - College of Public Health, University of Georgia, Athens, GA 30602, USA
  - School of Economics, University of Nottingham Ningbo China, Ningbo 315000, China
- Wei Huang
  - College of Business, Southern University of Science and Technology, Shenzhen 518000, China
14
Choi H, Choi WS, Han E. Suggestion of a simpler and faster influenza-like illness surveillance system using 2014-2018 claims data in Korea. Sci Rep 2021; 11:11243. [PMID: 34045533 PMCID: PMC8159991 DOI: 10.1038/s41598-021-90511-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/11/2020] [Accepted: 05/06/2021] [Indexed: 11/10/2022]
Abstract
Influenza is an important public health concern. We propose a new real-time influenza-like illness (ILI) surveillance system that utilizes nationwide prospective drug utilization monitoring in Korea. We defined ILI-related claims as outpatient claims that contain both antipyretic and antitussive agents and calculated the weekly rate of ILI-related claims, which was compared to weekly ILI rates from clinical sentinel surveillance data during 2014-2018. We performed a cross-correlation analysis using Pearson's correlation and a time-series analysis to explore actual correlations after removing any dubious correlations due to underlying non-stationarity in both data sets. We used the moving epidemic method (MEM) to estimate an absolute threshold to designate potential influenza epidemics for the weeks with incidence rates above the threshold. We observed a strong correlation between the two surveillance systems each season. The absolute thresholds for the 4 years were 84.64 and 86.19 cases per 1000 claims for claims data and 12.27 and 16.82 per 1000 patients for sentinel data. The epidemic patterns were more similar in the 2016-2017 and 2017-2018 seasons than the 2014-2015 and 2015-2016 seasons. ILI claims data can be loaded into a drug utilization review system in Korea to make an influenza surveillance system.
Affiliation(s)
- HeeKyoung Choi
  - College of Pharmacy, Yonsei Institute of Pharmaceutical Research, Yonsei University, 162-1 Songdo-dong, Yeonsu-gu, Incheon, Seoul, Republic of Korea
  - Division of Infectious Diseases, Department of Internal Medicine, National Health Insurance Service Ilsan Hospital, Ilsan, Republic of Korea
- Won Suk Choi
  - Division of Infectious Diseases, Department of Internal Medicine, Ansan Hospital, Korea University College of Medicine, Ansan, Republic of Korea
- Euna Han
  - College of Pharmacy, Yonsei Institute of Pharmaceutical Research, Yonsei University, 162-1 Songdo-dong, Yeonsu-gu, Incheon, Seoul, Republic of Korea
15
Abstract
Influenza forecasting in the United States (US) is complex and challenging due to spatial and temporal variability, nested geographic scales of interest, and heterogeneous surveillance participation. Here we present Dante, a multiscale influenza forecasting model that learns rather than prescribes spatial, temporal, and surveillance data structure and generates coherent forecasts across state, regional, and national scales. We retrospectively compare Dante's short-term and seasonal forecasts for previous flu seasons to the Dynamic Bayesian Model (DBM), a leading competitor. Dante outperformed DBM for nearly all spatial units, flu seasons, geographic scales, and forecasting targets. Dante's sharper and more accurate forecasts also suggest greater public health utility. Dante placed 1st in the Centers for Disease Control and Prevention's prospective 2018/19 FluSight challenge in both the national and regional competition and the state competition. The methodology underpinning Dante can be used in other seasonal disease forecasting contexts having nested geographic scales of interest.
Affiliation(s)
- Dave Osthus
  - Los Alamos National Laboratory, Statistical Sciences Group, Los Alamos, NM, USA
- Kelly R Moran
  - Los Alamos National Laboratory, Statistical Sciences Group, Los Alamos, NM, USA
  - Department of Statistical Science, Duke University, Durham, NC, USA
16
Poirier C, Hswen Y, Bouzillé G, Cuggia M, Lavenu A, Brownstein JS, Brewer T, Santillana M. Influenza forecasting for French regions combining EHR, web and climatic data sources with a machine learning ensemble approach. PLoS One 2021; 16:e0250890. [PMID: 34010293 PMCID: PMC8133501 DOI: 10.1371/journal.pone.0250890] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/10/2021] [Accepted: 04/16/2021] [Indexed: 11/25/2022]
Abstract
Effective and timely disease surveillance systems have the potential to help public health officials design interventions to mitigate the effects of disease outbreaks. Currently, healthcare-based disease monitoring systems in France offer influenza activity information that lags real time by one to three weeks. This temporal data gap introduces uncertainty that prevents public health officials from having a timely perspective on the population-level disease activity. Here, we present a machine-learning modeling approach that produces real-time estimates and short-term forecasts of influenza activity for the twelve continental regions of France by leveraging multiple disparate data sources that include Google search activity, real-time and local weather information, flu-related Twitter micro-blogs, electronic health records data, and historical disease activity synchronicities across regions. Our results show that all data sources contribute to improving influenza surveillance and that machine-learning ensembles that combine all data sources lead to accurate and timely predictions.
Affiliation(s)
- Canelle Poirier
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
- Yulin Hswen
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Guillaume Bouzillé
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - CHU Rennes, Centre de Données Cliniques, Rennes, France
- Marc Cuggia
  - INSERM, U1099, Rennes, France
  - Université de Rennes 1, LTSI, Rennes, France
  - CHU Rennes, Centre de Données Cliniques, Rennes, France
- Audrey Lavenu
  - Université de Rennes 1, Faculté de médecine, Rennes, France
  - INSERM CIC 1414, Université de Rennes 1, Rennes, France
  - IRMAR, Institut de Recherche Mathématique de Rennes, Rennes, France
- John S. Brownstein
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
- Thomas Brewer
  - Innovation Program, Boston Children’s Hospital, Boston, MA, United States of America
- Mauricio Santillana
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States of America
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
17
|
Chrzanowski J, Sołek J, Fendler W, Jemielniak D. Assessing Public Interest Based on Wikipedia's Most Visited Medical Articles During the SARS-CoV-2 Outbreak: Search Trends Analysis. J Med Internet Res 2021; 23:e26331. [PMID: 33667176 PMCID: PMC8049630 DOI: 10.2196/26331] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/07/2020] [Revised: 01/21/2021] [Accepted: 02/18/2021] [Indexed: 12/14/2022] Open
Abstract
Background In the current era of widespread access to the internet, we can monitor public interest in a topic via information-targeted web browsing. We sought to provide direct proof of the global population’s altered use of Wikipedia medical knowledge resulting from the new COVID-19 pandemic and related global restrictions. Objective We aimed to identify temporal search trends and quantify changes in access to Wikipedia Medicine Project articles that were related to the COVID-19 pandemic. Methods We performed a retrospective analysis of medical articles across nine language versions of Wikipedia and country-specific statistics for registered COVID-19 deaths. The observed patterns were compared to a forecast model of Wikipedia use, which was trained on data from 2015 to 2019. The model comprehensively analyzed specific articles and similarities between access count data from before (ie, several years prior) and during the COVID-19 pandemic. Wikipedia articles that were linked to those directly associated with the pandemic were evaluated in terms of degrees of separation and analyzed to identify similarities in access counts. We assessed the correlation between article access counts and the number of diagnosed COVID-19 cases and deaths to identify factors that drove interest in these articles and shifts in public interest during the subsequent phases of the pandemic. Results We observed a significant (P<.001) increase in the number of entries on Wikipedia medical articles during the pandemic period. The increased interest in COVID-19–related articles temporally correlated with the number of global COVID-19 deaths and consistently correlated with the number of region-specific COVID-19 deaths. Articles with low degrees of separation were significantly similar (P<.001) in terms of access patterns that were indicative of information-seeking patterns. 
Conclusions The analysis of Wikipedia medical article popularity could be a viable method for epidemiologic surveillance, as it provides important information about the reasons behind public attention and factors that sustain public interest in the long term. Moreover, Wikipedia users can potentially be directed to credible and valuable information sources that are linked with the most prominent articles.
Affiliation(s)
- Jędrzej Chrzanowski
  - Department of Biostatistics and Translational Medicine, Medical University of Łódź, Łódź, Poland
- Julia Sołek
  - Department of Biostatistics and Translational Medicine, Medical University of Łódź, Łódź, Poland
  - Department of Pathology, Medical University of Łódź, Łódź, Poland
- Wojciech Fendler
  - Department of Biostatistics and Translational Medicine, Medical University of Łódź, Łódź, Poland
- Dariusz Jemielniak
  - Management in Networked and Digital Societies, Kozminski University, Warszawa, Poland

18
Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm. Mathematics 2021. [DOI: 10.3390/math9050570] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/17/2022]
Abstract
This paper proposes a feature selection method that is effective in distinguishing colorectal cancer patients from normal individuals using K-means clustering and the modified harmony search algorithm. As the genetic cause of colorectal cancer originates from mutations in genes, it is important to classify the presence or absence of colorectal cancer through gene information. The proposed methodology consists of four steps. First, the original data are Z-normalized by data preprocessing. Candidate genes are then selected using the Fisher score. Next, one representative gene is selected from each cluster after candidate genes are clustered using K-means clustering. Finally, feature selection is carried out using the modified harmony search algorithm. The gene combination created by feature selection is then applied to the classification model and verified using 5-fold cross-validation. The proposed model obtained a classification accuracy of up to 94.36%. Furthermore, on comparing the proposed method with other methods, we prove that the proposed method performs well in classifying colorectal cancer. Moreover, we believe that the proposed model can be applied not only to colorectal cancer but also to other gene-related diseases.
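The candidate-gene step in this abstract ranks genes by Fisher score. A minimal sketch of one common form of that score is given here for orientation; the function name `fisher_score` and the exact variant (between-class scatter divided by within-class scatter, using population variances) are assumptions of this sketch, not details confirmed by the abstract.

```python
from math import inf


def fisher_score(values, labels):
    """Fisher score of one feature (e.g. one gene's expression values).

    Computes sum_k n_k * (mean_k - mean)^2 / sum_k n_k * var_k, where k runs
    over classes and var_k is the population variance within class k.
    This specific variant is an assumption of the sketch.
    """
    if len(values) != len(labels) or not values:
        raise ValueError("values and labels must be non-empty and equal length")

    overall_mean = sum(values) / len(values)

    # Group the feature values by class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    between = 0.0  # class means scattered around the overall mean
    within = 0.0   # samples scattered around their own class mean
    for group in groups.values():
        group_mean = sum(group) / len(group)
        between += len(group) * (group_mean - overall_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in group)

    if within == 0:
        # Every class is constant: perfectly separated, or no signal at all.
        return inf if between > 0 else 0.0
    return between / within
```

Genes would then be sorted by this score and the top-ranked ones kept as candidates; how many to keep is a tuning choice that the abstract does not state.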

19
Gianfredi V, Santangelo OE, Provenzano S. Correlation between flu and Wikipedia's pages visualization. Acta Biomed 2021; 92:e2021056. [PMID: 33682825 PMCID: PMC7975939 DOI: 10.23750/abm.v92i1.9790] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/13/2020] [Accepted: 12/10/2020] [Indexed: 12/17/2022]
Abstract
Introduction: This study aimed to assess whether the frequency with which the Italian general public views influenza-related Wikipedia pages is aligned with the influenza cases reported by the Istituto Superiore di Sanità (ISS). Materials and Methods: Reported cases of flu were selected from October 2015 to May 2019. Wikipedia Trends was used to assess how many times a specific page was read by users; data were extracted as daily data and aggregated on a weekly basis. The following data were extracted: the number of weekly views by users, from October 2015 to May 2019, of the pages Influenza, Febbre and Tosse (Flu, Fever and Cough, in English). Cross-correlation results were obtained as product-moment correlations between the two time series. Results: In the weekly data, a temporal correlation was observed between the ISS bulletin and Wikipedia search trends. The strongest correlation was at a lag of 0 for number of cases and Flu (r=0.7571), Fever and Cough (r=0.7501). The strongest correlation was at a lag of -1 for Fever and Cough (r=0.7501). The strongest correlation was at a lag of 1 for number of cases and Flu (r=0.7559), Fever and Cough (r=0.7501). Conclusions: A possible future application for the planning and management of public health interventions is proposed.
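The lagged cross-correlation described in this abstract can be illustrated with a minimal sketch. The function names and the sign convention for the lag are assumptions made for illustration; the abstract does not say which convention or software the authors used.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lagged_correlation(cases, views, lag):
    """Correlate cases[t] with views[t + lag] over the overlapping weeks.

    Sign convention (an assumption of this sketch): a negative lag pairs each
    week's case count with page views from earlier weeks.
    """
    if len(cases) != len(views):
        raise ValueError("series must have the same length")
    if lag >= 0:
        a, b = cases[:len(cases) - lag], views[lag:]
    else:
        a, b = cases[-lag:], views[:len(views) + lag]
    return pearson(a, b)
```

Evaluating `lagged_correlation(cases, views, lag)` for `lag` in `(-1, 0, 1)` and comparing the coefficients mirrors the per-lag comparison reported in the Results.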

20
Poulin R, Bennett J, Filion A, Bhattarai UR, Chai X, de Angeli Dutra D, Donlon E, Doherty JF, Jorge F, Milotic M, Park E, Sabadel A, Thomas LJ. iParasitology: Mining the Internet to Test Parasitological Hypotheses. Trends Parasitol 2021; 37:267-272. [PMID: 33547010 DOI: 10.1016/j.pt.2021.01.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/26/2020] [Revised: 01/12/2021] [Accepted: 01/13/2021] [Indexed: 12/17/2022]
Abstract
Digital data (internet queries, page views, social media posts, images) are accumulating online at increasing rates. Tools for compiling these data and extracting their metadata are now readily available. We highlight the possibilities and limitations of internet data to reveal patterns in host-parasite interactions and encourage parasitologists to embrace iParasitology.
Affiliation(s)
- Robert Poulin
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Jerusha Bennett
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Antoine Filion
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Xuhong Chai
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Erica Donlon
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Fátima Jorge
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Marin Milotic
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Eunji Park
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Amandine Sabadel
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Leighton J Thomas
  - Department of Zoology, University of Otago, P.O. Box 56, Dunedin, New Zealand

21
Seasonality of Back Pain in Italy: An Infodemiology Study. Int J Environ Res Public Health 2021; 18:ijerph18031325. [PMID: 33535709 PMCID: PMC7908346 DOI: 10.3390/ijerph18031325] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/09/2020] [Revised: 01/21/2021] [Accepted: 01/28/2021] [Indexed: 12/27/2022]
Abstract
BACKGROUND E-health tools have been used to assess the temporal variations of different health problems. The aim of our infodemiology study was to investigate the seasonal pattern of search volumes for back pain in Italy. METHODS In Italian, back pain is indicated by the medical word "lombalgia". Using Google Trends, we selected the three search terms related to "lombalgia" with the highest relative search volumes (RSV), namely "mal di schiena", "dolore alla schiena" and "dolore lombare", which represent the semantic preferences of users when performing web queries for back pain in Italy. Wikipedia page view statistics were used to identify the number of visits to the page "lombalgia". Strength and direction of secular trends were assessed using the Mann-Kendall test. Cosinor analysis was used to evaluate the potential seasonality of back pain-related RSV. RESULTS We found a significant upward secular trend from 2005 to 2020 for search terms "mal di schiena" (τ = 0.734, p < 0.0001), "dolore alla schiena" (τ = 0.713, p < 0.0001) and "dolore lombare" (τ = 0.628, p < 0.0001). Cosinor analysis on Google Trends RSV showed a significant seasonality for the terms "mal di schiena" (pcos < 0.001), "dolore alla schiena" (pcos < 0.0001), "dolore lombare" (pcos < 0.0001) and "lombalgia" (pcos = 0.017). Cosinor analysis performed on views for the page "lombalgia" in Wikipedia confirmed a significant seasonality (pcos < 0.0001). Both analyses demonstrated a peak of interest in winter months and a decrease in spring/summer. CONCLUSIONS Our infodemiology approach revealed significant seasonal fluctuations in search queries for back pain in Italy, with peaking volumes during the coldest months of the year.
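For readers unfamiliar with cosinor analysis, a minimal single-component sketch follows. It fits y(t) = M + A·cos(2π(t − peak)/period) to an evenly spaced series. The function name `cosinor_fit` and the closed-form estimator are assumptions of this sketch: the closed form is only valid when the series covers a whole number of complete cycles, and the significance test behind the reported pcos values is not reproduced here.

```python
from math import atan2, cos, hypot, pi, sin


def cosinor_fit(y, period=12):
    """Estimate mesor, amplitude and peak time of a single-component cosinor.

    Assumes evenly spaced samples at t = 0, 1, 2, ... covering a whole number
    of complete cycles, so the cosine and sine regressors are orthogonal and
    least squares reduces to the closed-form sums below.
    """
    n = len(y)
    if period < 3 or n == 0 or n % period != 0:
        raise ValueError("need period >= 3 and a whole number of complete cycles")

    omega = 2 * pi / period
    mesor = sum(y) / n  # rhythm-adjusted mean level

    # Least-squares coefficients of the cos(omega*t) and sin(omega*t) terms.
    beta = 2 / n * sum(v * cos(omega * t) for t, v in enumerate(y))
    gamma = 2 / n * sum(v * sin(omega * t) for t, v in enumerate(y))

    amplitude = hypot(beta, gamma)
    # Time index within one cycle at which the fitted curve is highest.
    peak = (atan2(gamma, beta) / omega) % period
    return mesor, amplitude, peak
```

With monthly data and `period=12`, the returned `peak` is a month offset from the first observation, so a winter peak like the one reported would show up as an offset falling in the winter months.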

22
Leuba SI, Yaesoubi R, Antillon M, Cohen T, Zimmer C. Tracking and predicting U.S. influenza activity with a real-time surveillance network. PLoS Comput Biol 2020; 16:e1008180. [PMID: 33137088 PMCID: PMC7707518 DOI: 10.1371/journal.pcbi.1008180] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/25/2019] [Revised: 12/01/2020] [Accepted: 07/22/2020] [Indexed: 12/29/2022] Open
Abstract
Each year in the United States, influenza causes illness in 9.2 to 35.6 million individuals and is responsible for 12,000 to 56,000 deaths. The U.S. Centers for Disease Control and Prevention (CDC) tracks influenza activity through a national surveillance network. These data are only available after a delay of 1 to 2 weeks, and thus influenza epidemiologists and transmission modelers have explored the use of other data sources to produce more timely estimates and predictions of influenza activity. We evaluated whether data collected from a national commercial network of influenza diagnostic machines could produce valid estimates of the current burden and help to predict influenza trends in the United States. Quidel Corporation provided us with de-identified influenza test results transmitted in real-time from a national network of influenza test machines called the Influenza Test System (ITS). We used this ITS dataset to estimate and predict influenza-like illness (ILI) activity in the United States over the 2015-2016 and 2016-2017 influenza seasons. First, we developed linear logistic models on national and regional geographic scales that accurately estimated two CDC influenza metrics: the proportion of influenza test results that are positive and the proportion of physician visits that are ILI-related. We then used our estimated ILI-related proportion of physician visits in transmission models to produce improved predictions of influenza trends in the United States at both the regional and national scale. These findings suggest that ITS can be leveraged to improve "nowcasts" and short-term forecasts of U.S. influenza activity.
Affiliation(s)
- Sequoia I. Leuba
  - Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
- Reza Yaesoubi
  - Health Policy and Management, Yale School of Public Health, New Haven, CT, USA
- Marina Antillon
  - Household Economics and Health Systems Research Unit, Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Ted Cohen
  - Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
- Christoph Zimmer
  - Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA

23
Gozzi N, Tizzani M, Starnini M, Ciulla F, Paolotti D, Panisson A, Perra N. Collective Response to Media Coverage of the COVID-19 Pandemic on Reddit and Wikipedia: Mixed-Methods Analysis. J Med Internet Res 2020; 22:e21597. [PMID: 32960775 PMCID: PMC7553788 DOI: 10.2196/21597] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 06/19/2020] [Revised: 07/31/2020] [Accepted: 09/09/2020] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND The exposure and consumption of information during epidemic outbreaks may alter people's risk perception and trigger behavioral changes, which can ultimately affect the evolution of the disease. It is thus of utmost importance to map the dissemination of information by mainstream media outlets and the public response to this information. However, our understanding of this exposure-response dynamic during the COVID-19 pandemic is still limited. OBJECTIVE The goal of this study is to characterize the media coverage and collective internet response to the COVID-19 pandemic in four countries: Italy, the United Kingdom, the United States, and Canada. METHODS We collected a heterogeneous data set including 227,768 web-based news articles and 13,448 YouTube videos published by mainstream media outlets, 107,898 user posts and 3,829,309 comments on the social media platform Reddit, and 278,456,892 views of COVID-19-related Wikipedia pages. To analyze the relationship between media coverage, epidemic progression, and users' collective web-based response, we considered a linear regression model that predicts the public response for each country given the amount of news exposure. We also applied topic modelling to the data set using nonnegative matrix factorization. RESULTS Our results show that public attention, quantified as user activity on Reddit and active searches on Wikipedia pages, is mainly driven by media coverage; meanwhile, this activity declines rapidly while news exposure and COVID-19 incidence remain high. Furthermore, using an unsupervised, dynamic topic modeling approach, we show that while the levels of attention dedicated to different topics by media outlets and internet users are in good accordance, interesting deviations emerge in their temporal patterns. 
CONCLUSIONS Overall, our findings offer an additional key to interpret public perception and response to the current global health emergency and raise questions about the effects of attention saturation on people's collective awareness and risk perception and thus on their tendencies toward behavioral change.

24
Kramer SC, Pei S, Shaman J. Forecasting influenza in Europe using a metapopulation model incorporating cross-border commuting and air travel. PLoS Comput Biol 2020; 16:e1008233. [PMID: 33052907 PMCID: PMC7588111 DOI: 10.1371/journal.pcbi.1008233] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/25/2020] [Revised: 10/26/2020] [Accepted: 08/10/2020] [Indexed: 11/18/2022] Open
Abstract
Past work has shown that models incorporating human travel can improve the quality of influenza forecasts. Here, we develop and validate a metapopulation model of twelve European countries, in which international translocation of virus is driven by observed commuting and air travel flows, and use this model to generate influenza forecasts in conjunction with incidence data from the World Health Organization. We find that, although the metapopulation model fits the data well, it offers no improvement over isolated models in forecast quality. We discuss several potential reasons for these results. In particular, we note the need for data that are more comparable from country to country, and offer suggestions as to how surveillance systems might be improved to achieve this goal.
Affiliation(s)
- Sarah C Kramer
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York, United States of America
- Sen Pei
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York, United States of America
- Jeffrey Shaman
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York, United States of America

25
Jia Q, Guo Y, Wang G, Barnes SJ. Big Data Analytics in the Fight against Major Public Health Incidents (Including COVID-19): A Conceptual Framework. Int J Environ Res Public Health 2020; 17:E6161. [PMID: 32854265 PMCID: PMC7503476 DOI: 10.3390/ijerph17176161] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 07/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/16/2022]
Abstract
Major public health incidents such as COVID-19 typically have characteristics of being sudden, uncertain, and hazardous. If a government can effectively accumulate big data from various sources and use appropriate analytical methods, it may quickly respond to achieve optimal public health decisions, thereby ameliorating negative impacts from a public health incident and more quickly restoring normality. Although there are many reports and studies examining how to use big data for epidemic prevention, there is still a lack of an effective review and framework of the application of big data in the fight against major public health incidents such as COVID-19, which would be a helpful reference for governments. This paper provides clear information on the characteristics of COVID-19, as well as key big data resources, big data for the visualization of pandemic prevention and control, close contact screening, online public opinion monitoring, virus host analysis, and pandemic forecast evaluation. A framework is provided as a multidimensional reference for the effective use of big data analytics technology to prevent and control epidemics (or pandemics). The challenges and suggestions with respect to applying big data for fighting COVID-19 are also discussed.
Affiliation(s)
- Qiong Jia
  - Department of Management, Hohai Business School, Hohai University, Nanjing 211100, China
- Yue Guo
  - The Department of Information System and Management Engineering, Faculty of Business, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen 518055, China
- Guanlin Wang
  - Department of Management, Hohai Business School, Hohai University, Nanjing 211100, China
- Stuart J. Barnes
  - CODA Research Centre, King’s Business School, King’s College London, Bush House, 30 Aldwych, London WC2B 4BG, UK

26
Caldwell WK, Fairchild G, Del Valle SY. Surveilling Influenza Incidence With Centers for Disease Control and Prevention Web Traffic Data: Demonstration Using a Novel Dataset. J Med Internet Res 2020; 22:e14337. [PMID: 32437327 PMCID: PMC7367534 DOI: 10.2196/14337] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/10/2019] [Revised: 01/29/2020] [Accepted: 03/22/2020] [Indexed: 11/23/2022] Open
Abstract
Background Influenza epidemics result in a public health and economic burden worldwide. Traditional surveillance techniques, which rely on doctor visits, provide data with a delay of 1 to 2 weeks. A means of obtaining real-time data and forecasting future outbreaks is desirable to provide more timely responses to influenza epidemics. Objective This study aimed to present the first implementation of a novel dataset by demonstrating its ability to supplement traditional disease surveillance at multiple spatial resolutions. Methods We used internet traffic data from the Centers for Disease Control and Prevention (CDC) website to determine the potential usability of this data source. We tested the traffic generated by 10 influenza-related pages in 8 states and 9 census divisions within the United States and compared it against clinical surveillance data. Results Our results yielded an r² value of 0.955 in the most successful case, promising results for some cases, and unsuccessful results for other cases. In the interest of scientific transparency to further the understanding of when internet data streams are an appropriate supplemental data source, we also included negative results (ie, unsuccessful models). Models that focused on a single influenza season were more successful than those that attempted to model multiple influenza seasons. Geographic resolution appeared to play a key role, with national and regional models being more successful, overall, than models at the state level. Conclusions These results demonstrate that internet data may be able to complement traditional influenza surveillance in some cases but not in others. Specifically, our results show that the CDC website traffic may inform national- and division-level models but not models for each individual state. In addition, our results show better agreement when the data were broken up by seasons instead of aggregated over several years.
We anticipate that this work will lead to more complex nowcasting and forecasting models using this data stream.
Affiliation(s)
- Wendy K Caldwell
  - X Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM, United States
  - School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ, United States
- Geoffrey Fairchild
  - Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Sara Y Del Valle
  - Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States

27
Barros JM, Duggan J, Rebholz-Schuhmann D. The Application of Internet-Based Sources for Public Health Surveillance (Infoveillance): Systematic Review. J Med Internet Res 2020; 22:e13680. [PMID: 32167477 PMCID: PMC7101503 DOI: 10.2196/13680] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 02/24/2019] [Revised: 09/18/2019] [Accepted: 11/26/2019] [Indexed: 12/30/2022] Open
Abstract
Background Public health surveillance is based on the continuous and systematic collection, analysis, and interpretation of data. This informs the development of early warning systems to monitor epidemics and documents the impact of intervention measures. The introduction of digital data sources, and specifically sources available on the internet, has impacted the field of public health surveillance. New opportunities enabled by the underlying availability and scale of internet-based sources (IBSs) have paved the way for novel approaches for disease surveillance, exploration of health communities, and the study of epidemic dynamics. This field and approach is also known as infodemiology or infoveillance. Objective This review aimed to assess research findings regarding the application of IBSs for public health surveillance (infodemiology or infoveillance). To achieve this, we have presented a comprehensive systematic literature review with a focus on these sources and their limitations, the diseases targeted, and commonly applied methods. Methods A systematic literature review was conducted targeting publications between 2012 and 2018 that leveraged IBSs for public health surveillance, outbreak forecasting, disease characterization, diagnosis prediction, content analysis, and health-topic identification. The search results were filtered according to previously defined inclusion and exclusion criteria. Results Spanning a total of 162 publications, we determined infectious diseases to be the preferred case study (108/162, 66.7%). Of the eight categories of IBSs (search queries, social media, news, discussion forums, websites, web encyclopedia, and online obituaries), search queries and social media were applied in 95.1% (154/162) of the reviewed publications. We also identified limitations in representativeness and biased user age groups, as well as high susceptibility to media events by search queries, social media, and web encyclopedias. 
Conclusions IBSs are a valuable proxy to study illnesses affecting the general population; however, it is important to characterize which diseases are best suited for the available sources; the literature shows that the level of engagement among online platforms can be a potential indicator. There is a necessity to understand the population’s online behavior; in addition, the exploration of health information dissemination and its content is significantly unexplored. With this information, we can understand how the population communicates about illnesses online and, in the process, benefit public health.
Affiliation(s)
- Joana M Barros
  - Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
  - School of Computer Science, National University of Ireland Galway, Galway, Ireland
- Jim Duggan
  - School of Computer Science, National University of Ireland Galway, Galway, Ireland

28
Lu J, Meyer S. Forecasting Flu Activity in the United States: Benchmarking an Endemic-Epidemic Beta Model. Int J Environ Res Public Health 2020; 17:E1381. [PMID: 32098038 PMCID: PMC7068443 DOI: 10.3390/ijerph17041381] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/31/2019] [Revised: 02/07/2020] [Accepted: 02/15/2020] [Indexed: 11/25/2022]
Abstract
Accurate prediction of flu activity enables health officials to plan disease prevention and allocate treatment resources. A promising forecasting approach is to adapt the well-established endemic-epidemic modeling framework to time series of infectious disease proportions. Using U.S. influenza-like illness surveillance data over 18 seasons, we assessed probabilistic forecasts of this new beta autoregressive model with proper scoring rules. Other readily available forecasting tools were used for comparison, including Prophet, (S)ARIMA and kernel conditional density estimation (KCDE). Short-term flu activity was equally well predicted up to four weeks ahead by the beta model with four autoregressive lags and by KCDE; however, the beta model runs much faster. Non-dynamic Prophet scored worst. Relative performance differed for seasonal peak prediction. Prophet produced the best peak intensity forecasts in seasons with standard epidemic curves; otherwise, KCDE outperformed all other methods. Peak timing was best predicted by SARIMA, KCDE or the beta model, depending on the season. The best overall performance when predicting peak timing and intensity was achieved by KCDE. Only KCDE and naive historical forecasts consistently outperformed the equal-bin reference approach for all test seasons. We conclude that the endemic-epidemic beta model is a performant and easy-to-implement tool to forecast flu activity a few weeks ahead. Real-time forecasting of the seasonal peak, however, should consider outputs of multiple models simultaneously, weighing their usefulness as the season progresses.
Affiliation(s)
- Sebastian Meyer
  - Institute of Medical Informatics, Biometry, and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany

29
Darwish A, Rahhal Y, Jafar A. A comparative study on predicting influenza outbreaks using different feature spaces: application of influenza-like illness data from Early Warning Alert and Response System in Syria. BMC Res Notes 2020; 13:33. [PMID: 31948473 PMCID: PMC6964210 DOI: 10.1186/s13104-020-4889-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/17/2019] [Accepted: 01/03/2020] [Indexed: 11/10/2022] Open
Abstract
Objective Accurate forecasting of outbreaks of influenza-like illness (ILI) could help public health officials recommend public health actions earlier. We investigated the performance of three different feature spaces in different models to forecast the weekly ILI rate in Syria using EWARS data from the World Health Organization (WHO). The time series feature space was used first, with seven models: naïve, average, seasonal naïve, drift, dynamic harmonic regression (Dhr), seasonal and trend decomposition using loess (STL), and TBATS. The second feature space resembles some state-of-the-art approaches; we named it the 53-weeks-before_52-first-order-difference feature space. The third, which we proposed, is named the n-years-before_m-weeks-around (YnWm) feature space. Machine learning (ML) and deep learning (DL) models were applied to the second and third feature spaces: generalized linear model (GLM), support vector regression (SVR), gradient boosting (GB), random forest (RF) and long short-term memory (LSTM). Results The four-layer LSTM model with the 1-year-before_4-weeks-around feature space gave more accurate results than the other models, reaching the lowest MAPE of 3.52% and the lowest RMSE of 0.01662. It is hoped that this modelling methodology can be applied in other countries and therefore help prevent and control influenza worldwide.
Affiliation(s)
- Ali Darwish
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria.
| | - Yasser Rahhal
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria
| | - Assef Jafar
- Department of Informatics, Higher Institute for Applied Sciences and Technology, Damascus, Syria
|
30
|
Zimmer C, Leuba SI, Cohen T, Yaesoubi R. Accurate quantification of uncertainty in epidemic parameter estimates and predictions using stochastic compartmental models. Stat Methods Med Res 2019; 28:3591-3608. [PMID: 30428780 PMCID: PMC6517086 DOI: 10.1177/0962280218805780] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022]
Abstract
Stochastic transmission dynamic models are needed to quantify the uncertainty in estimates and predictions during outbreaks of infectious diseases. We previously developed a calibration method for stochastic epidemic compartmental models, called Multiple Shooting for Stochastic Systems (MSS), and demonstrated its competitive performance against a number of existing state-of-the-art calibration methods. The existing MSS method, however, lacks a mechanism against filter degeneracy, a phenomenon that results in parameter posterior distributions that are weighted heavily around a single value. As such, when filter degeneracy occurs, the posterior distributions of parameter estimates will not yield reliable credible or prediction intervals for parameter estimates and predictions. In this work, we extend the MSS method by evaluating and incorporating two resampling techniques to detect and resolve filter degeneracy. Using simulation experiments, we demonstrate that an extended MSS method produces credible and prediction intervals with desired coverage in estimating key epidemic parameters (e.g. mean duration of infectiousness and R0) and short- and long-term predictions (e.g. one and three-week forecasts, timing and number of cases at the epidemic peak, and final epidemic size). Applying the extended MSS approach to a humidity-based stochastic compartmental influenza model, we were able to accurately predict influenza-like illness activity reported by U.S. Centers for Disease Control and Prevention from 10 regions as well as city-level influenza activity using real-time, city-specific Google search query data from 119 U.S. cities between 2003 and 2014.
Affiliation(s)
- Christoph Zimmer
- Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
- Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany
| | - Sequoia I Leuba
- Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
| | - Ted Cohen
- Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
| | - Reza Yaesoubi
- Health Policy and Management, Yale School of Public Health, New Haven, CT, USA
|
31
|
Choi SB, Kim J, Ahn I. Forecasting type-specific seasonal influenza after 26 weeks in the United States using influenza activities in other countries. PLoS One 2019; 14:e0220423. [PMID: 31765386 PMCID: PMC6876883 DOI: 10.1371/journal.pone.0220423] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/10/2019] [Accepted: 11/04/2019] [Indexed: 12/21/2022] Open
Abstract
To identify countries whose seasonal patterns resemble the time series of influenza surveillance data in the United States and other countries, and to forecast the 2018-2019 seasonal influenza outbreak in the U.S., we collected surveillance data for 164 countries from the FluNet database, search queries from Google Trends, and temperature data from 2010 to 2018. Data for influenza-like illness (ILI) in the U.S. were collected from the FluView database. Using cross-correlation analysis, we identified the time lag between pairs of countries' weekly surveillance time series for ILI, total influenza (Total INF), influenza A (INF A), and influenza B (INF B) viruses. To forecast ILI, Total INF, INF A, and INF B for the next season (26 weeks ahead) in the U.S., we developed prediction models using linear regression, autoregressive integrated moving average, and an artificial neural network (ANN). In the cross-correlation analysis between countries in the northern and southern hemispheres, the seasonal influenza patterns in Australia and Chile showed a high correlation with those of the U.S. 22 weeks and 28 weeks earlier, respectively. The R2 score of the ANN models for ILI on the validation set for 2015-2019 was 0.758, despite the difficulty of forecasting 26 weeks ahead. Our prediction models forecast that ILI activity in the U.S. in 2018-2019 may be later and less severe than in 2017-2018, judging from the influenza activity in Australia and Chile in 2018. This makes it possible to estimate peak timing, peak intensity, and type-specific influenza activity for the next season at the 40th week. The correlation between seasonal influenza patterns in the U.S., Australia, and Chile could be used to forecast the next seasonal influenza pattern, which can help to determine influenza vaccine strategy approximately six months ahead in the U.S.
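The lag search described in this abstract can be illustrated with a minimal sketch. It assumes plain Pearson correlation over shifted copies of two weekly series; the function names and the toy series are hypothetical and are not taken from the paper.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def best_lag(leader, follower, max_lag):
    """Return (lag, r): the shift in weeks at which `leader` best matches `follower`."""
    best = (0, -2.0)  # any real correlation is greater than -2
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]  # leader values observed `lag` weeks earlier
        y = follower[lag:]
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical weekly activity: the follower repeats the leader two weeks later.
leader = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
follower = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0]
lag, r = best_lag(leader, follower, max_lag=4)
print(lag, round(r, 3))
```

In practice the series would be detrended or normalised first, and the candidate lags would cover a full season rather than four weeks.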
Affiliation(s)
- Soo Beom Choi
- Department of Data-centric Problem Solving Research, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
| | - Juhyeon Kim
- Department of Data-centric Problem Solving Research, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
| | - Insung Ahn
- Department of Data-centric Problem Solving Research, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
|
32
|
Rangarajan P, Mody SK, Marathe M. Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data. PLoS Comput Biol 2019; 15:e1007518. [PMID: 31751346 PMCID: PMC6894887 DOI: 10.1371/journal.pcbi.1007518] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/07/2019] [Revised: 12/05/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022] Open
Abstract
Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world and it is estimated that more than half the world’s population is at risk for developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Data from multiple sources (such as dengue and ILI case counts, electronic health records and frequency of multiple internet search terms from Google Trends) can improve forecasts, but standard time series analysis methods cannot estimate all the parameter values from the limited amount of data available when multiple sources are used. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines sparse representation of time series data, electronic health records data (for ILI) and Google Trends data to forecast dengue and ILI incidences. This sparse representation method uses an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states: Brazil, Mexico, Singapore, Taiwan, and Thailand and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time. Dengue and influenza-like illness (ILI) are leading causes of viral infection in the world and hence it is important to develop accurate methods for forecasting their incidence.
We use the Autoregressive Likelihood Ratio method, which is a computationally efficient implementation of the variable selection method, in order to obtain a sparse (non-lasso) representation of time series, Google Trends and electronic health records (for ILI) data. This method is used to forecast dengue incidence in five countries/states and ILI incidence in the USA. We show that this method outperforms existing time series methods in forecasting these diseases. The method is general and can also be used to forecast other diseases.
Affiliation(s)
- Prashant Rangarajan
- Departments of Computer Science and Mathematics, Birla Institute of Technology and Science, Pilani, India
| | - Sandeep K. Mody
- Department of Mathematics, Indian Institute of Science, Bangalore, India
| | - Madhav Marathe
- Department of Computer Science, Network, Simulation Science and Advanced Computing Division, Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, United States of America
|
33
|
A Comparative Study on the Prediction of Occupational Diseases in China with Hybrid Algorithm Combing Models. Computational and Mathematical Methods in Medicine 2019; 2019:8159506. [PMID: 31662788 PMCID: PMC6791229 DOI: 10.1155/2019/8159506] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/10/2019] [Revised: 08/03/2019] [Accepted: 08/27/2019] [Indexed: 11/17/2022]
Abstract
Occupational disease is a huge problem in China, and many workers are at risk. Accurate forecasting of occupational disease incidence can provide critical information for prevention and control. Therefore, in this study, five hybrid algorithm combing models were assessed on their effectiveness and applicability to predict the incidence of occupational diseases in China. The five hybrid algorithm combing models are the combination of five grey models (EGM, ODGM, EDGM, DGM, and Verhulst) and five state-of-the-art machine learning models (KNN, SVM, RF, GBM, and ANN). The quality of the models was assessed based on the accuracy of model prediction, measured by mean absolute percentage error (MAPE) and root-mean-squared error (RMSE). Our results showed that the GM-ANN model provided the most precise prediction among all the models, with the lowest MAPE of 3.49% and RMSE of 1076.60. Therefore, the GM-ANN model can be used for precise prediction of occupational diseases in China, which may provide valuable information for the prevention and control of occupational diseases in the future.
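For readers comparing the MAPE and RMSE figures quoted across these abstracts, the following is a minimal sketch of the two metrics under their usual definitions. The sample numbers are hypothetical and are not data from any of the listed studies.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-squared error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical case counts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(round(mape(actual, predicted), 2))
print(round(rmse(actual, predicted), 2))
```

MAPE is scale-free, which is why it can be compared across studies, whereas RMSE depends on the magnitude of the counts being forecast.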
|
34
|
Hosseini S, Karami M, Farhadian M, Mohammadi Y. Seasonal Activity of Influenza in Iran: Application of Influenza-like Illness Data from Sentinel Sites of Healthcare Centers during 2010 to 2015. J Epidemiol Glob Health 2019; 8:29-33. [PMID: 30859784 PMCID: PMC7325813 DOI: 10.2991/j.jegh.2018.08.100] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/21/2017] [Accepted: 06/21/2018] [Indexed: 11/26/2022] Open
Abstract
This study aimed to predict seasonal influenza activity and to detect influenza outbreaks. Data on all registered cases (n = 53,526) of influenza-like illness (ILI) from sentinel sites of healthcare centers in Iran from 2010 to 2015 were obtained from FluNet, the web-based tool of the World Health Organization (WHO). The status of ILI activity was obtained from FluNet and considered the gold standard for the seasonal activity of influenza during the study period. The cumulative sum (CUSUM) outbreak detection method was used to predict the seasonal activity of influenza. Time series similarity between the ILI trend and CUSUM was also assessed using the cross-correlogram. Of 7684 (14%) positive cases of influenza, about 71% were type A virus and 28% were type B virus. The majority of the outbreaks occurred in winter and autumn. The cross-correlogram showed considerable similarity between the time series graphs of the ILI cases and the CUSUM values. However, the CUSUM algorithm did not perform well in the timely detection of influenza activity. Despite the considerable similarity between the time series of ILI cases and the CUSUM algorithm at weekly lags, the seasonal activity of influenza in Iran could not be predicted by the CUSUM algorithm.
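The abstract does not state which CUSUM variant was used. As a rough illustration only, the sketch below implements a generic upper one-sided CUSUM with a fixed baseline mean, a slack value and an alarm threshold; all parameter names and the weekly counts are assumptions made for this example.

```python
def cusum_alarms(series, baseline_mean, slack, threshold):
    """Return the indices at which the upper one-sided CUSUM statistic exceeds `threshold`.

    The statistic accumulates excesses over baseline_mean + slack and is
    reset to zero after each alarm (one common convention, assumed here).
    """
    s = 0.0
    alarms = []
    for t, x in enumerate(series):
        s = max(0.0, s + (x - baseline_mean - slack))
        if s > threshold:
            alarms.append(t)
            s = 0.0
    return alarms


# Hypothetical weekly ILI counts with a surge in weeks 5 to 7.
weekly_ili = [10, 11, 9, 10, 12, 18, 25, 30, 12, 10]
alarms = cusum_alarms(weekly_ili, baseline_mean=10.0, slack=1.0, threshold=10.0)
print(alarms)
```

Because the statistic has to accumulate before it crosses the threshold, the first alarm can trail the start of a surge, which is one way a detector of this kind can lack timeliness.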
Affiliation(s)
- Seyedhadi Hosseini
- Department of Epidemiology, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Manoochehr Karami
- Department of Epidemiology, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran; Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Maryam Farhadian
- Modeling of Non-communicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Younes Mohammadi
- Social Determinants of Health Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
|
35
|
Zhong X, Raghib M. Revisiting the use of web search data for stock market movements. Sci Rep 2019; 9:13511. [PMID: 31534170 PMCID: PMC6751183 DOI: 10.1038/s41598-019-50131-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/17/2018] [Accepted: 09/03/2019] [Indexed: 11/09/2022] Open
Abstract
Advances in Big Data make it possible to make short-term forecasts for market trends from previously unexplored sources. Trading strategies were recently developed by exploiting a link between the online search activity of certain terms semantically related to finance and market movements. Here we build on these earlier results by exploring a data-driven strategy which adaptively leverages the Google Correlate service and automatically chooses a new set of search terms for every trading decision. In a backtesting experiment run from 2008 to 2017 we obtained a 499% cumulative return which compares favourably with benchmark strategies. A crowdsourcing exercise reveals that the term selection process preferentially selects highly specific terms semantically related to finance (e.g. Wells Fargo Bank), which may capture the transient interests of investors, but at the cost of a shorter span of validity. The adaptive strategy quickly updates the set of search terms when a better combination is found, leading to more consistent predictability. We anticipate that this adaptive decision framework can be of value not only for financial applications, but also in other areas of computational social science, where linkages between facets of collective human behavior and online searches can be inferred from digital footprint data.
Affiliation(s)
- Xu Zhong
- IBM Research Australia, Melbourne, Victoria, Australia.
|
36
|
Leveraging Google Trends, Twitter, and Wikipedia to Investigate the Impact of a Celebrity's Death From Rheumatoid Arthritis. J Clin Rheumatol 2019; 24:188-192. [PMID: 29461342 DOI: 10.1097/rhu.0000000000000692] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/26/2022]
Abstract
BACKGROUND Technological advancements, such as patient-centered smartphone applications, have made it possible to support self-management of disease. Further, access to health information through the Internet has grown tremendously. This article aimed to investigate how big data can be used to assess the impact of a celebrity's rheumatic disease on public opinion. METHODS Various tools and statistical/computational approaches were used, including massive data mining of Google Trends, Wikipedia, and Twitter, and big data analytics. These sources were mined using an in-house script, which facilitated data collection, parsing, handling, processing, and normalization. RESULTS From Google Trends, the temporal correlation between "Anna Marchesini" and rheumatoid arthritis (RA) queries was 0.66 before Anna Marchesini's death and 0.90 after it. The geospatial correlation between "Anna Marchesini" and RA queries was 0.45 before her death and 0.52 after it. From Wikitrends, after Anna Marchesini's death, the number of accesses to the Wikipedia page for RA increased by 5770%. From Twitter, 1979 tweets were retrieved. Numbers of likes, retweets, and hashtags increased over time. CONCLUSIONS Novel data streams and big data analytics are effective for assessing the impact of a famous person's disease on laypeople.
|
37
|
Su K, Xu L, Li G, Ruan X, Li X, Deng P, Li X, Li Q, Chen X, Xiong Y, Lu S, Qi L, Shen C, Tang W, Rong R, Hong B, Ning Y, Long D, Xu J, Shi X, Yang Z, Zhang Q, Zhuang Z, Zhang L, Xiao J, Li Y. Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing, China. EBioMedicine 2019; 47:284-292. [PMID: 31477561 PMCID: PMC6796527 DOI: 10.1016/j.ebiom.2019.08.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/24/2019] [Revised: 08/09/2019] [Accepted: 08/09/2019] [Indexed: 02/05/2023] Open
Abstract
Background Early detection of influenza activity followed by timely response is a critical component of preparedness for seasonal influenza epidemic and influenza pandemic. However, most relevant studies were conducted at the regional or national level with regular seasonal influenza trends. There are few feasible strategies to forecast influenza activity at the local level with irregular trends. Methods Multi-source electronic data, including historical percentage of influenza-like illness (ILI%), weather data, Baidu search index and Sina Weibo data of Chongqing, China, were collected and integrated into an innovative Self-adaptive AI Model (SAAIM), which was constructed by integrating Seasonal Autoregressive Integrated Moving Average model and XGBoost model using a self-adaptive weight adjustment mechanism. SAAIM was applied to ILI% forecast in Chongqing from 2017 to 2018, of which the performance was compared with three previously available models on forecasting. Findings ILI% showed an irregular seasonal trend from 2012 to 2018 in Chongqing. Compared with three reference models, SAAIM achieved the best performance on forecasting ILI% of Chongqing with the mean absolute percentage error (MAPE) of 11·9%, 7·5%, and 11·9% during the periods of the year 2014–2016, 2017, and 2018 respectively. Among the three categories of source data, historical influenza activity contributed the most to the forecast accuracy by decreasing the MAPE by 19·6%, 43·1%, and 11·1%, followed by weather information (MAPE reduced by 3·3%, 17·1%, and 2·2%), and Internet-related public sentiment data (MAPE reduced by 1·1%, 0·9%, and 1·3%). Interpretation Accurate influenza forecast in areas with irregular seasonal influenza trends can be made by SAAIM with multi-source electronic data.
Affiliation(s)
- Kun Su
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China; Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Liang Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Guanqiao Li
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
| | - Xiaowen Ruan
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Xian Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Pan Deng
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Xinmi Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Qin Li
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Xianxian Chen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Yu Xiong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Shaofeng Lu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Li Qi
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Chaobo Shen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Wenge Tang
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Rong Rong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
| | - Boran Hong
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Yi Ning
- Meinian Institute of Health, Beijing, People's Republic of China
| | - Dongyan Long
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Jiaying Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Xuanling Shi
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
| | - Zhihong Yang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Qi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
| | - Ziqi Zhuang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
| | - Linqi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China.
| | - Jing Xiao
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China.
| | - Yafei Li
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China.
|
38
|
Penny SG, Akella S, Balmaseda MA, Browne P, Carton JA, Chevallier M, Counillon F, Domingues C, Frolov S, Heimbach P, Hogan P, Hoteit I, Iovino D, Laloyaux P, Martin MJ, Masina S, Moore AM, de Rosnay P, Schepers D, Sloyan BM, Storto A, Subramanian A, Nam S, Vitart F, Yang C, Fujii Y, Zuo H, O’Kane T, Sandery P, Moore T, Chapman CC. Observational Needs for Improving Ocean and Coupled Reanalysis, S2S Prediction, and Decadal Prediction. Frontiers in Marine Science 2019; 6:391. [PMID: 31534949 PMCID: PMC6750049 DOI: 10.3389/fmars.2019.00391] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/10/2023]
Abstract
Developments in observing system technologies and ocean data assimilation (DA) are symbiotic. New observation types lead to new DA methods and new DA methods, such as coupled DA, can change the value of existing observations or indicate where new observations can have greater utility for monitoring and prediction. Practitioners of DA are encouraged to make better use of observations that are already available, for example, taking advantage of strongly coupled DA so that ocean observations can be used to improve atmospheric analyses and vice versa. Ocean reanalyses are useful for the analysis of climate as well as the initialization of operational long-range prediction models. There are many remaining challenges for ocean reanalyses due to biases and abrupt changes in the ocean-observing system throughout its history, the presence of biases and drifts in models, and the simplifying assumptions made in DA solution methods. From a governance point of view, more support is needed to bring the ocean-observing and DA communities together. For prediction applications, there is wide agreement that protocols are needed for rapid communication of ocean-observing data on numerical weather prediction (NWP) timescales. There is potential for new observation types to enhance the observing system by supporting prediction on multiple timescales, ranging from the typical timescale of NWP, covering hours to weeks, out to multiple decades. Better communication between DA and observation communities is encouraged in order to allow operational prediction centers the ability to provide guidance for the design of a sustained and adaptive observing network.
Affiliation(s)
- Stephen G. Penny
- Department of Atmospheric and Oceanic Science, University of Maryland, College Park, MD, United States
| | - Santha Akella
- National Aeronautics and Space Administration, Goddard Space Flight Center, Greenbelt, MD, United States
| | | | - Philip Browne
- European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
| | - James A. Carton
- Department of Atmospheric and Oceanic Science, University of Maryland, College Park, MD, United States
| | | | | | - Catia Domingues
- Antarctic Climate and Ecosystems Cooperative Research Centre, Hobart, TAS, Australia
| | - Sergey Frolov
- Naval Research Laboratory, Monterey, CA, United States
| | | | - Patrick Hogan
- Naval Research Laboratory, Stennis Space Center, MS, United States
| | - Ibrahim Hoteit
- King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | | | - Patrick Laloyaux
- European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
| | | | - Simona Masina
- Euro-Mediterranean Center on Climate Change, Lecce, Italy
| | - Andrew M. Moore
- University of California, Santa Cruz, Santa Cruz, CA, United States
| | - Patricia de Rosnay
- European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
| | - Dinand Schepers
- European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
| | - Bernadette M. Sloyan
- Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT, Australia
| | - Andrea Storto
- NATO Centre for Maritime Research and Experimentation, La Spezia, Italy
| | - Aneesh Subramanian
- Department of Atmospheric and Oceanic Science, University of Colorado, Boulder, Boulder, CO, United States
| | | | - Frederic Vitart
- European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
| | - Chunxue Yang
- Istituto di Scienze Marine, Consiglio Nazionale delle Ricerche, Rome, Italy
| | - Yosuke Fujii
- JMA Meteorological Research Institute, Tsukuba, Japan
| | - Hao Zuo
- European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
| | - Terry O’Kane
- Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT, Australia
| | - Paul Sandery
- Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT, Australia
| | - Thomas Moore
- Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT, Australia
|
39
|
Clemente L, Lu F, Santillana M. Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries. JMIR Public Health Surveill 2019; 5:e12214. [PMID: 30946017 PMCID: PMC6470460 DOI: 10.2196/12214] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/14/2018] [Revised: 02/11/2019] [Accepted: 02/15/2019] [Indexed: 01/18/2023] Open
Abstract
Background: Novel influenza surveillance systems that leverage Internet-based real-time data sources, including Internet search frequencies, social-network information, and crowd-sourced flu surveillance tools, have shown improved accuracy over the past few years in data-rich countries like the United States. These systems not only track flu activity accurately, but they also report flu estimates a week or more ahead of the publication of reports produced by healthcare-based systems, such as those implemented and managed by the Centers for Disease Control and Prevention. Previous work has shown that novel flu surveillance systems, like Google Flu Trends (GFT), have not yet delivered acceptable flu estimates in developing countries in Latin America.
Objective: The aim of this study was to show that recent methodological improvements in the use of Internet search engine information to track diseases can lead to improved retrospective flu estimates in multiple countries in Latin America.
Methods: A machine learning-based methodology that uses flu-related Internet search activity and historical information to monitor flu activity, named ARGO (AutoRegression with Google search), was extended to generate flu predictions for 8 Latin American countries (Argentina, Bolivia, Brazil, Chile, Mexico, Paraguay, Peru, and Uruguay) for the period from January 2012 to December 2016. These retrospective (out-of-sample) influenza activity predictions were compared with historically observed suspected flu cases in each country, as reported by Flunet, an influenza surveillance database maintained by the World Health Organization. For a baseline comparison, retrospective (out-of-sample) flu estimates were produced for the same time period using autoregressive models that only leverage historical flu activity information.
Results: Our results show that ARGO-like models outperform autoregressive models in 6 out of 8 countries in the 2012-2016 time period. Moreover, ARGO significantly improves on historical flu estimates produced by the now-discontinued GFT for the time period of 2012-2015, for which GFT information is publicly available.
Conclusions: We demonstrate here that a self-correcting machine learning method, leveraging Internet-based disease-related search activity and historical flu trends, has the potential to produce reliable and timely flu estimates in multiple Latin American countries. This methodology may prove helpful to local public health officials who design and implement interventions aimed at mitigating the effects of influenza outbreaks. Our methodology generally outperforms both the now-discontinued tool GFT and autoregressive methodologies that exploit only historical flu activity to produce future disease estimates.
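The autoregressive baseline described in the Methods above, a model that predicts flu activity from its own past values only, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the lag order of 1, the function names `fit_ar1`, `forecast_next`, and `rolling_out_of_sample`, and the expanding-window retraining scheme are all assumptions made for the example.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Fit y[t] = a + b * y[t-1] by ordinary least squares (assumed lag order 1)."""
    x = series[:-1]  # lagged values
    y = series[1:]   # values to be predicted
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast_next(series: List[float], a: float, b: float) -> float:
    """One-step-ahead forecast from the last observed value."""
    return a + b * series[-1]


def rolling_out_of_sample(series: List[float], min_train: int) -> List[float]:
    """Refit on data up to time t, then predict time t (never seen in training)."""
    preds = []
    for t in range(min_train, len(series)):
        train = series[:t]
        a, b = fit_ar1(train)
        preds.append(forecast_next(train, a, b))
    return preds
```

Each prediction uses only observations that precede the target week, which is what makes the estimates out-of-sample; a search-augmented model would add further predictors to the same regression.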
Affiliation(s)
- Leonardo Clemente: School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Fred Lu: Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Mauricio Santillana: Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
40
Ning S, Yang S, Kou SC. Accurate regional influenza epidemics tracking using Internet search data. Sci Rep 2019; 9:5238. [PMID: 30918276 PMCID: PMC6437143 DOI: 10.1038/s41598-019-41559-6] [Received: 06/25/2018] [Accepted: 03/12/2019] [Indexed: 12/12/2022]
Abstract
Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.
Affiliation(s)
- Shaoyang Ning: Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138, USA
- Shihao Yang: Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138, USA
- S C Kou: Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA 02138, USA
41
A season for all things: Phenological imprints in Wikipedia usage and their relevance to conservation. PLoS Biol 2019; 17:e3000146. [PMID: 30835729 PMCID: PMC6400330 DOI: 10.1371/journal.pbio.3000146] [Received: 10/26/2018] [Accepted: 01/29/2019] [Indexed: 11/19/2022]
Abstract
Phenology plays an important role in many human–nature interactions, but these seasonal patterns are often overlooked in conservation. Here, we provide the first broad exploration of seasonal patterns of interest in nature across many species and cultures. Using data from Wikipedia, a large online encyclopedia, we analyzed 2.33 billion pageviews to articles for 31,751 species across 245 languages. We show that seasonality plays an important role in how and when people interact with plants and animals online. In total, over 25% of species in our data set exhibited a seasonal pattern in at least one of their language-edition pages, and seasonality is significantly more prevalent in pages for plants and animals than it is in a random selection of Wikipedia articles. Pageview seasonality varies across taxonomic clades in ways that reflect observable patterns in phenology, with groups such as insects and flowering plants having higher seasonality than mammals. Differences between Wikipedia language editions are significant; pages in languages spoken at higher latitudes exhibit greater seasonality overall, and species seldom show the same pattern across multiple language editions. These results have relevance to conservation policy formulation and to improving our understanding of what drives human interest in biodiversity.
Analysis of more than two billion page views over nearly three years for Wikipedia articles for 31,751 species across 245 languages reveals that more than a quarter of species show a seasonal pattern, and several online variations mirror real-world phenology.
Digital information archives offer novel opportunities to study human attitudes towards nature and to better understand how people interact with other species of animals and plants. The insights gained from such studies may be able to inform conservation efforts. Our study uses time-series of views to pages in the online encyclopedia Wikipedia to look at how human interest in other species varies seasonally across a wide range of different languages. In total, we extracted pageviews for 31,751 species of plants and animals across 245 Wikipedia language editions. Spanning nearly three years, our data set comprises 2.33 billion pageviews across 126,697 pages. We tested each time-series in our data set to see how well it fit a seasonal pattern and in doing so found several interesting patterns. First, seasonality is a significant factor in when people view information for many plants and animals online; over 20% of all of our species pages met our criteria for seasonality. Second, the prevalence of seasonality varies across different biological classes and also across languages. These variations appear to reflect differences in the life history of species and in the geographic distribution of languages and can correspond to phenological patterns in nature. Our results are relevant to conservationists seeking to understand how interest in various plants and animals may fluctuate over time.
42
Real-Time Forecasting of Hand-Foot-and-Mouth Disease Outbreaks using the Integrating Compartment Model and Assimilation Filtering. Sci Rep 2019; 9:2661. [PMID: 30804467 PMCID: PMC6389963 DOI: 10.1038/s41598-019-38930-y] [Received: 07/16/2018] [Accepted: 01/15/2019] [Indexed: 11/09/2022]
Abstract
Hand-foot-and-mouth disease (HFMD) is a highly contagious viral infection, and real-time prediction of HFMD outbreaks will facilitate the timely implementation of appropriate control measures. By integrating a susceptible-exposed-infectious-recovered (SEIR) model and an ensemble Kalman filter (EnKF) assimilation method, we developed an integrated compartment model and assimilation filtering forecast model for real-time forecasting of HFMD. When applied to HFMD outbreak data collected for 2008-2011 in Beijing, China, our model successfully predicted the peak week of an outbreak three weeks before the actual arrival of the peak, with a predicted maximum infection rate of 85% or greater than the observed rate. Moreover, the dominant virus types, enterovirus 71 (EV-71) and coxsackievirus A16 (CV-A16), may account for the different patterns of HFMD transmission and recovery observed. The results of this study can be used to inform agencies responsible for public health management of tailored strategies for disease control efforts during HFMD outbreak seasons.
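The SEIR compartment component named in this abstract can be illustrated with a short simulation. The sketch below covers only a generic SEIR model integrated with forward Euler steps; it does not include the ensemble Kalman filter assimilation step, and the function names, the step size, and every parameter value in the usage note are hypothetical choices for illustration rather than values from the study.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (S, E, I, R)


def simulate_seir(beta: float, sigma: float, gamma: float,
                  s0: float, e0: float, i0: float, r0: float,
                  days: int, dt: float = 0.1) -> List[State]:
    """Integrate a basic SEIR model with forward Euler; returns one state per day.

    beta  : transmission rate per day (assumed constant)
    sigma : rate of leaving the exposed class (1 / latent period)
    gamma : recovery rate (1 / infectious period)
    """
    n = s0 + e0 + i0 + r0  # total population, conserved by the equations
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


def peak_day(trajectory: List[State]) -> int:
    """Index of the day on which the infectious compartment is largest."""
    infectious = [state[2] for state in trajectory]
    return max(range(len(infectious)), key=infectious.__getitem__)
```

For example, `peak_day(simulate_seir(0.5, 0.2, 0.1, 9999, 0, 1, 0, days=200))` returns the simulated peak day for those made-up rates. A data-assimilation scheme would repeatedly adjust the state and parameters of such a model as new surveillance counts arrive.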
43
Ferland R, Froda S. A statistical tool for comparing seasonal ILI surveillance data. Sci Rep 2019; 9:1422. [PMID: 30723245 PMCID: PMC6363783 DOI: 10.1038/s41598-018-38292-x] [Received: 09/04/2018] [Accepted: 12/21/2018] [Indexed: 12/02/2022]
Abstract
In this paper, we consider the yearly influenza epidemic, as reflected in the seasonal surveillance data compiled by the CDC (Centers for Disease Control and Prevention, USA), and we explore a new methodology for comparing specific features of these data. In particular, we focus on the ten HHS (Health and Human Services) regions and on how the incidence data evolve in these regions. In order to perform the comparisons, we consider the relative distribution of weekly new cases over one season and replace the crude data with predicted values. These predictions are obtained after fitting a negative binomial regression model that controls for important covariates. The prediction is computed on a ‘generic’ set of covariate values that takes into account the relative size (population-wise) of the regions to be compared. The main results are presented in graphical form, which quickly emphasizes relevant features of the seasonal data and facilitates the comparisons.
Affiliation(s)
- René Ferland: Département de mathématiques, UQAM, C.P. 8888, succursale centre-ville, Montréal, Québec, H3C 3P8, Canada
- Sorana Froda: Département de mathématiques, UQAM, C.P. 8888, succursale centre-ville, Montréal, Québec, H3C 3P8, Canada
44
Kramer SC, Shaman J. Development and validation of influenza forecasting for 64 temperate and tropical countries. PLoS Comput Biol 2019; 15:e1006742. [PMID: 30811396 PMCID: PMC6411231 DOI: 10.1371/journal.pcbi.1006742] [Received: 06/12/2018] [Revised: 03/11/2019] [Accepted: 12/21/2018] [Indexed: 11/19/2022]
Abstract
Accurate forecasts of influenza incidence can be used to inform medical and public health decision-making and response efforts. However, forecasting systems are uncommon in most countries, with a few notable exceptions. Here we use publicly available data from the World Health Organization to generate retrospective forecasts of influenza peak timing and peak intensity for 64 countries, including 18 tropical and subtropical countries. We find that accurate and well-calibrated forecasts can be generated for countries in temperate regions, with peak timing and intensity accuracy exceeding 50% at four and two weeks prior to the predicted epidemic peak, respectively. Forecasts are significantly less accurate in the tropics and subtropics for both peak timing and intensity. This work indicates that, in temperate regions around the world, forecasts can be generated with sufficient lead time to prepare for upcoming outbreak peak incidence.
Affiliation(s)
- Sarah C. Kramer: Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York, United States of America
- Jeffrey Shaman: Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York, United States of America
45
Osthus D, Daughton AR, Priedhorsky R. Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited. PLoS Comput Biol 2019; 15:e1006599. [PMID: 30707689 PMCID: PMC6373968 DOI: 10.1371/journal.pcbi.1006599] [Received: 05/21/2018] [Revised: 02/13/2019] [Accepted: 10/30/2018] [Indexed: 11/19/2022]
Abstract
The ability to produce timely and accurate flu forecasts in the United States can significantly impact public health. Augmenting forecasts with internet data has shown promise for improving forecast accuracy and timeliness in controlled settings, but results in practice are less convincing, as models augmented with internet data have not consistently outperformed models without internet data. In this paper, we perform a controlled experiment, taking into account data backfill, to improve clarity on the benefits and limitations of augmenting an already good flu forecasting model with internet-based nowcasts. Our results show that a good flu forecasting model can benefit from the augmentation of internet-based nowcasts in practice for all considered public health-relevant forecasting targets. The degree of forecast improvement due to nowcasting, however, is uneven across forecasting targets, with short-term forecasting targets seeing the largest improvements and seasonal targets such as the peak timing and intensity seeing relatively marginal improvements. The uneven forecasting improvements across targets hold even when "perfect" nowcasts are used. These findings suggest that further improvements to flu forecasting, particularly seasonal targets, will need to derive from other, non-nowcasting approaches.
Affiliation(s)
- Dave Osthus: Los Alamos National Laboratory, Los Alamos, New Mexico, USA
- Ashlynn R. Daughton: Los Alamos National Laboratory, Los Alamos, New Mexico, USA; University of Colorado Boulder, Boulder, Colorado, USA
46
Khatua A, Khatua A, Cambria E. A tale of two epidemics: Contextual Word2Vec for classifying twitter streams during outbreaks. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.10.010] [Indexed: 10/28/2022]
47
Poirier C, Lavenu A, Bertaud V, Campillo-Gimenez B, Chazard E, Cuggia M, Bouzillé G. Real Time Influenza Monitoring Using Hospital Big Data in Combination with Machine Learning Methods: Comparison Study. JMIR Public Health Surveill 2018; 4:e11361. [PMID: 30578212 PMCID: PMC6320394 DOI: 10.2196/11361] [Received: 06/21/2018] [Revised: 09/10/2018] [Accepted: 09/10/2018] [Indexed: 11/25/2022]
Abstract
Background: Traditional surveillance systems produce estimates of influenza-like illness (ILI) incidence rates, but with a 1- to 3-week delay. Accurate real-time monitoring systems for influenza outbreaks could be useful for making public health decisions. Several studies have investigated the possibility of using internet users’ activity data and different statistical models to predict influenza epidemics in near real time. However, very few studies have investigated hospital big data.
Objective: Here, we compared internet and electronic health records (EHRs) data and different statistical models to identify the best approach (data type and statistical model) for ILI estimates in real time.
Methods: We used Google data for internet data and the clinical data warehouse eHOP, which included all EHRs from Rennes University Hospital (France), for hospital data. We compared 3 statistical models: random forest, elastic net, and support vector machine (SVM).
Results: For the national ILI incidence rate, the best correlation was 0.98 and the mean squared error (MSE) was 866, obtained with hospital data and the SVM model. For the Brittany region, the best correlation was 0.923 and the MSE was 2364, obtained with hospital data and the SVM model.
Conclusions: We found that EHR data together with historical epidemiological information (French Sentinelles network) allowed for accurate prediction of ILI incidence rates for the whole of France as well as for the Brittany region, and outperformed the internet data regardless of the statistical model used. Moreover, the performance of the two statistical models, elastic net and SVM, was comparable.
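The two evaluation metrics reported in these results, correlation and mean squared error, can be computed as in the sketch below. It assumes the correlation is the Pearson coefficient, which the abstract does not state explicitly; the function names are hypothetical, and the numbers in the usage note are made up rather than taken from the study.

```python
import math
from typing import Sequence


def mean_squared_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of squared differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def pearson_correlation(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least two")
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)
```

Calling `mean_squared_error([10, 20, 30], [12, 18, 33])` and `pearson_correlation([10, 20, 30], [12, 18, 33])` on a pair of weekly incidence series gives the two numbers side by side. A high correlation with a large MSE is possible when predictions follow the shape of the epidemic curve but miss its scale, which is one reason to report both.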
Affiliation(s)
- Canelle Poirier: Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France
- Audrey Lavenu: Centre d'Investigation Clinique de Rennes, Université de Rennes 1, Rennes, France
- Valérie Bertaud: Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Centre Hospitalier Universitaire de Rennes, Centre de Données Cliniques, Rennes, France
- Boris Campillo-Gimenez: INSERM, U1099, Rennes, France; Comprehensive Cancer Regional Center, Eugene Marquis, Rennes, France
- Emmanuel Chazard: Centre d'Etudes et de Recherche en Informatique Médicale EA2694, Université de Lille, Lille, France; Public Health Department, Centre Hospitalier Régional Universitaire de Lille, Lille, France
- Marc Cuggia: Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Centre Hospitalier Universitaire de Rennes, Centre de Données Cliniques, Rennes, France
- Guillaume Bouzillé: Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France; INSERM, U1099, Rennes, France; Centre Hospitalier Universitaire de Rennes, Centre de Données Cliniques, Rennes, France
48
Fairchild G, Tasseff B, Khalsa H, Generous N, Daughton AR, Velappan N, Priedhorsky R, Deshpande A. Epidemiological Data Challenges: Planning for a More Robust Future Through Data Standards. Front Public Health 2018; 6:336. [PMID: 30533407 PMCID: PMC6265573 DOI: 10.3389/fpubh.2018.00336] [Received: 07/17/2018] [Accepted: 11/01/2018] [Indexed: 12/23/2022]
Abstract
Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: (1) interfaces, (2) data formatting, and (3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these suggested data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which can in turn yield better public health decision-making capabilities.
Affiliation(s)
- Geoffrey Fairchild: Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Byron Tasseff: Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Hari Khalsa: Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Nicholas Generous: Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Ashlynn R Daughton: Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Nileena Velappan: Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Reid Priedhorsky: High Performance Computing Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Alina Deshpande: Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, United States
49
Apollonio DE, Broyde K, Azzam A, De Guia M, Heilman J, Brock T. Pharmacy students can improve access to quality medicines information by editing Wikipedia articles. BMC Med Educ 2018; 18:265. [PMID: 30454046 PMCID: PMC6245851 DOI: 10.1186/s12909-018-1375-z] [Received: 12/18/2017] [Accepted: 11/01/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND: Pharmacy training programs commonly ask students to develop or edit drug monographs that summarize key information about new medicines as an academic exercise. We sought to expand on this traditional approach by having students improve actual medicines information pages posted on Wikipedia.
METHODS: We placed students (n = 119) in a required core pharmacy course into groups of four and assigned each group a specific medicines page on Wikipedia to edit. Assigned pages had high hit rates, suggesting that the topics were of interest to the wider public, but were of low quality, suggesting that the topics would benefit from improvement efforts. We provided course trainings about editing Wikipedia. We evaluated the assignment by surveying student knowledge and attitudes and reviewing the edits on Wikipedia.
RESULTS: Completing the course trainings increased student knowledge of Wikipedia editing practices. At the end of the assignment, students had a more nuanced understanding of Wikipedia as a resource. Student edits substantially improved the quality of the articles edited, their edits were retained for at least 30 days after course completion, and the average number of page views of their edited articles increased.
CONCLUSIONS: Our results suggest that engaging pharmacy students in a Wikipedia editing assignment is a feasible alternative to writing drug monographs as a classroom assignment. Both tasks provide opportunities for students to demonstrate their skills at researching and explaining drug information, but only one serves to improve wider access to quality medicines information. Wikipedia editing assignments are feasible for large groups of pharmacy students and effective in improving publicly available information on one of the most heavily accessed websites globally.
Affiliation(s)
- Dorie E. Apollonio: Department of Clinical Pharmacy, University of California, San Francisco, San Francisco, California, USA
- Amin Azzam: Department of Psychiatry, University of California, San Francisco, San Francisco, California, USA
- James Heilman: Department of Emergency Medicine, University of British Columbia, Vancouver, BC, Canada
- Tina Brock: Department of Clinical Pharmacy, University of California, San Francisco, San Francisco, California, USA; Faculty of Pharmacy and Pharmaceutical Sciences, Monash University, Melbourne, Australia
50
Chakraborty P, Lewis B, Eubank S, Brownstein JS, Marathe M, Ramakrishnan N. What to know before forecasting the flu. PLoS Comput Biol 2018; 14:e1005964. [PMID: 30312305 PMCID: PMC6193572 DOI: 10.1371/journal.pcbi.1005964] [Indexed: 11/18/2022]
Affiliation(s)
- Prithwish Chakraborty: Discovery Analytics Center, Virginia Tech, Blacksburg, Virginia, United States of America; Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Bryan Lewis: Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, United States of America
- Stephen Eubank: Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- John S. Brownstein: Children's Hospital Informatics Program, Boston Children’s Hospital, Massachusetts, United States of America; Department of Pediatrics, Harvard Medical School, Massachusetts, United States of America
- Madhav Marathe: Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, United States of America; Department of Computer Science, University of Virginia, Charlottesville, Virginia, United States of America
- Naren Ramakrishnan: Discovery Analytics Center, Virginia Tech, Blacksburg, Virginia, United States of America; Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America