1
Yang L, Zhang T, Han X, Yang J, Sun Y, Ma L, Chen J, Li Y, Lai S, Li W, Feng L, Yang W. Influenza Epidemic Trend Surveillance and Prediction Based on Search Engine Data: Deep Learning Model Study. J Med Internet Res 2023; 25:e45085. [PMID: 37847532 PMCID: PMC10618884 DOI: 10.2196/45085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 07/24/2023] [Accepted: 08/04/2023] [Indexed: 10/18/2023]
Abstract
BACKGROUND Influenza outbreaks pose a significant threat to global public health. Traditional surveillance systems and simple algorithms often struggle to predict influenza outbreaks in an accurate and timely manner. Big data and modern technology have offered new modalities for disease surveillance and prediction. Influenza-like illness can serve as a valuable surveillance tool for emerging respiratory infectious diseases like influenza and COVID-19, especially when reported case data may not fully reflect the actual epidemic curve. OBJECTIVE This study aimed to develop a predictive model for influenza outbreaks by combining Baidu search query data with traditional virological surveillance data. The goal was to improve early detection and preparedness for influenza outbreaks in both northern and southern China, providing evidence for supplementing modern intelligence epidemic surveillance methods. METHODS We collected virological data from the National Influenza Surveillance Network and Baidu search query data from January 2011 to July 2018, totaling 3,691,865 and 1,563,361 respective samples. Relevant search terms related to influenza were identified and analyzed for their correlation with influenza-positive rates using Pearson correlation analysis. A distributed lag nonlinear model was used to assess the lag correlation of the search terms with influenza activity. Subsequently, a predictive model based on the gated recurrent unit and multiple attention mechanisms was developed to forecast the influenza-positive trend. RESULTS This study revealed a high correlation between specific Baidu search terms and influenza-positive rates in both northern and southern China, except for 1 term. The search terms were categorized into 4 groups: essential facts on influenza, influenza symptoms, influenza treatment and medicine, and influenza prevention, all of which showed correlation with the influenza-positive rate. 
The influenza prevention and influenza symptom groups had a lag correlation of 1.4-3.2 and 5.0-8.0 days, respectively. The Baidu search terms could help predict the influenza-positive rate 14-22 days in advance in southern China but interfered with influenza surveillance in northern China. CONCLUSIONS Complementing traditional disease surveillance systems with information from web-based data sources can aid in detecting warning signs of influenza outbreaks earlier. However, supplementation of modern surveillance with search engine information should be approached cautiously. This approach provides valuable insights for digital epidemiology and has the potential for broader application in respiratory infectious disease surveillance. Further research should explore the optimization and customization of search terms for different regions and languages to improve the accuracy of influenza prediction models.
Affiliation(s)
- Liuyang Yang
  - Department of Management Science and Information System, Faculty of Management and Economics, Kunming University of Science and Technology, Kunming, China
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Ting Zhang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xuan Han
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jiao Yang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Yanxia Sun
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Libing Ma
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  - Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Guilin Medical University, Guilin, China
- Jialong Chen
  - Department of Respiratory and Critical Care Medicine, Beijing Hospital, Beijing, China
- Yanming Li
  - Department of Respiratory and Critical Care Medicine, Beijing Hospital, Beijing, China
- Shengjie Lai
  - WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton, United Kingdom
- Wei Li
  - The First People's Hospital of Yunnan Province, Affiliated Hospital of Kunming University of Science and Technology, Kunming, China
- Luzhao Feng
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Weizhong Yang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
2
Porcu G, Chen YX, Bonaugurio AS, Villa S, Riva L, Messina V, Bagarella G, Maistrello M, Leoni O, Cereda D, Matone F, Gori A, Corrao G. Web-based surveillance of respiratory infection outbreaks: retrospective analysis of Italian COVID-19 epidemic waves using Google Trends. Front Public Health 2023; 11:1141688. [PMID: 37275497 PMCID: PMC10233021 DOI: 10.3389/fpubh.2023.1141688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 04/28/2023] [Indexed: 06/07/2023]
Abstract
Introduction Large-scale diagnostic testing has been proven insufficient to promptly monitor the spread of the Coronavirus disease 2019. Electronic resources may provide better insight into the early detection of epidemics. We aimed to retrospectively explore whether the Google search volume has been useful in detecting Severe Acute Respiratory Syndrome Coronavirus outbreaks early compared to the swab-based surveillance system. Methods The Google Trends website was used by applying the research to three Italian regions (Lombardy, Marche, and Sicily), covering 16 million Italian citizens. An autoregressive-moving-average model was fitted, and residual charts were plotted to detect outliers in weekly searches of five keywords. Signals that occurred during periods labelled as free from epidemics were used to measure Positive Predictive Values and False Negative Rates in anticipating the epidemic wave occurrence. Results Signals from "fever," "cough," and "sore throat" showed better performance than those from "loss of smell" and "loss of taste." More than 80% of true epidemic waves were detected early by the occurrence of at least one outlier signal in Lombardy, although this implies 20% false alarm signals. Performance was poorer for Sicily and Marche. Conclusion Monitoring the volume of Google searches can be a valuable tool for early detection of respiratory infectious disease outbreaks, particularly in areas with high access to home internet. The inclusion of web-based syndromic keywords is promising as it could facilitate the containment of COVID-19 and perhaps other unknown infectious diseases in the future.
Affiliation(s)
- Gloria Porcu
  - Biostatistics, Epidemiology and Public Health Unit, Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
  - National Centre for Healthcare Research and Pharmacoepidemiology, University of Milano-Bicocca, Milan, Italy
- Yu Xi Chen
  - Biostatistics, Epidemiology and Public Health Unit, Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
  - Directorate General for Health, Lombardy Region, Milan, Italy
- Andrea Stella Bonaugurio
  - Biostatistics, Epidemiology and Public Health Unit, Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
  - Directorate General for Health, Lombardy Region, Milan, Italy
- Simone Villa
  - Centre for Multidisciplinary Research in Health Science, University of Milan, Milan, Italy
- Leonardo Riva
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
  - PoliS Lombardia, Milan, Italy
- Vincenzina Messina
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
  - PoliS Lombardia, Milan, Italy
- Giorgio Bagarella
  - Directorate General for Health, Lombardy Region, Milan, Italy
  - Agency for Health Protection of the Metropolitan Area of Milan, Lombardy Region, Milan, Italy
- Mauro Maistrello
  - Directorate General for Health, Lombardy Region, Milan, Italy
  - Local Health Unit of Melegnano and Martesana, Milan, Italy
- Olivia Leoni
  - Directorate General for Health, Lombardy Region, Milan, Italy
- Danilo Cereda
  - Directorate General for Health, Lombardy Region, Milan, Italy
- Andrea Gori
  - ASST Fatebenefratelli-Sacco, Luigi Sacco Hospital – University of Milan, Milan, Italy
  - Department of Pathophysiology and Transplantation, School of Medicine and Surgery, University of Milan, Milan, Italy
- Giovanni Corrao
  - Biostatistics, Epidemiology and Public Health Unit, Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
  - National Centre for Healthcare Research and Pharmacoepidemiology, University of Milano-Bicocca, Milan, Italy
  - Directorate General for Health, Lombardy Region, Milan, Italy
3
Sun H, Zhang Y, Gao G, Wu D. Internet search data with spatiotemporal analysis in infectious disease surveillance: Challenges and perspectives. Front Public Health 2022; 10:958835. [PMID: 36544794 PMCID: PMC9760721 DOI: 10.3389/fpubh.2022.958835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 11/09/2022] [Indexed: 12/12/2022]
Abstract
With the rapid development of the internet, the application of internet search data has been seen as a novel data source to offer timely infectious disease surveillance intelligence. Moreover, the advancements in internet search data, which include rich information at both space and time scales, enable investigators to sufficiently consider the spatiotemporal uncertainty, which can benefit researchers to better monitor infectious diseases and epidemics. In the present study, we present the necessary groundwork and critical appraisal of the use of internet search data and spatiotemporal analysis approaches in infectious disease surveillance by updating the current stage of knowledge on them. The study also provides future directions for researchers to investigate the combination of internet search data with the spatiotemporal analysis in infectious disease surveillance. Internet search data demonstrate a promising potential to offer timely epidemic intelligence, which can be seen as the prerequisite for improving infectious disease surveillance.
Affiliation(s)
- Hua Sun
  - Popsmart Technology (Zhejiang) Co., Ltd, Ningbo, China
- Yuzhou Zhang
  - Popsmart Technology (Zhejiang) Co., Ltd, Ningbo, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Guang Gao
  - Popsmart Technology (Zhejiang) Co., Ltd, Ningbo, China
- Dun Wu
  - Popsmart Technology (Zhejiang) Co., Ltd, Ningbo, China
4
Déguilhem A, Malaab J, Talmatkadi M, Renner S, Foulquié P, Fagherazzi G, Loussikian P, Marty T, Mebarki A, Texier N, Schuck S. Identifying Profiles and Symptoms of Patients With Long COVID in France: Data Mining Infodemiology Study Based on Social Media. JMIR Infodemiology 2022; 2:e39849. [PMID: 36447795 PMCID: PMC9685517 DOI: 10.2196/39849] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/25/2022] [Revised: 09/19/2022] [Accepted: 10/01/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Long COVID, a condition with persistent symptoms post COVID-19 infection, is the first illness arising from social media. In France, the French hashtag #ApresJ20 described symptoms persisting longer than 20 days after contracting COVID-19. Faced with a lack of recognition from medical and official entities, patients formed communities on social media and described their symptoms as long-lasting, fluctuating, and multisystemic. While many studies on long COVID relied on traditional research methods with lengthy processes, social media offers a foundation for large-scale studies with a fast-flowing outburst of data. OBJECTIVE We aimed to identify and analyze Long Haulers' main reported symptoms, symptom co-occurrences, topics of discussion, difficulties encountered, and patient profiles. METHODS Data were extracted based on a list of pertinent keywords from public sites (eg, Twitter) and health-related forums (eg, Doctissimo). Reported symptoms were identified via the MedDRA dictionary, displayed per the volume of posts mentioning them, and aggregated at the user level. Associations were assessed by computing co-occurrences in users' messages, as pairs of preferred terms. Discussion topics were analyzed using the Biterm Topic Modeling; difficulties and unmet needs were explored manually. To identify patient profiles in relation to their symptoms, each preferred term's total was used to create user-level hierarchical clusters. RESULTS Between January 1, 2020, and August 10, 2021, overall, 15,364 messages were identified as originating from 6494 patients with long COVID or their caregivers. Our analyses revealed 3 major symptom co-occurrences: asthenia-dyspnea (102/289, 35.3%), asthenia-anxiety (65/289, 22.5%), and asthenia-headaches (50/289, 17.3%).
The main reported difficulties were symptom management (150/424, 35.4% of messages), psychological impact (64/424, 15.1%), significant pain (51/424, 12.0%), deterioration in general well-being (52/424, 12.3%), and impact on daily and professional life (40/424, 9.4% and 34/424, 8.0% of messages, respectively). We identified 3 profiles of patients in relation to their symptoms: profile A (n=406 patients) reported exclusively an asthenia symptom; profile B (n=129) expressed anxiety (n=129, 100%), asthenia (n=28, 21.7%), dyspnea (n=15, 11.6%), and ageusia (n=3, 2.3%); and profile C (n=141) described dyspnea (n=141, 100%), and asthenia (n=45, 31.9%). Approximately 49.1% of users (79/161) continued expressing symptoms after more than 3 months post infection, and 20.5% (33/161) after 1 year. CONCLUSIONS Long COVID is a lingering condition that affects people worldwide, physically and psychologically. It impacts Long Haulers' quality of life, everyday tasks, and professional activities. Social media played an undeniable role in raising and delivering Long Haulers' voices and can potentially rapidly provide large volumes of valuable patient-reported information. Since long COVID was a self-titled condition by patients themselves via social media, it is imperative to continuously include their perspectives in related research. Our results can help design patient-centric instruments to be further used in clinical practice to better capture meaningful dimensions of long COVID.
Affiliation(s)
- Guy Fagherazzi
  - Deep Digital Phenotyping Research Unit, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg
5
Okunoye B, Ning S, Jemielniak D. Searching for HIV and AIDS Health Information in South Africa, 2004-2019: Analysis of Google and Wikipedia Search Trends. JMIR Form Res 2022; 6:e29819. [PMID: 35275080 PMCID: PMC8956998 DOI: 10.2196/29819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Revised: 05/29/2021] [Accepted: 02/04/2022] [Indexed: 11/18/2022]
Abstract
Background AIDS, caused by HIV, is a leading cause of mortality in Africa. HIV/AIDS is among the greatest public health challenges confronting health authorities, with South Africa having the greatest prevalence of the disease in the world. There is little research into how Africans meet their health information needs on HIV/AIDS online, and this research gap impacts programming and educational responses to the HIV/AIDS pandemic. Objective This paper reports on how, in general, interest in the search terms “HIV” and “AIDS” mirrors the increase in people living with HIV and the decline in AIDS cases in South Africa. Methods Data on search trends for HIV and AIDS for South Africa were found using the search terms “HIV” and “AIDS” (categories: health, web search) on Google Trends. This was compared with data on estimated adults and children living with HIV, and AIDS-related deaths in South Africa, from the Joint United Nations Programme on HIV/AIDS, and also with search interest in the topics “HIV” and “AIDS” on Wikipedia Afrikaans, the most developed local language Wikipedia service in South Africa. Nonparametric statistical tests were conducted to support the trends and associations identified in the data. Results Google Trends shows a statistically significant decline (P<.001) in search interest for AIDS relative to HIV in South Africa. This trend mirrors progress on the ground in South Africa and is significantly associated (P<.001) with a decline in AIDS-related deaths and people living longer with HIV. This trend was also replicated on Wikipedia Afrikaans, where there was a greater interest in HIV than AIDS. 
Conclusions This statistically significant (P<.001) association between interest in the search terms “HIV” and “AIDS” in South Africa (2004-2019) and the number of people living with HIV and AIDS in the country (2004-2019) might be an indicator that multilateral efforts at combating HIV/AIDS—particularly through awareness raising and behavioral interventions in South Africa—are bearing fruit, and this is not only evident on the ground, but is also reflected in the online information seeking on the HIV/AIDS pandemic. We acknowledge the limitation that in studying the association between Google search interests on HIV/AIDS and cases/deaths, causal relationships should not be drawn due to the limitations of the data.
Affiliation(s)
- Babatunde Okunoye
  - Berkman Klein Centre for Internet and Society, Harvard University, Cambridge, MA, United States
  - Department of Journalism, Film and Television, University of Johannesburg, Johannesburg, South Africa
- Shaoyang Ning
  - Department of Mathematics and Statistics, Williams College, Massachusetts, MA, United States
- Dariusz Jemielniak
  - Management in Networked and Digital Societies Department, Kozminski University, Warsaw, Poland
6
Choo H, Kim M, Choi J, Shin J, Shin SY. Influenza Screening via Deep Learning Using a Combination of Epidemiological and Patient-Generated Health Data: Development and Validation Study. J Med Internet Res 2020; 22:e21369. [PMID: 33118941 PMCID: PMC7661232 DOI: 10.2196/21369] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/12/2020] [Revised: 08/16/2020] [Accepted: 08/18/2020] [Indexed: 01/16/2023]
Abstract
Background Screening for influenza in primary care is challenging due to the low sensitivity of rapid antigen tests and the lack of proper screening tests. Objective The aim of this study was to develop a machine learning–based screening tool using patient-generated health data (PGHD) obtained from a mobile health (mHealth) app. Methods We trained a deep learning model based on a gated recurrent unit to screen influenza using PGHD, including each patient’s fever pattern and drug administration records. We used meteorological data and app-based surveillance of the weekly number of patients with influenza. We defined a single episode as the set of consecutive days, including the day the user was diagnosed with influenza or another disease. Any record a user entered 24 hours after his or her last record was considered to be the start of a new episode. Each episode contained data on the user’s age, gender, weight, and at least one body temperature record. The total number of episodes was 6657. Of these, there were 3326 episodes within which influenza was diagnosed. We divided these episodes into 80% training sets (2664/3330) and 20% test sets (666/3330). A 5-fold cross-validation was used on the training set. Results We achieved reliable performance with an accuracy of 82%, a sensitivity of 84%, and a specificity of 80% in the test set. After the effect of each input variable was evaluated, app-based surveillance was observed to be the most influential variable. The correlation between the duration of input data and performance was not statistically significant (P=.09). Conclusions These findings suggest that PGHD from an mHealth app could be a complementary tool for influenza screening. In addition, PGHD, along with traditional clinical data, could be used to improve health conditions.
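The episode rule in the abstract above (a record entered more than 24 hours after the user's previous record starts a new episode) can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `split_into_episodes` and the input format (a plain list of timestamps for one user) are assumptions, and the exact boundary handling at precisely 24 hours is a guess.

```python
from datetime import datetime, timedelta


def split_into_episodes(timestamps, gap=timedelta(hours=24)):
    """Group one user's record timestamps into episodes.

    Hypothetical helper: a record entered more than `gap` after the
    previous record starts a new episode; otherwise it joins the
    current one.
    """
    episodes = []
    for ts in sorted(timestamps):
        if episodes and ts - episodes[-1][-1] <= gap:
            episodes[-1].append(ts)   # still within the same episode
        else:
            episodes.append([ts])     # gap exceeded (or first record)
    return episodes


if __name__ == "__main__":
    records = [
        datetime(2020, 1, 1, 8),
        datetime(2020, 1, 1, 20),
        datetime(2020, 1, 2, 19),
        datetime(2020, 1, 4, 9),
    ]
    for i, episode in enumerate(split_into_episodes(records), start=1):
        print(i, len(episode), "records")
```

Comparing each record with the previous record, rather than with the first record of the episode, lets an episode span several days as long as entries keep arriving within the gap.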
Affiliation(s)
- Hyunwoo Choo
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Republic of Korea
- Soo-Yong Shin
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Republic of Korea
  - Big Data Research Center, Samsung Medical Center, Seoul, Republic of Korea
7
Venkatesh U, Gandhi PA. Prediction of COVID-19 Outbreaks Using Google Trends in India: A Retrospective Analysis. Healthc Inform Res 2020; 26:175-184. [PMID: 32819035 PMCID: PMC7438693 DOI: 10.4258/hir.2020.26.3.175] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/02/2020] [Accepted: 07/09/2020] [Indexed: 01/09/2023]
Abstract
Objectives Considering the rising menace of coronavirus disease 2019 (COVID-19), it is essential to explore the methods and resources that might predict the case numbers expected and identify the locations of outbreaks. Hence, we have done the following study to explore the potential use of Google Trends (GT) in predicting the COVID-19 outbreak in India. Methods The Google search terms used for the analysis were “coronavirus”, “COVID”, “COVID 19”, “corona”, and “virus”. GTs for these terms in Google Web, News, and YouTube, and the data on COVID-19 case numbers were obtained. Spearman correlation and lag correlation were used to determine the correlation between COVID-19 cases and the Google search terms. Results “Coronavirus” and “corona” were the terms most commonly used by Internet surfers in India. Correlation for the GTs of the search terms “coronavirus” and “corona” was high (r > 0.7) with the daily cumulative and new COVID-19 cases for a lag period ranging from 9 to 21 days. The maximum lag period for predicting COVID-19 cases was found to be with the News search for the term “coronavirus”, with 21 days, i.e., the search volume for “coronavirus” peaked 21 days before the peak number of cases reported by the disease surveillance system. Conclusions Our study revealed that GTs may predict outbreaks of COVID-19, 2 to 3 weeks earlier than the routine disease surveillance, in India. Google search data may be considered as a supplementary tool in COVID-19 monitoring and planning in India.
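The lag correlation described in the abstract above can be illustrated with a small sketch: shift the case series by each candidate lag and keep the lag with the highest Spearman correlation against the search series. Everything here is an assumption made for illustration (the helper names, the toy data, and the choice to scan lags from 0 to `max_lag`); it is not the authors' analysis code and uses only the Python standard library.

```python
def ranks(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def best_lag(search, cases, max_lag):
    """Return (lag, rho) maximizing corr(search[t], cases[t + lag])."""
    best = None
    for lag in range(max_lag + 1):
        x = search[: len(search) - lag]
        y = cases[lag:]
        rho = spearman(x, y)
        if best is None or rho > best[1]:
            best = (lag, rho)
    return best


if __name__ == "__main__":
    # Toy data: cases follow search interest with a 3-day delay.
    search = [0, 1, 5, 9, 4, 2, 1, 0, 0, 0, 0, 0]
    cases = [0, 0, 0, 0, 1, 5, 9, 4, 2, 1, 0, 0]
    print(best_lag(search, cases, max_lag=5))
```

A positive best lag means search interest leads reported cases by that many time steps, which is the sense in which the abstract reports leads of 9 to 21 days.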
Affiliation(s)
- U Venkatesh
  - Department of Community Medicine, Vardhman Mahavir Medical College (VMMC) and Safdarjung Hospital, New Delhi, India
- Periyasamy Aravind Gandhi
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh, India
8
Mavragani A. Infodemiology and Infoveillance: Scoping Review. J Med Internet Res 2020; 22:e16206. [PMID: 32310818 PMCID: PMC7189791 DOI: 10.2196/16206] [Citation(s) in RCA: 118] [Impact Index Per Article: 29.5] [Received: 09/10/2019] [Revised: 02/05/2020] [Accepted: 02/08/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Web-based sources are increasingly employed in the analysis, detection, and forecasting of diseases and epidemics, and in predicting human behavior toward several health topics. This use of the internet has come to be known as infodemiology, a concept introduced by Gunther Eysenbach. Infodemiology and infoveillance studies use web-based data and have become an integral part of health informatics research over the past decade. OBJECTIVE The aim of this paper is to provide a scoping review of the state-of-the-art in infodemiology along with the background and history of the concept, to identify sources and health categories and topics, to elaborate on the validity of the employed methods, and to discuss the gaps identified in current research. METHODS The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were followed to extract the publications that fall under the umbrella of infodemiology and infoveillance from the JMIR, PubMed, and Scopus databases. A total of 338 documents were extracted for assessment. RESULTS Of the 338 studies, the vast majority (n=282, 83.4%) were published with JMIR Publications. The Journal of Medical Internet Research features almost half of the publications (n=168, 49.7%), and JMIR Public Health and Surveillance has more than one-fifth of the examined studies (n=74, 21.9%). The interest in the subject has been increasing every year, with 2018 featuring more than one-fourth of the total publications (n=89, 26.3%), and the publications in 2017 and 2018 combined accounted for more than half (n=171, 50.6%) of the total number of publications in the last decade. The most popular source was Twitter with 45.0% (n=152), followed by Google with 24.6% (n=83), websites and platforms with 13.9% (n=47), blogs and forums with 10.1% (n=34), Facebook with 8.9% (n=30), and other search engines with 5.6% (n=19). 
As for the subjects examined, conditions and diseases with 17.2% (n=58) and epidemics and outbreaks with 15.7% (n=53) were the most popular categories identified in this review, followed by health care (n=39, 11.5%), drugs (n=40, 10.4%), and smoking and alcohol (n=29, 8.6%). CONCLUSIONS The field of infodemiology is becoming increasingly popular, employing innovative methods and approaches for health assessment. The use of web-based sources, which provide us with information that would not be accessible otherwise and tackles the issues arising from the time-consuming traditional methods, shows that infodemiology plays an important role in health informatics research.
Affiliation(s)
- Amaryllis Mavragani
  - Department of Computing Science and Mathematics, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
9
Qin L, Sun Q, Wang Y, Wu KF, Chen M, Shia BC, Wu SY. Prediction of Number of Cases of 2019 Novel Coronavirus (COVID-19) Using Social Media Search Index. International Journal of Environmental Research and Public Health 2020; 17:ijerph17072365. [PMID: 32244425 PMCID: PMC7177617 DOI: 10.3390/ijerph17072365] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 02/28/2020] [Revised: 03/29/2020] [Accepted: 03/30/2020] [Indexed: 01/02/2023]
Abstract
Predicting the number of new suspected or confirmed cases of novel coronavirus disease 2019 (COVID-19) is crucial in the prevention and control of the COVID-19 outbreak. Social media search indexes (SMSI) for dry cough, fever, chest distress, coronavirus, and pneumonia were collected from 31 December 2019 to 9 February 2020. The new suspected cases of COVID-19 data were collected from 20 January 2020 to 9 February 2020. We used the lagged series of SMSI to predict new suspected COVID-19 case numbers during this period. To avoid overfitting, five methods, namely subset selection, forward selection, lasso regression, ridge regression, and elastic net, were used to estimate coefficients. We selected the optimal method to predict new suspected COVID-19 case numbers from 20 January 2020 to 9 February 2020. We further validated the optimal method for new confirmed cases of COVID-19 from 31 December 2019 to 17 February 2020. The new suspected COVID-19 case numbers correlated significantly with the lagged series of SMSI. SMSI could be detected 6–9 days earlier than new suspected cases of COVID-19. The optimal method was the subset selection method, which had the lowest estimation error and a moderate number of predictors. The subset selection method also significantly correlated with the new confirmed COVID-19 cases after validation. SMSI findings on lag day 10 were significantly correlated with new confirmed COVID-19 cases. SMSI could be a significant predictor of the number of COVID-19 infections. SMSI could be an effective early predictor, which would enable governments’ health departments to locate potential and high-risk outbreak areas.
Affiliation(s)
- Lei Qin
  - School of Statistics, University of International Business and Economics, Beijing 100029, China
- Qiang Sun
  - School of Statistics, University of International Business and Economics, Beijing 100029, China
- Yidan Wang
  - School of Statistics, University of International Business and Economics, Beijing 100029, China
- Ke-Fei Wu
  - Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Mingchih Chen
  - Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Ben-Chang Shia
  - Research Center of Big Data, College of Management, Taipei Medical University, Taipei 110, Taiwan
  - College of Management, Taipei Medical University, Taipei 110, Taiwan
  - Executive Master Program of Business Administration in Biotechnology, College of Management, Taipei Medical University, Taipei 110, Taiwan
- Szu-Yuan Wu
  - Department of Food Nutrition and Health Biotechnology, College of Medical and Health Science, Asia University, Taichung 41354, Taiwan
  - Division of Radiation Oncology, Lo-Hsu Medical Foundation, Lotung Poh-Ai Hospital, Yilan 265, Taiwan
  - Big Data Center, Lo-Hsu Medical Foundation, Lotung Poh-Ai Hospital, Yilan 265, Taiwan
  - Department of Healthcare Administration, College of Medical and Health Science, Asia University, Taichung 41354, Taiwan
  - School of Dentistry, College of Oral Medicine, Taipei Medical University, Taipei 110, Taiwan
10
Barros JM, Duggan J, Rebholz-Schuhmann D. The Application of Internet-Based Sources for Public Health Surveillance (Infoveillance): Systematic Review. J Med Internet Res 2020; 22:e13680. [PMID: 32167477 PMCID: PMC7101503 DOI: 10.2196/13680] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 02/24/2019] [Revised: 09/18/2019] [Accepted: 11/26/2019] [Indexed: 12/30/2022]
Abstract
Background Public health surveillance is based on the continuous and systematic collection, analysis, and interpretation of data. This informs the development of early warning systems to monitor epidemics and documents the impact of intervention measures. The introduction of digital data sources, and specifically sources available on the internet, has impacted the field of public health surveillance. New opportunities enabled by the underlying availability and scale of internet-based sources (IBSs) have paved the way for novel approaches for disease surveillance, exploration of health communities, and the study of epidemic dynamics. This field and approach is also known as infodemiology or infoveillance. Objective This review aimed to assess research findings regarding the application of IBSs for public health surveillance (infodemiology or infoveillance). To achieve this, we have presented a comprehensive systematic literature review with a focus on these sources and their limitations, the diseases targeted, and commonly applied methods. Methods A systematic literature review was conducted targeting publications between 2012 and 2018 that leveraged IBSs for public health surveillance, outbreak forecasting, disease characterization, diagnosis prediction, content analysis, and health-topic identification. The search results were filtered according to previously defined inclusion and exclusion criteria. Results Spanning a total of 162 publications, we determined infectious diseases to be the preferred case study (108/162, 66.7%). Of the eight categories of IBSs (search queries, social media, news, discussion forums, websites, web encyclopedia, and online obituaries), search queries and social media were applied in 95.1% (154/162) of the reviewed publications. We also identified limitations in representativeness and biased user age groups, as well as high susceptibility to media events by search queries, social media, and web encyclopedias. 
Conclusions IBSs are a valuable proxy to study illnesses affecting the general population; however, it is important to characterize which diseases are best suited for the available sources; the literature shows that the level of engagement among online platforms can be a potential indicator. There is a necessity to understand the population’s online behavior; in addition, the exploration of health information dissemination and its content is significantly unexplored. With this information, we can understand how the population communicates about illnesses online and, in the process, benefit public health.
Affiliation(s)
- Joana M Barros
- Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland; School of Computer Science, National University of Ireland Galway, Galway, Ireland
| | - Jim Duggan
- School of Computer Science, National University of Ireland Galway, Galway, Ireland
| | | |
|
11
|
Zhang Y, Bambrick H, Mengersen K, Tong S, Feng L, Zhang L, Liu G, Xu A, Hu W. Using big data to predict pertussis infections in Jinan city, China: a time series analysis. INTERNATIONAL JOURNAL OF BIOMETEOROLOGY 2020; 64:95-104. [PMID: 31478106 DOI: 10.1007/s00484-019-01796-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/14/2019] [Revised: 07/06/2019] [Accepted: 08/27/2019] [Indexed: 05/14/2023]
Abstract
This study aims to use big data (climate data, internet query data and school calendar patterns (SCP)) to improve pertussis surveillance and prediction, and develop an early warning model for pertussis epidemics. We collected weekly pertussis notifications, SCP, climate and internet search query data (Baidu index (BI)) in Jinan, China between 2013 and 2017. Time series decomposition and temporal risk assessment were used for examining the epidemic features in pertussis infections. A seasonal autoregressive integrated moving average (SARIMA) model and regression tree model were developed to predict pertussis occurrence using identified predictors. Our study demonstrates clear seasonal patterns in pertussis epidemics, and pertussis activity was most significantly associated with BI at 2-week lag (r_BI = 0.73, p < 0.05), temperature at 1-week lag (r_temp = 0.19, p < 0.05) and rainfall at 2-week lag (r_rainfall = 0.27, p < 0.05). No obvious relationship between pertussis peaks and school attendance was found in the study. Pertussis cases were more likely to be temporally concentrated throughout the epidemics during the study period. SARIMA models with 2-week-lagged BI and 1-week-lagged temperature had better predictive performance (β_search query = 0.06, p = 0.02; β_temp = 0.16, p = 0.03) with large correlation coefficients (r = 0.67, p < 0.01) and low root mean squared error (RMSE = 3.59). The regression tree model identified threshold values of potential predictors (search query, climate and SCP) for pertussis epidemics. Our results showed that internet query in conjunction with social and climatic data can predict pertussis epidemics, which provides a foundation for using such data to develop early warning systems.
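As a rough illustration of how a regression tree arrives at a "threshold value" for a predictor, the sketch below searches for the single split that minimises the summed squared error of the outcome around the two group means. It is a one-level stump, not the authors' model; the function name `best_split` and the weekly values are hypothetical.

```python
def best_split(x, y):
    """Return (threshold, sse) for the single split of x that minimises
    the summed squared error of y around the two group means."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal predictor values cannot be separated
        # Candidate threshold: midpoint between neighbouring predictor values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical weekly values (not taken from the study).
search_index = [10, 12, 15, 40, 42, 45]
cases = [1, 2, 1, 9, 10, 11]
threshold, error = best_split(search_index, cases)
```

A full regression tree applies the same search recursively inside each resulting group and across several predictors.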
Affiliation(s)
- Yuzhou Zhang
- School of Public Health and Social Work; Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Hilary Bambrick
- School of Public Health and Social Work; Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Kerrie Mengersen
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Shilu Tong
- School of Public Health and Social Work; Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Public Health and Institute of Environment and Human Health, Anhui Medical University, Hefei, Anhui, China
- Shanghai Children's Medical Centre, Shanghai Jiao-Tong University, Shanghai, China
| | - Lei Feng
- Shandong Provincial Centre of Disease Control and Prevention, Jinan, China
| | - Li Zhang
- Shandong Provincial Centre of Disease Control and Prevention, Jinan, China
| | - Guifang Liu
- Shandong Provincial Centre of Disease Control and Prevention, Jinan, China
| | - Aiqiang Xu
- Shandong Provincial Centre of Disease Control and Prevention, Jinan, China
| | - Wenbiao Hu
- School of Public Health and Social Work; Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia.
| |
|
12
|
Heartburn-Related Internet Searches and Trends of Interest across Six Western Countries: A Four-Year Retrospective Analysis Using Google Ads Keyword Planner. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 16:ijerph16234591. [PMID: 31756947 PMCID: PMC6926592 DOI: 10.3390/ijerph16234591] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/26/2019] [Revised: 11/16/2019] [Accepted: 11/18/2019] [Indexed: 12/12/2022]
Abstract
The internet is becoming the main source of health-related information. We aimed to investigate data regarding heartburn-related searches made by Google users from Australia, Canada, Germany, Poland, the United Kingdom, and the United States. We retrospectively analyzed data from Google Ads Keyword Planner. We extracted search volumes of keywords associated with “heartburn” for June 2015 to May 2019. The data were generated in the respective primary language. The number of searches per 1000 Google-user years was as follows: 177.4 (Australia), 178.1 (Canada), 123.8 (Germany), 199.7 (Poland), 152.5 (United Kingdom), and 194.5 (United States). The users were particularly interested in treatment (19.0 to 41.3%), diet (4.8 to 10.7%), symptoms (2.6 to 13.1%), and causes (3.7 to 10.0%). In all countries except Germany, the number of heartburn-related queries significantly increased over the analyzed period. For Canada, Germany, Poland, and the United Kingdom, query numbers were significantly lowest in summer; there was no significant seasonal trend for Australia and the United States. The number of heartburn-related queries has increased over the past four years, and a seasonal pattern may exist in certain regions. The trends in heartburn-related searches may reflect the scale of the complaint, and should be verified through future epidemiological studies.
|
13
|
Association of sociodemographic factors and internet query data with pertussis infections in Shandong, China. Epidemiol Infect 2019; 147:e302. [PMID: 31727192 PMCID: PMC6873159 DOI: 10.1017/s0950268819001924] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
Abstract
This study explored how internet queries vary in facilitating monitoring of pertussis, and the effects of sociodemographic characteristics on such variation by city in Shandong province, China. We collected weekly pertussis notifications, Baidu Index (BI) data and yearly sociodemographic data at the city level between 1 January 2009 and 31 December 2017. Spearman's correlation was performed for temporal risk indices; generalised linear models and regression tree models were developed to identify the hierarchical effects and the threshold between sociodemographic factors and internet query data with pertussis surveillance. The BI was correlated with pertussis notifications, with strong spatial variation among cities in temporal risk indices (composite temporal risk metric (CTRM) range: 0.59–1.24). The percentage of urban population (relative risk (RR): 1.05, 95% confidence interval (CI) 1.03–1.07), the proportion of highly educated population (RR: 1.27, 95% CI 1.16–1.39) and the internet access rate (RR: 1.04, 95% CI 1.02–1.05) were correlated with CTRM. Higher RRs in the three identified sociodemographic factors were associated with higher stratified CTRM. The percentage of highly educated population was the most important determinant in the BI with pertussis surveillance. The findings may lead to spatially-specific criteria to inform development of an early warning system of pertussis infections using internet query data.
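Relative risks with 95% intervals like those quoted above are typically obtained by exponentiating a regression coefficient and its Wald interval. The sketch below assumes a log-link generalised linear model, which the abstract does not state explicitly; the function name and the coefficient and standard error are hypothetical.

```python
from math import exp


def relative_risk(beta, se, z=1.96):
    """Convert a log-scale coefficient and its standard error into a
    relative risk with a Wald confidence interval (z = 1.96 for 95%)."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only.
rr, lower, upper = relative_risk(beta=0.05, se=0.01)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval is symmetric on the log scale, the reported RR sits near the geometric mean of its bounds rather than the arithmetic midpoint.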
|
14
|
Zhang Y, Bambrick H, Mengersen K, Tong S, Hu W. Using Google Trends and ambient temperature to predict seasonal influenza outbreaks. ENVIRONMENT INTERNATIONAL 2018; 117:284-291. [PMID: 29778013 DOI: 10.1016/j.envint.2018.05.016] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/12/2018] [Revised: 04/04/2018] [Accepted: 05/07/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND The discovery of the dynamics of seasonal and non-seasonal influenza outbreaks remains a great challenge. Previous internet-based surveillance studies built purely on internet or climate data are prone to error. METHODS We collected influenza notifications, temperature and Google Trends (GT) data between January 1st, 2011 and December 31st, 2016. We performed time-series cross correlation analysis and temporal risk analysis to discover the characteristics of influenza epidemics in the period. Then, the seasonal autoregressive integrated moving average (SARIMA) model and regression tree model were developed to track influenza epidemics using GT and climate data. RESULTS Influenza infection was significantly correlated with GT at lag of 1-7 weeks in Brisbane and Gold Coast, and temperature at lag of 1-10 weeks for the two study settings. SARIMA models with GT and temperature data had better predictive performance. We identified that autoregression (AR) for influenza was the most important determinant for influenza occurrence in both Brisbane and Gold Coast. CONCLUSIONS Our results suggested internet search metrics in conjunction with temperature can be used to predict influenza outbreaks, which can be considered as a pre-requisite for constructing early warning systems using search and temperature data.
Affiliation(s)
- Yuzhou Zhang
- School of Public Health and Social Work, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia.
| | - Hilary Bambrick
- School of Public Health and Social Work, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia.
| | - Kerrie Mengersen
- Science and Engineering Faculty, Mathematical and Statistical Science, Queensland University of Technology, Brisbane, Queensland, Australia.
| | - Shilu Tong
- School of Public Health and Social Work, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia; School of Public Health and Institute of Environment and Human Health, Anhui Medical University, Hefei, Anhui, China; Shanghai Children's Medical Centre, Shanghai Jiao-Tong University, Shanghai, China.
| | - Wenbiao Hu
- School of Public Health and Social Work, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia.
| |
|
15
|
Liang F, Guan P, Wu W, Huang D. Forecasting influenza epidemics by integrating internet search queries and traditional surveillance data with the support vector machine regression model in Liaoning, from 2011 to 2015. PeerJ 2018; 6:e5134. [PMID: 29967755 PMCID: PMC6022725 DOI: 10.7717/peerj.5134] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Accepted: 06/08/2018] [Indexed: 12/15/2022] Open
Abstract
Background Influenza epidemics pose significant social and economic challenges in China. Internet search query data have been identified as a valuable source for the detection of emerging influenza epidemics. However, the selection of the search queries and the adoption of prediction methods are crucial challenges when it comes to improving predictions. The purpose of this study was to explore the application of the Support Vector Machine (SVM) regression model in merging search engine query data and traditional influenza data. Methods The official monthly reported number of influenza cases in Liaoning province in China was acquired from the China National Scientific Data Center for Public Health from January 2011 to December 2015. Based on Baidu Index, a publicly available search engine database, search queries potentially related to influenza over the corresponding period were identified. An SVM regression model was built to be used for predictions, and the choice of three parameters (C, γ, ε) in the SVM regression model was determined by leave-one-out cross-validation (LOOCV) during the model construction process. The model’s performance was evaluated using Root Mean Square Error, Root Mean Square Percentage Error and Mean Absolute Percentage Error. Results In total, 17 search queries related to influenza were generated through the initial query selection approach and were adopted to construct the SVM regression model, including nine queries in the same month, three queries at a lag of one month, one query at a lag of two months and four queries at a lag of three months. The SVM model performed well with the parameters (C = 2, γ = 0.005, ε = 0.0001), based on the ensemble data integrating the influenza surveillance data and Baidu search query data. 
Conclusions The results demonstrated the feasibility of using internet search engine query data as the complementary data source for influenza surveillance and the efficiency of SVM regression model in tracking the influenza epidemics in Liaoning.
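The three evaluation metrics named in this abstract can be written out in a few lines. The sketch below uses the usual textbook definitions, which may differ in detail (for example, scaling by 100) from the authors' implementation; the function name and the monthly counts are hypothetical.

```python
from math import sqrt


def forecast_errors(actual, predicted):
    """Return RMSE, RMSPE and MAPE for paired observations.

    RMSPE and MAPE are returned as fractions (0.10 means 10%) and
    require every actual value to be non-zero.
    """
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    relative = [r / a for r, a in zip(residuals, actual)]
    rmse = sqrt(sum(r ** 2 for r in residuals) / n)
    rmspe = sqrt(sum(r ** 2 for r in relative) / n)
    mape = sum(abs(r) for r in relative) / n
    return rmse, rmspe, mape


# Hypothetical monthly case counts and model predictions.
rmse, rmspe, mape = forecast_errors([100, 200], [110, 180])
```

RMSE is expressed in the units of the case counts, whereas RMSPE and MAPE are scale-free, which is why studies often report all three.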
Affiliation(s)
- Feng Liang
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
| | - Peng Guan
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
| | - Wei Wu
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China
| | - Desheng Huang
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China; Department of Mathematics, School of Fundamental Sciences, China Medical University, Shenyang, Liaoning, China
| |
|
16
|
Brownstein JS, Chu S, Marathe A, Marathe MV, Nguyen AT, Paolotti D, Perra N, Perrotta D, Santillana M, Swarup S, Tizzoni M, Vespignani A, Vullikanti AKS, Wilson ML, Zhang Q. Combining Participatory Influenza Surveillance with Modeling and Forecasting: Three Alternative Approaches. JMIR Public Health Surveill 2017; 3:e83. [PMID: 29092812 PMCID: PMC5688248 DOI: 10.2196/publichealth.7344] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2017] [Revised: 04/06/2017] [Accepted: 10/09/2017] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Influenza outbreaks affect millions of people every year, and their surveillance is usually carried out in developed countries through a network of sentinel doctors who report the weekly number of Influenza-like Illness cases observed among the visited patients. Monitoring and forecasting the evolution of these outbreaks supports decision makers in designing effective interventions and allocating resources to mitigate their impact. OBJECTIVE Describe the existing participatory surveillance approaches that have been used for modeling and forecasting of the seasonal influenza epidemic, and how they can help strengthen real-time epidemic science and provide a more rigorous understanding of epidemic conditions. METHODS We describe three different participatory surveillance systems, WISDM (Widely Internet Sourced Distributed Monitoring), Influenzanet and Flu Near You (FNY), and show how modeling and simulation can be or has been combined with participatory disease surveillance to: i) measure the non-response bias in a participatory surveillance sample using WISDM; and ii) nowcast and forecast influenza activity in different parts of the world (using Influenzanet and Flu Near You). RESULTS WISDM-based results measure the participatory and sample bias for three epidemic metrics, i.e., attack rate, peak infection rate, and time-to-peak, and find the participatory bias to be the largest component of the total bias. The Influenzanet platform shows that digital participatory surveillance data combined with a realistic data-driven epidemiological model can provide both short-term and long-term forecasts of epidemic intensities, and the ground truth data lie within the 95 percent confidence intervals for most weeks. The statistical accuracy of the ensemble forecasts increases as the season progresses. The Flu Near You platform shows that participatory surveillance data provide accurate short-term flu activity forecasts and influenza activity predictions. 
The correlation of the HealthMap Flu Trends estimates with the observed CDC ILI rates is 0.99 for 2013-2015. Additional data sources lead to an error reduction of about 40% when compared to the estimates of the model that only incorporates CDC historical information. CONCLUSIONS While the advantages of participatory surveillance, compared to traditional surveillance, include its timeliness, lower costs, and broader reach, it is limited by a lack of control over the characteristics of the population sample. Modeling and simulation can help overcome this limitation as well as provide real-time and long-term forecasting of influenza activity in data-poor parts of the world.
Affiliation(s)
- John S Brownstein
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States; Computational Epidemiology Group, Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
| | - Shuyu Chu
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Achla Marathe
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Madhav V Marathe
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Andre T Nguyen
- Computational Epidemiology Group, Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, United States; Booz Allen Hamilton, Boston, MA, United States
| | - Daniela Paolotti
- Computational Epidemiology Laboratory, Institute for Scientific Interchange, Turin, Italy
| | - Nicola Perra
- Centre for Business Networks Analysis, University of Greenwich, London, United Kingdom
| | - Daniela Perrotta
- Computational Epidemiology Laboratory, Institute for Scientific Interchange, Turin, Italy
| | - Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States; Computational Epidemiology Group, Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
| | - Samarth Swarup
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Michele Tizzoni
- Computational Epidemiology Laboratory, Institute for Scientific Interchange, Turin, Italy
| | - Alessandro Vespignani
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States
| | - Anil Kumar S Vullikanti
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Mandy L Wilson
- Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
| | - Qian Zhang
- Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, United States
| |
|
17
|
Seo DW, Shin SY. Methods Using Social Media and Search Queries to Predict Infectious Disease Outbreaks. Healthc Inform Res 2017; 23:343-348. [PMID: 29181246 PMCID: PMC5688036 DOI: 10.4258/hir.2017.23.4.343] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2017] [Revised: 08/24/2017] [Accepted: 09/10/2017] [Indexed: 01/19/2023] Open
Abstract
Objectives For earlier detection of infectious disease outbreaks, a digital syndromic surveillance system based on search queries or social media should be utilized. By using real-time data sources, a digital syndromic surveillance system can overcome the limitation of time-delay in traditional surveillance systems. Here, we introduce an approach to develop such a digital surveillance system. Methods We first explain how the statistics data of infectious diseases, such as influenza and Middle East Respiratory Syndrome (MERS) in Korea, can be collected for reference data. Then we also explain how search engine queries can be retrieved from Google Trends. Finally, we describe the implementation of the prediction model using lagged correlation, which can be calculated with statistical packages such as SPSS (Statistical Package for the Social Sciences). Results Lag correlation analyses demonstrated that search engine data/Twitter have a significant temporal relationship with influenza and MERS data. Therefore, the proposed digital surveillance system can be used to predict infectious disease outbreaks earlier. Conclusions This prediction method could be the core engine for implementing a (near-) real-time digital surveillance system. A digital surveillance system that uses Internet resources has enormous potential to monitor disease outbreaks in the early phase.
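The lagged-correlation step described above can be sketched without a statistics package. The example below is plain Python rather than SPSS and is only an illustration: it computes the Pearson correlation between search volume in week t and case counts in week t + lag. The function names and both series are hypothetical; the case series is deliberately built as the search series delayed by two weeks.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlations(searches, cases, max_lag):
    """Correlate searches in week t with cases in week t + lag."""
    result = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` search weeks and the first `lag` case weeks
        # so that the two slices line up.
        aligned_searches = searches[: len(searches) - lag]
        aligned_cases = cases[lag:]
        result[lag] = pearson(aligned_searches, aligned_cases)
    return result


# Hypothetical weekly series: cases follow searches with a two-week delay.
searches = [1, 3, 7, 4, 2, 1, 5, 9, 6, 2]
cases = [0, 0] + searches[:-2]
correlations = lagged_correlations(searches, cases, max_lag=4)
best_lag = max(correlations, key=correlations.get)
```

The lag with the highest correlation indicates how far ahead of reported cases the search signal tends to move; with real data, significance testing and autocorrelation still need to be addressed separately.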
Affiliation(s)
- Dong-Woo Seo
- Department of Emergency Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
| | - Soo-Yong Shin
- Department of Computer Science and Engineering, Kyung Hee University, Yongin, Korea
| |
|
18
|
Kagashe I, Yan Z, Suheryani I. Enhancing Seasonal Influenza Surveillance: Topic Analysis of Widely Used Medicinal Drugs Using Twitter Data. J Med Internet Res 2017; 19:e315. [PMID: 28899847 PMCID: PMC5617904 DOI: 10.2196/jmir.7393] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2017] [Revised: 06/09/2017] [Accepted: 07/26/2017] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Uptake of medicinal drugs (preventive or treatment) is among the approaches used to control disease outbreaks, and therefore, it is of vital importance to be aware of the counts or frequencies of most commonly used drugs and trending topics about these drugs from consumers for successful implementation of control measures. Traditional survey methods would have accomplished this study, but they are too costly in terms of resources needed, and they are subject to social desirability bias for topic discovery. Hence, there is a need to use alternative efficient means such as Twitter data and machine learning (ML) techniques. OBJECTIVE Using Twitter data, the aim of the study was to (1) provide a methodological extension for efficiently extracting widely consumed drugs during seasonal influenza and (2) extract topics from the tweets of these drugs and to infer how the insights provided by these topics can enhance seasonal influenza surveillance. METHODS From tweets collected during the 2012-13 flu season, we first identified tweets with mentions of drugs and then constructed an ML classifier using dependency words as features. The classifier was used to extract tweets that evidenced consumption of drugs, out of which we identified the most consumed drugs. Finally, we extracted trending topics from each of these widely used drugs' tweets using latent Dirichlet allocation (LDA). RESULTS Our proposed classifier obtained an F1 score of 0.82, which significantly outperformed the two benchmark classifiers (ie, P<.001 with the lexicon-based and P=.048 with the 1-gram term frequency [TF]). The classifier extracted 40,428 tweets that evidenced consumption of drugs out of 50,828 tweets with mentions of drugs. The most widely consumed drugs were influenza virus vaccines that had around 76.95% (31,111/40,428) share of the total; other notable drugs were Theraflu, DayQuil, NyQuil, vitamins, acetaminophen, and oseltamivir. 
The topics of each of these drugs exhibited common themes or experiences from people who have consumed these drugs. Among these were the enabling and deterrent factors to influenza drug uptake, which are key to mitigating the severity of seasonal influenza outbreaks. CONCLUSIONS The study results showed the feasibility of using tweets of widely consumed drugs to enhance seasonal influenza surveillance in lieu of the traditional or conventional surveillance approaches. Public health officials and other stakeholders can benefit from the findings of this study, especially in enhancing strategies for mitigating the severity of seasonal influenza outbreaks. The proposed methods can be extended to the outbreaks of other diseases.
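For readers unfamiliar with the F1 score reported above, the sketch below shows how it is derived from a classifier's confusion counts. The abstract gives only the final score, so the counts used here are hypothetical and chosen simply to produce a comparable value.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from confusion counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a "tweet evidences drug consumption" classifier.
score = f1_score(true_pos=82, false_pos=18, false_neg=18)
```

Because F1 ignores true negatives, it suits tasks like this one, where the class of interest is what matters and accuracy alone could be misleading.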
Affiliation(s)
- Ireneus Kagashe
- School of Management and Economics, Beijing Institute of Technology, Beijing, China
| | - Zhijun Yan
- School of Management and Economics, Beijing Institute of Technology, Beijing, China
- Sustainable Development Research Institute for Economy and Society of Beijing, Beijing, China
| | - Imran Suheryani
- School of Life Science, Department of Biomedical Engineering, Beijing Institute of Technology, Beijing, China
| |
|
19
|
Menachemi N, Rahurkar S, Rahurkar M. Using Web-Based Search Data to Study the Public's Reactions to Societal Events: The Case of the Sandy Hook Shooting. JMIR Public Health Surveill 2017; 3:e12. [PMID: 28336508 PMCID: PMC5383805 DOI: 10.2196/publichealth.6033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2016] [Revised: 10/29/2016] [Accepted: 02/03/2017] [Indexed: 11/21/2022] Open
Abstract
Background Internet search is the most common activity on the World Wide Web and generates a vast amount of user-reported data regarding their information-seeking preferences and behavior. Although this data has been successfully used to examine outbreaks, health care utilization, and outcomes related to quality of care, its value in informing public health policy remains unclear. Objective The aim of this study was to evaluate the role of Internet search query data in health policy development. To do so, we studied the public’s reaction to a major societal event in the context of the 2012 Sandy Hook School shooting incident. Methods Query data from the Yahoo! search engine regarding firearm-related searches was analyzed to examine changes in user-selected search terms and subsequent websites visited for a period of 14 days before and after the shooting incident. Results A total of 5,653,588 firearm-related search queries were analyzed. In the after period, queries increased for search terms related to “guns” (+50.06%), “shooting incident” (+333.71%), “ammunition” (+155.14%), and “gun-related laws” (+535.47%). The highest increase (+1054.37%) in Web traffic was seen by news websites following “shooting incident” queries whereas searches for “guns” (+61.02%) and “ammunition” (+173.15%) resulted in notable increases in visits to retail websites. Firearm-related queries generally returned to baseline levels after approximately 10 days. Conclusions Search engine queries present a viable infodemiology metric on public reactions and subsequent behaviors to major societal events and could be used by policymakers to inform policy development.
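The before/after comparisons in this abstract are percentage changes in query counts between two equal-length windows. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and are not the study's underlying data.

```python
def percent_change(before, after):
    """Percentage change from a baseline count to a follow-up count."""
    return (after - before) / before * 100


# Hypothetical query counts for the 14 days before and after an event.
change = percent_change(before=5000, after=7503)
print(f"{change:+.2f}%")
```

On this scale an increase of more than 1000%, as reported for news-site traffic, means the follow-up count was over eleven times the baseline.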
Affiliation(s)
- Nir Menachemi
- Richard M. Fairbanks School of Public Health, Health Policy and Management, Indiana University-IUPUI, Indianapolis, IN, United States; Regenstrief Institute, Center for Biomedical Informatics, Indianapolis, IN, United States
| | - Saurabh Rahurkar
- Regenstrief Institute, Center for Biomedical Informatics, Indianapolis, IN, United States
| | | |
|
20
|
Priedhorsky R, Osthus D, Daughton AR, Moran KR, Generous N, Fairchild G, Deshpande A, Del Valle SY. Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda. CSCW : PROCEEDINGS OF THE CONFERENCE ON COMPUTER-SUPPORTED COOPERATIVE WORK. CONFERENCE ON COMPUTER-SUPPORTED COOPERATIVE WORK 2017; 2017:1812-1834. [PMID: 28782059 PMCID: PMC5542563 DOI: 10.1145/2998181.2998183] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Effective disease monitoring provides a foundation for effective public health systems. This has historically been accomplished with patient contact and bureaucratic aggregation, which tends to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parameters. However, the question of when and how these approaches work remains open. We addressed this question using Wikipedia access logs and category links. Our experiments, replicable and extensible using our open source code and data, test the effect of semantic article filtering, amount of training data, forecast horizon, and model staleness by comparing across 6 diseases and 4 countries using thousands of individual models. We found that our minimal-configuration, language-agnostic article selection process based on semantic relatedness is effective for improving predictions, and that our approach is relatively insensitive to the amount and age of training data. We also found, in contrast to prior work, very little forecasting value, and we argue that this is consistent with theoretical considerations about the nature of forecasting. These mixed results lead us to propose that the currently observational field of internet-based disease surveillance must pivot to include theoretical models of information flow as well as controlled experiments based on simulations of disease.
Affiliation(s)
- Dave Osthus
- Computer, Computational, and Statistical Sciences (CCS) Division
- Ashlynn R Daughton
- Analytics, Intelligence, and Technology (A) Division, Los Alamos National Laboratory, Los Alamos, NM
- Kelly R Moran
- Analytics, Intelligence, and Technology (A) Division, Los Alamos National Laboratory, Los Alamos, NM
- Nicholas Generous
- Analytics, Intelligence, and Technology (A) Division, Los Alamos National Laboratory, Los Alamos, NM
- Geoffrey Fairchild
- Analytics, Intelligence, and Technology (A) Division, Los Alamos National Laboratory, Los Alamos, NM
- Alina Deshpande
- Analytics, Intelligence, and Technology (A) Division, Los Alamos National Laboratory, Los Alamos, NM
- Sara Y Del Valle
- Analytics, Intelligence, and Technology (A) Division, Los Alamos National Laboratory, Los Alamos, NM
|
21
|
Agarwal V, Zhang L, Zhu J, Fang S, Cheng T, Hong C, Shah NH. Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis. J Med Internet Res 2016; 18:e251. [PMID: 27655225 PMCID: PMC5052461 DOI: 10.2196/jmir.6240] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 06/20/2016] [Revised: 07/26/2016] [Accepted: 07/27/2016] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear. OBJECTIVE The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of user's future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015. METHODS We inferred presence within the geofence of a medical facility from location and duration information from users' search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. 
In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted utilization score, served as a surrogate measure of the model's utility. RESULTS We obtained the highest area under the curve (0.796) in medical visit prediction with our random forests model and daywise features. Ablating feature categories one at a time showed that the model performance worsened the most when location features were dropped. An online evaluation in which advertisements were served to users who had a high predicted probability of a future medical visit showed a 3.96% increase in the show conversion rate. CONCLUSIONS Results from our experiments done in a research setting suggest that it is possible to accurately predict future patient visits from geotagged mobile search logs. Results from the offline and online experiments on the utility of health utilization predictions suggest that such prediction can have utility for health care providers.
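The area under the curve reported in this abstract can be read as the probability that a randomly chosen user who later visited a medical facility receives a higher predicted score than a randomly chosen user who did not. A minimal Python sketch of that pairwise definition follows. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = later medical visit, 0 = no visit.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # prints 0.75 (3 of 4 pairs ordered correctly)
```

The pairwise form is quadratic in the number of users, so it suits small illustrations only; a rank-based computation would be the usual choice at the scale of real search logs.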
Affiliation(s)
- Vibhu Agarwal
- Biomedical Informatics Training Program, Stanford University, Stanford, CA, United States.
|
22
|
Shin SY, Seo DW, An J, Kwak H, Kim SH, Gwack J, Jo MW. High correlation of Middle East respiratory syndrome spread with Google search and Twitter trends in Korea. Sci Rep 2016; 6:32920. [PMID: 27595921 PMCID: PMC5011762 DOI: 10.1038/srep32920] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Received: 09/23/2015] [Accepted: 08/16/2016] [Indexed: 01/07/2023] Open
Abstract
The Middle East respiratory syndrome coronavirus (MERS-CoV) was exported to Korea in 2015, resulting in a threat to neighboring nations. We evaluated the possibility of using a digital surveillance system based on web searches and social media data to monitor this MERS outbreak. We collected the number of daily laboratory-confirmed MERS cases and quarantined cases from May 11, 2015 to June 26, 2015 using the Korean government MERS portal. The daily trends observed via Google search and Twitter during the same time period were also ascertained using Google Trends and Topsy. Correlations among the data were then examined using Spearman correlation analysis. We found high correlations (>0.7) between Google search and Twitter results and the number of confirmed MERS cases for the previous three days using only four simple keywords: "MERS" and the Korean-language terms for "MERS", "MERS symptoms", and "MERS hospital". Additionally, we found high correlations between the Google search and Twitter results and the number of quarantined cases using the above keywords. This study demonstrates the possibility of using a digital surveillance system to monitor the outbreak of MERS.
Affiliation(s)
- Soo-Yong Shin
- Department of Biomedical Informatics, Asan Medical Center, Seoul, Korea
- Dong-Woo Seo
- Department of Emergency Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Jisun An
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Haewoon Kwak
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Sung-Han Kim
- Department of Infectious Diseases, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Jin Gwack
- Center for Disease Control and Prevention, Osong, Chungbuk, Korea
- Min-Woo Jo
- Department of Preventive Medicine, University of Ulsan College of Medicine, Seoul, Korea
|
23
|
Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea. PLoS One 2016; 11:e0158539. [PMID: 27391028 PMCID: PMC4938422 DOI: 10.1371/journal.pone.0158539] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/02/2015] [Accepted: 06/17/2016] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the previous digital surveillance systems did not distinguish mobile and desktop search queries. The purpose of this study was to compare the performance of mobile and desktop search queries in terms of digital influenza surveillance. METHODS AND RESULTS The study period was from September 6, 2010 through August 30, 2014, which consisted of four epidemiological years. Influenza-like illness (ILI) and virologic surveillance data from the Korea Centers for Disease Control and Prevention were used. A total of 210 combined queries from our previous survey work were used for this study. Mobile and desktop weekly search data were extracted from Naver, which is the largest search engine in Korea. Spearman's correlation analysis was used to examine the correlation of the mobile and desktop data with ILI and virologic data in Korea. We also performed lag correlation analysis. We observed that the influenza surveillance performance of mobile search queries matched or exceeded that of desktop search queries over time. The mean correlation coefficients of mobile search queries and the number of queries with an r-value of ≥ 0.7 equaled or became greater than those of desktop searches over the four epidemiological years. A lag correlation analysis of up to two weeks showed similar trends. CONCLUSION Our study shows that mobile search queries for influenza surveillance have equaled or even become greater than desktop search queries over time. In the future development of influenza surveillance using search queries, the recognition of changing trend of mobile search data could be necessary.
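The lag correlation analysis described in this abstract can be sketched as follows: rank both weekly series, shift the surveillance series by a chosen number of weeks, and compute the correlation of the ranks. This is an illustrative Python sketch using only the standard library. The function names (`average_ranks`, `spearman`, `lagged_spearman`) and the toy series are assumptions for illustration, not the study's code or data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_spearman(search, cases, lag):
    """Correlate search volume in week t with cases in week t + lag (lag >= 0)."""
    if lag:
        search, cases = search[:-lag], cases[lag:]
    return spearman(search, cases)


# Toy weekly series (hypothetical): cases repeat the search series one week later.
toy_search = [1, 3, 2, 5, 4, 7, 6, 9]
toy_cases = [0, 1, 3, 2, 5, 4, 7, 6]
for lag in range(3):
    print(lag, round(lagged_spearman(toy_search, toy_cases, lag), 3))
```

Comparing the coefficient across lags indicates how far search activity leads the surveillance signal; in the toy series the one-week lag aligns the two series exactly.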
|
24
|
Woo H, Cho Y, Shim E, Lee JK, Lee CG, Kim SH. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea. J Med Internet Res 2016; 18:e177. [PMID: 27377323 PMCID: PMC4949385 DOI: 10.2196/jmir.4955] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 07/25/2015] [Revised: 04/17/2016] [Accepted: 05/19/2016] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. OBJECTIVE In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. METHODS Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. RESULTS In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; P<.001) and virological incidence rate (r=.963; P<.001). CONCLUSIONS These results demonstrate the feasibility of using search queries to enhance influenza surveillance in South Korea. 
In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.
Affiliation(s)
- Hyekyung Woo
- Department of Health Science and Service, School of Public Health, Seoul National University, Seoul, Republic Of Korea
|
25
|
Lee D, Lee H, Choi M. Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries. J Med Internet Res 2016; 18:e35. [PMID: 26868917 PMCID: PMC4768042 DOI: 10.2196/jmir.4981] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/27/2015] [Revised: 10/11/2015] [Accepted: 12/11/2015] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. OBJECTIVE To investigate the relationship between past orientation and suicide rate by examining Google search queries. METHODS We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. RESULTS It was found that past orientation had a positive relationship with the suicide rate (P≤.001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P≤.001 in Models 3 and 4), Gini coefficient (P≤.001), and population growth rate (P≤.001) had a positive relationship with the suicide rate, whereas the gross state product (P≤.001) showed a negative relationship with the suicide rate. CONCLUSIONS We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query.
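The information-criterion comparison mentioned in this abstract can be illustrated with a small sketch: fit a regression with and without a predictor by ordinary least squares, then compare the Akaike and Bayesian information criteria, where lower values indicate the preferred model. The sketch assumes Gaussian errors and uses the common forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), which omit a constant shared by all models. The function names and toy data are hypothetical and unrelated to the study's panel dataset.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def residual_sum_of_squares(y, fitted):
    """Sum of squared differences between observed and fitted values."""
    return sum((a - b) ** 2 for a, b in zip(y, fitted))


def aic_bic(n, rss, k):
    """AIC and BIC for a Gaussian-error model, up to a shared constant.

    n: observations, rss: residual sum of squares, k: fitted coefficients.
    """
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)


# Toy data (hypothetical), not the study's panel dataset.
x = [0, 1, 2, 3, 4, 5]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]

mean_y = sum(y) / len(y)
null_rss = residual_sum_of_squares(y, [mean_y] * len(y))          # intercept only
b0, b1 = fit_line(x, y)
full_rss = residual_sum_of_squares(y, [b0 + b1 * a for a in x])   # with predictor

print("intercept only:", aic_bic(len(y), null_rss, 1))
print("with predictor:", aic_bic(len(y), full_rss, 2))
```

BIC penalizes each added coefficient by ln(n) rather than 2, so it favors smaller models than AIC once n exceeds about 7 observations.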
Affiliation(s)
- Donghyun Lee
- Korea Advanced Institute of Science and Technology, Graduate School of Innovation and Technology Management, Daejeon, Republic Of Korea
|
26
|
Yom-Tov E, Borsa D, Hayward AC, McKendry RA, Cox IJ. Automatic identification of Web-based risk markers for health events. J Med Internet Res 2015; 17:e29. [PMID: 25626480 PMCID: PMC4327439 DOI: 10.2196/jmir.4082] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 11/26/2014] [Revised: 12/22/2014] [Accepted: 01/12/2015] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND The escalating cost of global health care is driving the development of new technologies to identify early indicators of an individual's risk of disease. Traditionally, epidemiologists have identified such risk factors using medical databases and lengthy clinical studies but these are often limited in size and cost and can fail to take full account of diseases where there are social stigmas or to identify transient acute risk factors. OBJECTIVE Here we report that Web search engine queries coupled with information on Wikipedia access patterns can be used to infer health events associated with an individual user and automatically generate Web-based risk markers for some of the common medical conditions worldwide, from cardiovascular disease to sexually transmitted infections and mental health conditions, as well as pregnancy. METHODS Using anonymized datasets, we present methods to first distinguish individuals likely to have experienced specific health events, and classify them into distinct categories. We then use the self-controlled case series method to find the incidence of health events in risk periods directly following a user's search for a query category, and compare to the incidence during other periods for the same individuals. RESULTS Searches for pet stores were risk markers for allergy. We also identified some possible new risk markers; for example: searching for fast food and theme restaurants was associated with a transient increase in risk of myocardial infarction, suggesting this exposure goes beyond a long-term risk factor but may also act as an acute trigger of myocardial infarction. Dating and adult content websites were risk markers for sexually transmitted infections, such as human immunodeficiency virus (HIV). CONCLUSIONS Web-based methods provide a powerful, low-cost approach to automatically identify risk factors, and support more timely and personalized public health efforts to bring human and economic benefits.
|