1
Habibdoust A, Seifaddini M, Tatar M, Araz OM, Wilson FA. Predicting COVID-19 new cases in California with Google Trends data and a machine learning approach. Inform Health Soc Care 2024; 49:56-72. [PMID: 38353707 DOI: 10.1080/17538157.2024.2315246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
BACKGROUND Google Trends data can be a valuable source of information for health-related issues such as predicting infectious disease trends.
OBJECTIVES To evaluate the accuracy of predicting COVID-19 new cases in California using Google Trends data, we develop and use a GMDH-type neural network model and compare its performance with an LSTM model.
METHODS We predicted COVID-19 new cases using Google query data over three periods. Our first period covered March 1, 2020, to July 31, 2020, including the first peak of infection. We also estimated a model from October 1, 2020, to January 7, 2021, including the second wave of COVID-19 and avoiding possible biases from public interest in searching about the new pandemic. In addition, we extended our forecasting period from May 20, 2020, to January 31, 2021, to cover an extended period of time.
RESULTS Our findings show that Google relative search volume (RSV) can be used to accurately predict new COVID-19 cases. We find that among our Google relative search volume terms, "Fever," "COVID Testing," "Signs of COVID," "COVID Treatment," and "Shortness of Breath" increase model predictive accuracy.
CONCLUSIONS Our findings highlight the value of using data sources providing near real-time data, e.g., Google Trends, to detect trends in COVID-19 cases, in order to supplement and extend existing epidemiological models.
Affiliation(s)
- Amir Habibdoust
  - Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Moosa Tatar
  - Department of Pharmaceutical Health Outcomes and Policy, University of Houston College of Pharmacy, Houston, Texas, USA
- Ozgur M Araz
  - College of Business, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Fernando A Wilson
  - Matheson Center for Health Care Studies, University of Utah, Salt Lake City, Utah, USA
  - Department of Population Health Sciences, University of Utah, Salt Lake City, Utah, USA
  - Department of Economics, University of Utah, Salt Lake City, Utah, USA
2
Tudor C, Sova RA. Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence. PeerJ Comput Sci 2023; 9:e1518. [PMID: 37869464 PMCID: PMC10588692 DOI: 10.7717/peerj-cs.1518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 07/14/2023] [Indexed: 10/24/2023]
Abstract
Background Colorectal cancer (CRC) is the third most prevalent and second most lethal form of cancer in the world. Consequently, CRC prevalence projections are essential for assessing the future burden of the disease, planning resource allocation, and developing service delivery strategies, as well as for grasping the shifting environment of cancer risk factors. However, unlike cancer incidence and mortality rates, national and international agencies do not routinely issue projections for cancer prevalence. Moreover, the limited or even nonexistent cancer statistics for large portions of the world, along with the high heterogeneity among world nations, further complicate the task of producing timely and accurate CRC prevalence projections. In this situation, population interest, as shown by Internet searches, can be very important for improving cancer statistics and, in the long run, for helping cancer research.
Methods This study aims to model, nowcast and forecast the CRC prevalence at the global level using a three-step framework that incorporates three well-established univariate statistical and machine-learning models. First, data mining is performed to evaluate the relevancy of Google Trends (GT) data as a surrogate for the number of CRC survivors. The results demonstrate that population web-search interest in the term "colonoscopy" is the most reliable indicator to nowcast CRC disease prevalence. Then, various statistical and machine-learning models, including ARIMA, ETS, and FNNAR, are trained and tested using relevant GT time series. Finally, the updated monthly query series spanning 2004-2022 and the best forecasting model in terms of out-of-sample forecasting ability (i.e., the neural network autoregression) are utilized to generate point forecasts up to 2025.
Results Results show that the number of people with colorectal cancer will continue to rise over the next 24 months. This in turn emphasizes the urgency for public policies aimed at reducing the population's exposure to the principal modifiable risk factors, such as lifestyle and nutrition. In addition, given the major drop in population interest in CRC during the first wave of the COVID-19 pandemic, the findings suggest that public health authorities should implement measures to increase cancer screening rates during pandemics. This in turn would deliver positive externalities, including the mitigation of the global burden and the enhancement of the quality of official statistics.
Affiliation(s)
- Cristiana Tudor
  - Bucharest University of Economic Studies, Bucharest, Romania
3
Tselebis A, Zabuliene L, Milionis C, Ilias I. Pandemic and precocious puberty - a Google trends study. World J Methodol 2023; 13:1-9. [PMID: 36684480 PMCID: PMC9850652 DOI: 10.5662/wjm.v13.i1.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 11/29/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023]
Abstract
BACKGROUND Recent publications from several countries have reported that more young people (mainly girls) are experiencing precocious puberty (PP)/menarche during the coronavirus disease 2019 pandemic compared to the past. This variation is attributed to the stress of confinement, lack of exercise, obesity and disturbed sleep patterns. A common feature of the relevant papers, however, is the small number of reported cases of PP. Studies have shown that searches for diseases on the internet also reflect to some extent the epidemiology of these diseases.
AIM To estimate, through internet searches for PP, any changes in the epidemiology of PP.
METHODS Using Google Trends, we assessed searches for 21 PP-related terms in English internationally (which practically dwarf searches in other languages) over the years 2017-2021. Additionally, we assessed local searches for selected terms, in English and local languages, in countries where a rise in PP has been reported. Searches were collected in Relative Search Volumes format and analyzed using Kendall’s Tau test, with a statistical significance threshold of P < 0.05.
RESULTS Internationally, searches for three PP-related terms showed no noticeable change over the study period, while searches for eight terms showed a decrease. An increase was found over time in searches for nine PP-related terms. Of the 17 searches in English and local languages, in countries where a rise in PP has been reported, 5 showed a significant increase over time.
CONCLUSION Over the study period, more than half of the search terms showed little change or declined. The discrepancy between internet searches for PP and the reported increase in the literature is striking. It would be expected that a true increase in the incidence of PP would also be aptly reflected in Google Trends. If our findings are valid, the literature may have been biased. The known secular trend of decreasing age of puberty may also have played a role.
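The trend test named in the METHODS of this abstract (Kendall's Tau on a relative search volume series) can be illustrated with a minimal sketch. This is not the authors' code: the function name `kendall_tau_trend` and the sample values are hypothetical, the statistic is tau-a (tied values count as neither concordant nor discordant), and the p-value uses the normal approximation without tie or continuity correction, which is only a rough guide for short or heavily tied series.

```python
import math


def kendall_tau_trend(series):
    """Kendall's tau-a of a series against time, with an approximate two-sided p-value.

    Assumptions (not from the source): ties are ignored in the variance,
    and the p-value comes from the large-sample normal approximation.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S = (number of later-higher pairs) - (number of later-lower pairs)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            if diff > 0:
                s += 1
            elif diff < 0:
                s -= 1

    tau = s / (n * (n - 1) / 2)
    variance = n * (n - 1) * (2 * n + 5) / 18  # variance of S with no ties
    z = s / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return tau, p_value


# Hypothetical yearly mean RSV values for one search term (illustration only).
rsv = [12, 15, 14, 18, 21, 20, 25, 27, 30, 29]
tau, p_value = kendall_tau_trend(rsv)
print(f"tau = {tau:.3f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

A positive tau with a p-value below the 0.05 threshold would correspond to what the abstract calls an increase over time; a negative one to a decrease.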
Affiliation(s)
- Athanasios Tselebis
  - Department of Psychiatry, “Sotiria” General Chest Diseases Hospital, Athens GR-11527, Greece
- Lina Zabuliene
  - Faculty of Medicine, Vilnius University, Vilnius LT-03101, Lithuania
- Charalampos Milionis
  - Department of Endocrinology, Elena Venizelou General and Maternity Hospital, Athens GR-11521, Greece
- Ioannis Ilias
  - Department of Endocrinology, Elena Venizelou General and Maternity Hospital, Athens GR-11521, Greece
4
Di Simone E, Panattoni N, De Giorgi A, Rodríguez-Muñoz PM, Bondanelli M, Rodríguez-Cortés FJ, López-Soto PJ, Giannetta N, Dionisi S, Di Muzio M, Fabbian F. Googling Insomnia, Light, Metabolism, and Circadian: A Population Interest Simple Report. Brain Sci 2022; 12:1683. [PMID: 36552143 PMCID: PMC9775449 DOI: 10.3390/brainsci12121683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 11/30/2022] [Accepted: 12/03/2022] [Indexed: 12/13/2022]
Abstract
Exposure to light at night, insomnia, and disrupted circadian patterns could be considered risk factors for developing noncommunicable diseases. Understanding the awareness of the general population about the abovementioned factors could be essential to predict noncommunicable diseases. This report aimed to investigate the general community's interest in circadian, insomnia, metabolism, and light using Google Trends, and to evaluate results from different geographic areas. Relative search volumes (RSVs) for the factors mentioned, filtered by the "Health" category, were collected between 2007 and 2021. Moreover, RSVs were analysed in five different European languages. Worldwide mean RSVs for "Circadian", "Insomnia", "Light", and "Metabolism" during the study period were 2%, 13.4%, 62.2%, and 10%, respectively. Across developed countries, search volumes for light, insomnia, and metabolism differed, suggesting a variable level of awareness. Limited knowledge about the circadian pattern of human activities was detected. The highest correlation coefficient was calculated. Our results suggest the potential role of extensive data analysis in understanding the public interest and awareness about these risk factors. Moreover, they should be read as a stimulus for researchers to use comprehensible language in order to reach comprehensive media coverage and so help prevent sleep and circadian system disturbances.
Affiliation(s)
- Emanuele Di Simone
  - Nursing, Technical, Rehabilitation, Assistance and Research Direction-IRCCS Istituti Fisioterapici Ospitalieri-IFO, 00144 Rome, Italy
- Nicolò Panattoni
  - Nursing, Technical, Rehabilitation, Assistance and Research Direction-IRCCS Istituti Fisioterapici Ospitalieri-IFO, 00144 Rome, Italy
- Alfredo De Giorgi
  - Clinica Medica Unit, University Hospital of Ferrara, 44124 Ferrara, Italy
- Pedro Manuel Rodríguez-Muñoz
  - Department of Nursing and Physiotherapy, Universidad de Salamanca, 37008 Salamanca, Spain
  - Department of Nursing, Instituto Maimónides de Investigación Biomédica de Córdoba, 14004 Córdoba, Spain
- Marta Bondanelli
  - Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Francisco José Rodríguez-Cortés
  - Department of Nursing, Instituto Maimónides de Investigación Biomédica de Córdoba, 14004 Córdoba, Spain
  - Department of Nursing, Pharmacology and Physiotherapy, Universidad de Córdoba, 14004 Córdoba, Spain
  - Hospital Universitario Reina Sofía de Córdoba, 14004 Córdoba, Spain
- Pablo Jesús López-Soto
  - Department of Nursing, Instituto Maimónides de Investigación Biomédica de Córdoba, 14004 Córdoba, Spain
  - Department of Nursing, Pharmacology and Physiotherapy, Universidad de Córdoba, 14004 Córdoba, Spain
  - Hospital Universitario Reina Sofía de Córdoba, 14004 Córdoba, Spain
- Noemi Giannetta
  - School of Nursing, UniCamillus-Saint Camillus International University of Health and Medical Sciences, 00131 Rome, Italy
- Sara Dionisi
  - Department of Clinical and Molecular Medicine, Sapienza University of Rome, 00185 Rome, Italy
- Marco Di Muzio
  - Department of Clinical and Molecular Medicine, Sapienza University of Rome, 00185 Rome, Italy
- Fabio Fabbian
  - Department of Nursing, Instituto Maimónides de Investigación Biomédica de Córdoba, 14004 Córdoba, Spain
5
Monkeypox Outbreak Reflecting Rising Search Trend and Concern in Nonendemic Countries: A Google Trend Analysis. Disaster Med Public Health Prep 2022; 17:e286. [PMID: 36245310 DOI: 10.1017/dmp.2022.243] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
6
Saegner T, Austys D. Forecasting and Surveillance of COVID-19 Spread Using Google Trends: Literature Review. Int J Environ Res Public Health 2022; 19:12394. [PMID: 36231693 PMCID: PMC9566212 DOI: 10.3390/ijerph191912394] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/19/2022] [Revised: 09/23/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
The probability of future Coronavirus Disease (COVID)-19 waves remains high, so COVID-19 surveillance and forecasting remain important. Online search engines harvest vast amounts of data from the general population in real time and make these data publicly accessible via such tools as Google Trends (GT). Therefore, the aim of this study was to review the literature about possible use of GT for COVID-19 surveillance and prediction of its outbreaks. We collected and reviewed articles about the possible use of GT for COVID-19 surveillance published in the first 2 years of the pandemic. The search yielded 54 publications, which were used in this review. The majority of the studies (83.3%) included in this review showed positive results of the possible use of GT for forecasting COVID-19 outbreaks. Most of the studies were performed in English-speaking countries (61.1%). The most frequently used keyword was "coronavirus" (53.7%), followed by "COVID-19" (31.5%) and "COVID" (20.4%). Many authors analyzed multiple countries (46.3%) and obtained the same results for the majority of them, thus showing the robustness of the chosen methods. Various methods including long short-term memory (3.7%), random forest regression (3.7%), Adaboost algorithm (1.9%), autoregressive integrated moving average, neural network autoregression (1.9%), and vector error correction modeling (1.9%) were used for the analysis. Most of the publications with positive results (72.2%) used data from the first wave of the COVID-19 pandemic. Later, search volumes declined even though incidence peaked. In most countries, the use of GT data proved beneficial for forecasting and surveillance of COVID-19 spread.
7
Amusa LB, Twinomurinzi H, Okonkwo CW. Modeling COVID-19 incidence with Google Trends. Front Res Metr Anal 2022; 7:1003972. [PMID: 36186843 PMCID: PMC9520600 DOI: 10.3389/frma.2022.1003972] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/26/2022] [Accepted: 08/30/2022] [Indexed: 11/13/2022]
Abstract
Infodemiologic methods could be used to enhance the modeling of infectious diseases. It is of interest to verify the utility of these methods using a Nigerian case study. We used Google Trends data to track COVID-19 incidences and assessed whether they could complement traditional data based solely on reported case numbers. Data on the Nigerian weekly COVID-19 cases spanning March 1, 2020, to May 31, 2021, were matched with internet search data from Google Trends. The reported weekly incidence numbers and the GT data were split into training and testing sets. ARIMA models were fitted to describe reported weekly COVID cases using the training set. Several COVID-related search terms were theoretically and empirically assessed for initial screening. The selected Google Trends (GT) variable was added to the ARIMA model as a regressor. Model forecasts, both with and without the GT variable, were compared with weekly cases in the test set over 13 weeks. Forecast accuracies were compared visually and using root mean squared error (RMSE) and mean absolute error (MAE). Statistical significance of the difference in predictions was determined with the two-sided Diebold-Mariano test. Preliminary results of contemporaneous correlations between COVID-related search terms and weekly COVID cases reveal “loss of smell,” “loss of taste,” “fever” (in order of magnitude) as significantly associated with the official cases. Predictions of the ARIMA model using solely reported case numbers resulted in an RMSE of 411.4 and an MAE of 354.9. The GT expanded model achieved better forecasting accuracy (RMSE = 388.7, MAE = 340.1). The corrected Akaike information criterion also favored the GT expanded model (869.4 vs. 872.2). The difference in predictive performances was significant when using a two-sided Diebold-Mariano test (DM = 6.75, p < 0.001) for the 13 weeks.
Google Trends data enhanced the predictive ability of a traditionally based model and should be considered a suitable method to enhance infectious disease modeling.
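The two accuracy measures this abstract uses to compare forecasts with and without the Google Trends regressor can be sketched as follows. This is an illustrative sketch only: the helper names `rmse` and `mae` and all numbers in the example are hypothetical, and the ARIMA fitting and the Diebold-Mariano test from the study are not reproduced here.

```python
import math


def _check(actual, predicted):
    # Both series must be non-empty and aligned week by week.
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    _check(actual, predicted)
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: average size of the miss, in case units."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly case counts and two hypothetical forecast series.
weekly_cases = [100, 120, 90, 110]
forecast_cases_only = [90, 130, 100, 100]   # model without search data
forecast_with_gt = [95, 125, 92, 108]       # model with a search regressor

for label, forecast in [("cases only", forecast_cases_only),
                        ("with GT", forecast_with_gt)]:
    print(f"{label}: RMSE = {rmse(weekly_cases, forecast):.2f}, "
          f"MAE = {mae(weekly_cases, forecast):.2f}")
```

Lower values on both measures for the second series would mirror the direction of the result reported in the abstract, although a formal test is still needed to judge whether the difference is significant.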
8
Ilias I, Milionis C, Koukkou E. COVID-19 and thyroid disease: An infodemiological pilot study. World J Methodol 2022; 12:99-106. [PMID: 35721248 PMCID: PMC9157630 DOI: 10.5662/wjm.v12.i3.99] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/23/2021] [Revised: 02/11/2022] [Accepted: 03/27/2022] [Indexed: 02/06/2023]
Abstract
BACKGROUND Google Trends searches for symptoms and/or diseases may reflect actual disease epidemiology. Recently, Google Trends searches for coronavirus disease 2019 (COVID-19)-associated terms have been linked to the epidemiology of COVID-19. Some studies have linked COVID-19 with thyroid disease.
AIM To assess COVID-19 cases per se vs COVID-19-associated Google Trends searches and thyroid-associated Google Trends searches.
METHODS We collected data on worldwide weekly Google Trends searches regarding “COVID-19”, “severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)”, “coronavirus”, “smell”, “taste”, “cough”, “thyroid”, “thyroiditis”, and “subacute thyroiditis” for 92 wk and worldwide weekly COVID-19 cases' statistics in the same time period. The study period was split in half (approximately corresponding to the preponderance of different SARS-COV-2 virus variants) and in each time period we performed cross-correlation analysis and mediation analysis.
RESULTS Significant positive cross-correlation function values were noted in both time periods. In more detail, COVID-19 cases per se were found to be associated with no lag with Google Trends searches for COVID-19 symptoms in the first time period and in the second time period to lead searches for symptoms, COVID-19 terms, and thyroid terms. COVID-19 cases per se were associated with thyroid-related searches in both time periods. In the second time period, the effect of “COVID-19” searches on “thyroid” searches was significantly mediated by COVID-19 cases (P = 0.048).
CONCLUSION Searches for a non-specific symptom or COVID-19 search terms mostly lead Google Trends thyroid-related searches, in the second time period. This time frame/sequence particularly in the second time period (noted by the preponderance of the SARS-COV-2 delta variant) lends some credence to associations of COVID-19 cases per se with (apparent) thyroid disease (via searches for them).
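The lead and lag relationships described in this abstract come from cross-correlation analysis, which can be sketched as a Pearson correlation computed at several time shifts. The sketch below is an assumption-laden illustration, not the study's procedure: the function names, the lag convention (a positive lag means the first series leads the second), and the data are all hypothetical, and no significance bounds or mediation analysis are computed.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for lag in [-max_lag, max_lag].

    A peak at a positive lag suggests x leads y by that many time steps.
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        if len(xs) >= 3:  # skip overlaps too short to be meaningful
            result[lag] = pearson(xs, ys)
    return result


# Hypothetical weekly series: searches copy cases with a two-week delay.
cases = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
searches = [0, 0] + cases[:-2]

correlations = lagged_correlations(cases, searches, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(f"strongest correlation at lag {best_lag}")
```

In this constructed example the peak falls at a positive lag because the second series was built as a delayed copy of the first, which is the pattern the abstract describes as cases leading searches.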
Affiliation(s)
- Ioannis Ilias
  - Department of Endocrinology, Diabetes & Metabolism, Elena Venizelou Hospital, Athens GR-11521, Greece
- Charalampos Milionis
  - Department of Endocrinology, Diabetes & Metabolism, Elena Venizelou Hospital, Athens GR-11521, Greece
- Eftychia Koukkou
  - Department of Endocrinology, Diabetes & Metabolism, Elena Venizelou Hospital, Athens GR-11521, Greece
9
Kumar S. The global impact of pandemics on world economy and public health response. Computational Approaches for Novel Therapeutic and Diagnostic Designing to Mitigate SARS-CoV-2 Infection 2022. [PMCID: PMC9300556 DOI: 10.1016/b978-0-323-91172-6.00022-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Since the dawn of the human era on earth, pandemics of human diseases have proved to be stumbling blocks to endeavors of growth and prosperity. There are historical mentions of pandemics and epidemics every few hundred years or so. The black plague, smallpox, cholera, plague, influenza, etc., have been reported variously in human history as reasons for considerable human misery in terms of both losses of lives and wealth. Pandemics have been estimated to affect the economy both positively and negatively. The current Coronavirus disease 2019 pandemic has brought to the forefront the need and means to explore global health catastrophes.