1
Kamiński M, Czarny J, Skrzypczak P, Sienicki K, Roszak M. The Characteristics, Uses, and Biases of Studies Related to Malignancies Using Google Trends: Systematic Review. J Med Internet Res 2023; 25:e47582. [PMID: 37540544 PMCID: PMC10439473 DOI: 10.2196/47582] [Received: 03/25/2023] [Revised: 05/24/2023] [Accepted: 06/12/2023] [Indexed: 08/05/2023]
Abstract
BACKGROUND The internet is a primary source of health information for patients, supplementing physician care. Google Trends (GT), a popular tool, allows the exploration of public interest in health-related phenomena. Despite the growing volume of GT studies, none have focused explicitly on oncology, creating a need for a systematic review to bridge this gap. OBJECTIVE We aimed to systematically characterize studies related to oncology using GT to describe its uses and biases. METHODS We included all studies that used GT to analyze Google searches related to malignancies. We excluded studies written in languages other than English. The search was performed using the PubMed engine on August 1, 2022, with the following search input: "Google trends" AND ("oncology" OR "cancer" or "malignancy" OR "tumor" OR "lymphoma" OR "multiple myeloma" OR "leukemia"). We analyzed sources of bias, including the use of search terms instead of topics, the failure to compare GT statistics with real-world data, and the absence of sensitivity analysis. We performed descriptive statistics. RESULTS A total of 85 articles were included. The first study using GT for oncology research was published in 2013, and the number of publications has increased annually since then. The studies were categorized as follows: 22% (19/85) were related to prophylaxis, 20% (17/85) pertained to awareness events, 11% (9/85) were celebrity-related, 13% (11/85) were related to COVID-19, and 47% (40/85) fell into other categories. The most frequently analyzed cancers were breast (n=28), prostate (n=26), lung (n=18), and colorectal cancers (n=18). Of the 85 studies, 17 (20%) acknowledged using GT topics instead of search terms, 79 (93%) disclosed all search input details necessary for replicating their results, and 34 (40%) compared GT statistics with real-world data. The most prevalent methods for analyzing the GT data were correlation analysis (55/85, 65%) and peak analysis (43/85, 51%).
Only 11% (9/85) of the studies included a sensitivity analysis. CONCLUSIONS The number of oncology-related studies using GT data has increased annually. The studies included in this systematic review span a variety of topics, search strategies, and statistical methodologies. The most frequently analyzed cancers were breast, prostate, lung, colorectal, skin, and cervical cancers, potentially reflecting their prevalence in the population or public interest. Although most researchers provided reproducible search inputs, only one-fifth used GT topics instead of search terms, and many studies lacked a sensitivity analysis. Scientists using GT for medical research should ensure study quality by providing a transparent search strategy so that results can be reproduced, preferring topics over search terms, and performing robust statistical calculations coupled with sensitivity analysis.
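Correlation analysis and peak analysis were the most common ways the reviewed studies analyzed GT data, and the review recommends pairing them with a sensitivity analysis. The Python sketch below is illustrative only: the data are hypothetical, the leave-one-out check is just one simple form of sensitivity analysis, and the peak rule (a value above the mean plus k population standard deviations, with k=2 assumed) is an assumption, not a definition taken from the review.

```python
import math
import statistics
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series (e.g., GT interest vs. case counts)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two series of equal length (at least 3 points)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant series has zero variance, so r is undefined and this raises ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)


def leave_one_out_r(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Simple sensitivity analysis: recompute r with each time point dropped once."""
    out = []
    for i in range(len(x)):
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        out.append(pearson_r(xs, ys))
    return out


def find_peaks(series: Sequence[float], k: float = 2.0) -> List[int]:
    """Indices whose value exceeds mean + k * population SD (one of many possible peak rules)."""
    mu = statistics.mean(series)
    sd = statistics.pstdev(series)
    return [i for i, v in enumerate(series) if v > mu + k * sd]


if __name__ == "__main__":
    # Hypothetical monthly data: relative search interest and registered cases.
    gt = [40, 42, 45, 100, 47, 50, 52, 55]
    cases = [210, 215, 220, 224, 230, 236, 240, 245]
    print(round(pearson_r(gt, cases), 3))
    print([round(r, 3) for r in leave_one_out_r(gt, cases)])
    print(find_peaks(gt))
```

If the leave-one-out coefficients swing widely, a single month (for instance, one driven by an awareness event or a celebrity diagnosis) is carrying the correlation, which is exactly what a sensitivity analysis is meant to expose.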
Affiliation(s)
- Mikołaj Kamiński
- Department of Rheumatology, District Hospital in Kościan, Kościan, Poland
- Department of the Treatment of Obesity, Metabolic Disorders, and of Clinical Dietetics, Poznań University of Medical Sciences, Poznań, Poland
- Jakub Czarny
- Faculty of Medicine, Poznan University of Medical Sciences, Poznań, Poland
- Piotr Skrzypczak
- Department of Thoracic Surgery, Poznan University of Medical Sciences, Poznań, Poland
- Krzysztof Sienicki
- Department of Computer Science and Statistics, Poznan University of Medical Sciences, Poznań, Poland
- Magdalena Roszak
- Department of Computer Science and Statistics, Poznan University of Medical Sciences, Poznań, Poland
2
Neumann K, Mason SM, Farkas K, Santaularia NJ, Ahern J, Riddell CA. Harnessing Google Health Trends Data for Epidemiologic Research. Am J Epidemiol 2023; 192:430-437. [PMID: 36193858 PMCID: PMC9619602 DOI: 10.1093/aje/kwac171] [Received: 12/14/2021] [Revised: 08/25/2022] [Accepted: 09/30/2022] [Indexed: 01/21/2023]
Abstract
Interest in using internet search data, such as that from the Google Health Trends Application Programming Interface (GHT-API), to measure epidemiologically relevant exposures or health outcomes is growing due to their accessibility and timeliness. Researchers enter search term(s), geography, and time period, and the GHT-API returns a scaled probability of that search term, given all searches within the specified geographic-time period. In this study, we detailed a method for using these data to measure a construct of interest in 5 iterative steps: first, identify phrases the target population may use to search for the construct of interest; second, refine candidate search phrases with incognito Google searches to improve sensitivity and specificity; third, craft the GHT-API search term(s) by combining the refined phrases; fourth, test search volume and choose geographic and temporal scales; and fifth, retrieve and average multiple samples to stabilize estimates and address missingness. An optional sixth step involves accounting for changes in total search volume by normalizing. We present a case study examining weekly state-level child abuse searches in the United States during the coronavirus disease 2019 pandemic (January 2018 to August 2020) as an application of this method and describe limitations.
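Step 5 of this method (retrieving several samples and averaging them to stabilize estimates and address missingness) can be sketched in Python as follows. The sketch does not call the GHT-API; the function names, the use of None for missing values, and the toy numbers are assumptions for illustration, and the paper's own averaging procedure may differ in detail.

```python
from typing import List, Optional, Sequence

# Each retrieved sample is one series of values over the same time points;
# None marks a missing value (an assumption about how gaps are represented).
Sample = Sequence[Optional[float]]


def average_samples(samples: Sequence[Sample]) -> List[Optional[float]]:
    """Average repeated samples point by point, ignoring missing values."""
    if not samples:
        raise ValueError("need at least one sample")
    n_points = len(samples[0])
    if any(len(s) != n_points for s in samples):
        raise ValueError("all samples must cover the same time points")
    averaged: List[Optional[float]] = []
    for t in range(n_points):
        observed = [s[t] for s in samples if s[t] is not None]
        # A time point stays missing only if every sample is missing there.
        averaged.append(sum(observed) / len(observed) if observed else None)
    return averaged


def missing_share(series: Sequence[Optional[float]]) -> float:
    """Fraction of time points that are still missing after averaging."""
    return sum(v is None for v in series) / len(series)


if __name__ == "__main__":
    # Three hypothetical samples for the same term, state, and weeks.
    samples = [
        [0.8, None, 1.1, 0.9],
        [1.0, 1.2, None, 0.7],
        [0.9, None, 1.3, None],
    ]
    weekly = average_samples(samples)
    print(weekly, missing_share(weekly))
```

Checking the missing share after averaging is one way to judge, as in step 4, whether the chosen geographic and temporal scale yields enough search volume.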
Affiliation(s)
- Krista Neumann
- Division of Epidemiology, School of Public Health, University of California, Berkeley, Room #5404, 2121 Berkeley Way West, Berkeley, California, 94720 (corresponding author)
- Susan M Mason
- Division of Epidemiology and Community Health, University of Minnesota, Minnesota, United States
- Kriszta Farkas
- Division of Epidemiology, School of Public Health, University of California, Berkeley, United States
- Division of Epidemiology and Community Health, University of Minnesota, Minnesota, United States
- N Jeanie Santaularia
- Division of Epidemiology and Community Health, University of Minnesota, Minnesota, United States
- Jennifer Ahern
- Division of Epidemiology, School of Public Health, University of California, Berkeley, United States
- Corinne A Riddell
- Division of Epidemiology, School of Public Health, University of California, Berkeley, United States
- Division of Biostatistics, School of Public Health, University of California, Berkeley, United States
3
Raman spectroscopy combined with machine learning algorithms for rapid detection Primary Sjögren's syndrome associated with interstitial lung disease. Photodiagnosis Photodyn Ther 2022; 40:103057. [PMID: 35944848 DOI: 10.1016/j.pdpdt.2022.103057] [Received: 05/27/2022] [Revised: 07/15/2022] [Accepted: 08/05/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND Interstitial lung disease (ILD) is a major complication in patients with primary Sjögren's syndrome (pSS) and one of the main causes of death in this population. This study aimed to evaluate serum Raman spectroscopy (RS) combined with machine learning algorithms for the discriminatory diagnosis of patients with primary Sjögren's syndrome associated with interstitial lung disease (pSS-ILD). METHODS Raman spectroscopy was performed on serum samples from 30 patients with pSS, 28 patients with pSS-ILD, and 30 healthy controls (HC). The data were first preprocessed using baseline correction, smoothing, outlier removal, and normalization. Principal component analysis (PCA) was then used to reduce the dimensionality of the data. Finally, support vector machine (SVM), k-nearest neighbor (KNN), and random forest (RF) classification models were built. RESULTS The SVM used a polynomial kernel function (poly). The average accuracy, sensitivity, and precision of the three models were obtained after dimensionality reduction. The accuracy of SVM (poly) was 5.71% higher than that of KNN and 6.67% higher than that of RF; its sensitivity was 5.79% higher than KNN and 8.56% higher than RF; and its precision was 6.19% higher than KNN and 7.45% higher than RF. SVM (poly) therefore discriminated best among the three models, with an average accuracy, sensitivity, and precision of 89.52%, 91.27%, and 89.52%, respectively, and an AUC of 0.921. CONCLUSIONS Serum RS combined with machine learning algorithms is a valuable and promising tool for diagnosing patients with pSS-ILD.
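The reported accuracy, sensitivity, precision, and AUC can be computed from model predictions roughly as sketched below in Python. The sketch assumes macro-averaging over the three groups (pSS, pSS-ILD, HC) and a binary AUC for one class against the rest; the abstract does not state how its averages were formed (they may, for example, be averages over validation runs), so this is an illustration, not the authors' evaluation code.

```python
from typing import Dict, Hashable, Sequence


def macro_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Overall accuracy plus macro-averaged sensitivity (recall) and precision."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recalls, precisions = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        actual = sum(t == c for t in y_true)
        predicted = sum(p == c for p in y_pred)
        # Undefined ratios (no true or no predicted cases of class c) are scored as 0.
        recalls.append(tp / actual if actual else 0.0)
        precisions.append(tp / predicted if predicted else 0.0)
    return {
        "accuracy": accuracy,
        "sensitivity": sum(recalls) / len(labels),
        "precision": sum(precisions) / len(labels),
    }


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: chance a random positive (label 1) outscores a random negative; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels for six serum samples.
    truth = ["HC", "HC", "pSS", "pSS", "pSS-ILD", "pSS-ILD"]
    pred = ["HC", "pSS", "pSS", "pSS", "pSS-ILD", "HC"]
    print(macro_metrics(truth, pred))
```

The pairwise AUC is quadratic in the number of cases, which is fine for cohorts of a few dozen patients like this one.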
4
Gao C, Zhang R, Chen X, Yao T, Song Q, Ye W, Li P, Wang Z, Yi D, Wu Y. Integrating Internet multisource big data to predict the occurrence and development of COVID-19 cryptic transmission. NPJ Digit Med 2022; 5:161. [PMID: 36307547 DOI: 10.1038/s41746-022-00704-8] [Received: 02/02/2022] [Accepted: 10/07/2022] [Indexed: 11/09/2022]
Abstract
Given the recent spread of COVID-19, cryptic transmission warrants attention and research. Early perception of the risk that cryptic transmission will occur and develop is an important part of controlling the spread of COVID-19. Previous studies relied on limited data sources, and the occurrence and development of cryptic transmission have not been effectively analyzed. Hence, we collected Internet multisource big data (including search, migration, and media data) and proposed comprehensive and relative application strategies to eliminate the impact of national and media data. We used statistical classification and regression to construct early warning models for occurrence and development. Guided by the improved coronavirus herd immunity optimizer (ICHIO), we constructed a "sampling-feature-hyperparameter-weight" synchronous optimization strategy. For occurrence warning, we propose an undersampling synchronous evolutionary ensemble (USEE); for development warning, we propose a bootstrap-sampling synchronous evolutionary ensemble (BSEE). On the internal training data (Heilongjiang Province), the ROC-AUC of USEE3, which incorporates multisource data, was 0.9553, its PR-AUC was 0.8327, and the R² of BSEE2, fused by the "nonlinear + linear" method, was 0.8698. On the external validation data (Shaanxi Province), the ROC-AUC and PR-AUC values of USEE3 were 0.9680 and 0.9548, respectively, and the R² of BSEE2 was 0.8255. Our method shows good accuracy and generalization and can be flexibly applied to predicting cryptic transmission in various regions. We propose a strategy that integrates multiple early warning tasks based on multisource Internet big data and combines multiple ensemble models. This work extends research in traditional infectious disease monitoring and has practical significance and theoretical value.
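USEE and BSEE are the authors' own methods, and the abstract does not describe their internals or the ICHIO optimizer. The Python sketch below shows only the two generic resampling ideas their names refer to, random undersampling and bootstrap sampling, each feeding a small majority-vote ensemble. The nearest-centroid base learner, the function names, and the toy data are hypothetical stand-ins, not the authors' models or features, and the bootstrap variant is shown for classification only.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Data = Tuple[List[Row], List[int]]


def undersample(X: Sequence[Row], y: Sequence[int], rng: random.Random) -> Data:
    """Randomly keep as many rows per class as the smallest class has."""
    by_class: Dict[int, List[Row]] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            Xs.append(row)
            ys.append(label)
    return Xs, ys


def bootstrap(X: Sequence[Row], y: Sequence[int], rng: random.Random) -> Data:
    """Draw len(X) rows with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_centroids(X: Sequence[Row], y: Sequence[int]) -> Dict[int, Row]:
    """Hypothetical base learner: one mean feature vector (centroid) per class."""
    sums: Dict[int, Row] = {}
    counts = Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, v in enumerate(row):
            acc[k] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroids(centroids: Dict[int, Row], X: Sequence[Row]) -> List[int]:
    """Assign each row to the class with the nearest centroid (squared Euclidean distance)."""
    def dist2(a: Row, b: Row) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist2(centroids[c], row)) for row in X]


def resampled_ensemble(
    X: Sequence[Row],
    y: Sequence[int],
    X_new: Sequence[Row],
    resample: Callable[[Sequence[Row], Sequence[int], random.Random], Data] = undersample,
    n_models: int = 5,
    seed: int = 0,
) -> List[int]:
    """Fit one base model per resampled training set and combine them by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        Xs, ys = resample(X, y, rng)
        votes.append(predict_centroids(fit_centroids(Xs, ys), X_new))
    # zip(*votes) groups the n_models predictions made for each new row.
    return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]
```

Undersampling keeps each base model from being swamped by the far more common "no cryptic transmission" periods, while bootstrap sampling keeps all the data but varies it between models; an odd `n_models` avoids tied votes in the binary case.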
Affiliation(s)
- Chengcheng Gao
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- Rui Zhang
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- Xicheng Chen
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- Tianhua Yao
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- Qiuyue Song
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- Wei Ye
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- PengPeng Li
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- Zhenyan Wang
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- Dong Yi
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
- Yazhou Wu
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, 400038, China
5
Song S, Li Q, Shen L, Sun M, Yang Z, Wang N, Liu J, Liu K, Shao Z. From Outbreak to Near Disappearance: How Did Non-pharmaceutical Interventions Against COVID-19 Affect the Transmission of Influenza Virus? Front Public Health 2022; 10:863522. [PMID: 35425738 PMCID: PMC9001955 DOI: 10.3389/fpubh.2022.863522] [Received: 01/27/2022] [Accepted: 03/02/2022] [Indexed: 11/13/2022]
Abstract
Influenza shares its putative transmission pathway with coronavirus disease 2019 (COVID-19) and causes substantial morbidity and mortality worldwide every year. Since COVID-19 began spreading in China, a series of non-pharmaceutical interventions (NPIs) against the disease have been implemented to contain its transmission. Based on surveillance data on influenza, the Search Engine Index, and meteorological factors from 2011 to 2021 in Xi'an, together with the different levels of emergency response to COVID-19 from 2020 to 2021, a Bayesian structural time series model and interrupted time series analysis were applied to quantitatively assess the impact of NPIs in sequential phases of different intensity and to estimate the reduction in influenza infections. From 2011 to 2021, a total of 197,528 confirmed influenza cases were reported in Xi'an. The incidence of influenza increased continuously from 2011 to 2019, reaching 975.90 per 100,000 persons in 2019-2020; however, compared with 2019-2020, it fell sharply by 97.68% in 2020-2021 and by 87.22% in 2021. The greatest reduction in influenza was observed during the phase of strict NPI implementation, with an inclusion probability of 0.54. The weekly influenza incidence was reduced by 95.45%, and an estimated reduction of approximately 210,100 (95% CI: 125,100-329,500) influenza infections was found for the post-COVID-19 period. The reduction varied significantly across geographical areas, population groups, and time periods. Our findings demonstrate that NPIs against COVID-19 had a long-term impact on reducing influenza transmission.
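The abstract does not give the authors' model specifications. For orientation, a standard segmented-regression form of interrupted time series analysis is shown below, with Y_t the weekly influenza incidence, T_0 the first week of NPIs, and D_t an indicator for the intervention period. It is followed by the relative-reduction formula behind figures such as the 97.68% drop (I_pre and I_post are incidences in the reference and comparison periods) and, as an assumption about how counterfactual models such as a Bayesian structural time series are typically summarized, the cumulative gap between predicted no-NPI values and observed values. The authors' exact formulation may differ.

```latex
Y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - T_0) D_t + \varepsilon_t,
\qquad
D_t = \begin{cases} 0, & t < T_0 \\ 1, & t \ge T_0 \end{cases}

\text{relative reduction} = \left(1 - \frac{I_{\text{post}}}{I_{\text{pre}}}\right) \times 100\%

\Delta = \sum_{t \ge T_0} \left( \hat{Y}^{\,\text{cf}}_t - Y_t \right)
```

Here \beta_2 captures the immediate level change at T_0 and \beta_3 the change in slope afterwards, while \hat{Y}^{cf}_t is the model's prediction of incidence had NPIs not been introduced.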
Affiliation(s)
- Shuxuan Song
- Department of Epidemiology, Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, School of Public Health, Air Force Medical University, Xi'an, China
- Qian Li
- Department of Infectious Disease Control and Prevention, Xi'an Center for Disease Prevention and Control, Xi'an, China
- Li Shen
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China
- Minghao Sun
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China
- Zurong Yang
- Department of Epidemiology, Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, School of Public Health, Air Force Medical University, Xi'an, China
- Nuoya Wang
- School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China
- Jifeng Liu
- Department of Infectious Disease Control and Prevention, Xi'an Center for Disease Prevention and Control, Xi'an, China
- Kun Liu
- Department of Epidemiology, Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, School of Public Health, Air Force Medical University, Xi'an, China
- Zhongjun Shao
- Department of Epidemiology, Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, School of Public Health, Air Force Medical University, Xi'an, China
6
Oladeji O, Zhang C, Moradi T, Tarapore D, Stokes AC, Marivate V, Sengeh MD, Nsoesie EO. Monitoring Information-Seeking Patterns and Obesity Prevalence in Africa With Internet Search Data: Observational Study. JMIR Public Health Surveill 2021; 7:e24348. [PMID: 33913815 PMCID: PMC8120431 DOI: 10.2196/24348] [Received: 09/15/2020] [Revised: 02/12/2021] [Accepted: 02/23/2021] [Indexed: 01/16/2023]
Abstract
BACKGROUND The prevalence of chronic conditions such as obesity, hypertension, and diabetes is increasing in African countries. Many chronic diseases have been linked to risk factors such as poor diet and physical inactivity. Data for these behavioral risk factors are usually obtained from surveys, which can be delayed by years. Behavioral data from digital sources, including social media and search engines, could be used for timely monitoring of behavioral risk factors. OBJECTIVE Our objective was to propose the use of digital data from internet sources for monitoring changes in behavioral risk factors in Africa. METHODS We obtained the adjusted volume of search queries submitted to Google for 108 terms related to diet, exercise, and disease from 2010 to 2016. We also obtained the obesity and overweight prevalence for 52 African countries from the World Health Organization (WHO) for the same period. Machine learning algorithms (ie, random forest, support vector machine, Bayes generalized linear model, gradient boosting, and an ensemble of the individual methods) were used to identify search terms and patterns that correlate with changes in obesity and overweight prevalence across Africa. Out-of-sample predictions were used to assess and validate the model performance. RESULTS The study included 52 African countries. In 2016, the WHO reported an overweight prevalence ranging from 20.9% (95% credible interval [CI] 17.1%-25.0%) to 66.8% (95% CI 62.4%-71.0%) and an obesity prevalence ranging from 4.5% (95% CI 2.9%-6.5%) to 32.5% (95% CI 27.2%-38.1%) in Africa. The highest obesity and overweight prevalences were observed in the northern and southern regions. Google searches for diet-, exercise-, and obesity-related terms explained 97.3% (root-mean-square error [RMSE] 1.15) of the variation in obesity prevalence across all 52 countries. Similarly, the search data explained 96.6% (RMSE 2.26) of the variation in overweight prevalence.
The search terms yoga, exercise, and gym were most correlated with changes in obesity and overweight prevalence in countries with the highest prevalence. CONCLUSIONS Information-seeking patterns for diet- and exercise-related terms could indicate changes in attitudes toward and engagement in risk factors or healthy behaviors. These trends could capture population changes in risk factor prevalence, inform digital and physical interventions, and supplement official data from surveys.
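The out-of-sample variance explained (R²) and RMSE reported above can be computed as in the minimal Python sketch below. It assumes a single hypothetical search-interest predictor per country and a straight-line fit validated by leaving one country out at a time; the study itself used 108 search terms and machine learning ensembles, which this sketch does not reproduce.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Share of variance in y explained by predictions y_hat (can be negative out of sample)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root-mean-square error, in the units of y (here, prevalence in percentage points)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def leave_one_out_predictions(x: List[float], y: List[float]) -> List[float]:
    """Predict each country from a line fitted to all the other countries."""
    preds = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(b0 + b1 * x[i])
    return preds


if __name__ == "__main__":
    # Hypothetical search interest and obesity prevalence (%) for five countries.
    interest = [12.0, 20.0, 31.0, 45.0, 60.0]
    prevalence = [5.1, 8.0, 12.9, 18.2, 24.7]
    preds = leave_one_out_predictions(interest, prevalence)
    print(round(r_squared(prevalence, preds), 3), round(rmse(prevalence, preds), 2))
```

Computing R² on held-out predictions rather than on the fitted values is what makes it an out-of-sample measure; in-sample R² would overstate how well search data track prevalence.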
Affiliation(s)
- Olubusola Oladeji
- Department of Global Health, School of Public Health, Boston University, Boston, MA, United States
- Chi Zhang
- Department of Computer Science, Boston University, Boston, MA, United States
- Tiam Moradi
- Department of Computer Science, Boston University, Boston, MA, United States
- Dharmesh Tarapore
- Department of Computer Science, Boston University, Boston, MA, United States
- Andrew C Stokes
- Department of Global Health, School of Public Health, Boston University, Boston, MA, United States
- Vukosi Marivate
- Department of Computer Science, University of Pretoria, Pretoria, South Africa
- Moinina D Sengeh
- Directorate of Science, Technology and Innovation, Freetown, Sierra Leone
- Elaine O Nsoesie
- Department of Global Health, School of Public Health, Boston University, Boston, MA, United States