1
Zhou W, Huang D, Liang Q, Huang T, Wang X, Pei H, Chen S, Liu L, Wei Y, Qin L, Xie Y. Early warning and predicting of COVID-19 using zero-inflated negative binomial regression model and negative binomial regression model. BMC Infect Dis 2024; 24:1006. [PMID: 39300391 PMCID: PMC11414173 DOI: 10.1186/s12879-024-09940-7] [Received: 04/24/2024] [Accepted: 09/16/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND It is difficult to detect the outbreak of an emergent infectious disease with the existing surveillance system. Here we investigate the utility of the Baidu Search Index, an indicator of how heavily a keyword is searched on Baidu, for early warning and for predicting the epidemic trend of COVID-19.
METHODS The daily number of cases and the Baidu Search Index of 8 keywords (weighted by population) from December 1, 2019 to March 15, 2020 were collected and analyzed with time series and Spearman correlation at different time lags. To predict the daily number of COVID-19 cases from the Baidu Search Index, a zero-inflated negative binomial regression model was used in phase 1 and a negative binomial regression model was used in phases 2 and 3, based on the characteristics of the independent variable.
RESULTS The Baidu Search Index of all keywords was significantly higher in Wuhan than in Hubei (excluding Wuhan) and China (excluding Hubei). Before the causative pathogen was identified, the search volume of "Influenza" and "Pneumonia" in Wuhan increased with the number of new onset cases, with correlation coefficients of 0.69 and 0.59, respectively. After the pathogen was made public but before COVID-19 was classified as a notifiable disease, the search volume of "SARS", "Pneumonia" and "Coronavirus" in all study areas increased with the number of new onset cases, with correlation coefficients of 0.69 to 0.89, while "Influenza" became negatively correlated (rs: -0.56 to -0.64). After COVID-19 was closely monitored, the Baidu Search Index of "COVID-19", "Pneumonia", "Coronavirus", "SARS" and "Mask" could predict the epidemic trend with lead times of 15 days, 5 days and 6 days in Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei), respectively. From 21 January to 9 February, the predicted number of cases would be 1.84-fold and 4.81-fold higher than the actual number of cases in Wuhan and Hubei (excluding Wuhan), respectively.
CONCLUSION The Baidu Search Index could be used for early warning and for predicting the epidemic trend of COVID-19, but the informative search keywords changed from one period to the next. Considering the time lag from onset to diagnosis, especially in areas with a shortage of medical resources, internet search data can be a highly effective supplement to the existing surveillance system.
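The lagged Spearman correlation mentioned in the methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the toy series, and the lag range are all assumptions made for the example, and the numbers are not data from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def lagged_spearman(search, cases, lag):
    """Correlate search[t] with cases[t + lag], i.e. search leads by `lag` days."""
    if lag == 0:
        return spearman(search, cases)
    return spearman(search[:-lag], cases[lag:])


# Hypothetical toy series: the case counts echo the search index 2 days later.
search_index = [1, 3, 2, 8, 5, 13, 9, 21, 12, 30]
daily_cases = [0, 0] + search_index[:-2]

# Pick the lag (0 to 3 days here, an arbitrary range) with the highest correlation.
best_lag = max(range(0, 4), key=lambda k: lagged_spearman(search_index, daily_cases, k))
print(best_lag)
```

In this toy setup the lag with the highest coefficient is the one built into the data; with real surveillance data the coefficient would normally be inspected across lags rather than reduced to a single maximum.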
Affiliation(s)
- Wanwan Zhou
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
- Daizheng Huang
  - Institute of Life Science, Guangxi Medical University, Nanning, China
- Qiuyu Liang
  - Department of Health Management, The People's Hospital of Guangxi Zhuang Autonomous Region & Research Center of Health Management, Guangxi Academy of Medical Sciences, Nanning, China
- Tengda Huang
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
- Xiaomin Wang
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
- Hengyan Pei
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
- Shiwen Chen
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
- Lu Liu
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
- Yuxia Wei
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
- Litai Qin
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
- Yihong Xie
  - Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China
2
Yang K, Liu L, Wen Y. The impact of Bayesian optimization on feature selection. Sci Rep 2024; 14:3948. [PMID: 38366092 PMCID: PMC10873405 DOI: 10.1038/s41598-024-54515-w] [Received: 10/15/2023] [Accepted: 02/13/2024] [Indexed: 02/18/2024]
Abstract
Feature selection is an indispensable step in the analysis of high-dimensional molecular data. Despite its importance, consensus is lacking on how to choose the most appropriate feature selection method, especially when the performance of the method itself depends on hyper-parameters. Bayesian optimization has demonstrated its advantages in automatically configuring hyper-parameter settings for various models. However, it remains unclear whether Bayesian optimization can benefit feature selection methods. In this research, we conducted extensive simulation studies to compare the performance of various feature selection methods, with a particular focus on the impact of Bayesian optimization on those that require hyper-parameter tuning. We further used gene expression data from the Alzheimer's Disease Neuroimaging Initiative to predict various brain imaging-related phenotypes, applying various feature selection methods to mine the data. The simulation studies showed that feature selection methods with hyper-parameters tuned using Bayesian optimization often yield better recall rates, and the analysis of transcriptomic data further revealed that Bayesian optimization-guided feature selection can improve the accuracy of disease risk prediction models. In conclusion, Bayesian optimization can facilitate feature selection methods when hyper-parameter tuning is needed and has the potential to substantially benefit downstream tasks.
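As a rough illustration of the recall rate used to compare feature selection methods in simulations, the sketch below assumes recall means the fraction of truly relevant features that a method selects. That definition, the function name, and the gene labels are assumptions for the example; the abstract does not spell out the exact metric, and no Bayesian optimization is performed here.

```python
def selection_recall(selected, truly_relevant):
    """Fraction of truly relevant features that appear in the selected set."""
    relevant = set(truly_relevant)
    if not relevant:
        raise ValueError("truly_relevant must contain at least one feature")
    return len(relevant & set(selected)) / len(relevant)


# Hypothetical simulation: four features are known to drive the outcome.
true_features = ["g1", "g2", "g3", "g4"]

# Hypothetical selections from the same method under two hyper-parameter settings.
selected_default = ["g1", "g7", "g9"]
selected_tuned = ["g1", "g2", "g4", "g8"]

recall_default = selection_recall(selected_default, true_features)
recall_tuned = selection_recall(selected_tuned, true_features)
print(recall_default, recall_tuned)
```

Recall of this kind can only be computed when the relevant features are known, which is why it suits simulation studies; on real transcriptomic data the abstract evaluates downstream prediction accuracy instead.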
Affiliation(s)
- Kaixin Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, No 56 Xinjian South Road, Yingze District, Taiyuan, Shanxi, China
- Long Liu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, No 56 Xinjian South Road, Yingze District, Taiyuan, Shanxi, China
- Yalu Wen
  - Department of Statistics, University of Auckland, 38 Princes Street, Auckland Central, Auckland, 1010, New Zealand