1
Wang X, Ren J, Ren H, Song W, Qiao Y, Zhao Y, Linghu L, Cui Y, Zhao Z, Chen L, Qiu L. Diabetes mellitus early warning and factor analysis using ensemble Bayesian networks with SMOTE-ENN and Boruta. Sci Rep 2023; 13:12718. [PMID: 37543637 PMCID: PMC10404250 DOI: 10.1038/s41598-023-40036-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 08/03/2023] [Indexed: 08/07/2023]
Abstract
Diabetes mellitus (DM) now ranks third among chronic non-infectious diseases affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and has become one of the major public health issues worldwide. Detecting early warning risk factors is key to the prevention of DM and has been the focus of some previous studies. Therefore, from the perspective of residents' self-management and prevention, this study constructed Bayesian networks (BNs) that combine feature screening with multiple resampling techniques, using class-imbalanced DM monitoring data from Shanxi Province, China, to detect risk factors in chronic disease monitoring programs and predict the risk of DM. First, univariate analysis and the Boruta feature selection algorithm were employed for preliminary screening of all included risk factors. Then, three resampling techniques, SMOTE, Borderline-SMOTE (BL-SMOTE) and SMOTE-ENN, were adopted to deal with data imbalance. Finally, BNs were constructed from the processed data with three structure-learning algorithms (Tabu, Hill-climbing and MMHC) to find the warning factors that strongly correlate with DM. The results showed that the accuracy of DM classification was significantly improved by the BNs constructed from the processed data. In particular, the BNs combined with SMOTE-ENN resampling improved the most, and the BNs constructed by the Tabu algorithm obtained better classification performance than those built with the Hill-climbing and MMHC algorithms. The best-performing joint Boruta-SMOTE-ENN-Tabu model showed that the risk factors of DM included family history, age, central obesity, hyperlipidemia, salt reduction, occupation, heart rate, and BMI.
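The abstract names SMOTE as one of the resampling techniques. As a rough illustration of the core idea only (new minority-class samples are synthesized by interpolating between a minority point and one of its nearest minority neighbours), a minimal pure-Python sketch might look as follows. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and not taken from the paper. The ENN cleaning step, Boruta screening, and BN structure learning are not shown.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class vectors.

    Illustrative only: plain SMOTE-style interpolation, no ENN cleaning.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (hypothetical values).
minority_points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority_points, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority points, it stays inside the range already spanned by the minority class. In practice a maintained library implementation would normally be preferred over a hand-written sketch like this one.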
Affiliation(s)
- Xuchun Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Jiahui Ren
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Hao Ren
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Wenzhu Song
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Yuchao Qiao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Ying Zhao
  - Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China
- Liqin Linghu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
  - Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China
- Yu Cui
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Zhiyang Zhao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Limin Chen
  - Shanxi Provincial People's Hospital, Taiyuan, Shanxi, China
- Lixia Qiu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
2
Zhu K, Kurowicka D. Regular vines with strongly chordal pattern of (conditional) independence. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
3
Quan D, Ren J, Ren H, Linghu L, Wang X, Li M, Qiao Y, Ren Z, Qiu L. Exploring influencing factors of chronic obstructive pulmonary disease based on elastic net and Bayesian network. Sci Rep 2022; 12:7563. [PMID: 35534641 PMCID: PMC9085890 DOI: 10.1038/s41598-022-11125-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/12/2021] [Accepted: 04/08/2022] [Indexed: 01/15/2023]
Abstract
This study aimed to construct Bayesian networks (BNs) to analyze the network relationships between COPD and its influencing factors, with the strength of each factor's influence on COPD reflected through network reasoning. Elastic Net and the Max-Min Hill-Climbing (MMHC) algorithm were adopted, respectively, to screen the variables in surveillance data on COPD among residents of Shanxi Province, China, from 2014 to 2015, and to construct the BNs. Ten variables finally entered the model after screening by Elastic Net. The BNs constructed by MMHC showed that smoking status, household air pollution, family history, cough, and air hunger or dyspnea were directly related to COPD, and that gender was indirectly linked to COPD through smoking status. Moreover, smoking status, household air pollution and family history were the parent nodes of COPD, while cough and air hunger or dyspnea were its child nodes. In other words, smoking status, household air pollution and family history were related to the occurrence of COPD, and COPD would make patients’ cough, air hunger or dyspnea worse. Overall, BNs could reveal the complex network linkages between COPD and its relevant factors well, making it more convenient to carry out targeted prevention and control of COPD.
4
Wang X, Pan J, Ren Z, Zhai M, Zhang Z, Ren H, Song W, He Y, Li C, Yang X, Li M, Quan D, Chen L, Qiu L. Application of a novel hybrid algorithm of Bayesian network in the study of hyperlipidemia related factors: a cross-sectional study. BMC Public Health 2021; 21:1375. [PMID: 34247609 PMCID: PMC8273956 DOI: 10.1186/s12889-021-11412-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/23/2020] [Accepted: 06/29/2021] [Indexed: 12/27/2022]
Abstract
Background: This article aims to understand the prevalence of hyperlipidemia and its related factors in Shanxi Province. Multivariate logistic regression analysis was used to find the influencing factors closely related to hyperlipidemia, and the complex network connections between the variables were then presented through Bayesian networks (BNs).
Methods: Logistic regression was used to screen for hyperlipidemia-related variables, and the complex network connections between the variables were then presented through BNs. Because the Max-Min Hill-Climbing (MMHC) hybrid algorithm has some notable drawbacks, three additional hybrid algorithms were proposed to construct the BN structure: MMPC-Tabu, Fast.iamb-Tabu and Inter.iamb-Tabu. To assess their performance, these three hybrid algorithms were compared with the widely used MMHC hybrid algorithm on randomly generated datasets. The optimized BN was then used to explore factors related to hyperlipidemia. The BN model was also compared with the logistic regression model.
Results: The BN constructed by the Inter.iamb-Tabu hybrid algorithm had the best fit to the benchmark networks and was used to construct the BN model of hyperlipidemia. Multivariate logistic regression analysis suggested that gender, smoking, central obesity, daily average salt intake, daily average oil intake, diabetes mellitus, hypertension and physical activity were associated with hyperlipidemia. The BN model of hyperlipidemia further showed that gender, BMI, and physical activity were directly related to the occurrence of hyperlipidemia; that hyperlipidemia was directly related to the occurrence of diabetes mellitus and hypertension; and that daily average salt intake, daily average oil intake, smoking, and central obesity were indirectly related to hyperlipidemia.
Conclusions: The BN of hyperlipidemia constructed by the Inter.iamb-Tabu hybrid algorithm is more reasonable and captures the overall linking effect between factors and diseases, revealing the direct and indirect factors associated with hyperlipidemia and the correlations between related variables. This can provide a new approach to the study of chronic diseases and their associated factors.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12889-021-11412-5.
Affiliation(s)
- Xuchun Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, Shanxi, China
- Jinhua Pan
  - Key Laboratory of Public Health Safety of Ministry of Education, School of Public Health, Fudan University, Shanghai, 200032, China
- Zeping Ren
  - Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China
- Mengmeng Zhai
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, Shanxi, China
- Zhuang Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, Shanxi, China
- Hao Ren
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, Shanxi, China
- Weimei Song
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, Shanxi, China
- Yuling He
  - Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China
- Chenglian Li
  - Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China
- Xiaojuan Yang
  - Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China
- Meichen Li
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, Shanxi, China
- Dichen Quan
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, Shanxi, China
- Limin Chen
  - Shanxi Provincial People's Hospital, Taiyuan city, Shanxi Province, China
- Lixia Qiu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, 030001, Shanxi, China
5
Müller D, Czado C. Dependence modelling in ultra high dimensions with vine copulas and the Graphical Lasso. Comput Stat Data Anal 2019. [DOI: 10.1016/j.csda.2019.02.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 10/27/2022]
6
Risk Measurement of Stock Markets in BRICS, G7, and G20: Vine Copulas versus Factor Copulas. Mathematics 2019. [DOI: 10.3390/math7030274] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Multivariate copulas have been widely used to handle risk in the financial market. This paper aimed to adopt two novel multivariate copulas, Vine copulas and Factor copulas, to measure and compare the financial risks of the emerging economy, developed economy, and global economy. In this paper, we used data from three groups (BRICS, which stands for emerging markets, specifically, those of Brazil, Russia, India, China, and South Africa; G7, which refers to developed countries; and G20, which represents the global market), separated into three periods (pre-crisis, crisis, and post-crisis) and weighed Value at Risk (VaR) and Expected Shortfall (ES) (based on their market capitalization) to compare among three copulas, C-Vine, D-Vine, and Factor copulas. Also, real financial data demonstrated that Factor copulas have stronger stability and perform better than the other two copulas in high-dimensional data. Moreover, we showed that BRICS has the highest risk and G20 has the lowest risk of the three groups.
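The abstract compares models through Value at Risk (VaR) and Expected Shortfall (ES). As a reminder of what these two measures compute, here is a minimal historical-simulation sketch. It is not the copula-based procedure of the paper, and conventions for empirical quantiles differ between texts. The function name `historical_var_es`, the confidence level, and the toy returns are assumptions made purely for illustration.

```python
import math


def historical_var_es(returns, level=0.95):
    """Return (VaR, ES) as positive loss numbers from a sample of returns.

    Illustrative convention: the tail is the worst (1 - level) share of
    observations, VaR is the smallest loss in that tail, ES is its mean.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    # Number of observations in the tail, at least one; the small epsilon
    # guards against floating-point error in (1 - level).
    n_tail = max(1, math.floor(len(losses) * (1 - level) + 1e-12))
    tail = losses[:n_tail]
    var = tail[-1]           # smallest loss inside the tail
    es = sum(tail) / n_tail  # average loss inside the tail
    return var, es


# Ten hypothetical daily returns; at the 80% level the tail holds two losses.
toy_returns = [0.01, -0.02, 0.005, -0.05, 0.03, -0.01, 0.02, -0.03, 0.0, 0.015]
var_80, es_80 = historical_var_es(toy_returns, level=0.8)
```

With this toy sample the two worst losses are 0.05 and 0.03, so the sketch gives a VaR of 0.03 and an ES of 0.04. ES is never smaller than VaR because it averages the losses at or beyond the VaR threshold.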
7
Affiliation(s)
- Vinnie Ko
  - Department of Mathematics, University of Oslo, Oslo, Norway
8
Affiliation(s)
- Harry Joe
  - Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4
9
Affiliation(s)
- Elif F. Acar
  - Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2
- Parisa Azimaee
  - Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2
- Md. Erfanul Hoque
  - Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2
10
Müller D, Czado C. Representing Sparse Gaussian DAGs as Sparse R-Vines Allowing for Non-Gaussian Dependence. J Comput Graph Stat 2018. [DOI: 10.1080/10618600.2017.1366911] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/19/2022]
Affiliation(s)
- Dominik Müller
  - Department of Mathematics, Technische Universität München, Garching, Germany
- Claudia Czado
  - Department of Mathematics, Technische Universität München, Garching, Germany
11
Zhu M, Liu S, Jiang J. A novel divergence for sensitivity analysis in Gaussian Bayesian networks. Int J Approx Reason 2017. [DOI: 10.1016/j.ijar.2017.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]