1. Mohseni-Takalloo S, Mohseni H, Mozaffari-Khosravi H, Mirzaei M, Hosseinzadeh M. The effect of data balancing approaches on the prediction of metabolic syndrome using non-invasive parameters based on random forest. BMC Bioinformatics 2024; 25:18. [PMID: 38212697 PMCID: PMC10782700 DOI: 10.1186/s12859-024-05633-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024] Open
Abstract
BACKGROUND Metabolic syndrome (MetS) is a cluster of metabolic abnormalities (including obesity, insulin resistance, hypertension, and dyslipidemia) that can be used to identify populations at risk for diabetes and cardiovascular diseases, the main causes of morbidity and mortality worldwide. A simple approach for diagnosing MetS without biochemical tests would be valuable. The present study aimed to predict MetS from non-invasive features using the random forest learning algorithm. In addition, to deal with the data imbalance that naturally exists in this type of data, the effect of two data balancing approaches, the Synthetic Minority Over-sampling Technique (SMOTE) and Random Splitting data balancing (SplitBal), on model performance was investigated. RESULTS The most important determinant for MetS prediction was waist circumference. When a random forest learning algorithm was applied to the imbalanced data, the trained models reached accuracies of 86.9% and 79.4% and sensitivities of 37.1% and 38.2% in men and women, respectively. The best results were obtained with the SplitBal data balancing technique: although the accuracy of the trained models decreased by 7.8% and 11.3%, their sensitivity improved significantly to 82.3% and 73.7% in men and women, respectively. CONCLUSIONS The random forest learning method, along with data balancing techniques, especially SplitBal, could create MetS prediction models with promising results that can be applied as a useful prognostic tool in health screening programs.
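The abstract names SplitBal without spelling out its mechanics. As the technique is usually described, the majority class is split into bins of roughly the minority-class size, and each bin is paired with the full minority class to give several balanced training subsets whose classifiers are later combined. The Python sketch below covers only that splitting step under this assumption; the function name `split_balance` and the toy record IDs are hypothetical and are not taken from the paper.

```python
import random


def split_balance(majority, minority, seed=0):
    """Return balanced (majority_bin, minority) pairs.

    Assumption: the number of bins is the rounded majority-to-minority
    ratio, so each bin is about as large as the minority class.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)  # random assignment of majority records to bins
    n_bins = max(1, round(len(shuffled) / len(minority)))
    # Striding by n_bins puts every majority record in exactly one bin.
    bins = [shuffled[i::n_bins] for i in range(n_bins)]
    return [(bin_, list(minority)) for bin_ in bins]


# Hypothetical toy data: 90 majority IDs and 30 minority IDs.
subsets = split_balance(list(range(90)), list(range(100, 130)))
print([(len(maj), len(mino)) for maj, mino in subsets])
```

One classifier would then be trained per subset and the predictions combined; how the combination is done is not stated in the abstract, so it is left out here.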
Affiliation(s)
- Sahar Mohseni-Takalloo
  - School of Public Health, Bam University of Medical Sciences, Bam, Iran
  - Research Center for Food Hygiene and Safety, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
  - Department of Nutrition, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Hadis Mohseni
  - Computer Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran
- Hassan Mozaffari-Khosravi
  - Research Center for Food Hygiene and Safety, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
  - Department of Nutrition, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Masoud Mirzaei
  - Yazd Cardiovascular Research Centre, Non-Communicable Diseases Research Institute, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Mahdieh Hosseinzadeh
  - Research Center for Food Hygiene and Safety, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
  - Department of Nutrition, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
2. Mohseni-Takalloo S, Mozaffari-Khosravi H, Mohseni H, Mirzaei M, Hosseinzadeh M. Metabolic syndrome prediction using non-invasive and dietary parameters based on a support vector machine. Nutr Metab Cardiovasc Dis 2024; 34:126-135. [PMID: 37949713 DOI: 10.1016/j.numecd.2023.08.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Revised: 07/14/2023] [Accepted: 08/21/2023] [Indexed: 11/12/2023]
Abstract
BACKGROUND AND AIMS Metabolic syndrome (MetS) is a widely used index for finding people at risk for chronic diseases, including cardiovascular disease and diabetes. Early detection of MetS is especially important in prevention programs. Building on previous studies that suggest machine learning methods are a valuable approach for diagnosing MetS, this study aimed to develop MetS prediction models based on support vector machine (SVM) algorithms, using non-invasive and low-cost (NI&LC) parameters as well as dietary parameters. METHODS AND RESULTS This population-based research was conducted on a large dataset of 4596 participants within the framework of the Shahedieh cohort study. An Extremely Randomized Trees Classifier was used to select the most effective features among NI&LC and dietary data. The prediction models were developed based on SVM algorithms, and their performance was assessed by accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1-score, and receiver operating characteristic curve. MetS was diagnosed in 14% of men and 22% of women. Among NI&LC features, waist circumference, body mass index, waist-to-height ratio, waist-to-hip ratio, systolic blood pressure, and diastolic blood pressure were the most predictive variables. Using NI&LC features yielded models with 78.4% and 63.5% accuracy and 81.2% and 75.3% sensitivity for men and women, respectively. Incorporating both NI&LC and dietary features improved the accuracy of the model in women by 3.7%. CONCLUSIONS SVM algorithms had promising potential for early detection of MetS relying on NI&LC parameters. These models can be used in prevention programs, clinical practice, and personal applications.
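The performance measures listed in this abstract all derive from the four cells of a binary confusion matrix. The Python sketch below shows the standard definitions; the function name `screening_metrics` and the example counts are hypothetical and do not come from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute common screening metrics from confusion-matrix counts.

    tp/fn: MetS cases predicted positive/negative.
    tn/fp: non-cases predicted negative/positive.
    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)  # recall on true MetS cases
    specificity = tn / (tn + fp)  # recall on non-cases
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical counts for an imbalanced screening sample of 200 people.
for name, value in screening_metrics(tp=30, fp=60, tn=100, fn=10).items():
    print(f"{name}: {value:.3f}")
```

With an imbalanced outcome, accuracy and sensitivity can move in opposite directions, which is why the abstract reports both.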
Affiliation(s)
- Sahar Mohseni-Takalloo
  - Research Center for Food Hygiene and Safety, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran; Department of Nutrition, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran; School of Public Health, Bam University of Medical Sciences, Bam, Iran
- Hassan Mozaffari-Khosravi
  - Research Center for Food Hygiene and Safety, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran; Department of Nutrition, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Hadis Mohseni
  - Computer Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran
- Masoud Mirzaei
  - Yazd Cardiovascular Research Centre, Non-communicable Disease Institute, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Mahdieh Hosseinzadeh
  - Research Center for Food Hygiene and Safety, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran; Department of Nutrition, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
3. Boitor O, Stoica F, Mihăilă R, Stoica LF, Stef L. Automated Machine Learning to Develop Predictive Models of Metabolic Syndrome in Patients with Periodontal Disease. Diagnostics (Basel) 2023; 13:3631. [PMID: 38132215 PMCID: PMC10743072 DOI: 10.3390/diagnostics13243631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/04/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023] Open
Abstract
The prevalence of metabolic syndrome is rising at a concerning rate. The link between metabolic syndrome and periodontal disease is a highly relevant area of research. Some studies have suggested a bidirectional relationship between metabolic syndrome and periodontal disease, where one condition may exacerbate the other. Furthermore, the existence of periodontal disease among these individuals significantly impacts overall health management. This research focuses on the relationship between periodontal disease and metabolic syndrome, while also incorporating data on general health status and overall well-being. We aimed to develop advanced machine learning models that efficiently identify key predictors of metabolic syndrome, with particular emphasis on thoroughly explaining the predictions generated by the models. We studied a group of 296 patients, hospitalized in SCJU Sibiu, aged 45 to 79 years, of whom 57% had metabolic syndrome. The patients underwent dental consultations and subsequently responded to a dedicated questionnaire, along with a standard EuroQol 5-Dimensions 5-Levels (EQ-5D-5L) questionnaire. The following data were recorded: DMFT (Decayed, Missing due to caries, and Filled Teeth), CPI (Community Periodontal Index), periodontal pockets depth, loss of epithelial insertion, bleeding after probing, frequency of tooth brushing, regular dental control, cardiovascular risk, carotid atherosclerosis, and EQ-5D-5L score. We used Automated Machine Learning (AutoML) frameworks to build predictive models in order to determine which of these risk factors exhibits the most robust association with metabolic syndrome. To gain confidence in the results of the machine learning models produced by the AutoML pipelines, we used SHapley Additive exPlanations (SHAP) values to interpret these models from both a global and a local perspective.
The obtained results confirm that the severity of periodontal disease, high cardiovascular risk, and low EQ-5D-5L score have the greatest impact on the occurrence of metabolic syndrome.
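SHAP values are built on Shapley values, which credit each feature with its average marginal contribution over all orders in which features can be added. The Python sketch below computes exact Shapley values by brute force for a toy scoring function. It does not use the SHAP library, and the feature names and weights in `toy_risk` are hypothetical illustrations, not results from the study.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every feature ordering.

    value_fn maps a frozenset of present features to a score.
    Only practical for a handful of features (n! orderings).
    """
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for feature in order:
            before = value_fn(frozenset(present))
            present.add(feature)
            after = value_fn(frozenset(present))
            totals[feature] += after - before  # marginal contribution
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_risk(present):
    """Hypothetical risk score with one interaction term."""
    score = 0.0
    if "periodontitis" in present:
        score += 0.2
    if "cv_risk" in present:
        score += 0.3
    if "periodontitis" in present and "cv_risk" in present:
        score += 0.1  # interaction, split evenly by the Shapley rule
    return score


values = shapley_values(toy_risk, ["periodontitis", "cv_risk", "brushing"])
for name, value in values.items():
    print(f"{name}: {value:.2f}")
```

The attributions sum to the difference between the score with all features and the score with none, which is the additivity property that makes SHAP usable for both local and global explanations.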
Affiliation(s)
- Ovidiu Boitor
  - Dental Medicine Research Center, Faculty of Medicine, “Lucian Blaga” University, 550024 Sibiu, Romania
- Florin Stoica
  - Department of Mathematics and Informatics, Research Center in Informatics and Information Technology, Faculty of Sciences, “Lucian Blaga” University, 550024 Sibiu, Romania
- Romeo Mihăilă
  - Department of Internal Medicine, Faculty of Medicine, “Lucian Blaga” University, 550024 Sibiu, Romania
- Laura Florentina Stoica
  - Department of Mathematics and Informatics, Research Center in Informatics and Information Technology, Faculty of Sciences, “Lucian Blaga” University, 550024 Sibiu, Romania
- Laura Stef
  - Department of Oral Health, Dental Medicine Research Center, Faculty of Medicine, “Lucian Blaga” University, 550024 Sibiu, Romania
4. Kim H, Heo JH, Lim DH, Kim Y. Development of a Metabolic Syndrome Classification and Prediction Model for Koreans Using Deep Learning Technology: The Korea National Health and Nutrition Examination Survey (KNHANES) (2013-2018). Clin Nutr Res 2023; 12:138-153. [PMID: 37214780 PMCID: PMC10193438 DOI: 10.7762/cnr.2023.12.2.138] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/01/2022] [Revised: 03/21/2023] [Accepted: 03/27/2023] [Indexed: 05/24/2023] Open
Abstract
The prevalence of metabolic syndrome (MetS) and its cost are increasing due to lifestyle changes and aging. This study aimed to develop a deep neural network model for prediction and classification of MetS according to nutrient intake and other MetS-related factors. This study included 17,848 individuals aged 40-69 years from the Korea National Health and Nutrition Examination Survey (2013-2018). We set MetS (3-5 risk factors present) as the dependent variable and 52 MetS-related factors and nutrient intake variables as independent variables in a regression analysis. The analysis compared accuracy, precision, and recall across conventional logistic regression, machine learning-based logistic regression, and deep learning. In the MetS classification and prediction model developed in this study, the accuracy on the training data was 81.2089 and the accuracy on the test data was 81.1485. These accuracies were higher than those obtained by conventional logistic regression or machine learning-based logistic regression. Precision, recall, and F1-score were also high for the deep learning model. Blood alanine aminotransferase (β = 12.2035) level showed the highest regression coefficient, followed by blood aspartate aminotransferase (β = 11.771) level, waist circumference (β = 10.8555), body mass index (β = 10.3842), and blood glycated hemoglobin (β = 10.1802) level. Fats (cholesterol [β = -2.0545] and saturated fatty acid [β = -2.0483]) showed high regression coefficients among nutrient intakes. The deep learning model for classification and prediction of MetS showed a higher accuracy than conventional logistic regression or machine learning-based logistic regression.
Affiliation(s)
- Hyerim Kim
  - Department of Food and Nutrition, Gyeongsang National University, Jinju 52828, Korea
- Ji Hye Heo
  - Department of Information & Statistics, Gyeongsang National University, Jinju 52828, Korea
- Dong Hoon Lim
  - Department of Information & Statistics, Research Institute of Natural Science (RINS), Gyeongsang National University, Jinju 52828, Korea
- Yoona Kim
  - Department of Food and Nutrition, Institute of Agriculture and Life Science, Gyeongsang National University, Jinju 52828, Korea
5. Jiang X, Yang Z, Wang S, Deng S. “Big Data” Approaches for Prevention of the Metabolic Syndrome. Front Genet 2022; 13:810152. [PMID: 35571045 PMCID: PMC9095427 DOI: 10.3389/fgene.2022.810152] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/06/2021] [Accepted: 03/28/2022] [Indexed: 11/21/2022] Open
Abstract
Metabolic syndrome (MetS) is characterized by the concurrence of multiple metabolic disorders, resulting in an increased risk of a variety of diseases related to disrupted metabolic homeostasis. The prevalence of MetS has reached a pandemic level worldwide. In recent years, an extensive amount of data has been generated by research targeting or related to the condition, using techniques including high-throughput screening and artificial intelligence. With these “big data”, the prevention of MetS could be pushed to an earlier stage using different data sources, data mining tools, and analytic tools at different levels. In this review we briefly summarize the recent advances in the study of “big data” applications in the three-level disease prevention for MetS, and illustrate how these technologies could contribute to better preventive strategies.
Affiliation(s)
- Xinping Jiang
  - Department of United Ultrasound, The First Hospital of Jilin University, Changchun, China
- Zhang Yang
  - Department of Vascular Surgery, The First Hospital of Jilin University, Changchun, China
- Shuai Wang
  - Department of Vascular Surgery, The First Hospital of Jilin University, Changchun, China
- Shuanglin Deng
  - Department of Oncological Neurosurgery, The First Hospital of Jilin University, Changchun, China
  - *Correspondence: Shuanglin Deng,
6. Kim J, Mun S, Lee S, Jeong K, Baek Y. Prediction of metabolic and pre-metabolic syndromes using machine learning models with anthropometric, lifestyle, and biochemical factors from a middle-aged population in Korea. BMC Public Health 2022; 22:664. [PMID: 35387629 PMCID: PMC8985311 DOI: 10.1186/s12889-022-13131-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/26/2021] [Accepted: 03/30/2022] [Indexed: 01/10/2023] Open
Abstract
Background Metabolic syndrome (MetS) is a complex condition that appears as a cluster of metabolic abnormalities, and is closely associated with the prevalence of various diseases. Early prediction of the risk of MetS in the middle-aged population provides greater benefits for cardiovascular disease-related health outcomes. This study aimed to apply the latest machine learning techniques to find the optimal MetS prediction model for the middle-aged Korean population. Methods We retrieved 20 data types from the Korean Medicine Daejeon Citizen Cohort, a cohort study on a community-based population of adults aged 30–55 years. The data included sex, age, anthropometric data, lifestyle-related data, and blood indicators of 1991 individuals. Participants satisfying two (pre-MetS) or ≥ 3 (MetS) of the five NCEP-ATP III criteria were included in the MetS group. MetS prediction used nine machine learning models based on the following algorithms: decision tree, Gaussian Naïve Bayes, K-nearest neighbor, eXtreme gradient boosting (XGBoost), random forest, logistic regression, support vector machine, multi-layer perceptron, and 1D convolutional neural network. All analyses were performed by sequentially inputting the features in three steps according to their characteristics. The models’ performances were compared after applying the synthetic minority oversampling technique (SMOTE) to resolve data imbalance. Results MetS was detected in 33.85% of the subjects. Among the MetS prediction models, the tree-based random forest and XGBoost models showed the best performance, which improved with the number of features used. As a measure of the models’ performance, the area under the receiver operating characteristic curve (AUC) increased by up to 0.091 when SMOTE was applied, with XGBoost showing the highest AUC of 0.851. Body mass index and waist-to-hip ratio were identified as the most important features in the MetS prediction models for this population.
Conclusions Tree-based machine learning models were useful in identifying MetS with high accuracy in middle-aged Koreans. Early diagnosis of MetS is important and requires a multidimensional approach that includes self-administered questionnaire, anthropometric, and biochemical measurements.
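SMOTE, used in this study to counter class imbalance, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The Python sketch below illustrates that interpolation step with the standard library only; the function name `smote_sketch`, its parameters, and the toy points are hypothetical, and a real analysis would more likely rely on an established implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from minority-class points.

    Each synthetic point lies on the line segment between a randomly
    chosen minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest neighbours of the base point by Euclidean distance
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority points in a two-feature space.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_sketch(points, n_new=5, k=2))
```

Oversampling of this kind should be applied to the training split only, so that synthetic points do not leak into the data used for evaluation.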
Affiliation(s)
- Junho Kim
  - KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea
- Sujeong Mun
  - KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea
- Siwoo Lee
  - KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea
- Kyoungsik Jeong
  - KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea
- Younghwa Baek
  - KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea