1
Shin D. Prediction of metabolic syndrome using machine learning approaches based on genetic and nutritional factors: a 14-year prospective-based cohort study. BMC Med Genomics 2024; 17:224. [PMID: 39232768 PMCID: PMC11373243 DOI: 10.1186/s12920-024-01998-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 08/28/2024] [Indexed: 09/06/2024] Open
Abstract
INTRODUCTION Metabolic syndrome is a chronic disease associated with multiple comorbidities. Over the last few years, machine learning techniques have been used to predict metabolic syndrome. However, studies incorporating demographic, clinical, laboratory, dietary, and genetic factors to predict the incidence of metabolic syndrome in Koreans are limited. In the present study, we propose a genome-wide polygenic risk score for the prediction of metabolic syndrome, along with other factors, to improve the prediction accuracy of metabolic syndrome. METHODS We developed 7 machine learning-based models and used Cox multivariable regression, deep neural network (DNN), support vector machine (SVM), stochastic gradient descent (SGD), random forest (RAF), Naïve Bayes (NBA) classifier, and AdaBoost (ADB) to predict the incidence of metabolic syndrome at year 14 using the dataset from the Korean Genome and Epidemiology Study (KoGES) Ansan and Ansung. RESULTS Of the 5,440 patients, 2,120 were considered to have new-onset metabolic syndrome. The AUC values of the model, which included sex, age, alcohol intake, energy intake, marital status, education status, income status, smoking status, dried laver intake, and genome-wide polygenic risk score (gPRS) Z-score based on 344,447 SNPs (p-value < 1.0), were the highest for RAF (0.994 [95% CI 0.985, 1.000]) and ADB (0.994 [95% CI 0.986, 1.000]). CONCLUSIONS Incorporating both gPRS and demographic, clinical, laboratory, and seaweed data led to enhanced metabolic syndrome risk prediction by capturing the distinct etiologies of metabolic syndrome development. The RAF- and ADB-based models predicted metabolic syndrome more accurately than the NBA-based model for the Korean population.
Affiliation(s)
- Dayeon Shin
- Department of Food and Nutrition, Inha University, Incheon, 22212, Republic of Korea.
2
Xiaoxue W, Zijun W, Shichen C, Mukun Y, Yi C, Linqing M, Wenpei B. Risk prediction model of metabolic syndrome in perimenopausal women based on machine learning. Int J Med Inform 2024; 188:105480. [PMID: 38754284 DOI: 10.1016/j.ijmedinf.2024.105480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 11/24/2023] [Accepted: 05/08/2024] [Indexed: 05/18/2024]
Abstract
INTRODUCTION Metabolic syndrome (MetS) is considered an important parameter of cardio-metabolic health and contributes to the development of atherosclerosis and type 2 diabetes. The incidence of MetS increases significantly in postmenopausal women; therefore, the perimenopausal period is considered a critical phase for prevention. We aimed to use four machine learning methods to predict whether perimenopausal women will develop MetS within 2 years. METHODS Women aged 45-55 years who underwent 2 consecutive years of physical examinations in Ninth Clinical College of Peking University between January 2021 and December 2022 were included. We extracted 26 features from physical examinations and used a backward selection method to select the top 10 features with the largest area under the receiver operating characteristic curve (AUC). Extreme gradient boosting (XGBoost), random forest (RF), multilayer perceptron (MLP), and logistic regression (LR) were used to establish the model. Model performance was measured by AUC, accuracy, precision, recall, and F1 score. SHapley Additive exPlanation (SHAP) values were used to identify risk factors affecting perimenopausal MetS. RESULTS A total of 8700 women had physical examination records, and 2,254 women finally met the inclusion criteria. For predicting MetS events, RF and XGBoost had the highest AUC (0.96 and 0.95, respectively). XGBoost had the highest F1 value (F1 = 0.77), followed by RF, LR, and MLP. SHAP values suggested that the top 5 variables affecting MetS in this study were waist circumference, fasting blood glucose, high-density lipoprotein cholesterol, triglycerides, and diastolic blood pressure, respectively. CONCLUSION We have developed a targeted MetS risk prediction model for perimenopausal women using health examination data. This model enables early identification of high MetS risk in this group, offering significant benefits for individual health management and wider socio-economic health initiatives.
Affiliation(s)
- Wang Xiaoxue
- Department of Obstetrics and Gynecology, Peking University Ninth School of Clinical Medicine, Beijing Shijitan Hospital, Beijing 100038, China
- Wang Zijun
- Department of Obstetrics and Gynecology, Peking University Ninth School of Clinical Medicine, Beijing Shijitan Hospital, Beijing 100038, China
- Chen Shichen
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yang Mukun
- Department of Obstetrics and Gynecology, Peking University Ninth School of Clinical Medicine, Beijing Shijitan Hospital, Beijing 100038, China
- Chen Yi
- Department of Obstetrics and Gynecology, Peking University Ninth School of Clinical Medicine, Beijing Shijitan Hospital, Beijing 100038, China
- Miao Linqing
- Beijing Advanced Innovation Center for Intelligent Robots and Systems, Beijing Institute of Technology, Beijing 100081, China
- Bai Wenpei
- Department of Obstetrics and Gynecology, Peking University Ninth School of Clinical Medicine, Beijing Shijitan Hospital, Beijing 100038, China.
3
Gutiérrez-Esparza G, Martinez-Garcia M, Ramírez-delReal T, Groves-Miralrio LE, Marquez MF, Pulido T, Amezcua-Guerra LM, Hernández-Lemus E. Sleep Quality, Nutrient Intake, and Social Development Index Predict Metabolic Syndrome in the Tlalpan 2020 Cohort: A Machine Learning and Synthetic Data Study. Nutrients 2024; 16:612. [PMID: 38474741 DOI: 10.3390/nu16050612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 02/16/2024] [Accepted: 02/19/2024] [Indexed: 03/14/2024] Open
Abstract
This study investigated the relationship between Metabolic Syndrome (MetS), sleep disorders, the consumption of some nutrients, and social development factors, focusing on gender differences in an unbalanced dataset from a Mexico City cohort. We used data balancing techniques like SMOTE and ADASYN after employing machine learning models like random forest and RPART to predict MetS. Random forest performed best, reaching a balanced accuracy of approximately 87%, which indicates its robustness in predicting MetS. Key predictors for men included body mass index and family history of gout, while waist circumference and glucose levels were most significant for women. In relation to diet, sleep quality, and social development, metabolic syndrome in men was associated with high lactose and carbohydrate intake, educational lag, living with a partner without marrying, and lack of durable goods, whereas in women, the best predictors in these dimensions included protein, fructose, and cholesterol intake, copper metabolites, snoring, sobbing, drowsiness, sanitary adequacy, and anxiety. These findings underscore the need for personalized approaches in managing MetS and point to a promising direction for future research into the interplay between social factors, sleep disorders, and metabolic health, which mainly depend on nutrient consumption by region.
Affiliation(s)
- Guadalupe Gutiérrez-Esparza
- Researcher for Mexico CONAHCYT, National Council of Humanities, Sciences and Technologies, Mexico City 08400, Mexico
- Clinical Research, National Institute of Cardiology 'Ignacio Chávez', Mexico City 14080, Mexico
- Mireya Martinez-Garcia
- Department of Immunology, National Institute of Cardiology 'Ignacio Chávez', Mexico City 14080, Mexico
- Tania Ramírez-delReal
- Center for Research in Geospatial Information Sciences, Aguascalientes 20313, Mexico
- Manlio F Marquez
- Department of Electrocardiology, National Institute of Cardiology 'Ignacio Chávez', Mexico City 14080, Mexico
- Tomás Pulido
- Cardiopulmonary Department, National Institute of Cardiology 'Ignacio Chávez', Mexico City 14080, Mexico
- Luis M Amezcua-Guerra
- Department of Immunology, National Institute of Cardiology 'Ignacio Chávez', Mexico City 14080, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de Mexico, Mexico City 04510, Mexico
4
Chen Q, Chen Z, Zhu X, Zhuang J, Yao L, Zheng H, Li J, Xia T, Lin J, Huang J, Zeng Y, Fan C, Fan J, Song D, Zhang Y. Artificial neural network-based model for sleep quality prediction for frontline medical staff during major medical assistance. Digit Health 2024; 10:20552076241287363. [PMID: 39398893 PMCID: PMC11467980 DOI: 10.1177/20552076241287363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024] Open
Abstract
Background: The sleep quality of medical staff was severely affected during COVID-19, but the factors influencing the sleep quality of frontline staff involved in medical assistance remained unclear, and screening tools for their sleep quality were lacking. Methods: From June 25 to July 14, 2022, we conducted an Internet-based cross-sectional survey. The Pittsburgh Sleep Quality Index (PSQI), a self-designed general information questionnaire, and a questionnaire regarding the factors influencing sleep quality were combined to understand the sleep quality of frontline medical staff in Fujian Province supporting Shanghai in the past month. A chi-square test was used to compare participant characteristics, and multivariate unconditional logistic regression analysis was used to determine the predictors of sleep quality. Stratified sampling was used to divide the data into a training test set (n = 1061, 80%) and an independent validation set (n = 265, 20%). Six models were developed and validated using logistic regression, artificial neural network, gradient augmented tree, random forest, naive Bayes, and model decision tree. Results: A total of 1326 frontline medical staff were included in this survey, with a mean PSQI score of 11.354 ± 4.051. The prevalence of poor sleep quality was 80.8% (n = 1072, PSQI >7). Six variables related to sleep quality were used as parameters in the prediction model, including type of work, professional job title, work shift, weight change, tea consumption during assistance, and basic diseases. The artificial neural network (ANN) model produced the best overall performance with area under the curve, accuracy, sensitivity, specificity, precision, F1 score, and kappa of 71.6%, 68.7%, 66.7%, 69.2%, 34.0%, 45.0%, and 26.2% respectively. 
Conclusions: In this study, the ANN model, which demonstrated excellent predictive efficiency, showed potential for application in monitoring the sleep quality of medical staff and may provide scientific guidance for early intervention.
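The precision, sensitivity, and F1 score reported for the ANN model can be cross-checked against each other, because F1 is the harmonic mean of precision and recall (sensitivity). The following is a minimal Python sketch of that arithmetic; the function name `f1_score` is illustrative and is not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the ANN model in the abstract above.
precision = 0.340
sensitivity = 0.667  # sensitivity is the same quantity as recall

f1 = f1_score(precision, sensitivity)
print(f"F1 = {f1:.3f}")
```

Under these inputs the result rounds to 0.450, which agrees with the reported F1 score of 45.0%.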
Affiliation(s)
- Qingquan Chen
- The Sleep Disorder Medicine Center of the Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- The School of Public Health, Fujian Medical University, Fuzhou, Fujian Province, China
- Zeshun Chen
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, Fujian Province, China
- Xi Zhu
- The School of Public Health, Fujian Medical University, Fuzhou, Fujian Province, China
- Jiajing Zhuang
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, Fujian Province, China
- Ling Yao
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, Fujian Province, China
- Huaxian Zheng
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, Fujian Province, China
- Jiaxin Li
- Anyang University, Anyang, Henan Province, China
- Tian Xia
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, Fujian Province, China
- Jiayi Lin
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, Fujian Province, China
- Jiewei Huang
- The Graduate School of Fujian Medical University, Fuzhou, Fujian Province, China
- Yifu Zeng
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, Guangdong Province, China
- Chunmei Fan
- The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Jimin Fan
- The Sleep Disorder Medicine Center of the Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Duanhong Song
- The Sleep Disorder Medicine Center of the Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
- Yixiang Zhang
- The Sleep Disorder Medicine Center of the Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian Province, China
5
Shin H, Shim S, Oh S. Machine learning-based predictive model for prevention of metabolic syndrome. PLoS One 2023; 18:e0286635. [PMID: 37267302 DOI: 10.1371/journal.pone.0286635] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/03/2023] [Accepted: 05/19/2023] [Indexed: 06/04/2023] Open
Abstract
Metabolic syndrome (MetS) is a chronic disease caused by obesity, high blood pressure, high blood sugar, and dyslipidemia and may lead to cardiovascular disease or type 2 diabetes. Therefore, the detection and prevention of MetS at an early stage are imperative. Individuals can detect MetS early and manage it effectively if they can easily monitor their health status in their daily lives. In this study, a predictive model for MetS was developed utilizing solely noninvasive information, thereby facilitating its practical application in real-world scenarios. The model's construction deliberately excluded three features requiring blood testing, specifically those for triglycerides, blood sugar, and HDL cholesterol. We used a large-scale Korean health examination dataset (n = 70,370; the prevalence of MetS = 13.6%) to develop the predictive model. To obtain informative features, we developed three novel synthetic features from four basic pieces of information: waist circumference, systolic and diastolic blood pressure, and gender. We tested several classification algorithms and confirmed that the decision tree model is the most appropriate for the practical prediction of MetS. The proposed model achieved good performance, with an AUC of 0.889, a recall of 0.855, and a specificity of 0.773. It uses only four base features, which results in simplicity and easy interpretability of the model. In addition, we performed calibrations on the prediction probability and calibrated the model. Therefore, the proposed model can provide MetS diagnosis and risk prediction results. We also proposed a MetS risk map such that individuals could easily determine whether they had metabolic syndrome.
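The three performance figures quoted above (AUC, recall, and specificity) measure different things: AUC is threshold-free, while recall and specificity depend on a chosen cut-off. The sketch below shows one way each could be computed from predicted scores. The labels, scores, threshold, and function names are made-up illustrations and are not data or code from the study.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_specificity(labels, scores, threshold):
    """Recall and specificity when scores >= threshold are classified as positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = MetS, 0 = no MetS.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]

auc = auc_pairwise(labels, scores)
recall, specificity = recall_specificity(labels, scores, threshold=0.5)
print(auc, recall, specificity)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based formulation would be preferable for a dataset of the size described above.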
Affiliation(s)
- Hyunseok Shin
- Department of Computer Science, Dankook University, Youngin, South Korea
- Simon Shim
- Department of Applied Data Science, San José State University, San Jose, CA, United States of America
- Sejong Oh
- Department of Software Science, Dankook University, Youngin, South Korea
6
Zheng W, Chen Q, Yao L, Zhuang J, Huang J, Hu Y, Yu S, Chen T, Wei N, Zeng Y, Zhang Y, Fan C, Wang Y. Prediction Models for Sleep Quality Among College Students During the COVID-19 Outbreak: Cross-sectional Study Based on the Internet New Media. J Med Internet Res 2023; 25:e45721. [PMID: 36961495 PMCID: PMC10131726 DOI: 10.2196/45721] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/14/2023] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 02/18/2023] Open
Abstract
BACKGROUND COVID-19 has been reported to affect the sleep quality of Chinese residents; however, the epidemic's effects on the sleep quality of college students during closed-loop management remain unclear, and a screening tool is lacking. OBJECTIVE This study aimed to understand the sleep quality of college students in Fujian Province during the epidemic and determine sensitive variables, in order to develop an efficient prediction model for the early screening of sleep problems in college students. METHODS From April 5 to 16, 2022, a cross-sectional internet-based survey was conducted. The Pittsburgh Sleep Quality Index (PSQI) scale, a self-designed general data questionnaire, and the sleep quality influencing factor questionnaire were used to understand the sleep quality of respondents in the previous month. A chi-square test and a multivariate unconditioned logistic regression analysis were performed, and influencing factors obtained were applied to develop prediction models. The data were divided into a training-testing set (n=14,451, 70%) and an independent validation set (n=6194, 30%) by stratified sampling. Four models using logistic regression, an artificial neural network, random forest, and naïve Bayes were developed and validated. RESULTS In total, 20,645 subjects were included in this survey, with a mean global PSQI score of 6.02 (SD 3.112). The sleep disturbance rate was 28.9% (n=5972, defined as a global PSQI score >7 points). A total of 11 variables related to sleep quality were taken as parameters of the prediction models, including age, gender, residence, specialty, respiratory history, coffee consumption, stay up, long hours on the internet, sudden changes, fears of infection, and impatient closed-loop management. 
Among the generated models, the artificial neural network model proved to be the best, with an area under the curve, accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 0.713, 73.52%, 25.51%, 92.58%, 57.71%, and 75.79%, respectively. It is noteworthy that the logistic regression, random forest, and naive Bayes models achieved high specificities of 94.41%, 94.77%, and 86.40%, respectively. CONCLUSIONS The COVID-19 containment measures affected the sleep quality of college students on multiple levels, indicating that it is desirable to provide targeted university management and social support. The artificial neural network model has presented excellent predictive efficiency and is favorable for implementing measures earlier in order to improve present conditions.
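The 70%/30% division by stratified sampling described in the methods can be illustrated with a short Python sketch. It assumes a simple per-class shuffle-and-cut scheme, which may differ from the procedure the authors used; the function name, seed, and synthetic label vector are hypothetical, and only the totals (20,645 subjects, 5972 with PSQI > 7) come from the abstract.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction, seed=0):
    """Split indices so that each class contributes the same fraction to the hold-out set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, holdout = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_holdout = round(len(indices) * holdout_fraction)
        holdout.extend(indices[:n_holdout])
        train.extend(indices[n_holdout:])
    return train, holdout


# Synthetic labels with the reported class balance: 1 = sleep disturbance (PSQI > 7).
labels = [1] * 5972 + [0] * (20645 - 5972)

train_idx, valid_idx = stratified_split(labels, holdout_fraction=0.30)
valid_prevalence = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), round(valid_prevalence, 3))
```

With per-class rounding, this sketch yields 14,451 training-testing indices and 6194 validation indices, matching the set sizes reported above, and the validation prevalence stays close to the overall 28.9%.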
Affiliation(s)
- Wanyu Zheng
- The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Qingquan Chen
- The School of Public Health, Fujian Medical University, Fuzhou, China
- The Graduate School of Fujian Medical University, Fuzhou, China
- Ling Yao
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, China
- Jiajing Zhuang
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, China
- Jiewei Huang
- The Graduate School of Fujian Medical University, Fuzhou, China
- Yiming Hu
- The School of Public Health, Fujian Medical University, Fuzhou, China
- Shaoyang Yu
- The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Tebin Chen
- The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Nan Wei
- The School of Clinical Medicine, Fujian Medical University, Fuzhou, China
- Yifu Zeng
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
- Yixiang Zhang
- The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Chunmei Fan
- The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Youjuan Wang
- The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
7
Cabeza-Ramírez LJ, Rey-Carmona FJ, Del Carmen Cano-Vicente M, Solano-Sánchez MÁ. Analysis of the coexistence of gaming and viewing activities in Twitch users and their relationship with pathological gaming: a multilayer perceptron approach. Sci Rep 2022; 12:7904. [PMID: 35551493 PMCID: PMC9098150 DOI: 10.1038/s41598-022-11985-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Accepted: 05/03/2022] [Indexed: 11/24/2022] Open
Abstract
The enormous expansion of the video game sector, driven by the emergence of live video game streaming platforms and the professionalisation of this hobby through e-sports, has spurred interest in research on the relationships with potential adverse effects derived from cumulative use. This study explores the co-occurrence of the consumption and viewing of video games, based on an analysis of the motivations for using these services, the perceived positive uses, and the gamer profile. To that end, a multilayer perceptron artificial neural network is developed and tested on a sample of 970 video game users. The results show that the variables with a significant influence on pathological gaming are the motivation of a sense of belonging to the different platforms, as well as the positive uses relating to making friends and the possibility of making this hobby a profession. Furthermore, the individual effects of each of the variables have been estimated. The results indicate that the social component linked to the positive perception of making new friends and the self-perceived level as a gamer have been identified as possible predictors, when it comes to a clinical assessment of the adverse effects. Conversely, the variables age and following specific streamers are found to play a role in reducing potential negative effects.
Affiliation(s)
- L Javier Cabeza-Ramírez
- Department of Statistics, Econometrics, Operations Research, Business and Applied Economics, Faculty of Law, Business and Economics Sciences, University of Córdoba, Puerta Nueva s/n, 14071, Córdoba, Spain.
- Francisco José Rey-Carmona
- Department of Statistics, Econometrics, Operations Research, Business and Applied Economics, Faculty of Law, Business and Economics Sciences, University of Córdoba, Puerta Nueva s/n, 14071, Córdoba, Spain
- Ma Del Carmen Cano-Vicente
- Department of Statistics, Econometrics, Operations Research, Business and Applied Economics, Faculty of Law, Business and Economics Sciences, University of Córdoba, Puerta Nueva s/n, 14071, Córdoba, Spain
- Miguel Ángel Solano-Sánchez
- Department of Applied Economics, Faculty of Social Sciences (Melilla Campus), University of Granada, Calle Santander, 1, 52005, Melilla, Spain
8
Kim J, Mun S, Lee S, Jeong K, Baek Y. Prediction of metabolic and pre-metabolic syndromes using machine learning models with anthropometric, lifestyle, and biochemical factors from a middle-aged population in Korea. BMC Public Health 2022; 22:664. [PMID: 35387629 PMCID: PMC8985311 DOI: 10.1186/s12889-022-13131-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/26/2021] [Accepted: 03/30/2022] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Metabolic syndrome (MetS) is a complex condition that appears as a cluster of metabolic abnormalities, and is closely associated with the prevalence of various diseases. Early prediction of the risk of MetS in the middle-aged population provides greater benefits for cardiovascular disease-related health outcomes. This study aimed to apply the latest machine learning techniques to find the optimal MetS prediction model for the middle-aged Korean population. METHODS We retrieved 20 data types from the Korean Medicine Daejeon Citizen Cohort, a cohort study on a community-based population of adults aged 30-55 years. The data included sex, age, anthropometric data, lifestyle-related data, and blood indicators of 1991 individuals. Participants satisfying two (pre-MetS) or ≥ 3 (MetS) of the five NCEP-ATP III criteria were included in the MetS group. MetS prediction used nine machine learning models based on the following algorithms: decision tree, Gaussian Naïve Bayes, K-nearest neighbor, eXtreme gradient boosting (XGBoost), random forest, logistic regression, support vector machine, multi-layer perceptron, and 1D convolutional neural network. All analyses were performed by sequentially inputting the features in three steps according to their characteristics. The models' performances were compared after applying the synthetic minority oversampling technique (SMOTE) to resolve data imbalance. RESULTS MetS was detected in 33.85% of the subjects. Among the MetS prediction models, the tree-based random forest and XGBoost models showed the best performance, which improved with the number of features used. As a measure of the models' performance, the area under the receiver operating characteristic curve (AUC) increased by up to 0.091 when the SMOTE was applied, with XGBoost showing the highest AUC of 0.851. Body mass index and waist-to-hip ratio were identified as the most important features in the MetS prediction models for this population.
CONCLUSIONS Tree-based machine learning models were useful in identifying MetS with high accuracy in middle-aged Koreans. Early diagnosis of MetS is important and requires a multidimensional approach that includes self-administered questionnaire, anthropometric, and biochemical measurements.
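The core idea of SMOTE, which the study applied to address class imbalance, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified Python sketch of that idea only. It is not the authors' implementation, and the function name, parameters, and toy points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating towards one of the k nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority samples.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, it stays inside the range already spanned by the minority class. Oversampling of this kind is normally applied to the training data only, so that evaluation data contain no synthetic points.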
Affiliation(s)
- Junho Kim
- KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea
- Sujeong Mun
- KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea
- Siwoo Lee
- KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea
- Kyoungsik Jeong
- KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea
- Younghwa Baek
- KM Data Division, Korea Institute of Oriental Medicine, 1672 Yuseongdae-ro, Yuseong-gu, Daejeon, Republic of Korea.
9
Atsawarungruangkit A, Laoveeravat P, Promrat K. Machine learning models for predicting non-alcoholic fatty liver disease in the general United States population: NHANES database. World J Hepatol 2021; 13:1417-1427. [PMID: 34786176 PMCID: PMC8568572 DOI: 10.4254/wjh.v13.i10.1417] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/07/2021] [Revised: 05/11/2021] [Accepted: 09/19/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease, affecting over 30% of the United States population. Early patient identification using a simple method is highly desirable.
AIM To create machine learning models for predicting NAFLD in the general United States population.
METHODS Data from the NHANES 1988-1994 were used. Thirty NAFLD-related factors were included. The dataset was divided into the training (70%) and testing (30%) datasets. Twenty-four machine learning algorithms were applied to the training dataset. The best-performing models and another interpretable model (i.e., coarse trees) were tested using the testing dataset.
RESULTS A total of 3235 participants met the inclusion criteria. In the training phase, the ensemble of random undersampling (RUS) boosted trees had the highest F1 (0.53). In the testing phase, we compared selected machine learning models and NAFLD indices. Based on F1, the ensemble of RUS boosted trees remained the top performer (accuracy 71.1% and F1 0.56), followed by the fatty liver index (accuracy 68.8% and F1 0.52). A simple model (coarse trees) had an accuracy of 74.9% and an F1 of 0.33.
CONCLUSION Not every machine learning model is complex. Using a simpler model such as coarse trees, we can create an interpretable model for predicting NAFLD with only two predictors: fasting C-peptide and waist circumference. Although the simpler model does not have the best performance, its simplicity is useful in clinical practice.
Affiliation(s)
- Amporn Atsawarungruangkit
- Division of Gastroenterology, Warren Alpert Medical School, Brown University, Providence, RI 02903, United States
- Passisd Laoveeravat
- Division of Digestive Diseases and Nutrition, University of Kentucky College of Medicine, Lexington, KY 40536, United States
- Kittichai Promrat
- Division of Gastroenterology, Warren Alpert Medical School, Brown University, Providence, RI 02903, United States
- Division of Gastroenterology and Hepatology, Providence VA Medical Center, Providence, RI 02908, United States
10
Peer-to-Peer Tourism: Tourists’ Profile Estimation through Artificial Neural Networks. Journal of Theoretical and Applied Electronic Commerce Research 2021. [DOI: 10.3390/jtaer16040063] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
Peer-to-peer tourism is one of the great global trends that is transforming the tourism sector, introducing several changes in many aspects of tourism, such as the way of travelling, staying or living the experience in the destination. This research aims to determine the relationship between the sociodemographic characteristics of tourists interested in peer-to-peer accommodation and the importance they give to various motivational factors about this type of tourism in a “cultural-tourism” city. The methodology used in this research is an artificial neural network of the multilayer perceptron type to estimate a sociodemographic profile of the peer-to-peer accommodation tourist user based on predetermined input values consisting of the answers to the Likert-type questions previously carried out using a questionnaire. Thus, the model developed, through a customized set of answers to these questions, allows the presentation of a “composite picture” of a peer-to-peer tourist based on sociodemographic characteristics. This function is especially interesting for adapting the peer-to-peer hosting offer according to the preferences of potential users.