1
Liu YQ, Chang TW, Lee LC, Chen CY, Hsu PS, Tsan YT, Yang CT, Chu WM. Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan. Diagnostics (Basel) 2024; 15:72. [PMID: 39795600 PMCID: PMC11719639 DOI: 10.3390/diagnostics15010072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2024] [Revised: 12/17/2024] [Accepted: 12/20/2024] [Indexed: 01/13/2025] Open
Abstract
Background: The prevalence of diabetes is increasing worldwide, particularly in the Pacific Ocean island nations. Although machine learning (ML) models and data mining approaches have been applied to diabetes research, no study had used ML models to predict diabetes incidence in Taiwan. We aimed to predict the onset of diabetes in order to raise health awareness, thereby promoting any necessary lifestyle modifications and helping to mitigate disease burden. Methods: The research dataset was retrieved from the Clinical Data Center of Taichung Veterans General Hospital. We collected data from the available electronic health records, and a total of 33 items were used for model construction. Individuals with diabetes and those with missing data were excluded. Ultimately, 6687 adults were included in the final analysis, in which we implemented three ML algorithms, logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost), to predict diabetes. Results: The five most important factors in the prediction model were glycated hemoglobin (HbA1c), fasting blood glucose, weight, free thyroxine (fT4), and triglycerides (TG). Notably, random forest, logistic regression, and XGBoost reached 99%, 99%, and 98% accuracy, respectively. fT4 appears to be one of the significant features in predicting the onset of diabetes. Moreover, this appears to be the first study using machine learning models to predict diabetes that has demonstrated the importance of thyroid hormone. Conclusions: A total of 33 items could be entered into the machine learning models to predict diabetes with promising accuracy. Compared with prior studies on machine learning models, this study not only identified similar key factors for predicting diabetes but also highlighted the significance of thyroid hormones, a factor that was previously overlooked. Moreover, it highlighted the relevance of predicting type 2 diabetes using more affordable methods, which would be useful for clinical healthcare professionals and endocrinologists who apply the models in clinical practice.
Affiliation(s)
- Ying-Qiang Liu
  - Department of Medical Education, Taichung Veterans General Hospital, Taichung 407219, Taiwan
- Tzu-Wei Chang
  - Department of Family Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
  - Division of Family Medicine, Department of Medicine, Taipei Veterans General Hospital Yuanshan Branch, Yilan 264018, Taiwan
- Lung-Chun Lee
  - Department of Family Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
- Chia-Yu Chen
  - Department of Application Value-Added Service, SYSTEX Corporation, Taipei 114730, Taiwan
- Pi-Shan Hsu
  - Department of Family Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
- Yu-Tse Tsan
  - Division of Occupational Medicine, Department of Emergency Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
  - School of Medicine, Chung Shan Medical University, Taichung 402306, Taiwan
- Chao-Tung Yang
  - Department of Computer Science, Tunghai University, Taichung 407224, Taiwan
  - Research Center for Smart Sustainable Circular Economy, Tunghai University, Taichung 407224, Taiwan
- Wei-Min Chu
  - Department of Family Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
  - Geriatrics and Gerontology Research Center, College of Medicine, National Chung Hsing University, Taichung 402202, Taiwan
  - Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung 402202, Taiwan
  - School of Medicine, National Yang Ming Chiao Tung University, Taipei 112304, Taiwan
2
Talebi Moghaddam M, Jahani Y, Arefzadeh Z, Dehghan A, Khaleghi M, Sharafi M, Nikfar G. Predicting diabetes in adults: identifying important features in unbalanced data over a 5-year cohort study using machine learning algorithm. BMC Med Res Methodol 2024; 24:220. [PMID: 39333899 PMCID: PMC11430121 DOI: 10.1186/s12874-024-02341-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2024] [Accepted: 09/16/2024] [Indexed: 09/30/2024] Open
Abstract
BACKGROUND Imbalanced datasets pose significant challenges in predictive modeling, leading to biased outcomes and reduced model reliability. This study addresses data imbalance in diabetes prediction using machine learning techniques. Utilizing data from the Fasa Adult Cohort Study (FACS) with a 5-year follow-up of 10,000 participants, we developed predictive models for Type 2 diabetes. METHODS We employed various data-level and algorithm-level interventions, including SMOTE, ADASYN, SMOTEENN, Random Over Sampling and KMeansSMOTE, paired with Random Forest, Gradient Boosting, Decision Tree and Multi-Layer Perceptron (MLP) classifiers. We evaluated model performance using F1 score, AUC, and G-means, metrics chosen to provide a comprehensive assessment of model accuracy, discrimination ability, and overall balance in performance, particularly in the context of imbalanced datasets. RESULTS Our study uncovered key factors influencing diabetes risk and evaluated the performance of various machine learning models. Feature importance analysis revealed that the most influential predictors of diabetes differ between males and females. For females, the most important factors are triglyceride (TG), basal metabolic rate (BMR), and total cholesterol (CHOL), whereas for males, the key predictors are body mass index (BMI), serum glutamic oxaloacetic transaminase (SGOT), and gamma-glutamyl transferase (GGT). Across the entire dataset, BMI remains the most important variable, followed by SGOT, BMR, and energy intake. These insights suggest that gender-specific risk profiles should be considered in diabetes prevention and management strategies. In terms of model performance, our results show that ADASYN with the MLP classifier achieved an F1 score of 82.17 ± 3.38, AUC of 89.61 ± 2.09, and G-means of 89.15 ± 2.31. SMOTE with MLP followed closely with an F1 score of 79.85 ± 3.91, AUC of 89.7 ± 2.54, and G-means of 89.31 ± 2.78. The SMOTEENN with Random Forest combination achieved an F1 score of 78.27 ± 1.54, AUC of 87.18 ± 1.12, and G-means of 86.47 ± 1.28. CONCLUSION These combinations effectively address class imbalance, improving the accuracy and reliability of diabetes predictions. The findings highlight the importance of using appropriate data-balancing techniques in medical data analysis.
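The abstract above reports F1 score and G-means because plain accuracy can look good on imbalanced data even when no positive case is found. The following is a minimal sketch of how the two metrics are computed from confusion-matrix counts. The function name `f1_and_gmean` and all counts are hypothetical illustrations, not values from the study.

```python
import math

def f1_and_gmean(tp, fp, fn, tn):
    """Return (F1, G-mean) from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    gmean = math.sqrt(recall * specificity)              # geometric mean of the two class-wise rates
    return f1, gmean

# Hypothetical imbalanced test set: 50 positives, 950 negatives.
# Classifier A predicts "negative" for everyone: accuracy 0.95, but it finds no cases.
acc_a = (0 + 950) / 1000
f1_a, g_a = f1_and_gmean(tp=0, fp=0, fn=50, tn=950)

# Classifier B finds 40 of the 50 positives at the cost of 95 false alarms.
acc_b = (40 + 855) / 1000
f1_b, g_b = f1_and_gmean(tp=40, fp=95, fn=10, tn=855)

print(f"A: accuracy={acc_a:.3f} F1={f1_a:.3f} G-mean={g_a:.3f}")
print(f"B: accuracy={acc_b:.3f} F1={f1_b:.3f} G-mean={g_b:.3f}")
```

In this made-up example accuracy ranks classifier A above classifier B, while F1 and G-mean both rank B higher, which is the behavior that motivates using them on imbalanced cohorts.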
Affiliation(s)
- Maryam Talebi Moghaddam
  - Noncommunicable Diseases Research Center, Fasa University of Medical Sciences, Fasa, Iran
  - Student of Biostatistics, Department of Biostatistics and Epidemiology, School of Public Health, Kerman University of Medical Sciences, Kerman, Iran
- Yones Jahani
  - Modeling in Health Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Zahra Arefzadeh
  - Faculty of Data Science and Intelligent Systems, Persian Gulf University, Bushehr, Iran
- Azizallah Dehghan
  - Noncommunicable Diseases Research Center, Fasa University of Medical Sciences, Fasa, Iran
  - Department of Epidemiology and Biostatistics, School of Health, Fasa University of Medical Sciences, Fasa, Iran
- Mohsen Khaleghi
  - Department of Mathematics and Computer Science, Fasa Branch, Islamic Azad University, Fasa, Iran
- Mehdi Sharafi
  - Endocrinology and Metabolism Research Center, Hormozgan University of Medical Sciences, Bandar Abbas, Iran
- Ghasem Nikfar
  - Research Development Unit, Valiasr Hospital, Fasa University of Medical Sciences, Fasa, Iran
3
Lee HA, Park H, Hong YS. Validation of the Framingham Diabetes Risk Model Using Community-Based KoGES Data. J Korean Med Sci 2024; 39:e47. [PMID: 38317447 PMCID: PMC10843969 DOI: 10.3346/jkms.2024.39.e47] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/04/2023] [Accepted: 12/04/2023] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND The Framingham Diabetes Risk Model (FDRM) was proposed for 8-year prediction of diabetes, but its predictors lag behind current clinical standards. Therefore, we evaluated the validity of the original FDRM in Korean population data, developed a modified FDRM by redefining the predictors based on current knowledge, and evaluated its internal and external validity. METHODS Using data from a community-based cohort in Korea (n = 5,409), we calculated the probability of diabetes through the FDRM, and developed a modified FDRM based on modified definitions of hypertension (HTN) and diabetes. We also added clinical features related to diabetes to the predictive model. Model performance was evaluated and compared by area under the curve (AUC). RESULTS During the 8-year follow-up, the cumulative incidence of diabetes was 8.5%. The modified FDRM consisted of age, obesity, HTN, low high-density lipoprotein cholesterol, elevated triglyceride, fasting glucose, and hemoglobin A1c. The expanded clinical model added γ-glutamyl transpeptidase to the modified FDRM. The FDRM showed an estimated AUC of 0.71, and the model's performance improved to an AUC of 0.82 after applying the redefined predictors. Adding clinical features (AUC = 0.83) to the modified FDRM further improved discrimination, but this improvement was not maintained in the validation data set. External validation was performed on population-based cohort data, and both modified models performed well, with AUCs above 0.82. CONCLUSION The performance of the FDRM in the Korean population was found to be acceptable for predicting diabetes, but it improved when the model was corrected with redefined predictors. The validity of the modified model needs to be further evaluated.
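The models above are compared by AUC. As a minimal sketch, AUC can be read as the fraction of (case, non-case) pairs in which the case receives the higher predicted risk, with ties counted as half. The helper name `pairwise_auc` and the risk scores below are hypothetical and are not taken from the study.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted 8-year risks for people who did / did not develop diabetes.
cases = [0.9, 0.7, 0.4]
non_cases = [0.8, 0.3, 0.2, 0.1]

example_auc = pairwise_auc(cases, non_cases)
print(f"AUC = {example_auc:.3f}")  # 10 of 12 pairs are ranked correctly
```

The quadratic pair loop is chosen for readability; for large cohorts a rank-based computation gives the same value more efficiently.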
Affiliation(s)
- Hye Ah Lee
  - Clinical Trial Center, Ewha Womans University Mokdong Hospital, Seoul, Korea
- Hyesook Park
  - Department of Preventive Medicine, College of Medicine, Ewha Womans University, Seoul, Korea
  - Graduate Program in System Health Science and Engineering, Ewha Womans University, Seoul, Korea
- Young Sun Hong
  - Department of Internal Medicine, College of Medicine, Ewha Womans University, Seoul, Korea
4
Tanaka M, Akiyama Y, Mori K, Hosaka I, Kato K, Endo K, Ogawa T, Sato T, Suzuki T, Yano T, Ohnishi H, Hanawa N, Furuhashi M. Predictive modeling for the development of diabetes mellitus using key factors in various machine learning approaches. DIABETES EPIDEMIOLOGY AND MANAGEMENT 2024; 13:100191. [DOI: 10.1016/j.deman.2023.100191] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/03/2025]
5
Jain N, Patel B, Hanawal M, Lila AR, Memon S, Bandgar T, Kumar A. Machine learning for predicting diabetic metabolism in the Indian population using polar metabolomic and lipidomic features. Metabolomics 2023; 20:1. [PMID: 38017183 DOI: 10.1007/s11306-023-02066-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/04/2023] [Accepted: 11/16/2023] [Indexed: 11/30/2023]
Abstract
AIMS To identify metabolite and lipid biomarkers of diabetes in the Indian subpopulation in newly diagnosed diabetic and long-term diabetic individuals. To utilize the global polar metabolomic and lipidomic profiles to predict the susceptibility of an individual to diabetes using machine learning algorithms. MATERIALS AND METHODS A total of 87 individuals, including healthy, newly diabetic, and long-term diabetics on medication, were included in the study. After consent was obtained, their serum was used to isolate the polar metabolome and lipidome. NMR and LCMS were used to identify the polar metabolites and lipids, respectively. Statistical analysis was done to determine significantly altered molecules. NMR and LCMS comprehensive data were utilized to generate diabetic models using machine learning algorithms. Ten more individuals (pre-diabetic) were recruited, and their polar metabolomic and lipidomic profiles were generated. Pre-diabetic metabolic profiles were then utilized to predict the diabetic status of the metabolome and lipidome beyond glucose levels. RESULTS Mannose, Betaine, Xanthine, Triglyceride (38:1), Sphingomyelin (d63:7), and Phosphatidic acid (37:2) are some of the top key biomarkers of diabetes. The predictive model showed a receiver operating characteristic area under the curve (ROC-AUC) of 1 on both test and validation data, indicating excellent accuracy. This model then predicted the diabetic closeness of the metabolism of pre-diabetic individuals based on probability scores. CONCLUSION The polar metabolic and lipid profiles of diabetic individuals are very different from those of healthy individuals. The lipid profile changes before the polar metabolic profile in diabetes-susceptible individuals. Without regard to glucose, the diabetic closeness of the metabolism of any individual can be determined.
Affiliation(s)
- Nikita Jain
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, 400076, India
- Bhaumik Patel
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, 400076, India
- Manjesh Hanawal
  - Industrial Engineering and Operations Research, Indian Institute of Technology Bombay, Mumbai, Maharashtra, 400076, India
- Anurag R Lila
  - Seth G.S. Medical College and KEM Hospital, Parel, Mumbai, 400012, India
- Saba Memon
  - Seth G.S. Medical College and KEM Hospital, Parel, Mumbai, 400012, India
- Tushar Bandgar
  - Seth G.S. Medical College and KEM Hospital, Parel, Mumbai, 400012, India
- Ashutosh Kumar
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, 400076, India
  - Lab No. 606, Department of Biosciences and Bioengineering, Indian Institute of Technology (IIT) Bombay, Mumbai, 400076, India
6
Chun JW, Kim HS. The Present and Future of Artificial Intelligence-Based Medical Image in Diabetes Mellitus: Focus on Analytical Methods and Limitations of Clinical Use. J Korean Med Sci 2023; 38:e253. [PMID: 37550811 PMCID: PMC10412032 DOI: 10.3346/jkms.2023.38.e253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2023] [Accepted: 07/12/2023] [Indexed: 08/09/2023] Open
Abstract
Artificial intelligence (AI)-based diagnostic technology using medical images can be used to increase examination accessibility and support clinical decision-making for screening and diagnosis. To determine a machine learning algorithm for diabetes complications, a literature review of studies using medical image-based AI technology was conducted using the National Library of Medicine PubMed and Excerpta Medica databases. Lists of studies using diabetes diagnostic images and AI as keywords were combined. In total, 227 appropriate studies were selected. Diabetic retinopathy studies using the AI model were the most frequent (85.0%, 193/227 cases), followed by diabetic foot (7.9%, 18/227 cases) and diabetic neuropathy (2.7%, 6/227 cases). The studies used open datasets (42.3%, 96/227 cases) or directly constructed data from fundoscopy or optical coherence tomography (57.7%, 131/227 cases). Major limitations in AI-based detection of diabetes complications using medical images were the lack of datasets (36.1%, 82/227 cases) and severity misclassification (26.4%, 60/227 cases). Although it remains difficult to use and fully trust AI-based imaging analysis technology clinically, it reduces clinicians' time and labor, and expectations for its decision-support role are high. Further development of data collection and data synthesis technologies, stratified by disease severity, is required to address data imbalance.
Affiliation(s)
- Ji-Won Chun
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Hun-Sung Kim
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
7
Shin J, Lee J, Ko T, Lee K, Choi Y, Kim HS. Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness. J Pers Med 2022; 12:1899. [PMID: 36422075 PMCID: PMC9698354 DOI: 10.3390/jpm12111899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/20/2022] [Revised: 11/04/2022] [Accepted: 11/08/2022] [Indexed: 01/25/2024] Open
Abstract
The early prediction of diabetes can facilitate interventions to prevent or delay it. This study proposes a diabetes prediction model based on machine learning (ML) to encourage individuals at risk of diabetes to employ healthy interventions. A total of 38,379 subjects were included. We trained the model on 80% of the subjects and verified its predictive performance on the remaining 20%. Furthermore, the performances of several algorithms were compared, including logistic regression, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Cox regression, and XGBoost Survival Embedding (XGBSE). The area under the receiver operating characteristic curve (AUROC) of the XGBoost model was the largest, followed by those of the decision tree, logistic regression, and random forest models. For the survival analysis, XGBSE yielded an AUROC exceeding 0.9 for the 2- to 9-year predictions and a C-index of 0.934, while the Cox regression achieved a C-index of 0.921. After lowering the threshold from 0.5 to 0.25, the sensitivity increased from 0.011 to 0.236 for the 2-year prediction model and from 0.607 to 0.994 for the 9-year prediction model, while the specificity showed negligible changes. We developed a high-performance diabetes prediction model that applied the XGBSE algorithm with threshold adjustment. We plan to use this prediction model in real clinical practice for diabetes prevention after simplifying and validating it externally.
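The abstract above describes lowering the decision threshold from 0.5 to 0.25 to raise sensitivity. The sketch below illustrates that mechanism on a handful of made-up predicted probabilities. The function name `sensitivity_specificity` and the data are hypothetical and do not reproduce the study's figures.

```python
def sensitivity_specificity(probs, labels, threshold):
    """Classify as positive when prob >= threshold; return (sensitivity, specificity)."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted risks and true outcomes (1 = developed diabetes).
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.15, 0.45]
labels = [0, 0, 0, 1, 1, 1, 0, 0]

sens_05, spec_05 = sensitivity_specificity(probs, labels, threshold=0.5)
sens_025, spec_025 = sensitivity_specificity(probs, labels, threshold=0.25)

print(f"threshold 0.50: sensitivity={sens_05:.3f} specificity={spec_05:.3f}")
print(f"threshold 0.25: sensitivity={sens_025:.3f} specificity={spec_025:.3f}")
```

In this toy data the lower threshold catches every case at the cost of one extra false positive. How large that cost is in practice depends on how the predicted risks are distributed among people who do not develop the disease.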
Affiliation(s)
- Juyoung Shin
  - Health Promotion Center, Seoul St. Mary’s Hospital, Seoul 06591, Korea
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Joonyub Lee
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Taehoon Ko
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Kanghyuck Lee
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Department of Biomedicine and Health Sciences, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Yera Choi
  - NAVER CLOVA AI Lab, Seongnam 13561, Korea
- Hun-Sung Kim
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
8
Kim H, Jung DY, Lee S, Cho J, Yim HW, Kim H. Retrospective cohort analysis comparing changes in blood glucose level and body composition according to changes in thyroid-stimulating hormone level. J Diabetes 2022; 14:620-629. [PMID: 36114679 PMCID: PMC9512769 DOI: 10.1111/1753-0407.13315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 08/08/2022] [Accepted: 08/27/2022] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND In the euthyroid state, the risk of developing diabetes according to changes in thyroid-stimulating hormone (TSH) levels remains controversial. Additionally, the correlation of various body indices affecting blood glucose levels according to changes in TSH levels over a certain period is not well known. METHODS Patients who underwent health check-ups twice at a 2 year interval at a tertiary university hospital between 2009 and 2018 were included. By dividing baseline TSH levels into quartiles (TSH_Q1, Q2, Q3, and Q4), various variables were compared, and their changes after 2 years (∆TSH_Q1, Q2, Q3, and Q4) were confirmed. RESULTS Among 15 557 patients, the incidence of diabetes mellitus after 2 years was 2.4% (377/15 557 patients). There was no statistically significant difference in the incidence of diabetes according to TSH_Q (p = 0.243) or ∆TSH_Q (p = 0.131). However, as TSH levels increased, skeletal muscle mass decreased (p < 0.001), and body fat mass and percent body fat significantly increased (p < 0.001). As ∆TSH increased, ∆fasting blood glucose and ∆body mass index also significantly increased (all p < 0.001). The incidence of diabetes decreased significantly as skeletal muscle mass increased (odds ratio 0.734, p < 0.001). CONCLUSIONS Owing to the short study period, it was not possible to prove a statistical relationship between the incidence of diabetes mellitus and TSH levels in the euthyroid state. Significant decreases in skeletal muscle mass and increases in body mass index and body fat mass according to baseline TSH levels were demonstrated. Therefore, a focus on improving physical functions, such as increasing muscle mass and decreasing fat, is required in this case.
Affiliation(s)
- Hyunah Kim
  - College of Pharmacy, Sookmyung Women's University, Seoul, Republic of Korea
- Da Young Jung
  - Department of Biostatistics, Clinical Research Coordinating Center, Catholic Medical Center, The Catholic University of Korea, Seoul, Republic of Korea
- Seung‐Hwan Lee
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Jae‐Hyoung Cho
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Hyeon Woo Yim
  - Department of Preventive Medicine, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Hun‐Sung Kim
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea