1
|
Jiang L, Xia Z, Zhu R, Gong H, Wang J, Li J, Wang L. Diabetes risk prediction model based on community follow-up data using machine learning. Prev Med Rep 2023; 35:102358. [PMID: 37654514 PMCID: PMC10465943 DOI: 10.1016/j.pmedr.2023.102358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/31/2023] [Accepted: 08/01/2023] [Indexed: 09/02/2023] Open
Abstract
Diabetes is a chronic metabolic disease characterized by hyperglycemia. Follow-up management of diabetes patients takes place mostly in the community, but the relationship between key lifestyle indicators recorded at community follow-up and the risk of diabetes is unclear. To explore this association, 252,176 follow-up records of diabetes patients from 2016 to 2023 were obtained from Haizhu District, Guangzhou. From the follow-up data, the key lifestyle indicators that affect diabetes were determined, and an optimal feature subset was obtained through feature selection to assess the risk of diabetes accurately. A diabetes risk assessment model based on a random forest classifier was designed using optimal feature parameter selection and a comparison of algorithm models; it achieved an accuracy of 91.24% and an area under the ROC curve (AUC) of 97%. To improve the applicability of the model in clinical practice and daily life, a diabetes risk score card was designed and tested on the original data, giving an accuracy of 95.15% and indicating high model reliability. The diabetes risk prediction model, built by mining community follow-up big data, can be used by community doctors for large-scale risk screening and early warning based on patient follow-up data, further promoting diabetes prevention and control strategies. It can also be used in wearable devices or intelligent biosensors for individual patient self-examination, to improve lifestyle and reduce risk factor levels.
Affiliation(s)
- Liangjun Jiang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Zhenhua Xia
  - Electronics & Information School of Yangtze University, Jingzhou, China
- Ronghui Zhu
  - Shenzhen Nanshan Medical Group HQ, Shenzhen, China
- Haimei Gong
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Jing Wang
  - E-link Wisdom Co., Ltd, Shenzhen, China
- Juan Li
  - Haizhu District Community Health Development Guidance Center, Guangzhou, China
- Lei Wang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
|
2
|
Chellappan D, Rajaguru H. Detection of Diabetes through Microarray Genes with Enhancement of Classifiers Performance. Diagnostics (Basel) 2023; 13:2654. [PMID: 37627916 PMCID: PMC10453776 DOI: 10.3390/diagnostics13162654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/06/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023] Open
Abstract
Diabetes is a life-threatening, non-communicable disease. Diabetes mellitus is a prevalent chronic disease with a significant global impact. The timely detection of diabetes in patients is necessary for effective treatment. The primary objective of this study is to propose a novel approach for identifying type II diabetes mellitus using microarray gene data. Specifically, our research focuses on enhancing the performance of methods for detecting diabetes. Four different dimensionality reduction techniques, Detrended Fluctuation Analysis (DFA), the Chi-square probability density function (Chi2pdf), the Firefly algorithm, and Cuckoo Search, are used to reduce the high-dimensional data. Metaheuristic algorithms such as Particle Swarm Optimization (PSO) and Harmony Search (HS) are used for feature selection. Seven classifiers, Non-Linear Regression (NLR), Linear Regression (LR), Logistic Regression (LoR), Gaussian Mixture Model (GMM), Bayesian Linear Discriminant Classifier (BLDC), Softmax Discriminant Classifier (SDC), and Support Vector Machine-Radial Basis Function (SVM-RBF), are used to classify the diabetic and non-diabetic classes. The classifiers' performances are analyzed through parameters such as accuracy, recall, precision, F1 score, error rate, Matthews Correlation Coefficient (MCC), Jaccard metric, and kappa. The SVM (RBF) classifier, combined with the Chi2pdf dimensionality reduction technique and the PSO feature selection method, attained a high accuracy of 91% with a kappa of 0.7961, outperforming all of the other classifiers.
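The evaluation metrics named in this abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch under that assumption, not the authors' code; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes a non-degenerate matrix (no zero denominators).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Matthews Correlation Coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Jaccard index of the positive class
    jaccard = tp / (tp + fp + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement p_e
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "jaccard": jaccard,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Kappa and MCC both correct for chance agreement, which is why they are often reported next to accuracy when the classes are imbalanced.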
Affiliation(s)
- Dinesh Chellappan
  - Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India
- Harikumar Rajaguru
  - Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India
|
3
|
Li L, Cheng Y, Ji W, Liu M, Hu Z, Yang Y, Wang Y, Zhou Y. Machine learning for predicting diabetes risk in western China adults. Diabetol Metab Syndr 2023; 15:165. [PMID: 37501094 PMCID: PMC10373320 DOI: 10.1186/s13098-023-01112-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 06/15/2023] [Indexed: 07/29/2023] Open
Abstract
OBJECTIVE Diabetes mellitus is a global epidemic disease. Long-term exposure of patients to hyperglycemia can lead to various types of chronic tissue damage. Early diagnosis of and screening for diabetes are crucial to population health. METHODS We collected the national physical examination data in Xinjiang, China, in 2020 (a total of more than 4 million people). Three types of physical examination indices were analyzed: questionnaire, routine physical examination and laboratory values. Integrated learning, deep learning and logistic regression methods were used to establish a risk model for type-2 diabetes mellitus. In addition, to improve the convenience and flexibility of the model, a diabetes risk score card was established based on logistic regression to assess the risk of the population. RESULTS An XGBoost-based risk prediction model outperformed the other five risk assessment algorithms. The AUC of the model was 0.9122. Based on the feature importance ranking map, we found that hypertension, fasting blood glucose, age, coronary heart disease, ethnicity, parental diabetes mellitus, triglycerides, waist circumference, total cholesterol, and body mass index were the most important features of the risk prediction model for type-2 diabetes. CONCLUSIONS This study established a diabetes risk assessment model based on multiple ethnicities, a large sample and many indices, and classified the diabetes risk of the population, thus providing a new forecasting tool for screening patients and information on diabetes prevention for healthy populations.
Affiliation(s)
- Lin Li
  - Zhongshan School of Medicine, Sun Yat-sen University, No. 74, Zhongshan Second Road, Yuexiu District, Guangzhou, 510080, Guangdong, China
- Yinlin Cheng
  - Zhongshan School of Medicine, Sun Yat-sen University, No. 74, Zhongshan Second Road, Yuexiu District, Guangzhou, 510080, Guangdong, China
- Weidong Ji
  - Zhongshan School of Medicine, Sun Yat-sen University, No. 74, Zhongshan Second Road, Yuexiu District, Guangzhou, 510080, Guangdong, China
- Mimi Liu
  - Zhongshan School of Medicine, Sun Yat-sen University, No. 74, Zhongshan Second Road, Yuexiu District, Guangzhou, 510080, Guangdong, China
- Zhensheng Hu
  - Zhongshan School of Medicine, Sun Yat-sen University, No. 74, Zhongshan Second Road, Yuexiu District, Guangzhou, 510080, Guangdong, China
- Yining Yang
  - People's Hospital of Xinjiang Uygur Autonomous Region, No. 91 Tianchi Road, Tianshan District, Urumqi, 830001, Xinjiang, China
- Yushan Wang
  - Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, No. 393, Xinyi Road, Xinshi District, Urumqi, 830054, Xinjiang, China
- Yi Zhou
  - Zhongshan School of Medicine, Sun Yat-sen University, No. 74, Zhongshan Second Road, Yuexiu District, Guangzhou, 510080, Guangdong, China
|
4
|
Mao Y, Zhu Z, Pan S, Lin W, Liang J, Huang H, Li L, Wen J, Chen G. Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real-world retrospective cohort study. J Diabetes Investig 2022; 14:309-320. [PMID: 36345236 PMCID: PMC9889616 DOI: 10.1111/jdi.13937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 10/04/2022] [Accepted: 10/16/2022] [Indexed: 11/11/2022] Open
Abstract
AIMS/INTRODUCTION To compare the application value of different machine learning (ML) algorithms for diabetes risk prediction. MATERIALS AND METHODS This is a 3-year retrospective cohort study with a total of 3,687 participants being included in the data analysis. Modeling variable screening and predictive model building were carried out using logistic regression (LR) analysis and 10-fold cross-validation, respectively. In total, six different ML algorithms, including random forests, light gradient boosting machine, extreme gradient boosting, adaptive boosting (AdaBoost), multi-layer perceptrons and gaussian naive bayes were used for model construction. Model performance was mainly evaluated by the area under the receiver operating characteristic curve. The best performing ML model was selected for comparison with the traditional LR model and visualized using Shapley additive explanations. RESULTS A total of eight risk factors most associated with the development of diabetes were identified by univariate and multivariate LR analysis, and they were visualized in the form of a nomogram. Among the six different ML models, the random forests model had the best predictive performance. After 10-fold cross-validation, its optimal model has an area under the receiver operating characteristic value of 0.855 (95% confidence interval [CI] 0.823-0.886) in the training set and 0.835 (95% CI 0.779-0.892) in the test set. In the traditional LR model, its area under the receiver operating characteristic value is 0.840 (95% CI 0.814-0.866) in the training set and 0.834 (95% CI 0.785-0.884) in the test set. CONCLUSIONS In the real-world epidemiological research, the combination of traditional variable screening and ML algorithm to construct a diabetes risk prediction model has satisfactory clinical application value.
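Because this abstract, like most in this list, ranks models by the area under the receiver operating characteristic curve, a small worked example of that metric may help. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). The function name `roc_auc` and the toy data are hypothetical; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 outcomes; scores: predicted risk for each case.
    Quadratic in the number of cases, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data for illustration only: three of the four positive/negative
# pairs are ranked correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values reported in these abstracts.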
Affiliation(s)
- Yaqian Mao
  - Department of Internal Medicine, Fujian Provincial Hospital South Branch, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Zheng Zhu
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Shuyao Pan
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Wei Lin
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Jixing Liang
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Huibin Huang
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Liantao Li
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Junping Wen
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Gang Chen
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Fujian Provincial Key Laboratory of Medical Analysis, Fujian Academy of Medical, Fuzhou, China
|
5
|
Islam MM, Rahman MJ, Menhazul Abedin M, Ahammed B, Ali M, Ahmed NF, Maniruzzaman M. Identification of the risk factors of type 2 diabetes and its prediction using machine learning techniques. Health Syst (Basingstoke) 2022; 12:243-254. [PMID: 37234468 PMCID: PMC10208154 DOI: 10.1080/20476965.2022.2141141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Accepted: 10/20/2022] [Indexed: 11/07/2022] Open
Abstract
This study identified the risk factors for type 2 diabetes (T2D) and proposed a machine learning (ML) technique for predicting T2D. The risk factors for T2D were identified by multiple logistic regression (MLR) using the p-value (p<0.05). Then, five ML-based techniques, namely logistic regression, naïve Bayes, J48, multilayer perceptron, and random forest (RF), were employed to predict T2D. This study utilized two publicly available datasets derived from the National Health and Nutrition Examination Survey, 2009-2010 and 2011-2012. The 2009-2010 dataset included 4922 respondents, of whom 387 were T2D patients, whereas the 2011-2012 dataset included 4936 respondents, of whom 373 were T2D patients. This study identified six risk factors (age, education, marital status, SBP, smoking, and BMI) for 2009-2010 and nine risk factors (age, race, marital status, SBP, DBP, direct cholesterol, physical activity, smoking, and BMI) for 2011-2012. The RF-based classifier obtained 95.9% accuracy, 95.7% sensitivity, 95.3% F-measure, and 0.946 area under the curve.
Affiliation(s)
- Md. Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh
- Benojir Ahammed
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Mohammad Ali
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- N.A.M Faisal Ahmed
  - Institute of Education and Research, University of Rajshahi, Rajshahi, Bangladesh
|
6
|
Xia S, Zhang Y, Peng B, Hu X, Zhou L, Chen C, Lu C, Chen M, Pang C, Dai Y, Ji J. Detection of mild cognitive impairment in type 2 diabetes mellitus based on machine learning using privileged information. Neurosci Lett 2022; 791:136908. [DOI: 10.1016/j.neulet.2022.136908] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/08/2022] [Revised: 09/28/2022] [Accepted: 10/04/2022] [Indexed: 01/21/2023]
|
7
|
He F, Xie L, Sun X, Xu J, Li Y, Liu R, Sun K, Shen D, Gu J, Ji T, Guo W. A Scoring System for Predicting Neoadjuvant Chemotherapy Response in Primary High-Grade Bone Sarcomas: A Multicenter Study. Orthop Surg 2022; 14:2499-2509. [PMID: 36017768 PMCID: PMC9531107 DOI: 10.1111/os.13469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2021] [Revised: 07/10/2022] [Accepted: 07/25/2022] [Indexed: 11/27/2022] Open
Abstract
Objective Currently, there is a lack of good clinical tools for evaluating the effect of chemotherapy preoperatively on primary high-grade bone sarcomas. Our goal was to investigate the predictive value of the clinical findings and establish a scoring system to predict chemotherapy response. Methods We conducted a retrospective multicenter cohort study and reviewed 322 patients with primary high-grade bone sarcomas. Patients who routinely received neoadjuvant chemotherapy and underwent primary tumor resection with an assessment of tumor necrosis rate (TNR) were enrolled in this study. The medical records of patients were collected from November 1, 2011, to March 1, 2018, at Peking University People's Hospital (PKUPH) and Peking University Shougang Hospital (PKUSH). The mean age of the patients was 16.2 years (range 3–52 years), and 65.5% were male. The clinical data collected before and after neoadjuvant chemotherapy included the degree of pain, laboratory tests, X-ray, CT, contrast-enhanced magnetic resonance (MR), and positron emission tomography-computed tomography (PET-CT). Several machine learning models, including logistic regression, decision trees, support vector machines, and neural networks, were used to classify the chemotherapy responses. The primary outcome measure was the area under the curve (AUC) of the scoring system for predicting chemotherapy response. Results For patients without events, a minimum follow-up of 24 months was achieved. The median follow-up time was 43.3 months (range 24 to 84 months). The 5-year progression-free survival (PFS) of the included patients was 54.1%. The 5-year PFS rate was 39.7% for poor responders and 74.9% for good responders. Features such as longest diameter reduction ratio (up to three points), clear bone boundary formation (up to two points), tumor necrosis measured by magnetic resonance (up to two points), maximum standard uptake value (SUVmax) decrease (up to three points), and significant alkaline phosphatase decrease (up to one point) were identified as significant predictors of good histological response and constituted the scoring system. A score ≥4 predicts a good response to chemotherapy. The scoring system based on the above factors performed well, achieving an AUC of 0.893. For nonmeasurable lesions (classified by the revised Response Evaluation Criteria in Solid Tumors [RECIST 1.1]), the AUC was 0.901. Conclusion We first devised a well-performing comprehensive scoring system to predict the response to neoadjuvant chemotherapy in primary high-grade bone sarcomas.
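The additive scoring rule described in this abstract can be sketched as follows. Only the per-feature maximum points and the threshold of 4 come from the abstract; how partial points are awarded within each feature is not stated there, so the sketch takes already-assigned points as input. The dictionary keys, function names, and the choice to count unreported features as zero are assumptions made for illustration.

```python
# Maximum points per feature, as listed in the abstract.
# The key names are hypothetical labels, not the authors' variable names.
MAX_POINTS = {
    "diameter_reduction": 3,   # longest diameter reduction ratio
    "clear_bone_boundary": 2,  # clear bone boundary formation
    "mr_tumor_necrosis": 2,    # tumor necrosis measured by MR
    "suvmax_decrease": 3,      # SUVmax decrease
    "alp_decrease": 1,         # significant alkaline phosphatase decrease
}
GOOD_RESPONSE_THRESHOLD = 4    # "A score >= 4 predicts a good response"


def total_score(points):
    """Sum the awarded points after checking each against its maximum.

    Features missing from `points` are counted as 0 (an assumption).
    """
    total = 0
    for feature, value in points.items():
        if feature not in MAX_POINTS:
            raise KeyError(f"unknown feature: {feature}")
        if not 0 <= value <= MAX_POINTS[feature]:
            raise ValueError(f"{feature} must be between 0 and {MAX_POINTS[feature]}")
        total += value
    return total


def predicts_good_response(points):
    """Apply the threshold rule to a patient's awarded points."""
    return total_score(points) >= GOOD_RESPONSE_THRESHOLD


# Hypothetical patient, for illustration only.
patient = {"diameter_reduction": 2, "suvmax_decrease": 2}
patient_is_good_responder = predicts_good_response(patient)
```

With these caps the maximum possible score is 11, so the threshold of 4 sits in the lower half of the scale.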
Affiliation(s)
- Fangzhou He
  - Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
- Lu Xie
  - Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
- Xin Sun
  - Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
- Jie Xu
  - Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
- Yuan Li
  - Department of Radiology, Peking University People's Hospital, Beijing, China
- Rong Liu
  - Department of Radiology, Peking University People's Hospital, Beijing, China
- Kunkun Sun
  - Department of Pathology, Peking University People's Hospital, Beijing, China
- Danhua Shen
  - Department of Pathology, Peking University People's Hospital, Beijing, China
- Jin Gu
  - Department of Surgical Oncology, Peking University Shougang Hospital, Beijing, China
- Tao Ji
  - Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
- Wei Guo
  - Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
|
8
|
Liu Q, Zhou Q, He Y, Zou J, Guo Y, Yan Y. Predicting the 2-Year Risk of Progression from Prediabetes to Diabetes Using Machine Learning among Chinese Elderly Adults. J Pers Med 2022; 12:jpm12071055. [PMID: 35887552 PMCID: PMC9324396 DOI: 10.3390/jpm12071055] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/10/2022] [Revised: 06/06/2022] [Accepted: 06/23/2022] [Indexed: 11/18/2022] Open
Abstract
Identifying people with a high risk of developing diabetes among those with prediabetes may facilitate the implementation of targeted lifestyle and pharmacological interventions. We aimed to establish machine learning models based on demographic and clinical characteristics to predict the risk of incident diabetes. We used data from the free medical examination service project for elderly people aged 65 years or older to develop logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost) machine learning models for the follow-up results of 2019 and 2020, and performed internal validation. The receiver operating characteristic (ROC), sensitivity, specificity, accuracy, and F1 score were used to select the model with better performance. The average annual progression rate to diabetes in prediabetic elderly people was 14.21%. Each model was trained using eight features and one outcome variable from 9607 prediabetic individuals, and the performance of the models was assessed in 2402 prediabetes patients. The predictive ability of the four models was better in the first year than in the second year. The XGBoost model performed relatively well (ROC: 0.6742 for 2019 and 0.6707 for 2020). We established and compared four machine learning models to predict the risk of progression from prediabetes to diabetes. Although there was little difference in the performance of the four models, the XGBoost model had a relatively good ROC value and might perform well in future exploration in this field.
Affiliation(s)
- Qing Liu
  - Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Qing Zhou
  - Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Yifeng He
  - School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Jingui Zou
  - School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Yan Guo
  - Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
- Yaqiong Yan
  - Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
|
9
|
Gollapalli M, Alansari A, Alkhorasani H, Alsubaii M, Sakloua R, Alzahrani R, Taha Al-Hariri M, Nasser Alfares M, AlKhafaji D, Jaafar Al Argan R, Albaker W. A novel stacking ensemble for detecting three types of diabetes mellitus using a Saudi Arabian dataset: Pre-diabetes, T1DM, and T2DM. Comput Biol Med 2022; 147:105757. [DOI: 10.1016/j.compbiomed.2022.105757] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/19/2022] [Revised: 05/27/2022] [Accepted: 06/18/2022] [Indexed: 11/29/2022]
|
10
|
Delpino F, Costa Â, Farias S, Chiavegatto Filho A, Arcêncio R, Nunes B. Machine learning for predicting chronic diseases: a systematic review. Public Health 2022; 205:14-25. [DOI: 10.1016/j.puhe.2022.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/27/2021] [Revised: 10/26/2021] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
|
11
|
A novel kernel based approach to arbitrary length symbolic data with application to type 2 diabetes risk. Sci Rep 2022; 12:4985. [PMID: 35322076 PMCID: PMC8943170 DOI: 10.1038/s41598-022-08757-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2021] [Accepted: 03/07/2022] [Indexed: 11/08/2022] Open
Abstract
Predictive modeling of clinical data is fraught with challenges arising from the manner in which events are recorded. Patients typically fall ill at irregular intervals and experience dissimilar intervention trajectories. This results in irregularly sampled, uneven-length data, which poses a problem for standard multivariate tools. The alternative of feature extraction into equal-length vectors via methods like Bag-of-Words (BoW) potentially discards useful information. We propose an approach based on a kernel framework in which data is maintained in its native form: discrete sequences of symbols. Kernel functions derived from the edit distance between pairs of sequences may then be utilized in conjunction with support vector machines to classify the data. Our method is evaluated in the context of the prediction task of determining patients likely to develop type 2 diabetes following an earlier episode of elevated blood pressure of 130/80 mmHg. Kernels combined via multi kernel learning (MKL) achieved an F1-score of 0.96, outperforming classification with SVM (0.63), logistic regression (0.63), Long Short Term Memory (0.61) and Multi-Layer Perceptron (0.54) applied to a BoW representation of the data. With MKL, we achieved an F1-score of 0.97 on an external dataset. The proposed approach is consequently able to overcome limitations associated with feature-based classification in the context of clinical data.
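To make the idea of a kernel "derived from the edit distance between pairs of sequences" concrete, here is a minimal sketch. The abstract does not give the kernel's functional form, so the exponential form exp(-gamma * d), the Levenshtein variant of edit distance, the parameter `gamma`, and all function names are assumptions for illustration rather than the paper's method.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two sequences of symbols (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_kernel(a, b, gamma=0.5):
    """Similarity in (0, 1]; equals 1 for identical sequences."""
    return math.exp(-gamma * edit_distance(a, b))


def gram_matrix(sequences, gamma=0.5):
    """Pairwise kernel values, e.g. for an SVM that accepts a precomputed kernel."""
    return [[edit_kernel(s, t, gamma) for t in sequences] for s in sequences]


# Hypothetical event-code sequences of unequal length, for illustration only.
records = [["BP", "LAB", "RX"], ["BP", "RX"], ["LAB", "LAB", "RX", "RX"]]
K = gram_matrix(records)
```

Kernels of this form work directly on sequences of different lengths, which is the property the abstract emphasizes. They are not guaranteed to be positive semi-definite for every `gamma`, so in practice the Gram matrix may need checking or correction before being passed to an SVM.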
|
12
|
Samet S, Laouar MR, Bendib I, Eom S. Analysis and Prediction of Diabetes Disease Using Machine Learning Methods. International Journal of Decision Support System Technology 2022. [DOI: 10.4018/ijdsst.303943] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]
Abstract
To increase healthcare quality, early illness prediction helps patients prevent potentially life-threatening health issues before it is too late. Artificial intelligence is a rapidly evolving area, and its applications to diabetes, a worldwide epidemic, have the potential to revolutionize the way diabetes is diagnosed and managed. A total of six supervised machine learning algorithms based on patient data were used and compared to predict the diagnosis of diabetes mellitus. For the experiments, the Pima Indians Diabetes Database was used, and its missing values were carefully handled by different techniques. For random train-test splits, the Random Forest classification algorithm achieved an accuracy rate of 92 percent. This model outperforms other state-of-the-art approaches because of the proposed combination of techniques for handling missing values (a mixture of imputation techniques). With this approach, the models of this manuscript achieved better accuracy than prior work done with the Pima diabetes data.
Affiliation(s)
- Sarra Samet
  - Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Mohamed Ridda Laouar
  - Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Issam Bendib
  - Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Sean Eom
  - Department of Management, Southeast Missouri State University, USA
|
13
|
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452 PMCID: PMC8686642 DOI: 10.1186/s13098-021-00767-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022] Open
Abstract
Diabetes Mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, there is considerable heterogeneity in previous studies regarding the techniques used, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address these challenges. The review primarily followed the PRISMA methodology, enriched with the one proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and performance parameters reported were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performances. Deep Neural Networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful in increasing the models' efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
  - School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon Mexico
- Luis Montesinos
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon Mexico
- José A. García-García
  - Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico
|
14
|
Wang Y, Wang L, Su Y, Zhong L, Peng B. Prediction model for the onset risk of impaired fasting glucose: a 10-year longitudinal retrospective cohort health check-up study. BMC Endocr Disord 2021; 21:211. [PMID: 34686184 PMCID: PMC8540134 DOI: 10.1186/s12902-021-00878-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/12/2021] [Accepted: 10/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Impaired fasting glucose (IFG) is a prediabetic condition. Considering that the clinical symptoms of IFG are inconspicuous, these tend to be easily ignored by individuals, leading to conversion to diabetes mellitus (DM). In this study, we established a prediction model for the onset risk of IFG in the Chongqing health check-up population to provide a reference for prevention in a health check-up cohort. METHODS We conducted a retrospective longitudinal cohort study in Chongqing, China from January 2009 to December 2019. The qualified subjects were more than 20 years old and had more than two health check-ups. After following the inclusion and exclusion criteria, the cohort population was randomly divided into a training set and a test set at a ratio of 7:3. We first selected the predictor variables through the univariate generalized estimation equation (GEE), and then the training set was used to establish the IFG risk model based on multivariate GEE. Finally, the sensitivity, specificity, and receiver operating characteristic curves were used to verify the performance of the model. RESULTS A total of 4,926 subjects were included in this study, with an average of 3.87 check-up records, including 2,634 males and 2,292 females. There were 442 IFG cases during the follow-up period, including 286 men and 156 women. The incidence density was 26.88/1000 person-years for men and 18.53/1000 person-years for women (P<0.001). 
The predictor variables of our prediction model include male (relative risk (RR) =1.422, 95 % confidence interval (CI): 0.923-2.193, P=0.3849), age (RR=1.030, 95 %CI: 1.016-1.044, P<0.0001), waist circumference (RR=1.005, 95 %CI: 0.999-1.012, P=0.0975), systolic blood pressure (RR=1.004, 95 %CI: 0.993-1.016, P=0.4712), diastolic blood pressure (RR=1.023, 95 %CI: 1.005-1.041, P=0.0106), obesity (RR=1.797, 95 %CI: 1.126-2.867, P=0.0140), triglycerides (RR=1.107, 95 %CI: 0.943-1.299, P=0.2127), high-density lipoprotein cholesterol (RR=0.992, 95 %CI: 0.476-2.063, P=0.9818), low-density lipoprotein cholesterol (RR=1.793, 95 %CI: 1.085-2.963, P=0.0228), blood urea (RR=1.142, 95 %CI: 1.022-1.276, P=0.0192), serum uric acid (RR=1.004, 95 %CI: 1.002-1.005, P=0.0003), total cholesterol (RR=0.674, 95 %CI: 0.403-1.128, P=0.1331), and serum creatinine levels (RR=0.960, 95 %CI: 0.945-0.976, P<0.0001). The area under the receiver operating characteristic curve (AUC) in the training set was 0.740 (95 %CI: 0.712-0.768), and the AUC in the test set was 0.751 (95 %CI: 0.714-0.817). CONCLUSIONS The prediction model for the onset risk of IFG had good predictive ability in the health check-up cohort.
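The evaluation steps named in this abstract (a random 7:3 split, sensitivity, specificity, and AUC) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the fixed seed, and the toy labels and scores are assumptions made only for illustration, and the GEE model itself is not reproduced.

```python
import random


def train_test_split_7_3(records, seed=0):
    """Shuffle a copy of `records` and split it 70/30 (hypothetical helper)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.

    Assumes both classes (0 and 1) are present in `labels`.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 3 of 4 positive/negative pairs are ranked correctly
```

The pairwise definition of AUC is quadratic in the number of records, which is acceptable for a cohort of a few thousand subjects; a rank-based formula would be the usual choice for larger data.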
Affiliation(s)
- Yuqi Wang
  - Department of Epidemiology and Health Statistics, School of Public Health and Management, Chongqing Medical University, 400016 Chongqing, China
  - Medical Data Research Institute of Chongqing Medical University, 400016 Chongqing, China
- Liangxu Wang
  - School of Basic Medicine, Kunming Medical University, 650031 Kunming, China
- Yanli Su
  - The First Affiliated Hospital of Chongqing Medical University Health Management Centre, 400016 Chongqing, China
- Li Zhong
  - The First Affiliated Hospital of Chongqing Medical University Health Management Centre, 400016 Chongqing, China
- Bin Peng
  - Department of Epidemiology and Health Statistics, School of Public Health and Management, Chongqing Medical University, 400016 Chongqing, China
|
15
|
Rufo DD, Debelee TG, Ibenthal A, Negera WG. Diagnosis of Diabetes Mellitus Using Gradient Boosting Machine (LightGBM). Diagnostics (Basel) 2021; 11:1714. [PMID: 34574055 PMCID: PMC8467876 DOI: 10.3390/diagnostics11091714] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 08/08/2021] [Revised: 09/06/2021] [Accepted: 09/17/2021] [Indexed: 12/01/2022] Open
Abstract
Diabetes mellitus (DM) is a severe chronic disease that affects human health and has a high prevalence worldwide. Research has shown that half of the diabetic people throughout the world are unaware that they have DM and its complications are increasing, which presents new research challenges and opportunities. In this paper, we propose a preemptive diagnosis method for diabetes mellitus (DM) to assist or complement the early recognition of the disease in countries with low medical expert densities. Diabetes data are collected from the Zewditu Memorial Hospital (ZMHDD) in Addis Ababa, Ethiopia. Light Gradient Boosting Machine (LightGBM) is one of the most recent successful research findings for the gradient boosting framework that uses tree-based learning algorithms. It has low computational complexity and, therefore, is suited for applications in limited capacity regions such as Ethiopia. Thus, in this study, we apply the principle of LightGBM to develop an accurate model for the diagnosis of diabetes. The experimental results show that the prepared diabetes dataset is informative to predict the condition of diabetes mellitus. With accuracy, AUC, sensitivity, and specificity of 98.1%, 98.1%, 99.9%, and 96.3%, respectively, the LightGBM model outperformed KNN, SVM, NB, Bagging, RF, and XGBoost in the case of the ZMHDD dataset.
Affiliation(s)
- Derara Duba Rufo
  - College of Engineering and Technology, Dilla University, Dilla 419, Ethiopia
- Taye Girma Debelee
  - College of Electrical and Mechanical Engineering, Addis Ababa Science and Technology University, Addis Ababa 120611, Ethiopia
  - Ethiopian Artificial Intelligence Center, Addis Ababa 40782, Ethiopia
- Achim Ibenthal
  - Faculty of Engineering and Health, HAWK University of Applied Sciences and Arts, 37085 Göttingen, Germany
|
16
|
Kanimozhi N, Singaravel G. Hybrid artificial fish particle swarm optimizer and kernel extreme learning machine for type-II diabetes predictive model. Med Biol Eng Comput 2021; 59:841-867. [PMID: 33738640 DOI: 10.1007/s11517-021-02333-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/15/2020] [Accepted: 02/03/2021] [Indexed: 10/21/2022]
Abstract
The World Health Organization (WHO) estimated that in 2016, 1.6 million deaths were caused by diabetes. Precise and timely diagnosis of type-II diabetes is crucial to reduce the risk of various diseases such as heart disease, stroke, kidney disease, diabetic retinopathy, diabetic neuropathy, and macrovascular problems. Non-invasive methods such as machine learning are reliable and efficient in classifying people at risk of type-II diabetes and healthy people into two different categories. The present study aims to develop a stacking-based integrated kernel extreme learning machine (KELM) model for identifying the risk of type-II diabetes in patients, based on the follow-up time in the diabetes research center dataset. The Pima Indian Diabetic Dataset (PIDD) and a Diabetic Research Center dataset are used in this study. Min-max normalization is used to preprocess the noisy datasets. The Hybrid Particle Swarm Optimization-Artificial Fish Swarm Optimization (HAFPSO) algorithm addresses the multi-objective problem by increasing the Classification Accuracy (CA) and decreasing the kernel complexity of the optimal learners (NBC) selected. Finally, the model is integrated by using the KELM as a meta-classifier that combines the predictions of the twenty Base Learners. The proposed classification method helps clinicians predict which patients are at high risk of type-II diabetes in the future, with a highest accuracy of 98.5%. The proposed method is evaluated with several measures: accuracy, sensitivity, specificity, Matthews Correlation Coefficient, and Kappa Statistics. The results obtained show that the KELM-HAFPSO approach is a promising new tool for identifying type-II diabetes.
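As a small illustration of the min-max normalization step mentioned in this abstract, the sketch below rescales one feature column to the interval [0, 1]. The function name, the handling of constant columns, and the sample glucose values are assumptions for illustration; the abstract does not say how the authors implemented this step.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min).

    A constant column has no spread, so it is mapped to all zeros here;
    that convention is an assumption, not something stated in the abstract.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical glucose readings, invented for illustration.
print(min_max_normalize([50, 100, 150]))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would be taken from the training data only and then reused for the test data, so that no information leaks from the held-out set.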
Affiliation(s)
- N Kanimozhi
  - Department of Computer Science and Engineering, GKM College of Engineering and Technology, Chennai, India
- G Singaravel
  - Department of Information Technology, K S Rangasamy College of Engineering, Tiruchengode, India
|
17
|
Ray A, Chaudhuri AK. Smart healthcare disease diagnosis and patient management: Innovation, improvement and skill development. Machine Learning with Applications 2021. [DOI: 10.1016/j.mlwa.2020.100011] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/13/2023] Open
|
18
|
Chen W, Alexandre PA, Ribeiro G, Fukumasu H, Sun W, Reverter A, Li Y. Identification of Predictor Genes for Feed Efficiency in Beef Cattle by Applying Machine Learning Methods to Multi-Tissue Transcriptome Data. Front Genet 2021; 12:619857. [PMID: 33664767 PMCID: PMC7921797 DOI: 10.3389/fgene.2021.619857] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/21/2020] [Accepted: 01/15/2021] [Indexed: 12/22/2022] Open
Abstract
Machine learning (ML) methods have shown promising results in identifying genes when applied to large transcriptome datasets. However, no attempt has been made to compare the performance of combining different ML methods together in the prediction of high feed efficiency (HFE) and low feed efficiency (LFE) animals. In this study, using RNA sequencing data of five tissues (adrenal gland, hypothalamus, liver, skeletal muscle, and pituitary) from nine HFE and nine LFE Nellore bulls, we evaluated the prediction accuracies of five analytical methods in classifying FE animals. These included two conventional methods for differential gene expression (DGE) analysis (t-test and edgeR) as benchmarks, and three ML methods: Random Forests (RFs), Extreme Gradient Boosting (XGBoost), and combination of both RF and XGBoost (RX). Utility of a subset of candidate genes selected from each method for classification of FE animals was assessed by support vector machine (SVM). Among all methods, the smallest subsets of genes (117) identified by RX outperformed those chosen by t-test, edgeR, RF, or XGBoost in classification accuracy of animals. Gene co-expression network analysis confirmed the interactivity existing among these genes and their relevance within the network related to their prediction ranking based on ML. The results demonstrate a great potential for applying a combination of ML methods to large transcriptome datasets to identify biologically important genes for accurately classifying FE animals.
Affiliation(s)
- Weihao Chen
  - College of Animal Science and Technology, Yangzhou University, Yangzhou, China
  - CSIRO Agriculture and Food, St Lucia, QLD, Australia
- Gabriela Ribeiro
  - School of Animal Science and Food Engineering, University of São Paulo, Pirassununga, Brazil
- Heidge Fukumasu
  - School of Animal Science and Food Engineering, University of São Paulo, Pirassununga, Brazil
- Wei Sun
  - College of Animal Science and Technology, Yangzhou University, Yangzhou, China
  - Institute of Agriculture Science and Technology Development, Yangzhou University, Yangzhou, China
  - Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou, China
- Yutao Li
  - CSIRO Agriculture and Food, St Lucia, QLD, Australia
|
19
|
Abstract
Machine learning shows enormous potential in facilitating decision-making regarding kidney diseases. With the development of data preservation and processing, as well as the advancement of machine learning algorithms, machine learning is expected to make remarkable breakthroughs in nephrology. Machine learning models have yielded many preliminary to moderate, and several excellent, achievements in the field, including analysis of renal pathological images, diagnosis and prognosis of chronic kidney diseases and acute kidney injury, and management of dialysis treatments. However, this is just scratching the surface of the field; at the same time, machine learning and its applications in renal diseases face a number of challenges. In this review, we discuss the application status, challenges and future prospects of machine learning in nephrology to help people further understand and improve the capacity for prediction, detection, and care quality in kidney diseases.
|
20
|
Novel Machine Learning Can Predict Acute Asthma Exacerbation. Chest 2021; 159:1747-1757. [PMID: 33440184 DOI: 10.1016/j.chest.2020.12.051] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 08/19/2020] [Revised: 12/11/2020] [Accepted: 12/16/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Asthma exacerbations result in significant health and economic burden, but are difficult to predict. RESEARCH QUESTION Can machine learning (ML) models with large-scale outpatient data predict asthma exacerbations? STUDY DESIGN AND METHODS We analyzed data extracted from electronic health records (EHRs) of asthma patients treated at the Cleveland Clinic from 2010 through 2018. Demographic information, comorbidities, laboratory values, and asthma medications were included as covariates. Three different models were built with logistic regression, random forests, and a gradient boosting decision tree to predict: (1) nonsevere asthma exacerbation requiring oral glucocorticoid burst, (2) ED visits, and (3) hospitalizations. RESULTS Of 60,302 patients, 19,772 (32.8%) had at least one nonsevere exacerbation requiring oral glucocorticoid burst, 1,748 (2.9%) required an ED visit, and 902 (1.5%) required hospitalization. Nonsevere exacerbation, ED visit, and hospitalization were predicted best by light gradient boosting machine, an algorithm used in ML to fit predictive analytic models, and had an area under the receiver operating characteristic curve of 0.71 (95% CI, 0.70-0.72), 0.88 (95% CI, 0.86-0.89), and 0.85 (95% CI, 0.82-0.88), respectively. Risk factors for all three outcomes included age, long-acting β agonist, high-dose inhaled glucocorticoid, or chronic oral glucocorticoid therapy. In subgroup analysis of 9,448 patients with spirometry data, low FEV1 and FEV1 to FVC ratio were identified as top risk factors for asthma exacerbation, ED visits, and hospitalization. However, adding pulmonary function tests did not improve models' prediction performance. INTERPRETATION Models built with an ML algorithm from real-world outpatient EHR data accurately predicted asthma exacerbation and can be incorporated into clinical decision tools to enhance outpatient care and to prevent adverse outcomes.
|
21
|
Wu Y, Hu H, Cai J, Chen R, Zuo X, Cheng H, Yan D. Machine Learning for Predicting the 3-Year Risk of Incident Diabetes in Chinese Adults. Front Public Health 2021; 9:626331. [PMID: 34268283 PMCID: PMC8275929 DOI: 10.3389/fpubh.2021.626331] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/16/2020] [Accepted: 05/21/2021] [Indexed: 02/05/2023] Open
Abstract
Purpose: We aimed to establish and validate a risk assessment system that combines demographic and clinical variables to predict the 3-year risk of incident diabetes in Chinese adults. Methods: A 3-year cohort study was performed on 15,928 Chinese adults without diabetes at baseline. All participants were randomly divided into a training set (n = 7,940) and a validation set (n = 7,988). The XGBoost method, an effective machine learning technique, was used to select the most important variables from candidate variables. We further established a stepwise model based on the predictors chosen by the XGBoost model. The area under the receiver operating characteristic curve (AUC), decision curve and calibration analysis were used to assess discrimination, clinical use and calibration of the model, respectively. The external validation was performed on a cohort of 11,113 Japanese participants. Result: In the training and validation sets, 148 and 145 incident diabetes cases occurred. The XGBoost method selected the 10 most important variables from 15 candidate variables. Fasting plasma glucose (FPG), body mass index (BMI) and age were the top 3 important variables. We further established a stepwise model and a prediction nomogram. The AUCs of the stepwise model were 0.933 and 0.910 in the training and validation sets, respectively. The Hosmer-Lemeshow test showed a perfect fit between the predicted diabetes risk and the observed diabetes risk (p = 0.068 for the training set, p = 0.165 for the validation set). Decision curve analysis presented the clinical use of the stepwise model and there was a wide range of alternative threshold probability spectrum. There were almost no interactions between these predictors (most P-values for interaction >0.05). 
Furthermore, the AUC for the external validation set was 0.830, and the Hosmer-Lemeshow test for the external validation set showed no statistically significant difference between the predicted diabetes risk and observed diabetes risk (P = 0.824). Conclusion: We established and validated a risk assessment system for characterizing the 3-year risk of incident diabetes.
Affiliation(s)
- Yang Wu
  - Department of Endocrinology, The First Affiliated Hospital of Shenzhen University, Shenzhen, China
  - Department of Endocrinology, Shenzhen Second People's Hospital, Shenzhen, China
  - Shenzhen University Health Science Center, Shenzhen, China
- Haofei Hu
  - Shenzhen University Health Science Center, Shenzhen, China
  - Department of Nephrology, The First Affiliated Hospital of Shenzhen University, Shenzhen, China
  - Department of Nephrology, Shenzhen Second People's Hospital, Shenzhen, China
- Jinlin Cai
  - Department of Endocrinology, The First Affiliated Hospital of Shenzhen University, Shenzhen, China
  - Department of Endocrinology, Shenzhen Second People's Hospital, Shenzhen, China
  - Shantou University Medical College, Shantou, China
- Runtian Chen
  - Department of Endocrinology, The First Affiliated Hospital of Shenzhen University, Shenzhen, China
  - Department of Endocrinology, Shenzhen Second People's Hospital, Shenzhen, China
  - Shenzhen University Health Science Center, Shenzhen, China
- Xin Zuo
  - Department of Endocrinology, The Third People's Hospital of Shenzhen, Shenzhen, China
- Heng Cheng
  - Department of Endocrinology, The Third People's Hospital of Shenzhen, Shenzhen, China
- Dewen Yan
  - Department of Endocrinology, The First Affiliated Hospital of Shenzhen University, Shenzhen, China
  - Department of Endocrinology, Shenzhen Second People's Hospital, Shenzhen, China
  - Shenzhen University Health Science Center, Shenzhen, China
  - *Correspondence: Dewen Yan
|
22
|
Prediction of Type 2 Diabetes Risk and Its Effect Evaluation Based on the XGBoost Model. Healthcare (Basel) 2020; 8:healthcare8030247. [PMID: 32751894 PMCID: PMC7551910 DOI: 10.3390/healthcare8030247] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 06/14/2020] [Revised: 07/27/2020] [Accepted: 07/29/2020] [Indexed: 11/17/2022] Open
Abstract
In view of the harm of diabetes to the population, we have introduced an ensemble learning algorithm, EXtreme Gradient Boosting (XGBoost), to predict the risk of type 2 diabetes and compared it with the Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbor (K-NN) algorithms in order to improve the prediction effect of existing models. A combination of convenience sampling and snowball sampling in Xicheng District, Beijing was used to conduct a questionnaire survey on the personal data, eating habits, exercise status and family medical history of 380 middle-aged and elderly people. Then, we trained the models and obtained the disease risk index for each sample with 10-fold cross-validation. Experiments were made to compare the commonly used machine learning algorithms mentioned above, and we found that XGBoost had the best prediction effect, with an average accuracy of 0.8909 and an area under the receiver operating characteristic curve (AUC) of 0.9182. Therefore, due to the superiority of its architecture, XGBoost has more outstanding prediction accuracy and generalization ability than existing algorithms in predicting the risk of type 2 diabetes, which is conducive to the intelligent prevention and control of diabetes in the future.
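The 10-fold cross-validation described in this abstract can be sketched as follows. Only the index bookkeeping is shown: the helper name, the fixed seed, and the round-robin fold assignment are assumptions for illustration, and no XGBoost model is trained here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds round-robin, and each
    fold serves as the test set exactly once. Hypothetical helper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# With the 380 respondents mentioned in the abstract, each of the 10 folds
# holds 38 samples for testing and leaves 342 for training.
for train_idx, test_idx in k_fold_indices(380, k=10):
    assert len(train_idx) == 342 and len(test_idx) == 38
```

Averaging a metric such as accuracy over the ten held-out folds gives the kind of "average accuracy" figure the abstract reports.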
|
23
|
Musacchio N, Giancaterini A, Guaita G, Ozzello A, Pellegrini MA, Ponzani P, Russo GT, Zilich R, de Micheli A. Artificial Intelligence and Big Data in Diabetes Care: A Position Statement of the Italian Association of Medical Diabetologists. J Med Internet Res 2020; 22:e16922. [PMID: 32568088 PMCID: PMC7338925 DOI: 10.2196/16922] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/05/2019] [Revised: 03/09/2020] [Accepted: 04/12/2020] [Indexed: 12/24/2022] Open
Abstract
Since the last decade, most of our daily activities have become digital. Digital health takes into account the ever-increasing synergy between advanced medical technologies, innovation, and digital communication. Thanks to machine learning, we are not limited anymore to a descriptive analysis of the data, as we can obtain greater value by identifying and predicting patterns resulting from inductive reasoning. Machine learning software programs that disclose the reasoning behind a prediction allow for “what-if” models by which it is possible to understand if and how, by changing certain factors, one may improve the outcomes, thereby identifying the optimal behavior. Currently, diabetes care is facing several challenges: the decreasing number of diabetologists, the increasing number of patients, the reduced time allowed for medical visits, the growing complexity of the disease both from the standpoints of clinical and patient care, the difficulty of achieving the relevant clinical targets, the growing burden of disease management for both the health care professional and the patient, and the health care accessibility and sustainability. In this context, new digital technologies and the use of artificial intelligence are certainly a great opportunity. Herein, we report the results of a careful analysis of the current literature and represent the vision of the Italian Association of Medical Diabetologists (AMD) on this controversial topic that, if well used, may be the key for a great scientific innovation. AMD believes that the use of artificial intelligence will enable the conversion of data (descriptive) into knowledge of the factors that “affect” the behavior and correlations (predictive), thereby identifying the key aspects that may establish an improvement of the expected results (prescriptive). 
Artificial intelligence can therefore become a tool of great technical support to help diabetologists become fully responsible for the individual patient, thereby assuring customized and precise medicine. This, in turn, will allow for comprehensive therapies to be built in accordance with the evidence criteria that should always be the ground for any therapeutic choice.
Affiliation(s)
- Annalisa Giancaterini
  - Diabetology Service, Muggiò Polyambulatory, Azienda Socio Sanitaria Territoriale, Monza, Italy
- Giacomo Guaita
  - Diabetology, Endocrinology and Metabolic Diseases Service, Azienda Tutela Salute Sardegna-Azienda Socio Sanitaria Locale, Carbonia, Italy
- Alessandro Ozzello
  - Departmental Structure of Endocrine Diseases and Diabetology, Azienda Sanitaria Locale TO3, Pinerolo, Italy
- Maria A Pellegrini
  - Italian Association of Diabetologists, Rome, Italy
  - New Coram Limited Liability Company, Udine, Italy
- Paola Ponzani
  - Operative Unit of Diabetology, La Colletta Hospital, Azienda Sanitaria Locale 3, Genova, Italy
- Giuseppina T Russo
  - Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Alberto de Micheli
  - Associazione dei Cavalieri Italiani del Sovrano Militare Ordine di Malta, Genova, Italy
|
24
|
Islam MM, Rahman MJ, Chandra Roy D, Maniruzzaman M. Automated detection and classification of diabetes disease based on Bangladesh demographic and health survey data, 2011 using machine learning approach. Diabetes Metab Syndr 2020; 14:217-219. [PMID: 32193086 DOI: 10.1016/j.dsx.2020.03.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/12/2020] [Revised: 03/08/2020] [Accepted: 03/09/2020] [Indexed: 10/24/2022]
Abstract
BACKGROUND AND AIMS Diabetes has been recognized as a continuing health challenge for the twenty-first century, both in developed and developing countries including Bangladesh. The main objective of this study is to use machine learning (ML) based classifiers for automated detection and classification of diabetes. METHODS The diabetes dataset was taken from the Bangladesh Demographic and Health Survey, 2011, and comprises 1569 respondents, of whom 127 had diabetes. Two statistical tests, the independent t-test for continuous variables and the chi-square test for categorical variables, are used to determine the risk factors of diabetes. Six ML-based classifiers (support vector machine, random forest, linear discriminant analysis, logistic regression, k-nearest neighbors, and bagged classification and regression tree (Bagged CART)) have been adopted to predict and classify diabetes. RESULTS Our findings show that 11 of the 15 factors are significantly associated with diabetes. Bagged CART provides the highest accuracy and area under the curve, of 94.3% and 0.600, respectively. CONCLUSIONS Bagged CART offers a supportive computational resource for classification of diabetes, and it would be very helpful to doctors in making decisions to control diabetes in Bangladesh.
Affiliation(s)
- Md Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Jahanur Rahman
  - Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Dulal Chandra Roy
  - Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Maniruzzaman
  - Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
  - Statistics Discipline, Khulna University, Khulna, 9208, Bangladesh
|