1
Waqas Khan Q, Iqbal K, Ahmad R, Rizwan A, Nawaz Khan A, Kim D. An intelligent diabetes classification and perception framework based on ensemble and deep learning method. PeerJ Comput Sci 2024; 10:e1914. [PMID: 38660179 PMCID: PMC11041940 DOI: 10.7717/peerj-cs.1914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 02/06/2024] [Indexed: 04/26/2024]
Abstract
Sugar in the blood can harm individuals and their vital organs, potentially leading to blindness, renal illness, as well as kidney and heart diseases. Globally, diabetic patients face an average annual mortality rate of 38%. This study employs Chi-square, mutual information, and sequential feature selection (SFS) to choose features for training multiple classifiers. These classifiers include an artificial neural network (ANN), a random forest (RF), a gradient boosting (GB) algorithm, Tab-Net, and a support vector machine (SVM). The goal is to predict the onset of diabetes at an earlier age. The classifier, developed based on the selected features, aims to enable early diagnosis of diabetes. The PIMA and early-risk diabetes datasets serve as test subjects for the developed system. The feature selection technique is then applied to focus on the most important and relevant features for model training. The experiment findings conclude that the ANN exhibited a spectacular performance in terms of accuracy on the PIMA dataset, achieving a remarkable accuracy rate of 99.35%. The second experiment, conducted on the early diabetes risk dataset using selected features, revealed that RF achieved an accuracy of 99.36%. Based on our experimental results, it can be concluded that our suggested method significantly outperformed baseline machine learning algorithms already employed for diabetes prediction on both datasets.
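As a rough illustration of the chi-square feature scoring mentioned in the abstract above, the sketch below ranks binary features by their Pearson chi-square statistic against a binary label. It is not the authors' pipeline: the function name `chi_square_2x2`, the feature names, and the toy data are all assumptions made for this example, and real datasets such as PIMA contain continuous features that would need binning first.

```python
def chi_square_2x2(feature, label):
    """Pearson chi-square statistic for two binary sequences (no continuity correction)."""
    n = len(feature)
    # Observed counts for each (feature value, label value) combination.
    observed = {(f, y): 0 for f in (0, 1) for y in (0, 1)}
    for f, y in zip(feature, label):
        observed[(f, y)] += 1
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row_total = observed[(f, 0)] + observed[(f, 1)]
            col_total = observed[(0, y)] + observed[(1, y)]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: a binary outcome and two made-up binary features.
label = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "high_glucose": [1, 1, 1, 0, 0, 0, 0, 1],
    "coin_flip": [1, 0, 1, 0, 1, 0, 1, 0],
}

# Score every feature, then rank from most to least associated with the label.
scores = {name: chi_square_2x2(column, label) for name, column in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A feature selection step would keep only the top-ranked features before training a classifier; how many to keep is a tuning choice that the abstract does not specify.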
Affiliation(s)
- Qazi Waqas Khan
  - Department of Computer Engineering, Jeju National University, South Korea, Jeju-si, Jeju, South Korea
- Khalid Iqbal
  - Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Punjab, Pakistan
- Rashid Ahmad
  - Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Punjab, Pakistan
  - Bigdata Research Center, Jeju National University, Jeju-si, Jeju, South Korea
- Atif Rizwan
  - Department of Computer Engineering, Jeju National University, South Korea, Jeju-si, Jeju, South Korea
- Anam Nawaz Khan
  - Department of Computer Engineering, Jeju National University, South Korea, Jeju-si, Jeju, South Korea
- DoHyeun Kim
  - Department of Computer Engineering, Jeju National University, South Korea, Jeju-si, Jeju, South Korea
2
Zhang M, Jiao H, Wang C, Qu Y, Lv S, Zhao D, Zhong X. Physical activity, sleep disorders, and type of work in the prevention of cognitive function decline in patients with hypertension. BMC Public Health 2023; 23:2431. [PMID: 38057774 PMCID: PMC10699000 DOI: 10.1186/s12889-023-17343-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 11/26/2023] [Indexed: 12/08/2023] Open
Abstract
BACKGROUND Hypertensive patients are likelier to have cognitive function decline (CFD). This study aimed to explore how physical activity level, sleep disorders, and type of work influence cognitive function decline in hypertensive patients and to establish a decision tree model to analyze their predictive significance for the incidence of CFD in hypertensive patients. METHODS This cross-sectional study recruited patients with essential hypertension from several hospitals in Shandong Province from May 2022 to December 2022. Exclusion criteria included congestive heart failure, valvular heart disease, cardiac surgery, hepatic and renal dysfunction, and malignancy. Recruitment was through multiple channels such as hospital medical and surgical outpatient clinics, wards, and health examination centers. Cognitive function was assessed using the Mini-Mental State Examination (MMSE), and sleep quality was assessed using the Pittsburgh Sleep Quality Index (PSQI). Moreover, we obtained information on the patients' type of work through a questionnaire and their level of physical activity through the International Physical Activity Questionnaire (IPAQ). RESULTS The logistic regression analysis indicates that sleep disorder is a significant risk factor for CFD in hypertensive patients (OR: 1.85, 95% CI: [1.16, 2.94]), while mental work (OR: 0.12, 95% CI: [0.04, 0.37]) and combined manual and mental work (OR: 0.5, 95% CI: [0.29, 0.86]) exhibit protective effects against CFD. Compared to low-intensity physical activity, moderate physical activity (OR: 0.53, 95% CI: [0.32, 0.87]) and high-intensity physical activity (OR: 0.26, 95% CI: [0.12, 0.58]) protect against CFD in hypertensive patients. The importance of predictors in the decision tree model was ranked as follows: physical activity level (54%), type of work (27%), and sleep disorders (19%). The area under the ROC curve for the decision tree model was 0.72 [95% CI: 0.68 to 0.76]. CONCLUSION Moderate and high-intensity physical activity may reduce the risk of developing CFD in hypertensive patients. Sleep disorders are a risk factor for CFD in hypertensive patients. Engaging in mental work and high-intensity physical activity effectively mitigates the onset of CFD in hypertensive patients.
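To make the odds ratios and 95% confidence intervals reported above easier to read, here is a minimal sketch of how an unadjusted odds ratio and its Wald interval can be computed from a 2x2 table. This is only an illustration: the study's estimates come from logistic regression, which may adjust for covariates, and the function name `odds_ratio_ci` and the counts below are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, not taken from the study.
estimate, lower, upper = odds_ratio_ci(a=40, b=60, c=20, d=80)
print(f"OR = {estimate:.2f}, 95% CI: [{lower:.2f}, {upper:.2f}]")
```

An interval that lies entirely above 1 suggests a risk factor and one entirely below 1 suggests a protective factor, which is how the values in the abstract are interpreted.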
Affiliation(s)
- Mengdi Zhang
  - The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
- Huachen Jiao
  - Department of Cardiology, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, No. 42, Wenhua West Road, Lixia District, Jinan, Shandong, China
- Cong Wang
  - Department of Cardiology, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, No. 42, Wenhua West Road, Lixia District, Jinan, Shandong, China
- Ying Qu
  - The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
- Shunxin Lv
  - College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
- Dongsheng Zhao
  - The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
- Xia Zhong
  - The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
3
Mondal HS, Ahmed KA, Birbilis N, Hossain MZ. Machine learning for detecting DNA attachment on SPR biosensor. Sci Rep 2023; 13:3742. [PMID: 36879019 PMCID: PMC9987359 DOI: 10.1038/s41598-023-29395-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 02/03/2023] [Indexed: 03/08/2023] Open
Abstract
Optoelectric biosensors measure the conformational changes of biomolecules and their molecular interactions, allowing researchers to use them in different biomedical diagnostics and analysis activities. Among different biosensors, surface plasmon resonance (SPR)-based biosensors utilize label-free and gold-based plasmonic principles with high precision and accuracy, allowing these gold-based biosensors as one of the preferred methods. The dataset generated from these biosensors are being used in different machine learning (ML) models for disease diagnosis and prognosis, but there is a scarcity of models to develop or assess the accuracy of SPR-based biosensors and ensure a reliable dataset for downstream model development. Current study proposed innovative ML-based DNA detection and classification models from the reflective light angles on different gold surfaces of biosensors and associated properties. We have conducted several statistical analyses and different visualization techniques to evaluate the SPR-based dataset and applied t-SNE feature extraction and min-max normalization to differentiate classifiers of low-variances. We experimented with several ML classifiers, namely support vector machine (SVM), decision tree (DT), multi-layer perceptron (MLP), k-nearest neighbors (KNN), logistic regression (LR) and random forest (RF) and evaluated our findings in terms of different evaluation metrics. Our analysis showed the best accuracy of 0.94 by RF, DT and KNN for DNA classification and 0.96 by RF and KNN for DNA detection tasks. Considering area under the receiver operating characteristic curve (AUC) (0.97), precision (0.96) and F1-score (0.97), we found RF performed best for both tasks. Our research shows the potentiality of ML models in the field of biosensor development, which can be expanded to develop novel disease diagnosis and prognosis tools in the future.
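The min-max normalization mentioned in the abstract above rescales each feature to the range [0, 1]. A minimal sketch follows; the function name `min_max_normalize` and the sample angle values are assumptions for illustration and do not come from the paper's dataset.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical reflected-light angles in degrees.
angles = [62.0, 64.5, 67.0, 72.0]
scaled = min_max_normalize(angles)
print(scaled)
```

Scaling of this kind matters most for distance-based classifiers such as KNN and SVM, where a feature with a large numeric range would otherwise dominate the others.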
Affiliation(s)
- Himadri Shekhar Mondal
  - ANU College of Engineering, Computing and Cybernetics, The Australian National University, Canberra, ACT, 2600, Australia; Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, ACT, 2601, Australia
- Khandaker Asif Ahmed
  - Australian Centre for Disease Preparedness (ACDP), CSIRO, Geelong, VIC, 3220, Australia
- Nick Birbilis
  - ANU College of Engineering, Computing and Cybernetics, The Australian National University, Canberra, ACT, 2600, Australia; Faculty of Science, Engineering and Built Environment, Deakin University, Burwood, VIC, 3125, Australia
- Md Zakir Hossain
  - ANU College of Engineering, Computing and Cybernetics, The Australian National University, Canberra, ACT, 2600, Australia; Biological Data Science Institute, The Australian National University, Canberra, ACT, 2600, Australia; Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, ACT, 2601, Australia; Faculty of Science and Engineering, Curtin University, Perth, WA, 6102, Australia
4
Mistry S, Riches NO, Gouripeddi R, Facelli JC. Environmental exposures in machine learning and data mining approaches to diabetes etiology: A scoping review. Artif Intell Med 2023; 135:102461. [PMID: 36628796 PMCID: PMC9834645 DOI: 10.1016/j.artmed.2022.102461] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2022] [Revised: 10/06/2022] [Accepted: 11/23/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND Environmental exposures are implicated in diabetes etiology, but are poorly understood due to disease heterogeneity, complexity of exposures, and analytical challenges. Machine learning and data mining are artificial intelligence methods that can address these limitations. Despite their increasing adoption in etiology and prediction of diabetes research, the types of methods and exposures analyzed have not been thoroughly reviewed. OBJECTIVE We aimed to review articles that implemented machine learning and data mining methods to understand environmental exposures in diabetes etiology and disease prediction. METHODS We queried PubMed and Scopus databases for machine learning and data mining studies that used environmental exposures to understand diabetes etiology on September 19th, 2022. Exposures were classified into specific external, general external, or internal exposures. We reviewed machine learning and data mining methods and characterized the scope of environmental exposures studied in the etiology of general diabetes, type 1 diabetes, type 2 diabetes, and other types of diabetes. RESULTS We identified 44 articles for inclusion. Specific external exposures were the most common exposures studied, and supervised models were the most common methods used. Well-established specific external exposures of low physical activity, high cholesterol, and high triglycerides were predictive of general diabetes, type 2 diabetes, and prediabetes, while novel metabolic and gut microbiome biomarkers were implicated in type 1 diabetes. DISCUSSION The use of machine learning and data mining methods to elucidate environmental triggers of diabetes was largely limited to well-established risk factors identified using easily explainable and interpretable models. 
Future studies should seek to leverage machine learning and data mining to explore the temporality and co-occurrence of multiple exposures and further evaluate the role of general external and internal exposures in diabetes etiology.
Affiliation(s)
- Sejal Mistry
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA
- Naomi O Riches
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA; Department of Obstetrics and Gynecology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Ramkiran Gouripeddi
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA; Clinical and Translational Science Institute, University of Utah, Salt Lake City, UT, USA
- Julio C Facelli
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA; Clinical and Translational Science Institute, University of Utah, Salt Lake City, UT, USA
5
An automated unsupervised deep learning–based approach for diabetic retinopathy detection. Med Biol Eng Comput 2022; 60:3635-3654. [DOI: 10.1007/s11517-022-02688-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 10/02/2022] [Indexed: 11/07/2022]
6
Kibria HB, Matin A. The Severity Prediction of The Binary And Multi-Class Cardiovascular Disease - A Machine Learning-Based Fusion Approach. Comput Biol Chem 2022; 98:107672. [DOI: 10.1016/j.compbiolchem.2022.107672] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/08/2021] [Revised: 02/25/2022] [Accepted: 03/26/2022] [Indexed: 12/22/2022]
7
Naz H, Ahuja S. SMOTE-SMO-based expert system for type II diabetes detection using PIMA dataset. Int J Diabetes Dev Ctries 2021. [DOI: 10.1007/s13410-021-00969-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
8
Brnabic A, Hess LM. Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak 2021; 21:54. [PMID: 33588830 PMCID: PMC7885605 DOI: 10.1186/s12911-021-01403-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/07/2020] [Accepted: 01/20/2021] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. METHODS This systematic literature review was conducted to identify published observational research of employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. RESULTS A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. CONCLUSIONS A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, the model selection strategy is clearly defined, and both internal and external validation are necessary to be sure that decisions for patient care are being made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.
Affiliation(s)
- Lisa M Hess
  - Eli Lilly and Company, Indianapolis, IN, USA
9
Wang Z, Hou J, Shi Y, Tan Q, Peng L, Deng Z, Wang Z, Guo Z. Influence of Lifestyles on Mild Cognitive Impairment: A Decision Tree Model Study. Clin Interv Aging 2020; 15:2009-2017. [PMID: 33149562 PMCID: PMC7604452 DOI: 10.2147/cia.s265839] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/03/2020] [Accepted: 08/20/2020] [Indexed: 12/26/2022] Open
Abstract
Objective To explore the effects of different lifestyle choices on mild cognitive impairment (MCI) and to establish a decision tree model to analyse their predictive significance on the incidence of MCI. Methods Study participants were recruited from geriatric and physical examination centres from October 2015 to October 2019: 330 MCI patients and 295 normal cognitive (NC) patients. Cognitive function was evaluated by the Mini-Mental State Examination Scale (MMSE) and Clinical Dementia Scale (CDR), while the Barthel Index (BI) was used to evaluate life ability. Statistical analysis included the χ2 test, logistic regression, and decision tree. The ROC curve was drawn to evaluate the predictive ability of the decision tree model. Results Logistic regression analysis showed that low education, living alone, smoking, and a high-fat diet were risk factors for MCI, while young age, tea drinking, afternoon naps, social engagement, and hobbies were protective factors for MCI. Social engagement, a high-fat diet, hobbies, living condition, tea drinking, and smoking entered all nodes of the decision tree model, with social engagement as the root node variable. The importance of predictive variables in the decision tree model showed social engagement, a high-fat diet, tea drinking, hobbies, living condition, and smoking as 33.57%, 27.74%, 22.14%, 11.94%, 4.61%, and 0%, respectively. The area under the ROC curve predicted by the decision tree model was 0.827 (95% CI: 0.795~0.856). Conclusion The decision tree model has good predictive ability. MCI was closely related to lifestyle; social engagement was the most important factor in predicting the occurrence of MCI.
Affiliation(s)
- Zongqiu Wang
  - Department of Geriatrics, The Affiliated Hospital of Qingdao University, Qingdao, People's Republic of China
- Jiwen Hou
  - Department of Geriatrics, The Affiliated Hospital of Chengdu University, Chengdu, People's Republic of China
- Yu Shi
  - Department of Critical Medicine, Weihai Central Hospital, Weihai, People's Republic of China
- Qiaowen Tan
  - Department of Geriatrics, The Affiliated Hospital of Qingdao University, Qingdao, People's Republic of China
- Lin Peng
  - Department of Geriatrics, The Affiliated Hospital of Qingdao University, Qingdao, People's Republic of China
- Zhiying Deng
  - Department of Geriatrics, The Affiliated Hospital of Qingdao University, Qingdao, People's Republic of China
- Zhihong Wang
  - Department of Geriatrics, The Affiliated Hospital of Qingdao University, Qingdao, People's Republic of China
- Zongjun Guo
  - Department of Geriatrics, The Affiliated Hospital of Qingdao University, Qingdao, People's Republic of China
10
Naz H, Ahuja S. Deep learning approach for diabetes prediction using PIMA Indian dataset. J Diabetes Metab Disord 2020; 19:391-403. [PMID: 32550190 DOI: 10.1007/s40200-020-00520-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 11/06/2019] [Accepted: 03/20/2020] [Indexed: 12/18/2022]
Abstract
Purpose International Diabetes Federation (IDF) stated that 382 million people are living with diabetes worldwide. Over the last few years, the impact of diabetes has been increased drastically, which makes it a global threat. At present, Diabetes has steadily been listed in the top position as a major cause of death. The number of affected people will reach up to 629 million i.e. 48% increase by 2045. However, diabetes is largely preventable and can be avoided by making lifestyle changes. These changes can also lower the chances of developing heart disease and cancer. So, there is a dire need for a prognosis tool that can help the doctors with early detection of the disease and hence can recommend the lifestyle changes required to stop the progression of the deadly disease. Method Diabetes if untreated may turn into fatal and directly or indirectly invites lot of other diseases such as heart attack, heart failure, brain stroke and many more. Therefore, early detection of diabetes is very significant so that timely action can be taken and the progression of the disease may be prevented to avoid further complications. Healthcare organizations accumulate huge amount of data including Electronic health records, images, omics data, and text but gaining knowledge and insight into the data remains a key challenge. The latest advances in Machine learning technologies can be applied for obtaining hidden patterns, which may diagnose diabetes at an early phase. This research paper presents a methodology for diabetes prediction using a diverse machine learning algorithm using the PIMA dataset. Results The accuracy achieved by functional classifiers Artificial Neural Network (ANN), Naive Bayes (NB), Decision Tree (DT) and Deep Learning (DL) lies within the range of 90-98%. Among the four of them, DL provides the best results for diabetes onset with an accuracy rate of 98.07% on the PIMA dataset. 
Hence, this proposed system provides an effective prognostic tool for healthcare officials. The results obtained can be used to develop a novel automatic prognosis tool that can be helpful in early detection of the disease. Conclusion The outcome of the study confirms that DL provides the best results with the most promising extracted features. DL achieves the accuracy of 98.07% which can be used for further development of the automatic prognosis tool. The accuracy of the DL approach can further be enhanced by including the omics data for prediction of the onset of the disease.
Affiliation(s)
- Huma Naz
  - Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
- Sachin Ahuja
  - Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
11
Classification and prediction of diabetes disease using machine learning paradigm. Health Inf Sci Syst 2020; 8:7. [PMID: 31949894 DOI: 10.1007/s13755-019-0095-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 08/21/2019] [Accepted: 12/21/2019] [Indexed: 12/19/2022] Open
Abstract
Background and objectives Diabetes is a chronic disease characterized by high blood sugar. It may cause many complications such as stroke, kidney failure, and heart attack. About 422 million people worldwide were affected by diabetes in 2014. The figure is projected to reach 642 million by 2040. The main objective of this study is to develop a machine learning (ML)-based system for predicting diabetic patients. Materials and methods Logistic regression (LR) is used to identify the risk factors for diabetes based on p value and odds ratio (OR). We adopted four classifiers, naïve Bayes (NB), decision tree (DT), Adaboost (AB), and random forest (RF), to predict diabetic patients. Three partition protocols (K2, K5, and K10) were also adopted, and each protocol was repeated for 20 trials. The performance of these classifiers is evaluated using accuracy (ACC) and area under the curve (AUC). Results We used a diabetes dataset derived from the National Health and Nutrition Examination Survey conducted in 2009-2012. The dataset consists of 6561 respondents with 657 diabetic and 5904 controls. The LR model shows that 7 of 14 factors (age, education, BMI, systolic BP, diastolic BP, direct cholesterol, and total cholesterol) are risk factors for diabetes. The overall ACC of the ML-based system is 90.62%. The combination of LR-based feature selection and RF-based classifier gives 94.25% ACC and 0.95 AUC for the K10 protocol. Conclusion The combination of LR and RF-based classifier performs better. This combination will be very helpful for predicting diabetic patients.
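The K2, K5, and K10 partition protocols named in the abstract above read as 2-, 5-, and 10-fold cross-validation. The sketch below shows one simple way to generate such splits with the standard library only. It is an assumption about what the protocols mean rather than the authors' code, and the function name `k_fold_indices` and the sample size of 20 are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so that each trial is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample appears in exactly one test fold per protocol.
for k in (2, 5, 10):
    splits = list(k_fold_indices(n_samples=20, k=k))
    print(k, [len(test) for _, test in splits])
```

Repeating a protocol for several trials, as the abstract describes, could be done by calling the function with a different `seed` for each trial and averaging the resulting metrics.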
12
Xue M, Su Y, Li C, Wang S, Yao H. Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework. J Diabetes Res 2020; 2020:6873891. [PMID: 33029536 PMCID: PMC7532405 DOI: 10.1155/2020/6873891] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/12/2020] [Revised: 08/01/2020] [Accepted: 09/02/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND An estimated 425 million people globally have diabetes, accounting for 12% of the world's health expenditures, and the number continues to grow, placing a huge burden on the healthcare system, especially in remote, underserved areas. METHODS A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) based on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we applied a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. RESULTS The results indicated that XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F-1 = 0.906, and AUC = 0.968). The variable importance scores from XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). CONCLUSIONS We proposed an LR-XGBoost classifier that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for diabetes risk in the early phase, and the variable importance scores give clues for preventing diabetes.
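The F-1 value reported above can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name `f1_score` is chosen for this example only, and the inputs are the rounded figures from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for XGBoost in the abstract.
reported_f1 = f1_score(precision=0.910, recall=0.902)
print(round(reported_f1, 3))  # 0.906
```

The result agrees with the F-1 of 0.906 given in the abstract, so the three reported figures are mutually consistent.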
Affiliation(s)
- Mingyue Xue
  - Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China
  - College of Public Health, Xinjiang Medical University, Urumqi, China
- Yinxia Su
  - College of Public Health, Xinjiang Medical University, Urumqi, China
- Chen Li
  - The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Shuxia Wang
  - Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
- Hua Yao
  - Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
13
Pei D, Yang T, Zhang C. Estimation of Diabetes in a High-Risk Adult Chinese Population Using J48 Decision Tree Model. Diabetes Metab Syndr Obes 2020; 13:4621-4630. [PMID: 33273837 PMCID: PMC7705272 DOI: 10.2147/dmso.s279329] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2020] [Accepted: 10/27/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND To predict and make an early diagnosis of diabetes is a critical approach in a population with high risk of diabetes, one of the devastating diseases globally. Traditional and conventional blood tests are recommended for screening the suspected patients; however, applying these tests could have health side effects and expensive cost. The goal of this study was to establish a simple and reliable predictive model based on the risk factors associated with diabetes using a decision tree algorithm. METHODS A retrospective cross-sectional study was used in this study. A total of 10,436 participants who had a health check-up from January 2017 to July 2017 were recruited. With appropriate data mining approaches, 3454 participants remained in the final dataset for further analysis. Seventy percent of these participants (2420 cases) were then randomly allocated to either the training dataset for the construction of the decision tree or the testing dataset (30%, 1034 cases) for evaluation of the performance of the decision tree. For this purpose, the cost-sensitive J48 algorithm was used to develop the decision tree model. RESULTS Utilizing all the key features of the dataset consisting of 14 input variables and two output variables, the constructed decision tree model identified several key factors that are closely linked to the development of diabetes and are also modifiable. Furthermore, our model achieved an accuracy of classification of 90.3% with a precision of 89.7% and a recall of 90.3%. CONCLUSION By applying simple and cost-effective classification rules, our decision tree model estimates the development of diabetes in a high-risk adult Chinese population with strong potential for implementation of diabetes management.
Affiliation(s)
- Dongmei Pei
  - Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
  - Correspondence: Dongmei Pei, Department of Health Management, Shengjing Hospital of China Medical University, No. 36, Sanhao Street, Heping District, Shenyang 110004, People’s Republic of China
- Tengfei Yang
  - Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
- Chengpu Zhang
  - Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China