1
El-Bashbishy AES, El-Bakry HM. Pediatric diabetes prediction using deep learning. Sci Rep 2024; 14:4206. [PMID: 38378741 PMCID: PMC11291908 DOI: 10.1038/s41598-024-51438-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 01/04/2024] [Indexed: 02/22/2024] Open
Abstract
This study proposed a novel technique for early diabetes prediction with high accuracy. Recently, Deep Learning (DL) has been proven to be expeditious in the diagnosis of diabetes. The supported model is constructed by implementing ten hidden layers and a multitude of epochs using the Deep Neural Network (DNN)-based multi-layer perceptron (MLP) algorithm. We proceeded to meticulously fine-tune the hyperparameters within the fully automated DL architecture to optimize data preprocessing, prediction, and classification using a novel dataset of Mansoura University Children's Hospital Diabetes (MUCHD), which allowed for a comprehensive evaluation of the system's performance. The system was validated and tested using a sample of 548 patients, each with 18 significant features. Various validation metrics were employed to ensure the reliability of the results using cross-validation approaches with various statistical measures of accuracy, F-score, precision, sensitivity, specificity, and Dice similarity coefficient. The high performance of the proposed system can help clinicians accurately diagnose diabetes, with a remarkable accuracy rate of 99.8%. According to our analysis, implementing this method results in a noteworthy increase of 0.39% in the overall system performance compared to the current state-of-the-art methods. Therefore, we recommend using this method to predict diabetes.
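The validation metrics named in this abstract (accuracy, F-score, precision, sensitivity, specificity, and Dice similarity coefficient) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code; the function name `binary_metrics`, the `BinaryMetrics` container, and the 0/1 label encoding (1 = diabetic) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    sensitivity: float  # recall, true positive rate
    specificity: float  # true negative rate
    f1: float
    dice: float


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Compute common binary metrics from 0/1 labels (assumed: 1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        precision=precision,
        sensitivity=sensitivity,
        specificity=_ratio(tn, tn + fp),
        f1=_ratio(2 * precision * sensitivity, precision + sensitivity),
        dice=_ratio(2 * tp, 2 * tp + fp + fn),
    )
```

For binary labels the Dice coefficient and the F1 score are algebraically the same quantity, so reporting both adds no independent information.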
Affiliation(s)
- Abeer El-Sayyid El-Bashbishy
  - Information Systems Department, Faculty of Computer and Information Sciences, Mansoura University, Mansoura, Egypt
- Hazem M El-Bakry
  - Head of Information Systems Department, Faculty of Computer and Information Sciences, Mansoura University, Mansoura, Egypt
2
Feng X, Cai Y, Xin R. Optimizing diabetes classification with a machine learning-based framework. BMC Bioinformatics 2023; 24:428. [PMID: 37957549 PMCID: PMC10644638 DOI: 10.1186/s12859-023-05467-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 09/04/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND Diabetes is a metabolic disorder usually caused by insufficient secretion of insulin from the pancreas or insensitivity of cells to insulin, resulting in long-term elevated blood sugar levels in patients. Patients usually present with frequent urination, thirst, and hunger. If left untreated, it can lead to various complications that can affect essential organs and even endanger life. Therefore, developing an intelligent diagnosis framework for diabetes is necessary. RESULT This paper proposes a machine learning-based diabetes classification framework, the machine learning optimized GAN. The framework handles missing values with a joint mean and median filling method, processes outliers with the cap method, and uses SMOTEENN to mitigate sample imbalance. It then applies the proposed Diabetes Classification Model based on Generative Adversarial Network and employs logistic regression for detailed feature analysis. The effectiveness of the framework is evaluated using both the PIMA dataset and the diabetes dataset obtained from the GEO database. The model achieved a binary classification accuracy of 96.27%, a tertiary classification accuracy of 99.31%, precision, recall, and F1 score of 0.9698, and an AUC of 0.9702. CONCLUSION The experimental results show that the framework proposed in this paper can accurately classify diabetes and provide new ideas for intelligent diagnosis of diabetes.
Affiliation(s)
- Xin Feng
  - School of Science, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
  - State Key Laboratory of Inorganic Synthesis and Preparative Chemistry, College of Chemistry, Jilin University, Changchun, 130012, People's Republic of China
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, 130012, People's Republic of China
- Yihuai Cai
  - School of Science, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
- Ruihao Xin
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, People's Republic of China
3
Jiang L, Xia Z, Zhu R, Gong H, Wang J, Li J, Wang L. Diabetes risk prediction model based on community follow-up data using machine learning. Prev Med Rep 2023; 35:102358. [PMID: 37654514 PMCID: PMC10465943 DOI: 10.1016/j.pmedr.2023.102358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/31/2023] [Accepted: 08/01/2023] [Indexed: 09/02/2023] Open
Abstract
Diabetes is a chronic metabolic disease characterized by hyperglycemia. The follow-up management of diabetes patients takes place mostly in the community, but the relationship between key lifestyle indicators in community follow-up and the risk of diabetes is unclear. To explore this association, 252,176 follow-up records of diabetes patients from 2016 to 2023 were obtained from Haizhu District, Guangzhou. From the follow-up data, the key lifestyle indicators that affect diabetes were determined, and the optimal feature subset was obtained through feature selection to accurately assess the risk of diabetes. A diabetes risk assessment model based on a random forest classifier was designed, using optimal feature parameter selection and algorithm model comparison; it reached an accuracy of 91.24% and an area under the ROC curve (AUC) of 97%. To improve the applicability of the model in clinical practice and daily life, a diabetes risk score card was designed and tested on the original data, where its accuracy was 95.15%, indicating high model reliability. The diabetes risk prediction model based on community follow-up big data mining can be used by community doctors for large-scale risk screening and early warning based on patient follow-up data, further promoting diabetes prevention and control strategies. It can also be used in wearable devices or intelligent biosensors for individual patient self-examination, in order to improve lifestyle and reduce risk factor levels.
Affiliation(s)
- Liangjun Jiang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Zhenhua Xia
  - Electronics & Information School of Yangtze University, Jingzhou, China
- Ronghui Zhu
  - Shenzhen Nanshan Medical Group HQ, Shenzhen, China
- Haimei Gong
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Jing Wang
  - E-link Wisdom Co., Ltd, Shenzhen, China
- Juan Li
  - Haizhu District Community Health Development Guidance Center, Guangzhou, China
- Lei Wang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
4
Zhou H, Xin Y, Li S. A diabetes prediction model based on Boruta feature selection and ensemble learning. BMC Bioinformatics 2023; 24:224. [PMID: 37264332 DOI: 10.1186/s12859-023-05300-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 04/21/2023] [Indexed: 06/03/2023] Open
Abstract
BACKGROUND AND OBJECTIVE As a common chronic disease, diabetes is called the "second killer" among modern diseases. Currently, there is no medical cure for diabetes. We can only rely on medication for auxiliary treatment. However, many diabetic patients still die each year. In addition, a considerable number of people do not pay attention to their physical health or opt out of treatment due to lack of money, which eventually leads to various complications. Therefore, diagnosing diabetes at an early stage and intervening early is necessary; thus, developing an early detection method for diabetes is essential. METHODS In this study, a diabetes prediction model based on Boruta feature selection and ensemble learning is proposed. The model contains the use of Boruta feature selection, the extraction of salient features from datasets, the use of the K-Means++ algorithm for unsupervised clustering of data and stacking of an ensemble learning method for classification. It has been validated on a diabetes dataset. RESULTS The experiments were performed on the PIMA Indian diabetes dataset. The model was evaluated by accuracy, precision and F1 index. The obtained results show that the accuracy rate of the model reaches 98% and achieves good results. CONCLUSION Compared with other diabetes prediction models, this model achieved better results, and the obtained results indicate that this model is superior to other models in diabetes prediction and has better performance.
Affiliation(s)
- Hongfang Zhou
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
  - Shaanxi Key Laboratory of Network Computing and Security Technology, Xi'an, 710048, China
- Yinbo Xin
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
- Suli Li
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
5
Emmons KM, Mendez S, Lee RM, Erani D, Mascioli L, Abreu M, Adams S, Daly J, Bierer BE. Data sharing in the context of community-engaged research partnerships. Soc Sci Med 2023; 325:115895. [PMID: 37062144 PMCID: PMC10308954 DOI: 10.1016/j.socscimed.2023.115895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Revised: 03/04/2023] [Accepted: 04/06/2023] [Indexed: 04/18/2023]
Abstract
Over the past 20 years, the National Institutes of Health (NIH) has implemented several policies designed to improve sharing of research data, such as the NIH public access policy for publications, NIH genomic data sharing policy, and National Cancer Institute (NCI) Cancer Moonshot public access and data sharing policy. In January 2023, a new NIH data sharing policy went into effect, requiring researchers to submit a Data Management and Sharing Plan in proposals for NIH funding (NIH. Supplemental information to the, 2020b; NIH. Final policy for data, 2020a). These policies are based on the idea that sharing data is a key component of the scientific method, as it enables the creation of larger data repositories that can lead to research questions that may not be possible in individual studies (Alter and Gonzalez, 2018; Jwa and Poldrack, 2022), allows enhanced collaboration, and maximizes the federal investment in research. As data sharing is expanded, we must ask to whom the benefits of data sharing accrue and to whom they do not. In an era of growing efforts to engage diverse communities in research, we must consider the impact of data sharing for all research participants and the communities that they represent. We examine the issue of data sharing through a community-engaged research lens, informed by a long-standing partnership between community-engaged researchers and a key community health organization (Kruse et al., 2022). We contend that without effective community engagement and rich contextual knowledge, biases resulting from data sharing can remain unchecked. We provide several recommendations that would allow better community engagement related to data sharing to ensure both community and researcher understanding of the issues involved and move toward shared benefits. By identifying good models for evaluating the impact of data sharing on communities that contribute data, and then using those models systematically, we will advance the consideration of the community perspective and increase the likelihood of benefits for all.
Affiliation(s)
- Karen M Emmons
  - Department of Social and Behavioral Science, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Samuel Mendez
  - Department of Social and Behavioral Science, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Rebekka M Lee
  - Department of Social and Behavioral Science, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Diana Erani
  - Massachusetts League of Community Health Centers, 40 Court Street, 10th Floor, Boston, MA, 02108, USA
- Lynette Mascioli
  - Massachusetts League of Community Health Centers, 40 Court Street, 10th Floor, Boston, MA, 02108, USA
- Marlene Abreu
  - Massachusetts League of Community Health Centers, 40 Court Street, 10th Floor, Boston, MA, 02108, USA
- Susan Adams
  - Massachusetts League of Community Health Centers, 40 Court Street, 10th Floor, Boston, MA, 02108, USA
- James Daly
  - Department of Social and Behavioral Science, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Barbara E Bierer
  - Division of Global Health Equity, Department of Medicine, Brigham and Women's Hospital, 75 Francis St., Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, 25 Shattuck St., Boston, MA, 02115, USA
6
Afsaneh E, Sharifdini A, Ghazzaghi H, Ghobadi MZ. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review. Diabetol Metab Syndr 2022; 14:196. [PMID: 36572938 PMCID: PMC9793536 DOI: 10.1186/s13098-022-00969-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022] Open
Abstract
Diabetes as a metabolic illness can be characterized by increased amounts of blood glucose. This abnormal increase can lead to critical detriment to the other organs such as the kidneys, eyes, heart, nerves, and blood vessels. Therefore, its prediction, prognosis, and management are essential to prevent harmful effects and also recommend more useful treatments. For these goals, machine learning algorithms have found considerable attention and have been developed successfully. This review surveys the recently proposed machine learning (ML) and deep learning (DL) models for the objectives mentioned earlier. The reported results disclose that the ML and DL algorithms are promising approaches for controlling blood glucose and diabetes. However, they should be improved and employed in large datasets to affirm their applicability.
7
Application of data augmentation techniques towards metabolomics. Comput Biol Med 2022; 148:105916. [DOI: 10.1016/j.compbiomed.2022.105916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 07/11/2022] [Accepted: 07/23/2022] [Indexed: 11/22/2022]
8
Olisah CC, Smith L, Smith M. Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective. Computer Methods and Programs in Biomedicine 2022; 220:106773. [PMID: 35429810 DOI: 10.1016/j.cmpb.2022.106773] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/03/2021] [Revised: 01/25/2022] [Accepted: 03/22/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND AND OBJECTIVE Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the inadequacy of the body to secrete and respond to insulin. If not properly managed or diagnosed on time, diabetes can pose a risk to vital body organs such as the eyes, kidneys, nerves, heart, and blood vessels and so can be life-threatening. The many years of research in computational diagnosis of diabetes have pointed to machine learning as a viable solution for the prediction of diabetes. However, the accuracy rate to date suggests that there is still much room for improvement. In this paper, we propose a machine learning framework for diabetes prediction and diagnosis using the PIMA Indian dataset and the laboratory of the Medical City Hospital (LMCH) diabetes dataset. We hypothesize that adopting feature selection and missing value imputation methods can scale up the performance of classification models in diabetes prediction and diagnosis. METHODS In this paper, a robust framework for building a diabetes prediction model to aid in the clinical diagnosis of diabetes is proposed. The framework includes the adoption of Spearman correlation and polynomial regression for feature selection and missing value imputation, respectively, from a perspective that strengthens their performances. Further, different supervised machine learning models, the random forest (RF) model, support vector machine (SVM) model, and our designed twice-growth deep neural network (2GDNN) model are proposed for classification. The models are optimized by tuning the hyperparameters of the models using grid search and repeated stratified k-fold cross-validation and evaluated for their ability to scale to the prediction problem. RESULTS Through experiments on the PIMA Indian and LMCH diabetes datasets, precision, sensitivity, F1-score, train-accuracy, and test-accuracy scores of 97.34%, 97.24%, 97.26%, 99.01%, and 97.25% (PIMA Indian) and 97.28%, 97.33%, 97.27%, 99.57%, and 97.33% (LMCH) are achieved with the proposed 2GDNN model. CONCLUSION The data preprocessing approaches and the classifiers with hyperparameter optimization proposed within the machine learning framework yield a robust machine learning model that outperforms state-of-the-art results in diabetes mellitus prediction and diagnosis. The source code for the models of the proposed machine learning framework has been made publicly available.
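The abstract names Spearman correlation as the feature selection step without saying how features are ranked or thresholded. The sketch below shows one plausible reading, assumed here for illustration only: score each feature by the absolute Spearman rank correlation with the outcome and keep those above a cutoff. The helper names (`average_ranks`, `spearman`, `select_features`) and the threshold are hypothetical and are not taken from the authors' published source code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(columns, target, threshold):
    """Keep feature names whose |rho| with the target meets the threshold."""
    return [
        name
        for name, column in columns.items()
        if abs(spearman(column, target)) >= threshold
    ]
```

Because the statistic works on ranks, it captures any monotonic relationship between a feature and the outcome, not only a linear one, which is one reason it is a common choice for clinical measurements with skewed distributions.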
Affiliation(s)
- Chollette C Olisah
  - Centre for Machine Vision, Bristol Robotics Laboratory, University of the West of England, Bristol, UK
- Lyndon Smith
  - Centre for Machine Vision, Bristol Robotics Laboratory, University of the West of England, Bristol, UK
- Melvyn Smith
  - Centre for Machine Vision, Bristol Robotics Laboratory, University of the West of England, Bristol, UK
9
Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. Advances in Computational Intelligence 2022; 2:22. [PMID: 35434723 PMCID: PMC9006199 DOI: 10.1007/s43674-022-00034-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/17/2021] [Revised: 02/27/2022] [Accepted: 03/03/2022] [Indexed: 12/14/2022]
Abstract
Type 2 diabetes has recently acquired the status of an epidemic silent killer, though it is non-communicable. There are two main reasons behind this perception of the disease. First, a gradual but exponential growth in the disease prevalence has been witnessed irrespective of age group, geography, or gender. Second, the disease dynamics are very complex in terms of the multifactorial risks involved, the initial asymptomatic period, the different short-term and long-term complications posing serious health threats, and related co-morbidities. The majority of its risk factors are lifestyle habits such as physical inactivity, lack of exercise, high body mass index (BMI), poor diet, and smoking; the exceptions are unavoidable ones such as family history of diabetes, ethnic predisposition, and ageing. Nowadays, machine learning (ML) is increasingly being applied to alleviate the diabetes health burden, and many research works have been proposed in the literature to offer clinical decision support in different application areas as well. In this paper, we present a review of such efforts for the prevention and management of type 2 diabetes. First, we present the medical gaps in the diabetes knowledge base, guidelines, and medical practice identified from relevant articles and highlight those that can be addressed by ML. Further, we review the ML research works in three different application areas, namely (1) risk assessment (statistical risk scores and ML-based risk models), (2) diagnosis (using non-invasive and invasive features), and (3) prognosis (from normoglycemia/prior morbidity to incident diabetes and prognosis of incident diabetes to related complications). We discuss and summarize the shortcomings or gaps in the existing ML methodologies for diabetes to be addressed in future. This review provides the breadth of ML predictive modeling applications for diabetes while highlighting the medical and technological gaps as well as various aspects involved in ML-based diabetes clinical decision support.
Affiliation(s)
- Ashwini Tuppad
  - School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka India
- Shantala Devi Patil
  - School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka India
10
Rabie O, Alghazzawi D, Asghar J, Saddozai FK, Asghar MZ. A Decision Support System for Diagnosing Diabetes Using Deep Neural Network. Front Public Health 2022; 10:861062. [PMID: 35372240 PMCID: PMC8970706 DOI: 10.3389/fpubh.2022.861062] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/24/2022] [Accepted: 02/07/2022] [Indexed: 01/16/2023] Open
Abstract
Background and Objective According to the WHO, diabetes mellitus is a long-term condition marked by high blood sugar levels. The consequences might be far-reaching. According to current increases in mortality, diabetes has risen to number 10 among the leading causes of mortality worldwide. When used to predict diabetes using unbalanced datasets from testing, machine learning (ML) classifiers and established approaches for encoding categorical data have exhibited a broad variety of surprising outcomes. Early studies also made use of an artificial neural network to extract features without obtaining a grasp of the sequence information. Methods This study offers a deep learning-based decision support system (DSS), utilizing bidirectional long short-term memory (BiLSTM), to accurately predict diabetic illness from patient data. In order to predict diabetes, the BiLSTM hybrid model was used after balancing the data set. Results Unlike earlier studies, this proposed model's trial findings were promising, with an accuracy of 93.07%, 93% precision, 92% recall, and a 92% F1-score. Conclusions Using a BiLSTM model for classification outperforms current approaches in the diabetes detection domain.
Affiliation(s)
- Osama Rabie
  - Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Daniyal Alghazzawi
  - Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Junaid Asghar
  - Faculty of Pharmacy, Gomal University, Dera Ismail Khan, Pakistan
- Furqan Khan Saddozai
  - Institute of Computing and Information Technology, Gomal University, Dera Ismail Khan, Pakistan
- Muhammad Zubair Asghar
  - Institute of Computing and Information Technology, Gomal University, Dera Ismail Khan, Pakistan
  - *Correspondence: Muhammad Zubair Asghar
11
Fine-Tuning Fuzzy KNN Classifier Based on Uncertainty Membership for the Medical Diagnosis of Diabetes. Applied Sciences-Basel 2022. [DOI: 10.3390/app12030950] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 02/04/2023]
Abstract
Diabetes, a metabolic disease in which the blood glucose level rises over time, is one of the most common chronic diseases at present. It is critical to accurately predict and classify diabetes to reduce the severity of the disease and treat it early. One of the difficulties that researchers face is that diabetes datasets are limited and contain outliers and missing data. Additionally, there is a trade-off between classification accuracy and operational law for detecting diabetes. In this paper, an algorithm for diabetes classification is proposed for pregnant women using the Pima Indians Diabetes Dataset (PIDD). First, a preprocessing step in the proposed algorithm includes outlier rejection, imputing missing values, the standardization process, and feature selection of the attributes, which enhance the dataset's quality. Second, the classifier uses the fuzzy KNN method and modifies the membership function based on the uncertainty theory. Third, a grid search method is applied to achieve the best values for tuning the fuzzy KNN method based on uncertainty membership, as there are hyperparameters that affect the performance of the proposed classifier. In turn, the proposed tuned fuzzy KNN based on uncertainty classifiers (TFKNN) deals with the belief degree, handles membership functions and operation law, and avoids making the wrong categorization. The proposed algorithm performs better than other classifiers that have been trained and evaluated, including KNN, fuzzy KNN, naïve Bayes (NB), and decision tree (DT). The results of different classifiers in an ensemble could significantly improve classification precision. The TFKNN has time complexity O(kn²d) and space complexity O(n²d). The TFKNN model has high performance and outperformed the others in all tests in terms of accuracy, specificity, precision, and average AUC, with values of 90.63, 85.00, 93.18, and 94.13, respectively. Additionally, results of empirical analysis of TFKNN compared to fuzzy KNN, KNN, NB, and DT demonstrate the global superiority of TFKNN in precision, accuracy, and specificity.
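For orientation, the baseline that this abstract builds on can be sketched as follows. This is the conventional distance-weighted fuzzy KNN with crisp training labels, not the uncertainty-based membership function of the proposed TFKNN, whose details the abstract does not give. The function name `fuzzy_knn_memberships`, the defaults `k=3` and fuzzifier `m=2.0`, and the small `eps` guard against zero distances are assumptions made for illustration.

```python
import math


def fuzzy_knn_memberships(train_x, train_y, query, k=3, m=2.0, eps=1e-12):
    """Return {label: membership} for `query`; memberships sum to 1.

    Each of the k nearest neighbours votes for its own class with weight
    1 / distance ** (2 / (m - 1)), so closer neighbours count for more.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")

    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    neighbours = by_distance[:k]

    exponent = 2.0 / (m - 1.0)
    weights = [(1.0 / (d ** exponent + eps), label) for d, label in neighbours]
    total = sum(w for w, _ in weights)

    memberships = {label: 0.0 for label in set(train_y)}
    for w, label in weights:
        memberships[label] += w / total
    return memberships


def fuzzy_knn_predict(train_x, train_y, query, k=3, m=2.0):
    """Predict the label with the highest membership."""
    memberships = fuzzy_knn_memberships(train_x, train_y, query, k=k, m=m)
    return max(memberships, key=memberships.get)
```

Unlike plain KNN, the output is a graded membership per class rather than a bare majority vote, which is what a grid search over `k` and `m` would tune in a setup like the one the abstract describes.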
12
Liu X, Zhang W, Zhang Q, Chen L, Zeng T, Zhang J, Min J, Tian S, Zhang H, Huang H, Wang P, Hu X, Chen L. Development and validation of a machine learning-augmented algorithm for diabetes screening in community and primary care settings: A population-based study. Front Endocrinol (Lausanne) 2022; 13:1043919. [PMID: 36518245 PMCID: PMC9742532 DOI: 10.3389/fendo.2022.1043919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/14/2022] [Accepted: 11/11/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Timely screening for diabetes is crucial to reduce its related morbidity, mortality, and socioeconomic burden. Machine learning (ML) has excellent capability to maximize predictive accuracy. We aim to develop ML-augmented models for diabetes screening in community and primary care settings. METHODS A total of 8425 participants were drawn from a population-based study conducted in Hubei, China since 2011. The dataset was split into a development set and a testing set. Seven different ML algorithms were compared to generate predictive models. Non-laboratory features were employed in the ML model for community settings, and laboratory test features were further introduced in the ML+lab models for primary care. The area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (auPR), and the average detection costs per participant of these models were compared with their counterparts based on the New China Diabetes Risk Score (NCDRS) currently recommended for diabetes screening. RESULTS The AUC and auPR of the ML model were 0·697 and 0·303 in the testing set, seemingly outperforming those of NCDRS by 10·99% and 64·67%, respectively. The average detection cost of the ML model was 12·81% lower than that of NCDRS with the same sensitivity (0·72). Moreover, the average detection cost of the ML+FPG model is the lowest among the ML+lab models and less than that of the ML model and NCDRS+FPG model. CONCLUSION The ML model and the ML+FPG model achieved higher predictive accuracy and lower detection costs than their counterparts based on NCDRS. Thus, the ML-augmented algorithm has the potential to be employed for diabetes screening in community and primary care settings.
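The AUC used here to compare the ML model with NCDRS has a direct probabilistic reading: it is the chance that a randomly chosen participant with diabetes receives a higher risk score than a randomly chosen participant without it. The sketch below computes it by exhaustive pairwise comparison. It is illustrative only, the name `roc_auc` and the 0/1 label encoding are assumptions, and the quadratic loop is meant for small examples rather than a cohort of thousands.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A score that ranks every positive above every negative gives 1.0, and a score unrelated to the outcome gives about 0.5, which is the scale on which the reported 0·697 should be read.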
Collapse
Affiliation(s)
- XiaoHuan Liu
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Weiyue Zhang
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Qiao Zhang
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Long Chen
- Department of Computer Science and Technology, Tsinghua University, Beijing, China
- TianShu Zeng
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- JiaoYue Zhang
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Jie Min
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- ShengHua Tian
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Hao Zhang
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Ping Wang
- Precision Health Program, Department of Radiology, College of Human Medicine, Michigan State University, East Lansing, MI, United States
- Xiang Hu
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- *Correspondence: LuLu Chen; Xiang Hu
- LuLu Chen
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei Provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- *Correspondence: LuLu Chen; Xiang Hu
13
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452 PMCID: PMC8686642 DOI: 10.1186/s13098-021-00767-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022]
Abstract
Diabetes mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine learning and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building predictive models for type 2 diabetes. First, previous studies are highly heterogeneous in the techniques they use, which makes it difficult to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address these challenges. It primarily followed the PRISMA methodology, supplemented by the one proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and reported performance parameters were extracted from each. Eighteen different types of models were compared, with tree-based algorithms showing the best performance. Deep neural networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful in increasing model efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
- School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- Luis Montesinos
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- José A. García-García
- Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico
14
Atteia G, Abdel Samee N, Zohair Hassan H. DFTSA-Net: Deep Feature Transfer-Based Stacked Autoencoder Network for DME Diagnosis. Entropy 2021; 23:e23101251. [PMID: 34681974 PMCID: PMC8534911 DOI: 10.3390/e23101251] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/28/2021] [Revised: 09/14/2021] [Accepted: 09/23/2021] [Indexed: 12/13/2022]
Abstract
Diabetic macular edema (DME) is the most common cause of irreversible vision loss in diabetes patients. Early diagnosis of DME is necessary for effective treatment of the disease. Visual detection of DME in retinal screening images by ophthalmologists is a time-consuming process. Recently, many computer-aided diagnosis systems have been developed to assist doctors by detecting DME automatically. In this paper, a new deep feature transfer-based stacked autoencoder neural network system is proposed for the automatic diagnosis of DME in fundus images. The proposed system integrates the power of pretrained convolutional neural networks as automatic feature extractors with the power of stacked autoencoders in feature selection and classification. Moreover, the system enables extracting a large set of features from a small input dataset using four standard pretrained deep networks: ResNet-50, SqueezeNet, Inception-v3, and GoogLeNet. The most informative features are then selected by a stacked autoencoder neural network. The stacked network is trained in a semi-supervised manner and is used for the classification of DME. It is found that the introduced system achieves a maximum classification accuracy of 96.8%, sensitivity of 97.5%, and specificity of 95.5%. The proposed system shows a superior performance over the original pretrained network classifiers and state-of-the-art findings.
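The accuracy, sensitivity, and specificity figures quoted above are all derived from the four cells of a binary confusion matrix. The following minimal sketch shows those standard definitions; the class name, function names, and counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # diseased cases correctly flagged
    fn: int  # diseased cases missed
    tn: int  # healthy cases correctly cleared
    fp: int  # healthy cases wrongly flagged


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: share of diseased cases that are detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: share of healthy cases that are cleared."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


if __name__ == "__main__":
    # Illustrative counts only, not results from the study.
    counts = ConfusionCounts(tp=45, fn=5, tn=40, fp=10)
    print(sensitivity(counts), specificity(counts), accuracy(counts))
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, because accuracy alone can look high even if most diseased cases are missed.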
Affiliation(s)
- Ghada Atteia
- Information Technology Department, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh 11461, Saudi Arabia
- Corresponding author
- Nagwan Abdel Samee
- Information Technology Department, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh 11461, Saudi Arabia
- Computer Engineering Department, Misr University for Science and Technology, Giza 12511, Egypt
- Hassan Zohair Hassan
- Department of Mechanical Engineering, College of Engineering, Alfaisal University, Takhassusi Street, P.O. Box 50927, Riyadh 11533, Saudi Arabia