1
Phan LT, Rakkiyappan R, Manavalan B. REMED-T2D: A robust ensemble learning model for early detection of type 2 diabetes using healthcare dataset. Comput Biol Med 2025; 187:109771. [PMID: 39914204 DOI: 10.1016/j.compbiomed.2025.109771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 12/31/2024] [Accepted: 01/29/2025] [Indexed: 02/21/2025]
Abstract
Early diagnosis and timely treatment of diabetes are critical for effective disease management and the prevention of complications. Undiagnosed diabetes can lead to an increased risk of several health issues. Although numerous machine learning (ML) models have been designed to detect diabetes, many exhibit unsatisfactory performance, are not publicly available, and lack validation on external datasets. To address these limitations, we have developed REMED-T2D, an advanced ensemble ML approach that enhances predictive accuracy and robustness through the integration of diverse ML algorithms. Our approach involves a rigorous data preprocessing process and systematic evaluation of 20 different algorithms, encompassing both conventional ML and deep learning for diabetes prediction. Firstly, we applied an under-sampling approach to an imbalanced Pima Indian Diabetes dataset and generated five balanced datasets. Using these datasets, we investigated various computational strategies to select the optimal model for accurate diabetes classification. Our results demonstrate that REMED-T2D outperformed state-of-the-art methods on the training dataset, with notable improvements in ACC (1.40-4.60%) and MCC (3.50-9.80%). Extensive external validations revealed that the model trained on a five-feature subset achieved ACC of 92.61% and 92.26% on the RTML1 and Pabna datasets, respectively. Moreover, a model based on a seven-feature subset improved ACC by 2.80% and MCC by 13.27% on the RTML2 dataset. These results suggest the potential of REMED-T2D to predict diabetes in Asian females. Notably, this is the first study to conduct such a comprehensive analysis using the Pima dataset, incorporating a diverse set of ML algorithms. Furthermore, we have developed a publicly accessible web server (https://balalab-skku.org/REMED-T2D/) to facilitate self-monitoring and timely medical interventions. We believe REMED-T2D will assist healthcare professionals in detecting diabetes earlier and implementing preventive measures, ultimately improving health outcomes for those at risk.
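The under-sampling step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the five balanced datasets were drawn, so the scheme below (keep every minority-class row, redraw the majority class at random for each subset) is an assumption, and the function name and toy labels are hypothetical.

```python
import random

def balanced_subsets(labels, n_subsets=5, seed=0):
    """Return n_subsets lists of row indices, each with equal class counts.

    Assumed scheme: every subset keeps all minority-class rows and draws an
    equally sized random sample (without replacement) from the majority class.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + sampled))
    return subsets

# Toy labels: 6 negatives, 3 positives (not the real Pima class counts).
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
subsets = balanced_subsets(toy_labels)
```

Each subset here contains three positives and three negatives; a model trained on each subset could then be compared or combined, which is one plausible reading of the evaluation described above.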
Affiliation(s)
- Le Thi Phan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16149, Gyeonggi-do, Republic of Korea
- Rajan Rakkiyappan
  - Department of Mathematics, Bharathiar University, Coimbatore, 641046, Tamil Nadu, India
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16149, Gyeonggi-do, Republic of Korea.
2
Shams MY, Tarek Z, Elshewey AM. A novel RFE-GRU model for diabetes classification using PIMA Indian dataset. Sci Rep 2025; 15:982. [PMID: 39762262 PMCID: PMC11704062 DOI: 10.1038/s41598-024-82420-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2024] [Accepted: 12/05/2024] [Indexed: 01/11/2025] Open
Abstract
Diabetes is a long-term condition characterized by elevated blood sugar levels. It can lead to a variety of complex disorders such as stroke, renal failure, and heart attack. Machine learning can help diagnose diabetes at an early stage, which matters because the disease cannot be cured and places a significant burden on the health-care system. The PIMA Indian diabetes dataset (PIDD) has been used for classification in several studies; it includes 768 instances and 9 features, of which eight are predictors and one is the target. First, we performed a preprocessing stage that includes mean imputation and data normalization. Afterwards, we trained various Machine Learning (ML) models on the extracted features: Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbor (KNN), Naïve Bayes (NB), Histogram Gradient Boost (HGB), and Gated Recurrent Unit (GRU). To classify the PIDD, a new model called Recursive Feature Elimination-GRU (RFE-GRU) is proposed in this paper. RFE selects the features in the training dataset that are most important for predicting the target variable, while the GRU handles the challenge of vanishing and exploding gradients on the features produced by RFE. To verify and validate the RFE-GRU model, several evaluation metrics were computed: precision, recall, F1-score, accuracy, and Area Under the Curve (AUC) reached 90.50%, 90.70%, 90.50%, 90.70%, and 0.9278, respectively. The comparative results showed that the RFE-GRU model performs better than the other classification models.
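The preprocessing stage named in this abstract, mean imputation followed by normalization, can be sketched as follows. The abstract does not state which normalization was used, so min-max scaling is an assumption here, and the helper names and sample values are hypothetical.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

glucose = [100.0, None, 140.0, 120.0]   # hypothetical values, one missing
filled = impute_mean(glucose)           # the gap becomes the observed mean
scaled = min_max_scale(filled)          # values now lie in [0, 1]
```

In practice the mean and the min/max would be computed on the training split only and then reused on the test split, so that no information from held-out rows leaks into preprocessing.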
Affiliation(s)
- Mahmoud Y Shams
  - Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
- Zahraa Tarek
  - Faculty of Computers and Information, Computer Science Department, Mansoura University, Mansoura, 35561, Egypt
- Ahmed M Elshewey
  - Department of Computer Science, Faculty of Computers and Information, Suez University, P. O. Box 43221, Suez, Egypt
3
Altamimi A, Alarfaj AA, Umer M, Alabdulqader EA, Alsubai S, Kim TH, Ashraf I. An automated approach to predict diabetic patients using KNN imputation and effective data mining techniques. BMC Med Res Methodol 2024; 24:221. [PMID: 39333904 PMCID: PMC11438170 DOI: 10.1186/s12874-024-02324-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 08/28/2024] [Indexed: 09/30/2024] Open
Abstract
Diabetes is thought to be the most common illness in underdeveloped nations. Early detection and competent medical care are crucial steps in reducing the effects of diabetes. Examining the signs associated with diabetes is one of the most effective ways to identify the condition. The problem of missing data is not well investigated in existing works. In addition, existing studies on diabetes detection lack accuracy and robustness. The datasets available for automated diabetes detection frequently contain missing information, which might negatively impact machine learning model performance. To address this problem, this work proposes an automated diabetes prediction method that achieves high accuracy and effectively manages missing values. The proposed strategy employs a stacked ensemble voting classifier model with three machine learning models and a KNN imputer to handle missing values. Using the KNN imputer, the proposed model achieves accuracy, precision, recall, F1 score, and MCC of 98.59%, 99.26%, 99.75%, 99.45%, and 99.24%, respectively. The study thoroughly compared the proposed model with seven other machine learning techniques in two scenarios, one with missing values eliminated and the other with the KNN imputer. The outcomes demonstrate the superiority of the proposed model over current state-of-the-art methods and confirm its efficacy. This work demonstrates the capability of the KNN imputer and examines the problem of missing values for diabetes detection. Medical professionals can utilize the results to improve care for diabetes patients and discover problems early.
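The idea behind KNN imputation, as used in this abstract, can be shown with a simplified sketch. This is not the authors' implementation and not a library API: it is a hypothetical helper that measures plain Euclidean distance over the features two rows both have observed, whereas production imputers typically rescale that distance to account for missing coordinates.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is Euclidean over the features both rows have observed; rows
    that lack the target feature or share no observed feature are skipped.
    """
    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed

rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [9.0, 90.0],
    [1.5, None],   # closest rows on the first feature are rows 0 and 1
]
filled = knn_impute(rows, k=2)
```

The missing entry is filled from the two rows that look most similar on the observed feature, rather than from the global column mean, which is the property that distinguishes KNN imputation from mean imputation.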
Affiliation(s)
- Abdulaziz Altamimi
  - College of Computer Science and Engineering, University of Hafr Al-Batin, Hafr Al-Batin, 39524, Saudi Arabia
- Aisha Ahmed Alarfaj
  - Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia
- Muhammad Umer
  - Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, 63100, Pakistan
- Ebtisam Abdullah Alabdulqader
  - Department of Information Technology, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Shtwai Alsubai
  - Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj, 11942, Saudi Arabia
- Tai-Hoon Kim
  - School of Electrical and Computer Engineering, Chonnam National University, 50, Daehak-ro, Yeosu-si, 59626, Republic of Korea.
- Imran Ashraf
  - Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, 38541, Republic of Korea.
4
Baqir A, Ali M, Jaffar S, Sherazi HHR, Lee M, Bashir AK, Al Dabel MM. Identifying COVID-19 survivors living with post-traumatic stress disorder through machine learning on Twitter. Sci Rep 2024; 14:18902. [PMID: 39143145 PMCID: PMC11325037 DOI: 10.1038/s41598-024-69687-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 08/07/2024] [Indexed: 08/16/2024] Open
Abstract
The COVID-19 pandemic has disrupted people's lives and caused significant economic damage around the world, but its impact on people's mental health has not been paid due attention by the research community. According to anecdotal data, the pandemic has raised serious concerns related to mental health among the masses. However, no systematic investigations have been conducted previously on mental health monitoring and, in particular, detection of post-traumatic stress disorder (PTSD). The goal of this study is to use classical machine learning approaches to classify tweets into COVID-PTSD positive or negative categories. To this end, we employed various Machine Learning (ML) classifiers, including Random Forest, Support Vector Machine, Naïve Bayes, and K-Nearest Neighbor, to separate out the psychotic difficulties associated with users' PTSD in the context of COVID-19. The ML models are trained and tested using various combinations of feature selection strategies to find the best combination. Based on our experiments on a real-world dataset, we demonstrate our model's effectiveness, with a classification accuracy of 83.29% using Support Vector Machine as the classifier and unigrams as the feature pattern.
Affiliation(s)
- Anees Baqir
  - Complex Human Behavior Laboratory, Fondazione Bruno Kessler, Trento, Italy
  - Northeastern University, London, UK
- Mubashir Ali
  - School of Computer Science, University of Birmingham, Birmingham, UK
- Mark Lee
  - School of Computer Science, University of Birmingham, Birmingham, UK
- Ali Kashif Bashir
  - Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK
  - Woxsen School of Business, Woxsen University, Hyderabad, 502 345, India
  - Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
- Maryam M Al Dabel
  - Department of Computer Science and Engineering, College of Computer Science and Engineering, University of Hafr Al Batin, Hafar Al-Batin, Saudi Arabia
5
Adelson RP, Garikipati A, Zhou Y, Ciobanu M, Tawara K, Barnes G, Singh NP, Mao Q, Das R. Machine Learning Approach with Harmonized Multinational Datasets for Enhanced Prediction of Hypothyroidism in Patients with Type 2 Diabetes. Diagnostics (Basel) 2024; 14:1152. [PMID: 38893680 PMCID: PMC11172278 DOI: 10.3390/diagnostics14111152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 05/24/2024] [Accepted: 05/29/2024] [Indexed: 06/21/2024] Open
Abstract
Type 2 diabetes (T2D) is a global health concern with increasing prevalence. Comorbid hypothyroidism (HT) exacerbates kidney, cardiac, neurological and other complications of T2D; these risks can be mitigated pharmacologically upon detecting HT. The current HT standard of care (SOC) screening in T2D is infrequent, delaying HT diagnosis and treatment. We present a first-to-date machine learning algorithm (MLA) clinical decision tool to classify patients as low vs. high risk for developing HT comorbid with T2D; the MLA was developed using readily available patient data from harmonized multinational datasets. The MLA was trained on data from NIH All of Us (AoU) and UK Biobank (UKBB) (Combined dataset) and achieved a high negative predictive value (NPV) of 0.989 and an AUROC of 0.762 in the Combined dataset, exceeding AUROCs for the models trained on AoU or UKBB alone (0.666 and 0.622, respectively), indicating that increasing dataset diversity for MLA training improves performance. This high-NPV automated tool can supplement SOC screening and rule out T2D patients with low HT risk, allowing for the prioritization of lab-based testing for at-risk patients. Conversely, an MLA output that designates a patient to be at risk of developing HT allows for tailored clinical management and thereby promotes improved patient outcomes.
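Because this abstract leans on negative predictive value (NPV) as its headline metric, a short sketch of how NPV, sensitivity, and specificity follow from confusion-matrix counts may help. The counts below are hypothetical and are not taken from the study.

```python
def npv(tn, fn):
    """Negative predictive value: share of negative predictions that are correct."""
    return tn / (tn + fn)

def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives that are cleared."""
    return tn / (tn + fp)

# Hypothetical counts for a low-prevalence condition (100 positives in 1100).
tp, fp, tn, fn = 50, 50, 950, 50

results = {
    "npv": npv(tn, fn),
    "sensitivity": sensitivity(tp, fn),
    "specificity": specificity(tn, fp),
}
```

With these counts NPV is 0.95 even though sensitivity is only 0.5. NPV rises as prevalence falls, which is why a rule-out tool can report a high NPV alongside a moderate AUROC.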
Affiliation(s)
- Qingqing Mao
  - Montera, Inc. dba Forta, 548 Market St, PMB 89605, San Francisco, CA 94104-5401, USA; (R.P.A.); (A.G.); (Y.Z.); (M.C.); (K.T.); (G.B.); (N.P.S.); (R.D.)
6
Al Sadi K, Balachandran W. Leveraging a 7-Layer Long Short-Term Memory Model for Early Detection and Prevention of Diabetes in Oman: An Innovative Approach. Bioengineering (Basel) 2024; 11:379. [PMID: 38671800 PMCID: PMC11048439 DOI: 10.3390/bioengineering11040379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 03/28/2024] [Accepted: 04/12/2024] [Indexed: 04/28/2024] Open
Abstract
This study develops a 7-layer Long Short-Term Memory (LSTM) model to enhance early diabetes detection in Oman, aligning with the theme of 'Artificial Intelligence in Healthcare'. The model focuses on addressing the increasing prevalence of Type 2 diabetes, projected to impact 23.8% of Oman's population by 2050. It employs LSTM neural networks to manage factors contributing to this rise, including obesity and genetic predispositions, and aims to bridge the gap in public health awareness and prevention. The model's performance is evaluated through various metrics. It achieves an accuracy of 99.40%, specificity and sensitivity of 100% for positive cases, a recall of 99.34% for negative cases, an F1 score of 96.24%, and an AUC score of 94.51%. These metrics indicate the model's capability in diabetes detection. The implementation of this LSTM model in Oman's healthcare system is proposed to enhance early detection and prevention of diabetes. This approach reflects an application of AI in addressing a significant health concern, with potential implications for similar healthcare challenges and diagnostic capabilities globally, representing a significant leap forward in healthcare technology in Oman.
Affiliation(s)
- Khoula Al Sadi
  - Department of Electronic and Electrical Engineering Research, Brunel University London, Uxbridge UB8 3PH, UK
  - Information Technology Department, University of Technology and Applied Sciences-Al-Mussanha, P.O. Box 13, Muladdah 314, Oman
- Wamadeva Balachandran
  - Department of Electronic and Electrical Engineering Research, Brunel University London, Uxbridge UB8 3PH, UK
7
Kharya S, Soni S, Pati A, Panigrahi A, Giri J, Qin H, Mallik S, Nayak DSK, Swarnkar T. Weighted Bayesian Belief Network for diabetics: a predictive model. Front Artif Intell 2024; 7:1357121. [PMID: 38665371 PMCID: PMC11043522 DOI: 10.3389/frai.2024.1357121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 03/27/2024] [Indexed: 04/28/2024] Open
Abstract
Diabetes is an enduring metabolic condition identified by heightened blood sugar levels stemming from insufficient production of insulin or ineffective utilization of insulin within the body. India is commonly labeled as the "diabetes capital of the world" owing to the widespread prevalence of this condition. As of September 2021, approximately 77 million adults in India were reported to be affected by diabetes, according to the International Diabetes Federation. Owing to the concealed early symptoms, numerous diabetic patients go undiagnosed, leading to delayed treatment. While Computational Intelligence approaches have been utilized to improve the prediction rate, a significant portion of these methods lack interpretability, primarily due to their inherent black box nature. Rule extraction is frequently utilized to elucidate the opaque nature inherent in machine learning algorithms. To resolve the black box nature, a method for extracting strong rules based on Weighted Bayesian Association Rule Mining is used, so that the rules extracted to diagnose a disease such as diabetes are transparent and easily analyzed by clinical experts, enhancing interpretability. The WBBN model is constructed utilizing the UCI machine learning repository, demonstrating a performance accuracy of 95.8%.
Affiliation(s)
- Shweta Kharya
  - Department of Computer Science and Engineering, Bhilai Institute of Technology, Durg, Chhattisgarh, India
- Sunita Soni
  - Department of Computer Science and Engineering, Bhilai Institute of Technology, Durg, Chhattisgarh, India
- Abhilash Pati
  - Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Amrutanshu Panigrahi
  - Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Jayant Giri
  - Department of Mechanical Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India
- Hong Qin
  - Department of Computer Science and Engineering, University of Tennessee at Chattanooga, Chattanooga, TN, United States
- Saurav Mallik
  - Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, United States
- Debasish Swapnesh Kumar Nayak
  - Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Tripti Swarnkar
  - Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
8
Perumalraja R, Felcia Logan's Deshna B, Swetha N. Statistical performance review on diagnosis of leukemia, glaucoma and diabetes mellitus using AI. Stat Med 2024; 43:1227-1237. [PMID: 38247116 DOI: 10.1002/sim.10004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 01/23/2024]
Abstract
The growth of artificial intelligence (AI) in the healthcare industry greatly improves patient outcomes by reshaping the way we diagnose, treat and monitor patients. AI-based innovations in healthcare include drug exploration, personalized medicine, clinical diagnosis investigations, robotic-assisted surgery, verified prescriptions, pregnancy care for women, radiology, and reviewed patient information analytics. However, the predictions of AI-based solutions depend mainly on the implementation of statistical algorithms and on the input data set. In this article, a statistical performance review of the various algorithms used to predict the diagnosis of leukemia, glaucoma, and diabetes mellitus is presented in terms of Accuracy, Precision, Recall and F1-Score. The review of the statistical algorithms' performance for each disease diagnosis gives a complete picture of the research efforts of the last two decades. At the end of the statistical review of each disease diagnosis, we discuss our inferences, which give new researchers future directions on the selection of AI statistical algorithms as well as input data sets.
Affiliation(s)
- Rengaraju Perumalraja
  - Department of Information Technology, Velammal College of Engineering and Technology, Madurai, India
- B Felcia Logan's Deshna
  - Department of Information Technology, Velammal College of Engineering and Technology, Madurai, India
- N Swetha
  - Department of Information Technology, Velammal College of Engineering and Technology, Madurai, India
9
Stephen BUA, Uzoewulu BC, Asuquo PM, Ozuomba S. Diabetes and hypertension MobileHealth systems: a review of general challenges and advancements. Journal of Engineering and Applied Science 2023; 70:78. [DOI: 10.1186/s44147-023-00240-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/20/2023] [Accepted: 06/14/2023] [Indexed: 01/06/2025]
Abstract
Mobile health (mHealth) systems are seeping into more and more healthcare functions, with self-management being the foremost modus operandi. However, there have been challenges. This study explores challenges with mHealth self-management of diabetes and hypertension, two of the most comorbid chronic diseases. Existing literature presents the challenges in fragments, certain subsets of the challenges at a time. Nevertheless, feedback from patients/users in extant literature depicts very variegated concerns that are also interdependent. This work pursues provision of an encyclopedic, but not redundant, view of the challenges with mHealth systems for self-management of diabetes and hypertension. Furthermore, the work identifies machine learning (ML) and self-management approaches as potential drivers of potency of diabetes and hypertension mobile health systems. The nexus between ML and diabetes and hypertension mHealth systems was found to be under-explored. For ML contributions to management of diabetes, we found that machine learning has been applied most to diabetes prediction, followed by diagnosis, with therapy a distant third. For diabetes therapy research, only physical and dietary therapy were emphasized in the reviewed literature. The four most considered performance metrics were accuracy, ROC-AUC, sensitivity, and specificity. Random forest was the best performing algorithm across all metrics, for all purposes covered in the literature. For hypertension, the areas of management to which machine learning was most often applied were, in descending order, hypertension prediction, prediction of risk factors, and prediction of prehypertension. On average, SVM was the best ML algorithm in accuracy and sensitivity, while random forest performed best in specificity and ROC-AUC.
10
Feng X, Cai Y, Xin R. Optimizing diabetes classification with a machine learning-based framework. BMC Bioinformatics 2023; 24:428. [PMID: 37957549 PMCID: PMC10644638 DOI: 10.1186/s12859-023-05467-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 09/04/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND Diabetes is a metabolic disorder usually caused by insufficient secretion of insulin from the pancreas or insensitivity of cells to insulin, resulting in long-term elevated blood sugar levels in patients. Patients usually present with frequent urination, thirst, and hunger. If left untreated, it can lead to various complications that can affect essential organs and even endanger life. Therefore, developing an intelligent diagnosis framework for diabetes is necessary. RESULT This paper proposes a machine learning-based diabetes classification framework, machine learning optimized GAN. The framework combines several methods to address the challenges encountered during the analysis: a joint mean and median filling method for handling missing values, the cap method for outlier processing, and SMOTEENN to mitigate sample imbalance. Additionally, the framework incorporates the proposed Diabetes Classification Model based on Generative Adversarial Network and employs logistic regression for detailed feature analysis. The effectiveness of the framework is evaluated using both the PIMA dataset and the diabetes dataset obtained from the GEO database. The experimental findings show that our model achieved exceptional results, including a binary classification accuracy of 96.27%, tertiary classification accuracy of 99.31%, precision, F1 score, and recall of 0.9698, and an AUC of 0.9702. CONCLUSION The experimental results show that the framework proposed in this paper can accurately classify diabetes and provide new ideas for intelligent diagnosis of diabetes.
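The "cap method" for outliers is not defined in this abstract. One common reading is to clip values at fences derived from the interquartile range (IQR); the sketch below assumes that reading, and both the 1.5 multiplier and the sample values are hypothetical.

```python
import statistics

def cap_outliers(values, whisker=1.5):
    """Clip values to [Q1 - whisker*IQR, Q3 + whisker*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [min(max(v, low), high) for v in values]

insulin = [1, 2, 3, 4, 5, 6, 7, 100]   # hypothetical readings, one outlier
capped = cap_outliers(insulin)
```

Capping keeps every row, which matters for a dataset as small as PIMA, and only pulls extreme values back to the fence. Percentile-based caps (for example the 1st and 99th percentiles) are another plausible interpretation.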
Affiliation(s)
- Xin Feng
  - School of Science, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
  - State Key Laboratory of Inorganic Synthesis and Preparative Chemistry, College of Chemistry, Jilin University, Changchun, 130012, People's Republic of China
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, 130012, People's Republic of China
- Yihuai Cai
  - School of Science, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China.
- Ruihao Xin
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China.
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, People's Republic of China.
11
Li M, Jiang Y, Zhang Y, Zhu H. Medical image analysis using deep learning algorithms. Front Public Health 2023; 11:1273253. [PMID: 38026291 PMCID: PMC10662291 DOI: 10.3389/fpubh.2023.1273253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 10/05/2023] [Indexed: 12/01/2023] Open
Abstract
In the field of medical image analysis within deep learning (DL), the importance of employing advanced DL techniques cannot be overstated. DL has achieved impressive results in various areas, making it particularly noteworthy for medical image analysis in healthcare. The integration of DL with medical image analysis enables real-time analysis of vast and intricate datasets, yielding insights that significantly enhance healthcare outcomes and operational efficiency in the industry. This extensive review of existing literature conducts a thorough examination of the most recent deep learning (DL) approaches designed to address the difficulties faced in medical healthcare, particularly focusing on the use of deep learning algorithms in medical image analysis. Classifying all the investigated papers into five different categories according to their techniques, we have assessed them according to some critical parameters. Through a systematic categorization of state-of-the-art DL techniques, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Long Short-term Memory (LSTM) models, and hybrid models, this study explores their underlying principles, advantages, limitations, methodologies, simulation environments, and datasets. Based on our results, Python was the most frequent programming language used for implementing the proposed methods in the investigated papers. Notably, the majority of the scrutinized papers were published in 2021, underscoring the contemporaneous nature of the research. Moreover, this review accentuates the forefront advancements in DL techniques and their practical applications within the realm of medical image analysis, while simultaneously addressing the challenges that hinder the widespread implementation of DL in image analysis within the medical healthcare domains. These discerned insights serve as compelling impetuses for future studies aimed at the progressive advancement of image analysis in medical healthcare research. The evaluation metrics employed across the reviewed articles encompass a broad spectrum of features, encompassing accuracy, sensitivity, specificity, F-score, robustness, computational complexity, and generalizability.
Affiliation(s)
- Mengfang Li
  - The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Yuanyuan Jiang
  - Department of Cardiovascular Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Yanzhou Zhang
  - Department of Cardiovascular Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Haisheng Zhu
  - Department of Cardiovascular Medicine, Wencheng People’s Hospital, Wencheng, China
12
Surya J, Kashyap H, Nadig RR, Raman R. Developing a Risk Stratification Model Based on Machine Learning for Targeted Screening of Diabetic Retinopathy in the Indian Population. Cureus 2023; 15:e45853. [PMID: 37881381 PMCID: PMC10595397 DOI: 10.7759/cureus.45853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/24/2023] [Indexed: 10/27/2023] Open
Abstract
OBJECTIVE This study aimed to develop a predictive risk score model based on deep learning (DL) independent of fundus photography, totally reliant on systemic data through targeted screening from a population-based study to diagnose diabetic retinopathy (DR) in the Indian population. METHODS It involved machine learning application on datasets of a cross-sectional population-based study. A total of 1425 subjects (1175 subjects with known diabetes and 250 with newly diagnosed diabetes) were included in the study. We applied five machine learning algorithms, random forest (RF), logistic regression (LR), support vector machines (SVM), artificial neural networks (ANN), and decision trees (DT), to predict diabetic retinopathy in our datasets. We incorporated a percentage split in the first experiment and randomly divided our data set into 80% as a training set and 20% as a test set. We performed a three-way data split in the second experiment to prevent overestimating predictive performance. We randomly divided our data set into 60% as a training set, 20% as a validation set, and 20% as the test set. Furthermore, we integrated five-fold cross-validation to split the percentage to evaluate our method. We judged the predictive performance based on the receiver operating characteristic (ROC) curve, the area under the curve (AUC), accuracy (Acc), sensitivity, and specificity. RESULTS The RF classifier achieved the best prediction performance with AUC, Acc, and sensitivity values of 0.91, 0.89, and 0.90, respectively, in the percentage split. Similarly, a three-way data split attained an outcome of 0.86 and 0.85 in AUC and Acc. Likewise, the five-fold cross-validation performed the best with results of 0.90, 0.97, 0.91, and 0.75 in AUC, Acc, sensitivity, and specificity, respectively. CONCLUSION Since the RF classifier achieved the best performance, we propose it to identify diabetic retinopathy for targeted screening in the general population.
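The three evaluation designs described in this abstract (an 80/20 percentage split, a 60/20/20 three-way split, and five-fold cross-validation) differ only in how row indices are partitioned. A minimal sketch of the last two follows; the helper names, the seed, and the shuffling scheme are assumptions rather than the authors' code. The row count of 1425 comes from the abstract.

```python
import random

def three_way_split(n_rows, train_pct=60, valid_pct=20, seed=0):
    """Shuffle row indices and cut them into train/validation/test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = n_rows * train_pct // 100
    n_valid = n_rows * valid_pct // 100
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])

def k_fold(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs; each row is tested once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

train_idx, valid_idx, test_idx = three_way_split(1425)  # 1425 subjects
folds = list(k_fold(1425))
```

The validation part is used for model selection and tuning, and the test part is touched once at the end, which is how a three-way split guards against the overestimated performance the abstract mentions.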
Affiliation(s)
- Janani Surya
  - Epidemiology and Biostatistics, National Institute of Epidemiology, Chennai, IND
- Himanshu Kashyap
  - Shri Bhagwan Mahavir Vitreoretinal Services, Medical Research Foundation, Sankara Nethralaya, Chennai, IND
- Ramya R Nadig
  - Shri Bhagwan Mahavir Vitreoretinal Services, Medical Research Foundation, Sankara Nethralaya, Chennai, IND
- Rajiv Raman
  - Shri Bhagwan Mahavir Vitreoretinal Services, Medical Research Foundation, Sankara Nethralaya, Chennai, IND
|
13
|
Verma N, Singh S, Prasad D. Performance analysis and comparison of Machine Learning and LoRa-based Healthcare model. Neural Comput Appl 2023; 35:12751-12761. [PMID: 37192938 PMCID: PMC9989556 DOI: 10.1007/s00521-023-08411-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 02/13/2023] [Indexed: 03/09/2023]
Abstract
Diabetes Mellitus (DM) is a widespread condition that is one of the main causes of health disasters around the world, and health monitoring is one of the sustainable development topics. Currently, the Internet of Things (IoT) and Machine Learning (ML) technologies work together to provide a reliable method of monitoring and predicting Diabetes Mellitus. In this paper, we present the performance of a model for patient real-time data collection that employs the Hybrid Enhanced Adaptive Data Rate (HEADR) algorithm for the Long-Range (LoRa) protocol of the IoT. On the Contiki Cooja simulator, the LoRa protocol's performance is measured in terms of high dissemination and dynamic data transmission range allocation. Furthermore, by employing classification methods for the detection of diabetes severity levels on acquired data via the LoRa (HEADR) protocol, Machine Learning prediction takes place. For prediction, a variety of Machine Learning classifiers are employed, and the final results are compared with the already existing models where the Random Forest and Decision Tree classifiers outperform the others in terms of precision, recall, F-measure, and receiver operating characteristic (ROC) curve in the Python programming language. We also discovered that using k-fold cross-validation on k-neighbors, Logistic regression (LR), and Gaussian Naïve Bayes (GNB) classifiers boosted the accuracy.
Affiliation(s)
- Navneet Verma
  - Computer Science and Engineering Department, DCRUST, Murthal, Sonipat, 131027 India
- Sukhdip Singh
  - Computer Science and Engineering Department, DCRUST, Murthal, Sonipat, 131027 India
- Devendra Prasad
  - Computer Science and Engineering Department, PIET, Samalkha, Panipat, 132103 India
|
14
|
Diabetes Mellitus Disease Prediction and Type Classification Involving Predictive Modeling Using Machine Learning Techniques and Classifiers. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING 2022. [DOI: 10.1155/2022/7899364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022] Open
Abstract
The Diabetes-Mellitus (DM) disease is considered a persistent ailment that is triggered by excessive sugar levels in the blood of a person. It gives rise to severe health complications when left untreated and can also give rise to related diseases such as cardiac attack, nervous damage, foot problems, liver and kidney damage, and eye problems. These problems are caused by a series of factors interrelated to one another such as age, gender, family history, BMI, and Blood Glucose. Various Machine-Learning (ML) algorithms are being used in order to predict and detect the disease to avoid further complications of health. The Diabetes prediction process can be further improvised by identifying the type a person is being affected by and the probability of the occurrence of the related diseases. In order to perform the mentioned task, two types of the dataset are used in the study, namely, PIMA and a clinical survey dataset. Various ML algorithms such as Random Forest, Light Gradient Boosting Machine, Gradient Boosting Machine, Support Vector Machine, Decision Tree, and XGBoost are being used. The performance metrics used are accuracy, precision, recall, specificity, and sensitivity. Techniques such as Data Augmentation and Sampling are used. In comparison with the research conducted previously, the paper focuses on improvisation of the accuracy with a percentage of 95.20 using the LGBM Classifier, and Diabetes is also classified as Prediabetes or Diabetes using many Classification mechanisms.
|
15
|
Predictive Analysis of Diabetes-Risk with Class Imbalance. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:3078025. [PMID: 36268149 PMCID: PMC9578843 DOI: 10.1155/2022/3078025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/29/2022]
Abstract
Diabetes type 2 (T2DM) is a common chronic disease, increasingly leading to many complications and affecting vital organs. Hyperglycemia is the main characteristic caused by insufficient insulin secretion and poses a serious risk to human health. The objective is to construct a type-2 diabetes prediction model with high classification accuracy. Advanced machine learning and predictive model techniques are utilized to achieve cutting-edge techniques for the early diagnosis of diabetes. This paper proposes an efficient performance model to predict and classify the minority class of type-2 diabetes. The impact of oversampling and undersampling approaches to reduce the effect of an unbalanced class has been compared to classification performance algorithms. Synthetic Minority Oversampling (SMOTE) and Tomek-links techniques are applied and examined. The outcomes were then compared to the original unbalanced dataset using an artificial neural network (ANN) predictive model. The model is compared with other state-of-the-art classifiers such as support vector machine (SVM), random forest (RF), and decision tree (DT). The tuned model had the best accuracy of 92.2%. The experimental findings clearly manifest the improvement in accuracy and evaluation metrics in terms of AUC and F1-measure using the SMOTE oversampling strategy rather than the baseline and undersampling schemes. The study recommends adopting dynamic hyperparameter optimization to further improve accuracy.
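The oversampling idea behind SMOTE, which this abstract reports as the most effective balancing strategy, can be illustrated with a simplified sketch. It creates each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbours. The function name `smote_sketch`, the default `k=3`, and the seed are assumptions for illustration; this is not the paper's implementation, and it omits the Tomek-links cleaning step.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Simplified SMOTE-style interpolation; assumes at least two
    minority samples with numeric features of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies rather than duplicating existing rows.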
|
16
|
Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation. ADVANCES IN HUMAN-COMPUTER INTERACTION 2022. [DOI: 10.1155/2022/9220560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
The technical improvements in healthcare sector today have given rise to many new inventions in the field of artificial intelligence. Patterns for disease identification are carried out, and the onset of prediction of many diseases is detected. Diseases include diabetes mellitus disease, fatal heart diseases, and symptomatic cancer. There are many algorithms that have played a critical role in the prediction of diseases. This paper proposes an ML based approach for diabetes mellitus disease prediction. For diabetes prediction, many ML algorithms are compared and used in the proposed work, and finally the three ML classifiers providing the highest accuracy are determined: RF, GBM, and LGBM. The accuracy of prediction is obtained using two types of datasets. They are Pima Indians dataset and a curated dataset. The ML classifiers LGBM, GB, and RF are used to build a predictive model, and the accuracy of each classifier is noted and compared. In addition to the generalized prediction mechanism, the data augmentation technique is also used, and the final accuracy of prediction is obtained for the classifiers LGBM, GB, and RF. A comparative study and demonstration between augmentation and non-augmentation are also discussed for the two datasets used in order to further improve the performance accuracy for predicting diabetes disease.
|
17
|
Ragab M, AL-Ghamdi ASALM, Fakieh B, Choudhry H, Mansour RF, Koundal D. Prediction of Diabetes through Retinal Images Using Deep Neural Network. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:7887908. [PMID: 35694596 PMCID: PMC9187442 DOI: 10.1155/2022/7887908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Revised: 04/19/2022] [Accepted: 05/17/2022] [Indexed: 12/15/2022]
Abstract
Microvascular problems of diabetes, such as diabetic retinopathy and macular edema, can be seen in the eye's retina, and the retinal images are being used to screen for and diagnose the illness manually. Using deep learning to automate this time-consuming process might be quite beneficial. In this paper, a deep neural network, i.e., convolutional neural network, has been proposed for predicting diabetes through retinal images. Before applying the deep neural network, the dataset is preprocessed and normalised for classification. The deep neural network is constructed using 7 layers, 5 kernels, and the ReLU activation function, and MaxPooling is implemented to combine important features. Finally, the model is implemented to classify whether the retinal image belongs to a diabetic or nondiabetic class. The parameters used for evaluating the model are accuracy, precision, recall, and F1 score. The implemented model has achieved a training accuracy of more than 95%, which is much better than the other state-of-the-art algorithms.
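The two building blocks named in this abstract, ReLU activation and max pooling, can be shown on a small feature map. The sketch below is a plain-Python illustration only: the 2x2 non-overlapping pooling window is an assumption (the abstract does not state the pool size), and the function names are hypothetical.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling.

    Assumes the feature map has an even number of rows and columns.
    """
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # Keep only the strongest activation in each 2x2 window.
            window = (feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled
```

Pooling halves each spatial dimension while retaining the largest response per window, which is the sense in which it "combines important features".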
Affiliation(s)
- Mahmoud Ragab
  - Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
  - Centre for Artificial Intelligence in Precision Medicines, King Abdulaziz University, Jeddah 21589, Saudi Arabia
  - Mathematics Department, Faculty of Science, Al-Azhar University, Naser City 11884, Cairo, Egypt
- Abdullah S. AL-Malaise AL-Ghamdi
  - Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
  - Information Systems Department, HECI School, Dar Alhekma University, Jeddah, Saudi Arabia
  - Center of Excellence in Smart Environment Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Bahjat Fakieh
  - Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Hani Choudhry
  - Centre for Artificial Intelligence in Precision Medicines, King Abdulaziz University, Jeddah 21589, Saudi Arabia
  - Biochemistry Department, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Romany F. Mansour
  - Department of Mathematics, Faculty of Science, New Valley University, El-Kharga 72511, Egypt
- Deepika Koundal
  - School of Computer Science, University of Petroleum & Energy Studies, Dehradun, India
|
18
|
Ahamed BS, Arya MS, Nancy V AO. Prediction of Type-2 Diabetes Mellitus Disease Using Machine Learning Classifiers and Techniques. FRONTIERS IN COMPUTER SCIENCE 2022. [DOI: 10.3389/fcomp.2022.835242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
The technological advancements in today's healthcare sector have given rise to many innovations for disease prediction. Diabetes mellitus is one of the diseases that has been growing rapidly among people of different age groups; there are various reasons and causes involved. All these reasons are considered as different attributes for this study. To predict type-2 diabetes mellitus disease, various machine learning algorithms can be used. The objective of using the algorithm is to construct a predictive model to critically predict whether a person is affected by diabetes. The classifiers taken are logistic regression, XGBoost, gradient boosting, decision trees, ExtraTrees, random forest, and light gradient boosting machine (LGBM). The dataset used is PIMA Indian Dataset sourced from UC Irvine Repository. The performance of these algorithms is compared in reference to the accuracy obtained. The results obtained from these classifiers show that the LGBM classifier has the highest accuracy of 95.20% in comparison with the other algorithms.
|
19
|
Rabie O, Alghazzawi D, Asghar J, Saddozai FK, Asghar MZ. A Decision Support System for Diagnosing Diabetes Using Deep Neural Network. Front Public Health 2022; 10:861062. [PMID: 35372240 PMCID: PMC8970706 DOI: 10.3389/fpubh.2022.861062] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/24/2022] [Accepted: 02/07/2022] [Indexed: 01/16/2023] Open
Abstract
Background and Objective According to the WHO, diabetes mellitus is a long-term condition marked by high blood sugar levels. The consequences might be far-reaching. According to current increases in mortality, diabetes has risen to number 10 among the leading causes of mortality worldwide. When used to predict diabetes using unbalanced datasets from testing, machine learning (ML) classifiers and established approaches for encoding categorical data have exhibited a broad variety of surprising outcomes. Early studies also made use of an artificial neural network to extract features without obtaining a grasp of the sequence information. Methods This study offers a deep learning-based decision support system (DSS), utilizing bidirectional long short-term memory (BiLSTM), to accurately predict diabetic illness from patient data. In order to predict diabetes, the BiLSTM hybrid model was used after balancing the data set. Results Unlike earlier studies, this proposed model's trial findings were promising, with an accuracy of 93.07%, 93% precision, 92% recall, and a 92% F1-score. Conclusions Using a BiLSTM model for classification outperforms current approaches in the diabetes detection domain.
Affiliation(s)
- Osama Rabie
  - Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Daniyal Alghazzawi
  - Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Junaid Asghar
  - Faculty of Pharmacy, Gomal University, Dera Ismail Khan, Pakistan
- Furqan Khan Saddozai
  - Institute of Computing and Information Technology, Gomal University, Dera Ismail Khan, Pakistan
- Muhammad Zubair Asghar
  - Institute of Computing and Information Technology, Gomal University, Dera Ismail Khan, Pakistan
  - *Correspondence: Muhammad Zubair Asghar
|
20
|
R. S, M. S, Hasan MK, Saeed RA, Alsuhibany SA, Abdel-Khalek S. An Empirical Model to Predict the Diabetic Positive Using Stacked Ensemble Approach. Front Public Health 2022; 9:792124. [PMID: 35127623 PMCID: PMC8814448 DOI: 10.3389/fpubh.2021.792124] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/09/2021] [Accepted: 11/30/2021] [Indexed: 01/18/2023] Open
Abstract
Today, disease detection automation is widespread in healthcare systems. The diabetic disease is a significant problem that has spread widely all over the world. It is a genetic disease that causes trouble for human life throughout the lifespan. Every year the number of people with diabetes rises by millions, and this affects children too. The disease identification involves manual checking so far, and automation is a current trend in the medical field. Existing methods use a single algorithm for the prediction of diabetes. For complex problems, a single model is not enough because it may not be suitable for the input data or the parameters used in the approach. To solve complex problems, multiple algorithms are used. These multiple algorithms follow a homogeneous model or heterogeneous model. In the homogeneous model, the same algorithm is used multiple times. In the heterogeneous model, different algorithms are used. This paper adopts a heterogeneous ensemble model called the stacked ensemble model to predict whether a person is diabetes positive or negative. This stacked ensemble model is advantageous in the prediction. Compared to other existing models such as logistic regression (72%), Naïve Bayes (74.4%), and LDA (81%), the proposed stacked ensemble model has achieved 93.1% accuracy in predicting blood sugar disease.
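The heterogeneous stacking idea described in this abstract can be sketched in a few lines: different base learners are trained, their out-of-fold predictions become the inputs to a meta-learner, and the meta-learner makes the final call. Everything in the example is an illustrative assumption rather than the paper's method: the two toy base learners (a one-feature threshold stump and a nearest-centroid rule), the lookup-table meta-learner, and all function names.

```python
import random
from collections import Counter, defaultdict

def fit_stump(X, y, feature):
    """Toy base learner A: threshold one feature midway between class means."""
    pos = [x[feature] for x, t in zip(X, y) if t == 1]
    neg = [x[feature] for x, t in zip(X, y) if t == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x[feature] - threshold) > 0 else 0

def fit_centroid(X, y):
    """Toy base learner B: predict the class with the nearer centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return lambda x: 1 if dist2(x, c1) < dist2(x, c0) else 0

def fit_stack(X, y, k=5, seed=0):
    """Stack the two base learners; assumes every training fold has both classes."""
    base_fitters = [lambda X, y: fit_stump(X, y, 0), fit_centroid]
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    # Out-of-fold base predictions become the meta-learner's training rows.
    meta_rows = {}
    for fold in folds:
        held = set(fold)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        models = [fit(X_tr, y_tr) for fit in base_fitters]
        for i in fold:
            meta_rows[i] = tuple(m(X[i]) for m in models)
    # Meta-learner: majority label for each pattern of base predictions.
    votes = defaultdict(Counter)
    for i, row in meta_rows.items():
        votes[row][y[i]] += 1
    table = {row: c.most_common(1)[0][0] for row, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]
    # Refit the base learners on all data for use at prediction time.
    final_models = [fit(X, y) for fit in base_fitters]
    return lambda x: table.get(tuple(m(x) for m in final_models), default)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, is what keeps the stacked model from simply memorising the base learners' training-set behaviour.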
Affiliation(s)
- Sivashankari R.
  - School of Information Technology and Engineering, Vellore Institute of Technology (VIT), Vellore, India
- Sudha M.
  - School of Information Technology and Engineering, Vellore Institute of Technology (VIT), Vellore, India
- Mohammad Kamrul Hasan
  - Center for Cyber Security, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
  - *Correspondence: Mohammad Kamrul Hasan
- Rashid A. Saeed
  - Department of Computer Engineering, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Suliman A. Alsuhibany
  - Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Sayed Abdel-Khalek
  - Mathematics and Statistics Department, College of Science, Taif University, Taif, Saudi Arabia
  - Mathematics Department, Sohag University, Sohag, Egypt
|