1
Hasan M, Yasmin F, Hassan MM, Yu X, Yeasmin S, Joshi H, Islam SMS. Enhancing stroke disease classification through machine learning models via a novel voting system by feature selection techniques. PLoS One 2025; 20:e0312914. [PMID: 39787105 PMCID: PMC11717207 DOI: 10.1371/journal.pone.0312914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Accepted: 10/16/2024] [Indexed: 01/12/2025] Open
Abstract
Heart disease remains a leading cause of mortality and morbidity worldwide, necessitating the development of accurate and reliable predictive models to facilitate early detection and intervention. Although state-of-the-art work has applied various machine learning approaches to predicting heart disease, these approaches have not achieved remarkable accuracy. In response to this need, we applied nine machine learning algorithms (XGBoost, logistic regression, decision tree, random forest, k-nearest neighbors (KNN), support vector machine (SVM), Gaussian naïve Bayes (NB Gaussian), adaptive boosting, and linear regression) to predict heart disease based on a range of physiological indicators. Our approach involved feature selection techniques to identify the most relevant predictors, aimed at refining the models to enhance both performance and interpretability. The models were trained with grid search hyperparameter tuning and cross-validation to minimize overfitting. Additionally, we developed a novel voting system with feature selection techniques to advance heart disease classification. We evaluated the models using key performance metrics including accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC). Among the models, XGBoost demonstrated exceptional performance, achieving 99% accuracy, precision, and F1-score, 98% recall, and 100% ROC AUC. This study offers a promising approach to early heart disease diagnosis and preventive healthcare.
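The abstract does not say how its voting system combines the individual models. As a rough illustration only, the sketch below assumes plain hard (majority) voting over class labels. The function name `majority_vote` and the three model outputs are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    `predictions` is a list with one entry per model; each entry is a
    list of predicted labels, one per sample. This is plain hard voting,
    an assumed stand-in for the paper's unspecified voting scheme.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")
    combined = []
    for sample_votes in zip(*predictions):
        # most_common(1) yields the label with the highest vote count;
        # on a tie, the label seen first among the models wins.
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three classifiers on four patients (1 = disease).
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
ensemble = majority_vote([model_a, model_b, model_c])
```

Using an odd number of voters avoids ties in the binary case. A weighted or probability-based (soft) vote would need per-model confidence scores, which the abstract does not mention.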
Affiliation(s)
- Mahade Hasan
- School of Software, Nanjing University of Information Science and Technology, Nanjing, China
- Farhana Yasmin
- Department of Computer Science and Technology, Nanjing University of Information Science and Technology, Nanjing, China
- Md. Mehedi Hassan
- Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh
- Xue Yu
- School of Software, Nanjing University of Information Science and Technology, Nanjing, China
- Soniya Yeasmin
- Department of Computer Science and Engineering, North Western University, Khulna, Bangladesh
- Herat Joshi
- Great River Health Systems, Burlington, IA, United States of America

2
Čepová L, Elangovan M, Ramesh JVN, Chohan MK, Verma A, Mohammad F. Improving privacy-preserving multi-faceted long short-term memory for accurate evaluation of encrypted time-series MRI images in heart disease. Sci Rep 2024; 14:20218. [PMID: 39215022 PMCID: PMC11364645 DOI: 10.1038/s41598-024-70593-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024] Open
Abstract
In therapeutic diagnostics, early diagnosis and monitoring of heart disease depend on fast processing of time-series MRI data. Robust encryption techniques are necessary to guarantee patient confidentiality. While deep learning (DL) algorithms have improved medical imaging, privacy and performance are still hard to balance. In this study, a novel approach for analyzing homomorphically encrypted (HE) time-series MRI data is introduced: the Multi-Faceted Long Short-Term Memory (MF-LSTM). This method includes privacy protection. The MF-LSTM architecture protects patient privacy while accurately categorizing and forecasting cardiac disease, with accuracy (97.5%), precision (96.5%), recall (98.3%), and F1-score (97.4%). While segmentation methods help to improve interpretability by identifying important regions in encrypted MRI images, Generalized Histogram Equalization (GHE) improves image quality. Extensive testing on a selected dataset of encrypted time-series MRI images demonstrates the method's stability and efficacy, outperforming previous approaches. The findings show that the suggested technique can decode medical images to expose visual representations as well as sequential movement while protecting privacy and providing accurate medical image evaluation.
Affiliation(s)
- Lenka Čepová
- Department of Machining, Assembly and Engineering Metrology, Faculty of Mechanical Engineering, VSB-Technical University of Ostrava, 70800, Ostrava, Czech Republic
- Muniyandy Elangovan
- Department of Biosciences, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, 602 105, India
- Applied Science Research Center, Applied Science Private University, Amman, Jordan
- Janjhyam Venkata Naga Ramesh
- Department of CSE, Graphic Era Hill University, Dehradun, 248002, India
- Department of CSE, Graphic Era Deemed To Be University, Dehradun, Uttarakhand, 248002, India
- Mandeep Kaur Chohan
- Department of Computer Science and Engineering, Faculty of Engineering and Technology, Jain (Deemed-to-Be) University, Bengaluru, Karnataka, India
- Department of Computer Science and Engineering, Vivekananda Global University, Jaipur, India
- Amit Verma
- University Centre for Research and Development, Chandigarh University, Gharuan, Mohali, Punjab, India
- Faruq Mohammad
- Department of Chemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Kingdom of Saudi Arabia

3
Scala A, Trunfio TA, Improta G. The classification algorithms to support the management of the patient with femur fracture. BMC Med Res Methodol 2024; 24:150. [PMID: 39014322 PMCID: PMC11251118 DOI: 10.1186/s12874-024-02276-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 07/05/2024] [Indexed: 07/18/2024] Open
Abstract
Effectiveness in health care is a specific characteristic of each intervention and outcome evaluated. Especially with regard to surgical interventions, organization, structure and processes play a key role in determining this parameter. In addition, health care services by definition operate in a context of limited resources, so rationalization of service organization becomes the primary goal for health care management. This aspect becomes even more relevant for high-volume surgical services. Therefore, data analysis could play a significant role in supporting and optimizing the management of patients undergoing surgical procedures. To this end, this study used different classification algorithms to characterize the process of patients undergoing surgery for a femoral neck fracture. The models showed significant accuracy with values of 81%, and parameters such as Anaemia and Gender proved to be determining risk factors for the patient's length of stay. The predictive power of the implemented model is assessed and discussed in view of its capability to support the management and optimisation of the hospitalisation process for femoral neck fracture, and is compared with different models in order to identify the most promising algorithms. In the end, the support of artificial intelligence algorithms lays the basis for building more accurate decision-support tools for healthcare practitioners.
Affiliation(s)
- Arianna Scala
- Department of Public Health, University of Naples "Federico II", Naples, Italy
- Teresa Angela Trunfio
- Department of Advanced Biomedical Sciences, University of Naples "Federico II", Naples, Italy
- Giovanni Improta
- Department of Public Health, University of Naples "Federico II", Naples, Italy
- Interdepartmental Research Center on Management and Innovation in Healthcare, University of Naples "Federico II", Naples, Italy

4
Revathi T, Balasubramaniam S, Sureshkumar V, Dhanasekaran S. An Improved Long Short-Term Memory Algorithm for Cardiovascular Disease Prediction. Diagnostics (Basel) 2024; 14:239. [PMID: 38337755 PMCID: PMC10855367 DOI: 10.3390/diagnostics14030239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 01/17/2024] [Accepted: 01/21/2024] [Indexed: 02/12/2024] Open
Abstract
Cardiovascular diseases, prevalent as leading health concerns, demand early diagnosis for effective risk prevention. Despite numerous diagnostic models, challenges persist in network configuration and performance degradation, impacting model accuracy. In response, this paper introduces the Optimally Configured and Improved Long Short-Term Memory (OCI-LSTM) model as a robust solution. Leveraging the Salp Swarm Algorithm, irrelevant features are systematically eliminated, and the Genetic Algorithm is employed to optimize the LSTM's network configuration. Validation metrics, including the accuracy, sensitivity, specificity, and F1 score, affirm the model's efficacy. Comparative analysis with a Deep Neural Network and Deep Belief Network establishes the OCI-LSTM's superiority, showcasing a notable accuracy increase of 97.11%. These advancements position the OCI-LSTM as a promising model for accurate and efficient early diagnosis of cardiovascular diseases. Future research could explore real-world implementation and further refinement for seamless integration into clinical practice.
Affiliation(s)
- T.K. Revathi
- Department of Computer Science and Engineering, Sona College of Technology, Salem 636005, India
- Vidhushavarshini Sureshkumar
- Department of Computer Science and Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Vadapalani Campus, Chennai 600026, India

5
Mansoor C, Chettri SK, Naleer H. Development of an efficient novel method for coronary artery disease prediction using machine learning and deep learning techniques. Technol Health Care 2024; 32:4545-4569. [PMID: 39031414 PMCID: PMC11613076 DOI: 10.3233/thc-240740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Accepted: 05/22/2024] [Indexed: 07/22/2024]
Abstract
BACKGROUND Heart disease is a severe health issue that results in high fatality rates worldwide. Identifying cardiovascular diseases such as coronary artery disease (CAD) and heart attacks through repetitive clinical data analysis is a significant task. Detecting heart disease in its early stages can save lives. The most lethal cardiovascular condition is CAD, which develops over time due to plaque buildup in coronary arteries, causing incomplete blood flow obstruction. Machine learning (ML) is increasingly used in the medical sector to detect CAD.
OBJECTIVE The primary aim of this work is to deliver a state-of-the-art approach to enhancing CAD prediction accuracy by using a deep learning (DL) algorithm in a classification context.
METHODS A unique ML technique is proposed in this study to predict CAD accurately using a DL algorithm in a classification context. An ensemble voting classification model is developed based on various methods such as Naïve Bayes (NB), Logistic Regression (LR), Decision Tree (DT), XGBoost, Random Forest (RF), Convolutional Neural Network (CNN), Support Vector Machine (SVM), K Nearest Neighbor (KNN), Bidirectional LSTM and Long Short-Term Memory (LSTM). The performance of the ensemble models and a novel model are compared in this study. The Alizadeh Sani dataset, which consists of a random sample of 216 cases with CAD, is used in this study. The Synthetic Minority Over-sampling Technique (SMOTE) is used to address the issue of imbalanced datasets, and the Chi-square test is used for feature selection optimization. Performance is assessed using various assessment methodologies, such as the confusion matrix, accuracy, recall, precision, F1-score, and AUC-ROC.
RESULTS When a novel algorithm achieves the highest accuracy relative to other algorithms, it demonstrates its effectiveness in several ways, including superior performance, robustness, generalization capability, efficiency, innovative approaches, and benchmarking against baselines. These characteristics collectively contribute to establishing the novel algorithm as a promising solution for addressing the target problem in machine learning and related fields.
CONCLUSION Implementing the novel model in this study significantly improved performance, achieving a prediction accuracy rate of 92% in the detection of CAD. These findings are competitive and on par with the top outcomes among other methods.
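The evaluation metrics named in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions in plain Python. The function name `binary_metrics` and the label vectors are hypothetical illustration data, not results from the study.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight cases (1 = CAD, 0 = no CAD).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Accuracy alone can mislead on imbalanced data, which is one reason to report precision, recall, and F1-score alongside it.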
Affiliation(s)
- C.M.M. Mansoor
- Assam Don Bosco University, Guwahati, India
- South Eastern University of Sri Lanka, Oluvil, Sri Lanka
- Sarat Kumar Chettri
- Department of Computer Applications, Assam Don Bosco University, Guwahati, India
- H.M.M. Naleer
- Department of Computer Science, South Eastern University of Sri Lanka, Oluvil, Sri Lanka

6
Mandal A, Pradhan S. Predict2Protect: Machine Learning Web Application in Early Detection of Heart Disease. Cureus 2023; 15:e49305. [PMID: 38024045 PMCID: PMC10666956 DOI: 10.7759/cureus.49305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/23/2023] [Indexed: 12/01/2023] Open
Abstract
Across the world, there are few universal scenarios, but the pain of losing a loved one to heart disease is an exception and a reality shared by millions every year. Heart disease is the greatest killer in society today, and one prevalent root of this issue is untimely diagnosis, often caused by unsustainable costs and lack of accessible healthcare for underserved populations. Recognizing these disparities, the goal of this project was to create an easily available application and interface for all that accurately indicates one's risk of heart disease. To address this, a machine learning model, Predict2Protect, was built in Python. An open-source dataset comprising 1025 patients of diverse backgrounds was scaled, adjusted to include inquiries answerable by patients, and split into 75% for training, 15% for validation, and 25% for testing. Four models were tested with the hypothesis that the RandomForestClassifier would have the highest validity. This was not supported, as the DecisionTree model had 100% accuracy on training data and 95% on test data. Through the application software Streamlit, this program was processed into a web application that is now found in browser extensions. The application reports the risk of one having heart disease with 95% accuracy and describes the risk percentage of developing heart disease within the next year. With a simple interface and high accuracy, Predict2Protect aims to provide a view into one's health with the goals of accessible heart disease prediction and early treatment for patients around the world.
Affiliation(s)
- Ankita Mandal
- Center for Medical Sciences, Mills E. Godwin High School, Richmond, USA
- Soma Pradhan
- Obstetrics and Gynecology, Bon Secours St. Mary's Hospital, Richmond, USA

7
Mirjalili SR, Soltani S, Heidari Meybodi Z, Marques-Vidal P, Kraemer A, Sarebanhassanabadi M. An innovative model for predicting coronary heart disease using triglyceride-glucose index: a machine learning-based cohort study. Cardiovasc Diabetol 2023; 22:200. [PMID: 37542255 PMCID: PMC10403891 DOI: 10.1186/s12933-023-01939-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/01/2023] [Accepted: 07/24/2023] [Indexed: 08/06/2023] Open
Abstract
BACKGROUND Various predictive models have been developed for predicting the incidence of coronary heart disease (CHD), but none of them has had optimal predictive value. Although these models consider diabetes as an important CHD risk factor, they do not consider insulin resistance or triglyceride (TG). The unsatisfactory performance of these prediction models may be attributed to the omission of these factors despite their proven effects on CHD. We decided to modify standard CHD predictive models through machine learning to determine whether the triglyceride-glucose index (TyG-index, a logarithmized combination of fasting blood sugar (FBS) and TG that demonstrates insulin resistance) functions better than diabetes as a CHD predictor.
METHODS Two thousand participants from a community-based Iranian population, aged 20-74 years, were investigated with a mean follow-up of 9.9 years (range: 7.6-12.2). The association between the TyG-index and CHD was investigated using multivariate Cox proportional hazard models. By selecting common components of previously validated CHD risk scores, we developed machine learning models for predicting CHD. The TyG-index was substituted for diabetes in CHD prediction models. All components of machine learning models were explained in terms of how they affect CHD prediction. CHD-predicting TyG-index cut-off points were calculated.
RESULTS The incidence of CHD was 14.5%. Compared to the lowest quartile of the TyG-index, the fourth quartile had a fully adjusted hazard ratio of 2.32 (confidence interval [CI] 1.16-4.68, p-trend 0.04). A TyG-index > 8.42 had the highest negative predictive value for CHD. The TyG-index-based support vector machine (SVM) performed significantly better than diabetes-based SVM for predicting CHD. The TyG-index was not only more important than diabetes in predicting CHD; it was the most important factor after age in machine learning models.
CONCLUSION We recommend using the TyG-index in clinical practice and predictive models to identify individuals at risk of developing CHD and to aid in its prevention.
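The abstract describes the TyG-index only as a logarithmized combination of FBS and TG. The sketch below assumes the commonly used definition, ln(fasting TG × fasting glucose / 2) with both values in mg/dL. The abstract does not state this formula, so it may differ from the study's exact computation. The function name `tyg_index` and the input values are hypothetical.

```python
import math


def tyg_index(triglycerides_mg_dl, fasting_glucose_mg_dl):
    """Triglyceride-glucose index under the assumed common definition.

    TyG = ln(TG [mg/dL] * FBS [mg/dL] / 2). This formula is an assumption
    taken from general usage, not from the abstract above.
    """
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("both measurements must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2)


# Hypothetical fasting values: TG = 150 mg/dL, FBS = 100 mg/dL.
example_tyg = tyg_index(150, 100)  # ln(7500), roughly 8.92
```

Because the index is a logarithm of a product, doubling either measurement raises it by the same fixed amount, ln 2 (about 0.69).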
Affiliation(s)
- Seyed Reza Mirjalili
- Yazd Cardiovascular Research Center, Non-Communicable Diseases Research Institute, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Sepideh Soltani
- Yazd Cardiovascular Research Center, Non-Communicable Diseases Research Institute, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Zahra Heidari Meybodi
- Yazd Cardiovascular Research Center, Non-Communicable Diseases Research Institute, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Pedro Marques-Vidal
- Department of Internal Medicine, BH10-642, Rue du Bugnon 46, CH-1011, Lausanne, Switzerland
- Alexander Kraemer
- Department of Health Sciences, Bielefeld University, Bielefeld, Germany
- Mohammadtaghi Sarebanhassanabadi
- Yazd Cardiovascular Research Center, Non-Communicable Diseases Research Institute, Shahid Sadoughi University of Medical Sciences, Yazd, Iran