1
|
Saleh H, Amer E, Abuhmed T, Ali A, Al-Fuqaha A, El-Sappagh S. Computer aided progression detection model based on optimized deep LSTM ensemble model and the fusion of multivariate time series data. Sci Rep 2023; 13:16336. [PMID: 37770490 PMCID: PMC10539296 DOI: 10.1038/s41598-023-42796-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 09/14/2023] [Indexed: 09/30/2023] Open
Abstract
Alzheimer's disease (AD) is the most common form of dementia. Early and accurate detection of AD is crucial for planning disease-modifying therapies that could prevent or delay conversion to the severe stages of the disease. Because AD is a chronic disease, a patient's multivariate time series data, including neuroimaging, genetics, cognitive scores, and neuropsychological battery results, provide a complete profile of the patient's status. These data have been used to build machine learning and deep learning (DL) models for the early detection of the disease. However, these models still have limited performance and are not stable enough to be trusted in real medical settings. The literature shows that DL models outperform classical machine learning models, and that ensemble learning achieves better results than standalone models. This study proposes a novel deep stacking framework that combines multiple DL models to accurately predict AD at an early stage. The study uses long short-term memory (LSTM) models as base models over patients' multivariate time series data to learn deep longitudinal features. Each base LSTM classifier has been optimized with the Bayesian optimizer and trained on different feature sets. As a result, the final optimized ensemble model employs heterogeneous base models that are trained on heterogeneous data. The performance of the resulting ensemble model has been explored using a cohort of 685 patients from the University of Washington's National Alzheimer's Coordinating Center dataset. Compared with the classical machine learning models and the base LSTM classifiers, the proposed ensemble model achieves the highest testing results (i.e., 82.02, 82.25, 82.02, and 82.12 for accuracy, precision, recall, and F1-score, respectively). The resulting model improves on the performance reported in the state-of-the-art literature, and it could be used to build an accurate clinical decision support tool that assists domain experts in detecting AD progression.
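The stacking step described in this abstract can be illustrated with a minimal sketch. It assumes that each base model has already produced a probability per patient; the toy probabilities below are hypothetical stand-ins for the outputs of the optimized LSTM base models. The abstract does not name the meta-learner, so the logistic regression used here, along with the names `stack_features`, `train_meta_learner`, and `predict`, is an assumption made for illustration only.

```python
import math

def stack_features(base_probs):
    """Turn per-model probability lists into one meta-feature row per patient."""
    return [list(row) for row in zip(*base_probs)]

def train_meta_learner(X, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner with plain batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, X):
    """Return a 0/1 label per row from the sign of the linear score."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]

# Hypothetical outputs of three base models for six patients.
model_a = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]
model_b = [0.8, 0.6, 0.4, 0.1, 0.9, 0.3]
model_c = [0.7, 0.9, 0.2, 0.3, 0.6, 0.2]
labels = [1, 1, 0, 0, 1, 0]

meta_X = stack_features([model_a, model_b, model_c])
weights, bias = train_meta_learner(meta_X, labels)
stacked_predictions = predict(weights, bias, meta_X)
```

In practice the meta-features are usually built from out-of-fold predictions so that the meta-learner does not see outputs the base models produced on their own training data; this sketch skips that step for brevity.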
Affiliation(s)
- Hager Saleh
- Faculty of Computers and Artificial Intelligence, South Valley University, Hurghada, Egypt
- Eslam Amer
- Communications and Information Technology, The Institute of Electronics, Queen's University of Belfast, Belfast, UK
- Tamer Abuhmed
- Information Laboratory (InfoLab), College of Computing and Informatics, Sungkyunkwan University, Seoul, Suwon, 16419, South Korea.
- Amjad Ali
- Information and Computing Technology (ICT) Division, College of Science and Engineering (CSE), Hamad Bin Khalifa University, Doha, Qatar
- Ala Al-Fuqaha
- Information and Computing Technology (ICT) Division, College of Science and Engineering (CSE), Hamad Bin Khalifa University, Doha, Qatar
- Shaker El-Sappagh
- Information Laboratory (InfoLab), College of Computing and Informatics, Sungkyunkwan University, Seoul, Suwon, 16419, South Korea.
- Faculty of Computer Science and Engineering, Galala University, Suez, 435611, Egypt.
- Faculty of Computers and Artificial Intelligence, Benha University, Banha, 13518, Egypt.
| |
|
2
|
Junaid M, Ali S, Eid F, El-Sappagh S, Abuhmed T. Explainable machine learning models based on multimodal time-series data for the early detection of Parkinson's disease. Computer Methods and Programs in Biomedicine 2023; 234:107495. [PMID: 37003039 DOI: 10.1016/j.cmpb.2023.107495] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/20/2022] [Revised: 02/23/2023] [Accepted: 03/17/2023] [Indexed: 06/19/2023]
Abstract
BACKGROUND AND OBJECTIVES Parkinson's disease (PD) is a devastating chronic neurological condition. Machine learning (ML) techniques have been used for the early prediction of PD progression. Fusing heterogeneous data modalities has been shown to improve the performance of ML models, and fusing time series data supports tracking the disease over time. In addition, the trustworthiness of the resulting models is improved by adding model explainability features. The literature on PD has not sufficiently explored these three points. METHODS In this work, we propose an ML pipeline for predicting the progression of PD that is both accurate and explainable. We explore the fusion of different combinations of five time series modalities from the Parkinson's Progression Markers Initiative (PPMI) real-world dataset, including patient characteristics, biosamples, medication history, motor, and non-motor function data. Each patient has six visits. The problem has been formulated in two ways: ❶ a three-class progression prediction with 953 patients in each time series modality, and ❷ a four-class progression prediction with 1,060 patients in each time series modality. Statistical features of these six visits were calculated from each modality, and diverse feature selection methods were applied to select the most informative feature sets. The extracted features were used to train a set of well-known ML models, including support vector machines (SVM), random forests (RF), extra tree classifier (ETC), light gradient boosting machines (LGBM), and stochastic gradient descent (SGD). We examined a number of data-balancing strategies in the pipeline with different combinations of modalities. The ML models have been optimized using the Bayesian optimizer. A comprehensive evaluation of the various ML methods has been conducted, and the best models have been extended to provide different explainability features.
RESULTS We compare the performance of the ML models before and after optimization, and with and without feature selection. In the three-class experiment with various modality fusions, the LGBM model produced the most accurate results, with a 10-fold cross-validation (10-CV) accuracy of 90.73% using the non-motor function modality. In the four-class experiment with various modality fusions, RF produced the best results, with a 10-CV accuracy of 94.57% using the non-motor modality. With the fused dataset of non-motor and motor function modalities, the LGBM model outperformed the other ML models in both the three-class and four-class experiments (i.e., 10-CV accuracy of 94.89% and 93.73%, respectively). Using the SHapley Additive exPlanations (SHAP) framework, we employed global and instance-based explanations to explain the behavior of each ML classifier. Moreover, we extended the explainability by implementing the LIME and SHAPASH local explainers. The consistency of these explainers has been explored. The resulting classifiers were accurate and explainable, and thus medically more relevant and applicable. CONCLUSIONS The selected modalities and feature sets were confirmed by the literature and medical experts. The various explainers suggest that the bradykinesia (NP3BRADY) feature was the most dominant and consistent. By providing thorough insights into the influence of multiple modalities on disease risk, the suggested approach is expected to help improve clinical knowledge of PD progression processes.
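The step in which statistical features are computed over a patient's six visits can be sketched as below. The abstract does not list which statistics were used, so the choice of mean, standard deviation, minimum, maximum, and range is an assumption, as are the names `visit_statistics` and `build_feature_vector` and the toy values.

```python
from statistics import mean, stdev

def visit_statistics(values):
    """Summarize one feature measured at each visit with simple statistics."""
    return {
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

def build_feature_vector(patient):
    """Flatten per-feature visit statistics into one row with prefixed names."""
    row = {}
    for name, values in patient.items():
        for stat, value in visit_statistics(values).items():
            row[f"{name}_{stat}"] = value
    return row

# Hypothetical measurements for one patient across six visits.
patient = {
    "motor_score": [12, 14, 15, 15, 18, 21],
    "non_motor_score": [5, 5, 6, 8, 7, 9],
}
features = visit_statistics(patient["motor_score"])
row = build_feature_vector(patient)
```

Each patient then contributes a single fixed-length row regardless of how the visits are spaced, which is what allows non-sequential models such as SVM or RF to consume longitudinal data.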
Affiliation(s)
- Muhammad Junaid
- Information Laboratory (InfoLab), Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea.
- Sajid Ali
- Information Laboratory (InfoLab), Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea.
- Fatma Eid
- Technology Management, Stony Brook University, New York 11794, USA.
- Shaker El-Sappagh
- Information Laboratory (InfoLab), College of Computing and Informatics, Sungkyunkwan University, Suwon 16419, South Korea; Faculty of Computer Science and Engineering, Galala University, Suez 435611, Egypt; Information Systems Department, Faculty of Computers and Artificial Intelligence, Benha University, Banha, 13518, Egypt.
- Tamer Abuhmed
- Information Laboratory (InfoLab), College of Computing and Informatics, Sungkyunkwan University, Suwon 16419, South Korea.
| |
|
3
|
Automatic detection of Alzheimer’s disease progression: An efficient information fusion approach with heterogeneous ensemble classifiers. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
4
|
Morgan-Benita JA, Galván-Tejada CE, Cruz M, Galván-Tejada JI, Gamboa-Rosales H, Arceo-Olague JG, Luna-García H, Celaya-Padilla JM. Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features. Healthcare (Basel) 2022; 10:1362. [PMID: 35893185 PMCID: PMC9331873 DOI: 10.3390/healthcare10081362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 07/11/2022] [Accepted: 07/15/2022] [Indexed: 11/16/2022] Open
Abstract
Type 2 diabetes mellitus (T2DM) represents one of the biggest health problems in Mexico, and it is extremely important to detect this disease and its complications early. For noninvasive detection of T2DM, a machine learning (ML) approach can be used that relies on ensemble classification models with a dichotomous output and is fast and effective for early detection and prediction of T2DM. In this article, a hard-voting ensemble technique is designed and implemented using generalized linear regression (GLM), support vector machines (SVM), and artificial neural networks (ANN) for the classification of T2DM patients. As a first step, the data are balanced, standardized, imputed, and fed into the three models, which classify patients into a dichotomous outcome. Features are selected with a LASSO implementation using 10-fold cross-validation, and the area under the curve (AUC) is used for the final validation. LASSO selected 12 features, which are used in the implemented models to obtain the best possible scenario in the developed ensemble model. The best-performing algorithm of the three is SVM, which obtained an AUC of 92% ± 3%. The ensemble model built with GLM, SVM, and ANN obtained an AUC of 90% ± 3%.
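Hard voting, as named in this abstract, takes the majority label across the base classifiers for each patient. The sketch below assumes the three models have already produced 0/1 predictions; the prediction lists are hypothetical, and `hard_vote` is an illustrative name rather than the authors' code.

```python
from collections import Counter

def hard_vote(predictions):
    """Return the majority label per sample across several classifiers.

    `predictions` holds one list of labels per classifier, all of equal length.
    If labels tie, the label that appears first among the votes wins; with an
    odd number of binary classifiers a tie cannot occur.
    """
    voted = []
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted

# Hypothetical dichotomous outputs (1 = T2DM, 0 = control) for three patients.
glm_pred = [1, 0, 1]
svm_pred = [1, 1, 0]
ann_pred = [0, 0, 1]

ensemble_pred = hard_vote([glm_pred, svm_pred, ann_pred])
```

Because the vote uses only the predicted labels and discards each model's confidence, an ensemble of this kind can score below its strongest member, which is consistent with the AUC figures reported in the abstract.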
Affiliation(s)
- Jorge A. Morgan-Benita
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Carlos E. Galván-Tejada
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Miguel Cruz
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Av. Cuauhtémoc 330, Col. Doctores, Del. Cuauhtémoc, Mexico City 06720, Mexico
- Jorge I. Galván-Tejada
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Hamurabi Gamboa-Rosales
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Jose G. Arceo-Olague
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Huizilopoztli Luna-García
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Correspondence: (H.L.-G.); (J.M.C.-P.)
- José M. Celaya-Padilla
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Correspondence: (H.L.-G.); (J.M.C.-P.)
| |
|
5
|
Impact of the learners diversity and combination method on the generation of heterogeneous classifier ensembles. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107689] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
|
6
|
Internet of Things (IoT)-Based Wireless Health: Enabling Technologies and Applications. Electronics 2021. [DOI: 10.3390/electronics10020148] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/16/2023]
Abstract
Wireless health is transforming health care by integrating wireless technologies into conventional medicine, including the diagnosis, monitoring, and treatment of illness [...]
|
7
|
Novaes MT, Ferreira de Carvalho OL, Guimarães Ferreira PH, Nunes Tiraboschi TL, Silva CS, Zambrano JC, Gomes CM, de Paula Miranda E, Abílio de Carvalho Júnior O, de Bessa Júnior J. Prediction of secondary testosterone deficiency using machine learning: A comparative analysis of ensemble and base classifiers, probability calibration, and sampling strategies in a slightly imbalanced dataset. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100538] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022] Open
|
8
|
Accelerating Retinal Fundus Image Classification Using Artificial Neural Networks (ANNs) and Reconfigurable Hardware (FPGA). Electronics 2019. [DOI: 10.3390/electronics8121522] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/16/2022]
Abstract
Diabetic retinopathy (DR) and glaucoma are common eye diseases that affect the retina and are two of the leading causes of vision loss around the world. Glaucoma is a common eye condition in which the optic nerve that connects the eye to the brain becomes damaged, whereas DR is a complication of diabetes caused by high blood sugar levels damaging the back of the eye. To produce an accurate and early diagnosis, an extremely high number of retinal images needs to be processed. Given the computational complexity of image processing algorithms and the need for high-performance architectures, this paper proposes and demonstrates the use of fully parallel field-programmable gate arrays (FPGAs) to overcome the burden of real-time computing in conventional software architectures. The experimental results achieved through software implementation were validated on an FPGA device. The results showed a remarkable improvement in terms of computational speed and power consumption. This paper presents various preprocessing methods to analyse fundus images, which can serve as a diagnostic tool for the detection of glaucoma and diabetic retinopathy. In the proposed adaptive thresholding-based preprocessing method, features were selected by calculating the area of the segmented optic disk, which was further classified using a feedforward neural network (NN). The analysis was carried out using feature extraction through existing methodologies such as adaptive thresholding, histogram, and wavelet transform. Results obtained through these methods were quantified to obtain optimum performance in terms of classification accuracy. The proposed hardware implementation outperforms existing methods and offers a significant improvement in terms of computational speed and power consumption.
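The adaptive thresholding step and the area feature mentioned in this abstract can be sketched in plain Python. This is a local-mean variant chosen for illustration; the abstract does not specify the window size, the offset, or the exact thresholding rule, so those parameters and the names `adaptive_threshold` and `segmented_area` are assumptions, and the 5x5 image is a toy stand-in for a fundus image.

```python
def adaptive_threshold(image, window=3, offset=0.0):
    """Mark a pixel as foreground (1) if it is brighter than its local mean.

    The local mean is taken over a `window` x `window` neighbourhood that is
    clipped at the image border; `offset` lowers the threshold when positive.
    """
    height, width = len(image), len(image[0])
    radius = window // 2
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            neighbourhood = [image[j][i] for j in rows for i in cols]
            local_mean = sum(neighbourhood) / len(neighbourhood)
            mask[y][x] = 1 if image[y][x] > local_mean - offset else 0
    return mask

def segmented_area(mask):
    """Count foreground pixels, used here as the area feature."""
    return sum(sum(row) for row in mask)

# Toy grayscale image: a bright 2x2 blob on a dark background.
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 200, 200, 10],
    [10, 10, 200, 200, 10],
    [10, 10, 10, 10, 10],
]
mask = adaptive_threshold(image)
area = segmented_area(mask)
```

Every pixel is thresholded independently of the others, which is the property that makes this kind of preprocessing a natural fit for a fully parallel hardware implementation.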
|