1
Khan A, Zubair S, Shuaib M, Sheneamer A, Alam S, Assiri B. Development of a robust parallel and multi-composite machine learning model for improved diagnosis of Alzheimer's disease: correlation with dementia-associated drug usage and AT(N) protein biomarkers. Front Neurosci 2024; 18:1391465. [PMID: 39308946 PMCID: PMC11412962 DOI: 10.3389/fnins.2024.1391465] [Received: 02/25/2024] [Accepted: 08/12/2024] [Indexed: 09/25/2024] Open
Abstract
Introduction: Machine learning (ML) algorithms and statistical modeling offer a potential solution to the challenge of diagnosing early Alzheimer's disease (AD) by leveraging multiple data sources and combining information on neuropsychological, genetic, and biomarker indicators. Statistical models in particular are a promising tool to enhance the clinical detection of early AD. In the present study, early AD was diagnosed by taking into account characteristics related to whether or not a patient was taking specific drugs and to protein biomarkers, namely amyloid-beta (Aβ), tau, and ptau [AT(N)] levels among participants.
Methods: In this study, predictive models for the diagnosis of AD pathologies were optimized using a set of baseline features. Model performance was improved by incorporating additional variables associated with patient drugs and protein biomarkers. The diagnostic group consisted of five categories (cognitively normal, significant subjective memory concern, early mildly cognitively impaired, late mildly cognitively impaired, and AD), resulting in a multinomial classification challenge. In particular, we examined the relationship between AD diagnosis and the use of various drugs (calcium and vitamin D supplements, blood-thinning drugs, cholesterol-lowering drugs, and cognitive drugs). We propose a hybrid-clinical model that runs multiple ML models in parallel and then takes a majority vote, which improves accuracy. We also assessed the significance of three cerebrospinal fluid biomarkers, Aβ, tau, and ptau, in the diagnosis of AD. We proposed that a hybrid-clinical model be used to simulate the MRI-based data, with five diagnostic groups of individuals, with further refinement that includes preclinical characteristics of the disorder. The proposed design builds a Meta-Model for four different sets of criteria: diagnosis from baseline features; from baseline and drug features; from baseline and protein features; and from baseline, drug, and protein features.
Results: We attained a maximum accuracy of 97.60% for baseline and protein data. The constructed model functioned effectively both when all five drugs were included and when any single drug was used to diagnose the response variable. Interestingly, the constructed Meta-Model also worked well both when all three protein biomarkers were included and when a single protein biomarker was used to diagnose the response variable.
Discussion: In the current study, we aimed to construct a pipeline design that incorporates comprehensive methodologies to detect Alzheimer's over wide-ranging input values and variables. The model that we developed could therefore be used by clinicians and medical experts to advance Alzheimer's diagnosis and as a starting point for future research into AD and other neurodegenerative syndromes.
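The parallel-models-plus-majority-vote idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tie-breaking rule, the label abbreviations (CN, SMC, EMCI, LMCI, AD), and the example predictions are all assumptions made here for demonstration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the most common label; ties go to the label that appears first."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    best = max(counts.values())
    # Walk the votes in order so that ties are resolved deterministically.
    for label in votes:
        if counts[label] == best:
            return label


def ensemble_predict(per_model_predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine predictions (one list per model) into one label per sample."""
    # zip(*...) regroups the lists so each tuple holds all votes for one sample.
    return [majority_vote(sample_votes) for sample_votes in zip(*per_model_predictions)]


# Hypothetical outputs of three models on three patients.
model_outputs = [
    ["CN", "AD", "EMCI"],
    ["CN", "LMCI", "EMCI"],
    ["SMC", "AD", "LMCI"],
]
print(ensemble_predict(model_outputs))  # ['CN', 'AD', 'EMCI']
```

With five diagnostic classes, ties are possible even with an odd number of models, so the tie-breaking rule is a design choice that should be stated explicitly in any real pipeline.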
Affiliation(s)
- Afreen Khan: Department of Computer Application, Faculty of Engineering & IT, Integral University, Lucknow, India
- Swaleha Zubair: Department of Computer Science, Faculty of Science, Aligarh Muslim University, Aligarh, India
- Mohammed Shuaib: Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
- Abdullah Sheneamer: Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
- Shadab Alam: Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
- Basem Assiri: Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
2
Priyanka EB, Vivek S, Thangavel S, Sampathkumar V, Al-Zaqri N, Warad I. Forecasting and meta-features estimation of wastewater and climate change impacts in coastal region using manifold learning. Environmental Research 2024; 240:117355. [PMID: 37863164 DOI: 10.1016/j.envres.2023.117355] [Received: 06/03/2023] [Revised: 08/31/2023] [Accepted: 10/07/2023] [Indexed: 10/22/2023]
Abstract
South Asia's coastlines, among the most densely inhabited and economically active ecosystems, have already begun to shift due to climate change. Over the past century, climate change has contributed to a gradual and considerable rise in sea level, which has eroded shorelines and increased storm-related coastal flooding. The differences in estuary water quality over time, both seasonally and annually, have been controlled largely by changes in stream flow. Assessment requires digitized analytical platforms to lower the risk of catastrophes associated with climate change in coastal towns. To predict future changes in an area's vulnerability and to inform waste planning decisions, a prospective investigation requires qualitative and quantitative scenarios. The paper concentrates on the development of a forecasting platform to evaluate the impacts of climate change and wastewater on the south coastal region of India. Building on advances in digitization, a multi-model ensemble combined with manifold learning is applied to the multi-case models; the resulting uncertainty probability rate of 23% can be disregarded given appropriate precautions for the coastal environment. Because manifold learning analysis results cannot be used directly in wastewater management studies owing to their inherent biases, a statistical bias correction and meta-feature estimation have been implemented. Within the climate-hydrology modeling chain, the results demonstrate a wide range of expected changes in water resources in some places. Experimental statistics indicate that the forecasted rate of 91.45% is the better choice for reducing the uncertainty of climate change and wastewater management.
Affiliation(s)
- E B Priyanka: Department of Mechatronics Engineering, Kongu Engineering College, Perundurai, 638060, India.
- S Vivek: Department of Civil Engineering, GMR Institute of Technology, Razam, Andra Pradesh, 532127, India.
- S Thangavel: Department of Mechatronics Engineering, Kongu Engineering College, Perundurai, 638060, India.
- V Sampathkumar: Department of Civil Engineering, Kongu Engineering College, Perundurai, 638060, India.
- Nabil Al-Zaqri: Department of Chemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia.
- Ismail Warad: Department of Chemistry, AN-Najah National University, P.O. Box 7, Nablus, Palestine; Research Centre, Manchester Salt & Catalysis, Unit C, 88-90 Chorlton Rd, M15 4AN Manchester, United Kingdom.
3
Bakasa W, Viriri S. Stacked ensemble deep learning for pancreas cancer classification using extreme gradient boosting. Front Artif Intell 2023; 6:1232640. [PMID: 37876961 PMCID: PMC10591225 DOI: 10.3389/frai.2023.1232640] [Received: 05/31/2023] [Accepted: 09/04/2023] [Indexed: 10/26/2023] Open
Abstract
Ensemble learning aims to improve prediction performance by combining several models or forecasts. However, it remains unclear which ensemble learning techniques are useful, and to what extent, in deep learning-based pipelines for pancreas computed tomography (CT) image classification. Ensemble approaches are the most advanced solution to many machine learning problems. These techniques entail training multiple models and combining their predictions to improve on the predictive performance of a single model. This article introduces the idea of Stacked Ensemble Deep Learning (SEDL), a pipeline for classifying pancreas CT medical images. The weak learners are Inception V3, VGG16, and ResNet34, and we employed a stacking ensemble. By combining the first-level predictions, an input training set is created for XGBoost, the ensemble model at the second level of prediction. Extreme Gradient Boosting (XGBoost), employed as a strong learner, makes the final classification. Our findings showed that SEDL performed better, with a 98.8% ensemble accuracy, after some adjustments to the hyperparameters. The Cancer Imaging Archive (TCIA) public access dataset consists of 80 pancreas CT scans with a resolution of 512 × 512 pixels, from 53 male and 27 female subjects. A sample of 222 images was used as training and testing data. We concluded that implementing the SEDL technique is an effective way to strengthen the robustness and increase the performance of the pipeline for classifying pancreas CT medical images. Interestingly, grouping like-minded or talented learners does not make a difference.
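The two-level flow in this abstract, where first-level predictions become the training input of a second-level model, can be sketched in plain Python. The following is an assumption-laden toy: the base learners are hypothetical scoring functions rather than Inception V3, VGG16, and ResNet34, and the second level is a small logistic regression used as a stand-in for XGBoost. All function names and data are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
BaseModel = Callable[[Sample], float]  # returns a score in [0, 1] for class 1


def build_meta_features(models: Sequence[BaseModel],
                        samples: Sequence[Sample]) -> List[List[float]]:
    """Level 1: each row holds the base-model scores for one sample."""
    return [[model(sample) for model in models] for sample in samples]


def train_meta_learner(rows: Sequence[Sequence[float]], labels: Sequence[int],
                       epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Level 2 stand-in: logistic regression fitted by stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob minus target
            weights = [w - lr * error * v for w, v in zip(weights, row)]
            bias -= lr * error
    return weights, bias


def stacked_predict(models: Sequence[BaseModel], weights: Sequence[float],
                    bias: float, sample: Sample) -> int:
    """Run the base models, then let the meta-learner make the final call."""
    row = [model(sample) for model in models]
    z = bias + sum(w * v for w, v in zip(weights, row))
    return 1 if z > 0 else 0


# Hypothetical base learners operating on a single toy feature.
base_models = [
    lambda s: s[0],
    lambda s: s[0] ** 2,
    lambda s: min(1.0, s[0] + 0.1),
]
train_x = [[0.1], [0.2], [0.8], [0.9]]
train_y = [0, 0, 1, 1]

meta_rows = build_meta_features(base_models, train_x)
weights, bias = train_meta_learner(meta_rows, train_y)
print([stacked_predict(base_models, weights, bias, s) for s in train_x])
```

In practice the meta-features are usually built from out-of-fold predictions so that the second-level model does not train on scores the base learners produced for data they have already seen; the toy above skips that step for brevity.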
Affiliation(s)
- Serestina Viriri: School of Mathematics, Statistics & Computer Science, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Durban, South Africa
4
Wang J, Zhou J, Wu H, Chen Y, Liang B. The Diagnosis of Malignant Pleural Effusion Using Tumor-Marker Combinations: A Cost-Effectiveness Analysis Based on a Stacking Model. Diagnostics (Basel) 2023; 13:3136. [PMID: 37835879 PMCID: PMC10572148 DOI: 10.3390/diagnostics13193136] [Received: 08/17/2023] [Revised: 09/27/2023] [Accepted: 10/03/2023] [Indexed: 10/15/2023] Open
Abstract
PURPOSE: By incorporating the cost of multiple tumor-marker tests, this work aims to comprehensively evaluate the financial burden on patients and the accuracy of machine learning models in diagnosing malignant pleural effusion (MPE) using tumor-marker combinations.
METHODS: Carcinoembryonic antigen (CEA), carbohydrate antigen (CA)19-9, CA125, and CA15-3 were collected from pleural effusion (PE) and peripheral blood (PB) of 319 patients with pleural effusion. A stacked ensemble (stacking) model based on five machine learning models was used to evaluate the diagnostic accuracy of tumor markers. We evaluated the discriminatory accuracy of various tumor-marker combinations using the area under the curve (AUC), sensitivity, and specificity. To evaluate the cost-effectiveness of different tumor-marker combinations, a comprehensive score (C-score) with a tuning parameter w was proposed.
RESULTS: In most scenarios, the stacking model outperformed the five individual machine learning models in terms of AUC. Among the eight tumor markers, the CEA in PE (PE.CEA) showed the best AUC of 0.902. Among all tumor-marker combinations, the PE.CA19-9 + PE.CA15-3 + PE.CEA + PB.CEA combination (C9 combination) achieved the highest AUC of 0.946. When w puts more weight on the cost, the highest C-score was achieved with the single PE.CEA marker. When w puts more than 0.8 of the weight on AUC, the C-score favored diagnostic models with more expensive tumor-marker combinations. Specifically, when w was set to 0.99, the C9 combination achieved the best C-score.
CONCLUSION: The stacking diagnostic model using PE.CEA is a relatively accurate and affordable choice for diagnosing MPE in patients without medical insurance or with limited economic means. The stacking model using the combination PE.CA19-9 + PE.CA15-3 + PE.CEA + PB.CEA is the most accurate diagnostic model and the best choice for patients without an economic burden. From a cost-effectiveness perspective, the stacking diagnostic model with the PE.CA19-9 + PE.CA15-3 + PE.CEA combination is particularly recommended, as it offers the best trade-off between low cost and high effectiveness.
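The abstract does not state the C-score formula, so the sketch below only illustrates how a single tuning parameter w can trade accuracy against cost. It assumes a convex combination of AUC and a normalized affordability term; that form, the function names, and the cost figures (one hypothetical cost unit per marker test) are assumptions made here, not the authors' definition. Only the two AUC values come from the abstract.

```python
from typing import Dict, Tuple


def c_score(auc: float, cost: float, max_cost: float, w: float) -> float:
    """Assumed form: w weights AUC, (1 - w) weights affordability in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    affordability = 1.0 - cost / max_cost  # 0 for the most expensive option
    return w * auc + (1.0 - w) * affordability


def best_combination(candidates: Dict[str, Tuple[float, float]], w: float) -> str:
    """Return the name of the candidate with the highest assumed C-score."""
    max_cost = max(cost for _, cost in candidates.values())
    return max(
        candidates,
        key=lambda name: c_score(candidates[name][0], candidates[name][1], max_cost, w),
    )


# (AUC from the abstract, hypothetical cost in marker-test units)
candidates = {
    "PE.CEA": (0.902, 1.0),  # single marker
    "C9": (0.946, 4.0),      # PE.CA19-9 + PE.CA15-3 + PE.CEA + PB.CEA
}

for w in (0.5, 0.99):
    print(w, best_combination(candidates, w))
```

Under these assumed costs the single marker wins when cost carries substantial weight and the four-marker combination wins at w = 0.99, which matches the qualitative pattern the abstract reports; the exact crossover point depends entirely on the assumed formula and prices.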
Affiliation(s)
- Jingyuan Wang: Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Jiangjie Zhou: Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Hanyu Wu: Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Yangyu Chen: Department of Respiration and Critical Care Medicine, Beijing Chaoyang Hospital, Beijing 100020, China
- Baosheng Liang: Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China
5
Fahmy HA, Fahmy SF, Del Barrio García AA, Botella Juan G. An Ensemble Multi-Stream Classifier for Infant needs Detection. Heliyon 2023; 9:e15098. [PMID: 37123937 PMCID: PMC10130778 DOI: 10.1016/j.heliyon.2023.e15098] [Received: 10/18/2022] [Revised: 01/26/2023] [Accepted: 03/27/2023] [Indexed: 04/08/2023] Open
Abstract
In this paper, we propose a novel multi-stream video classifier for infant needs detection. The proposed system is an ensemble-based system that combines several machine learning models to improve on the overall results of state-of-the-art algorithms. It is multi-stream in the sense that it combines the output predictions for both audio and images of infants from every single classifier employed in the system into a unified result. This produces better performance than previous techniques, which relied on only one of these modalities. To train and test the proposed system, we built three separate datasets of videos, images, and sounds from the Dunstan Baby Language video collection, encompassing the five primary infant needs to be predicted: hunger, wind, discomfort (requiring a diaper change), needing to burp, and tiredness, with a total of 3348 samples. We used four different ensemble algorithms to reach the best achievable performance. The proposed algorithm improves the overall accuracy of each single classifier, from a low of 51% to a high of 99%. The proposed method also improves the accuracy of the classification process by about 9 percentage points over the state-of-the-art approaches, which reached 90%.
6
Abstract
In brain–computer interfaces (BCIs), it is crucial to process brain signals to improve the accuracy of the classification of motor movements. Machine learning (ML) algorithms such as artificial neural networks (ANNs), linear discriminant analysis (LDA), decision tree (DT), K-nearest neighbor (KNN), naive Bayes (NB), and support vector machine (SVM) have made significant progress on classification problems. This paper presents a signal processing analysis of electroencephalographic (EEG) signals across different feature extraction techniques, used to train selected classification algorithms to classify signals related to motor movements. The motor movements considered are related to the left hand, right hand, both fists, feet, and relaxation, making this a multiclass problem. In this study, nine ML algorithms were trained with a dataset created by the feature extraction of EEG signals. The EEG signals of 30 Physionet subjects were used to create a dataset related to movement. We used electrodes C3, C1, CZ, C2, and C4 according to the standard 10-10 placement. Then, we extracted the epochs of the EEG signals and applied tone, amplitude levels, and statistical techniques to obtain the set of features. Custom LabVIEW™ 2015 applications were used for reading the EEG signals; for channel selection, noise filtering, band selection, and feature extraction operations; and for creating the dataset. MATLAB 2021a was used for training, testing, and evaluating the performance metrics of the ML algorithms. In this study, the Medium-ANN model achieved the best performance, with an average AUC of 0.9998, a Cohen’s Kappa coefficient of 0.9552, a Matthews correlation coefficient of 0.9819, and a loss of 0.0147. These findings suggest the applicability of our approach to different scenarios, such as implementing robotic prostheses, where the use of superficial features is an acceptable option when resources are limited, as in embedded systems or edge computing devices.
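Two of the metrics reported above, Cohen's kappa and the Matthews correlation coefficient (MCC), can both be computed from a multiclass confusion matrix. The sketch below uses the standard definitions (kappa as observed agreement corrected for chance agreement, and the multiclass generalization of MCC); the function name and the example matrix are illustrative assumptions and are unrelated to the study's data, which was evaluated in MATLAB.

```python
import math
from typing import Sequence, Tuple


def kappa_and_mcc(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Compute (Cohen's kappa, multiclass MCC).

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    true_counts = [sum(row) for row in confusion]        # row sums
    pred_counts = [sum(col) for col in zip(*confusion)]  # column sums
    chance = sum(t * p for t, p in zip(true_counts, pred_counts))

    # Kappa: observed agreement corrected for agreement expected by chance.
    observed = correct / total
    expected = chance / total ** 2
    kappa = (observed - expected) / (1.0 - expected) if expected < 1.0 else 0.0

    # Multiclass MCC; defined as 0 when the denominator vanishes.
    denom = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = (correct * total - chance) / denom if denom > 0 else 0.0
    return kappa, mcc


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
example = [
    [5, 0, 0],
    [0, 4, 1],
    [0, 0, 5],
]
kappa, mcc = kappa_and_mcc(example)
print(round(kappa, 4), round(mcc, 4))
```

Both metrics equal 1 for a perfect classifier and are close to 0 when predictions agree with the labels no more often than chance, which makes them more informative than raw accuracy on imbalanced multiclass problems.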