1
Ji W, Wang C, Chen H, Liang Y, Wang S. Predicting post-stroke cognitive impairment using machine learning: A prospective cohort study. J Stroke Cerebrovasc Dis 2023; 32:107354. [PMID: 37716104 DOI: 10.1016/j.jstrokecerebrovasdis.2023.107354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 08/27/2023] [Accepted: 09/11/2023] [Indexed: 09/18/2023]
Abstract
BACKGROUND Post-stroke cognitive impairment (PSCI) is a serious complication of stroke that warrants prompt detection and management, so a diagnostic prediction model has clinical value.
OBJECTIVE To use machine learning algorithms to identify key variables and predict the occurrence of PSCI within 3-6 months after acute ischemic stroke (AIS).
METHODS A prospective study was conducted on a development cohort (331 patients) using data from the Affiliated Zhongda Hospital of Southeast University collected between January 2022 and August 2022, and on an external validation cohort (66 patients) collected from December 2022 to January 2023. The optimal model was selected from nine machine learning classification models, and Shapley Additive exPlanations (SHAP) were used to interpret it and support personalized risk assessment.
RESULTS Age, education, baseline National Institutes of Health Stroke Scale (NIHSS) score, cerebral white matter degeneration (CWMD), homocysteine (Hcy), and C-reactive protein (CRP) were identified as predictors of PSCI. The Gaussian Naïve Bayes (GNB) model was the optimal model: it outperformed the other classifiers in the validation set (area under the curve [AUC]: 0.925, 95% confidence interval [CI]: 0.861-0.988) and achieved the lowest Brier score. It also performed well in the test sets (AUC: 0.919, accuracy: 0.864, sensitivity: 0.818, specificity: 0.932).
CONCLUSIONS A GNB model was developed and interpreted with the SHAP method. These findings provide evidence to support PSCI prevention and could guide high-risk patients toward appropriate preventive measures.
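To make the evaluation terms in this abstract concrete, the following is a minimal sketch of a Gaussian Naïve Bayes classifier scored with AUC and the Brier score. It is not the authors' pipeline: the two features, the toy values, and the helper names (`fit_gnb`, `predict_proba`, `auc`, `brier_score`) are all assumptions made for illustration, and SHAP is not covered.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, var_floor=1e-9):
    """Estimate the prior and per-feature Gaussian mean/variance of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_proba(model, row):
    """Posterior probability of each class for one sample (features treated as independent)."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, s2 in zip(row, means, variances):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[label] = ll
    top = max(log_post.values())  # subtract the maximum for numerical stability
    expd = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(expd.values())
    return {k: v / total for k, v in expd.items()}


def brier_score(y_true, p_pos):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for p, t in zip(p_pos, y_true)) / len(y_true)


def auc(y_true, p_pos):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for p, t in zip(p_pos, y_true) if t == 1]
    neg = [p for p, t in zip(p_pos, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration: [age, a severity score]; 1 = impaired.
X_train = [[55, 2.0], [60, 3.0], [58, 2.5], [75, 8.0], [80, 9.0], [78, 7.5]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[57, 2.2], [79, 8.5], [62, 3.5], [74, 7.0]]
y_test = [0, 1, 0, 1]

model = fit_gnb(X_train, y_train)
p_test = [predict_proba(model, row)[1] for row in X_test]
test_auc = auc(y_test, p_test)
test_brier = brier_score(y_test, p_test)
print(f"AUC={test_auc:.3f}  Brier={test_brier:.4f}")
```

A lower Brier score indicates better-calibrated probabilities, whereas AUC only measures how well positives are ranked above negatives, which is why a model comparison can reasonably report both.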
Affiliation(s)
- Wencan Ji
- Nanjing Medical University, Nanjing, China; Jiangsu Research Center for Primary Health Development and General Practice Education, Jiangsu, China; Department of General Practice, Zhongda Hospital, Southeast University, Nanjing, China
- Canjun Wang
- Center of Clinical Laboratory Medicine, Zhongda Hospital, Southeast University, Nanjing, China
- Hanqing Chen
- Department of General Practice, Zhongda Hospital, Southeast University, Nanjing, China
- Yan Liang
- Department of General Practice, Zhongda Hospital, Southeast University, Nanjing, China
- Shaohua Wang
- Nanjing Medical University, Nanjing, China; Department of Endocrinology, Affiliated Zhongda Hospital of Southeast University, Nanjing, China
2
Subhalakshmi RT, Balamurugan SAA, Sasikala S. Deep learning based fusion model for COVID-19 diagnosis and classification using computed tomography images. Concurr Eng Res Appl 2022; 30:116-127. [PMID: 35382156 PMCID: PMC8968394 DOI: 10.1177/1063293x211021435] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/05/2023]
Abstract
The COVID-19 pandemic has escalated drastically while rapid testing kits remain in limited supply. Automated COVID-19 diagnosis models are therefore essential for identifying the disease from radiological images. Earlier studies focused on Artificial Intelligence (AI) techniques that diagnose COVID-19 from X-ray images. This paper develops a Deep Learning Based MultiModal Fusion technique, called DLMMF, for COVID-19 diagnosis and classification from Computed Tomography (CT) images. The DLMMF model operates in three main stages: Wiener Filtering (WF) based pre-processing, feature extraction, and classification. The model fuses deep features extracted with the VGG16 and Inception v4 models. Finally, a Gaussian Naïve Bayes (GNB) classifier assigns the test CT images to distinct class labels. The DLMMF model was validated experimentally on the open-source COVID-CT dataset, which comprises a total of 760 CT images. The results showed superior performance, with a maximum sensitivity of 96.53%, specificity of 95.81%, accuracy of 96.81%, and F-score of 96.73%.
Affiliation(s)
- RT Subhalakshmi
- Department of Information Technology, Sethu Institute of Technology, Virudhunagar, Tamil Nadu, India
- S Appavu alias Balamurugan
- Department of Computer Science, Central University of Tamil Nadu, Thiruvarur, Tamil Nadu, India
- S Appavu alias Balamurugan, Department of Computer Science, Central University of Tamil Nadu, Thiruvarur – 610 005, Tamilnadu, India.
- S Sasikala
- Department of Computer Science and Engineering, Velammal College of Engineering and Technology, Madurai, Tamil Nadu, India
3
Hu J, Zhou L, Li B, Zhang X, Chen N. Improve hot region prediction by analyzing different machine learning algorithms. BMC Bioinformatics 2021; 22:522. [PMID: 34696728 PMCID: PMC8543831 DOI: 10.1186/s12859-021-04420-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2021] [Accepted: 09/08/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Recognizing hot regions in protein-protein interactions is crucial when designing drugs and proteins. Each hot region is composed of at least three hot spots, which play an important role in binding. However, identifying hot spots through biological experiments is time-consuming and labor-intensive. Predictive models trained with machine learning methods could effectively accelerate the drug design process.
RESULTS Different machine learning algorithms perform similarly when evaluated using the F-measure; they differ mainly in recall and precision. Because the key attribute of hot regions is that they are tightly packed, a clustering algorithm was used to predict hot regions. By combining Gaussian Naïve Bayes and DBSCAN, the F-measure of hot region prediction can reach 0.809.
CONCLUSIONS In this paper, machine learning models including Gaussian Naïve Bayes, SVM, XGBoost, Random Forest, and Artificial Neural Network are used to predict hot spots. The experimental results show that combining a hot spot classification algorithm with a higher recall rate and a clustering algorithm with higher precision can effectively improve the accuracy of hot region prediction.
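The classify-then-cluster idea in this abstract can be sketched as follows: residues flagged as hot spots by some classifier are grouped by spatial density, and only groups of at least three are kept as hot regions. This is an illustrative sketch, not the authors' code. The `is_hot` flags stand in for classifier output, and the residue names, coordinates, and the 6.0 distance threshold are assumptions chosen for the example.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: a cluster id >= 0, or -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, does not expand it
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs)
    return labels


def hot_regions(residues, coords, is_hot, eps=6.0, min_pts=3):
    """Cluster predicted hot spots; keep clusters with at least three members."""
    hot = [(r, c) for r, c, h in zip(residues, coords, is_hot) if h]
    labels = dbscan([c for _, c in hot], eps, min_pts)
    regions = {}
    for (r, _), lab in zip(hot, labels):
        if lab >= 0:
            regions.setdefault(lab, []).append(r)
    return [sorted(members) for members in regions.values() if len(members) >= 3]


# Hypothetical residues with made-up 3D coordinates and classifier flags.
residues = ["A10", "A11", "A12", "A40", "A41", "A90", "A13"]
coords = [(0, 0, 0), (3, 0, 0), (0, 3, 0), (30, 0, 0), (32, 0, 0), (60, 60, 60), (1, 1, 1)]
is_hot = [True, True, True, True, True, True, False]

regions = hot_regions(residues, coords, is_hot)
print(regions)
```

In this toy case the isolated pair and the lone residue are discarded as noise, which illustrates how the clustering step can filter out false-positive hot spots from a high-recall classifier.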
Affiliation(s)
- Jing Hu
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, Hubei, China
- Longwei Zhou
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, Hubei, China
- Bo Li
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, Hubei, China
- Xiaolong Zhang
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, 430065, Hubei, China
- Nansheng Chen
- Molecular Biology and Biochemistry, Simon Fraser University, Vancouver, BC, Canada
4
Nag A, Gerritsen A, Doeppke C, Harman-Ware AE. Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data. Int J Mol Sci 2021; 22:ijms22084107. [PMID: 33921121 PMCID: PMC8071563 DOI: 10.3390/ijms22084107] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 04/09/2021] [Accepted: 04/09/2021] [Indexed: 12/04/2022]
Abstract
High-throughput analysis of biomass is necessary to ensure consistent and uniform feedstocks for agricultural and bioenergy applications and is needed to inform genomics and systems biology models. Pyrolysis followed by mass spectrometry, such as pyrolysis-molecular beam mass spectrometry (py-MBMS), is becoming increasingly popular for the rapid analysis of biomass cell wall composition and typically requires different data analysis tools depending on the need and application. Here, the authors report the py-MBMS analysis of several types of lignocellulosic biomass to understand spectral patterns and their variation with biomass composition, and use machine learning approaches to classify, differentiate, and predict biomass types on the basis of py-MBMS spectra. Py-MBMS spectra were also corrected for instrumental variance using generalized linear modeling (GLM), with the relative abundances of select ions serving as spike-in controls. The machine learning classification algorithms used included random forest, k-nearest neighbors (k-NN), decision tree, Gaussian Naïve Bayes (GNB), gradient boosting, and multilayer perceptron classifiers. The k-NN classifier generally performed the best for classifications using raw spectral data, and the decision tree classifier performed the worst. After normalization of spectra to account for instrumental variance, all the classifiers had comparable and generally acceptable performance for predicting the biomass types, although the k-NN and decision tree classifiers were not as accurate for prediction of specific sample types. The GNB and extreme gradient boosting (XGB) classifiers performed better than the k-NN and decision tree classifiers for the prediction of biomass mixtures.
The data analysis workflow reported here could be applied and extended to compare biomass samples of varying types, species, phenotypes, and/or genotypes, or samples subjected to different treatments, environments, etc., to further elucidate the sources of spectral variance and patterns and to infer compositional information based on spectral analysis, particularly for data without a priori knowledge of the feedstock composition or identity.
Affiliation(s)
- Ambarish Nag
- Computational Science Center, National Renewable Energy Laboratory, 15013 Denver West Pkwy, Golden, CO 80401, USA
- Alida Gerritsen
- Computational Science Center, National Renewable Energy Laboratory, 15013 Denver West Pkwy, Golden, CO 80401, USA
- Crissa Doeppke
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Pkwy, Golden, CO 80401, USA
- Anne E. Harman-Ware
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, 15013 Denver West Pkwy, Golden, CO 80401, USA
5
Ontivero-Ortega M, Lage-Castellanos A, Valente G, Goebel R, Valdes-Sosa M. Fast Gaussian Naïve Bayes for searchlight classification analysis. Neuroimage 2017; 163:471-479. [PMID: 28877514 DOI: 10.1016/j.neuroimage.2017.09.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 05/22/2017] [Revised: 08/01/2017] [Accepted: 09/01/2017] [Indexed: 10/18/2022]
Abstract
The searchlight technique is a variant of multivariate pattern analysis (MVPA) that examines neural activity across large sets of small regions, exhaustively covering the whole brain. This usually involves application of classifier algorithms across all searchlights, which entails large computational costs especially when testing the statistical significance of the accuracies with permutation methods. In this article, a new implementation of the Gaussian Naive Bayes classifier is presented (henceforth massive-GNB). This approach allows classification in all searchlights simultaneously, and is faster than previously published searchlight GNB implementations, as well as other more complex classifiers including support vector machines (SVM). To ensure that the gain in speed for GNB would be useful in searchlight analysis, we compared the accuracies of massive-GNB and SVM in detecting the lateral occipital complex (LOC) in an fMRI localizer experiment (26 subjects). Moreover, this region as defined in a meta-analysis of many activation studies was used as a gold standard to compare error rates for both classifiers. In individual searchlights, SVM was somewhat more accurate than massive-GNB and more selective in detecting the meta-analytic LOC. However, with multiple comparison correction at the cluster-level the two classifiers performed equivalently. Thus for cluster-level analysis, massive-GNB produces an accuracy similar to more sophisticated classifiers but with a substantial gain in speed. Massive-GNB (available as a public Matlab toolbox) could facilitate the more widespread use of searchlight analysis.
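One property that makes Gaussian Naïve Bayes attractive for searchlight analysis is that its log-likelihood is a sum over features, so per-voxel terms can be computed once and reused by every searchlight that contains the voxel. The sketch below illustrates that property only. It is an assumption about how such sharing could work, not a description of the massive-GNB toolbox, and the class names, voxel values, searchlight index lists, and equal class priors are all invented for the example.

```python
import math


def voxel_loglik(train, labels, sample, var_floor=1e-6):
    """Per-class list of per-voxel Gaussian log-likelihoods for one test sample."""
    table = {}
    for c in sorted(set(labels)):
        rows = [x for x, y in zip(train, labels) if y == c]
        per_voxel = []
        for v in range(len(sample)):
            vals = [r[v] for r in rows]
            m = sum(vals) / len(vals)
            s2 = sum((a - m) ** 2 for a in vals) / len(vals) + var_floor
            per_voxel.append(
                -0.5 * math.log(2 * math.pi * s2) - (sample[v] - m) ** 2 / (2 * s2)
            )
        table[c] = per_voxel
    return table


def classify_searchlights(table, searchlights):
    """For each searchlight, sum the shared per-voxel terms and pick the best class.

    Equal class priors are assumed, so the prior term is omitted.
    """
    preds = []
    for voxels in searchlights:
        scores = {c: sum(ll[v] for v in voxels) for c, ll in table.items()}
        preds.append(max(scores, key=scores.get))
    return preds


# Made-up training patterns over four voxels; voxels 0 and 1 carry the signal.
train = [
    [1.0, 1.1, 0.0, 0.1], [1.2, 0.9, 0.1, 0.0], [0.9, 1.0, -0.1, 0.05],
    [0.0, 0.1, 0.05, 0.0], [0.1, -0.1, 0.0, 0.1], [-0.1, 0.0, 0.1, -0.05],
]
labels = ["face", "face", "face", "house", "house", "house"]
sample = [1.05, 1.0, 0.0, 0.0]

# Each searchlight is a list of voxel indices; overlapping ones reuse the same terms.
searchlights = [[0, 1], [1, 2], [0, 1, 2, 3]]

table = voxel_loglik(train, labels, sample)
preds = classify_searchlights(table, searchlights)
print(preds)
```

In an array-based implementation the per-searchlight sums could be expressed as a single matrix product between a searchlight-by-voxel membership matrix and the per-voxel log-likelihood table, which is one plausible route to classifying all searchlights at once.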
Affiliation(s)
- Agustin Lage-Castellanos
- Department of NeuroInformatics, Cuban Center for Neuroscience, Cuba; Department of Cognitive Neuroscience, Maastricht University, Netherlands
- Giancarlo Valente
- Department of Cognitive Neuroscience, Maastricht University, Netherlands
- Rainer Goebel
- Department of Cognitive Neuroscience, Maastricht University, Netherlands