1. Ferdowsi M, Hasan MM, Habib W. Responsible AI for cardiovascular disease detection: Towards a privacy-preserving and interpretable model. Comput Methods Programs Biomed 2024; 254:108289. [PMID: 38905988 DOI: 10.1016/j.cmpb.2024.108289] [Received: 01/14/2024] [Revised: 06/10/2024] [Accepted: 06/16/2024]
Abstract
BACKGROUND AND OBJECTIVE: Cardiovascular disease (CD) is a major global health concern, affecting millions of people with symptoms such as fatigue and chest discomfort. Because CD contributes substantially to global mortality, timely identification is crucial. In healthcare, artificial intelligence (AI) holds promise for advancing disease risk assessment and treatment outcome prediction. However, the evolution of machine learning (ML) raises concerns about data privacy and bias, especially in sensitive healthcare applications. The objective was to develop and implement a responsible AI model for CD prediction that prioritizes patient privacy and security while ensuring transparency, explainability, fairness, and ethical adherence in healthcare applications.
METHODS: To predict CD while prioritizing patient privacy, our study anonymized the data by adding Laplace noise to sensitive features such as age and gender. The anonymized dataset was then analyzed within a differential privacy (DP) framework, which preserved confidentiality while still allowing insights to be extracted. The methodology compared Logistic Regression (LR), Gaussian Naïve Bayes (GNB), and Random Forest (RF) classifiers and integrated feature selection, statistical analysis, and two interpretability tools: SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). This approach supports transparent and interpretable AI decision-making in line with responsible AI development principles, combining privacy preservation, interpretability, and ethical considerations for accurate CD prediction.
RESULTS: The DP framework with LR performed well, with an area under the curve (AUC) of 0.848 ± 0.03, accuracy of 0.797 ± 0.02, precision of 0.789 ± 0.02, recall of 0.797 ± 0.02, and F1 score of 0.787 ± 0.02, comparable to the non-private framework. The SHAP- and LIME-based results are consistent with clinical findings, reflecting a commitment to transparent and interpretable AI decision-making in line with the principles of responsible AI development.
CONCLUSIONS: Our study proposes a novel approach to predicting CD that combines data anonymization, privacy-preserving methods, the interpretability tools SHAP and LIME, and ethical considerations. This responsible AI framework ensures accurate predictions, privacy preservation, and user trust, underscoring the importance of comprehensive and transparent ML models in healthcare. This research therefore improves the ability to forecast CD, which could benefit millions of CD patients globally and potentially prevent many deaths.
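The anonymization step described in the Methods, adding Laplace noise to sensitive features, can be illustrated with the standard Laplace mechanism, which adds noise drawn from Laplace(0, Δ/ε), where Δ is the sensitivity and ε the privacy budget. This is a minimal sketch, not the authors' code: the function name `laplace_mechanism`, the per-record reading of sensitivity, the assumed age range of 18 to 90, and ε = 1.0 are all illustrative assumptions, since the abstract does not report these parameters. It uses only the Python standard library.

```python
import random
from typing import List, Sequence


def laplace_mechanism(values: Sequence[float], sensitivity: float,
                      epsilon: float, rng: random.Random) -> List[float]:
    """Return each value plus independent Laplace(0, sensitivity / epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) draws is Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return [v + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for v in values]


# Hypothetical per-record release: ages are assumed to lie in [18, 90], so one
# record's value can change by at most 90 - 18 = 72 (the sensitivity).
ages = [54.0, 61.0, 47.0, 70.0]
noisy_ages = laplace_mechanism(ages, sensitivity=72.0, epsilon=1.0,
                               rng=random.Random(42))
print([round(a, 1) for a in noisy_ages])
```

With a sensitivity of 72 and ε = 1 the noise scale is 72 years, which shows why per-record noise on wide-range features costs predictive accuracy; raising ε reduces the noise at the price of weaker privacy.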
Affiliation(s)
- Mahbuba Ferdowsi
  - Department of Mechatronics and Biomedical Engineering, Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman (UTAR), Kajang, Selangor 43200, Malaysia
- Md Mahmudul Hasan
  - School of Computer Science and Engineering, University of New South Wales (UNSW), Sydney, NSW 2052, Australia
- Wafa Habib
  - Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya (UM), Kuala Lumpur 50603, Malaysia

2. Alassafi MO, Aziz W, AlGhamdi R, Alshdadi AA, Nadeem MSA, Khan IR, Albishry N, Bahaddad A, Altalbe A. Scale based entropy measures and deep learning methods for analyzing the dynamical characteristics of cardiorespiratory control system in COVID-19 subjects during and after recovery. Comput Biol Med 2024; 170:108032. [PMID: 38310805 DOI: 10.1016/j.compbiomed.2024.108032] [Received: 08/27/2023] [Revised: 01/20/2024] [Accepted: 01/25/2024]
Abstract
Coronavirus disease 2019 (COVID-19) primarily targets the respiratory system but can also affect the cardiovascular system, leading to a range of cardiorespiratory complications. The current state of the art in analyzing the dynamical characteristics of physiological systems and aiding clinical decision-making integrates entropy-based complexity techniques with artificial intelligence. Entropy-based measures offer a promising means of identifying disturbances in the cardiorespiratory control system (CRCS) of COVID-19 patients by assessing oxygen saturation variability (OSV) signals. In this investigation, we employ scale-based entropy (SBE) methods, including multiscale entropy (MSE), multiscale permutation entropy (MPE), and multiscale fuzzy entropy (MFE), to characterize the dynamics of OSV signals. These measures serve as features for traditional machine learning (ML) and deep learning (DL) approaches that classify OSV signals from COVID-19 patients during their illness and after recovery. OSV and pulse rate data were acquired non-invasively with a Beurer PO-80 pulse oximeter from COVID-19 patients during the active infection phase and again after a two-month recovery period. The dataset comprises 88 recordings from 44 subjects (26 men and 18 women), each recorded during COVID-19 illness and two months after recovery. Before analysis, the data are preprocessed to remove artifacts and outliers. Applying SBE measures to the OSV signals reveals a reduction in signal complexity during COVID-19. Using these SBE measures as feature sets, we employ two DL techniques, the radial basis function network (RBFN) and RBFN with the dynamic decay adjustment algorithm (RBFNDDA), to classify OSV data collected during and after COVID-19 recovery. To evaluate classification performance, we use standard metrics: sensitivity, specificity, false positive rate (FPR), and the area under the receiver operating characteristic curve (AUC). Among the three SBE measures, MFE outperformed MSE and MPE, achieving the highest classification performance with RBFN using the 13 best features: sensitivity 0.84, FPR 0.30, specificity 0.70, and AUC 0.77. These outcomes demonstrate that SBE measures combined with DL methods offer a valuable approach for categorizing OSV signals obtained during and after COVID-19, ultimately aiding the detection of CRCS dysfunction.
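The multiscale entropy (MSE) procedure named above can be sketched in two steps: for each scale τ, coarse-grain the signal by averaging non-overlapping windows of τ samples, then compute the sample entropy (SampEn) of each coarse-grained series with a tolerance r fixed once from the original signal's standard deviation. This is a minimal, unoptimized pure-Python sketch under assumed defaults (m = 2 and r = 0.15 × SD, a common choice in the MSE literature); the study's actual parameters, its MPE and MFE variants, and its OSV data are not reproduced here, and the signals in the usage lines are hypothetical.

```python
import math
import random
from typing import List, Sequence


def coarse_grain(signal: Sequence[float], scale: int) -> List[float]:
    """Average consecutive, non-overlapping windows of `scale` samples."""
    n_windows = len(signal) // scale
    return [sum(signal[w * scale:(w + 1) * scale]) / scale
            for w in range(n_windows)]


def sample_entropy(x: Sequence[float], m: int, r: float) -> float:
    """SampEn(m, r) = ln(B / A).

    B counts pairs of length-m templates within Chebyshev distance r, and A
    counts pairs that still match at length m + 1. Self-matches are excluded,
    and the same n - m starting points are used for both lengths.
    """
    n = len(x)

    def count_matches(length: int) -> int:
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined when no template pairs match
    return math.log(b / a)


def multiscale_entropy(signal: Sequence[float], max_scale: int,
                       m: int = 2, r_factor: float = 0.15) -> List[float]:
    """SampEn of the coarse-grained signal at scales 1..max_scale."""
    mean = sum(signal) / len(signal)
    sd = math.sqrt(sum((v - mean) ** 2 for v in signal) / len(signal))
    r = r_factor * sd  # tolerance fixed once, from the original signal
    return [sample_entropy(coarse_grain(signal, s), m, r)
            for s in range(1, max_scale + 1)]


# Hypothetical signals, not OSV data: white noise versus a strictly periodic series.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
periodic = [0.0, 1.0] * 150
print("noise:", [round(v, 2) for v in multiscale_entropy(noise, max_scale=3)])
print("periodic:", multiscale_entropy(periodic, max_scale=3))
```

The periodic series has zero entropy at every scale because every template that matches at length m also matches at length m + 1, while white noise scores high; a drop in complexity such as the one the abstract reports would show up as lower values across scales.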
Affiliation(s)
- Madini O Alassafi
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Wajid Aziz
  - Department of Computer Science and Information Technology, King Abdullah Campus, University of Azad Jammu and Kashmir Muzaffarabad (AK), Pakistan
- Rayed AlGhamdi
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Malik Sajjad Ahmed Nadeem
  - Department of Computer Science and Information Technology, King Abdullah Campus, University of Azad Jammu and Kashmir Muzaffarabad (AK), Pakistan
- Nabeel Albishry
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Adel Bahaddad
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Ali Altalbe
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

3. Sutradhar A, Al Rafi M, Shamrat FMJM, Ghosh P, Das S, Islam MA, Ahmed K, Zhou X, Azad AKM, Alyami SA, Moni MA. BOO-ST and CBCEC: two novel hybrid machine learning methods aim to reduce the mortality of heart failure patients. Sci Rep 2023; 13:22874. [PMID: 38129433 PMCID: PMC10739972 DOI: 10.1038/s41598-023-48486-7] [Received: 05/26/2023] [Accepted: 11/27/2023]
Abstract
Heart failure (HF) is a leading cause of mortality worldwide. Machine learning (ML) approaches have shown potential as early detection tools for improving patient outcomes. Making ML models more effective and clinically applicable requires training efficient classifiers on diverse, high-quality datasets. Hence, we proposed two novel hybrid ML methods to serve as an efficient early warning system for HF mortality: (a) BOO-ST, which combines Boosting, the Synthetic Minority Over-sampling Technique (SMOTE), and Tomek links; and (b) CBCEC, which combines the best-performing conventional classifier with ensemble classifiers. BOO-ST was introduced to tackle class imbalance, while CBCEC was trained on the processed features selected by the Feature Importance (FI) and Information Gain (IG) feature selection techniques. We also conducted an explicit, intuitive analysis of the impact of potential characteristics correlated with HF fatalities. The experimental results showed that the proposed CBCEC classifier achieved an accuracy of 93.67% in the early forecasting of HF mortality. The proposed methods (BOO-ST and CBCEC) could therefore play a crucial role in reducing the HF death rate and easing pressure on the healthcare sector.
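The resampling half of BOO-ST combines two standard techniques: SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours, and Tomek links (pairs of mutual nearest neighbours with different labels) are then removed to clean the class boundary. The pure-Python sketch below shows only that SMOTE-then-Tomek step for a binary problem. The boosting stage is omitted, and the function names, k = 3, balancing to equal class sizes, and dropping both points of each Tomek link are assumptions rather than details from the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote(minority: Sequence[Point], n_new: int, k: int,
          rng: random.Random) -> List[Point]:
    """Generate n_new synthetic points on segments joining minority neighbours."""
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(points: Sequence[Point],
                labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    n = len(points)
    nearest = [min((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
               for i in range(n)]
    return [(i, j) for i, j in enumerate(nearest)
            if i < j and nearest[j] == i and labels[i] != labels[j]]


def smote_tomek(points: Sequence[Point], labels: Sequence[int],
                minority_label: int, k: int = 3, seed: int = 0):
    """Oversample the minority class to parity, then drop both ends of each Tomek link."""
    rng = random.Random(seed)
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_new = max(0, (len(points) - len(minority)) - len(minority))
    new_points = smote(minority, n_new, k, rng)
    all_points = list(points) + new_points
    all_labels = list(labels) + [minority_label] * len(new_points)
    drop = {idx for pair in tomek_links(all_points, all_labels) for idx in pair}
    kept = [i for i in range(len(all_points)) if i not in drop]
    return [all_points[i] for i in kept], [all_labels[i] for i in kept]


# Hypothetical 2-D toy data: 10 majority points and 3 minority points.
majority = [(float(x), 0.0) for x in range(10)]
minority_pts = [(4.5, 1.0), (5.5, 1.2), (6.0, 0.8)]
X = majority + minority_pts
y = [0] * len(majority) + [1] * len(minority_pts)
X_res, y_res = smote_tomek(X, y, minority_label=1)
print("class 0:", y_res.count(0), "class 1:", y_res.count(1))
```

Cleaning after oversampling is the usual ordering, since synthetic points can themselves form Tomek links with nearby majority samples.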
Affiliation(s)
- Ananda Sutradhar
  - Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka, 1216, Bangladesh
- Mustahsin Al Rafi
  - Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka, 1216, Bangladesh
- F M Javed Mehedi Shamrat
  - Department of Computer System and Technology, University of Malaya, 50603, Kuala Lumpur, Malaysia
- Pronab Ghosh
  - Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Subrata Das
  - Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Md Anaytul Islam
  - Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Kawsar Ahmed
  - Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
  - Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh
  - Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Xujuan Zhou
  - School of Business, University of Southern Queensland, Toowoomba, Australia
- A K M Azad
  - Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia
- Salem A Alyami
  - Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia
- Mohammad Ali Moni
  - Centre for AI & Digital Health Technology, Artificial Intelligence & Cyber Future Institute, Charles Sturt University, Bathurst, NSW, 2795, Australia

4. Pachiyannan P, Alsulami M, Alsadie D, Saudagar AKJ, AlKhathami M, Poonia RC. A Cardiac Deep Learning Model (CDLM) to Predict and Identify the Risk Factor of Congenital Heart Disease. Diagnostics (Basel) 2023; 13:2195. [PMID: 37443589 DOI: 10.3390/diagnostics13132195] [Received: 05/14/2023] [Revised: 06/07/2023] [Accepted: 06/19/2023]
Abstract
Congenital heart disease (CHD) is a critical global public health concern, particularly with regard to newborn mortality. Low- and middle-income countries face the highest mortality rates because of limited resources and inadequate access to healthcare. Machine learning offers an opportunity to develop accurate predictive models that assess the risk of death from CHD. Such models can help healthcare professionals identify high-risk infants and provide appropriate care. Machine learning can also uncover patterns in the risk factors associated with CHD mortality, enabling targeted interventions that prevent or reduce mortality among vulnerable newborns. This paper proposes a machine learning approach to minimize CHD-related newborn mortality. By analyzing data from infants diagnosed with CHD, the model identifies key risk factors contributing to mortality. With this knowledge, healthcare providers can devise customized interventions, including intensified care for high-risk infants and early detection and treatment strategies. The proposed diagnostic model uses maternal clinical history and fetal health information to predict the condition of newborns affected by CHD. The proposed Cardiac Deep Learning Model (CDLM) achieved a sensitivity of 91.74%, specificity of 92.65%, positive predictive value of 90.85%, negative predictive value of 55.62%, and a miss rate of 91.03%. This research aims to equip healthcare professionals with tools to combat CHD-related newborn mortality, ultimately saving lives and improving healthcare outcomes worldwide.
Affiliation(s)
- Musleh Alsulami
  - Information Systems Department, Umm Al-Qura University, Makkah 21961, Saudi Arabia
- Deafallah Alsadie
  - Information Systems Department, Umm Al-Qura University, Makkah 21961, Saudi Arabia
- Mohammed AlKhathami
  - Information Systems Department, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia

5. Esmaeili P, Roshanravan N, Mousavi S, Ghaffari S, Mesri Alamdari N, Asghari-Jafarabadi M. Machine learning framework for atherosclerotic cardiovascular disease risk assessment. J Diabetes Metab Disord 2023; 22:423-430. [PMID: 37255822 PMCID: PMC10225383 DOI: 10.1007/s40200-022-01160-7] [Received: 06/15/2022] [Accepted: 11/20/2022]
Abstract
Introduction: Atherosclerotic cardiovascular disease (ASCVD) is the leading cause of mortality globally. This study aimed to identify individual risk factors for ASCVD using machine learning (ML) approaches.
Materials & methods: This cohort-based cross-sectional study used data from 500 participants with ASCVD among employees of Tabriz University of Medical Sciences in 2020. ML models, namely naive Bayes (NB), support vector machines (SVM), regression tree (RT), k-nearest neighbors (KNN), artificial neural networks (ANN), generalized additive models (GAM), and logistic regression (LR), were developed and validated to predict ASCVD risk.
Results: Model accuracy ranged from 95.7 to 98.1%, with a sensitivity of 50.0 to 97.3%, specificity of 74.3 to 99.1%, positive predictive value (PPV) of 0.0 to 98.0%, negative predictive value (NPV) of 68.4 to 100.0%, positive likelihood ratio (LR +) of 13.8 to 96.4%, negative likelihood ratio (LR-) of 3.6 to 51.9%, and area under the ROC curve (AUC) of 62.5 to 99.4%. The ANN fit the data best, with an accuracy of 98.1% (95% CI: 96.5-99.1), a specificity of 99.1% (95% CI: 97.7-99.9), an LR + of 96.4% (95% CI: 36.2-258.8), and an AUC of 99.4% (95% CI: 85.2-97.0). Based on the optimal model, sex (female), age, smoking, and metabolic syndrome were the most important risk factors for ASCVD.
Conclusion: Sex (female), age, smoking, and metabolic syndrome were the predictors identified by the ANN. With the ANN identified as the optimal model, more accurate prevention planning may be designed.
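The metrics reported in this entry, and in entries 2 and 4, all derive from the four cells of a binary confusion matrix. The sketch below computes them from hypothetical counts (TP = 45, FN = 5, TN = 90, FP = 10, not taken from any of the studies). Note that the likelihood ratios, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, are unitless ratios, and that the miss rate (false negative rate) equals 1 - sensitivity. The function name `diagnostic_metrics` is illustrative.

```python
import math
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),             # precision
        "npv": tn / (tn + fn),
        "fpr": 1.0 - specificity,          # false positive rate
        "miss_rate": 1.0 - sensitivity,    # false negative rate
        # Likelihood ratios are unitless; guard against division by zero.
        "lr_plus": sensitivity / (1.0 - specificity) if specificity < 1.0 else math.inf,
        "lr_minus": (1.0 - sensitivity) / specificity if specificity > 0.0 else math.inf,
    }


metrics = diagnostic_metrics(tp=45, fp=10, tn=90, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the test is 90% sensitive and 90% specific, so LR+ is 9 (a positive result multiplies the pre-test odds of disease by 9) and LR- is about 0.11.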
Affiliation(s)
- Parya Esmaeili
  - Liver and Gastrointestinal Diseases Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
  - Department of Epidemiology and Biostatistics, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Neda Roshanravan
  - Cardiovascular Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Saeid Mousavi
  - Department of Epidemiology and Biostatistics, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Samad Ghaffari
  - Cardiovascular Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Mohammad Asghari-Jafarabadi
  - Department of Epidemiology and Biostatistics, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
  - Road Traffic Injury Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
  - Cabrini Research, Cabrini Health, 154 Wattletree Rd, Malvern, VIC 3144, Australia
  - School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, VIC 3800, Australia

6. Yan K, Li T, Marques JAL, Gao J, Fong SJ. A review on multimodal machine learning in medical diagnostics. Math Biosci Eng 2023; 20:8708-8726. [PMID: 37161218 DOI: 10.3934/mbe.2023382]
Abstract
The growing volume of medical diagnostic and clinical data gives doctors more complementary references when diagnosing patients. For example, machine learning algorithms can use electrocardiography (ECG) data to identify and diagnose heart disease, reducing doctors' workload. However, in practice ECG data are always subject to various kinds of noise and interference, and diagnosis based only on one-dimensional ECG data is not sufficiently reliable. Extracting new features from other types of medical data enables enhanced recognition methods, known as multimodal learning. Multimodal learning helps models process data from a range of sources, removes the need to train each modality separately, and improves model robustness through data diversity. A growing number of recent articles investigate how to extract data from different sources and build accurate multimodal machine learning or deep learning models for medical diagnostics. This paper reviews and summarizes several recent papers that deal with multimodal machine learning in disease detection and identifies topics for future research.
Affiliation(s)
- Keyue Yan
  - Department of Computer and Information Science, University of Macau, Macau SAR, China
- Tengyue Li
  - Department of Computer and Information Science, University of Macau, Macau SAR, China
- Juntao Gao
  - Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Simon James Fong
  - Department of Computer and Information Science, University of Macau, Macau SAR, China
  - Institute of Artificial Intelligence, Chongqing Technology and Business University, Chongqing, China

7. Yang J, Yee PL, Khan AA, Karamti H, Eldin ET, Aldweesh A, Jery AE, Hussain L, Omar A. Intelligent lung cancer MRI prediction analysis based on cluster prominence and posterior probabilities utilizing intelligent Bayesian methods on extracted gray-level co-occurrence (GLCM) features. Digit Health 2023; 9:20552076231172632. [PMID: 37256015 PMCID: PMC10226179 DOI: 10.1177/20552076231172632] [Received: 04/11/2023] [Accepted: 04/12/2023]
Abstract
Lung cancer is the second most commonly diagnosed cancer and causes millions of deaths worldwide. Developing automated tools that improve its prediction remains challenging. This study performs a detailed posterior-probability analysis to unfold the network associations among gray-level co-occurrence matrix (GLCM) features. We then ranked the features using a t-test and selected Cluster Prominence as the target node. Associations and arcs were analyzed using mutual information, and the occurrence and reliability of selected cluster states were computed. The Cluster Prominence at state ≤330.85 yielded an ROC index of 100%, relative Gini index of 99.98%, and relative Gini index of 100%. The proposed method further unfolds the dynamics of the computed GLCM features and supports their detailed analysis, giving a better understanding of the hidden dynamics for the diagnosis and prognosis of lung cancer.
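Entries 7 and 8 both build on gray-level co-occurrence matrix (GLCM) features. A GLCM counts how often gray level i occurs next to gray level j at a fixed pixel offset; after normalization to probabilities p(i, j), texture features are scalar summaries of that matrix. The sketch below computes a symmetric GLCM for a horizontal offset of one pixel and two of its features: Energy, taken here as the angular second moment Σ p(i, j)² (some libraries report its square root instead), and Cluster Prominence, Σ (i + j - μx - μy)⁴ p(i, j). The tiny image, the offset, and the function names are illustrative assumptions; the quantization levels, offsets, and angles used in the studies are not given in the abstracts.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def glcm(image: Sequence[Sequence[int]], levels: int,
         dx: int = 1, dy: int = 0) -> Matrix:
    """Symmetric GLCM for pixel offset (dx, dy), normalised to sum to 1."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # reference -> neighbour
                counts[j][i] += 1  # neighbour -> reference (symmetry)
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def energy(p: Matrix) -> float:
    """Angular second moment: sum of squared co-occurrence probabilities."""
    return sum(v * v for row in p for v in row)


def cluster_prominence(p: Matrix) -> float:
    """Sum over (i, j) of (i + j - mu_x - mu_y)**4 * p(i, j)."""
    mu_x = sum(i * v for i, row in enumerate(p) for v in row)
    mu_y = sum(j * v for row in p for j, v in enumerate(row))
    return sum((i + j - mu_x - mu_y) ** 4 * v
               for i, row in enumerate(p) for j, v in enumerate(row))


# Hypothetical 4x4 image quantised to 4 gray levels.
sample_image = [[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]]
p = glcm(sample_image, levels=4)
print(f"energy={energy(p):.4f}, cluster prominence={cluster_prominence(p):.4f}")
```

Energy is high for uniform textures dominated by a few gray-level pairs, while Cluster Prominence, a fourth-moment statistic, grows when the GLCM mass is asymmetric or spread far from its mean.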
Affiliation(s)
- Jing Yang
  - Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia
- Por Lip Yee
  - Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia
- Abdullah Ayub Khan
  - Department of Computer Science and Information Technology, Benazir Bhutto Shaheed University Lyari, Karachi, Pakistan
- Hanen Karamti
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Elsayed Tag Eldin
  - Faculty of Engineering and Technology, Future University in Egypt, New Cairo, Cairo, Egypt
- Amjad Aldweesh
  - College of Computer Science and Information Technology, Shaqra University, Shaqra, Saudi Arabia
- Atef El Jery
  - Department of Chemical Engineering, College of Engineering, King Khalid University, Abha, Saudi Arabia
  - National Engineering School of Gabes, Gabes University, Zrig Gabes, Tunisia
- Lal Hussain
  - Department of Computer Science and Information Technology, King Abdullah Campus Chatter Kalas, University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
  - Department of Computer Science and Information Technology, University of Azad Jammu and Kashmir, Athmuqam, Azad Kashmir, Pakistan
- Abdulfattah Omar
  - Department of English, College of Science & Humanities, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia

8. Hussain L, Malibari AA, Alzahrani JS, Alamgeer M, Obayya M, Al-Wesabi FN, Mohsen H, Hamza MA. Bayesian dynamic profiling and optimization of important ranked energy from gray level co-occurrence (GLCM) features for empirical analysis of brain MRI. Sci Rep 2022; 12:15389. [PMID: 36100621 PMCID: PMC9470580 DOI: 10.1038/s41598-022-19563-0] [Received: 02/16/2022] [Accepted: 08/31/2022]
Abstract
Accurate classification of brain tumor subtypes is important for prognosis and treatment. Researchers are developing tools based on static and dynamic feature extraction using machine learning and deep learning. However, static features require further analysis to compute the relevance, strength, and types of association among them. Recently, Bayesian inference approaches have gained traction for deeper analysis of static (hand-crafted) features, unfolding hidden dynamics and relationships among features. We computed gray-level co-occurrence matrix (GLCM) features from meningioma and pituitary brain tumor MRIs and then ranked them using entropy methods. The highly ranked Energy feature was chosen as the target variable for further empirical analysis, using dynamic profiling and optimization to unfold the nonlinear intrinsic dynamics of the GLCM features extracted from brain MRIs. The proposed method supports a detailed analysis of the computed GLCM features and their hidden dynamics, giving a better understanding that can aid the diagnosis and prognosis of tumor types leading to brain stroke.

9. de Moura LV, Mattjie C, Dartora CM, Barros RC, Marques da Silva AM. Explainable Machine Learning for COVID-19 Pneumonia Classification With Texture-Based Features Extraction in Chest Radiography. Front Digit Health 2022; 3:662343. [PMID: 35112097 PMCID: PMC8801500 DOI: 10.3389/fdgth.2021.662343] [Received: 02/01/2021] [Accepted: 11/29/2021]
Abstract
Both reverse transcription-PCR (RT-PCR) and chest X-rays are used to diagnose coronavirus disease 2019 (COVID-19). However, COVID-19 pneumonia does not have a defined set of radiological findings. Our work investigates radiomic features and classification models to differentiate chest X-ray images of COVID-19 pneumonia from other lung patterns. The goal is to provide grounds for understanding the distinctive radiographic texture features of COVID-19 using supervised tree-based ensemble machine learning methods, interpreted through the Shapley Additive Explanations (SHAP) approach. We use 2,611 COVID-19 chest X-ray images and 2,611 non-COVID-19 chest X-rays. After the lungs are segmented and divided into three zones on each side (left and right), histogram normalization is applied and radiomic features are extracted. SHAP recursive feature elimination with cross-validation is used to select features. The hyperparameters of the XGBoost and Random Forest ensemble tree models are optimized using random search. The best classification model was XGBoost, with an accuracy of 0.82 and a sensitivity of 0.82. The explainable model highlighted the importance of the middle left and superior right lung zones in distinguishing COVID-19 pneumonia from other lung patterns.
Affiliation(s)
- Luís Vinícius de Moura
  - Medical Image Computing Laboratory, School of Technology, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil
- Christian Mattjie
  - Medical Image Computing Laboratory, School of Technology, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil
  - Graduate Program in Biomedical Gerontology, School of Medicine, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil
- Caroline Machado Dartora
  - Medical Image Computing Laboratory, School of Technology, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil
  - Graduate Program in Biomedical Gerontology, School of Medicine, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil
- Rodrigo C. Barros
  - Machine Learning Theory and Applications Lab, School of Technology, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil
- Ana Maria Marques da Silva
  - Medical Image Computing Laboratory, School of Technology, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil
  - Graduate Program in Biomedical Gerontology, School of Medicine, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil

10. Simultaneous Feature Selection and Support Vector Machine Optimization Using an Enhanced Chimp Optimization Algorithm. Algorithms 2021. [DOI: 10.3390/a14100282]
Abstract
The Chimp Optimization Algorithm (ChOA) is a meta-heuristic algorithm proposed in recent years. It divides the population into four different levels for the purpose of hunting. However, it still has defects that can cause it to fall into local optima. To overcome these defects, this paper develops an Enhanced Chimp Optimization Algorithm (EChOA). First, Highly Disruptive Polynomial Mutation (HDPM) is introduced to further explore the population space and increase population diversity. Then, Spearman's rank correlation coefficient between the chimps with the highest and lowest fitness is calculated. To avoid local optima, the Beetle Antenna Search (BAS) algorithm is applied to chimps with low fitness values to give them visual ability. Together, these three strategies enhance the population's exploration and exploitation abilities. On this basis, the paper proposes an EChOA-SVM model that optimizes the SVM parameters while selecting features, so that maximum classification accuracy can be achieved with as few features as possible. To verify its effectiveness, the proposed method is compared with seven common methods, including the original algorithm, on seventeen benchmark datasets from the UCI Machine Learning Repository, evaluating accuracy, number of features, and fitness. Experimental results show that the proposed method achieves better classification accuracy than the other methods on most datasets and requires fewer features than the other algorithms.
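EChOA computes Spearman's rank correlation coefficient between the chimps with the highest and lowest fitness. Spearman's ρ is the Pearson correlation of the two vectors' ranks, with tied values receiving their average rank. The sketch below implements that definition in pure Python and applies it to two hypothetical position vectors; how EChOA acts on the coefficient afterwards is not described in the abstract and is not modelled here, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical 5-dimensional positions of the best and worst chimps.
best = [0.12, 0.85, 0.40, 0.33, 0.97]
worst = [0.20, 0.70, 0.55, 0.10, 0.90]
print(round(spearman_rho(best, worst), 3))
```

For these vectors only two ranks swap, so ρ = 1 - 6 × 2 / (5 × 24) = 0.9; a value near 1 would indicate that the best and worst individuals occupy similarly ordered positions, a possible signal of low population diversity.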