1
Salih AM, Galazzo IB, Gkontra P, Rauseo E, Lee AM, Lekadir K, Radeva P, Petersen SE, Menegaz G. A review of evaluation approaches for explainable AI with applications in cardiology. Artif Intell Rev 2024; 57:240. [PMID: 39132011 PMCID: PMC11315784 DOI: 10.1007/s10462-024-10852-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/03/2024] [Indexed: 08/13/2024]
Abstract
Explainable artificial intelligence (XAI) elucidates the decision-making process of complex AI models and is important in building trust in model predictions. XAI explanations themselves require evaluation as to accuracy and reasonableness and in the context of use of the underlying AI model. This review details the evaluation of XAI in cardiac AI applications and has found that, of the studies examined, 37% evaluated XAI quality using literature results, 11% used clinicians as domain-experts, 11% used proxies or statistical analysis, with the remaining 43% not assessing the XAI used at all. We aim to inspire additional studies within healthcare, urging researchers not only to apply XAI methods but to systematically assess the resulting explanations, as a step towards developing trustworthy and safe models. Supplementary Information: The online version contains supplementary material available at 10.1007/s10462-024-10852-w.
Affiliation(s)
- Ahmed M. Salih
  - William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
  - Department of Population Health Sciences, University of Leicester, University Rd, Leicester, LE1 7RH UK
  - Department of Computer Science, University of Zakho, Duhok road, Zakho, Kurdistan Iraq
- Ilaria Boscolo Galazzo
  - Department of Engineering for Innovative Medicine, University of Verona, S. Francesco, 22, 37129 Verona, Italy
- Polyxeni Gkontra
  - Artificial Intelligence in Medicine Lab (BCN-AIM), Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain
- Elisa Rauseo
  - William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
- Aaron Mark Lee
  - William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
- Karim Lekadir
  - Artificial Intelligence in Medicine Lab (BCN-AIM), Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, Barcelona, Spain
- Petia Radeva
  - Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain
- Steffen E. Petersen
  - William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
  - Barts Heart Centre, St Bartholomew’s Hospital, Barts Health NHS Trust, West Smithfield, London, UK
  - Health Data Research, London, UK
  - Alan Turing Institute, London, UK
- Gloria Menegaz
  - Department of Engineering for Innovative Medicine, University of Verona, S. Francesco, 22, 37129 Verona, Italy
2
Goettling M, Hammer A, Malberg H, Schmidt M. xECGArch: a trustworthy deep learning architecture for interpretable ECG analysis considering short-term and long-term features. Sci Rep 2024; 14:13122. [PMID: 38849417 PMCID: PMC11161651 DOI: 10.1038/s41598-024-63656-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 05/30/2024] [Indexed: 06/09/2024]
Abstract
Deep learning-based methods have demonstrated high classification performance in the detection of cardiovascular diseases from electrocardiograms (ECGs). However, their blackbox character and the associated lack of interpretability limit their clinical applicability. To overcome existing limitations, we present a novel deep learning architecture for interpretable ECG analysis (xECGArch). For the first time, short- and long-term features are analyzed by two independent convolutional neural networks (CNNs) and combined into an ensemble, which is extended by methods of explainable artificial intelligence (xAI) to whiten the blackbox. To demonstrate the trustworthiness of xECGArch, perturbation analysis was used to compare 13 different xAI methods. We parameterized xECGArch for atrial fibrillation (AF) detection using four public ECG databases (n = 9854 ECGs) and achieved an F1 score of 95.43% in AF versus non-AF classification on an unseen ECG test dataset. A systematic comparison of xAI methods showed that deep Taylor decomposition provided the most trustworthy explanations (+24% compared to the second-best approach). xECGArch can account for short- and long-term features corresponding to clinical features of morphology and rhythm, respectively. Further research will focus on the relationship between xECGArch features and clinical features, which may help in medical applications for diagnosis and therapy.
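The abstract above names perturbation analysis as the way the xAI methods were compared but does not spell out the procedure. As a rough illustration of the general idea only (not the authors' protocol), the sketch below masks the samples that an explanation ranks as most relevant and measures how far the model score drops; a larger drop suggests a more faithful explanation. The function `perturbation_drop`, the `toy_model` classifier, the masking fraction, and the fill value are all hypothetical choices made for this example.

```python
from typing import Callable, Sequence


def perturbation_drop(
    model: Callable[[Sequence[float]], float],
    signal: Sequence[float],
    relevance: Sequence[float],
    fraction: float = 0.2,
    fill_value: float = 0.0,
) -> float:
    """Return the drop in model score after masking the most relevant samples."""
    if len(signal) != len(relevance):
        raise ValueError("signal and relevance must have the same length")
    n_mask = max(1, round(fraction * len(signal)))
    # Rank sample indices by descending relevance; the first n_mask get perturbed.
    ranked = sorted(range(len(signal)), key=lambda i: relevance[i], reverse=True)
    masked = list(signal)
    for i in ranked[:n_mask]:
        masked[i] = fill_value
    return model(signal) - model(masked)


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a classifier: mean absolute amplitude of samples 2..4.
    return sum(abs(v) for v in x[2:5]) / 3.0


signal = [0.1, 0.0, 1.0, 2.0, 1.0, 0.0, 0.1, 0.0, 0.0, 0.0]
good_relevance = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]  # highlights the samples the model uses
poor_relevance = [1, 2, 0, 0, 0, 0, 0, 1, 0, 0]  # highlights samples the model ignores

drop_good = perturbation_drop(toy_model, signal, good_relevance, fraction=0.3)
drop_poor = perturbation_drop(toy_model, signal, poor_relevance, fraction=0.3)
print(drop_good > drop_poor)
```

In this toy setting the explanation that points at the samples the model actually uses produces the larger score drop, which is the kind of ranking signal a perturbation-based comparison relies on.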
Affiliation(s)
- Marc Goettling
  - Institute of Biomedical Engineering, TU Dresden, Fetscherstr. 29, 01307, Dresden, Germany
- Alexander Hammer
  - Institute of Biomedical Engineering, TU Dresden, Fetscherstr. 29, 01307, Dresden, Germany
- Hagen Malberg
  - Institute of Biomedical Engineering, TU Dresden, Fetscherstr. 29, 01307, Dresden, Germany
- Martin Schmidt
  - Institute of Biomedical Engineering, TU Dresden, Fetscherstr. 29, 01307, Dresden, Germany
3
Gupta U, Paluru N, Nankani D, Kulkarni K, Awasthi N. A comprehensive review on efficient artificial intelligence models for classification of abnormal cardiac rhythms using electrocardiograms. Heliyon 2024; 10:e26787. [PMID: 38562492 PMCID: PMC10982903 DOI: 10.1016/j.heliyon.2024.e26787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 02/20/2024] [Indexed: 04/04/2024]
Abstract
Deep learning has made many advances in data classification using electrocardiogram (ECG) waveforms. Over the past decade, data science research has focused on developing artificial intelligence (AI) based models that can analyze ECG waveforms to identify and classify abnormal cardiac rhythms accurately. However, the primary drawback of the current AI models is that most of these models are heavy, computationally intensive, and inefficient in terms of cost for real-time implementation. In this review, we first discuss the current state-of-the-art AI models utilized for ECG-based cardiac rhythm classification. Next, we present some of the upcoming modeling methodologies which have the potential to perform real-time implementation of AI-based heart rhythm diagnosis. These models hold significant promise in being lightweight and computationally efficient without compromising the accuracy. Contemporary models predominantly utilize 12-lead ECG for cardiac rhythm classification and cardiovascular status prediction, increasing the computational burden and making real-time implementation challenging. We also summarize research studies evaluating the potential of efficient data setups to reduce the number of ECG leads without affecting classification accuracy. Lastly, we present future perspectives on AI's utility in precision medicine by providing opportunities for accurate prediction and diagnostics of cardiovascular status in patients.
Affiliation(s)
- Utkarsh Gupta
  - Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, 560012, India
- Naveen Paluru
  - Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, 560012, India
- Deepankar Nankani
  - Department of Computer Science and Engineering, Indian Institute of Technology, Guwahati, Assam, 781039, India
- Kanchan Kulkarni
  - IHU-LIRYC, Heart Rhythm Disease Institute, Fondation Bordeaux Université, Pessac, Bordeaux, F-33000, France
  - University of Bordeaux, INSERM, Centre de recherche Cardio-Thoracique de Bordeaux, U1045, Bordeaux, F-33000, France
- Navchetan Awasthi
  - Faculty of Science, Mathematics and Computer Science, Informatics Institute, University of Amsterdam, Amsterdam, 1090 GH, the Netherlands
  - Department of Biomedical Engineering and Physics, Amsterdam UMC, Amsterdam, 1081 HV, the Netherlands
4
Ayano YM, Schwenker F, Dufera BD, Debelee TG. Interpretable Machine Learning Techniques in ECG-Based Heart Disease Classification: A Systematic Review. Diagnostics (Basel) 2022; 13:111. [PMID: 36611403 PMCID: PMC9818170 DOI: 10.3390/diagnostics13010111] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/05/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 12/31/2022]
Abstract
Heart disease is one of the leading causes of mortality throughout the world. Among the different heart diagnosis techniques, an electrocardiogram (ECG) is the least expensive non-invasive procedure. However, the following are challenges: the scarcity of medical experts, the complexity of ECG interpretations, the manifestation similarities of heart disease in ECG signals, and heart disease comorbidity. Machine learning algorithms are viable alternatives to the traditional diagnoses of heart disease from ECG signals. However, the black box nature of complex machine learning algorithms and the difficulty in explaining a model's outcomes are obstacles for medical practitioners in having confidence in machine learning models. This observation paves the way for interpretable machine learning (IML) models as diagnostic tools that can build a physician's trust and provide evidence-based diagnoses. Therefore, in this systematic literature review, we studied and analyzed the research landscape in interpretable machine learning techniques by focusing on heart disease diagnosis from an ECG signal. In this regard, the contribution of our work is manifold; first, we present an elaborate discussion on interpretable machine learning techniques. In addition, we identify and characterize ECG signal recording datasets that are readily available for machine learning-based tasks. Furthermore, we identify the progress that has been achieved in ECG signal interpretation using IML techniques. Finally, we discuss the limitations and challenges of IML techniques in interpreting ECG signals.
Affiliation(s)
- Bisrat Derebssa Dufera
  - Addis Ababa Institute of Technology, Addis Ababa University, Addis Ababa 11760, Ethiopia
- Taye Girma Debelee
  - Ethiopian Artificial Intelligence Institute, Addis Ababa 40782, Ethiopia
  - College of Electrical and Computer Engineering, Addis Ababa Science and Technology University, Addis Ababa 16417, Ethiopia
5
Jekova I, Christov I, Krasteva V. Atrioventricular Synchronization for Detection of Atrial Fibrillation and Flutter in One to Twelve ECG Leads Using a Dense Neural Network Classifier. Sensors (Basel, Switzerland) 2022; 22:6071. [PMID: 36015834 PMCID: PMC9413391 DOI: 10.3390/s22166071] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/27/2022] [Revised: 08/10/2022] [Accepted: 08/10/2022] [Indexed: 06/01/2023]
Abstract
This study investigates the use of atrioventricular (AV) synchronization as an important diagnostic criterion for atrial fibrillation and flutter (AF) using one to twelve ECG leads. Heart rate, lead-specific AV conduction time, and P-/f-wave amplitude were evaluated by three representative ECG metrics (mean value, standard deviation), namely RR-interval (RRi-mean, RRi-std), PQ-interval (PQi-mean, PQi-std), and PQ-amplitude (PQa-mean, PQa-std), in 71,545 standard 12-lead ECG records from the six largest PhysioNet CinC Challenge 2021 databases. Two rhythm classes were considered (AF, non-AF), randomly assigning records into training (70%), validation (20%), and test (10%) datasets. In a grid search of 19, 55, and 83 dense neural network (DenseNet) architectures and five independent training runs, we optimized models for one-lead, six-lead (chest or limb), and twelve-lead input features. Lead-set performance and SHapley Additive exPlanations (SHAP) input feature importance were evaluated on the test set. Optimal DenseNet architectures with the number of neurons in sequential [1st, 2nd, 3rd] hidden layers were assessed for sensitivity and specificity: DenseNet [16,16,0] with primary leads (I or II) had 87.9-88.3 and 90.5-91.5%; DenseNet [32,32,32] with six limb leads had 90.7 and 94.2%; DenseNet [32,32,4] with six chest leads had 92.1 and 93.2%; and DenseNet [128,8,8] with all 12 leads had 91.8 and 95.8%, indicating sensitivity and specificity values, respectively. Mean SHAP values on the entire test set highlighted the importance of RRi-mean (100%), RRi-std (84%), and atrial synchronization (40-60%) for the PQa-mean (aVR, I), PQi-std (V2, aVF, II), and PQi-mean (aVL, aVR). Our focus on finding the strongest AV synchronization predictors of AF in 12-lead ECGs would lead to a comprehensive understanding of the decision-making process in advanced neural network classifiers.
DenseNet self-learned to rely on a few ECG behavioral characteristics: first, characteristics usually associated with AF conduction such as rapid heart rate, enhanced heart rate variability, and large PQ-interval deviation in V2 and inferior leads (aVF, II); second, characteristics related to a typical P-wave pattern in sinus rhythm, which is best distinguished from AF by the earliest negative P-peak deflection of the right atrium in the lead (aVR) and late positive left atrial deflection in lateral leads (I, aVL). Our results on lead-selection and feature-selection practices for AF detection should be considered for one- to twelve-lead ECG signal processing settings, particularly those measuring heart rate, AV conduction times, and P-/f-wave amplitudes. Performances are limited to the AF diagnostic potential of these three metrics. SHAP value importance can be used in combination with a human expert's ECG interpretation to change the focus from a broad observation of 12-lead ECG morphology to focusing on the few AV synchronization findings strongly predictive of AF or non-AF arrhythmias. Our results are representative of AV synchronization findings across a broad taxonomy of cardiac arrhythmias in large 12-lead ECG databases.
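Two quantities in this abstract lend themselves to a short worked example: the RR-interval metrics (RRi-mean, RRi-std) and the sensitivity/specificity pair used to report performance. The sketch below is an illustration under stated assumptions, not the authors' pipeline: R-peak times are assumed to be already detected and given in seconds, the population standard deviation is an arbitrary choice (the abstract does not say which estimator was used), and the function names and example values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def rr_interval_metrics(r_peak_times_s: Sequence[float]) -> Tuple[float, float]:
    """Return (RRi-mean, RRi-std) in seconds from R-peak times given in seconds."""
    if len(r_peak_times_s) < 2:
        raise ValueError("at least two R peaks are needed to form one RR interval")
    # RR intervals are the differences between consecutive R-peak times.
    rr = [later - earlier for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])]
    # Assumption: population standard deviation; a sample estimator would also be plausible.
    return mean(rr), pstdev(rr)


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 = AF and 0 = non-AF."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical R-peak times: a regular rhythm and an irregular one.
regular_mean, regular_std = rr_interval_metrics([0.0, 0.8, 1.6, 2.4, 3.2])
irregular_mean, irregular_std = rr_interval_metrics([0.0, 0.5, 1.4, 1.8, 2.9])

# Hypothetical labels and predictions for eight records.
sens, spec = sensitivity_specificity(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 1, 0, 0, 0, 1, 0, 1],
)
print(regular_std < irregular_std, sens, spec)
```

The irregular sequence yields the larger RRi-std, consistent with the abstract's remark that enhanced heart rate variability is among the characteristics associated with AF.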