1. Salih AM, Galazzo IB, Gkontra P, Rauseo E, Lee AM, Lekadir K, Radeva P, Petersen SE, Menegaz G. A review of evaluation approaches for explainable AI with applications in cardiology. Artif Intell Rev 2024; 57:240. [PMID: 39132011 PMCID: PMC11315784 DOI: 10.1007/s10462-024-10852-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/03/2024] [Indexed: 08/13/2024]
Abstract
Explainable artificial intelligence (XAI) elucidates the decision-making process of complex AI models and is important in building trust in model predictions. XAI explanations themselves require evaluation for accuracy and reasonableness, and in the context of use of the underlying AI model. This review details the evaluation of XAI in cardiac AI applications and has found that, of the studies examined, 37% evaluated XAI quality using literature results, 11% used clinicians as domain experts, 11% used proxies or statistical analysis, with the remaining 43% not assessing the XAI used at all. We aim to inspire additional studies within healthcare, urging researchers not only to apply XAI methods but to systematically assess the resulting explanations, as a step towards developing trustworthy and safe models. Supplementary Information: The online version contains supplementary material available at 10.1007/s10462-024-10852-w.
Affiliation(s)
- Ahmed M. Salih
  - William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
  - Department of Population Health Sciences, University of Leicester, University Rd, Leicester, LE1 7RH UK
  - Department of Computer Science, University of Zakho, Duhok road, Zakho, Kurdistan Iraq
- Ilaria Boscolo Galazzo
  - Department of Engineering for Innovative Medicine, University of Verona, S. Francesco, 22, 37129 Verona, Italy
- Polyxeni Gkontra
  - Artificial Intelligence in Medicine Lab (BCN-AIM), Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain
- Elisa Rauseo
  - William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
- Aaron Mark Lee
  - William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
- Karim Lekadir
  - Artificial Intelligence in Medicine Lab (BCN-AIM), Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, Barcelona, Spain
- Petia Radeva
  - Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain
- Steffen E. Petersen
  - William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
  - Barts Heart Centre, St Bartholomew’s Hospital, Barts Health NHS Trust, West Smithfield, London, UK
  - Health Data Research, London, UK
  - Alan Turing Institute, London, UK
- Gloria Menegaz
  - Department of Engineering for Innovative Medicine, University of Verona, S. Francesco, 22, 37129 Verona, Italy
2. Bu ZJ, Jiang N, Li KC, Lu ZL, Zhang N, Yan SS, Chen ZL, Hao YH, Zhang YH, Xu RB, Chi HW, Chen ZY, Liu JP, Wang D, Xu F, Liu ZL. Development and Validation of an Interpretable Machine Learning Model for Early Prognosis Prediction in ICU Patients with Malignant Tumors and Hyperkalemia. Medicine (Baltimore) 2024; 103:e38747. [PMID: 39058887 PMCID: PMC11272258 DOI: 10.1097/md.0000000000038747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 06/07/2024] [Indexed: 07/28/2024]
Abstract
This study aims to develop and validate a machine learning (ML) predictive model for assessing mortality in patients with malignant tumors and hyperkalemia (MTH). We extracted data on patients with MTH from the Medical Information Mart for Intensive Care-IV, version 2.2 (MIMIC-IV v2.2) database. The dataset was split into a training set (75%) and a validation set (25%). We used Least Absolute Shrinkage and Selection Operator (LASSO) regression to identify potential predictors, which included clinical laboratory indicators and vital signs. Pearson correlation analysis tested the correlation between predictors. In-hospital death was the prediction target. The Area Under the Curve (AUC) and accuracy of 7 ML algorithms on the training and validation sets were compared, and the best-performing one was selected to develop the model. The calibration curve was used to further evaluate the prediction accuracy of the model. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were used to enhance model interpretability. A total of 496 patients with MTH in the Intensive Care Unit (ICU) were included. After screening, 17 clinical features were included in the construction of the ML model; the Pearson correlation coefficients between them were <0.8, indicating no strong correlation between the clinical features. eXtreme Gradient Boosting (XGBoost) outperformed the other algorithms, achieving perfect scores in the training set (accuracy: 1.000, AUC: 1.000) and high scores in the validation set (accuracy: 0.734, AUC: 0.733). The calibration curves indicated good predictive calibration of the model. SHAP analysis identified the top 8 predictive factors: urine output, mean heart rate, maximum urea nitrogen, minimum oxygen saturation, minimum mean blood pressure, maximum total bilirubin, mean respiratory rate, and minimum pH. In addition, SHAP and LIME were used for in-depth analyses of individual cases.
This study demonstrates the effectiveness of ML methods in predicting mortality risk in ICU patients with MTH. It highlights the importance of predictors like urine output and mean heart rate. SHAP and LIME significantly enhanced the model's interpretability.
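The correlation screen described in this abstract (pairwise Pearson coefficients checked against 0.8) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the toy values, and the choice to compare the absolute value of r against the threshold are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy values for illustration only; not taken from MIMIC-IV.
toy_features = {
    "urine_output": [1200, 800, 1500, 400, 950, 1100],
    "mean_heart_rate": [88, 92, 110, 85, 79, 101],
    "max_urea_nitrogen": [18, 25, 45, 22, 30, 16],
}
flagged_pairs = highly_correlated_pairs(toy_features)
```

An empty `flagged_pairs` list would correspond to the situation the abstract reports, where no retained pair reaches the 0.8 threshold.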
Affiliation(s)
- Zhi-Jun Bu
  - Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Nan Jiang
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - The Third Affiliated Hospital, Beijing University of Chinese Medicine, Beijing, China
- Ke-Cheng Li
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - Department of Andrology, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Zhi-Lin Lu
  - First Clinical College, Hubei University of Chinese Medicine, Wuhan, China
- Nan Zhang
  - School of International Studies, University of International Business and Economics, Beijing, China
- Shao-Shuai Yan
  - Department of Thyropathy, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Zhi-Lin Chen
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Yu-Han Hao
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Yu-Huan Zhang
  - School of Acupuncture and Orthopedics, Hubei University of Chinese Medicine, Wuhan, China
- Run-Bing Xu
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - Department of Hematology and Oncology, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Han-Wei Chi
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Zu-Yi Chen
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Jian-Ping Liu
  - Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Dan Wang
  - Surgery of Thyroid Gland and Breast, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
  - Hubei Shizhen Laboratory, Wuhan, China
- Feng Xu
  - The Third Affiliated Hospital, Beijing University of Chinese Medicine, Beijing, China
- Zhao-Lan Liu
  - Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
3. Jin J, Lu J, Su X, Xiong Y, Ma S, Kong Y, Xu H. Development and Validation of an ICU-Venous Thromboembolism Prediction Model Using Machine Learning Approaches: A Multicenter Study. Int J Gen Med 2024; 17:3279-3292. [PMID: 39070227 PMCID: PMC11283785 DOI: 10.2147/ijgm.s467374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 07/12/2024] [Indexed: 07/30/2024]
Abstract
Purpose: The purpose of this study was to establish and validate machine learning-based models for predicting the risk of venous thromboembolism (VTE) in intensive care unit (ICU) patients. Patients and Methods: The clinical data of 1494 ICU patients who underwent Doppler ultrasonography or venography between December 2020 and March 2023 were extracted from three tertiary hospitals. The Boruta algorithm was used to screen the essential variables associated with VTE. Five machine learning algorithms were employed: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), and Logistic Regression (LR). Hyperparameters of the predictive models were optimized on the training dataset. Performance in the validation dataset was measured using indicators including the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, specificity, and F1 score. Finally, the optimal model was interpreted using the SHapley Additive exPlanation (SHAP) package. Results: The incidence of VTE among the ICU patients in this study was 26.04%. We screened 19 crucial features for the risk prediction model development. Among the five models, the RF model performed best, with an AUC of 0.788 (95% CI: 0.738-0.838), an accuracy of 0.759 (95% CI: 0.709-0.809), a sensitivity of 0.633, and a Brier score of 0.166. Conclusion: A machine learning-based model for prediction of VTE in ICU patients was successfully developed, which could assist clinical medical staff in identifying high-risk populations for VTE in the early stages so that prevention measures can be implemented to reduce the burden on ICU patients.
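The threshold-based indicators named in this abstract (accuracy, sensitivity, specificity, F1 score) and the Brier score follow directly from predicted probabilities and true labels. The sketch below shows the arithmetic in plain Python under stated assumptions: the 0.5 decision threshold, the function name, and the toy labels and probabilities are hypothetical and are not drawn from the study.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Compute common binary-classification metrics from predicted probabilities.

    Assumes both classes occur in y_true and at least one case is predicted
    positive; otherwise some denominators would be zero.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / len(y_true)
    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "brier": brier,
    }


# Hypothetical toy labels (1 = VTE) and predicted probabilities.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_probs = [0.9, 0.2, 0.4, 0.1, 0.6, 0.8, 0.3, 0.2]
toy_metrics = classification_metrics(toy_labels, toy_probs)
```

Unlike the other four metrics, the Brier score does not depend on the threshold, which is why it is often reported alongside them as a measure of how well calibrated the probabilities are (lower is better).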
Affiliation(s)
- Jie Jin
  - School of Nursing, Binzhou Medical University, Binzhou, People’s Republic of China
- Jie Lu
  - School of Nursing, Binzhou Medical University, Binzhou, People’s Republic of China
- Xinyang Su
  - Department of Spine Surgery, Binzhou Medical University Hospital, Binzhou, People’s Republic of China
- Yinhuan Xiong
  - Department of Nursing, Binzhou People’s Hospital, Binzhou, People’s Republic of China
- Shasha Ma
  - Department of Neurosurgery, Binzhou Medical University Hospital, Binzhou, People’s Republic of China
- Yang Kong
  - School of Health Management, Binzhou Medical University, Yantai, People’s Republic of China
- Hongmei Xu
  - School of Nursing, Binzhou Medical University, Binzhou, People’s Republic of China
4. Ketabi M, Andishgar A, Fereidouni Z, Sani MM, Abdollahi A, Vali M, Alkamel A, Tabrizi R. Predicting the risk of mortality and rehospitalization in heart failure patients: A retrospective cohort study by machine learning approach. Clin Cardiol 2024; 47:e24239. [PMID: 38402566 PMCID: PMC10894620 DOI: 10.1002/clc.24239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 01/17/2024] [Accepted: 02/09/2024] [Indexed: 02/26/2024]
Abstract
BACKGROUND Heart failure (HF) is a global problem, affecting more than 26 million people worldwide. This study evaluated the performance of 10 machine learning (ML) algorithms and chose the best algorithm to predict mortality and readmission of HF patients using the Fasa Registry on Systolic HF (FaRSH) database. HYPOTHESIS ML algorithms may better identify patients at increased risk of HF readmission or death with demographic and clinical data. METHODS Through comprehensive evaluation, the best-performing model was used for prediction. Finally, all the trained models were applied to the test data, which included 20% of the total data. For the final evaluation and comparison of the models, five metrics were used: accuracy, F1-score, sensitivity, specificity and Area Under Curve (AUC). RESULTS Ten ML algorithms were evaluated. The CatBoost (CAT) algorithm, which uses a series of decision tree models to create a nonlinear model, performed the best of the 10 models studied. Across the three final outcomes of this study, which involved 2488 participants, 366 (14.7%) of the patients were readmitted to the hospital, 97 (3.9%) of the patients died within 1 month of follow-up, and 342 (13.7%) of the patients died within 1 year of follow-up. The most significant variables for predicting the events were length of stay in the hospital, hemoglobin level, and family history of MI. CONCLUSIONS The ML-based risk stratification tool was able to assess the risk of 5-year all-cause mortality and readmission in patients with HF. ML could provide an explicit explanation of individualized risk prediction and give physicians an intuitive understanding of the influence of critical features in the model.
Affiliation(s)
- Marzieh Ketabi
  - Student Research Committee, Fasa University of Medical Sciences, Fasa, Iran
- Zhila Fereidouni
  - Department of Medical Surgical Nursing, Fasa University of Medical Science, Fars, Iran
- Ashkan Abdollahi
  - School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Mohebat Vali
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- Abdulhakim Alkamel
  - Noncommunicable Diseases Research Center, Fasa University of Medical Science, Fasa, Iran
- Reza Tabrizi
  - Noncommunicable Diseases Research Center, Fasa University of Medical Science, Fasa, Iran
  - Clinical Research Development Unit, Fasa University of Medical Sciences, Fasa, Iran
5. Huang G, Jin Q, Mao Y. Predicting the 5-Year Risk of Nonalcoholic Fatty Liver Disease Using Machine Learning Models: Prospective Cohort Study. J Med Internet Res 2023; 25:e46891. [PMID: 37698911 PMCID: PMC10523217 DOI: 10.2196/46891] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/01/2023] [Revised: 08/02/2023] [Accepted: 08/16/2023] [Indexed: 09/13/2023]
Abstract
BACKGROUND Nonalcoholic fatty liver disease (NAFLD) has emerged as a worldwide public health issue. Identifying and targeting populations at a heightened risk of developing NAFLD over a 5-year period can help reduce and delay adverse hepatic prognostic events. OBJECTIVE This study aimed to investigate the 5-year incidence of NAFLD in the Chinese population. It also aimed to establish and validate a machine learning model for predicting the 5-year NAFLD risk. METHODS The study population was derived from a 5-year prospective cohort study. A total of 6196 individuals without NAFLD who underwent health checkups in 2010 at Zhenhai Lianhua Hospital in Ningbo, China, were enrolled in this study. Extreme gradient boosting (XGBoost)-recursive feature elimination, combined with the least absolute shrinkage and selection operator (LASSO), was used to screen for characteristic predictors. A total of 6 machine learning models, namely logistic regression, decision tree, support vector machine, random forest, categorical boosting, and XGBoost, were utilized in the construction of a 5-year risk model for NAFLD. Hyperparameter optimization of the predictive model was performed in the training set, and a further evaluation of the model performance was carried out in the internal and external validation sets. RESULTS The 5-year incidence of NAFLD was 18.64% (n=1155) in the study population. We screened 11 predictors for risk prediction model construction. After the hyperparameter optimization, CatBoost demonstrated the best prediction performance in the training set, with an area under the receiver operating characteristic (AUROC) curve of 0.810 (95% CI 0.768-0.852). Logistic regression showed the best prediction performance in the internal and external validation sets, with AUROC curves of 0.778 (95% CI 0.759-0.794) and 0.806 (95% CI 0.788-0.821), respectively. The development of web-based calculators has enhanced the clinical feasibility of the risk prediction model. 
CONCLUSIONS Developing and validating machine learning models can aid in predicting which populations are at the highest risk of developing NAFLD over a 5-year period, thereby helping delay and reduce the occurrence of adverse liver prognostic events.
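The AUROC values with 95% confidence intervals quoted in this abstract can be illustrated with a short plain-Python sketch. The abstract does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption chosen for illustration, as are the function names, the number of resamples, the seed, and the toy data.

```python
import random


def auroc(y_true, y_score):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_ci(y_true, y_score, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(y_true)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        # Skip resamples that contain only one class; AUROC is undefined there.
        if 0 < sum(labels) < n:
            estimates.append(auroc(labels, [y_score[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical toy labels (1 = developed NAFLD) and model scores.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.3, 0.6]
toy_auc = auroc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores)
```

The pairwise definition is quadratic in sample size, which is fine for a toy example; for cohorts of the size reported here a rank-based implementation from an established statistics library would be the more practical choice.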
Affiliation(s)
- Guoqing Huang
  - Department of Endocrinology, The First Affiliated Hospital of Ningbo University, Ningbo, China
  - Health Science Center, Ningbo University, Ningbo, China
- Qiankai Jin
  - Department of Endocrinology, The First Affiliated Hospital of Ningbo University, Ningbo, China
  - Health Science Center, Ningbo University, Ningbo, China
- Yushan Mao
  - Department of Endocrinology, The First Affiliated Hospital of Ningbo University, Ningbo, China
6. Khan MS, Arshad MS, Greene SJ, Van Spall HGC, Pandey A, Vemulapalli S, Perakslis E, Butler J. Artificial intelligence and heart failure: A state-of-the-art review. Eur J Heart Fail 2023; 25:1507-1525. [PMID: 37560778 DOI: 10.1002/ejhf.2994] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/14/2022] [Revised: 08/06/2023] [Accepted: 08/08/2023] [Indexed: 08/11/2023]
Abstract
Heart failure (HF) is a heterogeneous syndrome affecting more than 60 million individuals globally. Despite recent advancements in understanding of the pathophysiology of HF, many issues remain including residual risk despite therapy, understanding the pathophysiology and phenotypes of patients with HF and preserved ejection fraction, and the challenges related to integrating a large amount of disparate information available for risk stratification and management of these patients. Risk prediction algorithms based on artificial intelligence (AI) may have superior predictive ability compared to traditional methods in certain instances. AI algorithms can play a pivotal role in the evolution of HF care by facilitating clinical decision making to overcome various challenges such as allocation of treatment to patients who are at highest risk or are more likely to benefit from therapies, prediction of adverse outcomes, and early identification of patients with subclinical disease or worsening HF. With the ability to integrate and synthesize large amounts of data with multidimensional interactions, AI algorithms can supply information with which physicians can improve their ability to make timely and better decisions. In this review, we provide an overview of the AI algorithms that have been developed for establishing early diagnosis of HF, phenotyping HF with preserved ejection fraction, and stratifying HF disease severity. This review also discusses the challenges in clinical deployment of AI algorithms in HF, and the potential path forward for developing future novel learning-based algorithms to improve HF care.
Affiliation(s)
- Stephen J Greene
  - Division of Cardiology, Duke University School of Medicine, Durham, NC, USA
  - Duke Clinical Research Institute, Durham, NC, USA
- Harriette G C Van Spall
  - Department of Medicine and Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Ambarish Pandey
  - Canada Population Health Research Institute, Hamilton, ON, Canada
  - Division of Cardiology, Department of Internal Medicine, UT Southwestern Medical Center, Dallas, TX, USA
- Sreekanth Vemulapalli
  - Division of Cardiology, Duke University School of Medicine, Durham, NC, USA
  - Duke Clinical Research Institute, Durham, NC, USA
- Javed Butler
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
  - Baylor Scott and White Research Institute, Dallas, TX, USA