1
Wahid KA, Kaffey ZY, Farris DP, Humbert-Vidan L, Moreno AC, Rasmussen M, Ren J, Naser MA, Netherton TJ, Korreman S, Balakrishnan G, Fuller CD, Fuentes D, Dohopolski MJ. Artificial intelligence uncertainty quantification in radiotherapy applications - A scoping review. Radiother Oncol 2024; 201:110542. [PMID: 39299574 DOI: 10.1016/j.radonc.2024.110542] [Received: 05/11/2024] [Revised: 08/18/2024] [Accepted: 09/09/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND/PURPOSE The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. METHODS We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. RESULTS We identified 56 articles published from 2015 to 2024. 10 domains of RT applications were represented; most studies evaluated auto-contouring (50 %), followed by image-synthesis (13 %), and multiple applications simultaneously (11 %). 12 disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32 %). Imaging data was used in 91 % of studies, while only 13 % incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60 %), with Monte Carlo dropout being the most commonly implemented UQ method (32 %) followed by ensembling (16 %). 55 % of studies did not share code or datasets. CONCLUSION Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, we identified a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
Affiliation(s)
- Kareem A Wahid
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zaphanlene Y Kaffey
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David P Farris
  - Research Medical Library, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Laia Humbert-Vidan
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Amy C Moreno
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jintao Ren
  - Department of Oncology, Aarhus University Hospital, Denmark
- Mohamed A Naser
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tucker J Netherton
  - Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Stine Korreman
  - Department of Oncology, Aarhus University Hospital, Denmark
- Clifton D Fuller
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David Fuentes
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Michael J Dohopolski
  - Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, TX, USA
2
Harris C, Olshvang D, Chellappa R, Santhanam P. Obesity prediction: Novel machine learning insights into waist circumference accuracy. Diabetes Metab Syndr 2024; 18:103113. [PMID: 39243515 DOI: 10.1016/j.dsx.2024.103113] [Received: 04/24/2024] [Revised: 08/27/2024] [Accepted: 08/28/2024] [Indexed: 09/09/2024]
Abstract
AIMS This study aims to enhance the precision of obesity risk assessments by improving the accuracy of waist circumference predictions using machine learning techniques. METHODS We utilized data from the NHANES and Look AHEAD studies, applying machine learning algorithms augmented with uncertainty quantification. Our approach centered on conformal prediction techniques, which provide a methodological basis for generating prediction intervals that reflect uncertainty levels. This method allows for constructing intervals expected to contain the true waist circumference values with a high degree of probability. RESULTS The application of conformal predictions yielded high coverage rates, achieving 0.955 for men and 0.954 for women in the NHANES dataset. These rates surpassed the expected performance benchmarks and demonstrated robustness when applied to the Look AHEAD dataset, maintaining coverage rates of 0.951 for men and 0.952 for women. Traditional point prediction models did not show such high consistency or reliability. CONCLUSIONS The findings support the integration of waist circumference into standard clinical practice for obesity-related risk assessments using machine learning approaches.
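The conformal prediction idea mentioned in this abstract can be illustrated with a minimal split-conformal sketch in Python. This is not the authors' implementation: the toy linear predictor, the synthetic data, and the helper names (`conformal_quantile`, `predict_interval`, `simulate`) are hypothetical, and absolute residuals are assumed as the nonconformity score.

```python
import math
import random


def conformal_quantile(residuals, alpha):
    """Split-conformal quantile of absolute calibration residuals."""
    n = len(residuals)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return float("inf")
    return sorted(residuals)[k - 1]


def predict_interval(point_prediction, q):
    """Symmetric interval around a point prediction."""
    return point_prediction - q, point_prediction + q


def toy_model(x):
    # Hypothetical point predictor standing in for a trained regressor.
    return 2.0 * x + 1.0


def simulate(n_cal=500, n_test=2000, alpha=0.05, seed=0):
    """Calibrate on synthetic data and report empirical test coverage."""
    rng = random.Random(seed)

    def draw():
        x = rng.uniform(0.0, 10.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)  # synthetic noisy target
        return x, y

    calibration = [draw() for _ in range(n_cal)]
    residuals = [abs(y - toy_model(x)) for x, y in calibration]
    q = conformal_quantile(residuals, alpha)

    covered = 0
    for _ in range(n_test):
        x, y = draw()
        low, high = predict_interval(toy_model(x), q)
        covered += low <= y <= high
    return q, covered / n_test


if __name__ == "__main__":
    q, coverage = simulate()
    print(f"interval half-width: {q:.3f}, empirical coverage: {coverage:.3f}")
```

Under the exchangeability assumption, the empirical coverage should sit near or above the nominal 1 - alpha level, which is the kind of coverage rate the abstract reports.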
Affiliation(s)
- Carl Harris
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21287, USA
- Daniel Olshvang
  - Department of Electrical and Computer Engineering, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21287, USA
- Rama Chellappa
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21287, USA; Department of Electrical and Computer Engineering, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21287, USA
- Prasanna Santhanam
  - Division of Endocrinology, Diabetes, and Metabolism, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
3
Kobayashi K, Takamizawa Y, Miyake M, Ito S, Gu L, Nakatsuka T, Akagi Y, Harada T, Kanemitsu Y, Hamamoto R. Can physician judgment enhance model trustworthiness? A case study on predicting pathological lymph nodes in rectal cancer. Artif Intell Med 2024; 154:102929. [PMID: 38996696 DOI: 10.1016/j.artmed.2024.102929] [Received: 12/14/2023] [Revised: 06/24/2024] [Accepted: 07/02/2024] [Indexed: 07/14/2024]
Abstract
Explainability is key to enhancing the trustworthiness of artificial intelligence in medicine. However, there exists a significant gap between physicians' expectations for model explainability and the actual behavior of these models. This gap arises from the absence of a consensus on a physician-centered evaluation framework, which is needed to quantitatively assess the practical benefits that effective explainability should offer practitioners. Here, we hypothesize that superior attention maps, as a mechanism of model explanation, should align with the information that physicians focus on, potentially reducing prediction uncertainty and increasing model reliability. We employed a multimodal transformer to predict lymph node metastasis of rectal cancer using clinical data and magnetic resonance imaging. We explored how well attention maps, visualized through a state-of-the-art technique, can achieve agreement with physician understanding. Subsequently, we compared two distinct approaches for estimating uncertainty: a standalone estimation using only the variance of prediction probability, and a human-in-the-loop estimation that considers both the variance of prediction probability and the quantified agreement. Our findings revealed no significant advantage of the human-in-the-loop approach over the standalone one. In conclusion, this case study did not confirm the anticipated benefit of the explanation in enhancing model reliability. Superficial explanations could do more harm than good by misleading physicians into relying on uncertain predictions, suggesting that the current state of attention mechanisms should not be overestimated in the context of model explainability.
Affiliation(s)
- Kazuma Kobayashi
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan.
- Yasuyuki Takamizawa
  - Department of Colorectal Surgery, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan.
- Mototaka Miyake
  - Department of Diagnostic Radiology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan.
- Sono Ito
  - Department of Colorectal Surgery, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan.
- Lin Gu
  - Machine Intelligence for Medical Engineering Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan.
- Tatsuya Nakatsuka
  - Department of Applied Electronics, Graduate School of Advanced Engineering, Tokyo University of Science, 6-3-1 Niijuku, Katsushika-ku, Tokyo 125-8585, Japan.
- Yu Akagi
  - Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.
- Tatsuya Harada
  - Machine Intelligence for Medical Engineering Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan.
- Yukihide Kanemitsu
  - Department of Colorectal Surgery, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan.
- Ryuji Hamamoto
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan.
4
Loftus TJ, Balch JA, Abbott KL, Hu D, Ruppert MM, Shickel B, Ozrazgat-Baslanti T, Efron PA, Tighe PJ, Hogan WR, Rashidi P, Cardel MI, Upchurch GR, Bihorac A. Community-engaged artificial intelligence research: A scoping review. PLOS Digital Health 2024; 3:e0000561. [PMID: 39178307 PMCID: PMC11343451 DOI: 10.1371/journal.pdig.0000561] [Received: 08/08/2023] [Accepted: 06/27/2024] [Indexed: 08/25/2024]
Abstract
The degree to which artificial intelligence healthcare research is informed by data and stakeholders from community settings has not been previously described. As communities are the principal location of healthcare delivery, engaging them could represent an important opportunity to improve scientific quality. This scoping review systematically maps what is known and unknown about community-engaged artificial intelligence research and identifies opportunities to optimize the generalizability of these applications through involvement of community stakeholders and data throughout model development, validation, and implementation. Embase, PubMed, and MEDLINE databases were searched for articles describing artificial intelligence or machine learning healthcare applications with community involvement in model development, validation, or implementation. Model architecture and performance, the nature of community engagement, and barriers or facilitators to community engagement were reported according to PRISMA extension for Scoping Reviews guidelines. Of approximately 10,880 articles describing artificial intelligence healthcare applications, 21 (0.2%) described community involvement. All articles derived data from community settings, most commonly by leveraging existing datasets and sources that included community subjects, and often bolstered by internet-based data acquisition and subject recruitment. Only one article described inclusion of community stakeholders in designing an application: a natural language processing model that detected cases of likely child abuse with 90% accuracy using harmonized electronic health record notes from both hospital and community practice settings. The primary barrier to including community-derived data was small sample sizes, which may have affected 11 of the 21 studies (53%), introducing substantial risk for overfitting that threatens generalizability. Community engagement in artificial intelligence healthcare application development, validation, or implementation is rare. As healthcare delivery occurs primarily in community settings, investigators should consider engaging community stakeholders in user-centered design, usability, and clinical implementation studies to optimize generalizability.
Affiliation(s)
- Tyler J. Loftus
  - University of Florida Intelligent Clinical Care Center, Gainesville, Florida, United States of America
  - Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America
- Jeremy A. Balch
  - University of Florida Intelligent Clinical Care Center, Gainesville, Florida, United States of America
  - Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America
- Kenneth L. Abbott
  - Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America
- Die Hu
  - University of Florida Intelligent Clinical Care Center, Gainesville, Florida, United States of America
  - Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America
- Matthew M. Ruppert
  - University of Florida Intelligent Clinical Care Center, Gainesville, Florida, United States of America
  - Department of Medicine, University of Florida Health, Gainesville, Florida, United States of America
  - College of Medicine, University of Central Florida, Orlando, Florida, United States of America
- Benjamin Shickel
  - University of Florida Intelligent Clinical Care Center, Gainesville, Florida, United States of America
  - Department of Medicine, University of Florida Health, Gainesville, Florida, United States of America
- Tezcan Ozrazgat-Baslanti
  - University of Florida Intelligent Clinical Care Center, Gainesville, Florida, United States of America
  - Department of Medicine, University of Florida Health, Gainesville, Florida, United States of America
- Philip A. Efron
  - Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America
- Patrick J. Tighe
  - Departments of Anesthesiology, Orthopedics, and Information Systems/Operations Management, University of Florida Health, Gainesville, Florida, United States of America
- William R. Hogan
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States of America
- Parisa Rashidi
  - University of Florida Intelligent Clinical Care Center, Gainesville, Florida, United States of America
  - Departments of Biomedical Engineering, Computer and Information Science and Engineering, and Electrical and Computer Engineering, University of Florida, Gainesville, Florida, United States of America
- Michelle I. Cardel
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States of America
- Gilbert R. Upchurch
  - Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America
- Azra Bihorac
  - University of Florida Intelligent Clinical Care Center, Gainesville, Florida, United States of America
  - Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America
  - Department of Medicine, University of Florida Health, Gainesville, Florida, United States of America
5
Ghadimi DJ, Vahdani AM, Karimi H, Ebrahimi P, Fathi M, Moodi F, Habibzadeh A, Khodadadi Shoushtari F, Valizadeh G, Mobarak Salari H, Saligheh Rad H. Deep Learning-Based Techniques in Glioma Brain Tumor Segmentation Using Multi-Parametric MRI: A Review on Clinical Applications and Future Outlooks. J Magn Reson Imaging 2024. [PMID: 39074952 DOI: 10.1002/jmri.29543] [Received: 03/15/2024] [Revised: 07/07/2024] [Accepted: 07/08/2024] [Indexed: 07/31/2024]
Abstract
This comprehensive review explores the role of deep learning (DL) in glioma segmentation using multiparametric magnetic resonance imaging (MRI) data. The study surveys advanced techniques such as multiparametric MRI for capturing the complex nature of gliomas. It delves into the integration of DL with MRI, focusing on convolutional neural networks (CNNs) and their remarkable capabilities in tumor segmentation. Clinical applications of DL-based segmentation are highlighted, including treatment planning, monitoring treatment response, and distinguishing between tumor progression and pseudo-progression. Furthermore, the review examines the evolution of DL-based segmentation studies, from early CNN models to recent advancements such as attention mechanisms and transformer models. Challenges in data quality, gradient vanishing, and model interpretability are discussed. The review concludes with insights into future research directions, emphasizing the importance of addressing tumor heterogeneity, integrating genomic data, and ensuring responsible deployment of DL-driven healthcare technologies. EVIDENCE LEVEL: N/A TECHNICAL EFFICACY: Stage 2.
Affiliation(s)
- Delaram J Ghadimi
  - School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amir M Vahdani
  - Image Guided Surgery Lab, Research Center for Biomedical Technologies and Robotics, Advanced Medical Technologies and Equipment Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Tehran, Iran
- Hanie Karimi
  - School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Pouya Ebrahimi
  - Cardiovascular Diseases Research Institute, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
- Mobina Fathi
  - School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farzan Moodi
  - School of Medicine, Iran University of Medical Sciences, Tehran, Iran
  - Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Adrina Habibzadeh
  - Student Research Committee, Fasa University of Medical Sciences, Fasa, Iran
- Gelareh Valizadeh
  - Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Hanieh Mobarak Salari
  - Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
- Hamidreza Saligheh Rad
  - Quantitative MR Imaging and Spectroscopy Group (QMISG), Tehran University of Medical Sciences, Tehran, Iran
  - Department of Medical Physics and Biomedical Engineering, Tehran University of Medical Sciences, Tehran, Iran
6
Wahid KA, Kaffey ZY, Farris DP, Humbert-Vidan L, Moreno AC, Rasmussen M, Ren J, Naser MA, Netherton TJ, Korreman S, Balakrishnan G, Fuller CD, Fuentes D, Dohopolski MJ. Artificial Intelligence Uncertainty Quantification in Radiotherapy Applications - A Scoping Review. medRxiv 2024:2024.05.13.24307226. [PMID: 38798581 PMCID: PMC11118597 DOI: 10.1101/2024.05.13.24307226] [Indexed: 05/29/2024]
Abstract
Background/purpose The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. Methods We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. Results We identified 56 articles published from 2015-2024. 10 domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image-synthesis (13%), and multiple applications simultaneously (11%). 12 disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32%). Imaging data was used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout being the most commonly implemented UQ method (32%) followed by ensembling (16%). 55% of studies did not share code or datasets. Conclusion Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, there was a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
Affiliation(s)
- Kareem A. Wahid
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Zaphanlene Y. Kaffey
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- David P. Farris
  - Research Medical Library, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Laia Humbert-Vidan
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Amy C. Moreno
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Jintao Ren
  - Department of Oncology, Aarhus University Hospital, Denmark
- Mohamed A. Naser
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Tucker J. Netherton
  - Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Stine Korreman
  - Department of Oncology, Aarhus University Hospital, Denmark
- Clifton D. Fuller
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- David Fuentes
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Michael J. Dohopolski
  - Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
7
Berumen F, Ouellet S, Enger S, Beaulieu L. Aleatoric and epistemic uncertainty extraction of patient-specific deep learning-based dose predictions in LDR prostate brachytherapy. Phys Med Biol 2024; 69:085026. [PMID: 38484398 DOI: 10.1088/1361-6560/ad3418] [Received: 10/30/2023] [Accepted: 03/14/2024] [Indexed: 04/10/2024]
Abstract
Objective. In brachytherapy, deep learning (DL) algorithms have shown the capability of predicting 3D dose volumes. The reliability and accuracy of such methodologies remain under scrutiny for prospective clinical applications. This study aims to establish fast DL-based predictive dose algorithms for low-dose rate (LDR) prostate brachytherapy and to evaluate their uncertainty and stability. Approach. Data from 200 prostate patients, treated with 125I sources, were collected. The Monte Carlo (MC) ground truth dose volumes were calculated with TOPAS considering the interseed effects and an organ-based material assignment. Two 3D convolutional neural networks, UNet and ResUNet TSE, were trained using the patient geometry and the seed positions as the input data. The dataset was randomly split into training (150), validation (25) and test (25) sets. The aleatoric (associated with the input data) and epistemic (associated with the model) uncertainties of the DL models were assessed. Main results. For the full test set, with respect to the MC reference, the predicted prostate D90 metric had mean differences of -0.64% and 0.08% for the UNet and ResUNet TSE models, respectively. In voxel-by-voxel comparisons, the average global dose difference ratio in the [-1%, 1%] range included 91.0% and 93.0% of voxels for the UNet and the ResUNet TSE, respectively. One forward pass or prediction took 4 ms for a 3D dose volume of 2.56 M voxels (128 × 160 × 128). The ResUNet TSE model closely encoded the well-known physics of the problem as seen in a set of uncertainty maps. The ResUNet TSE rectum D2cc had the largest uncertainty metric of 0.0042. Significance. The proposed DL models serve as rapid dose predictors that consider the patient anatomy and interseed attenuation effects. The derived uncertainty is interpretable, highlighting areas where DL models may struggle to provide accurate estimations. The uncertainty analysis offers a comprehensive evaluation tool for dose predictor model assessment.
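The abstract does not say how the aleatoric and epistemic components were extracted. One common convention, assumed here purely for illustration and not claimed to be the authors' method, is to run several stochastic forward passes that each return a predicted mean and a predicted variance per voxel, then take the average predicted variance as the aleatoric part and the spread of the predicted means as the epistemic part. The function name `decompose_uncertainty` and the sample values are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split predictive variance for one voxel into two assumed components.

    means:     predicted dose mean from each stochastic forward pass
    variances: predicted noise variance from each stochastic forward pass
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need one mean and one variance per forward pass")
    aleatoric = fmean(variances)   # average data-noise variance
    epistemic = pvariance(means)   # disagreement between passes
    return aleatoric, epistemic, aleatoric + epistemic


if __name__ == "__main__":
    # Hypothetical outputs of four forward passes for a single voxel.
    pass_means = [1.00, 1.02, 0.98, 1.00]
    pass_variances = [0.010, 0.012, 0.009, 0.011]
    aleatoric, epistemic, total = decompose_uncertainty(pass_means, pass_variances)
    print(f"aleatoric={aleatoric:.4f} epistemic={epistemic:.4f} total={total:.4f}")
```

Applied voxel by voxel, a decomposition of this kind yields uncertainty maps comparable in spirit to those the abstract describes.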
Affiliation(s)
- Francisco Berumen
  - Service de Physique Médicale et de Radioprotection, Centre Intégré de Cancérologie, CHU de Québec-Université Laval et Centre de recherche du CHU de Québec, Quebec, Quebec, Canada
  - Département de Physique, de Génie Physique et d'Optique et Centre de Recherche sur le Cancer, Université Laval, Quebec, Quebec, Canada
- Samuel Ouellet
  - Service de Physique Médicale et de Radioprotection, Centre Intégré de Cancérologie, CHU de Québec-Université Laval et Centre de recherche du CHU de Québec, Quebec, Quebec, Canada
  - Département de Physique, de Génie Physique et d'Optique et Centre de Recherche sur le Cancer, Université Laval, Quebec, Quebec, Canada
- Shirin Enger
  - Medical Physics Unit, Department of Oncology, McGill University, Montreal, Quebec, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Luc Beaulieu
  - Service de Physique Médicale et de Radioprotection, Centre Intégré de Cancérologie, CHU de Québec-Université Laval et Centre de recherche du CHU de Québec, Quebec, Quebec, Canada
  - Département de Physique, de Génie Physique et d'Optique et Centre de Recherche sur le Cancer, Université Laval, Quebec, Quebec, Canada
8
Lambert B, Forbes F, Doyle S, Dehaene H, Dojat M. Trustworthy clinical AI solutions: A unified review of uncertainty quantification in Deep Learning models for medical image analysis. Artif Intell Med 2024; 150:102830. [PMID: 38553168 DOI: 10.1016/j.artmed.2024.102830] [Received: 06/21/2023] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 04/02/2024]
Abstract
The full acceptance of Deep Learning (DL) models in the clinical field is rather low with respect to the quantity of high-performing solutions reported in the literature. End users are particularly reluctant to rely on the opaque predictions of DL models. Uncertainty quantification methods have been proposed in the literature as a potential solution, to reduce the black-box effect of DL models and increase the interpretability and the acceptability of the result by the final user. In this review, we propose an overview of the existing methods to quantify uncertainty associated with DL predictions. We focus on applications to medical image analysis, which present specific challenges due to the high dimensionality of images and their variable quality, as well as constraints associated with real-world clinical routine. Moreover, we discuss the concept of structural uncertainty, a corpus of methods to facilitate the alignment of segmentation uncertainty estimates with clinical attention. We then discuss the evaluation protocols to validate the relevance of uncertainty estimates. Finally, we highlight the open challenges for uncertainty quantification in the medical field.
Affiliation(s)
- Benjamin Lambert
  - Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut des Neurosciences, Grenoble, 38000, France; Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Florence Forbes
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, 38000, France
- Senan Doyle
  - Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Harmonie Dehaene
  - Pixyl Research and Development Laboratory, Grenoble, 38000, France
- Michel Dojat
  - Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut des Neurosciences, Grenoble, 38000, France
9
Martín Vicario C, Rodríguez Salas D, Maier A, Hock S, Kuramatsu J, Kallmuenzer B, Thamm F, Taubmann O, Ditt H, Schwab S, Dörfler A, Muehlen I. Uncertainty-aware deep learning for trustworthy prediction of long-term outcome after endovascular thrombectomy. Sci Rep 2024; 14:5544. [PMID: 38448445 PMCID: PMC10917742 DOI: 10.1038/s41598-024-55761-8] [Received: 08/16/2023] [Accepted: 02/27/2024] [Indexed: 03/08/2024]
Abstract
Acute ischemic stroke (AIS) is a leading global cause of mortality and morbidity. Improving long-term outcome predictions after thrombectomy can enhance treatment quality by supporting clinical decision-making. With the advent of interpretable deep learning methods in recent years, it is now possible to develop trustworthy, high-performing prediction models. This study introduces an uncertainty-aware, graph deep learning model that predicts endovascular thrombectomy outcomes using clinical features and imaging biomarkers. The model targets long-term functional outcomes, defined by the three-month modified Rankin Score (mRS), and mortality rates. A sample of 220 AIS patients in the anterior circulation who underwent endovascular thrombectomy (EVT) was included, with 81 (37%) demonstrating good outcomes (mRS ≤ 2). The performance of the different algorithms evaluated was comparable, with the maximum validation area under the curve (AUC) reaching 0.87 using graph convolutional networks (GCN) for mRS prediction and 0.86 using fully connected networks (FCN) for mortality prediction. Moderate performance was obtained at admission (AUC of 0.76 using GCN), which improved to 0.84 post-thrombectomy and to 0.89 a day after stroke. Reliable uncertainty prediction of the model could be demonstrated.
Affiliation(s)
- Celia Martín Vicario
- Department of Neuroradiology, Friedrich-Alexander University of Erlangen-Nuremberg, University Hospital Erlangen, Erlangen, Germany
- Pattern Recognition Lab, Friedrich Alexander University, Erlangen, Germany
- Dalia Rodríguez Salas
- Department of Neuroradiology, Friedrich-Alexander University of Erlangen-Nuremberg, University Hospital Erlangen, Erlangen, Germany
- Pattern Recognition Lab, Friedrich Alexander University, Erlangen, Germany
- Andreas Maier
- Pattern Recognition Lab, Friedrich Alexander University, Erlangen, Germany
- Stefan Hock
- Department of Neuroradiology, Friedrich-Alexander University of Erlangen-Nuremberg, University Hospital Erlangen, Erlangen, Germany
- Joji Kuramatsu
- Department of Neurology, Friedrich-Alexander University of Erlangen-Nuremberg, University Hospital Erlangen, Erlangen, Germany
- Bernd Kallmuenzer
- Department of Neurology, Friedrich-Alexander University of Erlangen-Nuremberg, University Hospital Erlangen, Erlangen, Germany
- Stefan Schwab
- Department of Neurology, Friedrich-Alexander University of Erlangen-Nuremberg, University Hospital Erlangen, Erlangen, Germany
- Arnd Dörfler
- Department of Neuroradiology, Friedrich-Alexander University of Erlangen-Nuremberg, University Hospital Erlangen, Erlangen, Germany
- Iris Muehlen
- Department of Neuroradiology, Friedrich-Alexander University of Erlangen-Nuremberg, University Hospital Erlangen, Erlangen, Germany
10
Subramanian HV, Canfield C, Shank DB. Designing explainable AI to improve human-AI team performance: A medical stakeholder-driven scoping review. Artif Intell Med 2024; 149:102780. [PMID: 38462282 DOI: 10.1016/j.artmed.2024.102780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 12/20/2023] [Accepted: 01/14/2024] [Indexed: 03/12/2024]
Abstract
The rise of complex AI systems in healthcare and other sectors has led to a growing area of research called Explainable AI (XAI) designed to increase transparency. In this area, quantitative and qualitative studies focus on improving user trust and task performance by providing system- and prediction-level XAI features. We analyze stakeholder engagement events (interviews and workshops) on the use of AI for kidney transplantation. From this we identify themes which we use to frame a scoping literature review on current XAI features. The stakeholder engagement process lasted over nine months, covering three stakeholder groups' workflows, determining where AI could intervene, and assessing a mock XAI decision support system. Based on the stakeholder engagement, we identify four major themes relevant to designing XAI systems: 1) use of AI predictions, 2) information included in AI predictions, 3) personalization of AI predictions for individual differences, and 4) customizing AI predictions for specific cases. Using these themes, our scoping literature review finds that providing AI predictions before, during, or after decision-making could be beneficial depending on the complexity of the stakeholder's task. Additionally, expert stakeholders like surgeons prefer minimal to no XAI features, AI prediction, and uncertainty estimates for easy use cases. However, almost all stakeholders prefer to have optional XAI features to review when needed, especially in hard-to-predict cases. The literature also suggests that providing both system- and prediction-level information is necessary to build the user's mental model of the system appropriately. Although XAI features improve users' trust in the system, human-AI team performance is not always enhanced. Overall, stakeholders prefer to have agency over the XAI interface to control the level of information based on their needs and task complexity. We conclude with suggestions for future research, especially on customizing XAI features based on preferences and tasks.
Affiliation(s)
- Harishankar V Subramanian
- Engineering Management & Systems Engineering, Missouri University of Science and Technology, 600 W 14th Street, Rolla, MO 65409, United States of America
- Casey Canfield
- Engineering Management & Systems Engineering, Missouri University of Science and Technology, 600 W 14th Street, Rolla, MO 65409, United States of America
- Daniel B Shank
- Psychological Science, Missouri University of Science and Technology, 500 W 14th Street, Rolla, MO 65409, United States of America
11
Nakagawa S, Ono N, Hakamata Y, Ishii T, Saito A, Yanagimoto S, Kanaya S. Quantitative evaluation model of variable diagnosis for chest X-ray images using deep learning. PLOS Digit Health 2024; 3:e0000460. [PMID: 38489375 PMCID: PMC10942047 DOI: 10.1371/journal.pdig.0000460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 02/04/2024] [Indexed: 03/17/2024]
Abstract
The purpose of this study is to demonstrate the use of a deep learning model in quantitatively evaluating clinical findings typically subject to uncertain evaluations by physicians, using binary test results based on routine protocols. A chest X-ray is the most commonly used diagnostic tool for the detection of a wide range of diseases and is generally performed as a part of regular medical checkups. However, when it comes to findings that can be classified as within the normal range but are not considered disease-related, the thresholds of physicians' findings can vary to some extent; it is therefore necessary to define a new evaluation method and quantify this variation. The implementation of such methods is difficult and expensive in terms of time and labor. In this study, a total of 83,005 chest X-ray images were used to diagnose the common findings of pleural thickening and scoliosis. A novel method for quantitatively evaluating the probability that a physician would judge the images to have these findings was established. The proposed method successfully quantified the variation in physicians' findings using a deep learning model trained only on binary annotation data. It was also demonstrated that the developed method could be applied to both transfer learning using convolutional neural networks for general image analysis and a newly learned deep learning model based on vector quantization variational autoencoders, with high correlations ranging from 0.89 to 0.97.
Affiliation(s)
- Shota Nakagawa
- Department of Science and Technology, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Naoaki Ono
- Department of Science and Technology, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Data Science Center, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Takashi Ishii
- Division for Health Service Promotion, the University of Tokyo, Japan
- Akira Saito
- Division for Health Service Promotion, the University of Tokyo, Japan
- Shigehiko Kanaya
- Department of Science and Technology, Nara Institute of Science and Technology, Ikoma, Nara, Japan
- Data Science Center, Nara Institute of Science and Technology, Ikoma, Nara, Japan
12
Kanwal N, López-Pérez M, Kiraz U, Zuiverloon TCM, Molina R, Engan K. Are you sure it's an artifact? Artifact detection and uncertainty quantification in histological images. Comput Med Imaging Graph 2024; 112:102321. [PMID: 38199127 DOI: 10.1016/j.compmedimag.2023.102321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 11/08/2023] [Accepted: 12/12/2023] [Indexed: 01/12/2024]
Abstract
Modern cancer diagnostics involves extracting tissue specimens from suspicious areas and conducting histotechnical procedures to prepare a digitized glass slide, called Whole Slide Image (WSI), for further examination. These procedures frequently introduce different types of artifacts in the obtained WSI, and histological artifacts might influence Computational Pathology (CPATH) systems further down the diagnostic pipeline if not excluded or handled. Deep Convolutional Neural Networks (DCNNs) have achieved promising results for the detection of some WSI artifacts; however, they do not incorporate uncertainty in their predictions. This paper proposes an uncertainty-aware Deep Kernel Learning (DKL) model to detect blurry areas and folded tissues, two types of artifacts that can appear in WSIs. The proposed probabilistic model combines a CNN feature extractor and a sparse Gaussian Processes (GPs) classifier, which improves the performance of current state-of-the-art artifact detection DCNNs and provides uncertainty estimates. We achieved 0.996 and 0.938 F1 scores for blur and folded tissue detection on unseen data, respectively. In extensive experiments, we validated the DKL model on unseen data from external independent cohorts with different staining and tissue types, where it outperformed DCNNs. Interestingly, the DKL model is more confident in the correct predictions and less confident in the wrong ones. The proposed DKL model can be integrated into the preprocessing pipeline of CPATH systems to provide reliable predictions and possibly serve as a quality control tool.
Affiliation(s)
- Neel Kanwal
- Department of Electrical Engineering and Computer Science, University of Stavanger, 4021 Stavanger, Norway
- Miguel López-Pérez
- Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain
- Umay Kiraz
- Department of Pathology, Stavanger University Hospital, 4011 Stavanger, Norway; Department of Chemistry, Bioscience and Environmental Engineering, University of Stavanger, 4021 Stavanger, Norway
- Tahlita C M Zuiverloon
- Department of Urology, University Medical Center Rotterdam, Erasmus MC Cancer Institute, 1035 GD Rotterdam, The Netherlands
- Rafael Molina
- Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain
- Kjersti Engan
- Department of Electrical Engineering and Computer Science, University of Stavanger, 4021 Stavanger, Norway
13
Seoni S, Jahmunah V, Salvi M, Barua PD, Molinari F, Acharya UR. Application of uncertainty quantification to artificial intelligence in healthcare: A review of last decade (2013-2023). Comput Biol Med 2023; 165:107441. [PMID: 37683529 DOI: 10.1016/j.compbiomed.2023.107441] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 08/04/2023] [Revised: 08/27/2023] [Accepted: 08/29/2023] [Indexed: 09/10/2023]
Abstract
Uncertainty estimation in healthcare involves quantifying and understanding the inherent uncertainty or variability associated with medical predictions, diagnoses, and treatment outcomes. In this era of Artificial Intelligence (AI) models, uncertainty estimation becomes vital to ensure safe decision-making in the medical field. Therefore, this review focuses on the application of uncertainty techniques to machine and deep learning models in healthcare. A systematic literature review was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Our analysis revealed that Bayesian methods were the predominant technique for uncertainty quantification in machine learning models, with Fuzzy systems being the second most used approach. Regarding deep learning models, Bayesian methods emerged as the most prevalent approach, finding application in nearly all aspects of medical imaging. Most of the studies reported in this paper focused on medical images, highlighting the prevalent application of uncertainty quantification techniques using deep learning models compared to machine learning models. Interestingly, we observed a scarcity of studies applying uncertainty quantification to physiological signals. Thus, future research on uncertainty quantification should prioritize investigating the application of these techniques to physiological signals. Overall, our review highlights the significance of integrating uncertainty techniques in healthcare applications of machine learning and deep learning models. This can provide valuable insights and practical solutions to manage uncertainty in real-world medical data, ultimately improving the accuracy and reliability of medical diagnoses and treatment recommendations.
Affiliation(s)
- Silvia Seoni
- Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Massimo Salvi
- Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Prabal Datta Barua
- School of Business (Information System), University of Southern Queensland, Toowoomba, QLD, 4350, Australia; Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, 2007, Australia
- Filippo Molinari
- Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- U Rajendra Acharya
- School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, Australia
14
Loftus TJ, Altieri MS, Balch JA, Abbott KL, Choi J, Marwaha JS, Hashimoto DA, Brat GA, Raftopoulos Y, Evans HL, Jackson GP, Walsh DS, Tignanelli CJ. Artificial Intelligence-enabled Decision Support in Surgery: State-of-the-art and Future Directions. Ann Surg 2023; 278:51-58. [PMID: 36942574 DOI: 10.1097/sla.0000000000005853] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 03/23/2023]
Abstract
OBJECTIVE To summarize state-of-the-art artificial intelligence-enabled decision support in surgery and to quantify deficiencies in scientific rigor and reporting. BACKGROUND To positively affect surgical care, decision-support models must exceed current reporting guideline requirements by performing external and real-time validation, enrolling adequate sample sizes, reporting model precision, assessing performance across vulnerable populations, and achieving clinical implementation; the degree to which published models meet these criteria is unknown. METHODS Embase, PubMed, and MEDLINE databases were searched from their inception to September 21, 2022 for articles describing artificial intelligence-enabled decision support in surgery that uses preoperative or intraoperative data elements to predict complications within 90 days of surgery. Scientific rigor and reporting criteria were assessed and reported according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews guidelines. RESULTS Sample size ranged from 163 to 2,882,526, with 8/36 articles (22.2%) featuring sample sizes of less than 2000; 7 of these 8 articles (87.5%) had below-average (<0.83) area under the receiver operating characteristic curve or accuracy. Overall, 29 articles (80.6%) performed internal validation only, 5 (13.8%) performed external validation, and 2 (5.6%) performed real-time validation. Twenty-three articles (63.9%) reported precision. No articles reported performance across sociodemographic categories. Thirteen articles (36.1%) presented a framework that could be used for clinical implementation; none assessed clinical implementation efficacy. CONCLUSIONS Artificial intelligence-enabled decision support in surgery is limited by reliance on internal validation, small sample sizes that risk overfitting and sacrifice predictive performance, and failure to report confidence intervals, precision, equity analyses, and clinical implementation. Researchers should strive to improve scientific quality.
Affiliation(s)
- Tyler J Loftus
- Department of Surgery, University of Florida Health, Gainesville, FL
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Maria S Altieri
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Department of Surgery, University of Pennsylvania, Philadelphia, PA
- Jeremy A Balch
- Department of Surgery, University of Florida Health, Gainesville, FL
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Kenneth L Abbott
- Department of Surgery, University of Florida Health, Gainesville, FL
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Jeff Choi
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Department of Surgery, Stanford University, Stanford, CA
- Jayson S Marwaha
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Department of Surgery, Beth Israel Deaconess Medical Center
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Daniel A Hashimoto
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Department of Surgery, University of Pennsylvania Perelman School of Medicine
- General Robotics, Automation, Sensing, and Perception Laboratory, University of Pennsylvania School of Engineering and Applied Science, Philadelphia, PA
- Gabriel A Brat
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Department of Surgery, Beth Israel Deaconess Medical Center
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Yannis Raftopoulos
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Weight Management Program, Holyoke Medical Center, Holyoke, MA
- Heather L Evans
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Department of Surgery, Medical University of South Carolina, Charleston, SC
- Gretchen P Jackson
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Digital, Intuitive Surgical, Sunnyvale, CA; Departments of Pediatric Surgery, Pediatrics, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Danielle S Walsh
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Department of Surgery, University of Kentucky, Lexington, KY
- Christopher J Tignanelli
- American College of Surgeons Health Information Technology Committee and Artificial Intelligence Subcommittee, Chicago, IL
- Department of Surgery
- Institute for Health Informatics
- Program for Clinical Artificial Intelligence, Center for Learning Health Systems Science, University of Minnesota, Minneapolis, MN
15
Huang AA, Huang SY. Computation of the distribution of model accuracy statistics in machine learning: Comparison between analytically derived distributions and simulation-based methods. Health Sci Rep 2023; 6:e1214. [PMID: 37091362 PMCID: PMC10119581 DOI: 10.1002/hsr2.1214] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/24/2023] [Revised: 03/16/2023] [Accepted: 03/20/2023] [Indexed: 04/25/2023]
Abstract
Background and Aims All fields have seen an increase in machine-learning techniques. To accurately evaluate the efficacy of novel modeling methods, it is necessary to conduct a critical evaluation of the utilized model metrics, such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). For commonly used model metrics, we proposed the use of analytically derived distributions (ADDs) and compared them with simulation-based approaches. Methods A retrospective cohort study was conducted using the England National Health Services Heart Disease Prediction Cohort. Four machine learning models (XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boost) were used. The distribution of the model metrics and covariate gain statistics were empirically derived using bootstrap simulation (N = 10,000). The ADDs were created from analytic formulas from the covariates to describe the distribution of the model metrics and compared with those of bootstrap simulation. Results XGBoost was the best-performing model, with the highest AUROC and the highest aggregate score considering six other model metrics. Based on the Anderson-Darling test, the distribution of the model metrics created from bootstrap did not significantly deviate from a normal distribution. The variance created from the ADD led to smaller SDs than those derived from bootstrap simulation, whereas the rest of the distribution remained not statistically significantly different. Conclusions ADD allows for cross-study comparison of model metrics, which is usually done with bootstrapping that relies on simulations that cannot be replicated by the reader.
Affiliation(s)
- Alexander A. Huang
- Northwestern University Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Samuel Y. Huang
- Virginia Commonwealth School of Medicine, Virginia Commonwealth University, Richmond, Virginia, USA
16
Cobianchi L, Piccolo D, Dal Mas F, Agnoletti V, Ansaloni L, Balch J, Biffl W, Butturini G, Catena F, Coccolini F, Denicolai S, De Simone B, Frigerio I, Fugazzola P, Marseglia G, Marseglia GR, Martellucci J, Modenese M, Previtali P, Ruta F, Venturi A, Kaafarani HM, Loftus TJ. Surgeons' perspectives on artificial intelligence to support clinical decision-making in trauma and emergency contexts: results from an international survey. World J Emerg Surg 2023; 18:1. [PMID: 36597105 PMCID: PMC9811693 DOI: 10.1186/s13017-022-00467-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/01/2022] [Accepted: 11/28/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND Artificial intelligence (AI) is gaining traction in medicine and surgery. AI-based applications can offer tools to examine high-volume data to inform predictive analytics that supports complex decision-making processes. Time-sensitive trauma and emergency contexts are often challenging. The study aims to investigate trauma and emergency surgeons' knowledge and perception of using AI-based tools in clinical decision-making processes. METHODS An online survey grounded on literature regarding AI-enabled surgical decision-making aids was created by a multidisciplinary committee and endorsed by the World Society of Emergency Surgery (WSES). The survey was advertised to 917 WSES members through the society's website and Twitter profile. RESULTS 650 surgeons from 71 countries in five continents participated in the survey. Results depict the presence of technology enthusiasts and skeptics and surgeons' preference toward more classical decision-making aids like clinical guidelines, traditional training, and the support of their multidisciplinary colleagues. A lack of knowledge about several AI-related aspects emerges and is associated with mistrust. DISCUSSION The trauma and emergency surgical community is divided into those who firmly believe in the potential of AI and those who do not understand or trust AI-enabled surgical decision-making aids. Academic societies and surgical training programs should promote a foundational, working knowledge of clinical AI.
Affiliation(s)
- Lorenzo Cobianchi
- Department of Clinical, Diagnostic and Pediatric Sciences, University of Pavia, Via Alessandro Brambilla, 74, 27100, Pavia, PV, Italy
- General Surgery, IRCCS Policlinico San Matteo Foundation, Pavia, Italy
- ITIR - Institute for Transformative Innovation Research, University of Pavia, Pavia, Italy
- Daniele Piccolo
- Department of Clinical, Diagnostic and Pediatric Sciences, University of Pavia, Via Alessandro Brambilla, 74, 27100, Pavia, PV, Italy
- Department of Neurosurgery, ASUFC Santa Maria Della Misericordia, Udine, Italy
- Francesca Dal Mas
- Department of Management, Ca' Foscari University of Venice, Venice, Italy
- Luca Ansaloni
- Department of Clinical, Diagnostic and Pediatric Sciences, University of Pavia, Via Alessandro Brambilla, 74, 27100, Pavia, PV, Italy
- General Surgery, IRCCS Policlinico San Matteo Foundation, Pavia, Italy
- Jeremy Balch
- Department of Surgery, University of Florida Health, Gainesville, FL, USA
- Walter Biffl
- Division of Trauma and Acute Care Surgery, Scripps Memorial Hospital La Jolla, La Jolla, CA, USA
- Giovanni Butturini
- Department of HPB Surgery, Pederzoli Hospital, Peschiera del Garda, Italy
- Federico Coccolini
- General, Emergency and Trauma Surgery Department, Pisa University Hospital Pisa, Pisa, Italy
- Stefano Denicolai
- ITIR - Institute for Transformative Innovation Research, University of Pavia, Pavia, Italy
- Department of Economics and Management, University of Pavia, Pavia, Italy
- Belinda De Simone
- Department of Emergency, Digestive and Metabolic Minimally Invasive Surgery, Poissy and Saint Germain en Laye Hospitals, Poissy, France
- Isabella Frigerio
- Department of HPB Surgery, Pederzoli Hospital, Peschiera del Garda, Italy
- Paola Fugazzola
- Department of Clinical, Diagnostic and Pediatric Sciences, University of Pavia, Via Alessandro Brambilla, 74, 27100, Pavia, PV, Italy
- General Surgery, IRCCS Policlinico San Matteo Foundation, Pavia, Italy
- Gianluigi Marseglia
- Department of Clinical, Diagnostic and Pediatric Sciences, University of Pavia, Via Alessandro Brambilla, 74, 27100, Pavia, PV, Italy
- IRCCS Policlinico San Matteo Foundation, Pediatric Clinic, Pavia, Italy
- Pietro Previtali
- ITIR - Institute for Transformative Innovation Research, University of Pavia, Pavia, Italy
- Department of Economics and Management, University of Pavia, Pavia, Italy
- Federico Ruta
- General Direction, ASL BAT (Health Agency), Andria, Italy
- Alessandro Venturi
- ITIR - Institute for Transformative Innovation Research, University of Pavia, Pavia, Italy
- Department of Political and Social Sciences, University of Pavia, Pavia, Italy
- Bureau of the Presidency, IRCCS Policlinico San Matteo Foundation, Pavia, Italy
- Haytham M Kaafarani
- Harvard Medical School, Boston, MA, USA
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Massachusetts General Hospital, Boston, MA, USA
- Tyler J Loftus
- Department of Surgery, University of Florida Health, Gainesville, FL, USA
17
Machine Learning in Cardiovascular Imaging: A Scoping Review of Published Literature. Curr Radiol Rep 2023; 11:34-45. [PMID: 36531124 PMCID: PMC9742664 DOI: 10.1007/s40134-022-00407-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/17/2022] [Indexed: 12/14/2022]
Abstract
Purpose of Review In this study, we planned and carried out a scoping review of the literature to learn how machine learning (ML) has been investigated in cardiovascular imaging (CVI). Recent Findings During our search, we found numerous studies that developed or utilized existing ML models for segmentation, classification, object detection, generation, and regression applications involving cardiovascular imaging data. We first quantitatively investigated the different aspects of study characteristics, data handling, model development, and performance evaluation in all studies that were included in our review. We then supplemented these findings with a qualitative synthesis to highlight the common themes in the studied literature and provided recommendations to pave the way for upcoming research. Summary ML is a subfield of artificial intelligence (AI) that enables computers to learn human-like decision-making from data. Due to its novel applications, ML is gaining more and more attention from researchers in the healthcare industry. Cardiovascular imaging is an active area of research in medical imaging with lots of room for incorporating new technologies, like ML. Supplementary Information The online version contains supplementary material available at 10.1007/s40134-022-00407-8.