1
Warren BE, Alkhalifah F, Ahrari A, Min A, Fawzy A, Annamalai G, Jaberi A, Beecroft R, Kachura JR, Mafeld SC. Feasibility of Artificial Intelligence Powered Adverse Event Analysis: Using a Large Language Model to Analyze Microwave Ablation Malfunction Data. Can Assoc Radiol J 2025; 76:171-179. [PMID: 39169480] [DOI: 10.1177/08465371241269436]
Abstract
Objectives: To determine whether a large language model (LLM, GPT-4) can label, consolidate, and analyze interventional radiology (IR) microwave ablation device safety event data into meaningful summaries comparable to those produced by humans. Methods: Microwave ablation safety data from January 1, 2011 to October 31, 2023 were collected, and the type of failure was categorized by human readers. Using GPT-4 and iterative prompt development, the data were classified. Iterative summarization of the reports was performed with GPT-4 to generate a final summary of the large text corpus. Results: Data were split into training (n = 25), validation (n = 639), and test (n = 79) sets to reflect real-world deployment of an LLM for this task. GPT-4 demonstrated high accuracy in the multiclass classification of microwave ablation device data (accuracy [95% CI]: training 96.0% [79.7, 99.9], validation 86.4% [83.5, 89.0], test 87.3% [78.0, 93.8]). The text content was distilled through GPT-4 and iterative summarization prompts. The final summary reflected the clinically relevant insights from the microwave ablation data relative to human interpretation but contained inaccurate event class counts. Conclusion: The LLM emulated the human analysis, suggesting the feasibility of using LLMs to process large volumes of IR safety data as a tool for clinicians. It accurately labelled microwave ablation device event data by type of malfunction through few-shot learning. Content distillation was used to analyze a large text corpus (>650 reports) and generate an insightful summary comparable to the human interpretation.
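The accuracy figures above are reported with 95% confidence intervals on fairly small samples. As an illustration only (the abstract does not state which interval method the authors used, and the 69-of-79 count is inferred from the reported 87.3% test accuracy), a Wilson score interval can be computed like this:

```python
from math import sqrt

def wilson_ci(correct, n, z=1.96):
    """Wilson score interval for a binomial proportion (e.g., classification accuracy)."""
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Test split: 69/79 correct is consistent with the reported 87.3%
lo, hi = wilson_ci(69, 79)
print(f"{69/79:.1%} [{lo:.1%}, {hi:.1%}]")  # roughly 87.3% [78.2%, 93.0%]
```

The Wilson bounds land close to, but not exactly on, the published [78.0, 93.8], which suggests the authors used a different (likely exact) interval; the sketch is meant to show the mechanics, not reproduce the paper.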
Affiliation(s)
- Blair E Warren
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- Fahd Alkhalifah
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- Aida Ahrari
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- Adam Min
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- Aly Fawzy
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Ganesan Annamalai
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- Arash Jaberi
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- Robert Beecroft
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- John R Kachura
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- Sebastian C Mafeld
- Department of Medical Imaging, University of Toronto, Temerty Faculty of Medicine, Toronto, ON, Canada
- Division of Vascular and Interventional Radiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
2
van Rijswijk RE, Bogdanovic M, Roy J, Yeung KK, Zeebregts CJ, Geelkerken RH, Groot Jebbink E, Wolterink JM, Reijnen MMPJ. Multimodal Artificial Intelligence Model for Prediction of Abdominal Aortic Aneurysm Shrinkage After Endovascular Repair (the ART in EVAR study). J Endovasc Ther 2025:15266028251314359. [PMID: 39882767] [DOI: 10.1177/15266028251314359]
Abstract
PURPOSE The goal of the study described in this protocol is to build a multimodal artificial intelligence (AI) model to predict abdominal aortic aneurysm (AAA) shrinkage 1 year after endovascular aneurysm repair (EVAR). METHODS In this retrospective observational multicenter study, approximately 1000 patients will be enrolled from the hospital records of 5 experienced vascular centers. Patients will be included if they underwent elective EVAR for infrarenal AAA with initial assisted technical success and had imaging of the same modality available preoperatively and at 1-year follow-up (CTA-CTA or US-US). Data collection will include baseline and vascular characteristics, medication use, procedural data, preoperative and postoperative imaging data, follow-up data, and complications. PROPOSED ANALYSES The cohort will be stratified into 3 groups of AAA remodeling based on the difference in maximum AAA diameter between the preoperative and 1-year postoperative time points. Patients with a diameter reduction of ≥5 mm will be assigned to the AAA shrinkage group, those with an increase of ≥5 mm to the AAA growth group, and those with a diameter change of <5 mm to the stable AAA group. An additional fourth group will include all patients who underwent an AAA-related reintervention within the first year after EVAR, because both the complication and the reintervention might have influenced the state of AAA remodeling at 1 year. The preoperative and postoperative CTA scans will be used for anatomical AAA analysis and biomechanical assessment through semi-automatic segmentation and finite element analysis. All collected clinical, biomechanical, and imaging data will be used to create an AI prediction model for AAA shrinkage. Explainable AI techniques will be used to identify the most descriptive input features in the model.
Predictive factors from the AI model will be compared with conventional univariate and multivariate logistic regression analyses to find the best model for the prediction of AAA shrinkage. The study is registered at www.clinicaltrials.gov under registration number NCT06250998. CLINICAL IMPACT This study aims to develop a robust, high-performance AI model for predicting AAA shrinkage 1 year after EVAR, with great potential for optimizing both EVAR treatment and follow-up. The model could identify cases with an initially lower chance of early AAA shrinkage, in whom EVAR treatment could be tailored with additional preoperative coil embolization, active sac management, and/or postoperative tranexamic acid therapy, which have been shown to promote AAA shrinkage but are too complex and costly to perform in all patients. The model could also aid in stratifying post-EVAR surveillance based on the patient's individual risk, possibly decreasing follow-up for the 40%-50% of patients who will experience AAA sac shrinkage. Overall, the AI prediction model is expected to improve patient survival and decrease the number of reinterventions after EVAR and the associated healthcare costs.
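The stratification rule in the proposed analyses reduces to a threshold on the diameter change; a minimal sketch, where the function name and the boolean reintervention flag are illustrative rather than taken from the protocol:

```python
def classify_remodeling(pre_mm: float, post_mm: float, reintervention: bool = False) -> str:
    """Assign a patient to an AAA remodeling group from maximum diameters (mm)."""
    if reintervention:
        return "reintervention"      # separate fourth group per the protocol
    delta = post_mm - pre_mm
    if delta <= -5:
        return "shrinkage"           # diameter reduction of >= 5 mm
    if delta >= 5:
        return "growth"              # diameter increase of >= 5 mm
    return "stable"                  # change of < 5 mm in either direction

print(classify_remodeling(60, 53))        # shrinkage
print(classify_remodeling(60, 66))        # growth
print(classify_remodeling(60, 62))        # stable
print(classify_remodeling(60, 53, True))  # reintervention
```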
Affiliation(s)
- Rianne E van Rijswijk
- Department of Vascular Surgery, Rijnstate, Arnhem, The Netherlands
- Multi-Modality Medical Imaging Group, Technical Medical Centre, University of Twente, Enschede, The Netherlands
- Marko Bogdanovic
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Vascular Surgery, Karolinska University Hospital, Stockholm, Sweden
- Joy Roy
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Vascular Surgery, Karolinska University Hospital, Stockholm, Sweden
- Kak Khee Yeung
- Department of Surgery, Amsterdam University Medical Center, Vrije Universiteit Amsterdam, Amsterdam Cardiovascular Sciences, Amsterdam, The Netherlands
- Clark J Zeebregts
- Division of Vascular Surgery, Department of Surgery, University Medical Center Groningen, Groningen, The Netherlands
- Robert H Geelkerken
- Multi-Modality Medical Imaging Group, Technical Medical Centre, University of Twente, Enschede, The Netherlands
- Department of Vascular Surgery, Medisch Spectrum Twente, Enschede, The Netherlands
- Erik Groot Jebbink
- Department of Vascular Surgery, Rijnstate, Arnhem, The Netherlands
- Multi-Modality Medical Imaging Group, Technical Medical Centre, University of Twente, Enschede, The Netherlands
- Jelmer M Wolterink
- Department of Applied Mathematics, Technical Medical Centre, University of Twente, Enschede, The Netherlands
- Michel M P J Reijnen
- Department of Vascular Surgery, Rijnstate, Arnhem, The Netherlands
- Multi-Modality Medical Imaging Group, Technical Medical Centre, University of Twente, Enschede, The Netherlands
3
Cronin P, Nasser OMH, Rawson JV. Currently Available Radiology-Specific Reporting Guidelines. Acad Radiol 2025:S1076-6332(25)00014-5. [PMID: 39880692] [DOI: 10.1016/j.acra.2025.01.014]
Abstract
The aim of this paper is to contextualize and review the reporting guidelines available at the EQUATOR Network that are most relevant to radiology-specific investigations. Eight EQUATOR Network reporting guidelines for the clinical area of radiology, excluding the subspecialized areas of cardiovascular, neurologic, and oncologic imaging, are reviewed and discussed. The reporting guidelines cover diagnostic and therapeutic clinical research. Why each reporting guideline was developed, by whom, its aims, and what it hopes to achieve are discussed. A table summarizes, for each guideline: what it is provided for; its acronym, if present; a full bibliographic reference with PMID; the guideline's website URL or link; the study design and section of the report to which the guideline applies; and the date the guideline was last updated.
Affiliation(s)
- Paul Cronin
- Emory Department of Radiology and Imaging Science, Division of Cardiothoracic Imaging, Emory University, Atlanta, Georgia (P.C.).
- Omar Msto Hussain Nasser
- Harvard Medical School, Boston, Massachusetts (O.M.H.N., J.V.R.); Department of Radiology, Beth Israel Medical Center, Boston, Massachusetts (O.M.H.N., J.V.R.)
- James V Rawson
- Harvard Medical School, Boston, Massachusetts (O.M.H.N., J.V.R.); Department of Radiology, Beth Israel Medical Center, Boston, Massachusetts (O.M.H.N., J.V.R.)
4
Feigerlova E, Hani H, Hothersall-Davies E. A systematic review of the impact of artificial intelligence on educational outcomes in health professions education. BMC Med Educ 2025; 25:129. [PMID: 39871336] [PMCID: PMC11773843] [DOI: 10.1186/s12909-025-06719-5]
Abstract
BACKGROUND Artificial intelligence (AI) has a variety of potential applications in health professions education and assessment; however, the measurable educational impacts of AI-based educational strategies on learning outcomes have not been systematically evaluated. METHODS A systematic literature search was conducted using electronic databases (CINAHL Plus, EMBASE, ProQuest, PubMed, Cochrane Library, and Web of Science) to identify studies published until October 1st, 2024, analyzing the impact of AI-based tools/interventions in health professions assessment and/or training on educational outcomes. The present analysis follows the PRISMA 2020 statement for systematic reviews and the structured approach to reporting in health care education for evidence synthesis. RESULTS The final analysis included twelve studies. All were single-center studies with sample sizes ranging from 4 to 180 participants. Three studies were randomized controlled trials, seven had a quasi-experimental design, and two were observational. The studies were heterogeneous in design, and confounding variables were not controlled. None of the studies provided learning objectives or descriptions of the competencies to be achieved. Three studies applied learning theories in the development of AI-powered educational strategies. One study reported an analysis of the authenticity of the learning environment. No study provided information on the impact of feedback activities on learning outcomes. All studies corresponded to Kirkpatrick's second level, evaluating technical skills or quantifiable knowledge. No study evaluated more complex tasks, such as the behavior of learners in the workplace. There was insufficient information on training datasets and copyright issues. CONCLUSIONS The results of the analysis show that the current evidence regarding measurable educational outcomes of AI-powered interventions in health professions education is poor.
Further studies with a rigorous methodological approach are needed. The present work also highlights that there is no straightforward guide for evaluating the quality of research in AI-based education and suggests a series of criteria that should be considered. TRIAL REGISTRATION Methods and inclusion criteria were defined in advance, specified in a protocol and registered in the OSF registries ( https://osf.io/v5cgp/ ). CLINICAL TRIAL NUMBER not applicable.
Affiliation(s)
- Eva Feigerlova
- Faculté de médecine, maïeutique et métiers de la santé, Université de Lorraine, Nancy, France
- Centre universitaire d'enseignement par simulation (CUESiM), Hôpital virtuel de Lorraine, Université de Lorraine, Nancy, France
- Institut national de la santé et de la recherche médicale (Inserm), Unité mixte de recherche (UMR) U1116 - Défaillance cardiovasculaire aiguë et chronique (DCAC), Université de Lorraine, Nancy, France
- Centre Universitaire d'Enseignement par Simulation - CUESim Hôpital Virtuel de Lorraine - HVL, Faculté de Médecine, Maïeutique et Métiers de la Santé, 9, Avenue de la Forêt de Haye, Vandœuvre-lès-Nancy, 54505, France
- Hind Hani
- Faculté de médecine, maïeutique et métiers de la santé, Université de Lorraine, Nancy, France
- Centre universitaire d'enseignement par simulation (CUESiM), Hôpital virtuel de Lorraine, Université de Lorraine, Nancy, France
5
Yamagishi Y, Nakamura Y, Hanaoka S, Abe O. Large Language Model Approach for Zero-Shot Information Extraction and Clustering of Japanese Radiology Reports: Algorithm Development and Validation. JMIR Cancer 2025; 11:e57275. [PMID: 39864093] [DOI: 10.2196/57275]
Abstract
Background The application of natural language processing in medicine has increased significantly, including tasks such as information extraction and classification. Natural language processing plays a crucial role in structuring free-form radiology reports, facilitating the interpretation of textual content, and enhancing data utility through clustering techniques. Clustering allows for the identification of similar lesions and disease patterns across a broad dataset, making it useful for aggregating information and discovering new insights in medical imaging. However, most publicly available medical datasets are in English, with limited resources in other languages. This scarcity poses a challenge for the development of models geared toward non-English downstream tasks. Objective This study aimed to develop and evaluate an algorithm that uses large language models (LLMs) to extract information from Japanese lung cancer radiology reports and perform clustering analysis. The effectiveness of this approach was assessed and compared with previous supervised methods. Methods This study employed the MedTxt-RR dataset, comprising 135 Japanese radiology reports from 9 radiologists who interpreted the computed tomography images of 15 lung cancer patients obtained from Radiopaedia. Previously used in the NTCIR-16 (NII Testbeds and Community for Information Access Research) shared task for clustering performance competition, this dataset was ideal for comparing the clustering ability of our algorithm with those of previous methods. The dataset was split into 8 cases for development and 7 for testing. The study's approach involved using the LLM to extract information pertinent to lung cancer findings and transforming it into numeric features for clustering with the K-means method. Performance was evaluated using 135 reports for information extraction accuracy and 63 test reports for clustering performance.
This study focused on the accuracy of automated extraction of tumor size, location, and laterality from the reports. Clustering performance was evaluated using normalized mutual information, adjusted mutual information, and the Fowlkes-Mallows index for both the development and test data. Results The tumor size was accurately identified in 99 of 135 reports (73.3%), with errors in 36 reports (26.7%), primarily due to missing or incorrect size information. Tumor location and laterality were identified with greater accuracy, in 112 of 135 reports (83%); however, 23 reports (17%) contained errors, mainly due to empty values or incorrect data. Clustering of the test data yielded a normalized mutual information of 0.6414, an adjusted mutual information of 0.5598, and a Fowlkes-Mallows index of 0.5354. The proposed method demonstrated superior performance across all evaluation metrics compared with previous methods. Conclusions The unsupervised LLM approach surpassed the existing supervised methods in clustering Japanese radiology reports. These findings suggest that LLMs hold promise for extracting information from radiology reports and integrating it into disease-specific knowledge structures.
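The Fowlkes-Mallows index used above compares a predicted clustering with reference labels by counting pairs of samples; a pure-Python sketch of the pair-counting definition, run on toy labels rather than the study data:

```python
from itertools import combinations
from math import sqrt

def fowlkes_mallows(labels_true, labels_pred):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)), counted over all sample pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1          # pair grouped together in both clusterings
        elif same_pred:
            fp += 1          # together in prediction only
        elif same_true:
            fn += 1          # together in reference only
    return tp / sqrt((tp + fp) * (tp + fn)) if tp else 0.0

# Toy example: two reference clusters, one sample misassigned by the prediction
print(fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))  # ~0.617
```

The index is 1.0 for a clustering identical to the reference (up to label renaming) and tends toward 0 for unrelated partitions, which is the sense in which the study's 0.5354 on test data is interpreted.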
Affiliation(s)
- Yosuke Yamagishi
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yuta Nakamura
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan
- Shouhei Hanaoka
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Osamu Abe
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
6
Miller R, Jackson L, Vilic D, Boyce L, Shuaib H. Artificial intelligence and machine learning capabilities in the detection of acute scaphoid fracture: a critical review. J Hand Surg Eur Vol 2025:17531934241312896. [PMID: 39846169] [DOI: 10.1177/17531934241312896]
Abstract
This paper discusses the current literature surrounding the potential use of artificial intelligence and machine learning models in the diagnosis of acute obvious and occult scaphoid fractures. Current studies have notable methodological flaws and are at high risk of bias, precluding meaningful comparisons with clinician performance (the current reference standard). Specific areas should be addressed in future studies to help advance the meaningful and clinical use of artificial intelligence for radiograph interpretation.
Affiliation(s)
- Robert Miller
- Plastic and Reconstructive Surgery Department, Guy's and St Thomas' Hospital, London, UK
- Laurence Jackson
- Clinical Scientific Computing, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Dijana Vilic
- Clinical Scientific Computing, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Louis Boyce
- St Andrew's Centre for Plastic Surgery and Burns, Broomfield Hospital, Chelmsford, UK
- Haris Shuaib
- Clinical Scientific Computing, Guy's and St Thomas' NHS Foundation Trust, London, UK
7
Bathla G, Zamboni CG, Larson N, Liu Y, Zhang H, Lee NH, Agarwal A, Soni N, Sonka M. Radiomics-Based Differentiation of Glioblastoma and Metastatic Disease: Impact of Different T1-Contrast-Enhanced Sequences on Radiomics Features and Model Performance. AJNR Am J Neuroradiol 2025:ajnr.A8470. [PMID: 39179298] [DOI: 10.3174/ajnr.a8470]
Abstract
BACKGROUND AND PURPOSE Even though glioblastoma (GB) and brain metastases (BM) can be differentiated using radiomics, it remains unclear whether model performance varies with the contrast-enhanced sequence used. Our aim was to evaluate radiomics-based model performance for differentiating GB from BM using MPRAGE and volumetric interpolated breath-hold examination (VIBE) T1-contrast-enhanced sequences. MATERIALS AND METHODS T1 contrast-enhanced (T1-CE) MPRAGE and VIBE sequences acquired in 108 patients (31 GBs and 77 BMs) during the same MRI session were retrospectively evaluated. After standardized image preprocessing and segmentation, radiomics features were extracted from the necrotic and enhancing tumor components. Pearson correlation analysis of radiomics features from tumor subcomponents was also performed. A total of 90 machine learning pipelines were evaluated using 5-fold cross-validation. Performance was measured by mean area under the receiver operating characteristic curve (AUC-ROC), log loss, and Brier scores. RESULTS A feature-wise comparison showed that radiomics features were strongly correlated between sequences, with the highest correlation for shape-based features. The mean AUC across the top 10 pipelines ranged between 0.851 and 0.890 with T1-CE MPRAGE and between 0.869 and 0.907 with T1-CE VIBE. The top-performing models for the MPRAGE sequence commonly used support vector machines, while those for the VIBE sequence used either support vector machines or random forests. Common feature-reduction methods for top-performing models included the linear combination filter and least absolute shrinkage and selection operator for both sequences. For the same machine learning feature-reduction pipeline, model performances were comparable (AUC-ROC difference range, −0.078 to 0.046).
CONCLUSIONS Radiomics features derived from T1-CE MPRAGE and VIBE sequences are strongly correlated and may have similar overall classification performance for differentiating GB from BM.
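The feature-wise comparison in this study rests on Pearson correlation between matched feature values from the two sequences; a self-contained sketch with invented feature values (illustrative only, not study data):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical values of one shape feature for the same tumors on MPRAGE vs VIBE
mprage = [12.1, 8.4, 15.0, 9.7, 11.2]
vibe = [12.3, 8.1, 14.6, 9.9, 11.5]
print(round(pearson(mprage, vibe), 3))  # close to 1 for strongly correlated features
```

In the study this comparison is done per radiomics feature across patients, with shape-based features showing the strongest inter-sequence correlation.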
Affiliation(s)
- Girish Bathla
- From the Department of Radiology (G.B., C.G.Z.), University of Iowa Hospitals and Clinics, Iowa City, Iowa
- Division of Neuroradiology (G.B.), Department of Radiology, Mayo Clinic, Rochester, Minnesota
- Camila G Zamboni
- From the Department of Radiology (G.B., C.G.Z.), University of Iowa Hospitals and Clinics, Iowa City, Iowa
- Nicholas Larson
- Division of Clinical Trials and Biostatistics (N.L.), Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota
- Yanan Liu
- College of Engineering (Y.L., H.Z., N.H.L., M.S.), University of Iowa, Iowa City, Iowa
- Honghai Zhang
- College of Engineering (Y.L., H.Z., N.H.L., M.S.), University of Iowa, Iowa City, Iowa
- Nam H Lee
- College of Engineering (Y.L., H.Z., N.H.L., M.S.), University of Iowa, Iowa City, Iowa
- Amit Agarwal
- Division of Neuroradiology (A.A., N.S.), Department of Radiology, Mayo Clinic, Jacksonville, Florida
- Neetu Soni
- Division of Neuroradiology (A.A., N.S.), Department of Radiology, Mayo Clinic, Jacksonville, Florida
- Milan Sonka
- College of Engineering (Y.L., H.Z., N.H.L., M.S.), University of Iowa, Iowa City, Iowa
8
Koyun M, Taskent I. Evaluation of Advanced Artificial Intelligence Algorithms' Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models. J Clin Med 2025; 14:571. [PMID: 39860577] [PMCID: PMC11765597] [DOI: 10.3390/jcm14020571]
Abstract
Background/Objectives: Acute ischemic stroke (AIS) is a leading cause of mortality and disability worldwide, and early, accurate diagnosis is critical for timely intervention and improved patient outcomes. This retrospective study aimed to assess the diagnostic performance of two advanced artificial intelligence (AI) models, Chat Generative Pre-trained Transformer (ChatGPT-4o) and Claude 3.5 Sonnet, in identifying AIS from diffusion-weighted imaging (DWI). Methods: The DWI images of a total of 110 cases (AIS group: n = 55, healthy controls: n = 55) were provided to the AI models via standardized prompts. The models' responses were compared to radiologists' gold-standard evaluations, and performance metrics such as sensitivity, specificity, and diagnostic accuracy were calculated. Results: Both models exhibited high sensitivity for AIS detection (ChatGPT-4o: 100%, Claude 3.5 Sonnet: 94.5%). However, ChatGPT-4o demonstrated a significantly lower specificity (3.6%) than Claude 3.5 Sonnet (74.5%). The agreement with radiologists was poor for ChatGPT-4o (κ = 0.036; 95% CI: -0.013, 0.085) but good for Claude 3.5 Sonnet (κ = 0.691; 95% CI: 0.558, 0.824). In terms of AIS hemispheric localization accuracy, Claude 3.5 Sonnet (67.2%) outperformed ChatGPT-4o (32.7%). Similarly, for specific AIS localization, Claude 3.5 Sonnet (30.9%) showed greater accuracy than ChatGPT-4o (7.3%), with these differences being statistically significant (p < 0.05). Conclusions: This study highlights the superior diagnostic performance of Claude 3.5 Sonnet compared to ChatGPT-4o in identifying AIS from DWI. Despite its advantages, both models demonstrated notable limitations in accuracy, emphasizing the need for further development before achieving full clinical applicability. These findings underline the potential of AI tools in radiological diagnostics while acknowledging their current limitations.
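The sensitivity, specificity, and kappa figures above all follow from a 2x2 confusion matrix. The sketch below reproduces the reported ChatGPT-4o point estimates, assuming counts of 55 true positives, 53 false positives, 0 false negatives, and 2 true negatives (inferred from the reported percentages; the abstract does not list the raw counts):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    p_observed = (tp + tn) / n
    # Chance agreement: both-positive rate plus both-negative rate
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return sensitivity, specificity, kappa

sens, spec, kappa = diagnostic_metrics(tp=55, fp=53, fn=0, tn=2)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, kappa {kappa:.3f}")
# sensitivity 100.0%, specificity 3.6%, kappa 0.036
```

Plugging in the Claude 3.5 Sonnet counts implied by the percentages (52, 14, 3, 41) likewise recovers κ ≈ 0.691, which supports the inferred tables.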
Affiliation(s)
- Mustafa Koyun
- Department of Radiology, Kastamonu Training and Research Hospital, Kastamonu 37150, Turkey
- Ismail Taskent
- Department of Radiology, Kastamonu University, Kastamonu 37150, Turkey
9
Dorosan M, Chen YL, Zhuang Q, Lam SWS. In Silico Evaluation of Algorithm-Based Clinical Decision Support Systems: Protocol for a Scoping Review. JMIR Res Protoc 2025; 14:e63875. [PMID: 39819973] [DOI: 10.2196/63875]
Abstract
BACKGROUND Integrating algorithm-based clinical decision support (CDS) systems poses significant challenges in evaluating their actual clinical value. Such CDS systems are traditionally assessed via controlled but resource-intensive clinical trials. OBJECTIVE This paper presents a review protocol for preimplementation in silico evaluation methods to enable broadened impact analysis under simulated environments before clinical trials. METHODS We propose a scoping review protocol that follows an enhanced Arksey and O'Malley framework and PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines to investigate the scope and research gaps in the in silico evaluation of algorithm-based CDS models-specifically CDS decision-making end points and objectives, evaluation metrics used, and simulation paradigms used to assess potential impacts. The databases searched are PubMed, Embase, CINAHL, PsycINFO, Cochrane, IEEEXplore, Web of Science, and arXiv. A 2-stage screening process identified pertinent articles. The information extracted from articles was iteratively refined. The review will use thematic, trend, and descriptive analyses to meet scoping aims. RESULTS We conducted an automated search of the databases above in May 2023, with most title and abstract screenings completed by November 2023 and full-text screening extended from December 2023 to May 2024. Concurrent charting and full-text analysis were carried out, with the final analysis and manuscript preparation set for completion in July 2024. Publication of the review results is targeted from July 2024 to February 2025. As of April 2024, a total of 21 articles have been selected following a 2-stage screening process; these will proceed to data extraction and analysis. 
CONCLUSIONS We refined our data extraction strategy through a collaborative, multidisciplinary approach, planning to analyze results using thematic analyses to identify approaches to in silico evaluation. Anticipated findings aim to contribute to developing a unified in silico evaluation framework adaptable to various clinical workflows, detailing clinical decision-making characteristics, impact measures, and reusability of methods. The study's findings will be published and presented in forums combining artificial intelligence and machine learning, clinical decision-making, and health technology impact analysis. Ultimately, we aim to bridge the development-deployment gap through in silico evaluation-based potential impact assessments. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/63875.
Affiliation(s)
- Michael Dorosan
- Health Services Research Centre, Singapore Health Services Pte Ltd, Singapore, Singapore
- Ya-Lin Chen
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Qingyuan Zhuang
- Division of Supportive and Palliative Care, National Cancer Centre Singapore, Singapore, Singapore
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore, Singapore
- Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Shao Wei Sean Lam
- Health Services Research Centre, Singapore Health Services Pte Ltd, Singapore, Singapore
- Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Health Services Research Institute, SingHealth Duke-NUS Academic Medical Centre, Singapore, Singapore
- Health Services and Systems Research, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Lee Kong Chian School of Business, Singapore Management University, Singapore, Singapore
Collapse
|
10
Liu S, McCoy AB, Wright A. Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines. J Am Med Inform Assoc 2025:ocaf008. [PMID: 39812777 DOI: 10.1093/jamia/ocaf008]
Abstract
OBJECTIVE To synthesize findings from recent research on retrieval-augmented generation (RAG) and large language models (LLMs) in biomedicine and to provide clinical development guidelines that improve effectiveness. MATERIALS AND METHODS We conducted a systematic literature review and a meta-analysis, reported in adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement. Searches were performed in 3 databases (PubMed, Embase, PsycINFO) using terms related to "retrieval augmented generation" and "large language model," for articles published in 2023 and 2024. We selected studies that compared baseline LLM performance with RAG performance and developed a random-effects meta-analysis model using the odds ratio as the effect size. RESULTS Among 335 studies screened, 20 were included in this literature review. The pooled effect size was 1.35, with a 95% confidence interval of 1.19-1.53, indicating a statistically significant effect (P = .001). We report the clinical tasks, baseline LLMs, retrieval sources and strategies, and evaluation methods used. DISCUSSION Building on our literature review, we developed Guidelines for Unified Implementation and Development of Enhanced LLM Applications with RAG in Clinical Settings to inform clinical applications using RAG. CONCLUSION Overall, RAG implementation showed a 1.35 odds ratio increase in performance compared to baseline LLMs. Future research should focus on (1) system-level enhancement: combining RAG with agents; (2) knowledge-level enhancement: deep integration of knowledge into the LLM; and (3) integration-level enhancement: integrating RAG systems within electronic health records.
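The pooled effect above comes from a random-effects model over study-level odds ratios. As a rough illustration (not the authors' code), a DerSimonian-Laird pooling over hypothetical 2x2 counts (correct/incorrect answers for RAG vs. baseline) can be sketched as:

```python
import math

def pooled_odds_ratio(studies, z=1.96):
    """DerSimonian-Laird random-effects pooling of odds ratios.

    Each study is a 2x2 tuple (a, b, c, d):
    a/b = RAG correct/incorrect, c/d = baseline correct/incorrect.
    Returns (pooled_OR, ci_low, ci_high) at the 95% level.
    """
    logs, variances = [], []
    for a, b, c, d in studies:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))  # continuity correction for zero cells
        logs.append(math.log((a * d) / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)
    w = [1 / v for v in variances]                                # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, logs)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logs))    # Cochran's Q
    c_den = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(logs) - 1)) / c_den)                # between-study variance
    w_re = [1 / (v + tau2) for v in variances]                    # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, logs)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se)
```

The example counts are invented for illustration; the study's actual effect sizes and weighting may differ in detail.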
Affiliation(s)
- Siru Liu: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Allison B McCoy: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Adam Wright: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
11
Kelly BS, Duignan S, Mathur P, Dillon H, Lee EH, Yeom KW, Keane PA, Lawlor A, Killeen RP. Can ChatGPT4-vision identify radiologic progression of multiple sclerosis on brain MRI? Eur Radiol Exp 2025; 9:9. [PMID: 39812885 PMCID: PMC11735712 DOI: 10.1186/s41747-024-00547-w]
Abstract
BACKGROUND The large language model ChatGPT can now accept image input with the GPT4-vision (GPT4V) version. We aimed to compare the performance of GPT4V with pretrained U-Net and vision transformer (ViT) models for identifying progression of multiple sclerosis (MS) on magnetic resonance imaging (MRI). METHODS Paired coregistered MR images with and without progression were provided as input to ChatGPT4V in a zero-shot experiment to identify radiologic progression, and its performance was compared to pretrained U-Net and ViT models. Accuracy was the primary evaluation metric, and 95% confidence intervals (CIs) were calculated by bootstrapping. We included 170 patients with MS (50 males, 120 females), aged 21-74 years (mean 42.3), imaged at a single institution from 2019 to 2021, each with 2-5 MRI studies (496 in total). RESULTS Of the 170 included patients, 110 were used for training, 30 for tuning, and 30 for testing; 100 unseen paired images were randomly selected from the test set for evaluation. Both U-Net and ViT had 94% (95% CI: 89-98%) accuracy, while GPT4V had 85% (77-91%). GPT4V gave cautious nonanswers in six cases. GPT4V had precision (positive predictive value), recall (sensitivity), and F1 score of 89% (75-93%), 92% (82-98%), and 91% (82-97%), compared to 100% (100-100%), 88% (78-96%), and 94% (88-98%) for U-Net and 94% (87-100%), 94% (88-100%), and 94% (89-98%) for ViT. CONCLUSION GPT4V's performance, combined with its accessibility, suggests it has the potential to impact AI radiology research. However, misclassified cases and overly cautious nonanswers confirm that it is not yet ready for clinical use. RELEVANCE STATEMENT GPT4V can identify the radiologic progression of MS in a simplified experimental setting. However, GPT4V is not a medical device, and its widespread availability highlights the need for caution and education for lay users, especially those with limited access to expert healthcare.
KEY POINTS Without fine-tuning or the need for prior coding experience, GPT4V can perform a zero-shot radiologic change-detection task with reasonable accuracy. However, in absolute terms, in a simplified "spot the difference" medical imaging task, GPT4V was inferior to state-of-the-art computer vision methods. GPT4V's performance metrics were more similar to those of the ViT than the U-Net. This is an exploratory experimental study, and GPT4V is not intended for use as a medical device.
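The study reports bootstrapped 95% CIs around its accuracy estimates. A minimal percentile-bootstrap sketch (not the authors' implementation; the labels below are synthetic) of how such an interval is obtained from 100 paired test cases:

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of cases where prediction matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy: resample cases with
    replacement, recompute accuracy, and take the alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    accs = sorted(
        accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for idx in ([rng.randrange(n) for _ in range(n)] for _ in range(n_boot))
    )
    lo = accs[int(alpha / 2 * n_boot)]
    hi = accs[int((1 - alpha / 2) * n_boot) - 1]
    return accuracy(y_true, y_pred), lo, hi
```

With 100 cases and 85 correct predictions, this yields a point estimate of 0.85 with an interval in the neighborhood of the study's reported 77-91%.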
Affiliation(s)
- Brendan S Kelly: St Vincent's University Hospital, Dublin, Ireland; Insight Centre for Data Analytics, UCD, Dublin, Ireland; Wellcome Trust-HRB, Irish Clinical Academic Training, Dublin, Ireland; School of Medicine, University College Dublin, Dublin, Ireland
- Henry Dillon: St Vincent's University Hospital, Dublin, Ireland
- Edward H Lee: Lucille Packard Children's Hospital at Stanford, Stanford, CA, USA
- Kristen W Yeom: Lucille Packard Children's Hospital at Stanford, Stanford, CA, USA
- Ronan P Killeen: St Vincent's University Hospital, Dublin, Ireland; School of Medicine, University College Dublin, Dublin, Ireland
12
Bashir Z, Lin M, Feragen A, Mikolaj K, Taksøe-Vester C, Christensen AN, Svendsen MBS, Fabricius MH, Andreasen L, Nielsen M, Tolsgaard MG. Clinical validation of explainable AI for fetal growth scans through multi-level, cross-institutional prospective end-user evaluation. Sci Rep 2025; 15:2074. [PMID: 39820804 PMCID: PMC11739376 DOI: 10.1038/s41598-025-86536-4]
Abstract
We aimed to develop and evaluate Explainable Artificial Intelligence (XAI) for fetal ultrasound using actionable concepts as feedback to end-users, following a prospective, cross-center, multi-level approach. We developed, implemented, and tested a deep-learning model for fetal growth scans using both retrospective and prospective data. We used a modified Progressive Concept Bottleneck Model with pre-established clinical concepts as explanations (feedback on image optimization and the presence of anatomical landmarks) as well as segmentations (outlining anatomical landmarks). The model was evaluated prospectively by assessing: its ability to assess standard plane quality, the correctness of its explanations, the clinical usefulness of its explanations, and its ability to discriminate between different levels of expertise among clinicians. We used 9352 annotated images for model development and 100 videos for prospective evaluation. Overall classification accuracy was 96.3%. The model's performance in assessing standard plane quality was on par with that of clinicians. Model segmentations and explanations agreed with those provided by expert clinicians in 83.3% and 74.2% of cases, respectively. A panel of clinicians rated segmentations as useful in 72.4% of cases and explanations as useful in 75.0% of cases. Finally, the model reliably discriminated between the performance of clinicians with different levels of experience (p-values < 0.01 for all measures). Our study has successfully developed an Explainable AI model for real-time feedback to clinicians performing fetal growth scans. This work contributes to the existing literature by addressing the gap in the clinical validation of Explainable AI models within fetal medicine, emphasizing the importance of multi-level, cross-institutional, and prospective evaluation with clinician end-users.
The prospective clinical validation uncovered challenges and opportunities that could not have been anticipated if we had only focused on retrospective development and validation, such as leveraging AI to gauge operator competence in fetal ultrasound.
Affiliation(s)
- Zahra Bashir: Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Department of Obstetrics and Gynecology, Slagelse Hospital, Slagelse, Denmark; Copenhagen Academy for Medical Education and Simulation (CAMES), Rigshospitalet, Denmark
- Manxi Lin: Technical University of Denmark (DTU), Lyngby, Denmark
- Aasa Feragen: Technical University of Denmark (DTU), Lyngby, Denmark
- Kamil Mikolaj: Technical University of Denmark (DTU), Lyngby, Denmark
- Caroline Taksøe-Vester: Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Copenhagen Academy for Medical Education and Simulation (CAMES), Rigshospitalet, Denmark; Center of Fetal Medicine, Dept. of Obstetrics, Copenhagen University Hospital, Rigshospitalet, Denmark
- Morten B S Svendsen: Copenhagen Academy for Medical Education and Simulation (CAMES), Rigshospitalet, Denmark
- Mette Hvilshøj Fabricius: Department of Obstetrics and Gynecology, Slagelse Hospital, Slagelse, Denmark
- Lisbeth Andreasen: Department of Obstetrics and Gynecology, Hvidovre Hospital, Hvidovre, Denmark
- Mads Nielsen: Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Martin Grønnebæk Tolsgaard: Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Copenhagen Academy for Medical Education and Simulation (CAMES), Rigshospitalet, Denmark; Center of Fetal Medicine, Dept. of Obstetrics, Copenhagen University Hospital, Rigshospitalet, Denmark
13
Chouvarda I, Colantonio S, Verde ASC, Jimenez-Pastor A, Cerdá-Alberich L, Metz Y, Zacharias L, Nabhani-Gebara S, Bobowicz M, Tsakou G, Lekadir K, Tsiknakis M, Martí-Bonmati L, Papanikolaou N. Differences in technical and clinical perspectives on AI validation in cancer imaging: mind the gap! Eur Radiol Exp 2025; 9:7. [PMID: 39812924 PMCID: PMC11735720 DOI: 10.1186/s41747-024-00543-0]
Abstract
Good practices in artificial intelligence (AI) model validation are key to achieving trustworthy AI. Within the cancer imaging domain, which attracts both clinical and technical AI enthusiasts, this work discusses current gaps in AI validation strategies, examining existing practices that are common or variable across technical groups (TGs) and clinical groups (CGs). The work is based on a set of structured questions encompassing several AI validation topics, addressed to professionals working in AI for medical imaging. A total of 49 responses were obtained and analysed to identify trends and patterns. While TGs valued transparency and traceability the most, CGs pointed out the importance of explainability. Among the topics where TGs may benefit from further exposure are stability and robustness checks and mitigation of fairness issues. CGs, on the other hand, seemed more reluctant towards synthetic data for validation and would benefit from exposure to cross-validation techniques and segmentation metrics. Topics emerging from the open questions were utility, capability, adoption, and trustworthiness. These findings on current trends in AI validation strategies may guide the creation of guidelines for training the next generation of professionals working with AI in healthcare and contribute to bridging the technical-clinical gap in AI validation. RELEVANCE STATEMENT: This study identifies current gaps in understanding and applying AI validation strategies in cancer imaging and helps promote trust and adoption within interdisciplinary teams of technical and clinical researchers. KEY POINTS: Clinical and technical researchers emphasise interpretability, external validation with diverse data, and bias awareness in AI validation for cancer imaging. In cancer imaging AI research, clinical researchers prioritise explainability, while technical researchers focus on transparency and traceability and see potential in synthetic datasets.
Researchers advocate for greater homogenisation of AI validation practices in cancer imaging.
Affiliation(s)
- Ioanna Chouvarda: School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Sara Colantonio: Institute of Information Science and Technologies of the National Research Council of Italy, Pisa, Italy
- Ana S C Verde: Computational Clinical Imaging Group (CCIG), Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal
- Leonor Cerdá-Alberich: Biomedical Imaging Research Group (GIBI230), La Fe Health Research Institute, Valencia, Spain
- Yannick Metz: Data Analysis and Visualization, University of Konstanz, Konstanz, Germany
- Shereen Nabhani-Gebara: Faculty of Health, Science, Social Care & Education, Kingston University London, London, UK
- Maciej Bobowicz: 2nd Department of Radiology, Medical University of Gdansk, Gdansk, Poland
- Gianna Tsakou: Research and Development Lab, Gruppo Maggioli Greek Branch, Maroussi, Greece
- Karim Lekadir: Departament de Matemàtiques i Informàtica, Artificial Intelligence in Medicine Lab (BCN-AIM), Universitat de Barcelona, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Manolis Tsiknakis: Computational BioMedicine Laboratory (CBML), Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece
- Luis Martí-Bonmati: Biomedical Imaging Research Group (GIBI230), La Fe Health Research Institute, Valencia, Spain; Radiology Department, La Fe Polytechnic and University Hospital and Health Research Institute, Valencia, Spain
- Nikolaos Papanikolaou: Computational Clinical Imaging Group (CCIG), Champalimaud Research, Champalimaud Foundation, Lisbon, Portugal
14
Zhao H, Xu Z, Chen L, Wu L, Cui Z, Ma J, Sun T, Lei Y, Wang N, Hu H, Tan Y, Lu W, Yang W, Liao K, Teng G, Liang X, Li Y, Feng C, Nie T, Han X, Xiang D, Majoie CBLM, van Zwam WH, van der Lugt A, van der Sluijs PM, van Walsum T, Feng Y, Liu G, Huang Y, Liu W, Kan X, Su R, Zhang W, Wang X, Zheng C. Large-scale pretrained frame generative model enables real-time low-dose DSA imaging: An AI system development and multi-center validation study. Med 2025; 6:100497. [PMID: 39163857 DOI: 10.1016/j.medj.2024.07.025]
Abstract
BACKGROUND Digital subtraction angiography (DSA) devices are commonly used in numerous interventional procedures across various parts of the body, necessitating multiple scans per procedure and resulting in significant radiation exposure for both doctors and patients. Inspired by generative artificial intelligence techniques, this study proposes GenDSA, a real-time, low-dose DSA imaging system based on a large-scale pretrained multi-frame generative model. METHODS GenDSA was developed to generate 1-, 2-, and 3-frame sequences following each real frame. A large-scale dataset comprising ∼3 million DSA images from 27,117 patients across 10 hospitals was constructed to pretrain, fine-tune, and validate GenDSA; two further datasets from 25 hospitals were used for evaluation. Objective evaluations included SSIM and PSNR. Five interventional radiologists independently assessed the quality of the generated frames using a Likert scale and a visual Turing test. Scoring consistency among the radiologists was measured with Kendall's coefficient of concordance (W), and Fleiss' kappa was used for inter-rater agreement analysis of the visual Turing tests. FINDINGS Using only one-third of the clinical radiation dose, videos generated by GenDSA were perfectly consistent with real videos. Objective evaluations demonstrated that GenDSA's performance (PSNR = 36.83, SSIM = 0.911, generation time = 0.07 s/frame) surpassed state-of-the-art algorithms. Subjective ratings and statistical results from five doctors indicated no significant difference between real and generated videos. Furthermore, the generated videos were comparable to real videos in overall quality (4.905 vs. 4.935) and lesion assessment (4.825 vs. 4.860). CONCLUSIONS With clear clinical and translational value, GenDSA can significantly reduce radiation damage to both doctors and patients during DSA-guided procedures.
FUNDING This study was supported by the National Key R&D Program and the National Natural Science Foundation of China.
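Of the two objective metrics above, PSNR reduces to a few lines once frames are compared pixel-wise. A minimal sketch (images flattened to 1-D lists for simplicity; SSIM requires a windowed computation and is omitted, and this is not the study's implementation):

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    pixel sequences, here assumed 8-bit (peak value 255)."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return float("inf")          # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

Higher values indicate a generated frame closer to the real one; the study's reported 36.83 dB would correspond to a mean squared error of roughly 13.5 on this 8-bit scale.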
Affiliation(s)
- Huangxuan Zhao: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ziyang Xu: School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
- Lei Chen: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Linxia Wu: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ziwei Cui: School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
- Jinqiang Ma: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Tao Sun: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yu Lei: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Nan Wang: Department of Radiology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hongyao Hu: Department of Interventional Radiology, Renmin Hospital of Wuhan University, Wuhan, China
- Yiqing Tan: Department of Radiology, Tongren Hospital of Wuhan University (Wuhan Third Hospital), Wuhan University, Wuhan, China
- Wei Lu: Department of Interventional Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Wenzhong Yang: Department of Radiology, Maternal and Child Health Hospital of Hubei Province, Wuhan, China
- Kaibing Liao: Department of Radiology, Hubei Integrated Traditional Chinese and Western Medicine Hospital, Wuhan, China
- Gaojun Teng: Department of Radiology, Zhongda Hospital, Medical School, Southeast University, Nanjing, China
- Xiaoyun Liang: Institute of Research and Clinical Innovations, Neusoft Medical Systems Co., Ltd., Shanghai, China
- Yi Li: Institute of Research and Clinical Innovations, Neusoft Medical Systems Co., Ltd., Shanghai, China
- Congcong Feng: CV Systems Research and Development Department, Neusoft Medical Systems Co., Ltd., Shenyang, China
- Tong Nie: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xiaoyu Han: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Dongqiao Xiang: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Charles B L M Majoie: Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, location AMC, Amsterdam, the Netherlands
- Wim H van Zwam: Department of Radiology and Nuclear Medicine, Cardiovascular Research Institute Maastricht, Maastricht University Medical Center, Maastricht, the Netherlands
- Aad van der Lugt: Department of Radiology & Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
- P Matthijs van der Sluijs: Department of Radiology & Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
- Theo van Walsum: Department of Radiology & Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
- Yun Feng: Center for Biological Imaging, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Guoli Liu: CV Business Unit, Neusoft Medical Systems Co., Ltd., Shenyang, China
- Yan Huang: CV Business Unit, Neusoft Medical Systems Co., Ltd., Shenyang, China
- Wenyu Liu: School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
- Xuefeng Kan: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ruisheng Su: Department of Radiology & Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
- Weihua Zhang: Department of Radiology, Zhongda Hospital, Medical School, Southeast University, Nanjing, China
- Xinggang Wang: School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
- Chuansheng Zheng: Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
15
Barry N, Kendrick J, Molin K, Li S, Rowshanfarzad P, Hassan GM, Dowling J, Parizel PM, Hofman MS, Ebert MA. Evaluating the impact of the Radiomics Quality Score: a systematic review and meta-analysis. Eur Radiol 2025:10.1007/s00330-024-11341-y. [PMID: 39794540 DOI: 10.1007/s00330-024-11341-y]
Abstract
OBJECTIVES To conduct a systematic review and meta-analysis of the application of the Radiomics Quality Score (RQS). MATERIALS AND METHODS A search was conducted from January 1, 2022, to December 31, 2023, for systematic reviews that implemented the RQS; articles prior to 2022 were identified via a previously published review. Quality scores of individual radiomics papers, their associated criteria scores, and scores from all readers were extracted. Errors in the application of RQS criteria were noted and corrected. The RQS of radiomics papers was matched with publication date, imaging modality, and country, where available. RESULTS A total of 130 systematic reviews were included; individual quality scores were extracted from 117/130 (90.0%), criteria scores from 98/130 (75.4%), and multiple-reader data from 24/130 (18.5%). In all, 3258 quality scores were correlated with the radiomics studies' dates of publication. Criteria scoring errors were discovered in 39/98 (39.8%) of articles. Overall mean RQS was 9.4 ± 6.4 (95% CI, 9.1-9.6), or 26.1% ± 17.8% (25.3%-26.7%) of the maximum. Quality scores were positively correlated with publication year (Pearson r = 0.32, p < 0.01) and significantly higher after publication of the RQS (year < 2018: 5.6 ± 6.1 (5.1-6.1); year ≥ 2018: 10.1 ± 6.1 (9.9-10.4); p < 0.01). Only 233/3258 (7.2%) scores were ≥ 50% of the maximum RQS. Quality scores differed significantly across imaging modalities (p < 0.01). Ten criteria were positively correlated with publication year, and one was negatively correlated. CONCLUSION Radiomics studies' adherence to the RQS is increasing with time, although the vast majority of studies are developmental and rarely provide a high level of evidence to justify the clinical translation of proposed models. KEY POINTS Question What level of adherence to the Radiomics Quality Score have radiomics studies achieved to date, has it increased with time, and is it sufficient?
Findings A meta-analysis of 3258 quality scores extracted from 130 review articles resulted in a mean score of 9.4 ± 6.4. Quality scores were positively correlated with time. Clinical relevance Although quality scores of radiomics studies have increased with time, many studies have not demonstrated sufficient evidence for clinical translation. As new appraisal tools emerge, the current role of the Radiomics Quality Score may change.
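The score-versus-year trend above is summarized by a Pearson correlation. For reference, the coefficient is just the normalized covariance of the two sequences; a stdlib sketch (the year/score pairs in the test are invented, not the review's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences:
    covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Applied to (publication year, quality score) pairs, a positive value like the reported r = 0.32 indicates a modest upward trend in adherence over time.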
Affiliation(s)
- Nathaniel Barry: School of Physics, Mathematics and Computing, University of Western Australia, Crawley, WA, Australia; Centre for Advanced Technologies in Cancer Research (CATCR), Perth, WA, Australia
- Jake Kendrick: School of Physics, Mathematics and Computing, University of Western Australia, Crawley, WA, Australia; Centre for Advanced Technologies in Cancer Research (CATCR), Perth, WA, Australia; Australian Centre for Quantitative Imaging, Medical School, University of Western Australia, Crawley, WA, Australia
- Kaylee Molin: School of Physics, Mathematics and Computing, University of Western Australia, Crawley, WA, Australia
- Suning Li: School of Physics, Mathematics and Computing, University of Western Australia, Crawley, WA, Australia; Australian Centre for Quantitative Imaging, Medical School, University of Western Australia, Crawley, WA, Australia; Department of Radiation Oncology, Sir Charles Gairdner Hospital, Nedlands, WA, Australia
- Pejman Rowshanfarzad: School of Physics, Mathematics and Computing, University of Western Australia, Crawley, WA, Australia; Centre for Advanced Technologies in Cancer Research (CATCR), Perth, WA, Australia
- Ghulam M Hassan: School of Physics, Mathematics and Computing, University of Western Australia, Crawley, WA, Australia; Australian Centre for Quantitative Imaging, Medical School, University of Western Australia, Crawley, WA, Australia
- Jason Dowling: The Australian e-Health Research Centre, CSIRO, Brisbane, QLD, Australia
- Paul M Parizel: David Hartley Chair of Radiology, Royal Perth Hospital and University of Western Australia, Perth, WA, Australia; Medical School, University of Western Australia, Perth, WA, Australia
- Michael S Hofman: Prostate Cancer Theranostics and Imaging Centre of Excellence (ProsTIC), Molecular Imaging and Therapeutic Nuclear Medicine, Cancer Imaging, Peter MacCallum Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Martin A Ebert: School of Physics, Mathematics and Computing, University of Western Australia, Crawley, WA, Australia; Centre for Advanced Technologies in Cancer Research (CATCR), Perth, WA, Australia; Australian Centre for Quantitative Imaging, Medical School, University of Western Australia, Crawley, WA, Australia; Department of Radiation Oncology, Sir Charles Gairdner Hospital, Nedlands, WA, Australia
16
Koyun M, Cevval ZK, Reis B, Ece B. Detection of Intracranial Hemorrhage from Computed Tomography Images: Diagnostic Role and Efficacy of ChatGPT-4o. Diagnostics (Basel) 2025; 15:143. [PMID: 39857027 PMCID: PMC11763562 DOI: 10.3390/diagnostics15020143]
Abstract
Background/Objectives: The role of artificial intelligence (AI) in radiological image analysis is rapidly evolving. This study evaluates the diagnostic performance of Chat Generative Pre-trained Transformer Omni (GPT-4 Omni) in detecting intracranial hemorrhage (ICH) on non-contrast computed tomography (NCCT) images, along with its ability to classify hemorrhage type, stage, anatomical location, and associated findings. Methods: A retrospective study was conducted using 240 cases, comprising 120 ICH cases and 120 controls with normal findings. Five consecutive NCCT slices per case were selected by radiologists and analyzed by ChatGPT-4o using a standardized prompt with nine questions. Diagnostic accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated by comparing the model's results with radiologists' assessments (the gold standard). After a two-week interval, the same dataset was re-evaluated to assess intra-observer reliability and consistency. Results: ChatGPT-4o achieved 100% accuracy in identifying the imaging modality. For ICH detection, the model demonstrated a diagnostic accuracy of 68.3%, sensitivity of 79.2%, specificity of 57.5%, PPV of 65.1%, and NPV of 73.4%. It correctly classified 34.0% of hemorrhage types and 7.3% of localizations. All ICH-positive cases were labeled as acute phase (100%). In the second evaluation, diagnostic accuracy improved to 73.3%, with a sensitivity of 86.7% and a specificity of 60%. Cohen's kappa for intra-observer agreement in ICH detection indicated moderate agreement (κ = 0.469). Conclusions: ChatGPT-4o shows promise in identifying imaging modalities and the presence of ICH but demonstrates limitations in localization and hemorrhage-type classification. These findings highlight its potential for improvement through targeted training for medical applications.
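The reported detection metrics all follow from a single 2x2 confusion matrix. The counts below are back-calculated from the reported rates (120 ICH-positive and 120 negative cases imply TP = 95, FN = 25, TN = 69, FP = 51); the kappa shown is the generic 2x2 formula against the reference standard, whereas the study's κ = 0.469 compared the model's two reading sessions:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic metrics plus Cohen's kappa."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    sens = tp / (tp + fn)            # sensitivity (recall)
    spec = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    # chance agreement for kappa: P(both positive) + P(both negative)
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (acc - pe) / (1 - pe)
    return acc, sens, spec, ppv, npv, kappa
```

Plugging in the reconstructed counts reproduces the abstract's accuracy (68.3%), sensitivity (79.2%), specificity (57.5%), PPV (65.1%), and NPV (73.4%).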
Collapse
Affiliation(s)
- Mustafa Koyun
- Department of Radiology, Kastamonu Training and Research Hospital, Kastamonu 37150, Turkey
- Zeycan Kubra Cevval
- Department of Radiology, Kastamonu Training and Research Hospital, Kastamonu 37150, Turkey
- Bahadir Reis
- Department of Radiology, Kastamonu University, Kastamonu 37150, Turkey
- Bunyamin Ece
- Department of Radiology, Kastamonu University, Kastamonu 37150, Turkey
17
Taghavi RM, Shah A, Filkov V, Goldman RE. Deep Learning Models for Automatic Classification of Anatomic Location in Abdominopelvic Digital Subtraction Angiography. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025:10.1007/s10278-024-01351-z. [PMID: 39789320 DOI: 10.1007/s10278-024-01351-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/11/2024] [Revised: 11/11/2024] [Accepted: 11/19/2024] [Indexed: 01/12/2025]
Abstract
PURPOSE To explore the information in routine digital subtraction angiography (DSA) and evaluate deep learning algorithms for automated identification of anatomic location in DSA sequences. METHODS DSA of the abdominal aorta, celiac, superior mesenteric, inferior mesenteric, and bilateral external iliac arteries was labeled with the anatomic location from retrospectively collected endovascular procedures performed between 2010 and 2020 at a tertiary care medical center. "Key" images within each sequence demonstrating the parent vessel and the first bifurcation were additionally labeled. Mode models aggregating single-image predictions, trained with either the full or "key" datasets, and a multiple instance learning (MIL) model were developed for location classification of the DSA sequences. Model performance was evaluated with a primary endpoint of multiclass classification accuracy and compared by McNemar's test. RESULTS A total of 819 unique angiographic sequences from 205 patients and 276 procedures were included in the training, validation, and testing data and split into partitions at the patient level to preclude data leakage. The data demonstrate substantial information sparsity, as only a minority of the images were designated as "key" with sufficient information for localization by a domain expert. A Mode model, trained and tested with "key" images, demonstrated an overall multiclass classification accuracy of 0.975 (95% CI 0.941-1). A MIL model, trained and tested with all data, demonstrated an overall multiclass classification accuracy of 0.966 (95% CI 0.932-0.992). Both the Mode model with "key" images (p < 0.001) and the MIL model (p < 0.001) significantly outperformed a Mode model trained and tested with the full dataset. The MIL model also automatically identified a set of top-5 images with an average overlap of 92.5% with the manually labelled "key" images.
CONCLUSION Deep learning algorithms can identify anatomic locations in abdominopelvic DSA with high fidelity using manual or automatic methods to manage information sparsity.
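The "Mode model" strategy described above is simple to sketch: a per-image classifier labels every frame in a DSA sequence, and the sequence-level prediction is the most frequent (mode) label. A minimal illustration, with hypothetical per-frame labels standing in for the paper's vessel classes:

```python
from collections import Counter

def mode_aggregate(frame_predictions):
    """Sequence-level label = most frequent per-frame prediction (majority vote)."""
    if not frame_predictions:
        raise ValueError("empty sequence")
    return Counter(frame_predictions).most_common(1)[0][0]

# Hypothetical per-frame classifier outputs for one angiographic run:
# early frames are ambiguous, but most frames agree on the parent vessel.
frames = ["aorta", "celiac", "celiac", "celiac", "sma", "celiac"]
print(mode_aggregate(frames))  # -> celiac
```

This also shows why "key"-image filtering helps: uninformative frames dilute the vote, so restricting aggregation to informative frames can raise sequence-level accuracy.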
Affiliation(s)
- Reza Moein Taghavi
- Department of Radiology, UC Davis School of Medicine, University of California, Davis, 4860 Y Street, Suite 3100, Sacramento, CA, 95817-2307, USA
- Amol Shah
- Department of Radiology, UC Davis School of Medicine, University of California, Davis, 4860 Y Street, Suite 3100, Sacramento, CA, 95817-2307, USA
- Vladimir Filkov
- Department of Computer Science, University of California, Davis, Davis, CA, USA
- Roger Eric Goldman
- Department of Radiology, UC Davis School of Medicine, University of California, Davis, 4860 Y Street, Suite 3100, Sacramento, CA, 95817-2307, USA
18
Sequí-Sabater JM, Benavent D. Artificial intelligence in rheumatology research: what is it good for? RMD Open 2025; 11:e004309. [PMID: 39778924 PMCID: PMC11748787 DOI: 10.1136/rmdopen-2024-004309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2024] [Accepted: 12/08/2024] [Indexed: 01/11/2025] Open
Abstract
Artificial intelligence (AI) is transforming rheumatology research, with a myriad of studies aiming to improve diagnosis, prognosis, and treatment prediction, while also showing potential to optimise the research workflow, improve drug discovery, and streamline clinical trials. Machine learning, a key element of discriminative AI, has demonstrated the ability to accurately classify rheumatic diseases and predict therapeutic outcomes using diverse data types, including structured databases, imaging, and text. In parallel, generative AI, driven by large language models, is becoming a powerful tool for optimising the research workflow by supporting content generation, literature review automation, and clinical decision support. This review explores the current applications and future potential of both discriminative and generative AI in rheumatology. It also highlights the challenges posed by these technologies, such as ethical concerns and the need for rigorous validation and regulatory oversight. The integration of AI in rheumatology promises substantial advancements but requires a balanced approach to maximise benefits and minimise potential downsides.
Affiliation(s)
- José Miguel Sequí-Sabater
- Rheumatology Department, La Ribera University Hospital, Alzira, Spain
- Rheumatology Department, La Fe University and Polytechnic Hospital, Valencia, Spain
- Division of Rheumatology, Department of Medicine Solna, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Diego Benavent
- Rheumatology Department, Hospital Universitari de Bellvitge, L'Hospitalet de Llobregat, Barcelona, Spain
19
Russo L, Bottazzi S, Kocak B, Zormpas-Petridis K, Gui B, Stanzione A, Imbriaco M, Sala E, Cuocolo R, Ponsiglione A. Evaluating the quality of radiomics-based studies for endometrial cancer using RQS and METRICS tools. Eur Radiol 2025; 35:202-214. [PMID: 39014086 PMCID: PMC11632020 DOI: 10.1007/s00330-024-10947-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2024] [Revised: 05/15/2024] [Accepted: 06/19/2024] [Indexed: 07/18/2024]
Abstract
OBJECTIVE To assess the methodological quality of radiomics-based models in endometrial cancer using the radiomics quality score (RQS) and the METhodological radiomICs score (METRICS). METHODS We systematically reviewed studies published by October 30th, 2023. Inclusion criteria were original radiomics studies on endometrial cancer using CT, MRI, PET, or ultrasound. Articles underwent a quality assessment by novice and expert radiologists using RQS and METRICS. The inter-rater reliability for RQS and METRICS among radiologists with varying expertise was determined. Subgroup analyses were performed to assess whether scores varied according to study topic, imaging technique, publication year, and journal quartile. RESULTS Sixty-eight studies were analysed, with a median RQS of 11 (IQR, 9-14) and a median METRICS score of 67.6% (IQR, 58.8-76.0); two different articles reached the maximum RQS of 19 and the maximum METRICS score of 90.7%, respectively. Most studies utilised MRI (82.3%) and machine learning methods (88.2%). Characterisation and recurrence risk stratification were the most explored outcomes, featured in 35.3% and 19.1% of articles, respectively. High inter-rater reliability was observed for both RQS (ICC: 0.897; 95% CI: 0.821, 0.946) and METRICS (ICC: 0.959; 95% CI: 0.928, 0.979). Methodological limitations such as lack of external validation suggest areas for improvement. Subgroup analyses revealed no statistically significant differences. CONCLUSIONS While RQS rated the quality of endometrial cancer radiomics research as unsatisfactory, METRICS depicted good overall quality. Our study highlights the need for strict compliance with quality metrics. Adhering to these quality measures can increase the consistency of radiomics towards clinical application in the pre-operative management of endometrial cancer.
CLINICAL RELEVANCE STATEMENT Both the RQS and METRICS can function as instrumental tools for identifying different methodological deficiencies in endometrial cancer radiomics research. However, METRICS also reflected a focus on the practical applicability and clarity of documentation. KEY POINTS The topic of radiomics currently lacks standardisation, limiting clinical implementation. METRICS scores were generally higher than the RQS, reflecting differences in the development process and methodological content. A positive trend in METRICS score may suggest growing attention to methodological aspects in radiomics research.
Affiliation(s)
- Luca Russo
- Dipartimento di Scienze Radiologiche ed Ematologiche, Università Cattolica del Sacro Cuore, Rome, Italy
- Silvia Bottazzi
- Dipartimento di Scienze Radiologiche ed Ematologiche, Università Cattolica del Sacro Cuore, Rome, Italy
- Burak Kocak
- Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, Turkey
- Konstantinos Zormpas-Petridis
- Dipartimento Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Benedetta Gui
- Dipartimento Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Arnaldo Stanzione
- Department of Advanced Biomedical Sciences, University of Naples "Federico II", Naples, Italy
- Massimo Imbriaco
- Department of Advanced Biomedical Sciences, University of Naples "Federico II", Naples, Italy
- Evis Sala
- Dipartimento di Scienze Radiologiche ed Ematologiche, Università Cattolica del Sacro Cuore, Rome, Italy
- Dipartimento Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Renato Cuocolo
- Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy
- Andrea Ponsiglione
- Department of Advanced Biomedical Sciences, University of Naples "Federico II", Naples, Italy
20
Yu PLH, Chiu KWH, Lu J, Lui GC, Zhou J, Cheng HM, Mao X, Wu J, Shen XP, Kwok KM, Kan WK, Ho Y, Chan HT, Xiao P, Mak LY, Tsui VW, Hui C, Lam PM, Deng Z, Guo J, Ni L, Huang J, Yu S, Peng C, Li WK, Yuen MF, Seto WK. Application of a deep learning algorithm for the diagnosis of HCC. JHEP Rep 2025; 7:101219. [PMID: 39687602 PMCID: PMC11648772 DOI: 10.1016/j.jhepr.2024.101219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/08/2024] [Revised: 09/10/2024] [Accepted: 09/10/2024] [Indexed: 12/18/2024] Open
Abstract
Background & Aims Hepatocellular carcinoma (HCC) is characterized by a high mortality rate. The Liver Imaging Reporting and Data System (LI-RADS) results in a considerable number of indeterminate observations, rendering an accurate diagnosis difficult. Methods We developed four deep learning models for diagnosing HCC on computed tomography (CT) via a training-validation-testing approach. Thin-slice triphasic CT liver images and relevant clinical information were collected and processed for deep learning. HCC was diagnosed and verified via a 12-month clinical composite reference standard. CT observations among at-risk patients were annotated using LI-RADS. Diagnostic performance was assessed by internal validation and independent external testing. We conducted sensitivity analyses of different subgroups, deep learning explainability evaluation, and misclassification analysis. Results From 2,832 patients and 4,305 CT observations, the best-performing model was the Spatio-Temporal 3D Convolution Network (ST3DCN), achieving areas under the receiver-operating-characteristic curve (AUCs) of 0.919 (95% CI, 0.903-0.935) and 0.901 (95% CI, 0.879-0.924) at the observation (n = 1,077) and patient (n = 685) levels, respectively, during internal validation, compared with 0.839 (95% CI, 0.814-0.864) and 0.822 (95% CI, 0.790-0.853), respectively, for standard-of-care radiological interpretation. The negative predictive values of ST3DCN were 0.966 (95% CI, 0.954-0.979) and 0.951 (95% CI, 0.931-0.971), respectively. The observation-level AUCs among at-risk patients, 2-5-cm observations, and singular portovenous phase analysis of ST3DCN were 0.899 (95% CI, 0.874-0.924), 0.872 (95% CI, 0.838-0.909), and 0.912 (95% CI, 0.895-0.929), respectively. In external testing (551/717 patients/observations), the AUC of ST3DCN was 0.901 (95% CI, 0.877-0.924), which was non-inferior to radiological interpretation (AUC 0.900; 95% CI, 0.877-0.923).
Conclusions ST3DCN achieved strong, robust performance for accurate HCC diagnosis on CT. Thus, deep learning can expedite and improve the process of diagnosing HCC. Impact and implications The clinical applicability of deep learning in HCC diagnosis is potentially huge, especially considering the expected increase in the incidence and mortality of HCC worldwide. Early diagnosis through deep learning can lead to earlier definitive management, particularly for at-risk patients. The model can be broadly deployed for patients undergoing a triphasic contrast CT scan of the liver to reduce the currently high mortality rate of HCC.
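The AUCs that anchor this abstract have a useful rank-based reading: the AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case (the Mann-Whitney formulation). A minimal sketch on toy scores (illustrative only, not the study's data):

```python
def auc_mann_whitney(labels, scores):
    """AUC as P(score_pos > score_neg), counting ties as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Compare every positive score against every negative score.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: scores mostly, but not perfectly, separate HCC from non-HCC.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]
print(auc_mann_whitney(labels, scores))  # -> 0.75
```

Read this way, the model's observation-level AUC of 0.919 means a randomly sampled HCC observation outscored a randomly sampled non-HCC observation about 92% of the time.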
Affiliation(s)
- Philip Leung Ho Yu
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong, China
- Keith Wan-Hang Chiu
- Department of Diagnostic Radiology, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Department of Radiology and Imaging, Queen Elizabeth Hospital, Hong Kong, China
- Department of Medical Imaging, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Jianliang Lu
- Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Gilbert C.S. Lui
- Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong, China
- Jian Zhou
- Department of Medical Imaging, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Ho-Ming Cheng
- Department of Medical Imaging, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Xianhua Mao
- Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Juan Wu
- Department of Medicine, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Xin-Ping Shen
- Department of Medical Imaging, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- King Ming Kwok
- Department of Diagnostic and Interventional Radiology, Kwong Wah Hospital, Hong Kong, China
- Wai Kuen Kan
- Department of Radiology, Pamela Youde Nethersole Eastern Hospital, Hong Kong, China
- Y.C. Ho
- Department of Radiology, Queen Mary Hospital, Hong Kong, China
- Hung Tat Chan
- Department of Medical Imaging, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Peng Xiao
- Department of Medicine, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Lung-Yi Mak
- Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- State Key Laboratory of Liver Research, The University of Hong Kong, Hong Kong, China
- Vivien W.M. Tsui
- Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Cynthia Hui
- Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Pui Mei Lam
- Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Zijie Deng
- Department of Medicine, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Jiaqi Guo
- Department of Medicine, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Li Ni
- Department of Medicine, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Jinhua Huang
- Department of Minimal Invasive Interventional Therapy, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Sarah Yu
- Department of Diagnostic Radiology, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Chengzhi Peng
- Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Wai Keung Li
- Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong, China
- Man-Fung Yuen
- Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Department of Medicine, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- State Key Laboratory of Liver Research, The University of Hong Kong, Hong Kong, China
- Wai-Kay Seto
- Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Hong Kong, China
- Department of Medicine, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- State Key Laboratory of Liver Research, The University of Hong Kong, Hong Kong, China
21
Gao C, Wu L, Wu W, Huang Y, Wang X, Sun Z, Xu M, Gao C. Deep learning in pulmonary nodule detection and segmentation: a systematic review. Eur Radiol 2025; 35:255-266. [PMID: 38985185 PMCID: PMC11632000 DOI: 10.1007/s00330-024-10907-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 04/09/2024] [Accepted: 05/10/2024] [Indexed: 07/11/2024]
Abstract
OBJECTIVES The accurate detection and precise segmentation of lung nodules on computed tomography are key prerequisites for early diagnosis and appropriate treatment of lung cancer. This study was designed to compare detection and segmentation methods for pulmonary nodules using deep-learning techniques, to fill methodological gaps and address biases in the existing literature. METHODS This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, searching PubMed, Embase, Web of Science Core Collection, and the Cochrane Library databases up to May 10, 2023. The Quality Assessment of Diagnostic Accuracy Studies 2 criteria were used to assess the risk of bias, adjusted with the Checklist for Artificial Intelligence in Medical Imaging. The study analyzed and extracted model performance, data sources, and task-focus information. RESULTS After screening, we included nine studies meeting our inclusion criteria. These studies were published between 2019 and 2023 and predominantly used public datasets, with the Lung Image Database Consortium Image Collection and Image Database Resource Initiative and Lung Nodule Analysis 2016 being the most common. The studies focused on detection, segmentation, and other tasks, primarily utilizing convolutional neural networks for model development. Performance evaluation covered multiple metrics, including sensitivity and the Dice coefficient. CONCLUSIONS This study highlights the potential power of deep learning in lung nodule detection and segmentation. It underscores the importance of standardized data processing, code and data sharing, the value of external test datasets, and the need to balance model complexity and efficiency in future research. CLINICAL RELEVANCE STATEMENT Deep learning demonstrates significant promise in autonomously detecting and segmenting pulmonary nodules.
Future research should address methodological shortcomings and variability to enhance its clinical utility. KEY POINTS Deep learning shows potential in the detection and segmentation of pulmonary nodules. There are methodological gaps and biases present in the existing literature. Factors such as external validation and transparency affect the clinical application.
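The Dice coefficient named among these evaluation metrics quantifies the overlap between a predicted segmentation mask and the reference mask: Dice = 2|A∩B| / (|A| + |B|). A minimal sketch on flat binary masks (illustrative only; real pipelines operate on 2D/3D voxel arrays):

```python
def dice(pred, truth):
    """Dice similarity of two same-length binary masks (1 = nodule voxel)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same shape")
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention: two empty masks are a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred  = [0, 1, 1, 1, 0, 0]   # hypothetical predicted mask
truth = [0, 1, 1, 0, 1, 0]   # hypothetical reference mask
print(dice(pred, truth))     # -> 2*2/6 ≈ 0.667
```

Unlike plain voxel accuracy, Dice ignores the (usually vast) true-negative background, which is why it is the standard metric for small structures such as nodules.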
Affiliation(s)
- Chuan Gao
- The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
- Linyu Wu
- The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
- Wei Wu
- The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
- Yichao Huang
- The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
- Xinyue Wang
- The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
- Zhichao Sun
- The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
- Maosheng Xu
- The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
- Chen Gao
- The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
22
Wang TW, Hong JS, Lee WK, Lin YH, Yang HC, Lee CC, Chen HC, Wu HM, You WC, Wu YT. Performance of Convolutional Neural Network Models in Meningioma Segmentation in Magnetic Resonance Imaging: A Systematic Review and Meta-Analysis. Neuroinformatics 2025; 23:14. [PMID: 39777602 PMCID: PMC11706894 DOI: 10.1007/s12021-024-09704-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/02/2024] [Indexed: 01/11/2025]
Abstract
BACKGROUND Meningioma, the most common primary brain tumor, presents significant challenges in MRI-based diagnosis and treatment planning due to its diverse manifestations. Convolutional Neural Networks (CNNs) have shown promise in improving the accuracy and efficiency of meningioma segmentation from MRI scans. This systematic review and meta-analysis assesses the effectiveness of CNN models in segmenting meningioma using MRI. METHODS Following the PRISMA guidelines, we searched PubMed, Embase, and Web of Science from their inception to December 20, 2023, to identify studies that used CNN models for meningioma segmentation in MRI. Methodological quality of the included studies was assessed using the CLAIM and QUADAS-2 tools. The primary variable was segmentation accuracy, which was evaluated using the Sørensen-Dice coefficient. Meta-analysis, subgroup analysis, and meta-regression were performed to investigate the effects of MRI sequence, CNN architecture, and training dataset size on model performance. RESULTS Nine studies, comprising 4,828 patients, were included in the analysis. The pooled Dice score across all studies was 89% (95% CI: 87-90%). Internal validation studies yielded a pooled Dice score of 88% (95% CI: 85-91%), while external validation studies reported a pooled Dice score of 89% (95% CI: 88-90%). Models trained on multiple MRI sequences consistently outperformed those trained on single sequences. Meta-regression indicated that training dataset size did not significantly influence segmentation accuracy. CONCLUSION CNN models are highly effective for meningioma segmentation in MRI, particularly when trained on diverse datasets spanning multiple MRI sequences. This finding highlights the importance of data quality and imaging sequence selection in the development of CNN models.
Standardization of MRI data acquisition and preprocessing may improve the performance of CNN models, thereby facilitating their clinical adoption for the optimal diagnosis and treatment of meningioma.
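The pooled Dice scores above come from a meta-analytic combination of per-study estimates. A common fixed-effect sketch (a simplification; a review like this would typically also fit a random-effects model) weights each study by the inverse of its variance, with the standard error back-derived from the reported 95% CI:

```python
def pooled_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling.

    studies: list of (estimate, ci_low, ci_high) tuples with 95% CIs.
    Returns (pooled_estimate, pooled_standard_error).
    """
    weights, weighted = [], []
    for est, lo, hi in studies:
        se = (hi - lo) / (2 * 1.96)      # SE recovered from a 95% CI width
        w = 1.0 / (se * se)              # inverse-variance weight
        weights.append(w)
        weighted.append(w * est)
    pooled = sum(weighted) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Hypothetical per-study Dice estimates with 95% CIs (illustrative numbers,
# not the nine studies in this review).
studies = [(0.88, 0.86, 0.90), (0.90, 0.86, 0.94)]
pooled, se = pooled_fixed_effect(studies)
print(round(pooled, 3))  # -> 0.884
```

Note how the tighter-CI study dominates the pooled value: precision, not just the point estimate, drives the result.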
Affiliation(s)
- Ting-Wei Wang
- Institute of Biophotonics, National Yang Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist, Taipei, 112304, Taiwan
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, 112304, Taiwan
- Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jia-Sheng Hong
- Institute of Biophotonics, National Yang Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist, Taipei, 112304, Taiwan
- Wei-Kai Lee
- Institute of Biophotonics, National Yang Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist, Taipei, 112304, Taiwan
- Yi-Hui Lin
- Department of Radiation Oncology, Taichung Veterans General Hospital, Taichung, 407219, Taiwan
- College of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, 300093, Taiwan
- Huai-Che Yang
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, 112304, Taiwan
- Department of Neurosurgery, Neurological Institute, Taipei Veterans General Hospital, Taipei, 112201, Taiwan
- Cheng-Chia Lee
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, 112304, Taiwan
- Department of Neurosurgery, Neurological Institute, Taipei Veterans General Hospital, Taipei, 112201, Taiwan
- Hung-Chieh Chen
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, 112304, Taiwan
- Department of Radiology, Taichung Veterans General Hospital, Taichung, 407219, Taiwan
- Hsiu-Mei Wu
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, 112304, Taiwan
- Department of Radiology, Taipei Veterans General Hospital, Taipei, 112201, Taiwan
- Weir Chiang You
- Department of Radiation Oncology, Taichung Veterans General Hospital, Taichung, 407219, Taiwan
- Yu-Te Wu
- Institute of Biophotonics, National Yang Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist, Taipei, 112304, Taiwan
23
Cavallo AU, Stanzione A, Ponsiglione A, Trotta R, Fanni SC, Ghezzo S, Vernuccio F, Klontzas ME, Triantafyllou M, Ugga L, Kalarakis G, Cannella R, Cuocolo R. Prostate cancer MRI methodological radiomics score: a EuSoMII radiomics auditing group initiative. Eur Radiol 2024:10.1007/s00330-024-11299-x. [PMID: 39739041 DOI: 10.1007/s00330-024-11299-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2024] [Revised: 09/05/2024] [Accepted: 10/10/2024] [Indexed: 01/02/2025]
Abstract
OBJECTIVES To evaluate the quality of radiomics research in prostate MRI for the evaluation of prostate cancer (PCa) through the assessment of the METhodological RadiomICs (METRICS) score, a new scoring tool recently introduced with the goal of fostering further improvement in radiomics and machine learning methodology. MATERIALS AND METHODS A literature search was conducted from July 1st, 2019, to November 30th, 2023, to identify original investigations assessing MRI-based radiomics in the setting of PCa. Seven readers with varying expertise performed a quality assessment using METRICS. Subgroup analyses were performed to assess whether the quality score varied according to paper category (diagnosis, staging, prognosis, technical) and whether quality ratings differed among these categories. RESULTS From a total of 1106 records, 185 manuscripts were available. Overall, the average METRICS total score was 52% ± 16%. ANOVA and chi-square tests revealed no statistically significant differences between subgroups. Items with the lowest positive scores were adherence to guidelines/checklists (4.9%), handling of confounding factors (14.1%), external testing (15.1%), and the availability of data (15.7%), code (4.3%), and models (1.6%). Conversely, most studies clearly defined patient selection criteria (86.5%), employed a high-quality reference standard (89.2%), and utilized a well-described (85.9%) and clinically applicable (87%) imaging protocol as a radiomics data source. CONCLUSION The quality of MRI-based radiomics research for PCa in recent studies demonstrated good homogeneity and overall moderate quality. KEY POINTS Question To evaluate the quality of MRI-based radiomics research for PCa, assessed through the METRICS score. Findings The average METRICS total score was 52%, reflecting moderate quality in MRI-based radiomics research for PCa, with no statistically significant differences between subgroups.
Clinical relevance Enhancing the quality of radiomics research can improve diagnostic accuracy for PCa, leading to better patient outcomes and more informed clinical decision-making.
Affiliation(s)
| | - Arnaldo Stanzione
- Department of Advanced Biomedical Sciences, University of Naples "Federico II", Naples, Italy
| | - Andrea Ponsiglione
- Department of Advanced Biomedical Sciences, University of Naples "Federico II", Naples, Italy.
| | - Romina Trotta
- Department of Radiology, Fatima Hospital, Seville, Spain
| | | | | | - Federica Vernuccio
- Section of Radiology, Department of Biomedicine, Neuroscience and Advanced Diagnostics (BIND), University of Palermo, Palermo, Italy
| | - Michail E Klontzas
- Artificial Intelligence and Translational Imaging (ATI) Lab, Department of Radiology, School of Medicine, University of Crete, Heraklion, Greece
- Department of Medical Imaging, University Hospital of Heraklion, Heraklion, Greece
- Division of Radiology, Department of Clinical Science, Intervention and Technology (CLINTEC), Karolinska Institutet, Stockholm, Sweden
| | - Matthaios Triantafyllou
- Artificial Intelligence and Translational Imaging (ATI) Lab, Department of Radiology, School of Medicine, University of Crete, Heraklion, Greece
- Department of Medical Imaging, University Hospital of Heraklion, Heraklion, Greece
| | - Lorenzo Ugga
- Department of Advanced Biomedical Sciences, University of Naples "Federico II", Naples, Italy
| | - Georgios Kalarakis
- Division of Radiology, Department of Clinical Science, Intervention and Technology (CLINTEC), Karolinska Institutet, Stockholm, Sweden
- Department of Neuroradiology, Karolinska University Hospital, Stockholm, Sweden
| | - Roberto Cannella
- Section of Radiology, Department of Biomedicine, Neuroscience and Advanced Diagnostics (BIND), University of Palermo, Palermo, Italy
| | - Renato Cuocolo
- Department of Medicine, Surgery, and Dentistry, University of Salerno, Baronissi, Italy
| |
24
Cao X, Xiong M, Liu Z, Yang J, Kan YB, Zhang LQ, Liu YH, Xie MG, Hu XF. Update report on the quality of gliomas radiomics: An integration of bibliometric and radiomics quality score. World J Radiol 2024; 16:794-805. [PMID: 39801663 PMCID: PMC11718527 DOI: 10.4329/wjr.v16.i12.794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Revised: 11/04/2024] [Accepted: 11/25/2024] [Indexed: 12/27/2024] Open
Abstract
BACKGROUND Despite the increasing number of publications on glioma radiomics, challenges persist in clinical translation. AIM To assess the development and reporting quality of radiomics in brain gliomas since 2019. METHODS A bibliometric analysis was conducted to reveal trends in brain glioma radiomics research. The Radiomics Quality Score (RQS), a metric for evaluating the quality of radiomics studies, was applied to assess adult-type diffuse glioma studies published since 2019. The total RQS score and the basic adherence rate for each item were calculated. Subgroup analyses by journal type and research objective were performed, and the total RQS score was correlated with journal impact factors. RESULTS Radiomics research in glioma began in 2011 and has surged since 2019. Among the 260 original studies, the median RQS score was 11, corresponding to a basic adherence rate of 46.8%. Subgroup analysis revealed significant differences in domain 1 and its subitems (multiple segmentations) across journal types (P = 0.039 and P = 0.03, respectively). Spearman correlation coefficients indicated that the total RQS score correlated negatively with the Journal Citation Report category (-0.69) and positively with journals' five-year impact factors (0.318). CONCLUSION The quality of glioma radiomics research has improved since 2019 but requires further advancement and higher publication standards.
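The rank-correlation analysis described above (total RQS score versus journal metrics) can be illustrated with a minimal Spearman sketch. This is a generic illustration, not the authors' code, and the sample RQS totals and impact factors below are hypothetical.

```python
def ranks(xs):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical total RQS scores and five-year impact factors
rqs = [8, 11, 14, 9, 16]
if5 = [2.1, 5.2, 3.0, 2.8, 6.0]
rho = spearman(rqs, if5)
```

A negative rho, as reported for the Journal Citation Report category, would indicate that higher-quality studies tended to appear in better-ranked (lower-numbered) categories.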
Affiliation(s)
- Xu Cao
- Department of Radiology, The People's Hospital of Shifang, Deyang 618400, Sichuan Province, China
- Department of Nuclear Medicine, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing 400038, Chongqing, China
| | - Ming Xiong
- Department of Digital Medicine, School of Biomedical Engineering and Medical Imaging, Third Military Medical University (Army Medical University), Chongqing 400038, China
| | - Zhi Liu
- Department of Radiology, Chongqing Hospital of Traditional Chinese Medicine, Chongqing 400000, China
| | - Jing Yang
- Department of Radiology, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing 400038, China
| | - Yu-Bo Kan
- School of Medical and Life Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 610075, Sichuan Province, China
| | - Li-Qiang Zhang
- Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
| | - Yan-Hui Liu
- Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu 610041, Sichuan Province, China
| | - Ming-Guo Xie
- Department of Radiology, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu 500643, Sichuan Province, China
| | - Xiao-Fei Hu
- Department of Nuclear Medicine, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing 400038, Chongqing, China
- Glioma Medicine Research Center, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing 400038, Chongqing, China
25
Hanna M, Pantanowitz L, Jackson B, Palmer O, Visweswaran S, Pantanowitz J, Deebajah M, Rashidi H. Ethical and Bias Considerations in Artificial Intelligence (AI)/Machine Learning. Mod Pathol 2024:100686. [PMID: 39694331] [DOI: 10.1016/j.modpat.2024.100686]
Abstract
As artificial intelligence (AI) gains prominence in pathology and medicine, the ethical implications and potential biases within such integrated AI models require careful scrutiny. Ethics and bias are important considerations in our practice settings, especially as an increasing number of machine learning (ML) systems are integrated within our various medical domains. Such ML-based systems have demonstrated remarkable capabilities in specified tasks such as, but not limited to, image recognition, natural language processing, and predictive analytics. However, bias within such AI-ML models can inadvertently lead to unfair and potentially detrimental outcomes. Bias within ML models can arise from numerous factors but can typically be placed in three main buckets (data bias, development bias, and interaction bias). Sources include the training data, algorithmic bias, feature engineering and selection issues, clinical and institutional bias (i.e., practice variability), reporting bias, and temporal bias (i.e., changes in technology, clinical practice, or disease patterns). Therefore, despite the potential of these AI-ML applications, their deployment in day-to-day practice raises noteworthy ethical concerns. Addressing ethics and bias in medicine requires a comprehensive evaluation process encompassing all aspects of such systems, from model development through clinical deployment. Addressing these biases is crucial to ensure that AI-ML systems remain fair, transparent, and beneficial to all. This review discusses the relevant ethical and bias considerations in AI-ML, specifically within the pathology and medical domains.
Affiliation(s)
- Matthew Hanna
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh, Pittsburgh, PA.
| | - Liron Pantanowitz
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh, Pittsburgh, PA
| | - Brian Jackson
- Department of Pathology, University of Utah, Salt Lake City, UT; ARUP Laboratories, Salt Lake City, UT
| | - Octavia Palmer
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh, Pittsburgh, PA
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
| | | | | | - Hooman Rashidi
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh, Pittsburgh, PA.
26
Spielvogel CP, Ning J, Kluge K, Haberl D, Wasinger G, Yu J, Einspieler H, Papp L, Grubmüller B, Shariat SF, Baltzer PAT, Clauser P, Hartenbach M, Kenner L, Hacker M, Haug AR, Rasul S. Preoperative detection of extraprostatic tumor extension in patients with primary prostate cancer utilizing [68Ga]Ga-PSMA-11 PET/MRI. Insights Imaging 2024; 15:299. [PMID: 39666257] [PMCID: PMC11638435] [DOI: 10.1186/s13244-024-01876-5]
Abstract
OBJECTIVES Radical prostatectomy (RP) is a common intervention in patients with localized prostate cancer (PCa), with nerve-sparing RP recommended to reduce adverse effects on patient quality of life. Accurate pre-operative detection of extraprostatic extension (EPE) remains challenging, often leading to suboptimal treatment. The aim of this study was to enhance pre-operative EPE detection through multimodal data integration using explainable machine learning (ML). METHODS Patients with newly diagnosed PCa who underwent [68Ga]Ga-PSMA-11 PET/MRI and subsequent RP were recruited retrospectively from two time ranges for training, cross-validation, and independent validation. The presence of EPE was determined from post-surgical histopathology and predicted using ML and pre-operative parameters, including PET/MRI-derived features, blood-based markers, histology-derived parameters, and demographic parameters. ML models were subsequently compared with conventional PET/MRI-based image readings. RESULTS The study involved 107 patients, 59 (55%) of whom had EPE according to postoperative findings, for the initial training and cross-validation. The ML models demonstrated superior diagnostic performance over conventional PET/MRI image readings, with the explainable boosting machine model achieving an AUC of 0.88 (95% CI 0.87-0.89) during cross-validation and an AUC of 0.88 (95% CI 0.75-0.97) during independent validation. The ML approach integrating invasive features demonstrated better predictive capability for EPE than visual clinical read-outs (cross-validation AUC 0.88 versus 0.71, p = 0.02). CONCLUSION ML based on routinely acquired clinical data can significantly improve the pre-operative detection of EPE in PCa patients, potentially enabling more accurate clinical staging and decision-making, thereby improving patient outcomes.
CRITICAL RELEVANCE STATEMENT This study demonstrates that integrating multimodal data with machine learning significantly improves the pre-operative detection of extraprostatic extension in prostate cancer patients, outperforming conventional imaging methods and potentially leading to more accurate clinical staging and better treatment decisions. KEY POINTS Extraprostatic extension is an important indicator guiding treatment approaches. Current assessment of extraprostatic extension is difficult and lacks accuracy. Machine learning improves detection of extraprostatic extension using PSMA-PET/MRI and histopathology.
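The AUC figures quoted above can be computed from raw model scores via the rank-based (Mann-Whitney) definition of AUC: the probability that a randomly chosen positive case outscores a randomly chosen negative one. The sketch below is a generic illustration with made-up scores, not the study's pipeline.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(positive outscores negative); ties count 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores for EPE-positive vs EPE-negative patients
epe_pos = [0.91, 0.78, 0.66, 0.84]
epe_neg = [0.30, 0.52, 0.71, 0.25]
roc_auc = auc(epe_pos, epe_neg)  # 15 of 16 pairs correctly ordered -> 0.9375
```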
Affiliation(s)
- Clemens P Spielvogel
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
| | - Jing Ning
- Christian Doppler Laboratory for Applied Metabolomics, Vienna, Austria
- Department of Pathology, Medical University of Vienna, Vienna, Austria
| | - Kilian Kluge
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Christian Doppler Laboratory for Applied Metabolomics, Vienna, Austria
| | - David Haberl
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
| | - Gabriel Wasinger
- Department of Pathology, Medical University of Vienna, Vienna, Austria
| | - Josef Yu
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
| | - Holger Einspieler
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
| | - Laszlo Papp
- Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
| | - Bernhard Grubmüller
- Department of Urology and Andrology, University Hospital Krems, Krems, Austria
- Karl Landsteiner University of Health Sciences, Krems, Austria
- Department of Urology, Medical University of Vienna, Vienna, Austria
| | - Shahrokh F Shariat
- Department of Urology, Medical University of Vienna, Vienna, Austria
- Department of Urology, University of Texas Southwestern Medical Center, Dallas, USA
- Division of Urology, Department of Special Surgery, The University of Jordan, Amman, Jordan
- Department of Urology, Second Faculty of Medicine, Charles University, Prague, Czech Republic
- Department of Urology, Weill Cornell Medical College, New York, USA
- Karl Landsteiner Institute of Urology and Andrology, Vienna, Austria
| | - Pascal A T Baltzer
- Department of Biomedical Imaging and Image-Guided Therapy, Division of General and Pediatric Radiology, Medical University of Vienna, Vienna, Austria
| | - Paola Clauser
- Department of Biomedical Imaging and Image-Guided Therapy, Division of General and Pediatric Radiology, Medical University of Vienna, Vienna, Austria
| | - Markus Hartenbach
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
| | - Lukas Kenner
- Christian Doppler Laboratory for Applied Metabolomics, Vienna, Austria
- Department of Pathology, Medical University of Vienna, Vienna, Austria
- Center for Biomarker Research in Medicine, Graz, Austria
- Unit for Pathology of Laboratory Animals, University of Veterinary Medicine Vienna, Vienna, Austria
| | - Marcus Hacker
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
| | - Alexander R Haug
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Christian Doppler Laboratory for Applied Metabolomics, Vienna, Austria
| | - Sazan Rasul
- Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria.
27
Positano V, Meloni A, De Santi LA, Pistoia L, Borsellino Z, Cossu A, Massei F, Sanna PMG, Santarelli MF, Cademartiri F. Convolutional neural networks for automatic MR classification of myocardial iron overload in thalassemia major patients. Eur Radiol 2024. [PMID: 39658686] [DOI: 10.1007/s00330-024-11245-x]
Abstract
OBJECTIVES To develop a deep-learning model for supervised classification of myocardial iron overload (MIO) from magnitude T2* multi-echo MR images. MATERIALS AND METHODS Eight hundred twenty-three cardiac magnitude T2* multi-slice, multi-echo MR images from 496 thalassemia major patients (285 females, 57%), labeled for MIO level (normal: T2* > 20 ms, moderate: 10 ≤ T2* ≤ 20 ms, severe: T2* < 10 ms), were retrospectively studied. Two 2D convolutional neural networks (CNNs) developed for multi-slice (MS-HippoNet) and single-slice (SS-HippoNet) analysis were trained using 5-fold cross-validation. Performance was assessed using micro-average, multi-class accuracy, and single-class accuracy, sensitivity, and specificity. CNN performance was compared with inter-observer agreement between radiologists on 20% of the patients. Agreement between patient classifications was assessed with Cohen's kappa test. RESULTS Among the 165 images in the test set, multi-class accuracies of 0.885 and 0.836 were obtained for MS- and SS-HippoNet, respectively. Network performance was confirmed on an external test set (multi-class accuracies of 0.827 and 0.793; 29 patients from the CHMMOTv1 database). The agreement between automatic and ground-truth classification was good (MS: κ = 0.771; SS: κ = 0.614), comparable with the inter-observer agreement (MS: κ = 0.872, SS: κ = 0.907) evaluated on the test set. CONCLUSION The developed networks classified MIO level from multi-echo, bright-blood T2* images with good performance. KEY POINTS Question MRI T2* represents the established clinical tool for MIO assessment; quality control of the image analysis is a problem in small centers. Findings Deep learning models can perform MIO staging with good accuracy, comparable to the inter-observer variability of the standard procedure.
Clinical relevance CNNs can perform automated staging of cardiac iron overload from multi-echo MR sequences, facilitating non-invasive evaluation of patients with various hematologic disorders.
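The staging rule and agreement metric used above can be stated concretely: the T2* thresholds given in the abstract (normal > 20 ms, moderate 10-20 ms, severe < 10 ms), and unweighted Cohen's kappa for agreement between two raters. A minimal sketch, illustrative only and not the study's code:

```python
def mio_class(t2star_ms):
    """Stage myocardial iron overload from T2* (thresholds from the abstract)."""
    if t2star_ms > 20:
        return "normal"
    if t2star_ms >= 10:
        return "moderate"
    return "severe"

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two label sequences of equal length."""
    labels = sorted(set(a) | set(b))
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                      # observed agreement
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)   # chance agreement
    return (po - pe) / (1 - pe)
```

Applied to the network's predicted classes versus the ground-truth labels, this is the κ reported in the RESULTS (e.g., κ = 0.771 for the multi-slice network).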
Affiliation(s)
- Vincenzo Positano
- Bioengineering Unit, Fondazione G. Monasterio CNR-Regione Toscana, Pisa, Italy.
- Department of Radiology, Fondazione G. Monasterio CNR-Regione Toscana, Pisa, Italy.
| | - Antonella Meloni
- Bioengineering Unit, Fondazione G. Monasterio CNR-Regione Toscana, Pisa, Italy
- Department of Radiology, Fondazione G. Monasterio CNR-Regione Toscana, Pisa, Italy
| | - Lisa Anita De Santi
- Bioengineering Unit, Fondazione G. Monasterio CNR-Regione Toscana, Pisa, Italy
- Department of Information Engineering, University of Pisa, Pisa, Italy
| | - Laura Pistoia
- Department of Radiology, Fondazione G. Monasterio CNR-Regione Toscana, Pisa, Italy
- Unità Operativa Complessa Ricerca Clinica, Fondazione G. Monasterio CNR-Regione Toscana, Pisa, Italy
| | - Zelia Borsellino
- Unità Operativa Complessa Ematologia con Talassemia, ARNAS Civico "Benfratelli-Di Cristina", Palermo, Italy
| | - Alberto Cossu
- Unità Operativa Radiologia Universitaria, Azienda Ospedaliero-Universitaria "S. Anna", Ferrara, Italy
| | - Francesco Massei
- UO Oncoematologia Pediatrica, Azienda Ospedaliero Universitaria Pisana, Pisa, Italy
| | | | | | - Filippo Cademartiri
- Department of Radiology, Fondazione G. Monasterio CNR-Regione Toscana, Pisa, Italy
28
Galldiks N, Lohmann P, Friedrich M, Werner JM, Stetter I, Wollring MM, Ceccon G, Stegmayr C, Krause S, Fink GR, Law I, Langen KJ, Tonn JC. PET imaging of gliomas: Status quo and quo vadis? Neuro Oncol 2024; 26:S185-S198. [PMID: 38970818] [PMCID: PMC11631135] [DOI: 10.1093/neuonc/noae078]
Abstract
PET imaging, particularly using amino acid tracers, has become a valuable adjunct to anatomical MRI in the clinical management of patients with glioma. Collaborative international efforts have led to the development of clinical and technical guidelines for PET imaging in gliomas. The increasing readiness of statutory health insurance agencies, especially in European countries, to reimburse amino acid PET underscores its growing importance in clinical practice. Integrating artificial intelligence and radiomics in PET imaging of patients with glioma may significantly improve tumor detection, segmentation, and response assessment. Efforts are ongoing to facilitate the clinical translation of these techniques. Considerable progress in computer technology (e.g., quantum computing) may help accelerate these efforts. Next-generation PET scanners, such as long-axial field-of-view PET/CT scanners, have improved image quality and body coverage and have therefore expanded the spectrum of indications for PET imaging in Neuro-Oncology (e.g., PET imaging of the whole spine). Encouraging results of clinical trials in patients with glioma have prompted the development of PET tracers directed at therapeutically relevant targets (e.g., mutant isocitrate dehydrogenase) for novel anticancer agents in gliomas to improve response assessment. In addition, the success of theranostics for the treatment of extracranial neoplasms such as neuroendocrine tumors and prostate cancer has prompted efforts to translate this approach to patients with glioma. These advancements highlight the evolving role of PET imaging in Neuro-Oncology, offering insights into tumor biology and treatment response, thereby informing personalized patient care. Nevertheless, these innovations warrant further validation in the near future.
Affiliation(s)
- Norbert Galldiks
- Department of Neurology, University Hospital of Cologne, University of Cologne, Cologne, Germany
- Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf (CIO ABCD), Germany
- Institute of Neuroscience and Medicine (INM-3, INM-4), Research Center Juelich, Juelich, Germany
| | - Philipp Lohmann
- Institute of Neuroscience and Medicine (INM-3, INM-4), Research Center Juelich, Juelich, Germany
- Department of Nuclear Medicine, University Hospital RWTH Aachen, Aachen, Germany
| | - Michel Friedrich
- Institute of Neuroscience and Medicine (INM-3, INM-4), Research Center Juelich, Juelich, Germany
| | - Jan-Michael Werner
- Department of Neurology, University Hospital of Cologne, University of Cologne, Cologne, Germany
| | - Isabelle Stetter
- Department of Neurology, University Hospital of Cologne, University of Cologne, Cologne, Germany
| | - Michael M Wollring
- Department of Neurology, University Hospital of Cologne, University of Cologne, Cologne, Germany
| | - Garry Ceccon
- Department of Neurology, University Hospital of Cologne, University of Cologne, Cologne, Germany
| | - Carina Stegmayr
- Institute of Neuroscience and Medicine (INM-3, INM-4), Research Center Juelich, Juelich, Germany
| | - Sandra Krause
- Institute of Neuroscience and Medicine (INM-3, INM-4), Research Center Juelich, Juelich, Germany
| | - Gereon R Fink
- Department of Neurology, University Hospital of Cologne, University of Cologne, Cologne, Germany
| | - Ian Law
- Department of Clinical Physiology and Nuclear Medicine, Copenhagen University Hospital-Rigshospitalet, Copenhagen, Denmark
| | - Karl-Josef Langen
- Institute of Neuroscience and Medicine (INM-3, INM-4), Research Center Juelich, Juelich, Germany
- Department of Nuclear Medicine, University Hospital RWTH Aachen, Aachen, Germany
| | - Joerg-Christian Tonn
- Department of Neurosurgery, University Hospital of Munich (LMU), Munich, Germany
29
Shelmerdine SC, Pauling C, Allan E, Langan D, Ashworth E, Yung KW, Barber J, Haque S, Rosewarne D, Woznitza N, Ather S, Novak A, Theivendran K, Arthurs OJ. Artificial intelligence (AI) for paediatric fracture detection: a multireader multicase (MRMC) study protocol. BMJ Open 2024; 14:e084448. [PMID: 39645256] [PMCID: PMC11628946] [DOI: 10.1136/bmjopen-2024-084448]
Abstract
INTRODUCTION Paediatric fractures are common but can be easily missed on radiography, leading to potentially serious implications including long-term pain, disability and missed opportunities for safeguarding in cases of inflicted injury. Artificial intelligence (AI) tools to assist fracture detection in adult patients exist, although their efficacy in children is less well known. This study aims to evaluate whether a commercially available AI tool (certified for paediatric use) improves fracture detection by healthcare professionals (HCPs), and how this may impact patient care, in a retrospective simulated study design. METHODS AND ANALYSIS Using a multicentric dataset of 500 paediatric radiographs across four body parts, the diagnostic performance of HCPs will be evaluated across two stages: first without, then with the assistance of an AI tool (BoneView, Gleamer), after a 4-week washout period. The dataset will contain a mixture of normal and abnormal cases. HCPs will be recruited across radiology, orthopaedics and emergency medicine. We will aim for 40 readers, with ~14 in each subspecialty, half being experienced consultants. For each radiograph, HCPs will evaluate the presence of a fracture, their confidence level and a suitable simulated management plan. Diagnostic accuracy will be judged against a consensus interpretation by an expert panel of two paediatric radiologists (ground truth). Multilevel logistic modelling techniques will be used to analyse and report diagnostic accuracy outcome measures for fracture detection. Descriptive statistics will evaluate changes in simulated patient management. ETHICS AND DISSEMINATION This study was granted approval by the National Health Service Health Research Authority and Health and Care Research Wales (REC Reference: 22/PR/0334). The IRAS Project ID is 274 278. Funding has been provided by the National Institute for Health and Care Research (NIHR) (Grant ID: NIHR-301322).
Findings from this study will be disseminated through peer-reviewed publications, conferences and non-peer-reviewed media and social media outlets. TRIAL REGISTRATION NUMBER ISRCTN12921105.
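The protocol's headline outcome measures, diagnostic accuracy against a ground-truth panel, reduce to standard confusion-matrix arithmetic per reader and stage. A minimal sketch with hypothetical reads (not the study's statistical analysis, which uses multilevel logistic modelling):

```python
def diagnostic_accuracy(reads, truth):
    """Fracture-detection metrics for one reader against the ground-truth panel.

    reads / truth: equal-length sequences of booleans (True = fracture called / present).
    """
    tp = sum(r and t for r, t in zip(reads, truth))
    tn = sum(not r and not t for r, t in zip(reads, truth))
    fp = sum(r and not t for r, t in zip(reads, truth))
    fn = sum(not r and t for r, t in zip(reads, truth))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(truth),
    }
```

Comparing these metrics between the unaided and AI-assisted stages, per reader, is the core of the planned MRMC comparison.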
Affiliation(s)
- Susan C Shelmerdine
- Clinical Radiology, Great Ormond Street Hospital for Children, London, UK
- UCL Great Ormond Street Institute of Child Health, London, UK
- Great Ormond Street Hospital NIHR Biomedical Research Centre, London, UK
| | - Cato Pauling
- UCL Great Ormond Street Institute of Child Health, London, UK
| | - Emma Allan
- Clinical Radiology, Great Ormond Street Hospital for Children, London, UK
| | - Dean Langan
- UCL Great Ormond Street Institute of Child Health, London, UK
- Centre of Applied Statistics Courses, University College London, London, UK
| | - Emily Ashworth
- Clinical Radiology, Great Ormond Street Hospital for Children, London, UK
| | - Ka-Wai Yung
- Wellcome/ EPSRC Centre for Interventional and Surgical Sciences, London, UK
| | - Joy Barber
- Clinical Radiology, St George's Healthcare NHS Trust, London, UK
| | - Saira Haque
- Clinical Radiology, Kings College Hospital NHS Foundation Trust, London, UK
| | - David Rosewarne
- Clinical Radiology, Royal Wolverhampton Hospitals NHS Trust, Wolverhampton, UK
| | - Nick Woznitza
- School of Allied Health Professions, Faculty of Medicine, Health and Social Care, Canterbury Christ Church University, Canterbury, UK
- Clinical Radiology, University College London Hospitals NHS Foundation Trust, London, UK
| | - Sarim Ather
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Alex Novak
- Emergency Medicine Research Oxford, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Kanthan Theivendran
- Orthopaedic Surgery, Sandwell and West Birmingham Hospitals NHS Trust, Birmingham, UK
| | - Owen J Arthurs
- Clinical Radiology, Great Ormond Street Hospital for Children, London, UK
- UCL Great Ormond Street Institute of Child Health, London, UK
- Great Ormond Street Hospital NIHR Biomedical Research Centre, London, UK
30
Pattathil N, Lee TSJ, Huang RS, Lena ER, Felfeli T. Adherence of studies involving artificial intelligence in the analysis of ophthalmology electronic medical records to AI-specific items from the CONSORT-AI guideline: a systematic review. Graefes Arch Clin Exp Ophthalmol 2024; 262:3741-3748. [PMID: 38953984] [DOI: 10.1007/s00417-024-06553-3]
Abstract
PURPOSE In the context of ophthalmologic practice, there has been a rapid increase in the amount of data collected using electronic health records (EHR). Artificial intelligence (AI) offers a promising means of centralizing data collection and analysis, but to date, most AI algorithms have only been applied to analyzing image data in ophthalmologic practice. In this review, we aimed to characterize the use of AI in the analysis of EHR and to critically appraise the adherence of each included study to the CONSORT-AI reporting guideline. METHODS A comprehensive search of three relevant databases (MEDLINE, EMBASE, and Cochrane Library) from January 2010 to February 2023 was conducted. The included studies were evaluated for reporting quality based on the AI-specific items from the CONSORT-AI reporting guideline. RESULTS Of the 4,968 articles identified by our search, 89 studies met all inclusion criteria and were included in this review. Most studies utilized AI for ocular disease prediction (n = 41, 46.1%), and diabetic retinopathy was the most studied ocular pathology (n = 19, 21.3%). The overall mean CONSORT-AI score across the 14 measured items was 12.1 (range 8-14, median 12). Categories with the lowest adherence rates were: describing the handling of poor-quality data (48.3%), specifying participant inclusion and exclusion criteria (56.2%), and detailing access to the AI intervention or its code, including any restrictions (62.9%). CONCLUSIONS In conclusion, we identified that AI is prominently used for disease prediction in ophthalmology clinics; however, these algorithms are limited by their lack of generalizability and cross-center reproducibility. A standardized framework for AI reporting should be developed to improve AI applications in the management of ocular disease and ophthalmology decision-making.
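The per-item adherence percentages quoted above come from tallying, for each CONSORT-AI item, the fraction of included studies that reported it. A minimal sketch of that tally over a study-by-item boolean matrix (illustrative only; the matrix values are hypothetical):

```python
def item_adherence(matrix):
    """Percent of studies adhering to each reporting item.

    matrix: list of per-study lists of booleans (True = item reported);
    all inner lists must have the same length (one slot per item).
    """
    n_studies = len(matrix)
    n_items = len(matrix[0])
    return [
        100.0 * sum(study[i] for study in matrix) / n_studies
        for i in range(n_items)
    ]

# Three hypothetical studies scored on four hypothetical items
scores = [
    [True,  True,  False, True],
    [True,  False, False, True],
    [True,  True,  True,  False],
]
rates = item_adherence(scores)
```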
Affiliation(s)
| | - Tin-Suet Joan Lee
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Eleanor R Lena
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Tina Felfeli
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada.
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada.
31
Khosravi P, Mohammadi S, Zahiri F, Khodarahmi M, Zahiri J. AI-Enhanced Detection of Clinically Relevant Structural and Functional Anomalies in MRI: Traversing the Landscape of Conventional to Explainable Approaches. J Magn Reson Imaging 2024; 60:2272-2289. [PMID: 38243677] [DOI: 10.1002/jmri.29247]
Abstract
Anomaly detection in medical imaging, particularly within the realm of magnetic resonance imaging (MRI), stands as a vital area of research with far-reaching implications across various medical fields. This review meticulously examines the integration of artificial intelligence (AI) in anomaly detection for MR images, spotlighting its transformative impact on medical diagnostics. We delve into the forefront of AI applications in MRI, exploring advanced machine learning (ML) and deep learning (DL) methodologies that are pivotal in enhancing the precision of diagnostic processes. The review provides a detailed analysis of preprocessing, feature extraction, classification, and segmentation techniques, alongside a comprehensive evaluation of commonly used metrics. Further, this paper explores the latest developments in ensemble methods and explainable AI, offering insights into future directions and potential breakthroughs. This review synthesizes current insights, offering a valuable guide for researchers, clinicians, and medical imaging experts. It highlights AI's crucial role in improving the precision and speed of detecting key structural and functional irregularities in MRI. Our exploration of innovative techniques and trends furthers MRI technology development, aiming to refine diagnostics, tailor treatments, and elevate patient care outcomes. LEVEL OF EVIDENCE: 5 TECHNICAL EFFICACY: Stage 1.
Affiliation(s)
- Pegah Khosravi
- Department of Biological Sciences, New York City College of Technology, CUNY, New York City, New York, USA
- The CUNY Graduate Center, City University of New York, New York City, New York, USA
| | - Saber Mohammadi
- Department of Biological Sciences, New York City College of Technology, CUNY, New York City, New York, USA
- Department of Biophysics, Tarbiat Modares University, Tehran, Iran
| | - Fatemeh Zahiri
- Department of Cell and Molecular Sciences, Kharazmi University, Tehran, Iran
| | | | - Javad Zahiri
- Department of Neuroscience, University of California San Diego, San Diego, California, USA
32
Wang TW, Tzeng YH, Wu KT, Liu HR, Hong JS, Hsu HY, Fu HN, Lee YT, Yin WH, Wu YT. Meta-analysis of deep learning approaches for automated coronary artery calcium scoring: Performance and clinical utility. Comput Biol Med 2024; 183:109295. [PMID: 39437607] [DOI: 10.1016/j.compbiomed.2024.109295]
Abstract
INTRODUCTION Manual Coronary Artery Calcium (CAC) scoring, crucial for assessing coronary artery disease risk, is time-consuming and variable. Deep learning, particularly through Convolutional Neural Networks (CNNs), promises to automate and enhance the accuracy of CAC scoring, which this study investigates. METHODS Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a comprehensive literature search across PubMed, Embase, Web of Science, and IEEE databases from their inception until November 1, 2023, and selected studies that employed deep learning for automated CAC scoring. We then evaluated the quality of these studies by using the Checklist for Artificial Intelligence in Medical Imaging and the Quality Assessment of Diagnostic Accuracy Studies 2. The main metric for evaluation was Cohen's kappa statistic, indicating an agreement between deep learning models and manual scoring methods. RESULTS A total of 25 studies were included, with a pooled kappa statistic of 83 % (95 % CI of 79 %-87 %), indicating strong agreement between automated and manual CAC scoring. Subgroup analysis revealed performance variations based on imaging modalities and technical specifications. Sensitivity analysis confirmed the reliability of the results. CONCLUSIONS Deep learning models, particularly CNNs, have great potential for use in automated CAC scoring applications, potentially enhancing the efficiency and accuracy of risk assessments for coronary artery disease. Further research and standardization are required to address the major heterogeneity and performance disparities between different imaging modalities. Overall, our findings underscore the evolving role of artificial intelligence in advancing cardiac imaging and patient care.
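The pooling described above can be sketched in a few lines. This is an illustrative DerSimonian-Laird random-effects computation on made-up (kappa, standard error) pairs, not the study's actual data or code:

```python
# Hedged sketch: DerSimonian-Laird random-effects pooling of per-study Cohen's
# kappa values, the kind of calculation behind a summary agreement estimate
# with a 95% CI. The study values below are synthetic.
import math

# (kappa, standard error) pairs for hypothetical studies
studies = [(0.80, 0.05), (0.88, 0.04), (0.79, 0.06), (0.86, 0.03)]

# Fixed-effect weights and Cochran's Q statistic for heterogeneity
w = [1 / se**2 for _, se in studies]
k_fixed = sum(wi * k for wi, (k, _) in zip(w, studies)) / sum(w)
q = sum(wi * (k - k_fixed) ** 2 for wi, (k, _) in zip(w, studies))

# Between-study variance tau^2 (DerSimonian-Laird estimator, floored at 0)
df = len(studies) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects pooled kappa and its 95% confidence interval
w_re = [1 / (se**2 + tau2) for _, se in studies]
k_re = sum(wi * k for wi, (k, _) in zip(w_re, studies)) / sum(w_re)
se_re = math.sqrt(1 / sum(w_re))
ci = (k_re - 1.96 * se_re, k_re + 1.96 * se_re)
print(round(k_re, 3), tuple(round(x, 3) for x in ci))
```

A subgroup analysis, as in the paper, would simply repeat this pooling within each imaging-modality stratum.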
Affiliation(s)
- Ting-Wei Wang
- Institute of Biophotonics, National Yang-Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist., Taipei, 112304, Taiwan; School of Medicine, College of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan; Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Yun-Hsuan Tzeng
- Division of Medical Imaging, Health Management Center, Cheng Hsin General Hospital, Taipei, Taiwan; Faculty of Medicine, National Defense Medical Center, Taipei, Taiwan
- Kuan-Ting Wu
- Institute of Biophotonics, National Yang-Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist., Taipei, 112304, Taiwan; School of Medicine, College of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan
- Ho-Ren Liu
- Division of Medical Imaging, Health Management Center, Cheng Hsin General Hospital, Taipei, Taiwan
- Jia-Sheng Hong
- Institute of Biophotonics, National Yang-Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist., Taipei, 112304, Taiwan
- Huan-Yu Hsu
- Institute of Biophotonics, National Yang-Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist., Taipei, 112304, Taiwan; School of Medicine, College of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan
- Hao-Neng Fu
- Heart Center, Cheng Hsin General Hospital, Taipei, Taiwan
- Yung-Tsai Lee
- Heart Center, Cheng Hsin General Hospital, Taipei, Taiwan
- Wei-Hsian Yin
- School of Medicine, College of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan; Heart Center, Cheng Hsin General Hospital, Taipei, Taiwan
- Yu-Te Wu
- Institute of Biophotonics, National Yang-Ming Chiao Tung University, 155, Sec. 2, Li-Nong St. Beitou Dist., Taipei, 112304, Taiwan; National Yang Ming Chiao Tung University, Brain Research Center, Taiwan; National Yang Ming Chiao Tung University, Medical Device Innovation and Translation Center, Taiwan.
33
Wahid KA, Kaffey ZY, Farris DP, Humbert-Vidan L, Moreno AC, Rasmussen M, Ren J, Naser MA, Netherton TJ, Korreman S, Balakrishnan G, Fuller CD, Fuentes D, Dohopolski MJ. Artificial intelligence uncertainty quantification in radiotherapy applications - A scoping review. Radiother Oncol 2024; 201:110542. [PMID: 39299574 PMCID: PMC11648575 DOI: 10.1016/j.radonc.2024.110542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2024] [Revised: 08/18/2024] [Accepted: 09/09/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND/PURPOSE The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. METHODS We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. RESULTS We identified 56 articles published from 2015 to 2024. 10 domains of RT applications were represented; most studies evaluated auto-contouring (50 %), followed by image-synthesis (13 %), and multiple applications simultaneously (11 %). 12 disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32 %). Imaging data was used in 91 % of studies, while only 13 % incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60 %), with Monte Carlo dropout being the most commonly implemented UQ method (32 %) followed by ensembling (16 %). 55 % of studies did not share code or datasets. CONCLUSION Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, we identified a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
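Monte Carlo dropout, the most commonly implemented UQ method among the surveyed studies, can be illustrated with a toy model. This numpy sketch (a hypothetical two-layer network, not any reviewed study's code) shows the core idea: keep dropout active at inference and treat the spread of repeated stochastic forward passes as predictive uncertainty:

```python
# Minimal sketch of Monte Carlo dropout for uncertainty quantification.
# The weights and input are random toys; the mechanics are the point.
import numpy as np

rng = np.random.default_rng(42)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 1))  # toy 2-layer net

def forward(x, drop_p=0.5):
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) > drop_p      # dropout stays ON at test time
    h = h * mask / (1.0 - drop_p)            # inverted-dropout scaling
    return 1.0 / (1.0 + np.exp(-(h @ W2)))   # sigmoid output

x = rng.normal(size=(1, 8))
samples = np.array([forward(x) for _ in range(100)])  # 100 stochastic passes
mean, std = samples.mean(), samples.std()
# A high std flags predictions the model is unsure about -- this is the
# "failure detection" use of UQ that most reviewed studies targeted.
print(f"prediction={mean:.3f} uncertainty={std:.3f}")
```

Deep ensembling, the second most common method, works analogously but averages over independently trained models instead of dropout masks.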
Affiliation(s)
- Kareem A Wahid
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zaphanlene Y Kaffey
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David P Farris
- Research Medical Library, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Laia Humbert-Vidan
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Amy C Moreno
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jintao Ren
- Department of Oncology, Aarhus University Hospital, Denmark
- Mohamed A Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tucker J Netherton
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Stine Korreman
- Department of Oncology, Aarhus University Hospital, Denmark
- Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David Fuentes
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Michael J Dohopolski
- Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, TX, USA.
34
Dzialas V, Doering E, Eich H, Strafella AP, Vaillancourt DE, Simonyan K, van Eimeren T. Houston, We Have AI Problem! Quality Issues with Neuroimaging-Based Artificial Intelligence in Parkinson's Disease: A Systematic Review. Mov Disord 2024; 39:2130-2143. [PMID: 39235364 PMCID: PMC11657025 DOI: 10.1002/mds.30002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2024] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/06/2024] Open
Abstract
In recent years, many neuroimaging studies have applied artificial intelligence (AI) to facilitate existing challenges in Parkinson's disease (PD) diagnosis, prognosis, and intervention. The aim of this systematic review was to provide an overview of neuroimaging-based AI studies and to assess their methodological quality. A PubMed search yielded 810 studies, of which 244 that investigated the utility of neuroimaging-based AI for PD diagnosis, prognosis, or intervention were included. We systematically categorized studies by outcomes and rated them with respect to five minimal quality criteria (MQC) pertaining to data splitting, data leakage, model complexity, performance reporting, and indication of biological plausibility. We found that the majority of studies aimed to distinguish PD patients from healthy controls (54%) or atypical parkinsonian syndromes (25%), whereas prognostic or interventional studies were sparse. Only 20% of evaluated studies passed all five MQC, with data leakage, non-minimal model complexity, and reporting of biological plausibility as the primary factors for quality loss. Data leakage was associated with a significant inflation of accuracies. Very few studies employed external test sets (8%), where accuracy was significantly lower, and 19% of studies did not account for data imbalance. Adherence to MQC was low across all observed years and journal impact factors. This review outlines that AI has been applied to a wide variety of research questions pertaining to PD; however, the number of studies failing to pass the MQC is alarming. Therefore, we provide recommendations to enhance the interpretability, generalizability, and clinical utility of future AI applications using neuroimaging in PD. © 2024 The Author(s). Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
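One of the minimal quality criteria above, avoiding data leakage, comes down to splitting at the patient level rather than the scan level. This hedged sketch (synthetic patient/scan identifiers, not from the review) shows the pattern that keeps scans from the same subject out of both training and test sets:

```python
# Patient-level train/test split to prevent data leakage when each
# patient contributes multiple scans. Data layout here is hypothetical.
import random

random.seed(0)
# (patient_id, scan_id) pairs: 20 patients, 3 scans each
scans = [(pid, sid) for pid in range(20) for sid in range(3)]

patients = sorted({pid for pid, _ in scans})
random.shuffle(patients)
cut = int(0.8 * len(patients))
train_patients, test_patients = set(patients[:cut]), set(patients[cut:])

train = [s for s in scans if s[0] in train_patients]
test = [s for s in scans if s[0] in test_patients]

# No patient appears on both sides, so subject identity cannot leak
# from training into evaluation.
assert not ({p for p, _ in train} & {p for p, _ in test})
print(len(train), len(test))  # → 48 12
```

Splitting the scan list directly (the leaky variant the review warns about) would let the model recognize a familiar brain rather than a disease signature, inflating accuracy.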
Affiliation(s)
- Verena Dzialas
- Department of Nuclear Medicine, Faculty of Medicine and University Hospital, University of Cologne, Cologne, Germany
- Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany
- Elena Doering
- Department of Nuclear Medicine, Faculty of Medicine and University Hospital, University of Cologne, Cologne, Germany
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Helena Eich
- Department of Nuclear Medicine, Faculty of Medicine and University Hospital, University of Cologne, Cologne, Germany
- Antonio P. Strafella
- Edmond J. Safra Parkinson Disease Program, Neurology Division, Krembil Brain Institute, University Health Network, Toronto, Canada
- Brain Health Imaging Centre, Centre for Addiction and Mental Health, University of Toronto, Toronto, Canada
- Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
- David E. Vaillancourt
- Department of Applied Physiology and Kinesiology, University of Florida, Gainesville, Florida, USA
- Kristina Simonyan
- Department of Otolaryngology—Head and Neck Surgery, Harvard Medical School and Massachusetts Eye and Ear, Boston, Massachusetts, USA
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Thilo van Eimeren
- Department of Nuclear Medicine, Faculty of Medicine and University Hospital, University of Cologne, Cologne, Germany
- Department of Neurology, Faculty of Medicine and University Hospital, University of Cologne, Cologne, Germany
35
Tiong EWW, Liu SH, Ting DSJ. Cochrane corner: artificial intelligence for keratoconus. Eye (Lond) 2024; 38:3406-3408. [PMID: 39300189 PMCID: PMC11621326 DOI: 10.1038/s41433-024-03347-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2024] [Revised: 08/29/2024] [Accepted: 09/13/2024] [Indexed: 09/22/2024] Open
Affiliation(s)
- Su-Hsun Liu
- Department of Ophthalmology and Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Darren S J Ting
- Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK.
- Birmingham and Midland Eye Centre, Sandwell and West Birmingham NHS Trust, Birmingham, UK.
- Academic Ophthalmology, School of Medicine, University of Nottingham, Nottingham, UK.
- Singapore Eye Research Institute, Singapore, Singapore.
36
Lysdahlgaard S, Jørgensen MD. Artificial intelligence and advanced MRI techniques: A comprehensive analysis of diffuse gliomas. J Med Imaging Radiat Sci 2024; 55:101736. [PMID: 39255563 DOI: 10.1016/j.jmir.2024.101736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2024] [Revised: 07/19/2024] [Accepted: 07/19/2024] [Indexed: 09/12/2024]
Abstract
INTRODUCTION Understanding the heterogeneity of diffuse gliomas relies on advanced imaging techniques such as MRI. Utilizing the UCSF-PDGM dataset, this study harnesses MRI techniques, radiomics, and AI to analyze diffuse gliomas for optimizing patient outcomes. METHODS The research utilized the dataset of 501 subjects with diffuse gliomas imaged through a comprehensive MRI protocol. After performing intricate tumor segmentation, 82,800 radiomic features were extracted for each patient from nine segmentations across eight MRI sequences. These features informed neural network and XGBoost model training to predict patient outcomes and tumor grades, supplemented by SHAP analysis to pinpoint influential radiomic features. RESULTS In our analysis of the UCSF-PDGM dataset, we observed a diverse range of WHO tumor grades and patient outcomes, discarding one corrupt MRI scan. Our segmentation method showed high accuracy when comparing automated and manual techniques. The neural network excelled in predicting WHO tumor grades, with an accuracy of 0.9500 for the necrotic tumor label. The SHAP analysis highlighted the 3D First Order mean as one of the most influential radiomic features, with features such as Original Shape Sphericity and Original Shape Elongation also notably prominent. CONCLUSION A study using the UCSF-PDGM dataset highlighted AI and radiomics' profound impact on neuroradiology by demonstrating reliable tumor segmentation and identifying key radiomic features, despite challenges in predicting patient survival. The research emphasizes both the potential of AI in this field and the need for broader datasets of diverse MRI sequences to enhance patient outcomes. IMPLICATION FOR PRACTICE The study underlines the significant role of radiomics in improving the accuracy of tumor identification through radiomic features.
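Two of the feature families named above, the first-order mean and shape elongation, can be computed directly from a segmented volume. This sketch uses a synthetic volume and mask (pyradiomics-style feature names are assumed; this is not the study's extraction pipeline):

```python
# Hedged sketch: first-order and shape radiomic features from a toy
# 3D volume and segmentation mask.
import numpy as np

rng = np.random.default_rng(3)
volume = rng.normal(100, 15, size=(32, 32, 32))  # toy MRI intensities
mask = np.zeros_like(volume, dtype=bool)
mask[8:24, 10:22, 12:20] = True                  # toy tumour segmentation

voxels = volume[mask]
first_order_mean = voxels.mean()                 # "3D First Order mean"

# Shape elongation from eigenvalues of the voxel-coordinate covariance:
# sqrt(lambda2 / lambda1), where lambda1 >= lambda2 are the two largest
# principal-axis variances; 1.0 means not elongated at all.
coords = np.argwhere(mask).astype(float)
eigvals = np.sort(np.linalg.eigvalsh(np.cov(coords.T)))[::-1]
elongation = np.sqrt(eigvals[1] / eigvals[0])

print(round(float(first_order_mean), 1), round(float(elongation), 2))
```

Repeating such calculations across nine segmentations and eight sequences is how a per-patient feature vector of the size reported above accumulates.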
Affiliation(s)
- S Lysdahlgaard
- Department of Radiology and Nuclear Medicine, Hospital of South West Jutland, University Hospital of Southern Denmark, Esbjerg, Denmark; Department of Regional Health Research, Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark; Imaging Research Initiative Southwest (IRIS), Hospital of South West Jutland, University Hospital of Southern Denmark, Esbjerg, Denmark.
- M D Jørgensen
- Neuroradiology Department, X-ray and Scanning Section (Røntgen og Skanning afsnit), Aarhus Universitetshospital, Denmark
37
López-Rueda A, Rodríguez-Sánchez MÁ, Serrano E, Moreno J, Rodríguez A, Llull L, Amaro S, Oleaga L. Enhancing mortality prediction in patients with spontaneous intracerebral hemorrhage: Radiomics and supervised machine learning on non-contrast computed tomography. Eur J Radiol Open 2024; 13:100618. [PMID: 39687913 PMCID: PMC11648778 DOI: 10.1016/j.ejro.2024.100618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2024] [Revised: 11/17/2024] [Accepted: 11/28/2024] [Indexed: 12/18/2024] Open
Abstract
Purpose This study aims to develop a Radiomics-based Supervised Machine-Learning model to predict mortality in patients with spontaneous intracerebral hemorrhage (sICH). Methods Retrospective analysis of a prospectively collected clinical registry of patients with sICH consecutively admitted at a single academic comprehensive stroke center between January 2016 and April 2018. We conducted an in-depth analysis of 105 radiomic features extracted from 105 patients. Following the identification and handling of missing values, radiomics values were scaled to 0-1 to train different classifiers. The sample was split 80%/20% into training-test and validation cohorts in a stratified fashion. Random Forest (RF), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) classifiers were evaluated, along with several feature selection methods and hyperparameter optimization strategies, to classify the binary outcome of mortality or survival during hospital admission. A tenfold stratified cross-validation method was used to train the models, and average metrics were calculated. Results RF, KNN, and SVM, with the "DropOut+SelectKBest" feature selection strategy and no hyperparameter optimization, demonstrated the best performances with the least number of radiomic features and the most simplified models, achieving a sensitivity range between 0.90 and 0.95 and an AUC range from 0.97 to 1 on the validation dataset. Regarding the confusion matrix, the SVM model did not predict any false negatives (negative predictive value of 1). Conclusion Radiomics-based Supervised Machine Learning models can predict mortality during admission in patients with sICH. SVM with the "DropOut+SelectKBest" feature selection strategy and no hyperparameter optimization was the best simplified model to detect mortality during admission in patients with sICH.
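A pipeline of this shape is straightforward to sketch with scikit-learn. The data below are synthetic (matching only the 105-patient, 105-feature dimensions), and plain SelectKBest stands in for the authors' "DropOut+SelectKBest" strategy, whose missing-feature drop-out step is not reproduced here:

```python
# Hedged sketch: 0-1 scaling + univariate feature selection + SVM,
# evaluated with stratified 10-fold cross-validation, mirroring the
# abstract's pipeline on synthetic data.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(105, 105))  # 105 patients x 105 radiomic features (toy)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=105) > 0).astype(int)

model = Pipeline([
    ("scale", MinMaxScaler()),               # radiomics scaled to 0-1
    ("select", SelectKBest(f_classif, k=10)),  # keep the 10 strongest features
    ("svm", SVC(kernel="rbf")),
])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(round(scores.mean(), 3))
```

Keeping the scaler and selector inside the Pipeline ensures they are refit on each training fold, which avoids the selection-before-splitting leakage that inflates cross-validated metrics.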
Affiliation(s)
- Antonio López-Rueda
- Clinical Informatics Department, Hospital Clínic de Barcelona, Barcelona, Spain
- Radiology Department, Hospital Clínic de Barcelona, Barcelona, Spain
- Elena Serrano
- Radiology Department, Hospital Universitario de Bellvitge, Barcelona, Spain
- Javier Moreno
- Radiology Department, Hospital Clínic de Barcelona, Barcelona, Spain
- Laura Llull
- Neurology Department, Hospital Clínic de Barcelona, Barcelona, Spain
- Sergi Amaro
- Neurology Department, Hospital Clínic de Barcelona, Barcelona, Spain
- Laura Oleaga
- Radiology Department, Hospital Clínic de Barcelona, Barcelona, Spain
38
Ghaderi S, Mohammadi M, Sayehmiri F, Mohammadi S, Tavasol A, Rezaei M, Ghalyanchi-Langeroudi A. Machine Learning Approaches to Identify Affected Brain Regions in Movement Disorders Using MRI Data: A Systematic Review and Diagnostic Meta-analysis. J Magn Reson Imaging 2024; 60:2518-2546. [PMID: 38538062 DOI: 10.1002/jmri.29364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Revised: 03/13/2024] [Accepted: 03/14/2024] [Indexed: 11/15/2024] Open
Abstract
BACKGROUND Movement disorders such as Parkinson's disease are associated with structural and functional changes in specific brain regions. Advanced magnetic resonance imaging (MRI) techniques combined with machine learning (ML) are promising tools for identifying imaging biomarkers and patterns associated with these disorders. PURPOSE/HYPOTHESIS We aimed to systematically identify the brain regions most commonly affected in movement disorders using ML approaches applied to structural and functional MRI data. We searched the PubMed and Scopus databases using relevant keywords up to June 2023 for studies that used ML approaches to detect brain regions associated with movement disorders using MRI data. STUDY TYPE A systematic review and diagnostic meta-analysis. POPULATION/SUBJECTS Sixty-seven studies with 6,285 patients were included. FIELD STRENGTH/SEQUENCE Studies utilizing 1.5T or 3T MR scanners and the acquisition of diffusion tensor imaging (DTI), structural MRI (sMRI), functional MRI (fMRI), or a combination of these were included. ASSESSMENT The authors independently assessed the study quality using the CLAIM and QUADAS-2 criteria and extracted data on diagnostic accuracy measures. STATISTICAL TESTS Sensitivity, specificity, accuracy, and area under the curve were pooled using random-effects models. Q statistics and the I2 index were used to evaluate heterogeneity, and Begg's funnel plot was used to identify publication bias. RESULTS sMRI showed the highest sensitivity (93%) and mixed modalities had the highest specificity (90%) for detecting regional abnormalities. sMRI had a 94% sensitivity for identifying subcortical changes. The support vector machine (93%) and logistic regression (91%) models exhibited high diagnostic accuracies. DATA CONCLUSION The combination of advanced MR neuroimaging techniques and ML is a promising approach for identifying brain biomarkers and affected regions in movement disorders, with subcortical structures frequently implicated. Structural MRI, in particular, showed strong performance. LEVEL OF EVIDENCE 1 TECHNICAL EFFICACY: Stage 2.
Affiliation(s)
- Sadegh Ghaderi
- Department of Neuroscience and Addiction Studies, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Mahdi Mohammadi
- Department of Medical Physics and Biomedical Engineering, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Fatemeh Sayehmiri
- Skull Base Research Center, Loghman Hakim Hospital, Shahid Beheshti University of Medical Science, Tehran, Iran
- Sana Mohammadi
- Department of Medical Sciences, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Arian Tavasol
- Student Research Committee, Faculty of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Masoud Rezaei
- Medical Physics and Radiology Department, Faculty of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran
- Azadeh Ghalyanchi-Langeroudi
- Department of Medical Physics and Biomedical Engineering, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Research Center for Biomedical Technologies and Robotics (RCBTR), Tehran, Iran
39
Sato J, Sugimoto K, Suzuki Y, Wataya T, Kita K, Nishigaki D, Tomiyama M, Hiraoka Y, Hori M, Takeda T, Kido S, Tomiyama N. Annotation-free multi-organ anomaly detection in abdominal CT using free-text radiology reports: a multi-centre retrospective study. EBioMedicine 2024; 110:105463. [PMID: 39613675 PMCID: PMC11663761 DOI: 10.1016/j.ebiom.2024.105463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Revised: 11/05/2024] [Accepted: 11/05/2024] [Indexed: 12/01/2024] Open
Abstract
BACKGROUND Artificial intelligence (AI) systems designed to detect abnormalities in abdominal computed tomography (CT) could reduce radiologists' workload and improve diagnostic processes. However, development of such models has been hampered by the shortage of large expert-annotated datasets. Here, we used information from free-text radiology reports, rather than manual annotations, to develop a deep-learning-based pipeline for comprehensive detection of abdominal CT abnormalities. METHODS In this multicentre retrospective study, we developed a deep-learning-based pipeline to detect abnormalities in the liver, gallbladder, pancreas, spleen, and kidneys. Abdominal CT exams and related free-text reports obtained during routine clinical practice collected from three institutions were used for training and internal testing, while data collected from six institutions were used for external testing. A multi-organ segmentation model and an information extraction schema were used to extract specific organ images and disease information from CT images and radiology reports, respectively, which were then used to train a multiple-instance learning model for anomaly detection. Its performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score against radiologists' ground-truth labels. FINDINGS We trained the model for each organ on images selected from 66,684 exams (39,255 patients) and tested it on 300 (295 patients) and 600 (596 patients) exams for internal and external validation, respectively. In the external test cohort, the overall AUC for detecting organ abnormalities was 0.886. Whereas models trained on human-annotated labels performed better with the same number of exams, those trained on larger datasets with labels auto-extracted via the information extraction schema significantly outperformed human-annotated label-derived models. INTERPRETATION Using disease information from routine clinical free-text radiology reports allows development of accurate anomaly detection models without requiring manual annotations. This approach is applicable to various anatomical sites and could streamline diagnostic processes. FUNDING Japan Science and Technology Agency.
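The multiple-instance learning setup above can be reduced to a small sketch. Each exam is a "bag" of organ-level instances with only a bag-level label (normal/abnormal, mined from the report); under the common max-pooling MIL assumption, the bag score is the maximum instance score. The instance probabilities below are toy numbers, not the authors' model outputs:

```python
# Simplified max-pooling multiple-instance learning for anomaly detection.
import numpy as np

rng = np.random.default_rng(1)

def bag_score(instance_scores):
    # MIL assumption: a bag (exam) is abnormal if ANY instance (patch) is.
    return float(np.max(instance_scores))

# Toy exams: instance-level anomaly probabilities from some per-patch scorer
normal_exam = rng.uniform(0.0, 0.3, size=12)               # all patches benign
abnormal_exam = np.append(rng.uniform(0.0, 0.3, 11), 0.9)  # one anomalous patch

print(bag_score(normal_exam) < 0.5, bag_score(abnormal_exam) > 0.5)
```

Training then backpropagates the bag-level loss through the pooled score, so instance-level supervision is never required, which is what lets report-derived labels replace manual annotations.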
Affiliation(s)
- Junya Sato
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan; Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Kento Sugimoto
- Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Yuki Suzuki
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Tomohiro Wataya
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan; Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Kosuke Kita
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Daiki Nishigaki
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan; Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Miyuki Tomiyama
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan; Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Yu Hiraoka
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan; Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Masatoshi Hori
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Toshihiro Takeda
- Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Shoji Kido
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan.
- Noriyuki Tomiyama
- Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
40
Cau R, Pisu F, Suri JS, Saba L. Addressing hidden risks: Systematic review of artificial intelligence biases across racial and ethnic groups in cardiovascular diseases. Eur J Radiol 2024; 183:111867. [PMID: 39637580 DOI: 10.1016/j.ejrad.2024.111867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2024] [Revised: 11/25/2024] [Accepted: 11/28/2024] [Indexed: 12/07/2024]
Abstract
BACKGROUND Artificial intelligence (AI)-based models are increasingly being integrated into cardiovascular medicine. Despite promising potential, racial and ethnic biases remain a key concern regarding the development and implementation of AI models in clinical settings. OBJECTIVE This systematic review offers an overview of the accuracy and clinical applicability of AI models for cardiovascular diagnosis and prognosis across diverse racial and ethnic groups. METHOD A comprehensive literature search was conducted across four medical and scientific databases: PubMed, MEDLINE via Ovid, Scopus, and the Cochrane Library, to evaluate racial and ethnic disparities in cardiovascular medicine. RESULTS A total of 1704 references were screened, of which 11 articles were included in the final analysis. Applications of AI-based algorithms across different race/ethnic groups were varied and involved diagnosis, prognosis, and imaging segmentation. Among the 11 studies, 9 (82%) concluded that racial/ethnic bias existed, while 2 (18%) found no differences in the outcomes of AI models across various ethnicities. CONCLUSION Our results suggest significant differences in how AI models perform in cardiovascular medicine across diverse racial and ethnic groups. CLINICAL RELEVANCE STATEMENT The increasing integration of AI into cardiovascular medicine highlights the importance of evaluating its performance across diverse populations. This systematic review underscores the critical need to address racial and ethnic disparities in AI-based models to ensure equitable healthcare delivery.
Affiliation(s)
- Riccardo Cau
- Department of Radiology, Azienda Ospedaliero Universitaria, Monserrato, Cagliari, Italy
- Francesco Pisu
- Department of Radiology, Azienda Ospedaliero Universitaria, Monserrato, Cagliari, Italy
- Jasjit S Suri
- Department of Radiology, Azienda Ospedaliero Universitaria, Monserrato, Cagliari, Italy
- Luca Saba
- Department of Radiology, Azienda Ospedaliero Universitaria, Monserrato, Cagliari, Italy.
41
Fatania K, Frood R, Mistry H, Short SC, O'Connor J, Scarsbrook AF, Currie S. Impact of intensity standardisation and ComBat batch size on clinical-radiomic prognostic models performance in a multi-centre study of patients with glioblastoma. Eur Radiol 2024:10.1007/s00330-024-11168-7. [PMID: 39607450 DOI: 10.1007/s00330-024-11168-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2024] [Revised: 08/12/2024] [Accepted: 09/30/2024] [Indexed: 11/29/2024]
Abstract
PURPOSE To assess the effect of different intensity standardisation techniques (ISTs) and ComBat batch sizes on radiomics survival model performance and stability in a heterogenous, multi-centre cohort of patients with glioblastoma (GBM). METHODS Multi-centre pre-operative MRI acquired between 2014 and 2020 in patients with IDH-wildtype unifocal WHO grade 4 GBM were retrospectively evaluated. WhiteStripe (WS), Nyul histogram matching (HM), and Z-score (ZS) ISTs were applied before radiomic feature (RF) extraction. RFs were realigned using ComBat and minimum batch size (MBS) of 5, 10, or 15 patients. Cox proportional hazards models for overall survival (OS) prediction were produced using five different selection strategies and the impact of IST and MBS was evaluated using bootstrapping. Calibration, discrimination, relative explained variation, and model fit were assessed. Instability was evaluated using 95% confidence intervals (95% CIs), feature selection frequency and calibration curves across the bootstrap resamples. RESULTS One hundred ninety-five patients were included. Median OS = 13 (95% CI: 12-14) months. Twelve to fourteen unique MRI protocols were used per MRI sequence. HM and WS produced the highest relative increase in model discrimination, explained variation and model fit but IST choice did not greatly impact on stability, nor calibration. Larger ComBat batches improved discrimination, model fit, and explained variation but higher MBS (reduced sample size) reduced stability (across all performance metrics) and reduced calibration accuracy. CONCLUSION Heterogenous, real-world GBM data poses a challenge to the reproducibility of radiomics. ComBat generally improved model performance as MBS increased but reduced stability and calibration. HM and WS tended to improve model performance. KEY POINTS Question ComBat harmonisation of RFs and intensity standardisation of MRI have not been thoroughly evaluated in multicentre, heterogeneous GBM data. 
Findings The addition of ComBat and ISTs can improve discrimination, relative model fit, and explained variance but degrades the calibration and stability of survival models. Clinical relevance Radiomics risk prediction models in real-world, multicentre contexts could be improved by ComBat and ISTs; however, this degrades calibration and prediction stability, which must be thoroughly investigated before patients can be accurately separated into different risk groups.
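Of the ISTs compared in this study, Z-score standardisation is the simplest to illustrate. The sketch below is our own illustration, not the authors' pipeline (the function name and toy volume are hypothetical): it rescales voxel intensities so that the region inside a mask has zero mean and unit variance, which is the usual definition of ZS standardisation.

```python
import numpy as np

def zscore_standardise(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Z-score intensity standardisation: shift and scale the whole image so
    that voxels inside `mask` have zero mean and unit variance."""
    vals = image[mask > 0]
    return (image - vals.mean()) / vals.std()

# Toy volume: zero background, "tissue" intensities around 100 +/- 10.
rng = np.random.default_rng(0)
vol = np.zeros((4, 4, 4))
mask = np.zeros_like(vol)
mask[1:3, 1:3, 1:3] = 1
vol[mask > 0] = rng.normal(100.0, 10.0, size=int(mask.sum()))

std = zscore_standardise(vol, mask)
inside = std[mask > 0]
print(inside.mean(), inside.std())
```

After standardisation, the masked intensities have mean 0 and standard deviation 1 regardless of the scanner's native intensity scale, which is what makes radiomic features comparable across acquisition protocols.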
Collapse
Affiliation(s)
- Kavi Fatania
- Department of Radiology, Leeds Teaching Hospitals NHS Trust, England, UK.
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK.
| | - Russell Frood
- Department of Radiology, Leeds Teaching Hospitals NHS Trust, England, UK
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
| | - Hitesh Mistry
- Division of Cancer Sciences, University of Manchester, Manchester, UK
| | - Susan C Short
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Department of Oncology, Leeds Teaching Hospitals NHS Trust, England, UK
| | - James O'Connor
- Division of Cancer Sciences, University of Manchester, Manchester, UK
- Department of Radiology, The Christie Hospital, Manchester, UK
- Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK
| | - Andrew F Scarsbrook
- Department of Radiology, Leeds Teaching Hospitals NHS Trust, England, UK
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
| | - Stuart Currie
- Department of Radiology, Leeds Teaching Hospitals NHS Trust, England, UK
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
| |
Collapse
|
42
|
Kawata N, Iwao Y, Matsuura Y, Higashide T, Okamoto T, Sekiguchi Y, Nagayoshi M, Takiguchi Y, Suzuki T, Haneishi H. Generation of short-term follow-up chest CT images using a latent diffusion model in COVID-19. Jpn J Radiol 2024:10.1007/s11604-024-01699-w. [PMID: 39585556 DOI: 10.1007/s11604-024-01699-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2024] [Accepted: 11/02/2024] [Indexed: 11/26/2024]
Abstract
PURPOSE Despite a global decrease in the number of COVID-19 patients, early prediction of the clinical course for optimal patient care remains challenging. Recently, the usefulness of image generation for medical images has been investigated. This study aimed to generate short-term follow-up chest CT images using a latent diffusion model in patients with COVID-19. MATERIALS AND METHODS We retrospectively enrolled 505 patients with COVID-19 for whom the clinical parameters (patient background, clinical symptoms, and blood test results) upon admission were available and chest CT imaging was performed. The dataset (n = 505) was allocated for training (n = 403), and the remaining patients (n = 102) were reserved for evaluation. Each chest CT image was encoded with a variational autoencoder (VAE), yielding latent vectors. The initial clinical parameters and radiomic features were encoded as tabular data with a table-data encoder. The initial and follow-up latent vectors, together with the encoded initial table data, were used to train the diffusion model. The evaluation data were used to generate prognostic images. Similarity between the prognostic (generated) images and the follow-up (real) images was then evaluated by zero-mean normalized cross-correlation (ZNCC), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). Visual assessment was also performed using a numerical rating scale. RESULTS Prognostic chest CT images were generated using the diffusion model. Image similarity showed reasonable values of 0.973 ± 0.028 for the ZNCC, 24.48 ± 3.46 for the PSNR, and 0.844 ± 0.075 for the SSIM. Visual evaluation of the images by two pulmonologists and one radiologist yielded a reasonable mean score. CONCLUSIONS The similarity and validity of generated predictive images for the course of COVID-19-associated pneumonia using a diffusion model were reasonable.
The generation of prognostic images may suggest potential utility for early prediction of the clinical course in COVID-19-associated pneumonia and other respiratory diseases.
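Two of the three similarity metrics reported above, ZNCC and PSNR, can be computed directly with NumPy. The sketch below is illustrative only (function names and toy images are our own; in practice SSIM is usually taken from scikit-image's `structural_similarity` rather than implemented by hand):

```python
import numpy as np

def zncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation: 1.0 for identical images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = float(np.mean((ref - test) ** 2))
    return float("inf") if mse == 0 else 10 * np.log10(data_range**2 / mse)

# Toy example: a "real" follow-up image and a noisy "generated" one.
rng = np.random.default_rng(1)
real = rng.uniform(0, 1, size=(64, 64))
generated = np.clip(real + rng.normal(0, 0.05, size=real.shape), 0, 1)

print(f"ZNCC: {zncc(real, generated):.3f}")
print(f"PSNR: {psnr(real, generated, data_range=1.0):.1f} dB")
```

Higher is better for both metrics; ZNCC is bounded by 1.0 for a perfect match, while PSNR grows without bound as the mean squared error shrinks.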
Collapse
Affiliation(s)
- Naoko Kawata
- Department of Respirology, Graduate School of Medicine, Chiba University, 1-8-1, Inohana, Chuo-Ku, Chiba-Shi, Chiba, 260-8677, Japan.
- Graduate School of Science and Engineering, Chiba University, Chiba, 263-8522, Japan.
| | - Yuma Iwao
- Center for Frontier Medical Engineering, Chiba University, 1-33, Yayoi-Cho, Inage-Ku, Chiba-Shi, Chiba, 263-8522, Japan
- Institute for Quantum Medical Science, National Institutes for Quantum Science and Technology, 4-9-1, Anagawa, Inage-Ku, Chiba-Shi, Chiba, 263-8555, Japan
| | - Yukiko Matsuura
- Department of Respiratory Medicine, Chiba Aoba Municipal Hospital, 1273-2 Aoba-Cho, Chuo-Ku, Chiba-Shi, Chiba, 260-0852, Japan
| | - Takashi Higashide
- Department of Radiology, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-Shi, Chiba, 260-8677, Japan
- Department of Radiology, Japanese Red Cross Narita Hospital, 90-1, Iida-Cho, Narita-Shi, Chiba, 286-8523, Japan
| | - Takayuki Okamoto
- Center for Frontier Medical Engineering, Chiba University, 1-33, Yayoi-Cho, Inage-Ku, Chiba-Shi, Chiba, 263-8522, Japan
| | - Yuki Sekiguchi
- Graduate School of Science and Engineering, Chiba University, Chiba, 263-8522, Japan
| | - Masaru Nagayoshi
- Department of Respiratory Medicine, Chiba Aoba Municipal Hospital, 1273-2 Aoba-Cho, Chuo-Ku, Chiba-Shi, Chiba, 260-0852, Japan
| | - Yasuo Takiguchi
- Department of Respiratory Medicine, Chiba Aoba Municipal Hospital, 1273-2 Aoba-Cho, Chuo-Ku, Chiba-Shi, Chiba, 260-0852, Japan
| | - Takuji Suzuki
- Department of Respirology, Graduate School of Medicine, Chiba University, 1-8-1, Inohana, Chuo-Ku, Chiba-Shi, Chiba, 260-8677, Japan
| | - Hideaki Haneishi
- Center for Frontier Medical Engineering, Chiba University, 1-33, Yayoi-Cho, Inage-Ku, Chiba-Shi, Chiba, 263-8522, Japan
| |
Collapse
|
43
|
Slim ML, Jacobs R, de Souza Leal RM, Fontenele RC. AI-driven segmentation of the pulp cavity system in mandibular molars on CBCT images using convolutional neural networks. Clin Oral Investig 2024; 28:650. [PMID: 39570431 PMCID: PMC11582138 DOI: 10.1007/s00784-024-06009-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2024] [Accepted: 10/24/2024] [Indexed: 11/22/2024]
Abstract
OBJECTIVE To develop and validate an artificial intelligence (AI)-driven tool for automated segmentation of the pulp cavity system of mandibular molars on cone-beam computed tomography (CBCT) images. MATERIALS AND METHODS After ethical approval, 66 CBCT scans were retrieved from a hospital database and divided into training (n = 26, 86 molars), validation (n = 7, 20 molars), and testing (n = 33, 60 molars) sets. After automated segmentation, an expert evaluated the quality of the AI-driven segmentations. The expert then refined any under- or over-segmentation to produce refined-AI (R-AI) segmentations. The AI and R-AI 3D models were compared to assess accuracy. Thirty percent of the testing sample was randomly selected to assess accuracy metrics and conduct a time analysis. RESULTS The AI-driven tool achieved high accuracy, with a Dice similarity coefficient (DSC) of 88% ± 7% for first molars and 90% ± 6% for second molars (p > .05). The 95% Hausdorff distance (HD) was lower for AI-driven segmentation (0.13 ± 0.07) compared to manual segmentation (0.21 ± 0.08) (p < .05). Regarding time efficiency, the AI-driven (4.3 ± 2 s) and R-AI (139 ± 93 s) segmentation methods were the fastest, compared to manual segmentation (2349 ± 444 s) (p < .05). CONCLUSION The AI-driven segmentation proved to be accurate and time-efficient in segmenting the pulp cavity system in mandibular molars. CLINICAL RELEVANCE Automated segmentation of the pulp cavity system may produce a fast and accurate 3D model, facilitating minimally invasive endodontics and leading to higher efficiency of the endodontic workflow, enabling anticipation of complications.
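The Dice similarity coefficient (DSC) reported above is straightforward to compute from a pair of binary masks. A minimal sketch with a toy 2D example follows (the study itself used 3D CBCT volumes; the function name and masks here are purely illustrative):

```python
import numpy as np

def dice(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks.

    DSC = 2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap.
    """
    a, b = seg_a.astype(bool), seg_b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

# Toy masks: a 6x6 "AI" segmentation vs. a 5x6 "reference" shifted by one row.
ai = np.zeros((10, 10), dtype=int)
ai[2:8, 2:8] = 1            # 36 pixels
ref = np.zeros((10, 10), dtype=int)
ref[3:8, 2:8] = 1           # 30 pixels, all 30 overlapping

print(round(dice(ai, ref), 3))  # 2*30 / (36+30) = 0.909
```

DSC measures volumetric overlap only; it is usually paired with a boundary metric such as the 95% Hausdorff distance, as in this study, because two masks can overlap well overall yet still disagree sharply along thin structures like root canals.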
Collapse
Affiliation(s)
- Marie Louise Slim
- OMFS-IMPATH Research Group, Department of Imaging and Pathology, Faculty of Medicine, KU Leuven, Kapucijnenvoer 7, Leuven, 3000, Belgium
- Department of Endodontics, Faculty of Dentistry, Saint Joseph University, Beirut, Lebanon
| | - Reinhilde Jacobs
- OMFS-IMPATH Research Group, Department of Imaging and Pathology, Faculty of Medicine, KU Leuven, Kapucijnenvoer 7, Leuven, 3000, Belgium.
- Department of Dental Medicine, Karolinska Institute, Stockholm, Sweden.
| | - Renata Maíra de Souza Leal
- OMFS-IMPATH Research Group, Department of Imaging and Pathology, Faculty of Medicine, KU Leuven, Kapucijnenvoer 7, Leuven, 3000, Belgium
- Department of Restorative Dentistry, Federal University of Paraná, Curitiba, Paraná, Brazil
| | - Rocharles Cavalcante Fontenele
- OMFS-IMPATH Research Group, Department of Imaging and Pathology, Faculty of Medicine, KU Leuven, Kapucijnenvoer 7, Leuven, 3000, Belgium
| |
Collapse
|
44
|
Maleki F, Moy L, Forghani R, Ghosh T, Ovens K, Langer S, Rouzrokh P, Khosravi B, Ganjizadeh A, Warren D, Daneshjou R, Moassefi M, Avval AH, Sotardi S, Tenenholtz N, Kitamura F, Kline T. RIDGE: Reproducibility, Integrity, Dependability, Generalizability, and Efficiency Assessment of Medical Image Segmentation Models. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01282-9. [PMID: 39557736 DOI: 10.1007/s10278-024-01282-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/24/2024] [Revised: 07/03/2024] [Accepted: 07/31/2024] [Indexed: 11/20/2024]
Abstract
Deep learning techniques hold immense promise for advancing medical image analysis, particularly in tasks like image segmentation, where precise annotation of regions or volumes of interest within medical images is crucial but laborious when performed manually and prone to interobserver and intraobserver bias. Deep learning approaches could therefore provide automated solutions for such applications. However, the potential of these techniques is often undermined by challenges in reproducibility and generalizability, which are key barriers to their clinical adoption. This paper introduces the RIDGE checklist, a comprehensive framework designed to assess the Reproducibility, Integrity, Dependability, Generalizability, and Efficiency of deep learning-based medical image segmentation models. The RIDGE checklist is not just a tool for evaluation but also a guideline for researchers striving to improve the quality and transparency of their work. By adhering to the principles outlined in the RIDGE checklist, researchers can ensure that their developed segmentation models are robust, scientifically valid, and applicable in a clinical setting.
Collapse
Affiliation(s)
- Farhad Maleki
- Department of Computer Science, University of Calgary, Calgary, AB, Canada.
- Department of Diagnostic Radiology, McGill University, Montreal, QC, Canada.
- Department of Radiology, Division of Medical Physics, University of Florida, Gainesville, FL, USA.
| | - Linda Moy
- Department of Radiology, New York University Langone Health, New York, NY, USA
| | - Reza Forghani
- Department of Radiology, Division of Medical Physics, University of Florida, Gainesville, FL, USA
| | - Tapotosh Ghosh
- Department of Computer Science, University of Calgary, Calgary, AB, Canada
| | - Katie Ovens
- Department of Computer Science, University of Calgary, Calgary, AB, Canada
| | - Steve Langer
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
| | | | | | - Ali Ganjizadeh
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
| | - Daniel Warren
- Carle College of Medicine University of Illinois Urbana-Champaign, Urbana, IL, USA
| | - Roxana Daneshjou
- Department of Dermatology, Stanford School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA
| | - Mana Moassefi
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
| | | | - Susan Sotardi
- Department of Radiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | | | | | - Timothy Kline
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
| |
Collapse
|
45
|
Daga K, Agarwal S, Moti Z, Lee MBK, Din M, Wood D, Modat M, Booth TC. Machine Learning Algorithms to Predict the Risk of Rupture of Intracranial Aneurysms: a Systematic Review. Clin Neuroradiol 2024:10.1007/s00062-024-01474-4. [PMID: 39546007 DOI: 10.1007/s00062-024-01474-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2024] [Accepted: 10/17/2024] [Indexed: 11/17/2024]
Abstract
PURPOSE Subarachnoid haemorrhage is a potentially fatal consequence of intracranial aneurysm rupture; however, it is difficult to predict whether an aneurysm will rupture. Prophylactic treatment of an intracranial aneurysm also involves risk, hence identifying rupture-prone aneurysms is of substantial clinical importance. This systematic review aims to evaluate the performance of machine learning algorithms for predicting intracranial aneurysm rupture risk. METHODS MEDLINE, Embase, Cochrane Library and Web of Science were searched until December 2023. Studies incorporating any machine learning algorithm to predict the risk of rupture of an intracranial aneurysm were included. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST). PROSPERO registration: CRD42023452509. RESULTS Out of 10,307 records screened, 20 studies met the eligibility criteria for this review, incorporating a total of 20,286 aneurysm cases. The machine learning models reported predictive accuracies ranging from 0.66 to 0.90. The models were compared to current clinical standards in six studies, with mixed results. Most studies posed high or unclear risks of bias and concerns for applicability, limiting the inferences that can be drawn from them. There were insufficient homogeneous data for a meta-analysis. CONCLUSIONS Machine learning can be applied to predict the risk of rupture for intracranial aneurysms. However, the evidence does not comprehensively demonstrate superiority over existing practice, limiting its role as a clinical adjunct. Further prospective multicentre studies of recent machine learning tools are needed to establish clinical validity before they are implemented in the clinic.
Collapse
Affiliation(s)
- Karan Daga
- School of Biomedical Engineering & Imaging Sciences, King's College London, BMEIS, King's College London. 1 Lambeth Palace Road, UK SE1 7EU, London, UK
- Guy's and St. Thomas' NHS Foundation Trust, Westminster Bridge Road, UK SE1 7EH, London, UK
| | - Siddharth Agarwal
- School of Biomedical Engineering & Imaging Sciences, King's College London, BMEIS, King's College London. 1 Lambeth Palace Road, UK SE1 7EU, London, UK
| | - Zaeem Moti
- Guy's and St. Thomas' NHS Foundation Trust, Westminster Bridge Road, UK SE1 7EH, London, UK
| | - Matthew B K Lee
- University College London Hospital NHS Foundation Trust, 235 Euston Rd, UK NW1 2BU, London, UK
| | - Munaib Din
- Guy's and St. Thomas' NHS Foundation Trust, Westminster Bridge Road, UK SE1 7EH, London, UK
| | - David Wood
- School of Biomedical Engineering & Imaging Sciences, King's College London, BMEIS, King's College London. 1 Lambeth Palace Road, UK SE1 7EU, London, UK
| | - Marc Modat
- School of Biomedical Engineering & Imaging Sciences, King's College London, BMEIS, King's College London. 1 Lambeth Palace Road, UK SE1 7EU, London, UK
| | - Thomas C Booth
- School of Biomedical Engineering & Imaging Sciences, King's College London, BMEIS, King's College London. 1 Lambeth Palace Road, UK SE1 7EU, London, UK.
- Department of Neuroradiology, King's College Hospital, Denmark Hill, UK SE5 9RS, London, UK.
| |
Collapse
|
46
|
Weitz M, Pfeiffer JR, Patel S, Biancalana M, Pekis A, Kannan V, Kaklamanos E, Parker A, Bucksot JE, Romera JR, Alvin R, Zhang Y, Stefka AT, Lopez-Ramos D, Peterson JR, Antony AK, Zamora KW, Woodard S. Performance of an AI-powered visualization software platform for precision surgery in breast cancer patients. NPJ Breast Cancer 2024; 10:98. [PMID: 39543194 PMCID: PMC11564706 DOI: 10.1038/s41523-024-00696-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 09/22/2024] [Indexed: 11/17/2024] Open
Abstract
Surgery remains the primary treatment modality in the management of early-stage invasive breast cancer. Artificial intelligence (AI)-powered visualization platforms offer the compelling potential to aid surgeons in evaluating the tumor's location and morphology within the breast and accordingly optimize their surgical approach. We sought to validate an AI platform that employs dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to render three-dimensional (3D) representations of the tumor and five additional chest tissues, offering clear visualizations as well as functionalities for quantifying tumor morphology, tumor-to-landmark structure distances, excision volumes, and approximate surgical margins. This retrospective study assessed the visualization platform's performance on 100 cases with ground-truth labels vetted by two breast-specialized radiologists. We assessed features including automatic AI-generated clinical metrics (e.g., tumor dimensions) as well as visualization tools including convex hulls at desired margins around the tumor to help visualize lumpectomy volume. The statistical performance of the platform's automated features was robust and within the range of inter-radiologist variability. These detailed 3D tumor and surrounding multi-tissue depictions offer both qualitative and quantitative comprehension of cancer topology and may aid in formulating an optimal surgical approach for breast cancer treatment. We further establish the framework for broader data integration into the platform to enhance precision cancer care.
Collapse
Affiliation(s)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Kathryn W Zamora
- Department of Radiology, University of Alabama at Birmingham School of Medicine, Birmingham, AL, USA
| | - Stefanie Woodard
- Department of Radiology, University of Alabama at Birmingham School of Medicine, Birmingham, AL, USA.
| |
Collapse
|
47
|
Molière S, Hamzaoui D, Ploussard G, Mathieu R, Fiard G, Baboudjian M, Granger B, Roupret M, Delingette H, Renard-Penna R. A Systematic Review of the Diagnostic Accuracy of Deep Learning Models for the Automatic Detection, Localization, and Characterization of Clinically Significant Prostate Cancer on Magnetic Resonance Imaging. Eur Urol Oncol 2024:S2588-9311(24)00248-7. [PMID: 39547898 DOI: 10.1016/j.euo.2024.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2024] [Revised: 10/21/2024] [Accepted: 11/01/2024] [Indexed: 11/17/2024]
Abstract
BACKGROUND AND OBJECTIVE Magnetic resonance imaging (MRI) plays a critical role in prostate cancer diagnosis, but is limited by variability in interpretation and diagnostic accuracy. This systematic review evaluates the current state of deep learning (DL) models in enhancing the automatic detection, localization, and characterization of clinically significant prostate cancer (csPCa) on MRI. METHODS A systematic search was conducted across Medline/PubMed, Embase, Web of Science, and ScienceDirect for studies published between January 2020 and September 2023. Studies were included if they presented and validated fully automated DL models for csPCa detection on MRI, with pathology confirmation. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool and the Checklist for Artificial Intelligence in Medical Imaging. KEY FINDINGS AND LIMITATIONS Twenty-five studies met the inclusion criteria, showing promising results in detecting and characterizing csPCa. However, significant heterogeneity in study designs, validation strategies, and datasets complicates direct comparisons. Only one-third of studies performed external validation, highlighting a critical gap in generalizability. The reliance on internal validation limits broader application of these findings, and the lack of standardized methodologies hinders the integration of DL models into clinical practice. CONCLUSIONS AND CLINICAL IMPLICATIONS DL models demonstrate significant potential in improving prostate cancer diagnostics on MRI. However, challenges in validation, generalizability, and clinical implementation must be addressed. Future research should focus on standardizing methodologies, ensuring external validation, and conducting prospective clinical trials to facilitate the adoption of artificial intelligence (AI) in routine clinical settings.
These findings support the cautious integration of AI into clinical practice, with further studies needed to confirm efficacy in diverse clinical environments. PATIENT SUMMARY In this study, we reviewed how artificial intelligence (AI) models can help doctors better detect and understand aggressive prostate cancer using magnetic resonance imaging scans. We found that while these AI tools show promise, they need more testing and validation in different hospitals before they can be used widely in patient care.
Collapse
Affiliation(s)
- Sébastien Molière
- Department of Radiology, Hôpital de Hautepierre, Hôpitaux Universitaire de Strasbourg, Strasbourg, France; Breast and Thyroid Imaging Unit, Institut de cancérologie Strasbourg Europe, Strasbourg, France; IGBMC, Illkirch, France.
| | - Dimitri Hamzaoui
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
| | - Guillaume Ploussard
- Department of Urology, La Croix du Sud Hôpital, Quint Fonsegrives, France; IUCT-O, Toulouse, France
| | - Romain Mathieu
- Department of Urology, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail), University of Rennes, Rennes, France
| | - Gaelle Fiard
- Department of Urology, CNRS, Grenoble INP, TIMC-IMAG, Grenoble Alpes University Hospital, Université Grenoble Alpes, Grenoble, France
| | - Michael Baboudjian
- Department of Urology, Assistance Publique des Hôpitaux de Marseille, Hôpital Nord, Marseille, France
| | - Benjamin Granger
- Public Health Department, INSERM, IPLESP, AP-HP, Pitie-Salpetriere Hospital, Sorbonne Universite, Paris, France
| | - Morgan Roupret
- Urology, GRC 5 Predictive Onco-Uro, AP-HP, Pitie-Salpetriere Hospital, Sorbonne University, Paris, France
| | - Hervé Delingette
- Inria, Epione Team, Sophia Antipolis, Université Côte d'Azur, Nice, France
| | - Raphaele Renard-Penna
- Department of Radiology, GRC 5 Predictive Onco-Uro, AP-HP, Pitie-Salpetriere Hospital, Sorbonne University, Paris, France
| |
Collapse
|
48
|
Dashti M, Londono J, Ghasemi S, Zare N, Samman M, Ashi H, Amirzade-Iranaq MH, Khosraviani F, Sabeti M, Khurshid Z. Comparative analysis of deep learning algorithms for dental caries detection and prediction from radiographic images: a comprehensive umbrella review. PeerJ Comput Sci 2024; 10:e2371. [PMID: 39650341 PMCID: PMC11622875 DOI: 10.7717/peerj-cs.2371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Accepted: 09/09/2024] [Indexed: 12/11/2024]
Abstract
Background In recent years, artificial intelligence (AI) and deep learning (DL) have made a considerable impact in dentistry, specifically in advancing image processing algorithms for detecting caries from radiographical images. Despite this progress, there is still a lack of data on the effectiveness of these algorithms in accurately identifying caries. This study provides an overview aimed at evaluating and comparing reviews that focus on the detection of dental caries (DC) using DL algorithms from 2D radiographs. Materials and Methods This comprehensive umbrella review adhered to the "Reporting guideline for overviews of reviews of healthcare interventions" (PRIOR). Specific keywords were generated to assess the accuracy of AI and DL algorithms in detecting DC from radiographical images. To ensure the highest quality of research, thorough searches were performed on PubMed/Medline, Web of Science, Scopus, and Embase. Additionally, bias in the selected articles was rigorously assessed using the Joanna Briggs Institute (JBI) tool. Results In this umbrella review, seven systematic reviews (SRs) were assessed from a total of 77 studies included. Various DL algorithms were used across these studies, with convolutional neural networks and other techniques being the predominant methods for detecting DC. The SRs included in the study examined 24 original articles that used 2D radiographical images for caries detection. Accuracy rates varied between 0.733 and 0.986 across datasets ranging in size from 15 to 2,500 images. Conclusion The advancement of DL algorithms in detecting and predicting DC through radiographic imaging is a significant breakthrough. These algorithms excel in extracting subtle features from radiographic images and applying machine learning techniques to achieve highly accurate predictions, often outperforming human experts.
This advancement holds immense potential to transform diagnostic processes in dentistry, promising to considerably improve patient outcomes.
Collapse
Affiliation(s)
- Mahmood Dashti
- Dentofacial Deformities Research Center, Research Institute of Dental Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Jimmy Londono
- Department of Prosthodontics, Dental College of Georgia at Augusta University, Augusta, Georgia, United States
| | - Shohreh Ghasemi
- Department of Oral and Maxillofacial Surgery, Queen Mary College of Medicine and Dentistry, London, United Kingdom
| | - Niusha Zare
- Department of Oral and Maxillofacial Radiology, Islamic Azad University Tehran Dental Branch, Tehran, Iran
| | - Meyassara Samman
- Department of Dental Public Health, College of Dentistry, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Heba Ashi
- Department of Dental Public Health, College of Dentistry, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Mohammad Hosein Amirzade-Iranaq
- Faculty of Dentistry, Universal Scientific Education and Research Network (USERN), Tehran University of Medical Sciences, Tehran, Iran
| | | | - Mohammad Sabeti
- Department of Preventive and Restorative Dental Sciences, San Francisco School of Dentistry, San Francisco, CA, United States
| | - Zohaib Khurshid
- Department of Prosthodontics and Dental Implantology, King Faisal University, Al Hofuf, Saudi Arabia
| |
Collapse
|
49
|
Colacci M, Huang YQ, Postill G, Zhelnov P, Fennelly O, Verma A, Straus S, Tricco AC. Sociodemographic bias in clinical machine learning models: a scoping review of algorithmic bias instances and mechanisms. J Clin Epidemiol 2024; 178:111606. [PMID: 39532254 DOI: 10.1016/j.jclinepi.2024.111606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2024] [Revised: 10/22/2024] [Accepted: 11/06/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND AND OBJECTIVES Clinical machine learning (ML) technologies can sometimes be biased and their use could exacerbate health disparities. The extent to which bias is present, the groups who most frequently experience bias, and the mechanism through which bias is introduced in clinical ML applications is not well described. The objective of this study was to examine instances of bias in clinical ML models. We identified the sociodemographic subgroups (per the PROGRESS framework) that experienced bias and the reported mechanisms of bias introduction. METHODS We searched MEDLINE, EMBASE, PsycINFO, and Web of Science for all studies that evaluated bias on sociodemographic factors within ML algorithms created for the purpose of facilitating clinical care. The scoping review was conducted according to the Joanna Briggs Institute guide and reported using the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) extension for scoping reviews. RESULTS We identified 6448 articles, of which 760 reported on a clinical ML model and 91 (12.0%) completed a bias evaluation and met all inclusion criteria. Most studies evaluated a single sociodemographic factor (n = 56, 61.5%). The most frequently evaluated sociodemographic factor was race (n = 59, 64.8%), followed by sex/gender (n = 41, 45.1%), and age (n = 24, 26.4%), with one study (1.1%) evaluating intersectional factors. Of all studies, 74.7% (n = 68) reported that bias was present, 18.7% (n = 17) reported bias was not present, and 6.6% (n = 6) did not state whether bias was present. When present, 87% of studies reported bias against groups with socioeconomic disadvantage. CONCLUSION Most ML algorithms that were evaluated for bias demonstrated bias on sociodemographic factors. Furthermore, most bias evaluations concentrated on race, sex/gender, and age, while other sociodemographic factors and their intersection were infrequently assessed.
Given potential health equity implications, bias assessments should be completed for all clinical ML models.
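In practice, the bias evaluations tallied in this review often reduce to comparing a performance metric across sociodemographic groups. A minimal, hypothetical sketch of such a subgroup comparison (function name and toy data are our own, not from the review):

```python
import numpy as np

def subgroup_accuracy_gap(y_true, y_pred, group):
    """Accuracy per sociodemographic group plus the max-min gap,
    a simple first-pass screen for performance bias."""
    accs = {}
    for g in np.unique(group):
        m = group == g
        accs[str(g)] = float((y_true[m] == y_pred[m]).mean())
    return accs, max(accs.values()) - min(accs.values())

# Toy cohort: the model is correct 9/10 times for group A, 7/10 for group B.
y_true = np.array([1] * 10 + [1] * 10)
y_pred = np.array([1] * 9 + [0] + [1] * 7 + [0] * 3)
group = np.array(["A"] * 10 + ["B"] * 10)

accs, gap = subgroup_accuracy_gap(y_true, y_pred, group)
print(accs)           # {'A': 0.9, 'B': 0.7}
print(round(gap, 2))  # 0.2
```

A raw accuracy gap is only a screening signal; a fuller evaluation would also compare calibration and error types (false positives vs. false negatives) across groups, since those can diverge even when overall accuracy looks similar.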
Collapse
Affiliation(s)
- Michael Colacci
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada.
| | - Yu Qing Huang
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
| | - Gemma Postill
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada; Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
| | - Pavel Zhelnov
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada
| | - Orna Fennelly
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada
| | - Amol Verma
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada; Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
| | - Sharon Straus
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada; Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
| | - Andrea C Tricco
- St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
| |
Collapse
|
50
|
Alis D, Tanyel T, Meltem E, Seker ME, Seker D, Karakaş HM, Karaarslan E, Öksüz İ. Choosing the right artificial intelligence solutions for your radiology department: key factors to consider. Diagn Interv Radiol 2024; 30:357-365. [PMID: 38682670 PMCID: PMC11589526 DOI: 10.4274/dir.2024.232658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2023] [Accepted: 04/15/2024] [Indexed: 05/01/2024]
Abstract
The rapid evolution of artificial intelligence (AI), particularly in deep learning, has significantly impacted radiology, introducing an array of AI solutions for interpretative tasks. This paper provides radiology departments with a practical guide for selecting and integrating AI solutions, focusing on interpretative tasks that require the active involvement of radiologists. Our approach is not to list available applications or review scientific evidence, as this information is readily available in previous studies; instead, we concentrate on the essential factors radiology departments must consider when choosing AI solutions. These factors include clinical relevance, performance and validation, implementation and integration, clinical usability, costs and return on investment, and regulations, security, and privacy. We illustrate each factor with hypothetical scenarios to provide a clearer understanding and practical relevance. Through our experience and literature review, we provide insights and a practical roadmap for radiologists to navigate the complex landscape of AI in radiology. We aim to assist in making informed decisions that enhance diagnostic precision, improve patient outcomes, and streamline workflows, thus contributing to the advancement of radiological practices and patient care.
Affiliation(s)
- Deniz Alis
- Acıbadem Mehmet Ali Aydınlar University Faculty of Medicine, Department of Radiology, İstanbul, Türkiye
- Toygar Tanyel
- İstanbul Technical University, Biomedical Engineering Graduate Program, İstanbul, Türkiye
- Emine Meltem
- University of Health Sciences Türkiye, İstanbul Training and Research Hospital, Clinic of Diagnostic and Interventional Radiology, İstanbul, Türkiye
- Mustafa Ege Seker
- Acıbadem Mehmet Ali Aydınlar University Faculty of Medicine, Department of Radiology, İstanbul, Türkiye
- Delal Seker
- Dicle University Faculty of Engineering, Department of Electrical-Electronics Engineering, Diyarbakır, Türkiye
- Ercan Karaarslan
- Acıbadem Mehmet Ali Aydınlar University Faculty of Medicine, Department of Radiology, İstanbul, Türkiye
- İlkay Öksüz
- İstanbul Technical University Faculty of Engineering, Department of Computer Engineering, İstanbul, Türkiye