1
Justice AC, McMahon B, Madduri R, Crivelli S, Damrauer S, Cho K, Ramoni R, Muralidhar S. A landmark federal interagency collaboration to promote data science in health care: Million Veteran Program-Computational Health Analytics for Medical Precision to Improve Outcomes Now. JAMIA Open 2024; 7:ooae126. PMID: 39507405; PMCID: PMC11540161; DOI: 10.1093/jamiaopen/ooae126.
Abstract
Objectives In 2016, the Department of Veterans Affairs (VA) and the Department of Energy (DOE) established an Interagency Agreement (IAA), the Million Veteran Program-Computational Health Analytics for Medical Precision to Improve Outcomes Now (MVP-CHAMPION) research collaboration. Materials and Methods Oversight fell under the VA Office of Research and Development (VA ORD) and DOE headquarters. An Executive Committee and 2 senior scientific liaisons work with VA and DOE leadership to optimize efforts in the service of shared scientific goals. The program supported centralized data management and genomic analysis, including creation of a scalable approach to cataloging phenotypes. Cross-cutting methods, including natural language processing, image processing, and reusable code, were developed. Results The $79.6 million collaboration has supported centralized data management and genomic analysis, including a scalable approach to cataloging phenotypes, and launched over 10 collaborative scientific projects in health conditions highly prevalent in veterans. A ground-breaking analysis on the Summit and Andes supercomputers at the Oak Ridge National Laboratory (ORNL) examined the genetic underpinnings of over 2000 health conditions across 44 million genetic variants and identified 38,270 independent genetic variants associated with one or more health traits. Of these, over 2000 identified associations were unique to non-European ancestry. Cross-cutting methods have advanced state-of-the-art artificial intelligence (AI), including large language model-based natural language processing, and a systems biology study focused on opioid addiction was awarded the 2018 Gordon Bell Prize for outstanding achievement in high-performance computing. The collaboration has completed work in prostate cancer, suicide prevention, cardiovascular disease, and cross-cutting data science. Predictive models developed in these projects are being tested for application in clinical management. Discussion Eight new projects were launched in 2023, taking advantage of the momentum generated by the previous collaboration. A major challenge has been limitations in the scope of appropriated funds at DOE, which cannot currently be used for health research. Conclusion Extensive multidisciplinary interactions take time to establish and are essential to continued progress. New funding models for maintaining high-performance computing infrastructure at the ORNL and for supporting continued collaboration by joint VA-DOE research teams are needed.
Affiliation(s)
- Amy C Justice
- VA Connecticut Healthcare System, West Haven, CT 06516, United States
- Yale School of Medicine and Public Health, Yale University, New Haven, CT 06510, United States
- Benjamin McMahon
- Los Alamos National Laboratory, Los Alamos, NM 87545, United States
- Ravi Madduri
- Argonne National Laboratory, Argonne, IL 60439, United States
- Silvia Crivelli
- Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Scott Damrauer
- Penn Heart and Vascular Center, University of Pennsylvania, Philadelphia, PA 19104, United States
- Kelly Cho
- VA Boston Healthcare System, Boston, MA 02130, United States
- Rachel Ramoni
- Department of Veterans Affairs, Office of Research and Development, Veterans Health Administration, Washington, DC 20571, United States
- Sumitra Muralidhar
- Department of Veterans Affairs, Million Veteran Program, Veterans Health Administration, Washington, DC 20420, United States
2
Xie Y, Gu L, Harada T, Zhang J, Xia Y, Wu Q. Rethinking masked image modelling for medical image representation. Med Image Anal 2024; 98:103304. PMID: 39173412; DOI: 10.1016/j.media.2024.103304.
Abstract
Masked Image Modelling (MIM), a form of self-supervised learning, has achieved significant success in computer vision by improving image representations using unannotated data. Traditional MIM typically employs a strategy of random sampling across the image. However, this random masking technique may not be ideally suited for medical imaging, which possesses distinct characteristics divergent from natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), a novel approach that is, to our knowledge, the first to employ radiological reports to guide the masking and restoration of informative image areas, encouraging the network to learn stronger semantic representations from medical images. We introduce two complementary masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify symptom clues mapped to MeSH words (e.g., cardiac, edema, vascular, pulmonary) and to guide mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images informed by the masking from the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks, covering multi-label/class image classification, pneumothorax segmentation, and medical image-report analysis, demonstrate that MedIM with report-guided masking achieves competitive performance. Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Code is available at https://github.com/YtongXie/MedIM.
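To make the report-guided masking idea concrete, the sketch below contrasts random patch masking with a keyword-driven variant in the spirit of KDM: patches that a precomputed word-to-patch attention map links to MeSH-style report keywords are masked preferentially. The keyword list, attention map, and grid size are illustrative placeholders, not the authors' implementation.

```python
# Minimal NumPy sketch of report-guided patch masking (KDM-style), assuming a
# word-to-patch attention map is available; everything here is a toy stand-in.
import numpy as np

MESH_KEYWORDS = {"cardiac", "edema", "vascular", "pulmonary"}  # example MeSH-style cues

def report_guided_mask(report: str, attn: np.ndarray, mask_ratio: float = 0.5) -> np.ndarray:
    """Return a boolean mask over image patches, preferring patches most attended
    by report keywords; attn has shape (num_words, num_patches)."""
    words = report.lower().replace(",", " ").split()
    keyword_rows = [i for i, w in enumerate(words) if w in MESH_KEYWORDS]
    num_patches = attn.shape[1]
    n_mask = int(mask_ratio * num_patches)
    if keyword_rows:                       # keyword-driven: mask the most "informative" patches
        scores = attn[keyword_rows].max(axis=0)
    else:                                  # no keywords found: fall back to random masking
        scores = np.random.rand(num_patches)
    mask = np.zeros(num_patches, dtype=bool)
    mask[np.argsort(scores)[::-1][:n_mask]] = True
    return mask

# toy usage: 14x14 patch grid, random stand-in attention map
rng = np.random.default_rng(0)
report = "Mild pulmonary edema with stable cardiac silhouette"
attn = rng.random((len(report.split()), 14 * 14))
print(report_guided_mask(report, attn).sum(), "of", 14 * 14, "patches masked")
```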
Affiliation(s)
- Lin Gu
- RIKEN AIP, Japan; RCAST, The University of Tokyo, Japan
- Jianpeng Zhang
- College of Computer Science and Technology, Zhejiang University, China
- Yong Xia
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China; Ningbo Institute of Northwestern Polytechnical University, Ningbo 315048, China
- Qi Wu
- University of Adelaide, Australia.
3
Zeng X, Abdullah N, Sumari P. Self-supervised learning framework application for medical image analysis: a review and summary. Biomed Eng Online 2024; 23:107. PMID: 39465395; PMCID: PMC11514943; DOI: 10.1186/s12938-024-01299-9.
Abstract
Manual annotation of medical image datasets is labor-intensive and prone to biases. Moreover, the rate at which image data accumulate significantly outpaces the speed of manual annotation, posing a challenge to the advancement of machine learning, particularly in the realm of supervised learning. Self-supervised learning is an emerging field that capitalizes on unlabeled data for training, thereby circumventing the need for extensive manual labeling. This learning paradigm generates synthetic pseudo-labels through pretext tasks, compelling the network to acquire image representations in a pseudo-supervised manner, and subsequently fine-tunes with a limited set of annotated data to achieve enhanced performance. This review begins with an overview of prevalent types and advancements in self-supervised learning, followed by an exhaustive and systematic examination of methodologies within the medical imaging domain from 2018 to September 2024. The review encompasses a range of medical image modalities, including CT, MRI, X-ray, histology, and ultrasound. It addresses specific tasks, such as classification, localization, segmentation, reduction of false positives, improvement of model performance, and enhancement of image quality. The analysis reveals a descending order in the volume of related studies, with CT and MRI leading the list, followed by X-ray, histology, and ultrasound. Except for CT and MRI, studies of contrastive learning methods are more prevalent than studies of generative learning approaches. The performance of MRI/ultrasound classification and of segmentation across all image types still has room for further exploration. Overall, this review provides conceptual guidance for medical professionals seeking to combine self-supervised learning with their research.
Affiliation(s)
- Xiangrui Zeng
- School of Computer Sciences, Universiti Sains Malaysia, USM, 11800, Pulau Pinang, Malaysia.
- Nibras Abdullah
- Faculty of Computer Studies, Arab Open University, Jeddah, Saudi Arabia.
- Putra Sumari
- School of Computer Sciences, Universiti Sains Malaysia, USM, 11800, Pulau Pinang, Malaysia
4
Hussain S, Naseem U, Ali M, Avendaño Avalos DB, Cardona-Huerta S, Bosques Palomo BA, Tamez-Peña JG. TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines. BMC Med Inform Decis Mak 2024; 24:310. PMID: 39444035; PMCID: PMC11515610; DOI: 10.1186/s12911-024-02717-7.
Abstract
BACKGROUND Recently, machine learning (ML), deep learning (DL), and natural language processing (NLP) have provided promising results for classifying free-form radiological reports in their respective medical domains. Proper classification of radiological reports requires a high-quality annotated and curated dataset. Currently, no publicly available breast imaging-based radiological dataset exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as characterized by the American College of Radiology (ACR). To tackle this problem, we construct and annotate a breast imaging-based radiological reports dataset and provide benchmark results. The dataset was originally in Spanish. Board-certified radiologists collected and annotated it according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals Monterrey, Mexico. It was translated into English using Google Translate and then preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients (all women) with an average age of 53 years. Furthermore, we used word-level NLP-based embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec, to extract semantic and syntactic information. We also compared the performance of ML, DL, and large language model (LLM) classifiers for BI-RADS category classification. RESULTS The final breast imaging-based radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier with preprocessed data performed 6% better, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812), than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607). CONCLUSION In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results of ML, DL, and LLM models for BI-RADS classification that can serve as a starting point for future investigation. The main objective of this work is to provide a repository for investigators who wish to enter the field and push its boundaries further.
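As a rough illustration of the TF-IDF-plus-classifier baselines compared above, the scikit-learn pipeline below classifies toy report snippets into BI-RADS categories. The example reports and labels are invented stand-ins, not the TECRR data, and a random forest is used simply as one of the listed classifier families.

```python
# Illustrative TF-IDF + random-forest pipeline for BI-RADS category classification;
# the training snippets and labels are fabricated for demonstration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

reports = [
    "benign-appearing calcifications, no suspicious mass",
    "irregular spiculated mass, highly suggestive of malignancy",
    "probably benign finding, short-interval follow-up suggested",
    "known biopsy-proven malignancy, appropriate action should be taken",
]
birads = ["2", "5", "3", "6"]  # toy BI-RADS labels matching the snippets above

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    RandomForestClassifier(random_state=0))
clf.fit(reports, birads)
print(clf.predict(["new spiculated mass with pleomorphic calcifications"]))
```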
Affiliation(s)
- Sadam Hussain
- School of Engineering and Sciences, Tecnológico de Monterrey, Monterrey, 64849, Nuevo Leon, Mexico.
- Usman Naseem
- School of Computing, Macquarie University, Sydney, 2109, NSW, Australia
- Mansoor Ali
- School of Engineering and Sciences, Tecnológico de Monterrey, Monterrey, 64849, Nuevo Leon, Mexico
5
Anderson PG, Tarder-Stoll H, Alpaslan M, Keathley N, Levin DL, Venkatesh S, Bartel E, Sicular S, Howell S, Lindsey RV, Jones RM. Deep learning improves physician accuracy in the comprehensive detection of abnormalities on chest X-rays. Sci Rep 2024; 14:25151. PMID: 39448764; PMCID: PMC11502915; DOI: 10.1038/s41598-024-76608-2.
Abstract
Chest X-rays are the most commonly performed medical imaging exam, yet they are often misinterpreted by physicians. Here, we present an FDA-cleared artificial intelligence (AI) system that uses a deep learning algorithm to assist physicians in the comprehensive detection and localization of abnormalities on chest X-rays. We trained and tested the AI system on a large dataset, assessed generalizability on publicly available data, and evaluated radiologist and non-radiologist physician accuracy when unaided and aided by the AI system. The AI system accurately detected chest X-ray abnormalities (AUC: 0.976, 95% bootstrap CI: 0.975, 0.976) and generalized to a publicly available dataset (AUC: 0.975, 95% bootstrap CI: 0.971, 0.978). Physicians showed significant improvements in detecting abnormalities on chest X-rays when aided by the AI system compared to when unaided (difference in AUC: 0.101, p < 0.001). Non-radiologist physicians detected abnormalities on chest X-ray exams as accurately as radiologists when aided by the AI system and were faster at evaluating chest X-rays when aided compared to unaided. Together, these results show that the AI system is accurate and reduces physician errors in chest X-ray evaluation, highlighting the potential of AI systems to improve access to fast, high-quality radiograph interpretation.
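The bootstrap confidence intervals quoted for the AUC values can be reproduced in outline with the resampling sketch below; the labels and model scores are synthetic stand-ins for the study's data.

```python
# Bootstrap 95% CI for AUC on synthetic scores, mirroring the style of CI reported above.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=500), 0, 1)  # fake model scores

aucs = []
for _ in range(1000):                      # resample cases with replacement
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) < 2:    # skip degenerate resamples with one class
        continue
    aucs.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUC {roc_auc_score(y_true, y_score):.3f} (95% bootstrap CI {lo:.3f}, {hi:.3f})")
```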
Affiliation(s)
- Pamela G Anderson
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA.
- Mehmet Alpaslan
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA
- Nora Keathley
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA
- David L Levin
- Department of Radiology, Stanford University School of Medicine, 453 Quarry Rd, Palo Alto, CA, 94305, USA
- Srivas Venkatesh
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA
- Elliot Bartel
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA
- Serge Sicular
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA
- The Mount Sinai Hospital, 1 Gustave L. Levy Place, New York, NY, 10029, USA
- Scott Howell
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA
- Robert V Lindsey
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA
- Rebecca M Jones
- Imagen Technologies, 224 W 35th St Ste 500, New York, NY, 10001, USA
6
Lang W, Liu Z, Zhang Y. DACG: Dual Attention and Context Guidance model for radiology report generation. Med Image Anal 2024; 99:103377. PMID: 39481215; DOI: 10.1016/j.media.2024.103377.
Abstract
Medical images are an essential basis for radiologists to write radiology reports and greatly help subsequent clinical treatment. Automatic radiology report generation aims to alleviate the burden on clinicians of writing reports and has received increasing attention in recent years, becoming an important research hotspot. However, the medical field faces severe issues of visual and textual data bias and long text generation. First, abnormal areas in radiological images account for only a small portion of the image, and most radiological reports consist largely of descriptions of normal findings. Second, generating longer and more accurate descriptive texts remains a significant challenge for radiology report generation. In this paper, we propose a new Dual Attention and Context Guidance (DACG) model to alleviate visual and textual data bias and promote the generation of long texts. We use a Dual Attention Module, comprising a Position Attention Block and a Channel Attention Block, to extract finer position and channel features from medical images, enhancing the image feature extraction ability of the encoder. We use the Context Guidance Module to integrate contextual information into the decoder and supervise the generation of long texts. Experimental results show that our proposed model achieves state-of-the-art performance on the most commonly used IU X-ray and MIMIC-CXR datasets. Further analysis also shows that our model can improve reporting through more accurate anomaly detection and more detailed descriptions. The source code is available at https://github.com/LangWY/DACG.
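For readers unfamiliar with the two blocks named in the Dual Attention Module, the compact PyTorch sketch below shows generic position (spatial) and channel self-attention applied to a feature map. The layer sizes are illustrative and this is not the released DACG code; see the repository linked above for that.

```python
# Generic position- and channel-attention blocks over a (B, C, H, W) feature map.
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Self-attention over spatial locations of a feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)           # (B, HW, C/8)
        k = self.key(x).flatten(2)                              # (B, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)                     # (B, HW, HW)
        v = self.value(x).flatten(2)                            # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Self-attention over the channel dimension of a feature map."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.flatten(2)                                     # (B, C, HW)
        attn = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)  # (B, C, C)
        out = (attn @ flat).view(b, c, h, w)
        return self.gamma * out + x

feats = torch.randn(2, 64, 16, 16)
print(ChannelAttention()(PositionAttention(64)(feats)).shape)   # torch.Size([2, 64, 16, 16])
```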
Affiliation(s)
- Wangyu Lang
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, Liaoning, China
- Zhi Liu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, Liaoning, China
- Yijia Zhang
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, Liaoning, China.
7
Matta S, Lamard M, Zhang P, Le Guilcher A, Borderie L, Cochener B, Quellec G. A systematic review of generalization research in medical image classification. Comput Biol Med 2024; 183:109256. PMID: 39427426; DOI: 10.1016/j.compbiomed.2024.109256.
Abstract
Numerous Deep Learning (DL) classification models have been developed for a large spectrum of medical image analysis applications, which promises to reshape various facets of medical practice. Despite early advances in DL model validation and implementation, which encourage healthcare institutions to adopt them, a fundamental question remains: how can these models effectively handle domain shift? This question is crucial for limiting the performance degradation of DL models. Medical data are dynamic and prone to domain shift due to multiple factors. Two main shift types can occur over time: (1) covariate shift, mainly arising from updates to medical equipment, and (2) concept shift, caused by inter-grader variability. To mitigate the problem of domain shift, existing surveys mainly focus on domain adaptation techniques, with an emphasis on covariate shift. More generally, no work has reviewed the state-of-the-art solutions while focusing on the shift types. This paper explores existing domain generalization methods for DL-based classification models through a systematic review of the literature and proposes a taxonomy based on the shift type the methods aim to solve. Papers were searched and gathered on Scopus up to 10 April 2023, and after eligibility screening and quality evaluation, 77 articles were identified. Exclusion criteria included lack of methodological novelty (e.g., reviews, benchmarks), experiments conducted on a single mono-center dataset, or articles not written in English. The results show that learning-based methods are emerging for both shift types. Finally, we discuss future challenges, including the need for improved evaluation protocols and benchmarks, and envisioned future developments to achieve robust, generalized models for medical image classification.
Affiliation(s)
- Sarah Matta
- Université de Bretagne Occidentale, Brest, Bretagne, 29200, France; Inserm, UMR 1101, Brest, F-29200, France.
- Mathieu Lamard
- Université de Bretagne Occidentale, Brest, Bretagne, 29200, France; Inserm, UMR 1101, Brest, F-29200, France
- Philippe Zhang
- Université de Bretagne Occidentale, Brest, Bretagne, 29200, France; Inserm, UMR 1101, Brest, F-29200, France; Evolucare Technologies, Villers-Bretonneux, F-80800, France
- Béatrice Cochener
- Université de Bretagne Occidentale, Brest, Bretagne, 29200, France; Inserm, UMR 1101, Brest, F-29200, France; Service d'Ophtalmologie, CHRU Brest, Brest, F-29200, France
8
Xu Z, Li J, Yao Q, Li H, Zhao M, Zhou SK. Addressing fairness issues in deep learning-based medical image analysis: a systematic review. NPJ Digit Med 2024; 7:286. PMID: 39420149; PMCID: PMC11487181; DOI: 10.1038/s41746-024-01276-5.
Abstract
Deep learning algorithms have demonstrated remarkable efficacy in various medical image analysis (MedIA) applications. However, recent research highlights a performance disparity in these algorithms when applied to specific subgroups, such as poorer predictive performance in elderly females. Addressing this fairness issue has become a collaborative effort involving AI scientists and clinicians seeking to understand its origins and develop solutions for mitigation within MedIA. In this survey, we thoroughly examine the current advancements in addressing fairness issues in MedIA, focusing on methodological approaches. We introduce the basics of group fairness and subsequently categorize studies on fair MedIA into fairness evaluation and unfairness mitigation, presenting the detailed methods employed in these studies. Our survey concludes with a discussion of existing challenges and opportunities in establishing a fair MedIA and healthcare system. By offering this comprehensive review, we aim to foster a shared understanding of fairness among AI researchers and clinicians, enhance the development of unfairness mitigation methods, and contribute to the creation of an equitable MedIA society.
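A minimal example of the fairness-evaluation side of the taxonomy above is to compare a model's performance across subgroups; the sketch below computes an AUC gap between two synthetic subgroups, with all data invented for illustration.

```python
# Per-subgroup AUC and gap for a binary classifier on synthetic data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n = 2000
group = rng.choice(["A", "B"], size=n)                 # e.g., a protected attribute
y = rng.integers(0, 2, size=n)                         # ground-truth disease labels
score = y * 0.5 + rng.normal(0.25, 0.3, size=n)        # fake model scores
score[group == "B"] += rng.normal(0, 0.3, size=(group == "B").sum())  # noisier for group B

aucs = {g: roc_auc_score(y[group == g], score[group == g]) for g in ("A", "B")}
print(aucs, "AUC gap:", round(abs(aucs["A"] - aucs["B"]), 3))
```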
Affiliation(s)
- Zikang Xu
- School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, PR China
- Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, PR China
- Jun Li
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, PR China
- Qingsong Yao
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, PR China
- Han Li
- School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, PR China
- Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, PR China
- Mingyue Zhao
- School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, PR China
- Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, PR China
- S Kevin Zhou
- School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, PR China.
- Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, PR China.
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, PR China.
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui, PR China.
9
Kavandi H, Kulkarni P, Garin SP, Bachina P, Parekh VS, Yi PH. Radiomics-Based Prediction of Patient Demographic Characteristics on Chest Radiographs: Looking Beyond Deep Learning for Risk of Bias. AJR Am J Roentgenol 2024. PMID: 39413236; DOI: 10.2214/ajr.24.31963.
Affiliation(s)
- Hadiseh Kavandi
- Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, MD
- Pranav Kulkarni
- University of Maryland Institute for Health Computing (IHC), Bethesda, MD
- Sean P Garin
- F. Edward Hébert School of Medicine, Uniformed Services University of the Health Sciences (USU), Bethesda, MD
- Vishwa S Parekh
- Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, MD
- Paul H Yi
- Department of Radiology (DoR), St. Jude Children's Research Hospital, Memphis, TN
10
Sourlos N, Vliegenthart R, Santinha J, Klontzas ME, Cuocolo R, Huisman M, van Ooijen P. Recommendations for the creation of benchmark datasets for reproducible artificial intelligence in radiology. Insights Imaging 2024; 15:248. PMID: 39400639; PMCID: PMC11473745; DOI: 10.1186/s13244-024-01833-2.
Abstract
Various healthcare domains, including radiology, have witnessed successful preliminary implementation of artificial intelligence (AI) solutions, though limited generalizability hinders their widespread adoption. Currently, most research groups and industry have limited access to the data needed for external validation studies. The creation and accessibility of benchmark datasets to validate such solutions represent a critical step towards generalizability, for which an array of aspects ranging from preprocessing to regulatory issues and biostatistical principles come into play. In this article, the authors provide recommendations for the creation of benchmark datasets in radiology, explain current limitations in this realm, and explore potential new approaches. CLINICAL RELEVANCE STATEMENT: Benchmark datasets, by facilitating validation of AI software performance, can contribute to the adoption of AI in clinical practice. KEY POINTS: Benchmark datasets are essential for the validation of AI software performance. Factors like image quality and representativeness of cases should be considered. Benchmark datasets can help adoption by increasing the trustworthiness and robustness of AI.
Affiliation(s)
- Nikos Sourlos
- Department of Radiology, University Medical Center of Groningen, Groningen, The Netherlands
- DataScience Center in Health, University Medical Center Groningen, Groningen, The Netherlands
- Rozemarijn Vliegenthart
- Department of Radiology, University Medical Center of Groningen, Groningen, The Netherlands
- DataScience Center in Health, University Medical Center Groningen, Groningen, The Netherlands
- Joao Santinha
- Digital Surgery LAB, Champalimaud Foundation, Champalimaud Clinical Centre, Lisbon, Portugal
- Michail E Klontzas
- Department of Medical Imaging, University Hospital of Heraklion, Heraklion, Greece
- Department of Radiology, School of Medicine, University of Crete, Heraklion, Greece
- Renato Cuocolo
- Department of Medicine, Surgery, and Dentistry, University of Salerno, Baronissi, Italy
- Merel Huisman
- Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Peter van Ooijen
- DataScience Center in Health, University Medical Center Groningen, Groningen, The Netherlands.
- Department of Radiation Oncology, University Medical Center Groningen, Groningen, The Netherlands.
11
Woźnicki P, Laqua C, Fiku I, Hekalo A, Truhn D, Engelhardt S, Kather J, Foersch S, D'Antonoli TA, Pinto Dos Santos D, Baeßler B, Laqua FC. Automatic structuring of radiology reports with on-premise open-source large language models. Eur Radiol 2024. PMID: 39390261; DOI: 10.1007/s00330-024-11074-y.
Abstract
OBJECTIVES Structured reporting enhances comparability, readability, and content detail. Large language models (LLMs) could convert free text into structured data without disrupting radiologists' reporting workflow. This study evaluated an on-premise, privacy-preserving LLM for automatically structuring free-text radiology reports. MATERIALS AND METHODS We developed an approach to controlling the LLM output, ensuring the validity and completeness of structured reports produced by a locally hosted Llama-2-70B-chat model. A dataset of de-identified narrative chest radiograph (CXR) reports was compiled retrospectively. It included 202 English reports from the publicly available MIMIC-CXR dataset and 197 German reports from our university hospital. A senior radiologist prepared a detailed, fully structured reporting template with 48 question-answer pairs. All reports were independently structured by the LLM and two human readers. Bayesian inference (Markov chain Monte Carlo sampling) was used to estimate the distributions of the Matthews correlation coefficient (MCC), with [-0.05, 0.05] as the region of practical equivalence (ROPE). RESULTS The LLM generated valid structured reports in all cases, achieving an average MCC of 0.75 (94% HDI: 0.70-0.80) and F1 score of 0.70 (0.70-0.80) for English reports, and 0.66 (0.62-0.70) and 0.68 (0.64-0.72) for German reports, respectively. The MCC differences between the LLM and humans were within the ROPE for both languages: 0.01 (-0.05 to 0.07) and 0.01 (-0.05 to 0.07) for English, and -0.01 (-0.07 to 0.05) and 0.00 (-0.06 to 0.06) for German, indicating approximately comparable performance. CONCLUSION Locally hosted, open-source LLMs can automatically structure free-text radiology reports with approximately human accuracy. However, the understanding of semantics varied across languages and imaging findings. KEY POINTS Question Why has structured reporting not been widely adopted in radiology despite clear benefits, and how can we improve this? Findings A locally hosted large language model successfully structured narrative reports, showing variation between languages and findings. Critical relevance Structured reporting provides many benefits, but its integration into the clinical routine is limited. Automating the extraction of structured information from radiology reports enables the capture of structured data while allowing the radiologist to maintain their reporting workflow.
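A hedged sketch of the evaluation idea above: structured answers produced by a placeholder "local LLM" are compared against human reference answers with the Matthews correlation coefficient. The questions, report, and the trivial keyword rule standing in for the Llama-2 call are illustrative assumptions, not the study's template or model interface.

```python
# Compare (stubbed) LLM answers with human reference answers using MCC.
from sklearn.metrics import matthews_corrcoef

QUESTIONS = ["Pleural effusion present?", "Pneumothorax present?", "Support devices present?"]

def ask_llm(report: str, question: str) -> str:
    """Placeholder for a locally hosted chat model constrained to 'yes'/'no';
    a trivial keyword rule stands in for the real model call here."""
    key = question.lower().split()[0]
    return "yes" if key in report.lower() else "no"

report = "No focal consolidation. Small left pleural effusion. No pneumothorax."
human = ["yes", "no", "no"]                         # reference structured answers
llm = [ask_llm(report, q) for q in QUESTIONS]

mcc = matthews_corrcoef([a == "yes" for a in human], [a == "yes" for a in llm])
print("LLM answers:", llm, "| MCC vs human:", round(mcc, 2))
```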
Affiliation(s)
- Piotr Woźnicki
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany.
- Caroline Laqua
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany
- Ina Fiku
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany
- Amar Hekalo
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
- Sandy Engelhardt
- Department of Internal Medicine III, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Jakob Kather
- Department of Internal Medicine I, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
- Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
- Sebastian Foersch
- Institute of Pathology, University Medical Center Mainz, Mainz, Germany
- Tugba Akinci D'Antonoli
- Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland
- Daniel Pinto Dos Santos
- Department of Diagnostic and Interventional Radiology, University of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- Bettina Baeßler
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany
- Fabian Christopher Laqua
- Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany
12
Hu L, Li D, Liu H, Chen X, Gao Y, Huang S, Peng X, Zhang X, Bai X, Yang H, Kong L, Tang J, Lu P, Xiong C, Liang H. Enhancing fairness in AI-enabled medical systems with the attribute neutral framework. Nat Commun 2024; 15:8767. PMID: 39384748; PMCID: PMC11464531; DOI: 10.1038/s41467-024-52930-1.
Abstract
Questions of unfairness and inequity pose critical challenges to the successful deployment of artificial intelligence (AI) in healthcare settings. In AI models, unequal performance across protected groups may be partially attributable to the learning of spurious or otherwise undesirable correlations between sensitive attributes and disease-related information. Here, we introduce the Attribute Neutral Framework, designed to disentangle biased attributes from disease-relevant information and subsequently neutralize them to improve representation across diverse subgroups. Within the framework, we develop the Attribute Neutralizer (AttrNzr) to generate neutralized data, for which protected attributes can no longer be easily predicted by humans or by machine learning classifiers. We then utilize these data to train the disease diagnosis model (DDM). Comparative analysis with other unfairness mitigation algorithms demonstrates that AttrNzr outperforms them in reducing the unfairness of the DDM while maintaining the DDM's overall disease diagnosis performance. Furthermore, AttrNzr supports the simultaneous neutralization of multiple attributes and demonstrates utility even when applied solely during the training phase, without being used in the test phase. Moreover, instead of introducing additional constraints to the DDM, the AttrNzr directly addresses a root cause of unfairness, providing a model-independent solution. Our results with AttrNzr highlight the potential of data-centered and model-independent solutions for fairness challenges in AI-enabled medical systems.
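One way to picture the neutralization check described above (protected attributes should no longer be predictable from neutralized data) is to train a simple attribute classifier before and after removing the leaking signal. The features, the protected attribute, and the shuffle-based "neutralization" below are synthetic stand-ins, not the AttrNzr method itself.

```python
# Attribute-predictability check on synthetic data: AUC for predicting the
# protected attribute should drop toward chance after neutralization.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
attr = rng.integers(0, 2, size=n)                       # protected attribute
X = rng.normal(size=(n, 5))
X[:, 0] += attr                                          # one feature leaks the attribute

def attribute_auc(features):
    return cross_val_score(LogisticRegression(max_iter=1000), features, attr,
                           cv=5, scoring="roc_auc").mean()

X_neutral = X.copy()
X_neutral[:, 0] = rng.permutation(X_neutral[:, 0])       # crude stand-in for neutralization
print("attribute AUC before:", round(attribute_auc(X), 2),
      "| after:", round(attribute_auc(X_neutral), 2))
```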
Affiliation(s)
- Lianting Hu
- The Data Center, Wuhan Children's Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430016, Hubei, China
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Dantong Li
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Huazhang Liu
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Xuanhui Chen
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Yunfei Gao
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Shuai Huang
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Xiaoting Peng
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Xueli Zhang
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Medical Research Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Eye Institute, Department of Ophthalmology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Xiaohe Bai
- School of Physical Sciences, University of California San Diego, La Jolla, San Diego, CA, 92093, USA
- Huan Yang
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Lingcong Kong
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China
- Jiajie Tang
- Clinical Medical Research Center, Xinqiao Hospital, Army Medical University, Chongqing, 400037, China
- Peixin Lu
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229, USA
- Chao Xiong
- The Data Center, Wuhan Children's Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430016, Hubei, China.
- Huiying Liang
- The Data Center, Wuhan Children's Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430016, Hubei, China.
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China.
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, Guangdong, China.
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, 510080, Guangdong, China.
13
Jang J, Kyung D, Kim SH, Lee H, Bae K, Choi E. Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders. Sci Rep 2024; 14:23199. PMID: 39369048; PMCID: PMC11455863; DOI: 10.1038/s41598-024-73695-z.
Abstract
Deep neural networks are increasingly used in medical imaging for tasks such as pathological classification, but they face challenges due to the scarcity of high-quality, expert-labeled training data. Recent efforts have utilized pre-trained contrastive image-text models like CLIP, adapting them for medical use by fine-tuning the model with chest X-ray images and corresponding reports for zero-shot pathology classification, thus eliminating the need for pathology-specific annotations. However, most studies continue to use the same contrastive learning objectives as in the general domain, overlooking the multi-labeled nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling. We aim to improve the performance of zero-shot pathology classification without relying on external knowledge. Our method can be applied to any pre-trained contrastive image-text encoder and easily transferred to out-of-domain datasets without further training, as it does not use external data. Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models, with an average macro AUROC increase of 4.3%. Additionally, our method outperforms the state-of-the-art and marginally surpasses board-certified radiologists in zero-shot classification for the five competition pathologies in the CheXpert dataset.
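The zero-shot protocol referred to above can be sketched with a general-domain CLIP checkpoint from Hugging Face: each pathology is scored by comparing the image against a positive and a negative text prompt. The prompts and the blank placeholder image are illustrative; the paper's contribution is the fine-tuning strategy applied to such image-text encoders, which is not reproduced here.

```python
# Zero-shot pathology scoring with a general-domain CLIP model; placeholder image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

pathologies = ["atelectasis", "cardiomegaly", "pleural effusion"]
image = Image.new("RGB", (224, 224))  # stand-in for a chest radiograph

for p in pathologies:
    prompts = [f"a chest x-ray with {p}", f"a chest x-ray with no {p}"]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(f"{p}: P(present) = {probs[0, 0].item():.2f}")
```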
Affiliation(s)
- Daeun Kyung
- KAIST, Kim Jaechul Graduate School of AI, 34141, Daejeon, Republic of Korea
- Honglak Lee
- LG AI Research, 07796, Seoul, Republic of Korea
- Edward Choi
- KAIST, Kim Jaechul Graduate School of AI, 34141, Daejeon, Republic of Korea.
14
Yan L, Zhao J, Shi D, Li D, Liu Y. HF-CMN: a medical report generation model for heart failure. Med Biol Eng Comput 2024. PMID: 39358488; DOI: 10.1007/s11517-024-03197-7.
Abstract
Heart failure represents the ultimate stage in the progression of diverse cardiac ailments. Throughout the management of heart failure, physicians must review medical imaging to formulate therapeutic regimens for patients. Automated report generation technology serves as a tool aiding physicians in patient management. However, previous studies failed to generate targeted reports for specific diseases. To produce high-quality medical reports with greater relevance across diverse conditions, we introduce HF-CMN, an automatic report generation model tailored to heart failure. First, the generated report includes comprehensive information pertaining to heart failure gleaned from chest radiographs. Second, we construct a storage query matrix grouping based on a multi-label type, enhancing the accuracy with which our model aligns images and text. Experimental results demonstrate that our method can generate reports strongly correlated with heart failure and outperforms most other advanced methods on the benchmark datasets MIMIC-CXR and IU X-Ray. Further analysis confirms that our method achieves superior alignment between images and texts, resulting in higher-quality reports.
Affiliation(s)
- Liangquan Yan
- College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan, 030024, China
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
- Jumin Zhao
- College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan, 030024, China
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
- Intelligent Perception Engineering Technology Center of Shanxi, Taiyuan, 030024, China
- Shanxi Province Engineering Technology Research Center of Spatial Information Network, Taiyuan, 030024, China
- Danyang Shi
- College of Computer Science and Technology (College of Big Data), Taiyuan University of Technology, Taiyuan, 030024, China
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
- Dengao Li
- College of Computer Science and Technology (College of Big Data), Taiyuan University of Technology, Taiyuan, 030024, China.
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China.
- Intelligent Perception Engineering Technology Center of Shanxi, Taiyuan, 030024, China.
- Shanxi Province Engineering Technology Research Center of Spatial Information Network, Taiyuan, 030024, China.
- Yi Liu
- College of Computer Science and Technology (College of Big Data), Taiyuan University of Technology, Taiyuan, 030024, China
- Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
15
Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh CC, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, Li I. Ascle-A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. J Med Internet Res 2024; 26:e60601. PMID: 39361955; PMCID: PMC11487205; DOI: 10.2196/60601.
Abstract
BACKGROUND Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, various toolkits for text processing exist and have greatly improved the efficiency of handling unstructured text. However, these existing toolkits tend to emphasize different perspectives, and none of them offers generation capabilities, leaving a significant gap in the current offerings. OBJECTIVE This study aims to describe the development and preliminary evaluation of Ascle. Ascle is tailored for biomedical researchers and clinical staff, offering an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases. METHODS We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. In addition, for the question-answering task, we developed a retrieval-augmented generation (RAG) framework for large language models that incorporated a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. We also conducted a physician validation to assess the quality of generated content beyond automated metrics. RESULTS The fine-tuned models and RAG framework consistently enhanced text generation tasks. For example, the fine-tuned models improved the machine translation task by 20.27 in terms of BLEU score. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5), with lower scores for accuracy (3.90/5) and completeness (3.31/5). CONCLUSIONS This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository. All fine-tuned language models can be accessed through Hugging Face.
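The retrieval-augmented generation pattern mentioned for the question-answering function can be illustrated generically: retrieve the most similar knowledge snippet and prepend it to the question before calling a generator. This is not the Ascle API; the snippet store and the final generation step are placeholders.

```python
# Generic top-1 retrieval by TF-IDF similarity, then a prompt for a downstream generator.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge = [
    "Metformin is a first-line agent for type 2 diabetes.",
    "ACE inhibitors can cause a dry cough.",
    "Beta blockers reduce heart rate and myocardial oxygen demand.",
]
question = "Which drug class is associated with a dry cough?"

vec = TfidfVectorizer().fit(knowledge + [question])
sims = cosine_similarity(vec.transform([question]), vec.transform(knowledge))[0]
context = knowledge[int(sims.argmax())]     # top-1 retrieved snippet

prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
print(prompt)                               # pass to any LLM of choice
```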
Affiliation(s)
- Rui Yang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Qingcheng Zeng
- Department of Linguistics, Northwestern University, Evanston, IL, United States
- Keen You
- Department of Computer Science, Yale University, New Haven, CT, United States
- Yujie Qiao
- Yale School of Public Health, Yale University, New Haven, CT, United States
- Lucas Huang
- Department of Computer Science, Yale University, New Haven, CT, United States
- Chia-Chun Hsieh
- Department of Computer Science, Yale University, New Haven, CT, United States
- Benjamin Rosand
- Department of Computer Science, Yale University, New Haven, CT, United States
- Jeremy Goldwasser
- Department of Computer Science, Yale University, New Haven, CT, United States
- Amisha Dave
- Yale New Haven Hospital, Yale School of Medicine, Yale University, New Haven, CT, United States
- Tiarnan Keenan
- Division of Epidemiology and Clinical Applications, National Eye Institute, National Institutes of Health, Bethesda, MD, United States
- Yuhe Ke
- Department of Anesthesiology, Singapore General Hospital, Singapore, Singapore
- Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Institute of Data Science, National University of Singapore, Singapore, Singapore
- Emily Chew
- Division of Epidemiology and Clinical Applications, National Eye Institute, National Institutes of Health, Bethesda, MD, United States
- Dragomir Radev
- Department of Computer Science, Yale University, New Haven, CT, United States
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Hua Xu
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, United States
- Qingyu Chen
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, United States
- Irene Li
- Information Technology Center, University of Tokyo, Kashiwa, Japan
- Smartor LLC, Tokyo, Japan
16
Liu H, Seedat N, Ive J. Modeling disagreement in automatic data labeling for semi-supervised learning in Clinical Natural Language Processing. Front Artif Intell 2024; 7:1374162. PMID: 39415941; PMCID: PMC11480042; DOI: 10.3389/frai.2024.1374162.
Abstract
Introduction Computational models providing accurate estimates of their uncertainty are crucial for risk management associated with decision-making in healthcare contexts. This is especially true since many state-of-the-art systems are trained using data that have been labeled automatically (self-supervised mode) and tend to overfit. Methods In this study, we investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports. This problem remains understudied for Natural Language Processing in the healthcare domain. Results We demonstrate that Gaussian Processes (GPs) provide superior performance in quantifying the risks of three uncertainty labels based on the negative log predictive probability (NLPP) evaluation metric and mean maximum predicted confidence levels (MMPCL), whilst retaining strong predictive performance. Discussion Our conclusions highlight the utility of probabilistic models applied to "noisy" labels and suggest that similar methods could provide utility for Natural Language Processing (NLP)-based automated labeling tasks.
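A small sketch of the probabilistic evaluation idea above: a Gaussian process classifier over bag-of-words features, scored with the negative log predictive probability (NLPP). The toy sentences and labels are invented, and the setup is far simpler than the study's.

```python
# Gaussian process classifier on toy report sentences, evaluated with mean NLPP.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.gaussian_process import GaussianProcessClassifier

texts = ["no evidence of pneumonia",
         "right lower lobe opacity consistent with pneumonia",
         "clear lungs",
         "patchy airspace opacity suggesting infection"]
labels = np.array([0, 1, 0, 1])            # observation absent / present

X = CountVectorizer().fit_transform(texts).toarray()
gp = GaussianProcessClassifier(random_state=0).fit(X, labels)

proba = gp.predict_proba(X)                # predictive probabilities on the (toy) data
nlpp = -np.mean(np.log(proba[np.arange(len(labels)), labels]))
print("mean NLPP:", round(float(nlpp), 3))
```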
Affiliation(s)
- Hongshu Liu
- Department of Computing, Imperial College London, London, United Kingdom
- Nabeel Seedat
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom
- Julia Ive
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
17
Cai W. Uncovering Demographic Bias in Natural Language Processing Tools for Radiology. Radiology 2024; 313:e242723. PMID: 39436296; DOI: 10.1148/radiol.242723.
Affiliation(s)
- Wenli Cai
- From the Global Alliance for Intelligent Oncology, 62 Edgemere Rd, Quincy, MA 02169
| |
Collapse
|
18
|
Yang Y, Zhang H, Gichoya JW, Katabi D, Ghassemi M. The limits of fair medical imaging AI in real-world generalization. Nat Med 2024; 30:2838-2848. [PMID: 38942996 PMCID: PMC11485237 DOI: 10.1038/s41591-024-03113-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2023] [Accepted: 06/05/2024] [Indexed: 06/30/2024]
Abstract
As artificial intelligence (AI) rapidly approaches human-level performance in medical imaging, it is crucial that it does not exacerbate or propagate healthcare disparities. Previous research established AI's capacity to infer demographic data from chest X-rays, leading to a key concern: do models using demographic shortcuts have unfair predictions across subpopulations? In this study, we conducted a thorough investigation into the extent to which medical AI uses demographic encodings, focusing on potential fairness discrepancies within both in-distribution training sets and external test sets. Our analysis covers three key medical imaging disciplines-radiology, dermatology and ophthalmology-and incorporates data from six global chest X-ray datasets. We confirm that medical imaging AI leverages demographic shortcuts in disease classification. Although correcting shortcuts algorithmically effectively addresses fairness gaps to create 'locally optimal' models within the original data distribution, this optimality is not true in new test settings. Surprisingly, we found that models with less encoding of demographic attributes are often most 'globally optimal', exhibiting better fairness during model evaluation in new test environments. Our work establishes best practices for medical imaging models that maintain their performance and fairness in deployments beyond their initial training contexts, underscoring critical considerations for AI clinical deployments across populations and sites.
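To make the idea of subgroup performance gaps concrete, the following minimal sketch (with synthetic labels, scores, and demographic groups, not data from the study) computes per-group AUROC and the largest gap between groups:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auroc_gap(y_true, y_score, groups):
    """Per-subgroup AUROC and the largest pairwise difference across subgroups."""
    aucs = {g: roc_auc_score(y_true[groups == g], y_score[groups == g])
            for g in np.unique(groups)}
    return aucs, max(aucs.values()) - min(aucs.values())

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)          # synthetic disease labels
y_score = rng.random(200)                 # synthetic model scores
groups = rng.choice(["A", "B"], 200)      # synthetic demographic attribute
print(subgroup_auroc_gap(y_true, y_score, groups))
```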
Collapse
Affiliation(s)
- Yuzhe Yang
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Haoran Zhang
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Judy W Gichoya
- Department of Radiology, Emory University School of Medicine, Atlanta, GA, USA
| | - Dina Katabi
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
Collapse
|
19
|
Silva-Rodríguez J, Chakor H, Kobbi R, Dolz J, Ben Ayed I. A Foundation Language-Image Model of the Retina (FLAIR): encoding expert knowledge in text supervision. Med Image Anal 2024; 99:103357. [PMID: 39418828 DOI: 10.1016/j.media.2024.103357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 05/06/2024] [Accepted: 09/23/2024] [Indexed: 10/19/2024]
Abstract
Foundation vision-language models are currently transforming computer vision, and are on the rise in medical imaging fueled by their very promising generalization capabilities. However, the initial attempts to transfer this new paradigm to medical imaging have shown less impressive performances than those observed in other domains, due to the significant domain shift and the complex, expert domain knowledge inherent to medical-imaging tasks. Motivated by the need for domain-expert foundation models, we present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding. To this end, we compiled 38 open-access, mostly categorical fundus imaging datasets from various sources, with up to 101 different target conditions and 288,307 images. We integrate the expert's domain knowledge in the form of descriptive textual prompts, during both pre-training and zero-shot inference, enhancing the less-informative categorical supervision of the data. Such a textual expert's knowledge, which we compiled from the relevant clinical literature and community standards, describes the fine-grained features of the pathologies as well as the hierarchies and dependencies between them. We report comprehensive evaluations, which illustrate the benefit of integrating expert knowledge and the strong generalization capabilities of FLAIR under difficult scenarios with domain shifts or unseen categories. When adapted with a lightweight linear probe, FLAIR outperforms fully-trained, dataset-focused models, more so in the few-shot regimes. Interestingly, FLAIR outperforms by a wide margin larger-scale generalist image-language models and retina domain-specific self-supervised networks, which emphasizes the potential of embedding experts' domain knowledge and the limitations of generalist models in medical imaging. The pre-trained model is available at: https://github.com/jusiro/FLAIR.
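A highly simplified sketch of prompt-based zero-shot classification of the kind described above: an image embedding is matched to textual prompt embeddings by cosine similarity. The embeddings and class names below are illustrative stand-ins, not FLAIR's actual encoders or prompts.

```python
import numpy as np

def zero_shot_predict(image_emb, prompt_embs, class_names):
    """Pick the class whose text-prompt embedding is closest (cosine) to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    scores = txt @ img
    return class_names[int(np.argmax(scores))], scores

# hypothetical 4-dim embeddings standing in for encoder outputs
class_names = ["no diabetic retinopathy", "mild diabetic retinopathy", "glaucoma"]
prompt_embs = np.array([[1.0, 0.1, 0.0, 0.0],
                        [0.2, 1.0, 0.1, 0.0],
                        [0.0, 0.1, 1.0, 0.3]])
image_emb = np.array([0.1, 0.9, 0.2, 0.0])
print(zero_shot_predict(image_emb, prompt_embs, class_names))
```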
Collapse
Affiliation(s)
| | | | | | - Jose Dolz
- ÉTS Montréal, Québec, Canada; Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CR-CHUM), Québec, Canada
| | - Ismail Ben Ayed
- ÉTS Montréal, Québec, Canada; Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CR-CHUM), Québec, Canada
| |
Collapse
|
20
|
Holste G, Zhou Y, Wang S, Jaiswal A, Lin M, Zhuge S, Yang Y, Kim D, Nguyen-Mau TH, Tran MT, Jeong J, Park W, Ryu J, Hong F, Verma A, Yamagishi Y, Kim C, Seo H, Kang M, Celi LA, Lu Z, Summers RM, Shih G, Wang Z, Peng Y. Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge. Med Image Anal 2024; 97:103224. [PMID: 38850624 PMCID: PMC11365790 DOI: 10.1016/j.media.2024.103224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Revised: 04/01/2024] [Accepted: 05/27/2024] [Indexed: 06/10/2024]
Abstract
Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" - there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.
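One common way to handle the label imbalance described above is to reweight the positive term of a multi-label binary cross-entropy by inverse label frequency. The sketch below is a generic illustration with made-up label frequencies, not a solution from the CXR-LT challenge:

```python
import numpy as np

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Multi-label BCE with per-class positive weights to counter long-tailed label frequencies."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    loss = -(pos_weight * y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return loss.mean()

# toy setup: 3 findings, the third one is rare so it receives a larger weight
label_freq = np.array([0.40, 0.15, 0.01])
pos_weight = (1 - label_freq) / label_freq
y_true = np.array([[1, 0, 1], [0, 0, 0]])
y_prob = np.array([[0.7, 0.2, 0.3], [0.1, 0.1, 0.05]])
print(weighted_bce(y_true, y_prob, pos_weight))
```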
Collapse
Affiliation(s)
- Gregory Holste
- Department of Electrical and Computer Engineering, The University of Texas at Austin, 78712, Austin, TX, USA
| | - Yiliang Zhou
- Department of Population Health Sciences, Weill Cornell Medicine, 10065, New York, NY, USA
| | - Song Wang
- Department of Electrical and Computer Engineering, The University of Texas at Austin, 78712, Austin, TX, USA
| | - Ajay Jaiswal
- Department of Electrical and Computer Engineering, The University of Texas at Austin, 78712, Austin, TX, USA
| | - Mingquan Lin
- Department of Population Health Sciences, Weill Cornell Medicine, 10065, New York, NY, USA
| | - Sherry Zhuge
- School of Information Systems, Carnegie Mellon University, 15213, Pittsburgh, PA, USA
| | - Yuzhe Yang
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 02139, Cambridge, MA, USA
| | - Dongkyun Kim
- School of Computer Science, Carnegie Mellon University, 15213, Pittsburgh, PA, USA
| | | | | | - Jaehyup Jeong
- KT Research & Development Center, KT Corporation, 06763, Seoul, South Korea
| | - Wongi Park
- Department of Software and Computer Engineering, Ajou University, 16499, Suwon, South Korea
| | - Jongbin Ryu
- Department of Software and Computer Engineering, Ajou University, 16499, Suwon, South Korea
| | - Feng Hong
- Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, 200240, Shanghai, China
| | - Arsh Verma
- Wadhwani Institute for Artificial Intelligence, 400079, Mumbai, India
| | - Yosuke Yamagishi
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, 113-0033, Tokyo, Japan
| | - Changhyun Kim
- BioMedical AI Team, AIX Future R&D Center, SK Telecom, 04539, Seoul, South Korea
| | - Hyeryeong Seo
- Interdisciplinary Program in AI (IPAI), Seoul National University, 02504, Seoul, South Korea
| | - Myungjoo Kang
- Department of Mathematical Sciences, Seoul National University, 02504, Seoul, South Korea
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, 02139, Cambridge, MA, USA; Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, 02215, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, 02115, Boston, MA, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, 20894, Bethesda, MD, USA
| | - Ronald M Summers
- Clinical Center, National Institutes of Health, 20892, Bethesda, MD, USA
| | - George Shih
- Department of Radiology, Weill Cornell Medicine, 10065, New York, NY, USA
| | - Zhangyang Wang
- Department of Electrical and Computer Engineering, The University of Texas at Austin, 78712, Austin, TX, USA.
| | - Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, 10065, New York, NY, USA.
| |
Collapse
|
21
|
Dorfner FJ, Jürgensen L, Donle L, Al Mohamad F, Bodenmann TR, Cleveland MC, Busch F, Adams LC, Sato J, Schultz T, Kim AE, Merkow J, Bressem KK, Bridge CP, Atzen S. Comparing Commercial and Open-Source Large Language Models for Labeling Chest Radiograph Reports. Radiology 2024; 313:e241139. [PMID: 39470431 PMCID: PMC11535875 DOI: 10.1148/radiol.241139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2024] [Revised: 08/18/2024] [Accepted: 08/26/2024] [Indexed: 10/30/2024]
Abstract
Background Rapid advances in large language models (LLMs) have led to the development of numerous commercial and open-source models. While recent publications have explored OpenAI's GPT-4 to extract information of interest from radiology reports, there has not been a real-world comparison of GPT-4 to leading open-source models. Purpose To compare different leading open-source LLMs to GPT-4 on the task of extracting relevant findings from chest radiograph reports. Materials and Methods Two independent datasets of free-text radiology reports from chest radiograph examinations were used in this retrospective study performed between February 2, 2024, and February 14, 2024. The first dataset consisted of reports from the ImaGenome dataset, providing reference standard annotations from the MIMIC-CXR database acquired between 2011 and 2016. The second dataset consisted of randomly selected reports created at the Massachusetts General Hospital between July 2019 and July 2021. In both datasets, the commercial models GPT-3.5 Turbo and GPT-4 were compared with open-source models that included Mistral-7B and Mixtral-8 × 7B (Mistral AI), Llama 2-13B and Llama 2-70B (Meta), and Qwen1.5-72B (Alibaba Group), as well as CheXbert and CheXpert-labeler (Stanford ML Group), in their ability to accurately label the presence of multiple findings in radiograph text reports using zero-shot and few-shot prompting. The McNemar test was used to compare F1 scores between models. Results On the ImaGenome dataset (n = 450), the open-source model with the highest score, Llama 2-70B, achieved micro F1 scores of 0.97 and 0.97 for zero-shot and few-shot prompting, respectively, compared with the GPT-4 F1 scores of 0.98 and 0.98 (P > .99 and < .001 for superiority of GPT-4). On the institutional dataset (n = 500), the open-source model with the highest score, an ensemble model, achieved micro F1 scores of 0.96 and 0.97 for zero-shot and few-shot prompting, respectively, compared with the GPT-4 F1 scores of 0.98 and 0.97 (P < .001 and > .99 for superiority of GPT-4). Conclusion Although GPT-4 was superior to open-source models in zero-shot report labeling, few-shot prompting with a small number of example reports closely matched the performance of GPT-4. The benefit of few-shot prompting varied across datasets and models. © RSNA, 2024 Supplemental material is available for this article.
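As a small illustration of the paired comparison mentioned in the abstract, the McNemar test can be applied to the per-report correctness of two labeling models. The correctness vectors below are invented toy data, not results from the study:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# correctness of two labelers (e.g., a commercial vs. an open-source LLM) on the same reports
model_a_correct = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0], dtype=bool)
model_b_correct = np.array([1, 0, 0, 1, 1, 0, 1, 0, 1, 0], dtype=bool)

# 2x2 table of paired agreement/disagreement between the two models
table = [[np.sum(model_a_correct & model_b_correct),  np.sum(model_a_correct & ~model_b_correct)],
         [np.sum(~model_a_correct & model_b_correct), np.sum(~model_a_correct & ~model_b_correct)]]
print(mcnemar(table, exact=True))
```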
Collapse
Affiliation(s)
- Felix J. Dorfner
- From the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, 149 Thirteenth St, Charlestown, MA 02129 (F.J.D., T.R.B., M.C.C., A.E.K., C.P.B.); Department of Radiology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany (F.J.D., L.D., F.A.M., F.B., L.J.); Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Mass (L.J.); Department of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany (L.C.A.); Mass General Brigham Data Science Office, Boston, Mass (J.S., T.S., C.P.B.); Microsoft Health and Life Sciences (HLS), Redmond, Wash (J.M.); Klinikum rechts der Isar, Technical University of Munich, Munich, Germany (K.K.B.); Department of Radiology and Nuclear Medicine, German Heart Center Munich, Munich, Germany (K.K.B.); and Department of Cardiovascular Radiology and Nuclear Medicine, Technical University of Munich, School of Medicine and Health, German Heart Center, TUM University Hospital, Munich, Germany (K.K.B.) (shared affiliation block for all listed authors, identified by initials)
| | - Liv Jürgensen
| | - Leonhard Donle
| | - Fares Al Mohamad
| | - Tobias R. Bodenmann
| | - Mason C. Cleveland
| | - Felix Busch
| | - Lisa C. Adams
| | - James Sato
| | - Thomas Schultz
| | - Albert E. Kim
| | - Jameson Merkow
| | | | | | - Sarah Atzen
| |
Collapse
|
22
|
Hu X, Gu L, Kobayashi K, Liu L, Zhang M, Harada T, Summers RM, Zhu Y. Interpretable medical image Visual Question Answering via multi-modal relationship graph learning. Med Image Anal 2024; 97:103279. [PMID: 39079429 DOI: 10.1016/j.media.2024.103279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2023] [Revised: 05/14/2024] [Accepted: 07/15/2024] [Indexed: 08/30/2024]
Abstract
Medical Visual Question Answering (VQA) is an important task in medical multi-modal Large Language Models (LLMs), aiming to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the public health system, particularly in resource-poor countries. However, existing medical VQA datasets are small and only contain simple questions (equivalent to classification tasks), which lack semantic reasoning and clinical knowledge. Our previous work proposed a clinical knowledge-driven image difference VQA benchmark using a rule-based approach (Hu et al., 2023). However, given the same breadth of information coverage, the rule-based approach shows an 85% error rate on extracted labels. We trained an LLM method to extract labels with 62% increased accuracy. We also comprehensively evaluated our labels with 2 clinical experts on 100 samples to help us fine-tune the LLM. Based on the trained LLM model, we proposed a large-scale medical VQA dataset, Medical-CXR-VQA, using LLMs focused on chest X-ray images. The questions involved detailed information, such as abnormalities, locations, levels, and types. Based on this dataset, we proposed a novel VQA method by constructing three different relationship graphs: spatial relationships, semantic relationships, and implicit relationship graphs on the image regions, questions, and semantic labels. We leveraged graph attention to learn the logical reasoning paths for different questions. These learned graph VQA reasoning paths can be further used for LLM prompt engineering and chain-of-thought, which are crucial for further fine-tuning and training multi-modal large language models. Moreover, we demonstrate that our approach has the qualities of evidence and faithfulness, which are crucial in the clinical field. The code and the dataset are available at https://github.com/Holipori/Medical-CXR-VQA.
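A very loose sketch of one ingredient described above, building a spatial-relationship graph over image regions: regions whose bounding boxes overlap are connected in an adjacency matrix. The boxes and threshold are hypothetical, and this is not the authors' full graph-construction procedure:

```python
import numpy as np

def spatial_adjacency(boxes, iou_threshold=0.1):
    """Connect image regions whose bounding boxes overlap (a simple spatial-relationship graph)."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    n = len(boxes)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and iou(boxes[i], boxes[j]) > iou_threshold:
                adj[i, j] = 1
    return adj

# hypothetical region boxes (x1, y1, x2, y2) for "left lung", "right lung", "heart"
boxes = [(10, 10, 60, 120), (80, 10, 130, 120), (50, 60, 95, 130)]
print(spatial_adjacency(boxes))
```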
Collapse
Affiliation(s)
- Xinyue Hu
- The University of Texas Arlington, Arlington, 76010, TX, USA
| | - Lin Gu
- RIKEN, Tokyo, Japan; University of Tokyo, Tokyo, Japan
| | | | - Liangchen Liu
- National Institutes of Health Clinical Center, Bethesda, 20892, MD, USA
| | - Mengliang Zhang
- The University of Texas Arlington, Arlington, 76010, TX, USA
| | | | - Ronald M Summers
- National Institutes of Health Clinical Center, Bethesda, 20892, MD, USA
| | - Yingying Zhu
- The University of Texas Arlington, Arlington, 76010, TX, USA.
| |
Collapse
|
23
|
Reale-Nosei G, Amador-Domínguez E, Serrano E. From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation. Med Image Anal 2024; 97:103264. [PMID: 39013207 DOI: 10.1016/j.media.2024.103264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 04/25/2024] [Accepted: 07/01/2024] [Indexed: 07/18/2024]
Abstract
Natural Image Captioning (NIC) is an interdisciplinary research area that lies at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several works have been presented on the subject, ranging from the early template-based approaches to the more recent deep learning-based methods. This paper conducts a survey in the area of NIC, especially focusing on its applications for Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art is conducted, summarizing key research works in NIC and DC to provide a broad overview of the subject. These works include existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed work is thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications in real clinical practice. Finally, potential future research lines are outlined on the basis of the detected limitations.
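Captioning systems of the kind surveyed above are often scored with n-gram overlap metrics such as BLEU. A minimal example using NLTK; the reference and candidate sentences are invented and not from any dataset discussed in the review:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# hypothetical reference report sentence and generated caption
reference = ["no acute cardiopulmonary abnormality".split()]
candidate = "no acute abnormality seen".split()

smooth = SmoothingFunction().method1  # smoothing avoids zero scores on short sentences
print(sentence_bleu(reference, candidate, smoothing_function=smooth))
```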
Collapse
Affiliation(s)
- Gabriel Reale-Nosei
- ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
| | - Elvira Amador-Domínguez
- Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain.
| | - Emilio Serrano
- Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
| |
Collapse
|
24
|
Santomartino SM, Zech JR, Hall K, Jeudy J, Parekh V, Yi PH, Weintraub E. Evaluating the Performance and Bias of Natural Language Processing Tools in Labeling Chest Radiograph Reports. Radiology 2024; 313:e232746. [PMID: 39436298 PMCID: PMC11535863 DOI: 10.1148/radiol.232746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 08/12/2024] [Accepted: 08/20/2024] [Indexed: 10/23/2024]
Abstract
Background Natural language processing (NLP) is commonly used to annotate radiology datasets for training deep learning (DL) models. However, the accuracy and potential biases of these NLP methods have not been thoroughly investigated, particularly across different demographic groups. Purpose To evaluate the accuracy and demographic bias of four NLP radiology report labeling tools on two chest radiograph datasets. Materials and Methods This retrospective study, performed between April 2022 and April 2024, evaluated chest radiograph report labeling using four NLP tools (CheXpert [rule-based], RadReportAnnotator [RRA; DL-based], OpenAI's GPT-4 [DL-based], cTAKES [hybrid]) on a subset of the Medical Information Mart for Intensive Care (MIMIC) chest radiograph dataset balanced for representation of age, sex, and race and ethnicity (n = 692) and the entire Indiana University (IU) chest radiograph dataset (n = 3665). Three board-certified radiologists annotated the chest radiograph reports for 14 thoracic disease labels. NLP tool performance was evaluated using several metrics, including accuracy and error rate. Bias was evaluated by comparing performance between demographic subgroups using the Pearson χ2 test. Results The IU dataset included 3665 patients (mean age, 49.7 years ± 17 [SD]; 1963 female), while the MIMIC dataset included 692 patients (mean age, 54.1 years ± 23.1; 357 female). All four NLP tools demonstrated high accuracy across findings in the IU and MIMIC datasets, as follows: CheXpert (92.6% [47 516 of 51 310], 90.2% [8742 of 9688]), RRA (82.9% [19 746 of 23 829], 92.2% [2870 of 3114]), GPT-4 (94.3% [45 586 of 48 342], 91.6% [6721 of 7336]), and cTAKES (84.7% [43 436 of 51 310], 88.7% [8597 of 9688]). RRA and cTAKES had higher accuracy (P < .001) on the MIMIC dataset, while CheXpert and GPT-4 had higher accuracy on the IU dataset. Differences (P < .001) in error rates were observed across age groups for all NLP tools except RRA on the MIMIC dataset, with the highest error rates for CheXpert, RRA, and cTAKES in patients older than 80 years (mean, 15.8% ± 5.0) and the highest error rate for GPT-4 in patients 60-80 years of age (8.3%). Conclusion Although commonly used NLP tools for chest radiograph report annotation are accurate when evaluating reports in aggregate, demographic subanalyses showed significant bias, with poorer performance in older patients. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Cai in this issue.
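To illustrate the subgroup comparison described above, a Pearson chi-square test can be run on a contingency table of labeling errors versus correct labels across age groups. The counts below are hypothetical, not data from the study:

```python
import numpy as np
from scipy.stats import chi2_contingency

# hypothetical (errors, correct) counts per age group for one NLP labeling tool
#                   <40    40-60   60-80   >80
counts = np.array([[ 30,    42,     55,    79],    # labeling errors
                   [970,   958,    945,   921]])   # correct labels
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```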
Collapse
Affiliation(s)
- Samantha M. Santomartino
- From Drexel University College of Medicine, Philadelphia, Pa (S.M.S.); Department of Radiology, Columbia University Irving Medical Center, New York, NY (J.R.Z.); Department of Radiology, Wake Forest University Health Sciences Center, Winston-Salem, NC (K.H.); Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, Md (J.J., V.P.); and Department of Diagnostic Imaging, St. Jude Children’s Research Hospital, 262 Danny Thomas Plc, Memphis, TN 38105-3678 (P.H.Y.) (shared affiliation block for all listed authors, identified by initials)
| | - John R. Zech
| | - Kent Hall
| | - Jean Jeudy
| | - Vishwa Parekh
| | - Paul H. Yi
| | - Elizabeth Weintraub
| |
Collapse
|
25
|
Shurrab S, Guerra-Manzanares A, E Shamout F. Multimodal masked siamese network improves chest X-ray representation learning. Sci Rep 2024; 14:22516. [PMID: 39341871 PMCID: PMC11439023 DOI: 10.1038/s41598-024-74043-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2024] [Accepted: 09/23/2024] [Indexed: 10/01/2024] Open
Abstract
Self-supervised learning methods for medical images primarily rely on the imaging modality during pretraining. Although such approaches deliver promising results, they do not take advantage of the associated patient or scan information collected within Electronic Health Records (EHR). This study aims to develop a multimodal pretraining approach for chest radiographs that incorporates EHR data as an additional modality during training. We propose to incorporate EHR data during self-supervised pretraining with a Masked Siamese Network (MSN) to enhance the quality of chest radiograph representations. We investigate three types of EHR data, including demographic, scan metadata, and inpatient stay information. We evaluate the multimodal MSN on three publicly available chest X-ray datasets, MIMIC-CXR, CheXpert, and NIH-14, using two vision transformer (ViT) backbones, specifically ViT-Tiny and ViT-Small. In assessing the quality of the representations through linear evaluation, our proposed method demonstrates significant improvement compared to vanilla MSN and state-of-the-art self-supervised learning baselines. In particular, our proposed method achieves an improvement of 2% in the Area Under the Receiver Operating Characteristic Curve (AUROC) compared to vanilla MSN and 5% to 8% compared to other baselines, including uni-modal ones. Furthermore, our findings reveal that demographic features provide the most significant performance improvement. Our work highlights the potential of EHR-enhanced self-supervised pretraining for medical imaging and opens opportunities for future research to address limitations in existing representation learning methods for other medical imaging modalities, such as neuro-, ophthalmic, and sonar imaging.
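The linear evaluation protocol mentioned above typically trains a linear classifier on frozen pretrained embeddings and reports AUROC. A minimal sketch with random stand-in embeddings, not the authors' features or datasets:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# stand-ins for frozen pretrained embeddings of chest radiographs
train_emb, test_emb = rng.normal(size=(500, 64)), rng.normal(size=(200, 64))
train_y, test_y = rng.integers(0, 2, 500), rng.integers(0, 2, 200)

# linear probe on frozen features, evaluated with AUROC
clf = LogisticRegression(max_iter=1000).fit(train_emb, train_y)
print("linear-probe AUROC:", roc_auc_score(test_y, clf.predict_proba(test_emb)[:, 1]))
```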
Collapse
Affiliation(s)
- Saeed Shurrab
- New York University Abu Dhabi, Computer Engineering, Abu Dhabi, 129188, UAE
| | | | - Farah E Shamout
- New York University Abu Dhabi, Computer Engineering, Abu Dhabi, 129188, UAE.
| |
Collapse
|
26
|
Wu X, Xu Z, Tong RKY. Continual learning in medical image analysis: A survey. Comput Biol Med 2024; 182:109206. [PMID: 39332115 DOI: 10.1016/j.compbiomed.2024.109206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2024] [Revised: 06/24/2024] [Accepted: 09/22/2024] [Indexed: 09/29/2024]
Abstract
In the dynamic realm of practical clinical scenarios, Continual Learning (CL) has gained increasing interest in medical image analysis due to its potential to address major challenges associated with data privacy, model adaptability, memory inefficiency, prediction robustness and detection accuracy. In general, the primary challenge in adapting and advancing CL remains catastrophic forgetting. Beyond this challenge, recent years have witnessed a growing body of work that expands our comprehension and application of continual learning in the medical domain, highlighting its practical significance and intricacy. In this paper, we present an in-depth and up-to-date review of the application of CL in medical image analysis. Our discussion delves into the strategies employed to address specific tasks within the medical domain, categorizing existing CL methods into three settings: Task-Incremental Learning, Class-Incremental Learning, and Domain-Incremental Learning. These settings are further subdivided based on representative learning strategies, allowing us to assess their strengths and weaknesses in the context of various medical scenarios. By establishing a correlation between each medical challenge and the corresponding insights provided by CL, we provide a comprehensive understanding of the potential impact of these techniques. To enhance the utility of our review, we provide an overview of the commonly used benchmark medical datasets and evaluation metrics in the field. Through a comprehensive comparison, we discuss promising future directions for the application of CL in medical image analysis. A comprehensive list of studies is being continuously updated at https://github.com/xw1519/Continual-Learning-Medical-Adaptation.
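One widely used family of continual-learning strategies covered by such surveys is rehearsal, in which a small buffer of past examples is replayed while training on new tasks to reduce catastrophic forgetting. A minimal reservoir-sampling buffer, purely illustrative:

```python
import random

class ReplayBuffer:
    """Tiny rehearsal buffer: keep a uniform random subset of past examples to mix into new tasks."""
    def __init__(self, capacity=100):
        self.capacity, self.items, self.seen = capacity, [], 0

    def add(self, example):
        # reservoir sampling keeps each seen example with equal probability
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

buffer = ReplayBuffer(capacity=5)
for i in range(20):                 # examples from "task 1"
    buffer.add(("task1", i))
print(buffer.sample(3))             # replayed alongside "task 2" batches
```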
Collapse
Affiliation(s)
- Xinyao Wu
- Department of Biomedical Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong, China.
| | - Zhe Xu
- Department of Biomedical Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong, China; Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
| | - Raymond Kai-Yu Tong
- Department of Biomedical Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong, China.
| |
Collapse
|
27
|
Shah STH, Shah SAH, Khan II, Imran A, Shah SBH, Mehmood A, Qureshi SA, Raza M, Di Terlizzi A, Cavaglià M, Deriu MA. Data-driven classification and explainable-AI in the field of lung imaging. Front Big Data 2024; 7:1393758. [PMID: 39364222 PMCID: PMC11446784 DOI: 10.3389/fdata.2024.1393758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Accepted: 09/03/2024] [Indexed: 10/05/2024] Open
Abstract
Detecting lung diseases in medical images can be quite challenging for radiologists. In some cases, even experienced experts may struggle with accurately diagnosing chest diseases, leading to potential inaccuracies due to complex or unseen biomarkers. This review paper delves into various datasets and machine learning techniques employed in recent research for lung disease classification, focusing on pneumonia analysis using chest X-ray images. We explore conventional machine learning methods, pretrained deep learning models, customized convolutional neural networks (CNNs), and ensemble methods. A comprehensive comparison of different classification approaches is presented, encompassing data acquisition, preprocessing, feature extraction, and classification using machine vision, machine and deep learning, and explainable-AI (XAI). Our analysis highlights the superior performance of transfer learning-based methods using CNNs and ensemble models/features for lung disease classification. In addition, our comprehensive review also offers insights for researchers in other medical domains who utilize radiological images. By providing a thorough overview of various techniques, our work enables the establishment of effective strategies and identification of suitable methods for a wide range of challenges. Currently, beyond traditional evaluation metrics, researchers emphasize the importance of XAI techniques in machine and deep learning models and their applications in classification tasks. This incorporation helps in gaining a deeper understanding of their decision-making processes, leading to improved trust, transparency, and overall clinical decision-making. Our comprehensive review serves as a valuable resource for researchers and practitioners seeking to advance the field of lung disease detection using machine learning and XAI, as well as for those working in other diverse domains.
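A minimal sketch of the transfer-learning setup highlighted above: load an ImageNet-pretrained backbone, freeze it, and train only a new classification head. This assumes PyTorch/torchvision and is a generic example, not a pipeline from the review:

```python
import torch
import torch.nn as nn
from torchvision import models

# load an ImageNet-pretrained backbone and replace its head for binary pneumonia classification
model = models.resnet18(weights="DEFAULT")
for p in model.parameters():          # freeze the pretrained feature extractor
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)   # new trainable classification head

x = torch.randn(4, 3, 224, 224)       # a batch of (pre-processed) chest X-ray images
logits = model(x)
print(logits.shape)                   # torch.Size([4, 2])
```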
Collapse
Affiliation(s)
- Syed Taimoor Hussain Shah
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| | - Syed Adil Hussain Shah
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
- Department of Research and Development (R&D), GPI SpA, Trento, Italy
| | - Iqra Iqbal Khan
- Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan
| | - Atif Imran
- College of Electrical and Mechanical Engineering, National University of Sciences and Technology, Rawalpindi, Pakistan
| | - Syed Baqir Hussain Shah
- Department of Computer Science, Commission on Science and Technology for Sustainable Development in the South (COMSATS) University Islamabad (CUI), Wah Campus, Wah, Pakistan
| | - Atif Mehmood
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, China
- Zhejiang Institute of Photoelectronics & Zhejiang Institute for Advanced Light Source, Zhejiang Normal University, Jinhua, Zhejiang, China
| | - Shahzad Ahmad Qureshi
- Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad, Pakistan
| | - Mudassar Raza
- Department of Computer Science, Namal University Mianwali, Mianwali, Pakistan
- Department of Computer Science, Heavy Industries Taxila Education City (HITEC), University of Taxila, Taxila, Pakistan
| | | | - Marco Cavaglià
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| | - Marco Agostino Deriu
- PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
| |
Collapse
|
28
|
Han T, Žigutytė L, Huck L, Huppertz MS, Siepmann R, Gandelsman Y, Blüthgen C, Khader F, Kuhl C, Nebelung S, Kather JN, Truhn D. Reconstruction of patient-specific confounders in AI-based radiologic image interpretation using generative pretraining. Cell Rep Med 2024; 5:101713. [PMID: 39241771 PMCID: PMC11528237 DOI: 10.1016/j.xcrm.2024.101713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2024] [Revised: 06/29/2024] [Accepted: 08/13/2024] [Indexed: 09/09/2024]
Abstract
Reliably detecting potentially misleading patterns in automated diagnostic assistance systems, such as those powered by artificial intelligence (AI), is crucial for instilling user trust and ensuring reliability. Current techniques fall short in visualizing such confounding factors. We propose DiffChest, a self-conditioned diffusion model trained on 515,704 chest radiographs from 194,956 patients across the US and Europe. DiffChest provides patient-specific explanations and visualizes confounding factors that might mislead the model. The high inter-reader agreement, with Fleiss' kappa values of 0.8 or higher, validates its capability to identify treatment-related confounders. Confounders are accurately detected with 10%-100% prevalence rates. The pretraining process optimizes the model for relevant imaging information, resulting in excellent diagnostic accuracy for 11 chest conditions, including pleural effusion and heart insufficiency. Our findings highlight the potential of diffusion models in medical image classification, providing insights into confounding factors and enhancing model robustness and reliability.
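For reference, Fleiss' kappa (the inter-reader agreement statistic cited above) can be computed from a raters-by-items table, for example with statsmodels. The ratings below are hypothetical, not the study's reader annotations:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# hypothetical ratings: 8 images x 3 readers, each rating "confounder present" (1) or "absent" (0)
ratings = np.array([[1, 1, 1],
                    [0, 0, 0],
                    [1, 1, 0],
                    [1, 1, 1],
                    [0, 0, 0],
                    [0, 1, 0],
                    [1, 1, 1],
                    [0, 0, 0]])
table, _ = aggregate_raters(ratings)   # items x categories count table
print(fleiss_kappa(table))
```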
Collapse
Affiliation(s)
- Tianyu Han
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, 52074 Aachen, Germany.
| | - Laura Žigutytė
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, 01307 Dresden, Germany
| | - Luisa Huck
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, 52074 Aachen, Germany
| | - Marc Sebastian Huppertz
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, 52074 Aachen, Germany
| | - Robert Siepmann
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, 52074 Aachen, Germany
| | - Yossi Gandelsman
- Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA, USA
| | - Christian Blüthgen
- Institute for Diagnostic and Interventional Radiology, University Hospital Zurich, 8006 Zurich, Switzerland; Center for Artificial Intelligence in Medicine and Imaging (AIMI), Stanford University, Stanford, CA, USA
| | - Firas Khader
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, 52074 Aachen, Germany
| | - Christiane Kuhl
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, 52074 Aachen, Germany
| | - Sven Nebelung
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, 52074 Aachen, Germany
| | - Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, 01307 Dresden, Germany; Department of Medicine I, University Hospital Dresden, 01307 Dresden, Germany; Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, 69120 Heidelberg, Germany
| | - Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, 52074 Aachen, Germany
| |
Collapse
|
29
|
Wang S, Zhao Z, Ouyang X, Liu T, Wang Q, Shen D. Interactive computer-aided diagnosis on medical image using large language models. COMMUNICATIONS ENGINEERING 2024; 3:133. [PMID: 39284899 PMCID: PMC11405679 DOI: 10.1038/s44172-024-00271-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Accepted: 08/20/2024] [Indexed: 09/22/2024]
Abstract
Computer-aided diagnosis (CAD) has advanced medical image analysis, while large language models (LLMs) have shown potential in clinical applications. However, LLMs struggle to interpret medical images, which are critical for decision-making. Here we show a strategy integrating LLMs with CAD networks. The framework uses LLMs' medical knowledge and reasoning to enhance CAD network outputs, such as diagnosis, lesion segmentation, and report generation, by summarizing information in natural language. The generated reports are of higher quality and can improve the performance of vision-based CAD models. In chest X-rays, an LLM using ChatGPT improved diagnosis performance by 16.42 percentage points compared to state-of-the-art models, while GPT-3 provided a 15.00 percentage point F1-score improvement. Our strategy allows accurate report generation and creates a patient-friendly interactive system, unlike conventional CAD systems only understood by professionals. This approach has the potential to revolutionize clinical decision-making and patient communication.
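A rough sketch of the general idea of summarizing CAD network outputs in natural language so they can be passed to an LLM. The finding names, probabilities, and prompt wording below are hypothetical and not the authors' actual interface:

```python
def cad_outputs_to_prompt(findings_probs, threshold=0.5):
    """Summarize CAD network probabilities as text that an LLM can turn into a report."""
    positive = [f for f, p in findings_probs.items() if p >= threshold]
    lines = ["The computer-aided diagnosis network produced the following probabilities:"]
    lines += [f"- {finding}: {prob:.2f}" for finding, prob in findings_probs.items()]
    lines.append("Findings above the decision threshold: " + (", ".join(positive) or "none") + ".")
    lines.append("Please draft a concise chest X-ray report consistent with these results.")
    return "\n".join(lines)

print(cad_outputs_to_prompt({"cardiomegaly": 0.81, "pleural effusion": 0.12, "pneumonia": 0.64}))
```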
Collapse
Affiliation(s)
- Sheng Wang
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
| | - Zihao Zhao
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
| | - Xi Ouyang
- Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
| | - Tianming Liu
- School of Computing, University of Georgia, Athens, GA, USA
| | - Qian Wang
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China.
- Shanghai Clinical Research and Trial Center, Shanghai, China.
| | - Dinggang Shen
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China.
- Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China.
- Shanghai Clinical Research and Trial Center, Shanghai, China.
| |
Collapse
|
30
|
Huang Q, Li G. Knowledge graph based reasoning in medical image analysis: A scoping review. Comput Biol Med 2024; 182:109100. [PMID: 39244959 DOI: 10.1016/j.compbiomed.2024.109100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2024] [Revised: 08/04/2024] [Accepted: 08/31/2024] [Indexed: 09/10/2024]
Abstract
Automated computer-aided diagnosis (CAD) is becoming more significant in the field of medicine due to advancements in computer hardware performance and the progress of artificial intelligence. The knowledge graph is a structure for visually representing knowledge facts. In the last decade, a large body of work based on knowledge graphs has effectively improved the organization and interpretability of large-scale complex knowledge. Introducing knowledge graph inference into CAD is a research direction with significant potential. In this review, we first briefly describe the basic principles and application methods of knowledge graphs. We then systematically organize and analyze research on, and applications of, knowledge graphs in medical imaging-assisted diagnosis. We also summarize the shortcomings of the current research, such as medical data barriers and deficiencies, low utilization of multimodal information, and weak interpretability. Finally, we propose promising future research directions to address the shortcomings of current approaches.
Affiliation(s)
- Qinghua Huang
- School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, 127 West Youyi Road, Beilin District, Xi'an, 710072, Shaanxi, China.
| | - Guanghui Li
- School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, 127 West Youyi Road, Beilin District, Xi'an, 710072, Shaanxi, China; School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, 710129, Shaanxi, China.
31
Huang W, Li C, Zhou HY, Yang H, Liu J, Liang Y, Zheng H, Zhang S, Wang S. Enhancing representation in radiography-reports foundation model: a granular alignment algorithm using masked contrastive learning. Nat Commun 2024; 15:7620. [PMID: 39223122 PMCID: PMC11369198 DOI: 10.1038/s41467-024-51749-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2023] [Accepted: 08/15/2024] [Indexed: 09/04/2024] Open
Abstract
Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face crucial challenges, such as the requirement for fine-grained knowledge understanding in computer-aided diagnosis and the capability of utilizing very limited or even no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a masked contrastive chest X-ray foundation model that tackles these challenges. MaCo explores masked contrastive learning to simultaneously achieve fine-grained image understanding and zero-shot learning for a variety of medical imaging tasks. It designs a correlation weighting mechanism to adjust the correlation between masked chest X-ray image patches and their corresponding reports, thereby enhancing the model's representation learning capabilities. To evaluate the performance of MaCo, we conducted extensive experiments using 6 well-known open-source X-ray datasets. The experimental results demonstrate the superiority of MaCo over 10 state-of-the-art approaches across tasks such as classification, segmentation, detection, and phrase grounding. These findings highlight the significant potential of MaCo in advancing a wide range of medical image analysis tasks.
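The masked contrastive idea with correlation weighting can be sketched roughly as follows. The random patch masking, weighted pooling, and symmetric InfoNCE-style loss are a simplified stand-in for MaCo's actual mechanism, and all tensor dimensions are assumed.

```python
import torch
import torch.nn.functional as F

def masked_contrastive_loss(img_patches, report_emb, patch_weights, mask_ratio=0.75, tau=0.07):
    """
    img_patches:   (B, N, D) patch embeddings from an image encoder
    report_emb:    (B, D)    report embeddings from a text encoder
    patch_weights: (B, N)    per-patch correlation weights (simplified stand-in)
    """
    B, N, D = img_patches.shape
    keep = max(1, int(N * (1 - mask_ratio)))
    # randomly keep a subset of patches (the "visible" patches after masking)
    idx = torch.rand(B, N, device=img_patches.device).argsort(dim=1)[:, :keep]
    vis = torch.gather(img_patches, 1, idx.unsqueeze(-1).expand(-1, -1, D))
    w = torch.softmax(torch.gather(patch_weights, 1, idx), dim=1).unsqueeze(-1)
    img_emb = F.normalize((w * vis).sum(dim=1), dim=-1)   # weighted pooling of visible patches
    txt_emb = F.normalize(report_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau                   # (B, B) image-report similarities
    targets = torch.arange(B, device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    B, N, D = 4, 196, 256
    loss = masked_contrastive_loss(torch.randn(B, N, D), torch.randn(B, D), torch.randn(B, N))
    print(float(loss))
```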
Affiliation(s)
- Weijian Huang
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Pengcheng Laboratory, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Cheng Li
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Hong-Yu Zhou
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Hao Yang
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Pengcheng Laboratory, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jiarun Liu
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Pengcheng Laboratory, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
| | | | - Hairong Zheng
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Shaoting Zhang
- Qingyuan Research Institute, Shanghai Jiao Tong University, Shanghai, China
| | - Shanshan Wang
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
32
Zhang Z, Jiang A. Interactive dual-stream contrastive learning for radiology report generation. J Biomed Inform 2024; 157:104718. [PMID: 39209086 DOI: 10.1016/j.jbi.2024.104718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2024] [Revised: 08/08/2024] [Accepted: 08/25/2024] [Indexed: 09/04/2024]
Abstract
Radiology report generation automates diagnostic narrative synthesis from medical imaging data. Current report generation methods primarily employ knowledge graphs for image enhancement, neglecting the interpretability and guiding function of the knowledge graphs themselves. Additionally, few approaches leverage the stable modal alignment information from multimodal pre-trained models to facilitate the generation of radiology reports. We propose Terms-Guided Radiology Report Generation (TGR), a simple and practical model for generating reports guided primarily by anatomical terms. Specifically, we utilize a dual-stream visual feature extraction module, comprising a detail extraction module and a frozen multimodal pre-trained model, to separately extract visual detail features and semantic features. Furthermore, a Visual Enhancement Module (VEM) is proposed to further enrich the visual features, thereby facilitating the generation of a list of anatomical terms. We integrate anatomical terms with image features and apply contrastive learning against frozen text embeddings, using the stable feature space of these embeddings to further boost modal alignment. Our model incorporates the capability for manual input, enabling it to generate a list of organs for specifically focused abnormal areas or to produce more accurate single-sentence descriptions based on selected anatomical terms. Comprehensive experiments demonstrate the effectiveness of our method in report generation tasks: our TGR-S model reduces training parameters by 38.9% while performing comparably to current state-of-the-art models, and our TGR-B model exceeds the best baseline models across multiple metrics.
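A rough sketch of aligning fused visual/term features to a frozen text-embedding space with a contrastive loss is shown below. The projection layer, dimensions, and loss form are assumptions for illustration, not the TGR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TermGuidedAligner(nn.Module):
    """Project fused visual/term features into a frozen text-embedding space and
    align them with a symmetric contrastive loss (simplified sketch)."""
    def __init__(self, vis_dim=512, term_dim=128, txt_dim=768, tau=0.07):
        super().__init__()
        self.proj = nn.Linear(vis_dim + term_dim, txt_dim)
        self.tau = tau

    def forward(self, vis_feat, term_feat, frozen_txt_emb):
        fused = self.proj(torch.cat([vis_feat, term_feat], dim=-1))
        z_img = F.normalize(fused, dim=-1)
        z_txt = F.normalize(frozen_txt_emb.detach(), dim=-1)   # text embeddings stay frozen
        logits = z_img @ z_txt.t() / self.tau
        targets = torch.arange(z_img.size(0), device=z_img.device)
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    model = TermGuidedAligner()
    loss = model(torch.randn(8, 512), torch.randn(8, 128), torch.randn(8, 768))
    print(float(loss))
```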
Affiliation(s)
- Ziqi Zhang
- College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030600, China
| | - Ailian Jiang
- College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030600, China.
33
Lin M, Li T, Sun Z, Holste G, Ding Y, Wang F, Shih G, Peng Y. Improving Fairness of Automated Chest Radiograph Diagnosis by Contrastive Learning. Radiol Artif Intell 2024; 6:e230342. [PMID: 39166973 PMCID: PMC11449211 DOI: 10.1148/ryai.230342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 07/21/2024] [Accepted: 08/08/2024] [Indexed: 08/23/2024]
Abstract
Purpose To develop an artificial intelligence model that uses supervised contrastive learning (SCL) to minimize bias in chest radiograph diagnosis. Materials and Methods In this retrospective study, the proposed method was evaluated on two datasets: the Medical Imaging and Data Resource Center (MIDRC) dataset with 77 887 chest radiographs in 27 796 patients collected as of April 20, 2023, for COVID-19 diagnosis and the National Institutes of Health ChestX-ray14 dataset with 112 120 chest radiographs in 30 805 patients collected between 1992 and 2015. In the ChestX-ray14 dataset, thoracic abnormalities included atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation, edema, emphysema, fibrosis, pleural thickening, and hernia. The proposed method used SCL with carefully selected positive and negative samples to generate fair image embeddings, which were fine-tuned for subsequent tasks to reduce bias in chest radiograph diagnosis. The method was evaluated using the marginal area under the receiver operating characteristic curve difference (∆mAUC). Results The proposed model showed a significant decrease in bias across all subgroups compared with the baseline models, as evidenced by a paired t test (P < .001). The ∆mAUCs obtained by the proposed method were 0.01 (95% CI: 0.01, 0.01), 0.21 (95% CI: 0.21, 0.21), and 0.10 (95% CI: 0.10, 0.10) for sex, race, and age subgroups, respectively, on the MIDRC dataset and 0.01 (95% CI: 0.01, 0.01) and 0.05 (95% CI: 0.05, 0.05) for sex and age subgroups, respectively, on the ChestX-ray14 dataset. Conclusion Employing SCL can mitigate bias in chest radiograph diagnosis, addressing concerns of fairness and reliability in deep learning-based diagnostic methods. Keywords: Thorax, Diagnosis, Supervised Learning, Convolutional Neural Network (CNN), Computer-aided Diagnosis (CAD) Supplemental material is available for this article. © RSNA, 2024 See also the commentary by Johnson in this issue.
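One plausible reading of the marginal AUC difference across subgroups can be computed directly with scikit-learn; the article's exact definition may differ, so the function below should be treated as an illustrative approximation with invented subgroup labels.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def marginal_auc_difference(y_true, y_score, group):
    """Maximum absolute gap between each subgroup's AUC and the AUC of all
    remaining samples (an illustrative reading of a marginal AUC difference)."""
    y_true, y_score, group = map(np.asarray, (y_true, y_score, group))
    gaps = []
    for g in np.unique(group):
        in_g = group == g
        auc_g = roc_auc_score(y_true[in_g], y_score[in_g])
        auc_rest = roc_auc_score(y_true[~in_g], y_score[~in_g])
        gaps.append(abs(auc_g - auc_rest))
    return max(gaps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 1000)                                  # synthetic labels
    s = np.clip(y * 0.3 + rng.normal(0.5, 0.2, 1000), 0, 1)       # synthetic scores
    g = rng.integers(0, 2, 1000)                                  # e.g., sex subgroup
    print(round(marginal_auc_difference(y, s, g), 4))
```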
Affiliation(s)
- Mingquan Lin
- From the Departments of Population Health Sciences (M.L., Z.S., F.W., Y.P.) and Radiology (G.S.), Weill Cornell Medicine, 425 E 61st St, New York, NY 10065; Department of Surgery, University of Minnesota, Minneapolis, Minn (M.L.); and School of Information (T.L., Y.D.) and Department of Electrical and Computer Engineering (G.H.), The University of Texas at Austin, Austin, Tex
| | - Tianhao Li
- From the Departments of Population Health Sciences (M.L., Z.S., F.W., Y.P.) and Radiology (G.S.), Weill Cornell Medicine, 425 E 61st St, New York, NY 10065; Department of Surgery, University of Minnesota, Minneapolis, Minn (M.L.); and School of Information (T.L., Y.D.) and Department of Electrical and Computer Engineering (G.H.), The University of Texas at Austin, Austin, Tex
| | - Zhaoyi Sun
- From the Departments of Population Health Sciences (M.L., Z.S., F.W., Y.P.) and Radiology (G.S.), Weill Cornell Medicine, 425 E 61st St, New York, NY 10065; Department of Surgery, University of Minnesota, Minneapolis, Minn (M.L.); and School of Information (T.L., Y.D.) and Department of Electrical and Computer Engineering (G.H.), The University of Texas at Austin, Austin, Tex
| | - Gregory Holste
- From the Departments of Population Health Sciences (M.L., Z.S., F.W., Y.P.) and Radiology (G.S.), Weill Cornell Medicine, 425 E 61st St, New York, NY 10065; Department of Surgery, University of Minnesota, Minneapolis, Minn (M.L.); and School of Information (T.L., Y.D.) and Department of Electrical and Computer Engineering (G.H.), The University of Texas at Austin, Austin, Tex
| | - Ying Ding
- From the Departments of Population Health Sciences (M.L., Z.S., F.W., Y.P.) and Radiology (G.S.), Weill Cornell Medicine, 425 E 61st St, New York, NY 10065; Department of Surgery, University of Minnesota, Minneapolis, Minn (M.L.); and School of Information (T.L., Y.D.) and Department of Electrical and Computer Engineering (G.H.), The University of Texas at Austin, Austin, Tex
| | - Fei Wang
- From the Departments of Population Health Sciences (M.L., Z.S., F.W., Y.P.) and Radiology (G.S.), Weill Cornell Medicine, 425 E 61st St, New York, NY 10065; Department of Surgery, University of Minnesota, Minneapolis, Minn (M.L.); and School of Information (T.L., Y.D.) and Department of Electrical and Computer Engineering (G.H.), The University of Texas at Austin, Austin, Tex
| | - George Shih
- From the Departments of Population Health Sciences (M.L., Z.S., F.W., Y.P.) and Radiology (G.S.), Weill Cornell Medicine, 425 E 61st St, New York, NY 10065; Department of Surgery, University of Minnesota, Minneapolis, Minn (M.L.); and School of Information (T.L., Y.D.) and Department of Electrical and Computer Engineering (G.H.), The University of Texas at Austin, Austin, Tex
| | - Yifan Peng
- From the Departments of Population Health Sciences (M.L., Z.S., F.W., Y.P.) and Radiology (G.S.), Weill Cornell Medicine, 425 E 61st St, New York, NY 10065; Department of Surgery, University of Minnesota, Minneapolis, Minn (M.L.); and School of Information (T.L., Y.D.) and Department of Electrical and Computer Engineering (G.H.), The University of Texas at Austin, Austin, Tex
34
Yu K, Ghosh S, Liu Z, Deible C, Poynton CB, Batmanghelich K. Anatomy-specific Progression Classification in Chest Radiographs via Weakly Supervised Learning. Radiol Artif Intell 2024; 6:e230277. [PMID: 39046325 PMCID: PMC11427915 DOI: 10.1148/ryai.230277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2023] [Revised: 06/19/2024] [Accepted: 06/28/2024] [Indexed: 07/25/2024]
Abstract
Purpose To develop a machine learning approach for classifying disease progression in chest radiographs using weak labels automatically derived from radiology reports. Materials and Methods In this retrospective study, a twin neural network was developed to classify anatomy-specific disease progression into four categories: improved, unchanged, worsened, and new. A two-step weakly supervised learning approach was employed, pretraining the model on 243 008 frontal chest radiographs from 63 877 patients (mean age, 51.7 years ± 17.0 [SD]; 34 813 [55%] female) included in the MIMIC-CXR database and fine-tuning it on the subset with progression labels derived from consecutive studies. Model performance was evaluated for six pathologic observations on test datasets of unseen patients from the MIMIC-CXR database. Area under the receiver operating characteristic (AUC) analysis was used to evaluate classification performance. The algorithm is also capable of generating bounding-box predictions to localize areas of new progression. Recall, precision, and mean average precision were used to evaluate the new progression localization. One-tailed paired t tests were used to assess statistical significance. Results The model outperformed most baselines in progression classification, achieving macro AUC scores of 0.72 ± 0.004 for atelectasis, 0.75 ± 0.007 for consolidation, 0.76 ± 0.017 for edema, 0.81 ± 0.006 for effusion, 0.7 ± 0.032 for pneumonia, and 0.69 ± 0.01 for pneumothorax. For new observation localization, the model achieved mean average precision scores of 0.25 ± 0.03 for atelectasis, 0.34 ± 0.03 for consolidation, 0.33 ± 0.03 for edema, and 0.31 ± 0.03 for pneumothorax. Conclusion Disease progression classification models were developed on a large chest radiograph dataset, which can be used to monitor interval changes and detect new pathologic conditions on chest radiographs. Keywords: Prognosis, Unsupervised Learning, Transfer Learning, Convolutional Neural Network (CNN), Emergency Radiology, Named Entity Recognition Supplemental material is available for this article. © RSNA, 2024 See also commentary by Alves and Venkadesh in this issue.
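The twin-network formulation, in which a shared encoder processes the prior and current radiographs before a joint classifier, can be sketched as follows. The ResNet-18 backbone and feature sizes are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ProgressionTwinNet(nn.Module):
    """Twin-network sketch: one shared CNN encodes the prior and current
    radiographs, and their concatenated features are classified into
    improved / unchanged / worsened / new."""
    def __init__(self, num_classes=4):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # expose the 512-d pooled features
        self.encoder = backbone
        self.head = nn.Linear(512 * 2, num_classes)

    def forward(self, prior_img, current_img):
        f_prior = self.encoder(prior_img)      # same weights for both time points
        f_curr = self.encoder(current_img)
        return self.head(torch.cat([f_prior, f_curr], dim=1))

if __name__ == "__main__":
    model = ProgressionTwinNet()
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 4])
```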
Affiliation(s)
- Ke Yu
- From the School of Computing and Information, University of Pittsburgh, Pittsburgh, Pa (K.Y., Z.L.); Department of Electrical and Computer Engineering, Boston University, 8 St. Mary’s St, Office 421, Boston, MA 02215 (S.G., K.B.); Department of Radiology, University of Pittsburgh, Pittsburgh, Pa (C.D.); and Chobanian & Avedisian School of Medicine, Boston University, Boston, Mass (C.B.P.)
| | - Shantanu Ghosh
- From the School of Computing and Information, University of Pittsburgh, Pittsburgh, Pa (K.Y., Z.L.); Department of Electrical and Computer Engineering, Boston University, 8 St. Mary’s St, Office 421, Boston, MA 02215 (S.G., K.B.); Department of Radiology, University of Pittsburgh, Pittsburgh, Pa (C.D.); and Chobanian & Avedisian School of Medicine, Boston University, Boston, Mass (C.B.P.)
| | - Zhexiong Liu
- From the School of Computing and Information, University of Pittsburgh, Pittsburgh, Pa (K.Y., Z.L.); Department of Electrical and Computer Engineering, Boston University, 8 St. Mary’s St, Office 421, Boston, MA 02215 (S.G., K.B.); Department of Radiology, University of Pittsburgh, Pittsburgh, Pa (C.D.); and Chobanian & Avedisian School of Medicine, Boston University, Boston, Mass (C.B.P.)
| | - Christopher Deible
- From the School of Computing and Information, University of Pittsburgh, Pittsburgh, Pa (K.Y., Z.L.); Department of Electrical and Computer Engineering, Boston University, 8 St. Mary’s St, Office 421, Boston, MA 02215 (S.G., K.B.); Department of Radiology, University of Pittsburgh, Pittsburgh, Pa (C.D.); and Chobanian & Avedisian School of Medicine, Boston University, Boston, Mass (C.B.P.)
| | - Clare B. Poynton
- From the School of Computing and Information, University of Pittsburgh, Pittsburgh, Pa (K.Y., Z.L.); Department of Electrical and Computer Engineering, Boston University, 8 St. Mary’s St, Office 421, Boston, MA 02215 (S.G., K.B.); Department of Radiology, University of Pittsburgh, Pittsburgh, Pa (C.D.); and Chobanian & Avedisian School of Medicine, Boston University, Boston, Mass (C.B.P.)
| | - Kayhan Batmanghelich
- From the School of Computing and Information, University of Pittsburgh, Pittsburgh, Pa (K.Y., Z.L.); Department of Electrical and Computer Engineering, Boston University, 8 St. Mary’s St, Office 421, Boston, MA 02215 (S.G., K.B.); Department of Radiology, University of Pittsburgh, Pittsburgh, Pa (C.D.); and Chobanian & Avedisian School of Medicine, Boston University, Boston, Mass (C.B.P.)
35
D'Ancona G, Savardi M, Massussi M, Van Der Valk V, Scherptong RWC, Signoroni A, Farina D, Murero M, Ince H, Benussi S, Curello S, Arslan F. Deep learning to predict long-term mortality from plain chest X-ray in patients referred for suspected coronary artery disease. J Thorac Dis 2024; 16:4914-4923. [PMID: 39268143 PMCID: PMC11388213 DOI: 10.21037/jtd-24-322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Accepted: 06/24/2024] [Indexed: 09/15/2024]
Abstract
Background The hypothesis that a deep learning (DL) model can produce long-term prognostic information from a plain chest X-ray (CXR) has already been confirmed within cancer screening programs. We summarize our experience with DL prediction of long-term mortality, from plain CXR, in patients referred for angina and coronary angiography. Methods Data of patients referred to an Italian academic hospital were analyzed retrospectively. We designed a deep convolutional neural network (DCNN) that could predict long-term mortality from CXR. External validation was performed on patients referred to a Dutch academic hospital. Results A total of 6,031 patients were included, with data used for model training (71%; n=4,259) and fine-tuning/validation (10%; n=602). Internal validation was performed with the remaining patients (19%; n=1,170). Patients were stratified by DL-CXR risk score quartiles. Median follow-up was 6.1 years [interquartile range (IQR), 3.3-8.7 years]. Estimated mortality increased with the DL-CXR risk score (low-risk 5%, moderate 17%, high 29%, very high 46%; P<0.001). The DL-CXR risk score predicted outcome at median follow-up with an area under the curve (AUC) of 0.793 [95% confidence interval (CI): 0.759-0.827, sensitivity 78%, specificity 68%]. Prediction was better than that achieved using coronary angiography findings (AUC: 0.569, 95% CI: 0.52-0.61, P<0.001) or age (AUC: 0.735, 95% CI: 0.69-0.77, P<0.004). In Cox regression, the DL-CXR risk score predicted follow-up mortality (P<0.005, hazard ratio: 3.30, 95% CI: 2.35-4.64). External validation confirmed the DL-CXR risk score's performance (AUC: 0.71, 95% CI: 0.49-0.92; sensitivity 0.838; specificity 0.338). Conclusions In patients referred for coronary angiography because of angina, the DL-CXR risk score could be used to stratify mortality risk and predict long-term outcome better than age and coronary artery disease status.
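Stratifying patients by risk-score quartiles and reporting observed mortality per stratum, as described above, is straightforward to reproduce; the sketch below uses synthetic numbers purely for illustration and is not the study's analysis code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def stratify_by_risk_quartile(risk_score, died):
    """Split patients into quartiles of a CXR-derived risk score and report
    observed mortality per stratum plus overall discrimination (AUC)."""
    df = pd.DataFrame({"risk": risk_score, "died": died})
    df["stratum"] = pd.qcut(df["risk"], 4, labels=["low", "moderate", "high", "very high"])
    mortality = df.groupby("stratum", observed=True)["died"].mean()
    auc = roc_auc_score(df["died"], df["risk"])
    return mortality, auc

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    risk = rng.uniform(0, 1, 2000)
    died = rng.binomial(1, 0.05 + 0.4 * risk)      # synthetic outcome rising with risk
    mortality, auc = stratify_by_risk_quartile(risk, died)
    print(mortality.round(3))
    print("AUC:", round(auc, 3))
```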
Affiliation(s)
- Giuseppe D'Ancona
- Department of Cardiology and Cardiovascular Clinical Research Unit, Vivantes Klinikum Urban and Neukölln, Berlin, Germany
| | - Mattia Savardi
- Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, Brescia, Italy
- Department of Information Engineering, University of Brescia, Brescia, Italy
| | - Mauro Massussi
- Cardiac Catheterization Laboratory, Department of Cardiothoracic, ASST Spedali Civili, Brescia, Italy
| | - Viktor Van Der Valk
- Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | | | - Alberto Signoroni
- Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, Brescia, Italy
- Department of Information Engineering, University of Brescia, Brescia, Italy
| | - Davide Farina
- Radiology 2, ASST Spedali Civili and Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy
| | - Monica Murero
- Department of Excellence in Social Sciences, University Federico II, Naples, Italy
| | - Hüseyin Ince
- Department of Cardiology and Cardiovascular Clinical Research Unit, Vivantes Klinikum Urban and Neukölln, Berlin, Germany
| | - Stefano Benussi
- Department of Cardiac Surgery, Spedali Civili Brescia and University of Brescia, Brescia, Italy
| | - Salvatore Curello
- Cardiac Catheterization Laboratory, Department of Cardiothoracic, ASST Spedali Civili, Brescia, Italy
| | - Fatih Arslan
- Department of Cardiology, Leiden University Medical Center, Leiden, The Netherlands
36
Wu Y, Huang IC, Huang X. Joint Imbalance Adaptation for Radiology Report Generation. RESEARCH SQUARE 2024:rs.3.rs-4837662. [PMID: 39257991 PMCID: PMC11384792 DOI: 10.21203/rs.3.rs-4837662/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/12/2024]
Abstract
Purpose Radiology report generation, which translates radiological images into precise and clinically relevant descriptions, faces a data imbalance challenge: medical tokens appear less frequently than regular tokens, and normal entries significantly outnumber abnormal ones. However, very few studies consider these imbalance issues, let alone the two imbalance factors jointly. Methods In this study, we propose a Joint Imbalance Adaptation (JIMA) model to promote task robustness by leveraging token and label imbalance. JIMA predicts entity distributions from images and generates reports based on these distributions and image features. We employ a hard-to-easy learning strategy that mitigates overfitting to frequent labels and tokens, thereby encouraging the model to focus more on rare labels and clinical tokens. Results JIMA shows notable improvements (16.75%-50.50% on average) across evaluation metrics on the IU X-ray and MIMIC-CXR datasets. Our ablation analysis shows that JIMA's enhanced handling of infrequent tokens and abnormal labels accounts for the major share of the gains. Human evaluation and case study experiments further validate that JIMA can generate more clinically accurate reports. Conclusion Data imbalance (e.g., infrequent tokens and abnormal labels) leads to the underperformance of radiology report generation. Our curriculum learning strategy successfully reduces the impact of data imbalance by limiting overfitting on frequent patterns and underfitting on infrequent patterns. While data imbalance remains challenging, our approach opens new directions for the generation task.
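A simplified proxy for imbalance-aware training is to down-weight frequent tokens in the generation loss. The frequency-based weighting below is an assumption standing in for JIMA's hard-to-easy strategy, not the paper's method; all shapes and the exponent are illustrative.

```python
import torch
import torch.nn.functional as F

def frequency_weighted_token_loss(logits, targets, token_counts, alpha=0.5, ignore_index=0):
    """
    logits:       (B, T, V) decoder outputs
    targets:      (B, T)    gold token ids
    token_counts: (V,)      corpus frequency of each vocabulary token
    Rare (often clinical) tokens receive larger weights ~ 1 / count**alpha,
    a simplified stand-in for imbalance-aware training.
    """
    weights = token_counts.float().clamp(min=1) ** (-alpha)
    weights = weights / weights.mean()                 # keep the loss scale comparable
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        weight=weights,
        ignore_index=ignore_index,
    )

if __name__ == "__main__":
    B, T, V = 2, 16, 1000
    counts = torch.randint(1, 10_000, (V,))
    loss = frequency_weighted_token_loss(torch.randn(B, T, V), torch.randint(0, V, (B, T)), counts)
    print(float(loss))
```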
Affiliation(s)
- Yuexin Wu
- Department of Computer Science, University of Memphis, Memphis, 38152, TN, United States
| | - I-Chan Huang
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, 38105, TN, United States
| | - Xiaolei Huang
- Department of Computer Science, University of Memphis, Memphis, 38152, TN, United States
37
Lotter W. Acquisition parameters influence AI recognition of race in chest x-rays and mitigating these factors reduces underdiagnosis bias. Nat Commun 2024; 15:7465. [PMID: 39198519 PMCID: PMC11358468 DOI: 10.1038/s41467-024-52003-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2023] [Accepted: 08/22/2024] [Indexed: 09/01/2024] Open
Abstract
A core motivation for the use of artificial intelligence (AI) in medicine is to reduce existing healthcare disparities. Yet, recent studies have demonstrated two distinct findings: (1) AI models can show performance biases in underserved populations, and (2) these same models can be directly trained to recognize patient demographics, such as predicting self-reported race from medical images alone. Here, we investigate how these findings may be related, with an end goal of reducing a previously identified underdiagnosis bias. Using two popular chest x-ray datasets, we first demonstrate that technical parameters related to image acquisition and processing influence AI models trained to predict patient race, where these results partly reflect underlying biases in the original clinical datasets. We then find that mitigating the observed differences through a demographics-independent calibration strategy reduces the previously identified bias. While many factors likely contribute to AI bias and demographics prediction, these results highlight the importance of carefully considering data acquisition and processing parameters in AI development and healthcare equity more broadly.
Affiliation(s)
- William Lotter
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Pathology, Brigham & Women's Hospital, Boston, MA, USA.
- Harvard Medical School, Boston, MA, USA.
38
Bluethgen C, Chambon P, Delbrouck JB, van der Sluijs R, Połacin M, Zambrano Chaves JM, Abraham TM, Purohit S, Langlotz CP, Chaudhari AS. A vision-language foundation model for the generation of realistic chest X-ray images. Nat Biomed Eng 2024:10.1038/s41551-024-01246-y. [PMID: 39187663 DOI: 10.1038/s41551-024-01246-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2023] [Accepted: 07/28/2024] [Indexed: 08/28/2024]
Abstract
The paucity of high-quality medical imaging datasets could be mitigated by machine learning models that generate compositionally diverse images that faithfully represent medical concepts and pathologies. However, large vision-language models are trained on natural images, and the diversity distribution of the generated images substantially differs from that of medical images. Moreover, medical language involves specific and semantically rich vocabulary. Here we describe a domain-adaptation strategy for large vision-language models that overcomes distributional shifts. Specifically, by leveraging publicly available datasets of chest X-ray images and the corresponding radiology reports, we adapted a latent diffusion model pre-trained on pairs of natural images and text descriptors to generate diverse and visually plausible synthetic chest X-ray images (as confirmed by board-certified radiologists) whose appearance can be controlled with free-form medical text prompts. The domain-adaptation strategy for the text-conditioned synthesis of medical images can be used to augment training datasets and is a viable alternative to the sharing of real medical images for model training and fine-tuning.
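Text-conditioned synthesis with a latent diffusion pipeline can be sketched with the `diffusers` library. The checkpoint path below is a placeholder for a chest X-ray-adapted model and is not the authors' released weights; prompt, step count, and guidance scale are illustrative choices.

```python
# Sketch of text-conditioned chest X-ray synthesis with a latent diffusion
# pipeline. The checkpoint name is a placeholder: substitute a model that has
# actually been domain-adapted to chest radiographs and their reports.
import torch
from diffusers import StableDiffusionPipeline

def generate_synthetic_cxr(prompt: str, checkpoint: str = "path/to/cxr-adapted-latent-diffusion"):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(checkpoint).to(device)
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    return image

if __name__ == "__main__":
    img = generate_synthetic_cxr("Frontal chest X-ray with a moderate right pleural effusion")
    img.save("synthetic_cxr.png")
```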
Affiliation(s)
- Christian Bluethgen
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA.
- Department of Radiology, Stanford University, Palo Alto, CA, USA.
- Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland.
| | - Pierre Chambon
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
| | - Jean-Benoit Delbrouck
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
| | - Rogier van der Sluijs
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
| | - Małgorzata Połacin
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Juan Manuel Zambrano Chaves
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
| | | | | | - Curtis P Langlotz
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
| | - Akshay S Chaudhari
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
39
Reichenpfader D, Müller H, Denecke K. A scoping review of large language model based approaches for information extraction from radiology reports. NPJ Digit Med 2024; 7:222. [PMID: 39182008 PMCID: PMC11344824 DOI: 10.1038/s41746-024-01219-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2024] [Accepted: 08/09/2024] [Indexed: 08/27/2024] Open
Abstract
Radiological imaging is a globally prevalent diagnostic method, yet the free text contained in radiology reports is not frequently used for secondary purposes. Natural Language Processing can provide structured data retrieved from these reports. This paper provides a summary of the current state of research on Large Language Model (LLM) based approaches for information extraction (IE) from radiology reports. We conduct a scoping review that follows the PRISMA-ScR guideline. Queries of five databases were conducted on August 1st 2023. Among the 34 studies that met inclusion criteria, only pre-transformer and encoder-based models are described. External validation shows a general performance decrease, although LLMs might improve generalizability of IE approaches. Reports related to CT and MRI examinations, as well as thoracic reports, prevail. Most common challenges reported are missing validation on external data and augmentation of the described methods. Different reporting granularities affect the comparability and transparency of approaches.
Affiliation(s)
- Daniel Reichenpfader
- Institute for Patient-Centered Digital Health, Bern University of Applied Sciences, Biel/Bienne, Switzerland.
- Faculty of Medicine, University of Geneva, Geneva, Switzerland.
| | - Henning Müller
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Informatics Institute, HES-SO Valais-Wallis, Sierre, Switzerland
| | - Kerstin Denecke
- Institute for Patient-Centered Digital Health, Bern University of Applied Sciences, Biel/Bienne, Switzerland
40
Naseem U, Thapa S, Masood A. Advancing Accuracy in Multimodal Medical Tasks Through Bootstrapped Language-Image Pretraining (BioMedBLIP): Performance Evaluation Study. JMIR Med Inform 2024; 12:e56627. [PMID: 39102281 PMCID: PMC11333867 DOI: 10.2196/56627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Revised: 04/20/2024] [Accepted: 05/04/2024] [Indexed: 08/06/2024] Open
Abstract
BACKGROUND Medical image analysis, particularly in the context of visual question answering (VQA) and image captioning, is crucial for accurate diagnosis and educational purposes. OBJECTIVE Our study aims to introduce BioMedBLIP models, fine-tuned for VQA tasks using specialized medical data sets such as Radiology Objects in Context and Medical Information Mart for Intensive Care-Chest X-ray, and evaluate their performance in comparison to the state of the art (SOTA) original Bootstrapping Language-Image Pretraining (BLIP) model. METHODS We present 9 versions of BioMedBLIP across 3 downstream tasks in various data sets. The models are trained on a varying number of epochs. The findings indicate the strong overall performance of our models. We proposed BioMedBLIP for the VQA generation model, VQA classification model, and BioMedBLIP image caption model. We conducted pretraining in BLIP using medical data sets, producing an adapted BLIP model tailored for medical applications. RESULTS In VQA generation tasks, BioMedBLIP models outperformed the SOTA on the Semantically-Labeled Knowledge-Enhanced (SLAKE) data set, VQA in Radiology (VQA-RAD), and Image Cross-Language Evaluation Forum data sets. In VQA classification, our models consistently surpassed the SOTA on the SLAKE data set. Our models also showed competitive performance on the VQA-RAD and PathVQA data sets. Similarly, in image captioning tasks, our model beat the SOTA, suggesting the importance of pretraining with medical data sets. Overall, in 20 different data sets and task combinations, our BioMedBLIP excelled in 15 (75%) out of 20 tasks. BioMedBLIP represents a new SOTA in 15 (75%) out of 20 tasks, and our responses were rated higher in all 20 tasks (P<.005) in comparison to SOTA models. CONCLUSIONS Our BioMedBLIP models show promising performance and suggest that incorporating medical knowledge through pretraining with domain-specific medical data sets helps models achieve higher performance. Our models thus demonstrate their potential to advance medical image analysis, impacting diagnosis, medical education, and research. However, data quality, task-specific variability, computational resources, and ethical considerations should be carefully addressed. In conclusion, our models represent a contribution toward the synergy of artificial intelligence and medicine. We have made BioMedBLIP freely available, which will help in further advancing research in multimodal medical tasks.
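Medical VQA with a BLIP-style model follows the standard processor/generate pattern from Hugging Face `transformers`. The general-domain checkpoint below is a stand-in; loading medically pretrained weights such as the article's BioMedBLIP models (if available locally) would follow the same pattern. The image path is a placeholder.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

def answer_question(image_path: str, question: str,
                    checkpoint: str = "Salesforce/blip-vqa-base") -> str:
    """Run visual question answering with a BLIP checkpoint (general-domain stand-in)."""
    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForQuestionAnswering.from_pretrained(checkpoint)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    # "chest_xray.png" is a placeholder path for a local radiograph file
    print(answer_question("chest_xray.png", "Is there a pleural effusion?"))
```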
Affiliation(s)
- Usman Naseem
- School of Computing, Macquarie University, Sydney, Australia
| | | | - Anum Masood
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
- Harvard Medical School, Harvard University, Boston, MA, United States
- Department of Radiology, Boston Children's Hospital, Boston, MA, United States
41
Luo X, Deng Z, Yang B, Luo MY. Pre-trained language models in medicine: A survey. Artif Intell Med 2024; 154:102904. [PMID: 38917600 DOI: 10.1016/j.artmed.2024.102904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 04/15/2024] [Accepted: 06/03/2024] [Indexed: 06/27/2024]
Abstract
With the rapid progress in Natural Language Processing (NLP), Pre-trained Language Models (PLMs) such as BERT, BioBERT, and ChatGPT have shown great potential in various medical NLP tasks. This paper surveys the cutting-edge achievements in applying PLMs to various medical NLP tasks. Specifically, we first briefly introduce PLMs and outline research on PLMs in medicine. Next, we categorise and discuss the types of tasks in medical NLP, covering text summarisation, question-answering, machine translation, sentiment analysis, named entity recognition, information extraction, medical education, relation extraction, and text mining. For each type of task, we provide an overview of the basic concepts, the main methodologies, the advantages of applying PLMs, the basic steps of applying PLMs, the datasets for training and testing, and the metrics for task evaluation. Subsequently, a summary of recent important research findings is presented, analysing their motivations, strengths and weaknesses, and similarities and differences, and discussing potential limitations. We also assess the quality and influence of the research reviewed in this paper by comparing the citation counts of the papers reviewed with the reputation and impact of the conferences and journals where they were published. Through these indicators, we further identify the research topics currently attracting the most attention. Finally, we look forward to future research directions, including enhancing models' reliability, explainability, and fairness, to promote the application of PLMs in clinical practice. In addition, this survey also collects download links for model code and the relevant datasets, which are valuable references for researchers applying NLP techniques in medicine and for medical professionals seeking to enhance their expertise and healthcare services through AI technology.
Affiliation(s)
- Xudong Luo
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Zhiqi Deng
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Binxia Yang
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Michael Y Luo
- Emmanuel College, Cambridge University, Cambridge, CB2 3AP, UK.
42
Huemann Z, Tie X, Hu J, Bradshaw TJ. ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:1652-1663. [PMID: 38485899 PMCID: PMC11300752 DOI: 10.1007/s10278-024-01051-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Revised: 01/09/2024] [Accepted: 01/17/2024] [Indexed: 07/24/2024]
Abstract
Radiology narrative reports often describe characteristics of a patient's disease, including its location, size, and shape. Motivated by the recent success of multimodal learning, we hypothesized that this descriptive text could guide medical image analysis algorithms. We proposed a novel vision-language model, ConTEXTual Net, for the task of pneumothorax segmentation on chest radiographs. ConTEXTual Net extracts language features from physician-generated free-form radiology reports using a pre-trained language model. We then introduced cross-attention between the language features and the intermediate embeddings of an encoder-decoder convolutional neural network to enable language guidance for image analysis. ConTEXTual Net was trained on the CANDID-PTX dataset consisting of 3196 positive cases of pneumothorax with segmentation annotations from 6 different physicians as well as clinical radiology reports. Using cross-validation, ConTEXTual Net achieved a Dice score of 0.716±0.016, which was similar to the degree of inter-reader variability (0.712±0.044) computed on a subset of the data. It outperformed vision-only models (Swin UNETR: 0.670±0.015, ResNet50 U-Net: 0.677±0.015, GLoRIA: 0.686±0.014, and nnUNet 0.694±0.016) and a competing vision-language model (LAVT: 0.706±0.009). Ablation studies confirmed that it was the text information that led to the performance gains. Additionally, we show that certain augmentation methods degraded ConTEXTual Net's segmentation performance by breaking the image-text concordance. We also evaluated the effects of using different language models and activation functions in the cross-attention module, highlighting the efficacy of our chosen architectural design.
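The cross-attention coupling between report-token embeddings and intermediate CNN features can be sketched as follows. The dimensions, head count, and residual fusion are assumptions rather than ConTEXTual Net's exact design.

```python
import torch
import torch.nn as nn

class LanguageGuidedCrossAttention(nn.Module):
    """Cross-attention sketch: CNN feature-map positions attend to report-token
    embeddings, letting the text guide segmentation features (simplified)."""
    def __init__(self, vis_channels=256, txt_dim=768, num_heads=8):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, vis_channels)
        self.attn = nn.MultiheadAttention(vis_channels, num_heads, batch_first=True)

    def forward(self, vis_feat, txt_tokens):
        # vis_feat: (B, C, H, W) intermediate CNN features; txt_tokens: (B, L, txt_dim)
        B, C, H, W = vis_feat.shape
        q = vis_feat.flatten(2).transpose(1, 2)            # (B, H*W, C) queries from pixels
        kv = self.txt_proj(txt_tokens)                     # (B, L, C) keys/values from text
        attended, _ = self.attn(q, kv, kv)
        fused = q + attended                               # residual fusion with the text signal
        return fused.transpose(1, 2).reshape(B, C, H, W)

if __name__ == "__main__":
    module = LanguageGuidedCrossAttention()
    out = module(torch.randn(2, 256, 32, 32), torch.randn(2, 40, 768))
    print(out.shape)  # torch.Size([2, 256, 32, 32])
```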
Affiliation(s)
- Zachary Huemann
- Department of Radiology, University of Wisconsin-Madison, Madison, WI, 53705, USA.
| | - Xin Tie
- Department of Radiology, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Junjie Hu
- Departments of Biostatistics and Computer Science, University of Wisconsin-Madison, Madison, WI, 53705, USA
| | - Tyler J Bradshaw
- Department of Radiology, University of Wisconsin-Madison, Madison, WI, 53705, USA
43
Sogancioglu E, Ginneken BV, Behrendt F, Bengs M, Schlaefer A, Radu M, Xu D, Sheng K, Scalzo F, Marcus E, Papa S, Teuwen J, Scholten ET, Schalekamp S, Hendrix N, Jacobs C, Hendrix W, Sanchez CI, Murphy K. Nodule Detection and Generation on Chest X-Rays: NODE21 Challenge. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:2839-2853. [PMID: 38530714 DOI: 10.1109/tmi.2024.3382042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/28/2024]
Abstract
Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high-performance levels in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows down the progression of the research and prevents benchmarking of methods for this task. To address this, we organized a public research challenge, NODE21, aimed at the detection and generation of lung nodules in chest X-rays. While the detection track assesses state-of-the-art nodule detection systems, the generation track determines the utility of nodule generation algorithms to augment training data and hence improve the performance of the detection systems. This paper summarizes the results of the NODE21 challenge and performs extensive additional experiments to examine the impact of the synthetically generated nodule training images on the detection algorithm performance.
44
Lin J, Yang J, Yin M, Tang Y, Chen L, Xu C, Zhu S, Gao J, Liu L, Liu X, Gu C, Huang Z, Wei Y, Zhu J. Development and Validation of Multimodal Models to Predict the 30-Day Mortality of ICU Patients Based on Clinical Parameters and Chest X-Rays. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024; 37:1312-1322. [PMID: 38448758 PMCID: PMC11300735 DOI: 10.1007/s10278-024-01066-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/08/2024]
Abstract
We aimed to develop and validate multimodal ICU patient prognosis models that combine clinical parameters data and chest X-ray (CXR) images. A total of 3798 subjects with clinical parameters and CXR images were extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database and an external hospital (the test set). The primary outcome was 30-day mortality after ICU admission. Automated machine learning (AutoML) and convolutional neural networks (CNNs) were used to construct single-modal models based on clinical parameters and CXR separately. An early fusion approach was used to integrate both modalities (clinical parameters and CXR) into a multimodal model named PrismICU. Compared to the single-modal models, i.e., the clinical parameter model (AUC = 0.80, F1-score = 0.43) and the CXR model (AUC = 0.76, F1-score = 0.45) and the scoring system APACHE II (AUC = 0.83, F1-score = 0.77), PrismICU (AUC = 0.95, F1 score = 0.95) showed improved performance in predicting the 30-day mortality in the validation set. In the test set, PrismICU (AUC = 0.82, F1-score = 0.61) was also better than the clinical parameters model (AUC = 0.72, F1-score = 0.50), CXR model (AUC = 0.71, F1-score = 0.36), and APACHE II (AUC = 0.62, F1-score = 0.50). PrismICU, which integrated clinical parameters data and CXR images, performed better than single-modal models and the existing scoring system. It supports the potential of multimodal models based on structured data and imaging in clinical management.
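An early-fusion model that concatenates a CXR embedding with encoded clinical parameters before a shared head can be sketched as below. The backbone, feature sizes, and head are assumptions, not the PrismICU implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class EarlyFusionMortalityModel(nn.Module):
    """Early-fusion sketch: a CXR embedding and a clinical-parameter vector are
    concatenated before a shared classification head (simplified stand-in)."""
    def __init__(self, num_clinical_features=32):
        super().__init__()
        cnn = resnet18(weights=None)
        cnn.fc = nn.Identity()                                  # 512-d image embedding
        self.image_encoder = cnn
        self.tabular_encoder = nn.Sequential(nn.Linear(num_clinical_features, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(512 + 64, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, cxr, clinical):
        fused = torch.cat([self.image_encoder(cxr), self.tabular_encoder(clinical)], dim=1)
        return self.head(fused).squeeze(-1)                     # logit for 30-day mortality

if __name__ == "__main__":
    model = EarlyFusionMortalityModel()
    logit = model(torch.randn(4, 3, 224, 224), torch.randn(4, 32))
    print(torch.sigmoid(logit))
```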
Affiliation(s)
- Jiaxi Lin
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
| | - Jin Yang
- Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
| | - Minyue Yin
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
| | - Yuxiu Tang
- Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
| | - Liquan Chen
- Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
| | - Chang Xu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
| | - Shiqi Zhu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
| | - Jingwen Gao
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
| | - Lu Liu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
| | - Xiaolin Liu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
| | - Chenqi Gu
- Department of Radiology, The First Affiliated Hospital of Soochow University, Suzhou, China
| | - Zhou Huang
- Department of Radiology, The First Affiliated Hospital of Soochow University, Suzhou, China
| | - Yao Wei
- Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China.
| | - Jinzhou Zhu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China.
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China.
45
Zhang Y, Kohne J, Wittrup E, Najarian K. Three-Stage Framework for Accurate Pediatric Chest X-ray Diagnosis Using Self-Supervision and Transfer Learning on Small Datasets. Diagnostics (Basel) 2024; 14:1634. [PMID: 39125510 PMCID: PMC11312211 DOI: 10.3390/diagnostics14151634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2024] [Revised: 07/19/2024] [Accepted: 07/25/2024] [Indexed: 08/12/2024] Open
Abstract
Pediatric respiratory disease diagnosis and subsequent treatment require accurate and interpretable analysis. A chest X-ray is the most cost-effective and rapid method for identifying and monitoring various thoracic diseases in children. Recent developments in self-supervised and transfer learning have shown their potential in medical imaging, including chest X-ray areas. In this article, we propose a three-stage framework with knowledge transfer from adult chest X-rays to aid the diagnosis and interpretation of pediatric thorax diseases. We conducted comprehensive experiments with different pre-training and fine-tuning strategies to develop transformer or convolutional neural network models and then evaluate them qualitatively and quantitatively. The ViT-Base/16 model, fine-tuned with the CheXpert dataset, a large chest X-ray dataset, emerged as the most effective, achieving a mean AUC of 0.761 (95% CI: 0.759-0.763) across six disease categories and demonstrating a high sensitivity (average 0.639) and specificity (average 0.683), which are indicative of its strong discriminative ability. The baseline models, ViT-Small/16 and ViT-Base/16, when directly trained on the Pediatric CXR dataset, only achieved mean AUC scores of 0.646 (95% CI: 0.641-0.651) and 0.654 (95% CI: 0.648-0.660), respectively. Qualitatively, our model excels in localizing diseased regions, outperforming models pre-trained on ImageNet and other fine-tuning approaches, thus providing superior explanations. The source code is available online and the data can be obtained from PhysioNet.
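The pre-train-then-fine-tune recipe for a ViT-Base/16 classifier can be sketched with `timm`. The checkpoint handling, label count, and one-epoch training loop below are simplified assumptions, not the authors' three-stage pipeline, and the adult-CXR pretraining step itself is omitted.

```python
import timm
import torch
import torch.nn as nn

def build_finetune_model(num_diseases=6, pretrained_ckpt=None):
    """ViT-Base/16 classifier; optionally warm-started from prior CXR weights."""
    model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=num_diseases)
    if pretrained_ckpt is not None:
        state = torch.load(pretrained_ckpt, map_location="cpu")
        model.load_state_dict(state, strict=False)   # reuse weights where shapes match
    return model

def finetune_one_epoch(model, loader, lr=1e-4, device="cpu"):
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()               # multi-label disease classification
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device).float())
        loss.backward()
        optimizer.step()
    return model

if __name__ == "__main__":
    model = build_finetune_model()
    dummy_loader = [(torch.randn(2, 3, 224, 224), torch.randint(0, 2, (2, 6)))]
    finetune_one_epoch(model, dummy_loader)
    print("one fine-tuning step completed")
```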
Affiliation(s)
- Yufeng Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA (K.N.)
| | - Joseph Kohne
- Department of Pediatrics, University of Michigan, Ann Arbor, MI 48103, USA
| | - Emily Wittrup
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA (K.N.)
| | - Kayvan Najarian
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA (K.N.)
- Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA
- Department of Emergency Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Max Harry Weil Institute for Critical Care Research and Innovation, University of Michigan, Ann Arbor, MI 48109, USA
46
Siddiqi R, Javaid S. Deep Learning for Pneumonia Detection in Chest X-ray Images: A Comprehensive Survey. J Imaging 2024; 10:176. [PMID: 39194965 DOI: 10.3390/jimaging10080176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2024] [Revised: 07/15/2024] [Accepted: 07/19/2024] [Indexed: 08/29/2024] Open
Abstract
This paper addresses the significant problem of identifying the relevant background and contextual literature related to deep learning (DL) as an evolving technology in order to provide a comprehensive analysis of the application of DL to the specific problem of pneumonia detection via chest X-ray (CXR) imaging, which is the most common and cost-effective imaging technique available worldwide for pneumonia diagnosis. This paper in particular addresses the key period associated with COVID-19, 2020-2023, to explain, analyze, and systematically evaluate the limitations of approaches and determine their relative levels of effectiveness. The context in which DL is applied as both an aid to and an automated substitute for existing expert radiography professionals, who often have limited availability, is elaborated in detail. The rationale for the undertaken research is provided, along with a justification of the resources adopted and their relevance. This explanatory text and the subsequent analyses are intended to provide sufficient detail of the problem being addressed, existing solutions, and the limitations of these, ranging in detail from the specific to the more general. Indeed, our analysis and evaluation agree with the generally held view that the use of transformers, specifically, vision transformers (ViTs), is the most promising technique for obtaining further effective results in the area of pneumonia detection using CXR images. However, ViTs require extensive further research to address several limitations, specifically the following: biased CXR datasets, data and code availability, the ease with which a model can be explained, systematic methods of accurate model comparison, the notion of class imbalance in CXR datasets, and the possibility of adversarial attacks, the latter of which remains an area of fundamental research.
Affiliation(s)
- Raheel Siddiqi
- Computer Science Department, Karachi Campus, Bahria University, Karachi 73500, Pakistan
| | - Sameena Javaid
- Computer Science Department, Karachi Campus, Bahria University, Karachi 73500, Pakistan
47
Kim JY, Ryu WS, Kim D, Kim EY. Better performance of deep learning pulmonary nodule detection using chest radiography with pixel level labels in reference to computed tomography: data quality matters. Sci Rep 2024; 14:15967. [PMID: 38987309 PMCID: PMC11237128 DOI: 10.1038/s41598-024-66530-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Accepted: 07/02/2024] [Indexed: 07/12/2024] Open
Abstract
Labeling errors can significantly impact the performance of deep learning models used for screening chest radiographs. The deep learning model for detecting pulmonary nodules is particularly vulnerable to such errors, mainly because normal chest radiographs and those with nodules obscured by ribs appear similar. Thus, high-quality datasets labeled with reference to chest computed tomography (CT) are required to prevent the misclassification of nodular chest radiographs as normal. From this perspective, a deep learning strategy employing chest radiography data with pixel-level annotations referencing chest CT scans may improve nodule detection and localization compared to image-level labels. We trained models using a National Institutes of Health chest radiograph-based labeling dataset and an AI-HUB CT-based labeling dataset, employing a DenseNet architecture with squeeze-and-excitation blocks. We developed four models to assess whether CT-based versus chest radiography-based labeling and pixel-level versus image-level labeling would improve the deep learning model's nodule detection performance. The models' performance was evaluated using two external validation datasets. The AI-HUB dataset with image-level labeling outperformed the NIH dataset (AUC 0.88 vs 0.71 and 0.78 vs 0.73 in the two external datasets, respectively; both p < 0.001). However, the AI-HUB data annotated at the pixel level produced the best model (AUC 0.91 and 0.86 in the external datasets), and in terms of nodule localization, it significantly outperformed models trained with image-level annotation data, with a Dice coefficient ranging from 0.36 to 0.58. Our findings underscore the importance of accurately labeled data in developing reliable deep learning algorithms for nodule detection in chest radiography.
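The Dice coefficient used above to score nodule localization is simple to compute from binary masks; the toy masks in the example are invented purely for illustration.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice overlap between a predicted and a reference binary nodule mask."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

if __name__ == "__main__":
    pred = np.zeros((64, 64), dtype=bool); pred[20:40, 20:40] = True
    true = np.zeros((64, 64), dtype=bool); true[25:45, 25:45] = True
    print(round(float(dice_coefficient(pred, true)), 3))  # ~0.56 for these toy masks
```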
Collapse
Affiliation(s)
- Jae Yong Kim
- Artificial Intelligence Research Center, JLK Inc., 5 Teheran-ro 33-gil, Seoul, Republic of Korea
| | - Wi-Sun Ryu
- Artificial Intelligence Research Center, JLK Inc., 5 Teheran-ro 33-gil, Seoul, Republic of Korea.
| | - Dongmin Kim
- Artificial Intelligence Research Center, JLK Inc., 5 Teheran-ro 33-gil, Seoul, Republic of Korea
| | - Eun Young Kim
- Department of Radiology, Incheon Sejong Hospital, 20, Gyeyangmunhwa-ro, Gyeyang-gu, Incheon, 21080, Republic of Korea.
| |
Collapse
|
48
|
Wang X, Lu Z, Huang S, Ting Y, Ting JSZ, Chen W, Tan CH, Huang W. TransMVAN: Multi-view Aggregation Network with Transformer for Pneumonia Diagnosis. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01169-9. [PMID: 38977615 DOI: 10.1007/s10278-024-01169-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Revised: 04/30/2024] [Accepted: 05/01/2024] [Indexed: 07/10/2024]
Abstract
Automated and accurate classification of pneumonia plays a crucial role in improving the performance of computer-aided diagnosis systems for chest X-ray images. Nevertheless, it is a challenging task because of the difficulty of learning the complex structural information of lung abnormalities from chest X-ray images. In this paper, we propose a multi-view aggregation network with Transformer (TransMVAN) for pneumonia classification in chest X-ray images. Specifically, we incorporate knowledge from glance and focus views to enrich the feature representation of lung abnormality. Moreover, to capture the complex relationships among different lung regions, we propose a bi-directional multi-scale vision Transformer (biMSVT), in which informative messages between different lung regions are propagated in two directions. In addition, we propose a gated multi-view aggregation (GMVA) module to adaptively select feature information from the glance and focus views for further performance enhancement of pneumonia diagnosis. Our proposed method achieves AUCs of 0.9645 and 0.9550 for pneumonia classification on two different chest X-ray image datasets. It also achieves an AUC of 0.9761 for distinguishing positive from negative polymerase chain reaction (PCR) cases, and an AUC of 0.9741 for classifying non-COVID-19 pneumonia, COVID-19 pneumonia, and normal cases. Experimental results demonstrate that our method outperforms the comparison methods for pneumonia diagnosis from chest X-ray images.
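The gated aggregation of glance and focus view features described above can be sketched as a small PyTorch module. This is a minimal sketch of one plausible form of such a gate, under the assumption of two equal-dimension view feature vectors; the module and variable names are illustrative, not the authors' released implementation.

# Minimal sketch (assumption): gated fusion of glance- and focus-view feature
# vectors, in the spirit of the GMVA described above.
import torch
from torch import nn

class GatedViewAggregation(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # The gate is computed from the concatenated view features.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, glance: torch.Tensor, focus: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([glance, focus], dim=-1))
        return g * glance + (1.0 - g) * focus   # element-wise weighted fusion

if __name__ == "__main__":
    agg = GatedViewAggregation(dim=256)
    glance = torch.randn(4, 256)   # glance-view features for a batch of 4
    focus = torch.randn(4, 256)    # focus-view features for the same batch
    print(agg(glance, focus).shape)  # torch.Size([4, 256])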
Collapse
Affiliation(s)
- Xiaohong Wang
- Institute for Infocomm Research (I²R), A*STAR, 138632, Singapore, Singapore
| | - Zhongkang Lu
- Institute for Infocomm Research (I²R), A*STAR, 138632, Singapore, Singapore
| | - Su Huang
- Institute for Infocomm Research (I²R), A*STAR, 138632, Singapore, Singapore
| | - Yonghan Ting
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 308433, Singapore, Singapore
| | - Jordan Sim Zheng Ting
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 308433, Singapore, Singapore
| | - Wenxiang Chen
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 308433, Singapore, Singapore
| | - Cher Heng Tan
- Department of Diagnostic Radiology, Tan Tock Seng Hospital, 308433, Singapore, Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, 308232, Singapore, Singapore
| | - Weimin Huang
- Institute for Infocomm Research (I²R), A*STAR, 138632, Singapore, Singapore.
| |
Collapse
|
49
|
López-Úbeda P, Martín-Noguerol T, Díaz-Angulo C, Luna A. Evaluation of large language models performance against humans for summarizing MRI knee radiology reports: A feasibility study. Int J Med Inform 2024; 187:105443. [PMID: 38615509 DOI: 10.1016/j.ijmedinf.2024.105443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 03/20/2024] [Accepted: 03/29/2024] [Indexed: 04/16/2024]
Abstract
OBJECTIVES This study addresses the critical need for accurate summarization in radiology by comparing several Large Language Model (LLM)-based approaches for automatic summary generation. With the increasing volume of patient information, accurately and concisely conveying radiological findings becomes crucial for effective clinical decision-making. Minor inaccuracies in summaries can lead to significant consequences, highlighting the need for reliable automated summarization tools. METHODS We employed two language models, Text-to-Text Transfer Transformer (T5) and Bidirectional and Auto-Regressive Transformers (BART), in both fine-tuned and zero-shot learning scenarios and compared them with a Recurrent Neural Network (RNN). To facilitate this, we compiled a dataset of 15,508 retrospective knee Magnetic Resonance Imaging (MRI) reports from our Radiology Information System (RIS), using the findings section to predict the radiologist's summary. Additionally, we conducted a comparative analysis of 100 MRI report summaries, using expert human judgment and criteria such as coherence, relevance, fluency, and consistency, to evaluate the models against the original radiologist summaries. RESULTS The fine-tuned models outperformed both the RNN and their zero-shot counterparts. Specifically, the T5 model achieved a Rouge-L score of 0.638. In the radiologist reader study, the summaries produced by this model were found to be very similar to those written by a radiologist, with about 70% similarity in fluency and consistency between the T5-generated summaries and the original ones. CONCLUSIONS Technological advances, especially in natural language processing (NLP) and LLMs, hold great promise for improving and streamlining the summarization of radiological findings, thus providing valuable assistance to radiologists in their work.
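As a rough illustration of the kind of evaluation reported above, the sketch below generates a summary with a generic pretrained T5 checkpoint and scores it against a reference with ROUGE-L. The checkpoint, example texts, and use of the transformers and rouge-score packages are assumptions for illustration only; this is not the study's fine-tuned model, prompts, or MRI report data.

# Minimal sketch (assumption): summarize a findings paragraph with a generic
# pretrained T5 model and score it with ROUGE-L.
from transformers import T5ForConditionalGeneration, T5Tokenizer
from rouge_score import rouge_scorer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

findings = ("summarize: The anterior cruciate ligament shows a complete tear. "
            "There is a radial tear of the posterior horn of the medial meniscus "
            "and a moderate joint effusion.")
inputs = tokenizer(findings, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=40)
candidate = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

reference = "Complete ACL tear, medial meniscal tear, and moderate joint effusion."
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
print(candidate)
print(scorer.score(reference, candidate)["rougeL"].fmeasure)  # ROUGE-L F1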
Collapse
Affiliation(s)
| | | | | | - Antonio Luna
- MRI Unit, Radiology Department, Health Time, Jaén, Spain.
| |
Collapse
|
50
|
Rajaraman S, Zamzmi G, Yang F, Liang Z, Xue Z, Antani S. Semantically redundant training data removal and deep model classification performance: A study with chest X-rays. Comput Med Imaging Graph 2024; 115:102379. [PMID: 38608333 PMCID: PMC11144082 DOI: 10.1016/j.compmedimag.2024.102379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 03/28/2024] [Accepted: 04/01/2024] [Indexed: 04/14/2024]
Abstract
Deep learning (DL) has demonstrated an innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales with the amount of training data; however, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, the presence of similar or repetitive information, can occur when multiple images present the disease of interest in highly similar ways. Moreover, the common use of augmentation methods to generate variety in DL training could limit performance when applied indiscriminately to such data. We therefore hypothesize that semantic redundancy tends to lower performance and limit generalizability to unseen data, and we question its impact on classifier performance even when training data are plentiful. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data and demonstrate, using the publicly available NIH chest X-ray dataset, that a model trained on the resulting informative subset of training data significantly outperforms a model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p < 0.05) and external testing (recall: 0.3185 vs 0.2589, p < 0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.
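A minimal sketch of entropy-based sample scoring in the spirit of the approach described above is given below. The scoring rule (ranking samples by the entropy of a model's softmax prediction and keeping the higher-entropy, more informative ones) and the keep fraction are assumptions for illustration; the authors' exact scoring and selection procedure may differ.

# Minimal sketch (assumption): score training images by predictive entropy and
# retain the more informative samples; not the paper's exact procedure.
import numpy as np

def prediction_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # probs: (n_samples, n_classes) softmax outputs for the training set.
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_informative(probs: np.ndarray, keep_fraction: float = 0.7) -> np.ndarray:
    # Keep the top fraction of samples ranked by predictive entropy.
    scores = prediction_entropy(probs)
    n_keep = int(len(scores) * keep_fraction)
    return np.argsort(scores)[::-1][:n_keep]   # indices of retained samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(10, 2))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(select_informative(probs, keep_fraction=0.5))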
Collapse
Affiliation(s)
| | - Ghada Zamzmi
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Feng Yang
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Zhaohui Liang
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Zhiyun Xue
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Sameer Antani
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| |
Collapse
|