1
Hasani AM, Singh S, Zahergivar A, Ryan B, Nethala D, Bravomontenegro G, Mendhiratta N, Ball M, Farhadi F, Malayeri A. Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports. Eur Radiol 2024; 34:3566-3574. [PMID: 37938381 DOI: 10.1007/s00330-023-10384-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/01/2023] [Accepted: 09/08/2023] [Indexed: 11/09/2023]
Abstract
OBJECTIVE Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models like GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4 AI-generated radiology reports.
METHODS A comparative study design was employed in the study, where a total of 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4, resulting in the generation of a corresponding AI-generated report. Quantitative and qualitative analysis techniques were utilized to assess similarities and differences between the two sets of reports.
RESULTS The AI-generated reports showed comparable quality to radiologist-generated reports in most categories. Significant differences were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050), favoring the AI-generated reports. AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but had greater variability in sentence length. Content similarity was high, with an average Cosine Similarity of 0.85, Sequence Matcher Similarity of 0.52, BLEU Score of 0.5008, and BERTScore F1 of 0.8775.
CONCLUSION The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simplified data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice.
CLINICAL RELEVANCE STATEMENT The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to significantly contribute to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice.
KEY POINTS
• Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports.
• Performance metrics highlighted the strong matching of word selection and order, as well as high semantic similarity between AI and radiologist-generated reports.
• Large language model demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
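Two of the content-similarity measures named in this abstract, cosine similarity and sequence-matcher similarity, along with the word and character count differences, can be illustrated with a short sketch. The abstract does not say how the authors computed their scores, so everything here is an assumption made for illustration: the regular-expression tokenizer, the use of raw term counts for the cosine measure, character-level matching with Python's standard `difflib.SequenceMatcher`, and the helper names `tokenize`, `cosine_similarity`, `sequence_similarity`, and `length_deltas`. The two example sentences are invented and are not taken from the study.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the raw term-count vectors of a and b.

    Assumption: plain bag-of-words counts; a TF-IDF or embedding-based
    variant would give different numbers.
    """
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def sequence_similarity(a: str, b: str) -> float:
    """difflib ratio: 2 * (matched characters) / (total characters in both)."""
    return SequenceMatcher(None, a, b).ratio()


def length_deltas(original: str, rewritten: str) -> tuple[int, int]:
    """Return (words saved, characters saved) by the rewritten text."""
    words_saved = len(tokenize(original)) - len(tokenize(rewritten))
    chars_saved = len(original) - len(rewritten)
    return words_saved, chars_saved


if __name__ == "__main__":
    # Hypothetical report sentences, invented for this example.
    original = "There is a small nodule in the right upper lobe measuring 4 mm."
    rewritten = "Right upper lobe: 4 mm nodule."
    print("cosine:", round(cosine_similarity(original, rewritten), 3))
    print("sequence matcher:", round(sequence_similarity(original, rewritten), 3))
    print("words/chars saved:", length_deltas(original, rewritten))
```

The cosine measure ignores word order while the sequence-matcher ratio depends on it, which is one plausible reason a pair of reports can score high on the former and only moderately on the latter.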
Affiliation(s)
- Amir M Hasani
  - Laboratory of Translation Research, National Heart Blood Lung Institute, NIH, Bethesda, MD, USA
- Shiva Singh
  - Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Aryan Zahergivar
  - Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Beth Ryan
  - Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Daniel Nethala
  - Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Neil Mendhiratta
  - Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Mark Ball
  - Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Faraz Farhadi
  - Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Ashkan Malayeri
  - Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
2
Sindhu A, Jadhav U, Ghewade B, Bhanushali J, Yadav P. Revolutionizing Pulmonary Diagnostics: A Narrative Review of Artificial Intelligence Applications in Lung Imaging. Cureus 2024; 16:e57657. [PMID: 38707160 PMCID: PMC11070215 DOI: 10.7759/cureus.57657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 04/04/2024] [Indexed: 05/07/2024]
Abstract
Artificial intelligence (AI) has emerged as a transformative force in healthcare, particularly in pulmonary diagnostics. This comprehensive review explores the impact of AI on revolutionizing lung imaging, focusing on its applications in detecting abnormalities, diagnosing pulmonary conditions, and predicting disease prognosis. We provide an overview of traditional pulmonary diagnostic methods and highlight the importance of accurate and efficient lung imaging for early intervention and improved patient outcomes. Through the lens of AI, we examine machine learning algorithms, deep learning techniques, and natural language processing for analyzing radiology reports. Case studies and examples showcase the successful implementation of AI in pulmonary diagnostics, alongside challenges faced and lessons learned. Finally, we discuss future directions, including integrating AI into clinical workflows, ethical considerations, and the need for further research and collaboration in this rapidly evolving field. This review underscores the transformative potential of AI in enhancing the accuracy, efficiency, and accessibility of pulmonary healthcare.
Affiliation(s)
- Arman Sindhu
  - Respiratory Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Ulhas Jadhav
  - Respiratory Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Babaji Ghewade
  - Respiratory Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Jay Bhanushali
  - Respiratory Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Pallavi Yadav
  - Obstetrics and Gynecology, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
3
Zhang YJ, Yu ZF, Liu JK, Huang TJ. Neural Decoding of Visual Information Across Different Neural Recording Modalities and Approaches. MACHINE INTELLIGENCE RESEARCH 2022. [PMCID: PMC9283560 DOI: 10.1007/s11633-022-1335-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Vision plays a peculiar role in intelligence. Visual information, forming a large part of the sensory information, is fed into the human brain to formulate various types of cognition and behaviours that make humans become intelligent agents. Recent advances have led to the development of brain-inspired algorithms and models for machine vision. One of the key components of these methods is the utilization of the computational principles underlying biological neurons. Additionally, advanced experimental neuroscience techniques have generated different types of neural signals that carry essential visual information. Thus, there is a high demand for mapping out functional models for reading out visual information from neural signals. Here, we briefly review recent progress on this issue with a focus on how machine learning techniques can help in the development of models for contending various types of neural signals, from fine-scale neural spikes and single-cell calcium imaging to coarse-scale electroencephalography (EEG) and functional magnetic resonance imaging recordings of brain signals.
4
Tejani AS, Ng YS, Xi Y, Fielding JR, Browning TG, Rayan JC. Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets. Radiol Artif Intell 2022; 4:e220007. [PMID: 35923377 PMCID: PMC9344209 DOI: 10.1148/ryai.220007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 06/08/2022] [Accepted: 06/14/2022] [Indexed: 06/15/2023]
Abstract
PURPOSE To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset.
MATERIALS AND METHODS The authors retrospectively reviewed 69 095 anonymized adult chest radiograph reports (reports dated April 2020-March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset.
RESULTS The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device trained on 5% of the training set. PubMedBERT showed a higher AUC with a decreasing training set size compared with BERT. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort.
CONCLUSION Pretrained and domain-specific transformer models required small training datasets and short training times to create a highly accurate final model that expedites autonomous annotation of large datasets.
Keywords: Informatics, Named Entity Recognition, Transfer Learning
Supplemental material is available for this article. © RSNA, 2022. See also the commentary by Zech in this issue.
5
Wiggins WF, Tejani AS. On the Opportunities and Risks of Foundation Models for Natural Language Processing in Radiology. Radiol Artif Intell 2022; 4:e220119. [PMID: 35923379 PMCID: PMC9344208 DOI: 10.1148/ryai.220119] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/17/2022] [Revised: 06/23/2022] [Accepted: 06/27/2022] [Indexed: 06/15/2023]
6
Lybarger K, Damani A, Gunn M, Uzuner OZ, Yetisgen M. Extracting Radiological Findings With Normalized Anatomical Information Using a Span-Based BERT Relation Extraction Model. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2022:339-348. [PMID: 35854739 PMCID: PMC9285141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
Medical imaging is critical to the diagnosis and treatment of numerous medical problems, including many forms of cancer. Medical imaging reports distill the findings and observations of radiologists, creating an unstructured textual representation of unstructured medical images. Large-scale use of this text-encoded information requires converting the unstructured text to a structured, semantic representation. We explore the extraction and normalization of anatomical information in radiology reports that is associated with radiological findings. We investigate this extraction and normalization task using a span-based relation extraction model that jointly extracts entities and relations using BERT. This work examines the factors that influence extraction and normalization performance, including the body part/organ system, frequency of occurrence, span length, and span diversity. It discusses approaches for improving performance and creating high-quality semantic representations of radiological phenomena.
7
Saeidi M, Karwowski W, Farahani FV, Fiok K, Taiar R, Hancock PA, Al-Juaid A. Neural Decoding of EEG Signals with Machine Learning: A Systematic Review. Brain Sci 2021; 11:1525. [PMID: 34827524 PMCID: PMC8615531 DOI: 10.3390/brainsci11111525] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 10/01/2021] [Revised: 11/04/2021] [Accepted: 11/11/2021] [Indexed: 11/16/2022]
Abstract
Electroencephalography (EEG) is a non-invasive technique used to record the brain's evoked and induced electrical activity from the scalp. Artificial intelligence, particularly machine learning (ML) and deep learning (DL) algorithms, are increasingly being applied to EEG data for pattern analysis, group membership classification, and brain-computer interface purposes. This study aimed to systematically review recent advances in ML and DL supervised models for decoding and classifying EEG signals. Moreover, this article provides a comprehensive review of the state-of-the-art techniques used for EEG signal preprocessing and feature extraction. To this end, several academic databases were searched to explore relevant studies from the year 2000 to the present. Our results showed that the application of ML and DL in both mental workload and motor imagery tasks has received substantial attention in recent years. A total of 75% of DL studies applied convolutional neural networks with various learning algorithms, and 36% of ML studies achieved competitive accuracy by using a support vector machine algorithm. Wavelet transform was found to be the most common feature extraction method used for all types of tasks. We further examined the specific feature extraction methods and end classifier recommendations discovered in this systematic review.
Affiliation(s)
- Maham Saeidi
  - Computational Neuroergonomics Laboratory, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL 32816, USA
- Waldemar Karwowski
  - Computational Neuroergonomics Laboratory, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL 32816, USA
- Farzad V. Farahani
  - Computational Neuroergonomics Laboratory, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL 32816, USA
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21218, USA
- Krzysztof Fiok
  - Computational Neuroergonomics Laboratory, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL 32816, USA
- Redha Taiar
  - MATIM, Moulin de la Housse, Université de Reims Champagne Ardenne, CEDEX 02, 51687 Reims, France
- P. A. Hancock
  - Department of Psychology, University of Central Florida, Orlando, FL 32816, USA
- Awad Al-Juaid
  - Industrial Engineering Department, Taif University, Taif 26571, Saudi Arabia