Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Mallio CA, Sertorio AC, Bernetti C, Beomonte Zobel B. Large language models for structured reporting in radiology: performance of GPT-4, ChatGPT-3.5, Perplexity and Bing. Radiol Med 2023:10.1007/s11547-023-01651-4. [PMID: 37248403 DOI: 10.1007/s11547-023-01651-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2023] [Accepted: 05/17/2023] [Indexed: 05/31/2023]

Cited by Other Article(s)

Xu X, Yang Y, Tan X, Zhang Z, Wang B, Yang X, Weng C, Yu R, Zhao Q, Quan S. Hepatic encephalopathy post-TIPS: Current status and prospects in predictive assessment. Comput Struct Biotechnol J 2024;24:493-506. [PMID: 39076168 PMCID: PMC11284497 DOI: 10.1016/j.csbj.2024.07.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2024] [Revised: 07/05/2024] [Accepted: 07/05/2024] [Indexed: 07/31/2024]

Sacoransky E, Kwan BYM, Soboleski D. ChatGPT and assistive AI in structured radiology reporting: A systematic review. Curr Probl Diagn Radiol 2024:S0363-0188(24)00113-0. [PMID: 39004580 DOI: 10.1067/j.cpradiol.2024.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2024] [Revised: 06/08/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]

Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024;105:251-265. [PMID: 38679540 DOI: 10.1016/j.diii.2024.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Revised: 03/11/2024] [Accepted: 04/16/2024] [Indexed: 05/01/2024]

Abstract

PURPOSE

The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.

MATERIALS AND METHODS

After a comprehensive review of PubMed, Web of Science, Embase, and Google Scholar databases, a cohort of published studies was identified up to January 1, 2024, utilizing ChatGPT for clinical radiology applications.

RESULTS

Out of 861 studies derived, 44 studies evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated it had a lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported the proportion of ChatGPT's performance. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and in five (5/24; 20.8%) studies, there was a median agreement of 83.6% between ChatGPT outcomes and reference standards [radiologists' decision or guidelines], generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPTv4 outperformed v3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks.
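
The percentages quoted in these results follow directly from the reported counts. As a minimal sketch for rechecking that arithmetic (the labels and the `percent` helper are illustrative names, not terms used by the review's authors):

```python
# (numerator, denominator) pairs copied from the RESULTS text above.
# The dictionary keys are illustrative labels chosen for this sketch.
reported_counts = {
    "high performance": (37, 44),
    "lower performance": (7, 44),
    "lower: diagnosis and decision support": (6, 44),
    "lower: patient communication": (1, 44),
    "reported a performance proportion": (24, 44),
    "reported median accuracy": (19, 24),
    "reported median agreement": (5, 24),
    "v4 outperformed v3.5": (10, 11),
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


percentages = {
    label: percent(num, den) for label, (num, den) in reported_counts.items()
}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

Keeping the raw counts next to the derived percentages makes it easy to spot a transcription error in either.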

CONCLUSION

Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.

Levin C, Kagan T, Rosen S, Saban M. An evaluation of the capabilities of language models and nurses in providing neonatal clinical decision support. Int J Nurs Stud 2024;155:104771. [PMID: 38688103 DOI: 10.1016/j.ijnurstu.2024.104771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 03/26/2024] [Accepted: 04/03/2024] [Indexed: 05/02/2024]

Bhayana R, Nanda B, Dehkharghanian T, Deng Y, Bhambra N, Elias G, Datta D, Kambadakone A, Shwaartz CG, Moulton CA, Henault D, Gallinger S, Krishna S. Large Language Models for Automated Synoptic Reports and Resectability Categorization in Pancreatic Cancer. Radiology 2024;311:e233117. [PMID: 38888478 DOI: 10.1148/radiol.233117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/20/2024]

Abstract

BACKGROUND

Structured radiology reports for pancreatic ductal adenocarcinoma (PDAC) improve surgical decision-making over free-text reports, but radiologist adoption is variable. Resectability criteria are applied inconsistently.

PURPOSE

To evaluate the performance of large language models (LLMs) in automatically creating PDAC synoptic reports from original reports and to explore performance in categorizing tumor resectability.

MATERIALS AND METHODS

In this institutional review board-approved retrospective study, 180 consecutive PDAC staging CT reports on patients referred to the authors' European Society for Medical Oncology-designated cancer center from January to December 2018 were included. Reports were reviewed by two radiologists to establish the reference standard for 14 key findings and National Comprehensive Cancer Network (NCCN) resectability category. GPT-3.5 and GPT-4 (accessed September 18-29, 2023) were prompted to create synoptic reports from original reports with the same 14 features, and their performance was evaluated (recall, precision, F1 score). To categorize resectability, three prompting strategies (default knowledge, in-context knowledge, chain-of-thought) were used for both LLMs. Hepatopancreaticobiliary surgeons reviewed original and artificial intelligence (AI)-generated reports to determine resectability, with accuracy and review time compared. The McNemar test, t test, Wilcoxon signed-rank test, and mixed effects logistic regression models were used where appropriate.

RESULTS

GPT-4 outperformed GPT-3.5 in the creation of synoptic reports (F1 score: 0.997 vs 0.967, respectively). Compared with GPT-3.5, GPT-4 achieved equal or higher F1 scores for all 14 extracted features. GPT-4 had higher precision than GPT-3.5 for extracting superior mesenteric artery involvement (100% vs 88.8%, respectively). For categorizing resectability, GPT-4 outperformed GPT-3.5 for each prompting strategy. For GPT-4, chain-of-thought prompting was most accurate, outperforming in-context knowledge prompting (92% vs 83%, respectively; P = .002), which outperformed the default knowledge strategy (83% vs 67%, P < .001). Surgeons were more accurate in categorizing resectability using AI-generated reports than original reports (83% vs 76%, respectively; P = .03), while spending less time on each report (58%; 95% CI: 0.53, 0.62).

CONCLUSION

GPT-4 created near-perfect PDAC synoptic reports from original reports. GPT-4 with chain-of-thought achieved high accuracy in categorizing resectability. Surgeons were more accurate and efficient using AI-generated reports.

© RSNA, 2024 Supplemental material is available for this article. See also the editorial by Chang in this issue.
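
The abstract reports recall, precision, and F1 score without defining them. As a minimal sketch of the standard definitions, assuming each extracted finding is scored as a true positive, false positive, or false negative against the reference standard (the function name and the counts below are hypothetical and are not data from the study):

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Standard precision, recall, and F1 from confusion counts.

    Assumes at least one true positive, so no denominator is zero.
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one extracted feature, for illustration only.
precision, recall, f1 = precision_recall_f1(true_pos=9, false_pos=1, false_neg=2)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values, so a model cannot reach a high F1 through high precision or high recall alone.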

Affiliation(s)

Rajesh Bhayana
Bipin Nanda
Taher Dehkharghanian
Yangqing Deng
Nishaant Bhambra
Gavin Elias
Daksh Datta
Avinash Kambadakone
Chaya G Shwaartz
Carol-Anne Moulton
David Henault
Steven Gallinger
Satheesh Krishna
From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Princess Margaret Cancer Centre, Department of Medical Imaging, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 24C (R.B., B.N., T.D., S.K.); Department of Biostatistics (Y.D.) and HPB Surgical Oncology (C.G.S., C.A.M., D.H., S.G.), University Health Network, Toronto, Ontario, Canada; Departments of Medicine (N.B., G.E., D.D.) and Surgery (C.G.S., C.A.M., D.H., S.G.), University of Toronto, Toronto, Ontario, Canada; and Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (A.K.)

Horiuchi D, Tatekawa H, Oura T, Oue S, Walston SL, Takita H, Matsushita S, Mitsuyama Y, Shimono T, Miki Y, Ueda D. Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases. Clin Neuroradiol 2024:10.1007/s00062-024-01426-y. [PMID: 38806794 DOI: 10.1007/s00062-024-01426-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2024] [Accepted: 05/06/2024] [Indexed: 05/30/2024]

Abstract

PURPOSE

To compare the diagnostic performance among Generative Pre-trained Transformer (GPT)-4-based ChatGPT, GPT‑4 with vision (GPT-4V) based ChatGPT, and radiologists in challenging neuroradiology cases.

METHODS

We collected 32 consecutive "Freiburg Neuropathology Case Conference" cases from the journal Clinical Neuroradiology between March 2016 and December 2023. We input the medical history and imaging findings into GPT-4-based ChatGPT and the medical history and images into GPT-4V-based ChatGPT, then both generated a diagnosis for each case. Six radiologists (three radiology residents and three board-certified radiologists) independently reviewed all cases and provided diagnoses. ChatGPT and radiologists' diagnostic accuracy rates were evaluated based on the published ground truth. Chi-square tests were performed to compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists.

RESULTS

GPT‑4 and GPT-4V-based ChatGPTs achieved accuracy rates of 22% (7/32) and 16% (5/32), respectively. Radiologists achieved the following accuracy rates: three radiology residents 28% (9/32), 31% (10/32), and 28% (9/32); and three board-certified radiologists 38% (12/32), 47% (15/32), and 44% (14/32). GPT-4-based ChatGPT's diagnostic accuracy was lower than each radiologist, although not significantly (all p > 0.07). GPT-4V-based ChatGPT's diagnostic accuracy was also lower than each radiologist and significantly lower than two board-certified radiologists (p = 0.02 and 0.03) (not significant for radiology residents and one board-certified radiologist [all p > 0.09]).
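
The methods name chi-square tests for comparing accuracy rates, but the abstract does not say how they were configured. As a rough sketch only, a Pearson chi-square test on a 2x2 table of correct versus incorrect diagnoses can be computed with the standard library. The function name, the independent-samples treatment, and the optional Yates continuity correction are assumptions of this sketch, not details confirmed by the abstract:

```python
import math


def chi_square_2x2(correct_a, n_a, correct_b, n_b, yates=True):
    """Pearson chi-square test (1 degree of freedom) for two accuracy rates.

    Treats the two readers as independent samples; the Yates continuity
    correction is optional. Both choices are assumptions of this sketch.
    Requires every row and column total to be non-zero.
    """
    a, b = correct_a, n_a - correct_a  # reader A: correct, incorrect
    c, d = correct_b, n_b - correct_b  # reader B: correct, incorrect
    n = n_a + n_b
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from the results above: GPT-4V-based ChatGPT (5/32) and the
# board-certified radiologist with the highest accuracy (15/32).
stat, p_value = chi_square_2x2(5, 32, 15, 32)
print(f"chi-square={stat:.3f}, p={p_value:.4f}")
```

Since every reader saw the same 32 cases, a paired test could also be argued for; this sketch follows the test named in the abstract rather than making that choice.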

CONCLUSION

While GPT-4-based ChatGPT demonstrated relatively higher diagnostic performance than GPT-4V-based ChatGPT, the diagnostic performance of GPT‑4 and GPT-4V-based ChatGPTs did not reach the performance level of either radiology residents or board-certified radiologists in challenging neuroradiology cases.

Cozzi A, Pinker K, Hidber A, Zhang T, Bonomo L, Lo Gullo R, Christianson B, Curti M, Rizzo S, Del Grande F, Mann RM, Schiaffino S, Panzer A. BI-RADS Category Assignments by GPT-3.5, GPT-4, and Google Bard: A Multilanguage Study. Radiology 2024;311:e232133. [PMID: 38687216 PMCID: PMC11070611 DOI: 10.1148/radiol.232133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 03/08/2024] [Accepted: 03/12/2024] [Indexed: 05/02/2024]

Abstract

BACKGROUND

The performance of publicly available large language models (LLMs) remains unclear for complex clinical tasks.

PURPOSE

To evaluate the agreement between human readers and LLMs for Breast Imaging Reporting and Data System (BI-RADS) categories assigned based on breast imaging reports written in three languages and to assess the impact of discordant category assignments on clinical management.

MATERIALS AND METHODS

This retrospective study included reports for women who underwent MRI, mammography, and/or US for breast cancer screening or diagnostic purposes at three referral centers. Reports with findings categorized as BI-RADS 1-5 and written in Italian, English, or Dutch were collected between January 2000 and October 2023. Board-certified breast radiologists and the LLMs GPT-3.5 and GPT-4 (OpenAI) and Bard, now called Gemini (Google), assigned BI-RADS categories using only the findings described by the original radiologists. Agreement between human readers and LLMs for BI-RADS categories was assessed using the Gwet agreement coefficient (AC1 value). Frequencies were calculated for changes in BI-RADS category assignments that would affect clinical management (ie, BI-RADS 0 vs BI-RADS 1 or 2 vs BI-RADS 3 vs BI-RADS 4 or 5) and compared using the McNemar test.

RESULTS

Across 2400 reports, agreement between the original and reviewing radiologists was almost perfect (AC1 = 0.91), while agreement between the original radiologists and GPT-4, GPT-3.5, and Bard was moderate (AC1 = 0.52, 0.48, and 0.42, respectively). Across human readers and LLMs, differences were observed in the frequency of BI-RADS category upgrades or downgrades that would result in changed clinical management (118 of 2400 [4.9%] for human readers, 611 of 2400 [25.5%] for Bard, 573 of 2400 [23.9%] for GPT-3.5, and 435 of 2400 [18.1%] for GPT-4; P < .001) and that would negatively impact clinical management (37 of 2400 [1.5%] for human readers, 435 of 2400 [18.1%] for Bard, 344 of 2400 [14.3%] for GPT-3.5, and 255 of 2400 [10.6%] for GPT-4; P < .001).

CONCLUSION

LLMs achieved moderate agreement with human reader-assigned BI-RADS categories across reports written in three languages but also yielded a high percentage of discordant BI-RADS categories that would negatively impact clinical management.

© RSNA, 2024 Supplemental material is available for this article.

Affiliation(s)

Andrea Cozzi
Katja Pinker
Andri Hidber
Tianyu Zhang
Luca Bonomo
Roberto Lo Gullo
Blake Christianson
Marco Curti
Stefania Rizzo
Filippo Del Grande
Ritse M. Mann
Simone Schiaffino
Ariane Panzer
From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)

Bernetti C, Sertorio AC, Zobel BB, Mallio CA. ChatGPT generated diagnoses in neuroradiology: Quo Vadis? Neuroradiology 2024;66:303-304. [PMID: 38194083 DOI: 10.1007/s00234-024-03285-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2023] [Accepted: 01/03/2024] [Indexed: 01/10/2024]

Scheschenja M, Viniol S, Bastian MB, Wessendorf J, König AM, Mahnken AH. Feasibility of GPT-3 and GPT-4 for in-Depth Patient Education Prior to Interventional Radiological Procedures: A Comparative Analysis. Cardiovasc Intervent Radiol 2024;47:245-250. [PMID: 37872295 PMCID: PMC10844465 DOI: 10.1007/s00270-023-03563-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Accepted: 09/09/2023] [Indexed: 10/25/2023]

Kwee TC, Roest C, Yakar D. Is radiology's future without medical images? Eur J Radiol 2024;171:111296. [PMID: 38224634 DOI: 10.1016/j.ejrad.2024.111296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Accepted: 01/07/2024] [Indexed: 01/17/2024]

Infante A, Gaudino S, Orsini F, Del Ciello A, Gullì C, Merlino B, Natale L, Iezzi R, Sala E. Large language models (LLMs) in the evaluation of emergency radiology reports: performance of ChatGPT-4, Perplexity, and Bard. Clin Radiol 2024;79:102-106. [PMID: 38087683 DOI: 10.1016/j.crad.2023.11.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2023] [Revised: 11/10/2023] [Accepted: 11/15/2023] [Indexed: 01/02/2024]

Affiliation(s)

A Infante ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
S Gaudino ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy; Università Cattolica del Sacro Cuore, Facoltà di Medicina e Chirurgia, Rome, Italy
F Orsini Università Cattolica del Sacro Cuore, Facoltà di Medicina e Chirurgia, Rome, Italy
A Del Ciello ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
C Gullì ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
B Merlino ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy; Università Cattolica del Sacro Cuore, Facoltà di Medicina e Chirurgia, Rome, Italy
L Natale ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy; Università Cattolica del Sacro Cuore, Facoltà di Medicina e Chirurgia, Rome, Italy
R Iezzi ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy; Università Cattolica del Sacro Cuore, Facoltà di Medicina e Chirurgia, Rome, Italy
E Sala ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy; Università Cattolica del Sacro Cuore, Facoltà di Medicina e Chirurgia, Rome, Italy


Ray PP. Letter to the Editor: A critical evaluation on the use of large language model for radiology research. Eur Radiol 2023;33:9462-9463. [PMID: 37848769 DOI: 10.1007/s00330-023-10332-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2023] [Revised: 08/02/2023] [Accepted: 09/14/2023] [Indexed: 10/19/2023]

Mallio CA, Sertorio AC, Bernetti C, Beomonte Zobel B. Radiology, structured reporting and large language models: who is running faster? Radiol Med 2023;128:1443-1444. [PMID: 37501049 DOI: 10.1007/s11547-023-01689-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/08/2023] [Accepted: 07/17/2023] [Indexed: 07/29/2023]

Kleebayoon A, Wiwanitkit V. Large language models for structured reporting in radiology: comment. Radiol Med 2023;128:1440. [PMID: 37568071 DOI: 10.1007/s11547-023-01687-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Accepted: 07/17/2023] [Indexed: 08/13/2023]

Mallio CA, Bernetti C, Sertorio AC, Beomonte Zobel B. Large language models and structured reporting: never stop chasing critical thinking. Radiol Med 2023;128:1445-1446. [PMID: 37660320 DOI: 10.1007/s11547-023-01711-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/05/2023] [Accepted: 08/22/2023] [Indexed: 09/05/2023]