Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ariyaratne S, Iyengar KP, Nischal N, Chitti Babu N, Botchu R. A comparison of ChatGPT-generated articles with human-written articles. Skeletal Radiol 2023;52:1755-1758. [PMID: 37059827 DOI: 10.1007/s00256-023-04340-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 04/06/2023] [Accepted: 04/09/2023] [Indexed: 04/16/2023]

For:	Ariyaratne S, Iyengar KP, Nischal N, Chitti Babu N, Botchu R. A comparison of ChatGPT-generated articles with human-written articles. Skeletal Radiol 2023;52:1755-1758. [PMID: 37059827 DOI: 10.1007/s00256-023-04340-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 04/06/2023] [Accepted: 04/09/2023] [Indexed: 04/16/2023]

Number

Cited by Other Article(s)

Vaishya R, Iyengar KP, Patralekh MK, Botchu R, Shirodkar K, Jain VK, Vaish A, Scarlat MM. Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions-an observational study. INTERNATIONAL ORTHOPAEDICS 2024;48:1963-1969. [PMID: 38619565 DOI: 10.1007/s00264-024-06182-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/25/2024] [Accepted: 04/03/2024] [Indexed: 04/16/2024]

Horiuchi D, Tatekawa H, Oura T, Shimono T, Walston SL, Takita H, Matsushita S, Mitsuyama Y, Miki Y, Ueda D. ChatGPT's diagnostic performance based on textual vs. visual information compared to radiologists' diagnostic performance in musculoskeletal radiology. Eur Radiol 2024:10.1007/s00330-024-10902-5. [PMID: 38995378 DOI: 10.1007/s00330-024-10902-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Revised: 05/02/2024] [Accepted: 06/24/2024] [Indexed: 07/13/2024]

Abstract

OBJECTIVES

To compare the diagnostic accuracy of Generative Pre-trained Transformer (GPT)-4-based ChatGPT, GPT-4 with vision (GPT-4V) based ChatGPT, and radiologists in musculoskeletal radiology.

MATERIALS AND METHODS

We included 106 "Test Yourself" cases from Skeletal Radiology between January 2014 and September 2023. We input the medical history and imaging findings into GPT-4-based ChatGPT and the medical history and images into GPT-4V-based ChatGPT, then both generated a diagnosis for each case. Two radiologists (a radiology resident and a board-certified radiologist) independently provided diagnoses for all cases. The diagnostic accuracy rates were determined based on the published ground truth. Chi-square tests were performed to compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists.

RESULTS

GPT-4-based ChatGPT significantly outperformed GPT-4V-based ChatGPT (p < 0.001) with accuracy rates of 43% (46/106) and 8% (9/106), respectively. The radiology resident and the board-certified radiologist achieved accuracy rates of 41% (43/106) and 53% (56/106). The diagnostic accuracy of GPT-4-based ChatGPT was comparable to that of the radiology resident, but was lower than that of the board-certified radiologist although the differences were not significant (p = 0.78 and 0.22, respectively). The diagnostic accuracy of GPT-4V-based ChatGPT was significantly lower than those of both radiologists (p < 0.001 and < 0.001, respectively).

CONCLUSION

GPT-4-based ChatGPT demonstrated significantly higher diagnostic accuracy than GPT-4V-based ChatGPT. While GPT-4-based ChatGPT's diagnostic performance was comparable to radiology residents, it did not reach the performance level of board-certified radiologists in musculoskeletal radiology.

CLINICAL RELEVANCE STATEMENT

GPT-4-based ChatGPT outperformed GPT-4V-based ChatGPT and was comparable to radiology residents, but it did not reach the level of board-certified radiologists in musculoskeletal radiology. Radiologists should comprehend ChatGPT's current performance as a diagnostic tool for optimal utilization.

KEY POINTS

This study compared the diagnostic performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists in musculoskeletal radiology. GPT-4-based ChatGPT was comparable to radiology residents, but did not reach the level of board-certified radiologists. When utilizing ChatGPT, it is crucial to input appropriate descriptions of imaging findings rather than the images.

Collapse

Passby L, Madhwapathi V, Tso S, Wernham A. Appraisal of AI-generated dermatology literature reviews. J Eur Acad Dermatol Venereol 2024. [PMID: 38994876 DOI: 10.1111/jdv.20237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Accepted: 06/28/2024] [Indexed: 07/13/2024]

Chen Z, Chen C, Yang G, He X, Chi X, Zeng Z, Chen X. Research integrity in the era of artificial intelligence: Challenges and responses. Medicine (Baltimore) 2024;103:e38811. [PMID: 38968491 PMCID: PMC11224801 DOI: 10.1097/md.0000000000038811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/28/2024] [Accepted: 06/13/2024] [Indexed: 07/07/2024] Open

Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024;105:251-265. [PMID: 38679540 DOI: 10.1016/j.diii.2024.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Revised: 03/11/2024] [Accepted: 04/16/2024] [Indexed: 05/01/2024]

Abstract

PURPOSE

The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.

MATERIALS AND METHODS

After a comprehensive review of PubMed, Web of Science, Embase, and Google Scholar databases, a cohort of published studies was identified up to January 1, 2024, utilizing ChatGPT for clinical radiology applications.

RESULTS

Out of 861 studies derived, 44 studies evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated it had a lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported the proportion of ChatGPT's performance. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and in five (5/24; 20.8%) studies, there was a median agreement of 83.6% between ChatGPT outcomes and reference standards [radiologists' decision or guidelines], generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPTv4 outperformed v3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks.

CONCLUSION

Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.

Collapse

Botchu B, Iyengar KP, Botchu R. Can ChatGPT empower people with dyslexia? Disabil Rehabil Assist Technol 2024;19:2131-2132. [PMID: 37697967 DOI: 10.1080/17483107.2023.2256805] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Accepted: 09/05/2023] [Indexed: 09/13/2023]

Kotzur T, Singh A, Parker J, Peterson B, Sager B, Rose R, Corley F, Brady C. Evaluation of a Large Language Model's Ability to Assist in an Orthopedic Hand Clinic. Hand (N Y) 2024:15589447241257643. [PMID: 38907651 DOI: 10.1177/15589447241257643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 06/24/2024]

Wang ZP, Bhandary P, Wang Y, Moore JH. Using GPT-4 to write a scientific review article: a pilot evaluation study. BioData Min 2024;17:16. [PMID: 38890715 PMCID: PMC11184879 DOI: 10.1186/s13040-024-00371-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2024] [Accepted: 06/11/2024] [Indexed: 06/20/2024] Open

DeCook R, Muffly BT, Mahmood S, Holland CT, Ayeni AM, Ast MP, Bolognese MP, Guild GN, Sheth NP, Pean CA, Premkumar A. AI-Generated Graduate Medical Education Content for Total Joint Arthroplasty: Comparing ChatGPT Against Orthopaedic Fellows. Arthroplast Today 2024;27:101412. [PMID: 38912098 PMCID: PMC11190484 DOI: 10.1016/j.artd.2024.101412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/15/2024] [Revised: 03/20/2024] [Accepted: 04/28/2024] [Indexed: 06/25/2024] Open

Jenko N, Ariyaratne S, Jeys L, Evans S, Iyengar KP, Botchu R. An evaluation of AI generated literature reviews in musculoskeletal radiology. Surgeon 2024;22:194-197. [PMID: 38218659 DOI: 10.1016/j.surge.2023.12.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Revised: 12/20/2023] [Accepted: 12/27/2023] [Indexed: 01/15/2024]

Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Bolzoni A, Committeri U, Crimi S, Gabriele G, Lonardi F, Maglitto F, Petrocelli M, Pucci R, Saponaro G, Tel A, Vellone V, Chiesa-Estomba CM, Boscolo-Rizzo P, Salzano G, De Riu G. Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis. Otolaryngol Head Neck Surg 2024;170:1492-1503. [PMID: 37595113 DOI: 10.1002/ohn.489] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 06/16/2023] [Accepted: 07/14/2023] [Indexed: 08/20/2023]

Affiliation(s)

Luigi Angelo Vaira Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy Biomedical Sciences Department, PhD School of Biomedical Science, University of Sassari, Sassari, Italy
Jerome R Lechien Department of Anatomy and Experimental Oncology, Mons School of Medicine, UMONS, Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium Department of Otolaryngology-Head Neck Surgery, Elsan Polyclinic of Poitiers, Poitiers, France
Vincenzo Abbate Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
Fabiana Allevi Maxillofacial Surgery Department, ASSt Santi Paolo e Carlo, University of Milan, Milan, Italy
Giovanni Audino Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
Giada Anna Beltramini Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy Maxillofacial and Dental Unit, Fondazione IRCCS Cà Granda Ospedale Maggiore Policlinico, Milan, Italy
Michela Bergonzani Maxillo-Facial Surgery Division, Head and Neck Department, University Hospital of Parma, Parma, Italy
Alessandro Bolzoni Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
Umberto Committeri Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
Salvatore Crimi Operative Unit of Maxillofacial Surgery, Policlinico San Marco, University of Catania, Catania, Italy
Guido Gabriele Department of Maxillofacial Surgery, University of Siena, Siena, Italy
Fabio Lonardi Department of Maxillofacial Surgery, University of Verona, Verona, Italy
Fabio Maglitto Maxillo-Facial Surgery Unit, University of Bari "Aldo Moro", Bari, Italy
Marzia Petrocelli Maxillofacial Surgery Operative Unit, Bellaria and Maggiore Hospital, Bologna, Italy
Resi Pucci Maxillofacial Surgery Unit, San Camillo-Forlanini Hospital, Rome, Italy
Gianmarco Saponaro Maxillo-Facial Surgery Unit, IRCSS "A. Gemelli" Foundation-Catholic, University of the Sacred Heart, Rome, Italy
Alessandro Tel Department of Head and Neck Surgery and Neuroscience, Clinic of Maxillofacial Surgery, University Hospital of Udine, Udine, Italy
Valentino Vellone Maxillofacial Surgery Unit, "S. Maria" Hospital, Terni, Italy
Carlos Miguel Chiesa-Estomba Department of Otorhinolaryngology-Head and Neck Surgery, Hospital Universitario Donostia, San Sebastian, Spain
Paolo Boscolo-Rizzo Department of Medical, Surgical and Health Sciences, Section of Otolaryngology, University of Trieste, Trieste, Italy
Giovanni Salzano Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
Giacomo De Riu Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy

Collapse

Şan H, Bayrakcı Ö, Çağdaş B, Serdengeçti M, Alagöz E. Reliability and readability analysis of ChatGPT-4 and Google Bard as a patient information source for the most commonly applied radionuclide treatments in cancer patients. Rev Esp Med Nucl Imagen Mol 2024:500021. [PMID: 38821410 DOI: 10.1016/j.remnie.2024.500021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2024] [Accepted: 05/21/2024] [Indexed: 06/02/2024]

Abstract

PURPOSE

Searching for online health information is a popular approach employed by patients to enhance their knowledge for their diseases. Recently developed AI chatbots are probably the easiest way in this regard. The purpose of the study is to analyze the reliability and readability of AI chatbot responses in terms of the most commonly applied radionuclide treatments in cancer patients.

METHODS

Basic patient questions, thirty about RAI, PRRT and TARE treatments and twenty-nine about PSMA-TRT, were asked one by one to GPT-4 and Bard on January 2024. The reliability and readability of the responses were assessed by using DISCERN scale, Flesch Reading Ease(FRE) and Flesch-Kincaid Reading Grade Level(FKRGL).

RESULTS

The mean (SD) FKRGL scores for the responses of GPT-4 and Google Bard about RAI, PSMA-TRT, PRRT and TARE treatmens were 14.57 (1.19), 14.65 (1.38), 14.25 (1.10), 14.38 (1.2) and 11.49 (1.59), 12.42 (1.71), 11.35 (1.80), 13.01 (1.97), respectively. In terms of readability the FRKGL scores of the responses of GPT-4 and Google Bard about RAI, PSMA-TRT, PRRT and TARE treatments were above the general public reading grade level. The mean (SD) DISCERN scores assesses by nuclear medicine phsician for the responses of GPT-4 and Bard about RAI, PSMA-TRT, PRRT and TARE treatments were 47.86 (5.09), 48.48 (4.22), 46.76 (4.09), 48.33 (5.15) and 51.50 (5.64), 53.44 (5.42), 53 (6.36), 49.43 (5.32), respectively. Based on mean DISCERN scores, the reliability of the responses of GPT-4 and Google Bard about RAI, PSMA-TRT, PRRT, and TARE treatments ranged from fair to good. The inter-rater reliability correlation coefficient of DISCERN scores assessed by GPT-4, Bard and nuclear medicine physician for the responses of GPT-4 about RAI, PSMA-TRT, PRRT and TARE treatments were 0.512(95% CI 0.296: 0.704), 0.695(95% CI 0.518: 0.829), 0.687(95% CI 0.511: 0.823) and 0.649 (95% CI 0.462: 0.798), respectively (p < 0.01). The inter-rater reliability correlation coefficient of DISCERN scores assessed by GPT-4, Bard and nuclear medicine physician for the responses of Bard about RAI, PSMA-TRT, PRRT and TARE treatments were 0.753(95% CI 0.602: 0.863), 0.812(95% CI 0.686: 0.899), 0.804(95% CI 0.677: 0.894) and 0.671 (95% CI 0.489: 0.812), respectively (p < 0.01). The inter-rater reliability for the responses of Bard and GPT-4 about RAİ, PSMA-TRT, PRRT and TARE treatments were moderate to good. Further, consulting to the nuclear medicine physician was rarely emphasized both in GPT-4 and Google Bard and references were included in some responses of Google Bard, but there were no references in GPT-4.

CONCLUSION

Although the information provided by AI chatbots may be acceptable in medical terms, it can not be easy to read for the general public, which may prevent it from being understandable. Effective prompts using 'prompt engineering' may refine the responses in a more comprehensible manner. Since radionuclide treatments are specific to nuclear medicine expertise, nuclear medicine physician need to be stated as a consultant in responses in order to guide patients and caregivers to obtain accurate medical advice. Referencing is significant in terms of confidence and satisfaction of patients and caregivers seeking information.

Collapse

Horiuchi D, Tatekawa H, Oura T, Oue S, Walston SL, Takita H, Matsushita S, Mitsuyama Y, Shimono T, Miki Y, Ueda D. Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases. Clin Neuroradiol 2024:10.1007/s00062-024-01426-y. [PMID: 38806794 DOI: 10.1007/s00062-024-01426-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2024] [Accepted: 05/06/2024] [Indexed: 05/30/2024]

Abstract

PURPOSE

To compare the diagnostic performance among Generative Pre-trained Transformer (GPT)-4-based ChatGPT, GPT‑4 with vision (GPT-4V) based ChatGPT, and radiologists in challenging neuroradiology cases.

METHODS

We collected 32 consecutive "Freiburg Neuropathology Case Conference" cases from the journal Clinical Neuroradiology between March 2016 and December 2023. We input the medical history and imaging findings into GPT-4-based ChatGPT and the medical history and images into GPT-4V-based ChatGPT, then both generated a diagnosis for each case. Six radiologists (three radiology residents and three board-certified radiologists) independently reviewed all cases and provided diagnoses. ChatGPT and radiologists' diagnostic accuracy rates were evaluated based on the published ground truth. Chi-square tests were performed to compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists.

RESULTS

GPT‑4 and GPT-4V-based ChatGPTs achieved accuracy rates of 22% (7/32) and 16% (5/32), respectively. Radiologists achieved the following accuracy rates: three radiology residents 28% (9/32), 31% (10/32), and 28% (9/32); and three board-certified radiologists 38% (12/32), 47% (15/32), and 44% (14/32). GPT-4-based ChatGPT's diagnostic accuracy was lower than each radiologist, although not significantly (all p > 0.07). GPT-4V-based ChatGPT's diagnostic accuracy was also lower than each radiologist and significantly lower than two board-certified radiologists (p = 0.02 and 0.03) (not significant for radiology residents and one board-certified radiologist [all p > 0.09]).

CONCLUSION

While GPT-4-based ChatGPT demonstrated relatively higher diagnostic performance than GPT-4V-based ChatGPT, the diagnostic performance of GPT‑4 and GPT-4V-based ChatGPTs did not reach the performance level of either radiology residents or board-certified radiologists in challenging neuroradiology cases.

Collapse

Ariyaratne S, Jenko N, Mark Davies A, Iyengar KP, Botchu R. Could ChatGPT Pass the UK Radiology Fellowship Examinations? Acad Radiol 2024;31:2178-2182. [PMID: 38160089 DOI: 10.1016/j.acra.2023.11.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 11/12/2023] [Accepted: 11/18/2023] [Indexed: 01/03/2024]

Dağci M, Çam F, Dost A. Reliability and Quality of the Nursing Care Planning Texts Generated by ChatGPT. Nurse Educ 2024;49:E109-E114. [PMID: 37994523 DOI: 10.1097/nne.0000000000001566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2023]

Carobene A, Padoan A, Cabitza F, Banfi G, Plebani M. Rising adoption of artificial intelligence in scientific publishing: evaluating the role, risks, and ethical implications in paper drafting and review process. Clin Chem Lab Med 2024;62:835-843. [PMID: 38019961 DOI: 10.1515/cclm-2023-1136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023]

Abstract

BACKGROUND

In the rapid evolving landscape of artificial intelligence (AI), scientific publishing is experiencing significant transformations. AI tools, while offering unparalleled efficiencies in paper drafting and peer review, also introduce notable ethical concerns.

CONTENT

This study delineates AI's dual role in scientific publishing: as a co-creator in the writing and review of scientific papers and as an ethical challenge. We first explore the potential of AI as an enhancer of efficiency, efficacy, and quality in creating scientific papers. A critical assessment follows, evaluating the risks vs. rewards for researchers, especially those early in their careers, emphasizing the need to maintain a balance between AI's capabilities and fostering independent reasoning and creativity. Subsequently, we delve into the ethical dilemmas of AI's involvement, particularly concerning originality, plagiarism, and preserving the genuine essence of scientific discourse. The evolving dynamics further highlight an overlooked aspect: the inadequate recognition of human reviewers in the academic community. With the increasing volume of scientific literature, tangible metrics and incentives for reviewers are proposed as essential to ensure a balanced academic environment.

SUMMARY

AI's incorporation in scientific publishing is promising yet comes with significant ethical and operational challenges. The role of human reviewers is accentuated, ensuring authenticity in an AI-influenced environment.

OUTLOOK

As the scientific community treads the path of AI integration, a balanced symbiosis between AI's efficiency and human discernment is pivotal. Emphasizing human expertise, while exploit artificial intelligence responsibly, will determine the trajectory of an ethically sound and efficient AI-augmented future in scientific publishing.

Collapse

Graf EM, McKinney JA, Dye AB, Lin L, Sanchez-Ramos L. Exploring the Limits of Artificial Intelligence for Referencing Scientific Articles. Am J Perinatol 2024. [PMID: 38653452 DOI: 10.1055/s-0044-1786033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/25/2024]

Abstract

OBJECTIVE

To evaluate the reliability of three artificial intelligence (AI) chatbots (ChatGPT, Google Bard, and Chatsonic) in generating accurate references from existing obstetric literature.

STUDY DESIGN

Between mid-March and late April 2023, ChatGPT, Google Bard, and Chatsonic were prompted to provide references for specific obstetrical randomized controlled trials (RCTs) published in 2020. RCTs were considered for inclusion if they were mentioned in a previous article that primarily evaluated RCTs published by the top medical and obstetrics and gynecology journals with the highest impact factors in 2020 as well as RCTs published in a new journal focused on publishing obstetric RCTs. The selection of the three AI models was based on their popularity, performance in natural language processing, and public availability. Data collection involved prompting the AI chatbots to provide references according to a standardized protocol. The primary evaluation metric was the accuracy of each AI model in correctly citing references, including authors, publication title, journal name, and digital object identifier (DOI). Statistical analysis was performed using a permutation test to compare the performance of the AI models.

RESULTS

Among the 44 RCTs analyzed, Google Bard demonstrated the highest accuracy, correctly citing 13.6% of the requested RCTs, whereas ChatGPT and Chatsonic exhibited lower accuracy rates of 2.4 and 0%, respectively. Google Bard often substantially outperformed Chatsonic and ChatGPT in correctly citing the studied reference components. The majority of references from all AI models studied were noted to provide DOIs for unrelated studies or DOIs that do not exist.

CONCLUSION

To ensure the reliability of scientific information being disseminated, authors must exercise caution when utilizing AI for scientific writing and literature search. However, despite their limitations, collaborative partnerships between AI systems and researchers have the potential to drive synergistic advancements, leading to improved patient care and outcomes.

KEY POINTS

· AI chatbots often cite scientific articles incorrectly.. · AI chatbots can create false references.. · Responsible AI use in research is vital..

Collapse

Ni Z, Peng R, Zheng X, Xie P. Embracing the future: Integrating ChatGPT into China's nursing education system. Int J Nurs Sci 2024;11:295-299. [PMID: 38707690 PMCID: PMC11064564 DOI: 10.1016/j.ijnss.2024.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Revised: 02/13/2024] [Accepted: 03/06/2024] [Indexed: 05/07/2024] Open

Shen OY, Pratap JS, Li X, Chen NC, Bhashyam AR. How Does ChatGPT Use Source Information Compared With Google? A Text Network Analysis of Online Health Information. Clin Orthop Relat Res 2024;482:578-588. [PMID: 38517757 PMCID: PMC10936961 DOI: 10.1097/corr.0000000000002995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/03/2023] [Accepted: 01/08/2024] [Indexed: 03/24/2024]

Abstract

BACKGROUND

The lay public is increasingly using ChatGPT (a large language model) as a source of medical information. Traditional search engines such as Google provide several distinct responses to each search query and indicate the source for each response, but ChatGPT provides responses in paragraph form in prose without providing the sources used, which makes it difficult or impossible to ascertain whether those sources are reliable. One practical method to infer the sources used by ChatGPT is text network analysis. By understanding how ChatGPT uses source information in relation to traditional search engines, physicians and physician organizations can better counsel patients on the use of this new tool.

QUESTIONS/PURPOSES

(1) In terms of key content words, how similar are ChatGPT and Google Search responses for queries related to topics in orthopaedic surgery? (2) Does the source distribution (academic, governmental, commercial, or form of a scientific manuscript) differ for Google Search responses based on the topic's level of medical consensus, and how is this reflected in the text similarity between ChatGPT and Google Search responses? (3) Do these results vary between different versions of ChatGPT?

METHODS

We evaluated three search queries relating to orthopaedic conditions: "What is the cause of carpal tunnel syndrome?," "What is the cause of tennis elbow?," and "Platelet-rich plasma for thumb arthritis?" These were selected because of their relatively high, medium, and low consensus in the medical evidence, respectively. Each question was posed to ChatGPT version 3.5 and version 4.0 20 times for a total of 120 responses. Text network analysis using term frequency-inverse document frequency (TF-IDF) was used to compare text similarity between responses from ChatGPT and Google Search. In the field of information retrieval, TF-IDF is a weighted statistical measure of the importance of a key word to a document in a collection of documents. Higher TF-IDF scores indicate greater similarity between two sources. TF-IDF scores are most often used to compare and rank the text similarity of documents. Using this type of text network analysis, text similarity between ChatGPT and Google Search can be determined by calculating and summing the TF-IDF for all keywords in a ChatGPT response and comparing it with each Google search result to assess their text similarity to each other. In this way, text similarity can be used to infer relative content similarity. To answer our first question, we characterized the text similarity between ChatGPT and Google Search responses by finding the TF-IDF scores of the ChatGPT response and each of the 20 Google Search results for each question. Using these scores, we could compare the similarity of each ChatGPT response to the Google Search results. To provide a reference point for interpreting TF-IDF values, we generated randomized text samples with the same term distribution as the Google Search results. By comparing ChatGPT TF-IDF to the random text sample, we could assess whether TF-IDF values were statistically significant from TF-IDF values obtained by random chance, and it allowed us to test whether text similarity was an appropriate quantitative statistical measure of relative content similarity. To answer our second question, we classified the Google Search results to better understand sourcing. Google Search provides 20 or more distinct sources of information, but ChatGPT gives only a single prose paragraph in response to each query. So, to answer this question, we used TF-IDF to ascertain whether the ChatGPT response was principally driven by one of four source categories: academic, government, commercial, or material that took the form of a scientific manuscript but was not peer-reviewed or indexed on a government site (such as PubMed). We then compared the TF-IDF similarity between ChatGPT responses and the source category. To answer our third research question, we repeated both analyses and compared the results when using ChatGPT 3.5 versus ChatGPT 4.0.

RESULTS

The ChatGPT response was dominated by the top Google Search result. For example, for carpal tunnel syndrome, the top result was an academic website with a mean TF-IDF of 7.2. A similar result was observed for the other search topics. To provide a reference point for interpreting TF-IDF values, a randomly generated sample of text compared with Google Search would have a mean TF-IDF of 2.7 ± 1.9, controlling for text length and keyword distribution. The observed TF-IDF distribution was higher for ChatGPT responses than for random text samples, supporting the claim that keyword text similarity is a measure of relative content similarity. When comparing source distribution, the ChatGPT response was most similar to the most common source category from Google Search. For the subject where there was strong consensus (carpal tunnel syndrome), the ChatGPT response was most similar to high-quality academic sources rather than lower-quality commercial sources (TF-IDF 8.6 versus 2.2). For topics with low consensus, the ChatGPT response paralleled lower-quality commercial websites compared with higher-quality academic websites (TF-IDF 14.6 versus 0.2). ChatGPT 4.0 had higher text similarity to Google Search results than ChatGPT 3.5 (mean increase in TF-IDF similarity of 0.80 to 0.91; p < 0.001). The ChatGPT 4.0 response was still dominated by the top Google Search result and reflected the most common search category for all search topics.

CONCLUSION

ChatGPT responses are similar to individual Google Search results for queries related to orthopaedic surgery, but the distribution of source information can vary substantially based on the relative level of consensus on a topic. For example, for carpal tunnel syndrome, where there is widely accepted medical consensus, ChatGPT responses had higher similarity to academic sources and therefore used those sources more. When fewer academic or government sources are available, especially in our search related to platelet-rich plasma, ChatGPT appears to have relied more heavily on a small number of nonacademic sources. These findings persisted even as ChatGPT was updated from version 3.5 to version 4.0.

CLINICAL RELEVANCE

Physicians should be aware that ChatGPT and Google likely use the same sources for a specific question. The main difference is that ChatGPT can draw upon multiple sources to create one aggregate response, while Google maintains its distinctness by providing multiple results. For topics with a low consensus and therefore a low number of quality sources, there is a much higher chance that ChatGPT will use less-reliable sources, in which case physicians should take the time to educate patients on the topic or provide resources that give more reliable information. Physician organizations should make it clear when the evidence is limited so that ChatGPT can reflect the lack of quality information or evidence.

Collapse

Temperley HC, O'Sullivan NJ, Mac Curtain BM, Corr A, Meaney JF, Kelly ME, Brennan I. Current applications and future potential of ChatGPT in radiology: A systematic review. J Med Imaging Radiat Oncol 2024;68:257-264. [PMID: 38243605 DOI: 10.1111/1754-9485.13621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2023] [Accepted: 12/29/2023] [Indexed: 01/21/2024]

Lin KC, Chen TA, Lin MH, Chen YC, Chen TJ. Integration and Assessment of ChatGPT in Medical Case Reporting: A Multifaceted Approach. Eur J Investig Health Psychol Educ 2024;14:888-901. [PMID: 38667812 PMCID: PMC11049282 DOI: 10.3390/ejihpe14040057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2024] [Revised: 03/18/2024] [Accepted: 03/19/2024] [Indexed: 04/28/2024] Open

Yahagi M, Hiruta R, Miyauchi C, Tanaka S, Taguchi A, Yaguchi Y. Comparison of Conventional Anesthesia Nurse Education and an Artificial Intelligence Chatbot (ChatGPT) Intervention on Preoperative Anxiety: A Randomized Controlled Trial. J Perianesth Nurs 2024:S1089-9472(23)01073-0. [PMID: 38520470 DOI: 10.1016/j.jopan.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Revised: 12/03/2023] [Accepted: 12/04/2023] [Indexed: 03/25/2024]

Abstract

PURPOSE

This study aimed to evaluate the effects of an artificial intelligence (AI) chatbot (ChatGPT-3.5, OpenAI) on preoperative anxiety reduction and patient satisfaction in adult patients undergoing surgery under general anesthesia.

DESIGN

The study used a single-blind, randomized controlled trial design.

METHODS

In this study, 100 adult patients were enrolled and divided into two groups: 50 in the control group, in which patients received standard preoperative information from anesthesia nurses, and 50 in the intervention group, in which patients interacted with ChatGPT. The primary outcome, preoperative anxiety reduction, was measured using the Japanese State-Trait Anxiety Inventory (STAI) self-report questionnaire. The secondary endpoints included participant satisfaction (Q1), comprehension of the treatment process (Q2), and the perception of the AI chatbot's responses as more relevant than those of the nurses (Q3).

FINDINGS

Of the 85 participants who completed the study, the STAI scores in the control group remained stable, whereas those in the intervention group decreased. The mixed-effects model showed significant effects of time and group-time interaction on the STAI scores; however, no main group effect was observed. The secondary endpoints revealed mixed results; some patients found that the chatbot's responses were more relevant, whereas others were dissatisfied or experienced difficulties.

CONCLUSIONS

The ChatGPT intervention significantly reduced preoperative anxiety compared with the control group; however, no overall difference in the STAI scores was observed. The mixed secondary endpoint results highlight the need for refining chatbot algorithms and knowledge bases to improve performance and satisfaction. AI chatbots should complement, rather than replace, human health care providers. Seamless integration and effective communication among AI chatbots, patients, and health care providers are essential for optimizing patient outcomes.

Collapse

Bukar UA, Sayeed MS, Razak SFA, Yogarayan S, Amodu OA. An integrative decision-making framework to guide policies on regulating ChatGPT usage. PeerJ Comput Sci 2024;10:e1845. [PMID: 38440047 PMCID: PMC10911759 DOI: 10.7717/peerj-cs.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Accepted: 01/09/2024] [Indexed: 03/06/2024]

Abstract

Generative artificial intelligence has created a moment in history where human beings have begin to closely interact with artificial intelligence (AI) tools, putting policymakers in a position to restrict or legislate such tools. One particular example of such a tool is ChatGPT which is the first and world's most popular multipurpose generative AI tool. This study aims to put forward a policy-making framework of generative artificial intelligence based on the risk, reward, and resilience framework. A systematic search was conducted, by using carefully chosen keywords, excluding non-English content, conference articles, book chapters, and editorials. Published research were filtered based on their relevance to ChatGPT ethics, yielding a total of 41 articles. Key elements surrounding ChatGPT concerns and motivations were systematically deduced and classified under the risk, reward, and resilience categories to serve as ingredients for the proposed decision-making framework. The decision-making process and rules were developed as a primer to help policymakers navigate decision-making conundrums. Then, the framework was practically tailored towards some of the concerns surrounding ChatGPT in the context of higher education. In the case of the interconnection between risk and reward, the findings show that providing students with access to ChatGPT presents an opportunity for increased efficiency in tasks such as text summarization and workload reduction. However, this exposes them to risks such as plagiarism and cheating. Similarly, pursuing certain opportunities such as accessing vast amounts of information, can lead to rewards, but it also introduces risks like misinformation and copyright issues. Likewise, focusing on specific capabilities of ChatGPT, such as developing tools to detect plagiarism and misinformation, may enhance resilience in some areas (e.g., academic integrity). However, it may also create vulnerabilities in other domains, such as the digital divide, educational equity, and job losses. Furthermore, the finding indicates second-order effects of legislation regarding ChatGPT which have implications both positively and negatively. One potential effect is a decrease in rewards due to the limitations imposed by the legislation, which may hinder individuals from fully capitalizing on the opportunities provided by ChatGPT. Hence, the risk, reward, and resilience framework provides a comprehensive and flexible decision-making model that allows policymakers and in this use case, higher education institutions to navigate the complexities and trade-offs associated with ChatGPT, which have theoretical and practical implications for the future.

Collapse

Saad A, Jenko N, Ariyaratne S, Birch N, Iyengar KP, Davies AM, Vaishya R, Botchu R. Exploring the potential of ChatGPT in the peer review process: An observational study. Diabetes Metab Syndr 2024;18:102946. [PMID: 38330745 DOI: 10.1016/j.dsx.2024.102946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/21/2023] [Revised: 01/09/2024] [Accepted: 01/10/2024] [Indexed: 02/10/2024]

Hatia A, Doldo T, Parrini S, Chisci E, Cipriani L, Montagna L, Lagana G, Guenza G, Agosta E, Vinjolli F, Hoxha M, D’Amelio C, Favaretto N, Chisci G. Accuracy and Completeness of ChatGPT-Generated Information on Interceptive Orthodontics: A Multicenter Collaborative Study. J Clin Med 2024;13:735. [PMID: 38337430 PMCID: PMC10856539 DOI: 10.3390/jcm13030735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Revised: 01/21/2024] [Accepted: 01/25/2024] [Indexed: 02/12/2024] Open

Mese I. Tracing the Footprints of AI in Radiology Literature: A Detailed Analysis of Journal Abstracts. ROFO-FORTSCHR RONTG 2024. [PMID: 38228155 DOI: 10.1055/a-2224-9230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2024]

Abstract

PURPOSE

To assess and compare the probabilities of AI-generated content within scientific abstracts from selected Q1 journals in the fields of radiology, nuclear medicine, and imaging, published between May and August 2022 and May and August 2023.

MATERIALS AND METHODS

An extensive list of Q1 journals was acquired from Scopus in the fields of radiology, nuclear medicine, and imaging. All articles in these journals were acquired from the Medline databases, focusing on articles published between May and August in 2022 and 2023. The study specifically compared abstracts for limitations of the AI detection tool in terms of word constraints. Extracted abstracts from the two different periods were categorized into two groups, and each abstract was analyzed using the AI detection tool, a system capable of distinguishing between human and AI-generated content with a validated accuracy of 97.06 %. This tool assessed the probability of each abstract being AI-generated, enabling an in-depth comparison between the two groups in terms of the prevalence of AI-generated content probability.

RESULTS

Group 1 and Group 2 exhibit significant variations in the characteristics of AI-generated content probability. Group 1, consisting of 4,727 abstracts, has a median AI-generated content probability of 3.8 % (IQR1.9-9.9 %) and peaks at 49.9 %, with the computation times contained within a range of 2 to 10 seconds (IQR 3-8 s). In contrast, Group 2, which is composed of 3,917 abstracts, displays a significantly higher median AI-generated content probability at 5.7 % (IQR2.8-12.9 %) surging to a maximum of 69.9 %, with computation times spanning from 2 to 14 seconds (IQR 4-11 s). This comparison yields a statistically significant difference in median AI-generated content probability between the two groups (p = 0.005). No significant correlation was observed between word count and AI probability, as well as between article type, primarily original articles and reviews, and AI probability, indicating that AI probability is independent of these factors.

CONCLUSION

The comprehensive analysis reveals significant differences and variations in AI-generated content probabilities between 2022 and 2023, indicating a growing presence of AI-generated content. However, it also illustrates that abstract length or article type does not impact the likelihood of content being AI-generated.

KEY POINTS

· The study examines AI-generated content probability in scientific abstracts from Q1 journals between 2022 to 2023.. · The AI detector tool indicates an increased median AI content probability from 3.8 % to 5.7 %.. · No correlation was found between abstract length or article type and AI probability..

Collapse

Warren E, Hurley ET, Park CN, Crook BS, Lorentz S, Levin JM, Anakwenze O, MacDonald PB, Klifto CS. Evaluation of information from artificial intelligence on rotator cuff repair surgery. JSES Int 2024;8:53-57. [PMID: 38312282 PMCID: PMC10837709 DOI: 10.1016/j.jseint.2023.09.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2024] Open

Abstract

Purpose

The purpose of this study was to analyze the quality and readability of information regarding rotator cuff repair surgery available using an online AI software.

Methods

An open AI model (ChatGPT) was used to answer 24 commonly asked questions from patients on rotator cuff repair. Questions were stratified into one of three categories based on the Rothwell classification system: fact, policy, or value. The answers for each category were evaluated for reliability, quality and readability using The Journal of the American Medical Association Benchmark criteria, DISCERN score, Flesch-Kincaid Reading Ease Score and Grade Level.

Results

The Journal of the American Medical Association Benchmark criteria score for all three categories was 0, which is the lowest score indicating no reliable resources cited. The DISCERN score was 51 for fact, 53 for policy, and 55 for value questions, all of which are considered good scores. Across question categories, the reliability portion of the DISCERN score was low, due to a lack of resources. The Flesch-Kincaid Reading Ease Score (and Flesch-Kincaid Grade Level) was 48.3 (10.3) for the fact class, 42.0 (10.9) for the policy class, and 38.4 (11.6) for the value class.

Conclusion

The quality of information provided by the open AI chat system was generally high across all question types but had significant shortcomings in reliability due to the absence of source material citations. The DISCERN scores of the AI generated responses matched or exceeded previously published results of studies evaluating the quality of online information about rotator cuff repairs. The responses were U.S. 10th grade or higher reading level which is above the AMA and NIH recommendation of 6th grade reading level for patient materials. The AI software commonly referred the user to seek advice from orthopedic surgeons to improve their chances of a successful outcome.

Collapse

Ferreira RM. New evidence-based practice: Artificial intelligence as a barrier breaker. World J Methodol 2023;13:384-389. [PMID: 38229944 PMCID: PMC10789101 DOI: 10.5662/wjm.v13.i5.384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Revised: 10/24/2023] [Accepted: 11/08/2023] [Indexed: 12/20/2023] Open

Ariyaratne S, Iyengar KP, Nischal N, Babu NC, Botchu R. Authors' response to the Letter to the Editor: Re-evaluating the role of AI in scientific writing: a critical analysis on ChatGPT. Skeletal Radiol 2023;52:2489. [PMID: 37462695 DOI: 10.1007/s00256-023-04405-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/18/2023]

Ariyaratne S, Iyengar KP, Nischal N, Babu NC, Botchu R. Authors' response to the Letter to the Editor. Skeletal Radiol 2023;52:2491. [PMID: 37515642 DOI: 10.1007/s00256-023-04418-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 07/31/2023]

Kleebayoon A, Wiwanitkit V. ChatGPT-generated articles and human-written articles: correspondence. Skeletal Radiol 2023;52:2493. [PMID: 37566150 DOI: 10.1007/s00256-023-04417-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/20/2023] [Revised: 07/20/2023] [Accepted: 07/24/2023] [Indexed: 08/12/2023]

Ariyaratne S, Iyengar KP, Botchu R. Author response to: Comment on: Will collaborative publishing with ChatGPT drive academic writing in the future? Br J Surg 2023;110:1894. [PMID: 37702575 DOI: 10.1093/bjs/znad295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Accepted: 09/05/2023] [Indexed: 09/14/2023]

Sharun K, Banu SA, Pawde AM, Kumar R, Akash S, Dhama K, Pal A. ChatGPT and artificial hallucinations in stem cell research: assessing the accuracy of generated references - a preliminary study. Ann Med Surg (Lond) 2023;85:5275-5278. [PMID: 37811040 PMCID: PMC10553015 DOI: 10.1097/ms9.0000000000001228] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2023] [Accepted: 08/12/2023] [Indexed: 10/10/2023] Open

Saad A, Iyengar KP, Kurisunkal V, Botchu R. Assessing ChatGPT's ability to pass the FRCS orthopaedic part A exam: A critical analysis. Surgeon 2023;21:263-266. [PMID: 37517980 DOI: 10.1016/j.surge.2023.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2023] [Accepted: 07/03/2023] [Indexed: 08/01/2023]

Ariyaratne S, Iyengar KP, Botchu R. Will collaborative publishing with ChatGPT drive academic writing in the future? Br J Surg 2023;110:1213-1214. [PMID: 37368994 DOI: 10.1093/bjs/znad198] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 04/28/2023] [Accepted: 05/26/2023] [Indexed: 06/29/2023]

Bhattacharyya M, Miller VM, Bhattacharyya D, Miller LE. High Rates of Fabricated and Inaccurate References in ChatGPT-Generated Medical Content. Cureus 2023;15:e39238. [PMID: 37337480 PMCID: PMC10277170 DOI: 10.7759/cureus.39238] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/18/2023] [Indexed: 06/21/2023] Open

Abstract

Background The availability of large language models such as Chat Generative Pre-trained Transformer (ChatGPT, OpenAI) has enabled individuals from diverse backgrounds to access medical information. However, concerns exist about the accuracy of ChatGPT responses and the references used to generate medical content. Methods This observational study investigated the authenticity and accuracy of references in medical articles generated by ChatGPT. ChatGPT-3.5 generated 30 short medical papers, each with at least three references, based on standardized prompts encompassing various topics and therapeutic areas. Reference authenticity and accuracy were verified by searching Medline, Google Scholar, and the Directory of Open Access Journals. The authenticity and accuracy of individual ChatGPT-generated reference elements were also determined. Results Overall, 115 references were generated by ChatGPT, with a mean of 3.8±1.1 per paper. Among these references, 47% were fabricated, 46% were authentic but inaccurate, and only 7% were authentic and accurate. The likelihood of fabricated references significantly differed based on prompt variations; yet the frequency of authentic and accurate references remained low in all cases. Among the seven components evaluated for each reference, an incorrect PMID number was most common, listed in 93% of papers. Incorrect volume (64%), page numbers (64%), and year of publication (60%) were the next most frequent errors. The mean number of inaccurate components was 4.3±2.8 out of seven per reference. Conclusions The findings of this study emphasize the need for caution when seeking medical information on ChatGPT since most of the references provided were found to be fabricated or inaccurate. Individuals are advised to verify medical information from reliable sources and avoid relying solely on artificial intelligence-generated content.

Collapse