Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Shen Y, Heacock L, Elias J, Hentel KD, Reig B, Shih G, Moy L. ChatGPT and Other Large Language Models Are Double-edged Swords. Radiology 2023;307:e230163. [PMID: 36700838 DOI: 10.1148/radiol.230163] [Citation(s) in RCA: 224] [Impact Index Per Article: 224.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]

For:	Shen Y, Heacock L, Elias J, Hentel KD, Reig B, Shih G, Moy L. ChatGPT and Other Large Language Models Are Double-edged Swords. Radiology 2023;307:e230163. [PMID: 36700838 DOI: 10.1148/radiol.230163] [Citation(s) in RCA: 224] [Impact Index Per Article: 224.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]

Number

Cited by Other Article(s)

151

Diane A, Gencarelli P, Lee JM, Mittal R. Utilizing ChatGPT to Streamline the Generation of Prior Authorization Letters and Enhance Clerical Workflow in Orthopedic Surgery Practice: A Case Report. Cureus 2023;15:e49680. [PMID: 38161881 PMCID: PMC10756745 DOI: 10.7759/cureus.49680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/29/2023] [Indexed: 01/03/2024] Open

152

Hu JM, Liu FC, Chu CM, Chang YT. Health Care Trainees' and Professionals' Perceptions of ChatGPT in Improving Medical Knowledge Training: Rapid Survey Study. J Med Internet Res 2023;25:e49385. [PMID: 37851495 PMCID: PMC10620632 DOI: 10.2196/49385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Revised: 07/13/2023] [Accepted: 09/29/2023] [Indexed: 10/19/2023] Open

Abstract

BACKGROUND

ChatGPT is a powerful pretrained large language model. It has both demonstrated potential and raised concerns related to knowledge translation and knowledge transfer. To apply and improve knowledge transfer in the real world, it is essential to assess the perceptions and acceptance of the users of ChatGPT-assisted training.

OBJECTIVE

We aimed to investigate the perceptions of health care trainees and professionals on ChatGPT-assisted training, using biomedical informatics as an example.

METHODS

We used purposeful sampling to include all health care undergraduate trainees and graduate professionals (n=195) from January to May 2023 in the School of Public Health at the National Defense Medical Center in Taiwan. Subjects were asked to watch a 2-minute video introducing 5 scenarios about ChatGPT-assisted training in biomedical informatics and then answer a self-designed online (web- and mobile-based) questionnaire according to the Kirkpatrick model. The survey responses were used to develop 4 constructs: "perceived knowledge acquisition," "perceived training motivation," "perceived training satisfaction," and "perceived training effectiveness." The study used structural equation modeling (SEM) to evaluate and test the structural model and hypotheses.

RESULTS

The online questionnaire response rate was 152 of 195 (78%); 88 of 152 participants (58%) were undergraduate trainees and 90 of 152 participants (59%) were women. The ages ranged from 18 to 53 years (mean 23.3, SD 6.0 years). There was no statistical difference in perceptions of training evaluation between men and women. Most participants were enthusiastic about the ChatGPT-assisted training, while the graduate professionals were more enthusiastic than undergraduate trainees. Nevertheless, some concerns were raised about potential cheating on training assessment. The average scores for knowledge acquisition, training motivation, training satisfaction, and training effectiveness were 3.84 (SD 0.80), 3.76 (SD 0.93), 3.75 (SD 0.87), and 3.72 (SD 0.91), respectively (Likert scale 1-5: strongly disagree to strongly agree). Knowledge acquisition had the highest score and training effectiveness the lowest. In the SEM results, training effectiveness was influenced predominantly by knowledge acquisition and partially met the hypotheses in the research framework. Knowledge acquisition had a direct effect on training effectiveness, training satisfaction, and training motivation, with β coefficients of .80, .87, and .97, respectively (all P<.001).

CONCLUSIONS

Most health care trainees and professionals perceived ChatGPT-assisted training as an aid in knowledge transfer. However, to improve training effectiveness, it should be combined with empirical experts for proper guidance and dual interaction. In a future study, we recommend using a larger sample size for evaluation of internet-connected large language models in medical knowledge transfer.

Collapse

153

Rashidi HH, Fennell BD, Albahra S, Hu B, Gorbett T. The ChatGPT conundrum: Human-generated scientific manuscripts misidentified as AI creations by AI text detection tool. J Pathol Inform 2023;14:100342. [PMID: 38116171 PMCID: PMC10727991 DOI: 10.1016/j.jpi.2023.100342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Revised: 10/08/2023] [Accepted: 10/10/2023] [Indexed: 12/21/2023] Open

154

Mou C, Liang A, Hu C, Meng F, Han B, Xu F. Monitoring Endangered and Rare Wildlife in the Field: A Foundation Deep Learning Model Integrating Human Knowledge for Incremental Recognition with Few Data and Low Cost. Animals (Basel) 2023;13:3168. [PMID: 37893892 PMCID: PMC10603653 DOI: 10.3390/ani13203168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2023] [Revised: 10/04/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023] Open

155

Abani S, De Decker S, Tipold A, Nessler JN, Volk HA. Can ChatGPT diagnose my collapsing dog? Front Vet Sci 2023;10:1245168. [PMID: 37901112 PMCID: PMC10600474 DOI: 10.3389/fvets.2023.1245168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2023] [Accepted: 09/19/2023] [Indexed: 10/31/2023] Open

156

Abu Hammour K, Alhamad H, Al-Ashwal FY, Halboup A, Abu Farha R, Abu Hammour A. ChatGPT in pharmacy practice: a cross-sectional exploration of Jordanian pharmacists' perception, practice, and concerns. J Pharm Policy Pract 2023;16:115. [PMID: 37789443 PMCID: PMC10548710 DOI: 10.1186/s40545-023-00624-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Accepted: 09/22/2023] [Indexed: 10/05/2023] Open

Abstract

OBJECTIVES

The purpose of this study is to find out how much pharmacists know and have used ChatGPT in their practice. We investigated the advantages and disadvantages of utilizing ChatGPT in a pharmacy context, the amount of training necessary to use it proficiently, and the influence on patient care using a survey.

METHODS

This cross-sectional study was carried out between May and June 2023 to assess the potential and problems that pharmacists observed while integrating chatbots powered by AI (ChatGPT) in pharmacy practice. The correlation between perceived benefits and concerns was evaluated using Spearman's rho correlation due to the data's non-normal distribution.Any pharmacists licensed by the Jordanian Pharmacists Association were included in the study. A convenient sampling technique was used to choose the participants, and the study questionnaire was distributed utilizing an online medium (Facebook and WhatsApp). Anyone who expressed interest in taking part was given a link to the study's instructions so they may read them before giving their electronic consent and accessing the survey.

RESULTS

The potential advantages of ChatGPT in the pharmacy practice were widely acknowledged by the participants. The majority of participants (69.9%) concurred that educational material about pharmacy items or therapeutic areas can be provided using ChatGPT, with 66.9% of respondents believing that ChatGPT is a machine learning algorithm. Concerns about the accuracy of AI-generated responses were also prevalent. More than half of the participants (55.7%) raised the possibility that AI systems such as ChatGPT could pick up on and replicate prejudices and discriminatory patterns from the data they were trained on. Analysis shows a statistically significant positive link, albeit a minor one, between the perceived advantages of ChatGPT and its drawbacks (r = 0.255, p < 0.001). However, concerns were strongly correlated with knowledge of ChatGPT. In contrast to those who were either unsure or had not heard of ChatGPT (64.2%), individuals who had heard of it were more likely to have strong concerns (79.8%) (p = 0.002). Finally, the results show a statistically significant association between the frequency of ChatGPT use and positive perceptions of the tool (p < 0.001).

CONCLUSIONS

Although ChatGPT has shown promise in health and pharmaceutical practice, its application should be rigorously regulated by evidence-based law. According to the study's findings, pharmacists support the use of ChatGPT in pharmacy practice but have concerns about its use due to ethical reasons, legal problems, privacy concerns, worries about the accuracy of the data generated, data learning, and bias risk.

Collapse

157

Goodman RS, Patrinely JR, Stone CA, Zimmerman E, Donald RR, Chang SS, Berkowitz ST, Finn AP, Jahangir E, Scoville EA, Reese TS, Friedman DL, Bastarache JA, van der Heijden YF, Wright JJ, Ye F, Carter N, Alexander MR, Choe JH, Chastain CA, Zic JA, Horst SN, Turker I, Agarwal R, Osmundson E, Idrees K, Kiernan CM, Padmanabhan C, Bailey CE, Schlegel CE, Chambless LB, Gibson MK, Osterman TJ, Wheless LE, Johnson DB. Accuracy and Reliability of Chatbot Responses to Physician Questions. JAMA Netw Open 2023;6:e2336483. [PMID: 37782499 PMCID: PMC10546234 DOI: 10.1001/jamanetworkopen.2023.36483] [Citation(s) in RCA: 52] [Impact Index Per Article: 52.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 08/22/2023] [Indexed: 10/03/2023] Open

Abstract

Importance

Natural language processing tools, such as ChatGPT (generative pretrained transformer, hereafter referred to as chatbot), have the potential to radically enhance the accessibility of medical information for health professionals and patients. Assessing the safety and efficacy of these tools in answering physician-generated questions is critical to determining their suitability in clinical settings, facilitating complex decision-making, and optimizing health care efficiency.

Objective

To assess the accuracy and comprehensiveness of chatbot-generated responses to physician-developed medical queries, highlighting the reliability and limitations of artificial intelligence-generated medical information.

Design, Setting, and Participants

Thirty-three physicians across 17 specialties generated 284 medical questions that they subjectively classified as easy, medium, or hard with either binary (yes or no) or descriptive answers. The physicians then graded the chatbot-generated answers to these questions for accuracy (6-point Likert scale with 1 being completely incorrect and 6 being completely correct) and completeness (3-point Likert scale, with 1 being incomplete and 3 being complete plus additional context). Scores were summarized with descriptive statistics and compared using the Mann-Whitney U test or the Kruskal-Wallis test. The study (including data analysis) was conducted from January to May 2023.

Main Outcomes and Measures

Accuracy, completeness, and consistency over time and between 2 different versions (GPT-3.5 and GPT-4) of chatbot-generated medical responses.

Results

Across all questions (n = 284) generated by 33 physicians (31 faculty members and 2 recent graduates from residency or fellowship programs) across 17 specialties, the median accuracy score was 5.5 (IQR, 4.0-6.0) (between almost completely and complete correct) with a mean (SD) score of 4.8 (1.6) (between mostly and almost completely correct). The median completeness score was 3.0 (IQR, 2.0-3.0) (complete and comprehensive) with a mean (SD) score of 2.5 (0.7). For questions rated easy, medium, and hard, the median accuracy scores were 6.0 (IQR, 5.0-6.0), 5.5 (IQR, 5.0-6.0), and 5.0 (IQR, 4.0-6.0), respectively (mean [SD] scores were 5.0 [1.5], 4.7 [1.7], and 4.6 [1.6], respectively; P = .05). Accuracy scores for binary and descriptive questions were similar (median score, 6.0 [IQR, 4.0-6.0] vs 5.0 [IQR, 3.4-6.0]; mean [SD] score, 4.9 [1.6] vs 4.7 [1.6]; P = .07). Of 36 questions with scores of 1.0 to 2.0, 34 were requeried or regraded 8 to 17 days later with substantial improvement (median score 2.0 [IQR, 1.0-3.0] vs 4.0 [IQR, 2.0-5.3]; P < .01). A subset of questions, regardless of initial scores (version 3.5), were regenerated and rescored using version 4 with improvement (mean accuracy [SD] score, 5.2 [1.5] vs 5.7 [0.8]; median score, 6.0 [IQR, 5.0-6.0] for original and 6.0 [IQR, 6.0-6.0] for rescored; P = .002).

Conclusions and Relevance

In this cross-sectional study, chatbot generated largely accurate information to diverse medical queries as judged by academic physician specialists with improvement over time, although it had important limitations. Further research and model development are needed to correct inaccuracies and for validation.

Collapse

Affiliation(s)

Rachel S. Goodman Vanderbilt University School of Medicine, Nashville, Tennessee
J. Randall Patrinely Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
Cosby A. Stone Department of Allergy, Pulmonology, and Critical Care, Vanderbilt University Medical Center, Nashville, Tennessee
Eli Zimmerman Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee
Rebecca R. Donald Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee
Sam S. Chang Department of Urology, Vanderbilt University Medical Center, Nashville, Tennessee
Sean T. Berkowitz Vanderbilt Eye Institute, Department of Ophthalmology, Vanderbilt University Medical, Nashville, Tennessee
Avni P. Finn Vanderbilt Eye Institute, Department of Ophthalmology, Vanderbilt University Medical, Nashville, Tennessee
Eiman Jahangir Department of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
Elizabeth A. Scoville Department of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, Tennessee
Tyler S. Reese Department of Rheumatology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee
Debra L. Friedman Department of Pediatric Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
Julie A. Bastarache Department of Allergy, Pulmonology, and Critical Care, Vanderbilt University Medical Center, Nashville, Tennessee
Yuri F. van der Heijden Department of Infectious Disease, Vanderbilt University Medical Center, Nashville, Tennessee
Jordan J. Wright Department of Diabetes, Endocrinology, and Metabolism, Vanderbilt University Medical Center, Nashville, Tennessee
Fei Ye Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
Nicholas Carter Division of Trauma and Surgical Critical Care, University of Miami Miller School of Medicine, Miami, Florida
Matthew R. Alexander Department of Cardiovascular Medicine and Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee
Jennifer H. Choe Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
Cody A. Chastain Department of Infectious Disease, Vanderbilt University Medical Center, Nashville, Tennessee
John A. Zic Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
Sara N. Horst Department of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, Tennessee
Isik Turker Department of Cardiology, Washington University School of Medicine in St Louis, St Louis, Missouri
Rajiv Agarwal Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
Evan Osmundson Department of Radiation Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
Kamran Idrees Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
Colleen M. Kiernan Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
Chandrasekhar Padmanabhan Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
Christina E. Bailey Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
Cameron E. Schlegel Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
Lola B. Chambless Department of Neurological Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
Michael K. Gibson Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
Travis J. Osterman Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
Lee E. Wheless Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
Douglas B. Johnson Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee

Collapse

158

Blüthgen C. Does GPT4 dream of counting electric nodules? Eur Radiol 2023;33:6756-6758. [PMID: 37099177 PMCID: PMC10511354 DOI: 10.1007/s00330-023-09671-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Revised: 04/12/2023] [Accepted: 04/14/2023] [Indexed: 04/27/2023]

159

Eggmann F, Weiger R, Zitzmann NU, Blatz MB. Implications of large language models such as ChatGPT for dental medicine. J ESTHET RESTOR DENT 2023;35:1098-1102. [PMID: 37017291 DOI: 10.1111/jerd.13046] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Revised: 03/25/2023] [Accepted: 03/28/2023] [Indexed: 04/06/2023]

160

Momenaei B, Wakabayashi T, Shahlaee A, Durrani AF, Pandit SA, Wang K, Mansour HA, Abishek RM, Xu D, Sridhar J, Yonekawa Y, Kuriyan AE. Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases. Ophthalmol Retina 2023;7:862-868. [PMID: 37277096 DOI: 10.1016/j.oret.2023.05.022] [Citation(s) in RCA: 54] [Impact Index Per Article: 54.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Revised: 05/26/2023] [Accepted: 05/30/2023] [Indexed: 06/07/2023]

Abstract

OBJECTIVE

To evaluate the appropriateness and readability of the medical knowledge provided by ChatGPT-4, an artificial intelligence-powered conversational search engine, regarding common vitreoretinal surgeries for retinal detachments (RDs), macular holes (MHs), and epiretinal membranes (ERMs).

DESIGN

Retrospective cross-sectional study.

SUBJECTS

This study did not involve any human participants.

METHODS

We created lists of common questions about the definition, prevalence, visual impact, diagnostic methods, surgical and nonsurgical treatment options, postoperative information, surgery-related complications, and visual prognosis of RD, MH, and ERM, and asked each question 3 times on the online ChatGPT-4 platform. The data for this cross-sectional study were recorded on April 25, 2023. Two independent retina specialists graded the appropriateness of the responses. Readability was assessed using Readable, an online readability tool.

MAIN OUTCOME MEASURES

The "appropriateness" and "readability" of the answers generated by ChatGPT-4 bot.

RESULTS

Responses were consistently appropriate in 84.6% (33/39), 92% (23/25), and 91.7% (22/24) of the questions related to RD, MH, and ERM, respectively. Answers were inappropriate at least once in 5.1% (2/39), 8% (2/25), and 8.3% (2/24) of the respective questions. The average Flesch Kincaid Grade Level and Flesch Reading Ease Score were 14.1 ± 2.6 and 32.3 ± 10.8 for RD, 14 ± 1.3 and 34.4 ± 7.7 for MH, and 14.8 ± 1.3 and 28.1 ± 7.5 for ERM. These scores indicate that the answers are difficult or very difficult to read for the average lay person and college graduation would be required to understand the material.

CONCLUSIONS

Most of the answers provided by ChatGPT-4 were consistently appropriate. However, ChatGPT and other natural language models in their current form are not a source of factual information. Improving the credibility and readability of responses, especially in specialized fields, such as medicine, is a critical focus of research. Patients, physicians, and laypersons should be advised of the limitations of these tools for eye- and health-related counseling.

FINANCIAL DISCLOSURE(S)

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Collapse

161

Kim JK, Chua M, Rickard M, Lorenzo A. ChatGPT and large language model (LLM) chatbots: The current state of acceptability and a proposal for guidelines on utilization in academic medicine. J Pediatr Urol 2023;19:598-604. [PMID: 37328321 DOI: 10.1016/j.jpurol.2023.05.018] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 05/14/2023] [Accepted: 05/27/2023] [Indexed: 06/18/2023]

Abstract

INTRODUCTION

There is currently no clear consensus on the standards for using large language models such as ChatGPT in academic medicine. Hence, we performed a scoping review of available literature to understand the current state of LLM use in medicine and to provide a guideline for future utilization in academia.

MATERIALS AND METHODS

A scoping review of the literature was performed through a Medline search on February 16, 2023 using a combination of keywords including artificial intelligence, machine learning, natural language processing, generative pre-trained transformer, ChatGPT, and large language model. There were no restrictions to language or date of publication. Records not pertaining to LLMs were excluded. Records pertaining to LLM ChatBots and ChatGPT were identified and evaluated separately. Among the records pertaining to LLM ChatBots and ChatGPT, those that suggest recommendations for ChatGPT use in academia were utilized to create guideline statements for ChatGPT and LLM use in academic medicine.

RESULTS

A total of 87 records were identified. 30 records were not pertaining to large language models and were excluded. 54 records underwent a full-text review for evaluation. There were 33 records related to LLM ChatBots or ChatGPT.

DISCUSSION

From assessing these texts, five guideline statements for LLM use was developed: (1) ChatGPT/LLM cannot be cited as an author in scientific manuscripts; (2) If use of ChatGPT/LLM are considered for use in academic work, author(s) should have at least a basic understanding of what ChatGPT/LLM is; (3) Do not use ChatGPT/LLM to produce entirety of text in manuscripts; humans must be held accountable for use of ChatGPT/LLM and contents created by ChatGPT/LLM should be meticulously verified by humans; (4) ChatGPT/LLMs may be used for editing and refining of text; (5) Any use of ChatGPT/LLM should be transparent and should be clearly outlined in scientific manuscripts and acknowledged.

CONCLUSION

Future authors should remain mindful of the potential impact their academic work may have on healthcare and continue to uphold the highest ethical standards and integrity when utilizing ChatGPT/LLM.

Collapse

162

Cai W. Feasibility and Prospect of Privacy-preserving Large Language Models in Radiology. Radiology 2023;309:e232335. [PMID: 37815443 PMCID: PMC10623203 DOI: 10.1148/radiol.232335] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 09/07/2023] [Accepted: 09/08/2023] [Indexed: 10/11/2023]

163

Rao A, Kim J, Kamineni M, Pang M, Lie W, Dreyer KJ, Succi MD. Evaluating GPT as an Adjunct for Radiologic Decision Making: GPT-4 Versus GPT-3.5 in a Breast Imaging Pilot. J Am Coll Radiol 2023;20:990-997. [PMID: 37356806 PMCID: PMC10733745 DOI: 10.1016/j.jacr.2023.05.003] [Citation(s) in RCA: 47] [Impact Index Per Article: 47.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 05/16/2023] [Accepted: 05/23/2023] [Indexed: 06/27/2023]

164

Laxar D, Eitenberger M, Maleczek M, Kaider A, Hammerle FP, Kimberger O. The influence of explainable vs non-explainable clinical decision support systems on rapid triage decisions: a mixed methods study. BMC Med 2023;21:359. [PMID: 37726729 PMCID: PMC10510231 DOI: 10.1186/s12916-023-03068-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Accepted: 09/05/2023] [Indexed: 09/21/2023] Open

Abstract

BACKGROUND

During the COVID-19 pandemic, a variety of clinical decision support systems (CDSS) were developed to aid patient triage. However, research focusing on the interaction between decision support systems and human experts is lacking.

METHODS

Thirty-two physicians were recruited to rate the survival probability of 59 critically ill patients by means of chart review. Subsequently, one of two artificial intelligence systems advised the physician of a computed survival probability. However, only one of these systems explained the reasons behind its decision-making. In the third step, physicians reviewed the chart once again to determine the final survival probability rating. We hypothesized that an explaining system would exhibit a higher impact on the physicians' second rating (i.e., higher weight-on-advice).

RESULTS

The survival probability rating given by the physician after receiving advice from the clinical decision support system was a median of 4 percentage points closer to the advice than the initial rating. Weight-on-advice was not significantly different (p = 0.115) between the two systems (with vs without explanation for its decision). Additionally, weight-on-advice showed no difference according to time of day or between board-qualified and not yet board-qualified physicians. Self-reported post-experiment overall trust was awarded a median of 4 out of 10 points. When asked after the conclusion of the experiment, overall trust was 5.5/10 (non-explaining median 4 (IQR 3.5-5.5), explaining median 7 (IQR 5.5-7.5), p = 0.007).

CONCLUSIONS

Although overall trust in the models was low, the median (IQR) weight-on-advice was high (0.33 (0.0-0.56)) and in line with published literature on expert advice. In contrast to the hypothesis, weight-on-advice was comparable between the explaining and non-explaining systems. In 30% of cases, weight-on-advice was 0, meaning the physician did not change their rating. The median of the remaining weight-on-advice values was 50%, suggesting that physicians either dismissed the recommendation or employed a "meeting halfway" approach. Newer technologies, such as clinical reasoning systems, may be able to augment the decision process rather than simply presenting unexplained bias.

Collapse

165

Khlaif ZN, Mousa A, Hattab MK, Itmazi J, Hassan AA, Sanmugam M, Ayyoub A. The Potential and Concerns of Using AI in Scientific Research: ChatGPT Performance Evaluation. JMIR MEDICAL EDUCATION 2023;9:e47049. [PMID: 37707884 PMCID: PMC10636627 DOI: 10.2196/47049] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Revised: 06/04/2023] [Accepted: 07/21/2023] [Indexed: 09/15/2023]

Abstract

BACKGROUND

Artificial intelligence (AI) has many applications in various aspects of our daily life, including health, criminal, education, civil, business, and liability law. One aspect of AI that has gained significant attention is natural language processing (NLP), which refers to the ability of computers to understand and generate human language.

OBJECTIVE

This study aims to examine the potential for, and concerns of, using AI in scientific research. For this purpose, high-impact research articles were generated by analyzing the quality of reports generated by ChatGPT and assessing the application's impact on the research framework, data analysis, and the literature review. The study also explored concerns around ownership and the integrity of research when using AI-generated text.

METHODS

A total of 4 articles were generated using ChatGPT, and thereafter evaluated by 23 reviewers. The researchers developed an evaluation form to assess the quality of the articles generated. Additionally, 50 abstracts were generated using ChatGPT and their quality was evaluated. The data were subjected to ANOVA and thematic analysis to analyze the qualitative data provided by the reviewers.

RESULTS

When using detailed prompts and providing the context of the study, ChatGPT would generate high-quality research that could be published in high-impact journals. However, ChatGPT had a minor impact on developing the research framework and data analysis. The primary area needing improvement was the development of the literature review. Moreover, reviewers expressed concerns around ownership and the integrity of the research when using AI-generated text. Nonetheless, ChatGPT has a strong potential to increase human productivity in research and can be used in academic writing.

CONCLUSIONS

AI-generated text has the potential to improve the quality of high-impact research articles. The findings of this study suggest that decision makers and researchers should focus more on the methodology part of the research, which includes research design, developing research tools, and analyzing data in depth, to draw strong theoretical and practical implications, thereby establishing a revolution in scientific research in the era of AI. The practical implications of this study can be used in different fields such as medical education to deliver materials to develop the basic competencies for both medicine students and faculty members.

Collapse

166

Brameier DT, Alnasser AA, Carnino JM, Bhashyam AR, von Keudell AG, Weaver MJ. Artificial Intelligence in Orthopaedic Surgery: Can a Large Language Model "Write" a Believable Orthopaedic Journal Article? J Bone Joint Surg Am 2023;105:1388-1392. [PMID: 37437021 DOI: 10.2106/jbjs.23.00473] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 07/14/2023]

167

Currie GM. Academic integrity and artificial intelligence: is ChatGPT hype, hero or heresy? Semin Nucl Med 2023;53:719-730. [PMID: 37225599 DOI: 10.1053/j.semnuclmed.2023.04.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 04/30/2023] [Indexed: 05/26/2023]

168

Ravi A, Neinstein A, Murray SG. Large Language Models and Medical Education: Preparing for a Rapid Transformation in How Trainees Will Learn to Be Doctors. ATS Sch 2023;4:282-292. [PMID: 37795112 PMCID: PMC10547030 DOI: 10.34197/ats-scholar.2023-0036ps] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2023] [Accepted: 06/01/2023] [Indexed: 10/06/2023] Open

169

Fink MA. [Large language models such as ChatGPT and GPT-4 for patient-centered care in radiology]. RADIOLOGIE (HEIDELBERG, GERMANY) 2023;63:665-671. [PMID: 37615692 DOI: 10.1007/s00117-023-01187-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 07/14/2023] [Indexed: 08/25/2023]

170

Chervenak J, Lieman H, Blanco-Breindel M, Jindal S. The promise and peril of using a large language model to obtain clinical information: ChatGPT performs strongly as a fertility counseling tool with limitations. Fertil Steril 2023;120:575-583. [PMID: 37217092 DOI: 10.1016/j.fertnstert.2023.05.151] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2023] [Revised: 05/01/2023] [Accepted: 05/12/2023] [Indexed: 05/24/2023]

Abstract

OBJECTIVE

To compare the responses of the large language model-based "ChatGPT" to reputable sources when given fertility-related clinical prompts.

DESIGN

The "Feb 13" version of ChatGPT by OpenAI was tested against established sources relating to patient-oriented clinical information: 17 "frequently asked questions (FAQs)" about infertility on the Centers for Disease Control (CDC) Website, 2 validated fertility knowledge surveys, the Cardiff Fertility Knowledge Scale and the Fertility and Infertility Treatment Knowledge Score, as well as the American Society for Reproductive Medicine committee opinion "optimizing natural fertility."

SETTING

Academic medical center.

PATIENT(S)

Online AI Chatbot.

INTERVENTION(S)

Frequently asked questions, survey questions and rephrased summary statements were entered as prompts in the chatbot over a 1-week period in February 2023.

MAIN OUTCOME MEASURE(S)

For FAQs from CDC: words/response, sentiment analysis polarity and objectivity, total factual statements, rate of statements that were incorrect, referenced a source, or noted the value of consulting providers.

FOR FERTILITY KNOWLEDGE SURVEYS

Percentile according to published population data.

FOR COMMITTEE OPINION

Whether response to conclusions rephrased as questions identified missing facts.

RESULT(S)

When administered the CDC's 17 infertility FAQ's, ChatGPT produced responses of similar length (207.8 ChatGPT vs. 181.0 CDC words/response), factual content (8.65 factual statements/response vs. 10.41), sentiment polarity (mean 0.11 vs. 0.11 on a scale of -1 (negative) to 1 (positive)), and subjectivity (mean 0.42 vs. 0.35 on a scale of 0 (objective) to 1 (subjective)). In total, 9 (6.12%) of 147 ChatGPT factual statements were categorized as incorrect, and only 1 (0.68%) statement cited a reference. ChatGPT would have been at the 87th percentile of Bunting's 2013 international cohort for the Cardiff Fertility Knowledge Scale and at the 95th percentile on the basis of Kudesia's 2017 cohort for the Fertility and Infertility Treatment Knowledge Score. ChatGPT reproduced the missing facts for all 7 summary statements from "optimizing natural fertility."

CONCLUSION(S)

A February 2023 version of "ChatGPT" demonstrates the ability of generative artificial intelligence to produce relevant, meaningful responses to fertility-related clinical queries comparable to established sources. Although performance may improve with medical domain-specific training, limitations such as the inability to reliably cite sources and the unpredictable possibility of fabricated information may limit its clinical use.

Collapse

171

Nazir A, Wang Z. A Comprehensive Survey of ChatGPT: Advancements, Applications, Prospects, and Challenges. META-RADIOLOGY 2023;1:100022. [PMID: 37901715 PMCID: PMC10611551 DOI: 10.1016/j.metrad.2023.100022] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/31/2023]

172

Ariyaratne S, Iyengar KP, Nischal N, Chitti Babu N, Botchu R. A comparison of ChatGPT-generated articles with human-written articles. Skeletal Radiol 2023;52:1755-1758. [PMID: 37059827 DOI: 10.1007/s00256-023-04340-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 04/06/2023] [Accepted: 04/09/2023] [Indexed: 04/16/2023]

173

Doo FX, Cook TS, Siegel EL, Joshi A, Parekh V, Elahi A, Yi PH. Exploring the Clinical Translation of Generative Models Like ChatGPT: Promise and Pitfalls in Radiology, From Patients to Population Health. J Am Coll Radiol 2023;20:877-885. [PMID: 37467871 DOI: 10.1016/j.jacr.2023.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Revised: 06/22/2023] [Accepted: 07/05/2023] [Indexed: 07/21/2023]

Affiliation(s)

Florence X Doo Director of Innovation, University of Maryland Medical Intelligent Imaging Center (UM2ii), Baltimore, Maryland; Member, Committee on Economics in Academic Radiology, under the ACR Commission on Economics.
Tessa S Cook Vice Chair for Practice Transformation, Department of Radiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania; Fellowship Director, Imaging Informatics, and Chief, 3-D and Advanced Imaging, Department of Radiology, Penn Medicine, Philadelphia, Pennsylvania; Chair, Society for Imaging Informatics in Medicine; and Vice Chair, ACR Commission on Patient- and Family-Centered Care; Chair, RAHSR Affinity Group. https://twitter.com/asset25
Eliot L Siegel Vice Chair, Research Information Systems, University of Maryland, Baltimore, Maryland; Lead, Radiology and Nuclear Medicine Diagnostics, US Department of Veterans Affairs Veterans Integrated Services Network; Chief, Imaging, US Department of Veterans Affairs Maryland Healthcare System; Radiology AI Senior Consultant. https://twitter.com/EliotSiegel
Anupam Joshi Oros Family Professor and Chair, Computer Science and Electrical Engineering, University of Maryland, Baltimore, Maryland; Director, University of Maryland, Baltimore County, Center for Cybersecurity; Director, CyberScholars Program; Associate Editor, IEEE Transactions on Dependable and Secure Computing
Vishwa Parekh Technical Director, University of Maryland Medical Intelligent Imaging (UM2ii) Center, Baltimore, Maryland; Review Editor, Frontiers in Oncology. https://twitter.com/vishwa_parekh
Ameena Elahi University of Pennsylvania, Philadelphia, Pennsylvania; Application Manager, Information Services, Penn Medicine, Philadelphia, Pennsylvania; Informatics Operations Director, RAD-AID International. https://twitter.com/AmeenaElahi
Paul H Yi Director, University of Maryland Medical Intelligent Imaging (UM2ii) Center, Baltimore, Maryland; Vice Chair, Society of Imaging Informatics in Medicine Program Planning Committee; Associate Editor, Radiology: Artificial Intelligence. https://twitter.com/PaulYiMD

Collapse

174

Goktas P, Karakaya G, Kalyoncu AF, Damadoglu E. Artificial Intelligence Chatbots in Allergy and Immunology Practice: Where Have We Been and Where Are We Going? THE JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY. IN PRACTICE 2023;11:2697-2700. [PMID: 37301435 DOI: 10.1016/j.jaip.2023.05.042] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/30/2023] [Revised: 05/22/2023] [Accepted: 05/25/2023] [Indexed: 06/12/2023]

175

Suppadungsuk S, Thongprayoon C, Krisanapan P, Tangpanithandee S, Garcia Valencia O, Miao J, Mekraksakit P, Kashani K, Cheungpasitporn W. Examining the Validity of ChatGPT in Identifying Relevant Nephrology Literature: Findings and Implications. J Clin Med 2023;12:5550. [PMID: 37685617 PMCID: PMC10488525 DOI: 10.3390/jcm12175550] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open

Abstract

Literature reviews are valuable for summarizing and evaluating the available evidence in various medical fields, including nephrology. However, identifying and exploring the potential sources requires focus and time devoted to literature searching for clinicians and researchers. ChatGPT is a novel artificial intelligence (AI) large language model (LLM) renowned for its exceptional ability to generate human-like responses across various tasks. However, whether ChatGPT can effectively assist medical professionals in identifying relevant literature is unclear. Therefore, this study aimed to assess the effectiveness of ChatGPT in identifying references to literature reviews in nephrology. We keyed the prompt "Please provide the references in Vancouver style and their links in recent literature on… name of the topic" into ChatGPT-3.5 (03/23 Version). We selected all the results provided by ChatGPT and assessed them for existence, relevance, and author/link correctness. We recorded each resource's citations, authors, title, journal name, publication year, digital object identifier (DOI), and link. The relevance and correctness of each resource were verified by searching on Google Scholar. Of the total 610 references in the nephrology literature, only 378 (62%) of the references provided by ChatGPT existed, while 31% were fabricated, and 7% of citations were incomplete references. Notably, only 122 (20%) of references were authentic. Additionally, 256 (68%) of the links in the references were found to be incorrect, and the DOI was inaccurate in 206 (54%) of the references. Moreover, among those with a link provided, the link was correct in only 20% of cases, and 3% of the references were irrelevant. Notably, an analysis of specific topics in electrolyte, hemodialysis, and kidney stones found that >60% of the references were inaccurate or misleading, with less reliable authorship and links provided by ChatGPT. Based on our findings, the use of ChatGPT as a sole resource for identifying references to literature reviews in nephrology is not recommended. Future studies could explore ways to improve AI language models' performance in identifying relevant nephrology literature.

Collapse

Affiliation(s)

Supawadee Suppadungsuk Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.) Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
Charat Thongprayoon Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
Pajaree Krisanapan Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.) Division of Nephrology, Thammasat University Hospital, Pathum Thani 12120, Thailand
Supawit Tangpanithandee Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.) Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
Oscar Garcia Valencia Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
Jing Miao Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
Poemlarp Mekraksakit Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
Kianoush Kashani Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
Wisit Cheungpasitporn Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)

Collapse

176

Leung TI, Sagar A, Shroff S, Henry TL. Can AI Mitigate Bias in Writing Letters of Recommendation? JMIR MEDICAL EDUCATION 2023;9:e51494. [PMID: 37610808 PMCID: PMC10483302 DOI: 10.2196/51494] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 08/08/2023] [Accepted: 08/08/2023] [Indexed: 08/24/2023]

177

Hsu HY, Hsu KC, Hou SY, Wu CL, Hsieh YW, Cheng YD. Examining Real-World Medication Consultations and Drug-Herb Interactions: ChatGPT Performance Evaluation. JMIR MEDICAL EDUCATION 2023;9:e48433. [PMID: 37561097 PMCID: PMC10477918 DOI: 10.2196/48433] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Revised: 06/23/2023] [Accepted: 07/25/2023] [Indexed: 08/11/2023]

178

Wang X, Gong Z, Wang G, Jia J, Xu Y, Zhao J, Fan Q, Wu S, Hu W, Li X. ChatGPT Performs on the Chinese National Medical Licensing Examination. J Med Syst 2023;47:86. [PMID: 37581690 DOI: 10.1007/s10916-023-01961-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Accepted: 06/22/2023] [Indexed: 08/16/2023]

179

Patil NS, Huang RS, van der Pol CB, Larocque N. Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment. Can Assoc Radiol J 2023:8465371231193716. [PMID: 37578849 DOI: 10.1177/08465371231193716] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/16/2023] Open

Abstract

PURPOSE

Bard by Google, a direct competitor to ChatGPT, was recently released. Understanding the relative performance of these different chatbots can provide important insight into their strengths and weaknesses as well as which roles they are most suited to fill. In this project, we aimed to compare the most recent version of ChatGPT, ChatGPT-4, and Bard by Google, in their ability to accurately respond to radiology board examination practice questions.

METHODS

Text-based questions were collected from the 2017-2021 American College of Radiology's Diagnostic Radiology In-Training (DXIT) examinations. ChatGPT-4 and Bard were queried, and their comparative accuracies, response lengths, and response times were documented. Subspecialty-specific performance was analyzed as well.

RESULTS

318 questions were included in our analysis. ChatGPT answered significantly more accurately than Bard (87.11% vs 70.44%, P < .0001). ChatGPT's response length was significantly shorter than Bard's (935.28 ± 440.88 characters vs 1437.52 ± 415.91 characters, P < .0001). ChatGPT's response time was significantly longer than Bard's (26.79 ± 3.27 seconds vs 7.55 ± 1.88 seconds, P < .0001). ChatGPT performed superiorly to Bard in neuroradiology, (100.00% vs 86.21%, P = .03), general & physics (85.39% vs 68.54%, P < .001), nuclear medicine (80.00% vs 56.67%, P < .01), pediatric radiology (93.75% vs 68.75%, P = .03), and ultrasound (100.00% vs 63.64%, P < .001). In the remaining subspecialties, there were no significant differences between ChatGPT and Bard's performance.

CONCLUSION

ChatGPT displayed superior radiology knowledge compared to Bard. While both chatbots display reasonable radiology knowledge, they should be used with conscious knowledge of their limitations and fallibility. Both chatbots provided incorrect or illogical answer explanations and did not always address the educational content of the question.

Collapse

180

Alanzi TM. Impact of ChatGPT on Teleconsultants in Healthcare: Perceptions of Healthcare Experts in Saudi Arabia. J Multidiscip Healthc 2023;16:2309-2321. [PMID: 37601325 PMCID: PMC10438433 DOI: 10.2147/jmdh.s419847] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 08/01/2023] [Indexed: 08/22/2023] Open

181

Wang C, Liu S, Yang H, Guo J, Wu Y, Liu J. Ethical Considerations of Using ChatGPT in Health Care. J Med Internet Res 2023;25:e48009. [PMID: 37566454 PMCID: PMC10457697 DOI: 10.2196/48009] [Citation(s) in RCA: 58] [Impact Index Per Article: 58.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2023] [Revised: 07/05/2023] [Accepted: 07/25/2023] [Indexed: 08/12/2023] Open

182

Lin Z. Why and how to embrace AI such as ChatGPT in your academic life. ROYAL SOCIETY OPEN SCIENCE 2023;10:230658. [PMID: 37621662 PMCID: PMC10445029 DOI: 10.1098/rsos.230658] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Accepted: 08/03/2023] [Indexed: 08/26/2023]

183

Şendur HN, Şendur AB, Cerit MN. ChatGPT from radiologists' perspective. Br J Radiol 2023;96:20230203. [PMID: 37183840 PMCID: PMC10392643 DOI: 10.1259/bjr.20230203] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2023] [Revised: 04/12/2023] [Accepted: 04/23/2023] [Indexed: 05/16/2023] Open

184

Yang R, Tan TF, Lu W, Thirunavukarasu AJ, Ting DSW, Liu N. Large language models in health care: Development, applications, and challenges. HEALTH CARE SCIENCE 2023;2:255-263. [PMID: 38939520 PMCID: PMC11080827 DOI: 10.1002/hcs2.61] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/10/2023] [Revised: 06/10/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2024]

185

Ariyaratne S, Botchu R, Iyengar KP. ChatGPT in academic publishing: An ally or an adversary? Scott Med J 2023;68:129-130. [PMID: 37151080 DOI: 10.1177/00369330231174231] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]

186

Wornow M, Xu Y, Thapa R, Patel B, Steinberg E, Fleming S, Pfeffer MA, Fries J, Shah NH. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digit Med 2023;6:135. [PMID: 37516790 PMCID: PMC10387101 DOI: 10.1038/s41746-023-00879-8] [Citation(s) in RCA: 43] [Impact Index Per Article: 43.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Accepted: 07/13/2023] [Indexed: 07/31/2023] Open

187

Kung JE, Marshall C, Gauthier C, Gonzalez TA, Jackson JB. Evaluating ChatGPT Performance on the Orthopaedic In-Training Examination. JB JS Open Access 2023;8:e23.00056. [PMID: 37693092 PMCID: PMC10484364 DOI: 10.2106/jbjs.oa.23.00056] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 09/12/2023] Open

Abstract

Background

Artificial intelligence (AI) holds potential in improving medical education and healthcare delivery. ChatGPT is a state-of-the-art natural language processing AI model which has shown impressive capabilities, scoring in the top percentiles on numerous standardized examinations, including the Uniform Bar Exam and Scholastic Aptitude Test. The goal of this study was to evaluate ChatGPT performance on the Orthopaedic In-Training Examination (OITE), an assessment of medical knowledge for orthopedic residents.

Methods

OITE 2020, 2021, and 2022 questions without images were inputted into ChatGPT version 3.5 and version 4 (GPT-4) with zero prompting. The performance of ChatGPT was evaluated as a percentage of correct responses and compared with the national average of orthopedic surgery residents at each postgraduate year (PGY) level. ChatGPT was asked to provide a source for its answer, which was categorized as being a journal article, book, or website, and if the source could be verified. Impact factor for the journal cited was also recorded.

Results

ChatGPT answered 196 of 360 answers correctly (54.3%), corresponding to a PGY-1 level. ChatGPT cited a verifiable source in 47.2% of questions, with an average median journal impact factor of 5.4. GPT-4 answered 265 of 360 questions correctly (73.6%), corresponding to the average performance of a PGY-5 and exceeding the corresponding passing score for the American Board of Orthopaedic Surgery Part I Examination of 67%. GPT-4 cited a verifiable source in 87.9% of questions, with an average median journal impact factor of 5.2.

Conclusions

ChatGPT performed above the average PGY-1 level and GPT-4 performed better than the average PGY-5 level, showing major improvement. Further investigation is needed to determine how successive versions of ChatGPT would perform and how to optimize this technology to improve medical education.

Clinical Relevance

AI has the potential to aid in medical education and healthcare delivery.

Collapse

188

Mago J, Sharma M. The Potential Usefulness of ChatGPT in Oral and Maxillofacial Radiology. Cureus 2023;15:e42133. [PMID: 37476297 PMCID: PMC10355343 DOI: 10.7759/cureus.42133] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/19/2023] [Indexed: 07/22/2023] Open

Abstract

Aim This study aimed to evaluate the potential usefulness of Chat Generated Pre-Trained Transformer-3 (ChatGPT-3) in oral and maxillofacial radiology for report writing by identifying radiographic anatomical landmarks and learning about oral and maxillofacial pathologies and their radiographic features. The study also aimed to evaluate the performance of ChatGPT-3 and its usage in oral and maxillofacial radiology training. Materials and methods A questionnaire consisting of 80 questions was queried on the OpenAI app ChatGPT-3. The questions were stratified based on three categories. The categorization was based on random anatomical landmarks, oral and maxillofacial pathologies, and the radiographic features of some of these pathologies. One oral and maxillofacial radiologist evaluated queries that were answered by the ChatGPT-3 model and rated them on a 4-point, modified Likert scale. The post-survey analysis for the performance of ChatGPT-3 was based on the Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis, its application in oral and maxillofacial radiology training, and its recommended use. Results In order of efficiency, Chat GPT-3 gave 100% accuracy in describing radiographic landmarks. However, the content of the oral and maxillofacial pathologies was limited to major or characteristic radiographic features. The mean scores for the queries related to the anatomic landmarks, oral and maxillofacial pathologies, and radiographic features of the oral and maxillofacial pathologies were 3.94, 3.85, and 3.96, respectively. However, the median and mode scores were 4 and were similar to all categories. The data for the oral and maxillofacial pathologies when the questions were not specifically included in the format of the introduction of the pathology, causes, symptoms, and treatment. Out of two abbreviations, one was not answered correctly. Conclusion The study showed that ChatGPT-3 is efficient in describing the pathology, characteristic radiographic features, and describing anatomical landmarks. ChatGPT-3 can be used as an adjunct when an oral radiologist needs additional information on any pathology, however, it cannot be the mainstay for reference. ChatGPT-3 is less detail-oriented, and the data has a risk of infodemics and the possibility of medical errors. However, Chat GPT-3 can be an excellent tool in helping the community in increasing the knowledge and awareness of various pathologies and decreasing the anxiety of the patients while dental healthcare professionals formulate an appropriate treatment plan.

Collapse

189

Grech V, Cuschieri S, Eldawlatly AA. Artificial intelligence in medicine and research - the good, the bad, and the ugly. Saudi J Anaesth 2023;17:401-406. [PMID: 37601525 PMCID: PMC10435812 DOI: 10.4103/sja.sja_344_23] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 08/22/2023] Open

190

Liu J, Wang C, Liu S. Utility of ChatGPT in Clinical Practice. J Med Internet Res 2023;25:e48568. [PMID: 37379067 PMCID: PMC10365580 DOI: 10.2196/48568] [Citation(s) in RCA: 79] [Impact Index Per Article: 79.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2023] [Revised: 05/29/2023] [Accepted: 06/15/2023] [Indexed: 06/29/2023] Open

191

Kusunose K, Kashima S, Sata M. Evaluation of the Accuracy of ChatGPT in Answering Clinical Questions on the Japanese Society of Hypertension Guidelines. Circ J 2023;87:1030-1033. [PMID: 37286486 DOI: 10.1253/circj.cj-23-0308] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

192

Taylor CR, Monga N, Johnson C, Hawley JR, Patel M. Artificial Intelligence Applications in Breast Imaging: Current Status and Future Directions. Diagnostics (Basel) 2023;13:2041. [PMID: 37370936 DOI: 10.3390/diagnostics13122041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 05/20/2023] [Accepted: 05/29/2023] [Indexed: 06/29/2023] Open

193

Darzidehkalani E. ChatGPT in Medical Publications. Radiology 2023;307:e231188. [PMID: 37278630 DOI: 10.1148/radiol.231188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

194

Yu H. Reflection on whether Chat GPT should be banned by academia from the perspective of education and teaching. Front Psychol 2023;14:1181712. [PMID: 37325766 PMCID: PMC10267436 DOI: 10.3389/fpsyg.2023.1181712] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 05/16/2023] [Indexed: 06/17/2023] Open

195

Choi EPH, Lee JJ, Ho MH, Kwok JYY, Lok KYW. Chatting or cheating? The impacts of ChatGPT and other artificial intelligence language models on nurse education. NURSE EDUCATION TODAY 2023;125:105796. [PMID: 36934624 DOI: 10.1016/j.nedt.2023.105796] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/20/2023] [Revised: 03/02/2023] [Accepted: 03/09/2023] [Indexed: 06/18/2023]

196

Lourenco AP, Slanetz PJ, Baird GL. Rise of ChatGPT: It May Be Time to Reassess How We Teach and Test Radiology Residents. Radiology 2023:231053. [PMID: 37191490 DOI: 10.1148/radiol.231053] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]

197

Bhayana R, Krishna S, Bleakney RR. Performance of ChatGPT on a Radiology Board-style Examination: Insights into Current Strengths and Limitations. Radiology 2023:230582. [PMID: 37191485 DOI: 10.1148/radiol.230582] [Citation(s) in RCA: 127] [Impact Index Per Article: 127.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]

Abstract

Background ChatGPT is a powerful artificial intelligence large language model with great potential as a tool in medical practice and education, but its performance in radiology remains unclear. Purpose To assess the performance of ChatGPT on radiology board-style examination questions without images and to explore its strengths and limitations. Materials and Methods In this exploratory prospective study performed from February 25 to March 3, 2023, 150 multiple-choice questions designed to match the style, content, and difficulty of the Canadian Royal College and American Board of Radiology examinations were grouped by question type (lower-order [recall, understanding] and higher-order [apply, analyze, synthesize] thinking) and topic (physics, clinical). The higher-order thinking questions were further subclassified by type (description of imaging findings, clinical management, application of concepts, calculation and classification, disease associations). ChatGPT performance was evaluated overall, by question type, and by topic. Confidence of language in responses was assessed. Univariable analysis was performed. Results ChatGPT answered 69% of questions correctly (104 of 150). The model performed better on questions requiring lower-order thinking (84%, 51 of 61) than on those requiring higher-order thinking (60%, 53 of 89) (P = .002). When compared with lower-order questions, the model performed worse on questions involving description of imaging findings (61%, 28 of 46; P = .04), calculation and classification (25%, two of eight; P = .01), and application of concepts (30%, three of 10; P = .01). ChatGPT performed as well on higher-order clinical management questions (89%, 16 of 18) as on lower-order questions (P = .88). It performed worse on physics questions (40%, six of 15) than on clinical questions (73%, 98 of 135) (P = .02). ChatGPT used confident language consistently, even when incorrect (100%, 46 of 46). Conclusion Despite no radiology-specific pretraining, ChatGPT nearly passed a radiology board-style examination without images; it performed well on lower-order thinking questions and clinical management questions but struggled with higher-order thinking questions involving description of imaging findings, calculation and classification, and application of concepts. © RSNA, 2023 See also the editorial by Lourenco et al in this issue.

Collapse

198

Nune A, Iyengar KP, Manzo C, Barman B, Botchu R. Chat generative pre-trained transformer (ChatGPT): potential implications for rheumatology practice. Rheumatol Int 2023;43:1379-1380. [PMID: 37145135 DOI: 10.1007/s00296-023-05340-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 04/29/2023] [Indexed: 05/06/2023]

199

Singh S, Djalilian A, Ali MJ. ChatGPT and Ophthalmology: Exploring Its Potential with Discharge Summaries and Operative Notes. Semin Ophthalmol 2023:1-5. [PMID: 37133418 DOI: 10.1080/08820538.2023.2209166] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Abstract

PURPOSE

This study aimed to report the abilities of the large language model ChatGPT^R (OpenAI, San Francisco, USA) in constructing ophthalmic discharge summaries and operative notes.

METHODS

A set of prompts was constructed through statements incorporating common ophthalmic surgeries across the subspecialties of the cornea, retina, glaucoma, paediatric ophthalmology, neuro-ophthalmology, and ophthalmic plastics surgery. The responses of ChatGPT were assessed by three surgeons carefully and analyzed them for evidence-based content, specificity of the response, presence of generic text, disclaimers, factual inaccuracies, and its abilities to admit mistakes and challenge incorrect premises.

RESULTS

A total of 24 prompts were presented to the ChatGPT. Twelve prompts assessed its ability to construct discharge summaries, and an equal number explored the potential for preparing operative notes. The response was found to be tailored based on the quality of inputs given and was provided in a matter of seconds. The ophthalmic discharge summaries had a valid but significant generic text. ChatGPT could incorporate specific medications, follow-up instructions, consultation time, and location within the discharge summaries when prompted appropriately. While the operative notes were detailed, they required significant tuning. ChatGPT routinely admits its mistakes and corrects itself immediately when confronted with factual inaccuracies. The mistakes are avoided in subsequent reports when given similar prompts.

CONCLUSION

The performance of ChatGPT in the context of ophthalmic discharge summaries and operative notes was encouraging. These are constructed rapidly in a matter of seconds. Focused training of ChatGPT on these issues with inclusion of a human verification step has an enormous potential to impact healthcare positively.

Collapse

200

Ufuk F. The Role and Limitations of Large Language Models Such as ChatGPT in Clinical Settings and Medical Journalism. Radiology 2023;307:e230276. [PMID: 36880943 DOI: 10.1148/radiol.230276] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/08/2023]