1. Leng L. Challenge, integration, and change: ChatGPT and future anatomical education. Med Educ Online 2024; 29:2304973. [PMID: 38217884 PMCID: PMC10791098 DOI: 10.1080/10872981.2024.2304973]
Abstract
With the rapid development of ChatGPT and its application in education, a new era of collaboration between human and artificial intelligence in education has arrived. Integrating artificial intelligence (AI) into medical education has the potential to revolutionize it. Large language models such as ChatGPT can be used as virtual teaching aids that provide students with individualized, immediate medical knowledge and support interactive, simulation-based learning and assessment. In this paper, we describe the application of ChatGPT in anatomy teaching at its various levels, drawing on our own teaching experience, and discuss the advantages and disadvantages of ChatGPT in anatomy teaching. ChatGPT increases student engagement and strengthens students' ability to learn independently. At the same time, ChatGPT faces many challenges and limitations in medical education. Medical educators must keep pace with rapid technological change, taking into account ChatGPT's impact on curriculum design, assessment strategies, and teaching methods. Discussing the application of ChatGPT in medical education, and in anatomy teaching in particular, supports the effective integration and application of artificial intelligence tools in medical education.
Affiliation(s)
- Lige Leng
- Fujian Provincial Key Laboratory of Neurodegenerative Disease and Aging Research, Institute of Neuroscience, School of Medicine, Xiamen University, Xiamen, Fujian, P.R. China
2. Suárez A, Jiménez J, Llorente de Pedro M, Andreu-Vázquez C, Díaz-Flores García V, Gómez Sánchez M, Freire Y. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery. Comput Struct Biotechnol J 2024; 24:46-52. [PMID: 38162955 PMCID: PMC10755495 DOI: 10.1016/j.csbj.2023.11.058]
Abstract
AI has revolutionized the way we interact with technology. Noteworthy advances in AI algorithms and large language models (LLMs) have led to the development of natural generative language (NGL) systems such as ChatGPT. Although these LLMs can simulate human conversations and generate content in real time, they face challenges related to the topicality and accuracy of the information they generate. This study aimed to assess whether ChatGPT-4 could provide accurate and reliable answers to general dentists in the field of oral surgery, and thus to explore its potential as an intelligent virtual assistant in clinical decision making in oral surgery. Thirty questions related to oral surgery were posed to ChatGPT-4, each question repeated 30 times, yielding a total of 900 responses. Two surgeons graded the answers according to the guidelines of the Spanish Society of Oral Surgery, using a three-point Likert scale (correct, partially correct/incomplete, and incorrect). Disagreements were arbitrated by an experienced oral surgeon, who provided the final grade. Accuracy was found to be 71.7%, and the consistency of the experts' grading across iterations ranged from moderate to almost perfect. ChatGPT-4, with its potential capabilities, will inevitably be integrated into dental disciplines, including oral surgery. In the future, it could be considered an auxiliary intelligent virtual assistant, though it would never replace oral surgery experts. Proper training and expert-verified information will remain vital to the implementation of the technology. More comprehensive research is needed to ensure the safe and successful application of AI in oral surgery.
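To make the grading workflow described in this abstract concrete, the following is a minimal Python sketch of three-point grading with arbitration, overall accuracy, and rater agreement; the toy grades and the use of scikit-learn's Cohen's kappa are illustrative assumptions, not the study's actual data or analysis code.

```python
# Minimal sketch: two raters score each response on a three-point scale,
# disagreements are arbitrated into a final grade, and accuracy plus
# rater agreement are computed. Toy data for illustration only.
from sklearn.metrics import cohen_kappa_score

# 0 = incorrect, 1 = partially correct/incomplete, 2 = correct
rater_a = [2, 2, 1, 0, 2, 1, 2, 2, 0, 2]
rater_b = [2, 1, 1, 0, 2, 1, 2, 2, 1, 2]
final   = [2, 2, 1, 0, 2, 1, 2, 2, 0, 2]   # grades after arbitration of disagreements

accuracy = final.count(2) / len(final)       # share of responses graded fully correct
kappa = cohen_kappa_score(rater_a, rater_b)  # agreement between the two raters
print(f"accuracy = {accuracy:.1%}, kappa = {kappa:.2f}")
```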
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Jaime Jiménez
- Department of Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Cristina Andreu-Vázquez
- Department of Veterinary Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
3. Baxter SL, Longhurst CA, Millen M, Sitapati AM, Tai-Seale M. Generative artificial intelligence responses to patient messages in the electronic health record: early lessons learned. JAMIA Open 2024; 7:ooae028. [PMID: 38601475 PMCID: PMC11006101 DOI: 10.1093/jamiaopen/ooae028]
Abstract
Background Electronic health record (EHR)-based patient messages can contribute to burnout. Messages with a negative tone are particularly challenging to address. In this perspective, we describe our initial evaluation of large language model (LLM)-generated responses to negative EHR patient messages and contend that using LLMs to generate initial drafts may be feasible, although refinement will be needed. Methods A retrospective sample (n = 50) of negative patient messages was extracted from a health system EHR, de-identified, and inputted into an LLM (ChatGPT). Qualitative analyses were conducted to compare LLM responses to actual care team responses. Results Some LLM-generated draft responses varied from human responses in relational connection, informational content, and recommendations for next steps. Occasionally, the LLM draft responses could have potentially escalated emotionally charged conversations. Conclusion Further work is needed to optimize the use of LLMs for responding to negative patient messages in the EHR.
Affiliation(s)
- Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA 92093, United States
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Christopher A Longhurst
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Marlene Millen
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Division of Internal Medicine, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Amy M Sitapati
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Division of Internal Medicine, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Ming Tai-Seale
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Department of Family Medicine, University of California San Diego, La Jolla, CA 92093, United States
4. Tailor PD, Dalvin LA, Chen JJ, Iezzi R, Olsen TW, Scruggs BA, Barkmeier AJ, Bakri SJ, Ryan EH, Tang PH, Parke DW, Belin PJ, Sridhar J, Xu D, Kuriyan AE, Yonekawa Y, Starr MR. A Comparative Study of Responses to Retina Questions from Either Experts, Expert-Edited Large Language Models, or Large Language Models Alone. Ophthalmol Sci 2024; 4:100485. [PMID: 38660460 PMCID: PMC11041826 DOI: 10.1016/j.xops.2024.100485]
Abstract
Objective To assess the quality, empathy, and safety of expert-edited large language model (LLM), human expert-created, and LLM responses to common retina patient questions. Design Randomized, masked, multicenter study. Participants Twenty-one common retina patient questions were randomly assigned among 13 retina specialists. Methods Each expert created a response (Expert) and then edited an LLM (ChatGPT-4)-generated response to that question (Expert + artificial intelligence [AI]), timing themselves for both tasks. Five LLMs (ChatGPT-3.5, ChatGPT-4, Claude 2, Bing, and Bard) also generated responses to each question. The original question, along with the anonymized and randomized Expert + AI, Expert, and LLM responses, was evaluated by the other experts who had not written an expert response to that question. Evaluators judged quality and empathy (very poor, poor, acceptable, good, or very good) along with safety metrics (incorrect information, likelihood to cause harm, extent of harm, and missing content). Main Outcome Measures Mean quality and empathy scores, and the proportion of responses with incorrect information, likelihood to cause harm, extent of harm, and missing content for each response type. Results There were 4008 total grades collected (2608 for quality and empathy; 1400 for safety metrics), with significant differences in both quality and empathy (P < 0.001, P < 0.001) between the LLM, Expert, and Expert + AI groups. For quality, Expert + AI (3.86 ± 0.85) performed the best overall, while GPT-3.5 (3.75 ± 0.79) was the top-performing LLM. For empathy, GPT-3.5 (3.75 ± 0.69) had the highest mean score, followed by Expert + AI (3.73 ± 0.63). By mean score, Expert placed 4th out of 7 for quality and 6th out of 7 for empathy. For both quality (P < 0.001) and empathy (P < 0.001), expert-edited LLM responses performed better than expert-created responses. There were time savings for an expert-edited LLM response versus an expert-created response (P = 0.02). ChatGPT-4 performed similarly to Expert for inappropriate content (P = 0.35), missing content (P = 0.001), extent of possible harm (P = 0.356), and likelihood of possible harm (P = 0.129). Conclusions In this randomized, masked, multicenter study, LLM responses were comparable with expert responses in terms of quality, empathy, and safety metrics, warranting further exploration of their potential benefits in clinical settings. Financial Disclosures Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of the article.
Affiliation(s)
- John J. Chen
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Raymond Iezzi
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Sophie J. Bakri
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Edwin H. Ryan
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- Peter H. Tang
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- D. Wilkin Parke
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- Jayanth Sridhar
- Olive View Medical Center, University of California Los Angeles, Los Angeles, California
- David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E. Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
5. Young AT, Lane BN, Ozog D, Matthews NH. Patients and dermatologists are largely satisfied with ChatGPT-generated after-visit summaries: A pilot study. JAAD Int 2024; 15:33-35. [PMID: 38371667 PMCID: PMC10869927 DOI: 10.1016/j.jdin.2023.12.004]
Affiliation(s)
- Albert T. Young
- Department of Dermatology, Henry Ford Hospital, Detroit, Michigan
- Brittany N. Lane
- Department of Medicine, Michigan State University College of Human Medicine, East Lansing, Michigan
- David Ozog
- Department of Dermatology, Henry Ford Hospital, Detroit, Michigan
- Department of Medicine, Michigan State University College of Human Medicine, East Lansing, Michigan
- Natalie H. Matthews
- Department of Dermatology, Henry Ford Hospital, Detroit, Michigan
- Department of Medicine, Michigan State University College of Human Medicine, East Lansing, Michigan
6. An H, Li X, Huang Y, Wang W, Wu Y, Liu L, Ling W, Li W, Zhao H, Lu D, Liu Q, Jiang G. A new ChatGPT-empowered, easy-to-use machine learning paradigm for environmental science. Eco Environ Health 2024; 3:131-136. [PMID: 38638173 PMCID: PMC11021822 DOI: 10.1016/j.eehl.2024.01.006]
Abstract
The quantity and complexity of environmental data have grown exponentially in recent years. High-quality big data analysis is critical for performing a sophisticated characterization of the complex network of environmental pollution. Machine learning (ML) has been employed as a powerful tool for decoupling the complexities of environmental big data based on its remarkable fitting ability. Yet, due to the knowledge gap across different subjects, ML concepts and algorithms have not been well popularized among researchers in environmental sustainability. In this context, we introduce a new research paradigm, "ChatGPT + ML + Environment", providing an unprecedented chance for environmental researchers to reduce the difficulty of using ML models. For instance, each step involved in applying ML models to environmental sustainability, including data preparation, model selection and construction, model training and evaluation, and hyper-parameter optimization, can be easily performed with guidance from ChatGPT. We also discuss the challenges and limitations of using this research paradigm in the field of environmental sustainability. Furthermore, we highlight the importance of "secondary training" for future application of "ChatGPT + ML + Environment".
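As a rough illustration of the workflow steps listed in this abstract (data preparation, model selection and construction, training and evaluation, and hyper-parameter optimization), the following is a minimal scikit-learn sketch; the synthetic data and the random-forest model are placeholders chosen for illustration, not taken from the cited study.

```python
# Minimal sketch of a typical environmental ML workflow: prepare data,
# build a model pipeline, tune hyper-parameters, and evaluate. The
# synthetic "pollutant" data and model choice are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                                   # e.g., meteorological / emission features
y = X[:, 0] * 2.0 - X[:, 1] + rng.normal(scale=0.5, size=500)   # mock pollutant concentration

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("model", RandomForestRegressor(random_state=0))])

# Hyper-parameter optimization via cross-validated grid search
search = GridSearchCV(pipe,
                      param_grid={"model__n_estimators": [100, 300],
                                  "model__max_depth": [None, 10]},
                      cv=5, scoring="r2")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test R2:", r2_score(y_test, search.predict(X_test)))
```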
Affiliation(s)
- Haoyuan An
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Biomedical Engineering Institute, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Xiangyu Li
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Yuming Huang
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Weichao Wang
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Yuehan Wu
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Lin Liu
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Weibo Ling
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Wei Li
- Biomedical Engineering Institute, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Hanzhu Zhao
- Biomedical Engineering Institute, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Dawei Lu
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Qian Liu
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
7. Buitrago-Esquinas EM, Puig-Cabrera M, Santos JAC, Custódio-Santos M, Yñiguez-Ovando R. Developing a hetero-intelligence methodological framework for sustainable policy-making based on the assessment of large language models. MethodsX 2024; 12:102707. [PMID: 38650999 PMCID: PMC11033193 DOI: 10.1016/j.mex.2024.102707]
Abstract
This work delves into the increasing relevance of Large Language Models (LLMs) in sustainable policy-making, proposing an innovative hetero-intelligence framework that blends human and artificial intelligence (AI) to tackle modern sustainability challenges. The research methodology includes a hetero-intelligence performance test, which juxtaposes human intelligence with AI in the formulation and implementation of sustainable policies. After testing this hetero-intelligence methodology, seven steps are rigorously described so that it can be replicated in any sustainability-planning context. The results underscore the capabilities and limitations of LLMs and the critical role of human intelligence in enhancing the efficacy of hetero-intelligence systems. This work fulfils the need for a rigorous methodological framework, based on empirical steps, that can provide unbiased outcomes to be integrated into sustainable planning and decision-making processes.
- Assesses LLMs' limitations and capabilities regarding sustainable planning issues.
- Proposes a replicable methodology based on the combination of human and artificial intelligence.
- Systematises the integration of a hetero-intelligent approach into the formulation of sustainability policies to make them more efficient and effective.
Affiliation(s)
- Eva M. Buitrago-Esquinas
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Miguel Puig-Cabrera
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- José António C. Santos
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Margarida Custódio-Santos
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Rocío Yñiguez-Ovando
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
8. Grippaudo F, Nigrelli S, Patrignani A, Ribuffo D. Quality of the Information provided by ChatGPT for Patients in Breast Plastic Surgery: Are we already in the future? JPRAS Open 2024; 40:99-105. [PMID: 38444627 PMCID: PMC10914413 DOI: 10.1016/j.jpra.2024.02.001]
Abstract
Introduction In recent years, artificial intelligence (AI) has gained popularity, even in the field of plastic surgery. It is increasingly common for patients to use the internet to gather information about plastic surgery, and AI-based chatbots, such as ChatGPT, could be employed to answer patients' questions. The aim of this study was to evaluate the quality of medical information provided by ChatGPT regarding three of the most common procedures in breast plastic surgery: breast reconstruction, breast reduction, and augmentation mammaplasty. Methods The quality of information was evaluated with the expanded EQIP scale. Responses were collected from a pool of ten resident doctors in plastic surgery and then processed with SPSS software ver. 28.0. Results The analysis of the content provided by ChatGPT revealed sufficient quality of information across all selected topics, although the scores were unevenly distributed across the different items: there was a critical lack in the "Information data" field (a 0/6 score in all three investigations) but a very high overall evaluation of the "Structure data" field (>7/11 in all three investigations). Conclusion Currently, AI serves as a valuable tool for patients; however, engineers and developers must address certain critical issues. Models like ChatGPT may play an important role in improving patients' awareness of medical procedures and surgical interventions in the future, but their role must be considered ancillary to that of surgeons.
Affiliation(s)
- F.R. Grippaudo
- Department of Plastic Reconstructive and Aesthetic Surgery, Policlinico Umberto I, Sapienza University of Rome, Viale del Policlinico 155, 00161, Rome, Italy
- S. Nigrelli
- Department of Plastic Reconstructive and Aesthetic Surgery, Policlinico Umberto I, Sapienza University of Rome, Viale del Policlinico 155, 00161, Rome, Italy
- A. Patrignani
- Department of Plastic Reconstructive and Aesthetic Surgery, Policlinico Umberto I, Sapienza University of Rome, Viale del Policlinico 155, 00161, Rome, Italy
- D. Ribuffo
- Department of Plastic Reconstructive and Aesthetic Surgery, Policlinico Umberto I, Sapienza University of Rome, Viale del Policlinico 155, 00161, Rome, Italy
9. Chang CT, Ticknor IL, Spinelli JA, Bhatia BK, Marwaha S, Mirmirani P, Seidler AM, Man JR, McCleskey PE. Comparison of large language models in generating patient handouts for the dermatology clinic: A blinded study. JAAD Int 2024; 15:152-154. [PMID: 38571697 PMCID: PMC10988028 DOI: 10.1016/j.jdin.2024.02.010]
Affiliation(s)
- Crystal T. Chang
- Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, California
- Iesha L. Ticknor
- Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, California
- Bhavnit K. Bhatia
- Department of Dermatology, The Permanente Medical Group, Richmond, California
- Sangeeta Marwaha
- Department of Dermatology, The Permanente Medical Group, Napa, California
- Paradi Mirmirani
- Department of Dermatology, The Permanente Medical Group, Vallejo, California
- Anne M. Seidler
- Department of Dermatology, The Permanente Medical Group, Oakland, California
- Jeremy R. Man
- Department of Dermatology, Southern California Permanente Medical Group, Los Angeles, California
10. Lechien JR, Carroll TL, Huston MN, Naunheim MR. ChatGPT-4 accuracy for patient education in laryngopharyngeal reflux. Eur Arch Otorhinolaryngol 2024; 281:2547-2552. [PMID: 38492008 DOI: 10.1007/s00405-024-08560-w]
Abstract
INTRODUCTION Chatbot Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence-powered language model chatbot that can assist otolaryngologists in practice and research. The ability of ChatGPT to generate patient-centered information related to laryngopharyngeal reflux disease (LPRD) was evaluated. METHODS Twenty-five questions dedicated to the definition, clinical presentation, diagnosis, and treatment of LPRD were developed from the Dubai definition and management of LPRD consensus and recent reviews. Questions from the four aforementioned categories were entered into ChatGPT-4. Four board-certified laryngologists evaluated the accuracy of ChatGPT-4 with a 5-point Likert scale. Interrater reliability was evaluated. RESULTS The mean scores (SD) of ChatGPT-4 answers for definition, clinical presentation, additional examination, and treatments were 4.13 (0.52), 4.50 (0.72), 3.75 (0.61), and 4.18 (0.47), respectively. Experts reported high interrater reliability for sub-scores (ICC = 0.973). The lowest performances of ChatGPT-4 were on answers about the most prevalent LPR signs, the most reliable objective tool for diagnosis (hypopharyngeal-esophageal multichannel intraluminal impedance-pH monitoring, HEMII-pH), and the criteria for the diagnosis of LPR using HEMII-pH. CONCLUSION ChatGPT-4 may provide adequate information on the definition of LPR, its differences from GERD (gastroesophageal reflux disease), and its clinical presentation. Information provided on extra-laryngeal manifestations and HEMII-pH may need further optimization. Given recent trends of increasing patient use of internet sources for self-education, the findings of the present study may help draw attention to ChatGPT-4's accuracy on the topic of LPR.
Affiliation(s)
- Jerome R Lechien
- Research Committee, Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France.
- Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium.
- Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, School of Medicine, Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France.
- Polyclinique Elsan de Poitiers, Poitiers, France.
- Thomas L Carroll
- Division of Otolaryngology-Head and Neck Surgery, Brigham and Women's Hospital, Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Molly N Huston
- Department of Otolaryngology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Matthew R Naunheim
- Research Committee, Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Division of Laryngology, Massachusetts Eye and Ear, Boston, MA, USA
11. Fournier A, Fallet C, Sadeghipour F, Perrottet N. Assessing the applicability and appropriateness of ChatGPT in answering clinical pharmacy questions. Ann Pharm Fr 2024; 82:507-513. [PMID: 37992892 DOI: 10.1016/j.pharma.2023.11.001]
Abstract
OBJECTIVES Clinical pharmacists rely on different scientific references to ensure appropriate, safe, and cost-effective drug use. Tools based on artificial intelligence (AI), such as ChatGPT (Generative Pre-trained Transformer), could offer valuable support. The objective of this study was to assess ChatGPT's capacity to respond correctly to clinical pharmacy questions asked by healthcare professionals in our university hospital. MATERIAL AND METHODS ChatGPT's capacity to respond correctly to the last 100 consecutive questions recorded in our clinical pharmacy database was assessed. Questions were copied from our FileMaker Pro database and pasted into the ChatGPT (March 14 version) online platform. The generated answers were then copied verbatim into an Excel file. Two blinded clinical pharmacists reviewed all the questions and the answers given by the software. In case of disagreement, a third blinded pharmacist intervened to decide. RESULTS Questions about documentation (n=36) and mode of drug administration (n=30) predominated. Among the 69 applicable questions, the rate of correct answers varied from 30% to 57.1% depending on question type, with an overall rate of 44.9%. Regarding inappropriate answers (n=38), 20 were incorrect, 18 gave no answer, and 8 were incomplete, with 8 answers belonging to 2 different categories. In no case did ChatGPT provide a better answer than the pharmacists. CONCLUSIONS ChatGPT demonstrated mixed performance in answering clinical pharmacy questions. It should not replace human expertise, as a high rate of inappropriate answers was highlighted. Future studies should focus on the optimization of ChatGPT for specific clinical pharmacy questions and explore the potential benefits and limitations of integrating this technology into clinical practice.
Affiliation(s)
- A Fournier
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- C Fallet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- F Sadeghipour
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland; Center for Research and Innovation in Clinical Pharmaceutical Sciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- N Perrottet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland.
12. Papastratis I, Stergioulas A, Konstantinidis D, Daras P, Dimitropoulos K. Can ChatGPT provide appropriate meal plans for NCD patients? Nutrition 2024; 121:112291. [PMID: 38359704 DOI: 10.1016/j.nut.2023.112291]
Abstract
OBJECTIVES Dietary habits significantly affect health conditions and are closely related to the onset and progression of non-communicable diseases (NCDs). Consequently, a well-balanced diet plays an important role in lessening the effects of various disorders, including NCDs. Several artificial intelligence recommendation systems have been developed to propose healthy and nutritious diets. Most of these systems use expert knowledge and guidelines to provide tailored diets and encourage healthier eating habits. However, new advances in large language models such as ChatGPT, with their ability to produce human-like responses, have led individuals to seek their advice on a variety of tasks, including diet recommendations. This study aimed to determine the ability of ChatGPT models to generate appropriate personalized meal plans for patients with obesity, cardiovascular diseases, and type 2 diabetes. METHODS Using a state-of-the-art knowledge-based recommendation system as a reference, we assessed the meal plans generated by two large language models in terms of energy intake, nutrient accuracy, and meal variability. RESULTS Experimental results with different user profiles revealed the potential of ChatGPT models to provide personalized nutritional advice. CONCLUSION Additional supervision and guidance by nutrition experts or knowledge-based systems are required to ensure meal appropriateness for users with NCDs.
Affiliation(s)
- Ilias Papastratis
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece.
- Andreas Stergioulas
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece
- Dimitrios Konstantinidis
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece
- Petros Daras
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece
- Kosmas Dimitropoulos
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece
13. Noda R, Izaki Y, Kitano F, Komatsu J, Ichikawa D, Shibagaki Y. Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal. Clin Exp Nephrol 2024; 28:465-469. [PMID: 38353783 DOI: 10.1007/s10157-023-02451-w]
Abstract
BACKGROUND Large language models (LLMs) have driven recent advances in artificial intelligence. While LLMs have demonstrated high performance on general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard in their potential nephrology applications. METHODS Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates for the five years overall, for each year, and for each question category, and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of nephrology residents. RESULTS The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; GPT-4 thus significantly outperformed GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the passing threshold in three of the five years, only barely in two of them. GPT-4 demonstrated significantly higher performance on problem-solving, clinical, and non-image questions than GPT-3.5 and Bard. GPT-4's performance was between that of third- and fourth-year nephrology residents. CONCLUSIONS GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight LLMs' potential and limitations in nephrology. As LLMs advance, nephrologists should understand their performance characteristics for future applications.
Affiliation(s)
- Ryunosuke Noda
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.
- Yuto Izaki
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Fumiya Kitano
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Jun Komatsu
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Daisuke Ichikawa
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Yugo Shibagaki
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
14. González R, Poenaru D, Woo R, Trappey AF, Carter S, Darcy D, Encisco E, Gulack B, Miniati D, Tombash E, Huang EY. ChatGPT: What Every Pediatric Surgeon Should Know About Its Potential Uses and Pitfalls. J Pediatr Surg 2024; 59:941-947. [PMID: 38336588 DOI: 10.1016/j.jpedsurg.2024.01.007]
Abstract
ChatGPT - currently the most popular generative artificial intelligence system - has been revolutionizing the world and healthcare since its release in November 2022. ChatGPT is a conversational chatbot that uses machine learning algorithms to enhance its replies based on user interactions and is a part of a broader effort to develop natural language processing that can assist people in their daily lives by understanding and responding to human language in a useful and engaging way. Thus far, many potential applications within healthcare have been described, despite its relatively recent release. This manuscript offers the pediatric surgical community a primer on this new technology and discusses some initial observations about its potential uses and pitfalls. Moreover, it introduces the perspectives of medical journals and surgical societies regarding the use of this artificial intelligence chatbot. As ChatGPT and other large language models continue to evolve, it is the responsibility of the pediatric surgery community to stay abreast of these changes and play an active role in safely incorporating them into our field for the benefit of our patients. LEVEL OF EVIDENCE: V.
Affiliation(s)
- Raquel González
- Division of Pediatric Surgery, Johns Hopkins All Children's Hospital, 501 6th Avenue S, Saint Petersburg, FL, 33701, USA.
- Dan Poenaru
- McGill University, 5252 Boul. De Maissonneuve O. rm. 3E.05, Montréal, QC, H4a 3S5, Canada
- Russell Woo
- Department of Surgery, Division of Pediatric Surgery, University of Hawai'i, John A. Burns School of Medicine, 1319 Punahou Street, Suite 600, Honolulu, HI, 96826, USA
- A Francois Trappey
- Pediatric General and Thoracic Surgery, Brooke Army Medical Center, 3551 Roger Brooke Dr, Fort Sam Houston, TX, 78234, USA
- Stewart Carter
- Division of Pediatric Surgery, University of Louisville, Norton Children's Hospital, 315 East Broadway, Suite 565, Louisville, KY, 40202, USA
- David Darcy
- Golisano Children's Hospital, University of Rochester Medical Center, 601 Elmwood Avenue, Box SURG, Rochester, NY, 14642, USA
- Ellen Encisco
- Division of Pediatric General and Thoracic Surgery, Cincinnati Children's Hospital, 3333 Burnet Ave, Cincinnati, OH, 45229, USA
- Brian Gulack
- Rush University Medical Center, 1653 W Congress Parkway, Kellogg, Chicago, IL, 60612, USA
- Doug Miniati
- Department of Pediatric Surgery, Kaiser Permanente Roseville, 1600 Eureka Road, Building C, Suite C35, Roseville, CA, 95661, USA
- Edzhem Tombash
- Division of Pediatric General and Thoracic Surgery, Cincinnati Children's Hospital, 3333 Burnet Ave, Cincinnati, OH, 45229, USA
- Eunice Y Huang
- Vanderbilt University Medical Center, Monroe Carell Jr. Children's Hospital, 2200 Children's Way, Suite 7100, Nashville, TN, 37232, USA
15. Araji T, Brooks AD. Evaluating The Role of ChatGPT as a Study Aid in Medical Education in Surgery. J Surg Educ 2024; 81:753-757. [PMID: 38556438 DOI: 10.1016/j.jsurg.2024.01.014]
Abstract
OBJECTIVE Our aim was to assess how ChatGPT compares to Google search in assisting medical students during their surgery clerkships. DESIGN We conducted a crossover study in which participants were asked to complete 2 standardized assessments on different general surgery topics before and after using either Google search or ChatGPT. SETTING The study was conducted at the Perelman School of Medicine at the University of Pennsylvania (PSOM) in Philadelphia, Pennsylvania. PARTICIPANTS Nineteen third-year medical students participated in our study. RESULTS The baseline (preintervention) performance of participants on both quizzes did not differ between the Google search and ChatGPT groups (p = 0.728). Students overall performed better postintervention, and the difference in test scores was statistically significant for both the Google group (p < 0.001) and the ChatGPT group (p = 0.01). The mean percent increase in test scores pre- to postintervention was higher in the Google group at 11% vs. 10% in the ChatGPT group, but this difference was not statistically significant (p = 0.87). Similarly, there was no statistically significant difference in postintervention scores on both assessments between the 2 groups (p = 0.508). Postassessment surveys revealed that all students (100%) had heard of ChatGPT before, and 47% had previously used it for various purposes. On a scale of 1 to 10, with 1 being the lowest and 10 being the highest, the feasibility of ChatGPT and its usefulness in finding answers were rated as 8.4 and 6.6 on average, respectively. When asked to rate the likelihood of using ChatGPT in their surgery rotation, the answers fell between 1 and 3 ("unlikely", 47%), 4 and 6 ("intermediate", 26%), and 7 and 10 ("likely", 26%). CONCLUSION Our results show that even though ChatGPT was comparable to Google search in finding answers to surgery questions, many students were reluctant to use ChatGPT for learning purposes during their surgery clerkship.
Affiliation(s)
- Tarek Araji
- Hospital of the University of Pennsylvania, Department of Surgery, Philadelphia, Pennsylvania
- Ari D Brooks
- Hospital of the University of Pennsylvania, Department of Surgery, Philadelphia, Pennsylvania.
16. Amin KS, Forman HP, Davis MA. Even with ChatGPT, race matters. Clin Imaging 2024; 109:110113. [PMID: 38552383 DOI: 10.1016/j.clinimag.2024.110113]
Abstract
BACKGROUND Applications of large language models such as ChatGPT are increasingly being studied. Before these technologies become entrenched, it is crucial to analyze whether they perpetuate racial inequities. METHODS We asked OpenAI's ChatGPT-3.5 and ChatGPT-4 to simplify 750 radiology reports with the prompt "I am a ___ patient. Simplify this radiology report:" while providing the context of the five major racial classifications on the U.S. census: White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or other Pacific Islander. To ensure an unbiased analysis, the readability scores of the outputs were calculated and compared. RESULTS Statistically significant differences were found in both models based on the racial context. For ChatGPT-3.5, output for White and Asian was at a significantly higher reading grade level than for both Black or African American and American Indian or Alaska Native, among other differences. For ChatGPT-4, output for Asian was at a significantly higher reading grade level than for American Indian or Alaska Native and Native Hawaiian or other Pacific Islander, among other differences. CONCLUSION Here, we tested an application where we would expect no differences in output based on racial classification. Hence, the differences found are alarming and demonstrate that the medical community must remain vigilant to ensure that large language models do not provide biased or otherwise harmful outputs.
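The readability comparison described in this abstract can be sketched in a few lines of Python; the textstat package, the toy report texts, and the simple group-mean comparison below are illustrative assumptions, not the authors' actual pipeline or data.

```python
# Score the reading grade level of model outputs generated under different
# racial contexts and compare group means. Toy data; textstat is one common
# way to compute Flesch-Kincaid grade levels.
import statistics
import textstat

outputs_by_context = {
    "White": [
        "The CT scan of your chest shows no evidence of acute disease.",
        "There is a small, stable nodule that should be rechecked in a year.",
    ],
    "Black or African American": [
        "Your chest scan looks fine. Nothing new or worrying was found.",
        "A tiny spot was seen before; it has not changed and needs a recheck next year.",
    ],
    # ... one list of simplified reports per census category
}

mean_grade = {
    context: statistics.mean(textstat.flesch_kincaid_grade(t) for t in texts)
    for context, texts in outputs_by_context.items()
}
print(mean_grade)
```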
Affiliation(s)
- Howard P Forman
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Melissa A Davis
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA.
17. Makhoul M, Melkane AE, Khoury PE, Hadi CE, Matar N. A cross-sectional comparative study: ChatGPT 3.5 versus diverse levels of medical experts in the diagnosis of ENT diseases. Eur Arch Otorhinolaryngol 2024; 281:2717-2721. [PMID: 38365990 DOI: 10.1007/s00405-024-08509-z]
Abstract
PURPOSE With recent advances in artificial intelligence (AI), it has become crucial to thoroughly evaluate its applicability in healthcare. This study aimed to assess the accuracy of ChatGPT in diagnosing ear, nose, and throat (ENT) pathology and to compare its performance with that of medical experts. METHODS We conducted a cross-sectional comparative study in which 32 ENT cases were presented to ChatGPT 3.5, ENT physicians, ENT residents, family medicine (FM) specialists, second-year medical students (Med2), and third-year medical students (Med3). Each participant provided three differential diagnoses. The study analyzed diagnostic accuracy rates and inter-rater agreement within and between participant groups and ChatGPT. RESULTS The accuracy rate of ChatGPT was 70.8%, which was not significantly different from that of ENT physicians or ENT residents. However, a significant difference in correctness rate existed between ChatGPT and FM specialists (49.8%, p < 0.001) and between ChatGPT and medical students (Med2 47.5%, p < 0.001; Med3 47%, p < 0.001). Inter-rater agreement for the differential diagnosis between ChatGPT and each participant group was either poor or fair. In 68.75% of cases, ChatGPT failed to mention the most critical diagnosis. CONCLUSIONS ChatGPT demonstrated accuracy comparable to that of ENT physicians and ENT residents in diagnosing ENT pathology, outperforming FM specialists, Med2, and Med3. However, it showed limitations in identifying the most critical diagnosis.
Affiliation(s)
- Mikhael Makhoul
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon.
- Antoine E Melkane
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
- Patrick El Khoury
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
- Christopher El Hadi
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
- Nayla Matar
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
18. Abu Arqub S, Al-Moghrabi D, Allareddy V, Upadhyay M, Vaid N, Yadav S. Content analysis of AI-generated (ChatGPT) responses concerning orthodontic clear aligners. Angle Orthod 2024; 94:263-272. [PMID: 38195060 DOI: 10.2319/071123-484.1]
Abstract
OBJECTIVES To assess the accuracy of ChatGPT answers concerning orthodontic clear aligners. MATERIALS AND METHODS A cross-sectional content analysis of ChatGPT-generated responses to queries related to clear aligner treatment (CAT) was undertaken. A total of 111 questions were generated by three orthodontists based on a set of predefined domains and subdomains. The artificial intelligence (AI)-generated (ChatGPT) answers were extracted, and their accuracy was determined independently by five orthodontists. The accuracy of the answers was assessed using a prepiloted four-point scoring rubric. Descriptive statistics were performed. RESULTS The total mean accuracy score for the entire set was 2.6 ± 1.1. Of the AI-generated answers, 58% were scored as objectively true, 18% as selected facts, 9% as minimal facts, and 15% as false. False claims included the ability of CAT to reduce the need for orthognathic surgery (4.0 ± 0.0), improve airway function (3.8 ± 0.5), achieve root parallelism (3.6 ± 0.5), alleviate sleep apnea (3.8 ± 0.5), and produce more stable results compared with fixed appliances (3.8 ± 0.5). CONCLUSIONS The overall accuracy of ChatGPT responses to questions concerning CAT was suboptimal, and the responses lacked citations to relevant literature. The ability of the software to offer current and precise information was limited. Therefore, clinicians and patients must be mindful of false claims and of relevant facts omitted in the answers generated by ChatGPT.
19. Davis RJ, Ayo-Ajibola O, Lin ME, Swanson MS, Chambers TN, Kwon DI, Kokot NC. Evaluation of Oropharyngeal Cancer Information from Revolutionary Artificial Intelligence Chatbot. Laryngoscope 2024; 134:2252-2257. [PMID: 37983846 DOI: 10.1002/lary.31191]
Abstract
OBJECTIVE With the burgeoning popularity of artificial intelligence-based chatbots, oropharyngeal cancer patients now have access to a novel source of medical information. Because chatbot information is not reviewed by experts, we sought to evaluate an artificial intelligence-based chatbot's oropharyngeal cancer-related information for accuracy. METHODS Fifteen oropharyngeal cancer-related questions were developed and input into ChatGPT version 3.5. Four physician-graders independently assessed accuracy, comprehensiveness, and similarity to a physician response using 5-point Likert scales. Responses graded lower than three were then critiqued by the physician-graders. Critiques were analyzed using inductive thematic analysis. Readability of the responses was assessed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKRGL) scales. RESULTS Average accuracy, comprehensiveness, and similarity to a physician response scores were 3.88 (SD = 0.99), 3.80 (SD = 1.14), and 3.67 (SD = 1.08), respectively. Posttreatment-related questions were the most accurate, comprehensive, and similar to a physician response, followed by treatment-related and then diagnosis-related questions. Posttreatment-related questions scored significantly higher than diagnosis-related questions in all three domains (p < 0.01). Two themes emerged from the physician critiques: suboptimal educational value and potential to misinform patients. The mean FRE and FKRGL scores both indicated a reading level above 11th grade, higher than the 6th grade level recommended for patients. CONCLUSION ChatGPT responses may not educate patients to an appropriate degree, could outright misinform them, and read at a more difficult grade level than is recommended for patient material. As oropharyngeal cancer patients represent a vulnerable population facing complex, life-altering diagnoses and treatments, they should be cautious when consuming chatbot-generated medical information. LEVEL OF EVIDENCE NA Laryngoscope, 134:2252-2257, 2024.
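For reference, the two readability indices named in this abstract are standard formulas over word, sentence, and syllable counts (the generic published definitions, not values specific to this study); higher FRE indicates easier text, while higher FKRGL indicates a higher reading grade level, and an FRE of roughly 80-90 corresponds to about a 6th grade level.

```latex
\begin{aligned}
\mathrm{FRE}   &= 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}} \\
\mathrm{FKRGL} &= 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
\end{aligned}
```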
Affiliation(s)
- Ryan J Davis
- Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Matthew E Lin
- Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Mark S Swanson
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Tamara N Chambers
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Daniel I Kwon
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Niels C Kokot
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
20.
Abstract
The launch of OpenAI's chatbot, ChatGPT, has generated a great deal of attention and discussion among professionals in several fields. Researchers from various fields have raised many concerns and challenges, particularly in relation to the harm that using these tools for medical diagnosis and treatment recommendations can cause. In addition, it has been debated whether ChatGPT is dependable, efficient, and helpful for clinicians and medical professionals. Therefore, in this study, we assess ChatGPT's effectiveness in providing mental health support, particularly for issues related to anxiety and depression, based on the chatbot's responses and cross-questioning. The findings indicate that there are significant inconsistencies and that ChatGPT's reliability is low in this specific domain. As a result, caution must be exercised when using ChatGPT as a complementary mental health resource.
Collapse
Affiliation(s)
- Faiza Farhat
- Section of Parasitology, Department of Zoology, Aligarh Muslim University, Aligarh, UP, 202002, India.
| |
Collapse
|
21
|
Khan U. Revolutionizing Personalized Protein Energy Malnutrition Treatment: Harnessing the Power of Chat GPT. Ann Biomed Eng 2024; 52:1125-1127. [PMID: 37728811 DOI: 10.1007/s10439-023-03331-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Accepted: 07/22/2023] [Indexed: 09/21/2023]
Abstract
Protein energy malnutrition (PEM) is a global public health concern, and personalized treatment approaches are crucial for improved outcomes. This study explores the transformative potential of Chat GPT, an AI language model, in revolutionizing personalized treatment for PEM. By providing accurate information, personalized dietary recommendations and food choices, psychological counseling, and real-time monitoring and support, Chat GPT can enhance the effectiveness of PEM interventions. Along with these benefits, it is also important to acknowledge its potential flaws and limitations. The study emphasizes the importance of collaboration between AI technology and healthcare professionals to leverage Chat GPT's capabilities effectively. By combining human expertise with AI capabilities, personalized PEM treatment can be revolutionized, leading to improved patient outcomes and a comprehensive approach to addressing this global public health concern. The study highlights the significant impact of Chat GPT in providing tailored guidance and continuous support throughout the treatment process, empowering individuals and improving their overall well-being.
Collapse
Affiliation(s)
- Urooj Khan
- Department of Human Nutrition and Dietetics, Faculty of Allied Health Science, Superior University, Sargodha Campus, Sargodha, Pakistan.
| |
Collapse
|
22
|
Timurkaynak Ö, Gönenli G. Response to Young et al., "The utility of ChatGPT in generating patient-facing and clinical responses for melanoma". J Am Acad Dermatol 2024; 90:e177. [PMID: 38215796 DOI: 10.1016/j.jaad.2023.12.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/14/2024]
Affiliation(s)
- Özgür Timurkaynak
- Department of Dermatology, Acibadem Mehmet Ali Aydinlar University School of Medicine, Istanbul, Turkey.
| | - Gökhan Gönenli
- Department of Internal Medicine, Koç University School of Medicine, Istanbul, Turkey
| |
Collapse
|
23
|
Shifai N, van Doorn R, Malvehy J, Sangers TE. Can ChatGPT vision diagnose melanoma? An exploratory diagnostic accuracy study. J Am Acad Dermatol 2024; 90:1057-1059. [PMID: 38244612 DOI: 10.1016/j.jaad.2023.12.062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 12/07/2023] [Accepted: 12/24/2023] [Indexed: 01/22/2024]
Affiliation(s)
- Naweed Shifai
- Department of Dermatology, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Remco van Doorn
- Department of Dermatology, Leiden University Medical Center, Leiden, The Netherlands
| | - Josep Malvehy
- Melanoma Unit, Dermatology Department, Hospital Clínic de Barcelona, IDIBAPS, Universitat de Barcelona, Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona, Spain
| | - Tobias E Sangers
- Department of Dermatology, Leiden University Medical Center, Leiden, The Netherlands.
| |
Collapse
|
24
|
Robinson MA, Belzberg M, Thakker S, Bibee K, Merkel E, MacFarlane DF, Lim J, Scott JF, Deng M, Lewin J, Soleymani D, Rosenfeld D, Liu R, Liu TYA, Ng E. Assessing the accuracy, usefulness, and readability of artificial-intelligence-generated responses to common dermatologic surgery questions for patient education: A double-blinded comparative study of ChatGPT and Google Bard. J Am Acad Dermatol 2024; 90:1078-1080. [PMID: 38296195 DOI: 10.1016/j.jaad.2024.01.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 12/26/2023] [Accepted: 01/14/2024] [Indexed: 02/17/2024]
Affiliation(s)
- Michelle A Robinson
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland.
| | - Micah Belzberg
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Sach Thakker
- Georgetown University School of Medicine, Washington, DC
| | - Kristin Bibee
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Emily Merkel
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Deborah F MacFarlane
- Department of Dermatology, the University of Texas MD Anderson Cancer Center, Houston, Texas
| | - Jordan Lim
- Department of Dermatology, Emory University School of Medicine, Atlanta, Georgia
| | - Jeffrey F Scott
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Min Deng
- Department of Dermatology, MedStar Washington Hospital Center, Georgetown University Hospital, Washington, DC
| | - Jesse Lewin
- Kimberly and Eric J. Waldman Department of Dermatology, Icahn School of Medicine at Mount Sinai, New York, New York
| | | | | | | | - Tin Yan Alvin Liu
- Department of Ophthalmology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Elise Ng
- Department of Ophthalmology, Johns Hopkins School of Medicine, Baltimore, Maryland
| |
Collapse
|
25
|
Ozgor BY, Simavi MA. Accuracy and reproducibility of ChatGPT's free version answers about endometriosis. Int J Gynaecol Obstet 2024; 165:691-695. [PMID: 38108232 DOI: 10.1002/ijgo.15309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Revised: 11/27/2023] [Accepted: 12/04/2023] [Indexed: 12/19/2023]
Abstract
OBJECTIVE To evaluate the accuracy and reproducibility of ChatGPT's free version answers about endometriosis for the first time. METHODS Detailed internet searches were performed to identify frequently asked questions (FAQs) about endometriosis. Scientific questions were prepared in accordance with the European Society of Human Reproduction and Embryology (ESHRE) endometriosis guidelines. An experienced gynecologist gave a score of 1-4 for each ChatGPT answer. The repeatability of ChatGPT answers about endometriosis was analyzed by asking each question twice, and reproducibility was defined as both answers to the same question falling into the same score category. RESULTS A total of 91.4% (n = 71) of all FAQs were answered completely, accurately, and sufficiently. ChatGPT had the highest accuracy in the symptom and diagnosis category (94.1%, 16/17 questions) and the lowest accuracy in the treatment category (81.3%, 13/16 questions). Furthermore, of the 40 questions based on the ESHRE endometriosis guidelines, 27 (67.5%) were classified as grade 1, seven (17.5%) as grade 2, and six (15.0%) as grade 3. The reproducibility rate of FAQs in the prevention, symptoms and diagnosis, and complications categories was the highest (100% for all categories). The reproducibility rate was the lowest for questions based on the ESHRE endometriosis guidelines (70.0%). CONCLUSION ChatGPT accurately and satisfactorily responded to more than 90% of the questions about endometriosis, but to only 67.5% of questions based on the ESHRE endometriosis guidelines.
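As a minimal illustration of the reproducibility definition used above (the two repeated answers to a question falling into the same score category), the toy scores below are invented, not the study's data.

```python
# Hypothetical 1-4 grades for ten questions, each asked twice.
first_ask  = [1, 1, 2, 1, 3, 1, 2, 1, 1, 4]
second_ask = [1, 2, 2, 1, 3, 1, 1, 1, 1, 4]

# A question is "reproducible" if both asks received the same grade category.
reproducible = sum(a == b for a, b in zip(first_ask, second_ask))
rate = reproducible / len(first_ask)
print(f"Reproducibility: {rate:.0%}")   # 80% in this toy example
```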
Collapse
Affiliation(s)
- Bahar Yuksel Ozgor
- Department of Obstetrics and Gynecology, Biruni University, Istanbul, Turkey
- Endometriosis Research and Support Organization (Endo Türkiye), Istanbul, Turkey
| | - Melek Azade Simavi
- Endometriosis Research and Support Organization (Endo Türkiye), Istanbul, Turkey
| |
Collapse
|
26
|
Coleman MC, Moore JN. Two artificial intelligence models underperform on examinations in a veterinary curriculum. J Am Vet Med Assoc 2024; 262:692-697. [PMID: 38382193 DOI: 10.2460/javma.23.12.0666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Accepted: 01/08/2024] [Indexed: 02/23/2024]
Abstract
OBJECTIVE Advancements in artificial intelligence (AI) and large language models have rapidly generated new possibilities for education and knowledge dissemination in various domains. Currently, our understanding of the knowledge of these models, such as ChatGPT, in the medical and veterinary sciences is in its nascent stage. Educators are faced with an urgent need to better understand these models in order to unleash student potential, promote responsible use, and align AI models with educational goals and learning objectives. The objectives of this study were to evaluate the knowledge level and consistency of responses of 2 platforms of ChatGPT, namely GPT-3.5 and GPT-4.0. SAMPLE A total of 495 multiple-choice and true/false questions from 15 courses used in the assessment of third-year veterinary students at a single veterinary institution were included in this study. METHODS The questions were manually entered 3 times into each platform, and answers were recorded. These answers were then compared against those provided by the faculty members coordinating the courses. RESULTS GPT-3.5 achieved an overall performance score of 55%, whereas GPT-4.0 had a significantly (P < .05) greater performance score of 77%. Importantly, the performance scores of both platforms were significantly (P < .05) below that of the veterinary students (86%). CLINICAL RELEVANCE Findings of this study suggested that veterinary educators and veterinary students retrieving information from these AI-based platforms should do so with caution.
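A hedged sketch of how such accuracy proportions can be compared with a chi-squared test; the correct/incorrect counts below are illustrative values scaled from the reported 55% and 77% accuracy on 495 questions, not the authors' raw data or exact analysis.

```python
from scipy.stats import chi2_contingency

#              correct  incorrect
gpt35 = [272, 223]   # roughly 55% of 495 questions
gpt4  = [381, 114]   # roughly 77% of 495 questions

# 2x2 contingency test of correct vs. incorrect answers by platform.
chi2, p, dof, expected = chi2_contingency([gpt35, gpt4])
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```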
Collapse
|
27
|
Ferdush J, Begum M, Hossain ST. ChatGPT and Clinical Decision Support: Scope, Application, and Limitations. Ann Biomed Eng 2024; 52:1119-1124. [PMID: 37516680 DOI: 10.1007/s10439-023-03329-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2023] [Accepted: 07/18/2023] [Indexed: 07/31/2023]
Abstract
This study examines ChatGPT's role in clinical decision support, by analyzing its scope, application, and limitations. By analyzing patient data and providing evidence-based recommendations, ChatGPT, an AI language model, can help healthcare professionals make well-informed decisions. This study examines ChatGPT's use in clinical decision support, including diagnosis and treatment planning. However, it acknowledges limitations like biases, lack of contextual understanding, and human oversight and also proposes a framework for the future clinical decision support system. Understanding these factors will allow healthcare professionals to utilize ChatGPT effectively and make accurate clinical decisions. Further research is needed to understand the implications of using ChatGPT in healthcare settings and to develop safeguards for responsible use.
Collapse
Affiliation(s)
- Jannatul Ferdush
- Department of Computer Science and Engineering, Jashore University of Science and Technology, Jashore, 7408, Bangladesh.
| | - Mahbuba Begum
- Department of Computer Science and Engineering, Mawlana Bhasani Science and Technology, Tangail, 1902, Bangladesh
| | - Sakib Tanvir Hossain
- Department of Mechanical Engineering, Khulna University of Engineering and Technology, Khulna, 9203, Bangladesh
| |
Collapse
|
28
|
Kıyak YS, Coşkun Ö, Budakoğlu Iİ, Uluoğlu C. ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam. Eur J Clin Pharmacol 2024; 80:729-735. [PMID: 38353690 DOI: 10.1007/s00228-024-03649-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Accepted: 02/03/2024] [Indexed: 04/09/2024]
Abstract
PURPOSE Artificial intelligence, specifically large language models such as ChatGPT, offers valuable potential benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels. METHODS This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship carried out based on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel, two of these multiple-choice questions were incorporated into a medical school exam without any changes to the questions. Based on the administration of the test, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and functionality of the options. RESULTS Both questions exhibited acceptable point-biserial correlations (0.41 and 0.39), above the threshold of 0.30. However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants) while the other question had none. CONCLUSIONS The findings showed that the questions can effectively differentiate between students who perform at high and low levels, which also points to the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items in order to enhance the external validity of the results by gathering data from diverse institutions and settings.
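The two classical item statistics named above can be computed as sketched below; the simulated response matrix is an assumption for illustration, and the point-biserial shown is the uncorrected item-total correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 0/1 answer matrix: 99 students x 20 items.
responses = (rng.random((99, 20)) > 0.35).astype(int)

def item_stats(responses, item):
    item_scores = responses[:, item]
    total_scores = responses.sum(axis=1)
    difficulty = item_scores.mean()                      # proportion answering correctly
    point_biserial = np.corrcoef(item_scores, total_scores)[0, 1]  # item-total correlation
    return difficulty, point_biserial

difficulty, rpb = item_stats(responses, item=0)
print(f"difficulty = {difficulty:.2f}, point-biserial = {rpb:.2f} (0.30 is a common threshold)")
```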
Collapse
Affiliation(s)
- Yavuz Selim Kıyak
- Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey.
- Gazi Üniversitesi Hastanesi E Blok 9, Kat 06500 Beşevler, Ankara, Turkey.
| | - Özlem Coşkun
- Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey
| | - Işıl İrem Budakoğlu
- Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey
| | - Canan Uluoğlu
- Department of Medical Pharmacology, Faculty of Medicine, Gazi University, Ankara, Turkey
| |
Collapse
|
29
|
Koga S, Martin NB, Dickson DW. Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders. Brain Pathol 2024; 34:e13207. [PMID: 37553205 PMCID: PMC11006994 DOI: 10.1111/bpa.13207] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Accepted: 07/31/2023] [Indexed: 08/10/2023] Open
Abstract
This study explores the utility of large language models (LLMs), specifically ChatGPT and Google Bard, in predicting neuropathologic diagnoses from clinical summaries. A total of 25 cases of neurodegenerative disorders presented at Mayo Clinic brain bank Clinico-Pathological Conferences were analyzed. The LLMs provided multiple pathologic diagnoses and their rationales, which were compared with the final clinical diagnoses made by physicians. ChatGPT-3.5, ChatGPT-4, and Google Bard correctly made primary diagnoses in 32%, 52%, and 40% of cases, respectively, while correct diagnoses were included in 76%, 84%, and 76% of cases, respectively. These findings highlight the potential of artificial intelligence tools like ChatGPT in neuropathology, suggesting they may facilitate more comprehensive discussions in clinicopathological conferences.
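A minimal sketch of the two accuracy measures reported above, i.e. whether the model's top diagnosis matched the reference and whether the reference appeared anywhere in the differential; the three toy cases and diagnosis labels are invented.

```python
# Each case: a reference diagnosis and the model's ranked differential (hypothetical data).
cases = [
    {"reference": "PSP", "differential": ["PSP", "CBD", "PD"]},
    {"reference": "DLB", "differential": ["AD", "DLB"]},
    {"reference": "MSA", "differential": ["PD", "PSP"]},
]

primary = sum(c["differential"][0] == c["reference"] for c in cases) / len(cases)
included = sum(c["reference"] in c["differential"] for c in cases) / len(cases)
print(f"primary diagnosis correct: {primary:.0%}, included in differential: {included:.0%}")
```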
Collapse
Affiliation(s)
- Shunsuke Koga
- Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA
- Present address:
Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | | | | |
Collapse
|
30
|
Zaboli A, Brigo F, Sibilio S, Mian M, Turcato G. Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage? Am J Emerg Med 2024; 79:44-47. [PMID: 38341993 DOI: 10.1016/j.ajem.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 02/02/2024] [Accepted: 02/04/2024] [Indexed: 02/13/2024] Open
Abstract
INTRODUCTION Chat-GPT is rapidly emerging as a promising and potentially revolutionary tool in medicine. One of its possible applications is the stratification of patients according to the severity of clinical conditions and prognosis during the triage evaluation in the emergency department (ED). METHODS Using a randomly selected sample of 30 vignettes recreated from real clinical cases, we compared the concordance in risk stratification of ED patients between healthcare personnel and Chat-GPT. The concordance was assessed with Cohen's kappa, and the performance was evaluated with the area under the receiver operating characteristic curve (AUROC). Among the outcomes, we considered mortality within 72 h, the need for hospitalization, and the presence of a severe or time-dependent condition. RESULTS The concordance in triage code assignment between triage nurses and Chat-GPT was 0.278 (unweighted Cohen's kappa; 95% confidence interval: 0.231-0.388). For all outcomes, the AUROC values were higher for the triage nurses. The most relevant difference was found in 72-h mortality, where triage nurses showed an AUROC of 0.910 (0.757-1.000) compared to only 0.669 (0.153-1.000) for Chat-GPT. CONCLUSIONS The current level of Chat-GPT reliability is insufficient to make it a valid substitute for the expertise of triage nurses in prioritizing ED patients. Further developments are required to enhance the safety and effectiveness of AI for risk stratification of ED patients.
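A hedged sketch, with invented data, of the two statistics used in this study: unweighted Cohen's kappa for agreement between nurse- and chatbot-assigned triage codes, and AUROC for the 72-hour mortality outcome. The assumption that a higher triage code indicates higher acuity is made only for this toy example and may differ from the study's coding scheme.

```python
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Hypothetical triage codes assigned to ten vignettes (the study used 30).
nurse_codes   = [1, 2, 3, 2, 4, 1, 3, 2, 2, 4]
chatgpt_codes = [1, 3, 3, 2, 3, 2, 3, 1, 2, 4]
print("unweighted kappa:", round(cohen_kappa_score(nurse_codes, chatgpt_codes), 3))

# Hypothetical 72-h mortality outcome; here a higher code is assumed to mean
# higher acuity, so the code itself serves as the risk score for AUROC.
died_72h = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
print("nurse AUROC:  ", round(roc_auc_score(died_72h, nurse_codes), 3))
print("chatbot AUROC:", round(roc_auc_score(died_72h, chatgpt_codes), 3))
```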
Collapse
Affiliation(s)
- Arian Zaboli
- Innovation, Research and Teaching Service (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical Private University (PMU), Bolzano, Italy.
| | - Francesco Brigo
- Innovation, Research and Teaching Service (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical Private University (PMU), Bolzano, Italy
| | - Serena Sibilio
- Department of Emergency Medicine, Hospital of Merano-Meran (SABES-ASDAA), Merano-Meran, Italy; Teaching Hospital of the Paracelsus Medical Private University, Salzburg, Austria
| | - Michael Mian
- Innovation, Research and Teaching Service (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical Private University (PMU), Bolzano, Italy; College of Health Care-Professions Claudiana, Bozen, Italy
| | - Gianni Turcato
- Department of Internal Medicine, Intermediate Care Unit, Hospital Alto Vicentino (AULSS-7), Santorso, Italy
| |
Collapse
|
31
|
Samaan JS, Rajeev N, Ng WH, Srinivasan N, Busam JA, Yeo YH, Samakar K. ChatGPT as a Source of Information for Bariatric Surgery Patients: a Comparative Analysis of Accuracy and Comprehensiveness Between GPT-4 and GPT-3.5. Obes Surg 2024; 34:1987-1989. [PMID: 38564173 PMCID: PMC11031485 DOI: 10.1007/s11695-024-07212-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2023] [Revised: 03/22/2024] [Accepted: 03/28/2024] [Indexed: 04/04/2024]
Affiliation(s)
- Jamil S Samaan
- Karsh Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, 8700 Beverly Blvd, Los Angeles, CA, 90048, USA.
| | - Nithya Rajeev
- Division of Upper GI and General Surgery, Department of Surgery, Keck School of Medicine of USC, Health Care Consultation Center, 1510 San Pablo St #514, Los Angeles, CA, 90033, USA
| | - Wee Han Ng
- Bristol Medical School, University of Bristol, 5 Tyndall Ave, Bristol, BS8 1UD, UK
| | - Nitin Srinivasan
- Division of Upper GI and General Surgery, Department of Surgery, Keck School of Medicine of USC, Health Care Consultation Center, 1510 San Pablo St #514, Los Angeles, CA, 90033, USA
| | - Jonathan A Busam
- Karsh Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, 8700 Beverly Blvd, Los Angeles, CA, 90048, USA
| | - Yee Hui Yeo
- Karsh Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, 8700 Beverly Blvd, Los Angeles, CA, 90048, USA
| | - Kamran Samakar
- Division of Upper GI and General Surgery, Department of Surgery, Keck School of Medicine of USC, Health Care Consultation Center, 1510 San Pablo St #514, Los Angeles, CA, 90033, USA
| |
Collapse
|
32
|
Alessandri-Bonetti M, Liu HY, Giorgino R, Nguyen VT, Egro FM. The First Months of Life of ChatGPT and Its Impact in Healthcare: A Bibliometric Analysis of the Current Literature. Ann Biomed Eng 2024; 52:1107-1110. [PMID: 37482572 DOI: 10.1007/s10439-023-03325-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 07/14/2023] [Indexed: 07/25/2023]
Abstract
We aimed to evaluate current trends and future directions in the field of AI research since ChatGPT was launched. We performed a bibliometric analysis of the literature published during the first 7 months after ChatGPT's introduction, updated to July 1st, 2023. Seven hundred and twenty-four (724) articles were retrieved. This analysis highlights a significant increase in publications exploring ChatGPT use across various medical disciplines, indicating its expanding relevance in healthcare. A declining proportion of studies focusing on ethical considerations was observed. Simultaneously, there was a steady increase in studies focused on the exploration of possible applications of ChatGPT. As ChatGPT applications continue to expand, ongoing vigilance and collaborative efforts to optimize ChatGPT performance are essential in harnessing the benefits while mitigating the risks of AI use in healthcare.
Collapse
Affiliation(s)
- Mario Alessandri-Bonetti
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA, 15213, USA
| | - Hilary Y Liu
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA, 15213, USA
| | - Riccardo Giorgino
- Department of Orthopedics, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
| | - Vu T Nguyen
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA, 15213, USA
| | - Francesco M Egro
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA, 15213, USA.
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 3550 Terrace Street 6B Scaife Hall, Pittsburgh, PA, 15261, USA.
| |
Collapse
|
33
|
Sabour A, Ghassemi F. Methodological issues on precision and prediction value of ChatGPT in emergency department triage decisions. Am J Emerg Med 2024; 79:198-199. [PMID: 38565486 DOI: 10.1016/j.ajem.2024.03.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2024] [Accepted: 03/15/2024] [Indexed: 04/04/2024] Open
Affiliation(s)
- Amirhossein Sabour
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada.
| | - Fariba Ghassemi
- Eye Research Center, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran; Retina and Vitreous Service, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran.
| |
Collapse
|
34
|
Yang J, Ardavanis KS, Slack KE, Fernando ND, Della Valle CJ, Hernandez NM. Chat Generative Pretrained Transformer (ChatGPT) and Bard: Artificial Intelligence Does not yet Provide Clinically Supported Answers for Hip and Knee Osteoarthritis. J Arthroplasty 2024; 39:1184-1190. [PMID: 38237878 DOI: 10.1016/j.arth.2024.01.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/20/2023] [Revised: 01/08/2024] [Accepted: 01/11/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND Advancements in artificial intelligence (AI) have led to the creation of large language models (LLMs), such as Chat Generative Pretrained Transformer (ChatGPT) and Bard, that analyze online resources to synthesize responses to user queries. Despite their popularity, the accuracy of LLM responses to medical questions remains unknown. This study aimed to compare the responses of ChatGPT and Bard regarding treatments for hip and knee osteoarthritis with the American Academy of Orthopaedic Surgeons (AAOS) Evidence-Based Clinical Practice Guidelines (CPGs) recommendations. METHODS Both ChatGPT (Open AI) and Bard (Google) were queried regarding 20 treatments (10 for hip and 10 for knee osteoarthritis) from the AAOS CPGs. Responses were classified by 2 reviewers as being in "Concordance," "Discordance," or "No Concordance" with AAOS CPGs. A Cohen's Kappa coefficient was used to assess inter-rater reliability, and Chi-squared analyses were used to compare responses between LLMs. RESULTS Overall, ChatGPT and Bard provided responses that were concordant with the AAOS CPGs for 16 (80%) and 12 (60%) treatments, respectively. Notably, ChatGPT and Bard encouraged the use of non-recommended treatments in 30% and 60% of queries, respectively. There were no differences in performance when evaluating by joint or by recommended versus non-recommended treatments. Studies were referenced in 6 (30%) of the Bard responses and none (0%) of the ChatGPT responses. Of the 6 Bard responses, studies could only be identified for 1 (16.7%). Of the remaining, 2 (33.3%) responses cited studies in journals that did not exist, 2 (33.3%) cited studies that could not be found with the information given, and 1 (16.7%) provided links to unrelated studies. CONCLUSIONS Both ChatGPT and Bard do not consistently provide responses that align with the AAOS CPGs. Consequently, physicians and patients should temper expectations on the guidance AI platforms can currently provide.
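Concordance rates like those above can be compared as shown in the sketch below; note that the study reports chi-squared analyses, whereas this example uses Fisher's exact test as a small-sample alternative, with the 16/20 and 12/20 counts taken from the abstract.

```python
from scipy.stats import fisher_exact

#                concordant  not concordant (with AAOS CPGs)
chatgpt = [16, 4]   # 80% of 20 treatments
bard    = [12, 8]   # 60% of 20 treatments

# Fisher's exact test on the 2x2 table of concordance by chatbot.
odds_ratio, p = fisher_exact([chatgpt, bard])
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.2f}")
```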
Collapse
Affiliation(s)
- JaeWon Yang
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
| | - Kyle S Ardavanis
- Department of Orthopaedic Surgery, Madigan Medical Center, Tacoma, Washington
| | - Katherine E Slack
- Elson S. Floyd College of Medicine, Washington State University, Spokane, Washington
| | - Navin D Fernando
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
| | - Craig J Della Valle
- Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, Illinois
| | - Nicholas M Hernandez
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
| |
Collapse
|
35
|
Dahri NA, Yahaya N, Al-Rahmi WM, Aldraiweesh A, Alturki U, Almutairy S, Shutaleva A, Soomro RB. Extended TAM based acceptance of AI-Powered ChatGPT for supporting metacognitive self-regulated learning in education: A mixed-methods study. Heliyon 2024; 10:e29317. [PMID: 38628736 PMCID: PMC11016976 DOI: 10.1016/j.heliyon.2024.e29317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 04/02/2024] [Accepted: 04/04/2024] [Indexed: 04/19/2024] Open
Abstract
This mixed-method study explores the acceptance of ChatGPT as a tool for Metacognitive Self-Regulated Learning (MSRL) among academics. Despite the growing attention towards ChatGPT as a metacognitive learning tool, there is a need for a comprehensive understanding of the factors influencing its acceptance in academic settings. Engaging 300 preservice teachers through a ChatGPT-based scenario learning activity and utilizing convenience sampling, this study administered a questionnaire based on the proposed Technology Acceptance Model at UTM University's School of Education. Structural equation modelling was applied to analyze participants' perspectives on ChatGPT, considering factors like MSRL's impact on usage intention. Post-reflection sessions, semi-structured interviews, and record analysis were conducted to gather results. Findings indicate a high acceptance of ChatGPT, significantly influenced by personal competency, social influence, perceived AI usefulness, enjoyment, trust, AI intelligence, positive attitude, and metacognitive self-regulated learning. Interviews and record analysis suggest that academics view ChatGPT positively as an educational tool, seeing it as a solution to challenges in teaching and learning processes. The study highlights ChatGPT's potential to enhance MSRL and holds implications for teacher education and AI integration in educational settings.
Collapse
Affiliation(s)
- Nisar Ahmed Dahri
- Faculty of Social Sciences and Humanities, School of Education, University Teknologi Malaysia, UTM Sukadi, Johor, 81310, Malaysia
| | - Noraffandy Yahaya
- Faculty of Social Sciences and Humanities, School of Education, University Teknologi Malaysia, UTM Sukadi, Johor, 81310, Malaysia
| | - Waleed Mugahed Al-Rahmi
- Faculty of Social Sciences and Humanities, School of Education, University Teknologi Malaysia, UTM Sukadi, Johor, 81310, Malaysia
| | - Ahmed Aldraiweesh
- Educational Technology Department, College of Education, King Saud University, P.O. Box 21501, Riyadh, 11485, Saudi Arabia
| | - Uthman Alturki
- Educational Technology Department, College of Education, King Saud University, P.O. Box 21501, Riyadh, 11485, Saudi Arabia
| | - Sultan Almutairy
- Educational Technology Department, College of Education, King Saud University, P.O. Box 21501, Riyadh, 11485, Saudi Arabia
| | - Anna Shutaleva
- Ural Federal University Named After the First President of Russia B. N. Yeltsin, 620002, Ekaterinburg, Russia
| | - Rahim Bux Soomro
- Institute of Business Administration, Shah Abdul Latif University, Khairpur, Pakistan
| |
Collapse
|
36
|
Choudhury A, Chaudhry Z. Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals. J Med Internet Res 2024; 26:e56764. [PMID: 38662419 DOI: 10.2196/56764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/26/2024] Open
Abstract
As the health care industry increasingly embraces large language models (LLMs), understanding the consequence of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)-generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs' self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining a critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers' diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined. We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their maintainable and responsible use in the future.
Collapse
Affiliation(s)
- Avishek Choudhury
- Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
| | - Zaira Chaudhry
- Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
| |
Collapse
|
37
|
Carobene A, Padoan A, Cabitza F, Banfi G, Plebani M. Rising adoption of artificial intelligence in scientific publishing: evaluating the role, risks, and ethical implications in paper drafting and review process. Clin Chem Lab Med 2024; 62:835-843. [PMID: 38019961 DOI: 10.1515/cclm-2023-1136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND In the rapidly evolving landscape of artificial intelligence (AI), scientific publishing is experiencing significant transformations. AI tools, while offering unparalleled efficiencies in paper drafting and peer review, also introduce notable ethical concerns. CONTENT This study delineates AI's dual role in scientific publishing: as a co-creator in the writing and review of scientific papers and as an ethical challenge. We first explore the potential of AI as an enhancer of efficiency, efficacy, and quality in creating scientific papers. A critical assessment follows, evaluating the risks vs. rewards for researchers, especially those early in their careers, emphasizing the need to maintain a balance between AI's capabilities and fostering independent reasoning and creativity. Subsequently, we delve into the ethical dilemmas of AI's involvement, particularly concerning originality, plagiarism, and preserving the genuine essence of scientific discourse. The evolving dynamics further highlight an overlooked aspect: the inadequate recognition of human reviewers in the academic community. With the increasing volume of scientific literature, tangible metrics and incentives for reviewers are proposed as essential to ensure a balanced academic environment. SUMMARY AI's incorporation in scientific publishing is promising yet comes with significant ethical and operational challenges. The role of human reviewers is accentuated, ensuring authenticity in an AI-influenced environment. OUTLOOK As the scientific community treads the path of AI integration, a balanced symbiosis between AI's efficiency and human discernment is pivotal. Emphasizing human expertise, while exploiting artificial intelligence responsibly, will determine the trajectory of an ethically sound and efficient AI-augmented future in scientific publishing.
Collapse
Affiliation(s)
- Anna Carobene
- Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Andrea Padoan
- Department of Medicine-DIMED, University of Padova, Padova, Italy
- Laboratory Medicine Unit, University Hospital of Padova, Padova, Italy
| | - Federico Cabitza
- DISCo, Università Degli Studi di Milano-Bicocca, Milan, Italy
- IRCCS Ospedale Galeazzi - Sant'Ambrogio, Milan, Italy
| | - Giuseppe Banfi
- IRCCS Ospedale Galeazzi - Sant'Ambrogio, Milan, Italy
- University Vita-Salute San Raffaele, Milan, Italy
| | - Mario Plebani
- Laboratory Medicine Unit, University Hospital of Padova, Padova, Italy
- University of Padova, Padova, Italy
| |
Collapse
|
38
|
Shin E, Yu Y, Bies RR, Ramanathan M. Evaluation of ChatGPT and Gemini large language models for pharmacometrics with NONMEM. J Pharmacokinet Pharmacodyn 2024:10.1007/s10928-024-09921-y. [PMID: 38656706 DOI: 10.1007/s10928-024-09921-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2024] [Accepted: 04/16/2024] [Indexed: 04/26/2024]
Abstract
To assess ChatGPT 4.0 (ChatGPT) and Gemini Ultra 1.0 (Gemini) large language models on NONMEM coding tasks relevant to pharmacometrics and clinical pharmacology. ChatGPT and Gemini were assessed on tasks mimicking real-world applications of NONMEM. The tasks ranged from providing a curriculum for learning NONMEM and an overview of NONMEM code structure to generating code. We investigated lay-language prompts to elicit NONMEM code for a linear pharmacokinetic (PK) model with oral administration and for a more complex model with two parallel first-order absorption mechanisms. Reproducibility and the impact of "temperature" hyperparameter settings were assessed. The code was reviewed by two NONMEM experts. ChatGPT and Gemini provided NONMEM curriculum structures combining foundational knowledge with advanced concepts (e.g., covariate modeling and Bayesian approaches) and practical skills including NONMEM code structure and syntax. ChatGPT provided an informative summary of the NONMEM control stream structure and outlined the key NONMEM Translator (NM-TRAN) records needed. ChatGPT and Gemini were able to generate code blocks for the NONMEM control stream from the lay-language prompts for the two coding tasks. The control streams contained focal structural and syntax errors that required revision before they could be executed without errors and warnings. The code output from ChatGPT and Gemini was not reproducible, and varying the temperature hyperparameter did not reduce the errors and omissions substantively. Large language models may be useful in pharmacometrics for efficiently generating an initial coding template for modeling projects. However, the output can contain errors and omissions that require correction.
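As a loose illustration of what "key NM-TRAN records" means, the sketch below checks a control stream for the usual records. The embedded one-compartment oral-absorption control stream was written for this sketch; it is neither model output from the study nor the reviewers' checklist.

```python
# Illustrative completeness check only (not the study's review procedure).
REQUIRED_RECORDS = ["$PROBLEM", "$DATA", "$INPUT", "$SUBROUTINES",
                    "$PK", "$ERROR", "$THETA", "$OMEGA", "$SIGMA", "$ESTIMATION"]

control_stream = """
$PROBLEM One-compartment model, first-order oral absorption
$DATA data.csv IGNORE=@
$INPUT ID TIME AMT DV MDV
$SUBROUTINES ADVAN2 TRANS2
$PK
  CL = THETA(1)*EXP(ETA(1))
  V  = THETA(2)*EXP(ETA(2))
  KA = THETA(3)*EXP(ETA(3))
  S2 = V
$ERROR
  Y = F*(1 + EPS(1))
$THETA (0, 5) (0, 50) (0, 1)
$OMEGA 0.1 0.1 0.1
$SIGMA 0.04
$ESTIMATION METHOD=1 INTER MAXEVAL=9999
"""

missing = [record for record in REQUIRED_RECORDS if record not in control_stream]
print("missing records:", missing or "none")
```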
Collapse
Affiliation(s)
- Euibeom Shin
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, 14214-8033, USA
| | - Yifan Yu
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, 14214-8033, USA
| | - Robert R Bies
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, 14214-8033, USA
| | - Murali Ramanathan
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, 14214-8033, USA.
| |
Collapse
|
39
|
Ostrowska M, Kacała P, Onolememen D, Vaughan-Lane K, Sisily Joseph A, Ostrowski A, Pietruszewska W, Banaszewski J, Wróbel MJ. To trust or not to trust: evaluating the reliability and safety of AI responses to laryngeal cancer queries. Eur Arch Otorhinolaryngol 2024:10.1007/s00405-024-08643-8. [PMID: 38652298 DOI: 10.1007/s00405-024-08643-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2024] [Accepted: 03/26/2024] [Indexed: 04/25/2024]
Abstract
PURPOSE As online health information-seeking surges, concerns mount over the quality and safety of accessible content, potentially leading to patient harm through misinformation. On one hand, the emergence of Artificial Intelligence (AI) in healthcare could prevent it; on the other hand, questions arise regarding the quality and safety of the medical information provided. As laryngeal cancer is a prevalent head and neck malignancy, this study aims to evaluate the utility and safety of three large language models (LLMs) as sources of patient information about laryngeal cancer. METHODS A cross-sectional study was conducted using three LLMs (ChatGPT 3.5, ChatGPT 4.0, and Bard). A questionnaire comprising 36 inquiries about laryngeal cancer was categorised into diagnosis (11 questions), treatment (9 questions), novelties and upcoming treatments (4 questions), controversies (8 questions), and sources of information (4 questions). The reviewers comprised 3 groups: ENT specialists, junior physicians, and non-medical reviewers, who graded the responses. Each physician evaluated each question twice for each model, while non-medical reviewers evaluated each only once. Everyone was blinded to the model type, and the question order was shuffled. Outcome evaluations were based on a safety score (1-3) and a Global Quality Score (GQS, 1-5). Results were compared between LLMs. The study included iterative assessments and statistical validations. RESULTS Analysis revealed that ChatGPT 3.5 scored highest in both safety (mean: 2.70) and GQS (mean: 3.95). ChatGPT 4.0 and Bard had lower safety scores of 2.56 and 2.42, respectively, with corresponding quality scores of 3.65 and 3.38. Inter-rater reliability was consistent, with less than 3% discrepancy. About 4.2% of responses fell into the lowest safety category (1), particularly in the novelty category. Non-medical reviewers' quality assessments correlated moderately (r = 0.67) with response length. CONCLUSIONS LLMs can be valuable resources for patients seeking information on laryngeal cancer. ChatGPT 3.5 provided the most reliable and safe responses among the models evaluated.
Collapse
Affiliation(s)
- Magdalena Ostrowska
- Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Paulina Kacała
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Deborah Onolememen
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Katie Vaughan-Lane
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland.
| | - Anitta Sisily Joseph
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Adam Ostrowski
- Department of Urology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Wioletta Pietruszewska
- Department of Otolaryngology, Laryngological Oncology, Audiology and Phoniatrics, Medical University of Lodz, ul Żeromskiego 113, 90-549, Lodz, Poland
| | - Jacek Banaszewski
- Department of Otolaryngology, Head and Neck Oncology, Poznan University of Medical Science, ul Przybyszewskiego 49, 60-355, Poznań, Poland
| | - Maciej J Wróbel
- Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| |
Collapse
|
40
|
Pinto VBP, Gomes CM. Insights and future directions for ChatGPT in medical practice: Addressing comments on our study. Neurourol Urodyn 2024. [PMID: 38651742 DOI: 10.1002/nau.25479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 04/25/2024]
Affiliation(s)
- Vicktor B P Pinto
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| | - Cristiano M Gomes
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| |
Collapse
|
41
|
Moulin TC. Learning with AI Language Models: Guidelines for the Development and Scoring of Medical Questions for Higher Education. J Med Syst 2024; 48:45. [PMID: 38652327 DOI: 10.1007/s10916-024-02069-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Accepted: 04/11/2024] [Indexed: 04/25/2024]
Abstract
In medical and biomedical education, traditional teaching methods often struggle to engage students and promote critical thinking. The use of AI language models has the potential to transform teaching and learning practices by offering an innovative, active learning approach that promotes intellectual curiosity and deeper understanding. To effectively integrate AI language models into biomedical education, it is essential for educators to understand the benefits and limitations of these tools and how they can be employed to achieve high-level learning outcomes. This article explores the use of AI language models in biomedical education, focusing on their application in both classroom teaching and learning assignments. Using the SOLO taxonomy as a framework, I discuss strategies for designing questions that challenge students to exercise critical thinking and problem-solving skills, even when assisted by AI models. Additionally, I propose a scoring rubric for evaluating student performance when collaborating with AI language models, ensuring a comprehensive assessment of their learning outcomes. AI language models offer a promising opportunity for enhancing student engagement and promoting active learning in the biomedical field. Understanding the potential use of these technologies allows educators to create learning experiences that are fit for their students' needs, encouraging intellectual curiosity and a deeper understanding of complex subjects. The application of these tools will be fundamental to provide more effective and engaging learning experiences for students in the future.
Collapse
Affiliation(s)
- Thiago C Moulin
- Department of Experimental Medical Science, Lund University, Lund, Sweden.
- Department of Surgical Sciences, Uppsala University, Uppsala, Sweden.
| |
Collapse
|
42
|
Jagiella-Lodise O, Suh N, Zelenski NA. Can Patients Rely on ChatGPT to Answer Hand Pathology-Related Medical Questions? Hand (N Y) 2024:15589447241247246. [PMID: 38654498 DOI: 10.1177/15589447241247246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/26/2024]
Abstract
BACKGROUND In recent years, ChatGPT has become a popular source of information online. Physicians need to be aware of the resources their patients are using to self-inform of their conditions. This study investigates physician-graded accuracy and completeness of ChatGPT regarding various questions patients are likely to ask the artificial intelligence (AI) system concerning common upper limb orthopedic conditions. METHODS ChatGPT 3.5 was interrogated concerning 5 common orthopedic hand conditions: carpal tunnel syndrome, Dupuytren contracture, De Quervain tenosynovitis, trigger finger, and carpal metacarpal arthritis. Questions evaluated conditions' symptoms, pathology, management, surgical indications, recovery time, insurance coverage, and workers' compensation possibility. Each topic had 12 to 15 questions and was established as its own ChatGPT conversation. All questions regarding the same diagnosis were presented to the AI, and its answers were recorded. Each question was then graded for both accuracy (Likert scale of 1-6) and completeness (Likert scale of 1-3) by 10 fellowship-trained hand surgeons. Descriptive statistics were performed. RESULTS Overall, the mean accuracy score for ChatGPT's answers to common orthopedic hand diagnoses was 4.83 out of 6 ± 0.95. The mean completeness of answers was 2 out of 3 ± 0.59. CONCLUSIONS Easily accessible online AI such as ChatGPT is becoming more advanced and thus more reliable in its ability to answer common medical questions. Physicians can anticipate such online resources being mostly correct, though often incomplete. Patients should beware of relying on such resources in isolation.
Collapse
Affiliation(s)
| | - Nina Suh
- Division of Hand Surgery, Department of Orthopaedic Surgery, Emory University, Atlanta, GA, USA
| | - Nicole A Zelenski
- Division of Hand Surgery, Department of Orthopaedic Surgery, Emory University, Atlanta, GA, USA
| |
Collapse
|
43
|
Tsai CY, Hsieh SJ, Huang HH, Deng JH, Huang YY, Cheng PY. Performance of ChatGPT on the Taiwan urology board examination: insights into current strengths and shortcomings. World J Urol 2024; 42:250. [PMID: 38652322 DOI: 10.1007/s00345-024-04957-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Accepted: 03/25/2024] [Indexed: 04/25/2024] Open
Abstract
PURPOSE To compare ChatGPT-4 and ChatGPT-3.5's performance on the Taiwan urology board examination (TUBE), focusing on answer accuracy, explanation consistency, and uncertainty management tactics to minimize score penalties from incorrect responses across 12 urology domains. METHODS 450 multiple-choice questions from TUBE (2020-2022) were presented to the two models. Three urologists assessed correctness and consistency of each response. Accuracy quantifies the proportion of correct answers; consistency assesses the logic and coherence of explanations across all responses. A penalty-reduction experiment with prompt variations was also conducted. Univariate logistic regression was applied for subgroup comparison. RESULTS ChatGPT-4 showed strengths in urology, achieving an overall accuracy of 57.8%, with annual accuracies of 64.7% (2020), 58.0% (2021), and 50.7% (2022), significantly surpassing ChatGPT-3.5 (33.8%, OR = 2.68, 95% CI [2.05-3.52]). It could have passed the TUBE written exams if scored solely on accuracy but failed in the final score due to penalties. ChatGPT-4 displayed a declining accuracy trend over time. Variability in accuracy across 12 urological domains was noted, with more frequently updated knowledge domains showing lower accuracy (53.2% vs. 62.2%, OR = 0.69, p = 0.05). A high consistency rate of 91.6% in explanations across all domains indicates reliable delivery of coherent and logical information. The simple prompt outperformed strategy-based prompts in accuracy (60% vs. 40%, p = 0.016), highlighting ChatGPT's inability to accurately self-assess uncertainty and its tendency towards overconfidence, which may hinder medical decision-making. CONCLUSIONS ChatGPT-4's high accuracy and consistent explanations in the urology board examination demonstrate its potential in medical information processing. However, its limitations in self-assessment and overconfidence necessitate caution in its application, especially for inexperienced users. These insights call for ongoing advancements of urology-specific AI tools.
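For a binary predictor such as model version, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2x2 table, as sketched below. The counts are back-calculated from the reported accuracies (57.8% and 33.8% of 450 questions) and happen to reproduce the reported OR of 2.68 with 95% CI 2.05-3.52, but they remain an assumption rather than the authors' published table.

```python
import math

# Assumed counts: correct / incorrect answers out of 450 questions per model.
a, b = 260, 190   # ChatGPT-4:   ~57.8% correct
c, d = 152, 298   # ChatGPT-3.5: ~33.8% correct

or_ = (a * d) / (b * c)                              # cross-product odds ratio
se = math.sqrt(1/a + 1/b + 1/c + 1/d)                # SE of log(OR)
lo = math.exp(math.log(or_) - 1.96 * se)
hi = math.exp(math.log(or_) + 1.96 * se)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```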
Collapse
Affiliation(s)
- Chung-You Tsai
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Shang-Ju Hsieh
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
| | - Hung-Hsiang Huang
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
| | - Juinn-Horng Deng
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Yi-You Huang
- Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
| | - Pai-Yu Cheng
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan.
- Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan.
| |
Collapse
|
44
|
Mishra V, Sarraju A, Kalwani NM, Dexter JP. Evaluation of Prompts to Simplify Cardiovascular Disease Information Generated Using a Large Language Model: Cross-Sectional Study. J Med Internet Res 2024; 26:e55388. [PMID: 38648104 DOI: 10.2196/55388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 04/25/2024] Open
Abstract
In this cross-sectional study, we evaluated the completeness, readability, and syntactic complexity of cardiovascular disease prevention information produced by GPT-4 in response to 4 kinds of prompts.
Collapse
Affiliation(s)
- Vishala Mishra
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, United States
| | - Ashish Sarraju
- Department of Cardiovascular Medicine, Cleveland Clinic, Cleveland, OH, United States
| | - Neil M Kalwani
- Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, United States
- Division of Cardiovascular Medicine and the Cardiovascular Institute, Department of Medicine, Stanford University School of Medicine, Stanford, CA, United States
| | - Joseph P Dexter
- Data Science Initiative, Harvard University, Allston, MA, United States
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, United States
- Institute of Collaborative Innovation, University of Macau, Taipa, Macao
| |
Collapse
|
45
|
Kernberg A, Gold JA, Mohan V. Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study. J Med Internet Res 2024; 26:e54419. [PMID: 38648636 DOI: 10.2196/54419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Revised: 02/20/2024] [Accepted: 03/10/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND Medical documentation plays a crucial role in clinical practice, facilitating accurate patient management and communication among health care professionals. However, inaccuracies in medical notes can lead to miscommunication and diagnostic errors. Additionally, the demands of documentation contribute to physician burnout. Although intermediaries like medical scribes and speech recognition software have been used to ease this burden, they have limitations in terms of accuracy and addressing provider-specific metrics. The integration of ambient artificial intelligence (AI)-powered solutions offers a promising way to improve documentation while fitting seamlessly into existing workflows. OBJECTIVE This study aims to assess the accuracy and quality of Subjective, Objective, Assessment, and Plan (SOAP) notes generated by ChatGPT-4, an AI model, using established transcripts of History and Physical Examination as the gold standard. We seek to identify potential errors and evaluate the model's performance across different categories. METHODS We conducted simulated patient-provider encounters representing various ambulatory specialties and transcribed the audio files. Key reportable elements were identified, and ChatGPT-4 was used to generate SOAP notes based on these transcripts. Three versions of each note were created and compared to the gold standard via chart review; errors generated from the comparison were categorized as omissions, incorrect information, or additions. We compared the accuracy of data elements across versions, transcript length, and data categories. Additionally, we assessed note quality using the Physician Documentation Quality Instrument (PDQI) scoring system. RESULTS Although ChatGPT-4 consistently generated SOAP-style notes, there were, on average, 23.6 errors per clinical case, with errors of omission (86%) being the most common, followed by addition errors (10.5%) and inclusion of incorrect facts (3.2%). There was significant variance between replicates of the same case, with only 52.9% of data elements reported correctly across all 3 replicates. The accuracy of data elements varied across cases, with the highest accuracy observed in the "Objective" section. Consequently, the measure of note quality, assessed by PDQI, demonstrated intra- and intercase variance. Finally, the accuracy of ChatGPT-4 was inversely correlated to both the transcript length (P=.05) and the number of scorable data elements (P=.05). CONCLUSIONS Our study reveals substantial variability in errors, accuracy, and note quality generated by ChatGPT-4. Errors were not limited to specific sections, and the inconsistency in error types across replicates complicated predictability. Transcript length and data complexity were inversely correlated with note accuracy, raising concerns about the model's effectiveness in handling complex medical cases. The quality and reliability of clinical notes produced by ChatGPT-4 do not meet the standards required for clinical use. Although AI holds promise in health care, caution should be exercised before widespread adoption. Further research is needed to address accuracy, variability, and potential errors. ChatGPT-4, while valuable in various applications, should not be considered a safe alternative to human-generated clinical documentation at this time.
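A minimal Python sketch of the replicate-versus-gold-standard comparison described here is shown below; the clinical elements and the three replicate outputs are invented for illustration, and the study's chart-review process was naturally more detailed.

```python
# Sketch: tallying omission and addition errors for three AI-generated note
# replicates against a gold-standard list of reportable elements.
from collections import Counter

gold_elements = {"chest pain onset", "aspirin allergy", "BP 142/90", "smoker"}
replicates = [
    {"chest pain onset", "BP 142/90", "smoker"},                         # omits aspirin allergy
    {"chest pain onset", "aspirin allergy", "BP 142/90", "statin use"},   # omits smoker, adds statin use
    {"chest pain onset", "aspirin allergy", "BP 142/90", "smoker"},       # matches the gold standard
]

errors = Counter()
for rep in replicates:
    errors["omission"] += len(gold_elements - rep)   # gold elements missing from the note
    errors["addition"] += len(rep - gold_elements)   # elements not in the gold standard

# Elements reported correctly in every replicate
always_correct = set.intersection(*replicates) & gold_elements
print(dict(errors))
print(f"{100 * len(always_correct) / len(gold_elements):.0f}% of elements correct across all replicates")
```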
Collapse
Affiliation(s)
- Annessa Kernberg
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
| | - Jeffrey A Gold
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
| | - Vishnu Mohan
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
| |
Collapse
|
46
|
Pham C, Govender R, Tehami S, Chavez S, Adepoju OE, Liaw W. ChatGPT's Performance in Cardiac Arrest and Bradycardia Simulations Using the American Heart Association's Advanced Cardiovascular Life Support Guidelines: Exploratory Study. J Med Internet Res 2024; 26:e55037. [PMID: 38648098 DOI: 10.2196/55037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Revised: 02/22/2024] [Accepted: 03/10/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND ChatGPT is the most advanced large language model to date, with prior iterations having passed medical licensing examinations, provided clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT's performance found that artificial intelligence could pass the American Heart Association's advanced cardiovascular life support (ACLS) examinations with modifications. ChatGPT's accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical. OBJECTIVE This study aims to examine the accuracy of ChatGPT in following ACLS guidelines for bradycardia and cardiac arrest. METHODS We evaluated the accuracy of ChatGPT's responses to 2 simulations based on the 2020 American Heart Association ACLS guidelines, with 3 primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT was scored for correctness (1 point) or incorrectness (0 points). Each simulation was conducted 20 times. RESULTS ChatGPT's median accuracy for each step was 85% (IQR 40%-100%) for cardiac arrest and 30% (IQR 13%-81%) for bradycardia. ChatGPT's median accuracy over 20 simulation attempts was 69% (IQR 67%-74%) for cardiac arrest and 42% (IQR 33%-50%) for bradycardia. We found that ChatGPT's outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented. CONCLUSIONS This study highlights the need for consistent and reliable guidance to prevent potential medical errors and to optimize the application of ChatGPT, in order to enhance its reliability and effectiveness in clinical practice.
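The step-level scoring and median/IQR summary described in the methods can be reproduced along the following lines; the run-by-step matrix is invented and numpy is assumed, so this is an illustrative sketch rather than the study's analysis code.

```python
# Sketch: scoring each algorithm step 1/0 across repeated simulation runs
# and summarising per-attempt accuracy with median and IQR.
import numpy as np

# rows = simulation attempts, columns = ACLS algorithm steps (1 = correct, 0 = incorrect)
runs = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
])

attempt_scores = 100 * runs.mean(axis=1)   # accuracy per simulation attempt
step_scores = 100 * runs.mean(axis=0)      # accuracy per individual step
q1, med, q3 = np.percentile(attempt_scores, [25, 50, 75])
print(f"per-attempt accuracy: median {med:.0f}% (IQR {q1:.0f}%-{q3:.0f}%)")
print("per-step accuracy:", step_scores)
```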
Collapse
Affiliation(s)
- Cecilia Pham
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
| | - Romi Govender
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
| | - Salik Tehami
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
| | - Summer Chavez
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
- Humana Integrated Health Sciences Institute, University of Houston, Houston, TX, United States
- Department of Health Systems and Population Health Sciences, Tilman J Fertitta Family College of Medicine, Houston, TX, United States
| | - Omolola E Adepoju
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
- Humana Integrated Health Sciences Institute, University of Houston, Houston, TX, United States
- Department of Health Systems and Population Health Sciences, Tilman J Fertitta Family College of Medicine, Houston, TX, United States
| | - Winston Liaw
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
- Humana Integrated Health Sciences Institute, University of Houston, Houston, TX, United States
- Department of Health Systems and Population Health Sciences, Tilman J Fertitta Family College of Medicine, Houston, TX, United States
| |
Collapse
|
47
|
Tessler I, Wolfovitz A, Alon EE, Gecel NA, Livneh N, Zimlichman E, Klang E. ChatGPT's adherence to otolaryngology clinical practice guidelines. Eur Arch Otorhinolaryngol 2024:10.1007/s00405-024-08634-9. [PMID: 38647684 DOI: 10.1007/s00405-024-08634-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
OBJECTIVES Large language models, including ChatGPT, have the potential to transform the way we approach medical knowledge, yet accuracy in clinical topics is critical. Here we assessed ChatGPT's performance in adhering to the American Academy of Otolaryngology-Head and Neck Surgery guidelines. METHODS We presented ChatGPT with 24 clinical otolaryngology questions based on the guidelines of the American Academy of Otolaryngology. This was done three times (N = 72) to test the model's consistency. Two otolaryngologists evaluated the responses for accuracy and relevance to the guidelines. Cohen's kappa was used to measure evaluator agreement, and Cronbach's alpha assessed the consistency of ChatGPT's responses. RESULTS The study revealed mixed results; 59.7% (43/72) of ChatGPT's responses were highly accurate, while only 2.8% (2/72) directly contradicted the guidelines. The model showed 100% accuracy in Head and Neck, but lower accuracy in Rhinology and Otology/Neurotology (66%), Laryngology (50%), and Pediatrics (8%). The model's responses were consistent for 17 of 24 questions (70.8%), with a Cronbach's alpha of 0.87, indicating reasonable consistency across tests. CONCLUSIONS Using a guideline-based set of structured questions, ChatGPT demonstrates consistency but variable accuracy in otolaryngology. Its lower performance in some areas, especially Pediatrics, suggests that further rigorous evaluation is needed before considering real-world clinical use.
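Both agreement statistics named in the methods are standard and straightforward to compute. The Python sketch below shows Cohen's kappa via scikit-learn and a hand-rolled Cronbach's alpha; the ratings and the 0/1/2 scale are invented purely for illustration and are not the study's data.

```python
# Sketch: evaluator agreement (Cohen's kappa) and response consistency
# (Cronbach's alpha) for repeated gradings of the same questions.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings: 0 = contradicts guideline, 1 = partially accurate, 2 = highly accurate
rater_a = [2, 1, 2, 0, 2, 1, 2, 2]
rater_b = [2, 1, 1, 0, 2, 1, 2, 2]
print("Cohen's kappa:", cohen_kappa_score(rater_a, rater_b))

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = questions, columns = repeated test administrations."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

scores = np.array([[2, 2, 2], [1, 1, 0], [2, 2, 2], [0, 1, 0], [2, 2, 1]])
print("Cronbach's alpha:", cronbach_alpha(scores))
```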
Collapse
Affiliation(s)
- Idit Tessler
- Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel.
- School of Medicine, Tel Aviv University, Tel Aviv, Israel.
- ARC Innovation Center, Sheba Medical Center, Ramat Gan, Israel.
| | - Amit Wolfovitz
- Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Eran E Alon
- Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Nir A Gecel
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Nir Livneh
- Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Eyal Zimlichman
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
- ARC Innovation Center, Sheba Medical Center, Ramat Gan, Israel
- The Sheba Talpiot Medical Leadership Program, Ramat Gan, Israel
- Hospital Management, Sheba Medical Center, Ramat Gan, Israel
| | - Eyal Klang
- The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, USA
| |
Collapse
|
48
|
Angyal V, Bertalan Á, Domján P, Dinya E. [ScreenGPT - The opportunities and limitations of artificial intelligence in primary, secondary and tertiary prevention]. Orv Hetil 2024; 165:629-635. [PMID: 38643476 DOI: 10.1556/650.2024.33029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Accepted: 02/29/2024] [Indexed: 04/23/2024]
Abstract
INTRODUCTION Prevention and screening examinations are increasingly popular. As patients become more health-conscious, they search the internet more often for information about their health, regardless of how reliable that information is. The arrival of ChatGPT has revolutionized information seeking, and people have begun using it for self-diagnosis and for managing their health. Although artificial intelligence-based services cannot replace consultation with healthcare professionals, they can play a complementary role alongside traditional screening procedures, so their opportunities and limitations are worth examining. OBJECTIVE The main objective of our research was to identify the areas in which ChatGPT can contribute to primary, secondary and tertiary prevention processes. A further aim was to create the concept of an artificial intelligence-based service that can support patients at the different levels of prevention. METHODS We mapped the opportunities offered by ChatGPT in prevention by posing specific questions to the system. Based on this experience, we built a web application with the GPT-4 model as its foundation. We sought to improve the correctness of the answers with structured, precise questions. The web application was written in Python and made available for testing through the cloud service of the Streamlit framework. RESULTS Based on the test results, we identified several areas of prevention where ChatGPT could be applied effectively. Building on these results, we successfully created the foundations of a web application, named ScreenGPT. CONCLUSIONS We found that ChatGPT can give useful answers to precise questions at all three levels of prevention. Its answers closely mirror human dialogue, but ChatGPT has no self-awareness, so it is important that users evaluate its responses critically. The ScreenGPT service was built on the basis of this experience, but many further studies are needed to establish its reliability. Orv Hetil. 2024; 165(16): 629–635.
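To make the methods concrete, here is a minimal sketch of a GPT-4-backed Streamlit app of the kind described above (Python, OpenAI SDK >= 1.0 assumed, with an OPENAI_API_KEY in the environment). It is not the authors' ScreenGPT code; the prompt, model name, and interface are illustrative.

```python
# Sketch of a minimal prevention-advice web app; not the ScreenGPT implementation.
import streamlit as st
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

st.title("Prevention assistant (demo)")
question = st.text_area("Describe your screening or prevention question")

if st.button("Ask") and question:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You give general preventive-health information and always "
                        "advise consulting a healthcare professional."},
            {"role": "user", "content": question},
        ],
    )
    st.write(response.choices[0].message.content)
```

Such a script would typically be launched locally with `streamlit run app.py` or deployed through Streamlit's cloud service, as the abstract describes.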
Collapse
Affiliation(s)
- Viola Angyal
- 1 Semmelweis Egyetem, Doktori Iskola, Egészségtudományi Doktori Tagozat, Egészségügyi Közszolgálati Kar, Digitális Egészségtudományi Intézet Budapest Magyarország
| | - Ádám Bertalan
- 1 Semmelweis Egyetem, Doktori Iskola, Egészségtudományi Doktori Tagozat, Egészségügyi Közszolgálati Kar, Digitális Egészségtudományi Intézet Budapest Magyarország
| | - Péter Domján
- 2 Semmelweis Egyetem, Doktori Iskola, Egészségtudományi Doktori Tagozat Budapest Magyarország
| | - Elek Dinya
- 3 Semmelweis Egyetem, Egészségügyi Közszolgálati Kar, Digitális Egészségtudományi Intézet Budapest Magyarország
| |
Collapse
|
49
|
Stoneham AC, Walker LC, Newman MJ, Nicholls A, Avis D. Can artificial intelligence make elective hand clinic letters easier for patients to understand? J Hand Surg Eur Vol 2024:17531934241246479. [PMID: 38641940 DOI: 10.1177/17531934241246479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/21/2024]
Abstract
We investigated whether ChatGPT could improve the Flesch reading ease score and the Flesch-Kincaid reading level of elective clinic letters written by hand surgeons. ChatGPT could not reliably simplify the hand clinic letters any further.
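An experiment of this sort can be sketched by asking a chat model to rewrite a letter and then re-scoring its readability. In the Python sketch below the letter text, prompt wording, and model name are illustrative assumptions, and the OpenAI SDK (>= 1.0) and textstat packages are assumed to be available; the study's own prompting and scoring may have differed.

```python
# Sketch: ask a chat model to simplify a clinic letter, then compare readability.
import textstat
from openai import OpenAI

letter = ("Following arthroscopic debridement of the triangular fibrocartilage "
          "complex, the patient should mobilise as tolerated and attend hand "
          "therapy for scar desensitisation.")

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Rewrite this clinic letter for a reading age of 11:\n" + letter}],
)
simplified = reply.choices[0].message.content

for label, text in [("original", letter), ("simplified", simplified)]:
    print(label,
          "reading ease:", textstat.flesch_reading_ease(text),
          "grade level:", textstat.flesch_kincaid_grade(text))
```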
Collapse
Affiliation(s)
- Adam C Stoneham
- Department of Trauma and Orthopaedics, University Hospitals Southampton, Southampton, UK
| | - Lucy C Walker
- Department of Trauma and Orthopaedics, Hampshire Hospitals Foundation Trust, Basingstoke and North Hampshire Hospital, Basingstoke, UK
| | - Michael J Newman
- Department of Trauma and Orthopaedics, University Hospitals Southampton, Southampton, UK
| | - Alex Nicholls
- Department of Trauma and Orthopaedics, Hampshire Hospitals Foundation Trust, Basingstoke and North Hampshire Hospital, Basingstoke, UK
| | - Duncan Avis
- Department of Trauma and Orthopaedics, Hampshire Hospitals Foundation Trust, Basingstoke and North Hampshire Hospital, Basingstoke, UK
| |
Collapse
|
50
|
A Fuller K, Morbitzer KA, Zeeman JM, M Persky A, C Savage A, McLaughlin JE. Exploring the use of ChatGPT to analyze student course evaluation comments. BMC Med Educ 2024; 24:423. [PMID: 38641798 PMCID: PMC11031883 DOI: 10.1186/s12909-024-05316-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 03/14/2024] [Indexed: 04/21/2024]
Abstract
BACKGROUND Since the release of ChatGPT, numerous positive applications for this artificial intelligence (AI) tool in higher education have emerged. Faculty can reduce their workload by using AI. While course evaluations are a common tool used across higher education, the process of identifying useful information from multiple open-ended comments is often time consuming. The purpose of this study was to explore the use of ChatGPT in analyzing course evaluation comments, including the time required to generate themes and the level of agreement between instructor-identified and AI-identified themes. METHODS Course instructors independently analyzed open-ended student course evaluation comments. Five prompts were provided to guide the coding process. Instructors were asked to note the time required to complete the analysis, the general process they used, and how they felt during their analysis. Student comments were also analyzed through two independent OpenAI ChatGPT user accounts. Thematic analysis was used to analyze the themes generated by instructors and ChatGPT. Percent agreement between the instructor and ChatGPT themes was calculated for each prompt, along with an overall agreement statistic between the instructor and the two ChatGPT themes. RESULTS There was high agreement between the instructor and ChatGPT results. The highest agreement was for course-related topics (range 0.71-0.82) and the lowest agreement was for weaknesses of the course (range 0.53-0.81). For all prompts except themes related to student experience, the two ChatGPT accounts demonstrated higher agreement with one another than with the instructors. On average, instructors took 27.50 ± 15.00 min to analyze their data (range 20-50). The ChatGPT users took 10.50 ± 1.00 min (range 10-12) and 12.50 ± 2.89 min (range 10-15) to analyze the data. In relation to reviewing and analyzing their own open-ended course evaluations, instructors reported feeling anxiety prior to the process, satisfaction during the process, and frustration related to the findings. CONCLUSIONS This study offers valuable insights into the potential of ChatGPT as a tool for analyzing open-ended student course evaluation comments in health professions education. However, it is crucial to ensure ChatGPT is used as a tool to assist with the analysis and to avoid relying solely on its outputs for conclusions.
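One simple way to express agreement between two theme lists is set overlap (Jaccard agreement), sketched below in Python; the theme labels are invented, and the study's own agreement statistic may have been computed differently.

```python
# Sketch: agreement between instructor-identified and ChatGPT-identified themes,
# treated as Jaccard overlap of the two theme sets.
def percent_agreement(themes_a: set, themes_b: set) -> float:
    """Share of all distinct themes that both coders identified."""
    union = themes_a | themes_b
    return len(themes_a & themes_b) / len(union) if union else 1.0

instructor = {"workload too high", "helpful cases", "unclear rubric"}
chatgpt = {"workload too high", "helpful cases", "more feedback wanted"}
print(f"{percent_agreement(instructor, chatgpt):.2f}")  # 0.50 for these illustrative sets
```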
Collapse
Affiliation(s)
- Kathryn A Fuller
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Kathryn A Morbitzer
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Center for Innovative Pharmacy Education and Research, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jacqueline M Zeeman
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Adam M Persky
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Center for Innovative Pharmacy Education and Research, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Amanda C Savage
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jacqueline E McLaughlin
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Center for Innovative Pharmacy Education and Research, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
| |
Collapse
|