1. Wang Y, Zuo J, Duan C, Peng H, Huang J, Zhao L, Zhang L, Dong Z. Large language models assisted multi-effect variants mining on cerebral cavernous malformation familial whole genome sequencing. Comput Struct Biotechnol J 2024;23:843-858. [PMID: 38352937] [PMCID: PMC10861960] [DOI: 10.1016/j.csbj.2024.01.014]
Abstract
Cerebral cavernous malformation (CCM) is a polygenic disease with intricate genetic interactions contributing quantitatively to pathogenesis across multiple factors. The principal pathogenic genes of CCM, specifically KRIT1, CCM2, and PDCD10, have been reported, accompanied by a growing wealth of genetic data related to mutations. Furthermore, numerous other molecules associated with CCM have been unearthed. However, tackling such massive volumes of unstructured data remained challenging until the advent of advanced large language models. In this study, we developed an automated analytical pipeline specialized in single nucleotide variant (SNV)-related biomedical text analysis, called BRLM. To facilitate this, BioBERT was employed to vectorize the rich information of SNVs, while a deep residual network was used to discriminate the classes of the SNVs. BRLM was initially constructed on mutations from 12 different types of TCGA cancers, achieving an accuracy exceeding 99%. It was further examined on CCM mutations in familial sequencing data analysis, highlighting the upstream master regulator gene fibroblast growth factor 1 (FGF1). With multi-omics characterization and validation of biological function, FGF1 was shown to play a significant role in the development of CCMs, demonstrating the effectiveness of our model. The BRLM web server is available at http://1.117.230.196.
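For readers who want a concrete picture of the two-stage design this abstract describes, the following is a minimal sketch: BioBERT vectorizes variant-related text and a small residual network classifies it. The checkpoint name is the public dmis-lab BioBERT release, but the block count, hidden size, and 12-way output are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Stage 1: vectorize variant-related text with BioBERT.
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
encoder = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0]  # [CLS] vectors, shape (batch, 768)

# Stage 2: a small residual network over the embeddings (sizes illustrative).
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return torch.relu(x + self.net(x))  # skip connection

class VariantClassifier(nn.Module):
    def __init__(self, dim=768, n_classes=12):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualBlock(dim) for _ in range(4)])
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        return self.head(self.blocks(x))

logits = VariantClassifier()(embed(["KRIT1 c.1255C>T introduces a premature stop codon."]))
```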
Affiliation(s)
- Yiqi Wang
- College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, No. 1, Shizishan Street, Wuhan 430070, Hubei, China
- Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Jinmei Zuo
- Physical Examination Center, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Chao Duan
- College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, No. 1, Shizishan Street, Wuhan 430070, Hubei, China
- Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Hao Peng
- Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Department of Neurosurgery, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Jia Huang
- The Second Clinical Medical College, Lanzhou University, No. 222, South Tianshui Road, Lanzhou 730030, Gansu, China
- Liang Zhao
- Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Li Zhang
- Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Department of Neurosurgery, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
- Zhiqiang Dong
- College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, No. 1, Shizishan Street, Wuhan 430070, Hubei, China
- Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
2. Tailor PD, Dalvin LA, Chen JJ, Iezzi R, Olsen TW, Scruggs BA, Barkmeier AJ, Bakri SJ, Ryan EH, Tang PH, Parke DW, Belin PJ, Sridhar J, Xu D, Kuriyan AE, Yonekawa Y, Starr MR. A Comparative Study of Responses to Retina Questions from Either Experts, Expert-Edited Large Language Models, or Large Language Models Alone. Ophthalmol Sci 2024;4:100485. [PMID: 38660460] [PMCID: PMC11041826] [DOI: 10.1016/j.xops.2024.100485]
Abstract
Objective To assess the quality, empathy, and safety of expert-edited large language model (LLM) responses, human expert-created responses, and LLM responses to common retina patient questions. Design Randomized, masked, multicenter study. Participants Twenty-one common retina patient questions were randomly assigned among 13 retina specialists. Methods Each expert created a response (Expert) and then edited an LLM (ChatGPT-4)-generated response to that question (Expert + artificial intelligence [AI]), timing themselves for both tasks. Five LLMs (ChatGPT-3.5, ChatGPT-4, Claude 2, Bing, and Bard) also generated responses to each question. The original question, along with the anonymized and randomized Expert + AI, Expert, and LLM responses, was evaluated by the experts who had not written a response to that question. Evaluators judged quality and empathy (very poor, poor, acceptable, good, or very good) along with safety metrics (incorrect information, likelihood to cause harm, extent of harm, and missing content). Main Outcome Measures Mean quality and empathy score, proportion of responses with incorrect information, likelihood to cause harm, extent of harm, and missing content for each response type. Results There were 4008 total grades collected (2608 for quality and empathy; 1400 for safety metrics), with significant differences in both quality and empathy (P < 0.001, P < 0.001) between the LLM, Expert, and Expert + AI groups. For quality, Expert + AI (3.86 ± 0.85) performed best overall, while GPT-3.5 (3.75 ± 0.79) was the top-performing LLM. For empathy, GPT-3.5 (3.75 ± 0.69) had the highest mean score, followed by Expert + AI (3.73 ± 0.63). By mean score, Expert placed 4 out of 7 for quality and 6 out of 7 for empathy. For both quality (P < 0.001) and empathy (P < 0.001), expert-edited LLM responses performed better than expert-created responses. There were time savings for an expert-edited LLM response versus an expert-created response (P = 0.02). ChatGPT-4 performed similarly to Expert for inappropriate content (P = 0.35), missing content (P = 0.001), extent of possible harm (P = 0.356), and likelihood of possible harm (P = 0.129). Conclusions In this randomized, masked, multicenter study, LLM responses were comparable with experts in terms of quality, empathy, and safety metrics, warranting further exploration of their potential benefits in clinical settings. Financial Disclosures Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of the article.
Affiliation(s)
- John J. Chen
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Raymond Iezzi
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Sophie J. Bakri
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Edwin H. Ryan
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- Peter H. Tang
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- D. Wilkin Parke
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- Jayanth Sridhar
- Olive View Medical Center, University of California Los Angeles, Los Angeles, California
- David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E. Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
3. Tobler S. Smart grading: A generative AI-based tool for knowledge-grounded answer evaluation in educational assessments. MethodsX 2024;12:102531. [PMID: 38204981] [PMCID: PMC10776976] [DOI: 10.1016/j.mex.2023.102531]
Abstract
Evaluating text-based answers obtained in educational settings or behavioral studies is time-consuming and resource-intensive. Applying novel artificial intelligence tools such as ChatGPT might support the process. Still, currently available implementations do not allow for automated and case-specific evaluation of large numbers of student answers. To counter this limitation, we developed flexible software and a user-friendly web application that enable researchers and educators to use cutting-edge artificial intelligence technologies by providing an interface that combines large language models with options to specify questions of interest, sample solutions, and evaluation instructions for automated answer scoring. We validated the method in an empirical study and found high reliability between the software's scores and expert ratings. Hence, the present software constitutes a valuable tool to facilitate and enhance text-based answer evaluation.
• Generative AI-enhanced software for customizable, case-specific, and automated grading of large numbers of text-based answers.
• Open-source software and web application for direct implementation and adaptation.
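As a rough illustration of the interface described above, the sketch below assembles a question, sample solution, and evaluation instructions into one grading prompt for a chat model. It is a sketch under assumptions: the model name is arbitrary, the call follows the current openai-python client, and the tool's actual prompts and backend are not shown in this listing.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def grade_answer(question: str, sample_solution: str, rubric: str, answer: str) -> str:
    """Score one free-text student answer against a sample solution and rubric."""
    prompt = (
        f"Question:\n{question}\n\n"
        f"Sample solution:\n{sample_solution}\n\n"
        f"Evaluation instructions:\n{rubric}\n\n"
        f"Student answer:\n{answer}\n\n"
        "Return a score from 0 to 2 and a one-sentence justification."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep grading as deterministic as possible
    )
    return resp.choices[0].message.content
```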
4.
Abstract
The launch of OpenAI's chatbot, ChatGPT, has generated a lot of attention and discussion among professionals in several fields. Many concerns and challenges have been raised by researchers from various fields, particularly in relation to the harm that using these tools for medical diagnosis and treatment recommendations can cause. In addition, it has been debated whether ChatGPT is dependable, efficient, and helpful for clinicians and medical professionals. Therefore, in this study, we assess ChatGPT's effectiveness in providing mental health support, particularly for issues related to anxiety and depression, based on the chatbot's responses and cross-questioning. The findings indicate that there are significant inconsistencies and that ChatGPT's reliability is low in this specific domain. As a result, caution must be exercised when using ChatGPT as a complementary mental health resource.
Affiliation(s)
- Faiza Farhat
- Section of Parasitology, Department of Zoology, Aligarh Muslim University, Aligarh, UP, 202002, India.
5. Wang C, Ong J, Wang C, Ong H, Cheng R, Ong D. Potential for GPT Technology to Optimize Future Clinical Decision-Making Using Retrieval-Augmented Generation. Ann Biomed Eng 2024;52:1115-1118. [PMID: 37530906] [DOI: 10.1007/s10439-023-03327-6]
Abstract
Advancements in artificial intelligence (AI) provide many helpful tools for healthcare, one of which includes AI chatbots that use natural language processing to create humanlike, conversational dialog. These chatbots have general cognitive skills and are able to engage with clinicians and patients to discuss patients' health conditions and what they may be at risk for. While chatbot engines have access to a wide range of medical texts and research papers, they currently provide high-level, generic responses and are limited in their ability to provide diagnostic guidance and clinical advice to patients on an individual level. The essay discusses the use of retrieval-augmented generation (RAG), which can be used to improve the specificity of user-entered prompts and thereby enhance the detail in AI chatbot responses. By embedding more recent clinical data and trusted medical sources, such as clinical guidelines, into the chatbot models, AI chatbots can provide more patient-specific guidance, faster diagnoses and treatment recommendations, and greater improvement of patient outcomes.
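The retrieval step the essay describes can be pictured with a minimal sketch: rank trusted snippets against the user's question and prepend the best matches to the prompt. The guideline texts and TF-IDF retriever here are illustrative stand-ins; production systems typically use dense embeddings and a vector store.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for trusted sources such as clinical guidelines.
guidelines = [
    "Guideline: first-line therapy for uncomplicated hypertension is ...",
    "Guideline: statin therapy is recommended when 10-year ASCVD risk ...",
]

vectorizer = TfidfVectorizer().fit(guidelines)
doc_matrix = vectorizer.transform(guidelines)

def build_rag_prompt(question: str, k: int = 1) -> str:
    """Retrieve the k most similar snippets and prepend them to the prompt."""
    sims = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    top = sims.argsort()[::-1][:k]
    context = "\n".join(guidelines[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

print(build_rag_prompt("When should a statin be started?"))
```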
Affiliation(s)
- Calvin Wang
- College of Medicine - Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, 08901, USA.
- Joshua Ong
- Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Chara Wang
- Biotechnology High School, Freehold, NJ, USA
- Hannah Ong
- College of Medicine, The Ohio State University, Columbus, OH, USA
- Rebekah Cheng
- Department of Physical Therapy, Virginia Commonwealth University, Richmond, VA, USA
- Dennis Ong
- Amazon Web Services, Amazon, Seattle, WA, USA
6. Jiang S, Evans-Yamamoto D, Bersenev D, Palaniappan SK, Yachie-Kinoshita A. ProtoCode: Leveraging large language models (LLMs) for automated generation of machine-readable PCR protocols from scientific publications. SLAS Technol 2024;29:100134. [PMID: 38670311] [DOI: 10.1016/j.slast.2024.100134]
Abstract
Protocol standardization and sharing are crucial for reproducibility in the life sciences. In spite of numerous efforts toward standardized protocol description, adherence to these standards in the literature remains largely inconsistent. Curation of protocols is especially challenging due to the labor-intensive process, requiring expert domain knowledge of each experimental procedure. Recent advancements in Large Language Models (LLMs) offer a promising solution for interpreting and curating knowledge from complex scientific literature. In this work, we develop ProtoCode, a tool leveraging fine-tuned LLMs to curate protocols into intermediate representation formats that are interpretable by both human and machine interfaces. Our proof-of-concept, focused on polymerase chain reaction (PCR) protocols, retrieves information from PCR protocols with an accuracy ranging from 69% to 100% depending on the information content. For all tested protocols, we demonstrate that ProtoCode successfully converts literature-based protocols into correct operational files for multiple thermal cycler systems. In conclusion, ProtoCode can alleviate the labor-intensive curation and standardization of life science protocols, enhancing research reproducibility by providing a reliable, automated means to process and standardize protocols. ProtoCode is freely available as a web server at https://curation.taxila.io/ProtoCode/.
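The intermediate representation itself is not reproduced in this listing, but a machine-readable PCR protocol of the kind described might look like the hypothetical JSON below, holding exactly the fields a converter would need before emitting a vendor-specific cycler file. The field names are assumptions for illustration.

```python
import json

# Hypothetical intermediate representation of a PCR protocol (field names assumed).
protocol = {
    "polymerase": "Taq",
    "reaction_volume_ul": 25,
    "steps": [
        {"name": "initial_denaturation", "temp_c": 95, "seconds": 180},
        {"name": "denaturation", "temp_c": 95, "seconds": 30},
        {"name": "annealing", "temp_c": 58, "seconds": 30},
        {"name": "extension", "temp_c": 72, "seconds": 60},
        {"name": "final_extension", "temp_c": 72, "seconds": 300},
    ],
    # Repeat the denaturation-to-extension block for 35 cycles.
    "cycles": {"from_step": "denaturation", "to_step": "extension", "count": 35},
}

# Serialize for downstream conversion into thermal cycler operational files.
print(json.dumps(protocol, indent=2))
```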
Affiliation(s)
- Shuo Jiang
- SBX BioSciences, Inc. 1600 - 925 West Georgia Street, Vancouver, BC, V6C 3L2, Canada
- Daniel Evans-Yamamoto
- The Systems Biology Institute, Saisei Ikedayama Bldg., 5-10-25, Higashi Gotanda, Shinagawa-ku, Tokyo, 141-0022, Japan
- Dennis Bersenev
- SBX BioSciences, Inc. 1600 - 925 West Georgia Street, Vancouver, BC, V6C 3L2, Canada
- Sucheendra K Palaniappan
- The Systems Biology Institute, Saisei Ikedayama Bldg., 5-10-25, Higashi Gotanda, Shinagawa-ku, Tokyo, 141-0022, Japan.
- Ayako Yachie-Kinoshita
- SBX BioSciences, Inc. 1600 - 925 West Georgia Street, Vancouver, BC, V6C 3L2, Canada; The Systems Biology Institute, Saisei Ikedayama Bldg., 5-10-25, Higashi Gotanda, Shinagawa-ku, Tokyo, 141-0022, Japan.
7. Tsai CY, Hsieh SJ, Huang HH, Deng JH, Huang YY, Cheng PY. Performance of ChatGPT on the Taiwan urology board examination: insights into current strengths and shortcomings. World J Urol 2024;42:250. [PMID: 38652322] [DOI: 10.1007/s00345-024-04957-8]
Abstract
PURPOSE To compare the performance of ChatGPT-4 and ChatGPT-3.5 on the Taiwan urology board examination (TUBE), focusing on answer accuracy, explanation consistency, and uncertainty management tactics to minimize score penalties from incorrect responses across 12 urology domains. METHODS 450 multiple-choice questions from the TUBE (2020-2022) were presented to the two models. Three urologists assessed the correctness and consistency of each response. Accuracy quantifies correct answers; consistency assesses the logic and coherence of explanations relative to total responses, alongside a penalty-reduction experiment with prompt variations. Univariate logistic regression was applied for subgroup comparison. RESULTS ChatGPT-4 showed strengths in urology, achieving an overall accuracy of 57.8%, with annual accuracies of 64.7% (2020), 58.0% (2021), and 50.7% (2022), significantly surpassing ChatGPT-3.5 (33.8%, OR = 2.68, 95% CI [2.05-3.52]). It could have passed the TUBE written exams on accuracy alone but failed on the final score due to penalties. ChatGPT-4 displayed a declining accuracy trend over time. Variability in accuracy across the 12 urological domains was noted, with more frequently updated knowledge domains showing lower accuracy (53.2% vs. 62.2%, OR = 0.69, p = 0.05). A high consistency rate of 91.6% in explanations across all domains indicates reliable delivery of coherent and logical information. The simple prompt outperformed strategy-based prompts in accuracy (60% vs. 40%, p = 0.016), highlighting ChatGPT's inability to accurately self-assess uncertainty and its tendency towards overconfidence, which may hinder medical decision-making. CONCLUSIONS ChatGPT-4's high accuracy and consistent explanations on the urology board examination demonstrate its potential in medical information processing. However, its limitations in self-assessment and overconfidence necessitate caution in its application, especially for inexperienced users. These insights call for ongoing advancement of urology-specific AI tools.
Affiliation(s)
- Chung-You Tsai
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No. 21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
- Shang-Ju Hsieh
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No. 21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
- Hung-Hsiang Huang
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No. 21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
- Juinn-Horng Deng
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
- Yi-You Huang
- Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
- Pai-Yu Cheng
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No. 21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan.
- Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan.
8. Ye G. De novo drug design as GPT language modeling: large chemistry models with supervised and reinforcement learning. J Comput Aided Mol Des 2024;38:20. [PMID: 38647700] [PMCID: PMC11035455] [DOI: 10.1007/s10822-024-00559-z]
Abstract
In recent years, generative machine learning algorithms have been successful in designing innovative drug-like molecules. SMILES is a sequence-like language used in most effective drug design models. Due to the data's sequential structure, models such as recurrent neural networks and transformers can design pharmacological compounds with optimized efficacy. Large language models have advanced recently, but their implications for drug design have not yet been explored. Although one study successfully pre-trained a large chemistry model (LCM), its application to specific tasks in drug discovery remained unknown. In this study, the drug design task is modeled as a causal language modeling problem. Thus, a procedure of reward modeling, supervised fine-tuning, and proximal policy optimization was used to transfer the LCM to drug design, similar to OpenAI's ChatGPT and InstructGPT procedures. By combining the SMILES sequence with chemical descriptors, the novel efficacy evaluation model exceeded the performance reported in previous studies. After proximal policy optimization, the drug design model generated molecules of which 99.2% had efficacy pIC50 > 7 towards the amyloid precursor protein, with 100% of the generated molecules being valid and novel. This demonstrates the applicability of LCMs in drug discovery, with benefits including lower data consumption during fine-tuning. The applicability of LCMs to drug discovery opens the door for larger studies involving reinforcement learning from human feedback, where chemists provide feedback to LCMs to generate higher-quality molecules. LCMs' ability to design similar molecules from datasets paves the way for more accessible, non-patented alternatives to drug molecules.
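A generate-then-validate loop is common to SMILES language models of this kind; the sketch below samples candidates from a causal LM and keeps only chemically valid strings. The checkpoint name is a placeholder, and the RDKit validity check merely stands in for the paper's full efficacy scoring.

```python
from rdkit import Chem
from transformers import pipeline

# Placeholder checkpoint for a causal LM fine-tuned on SMILES strings.
generator = pipeline("text-generation", model="my-org/smiles-gpt")

def sample_valid_smiles(prompt: str = "CC", n: int = 50) -> list[str]:
    """Sample candidate molecules and keep only the chemically valid ones."""
    outputs = generator(prompt, num_return_sequences=n, max_new_tokens=64, do_sample=True)
    candidates = {o["generated_text"].strip() for o in outputs}
    # MolFromSmiles returns None for strings that are not valid molecules.
    return [s for s in candidates if Chem.MolFromSmiles(s) is not None]
```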
Affiliation(s)
- Gavin Ye
- Columbia Grammar & Preparatory School, New York, NY, USA.
9. Kawahara T, Sumi Y. GPT-4/4V's performance on the Japanese National Medical Licensing Examination. Med Teach 2024:1-8. [PMID: 38648547] [DOI: 10.1080/0142159x.2024.2342545]
Abstract
BACKGROUND Recent advances in Artificial Intelligence (AI) are changing the medical world, and AI will likely replace many of the tasks performed by medical professionals. The overall clinical ability of AI has so far been evaluated by its ability to answer text-based national medical examinations. This study uniquely assesses the performance of OpenAI's ChatGPT on the entire Japanese National Medical Licensing Examination (NMLE), including questions with images, illustrations, and pictures. METHODS We obtained the questions of the past six years of the NMLE (112th to 117th) from the Japanese Ministry of Health, Labour and Welfare website and converted them to JavaScript Object Notation (JSON) format. We created an application programming interface (API) to output answers using GPT-4 for questions without images and GPT-4V(ision) or the GPT-4 console for questions with images. RESULTS Image questions accounted for 723/2400 (30.1%) of questions over the past six years. In all years, GPT-4/4V exceeded the minimum score an examinee must achieve. In total, over the six years, the percentage of correct answers was 665/905 (73.5%) for basic medical knowledge questions, 1143/1531 (74.7%) for clinical knowledge questions, and 497/723 (68.7%) for image questions. CONCLUSIONS Regarding medical knowledge, GPT-4/4V met the minimum criteria regardless of whether the questions included images, illustrations, or pictures. Our study sheds light on the potential utility of AI in medical education.
Affiliation(s)
- Tomoki Kawahara
- Department of Clinical Information Applied Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Yuki Sumi
- Department of Clinical Information Applied Sciences, Tokyo Medical and Dental University, Tokyo, Japan
10. Wang A, Liu C, Yang J, Weng C. Fine-tuning Large Language Models for Rare Disease Concept Normalization. bioRxiv 2024:2023.12.28.573586. [PMID: 38234802] [PMCID: PMC10793431] [DOI: 10.1101/2023.12.28.573586]
Abstract
Objective We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). Methods We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes the HPO names and half of each concept's synonyms, as well as the identifiers. We then fine-tuned Llama 2 (Llama2-7B) on each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. Results When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 had only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced into the phenotype terms, the accuracy of NAME and NAME+SYN dropped to 10.2% and 36.1%, respectively, but increased to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from the HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. Conclusion Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and lay terms. Our approach provides a solution for using LLMs to identify named medical entities in clinical narratives while successfully normalizing them to standard concepts in a controlled vocabulary.
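A hedged sketch of the template-based corpus construction described in the Methods: each HPO name (and, for NAME+SYN, half of each concept's synonyms) is slotted into sentences paired with its identifier. The terms and templates below are illustrative, not the authors' script.

```python
import json
import random

# Toy (name, HPO ID, synonyms) triples standing in for terms parsed from the HPO.
hpo_terms = [
    ("Seizure", "HP:0001250", ["Epileptic seizure", "Seizures"]),
    ("Hypotonia", "HP:0001252", ["Low muscle tone", "Muscle hypotonia"]),
]

templates = [
    "The phenotype term {term} corresponds to {hpo_id}.",
    "{term} is normalized to the HPO identifier {hpo_id}.",
]

def build_corpus(include_synonyms: bool = False) -> list[dict]:
    """Emit fine-tuning sentences for the NAME or NAME+SYN setting."""
    rows = []
    for name, hpo_id, synonyms in hpo_terms:
        forms = [name]
        if include_synonyms:  # NAME+SYN adds half of each concept's synonyms
            forms += synonyms[: max(1, len(synonyms) // 2)]
        for form in forms:
            rows.append({"text": random.choice(templates).format(term=form, hpo_id=hpo_id)})
    return rows

print(json.dumps(build_corpus(include_synonyms=True), indent=2))
```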
Affiliation(s)
- Andy Wang
- Peddie School, Hightstown, NJ, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Jingye Yang
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
11. Yu Z, Peng C, Yang X, Dang C, Adekkanattu P, Gopal Patra B, Peng Y, Pathak J, Wilson DL, Chang CY, Lo-Ciganic WH, George TJ, Hogan WR, Guo Y, Bian J, Wu Y. Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias. J Biomed Inform 2024;153:104642. [PMID: 38621641] [DOI: 10.1016/j.jbi.2024.104642]
Abstract
OBJECTIVE To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine bias across race and gender groups, test the generalizability of SDoH extraction across disease groups, and examine the population-level extraction ratio. METHODS We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package, SODA (i.e., SOcial DeterminAnts), to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains, cancer and opioid use, and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess the patient-level extraction ratio and examine differences among race and gender groups. RESULTS We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models; the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There was a small performance gap (∼4%) between Males and Females, but a large performance gap (>16%) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance. The extraction ratio varied in the three cancer cohorts: 10 SDoH could be extracted from over 70% of cancer patients, but 9 SDoH could be extracted from less than 70% of cancer patients. Individuals from the White and Black groups had a higher extraction ratio than other minority race groups. CONCLUSIONS Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
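For orientation, applying a released transformer NER model to a note reduces to a few lines with the transformers pipeline; the checkpoint name below is a placeholder, and the actual SODA models and usage are documented at the GitHub link above.

```python
from transformers import pipeline

# Placeholder checkpoint; see the SODA repository for the released models.
ner = pipeline("token-classification", model="my-org/sdoh-ner",
               aggregation_strategy="simple")

note = "Patient is a retired welder, lives alone, and smokes 1 pack per day."
for entity in ner(note):
    # Each hit carries the SDoH label, the matched text span, and a confidence score.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```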
Affiliation(s)
- Zehao Yu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cheng Peng
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Chong Dang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Prakash Adekkanattu
- Information Technologies and Services, Weill Cornell Medicine, New York, NY, USA
- Braja Gopal Patra
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Jyotishman Pathak
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Debbie L Wilson
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Ching-Yuan Chang
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Wei-Hsuan Lo-Ciganic
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Thomas J George
- Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- William R Hogan
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
12. Kamihara T, Tabuchi M, Omura T, Suzuki Y, Aritake T, Hirashiki A, Kokubo M, Shimizu A. Evolution of a Large Language Model for Preoperative Assessment Based on the Japanese Circulation Society 2022 Guideline on Perioperative Cardiovascular Assessment and Management for Non-Cardiac Surgery. Circ Rep 2024;6:142-148. [PMID: 38606418] [PMCID: PMC11004031] [DOI: 10.1253/circrep.cr-24-0019]
Abstract
Background: The Japanese Circulation Society 2022 Guideline on Perioperative Cardiovascular Assessment and Management for Non-Cardiac Surgery standardizes preoperative cardiovascular assessments. The present study investigated the efficacy of a large language model (LLM) in providing accurate responses meeting the JCS 2022 Guideline. Methods and Results: Data on consultation requests, physicians' cardiovascular records, and patients' response content were analyzed. Virtual scenarios were created using real-world clinical data, and a LLM was then consulted for such scenarios. Conclusions: Google BARD could accurately provide responses in accordance with the JCS 2022 Guideline in low-risk cases. Google Gemini has significantly improved its accuracy in intermediate- and high-risk cases.
Affiliation(s)
- Takahiro Kamihara
- Department of Cardiology, National Center for Geriatrics and Gerontology, Obu, Japan
- Masanori Tabuchi
- Department of Nursing, National Center for Geriatrics and Gerontology, Obu, Japan
- Takuya Omura
- Department of Metabolism, National Center for Geriatrics and Gerontology, Obu, Japan
- Yumi Suzuki
- Department of Surgery, National Center for Geriatrics and Gerontology, Obu, Japan
- Tsukasa Aritake
- Department of Surgery, National Center for Geriatrics and Gerontology, Obu, Japan
- Akihiro Hirashiki
- Department of Cardiology, National Center for Geriatrics and Gerontology, Obu, Japan
- Manabu Kokubo
- Department of Cardiology, National Center for Geriatrics and Gerontology, Obu, Japan
- Atsuya Shimizu
- Department of Cardiology, National Center for Geriatrics and Gerontology, Obu, Japan
13. Itelman E, Golovchiner G, Barsheshet A, Kornowski R, Erez A. Balancing innovation and professionalism: The emerging role of AI-powered chatbots in medical consultation. Heart Rhythm 2024:S1547-5271(24)02327-0. [PMID: 38588991] [DOI: 10.1016/j.hrthm.2024.04.010]
Affiliation(s)
- Edward Itelman
- Cardiology Division, Rabin Medical Center, Petah Tikva, Israel.
- Alon Barsheshet
- Cardiology Division, Rabin Medical Center, Petah Tikva, Israel
- Ran Kornowski
- Cardiology Division, Rabin Medical Center, Petah Tikva, Israel
- Aharon Erez
- Cardiology Division, Rabin Medical Center, Petah Tikva, Israel
14. Gui H, Rezaei SJ, Schlessinger D, Weed J, Lester J, Wongvibulsin S, Mitchell D, Ko J, Rotemberg V, Lee I, Daneshjou R. Dermatologists' Perspectives and Usage of Large Language Models in Practice: An Exploratory Survey. J Invest Dermatol 2024:S0022-202X(24)00270-7. [PMID: 38582369] [DOI: 10.1016/j.jid.2024.03.028]
Affiliation(s)
- Haiwen Gui
- Department of Dermatology, Stanford University, Redwood City, California, USA.
- Shawheen J Rezaei
- Department of Dermatology, Stanford University, Redwood City, California, USA
- Daniel Schlessinger
- Division of Dermatology, Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Jason Weed
- The Ronald O. Perelman Department of Dermatology, NYU Grossman School of Medicine, New York, New York, USA
- Jenna Lester
- Department of Dermatology, University of California San Francisco, San Francisco, California, USA
- Shannon Wongvibulsin
- Division of Dermatology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Dom Mitchell
- Department of Dermatology, Stanford University, Redwood City, California, USA
- Justin Ko
- Department of Dermatology, Stanford University, Redwood City, California, USA
- Veronica Rotemberg
- Dermatology Service, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Ivy Lee
- Pasadena Premier Dermatology, Pasadena, California, USA
- Roxana Daneshjou
- Department of Dermatology, Stanford University, Redwood City, California, USA
15. Zhang S, Liau ZQG, Tan KLM, Chua WL. Evaluating the accuracy and relevance of ChatGPT responses to frequently asked questions regarding total knee replacement. Knee Surg Relat Res 2024;36:15. [PMID: 38566254] [PMCID: PMC10986046] [DOI: 10.1186/s43019-024-00218-5]
Abstract
BACKGROUND Chat Generative Pretrained Transformer (ChatGPT), a generative artificial intelligence chatbot, may have broad applications in healthcare delivery and patient education due to its ability to provide human-like responses to a wide range of patient queries. However, there is limited evidence regarding its ability to provide reliable and useful information on orthopaedic procedures. This study seeks to evaluate the accuracy and relevance of responses provided by ChatGPT to frequently asked questions (FAQs) regarding total knee replacement (TKR). METHODS A list of 50 clinically-relevant FAQs regarding TKR was collated. Each question was individually entered as a prompt to ChatGPT (version 3.5), and the first response generated was recorded. Responses were then reviewed by two independent orthopaedic surgeons and graded on a Likert scale for their factual accuracy and relevance. These responses were then classified into accurate versus inaccurate and relevant versus irrelevant responses using preset thresholds on the Likert scale. RESULTS Most responses were accurate, while all responses were relevant. Of the 50 FAQs, 44/50 (88%) of ChatGPT responses were classified as accurate, achieving a mean Likert grade of 4.6/5 for factual accuracy. On the other hand, 50/50 (100%) of responses were classified as relevant, achieving a mean Likert grade of 4.9/5 for relevance. CONCLUSION ChatGPT performed well in providing accurate and relevant responses to FAQs regarding TKR, demonstrating great potential as a tool for patient education. However, it is not infallible and can occasionally provide inaccurate medical information. Patients and clinicians intending to utilize this technology should be mindful of its limitations and ensure adequate supervision and verification of information provided.
Affiliation(s)
- Siyuan Zhang
- Department of Orthopaedic Surgery, National University Health System, Level 11, NUHS Tower Block, 1E Kent Ridge Road, Singapore, 119228, Singapore.
- Zi Qiang Glen Liau
- Department of Orthopaedic Surgery, National University Health System, Level 11, NUHS Tower Block, 1E Kent Ridge Road, Singapore, 119228, Singapore
- Kian Loong Melvin Tan
- Department of Orthopaedic Surgery, National University Health System, Level 11, NUHS Tower Block, 1E Kent Ridge Road, Singapore, 119228, Singapore
- Wei Liang Chua
- Department of Orthopaedic Surgery, National University Health System, Level 11, NUHS Tower Block, 1E Kent Ridge Road, Singapore, 119228, Singapore
16. Zhenzhu L, Jingfeng Z, Wei Z, Jianjun Z, Yinshui X. GPT-agents based on medical guidelines can improve the responsiveness and explainability of outcomes for traumatic brain injury rehabilitation. Sci Rep 2024;14:7626. [PMID: 38561445] [PMCID: PMC10985066] [DOI: 10.1038/s41598-024-58514-9]
Abstract
This study explored the application of generative pre-trained transformer (GPT) agents based on medical guidelines, using large language model (LLM) technology, for traumatic brain injury (TBI) rehabilitation-related questions. To assess the effectiveness of multiple agents (GPT-agents) created using GPT-4, a comparison was conducted using direct GPT-4 as the control group (GPT-4). The GPT-agents comprised multiple agents with distinct functions, including "Medical Guideline Classification", "Question Retrieval", "Matching Evaluation", "Intelligent Question Answering (QA)", and "Results Evaluation and Source Citation". Brain rehabilitation questions were selected from a doctor-patient Q&A database for assessment. The primary endpoint was a better answer. The secondary endpoints were accuracy, completeness, explainability, and empathy. Thirty questions were answered; overall, the GPT-agents took substantially longer and used more words to respond than GPT-4 (time: 54.05 vs. 9.66 s; words: 371 vs. 57). However, the GPT-agents provided superior answers in more cases than GPT-4 (66.7 vs. 33.3%). The GPT-agents surpassed GPT-4 in the accuracy evaluation (3.8 ± 1.02 vs. 3.2 ± 0.96, p = 0.0234). No difference in completeness was found (2.0 ± 0.87 vs. 1.7 ± 0.79, p = 0.213). However, in terms of explainability (2.79 ± 0.45 vs. 2.07 ± 0.52, p < 0.001) and empathy (2.63 ± 0.57 vs. 1.08 ± 0.51, p < 0.001), the GPT-agents performed notably better. Based on medical guidelines, GPT-agents enhanced the accuracy and empathy of responses to TBI rehabilitation questions. This study provides guideline references and demonstrates improved clinical explainability. However, further validation through multicenter trials in a clinical setting is necessary. This study offers practical insights and establishes groundwork for the potential theoretical integration of LLM agents into medicine.
Affiliation(s)
- Li Zhenzhu
- Radiology Department, Ningbo NO.2 Hospital, Ningbo, 315211, China
- Department of Neurosurgery, Ningbo NO.2 Hospital, Ningbo, 315211, China
- Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, 315211, China
- Zhang Jingfeng
- Radiology Department, Ningbo NO.2 Hospital, Ningbo, 315211, China
- Zhou Wei
- Department of Neurosurgery, Ningbo NO.2 Hospital, Ningbo, 315211, China
- Zheng Jianjun
- Radiology Department, Ningbo NO.2 Hospital, Ningbo, 315211, China.
- Xia Yinshui
- Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, 315211, China.
17. Kim H, Kim P, Joo I, Kim JH, Park CM, Yoon SH. ChatGPT Vision for Radiological Interpretation: An Investigation Using Medical School Radiology Examinations. Korean J Radiol 2024;25:403-406. [PMID: 38528699] [PMCID: PMC10973733] [DOI: 10.3348/kjr.2024.0017]
Affiliation(s)
- Hyungjin Kim
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
- Paul Kim
- Graduate School of Education, Stanford University, Stanford, CA, USA
- Ijin Joo
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
- Jung Hoon Kim
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
- Chang Min Park
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
- Soon Ho Yoon
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea.
18. Zhang K, Wang S, Jia N, Zhao L, Han C, Li L. Integrating visual large language model and reasoning chain for driver behavior analysis and risk assessment. Accid Anal Prev 2024;198:107497. [PMID: 38330547] [DOI: 10.1016/j.aap.2024.107497]
Abstract
Driver behavior is a critical factor in driving safety, making the development of sophisticated distraction classification methods essential. Our study presents a Distracted Driving Classification (DDC) approach utilizing a visual Large Language Model (LLM), named the Distracted Driving Language Model (DDLM). The DDLM introduces whole-body human pose estimation to isolate and analyze key postural features-head, right hand, and left hand-for precise behavior classification and better interpretability. Recognizing the inherent limitations of LLMs, particularly their lack of logical reasoning abilities, we have integrated a reasoning chain framework within the DDLM, allowing it to generate clear, reasoned explanations for its assessments. Tailored specifically with relevant data, the DDLM demonstrates enhanced performance, providing detailed, context-aware evaluations of driver behaviors and corresponding risk levels. Notably outperforming standard models in both zero-shot and few-shot learning scenarios, as evidenced by tests on the 100-Driver dataset, the DDLM stands out as an advanced tool that promises significant contributions to driving safety by accurately detecting and analyzing driving distractions.
Affiliation(s)
- Kunpeng Zhang
- College of Electrical Engineering, Henan University of Technology, Zhengzhou 450001, China; Department of Automation, Tsinghua University, Beijing 100084, China
- Shipu Wang
- College of Electrical Engineering, Henan University of Technology, Zhengzhou 450001, China
- Ning Jia
- College of Management and Economics, Tianjin University, Tianjin 300072, China
- Liang Zhao
- College of Electrical Engineering, Henan University of Technology, Zhengzhou 450001, China.
- Chunyang Han
- Department of Automation, Tsinghua University, Beijing 100084, China; Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming 650500, China.
- Li Li
- Department of Automation, Tsinghua University, Beijing 100084, China
19. Kinoshita M, Komasaka M, Tanaka K. ChatGPT's performance on JSA-certified anesthesiologist exam. J Anesth 2024;38:282-283. [PMID: 37902835] [DOI: 10.1007/s00540-023-03275-4]
Affiliation(s)
- Michiko Kinoshita
- Department of Anesthesiology, Tokushima University Hospital, 2-50-1 Kuramoto-cho, Tokushima-shi, Tokushima, 770-8503, Japan.
- Mizuki Komasaka
- Department of Anesthesiology, Tokushima University Hospital, 2-50-1 Kuramoto-cho, Tokushima-shi, Tokushima, 770-8503, Japan
- Katsuya Tanaka
- Department of Anesthesiology, Tokushima University Hospital, 2-50-1 Kuramoto-cho, Tokushima-shi, Tokushima, 770-8503, Japan
20. Gu Z, He X, Yu P, Jia W, Yang X, Peng G, Hu P, Chen S, Chen H, Lin Y. Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model. Artif Intell Med 2024;150:102822. [PMID: 38553162] [DOI: 10.1016/j.artmed.2024.102822]
Abstract
BACKGROUND Stroke is a prevalent disease with a significant global impact. Effective assessment of stroke severity is vital for an accurate diagnosis, appropriate treatment, and optimal clinical outcomes. The National Institutes of Health Stroke Scale (NIHSS) is a widely used scale for quantitatively assessing stroke severity. However, the current manual scoring of NIHSS is labor-intensive, time-consuming, and sometimes unreliable. Applying artificial intelligence (AI) techniques to automate the quantitative assessment of stroke on vast amounts of electronic health records (EHRs) has attracted much interest. OBJECTIVE This study aims to develop an automatic, quantitative stroke severity assessment framework through automating the entire NIHSS scoring process on Chinese clinical EHRs. METHODS Our approach consists of two major parts: Chinese clinical named entity recognition (CNER) with a domain-adaptive pre-trained large language model (LLM) and automated NIHSS scoring. To build a high-performing CNER model, we first construct a stroke-specific, densely annotated dataset "Chinese Stroke Clinical Records" (CSCR) from EHRs provided by our partner hospital, based on a stroke ontology that defines semantically related entities for stroke assessment. We then pre-train a Chinese clinical LLM coined "CliRoberta" through domain-adaptive transfer learning and construct a deep learning-based CNER model that can accurately extract entities directly from Chinese EHRs. Finally, an automated, end-to-end NIHSS scoring pipeline is proposed by mapping the extracted entities to relevant NIHSS items and values, to quantitatively assess the stroke severity. RESULTS Results obtained on a benchmark dataset CCKS2019 and our newly created CSCR dataset demonstrate the superior performance of our domain-adaptive pre-trained LLM and the CNER model, compared with the existing benchmark LLMs and CNER models. The high F1 score of 0.990 ensures the reliability of our model in accurately extracting the entities for the subsequent automatic NIHSS scoring. Subsequently, our automated, end-to-end NIHSS scoring approach achieved excellent inter-rater agreement (0.823) and intraclass consistency (0.986) with the ground truth and significantly reduced the processing time from minutes to a few seconds. CONCLUSION Our proposed automatic and quantitative framework for assessing stroke severity demonstrates exceptional performance and reliability through directly scoring the NIHSS from diagnostic notes in Chinese clinical EHRs. Moreover, this study also contributes a new clinical dataset, a pre-trained clinical LLM, and an effective deep learning-based CNER model. The deployment of these advanced algorithms can improve the accuracy and efficiency of clinical assessment, and help improve the quality, affordability and productivity of healthcare services.
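The final scoring step can be pictured as a lookup from extracted (category, value) pairs to NIHSS items and points, as in this simplified sketch; the rules shown are a hypothetical subset, while the real pipeline covers every scale item as defined by the paper's stroke ontology.

```python
# Illustrative rules mapping extracted entities to NIHSS items (subset only).
NIHSS_RULES = {
    ("loc", "alert"): ("1a_loc", 0),
    ("loc", "drowsy"): ("1a_loc", 1),
    ("motor_arm", "no_drift"): ("5_motor_arm", 0),
    ("motor_arm", "drift"): ("5_motor_arm", 1),
}

def score_nihss(entities):
    """Sum item scores for (category, value) pairs produced by the NER model."""
    items = {}
    for category, value in entities:
        rule = NIHSS_RULES.get((category, value))
        if rule:
            item, points = rule
            items[item] = points  # last mention wins; real systems need conflict handling
    return sum(items.values()), items

total, detail = score_nihss([("loc", "drowsy"), ("motor_arm", "drift")])
print(total, detail)  # 2 {'1a_loc': 1, '5_motor_arm': 1}
```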
Affiliation(s)
- Zhanzhong Gu
- School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia.
- Xiangjian He
- School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia; School of Computer Science, University of Nottingham Ningbo China, Ningbo, China
- Ping Yu
- School of Computing and Information Technology, University of Wollongong, NSW, 2522, Australia
- Wenjing Jia
- School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Xiguang Yang
- School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Gang Peng
- Intergenepharm Pty Ltd, Sydney, NSW, 2000, Australia
- Penghui Hu
- Department of Oncology, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Shiyan Chen
- Department of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Hongjie Chen
- Department of Traditional Chinese Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yiguang Lin
- Department of Traditional Chinese Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China; Department of Immuno-Oncology, The First Affiliated Hospital of Guangdong Pharmaceutical University, China; School of Life Sciences, University of Technology Sydney, NSW, 2007, Australia
21. Bernstorff M, Vistisen ST, Enevoldsen KC. Natural language processing for electronic health records in anaesthesiology: an introduction to clinicians with recommendations and pitfalls. J Clin Monit Comput 2024;38:241-245. [PMID: 38310589] [PMCID: PMC10995065] [DOI: 10.1007/s10877-024-01128-3]
Affiliation(s)
- Martin Bernstorff
- Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Center for Humanities Computing, Aarhus University, Jens Chr. Skous Vej 4, Aarhus N, 8200, Denmark
- Simon Tilma Vistisen
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.
- Department of Anaesthesiology and Intensive Care, Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, Aarhus N, 8200, Denmark.
- Kenneth C Enevoldsen
- Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Center for Humanities Computing, Aarhus University, Jens Chr. Skous Vej 4, Aarhus N, 8200, Denmark
- Quantitative Genomics Group, Aarhus University, Aarhus N, Denmark
Collapse
|
22
|
Peng C, Yang X, Smith KE, Yu Z, Chen A, Bian J, Wu Y. Model tuning or prompt tuning? A study of large language models for clinical concept and relation extraction. J Biomed Inform 2024; 153:104630. [PMID: 38548007 DOI: 10.1016/j.jbi.2024.104630]
Abstract
OBJECTIVE To develop a soft prompt-based learning architecture for large language models (LLMs), examine prompt tuning with frozen and unfrozen LLMs, and assess their abilities in transfer learning and few-shot learning. METHODS We developed a soft prompt-based learning architecture and compared 4 strategies: (1) fine-tuning without prompts; (2) hard prompting with unfrozen LLMs; (3) soft prompting with unfrozen LLMs; and (4) soft prompting with frozen LLMs. We evaluated GatorTron, a clinical LLM with up to 8.9 billion parameters, and compared it with 4 existing transformer models for clinical concept and relation extraction on 2 benchmark datasets for adverse drug events and social determinants of health (SDoH). We also evaluated few-shot learning ability and generalizability for cross-institution applications. RESULTS AND CONCLUSION When LLMs are unfrozen, GatorTron-3.9B with soft prompting achieves the best strict F1-scores of 0.9118 and 0.8604 for concept extraction, outperforming the traditional fine-tuning and hard prompt-based models by 0.6%-3.1% and 1.2%-2.9%, respectively; GatorTron-345M with soft prompting achieves the best F1-scores of 0.8332 and 0.7488 for end-to-end relation extraction, outperforming the other two models by 0.2%-2% and 0.6%-11.7%, respectively. When LLMs are frozen, small LLMs lag far behind unfrozen models; scaling LLMs up to billions of parameters makes frozen LLMs competitive with unfrozen ones. Soft prompting with a frozen GatorTron-8.9B model achieved the best performance in cross-institution evaluation. We demonstrate that (1) machines can learn soft prompts better than hard prompts composed by humans, (2) frozen LLMs have good few-shot learning ability and generalizability for cross-institution applications, (3) frozen LLMs reduce computing cost to 2.5%-6% of that of previous methods using unfrozen LLMs, and (4) frozen LLMs require large models (e.g., over several billion parameters) for good performance.
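The core idea of soft prompting, prepending trainable continuous embeddings to a frozen model, can be sketched as follows; the encoder, dimensions, and prompt length are placeholders, not the GatorTron setup.

```python
# Minimal sketch of soft prompting with a frozen transformer (strategy 4 above).
# The wrapped encoder and its dimensions are illustrative; GatorTron is not used here.
import torch
import torch.nn as nn

class SoftPromptedEncoder(nn.Module):
    def __init__(self, encoder, n_prompt_tokens=20, hidden_size=768):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze the LLM
            p.requires_grad = False
        # The only trainable parameters: continuous "soft prompt" embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds, attention_mask):
        b = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([prompt, input_embeds], dim=1)     # prepend the soft prompt
        prompt_mask = torch.ones(b, prompt.size(1),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.encoder(inputs_embeds=x, attention_mask=mask)
```

During training only `soft_prompt` receives gradients, which is what makes the frozen-LLM strategy so much cheaper than full fine-tuning.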
Affiliation(s)
- Cheng Peng
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Zehao Yu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Aokun Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.

23
Koga S, Du W. ChatGPT's limited accuracy in generating anatomical images for medical education. Skeletal Radiol 2024. [PMID: 38506966 DOI: 10.1007/s00256-024-04655-x]
Affiliation(s)
- Shunsuke Koga
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA.
- Wei Du
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA

24
Suzuki R, Arita T. An evolutionary model of personality traits related to cooperative behavior using a large language model. Sci Rep 2024; 14:5989. [PMID: 38503778 PMCID: PMC10951268 DOI: 10.1038/s41598-024-55903-y]
Abstract
This study aims to demonstrate that large language models (LLMs) can empower research on the evolution of human behavior grounded in evolutionary game theory, using an evolutionary model in which instructing LLMs with high-level psychological and cognitive character descriptions enables the simulation of human behavioral choices in game-theoretic scenarios. As a first step toward this objective, this paper proposes an evolutionary model of personality traits related to cooperative behavior using a large language model. In the model, linguistic descriptions of personality traits related to cooperative behavior are used as genes. The deterministic strategies extracted from the LLM, which make behavioral decisions based on these personality traits, are used as behavioral traits. The population evolves through selection based on average payoff and through mutation of genes, performed by asking the LLM to slightly modify the parent gene toward more cooperative or more selfish behavior. Through experiments and analyses, we show that such a model can indeed exhibit the evolution of cooperative behavior based on diverse, higher-order representations of personality traits. We also observed repeated invasions of cooperative and selfish personality traits through changes in the expression of those traits. The words that emerged in the evolved genes reflected the behavioral tendencies of their associated personalities semantically, thereby influencing individual behavior and, consequently, the evolutionary dynamics.
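The evolutionary loop the abstract describes can be sketched with the LLM calls stubbed out; the `ask_llm_*` functions, payoff matrix, and personality strings below are illustrative stand-ins, not the paper's prompts or parameters.

```python
# Sketch of the evolutionary loop, with the two LLM calls stubbed out.
import random

def ask_llm_strategy(personality: str) -> str:
    """Placeholder: extract a deterministic strategy ('C' or 'S') from a personality."""
    return "C" if "cooperat" in personality.lower() else "S"

def ask_llm_mutate(personality: str, direction: str) -> str:
    """Placeholder: ask the LLM to nudge the description toward a direction."""
    return personality + f" (slightly more {direction})"

# Prisoner's-dilemma-style payoffs for (my move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "S"): 0, ("S", "C"): 5, ("S", "S"): 1}

population = ["a cooperative, trusting person", "a selfish, opportunistic person"] * 5

for generation in range(10):
    strategies = [ask_llm_strategy(p) for p in population]
    # Average payoff of each individual against the whole population.
    fitness = [sum(PAYOFF[(s, t)] for t in strategies) / len(strategies)
               for s in strategies]
    # Fitness-proportional selection, then mutation via the (stubbed) LLM.
    parents = random.choices(population, weights=fitness, k=len(population))
    population = [ask_llm_mutate(p, random.choice(["cooperative", "selfish"]))
                  for p in parents]

print(population[0])
```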
Affiliation(s)
- Reiji Suzuki
- Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan.
- Takaya Arita
- Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan

25
Mat Q, Briganti G, Maniaci A, Lelubre C. Will ChatGPT soon replace otolaryngologists? Eur Arch Otorhinolaryngol 2024. [PMID: 38438614 DOI: 10.1007/s00405-024-08543-x]
Affiliation(s)
- Quentin Mat
- Department of Otorhinolaryngology, C.H.U. Charleroi, Chaussée de Bruxelles 140, 6042, Charleroi, Belgium.
- Faculty of Medicine and Pharmacy, University of Mons (UMons), Mons, Belgium.
- Giovanni Briganti
- Faculty of Medicine and Pharmacy, University of Mons (UMons), Mons, Belgium
- Department of Clinical Science, Faculty of Medicine, University of Liège, Quartier Hôpital, Avenue Hippocrate 13, 4000, Liege, Belgium
- Faculty of Medicine, Université Libre de Bruxelles, Route de Lennik 808, 1070, Brussels, Belgium
- Antonino Maniaci
- Faculty of Medicine and Surgery, University of Enna "Kore", Enna, Italy
- Christophe Lelubre
- Faculty of Medicine and Pharmacy, University of Mons (UMons), Mons, Belgium
- Department of Internal Medicine, C.H.U. Charleroi, Charleroi, Belgium

26
Hu D, Liu B, Zhu X, Lu X, Wu N. Zero-shot information extraction from radiological reports using ChatGPT. Int J Med Inform 2024; 183:105321. [PMID: 38157785 DOI: 10.1016/j.ijmedinf.2023.105321]
Abstract
INTRODUCTION Electronic health records contain an enormous amount of valuable information recorded in free text. Information extraction is the strategy for transforming free text into structured data, but some of its components require annotated data to tune, which has become a bottleneck. Large language models achieve good performance on various downstream NLP tasks without parameter tuning, making them a possible way to extract information in a zero-shot manner. METHODS In this study, we explore whether the most popular large language model, ChatGPT, can extract information from radiological reports. We first design prompt templates for the information of interest in the CT reports. We then generate prompts by combining the templates with the CT reports as inputs to ChatGPT and obtain the responses. A post-processing module is developed to transform the responses into structured extraction results. In addition, we add prior medical knowledge to the prompt template to reduce erroneous extraction results, and we examine the consistency of the extraction results. RESULTS We conducted experiments with 847 real CT reports. The results indicate that ChatGPT can achieve performance competitive with the baseline information extraction system on some extraction tasks, such as tumor location and tumor long and short diameters. Adding prior medical knowledge to the prompt template yields significant improvements on the tasks concerning tumor spiculation and lobulation, but the tasks concerning tumor density and lymph node status do not improve. CONCLUSION ChatGPT can achieve competitive information extraction from radiological reports in a zero-shot manner. Adding prior medical knowledge as instructions can further improve performance on some extraction tasks but may lead to worse performance on some complex extraction tasks.
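A minimal sketch of the prompt-template-plus-post-processing idea follows, assuming a placeholder `chat` function in place of the ChatGPT API; the template wording and extracted fields are invented, not the paper's.

```python
# Sketch of zero-shot extraction from a CT report via a prompt template.
# `chat` is a stand-in for an LLM API call and returns a canned reply here.
import json

def chat(prompt: str) -> str:
    """Placeholder for the LLM call."""
    return ('{"tumor_location": "left upper lobe", '
            '"long_diameter_mm": 23, "short_diameter_mm": 15}')

TEMPLATE = (
    "You are reading a chest CT report. Extract the following fields and "
    "answer strictly in JSON with keys tumor_location, long_diameter_mm, "
    "short_diameter_mm. Use null when a field is not mentioned.\n"
    "Report:\n{report}"
)

def extract(report: str) -> dict:
    response = chat(TEMPLATE.format(report=report))
    try:
        return json.loads(response)   # post-processing into structured data
    except json.JSONDecodeError:
        return {}                     # fall back when the reply is malformed

print(extract("A 23 x 15 mm spiculated nodule in the left upper lobe ..."))
```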
Affiliation(s)
- Danqing Hu
- Zhejiang Lab, Hangzhou, 311121, Zhejiang, China.
- Bing Liu
- Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, 100142, China
- Xiaofeng Zhu
- Zhejiang Lab, Hangzhou, 311121, Zhejiang, China.
- Xudong Lu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, 310027, Zhejiang, China
- Nan Wu
- Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, 100142, China.

27
Kim K, Cho K, Jang R, Kyung S, Lee S, Ham S, Choi E, Hong GS, Kim N. Updated Primer on Generative Artificial Intelligence and Large Language Models in Medical Imaging for Medical Professionals. Korean J Radiol 2024; 25:224-242. [PMID: 38413108 PMCID: PMC10912493 DOI: 10.3348/kjr.2023.0818]
Abstract
The emergence of Chat Generative Pre-trained Transformer (ChatGPT), a chatbot developed by OpenAI, has garnered interest in the application of generative artificial intelligence (AI) models in the medical field. This review summarizes different generative AI models and their potential applications in medicine, and explores the evolving landscape of generative adversarial networks and diffusion models since the introduction of generative AI. These models have made valuable contributions to the field of radiology. This review also explores the significance of synthetic data in addressing privacy concerns and in augmenting data diversity and quality within the medical domain, and it emphasizes the role of inversion in the investigation of generative models, outlining an approach to replicate this process. We provide an overview of large language models, such as GPT and bidirectional encoder representations from transformers (BERT), focusing on prominent representatives, and discuss recent initiatives involving language-vision models in radiology, including the Large Language and Vision Assistant for Biomedicine (LLaVA-Med), to illustrate their practical application. This comprehensive review offers insights into the wide-ranging applications of generative AI models in clinical research and emphasizes their transformative potential.
Affiliation(s)
- Kiduk Kim
- Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea
- Kyungjin Cho
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sunggu Kyung
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Soyoung Lee
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sungwon Ham
- Healthcare Readiness Institute for Unified Korea, Korea University Ansan Hospital, Korea University College of Medicine, Ansan, Republic of Korea
- Edward Choi
- Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Gil-Sun Hong
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.
- Namkug Kim
- Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.

28
Liu P, Ren Y, Tao J, Ren Z. GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text. Comput Biol Med 2024; 171:108073. [PMID: 38359660 DOI: 10.1016/j.compbiomed.2024.108073]
Abstract
Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates graph, image, and text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule-generation validity compared with the baselines. With its any-to-language molecular translation strategy, our model has the potential to perform further downstream tasks, such as compound name recognition and chemical reaction prediction.
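The modality-alignment idea behind GIT-Former can be caricatured as projecting each modality into one normalized latent space; the encoders, dimensions, and similarity computation below are invented placeholders, not the GIT-Former architecture.

```python
# Toy sketch of aligning graph, image, and text features in one latent space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAligner(nn.Module):
    """Project per-modality features into one shared, L2-normalized latent space."""
    def __init__(self, dims=None, latent=256):
        super().__init__()
        dims = dims or {"graph": 128, "image": 512, "text": 768}  # assumed sizes
        self.proj = nn.ModuleDict({m: nn.Linear(d, latent) for m, d in dims.items()})

    def forward(self, features):
        return {m: F.normalize(self.proj[m](x), dim=-1) for m, x in features.items()}

aligner = ModalityAligner()
z = aligner({"graph": torch.randn(4, 128),
             "image": torch.randn(4, 512),
             "text": torch.randn(4, 768)})
sim = z["graph"] @ z["text"].T   # contrastive-style pairwise similarities
print(sim.shape)                 # torch.Size([4, 4])
```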
Affiliation(s)
- Pengfei Liu
- Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China; School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 510006, Guangdong Province, China
- Yiming Ren
- Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China
- Jun Tao
- School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 510006, Guangdong Province, China
- Zhixiang Ren
- Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China.

29
Dang R, Hanba C. A large language model's assessment of methodology reporting in head and neck surgery. Am J Otolaryngol 2024; 45:104145. [PMID: 38103488 DOI: 10.1016/j.amjoto.2023.104145]
Abstract
OBJECTIVE The aim of this study was to assess the ability of a large language model, ChatGPT 3.5, to appraise the quality of scientific methodology reporting in head and neck-specific scientific literature. METHODS The authors asked ChatGPT 3.5 to create a grading system for the scientific reporting of research methods. The language model produced a system with a maximum of 60 points, with individual scores for study design and description, data collection and measurement, statistical analysis, ethical considerations, and overall clarity and transparency. Twenty articles were selected at random from the American Head and Neck Society's (AHNS) fellowship curriculum 2.0, and each 'Methods' section was input into ChatGPT 3.5 for scoring. Analysis of variance (ANOVA) was performed between the different scoring categories, followed by a post hoc Tukey HSD test. RESULTS Of the twenty articles assessed, eight were categorized as very good and nine as good based on cumulative score. The lowest mean score was observed for the statistical analysis category (mean = 0.49, SD = 0.02). ANOVA showed a significant difference between the means of the different scoring categories, F(4, 95) = 13.4, p ≤ 0.05. On the post hoc Tukey HSD test, mean scores for the data collection (mean = 0.58, SD = 0.06) and statistical analysis (mean = 0.49, SD = 0.02) categories were significantly lower than those of the other categories. CONCLUSION This article showcases the feasibility of employing a large language model such as ChatGPT 3.5 to assess the methods sections of head and neck academic writing. LEVEL OF EVIDENCE: 4
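The reported tests can be reproduced in Python as follows; the score arrays are fabricated for illustration, not the study's data.

```python
# One-way ANOVA across scoring categories, followed by a post hoc Tukey HSD test.
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import numpy as np

rng = np.random.default_rng(0)
categories = ["design", "data_collection", "statistics", "ethics", "clarity"]
# 20 per-article scores per category (illustrative values, not the study's data).
scores = {c: rng.normal(loc, 0.05, 20)
          for c, loc in zip(categories, [0.80, 0.58, 0.49, 0.85, 0.90])}

f_stat, p = f_oneway(*scores.values())      # ANOVA across the five categories
print(f"F = {f_stat:.1f}, p = {p:.3g}")

values = np.concatenate(list(scores.values()))
groups = np.repeat(categories, 20)
print(pairwise_tukeyhsd(values, groups))    # pairwise category comparisons
```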
Affiliation(s)
- Rushil Dang
- Maxillofacial Oncology and Reconstructive Surgery, Department of Oral and Maxillofacial surgery, Boston Medical Center, Boston, MA, USA
- Curtis Hanba
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.

30
Wei Q, Yao Z, Cui Y, Wei B, Jin Z, Xu X. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform 2024; 151:104620. [PMID: 38462064 DOI: 10.1016/j.jbi.2024.104620]
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and to provide direction for future research. METHODS An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data were extracted on general study characteristics, question sources, conversation processes, assessment metrics, and the performance of ChatGPT. An evaluation framework for LLMs in medical inquiries was proposed by integrating insights from the selected literature. This study is registered with PROSPERO, CRD42023456327. RESULTS A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall pooled accuracy of 56% (95% CI: 51%-60%, I² = 87%) in addressing medical queries. However, the studies varied in question source, question-asking process, and evaluation metrics. As per our proposed evaluation framework, many studies failed to report methodological details such as the date of inquiry, the version of ChatGPT, and inter-rater consistency. CONCLUSION This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of study designs and insufficient reporting may affect the reliability of the results. Our proposed evaluation framework provides insights for future study design and transparent reporting of LLMs responding to medical questions.
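A pooled proportion with an I² estimate, as reported above, is typically obtained with a random-effects model; a minimal DerSimonian-Laird sketch on the logit scale is shown below, with invented per-study counts rather than the review's data.

```python
# Random-effects pooling of accuracy proportions (DerSimonian-Laird, logit scale).
import numpy as np

events = np.array([50, 120, 30, 80])     # correct answers per study (invented)
totals = np.array([100, 180, 70, 150])   # questions per study (invented)

p = events / totals
y = np.log(p / (1 - p))                  # logit-transformed proportions
v = 1 / events + 1 / (totals - events)   # approximate within-study variances

w = 1 / v                                # fixed-effect weights
mu_fixed = np.sum(w * y) / w.sum()
q = np.sum(w * (y - mu_fixed) ** 2)      # Cochran's Q
df = len(y) - 1
tau2 = max(0.0, (q - df) / (w.sum() - np.sum(w**2) / w.sum()))  # DL tau^2
i2 = max(0.0, (q - df) / q) * 100        # I^2 heterogeneity

w_star = 1 / (v + tau2)                  # random-effects weights
mu = np.sum(w_star * y) / w_star.sum()
pooled = 1 / (1 + np.exp(-mu))           # back-transform to a proportion
print(f"pooled accuracy = {pooled:.2f}, I^2 = {i2:.0f}%")
```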
Affiliation(s)
- Qiuhong Wei
- Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China; Children Nutrition Research Center, Children's Hospital of Chongqing Medical University, Chongqing, China; National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Child Neurodevelopment and Cognitive Disorders, Chongqing, China
- Zhengxiong Yao
- Department of Neurology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ying Cui
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Bo Wei
- Department of Global Statistics and Data Science, BeiGene USA Inc., San Mateo, CA, USA
- Zhezhen Jin
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Ximing Xu
- Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China

31
Li S, Guo Z, Zang X. Advancing the Production of Clinical Medical Devices Through ChatGPT. Ann Biomed Eng 2024; 52:441-445. [PMID: 37369944 DOI: 10.1007/s10439-023-03300-3]
Abstract
As a recently popular large language model, Chat Generative Pre-trained Transformer (ChatGPT) is highly valued in the field of clinical medicine. Because the potential impact of ChatGPT on the manufacturing side of clinical medical devices is not yet well understood, we aim to fill this gap in this article. We elucidate the classification of medical devices and explore the positive contributions of ChatGPT to various aspects of medical device design, optimization, and improvement. However, limitations such as the potential for misinterpretation of user intent, the lack of personal experience, and the need for human supervision should be taken into consideration. Striking a balance between ChatGPT and human expertise can ensure the safety, quality, and compliance of medical devices. This work contributes to the advancement of ChatGPT in the medical device manufacturing industry and highlights the synergistic relationship between artificial intelligence and human involvement in healthcare.
Affiliation(s)
- Siqi Li
- Advanced Research Center, GD Midea Equipment Co., Ltd, Foshan, 528000, China
- Zheng Guo
- Orthopedics Department of The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, Foshan, 528042, China.
- Xuehui Zang
- Orthopedics Department of The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, Foshan, 528042, China.

32
Pal S, Bhattacharya M, Lee SS, Chakraborty C. A Domain-Specific Next-Generation Large Language Model (LLM) or ChatGPT is Required for Biomedical Engineering and Research. Ann Biomed Eng 2024; 52:451-454. [PMID: 37428337 DOI: 10.1007/s10439-023-03306-x]
Abstract
Large language models such as ChatGPT have recently gained extensive media coverage, and at the same time the use of ChatGPT has increased drastically. Biomedical researchers, engineers, and clinicians have shown significant interest and have started using it because of its diverse applications, especially in the biomedical field. However, ChatGPT has been found to sometimes provide incorrect or only partly correct information, and it is unable to give the most recent information. We therefore urgently advocate a domain-specific, next-generation chatbot for biomedical engineering and research that provides error-free, more accurate, and up-to-date information. Such a domain-specific chatbot could perform diverse functions in biomedical engineering, from supporting innovation to designing medical devices. If a biomedical domain-specific chatbot is produced, this domain-specific, artificial intelligence-enabled tool will revolutionize biomedical engineering and research.
Affiliation(s)
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha, 756020, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon, Gangwon-Do, 24252, Republic of Korea
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India.

33
Shen SA, Perez-Heydrich CA, Xie DX, Nellis JC. ChatGPT vs. web search for patient questions: what does ChatGPT do better? Eur Arch Otorhinolaryngol 2024. [PMID: 38416195 DOI: 10.1007/s00405-024-08524-0]
Abstract
PURPOSE Chat Generative Pretrained Transformer (ChatGPT) has the potential to significantly impact how patients acquire medical information online. Here, we characterize the readability and appropriateness of ChatGPT responses to a range of patient questions compared with results from traditional web searches. METHODS Patient questions related to the published Clinical Practice Guidelines of the American Academy of Otolaryngology-Head and Neck Surgery were sourced from existing online posts. Questions were categorized using a modified Rothwell classification system into (1) fact, (2) policy, and (3) diagnosis and recommendations, and were queried using ChatGPT and traditional web search. All results were evaluated for readability (Flesch Reading Ease and Flesch-Kincaid Grade Level) and understandability (Patient Education Materials Assessment Tool, PEMAT). Accuracy was assessed by two blinded clinical evaluators using a three-point ordinal scale. RESULTS 54 questions were organized into fact (37.0%), policy (37.0%), and diagnosis (25.8%). The average readability of ChatGPT responses was lower than that of traditional web search (FRE: 42.3 ± 13.1 vs. 55.6 ± 10.5, p < 0.001), while PEMAT understandability was equivalent (93.8% vs. 93.5%, p = 0.17). ChatGPT scored higher than web search for questions in the 'Diagnosis' category (p < 0.01); there was no difference for questions categorized as 'Fact' (p = 0.15) or 'Policy' (p = 0.22). Additional prompting improved ChatGPT response readability (FRE 55.6 ± 13.6, p < 0.01). CONCLUSIONS ChatGPT outperforms web search in answering patient questions related to symptom-based diagnoses and is equivalent in providing medical facts and established policy. Appropriate prompting can further improve readability while maintaining accuracy. Further patient education is needed to relay the benefits and limitations of this technology as a source of medical information.
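The Flesch Reading Ease metric used above is a simple formula over sentence, word, and syllable counts; a rough sketch follows, with a crude vowel-group syllable heuristic rather than a validated counter.

```python
# Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
# The syllable counter below is a rough approximation for illustration only.
import re

def count_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

print(round(flesch_reading_ease(
    "Ear infections are common. See a doctor if pain persists."), 1))
```

Higher scores mean easier text, which is why the web-search results (FRE ≈ 55.6) read more easily than the default ChatGPT responses (FRE ≈ 42.3).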
Affiliation(s)
- Sarek A Shen
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA.
- Deborah X Xie
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA
- Jason C Nellis
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, 601 North Caroline Street, Baltimore, MD, 21287, USA

34
Reese JT, Danis D, Caufield JH, Groza T, Casiraghi E, Valentini G, Mungall CJ, Robinson PN. On the limitations of large language models in clinical diagnosis. medRxiv 2024 (preprint). [PMID: 37503093 PMCID: PMC10370243 DOI: 10.1101/2023.07.13.23292613]
Abstract
Objective Large Language Models such as GPT-4 previously have been applied to differential diagnostic challenges based on published case reports. Published case reports have a sophisticated narrative style that is not readily available from typical electronic health records (EHR). Furthermore, even if such a narrative were available in EHRs, privacy requirements would preclude sending it outside the hospital firewall. We therefore tested a method for parsing clinical texts to extract ontology terms and programmatically generating prompts that by design are free of protected health information. Materials and Methods We investigated different methods to prepare prompts from 75 recently published case reports. We transformed the original narratives by extracting structured terms representing phenotypic abnormalities, comorbidities, treatments, and laboratory tests and creating prompts programmatically. Results Performance of all of these approaches was modest, with the correct diagnosis ranked first in only 5.3-17.6% of cases. The performance of the prompts created from structured data was substantially worse than that of the original narrative texts, even if additional information was added following manual review of term extraction. Moreover, different versions of GPT-4 demonstrated substantially different performance on this task. Discussion The sensitivity of the performance to the form of the prompt and the instability of results over two GPT-4 versions represent important current limitations to the use of GPT-4 to support diagnosis in real-life clinical settings. Conclusion Research is needed to identify the best methods for creating prompts from typically available clinical data to support differential diagnostics.
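The programmatic, PHI-free prompt construction the authors describe can be sketched as simple templating over extracted ontology terms; the wording and example terms below are illustrative, not the authors' prompts.

```python
# Sketch of building a PHI-free diagnostic prompt from structured ontology terms.
def build_prompt(phenotypes, comorbidities, treatments, labs):
    lines = ["A patient presents with the following structured findings."]
    for title, terms in [("Phenotypic abnormalities", phenotypes),
                         ("Comorbidities", comorbidities),
                         ("Treatments", treatments),
                         ("Laboratory tests", labs)]:
        if terms:
            lines.append(f"{title}: " + "; ".join(terms))
    lines.append("List the most likely differential diagnoses, ranked.")
    return "\n".join(lines)

print(build_prompt(
    phenotypes=["Proteinuria (HP:0000093)", "Hypertension (HP:0000822)"],
    comorbidities=["Type 2 diabetes mellitus"],
    treatments=["ACE inhibitor"],
    labs=["Elevated serum creatinine"],
))
```

Because only ontology terms (not narrative text) leave the firewall, the prompt is free of protected health information by construction, which is the key design point of the method.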
Affiliation(s)
- Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032, USA
- J Harry Caufield
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tudor Groza
- Rare Care Centre, Perth Children’s Hospital, Perth, WA 6009, Australia
- Telethon Kids Institute, Perth, WA 6009, Australia
- Elena Casiraghi
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
- Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
- ELLIS-European Laboratory for Learning and Intelligent Systems
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA

35
Sood A, Mansoor N, Memmi C, Lynch M, Lynch J. Generative pretrained transformer-4, an artificial intelligence text predictive model, has a high capability for passing novel written radiology exam questions. Int J Comput Assist Radiol Surg 2024. [PMID: 38381363 DOI: 10.1007/s11548-024-03071-9]
Abstract
PURPOSE AI image interpretation through convolutional neural networks shows increasing capability within radiology. These models have achieved impressive performance on specific tasks within controlled settings but possess inherent limitations, such as the inability to consider clinical context. We assess the ability of large language models (LLMs), in the context of radiology specialty exams, to determine whether they can evaluate relevant clinical information. METHODS A database of questions was created from official sample questions, author-written questions, and textbook questions based on the Royal College of Radiologists (United Kingdom) FRCR 2A and American Board of Radiology (ABR) Certifying examinations. The questions were input into the Generative Pretrained Transformer (GPT) versions 3 and 4, with prompting to answer the questions. RESULTS One thousand and seventy-two questions were evaluated by GPT-3 and GPT-4; 495 (46.2%) were for the FRCR 2A and 577 (53.8%) for the ABR exam. There were 890 single-best-answer (SBA) questions and 182 true/false questions. GPT-4 was correct on 629/890 (70.7%) SBA questions and 151/182 (83.0%) true/false questions, with no degradation on author-written questions. GPT-4 performed significantly better than GPT-3, which selected the correct answer on 282/890 (31.7%) SBA questions and 111/182 (61.0%) true/false questions. The performance of GPT-4 was similar across both examinations for all categories of question. CONCLUSION The newest generation of LLMs, GPT-4, demonstrates high capability in answering radiology exam questions and marked improvement over GPT-3, suggesting that further gains in accuracy are possible. Further research is needed to explore the clinical applicability of these AI models in real-world settings.
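The GPT-3 vs. GPT-4 gap on the SBA questions can be checked with a chi-square test on the counts reported above, as in this sketch.

```python
# Chi-square test comparing GPT-4 and GPT-3 on the 890 SBA questions.
from scipy.stats import chi2_contingency

#              correct   incorrect
table = [[629, 890 - 629],   # GPT-4 (70.7% correct)
         [282, 890 - 282]]   # GPT-3 (31.7% correct)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")
```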
Affiliation(s)
- Avnish Sood
- King's College London, Strand, London, WC2R 2LS, UK
- Nina Mansoor
- Department of Neuroradiology, Kings College Hospital, Denmark Hill, London, SE59RS, UK
- Caroline Memmi
- Imperial College London, Exhibition Road, London, SW7 2AZ, UK
- Magnus Lynch
- King's College London Centre for Stem Cells and Regenerative Medicine, Guy's Hospital, Great Maze Pond, London, UK
- St John's Institute of Dermatology, King's College London, London, UK
- Jeremy Lynch
- Department of Neuroradiology, Kings College Hospital, Denmark Hill, London, SE59RS, UK.

36
Hu Y, Hu Z, Liu W, Gao A, Wen S, Liu S, Lin Z. Exploring the potential of ChatGPT as an adjunct for generating diagnosis based on chief complaint and cone beam CT radiologic findings. BMC Med Inform Decis Mak 2024; 24:55. [PMID: 38374067 PMCID: PMC10875853 DOI: 10.1186/s12911-024-02445-y]
Abstract
AIM This study aimed to assess the performance of OpenAI's ChatGPT in generating diagnoses based on chief complaint and cone beam computed tomography (CBCT) radiologic findings. MATERIALS AND METHODS 102 CBCT reports (48 with dental diseases (DD) and 54 with neoplastic/cystic diseases (N/CD)) were collected. ChatGPT was provided with the chief complaint and the CBCT radiologic findings, and its diagnostic outputs were scored on a five-point Likert scale. For diagnostic accuracy, scoring was based on the accuracy of the chief complaint-related diagnosis and of chief complaint-unrelated diagnoses (1-5 points); for diagnostic completeness, scoring was based on how many accurate diagnoses were included in ChatGPT's output for a case (1-5 points); for text quality, scoring was based on how many text errors were included in ChatGPT's output for a case (1-5 points). For the 54 N/CD cases, the consistency of the diagnoses generated by ChatGPT with the pathological diagnosis was also calculated, and the composition of the text errors in ChatGPT's outputs was evaluated. RESULTS After subjective rating by expert reviewers on a five-point Likert scale, the final scores for diagnostic accuracy, diagnostic completeness, and text quality were 3.7, 4.5, and 4.6 for the 102 cases. For diagnostic accuracy, ChatGPT performed significantly better on N/CD (3.8/5) than on DD (3.6/5). For the 54 N/CD cases, 21 (38.9%) had a first diagnosis completely consistent with the pathological diagnosis. No text errors were observed in 88.7% of all 390 text items. CONCLUSION ChatGPT shows potential for generating radiographic diagnoses based on chief complaint and radiologic findings. However, its performance varied with task complexity, necessitating professional oversight due to a certain error rate.
Affiliation(s)
- Yanni Hu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Ziyang Hu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Department of Stomatology, Shenzhen Longhua District Central Hospital, Shenzhen, People's Republic of China
- Wenjing Liu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Antian Gao
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Shanhui Wen
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Shu Liu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Zitong Lin
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China.

37
Guthrie E, Levy D, Del Carmen G. The Operating and Anesthetic Reference Assistant (OARA): A fine-tuned large language model for resident teaching. Am J Surg 2024. [PMID: 38365551 DOI: 10.1016/j.amjsurg.2024.02.016]
Abstract
OBJECTIVE This study aimed to fine-tune a large language model (LLM) for domain-specific text generation in surgical and anesthesia residency education. SUMMARY BACKGROUND DATA With growing interest in artificial intelligence (AI) for medical training, the potential of LLMs to transform residency education is explored. METHODS The 7-billion-parameter base model "Vicuna v1.5" was trained on 266,342 lines of text from 821 peer-reviewed documents. We evaluated the model with 150 surgical or anesthesia queries and assessed accuracy, token count, and inference speed across various reasoning tasks. Tests of significance were conducted using ANOVA and chi-square analysis. RESULTS Our model achieved 65.3% accuracy, excelling in surgical case-based tasks. We found no significant difference in accuracy between knowledge domains (P = 0.081), though longer responses showed poorer accuracy, with significant variation in accuracy by output length (P = 0.002). CONCLUSIONS LLMs show potential for enhancing residency education. Our model's efficiency and task-specific accuracy highlight this promise, though its limited parameter count diminishes the accuracy of longer responses. Our findings showcase how AI may be integrated effectively within future residency training.
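A skeleton of domain-specific causal-LM fine-tuning in this spirit is sketched below with Hugging Face tooling; the checkpoint name, corpus file, and hyperparameters are assumptions, not the paper's training configuration.

```python
# Sketch of fine-tuning a causal LM on a domain corpus (OARA-style).
# "lmsys/vicuna-7b-v1.5" and "domain_corpus.txt" are assumed placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "lmsys/vicuna-7b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # LLaMA-family tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="oara", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice a 7B model needs parameter-efficient methods (e.g., LoRA) or multi-GPU hardware to train at reasonable cost; the skeleton above shows only the data-to-Trainer flow.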
Affiliation(s)
- Estefania Guthrie
- McGovern Medical School at the University of Texas Health Science Center at Houston, Houston, TX, USA
- Dominique Levy
- McGovern Medical School at the University of Texas Health Science Center at Houston, Houston, TX, USA
- Gabriel Del Carmen
- McGovern Medical School at the University of Texas Health Science Center at Houston, Houston, TX, USA.

38
Cai ZR, Chen ML, Kim J, Novoa RA, Barnes LA, Beam A, Linos E. Assessment of Correctness, Content Omission, and Risk of Harm in Large Language Model Responses to Dermatology Continuing Medical Education Questions. J Invest Dermatol 2024. [PMID: 38310972 DOI: 10.1016/j.jid.2024.01.015]
Affiliation(s)
- Zhuo Ran Cai
- Department of Dermatology, Stanford University School of Medicine, Stanford, California, USA; Department of Dermatology, Medical School, Université de Montréal, Montreal, Canada
- Michael L Chen
- Department of Dermatology, Stanford University School of Medicine, Stanford, California, USA; Center for Digital Health, Stanford University School of Medicine, Stanford, California, USA
- Jiyeong Kim
- Department of Dermatology, Stanford University School of Medicine, Stanford, California, USA; Center for Digital Health, Stanford University School of Medicine, Stanford, California, USA
- Roberto A Novoa
- Department of Dermatology, Stanford University School of Medicine, Stanford, California, USA; Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Leandra A Barnes
- Department of Dermatology, Stanford University School of Medicine, Stanford, California, USA
- Andrew Beam
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Eleni Linos
- Department of Dermatology, Stanford University School of Medicine, Stanford, California, USA; Center for Digital Health, Stanford University School of Medicine, Stanford, California, USA.

39
Nakaura T, Yoshida N, Kobayashi N, Shiraishi K, Nagayama Y, Uetani H, Kidoh M, Hokamura M, Funama Y, Hirai T. Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports. Jpn J Radiol 2024; 42:190-200. [PMID: 37713022 PMCID: PMC10811038 DOI: 10.1007/s11604-023-01487-y]
Abstract
PURPOSE In this preliminary study, we aimed to evaluate the potential of the generative pre-trained transformer (GPT) series for generating radiology reports from concise imaging findings and to compare its performance with radiologist-generated reports. METHODS This retrospective study involved 28 patients who underwent computed tomography (CT) scans and had a diagnosed disease with typical imaging findings. Radiology reports were generated using GPT-2, GPT-3.5, and GPT-4 based on the patient's age, gender, disease site, and imaging findings. We calculated the top-1 accuracy, top-5 accuracy, and mean average precision (MAP) of the differential diagnoses for GPT-2, GPT-3.5, GPT-4, and radiologists. Two board-certified radiologists evaluated the grammar and readability, image findings, impression, differential diagnosis, and overall quality of all reports using a 4-point scale. RESULTS Top-1 and top-5 accuracies for the differential diagnoses were highest for radiologists, followed by GPT-4, GPT-3.5, and GPT-2, in that order (top-1: 1.00, 0.54, 0.54, and 0.21, respectively; top-5: 1.00, 0.96, 0.89, and 0.54, respectively). There were no significant differences in the qualitative scores for grammar and readability, image findings, and overall quality between radiologists and GPT-3.5 or GPT-4 (p > 0.05). However, the qualitative scores of the GPT series for impression and differential diagnosis were significantly lower than those of radiologists (p < 0.05). CONCLUSIONS Our preliminary study suggests that GPT-3.5 and GPT-4 have the potential to generate radiology reports with high readability and reasonable image findings from very short keywords; however, concerns persist regarding the accuracy of impressions and differential diagnoses, thereby requiring verification by radiologists.
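The top-k accuracy and MAP metrics reported here are straightforward to compute from ranked differential-diagnosis lists; a sketch with invented example data follows.

```python
# Top-k accuracy and mean average precision over ranked differential lists.
def topk_accuracy(ranked_lists, truths, k):
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, truths))
    return hits / len(truths)

def mean_average_precision(ranked_lists, truths):
    # With one correct diagnosis per case, AP reduces to 1/rank (0 if absent).
    ap = [1 / (ranked.index(truth) + 1) if truth in ranked else 0.0
          for ranked, truth in zip(ranked_lists, truths)]
    return sum(ap) / len(ap)

preds = [["pneumonia", "tuberculosis"], ["sarcoidosis", "lymphoma"]]  # invented
truth = ["pneumonia", "lymphoma"]
print(topk_accuracy(preds, truth, k=1))        # 0.5
print(mean_average_precision(preds, truth))    # (1 + 0.5) / 2 = 0.75
```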
Affiliation(s)
- Takeshi Nakaura
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan.
- Naofumi Yoshida
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan
- Naoki Kobayashi
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan
- Kaori Shiraishi
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan
- Yasunori Nagayama
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan
- Hiroyuki Uetani
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan
- Masafumi Kidoh
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan
- Masamichi Hokamura
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan
- Yoshinori Funama
- Department of Medical Physics, Faculty of Life Sciences, Kumamoto University, Honjo 1-1-1, Kumamoto, 860-8556, Japan
- Toshinori Hirai
- Department of Diagnostic Radiology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto-shi, Kumamoto, 860-8556, Japan

40
Zhou Y, Moon C, Szatkowski J, Moore D, Stevens J. Evaluating ChatGPT responses in the context of a 53-year-old male with a femoral neck fracture: a qualitative analysis. Eur J Orthop Surg Traumatol 2024; 34:927-955. [PMID: 37776392 PMCID: PMC10858115 DOI: 10.1007/s00590-023-03742-4]
Abstract
PURPOSE The integration of artificial intelligence (AI) tools, such as ChatGPT, in clinical medicine and medical education has gained significant attention due to their potential to support decision-making and improve patient care. However, there is a need to evaluate the benefits and limitations of these tools in specific clinical scenarios. METHODS This study used a case study approach within the field of orthopaedic surgery. A clinical case report featuring a 53-year-old male with a femoral neck fracture was used as the basis for evaluation. ChatGPT, a large language model, was asked to respond to clinical questions related to the case. The responses generated by ChatGPT were evaluated qualitatively, considering their relevance, justification, and alignment with the responses of real clinicians. Alternative dialogue protocols were also employed to assess the impact of additional prompts and contextual information on ChatGPT responses. RESULTS ChatGPT generally provided clinically appropriate responses to the questions posed in the clinical case report. However, the level of justification and explanation varied across the generated responses. Occasionally, clinically inappropriate responses and inconsistencies were observed in the generated responses across different dialogue protocols and on separate days. CONCLUSIONS The findings of this study highlight both the potential and limitations of using ChatGPT in clinical practice. While ChatGPT demonstrated the ability to provide relevant clinical information, the lack of consistent justification and occasional clinically inappropriate responses raise concerns about its reliability. These results underscore the importance of careful consideration and validation when using AI tools in healthcare. Further research and clinician training are necessary to effectively integrate AI tools like ChatGPT, ensuring their safe and reliable use in clinical decision-making.
Affiliation(s)
- Yushy Zhou
- Department of Surgery, The University of Melbourne, St. Vincent's Hospital Melbourne, 29 Regent Street, Clinical Sciences Block Level 2, Melbourne, VIC, 3010, Australia.
- Department of Orthopaedic Surgery, St. Vincent's Hospital, Melbourne, Australia.
- Charles Moon
- Department of Orthopaedic Surgery, Cedars-Sinai Medical Centre, Los Angeles, CA, USA
- Jan Szatkowski
- Department of Orthopaedic Surgery, Indiana University Health Methodist Hospital, Indianapolis, IN, USA
- Derek Moore
- Santa Barbara Orthopedic Associates, Santa Barbara, CA, USA
- Jarrad Stevens
- Department of Orthopaedic Surgery, St. Vincent's Hospital, Melbourne, Australia

41
Kim S, Lee CK, Kim SS. Large Language Models: A Guide for Radiologists. Korean J Radiol 2024; 25:126-133. [PMID: 38288895 PMCID: PMC10831297 DOI: 10.3348/kjr.2023.0997]
Abstract
Large language models (LLMs) have revolutionized the global landscape of technology beyond natural language processing. Owing to their extensive pre-training on vast datasets, contemporary LLMs can handle tasks ranging from general functionalities to domain-specific areas, such as radiology, without additional fine-tuning. General-purpose chatbots based on LLMs can optimize the efficiency of radiologists in terms of their professional work and research endeavors. Importantly, these LLMs are on a trajectory of rapid evolution, wherein challenges such as "hallucination," high training cost, and efficiency issues are addressed, along with the inclusion of multimodal inputs. In this review, we aim to offer conceptual knowledge and actionable guidance to radiologists interested in utilizing LLMs through a succinct overview of the topic and a summary of radiology-specific aspects, from the beginning to potential future directions.
Affiliation(s)
- Sunkyu Kim
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- AIGEN Sciences, Seoul, Republic of Korea
- Choong-Kun Lee
- Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
- Seung-Seob Kim
- Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea.

42
King MR, Abdulrahman AM, Petrovic MI, Poley PL, Hall SP, Kulapatana S, Lamantia ZE. Incorporation of ChatGPT and Other Large Language Models into a Graduate Level Computational Bioengineering Course. Cell Mol Bioeng 2024; 17:1-6. [PMID: 38435794 PMCID: PMC10902225 DOI: 10.1007/s12195-024-00793-3]
Abstract
The remarkable capabilities of generative artificial intelligence and large language models (LLMs) such as ChatGPT have delighted users around the world. Educators have regarded these tools as either a cause for great concern, an opportunity to educate students on cutting-edge technology, or often some combination of the two. Throughout the Fall 2023 semester, we explored the use of ChatGPT (and Bard, among other LLMs) in a graduate level numerical and statistical methods course for PhD-level bioengineers. In this article we share examples of this ChatGPT content, our observations on what worked best in our course, and speculate on how bioengineering students may be best served by this technology in the future.
Collapse
Affiliation(s)
- Michael R. King: Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
- Adam M. Abdulrahman: Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA; Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, TN, USA
- Mark I. Petrovic: Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA; Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, TN, USA
- Patricia L. Poley: Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
- Sarah P. Hall: Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
- Surat Kulapatana: Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA; Department of Physiology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Zachary E. Lamantia: Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
43
Sahin MC, Sozer A, Kuzucu P, Turkmen T, Sahin MB, Sozer E, Tufek OY, Nernekli K, Emmez H, Celtikci E. Beyond human in neurosurgical exams: ChatGPT's success in the Turkish neurosurgical society proficiency board exams. Comput Biol Med 2024; 169:107807. PMID: 38091727; DOI: 10.1016/j.compbiomed.2023.107807.
Abstract
Chat Generative Pre-Trained Transformer (ChatGPT) is a sophisticated natural language model that employs advanced deep learning techniques and is trained on extensive datasets to produce human-like conversational responses to user inputs. In this study, ChatGPT's performance on the Turkish Neurosurgical Society Proficiency Board Exams (TNSPBE) was compared with that of the actual candidates who took the exams; the types of questions it answered incorrectly were identified, the quality of its responses was assessed, and its performance was evaluated by question difficulty. For ranking purposes, the scores of all 260 candidates were recalculated according to the exams they took and the questions included in those exams. Across a total of 523 questions, the candidates' average score was 62.02 ± 0.61, compared with 78.77 for ChatGPT. We concluded that, in addition to ChatGPT's higher rate of correct responses, its performance improved as question clarity increased (clarity ratings 1.5, 2.0, 2.5, and 3.0), regardless of question difficulty; among the human candidates, no such improvement with increasing clarity was observed.
Affiliation(s)
- Mustafa Caglar Sahin: Gazi University Faculty of Medicine, Department of Neurosurgery, Ankara, Turkey
- Alperen Sozer: Gazi University Faculty of Medicine, Department of Neurosurgery, Ankara, Turkey
- Pelin Kuzucu: Gazi University Faculty of Medicine, Department of Neurosurgery, Ankara, Turkey
- Tolga Turkmen: Ministry of Health Dortyol State Hospital, Department of Neurosurgery, Hatay, Turkey
- Merve Buke Sahin: Ministry of Health Etimesgut District Health Directorate, Department of Public Health, Ankara, Turkey
- Ekin Sozer: Gazi University, Directorate of Health Culture and Sports, Ankara, Turkey
- Ozan Yavuz Tufek: Gazi University Faculty of Medicine, Department of Neurosurgery, Ankara, Turkey
- Kerem Nernekli: Stanford University Medical School, Department of Radiology, Stanford, CA, USA
- Hakan Emmez: Gazi University Faculty of Medicine, Department of Neurosurgery, Ankara, Turkey
- Emrah Celtikci: Gazi University Faculty of Medicine, Department of Neurosurgery, Ankara, Turkey; Gazi University Artificial Intelligence Center, Ankara, Turkey
44
Liao Z, Wang J, Shi Z, Lu L, Tabata H. Revolutionary Potential of ChatGPT in Constructing Intelligent Clinical Decision Support Systems. Ann Biomed Eng 2024; 52:125-129. PMID: 37332008; DOI: 10.1007/s10439-023-03288-w.
Abstract
Recently, Chat Generative Pre-trained Transformer (ChatGPT) has been recognized as a promising clinical decision support system (CDSS) in the medical field owing to its advanced text analysis capabilities and interactive design. However, ChatGPT primarily focuses on learning text semantics rather than on learning complex data structures and conducting real-time data analysis, tasks that typically necessitate intelligent CDSS built on specialized machine learning algorithms. Although ChatGPT cannot directly execute specific algorithms, it can aid in algorithm design for intelligent CDSS at the textual level. In this study, besides discussing the types of CDSS and their relationship with ChatGPT, we mainly investigate the benefits and drawbacks of employing ChatGPT as an auxiliary design tool for intelligent CDSS. Our findings indicate that, in collaboration with human expertise, ChatGPT has the potential to revolutionize the development of robust and effective intelligent CDSS.
Affiliation(s)
- Zhiqiang Liao: Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Jian Wang: Department of Orthopaedics, Qilu Hospital of Shandong University, Jinan 250012, People's Republic of China
- Zhuozheng Shi: Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Lintao Lu: Department of Orthopaedics, Qilu Hospital of Shandong University, Jinan 250012, People's Republic of China; Department of Orthopaedics, Qilu Hospital of Shandong University Dezhou Hospital, Dezhou 253000, People's Republic of China
- Hitoshi Tabata: Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan; Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
45
Rahad K, Martin K, Amugo I, Ferguson S, Curtis A, Davis A, Gangula P, Wang Q. ChatGPT to Enhance Learning in Dental Education at a Historically Black Medical College. Dent Res Oral Health 2024; 7:8-14. PMID: 38404561; PMCID: PMC10887427; DOI: 10.26502/droh.0069.
Abstract
The recent rise of powerful large language model (LLM)-based AI tools, exemplified by ChatGPT and Bard, poses a great challenge to contemporary dental education. At the same time, it offers a unique resource that can complement today's teaching and learning, where existing widely available learning resources have often fallen short. As LLM tools will profoundly shape both the clinical and educational aspects of dentistry, the didactic curricula, which rely primarily on lecture-based courses in which instructors impart knowledge through presentations and discussions, urgently need to be updated. In this paper, we used the dental course materials, syllabi, and textbooks currently adopted in the School of Dentistry (SOD) at Meharry Medical College to assess the potential utility and effectiveness of ChatGPT in dental education. For assessment, we collected the chatbot's responses to questions as well as students' interactions with it. Our results showed that ChatGPT can assist in dental essay writing and generate relevant content for dental students, among other benefits. The limitations of ChatGPT are also discussed in the paper.
Affiliation(s)
- Khandoker Rahad: Department of Computer Science & Data Science, School of Applied Computational Sciences, Meharry Medical College, Nashville, TN, USA
- Kianna Martin: Department of ODS & Research, School of Dentistry, Meharry Medical College, Nashville, TN, USA
- Ihunna Amugo: Department of ODS & Research, School of Dentistry, Meharry Medical College, Nashville, TN, USA
- Shania Ferguson: Department of ODS & Research, School of Dentistry, Meharry Medical College, Nashville, TN, USA
- Angela Curtis: Department of ODS & Research, School of Dentistry, Meharry Medical College, Nashville, TN, USA
- Anniya Davis: Department of ODS & Research, School of Dentistry, Meharry Medical College, Nashville, TN, USA
- Pandu Gangula: Department of ODS & Research, School of Dentistry, Meharry Medical College, Nashville, TN, USA
- Qingguo Wang: Department of Computer Science & Data Science, School of Applied Computational Sciences, Meharry Medical College, Nashville, TN, USA
46
Gravina AG, Pellegrino R, Cipullo M, Palladino G, Imperio G, Ventura A, Auletta S, Ciamarra P, Federico A. May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients' questions? An evidence-controlled analysis. World J Gastroenterol 2024; 30:17-33. PMID: 38293321; PMCID: PMC10823903; DOI: 10.3748/wjg.v30.i1.17.
Abstract
Artificial intelligence is increasingly entering everyday healthcare. Large language model (LLM) systems such as Chat Generative Pre-trained Transformer (ChatGPT) have become potentially accessible to everyone, including patients with inflammatory bowel diseases (IBD). However, significant ethical issues and pitfalls exist in innovative LLM tools, and the hype generated by such systems may lead to unwarranted patient trust in them. It is therefore necessary to understand whether LLMs (trendy ones, such as ChatGPT) can produce plausible medical information (MI) for patients. This review examined ChatGPT's potential to provide MI regarding questions commonly addressed by patients with IBD to their gastroenterologists. Review of the outputs showed that the tool has some attractive potential but also significant limitations: its information can be outdated or insufficiently detailed, and in some cases it is inaccurate. Further studies and refinement of ChatGPT, possibly aligning its outputs with the leading medical evidence provided by reliable databases, are needed.
Affiliation(s)
- Antonietta Gerarda Gravina, Raffaele Pellegrino, Marina Cipullo, Giovanna Palladino, Giuseppe Imperio, Andrea Ventura, Salvatore Auletta, Paola Ciamarra, and Alessandro Federico (all authors): Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
47
Woo B, Huynh T, Tang A, Bui N, Nguyen G, Tam W. Transforming nursing with large language models: from concept to practice. Eur J Cardiovasc Nurs 2024:zvad120. PMID: 38178303; DOI: 10.1093/eurjcn/zvad120.
Abstract
Large language models (LLMs) such as ChatGPT have emerged as potential game-changers in nursing, aiding in patient education, diagnostic assistance, treatment recommendations, and administrative task efficiency. While these advancements signal promising strides in healthcare, integrating LLMs is not without challenges, particularly artificial intelligence hallucination and data privacy concerns. Methodologies such as prompt engineering, temperature adjustments, model fine-tuning, and local deployment are proposed to refine the accuracy of LLMs and ensure data security. While LLMs offer transformative potential, it is imperative to acknowledge that they cannot substitute for the intricate expertise of human professionals in the clinical field, advocating for a synergistic approach to patient care.
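To make the accuracy-refinement levers named in this abstract concrete, the following is a minimal sketch of local deployment with temperature adjustment, assuming the Hugging Face transformers library is installed; the gpt2 checkpoint and the prompt are stand-ins chosen purely for illustration, not anything used or endorsed by the article.

```python
# A minimal sketch of local deployment with temperature control, assuming the
# Hugging Face `transformers` library; `gpt2` is a stand-in checkpoint chosen
# for illustration, not a clinical model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Patient education point: after cardiac surgery, wound care involves"

# Lower temperature -> more deterministic, conservative completions;
# higher temperature -> more varied (and riskier) completions.
for temperature in (0.2, 1.0):
    out = generator(
        prompt,
        do_sample=True,
        temperature=temperature,
        max_new_tokens=40,
        num_return_sequences=1,
    )
    print(f"T={temperature}: {out[0]['generated_text']}")
```

Lower temperatures concentrate probability mass on the most likely tokens, one pragmatic way to curb hallucinated variation in patient-facing text, although it does not by itself guarantee factual accuracy.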
Affiliation(s)
- Brigitte Woo: Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Tom Huynh: School of Science, Engineering and Technology, RMIT University, 702 Nguyen Van Linh Blvd., District 7, Ho Chi Minh City 756000, Vietnam
- Arthur Tang: School of Science, Engineering and Technology, RMIT University, 702 Nguyen Van Linh Blvd., District 7, Ho Chi Minh City 756000, Vietnam
- Nhat Bui: School of Science, Engineering and Technology, RMIT University, 702 Nguyen Van Linh Blvd., District 7, Ho Chi Minh City 756000, Vietnam
- Giang Nguyen: School of Science, Engineering and Technology, RMIT University, 702 Nguyen Van Linh Blvd., District 7, Ho Chi Minh City 756000, Vietnam
- Wilson Tam: Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
48
Scquizzato T, Semeraro F, Swindell P, Simpson R, Angelini M, Gazzato A, Sajjad U, Bignami EG, Landoni G, Keeble TR, Mion M. Testing ChatGPT ability to answer laypeople questions about cardiac arrest and cardiopulmonary resuscitation. Resuscitation 2024; 194:110077. PMID: 38081504; DOI: 10.1016/j.resuscitation.2023.110077.
Abstract
INTRODUCTION Cardiac arrest leaves witnesses, survivors, and their relatives with a multitude of questions. When a young person or a public figure is affected, interest in cardiac arrest and cardiopulmonary resuscitation (CPR) increases. ChatGPT allows everyone to obtain human-like responses on any topic. Given the risk of accessing incorrect information, we assessed ChatGPT's accuracy in answering laypeople's questions about cardiac arrest and CPR. METHODS We co-produced a list of 40 questions with members of Sudden Cardiac Arrest UK covering all aspects of cardiac arrest and CPR. The answers provided by ChatGPT to each question were evaluated by professionals for accuracy, by professionals and laypeople for relevance, clarity, comprehensiveness, and overall value on a scale from 1 (poor) to 5 (excellent), and for readability. RESULTS ChatGPT's answers received an overall positive evaluation (4.3 ± 0.7) from 14 professionals and 16 laypeople. Clarity (4.4 ± 0.6), relevance (4.3 ± 0.6), accuracy (4.0 ± 0.6), and comprehensiveness (4.2 ± 0.7) were also rated highly. Professionals, however, rated overall value (4.0 ± 0.5 vs 4.6 ± 0.7; p = 0.02) and comprehensiveness (3.9 ± 0.6 vs 4.5 ± 0.7; p = 0.02) lower than laypeople did. CPR-related answers consistently received lower scores across all parameters from both professionals and laypeople. Readability was 'difficult' (median Flesch reading ease score of 34 [IQR 26-42]). CONCLUSIONS ChatGPT provided largely accurate, relevant, and comprehensive answers to questions about cardiac arrest commonly asked by survivors, their relatives, and lay rescuers, except for CPR-related answers, which received the lowest scores. Large language models will play a significant role in future healthcare, and the healthcare-related content they generate should be monitored.
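The readability finding above is easy to reproduce: the Flesch reading ease score is 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word), and scores in the low 30s are conventionally labelled 'difficult'. Below is a minimal, self-contained sketch; the syllable counter is a rough vowel-group heuristic introduced only for this illustration, so its scores will deviate slightly from dedicated readability tools.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (illustrative only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease:
    206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

# Polysyllabic medical vocabulary drives scores down toward the 'difficult' band.
sample = ("Cardiopulmonary resuscitation restores spontaneous circulation. "
          "Defibrillation terminates ventricular fibrillation.")
print(round(flesch_reading_ease(sample), 1))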
Affiliation(s)
- Tommaso Scquizzato: Department of Anesthesia and Intensive Care, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Federico Semeraro: Department of Anaesthesia, Intensive Care and Emergency Medical Services, Ospedale Maggiore, Bologna, Italy
- Rupert Simpson: Essex Cardiothoracic Centre, Mid and South Essex NHS Foundation Trust, Basildon, United Kingdom; Medical Technology Research Centre, Anglia Ruskin School of Medicine, Chelmsford, United Kingdom
- Matteo Angelini: Department of Anesthesia and Intensive Care, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Arianna Gazzato: Department of Anesthesia and Intensive Care, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Uzma Sajjad: Essex Cardiothoracic Centre, Mid and South Essex NHS Foundation Trust, Basildon, United Kingdom; Medical Technology Research Centre, Anglia Ruskin School of Medicine, Chelmsford, United Kingdom
- Elena G Bignami: Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy
- Giovanni Landoni: Department of Anesthesia and Intensive Care, IRCCS San Raffaele Scientific Institute, Milan, Italy; School of Medicine, Vita-Salute San Raffaele University, Milan, Italy
- Thomas R Keeble: Essex Cardiothoracic Centre, Mid and South Essex NHS Foundation Trust, Basildon, United Kingdom; Medical Technology Research Centre, Anglia Ruskin School of Medicine, Chelmsford, United Kingdom
- Marco Mion: Essex Cardiothoracic Centre, Mid and South Essex NHS Foundation Trust, Basildon, United Kingdom; Medical Technology Research Centre, Anglia Ruskin School of Medicine, Chelmsford, United Kingdom
49
Wei WI, Leung CLK, Tang A, McNeil EB, Wong SYS, Kwok KO. Extracting symptoms from free-text responses using ChatGPT among COVID-19 cases in Hong Kong. Clin Microbiol Infect 2024; 30:142.e1-142.e3. PMID: 37949111; DOI: 10.1016/j.cmi.2023.11.002.
Abstract
OBJECTIVES To investigate the feasibility and performance of Chat Generative Pretrained Transformer (ChatGPT) in converting symptom narratives into structured symptom labels. METHODS We extracted symptoms from 300 deidentified symptom narratives of COVID-19 patients using a computer-based matching algorithm (the standard) and using prompt engineering in ChatGPT. Common symptoms were those with a prevalence >10% according to the standard; less common symptoms were those with a prevalence of 2-10%. The precision of ChatGPT was compared with the standard using sensitivity and specificity with 95% exact binomial CIs (95% binCIs). ChatGPT was prompted both without examples (zero-shot prompting) and with examples (few-shot prompting). RESULTS In zero-shot prompting, GPT-4 achieved high specificity (0.947 [95% binCI: 0.894-0.978] to 1.000 [95% binCI: 0.965-0.988, 1.000]) for all symptoms, high sensitivity for common symptoms (0.853 [95% binCI: 0.689-0.950] to 1.000 [95% binCI: 0.951-1.000]), and moderate sensitivity for less common symptoms (0.200 [95% binCI: 0.043-0.481] to 1.000 [95% binCI: 0.590-0.815, 1.000]). Few-shot prompting increased both sensitivity and specificity. GPT-4 outperformed GPT-3.5 in response accuracy and labelling consistency. DISCUSSION This work substantiates ChatGPT's role as a research tool in medical fields. Its performance in converting symptom narratives to structured symptom labels was encouraging, saving time and effort in compiling task-specific training data. It could accelerate free-text data compilation and synthesis in future disease outbreaks and improve the accuracy of symptom checkers. Focused prompt engineering that addresses ambiguous descriptions would benefit medical research further.
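For readers who want to reproduce this kind of pipeline, the sketch below illustrates the two ideas the abstract names: zero-shot versus few-shot prompt construction for symptom labelling, and exact (Clopper-Pearson) binomial CIs for sensitivity and specificity. It is a hedged illustration only; the prompt wording, symptom list, and counts are invented for the example, and the statsmodels call is one standard way to obtain exact binomial CIs, not necessarily the authors' implementation.

```python
# Illustrative only: prompt templates, symptom list, and counts are hypothetical.
from statsmodels.stats.proportion import proportion_confint

SYMPTOMS = ["fever", "cough", "sore throat"]  # hypothetical label set

def zero_shot_prompt(narrative: str) -> str:
    # No examples: the model must infer the labelling task from instructions alone.
    return (f"Label which of {SYMPTOMS} are present in this narrative. "
            f"Answer with a comma-separated list.\nNarrative: {narrative}")

def few_shot_prompt(narrative: str) -> str:
    # A worked example is prepended to steer the output format and recall.
    example = ("Narrative: 'hot all night, throat hurts'\n"
               "Labels: fever, sore throat\n")
    return example + zero_shot_prompt(narrative)

def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int, alpha: float = 0.05):
    # Clopper-Pearson exact intervals are method='beta' in statsmodels.
    sens_ci = proportion_confint(tp, tp + fn, alpha=alpha, method="beta")
    spec_ci = proportion_confint(tn, tn + fp, alpha=alpha, method="beta")
    return (tp / (tp + fn), sens_ci), (tn / (tn + fp), spec_ci)

print(zero_shot_prompt("coughing for two days, no fever"))
print(sensitivity_specificity(tp=29, fn=5, tn=260, fp=6))  # made-up counts
```

The few-shot variant simply prepends worked examples to the zero-shot instruction, which is consistent with the abstract's report that examples improved both sensitivity and specificity.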
Affiliation(s)
- Wan In Wei: JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Cyrus Lap Kwan Leung: JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Arthur Tang: Department of Information Technology, School of Science, Engineering and Technology, RMIT University, Vietnam
- Edward Braddon McNeil: JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Samuel Yeung Shan Wong: JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Kin On Kwok: JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China; Stanley Ho Centre for Emerging Infectious Diseases, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China; Hong Kong Institute of Asia-Pacific Studies, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China; Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
50
Knoedler S, Sofo G, Kern B, Frank K, Cotofana S, von Isenburg S, Könneker S, Mazzarone F, Dorafshar AH, Knoedler L, Alfertshofer M. Modern Machiavelli? The illusion of ChatGPT-generated patient reviews in plastic and aesthetic surgery based on 9000 review classifications. J Plast Reconstr Aesthet Surg 2024; 88:99-108. PMID: 37972444; DOI: 10.1016/j.bjps.2023.10.119.
Abstract
BACKGROUND Online patient reviews are crucial in guiding individuals who seek plastic surgery, but artificial chatbots pose a threat of disseminating fake reviews. This study aimed to compare real patient feedback with ChatGPT-generated reviews for the top five US plastic surgery procedures. METHODS Thirty real patient reviews on rhinoplasty, blepharoplasty, facelift, liposuction, and breast augmentation were collected from RealSelf and used as templates for ChatGPT to generate matching patient reviews. Prolific users (n = 30) assessed 150 pairs of reviews to identify human-written and artificial intelligence (AI)-generated reviews. The reviews were further assessed using AI content detector software (Copyleaks AI). RESULTS Among the 9000 classification tasks, 64.3% and 35.7% of reviews were classified as authentic and fake, respectively. On average, the author (human versus machine) was correctly identified in 59.6% of cases, and this poor classification performance was consistent across all procedures. Participants with prior aesthetic treatment showed poorer classification performance than those without (p < 0.05). The mean character count of human-written reviews was significantly higher than that of AI-generated reviews (p < 0.001), with a significant correlation between character count and participants' accuracy rate (p < 0.001). The emotional timbre of the reviews also differed significantly, with "happiness" more prevalent in human-written reviews (p < 0.001) and "disappointment" more prevalent in AI-generated reviews (p = 0.005). Copyleaks AI correctly classified 96.7% of human-written and 69.3% of ChatGPT-generated reviews. CONCLUSION ChatGPT convincingly replicates authentic patient reviews, even deceiving commercial AI detection software. Analyzing emotional tone and review length can help differentiate real from fake reviews, underscoring the need to educate both patients and physicians to prevent misinformation and mistrust.
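The headline figure that raters identified the author correctly in only 59.6% of cases invites an obvious follow-up: is that reliably above the 50% chance level? A quick back-of-the-envelope check, under the simplifying assumption that the 9000 classifications were independent (which the study's repeated-measures design does not strictly satisfy), is a one-sample exact binomial test:

```python
# Back-of-the-envelope check, assuming (simplistically) 9000 independent
# classifications; the study's clustered design would call for a more
# careful analysis in practice.
from scipy.stats import binomtest

n_total = 9000
n_correct = round(0.596 * n_total)  # 5364 correct author identifications

result = binomtest(n_correct, n_total, p=0.5)  # two-sided test against chance
print(f"Observed accuracy: {n_correct / n_total:.3f}")
print(f"Exact binomial p-value vs chance (50%): {result.pvalue:.2e}")
print("95% exact CI:", result.proportion_ci(confidence_level=0.95, method="exact"))
```

Under this idealization the accuracy is statistically distinguishable from chance, yet practically it remains poor, which is precisely the paper's point: barely-better-than-coin-flip detection is not a usable safeguard.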
Affiliation(s)
- Samuel Knoedler: Department of Plastic Surgery and Hand Surgery, Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany; Division of Plastic Surgery, Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Instituto Ivo Pitanguy, Hospital Santa Casa de Misericórdia, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil
- Giuseppe Sofo: Instituto Ivo Pitanguy, Hospital Santa Casa de Misericórdia, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil
- Barbara Kern: Department of Plastic Surgery, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, Berlin, Germany
- Sebastian Cotofana: Centre for Cutaneous Research, Blizard Institute, Queen Mary University of London, London, UK; Department of Dermatology, Erasmus Hospital, Rotterdam, the Netherlands
- Sarah von Isenburg: Private Practice, Plastische Chirurgie München Dres. Neuhann-Lorenz & v. Isenburg, Munich, Germany
- Sören Könneker: Department of Plastic Surgery and Hand Surgery, University Hospital Zürich, Zurich, Switzerland
- Francesco Mazzarone: Instituto Ivo Pitanguy, Hospital Santa Casa de Misericórdia, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil
- Amir H Dorafshar: Division of Plastic and Reconstructive Surgery, Rush University Medical Center, Chicago, IL, USA
- Leonard Knoedler: Division of Plastic and Reconstructive Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Michael Alfertshofer: Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians-University Munich, Munich, Germany