1
Zhang Z, Ni H. Critical care studies using large language models based on electronic healthcare records: A technical note. J Intensive Med 2025;5:137-150. [PMID: 40241837] [PMCID: PMC11997556] [DOI: 10.1016/j.jointm.2024.09.002] [Received: 06/16/2024] [Revised: 08/13/2024] [Accepted: 09/23/2024]
Abstract
The integration of large language models (LLMs) into clinical medicine, particularly critical care, has introduced transformative capabilities for analyzing and managing complex medical information. This technical note explores the application of LLMs, such as generative pretrained transformer 4 (GPT-4) and Qwen-Chat, in interpreting electronic healthcare records to assist with rapid patient condition assessment, predict sepsis, and automate the generation of discharge summaries. The note emphasizes the significance of LLMs in processing unstructured data from electronic health records (EHRs), extracting meaningful insights, and supporting personalized medicine through a nuanced understanding of patient histories. Despite the technical complexity of deploying LLMs in clinical settings, this document provides a comprehensive guide to facilitate their effective integration into clinical workflows, focusing on the use of DashScope's application programming interface (API) services to judge patient prognosis and generate organ-support recommendations from the natural-language text of EHRs. By illustrating practical steps and best practices, this work aims to lower the technical barriers for clinicians and researchers, enabling broader adoption of LLMs in clinical research and practice to enhance patient care and outcomes.
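The workflow this note describes — sending the free text of an EHR note to a hosted Qwen model through DashScope's API and asking for a prognosis judgment — can be sketched roughly as follows. The model name, system prompt, and output format below are illustrative assumptions, not the authors' actual code or prompts; the synthetic note is not from any study data.

```python
import json

def build_prognosis_request(ehr_note: str, model: str = "qwen-max") -> dict:
    """Assemble a chat-style request body for a DashScope-hosted Qwen model.

    The system prompt and the requested JSON output shape are illustrative;
    the technical note's exact prompts are not reproduced here.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": (
                    "You are a critical care assistant. Given an EHR note, "
                    "judge the patient's prognosis and recommend organ "
                    "support. Answer as JSON with keys 'prognosis' and "
                    "'organ_support'."
                ),
            },
            {"role": "user", "content": ehr_note},
        ],
    }

# Synthetic example note:
note = "72M, septic shock, lactate 5.1 mmol/L, MAP 58 mmHg on norepinephrine."
payload = build_prognosis_request(note)

# With the dashscope SDK installed and DASHSCOPE_API_KEY set, the call would
# look roughly like:
#   import dashscope
#   rsp = dashscope.Generation.call(model=payload["model"],
#                                   messages=payload["messages"],
#                                   result_format="message")
print(json.dumps(payload, indent=2))
```

Keeping the request construction separate from the network call, as above, makes the prompt template easy to audit and test before any patient text is sent to an external service.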
Affiliation(s)
- Zhongheng Zhang
- Department of Emergency Medicine, Provincial Key Laboratory of Precise Diagnosis and Treatment of Abdominal Infection, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- School of Medicine, Shaoxing University, Shaoxing, Zhejiang, China
- Hongying Ni
- Department of Critical Care Medicine, Zhejiang University School of Medicine, Affiliated Jinhua Hospital, Jinhua, China
2
Mohammad-Rahimi H, Setzer FC, Aminoshariae A, Dummer PMH, Duncan HF, Nosrat A. Artificial intelligence chatbots in endodontic education-Concepts and potential applications. Int Endod J 2025. [PMID: 40164964] [DOI: 10.1111/iej.14231] [Received: 12/27/2024] [Revised: 01/29/2025] [Accepted: 03/20/2025]
Abstract
The integration of artificial intelligence (AI) into education is transforming learning across various domains, including dentistry. Endodontic education can significantly benefit from AI chatbots; however, knowledge gaps regarding their potential and limitations hinder their effective utilization. This narrative review aims to: (A) explain the core functionalities of AI chatbots, including their reliance on natural language processing (NLP), machine learning (ML), and deep learning (DL); (B) explore their applications in endodontic education for personalized learning, interactive training, and clinical decision support; (C) discuss the challenges posed by technical limitations, ethical considerations, and the potential for misinformation. The review highlights that AI chatbots provide learners with immediate access to knowledge, personalized educational experiences, and tools for developing clinical reasoning through case-based learning. Educators benefit from streamlined curriculum development, automated assessment creation, and evidence-based resource integration. Despite these advantages, concerns such as chatbot hallucinations, algorithmic biases, potential for plagiarism, and the spread of misinformation require careful consideration. Analysis of current research reveals limited endodontic-specific studies, emphasizing the need for tailored chatbot solutions validated for accuracy and relevance. Successful integration will require collaborative efforts among educators, developers, and professional organizations to address challenges, ensure ethical use, and establish evaluation frameworks.
Affiliation(s)
- Hossein Mohammad-Rahimi
- Department of Dentistry and Oral Health, Aarhus University, Aarhus, Denmark
- Conservative Dentistry and Periodontology, LMU Klinikum, LMU, Munich, Germany
- Frank C Setzer
- Department of Endodontics, School of Dental Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Anita Aminoshariae
- Department of Endodontics, School of Dental Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Henry F Duncan
- Division of Restorative Dentistry, Dublin Dental University Hospital, Trinity College Dublin, Dublin, Ireland
- Ali Nosrat
- Department of Advanced Oral Sciences and Therapeutics, School of Dentistry, University of Maryland Baltimore, Baltimore, Maryland, USA
- Private Practice, Centreville Endodontics, Centreville, Virginia, USA
3
Traipidok P, Srisombundit P, Tassanakijpanich N, Charleowsak P, Thongseiratch T. Evaluating ChatGPT-4omni in paediatric developmental screening: direct versus sequential prompts. BMJ Paediatr Open 2025;9:e002809. [PMID: 40032588] [DOI: 10.1136/bmjpo-2024-002809] [Received: 06/08/2024] [Accepted: 02/17/2025]
Abstract
Integrating Large Language Models like ChatGPT-4omni (ChatGPT-4o) into paediatric healthcare could revolutionise developmental screening. This study evaluated ChatGPT-4o's efficacy in paediatric developmental screening using Direct and Sequential Prompting methods compared with the Bayley Scales of Infant Development, Third Edition. Among 106 paediatric cases, Direct Prompting showed a sensitivity of 73.42% and overall accuracy of 69.81%, while Sequential Prompting had a specificity of 62.96% and overall accuracy of 67.92%. Both methods demonstrate potential for improving the efficiency and accessibility of paediatric developmental screening, with Direct Prompts being more sensitive and Sequential Prompts more specific.
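The two strategies this abstract compares can be illustrated as message sequences: direct prompting sends the full case and the screening question in a single turn, while sequential prompting elicits the assessment over several ordered turns, each building on the model's previous answer. The prompt wording below is a hypothetical reconstruction for illustration only, not the study's actual prompts or case material.

```python
# Hypothetical reconstruction of the two prompting styles; the study's
# exact prompt text and cases are not given in the abstract.

case = "18-month-old: no two-word phrases, walks independently, points to objects."

# Direct prompting: one message carrying the case and the screening question.
direct_prompt = [
    {"role": "user",
     "content": f"Case: {case}\nIs this child's development typical or delayed?"}
]

# Sequential prompting: the same question broken into ordered steps,
# each sent after the model's answer to the previous one.
sequential_prompts = [
    f"Case: {case}\nList the developmental milestones expected at this age.",
    "Which of the expected milestones does this child meet or miss?",
    "Given the missed milestones, is development typical or delayed?",
]

print(len(direct_prompt), len(sequential_prompts))
```

The trade-off the study reports (direct prompts more sensitive, sequential prompts more specific) is consistent with the structure above: stepwise questioning forces the model to justify a delay against explicit milestones before labelling it.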
Affiliation(s)
- Pathrada Traipidok
- Prince of Songkla University Faculty of Medicine, Hat Yai, Songkhla, Thailand
- Pattra Charleowsak
- Prince of Songkla University Faculty of Medicine, Hat Yai, Songkhla, Thailand
4
Annor E, Atarere J, Ubah N, Jolaoye O, Kunkle B, Egbo O, Martin DK. Assessing online chat-based artificial intelligence models for weight loss recommendation appropriateness and bias in the presence of guideline incongruence. Int J Obes (Lond) 2025. [PMID: 39871015] [DOI: 10.1038/s41366-025-01717-5] [Received: 08/12/2024] [Revised: 12/17/2024] [Accepted: 01/14/2025]
Abstract
BACKGROUND AND AIM: Managing obesity requires a comprehensive approach that involves therapeutic lifestyle changes, medications, or metabolic surgery. Many patients seek health information from online sources and artificial intelligence models like ChatGPT, Google Gemini, and Microsoft Copilot before consulting health professionals. This study aimed to evaluate the appropriateness of the responses of Google Gemini and Microsoft Copilot to questions on the pharmacologic and surgical management of obesity, and to assess their responses for bias towards either the American Diabetes Association (ADA) or American Association of Clinical Endocrinology (AACE) guidelines.
METHODS: Ten questions were compiled into a set and posed separately to the free editions of Google Gemini and Microsoft Copilot. Recommendations for the questions were extracted from the ADA and AACE websites, and the responses were graded by reviewers for appropriateness, completeness, and bias towards either guideline.
RESULTS: All responses from Microsoft Copilot and 8/10 (80%) responses from Google Gemini were appropriate; there were no inappropriate responses. Google Gemini declined to answer two questions and insisted on consulting a physician. Microsoft Copilot (10/10; 100%) provided a higher proportion of complete responses than Google Gemini (5/10; 50%). Of the eight responses from Google Gemini, none were biased towards either guideline, while two of the responses from Microsoft Copilot were biased.
CONCLUSION: The study highlights the role of Microsoft Copilot and Google Gemini in weight loss management. The differences in their responses may be attributed to variation in the quality and scope of their training data and design.
Affiliation(s)
- Eugene Annor
- Department of Internal Medicine, University of Illinois College of Medicine, Peoria, IL, USA
- Joseph Atarere
- Department of Medicine, MedStar Health, Baltimore, MD, USA
- Nneoma Ubah
- Department of Internal Medicine, Montefiore St. Luke's Cornwall Hospital, Newburgh, NY, USA
- Oladoyin Jolaoye
- Department of Internal Medicine, University of Illinois College of Medicine, Peoria, IL, USA
- Bryce Kunkle
- Department of Medicine, Georgetown University Hospital, Washington, DC, USA
- Olachi Egbo
- Department of Medicine, Aurora Medical Center, Oshkosh, WI, USA
- Daniel K Martin
- Department of Gastroenterology and Hepatology, University of Illinois College of Medicine, Peoria, IL, USA
5
Kaiser KN, Hughes AJ, Yang AD, Turk AA, Mohanty S, Gonzalez AA, Patzer RE, Bilimoria KY, Ellis RJ. Accuracy and consistency of publicly available Large Language Models as clinical decision support tools for the management of colon cancer. J Surg Oncol 2024;130:1104-1110. [PMID: 39155667] [DOI: 10.1002/jso.27821] [Received: 07/11/2024] [Accepted: 07/26/2024]
Abstract
BACKGROUND: Large Language Models (LLMs; e.g., ChatGPT) may be used to assist clinicians and could form the basis of future clinical decision support (CDS) for colon cancer. The objectives of this study were to (1) evaluate the response accuracy of two LLM-powered interfaces in identifying guideline-based care in simulated clinical scenarios and (2) define response variation between and within LLMs.
METHODS: Clinical scenarios with "next steps in management" queries were developed based on National Comprehensive Cancer Network guidelines. Prompts were entered into OpenAI ChatGPT and Microsoft Copilot in independent sessions, yielding four responses per scenario. Responses were compared with clinician-developed responses and assessed for accuracy, consistency, and verbosity.
RESULTS: Across 108 responses to 27 prompts, both platforms yielded completely correct responses to 36% of scenarios (n = 39). For ChatGPT, 39% (n = 21) were missing information and 24% (n = 14) contained inaccurate/misleading information. Copilot performed similarly, with 37% (n = 20) missing information and 28% (n = 15) containing inaccurate/misleading information (p = 0.96). Clinician responses were significantly shorter (34 ± 15.5 words) than both ChatGPT (251 ± 86 words) and Copilot (271 ± 67 words; both p < 0.01).
CONCLUSIONS: Publicly available LLM applications often provide verbose responses with vague or inaccurate information regarding colon cancer management. Significant optimization is required before use in formal CDS.
Affiliation(s)
- Kristen N Kaiser
- Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA
- Alexa J Hughes
- Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA
- Anthony D Yang
- Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA
- Department of Surgery, Division of Surgical Oncology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Anita A Turk
- Department of Medicine, Division of Hematology & Oncology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Sanjay Mohanty
- Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA
- Department of Surgery, Division of Surgical Oncology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Andrew A Gonzalez
- Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA
- Rachel E Patzer
- Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA
- Regenstrief Institute, Indianapolis, Indiana, USA
- Karl Y Bilimoria
- Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA
- Department of Surgery, Division of Surgical Oncology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Ryan J Ellis
- Department of Surgery, Indiana University School of Medicine, Surgical Outcomes and Quality Improvement Center (SOQIC), Indianapolis, Indiana, USA
- Department of Surgery, Division of Surgical Oncology, Indiana University School of Medicine, Indianapolis, Indiana, USA