1
Warrier A, Singh R, Haleem A, Zaki H, Eloy JA. The Comparative Diagnostic Capability of Large Language Models in Otolaryngology. Laryngoscope 2024; 134:3997-4002. [PMID: 38563415 DOI: 10.1002/lary.31434]
Abstract
OBJECTIVES To evaluate and compare the ability of large language models (LLMs) to diagnose various ailments in otolaryngology.
METHODS We collected all 100 clinical vignettes from the second edition of Otolaryngology Cases-The University of Cincinnati Clinical Portfolio by Pensak et al. With the addition of the prompt "Provide a diagnosis given the following history," we prompted ChatGPT-3.5, Google Bard, and Bing-GPT4 to provide a diagnosis for each vignette. These diagnoses were compared to the portfolio for accuracy and recorded. All queries were run in June 2023.
RESULTS ChatGPT-3.5 was the most accurate model (89% success rate), followed by Google Bard (82%) and Bing-GPT4 (74%). A chi-squared test revealed a significant difference among the three LLMs in providing correct diagnoses (p = 0.023). Of the 100 vignettes, seven required additional testing results (e.g., biopsy, non-contrast CT) for accurate clinical diagnosis. When these vignettes were omitted, the revised success rates were 95.7% for ChatGPT-3.5, 88.17% for Google Bard, and 78.72% for Bing-GPT4 (p = 0.002).
CONCLUSIONS ChatGPT-3.5 offered the most accurate diagnoses when given established clinical vignettes, as compared to Google Bard and Bing-GPT4. LLMs may accurately offer assessments for common otolaryngology conditions but currently require detailed prompt information and critical supervision from clinicians. There is vast potential in the clinical applicability of LLMs; however, practitioners should be wary of possible "hallucinations" and misinformation in responses.
LEVEL OF EVIDENCE 3. Laryngoscope, 134:3997-4002, 2024.
Affiliation(s)
- Akshay Warrier
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Rohan Singh
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Afash Haleem
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Haider Zaki
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Jean Anderson Eloy
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Center for Skull Base and Pituitary Surgery, Neurological Institute of New Jersey, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
2
Moulaei K, Yadegari A, Baharestani M, Farzanbakhsh S, Sabet B, Afrash MR. Generative artificial intelligence in healthcare: A scoping review on benefits, challenges and applications. Int J Med Inform 2024; 188:105474. [PMID: 38733640 DOI: 10.1016/j.ijmedinf.2024.105474]
Abstract
BACKGROUND Generative artificial intelligence (GAI) is revolutionizing healthcare with solutions for complex challenges, enhancing diagnosis, treatment, and care through new data and insights. However, its integration raises questions about applications, benefits, and challenges. Our study explores these aspects, offering an overview of GAI's applications and future prospects in healthcare.
METHODS This scoping review searched Web of Science, PubMed, and Scopus. The selection of studies involved screening titles, reviewing abstracts, and examining full texts, adhering to the PRISMA-ScR guidelines throughout the process.
RESULTS From 1406 articles across three databases, 109 met inclusion criteria after screening and deduplication. Nine GAI models were utilized in healthcare, with ChatGPT (n = 102, 74%), Google Bard (Gemini) (n = 16, 11%), and Microsoft Bing AI (n = 10, 7%) being the most frequently employed. A total of 24 different applications of GAI in healthcare were identified, the most common being "offering insights and information on health conditions through answering questions" (n = 41) and "diagnosis and prediction of diseases" (n = 17). In total, 606 benefits and challenges were identified, which were condensed to 48 benefits and 61 challenges after consolidation. The predominant benefits included "providing rapid access to information and valuable insights" and "improving prediction and diagnosis accuracy", while the primary challenges comprised "generating inaccurate or fictional content", "unknown source of information and fake references for texts", and "lower accuracy in answering questions".
CONCLUSION This scoping review identified the applications, benefits, and challenges of GAI in healthcare. This synthesis offers a crucial overview of GAI's potential to revolutionize healthcare, emphasizing the imperative to address its limitations.
Affiliation(s)
- Khadijeh Moulaei
- Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
- Atiye Yadegari
- Department of Pediatric Dentistry, School of Dentistry, Hamadan University of Medical Sciences, Hamadan, Iran
- Mahdi Baharestani
- Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
- Shayan Farzanbakhsh
- Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
- Babak Sabet
- Department of Surgery, Faculty of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Reza Afrash
- Department of Artificial Intelligence, Smart University of Medical Sciences, Tehran, Iran
3
Small WR, Wiesenfeld B, Brandfield-Harvey B, Jonassen Z, Mandal S, Stevens ER, Major VJ, Lostraglio E, Szerencsy A, Jones S, Aphinyanaphongs Y, Johnson SB, Nov O, Mann D. Large Language Model-Based Responses to Patients' In-Basket Messages. JAMA Netw Open 2024; 7:e2422399. [PMID: 39012633 PMCID: PMC11252893 DOI: 10.1001/jamanetworkopen.2024.22399]
Abstract
Importance Virtual patient-physician communications have increased since 2020 and negatively impacted primary care physician (PCP) well-being. Generative artificial intelligence (GenAI) drafts of patient messages could potentially reduce health care professional (HCP) workload and improve communication quality, but only if the drafts are considered useful.
Objectives To assess PCPs' perceptions of GenAI drafts and to examine linguistic characteristics associated with equity and perceived empathy.
Design, Setting, and Participants This cross-sectional quality improvement study tested the hypothesis that PCPs' ratings of GenAI drafts (created using the electronic health record [EHR] standard prompts) would be equivalent to HCP-generated responses on 3 dimensions. The study was conducted at NYU Langone Health using private patient-HCP communications at 3 internal medicine practices piloting GenAI.
Exposures Randomly assigned patient messages coupled with either an HCP message or the draft GenAI response.
Main Outcomes and Measures PCPs rated responses' information content quality (eg, relevance) and communication quality (eg, verbosity) on Likert scales, and indicated whether they would use the draft or start anew (usable vs unusable). Branching logic further probed for empathy, personalization, and professionalism of responses. Computational linguistics methods assessed content differences in HCP vs GenAI responses, focusing on equity and empathy.
Results A total of 16 PCPs (8 [50.0%] female) reviewed 344 messages (175 GenAI drafted; 169 HCP drafted). Both GenAI and HCP responses were rated favorably. GenAI responses were rated higher for communication style than HCP responses (mean [SD], 3.70 [1.15] vs 3.38 [1.20]; P = .01, U = 12,568.5) but were similar to HCPs on information content (mean [SD], 3.53 [1.26] vs 3.41 [1.27]; P = .37; U = 13,981.0) and usable draft proportion (mean [SD], 0.69 [0.48] vs 0.65 [0.47]; P = .49, t = -0.6842). Usable GenAI responses were considered more empathetic than usable HCP responses (32 of 86 [37.2%] vs 13 of 79 [16.5%]; difference, 125.5%), possibly attributable to more subjective (mean [SD], 0.54 [0.16] vs 0.31 [0.23]; P < .001; difference, 74.2%) and positive (mean [SD] polarity, 0.21 [0.14] vs 0.13 [0.25]; P = .02; difference, 61.5%) language. They were also numerically longer (mean [SD] word count, 90.5 [32.0] vs 65.4 [62.6]; difference, 38.4%), although this difference was not statistically significant (P = .07), and were more linguistically complex (mean [SD] score, 125.2 [47.8] vs 95.4 [58.8]; P = .002; difference, 31.2%).
Conclusions In this cross-sectional study of PCP perceptions of an EHR-integrated GenAI chatbot, GenAI was found to communicate information better and with more empathy than HCPs, highlighting its potential to enhance patient-HCP communication. However, GenAI drafts were less readable than HCPs', a significant concern for patients with low health or English literacy.
Affiliation(s)
- Zoe Jonassen
- NYU Grossman School of Medicine, New York, New York
- Simon Jones
- NYU Grossman School of Medicine, New York, New York
- Oded Nov
- NYU Tandon School of Engineering, New York, New York
- Devin Mann
- NYU Grossman School of Medicine, New York, New York
4
Nasef H, Patel H, Amin Q, Baum S, Ratnasekera A, Ang D, Havron WS, Nakayama D, Elkbuli A. Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons' Perspectives. Am Surg 2024:31348241256075. [PMID: 38794965 DOI: 10.1177/00031348241256075]
Abstract
BACKGROUND This study aims to assess the accuracy, comprehensiveness, and validity of ChatGPT compared to evidence-based sources regarding the diagnosis and management of common surgical conditions by surveying the perceptions of U.S. board-certified practicing surgeons.
METHODS An anonymous cross-sectional survey was distributed to U.S. practicing surgeons from June 2023 to March 2024. The survey comprised 94 multiple-choice questions evaluating diagnostic and management information for five common surgical conditions from evidence-based sources or generated by ChatGPT. Statistical analysis included descriptive statistics and paired-sample t-tests.
RESULTS Participating surgeons were primarily aged 40-50 years (43%), male (86%), White (57%), and had 5-10 years or >15 years of experience (86%). The majority of surgeons had no prior experience with ChatGPT in surgical practice (86%). For material discussing both acute cholecystitis and upper gastrointestinal hemorrhage, evidence-based sources were rated as significantly more comprehensive (acute cholecystitis: 3.57 [±.535] vs 2.00 [±1.16], P = .025; upper GI hemorrhage: 4.14 [±.69] vs 2.43 [±.98], P < .001) and valid (acute cholecystitis: 3.71 [±.488] vs 2.86 [±1.07], P = .045; upper GI hemorrhage: 3.71 [±.76] vs 2.71 [±.95], P = .038) than ChatGPT. However, there was no significant difference in accuracy between the two sources (acute cholecystitis: 3.71 vs 3.29, P = .289; upper GI hemorrhage: 3.57 vs 2.71, P = .111).
CONCLUSION Surveyed U.S. board-certified practicing surgeons rated evidence-based sources as significantly more comprehensive and valid than ChatGPT across the majority of surveyed surgical conditions. However, there was no significant difference in accuracy between the sources across the majority of surveyed conditions. While ChatGPT may offer potential benefits in surgical practice, further refinement and validation are necessary to enhance its utility and acceptance among surgeons.
Affiliation(s)
- Hazem Nasef
- NOVA Southeastern University, Kiran Patel College of Allopathic Medicine, Fort Lauderdale, FL, USA
- Heli Patel
- NOVA Southeastern University, Kiran Patel College of Allopathic Medicine, Fort Lauderdale, FL, USA
- Quratulain Amin
- NOVA Southeastern University, Kiran Patel College of Allopathic Medicine, Fort Lauderdale, FL, USA
- Samuel Baum
- Louisiana State University Health Science Center, College of Medicine, New Orleans, LA, USA
- Darwin Ang
- Department of Surgery, Ocala Regional Medical Center, Ocala, FL, USA
- William S Havron
- Department of Surgical Education, Orlando Regional Medical Center, Orlando, FL, USA
- Department of Surgery, Division of Trauma and Surgical Critical Care, Orlando Regional Medical Center, Orlando, FL, USA
- Don Nakayama
- Mercer University School of Medicine, Columbus, GA, USA
- Adel Elkbuli
- Department of Surgical Education, Orlando Regional Medical Center, Orlando, FL, USA
- Department of Surgery, Division of Trauma and Surgical Critical Care, Orlando Regional Medical Center, Orlando, FL, USA
5
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. [PMID: 38172581 PMCID: PMC11076576 DOI: 10.1038/s41433-023-02915-z]
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing "AI-plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been released in the past few years. We discuss recent literature evaluating the role of these language models in medicine, with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are being generated rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
6
Breeding T, Martinez B, Patel H, Nasef H, Arif H, Nakayama D, Elkbuli A. The Utilization of ChatGPT in Reshaping Future Medical Education and Learning Perspectives: A Curse or a Blessing? Am Surg 2024; 90:560-566. [PMID: 37309705 DOI: 10.1177/00031348231180950]
Abstract
BACKGROUND ChatGPT has substantial potential to revolutionize medical education. We aim to assess how medical students and laypeople evaluate information produced by ChatGPT compared to an evidence-based resource on the diagnosis and management of 5 common surgical conditions.
METHODS A 60-question anonymous online survey was distributed to third- and fourth-year U.S. medical students and laypeople to evaluate articles produced by ChatGPT and an evidence-based source on clarity, relevance, reliability, validity, organization, and comprehensiveness. Participants received 2 blinded articles, 1 from each source, for each surgical condition. Paired-sample t-tests were used to compare ratings between the 2 sources.
RESULTS Of 56 survey participants, 50.9% (n = 28) were U.S. medical students and 49.1% (n = 27) were from the general population. Medical students reported that ChatGPT articles displayed significantly more clarity (appendicitis: 4.39 vs 3.89, P = .020; diverticulitis: 4.54 vs 3.68, P < .001; SBO: 4.43 vs 3.79, P = .003; GI bleed: 4.36 vs 3.93, P = .020) and better organization (diverticulitis: 4.36 vs 3.68, P = .021; SBO: 4.39 vs 3.82, P = .033) than the evidence-based source. However, for all 5 conditions, medical students found evidence-based passages to be more comprehensive than ChatGPT articles (cholecystitis: 4.04 vs 3.36, P = .009; appendicitis: 4.07 vs 3.36, P = .015; diverticulitis: 4.07 vs 3.36, P = .015; small bowel obstruction: 4.11 vs 3.54, P = .030; upper GI bleed: 4.11 vs 3.29, P = .003).
CONCLUSION Medical students perceived ChatGPT articles to be clearer and better organized than evidence-based sources on the pathogenesis, diagnosis, and management of 5 common surgical pathologies. However, evidence-based articles were rated as significantly more comprehensive.
Affiliation(s)
- Tessa Breeding
- Kiran Patel College of Allopathic Medicine, NOVA Southeastern University, Fort Lauderdale, FL, USA
- Brian Martinez
- Kiran Patel College of Allopathic Medicine, NOVA Southeastern University, Fort Lauderdale, FL, USA
- Heli Patel
- Kiran Patel College of Allopathic Medicine, NOVA Southeastern University, Fort Lauderdale, FL, USA
- Hazem Nasef
- Kiran Patel College of Allopathic Medicine, NOVA Southeastern University, Fort Lauderdale, FL, USA
- Hasan Arif
- Kiran Patel College of Allopathic Medicine, NOVA Southeastern University, Fort Lauderdale, FL, USA
- Don Nakayama
- Mercer University School of Medicine, Columbus, GA, USA
- Department of Pediatric Surgery, Piedmont Columbus Regional Hospital, Piedmont, GA, USA
- Adel Elkbuli
- Department of Surgery, Division of Trauma and Surgical Critical Care, Orlando Regional Medical Center, Orlando, FL, USA
- Department of Surgical Education, Orlando Regional Medical Center, Orlando, FL, USA
7
Breeding T, Patel H, Nasef H, Elkbuli A. Letter re: "The Utilization of ChatGPT in Reshaping Future Medical Education and Learning Perspectives: A Curse or a Blessing?". Am Surg 2024; 90:913-914. [PMID: 37776260 DOI: 10.1177/00031348231204913]
Affiliation(s)
- Tessa Breeding
- Kiran Patel College of Allopathic Medicine, NOVA Southeastern University, Fort Lauderdale, FL, USA
- Heli Patel
- Kiran Patel College of Allopathic Medicine, NOVA Southeastern University, Fort Lauderdale, FL, USA
- Hazem Nasef
- Kiran Patel College of Allopathic Medicine, NOVA Southeastern University, Fort Lauderdale, FL, USA
- Adel Elkbuli
- Department of Surgery, Division of Trauma and Surgical Critical Care, Orlando Regional Medical Center, Orlando, FL, USA
- Department of Surgical Education, Orlando Regional Medical Center, Orlando, FL, USA
8
Ahmed W, Saturno M, Rajjoub R, Duey AH, Zaidat B, Hoang T, Restrepo Mejia M, Gallate ZS, Shrestha N, Tang J, Zapolsky I, Kim JS, Cho SK. ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis. Eur Spine J 2024. [PMID: 38489044 DOI: 10.1007/s00586-024-08198-6]
Abstract
BACKGROUND CONTEXT Clinical guidelines, developed in concordance with the literature, are often used to guide surgeons' clinical decision making. Recent advancements of large language models and artificial intelligence (AI) in the medical field come with exciting potential. OpenAI's generative AI model, known as ChatGPT, can quickly synthesize information and generate responses grounded in medical literature, which may prove to be a useful tool in clinical decision-making for spine care. The current literature has yet to investigate the ability of ChatGPT to assist clinical decision making with regard to degenerative spondylolisthesis.
PURPOSE The study aimed to compare ChatGPT's concordance with the recommendations set forth by The North American Spine Society (NASS) Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis and assess ChatGPT's accuracy within the context of the most recent literature.
METHODS ChatGPT-3.5 and ChatGPT-4.0 were prompted with questions from the NASS Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis, and their recommendations were graded as "concordant" or "nonconcordant" relative to those put forth by NASS. A response was considered "concordant" when ChatGPT generated a recommendation that accurately reproduced all major points made in the NASS recommendation. Any responses graded "nonconcordant" were further stratified into two subcategories, "insufficient" or "over-conclusive," to provide further insight into grading rationale. Responses between GPT-3.5 and 4.0 were compared using chi-squared tests.
RESULTS ChatGPT-3.5 answered 13 of NASS's 28 total clinical questions in concordance with NASS's guidelines (46.4%). Categorical breakdown is as follows: Definitions and Natural History (1/1, 100%), Diagnosis and Imaging (1/4, 25%), Outcome Measures for Medical Intervention and Surgical Treatment (0/1, 0%), Medical and Interventional Treatment (4/6, 66.7%), Surgical Treatment (7/14, 50%), and Value of Spine Care (0/2, 0%). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-3.5 generated a concordant response 66.7% of the time (6/9). However, ChatGPT-3.5's concordance dropped to 36.8% for clinical questions on which NASS did not provide a clear recommendation (7/19). A further breakdown of ChatGPT-3.5's nonconcordance with the guidelines revealed that the vast majority of its inaccurate recommendations were "over-conclusive" (12/15, 80%) rather than "insufficient" (3/15, 20%). ChatGPT-4.0 answered 19 (67.9%) of the 28 total questions in concordance with NASS guidelines (P = 0.177). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-4.0 generated a concordant response 66.7% of the time (6/9). ChatGPT-4.0's concordance held at 68.4% for clinical questions on which NASS did not provide a clear recommendation (13/19, P = 0.104).
CONCLUSIONS This study sheds light on the duality of LLM applications within clinical settings: accuracy and utility in some contexts versus inaccuracy and risk in others. ChatGPT was concordant for most clinical questions NASS offered recommendations for. However, for questions NASS did not offer best practices, ChatGPT generated answers that were either too general or inconsistent with the literature, and even fabricated data/citations. Thus, clinicians should exercise extreme caution when attempting to consult ChatGPT for clinical recommendations, taking care to ensure its reliability within the context of recent literature.
Affiliation(s)
- Wasil Ahmed
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Rami Rajjoub
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Akiro H Duey
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bashar Zaidat
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Timothy Hoang
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nancy Shrestha
- Chicago Medical School at Rosalind Franklin University, North Chicago, IL, USA
- Justin Tang
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ivan Zapolsky
- Department of Orthopedics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, 10029, USA
- Jun S Kim
- Department of Orthopedics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, 10029, USA
- Samuel K Cho
- Department of Orthopedics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, 10029, USA
9
Park YJ, Pillai A, Deng J, Guo E, Gupta M, Paget M, Naugler C. Assessing the research landscape and clinical utility of large language models: a scoping review. BMC Med Inform Decis Mak 2024; 24:72. [PMID: 38475802 DOI: 10.1186/s12911-024-02459-6]
Abstract
IMPORTANCE Large language models (LLMs) like OpenAI's ChatGPT are powerful generative systems that rapidly synthesize natural language responses. Research on LLMs has revealed their potential and pitfalls, especially in clinical settings. However, the evolving landscape of LLM research in medicine has left several gaps regarding their evaluation, application, and evidence base.
OBJECTIVE This scoping review aims to (1) summarize current research evidence on the accuracy and efficacy of LLMs in medical applications, (2) discuss the ethical, legal, logistical, and socioeconomic implications of LLM use in clinical settings, (3) explore barriers and facilitators to LLM implementation in healthcare, (4) propose a standardized evaluation framework for assessing LLMs' clinical utility, and (5) identify evidence gaps and propose future research directions for LLMs in clinical applications.
EVIDENCE REVIEW We screened 4,036 records from MEDLINE, EMBASE, CINAHL, medRxiv, bioRxiv, and arXiv from January 2023 (inception of the search) to June 26, 2023 for English-language papers and analyzed findings from 55 worldwide studies. Quality of evidence was reported based on the Oxford Centre for Evidence-based Medicine recommendations.
FINDINGS Our results demonstrate that LLMs show promise in compiling patient notes, assisting patients in navigating the healthcare system, and, to some extent, supporting clinical decision-making when combined with human oversight. However, their utilization is limited by biases in training data that may harm patients, the generation of inaccurate but convincing information, and ethical, legal, socioeconomic, and privacy concerns. We also identified a lack of standardized methods for evaluating LLMs' effectiveness and feasibility.
CONCLUSIONS AND RELEVANCE This review thus highlights potential future directions and questions to address these limitations and to further explore LLMs' potential in enhancing healthcare delivery.
Affiliation(s)
- Ye-Jean Park
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Cir, M5S 1A8, Toronto, ON, Canada.
- Abhinav Pillai
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
- Jiawen Deng
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Cir, M5S 1A8, Toronto, ON, Canada
- Eddie Guo
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
- Mehul Gupta
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
- Mike Paget
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
- Christopher Naugler
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
10
Li J, Dada A, Puladi B, Kleesiek J, Egger J. ChatGPT in healthcare: A taxonomy and systematic review. Comput Methods Programs Biomed 2024; 245:108013. [PMID: 38262126 DOI: 10.1016/j.cmpb.2024.108013]
Abstract
The recent release of ChatGPT, a chatbot research project/product for natural language processing (NLP) by OpenAI, has stirred up a sensation among both the general public and medical professionals, amassing a phenomenally large user base in a short time. This is a typical example of the 'productization' of cutting-edge technologies, which allows the general public without a technical background to gain firsthand experience in artificial intelligence (AI), similar to the AI hype created by AlphaGo (DeepMind Technologies, UK) and self-driving cars (Google, Tesla, etc.). However, it is crucial, especially for healthcare researchers, to remain prudent amidst the hype. This work provides a systematic review of existing publications on the use of ChatGPT in healthcare, elucidating the 'status quo' of ChatGPT in medical applications for general readers, healthcare professionals, and NLP scientists. The large biomedical literature database PubMed is used to retrieve published works on this topic using the keyword 'ChatGPT'. An inclusion criterion and a taxonomy are further proposed to filter the search results and categorize the selected publications, respectively. The review finds that the current release of ChatGPT has achieved only moderate or 'passing' performance in a variety of tests and is unreliable for actual clinical deployment, since it is not intended for clinical applications by design. We conclude that specialized NLP models trained on (bio)medical datasets still represent the right direction to pursue for critical clinical applications.
Affiliation(s)
- Jianning Li
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany
- Amin Dada
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany
- Behrus Puladi
- Institute of Medical Informatics, University Hospital RWTH Aachen, Pauwelsstraße 30, 52074 Aachen, Germany; Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Pauwelsstraße 30, 52074 Aachen, Germany
- Jens Kleesiek
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany; TU Dortmund University, Department of Physics, Otto-Hahn-Straße 4, 44227 Dortmund, Germany
- Jan Egger
- Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany; Center for Virtual and Extended Reality in Medicine (ZvRM), University Hospital Essen, University Medicine Essen, Hufelandstraße 55, 45147 Essen, Germany
Collapse
|
11
|
Padovan M, Cosci B, Petillo A, Nerli G, Porciatti F, Scarinci S, Carlucci F, Dell’Amico L, Meliani N, Necciari G, Lucisano VC, Marino R, Foddis R, Palla A. ChatGPT in Occupational Medicine: A Comparative Study with Human Experts. Bioengineering (Basel) 2024; 11:57. [PMID: 38247934] [PMCID: PMC10813435] [DOI: 10.3390/bioengineering11010057]
Abstract
The objective of this study is to evaluate ChatGPT's accuracy and reliability in answering complex medical questions related to occupational health and to explore the implications and limitations of AI in occupational health medicine. The study also provides recommendations for future research in this area and informs decision-makers about AI's impact on healthcare. A group of physicians was enlisted to create a dataset of questions and answers on Italian occupational medicine legislation. The physicians were divided into two teams, and each team member was assigned a different subject area. ChatGPT was used to generate answers for each question, with and without access to the legislative context. The two teams then blindly evaluated the human- and AI-generated answers, with each group reviewing the other group's work. Occupational physicians outperformed ChatGPT in generating accurate answers, as rated on a 5-point Likert scale, while the answers provided by ChatGPT with access to legislative texts were comparable to those of professional doctors. Still, we found that users tend to prefer answers generated by humans, indicating that while ChatGPT is useful, users still value the opinions of occupational medicine professionals.
Affiliation(s)
- Martina Padovan, Bianca Cosci, Armando Petillo, Gianluca Nerli, Francesco Porciatti, Sergio Scarinci, Francesco Carlucci, Letizia Dell’Amico, Niccolò Meliani, Gabriele Necciari, Vincenzo Carmelo Lucisano, Riccardo Marino, Rudy Foddis: Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
12
Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140] [DOI: 10.1097/aog.0000000000005446]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about its capabilities, limitations, and risks. This article presents an overview of ChatGPT and highlights the current state of this technology in the medical field. It seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education, and it offers suggestions on how to optimize the use of this tool.
13
Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Innovating Personalized Nephrology Care: Exploring the Potential Utilization of ChatGPT. J Pers Med 2023; 13:1681. [PMID: 38138908] [PMCID: PMC10744377] [DOI: 10.3390/jpm13121681]
Abstract
The rapid advancement of artificial intelligence (AI) technologies, particularly machine learning, has brought substantial progress to the field of nephrology, enabling significant improvements in the management of kidney diseases. ChatGPT, a revolutionary language model developed by OpenAI, is a versatile AI model designed to engage in meaningful and informative conversations. Its applications in healthcare have been notable, with demonstrated proficiency in various medical knowledge assessments. However, ChatGPT's performance varies across different medical subfields, posing challenges in nephrology-related queries. At present, comprehensive reviews regarding ChatGPT's potential applications in nephrology remain lacking despite the surge of interest in its role in various domains. This article seeks to fill this gap by presenting an overview of the integration of ChatGPT in nephrology. It discusses the potential benefits of ChatGPT in nephrology, encompassing dataset management, diagnostics, treatment planning, and patient communication and education, as well as medical research and education. It also explores ethical and legal concerns regarding the utilization of AI in medical practice. The continuous development of AI models like ChatGPT holds promise for the healthcare realm but also underscores the necessity of thorough evaluation and validation before implementing AI in real-world medical scenarios. This review serves as a valuable resource for nephrologists and healthcare professionals interested in fully utilizing the potential of AI in innovating personalized nephrology care.
Affiliation(s)
- Jing Miao, Charat Thongprayoon, Supawadee Suppadungsuk, Oscar A. Garcia Valencia, Fawad Qureshi, Wisit Cheungpasitporn: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Supawadee Suppadungsuk: also Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
14
van Heerden A, Bosman S, Swendeman D, Comulada WS. Chatbots for HIV Prevention and Care: a Narrative Review. Curr HIV/AIDS Rep 2023; 20:481-486. [PMID: 38010467] [PMCID: PMC10719151] [DOI: 10.1007/s11904-023-00681-x]
Abstract
PURPOSE OF REVIEW To explore the intersection of chatbots and HIV prevention and care. Current applications of chatbots in HIV services, the challenges faced, recent advancements, and future research directions are presented and discussed. RECENT FINDINGS Chatbots facilitate sensitive discussions about HIV, thereby promoting prevention and care strategies. Trustworthiness and accuracy of information were identified as the primary factors influencing user engagement with chatbots. Additionally, the integration into chatbots of AI-driven models that process and generate human-like text brings both breakthroughs and challenges in terms of privacy, bias, resources, and ethical issues. Chatbots in HIV prevention and care show potential; however, significant work remains to address the associated ethical and practical concerns. The integration of large language models into chatbots is a promising direction for their effective deployment in HIV services. Encouraging future research, collaboration among stakeholders, and bold innovative thinking will be pivotal in harnessing the full potential of chatbot interventions.
Affiliation(s)
- Alastair van Heerden: Center for Community Based Research, Human Sciences Research Council, Old Bus Depot, Pietermaritzburg, 3201, South Africa; SAMRC/WITS Developmental Pathways for Health Research Unit, Department of Paediatrics, School of Clinical Medicine, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, Gauteng, South Africa
- Shannon Bosman: Center for Community Based Research, Human Sciences Research Council, Pietermaritzburg, South Africa
- Dallas Swendeman: Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, and Center for Community Health, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
- Warren Scott Comulada: Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine; Center for Community Health, Semel Institute for Neuroscience and Human Behavior; and Department of Health Policy and Management, Fielding School of Public Health, University of California, Los Angeles, CA, USA
15
Morita P, Abhari S, Kaur J. Do ChatGPT and Other Artificial Intelligence Bots Have Applications in Health Policy-Making? Opportunities and Threats. Int J Health Policy Manag 2023; 12:8131. [PMID: 38618768] [PMCID: PMC10843407] [DOI: 10.34172/ijhpm.2023.8131]
Affiliation(s)
- Plinio Morita: School of Public Health Sciences, Department of Systems Design Engineering, and Research Institute for Aging, University of Waterloo, Waterloo, ON, Canada; Centre for Digital Therapeutics, Techna Institute, University Health Network, Toronto, ON, Canada; Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Shahabeddin Abhari, Jasleen Kaur: School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
16
Koubaa A, Qureshi B, Ammar A, Khan Z, Boulila W, Ghouti L. Humans are still better than ChatGPT: Case of the IEEEXtreme competition. Heliyon 2023; 9:e21624. [PMID: 37954270] [PMCID: PMC10638003] [DOI: 10.1016/j.heliyon.2023.e21624]
Abstract
Since the release of ChatGPT, numerous studies have highlighted its remarkable performance, which often rivals or even surpasses human capabilities in various tasks and domains. However, this paper presents a contrasting perspective by demonstrating an instance where human performance excels in tasks typically suited for ChatGPT, specifically in the domain of computer programming. We use the IEEEXtreme Challenge competition as a benchmark: a prestigious, annual international programming contest encompassing a wide range of problems of differing complexity. To conduct a thorough evaluation, we selected and executed a diverse set of 102 challenges, drawn from five distinct IEEEXtreme editions, using three major programming languages: Python, Java, and C++. Our empirical analysis provides evidence that, contrary to popular belief, human programmers maintain a competitive edge over ChatGPT in certain aspects of problem-solving within the programming context. In fact, we found that the average score obtained by ChatGPT on the set of IEEEXtreme programming problems is 3.9 to 5.8 times lower than the average human score, depending on the programming language. This paper elaborates on these findings, offering critical insights into the limitations and potential areas of improvement for AI-based language models like ChatGPT.
Affiliation(s)
- Anis Koubaa, Basit Qureshi, Adel Ammar, Zahid Khan, Wadii Boulila, Lahouari Ghouti: Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
17
Shao CY, Li H, Liu XL, Li C, Yang LQ, Zhang YJ, Luo J, Zhao J. Appropriateness and Comprehensiveness of Using ChatGPT for Perioperative Patient Education in Thoracic Surgery in Different Language Contexts: Survey Study. Interact J Med Res 2023; 12:e46900. [PMID: 37578819] [PMCID: PMC10463083] [DOI: 10.2196/46900]
Abstract
BACKGROUND ChatGPT, a dialogue-based artificial intelligence language model, has shown promise in assisting clinical workflows and patient-clinician communication. However, there is a lack of feasibility assessments regarding its use for perioperative patient education in thoracic surgery. OBJECTIVE This study aimed to assess the appropriateness and comprehensiveness of using ChatGPT for perioperative patient education in thoracic surgery in both English and Chinese contexts. METHODS This pilot study was conducted in February 2023. A total of 37 questions focused on perioperative patient education in thoracic surgery were created based on guidelines and clinical experience. Two sets of inquiries were made to ChatGPT for each question, one in English and the other in Chinese. The responses generated by ChatGPT were evaluated separately by experienced thoracic surgical clinicians for appropriateness and comprehensiveness, treating each response as a hypothetical draft reply to a patient's question on an electronic information platform. For a response to be qualified, at least 80% of reviewers had to deem it appropriate and 50% had to deem it comprehensive. Statistical analyses were performed using the unpaired chi-square test or Fisher exact test, with a significance level set at P<.05. RESULTS The set of 37 commonly asked questions covered topics such as disease information, diagnostic procedures, perioperative complications, treatment measures, disease prevention, and perioperative care considerations. In both the English and Chinese contexts, 34 (92%) of the 37 responses were qualified in terms of both appropriateness and comprehensiveness; the remaining 3 (8%) were unqualified. The unqualified responses primarily involved the diagnosis of disease symptoms and of symptoms arising from surgical complications, and the reasons for judging responses unqualified were similar in both contexts. There was no statistically significant difference (34/37, 92% vs 34/37, 92%; P=.99) in the qualification rate between the 2 language sets. CONCLUSIONS This pilot study demonstrates the potential feasibility of using ChatGPT for perioperative patient education in thoracic surgery in both English and Chinese contexts. ChatGPT is expected to enhance patient satisfaction, reduce anxiety, and improve compliance during the perioperative period. In the future, there will be remarkable potential for applying artificial intelligence, in conjunction with human review, to patient education and health consultation after patients have provided informed consent.
Affiliation(s)
- Chen-Ye Shao, Chang Li, Li-Qin Yang, Yue-Juan Zhang, Jing Luo, Jun Zhao: Department of Thoracic Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- Hui Li: Department of Obstetrics and Gynecology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Xiao-Long Liu: Department of Cardiothoracic Surgery, Jinling Hospital, Medical School of Nanjing University, Nanjing, China
18
Wang C, Liu S, Yang H, Guo J, Wu Y, Liu J. Ethical Considerations of Using ChatGPT in Health Care. J Med Internet Res 2023; 25:e48009. [PMID: 37566454] [PMCID: PMC10457697] [DOI: 10.2196/48009]
Abstract
ChatGPT has promising applications in health care, but potential ethical issues need to be addressed proactively to prevent harm. ChatGPT presents potential ethical challenges from legal, humanistic, algorithmic, and informational perspectives. Legal ethics concerns arise from the unclear allocation of responsibility when patient harm occurs and from potential breaches of patient privacy due to data collection. Clear rules and legal boundaries are needed to properly allocate liability and protect users. Humanistic ethics concerns arise from the potential disruption of the physician-patient relationship, humanistic care, and issues of integrity. Overreliance on artificial intelligence (AI) can undermine compassion and erode trust. Transparency and disclosure of AI-generated content are critical to maintaining integrity. Algorithmic ethics raise concerns about algorithmic bias, responsibility, transparency and explainability, as well as validation and evaluation. Informational ethics concerns include data bias, validity, and effectiveness. Biased training data can lead to biased output, and overreliance on ChatGPT can reduce patient adherence and encourage self-diagnosis. Ensuring the accuracy, reliability, and validity of ChatGPT-generated content requires rigorous validation and ongoing updates based on clinical practice. To navigate the evolving ethical landscape of AI, AI in health care must adhere to the strictest ethical standards. Through comprehensive ethical guidelines, health care professionals can ensure the responsible use of ChatGPT, promote accurate and reliable information exchange, protect patient privacy, and empower patients to make informed decisions about their health care.
Affiliation(s)
- Changyu Wang: Department of Medical Informatics, West China Medical School, and West China College of Stomatology, Sichuan University, Chengdu, China
- Siru Liu: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Hao Yang, Jiulin Guo: Information Center, West China Hospital, Sichuan University, Chengdu, China
- Yuxuan Wu: Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
- Jialin Liu: Department of Medical Informatics, West China Medical School; Information Center and Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
19
Liu J, Wang C, Liu S. Utility of ChatGPT in Clinical Practice. J Med Internet Res 2023; 25:e48568. [PMID: 37379067] [PMCID: PMC10365580] [DOI: 10.2196/48568]
Abstract
ChatGPT is receiving increasing attention and has a variety of application scenarios in clinical practice. In clinical decision support, ChatGPT has been used to generate accurate differential diagnosis lists, support clinical decision-making, optimize clinical decision support, and provide insights for cancer screening decisions. In addition, ChatGPT has been used for intelligent question-answering to provide reliable information about diseases and medical queries. In terms of medical documentation, ChatGPT has proven effective in generating patient clinical letters, radiology reports, medical notes, and discharge summaries, improving efficiency and accuracy for health care providers. Future research directions include real-time monitoring and predictive analytics, precision medicine and personalized treatment, the role of ChatGPT in telemedicine and remote health care, and integration with existing health care systems. Overall, ChatGPT is a valuable tool that complements the expertise of health care providers and improves clinical decision-making and patient care. However, ChatGPT is a double-edged sword. We need to carefully consider and study the benefits and potential dangers of ChatGPT. In this viewpoint, we discuss recent advances in ChatGPT research in clinical practice and suggest possible risks and challenges of using ChatGPT in clinical practice. It will help guide and support future artificial intelligence research similar to ChatGPT in health.
Affiliation(s)
- Jialin Liu: Information Center and Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University; Department of Medical Informatics, West China Medical School, Chengdu, China
- Changyu Wang: Information Center, West China Hospital, and West China College of Stomatology, Sichuan University, Chengdu, China
- Siru Liu: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
20
Temsah MH, Aljamaan F, Malki KH, Alhasan K, Altamimi I, Aljarbou R, Bazuhair F, Alsubaihin A, Abdulmajeed N, Alshahrani FS, Temsah R, Alshahrani T, Al-Eyadhy L, Alkhateeb SM, Saddik B, Halwani R, Jamal A, Al-Tawfiq JA, Al-Eyadhy A. ChatGPT and the Future of Digital Health: A Study on Healthcare Workers' Perceptions and Expectations. Healthcare (Basel) 2023; 11:1812. [PMID: 37444647] [PMCID: PMC10340744] [DOI: 10.3390/healthcare11131812]
Abstract
This study aimed to assess the knowledge, attitudes, and intended practices of healthcare workers (HCWs) in Saudi Arabia towards ChatGPT, an artificial intelligence (AI) Chatbot, within the first three months after its launch. We also aimed to identify potential barriers to AI Chatbot adoption among healthcare professionals. A cross-sectional survey was conducted among 1057 HCWs in Saudi Arabia, distributed electronically via social media channels from 21 February to 6 March 2023. The survey evaluated HCWs' familiarity with ChatGPT-3.5, their satisfaction, intended future use, and perceived usefulness in healthcare practice. Of the respondents, 18.4% had used ChatGPT for healthcare purposes, while 84.1% of non-users expressed interest in utilizing AI Chatbots in the future. Most participants (75.1%) were comfortable with incorporating ChatGPT into their healthcare practice. HCWs perceived the Chatbot to be useful in various aspects of healthcare, such as medical decision-making (39.5%), patient and family support (44.7%), medical literature appraisal (48.5%), and medical research assistance (65.9%). A majority (76.7%) believed ChatGPT could positively impact the future of healthcare systems. Nevertheless, concerns about credibility and the source of information provided by AI Chatbots (46.9%) were identified as the main barriers. Although HCWs recognize ChatGPT as a valuable addition to digital health in the early stages of adoption, addressing concerns regarding accuracy, reliability, and medicolegal implications is crucial. Therefore, due to their unreliability, the current forms of ChatGPT and other Chatbots should not be used for diagnostic or treatment purposes without human expert oversight. Ensuring the trustworthiness and dependability of AI Chatbots is essential for successful implementation in healthcare settings. Future research should focus on evaluating the clinical outcomes of ChatGPT and benchmarking its performance against other AI Chatbots.
Affiliation(s)
- Mohamad-Hani Temsah: College of Medicine; Pediatric Department, King Saud University Medical City; and Evidence-Based Health Care & Knowledge Translation Research Chair, King Saud University, Riyadh, Saudi Arabia
- Fadi Aljamaan: College of Medicine, King Saud University, and Critical Care Department, King Saud University Medical City, Riyadh, Saudi Arabia
- Khalid H. Malki: Research Chair of Voice, Swallowing, and Communication Disorders, ENT Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Khalid Alhasan: College of Medicine and Pediatric Department, King Saud University Medical City, King Saud University, Riyadh; Solid Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
- Ibraheem Altamimi, Lama Al-Eyadhy: College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Razan Aljarbou, Faisal Bazuhair, Turki Alshahrani: Pediatric Department, King Saud University Medical City, King Saud University, Riyadh, Saudi Arabia
- Abdulmajeed Alsubaihin: College of Medicine and Pediatric Department, King Saud University Medical City, King Saud University, Riyadh, Saudi Arabia
- Naif Abdulmajeed: Pediatric Department, King Saud University Medical City, King Saud University; Pediatric Nephrology Department, Prince Sultan Military Medical City, Riyadh, Saudi Arabia
- Fatimah S. Alshahrani: College of Medicine and Division of Infectious Diseases, Department of Internal Medicine, King Saud University, Riyadh, Saudi Arabia
- Reem Temsah: College of Pharmacy, Alfaisal University, Riyadh, Saudi Arabia
- Basema Saddik: Sharjah Institute of Medical Research and Department of Community and Family Medicine, College of Medicine, University of Sharjah, Sharjah, United Arab Emirates; School of Population Health, Faculty of Medicine & Health, UNSW Sydney, Sydney, NSW, Australia
- Rabih Halwani: Sharjah Institute of Medical Research and Department of Clinical Sciences, College of Medicine, University of Sharjah, Sharjah, United Arab Emirates
- Amr Jamal: College of Medicine; Evidence-Based Health Care & Knowledge Translation Research Chair; and Department of Family and Community Medicine, King Saud University Medical City, Riyadh, Saudi Arabia
- Jaffar A. Al-Tawfiq: Specialty Internal Medicine and Quality Department, Johns Hopkins Aramco Healthcare, Dhahran, Saudi Arabia; Infectious Disease Division, Indiana University School of Medicine, Indianapolis, IN, USA; Infectious Disease Division, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ayman Al-Eyadhy: College of Medicine and Pediatric Department, King Saud University Medical City, King Saud University, Riyadh, Saudi Arabia