1. Moll M, Heilemann G, Georg D, Kauer-Dorner D, Kuess P. The role of artificial intelligence in informed patient consent for radiotherapy treatments - a case report. Strahlenther Onkol 2024;200:544-548. PMID: 38180493. DOI: 10.1007/s00066-023-02190-7.
Abstract
Recent advancements in large language models (LLMs; e.g., ChatGPT (OpenAI, San Francisco, California, USA)) have led to widespread use in various fields, including healthcare. This case study reports on the first use of an LLM in a pretreatment discussion and in obtaining informed consent for a radiation oncology treatment, and analyzes the reproducibility of the replies given by ChatGPT 3.5. A breast cancer patient, following legal consultation, engaged in a conversation with ChatGPT 3.5 regarding her radiotherapy treatment. The patient posed questions about side effects, prevention, activities, medications, and late effects. While some answers contained inaccuracies, the responses closely resembled doctors' replies. In a final evaluation discussion, however, the patient stated that she preferred the presence of a physician and expressed concerns about the source of the provided information. Reproducibility was tested over ten iterations. Future guidelines for using such models in radiation oncology should be driven by medical professionals. While artificial intelligence (AI) supports essential tasks, human interaction remains crucial.
Affiliation(s)
- M Moll
  - Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
- G Heilemann
  - Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
- Dietmar Georg
  - Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
- D Kauer-Dorner
  - Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
- P Kuess
  - Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
2. Hansen A, Klute RM, Yadav M, Bansal S, Bond WF. How Do Learners Receive Feedback on Note Writing? A Scoping Review. Acad Med 2024;99:683-690. PMID: 38306581. DOI: 10.1097/acm.0000000000005653.
Abstract
PURPOSE The literature assessing the process of note-writing based on gathered information is scant. This scoping review investigates methods of providing feedback on learners' note-writing abilities. METHODS In August 2022, Scopus and Web of Science were searched for studies that investigated feedback on student notes, or that reviewed notes written on an information- or data-gathering activity, in health care and other fields. Of 426 articles screened, 23 met the inclusion criteria. Data were extracted on article title, publication year, study location, study aim, study design, number of participants, participant demographics, level of education, type of note written, field of study, form of feedback given, source of the feedback, and student or participant rating of the feedback method. Possible themes were then identified, and a final consensus-based thematic analysis was performed. RESULTS Themes identified in the 23 included articles were as follows: (1) learners found faculty and peer feedback beneficial; (2) direct written comments and evaluation tools, such as rubrics or checklists, were the most common feedback methods; (3) reports on notes in real clinical settings were limited (simulated clinical scenarios in the preclinical curriculum were the most studied); (4) feedback providers and recipients benefit from prior training on providing and receiving feedback; (5) sequential or iterative feedback was beneficial for learners but can be time intensive for faculty and confounded by maturation effects; and (6) use of technology and validated assessment tools facilitates the feedback process through ease of communication and improved organization. CONCLUSIONS Factors influencing the impact and perception of feedback include the source, structure, setting, use of technology, and amount of feedback provided. As the utility of note-writing in health care expands, studies are needed to clarify the value of note feedback in learning and the role of innovative technologies in facilitating it.
3. Lee TJ, Campbell DJ, Rao AK, Hossain A, Elkattawy O, Radfar N, Lee P, Gardin JM. Evaluating ChatGPT Responses on Atrial Fibrillation for Patient Education. Cureus 2024;16:e61680. PMID: 38841294. PMCID: PMC11151148. DOI: 10.7759/cureus.61680.
Abstract
Background ChatGPT is a language model that has gained widespread popularity for its fine-tuned conversational abilities. However, a known drawback of the artificial intelligence (AI) chatbot is its tendency to confidently present users with inaccurate information. We evaluated the quality of ChatGPT responses to questions pertaining to atrial fibrillation for patient education, analyzing the accuracy and estimated grade level of answers and whether references were provided. Methodology Sixteen frequently asked questions on atrial fibrillation from the American Heart Association were posed to ChatGPT under four prompting conditions: Form 1 (no prompt), Form 2 (patient-friendly prompt), Form 3 (physician-level prompt), and Form 4 (prompting for statistics/references). Responses were scored as incorrect, partially correct, correct, or correct with references (perfect). Flesch-Kincaid grade level, unique words, and response length were recorded for each answer. Proportions of responses at differing scores were compared using chi-square analysis. The relationship between form and grade level was assessed using analysis of variance. Results Across all forms, scoring frequencies were one (1.6%) incorrect, five (7.8%) partially correct, 55 (85.9%) correct, and three (4.7%) perfect. Proportions of responses that were at least correct did not differ by form (p = 0.350), but perfect responses did (p = 0.001). Form 2 answers had a lower mean grade level (12.80 ± 3.38) than Forms 1 (14.23 ± 2.34), 3 (16.73 ± 2.65), and 4 (14.85 ± 2.76) (p < 0.05). Across all forms, references were provided in only three (4.7%) answers; notably, even when additionally prompted for sources or references, ChatGPT provided sources in only three of 16 responses (18.8%). Conclusions ChatGPT holds significant potential for enhancing patient education through accurate, adaptive responses. Its ability to alter response complexity based on user input, combined with high accuracy rates, supports its use as an informational resource in healthcare settings. Future advancements and continuous monitoring of AI capabilities will be crucial in maximizing the benefits while mitigating the risks associated with AI-driven patient education.
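The Flesch-Kincaid grade level reported in this study is a standard readability formula computed from word, sentence, and syllable counts. The following is a minimal, hypothetical sketch (not the tooling the authors used); the naive vowel-group syllable counter is only an approximation:

```python
import re

def count_syllables(word: str) -> int:
    # Crude approximation: each run of consecutive vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # FK grade = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

# Patient-friendly phrasing should score at a lower grade level than jargon.
plain = "Your heart may beat fast. See a doctor if you feel dizzy."
dense = ("Atrial fibrillation precipitates thromboembolic complications "
         "necessitating anticoagulation therapy evaluation.")
assert flesch_kincaid_grade(plain) < flesch_kincaid_grade(dense)
```

This illustrates why a "patient-friendly prompt" (Form 2) can lower the measured grade level: shorter sentences and fewer polysyllabic words drive both terms of the formula down.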
Affiliation(s)
- Thomas J Lee
  - Department of Medicine, Rutgers University New Jersey Medical School, Newark, USA
- Daniel J Campbell
  - Otolaryngology-Head and Neck Surgery, Thomas Jefferson University Hospital, Philadelphia, USA
- Abhinav K Rao
  - Department of Medicine, Trident Medical Center, Charleston, USA
- Afif Hossain
  - Department of Medicine/Division of Cardiology, Rutgers University New Jersey Medical School, Newark, USA
- Omar Elkattawy
  - Department of Medicine, Rutgers University New Jersey Medical School, Newark, USA
- Navid Radfar
  - Department of Medicine, Rutgers University New Jersey Medical School, Newark, USA
- Paul Lee
  - Department of Medicine, Rutgers University New Jersey Medical School, Newark, USA
- Julius M Gardin
  - Department of Medicine/Division of Cardiology, Rutgers University New Jersey Medical School, Newark, USA
4. Morya VK, Lee HW, Shahid H, Magar AG, Lee JH, Kim JH, Jun L, Noh KC. Application of ChatGPT for Orthopedic Surgeries and Patient Care. Clin Orthop Surg 2024;16:347-356. PMID: 38827766. PMCID: PMC11130626. DOI: 10.4055/cios23181.
Abstract
Artificial intelligence (AI) has rapidly transformed various aspects of life, and the launch of the chatbot "ChatGPT" by OpenAI in November 2022 has garnered significant attention and user appreciation. ChatGPT utilizes natural language processing based on a "generative pre-trained transformer" (GPT) model, specifically the transformer architecture, to generate human-like responses to a wide range of questions and topics. Equipped with approximately 57 billion words and 175 billion parameters from online data, ChatGPT has potential applications in medicine and orthopedics. One of its key strengths is its personalized, easy-to-understand, and adaptive responses, which allow it to learn continuously through user interaction. This article discusses how AI, especially ChatGPT, presents numerous opportunities in orthopedics, ranging from preoperative planning and surgical techniques to patient education and medical support. Although ChatGPT's user-friendly responses and adaptive capabilities are laudable, its limitations, including biased responses and ethical concerns, necessitate cautious and responsible use. Surgeons and healthcare providers should leverage the strengths of ChatGPT while recognizing its current limitations and verifying critical information through independent research and expert opinions. As AI technology continues to evolve, ChatGPT may become a valuable tool in orthopedic education and patient care, leading to improved outcomes and efficiency in healthcare delivery. The integration of AI into orthopedics offers substantial benefits but requires careful consideration and continuous improvement.
Affiliation(s)
- Vivek Kumar Morya
  - Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
- Ho-Won Lee
  - Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
- Hamzah Shahid
  - Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
- Anuja Gajanan Magar
  - Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
- Ju-Hyung Lee
  - Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
- Jae-Hyung Kim
  - Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
- Lang Jun
  - Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
- Kyu-Cheol Noh
  - Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
5. Xue E, Bracken-Clarke D, Iannantuono GM, Choo-Wosoba H, Gulley JL, Floudas CS. Utility of Large Language Models for Health Care Professionals and Patients in Navigating Hematopoietic Stem Cell Transplantation: Comparison of the Performance of ChatGPT-3.5, ChatGPT-4, and Bard. J Med Internet Res 2024;26:e54758. PMID: 38758582. PMCID: PMC11143389. DOI: 10.2196/54758.
Abstract
BACKGROUND Artificial intelligence is increasingly being applied to many workflows. Large language models (LLMs) are publicly accessible platforms trained to understand, interact with, and produce human-readable text; their ability to deliver relevant and reliable information is of particular interest to health care providers and patients. Hematopoietic stem cell transplantation (HSCT) is a complex medical field requiring extensive knowledge, background, and training to practice successfully, and it can be challenging for a nonspecialist audience to comprehend. OBJECTIVE We aimed to test the applicability of 3 prominent LLMs, namely ChatGPT-3.5 (OpenAI), ChatGPT-4 (OpenAI), and Bard (Google AI), in guiding nonspecialist health care professionals and advising patients seeking information regarding HSCT. METHODS We submitted 72 open-ended HSCT-related questions of variable difficulty to the LLMs and rated their responses based on consistency (defined as replicability of the response), response veracity, language comprehensibility, specificity to the topic, and the presence of hallucinations. We then rechallenged the 2 best-performing chatbots by resubmitting the most difficult questions, prompting them to respond as if communicating with either a health care professional or a patient and to provide verifiable sources of information. Responses were then rerated with the additional criterion of language appropriateness, defined as language adaptation for the intended audience. RESULTS ChatGPT-4 outperformed both ChatGPT-3.5 and Bard in terms of response consistency (66/72, 92%; 54/72, 75%; and 63/69, 91%, respectively; P=.007), response veracity (58/66, 88%; 40/54, 74%; and 16/63, 25%, respectively; P<.001), and specificity to the topic (60/66, 91%; 43/54, 80%; and 27/63, 43%, respectively; P<.001). Both ChatGPT-4 and ChatGPT-3.5 outperformed Bard in terms of language comprehensibility (64/66, 97%; 53/54, 98%; and 52/63, 83%, respectively; P=.002). All three models displayed episodes of hallucination. ChatGPT-3.5 and ChatGPT-4 were then rechallenged with a prompt to adapt their language to the audience and to provide sources of information, and the responses were rated. ChatGPT-3.5 showed a better ability than ChatGPT-4 to adapt its language to a nonmedical audience (17/21, 81% and 10/22, 46%, respectively; P=.03); however, both failed to consistently provide correct and up-to-date information resources, reporting out-of-date materials, incorrect URLs, or unfocused references, making their output unverifiable by the reader. CONCLUSIONS Despite LLMs' potential capability in confronting challenging medical topics such as HSCT, the presence of mistakes and the lack of clear references make them not yet appropriate for routine, unsupervised clinical use or patient counseling. Enabling LLMs to access and reference current websites and research papers, as well as developing LLMs trained on specialized domain knowledge data sets, may offer potential solutions for their future clinical application.
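Proportion comparisons like those above are typically tested with a Pearson chi-square statistic on a contingency table. As an illustrative sketch only (statistic computed by hand, no p-value; the study's exact testing procedure may differ), here it is applied to the veracity counts reported in this abstract (58/66, 40/54, and 16/63 correct):

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Veracious vs non-veracious responses: ChatGPT-4, ChatGPT-3.5, Bard.
veracity = [[58, 40, 16],   # correct responses
            [8, 14, 47]]    # incorrect responses
stat = chi_square_stat(veracity)
# With (2-1)*(3-1) = 2 degrees of freedom, the 0.05 critical value is 5.991;
# a statistic far above it is consistent with the reported P<.001.
assert stat > 5.991
```

The large statistic reflects how strongly Bard's low veracity rate (25%) diverges from the expected counts under independence.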
Affiliation(s)
- Elisabetta Xue
  - Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Dara Bracken-Clarke
  - Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Giovanni Maria Iannantuono
  - Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Hyoyoung Choo-Wosoba
  - Biostatistics and Data Management Section, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- James L Gulley
  - Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Charalampos S Floudas
  - Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
6. Blasingame MN, Koonce TY, Williams AM, Giuse DA, Su J, Krump PA, Giuse NB. Evaluating a Large Language Model's Ability to Answer Clinicians' Requests for Evidence Summaries. medRxiv [preprint] 2024:2024.05.01.24306691. PMID: 38746273. PMCID: PMC11092721. DOI: 10.1101/2024.05.01.24306691.
Abstract
Objective This study investigated the performance of a generative artificial intelligence (AI) tool using GPT-4 in answering clinical questions, in comparison with medical librarians' gold-standard evidence syntheses. Methods Questions were extracted from an in-house database of clinical evidence requests previously answered by medical librarians. Questions with multiple parts were subdivided into individual topics. A standardized prompt was developed using the COSTAR framework. Librarians submitted each question to aiChat, an internally managed chat tool using GPT-4, and recorded the responses. The summaries generated by aiChat were evaluated on whether they contained the critical elements used in the librarian's established gold-standard summary. A subset of questions was randomly selected for verification of the references provided by aiChat. Results Of the 216 evaluated questions, aiChat's response was assessed as "correct" for 180 (83.3%) questions, "partially correct" for 35 (16.2%) questions, and "incorrect" for 1 (0.5%) question. No significant differences were observed in question ratings by question category (p=0.39). For a subset of 30% (n=66) of the questions, 162 references were provided in the aiChat summaries, of which 60 (37%) were confirmed as nonfabricated. Conclusions Overall, the performance of the generative AI tool was promising. However, many included references could not be independently verified, and no attempt was made to assess whether any additional concepts introduced by aiChat were factually accurate. We therefore envision this as the first of a series of investigations designed to further our understanding of how current and future versions of generative AI can be used and integrated into medical librarians' workflow.
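The COSTAR framework mentioned above structures a prompt by Context, Objective, Style, Tone, Audience, and Response format. The sketch below is a hypothetical template of that shape (the study's actual prompt wording is not reproduced here, and the example content is invented):

```python
def costar_prompt(context, objective, style, tone, audience, response):
    """Assemble a COSTAR-structured prompt from its six components."""
    sections = [
        ("Context", context),
        ("Objective", objective),
        ("Style", style),
        ("Tone", tone),
        ("Audience", audience),
        ("Response", response),
    ]
    return "\n".join(f"# {name}\n{body}" for name, body in sections)

# Hypothetical example, loosely in the spirit of an evidence-summary request.
prompt = costar_prompt(
    context="A clinician has asked for an evidence summary on a treatment question.",
    objective="Summarize the best available evidence and its key findings.",
    style="Concise, structured evidence synthesis.",
    tone="Neutral and professional.",
    audience="A practicing clinician.",
    response="A short summary followed by a bulleted list of sources.",
)
```

Standardizing the prompt this way is what makes responses comparable across the 216 questions: only the question content varies, not the instructions around it.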
7. Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton EW, Malin BA, Yin Z. A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare. medRxiv [preprint] 2024:2024.04.26.24306390. PMID: 38712148. PMCID: PMC11071576. DOI: 10.1101/2024.04.26.24306390.
Abstract
Background The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including healthcare. Numerous studies have since been conducted on how to employ state-of-the-art LLMs in health-related scenarios to assist patients, doctors, and public health administrators. Objective This review aims to summarize the applications of, and concerns about, conversational LLMs in healthcare and to provide an agenda for future research on LLMs in healthcare. Methods We used the PubMed, ACM, and IEEE digital libraries as primary sources for this review. We followed the guidance of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to screen and select peer-reviewed research articles that (1) were related to both healthcare applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection and screening. We investigated these papers and classified them according to their applications and concerns. Results Our search initially identified 820 papers according to targeted keywords, of which 65 met our criteria and were included in the review. The most popular conversational LLM was ChatGPT from OpenAI (60 papers), followed by Bard from Google (1), Large Language Model Meta AI (LLaMA) from Meta (1), and other LLMs (5). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction, and (4) administration, and four categories of concerns: (1) reliability, (2) bias, (3) privacy, and (4) public acceptability. Forty-nine (75%) research papers used LLMs for summarization and/or medical knowledge inquiry, and 58 (89%) expressed concerns about reliability and/or bias. We found that conversational LLMs exhibit promising results in summarization and in providing medical knowledge to patients with relatively high accuracy. However, conversational LLMs like ChatGPT are not able to provide reliable answers to complex health-related tasks that require specialized domain expertise. Additionally, no experiments in the reviewed papers thoughtfully examined how conversational LLMs lead to bias or privacy issues in healthcare research. Conclusions Future studies should focus on improving the reliability of LLM applications in complex health-related tasks and on investigating the mechanisms by which LLM applications introduce bias and privacy issues. Considering the broad accessibility of LLMs, legal, social, and technical efforts are all needed to address these concerns and to promote, improve, and regulate the application of LLMs in healthcare.
Affiliation(s)
- Leyao Wang
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Zhiyu Wan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
- Congning Ni
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Qingyuan Song
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Yang Li
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Ellen Wright Clayton
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA, 37203
  - Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN, USA, 37203
- Bradley A. Malin
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
  - Department of Biostatistics, Vanderbilt University Medical Center, TN, USA, 37203
- Zhijun Yin
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
8. Wu J, Ma Y, Wang J, Xiao M. The Application of ChatGPT in Medicine: A Scoping Review and Bibliometric Analysis. J Multidiscip Healthc 2024;17:1681-1692. PMID: 38650670. PMCID: PMC11034560. DOI: 10.2147/jmdh.s463128.
Abstract
Purpose ChatGPT has a wide range of applications in the medical field. This review aims to define the key issues and provide a comprehensive view of the literature on the application of ChatGPT in medicine. Methods This scoping review follows Arksey and O'Malley's five-stage framework. A comprehensive literature search of publications (30 November 2022 to 16 August 2023) was conducted: six databases were searched and relevant references were systematically catalogued. Attention was focused on the general characteristics of the articles, their fields of application, and the advantages and disadvantages of using ChatGPT. Descriptive statistics and narrative synthesis methods were used for data analysis. Results Of the 3426 studies, 247 met the criteria for inclusion in this review. The majority of articles (31.17%) were from the United States. Editorials (43.32%) ranked first, followed by experimental studies (11.74%). The potential applications of ChatGPT in medicine are varied, with the largest number of studies (45.75%) exploring clinical practice, including assisting with clinical decision support and providing disease information and medical advice. This was followed by medical education (27.13%) and scientific research (16.19%). Among disciplines, radiology, surgery, and dentistry topped the list. However, ChatGPT in medicine also faces issues of data privacy, inaccuracy, and plagiarism. Conclusion The application of ChatGPT in medicine spans different disciplines and general application scenarios. ChatGPT has a paradoxical nature: it offers significant advantages but at the same time raises serious concerns about its use in healthcare settings. It is therefore imperative to develop theoretical frameworks that not only address its widespread use in healthcare but also facilitate a comprehensive assessment. In addition, these frameworks should contribute to the development of strict and effective guidelines and regulatory measures.
Affiliation(s)
- Jie Wu
  - Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Yingzhuo Ma
  - Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Jun Wang
  - Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Mingzhao Xiao
  - Department of Urology, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
9. Yuan S, Li F, Browning MHEM, Bardhan M, Zhang K, McAnirlin O, Patwary MM, Reuben A. Leveraging and exercising caution with ChatGPT and other generative artificial intelligence tools in environmental psychology research. Front Psychol 2024;15:1295275. PMID: 38650897. PMCID: PMC11033305. DOI: 10.3389/fpsyg.2024.1295275.
Abstract
Generative Artificial Intelligence (GAI) is an emerging and disruptive technology that has attracted considerable interest from researchers and educators across various disciplines. We discuss the relevance of, and concerns about, ChatGPT and other GAI tools in environmental psychology research. We propose three use categories for GAI tools: integrated and contextualized understanding, practical and flexible implementation, and two-way external communication. These categories are exemplified by topics such as the health benefits of green space, theory building, visual simulation, and identifying practical relevance. However, we also highlight the need to balance productivity with ethical issues, as well as the need for ethical guidelines, professional training, and changes in academic performance evaluation systems. We hope this perspective can foster constructive dialogue and responsible practice of GAI tools.
Affiliation(s)
- Shuai Yuan
  - Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Fu Li
  - Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Matthew H. E. M. Browning
  - Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Mondira Bardhan
  - Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Kuiran Zhang
  - Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Olivia McAnirlin
  - Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Muhammad Mainuddin Patwary
  - Environment and Sustainability Research Initiative, Khulna, Bangladesh
  - Environmental Science Discipline, Life Science School, Khulna University, Khulna, Bangladesh
- Aaron Reuben
  - Department of Psychology and Neuroscience, Duke University, Durham, NC, United States
10. Omar M, Brin D, Glicksberg B, Klang E. Utilizing natural language processing and large language models in the diagnosis and prediction of infectious diseases: A systematic review. Am J Infect Control 2024:S0196-6553(24)00159-7. PMID: 38588980. DOI: 10.1016/j.ajic.2024.03.016.
Abstract
BACKGROUND Natural language processing (NLP) and large language models (LLMs) hold largely untapped potential in infectious disease management. This review explores their current use and uncovers areas needing more attention. METHODS This analysis followed systematic review procedures and was registered with the Prospective Register of Systematic Reviews. We conducted a search across major databases, including PubMed, Embase, Web of Science, and Scopus, up to December 2023, using keywords related to NLP, LLMs, and infectious diseases. We also employed the Quality Assessment of Diagnostic Accuracy Studies-2 tool to evaluate the quality and robustness of the included studies. RESULTS Our review identified 15 studies with diverse applications of NLP in infectious disease management. Notable examples include GPT-4's application in detecting urinary tract infections and BERTweet's use in Lyme disease surveillance through social media analysis. These models demonstrated effective disease monitoring and public health tracking capabilities. However, effectiveness varied across studies. For instance, while some NLP tools showed high accuracy in pneumonia detection and high sensitivity in identifying invasive mold diseases from medical reports, others fell short in areas like bloodstream infection management. CONCLUSIONS This review highlights the yet-to-be-fully-realized promise of NLP and LLMs in infectious disease management. It calls for more exploration to fully harness AI's capabilities, particularly in diagnosis, surveillance, predicting disease courses, and tracking epidemiological trends.
Affiliation(s)
- Mahmud Omar: Tel Aviv University, Faculty of Medicine, Tel Aviv, Israel
- Dana Brin: Division of Diagnostic Imaging, Sheba Medical Center, affiliated to Tel-Aviv University, Ramat Gan, Israel
- Benjamin Glicksberg: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY; Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY
- Eyal Klang: Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY
11
Zangrossi P, Martini M, Guerrini F, De Bonis P, Spena G. Large language model, AI and scientific research: why ChatGPT is only the beginning. J Neurosurg Sci 2024;68:216-224. PMID: 38261307. DOI: 10.23736/S0390-5616.23.06171-4.
Abstract
ChatGPT, a conversational artificial intelligence model based on the generative pre-trained transformer (GPT) architecture, has garnered widespread attention due to its user-friendly nature and diverse capabilities. This technology enables users of all backgrounds to effortlessly engage in human-like conversations and receive coherent and intelligible responses. Beyond casual interactions, ChatGPT offers compelling prospects for scientific research, facilitating tasks like literature review and content summarization, ultimately expediting and enhancing the academic writing process. In the fields of medicine and surgery, it has already shown considerable potential across many tasks: enhancing decision-making, aiding surgical planning and simulation, providing real-time assistance during surgery, improving postoperative care and rehabilitation, and contributing to training, education, research, and development. However, it is crucial to acknowledge the model's limitations, encompassing knowledge constraints and the potential for erroneous responses, as well as ethical and legal considerations. This paper explores the potential benefits and pitfalls of these innovative technologies in scientific research, shedding light on their transformative impact while addressing concerns surrounding their use.
Affiliation(s)
- Pietro Zangrossi: Department of Neurosurgery, Sant'Anna University Hospital, Ferrara, Italy; Department of Translational Medicine, University of Ferrara, Ferrara, Italy
- Massimo Martini: R&D Department, Gate-away.com, Grottammare, Ascoli Piceno, Italy
- Francesco Guerrini: Department of Neurosurgery, San Matteo Polyclinic IRCCS Foundation, Pavia, Italy
- Pasquale De Bonis: Department of Neurosurgery, Sant'Anna University Hospital, Ferrara, Italy; Department of Translational Medicine, University of Ferrara, Ferrara, Italy; Unit of Minimally Invasive Neurosurgery, Ferrara University Hospital, Ferrara, Italy
- Giannantonio Spena: Department of Neurosurgery, San Matteo Polyclinic IRCCS Foundation, Pavia, Italy
12
Artsi Y, Sorin V, Konen E, Glicksberg BS, Nadkarni G, Klang E. Large language models for generating medical examinations: systematic review. BMC Med Educ 2024;24:354. PMID: 38553693. PMCID: PMC10981304. DOI: 10.1186/s12909-024-05239-y.
Abstract
BACKGROUND Writing multiple choice questions (MCQs) for medical exams is challenging, requiring extensive medical knowledge, time, and effort from medical educators. This systematic review focuses on the application of large language models (LLMs) in generating medical MCQs. METHODS The authors searched for studies published up to November 2023, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Search terms focused on LLM-generated MCQs for medical examinations. Non-English studies, studies outside the year range, and studies not focusing on AI-generated multiple-choice questions were excluded. MEDLINE was used as the search database. Risk of bias was evaluated using a tailored QUADAS-2 tool. RESULTS Overall, eight studies published between April 2023 and October 2023 were included. Six studies used ChatGPT 3.5, while two employed GPT-4. Five studies showed that LLMs can produce competent questions valid for medical exams. Three studies used LLMs to write medical questions but did not evaluate the validity of the questions. One study conducted a comparative analysis of different models, and another compared LLM-generated questions with those written by humans. Two studies were at high risk of bias. All studies presented faulty questions that were deemed inappropriate for medical exams, and some questions required additional modifications in order to qualify. CONCLUSIONS LLMs can be used to write MCQs for medical examinations, but their limitations cannot be ignored. Further study in this field is essential, and more conclusive evidence is needed; until then, LLMs may serve as a supplementary tool for writing medical examinations.
Affiliation(s)
- Yaara Artsi: Azrieli Faculty of Medicine, Bar-Ilan University, Ha'Hadas St. 1, Rishon Le Zion, Zefat, 7550598, Israel
- Vera Sorin: Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel; Tel-Aviv University School of Medicine, Tel Aviv, Israel; DeepVision Lab, Chaim Sheba Medical Center, Ramat Gan, Israel
- Eli Konen: Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel; Tel-Aviv University School of Medicine, Tel Aviv, Israel
- Benjamin S Glicksberg: Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish Nadkarni: Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eyal Klang: Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
13
Bektaş M, Pereira JK, Daams F, van der Peet DL. ChatGPT in surgery: a revolutionary innovation? Surg Today 2024. PMID: 38421439. DOI: 10.1007/s00595-024-02800-6.
Abstract
ChatGPT has brought about a new era of digital health, as this model has become prominent and been rapidly developing since its release. ChatGPT may be able to facilitate improvements in surgery as well; however, the influence of ChatGPT on surgery is largely unknown at present. Therefore, the present study reports on the current applications of ChatGPT in the field of surgery, evaluating its workflow, practical implementations, limitations, and future perspectives. A literature search was performed using the PubMed and Embase databases. The initial search was performed from its inception until July 2023. This study revealed that ChatGPT has promising capabilities in areas of surgical research, education, training, and practice. In daily practice, surgeons and surgical residents can be aided in performing logistics and administrative tasks, and patients can be more efficiently informed about the details of their condition. However, priority should be given to establishing proper policies and protocols to ensure the safe and reliable use of this model.
Affiliation(s)
- Mustafa Bektaş: Amsterdam UMC Location Vrije Universiteit Amsterdam, Surgery, De Boelelaan 1117, Amsterdam, The Netherlands
- Jaime Ken Pereira: Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, Amsterdam, The Netherlands
- Freek Daams: Amsterdam UMC Location Vrije Universiteit Amsterdam, Surgery, De Boelelaan 1117, Amsterdam, The Netherlands
- Donald L van der Peet: Amsterdam UMC Location Vrije Universiteit Amsterdam, Surgery, De Boelelaan 1117, Amsterdam, The Netherlands
14
Zaleski AL, Berkowsky R, Craig KJT, Pescatello LS. Comprehensiveness, Accuracy, and Readability of Exercise Recommendations Provided by an AI-Based Chatbot: Mixed Methods Study. JMIR Med Educ 2024;10:e51308. PMID: 38206661. PMCID: PMC10811574. DOI: 10.2196/51308.
Abstract
BACKGROUND Regular physical activity is critical for health and disease prevention. Yet, health care providers and patients face barriers to implement evidence-based lifestyle recommendations. The potential to augment care with the increased availability of artificial intelligence (AI) technologies is limitless; however, the suitability of AI-generated exercise recommendations has yet to be explored. OBJECTIVE The purpose of this study was to assess the comprehensiveness, accuracy, and readability of individualized exercise recommendations generated by a novel AI chatbot. METHODS A coding scheme was developed to score AI-generated exercise recommendations across ten categories informed by gold-standard exercise recommendations, including (1) health condition-specific benefits of exercise, (2) exercise preparticipation health screening, (3) frequency, (4) intensity, (5) time, (6) type, (7) volume, (8) progression, (9) special considerations, and (10) references to the primary literature. The AI chatbot was prompted to provide individualized exercise recommendations for 26 clinical populations using an open-source application programming interface. Two independent reviewers coded AI-generated content for each category and calculated comprehensiveness (%) and factual accuracy (%) on a scale of 0%-100%. Readability was assessed using the Flesch-Kincaid formula. Qualitative analysis identified and categorized themes from AI-generated output. RESULTS AI-generated exercise recommendations were 41.2% (107/260) comprehensive and 90.7% (146/161) accurate, with the majority (8/15, 53%) of inaccuracy related to the need for exercise preparticipation medical clearance. Average readability level of AI-generated exercise recommendations was at the college level (mean 13.7, SD 1.7), with an average Flesch reading ease score of 31.1 (SD 7.7). Several recurring themes and observations of AI-generated output included concern for liability and safety, preference for aerobic exercise, and potential bias and direct discrimination against certain age-based populations and individuals with disabilities. CONCLUSIONS There were notable gaps in the comprehensiveness, accuracy, and readability of AI-generated exercise recommendations. Exercise and health care professionals should be aware of these limitations when using and endorsing AI-based technologies as a tool to support lifestyle change involving exercise.
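The Flesch-Kincaid grade level and Flesch reading ease scores cited in this abstract are simple functions of word, sentence, and syllable counts. As an illustration only (not the study's code, and using made-up counts rather than the study's data), a minimal sketch of both formulas:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Flesch reading ease: higher scores mean easier text
    # (90-100 is roughly 5th-grade level; below ~30 is college-graduate level).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Flesch-Kincaid grade level: approximate US school grade needed to read the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts for a dense 100-word passage:
ease = flesch_reading_ease(100, 5, 180)    # ~34: "difficult" prose
grade = flesch_kincaid_grade(100, 5, 180)  # ~13: college level
```

With these hypothetical counts the sketch lands near the averages reported above (reading ease ~31, grade ~13.7), showing how long sentences and polysyllabic words drive both scores.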
Affiliation(s)
- Amanda L Zaleski: Clinical Evidence Development, Aetna Medical Affairs, CVS Health Corporation, Hartford, CT, United States; Department of Preventive Cardiology, Hartford Hospital, Hartford, CT, United States
- Rachel Berkowsky: Department of Kinesiology, University of Connecticut, Storrs, CT, United States
- Kelly Jean Thomas Craig: Clinical Evidence Development, Aetna Medical Affairs, CVS Health Corporation, Hartford, CT, United States
- Linda S Pescatello: Department of Kinesiology, University of Connecticut, Storrs, CT, United States
15
Jain N, Gottlich C, Fisher J, Campano D, Winston T. Assessing ChatGPT's orthopedic in-service training exam performance and applicability in the field. J Orthop Surg Res 2024;19:27. PMID: 38167093. PMCID: PMC10762835. DOI: 10.1186/s13018-023-04467-0.
Abstract
BACKGROUND ChatGPT has gained widespread attention for its ability to understand and provide human-like responses to inputs. However, few works have focused on its use in orthopedics. This study assessed ChatGPT's performance on the Orthopedic In-Service Training Exam (OITE) and evaluated its decision-making process to determine whether adoption as a resource in the field is practical. METHODS ChatGPT's performance on three OITE exams was evaluated by inputting multiple choice questions. Questions were classified by their orthopedic subject area. Yearly OITE technical reports were used to gauge scores against those of resident physicians. ChatGPT's rationales were compared with testmaker explanations using six different groups denoting answer accuracy and logic consistency. Variables were analyzed using contingency table construction and Chi-squared analyses. RESULTS Of 635 questions, 360 were usable as inputs (56.7%). ChatGPT-3.5 scored 55.8%, 47.7%, and 54% for the years 2020, 2021, and 2022, respectively. Of 190 correct outputs, 179 provided consistent logic (94.2%). Of 170 incorrect outputs, 133 provided inconsistent logic (78.2%). Significant associations were found between test topic and correct answer (p = 0.011), and between type of logic used and tested topic (p < 0.001). Basic Science and Sports had adjusted residuals greater than 1.96, as did the combinations Basic Science and correct, no logic; Basic Science and incorrect, inconsistent logic; Sports and correct, no logic; and Sports and incorrect, inconsistent logic. CONCLUSIONS Based on annual OITE technical reports for resident physicians, ChatGPT-3.5 performed around the PGY-1 level. When answering correctly, it displayed reasoning congruent with that of the testmakers. When answering incorrectly, it exhibited some understanding of the correct answer. It performed best in Basic Science and Sports, likely due to its ability to output rote facts. These findings suggest that it lacks the fundamental capabilities to be a comprehensive tool in orthopedic surgery in its current form. LEVEL OF EVIDENCE II.
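The adjusted residuals this abstract reports (with |value| > 1.96 flagging cells that deviate from independence at roughly the 5% level) can be computed directly from a contingency table of counts. A minimal illustrative sketch, using made-up counts rather than the study's data:

```python
import math

def adjusted_residuals(table):
    """Haberman adjusted standardized residuals for an r x c table of observed counts.

    For each cell: (observed - expected) / sqrt(expected * (1 - row_total/n) * (1 - col_total/n)).
    Cells with |residual| > 1.96 deviate from independence at about the 0.05 level.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            denom = math.sqrt(expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out_row.append((observed - expected) / denom)
        result.append(out_row)
    return result

# Hypothetical 2x2 table (e.g. topic x answer correctness), not the paper's data:
res = adjusted_residuals([[10, 20], [30, 40]])
```

In a 2x2 table all four adjusted residuals share the same magnitude; in larger tables, as in the study's topic-by-logic analysis, individual cells can be flagged separately.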
Affiliation(s)
- Neil Jain: Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Caleb Gottlich: Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- John Fisher: Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Dominic Campano: Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
- Travis Winston: Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
16
Rahman MA, Victoros E, Ernest J, Davis R, Shanjana Y, Islam MR. Impact of Artificial Intelligence (AI) Technology in Healthcare Sector: A Critical Evaluation of Both Sides of the Coin. Clin Pathol 2024;17:2632010X241226887. PMID: 38264676. PMCID: PMC10804900. DOI: 10.1177/2632010X241226887.
Abstract
The influence of artificial intelligence (AI) has risen drastically in recent years, especially in the field of medicine, and AI is poised to become a pillar of the future medical world. A comprehensive literature search related to AI in healthcare was performed in the PubMed database, and relevant information was retrieved from suitable articles. AI excels in aspects such as rapid adaptation, high diagnostic accuracy, and data management, which can help improve workforce productivity. With this potential in sight, the FDA has continuously approved more machine learning (ML) software for use by medical workers and scientists. However, there are controversies, such as increased chances of data breaches, concerns about clinical implementation, and potential healthcare dilemmas. In this article, the positive and negative aspects of AI implementation in healthcare are discussed, and potential solutions to the issues at hand are recommended.
Affiliation(s)
| | | | - Julianne Ernest
- Nesbitt School of Pharmacy Wilkes University, Wilkes-Barre, PA, USA
| | - Rob Davis
- Nesbitt School of Pharmacy Wilkes University, Wilkes-Barre, PA, USA
| | - Yeasna Shanjana
- Department of Environmental Sciences, North South University, Bashundhara, Dhaka, Bangladesh
| | | |
17
Li W, Lu W, Gong Z. Harnessing the Potential of ChatGPT in Breast Reconstruction: A Revolution in Patient Communication and Education. Aesthetic Plast Surg 2024;48:35-40. PMID: 37439837. DOI: 10.1007/s00266-023-03490-0.
Affiliation(s)
- Weiwei Li: Department of Burn, Wound Repair Surgery and Plastic Surgery, Department of Aesthetic Surgery, Affiliated Hospital of Guilin Medical University, Guangxi, 541001, China
- Wei Lu: Department of Burn, Wound Repair Surgery and Plastic Surgery, Department of Aesthetic Surgery, Affiliated Hospital of Guilin Medical University, Guangxi, 541001, China
- Zhenyu Gong: Department of Burn, Wound Repair Surgery and Plastic Surgery, Department of Aesthetic Surgery, Affiliated Hospital of Guilin Medical University, Guangxi, 541001, China
18
Taylor E. My (Brief) Foray Into AI (Artificial Intelligence). HERD 2024;17:12-16. PMID: 37974341. DOI: 10.1177/19375867231211322.
19
Alkhaaldi SMI, Kassab CH, Dimassi Z, Oyoun Alsoud L, Al Fahim M, Al Hageh C, Ibrahim H. Medical Student Experiences and Perceptions of ChatGPT and Artificial Intelligence: Cross-Sectional Study. JMIR Med Educ 2023;9:e51302. PMID: 38133911. PMCID: PMC10770787. DOI: 10.2196/51302.
Abstract
BACKGROUND Artificial intelligence (AI) has the potential to revolutionize the way medicine is learned, taught, and practiced, and medical education must prepare learners for these inevitable changes. Academic medicine has, however, been slow to embrace recent AI advances. Since its launch in November 2022, ChatGPT has emerged as a fast and user-friendly large language model that can assist health care professionals, medical educators, students, trainees, and patients. While many studies focus on the technology's capabilities, potential, and risks, there is a gap in studying the perspective of end users. OBJECTIVE The aim of this study was to gauge the experiences and perspectives of graduating medical students on ChatGPT and AI in their training and future careers. METHODS A cross-sectional web-based survey of recently graduated medical students was conducted in an international academic medical center between May 5, 2023, and June 13, 2023. Descriptive statistics were used to tabulate variable frequencies. RESULTS Of 325 applicants to the residency programs, 265 completed the survey (an 81.5% response rate). The vast majority of respondents denied using ChatGPT in medical school, with 20.4% (n=54) using it to help complete written assessments and only 9.4% using the technology in their clinical work (n=25). More students planned to use it during residency, primarily for exploring new medical topics and research (n=168, 63.4%) and exam preparation (n=151, 57%). Male students were significantly more likely to believe that AI will improve diagnostic accuracy (n=47, 51.7% vs n=69, 39.7%; P=.001), reduce medical error (n=53, 58.2% vs n=71, 40.8%; P=.002), and improve patient care (n=60, 65.9% vs n=95, 54.6%; P=.007). Previous experience with AI was significantly associated with positive AI perception in terms of improving patient care, decreasing medical errors and misdiagnoses, and increasing the accuracy of diagnoses (P=.001, P<.001, P=.008, respectively). CONCLUSIONS The surveyed medical students had minimal formal and informal experience with AI tools and limited perceptions of the potential uses of AI in health care but had overall positive views of ChatGPT and AI and were optimistic about the future of AI in medical education and health care. Structured curricula and formal policies and guidelines are needed to adequately prepare medical learners for the forthcoming integration of AI in medicine.
Affiliation(s)
- Saif M I Alkhaaldi: Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Carl H Kassab: Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Zakia Dimassi: Department of Medical Science, Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Leen Oyoun Alsoud: Department of Medical Science, Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Maha Al Fahim: Education Institute, Sheikh Khalifa Medical City, Abu Dhabi, United Arab Emirates
- Cynthia Al Hageh: Department of Medical Science, Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Halah Ibrahim: Department of Medical Science, Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
20
Sallam M, Al-Salahat K, Al-Ajlouni E. ChatGPT Performance in Diagnostic Clinical Microbiology Laboratory-Oriented Case Scenarios. Cureus 2023;15:e50629. PMID: 38107211. PMCID: PMC10725273. DOI: 10.7759/cureus.50629.
Abstract
BACKGROUND Artificial intelligence (AI)-based tools can reshape healthcare practice. This includes ChatGPT, which is considered among the most popular AI-based conversational models. Nevertheless, the performance of different versions of ChatGPT needs further evaluation in different settings to assess its reliability and credibility in various healthcare-related tasks. Therefore, the current study aimed to assess the performance of the freely available ChatGPT-3.5 and the paid version ChatGPT-4 in 10 different diagnostic clinical microbiology case scenarios. METHODS The current study followed the METRICS (Model, Evaluation, Timing/Transparency, Range/Randomization, Individual factors, Count, Specificity of the prompts/language) checklist for standardization of the design and reporting of AI-based studies in healthcare. The models, tested on December 3, 2023, were ChatGPT-3.5 and ChatGPT-4, and the evaluation of the ChatGPT-generated content was based on the CLEAR tool (Completeness, Lack of false information, Evidence support, Appropriateness, and Relevance), assessed on a 5-point Likert scale with CLEAR scores ranging from 1 to 5. ChatGPT output was evaluated by two raters independently, and inter-rater agreement was based on Cohen's κ statistic. Ten diagnostic clinical microbiology laboratory case scenarios were created in the English language by three microbiologists at diverse levels of expertise following an internal discussion of common cases observed in Jordan. The range of topics included bacteriology, mycology, parasitology, and virology cases. Specific prompts were tailored based on the CLEAR tool, and a new session was selected following prompting of each case scenario. RESULTS The Cohen's κ values for the five CLEAR items were 0.351-0.737 for ChatGPT-3.5 and 0.294-0.701 for ChatGPT-4, indicating fair to good agreement and suitability for analysis. Based on the average CLEAR scores, ChatGPT-4 outperformed ChatGPT-3.5 (mean: 2.64±1.06 for ChatGPT-3.5 vs. 3.21±1.05 for ChatGPT-4; P=.012, t-test). The performance of each model varied across the CLEAR items, with the lowest performance for the "Relevance" item (2.15±0.71 for ChatGPT-3.5 and 2.65±1.16 for ChatGPT-4). A statistically significant difference in performance across the CLEAR items was only seen in ChatGPT-4, with the best performance in "Completeness", "Lack of false information", and "Evidence support" (P=.043). The lowest level of performance for both models was observed with antimicrobial susceptibility testing (AST) queries, while the highest level of performance was seen in bacterial and mycologic identification. CONCLUSIONS Assessment of ChatGPT performance across different diagnostic clinical microbiology case scenarios showed that ChatGPT-4 outperformed ChatGPT-3.5. The performance of ChatGPT demonstrated noticeable variability depending on the specific topic evaluated. A primary shortcoming of both ChatGPT models was the tendency to generate irrelevant content lacking the needed focus. Although the overall ChatGPT performance in these diagnostic microbiology case scenarios might be described as "above average" at best, there remains significant potential for improvement, considering the identified limitations and unsatisfactory results in a few cases.
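Cohen's κ, used in this study to quantify inter-rater agreement on the CLEAR items, compares observed agreement between two raters with the agreement expected by chance. A minimal sketch of the statistic, with illustrative labels rather than the study's actual ratings:

```python
def cohens_kappa(rater_a, rater_b):
    # Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
    # Common rules of thumb read ~0.21-0.40 as "fair" and ~0.41-0.60 as "moderate" agreement.
    assert len(rater_a) == len(rater_b) and rater_a, "need two equal-length rating lists"
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_chance = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Two hypothetical raters scoring four items on a Likert scale (made-up data):
kappa = cohens_kappa([1, 1, 2, 2], [1, 1, 2, 1])
```

With these made-up ratings the observed agreement is 0.75 against a chance agreement of 0.5, giving κ = 0.5, which would fall in the "moderate" band of the ranges the abstract reports.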
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
- Khaled Al-Salahat: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, Jordan
- Eyad Al-Ajlouni: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, Jordan
21
Scherr R, Halaseh FF, Spina A, Andalib S, Rivera R. ChatGPT Interactive Medical Simulations for Early Clinical Education: Case Study. JMIR Med Educ 2023;9:e49877. PMID: 37948112. PMCID: PMC10674152. DOI: 10.2196/49877.
Abstract
BACKGROUND The transition to clinical clerkships can be difficult for medical students, as it requires the synthesis and application of preclinical information in diagnostic and therapeutic decisions. ChatGPT, a generative language model with many medical applications due to its creativity, memory, and accuracy, can help students in this transition. OBJECTIVE This paper models ChatGPT 3.5's ability to perform interactive clinical simulations and shows this tool's benefit to medical education. METHODS Simulation starting prompts were refined using ChatGPT 3.5 in Google Chrome. Starting prompts were selected based on assessment format, stepwise progression of simulation events and questions, free-response question type, responsiveness to user inputs, postscenario feedback, and medical accuracy of the feedback. The chosen scenarios were advanced cardiac life support and medical intensive care (for sepsis and pneumonia). RESULTS Two starting prompts were chosen. Prompt 1 was developed through 3 test simulations and used successfully in 2 simulations. Prompt 2 was developed through 10 additional test simulations and used successfully in 1 simulation. CONCLUSIONS ChatGPT is capable of creating simulations for early clinical education. These simulations let students practice novel parts of the clinical curriculum, such as forming independent diagnostic and therapeutic impressions over an entire patient encounter. Furthermore, the simulations can adapt to user inputs in a way that replicates real life more accurately than premade question-bank clinical vignettes. Finally, ChatGPT can create potentially unlimited free simulations with specific feedback, which increases access for medical students of lower socioeconomic status and at underresourced medical schools. However, no tool is perfect, and ChatGPT is no exception; there are concerns about simulation accuracy and replicability that need to be addressed to further optimize ChatGPT's performance as an educational resource.
Affiliation(s)
- Riley Scherr: Irvine School of Medicine, University of California, Irvine, CA, United States
- Faris F Halaseh: Irvine School of Medicine, University of California, Irvine, CA, United States
- Aidin Spina: Irvine School of Medicine, University of California, Irvine, CA, United States
- Saman Andalib: Irvine School of Medicine, University of California, Irvine, CA, United States
- Ronald Rivera: Department of Emergency Medicine, Irvine School of Medicine, University of California, Irvine, CA, United States
22
Tiwari K, Matthews L, May B, Shamovsky V, Orlic-Milacic M, Rothfels K, Ragueneau E, Gong C, Stephan R, Li N, Wu G, Stein L, D'Eustachio P, Hermjakob H. ChatGPT usage in the Reactome curation process. bioRxiv [Preprint] 2023:2023.11.08.566195. PMID: 37986970. PMCID: PMC10659344. DOI: 10.1101/2023.11.08.566195.
Abstract
Appreciating the rapid advancement and ubiquity of generative AI, particularly ChatGPT, a chatbot built on large language models such as GPT, we explored the potential application of ChatGPT in the data collection and annotation stages of the Reactome curation process. The aim was to create an automated or semi-automated framework that mitigates the extensive manual effort traditionally required for gathering and annotating information on biological pathways, adopting Reactome's "reaction-centric" approach. In this pilot study, we used ChatGPT/GPT-4 to address gaps in pathway annotation and enrichment in parallel with the conventional manual curation process. This facilitated a comparative analysis in which we assessed the outputs generated by ChatGPT against manually extracted information. The primary objective of this comparison was to ascertain the efficiency of integrating ChatGPT or other large language models into the Reactome curation workflow and to help plan our annotation pipeline, ultimately improving protein-to-pathway association in a reliable, automated or semi-automated way. In the process, we identified promising capabilities and inherent challenges of using ChatGPT/GPT-4, both in general and specifically in the context of the Reactome curation process. We also describe approaches and tools for refining ChatGPT/GPT-4 output that help generate more accurate and detailed results.
Affiliation(s)
- Krishna Tiwari: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lisa Matthews: NYU Grossman School of Medicine, New York, NY 10016, USA
- Bruce May: Ontario Institute for Cancer Research, Toronto, Ontario, M5G 0A3, Canada
- Karen Rothfels: Ontario Institute for Cancer Research, Toronto, Ontario, M5G 0A3, Canada
- Eliot Ragueneau: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Chuqiao Gong: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Ralf Stephan: Ontario Institute for Cancer Research, Toronto, Ontario, M5G 0A3, Canada
- Nancy Li: Ontario Institute for Cancer Research, Toronto, Ontario, M5G 0A3, Canada
- Guanming Wu: Oregon Health and Science University, Portland, OR 97239, USA
- Lincoln Stein: Ontario Institute for Cancer Research, Toronto, Ontario, M5G 0A3, Canada
- Henning Hermjakob: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
Collapse
23
Irfan B, Yaqoob A. ChatGPT's Epoch in Rheumatological Diagnostics: A Critical Assessment in the Context of Sjögren's Syndrome. Cureus 2023; 15:e47754. [PMID: 38022092 PMCID: PMC10676288 DOI: 10.7759/cureus.47754] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/26/2023] [Indexed: 12/01/2023] Open
Abstract
INTRODUCTION The rise of artificial intelligence in medical practice is reshaping clinical care. Large language models (LLMs) like ChatGPT have the potential to assist in rheumatology by personalizing scientific information retrieval, particularly in the context of Sjögren's Syndrome. This study aimed to evaluate the efficacy of ChatGPT in providing insights into Sjögren's Syndrome, differentiating it from other rheumatological conditions. MATERIALS AND METHODS A database of peer-reviewed articles and clinical guidelines focused on Sjögren's Syndrome was compiled. Clinically relevant questions were presented to ChatGPT, with responses assessed for accuracy, relevance, and comprehensiveness. Techniques such as blinding, random control queries, and temporal analysis ensured unbiased evaluation. ChatGPT's responses were also assessed using the 15-question DISCERN tool. RESULTS ChatGPT effectively highlighted key immunopathological and histopathological characteristics of Sjögren's Syndrome, though some crucial data and citation inconsistencies were noted. For a given clinical vignette, ChatGPT correctly identified potential etiological considerations, with Sjögren's Syndrome being prominent. DISCUSSION LLMs like ChatGPT offer rapid access to vast amounts of data, beneficial for both patients and providers. While they democratize information, limitations such as potential oversimplification and reference inaccuracies were observed. The balance between LLM insights and clinical judgment, as well as continuous model refinement, is crucial. CONCLUSION LLMs like ChatGPT offer significant potential in rheumatology, providing swift and broad medical insights. However, a cautious approach is vital, ensuring rigorous training and ethical application for optimal patient care and clinical practice.
Affiliation(s)
- Bilal Irfan
- Microbiology and Immunology, University of Michigan, Ann Arbor, USA
24
Garg RK, Urs VL, Agarwal AA, Chaudhary SK, Paliwal V, Kar SK. Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review. Health Promot Perspect 2023; 13:183-191. [PMID: 37808939 PMCID: PMC10558973 DOI: 10.34172/hpp.2023.22] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Accepted: 07/06/2023] [Indexed: 10/10/2023] Open
Abstract
Background ChatGPT is an artificial intelligence-based tool developed by OpenAI (California, USA). This systematic review examines the potential of ChatGPT in patient care and its role in medical research. Methods The systematic review was conducted according to the PRISMA guidelines. The Embase, Scopus, PubMed and Google Scholar databases were searched, as were preprint databases. The search aimed to identify all kinds of publications, without any restrictions, on ChatGPT and its application in medical research, medical publishing and patient care, using the search term "ChatGPT". All kinds of publications were reviewed, including original articles, reviews, editorials/commentaries, and letters to the editor. Each selected record was analysed using ChatGPT, and the responses generated were compiled in a table. The Word table was converted into a PDF and further analysed using ChatPDF. Results We reviewed the full texts of 118 articles. ChatGPT can assist with patient enquiries, note writing, decision-making, trial enrolment, data management, decision support, research support, and patient education. However, the solutions it offers are usually insufficient and contradictory, raising questions about their originality, privacy, correctness, bias, and legality. Because it lacks human-like qualities, ChatGPT's legitimacy as an author is questioned when it is used for academic writing. ChatGPT-generated content also raises concerns about bias and possible plagiarism. Conclusion Although ChatGPT can help with patient treatment and research, there are issues with accuracy, authorship, and bias. ChatGPT can serve as a "clinical assistant" and be a help in research and scholarly writing.
Affiliation(s)
- Vijeth L Urs
- Department of Neurology, King George’s Medical University, Lucknow, India
- Vimal Paliwal
- Department of Neurology, Sanjay Gandhi Institute of Medical Sciences, Lucknow, India
- Sujita Kumar Kar
- Department of Psychiatry, King George’s Medical University, Lucknow, India
25
Walters WH, Wilder EI. Fabrication and errors in the bibliographic citations generated by ChatGPT. Sci Rep 2023; 13:14045. [PMID: 37679503 PMCID: PMC10484980 DOI: 10.1038/s41598-023-41032-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 08/21/2023] [Indexed: 09/09/2023] Open
Abstract
Although chatbots such as ChatGPT can facilitate cost-effective text generation and editing, factually incorrect responses (hallucinations) limit their utility. This study evaluates one particular type of hallucination: fabricated bibliographic citations that do not represent actual scholarly works. We used ChatGPT-3.5 and ChatGPT-4 to produce short literature reviews on 42 multidisciplinary topics, compiling data on the 636 bibliographic citations (references) found in the 84 papers. We then searched multiple databases and websites to determine the prevalence of fabricated citations, to identify errors in the citations to non-fabricated papers, and to evaluate adherence to APA citation format. Within this set of documents, 55% of the GPT-3.5 citations but just 18% of the GPT-4 citations are fabricated. Likewise, 43% of the real (non-fabricated) GPT-3.5 citations but just 24% of the real GPT-4 citations include substantive citation errors. Although GPT-4 is a major improvement over GPT-3.5, problems remain.
Affiliation(s)
- William H Walters
- Mary Alice & Tom O'Malley Library, Manhattan College, Riverdale, NY, USA
- Esther Isabelle Wilder
- Department of Sociology, Lehman College, The City University of New York, Bronx, NY, USA
- Doctoral Program in Sociology, CUNY Graduate Center, The City University of New York, New York, NY, USA
26
Emsley R. ChatGPT: these are not hallucinations - they're fabrications and falsifications. Schizophrenia (Heidelberg, Germany) 2023; 9:52. [PMID: 37598184 PMCID: PMC10439949 DOI: 10.1038/s41537-023-00379-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Accepted: 07/18/2023] [Indexed: 08/21/2023]
27
Fayed AM, Mansur NSB, de Carvalho KA, Behrens A, D'Hooghe P, de Cesar Netto C. Artificial intelligence and ChatGPT in Orthopaedics and sports medicine. J Exp Orthop 2023; 10:74. [PMID: 37493985 PMCID: PMC10371934 DOI: 10.1186/s40634-023-00642-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/13/2023] [Accepted: 07/18/2023] [Indexed: 07/27/2023] Open
Abstract
Artificial intelligence (AI) is now regarded as a potential major catalyst of the fourth industrial revolution. In the last decade, AI use in Orthopaedics increased approximately tenfold. Artificial intelligence helps with tracking activities, evaluating diagnostic images, predicting injury risk, and several other tasks. Chat Generative Pre-trained Transformer (ChatGPT), an AI chatbot, represents an extremely controversial topic in the academic community. The aim of this review article is to simplify the concept of AI and examine the extent of AI use in the Orthopaedics and sports medicine literature. Additionally, the article evaluates the role of ChatGPT in scientific research and publications. Level of evidence: Level V, letter to review.
Affiliation(s)
- Aly M Fayed
- Department of Orthopaedics and Rehabilitation, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
- Kepler Alencar de Carvalho
- Department of Orthopaedics and Rehabilitation, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
- Andrew Behrens
- Department of Orthopaedics and Rehabilitation, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
- Pieter D'Hooghe
- Aspetar Orthopedic and Sports Medicine Hospital, Doha, Qatar