151
Diane A, Gencarelli P, Lee JM, Mittal R. Utilizing ChatGPT to Streamline the Generation of Prior Authorization Letters and Enhance Clerical Workflow in Orthopedic Surgery Practice: A Case Report. Cureus 2023; 15:e49680. [PMID: 38161881; PMCID: PMC10756745; DOI: 10.7759/cureus.49680]
Abstract
Prior authorization is a cumbersome process that requires clinicians to create an individualized letter detailing the patient's medical condition, the proposed treatment plan, and any supplemental information needed to obtain approval from the patient's insurance company before services or procedures may be provided. Drafting these letters is time-consuming clerical work that places an increased administrative burden on orthopedic surgeons and office staff while taking time away from patient care. There is therefore a need to streamline this workflow so that healthcare providers can prioritize direct patient care. In this report, we present a case utilizing OpenAI's ChatGPT (OpenAI, L.L.C., San Francisco, CA, USA) to draft a prior authorization request letter for the use of matrix-induced autologous chondrocyte implantation to treat a cartilage injury of the knee.
Affiliation(s)
- Alioune Diane
- Department of Orthopaedic Surgery, Rutgers Robert Wood Johnson Medical School, New Brunswick, USA
- Pasquale Gencarelli
- Department of Orthopaedic Surgery, Rutgers Robert Wood Johnson Medical School, New Brunswick, USA
- James M Lee
- Department of Orthopaedic Surgery, Orange Orthopaedic Associates, West Orange, USA
- Rahul Mittal
- Department of Health Informatics, Rutgers School of Health Professions, Newark, USA
152
Hu JM, Liu FC, Chu CM, Chang YT. Health Care Trainees' and Professionals' Perceptions of ChatGPT in Improving Medical Knowledge Training: Rapid Survey Study. J Med Internet Res 2023; 25:e49385. [PMID: 37851495; PMCID: PMC10620632; DOI: 10.2196/49385]
Abstract
BACKGROUND ChatGPT is a powerful pretrained large language model. It has both demonstrated potential and raised concerns related to knowledge translation and knowledge transfer. To apply and improve knowledge transfer in the real world, it is essential to assess the perceptions and acceptance of the users of ChatGPT-assisted training. OBJECTIVE We aimed to investigate the perceptions of health care trainees and professionals on ChatGPT-assisted training, using biomedical informatics as an example. METHODS We used purposeful sampling to include all health care undergraduate trainees and graduate professionals (n=195) from January to May 2023 in the School of Public Health at the National Defense Medical Center in Taiwan. Subjects were asked to watch a 2-minute video introducing 5 scenarios about ChatGPT-assisted training in biomedical informatics and then answer a self-designed online (web- and mobile-based) questionnaire according to the Kirkpatrick model. The survey responses were used to develop 4 constructs: "perceived knowledge acquisition," "perceived training motivation," "perceived training satisfaction," and "perceived training effectiveness." The study used structural equation modeling (SEM) to evaluate and test the structural model and hypotheses. RESULTS The online questionnaire response rate was 152 of 195 (78%); 88 of 152 participants (58%) were undergraduate trainees and 90 of 152 participants (59%) were women. The ages ranged from 18 to 53 years (mean 23.3, SD 6.0 years). There was no statistical difference in perceptions of training evaluation between men and women. Most participants were enthusiastic about the ChatGPT-assisted training, while the graduate professionals were more enthusiastic than undergraduate trainees. Nevertheless, some concerns were raised about potential cheating on training assessment. 
The average scores for knowledge acquisition, training motivation, training satisfaction, and training effectiveness were 3.84 (SD 0.80), 3.76 (SD 0.93), 3.75 (SD 0.87), and 3.72 (SD 0.91), respectively (Likert scale 1-5: strongly disagree to strongly agree). Knowledge acquisition had the highest score and training effectiveness the lowest. In the SEM results, training effectiveness was influenced predominantly by knowledge acquisition and partially met the hypotheses in the research framework. Knowledge acquisition had a direct effect on training effectiveness, training satisfaction, and training motivation, with β coefficients of .80, .87, and .97, respectively (all P<.001). CONCLUSIONS Most health care trainees and professionals perceived ChatGPT-assisted training as an aid in knowledge transfer. However, to improve training effectiveness, it should be combined with empirical experts for proper guidance and dual interaction. In a future study, we recommend using a larger sample size for evaluation of internet-connected large language models in medical knowledge transfer.
Affiliation(s)
- Je-Ming Hu
- Division of Colorectal Surgery, Department of Surgery, Tri-service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Graduate Institute of Medical Sciences, National Defense Medical Center, Taipei, Taiwan
- School of Medicine, National Defense Medical Center, Taipei, Taiwan
- Feng-Cheng Liu
- Division of Rheumatology/Immunology and Allergy, Department of Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Chi-Ming Chu
- Graduate Institute of Medical Sciences, National Defense Medical Center, Taipei, Taiwan
- School of Public Health, National Defense Medical Center, Taipei, Taiwan
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan
- Big Data Research Center, College of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan
- Department of Public Health, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Public Health, China Medical University, Taichung, Taiwan
- Yu-Tien Chang
- School of Public Health, National Defense Medical Center, Taipei, Taiwan
153
Rashidi HH, Fennell BD, Albahra S, Hu B, Gorbett T. The ChatGPT conundrum: Human-generated scientific manuscripts misidentified as AI creations by AI text detection tool. J Pathol Inform 2023; 14:100342. [PMID: 38116171; PMCID: PMC10727991; DOI: 10.1016/j.jpi.2023.100342]
Abstract
AI chatbots such as ChatGPT are revolutionizing our AI capabilities, especially in text generation, helping to expedite many tasks, but they introduce new dilemmas. The detection of AI-generated text has become a subject of great debate given AI text detectors' known and unexpected limitations. Thus far, much research in this area has focused on detecting AI-generated text; the goal of this study, however, was to evaluate the opposite scenario: an AI text detection tool's ability to correctly identify human-generated text. Thousands of abstracts published between 1980 and 2023 in several of the most well-known scientific journals were used to test the predictive capabilities of these detection tools. We found that the AI text detector erroneously identified up to 8% of the known real abstracts as AI-generated text. This further highlights the current limitations of such detection tools and argues for novel detectors, or combined approaches, that can address this shortcoming and minimize unanticipated consequences as we navigate this new AI landscape.
Affiliation(s)
- Hooman H. Rashidi
- Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States
- PLMI’s Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Brandon D. Fennell
- University of California, San Francisco – Department of Medicine, San Francisco, CA, United States
- Samer Albahra
- Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States
- PLMI’s Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Bo Hu
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, United States
- PLMI’s Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Tom Gorbett
- Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States
- PLMI’s Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
154
Mou C, Liang A, Hu C, Meng F, Han B, Xu F. Monitoring Endangered and Rare Wildlife in the Field: A Foundation Deep Learning Model Integrating Human Knowledge for Incremental Recognition with Few Data and Low Cost. Animals (Basel) 2023; 13:3168. [PMID: 37893892; PMCID: PMC10603653; DOI: 10.3390/ani13203168]
Abstract
Intelligent monitoring of endangered and rare wildlife is important for biodiversity conservation. In practical monitoring, few animal data are available to train recognition algorithms, so a system must achieve high accuracy with limited resources. At the same time, zoologists expect the system to discover unknown species and thereby enable significant findings. To date, no current algorithm has all of these abilities. This paper therefore proposes KI-CLIP. First, it introduces CLIP, a foundation deep learning model not previously applied in the animal domain, and exploits its powerful recognition capability under scarce training resources through an additional shallow network. Second, inspired by zoologists' ability to recognize a species from a single image, easily accessible expert description texts are incorporated to improve performance with few samples. Finally, a simple incremental learning module is designed to detect unknown species. We conducted extensive comparative experiments, ablation experiments, and case studies on 12 datasets containing real data. The results validate the effectiveness of KI-CLIP, which can be trained on multiple real scenarios in seconds, achieving over 90% recognition accuracy with only 8 training samples and over 97% with 16 training samples in our study. In conclusion, KI-CLIP is suitable for practical animal monitoring.
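The few-shot recipe this abstract describes (a frozen foundation model plus a shallow component over its embeddings) can be sketched independently of the authors' code. In the minimal, hypothetical stand-in below, random vectors play the role of frozen CLIP image embeddings and the shallow component is reduced to a nearest-class-mean prototype classifier; the real KI-CLIP additionally fuses expert description texts and an incremental-learning module, which are not modeled here.

```python
import numpy as np

def normalize(x):
    """L2-normalize rows so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def fit_prototypes(support, labels, n_classes):
    """One prototype per class: the mean of its few support embeddings."""
    return normalize(np.stack([support[labels == c].mean(axis=0)
                               for c in range(n_classes)]))

def predict(prototypes, queries):
    """Assign each query to the class with the most similar prototype."""
    return (normalize(queries) @ prototypes.T).argmax(axis=1)

# Stand-in for frozen CLIP image embeddings: 8 support images, 2 species.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 512))       # one latent direction per "species"
support = np.repeat(centers, 4, axis=0) + 0.1 * rng.normal(size=(8, 512))
labels = np.repeat([0, 1], 4)

prototypes = fit_prototypes(normalize(support), labels, 2)
queries = centers + 0.1 * rng.normal(size=(2, 512))
print(predict(prototypes, queries))
```

With only 4 support images per class, each query is matched to the nearest class prototype by cosine similarity, which is why such approaches can work with 8 to 16 training samples.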
Affiliation(s)
- Chao Mou
- School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; (C.M.)
- Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China
- Aokang Liang
- School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; (C.M.)
- Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China
- Chunying Hu
- School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; (C.M.)
- Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China
- Fanyu Meng
- School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; (C.M.)
- Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China
- Baixun Han
- School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; (C.M.)
- Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China
- Fu Xu
- School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; (C.M.)
- Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China
- State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China
155
Abani S, De Decker S, Tipold A, Nessler JN, Volk HA. Can ChatGPT diagnose my collapsing dog? Front Vet Sci 2023; 10:1245168. [PMID: 37901112; PMCID: PMC10600474; DOI: 10.3389/fvets.2023.1245168]
Affiliation(s)
- Samira Abani
- Department of Small Animal Medicine and Surgery, University of Veterinary Medicine Hannover, Hannover, Germany
- Centre for Systems Neuroscience, University of Veterinary Medicine Hannover, Hannover, Germany
- Steven De Decker
- Department of Veterinary Clinical Science and Services, Royal Veterinary College, University of London, London, United Kingdom
- Andrea Tipold
- Department of Small Animal Medicine and Surgery, University of Veterinary Medicine Hannover, Hannover, Germany
- Centre for Systems Neuroscience, University of Veterinary Medicine Hannover, Hannover, Germany
- Jasmin Nicole Nessler
- Department of Small Animal Medicine and Surgery, University of Veterinary Medicine Hannover, Hannover, Germany
- Holger Andreas Volk
- Department of Small Animal Medicine and Surgery, University of Veterinary Medicine Hannover, Hannover, Germany
- Centre for Systems Neuroscience, University of Veterinary Medicine Hannover, Hannover, Germany
156
Abu Hammour K, Alhamad H, Al-Ashwal FY, Halboup A, Abu Farha R, Abu Hammour A. ChatGPT in pharmacy practice: a cross-sectional exploration of Jordanian pharmacists' perception, practice, and concerns. J Pharm Policy Pract 2023; 16:115. [PMID: 37789443; PMCID: PMC10548710; DOI: 10.1186/s40545-023-00624-2]
Abstract
OBJECTIVES The purpose of this study was to find out how much pharmacists know about and have used ChatGPT in their practice. Using a survey, we investigated the advantages and disadvantages of utilizing ChatGPT in a pharmacy context, the amount of training necessary to use it proficiently, and the influence on patient care. METHODS This cross-sectional study was carried out between May and June 2023 to assess the potential and problems that pharmacists observed while integrating AI-powered chatbots (ChatGPT) in pharmacy practice. The correlation between perceived benefits and concerns was evaluated using Spearman's rho due to the data's non-normal distribution. Pharmacists licensed by the Jordanian Pharmacists Association were eligible for inclusion. A convenience sampling technique was used to choose the participants, and the study questionnaire was distributed via online media (Facebook and WhatsApp). Anyone who expressed interest in taking part was given a link to the study's instructions so they could read them before giving their electronic consent and accessing the survey. RESULTS The potential advantages of ChatGPT in pharmacy practice were widely acknowledged by the participants. The majority of participants (69.9%) concurred that educational material about pharmacy items or therapeutic areas can be provided using ChatGPT, and 66.9% of respondents believed that ChatGPT is a machine learning algorithm. Concerns about the accuracy of AI-generated responses were also prevalent. More than half of the participants (55.7%) raised the possibility that AI systems such as ChatGPT could pick up on and replicate prejudices and discriminatory patterns from the data they were trained on. Analysis shows a statistically significant, albeit minor, positive link between the perceived advantages of ChatGPT and its drawbacks (r = 0.255, p < 0.001). However, concerns were strongly correlated with knowledge of ChatGPT.
In contrast to those who were either unsure or had not heard of ChatGPT (64.2%), individuals who had heard of it were more likely to have strong concerns (79.8%) (p = 0.002). Finally, the results show a statistically significant association between the frequency of ChatGPT use and positive perceptions of the tool (p < 0.001). CONCLUSIONS Although ChatGPT has shown promise in health and pharmaceutical practice, its application should be rigorously regulated by evidence-based law. According to the study's findings, pharmacists support the use of ChatGPT in pharmacy practice but have concerns about its use due to ethical reasons, legal problems, privacy concerns, worries about the accuracy of the data generated, data learning, and bias risk.
Affiliation(s)
- Khawla Abu Hammour
- Department of Clinical Pharmacy and Biopharmaceutics, Faculty of Pharmacy, University of Jordan, Amman, Jordan
- Hamza Alhamad
- Department of Clinical Pharmacy, Faculty of Pharmacy, Zarqa University, Zarqa, Jordan
- Fahmi Y Al-Ashwal
- Department of Clinical Pharmacy, College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq.
- Department of Clinical Pharmacy and Pharmacy Practice, Faculty of Pharmacy, University of Science and Technology, Sana'a, Yemen.
- Abdulsalam Halboup
- Department of Clinical Pharmacy and Pharmacy Practice, Faculty of Pharmacy, University of Science and Technology, Sana'a, Yemen
- Discipline of Clinical Pharmacy, School of Pharmaceutical Sciences, University Sains Malaysia, Gelugor, Pulau Pinang, Malaysia
- Rana Abu Farha
- Clinical Pharmacy and Therapeutics Department, Faculty of Pharmacy, Applied Science Private University, P.O. Box 11937, Amman, Jordan
- Adnan Abu Hammour
- Medrise Medical Center, Dubai Healthcare City, Dubai, United Arab Emirates
157
Goodman RS, Patrinely JR, Stone CA, Zimmerman E, Donald RR, Chang SS, Berkowitz ST, Finn AP, Jahangir E, Scoville EA, Reese TS, Friedman DL, Bastarache JA, van der Heijden YF, Wright JJ, Ye F, Carter N, Alexander MR, Choe JH, Chastain CA, Zic JA, Horst SN, Turker I, Agarwal R, Osmundson E, Idrees K, Kiernan CM, Padmanabhan C, Bailey CE, Schlegel CE, Chambless LB, Gibson MK, Osterman TJ, Wheless LE, Johnson DB. Accuracy and Reliability of Chatbot Responses to Physician Questions. JAMA Netw Open 2023; 6:e2336483. [PMID: 37782499; PMCID: PMC10546234; DOI: 10.1001/jamanetworkopen.2023.36483]
Abstract
Importance Natural language processing tools, such as ChatGPT (generative pretrained transformer, hereafter referred to as chatbot), have the potential to radically enhance the accessibility of medical information for health professionals and patients. Assessing the safety and efficacy of these tools in answering physician-generated questions is critical to determining their suitability in clinical settings, facilitating complex decision-making, and optimizing health care efficiency. Objective To assess the accuracy and comprehensiveness of chatbot-generated responses to physician-developed medical queries, highlighting the reliability and limitations of artificial intelligence-generated medical information. Design, Setting, and Participants Thirty-three physicians across 17 specialties generated 284 medical questions that they subjectively classified as easy, medium, or hard with either binary (yes or no) or descriptive answers. The physicians then graded the chatbot-generated answers to these questions for accuracy (6-point Likert scale with 1 being completely incorrect and 6 being completely correct) and completeness (3-point Likert scale, with 1 being incomplete and 3 being complete plus additional context). Scores were summarized with descriptive statistics and compared using the Mann-Whitney U test or the Kruskal-Wallis test. The study (including data analysis) was conducted from January to May 2023. Main Outcomes and Measures Accuracy, completeness, and consistency over time and between 2 different versions (GPT-3.5 and GPT-4) of chatbot-generated medical responses. Results Across all questions (n = 284) generated by 33 physicians (31 faculty members and 2 recent graduates from residency or fellowship programs) across 17 specialties, the median accuracy score was 5.5 (IQR, 4.0-6.0) (between almost completely and completely correct) with a mean (SD) score of 4.8 (1.6) (between mostly and almost completely correct).
The median completeness score was 3.0 (IQR, 2.0-3.0) (complete and comprehensive) with a mean (SD) score of 2.5 (0.7). For questions rated easy, medium, and hard, the median accuracy scores were 6.0 (IQR, 5.0-6.0), 5.5 (IQR, 5.0-6.0), and 5.0 (IQR, 4.0-6.0), respectively (mean [SD] scores were 5.0 [1.5], 4.7 [1.7], and 4.6 [1.6], respectively; P = .05). Accuracy scores for binary and descriptive questions were similar (median score, 6.0 [IQR, 4.0-6.0] vs 5.0 [IQR, 3.4-6.0]; mean [SD] score, 4.9 [1.6] vs 4.7 [1.6]; P = .07). Of 36 questions with scores of 1.0 to 2.0, 34 were requeried or regraded 8 to 17 days later with substantial improvement (median score 2.0 [IQR, 1.0-3.0] vs 4.0 [IQR, 2.0-5.3]; P < .01). A subset of questions, regardless of initial scores (version 3.5), were regenerated and rescored using version 4 with improvement (mean accuracy [SD] score, 5.2 [1.5] vs 5.7 [0.8]; median score, 6.0 [IQR, 5.0-6.0] for original and 6.0 [IQR, 6.0-6.0] for rescored; P = .002). Conclusions and Relevance In this cross-sectional study, chatbot generated largely accurate information to diverse medical queries as judged by academic physician specialists with improvement over time, although it had important limitations. Further research and model development are needed to correct inaccuracies and for validation.
Affiliation(s)
- J. Randall Patrinely
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Cosby A. Stone
- Department of Allergy, Pulmonology, and Critical Care, Vanderbilt University Medical Center, Nashville, Tennessee
- Eli Zimmerman
- Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee
- Rebecca R. Donald
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sam S. Chang
- Department of Urology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sean T. Berkowitz
- Vanderbilt Eye Institute, Department of Ophthalmology, Vanderbilt University Medical Center, Nashville, Tennessee
- Avni P. Finn
- Vanderbilt Eye Institute, Department of Ophthalmology, Vanderbilt University Medical Center, Nashville, Tennessee
- Eiman Jahangir
- Department of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Elizabeth A. Scoville
- Department of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, Tennessee
- Tyler S. Reese
- Department of Rheumatology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee
- Debra L. Friedman
- Department of Pediatric Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Julie A. Bastarache
- Department of Allergy, Pulmonology, and Critical Care, Vanderbilt University Medical Center, Nashville, Tennessee
- Yuri F. van der Heijden
- Department of Infectious Disease, Vanderbilt University Medical Center, Nashville, Tennessee
- Jordan J. Wright
- Department of Diabetes, Endocrinology, and Metabolism, Vanderbilt University Medical Center, Nashville, Tennessee
- Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Nicholas Carter
- Division of Trauma and Surgical Critical Care, University of Miami Miller School of Medicine, Miami, Florida
- Matthew R. Alexander
- Department of Cardiovascular Medicine and Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee
- Jennifer H. Choe
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Cody A. Chastain
- Department of Infectious Disease, Vanderbilt University Medical Center, Nashville, Tennessee
- John A. Zic
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sara N. Horst
- Department of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, Tennessee
- Isik Turker
- Department of Cardiology, Washington University School of Medicine in St Louis, St Louis, Missouri
- Rajiv Agarwal
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Evan Osmundson
- Department of Radiation Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Kamran Idrees
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Colleen M. Kiernan
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Chandrasekhar Padmanabhan
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Christina E. Bailey
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Cameron E. Schlegel
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Lola B. Chambless
- Department of Neurological Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Michael K. Gibson
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Travis J. Osterman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Lee E. Wheless
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Douglas B. Johnson
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
158
Blüthgen C. Does GPT4 dream of counting electric nodules? Eur Radiol 2023; 33:6756-6758. [PMID: 37099177; PMCID: PMC10511354; DOI: 10.1007/s00330-023-09671-4]
Affiliation(s)
- Christian Blüthgen
- Institute for Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Rämistrasse 100, CH-8091, Zurich, Switzerland.
- Center for Artificial Intelligence in Medicine and Imaging (AIMI), Stanford University, Stanford, CA, USA.
159
Eggmann F, Weiger R, Zitzmann NU, Blatz MB. Implications of large language models such as ChatGPT for dental medicine. J Esthet Restor Dent 2023; 35:1098-1102. [PMID: 37017291; DOI: 10.1111/jerd.13046]
Abstract
OBJECTIVE This article provides an overview of the implications of ChatGPT and other large language models (LLMs) for dental medicine. OVERVIEW ChatGPT, an LLM trained on massive amounts of textual data, is adept at fulfilling various language-related tasks. Despite its impressive capabilities, ChatGPT has serious limitations, such as occasionally giving incorrect answers, producing nonsensical content, and presenting misinformation as fact. Dental practitioners, assistants, and hygienists are not likely to be significantly impacted by LLMs. However, LLMs could affect the work of administrative personnel and the provision of dental telemedicine. LLMs offer potential for clinical decision support, text summarization, efficient writing, and multilingual communication. As more people seek health information from LLMs, it is crucial to safeguard against inaccurate, outdated, and biased responses to health-related queries. LLMs pose challenges for patient data confidentiality and cybersecurity that must be tackled. In dental education, LLMs present fewer challenges than in other academic fields. LLMs can enhance academic writing fluency, but acceptable usage boundaries in science need to be established. CONCLUSIONS While LLMs such as ChatGPT may have various useful applications in dental medicine, they come with risks of malicious use and serious limitations, including the potential for misinformation. CLINICAL SIGNIFICANCE Along with the potential benefits of using LLMs as an additional tool in dental medicine, it is crucial to carefully consider the limitations and potential risks inherent in such artificial intelligence technologies.
Affiliation(s)
- Florin Eggmann
- Department of Preventive and Restorative Sciences, Penn Dental Medicine, Robert Schattner Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Periodontology, Endodontology, and Cariology, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland
- Roland Weiger
- Department of Periodontology, Endodontology, and Cariology, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland
- Nicola U Zitzmann
- Department of Reconstructive Dentistry, University Center for Dental Medicine Basel UZB, University of Basel, Basel, Switzerland
- Markus B Blatz
- Department of Preventive and Restorative Sciences, Penn Dental Medicine, Robert Schattner Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
160
Momenaei B, Wakabayashi T, Shahlaee A, Durrani AF, Pandit SA, Wang K, Mansour HA, Abishek RM, Xu D, Sridhar J, Yonekawa Y, Kuriyan AE. Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases. Ophthalmol Retina 2023; 7:862-868. [PMID: 37277096; DOI: 10.1016/j.oret.2023.05.022]
Abstract
OBJECTIVE To evaluate the appropriateness and readability of the medical knowledge provided by ChatGPT-4, an artificial intelligence-powered conversational search engine, regarding common vitreoretinal surgeries for retinal detachments (RDs), macular holes (MHs), and epiretinal membranes (ERMs). DESIGN Retrospective cross-sectional study. SUBJECTS This study did not involve any human participants. METHODS We created lists of common questions about the definition, prevalence, visual impact, diagnostic methods, surgical and nonsurgical treatment options, postoperative information, surgery-related complications, and visual prognosis of RD, MH, and ERM, and asked each question 3 times on the online ChatGPT-4 platform. The data for this cross-sectional study were recorded on April 25, 2023. Two independent retina specialists graded the appropriateness of the responses. Readability was assessed using Readable, an online readability tool. MAIN OUTCOME MEASURES The "appropriateness" and "readability" of the answers generated by the ChatGPT-4 bot. RESULTS Responses were consistently appropriate in 84.6% (33/39), 92% (23/25), and 91.7% (22/24) of the questions related to RD, MH, and ERM, respectively. Answers were inappropriate at least once in 5.1% (2/39), 8% (2/25), and 8.3% (2/24) of the respective questions. The average Flesch Kincaid Grade Level and Flesch Reading Ease Score were 14.1 ± 2.6 and 32.3 ± 10.8 for RD, 14 ± 1.3 and 34.4 ± 7.7 for MH, and 14.8 ± 1.3 and 28.1 ± 7.5 for ERM. These scores indicate that the answers are difficult or very difficult for the average layperson to read and that a college education would be required to understand the material. CONCLUSIONS Most of the answers provided by ChatGPT-4 were consistently appropriate. However, ChatGPT and other natural language models in their current form are not a source of factual information. Improving the credibility and readability of responses, especially in specialized fields such as medicine, is a critical focus of research. Patients, physicians, and laypersons should be advised of the limitations of these tools for eye- and health-related counseling. FINANCIAL DISCLOSURE(S) Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
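The Flesch Reading Ease and Flesch-Kincaid Grade Level figures cited above come from standard formulas over sentence, word, and syllable counts (the study itself used the Readable tool, not this code). A minimal stdlib-only sketch, with a naive vowel-group syllable counter that only approximates real scores:

```python
import re

def text_stats(text):
    """Count sentences, words, and (approximate) syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Naive syllable estimate: runs of vowels, minimum one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return len(sentences), len(words), syllables

def flesch_reading_ease(text):
    """Higher = easier; scores around 30 (as in the study) are 'difficult'."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text):
    """Approximate US school grade level needed to understand the text."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

A grade level near 14, as reported for the ChatGPT-4 answers, corresponds to college-level reading.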
Collapse
Affiliation(s)
- Bita Momenaei
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Taku Wakabayashi
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Abtin Shahlaee
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Asad F Durrani
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Saagar A Pandit
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Kristine Wang
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Hana A Mansour
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Robert M Abishek
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Jayanth Sridhar
- Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
| | - Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Ajay E Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania.
| |
Collapse
|
161
|
Kim JK, Chua M, Rickard M, Lorenzo A. ChatGPT and large language model (LLM) chatbots: The current state of acceptability and a proposal for guidelines on utilization in academic medicine. J Pediatr Urol 2023; 19:598-604. [PMID: 37328321 DOI: 10.1016/j.jpurol.2023.05.018] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 05/14/2023] [Accepted: 05/27/2023] [Indexed: 06/18/2023]
Abstract
INTRODUCTION There is currently no clear consensus on the standards for using large language models such as ChatGPT in academic medicine. Hence, we performed a scoping review of the available literature to understand the current state of LLM use in medicine and to provide a guideline for future utilization in academia. MATERIALS AND METHODS A scoping review of the literature was performed through a Medline search on February 16, 2023 using a combination of keywords including artificial intelligence, machine learning, natural language processing, generative pre-trained transformer, ChatGPT, and large language model. There were no restrictions on language or date of publication. Records not pertaining to LLMs were excluded. Records pertaining to LLM chatbots and ChatGPT were identified and evaluated separately. Among these, records suggesting recommendations for ChatGPT use in academia were used to create guideline statements for ChatGPT and LLM use in academic medicine. RESULTS A total of 87 records were identified. Thirty records did not pertain to large language models and were excluded. Fifty-four records underwent a full-text review, of which 33 were related to LLM chatbots or ChatGPT. DISCUSSION From assessing these texts, five guideline statements for LLM use were developed: (1) ChatGPT/LLMs cannot be cited as authors in scientific manuscripts; (2) if ChatGPT/LLM use is considered for academic work, the author(s) should have at least a basic understanding of what ChatGPT/LLMs are; (3) ChatGPT/LLMs should not be used to produce the entirety of the text in a manuscript; humans must be held accountable for their use, and content created by ChatGPT/LLMs should be meticulously verified by humans; (4) ChatGPT/LLMs may be used for editing and refining text; (5) any use of ChatGPT/LLMs should be transparent, clearly outlined in the scientific manuscript, and acknowledged. CONCLUSION Future authors should remain mindful of the potential impact their academic work may have on healthcare and continue to uphold the highest ethical standards and integrity when utilizing ChatGPT/LLMs.
Collapse
Affiliation(s)
- Jin K Kim
- Division of Urology, Department of Surgery, The Hospital for Sick Children, Toronto, Canada; Division of Urology, Department of Surgery, University of Toronto, Toronto, Canada.
| | - Michael Chua
- Division of Urology, Department of Surgery, The Hospital for Sick Children, Toronto, Canada; Division of Urology, Department of Surgery, University of Toronto, Toronto, Canada; Institute of Urology, St. Luke's Medical Center, Quezon City, Philippines
| | - Mandy Rickard
- Division of Urology, Department of Surgery, The Hospital for Sick Children, Toronto, Canada
| | - Armando Lorenzo
- Division of Urology, Department of Surgery, The Hospital for Sick Children, Toronto, Canada; Division of Urology, Department of Surgery, University of Toronto, Toronto, Canada
| |
Collapse
|
162
|
Cai W. Feasibility and Prospect of Privacy-preserving Large Language Models in Radiology. Radiology 2023; 309:e232335. [PMID: 37815443 PMCID: PMC10623203 DOI: 10.1148/radiol.232335] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 09/07/2023] [Accepted: 09/08/2023] [Indexed: 10/11/2023]
Affiliation(s)
- Wenli Cai
- From the Department of Radiology, Massachusetts General Hospital and Harvard Medical School, 399 Revolution Dr, 13W44, Somerville, MA 02145
| |
Collapse
|
163
|
Rao A, Kim J, Kamineni M, Pang M, Lie W, Dreyer KJ, Succi MD. Evaluating GPT as an Adjunct for Radiologic Decision Making: GPT-4 Versus GPT-3.5 in a Breast Imaging Pilot. J Am Coll Radiol 2023; 20:990-997. [PMID: 37356806 PMCID: PMC10733745 DOI: 10.1016/j.jacr.2023.05.003] [Citation(s) in RCA: 47] [Impact Index Per Article: 47.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 05/16/2023] [Accepted: 05/23/2023] [Indexed: 06/27/2023]
Abstract
OBJECTIVE Despite rising popularity and performance, studies evaluating the use of large language models for clinical decision support are lacking. Here, we evaluate ChatGPT (Generative Pre-trained Transformer)-3.5 and GPT-4's (OpenAI, San Francisco, California) capacity for clinical decision support in radiology via the identification of appropriate imaging services for two important clinical presentations: breast cancer screening and breast pain. METHODS We compared ChatGPT's responses to the ACR Appropriateness Criteria for breast pain and breast cancer screening. Our prompt formats included an open-ended (OE) and a select all that apply (SATA) format. Scoring criteria evaluated whether proposed imaging modalities were in accordance with ACR guidelines. Three replicate entries were conducted for each prompt, and the average of these was used to determine final scores. RESULTS Both ChatGPT-3.5 and ChatGPT-4 achieved an average OE score of 1.830 (out of 2) for breast cancer screening prompts. ChatGPT-3.5 achieved a SATA average percentage correct of 88.9%, compared with ChatGPT-4's average percentage correct of 98.4% for breast cancer screening prompts. For breast pain, ChatGPT-3.5 achieved an average OE score of 1.125 (out of 2) and a SATA average percentage correct of 58.3%, compared with ChatGPT-4's average OE score of 1.666 (out of 2) and SATA average percentage correct of 77.7%. DISCUSSION Our results demonstrate the eventual feasibility of using large language models like ChatGPT for radiologic decision making, with the potential to improve clinical workflow and responsible use of radiology services. More use cases and greater accuracy are necessary to evaluate and implement such tools.
Collapse
Affiliation(s)
- Arya Rao
- Harvard Medical School, Boston, Massachusetts; Medically Engineered Solutions in Healthcare, Innovation in Operations Research Center, Massachusetts General Hospital, Boston, Massachusetts
| | - John Kim
- Harvard Medical School, Boston, Massachusetts; Medically Engineered Solutions in Healthcare, Innovation in Operations Research Center, Massachusetts General Hospital, Boston, Massachusetts
| | - Meghana Kamineni
- Harvard Medical School, Boston, Massachusetts; Medically Engineered Solutions in Healthcare, Innovation in Operations Research Center, Massachusetts General Hospital, Boston, Massachusetts
| | - Michael Pang
- Harvard Medical School, Boston, Massachusetts; Medically Engineered Solutions in Healthcare, Innovation in Operations Research Center, Massachusetts General Hospital, Boston, Massachusetts
| | - Winston Lie
- Harvard Medical School, Boston, Massachusetts; Medically Engineered Solutions in Healthcare, Innovation in Operations Research Center, Massachusetts General Hospital, Boston, Massachusetts
| | - Keith J Dreyer
- Harvard Medical School, Boston, Massachusetts; Medically Engineered Solutions in Healthcare, Innovation in Operations Research Center, Massachusetts General Hospital, Boston, Massachusetts; Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts; and Chief Data Science Officer and Chief Imaging Information Officer for Mass General Brigham, Boston, Massachusetts
| | - Marc D Succi
- Harvard Medical School, Boston, Massachusetts; Medically Engineered Solutions in Healthcare, Innovation in Operations Research Center and Associate Chair of Innovation & Commercialization, Mass General Brigham Enterprise Radiology; Executive Director, MESH Incubator. Massachusetts General Hospital, Boston, Massachusetts; and Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts.
| |
Collapse
|
164
|
Laxar D, Eitenberger M, Maleczek M, Kaider A, Hammerle FP, Kimberger O. The influence of explainable vs non-explainable clinical decision support systems on rapid triage decisions: a mixed methods study. BMC Med 2023; 21:359. [PMID: 37726729 PMCID: PMC10510231 DOI: 10.1186/s12916-023-03068-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Accepted: 09/05/2023] [Indexed: 09/21/2023] Open
Abstract
BACKGROUND During the COVID-19 pandemic, a variety of clinical decision support systems (CDSS) were developed to aid patient triage. However, research focusing on the interaction between decision support systems and human experts is lacking. METHODS Thirty-two physicians were recruited to rate the survival probability of 59 critically ill patients by means of chart review. Subsequently, one of two artificial intelligence systems advised the physician of a computed survival probability. However, only one of these systems explained the reasons behind its decision-making. In the third step, physicians reviewed the chart once again to determine the final survival probability rating. We hypothesized that an explaining system would exhibit a higher impact on the physicians' second rating (i.e., higher weight-on-advice). RESULTS The survival probability rating given by the physician after receiving advice from the clinical decision support system was a median of 4 percentage points closer to the advice than the initial rating. Weight-on-advice was not significantly different (p = 0.115) between the two systems (with vs without explanation for its decision). Additionally, weight-on-advice showed no difference according to time of day or between board-qualified and not yet board-qualified physicians. Self-reported overall trust, assessed after the conclusion of the experiment, was a median of 5.5/10 (non-explaining median 4 (IQR 3.5-5.5), explaining median 7 (IQR 5.5-7.5), p = 0.007). CONCLUSIONS Although overall trust in the models was low, the median (IQR) weight-on-advice was high (0.33 (0.0-0.56)) and in line with published literature on expert advice. In contrast to the hypothesis, weight-on-advice was comparable between the explaining and non-explaining systems. In 30% of cases, weight-on-advice was 0, meaning the physician did not change their rating. The median of the remaining weight-on-advice values was 50%, suggesting that physicians either dismissed the recommendation or employed a "meeting halfway" approach. Newer technologies, such as clinical reasoning systems, may be able to augment the decision process rather than simply presenting unexplained bias.
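Weight-on-advice is conventionally defined in the judge-advisor literature as (final - initial) / (advice - initial); the abstract does not spell out its exact computation, so the following is a sketch under that standard definition, not the authors' verified code:

```python
def weight_on_advice(initial, advice, final):
    """WOA = (final - initial) / (advice - initial).
    0 means the advice was ignored; 1 means it was fully adopted."""
    if advice == initial:
        return None  # advice identical to own estimate: WOA is undefined
    woa = (final - initial) / (advice - initial)
    return max(0.0, min(1.0, woa))  # clip to [0, 1], as is common practice
```

For example, a physician who initially rates survival at 50%, is advised 70%, and settles on 60% has a WOA of 0.5, the "meeting halfway" pattern the authors describe.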
Collapse
Affiliation(s)
- Daniel Laxar
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Ludwig Boltzmann Gesellschaft, Vienna, Austria
| | - Magdalena Eitenberger
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Ludwig Boltzmann Gesellschaft, Vienna, Austria
| | - Mathias Maleczek
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Vienna, Austria.
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Ludwig Boltzmann Gesellschaft, Vienna, Austria.
| | - Alexandra Kaider
- Center for Medical Data Science, Medical University of Vienna, Vienna, Austria
| | - Fabian Peter Hammerle
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Vienna, Austria
| | - Oliver Kimberger
- Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Ludwig Boltzmann Gesellschaft, Vienna, Austria
| |
Collapse
|
165
|
Khlaif ZN, Mousa A, Hattab MK, Itmazi J, Hassan AA, Sanmugam M, Ayyoub A. The Potential and Concerns of Using AI in Scientific Research: ChatGPT Performance Evaluation. JMIR MEDICAL EDUCATION 2023; 9:e47049. [PMID: 37707884 PMCID: PMC10636627 DOI: 10.2196/47049] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Revised: 06/04/2023] [Accepted: 07/21/2023] [Indexed: 09/15/2023]
Abstract
BACKGROUND Artificial intelligence (AI) has many applications in daily life, including health care, education, and criminal, civil, business, and liability law. One aspect of AI that has gained significant attention is natural language processing (NLP), which refers to the ability of computers to understand and generate human language. OBJECTIVE This study aims to examine the potential for, and concerns about, using AI in scientific research. For this purpose, research articles were generated with ChatGPT, the quality of the resulting reports was analyzed, and the application's impact on the research framework, data analysis, and literature review was assessed. The study also explored concerns around ownership and the integrity of research when using AI-generated text. METHODS A total of 4 articles were generated using ChatGPT and thereafter evaluated by 23 reviewers. The researchers developed an evaluation form to assess the quality of the articles generated. Additionally, 50 abstracts were generated using ChatGPT and their quality was evaluated. The quantitative ratings were subjected to ANOVA, and thematic analysis was applied to the qualitative data provided by the reviewers. RESULTS When using detailed prompts and providing the context of the study, ChatGPT generated high-quality research that could be published in high-impact journals. However, ChatGPT had a minor impact on developing the research framework and data analysis. The primary area needing improvement was the development of the literature review. Moreover, reviewers expressed concerns around ownership and the integrity of the research when using AI-generated text. Nonetheless, ChatGPT has strong potential to increase human productivity in research and can be used in academic writing. CONCLUSIONS AI-generated text has the potential to improve the quality of high-impact research articles. The findings of this study suggest that decision makers and researchers should focus more on the methodology of the research, including research design, development of research tools, and in-depth data analysis, to draw strong theoretical and practical implications, thereby establishing a revolution in scientific research in the era of AI. The practical implications of this study can be applied in fields such as medical education to deliver materials that develop the basic competencies of both medical students and faculty members.
Collapse
Affiliation(s)
- Zuheir N Khlaif
- Faculty of Humanities and Educational Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
| | - Allam Mousa
- Artificial Intelligence and Virtual Reality Research Center, Department of Electrical and Computer Engineering, An Najah National University, Nablus, Occupied Palestinian Territory
| | - Muayad Kamal Hattab
- Faculty of Law and Political Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
| | - Jamil Itmazi
- Department of Information Technology, College of Engineering and Information Technology, Palestine Ahliya University, Bethlehem, Occupied Palestinian Territory
| | - Amjad A Hassan
- Faculty of Law and Political Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
| | - Mageswaran Sanmugam
- Centre for Instructional Technology and Multimedia, Universiti Sains Malaysia, Penang, Malaysia
| | - Abedalkarim Ayyoub
- Faculty of Humanities and Educational Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
| |
Collapse
|
166
|
Brameier DT, Alnasser AA, Carnino JM, Bhashyam AR, von Keudell AG, Weaver MJ. Artificial Intelligence in Orthopaedic Surgery: Can a Large Language Model "Write" a Believable Orthopaedic Journal Article? J Bone Joint Surg Am 2023; 105:1388-1392. [PMID: 37437021 DOI: 10.2106/jbjs.23.00473] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 07/14/2023]
Abstract
ABSTRACT
➢ Natural language processing with large language models is a subdivision of artificial intelligence (AI) that extracts meaning from text with use of linguistic rules, statistics, and machine learning to generate appropriate text responses. Its utilization in medicine and in the field of orthopaedic surgery is rapidly growing.
➢ Large language models can be utilized in generating scientific manuscript texts of a publishable quality; however, they suffer from AI hallucinations, in which untruths or half-truths are stated with misleading confidence. Their use raises considerable concerns regarding the potential for research misconduct and for hallucinations to insert misinformation into the clinical literature.
➢ Current editorial processes are insufficient for identifying the involvement of large language models in manuscripts. Academic publishing must adapt to encourage safe use of these tools by establishing clear guidelines for their use, which should be adopted across the orthopaedic literature, and by implementing additional steps in the editorial screening process to identify the use of these tools in submitted manuscripts.
Collapse
Affiliation(s)
- Devon T Brameier
- Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
| | - Ahmad A Alnasser
- Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
| | - Jonathan M Carnino
- Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts
| | - Abhiram R Bhashyam
- Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
| | - Arvind G von Keudell
- Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Bispebjerg Hospital, University of Copenhagen, Copenhagen, Denmark
| | - Michael J Weaver
- Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
| |
Collapse
|
167
|
Currie GM. Academic integrity and artificial intelligence: is ChatGPT hype, hero or heresy? Semin Nucl Med 2023; 53:719-730. [PMID: 37225599 DOI: 10.1053/j.semnuclmed.2023.04.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 04/30/2023] [Indexed: 05/26/2023]
Abstract
Academic integrity in both higher education and scientific writing has been challenged by developments in artificial intelligence. The limitations associated with algorithms have been largely overcome by the recently released ChatGPT; a chatbot powered by GPT-3.5 capable of producing accurate and human-like responses to questions in real-time. Despite the potential benefits, ChatGPT confronts significant limitations to its usefulness in nuclear medicine and radiology. Most notably, ChatGPT is prone to errors and fabrication of information which poses a risk to professionalism, ethics and integrity. These limitations simultaneously undermine the value of ChatGPT to the user by not producing outcomes at the expected standard. Nonetheless, there are a number of exciting applications of ChatGPT in nuclear medicine across education, clinical and research sectors. Assimilation of ChatGPT into practice requires redefining of norms, and re-engineering of information expectations.
Collapse
Affiliation(s)
- Geoffrey M Currie
- Charles Sturt University, Wagga Wagga, NSW, Australia; Baylor College of Medicine, Houston, TX.
| |
Collapse
|
168
|
Ravi A, Neinstein A, Murray SG. Large Language Models and Medical Education: Preparing for a Rapid Transformation in How Trainees Will Learn to Be Doctors. ATS Sch 2023; 4:282-292. [PMID: 37795112 PMCID: PMC10547030 DOI: 10.34197/ats-scholar.2023-0036ps] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2023] [Accepted: 06/01/2023] [Indexed: 10/06/2023] Open
Abstract
Artificial intelligence has the potential to revolutionize health care but has yet to be widely implemented. In part, this may be because, to date, we have focused on easily predicted rather than easily actionable problems. Large language models (LLMs) represent a paradigm shift in our approach to artificial intelligence because they are easily accessible and already being tested by frontline clinicians, who are rapidly identifying possible use cases. LLMs in health care have the potential to reduce clerical work, bridge gaps in patient education, and more. As we enter this era of healthcare delivery, LLMs will present both opportunities and challenges in medical education. Future models should be developed to support trainees to develop skills in clinical reasoning, encourage evidence-based medicine, and offer case-based training opportunities. LLMs may also change what we continue teaching trainees with regard to clinical documentation. Finally, trainees can help us train and develop the LLMs of the future as we consider the best ways to incorporate LLMs into medical education. Ready or not, LLMs will soon be integrated into various aspects of clinical practice, and we must work closely with students and educators to make sure these models are also built with trainees in mind to responsibly chaperone medical education into the next era.
Collapse
Affiliation(s)
| | - Aaron Neinstein
- Department of Medicine
- Center for Digital Health Innovation and
| | - Sara G. Murray
- Department of Medicine
- Health Informatics, University of California, San Francisco, San Francisco, California
| |
Collapse
|
169
|
Fink MA. [Large language models such as ChatGPT and GPT-4 for patient-centered care in radiology]. RADIOLOGIE (HEIDELBERG, GERMANY) 2023; 63:665-671. [PMID: 37615692 DOI: 10.1007/s00117-023-01187-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 07/14/2023] [Indexed: 08/25/2023]
Abstract
BACKGROUND With the introduction of ChatGPT in late November 2022, large language models based on artificial intelligence have gained worldwide recognition. These language models are trained on vast amounts of data, enabling them to process complex tasks in seconds and provide detailed, high-level text-based responses. OBJECTIVE To provide an overview of the most widely discussed large language models, ChatGPT and GPT‑4, with a focus on potential applications for patient-centered radiology. MATERIALS AND METHODS A PubMed search of both large language models was performed using the terms "ChatGPT" and "GPT-4", with subjective selection and completion in the form of a narrative review. RESULTS The generic nature of language models holds great promise for radiology, enabling both patients and referrers to facilitate understanding of radiological findings, overcome language barriers, and improve the quality of informed consent discussions. This could represent a significant step towards patient-centered or person-centered radiology. CONCLUSION Large language models represent a promising tool for improving the communication of findings, interdisciplinary collaboration, and workflow in radiology. However, important privacy issues and the reliable applicability of these models in medicine remain to be addressed.
Collapse
Affiliation(s)
- Matthias A Fink
- Klinik für Diagnostische und Interventionelle Radiologie, Universitätsklinikum Heidelberg, Im Neuenheimer Feld 420, 69120, Heidelberg, Deutschland.
| |
Collapse
|
170
|
Chervenak J, Lieman H, Blanco-Breindel M, Jindal S. The promise and peril of using a large language model to obtain clinical information: ChatGPT performs strongly as a fertility counseling tool with limitations. Fertil Steril 2023; 120:575-583. [PMID: 37217092 DOI: 10.1016/j.fertnstert.2023.05.151] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2023] [Revised: 05/01/2023] [Accepted: 05/12/2023] [Indexed: 05/24/2023]
Abstract
OBJECTIVE To compare the responses of the large language model-based "ChatGPT" to reputable sources when given fertility-related clinical prompts. DESIGN The "Feb 13" version of ChatGPT by OpenAI was tested against established sources relating to patient-oriented clinical information: 17 "frequently asked questions (FAQs)" about infertility on the Centers for Disease Control (CDC) Website, 2 validated fertility knowledge surveys, the Cardiff Fertility Knowledge Scale and the Fertility and Infertility Treatment Knowledge Score, as well as the American Society for Reproductive Medicine committee opinion "optimizing natural fertility." SETTING Academic medical center. PATIENT(S) Online AI chatbot. INTERVENTION(S) Frequently asked questions, survey questions, and rephrased summary statements were entered as prompts in the chatbot over a 1-week period in February 2023. MAIN OUTCOME MEASURE(S) For FAQs from the CDC: words/response, sentiment analysis polarity and objectivity, total factual statements, and rate of statements that were incorrect, referenced a source, or noted the value of consulting providers. For fertility knowledge surveys: percentile according to published population data. For the committee opinion: whether responses to conclusions rephrased as questions identified missing facts. RESULT(S) When administered the CDC's 17 infertility FAQs, ChatGPT produced responses of similar length (207.8 ChatGPT vs. 181.0 CDC words/response), factual content (8.65 factual statements/response vs. 10.41), sentiment polarity (mean 0.11 vs. 0.11 on a scale of -1 (negative) to 1 (positive)), and subjectivity (mean 0.42 vs. 0.35 on a scale of 0 (objective) to 1 (subjective)). In total, 9 (6.12%) of 147 ChatGPT factual statements were categorized as incorrect, and only 1 (0.68%) statement cited a reference. ChatGPT would have been at the 87th percentile of Bunting's 2013 international cohort for the Cardiff Fertility Knowledge Scale and at the 95th percentile on the basis of Kudesia's 2017 cohort for the Fertility and Infertility Treatment Knowledge Score. ChatGPT reproduced the missing facts for all 7 summary statements from "optimizing natural fertility." CONCLUSION(S) A February 2023 version of "ChatGPT" demonstrates the ability of generative artificial intelligence to produce relevant, meaningful responses to fertility-related clinical queries comparable to established sources. Although performance may improve with medical domain-specific training, limitations such as the inability to reliably cite sources and the unpredictable possibility of fabricated information may limit its clinical use.
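The polarity (-1 to 1) and subjectivity (0 to 1) scales reported above match common off-the-shelf sentiment analyzers; the abstract does not name the tool used. A toy, hypothetical lexicon-based illustration of how such a polarity score behaves (not the study's actual method, and far cruder than a real analyzer):

```python
# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"effective", "successful", "improve", "safe", "healthy"}
NEGATIVE = {"risk", "failure", "complication", "harm", "adverse"}

def polarity(text):
    """Toy lexicon polarity on [-1, 1]: (pos - neg) / matched sentiment words.
    Returns 0.0 (neutral) when no lexicon word is present."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

On this scale, the near-zero means (0.11 for both ChatGPT and the CDC) indicate largely neutral wording.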
Collapse
Affiliation(s)
- Joseph Chervenak
- Albert Einstein College of Medicine/Montefiore's Institute for Reproductive Medicine and Health, Hartsdale, New York.
| | - Harry Lieman
- Albert Einstein College of Medicine/Montefiore's Institute for Reproductive Medicine and Health, Hartsdale, New York
| | - Miranda Blanco-Breindel
- Albert Einstein College of Medicine/Montefiore's Institute for Reproductive Medicine and Health, Hartsdale, New York
| | - Sangita Jindal
- Albert Einstein College of Medicine/Montefiore's Institute for Reproductive Medicine and Health, Hartsdale, New York
| |
Collapse
|
171
|
Nazir A, Wang Z. A Comprehensive Survey of ChatGPT: Advancements, Applications, Prospects, and Challenges. META-RADIOLOGY 2023; 1:100022. [PMID: 37901715 PMCID: PMC10611551 DOI: 10.1016/j.metrad.2023.100022] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/31/2023]
Abstract
Large Language Models (LLMs), especially when combined with Generative Pre-trained Transformers (GPT), represent a groundbreaking advance in natural language processing. In particular, ChatGPT, a state-of-the-art conversational language model with a user-friendly interface, has garnered substantial attention owing to its remarkable capability for generating human-like responses across a variety of conversational scenarios. This survey offers an overview of ChatGPT, delving into its inception, evolution, and key technology. We summarize the fundamental principles that underpin ChatGPT, encompassing its introduction in conjunction with GPT and LLMs. We also highlight the specific characteristics of GPT models, with details of their impressive language understanding and generation capabilities. We then summarize applications of ChatGPT in a few representative domains. Alongside the many advantages that ChatGPT can provide, we discuss its limitations and challenges along with potential mitigation strategies. Despite various controversial arguments and ethical concerns, ChatGPT has drawn significant attention from industry and academia in a very short period. The survey concludes with a vision of promising avenues for future research in the field of ChatGPT. It is worth noting that understanding and addressing the challenges faced by ChatGPT will pave the way for more reliable and trustworthy conversational agents in the years to come.
Collapse
Affiliation(s)
- Anam Nazir
- Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine. W 670 Baltimore St, HSF III, Room 1173, Baltimore, MD 21201
| | - Ze Wang
- Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine. W 670 Baltimore St, HSF III, Room 1173, Baltimore, MD 21201
| |
Collapse
|
172
|
Ariyaratne S, Iyengar KP, Nischal N, Chitti Babu N, Botchu R. A comparison of ChatGPT-generated articles with human-written articles. Skeletal Radiol 2023; 52:1755-1758. [PMID: 37059827 DOI: 10.1007/s00256-023-04340-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 04/06/2023] [Accepted: 04/09/2023] [Indexed: 04/16/2023]
Abstract
OBJECTIVE ChatGPT (Generative Pre-trained Transformer) is an artificial intelligence language tool developed by OpenAI that utilises machine learning algorithms to generate text that closely mimics human language. It has recently taken the internet by storm. There have been several concerns regarding the accuracy of documents it generates. This study compares the accuracy and quality of several ChatGPT-generated academic articles with those written by human authors. MATERIAL AND METHODS We performed a study to assess the accuracy of ChatGPT-generated radiology articles by comparing them with human-written articles that were either published or under review. These were independently analysed by two fellowship-trained musculoskeletal radiologists and graded from 1 to 5 (1 being poor and inaccurate, 5 being excellent and accurate). RESULTS In total, 4 of the 5 articles written by ChatGPT were significantly inaccurate, with fictitious references. One of the papers was well written, with a good introduction and discussion; however, all references were fictitious. CONCLUSION ChatGPT is able to generate coherent research articles, which on initial review may closely resemble authentic articles published by academic researchers. However, all of the articles we assessed were factually inaccurate and had fictitious references. It is worth noting, however, that the articles generated may appear authentic to an untrained reader.
Collapse
Affiliation(s)
- Sisith Ariyaratne
- Department of Musculoskeletal Radiology, The Royal Orthopedic Hospital, Bristol Road South, Northfield, Birmingham, UK
| | | | - Neha Nischal
- Department of Radiology, Holy Family Hospital, New Delhi, India
| | - Naparla Chitti Babu
- Department of Radiology, Srinivas Institute of Medical Sciences & Research Centre, Mukka, Mangalore, India
| | - Rajesh Botchu
- Department of Musculoskeletal Radiology, The Royal Orthopedic Hospital, Bristol Road South, Northfield, Birmingham, UK.
| |
Collapse
|
173
|
Doo FX, Cook TS, Siegel EL, Joshi A, Parekh V, Elahi A, Yi PH. Exploring the Clinical Translation of Generative Models Like ChatGPT: Promise and Pitfalls in Radiology, From Patients to Population Health. J Am Coll Radiol 2023; 20:877-885. [PMID: 37467871 DOI: 10.1016/j.jacr.2023.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Revised: 06/22/2023] [Accepted: 07/05/2023] [Indexed: 07/21/2023]
Abstract
Generative artificial intelligence (AI) tools such as GPT-4, and the chatbot interface ChatGPT, show promise for a variety of applications in radiology and health care. However, like other AI tools, ChatGPT has limitations and potential pitfalls that must be considered before adopting it for teaching, clinical practice, and beyond. We summarize five major emerging use cases for ChatGPT and generative AI in radiology across levels of increasing data complexity, along with the pitfalls associated with each. As the use of AI in health care continues to grow, it is crucial for radiologists (and all physicians) to stay informed and ensure the safe translation of these new technologies.
Collapse
Affiliation(s)
- Florence X Doo
- Director of Innovation, University of Maryland Medical Intelligent Imaging Center (UM2ii), Baltimore, Maryland; Member, Committee on Economics in Academic Radiology, under the ACR Commission on Economics.
| | - Tessa S Cook
- Vice Chair for Practice Transformation, Department of Radiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania; Fellowship Director, Imaging Informatics, and Chief, 3-D and Advanced Imaging, Department of Radiology, Penn Medicine, Philadelphia, Pennsylvania; Chair, Society for Imaging Informatics in Medicine; and Vice Chair, ACR Commission on Patient- and Family-Centered Care; Chair, RAHSR Affinity Group. https://twitter.com/asset25
| | - Eliot L Siegel
- Vice Chair, Research Information Systems, University of Maryland, Baltimore, Maryland; Lead, Radiology and Nuclear Medicine Diagnostics, US Department of Veterans Affairs Veterans Integrated Services Network; Chief, Imaging, US Department of Veterans Affairs Maryland Healthcare System; Radiology AI Senior Consultant. https://twitter.com/EliotSiegel
| | - Anupam Joshi
- Oros Family Professor and Chair, Computer Science and Electrical Engineering, University of Maryland, Baltimore, Maryland; Director, University of Maryland, Baltimore County, Center for Cybersecurity; Director, CyberScholars Program; Associate Editor, IEEE Transactions on Dependable and Secure Computing
| | - Vishwa Parekh
- Technical Director, University of Maryland Medical Intelligent Imaging (UM2ii) Center, Baltimore, Maryland; Review Editor, Frontiers in Oncology. https://twitter.com/vishwa_parekh
| | - Ameena Elahi
- University of Pennsylvania, Philadelphia, Pennsylvania; Application Manager, Information Services, Penn Medicine, Philadelphia, Pennsylvania; Informatics Operations Director, RAD-AID International. https://twitter.com/AmeenaElahi
| | - Paul H Yi
- Director, University of Maryland Medical Intelligent Imaging (UM2ii) Center, Baltimore, Maryland; Vice Chair, Society of Imaging Informatics in Medicine Program Planning Committee; Associate Editor, Radiology: Artificial Intelligence. https://twitter.com/PaulYiMD
| |
Collapse
|
174
|
Goktas P, Karakaya G, Kalyoncu AF, Damadoglu E. Artificial Intelligence Chatbots in Allergy and Immunology Practice: Where Have We Been and Where Are We Going? THE JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY. IN PRACTICE 2023; 11:2697-2700. [PMID: 37301435 DOI: 10.1016/j.jaip.2023.05.042] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/30/2023] [Revised: 05/22/2023] [Accepted: 05/25/2023] [Indexed: 06/12/2023]
Abstract
Artificial intelligence (AI) is rapidly becoming a valuable tool in healthcare, providing clinicians with a new perspective for patient care, diagnosis, and treatment. This article explores the potential applications, benefits, and challenges of AI chatbots in clinical settings, with a particular emphasis on ChatGPT 4.0 (OpenAI - Chat generative pretrained transformer 4.0), especially in the field of allergy and immunology. AI chatbots have shown considerable promise in various medical domains, including radiology and dermatology, by improving patient engagement, diagnostic accuracy, and personalized treatment plans. ChatGPT 4.0, developed by OpenAI, is adept at understanding prompts and generating coherent, contextually appropriate responses. However, it is critical to address the potential biases, data privacy issues, ethical considerations, and the need for verification of AI-generated findings. When used responsibly, AI chatbots can significantly enhance clinical practice in allergy and immunology: the ChatGPT 4.0 platform has the potential to improve patient engagement and diagnostic accuracy and to support personalized treatment plans. Nevertheless, challenges in using this technology remain, requiring ongoing research and collaboration between AI developers and medical specialists, and its limitations and risks must be addressed to ensure safe and effective use in clinical practice.
Collapse
Affiliation(s)
- Polat Goktas
- UCD School of Computer Science, University College Dublin, Belfield, Dublin, Ireland; CeADAR: Ireland's Centre for Applied Artificial Intelligence, Clonskeagh, Dublin, Ireland.
| | - Gul Karakaya
- School of Medicine, Department of Chest Diseases, Division of Allergy and Clinical Immunology, Hacettepe University, Ankara, Turkey
| | - Ali Fuat Kalyoncu
- School of Medicine, Department of Chest Diseases, Division of Allergy and Clinical Immunology, Hacettepe University, Ankara, Turkey
| | - Ebru Damadoglu
- School of Medicine, Department of Chest Diseases, Division of Allergy and Clinical Immunology, Hacettepe University, Ankara, Turkey
| |
Collapse
|
175
|
Suppadungsuk S, Thongprayoon C, Krisanapan P, Tangpanithandee S, Garcia Valencia O, Miao J, Mekraksakit P, Kashani K, Cheungpasitporn W. Examining the Validity of ChatGPT in Identifying Relevant Nephrology Literature: Findings and Implications. J Clin Med 2023; 12:5550. [PMID: 37685617 PMCID: PMC10488525 DOI: 10.3390/jcm12175550] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
Literature reviews are valuable for summarizing and evaluating the available evidence in various medical fields, including nephrology. However, identifying and exploring the potential sources requires focus and time devoted to literature searching for clinicians and researchers. ChatGPT is a novel artificial intelligence (AI) large language model (LLM) renowned for its exceptional ability to generate human-like responses across various tasks. However, whether ChatGPT can effectively assist medical professionals in identifying relevant literature is unclear. Therefore, this study aimed to assess the effectiveness of ChatGPT in identifying references to literature reviews in nephrology. We keyed the prompt "Please provide the references in Vancouver style and their links in recent literature on… name of the topic" into ChatGPT-3.5 (03/23 Version). We selected all the results provided by ChatGPT and assessed them for existence, relevance, and author/link correctness. We recorded each resource's citations, authors, title, journal name, publication year, digital object identifier (DOI), and link. The relevance and correctness of each resource were verified by searching on Google Scholar. Of the total 610 references in the nephrology literature, only 378 (62%) of the references provided by ChatGPT existed, while 31% were fabricated, and 7% of citations were incomplete references. Notably, only 122 (20%) of references were authentic. Additionally, 256 (68%) of the links in the references were found to be incorrect, and the DOI was inaccurate in 206 (54%) of the references. Moreover, among those with a link provided, the link was correct in only 20% of cases, and 3% of the references were irrelevant. Notably, an analysis of specific topics in electrolyte, hemodialysis, and kidney stones found that >60% of the references were inaccurate or misleading, with less reliable authorship and links provided by ChatGPT. Based on our findings, the use of ChatGPT as a sole resource for identifying references to literature reviews in nephrology is not recommended. Future studies could explore ways to improve AI language models' performance in identifying relevant nephrology literature.
Collapse
Affiliation(s)
- Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
| | - Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
| | - Pajaree Krisanapan
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
- Division of Nephrology, Thammasat University Hospital, Pathum Thani 12120, Thailand
| | - Supawit Tangpanithandee
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
| | - Oscar Garcia Valencia
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
| | - Jing Miao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
| | - Poemlarp Mekraksakit
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
| | - Kianoush Kashani
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
| | - Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (S.S.); (C.T.); (P.K.); (S.T.); (O.G.V.); (J.M.); (P.M.); (K.K.)
| |
Collapse
|
176
|
Leung TI, Sagar A, Shroff S, Henry TL. Can AI Mitigate Bias in Writing Letters of Recommendation? JMIR MEDICAL EDUCATION 2023; 9:e51494. [PMID: 37610808 PMCID: PMC10483302 DOI: 10.2196/51494] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 08/08/2023] [Accepted: 08/08/2023] [Indexed: 08/24/2023]
Abstract
Letters of recommendation play a significant role in higher education and career progression, particularly for women and underrepresented groups in medicine and science. Already, there is evidence to suggest that written letters of recommendation contain language that expresses implicit biases, or unconscious biases, and that these biases occur for all recommenders regardless of the recommender's sex. Given that all individuals have implicit biases that may influence language use, there may be opportunities to apply contemporary technologies, such as large language models or other forms of generative artificial intelligence (AI), to augment and potentially reduce implicit biases in the written language of letters of recommendation. In this editorial, we provide a brief overview of existing literature on the manifestations of implicit bias in letters of recommendation, with a focus on academia and medical education. We then highlight potential opportunities and drawbacks of applying this emerging technology in augmenting the focused, professional task of writing letters of recommendation. We also offer best practices for integrating their use into the routine writing of letters of recommendation and conclude with our outlook for the future of generative AI applications in supporting this task.
Collapse
Affiliation(s)
- Tiffany I Leung
- Department of Internal Medicine (adjunct), Southern Illinois University School of Medicine, Toronto, ON, Canada
- JMIR Publications, Toronto, ON, Canada
| | - Ankita Sagar
- CommonSpirit Health, Chicago, IL, United States
- Creighton University School of Medicine, Omaha, NE, United States
| | - Swati Shroff
- Division of Internal Medicine, Thomas Jefferson University, Philadelphia, PA, United States
| | - Tracey L Henry
- Department of Medicine, Emory University School of Medicine, Atlanta, GA, United States
| |
Collapse
|
177
|
Hsu HY, Hsu KC, Hou SY, Wu CL, Hsieh YW, Cheng YD. Examining Real-World Medication Consultations and Drug-Herb Interactions: ChatGPT Performance Evaluation. JMIR MEDICAL EDUCATION 2023; 9:e48433. [PMID: 37561097 PMCID: PMC10477918 DOI: 10.2196/48433] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Revised: 06/23/2023] [Accepted: 07/25/2023] [Indexed: 08/11/2023]
Abstract
BACKGROUND Since OpenAI released ChatGPT, with its strong capability in handling natural language tasks and its user-friendly interface, it has garnered significant attention. OBJECTIVE A prospective analysis is required to evaluate the accuracy and appropriateness of medication consultation responses generated by ChatGPT. METHODS A prospective cross-sectional study was conducted by the pharmacy department of a medical center in Taiwan. The test data set comprised retrospective medication consultation questions collected from February 1, 2023, to February 28, 2023, along with common questions about drug-herb interactions. Two distinct sets of questions were tested: real-world medication consultation questions and common questions about interactions between traditional Chinese and Western medicines. We used the conventional double-review mechanism: the appropriateness of each response from ChatGPT was assessed by 2 experienced pharmacists, and in the event of a discrepancy between the assessments, a third pharmacist stepped in to make the final decision. RESULTS Of 293 real-world medication consultation questions, a random selection of 80 was used to evaluate ChatGPT's performance. ChatGPT exhibited a higher appropriateness rate in responding to public medication consultation questions compared to those asked by health care providers in a hospital setting (31/51, 61% vs 20/51, 39%; P=.01). CONCLUSIONS The findings from this study suggest that ChatGPT could potentially be used for answering basic medication consultation questions. Our analysis of the erroneous information allowed us to identify potential medical risks associated with certain questions; this problem deserves our close attention.
Collapse
Affiliation(s)
- Hsing-Yu Hsu
- Department of Pharmacy, China Medical University Hospital, Taichung, Taiwan
- Graduate Institute of Clinical Pharmacy, College of Medicine, National Taiwan University, Taipei, Taiwan
| | - Kai-Cheng Hsu
- Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan
- Department of Medicine, China Medical University, Taichung, Taiwan
| | - Shih-Yen Hou
- Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan
| | - Ching-Lung Wu
- School of Pharmacy, College of Pharmacy, China Medical University, Taichung, Taiwan
| | - Yow-Wen Hsieh
- Department of Pharmacy, China Medical University Hospital, Taichung, Taiwan
- School of Pharmacy, College of Pharmacy, China Medical University, Taichung, Taiwan
| | - Yih-Dih Cheng
- Department of Pharmacy, China Medical University Hospital, Taichung, Taiwan
- School of Pharmacy, College of Pharmacy, China Medical University, Taichung, Taiwan
| |
Collapse
|
178
|
Wang X, Gong Z, Wang G, Jia J, Xu Y, Zhao J, Fan Q, Wu S, Hu W, Li X. ChatGPT Performs on the Chinese National Medical Licensing Examination. J Med Syst 2023; 47:86. [PMID: 37581690 DOI: 10.1007/s10916-023-01961-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Accepted: 06/22/2023] [Indexed: 08/16/2023]
Abstract
ChatGPT, a language model developed by OpenAI, uses a 175 billion parameter Transformer architecture for natural language processing tasks. This study aimed to compare the knowledge and interpretation ability of ChatGPT with those of medical students in China by administering the Chinese National Medical Licensing Examination (NMLE) to both ChatGPT and medical students. We evaluated the performance of ChatGPT in three years' worth of the NMLE, which consists of four units. At the same time, the exam results were compared to those of medical students who had studied for five years at medical colleges. ChatGPT's performance was lower than that of the medical students, and ChatGPT's correct answer rate was related to the year in which the exam questions were released. ChatGPT's knowledge and interpretation ability for the NMLE were not yet comparable to those of medical students in China. It is probable that these abilities will improve through deep learning.
Collapse
Affiliation(s)
- Xinyi Wang
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China
| | - Zhenye Gong
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China
| | - Guoxin Wang
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China
| | - Jingdan Jia
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China
| | - Ying Xu
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China
| | - Jialu Zhao
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China
| | - Qingye Fan
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China
| | - Shaun Wu
- WORK Medical Technology Group LTD, Hangzhou, China
| | - Weiguo Hu
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China
| | - Xiaoyang Li
- Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 197 Ruijin Rd. II, Shanghai, 200025, China.
| |
Collapse
|
179
|
Patil NS, Huang RS, van der Pol CB, Larocque N. Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment. Can Assoc Radiol J 2023:8465371231193716. [PMID: 37578849 DOI: 10.1177/08465371231193716] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/16/2023] Open
Abstract
PURPOSE Bard by Google, a direct competitor to ChatGPT, was recently released. Understanding the relative performance of these different chatbots can provide important insight into their strengths and weaknesses, as well as which roles they are most suited to fill. In this project, we aimed to compare the most recent version of ChatGPT, ChatGPT-4, and Bard by Google in their ability to accurately respond to radiology board examination practice questions. METHODS Text-based questions were collected from the 2017-2021 American College of Radiology's Diagnostic Radiology In-Training (DXIT) examinations. ChatGPT-4 and Bard were queried, and their comparative accuracies, response lengths, and response times were documented. Subspecialty-specific performance was analyzed as well. RESULTS 318 questions were included in our analysis. ChatGPT answered significantly more accurately than Bard (87.11% vs 70.44%, P < .0001). ChatGPT's response length was significantly shorter than Bard's (935.28 ± 440.88 characters vs 1437.52 ± 415.91 characters, P < .0001). ChatGPT's response time was significantly longer than Bard's (26.79 ± 3.27 seconds vs 7.55 ± 1.88 seconds, P < .0001). ChatGPT performed superiorly to Bard in neuroradiology (100.00% vs 86.21%, P = .03), general & physics (85.39% vs 68.54%, P < .001), nuclear medicine (80.00% vs 56.67%, P < .01), pediatric radiology (93.75% vs 68.75%, P = .03), and ultrasound (100.00% vs 63.64%, P < .001). In the remaining subspecialties, there were no significant differences between ChatGPT's and Bard's performance. CONCLUSION ChatGPT displayed superior radiology knowledge compared to Bard. While both chatbots display reasonable radiology knowledge, they should be used with conscious knowledge of their limitations and fallibility. Both chatbots provided incorrect or illogical answer explanations and did not always address the educational content of the question.
Collapse
Affiliation(s)
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
| | - Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Christian B van der Pol
- Department of Diagnostic Imaging, Hamilton Health Sciences, Juravinski Hospital and Cancer Centre, Hamilton, ON, Canada
| | - Natasha Larocque
- Department of Radiology, McMaster University, Hamilton, ON, Canada
| |
Collapse
|
180
|
Alanzi TM. Impact of ChatGPT on Teleconsultants in Healthcare: Perceptions of Healthcare Experts in Saudi Arabia. J Multidiscip Healthc 2023; 16:2309-2321. [PMID: 37601325 PMCID: PMC10438433 DOI: 10.2147/jmdh.s419847] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 08/01/2023] [Indexed: 08/22/2023] Open
Abstract
Purpose This study aims to investigate the impact of ChatGPT on teleconsultants in managing their operations and services. Methods A qualitative approach with focus groups was adopted in this study. A total of 54 participants with varying degrees of experience using AI such as ChatGPT in healthcare, including 11 physicians, 24 nurses, eight dieticians, six pharmacists, and five physiotherapists providing teleconsultations, participated in this study. Results Twelve themes reflecting positive impact were identified from the data analysis of seven focus groups: informational support, diagnostic assistance, communication, enhancing efficiency, cost and time saving, personalizing care, multilingual support, assisting in medical research, decision-making, documentation, continuing education, and enhanced team collaboration. In addition, six themes reflecting negative impact were identified: misdiagnosis and errors, issues in personalized care, ethical and legal issues, limited medical context/knowledge, communication challenges, and increased dependency. Conclusion Although ChatGPT has several advantages for teleconsultants in the healthcare sector, it is associated with ethical issues.
Collapse
Affiliation(s)
- Turki M Alanzi
- Health Information Management and Technology Department, College of Public Health, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| |
Collapse
|
181
|
Wang C, Liu S, Yang H, Guo J, Wu Y, Liu J. Ethical Considerations of Using ChatGPT in Health Care. J Med Internet Res 2023; 25:e48009. [PMID: 37566454 PMCID: PMC10457697 DOI: 10.2196/48009] [Citation(s) in RCA: 58] [Impact Index Per Article: 58.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2023] [Revised: 07/05/2023] [Accepted: 07/25/2023] [Indexed: 08/12/2023] Open
Abstract
ChatGPT has promising applications in health care, but potential ethical issues need to be addressed proactively to prevent harm. ChatGPT presents potential ethical challenges from legal, humanistic, algorithmic, and informational perspectives. Legal ethics concerns arise from the unclear allocation of responsibility when patient harm occurs and from potential breaches of patient privacy due to data collection. Clear rules and legal boundaries are needed to properly allocate liability and protect users. Humanistic ethics concerns arise from the potential disruption of the physician-patient relationship, humanistic care, and issues of integrity. Overreliance on artificial intelligence (AI) can undermine compassion and erode trust. Transparency and disclosure of AI-generated content are critical to maintaining integrity. Algorithmic ethics raise concerns about algorithmic bias, responsibility, transparency and explainability, as well as validation and evaluation. Information ethics include data bias, validity, and effectiveness. Biased training data can lead to biased output, and overreliance on ChatGPT can reduce patient adherence and encourage self-diagnosis. Ensuring the accuracy, reliability, and validity of ChatGPT-generated content requires rigorous validation and ongoing updates based on clinical practice. To navigate the evolving ethical landscape of AI, AI in health care must adhere to the strictest ethical standards. Through comprehensive ethical guidelines, health care professionals can ensure the responsible use of ChatGPT, promote accurate and reliable information exchange, protect patient privacy, and empower patients to make informed decisions about their health care.
Collapse
Affiliation(s)
- Changyu Wang
- Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
- West China College of Stomatology, Sichuan University, Chengdu, China
| | - Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Hao Yang
- Information Center, West China Hospital, Sichuan University, Chengdu, China
| | - Jiulin Guo
- Information Center, West China Hospital, Sichuan University, Chengdu, China
| | - Yuxuan Wu
- Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
| | - Jialin Liu
- Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
| |
Collapse
|
182
|
Lin Z. Why and how to embrace AI such as ChatGPT in your academic life. ROYAL SOCIETY OPEN SCIENCE 2023; 10:230658. [PMID: 37621662 PMCID: PMC10445029 DOI: 10.1098/rsos.230658] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Accepted: 08/03/2023] [Indexed: 08/26/2023]
Abstract
Generative artificial intelligence (AI), including large language models (LLMs), is poised to transform scientific research, enabling researchers to elevate their research productivity. This article presents a how-to guide for employing LLMs in academic settings, focusing on their unique strengths, constraints and implications through the lens of philosophy of science and epistemology. Using ChatGPT as a case study, I identify and elaborate on three attributes contributing to its effectiveness-intelligence, versatility and collaboration-accompanied by tips on crafting effective prompts, practical use cases and a living resource online (https://osf.io/8vpwu/). Next, I evaluate the limitations of generative AI and its implications for ethical use, equality and education. Regarding ethical and responsible use, I argue from technical and epistemic standpoints that there is no need to restrict the scope or nature of AI assistance, provided that its use is transparently disclosed. A pressing challenge, however, lies in detecting fake research, which can be mitigated by embracing open science practices, such as transparent peer review and sharing data, code and materials. Addressing equality, I contend that while generative AI may promote equality for some, it may simultaneously exacerbate disparities for others-an issue with potentially significant yet unclear ramifications as it unfolds. Lastly, I consider the implications for education, advocating for active engagement with LLMs and cultivating students' critical thinking and analytical skills. The how-to guide seeks to empower researchers with the knowledge and resources necessary to effectively harness generative AI while navigating the complex ethical dilemmas intrinsic to its application.
Collapse
Affiliation(s)
- Zhicheng Lin
- Programme of Applied Psychology, School of Humanities and Social Science, The Chinese University of Hong Kong, Shenzhen, Guangdong 518172, People's Republic of China
| |
Collapse
|
183
|
Şendur HN, Şendur AB, Cerit MN. ChatGPT from radiologists' perspective. Br J Radiol 2023; 96:20230203. [PMID: 37183840 PMCID: PMC10392643 DOI: 10.1259/bjr.20230203] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2023] [Revised: 04/12/2023] [Accepted: 04/23/2023] [Indexed: 05/16/2023] Open
Abstract
ChatGPT is a newly developed technology created by the OpenAI company. It is an artificial-intelligence-based large language model (LLM) that is able to generate human-like text. The potential roles of ChatGPT in clinical decision support and academic writing have led to intense criticism of this technology in the scientific community. Radiologists therefore also need to be familiar with LLMs such as ChatGPT.
Collapse
Affiliation(s)
- Halit Nahit Şendur
- Department of Radiology, Gazi University, Faculty of Medicine, Mevlana Bulvarı, Yenimahalle, Ankara, Turkey
| | - Aylin Billur Şendur
- Private Radiology Clinic, Kızılırmak Mah. 1443. Cad. No:25 1071 Plaza, Çankaya, Ankara, Turkey
| | - Mahi Nur Cerit
- Department of Radiology, Gazi University, Faculty of Medicine, Mevlana Bulvarı, Yenimahalle, Ankara, Turkey
| |
Collapse
|
184
|
Yang R, Tan TF, Lu W, Thirunavukarasu AJ, Ting DSW, Liu N. Large language models in health care: Development, applications, and challenges. Health Care Science 2023; 2:255-263. [PMID: 38939520 PMCID: PMC11080827 DOI: 10.1002/hcs2.61] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/10/2023] [Revised: 06/10/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2024]
Abstract
Recently, the emergence of ChatGPT, an artificial intelligence chatbot developed by OpenAI, has attracted significant attention due to its exceptional language comprehension and content generation capabilities, highlighting the immense potential of large language models (LLMs). LLMs have become a burgeoning hotspot across many fields, including health care. Within health care, LLMs may be classified into LLMs for the biomedical domain and LLMs for the clinical domain based on the corpora used for pre-training. In the last 3 years, these domain-specific LLMs have demonstrated exceptional performance on multiple natural language processing tasks, surpassing that of general LLMs. This not only emphasizes the significance of developing dedicated LLMs for specific domains, but also raises expectations for their applications in health care. We believe that LLMs may be used widely in preconsultation, diagnosis, and management, with appropriate development and supervision. Additionally, LLMs hold tremendous promise in assisting with medical education, medical writing and other related applications. Likewise, health care systems must recognize and address the challenges posed by LLMs.
Collapse
Affiliation(s)
- Rui Yang
- Department of Biomedical Informatics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Ting Fang Tan
- Singapore National Eye Center, Singapore Eye Research Institute, Singapore Health Service, Singapore
| | - Wei Lu
- StatNLP Research Group, Singapore University of Technology and Design, Singapore
| | | | - Daniel Shu Wei Ting
- Singapore National Eye Center, Singapore Eye Research Institute, Singapore Health Service, Singapore
- Duke‐NUS Medical School, Centre for Quantitative Medicine, Singapore
| | - Nan Liu
- Duke‐NUS Medical School, Centre for Quantitative Medicine, Singapore
- Duke‐NUS Medical School, Programme in Health Services and Systems Research, Singapore
| |
Collapse
|
185
|
Ariyaratne S, Botchu R, Iyengar KP. ChatGPT in academic publishing: An ally or an adversary? Scott Med J 2023; 68:129-130. [PMID: 37151080 DOI: 10.1177/00369330231174231] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]
Affiliation(s)
- Sisith Ariyaratne
- Department of Musculoskeletal Radiology, Royal Orthopaedic Hospital, Birmingham, UK
| | - Rajesh Botchu
- Department of Musculoskeletal Radiology, Royal Orthopaedic Hospital, Birmingham, UK
| | | |
Collapse
|
186
|
Wornow M, Xu Y, Thapa R, Patel B, Steinberg E, Fleming S, Pfeffer MA, Fries J, Shah NH. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digit Med 2023; 6:135. [PMID: 37516790 PMCID: PMC10387101 DOI: 10.1038/s41746-023-00879-8] [Citation(s) in RCA: 43] [Impact Index Per Article: 43.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Accepted: 07/13/2023] [Indexed: 07/31/2023] Open
Abstract
The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. In this narrative review, we examine 84 foundation models trained on non-imaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. Considering these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded in metrics that matter in healthcare.
Collapse
Affiliation(s)
- Michael Wornow
- Department of Computer Science, Stanford University, Stanford, CA, USA.
| | - Yizhe Xu
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
| | - Rahul Thapa
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
| | - Birju Patel
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
| | - Ethan Steinberg
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Scott Fleming
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
| | - Michael A Pfeffer
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Technology and Digital Services, Stanford Health Care, Palo Alto, CA, USA
| | - Jason Fries
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
| | - Nigam H Shah
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Technology and Digital Services, Stanford Health Care, Palo Alto, CA, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, CA, USA
| |
Collapse
|
187
|
Kung JE, Marshall C, Gauthier C, Gonzalez TA, Jackson JB. Evaluating ChatGPT Performance on the Orthopaedic In-Training Examination. JB JS Open Access 2023; 8:e23.00056. [PMID: 37693092 PMCID: PMC10484364 DOI: 10.2106/jbjs.oa.23.00056] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 09/12/2023] Open
Abstract
Background Artificial intelligence (AI) holds potential in improving medical education and healthcare delivery. ChatGPT is a state-of-the-art natural language processing AI model which has shown impressive capabilities, scoring in the top percentiles on numerous standardized examinations, including the Uniform Bar Exam and Scholastic Aptitude Test. The goal of this study was to evaluate ChatGPT performance on the Orthopaedic In-Training Examination (OITE), an assessment of medical knowledge for orthopedic residents. Methods OITE 2020, 2021, and 2022 questions without images were inputted into ChatGPT version 3.5 and version 4 (GPT-4) with zero prompting. The performance of ChatGPT was evaluated as a percentage of correct responses and compared with the national average of orthopedic surgery residents at each postgraduate year (PGY) level. ChatGPT was asked to provide a source for its answer, which was categorized as being a journal article, book, or website, and if the source could be verified. Impact factor for the journal cited was also recorded. Results ChatGPT answered 196 of 360 questions correctly (54.3%), corresponding to a PGY-1 level. ChatGPT cited a verifiable source in 47.2% of questions, with an average median journal impact factor of 5.4. GPT-4 answered 265 of 360 questions correctly (73.6%), corresponding to the average performance of a PGY-5 and exceeding the corresponding passing score for the American Board of Orthopaedic Surgery Part I Examination of 67%. GPT-4 cited a verifiable source in 87.9% of questions, with an average median journal impact factor of 5.2. Conclusions ChatGPT performed above the average PGY-1 level and GPT-4 performed better than the average PGY-5 level, showing major improvement. Further investigation is needed to determine how successive versions of ChatGPT would perform and how to optimize this technology to improve medical education. Clinical Relevance AI has the potential to aid in medical education and healthcare delivery.
Collapse
Affiliation(s)
- Justin E. Kung
- Department of Orthopedic Surgery, Prisma Health-Midlands University of South Carolina, Columbia, South Carolina
| | | | - Chase Gauthier
- Department of Orthopedic Surgery, Prisma Health-Midlands University of South Carolina, Columbia, South Carolina
| | - Tyler A. Gonzalez
- Department of Orthopedic Surgery, Prisma Health-Midlands University of South Carolina, Columbia, South Carolina
| | - J. Benjamin Jackson
- Department of Orthopedic Surgery, Prisma Health-Midlands University of South Carolina, Columbia, South Carolina
| |
Collapse
|
188
|
Mago J, Sharma M. The Potential Usefulness of ChatGPT in Oral and Maxillofacial Radiology. Cureus 2023; 15:e42133. [PMID: 37476297 PMCID: PMC10355343 DOI: 10.7759/cureus.42133] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/19/2023] [Indexed: 07/22/2023] Open
Abstract
Aim This study aimed to evaluate the potential usefulness of Chat Generative Pre-Trained Transformer-3 (ChatGPT-3) in oral and maxillofacial radiology for report writing by identifying radiographic anatomical landmarks and learning about oral and maxillofacial pathologies and their radiographic features. The study also aimed to evaluate the performance of ChatGPT-3 and its usage in oral and maxillofacial radiology training. Materials and methods A questionnaire consisting of 80 questions was queried on the OpenAI app ChatGPT-3. The questions were stratified based on three categories. The categorization was based on random anatomical landmarks, oral and maxillofacial pathologies, and the radiographic features of some of these pathologies. One oral and maxillofacial radiologist evaluated queries that were answered by the ChatGPT-3 model and rated them on a 4-point, modified Likert scale. The post-survey analysis for the performance of ChatGPT-3 was based on the Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis, its application in oral and maxillofacial radiology training, and its recommended use. Results In terms of efficiency, ChatGPT-3 gave 100% accuracy in describing radiographic landmarks. However, the content of the oral and maxillofacial pathologies was limited to major or characteristic radiographic features. The mean scores for the queries related to the anatomic landmarks, oral and maxillofacial pathologies, and radiographic features of the oral and maxillofacial pathologies were 3.94, 3.85, and 3.96, respectively. However, the median and mode scores were 4 and were similar across all categories. When the questions were not specific, the data for the oral and maxillofacial pathologies were presented in the format of an introduction of the pathology, causes, symptoms, and treatment. Out of two abbreviations, one was not answered correctly.
Conclusion The study showed that ChatGPT-3 is efficient in describing pathologies, their characteristic radiographic features, and anatomical landmarks. ChatGPT-3 can be used as an adjunct when an oral radiologist needs additional information on any pathology; however, it cannot be the mainstay for reference. ChatGPT-3 is less detail-oriented, and its output carries a risk of infodemics and the possibility of medical errors. However, ChatGPT-3 can be an excellent tool for helping the community increase knowledge and awareness of various pathologies and for decreasing patients' anxiety while dental healthcare professionals formulate an appropriate treatment plan.
Collapse
Affiliation(s)
- Jyoti Mago
- Oral and Maxillofacial Radiology, University of Nevada, Las Vegas (UNLV), Las Vegas, USA
| | - Manoj Sharma
- Public Health, University of Nevada, Las Vegas (UNLV), Las Vegas, USA
| |
Collapse
|
189
|
Grech V, Cuschieri S, Eldawlatly AA. Artificial intelligence in medicine and research - the good, the bad, and the ugly. Saudi J Anaesth 2023; 17:401-406. [PMID: 37601525 PMCID: PMC10435812 DOI: 10.4103/sja.sja_344_23] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 08/22/2023] Open
Abstract
Artificial intelligence (AI) broadly refers to machines that simulate intelligent human behavior, and research into this field is exponential and worldwide, with global players such as Microsoft battling with Google for supremacy and market share. This paper reviews the "good" aspects of AI in medicine, ranging from support for individuals who embrace the 4P model of medicine (Predictive, Preventive, Personalized, and Participatory) to AI assistants in diagnostics, surgery, and research. The "bad" aspects relate to the potential for errors, culpability, ethics, data loss and data breaches, and so on. The "ugly" aspects are deliberate personal malfeasances and outright scientific misconduct, including the ease of plagiarism and fabrication, with particular reference to the novel ChatGPT as well as AI software that can also fabricate graphs and images. The issues pertaining to the potential dangers of creating rogue, super-intelligent AI systems that lead to a technological singularity, and the existential threat to mankind perceived by leading AI researchers, are also briefly discussed.
Collapse
|
190
|
Liu J, Wang C, Liu S. Utility of ChatGPT in Clinical Practice. J Med Internet Res 2023; 25:e48568. [PMID: 37379067 PMCID: PMC10365580 DOI: 10.2196/48568] [Citation(s) in RCA: 79] [Impact Index Per Article: 79.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2023] [Revised: 05/29/2023] [Accepted: 06/15/2023] [Indexed: 06/29/2023] Open
Abstract
ChatGPT is receiving increasing attention and has a variety of application scenarios in clinical practice. In clinical decision support, ChatGPT has been used to generate accurate differential diagnosis lists, support clinical decision-making, optimize clinical decision support, and provide insights for cancer screening decisions. In addition, ChatGPT has been used for intelligent question-answering to provide reliable information about diseases and medical queries. In terms of medical documentation, ChatGPT has proven effective in generating patient clinical letters, radiology reports, medical notes, and discharge summaries, improving efficiency and accuracy for health care providers. Future research directions include real-time monitoring and predictive analytics, precision medicine and personalized treatment, the role of ChatGPT in telemedicine and remote health care, and integration with existing health care systems. Overall, ChatGPT is a valuable tool that complements the expertise of health care providers and improves clinical decision-making and patient care. However, ChatGPT is a double-edged sword. We need to carefully consider and study the benefits and potential dangers of ChatGPT. In this viewpoint, we discuss recent advances in ChatGPT research in clinical practice and suggest possible risks and challenges of using ChatGPT in clinical practice. This discussion will help guide and support future research on artificial intelligence tools similar to ChatGPT in health care.
Collapse
Affiliation(s)
- Jialin Liu
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Medical Informatics, West China Medical School, Chengdu, China
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
| | - Changyu Wang
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- West China College of Stomatology, Sichuan University, Chengdu, China
| | - Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| |
Collapse
|
191
|
Kusunose K, Kashima S, Sata M. Evaluation of the Accuracy of ChatGPT in Answering Clinical Questions on the Japanese Society of Hypertension Guidelines. Circ J 2023; 87:1030-1033. [PMID: 37286486 DOI: 10.1253/circj.cj-23-0308] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
BACKGROUND Clinical questions (CQs) are often, but not always, included in guidelines to assist healthcare providers in interpreting them, and their absence can make interpretation difficult for non-expert clinicians. We evaluated the ability of ChatGPT to accurately answer CQs on the Japanese Society of Hypertension Guidelines for the Management of Hypertension (JSH 2019). METHODS AND RESULTS We conducted an observational study using data from JSH 2019. The accuracy rates for CQs and for limited evidence-based questions of the guidelines (Qs) were evaluated. ChatGPT demonstrated a higher accuracy rate for CQs than for Qs (80% vs. 36%, P value: 0.005). CONCLUSIONS ChatGPT has the potential to be a valuable tool for clinicians in the management of hypertension.
Collapse
Affiliation(s)
- Kenya Kusunose
- Department of Cardiovascular Medicine, Tokushima University Hospital
- Department of Cardiovascular Medicine, Nephrology, and Neurology, Graduate School of Medicine, University of the Ryukyus
| | - Shuichiro Kashima
- Department of Cardiovascular Medicine, Tokushima University Hospital
| | - Masataka Sata
- Department of Cardiovascular Medicine, Tokushima University Hospital
| |
Collapse
|
192
|
Taylor CR, Monga N, Johnson C, Hawley JR, Patel M. Artificial Intelligence Applications in Breast Imaging: Current Status and Future Directions. Diagnostics (Basel) 2023; 13:2041. [PMID: 37370936 DOI: 10.3390/diagnostics13122041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 05/20/2023] [Accepted: 05/29/2023] [Indexed: 06/29/2023] Open
Abstract
Attempts to use computers to aid in the detection of breast malignancies date back more than 20 years. Despite significant interest and investment, this has historically led to minimal or no significant improvement in performance and outcomes with traditional computer-aided detection. However, recent advances in artificial intelligence and machine learning are now starting to deliver on the promise of improved performance. There are at present more than 20 FDA-approved AI applications for breast imaging, but adoption and utilization are widely variable and low overall. Breast imaging is unique and has aspects that create both opportunities and challenges for AI development and implementation. Breast cancer screening programs worldwide rely on screening mammography to reduce the morbidity and mortality of breast cancer, and many of the most exciting research projects and available AI applications focus on cancer detection for mammography. There are, however, multiple additional potential applications for AI in breast imaging, including decision support, risk assessment, breast density quantitation, workflow and triage, quality evaluation, assessment of response to neoadjuvant chemotherapy, and image enhancement. In this review, the current status, availability, and future directions of these applications are discussed, as well as the opportunities and barriers to more widespread utilization.
Collapse
Affiliation(s)
- Clayton R Taylor
- Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
| | - Natasha Monga
- Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
| | - Candise Johnson
- Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
| | - Jeffrey R Hawley
- Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
| | - Mitva Patel
- Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
| |
Collapse
|
193
|
Darzidehkalani E. ChatGPT in Medical Publications. Radiology 2023; 307:e231188. [PMID: 37278630 DOI: 10.1148/radiol.231188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Affiliation(s)
- Erfan Darzidehkalani
- CSAIL, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139
| |
Collapse
|
194
|
Yu H. Reflection on whether Chat GPT should be banned by academia from the perspective of education and teaching. Front Psychol 2023; 14:1181712. [PMID: 37325766 PMCID: PMC10267436 DOI: 10.3389/fpsyg.2023.1181712] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 05/16/2023] [Indexed: 06/17/2023] Open
|
195
|
Choi EPH, Lee JJ, Ho MH, Kwok JYY, Lok KYW. Chatting or cheating? The impacts of ChatGPT and other artificial intelligence language models on nurse education. Nurse Education Today 2023; 125:105796. [PMID: 36934624 DOI: 10.1016/j.nedt.2023.105796] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/20/2023] [Revised: 03/02/2023] [Accepted: 03/09/2023] [Indexed: 06/18/2023]
Affiliation(s)
- Edmond Pui Hang Choi
- School of Nursing, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong.
| | - Jung Jae Lee
- School of Nursing, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong
| | - Mu-Hsing Ho
- School of Nursing, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong
| | - Jojo Yan Yan Kwok
- School of Nursing, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong
| | - Kris Yuet Wan Lok
- School of Nursing, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong
| |
Collapse
|
196
|
Lourenco AP, Slanetz PJ, Baird GL. Rise of ChatGPT: It May Be Time to Reassess How We Teach and Test Radiology Residents. Radiology 2023:231053. [PMID: 37191490 DOI: 10.1148/radiol.231053] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]
Affiliation(s)
- Ana P Lourenco
- From the Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University and Rhode Island Hospital, 593 Eddy St, 3rd Floor, Providence, RI 02903 (A.P.L., G.L.B.); and Department of Radiology, Boston University Medical Center, Boston, Mass (P.J.S.)
| | - Priscilla J Slanetz
- From the Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University and Rhode Island Hospital, 593 Eddy St, 3rd Floor, Providence, RI 02903 (A.P.L., G.L.B.); and Department of Radiology, Boston University Medical Center, Boston, Mass (P.J.S.)
| | - Grayson L Baird
- From the Department of Diagnostic Imaging, Warren Alpert Medical School of Brown University and Rhode Island Hospital, 593 Eddy St, 3rd Floor, Providence, RI 02903 (A.P.L., G.L.B.); and Department of Radiology, Boston University Medical Center, Boston, Mass (P.J.S.)
| |
Collapse
|
197
|
Bhayana R, Krishna S, Bleakney RR. Performance of ChatGPT on a Radiology Board-style Examination: Insights into Current Strengths and Limitations. Radiology 2023:230582. [PMID: 37191485 DOI: 10.1148/radiol.230582] [Citation(s) in RCA: 127] [Impact Index Per Article: 127.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]
Abstract
Background ChatGPT is a powerful artificial intelligence large language model with great potential as a tool in medical practice and education, but its performance in radiology remains unclear. Purpose To assess the performance of ChatGPT on radiology board-style examination questions without images and to explore its strengths and limitations. Materials and Methods In this exploratory prospective study performed from February 25 to March 3, 2023, 150 multiple-choice questions designed to match the style, content, and difficulty of the Canadian Royal College and American Board of Radiology examinations were grouped by question type (lower-order [recall, understanding] and higher-order [apply, analyze, synthesize] thinking) and topic (physics, clinical). The higher-order thinking questions were further subclassified by type (description of imaging findings, clinical management, application of concepts, calculation and classification, disease associations). ChatGPT performance was evaluated overall, by question type, and by topic. Confidence of language in responses was assessed. Univariable analysis was performed. Results ChatGPT answered 69% of questions correctly (104 of 150). The model performed better on questions requiring lower-order thinking (84%, 51 of 61) than on those requiring higher-order thinking (60%, 53 of 89) (P = .002). When compared with lower-order questions, the model performed worse on questions involving description of imaging findings (61%, 28 of 46; P = .04), calculation and classification (25%, two of eight; P = .01), and application of concepts (30%, three of 10; P = .01). ChatGPT performed as well on higher-order clinical management questions (89%, 16 of 18) as on lower-order questions (P = .88). It performed worse on physics questions (40%, six of 15) than on clinical questions (73%, 98 of 135) (P = .02). ChatGPT used confident language consistently, even when incorrect (100%, 46 of 46). 
Conclusion Despite no radiology-specific pretraining, ChatGPT nearly passed a radiology board-style examination without images; it performed well on lower-order thinking questions and clinical management questions but struggled with higher-order thinking questions involving description of imaging findings, calculation and classification, and application of concepts. © RSNA, 2023 See also the editorial by Lourenco et al in this issue.
Collapse
Affiliation(s)
- Rajesh Bhayana
- From the University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
| | - Satheesh Krishna
- From the University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
| | - Robert R Bleakney
- From the University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
| |
Collapse
|
198
|
Nune A, Iyengar KP, Manzo C, Barman B, Botchu R. Chat generative pre-trained transformer (ChatGPT): potential implications for rheumatology practice. Rheumatol Int 2023; 43:1379-1380. [PMID: 37145135 DOI: 10.1007/s00296-023-05340-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 04/29/2023] [Indexed: 05/06/2023]
Affiliation(s)
- Arvind Nune
- Department of Rheumatology and General Medicine, Southport and Ormskirk NHS Trust, Southport, PR8 6PN, UK.
| | - Karthikeyan P Iyengar
- Department of Trauma and Orthopaedics, Southport and Ormskirk NHS Trust, Southport, PR8 6PN, UK
| | - Ciro Manzo
- Rheumatology Outpatient Clinic, Azienda Sanitaria Locale Napoli 3 Sud, Mariano Lauro Hospital, Sant'Agnello, Naples, Italy
| | - Bhupen Barman
- Department of General Medicine, All India Institute of Medical Sciences, Guwahati, Assam, India
| | - Rajesh Botchu
- Department of Musculoskeletal Radiology, Royal Orthopaedic Hospital, Birmingham, B31 2AP, UK
| |
Collapse
|
199
|
Singh S, Djalilian A, Ali MJ. ChatGPT and Ophthalmology: Exploring Its Potential with Discharge Summaries and Operative Notes. Semin Ophthalmol 2023:1-5. [PMID: 37133418 DOI: 10.1080/08820538.2023.2209166] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
PURPOSE This study aimed to report the abilities of the large language model ChatGPT (OpenAI, San Francisco, USA) in constructing ophthalmic discharge summaries and operative notes. METHODS A set of prompts was constructed through statements incorporating common ophthalmic surgeries across the subspecialties of cornea, retina, glaucoma, paediatric ophthalmology, neuro-ophthalmology, and ophthalmic plastic surgery. Three surgeons carefully assessed the responses of ChatGPT and analyzed them for evidence-based content, specificity of the response, presence of generic text, disclaimers, factual inaccuracies, and the model's ability to admit mistakes and challenge incorrect premises. RESULTS A total of 24 prompts were presented to ChatGPT. Twelve prompts assessed its ability to construct discharge summaries, and an equal number explored the potential for preparing operative notes. The responses were tailored to the quality of the inputs given and were provided in a matter of seconds. The ophthalmic discharge summaries were valid but contained significant generic text. ChatGPT could incorporate specific medications, follow-up instructions, consultation time, and location within the discharge summaries when prompted appropriately. While the operative notes were detailed, they required significant tuning. ChatGPT routinely admits its mistakes and corrects itself immediately when confronted with factual inaccuracies. The mistakes are avoided in subsequent responses to similar prompts. CONCLUSION The performance of ChatGPT in the context of ophthalmic discharge summaries and operative notes was encouraging; these were constructed rapidly, in a matter of seconds. Focused training of ChatGPT on these issues, with the inclusion of a human verification step, has enormous potential to impact healthcare positively.
Collapse
Affiliation(s)
- Swati Singh
- Ophthalmic Plastic Surgery Service, L.V. Prasad Eye Institute, Hyderabad, India
| | - Ali Djalilian
- Department of Ophthalmology, University of Illinois, Chicago, Illinois, USA
| | - Mohammad Javed Ali
- Govindram Seksaria Institute of Dacryology, L.V. Prasad Eye Institute, Hyderabad, India
| |
Collapse
|
200
|
Ufuk F. The Role and Limitations of Large Language Models Such as ChatGPT in Clinical Settings and Medical Journalism. Radiology 2023; 307:e230276. [PMID: 36880943 DOI: 10.1148/radiol.230276] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/08/2023]
Affiliation(s)
- Furkan Ufuk
- Department of Radiology, School of Medicine, University of Pamukkale, Denizli, Turkey
| |
Collapse
|