1. Leng L. Challenge, integration, and change: ChatGPT and future anatomical education. Med Educ Online 2024;29:2304973. [PMID: 38217884; PMCID: PMC10791098; DOI: 10.1080/10872981.2024.2304973]
Abstract
With the rapid development of ChatGPT and its application in education, a new era of human-artificial intelligence collaboration in teaching and learning has arrived. Integrating artificial intelligence (AI) into medical education has the potential to revolutionize it. Large language models such as ChatGPT can be used as virtual teaching aids to provide students with individualized, immediate medical knowledge and to support interactive simulated learning and assessment. In this paper, we discuss the application of ChatGPT in anatomy teaching and its various levels of use based on our own teaching experience, and we weigh its advantages and disadvantages in anatomy teaching. ChatGPT increases student engagement and strengthens students' ability to learn independently. At the same time, ChatGPT faces many challenges and limitations in medical education. Medical educators must keep pace with rapid technological change, taking into account ChatGPT's impact on curriculum design, assessment strategies, and teaching methods. Discussing the application of ChatGPT in medical education, and in anatomy teaching in particular, supports the effective integration and application of artificial intelligence tools in medical education.
Affiliation(s)
- Lige Leng: Fujian Provincial Key Laboratory of Neurodegenerative Disease and Aging Research, Institute of Neuroscience, School of Medicine, Xiamen University, Xiamen, Fujian, P.R. China
2. Dergaa I, Ben Saad H, Glenn JM, Ben Aissa M, Taheri M, Swed S, Guelmami N, Chamari K. A thorough examination of ChatGPT-3.5 potential applications in medical writing: A preliminary study. Medicine (Baltimore) 2024;103:e39757. [PMID: 39465713; PMCID: PMC11460921; DOI: 10.1097/md.0000000000039757]
Abstract
Effective communication of scientific knowledge plays a crucial role in the advancement of medical research and health care. Technological advancements have introduced large language models such as Chat Generative Pre-Trained Transformer (ChatGPT), powered by artificial intelligence (AI), which has already shown promise in revolutionizing medical writing. This study aimed to conduct a detailed evaluation of ChatGPT-3.5's role in enhancing various aspects of medical writing. From May 10 to 12, 2023, the authors engaged in a series of interactions with ChatGPT-3.5 to evaluate its effectiveness in various tasks, particularly its application to medical writing, including vocabulary enhancement, text rewriting for plagiarism prevention, hypothesis generation, keyword generation, title generation, article summarization, simplification of medical jargon, transformation of informal text into scientific prose, and data interpretation. The exploration of ChatGPT's functionalities in medical writing revealed its potential to enhance various aspects of the writing process, demonstrating its efficiency in improving vocabulary usage, suggesting alternative phrasing, and providing grammar enhancements. While the results indicate the effectiveness of ChatGPT (version 3.5), certain imperfections highlight the current indispensability of human intervention to refine and validate outputs, ensuring accuracy and relevance in medical settings. The integration of AI into medical writing shows significant potential for improving clarity, efficiency, and reliability. This evaluation highlights both the benefits and limitations of using ChatGPT-3.5, emphasizing its ability to enhance vocabulary, prevent plagiarism, generate hypotheses, suggest keywords, summarize articles, simplify medical jargon, and transform informal text into an academic format. However, AI tools should not replace human expertise: medical professionals who use AI as a supplementary resource in medical writing must ensure thorough human review and validation to maintain the accuracy and relevance of the content. Embracing this symbiotic partnership holds the promise of improving medical research and patient outcomes, and it sets the stage for combining AI and human knowledge into a novel approach to medical assessment. Thus, while AI can streamline certain tasks, experienced medical writers and researchers must perform final reviews to uphold high standards in medical communications.
Affiliation(s)
- Ismail Dergaa: Department of Preventative Health, Primary Health Care Corporation (PHCC), Doha, Qatar
- Helmi Ben Saad: Farhat HACHED Hospital, Service of Physiology and Functional Explorations, University of Sousse, Sousse, Tunisia; Heart Failure (LR12SP09) Research Laboratory, Farhat HACHED Hospital, University of Sousse, Sousse, Tunisia; Faculty of Medicine of Sousse, Laboratory of Physiology, University of Sousse, Sousse, Tunisia
- Jordan M. Glenn: Department of Health, Exercise Science Research Center Human Performance and Recreation, University of Arkansas, Fayetteville, AR
- Mohamed Ben Aissa: Department of Human and Social Sciences, Higher Institute of Sport and Physical Education of Kef, University of Jendouba, Jendouba, Tunisia
- Morteza Taheri: Institute of Future Studies, Imam Khomeini International University, Qazvin, Iran
- Sarya Swed: Faculty of Medicine, Aleppo University, Aleppo, Syria
- Noomen Guelmami: Department of Health Sciences (DISSAL), Postgraduate School of Public Health, University of Genoa, Genoa, Italy
- Karim Chamari: Naufar, Wellness and Recovery Center, Doha, Qatar; High Institute of Sport and Physical Education, University of Manouba, Tunis, Tunisia
3. Kim J, Wang K, Weng C, Liu C. Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease. Am J Hum Genet 2024;111:2190-2202. [PMID: 39255797; PMCID: PMC11480789; DOI: 10.1016/j.ajhg.2024.08.010]
Abstract
Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, including two GPT-series and three Llama2-series models, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with model size. Results were consistent over time, as shown with a dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease the output-structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, were more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.
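The top-k evaluation this abstract describes can be illustrated with a short, hedged sketch: feed a patient's phenotype terms to a language model, parse the ranked gene list it returns, and check whether the diagnosed gene appears among the top 50. The `query_llm` helper and the prompt wording below are assumptions for illustration, not the authors' protocol.

```python
# Hedged sketch of phenotype-driven gene prioritization scoring (not the paper's code).
def query_llm(prompt: str) -> list[str]:
    """Return the model's ranked gene symbols for a prompt (hypothetical helper)."""
    raise NotImplementedError("connect this to an LLM client of your choice")

def top_k_accuracy(cases, k: int = 50) -> float:
    """cases: iterable of (phenotype_terms, diagnosed_gene) pairs."""
    hits = 0
    for phenotypes, gene in cases:
        prompt = (
            "A patient presents with the following phenotypes: "
            + "; ".join(phenotypes)
            + f". List the {k} most likely causal genes, one gene symbol per line."
        )
        ranked = [g.strip().upper() for g in query_llm(prompt)][:k]
        hits += gene.upper() in ranked
    return hits / len(cases)

# Toy usage (illustrative data only, not from the study):
# print(top_k_accuracy([(["Seizure", "Macrocephaly", "Hypotonia"], "PTEN")], k=50))
```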
Affiliation(s)
- Junyoung Kim: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Kai Wang: Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Cong Liu: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
4. Martinson AK, Chin AT, Butte MJ, Rider NL. Artificial Intelligence and Machine Learning for Inborn Errors of Immunity: Current State and Future Promise. J Allergy Clin Immunol Pract 2024;12:2695-2704. [PMID: 39127104; DOI: 10.1016/j.jaip.2024.08.012]
Abstract
Artificial intelligence (AI) and machine learning (ML) research within medicine has exponentially increased over the last decade, with studies showcasing the potential of AI/ML algorithms to improve clinical practice and outcomes. Ongoing research and efforts to develop AI-based models have expanded to aid in the identification of inborn errors of immunity (IEI). The use of larger electronic health record data sets, coupled with advances in phenotyping precision and enhancements in ML techniques, has the potential to significantly improve the early recognition of IEI, thereby increasing access to equitable care. In this review, we provide a comprehensive examination of AI/ML for IEI, covering the spectrum from data preprocessing for AI/ML analysis to current applications within immunology, and address the challenges associated with implementing clinical decision support systems to refine the diagnosis and management of IEI.
Affiliation(s)
- Aaron T Chin: Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California, Los Angeles, Los Angeles, Calif
- Manish J Butte: Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California, Los Angeles, Los Angeles, Calif
- Nicholas L Rider: Department of Health Systems & Implementation Science, Virginia Tech Carilion School of Medicine, Roanoke, Va; Department of Medicine, Division of Allergy-Immunology, Carilion Clinic, Roanoke, Va
5. Balta KY, Javidan AP, Walser E, Arntfield R, Prager R. Evaluating the Appropriateness, Consistency, and Readability of ChatGPT in Critical Care Recommendations. J Intensive Care Med 2024:8850666241267871. [PMID: 39118320; DOI: 10.1177/08850666241267871]
Abstract
Background: We assessed two versions of the large language model (LLM) ChatGPT (versions 3.5 and 4.0) in generating appropriate, consistent, and readable recommendations on core critical care topics. Research Question: How do successive large language models compare in generating appropriate, consistent, and readable recommendations on core critical care topics? Design and Methods: A set of 50 LLM-generated responses to clinical questions was evaluated by 2 independent intensivists on a 5-point Likert scale for appropriateness, consistency, and readability. Results: ChatGPT 4.0 showed significantly higher median appropriateness scores than ChatGPT 3.5 (4.0 vs 3.0, P < .001). However, there was no significant difference in consistency between the 2 versions (40% vs 28%, P = 0.291). Readability, assessed by the Flesch-Kincaid Grade Level, was also not significantly different between the 2 models (14.3 vs 14.4, P = 0.93). Interpretation: Both models produced "hallucinations" (misinformation delivered with high confidence), which highlights the risk of relying on these tools without domain expertise. Despite their potential for clinical application, both models lacked consistency, producing different results when asked the same question multiple times. The study underscores the need for clinicians to understand the strengths and limitations of LLMs for safe and effective implementation in critical care settings. Registration: https://osf.io/8chj7/.
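As a rough illustration of the "consistency" outcome reported here, one can ask a model the same clinical question several times and count how often the key recommendation is identical across repeats. This is only a sketch under assumed helpers (`ask_model`, `extract_recommendation`), not the study's protocol.

```python
# Hedged sketch: repeated-query consistency check for an LLM (not the study's code).
def ask_model(question: str) -> str:
    raise NotImplementedError("call your LLM API here")  # hypothetical helper

def extract_recommendation(answer: str) -> str:
    raise NotImplementedError("normalize an answer to its key recommendation")  # hypothetical helper

def consistency_rate(questions, repeats: int = 3) -> float:
    """Fraction of questions whose repeated answers all map to one recommendation."""
    consistent = 0
    for q in questions:
        recommendations = {extract_recommendation(ask_model(q)) for _ in range(repeats)}
        consistent += len(recommendations) == 1
    return consistent / len(questions)
```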
Affiliation(s)
- Kaan Y Balta: Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada
- Arshia P Javidan: Division of Vascular Surgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Eric Walser: Division of Critical Care, London Health Sciences Centre, Western University, London, Ontario, Canada; Department of Surgery, Trauma Program, London Health Sciences Centre, London, Ontario, Canada
- Robert Arntfield: Division of Critical Care, London Health Sciences Centre, Western University, London, Ontario, Canada
- Ross Prager: Division of Critical Care, London Health Sciences Centre, Western University, London, Ontario, Canada
6. Su Z, Tang G, Huang R, Qiao Y, Zhang Z, Dai X. Based on Medicine, The Now and Future of Large Language Models. Cell Mol Bioeng 2024;17:263-277. [PMID: 39372551; PMCID: PMC11450117; DOI: 10.1007/s12195-024-00820-3]
Abstract
Objectives This review explores the potential applications of large language models (LLMs) such as ChatGPT, GPT-3.5, and GPT-4 in the medical field, aiming to encourage their prudent use, provide professional support, and develop accessible medical AI tools that adhere to healthcare standards. Methods This paper examines the impact of technologies such as OpenAI's Generative Pre-trained Transformers (GPT) series, including GPT-3.5 and GPT-4, and other large language models (LLMs) in medical education, scientific research, clinical practice, and nursing. Specifically, it includes supporting curriculum design, acting as personalized learning assistants, creating standardized simulated patient scenarios in education; assisting with writing papers, data analysis, and optimizing experimental designs in scientific research; aiding in medical imaging analysis, decision-making, patient education, and communication in clinical practice; and reducing repetitive tasks, promoting personalized care and self-care, providing psychological support, and enhancing management efficiency in nursing. Results LLMs, including ChatGPT, have demonstrated significant potential and effectiveness in the aforementioned areas, yet their deployment in healthcare settings is fraught with ethical complexities, potential lack of empathy, and risks of biased responses. Conclusion Despite these challenges, significant medical advancements can be expected through the proper use of LLMs and appropriate policy guidance. Future research should focus on overcoming these barriers to ensure the effective and ethical application of LLMs in the medical field.
Affiliation(s)
- Ziqing Su: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022, P.R. China; Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022, P.R. China
- Guozhang Tang: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022, P.R. China; Department of Clinical Medicine, The Second Clinical College of Anhui Medical University, Hefei, 230032, Anhui, P.R. China
- Rui Huang: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022, P.R. China; Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022, P.R. China
- Yang Qiao: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022, P.R. China
- Zheng Zhang: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022, P.R. China; Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022, P.R. China
- Xingliang Dai: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022, P.R. China; Department of Research & Development, East China Institute of Digital Medical Engineering, Shangrao, 334000, P.R. China
7. Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024;105:251-265. [PMID: 38679540; DOI: 10.1016/j.diii.2024.04.003]
Abstract
PURPOSE The purpose of this study was to systematically review the reported performance of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications. MATERIALS AND METHODS After a comprehensive review of the PubMed, Web of Science, Embase, and Google Scholar databases, a cohort of published studies utilizing ChatGPT for clinical radiology applications was identified up to January 1, 2024. RESULTS Of the 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated lower performance in providing information for diagnosis and clinical decision support (6/44; 13.6%) and for patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported ChatGPT's performance as a proportion. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and five (5/24; 20.8%) studies recorded a median agreement of 83.6% between ChatGPT outputs and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPT-4 outperformed version 3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks. CONCLUSION Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, multiple pitfalls and limitations remain to be addressed. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.
Affiliation(s)
- Pedram Keshavarz: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; School of Science and Technology, The University of Georgia, Tbilisi 0171, Georgia
- Sara Bagherieh: Independent Clinical Radiology Researcher, Los Angeles, CA 90024, USA
- Hamid Chalian: Department of Radiology, Cardiothoracic Imaging, University of Washington, Seattle, WA 98195, USA
- Amir Ali Rahsepar: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Grace Hyun J Kim: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Radiological Sciences, Center for Computer Vision and Imaging Biomarkers, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Cameron Hassani: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Steven S Raman: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Arash Bedayat: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
8. Xu R, Wang Z. Generative artificial intelligence in healthcare from the perspective of digital media: Applications, opportunities and challenges. Heliyon 2024;10:e32364. [PMID: 38975200; PMCID: PMC11225727; DOI: 10.1016/j.heliyon.2024.e32364]
Abstract
Introduction The emergence and application of generative artificial intelligence/large language models (hereafter GenAI LLMs) have the potential for significant impact on the healthcare industry. However, there is currently a lack of systematic research on GenAI LLMs in healthcare based on reliable data. This article aims to conduct an exploratory study of the application of GenAI LLMs (i.e., ChatGPT) in healthcare from the perspective of digital media (i.e., online news), covering application scenarios, potential opportunities, and challenges. Methods This research used thematic qualitative text analysis in five steps: first, developing main topical categories based on relevant articles; second, encoding the search keywords using these categories; third, searching for news articles via Google; fourth, encoding the sub-categories using the elaborated category system; and finally, conducting category-based analysis and presenting the results. Natural language processing tools, including TermRaider and AntConc, were applied in these steps to assist the qualitative text analysis. Additionally, this study built a framework for analyzing the three topics above from the perspective of five different stakeholders, including healthcare demanders and providers. Results This study summarizes 26 applications (e.g., providing medical advice, providing diagnosis and triage recommendations, providing mental health support), 21 opportunities (e.g., making healthcare more accessible, reducing healthcare costs, improving patient care), and 17 challenges (e.g., generating inaccurate/misleading/wrong answers, raising privacy concerns, lacking transparency), and analyzes the reasons these key items arise and the links between the three research topics. Conclusions The application of GenAI LLMs in healthcare is primarily focused on transforming the way healthcare demanders access medical services (i.e., making it more intelligent, refined, and humane) and optimizing the processes through which healthcare providers offer medical services (i.e., simplifying them, ensuring timeliness, and reducing errors). As the application becomes more widespread and deepens, GenAI LLMs are expected to have a revolutionary impact on traditional healthcare service models, but they also inevitably raise ethical and security concerns. Furthermore, the application of GenAI LLMs in healthcare is still at an early stage, and it can be accelerated by starting from a specific healthcare field (e.g., mental health) or a specific mechanism (e.g., a mechanism for allocating the economic benefits of GenAI LLMs applied to healthcare), supported by empirical or clinical research.
Affiliation(s)
- Rui Xu: School of Economics, Guangdong University of Technology, Guangzhou, China
- Zhong Wang: School of Economics, Guangdong University of Technology, Guangzhou, China; Key Laboratory of Digital Economy and Data Governance, Guangdong University of Technology, Guangzhou, China
9. Yang S, Chang MC. The assessment of the validity, safety, and utility of ChatGPT for patients with herniated lumbar disc: A preliminary study. Medicine (Baltimore) 2024;103:e38445. [PMID: 38847711; PMCID: PMC11155576; DOI: 10.1097/md.0000000000038445]
Abstract
ChatGPT is perceived as a potential tool that patients diagnosed with herniated lumbar disc (HLD) can use to ask questions and obtain the information they need. In this preliminary study, we assessed the validity, safety, and utility of ChatGPT for patients with HLD. Two physicians specializing in the treatment of musculoskeletal disorders discussed and selected the 12 questions most frequently asked by patients with HLD in clinical practice. We used ChatGPT (version 4.0) to ask questions related to HLD. Each question was input into ChatGPT, and the responses were assessed by the 2 physicians. A Likert score was used to evaluate the validity, safety, and utility of the responses generated by ChatGPT. Each of the validity, safety, and utility ratings used a 4-point scale, with a score of 4 indicating the most valid, safe, and useful answers and 1 indicating the worst answers. Regarding validity, ChatGPT responses scored 4 points for 9 questions (9/12, 75.0%) and 3 points for 3 questions (3/12, 25.0%). Regarding safety, ChatGPT scored 4 points for 11 questions (11/12, 91.7%) and 3 points for 1 question (1/12, 8.3%). Regarding utility, ChatGPT responses scored 4 points for 9 questions (9/12, 75.0%) and 3 points for 3 questions (3/12, 25.0%). ChatGPT tends to offer relatively valid, safe, and useful information regarding HLD. However, users should exercise caution, as ChatGPT may occasionally provide incomplete answers to some questions on HLD.
Affiliation(s)
- Seoyon Yang: Department of Rehabilitation Medicine, School of Medicine, Ewha Woman's University Seoul Hospital, Seoul, Republic of Korea
- Min Cheol Chang: Department of Rehabilitation Medicine, College of Medicine, Yeungnam University, Daegu, Republic of Korea
10. Ayoub NF, Lee YJ, Grimm D, Divi V. Head-to-Head Comparison of ChatGPT Versus Google Search for Medical Knowledge Acquisition. Otolaryngol Head Neck Surg 2024;170:1484-1491. [PMID: 37529853; DOI: 10.1002/ohn.465]
Abstract
OBJECTIVE Chat Generative Pretrained Transformer (ChatGPT) is the newest iteration of OpenAI's generative artificial intelligence (AI), with the potential to influence many facets of life, including health care. This study sought to assess ChatGPT's capabilities as a source of medical knowledge, using Google Search as a comparison. STUDY DESIGN Cross-sectional analysis. SETTING Online, using ChatGPT, Google Search, and Clinical Practice Guidelines (CPGs). METHODS CPG Plain Language Summaries for 6 conditions were obtained. Questions relevant to specific conditions were developed and input into ChatGPT and Google Search. All questions were written from the patient perspective and sought (1) general medical knowledge or (2) medical recommendations, with varying levels of acuity (urgent or emergent vs routine clinical scenarios). Two blinded reviewers scored all passages and compared results from ChatGPT and Google Search, using the Patient Education Materials Assessment Tool (PEMAT-P) as the primary outcome. Additional customized questions were developed to assess the medical content of the passages. RESULTS The overall average PEMAT-P score for medical advice was 68.2% (standard deviation [SD]: 4.4) for ChatGPT and 89.4% (SD: 5.9) for Google Search (p < .001). There was a statistically significant difference in the PEMAT-P score by source (p < .001) but not by urgency of the clinical situation (p = .613). ChatGPT scored significantly higher than Google Search (87% vs 78%, p = .012) for patient education questions. CONCLUSION ChatGPT fared better than Google Search when offering general medical knowledge, but it scored worse when providing medical recommendations. Health care providers should strive to understand the potential benefits and ramifications of generative AI to guide patients appropriately.
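For context, a PEMAT-P percentage score is commonly computed as the share of applicable items rated "agree"; the snippet below is a minimal sketch of that arithmetic under that assumed scoring convention, not the study's own scoring code.

```python
# Minimal sketch of a PEMAT-P style percentage score (assumed scoring convention).
def pemat_score(ratings) -> float:
    """ratings: list of 1 (agree), 0 (disagree), or None (not applicable)."""
    applicable = [r for r in ratings if r is not None]
    return 100.0 * sum(applicable) / len(applicable)

# Example: 17 of 20 applicable items rated "agree" -> 85.0 (illustrative numbers only)
print(pemat_score([1] * 17 + [0] * 3 + [None] * 2))
```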
Affiliation(s)
- Noel F Ayoub: Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Yu-Jin Lee: Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- David Grimm: Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Vasu Divi: Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
11. Bumgardner VKC, Mullen A, Armstrong SE, Hickey C, Marek V, Talbert J. Local Large Language Models for Complex Structured Tasks. AMIA Jt Summits Transl Sci Proc 2024;2024:105-114. [PMID: 38827047; PMCID: PMC11141822]
Abstract
This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.
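As a rough illustration of the structured-generation setup this abstract describes, the sketch below prompts a locally hosted, fine-tuned causal language model to emit condition codes as JSON. The checkpoint path, instruction format, and example output are assumptions for illustration only, not the authors' pipeline.

```python
# Hedged sketch: structured code extraction with a local fine-tuned LLM (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "path/to/local-finetuned-llama"  # hypothetical local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, device_map="auto")

report = "GROSS DESCRIPTION: ... FINAL DIAGNOSIS: ..."  # de-identified report text
prompt = (
    "### Instruction:\nExtract the condition codes for this pathology report as a JSON list.\n\n"
    "### Report:\n" + report + "\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
completion = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)  # e.g. '["C50.911"]' -- placeholder output, not real study data
```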
12. Piazza D, Martorana F, Curaba A, Sambataro D, Valerio MR, Firenze A, Pecorino B, Scollo P, Chiantera V, Scibilia G, Vigneri P, Gebbia V, Scandurra G. The Consistency and Quality of ChatGPT Responses Compared to Clinical Guidelines for Ovarian Cancer: A Delphi Approach. Curr Oncol 2024;31:2796-2804. [PMID: 38785493; PMCID: PMC11119344; DOI: 10.3390/curroncol31050212]
Abstract
INTRODUCTION In recent years, generative Artificial Intelligence models, such as ChatGPT, have increasingly been utilized in healthcare. Despite acknowledging the high potential of AI models in terms of quick access to sources and formulating responses to a clinical question, the results obtained using these models still require validation through comparison with established clinical guidelines. This study compares the responses of the AI model to eight clinical questions with the Italian Association of Medical Oncology (AIOM) guidelines for ovarian cancer. MATERIALS AND METHODS The authors used the Delphi method to evaluate responses from ChatGPT and the AIOM guidelines. An expert panel of healthcare professionals assessed responses based on clarity, consistency, comprehensiveness, usability, and quality using a five-point Likert scale. The GRADE methodology assessed the evidence quality and the recommendations' strength. RESULTS A survey involving 14 physicians revealed that the AIOM guidelines consistently scored higher averages compared to the AI models, with a statistically significant difference. Post hoc tests showed that AIOM guidelines significantly differed from all AI models, with no significant difference among the AI models. CONCLUSIONS While AI models can provide rapid responses, they must match established clinical guidelines regarding clarity, consistency, comprehensiveness, usability, and quality. These findings underscore the importance of relying on expert-developed guidelines in clinical decision-making and highlight potential areas for AI model improvement.
Affiliation(s)
- Dario Piazza: Medical Oncology Unit, Casa di Cura Torina, 90145 Palermo, Italy
- Federica Martorana: Department of Clinical and Experimental Medicine, University of Catania, 95124 Catania, Italy
- Annabella Curaba: Medical Oncology Unit, Casa di Cura Torina, 90145 Palermo, Italy
- Maria Rosaria Valerio: Medical Oncology Unit, Policlinico P. Giaccone, University of Palermo, 90133 Palermo, Italy
- Alberto Firenze: Occupational Health Section, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, University of Palermo, 90133 Palermo, Italy
- Basilio Pecorino: Gynecology Unit, Ospedale Cannizzaro, 95126 Catania, Italy; Gynecology, Faculty of Medicine and Surgery, University of Enna Kore, 94100 Enna, Italy
- Paolo Scollo: Gynecology Unit, Ospedale Cannizzaro, 95126 Catania, Italy; Gynecology, Faculty of Medicine and Surgery, University of Enna Kore, 94100 Enna, Italy
- Vito Chiantera: Gynecology, University of Palermo, 90133 Palermo, Italy
- Paolo Vigneri: Medical Oncology, University of Catania, 95124 Catania, Italy; Medical Oncology, Istituto Clinico Humanitas, 95045 Catania, Italy
- Vittorio Gebbia: Medical Oncology Unit, Casa di Cura Torina, 90145 Palermo, Italy; Medical Oncology, Faculty of Medicine and Surgery, University of Enna Kore, 94100 Enna, Italy
13. Fournier A, Fallet C, Sadeghipour F, Perrottet N. Assessing the applicability and appropriateness of ChatGPT in answering clinical pharmacy questions. Ann Pharm Fr 2024;82:507-513. [PMID: 37992892; DOI: 10.1016/j.pharma.2023.11.001]
Abstract
OBJECTIVES Clinical pharmacists rely on different scientific references to ensure appropriate, safe, and cost-effective drug use. Tools based on artificial intelligence (AI), such as ChatGPT (Generative Pre-trained Transformer), could offer valuable support. The objective of this study was to assess ChatGPT's capacity to respond correctly to clinical pharmacy questions asked by healthcare professionals in our university hospital. MATERIAL AND METHODS ChatGPT's capacity to respond correctly to the last 100 consecutive questions recorded in our clinical pharmacy database was assessed. Questions were copied from our FileMaker Pro database and pasted into the ChatGPT (March 14 version) online platform. The generated answers were then copied verbatim into an Excel file. Two blinded clinical pharmacists reviewed all the questions and the answers given by the software. In case of disagreement, a third blinded pharmacist intervened to decide. RESULTS Questions about documentation (n=36) and mode of drug administration (n=30) predominated. Among the 69 applicable questions, the rate of correct answers varied from 30% to 57.1% depending on question type, with a global rate of 44.9%. Regarding inappropriate answers (n=38), 20 were incorrect, 18 gave no answer, and 8 were incomplete, with 8 answers belonging to 2 different categories. No answers better than the pharmacists' were observed. CONCLUSIONS ChatGPT demonstrated a mixed performance in answering clinical pharmacy questions. It should not replace human expertise, as a high rate of inappropriate answers was highlighted. Future studies should focus on optimizing ChatGPT for specific clinical pharmacy questions and explore the potential benefits and limitations of integrating this technology into clinical practice.
Affiliation(s)
- A Fournier: Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- C Fallet: Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- F Sadeghipour: Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland; Center for Research and Innovation in Clinical Pharmaceutical Sciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- N Perrottet: Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland
14. Sloss EA, Abdul S, Aboagyewah MA, Beebe A, Kendle K, Marshall K, Rosenbloom ST, Rossetti S, Grigg A, Smith KD, Mishuris RG. Toward Alleviating Clinician Documentation Burden: A Scoping Review of Burden Reduction Efforts. Appl Clin Inform 2024;15:446-455. [PMID: 38839063; PMCID: PMC11152769; DOI: 10.1055/s-0044-1787007]
Abstract
BACKGROUND Studies have shown that documentation burden experienced by clinicians may lead to less direct patient care, increased errors, and job dissatisfaction. Implementing effective strategies within health care systems to mitigate documentation burden can result in improved clinician satisfaction and more time spent with patients. However, there is a gap in the literature regarding evidence-based interventions to reduce documentation burden. OBJECTIVES The objective of this review was to identify and comprehensively summarize the state of the science related to documentation burden reduction efforts. METHODS Following Joanna Briggs Institute Manual for Evidence Synthesis and Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines, we conducted a comprehensive search of multiple databases, including PubMed, Medline, Embase, CINAHL Complete, Scopus, and Web of Science. Additionally, we searched gray literature and used Google Scholar to ensure a thorough review. Two reviewers independently screened titles and abstracts, followed by full-text review, with a third reviewer resolving any discrepancies. Data extraction was performed and a table of evidence was created. RESULTS A total of 34 articles were included in the review, published between 2016 and 2022, with a majority focusing on the United States. The efforts described can be categorized into medical scribes, workflow improvements, educational interventions, user-driven approaches, technology-based solutions, combination approaches, and other strategies. The outcomes of these efforts often resulted in improvements in documentation time, workflow efficiency, provider satisfaction, and patient interactions. CONCLUSION This scoping review provides a comprehensive summary of health system documentation burden reduction efforts. The positive outcomes reported in the literature emphasize the potential effectiveness of these efforts. However, more research is needed to identify universally applicable best practices, and considerations should be given to the transfer of burden among members of the health care team, quality of education, clinician involvement, and evaluation methods.
Affiliation(s)
- Elizabeth A. Sloss: Division of Health Systems and Community Based Care, College of Nursing, University of Utah, Utah, United States
- Shawna Abdul: John D. Dingell VA Medical Center, Detroit, Michigan, United States
- Mayfair A. Aboagyewah: Case Management, Mount Sinai Health System, MSH Main Campus, New York, New York, United States
- Alicia Beebe: Saint Luke's Health System (MO), Kansas City, Missouri, United States
- Kathleen Kendle: Section of Health Informatics, El Paso VA Health Care System, El Paso, Texas, United States
- Kyle Marshall: Department of Emergency Medicine, Geisinger, Danville, Pennsylvania, United States
- S. Trent Rosenbloom: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Sarah Rossetti: Biomedical Informatics and Nursing, Columbia University Irving Medical Center, New York, New York, United States
- Aaron Grigg: Department of Informatics, Grande Ronde Hospital, La Grande, Oregon, United States
- Kevin D. Smith: Department of Pediatrics, University of Chicago Medicine, Chicago, Illinois, United States
- Rebecca G. Mishuris: Digital, Mass General Brigham, Somerville, Massachusetts, United States; Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States
15. Dağci M, Çam F, Dost A. Reliability and Quality of the Nursing Care Planning Texts Generated by ChatGPT. Nurse Educ 2024;49:E109-E114. [PMID: 37994523; DOI: 10.1097/nne.0000000000001566]
Abstract
BACKGROUND Research on ChatGPT-generated nursing care planning texts is critical for enhancing nursing education through innovative and accessible learning methods and for improving reliability and quality. PURPOSE The aim of the study was to examine the quality, authenticity, and reliability of nursing care planning texts produced using ChatGPT. METHODS The study sample comprised 40 texts generated by ChatGPT for selected nursing diagnoses included in NANDA 2021-2023. The texts were evaluated using a descriptive criteria form and the DISCERN tool for assessing health information. RESULTS The mean DISCERN total score of the texts was 45.93 ± 4.72. All texts had a moderate level of reliability, and 97.5% of them had a moderate information-quality subscale score. A statistically significant relationship was found between the number of accessible references and the reliability (r = 0.408) and quality subscale scores (r = 0.379) of the texts (P < .05). CONCLUSION ChatGPT-generated texts exhibited moderate reliability, moderate quality of nursing care information, and moderate overall quality, despite low similarity rates.
Affiliation(s)
- Mahmut Dağci: Department of Nursing, Faculty of Health Sciences, Bezmialem Vakif University, Istanbul, Turkey
16. Aggarwal N, Saini BS, Gupta S. Contribution of ChatGPT in Parkinson's Disease Detection. Nucl Med Mol Imaging 2024;58:101-103. [PMID: 38633283; PMCID: PMC11018720; DOI: 10.1007/s13139-024-00857-2]
Affiliation(s)
- Nikita Aggarwal: Department of Electronics & Communication Engineering, Dr. B R Ambedkar National Institute of Technology, Jalandhar, 144011, India
- Barjinder Singh Saini: Department of Electronics & Communication Engineering, Dr. B R Ambedkar National Institute of Technology, Jalandhar, 144011, India
- Savita Gupta: Department of Computer Science & Engineering, UIET, Panjab University, Chandigarh, 160014, India
17. Zheng Y, Wang L, Feng B, Zhao A, Wu Y. Innovating Healthcare: The Role of ChatGPT in Streamlining Hospital Workflow in the Future. Ann Biomed Eng 2024;52:750-753. [PMID: 37464178; DOI: 10.1007/s10439-023-03323-w]
Abstract
ChatGPT is revolutionizing hospital workflows by enhancing the precision and efficiency of tasks that were formerly the exclusive domain of healthcare professionals. Additionally, ChatGPT can aid in administrative duties, including appointment scheduling and billing, which enables healthcare professionals to allocate more time towards patient care. By shouldering some of these responsibilities, ChatGPT has the potential to advance the quality of patient care, streamline departmental efficiency, and lower healthcare costs. Nevertheless, it is crucial to strike a balance between the advantages of ChatGPT and the necessity of human interaction in healthcare to guarantee optimal patient care. While ChatGPT may assume some of the duties of physicians in particular medical domains, it cannot replace human doctors. Tackling the challenges and constraints associated with the integration of ChatGPT into the healthcare system is critical for its successful implementation.
Affiliation(s)
- Yue Zheng: Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Laduona Wang: Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Baijie Feng: West China School of Medicine, Sichuan University, Chengdu, 610041, China
- Ailin Zhao: Department of Hematology, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Yijun Wu: Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
18. Shahin MH, Barth A, Podichetty JT, Liu Q, Goyal N, Jin JY, Ouellet D. Artificial Intelligence: From Buzzword to Useful Tool in Clinical Pharmacology. Clin Pharmacol Ther 2024;115:698-709. [PMID: 37881133; DOI: 10.1002/cpt.3083]
Abstract
The advent of artificial intelligence (AI) in clinical pharmacology and drug development is akin to the dawning of a new era. Previously dismissed as mere technological hype, these approaches have emerged as promising tools in different domains, including health care, demonstrating their potential to empower clinical pharmacology decision making, revolutionize the drug development landscape, and advance patient care. Although challenges remain, the remarkable progress already made signals that the leap from hype to reality is well underway, and AI's promise to offer clinical pharmacology new tools and possibilities for optimizing patient care is gradually coming to fruition. This review dives into the burgeoning world of AI and machine learning (ML), showcasing different applications of AI in clinical pharmacology and the impact of successful AI/ML implementation on drug development and/or regulatory decisions. This review also highlights recommendations for areas of opportunity in clinical pharmacology, including data analysis (e.g., handling large data sets, screening to identify important covariates, and optimizing patient populations) and efficiencies (e.g., automation, translation, literature curation, and training). Realizing the benefits of AI in drug development and understanding its value will lead to the successful integration of AI tools into our clinical pharmacology and pharmacometrics armamentarium.
Affiliation(s)
- Mohamed H Shahin: Clinical Pharmacology and Bioanalytics, Pfizer Inc., Groton, Connecticut, USA
- Aline Barth: Clinical Pharmacology and Bioanalytics, Pfizer Inc., Groton, Connecticut, USA
- Qi Liu: Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Navin Goyal: Clinical Pharmacology and Pharmacometrics, Janssen Research and Development, LLC, Spring House, Pennsylvania, USA
- Jin Y Jin: Department of Clinical Pharmacology, Genentech, South San Francisco, California, USA
- Daniele Ouellet: Clinical Pharmacology and Pharmacometrics, Janssen Research and Development, LLC, Spring House, Pennsylvania, USA
19. Sarman A, Tuncay S. An Exaggeration? Reality?: Can ChatGPT Be Used in Neonatal Nursing? J Perinat Neonatal Nurs 2024;38:120-121. [PMID: 38758263; DOI: 10.1097/jpn.0000000000000826]
Abstract
Artificial intelligence (AI) represents a system endowed with the ability to derive meaningful inferences from a diverse array of datasets. Rooted in the advancements of machine learning models, AI has spawned various transformative technologies such as deep learning, natural language processing, computer vision, and robotics. This technological evolution is poised to witness a broadened spectrum of applications across diverse domains, with a particular focus on revolutionizing healthcare services. Noteworthy among these innovations is OpenAI's creation, ChatGPT, which stands out for its profound capabilities in intricate analysis, primarily facilitated through extensive language modeling. In the realm of healthcare, AI applications, including ChatGPT, have showcased promising outcomes, especially in the domain of neonatal nursing. Areas such as pain assessment, feeding processes, and patient status determination have witnessed substantial enhancements through the integration of AI technologies. However, it is crucial to approach the deployment of such applications with a judicious mindset. The accuracy of the underlying data must undergo rigorous validation, and any results lacking a solid foundation in scientific insights should be approached with skepticism. The paramount consideration remains patient safety, necessitating that AI applications, like ChatGPT, undergo thorough scrutiny through controlled and evidence-based studies. Only through such meticulous evaluation can the transformative potential of AI be harnessed responsibly, ensuring its alignment with the highest standards of healthcare practice.
Affiliation(s)
- Abdullah Sarman: Department of Pediatric Nursing, Faculty of Health Science, Bingöl University, Bingöl, Turkey
20. Jačisko J, Veselý V, Chang KV, Özçakar L. (How) ChatGPT-Artificial Intelligence Thinks It Can Help/Harm Physiatry. Am J Phys Med Rehabil 2024;103:346-349. [PMID: 38112589; DOI: 10.1097/phm.0000000000002370]
Abstract
ABSTRACT ChatGPT is a chatbot based on the generative pretrained transformer architecture, an artificial intelligence-based large language model. Its widespread use in healthcare practice, research, and education seems to be (increasingly) inevitable. Considering the relevant limitations regarding privacy, ethics, bias, legality, and validity, this article discusses its use as a supplement to physicians (certainly not as a substitute) in light of the recent literature. In particular, the "opinion" of ChatGPT about how it can help or harm physiatry is exemplified.
Affiliation(s)
- Jakub Jačisko: Department of Rehabilitation and Sports Medicine, Second Faculty of Medicine, Charles University and University Hospital Motol, Prague, Czech Republic (JJ, VV); Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Bei-Hu Branch, Taipei, Taiwan (K-VC); and Department of Physical and Rehabilitation Medicine, Hacettepe University Medical School, Ankara, Turkey (LO)
21. Zampatti S, Peconi C, Megalizzi D, Calvino G, Trastulli G, Cascella R, Strafella C, Caltagirone C, Giardina E. Innovations in Medicine: Exploring ChatGPT's Impact on Rare Disorder Management. Genes (Basel) 2024;15:421. [PMID: 38674356; PMCID: PMC11050022; DOI: 10.3390/genes15040421]
Abstract
Artificial intelligence (AI) is rapidly transforming the field of medicine, announcing a new era of innovation and efficiency. Among AI programs designed for general use, ChatGPT holds a prominent position, using an innovative language model developed by OpenAI. Thanks to the use of deep learning techniques, ChatGPT stands out as an exceptionally viable tool, renowned for generating human-like responses to queries. Various medical specialties, including rheumatology, oncology, psychiatry, internal medicine, and ophthalmology, have been explored for ChatGPT integration, with pilot studies and trials revealing each field's potential benefits and challenges. However, the field of genetics and genetic counseling, as well as that of rare disorders, represents an area suitable for exploration, with its complex datasets and the need for personalized patient care. In this review, we synthesize the wide range of potential applications for ChatGPT in the medical field, highlighting its benefits and limitations. We pay special attention to rare and genetic disorders, aiming to shed light on the future roles of AI-driven chatbots in healthcare. Our goal is to pave the way for a healthcare system that is more knowledgeable, efficient, and centered around patient needs.
Affiliation(s)
- Stefania Zampatti: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Cristina Peconi: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Domenica Megalizzi: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Science, Roma Tre University, 00146 Rome, Italy
- Giulia Calvino: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Science, Roma Tre University, 00146 Rome, Italy
- Giulia Trastulli: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of System Medicine, Tor Vergata University, 00133 Rome, Italy
- Raffaella Cascella: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Chemical-Toxicological and Pharmacological Evaluation of Drugs, Catholic University Our Lady of Good Counsel, 1000 Tirana, Albania
- Claudia Strafella: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Carlo Caltagirone: Department of Clinical and Behavioral Neurology, IRCCS Fondazione Santa Lucia, 00179 Rome, Italy
- Emiliano Giardina: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
22
|
Erden Y, Temel MH, Bağcıer F. Artificial intelligence insights into osteoporosis: assessing ChatGPT's information quality and readability. Arch Osteoporos 2024; 19:17. [PMID: 38499716 DOI: 10.1007/s11657-024-01376-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/24/2023] [Accepted: 03/07/2024] [Indexed: 03/20/2024]
Abstract
Accessible, accurate information and readability play a crucial role in empowering individuals managing osteoporosis. This study showed that the responses generated by ChatGPT regarding osteoporosis had serious problems with quality and were written at a level of complexity that necessitates an educational background of approximately 17 years. PURPOSE The use of artificial intelligence (AI) applications as a source of health information is increasing. Readable and accurate information plays a critical role in empowering patients to make decisions about their disease. The aim was to examine the quality and readability of responses provided by ChatGPT, an AI chatbot, to commonly asked questions regarding osteoporosis, a major public health problem. METHODS Google Trends was used to identify the 25 most frequently searched keywords on Google for "osteoporosis," "female osteoporosis," and "male osteoporosis." A selected set of 38 keywords was sequentially entered into the chat interface of ChatGPT. The responses were evaluated with the Ensuring Quality Information for Patients (EQIP) tool, the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE). RESULTS The EQIP scores of the texts ranged from a minimum of 36.36 to a maximum of 61.76, with a mean value of 48.71, indicating "serious problems with quality." The FKRE scores spanned from 13.71 to 56.06 with a mean value of 28.71, and the FKGL varied between 8.48 and 17.63 with a mean value of 13.25. There were no statistically significant correlations between the EQIP score and the FKGL or FKRE scores. CONCLUSIONS Although ChatGPT is easily accessible for patients seeking information about osteoporosis, its current quality and readability fall short of comprehensive healthcare standards.
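The readability indices reported above are computed from simple sentence, word, and syllable counts. The sketch below (Python; not the study's code) applies the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas with a rough vowel-group syllable heuristic; the sample text is an invented placeholder, and dedicated tools such as the textstat package use more careful syllable rules. Lower reading-ease and higher grade-level values indicate harder text.

```python
# Minimal sketch (not the authors' code): scoring a ChatGPT response with the
# Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) formulas.
# The syllable counter is a rough heuristic for illustration only.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as groups of consecutive vowels; always at least 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59
    return fre, fkgl

if __name__ == "__main__":
    sample = ("Osteoporosis is a skeletal disorder characterized by compromised "
              "bone strength, predisposing a person to an increased risk of fracture.")
    fre, fkgl = readability(sample)
    print(f"Flesch Reading Ease: {fre:.1f}  |  Flesch-Kincaid Grade Level: {fkgl:.1f}")
```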
Collapse
Affiliation(s)
- Yakup Erden
- Clinic of Physical Medicine and Rehabilitation, İzzet Baysal Physical Treatment and Rehabilitation Training and Research Hospital, Orüs Street, No. 59, 14020, Bolu, Turkey.
| | - Mustafa Hüseyin Temel
- Department of Physical Medicine and Rehabilitation, Üsküdar State Hospital, Barbaros, Veysi Paşa Street, No. 14, 34662, Istanbul, Turkey
| | - Fatih Bağcıer
- Clinic of Physical Medicine and Rehabilitation, Başakşehir Çam and Sakura City Hospital, Olympic Boulevard Road, 34480, Istanbul, Turkey
| |
Collapse
|
23
|
Liu Y, Ju S, Wang J. Exploring the potential of ChatGPT in medical dialogue summarization: a study on consistency with human preferences. BMC Med Inform Decis Mak 2024; 24:75. [PMID: 38486198 PMCID: PMC10938713 DOI: 10.1186/s12911-024-02481-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Accepted: 03/11/2024] [Indexed: 03/18/2024] Open
Abstract
BACKGROUND Telemedicine has experienced rapid growth in recent years, aiming to enhance medical efficiency and reduce the workload of healthcare professionals. During the COVID-19 pandemic, it became especially crucial, enabling remote screenings and access to healthcare services while maintaining social distancing. Online consultation platforms have emerged, but demand has strained the availability of medical professionals, directly motivating research and development in automated medical consultation. In particular, there is a need for efficient and accurate medical dialogue summarization algorithms that condense lengthy conversations into shorter versions focused on relevant medical facts. The success of large language models such as the generative pre-trained transformer (GPT)-3 has recently prompted a paradigm shift in natural language processing (NLP) research. In this paper, we explore its impact on medical dialogue summarization. METHODS We present the performance and evaluation results of two approaches on a medical dialogue dataset. The first approach is based on fine-tuned pre-trained language models, namely BERT-based summarization (BERTSUM) and Bidirectional and Auto-Regressive Transformers (BART). The second approach utilizes a large language model (LLM), GPT-3.5, with in-context learning (ICL). Evaluation is conducted using automated metrics such as ROUGE and BERTScore. RESULTS In comparison to the BART and ChatGPT models, the summaries generated by the BERTSUM model not only exhibited significantly lower ROUGE and BERTScore values but also failed to pass manual evaluation on any metric. The BART model achieved the highest ROUGE and BERTScore values among all evaluated models, surpassing ChatGPT: its ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore values were 14.94%, 53.48%, 32.84%, and 6.73% higher, respectively, than ChatGPT's best results. However, in the manual evaluation by medical experts, the summaries generated by the BART model performed satisfactorily only on the "Readability" metric, with less than 30% passing manual evaluation on the other metrics. When compared with the BERTSUM and BART models, the ChatGPT model was clearly favored by the human medical experts. CONCLUSION On one hand, the GPT-3.5 model can steer the style and content of medical dialogue summaries through different prompts; the generated content is not only better received than results from certain human experts but also more comprehensible, making it a promising avenue for automated medical dialogue summarization. On the other hand, automated evaluation mechanisms such as ROUGE and BERTScore fall short of fully assessing the outputs of large language models like GPT-3.5, so more appropriate evaluation criteria need to be researched.
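As a rough illustration of the automated metrics named in the study, the sketch below (Python, assuming the third-party rouge-score and bert-score packages; the reference and candidate texts are invented placeholders) scores a generated summary against a human reference with ROUGE and BERTScore:

```python
# Minimal sketch (not the study's code): comparing a model-generated summary
# against a human reference with ROUGE and BERTScore.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = ("Patient reports three days of fever and sore throat; "
             "advised rest, fluids, and paracetamol.")
candidate = ("The patient has had fever and a sore throat for three days and "
             "was told to rest, drink fluids, and take paracetamol.")

# ROUGE-1/2/L F-measures (n-gram and longest-common-subsequence overlap).
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, result in scorer.score(reference, candidate).items():
    print(f"{name}: F1 = {result.fmeasure:.3f}")

# BERTScore: token similarity in a pretrained encoder's embedding space.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(f"BERTScore F1 = {F1.mean().item():.3f}")
```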
Collapse
Affiliation(s)
- Yong Liu
- Department of Computer Science, Sichuan University, No. 24, South Section 1, 1st Ring Road, Chengdu, 610065, Sichuan, China
| | - Shenggen Ju
- Department of Computer Science, Sichuan University, No. 24, South Section 1, 1st Ring Road, Chengdu, 610065, Sichuan, China.
| | - Junfeng Wang
- Department of Computer Science, Sichuan University, No. 24, South Section 1, 1st Ring Road, Chengdu, 610065, Sichuan, China
| |
Collapse
|
24
|
Xiong C, Dang W, Yang Q, Zhou Q, Shen M, Xiong Q, An M, Jiang X, Ni Y, Ji X. Integrated Ink Printing Paper Based Self-Powered Electrochemical Multimodal Biosensing (IFP-Multi) with ChatGPT-Bioelectronic Interface for Personalized Healthcare Management. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2305962. [PMID: 38161220 PMCID: PMC10953564 DOI: 10.1002/advs.202305962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Revised: 10/23/2023] [Indexed: 01/03/2024]
Abstract
Personalized healthcare management is an emerging field that requires the development of environment-friendly, integrated, electrochemical multimodal devices. In this study, the concept of integrated paper-based biosensors (IFP-Multi) for personalized healthcare management is introduced. By leveraging ink printing technology and a ChatGPT-bioelectronic interface, these biosensors offer ultrahigh areal-specific capacitance (74,633 mF cm⁻²), excellent mechanical properties, and multifunctional sensing and humidity power generation capabilities. More importantly, the IFP-Multi devices have the potential to simulate deaf-mute vocalization and can be integrated into wearable sensors to detect muscle contractions and bending motions. They also enable monitoring of physiological signals from various body parts, such as the throat, nape, elbow, wrist, and knee, and successfully record sharp and repeatable signals generated by muscle contractions. In addition, the IFP-Multi devices demonstrate self-powered handwriting sensing and moisture power generation for sweat-sensing applications. As a proof of concept, a GPT-3.5 model-based fine-tuning and prediction pipeline that utilizes physiological signals recorded through IFP-Multi is showcased, enabling artificial intelligence with multimodal sensing capabilities for personalized healthcare management. This work presents a promising and ecofriendly approach to developing paper-based electrochemical multimodal devices, paving the way for a new era of healthcare advancements.
Collapse
Affiliation(s)
- Chuanyin Xiong
- College of Bioresources Chemical & Materials EngineeringShaanxi University of Science and TechnologyXi'an710021China
| | - Weihua Dang
- College of Bioresources Chemical & Materials EngineeringShaanxi University of Science and TechnologyXi'an710021China
| | - Qi Yang
- College of Bioresources Chemical & Materials EngineeringShaanxi University of Science and TechnologyXi'an710021China
| | - Qiusheng Zhou
- College of Bioresources Chemical & Materials EngineeringShaanxi University of Science and TechnologyXi'an710021China
| | - Mengxia Shen
- College of Bioresources Chemical & Materials EngineeringShaanxi University of Science and TechnologyXi'an710021China
| | - Qiancheng Xiong
- School of Chemistry and Materials EngineeringHuizhou UniversityHuizhou516007China
| | - Meng An
- College of Mechanical and Electrical EngineeringShaanxi University of Science and TechnologyXi'an710021China
| | - Xue Jiang
- College of Bioresources Chemical & Materials EngineeringShaanxi University of Science and TechnologyXi'an710021China
| | - Yonghao Ni
- Department of Chemical and Biomedical EngineeringThe University of MaineOronoME04469USA
| | - Xianglin Ji
- Oxford‐CityU Centre for Cerebro‐Cardiovascular Health Engineering (COCHE)City University of Hong KongHong KongHong Kong SAR999077China
| |
Collapse
|
25
|
Shiraishi M, Lee H, Kanayama K, Moriwaki Y, Okazaki M. Appropriateness of Artificial Intelligence Chatbots in Diabetic Foot Ulcer Management. INT J LOW EXTR WOUND 2024:15347346241236811. [PMID: 38419470 DOI: 10.1177/15347346241236811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/02/2024]
Abstract
Type 2 diabetes is a significant global health concern. It often causes diabetic foot ulcers (DFUs), which affect millions of people and increase amputation and mortality rates. Despite existing guidelines, the complexity of DFU treatment makes clinical decisions challenging. Large language models such as chat generative pretrained transformer (ChatGPT), which are adept at natural language processing, have emerged as valuable resources in the medical field. However, concerns about the accuracy and reliability of the information they provide remain. We aimed to assess the accuracy of various artificial intelligence (AI) chatbots, including ChatGPT, in providing information on DFUs based on established guidelines. Seven AI chatbots were asked clinical questions (CQs) based on the DFU guidelines. Their responses were analyzed for accuracy in terms of answers to CQs, grade of recommendation, level of evidence, and agreement with the reference, including verification of the authenticity of the references provided by the chatbots. The AI chatbots showed a mean accuracy of 91.2% in answers to CQs, with discrepancies noted in grade of recommendation and level of evidence. Claude-2 outperformed other chatbots in the number of verified references (99.6%), whereas ChatGPT had the lowest rate of reference authenticity (66.3%). This study highlights the potential of AI chatbots as tools for disseminating medical information and demonstrates their high degree of accuracy in answering CQs related to DFUs. However, the variability in the accuracy of these chatbots and problems like AI hallucinations necessitate cautious use and further optimization for medical applications. This study underscores the evolving role of AI in healthcare and the importance of refining these technologies for effective use in clinical decision-making and patient education.
Collapse
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | - Haesu Lee
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | - Koji Kanayama
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | - Yuta Moriwaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | - Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| |
Collapse
|
26
|
Rudroff T. Revealing the Complexity of Fatigue: A Review of the Persistent Challenges and Promises of Artificial Intelligence. Brain Sci 2024; 14:186. [PMID: 38391760 PMCID: PMC10886506 DOI: 10.3390/brainsci14020186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Revised: 01/31/2024] [Accepted: 02/16/2024] [Indexed: 02/24/2024] Open
Abstract
Part I reviews persistent challenges obstructing progress in understanding complex fatigue's biology. Difficulties quantifying subjective symptoms, mapping multi-factorial mechanisms, accounting for individual variation, enabling invasive sensing, overcoming research/funding insularity, and more are discussed. Part II explores how emerging artificial intelligence and machine and deep learning techniques can help address limitations through pattern recognition of complex physiological signatures as more objective biomarkers, predictive modeling to capture individual differences, consolidation of disjointed findings via data mining, and simulation to explore interventions. Conversational agents like Claude and ChatGPT also have potential to accelerate human fatigue research, but they currently lack capacities for robust autonomous contributions. Envisioned is an innovation timeline where synergistic application of enhanced neuroimaging, biosensors, closed-loop systems, and other advances combined with AI analytics could catalyze transformative progress in elucidating fatigue neural circuitry and treating associated conditions over the coming decades.
Collapse
Affiliation(s)
- Thorsten Rudroff
- Department of Health and Human Physiology, University of Iowa, Iowa City, IA 52242, USA
- Department of Neurology, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA
| |
Collapse
|
27
|
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of currently demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
Collapse
|
28
|
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024] Open
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationships. Both methods aim to obtain a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially in drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, and repurposing, among other tasks. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, text, audio, video, and novel chemical molecules. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via the de novo drug design approach. Various basic and advanced models are discussed, along with their recent applications. The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness its potential for generating novel drug molecules in a faster and more affordable manner. Some clinical-level assets generated from generative artificial intelligence are also discussed to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Collapse
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Azim Ansari
- Computer Aided Drug Design Center Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
| | - Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
| | - Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
| | - Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
| | - Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
| |
Collapse
|
29
|
Peng W, Feng Y, Yao C, Zhang S, Zhuo H, Qiu T, Zhang Y, Tang J, Gu Y, Sun Y. Evaluating AI in medicine: a comparative analysis of expert and ChatGPT responses to colorectal cancer questions. Sci Rep 2024; 14:2840. [PMID: 38310152 PMCID: PMC10838275 DOI: 10.1038/s41598-024-52853-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2023] [Accepted: 01/24/2024] [Indexed: 02/05/2024] Open
Abstract
Colorectal cancer (CRC) is a global health challenge, and patient education plays a crucial role in its early detection and treatment. Despite progress in AI technology, as exemplified by transformer-based models such as ChatGPT, there remains a lack of in-depth understanding of their efficacy for medical purposes. We aimed to assess the proficiency of ChatGPT in the field of popular science, specifically in answering questions related to CRC diagnosis and treatment, using the book "Colorectal Cancer: Your Questions Answered" as a reference. In total, 131 valid questions from the book were manually input into ChatGPT. Responses were evaluated by clinical physicians in the relevant fields based on comprehensiveness and accuracy of information, and scores were standardized for comparison. Not surprisingly, ChatGPT showed high reproducibility in its responses, with high uniformity in comprehensiveness, accuracy, and final scores. However, the mean scores of ChatGPT's responses were significantly lower than the benchmarks, indicating that it has not reached an expert level of competence in CRC: while it could provide accurate information, it lacked comprehensiveness. Notably, ChatGPT performed well in the domains of radiation therapy, interventional therapy, stoma care, venous care, and pain control, almost rivaling the benchmarks, but fell short in the basic information, surgery, and internal medicine domains. While ChatGPT demonstrated promise in specific domains, its overall performance in providing CRC information falls short of expert standards, indicating the need for further advancements and improvements in AI technology for patient education in healthcare.
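The abstract states that physician-assigned scores were standardized before comparison. One plausible reading of that step, shown purely as a hedged illustration (not the authors' method, with invented score values), is z-score standardization against the pooled ratings:

```python
# Minimal sketch (an assumption, not the authors' method): z-score standardizing
# comprehensiveness/accuracy ratings so ChatGPT responses and expert benchmarks
# can be compared on a common scale. The score arrays are invented placeholders.
import numpy as np

chatgpt_scores = np.array([6.5, 7.0, 5.5, 8.0, 6.0])    # e.g., ratings out of 10
benchmark_scores = np.array([8.5, 9.0, 8.0, 9.5, 8.5])

def zscore(x: np.ndarray, other: np.ndarray) -> np.ndarray:
    # Standardize x against the mean and standard deviation of the pooled ratings.
    pooled = np.concatenate([x, other])
    return (x - pooled.mean()) / pooled.std(ddof=1)

print("ChatGPT (standardized):  ", np.round(zscore(chatgpt_scores, benchmark_scores), 2))
print("Benchmark (standardized):", np.round(zscore(benchmark_scores, chatgpt_scores), 2))
```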
Collapse
Affiliation(s)
- Wen Peng
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Yifei Feng
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Cui Yao
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Sheng Zhang
- Department of Radiotherapy, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Han Zhuo
- Department of Intervention, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Tianzhu Qiu
- Department of Oncology, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Yi Zhang
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Junwei Tang
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China.
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China.
| | - Yanhong Gu
- Department of Oncology, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China.
| | - Yueming Sun
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China.
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China.
| |
Collapse
|
30
|
Barlas T, Altinova AE, Akturk M, Toruner FB. Credibility of ChatGPT in the assessment of obesity in type 2 diabetes according to the guidelines. Int J Obes (Lond) 2024; 48:271-275. [PMID: 37951982 DOI: 10.1038/s41366-023-01410-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Revised: 10/22/2023] [Accepted: 10/30/2023] [Indexed: 11/14/2023]
Abstract
BACKGROUND The Chat Generative Pre-trained Transformer (ChatGPT) allows students, researchers, and patients in the medical field to access information easily and has gained considerable attention. We aimed to evaluate the credibility of ChatGPT against the guidelines for the assessment of obesity in type 2 diabetes (T2D), one of the major health concerns of this century. MATERIALS AND METHODS In this cross-sectional, non-human-subject study, experienced endocrinologists posed 20 questions to ChatGPT, grouped into subsections covering the assessment of obesity and the different treatment options, according to the American Diabetes Association and American Association of Clinical Endocrinology guidelines. The responses of ChatGPT were classified into four categories: compatible, compatible but insufficient, partially incompatible, and incompatible with the guidelines. RESULTS ChatGPT demonstrated a systematic approach to answering questions and recommended consulting a healthcare provider to receive personalized advice based on the specific health needs and circumstances of patients. The compatibility of ChatGPT with the guidelines was 100% for the assessment of obesity in type 2 diabetes; however, it was lower in the therapy sections, which covered nutritional, medical, and surgical approaches to weight loss. Furthermore, for responses evaluated as "compatible but insufficient," ChatGPT required additional prompts to provide all the information in the guidelines. CONCLUSION The assessment and management of obesity in T2D are highly individualized. Despite ChatGPT's comprehensive and understandable responses, it should not be used as a substitute for healthcare professionals' patient-centered approach.
Collapse
Affiliation(s)
- Tugba Barlas
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey.
| | - Alev Eroglu Altinova
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
| | - Mujde Akturk
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
| | - Fusun Balos Toruner
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
| |
Collapse
|
31
|
Malik S, Zaheer S. ChatGPT as an aid for pathological diagnosis of cancer. Pathol Res Pract 2024; 253:154989. [PMID: 38056135 DOI: 10.1016/j.prp.2023.154989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 11/26/2023] [Accepted: 11/27/2023] [Indexed: 12/08/2023]
Abstract
Diagnostic workup of cancer patients relies heavily on the science of pathology, using cytopathology, histopathology, and ancillary techniques such as immunohistochemistry and molecular cytogenetics. Data processing and learning by means of artificial intelligence (AI) have become a spearhead for the advancement of medicine, and pathology and laboratory medicine are no exceptions. ChatGPT, an AI-based chatbot recently launched by OpenAI, is currently the talk of the town, and its role in cancer diagnosis is being explored meticulously. Pathology workflows that integrate digital slides, advanced algorithms, and computer-aided diagnostic techniques extend the frontiers of the pathologist's view beyond the microscopic slide and enable effective integration, assimilation, and utilization of knowledge beyond human limits. Despite its numerous advantages in the pathological diagnosis of cancer, this approach comes with several challenges, such as integrating digital slides with language-based inputs, problems of bias, and legal issues, which must be addressed soon so that pathologists diagnosing malignancies stay on the same bandwagon and do not miss the train.
Collapse
Affiliation(s)
- Shaivy Malik
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India
| | - Sufian Zaheer
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India.
| |
Collapse
|
32
|
Lanera C, Lorenzoni G, Barbieri E, Piras G, Magge A, Weissenbacher D, Donà D, Cantarutti L, Gonzalez-Hernandez G, Giaquinto C, Gregori D. Monitoring the Epidemiology of Otitis Using Free-Text Pediatric Medical Notes: A Deep Learning Approach. J Pers Med 2023; 14:28. [PMID: 38248729 PMCID: PMC10817419 DOI: 10.3390/jpm14010028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Revised: 12/20/2023] [Accepted: 12/21/2023] [Indexed: 01/23/2024] Open
Abstract
Free-text information represents a valuable resource for epidemiological surveillance. Its unstructured nature, however, presents significant challenges in the extraction of meaningful information. This study presents a deep learning model for classifying otitis using pediatric medical records. We analyzed the Pedianet database, which includes data from January 2004 to August 2017. The model categorizes narratives from clinical record diagnoses into six types: no otitis, non-media otitis, non-acute otitis media (OM), acute OM (AOM), AOM with perforation, and recurrent AOM. Utilizing deep learning architectures, including an ensemble model, this study addressed the challenges associated with the manual classification of extensive narrative data. The performance of the model was evaluated according to a gold standard classification made by three expert clinicians. The ensemble model achieved values of 97.03, 93.97, 96.59, and 95.48 for balanced precision, balanced recall, accuracy, and balanced F1 measure, respectively. These results underscore the efficacy of using automated systems for medical diagnoses, especially in pediatric care. Our findings demonstrate the potential of deep learning in interpreting complex medical records, enhancing epidemiological surveillance and research. This approach offers significant improvements in handling large-scale medical data, ensuring accuracy and minimizing human error. The methodology is adaptable to other medical contexts, promising a new horizon in healthcare analytics.
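As a hedged illustration of the reported evaluation (not the study's code), the sketch below scores a multi-class otitis classifier against a clinician gold standard using scikit-learn; "balanced" precision, recall, and F1 are interpreted here as macro-averages over the six classes, which is an assumption, and the labels are toy placeholders standing in for the real Pedianet data.

```python
# Minimal sketch: evaluating a six-class otitis classifier against a gold standard.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

labels = ["no_otitis", "non_media", "non_acute_OM", "AOM", "AOM_perforation", "recurrent_AOM"]

# Toy gold-standard and predicted labels for illustration only.
y_true = ["AOM", "no_otitis", "recurrent_AOM", "AOM", "non_acute_OM", "AOM_perforation"]
y_pred = ["AOM", "no_otitis", "AOM",           "AOM", "non_acute_OM", "AOM_perforation"]

print("accuracy          :", accuracy_score(y_true, y_pred))
print("balanced precision:", precision_score(y_true, y_pred, labels=labels, average="macro", zero_division=0))
print("balanced recall   :", recall_score(y_true, y_pred, labels=labels, average="macro", zero_division=0))
print("balanced F1       :", f1_score(y_true, y_pred, labels=labels, average="macro", zero_division=0))
```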
Collapse
Affiliation(s)
- Corrado Lanera
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
| | - Giulia Lorenzoni
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
| | - Elisa Barbieri
- Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.)
| | - Gianluca Piras
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
| | - Arjun Magge
- Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
| | - Davy Weissenbacher
- Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
| | - Daniele Donà
- Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.)
| | | | - Graciela Gonzalez-Hernandez
- Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
| | - Carlo Giaquinto
- Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.)
- Società Servizi Telematici—Pedianet, 35100 Padova, Italy;
| | - Dario Gregori
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
| |
Collapse
|
33
|
Alsadhan A, Al-Anezi F, Almohanna A, Alnaim N, Alzahrani H, Shinawi R, AboAlsamh H, Bakhshwain A, Alenazy M, Arif W, Alyousef S, Alhamidi S, Alghamdi A, AlShrayfi N, Rubaian NB, Alanzi T, AlSahli A, Alturki R, Herzallah N. The opportunities and challenges of adopting ChatGPT in medical research. Front Med (Lausanne) 2023; 10:1259640. [PMID: 38188345 PMCID: PMC10766839 DOI: 10.3389/fmed.2023.1259640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2023] [Accepted: 12/07/2023] [Indexed: 01/09/2024] Open
Abstract
Purpose This study aims to investigate the opportunities and challenges of adopting ChatGPT in medical research. Methods A qualitative approach with focus groups was adopted. A total of 62 participants, including academic researchers from different streams of medicine and eHealth, took part in the study. Results Five themes with 16 sub-themes related to opportunities, and five themes with 12 sub-themes related to challenges, were identified. The major opportunities include improved data collection and analysis, improved communication and accessibility, and support for researchers across multiple streams of medical research. The major challenges identified were limitations of training data leading to bias, ethical issues, technical limitations, and limitations in data collection and analysis. Conclusion Although ChatGPT can be used as a potential tool in medical research, further evidence is needed to generalize its impact on different research activities.
Collapse
Affiliation(s)
- Abeer Alsadhan
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Fahad Al-Anezi
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Asmaa Almohanna
- Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Norah Alnaim
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | | | | | - Hoda AboAlsamh
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | | | - Maha Alenazy
- King Saud University, Riyadh, Riyadh, Saudi Arabia
| | - Wejdan Arif
- King Saud University, Riyadh, Riyadh, Saudi Arabia
| | | | | | | | - Nour AlShrayfi
- Public Authority for Applied Education and Training, Kuwait City, Kuwait
| | | | - Turki Alanzi
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Alaa AlSahli
- King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
| | - Rasha Alturki
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | | |
Collapse
|
34
|
Alotaibi SS, Rehman A, Hasnain M. Revolutionizing ocular cancer management: a narrative review on exploring the potential role of ChatGPT. Front Public Health 2023; 11:1338215. [PMID: 38192545 PMCID: PMC10773849 DOI: 10.3389/fpubh.2023.1338215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024] Open
Abstract
This paper pioneers the exploration of ocular cancer and its management with the help of artificial intelligence (AI) technology. Existing literature reports a significant increase in new eye cancer cases in 2023, reflecting a higher incidence rate. Extensive research was conducted using online databases such as PubMed, ACM Digital Library, ScienceDirect, and Springer. The review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Of the 62 studies collected, only 20 documents met the inclusion criteria. The review identifies seven ocular cancer types. Important challenges associated with ocular cancer are highlighted, including limited awareness about eye cancer, restricted healthcare access, financial barriers, and insufficient infrastructure support; financial barriers are among the most widely examined of these challenges in the literature. The potential role and limitations of ChatGPT are discussed, emphasizing its usefulness in providing general information to physicians while noting its inability to deliver up-to-date information. The paper concludes by presenting potential future applications of ChatGPT to advance research on ocular cancer globally.
Collapse
Affiliation(s)
- Saud S. Alotaibi
- Information Systems Department, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Amna Rehman
- Department of Computer Science, Lahore Leads University, Lahore, Pakistan
| | - Muhammad Hasnain
- Department of Computer Science, Lahore Leads University, Lahore, Pakistan
| |
Collapse
|
35
|
Madrid-García A, Rosales-Rosado Z, Freites-Nuñez D, Pérez-Sancristóbal I, Pato-Cour E, Plasencia-Rodríguez C, Cabeza-Osorio L, Abasolo-Alcázar L, León-Mateos L, Fernández-Gutiérrez B, Rodríguez-Rodríguez L. Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training. Sci Rep 2023; 13:22129. [PMID: 38092821 PMCID: PMC10719375 DOI: 10.1038/s41598-023-49483-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 12/08/2023] [Indexed: 12/17/2023] Open
Abstract
The emergence of large language models (LLMs) with remarkable performance, such as ChatGPT and GPT-4, has led to unprecedented uptake in the population. One of their most promising and studied applications concerns education, owing to their ability to understand and generate human-like text, which creates a multitude of opportunities for enhancing educational practices and outcomes. The objective of this study is twofold: to assess the accuracy of ChatGPT/GPT-4 in answering rheumatology questions from the access exam to specialized medical training in Spain (MIR), and to evaluate the medical reasoning these LLMs follow to answer those questions. A dataset of 145 rheumatology-related questions, RheumaMIR, extracted from the exams held between 2010 and 2023, was created for that purpose, used as prompts for the LLMs, and publicly distributed. Six rheumatologists with clinical and teaching experience evaluated the clinical reasoning of the chatbots using a 5-point Likert scale, and their degree of agreement was analyzed. The association between variables that could influence the models' accuracy (i.e., year of the exam question, disease addressed, type of question, and genre) was also studied. ChatGPT demonstrated a high level of performance in both accuracy (66.43%) and clinical reasoning (median (Q1-Q3), 4.5 (2.33-4.67)). However, GPT-4 performed better, with an accuracy of 93.71% and a median clinical reasoning value of 4.67 (4.5-4.83). These findings suggest that LLMs may serve as valuable tools in rheumatology education, aiding in exam preparation and supplementing traditional teaching methods.
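As a rough illustration of this kind of pipeline (not the authors' published code), the sketch below submits one exam-style multiple-choice question to a chat model through the OpenAI Python client (v1.x API assumed) and scores the answer against a key. The model name, prompt wording, and the question itself are illustrative placeholders, not items from RheumaMIR.

```python
# Minimal sketch: querying a chat model with exam-style questions and computing accuracy.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    {
        "stem": "Which drug is usually first-line for newly diagnosed rheumatoid arthritis?",
        "options": {"A": "A biologic DMARD", "B": "Methotrexate",
                    "C": "Long-term glucocorticoids", "D": "NSAIDs only"},
        "answer": "B",
    },
]

correct = 0
for q in questions:
    prompt = (q["stem"] + "\n"
              + "\n".join(f"{k}. {v}" for k, v in q["options"].items())
              + "\nAnswer with a single letter.")
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    choice = reply.choices[0].message.content.strip()[0].upper()
    correct += int(choice == q["answer"])

print(f"Accuracy: {correct / len(questions):.2%}")
```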
Collapse
Affiliation(s)
- Alfredo Madrid-García
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain.
| | - Zulema Rosales-Rosado
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Dalifer Freites-Nuñez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Inés Pérez-Sancristóbal
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Esperanza Pato-Cour
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | | | - Luis Cabeza-Osorio
- Medicina Interna, Hospital Universitario del Henares, Avenida de Marie Curie, 0, 28822, Madrid, Spain
- Facultad de Medicina, Universidad Francisco de Vitoria, Carretera Pozuelo, Km 1800, 28223, Madrid, Spain
| | - Lydia Abasolo-Alcázar
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Leticia León-Mateos
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Benjamín Fernández-Gutiérrez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Facultad de Medicina, Universidad Complutense de Madrid, Madrid, Spain
| | - Luis Rodríguez-Rodríguez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| |
Collapse
|
36
|
Wang G, Liu Q, Chen G, Xia B, Zeng D, Chen G, Guo C. AI's deep dive into complex pediatric inguinal hernia issues: a challenge to traditional guidelines? Hernia 2023; 27:1587-1599. [PMID: 37843604 DOI: 10.1007/s10029-023-02900-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 09/19/2023] [Indexed: 10/17/2023]
Abstract
OBJECTIVE This study utilized ChatGPT, an artificial intelligence program based on large language models, to explore controversial issues in pediatric inguinal hernia surgery and compare its responses with the guidelines of the European Association of Pediatric Surgeons (EUPSA). METHODS Six contentious issues raised by EUPSA were submitted to ChatGPT 4.0 for analysis, and two independent responses were generated for each issue. These generated answers were then compared with systematic reviews and guidelines. To ensure content accuracy and reliability, a content analysis was conducted and expert evaluations were solicited for validation. The content analysis evaluated the consistency or discrepancy between ChatGPT 4.0's responses and the guidelines; an expert scoring method assessed the quality, reliability, and applicability of the responses; and a TF-IDF model tested the stability and consistency of the two responses. RESULTS The responses generated by ChatGPT 4.0 were mostly consistent with the guidelines, although some differences and contradictions were noted. The average quality score was 3.33, the reliability score 2.75, and the applicability score 3.46 (out of 5). The average similarity between the two responses was 0.72 (out of 1). Content analysis and expert ratings yielded consistent conclusions, enhancing the credibility of our research. CONCLUSION ChatGPT can provide valuable responses to clinical questions, but it has limitations and requires further improvement. It is recommended to combine ChatGPT with other reliable data sources to improve clinical practice and decision-making.
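The similarity figure reported above comes from a TF-IDF comparison of the two independently generated answers. A minimal sketch of that idea (not the study's code; the two responses below are invented placeholders) using scikit-learn:

```python
# Minimal sketch: TF-IDF cosine similarity between two independently generated
# ChatGPT answers, as a rough proxy for response consistency.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

response_1 = ("Contralateral exploration is not routinely recommended; "
              "most guidelines favour watchful waiting for the unaffected side.")
response_2 = ("Routine contralateral exploration is generally discouraged, "
              "with observation preferred for the asymptomatic side.")

tfidf = TfidfVectorizer().fit_transform([response_1, response_2])
similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"TF-IDF cosine similarity: {similarity:.2f}")  # 1.0 means identical wording
```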
Collapse
Affiliation(s)
- G Wang
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Pediatrics, Children's Hospital, Chongqing Medical University, Chongqing, People's Republic of China
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China
| | - Q Liu
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
| | - G Chen
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
| | - B Xia
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
| | - D Zeng
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
| | - G Chen
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China.
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
- Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Women and Children's Hospital of Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
| | - C Guo
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China.
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
| |
Collapse
|
37
|
Chatterjee S, Bhattacharya M, Pal S, Lee SS, Chakraborty C. ChatGPT and large language models in orthopedics: from education and surgery to research. J Exp Orthop 2023; 10:128. [PMID: 38038796 PMCID: PMC10692045 DOI: 10.1186/s40634-023-00700-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023] Open
Abstract
ChatGPT has quickly gained popularity since its release in November 2022. Large language models (LLMs) and ChatGPT have already been applied in various domains of medical science, including cardiology, nephrology, orthopedics, ophthalmology, gastroenterology, and radiology, and researchers are exploring their potential for clinicians and surgeons in every domain. This study discusses how ChatGPT can help orthopedic clinicians and surgeons perform various medical tasks. LLMs and ChatGPT can help the patient community by providing suggestions and diagnostic guidelines. The use of LLMs and ChatGPT to enhance and expand the field of orthopedics, including orthopedic education, surgery, and research, is explored. Current LLMs have several shortcomings, which are discussed herein; however, next-generation and domain-specific LLMs are expected to be more capable and to transform patients' quality of life.
Collapse
Affiliation(s)
- Srijan Chatterjee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
| | - Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, 756020, Odisha, India
| | - Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | - Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea.
| | - Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India.
| |
Collapse
|
38
|
Al-Dujaili Z, Omari S, Pillai J, Al Faraj A. Assessing the accuracy and consistency of ChatGPT in clinical pharmacy management: A preliminary analysis with clinical pharmacy experts worldwide. Res Social Adm Pharm 2023; 19:1590-1594. [PMID: 37696742 DOI: 10.1016/j.sapharm.2023.08.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Revised: 08/30/2023] [Accepted: 08/30/2023] [Indexed: 09/13/2023]
Abstract
BACKGROUND The ChatGPT conversational system has ushered in a revolutionary new era of information retrieval and stands as one of the fastest-growing platforms. Clinical pharmacy, as a dynamic discipline, necessitates an advanced comprehension of drugs and diseases, and decision-making in clinical pharmacy demands accuracy and consistency in medical information because it directly affects patient safety. OBJECTIVE The objective was to evaluate ChatGPT's accuracy and consistency in managing pharmacotherapy cases across multiple time points. Additionally, input was gathered from clinical pharmacy experts worldwide, and the agreement between ChatGPT's responses and those of the experts was assessed. METHODS A set of 20 pharmacotherapy cases was entered into ChatGPT at three different time points. Inter-rater reliability was used to measure the accuracy of the output generated by ChatGPT at each time point, and test-retest reliability was used to measure the consistency of the output across the three time points. Pharmacy expert performance was evaluated, and the overall results were compared. RESULTS ChatGPT achieved a hit rate of 70.83% at week 1, 79.2% at week 3, and 75% at week 5. The percent agreement was 79.2% between weeks 1 and 3, 87.5% between weeks 3 and 5, and 83.3% between weeks 1 and 5. In contrast, accuracy rates among clinical pharmacy experts varied considerably according to their geographic location. The highest agreement between clinical pharmacist responses and ChatGPT responses was observed at the last time point examined. CONCLUSIONS Overall, the analysis suggested that ChatGPT is capable of generating clinically relevant pharmaceutical information, albeit with some variation in accuracy and consistency. It should be noted that clinical pharmacy experts worldwide may provide varying degrees of accuracy depending on their expertise. This study highlights the potential of AI chatbots in clinical pharmacy.
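The test-retest comparison described above rests on simple agreement statistics. The sketch below (Python; not the study's code, with invented correct/incorrect labels for the 20 cases) shows how percent agreement between two time points can be computed, together with Cohen's kappa as a chance-corrected alternative:

```python
# Minimal sketch: test-retest consistency of chatbot answers across two time points.
from sklearn.metrics import cohen_kappa_score

# 1 = case answered correctly, 0 = not; placeholder values for 20 cases.
week1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
week3 = [1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

agreement = sum(a == b for a, b in zip(week1, week3)) / len(week1)
kappa = cohen_kappa_score(week1, week3)  # chance-corrected agreement
print(f"Percent agreement: {agreement:.1%}  |  Cohen's kappa: {kappa:.2f}")
```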
Collapse
Affiliation(s)
- Zahraa Al-Dujaili
- College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq
| | - Sarah Omari
- Department of Epidemiology and Population Health, American University of Beirut (AUB), Beirut, Lebanon
| | - Jey Pillai
- College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq
| | - Achraf Al Faraj
- College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq.
| |
Collapse
|
39
|
Wei Y, Guo L, Lian C, Chen J. ChatGPT: Opportunities, risks and priorities for psychiatry. Asian J Psychiatr 2023; 90:103808. [PMID: 37898100 DOI: 10.1016/j.ajp.2023.103808] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Revised: 10/18/2023] [Accepted: 10/22/2023] [Indexed: 10/30/2023]
Abstract
The advancement of large language models such as ChatGPT opens new possibilities in psychiatry but also invites scrutiny. This paper examines the potential opportunities, risks, and crucial areas of focus in this domain. The active engagement of the mental health community is seen as critical to ensuring ethical practice, equal access, and a patient-centric approach.
Collapse
Affiliation(s)
- Yaohui Wei
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Department of Psychiatry and Psychotherapy, University Hospital Rechts der Isar, Technical University of Munich, Munich, Germany
- Lei Guo
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Cheng Lian
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jue Chen
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
40
Morita P, Abhari S, Kaur J. Do ChatGPT and Other Artificial Intelligence Bots Have Applications in Health Policy-Making? Opportunities and Threats. Int J Health Policy Manag 2023; 12:8131. [PMID: 38618768 PMCID: PMC10843407 DOI: 10.34172/ijhpm.2023.8131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2023] [Accepted: 10/16/2023] [Indexed: 04/16/2024] Open
Affiliation(s)
- Plinio Morita
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
- Research Institute for Aging, University of Waterloo, Waterloo, ON, Canada
- Centre for Digital Therapeutics, Techna Institute, University Health Network, Toronto, ON, Canada
- Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Shahabeddin Abhari
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
- Jasleen Kaur
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
41
Wong RSY, Ming LC, Raja Ali RA. The Intersection of ChatGPT, Clinical Medicine, and Medical Education. JMIR MEDICAL EDUCATION 2023; 9:e47274. [PMID: 37988149 DOI: 10.2196/47274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/14/2023] [Revised: 06/16/2023] [Accepted: 06/30/2023] [Indexed: 11/22/2023]
Abstract
As we progress deeper into the digital age, the robust development and application of advanced artificial intelligence (AI) technology, specifically generative language models such as ChatGPT (OpenAI), have potential implications for all sectors, including medicine. This viewpoint article aims to present the authors' perspective on the integration of AI models such as ChatGPT into clinical medicine and medical education. The unprecedented capacity of ChatGPT to generate human-like responses, refined through reinforcement learning from human feedback, could significantly reshape pedagogical methodologies within medical education. Through a comprehensive review and the authors' personal experiences, this viewpoint article elucidates the pros, cons, and ethical considerations of using ChatGPT within clinical medicine and, notably, its implications for medical education. This exploration is crucial in a transformative era in which AI could augment human capability in knowledge creation and dissemination, potentially revolutionizing medical education and clinical practice. The importance of maintaining academic integrity and professional standards is highlighted, as is the relevance of establishing clear guidelines for the responsible and ethical use of AI technologies in clinical medicine and medical education.
Affiliation(s)
- Rebecca Shin-Yee Wong
- Department of Medical Education, School of Medical and Life Sciences, Sunway University, Selangor, Malaysia
- Faculty of Medicine, Nursing and Health Sciences, SEGi University, Petaling Jaya, Malaysia
- Long Chiau Ming
- School of Medical and Life Sciences, Sunway University, Selangor, Malaysia
- Raja Affendi Raja Ali
- School of Medical and Life Sciences, Sunway University, Selangor, Malaysia
- GUT Research Group, Faculty of Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
42
Franco D'Souza R, Amanullah S, Mathew M, Surapaneni KM. Appraising the performance of ChatGPT in psychiatry using 100 clinical case vignettes. Asian J Psychiatr 2023; 89:103770. [PMID: 37812998 DOI: 10.1016/j.ajp.2023.103770] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/20/2023] [Revised: 09/13/2023] [Accepted: 09/18/2023] [Indexed: 10/11/2023]
Abstract
BACKGROUND: ChatGPT has emerged as the most advanced and rapidly developing large language chatbot system. With immense potential ranging from answering simple queries to cracking highly competitive medical exams, ChatGPT continues to impress scientists and researchers worldwide, prompting further discussion of its utility in various fields. One such field is psychiatry. With suboptimal diagnosis and treatment, assuring mental health and well-being is a challenge in many countries, particularly developing nations. In this regard, we evaluated the performance of ChatGPT 3.5 in psychiatry using clinical cases to provide evidence-based information on its implications for enhancing mental health and well-being.
METHODS: In this experimental study, ChatGPT 3.5 was used to initiate conversations and collect responses to clinical vignettes in psychiatry. Using 100 clinical case vignettes representing 100 different psychiatric illnesses, the replies were assessed by expert faculty from the Department of Psychiatry. We recorded and assessed the initial ChatGPT 3.5 responses. The evaluation was based on the objectives of the questions posed at the conclusion of each case, which were divided into 10 categories. Grading was completed by taking the mean of the scores provided by the evaluators, and the grades were presented in graphs and tables.
RESULTS: The evaluation suggests that ChatGPT 3.5 fared extremely well in psychiatry, receiving "Grade A" ratings in 61 of 100 cases, "Grade B" ratings in 31, and "Grade C" ratings in 8. The majority of queries concerned management strategies, followed by diagnosis, differential diagnosis, assessment, investigation, counselling, clinical reasoning, ethical reasoning, prognosis, and request acceptance. ChatGPT 3.5 performed extremely well, especially in generating management strategies and diagnoses for different psychiatric conditions. No responses were graded "D", indicating that there were no errors in diagnosis or in responses for clinical care; only a few discrepancies and missing details were noted in the responses that received a "Grade C".
CONCLUSION: It is evident from our study that ChatGPT 3.5 has appreciable knowledge and interpretation skills in psychiatry. ChatGPT 3.5 therefore has the potential to transform the field of medicine, and we emphasize its utility in psychiatry through the findings of our study. However, for any AI model to be successful, reliability, validation of information, proper guidelines, and an implementation framework must be assured.
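As a rough illustration of the grading approach described above (the mean of evaluator scores mapped to a letter grade), here is a minimal Python sketch; the scoring scale, cut-offs, and vignette data are assumptions for demonstration and are not taken from the study.

```python
# Minimal sketch (hypothetical scores and cut-offs): average the evaluators' ratings for
# each vignette and map the mean to a letter grade.

from statistics import mean

def letter_grade(scores, cutoffs=((3.5, "A"), (2.5, "B"), (1.5, "C"))):
    """Map the mean of evaluator scores (assumed 1-4 scale) to a letter grade."""
    avg = mean(scores)
    for threshold, grade in cutoffs:
        if avg >= threshold:
            return grade
    return "D"

vignette_scores = {
    "Case 001 (major depressive disorder)": [4, 4, 3],
    "Case 002 (schizophrenia)": [3, 2, 3],
    "Case 003 (obsessive-compulsive disorder)": [2, 2, 1],
}

for case, scores in vignette_scores.items():
    print(f"{case}: mean={mean(scores):.2f}, grade={letter_grade(scores)}")
```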
Affiliation(s)
- Russell Franco D'Souza
- Professor of Organizational Psychological Medicine, International Institute of Organisational Psychological Medicine, 71 Cleeland Street, Dandenong Victoria, Melbourne, 3175 Australia
- Shabbir Amanullah
- Division of Geriatric Psychiatry, Queen's University, 752 King Street West, Postal Bag 603 Kingston, ON K7L7X3
- Mary Mathew
- Department of Pathology, Kasturba Medical College, Manipal Academy of Higher Education, Tiger Circle Road, Madhav Nagar, Manipal, Karnataka 576104
- Krishna Mohan Surapaneni
- Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai - 600 123, Tamil Nadu, India; Departments of Medical Education, Molecular Virology, Research, Clinical Skills & Simulation, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai - 600 123, Tamil Nadu, India
43
Mese I, Taslicay CA, Sivrioglu AK. Improving radiology workflow using ChatGPT and artificial intelligence. Clin Imaging 2023; 103:109993. [PMID: 37812965 DOI: 10.1016/j.clinimag.2023.109993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 08/19/2023] [Accepted: 09/28/2023] [Indexed: 10/11/2023]
Abstract
Artificial intelligence is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. One of its branches is natural language processing, which is dedicated to studying the interaction between computers and human language. ChatGPT is a sophisticated natural language processing tool that can understand and respond to complex questions and commands in natural language. Radiology is a vital part of modern medicine that uses imaging technologies to diagnose and treat medical conditions. Artificial intelligence, including ChatGPT, can be integrated into radiology workflows to improve efficiency, accuracy, and patient care. ChatGPT can streamline various radiology workflow steps, including patient registration, scheduling, patient check-in, image acquisition, interpretation, and reporting. While ChatGPT has the potential to transform radiology workflows, the technology has limitations that must be addressed, such as the potential for bias in artificial intelligence algorithms and ethical concerns. As the technology continues to advance, ChatGPT is likely to become an increasingly important tool in radiology and in healthcare more broadly.
Affiliation(s)
- Ismail Mese
- Department of Radiology, Health Sciences University, Erenkoy Mental Health and Neurology Training and Research Hospital, 19 Mayıs, Sinan Ercan Cd. No: 23, Kadıköy/Istanbul 34736, Turkey
- Ali Kemal Sivrioglu
- Department of Radiology, Liv Hospital Vadistanbul, Ayazağa Mahallesi, Kemerburgaz Caddesi, Vadistanbul Park Etabı, 7F Blok, 34396 Sarıyer/İstanbul, Turkey
44
Chakraborty C, Pal S, Bhattacharya M, Dash S, Lee SS. Overview of Chatbots with special emphasis on artificial intelligence-enabled ChatGPT in medical science. Front Artif Intell 2023; 6:1237704. [PMID: 38028668 PMCID: PMC10644239 DOI: 10.3389/frai.2023.1237704] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 10/05/2023] [Indexed: 12/01/2023] Open
Abstract
The release of ChatGPT has prompted new thinking about AI-based chatbots and their applications and has drawn huge public attention worldwide. Over the past few months, researchers and doctors have begun considering the promise and application of AI-based large language models in medicine. This comprehensive review provides an overview of chatbots and ChatGPT and their current role in medicine. First, the general idea of chatbots, their evolution, architecture, and medical uses are discussed. Second, ChatGPT is discussed with special emphasis on its application in medicine, its architecture and training methods, medical diagnosis and treatment, research-related ethical issues, and a comparison with other NLP models. The article also discusses the limitations and prospects of ChatGPT. In the future, these large language models and ChatGPT will hold immense promise in healthcare; however, more research is needed in this direction.
Affiliation(s)
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, India
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Snehasish Dash
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Sang-Soo Lee
- Institute for Skeletal Aging and Orthopedic Surgery, Hallym University Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do, Republic of Korea
45
Wilhelm TI, Roos J, Kaczmarczyk R. Large Language Models for Therapy Recommendations Across 3 Clinical Specialties: Comparative Study. J Med Internet Res 2023; 25:e49324. [PMID: 37902826 PMCID: PMC10644179 DOI: 10.2196/49324] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Revised: 09/12/2023] [Accepted: 09/29/2023] [Indexed: 10/31/2023] Open
Abstract
BACKGROUND: As advancements in artificial intelligence (AI) continue, large language models (LLMs) have emerged as promising tools for generating medical information. Their rapid adaptation and potential benefits in health care require rigorous assessment in terms of the quality, accuracy, and safety of the generated information across diverse medical specialties.
OBJECTIVE: This study aimed to evaluate the performance of 4 prominent LLMs, namely, Claude-instant-v1.0, GPT-3.5-Turbo, Command-xlarge-nightly, and Bloomz, in generating medical content spanning the clinical specialties of ophthalmology, orthopedics, and dermatology.
METHODS: Three domain-specific physicians evaluated the AI-generated therapeutic recommendations for a diverse set of 60 diseases. The evaluation criteria involved the mDISCERN score, correctness, and potential harmfulness of the recommendations. ANOVA and pairwise t tests were used to explore discrepancies in content quality and safety across models and specialties. Additionally, using the capabilities of OpenAI's most advanced model, GPT-4, an automated evaluation of each model's responses to the diseases was performed using the same criteria and compared to the physicians' assessments through Pearson correlation analysis.
RESULTS: Claude-instant-v1.0 emerged with the highest mean mDISCERN score (3.35, 95% CI 3.23-3.46). In contrast, Bloomz lagged with the lowest score (1.07, 95% CI 1.03-1.10). Our analysis revealed significant differences among the models in terms of quality (P<.001). Evaluating their reliability, the models displayed strong contrasts in their falseness ratings, with variations both across models (P<.001) and specialties (P<.001). Distinct error patterns emerged, such as confusing diagnoses; providing vague, ambiguous advice; or omitting critical treatments, such as antibiotics for infectious diseases. Regarding potential harm, GPT-3.5-Turbo was found to be the safest, with the lowest harmfulness rating. All models lagged in detailing the risks associated with treatment procedures, explaining the effects of therapies on quality of life, and offering additional sources of information. Pearson correlation analysis underscored a substantial alignment between physician assessments and GPT-4's evaluations across all established criteria (P<.01).
CONCLUSIONS: This study, while comprehensive, was limited by the involvement of a select number of specialties and physician evaluators. The straightforward prompting strategy ("How to treat…") and the assessment benchmarks, initially conceptualized for human-authored content, might have potential gaps in capturing the nuances of AI-driven information. The LLMs evaluated showed a notable capability in generating valuable medical content; however, evident lapses in content quality and potential harm signal the need for further refinements. Given the dynamic landscape of LLMs, this study's findings emphasize the need for regular and methodical assessments, oversight, and fine-tuning of these AI tools to ensure they produce consistently trustworthy and clinically safe medical advice. Notably, the introduction of an auto-evaluation mechanism using GPT-4, as detailed in this study, provides a scalable, transferable method for domain-agnostic evaluations, extending beyond therapy recommendation assessments.
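For readers who want to run this kind of analysis on their own ratings, a minimal Python sketch of the statistical steps named above (one-way ANOVA across models, a pairwise t test, and a Pearson correlation between human and automated evaluations) might look as follows; the scores are synthetic stand-ins, not the study's data.

```python
# Minimal sketch (synthetic scores, not the study's data): one-way ANOVA across models,
# a pairwise t test between two models, and a Pearson correlation between physician
# ratings and an automated evaluation.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical quality scores for 60 diseases per model
scores = {
    "claude-instant-v1.0": rng.normal(3.3, 0.4, 60),
    "gpt-3.5-turbo": rng.normal(3.0, 0.5, 60),
    "command-xlarge-nightly": rng.normal(2.6, 0.6, 60),
    "bloomz": rng.normal(1.1, 0.2, 60),
}

f_stat, p_anova = stats.f_oneway(*scores.values())
print(f"ANOVA across models: F={f_stat:.2f}, p={p_anova:.2e}")

t_stat, p_t = stats.ttest_ind(scores["claude-instant-v1.0"], scores["gpt-3.5-turbo"])
print(f"Pairwise t test (Claude vs GPT-3.5): t={t_stat:.2f}, p={p_t:.2e}")

# Agreement between physician ratings and an automated evaluation (correlated by construction)
physician = rng.normal(3.0, 0.5, 60)
auto_eval = physician + rng.normal(0.0, 0.3, 60)
r, p_r = stats.pearsonr(physician, auto_eval)
print(f"Pearson r (physician vs automated evaluation): r={r:.2f}, p={p_r:.2e}")
```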
Affiliation(s)
- Theresa Isabelle Wilhelm
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Medical Graduate Center, School of Medicine, Technical University of Munich, Munich, Germany
- Jonas Roos
- Department of Orthopedics and Trauma Surgery, University Hospital of Bonn, Bonn, Germany
- Robert Kaczmarczyk
- Department of Dermatology and Allergy, School of Medicine, Technical University of Munich, Munich, Germany
- Division of Dermatology and Venerology, Department of Medicine Solna, Karolinska Institutet, Solna, Sweden
46
Jin Y, Liu H, Zhao B, Pan W. ChatGPT and mycosis- a new weapon in the knowledge battlefield. BMC Infect Dis 2023; 23:731. [PMID: 37891532 PMCID: PMC10605453 DOI: 10.1186/s12879-023-08724-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023] Open
Abstract
Reflecting a current trend in physician tools, ChatGPT can sift through massive amounts of information and solve problems through easy-to-understand conversations, ultimately improving efficiency. Mycosis currently faces great challenges, including high fungal burdens, high mortality, a limited choice of antifungal drugs, and increasing drug resistance. To address these challenges, we posed fungal-infection scenario-based questions to ChatGPT and assessed its appropriateness, consistency, and potential pitfalls. We concluded that ChatGPT can provide compelling responses to most prompts, including diagnosis, recommendations for examination, treatment, and rational drug use. Moreover, we summarized exciting future applications in mycosis, such as clinical work, scientific research, education, and healthcare. However, the largest barriers to implementation are deficits in individual advice, timely literature updates, consistency, accuracy, and data safety. To fully embrace the opportunity, we need to address these barriers and manage the risks. We expect that ChatGPT will become a new weapon on the battlefield of mycosis.
Affiliation(s)
- Yi Jin
- Department of Dermatology, Shanghai Key Laboratory of Medical Mycology, Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, P.R. China
- Hua Liu
- Department of Anesthesiology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Bin Zhao
- Department of Anesthesiology and SICU, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200092, P.R. China
- Weihua Pan
- Department of Dermatology, Shanghai Key Laboratory of Medical Mycology, Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, P.R. China
47
Griewing S, Gremke N, Wagner U, Lingenfelder M, Kuhn S, Boekhoff J. Challenging ChatGPT 3.5 in Senology-An Assessment of Concordance with Breast Cancer Tumor Board Decision Making. J Pers Med 2023; 13:1502. [PMID: 37888113 PMCID: PMC10608120 DOI: 10.3390/jpm13101502] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 10/13/2023] [Accepted: 10/13/2023] [Indexed: 10/28/2023] Open
Abstract
With the recent diffusion of access to publicly available large language models (LLMs), common interest in generative artificial-intelligence-based applications for medical purposes has skyrocketed. The increased use of these models by tech-savvy patients for personal health issues calls for a scientific evaluation of whether LLMs provide a satisfactory level of accuracy for treatment decisions. This observational study compares the concordance of treatment recommendations from the popular LLM ChatGPT 3.5 with those of a multidisciplinary tumor board for breast cancer (MTB). The study design builds on previous findings by combining an extended input model with patient profiles reflecting patho- and immunomorphological diversity of primary breast cancer, including primary metastasis and precancerous tumor stages. Overall concordance between the LLM and MTB is reached for half of the patient profiles, including precancerous lesions. In the assessment of invasive breast cancer profiles, the concordance amounts to 58.8%. Nevertheless, as the LLM makes considerably fraudulent decisions at times, we do not identify the current development status of publicly available LLMs to be adequate as a support tool for tumor boards. Gynecological oncologists should familiarize themselves with the capabilities of LLMs in order to understand and utilize their potential while keeping in mind potential risks and limitations.
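To illustrate how a concordance rate of this kind can be computed once both sets of recommendations have been mapped to common categories, here is a minimal Python sketch; the patient profiles and treatment labels are invented for demonstration and are not taken from the study.

```python
# Minimal sketch (invented profiles and labels): concordance between LLM-suggested and
# tumor-board treatment recommendations after both have been mapped to common categories.

llm = {
    "profile_01": "surgery + endocrine therapy",
    "profile_02": "neoadjuvant chemotherapy",
    "profile_03": "surveillance",
    "profile_04": "surgery + radiotherapy",
}
tumor_board = {
    "profile_01": "surgery + endocrine therapy",
    "profile_02": "neoadjuvant chemotherapy",
    "profile_03": "surgery",
    "profile_04": "surgery + radiotherapy",
}

concordant = [p for p in llm if llm[p] == tumor_board[p]]
rate = len(concordant) / len(llm)
print(f"Concordance: {rate:.1%} ({len(concordant)}/{len(llm)} profiles)")
print("Discordant profiles:", sorted(set(llm) - set(concordant)))
```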
Affiliation(s)
- Sebastian Griewing
- Institute for Digital Medicine, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Department of Gynecology and Obstetrics, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Institute for Healthcare Management, Chair of General Business Administration, Philipps-University Marburg, Universitätsstraße 24, 35037 Marburg, Germany
- Niklas Gremke
- Department of Gynecology and Obstetrics, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Uwe Wagner
- Department of Gynecology and Obstetrics, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Michael Lingenfelder
- Institute for Healthcare Management, Chair of General Business Administration, Philipps-University Marburg, Universitätsstraße 24, 35037 Marburg, Germany
- Sebastian Kuhn
- Institute for Digital Medicine, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Jelena Boekhoff
- Department of Gynecology and Obstetrics, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
48
Alvarado R, Morar N. ChatGPT's Relevance for Bioethics: A Novel Challenge to the Intrinsically Relational, Critical, and Reason-Giving Aspect of Healthcare. THE AMERICAN JOURNAL OF BIOETHICS : AJOB 2023; 23:71-73. [PMID: 37812123 DOI: 10.1080/15265161.2023.2250305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/10/2023]
49
Garg RK, Urs VL, Agarwal AA, Chaudhary SK, Paliwal V, Kar SK. Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review. Health Promot Perspect 2023; 13:183-191. [PMID: 37808939 PMCID: PMC10558973 DOI: 10.34172/hpp.2023.22] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Accepted: 07/06/2023] [Indexed: 10/10/2023] Open
Abstract
Background: ChatGPT is an artificial intelligence-based tool developed by OpenAI (California, USA). This systematic review examines the potential of ChatGPT in patient care and its role in medical research.
Methods: The systematic review was conducted according to the PRISMA guidelines. The Embase, Scopus, PubMed, and Google Scholar databases were searched, as were preprint databases. The search aimed to identify all kinds of publications, without any restrictions, on ChatGPT and its application in medical research, medical publishing, and patient care, using the search term "ChatGPT". We reviewed all types of publications, including original articles, reviews, editorials/commentaries, and even letters to the editor. Each selected record was analysed using ChatGPT, and the responses generated were compiled into a table. The table was converted to a PDF and further analysed using ChatPDF.
Results: We reviewed the full texts of 118 articles. ChatGPT can assist with patient enquiries, note writing, decision-making, trial enrolment, data management, decision support, research support, and patient education. However, the solutions it offers are often insufficient and contradictory, raising questions about their originality, privacy, correctness, bias, and legality. Owing to its lack of human-like qualities, ChatGPT's legitimacy as an author is questioned when it is used for academic writing, and ChatGPT-generated content raises concerns about bias and possible plagiarism.
Conclusion: Although ChatGPT can help with patient care and research, there are issues with accuracy, authorship, and bias. ChatGPT can serve as a "clinical assistant" and an aid to research and scholarly writing.
Affiliation(s)
- Vijeth L Urs
- Department of Neurology, King George’s Medical University, Lucknow, India
- Vimal Paliwal
- Department of Neurology, Sanjay Gandhi Institute of Medical Sciences, Lucknow, India
- Sujita Kumar Kar
- Department of Psychiatry, King George’s Medical University, Lucknow, India
50
Ray PP. Peeking inside GPT-4 for medical research and practice. J Chin Med Assoc 2023; 86:866. [PMID: 37458382 DOI: 10.1097/jcma.0000000000000961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 09/09/2023] Open
Affiliation(s)
- Partha Pratim Ray
- Department of Computer Applications, Sikkim University, Gangtok, Sikkim, India