1. Mahedia M, Rohrich RN, Sadiq KO, Bailey L, Harrison LM, Hallac RR. Exploring the Utility of ChatGPT in Cleft Lip Repair Education. J Clin Med 2025; 14:993. PMID: 39941663; PMCID: PMC11818196; DOI: 10.3390/jcm14030993.
Abstract
Background/Objectives: The evolving capabilities of large language models, such as generative pre-trained transformers (ChatGPT), offer new avenues for disseminating health information online. These models, trained on extensive datasets, are designed to deliver customized responses to user queries. However, as these outputs are unsupervised, understanding their quality and accuracy is essential to gauge their reliability for potential applications in healthcare. This study evaluates responses generated by ChatGPT addressing common patient concerns and questions about cleft lip repair. Methods: Ten commonly asked questions about cleft lip repair procedures were selected from the American Society of Plastic Surgeons' patient information resources. These questions were input as ChatGPT prompts, and five board-certified plastic surgeons assessed the generated responses on quality of content, clarity, relevance, and trustworthiness using a 4-point Likert scale. Readability was evaluated using the Flesch reading ease score (FRES) and the Flesch-Kincaid grade level (FKGL). Results: ChatGPT responses scored an aggregated mean rating of 2.9 out of 4 across all evaluation criteria. Clarity and content quality received the highest ratings (3.1 ± 0.6), while trustworthiness had the lowest rating (2.7 ± 0.6). Readability metrics revealed a mean FRES of 44.35 and an FKGL of 10.87, corresponding to approximately a 10th-grade literacy standard. None of the responses contained grossly inaccurate or potentially harmful medical information, but all lacked citations. Conclusions: ChatGPT demonstrates potential as a supplementary tool for patient education in cleft lip management by delivering generally accurate, relevant, and understandable information. Despite the value that AI-powered tools can provide to clinicians and patients, the lack of human oversight underscores the importance of user awareness regarding its limitations.
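For context, the FRES and FKGL reported above are closed-form functions of average sentence length and average syllables per word. The Python sketch below is illustrative only: the vowel-group syllable counter is a rough heuristic and the sample sentence is hypothetical, not drawn from the study; it simply shows how dense clinical wording lowers the FRES and raises the FKGL.

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: one syllable per group of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_scores(text: str) -> tuple[float, float]:
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text) or ["placeholder"]
        syllables = sum(count_syllables(w) for w in words)
        wps = len(words) / sentences      # average words per sentence
        spw = syllables / len(words)      # average syllables per word
        fres = 206.835 - 1.015 * wps - 84.6 * spw    # Flesch reading ease score
        fkgl = 0.39 * wps + 11.8 * spw - 15.59       # Flesch-Kincaid grade level
        return fres, fkgl

    # Hypothetical patient-information sentence: polysyllabic clinical terms
    # push the FRES down and the FKGL up.
    sample = ("Cheiloplasty is typically performed under general anesthesia, and "
              "postoperative analgesia with meticulous wound care is recommended.")
    print(flesch_scores(sample))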
Affiliation(s)
- Monali Mahedia: Department of Surgery, Rutgers University—NJMS, Newark, NJ 07103, USA
- Rachel N. Rohrich: Department of Plastic and Reconstructive Surgery, MedStar Georgetown University Hospital, Washington, DC 20007, USA
- Lauren Bailey: Department of Plastic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Lucas M. Harrison: Department of Plastic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Rami R. Hallac: Department of Plastic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Analytical Imaging and Modeling Center, Children’s Health, Dallas, TX 75235, USA
2. Balta KY, Javidan AP, Walser E, Arntfield R, Prager R. Evaluating the Appropriateness, Consistency, and Readability of ChatGPT in Critical Care Recommendations. J Intensive Care Med 2025; 40:184-190. PMID: 39118320; PMCID: PMC11639400; DOI: 10.1177/08850666241267871.
Abstract
Background: We assessed 2 versions of the large language model (LLM) ChatGPT (versions 3.5 and 4.0) in generating appropriate, consistent, and readable recommendations on core critical care topics. Research Question: How do successive large language models compare in terms of generating appropriate, consistent, and readable recommendations on core critical care topics? Design and Methods: A set of 50 LLM-generated responses to clinical questions were evaluated by 2 independent intensivists based on a 5-point Likert scale for appropriateness, consistency, and readability. Results: ChatGPT 4.0 showed significantly higher median appropriateness scores compared to ChatGPT 3.5 (4.0 vs 3.0, P < .001). However, there was no significant difference in consistency between the 2 versions (40% vs 28%, P = 0.291). Readability, assessed by the Flesch-Kincaid Grade Level, was also not significantly different between the 2 models (14.3 vs 14.4, P = 0.93). Interpretation: Both models produced "hallucinations" (misinformation delivered with high confidence), which highlights the risk of relying on these tools without domain expertise. Despite potential for clinical application, both models lacked consistency, producing different results when asked the same question multiple times. The study underscores the need for clinicians to understand the strengths and limitations of LLMs for safe and effective implementation in critical care settings. Registration: https://osf.io/8chj7/.
Affiliation(s)
- Kaan Y. Balta: Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada
- Arshia P. Javidan: Division of Vascular Surgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Eric Walser: Division of Critical Care, London Health Sciences Centre, Western University, London, Ontario, Canada; Department of Surgery, Trauma Program, London Health Sciences Centre, London, Ontario, Canada
- Robert Arntfield: Division of Critical Care, London Health Sciences Centre, Western University, London, Ontario, Canada
- Ross Prager: Division of Critical Care, London Health Sciences Centre, Western University, London, Ontario, Canada
3. Tangsrivimol JA, Darzidehkalani E, Virk HUH, Wang Z, Egger J, Wang M, Hacking S, Glicksberg BS, Strauss M, Krittanawong C. Benefits, limits, and risks of ChatGPT in medicine. Front Artif Intell 2025; 8:1518049. PMID: 39949509; PMCID: PMC11821943; DOI: 10.3389/frai.2025.1518049.
Abstract
ChatGPT represents a transformative technology in healthcare, with demonstrated impacts across clinical practice, medical education, and research. Studies show significant efficiency gains, including a 70% reduction in administrative time for discharge summaries and achievement of medical professional-level performance on standardized tests (60% accuracy on USMLE, 78.2% on PubMedQA). In medical education, ChatGPT offers personalized learning platforms, automated scoring, and instant access to vast medical knowledge, addressing resource limitations and enhancing training efficiency. It streamlines clinical workflows by supporting triage processes, generating discharge summaries, and alleviating administrative burdens, allowing healthcare professionals to focus more on patient care. Additionally, ChatGPT facilitates remote monitoring and chronic disease management, providing personalized advice, medication reminders, and emotional support, thus bridging gaps between clinical visits. Its ability to process and synthesize vast amounts of data accelerates research workflows, aiding in literature reviews, hypothesis generation, and clinical trial designs. This paper aims to gather and analyze published studies involving ChatGPT, focusing on exploring its advantages and disadvantages within the healthcare context. To aid in understanding and progress, our analysis is organized into six key areas: (1) Information and Education, (2) Triage and Symptom Assessment, (3) Remote Monitoring and Support, (4) Mental Healthcare Assistance, (5) Research and Decision Support, and (6) Language Translation. Realizing ChatGPT's full potential in healthcare requires addressing key limitations, such as its lack of clinical experience, inability to process visual data, and absence of emotional intelligence. Ethical, privacy, and regulatory challenges further complicate its integration. Future improvements should focus on enhancing accuracy, developing multimodal AI models, improving empathy through sentiment analysis, and safeguarding against artificial hallucination. While not a replacement for healthcare professionals, ChatGPT can serve as a powerful assistant, augmenting their expertise to improve efficiency, accessibility, and quality of care. This collaboration ensures responsible adoption of AI in transforming healthcare delivery. While ChatGPT demonstrates significant potential in healthcare transformation, systematic evaluation of its implementation across different healthcare settings reveals varying levels of evidence quality, from robust randomized trials in medical education to preliminary observational studies in clinical practice. This heterogeneity in evidence quality necessitates a structured approach to future research and implementation.
Affiliation(s)
- Jonathan A. Tangsrivimol: Department of Neurosurgery and Neuroscience, Weill Cornell Medicine, NewYork-Presbyterian Hospital, New York, NY, United States; Department of Neurosurgery, Chulabhorn Hospital, Chulabhorn Royal Academy, Bangkok, Thailand
- Erfan Darzidehkalani: MIT Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, United States
- Hafeez Ul Hassan Virk: Harrington Heart & Vascular Institute, University Hospitals Cleveland Medical Center, Case Western Reserve University, Cleveland, OH, United States
- Zhen Wang: Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States; Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Jan Egger: Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Essen, Germany
- Michelle Wang: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Sean Hacking: Department of Pathology, NYU Grossman School of Medicine, New York, NY, United States
- Benjamin S. Glicksberg: Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Markus Strauss: Department of Cardiology I, Coronary and Peripheral Vascular Disease, Heart Failure Medicine, University Hospital Muenster, Muenster, Germany; Department of Cardiology, Sector Preventive Medicine, Health Promotion, Faculty of Health, School of Medicine, University Witten/Herdecke, Hagen, Germany
- Chayakrit Krittanawong: Cardiology Division, New York University Langone Health, New York University School of Medicine, New York, NY, United States; HumanX, Delaware, DE, United States
4. Zhang L, Zhao Q, Zhang D, Song M, Zhang Y, Wang X. Application of large language models in healthcare: A bibliometric analysis. Digit Health 2025; 11:20552076251324444. PMID: 40035041; PMCID: PMC11873863; DOI: 10.1177/20552076251324444.
Abstract
Objective The objective is to provide an overview of the application of large language models (LLMs) in healthcare by employing a bibliometric analysis methodology. Method We performed a comprehensive search for peer-reviewed English-language articles using PubMed and Web of Science. The selected articles were subsequently clustered and analyzed textually, with a focus on lexical co-occurrences, country-level and inter-author collaborations, and other relevant factors. This textual analysis produced high-level concept maps that illustrate specific terms and their interconnections. Findings Our final sample comprised 371 English-language journal articles. The study revealed a sharp rise in the number of publications related to the application of LLMs in healthcare. However, the development is geographically imbalanced, with a higher concentration of articles originating from developed countries like the United States, Italy, and Germany, which also exhibit strong inter-country collaboration. LLMs are applied across various specialties, with researchers investigating their use in medical education, diagnosis, treatment, administrative reporting, and enhancing doctor-patient communication. Nonetheless, significant concerns persist regarding the risks and ethical implications of LLMs, including the potential for gender and racial bias, as well as the lack of transparency in the training datasets, which can lead to inaccurate or misleading responses. Conclusion While the application of LLMs in healthcare is promising, the widespread adoption of LLMs in practice requires further improvements in their standardization and accuracy. It is critical to establish clear accountability guidelines, develop a robust regulatory framework, and ensure that training datasets are based on evidence-based sources to minimize risk and ensure ethical and reliable use.
Affiliation(s)
- Lanping Zhang: Department of the Third Pulmonary Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China; Shenzhen Clinical Research Center for Tuberculosis, Shenzhen, Guangdong Province, China
- Qing Zhao: Acacia Lab for Implementation Science, School of Public Health Management, Southern Medical University, Guangzhou, Guangdong, China
- Dandan Zhang: Department of the Third Pulmonary Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China; Shenzhen Clinical Research Center for Tuberculosis, Shenzhen, Guangdong Province, China
- Meijuan Song: Department of the Third Pulmonary Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China; Shenzhen Clinical Research Center for Tuberculosis, Shenzhen, Guangdong Province, China
- Yu Zhang: School of Humanities, Changzhou Vocational Institute of Textile and Garment, Changzhou, China
- Xiufen Wang: Department of the Third Pulmonary Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China; Shenzhen Clinical Research Center for Tuberculosis, Shenzhen, Guangdong Province, China
5. Hao J, Yao Z, Tang Y, Remis A, Wu K, Yu X. Artificial Intelligence in Physical Therapy: Evaluating ChatGPT's Role in Clinical Decision Support for Musculoskeletal Care. Ann Biomed Eng 2025; 53:9-13. PMID: 39760952; DOI: 10.1007/s10439-025-03676-4.
Abstract
BACKGROUND The integration of artificial intelligence into medicine has attracted increasing attention in recent years. ChatGPT has emerged as a promising tool for delivering evidence-based recommendations in various clinical domains. However, the application of ChatGPT to physical therapy for musculoskeletal conditions has yet to be investigated. METHODS Thirty clinical questions related to spinal, lower extremity, and upper extremity conditions were posed to ChatGPT-4. Responses were assessed for accuracy against clinical practice guidelines (CPGs) by two reviewers. Intra- and inter-rater reliability were measured using Fleiss' kappa (k). RESULTS ChatGPT's responses were consistent with CPG recommendations for 80% of the questions. Performance was highest for upper extremity conditions (100%) and lowest for spinal conditions (60%), with moderate performance for lower extremity conditions (87%). Intra-rater reliability was good (k = 0.698 and k = 0.631 for the two reviewers), and inter-rater reliability was very good (k = 0.847). CONCLUSION ChatGPT demonstrates promise as a supplementary decision-making support tool for physical therapy, with good accuracy and reliability in aligning with clinical practice guideline recommendations. Further research is needed to evaluate its performance across broader scenarios and refine its clinical applicability.
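The kappa values quoted above are chance-corrected agreement coefficients. As a generic illustration only (a two-rater Cohen's kappa on made-up ratings, not the study's Fleiss' kappa analysis or its data), agreement can be computed as follows:

    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        # Chance-corrected agreement between two raters over the same items.
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        categories = set(rater_a) | set(rater_b)
        expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
        return (observed - expected) / (1 - expected)

    # Hypothetical per-question judgments (1 = response consistent with the guideline).
    reviewer_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
    reviewer_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.524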
Affiliation(s)
- Jie Hao: Department of Physical Therapy and Rehabilitation, Southeast Colorado Hospital, Springfield, CO 81073, USA; Global Health Opportunities Program, University of Nebraska Medical Center, Omaha, NE, USA
- Zixuan Yao: Department of Rehabilitation Medicine, Beijing Hospital, National Center of Gerontology, Institution of Geriatric Medicine, Chinese Academy of Medical Science, Beijing 100051, People's Republic of China
- Yaogeng Tang: Program in Physical Therapy, Washington University in St. Louis, St. Louis, MO, USA
- Andréas Remis: Duke Health Center Arringdon, Duke University, Durham, NC, USA
- Kangchao Wu: School of Health Sciences, The University of Sydney, Sydney, NSW, Australia
- Xin Yu: Department of Rehabilitation Medicine, Beijing Jishuitan Hospital, Beijing, People's Republic of China
6. Huang AE, Chang MT, Khanwalkar A, Yan CH, Phillips KM, Yong MJ, Nayak JV, Hwang PH, Patel ZM. Utilization of ChatGPT for Rhinology Patient Education: Limitations in a Surgical Sub-Specialty. OTO Open 2025; 9:e70065. PMID: 39776758; PMCID: PMC11705442; DOI: 10.1002/oto2.70065.
Abstract
Objective To analyze the accuracy of ChatGPT-generated responses to common rhinologic patient questions. Methods Ten common questions from rhinology patients were compiled by a panel of 4 rhinology fellowship-trained surgeons based on clinical patient experience. This panel (Panel 1) developed consensus "expert" responses to each question. Questions were individually posed to ChatGPT (version 3.5) and its responses recorded. ChatGPT-generated responses were individually graded by Panel 1 on a scale of 0 (incorrect) to 3 (correct and exceeding the quality of expert responses). A second panel was given the consensus and ChatGPT responses to each question and asked to guess which response corresponded to which source. They then graded ChatGPT responses using the same criteria as Panel 1. Question-specific and overall mean grades for ChatGPT responses, as well as the intraclass correlation coefficient (ICC) as a measure of interrater reliability, were calculated. Results The overall mean grade for ChatGPT responses was 1.65/3. For 2 out of 10 questions, ChatGPT responses were equal to or better than expert responses. However, for 8 out of 10 questions, ChatGPT provided responses that were incorrect, false, or incomplete based on mean rater grades. Overall ICC was 0.526, indicating moderate reliability among raters of ChatGPT responses. Reviewers were able to discern ChatGPT from human responses with 97.5% accuracy. Conclusion This preliminary study demonstrates overall near-complete and variably accurate responses provided by ChatGPT to common rhinologic questions, demonstrating important limitations in nuanced subspecialty fields.
Affiliation(s)
- Alice E. Huang: Department of Otolaryngology–Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Michael T. Chang: Department of Otolaryngology–Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Ashoke Khanwalkar: Department of Otolaryngology–Head and Neck Surgery, University of Colorado Anschutz School of Medicine, Aurora, Colorado, USA
- Carol H. Yan: Department of Otolaryngology–Head and Neck Surgery, University of California-San Diego School of Medicine, San Diego, California, USA
- Katie M. Phillips: Department of Otolaryngology–Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Michael J. Yong: Department of Otolaryngology–Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Jayakar V. Nayak: Department of Otolaryngology–Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Peter H. Hwang: Department of Otolaryngology–Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Zara M. Patel: Department of Otolaryngology–Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
7. Leng L. Challenge, integration, and change: ChatGPT and future anatomical education. Med Educ Online 2024; 29:2304973. PMID: 38217884; PMCID: PMC10791098; DOI: 10.1080/10872981.2024.2304973.
Abstract
With the vigorous development of ChatGPT and its application in the field of education, a new era of the collaborative development of human and artificial intelligence and the symbiosis of education has come. Integrating artificial intelligence (AI) into medical education has the potential to revolutionize it. Large language models, such as ChatGPT, can be used as virtual teaching aids to provide students with individualized and immediate medical knowledge, and conduct interactive simulation learning and detection. In this paper, we discuss the application of ChatGPT in anatomy teaching and its various application levels based on our own teaching experiences, and discuss the advantages and disadvantages of ChatGPT in anatomy teaching. ChatGPT increases student engagement and strengthens students' ability to learn independently. At the same time, ChatGPT faces many challenges and limitations in medical education. Medical educators must keep pace with the rapid changes in technology, taking into account ChatGPT's impact on curriculum design, assessment strategies and teaching methods. Discussing the application of ChatGPT in medical education, especially anatomy teaching, is helpful to the effective integration and application of artificial intelligence tools in medical education.
Affiliation(s)
- Lige Leng: Fujian Provincial Key Laboratory of Neurodegenerative Disease and Aging Research, Institute of Neuroscience, School of Medicine, Xiamen University, Xiamen, Fujian, P.R. China
8. Zampatti S, Farro J, Peconi C, Cascella R, Strafella C, Calvino G, Megalizzi D, Trastulli G, Caltagirone C, Giardina E. AI-Powered Neurogenetics: Supporting Patient's Evaluation with Chatbot. Genes (Basel) 2024; 16:29. PMID: 39858576; PMCID: PMC11765031; DOI: 10.3390/genes16010029.
Abstract
BACKGROUND/OBJECTIVES Artificial intelligence and large language models like ChatGPT and Google's Gemini are promising tools with remarkable potential to assist healthcare professionals. This study explores ChatGPT and Gemini's potential utility in assisting clinicians during the first evaluation of patients with suspected neurogenetic disorders. METHODS By analyzing the model's performance in identifying relevant clinical features, suggesting differential diagnoses, and providing insights into possible genetic testing, this research seeks to determine whether these AI tools could serve as a valuable adjunct in neurogenetic assessments. Ninety questions were posed to ChatGPT (Versions 4o, 4, and 3.5) and Gemini: four questions about clinical diagnosis, seven about genetic inheritance, estimable recurrence risks, and available tests, and four questions about patient management, each for six different neurogenetic rare disorders (Hereditary Spastic Paraplegia type 4 and type 7, Huntington Disease, Fragile X-associated Tremor/Ataxia Syndrome, Becker Muscular Dystrophy, and FacioScapuloHumeral Muscular Dystrophy). RESULTS According to the results of this study, GPT chatbots demonstrated significantly better performance than Gemini. Nonetheless, all AI chatbots showed notable gaps in diagnostic accuracy and a concerning level of hallucinations. CONCLUSIONS As expected, these tools can empower clinicians in assessing neurogenetic disorders, yet their effective use demands meticulous collaboration and oversight from both neurologists and geneticists.
Affiliation(s)
- Stefania Zampatti: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Juliette Farro: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Cristina Peconi: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Raffaella Cascella: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Chemical-Toxicological and Pharmacological Evaluation of Drugs, Catholic University Our Lady of Good Counsel, 1000 Tirana, Albania
- Claudia Strafella: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Giulia Calvino: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Science, Roma Tre University, 00146 Rome, Italy
- Domenica Megalizzi: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
- Giulia Trastulli: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Systems Medicine, Tor Vergata University, 00133 Rome, Italy
- Carlo Caltagirone: Department of Clinical and Behavioral Neurology, IRCCS Fondazione Santa Lucia, 00179 Rome, Italy
- Emiliano Giardina: Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
9. Heisinger S, Salzmann SN, Senker W, Aspalter S, Oberndorfer J, Matzner MP, Stienen MN, Motov S, Huber D, Grohs JG. ChatGPT's Performance in Spinal Metastasis Cases-Can We Discuss Our Complex Cases with ChatGPT? J Clin Med 2024; 13:7864. PMID: 39768787; PMCID: PMC11727723; DOI: 10.3390/jcm13247864.
Abstract
Background: The integration of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT-4, is transforming healthcare. ChatGPT's potential to assist in decision-making for complex cases, such as spinal metastasis treatment, is promising but widely untested. Especially in cancer patients who develop spinal metastases, precise and personalized treatment is essential. This study examines ChatGPT-4's performance in treatment planning for spinal metastasis cases compared to experienced spine surgeons. Materials and Methods: Five spine metastasis cases were randomly selected from recent literature. Consequently, five spine surgeons and ChatGPT-4 were tasked with providing treatment recommendations for each case in a standardized manner. Responses were analyzed for frequency distribution, agreement, and subjective rater opinions. Results: ChatGPT's treatment recommendations aligned with the majority of human raters in 73% of treatment choices, with moderate to substantial agreement on systemic therapy, pain management, and supportive care. However, ChatGPT's recommendations tended towards generalized statements, with raters noting its generalized answers. Agreement among raters improved in sensitivity analyses excluding ChatGPT, particularly for controversial areas like surgical intervention and palliative care. Conclusions: ChatGPT shows potential in aligning with experienced surgeons on certain treatment aspects of spinal metastasis. However, its generalized approach highlights limitations, suggesting that training with specific clinical guidelines could potentially enhance its utility in complex case management. Further studies are necessary to refine AI applications in personalized healthcare decision-making.
Affiliation(s)
- Stephan Heisinger: Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Stephan N. Salzmann: Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Wolfgang Senker: Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Stefan Aspalter: Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Johannes Oberndorfer: Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Michael P. Matzner: Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Martin N. Stienen: Spine Center of Eastern Switzerland & Department of Neurosurgery, Kantonsspital St. Gallen, Medical School of St. Gallen, University of St. Gallen, 9000 St. Gallen, Switzerland
- Stefan Motov: Spine Center of Eastern Switzerland & Department of Neurosurgery, Kantonsspital St. Gallen, Medical School of St. Gallen, University of St. Gallen, 9000 St. Gallen, Switzerland
- Dominikus Huber: Division of Oncology, Department of Medicine I, Medical University of Vienna, 1090 Vienna, Austria
- Josef Georg Grohs: Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
10. Akyol Onder EN, Ensari E, Ertan P. ChatGPT-4o's performance on pediatric vesicoureteral reflux. J Pediatr Urol 2024:S1477-5131(24)00619-3. PMID: 39694777; DOI: 10.1016/j.jpurol.2024.12.002.
Abstract
INTRODUCTION Vesicoureteral reflux (VUR) is a common congenital or acquired urinary disorder in children. Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence-driven platform offering medical information. This research aims to assess the reliability and readability of ChatGPT-4o's answers regarding pediatric VUR for a general, non-medical audience. MATERIALS AND METHODS Twenty of the most frequently asked English-language questions about VUR in children were used to evaluate ChatGPT-4o's responses. Two independent reviewers rated the reliability and quality using the Global Quality Scale (GQS) and a modified version of the DISCERN tool. The readability of ChatGPT responses was assessed through the Flesch Reading Ease (FRE) Score, Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG). RESULTS Median mDISCERN and GQS scores were 4 (4-5) and 5 (3-5), respectively. Most ChatGPT responses showed moderate (55%) or good (45%) reliability according to the mDISCERN score and high quality (95%) according to the GQS. The mean ± standard deviation scores for FRE, FKGL, SMOG, GFI, and CLI of the text were 26 ± 12, 15 ± 2.5, 16.3 ± 2, 18.8 ± 2.9, and 15.3 ± 2.2, respectively, indicating a high level of reading difficulty. DISCUSSION While ChatGPT-4o offers accurate and high-quality information about pediatric VUR, its readability poses challenges, as the content is difficult for a general audience to understand. CONCLUSION ChatGPT provides high-quality, accessible information about VUR. However, improving readability should be a priority to make this information more user-friendly for a broader audience.
Affiliation(s)
- Esra Nagehan Akyol Onder: Aksaray University Training and Research Hospital, Department of Paediatric Nephrology, Aksaray, TR-68200, Turkey
- Esra Ensari: Antalya City Hospital, Department of Paediatric Nephrology, Antalya, TR-07080, Turkey
- Pelin Ertan: Manisa Celal Bayar University, School of Medicine, Department of Paediatric Nephrology, Manisa, TR-45010, Turkey
11. Malik J, Afzal MW, Khan SS, Umer MR, Fakhar B, Mehmoodi A. Role of Artificial Intelligence-assisted Decision Support Tool for Common Rhythm Disturbances: A ChatGPT Proof-of-concept Study. J Community Hosp Intern Med Perspect 2024; 14:5-9. PMID: 39839170; PMCID: PMC11745183; DOI: 10.55729/2000-9666.1402.
Abstract
Background The objective of this article was to explore the use of ChatGPT as a clinical support tool for common arrhythmias. Methods This study assessed the feasibility of using ChatGPT as an AI decision-support tool for common rhythm disturbances. The study was conducted using retrospective data collected from electronic medical records (EMRs) of patients with documented rhythm disturbances. The model's performance was evaluated using sensitivity, specificity, positive predictive value, and negative predictive value. Results A total of 20,000 patients with rhythm disturbances were included in the study. The ChatGPT model demonstrated high diagnostic accuracy in identifying and diagnosing common rhythm disturbances, with a sensitivity of 93%, specificity of 89%, positive predictive value of 91%, and negative predictive value of 92%. The ROC curve analysis showed an area under the curve (AUC) of 0.743, indicating the excellent diagnostic performance of the ChatGPT model. Conclusion The model's diagnostic performance was comparable to clinical experts, indicating its potential to enhance clinical decision-making and improve patient outcomes. Clinical trial registration Not applicable.
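For readers less familiar with the reported metrics, sensitivity, specificity, PPV, and NPV all follow directly from a 2 × 2 confusion matrix. The sketch below uses hypothetical counts chosen only to roughly mirror the figures above; it is not the study's data or code.

    def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
        # Standard diagnostic accuracy measures from confusion-matrix counts.
        return {
            "sensitivity": tp / (tp + fn),   # true-positive rate
            "specificity": tn / (tn + fp),   # true-negative rate
            "ppv": tp / (tp + fp),           # positive predictive value
            "npv": tn / (tn + fn),           # negative predictive value
        }

    # Hypothetical 2 x 2 counts for 20,000 records (illustrative only).
    print(diagnostic_metrics(tp=9300, fp=1100, tn=8900, fn=700))
    # -> sensitivity 0.93, specificity 0.89, ppv ~0.89, npv ~0.93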
Affiliation(s)
- Jahanzeb Malik: Department of Cardiovascular Medicine, Armed Forces Institute of Cardiology, Rawalpindi, Pakistan
- Muhammad W. Afzal: Department of Medicine, Sheikh Zayed Hospital, Rahim Yar Khan, Pakistan
- Salaar S. Khan: Department of Medicine, Federal Medical College, Islamabad, Pakistan
- Muhammad R. Umer: Department of Medicine, DHQ Teaching Hospital, Sargodha, Pakistan
- Bushra Fakhar: Pre-Medical Student, Aspire College, Islamabad, Pakistan
- Amin Mehmoodi: Department of Medicine, Ibn e Seena Hospital, Kabul, Afghanistan
12. Zhao Z, Wang S, Gu J, Zhu Y, Mei L, Zhuang Z, Cui Z, Wang Q, Shen D. ChatCAD+: Toward a Universal and Reliable Interactive CAD Using LLMs. IEEE Trans Med Imaging 2024; 43:3755-3766. PMID: 38717880; DOI: 10.1109/tmi.2024.3398350.
Abstract
The integration of Computer-Aided Diagnosis (CAD) with Large Language Models (LLMs) presents a promising frontier in clinical applications, notably in automating diagnostic processes akin to those performed by radiologists and providing consultations similar to a virtual family doctor. Despite the promising potential of this integration, current works face at least two limitations: (1) From the perspective of a radiologist, existing studies typically have a restricted scope of applicable imaging domains, failing to meet the diagnostic needs of different patients. Also, the insufficient diagnostic capability of LLMs further undermines the quality and reliability of the generated medical reports. (2) Current LLMs lack the requisite depth in medical expertise, rendering them less effective as virtual family doctors due to the potential unreliability of the advice provided during patient consultations. To address these limitations, we introduce ChatCAD+, designed to be universal and reliable. Specifically, it features two main modules: (1) Reliable Report Generation and (2) Reliable Interaction. The Reliable Report Generation module is capable of interpreting medical images from diverse domains and generating high-quality medical reports via our proposed hierarchical in-context learning. Concurrently, the interaction module leverages up-to-date information from reputable medical websites to provide reliable medical advice. Together, these designed modules synergize to closely align with the expertise of human medical professionals, offering enhanced consistency and reliability for interpretation and advice. The source code is available on GitHub.
13. Dergaa I, Ben Saad H, Glenn JM, Ben Aissa M, Taheri M, Swed S, Guelmami N, Chamari K. A thorough examination of ChatGPT-3.5 potential applications in medical writing: A preliminary study. Medicine (Baltimore) 2024; 103:e39757. PMID: 39465713; PMCID: PMC11460921; DOI: 10.1097/md.0000000000039757.
Abstract
Effective communication of scientific knowledge plays a crucial role in the advancement of medical research and health care. Technological advancements have introduced large language models such as Chat Generative Pre-Trained Transformer (ChatGPT), powered by artificial intelligence (AI), which has already shown promise in revolutionizing medical writing. This study aimed to conduct a detailed evaluation of ChatGPT-3.5's role in enhancing various aspects of medical writing. From May 10 to 12, 2023, the authors engaged in a series of interactions with ChatGPT-3.5 to evaluate its effectiveness in various tasks, particularly its application to medical writing, including vocabulary enhancement, text rewriting for plagiarism prevention, hypothesis generation, keyword generation, title generation, article summarization, simplification of medical jargon, transforming text from informal to scientific and data interpretation. The exploration of ChatGPT's functionalities in medical writing revealed its potential in enhancing various aspects of the writing process, demonstrating its efficiency in improving vocabulary usage, suggesting alternative phrasing, and providing grammar enhancements. While the results indicate the effectiveness of ChatGPT (version 3.5), the presence of certain imperfections highlights the current indispensability of human intervention to refine and validate outputs, ensuring accuracy and relevance in medical settings. The integration of AI into medical writing shows significant potential for improving clarity, efficiency, and reliability. This evaluation highlights both the benefits and limitations of using ChatGPT-3.5, emphasizing its ability to enhance vocabulary, prevent plagiarism, generate hypotheses, suggest keywords, summarize articles, simplify medical jargon, and transform informal text into an academic format. However, AI tools should not replace human expertise. It is crucial for medical professionals to ensure thorough human review and validation to maintain the accuracy and relevance of the content in case they eventually use AI as a supplementary resource in medical writing. Accepting this mutually symbiotic partnership holds the promise of improving medical research and patient outcomes, and it sets the stage for the fusion of AI and human knowledge to produce a novel approach to medical assessment. Thus, while AI can streamline certain tasks, experienced medical writers and researchers must perform final reviews to uphold high standards in medical communications.
Affiliation(s)
- Ismail Dergaa: Department of Preventative Health, Primary Health Care Corporation (PHCC), Doha, Qatar
- Helmi Ben Saad: Farhat HACHED Hospital, Service of Physiology and Functional Explorations, University of Sousse, Sousse, Tunisia; Heart Failure (LR12SP09) Research Laboratory, Farhat HACHED Hospital, University of Sousse, Sousse, Tunisia; Faculty of Medicine of Sousse, Laboratory of Physiology, University of Sousse, Sousse, Tunisia
- Jordan M. Glenn: Department of Health, Exercise Science Research Center Human Performance and Recreation, University of Arkansas, Fayetteville, AR
- Mohamed Ben Aissa: Department of Human and Social Sciences, Higher Institute of Sport and Physical Education of Kef, University of Jendouba, Jendouba, Tunisia
- Morteza Taheri: Institute of Future Studies, Imam Khomeini International University, Qazvin, Iran
- Sarya Swed: Faculty of Medicine, Aleppo University, Aleppo, Syria
- Noomen Guelmami: Department of Health Sciences, Dipartimento di Scienze della Salute (DISSAL), Postgraduate School of Public Health, University of Genoa, Genoa, Italy
- Karim Chamari: Naufar, Wellness and Recovery Center, Doha, Qatar; High Institute of Sport and Physical Education, University of Manouba, Tunis, Tunisia
14. Kim J, Wang K, Weng C, Liu C. Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease. Am J Hum Genet 2024; 111:2190-2202. PMID: 39255797; PMCID: PMC11480789; DOI: 10.1016/j.ajhg.2024.08.010.
Abstract
Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, including two generative pre-trained transformer (GPT) series models and three Llama2 series models, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with the model size. Consistent results were observed over time, as shown in the dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve the accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease the output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.
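The headline figure above, accuracy within the top 50 predictions, is a top-k hit rate over ranked gene lists. A minimal sketch (with hypothetical ranked outputs and causal genes, not the study's dataset) looks like this:

    def top_k_accuracy(ranked_gene_lists, causal_genes, k=50):
        # Fraction of cases whose diagnosed gene appears in the model's top-k ranking.
        hits = sum(truth in ranked[:k]
                   for ranked, truth in zip(ranked_gene_lists, causal_genes))
        return hits / len(causal_genes)

    # Hypothetical example: the diagnosed gene is recovered in 2 of 3 cases.
    ranked = [["BRCA1", "TP53", "PTEN"], ["FBN1", "COL1A1"], ["TTN", "MYH7"]]
    truth = ["TP53", "MECP2", "TTN"]
    print(round(top_k_accuracy(ranked, truth), 2))  # 0.67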
Affiliation(s)
- Junyoung Kim: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Kai Wang: Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Cong Liu: Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
15. Martinson AK, Chin AT, Butte MJ, Rider NL. Artificial Intelligence and Machine Learning for Inborn Errors of Immunity: Current State and Future Promise. J Allergy Clin Immunol Pract 2024; 12:2695-2704. PMID: 39127104; DOI: 10.1016/j.jaip.2024.08.012.
Abstract
Artificial intelligence (AI) and machine learning (ML) research within medicine has exponentially increased over the last decade, with studies showcasing the potential of AI/ML algorithms to improve clinical practice and outcomes. Ongoing research and efforts to develop AI-based models have expanded to aid in the identification of inborn errors of immunity (IEI). The use of larger electronic health record data sets, coupled with advances in phenotyping precision and enhancements in ML techniques, has the potential to significantly improve the early recognition of IEI, thereby increasing access to equitable care. In this review, we provide a comprehensive examination of AI/ML for IEI, covering the spectrum from data preprocessing for AI/ML analysis to current applications within immunology, and address the challenges associated with implementing clinical decision support systems to refine the diagnosis and management of IEI.
Affiliation(s)
- Aaron T Chin: Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California, Los Angeles, Los Angeles, Calif
- Manish J Butte: Department of Pediatrics, Division of Immunology, Allergy and Rheumatology, University of California, Los Angeles, Los Angeles, Calif
- Nicholas L Rider: Department of Health Systems & Implementation Science, Virginia Tech Carilion School of Medicine, Roanoke, Va; Department of Medicine, Division of Allergy-Immunology, Carilion Clinic, Roanoke, Va
16. Su Z, Tang G, Huang R, Qiao Y, Zhang Z, Dai X. Based on Medicine, The Now and Future of Large Language Models. Cell Mol Bioeng 2024; 17:263-277. PMID: 39372551; PMCID: PMC11450117; DOI: 10.1007/s12195-024-00820-3.
Abstract
Objectives This review explores the potential applications of large language models (LLMs) such as ChatGPT, GPT-3.5, and GPT-4 in the medical field, aiming to encourage their prudent use, provide professional support, and develop accessible medical AI tools that adhere to healthcare standards. Methods This paper examines the impact of technologies such as OpenAI's Generative Pre-trained Transformers (GPT) series, including GPT-3.5 and GPT-4, and other large language models (LLMs) in medical education, scientific research, clinical practice, and nursing. Specifically, it includes supporting curriculum design, acting as personalized learning assistants, creating standardized simulated patient scenarios in education; assisting with writing papers, data analysis, and optimizing experimental designs in scientific research; aiding in medical imaging analysis, decision-making, patient education, and communication in clinical practice; and reducing repetitive tasks, promoting personalized care and self-care, providing psychological support, and enhancing management efficiency in nursing. Results LLMs, including ChatGPT, have demonstrated significant potential and effectiveness in the aforementioned areas, yet their deployment in healthcare settings is fraught with ethical complexities, potential lack of empathy, and risks of biased responses. Conclusion Despite these challenges, significant medical advancements can be expected through the proper use of LLMs and appropriate policy guidance. Future research should focus on overcoming these barriers to ensure the effective and ethical application of LLMs in the medical field.
Affiliation(s)
- Ziqing Su: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China; Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Guozhang Tang: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China; Department of Clinical Medicine, The Second Clinical College of Anhui Medical University, Hefei, 230032 Anhui P.R. China
- Rui Huang: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China; Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Yang Qiao: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Zheng Zhang: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China; Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Xingliang Dai: Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China; Department of Research & Development, East China Institute of Digital Medical Engineering, Shangrao, 334000 P.R. China
17. Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024; 105:251-265. PMID: 38679540; DOI: 10.1016/j.diii.2024.04.003.
Abstract
PURPOSE The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications. MATERIALS AND METHODS After a comprehensive review of PubMed, Web of Science, Embase, and Google Scholar databases, a cohort of published studies was identified up to January 1, 2024, utilizing ChatGPT for clinical radiology applications. RESULTS Out of 861 studies derived, 44 studies evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated it had a lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported the proportion of ChatGPT's performance. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and in five (5/24; 20.8%) studies, there was a median agreement of 83.6% between ChatGPT outcomes and reference standards [radiologists' decision or guidelines], generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPTv4 outperformed v3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks. CONCLUSION Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.
Affiliation(s)
- Pedram Keshavarz: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; School of Science and Technology, The University of Georgia, Tbilisi 0171, Georgia
- Sara Bagherieh: Independent Clinical Radiology Researcher, Los Angeles, CA 90024, USA
- Hamid Chalian: Department of Radiology, Cardiothoracic Imaging, University of Washington, Seattle, WA 98195, USA
- Amir Ali Rahsepar: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Grace Hyun J Kim: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Radiological Sciences, Center for Computer Vision and Imaging Biomarkers, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Cameron Hassani: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Steven S Raman: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Arash Bedayat: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
18. Xu R, Wang Z. Generative artificial intelligence in healthcare from the perspective of digital media: Applications, opportunities and challenges. Heliyon 2024; 10:e32364. PMID: 38975200; PMCID: PMC11225727; DOI: 10.1016/j.heliyon.2024.e32364.
Abstract
Introduction The emergence and application of generative artificial intelligence/large language models (hereafter GenAI LLMs) have the potential for significant impact on the healthcare industry. However, there is currently a lack of systematic research on GenAI LLMs in healthcare based on reliable data. This article aims to conduct an exploratory study of the application of GenAI LLMs (i.e., ChatGPT) in healthcare from the perspective of digital media (i.e., online news), including the application scenarios, potential opportunities, and challenges. Methods This research used thematic qualitative text analysis in five steps: firstly, developing main topical categories based on relevant articles; secondly, encoding the search keywords using these categories; thirdly, conducting searches for news articles via Google; fourthly, encoding the sub-categories using the elaborate category system; and finally, conducting category-based analysis and presenting the results. Natural language processing techniques, including the TermRaider and AntConc tools, were applied in the aforementioned steps to assist in the qualitative text analysis. Additionally, this study built a framework, used for analyzing the above three topics, from the perspective of five different stakeholders, including healthcare demanders and providers. Results This study summarizes 26 applications (e.g., provide medical advice, provide diagnosis and triage recommendations, provide mental health support, etc.), 21 opportunities (e.g., make healthcare more accessible, reduce healthcare costs, improve patient care, etc.), and 17 challenges (e.g., generate inaccurate/misleading/wrong answers, raise privacy concerns, lack of transparency, etc.), and analyzes the reasons for the formation of these key items and the links between the three research topics. Conclusions The application of GenAI LLMs in healthcare is primarily focused on transforming the way healthcare demanders access medical services (i.e., making them more intelligent, refined, and humane) and optimizing the processes through which healthcare providers offer medical services (i.e., simplifying them, ensuring timeliness, and reducing errors). As the application becomes more widespread and deepens, GenAI LLMs are expected to have a revolutionary impact on traditional healthcare service models, but they also inevitably raise ethical and security concerns. Furthermore, the application of GenAI LLMs in healthcare is still at an initial stage, which can be accelerated from a specific healthcare field (e.g., mental health) or a specific mechanism (e.g., an economic benefits allocation mechanism for GenAI LLMs applied to healthcare) with empirical or clinical research.
Collapse
Affiliation(s)
- Rui Xu
- School of Economics, Guangdong University of Technology, Guangzhou, China
| | - Zhong Wang
- School of Economics, Guangdong University of Technology, Guangzhou, China
- Key Laboratory of Digital Economy and Data Governance, Guangdong University of Technology, Guangzhou, China
| |
Collapse
|
19
|
Yang S, Chang MC. The assessment of the validity, safety, and utility of ChatGPT for patients with herniated lumbar disc: A preliminary study. Medicine (Baltimore) 2024; 103:e38445. [PMID: 38847711 PMCID: PMC11155576 DOI: 10.1097/md.0000000000038445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Accepted: 05/10/2024] [Indexed: 06/10/2024] Open
Abstract
ChatGPT is perceived as a potential tool that allows patients diagnosed with a herniated lumbar disc (HLD) to ask questions about the information they seek and to receive the responses they need. In this preliminary study, we assessed the validity, safety, and utility of ChatGPT for patients with HLD. Two physicians specializing in the treatment of musculoskeletal disorders discussed and selected the 12 questions most frequently asked by patients with HLD in clinical practice. We used ChatGPT (version 4.0) to ask questions related to HLD. Each question was inputted into ChatGPT, and the responses were assessed by the 2 physicians. A Likert score was used to evaluate the validity, safety, and utility of the responses generated by ChatGPT. Validity, safety, and utility were each rated on a 4-point scale, with 4 indicating the most valid, safe, and useful answers and 1 indicating the worst answers. Regarding validity, ChatGPT responses scored 4 points for 9 questions (9/12, 75.0%) and 3 points for 3 questions (3/12, 25.0%). Regarding safety, ChatGPT scored 4 points for 11 questions (11/12, 91.7%) and 3 points for 1 question (1/12, 8.3%). Regarding utility, ChatGPT responses scored 4 points for 9 questions (9/12, 75.0%) and 3 points for 3 questions (3/12, 25.0%). ChatGPT tends to offer relatively valid, safe, and useful information regarding HLD. However, users should exercise caution, as ChatGPT may occasionally provide incomplete answers to some questions on HLD.
Collapse
Affiliation(s)
- Seoyon Yang
- Department of Rehabilitation Medicine, School of Medicine, Ewha Woman’s University Seoul Hospital, Seoul, Republic of Korea
| | - Min Cheol Chang
- Department of Rehabilitation Medicine, College of Medicine, Yeungnam University, Daegu, Republic of Korea
| |
Collapse
|
20
|
Ayoub NF, Lee YJ, Grimm D, Divi V. Head-to-Head Comparison of ChatGPT Versus Google Search for Medical Knowledge Acquisition. Otolaryngol Head Neck Surg 2024; 170:1484-1491. [PMID: 37529853 DOI: 10.1002/ohn.465] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2023] [Revised: 07/07/2023] [Accepted: 07/14/2023] [Indexed: 08/03/2023]
Abstract
OBJECTIVE Chat Generative Pretrained Transformer (ChatGPT) is the newest iteration of OpenAI's generative artificial intelligence (AI) with the potential to influence many facets of life, including health care. This study sought to assess ChatGPT's capabilities as a source of medical knowledge, using Google Search as a comparison. STUDY DESIGN Cross-sectional analysis. SETTING Online using ChatGPT, Google Search, and Clinical Practice Guidelines (CPG). METHODS CPG Plain Language Summaries for 6 conditions were obtained. Questions relevant to specific conditions were developed and input into ChatGPT and Google Search. All questions were written from the patient perspective and sought (1) general medical knowledge or (2) medical recommendations, with varying levels of acuity (urgent or emergent vs routine clinical scenarios). Two blinded reviewers scored all passages and compared results from ChatGPT and Google Search, using the Patient Education Material Assessment Tool (PEMAT-P) as the primary outcome. Additional customized questions were developed that assessed the medical content of the passages. RESULTS The overall average PEMAT-P score for medical advice was 68.2% (standard deviation [SD]: 4.4) for ChatGPT and 89.4% (SD: 5.9) for Google Search (p < .001). There was a statistically significant difference in the PEMAT-P score by source (p < .001) but not by urgency of the clinical situation (p = .613). ChatGPT scored significantly higher than Google Search (87% vs 78%, p = .012) for patient education questions. CONCLUSION ChatGPT fared better than Google Search when offering general medical knowledge, but it scored worse when providing medical recommendations. Health care providers should strive to understand the potential benefits and ramifications of generative AI to guide patients appropriately.
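As a rough illustration of how two sets of PEMAT-P scores might be compared, the following sketch runs an independent-samples t-test in SciPy; the score arrays are hypothetical, and the study's actual statistical procedure may differ.

```python
# Illustrative sketch only: comparing PEMAT-P scores from two sources with an
# independent-samples t-test. The score arrays are hypothetical; the abstract
# reports only means/SDs, not the underlying data or the exact test used.
import numpy as np
from scipy import stats

chatgpt_scores = np.array([0.65, 0.70, 0.62, 0.72, 0.68, 0.66])   # hypothetical
google_scores  = np.array([0.88, 0.92, 0.85, 0.95, 0.87, 0.90])   # hypothetical

t_stat, p_value = stats.ttest_ind(chatgpt_scores, google_scores)
print(f"ChatGPT mean: {chatgpt_scores.mean():.3f} (SD {chatgpt_scores.std(ddof=1):.3f})")
print(f"Google mean:  {google_scores.mean():.3f} (SD {google_scores.std(ddof=1):.3f})")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```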
Collapse
Affiliation(s)
- Noel F Ayoub
- Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
| | - Yu-Jin Lee
- Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
| | - David Grimm
- Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
| | - Vasu Divi
- Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
| |
Collapse
|
21
|
Bumgardner VKC, Mullen A, Armstrong SE, Hickey C, Marek V, Talbert J. Local Large Language Models for Complex Structured Tasks. AMIA Jt Summits Transl Sci Proc 2024; 2024:105-114. [PMID: 38827047 PMCID: PMC11141822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/04/2024]
Abstract
This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local, fine-tuned LLMs to respond to specific generative instructions and provide structured outputs. Over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes were used. Different model architectures were trained and evaluated, including LLaMA, BERT, and LongFormer. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform structured generative tasks on domain-specific language in the medical domain.
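The general data-preparation pattern the abstract describes, pairing a generative instruction with a report and a structured target for local fine-tuning, can be sketched as follows; the field names, example report, and condition code are hypothetical placeholders and not the authors' schema.

```python
# Minimal sketch of the general data-preparation pattern described above:
# pairing a generative instruction with a report and a structured target so a
# local LLM can be fine-tuned to emit condition codes. Field names, the example
# report text, and the code list are hypothetical, not the authors' schema.
import json

records = [
    {
        "instruction": "Extract the condition codes for this pathology report as a JSON list.",
        "input": "Gross description: ... Final diagnosis: invasive ductal carcinoma ...",
        "output": json.dumps(["C50.9"]),   # hypothetical ICD-style code
    },
]

with open("train.jsonl", "w", encoding="utf-8") as fh:
    for rec in records:
        fh.write(json.dumps(rec) + "\n")

print(f"Wrote {len(records)} instruction-tuning record(s) to train.jsonl")
```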
Collapse
|
22
|
Piazza D, Martorana F, Curaba A, Sambataro D, Valerio MR, Firenze A, Pecorino B, Scollo P, Chiantera V, Scibilia G, Vigneri P, Gebbia V, Scandurra G. The Consistency and Quality of ChatGPT Responses Compared to Clinical Guidelines for Ovarian Cancer: A Delphi Approach. Curr Oncol 2024; 31:2796-2804. [PMID: 38785493 PMCID: PMC11119344 DOI: 10.3390/curroncol31050212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2024] [Revised: 05/06/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024] Open
Abstract
INTRODUCTION In recent years, generative Artificial Intelligence models, such as ChatGPT, have increasingly been utilized in healthcare. Despite acknowledging the high potential of AI models in terms of quick access to sources and formulating responses to a clinical question, the results obtained using these models still require validation through comparison with established clinical guidelines. This study compares the responses of the AI model to eight clinical questions with the Italian Association of Medical Oncology (AIOM) guidelines for ovarian cancer. MATERIALS AND METHODS The authors used the Delphi method to evaluate responses from ChatGPT and the AIOM guidelines. An expert panel of healthcare professionals assessed responses based on clarity, consistency, comprehensiveness, usability, and quality using a five-point Likert scale. The GRADE methodology assessed the evidence quality and the recommendations' strength. RESULTS A survey involving 14 physicians revealed that the AIOM guidelines consistently scored higher averages compared to the AI models, with a statistically significant difference. Post hoc tests showed that AIOM guidelines significantly differed from all AI models, with no significant difference among the AI models. CONCLUSIONS While AI models can provide rapid responses, they must match established clinical guidelines regarding clarity, consistency, comprehensiveness, usability, and quality. These findings underscore the importance of relying on expert-developed guidelines in clinical decision-making and highlight potential areas for AI model improvement.
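As a minimal illustration of comparing panel ratings across sources, the sketch below runs a one-way ANOVA over hypothetical 5-point Likert ratings; it is not the study's Delphi or post hoc procedure.

```python
# Illustrative sketch only: a one-way ANOVA over 5-point Likert ratings for the
# guideline text versus two AI-generated responses. The rating vectors are
# hypothetical; the study's panel data and exact post hoc tests are not shown.
from scipy.stats import f_oneway

aiom_guideline = [5, 5, 4, 5, 4, 5, 5, 4]   # hypothetical panel ratings
ai_model_a     = [3, 4, 3, 3, 4, 3, 2, 3]   # hypothetical
ai_model_b     = [3, 3, 4, 2, 3, 3, 3, 4]   # hypothetical

f_stat, p_value = f_oneway(aiom_guideline, ai_model_a, ai_model_b)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```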
Collapse
Affiliation(s)
- Dario Piazza
- Medical Oncology Unit, Casa di Cura Torina, 90145 Palermo, Italy; (D.P.); (A.C.)
| | - Federica Martorana
- Department of Clinical and Experimental Medicine, University of Catania, 95124 Catania, Italy;
| | - Annabella Curaba
- Medical Oncology Unit, Casa di Cura Torina, 90145 Palermo, Italy; (D.P.); (A.C.)
| | | | - Maria Rosaria Valerio
- Medical Oncology Unit, Policlinico P. Giaccone, University of Palermo, 90133 Palermo, Italy;
| | - Alberto Firenze
- Occupational Health Section, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, University of Palermo, 90133 Palermo, Italy;
| | - Basilio Pecorino
- Gynecology Unit, Ospedale Cannizzaro, 95126 Catania, Italy; (B.P.); (P.S.)
- Gynecology, Faculty of Medicine and Surgery, University of Enna Kore, 94100 Enna, Italy
| | - Paolo Scollo
- Gynecology Unit, Ospedale Cannizzaro, 95126 Catania, Italy; (B.P.); (P.S.)
- Gynecology, Faculty of Medicine and Surgery, University of Enna Kore, 94100 Enna, Italy
| | - Vito Chiantera
- Gynecology, University of Palermo, 90133 Palermo, Italy;
| | | | - Paolo Vigneri
- Medical Oncology, University of Catania, 95124 Catania, Italy;
- Medical Oncology, Istituto Clinico Humanitas, 95045 Catania, Italy
| | - Vittorio Gebbia
- Medical Oncology Unit, Casa di Cura Torina, 90145 Palermo, Italy; (D.P.); (A.C.)
- Medical Oncology, Faculty of Medicine and Surgery, University of Enna Kore, 94100 Enna, Italy
| | | |
Collapse
|
23
|
Fournier A, Fallet C, Sadeghipour F, Perrottet N. Assessing the applicability and appropriateness of ChatGPT in answering clinical pharmacy questions. ANNALES PHARMACEUTIQUES FRANÇAISES 2024; 82:507-513. [PMID: 37992892 DOI: 10.1016/j.pharma.2023.11.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 11/16/2023] [Accepted: 11/16/2023] [Indexed: 11/24/2023]
Abstract
OBJECTIVES Clinical pharmacists rely on different scientific references to ensure appropriate, safe, and cost-effective drug use. Tools based on artificial intelligence (AI) such as ChatGPT (Generative Pre-trained Transformer) could offer valuable support. The objective of this study was to assess ChatGPT's capacity to correctly respond to clinical pharmacy questions asked by healthcare professionals in our university hospital. MATERIAL AND METHODS ChatGPT's capacity to respond correctly to the last 100 consecutive questions recorded in our clinical pharmacy database was assessed. Questions were copied from our FileMaker Pro database and pasted into the ChatGPT (March 14 version) online platform. The generated answers were then copied verbatim into an Excel file. Two blinded clinical pharmacists reviewed all the questions and the answers given by the software. In case of disagreement, a third blinded pharmacist intervened to decide. RESULTS Questions about documentation-related issues (n=36) and modes of drug administration (n=30) predominated. Among the 69 applicable questions, the rate of correct answers varied from 30% to 57.1% depending on question type, with an overall rate of 44.9%. Of the inappropriate answers (n=38), 20 were incorrect, 18 provided no answer, and 8 were incomplete, with 8 answers belonging to 2 different categories. In no case did ChatGPT provide a better answer than the pharmacists. CONCLUSIONS ChatGPT demonstrated a mixed performance in answering clinical pharmacy questions. It should not replace human expertise, as a high rate of inappropriate answers was observed. Future studies should focus on optimizing ChatGPT for specific clinical pharmacy questions and explore the potential benefits and limitations of integrating this technology into clinical practice.
Collapse
Affiliation(s)
- A Fournier
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
| | - C Fallet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
| | - F Sadeghipour
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland; Center for Research and Innovation in Clinical Pharmaceutical Sciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
| | - N Perrottet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland.
| |
Collapse
|
24
|
Sloss EA, Abdul S, Aboagyewah MA, Beebe A, Kendle K, Marshall K, Rosenbloom ST, Rossetti S, Grigg A, Smith KD, Mishuris RG. Toward Alleviating Clinician Documentation Burden: A Scoping Review of Burden Reduction Efforts. Appl Clin Inform 2024; 15:446-455. [PMID: 38839063 PMCID: PMC11152769 DOI: 10.1055/s-0044-1787007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Accepted: 04/17/2024] [Indexed: 06/07/2024] Open
Abstract
BACKGROUND Studies have shown that documentation burden experienced by clinicians may lead to less direct patient care, increased errors, and job dissatisfaction. Implementing effective strategies within health care systems to mitigate documentation burden can result in improved clinician satisfaction and more time spent with patients. However, there is a gap in the literature regarding evidence-based interventions to reduce documentation burden. OBJECTIVES The objective of this review was to identify and comprehensively summarize the state of the science related to documentation burden reduction efforts. METHODS Following Joanna Briggs Institute Manual for Evidence Synthesis and Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines, we conducted a comprehensive search of multiple databases, including PubMed, Medline, Embase, CINAHL Complete, Scopus, and Web of Science. Additionally, we searched gray literature and used Google Scholar to ensure a thorough review. Two reviewers independently screened titles and abstracts, followed by full-text review, with a third reviewer resolving any discrepancies. Data extraction was performed and a table of evidence was created. RESULTS A total of 34 articles were included in the review, published between 2016 and 2022, with a majority focusing on the United States. The efforts described can be categorized into medical scribes, workflow improvements, educational interventions, user-driven approaches, technology-based solutions, combination approaches, and other strategies. The outcomes of these efforts often resulted in improvements in documentation time, workflow efficiency, provider satisfaction, and patient interactions. CONCLUSION This scoping review provides a comprehensive summary of health system documentation burden reduction efforts. The positive outcomes reported in the literature emphasize the potential effectiveness of these efforts. However, more research is needed to identify universally applicable best practices, and considerations should be given to the transfer of burden among members of the health care team, quality of education, clinician involvement, and evaluation methods.
Collapse
Affiliation(s)
- Elizabeth A. Sloss
- Division of Health Systems and Community Based Care, College of Nursing, University of Utah, Utah, United States
| | - Shawna Abdul
- John D. Dingell VA Medical Center, Detroit, Michigan, United States
| | - Mayfair A. Aboagyewah
- Case Management, Mount Sinai Health System, MSH Main Campus, New York, New York, United States
| | - Alicia Beebe
- Saint Luke's Health System (MO), Kansas City, Missouri, United States
| | - Kathleen Kendle
- Section of Health Informatics, El Paso VA Health Care System, El Paso, Texas, United States
| | - Kyle Marshall
- Department of Emergency Medicine, Geisinger, Danville, Pennsylvania, United States
| | - S. Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States
| | - Sarah Rossetti
- Biomedical Informatics and Nursing, Columbia University Irving Medical Center, New York, New York, United States
| | - Aaron Grigg
- Department of Informatics, Grande Ronde Hospital, La Grande, Oregon, United States
| | - Kevin D. Smith
- Department of Pediatrics, University of Chicago Medicine, Chicago, Illinois, United States
| | - Rebecca G. Mishuris
- Digital, Mass General Brigham, Somerville, Massachusetts, United States
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States
| |
Collapse
|
25
|
Dağci M, Çam F, Dost A. Reliability and Quality of the Nursing Care Planning Texts Generated by ChatGPT. Nurse Educ 2024; 49:E109-E114. [PMID: 37994523 DOI: 10.1097/nne.0000000000001566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2023]
Abstract
BACKGROUND Research on ChatGPT-generated nursing care planning texts is critical for enhancing nursing education through innovative and accessible learning methods while improving reliability and quality. PURPOSE The aim of the study was to examine the quality, authenticity, and reliability of nursing care planning texts produced using ChatGPT. METHODS The study sample comprised 40 texts generated by ChatGPT for selected nursing diagnoses included in NANDA 2021-2023. The texts were evaluated using a descriptive criteria form and the DISCERN tool for evaluating health information. RESULTS The mean DISCERN total score of the texts was 45.93 ± 4.72. All texts had a moderate level of reliability, and 97.5% of them had a moderate quality-of-information subscale score. A statistically significant relationship was found between the number of accessible references and both the reliability (r = 0.408) and quality subscale scores (r = 0.379) of the texts (P < .05). CONCLUSION ChatGPT-generated texts exhibited moderate reliability, moderate quality of nursing care information, and moderate overall quality, despite low similarity rates.
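The correlation reported between accessible references and DISCERN scores can be illustrated with a short SciPy sketch; the arrays below are hypothetical stand-ins for the study's data.

```python
# Illustrative sketch: correlating the number of accessible references with
# DISCERN reliability scores, as reported in the abstract (r = 0.408). The
# arrays below are hypothetical; the study's raw scores are not public.
from scipy.stats import pearsonr

accessible_refs   = [0, 1, 1, 2, 3, 3, 4, 5]          # hypothetical counts
reliability_score = [38, 40, 42, 44, 45, 47, 49, 52]  # hypothetical DISCERN-style scores

r, p = pearsonr(accessible_refs, reliability_score)
print(f"Pearson r = {r:.3f}, p = {p:.4f}")
```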
Collapse
Affiliation(s)
- Mahmut Dağci
- Author Affiliation: Department of Nursing, Bezmialem Vakif University, Faculty of Health Sciences, Istanbul, Turkey
| | | | | |
Collapse
|
26
|
Aggarwal N, Saini BS, Gupta S. Contribution of ChatGPT in Parkinson's Disease Detection. Nucl Med Mol Imaging 2024; 58:101-103. [PMID: 38633283 PMCID: PMC11018720 DOI: 10.1007/s13139-024-00857-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Revised: 02/27/2024] [Accepted: 03/11/2024] [Indexed: 04/19/2024] Open
Affiliation(s)
- Nikita Aggarwal
- Department of Electronics & Communication Engineering, Dr. B R Ambedkar National Institute of Technology, Jalandhar, 144011 India
| | - Barjinder Singh Saini
- Department of Electronics & Communication Engineering, Dr. B R Ambedkar National Institute of Technology, Jalandhar, 144011 India
| | - Savita Gupta
- Department of Computer Science & Engineering, UIET, Panjab University, Chandigarh, 160014 India
| |
Collapse
|
27
|
Zheng Y, Wang L, Feng B, Zhao A, Wu Y. Innovating Healthcare: The Role of ChatGPT in Streamlining Hospital Workflow in the Future. Ann Biomed Eng 2024; 52:750-753. [PMID: 37464178 DOI: 10.1007/s10439-023-03323-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 07/13/2023] [Indexed: 07/20/2023]
Abstract
ChatGPT is revolutionizing hospital workflows by enhancing the precision and efficiency of tasks that were formerly the exclusive domain of healthcare professionals. Additionally, ChatGPT can aid in administrative duties, including appointment scheduling and billing, which enables healthcare professionals to allocate more time towards patient care. By shouldering some of these responsibilities, ChatGPT has the potential to advance the quality of patient care, streamline departmental efficiency, and lower healthcare costs. Nevertheless, it is crucial to strike a balance between the advantages of ChatGPT and the necessity of human interaction in healthcare to guarantee optimal patient care. While ChatGPT may assume some of the duties of physicians in particular medical domains, it cannot replace human doctors. Tackling the challenges and constraints associated with the integration of ChatGPT into the healthcare system is critical for its successful implementation.
Collapse
Affiliation(s)
- Yue Zheng
- Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
| | - Laduona Wang
- Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
| | - Baijie Feng
- West China School of Medicine, Sichuan University, Chengdu, 610041, China
| | - Ailin Zhao
- Department of Hematology, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China.
| | - Yijun Wu
- Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China.
| |
Collapse
|
28
|
Shahin MH, Barth A, Podichetty JT, Liu Q, Goyal N, Jin JY, Ouellet D. Artificial Intelligence: From Buzzword to Useful Tool in Clinical Pharmacology. Clin Pharmacol Ther 2024; 115:698-709. [PMID: 37881133 DOI: 10.1002/cpt.3083] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Accepted: 10/06/2023] [Indexed: 10/27/2023]
Abstract
The advent of artificial intelligence (AI) in clinical pharmacology and drug development is akin to the dawning of a new era. Previously dismissed as merely technological hype, these approaches have emerged as promising tools in different domains, including health care, demonstrating their potential to empower clinical pharmacology decision making, revolutionize the drug development landscape, and advance patient care. Although challenges remain, the remarkable progress already made signals that the leap from hype to reality is well underway, and AI's promise to offer clinical pharmacology new tools and possibilities for optimizing patient care is gradually coming to fruition. This review dives into the burgeoning world of AI and machine learning (ML), showcasing different applications of AI in clinical pharmacology and the impact of successful AI/ML implementation on drug development and/or regulatory decisions. This review also highlights recommendations for areas of opportunity in clinical pharmacology, including data analysis (e.g., handling large data sets, screening to identify important covariates, and optimizing patient population) and efficiencies (e.g., automation, translation, literature curation, and training). Realizing the benefits of AI in drug development and understanding its value will lead to the successful integration of AI tools in our clinical pharmacology and pharmacometrics armamentarium.
Collapse
Affiliation(s)
- Mohamed H Shahin
- Clinical Pharmacology and Bioanalytics, Pfizer Inc., Groton, Connecticut, USA
| | - Aline Barth
- Clinical Pharmacology and Bioanalytics, Pfizer Inc., Groton, Connecticut, USA
| | | | - Qi Liu
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
| | - Navin Goyal
- Clinical Pharmacology and Pharmacometrics, Janssen Research and Development, LLC., Spring House, Pennsylvania, USA
| | - Jin Y Jin
- Department of Clinical Pharmacology, Genentech, South San Francisco, California, USA
| | - Daniele Ouellet
- Clinical Pharmacology and Pharmacometrics, Janssen Research and Development, LLC., Spring House, Pennsylvania, USA
| |
Collapse
|
29
|
Sarman A, Tuncay S. An Exaggeration? Reality?: Can ChatGPT Be Used in Neonatal Nursing? J Perinat Neonatal Nurs 2024; 38:120-121. [PMID: 38758263 DOI: 10.1097/jpn.0000000000000826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 05/18/2024]
Abstract
Artificial intelligence (AI) represents a system endowed with the ability to derive meaningful inferences from a diverse array of datasets. Rooted in the advancements of machine learning models, AI has spawned various transformative technologies such as deep learning, natural language processing, computer vision, and robotics. This technological evolution is poised to witness a broadened spectrum of applications across diverse domains, with a particular focus on revolutionizing healthcare services. Noteworthy among these innovations is OpenAI's creation, ChatGPT, which stands out for its profound capabilities in intricate analysis, primarily facilitated through extensive language modeling. In the realm of healthcare, AI applications, including ChatGPT, have showcased promising outcomes, especially in the domain of neonatal nursing. Areas such as pain assessment, feeding processes, and patient status determination have witnessed substantial enhancements through the integration of AI technologies. However, it is crucial to approach the deployment of such applications with a judicious mindset. The accuracy of the underlying data must undergo rigorous validation, and any results lacking a solid foundation in scientific insights should be approached with skepticism. The paramount consideration remains patient safety, necessitating that AI applications, like ChatGPT, undergo thorough scrutiny through controlled and evidence-based studies. Only through such meticulous evaluation can the transformative potential of AI be harnessed responsibly, ensuring its alignment with the highest standards of healthcare practice.
Collapse
Affiliation(s)
- Abdullah Sarman
- Author Affiliations: Department of Pediatric Nursing, Faculty of Health Science, Bingöl University, Bingöl, Turkey
| | | |
Collapse
|
30
|
Jačisko J, Veselý V, Chang KV, Özçakar L. (How) ChatGPT-Artificial Intelligence Thinks It Can Help/Harm Physiatry. Am J Phys Med Rehabil 2024; 103:346-349. [PMID: 38112589 DOI: 10.1097/phm.0000000000002370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2023]
Abstract
ABSTRACT ChatGPT is a chatbot based on the generative pretrained transformer architecture, an artificial intelligence-based large language model. Its widespread use in healthcare practice, research, and education seems to be (increasingly) inevitable. Also considering the relevant limitations regarding privacy, ethics, bias, legality, and validity, in this article its use as a supplement (certainly not as a substitute for physicians) is discussed in light of the recent literature. In particular, the "opinion" of ChatGPT about how it can help/harm physiatry is exemplified.
Collapse
Affiliation(s)
- Jakub Jačisko
- From the Department of Rehabilitation and Sports Medicine, Second Faculty of Medicine, Charles University and University Hospital Motol, Prague, Czech Republic (JJ, VV); Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Bei-Hu Branch, Taipei, Taiwan (K-VC); and Department of Physical and Rehabilitation Medicine, Hacettepe University Medical School, Ankara, Turkey (LO)
| | | | | | | |
Collapse
|
31
|
Zampatti S, Peconi C, Megalizzi D, Calvino G, Trastulli G, Cascella R, Strafella C, Caltagirone C, Giardina E. Innovations in Medicine: Exploring ChatGPT's Impact on Rare Disorder Management. Genes (Basel) 2024; 15:421. [PMID: 38674356 PMCID: PMC11050022 DOI: 10.3390/genes15040421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024] Open
Abstract
Artificial intelligence (AI) is rapidly transforming the field of medicine, announcing a new era of innovation and efficiency. Among AI programs designed for general use, ChatGPT holds a prominent position, using an innovative language model developed by OpenAI. Thanks to the use of deep learning techniques, ChatGPT stands out as an exceptionally viable tool, renowned for generating human-like responses to queries. Various medical specialties, including rheumatology, oncology, psychiatry, internal medicine, and ophthalmology, have been explored for ChatGPT integration, with pilot studies and trials revealing each field's potential benefits and challenges. However, the field of genetics and genetic counseling, as well as that of rare disorders, represents an area suitable for exploration, with its complex datasets and the need for personalized patient care. In this review, we synthesize the wide range of potential applications for ChatGPT in the medical field, highlighting its benefits and limitations. We pay special attention to rare and genetic disorders, aiming to shed light on the future roles of AI-driven chatbots in healthcare. Our goal is to pave the way for a healthcare system that is more knowledgeable, efficient, and centered around patient needs.
Collapse
Affiliation(s)
- Stefania Zampatti
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
| | - Cristina Peconi
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
| | - Domenica Megalizzi
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of Science, Roma Tre University, 00146 Rome, Italy
| | - Giulia Calvino
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of Science, Roma Tre University, 00146 Rome, Italy
| | - Giulia Trastulli
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of System Medicine, Tor Vergata University, 00133 Rome, Italy
| | - Raffaella Cascella
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of Chemical-Toxicological and Pharmacological Evaluation of Drugs, Catholic University Our Lady of Good Counsel, 1000 Tirana, Albania
| | - Claudia Strafella
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
| | - Carlo Caltagirone
- Department of Clinical and Behavioral Neurology, IRCCS Fondazione Santa Lucia, 00179 Rome, Italy;
| | - Emiliano Giardina
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy; (S.Z.)
- Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
| |
Collapse
|
32
|
Erden Y, Temel MH, Bağcıer F. Artificial intelligence insights into osteoporosis: assessing ChatGPT's information quality and readability. Arch Osteoporos 2024; 19:17. [PMID: 38499716 DOI: 10.1007/s11657-024-01376-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/24/2023] [Accepted: 03/07/2024] [Indexed: 03/20/2024]
Abstract
Accessible, accurate information and readability play a crucial role in empowering individuals managing osteoporosis. This study showed that the responses generated by ChatGPT regarding osteoporosis had serious problems with quality and were written at a level of complexity that necessitates an educational background of approximately 17 years. PURPOSE The use of artificial intelligence (AI) applications as a source of information in the field of health is increasing. Readable and accurate information plays a critical role in empowering patients to make decisions about their disease. The aim was to examine the quality and readability of responses provided by ChatGPT, an AI chatbot, to commonly asked questions regarding osteoporosis, which represents a major public health problem. METHODS Google Trends was used to identify the 25 most frequently searched keywords on Google for "osteoporosis," "female osteoporosis," and "male osteoporosis." A selected set of 38 keywords was sequentially inputted into the ChatGPT chat interface. The responses were evaluated with the Ensuring Quality Information for Patients (EQIP) tool, the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE). RESULTS The EQIP scores of the texts ranged from a minimum of 36.36 to a maximum of 61.76, with a mean value of 48.71, indicating "serious problems with quality." The FKRE scores spanned from 13.71 to 56.06, with a mean value of 28.71, and the FKGL varied between 8.48 and 17.63, with a mean value of 13.25. There were no statistically significant correlations between the EQIP score and the FKGL or FKRE scores. CONCLUSIONS Although ChatGPT is easily accessible for patients seeking information about osteoporosis, its current quality and readability fall short of comprehensive healthcare standards.
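The two Flesch readability formulas referenced in the study can be computed directly; the sketch below uses a naive vowel-group syllable heuristic (dictionary-based tools will give slightly different scores), and the sample sentence is a placeholder rather than study text.

```python
# Sketch of the two Flesch readability formulas (reading ease and grade level),
# with a naive vowel-group syllable heuristic; real tools use dictionaries, so
# scores will differ slightly. The sample sentence is a placeholder.
import re

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences          # words per sentence
    spw = syllables / len(words)          # syllables per word
    fre  = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level
    return fre, fkgl

sample = "Osteoporosis weakens bones. Adequate calcium, vitamin D, and exercise help."
fre, fkgl = readability(sample)
print(f"Reading ease = {fre:.1f}, grade level = {fkgl:.1f}")
```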
Collapse
Affiliation(s)
- Yakup Erden
- Clinic of Physical Medicine and Rehabilitation, İzzet Baysal Physical Treatment and Rehabilitation Training and Research Hospital, Orüs Street, No. 59, 14020, Bolu, Turkey.
| | - Mustafa Hüseyin Temel
- Department of Physical Medicine and Rehabilitation, Üsküdar State Hospital, Barbaros, Veysi Paşa Street, No. 14, 34662, Istanbul, Turkey
| | - Fatih Bağcıer
- Clinic of Physical Medicine and Rehabilitation, Başakşehir Çam and Sakura City Hospital, Olympic Boulevard Road, 34480, Istanbul, Turkey
| |
Collapse
|
33
|
Liu Y, Ju S, Wang J. Exploring the potential of ChatGPT in medical dialogue summarization: a study on consistency with human preferences. BMC Med Inform Decis Mak 2024; 24:75. [PMID: 38486198 PMCID: PMC10938713 DOI: 10.1186/s12911-024-02481-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Accepted: 03/11/2024] [Indexed: 03/18/2024] Open
Abstract
BACKGROUND Telemedicine has experienced rapid growth in recent years, aiming to enhance medical efficiency and reduce the workload of healthcare professionals. During the COVID-19 pandemic, it became especially crucial, enabling remote screenings and access to healthcare services while maintaining social distancing. Online consultation platforms have emerged, but the demand has strained the availability of medical professionals, directly leading to research and development in automated medical consultation. Specifically, there is a need for efficient and accurate medical dialogue summarization algorithms to condense lengthy conversations into shorter versions focused on relevant medical facts. The success of large language models like the generative pre-trained transformer (GPT)-3 has recently prompted a paradigm shift in natural language processing (NLP) research. In this paper, we explore its impact on medical dialogue summarization. METHODS We present the performance and evaluation results of two approaches on a medical dialogue dataset. The first approach is based on fine-tuned pre-trained language models, such as BERT-based summarization (BERTSUM) and Bidirectional and Auto-Regressive Transformers (BART). The second approach utilizes a large language model (LLM), GPT-3.5, with in-context learning (ICL). Evaluation is conducted using automated metrics such as ROUGE and BERTScore. RESULTS In comparison to the BART and ChatGPT models, the summaries generated by the BERTSUM model not only exhibit significantly lower ROUGE and BERTScore values but also fail to pass the testing for any of the metrics in manual evaluation. On the other hand, the BART model achieved the highest ROUGE and BERTScore values among all evaluated models, surpassing ChatGPT. Its ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore values were 14.94%, 53.48%, 32.84%, and 6.73% higher, respectively, than ChatGPT's best results. However, in the manual evaluation by medical experts, the summaries generated by the BART model exhibited satisfactory performance only in the "Readability" metric, with less than 30% passing the manual evaluation in the other metrics. When compared to the BERTSUM and BART models, the ChatGPT model was evidently more favored by human medical experts. CONCLUSION On one hand, the GPT-3.5 model can shape the style and content of medical dialogue summaries through various prompts. The generated content is not only better received than the results from certain human experts but also more comprehensible, making it a promising avenue for automated medical dialogue summarization. On the other hand, automated evaluation metrics like ROUGE and BERTScore fall short of fully assessing the outputs of large language models like GPT-3.5. Therefore, it is necessary to research more appropriate evaluation criteria.
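The ROUGE evaluation described in the methods can be reproduced in outline with the rouge-score package; the reference and candidate summaries below are hypothetical, not drawn from the study's dataset.

```python
# Minimal sketch of ROUGE-based evaluation of a generated summary against a
# reference, using the `rouge-score` package (pip install rouge-score). The
# reference and candidate texts are hypothetical, not from the study's dataset.
from rouge_score import rouge_scorer

reference = "Patient reports two weeks of lower back pain; advised rest and physiotherapy."
candidate = "The patient has had back pain for two weeks and was advised physiotherapy."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
for name, score in scores.items():
    print(f"{name}: precision={score.precision:.3f} recall={score.recall:.3f} f1={score.fmeasure:.3f}")

# BERTScore, the study's other automated metric, can be computed analogously
# with the `bert-score` package, which embeds both texts with a pretrained model.
```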
Collapse
Affiliation(s)
- Yong Liu
- Department of Computer Science, Sichuan University, No. 24, South Section 1, 1st Ring Road, Chengdu, 610065, Sichuan, China
| | - Shenggen Ju
- Department of Computer Science, Sichuan University, No. 24, South Section 1, 1st Ring Road, Chengdu, 610065, Sichuan, China.
| | - Junfeng Wang
- Department of Computer Science, Sichuan University, No. 24, South Section 1, 1st Ring Road, Chengdu, 610065, Sichuan, China
| |
Collapse
|
34
|
Xiong C, Dang W, Yang Q, Zhou Q, Shen M, Xiong Q, An M, Jiang X, Ni Y, Ji X. Integrated Ink Printing Paper Based Self-Powered Electrochemical Multimodal Biosensing (IFP-Multi) with ChatGPT-Bioelectronic Interface for Personalized Healthcare Management. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2305962. [PMID: 38161220 PMCID: PMC10953564 DOI: 10.1002/advs.202305962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Revised: 10/23/2023] [Indexed: 01/03/2024]
Abstract
Personalized healthcare management is an emerging field that requires the development of environment-friendly, integrated, electrochemical multimodal devices. In this study, the concept of integrated paper-based biosensors (IFP-Multi) for personalized healthcare management is introduced. By leveraging ink printing technology and a ChatGPT-bioelectronic interface, these biosensors offer ultrahigh areal-specific capacitance (74,633 mF cm-2), excellent mechanical properties, and multifunctional sensing and humidity power generation capabilities. More importantly, the IFP-Multi devices have the potential to simulate deaf-mute vocalization and can be integrated into wearable sensors to detect muscle contractions and bending motions. Moreover, they also enable monitoring of physiological signals from various body parts, such as the throat, nape, elbow, wrist, and knee, and successfully record sharp and repeatable signals generated by muscle contractions. In addition, the IFP-Multi devices demonstrate self-powered handwriting sensing and moisture power generation for sweat-sensing applications. As a proof-of-concept, a GPT-3.5 model-based fine-tuning and prediction pipeline that utilizes physiological signals recorded through IFP-Multi is showcased, enabling artificial intelligence with multimodal sensing capabilities for personalized healthcare management. This work presents a promising and ecofriendly approach to developing paper-based electrochemical multimodal devices, paving the way for a new era of healthcare advancements.
Collapse
Affiliation(s)
- Chuanyin Xiong
- College of Bioresources Chemical & Materials Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Weihua Dang
- College of Bioresources Chemical & Materials Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Qi Yang
- College of Bioresources Chemical & Materials Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Qiusheng Zhou
- College of Bioresources Chemical & Materials Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Mengxia Shen
- College of Bioresources Chemical & Materials Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Qiancheng Xiong
- School of Chemistry and Materials Engineering, Huizhou University, Huizhou 516007, China
| | - Meng An
- College of Mechanical and Electrical Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Xue Jiang
- College of Bioresources Chemical & Materials Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Yonghao Ni
- Department of Chemical and Biomedical Engineering, The University of Maine, Orono, ME 04469, USA
| | - Xianglin Ji
- Oxford-CityU Centre for Cerebro-Cardiovascular Health Engineering (COCHE), City University of Hong Kong, Hong Kong SAR 999077, China
| |
Collapse
|
35
|
Shiraishi M, Lee H, Kanayama K, Moriwaki Y, Okazaki M. Appropriateness of Artificial Intelligence Chatbots in Diabetic Foot Ulcer Management. INT J LOW EXTR WOUND 2024:15347346241236811. [PMID: 38419470 DOI: 10.1177/15347346241236811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/02/2024]
Abstract
Type 2 diabetes is a significant global health concern. It often causes diabetic foot ulcers (DFUs), which affect millions of people and increase amputation and mortality rates. Despite existing guidelines, the complexity of DFU treatment makes clinical decisions challenging. Large language models such as chat generative pretrained transformer (ChatGPT), which are adept at natural language processing, have emerged as valuable resources in the medical field. However, concerns about the accuracy and reliability of the information they provide remain. We aimed to assess the accuracy of various artificial intelligence (AI) chatbots, including ChatGPT, in providing information on DFUs based on established guidelines. Seven AI chatbots were asked clinical questions (CQs) based on the DFU guidelines. Their responses were analyzed for accuracy in terms of answers to CQs, grade of recommendation, level of evidence, and agreement with the reference, including verification of the authenticity of the references provided by the chatbots. The AI chatbots showed a mean accuracy of 91.2% in answers to CQs, with discrepancies noted in grade of recommendation and level of evidence. Claude-2 outperformed other chatbots in the number of verified references (99.6%), whereas ChatGPT had the lowest rate of reference authenticity (66.3%). This study highlights the potential of AI chatbots as tools for disseminating medical information and demonstrates their high degree of accuracy in answering CQs related to DFUs. However, the variability in the accuracy of these chatbots and problems like AI hallucinations necessitate cautious use and further optimization for medical applications. This study underscores the evolving role of AI in healthcare and the importance of refining these technologies for effective use in clinical decision-making and patient education.
Collapse
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | - Haesu Lee
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | - Koji Kanayama
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | - Yuta Moriwaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | - Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| |
Collapse
|
36
|
Rudroff T. Revealing the Complexity of Fatigue: A Review of the Persistent Challenges and Promises of Artificial Intelligence. Brain Sci 2024; 14:186. [PMID: 38391760 PMCID: PMC10886506 DOI: 10.3390/brainsci14020186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Revised: 01/31/2024] [Accepted: 02/16/2024] [Indexed: 02/24/2024] Open
Abstract
Part I reviews persistent challenges obstructing progress in understanding complex fatigue's biology. Difficulties quantifying subjective symptoms, mapping multi-factorial mechanisms, accounting for individual variation, enabling invasive sensing, overcoming research/funding insularity, and more are discussed. Part II explores how emerging artificial intelligence and machine and deep learning techniques can help address limitations through pattern recognition of complex physiological signatures as more objective biomarkers, predictive modeling to capture individual differences, consolidation of disjointed findings via data mining, and simulation to explore interventions. Conversational agents like Claude and ChatGPT also have potential to accelerate human fatigue research, but they currently lack capacities for robust autonomous contributions. Envisioned is an innovation timeline where synergistic application of enhanced neuroimaging, biosensors, closed-loop systems, and other advances combined with AI analytics could catalyze transformative progress in elucidating fatigue neural circuitry and treating associated conditions over the coming decades.
Collapse
Affiliation(s)
- Thorsten Rudroff
- Department of Health and Human Physiology, University of Iowa, Iowa City, IA 52242, USA
- Department of Neurology, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA
| |
Collapse
|
37
|
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24.0%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
Collapse
|
38
|
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024] Open
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationship. Both methods aim to obtain a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially in drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, and repurposing, among others. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audio, video, novel chemical molecules, etc. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via the de novo drug design approach. Various basic and advanced models are discussed, along with their recent applications. The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness the potential of generative artificial intelligence in generating novel drug molecules in a faster and more affordable manner. Some clinical-level assets generated from generative artificial intelligence are also discussed in this review to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Collapse
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Azim Ansari
- Computer Aided Drug Design Center Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
| | - Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
| | - Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
| | - Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
| | - Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
| |
Collapse
|
39
|
Peng W, Feng Y, Yao C, Zhang S, Zhuo H, Qiu T, Zhang Y, Tang J, Gu Y, Sun Y. Evaluating AI in medicine: a comparative analysis of expert and ChatGPT responses to colorectal cancer questions. Sci Rep 2024; 14:2840. [PMID: 38310152 PMCID: PMC10838275 DOI: 10.1038/s41598-024-52853-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2023] [Accepted: 01/24/2024] [Indexed: 02/05/2024] Open
Abstract
Colorectal cancer (CRC) is a global health challenge, and patient education plays a crucial role in its early detection and treatment. Despite progress in AI technology, as exemplified by transformer-like models such as ChatGPT, there remains a lack of in-depth understanding of their efficacy for medical purposes. We aimed to assess the proficiency of ChatGPT in the field of popular science, specifically in answering questions related to CRC diagnosis and treatment, using the book "Colorectal Cancer: Your Questions Answered" as a reference. In total, 131 valid questions from the book were manually input into ChatGPT. Responses were evaluated by clinical physicians in the relevant fields based on comprehensiveness and accuracy of information, and scores were standardized for comparison. Not surprisingly, ChatGPT showed high reproducibility in its responses, with high uniformity in comprehensiveness, accuracy, and final scores. However, the mean scores of ChatGPT's responses were significantly lower than the benchmarks, indicating that it has not reached an expert level of competence in CRC. While it could provide accurate information, it lacked comprehensiveness. Notably, ChatGPT performed well in the domains of radiation therapy, interventional therapy, stoma care, venous care, and pain control, almost rivaling the benchmarks, but fell short in the basic information, surgery, and internal medicine domains. While ChatGPT demonstrated promise in specific domains, its general efficiency in providing CRC information falls short of expert standards, indicating the need for further advancements and improvements in AI technology for patient education in healthcare.
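One common way to place rater scores and expert benchmarks on a comparable scale is z-score standardization against pooled statistics; the abstract does not specify the exact method used, so the sketch below is illustrative only.

```python
# Illustrative sketch only: z-score standardization against the pooled mean and
# SD so ChatGPT and benchmark ratings sit on a common scale. The abstract does
# not state the exact standardization used; all values below are hypothetical.
import numpy as np

chatgpt_scores   = np.array([6.5, 7.0, 5.8, 8.2, 7.4])   # hypothetical rater scores
benchmark_scores = np.array([8.0, 8.5, 7.9, 9.0, 8.7])   # hypothetical expert benchmarks

pooled = np.concatenate([chatgpt_scores, benchmark_scores])
mu, sigma = pooled.mean(), pooled.std(ddof=1)

print("ChatGPT (standardized):  ", np.round((chatgpt_scores - mu) / sigma, 2))
print("Benchmark (standardized):", np.round((benchmark_scores - mu) / sigma, 2))
```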
Collapse
Affiliation(s)
- Wen Peng
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Yifei Feng
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Cui Yao
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Sheng Zhang
- Department of Radiotherapy, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Han Zhuo
- Department of Intervention, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Tianzhu Qiu
- Department of Oncology, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Yi Zhang
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Junwei Tang
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China.
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China.
| | - Yanhong Gu
- Department of Oncology, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China.
| | - Yueming Sun
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China.
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China.
| |
Collapse
|
40
|
Barlas T, Altinova AE, Akturk M, Toruner FB. Credibility of ChatGPT in the assessment of obesity in type 2 diabetes according to the guidelines. Int J Obes (Lond) 2024; 48:271-275. [PMID: 37951982 DOI: 10.1038/s41366-023-01410-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/07/2023] [Revised: 10/22/2023] [Accepted: 10/30/2023] [Indexed: 11/14/2023]
Abstract
BACKGROUND The Chat Generative Pre-trained Transformer (ChatGPT) allows students, researchers, and patients in the medical field to access information easily and has gained considerable attention. We aimed to evaluate the credibility of ChatGPT against the guidelines for the assessment of obesity in type 2 diabetes (T2D), one of the major health concerns of this century. MATERIALS AND METHODS In this cross-sectional, non-human-subject study, experienced endocrinologists posed 20 questions to ChatGPT, grouped into subsections covering the assessment of obesity and its different treatment options, according to the American Diabetes Association and American Association of Clinical Endocrinology guidelines. The responses of ChatGPT were classified into four categories: compatible, compatible but insufficient, partially incompatible, and incompatible with the guidelines. RESULTS ChatGPT demonstrated a systematic approach to answering questions and recommended consulting a healthcare provider to receive personalized advice based on the specific health needs and circumstances of patients. The compatibility of ChatGPT with the guidelines was 100% for the assessment of obesity in type 2 diabetes; however, it was lower in the therapy sections, which included nutritional, medical, and surgical approaches to weight loss. Furthermore, ChatGPT required additional prompts for responses that were evaluated as "compatible but insufficient" to provide all the information in the guidelines. CONCLUSION The assessment and management of obesity in T2D are highly individualized. Despite ChatGPT's comprehensive and understandable responses, it should not be used as a substitute for healthcare professionals' patient-centered approach.
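For readers who want to see how the compatibility percentages described above reduce to a simple tally, a minimal sketch is shown below. The category labels follow the four-way scheme in the abstract, but the individual answers are hypothetical.

```python
from collections import Counter

# Hypothetical category labels for ChatGPT's answers in one therapy subsection,
# using the four-way scheme from the study; not the study's actual ratings.
therapy_section = [
    "compatible", "compatible", "compatible but insufficient",
    "partially incompatible", "compatible", "compatible but insufficient",
]

counts = Counter(therapy_section)
fully_compatible = 100 * counts["compatible"] / len(therapy_section)
print(counts)
print(f"fully compatible responses: {fully_compatible:.0f}%")
```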
Collapse
Affiliation(s)
- Tugba Barlas
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey.
| | - Alev Eroglu Altinova
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
| | - Mujde Akturk
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
| | - Fusun Balos Toruner
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
| |
Collapse
|
41
|
Pandya A, Lodha P, Ganatra A. Is ChatGPT ready to change mental healthcare? Challenges and considerations: a reality-check. FRONTIERS IN HUMAN DYNAMICS 2024; 5. [DOI: 10.3389/fhumd.2023.1289255] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/16/2025]
Abstract
As mental healthcare is highly stigmatized, digital platforms and services are becoming increasingly popular. A wide variety of promising applications of AI platforms are now available. One such application receiving tremendous attention from users and researchers alike is the Chat Generative Pre-trained Transformer (ChatGPT), a powerful chatbot launched by OpenAI. ChatGPT interacts with clients conversationally, answering follow-up questions, admitting mistakes, challenging incorrect premises, and rejecting inappropriate requests. Given its multifarious applications, the ethical and privacy considerations surrounding the use of such technologies in sensitive areas such as mental health must be carefully addressed to ensure user safety and wellbeing. The authors comment on the ethical challenges of ChatGPT in mental healthcare that need attention at various levels, outlining six major concerns: (1) accurate identification and diagnosis of mental health conditions; (2) limited understanding and misinterpretation; (3) safety and privacy of users; (4) bias and equity; (5) lack of monitoring and regulation; and (6) gaps in evidence and a lack of educational and training curricula.
Collapse
|
42
|
Malik S, Zaheer S. ChatGPT as an aid for pathological diagnosis of cancer. Pathol Res Pract 2024; 253:154989. [PMID: 38056135 DOI: 10.1016/j.prp.2023.154989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 11/26/2023] [Accepted: 11/27/2023] [Indexed: 12/08/2023]
Abstract
The diagnostic workup of cancer patients relies heavily on the science of pathology, using cytopathology, histopathology, and ancillary techniques such as immunohistochemistry and molecular cytogenetics. Data processing and learning by means of artificial intelligence (AI) has become a spearhead for the advancement of medicine, and pathology and laboratory medicine are no exceptions. ChatGPT, an AI-based chatbot recently launched by OpenAI, is currently the talk of the town, and its role in cancer diagnosis is also being explored meticulously. Integrating digital slides, advanced algorithms, and computer-aided diagnostic techniques into the pathology workflow extends the frontiers of the pathologist's view beyond the microscopic slide and enables effective integration, assimilation, and utilization of knowledge beyond human limits. Despite its numerous advantages in the pathological diagnosis of cancer, this approach comes with several challenges, such as the integration of digital slides with input language parameters, problems of bias, and legal issues, which must be addressed promptly so that pathologists diagnosing malignancies can keep pace with these developments.
Collapse
Affiliation(s)
- Shaivy Malik
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India
| | - Sufian Zaheer
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India.
| |
Collapse
|
43
|
Lanera C, Lorenzoni G, Barbieri E, Piras G, Magge A, Weissenbacher D, Donà D, Cantarutti L, Gonzalez-Hernandez G, Giaquinto C, Gregori D. Monitoring the Epidemiology of Otitis Using Free-Text Pediatric Medical Notes: A Deep Learning Approach. J Pers Med 2023; 14:28. [PMID: 38248729 PMCID: PMC10817419 DOI: 10.3390/jpm14010028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Revised: 12/20/2023] [Accepted: 12/21/2023] [Indexed: 01/23/2024] Open
Abstract
Free-text information represents a valuable resource for epidemiological surveillance. Its unstructured nature, however, presents significant challenges in the extraction of meaningful information. This study presents a deep learning model for classifying otitis using pediatric medical records. We analyzed the Pedianet database, which includes data from January 2004 to August 2017. The model categorizes narratives from clinical record diagnoses into six types: no otitis, non-media otitis, non-acute otitis media (OM), acute OM (AOM), AOM with perforation, and recurrent AOM. Utilizing deep learning architectures, including an ensemble model, this study addressed the challenges associated with the manual classification of extensive narrative data. The performance of the model was evaluated according to a gold standard classification made by three expert clinicians. The ensemble model achieved values of 97.03, 93.97, 96.59, and 95.48 for balanced precision, balanced recall, accuracy, and balanced F1 measure, respectively. These results underscore the efficacy of using automated systems for medical diagnoses, especially in pediatric care. Our findings demonstrate the potential of deep learning in interpreting complex medical records, enhancing epidemiological surveillance and research. This approach offers significant improvements in handling large-scale medical data, ensuring accuracy and minimizing human error. The methodology is adaptable to other medical contexts, promising a new horizon in healthcare analytics.
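The evaluation metrics reported above (balanced precision, balanced recall, accuracy, balanced F1) are commonly computed as macro averages over the six otitis classes; the paper does not publish its evaluation code, so the sketch below is an assumed implementation with toy labels rather than the study's pipeline.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels for the six categories (0 = no otitis ... 5 = recurrent AOM);
# real predictions would come from the ensemble deep learning model.
y_true = [0, 1, 2, 3, 4, 5, 2, 3, 0, 5]
y_pred = [0, 1, 2, 3, 4, 5, 2, 2, 0, 5]

# "Balanced" is interpreted here as macro-averaging across classes (assumption).
print("accuracy:          ", accuracy_score(y_true, y_pred))
print("balanced precision:", precision_score(y_true, y_pred, average="macro"))
print("balanced recall:   ", recall_score(y_true, y_pred, average="macro"))
print("balanced F1:       ", f1_score(y_true, y_pred, average="macro"))
```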
Collapse
Affiliation(s)
- Corrado Lanera
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
| | - Giulia Lorenzoni
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
| | - Elisa Barbieri
- Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.)
| | - Gianluca Piras
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
| | - Arjun Magge
- Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
| | - Davy Weissenbacher
- Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
| | - Daniele Donà
- Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.)
| | | | - Graciela Gonzalez-Hernandez
- Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; (A.M.); (D.W.); (G.G.-H.)
| | - Carlo Giaquinto
- Division of Pediatric Infectious Diseases, Department for Woman and Child Health, University of Padova, 35128 Padova, Italy; (E.B.); (D.D.); (C.G.)
- Società Servizi Telematici—Pedianet, 35100 Padova, Italy;
| | - Dario Gregori
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy; (C.L.); (G.L.)
| |
Collapse
|
44
|
Alsadhan A, Al-Anezi F, Almohanna A, Alnaim N, Alzahrani H, Shinawi R, AboAlsamh H, Bakhshwain A, Alenazy M, Arif W, Alyousef S, Alhamidi S, Alghamdi A, AlShrayfi N, Rubaian NB, Alanzi T, AlSahli A, Alturki R, Herzallah N. The opportunities and challenges of adopting ChatGPT in medical research. Front Med (Lausanne) 2023; 10:1259640. [PMID: 38188345 PMCID: PMC10766839 DOI: 10.3389/fmed.2023.1259640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2023] [Accepted: 12/07/2023] [Indexed: 01/09/2024] Open
Abstract
Purpose This study aims to investigate the opportunities and challenges of adopting ChatGPT in medical research. Methods A qualitative approach with focus groups was adopted. A total of 62 participants, including academic researchers from different streams of medicine and eHealth, took part in the study. Results Five themes with 16 sub-themes related to opportunities and five themes with 12 sub-themes related to challenges were identified. The major opportunities include improved data collection and analysis, improved communication and accessibility, and support for researchers across multiple streams of medical research. The major challenges identified were limitations of training data leading to bias, ethical issues, technical limitations, and limitations in data collection and analysis. Conclusion Although ChatGPT can be used as a potential tool in medical research, further evidence is needed to generalize its impact on different research activities.
Collapse
Affiliation(s)
- Abeer Alsadhan
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Fahad Al-Anezi
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Asmaa Almohanna
- Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Norah Alnaim
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | | | | | - Hoda AboAlsamh
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | | | - Maha Alenazy
- King Saud University, Riyadh, Riyadh, Saudi Arabia
| | - Wejdan Arif
- King Saud University, Riyadh, Riyadh, Saudi Arabia
| | | | | | | | - Nour AlShrayfi
- Public Authority for Applied Education and Training, Kuwait City, Kuwait
| | | | - Turki Alanzi
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Alaa AlSahli
- King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
| | - Rasha Alturki
- Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | | |
Collapse
|
45
|
Alotaibi SS, Rehman A, Hasnain M. Revolutionizing ocular cancer management: a narrative review on exploring the potential role of ChatGPT. Front Public Health 2023; 11:1338215. [PMID: 38192545 PMCID: PMC10773849 DOI: 10.3389/fpubh.2023.1338215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024] Open
Abstract
This paper pioneers the exploration of ocular cancer and its management with the help of artificial intelligence (AI) technology. Existing literature reports a significant increase in new eye cancer cases in 2023, reflecting a rising incidence rate. Extensive research was conducted using online databases such as PubMed, ACM Digital Library, ScienceDirect, and Springer. The review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Of the 62 studies collected, only 20 documents met the inclusion criteria. The review identifies seven ocular cancer types and highlights important challenges associated with ocular cancer, including limited awareness about eye cancer, restricted healthcare access, financial barriers, and insufficient infrastructure support. Financial barriers are among the most widely examined ocular cancer challenges in the literature. The potential role and limitations of ChatGPT are discussed, emphasizing its usefulness in providing general information to physicians while noting its inability to deliver up-to-date information. The paper concludes by presenting potential future applications of ChatGPT to advance research on ocular cancer globally.
Collapse
Affiliation(s)
- Saud S. Alotaibi
- Information Systems Department, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Amna Rehman
- Department of Computer Science, Lahore Leads University, Lahore, Pakistan
| | - Muhammad Hasnain
- Department of Computer Science, Lahore Leads University, Lahore, Pakistan
| |
Collapse
|
46
|
Madrid-García A, Rosales-Rosado Z, Freites-Nuñez D, Pérez-Sancristóbal I, Pato-Cour E, Plasencia-Rodríguez C, Cabeza-Osorio L, Abasolo-Alcázar L, León-Mateos L, Fernández-Gutiérrez B, Rodríguez-Rodríguez L. Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training. Sci Rep 2023; 13:22129. [PMID: 38092821 PMCID: PMC10719375 DOI: 10.1038/s41598-023-49483-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 12/08/2023] [Indexed: 12/17/2023] Open
Abstract
The emergence of large language models (LLMs) with remarkable performance, such as ChatGPT and GPT-4, has led to unprecedented uptake in the population. One of their most promising and studied applications concerns education, owing to their ability to understand and generate human-like text, which creates a multitude of opportunities for enhancing educational practices and outcomes. The objective of this study is twofold: to assess the accuracy of ChatGPT/GPT-4 in answering rheumatology questions from the access exam to specialized medical training in Spain (MIR), and to evaluate the medical reasoning followed by these LLMs to answer those questions. A dataset, RheumaMIR, of 145 rheumatology-related questions extracted from the exams held between 2010 and 2023 was created for that purpose, used as prompts for the LLMs, and publicly distributed. Six rheumatologists with clinical and teaching experience evaluated the clinical reasoning of the chatbots using a 5-point Likert scale, and their degree of agreement was analyzed. The association between variables that could influence the models' accuracy (i.e., year of the exam question, disease addressed, type of question, and genre) was studied. ChatGPT demonstrated a high level of performance in both accuracy (66.43%) and clinical reasoning (median [Q1-Q3], 4.5 [2.33-4.67]). However, GPT-4 showed better performance, with an accuracy of 93.71% and a median clinical reasoning value of 4.67 (4.5-4.83). These findings suggest that LLMs may serve as valuable tools in rheumatology education, aiding in exam preparation and supplementing traditional teaching methods.
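The headline results above combine an accuracy percentage with a median (Q1-Q3) of 5-point Likert reasoning ratings. A minimal sketch of how such summary statistics can be computed is shown below; the arrays contain invented values, not the RheumaMIR results.

```python
import numpy as np

# Hypothetical per-question correctness (1 = answer matched the MIR key).
correct = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
accuracy = 100 * correct.mean()

# Hypothetical 5-point Likert ratings of clinical reasoning (averaged raters).
likert = np.array([5.0, 4.0, 4.5, 2.5, 4.7, 4.8, 3.5, 4.6, 4.9, 4.4])
q1, median, q3 = np.percentile(likert, [25, 50, 75])

print(f"accuracy: {accuracy:.2f}%")
print(f"clinical reasoning median (Q1-Q3): {median:.2f} ({q1:.2f}-{q3:.2f})")
```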
Collapse
Affiliation(s)
- Alfredo Madrid-García
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain.
| | - Zulema Rosales-Rosado
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Dalifer Freites-Nuñez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Inés Pérez-Sancristóbal
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Esperanza Pato-Cour
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | | | - Luis Cabeza-Osorio
- Medicina Interna, Hospital Universitario del Henares, Avenida de Marie Curie, 0, 28822, Madrid, Spain
- Facultad de Medicina, Universidad Francisco de Vitoria, Carretera Pozuelo, Km 1800, 28223, Madrid, Spain
| | - Lydia Abasolo-Alcázar
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Leticia León-Mateos
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Benjamín Fernández-Gutiérrez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Facultad de Medicina, Universidad Complutense de Madrid, Madrid, Spain
| | - Luis Rodríguez-Rodríguez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| |
Collapse
|
47
|
Wang G, Liu Q, Chen G, Xia B, Zeng D, Chen G, Guo C. AI's deep dive into complex pediatric inguinal hernia issues: a challenge to traditional guidelines? Hernia 2023; 27:1587-1599. [PMID: 37843604 DOI: 10.1007/s10029-023-02900-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 09/19/2023] [Indexed: 10/17/2023]
Abstract
OBJECTIVE This study utilized ChatGPT, an artificial intelligence program based on large language models, to explore controversial issues in pediatric inguinal hernia surgery and compare its responses with the guidelines of the European Association of Pediatric Surgeons (EUPSA). METHODS Six contentious issues raised by EUPSA were submitted to ChatGPT 4.0 for analysis, and two independent responses were generated for each issue. These generated answers were then compared with systematic reviews and guidelines. To ensure content accuracy and reliability, a content analysis was conducted and expert evaluations were solicited for validation. The content analysis evaluated the consistency or discrepancy between ChatGPT 4.0's responses and the guidelines. An expert scoring method assessed the quality, reliability, and applicability of the responses, and a TF-IDF model tested the stability and consistency of the two responses. RESULTS The responses generated by ChatGPT 4.0 were mostly consistent with the guidelines; however, some differences and contradictions were noted. The average quality score was 3.33, the reliability score was 2.75, and the applicability score was 3.46 (out of 5). The average similarity between the two responses was 0.72 (out of 1). Content analysis and expert ratings yielded consistent conclusions, enhancing the credibility of our research. CONCLUSION ChatGPT can provide valuable responses to clinical questions, but it has limitations and requires further improvement. It is recommended to combine ChatGPT with other reliable data sources to improve clinical practice and decision-making.
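The consistency check between the two independently generated responses relies on a TF-IDF representation, yielding an average similarity of 0.72. A minimal sketch of a TF-IDF cosine-similarity comparison is shown below, assuming scikit-learn; the two example responses are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical ChatGPT responses to the same contentious issue.
response_a = ("Routine contralateral exploration is not recommended in "
              "pediatric inguinal hernia repair and should be individualized.")
response_b = ("Contralateral exploration is generally discouraged as a routine "
              "step; the decision should be made case by case.")

# Vectorize both responses with TF-IDF and compare them with cosine similarity.
tfidf = TfidfVectorizer().fit_transform([response_a, response_b])
similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"TF-IDF cosine similarity: {similarity:.2f}")  # value in [0, 1]
```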
Collapse
Affiliation(s)
- G Wang
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Pediatrics, Children's Hospital, Chongqing Medical University, Chongqing, People's Republic of China
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China
| | - Q Liu
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
| | - G Chen
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
| | - B Xia
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
| | - D Zeng
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
| | - G Chen
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China.
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
- Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Women and Children's Hospital of Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
| | - C Guo
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China.
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
| |
Collapse
|
48
|
Chatterjee S, Bhattacharya M, Pal S, Lee SS, Chakraborty C. ChatGPT and large language models in orthopedics: from education and surgery to research. J Exp Orthop 2023; 10:128. [PMID: 38038796 PMCID: PMC10692045 DOI: 10.1186/s40634-023-00700-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023] Open
Abstract
ChatGPT has gained popularity rapidly since its release in November 2022. Currently, large language models (LLMs) and ChatGPT are being applied in various domains of medical science, including cardiology, nephrology, orthopedics, ophthalmology, gastroenterology, and radiology, and researchers are exploring their potential for clinicians and surgeons in every domain. This study discusses how ChatGPT can help orthopedic clinicians and surgeons perform various medical tasks. LLMs and ChatGPT can help the patient community by providing suggestions and diagnostic guidelines. In this study, the use of LLMs and ChatGPT to enhance and expand the field of orthopedics, including orthopedic education, surgery, and research, is explored. Present LLMs have several shortcomings, which are discussed herein; however, next-generation and domain-specific LLMs are expected to be more capable and to transform patients' quality of life.
Collapse
Affiliation(s)
- Srijan Chatterjee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
| | - Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, 756020, Odisha, India
| | - Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | - Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea.
| | - Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India.
| |
Collapse
|
49
|
Al-Dujaili Z, Omari S, Pillai J, Al Faraj A. Assessing the accuracy and consistency of ChatGPT in clinical pharmacy management: A preliminary analysis with clinical pharmacy experts worldwide. Res Social Adm Pharm 2023; 19:1590-1594. [PMID: 37696742 DOI: 10.1016/j.sapharm.2023.08.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Revised: 08/30/2023] [Accepted: 08/30/2023] [Indexed: 09/13/2023]
Abstract
BACKGROUND The ChatGPT conversational system has ushered in a revolutionary new era of information retrieval and stands as one of the fastest-growing platforms. Clinical pharmacy, as a dynamic discipline, necessitates an advanced comprehension of drugs and diseases. Decision-making in clinical pharmacy demands accuracy and consistency in medical information, as it directly affects patient safety. OBJECTIVE The objective was to evaluate ChatGPT's accuracy and consistency in managing pharmacotherapy cases across multiple time points. Additionally, input was gathered from clinical pharmacy experts worldwide, and the agreement between ChatGPT's responses and those of the experts was assessed. METHODS A set of 20 pharmacotherapy cases was entered into ChatGPT at three different time points. Inter-rater reliability was used to measure the accuracy of the output generated by ChatGPT at each time point, and test-retest reliability was used to measure the consistency of the output across the three time points. Pharmacy expert performance was evaluated, and the overall results were compared. RESULTS ChatGPT achieved a hit rate of 70.83% at week 1, 79.2% at week 3, and 75% at week 5. The percent agreement was 79.2% between weeks 1 and 3, 87.5% between weeks 3 and 5, and 83.3% between weeks 1 and 5. In contrast, accuracy rates among clinical pharmacy experts showed considerable variation according to their geographic location. The highest agreement between clinical pharmacist responses and ChatGPT responses was observed at the last time point examined. CONCLUSIONS Overall, the analysis suggested that ChatGPT is capable of generating clinically relevant pharmaceutical information, albeit with some variation in accuracy and consistency. It should be noted that clinical pharmacy experts worldwide may provide varying degrees of accuracy depending on their expertise. This study highlights the potential of AI chatbots in clinical pharmacy.
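The hit rates and test-retest percent agreement reported above reduce to simple proportions over the 20 graded cases. The sketch below illustrates that arithmetic with hypothetical gradings; the 0/1 vectors are assumptions and do not reproduce the study's exact figures.

```python
# Hypothetical gradings (1 = correct, 0 = incorrect) for the same 20
# pharmacotherapy cases at three time points; illustrative values only.
week1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
week3 = [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
week5 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

def hit_rate(results):
    """Proportion of cases answered correctly at one time point."""
    return 100 * sum(results) / len(results)

def percent_agreement(a, b):
    """Share of cases graded identically at two time points (test-retest)."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)

print(f"hit rate, week 1: {hit_rate(week1):.1f}%")
print(f"agreement, weeks 1 vs 3: {percent_agreement(week1, week3):.1f}%")
print(f"agreement, weeks 3 vs 5: {percent_agreement(week3, week5):.1f}%")
```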
Collapse
Affiliation(s)
- Zahraa Al-Dujaili
- College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq
| | - Sarah Omari
- Department of Epidemiology and Population Health, American University of Beirut (AUB), Beirut, Lebanon
| | - Jey Pillai
- College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq
| | - Achraf Al Faraj
- College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq.
| |
Collapse
|
50
|
Wei Y, Guo L, Lian C, Chen J. ChatGPT: Opportunities, risks and priorities for psychiatry. Asian J Psychiatr 2023; 90:103808. [PMID: 37898100 DOI: 10.1016/j.ajp.2023.103808] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Revised: 10/18/2023] [Accepted: 10/22/2023] [Indexed: 10/30/2023]
Abstract
The advancement of large language models such as ChatGPT opens new possibilities in psychiatry but also invites scrutiny. This paper examines the potential opportunities, risks, and crucial areas of focus within this area. The active engagement of the mental health community is seen as critical to ensure ethical practice, equal access, and a patient-centric approach.
Collapse
Affiliation(s)
- Yaohui Wei
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Department of Psychiatry and Psychotherapy, University Hospital Rechts der Isar, Technical University of Munich, Munich, Germany
| | - Lei Guo
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Cheng Lian
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Jue Chen
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
| |
Collapse
|