1. Matsubara S. ChatGPT use should be prohibited in writing letters. Am J Obstet Gynecol 2024;231:e110. PMID: 38710270. DOI: 10.1016/j.ajog.2024.04.046.
Affiliation(s)
- Shigeki Matsubara
- Department of Obstetrics and Gynecology, Jichi Medical University, Tochigi, Japan; Department of Obstetrics and Gynecology, Koga Red Cross Hospital, 1150 Shimoyama, Koga, Ibaraki 306-0014, Japan; Medical Examination Center, Ibaraki Western Medical Center, Chikusei, Japan.
2. Anuk AT, Tanacan A, Kara Ö, Sahin D. Assessing adverse pregnancy outcomes in women with uncontrolled asthma vs. mild asthma: a retrospective comparative analysis. Arch Gynecol Obstet 2024;310:1433-1440. PMID: 38276984. DOI: 10.1007/s00404-023-07347-4.
Abstract
PURPOSE The aim of this study was to compare perinatal outcomes between an uncontrolled asthma group and a mild asthma group and to examine the relationship between disease severity and adverse maternal-fetal outcomes. METHODS This retrospective cohort study analyzed 180 pregnant women diagnosed with asthma who were hospitalized and delivered at our center between September 1, 2019, and December 1, 2021. We compared two groups: 160 with mild asthma and 20 with uncontrolled asthma. Data encompassed maternal characteristics, obstetrical complications, medication use, emergency department admissions for exacerbations, smoking status, and neonatal outcomes. RESULTS In the uncontrolled asthma group, hospitalization rates and use of inhaled short-acting β-agonists (SABA) and systemic corticosteroids were significantly higher than in the mild asthma group (p < 0.01). Maternal and fetal complications were more prevalent in the uncontrolled group, including asthma exacerbations (45% vs. 1.2%), anemia (10% vs. 4.4%), prematurity (25% vs. 9.6%), and intrauterine fetal demise (IUFD) (10% vs. 0.6%). Neonatal outcomes in the uncontrolled group showed higher rates of admission to the neonatal intensive care unit (NICU) (50% vs. 25%), respiratory distress syndrome (RDS) (30% vs. 14%), and intraventricular hemorrhage (IVH) (5% vs. 0%) compared with the mild asthma group. CONCLUSION Uncontrolled asthma during pregnancy is associated with higher rates of adverse maternal-fetal and neonatal outcomes than mild asthma.
Affiliation(s)
- Ali Taner Anuk
- Division of Perinatology, Department of Obstetrics and Gynecology, Ministry of Health, Ankara City Hospital, Ankara, Türkiye
- Atakan Tanacan
- Division of Perinatology, Department of Obstetrics and Gynecology, Ministry of Health, Ankara City Hospital, Ankara, Türkiye
- Özgür Kara
- Division of Perinatology, Department of Obstetrics and Gynecology, Ministry of Health, Ankara City Hospital, Ankara, Türkiye
- Dilek Sahin
- Division of Perinatology, Department of Obstetrics and Gynecology, University of Health Sciences, Ministry of Health, Ankara City Hospital, Ankara, Türkiye
3. Peled T, Sela HY, Weiss A, Grisaru-Granovsky S, Agrawal S, Rottenstreich M. Evaluating the validity of ChatGPT responses on common obstetric issues: Potential clinical applications and implications. Int J Gynaecol Obstet 2024;166:1127-1133. PMID: 38523565. DOI: 10.1002/ijgo.15501.
Abstract
OBJECTIVE To evaluate the quality of ChatGPT responses to common issues in obstetrics and assess its ability to provide reliable responses to pregnant individuals. The study aimed to examine the responses against expert opinion using predetermined criteria, including "accuracy," "completeness," and "safety." METHODS We curated 15 common and potentially clinically significant questions that pregnant women frequently ask. Two native English-speaking women were asked to reframe the questions in their own words, and we employed the ChatGPT language model to generate responses to the questions. To evaluate the accuracy, completeness, and safety of ChatGPT's generated responses, we developed a questionnaire with a 1-to-5 scale and invited obstetrics and gynecology experts from different countries to rate the responses accordingly. The ratings were analyzed to evaluate the average level of agreement and the percentage of positive ratings (≥4) for each criterion. RESULTS Of the 42 experts invited, 20 responded to the questionnaire. The combined score for all responses yielded a mean rating of 4, with 75% of responses receiving a positive rating (≥4). Among the specific criteria, the ChatGPT responses performed best on accuracy, with a mean rating of 4.2 and a positive rating for 80% of the questions. The responses scored lower on completeness, with a mean rating of 3.8 and a positive rating for 46.7% of the questions. For safety, the mean rating was 3.9 and 53.3% of the questions received a positive rating. No response had an average rating below 3. CONCLUSION This study demonstrates promising results regarding the potential use of ChatGPT in providing accurate responses to obstetric clinical questions posed by pregnant women. However, it is crucial to exercise caution when addressing inquiries concerning the safety of the fetus or the mother.
Affiliation(s)
- Tzuria Peled
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Hen Y Sela
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Ari Weiss
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Sorina Grisaru-Granovsky
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Swati Agrawal
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Hamilton Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Misgav Rottenstreich
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Hamilton Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Department of Nursing, Jerusalem College of Technology, Jerusalem, Israel
4. Grünebaum A, Chervenak FA. The dichotomy between the scientific and artistic aspects of medical writing. Am J Obstet Gynecol 2024;231:e111. PMID: 38710266. DOI: 10.1016/j.ajog.2024.04.047.
Affiliation(s)
- Amos Grünebaum
- Department of Obstetrics and Gynecology, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Lenox Hill Hospital, New York, NY.
- Frank A Chervenak
- Department of Obstetrics and Gynecology, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Lenox Hill Hospital, New York, NY
5. Hua R, Dong X, Wei Y, Shu Z, Yang P, Hu Y, Zhou S, Sun H, Yan K, Yan X, Chang K, Li X, Bai Y, Zhang R, Wang W, Zhou X. Lingdan: enhancing encoding of traditional Chinese medicine knowledge for clinical reasoning tasks with large language models. J Am Med Inform Assoc 2024;31:2019-2029. PMID: 39038795. PMCID: PMC11339528. DOI: 10.1093/jamia/ocae087.
Abstract
OBJECTIVE The recent surge in large language models (LLMs) across various fields has yet to be fully realized in traditional Chinese medicine (TCM). This study aims to bridge this gap by developing a large language model tailored to TCM knowledge, enhancing its performance and accuracy in clinical reasoning tasks such as diagnosis, treatment, and prescription recommendation. MATERIALS AND METHODS This study harnessed a wide array of TCM data resources, including TCM ancient books, textbooks, and clinical data, to create 3 key datasets: the TCM Pre-trained Dataset, the Traditional Chinese Patent Medicine (TCPM) Question Answering Dataset, and the Spleen and Stomach Herbal Prescription Recommendation Dataset. These datasets underpinned the development of the Lingdan Pre-trained LLM and 2 specialized models: the Lingdan-TCPM-Chat Model, which uses a Chain-of-Thought process for symptom analysis and TCPM recommendation, and the Lingdan Prescription Recommendation Model (Lingdan-PR), which proposes herbal prescriptions based on electronic medical records. RESULTS The Lingdan-TCPM-Chat and Lingdan-PR models, fine-tuned on the Lingdan Pre-trained LLM, demonstrated state-of-the-art performance on the tasks of TCM clinical knowledge answering and herbal prescription recommendation. Notably, Lingdan-PR outperformed all state-of-the-art baseline models, achieving an improvement of 18.39% in the Top@20 F1-score compared with the best baseline. CONCLUSION This study marks a pivotal step in merging advanced LLMs with TCM, showcasing the potential of artificial intelligence to help improve clinical decision-making in medical diagnostics and treatment strategies. The success of the Lingdan Pre-trained LLM and its derivative models, Lingdan-TCPM-Chat and Lingdan-PR, not only revolutionizes TCM practices but also opens new avenues for the application of artificial intelligence in other specialized medical fields. Our project is available at https://github.com/TCMAI-BJTU/LingdanLLM.
Affiliation(s)
- Rui Hua
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Xin Dong
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Yu Wei
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Zixin Shu
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Pengcheng Yang
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Yunhui Hu
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Shuiping Zhou
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- He Sun
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Kaijing Yan
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Xijun Yan
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Kai Chang
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Xiaodong Li
- Affiliated Hospital of Hubei University of Chinese Medicine, Wuhan 430065, China
- Hubei Academy of Chinese Medicine, Wuhan 430061, China
- Institute of Liver Diseases, Hubei Key Laboratory of Theoretical and Applied Research of Liver and Kidney in Traditional Chinese Medicine, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan 430061, China
- Yuning Bai
- Department of Gastroenterology, Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Runshun Zhang
- Department of Gastroenterology, Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Wenjia Wang
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Xuezhong Zhou
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
6. Wang Y, Chen Y, Sheng J. Assessing ChatGPT as a Medical Consultation Assistant for Chronic Hepatitis B: Cross-Language Study of English and Chinese. JMIR Med Inform 2024;12:e56426. PMID: 39115930. PMCID: PMC11342014. DOI: 10.2196/56426.
Abstract
BACKGROUND Chronic hepatitis B (CHB) imposes substantial economic and social burdens globally. The management of CHB involves intricate monitoring and adherence challenges, particularly in regions like China, where a high prevalence of CHB intersects with health care resource limitations. This study explores the potential of ChatGPT-3.5, an emerging artificial intelligence (AI) assistant, to address these complexities. Given its notable capabilities in medical education and practice, ChatGPT-3.5's role in managing CHB is examined, particularly in regions with distinct health care landscapes. OBJECTIVE This study aimed to uncover insights into ChatGPT-3.5's potential and limitations in delivering personalized medical consultation assistance for CHB patients across diverse linguistic contexts. METHODS Questions sourced from published guidelines, online CHB communities, and search engines in English and Chinese were refined, translated, and compiled into 96 inquiries. Subsequently, these questions were presented to both ChatGPT-3.5 and ChatGPT-4.0 in independent dialogues. The responses were then evaluated by senior physicians, focusing on informativeness, emotional management, consistency across repeated inquiries, and cautionary statements regarding medical advice. Additionally, a true-or-false questionnaire was employed to further discern the variance in information accuracy between ChatGPT-3.5 and ChatGPT-4.0 for closed questions. RESULTS Over half of the responses (228/370, 61.6%) from ChatGPT-3.5 were considered comprehensive. In contrast, ChatGPT-4.0 exhibited a higher percentage at 74.5% (172/222; P<.001). Notably, superior performance was evident in English, particularly in informativeness and consistency across repeated queries. However, deficiencies were identified in emotional management guidance, with only 3.2% (6/186) in ChatGPT-3.5 and 8.1% (15/154) in ChatGPT-4.0 (P=.04). ChatGPT-3.5 included a disclaimer in 10.8% (24/222) of responses, while ChatGPT-4.0 included a disclaimer in 13.1% (29/222) of responses (P=.46). When responding to true-or-false questions, ChatGPT-4.0 achieved an accuracy rate of 93.3% (168/180), significantly surpassing ChatGPT-3.5's accuracy rate of 65.0% (117/180) (P<.001). CONCLUSIONS In this study, ChatGPT demonstrated basic capabilities as a medical consultation assistant for CHB management. The choice of working language for ChatGPT-3.5 was considered a potential factor influencing its performance, particularly in the use of terminology and colloquial language, and this potentially affects its applicability within specific target populations. However, as an updated model, ChatGPT-4.0 exhibits improved information processing capabilities, overcoming the language impact on information accuracy. This suggests that the implications of model advancement on applications need to be considered when selecting large language models as medical consultation assistants. Given that both models performed inadequately in emotional guidance management, this study highlights the importance of providing specific language training and emotional management strategies when deploying ChatGPT for medical purposes. Furthermore, the tendency of these models to use disclaimers in conversations should be further investigated to understand the impact on patients' experiences in practical applications.
Affiliation(s)
- Yijie Wang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yining Chen
- Department of Urology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jifang Sheng
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
7. Burns C, Bakaj A, Berishaj A, Hristidis V, Deak P, Equils O. Use of Generative AI for Improving Health Literacy in Reproductive Health: Case Study. JMIR Form Res 2024;8:e59434. PMID: 38986153. PMCID: PMC11336497. DOI: 10.2196/59434.
Abstract
BACKGROUND Patients find technology tools to be more approachable for seeking sensitive health-related information, such as reproductive health information. The inventive conversational ability of artificial intelligence (AI) chatbots, such as ChatGPT (OpenAI Inc), offers a potential means for patients to effectively locate answers to their health-related questions digitally. OBJECTIVE A pilot study was conducted to compare the novel ChatGPT with the existing Google Search technology for their ability to offer accurate, effective, and current information on what to do after missing a dose of an oral contraceptive pill. METHODS A sequence of 11 questions, mimicking a patient inquiring about the action to take after missing a dose of an oral contraceptive pill, was input into ChatGPT as a cascade, given the conversational ability of ChatGPT. The questions were input into 4 different ChatGPT accounts, with the account holders being of various demographics, to evaluate potential differences and biases in the responses given to different account holders. The leading question alone, "what should I do if I missed a day of my oral contraception birth control?", was then input into Google Search, given its nonconversational nature. The results from the ChatGPT questions and the Google Search results for the leading question were evaluated on their readability, accuracy, and effective delivery of information. RESULTS The ChatGPT results were at an overall higher-grade reading level, required a longer reading duration, and were less accurate, less current, and less effective in delivering information. In contrast, the Google Search answer box and snippets were at a lower-grade reading level, required a shorter reading duration, were more current, referenced the origin of the information (transparent), and provided the information in various formats in addition to text. CONCLUSIONS ChatGPT has room for improvement in accuracy, transparency, recency, and reliability before it can equitably be implemented into health care information delivery and provide the potential benefits it poses. However, AI may be used as a tool for providers to educate their patients in preferred, creative, and efficient ways, such as using AI to generate accessible short educational videos from health care provider-vetted information. Larger studies representing a diverse group of users are needed.
Affiliation(s)
- Christina Burns
- MiOra, Encino, CA, United States
- University of California San Diego, San Diego, CA, United States
- Angela Bakaj
- MiOra, Encino, CA, United States
- Institute for Management & Innovation, University of Toronto, Toronto, ON, Canada
- Amonda Berishaj
- MiOra, Encino, CA, United States
- College of Professional Studies, Northeastern University, Boston, MA, United States
- Vagelis Hristidis
- Computer Science and Engineering, University of California Riverside, Riverside, CA, United States
- Pamela Deak
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of California San Diego, San Diego, CA, United States
8. Grossman S, Zerilli T, Nathan JP. Appropriateness of ChatGPT as a resource for medication-related questions. Br J Clin Pharmacol 2024. PMID: 39096130. DOI: 10.1111/bcp.16212.
Abstract
With ChatGPT's increasing popularity, healthcare professionals and patients may use it to obtain medication-related information. This study was conducted to assess ChatGPT's ability to provide satisfactory responses (i.e., responses that directly answer the question and are accurate, complete, and relevant) to medication-related questions posed to an academic drug information service. ChatGPT responses were compared with responses generated by the investigators using traditional resources, and the references provided were evaluated. Thirty-nine questions were entered into ChatGPT; the three most common categories were therapeutics (8; 21%), compounding/formulation (6; 15%) and dosage (5; 13%). Ten (26%) questions were answered satisfactorily by ChatGPT. Of the 29 (74%) questions that were not answered satisfactorily, deficiencies included lack of a direct response (11; 38%), lack of accuracy (11; 38%) and/or lack of completeness (12; 41%). References were included with eight (29%) responses; each of these included fabricated references. Presently, healthcare professionals and consumers should be cautioned against using ChatGPT for medication-related information.
Affiliation(s)
- Sara Grossman
- LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
- Tina Zerilli
- LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
- Joseph P Nathan
- LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
9. Zheng C, Ye H, Guo J, Yang J, Fei P, Yuan Y, Huang D, Huang Y, Peng J, Xie X, Xie M, Zhao P, Chen L, Zhang M. Development and evaluation of a large language model of ophthalmology in Chinese. Br J Ophthalmol 2024:bjo-2023-324526. PMID: 39019566. DOI: 10.1136/bjo-2023-324526.
Abstract
BACKGROUND Large language models (LLMs), such as ChatGPT, have considerable implications for various medical applications. However, ChatGPT's training primarily draws from English-centric internet data and is not tailored explicitly to the medical domain. Thus, an ophthalmic LLM in Chinese is clinically essential for both healthcare providers and patients in mainland China. METHODS We developed an LLM of ophthalmology (MOPH) using Chinese corpora and evaluated its performance in three clinical scenarios: ophthalmic board exams in Chinese, answering evidence-based medicine-oriented ophthalmic questions, and diagnostic accuracy for clinical vignettes. Additionally, we compared MOPH's performance to that of human doctors. RESULTS In the ophthalmic exam, MOPH's average score closely aligned with the mean score of trainees (64.7 (range 62-68) vs 66.2 (range 50-92), p=0.817), and it achieved a score above 60 in all seven mock exams. In answering ophthalmic questions, 83.3% (25/30) of MOPH's responses adhered to Chinese guidelines (Likert scale 4-5). Only 6.7% (2/30, Likert scale 1-2) and 10% (3/30, Likert scale 3) of responses were rated by reviewers as 'poor or very poor' or as containing 'potentially misinterpretable inaccuracies', respectively. In diagnostic accuracy, although the rate of correct diagnosis by ophthalmologists was higher than that of MOPH (96.1% vs 81.1%), the difference was not statistically significant (p>0.05). CONCLUSION This study demonstrated the promising performance of MOPH, a Chinese-specific ophthalmic LLM, in diverse clinical scenarios. MOPH has potential real-world applications in Chinese-language ophthalmology settings.
Affiliation(s)
- Ce Zheng
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Institute of Hospital Development Strategy, China Hospital Development Institute, Shanghai Jiao Tong University, Shanghai, China
- Hongfei Ye
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Institute of Hospital Development Strategy, China Hospital Development Institute, Shanghai Jiao Tong University, Shanghai, China
- Jinming Guo
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Junrui Yang
- Ophthalmology, The 74th Army Group Hospital, Guangzhou, Guangdong, China
- Ping Fei
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Yuanzhi Yuan
- Ophthalmology, Zhongshan Hospital Fudan University, Shanghai, China
- Danqing Huang
- Discipline Inspection & Supervision Office, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Yuqiang Huang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Jie Peng
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Xiaoling Xie
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Meng Xie
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Peiquan Zhao
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Li Chen
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Mingzhi Zhang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
10. De Vito A, Geremia N, Marino A, Bavaro DF, Caruana G, Meschiari M, Colpani A, Mazzitelli M, Scaglione V, Venanzi Rullo E, Fiore V, Fois M, Campanella E, Pistarà E, Faltoni M, Nunnari G, Cattelan A, Mussini C, Bartoletti M, Vaira LA, Madeddu G. Assessing ChatGPT's theoretical knowledge and prescriptive accuracy in bacterial infections: a comparative study with infectious diseases residents and specialists. Infection 2024. PMID: 38995551. DOI: 10.1007/s15010-024-02350-6.
Abstract
OBJECTIVES Advancements in artificial intelligence (AI) have made platforms like ChatGPT increasingly relevant in medicine. This study assesses ChatGPT's utility in addressing bacterial infection-related questions and antibiogram-based clinical cases. METHODS This study was a collaborative effort involving infectious disease (ID) specialists and residents. A group of experts formulated six true/false questions, six open-ended questions, and six clinical cases with antibiograms for four types of infections (endocarditis, pneumonia, intra-abdominal infections, and bloodstream infection), for a total of 96 questions. The questions were submitted to four senior residents and four specialists in ID and input into ChatGPT-4 and a trained version of ChatGPT-4. A total of 720 responses were obtained and reviewed by a blinded panel of experts in antibiotic treatments. They evaluated the responses for accuracy and completeness, the ability to identify correct resistance mechanisms from antibiograms, and the appropriateness of antibiotic prescriptions. RESULTS No significant difference was noted among the four groups for true/false questions, with approximately 70% correct answers. The trained ChatGPT-4 and ChatGPT-4 offered more accurate and complete answers to the open-ended questions than both the residents and specialists. Regarding the clinical cases, ChatGPT-4 showed lower accuracy in recognizing the correct resistance mechanism. ChatGPT-4 tended not to prescribe newer antibiotics like cefiderocol or imipenem/cilastatin/relebactam, favoring less recommended options like colistin. Both the trained ChatGPT-4 and ChatGPT-4 recommended longer-than-necessary treatment periods (p = 0.022). CONCLUSIONS This study highlights ChatGPT's capabilities and limitations in medical decision-making, specifically regarding bacterial infections and antibiogram analysis. While ChatGPT demonstrated proficiency in answering theoretical questions, it did not consistently align with expert decisions in clinical case management. Despite these limitations, the potential of ChatGPT as a supportive tool in ID education and preliminary analysis is evident. However, it should not replace expert consultation, especially in complex clinical decision-making.
Affiliation(s)
- Andrea De Vito
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy.
- PhD School in Biomedical Science, Biomedical Science Department, University of Sassari, Sassari, Italy.
- Nicholas Geremia
- Unit of Infectious Diseases, Department of Clinical Medicine, Ospedale dell'Angelo, Venice, Italy
- Unit of Infectious Diseases, Department of Clinical Medicine, Ospedale Civile S.S. Giovanni e Paolo, Venice, Italy
- Andrea Marino
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Davide Fiore Bavaro
- Infectious Diseases Unit - IRCCS Humanitas Research Hospital, Via Manzoni 56, Rozzano, Milan, 20089, Italy
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan, 20090, Italy
- Giorgia Caruana
- Infectious Diseases Service, Cantonal Hospital of Sion and Institut Central des Hôpitaux (ICH), Sion, Switzerland
- Institute of Microbiology, Department of Laboratory Medicine and Pathology, Lausanne University Hospital, Lausanne, Switzerland
- Agnese Colpani
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
- Maria Mazzitelli
- Infectious and Tropical Diseases Unit, Padua University Hospital, Padua, Italy
- Vincenzo Scaglione
- Infectious and Tropical Diseases Unit, Padua University Hospital, Padua, Italy
- Emmanuele Venanzi Rullo
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Vito Fiore
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
- Marco Fois
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
- Edoardo Campanella
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Eugenia Pistarà
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Giuseppe Nunnari
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Annamaria Cattelan
- Infectious and Tropical Diseases Unit, Padua University Hospital, Padua, Italy
- Michele Bartoletti
- Infectious Diseases Unit - IRCCS Humanitas Research Hospital, Via Manzoni 56, Rozzano, Milan, 20089, Italy
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan, 20090, Italy
- Luigi Angelo Vaira
- Maxillofacial Surgery Unit, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
- Giordano Madeddu
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
11. Yilmaz Muluk S, Olcucu N. Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-3.5 and GoogleBard in Identifying Red Flags of Low Back Pain. Cureus 2024;16:e63580. PMID: 39087174. PMCID: PMC11290316. DOI: 10.7759/cureus.63580.
Abstract
BACKGROUND Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions, marked by 'red flags' (RF) such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed ChatGPT-3.5's (OpenAI, San Francisco, CA, USA) and GoogleBard's (Google, Mountain View, CA, USA) accuracy in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition. METHODS We created 70 questions on RF-related symptoms and diseases following the LBP guidelines. Among them, 58 had a single symptom (SS), and 12 had multiple symptoms (MS) of LBP. Questions were posed to ChatGPT and GoogleBard, and responses were assessed by two authors for accuracy, completeness, and relevance (ACR) using a 5-point rubric criteria. RESULTS Cohen's kappa values (0.60-0.81) indicated significant agreement among the authors. The average scores for responses ranged from 3.47 to 3.85 for ChatGPT-3.5 and from 3.36 to 3.76 for GoogleBard for 58 SS questions, and from 4.04 to 4.29 for ChatGPT-3.5 and from 3.50 to 3.71 for GoogleBard for 12 MS questions. The ratings for these responses ranged from 'good' to 'excellent'. Most SS responses effectively conveyed the severity of the situation (93.1% for ChatGPT-3.5, 94.8% for GoogleBard), and all MS responses did so. No statistically significant differences were found between ChatGPT-3.5 and GoogleBard scores (p>0.05). CONCLUSIONS In an era characterized by widespread online health information seeking, artificial intelligence (AI) systems play a vital role in delivering precise medical information. These technologies may hold promise in the field of health information if they continue to improve.
Affiliation(s)
- Nazli Olcucu
- Physical Medicine and Rehabilitation, Antalya Ataturk State Hospital, Antalya, TUR
12. Safrai M, Orwig KE. Utilizing artificial intelligence in academic writing: an in-depth evaluation of a scientific review on fertility preservation written by ChatGPT-4. J Assist Reprod Genet 2024;41:1871-1880. PMID: 38619763. PMCID: PMC11263262. DOI: 10.1007/s10815-024-03089-7.
Abstract
PURPOSE To evaluate the ability of ChatGPT-4 to generate a biomedical review article on fertility preservation. METHODS ChatGPT-4 was prompted to create an outline for a review on fertility preservation in men and prepubertal boys. The outline provided by ChatGPT-4 was subsequently used to prompt ChatGPT-4 to write the different parts of the review and provide five references for each section. The different parts of the article and the references provided were combined to create a single scientific review that was evaluated by the authors, who are experts in fertility preservation. The experts assessed the article and the references for accuracy and checked for plagiarism using online tools. In addition, both experts independently scored the relevance, depth, and currentness of the ChatGPT-4's article using a scoring matrix ranging from 0 to 5 where higher scores indicate higher quality. RESULTS ChatGPT-4 successfully generated a relevant scientific article with references. Among 27 statements needing citations, four were inaccurate. Of 25 references, 36% were accurate, 48% had correct titles but other errors, and 16% were completely fabricated. Plagiarism was minimal (mean = 3%). Experts rated the article's relevance highly (5/5) but gave lower scores for depth (2-3/5) and currentness (3/5). CONCLUSION ChatGPT-4 can produce a scientific review on fertility preservation with minimal plagiarism. While precise in content, it showed factual and contextual inaccuracies and inconsistent reference reliability. These issues limit ChatGPT-4 as a sole tool for scientific writing but suggest its potential as an aid in the writing process.
Affiliation(s)
- Myriam Safrai
- Department of Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA.
- Department of Obstetrics and Gynecology, Chaim Sheba Medical Center (Tel Hashomer), Sackler Faculty of Medicine, Tel Aviv University, 52621, Tel Aviv, Israel.
- Kyle E Orwig
- Department of Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
13. Aghamaliyev U, Karimbayli J, Giessen-Jung C, Matthias I, Unger K, Andrade D, Hofmann FO, Weniger M, Angele MK, Benedikt Westphalen C, Werner J, Renz BW. ChatGPT's Gastrointestinal Tumor Board Tango: A limping dance partner? Eur J Cancer 2024;205:114100. PMID: 38729055. DOI: 10.1016/j.ejca.2024.114100.
Abstract
OBJECTIVES This study aimed to assess the consistency and replicability of treatment recommendations provided by ChatGPT 3.5 compared to gastrointestinal tumor cases presented at multidisciplinary tumor boards (MTBs). It also aimed to distinguish between general and case-specific responses and investigated the precision of ChatGPT's recommendations in replicating exact treatment plans, particularly regarding chemotherapy regimens and follow-up protocols. MATERIAL AND METHODS A retrospective study was carried out on 115 cases of gastrointestinal malignancies, selected from 448 patients reviewed in MTB meetings. A senior resident fed patient data into ChatGPT 3.5 to produce treatment recommendations, which were then evaluated against the tumor board's decisions by senior oncology fellows. RESULTS In 19% of the examined cases, ChatGPT 3.5 provided general information about the malignancy without considering individual patient characteristics; in the remaining 81% of cases, it generated responses that were specific to the individual clinical scenario. In the subset of case-specific responses, 83% of recommendations exhibited overall treatment strategy concordance between ChatGPT and the MTB. However, the exact treatment concordance dropped to 65%, and was notably lower for recommendations of specific chemotherapy regimens. Cases recommended for surgery showed the highest concordance rates, while those involving chemotherapy recommendations faced challenges in precision. CONCLUSIONS ChatGPT 3.5 demonstrates potential in aligning conceptual approaches to treatment strategies with MTB guidelines. However, it falls short in accurately duplicating specific treatment plans, especially concerning chemotherapy regimens and follow-up procedures. Ethical concerns and challenges in achieving exact replication necessitate prudence when considering ChatGPT 3.5 for direct clinical decision-making in MTBs.
Affiliation(s)
- Ughur Aghamaliyev
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Javad Karimbayli
- Division of Molecular Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, National Cancer Institute, Aviano, Italy
- Clemens Giessen-Jung
- Comprehensive Cancer Center Munich & Department of Medicine III, LMU University Hospital, LMU Munich, Germany
- Ilmer Matthias
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Kristian Unger
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; Department of Radiation Oncology, University Hospital, LMU Munich, 81377; Bavarian Cancer Research Center (BZKF), Munich, Germany
- Dorian Andrade
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Felix O Hofmann
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Maximilian Weniger
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Martin K Angele
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- C Benedikt Westphalen
- Comprehensive Cancer Center Munich & Department of Medicine III, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Jens Werner
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Bernhard W Renz
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
14. Khromchenko K, Shaikh S, Singh M, Vurture G, Rana RA, Baum JD. ChatGPT-3.5 Versus Google Bard: Which Large Language Model Responds Best to Commonly Asked Pregnancy Questions? Cureus 2024;16:e65543. PMID: 39188430. PMCID: PMC11346960. DOI: 10.7759/cureus.65543.
Abstract
Large language models (LLM) have been widely used to provide information in many fields, including obstetrics and gynecology. Which model performs best in providing answers to commonly asked pregnancy questions is unknown. A qualitative analysis of Chat Generative Pre-Training Transformer Version 3.5 (ChatGPT-3.5) (OpenAI, Inc., San Francisco, California, United States) and Bard, recently renamed Google Gemini (Google LLC, Mountain View, California, United States), was performed in August of 2023. Each LLM was queried on 12 commonly asked pregnancy questions and asked for their references. Review and grading of the responses and references for both LLMs were performed by the co-authors individually and then as a group to formulate a consensus. Query responses were graded as "acceptable" or "not acceptable" based on correctness and completeness in comparison to American College of Obstetricians and Gynecologists (ACOG) publications, PubMed-indexed evidence, and clinical experience. References were classified as "verified," "broken," "irrelevant," "non-existent," and "no references." Grades of "acceptable" were given to 58% of ChatGPT-3.5 responses (seven out of 12) and 83% of Bard responses (10 out of 12). In regard to references, ChatGPT-3.5 had reference issues in 100% of its references, and Bard had discrepancies in 8% of its references (one out of 12). When comparing ChatGPT-3.5 responses between May 2023 and August 2023, a change in "acceptable" responses was noted: 50% versus 58%, respectively. Bard answered more questions correctly than ChatGPT-3.5 when queried on a small sample of commonly asked pregnancy questions. ChatGPT-3.5 performed poorly in terms of reference verification. The overall performance of ChatGPT-3.5 remained stable over time, with approximately one-half of responses being "acceptable" in both May and August of 2023. Both LLMs need further evaluation and vetting before being accepted as accurate and reliable sources of information for pregnant women.
Affiliation(s)
- Keren Khromchenko
- Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA
- Sameeha Shaikh
- Obstetrics and Gynecology, Hackensack Meridian School of Medicine, Nutley, USA
- Meghana Singh
- Obstetrics and Gynecology, Hackensack Meridian School of Medicine, Nutley, USA
- Gregory Vurture
- Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA
- Rima A Rana
- Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA
- Jonathan D Baum
- Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA
15. Rodrigues Alessi M, Gomes HA, Lopes de Castro M, Terumy Okamoto C. Performance of ChatGPT in Solving Questions From the Progress Test (Brazilian National Medical Exam): A Potential Artificial Intelligence Tool in Medical Practice. Cureus 2024;16:e64924. PMID: 39156244. PMCID: PMC11330648. DOI: 10.7759/cureus.64924.
Abstract
Background The use of artificial intelligence (AI) is not a recent phenomenon, but the latest advancements in this technology are making a significant impact across various fields of human knowledge. In medicine, this trend is no different, although it has developed at a slower pace. ChatGPT is an example of an AI-based algorithm capable of answering questions, interpreting phrases, and synthesizing complex information, potentially aiding and even replacing humans in various areas of social interest. Some studies have compared its performance in solving medical knowledge exams with medical students and professionals to verify AI accuracy. This study aimed to measure the performance of ChatGPT in answering questions from the Progress Test from 2021 to 2023. Methodology An observational study was conducted in which questions from the 2021 Progress Test and the regional tests (Southern Institutional Pedagogical Support Center II) of 2022 and 2023 were presented to ChatGPT 3.5. The results obtained were compared with the scores of first- to sixth-year medical students from over 120 Brazilian universities. All questions were presented sequentially, without any modification to their structure. After each question was presented, the platform's history was cleared, and the site was restarted. Results The platform achieved an average accuracy rate in 2021, 2022, and 2023 of 69.7%, 68.3%, and 67.2%, respectively, surpassing students from all medical years in the three tests evaluated, reinforcing findings in the current literature. The subject with the best score for the AI was Public Health, with a mean grade of 77.8%. Conclusions ChatGPT demonstrated the ability to answer medical questions with higher accuracy than humans, including students from the last year of medical school.
Affiliation(s)
- Heitor A Gomes
- School of Medicine, Universidade Positivo, Curitiba, BRA
16. Meyer R, Hamilton KM, Truong MD, Wright KN, Siedhoff MT, Brezinov Y, Levin G. ChatGPT compared with Google Search and healthcare institution as sources of postoperative patient instructions after gynecological surgery. BJOG 2024;131:1154-1156. PMID: 38177090. DOI: 10.1111/1471-0528.17746.
Affiliation(s)
- Raanan Meyer
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- The Dr. Pinchas Bornstein Talpiot Medical Leadership Programme, Sheba Medical Center, Ramat-Gan, Israel
- Kacey M Hamilton
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- Mireille D Truong
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- Kelly N Wright
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- Matthew T Siedhoff
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- Yoav Brezinov
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
- Gabriel Levin
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
17. Rotem R, Zamstein O, Rottenstreich M, O'Sullivan OE, O'Reilly BA, Weintraub AY. The future of patient education: A study on AI-driven responses to urinary incontinence inquiries. Int J Gynaecol Obstet 2024. PMID: 38944693. DOI: 10.1002/ijgo.15751.
Abstract
OBJECTIVE To evaluate the effectiveness of ChatGPT in providing insights into common urinary incontinence concerns within urogynecology. By analyzing the model's responses against established benchmarks of accuracy, completeness, and safety, the study aimed to quantify its usefulness for informing patients and aiding healthcare providers. METHODS An expert-driven questionnaire was developed, inviting urogynecologists worldwide to assess ChatGPT's answers to 10 carefully selected questions on urinary incontinence (UI). These assessments focused on the accuracy of the responses, their comprehensiveness, and whether they raised any safety issues. Subsequent statistical analyses determined the average consensus among experts and identified the proportion of responses receiving favorable evaluations (a score of 4 or higher). RESULTS Of 50 urogynecologists that were approached worldwide, 37 responded, offering insights into ChatGPT's responses on UI. The overall feedback averaged a score of 4.0, indicating a positive acceptance. Accuracy scores averaged 3.9 with 71% rated favorably, whereas comprehensiveness scored slightly higher at 4 with 74% favorable ratings. Safety assessments also averaged 4 with 74% favorable responses. CONCLUSION This investigation underlines ChatGPT's favorable performance across the evaluated domains of accuracy, comprehensiveness, and safety within the context of UI queries. However, despite this broadly positive reception, the study also signals a clear avenue for improvement, particularly in the precision of the provided information. Refining ChatGPT's accuracy and ensuring the delivery of more pinpointed responses are essential steps forward, aiming to bolster its utility as a comprehensive educational resource for patients and a supportive tool for healthcare practitioners.
Affiliation(s)
- Reut Rotem
- Department of Urogynaecology, Cork University Maternity Hospital, Cork, Ireland
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Omri Zamstein
- Department of Obstetrics and Gynecology, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Misgav Rottenstreich
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Barry A O'Reilly
- Department of Urogynaecology, Cork University Maternity Hospital, Cork, Ireland
- Adi Y Weintraub
- Department of Obstetrics and Gynecology, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
18. Kotzur T, Singh A, Parker J, Peterson B, Sager B, Rose R, Corley F, Brady C. Evaluation of a Large Language Model's Ability to Assist in an Orthopedic Hand Clinic. Hand (N Y) 2024:15589447241257643. PMID: 38907651. DOI: 10.1177/15589447241257643.
Abstract
BACKGROUND Advancements in artificial intelligence technology, such as OpenAI's large language model, ChatGPT, could transform medicine through applications in a clinical setting. This study aimed to assess the utility of ChatGPT as a clinical assistant in an orthopedic hand clinic. METHODS Nine clinical vignettes, describing various common and uncommon hand pathologies, were constructed and reviewed by 4 fellowship-trained orthopedic hand surgeons and an orthopedic resident. ChatGPT was given these vignettes and asked to generate a differential diagnosis, propose a workup plan, and provide treatment options for its top differential. Responses were graded for accuracy, and the overall utility was scored on a 5-point Likert scale. RESULTS ChatGPT made the correct diagnosis in 7 of 9 cases, an overall accuracy rate of 78%. ChatGPT was less reliable with more complex pathologies and failed to identify an intentionally incorrect presentation. ChatGPT received a score of 3.8 ± 1.4 for correct diagnosis, 3.4 ± 1.4 for helpfulness in guiding patient management, 4.1 ± 1.0 for appropriate workup for the actual diagnosis, 4.3 ± 0.8 for an appropriate recommended treatment plan for the diagnosis, and 4.4 ± 0.8 for the helpfulness of treatment options in managing patients. CONCLUSION ChatGPT was successful in diagnosing most of the conditions; however, the overall utility of its advice was variable. While it performed well in recommending treatments, it faced difficulties in providing appropriate diagnoses for uncommon pathologies. In addition, it failed to identify an obvious error in the presented pathology.
Collapse
|
19
|
Moll M, Heilemann G, Georg D, Kauer-Dorner D, Kuess P. The role of artificial intelligence in informed patient consent for radiotherapy treatments-a case report. Strahlenther Onkol 2024; 200:544-548. [PMID: 38180493 DOI: 10.1007/s00066-023-02190-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Accepted: 12/03/2023] [Indexed: 01/06/2024]
Abstract
Recent advancements have led to widespread use of large language models (LLMs; e.g., ChatGPT (OpenAI, San Francisco, California, USA)) in various fields, including healthcare. This case study reports on the first use of an LLM in a pretreatment discussion and in obtaining informed consent for a radiation oncology treatment. Further, the reproducibility of the replies by ChatGPT 3.5 was analyzed. A breast cancer patient, following legal consultation, engaged in a conversation with ChatGPT 3.5 regarding her radiotherapy treatment. The patient posed questions about side effects, prevention, activities, medications, and late effects. While some answers contained inaccuracies, responses closely resembled doctors' replies. In a final evaluation discussion, the patient, however, stated that she preferred the presence of a physician and expressed concerns about the source of the provided information. The reproducibility was tested in ten iterations. Future guidelines for using such models in radiation oncology should be driven by medical professionals. While artificial intelligence (AI) supports essential tasks, human interaction remains crucial.
Collapse
Affiliation(s)
- M Moll
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria.
| | - G Heilemann
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
| | - Dietmar Georg
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
| | - D Kauer-Dorner
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
| | - P Kuess
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
| |
Collapse
|
20
|
Padmanabhan P, Dasarathan T, Surapaneni KM. Exploring the Potential of ChatGPT in Obstetrics and Gynecology of Undergraduate Medical Curriculum. J Obstet Gynaecol India 2024; 74:281-283. [PMID: 38974749 PMCID: PMC11224185 DOI: 10.1007/s13224-023-01909-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 11/06/2023] [Indexed: 07/09/2024] Open
Abstract
ChatGPT, the new buzz in the field of technology, is attracting millions of users worldwide with its impressive ability to perform multiple tasks in a way that mimics human conversation. We conducted this study at two levels, with direct and case-based questions from obstetrics and gynecology, to assess the performance of ChatGPT in the medical field. Our results suggest that ChatGPT has a good comprehension of the subject. However, ChatGPT should be trained on recent updates and improved so that it generates error-free, up-to-date responses.
Collapse
Affiliation(s)
- Padmavathy Padmanabhan
- Department of Obstetrics & Gynaecology, Panimalar Medical College Hospital and Research Institute, Varadharajapuram, Poonamallee, Chennai, 600123 India
| | - Tamilselvi Dasarathan
- Department of Obstetrics & Gynaecology, Panimalar Medical College Hospital and Research Institute, Varadharajapuram, Poonamallee, Chennai, 600123 India
| | - Krishna Mohan Surapaneni
- Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu 600 123 India
- Departments of Medical Education, Clinical Skills & Simulation, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu 600 123 India
| |
Collapse
|
21
|
Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Bolzoni A, Committeri U, Crimi S, Gabriele G, Lonardi F, Maglitto F, Petrocelli M, Pucci R, Saponaro G, Tel A, Vellone V, Chiesa-Estomba CM, Boscolo-Rizzo P, Salzano G, De Riu G. Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis. Otolaryngol Head Neck Surg 2024; 170:1492-1503. [PMID: 37595113 DOI: 10.1002/ohn.489] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 06/16/2023] [Accepted: 07/14/2023] [Indexed: 08/20/2023]
Abstract
OBJECTIVE To investigate the accuracy of Chat-Based Generative Pre-trained Transformer (ChatGPT) in answering questions and solving clinical scenarios of head and neck surgery. STUDY DESIGN Observational and evaluative study. SETTING Eighteen surgeons from 14 Italian head and neck surgery units. METHODS A total of 144 clinical questions encompassing different subspecialities of head and neck surgery and 15 comprehensive clinical scenarios were developed. Questions and scenarios were inputted into ChatGPT4, and the resulting answers were evaluated by the researchers using accuracy (range 1-6), completeness (range 1-3), and references' quality Likert scales. RESULTS The overall median score of open-ended questions was 6 (interquartile range [IQR]: 5-6) for accuracy and 3 (IQR: 2-3) for completeness. Overall, the reviewers rated the answer as entirely or nearly entirely correct in 87.2% of cases and as comprehensive and covering all aspects of the question in 73% of cases. The artificial intelligence (AI) model achieved a correct response in 84.7% of the closed-ended questions (11 wrong answers). As for the clinical scenarios, ChatGPT provided a fully or nearly fully correct diagnosis in 81.7% of cases. The proposed diagnostic or therapeutic procedure was judged to be complete in 56.7% of cases. The overall quality of the bibliographic references was poor, and sources were nonexistent in 46.4% of the cases. CONCLUSION The results generally demonstrate a good level of accuracy in the AI's answers. The AI's ability to resolve complex clinical scenarios is promising, but it still falls short of being considered a reliable support for the decision-making process of specialists in head and neck surgery.
Collapse
Affiliation(s)
- Luigi Angelo Vaira
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
- Biomedical Sciences Department, PhD School of Biomedical Science, University of Sassari, Sassari, Italy
| | - Jerome R Lechien
- Department of Anatomy and Experimental Oncology, Mons School of Medicine, UMONS, Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Department of Otolaryngology-Head Neck Surgery, Elsan Polyclinic of Poitiers, Poitiers, France
| | - Vincenzo Abbate
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Fabiana Allevi
- Maxillofacial Surgery Department, ASSt Santi Paolo e Carlo, University of Milan, Milan, Italy
| | - Giovanni Audino
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Giada Anna Beltramini
- Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
- Maxillofacial and Dental Unit, Fondazione IRCCS Cà Granda Ospedale Maggiore Policlinico, Milan, Italy
| | - Michela Bergonzani
- Maxillo-Facial Surgery Division, Head and Neck Department, University Hospital of Parma, Parma, Italy
| | - Alessandro Bolzoni
- Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
| | - Umberto Committeri
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Salvatore Crimi
- Operative Unit of Maxillofacial Surgery, Policlinico San Marco, University of Catania, Catania, Italy
| | - Guido Gabriele
- Department of Maxillofacial Surgery, University of Siena, Siena, Italy
| | - Fabio Lonardi
- Department of Maxillofacial Surgery, University of Verona, Verona, Italy
| | - Fabio Maglitto
- Maxillo-Facial Surgery Unit, University of Bari "Aldo Moro", Bari, Italy
| | - Marzia Petrocelli
- Maxillofacial Surgery Operative Unit, Bellaria and Maggiore Hospital, Bologna, Italy
| | - Resi Pucci
- Maxillofacial Surgery Unit, San Camillo-Forlanini Hospital, Rome, Italy
| | - Gianmarco Saponaro
- Maxillo-Facial Surgery Unit, IRCSS "A. Gemelli" Foundation-Catholic, University of the Sacred Heart, Rome, Italy
| | - Alessandro Tel
- Department of Head and Neck Surgery and Neuroscience, Clinic of Maxillofacial Surgery, University Hospital of Udine, Udine, Italy
| | | | | | - Paolo Boscolo-Rizzo
- Department of Medical, Surgical and Health Sciences, Section of Otolaryngology, University of Trieste, Trieste, Italy
| | - Giovanni Salzano
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Giacomo De Riu
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
| |
Collapse
|
22
|
Khan I, Khare BK. Exploring the potential of machine learning in gynecological care: a review. Arch Gynecol Obstet 2024; 309:2347-2365. [PMID: 38625543 DOI: 10.1007/s00404-024-07479-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2024] [Accepted: 03/10/2024] [Indexed: 04/17/2024]
Abstract
Gynecological health remains a critical aspect of women's overall well-being, with profound implications for maternal and reproductive outcomes. This comprehensive review synthesizes the current state of knowledge on four pivotal aspects of gynecological health: preterm birth, breast cancer, cervical cancer, and infertility treatment. Machine learning (ML) has emerged as a transformative technology with the potential to revolutionize gynecology and women's healthcare. The subsets of AI, namely machine learning (ML) and deep learning (DL) methods, have aided in detecting complex patterns in huge datasets and in using such patterns to make predictions. This paper investigates how machine learning (ML) algorithms are employed in the field of gynecology to tackle crucial issues pertaining to women's health. This paper also investigates the integration of ultrasound technology with artificial intelligence (AI) during the initial, intermediate, and final stages of pregnancy. Additionally, it delves into the diverse applications of AI throughout each trimester. This review paper provides an overview of machine learning (ML) models, introduces natural language processing (NLP) concepts, including ChatGPT, and discusses the clinical applications of artificial intelligence (AI) in gynecology. Additionally, the paper outlines the challenges in utilizing machine learning within the field of gynecology.
Collapse
Affiliation(s)
- Imran Khan
- Harcourt Butler Technical University, Kanpur, India.
| | | |
Collapse
|
23
|
Naqvi WM, Shaikh SZ, Mishra GV. Large language models in physical therapy: time to adapt and adept. Front Public Health 2024; 12:1364660. [PMID: 38887241 PMCID: PMC11182445 DOI: 10.3389/fpubh.2024.1364660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2024] [Accepted: 05/10/2024] [Indexed: 06/20/2024] Open
Abstract
Healthcare is experiencing a transformative phase driven by artificial intelligence (AI) and machine learning (ML). Physical therapists (PTs) stand on the brink of a paradigm shift in education, practice, and research. Rather than a threat, AI presents an opportunity for revolution. This paper examines how large language models (LLMs), such as ChatGPT and BioMedLM, driven by deep ML, can offer human-like performance yet face accuracy challenges because of the vast data involved in PT and rehabilitation practice. PTs can benefit by developing and training LLMs tailored to streamlining administrative tasks, connecting globally, and customizing treatments. However, the human touch and creativity remain invaluable. This paper urges PTs to engage in learning about and shaping AI models, highlighting the need for ethical use and human supervision to address potential biases. Embracing AI as a contributor, not just a user, is crucial: integrating AI and fostering collaboration can yield a future in which AI enriches the PT field, provided data accuracy and the challenges of feeding the AI model are sensitively addressed.
Collapse
Affiliation(s)
- Waqar M. Naqvi
- Department of Interdisciplinary Sciences, Datta Meghe Institute of Higher Education and Research, Wardha, India
- Department of Physiotherapy, College of Health Sciences, Gulf Medical University, Ajman, United Arab Emirates
- NKP Salve Institute of Medical Sciences and Research Center, Nagpur, India
| | - Summaiya Zareen Shaikh
- Department of Neuro-Physiotherapy, The SIA College of Health Sciences, College of Physiotherapy, Thane, India
| | - Gaurav V. Mishra
- Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, India
| |
Collapse
|
24
|
Devranoglu B, Gurbuz T, Gokmen O. ChatGPT's Efficacy in Queries Regarding Polycystic Ovary Syndrome and Treatment Strategies for Women Experiencing Infertility. Diagnostics (Basel) 2024; 14:1082. [PMID: 38893609 PMCID: PMC11172366 DOI: 10.3390/diagnostics14111082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2024] [Revised: 05/14/2024] [Accepted: 05/21/2024] [Indexed: 06/21/2024] Open
Abstract
This study assesses the efficacy of ChatGPT-4, an advanced artificial intelligence (AI) language model, in delivering precise and comprehensive answers to inquiries regarding managing polycystic ovary syndrome (PCOS)-related infertility. The research team, comprising experienced gynecologists, formulated 460 structured queries encompassing a wide range of common and intricate PCOS scenarios. The queries were true/false (170), open-ended (165), and multiple-choice (125), and were further classified as 'easy', 'moderate', or 'hard'. For true/false questions, ChatGPT-4 achieved a flawless accuracy rate of 100% initially and upon reassessment after 30 days. In the open-ended category, there was a noteworthy enhancement in accuracy, with scores increasing from 5.53 ± 0.89 initially to 5.88 ± 0.43 at the 30-day mark (p < 0.001). Completeness scores for open-ended queries also experienced a significant improvement, rising from 2.35 ± 0.58 to 2.92 ± 0.29 (p < 0.001). In the multiple-choice category, the accuracy score exhibited a minor, nonsignificant decline from 5.96 ± 0.44 to 5.92 ± 0.63 after 30 days (p > 0.05). Completeness scores for multiple-choice questions remained consistent, with initial and 30-day means of 2.98 ± 0.18 and 2.97 ± 0.25, respectively (p > 0.05). ChatGPT-4 demonstrated exceptional performance in true/false queries and significantly improved handling of open-ended questions during the 30 days. These findings emphasize the potential of AI, particularly ChatGPT-4, in enhancing decision-making support for healthcare professionals managing PCOS-related infertility.
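The abstract above reports paired baseline versus 30-day comparisons with p-values but does not state which statistical test was used; as a hedged sketch only, the Python snippet below shows one plausible way to run such a paired comparison with a Wilcoxon signed-rank test on simulated per-question scores.

```python
# Illustrative sketch (not the authors' code): compare paired per-question
# scores at baseline and at 30 days with a Wilcoxon signed-rank test.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical accuracy scores (1-6) for 165 open-ended questions.
baseline = rng.integers(4, 7, size=165).astype(float)
day_30 = np.clip(baseline + rng.choice([0.0, 0.0, 1.0], size=165), 1, 6)

stat, p_value = wilcoxon(baseline, day_30)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4g}")
```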
Collapse
Affiliation(s)
- Belgin Devranoglu
- Department of Obstetrics and Gynecology, Zeynep Kamil Maternity/Children, Education and Training Hospital, Istanbul 34480, Turkey
| | - Tugba Gurbuz
- Department of Gynecology and Obstetrics Clinic, Medistate Hospital, Istanbul 34820, Turkey;
| | - Oya Gokmen
- Department of Gynecology, Obstetrics and In Vitro Fertilization Clinic, Medistate Hospital, Istanbul 34820, Turkey;
| |
Collapse
|
25
|
Cetera GE, Tozzi AE, Chiappa V, Castiglioni I, Merli CEM, Vercellini P. Artificial Intelligence in the Management of Women with Endometriosis and Adenomyosis: Can Machines Ever Be Worse Than Humans? J Clin Med 2024; 13:2950. [PMID: 38792490 PMCID: PMC11121846 DOI: 10.3390/jcm13102950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Revised: 04/08/2024] [Accepted: 05/06/2024] [Indexed: 05/26/2024] Open
Abstract
Artificial intelligence (AI) is experiencing advances and integration in all medical specializations, and this creates excitement but also concerns. This narrative review aims to critically assess the state of the art of AI in the field of endometriosis and adenomyosis. By enabling automation, AI may speed up some routine tasks, decreasing gynecologists' risk of burnout, as well as enabling them to spend more time interacting with their patients, increasing their efficiency and patients' perception of being taken care of. Surgery may also benefit from AI, especially through its integration with robotic surgery systems. This may improve the detection of anatomical structures and enhance surgical outcomes by combining intra-operative findings with pre-operative imaging. Not only that, but AI promises to improve the quality of care by facilitating clinical research. Through the introduction of decision-support tools, it can enhance diagnostic assessment; it can also predict treatment effectiveness and side effects, as well as reproductive prognosis and cancer risk. However, concerns exist: good-quality data for tool development and compliance with data-sharing guidelines are crucial. Also, professionals are worried that AI may render certain specialists obsolete. This said, AI is more likely to become a well-liked team member than a usurper.
Collapse
Affiliation(s)
- Giulia Emily Cetera
- Gynecology Unit, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy; (G.E.C.); (C.E.M.M.)
- Academic Center for Research on Adenomyosis and Endometriosis, Department of Clinical Sciences and Community Health, Università degli Studi di Milano, 20122 Milan, Italy
| | - Alberto Eugenio Tozzi
- Predictive and Preventive Medicine Research Unit, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy;
| | - Valentina Chiappa
- Gynaecologic Oncology, Fondazione IRCCS Istituto Nazionale dei Tumori, 20133 Milan, Italy;
| | | | - Camilla Erminia Maria Merli
- Gynecology Unit, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy; (G.E.C.); (C.E.M.M.)
| | - Paolo Vercellini
- Gynecology Unit, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy; (G.E.C.); (C.E.M.M.)
- Academic Center for Research on Adenomyosis and Endometriosis, Department of Clinical Sciences and Community Health, Università degli Studi di Milano, 20122 Milan, Italy
| |
Collapse
|
26
|
Zhu L, Mou W, Hong C, Yang T, Lai Y, Qi C, Lin A, Zhang J, Luo P. The Evaluation of Generative AI Should Include Repetition to Assess Stability. JMIR Mhealth Uhealth 2024; 12:e57978. [PMID: 38688841 PMCID: PMC11106698 DOI: 10.2196/57978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2024] [Accepted: 04/30/2024] [Indexed: 05/02/2024] Open
Abstract
The increasing interest in the potential applications of generative artificial intelligence (AI) models like ChatGPT in health care has prompted numerous studies to explore its performance in various medical contexts. However, evaluating ChatGPT poses unique challenges due to the inherent randomness in its responses. Unlike traditional AI models, ChatGPT generates different responses for the same input, making it imperative to assess its stability through repetition. This commentary highlights the importance of including repetition in the evaluation of ChatGPT to ensure the reliability of conclusions drawn from its performance. Just as biological experiments often require multiple repetitions for validity, assessing generative AI models like ChatGPT demands a comparable approach. Failure to acknowledge the impact of repetition can lead to biased conclusions and undermine the credibility of research findings. We urge researchers to incorporate appropriate repetition in their studies from the outset and transparently report their methods to enhance the robustness and reproducibility of findings in this rapidly evolving field.
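To make the repetition argument concrete, a minimal Python sketch follows showing how one might pose the same prompt to a generative model several times and summarize how consistent its answers are; ask_model is a hypothetical placeholder (here simulated with random choices), not a real API call.

```python
# Sketch of a repetition-based stability check. `ask_model` is a hypothetical
# stand-in for a real generative-AI call; here it simply simulates randomness.
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for a model query; returns a random answer."""
    return random.choice(["A", "A", "A", "B"])  # simulated non-determinism

def stability(prompt: str, n_repeats: int = 10):
    """Query the model n_repeats times and report how often each answer appears."""
    answers = [ask_model(prompt) for _ in range(n_repeats)]
    counts = Counter(answers)
    modal_answer, modal_count = counts.most_common(1)[0]
    return counts, modal_count / n_repeats

counts, agreement = stability("Is drug X first-line therapy for condition Y?")
print(counts, f"modal-answer agreement: {agreement:.0%}")
```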
Collapse
Affiliation(s)
- Lingxuan Zhu
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Weiming Mou
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Chenglin Hong
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Tao Yang
- Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yancheng Lai
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Chang Qi
- Institute of Logic and Computation, TU Wien, Austria
| | - Anqi Lin
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Jian Zhang
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Peng Luo
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| |
Collapse
|
27
|
Safrai M, Azaria A. Does small talk with a medical provider affect ChatGPT's medical counsel? Performance of ChatGPT on USMLE with and without distractions. PLoS One 2024; 19:e0302217. [PMID: 38687696 PMCID: PMC11060598 DOI: 10.1371/journal.pone.0302217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Accepted: 03/28/2024] [Indexed: 05/02/2024] Open
Abstract
Efforts are being made to improve the time effectiveness of healthcare providers. Artificial intelligence tools can help transcribe and summarize physician-patient encounters and produce medical notes and medical recommendations. However, in addition to medical information, discussion between healthcare providers and patients includes small talk and other information irrelevant to medical concerns. As Large Language Models (LLMs) are predictive models building their response based on the words in the prompts, there is a risk that small talk and irrelevant information may alter the response and the suggestion given. Therefore, this study aims to investigate the impact of medical data mixed with small talk on the accuracy of medical advice provided by ChatGPT. USMLE step 3 questions were used as a model for relevant medical data. We used both multiple-choice and open-ended questions. First, we gathered small talk sentences from human participants using the Mechanical Turk platform. Second, both sets of USMLE questions were arranged in a pattern where each sentence from the original questions was followed by a small talk sentence. ChatGPT 3.5 and 4 were asked to answer both sets of questions with and without the small talk sentences. Finally, a board-certified physician analyzed the answers by ChatGPT and compared them to the formal correct answer. The analysis demonstrates that the ability of ChatGPT-3.5 to answer correctly was impaired when small talk was added to medical data (66.8% vs. 56.6%; p = 0.025), specifically for multiple-choice questions (72.1% vs. 68.9%; p = 0.67) and for open questions (61.5% vs. 44.3%; p = 0.01). In contrast, small talk phrases did not impair ChatGPT-4's ability on either type of question (83.6% and 66.2%, respectively). According to these results, ChatGPT-4 seems more accurate than the earlier 3.5 version, and it appears that small talk does not impair its capability to provide medical recommendations. Our results are an important first step in understanding the potential and limitations of utilizing ChatGPT and other LLMs for physician-patient interactions, which include casual conversations.
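The abstract above compares proportions of correct answers with and without small talk; as an illustrative sketch (the authors' exact statistical procedure is not specified here), the snippet below tests such a difference in proportions with Fisher's exact test on hypothetical counts.

```python
# Illustrative sketch: test whether the proportion of correct answers differs
# with vs. without small talk, using Fisher's exact test on hypothetical counts.
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows = condition, columns = (correct, incorrect).
table = [[132, 66],   # questions presented as medical data only
         [112, 86]]   # questions interleaved with small-talk sentences

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```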
Collapse
Affiliation(s)
- Myriam Safrai
- Department of Obstetrics and Gynecology, Chaim Sheba Medical Center (Tel Hashomer), Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Department of Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
| | - Amos Azaria
- School of Computer Science, Ariel University, Ari’el, Israel
| |
Collapse
|
28
|
Graf EM, McKinney JA, Dye AB, Lin L, Sanchez-Ramos L. Exploring the Limits of Artificial Intelligence for Referencing Scientific Articles. Am J Perinatol 2024. [PMID: 38653452 DOI: 10.1055/s-0044-1786033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/25/2024]
Abstract
OBJECTIVE To evaluate the reliability of three artificial intelligence (AI) chatbots (ChatGPT, Google Bard, and Chatsonic) in generating accurate references from existing obstetric literature. STUDY DESIGN Between mid-March and late April 2023, ChatGPT, Google Bard, and Chatsonic were prompted to provide references for specific obstetrical randomized controlled trials (RCTs) published in 2020. RCTs were considered for inclusion if they were mentioned in a previous article that primarily evaluated RCTs published by the top medical and obstetrics and gynecology journals with the highest impact factors in 2020 as well as RCTs published in a new journal focused on publishing obstetric RCTs. The selection of the three AI models was based on their popularity, performance in natural language processing, and public availability. Data collection involved prompting the AI chatbots to provide references according to a standardized protocol. The primary evaluation metric was the accuracy of each AI model in correctly citing references, including authors, publication title, journal name, and digital object identifier (DOI). Statistical analysis was performed using a permutation test to compare the performance of the AI models. RESULTS Among the 44 RCTs analyzed, Google Bard demonstrated the highest accuracy, correctly citing 13.6% of the requested RCTs, whereas ChatGPT and Chatsonic exhibited lower accuracy rates of 2.4 and 0%, respectively. Google Bard often substantially outperformed Chatsonic and ChatGPT in correctly citing the studied reference components. The majority of references from all AI models studied were noted to provide DOIs for unrelated studies or DOIs that do not exist. CONCLUSION To ensure the reliability of scientific information being disseminated, authors must exercise caution when utilizing AI for scientific writing and literature search. However, despite their limitations, collaborative partnerships between AI systems and researchers have the potential to drive synergistic advancements, leading to improved patient care and outcomes. KEY POINTS · AI chatbots often cite scientific articles incorrectly. · AI chatbots can create false references. · Responsible AI use in research is vital.
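The study above reports using a permutation test to compare the models; the following generic Python sketch, with hypothetical 0/1 citation outcomes chosen only to roughly echo the reported percentages, shows how such a test can be run.

```python
# Generic permutation-test sketch (not the authors' code): compare the citation
# accuracy of two chatbots over 44 RCTs using hypothetical 0/1 outcomes.
import numpy as np

rng = np.random.default_rng(42)

bard_correct = np.array([1] * 6 + [0] * 38)      # ~13.6% correct (hypothetical)
chatgpt_correct = np.array([1] * 1 + [0] * 43)   # ~2.3% correct (hypothetical)

observed_diff = bard_correct.mean() - chatgpt_correct.mean()
pooled = np.concatenate([bard_correct, chatgpt_correct])

n_perm, count_extreme = 10_000, 0
for _ in range(n_perm):
    rng.shuffle(pooled)  # relabel outcomes at random under the null hypothesis
    diff = pooled[:44].mean() - pooled[44:].mean()
    if abs(diff) >= abs(observed_diff):
        count_extreme += 1

print(f"observed difference = {observed_diff:.3f}, "
      f"permutation p = {count_extreme / n_perm:.3f}")
```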
Collapse
Affiliation(s)
- Emily M Graf
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
| | - Jordan A McKinney
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
| | - Alexander B Dye
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
| | - Lifeng Lin
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona
| | - Luis Sanchez-Ramos
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
| |
Collapse
|
29
|
Braun EM, Juhasz-Böss I, Solomayer EF, Truhn D, Keller C, Heinrich V, Braun BJ. Will I soon be out of my job? Quality and guideline conformity of ChatGPT therapy suggestions to patient inquiries with gynecologic symptoms in a palliative setting. Arch Gynecol Obstet 2024; 309:1543-1549. [PMID: 37975899 DOI: 10.1007/s00404-023-07272-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2023] [Accepted: 10/15/2023] [Indexed: 11/19/2023]
Abstract
PURPOSE The market and application possibilities for artificial intelligence are growing rapidly and are increasingly finding their way into gynecology. While the medical side is well represented in the current literature, the patient's perspective still lags behind. Therefore, the aim of this study was to have experts evaluate ChatGPT's recommendations in response to patient inquiries about possible therapy for leading gynecologic symptoms in a palliative situation. METHODS Case vignettes were constructed for 10 common concomitant symptoms of gynecologic oncology tumors in a palliative setting, and patient queries regarding therapy for these symptoms were generated as prompts for ChatGPT. Five experts in palliative care and gynecologic oncology evaluated the responses with respect to guideline adherence and applicability and identified advantages and disadvantages. RESULTS The overall rating of ChatGPT responses averaged 4.1 (5 = strongly agree; 1 = strongly disagree). The experts rated the average guideline conformity of the therapy recommendations at 4.0. ChatGPT sometimes omits relevant therapies and does not provide an individual assessment of the suggested therapies, but it does indicate that a physician consultation is additionally necessary. CONCLUSIONS Language models such as ChatGPT, in their freely available and thus broadly accessible version, can provide our patients with valid and largely guideline-compliant therapy recommendations. For a complete therapy recommendation, however, a medical expert's opinion remains indispensable for evaluating the therapies, adjusting them to the individual, and filtering out possible wrong recommendations.
Collapse
Affiliation(s)
- Eva-Marie Braun
- Center for Integrative Oncology, Die Filderklinik, Im Haberschlai 7, 70794, Filderstadt-Bonlanden, Germany.
| | - Ingolf Juhasz-Böss
- Department of Gynecology, University Medical Center Freiburg, Hugstetter Straße 55, 79106, Freiburg, Germany
| | - Erich-Franz Solomayer
- Department of Gynecology, Obstetrics and Reproductive Medicine, Saarland University Hospital, Kirrberger Straße, Building 9, 66421, Homburg, Germany
| | - Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Pauwelsstraße 30, 52074, Aachen, Germany
| | - Christiane Keller
- Center for Palliative Medicine and Pediatric Pain Therapy, Saarland University Hospital, Kirrberger Straße, Building 69, 66421, Homburg, Germany
| | - Vanessa Heinrich
- Department of Radiation Oncology, University Hospital Tübingen, Crona Kliniken, Hoppe-Seyler-Str. 3, 72076, Tübingen, Germany
| | - Benedikt Johannes Braun
- Department of Trauma and Reconstructive Surgery at the Eberhard Karls University Tübingen, BG Unfallklinik Tübingen, Schnarrenbergstrasse 95, 72076, Tübingen, Germany
| |
Collapse
|
30
|
Deng L, Wang T, Yangzhang, Zhai Z, Tao W, Li J, Zhao Y, Luo S, Xu J. Evaluation of large language models in breast cancer clinical scenarios: a comparative analysis based on ChatGPT-3.5, ChatGPT-4.0, and Claude2. Int J Surg 2024; 110:1941-1950. [PMID: 38668655 PMCID: PMC11019981 DOI: 10.1097/js9.0000000000001066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Accepted: 12/23/2023] [Indexed: 04/29/2024]
Abstract
BACKGROUND Large language models (LLMs) have garnered significant attention in the AI domain owing to their exemplary context recognition and response capabilities. However, the potential of LLMs in specific clinical scenarios, particularly in breast cancer diagnosis, treatment, and care, has not been fully explored. This study aimed to compare the performances of three major LLMs in the clinical context of breast cancer. METHODS In this study, clinical scenarios designed specifically for breast cancer were segmented into five pivotal domains (nine cases): assessment and diagnosis, treatment decision-making, postoperative care, psychosocial support, and prognosis and rehabilitation. The LLMs were used to generate feedback for various queries related to these domains. For each scenario, a panel of five breast cancer specialists, each with over a decade of experience, evaluated the feedback from the LLMs, assessing it in terms of quality, relevance, and applicability. RESULTS There was a moderate level of agreement among the raters (Fleiss' kappa=0.345, P<0.05). Comparing the performance of different models regarding response length, GPT-4.0 and GPT-3.5 provided relatively longer feedback than Claude2. Furthermore, across the nine case analyses, GPT-4.0 significantly outperformed the other two models in average quality, relevance, and applicability. Within the five clinical areas, GPT-4.0 markedly surpassed GPT-3.5 in the quality of the other four areas and scored higher than Claude2 in tasks related to psychosocial support and treatment decision-making. CONCLUSION This study revealed that in the realm of clinical applications for breast cancer, GPT-4.0 showcases superiority not only in quality and relevance but also in applicability, especially when compared to GPT-3.5. Relative to Claude2, GPT-4.0 holds advantages in specific domains. With the expanding use of LLMs in the clinical field, ongoing optimization and rigorous accuracy assessments are paramount.
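Inter-rater agreement in the study above is summarized with Fleiss' kappa; as a minimal sketch, assuming statsmodels as the tooling (the authors' software is not stated), the snippet below computes Fleiss' kappa for hypothetical ratings from five raters.

```python
# Sketch of computing Fleiss' kappa for agreement among 5 raters, using
# statsmodels (an assumed library choice, not necessarily the authors' tool).
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(1)

# Hypothetical ratings: 9 cases x 5 raters, each rating on a 3-5 quality range.
ratings = rng.integers(3, 6, size=(9, 5))

table, _ = aggregate_raters(ratings)          # cases x categories count matrix
kappa = fleiss_kappa(table, method='fleiss')
print(f"Fleiss' kappa = {kappa:.3f}")
```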
Collapse
Affiliation(s)
- Linfang Deng
- Department of Nursing, Jinzhou Medical University, Jinzhou
| | | | - Yangzhang
- Department of Breast Surgery, Xingtai People’s Hospital of Hebei Medical University, Xingtai, Hebei, People’s Republic of China
| | - Zhenhua Zhai
- Department of General Surgery, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou
| | - Wei Tao
- Department of Breast Surgery
| | | | - Yi Zhao
- Department of Breast Surgery
| | - Shaoting Luo
- Department of Pediatric Orthopedics, Shengjing Hospital of China Medical University, Shenyang
| | - Jinjiang Xu
- Department of Health Management Center, The First Hospital of Jinzhou Medical University, Jinzhou, Liaoning
| |
Collapse
|
31
|
Patel JM, Hermann CE, Growdon WB, Aviki E, Stasenko M. ChatGPT accurately performs genetic counseling for gynecologic cancers. Gynecol Oncol 2024; 183:115-119. [PMID: 38676973 DOI: 10.1016/j.ygyno.2024.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Revised: 04/03/2024] [Accepted: 04/08/2024] [Indexed: 04/29/2024]
Abstract
OBJECTIVE Artificial Intelligence (AI) systems such as ChatGPT can take medical examinations and counsel patients regarding medical diagnosis. We aim to quantify the accuracy of ChatGPT in answering commonly asked questions pertaining to genetic testing and counseling for gynecologic cancers. METHODS Forty questions were formulated in conjunction with gynecologic oncologists and adapted from professional society guidelines; ChatGPT version 3.5, the version readily available to the public, was queried. The two categories of questions were genetic counseling guidelines and questions pertaining to specific genetic disorders. The answers were scored by two attending Gynecologic Oncologists according to the following scale: 1) correct and comprehensive, 2) correct but not comprehensive, 3) some correct, some incorrect, and 4) completely incorrect. Scoring discrepancies were resolved by an additional third reviewer. The proportion of responses earning each score was calculated overall and within each question category. RESULTS ChatGPT provided correct and comprehensive answers to 33/40 (82.5%) questions, correct but not comprehensive answers to 6/40 (15%) questions, partially incorrect answers to 1/40 (2.5%) questions, and completely incorrect answers to 0/40 (0%) questions. The genetic counseling category had the highest proportion of answers that were both correct and comprehensive, with ChatGPT answering all 20/20 questions accurately and comprehensively. ChatGPT performed similarly in the specific genetic disorders category, with 88.2% (15/17) and 66.6% (2/3) correct and comprehensive answers to questions on hereditary breast and ovarian cancer and on Lynch syndrome, respectively. CONCLUSION ChatGPT accurately answers questions about genetic syndromes, genetic testing, and counseling in the majority of the studied questions. These data suggest this powerful tool can be utilized as a patient resource for genetic counseling questions, though more data input from gynecologic oncologists would be needed to educate patients on genetic syndromes.
Collapse
Affiliation(s)
- Jharna M Patel
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America.
| | - Catherine E Hermann
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| | - Whitfield B Growdon
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| | - Emeline Aviki
- New York University Langone Health, Long Island, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Mineola, NY, United States of America
| | - Marina Stasenko
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| |
Collapse
|
32
|
Teixeira-Marques F, Medeiros N, Nazaré F, Alves S, Lima N, Ribeiro L, Gama R, Oliveira P. Exploring the role of ChatGPT in clinical decision-making in otorhinolaryngology: a ChatGPT designed study. Eur Arch Otorhinolaryngol 2024; 281:2023-2030. [PMID: 38345613 DOI: 10.1007/s00405-024-08498-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Accepted: 01/23/2024] [Indexed: 03/16/2024]
Abstract
PURPOSE Since the beginning of 2023, ChatGPT has emerged as a hot topic in healthcare research. The potential to be a valuable tool in clinical practice is compelling, particularly in improving clinical decision support by helping physicians to make clinical decisions based on the best medical knowledge available. We aim to investigate ChatGPT's ability to identify, diagnose and manage patients with otorhinolaryngology-related symptoms. METHODS A prospective, cross-sectional study was designed based on an idea suggested by ChatGPT to assess the level of agreement between ChatGPT and five otorhinolaryngologists (ENTs) in 20 reality-inspired clinical cases. The clinical cases were presented to the chatbot on two different occasions (ChatGPT-1 and ChatGPT-2) to assess its temporal stability. RESULTS The mean score of ChatGPT-1 was 4.4 (SD 1.2; min 1, max 5) and of ChatGPT-2 was 4.15 (SD 1.3; min 1, max 5), while the ENTs' mean score was 4.91 (SD 0.3; min 3, max 5). The Mann-Whitney U test revealed a statistically significant difference (p < 0.001) between each of the ChatGPT scores and the ENTs' score. ChatGPT-1 and ChatGPT-2 gave different answers on five occasions. CONCLUSIONS Artificial intelligence will be an important instrument in clinical decision-making in the near future and ChatGPT is the most promising chatbot so far. Although it needs further development to be used safely, there is room for improvement and potential to aid otorhinolaryngology residents and specialists in making the most correct decision for the patient.
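The comparison above relies on a Mann-Whitney U test; the short sketch below, using hypothetical per-case scores rather than the study's data, shows how such a test is typically computed with SciPy.

```python
# Illustrative sketch: compare ChatGPT's per-case ratings with the ENTs' ratings
# using a Mann-Whitney U test; the scores below are hypothetical examples.
from scipy.stats import mannwhitneyu

chatgpt_scores = [5, 5, 4, 5, 5, 3, 5, 5, 4, 5, 5, 2, 5, 5, 4, 5, 5, 1, 5, 5]
ent_scores     = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 3]

stat, p_value = mannwhitneyu(chatgpt_scores, ent_scores, alternative='two-sided')
print(f"U = {stat:.1f}, p = {p_value:.4f}")
```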
Collapse
Affiliation(s)
- Francisco Teixeira-Marques
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal.
| | - Nuno Medeiros
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Francisco Nazaré
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Sandra Alves
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Nuno Lima
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Leandro Ribeiro
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Rita Gama
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Pedro Oliveira
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| |
Collapse
|
33
|
Wiwanitmkit S, Wiwanitkit V. Potential for ChatGPT in obstetrics and gynecology: a comment. Am J Obstet Gynecol 2024; 230:e51. [PMID: 38006981 DOI: 10.1016/j.ajog.2023.11.1238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Revised: 11/20/2023] [Accepted: 11/20/2023] [Indexed: 11/27/2023]
Affiliation(s)
| | - Viroj Wiwanitkit
- Saveetha Medical College, Chennai, India; Chandigarh University, 140413 Punjab, India.
| |
Collapse
|
34
|
Grünebaum A, Pollet S, Chervenak F. Potential for ChatGPT in obstetrics and gynecology: a response. Am J Obstet Gynecol 2024; 230:e52. [PMID: 38008150 DOI: 10.1016/j.ajog.2023.11.1239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Accepted: 11/03/2023] [Indexed: 11/28/2023]
Affiliation(s)
- Amos Grünebaum
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, NY.
| | - Susan Pollet
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, NY
| | - Frank Chervenak
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, NY
| |
Collapse
|
35
|
Ocakoglu SR, Coskun B. The Emerging Role of AI in Patient Education: A Comparative Analysis of LLM Accuracy for Pelvic Organ Prolapse. Med Princ Pract 2024; 33:000538538. [PMID: 38527444 PMCID: PMC11324208 DOI: 10.1159/000538538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Accepted: 03/21/2024] [Indexed: 03/27/2024] Open
Abstract
OBJECTIVE This study aimed to evaluate the accuracy, completeness, precision, and readability of outputs generated by three Large Language Models (LLMs): GPT by OpenAI, BARD by Google, and Bing by Microsoft, in comparison to patient education material on Pelvic Organ Prolapse (POP) provided by the Royal College of Obstetricians and Gynecologists (RCOG). METHODS A total of 15 questions were retrieved from the RCOG website and input into the three LLMs. Two independent reviewers evaluated the outputs for accuracy, completeness, and precision. Readability was assessed using the Simplified Measure of Gobbledygook (SMOG) score and the Flesch-Kincaid Grade Level (FKGL) score. RESULTS Significant differences were observed in completeness and precision metrics. ChatGPT ranked highest in completeness (66.7%), while Bing led in precision (100%). No significant differences were observed in accuracy across all models. In terms of readability, ChatGPT exhibited higher difficulty than BARD, Bing, and the original RCOG answers. CONCLUSION While all models displayed a variable degree of correctness in answering the RCOG questions on patient information for POP, ChatGPT excelled in completeness, significantly surpassing BARD and Bing, although its answers were the hardest to read. Bing led in precision, providing the most relevant and concise answers. The findings highlight the potential of LLMs in health information dissemination and the need for careful interpretation of their outputs.
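Readability in the study above is quantified with the SMOG and Flesch-Kincaid Grade Level metrics; as a sketch, assuming the textstat package as one common implementation (the authors' tooling is not specified), the snippet below scores a sample answer. SMOG is formally defined for texts of 30 or more sentences, so values on short passages are only indicative.

```python
# Sketch of scoring readability with FKGL and SMOG via the textstat package
# (an assumed library choice; the sample text is invented for illustration).
import textstat

answer = (
    "Pelvic organ prolapse happens when the muscles and tissues supporting the "
    "pelvic organs become weak, allowing organs such as the bladder or uterus "
    "to drop from their normal position and press against the vaginal walls."
)

print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(answer))
print("SMOG index:", textstat.smog_index(answer))
```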
Collapse
Affiliation(s)
| | - Burhan Coskun
- Department of Urology, Bursa Uludag University, Bursa, Turkey
| |
Collapse
|
36
|
Horgan R, Martins JG, Saade G, Abuhamad A, Kawakita T. ChatGPT in maternal-fetal medicine practice: a primer for clinicians. Am J Obstet Gynecol MFM 2024; 6:101302. [PMID: 38281582 DOI: 10.1016/j.ajogmf.2024.101302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 01/02/2024] [Accepted: 01/21/2024] [Indexed: 01/30/2024]
Abstract
ChatGPT (Generative Pre-trained Transformer), a language model that was developed by OpenAI and launched in November 2022, generates human-like responses to prompts using deep-learning technology. The integration of large language processing models into healthcare has the potential to improve the accessibility of medical information for both patients and health professionals alike. In this commentary, we demonstrated the ability of ChatGPT to produce patient information sheets. Four board-certified, maternal-fetal medicine attending physicians rated the accuracy and humanness of the information according to 2 predefined scales of accuracy and completeness. The median score for accuracy of information was rated 4.8 on a 6-point scale and the median score for completeness of information was 2.2 on a 3-point scale for the 5 patient information leaflets generated by ChatGPT. Concerns raised included the omission of clinically important information for patient counseling in some patient information leaflets and the inability to verify the source of information because ChatGPT does not provide references. ChatGPT is a powerful tool that has the potential to enhance patient care, but such a tool requires extensive validation and is perhaps best considered as an adjunct to clinical practice rather than as a tool to be used freely by the public for healthcare information.
Collapse
Affiliation(s)
- Rebecca Horgan
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA..
| | - Juliana G Martins
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
| | - George Saade
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
| | - Alfred Abuhamad
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
| | - Tetsuya Kawakita
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
| |
Collapse
|
37
|
Tailor PD, Xu TT, Fortes BH, Iezzi R, Olsen TW, Starr MR, Bakri SJ, Scruggs BA, Barkmeier AJ, Patel SV, Baratz KH, Bernhisel AA, Wagner LH, Tooley AA, Roddy GW, Sit AJ, Wu KY, Bothun ED, Mansukhani SA, Mohney BG, Chen JJ, Brodsky MC, Tajfirouz DA, Chodnicki KD, Smith WM, Dalvin LA. Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model. MAYO CLINIC PROCEEDINGS. DIGITAL HEALTH 2024; 2:119-128. [PMID: 38577703 PMCID: PMC10994056 DOI: 10.1016/j.mcpdig.2024.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 04/06/2024]
Abstract
Objective To determine the appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model to ophthalmology questions. Patients and Methods Cross-sectional qualitative study from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. The responses were graded by appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. The first grading context was if the information was presented on a patient information site. The second was an LLM-generated draft response to patient queries sent by the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. The main outcome measure was the percentage of appropriate responses per subspecialty. Results For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Variable rates of average appropriateness were observed across ophthalmic subspecialties for patient information site information ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via EMR, the LLM provided an overall average of 74% appropriate responses and varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but nonsignificant variations, with disease and condition often rated highest (72% and 69%) for appropriateness and surgery-related (55% and 51%) lowest, in both contexts. Conclusion This LLM provided mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR-related responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.
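As a small illustration of tabulating appropriateness by subspecialty and grading context, the pandas sketch below pivots hypothetical proportions into a summary table; the numbers and column names are made up for demonstration and are not the study's data.

```python
# Sketch: summarize appropriateness proportions per subspecialty and grading
# context with a pandas pivot table (hypothetical values for illustration only).
import pandas as pd

grades = pd.DataFrame({
    "subspecialty": ["glaucoma", "glaucoma", "cornea", "cornea", "retina", "retina"],
    "context": ["patient_site", "emr_draft"] * 3,
    "appropriate": [0.72, 0.77, 0.56, 0.54, 0.86, 0.88],  # hypothetical proportions
})

summary = grades.pivot_table(index="subspecialty", columns="context",
                             values="appropriate")
print(summary)
```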
Collapse
Affiliation(s)
- Prashant D Tailor
- Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.)
| | - Timothy T Xu
- Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.)
| | - Blake H Fortes
- Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.)
| | - Raymond Iezzi
- Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.)
| | - Timothy W Olsen
| | - Matthew R Starr
| | - Sophie J Bakri
| | - Brittni A Scruggs
| | - Andrew J Barkmeier
| | - Sanjay V Patel
| | - Keith H Baratz
| | - Ashlie A Bernhisel
| | - Lilly H Wagner
| | - Andrea A Tooley
| | - Gavin W Roddy
| | - Arthur J Sit
| | - Kristi Y Wu
| | - Erick D Bothun
| | - Sasha A Mansukhani
| | - Brian G Mohney
| | - John J Chen
| | - Michael C Brodsky
| | - Deena A Tajfirouz
| | - Kevin D Chodnicki
| | - Wendy M Smith
| | - Lauren A Dalvin
| |
Collapse
|
38
|
Ray PP. Bridging the gap: integrating ChatGPT into obstetrics and gynecology research-a call to action. Arch Gynecol Obstet 2024; 309:1111-1113. [PMID: 37466691 DOI: 10.1007/s00404-023-07129-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2023] [Accepted: 06/27/2023] [Indexed: 07/20/2023]
|
39
|
Lee Y, Kim SY. Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article. Obstet Gynecol Sci 2024; 67:153-159. [PMID: 38247132 PMCID: PMC10948210 DOI: 10.5468/ogs.23231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Revised: 11/08/2023] [Accepted: 11/29/2023] [Indexed: 01/23/2024] Open
Abstract
The use of chatbot technology, particularly chat generative pre-trained transformer (ChatGPT) with an impressive 175 billion parameters, has garnered significant attention across various domains, including Obstetrics and Gynecology (OBGYN). This comprehensive review delves into the transformative potential of chatbots, with a special focus on ChatGPT as a leading artificial intelligence (AI) technology. ChatGPT harnesses the power of deep learning algorithms to generate responses that closely mimic human language, opening up myriad applications in medicine, research, and education. In the field of medicine, ChatGPT plays a pivotal role in diagnosis, treatment, and personalized patient education. Notably, the technology has demonstrated remarkable capabilities, surpassing human performance in OBGYN examinations and delivering highly accurate diagnoses. However, challenges remain, including the need to verify the accuracy of the information and to address ethical considerations and limitations. In the wider scope of chatbot technology, AI systems play a vital role in healthcare processes, including documentation, diagnosis, research, and education. Although these systems are promising, their limitations and occasional inaccuracies require validation by healthcare professionals. This review also examined global chatbot adoption in healthcare, emphasizing the need for user awareness to ensure patient safety. Chatbot technology holds great promise in OBGYN and medicine, offering innovative solutions while necessitating responsible integration to ensure patient care and safety.
Collapse
Affiliation(s)
- YooKyung Lee
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
| | - So Yun Kim
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
| |
Collapse
|
40
|
Peng S, Wang D, Liang Y, Xiao W, Zhang Y, Liu L. AI-ChatGPT/GPT-4: An Booster for the Development of Physical Medicine and Rehabilitation in the New Era! Ann Biomed Eng 2024; 52:462-466. [PMID: 37500980 PMCID: PMC10859338 DOI: 10.1007/s10439-023-03314-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Accepted: 07/06/2023] [Indexed: 07/29/2023]
Abstract
Artificial intelligence (AI) has been driving the continuous development of the Physical Medicine and Rehabilitation (PM&R) field. The latest release of ChatGPT/GPT-4 has shown us that AI can potentially transform the healthcare industry. In this study, we propose various ways in which ChatGPT/GPT-4 can display its talents in the field of PM&R in the future. ChatGPT/GPT-4 is an essential tool for Physiatrists in the new era.
Collapse
Affiliation(s)
- Shengxin Peng
- School of Rehabilitation Medicine of Binzhou Medical University, Yantai, China
| | - Deqiang Wang
- School of Rehabilitation Medicine of Binzhou Medical University, Yantai, China
| | | | | | | | - Lei Liu
- Department of Painology, The First Affiliated Hospital of Shandong First Medical University (Shandong Provincial Qianfoshan Hospital), Jinan, 250014, China.
| |
Collapse
|
41
|
Bukar UA, Sayeed MS, Razak SFA, Yogarayan S, Amodu OA. An integrative decision-making framework to guide policies on regulating ChatGPT usage. PeerJ Comput Sci 2024; 10:e1845. [PMID: 38440047 PMCID: PMC10911759 DOI: 10.7717/peerj-cs.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Accepted: 01/09/2024] [Indexed: 03/06/2024]
Abstract
Generative artificial intelligence has created a moment in history where human beings have begun to interact closely with artificial intelligence (AI) tools, putting policymakers in a position to restrict or legislate such tools. One particular example is ChatGPT, the first and the world's most popular multipurpose generative AI tool. This study aims to put forward a policy-making framework for generative artificial intelligence based on the risk, reward, and resilience framework. A systematic search was conducted using carefully chosen keywords, excluding non-English content, conference articles, book chapters, and editorials. Published research was filtered based on its relevance to ChatGPT ethics, yielding a total of 41 articles. Key elements surrounding ChatGPT concerns and motivations were systematically deduced and classified under the risk, reward, and resilience categories to serve as ingredients for the proposed decision-making framework. The decision-making process and rules were developed as a primer to help policymakers navigate decision-making conundrums. The framework was then practically tailored to some of the concerns surrounding ChatGPT in the context of higher education. Regarding the interconnection between risk and reward, the findings show that providing students with access to ChatGPT presents an opportunity for increased efficiency in tasks such as text summarization and workload reduction; however, this exposes them to risks such as plagiarism and cheating. Similarly, pursuing certain opportunities, such as accessing vast amounts of information, can lead to rewards, but it also introduces risks like misinformation and copyright issues. Likewise, focusing on specific capabilities of ChatGPT, such as developing tools to detect plagiarism and misinformation, may enhance resilience in some areas (e.g., academic integrity) but may also create vulnerabilities in other domains, such as the digital divide, educational equity, and job losses. Furthermore, the findings indicate second-order effects of legislation regarding ChatGPT, with both positive and negative implications; one potential effect is a decrease in rewards due to the limitations imposed by legislation, which may hinder individuals from fully capitalizing on the opportunities ChatGPT provides. Hence, the risk, reward, and resilience framework offers a comprehensive and flexible decision-making model that allows policymakers and, in this use case, higher education institutions to navigate the complexities and trade-offs associated with ChatGPT, with theoretical and practical implications for the future.
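The framework above is qualitative, but the core move it describes, weighing each policy option across risk, reward, and resilience before deciding, can be illustrated with a small scoring sketch. Everything in the snippet (the option names, the 0-10 scale, the weights, and the decision rule) is a hypothetical illustration, not the authors' instrument.

```python
from dataclasses import dataclass

@dataclass
class PolicyOption:
    """A candidate ChatGPT policy, scored 0-10 on each dimension (hypothetical scale)."""
    name: str
    reward: float      # expected benefit (e.g., efficiency gains for students)
    risk: float        # expected harm (e.g., plagiarism, misinformation)
    resilience: float  # capacity to absorb or mitigate the harm (e.g., detection tools)

def net_score(option: PolicyOption, w_reward=1.0, w_risk=1.0, w_resilience=0.5) -> float:
    """Toy decision rule: rewards count for, risks against, resilience partly offsets risk."""
    return w_reward * option.reward - w_risk * option.risk + w_resilience * option.resilience

options = [
    PolicyOption("Unrestricted student access", reward=8, risk=7, resilience=4),
    PolicyOption("Access with plagiarism/misinformation safeguards", reward=7, risk=4, resilience=7),
    PolicyOption("Outright ban", reward=2, risk=1, resilience=5),
]

# Rank options by net score, highest first.
for opt in sorted(options, key=net_score, reverse=True):
    print(f"{opt.name}: net score {net_score(opt):.1f}")
```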
Collapse
Affiliation(s)
- Umar Ali Bukar
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Md Shohel Sayeed
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Siti Fatimah Abdul Razak
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Sumendra Yogarayan
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Oluwatosin Ahmed Amodu
- Information and Communication Engineering Department, Elizade University, Ilara-Mokin, Ondo State, Nigeria
| |
Collapse
|
42
|
Raman R, Kumar Nair V, Nedungadi P, Kumar Sahu A, Kowalski R, Ramanathan S, Achuthan K. Fake news research trends, linkages to generative artificial intelligence and sustainable development goals. Heliyon 2024; 10:e24727. [PMID: 38322879 PMCID: PMC10844021 DOI: 10.1016/j.heliyon.2024.e24727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Revised: 12/14/2023] [Accepted: 01/12/2024] [Indexed: 02/08/2024] Open
Abstract
In the digital age, where information is a cornerstone for decision-making, social media's loosely regulated environment has intensified the prevalence of fake news, with significant implications for both individuals and societies. This study employs a bibliometric analysis of a large corpus of 9678 publications spanning 2013-2022 to scrutinize the evolution of fake news research, identifying leading authors, institutions, and nations. Three thematic clusters emerge: disinformation in social media, COVID-19-induced infodemics, and techno-scientific advancements in auto-detection. This work introduces three novel contributions: 1) a pioneering mapping of fake news research to Sustainable Development Goals (SDGs), indicating its influence on areas like health (SDG 3), peace (SDG 16), and industry (SDG 9); 2) the utilization of prominence percentile metrics to discern critical and economically prioritized research areas, such as misinformation and object detection in deep learning; and 3) an evaluation of generative AI's role in the propagation and realism of fake news, raising pressing ethical concerns. These contributions collectively provide a comprehensive overview of the current state and future trajectories of fake news research, offering valuable insights for academia, policymakers, and industry.
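The paper's SDG mapping rests on bibliometric tooling not reproduced here, but the basic step of tagging publication records with SDGs through keyword matching, before counting trends, can be sketched as below. The keyword lists and records are invented for illustration and are far simpler than real SDG query sets.

```python
from collections import Counter

# Hypothetical keyword-to-SDG map; published SDG search queries are far richer.
SDG_KEYWORDS = {
    "SDG 3 (Health)": ["health", "covid", "infodemic", "vaccine"],
    "SDG 9 (Industry/Innovation)": ["deep learning", "detection", "algorithm"],
    "SDG 16 (Peace/Institutions)": ["election", "disinformation", "democracy"],
}

# Invented publication records standing in for the bibliometric corpus.
records = [
    {"title": "Deep learning detection of fake news during COVID-19"},
    {"title": "Disinformation campaigns and election integrity"},
]

counts = Counter()
for rec in records:
    text = rec["title"].lower()
    for sdg, keywords in SDG_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            counts[sdg] += 1  # one tag per SDG per record

print(counts.most_common())
```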
Collapse
Affiliation(s)
- Raghu Raman
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Vinith Kumar Nair
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Prema Nedungadi
- Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Aditya Kumar Sahu
- Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amaravati, Andhra Pradesh, 522503, India
| | - Robin Kowalski
- College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, 29634, USA
| | - Sasangan Ramanathan
- Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamilnadu, 641112, India
| | - Krishnashree Achuthan
- Center for Cybersecurity Systems and Networks, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| |
Collapse
|
43
|
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
Collapse
|
44
|
Sarma G, Kashyap H, Medhi PP. ChatGPT in Head and Neck Oncology-Opportunities and Challenges. Indian J Otolaryngol Head Neck Surg 2024; 76:1425-1429. [PMID: 38440617 PMCID: PMC10908741 DOI: 10.1007/s12070-023-04201-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Accepted: 08/28/2023] [Indexed: 03/06/2024] Open
Abstract
Head and neck oncology represents a complex and challenging field, encompassing the diagnosis, treatment, and management of various malignancies affecting the intricate anatomical structures of the head and neck region. With advancements in artificial intelligence (AI), chatbot applications have emerged as a promising tool to revolutionize the field of head and neck oncology. ChatGPT is a cutting-edge language model developed by OpenAI that can help the oncologist in the clinic with scheduling appointments, establishing a clinical diagnosis, planning treatment, and arranging follow-up. ChatGPT also plays an essential role in telemedicine consultations, medical documentation, scientific writing, and research. However, ChatGPT carries inherent drawbacks. It raises significant ethical concerns related to authorship, accountability, transparency, bias, and the potential for misinformation. ChatGPT's training data is limited to September 2021; thus, regular updates are required to keep pace with rapidly evolving medical research and advancements. Therefore, a judicious approach to using ChatGPT is of utmost importance. Head and neck oncologists can reap the maximum benefit of this technology in terms of patient care, education, and research to improve clinical outcomes.
Collapse
Affiliation(s)
- Gautam Sarma
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
| | - Hrishikesh Kashyap
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
| | - Partha Pratim Medhi
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
| |
Collapse
|
45
|
Wang WH, Wang SY, Huang JY, Liu XD, Yang J, Liao M, Lu Q, Wu Z. An investigation study on the interpretation of ultrasonic medical reports using OpenAI's GPT-3.5-turbo model. JOURNAL OF CLINICAL ULTRASOUND : JCU 2024; 52:105-111. [PMID: 37930057 DOI: 10.1002/jcu.23590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 09/23/2023] [Accepted: 10/04/2023] [Indexed: 11/07/2023]
Abstract
OBJECTIVES Ultrasound medical reports are an important means of diagnosing diseases and assessing treatment effectiveness. However, their professional terminology and complex sentences often make them difficult for ordinary people to understand. Therefore, this study explores the clinical value of using artificial intelligence systems based on ChatGPT to interpret ultrasound medical reports. METHODS In this study, a combination of online and offline questionnaires was used to survey both physicians and non-medical individuals. The questionnaires evaluated ChatGPT's interpretation of ultrasound reports from both professional and comprehensibility perspectives, and the results were analyzed using Excel spreadsheets. Additionally, a portion of the research content was evaluated using a 5-point Likert scale in the questionnaire. RESULTS According to the survey results, 67.4% of surveyed doctors believe that using ChatGPT to interpret ultrasound medical reports can help improve work efficiency. At the same time, 69.72% of non-medical respondents believe it is necessary to enhance their understanding of medical ultrasound reports through ChatGPT interpretation, and 62.58% support the application of ChatGPT to ultrasound medical reports. The non-medical group's understanding of ultrasound medical reports significantly improved (p < 0.01) after implementing ChatGPT. However, 67.49% of the general public are concerned about ChatGPT's imperfect functionality, which may produce misleading information. This reflects limited public trust in the new technology, as well as concern about possible privacy leaks and security issues with ChatGPT. CONCLUSIONS The higher acceptance and support among non-medical individuals for ChatGPT's interpretation of medical reports might be due to the system's natural language processing abilities, which allow them to better understand and evaluate report contents. However, the expertise and experience of physicians remain irreplaceable. This suggests that a ChatGPT-based ultrasound report interpretation system has clinical value and application prospects, but further optimization is necessary to address its shortcomings in data quality and professionalism. This study provides a reference and inspiration for promoting the application and development of ultrasound technology and artificial intelligence systems in the medical field.
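The abstract does not include the authors' prompting pipeline; as a rough sketch, one way to ask OpenAI's gpt-3.5-turbo for a lay-language interpretation of an ultrasound report with the current Python SDK is shown below. The prompt wording and the sample report are assumptions, not the study's protocol.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical report text, not taken from the study.
report = (
    "Liver: mildly increased echogenicity consistent with hepatic steatosis. "
    "No focal lesion. Gallbladder: no calculi. Common bile duct not dilated."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Explain ultrasound reports to patients in plain language. "
                    "Do not add diagnoses or advice beyond what the report states."},
        {"role": "user", "content": f"Please explain this report simply:\n{report}"},
    ],
    temperature=0.2,  # keep the interpretation conservative and repeatable
)

print(response.choices[0].message.content)
```

A low temperature is used here to limit paraphrasing drift; any clinical deployment would still require physician review of the model's output.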
Collapse
Affiliation(s)
- Wen Hui Wang
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Shi Yu Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jia Yan Huang
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Xiao di Liu
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Jie Yang
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Min Liao
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Qiang Lu
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Zhe Wu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
46
|
Zalzal HG, Abraham A, Cheng J, Shah RK. Can ChatGPT help patients answer their otolaryngology questions? Laryngoscope Investig Otolaryngol 2024; 9:e1193. [PMID: 38362184 PMCID: PMC10866598 DOI: 10.1002/lio2.1193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2023] [Revised: 11/14/2023] [Accepted: 11/22/2023] [Indexed: 02/17/2024] Open
Abstract
Background Over the past year, the world has been captivated by the potential of artificial intelligence (AI). The appetite for AI in science, and specifically in healthcare, is huge. It is imperative to understand the credibility of large language models in assisting the public with medical queries. Objective To evaluate the ability of ChatGPT to provide reasonably accurate answers to public queries within the domain of otolaryngology. Methods Two board-certified otolaryngologists (HZ, RS) inputted 30 text-based patient queries into the ChatGPT-3.5 model. ChatGPT responses were rated by the physicians on a 3-point scale (accurate, partially accurate, incorrect), while a similar 3-point scale involving confidence was given to layperson reviewers. Demographic data on gender and education level were recorded for the public reviewers. Inter-rater agreement percentages were based on the binomial distribution for calculating 95% confidence intervals and performing significance tests. Statistical significance was defined as p < .05 for two-sided tests. Results In testing patient queries, both otolaryngology physicians found that ChatGPT answered 98.3% of questions correctly, but only 79.8% (range 51.7%-100%) of lay reviewers were confident that the AI model was accurate in its responses (corrected agreement = 0.682; p < .001). Among the layperson responses, the corrected coefficient showed moderate agreement (0.571; p < .001). No correlation was noted with age, gender, or education level for the layperson responses. Conclusion From a physician standpoint, ChatGPT is highly accurate in responding to otolaryngology questions posed by the public. Public reviewers were not fully confident in the AI model, with subjective concerns reflecting less trust in AI answers compared with physician explanations. Larger evaluations with a representative public sample and broader medical questions should be conducted promptly by appropriate organizations, governing bodies, and/or governmental agencies to instill public confidence in AI and ChatGPT as a medical resource. Level of Evidence 4.
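The agreement statistics described above (agreement percentages with binomial 95% confidence intervals) can be reproduced in outline with statsmodels; the counts in this sketch are made up, not the study's data.

```python
from statsmodels.stats.proportion import proportion_confint

# Hypothetical example: laypersons judged the AI answer "accurate" in 24 of 30 queries.
agreements, total = 24, 30
rate = agreements / total

# Wilson score interval for a binomial proportion.
low, high = proportion_confint(agreements, total, alpha=0.05, method="wilson")

print(f"Agreement {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```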
Collapse
Affiliation(s)
- Habib G. Zalzal
- Division of Otolaryngology‐Head and Neck SurgeryChildren's National HospitalWashingtonDistrict of ColumbiaUSA
| | | | - Jenhao Cheng
- Quality, Safety, AnalyticsChildren's National HospitalWashingtonDistrict of ColumbiaUSA
| | - Rahul K. Shah
- Division of Otolaryngology‐Head and Neck SurgeryChildren's National HospitalWashingtonDistrict of ColumbiaUSA
| |
Collapse
|
47
|
Kavadella A, Dias da Silva MA, Kaklamanos EG, Stamatopoulos V, Giannakopoulos K. Evaluation of ChatGPT's Real-Life Implementation in Undergraduate Dental Education: Mixed Methods Study. JMIR MEDICAL EDUCATION 2024; 10:e51344. [PMID: 38111256 PMCID: PMC10867750 DOI: 10.2196/51344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Revised: 10/28/2023] [Accepted: 12/11/2023] [Indexed: 12/20/2023]
Abstract
BACKGROUND The recent artificial intelligence tool ChatGPT seems to offer a range of benefits in academic education while also raising concerns. Relevant literature encompasses issues of plagiarism and academic dishonesty, as well as pedagogy and educational affordances; yet, no real-life implementation of ChatGPT in the educational process has been reported to our knowledge so far. OBJECTIVE This mixed methods study aimed to evaluate the implementation of ChatGPT in the educational process, both quantitatively and qualitatively. METHODS In March 2023, a total of 77 second-year dental students of the European University Cyprus were divided into 2 groups and asked to compose a learning assignment on "Radiation Biology and Radiation Protection in the Dental Office," working collaboratively in small subgroups, as part of the educational semester program of the Dentomaxillofacial Radiology module. Careful planning ensured a seamless integration of ChatGPT, addressing potential challenges. One group searched the internet for scientific resources to perform the task and the other group used ChatGPT for this purpose. Both groups developed a PowerPoint (Microsoft Corp) presentation based on their research and presented it in class. The ChatGPT group students additionally registered all interactions with the language model during the prompting process and evaluated the final outcome; they also answered an open-ended evaluation questionnaire, including questions on their learning experience. Finally, all students undertook a knowledge examination on the topic, and the grades between the 2 groups were compared statistically, whereas the free-text comments of the questionnaires were thematically analyzed. RESULTS Out of the 77 students, 39 were assigned to the ChatGPT group and 38 to the literature research group. Seventy students undertook the multiple choice question knowledge examination, and examination grades ranged from 5 to 10 on the 0-10 grading scale. The Mann-Whitney U test showed that students of the ChatGPT group performed significantly better (P=.045) than students of the literature research group. The evaluation questionnaires revealed the benefits (human-like interface, immediate response, and wide knowledge base), the limitations (need for rephrasing the prompts to get a relevant answer, general content, false citations, and incapability to provide images or videos), and the prospects (in education, clinical practice, continuing education, and research) of ChatGPT. CONCLUSIONS Students using ChatGPT for their learning assignments performed significantly better in the knowledge examination than their fellow students who used the literature research methodology. Students adapted quickly to the technological environment of the language model, recognized its opportunities and limitations, and used it creatively and efficiently. Implications for practice: the study underscores the adaptability of students to technological innovations including ChatGPT and its potential to enhance educational outcomes. Educators should consider integrating ChatGPT into curriculum design; awareness programs are warranted to educate both students and educators about the limitations of ChatGPT, encouraging critical engagement and responsible use.
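The between-group grade comparison described above uses a Mann-Whitney U test; a minimal sketch of that test with SciPy follows, run on fabricated grade lists rather than the study's data.

```python
from scipy.stats import mannwhitneyu

# Hypothetical exam grades on the 0-10 scale (not the study's data).
chatgpt_group = [8, 9, 7, 8, 10, 9, 7, 8]
literature_group = [7, 6, 8, 7, 7, 6, 8, 7]

# Two-sided nonparametric comparison of the two groups' grade distributions.
stat, p_value = mannwhitneyu(chatgpt_group, literature_group, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```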
Collapse
Affiliation(s)
- Argyro Kavadella
- School of Dentistry, European University Cyprus, Nicosia, Cyprus
| | - Marco Antonio Dias da Silva
- Research Group of Teleducation and Teledentistry, Federal University of Campina Grande, Campina Grande, Brazil
| | - Eleftherios G Kaklamanos
- School of Dentistry, European University Cyprus, Nicosia, Cyprus
- School of Dentistry, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
| | - Vasileios Stamatopoulos
- Information Management Systems Institute, ATHENA Research and Innovation Center, Athens, Greece
| | | |
Collapse
|
48
|
Jain N, Gottlich C, Fisher J, Campano D, Winston T. Assessing ChatGPT's orthopedic in-service training exam performance and applicability in the field. J Orthop Surg Res 2024; 19:27. [PMID: 38167093 PMCID: PMC10762835 DOI: 10.1186/s13018-023-04467-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND ChatGPT has gained widespread attention for its ability to understand inputs and provide human-like responses. However, few works have focused on its use in orthopedics. This study assessed ChatGPT's performance on the Orthopedic In-Service Training Exam (OITE) and evaluated its decision-making process to determine whether adoption as a resource in the field is practical. METHODS ChatGPT's performance on three OITE exams was evaluated by inputting multiple-choice questions. Questions were classified by their orthopedic subject area. Yearly OITE technical reports were used to gauge scores against those of resident physicians. ChatGPT's rationales were compared with testmaker explanations using six groups denoting answer accuracy and logic consistency. Variables were analyzed using contingency table construction and chi-squared analyses. RESULTS Of 635 questions, 360 were usable as inputs (56.7%). ChatGPT-3.5 scored 55.8%, 47.7%, and 54% for the years 2020, 2021, and 2022, respectively. Of 190 correct outputs, 179 provided consistent logic (94.2%). Of 170 incorrect outputs, 133 provided inconsistent logic (78.2%). Significant associations were found between test topic and correct answers (p = 0.011) and between the type of logic used and the tested topic (p < 0.001). Basic Science and Sports had adjusted residuals greater than 1.96, as did the combinations Basic Science with correct answers and no logic, Basic Science with incorrect answers and inconsistent logic, Sports with correct answers and no logic, and Sports with incorrect answers and inconsistent logic. CONCLUSIONS Based on annual OITE technical reports for resident physicians, ChatGPT-3.5 performed at around the PGY-1 level. When answering correctly, it displayed reasoning congruent with the testmakers'. When answering incorrectly, it exhibited some understanding of the correct answer. It performed best in Basic Science and Sports, likely due to its ability to output rote facts. These findings suggest that, in its current form, ChatGPT lacks the fundamental capabilities to be a comprehensive tool in orthopedic surgery. LEVEL OF EVIDENCE II.
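The analysis described above relies on contingency tables, chi-squared tests, and adjusted residuals (cells flagged when the absolute residual exceeds 1.96); a minimal sketch of that computation with SciPy and NumPy is shown below, using an invented 2x3 table rather than the study's counts.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = outcome (correct, incorrect), columns = exam topic.
table = np.array([
    [40, 25, 30],   # correct
    [15, 30, 20],   # incorrect
])

chi2, p, dof, expected = chi2_contingency(table)

# Adjusted (standardized) residuals: (observed - expected) / sqrt(cell variance).
n = table.sum()
row_tot = table.sum(axis=1, keepdims=True)
col_tot = table.sum(axis=0, keepdims=True)
variance = expected * (1 - row_tot / n) * (1 - col_tot / n)
adj_resid = (table - expected) / np.sqrt(variance)

print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
print("Cells with |adjusted residual| > 1.96:")
print(np.abs(adj_resid) > 1.96)
```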
Collapse
Affiliation(s)
- Neil Jain
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA.
| | - Caleb Gottlich
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
| | - John Fisher
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
| | - Dominic Campano
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
| | - Travis Winston
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
| |
Collapse
|
49
|
Sumbal A, Sumbal R, Amir A. Can ChatGPT-3.5 Pass a Medical Exam? A Systematic Review of ChatGPT's Performance in Academic Testing. JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT 2024; 11:23821205241238641. [PMID: 38487300 PMCID: PMC10938614 DOI: 10.1177/23821205241238641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Accepted: 02/25/2024] [Indexed: 03/17/2024]
Abstract
OBJECTIVE We aim to conduct a systematic review to assess the academic potential of ChatGPT-3.5, along with its strengths and limitations when taking medical exams. METHOD Following PRISMA guidelines, a systematic search of the literature was performed using the electronic databases PubMed/MEDLINE, Google Scholar, and Cochrane. Articles from database inception until April 4, 2023, were queried. A formal narrative analysis was conducted by systematically arranging similarities and differences between individual findings. RESULTS After rigorous screening, 12 articles underwent this review. All the selected papers assessed the academic performance of ChatGPT-3.5. One study compared the performance of ChatGPT-3.5 with that of ChatGPT-4 on a medical exam. Overall, ChatGPT performed well in 4 tests, was average in 4 tests, and performed poorly in 4 tests. ChatGPT's performance was directly proportional to the difficulty level of the questions but did not differ notably by whether the questions were binary, descriptive, or MCQ-based. ChatGPT's explanation, reasoning, memory, and accuracy were remarkably good, whereas it failed to understand image-based questions and lacked insight and critical thinking. CONCLUSION ChatGPT-3.5 performed satisfactorily in the exams it took as an examinee. However, future studies are needed to fully explore the potential of ChatGPT in medical education.
Collapse
Affiliation(s)
- Anusha Sumbal
- Dow University of Health Sciences, Karachi, Pakistan
| | - Ramish Sumbal
- Dow University of Health Sciences, Karachi, Pakistan
| | - Alina Amir
- Dow University of Health Sciences, Karachi, Pakistan
| |
Collapse
|
50
|
Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140 DOI: 10.1097/aog.0000000000005446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/12/2023] [Indexed: 11/12/2023]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about capabilities, limitations, and risks. This article presents an overview of ChatGPT, and it highlights the current state of this technology for the medical field. The article seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education. It also provides suggestions on how to optimize the use of this tool.
Collapse
|