1. Matsubara S. ChatGPT use should be prohibited in writing letters. Am J Obstet Gynecol 2024;231:e110. PMID: 38710270. DOI: 10.1016/j.ajog.2024.04.046.
Affiliation(s)
- Shigeki Matsubara
- Department of Obstetrics and Gynecology, Jichi Medical University, Tochigi, Japan; Department of Obstetrics and Gynecology, Koga Red Cross Hospital, 1150 Shimoyama, Koga, Ibaraki 306-0014, Japan; Medical Examination Center, Ibaraki Western Medical Center, Chikusei, Japan.
2. Anuk AT, Tanacan A, Kara Ö, Sahin D. Assessing adverse pregnancy outcomes in women with uncontrolled asthma vs. mild asthma: a retrospective comparative analysis. Arch Gynecol Obstet 2024;310:1433-1440. PMID: 38276984. DOI: 10.1007/s00404-023-07347-4.
Abstract
PURPOSE The aim of this study was to compare perinatal outcomes between an uncontrolled asthma group and a mild asthma group and to examine the relationship between disease severity and adverse maternal-fetal outcomes. METHODS This retrospective cohort study analyzed 180 pregnant women diagnosed with asthma who were hospitalized and delivered at our center between September 1, 2019, and December 1, 2021. We compared two groups: 160 with mild asthma and 20 with uncontrolled asthma. Data encompassed maternal characteristics, obstetrical complications, medication use, emergency department admissions for exacerbations, smoking status, and neonatal outcomes. RESULTS In the uncontrolled asthma group, hospitalization rates and use of inhaled short-acting β-agonists (SABA) and systemic corticosteroids were significantly higher than in the mild asthma group (p < 0.01). Maternal and fetal complications were more prevalent in the uncontrolled group, including asthma exacerbations (45% vs. 1.2%), anemia (10% vs. 4.4%), prematurity (25% vs. 9.6%), and intrauterine fetal demise (IUFD) (10% vs. 0.6%). Neonatal outcomes in the uncontrolled group showed higher rates of admission to the neonatal intensive care unit (NICU) (50% vs. 25%), respiratory distress syndrome (RDS) (30% vs. 14%), and intraventricular hemorrhage (IVH) (5% vs. 0%) compared with the mild asthma group. CONCLUSION Uncontrolled asthma during pregnancy is associated with higher rates of adverse maternal-fetal and neonatal outcomes than mild asthma.
Affiliation(s)
- Ali Taner Anuk
- Division of Perinatology, Department of Obstetrics and Gynecology, Ministry of Health, Ankara City Hospital, Ankara, Türkiye
- Atakan Tanacan
- Division of Perinatology, Department of Obstetrics and Gynecology, Ministry of Health, Ankara City Hospital, Ankara, Türkiye
- Özgür Kara
- Division of Perinatology, Department of Obstetrics and Gynecology, Ministry of Health, Ankara City Hospital, Ankara, Türkiye
- Dilek Sahin
- Division of Perinatology, Department of Obstetrics and Gynecology, University of Health Sciences, Ministry of Health, Ankara City Hospital, Ankara, Türkiye
3. Peled T, Sela HY, Weiss A, Grisaru-Granovsky S, Agrawal S, Rottenstreich M. Evaluating the validity of ChatGPT responses on common obstetric issues: Potential clinical applications and implications. Int J Gynaecol Obstet 2024;166:1127-1133. PMID: 38523565. DOI: 10.1002/ijgo.15501.
Abstract
OBJECTIVE To evaluate the quality of ChatGPT responses to common issues in obstetrics and assess its ability to provide reliable responses to pregnant individuals. The study aimed to examine the responses against expert opinion using predetermined criteria, including "accuracy," "completeness," and "safety." METHODS We curated 15 common and potentially clinically significant questions that pregnant women frequently ask. Two native English-speaking women were asked to reframe the questions in their own words, and we employed the ChatGPT language model to generate responses to the questions. To evaluate the accuracy, completeness, and safety of ChatGPT's generated responses, we developed a questionnaire with a 1-to-5 scale and invited obstetrics and gynecology experts from different countries to rate the responses accordingly. The ratings were analyzed to evaluate the average level of agreement and the percentage of positive ratings (≥4) for each criterion. RESULTS Of the 42 experts invited, 20 responded to the questionnaire. The combined score for all responses yielded a mean rating of 4, with 75% of responses receiving a positive rating (≥4). Among the specific criteria, the ChatGPT responses performed best on accuracy, with a mean rating of 4.2 and a positive rating for 80% of the questions. The responses scored lower on completeness, with a mean rating of 3.8 and a positive rating for 46.7% of the questions. For safety, the mean rating was 3.9 and 53.3% of the questions received a positive rating. No response had an average rating below 3. CONCLUSION This study demonstrates promising results regarding the potential use of ChatGPT in providing accurate responses to obstetric clinical questions posed by pregnant women. However, it is crucial to exercise caution when addressing inquiries concerning the safety of the fetus or the mother.
Affiliation(s)
- Tzuria Peled
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Hen Y Sela
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Ari Weiss
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Sorina Grisaru-Granovsky
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Swati Agrawal
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Hamilton Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Misgav Rottenstreich
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Hamilton Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Department of Nursing, Jerusalem College of Technology, Jerusalem, Israel
4. Grünebaum A, Chervenak FA. The dichotomy between the scientific and artistic aspects of medical writing. Am J Obstet Gynecol 2024;231:e111. PMID: 38710266. DOI: 10.1016/j.ajog.2024.04.047.
Affiliation(s)
- Amos Grünebaum
- Department of Obstetrics and Gynecology, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Lenox Hill Hospital, New York, NY.
- Frank A Chervenak
- Department of Obstetrics and Gynecology, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Lenox Hill Hospital, New York, NY
5. Hua R, Dong X, Wei Y, Shu Z, Yang P, Hu Y, Zhou S, Sun H, Yan K, Yan X, Chang K, Li X, Bai Y, Zhang R, Wang W, Zhou X. Lingdan: enhancing encoding of traditional Chinese medicine knowledge for clinical reasoning tasks with large language models. J Am Med Inform Assoc 2024;31:2019-2029. PMID: 39038795. PMCID: PMC11339528. DOI: 10.1093/jamia/ocae087.
Abstract
OBJECTIVE The recent surge in large language models (LLMs) across various fields has yet to be fully realized in traditional Chinese medicine (TCM). This study aims to bridge this gap by developing a large language model tailored to TCM knowledge, enhancing its performance and accuracy in clinical reasoning tasks such as diagnosis, treatment, and prescription recommendation. MATERIALS AND METHODS This study harnessed a wide array of TCM data resources, including TCM ancient books, textbooks, and clinical data, to create 3 key datasets: the TCM Pre-trained Dataset, the Traditional Chinese Patent Medicine (TCPM) Question Answering Dataset, and the Spleen and Stomach Herbal Prescription Recommendation Dataset. These datasets underpinned the development of the Lingdan Pre-trained LLM and 2 specialized models: the Lingdan-TCPM-Chat Model, which uses a Chain-of-Thought process for symptom analysis and TCPM recommendation, and the Lingdan Prescription Recommendation Model (Lingdan-PR), which proposes herbal prescriptions based on electronic medical records. RESULTS The Lingdan-TCPM-Chat and Lingdan-PR models, fine-tuned on the Lingdan Pre-trained LLM, demonstrated state-of-the-art performance on the tasks of TCM clinical knowledge answering and herbal prescription recommendation. Notably, Lingdan-PR outperformed all state-of-the-art baseline models, achieving an improvement of 18.39% in the Top@20 F1-score compared with the best baseline. CONCLUSION This study marks a pivotal step in merging advanced LLMs with TCM, showcasing the potential of artificial intelligence to help improve clinical decision-making in medical diagnostics and treatment strategies. The success of the Lingdan Pre-trained LLM and its derivative models, Lingdan-TCPM-Chat and Lingdan-PR, not only revolutionizes TCM practices but also opens new avenues for the application of artificial intelligence in other specialized medical fields. Our project is available at https://github.com/TCMAI-BJTU/LingdanLLM.
Affiliation(s)
- Rui Hua
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Xin Dong
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Yu Wei
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Zixin Shu
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Pengcheng Yang
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Yunhui Hu
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Shuiping Zhou
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- He Sun
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Kaijing Yan
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Xijun Yan
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Kai Chang
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Xiaodong Li
- Affiliated Hospital of Hubei University of Chinese Medicine, Wuhan 430065, China
- Hubei Academy of Chinese Medicine, Wuhan 430061, China
- Institute of Liver Diseases, Hubei Key Laboratory of Theoretical and Applied Research of Liver and Kidney in Traditional Chinese Medicine, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan 430061, China
- Yuning Bai
- Department of Gastroenterology, Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Runshun Zhang
- Department of Gastroenterology, Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Wenjia Wang
- Innovation Center of Digital & Intelligent Chinese Medicine, Tasly Pharmaceutical Group Co., Ltd., Tianjin 300410, China
- Xuezhong Zhou
- Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
6. Wang Y, Chen Y, Sheng J. Assessing ChatGPT as a Medical Consultation Assistant for Chronic Hepatitis B: Cross-Language Study of English and Chinese. JMIR Med Inform 2024;12:e56426. PMID: 39115930. PMCID: PMC11342014. DOI: 10.2196/56426.
Abstract
BACKGROUND Chronic hepatitis B (CHB) imposes substantial economic and social burdens globally. The management of CHB involves intricate monitoring and adherence challenges, particularly in regions like China, where a high prevalence of CHB intersects with health care resource limitations. This study explores the potential of ChatGPT-3.5, an emerging artificial intelligence (AI) assistant, to address these complexities. Given its notable capabilities in medical education and practice, ChatGPT-3.5's role in managing CHB is examined, particularly in regions with distinct health care landscapes. OBJECTIVE This study aimed to uncover insights into ChatGPT-3.5's potential and limitations in delivering personalized medical consultation assistance for CHB patients across diverse linguistic contexts. METHODS Questions sourced from published guidelines, online CHB communities, and search engines in English and Chinese were refined, translated, and compiled into 96 inquiries. Subsequently, these questions were presented to both ChatGPT-3.5 and ChatGPT-4.0 in independent dialogues. The responses were then evaluated by senior physicians, focusing on informativeness, emotional management, consistency across repeated inquiries, and cautionary statements regarding medical advice. Additionally, a true-or-false questionnaire was employed to further discern the variance in information accuracy between ChatGPT-3.5 and ChatGPT-4.0 for closed questions. RESULTS Over half of the responses (228/370, 61.6%) from ChatGPT-3.5 were considered comprehensive. In contrast, ChatGPT-4.0 exhibited a higher percentage at 74.5% (172/222; P<.001). Notably, superior performance was evident in English, particularly in informativeness and consistency across repeated queries. However, deficiencies were identified in emotional management guidance, with only 3.2% (6/186) in ChatGPT-3.5 and 8.1% (15/154) in ChatGPT-4.0 (P=.04). ChatGPT-3.5 included a disclaimer in 10.8% (24/222) of responses, while ChatGPT-4.0 included a disclaimer in 13.1% (29/222) of responses (P=.46). When responding to true-or-false questions, ChatGPT-4.0 achieved an accuracy rate of 93.3% (168/180), significantly surpassing ChatGPT-3.5's accuracy rate of 65.0% (117/180) (P<.001). CONCLUSIONS In this study, ChatGPT demonstrated basic capabilities as a medical consultation assistant for CHB management. The choice of working language for ChatGPT-3.5 was considered a potential factor influencing its performance, particularly in the use of terminology and colloquial language, and this potentially affects its applicability within specific target populations. However, as an updated model, ChatGPT-4.0 exhibits improved information processing capabilities, overcoming the language impact on information accuracy. This suggests that the implications of model advancement on applications need to be considered when selecting large language models as medical consultation assistants. Given that both models performed inadequately in emotional guidance management, this study highlights the importance of providing specific language training and emotional management strategies when deploying ChatGPT for medical purposes. Furthermore, the tendency of these models to use disclaimers in conversations should be further investigated to understand the impact on patients' experiences in practical applications.
Affiliation(s)
- Yijie Wang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yining Chen
- Department of Urology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jifang Sheng
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
7. Burns C, Bakaj A, Berishaj A, Hristidis V, Deak P, Equils O. Use of Generative AI for Improving Health Literacy in Reproductive Health: Case Study. JMIR Form Res 2024;8:e59434. PMID: 38986153. PMCID: PMC11336497. DOI: 10.2196/59434.
Abstract
BACKGROUND Patients find technology tools to be more approachable for seeking sensitive health-related information, such as reproductive health information. The inventive conversational ability of artificial intelligence (AI) chatbots, such as ChatGPT (OpenAI Inc), offers a potential means for patients to effectively locate answers to their health-related questions digitally. OBJECTIVE A pilot study was conducted to compare the novel ChatGPT with the existing Google Search technology for their ability to offer accurate, effective, and current information on what to do after missing a dose of an oral contraceptive pill. METHODS A sequence of 11 questions, mimicking a patient inquiring about the action to take after missing a dose of an oral contraceptive pill, was input into ChatGPT as a cascade, given the conversational ability of ChatGPT. The questions were input into 4 different ChatGPT accounts, with the account holders being of various demographics, to evaluate potential differences and biases in the responses given to different account holders. The leading question alone, "what should I do if I missed a day of my oral contraception birth control?", was then input into Google Search, given its nonconversational nature. The results from the ChatGPT questions and the Google Search results for the leading question were evaluated on their readability, accuracy, and effective delivery of information. RESULTS The ChatGPT results were at an overall higher-grade reading level, required a longer reading duration, and were less accurate, less current, and less effective in delivering information. In contrast, the Google Search answer box and snippets were at a lower-grade reading level, required a shorter reading duration, were more current, referenced the origin of the information (transparent), and provided the information in various formats in addition to text. CONCLUSIONS ChatGPT has room for improvement in accuracy, transparency, recency, and reliability before it can equitably be implemented into health care information delivery and provide the potential benefits it poses. However, AI may be used as a tool for providers to educate their patients in preferred, creative, and efficient ways, such as using AI to generate accessible short educational videos from health care provider-vetted information. Larger studies representing a diverse group of users are needed.
Affiliation(s)
- Christina Burns
- MiOra, Encino, CA, United States
- University of California San Diego, San Diego, CA, United States
- Angela Bakaj
- MiOra, Encino, CA, United States
- Institute for Management & Innovation, University of Toronto, Toronto, ON, Canada
- Amonda Berishaj
- MiOra, Encino, CA, United States
- College of Professional Studies, Northeastern University, Boston, MA, United States
- Vagelis Hristidis
- Computer Science and Engineering, University of California Riverside, Riverside, CA, United States
- Pamela Deak
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of California San Diego, San Diego, CA, United States
8. Grossman S, Zerilli T, Nathan JP. Appropriateness of ChatGPT as a resource for medication-related questions. Br J Clin Pharmacol 2024. PMID: 39096130. DOI: 10.1111/bcp.16212.
Abstract
With ChatGPT's increasing popularity, healthcare professionals and patients may use it to obtain medication-related information. This study was conducted to assess ChatGPT's ability to provide satisfactory responses (i.e., responses that directly answer the question and are accurate, complete, and relevant) to medication-related questions posed to an academic drug information service. ChatGPT responses were compared with responses generated by the investigators using traditional resources, and the references provided were evaluated. Thirty-nine questions were entered into ChatGPT; the three most common categories were therapeutics (8; 21%), compounding/formulation (6; 15%) and dosage (5; 13%). Ten (26%) questions were answered satisfactorily by ChatGPT. Of the 29 (74%) questions that were not answered satisfactorily, deficiencies included lack of a direct response (11; 38%), lack of accuracy (11; 38%) and/or lack of completeness (12; 41%). References were included with eight (29%) responses; each of these included fabricated references. Presently, healthcare professionals and consumers should be cautioned against using ChatGPT for medication-related information.
Affiliation(s)
- Sara Grossman
- LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
- Tina Zerilli
- LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
- Joseph P Nathan
- LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
9. Zheng C, Ye H, Guo J, Yang J, Fei P, Yuan Y, Huang D, Huang Y, Peng J, Xie X, Xie M, Zhao P, Chen L, Zhang M. Development and evaluation of a large language model of ophthalmology in Chinese. Br J Ophthalmol 2024:bjo-2023-324526. PMID: 39019566. DOI: 10.1136/bjo-2023-324526.
Abstract
BACKGROUND Large language models (LLMs), such as ChatGPT, have considerable implications for various medical applications. However, ChatGPT's training primarily draws from English-centric internet data and is not tailored explicitly to the medical domain. Thus, an ophthalmic LLM in Chinese is clinically essential for both healthcare providers and patients in mainland China. METHODS We developed an LLM of ophthalmology (MOPH) using Chinese corpora and evaluated its performance in three clinical scenarios: ophthalmic board exams in Chinese, answering evidence-based medicine-oriented ophthalmic questions, and diagnostic accuracy for clinical vignettes. Additionally, we compared MOPH's performance to that of human doctors. RESULTS In the ophthalmic exam, MOPH's average score closely aligned with the mean score of trainees (64.7 (range 62-68) vs 66.2 (range 50-92), p=0.817), and it achieved a score above 60 in all seven mock exams. In answering ophthalmic questions, 83.3% (25/30) of MOPH's responses adhered to Chinese guidelines (Likert scale 4-5). Only 6.7% (2/30, Likert scale 1-2) and 10% (3/30, Likert scale 3) of responses were rated by reviewers as 'poor or very poor' or as containing 'potentially misinterpretable inaccuracies', respectively. In diagnostic accuracy, although the rate of correct diagnosis by ophthalmologists was higher than that of MOPH (96.1% vs 81.1%), the difference was not statistically significant (p>0.05). CONCLUSION This study demonstrated the promising performance of MOPH, a Chinese-specific ophthalmic LLM, in diverse clinical scenarios. MOPH has potential real-world applications in Chinese-language ophthalmology settings.
Affiliation(s)
- Ce Zheng
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Institute of Hospital Development Strategy, China Hospital Development Institute, Shanghai Jiao Tong University, Shanghai, China
- Hongfei Ye
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Institute of Hospital Development Strategy, China Hospital Development Institute, Shanghai Jiao Tong University, Shanghai, China
- Jinming Guo
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Junrui Yang
- Ophthalmology, The 74th Army Group Hospital, Guangzhou, Guangdong, China
- Ping Fei
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Yuanzhi Yuan
- Ophthalmology, Zhongshan Hospital Fudan University, Shanghai, China
- Danqing Huang
- Discipline Inspection & Supervision Office, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Yuqiang Huang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Jie Peng
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Xiaoling Xie
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Meng Xie
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Peiquan Zhao
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Li Chen
- Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Mingzhi Zhang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
10. De Vito A, Geremia N, Marino A, Bavaro DF, Caruana G, Meschiari M, Colpani A, Mazzitelli M, Scaglione V, Venanzi Rullo E, Fiore V, Fois M, Campanella E, Pistarà E, Faltoni M, Nunnari G, Cattelan A, Mussini C, Bartoletti M, Vaira LA, Madeddu G. Assessing ChatGPT's theoretical knowledge and prescriptive accuracy in bacterial infections: a comparative study with infectious diseases residents and specialists. Infection 2024. PMID: 38995551. DOI: 10.1007/s15010-024-02350-6.
Abstract
OBJECTIVES Advancements in artificial intelligence (AI) have made platforms like ChatGPT increasingly relevant in medicine. This study assesses ChatGPT's utility in addressing bacterial infection-related questions and antibiogram-based clinical cases. METHODS This study was a collaborative effort involving infectious disease (ID) specialists and residents. A group of experts formulated six true/false questions, six open-ended questions, and six clinical cases with antibiograms for four types of infections (endocarditis, pneumonia, intra-abdominal infections, and bloodstream infection), for a total of 96 questions. The questions were submitted to four senior residents and four specialists in ID and input into ChatGPT-4 and a trained version of ChatGPT-4. A total of 720 responses were obtained and reviewed by a blinded panel of experts in antibiotic treatments. They evaluated the responses for accuracy and completeness, the ability to identify correct resistance mechanisms from antibiograms, and the appropriateness of antibiotic prescriptions. RESULTS No significant difference was noted among the four groups for true/false questions, with approximately 70% correct answers. The trained ChatGPT-4 and ChatGPT-4 offered more accurate and complete answers to the open-ended questions than both the residents and specialists. Regarding the clinical cases, ChatGPT-4 showed lower accuracy in recognizing the correct resistance mechanism. ChatGPT-4 tended not to prescribe newer antibiotics like cefiderocol or imipenem/cilastatin/relebactam, favoring less recommended options like colistin. Both the trained ChatGPT-4 and ChatGPT-4 recommended longer-than-necessary treatment periods (p = 0.022). CONCLUSIONS This study highlights ChatGPT's capabilities and limitations in medical decision-making, specifically regarding bacterial infections and antibiogram analysis. While ChatGPT demonstrated proficiency in answering theoretical questions, it did not consistently align with expert decisions in clinical case management. Despite these limitations, the potential of ChatGPT as a supportive tool in ID education and preliminary analysis is evident. However, it should not replace expert consultation, especially in complex clinical decision-making.
Affiliation(s)
- Andrea De Vito
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy.
- PhD School in Biomedical Science, Biomedical Science Department, University of Sassari, Sassari, Italy.
- Nicholas Geremia
- Unit of Infectious Diseases, Department of Clinical Medicine, Ospedale dell'Angelo, Venice, Italy
- Unit of Infectious Diseases, Department of Clinical Medicine, Ospedale Civile S.S. Giovanni e Paolo, Venice, Italy
- Andrea Marino
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Davide Fiore Bavaro
- Infectious Diseases Unit - IRCCS Humanitas Research Hospital, Via Manzoni 56, Rozzano, Milan, 20089, Italy
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan, 20090, Italy
- Giorgia Caruana
- Infectious Diseases Service, Cantonal Hospital of Sion and Institut Central des Hôpitaux (ICH), Sion, Switzerland
- Institute of Microbiology, Department of Laboratory Medicine and Pathology, Lausanne University Hospital, Lausanne, Switzerland
- Agnese Colpani
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
- Maria Mazzitelli
- Infectious and Tropical Diseases Unit, Padua University Hospital, Padua, Italy
- Vincenzo Scaglione
- Infectious and Tropical Diseases Unit, Padua University Hospital, Padua, Italy
- Emmanuele Venanzi Rullo
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Vito Fiore
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
- Marco Fois
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
- Edoardo Campanella
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Eugenia Pistarà
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Giuseppe Nunnari
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Annamaria Cattelan
- Infectious and Tropical Diseases Unit, Padua University Hospital, Padua, Italy
- Michele Bartoletti
- Infectious Diseases Unit - IRCCS Humanitas Research Hospital, Via Manzoni 56, Rozzano, Milan, 20089, Italy
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan, 20090, Italy
- Luigi Angelo Vaira
- Maxillofacial Surgery Unit, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
- Giordano Madeddu
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, Italy
11. Yilmaz Muluk S, Olcucu N. Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-3.5 and GoogleBard in Identifying Red Flags of Low Back Pain. Cureus 2024;16:e63580. PMID: 39087174. PMCID: PMC11290316. DOI: 10.7759/cureus.63580.
Abstract
BACKGROUND Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions, marked by 'red flags' (RF) such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed ChatGPT-3.5's (OpenAI, San Francisco, CA, USA) and GoogleBard's (Google, Mountain View, CA, USA) accuracy in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition. METHODS We created 70 questions on RF-related symptoms and diseases following the LBP guidelines. Among them, 58 had a single symptom (SS), and 12 had multiple symptoms (MS) of LBP. Questions were posed to ChatGPT and GoogleBard, and responses were assessed by two authors for accuracy, completeness, and relevance (ACR) using a 5-point rubric criteria. RESULTS Cohen's kappa values (0.60-0.81) indicated significant agreement among the authors. The average scores for responses ranged from 3.47 to 3.85 for ChatGPT-3.5 and from 3.36 to 3.76 for GoogleBard for 58 SS questions, and from 4.04 to 4.29 for ChatGPT-3.5 and from 3.50 to 3.71 for GoogleBard for 12 MS questions. The ratings for these responses ranged from 'good' to 'excellent'. Most SS responses effectively conveyed the severity of the situation (93.1% for ChatGPT-3.5, 94.8% for GoogleBard), and all MS responses did so. No statistically significant differences were found between ChatGPT-3.5 and GoogleBard scores (p>0.05). CONCLUSIONS In an era characterized by widespread online health information seeking, artificial intelligence (AI) systems play a vital role in delivering precise medical information. These technologies may hold promise in the field of health information if they continue to improve.
Affiliation(s)
- Nazli Olcucu
- Physical Medicine and Rehabilitation, Antalya Ataturk State Hospital, Antalya, TUR
12. Safrai M, Orwig KE. Utilizing artificial intelligence in academic writing: an in-depth evaluation of a scientific review on fertility preservation written by ChatGPT-4. J Assist Reprod Genet 2024;41:1871-1880. PMID: 38619763. PMCID: PMC11263262. DOI: 10.1007/s10815-024-03089-7.
Abstract
PURPOSE To evaluate the ability of ChatGPT-4 to generate a biomedical review article on fertility preservation. METHODS ChatGPT-4 was prompted to create an outline for a review on fertility preservation in men and prepubertal boys. The outline provided by ChatGPT-4 was subsequently used to prompt ChatGPT-4 to write the different parts of the review and provide five references for each section. The different parts of the article and the references provided were combined to create a single scientific review that was evaluated by the authors, who are experts in fertility preservation. The experts assessed the article and the references for accuracy and checked for plagiarism using online tools. In addition, both experts independently scored the relevance, depth, and currentness of the ChatGPT-4's article using a scoring matrix ranging from 0 to 5 where higher scores indicate higher quality. RESULTS ChatGPT-4 successfully generated a relevant scientific article with references. Among 27 statements needing citations, four were inaccurate. Of 25 references, 36% were accurate, 48% had correct titles but other errors, and 16% were completely fabricated. Plagiarism was minimal (mean = 3%). Experts rated the article's relevance highly (5/5) but gave lower scores for depth (2-3/5) and currentness (3/5). CONCLUSION ChatGPT-4 can produce a scientific review on fertility preservation with minimal plagiarism. While precise in content, it showed factual and contextual inaccuracies and inconsistent reference reliability. These issues limit ChatGPT-4 as a sole tool for scientific writing but suggest its potential as an aid in the writing process.
Affiliation(s)
- Myriam Safrai
- Department of Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA.
- Department of Obstetrics and Gynecology, Chaim Sheba Medical Center (Tel Hashomer), Sackler Faculty of Medicine, Tel Aviv University, 52621, Tel Aviv, Israel.
- Kyle E Orwig
- Department of Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
13. Aghamaliyev U, Karimbayli J, Giessen-Jung C, Matthias I, Unger K, Andrade D, Hofmann FO, Weniger M, Angele MK, Benedikt Westphalen C, Werner J, Renz BW. ChatGPT's Gastrointestinal Tumor Board Tango: A limping dance partner? Eur J Cancer 2024;205:114100. PMID: 38729055. DOI: 10.1016/j.ejca.2024.114100.
Abstract
OBJECTIVES This study aimed to assess the consistency and replicability of treatment recommendations provided by ChatGPT 3.5 compared to gastrointestinal tumor cases presented at multidisciplinary tumor boards (MTBs). It also aimed to distinguish between general and case-specific responses and investigated the precision of ChatGPT's recommendations in replicating exact treatment plans, particularly regarding chemotherapy regimens and follow-up protocols. MATERIAL AND METHODS A retrospective study was carried out on 115 cases of gastrointestinal malignancies, selected from 448 patients reviewed in MTB meetings. A senior resident fed patient data into ChatGPT 3.5 to produce treatment recommendations, which were then evaluated against the tumor board's decisions by senior oncology fellows. RESULTS In 19% of the examined cases, ChatGPT 3.5 provided general information about the malignancy without considering individual patient characteristics; in the remaining 81% of cases, it generated responses that were specific to the individual clinical scenario. In the subset of case-specific responses, 83% of recommendations exhibited overall treatment strategy concordance between ChatGPT and the MTB. However, the exact treatment concordance dropped to 65%, and was notably lower for recommendations of specific chemotherapy regimens. Cases recommended for surgery showed the highest concordance rates, while those involving chemotherapy recommendations faced challenges in precision. CONCLUSIONS ChatGPT 3.5 demonstrates potential in aligning conceptual approaches to treatment strategies with MTB guidelines. However, it falls short in accurately duplicating specific treatment plans, especially concerning chemotherapy regimens and follow-up procedures. Ethical concerns and challenges in achieving exact replication necessitate prudence when considering ChatGPT 3.5 for direct clinical decision-making in MTBs.
Affiliation(s)
- Ughur Aghamaliyev
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Javad Karimbayli
- Division of Molecular Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, National Cancer Institute, Aviano, Italy
- Clemens Giessen-Jung
- Comprehensive Cancer Center Munich & Department of Medicine III, LMU University Hospital, LMU Munich, Germany
- Ilmer Matthias
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Kristian Unger
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; Department of Radiation Oncology, University Hospital, LMU Munich, 81377; Bavarian Cancer Research Center (BZKF), Munich, Germany
- Dorian Andrade
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Felix O Hofmann
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Maximilian Weniger
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Martin K Angele
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- C Benedikt Westphalen
- Comprehensive Cancer Center Munich & Department of Medicine III, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Jens Werner
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Bernhard W Renz
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
14. Khromchenko K, Shaikh S, Singh M, Vurture G, Rana RA, Baum JD. ChatGPT-3.5 Versus Google Bard: Which Large Language Model Responds Best to Commonly Asked Pregnancy Questions? Cureus 2024;16:e65543. PMID: 39188430. PMCID: PMC11346960. DOI: 10.7759/cureus.65543.
Abstract
Large language models (LLM) have been widely used to provide information in many fields, including obstetrics and gynecology. Which model performs best in providing answers to commonly asked pregnancy questions is unknown. A qualitative analysis of Chat Generative Pre-Training Transformer Version 3.5 (ChatGPT-3.5) (OpenAI, Inc., San Francisco, California, United States) and Bard, recently renamed Google Gemini (Google LLC, Mountain View, California, United States), was performed in August of 2023. Each LLM was queried on 12 commonly asked pregnancy questions and asked for their references. Review and grading of the responses and references for both LLMs were performed by the co-authors individually and then as a group to formulate a consensus. Query responses were graded as "acceptable" or "not acceptable" based on correctness and completeness in comparison to American College of Obstetricians and Gynecologists (ACOG) publications, PubMed-indexed evidence, and clinical experience. References were classified as "verified," "broken," "irrelevant," "non-existent," and "no references." Grades of "acceptable" were given to 58% of ChatGPT-3.5 responses (seven out of 12) and 83% of Bard responses (10 out of 12). In regard to references, ChatGPT-3.5 had reference issues in 100% of its references, and Bard had discrepancies in 8% of its references (one out of 12). When comparing ChatGPT-3.5 responses between May 2023 and August 2023, a change in "acceptable" responses was noted: 50% versus 58%, respectively. Bard answered more questions correctly than ChatGPT-3.5 when queried on a small sample of commonly asked pregnancy questions. ChatGPT-3.5 performed poorly in terms of reference verification. The overall performance of ChatGPT-3.5 remained stable over time, with approximately one-half of responses being "acceptable" in both May and August of 2023. Both LLMs need further evaluation and vetting before being accepted as accurate and reliable sources of information for pregnant women.
Affiliation(s)
- Keren Khromchenko
- Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA
- Sameeha Shaikh
- Obstetrics and Gynecology, Hackensack Meridian School of Medicine, Nutley, USA
- Meghana Singh
- Obstetrics and Gynecology, Hackensack Meridian School of Medicine, Nutley, USA
- Gregory Vurture
- Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA
- Rima A Rana
- Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA
- Jonathan D Baum
- Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA
15. Rodrigues Alessi M, Gomes HA, Lopes de Castro M, Terumy Okamoto C. Performance of ChatGPT in Solving Questions From the Progress Test (Brazilian National Medical Exam): A Potential Artificial Intelligence Tool in Medical Practice. Cureus 2024;16:e64924. PMID: 39156244. PMCID: PMC11330648. DOI: 10.7759/cureus.64924.
Abstract
Background The use of artificial intelligence (AI) is not a recent phenomenon, but the latest advancements in this technology are making a significant impact across various fields of human knowledge. In medicine, this trend is no different, although it has developed at a slower pace. ChatGPT is an example of an AI-based algorithm capable of answering questions, interpreting phrases, and synthesizing complex information, potentially aiding and even replacing humans in various areas of social interest. Some studies have compared its performance in solving medical knowledge exams with medical students and professionals to verify AI accuracy. This study aimed to measure the performance of ChatGPT in answering questions from the Progress Test from 2021 to 2023. Methodology An observational study was conducted in which questions from the 2021 Progress Test and the regional tests (Southern Institutional Pedagogical Support Center II) of 2022 and 2023 were presented to ChatGPT 3.5. The results obtained were compared with the scores of first- to sixth-year medical students from over 120 Brazilian universities. All questions were presented sequentially, without any modification to their structure. After each question was presented, the platform's history was cleared, and the site was restarted. Results The platform achieved an average accuracy rate in 2021, 2022, and 2023 of 69.7%, 68.3%, and 67.2%, respectively, surpassing students from all medical years in the three tests evaluated, reinforcing findings in the current literature. The subject with the best score for the AI was Public Health, with a mean grade of 77.8%. Conclusions ChatGPT demonstrated the ability to answer medical questions with higher accuracy than humans, including students from the last year of medical school.
Affiliation(s)
- Heitor A Gomes
- School of Medicine, Universidade Positivo, Curitiba, BRA
16. Meyer R, Hamilton KM, Truong MD, Wright KN, Siedhoff MT, Brezinov Y, Levin G. ChatGPT compared with Google Search and healthcare institution as sources of postoperative patient instructions after gynecological surgery. BJOG 2024;131:1154-1156. PMID: 38177090. DOI: 10.1111/1471-0528.17746.
Affiliation(s)
- Raanan Meyer
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- The Dr. Pinchas Bornstein Talpiot Medical Leadership Programme, Sheba Medical Center, Ramat-Gan, Israel
- Kacey M Hamilton
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- Mireille D Truong
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- Kelly N Wright
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- Matthew T Siedhoff
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
- Yoav Brezinov
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
- Gabriel Levin
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
17. Rotem R, Zamstein O, Rottenstreich M, O'Sullivan OE, O'Reilly BA, Weintraub AY. The future of patient education: A study on AI-driven responses to urinary incontinence inquiries. Int J Gynaecol Obstet 2024. PMID: 38944693. DOI: 10.1002/ijgo.15751.
Abstract
OBJECTIVE To evaluate the effectiveness of ChatGPT in providing insights into common urinary incontinence concerns within urogynecology. By analyzing the model's responses against established benchmarks of accuracy, completeness, and safety, the study aimed to quantify its usefulness for informing patients and aiding healthcare providers. METHODS An expert-driven questionnaire was developed, inviting urogynecologists worldwide to assess ChatGPT's answers to 10 carefully selected questions on urinary incontinence (UI). These assessments focused on the accuracy of the responses, their comprehensiveness, and whether they raised any safety issues. Subsequent statistical analyses determined the average consensus among experts and identified the proportion of responses receiving favorable evaluations (a score of 4 or higher). RESULTS Of 50 urogynecologists that were approached worldwide, 37 responded, offering insights into ChatGPT's responses on UI. The overall feedback averaged a score of 4.0, indicating a positive acceptance. Accuracy scores averaged 3.9 with 71% rated favorably, whereas comprehensiveness scored slightly higher at 4 with 74% favorable ratings. Safety assessments also averaged 4 with 74% favorable responses. CONCLUSION This investigation underlines ChatGPT's favorable performance across the evaluated domains of accuracy, comprehensiveness, and safety within the context of UI queries. However, despite this broadly positive reception, the study also signals a clear avenue for improvement, particularly in the precision of the provided information. Refining ChatGPT's accuracy and ensuring the delivery of more pinpointed responses are essential steps forward, aiming to bolster its utility as a comprehensive educational resource for patients and a supportive tool for healthcare practitioners.
Affiliation(s)
- Reut Rotem
- Department of Urogynaecology, Cork University Maternity Hospital, Cork, Ireland
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Omri Zamstein
- Department of Obstetrics and Gynecology, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Misgav Rottenstreich
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Barry A O'Reilly
- Department of Urogynaecology, Cork University Maternity Hospital, Cork, Ireland
- Adi Y Weintraub
- Department of Obstetrics and Gynecology, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
18. Kotzur T, Singh A, Parker J, Peterson B, Sager B, Rose R, Corley F, Brady C. Evaluation of a Large Language Model's Ability to Assist in an Orthopedic Hand Clinic. Hand (N Y) 2024:15589447241257643. PMID: 38907651. DOI: 10.1177/15589447241257643.
Abstract
BACKGROUND Advancements in artificial intelligence technology, such as OpenAI's large language model, ChatGPT, could transform medicine through applications in a clinical setting. This study aimed to assess the utility of ChatGPT as a clinical assistant in an orthopedic hand clinic. METHODS Nine clinical vignettes, describing various common and uncommon hand pathologies, were constructed and reviewed by 4 fellowship-trained orthopedic hand surgeons and an orthopedic resident. ChatGPT was given these vignettes and asked to generate a differential diagnosis, propose a workup plan, and provide treatment options for its top differential. Responses were graded for accuracy, and the overall utility was scored on a 5-point Likert scale. RESULTS ChatGPT made the correct diagnosis in 7 of 9 cases, an overall accuracy rate of 78%. ChatGPT was less reliable with more complex pathologies and failed to identify an intentionally incorrect presentation. ChatGPT received a score of 3.8 ± 1.4 for correct diagnosis, 3.4 ± 1.4 for helpfulness in guiding patient management, 4.1 ± 1.0 for appropriate workup for the actual diagnosis, 4.3 ± 0.8 for an appropriate recommended treatment plan for the diagnosis, and 4.4 ± 0.8 for the helpfulness of treatment options in managing patients. CONCLUSION ChatGPT was successful in diagnosing most of the conditions; however, the overall utility of its advice was variable. While it performed well in recommending treatments, it faced difficulties in providing appropriate diagnoses for uncommon pathologies. In addition, it failed to identify an obvious error in the presented pathology.
Collapse
|
19
|
Moll M, Heilemann G, Georg D, Kauer-Dorner D, Kuess P. The role of artificial intelligence in informed patient consent for radiotherapy treatments-a case report. Strahlenther Onkol 2024; 200:544-548. [PMID: 38180493 DOI: 10.1007/s00066-023-02190-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Accepted: 12/03/2023] [Indexed: 01/06/2024]
Abstract
Recent advancements have led to widespread use of large language models (LLMs; e.g., ChatGPT (OpenAI, San Francisco, California, USA)) in various fields, including healthcare. This case study reports on the first use of an LLM in a pretreatment discussion and in obtaining informed consent for a radiation oncology treatment. Further, the reproducibility of the replies by ChatGPT 3.5 was analyzed. A breast cancer patient, following legal consultation, engaged in a conversation with ChatGPT 3.5 regarding her radiotherapy treatment. The patient posed questions about side effects, prevention, activities, medications, and late effects. While some answers contained inaccuracies, responses closely resembled doctors' replies. In a final evaluation discussion, the patient, however, stated that she preferred the presence of a physician and expressed concerns about the source of the provided information. The reproducibility was tested in ten iterations. Future guidelines for using such models in radiation oncology should be driven by medical professionals. While artificial intelligence (AI) supports essential tasks, human interaction remains crucial.
Collapse
Affiliation(s)
- M Moll
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria.
| | - G Heilemann
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
| | - Dietmar Georg
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
| | - D Kauer-Dorner
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
| | - P Kuess
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
| |
Collapse
|
20
|
Padmanabhan P, Dasarathan T, Surapaneni KM. Exploring the Potential of ChatGPT in Obstetrics and Gynecology of Undergraduate Medical Curriculum. J Obstet Gynaecol India 2024; 74:281-283. [PMID: 38974749 PMCID: PMC11224185 DOI: 10.1007/s13224-023-01909-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 11/06/2023] [Indexed: 07/09/2024] Open
Abstract
ChatGPT, the new buzz in the field of technology, is attracting millions of users worldwide with its impressive ability to perform multiple tasks in a way that mimics human conversation. We conducted this study at two levels, with direct and case-based questions from obstetrics and gynecology, to assess the performance of ChatGPT in the medical field. Our results suggest that ChatGPT has a good comprehension of the subject. However, ChatGPT should be trained on recent updates and improved so that it generates error-free, up-to-date responses.
Collapse
Affiliation(s)
- Padmavathy Padmanabhan
- Department of Obstetrics & Gynaecology, Panimalar Medical College Hospital and Research Institute, Varadharajapuram, Poonamallee, Chennai, 600123 India
| | - Tamilselvi Dasarathan
- Department of Obstetrics & Gynaecology, Panimalar Medical College Hospital and Research Institute, Varadharajapuram, Poonamallee, Chennai, 600123 India
| | - Krishna Mohan Surapaneni
- Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu 600 123 India
- Departments of Medical Education, Clinical Skills & Simulation, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu 600 123 India
| |
Collapse
|
21
|
Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Bolzoni A, Committeri U, Crimi S, Gabriele G, Lonardi F, Maglitto F, Petrocelli M, Pucci R, Saponaro G, Tel A, Vellone V, Chiesa-Estomba CM, Boscolo-Rizzo P, Salzano G, De Riu G. Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis. Otolaryngol Head Neck Surg 2024; 170:1492-1503. [PMID: 37595113 DOI: 10.1002/ohn.489] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 06/16/2023] [Accepted: 07/14/2023] [Indexed: 08/20/2023]
Abstract
OBJECTIVE To investigate the accuracy of Chat-Based Generative Pre-trained Transformer (ChatGPT) in answering questions and solving clinical scenarios of head and neck surgery. STUDY DESIGN Observational and evaluative study. SETTING Eighteen surgeons from 14 Italian head and neck surgery units. METHODS A total of 144 clinical questions encompassing different subspecialities of head and neck surgery and 15 comprehensive clinical scenarios were developed. Questions and scenarios were inputted into ChatGPT4, and the resulting answers were evaluated by the researchers using accuracy (range 1-6), completeness (range 1-3), and references' quality Likert scales. RESULTS The overall median score of open-ended questions was 6 (interquartile range [IQR]: 5-6) for accuracy and 3 (IQR: 2-3) for completeness. Overall, the reviewers rated the answer as entirely or nearly entirely correct in 87.2% of cases and as comprehensive and covering all aspects of the question in 73% of cases. The artificial intelligence (AI) model achieved a correct response in 84.7% of the closed-ended questions (11 wrong answers). As for the clinical scenarios, ChatGPT provided a fully or nearly fully correct diagnosis in 81.7% of cases. The proposed diagnostic or therapeutic procedure was judged to be complete in 56.7% of cases. The overall quality of the bibliographic references was poor, and sources were nonexistent in 46.4% of the cases. CONCLUSION The results generally demonstrate a good level of accuracy in the AI's answers. The AI's ability to resolve complex clinical scenarios is promising, but it still falls short of being considered a reliable support for the decision-making process of specialists in head and neck surgery.
Collapse
Affiliation(s)
- Luigi Angelo Vaira
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
- Biomedical Sciences Department, PhD School of Biomedical Science, University of Sassari, Sassari, Italy
| | - Jerome R Lechien
- Department of Anatomy and Experimental Oncology, Mons School of Medicine, UMONS, Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Department of Otolaryngology-Head Neck Surgery, Elsan Polyclinic of Poitiers, Poitiers, France
| | - Vincenzo Abbate
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Fabiana Allevi
- Maxillofacial Surgery Department, ASSt Santi Paolo e Carlo, University of Milan, Milan, Italy
| | - Giovanni Audino
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Giada Anna Beltramini
- Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
- Maxillofacial and Dental Unit, Fondazione IRCCS Cà Granda Ospedale Maggiore Policlinico, Milan, Italy
| | - Michela Bergonzani
- Maxillo-Facial Surgery Division, Head and Neck Department, University Hospital of Parma, Parma, Italy
| | - Alessandro Bolzoni
- Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
| | - Umberto Committeri
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Salvatore Crimi
- Operative Unit of Maxillofacial Surgery, Policlinico San Marco, University of Catania, Catania, Italy
| | - Guido Gabriele
- Department of Maxillofacial Surgery, University of Siena, Siena, Italy
| | - Fabio Lonardi
- Department of Maxillofacial Surgery, University of Verona, Verona, Italy
| | - Fabio Maglitto
- Maxillo-Facial Surgery Unit, University of Bari "Aldo Moro", Bari, Italy
| | - Marzia Petrocelli
- Maxillofacial Surgery Operative Unit, Bellaria and Maggiore Hospital, Bologna, Italy
| | - Resi Pucci
- Maxillofacial Surgery Unit, San Camillo-Forlanini Hospital, Rome, Italy
| | - Gianmarco Saponaro
- Maxillo-Facial Surgery Unit, IRCSS "A. Gemelli" Foundation-Catholic, University of the Sacred Heart, Rome, Italy
| | - Alessandro Tel
- Department of Head and Neck Surgery and Neuroscience, Clinic of Maxillofacial Surgery, University Hospital of Udine, Udine, Italy
| | | | | | - Paolo Boscolo-Rizzo
- Department of Medical, Surgical and Health Sciences, Section of Otolaryngology, University of Trieste, Trieste, Italy
| | - Giovanni Salzano
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Giacomo De Riu
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
| |
Collapse
|
22
|
Khan I, Khare BK. Exploring the potential of machine learning in gynecological care: a review. Arch Gynecol Obstet 2024; 309:2347-2365. [PMID: 38625543 DOI: 10.1007/s00404-024-07479-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2024] [Accepted: 03/10/2024] [Indexed: 04/17/2024]
Abstract
Gynecological health remains a critical aspect of women's overall well-being, with profound implications for maternal and reproductive outcomes. This comprehensive review synthesizes the current state of knowledge on four pivotal aspects of gynecological health: preterm birth, breast cancer, cervical cancer, and infertility treatment. Machine learning (ML) has emerged as a transformative technology with the potential to revolutionize gynecology and women's healthcare. The subsets of AI, namely machine learning (ML) and deep learning (DL) methods, have aided in detecting complex patterns in huge datasets and in using such patterns to make predictions. This paper investigates how machine learning (ML) algorithms are employed in the field of gynecology to tackle crucial issues pertaining to women's health. This paper also investigates the integration of ultrasound technology with artificial intelligence (AI) during the initial, intermediate, and final stages of pregnancy. Additionally, it delves into the diverse applications of AI throughout each trimester. This review paper provides an overview of machine learning (ML) models, introduces natural language processing (NLP) concepts, including ChatGPT, and discusses the clinical applications of artificial intelligence (AI) in gynecology. Additionally, the paper outlines the challenges in utilizing machine learning within the field of gynecology.
Collapse
Affiliation(s)
- Imran Khan
- Harcourt Butler Technical University, Kanpur, India.
| | | |
Collapse
|
23
|
Naqvi WM, Shaikh SZ, Mishra GV. Large language models in physical therapy: time to adapt and adept. Front Public Health 2024; 12:1364660. [PMID: 38887241 PMCID: PMC11182445 DOI: 10.3389/fpubh.2024.1364660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2024] [Accepted: 05/10/2024] [Indexed: 06/20/2024] Open
Abstract
Healthcare is experiencing a transformative phase driven by artificial intelligence (AI) and machine learning (ML). Physical therapists (PTs) stand on the brink of a paradigm shift in education, practice, and research. Rather than a threat, AI presents an opportunity for revolution. This paper examines how large language models (LLMs), such as ChatGPT and BioMedLM, driven by deep ML, can offer human-like performance yet face accuracy challenges because of the vast data involved in PT and rehabilitation practice. PTs can benefit by developing and training LLMs tailored to streamlining administrative tasks, connecting globally, and customizing treatments. However, the human touch and creativity remain invaluable. This paper urges PTs to engage in learning about and shaping AI models, highlighting the need for ethical use and human supervision to address potential biases. Embracing AI as a contributor, not just a user, is crucial: integrating AI and fostering collaboration can yield a future in which AI enriches the PT field, provided data accuracy and the challenges of feeding the AI model are sensitively addressed.
Collapse
Affiliation(s)
- Waqar M. Naqvi
- Department of Interdisciplinary Sciences, Datta Meghe Institute of Higher Education and Research, Wardha, India
- Department of Physiotherapy, College of Health Sciences, Gulf Medical University, Ajman, United Arab Emirates
- NKP Salve Institute of Medical Sciences and Research Center, Nagpur, India
| | - Summaiya Zareen Shaikh
- Department of Neuro-Physiotherapy, The SIA College of Health Sciences, College of Physiotherapy, Thane, India
| | - Gaurav V. Mishra
- Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, India
| |
Collapse
|
24
|
Devranoglu B, Gurbuz T, Gokmen O. ChatGPT's Efficacy in Queries Regarding Polycystic Ovary Syndrome and Treatment Strategies for Women Experiencing Infertility. Diagnostics (Basel) 2024; 14:1082. [PMID: 38893609 PMCID: PMC11172366 DOI: 10.3390/diagnostics14111082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2024] [Revised: 05/14/2024] [Accepted: 05/21/2024] [Indexed: 06/21/2024] Open
Abstract
This study assesses the efficacy of ChatGPT-4, an advanced artificial intelligence (AI) language model, in delivering precise and comprehensive answers to inquiries regarding managing polycystic ovary syndrome (PCOS)-related infertility. The research team, comprising experienced gynecologists, formulated 460 structured queries encompassing a wide range of common and intricate PCOS scenarios. The queries were true/false (170), open-ended (165), and multiple-choice (125), and were further classified as 'easy', 'moderate', or 'hard'. For true/false questions, ChatGPT-4 achieved a flawless accuracy rate of 100% initially and upon reassessment after 30 days. In the open-ended category, there was a noteworthy enhancement in accuracy, with scores increasing from 5.53 ± 0.89 initially to 5.88 ± 0.43 at the 30-day mark (p < 0.001). Completeness scores for open-ended queries also experienced a significant improvement, rising from 2.35 ± 0.58 to 2.92 ± 0.29 (p < 0.001). In the multiple-choice category, the accuracy score exhibited a minor, nonsignificant decline from 5.96 ± 0.44 to 5.92 ± 0.63 after 30 days (p > 0.05). Completeness scores for multiple-choice questions remained consistent, with initial and 30-day means of 2.98 ± 0.18 and 2.97 ± 0.25, respectively (p > 0.05). ChatGPT-4 demonstrated exceptional performance in true/false queries and significantly improved handling of open-ended questions during the 30 days. These findings emphasize the potential of AI, particularly ChatGPT-4, in enhancing decision-making support for healthcare professionals managing PCOS-related infertility.
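The abstract above reports paired baseline versus 30-day comparisons with p-values but does not state which statistical test was used; as a hedged sketch only, the Python snippet below shows one plausible way to run such a paired comparison with a Wilcoxon signed-rank test on simulated per-question scores.

```python
# Illustrative sketch (not the authors' code): compare paired per-question
# scores at baseline and at 30 days with a Wilcoxon signed-rank test.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical accuracy scores (1-6) for 165 open-ended questions.
baseline = rng.integers(4, 7, size=165).astype(float)
day_30 = np.clip(baseline + rng.choice([0.0, 0.0, 1.0], size=165), 1, 6)

stat, p_value = wilcoxon(baseline, day_30)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4g}")
```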
Collapse
Affiliation(s)
- Belgin Devranoglu
- Department of Obstetrics and Gynecology, Zeynep Kamil Maternity/Children, Education and Training Hospital, Istanbul 34480, Turkey
| | - Tugba Gurbuz
- Department of Gynecology and Obstetrics Clinic, Medistate Hospital, Istanbul 34820, Turkey;
| | - Oya Gokmen
- Department of Gynecology, Obstetrics and In Vitro Fertilization Clinic, Medistate Hospital, Istanbul 34820, Turkey;
| |
Collapse
|
25
|
Cetera GE, Tozzi AE, Chiappa V, Castiglioni I, Merli CEM, Vercellini P. Artificial Intelligence in the Management of Women with Endometriosis and Adenomyosis: Can Machines Ever Be Worse Than Humans? J Clin Med 2024; 13:2950. [PMID: 38792490 PMCID: PMC11121846 DOI: 10.3390/jcm13102950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Revised: 04/08/2024] [Accepted: 05/06/2024] [Indexed: 05/26/2024] Open
Abstract
Artificial intelligence (AI) is experiencing advances and integration in all medical specializations, and this creates excitement but also concerns. This narrative review aims to critically assess the state of the art of AI in the field of endometriosis and adenomyosis. By enabling automation, AI may speed up some routine tasks, decreasing gynecologists' risk of burnout, as well as enabling them to spend more time interacting with their patients, increasing their efficiency and patients' perception of being taken care of. Surgery may also benefit from AI, especially through its integration with robotic surgery systems. This may improve the detection of anatomical structures and enhance surgical outcomes by combining intra-operative findings with pre-operative imaging. Not only that, but AI promises to improve the quality of care by facilitating clinical research. Through the introduction of decision-support tools, it can enhance diagnostic assessment; it can also predict treatment effectiveness and side effects, as well as reproductive prognosis and cancer risk. However, concerns exist: good-quality data for tool development and compliance with data-sharing guidelines are crucial. Also, professionals are worried that AI may render certain specialists obsolete. This said, AI is more likely to become a well-liked team member than a usurper.
Collapse
Affiliation(s)
- Giulia Emily Cetera
- Gynecology Unit, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy; (G.E.C.); (C.E.M.M.)
- Academic Center for Research on Adenomyosis and Endometriosis, Department of Clinical Sciences and Community Health, Università degli Studi di Milano, 20122 Milan, Italy
| | - Alberto Eugenio Tozzi
- Predictive and Preventive Medicine Research Unit, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy;
| | - Valentina Chiappa
- Gynaecologic Oncology, Fondazione IRCCS Istituto Nazionale dei Tumori, 20133 Milan, Italy;
| | | | - Camilla Erminia Maria Merli
- Gynecology Unit, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy; (G.E.C.); (C.E.M.M.)
| | - Paolo Vercellini
- Gynecology Unit, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy; (G.E.C.); (C.E.M.M.)
- Academic Center for Research on Adenomyosis and Endometriosis, Department of Clinical Sciences and Community Health, Università degli Studi di Milano, 20122 Milan, Italy
| |
Collapse
|
26
|
Zhu L, Mou W, Hong C, Yang T, Lai Y, Qi C, Lin A, Zhang J, Luo P. The Evaluation of Generative AI Should Include Repetition to Assess Stability. JMIR Mhealth Uhealth 2024; 12:e57978. [PMID: 38688841 PMCID: PMC11106698 DOI: 10.2196/57978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2024] [Accepted: 04/30/2024] [Indexed: 05/02/2024] Open
Abstract
The increasing interest in the potential applications of generative artificial intelligence (AI) models like ChatGPT in health care has prompted numerous studies to explore its performance in various medical contexts. However, evaluating ChatGPT poses unique challenges due to the inherent randomness in its responses. Unlike traditional AI models, ChatGPT generates different responses for the same input, making it imperative to assess its stability through repetition. This commentary highlights the importance of including repetition in the evaluation of ChatGPT to ensure the reliability of conclusions drawn from its performance. Just as biological experiments often require multiple repetitions for validity, assessing generative AI models like ChatGPT demands a comparable approach. Failure to acknowledge the impact of repetition can lead to biased conclusions and undermine the credibility of research findings. We urge researchers to incorporate appropriate repetition in their studies from the outset and transparently report their methods to enhance the robustness and reproducibility of findings in this rapidly evolving field.
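To make the repetition argument concrete, a minimal Python sketch follows showing how one might pose the same prompt to a generative model several times and summarize how consistent its answers are; ask_model is a hypothetical placeholder (here simulated with random choices), not a real API call.

```python
# Sketch of a repetition-based stability check. `ask_model` is a hypothetical
# stand-in for a real generative-AI call; here it simply simulates randomness.
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for a model query; returns a random answer."""
    return random.choice(["A", "A", "A", "B"])  # simulated non-determinism

def stability(prompt: str, n_repeats: int = 10):
    """Query the model n_repeats times and report how often each answer appears."""
    answers = [ask_model(prompt) for _ in range(n_repeats)]
    counts = Counter(answers)
    modal_answer, modal_count = counts.most_common(1)[0]
    return counts, modal_count / n_repeats

counts, agreement = stability("Is drug X first-line therapy for condition Y?")
print(counts, f"modal-answer agreement: {agreement:.0%}")
```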
Collapse
Affiliation(s)
- Lingxuan Zhu
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Weiming Mou
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Chenglin Hong
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Tao Yang
- Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yancheng Lai
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Chang Qi
- Institute of Logic and Computation, TU Wien, Austria
| | - Anqi Lin
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Jian Zhang
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Peng Luo
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| |
Collapse
|
27
|
Safrai M, Azaria A. Does small talk with a medical provider affect ChatGPT's medical counsel? Performance of ChatGPT on USMLE with and without distractions. PLoS One 2024; 19:e0302217. [PMID: 38687696 PMCID: PMC11060598 DOI: 10.1371/journal.pone.0302217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Accepted: 03/28/2024] [Indexed: 05/02/2024] Open
Abstract
Efforts are being made to improve the time effectiveness of healthcare providers. Artificial intelligence tools can help transcribe and summarize physician-patient encounters and produce medical notes and medical recommendations. However, in addition to medical information, discussion between healthcare providers and patients includes small talk and other information irrelevant to medical concerns. As Large Language Models (LLMs) are predictive models building their response based on the words in the prompts, there is a risk that small talk and irrelevant information may alter the response and the suggestion given. Therefore, this study aims to investigate the impact of medical data mixed with small talk on the accuracy of medical advice provided by ChatGPT. USMLE step 3 questions were used as a model for relevant medical data. We used both multiple-choice and open-ended questions. First, we gathered small talk sentences from human participants using the Mechanical Turk platform. Second, both sets of USMLE questions were arranged in a pattern where each sentence from the original questions was followed by a small talk sentence. ChatGPT 3.5 and 4 were asked to answer both sets of questions with and without the small talk sentences. Finally, a board-certified physician analyzed the answers by ChatGPT and compared them to the formal correct answer. The analysis demonstrates that the ability of ChatGPT-3.5 to answer correctly was impaired when small talk was added to medical data (66.8% vs. 56.6%; p = 0.025), specifically for multiple-choice questions (72.1% vs. 68.9%; p = 0.67) and for open questions (61.5% vs. 44.3%; p = 0.01). In contrast, small talk phrases did not impair ChatGPT-4's ability on either type of question (83.6% and 66.2%, respectively). According to these results, ChatGPT-4 seems more accurate than the earlier 3.5 version, and it appears that small talk does not impair its capability to provide medical recommendations. Our results are an important first step in understanding the potential and limitations of utilizing ChatGPT and other LLMs for physician-patient interactions, which include casual conversations.
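The abstract above compares proportions of correct answers with and without small talk; as an illustrative sketch (the authors' exact statistical procedure is not specified here), the snippet below tests such a difference in proportions with Fisher's exact test on hypothetical counts.

```python
# Illustrative sketch: test whether the proportion of correct answers differs
# with vs. without small talk, using Fisher's exact test on hypothetical counts.
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows = condition, columns = (correct, incorrect).
table = [[132, 66],   # questions presented as medical data only
         [112, 86]]   # questions interleaved with small-talk sentences

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```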
Collapse
Affiliation(s)
- Myriam Safrai
- Department of Obstetrics and Gynecology, Chaim Sheba Medical Center (Tel Hashomer), Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Department of Obstetrics, Gynecology and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
| | - Amos Azaria
- School of Computer Science, Ariel University, Ari’el, Israel
| |
Collapse
|
28
|
Graf EM, McKinney JA, Dye AB, Lin L, Sanchez-Ramos L. Exploring the Limits of Artificial Intelligence for Referencing Scientific Articles. Am J Perinatol 2024. [PMID: 38653452 DOI: 10.1055/s-0044-1786033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/25/2024]
Abstract
OBJECTIVE To evaluate the reliability of three artificial intelligence (AI) chatbots (ChatGPT, Google Bard, and Chatsonic) in generating accurate references from existing obstetric literature. STUDY DESIGN Between mid-March and late April 2023, ChatGPT, Google Bard, and Chatsonic were prompted to provide references for specific obstetrical randomized controlled trials (RCTs) published in 2020. RCTs were considered for inclusion if they were mentioned in a previous article that primarily evaluated RCTs published by the top medical and obstetrics and gynecology journals with the highest impact factors in 2020 as well as RCTs published in a new journal focused on publishing obstetric RCTs. The selection of the three AI models was based on their popularity, performance in natural language processing, and public availability. Data collection involved prompting the AI chatbots to provide references according to a standardized protocol. The primary evaluation metric was the accuracy of each AI model in correctly citing references, including authors, publication title, journal name, and digital object identifier (DOI). Statistical analysis was performed using a permutation test to compare the performance of the AI models. RESULTS Among the 44 RCTs analyzed, Google Bard demonstrated the highest accuracy, correctly citing 13.6% of the requested RCTs, whereas ChatGPT and Chatsonic exhibited lower accuracy rates of 2.4 and 0%, respectively. Google Bard often substantially outperformed Chatsonic and ChatGPT in correctly citing the studied reference components. The majority of references from all AI models studied were noted to provide DOIs for unrelated studies or DOIs that do not exist. CONCLUSION To ensure the reliability of scientific information being disseminated, authors must exercise caution when utilizing AI for scientific writing and literature search. However, despite their limitations, collaborative partnerships between AI systems and researchers have the potential to drive synergistic advancements, leading to improved patient care and outcomes. KEY POINTS · AI chatbots often cite scientific articles incorrectly. · AI chatbots can create false references. · Responsible AI use in research is vital.
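The study above reports using a permutation test to compare the models; the following generic Python sketch, with hypothetical 0/1 citation outcomes chosen only to roughly echo the reported percentages, shows how such a test can be run.

```python
# Generic permutation-test sketch (not the authors' code): compare the citation
# accuracy of two chatbots over 44 RCTs using hypothetical 0/1 outcomes.
import numpy as np

rng = np.random.default_rng(42)

bard_correct = np.array([1] * 6 + [0] * 38)      # ~13.6% correct (hypothetical)
chatgpt_correct = np.array([1] * 1 + [0] * 43)   # ~2.3% correct (hypothetical)

observed_diff = bard_correct.mean() - chatgpt_correct.mean()
pooled = np.concatenate([bard_correct, chatgpt_correct])

n_perm, count_extreme = 10_000, 0
for _ in range(n_perm):
    rng.shuffle(pooled)  # relabel outcomes at random under the null hypothesis
    diff = pooled[:44].mean() - pooled[44:].mean()
    if abs(diff) >= abs(observed_diff):
        count_extreme += 1

print(f"observed difference = {observed_diff:.3f}, "
      f"permutation p = {count_extreme / n_perm:.3f}")
```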
Collapse
Affiliation(s)
- Emily M Graf
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
| | - Jordan A McKinney
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
| | - Alexander B Dye
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
| | - Lifeng Lin
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona
| | - Luis Sanchez-Ramos
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
| |
Collapse
|
29
|
Braun EM, Juhasz-Böss I, Solomayer EF, Truhn D, Keller C, Heinrich V, Braun BJ. Will I soon be out of my job? Quality and guideline conformity of ChatGPT therapy suggestions to patient inquiries with gynecologic symptoms in a palliative setting. Arch Gynecol Obstet 2024; 309:1543-1549. [PMID: 37975899 DOI: 10.1007/s00404-023-07272-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2023] [Accepted: 10/15/2023] [Indexed: 11/19/2023]
Abstract
PURPOSE The market and application possibilities for artificial intelligence are growing rapidly and are increasingly finding their way into gynecology. While the medical side is well represented in the current literature, the patient's perspective still lags behind. Therefore, the aim of this study was to have experts evaluate ChatGPT's recommendations in response to patient inquiries about possible therapy for leading gynecologic symptoms in a palliative situation. METHODS Case vignettes were constructed for 10 common concomitant symptoms of gynecologic oncology tumors in a palliative setting, and patient queries regarding therapy for these symptoms were generated as prompts for ChatGPT. Five experts in palliative care and gynecologic oncology evaluated the responses with respect to guideline adherence and applicability and identified advantages and disadvantages. RESULTS The overall rating of ChatGPT responses averaged 4.1 (5 = strongly agree; 1 = strongly disagree). The experts rated the average guideline conformity of the therapy recommendations at 4.0. ChatGPT sometimes omits relevant therapies and does not provide an individual assessment of the suggested therapies, but it does indicate that a physician consultation is additionally necessary. CONCLUSIONS Language models such as ChatGPT, in their freely available and thus broadly accessible version, can provide our patients with valid and largely guideline-compliant therapy recommendations. For a complete therapy recommendation, however, a medical expert's opinion remains indispensable for evaluating the therapies, adjusting them to the individual, and filtering out possible wrong recommendations.
Collapse
Affiliation(s)
- Eva-Marie Braun
- Center for Integrative Oncology, Die Filderklinik, Im Haberschlai 7, 70794, Filderstadt-Bonlanden, Germany.
| | - Ingolf Juhasz-Böss
- Department of Gynecology, University Medical Center Freiburg, Hugstetter Straße 55, 79106, Freiburg, Germany
| | - Erich-Franz Solomayer
- Department of Gynecology, Obstetrics and Reproductive Medicine, Saarland University Hospital, Kirrberger Straße, Building 9, 66421, Homburg, Germany
| | - Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Pauwelsstraße 30, 52074, Aachen, Germany
| | - Christiane Keller
- Center for Palliative Medicine and Pediatric Pain Therapy, Saarland University Hospital, Kirrberger Straße, Building 69, 66421, Homburg, Germany
| | - Vanessa Heinrich
- Department of Radiation Oncology, University Hospital Tübingen, Crona Kliniken, Hoppe-Seyler-Str. 3, 72076, Tübingen, Germany
| | - Benedikt Johannes Braun
- Department of Trauma and Reconstructive Surgery at the Eberhard Karls University Tübingen, BG Unfallklinik Tübingen, Schnarrenbergstrasse 95, 72076, Tübingen, Germany
| |
Collapse
|
30
|
Deng L, Wang T, Yangzhang, Zhai Z, Tao W, Li J, Zhao Y, Luo S, Xu J. Evaluation of large language models in breast cancer clinical scenarios: a comparative analysis based on ChatGPT-3.5, ChatGPT-4.0, and Claude2. Int J Surg 2024; 110:1941-1950. [PMID: 38668655 PMCID: PMC11019981 DOI: 10.1097/js9.0000000000001066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Accepted: 12/23/2023] [Indexed: 04/29/2024]
Abstract
BACKGROUND Large language models (LLMs) have garnered significant attention in the AI domain owing to their exemplary context recognition and response capabilities. However, the potential of LLMs in specific clinical scenarios, particularly in breast cancer diagnosis, treatment, and care, has not been fully explored. This study aimed to compare the performances of three major LLMs in the clinical context of breast cancer. METHODS In this study, clinical scenarios designed specifically for breast cancer were segmented into five pivotal domains (nine cases): assessment and diagnosis, treatment decision-making, postoperative care, psychosocial support, and prognosis and rehabilitation. The LLMs were used to generate feedback for various queries related to these domains. For each scenario, a panel of five breast cancer specialists, each with over a decade of experience, evaluated the feedback from the LLMs, assessing it in terms of quality, relevance, and applicability. RESULTS There was a moderate level of agreement among the raters (Fleiss' kappa=0.345, P<0.05). Comparing the performance of different models regarding response length, GPT-4.0 and GPT-3.5 provided relatively longer feedback than Claude2. Furthermore, across the nine case analyses, GPT-4.0 significantly outperformed the other two models in average quality, relevance, and applicability. Within the five clinical areas, GPT-4.0 markedly surpassed GPT-3.5 in the quality of the other four areas and scored higher than Claude2 in tasks related to psychosocial support and treatment decision-making. CONCLUSION This study revealed that in the realm of clinical applications for breast cancer, GPT-4.0 showcases superiority not only in quality and relevance but also in applicability, especially when compared to GPT-3.5. Relative to Claude2, GPT-4.0 holds advantages in specific domains. With the expanding use of LLMs in the clinical field, ongoing optimization and rigorous accuracy assessments are paramount.
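Inter-rater agreement in the study above is summarized with Fleiss' kappa; as a minimal sketch, assuming statsmodels as the tooling (the authors' software is not stated), the snippet below computes Fleiss' kappa for hypothetical ratings from five raters.

```python
# Sketch of computing Fleiss' kappa for agreement among 5 raters, using
# statsmodels (an assumed library choice, not necessarily the authors' tool).
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(1)

# Hypothetical ratings: 9 cases x 5 raters, each rating on a 3-5 quality range.
ratings = rng.integers(3, 6, size=(9, 5))

table, _ = aggregate_raters(ratings)          # cases x categories count matrix
kappa = fleiss_kappa(table, method='fleiss')
print(f"Fleiss' kappa = {kappa:.3f}")
```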
Collapse
Affiliation(s)
- Linfang Deng
- Department of Nursing, Jinzhou Medical University, Jinzhou
| | | | - Yangzhang
- Department of Breast Surgery, Xingtai People’s Hospital of Hebei Medical University, Xingtai, Hebei, People’s Republic of China
| | - Zhenhua Zhai
- Department of General Surgery, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou
| | - Wei Tao
- Department of Breast Surgery
| | | | - Yi Zhao
- Department of Breast Surgery
| | - Shaoting Luo
- Department of Pediatric Orthopedics, Shengjing Hospital of China Medical University, Shenyang
| | - Jinjiang Xu
- Department of Health Management Center, The First Hospital of Jinzhou Medical University, Jinzhou, Liaoning
| |
Collapse
|
31
|
Patel JM, Hermann CE, Growdon WB, Aviki E, Stasenko M. ChatGPT accurately performs genetic counseling for gynecologic cancers. Gynecol Oncol 2024; 183:115-119. [PMID: 38676973 DOI: 10.1016/j.ygyno.2024.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Revised: 04/03/2024] [Accepted: 04/08/2024] [Indexed: 04/29/2024]
Abstract
OBJECTIVE Artificial Intelligence (AI) systems such as ChatGPT can take medical examinations and counsel patients regarding medical diagnosis. We aim to quantify the accuracy of ChatGPT in answering commonly asked questions pertaining to genetic testing and counseling for gynecologic cancers. METHODS Forty questions were formulated in conjunction with gynecologic oncologists and adapted from professional society guidelines; ChatGPT version 3.5, the version readily available to the public, was queried. The two categories of questions were genetic counseling guidelines and questions pertaining to specific genetic disorders. The answers were scored by two attending Gynecologic Oncologists according to the following scale: 1) correct and comprehensive, 2) correct but not comprehensive, 3) some correct, some incorrect, and 4) completely incorrect. Scoring discrepancies were resolved by an additional third reviewer. The proportion of responses earning each score was calculated overall and within each question category. RESULTS ChatGPT provided correct and comprehensive answers to 33/40 (82.5%) questions, correct but not comprehensive answers to 6/40 (15%) questions, partially incorrect answers to 1/40 (2.5%) questions, and completely incorrect answers to 0/40 (0%) questions. The genetic counseling category had the highest proportion of answers that were both correct and comprehensive, with ChatGPT answering all 20/20 questions accurately and comprehensively. ChatGPT performed similarly in the specific genetic disorders category, with 88.2% (15/17) and 66.6% (2/3) correct and comprehensive answers to questions on hereditary breast and ovarian cancer and on Lynch syndrome, respectively. CONCLUSION ChatGPT accurately answers questions about genetic syndromes, genetic testing, and counseling in the majority of the studied questions. These data suggest this powerful tool can be utilized as a patient resource for genetic counseling questions, though more data input from gynecologic oncologists would be needed to educate patients on genetic syndromes.
Collapse
Affiliation(s)
- Jharna M Patel
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America.
| | - Catherine E Hermann
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| | - Whitfield B Growdon
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| | - Emeline Aviki
- New York University Langone Health, Long Island, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Mineola, NY, United States of America
| | - Marina Stasenko
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| |
Collapse
|
32
|
Teixeira-Marques F, Medeiros N, Nazaré F, Alves S, Lima N, Ribeiro L, Gama R, Oliveira P. Exploring the role of ChatGPT in clinical decision-making in otorhinolaryngology: a ChatGPT designed study. Eur Arch Otorhinolaryngol 2024; 281:2023-2030. [PMID: 38345613 DOI: 10.1007/s00405-024-08498-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Accepted: 01/23/2024] [Indexed: 03/16/2024]
Abstract
PURPOSE Since the beginning of 2023, ChatGPT has emerged as a hot topic in healthcare research. The potential to be a valuable tool in clinical practice is compelling, particularly in improving clinical decision support by helping physicians to make clinical decisions based on the best medical knowledge available. We aim to investigate ChatGPT's ability to identify, diagnose and manage patients with otorhinolaryngology-related symptoms. METHODS A prospective, cross-sectional study was designed based on an idea suggested by ChatGPT to assess the level of agreement between ChatGPT and five otorhinolaryngologists (ENTs) in 20 reality-inspired clinical cases. The clinical cases were presented to the chatbot on two different occasions (ChatGPT-1 and ChatGPT-2) to assess its temporal stability. RESULTS The mean score of ChatGPT-1 was 4.4 (SD 1.2; min 1, max 5) and of ChatGPT-2 was 4.15 (SD 1.3; min 1, max 5), while the ENTs' mean score was 4.91 (SD 0.3; min 3, max 5). The Mann-Whitney U test revealed a statistically significant difference (p < 0.001) between each of the ChatGPT scores and the ENTs' score. ChatGPT-1 and ChatGPT-2 gave different answers on five occasions. CONCLUSIONS Artificial intelligence will be an important instrument in clinical decision-making in the near future and ChatGPT is the most promising chatbot so far. Although it needs further development to be used safely, there is room for improvement and potential to aid otorhinolaryngology residents and specialists in making the most correct decision for the patient.
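The comparison above relies on a Mann-Whitney U test; the short sketch below, using hypothetical per-case scores rather than the study's data, shows how such a test is typically computed with SciPy.

```python
# Illustrative sketch: compare ChatGPT's per-case ratings with the ENTs' ratings
# using a Mann-Whitney U test; the scores below are hypothetical examples.
from scipy.stats import mannwhitneyu

chatgpt_scores = [5, 5, 4, 5, 5, 3, 5, 5, 4, 5, 5, 2, 5, 5, 4, 5, 5, 1, 5, 5]
ent_scores     = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 3]

stat, p_value = mannwhitneyu(chatgpt_scores, ent_scores, alternative='two-sided')
print(f"U = {stat:.1f}, p = {p_value:.4f}")
```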
Collapse
Affiliation(s)
- Francisco Teixeira-Marques
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal.
| | - Nuno Medeiros
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Francisco Nazaré
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Sandra Alves
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Nuno Lima
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Leandro Ribeiro
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Rita Gama
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Pedro Oliveira
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| |
Collapse
|
33
|
Wiwanitmkit S, Wiwanitkit V. Potential for ChatGPT in obstetrics and gynecology: a comment. Am J Obstet Gynecol 2024; 230:e51. [PMID: 38006981 DOI: 10.1016/j.ajog.2023.11.1238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Revised: 11/20/2023] [Accepted: 11/20/2023] [Indexed: 11/27/2023]
Affiliation(s)
| | - Viroj Wiwanitkit
- Saveetha Medical College, Chennai, India; Chandigarh University, 140413 Punjab, India.
| |
Collapse
|
34
|
Grünebaum A, Pollet S, Chervenak F. Potential for ChatGPT in obstetrics and gynecology: a response. Am J Obstet Gynecol 2024; 230:e52. [PMID: 38008150 DOI: 10.1016/j.ajog.2023.11.1239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Accepted: 11/03/2023] [Indexed: 11/28/2023]
Affiliation(s)
- Amos Grünebaum
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, NY.
| | - Susan Pollet
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, NY
| | - Frank Chervenak
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, NY
| |
Collapse
|
35
|
Ocakoglu SR, Coskun B. The Emerging Role of AI in Patient Education: A Comparative Analysis of LLM Accuracy for Pelvic Organ Prolapse. Med Princ Pract 2024; 33:000538538. [PMID: 38527444 PMCID: PMC11324208 DOI: 10.1159/000538538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Accepted: 03/21/2024] [Indexed: 03/27/2024] Open
Abstract
OBJECTIVE This study aimed to evaluate the accuracy, completeness, precision, and readability of outputs generated by three Large Language Models (LLMs): GPT by OpenAI, BARD by Google, and Bing by Microsoft, in comparison to patient education material on Pelvic Organ Prolapse (POP) provided by the Royal College of Obstetricians and Gynecologists (RCOG). METHODS A total of 15 questions were retrieved from the RCOG website and input into the three LLMs. Two independent reviewers evaluated the outputs for accuracy, completeness, and precision. Readability was assessed using the Simplified Measure of Gobbledygook (SMOG) score and the Flesch-Kincaid Grade Level (FKGL) score. RESULTS Significant differences were observed in completeness and precision metrics. ChatGPT ranked highest in completeness (66.7%), while Bing led in precision (100%). No significant differences were observed in accuracy across all models. In terms of readability, ChatGPT exhibited higher difficulty than BARD, Bing, and the original RCOG answers. CONCLUSION While all models displayed a variable degree of correctness in answering the RCOG questions on patient information for POP, ChatGPT excelled in completeness, significantly surpassing BARD and Bing, although its answers were the hardest to read. Bing led in precision, providing the most relevant and concise answers. The findings highlight the potential of LLMs in health information dissemination and the need for careful interpretation of their outputs.
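Readability in the study above is quantified with the SMOG and Flesch-Kincaid Grade Level metrics; as a sketch, assuming the textstat package as one common implementation (the authors' tooling is not specified), the snippet below scores a sample answer. SMOG is formally defined for texts of 30 or more sentences, so values on short passages are only indicative.

```python
# Sketch of scoring readability with FKGL and SMOG via the textstat package
# (an assumed library choice; the sample text is invented for illustration).
import textstat

answer = (
    "Pelvic organ prolapse happens when the muscles and tissues supporting the "
    "pelvic organs become weak, allowing organs such as the bladder or uterus "
    "to drop from their normal position and press against the vaginal walls."
)

print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(answer))
print("SMOG index:", textstat.smog_index(answer))
```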
Collapse
Affiliation(s)
| | - Burhan Coskun
- Department of Urology, Bursa Uludag University, Bursa, Turkey
| |
Collapse
|
36
|
Horgan R, Martins JG, Saade G, Abuhamad A, Kawakita T. ChatGPT in maternal-fetal medicine practice: a primer for clinicians. Am J Obstet Gynecol MFM 2024; 6:101302. [PMID: 38281582 DOI: 10.1016/j.ajogmf.2024.101302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 01/02/2024] [Accepted: 01/21/2024] [Indexed: 01/30/2024]
Abstract
ChatGPT (Generative Pre-trained Transformer), a language model that was developed by OpenAI and launched in November 2022, generates human-like responses to prompts using deep-learning technology. The integration of large language processing models into healthcare has the potential to improve the accessibility of medical information for both patients and health professionals alike. In this commentary, we demonstrated the ability of ChatGPT to produce patient information sheets. Four board-certified, maternal-fetal medicine attending physicians rated the accuracy and humanness of the information according to 2 predefined scales of accuracy and completeness. The median score for accuracy of information was rated 4.8 on a 6-point scale and the median score for completeness of information was 2.2 on a 3-point scale for the 5 patient information leaflets generated by ChatGPT. Concerns raised included the omission of clinically important information for patient counseling in some patient information leaflets and the inability to verify the source of information because ChatGPT does not provide references. ChatGPT is a powerful tool that has the potential to enhance patient care, but such a tool requires extensive validation and is perhaps best considered as an adjunct to clinical practice rather than as a tool to be used freely by the public for healthcare information.
Collapse
Affiliation(s)
- Rebecca Horgan
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA..
| | - Juliana G Martins
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
| | - George Saade
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
| | - Alfred Abuhamad
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
| | - Tetsuya Kawakita
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
| |
Collapse
|
37
|
Tailor PD, Xu TT, Fortes BH, Iezzi R, Olsen TW, Starr MR, Bakri SJ, Scruggs BA, Barkmeier AJ, Patel SV, Baratz KH, Bernhisel AA, Wagner LH, Tooley AA, Roddy GW, Sit AJ, Wu KY, Bothun ED, Mansukhani SA, Mohney BG, Chen JJ, Brodsky MC, Tajfirouz DA, Chodnicki KD, Smith WM, Dalvin LA. Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model. MAYO CLINIC PROCEEDINGS. DIGITAL HEALTH 2024; 2:119-128. [PMID: 38577703 PMCID: PMC10994056 DOI: 10.1016/j.mcpdig.2024.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 04/06/2024]
Abstract
Objective To determine the appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model to ophthalmology questions. Patients and Methods Cross-sectional qualitative study from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. The responses were graded by appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. The first grading context was if the information was presented on a patient information site. The second was an LLM-generated draft response to patient queries sent by the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. The main outcome measure was the percentage of appropriate responses per subspecialty. Results For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Variable rates of average appropriateness were observed across ophthalmic subspecialties for patient information site information ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via EMR, the LLM provided an overall average of 74% appropriate responses and varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but nonsignificant variations, with disease and condition often rated highest (72% and 69%) for appropriateness and surgery-related (55% and 51%) lowest, in both contexts. Conclusion This LLM provided mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR-related responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.
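As a small illustration of tabulating appropriateness by subspecialty and grading context, the pandas sketch below pivots hypothetical proportions into a summary table; the numbers and column names are made up for demonstration and are not the study's data.

```python
# Sketch: summarize appropriateness proportions per subspecialty and grading
# context with a pandas pivot table (hypothetical values for illustration only).
import pandas as pd

grades = pd.DataFrame({
    "subspecialty": ["glaucoma", "glaucoma", "cornea", "cornea", "retina", "retina"],
    "context": ["patient_site", "emr_draft"] * 3,
    "appropriate": [0.72, 0.77, 0.56, 0.54, 0.86, 0.88],  # hypothetical proportions
})

summary = grades.pivot_table(index="subspecialty", columns="context",
                             values="appropriate")
print(summary)
```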
Collapse
Affiliation(s)
- Prashant D Tailor
- Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.)
| | - Timothy T Xu
- Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.)
| | - Blake H Fortes
- Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.)
| | - Raymond Iezzi
- Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.)
| | - Timothy W Olsen
| | - Matthew R Starr
| | - Sophie J Bakri
| | - Brittni A Scruggs
| | - Andrew J Barkmeier
| | - Sanjay V Patel
| | - Keith H Baratz
| | - Ashlie A Bernhisel
| | - Lilly H Wagner
| | - Andrea A Tooley
| | - Gavin W Roddy
| | - Arthur J Sit
| | - Kristi Y Wu
| | - Erick D Bothun
| | - Sasha A Mansukhani
| | - Brian G Mohney
| | - John J Chen
| | - Michael C Brodsky
| | - Deena A Tajfirouz
| | - Kevin D Chodnicki
| | - Wendy M Smith
| | - Lauren A Dalvin
| |
Collapse
|
38
|
Ray PP. Bridging the gap: integrating ChatGPT into obstetrics and gynecology research-a call to action. Arch Gynecol Obstet 2024; 309:1111-1113. [PMID: 37466691 DOI: 10.1007/s00404-023-07129-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2023] [Accepted: 06/27/2023] [Indexed: 07/20/2023]
|
39
|
Lee Y, Kim SY. Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article. Obstet Gynecol Sci 2024; 67:153-159. [PMID: 38247132 PMCID: PMC10948210 DOI: 10.5468/ogs.23231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Revised: 11/08/2023] [Accepted: 11/29/2023] [Indexed: 01/23/2024] Open
Abstract
The use of chatbot technology, particularly chat generative pre-trained transformer (ChatGPT) with an impressive 175 billion parameters, has garnered significant attention across various domains, including Obstetrics and Gynecology (OBGYN). This comprehensive review delves into the transformative potential of chatbots, with a special focus on ChatGPT as a leading artificial intelligence (AI) technology. ChatGPT harnesses the power of deep learning algorithms to generate responses that closely mimic human language, opening up myriad applications in medicine, research, and education. In the field of medicine, ChatGPT plays a pivotal role in diagnosis, treatment, and personalized patient education. Notably, the technology has demonstrated remarkable capabilities, surpassing human performance in OBGYN examinations and delivering highly accurate diagnoses. However, challenges remain, including the need to verify the accuracy of the information and to address ethical considerations and limitations. In the wider scope of chatbot technology, AI systems play a vital role in healthcare processes, including documentation, diagnosis, research, and education. Although these systems are promising, their limitations and occasional inaccuracies require validation by healthcare professionals. This review also examined global chatbot adoption in healthcare, emphasizing the need for user awareness to ensure patient safety. Chatbot technology holds great promise in OBGYN and medicine, offering innovative solutions while necessitating responsible integration to ensure patient care and safety.
Collapse
Affiliation(s)
- YooKyung Lee
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
| | - So Yun Kim
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
| |
Collapse
|
40
|
Peng S, Wang D, Liang Y, Xiao W, Zhang Y, Liu L. AI-ChatGPT/GPT-4: An Booster for the Development of Physical Medicine and Rehabilitation in the New Era! Ann Biomed Eng 2024; 52:462-466. [PMID: 37500980 PMCID: PMC10859338 DOI: 10.1007/s10439-023-03314-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Accepted: 07/06/2023] [Indexed: 07/29/2023]
Abstract
Artificial intelligence (AI) has been driving the continuous development of the Physical Medicine and Rehabilitation (PM&R) field. The latest release of ChatGPT/GPT-4 has shown us that AI can potentially transform the healthcare industry. In this study, we propose various ways in which ChatGPT/GPT-4 can display its talents in the field of PM&R in the future. ChatGPT/GPT-4 is an essential tool for Physiatrists in the new era.
Collapse
Affiliation(s)
- Shengxin Peng
- School of Rehabilitation Medicine of Binzhou Medical University, Yantai, China
| | - Deqiang Wang
- School of Rehabilitation Medicine of Binzhou Medical University, Yantai, China
| | | | | | | | - Lei Liu
- Department of Painology, The First Affiliated Hospital of Shandong First Medical University (Shandong Provincial Qianfoshan Hospital), Jinan, 250014, China.
| |
Collapse
|
41
|
Bukar UA, Sayeed MS, Razak SFA, Yogarayan S, Amodu OA. An integrative decision-making framework to guide policies on regulating ChatGPT usage. PeerJ Comput Sci 2024; 10:e1845. [PMID: 38440047 PMCID: PMC10911759 DOI: 10.7717/peerj-cs.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Accepted: 01/09/2024] [Indexed: 03/06/2024]
Abstract
Generative artificial intelligence has created a moment in history where human beings have begun to interact closely with artificial intelligence (AI) tools, putting policymakers in a position to restrict or legislate such tools. One particular example is ChatGPT, the first and the world's most popular multipurpose generative AI tool. This study aims to put forward a policy-making framework for generative artificial intelligence based on the risk, reward, and resilience framework. A systematic search was conducted using carefully chosen keywords, excluding non-English content, conference articles, book chapters, and editorials. Published research was filtered based on its relevance to ChatGPT ethics, yielding a total of 41 articles. Key elements surrounding ChatGPT concerns and motivations were systematically deduced and classified under the risk, reward, and resilience categories to serve as ingredients for the proposed decision-making framework. The decision-making process and rules were developed as a primer to help policymakers navigate decision-making conundrums. The framework was then practically tailored to some of the concerns surrounding ChatGPT in the context of higher education. Regarding the interconnection between risk and reward, the findings show that providing students with access to ChatGPT presents an opportunity for increased efficiency in tasks such as text summarization and workload reduction; however, this exposes them to risks such as plagiarism and cheating. Similarly, pursuing certain opportunities, such as accessing vast amounts of information, can lead to rewards, but it also introduces risks like misinformation and copyright issues. Likewise, focusing on specific capabilities of ChatGPT, such as developing tools to detect plagiarism and misinformation, may enhance resilience in some areas (e.g., academic integrity) but may also create vulnerabilities in other domains, such as the digital divide, educational equity, and job losses. Furthermore, the findings indicate second-order effects of legislation regarding ChatGPT, with both positive and negative implications; one potential effect is a decrease in rewards due to the limitations imposed by legislation, which may hinder individuals from fully capitalizing on the opportunities ChatGPT provides. Hence, the risk, reward, and resilience framework offers a comprehensive and flexible decision-making model that allows policymakers and, in this use case, higher education institutions to navigate the complexities and trade-offs associated with ChatGPT, with theoretical and practical implications for the future.
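The framework above is qualitative, but the core move it describes, weighing each policy option across risk, reward, and resilience before deciding, can be illustrated with a small scoring sketch. Everything in the snippet (the option names, the 0-10 scale, the weights, and the decision rule) is a hypothetical illustration, not the authors' instrument.

```python
from dataclasses import dataclass

@dataclass
class PolicyOption:
    """A candidate ChatGPT policy, scored 0-10 on each dimension (hypothetical scale)."""
    name: str
    reward: float      # expected benefit (e.g., efficiency gains for students)
    risk: float        # expected harm (e.g., plagiarism, misinformation)
    resilience: float  # capacity to absorb or mitigate the harm (e.g., detection tools)

def net_score(option: PolicyOption, w_reward=1.0, w_risk=1.0, w_resilience=0.5) -> float:
    """Toy decision rule: rewards count for, risks against, resilience partly offsets risk."""
    return w_reward * option.reward - w_risk * option.risk + w_resilience * option.resilience

options = [
    PolicyOption("Unrestricted student access", reward=8, risk=7, resilience=4),
    PolicyOption("Access with plagiarism/misinformation safeguards", reward=7, risk=4, resilience=7),
    PolicyOption("Outright ban", reward=2, risk=1, resilience=5),
]

# Rank options by net score, highest first.
for opt in sorted(options, key=net_score, reverse=True):
    print(f"{opt.name}: net score {net_score(opt):.1f}")
```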
Collapse
Affiliation(s)
- Umar Ali Bukar
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Md Shohel Sayeed
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Siti Fatimah Abdul Razak
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Sumendra Yogarayan
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Oluwatosin Ahmed Amodu
- Information and Communication Engineering Department, Elizade University, Ilara-Mokin, Ondo State, Nigeria
| |
Collapse
|
42
|
Raman R, Kumar Nair V, Nedungadi P, Kumar Sahu A, Kowalski R, Ramanathan S, Achuthan K. Fake news research trends, linkages to generative artificial intelligence and sustainable development goals. Heliyon 2024; 10:e24727. [PMID: 38322879 PMCID: PMC10844021 DOI: 10.1016/j.heliyon.2024.e24727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Revised: 12/14/2023] [Accepted: 01/12/2024] [Indexed: 02/08/2024] Open
Abstract
In the digital age, where information is a cornerstone for decision-making, social media's loosely regulated environment has intensified the prevalence of fake news, with significant implications for both individuals and societies. This study employs a bibliometric analysis of a large corpus of 9678 publications spanning 2013-2022 to scrutinize the evolution of fake news research, identifying leading authors, institutions, and nations. Three thematic clusters emerge: disinformation in social media, COVID-19-induced infodemics, and techno-scientific advancements in auto-detection. This work introduces three novel contributions: 1) a pioneering mapping of fake news research to Sustainable Development Goals (SDGs), indicating its influence on areas like health (SDG 3), peace (SDG 16), and industry (SDG 9); 2) the utilization of prominence percentile metrics to discern critical and economically prioritized research areas, such as misinformation and object detection in deep learning; and 3) an evaluation of generative AI's role in the propagation and realism of fake news, raising pressing ethical concerns. These contributions collectively provide a comprehensive overview of the current state and future trajectories of fake news research, offering valuable insights for academia, policymakers, and industry.
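The paper's SDG mapping rests on bibliometric tooling not reproduced here, but the basic step of tagging publication records with SDGs through keyword matching, before counting trends, can be sketched as below. The keyword lists and records are invented for illustration and are far simpler than real SDG query sets.

```python
from collections import Counter

# Hypothetical keyword-to-SDG map; published SDG search queries are far richer.
SDG_KEYWORDS = {
    "SDG 3 (Health)": ["health", "covid", "infodemic", "vaccine"],
    "SDG 9 (Industry/Innovation)": ["deep learning", "detection", "algorithm"],
    "SDG 16 (Peace/Institutions)": ["election", "disinformation", "democracy"],
}

# Invented publication records standing in for the bibliometric corpus.
records = [
    {"title": "Deep learning detection of fake news during COVID-19"},
    {"title": "Disinformation campaigns and election integrity"},
]

counts = Counter()
for rec in records:
    text = rec["title"].lower()
    for sdg, keywords in SDG_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            counts[sdg] += 1  # one tag per SDG per record

print(counts.most_common())
```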
Collapse
Affiliation(s)
- Raghu Raman
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Vinith Kumar Nair
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Prema Nedungadi
- Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Aditya Kumar Sahu
- Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amaravati, Andhra Pradesh, 522503, India
| | - Robin Kowalski
- College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, 29634, USA
| | - Sasangan Ramanathan
- Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamilnadu, 641112, India
| | - Krishnashree Achuthan
- Center for Cybersecurity Systems and Networks, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| |
Collapse
|
43
|
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
Collapse
|
44
|
Sarma G, Kashyap H, Medhi PP. ChatGPT in Head and Neck Oncology-Opportunities and Challenges. Indian J Otolaryngol Head Neck Surg 2024; 76:1425-1429. [PMID: 38440617 PMCID: PMC10908741 DOI: 10.1007/s12070-023-04201-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Accepted: 08/28/2023] [Indexed: 03/06/2024] Open
Abstract
Head and neck oncology represents a complex and challenging field, encompassing the diagnosis, treatment, and management of various malignancies affecting the intricate anatomical structures of the head and neck region. With advancements in artificial intelligence (AI), chatbot applications have emerged as a promising tool to revolutionize the field of head and neck oncology. ChatGPT is a cutting-edge language model developed by OpenAI that can help the oncologist in the clinic with scheduling appointments, establishing a clinical diagnosis, planning treatment, and arranging follow-up. ChatGPT also plays an essential role in telemedicine consultations, medical documentation, scientific writing, and research. However, ChatGPT carries inherent drawbacks. It raises significant ethical concerns related to authorship, accountability, transparency, bias, and the potential for misinformation. ChatGPT's training data is limited to September 2021; thus, regular updates are required to keep pace with rapidly evolving medical research and advancements. Therefore, a judicious approach to using ChatGPT is of utmost importance. Head and neck oncologists can reap the maximum benefit of this technology in terms of patient care, education, and research to improve clinical outcomes.
Collapse
Affiliation(s)
- Gautam Sarma
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
| | - Hrishikesh Kashyap
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
| | - Partha Pratim Medhi
- Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101 India
| |
Collapse
|
45
|
Wang WH, Wang SY, Huang JY, Liu XD, Yang J, Liao M, Lu Q, Wu Z. An investigation study on the interpretation of ultrasonic medical reports using OpenAI's GPT-3.5-turbo model. JOURNAL OF CLINICAL ULTRASOUND : JCU 2024; 52:105-111. [PMID: 37930057 DOI: 10.1002/jcu.23590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 09/23/2023] [Accepted: 10/04/2023] [Indexed: 11/07/2023]
Abstract
OBJECTIVES Ultrasound medical reports are an important means of diagnosing diseases and assessing treatment effectiveness. However, their professional terminology and complex sentences often make them difficult for ordinary people to understand. Therefore, this study explores the clinical value of using artificial intelligence systems based on ChatGPT to interpret ultrasound medical reports. METHODS In this study, a combination of online and offline questionnaires was used to survey both physicians and non-medical individuals. The questionnaires evaluated ChatGPT's interpretation of ultrasound reports from both professional and comprehensibility perspectives, and the results were analyzed using Excel spreadsheets. Additionally, a portion of the research content was evaluated using a 5-point Likert scale in the questionnaire. RESULTS According to the survey results, 67.4% of surveyed doctors believe that using ChatGPT to interpret ultrasound medical reports can help improve work efficiency. At the same time, 69.72% of non-medical respondents believe it is necessary to enhance their understanding of medical ultrasound reports through ChatGPT interpretation, and 62.58% support the application of ChatGPT to ultrasound medical reports. The non-medical group's understanding of ultrasound medical reports significantly improved (p < 0.01) after implementing ChatGPT. However, 67.49% of the general public are concerned about ChatGPT's imperfect functionality, which may produce misleading information. This reflects limited public trust in the new technology, as well as concern about possible privacy leaks and security issues with ChatGPT. CONCLUSIONS The higher acceptance and support among non-medical individuals for ChatGPT's interpretation of medical reports might be due to the system's natural language processing abilities, which allow them to better understand and evaluate report contents. However, the expertise and experience of physicians remain irreplaceable. This suggests that a ChatGPT-based ultrasound report interpretation system has clinical value and application prospects, but further optimization is necessary to address its shortcomings in data quality and professionalism. This study provides a reference and inspiration for promoting the application and development of ultrasound technology and artificial intelligence systems in the medical field.
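The abstract does not include the authors' prompting pipeline; as a rough sketch, one way to ask OpenAI's gpt-3.5-turbo for a lay-language interpretation of an ultrasound report with the current Python SDK is shown below. The prompt wording and the sample report are assumptions, not the study's protocol.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical report text, not taken from the study.
report = (
    "Liver: mildly increased echogenicity consistent with hepatic steatosis. "
    "No focal lesion. Gallbladder: no calculi. Common bile duct not dilated."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Explain ultrasound reports to patients in plain language. "
                    "Do not add diagnoses or advice beyond what the report states."},
        {"role": "user", "content": f"Please explain this report simply:\n{report}"},
    ],
    temperature=0.2,  # keep the interpretation conservative and repeatable
)

print(response.choices[0].message.content)
```

A low temperature is used here to limit paraphrasing drift; any clinical deployment would still require physician review of the model's output.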
Collapse
Affiliation(s)
- Wen Hui Wang
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Shi Yu Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jia Yan Huang
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Xiao di Liu
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Jie Yang
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Min Liao
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Qiang Lu
- Department of Ultrasound, West China Hospital of Sichuan University, Chengdu, China
| | - Zhe Wu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
46
|
Zalzal HG, Abraham A, Cheng J, Shah RK. Can ChatGPT help patients answer their otolaryngology questions? Laryngoscope Investig Otolaryngol 2024; 9:e1193. [PMID: 38362184 PMCID: PMC10866598 DOI: 10.1002/lio2.1193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2023] [Revised: 11/14/2023] [Accepted: 11/22/2023] [Indexed: 02/17/2024] Open
Abstract
Background Over the past year, the world has been captivated by the potential of artificial intelligence (AI). The appetite for AI in science, and specifically in healthcare, is huge. It is imperative to understand the credibility of large language models in assisting the public with medical queries. Objective To evaluate the ability of ChatGPT to provide reasonably accurate answers to public queries within the domain of otolaryngology. Methods Two board-certified otolaryngologists (HZ, RS) inputted 30 text-based patient queries into the ChatGPT-3.5 model. ChatGPT responses were rated by the physicians on a 3-point scale (accurate, partially accurate, incorrect), while a similar 3-point scale involving confidence was given to layperson reviewers. Demographic data on gender and education level were recorded for the public reviewers. Inter-rater agreement percentages were based on the binomial distribution for calculating 95% confidence intervals and performing significance tests. Statistical significance was defined as p < .05 for two-sided tests. Results In testing patient queries, both otolaryngology physicians found that ChatGPT answered 98.3% of questions correctly, but only 79.8% (range 51.7%-100%) of lay reviewers were confident that the AI model was accurate in its responses (corrected agreement = 0.682; p < .001). Among the layperson responses, the corrected coefficient showed moderate agreement (0.571; p < .001). No correlation was noted with age, gender, or education level for the layperson responses. Conclusion From a physician standpoint, ChatGPT is highly accurate in responding to otolaryngology questions posed by the public. Public reviewers were not fully confident in the AI model, with subjective concerns reflecting less trust in AI answers compared with physician explanations. Larger evaluations with a representative public sample and broader medical questions should be conducted promptly by appropriate organizations, governing bodies, and/or governmental agencies to instill public confidence in AI and ChatGPT as a medical resource. Level of Evidence 4.
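The agreement statistics described above (agreement percentages with binomial 95% confidence intervals) can be reproduced in outline with statsmodels; the counts in this sketch are made up, not the study's data.

```python
from statsmodels.stats.proportion import proportion_confint

# Hypothetical example: laypersons judged the AI answer "accurate" in 24 of 30 queries.
agreements, total = 24, 30
rate = agreements / total

# Wilson score interval for a binomial proportion.
low, high = proportion_confint(agreements, total, alpha=0.05, method="wilson")

print(f"Agreement {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```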
Collapse
Affiliation(s)
- Habib G. Zalzal
- Division of Otolaryngology‐Head and Neck SurgeryChildren's National HospitalWashingtonDistrict of ColumbiaUSA
| | | | - Jenhao Cheng
- Quality, Safety, AnalyticsChildren's National HospitalWashingtonDistrict of ColumbiaUSA
| | - Rahul K. Shah
- Division of Otolaryngology‐Head and Neck SurgeryChildren's National HospitalWashingtonDistrict of ColumbiaUSA
| |
Collapse
|
47
|
Kavadella A, Dias da Silva MA, Kaklamanos EG, Stamatopoulos V, Giannakopoulos K. Evaluation of ChatGPT's Real-Life Implementation in Undergraduate Dental Education: Mixed Methods Study. JMIR MEDICAL EDUCATION 2024; 10:e51344. [PMID: 38111256 PMCID: PMC10867750 DOI: 10.2196/51344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Revised: 10/28/2023] [Accepted: 12/11/2023] [Indexed: 12/20/2023]
Abstract
BACKGROUND The recent artificial intelligence tool ChatGPT seems to offer a range of benefits in academic education while also raising concerns. Relevant literature encompasses issues of plagiarism and academic dishonesty, as well as pedagogy and educational affordances; yet, no real-life implementation of ChatGPT in the educational process has been reported to our knowledge so far. OBJECTIVE This mixed methods study aimed to evaluate the implementation of ChatGPT in the educational process, both quantitatively and qualitatively. METHODS In March 2023, a total of 77 second-year dental students of the European University Cyprus were divided into 2 groups and asked to compose a learning assignment on "Radiation Biology and Radiation Protection in the Dental Office," working collaboratively in small subgroups, as part of the educational semester program of the Dentomaxillofacial Radiology module. Careful planning ensured a seamless integration of ChatGPT, addressing potential challenges. One group searched the internet for scientific resources to perform the task and the other group used ChatGPT for this purpose. Both groups developed a PowerPoint (Microsoft Corp) presentation based on their research and presented it in class. The ChatGPT group students additionally registered all interactions with the language model during the prompting process and evaluated the final outcome; they also answered an open-ended evaluation questionnaire, including questions on their learning experience. Finally, all students undertook a knowledge examination on the topic, and the grades between the 2 groups were compared statistically, whereas the free-text comments of the questionnaires were thematically analyzed. RESULTS Out of the 77 students, 39 were assigned to the ChatGPT group and 38 to the literature research group. Seventy students undertook the multiple choice question knowledge examination, and examination grades ranged from 5 to 10 on the 0-10 grading scale. The Mann-Whitney U test showed that students of the ChatGPT group performed significantly better (P=.045) than students of the literature research group. The evaluation questionnaires revealed the benefits (human-like interface, immediate response, and wide knowledge base), the limitations (need for rephrasing the prompts to get a relevant answer, general content, false citations, and incapability to provide images or videos), and the prospects (in education, clinical practice, continuing education, and research) of ChatGPT. CONCLUSIONS Students using ChatGPT for their learning assignments performed significantly better in the knowledge examination than their fellow students who used the literature research methodology. Students adapted quickly to the technological environment of the language model, recognized its opportunities and limitations, and used it creatively and efficiently. Implications for practice: the study underscores the adaptability of students to technological innovations including ChatGPT and its potential to enhance educational outcomes. Educators should consider integrating ChatGPT into curriculum design; awareness programs are warranted to educate both students and educators about the limitations of ChatGPT, encouraging critical engagement and responsible use.
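The between-group grade comparison described above uses a Mann-Whitney U test; a minimal sketch of that test with SciPy follows, run on fabricated grade lists rather than the study's data.

```python
from scipy.stats import mannwhitneyu

# Hypothetical exam grades on the 0-10 scale (not the study's data).
chatgpt_group = [8, 9, 7, 8, 10, 9, 7, 8]
literature_group = [7, 6, 8, 7, 7, 6, 8, 7]

# Two-sided nonparametric comparison of the two groups' grade distributions.
stat, p_value = mannwhitneyu(chatgpt_group, literature_group, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```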
Collapse
Affiliation(s)
- Argyro Kavadella
- School of Dentistry, European University Cyprus, Nicosia, Cyprus
| | - Marco Antonio Dias da Silva
- Research Group of Teleducation and Teledentistry, Federal University of Campina Grande, Campina Grande, Brazil
| | - Eleftherios G Kaklamanos
- School of Dentistry, European University Cyprus, Nicosia, Cyprus
- School of Dentistry, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
| | - Vasileios Stamatopoulos
- Information Management Systems Institute, ATHENA Research and Innovation Center, Athens, Greece
| | | |
Collapse
|
48
|
Jain N, Gottlich C, Fisher J, Campano D, Winston T. Assessing ChatGPT's orthopedic in-service training exam performance and applicability in the field. J Orthop Surg Res 2024; 19:27. [PMID: 38167093 PMCID: PMC10762835 DOI: 10.1186/s13018-023-04467-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND ChatGPT has gained widespread attention for its ability to understand inputs and provide human-like responses. However, few works have focused on its use in orthopedics. This study assessed ChatGPT's performance on the Orthopedic In-Service Training Exam (OITE) and evaluated its decision-making process to determine whether adoption as a resource in the field is practical. METHODS ChatGPT's performance on three OITE exams was evaluated by inputting multiple-choice questions. Questions were classified by their orthopedic subject area. Yearly OITE technical reports were used to gauge scores against those of resident physicians. ChatGPT's rationales were compared with testmaker explanations using six groups denoting answer accuracy and logic consistency. Variables were analyzed using contingency table construction and chi-squared analyses. RESULTS Of 635 questions, 360 were usable as inputs (56.7%). ChatGPT-3.5 scored 55.8%, 47.7%, and 54% for the years 2020, 2021, and 2022, respectively. Of 190 correct outputs, 179 provided consistent logic (94.2%). Of 170 incorrect outputs, 133 provided inconsistent logic (78.2%). Significant associations were found between test topic and correct answers (p = 0.011) and between the type of logic used and the tested topic (p < 0.001). Basic Science and Sports had adjusted residuals greater than 1.96, as did the combinations Basic Science with correct answers and no logic, Basic Science with incorrect answers and inconsistent logic, Sports with correct answers and no logic, and Sports with incorrect answers and inconsistent logic. CONCLUSIONS Based on annual OITE technical reports for resident physicians, ChatGPT-3.5 performed at around the PGY-1 level. When answering correctly, it displayed reasoning congruent with the testmakers'. When answering incorrectly, it exhibited some understanding of the correct answer. It performed best in Basic Science and Sports, likely due to its ability to output rote facts. These findings suggest that, in its current form, ChatGPT lacks the fundamental capabilities to be a comprehensive tool in orthopedic surgery. LEVEL OF EVIDENCE II.
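The analysis described above relies on contingency tables, chi-squared tests, and adjusted residuals (cells flagged when the absolute residual exceeds 1.96); a minimal sketch of that computation with SciPy and NumPy is shown below, using an invented 2x3 table rather than the study's counts.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = outcome (correct, incorrect), columns = exam topic.
table = np.array([
    [40, 25, 30],   # correct
    [15, 30, 20],   # incorrect
])

chi2, p, dof, expected = chi2_contingency(table)

# Adjusted (standardized) residuals: (observed - expected) / sqrt(cell variance).
n = table.sum()
row_tot = table.sum(axis=1, keepdims=True)
col_tot = table.sum(axis=0, keepdims=True)
variance = expected * (1 - row_tot / n) * (1 - col_tot / n)
adj_resid = (table - expected) / np.sqrt(variance)

print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
print("Cells with |adjusted residual| > 1.96:")
print(np.abs(adj_resid) > 1.96)
```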
Collapse
Affiliation(s)
- Neil Jain
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA.
| | - Caleb Gottlich
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
| | - John Fisher
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
| | - Dominic Campano
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
| | - Travis Winston
- Department of Orthopedic Surgery, Texas Tech University Health Sciences Center Lubbock, 3601 4th St, Lubbock, TX, 79430, USA
| |
Collapse
|
49
|
Sumbal A, Sumbal R, Amir A. Can ChatGPT-3.5 Pass a Medical Exam? A Systematic Review of ChatGPT's Performance in Academic Testing. JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT 2024; 11:23821205241238641. [PMID: 38487300 PMCID: PMC10938614 DOI: 10.1177/23821205241238641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Accepted: 02/25/2024] [Indexed: 03/17/2024]
Abstract
OBJECTIVE We aim to conduct a systematic review to assess the academic potential of ChatGPT-3.5, along with its strengths and limitations when taking medical exams. METHOD Following PRISMA guidelines, a systematic search of the literature was performed using the electronic databases PubMed/MEDLINE, Google Scholar, and Cochrane. Articles from database inception until April 4, 2023, were queried. A formal narrative analysis was conducted by systematically arranging similarities and differences between individual findings. RESULTS After rigorous screening, 12 articles underwent this review. All the selected papers assessed the academic performance of ChatGPT-3.5. One study compared the performance of ChatGPT-3.5 with that of ChatGPT-4 on a medical exam. Overall, ChatGPT performed well in 4 tests, was average in 4 tests, and performed poorly in 4 tests. ChatGPT's performance was directly proportional to the difficulty level of the questions but did not differ notably by whether the questions were binary, descriptive, or MCQ-based. ChatGPT's explanation, reasoning, memory, and accuracy were remarkably good, whereas it failed to understand image-based questions and lacked insight and critical thinking. CONCLUSION ChatGPT-3.5 performed satisfactorily in the exams it took as an examinee. However, future studies are needed to fully explore the potential of ChatGPT in medical education.
Collapse
Affiliation(s)
- Anusha Sumbal
- Dow University of Health Sciences, Karachi, Pakistan
| | - Ramish Sumbal
- Dow University of Health Sciences, Karachi, Pakistan
| | - Alina Amir
- Dow University of Health Sciences, Karachi, Pakistan
| |
Collapse
|
50
|
Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140 DOI: 10.1097/aog.0000000000005446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/12/2023] [Indexed: 11/12/2023]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about capabilities, limitations, and risks. This article presents an overview of ChatGPT, and it highlights the current state of this technology for the medical field. The article seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education. It also provides suggestions on how to optimize the use of this tool.
Collapse
|