1. Kalaw FGP, Baxter SL. Ethical considerations for large language models in ophthalmology. Curr Opin Ophthalmol 2024; 35:438-446. [PMID: 39259616] [PMCID: PMC11427135] [DOI: 10.1097/icu.0000000000001083]
Abstract
PURPOSE OF REVIEW: This review aims to summarize and discuss the ethical considerations regarding large language model (LLM) use in the field of ophthalmology.
RECENT FINDINGS: This review of 47 articles on LLM applications in ophthalmology highlights their diverse potential uses, including education, research, clinical decision support, and surgical assistance (as an aid in operative notes). We also review ethical considerations such as the inability of LLMs to interpret data accurately, the risk of promoting controversial or harmful recommendations, and breaches of data privacy. These concerns imply the need for cautious integration of artificial intelligence in healthcare, emphasizing human oversight, transparency, and accountability to mitigate risks and uphold ethical standards.
SUMMARY: The integration of LLMs in ophthalmology offers potential advantages, such as aiding clinical decision support and facilitating medical education through their ability to process queries and analyze ophthalmic imaging and clinical cases. However, their use also raises ethical concerns regarding data privacy, potential misinformation, and biases inherent in their training datasets. These concerns must be addressed in order to optimize the utility of LLMs in the healthcare setting, and responsible, careful use by consumers should be promoted.
Affiliation(s)
- Fritz Gerald P. Kalaw
  - Division of Ophthalmology Informatics and Data Science, The Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California, USA
  - Department of Biomedical Informatics, University of California San Diego Health System, University of California San Diego, La Jolla, California, USA
- Sally L. Baxter
  - Division of Ophthalmology Informatics and Data Science, The Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California, USA
  - Department of Biomedical Informatics, University of California San Diego Health System, University of California San Diego, La Jolla, California, USA
2. Høj S, Thomsen SF, Ulrik CS, Meteran H, Sigsgaard T, Meteran H. Evaluating the scientific reliability of ChatGPT as a source of information on asthma. J Allergy Clin Immunol Glob 2024; 3:100330. [PMID: 39328581] [PMCID: PMC11426030] [DOI: 10.1016/j.jacig.2024.100330]
Abstract
Background: This study assessed the reliability of ChatGPT as a source of information on asthma, given the increasing use of artificial intelligence-driven models for medical information. Prior concerns about misinformation on atopic diseases across various digital platforms underline the importance of this evaluation.
Objective: We aimed to evaluate the scientific reliability of ChatGPT as a source of information on asthma.
Methods: The study analyzed ChatGPT's responses to 26 asthma-related questions, each followed by a follow-up question. These encompassed definition/risk factors, diagnosis, treatment, lifestyle factors, and specific clinical inquiries. Medical professionals specializing in allergic and respiratory diseases independently assessed the responses on a 1-to-5 accuracy scale.
Results: Approximately 81% of the responses scored 4 or higher, suggesting a generally high level of accuracy. However, 5 responses scored 3 or lower, indicating minor, potentially harmful inaccuracies. The overall median score was 4. The Fleiss multirater kappa value showed moderate agreement among raters.
Conclusion: ChatGPT generally provides reliable asthma-related information, but its limitations, such as lack of depth in certain responses and inability to cite sources or update in real time, were noted. It shows promise as an educational tool, but it should not substitute for professional medical advice. Future studies should explore its applicability for different user demographics and compare it with newer artificial intelligence models.
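The Fleiss multirater kappa cited above can be computed in a few lines. A minimal sketch follows, using statsmodels and invented placeholder ratings (not the study's data), with rows as responses and columns as raters:

```python
# Illustrative sketch: Fleiss' multirater kappa for Likert-graded responses.
# The ratings below are invented placeholders, not the study's data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = responses, columns = raters; cells = 1-5 accuracy scores
ratings = np.array([
    [5, 4, 5],
    [4, 4, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 3],
])

# aggregate_raters converts raw scores into per-item category counts
counts, categories = aggregate_raters(ratings)
kappa = fleiss_kappa(counts)
# values of 0.41-0.60 are conventionally read as "moderate" agreement
print(f"Fleiss' kappa: {kappa:.2f}")
```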
Affiliation(s)
- Simon Høj
  - Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Bispebjerg, Denmark
  - Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus, Denmark
- Simon Francis Thomsen
  - Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Bispebjerg, Denmark
  - Department of Biomedical Sciences, University of Copenhagen, Copenhagen, Denmark
- Charlotte Suppli Ulrik
  - Department of Respiratory Medicine, Copenhagen University Hospital-Hvidovre, Hvidovre, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Hanieh Meteran
  - Department of Internal Medicine, Section of Endocrinology, Copenhagen University Hospital-Hvidovre, Hvidovre, Denmark
- Torben Sigsgaard
  - Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus, Denmark
- Howraman Meteran
  - Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus, Denmark
  - Department of Respiratory Medicine, Copenhagen University Hospital-Hvidovre, Hvidovre, Denmark
  - Department of Respiratory Medicine, Zealand University Hospital Roskilde-Næstved, Næstved, Denmark
3. Mihalache A, Huang RS, Mikhail D, Popovic MM, Shor R, Pereira A, Kwok J, Yan P, Wong DT, Kertes PJ, Kohly RP, Muni RH. Interpretation of Clinical Retinal Images Using an Artificial Intelligence Chatbot. Ophthalmol Sci 2024; 4:100556. [PMID: 39139542] [PMCID: PMC11321281] [DOI: 10.1016/j.xops.2024.100556]
Abstract
Purpose: To assess the performance of Chat Generative Pre-Trained Transformer-4 (GPT-4) in providing accurate diagnoses for retina teaching cases from OCTCases.
Design: Cross-sectional study.
Subjects: Retina teaching cases from OCTCases.
Methods: We prompted a custom chatbot with 69 retina cases containing multimodal ophthalmic images, asking it to provide the most likely diagnosis. In a sensitivity analysis, we inputted increasing amounts of clinical information pertaining to each case until the chatbot achieved a correct diagnosis. We performed multivariable logistic regressions in Stata v17.0 (StataCorp LLC) to investigate associations between the amount of text-based information inputted per prompt and the odds of the chatbot achieving a correct diagnosis, adjusting for the laterality of cases, the number of ophthalmic images inputted, and the imaging modalities.
Main Outcome Measures: Our primary outcome was the proportion of cases for which the chatbot provided a correct diagnosis. Our secondary outcome was the chatbot's performance in relation to the amount of text-based information accompanying the ophthalmic images.
Results: Across 69 retina cases collectively containing 139 ophthalmic images, the chatbot provided a definitive, correct diagnosis for 35 (50.7%) cases. The chatbot needed variable amounts of clinical information to achieve a correct diagnosis; the entire patient description as presented by OCTCases was required for the majority of correctly diagnosed cases (23 of 35 cases, 65.7%). Relative to prompts containing only a patient's age and sex, the chatbot had higher odds of a correct diagnosis when prompted with an entire patient description (odds ratio = 10.1, 95% confidence interval = 3.3-30.3, P < 0.01). Although it provided an incorrect diagnosis for 34 (49.3%) cases, the chatbot listed the correct diagnosis within its differential diagnosis for 7 (20.6%) of these incorrectly answered cases.
Conclusions: This custom chatbot accurately diagnosed approximately half of the retina cases requiring multimodal input, albeit relying heavily on the text-based contextual information that accompanied the ophthalmic images. The chatbot's ability to interpret multimodal imaging without text-based information is currently limited. Appropriate use of the chatbot in this setting is of utmost importance, given bioethical concerns.
Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
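The adjusted odds ratio reported above (OR = 10.1 for a full patient description versus age and sex alone) comes from multivariable logistic regression, which the authors ran in Stata. A hedged sketch of an equivalent analysis in Python follows; the variable names and simulated data are hypothetical stand-ins, not the study dataset:

```python
# Hedged sketch: estimating adjusted odds ratios with logistic regression.
# Data and variable names are simulated placeholders, not the study dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "correct": rng.integers(0, 2, n),           # 1 = chatbot diagnosis correct
    "full_description": rng.integers(0, 2, n),  # 1 = entire patient description given
    "n_images": rng.integers(1, 4, n),          # ophthalmic images per case
    "bilateral": rng.integers(0, 2, n),         # case laterality
})

model = smf.logit("correct ~ full_description + n_images + bilateral", data=df).fit(disp=0)
odds_ratios = np.exp(model.params)              # exponentiated coefficients = ORs
conf_int = np.exp(model.conf_int())             # 95% CIs on the OR scale
print(pd.concat([odds_ratios.rename("OR"), conf_int], axis=1))
```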
Affiliation(s)
- Andrew Mihalache
  - Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada
- Ryan S. Huang
  - Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada
- David Mikhail
  - Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada
- Marko M. Popovic
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Reut Shor
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Austin Pereira
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Jason Kwok
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Peng Yan
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- David T. Wong
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Ophthalmology, St. Michael’s Hospital/Unity Health Toronto, Toronto, Ontario, Canada
- Peter J. Kertes
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
  - John and Liz Tory Eye Centre, Sunnybrook Health Science Centre, Toronto, Ontario, Canada
- Radha P. Kohly
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
  - John and Liz Tory Eye Centre, Sunnybrook Health Science Centre, Toronto, Ontario, Canada
- Rajeev H. Muni
  - Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Ophthalmology, St. Michael’s Hospital/Unity Health Toronto, Toronto, Ontario, Canada
4. Ghalibafan S, Taylor Gonzalez DJ, Cai LZ, Graham Chou B, Panneerselvam S, Conrad Barrett S, Djulbegovic MB, Yannuzzi NA. Applications of multimodal generative artificial intelligence in a real-world retina clinic setting. Retina 2024; 44:1732-1740. [PMID: 39287535] [DOI: 10.1097/iae.0000000000004204]
Abstract
PURPOSE: This study evaluates a large language model, Generative Pre-trained Transformer 4 with vision (GPT-4V), for diagnosing vitreoretinal diseases in real-world ophthalmology settings.
METHODS: A retrospective cross-sectional study at Bascom Palmer Eye Clinic, analyzing patient data from January 2010 to March 2023, assessed GPT-4V's performance in retinal image analysis and International Classification of Diseases, 10th revision (ICD-10) coding across two patient groups: simpler cases (Group A) and complex cases requiring more in-depth analysis (Group B). Diagnostic accuracy was assessed through open-ended questions (OEQs) and multiple-choice questions (MCQs) independently verified by three retina specialists.
RESULTS: In 256 eyes from 143 patients, GPT-4V demonstrated 13.7% accuracy for OEQs and 31.3% for MCQs, with ICD-10 code accuracies of 5.5% and 31.3%, respectively. It accurately diagnosed posterior vitreous detachment, nonexudative age-related macular degeneration, and retinal detachment. ICD-10 coding was most accurate for nonexudative age-related macular degeneration, central retinal vein occlusion, and macular hole in OEQs, and for posterior vitreous detachment, nonexudative age-related macular degeneration, and retinal detachment in MCQs. No significant difference in diagnostic or coding accuracy was found between Groups A and B.
CONCLUSION: GPT-4V has potential in clinical care and record keeping, particularly with standardized questions. Its effectiveness in open-ended scenarios is limited, indicating a significant limitation in providing complex medical advice.
Affiliation(s)
- Seyyedehfatemeh Ghalibafan
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
- David J Taylor Gonzalez
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
- Louis Z Cai
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
- Brandon Graham Chou
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
- Sugi Panneerselvam
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
- Spencer Conrad Barrett
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
- Mak B Djulbegovic
  - Wills Eye Hospital, Thomas Jefferson University, Philadelphia, Pennsylvania
- Nicolas A Yannuzzi
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
5. Aykut A, Sezenoz AS. Exploring the Potential of Code-Free Custom GPTs in Ophthalmology: An Early Analysis of GPT Store and User-Creator Guidance. Ophthalmol Ther 2024; 13:2697-2713. [PMID: 39141071] [PMCID: PMC11408450] [DOI: 10.1007/s40123-024-01014-w]
Abstract
INTRODUCTION: OpenAI recently introduced the ability to create custom generative pre-trained transformers (cGPTs) using text-based instructions and/or external documents via a retrieval-augmented generation (RAG) architecture, without coding knowledge. This study aimed to analyze the features of ophthalmology-related cGPTs and explore their potential utilities.
METHODS: Data collection took place on January 20 and 21, 2024; custom GPTs were found by entering ophthalmology keywords into the "Explore GPTs" section of the website. General and specific features of the cGPTs were recorded, such as knowledge beyond the GPT-4 training data. The instruction and description sections were analyzed for compatibility using a Likert scale, and the two custom GPTs with the highest Likert scores were analyzed in detail. We also attempted to create a convincingly presented yet potentially harmful cGPT to test safety features.
RESULTS: We analyzed 22 ophthalmic cGPTs, of which 55% were for general use; the most common subspecialty was glaucoma (18%). Over half (55%) contained knowledge beyond the GPT-4 training data. How well the description represented the instructions was rated between "moderately representative" and "very representative", with a median Likert score of 3.5 (IQR 3.0-4.0). The instruction word count was significantly associated with Likert scores (P = 0.03). Tested cGPTs demonstrated potential for a specific conversational tone, information retrieval, and combining knowledge from an uploaded source. Under the current safety settings, creating a malicious GPT was possible.
CONCLUSIONS: This is, to our knowledge, the first study to examine the GPT store for a medical field. Our findings suggest that these cGPTs can be implemented in practice immediately and may offer more targeted and effective solutions than the standard GPT-4. However, further research is necessary to evaluate their capabilities and limitations comprehensively. The safety features currently appear to be rather limited, and it may be helpful for users to review the instruction section before using a cGPT.
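For readers unfamiliar with the retrieval-augmented generation (RAG) architecture mentioned above, the core retrieval step can be sketched without any external API. The toy example below substitutes bag-of-words vectors and cosine similarity for the embedding model a real custom GPT would use; the documents and query are illustrative only:

```python
# Toy RAG retrieval step: pick the document most similar to the query,
# then prepend it to the model prompt. Bag-of-words vectors stand in for
# learned embeddings; all text here is illustrative.
import numpy as np
from collections import Counter

documents = [
    "Primary open-angle glaucoma is managed with topical IOP-lowering drops.",
    "Cataract surgery replaces the clouded lens with an intraocular lens.",
]
query = "How is glaucoma treated?"

def vectorize(texts):
    # build a shared vocabulary and count word occurrences per text
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = np.zeros((len(texts), len(vocab)))
    for row, t in enumerate(texts):
        for w, c in Counter(t.lower().split()).items():
            vecs[row, index[w]] = c
    return vecs

vecs = vectorize(documents + [query])
doc_vecs, q_vec = vecs[:-1], vecs[-1]
sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
best = documents[int(np.argmax(sims))]
print(best)  # the retrieved passage would be prepended to the model prompt
```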
Affiliation(s)
- Aslan Aykut
  - Department of Ophthalmology and Visual Sciences, Kellogg Eye Center, University of Michigan, 1000 Wall St, Rm 641, Ann Arbor, MI, 48105, USA
  - Department of Ophthalmology, School of Medicine, Marmara University, Istanbul, 34854, Turkey
- Almila Sarigul Sezenoz
  - Department of Ophthalmology and Visual Sciences, Kellogg Eye Center, University of Michigan, 1000 Wall St, Rm 641, Ann Arbor, MI, 48105, USA
  - Department of Ophthalmology, Faculty of Medicine, Başkent University, Ankara, 06790, Turkey
6. Rojas-Carabali W, Cifuentes-González C, Wei X, Putera I, Sen A, Thng ZX, Agrawal R, Elze T, Sobrin L, Kempen JH, Lee B, Biswas J, Nguyen QD, Gupta V, de-la-Torre A, Agrawal R. Evaluating the Diagnostic Accuracy and Management Recommendations of ChatGPT in Uveitis. Ocul Immunol Inflamm 2024; 32:1526-1531. [PMID: 37722842] [DOI: 10.1080/09273948.2023.2253471]
Abstract
INTRODUCTION: Accurate diagnosis and timely management are vital for favorable uveitis outcomes. Artificial intelligence (AI) holds promise in medical decision-making, particularly in ophthalmology, yet the diagnostic precision and management advice of AI-based chatbots in uveitis have not been assessed.
METHODS: We appraised the diagnostic accuracy and management suggestions of an AI-based chatbot, ChatGPT, versus five uveitis-trained ophthalmologists, using 25 standard cases aligned with the new Uveitis Nomenclature guidelines. Participants predicted the most likely diagnosis, two differentials, and the next management steps. Comparative success rates were computed.
RESULTS: Ophthalmologists excelled at identifying the most likely diagnosis (success rates of 60-92%), while AI reached 60%. Considering fully and partially accurate diagnoses, ophthalmologists achieved 76-100% success; AI attained 72%. Despite this 8% improvement, the AI's overall performance still lagged. Ophthalmologists and AI agreed on the diagnosis in 48% of cases, and 91.6% of these showed concurrence in management plans.
CONCLUSIONS: The study underscores AI chatbots' potential in uveitis diagnosis and management, indicating their value in reducing diagnostic errors. Further research is essential to enhance the precision of AI chatbots' diagnoses and recommendations.
Affiliation(s)
- William Rojas-Carabali
  - National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
  - Department of Bioinformatics, Lee Kong Chiang School of Medicine, Nanyang Technological University, Singapore, Singapore
- Carlos Cifuentes-González
  - Neuroscience Research Group (NEUROS), Neurovitae Center for Neuroscience, Institute of Translational Medicine (IMT), Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia
- Xin Wei
  - National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
- Ikhwanuliman Putera
  - Department of Ophthalmology, Faculty of Medicine Universitas Indonesia - Cipto Mangunkusumo Kirana Eye Hospital, Jakarta, Indonesia
  - Laboratory Medical Immunology, Department of Immunology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
  - Department of Internal Medicine, Division of Clinical Immunology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
  - Department of Ophthalmology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
- Alok Sen
  - Department of Vitreoretina and Uveitis, Sadguru Netra Chikatsalya, Chitrakoot, India
- Zheng Xian Thng
  - National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
- Rajdeep Agrawal
  - Department of Bioinformatics, Lee Kong Chiang School of Medicine, Nanyang Technological University, Singapore, Singapore
- Tobias Elze
  - Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
- Lucia Sobrin
  - Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
- John H Kempen
  - Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
  - Community Ophthalmology, Sight for Souls, Bellevue, Washington, USA
  - Department of Ophthalmology, Addis Ababa University, Addis Ababa, Ethiopia
  - MyungSung Christian Medical Center (MCM) Eye Unit, MCM Comprehensive Specialized Hospital, and MyungSung Medical School, Addis Ababa, Ethiopia
- Bernett Lee
  - National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
- Jyotirmay Biswas
  - Department of Ocular Pathology and Uveitis, Medical Research Foundation, Sankara Netralaya, Chennai, India
- Quan Dong Nguyen
  - Byers Eye Institute, Stanford University, Palo Alto, California, USA
- Vishali Gupta
  - Post Graduate Institute of Medical Education and Research (PGIMER), Advance Eye Centre, Chandigarh, India
- Alejandra de-la-Torre
  - National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
- Rupesh Agrawal
  - MyungSung Christian Medical Center (MCM) Eye Unit, MCM Comprehensive Specialized Hospital, and MyungSung Medical School, Addis Ababa, Ethiopia
  - Department of Ophthalmology and Visual Sciences, Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
  - Moorfields Eye Hospital, NHS Foundation Trust, London, UK
  - Singapore Eye Research Institute, The Academia, Singapore, Singapore
7. Cohen SA, Yadlapalli N, Tijerina JD, Alabiad CR, Chang JR, Kinde B, Mahoney NR, Roelofs KA, Woodward JA, Kossler AL. Comparing the Ability of Google and ChatGPT to Accurately Respond to Oculoplastics-Related Patient Questions and Generate Customized Oculoplastics Patient Education Materials. Clin Ophthalmol 2024; 18:2647-2655. [PMID: 39323727] [PMCID: PMC11423829] [DOI: 10.2147/opth.s480222]
Abstract
Purpose: To compare the accuracy and readability of responses to oculoplastics patient questions provided by Google and ChatGPT, and to assess the ability of ChatGPT to create customized patient education materials.
Methods: We executed a Google search to identify the 3 most frequently asked patient questions (FAQs) related to each of 10 oculoplastics conditions. FAQs were entered into both the Google search engine and the ChatGPT tool, and responses were recorded. Responses were graded for readability using five validated readability indices and for accuracy by six oculoplastic surgeons. ChatGPT was then instructed to create patient education materials at various reading levels for 8 oculoplastics procedures, and the accuracy and readability of the ChatGPT-generated procedural explanations were assessed.
Results: ChatGPT responses to patient FAQs were written at a significantly higher average grade level than Google responses (grade 15.6 vs 10.0, p < 0.001). ChatGPT responses (93% accuracy) were significantly more accurate (p < 0.001) than Google responses (78% accuracy) and were preferred by expert panelists (79%). ChatGPT accurately explained oculoplastics procedures at an above-average reading level. When instructed to rewrite patient education materials at a lower reading level, it reduced the grade level by approximately 4 grades (15.7 vs 11.7, p < 0.001) without sacrificing accuracy.
Conclusion: ChatGPT has the potential to provide patients with accurate information regarding their oculoplastics conditions. ChatGPT may also be utilized by oculoplastic surgeons as an accurate tool to provide customizable education for patients with varying health literacy. A better understanding of oculoplastics conditions and procedures among patients can lead to informed eye care decisions.
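One of the validated readability indices commonly used in studies like this is the Flesch-Kincaid grade level, computed from word, sentence, and syllable counts. The sketch below uses a crude vowel-group syllable heuristic, so its output is approximate; the study itself relied on dedicated readability tools:

```python
# Approximate Flesch-Kincaid grade level. The syllable counter is a naive
# heuristic (vowel-group count); real readability tools use dictionaries
# and exception lists, so treat the output as a rough estimate.
import re

def count_syllables(word: str) -> int:
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # standard FKGL formula: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

sample = "Blepharoplasty removes excess eyelid skin. Recovery typically takes two weeks."
print(f"Approximate grade level: {flesch_kincaid_grade(sample):.1f}")
```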
Affiliation(s)
- Samuel A Cohen
  - Department of Ophthalmology, Stein Eye Institute at University of California Los Angeles David Geffen School of Medicine, Los Angeles, CA, USA
- Nikhita Yadlapalli
  - Department of Ophthalmology, FIU Herbert Wertheim College of Medicine, Miami, FL, USA
- Jonathan D Tijerina
  - Department of Ophthalmology, Bascom Palmer Eye Institute at University of Miami Miller School of Medicine, Miami, FL, USA
- Chrisfouad R Alabiad
  - Department of Ophthalmology, Bascom Palmer Eye Institute at University of Miami Miller School of Medicine, Miami, FL, USA
- Jessica R Chang
  - Department of Ophthalmology, USC Roski Eye Institute at University of Southern California Keck School of Medicine, Los Angeles, CA, USA
- Benyam Kinde
  - Department of Ophthalmology, Byers Eye Institute at Stanford University School of Medicine, Palo Alto, CA, USA
- Nicholas R Mahoney
  - Department of Ophthalmology, Wilmer Eye Institute at Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kelsey A Roelofs
  - Department of Ophthalmology, Stein Eye Institute at University of California Los Angeles David Geffen School of Medicine, Los Angeles, CA, USA
- Julie A Woodward
  - Department of Ophthalmology, Duke Eye Center at Duke University School of Medicine, Durham, NC, USA
- Andrea L Kossler
  - Department of Ophthalmology, Byers Eye Institute at Stanford University School of Medicine, Palo Alto, CA, USA
8. Wang Y, Liu C, Zhou K, Zhu T, Han X. Towards regulatory generative AI in ophthalmology healthcare: a security and privacy perspective. Br J Ophthalmol 2024; 108:1349-1353. [PMID: 38834290] [DOI: 10.1136/bjo-2024-325167]
Abstract
As the healthcare community increasingly harnesses the power of generative artificial intelligence (AI), critical issues of security, privacy and regulation take centre stage. In this paper, we explore the security and privacy risks of generative AI from model-level and data-level perspectives, and we illustrate the potential consequences with case studies from the domain of ophthalmology. Model-level risks include knowledge leakage from the model and model safety under AI-specific attacks, while data-level risks involve unauthorised data collection and data accuracy concerns. Within the healthcare context, these risks can bear severe consequences, encompassing potential breaches of sensitive information, violations of privacy rights and threats to patient safety. This paper not only highlights these challenges but also elucidates governance-driven solutions that adhere to AI and healthcare regulations. We advocate for preparedness against potential threats, call for transparency enhancements and underscore the necessity of clinical validation before real-world implementation. Improving the security and privacy of generative AI warrants emphasis on the role of ophthalmologists and other healthcare providers, and the timely introduction of comprehensive regulations.
Affiliation(s)
- Yueye Wang
  - Sun Yat-sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
- Chi Liu
  - Faculty of Data Science, City University of Macau, Macao SAR, China
- Keyao Zhou
  - Department of Ophthalmology, Guangdong Provincial People's Hospital, Guangzhou, Guangdong, China
  - Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
- Tianqing Zhu
  - Faculty of Data Science, City University of Macau, Macao SAR, China
- Xiaotong Han
  - Sun Yat-sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
9. Wong M, Lim ZW, Pushpanathan K, Cheung CY, Wang YX, Chen D, Tham YC. Review of emerging trends and projection of future developments in large language models research in ophthalmology. Br J Ophthalmol 2024; 108:1362-1370. [PMID: 38164563] [DOI: 10.1136/bjo-2023-324734]
Abstract
BACKGROUND: Large language models (LLMs) are fast emerging as potent tools in healthcare, including ophthalmology. This systematic review offers a twofold contribution: it summarises current trends in ophthalmology-related LLM research and projects future directions for this burgeoning field.
METHODS: We systematically searched across various databases (PubMed, Europe PMC, Scopus and Web of Science) for articles related to LLM use in ophthalmology, published between 1 January 2022 and 31 July 2023. Selected articles were summarised, and categorised by type (editorial, commentary, original research, etc) and their research focus (eg, evaluating ChatGPT's performance in ophthalmology examinations or clinical tasks).
FINDINGS: We identified 32 articles meeting our criteria, published between January and July 2023, with a peak in June (n=12). Most were original research evaluating LLMs' proficiency in clinically related tasks (n=9). Studies demonstrated that ChatGPT-4.0 outperformed its predecessor, ChatGPT-3.5, in ophthalmology exams. Furthermore, ChatGPT excelled in constructing discharge notes (n=2), evaluating diagnoses (n=2) and answering general medical queries (n=6). However, it struggled with generating scientific articles or abstracts (n=3) and answering specific subdomain questions, especially those regarding specific treatment options (n=2). ChatGPT's performance relative to other LLMs (Google's Bard, Microsoft's Bing) varied by study design. Ethical concerns such as data hallucination (n=27), authorship (n=5) and data privacy (n=2) were frequently cited.
INTERPRETATION: While LLMs hold transformative potential for healthcare and ophthalmology, concerns over accountability, accuracy and data security remain. Future research should focus on application programming interface integration, comparative assessments of popular LLMs, their ability to interpret image-based data and the establishment of standardised evaluation frameworks.
Affiliation(s)
- Zhi Wei Lim
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Krithi Pushpanathan
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Carol Y Cheung
  - Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Ya Xing Wang
  - Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital University of Medical Science, Beijing, China
- David Chen
  - Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - Department of Ophthalmology, National University Hospital, Singapore
- Yih Chung Tham
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
10. Zheng C, Ye H, Guo J, Yang J, Fei P, Yuan Y, Huang D, Huang Y, Peng J, Xie X, Xie M, Zhao P, Chen L, Zhang M. Development and evaluation of a large language model of ophthalmology in Chinese. Br J Ophthalmol 2024; 108:1390-1397. [PMID: 39019566] [DOI: 10.1136/bjo-2023-324526]
Abstract
BACKGROUND: Large language models (LLMs), such as ChatGPT, have considerable implications for various medical applications. However, ChatGPT's training draws primarily from English-centric internet data and is not tailored explicitly to the medical domain. Thus, a Chinese-language ophthalmic LLM is clinically essential for both healthcare providers and patients in mainland China.
METHODS: We developed an LLM of ophthalmology (MOPH) using Chinese corpora and evaluated its performance in three clinical scenarios: ophthalmic board exams in Chinese, answering evidence-based medicine-oriented ophthalmic questions, and diagnostic accuracy for clinical vignettes. Additionally, we compared MOPH's performance to that of human doctors.
RESULTS: In the ophthalmic exam, MOPH's average score closely aligned with the mean score of trainees (64.7 (range 62-68) vs 66.2 (range 50-92), p=0.817), and it achieved a score above 60 in all seven mock exams. In answering ophthalmic questions, 83.3% (25/30) of MOPH's responses adhered to Chinese guidelines (Likert scale 4-5). Only 6.7% (2/30, Likert scale 1-2) of responses were rated 'poor or very poor' and 10% (3/30, Likert scale 3) were rated as containing 'potentially misinterpretable inaccuracies' by reviewers. In diagnostic accuracy, although the rate of correct diagnosis by ophthalmologists was higher than that of MOPH (96.1% vs 81.1%), the difference was not statistically significant (p>0.05).
CONCLUSION: This study demonstrated the promising performance of MOPH, a Chinese-specific ophthalmic LLM, in diverse clinical scenarios. MOPH has potential real-world applications in Chinese-language ophthalmology settings.
Affiliation(s)
- Ce Zheng
  - Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
  - Institute of Hospital Development Strategy, China Hospital Development Institute, Shanghai Jiao Tong University, Shanghai, China
- Hongfei Ye
  - Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
  - Institute of Hospital Development Strategy, China Hospital Development Institute, Shanghai Jiao Tong University, Shanghai, China
- Jinming Guo
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Junrui Yang
  - Ophthalmology, The 74th Army Group Hospital, Guangzhou, Guangdong, China
- Ping Fei
  - Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Yuanzhi Yuan
  - Ophthalmology, Zhongshan Hospital Fudan University, Shanghai, China
- Danqing Huang
  - Discipline Inspection & Supervision Office, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Yuqiang Huang
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Jie Peng
  - Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Xiaoling Xie
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
- Meng Xie
  - Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Peiquan Zhao
  - Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Li Chen
  - Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
- Mingzhi Zhang
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China
11. Jung H, Oh J, Stephenson KAJ, Joe AW, Mammo ZN. Prompt engineering with ChatGPT3.5 and GPT4 to improve patient education on retinal diseases. Can J Ophthalmol 2024:S0008-4182(24)00258-8. [PMID: 39245293] [DOI: 10.1016/j.jcjo.2024.08.010]
Abstract
OBJECTIVE: To assess the effect of prompt engineering on the accuracy, comprehensiveness, readability, and empathy of large language model (LLM)-generated responses to patient questions regarding retinal disease.
DESIGN: Prospective qualitative study.
PARTICIPANTS: Retina specialists, ChatGPT3.5, and GPT4.
METHODS: Twenty common patient questions regarding 5 retinal conditions were inputted to ChatGPT3.5 and GPT4 as stand-alone questions, preceded by an optimized prompt (prompt A), or preceded by prompt A with specified limits on length and grade reading level (prompt B). Accuracy and comprehensiveness were graded by 3 retina specialists on a Likert scale from 1 to 5 (1: very poor; 5: very good). Readability of responses was assessed using Readable.com, an online readability tool.
RESULTS: There were no significant differences between ChatGPT3.5 and GPT4 on any of the metrics tested. The median accuracy of responses to stand-alone, prompt A, and prompt B questions was 5.0, 5.0, and 4.0, respectively. The median comprehensiveness of responses to stand-alone, prompt A, and prompt B questions was 5.0, 5.0, and 4.0, respectively. The use of prompt B was associated with lower accuracy and comprehensiveness than stand-alone or prompt A questions (p < 0.001). The average grade reading level of responses across both LLMs was 13.45, 11.5, and 10.3 for stand-alone, prompt A, and prompt B questions, respectively (p < 0.001).
CONCLUSIONS: Prompt engineering can significantly improve the readability of LLM-generated responses, although at the cost of reduced accuracy and comprehensiveness. Further study is needed to understand the utility and bioethical implications of LLMs as a patient educational resource.
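The exact wording of prompt A and prompt B is not reproduced here, so the templates below are hypothetical stand-ins that illustrate the structure of the three prompting conditions compared in the study:

```python
# Hypothetical reconstruction of the three prompting conditions. The template
# wording is invented for illustration; only the condition structure
# (stand-alone vs. optimized prompt vs. optimized prompt + limits) follows
# the study design.
PROMPT_A = (
    "You are an ophthalmologist counselling a patient. "
    "Answer accurately, comprehensively, and empathetically.\n\n"
)
PROMPT_B = (
    PROMPT_A
    + "Limit your answer to 150 words, written at an 8th-grade reading level.\n\n"
)

question = "What are the treatment options for wet macular degeneration?"

conditions = {
    "stand-alone": question,
    "prompt A": PROMPT_A + question,
    "prompt B": PROMPT_B + question,
}
for name, prompt in conditions.items():
    print(f"--- {name} ---\n{prompt}\n")  # each would be sent to the LLM and graded
```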
Affiliation(s)
- Hoyoung Jung
  - Faculty of Medicine, University of British Columbia, Vancouver BC, Canada
- Jean Oh
  - Faculty of Medicine, University of British Columbia, Vancouver BC, Canada
- Kirk A J Stephenson
  - Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver BC, Canada
- Aaron W Joe
  - Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver BC, Canada
- Zaid N Mammo
  - Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver BC, Canada
12. Ye Z, Zhang B, Zhang K, Méndez MJG, Yan H, Wu T, Qu Y, Jiang Y, Xue P, Qiao Y. An assessment of ChatGPT's responses to frequently asked questions about cervical and breast cancer. BMC Womens Health 2024; 24:482. [PMID: 39223612] [PMCID: PMC11367894] [DOI: 10.1186/s12905-024-03320-8]
Abstract
BACKGROUND: Cervical cancer (CC) and breast cancer (BC) threaten women's well-being, influenced by health-related stigma and a lack of reliable information, which can cause late diagnosis and early death. ChatGPT is likely to become a key source of health information, although quality concerns could also influence health-seeking behaviours.
METHODS: This cross-sectional online survey compared ChatGPT's responses with those of five physicians specializing in mammography and five specializing in gynaecology. Twenty frequently asked questions about CC and BC were asked on the 26th and 29th of April, 2023. A panel of seven experts assessed the accuracy, consistency, and relevance of ChatGPT's responses using a 7-point Likert scale. Responses were also analyzed for readability, reliability, and efficiency. ChatGPT's responses were synthesized, and the findings are presented as a radar chart.
RESULTS: ChatGPT had an accuracy score of 7.0 (range: 6.6-7.0) for CC and BC questions, surpassing the highest-scoring physicians (P < 0.05). ChatGPT took an average of 13.6 s (range: 7.6-24.0) to answer each of the 20 questions presented. Readability was comparable to that of the experts and physicians involved, but ChatGPT generated longer responses than the physicians. The consistency of repeated answers was 5.2 (range: 3.4-6.7). With different contexts combined, the overall relevance score was 6.5 (range: 4.8-7.0). Radar plot analysis indicated comparably good accuracy, efficiency, and, to a certain extent, relevance; however, there were apparent inconsistencies, and the reliability and readability were considered inadequate.
CONCLUSIONS: ChatGPT shows promise as an initial source of information for CC and BC. ChatGPT is also highly functional, appears to be superior to physicians, and aligns with expert consensus, although there is room for improvement in readability, reliability, and consistency. Future efforts should focus on developing advanced ChatGPT models explicitly designed to improve medical practice, including for those with concerns about symptoms.
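The radar-chart summary described above can be reproduced with matplotlib's polar axes. In the sketch below the axis labels follow the paper's evaluation dimensions, but the plotted scores are placeholder values for illustration:

```python
# Radar (spider) chart on matplotlib polar axes. Scores are placeholders,
# not the paper's measured values.
import numpy as np
import matplotlib.pyplot as plt

metrics = ["Accuracy", "Consistency", "Relevance", "Readability", "Efficiency"]
scores = [7.0, 5.2, 6.5, 4.0, 6.8]             # illustrative 7-point-scale values

angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
scores_closed = scores + scores[:1]             # repeat first point to close polygon
angles_closed = angles + angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles_closed, scores_closed)
ax.fill(angles_closed, scores_closed, alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(metrics)
ax.set_ylim(0, 7)
plt.show()
```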
Affiliation(s)
- Zichen Ye
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Bo Zhang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Kun Zhang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- María José González Méndez
  - Department of Primary Healthcare and Family Medicine, Faculty of Medicine, Universidad de Chile, Santiago, Chile
- Huijiao Yan
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Tong Wu
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yimin Qu
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yu Jiang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - School of Health Policy and Management, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Peng Xue
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Youlin Qiao
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
13. Ye F, Zhang H, Luo X, Wu T, Yang Q, Shi Z. Evaluating ChatGPT's Performance in Answering Questions About Allergic Rhinitis and Chronic Rhinosinusitis. Otolaryngol Head Neck Surg 2024; 171:571-577. [PMID: 38796735] [DOI: 10.1002/ohn.832]
Abstract
OBJECTIVE: This study aims to evaluate the accuracy of ChatGPT in answering allergic rhinitis (AR)- and chronic rhinosinusitis (CRS)-related questions.
STUDY DESIGN: This is a cross-sectional study.
SETTING: Each question was inputted as a separate, independent prompt.
METHODS: Responses to AR-related (n = 189) and CRS-related (n = 242) questions, generated by GPT-3.5 and GPT-4, were independently graded for accuracy by 2 senior rhinology professors, with disagreements adjudicated by a third reviewer.
RESULTS: Overall, ChatGPT demonstrated satisfactory performance, accurately answering over 80% of questions across all categories. Specifically, GPT-4.0's accuracy in responding to AR-related questions significantly exceeded that of GPT-3.5, a distinction not evident for CRS-related questions. Patient-originated questions were answered with significantly higher accuracy than doctor-originated questions when GPT-4.0 responded to AR-related questions; this discrepancy was not observed with GPT-3.5 or in the context of CRS-related questions. Across content types, ChatGPT excelled at covering basic knowledge, prevention, and emotion for AR and CRS, but it struggled with questions about recent advancements, a trend consistent across both the GPT-3.5 and GPT-4.0 iterations. Importantly, the accuracy of responses remained unaffected when questions were posed in Chinese.
CONCLUSION: Our findings suggest ChatGPT's capability to convey accurate information to AR and CRS patients and offer insights into its performance across various domains, guiding its utilization and improvement.
Affiliation(s)
- Fan Ye
  - Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- He Zhang
  - Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Xin Luo
  - Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Tong Wu
  - Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Qintai Yang
  - Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Naso-Orbital-Maxilla and Skull Base Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Key Laboratory of Airway Inflammatory Disease Research and Innovative Technology Translation, Guangzhou, China
- Zhaohui Shi
  - Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Naso-Orbital-Maxilla and Skull Base Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
  - Key Laboratory of Airway Inflammatory Disease Research and Innovative Technology Translation, Guangzhou, China
14. Rosselló-Jiménez D, Docampo S, Collado Y, Cuadra-Llopart L, Riba F, Llonch-Masriera M. Geriatrics and artificial intelligence in Spain (Ger-IA project): talking to ChatGPT, a nationwide survey. Eur Geriatr Med 2024; 15:1129-1136. [PMID: 38615289] [DOI: 10.1007/s41999-024-00970-7]
Abstract
PURPOSE: The purposes of the study were to describe the degree of agreement between geriatricians and the answers given by an AI tool (ChatGPT) in response to questions related to different areas of geriatrics, to study the differences between specialists and residents in geriatrics in terms of their degree of agreement with ChatGPT, and to analyse the mean scores obtained by area of knowledge/domain.
METHODS: An observational study was conducted involving doctors from 41 geriatric medicine departments in Spain. Ten questions about geriatric medicine were posed to ChatGPT, and the doctors evaluated the AI's answers using a Likert scale. Sociodemographic variables were included. Questions were categorized into five knowledge domains, and means and standard deviations were calculated for each.
RESULTS: 130 doctors answered the questionnaire, and 126 (69.8% women, mean age 41.4 [9.8]) were included in the final analysis. The mean score obtained by ChatGPT was 3.1/5 [0.67]. Specialists rated ChatGPT lower than residents (3.0/5 vs. 3.3/5 points, respectively, P < 0.05). By domain, ChatGPT scored better on general/theoretical questions (M: 3.96; SD: 0.71) than on complex decisions/end-of-life situations (M: 2.50; SD: 0.76), and answers related to diagnosis/performing complementary tests obtained the lowest scores (M: 2.48; SD: 0.77).
CONCLUSION: Scores varied considerably depending on the area of knowledge. Questions related to theoretical aspects of the challenges/future of geriatrics obtained better scores, whereas for complex decision-making, appropriateness of therapeutic effort, or decisions about diagnostic tests, professionals indicated poorer performance. AI is likely to be incorporated into some areas of medicine, but it still presents important limitations, mainly in complex medical decision-making.
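The kind of summary reported above (domain-level means and SDs, plus a specialists-vs-residents comparison) is straightforward to compute. A sketch follows with simulated ratings; the Mann-Whitney U test is an assumed choice, as the abstract does not name the exact test used:

```python
# Sketch of the survey's summary statistics on simulated data. The group
# comparison uses a Mann-Whitney U test as an assumption; the paper's exact
# test is not specified in the abstract.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
n = 126
df = pd.DataFrame({
    "role": rng.choice(["specialist", "resident"], n),
    "domain": rng.choice(["general/theory", "end-of-life", "diagnostics"], n),
    "score": rng.integers(1, 6, n).astype(float),   # 1-5 Likert rating
})

# mean and SD per knowledge domain
print(df.groupby("domain")["score"].agg(["mean", "std"]).round(2))

# specialists vs residents
spec = df.loc[df.role == "specialist", "score"]
res = df.loc[df.role == "resident", "score"]
stat, p = mannwhitneyu(spec, res, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")
```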
Affiliation(s)
- Daniel Rosselló-Jiménez
  - Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
- S Docampo
  - Geriatric Medicine Department, Hospital Santa Creu, Tortosa, Tarragona, Spain
- Y Collado
  - Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
- L Cuadra-Llopart
  - Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
  - Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
  - ACTIUM Functional Anatomy Group, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
- F Riba
  - Geriatric Medicine Department, Hospital Santa Creu, Tortosa, Tarragona, Spain
- M Llonch-Masriera
  - Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
  - Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
15. Cohen SA, Brant A, Fisher AC, Pershing S, Do D, Pan C. Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery. Semin Ophthalmol 2024; 39:472-479. [PMID: 38516983] [DOI: 10.1080/08820538.2024.2326058]
Abstract
PURPOSE: Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, the use of large language models (LLMs) like ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, and there are limited data on the quality of the online information that populates after searches related to cataract surgery on search engines such as Google and LLM platforms such as ChatGPT. We identified the most common patient frequently asked questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT in writing notes and creating patient education materials.
METHODS: The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. Responses to the questions provided by Google and ChatGPT were evaluated by a panel of ophthalmologists for accuracy and safety. Evaluators were also asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of responses. ChatGPT was instructed to generate operative notes, postoperative instructions, and customizable patient education materials according to specific readability criteria.
RESULTS: Responses to the 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers were able to correctly distinguish between Google and chatbot-generated responses an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored (66%).
CONCLUSIONS: When comparing the responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred ChatGPT responses. LLM chatbot responses were less likely to contain inaccurate information. ChatGPT represents a viable information source on eye health for patients with higher health literacy. ChatGPT may also be used by ophthalmologists to create customizable education materials for patients with varying health literacy.
Affiliation(s)
- Samuel A Cohen
  - Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Arthur Brant
  - Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Ann Caroline Fisher
  - Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Suzann Pershing
  - Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Diana Do
  - Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Carolyn Pan
  - Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
16. Roldan-Vasquez E, Mitri S, Bhasin S, Bharani T, Capasso K, Haslinger M, Sharma R, James TA. Reliability of artificial intelligence chatbot responses to frequently asked questions in breast surgical oncology. J Surg Oncol 2024; 130:188-203. [PMID: 38837375] [DOI: 10.1002/jso.27715]
Abstract
INTRODUCTION: Artificial intelligence (AI)-driven chatbots, capable of simulating human-like conversations, are becoming more prevalent in healthcare. While this technology offers potential benefits in patient engagement and information accessibility, it raises concerns about potential misuse, misinformation, inaccuracies, and ethical challenges.
METHODS: This study evaluated a publicly available AI chatbot, ChatGPT, on its responses to nine questions related to breast cancer surgery selected from the American Society of Breast Surgeons' frequently asked questions (FAQ) patient education website. Four breast surgical oncologists assessed the responses for accuracy and reliability using a five-point Likert scale and the Patient Education Materials Assessment Tool (PEMAT).
RESULTS: The average reliability score for ChatGPT in answering breast cancer surgery questions was 3.98 out of 5.00. The surgeons unanimously found the responses understandable and actionable per the PEMAT criteria. The consensus was that ChatGPT's overall performance was appropriate, with minor or no inaccuracies.
CONCLUSION: ChatGPT demonstrates good reliability in responding to breast cancer surgery queries, with minor, nonharmful inaccuracies. Its answers are accurate, clear, and easy to comprehend. Notably, ChatGPT acknowledged its informational role and did not attempt to replace medical advice or discourage users from seeking input from a healthcare professional.
Collapse
Affiliation(s)
- Estefania Roldan-Vasquez
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
| | - Samir Mitri
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
| | - Shreya Bhasin
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
| | - Tina Bharani
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Kathryn Capasso
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
| | - Michelle Haslinger
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
| | - Ranjna Sharma
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
| | - Ted A James
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
| |
Collapse
|
17
|
Pompili D, Richa Y, Collins P, Richards H, Hennessey DB. Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models. World J Urol 2024; 42:455. [PMID: 39073590 PMCID: PMC11286728 DOI: 10.1007/s00345-024-05146-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2024] [Accepted: 06/23/2024] [Indexed: 07/30/2024] Open
Abstract
PURPOSE Large language models (LLMs) are a form of artificial intelligence (AI) that uses deep learning techniques to understand, summarize, and generate content. The potential benefits of LLMs in healthcare are predicted to be immense. The objective of this study was to examine the quality of patient information leaflets (PILs) produced by three LLMs on urological topics. METHODS Prompts were created to generate PILs from three LLMs: ChatGPT-4, PaLM 2 (Google Bard), and Llama 2 (Meta), across four urology topics (circumcision, nephrectomy, overactive bladder syndrome, and transurethral resection of the prostate (TURP)). PILs were evaluated using a quality assessment checklist, and PIL readability was assessed by the Average Reading Level Consensus Calculator. RESULTS PILs generated by PaLM 2 had the highest overall average quality score (3.58), followed by Llama 2 (3.34) and ChatGPT-4 (3.08). PILs generated by PaLM 2 were of the highest quality on all topics except TURP, and PaLM 2 was the only LLM to include images. Medical inaccuracies were present in all generated content, including instances of significant error. Readability analysis identified PaLM 2-generated PILs as the simplest (age 14-15 average reading level), while Llama 2 PILs were the most difficult (age 16-17 average). CONCLUSION While LLMs can generate PILs that may help reduce healthcare professional workload, generated content requires clinician input for accuracy and for inclusion of health literacy aids, such as images. LLM-generated PILs were above the average reading level for adults, necessitating improvement in LLM algorithms and/or prompt design. Patient satisfaction with LLM-generated PILs remains to be evaluated.
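The study generated PILs by prompting each LLM, but the actual prompts are not reproduced in the abstract. Below is a hypothetical sketch of a prompt template that pins a target reading level, which the conclusion suggests is a key lever for prompt design; the wording and the placeholder `llm.generate` call are assumptions, not the study's method.

```python
# Hypothetical prompt template; the study's actual prompts are not published here.
PIL_PROMPT = """You are a urology patient educator.
Write a patient information leaflet about {topic}.
Requirements:
- Target a 6th-grade reading level (short sentences, common words).
- Cover: what the procedure is, risks, benefits, recovery, when to seek help.
- End with a reminder to discuss questions with the clinical team."""

topics = ["circumcision", "nephrectomy",
          "overactive bladder syndrome",
          "transurethral resection of the prostate (TURP)"]

for topic in topics:
    prompt = PIL_PROMPT.format(topic=topic)
    # response = llm.generate(prompt)  # placeholder for any LLM client call
    print(f"Prompt prepared for: {topic}")
```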
Collapse
Affiliation(s)
- David Pompili
- School of Medicine, University College Cork, Cork, Ireland
| | - Yasmina Richa
- School of Medicine, University College Cork, Cork, Ireland
| | - Patrick Collins
- Department of Urology, Mercy University Hospital, Cork, Ireland
| | - Helen Richards
- School of Medicine, University College Cork, Cork, Ireland
- Department of Clinical Psychology, Mercy University Hospital, Cork, Ireland
| | - Derek B Hennessey
- School of Medicine, University College Cork, Cork, Ireland.
- Department of Urology, Mercy University Hospital, Cork, Ireland.
| |
Collapse
|
18
|
Li J, Guan Z, Wang J, Cheung CY, Zheng Y, Lim LL, Lim CC, Ruamviboonsuk P, Raman R, Corsino L, Echouffo-Tcheugui JB, Luk AOY, Chen LJ, Sun X, Hamzah H, Wu Q, Wang X, Liu R, Wang YX, Chen T, Zhang X, Yang X, Yin J, Wan J, Du W, Quek TC, Goh JHL, Yang D, Hu X, Nguyen TX, Szeto SKH, Chotcomwongse P, Malek R, Normatova N, Ibragimova N, Srinivasan R, Zhong P, Huang W, Deng C, Ruan L, Zhang C, Zhang C, Zhou Y, Wu C, Dai R, Koh SWC, Abdullah A, Hee NKY, Tan HC, Liew ZH, Tien CSY, Kao SL, Lim AYL, Mok SF, Sun L, Gu J, Wu L, Li T, Cheng D, Wang Z, Qin Y, Dai L, Meng Z, Shu J, Lu Y, Jiang N, Hu T, Huang S, Huang G, Yu S, Liu D, Ma W, Guo M, Guan X, Yang X, Bascaran C, Cleland CR, Bao Y, Ekinci EI, Jenkins A, Chan JCN, Bee YM, Sivaprasad S, Shaw JE, Simó R, Keane PA, Cheng CY, Tan GSW, Jia W, Tham YC, Li H, Sheng B, Wong TY. Integrated image-based deep learning and language models for primary diabetes care. Nat Med 2024:10.1038/s41591-024-03139-8. [PMID: 39030266 DOI: 10.1038/s41591-024-03139-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2023] [Accepted: 06/18/2024] [Indexed: 07/21/2024]
Abstract
Primary diabetes care and diabetic retinopathy (DR) screening persist as major public health challenges due to a shortage of trained primary care physicians (PCPs), particularly in low-resource settings. Here, to bridge these gaps, we developed an integrated image-language system (DeepDR-LLM), combining a large language model (LLM module) and image-based deep learning (DeepDR-Transformer), to provide individualized diabetes management recommendations to PCPs. In a retrospective evaluation, the LLM module demonstrated comparable performance to PCPs and endocrinology residents when tested in English, and outperformed PCPs and had comparable performance to endocrinology residents in Chinese. For identifying referable DR, the average PCP's accuracy was 81.0% unassisted and 92.3% assisted by DeepDR-Transformer. Furthermore, we performed a single-center real-world prospective study deploying DeepDR-LLM. We compared diabetes management adherence of patients under the unassisted PCP arm (n = 397) with those under the PCP+DeepDR-LLM arm (n = 372). Patients with newly diagnosed diabetes in the PCP+DeepDR-LLM arm showed better self-management behaviors throughout follow-up (P < 0.05). For patients with referable DR, those in the PCP+DeepDR-LLM arm were more likely to adhere to DR referrals (P < 0.01). Additionally, DeepDR-LLM deployment improved the quality and empathy level of management recommendations. Given its multifaceted performance, DeepDR-LLM holds promise as a digital solution for enhancing primary diabetes care and DR screening.
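The abstract describes the DeepDR-Transformer and LLM modules but not their internal interface. One plausible integration pattern, sketched below under that assumption, is to serialize the image module's structured outputs into the LLM prompt; all field names and the 0.5 threshold are illustrative, not taken from the paper.

```python
# Hypothetical glue code: the two modules are described in the abstract,
# but their interface is not, so everything below is illustrative only.
def build_prompt(dr_probability: float, hba1c: float, on_insulin: bool) -> str:
    referable = dr_probability >= 0.5  # assumed operating threshold
    findings = (
        f"- Referable diabetic retinopathy: {'yes' if referable else 'no'} "
        f"(image-model probability {dr_probability:.2f})\n"
        f"- HbA1c: {hba1c:.1f}%\n"
        f"- Current insulin therapy: {'yes' if on_insulin else 'no'}"
    )
    return (
        "You assist a primary care physician with diabetes management.\n"
        f"Structured findings:\n{findings}\n"
        "Suggest individualized management steps and state whether an "
        "ophthalmology referral is indicated."
    )

print(build_prompt(dr_probability=0.87, hba1c=8.4, on_insulin=False))
```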
Collapse
Affiliation(s)
- Jiajia Li
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Zhouyu Guan
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Jing Wang
- Department of Ophthalmology, Huadong Sanatorium, Wuxi, China
| | - Carol Y Cheung
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
| | - Yingfeng Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
| | - Lee-Ling Lim
- Department of Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Cynthia Ciwei Lim
- Department of Renal Medicine, Singapore General Hospital, SingHealth-Duke Academic Medical Centre, Singapore, Singapore
| | - Paisan Ruamviboonsuk
- Faculty of Medicine, Department of Ophthalmology, Rajavithi Hospital, College of Medicine, Rangsit University, Bangkok, Thailand
| | - Rajiv Raman
- Shri Bhagwan Mahavir Vitreoretinal Services, Medical Research Foundation, Sankara Nethralaya, Chennai, India
| | - Leonor Corsino
- Department of Medicine, Division of Endocrinology, Metabolism and Nutrition, and Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA
| | - Justin B Echouffo-Tcheugui
- Department of Medicine, Division of Endocrinology, Diabetes and Metabolism, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Andrea O Y Luk
- Department of Medicine and Therapeutics, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Asia Diabetes Foundation, Hong Kong Special Administrative Region, China
| | - Li Jia Chen
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
| | - Xiaodong Sun
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Haslina Hamzah
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
| | - Qiang Wu
- Department of Ophthalmology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Xiangning Wang
- Department of Ophthalmology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Ruhan Liu
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Ya Xing Wang
- Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital Medical University, Beijing Ophthalmology and Visual Sciences Key Laboratory, Beijing, China
| | - Tingli Chen
- Department of Ophthalmology, Huadong Sanatorium, Wuxi, China
| | - Xiao Zhang
- The People's Hospital of Sixian County, Anhui, China
| | - Xiaolong Yang
- Department of Ophthalmology, Huadong Sanatorium, Wuxi, China
| | - Jun Yin
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Jing Wan
- Department of Endocrinology and Metabolism, Shanghai Eighth People's Hospital, Shanghai, China
| | - Wei Du
- Department of Endocrinology and Metabolism, Shanghai Eighth People's Hospital, Shanghai, China
| | - Ten Cheer Quek
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
| | - Jocelyn Hui Lin Goh
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
| | - Dawei Yang
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
| | - Xiaoyan Hu
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
| | - Truong X Nguyen
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
| | - Simon K H Szeto
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
| | - Peranut Chotcomwongse
- Faculty of Medicine, Department of Ophthalmology, Rajavithi Hospital, College of Medicine, Rangsit University, Bangkok, Thailand
| | - Rachid Malek
- Department of Internal Medicine, Setif University Ferhat Abbas, Setif, Algeria
| | - Nargiza Normatova
- Ophthalmology Department at Tashkent Advanced Training Institute for Doctors, Tashkent, Uzbekistan
| | - Nilufar Ibragimova
- Charity Union of Persons with Disabilities and People with Diabetes UMID, Tashkent, Uzbekistan
| | - Ramyaa Srinivasan
- Shri Bhagwan Mahavir Vitreoretinal Services, Medical Research Foundation, Sankara Nethralaya, Chennai, India
| | - Pingting Zhong
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
| | - Wenyong Huang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
| | - Chenxin Deng
- Department of Geriatrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Lei Ruan
- Department of Geriatrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Cuntai Zhang
- Department of Geriatrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Chenxi Zhang
- Department of Ophthalmology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Yan Zhou
- Department of Ophthalmology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Chan Wu
- Department of Ophthalmology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Rongping Dai
- Department of Ophthalmology, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Sky Wei Chee Koh
- National University Polyclinics, National University Health System, Department of Family Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Adina Abdullah
- Department of Primary Care Medicine, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia
| | | | - Hong Chang Tan
- Department of Endocrinology, Singapore General Hospital, Singapore, Singapore
| | - Zhong Hong Liew
- Department of Renal Medicine, Singapore General Hospital, SingHealth-Duke Academic Medical Centre, Singapore, Singapore
| | - Carolyn Shan-Yeu Tien
- Department of Renal Medicine, Singapore General Hospital, SingHealth-Duke Academic Medical Centre, Singapore, Singapore
| | - Shih Ling Kao
- Division of Endocrinology, University Medicine Cluster, National University Health System, Singapore, Singapore
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Amanda Yuan Ling Lim
- Division of Endocrinology, University Medicine Cluster, National University Health System, Singapore, Singapore
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Shao Feng Mok
- Division of Endocrinology, University Medicine Cluster, National University Health System, Singapore, Singapore
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Lina Sun
- Department of Internal Medicine, Huadong Sanatorium, Wuxi, China
| | - Jing Gu
- Department of Internal Medicine, Huadong Sanatorium, Wuxi, China
| | - Liang Wu
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Tingyao Li
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Di Cheng
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Zheyuan Wang
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Yiming Qin
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Ling Dai
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Ziyao Meng
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Jia Shu
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Yuwei Lu
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Nan Jiang
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Tingting Hu
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Shan Huang
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Gengyou Huang
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Shujie Yu
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Dan Liu
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Weizhi Ma
- Institute for AI Industry Research, Tsinghua University, Beijing, China
| | - Minyi Guo
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Xinping Guan
- Department of Automation and the Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaokang Yang
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Covadonga Bascaran
- International Centre for Eye Health, London School of Hygiene and Tropical Medicine, University of London, London, UK
| | - Charles R Cleland
- International Centre for Eye Health, London School of Hygiene and Tropical Medicine, University of London, London, UK
| | - Yuqian Bao
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
| | - Elif I Ekinci
- Department of Endocrinology, Austin Health, Melbourne, Victoria, Australia
- Department of Medicine, The University of Melbourne (Austin Health), Melbourne, Victoria, Australia
- Australian Centre for Accelerating Diabetes Innovations, The University of Melbourne, Parkville, Victoria, Australia
| | - Alicia Jenkins
- Australian Centre for Accelerating Diabetes Innovations, The University of Melbourne, Parkville, Victoria, Australia
- Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- NHMRC Clinical Trials Centre, University of Sydney, Sydney, New South Wales, Australia
| | - Juliana C N Chan
- Department of Medicine and Therapeutics, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Asia Diabetes Foundation, Hong Kong Special Administrative Region, China
| | - Yong Mong Bee
- Department of Endocrinology, Singapore General Hospital, Singapore, Singapore
| | - Sobha Sivaprasad
- NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK
| | - Jonathan E Shaw
- Department of Medicine, The University of Melbourne (Austin Health), Melbourne, Victoria, Australia
| | - Rafael Simó
- Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas, Instituto de Salud Carlos III, Madrid, Spain
- Diabetes and Metabolism Research Unit, Vall d'Hebron Research Institut, Autonomous University of Barcelona, Barcelona, Spain
| | - Pearse A Keane
- NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK
- Institute of Ophthalmology, University College London, London, UK
| | - Ching-Yu Cheng
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Center for Innovation and Precision Eye Health and Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Gavin Siew Wei Tan
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
| | - Weiping Jia
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China.
| | - Yih-Chung Tham
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore.
- Center for Innovation and Precision Eye Health and Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Ophthalmology and Visual Science Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore.
| | - Huating Li
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China.
| | - Bin Sheng
- Shanghai Belt and Road International Joint Laboratory of Intelligent Prevention and Treatment for Metabolic Diseases, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China.
- MOE Key Laboratory of AI, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China.
| | - Tien Yin Wong
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore.
- School of Clinical Medicine, Tsinghua Medicine, Tsinghua University, Beijing, China.
- Beijing Tsinghua Changgung Hospital, Beijing, China.
- Zhongshan Ophthalmic Center, Guangzhou, China.
| |
Collapse
|
19
|
Tan DNH, Tham YC, Koh V, Loon SC, Aquino MC, Lun K, Cheng CY, Ngiam KY, Tan M. Evaluating Chatbot responses to patient questions in the field of glaucoma. Front Med (Lausanne) 2024; 11:1359073. [PMID: 39050528 PMCID: PMC11267485 DOI: 10.3389/fmed.2024.1359073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Accepted: 06/20/2024] [Indexed: 07/27/2024] Open
Abstract
Objective The aim of this study was to evaluate the accuracy, comprehensiveness, and safety of a publicly available large language model (LLM), ChatGPT, in the subdomain of glaucoma. Design Evaluation of diagnostic test or technology. Subjects, participants, and/or controls We evaluated the responses of an artificial intelligence chatbot, ChatGPT (version GPT-3.5, OpenAI). Methods, intervention, or testing We curated 24 clinically relevant questions in the domain of glaucoma. The questions spanned four categories: diagnosis, treatment, surgeries, and ocular emergencies. Each question was posed to the LLM, and the responses obtained were graded by an expert panel of three glaucoma specialists with combined experience of more than 30 years in the field. For responses that performed poorly, the LLM was further prompted to self-correct, and the subsequent responses were re-evaluated by the expert panel. Main outcome measures Accuracy, comprehensiveness, and safety of the responses of a public-domain LLM. Results With 24 questions and three expert graders, there were n = 72 responses in total. Scores ranged from 1 to 4, where 4 represents the best score with a complete and accurate response. The mean score of the expert panel was 3.29 with a standard deviation of 0.484. Of the 24 question-response pairs, seven (29.2%) had a mean inter-grader score of 3 or less; these were given a chance to self-correct. The mean score of these seven question-response pairs was 2.96, which rose to 3.58 after the opportunity to self-correct (z = -3.27, p = 0.001, Mann-Whitney U). After self-correction, the proportion of responses obtaining a full score increased from 22/72 (30.6%) to 12/21 (57.1%) (p = 0.026, χ2 test). Conclusion LLMs show great promise in the realm of glaucoma, with additional capabilities of self-correction. The application of LLMs in glaucoma is still in its infancy and requires further research and validation.
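The abstract's two comparisons can be reproduced in shape with standard tests from scipy. The per-grader scores are not published, so the Mann-Whitney U example below uses hypothetical ratings; the chi-square example uses the full-score counts actually reported (22/72 vs 12/21).

```python
from scipy.stats import mannwhitneyu, chi2_contingency

# Hypothetical per-grader scores (1-4) for the seven weak responses
# (3 graders x 7 responses), before and after self-correction; the
# abstract reports only the means (2.96 -> 3.58), not the raw scores.
before = [3, 3, 2, 3, 3, 3, 3, 3, 4, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
after  = [4, 4, 3, 4, 4, 3, 4, 4, 4, 3, 3, 4, 4, 3, 4, 4, 4, 3, 4, 4, 3]
u_stat, p = mannwhitneyu(before, after, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.0f}, p = {p:.3f}")

# Chi-square on the full-score proportions reported in the abstract:
# 22/72 before self-correction vs 12/21 after.
table = [[22, 72 - 22], [12, 21 - 12]]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```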
Collapse
Affiliation(s)
| | - Yih-Chung Tham
- Centre of Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore, Singapore
| | - Victor Koh
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
| | - Seng Chee Loon
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
| | | | - Katherine Lun
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
| | - Ching-Yu Cheng
- Centre of Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore, Singapore
| | - Kee Yuan Ngiam
- Division of General Surgery (Endocrine & Thyroid Surgery), Department of Surgery, National University Hospital, Singapore, Singapore
| | - Marcus Tan
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
| |
Collapse
|
20
|
Rodrigues Alessi M, Gomes HA, Lopes de Castro M, Terumy Okamoto C. Performance of ChatGPT in Solving Questions From the Progress Test (Brazilian National Medical Exam): A Potential Artificial Intelligence Tool in Medical Practice. Cureus 2024; 16:e64924. [PMID: 39156244 PMCID: PMC11330648 DOI: 10.7759/cureus.64924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/19/2024] [Indexed: 08/20/2024] Open
Abstract
Background The use of artificial intelligence (AI) is not a recent phenomenon, but the latest advancements in this technology are making a significant impact across various fields of human knowledge. In medicine, this trend is no different, although it has developed at a slower pace. ChatGPT is an example of an AI-based algorithm capable of answering questions, interpreting phrases, and synthesizing complex information, potentially aiding and even replacing humans in various areas of social interest. Some studies have compared its performance on medical knowledge exams with that of medical students and professionals to verify its accuracy. This study aimed to measure the performance of ChatGPT in answering questions from the Progress Test from 2021 to 2023. Methodology An observational study was conducted in which questions from the 2021 Progress Test and the regional tests (Southern Institutional Pedagogical Support Center II) of 2022 and 2023 were presented to ChatGPT 3.5. The results obtained were compared with the scores of first- to sixth-year medical students from over 120 Brazilian universities. All questions were presented sequentially, without any modification to their structure. After each question, the platform's history was cleared and the site was restarted. Results The platform achieved average accuracy rates of 69.7%, 68.3%, and 67.2% in 2021, 2022, and 2023, respectively, surpassing students from all medical years in the three tests evaluated and reinforcing findings in the current literature. The subject on which the AI scored best was Public Health, with a mean score of 77.8%. Conclusions ChatGPT demonstrated the ability to answer medical questions with higher accuracy than humans, including students in the last year of medical school.
Collapse
Affiliation(s)
| | - Heitor A Gomes
- School of Medicine, Universidade Positivo, Curitiba, BRA
| | | | | |
Collapse
|
21
|
Yang Z, Wang D, Zhou F, Song D, Zhang Y, Jiang J, Kong K, Liu X, Qiao Y, Chang RT, Han Y, Li F, Tham CC, Zhang X. Understanding natural language: Potential application of large language models to ophthalmology. Asia Pac J Ophthalmol (Phila) 2024; 13:100085. [PMID: 39059558 DOI: 10.1016/j.apjo.2024.100085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2024] [Revised: 06/19/2024] [Accepted: 07/19/2024] [Indexed: 07/28/2024] Open
Abstract
Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation, similar to convolutional neural networks. The advance of the transformer architecture in generative artificial intelligence marks a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of parameters and training data (terabytes), LLMs exhibit remarkably human-like interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well suited to roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the trajectory of LLMs and their potential implications for clinicians and patients. For clinicians, LLMs can be used for automated medical documentation, and, given better inputs and extensive validation, they may be able to diagnose and treat autonomously in the future. For patient care, LLMs can be used for triage suggestions, summarization of medical documents, explanation of a patient's condition, and customization of patient education materials tailored to the patient's comprehension level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, this review attempts to briefly cover the many roles that LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.
Collapse
Affiliation(s)
- Zefeng Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Deming Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Fengqi Zhou
- Ophthalmology, Mayo Clinic Health System, Eau Claire, Wisconsin, USA
| | - Diping Song
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Yinhang Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Jiaxuan Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Kangjie Kong
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Xiaoyi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Yu Qiao
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Robert T Chang
- Department of Ophthalmology, Byers Eye Institute at Stanford University, Palo Alto, CA, USA
| | - Ying Han
- Department of Ophthalmology, University of California, San Francisco, San Francisco, CA, USA
| | - Fei Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
| | - Clement C Tham
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China; Hong Kong Eye Hospital, Kowloon, Hong Kong SAR, China; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital, Shatin, Hong Kong SAR, China.
| | - Xiulan Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
| |
Collapse
|
22
|
Mihalache A, Huang RS, Patil NS, Popovic MM, Lee WW, Yan P, Cruz-Pimentel M, Muni RH. Chatbot and Academy Preferred Practice Pattern Guidelines on Retinal Diseases. Ophthalmol Retina 2024; 8:723-725. [PMID: 38499086 DOI: 10.1016/j.oret.2024.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 02/24/2024] [Accepted: 03/12/2024] [Indexed: 03/20/2024]
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
| | - Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Wei Wei Lee
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Peng Yan
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada; Toronto Western Hospital, University Health Network, University of Toronto, Toronto, Ontario, Canada; Department of Ophthalmology, Kensington Vision and Research Center, Toronto, Ontario, Canada
| | - Miguel Cruz-Pimentel
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada; Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada.
| |
Collapse
|
23
|
Feng X, Xu K, Luo MJ, Chen H, Yang Y, He Q, Song C, Li R, Wu Y, Wang H, Tham YC, Ting DSW, Lin H, Wong TY, Lam DSC. Latest developments of generative artificial intelligence and applications in ophthalmology. Asia Pac J Ophthalmol (Phila) 2024; 13:100090. [PMID: 39128549 DOI: 10.1016/j.apjo.2024.100090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2024] [Revised: 07/30/2024] [Accepted: 08/07/2024] [Indexed: 08/13/2024] Open
Abstract
The emergence of generative artificial intelligence (AI) has revolutionized various fields. In ophthalmology, generative AI has the potential to enhance efficiency, accuracy, personalization, and innovation in clinical practice and medical research by processing data, streamlining medical documentation, facilitating patient-doctor communication, aiding clinical decision-making, and simulating clinical trials. This review focuses on the development and integration of generative AI models into the clinical workflows and scientific research of ophthalmology. It outlines the need for a standard framework for comprehensive assessment, robust evidence, and exploration of the potential of multimodal capabilities and intelligent agents. Additionally, the review addresses the risks of AI model development and application in the clinical service and research of ophthalmology, including data privacy, data bias, adaptation friction, overdependence, and job replacement, and summarizes a risk management framework to mitigate these concerns. This review highlights the transformative potential of generative AI in enhancing patient care and improving operational efficiency in the clinical service and research of ophthalmology, and advocates for a balanced approach to its adoption.
Collapse
Affiliation(s)
- Xiaoru Feng
- School of Biomedical Engineering, Tsinghua Medicine, Tsinghua University, Beijing, China; Institute for Hospital Management, Tsinghua Medicine, Tsinghua University, Beijing, China
| | - Kezheng Xu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
| | - Ming-Jie Luo
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
| | - Haichao Chen
- School of Clinical Medicine, Beijing Tsinghua Changgung Hospital, Tsinghua Medicine, Tsinghua University, Beijing, China
| | - Yangfan Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
| | - Qi He
- Research Centre of Big Data and Artificial Research for Medicine, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
| | - Chenxin Song
- Research Centre of Big Data and Artificial Research for Medicine, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
| | - Ruiyao Li
- Research Centre of Big Data and Artificial Research for Medicine, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
| | - You Wu
- Institute for Hospital Management, Tsinghua Medicine, Tsinghua University, Beijing, China; School of Basic Medical Sciences, Tsinghua Medicine, Tsinghua University, Beijing, China; Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
| | - Haibo Wang
- Research Centre of Big Data and Artificial Research for Medicine, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China.
| | - Yih Chung Tham
- Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Ophthalmology and Visual Science Academic Clinical Program, Duke-NUS Medical School, Singapore
| | - Daniel Shu Wei Ting
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Ophthalmology and Visual Science Academic Clinical Program, Duke-NUS Medical School, Singapore; Byers Eye Institute, Stanford University, Palo Alto, CA, USA
| | - Haotian Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China; Center for Precision Medicine and Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China; Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, China
| | - Tien Yin Wong
- School of Clinical Medicine, Beijing Tsinghua Changgung Hospital, Tsinghua Medicine, Tsinghua University, Beijing, China; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Tsinghua Medicine, Tsinghua University, Beijing, China
| | - Dennis Shun-Chiu Lam
- The International Eye Research Institute, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China; The C-MER International Eye Care Group, Hong Kong, Hong Kong, China
| |
Collapse
|
24
|
Shemer A, Cohen M, Altarescu A, Atar-Vardi M, Hecht I, Dubinsky-Pertzov B, Shoshany N, Zmujack S, Or L, Einan-Lifshitz A, Pras E. Diagnostic capabilities of ChatGPT in ophthalmology. Graefes Arch Clin Exp Ophthalmol 2024; 262:2345-2352. [PMID: 38183467 DOI: 10.1007/s00417-023-06363-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 12/04/2023] [Accepted: 12/23/2023] [Indexed: 01/08/2024] Open
Abstract
PURPOSE The purpose of this study was to assess the diagnostic accuracy of ChatGPT in the field of ophthalmology. METHODS This retrospective cohort study was conducted in one academic tertiary medical center. We reviewed data of patients admitted to the ophthalmology department from 06/2022 to 01/2023 and created two clinical cases for each patient: the first based on the medical history alone (Hx), and the second adding the clinical examination findings (Hx and Ex). For each case, we asked ChatGPT, residents, and attendings for the three most likely diagnoses, then compared the accuracy rates (at least one correct diagnosis) across all groups. We also evaluated the total time each group took to complete the assignment. RESULTS ChatGPT, residents, and attendings evaluated 126 cases from 63 patients (history only, or history and examination findings, for each patient). ChatGPT achieved a significantly lower rate of accurate diagnosis (54%) on the Hx cases than the residents (75%; p < 0.01) and attendings (71%; p < 0.01). After adding the clinical examination findings, the diagnosis rate of ChatGPT rose to 68%, whereas for the residents and attendings it increased to 94% (p < 0.01) and 86% (p < 0.01), respectively. ChatGPT was 4 to 5 times faster than the attendings and residents. CONCLUSIONS AND RELEVANCE ChatGPT showed lower diagnostic accuracy in ophthalmology cases than residents and attendings, whether based on patient history alone or with additional clinical examination findings. However, ChatGPT completed the task faster than the physicians.
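The abstract compares accuracy rates between groups without naming its statistical test. As one option (an assumption, not the authors' method), a two-proportion z-test on the Hx-only rates could look like the sketch below; a paired test such as McNemar's would be preferable if per-case results were available.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Accuracy on history-only (Hx) cases per the abstract: ChatGPT 54%,
# residents 75%, each over the same 63 cases. Counts are reconstructed
# by rounding, since raw counts are not reported.
n = 63
counts = np.array([round(0.54 * n), round(0.75 * n)])  # [ChatGPT, residents]
nobs = np.array([n, n])

z, p = proportions_ztest(counts, nobs)
print(f"ChatGPT vs residents (Hx): z = {z:.2f}, p = {p:.4f}")
```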
Collapse
Affiliation(s)
- Asaf Shemer
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel.
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
| | - Michal Cohen
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Health Science, Ben-Gurion University of the Negev, South District, Beer-Sheva, Israel
| | - Aya Altarescu
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Maya Atar-Vardi
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Idan Hecht
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Biana Dubinsky-Pertzov
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Nadav Shoshany
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Sigal Zmujack
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Lior Or
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Adi Einan-Lifshitz
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Eran Pras
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- The Matlow's Ophthalmo-Genetics Laboratory, Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
| |
Collapse
|
25
|
Durmaz Engin C, Karatas E, Ozturk T. Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity. CHILDREN (BASEL, SWITZERLAND) 2024; 11:750. [PMID: 38929329 PMCID: PMC11202218 DOI: 10.3390/children11060750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/22/2024] [Revised: 06/02/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
BACKGROUND Large language models (LLMs) are becoming increasingly important as they are used more frequently to provide medical information. We aimed to evaluate the effectiveness of artificial intelligence (AI)-based LLMs, such as ChatGPT-4, BingAI, and Gemini, in responding to patient inquiries about retinopathy of prematurity (ROP). METHODS The LLMs' answers to fifty real-life patient inquiries were assessed by three ophthalmologists using a 5-point Likert scale. The models' responses were also evaluated for reliability with the DISCERN instrument and the EQIP framework, and for readability using the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index. RESULTS ChatGPT-4 outperformed BingAI and Gemini, scoring the maximum of 5 points on 90% (45 of 50) of responses and achieving ratings of "agreed" or "strongly agreed" on 98% (49 of 50). It led in accuracy and reliability, with DISCERN and EQIP scores of 63 and 72.2, respectively. BingAI followed with scores of 53 and 61.1, while Gemini was noted for the best readability (FRE score of 39.1) but lower reliability scores. Statistically significant performance differences were observed particularly in the screening, diagnosis, and treatment categories. CONCLUSION ChatGPT-4 excelled in providing detailed and reliable responses to ROP-related queries, although its texts were more complex. All models delivered generally accurate information as per DISCERN and EQIP assessments.
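The three readability indices cited in this abstract are published closed-form formulas over sentence, word, syllable, and letter counts. Below is a minimal sketch; the vowel-group syllable counter is a rough heuristic (production tools use pronunciation dictionaries), and the sample sentence is an arbitrary placeholder, not text from the study.

```python
# Minimal sketch of the three readability formulas named above. The syllable
# counter is a naive vowel-group heuristic; real tools use dictionary lookups.
import re

def counts(text: str):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words)
    letters = sum(len(w) for w in words)
    return sentences, len(words), syllables, letters

def flesch_reading_ease(text: str) -> float:
    s, w, syl, _ = counts(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text: str) -> float:
    s, w, syl, _ = counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59

def coleman_liau_index(text: str) -> float:
    s, w, _, letters = counts(text)
    L = letters / w * 100  # average letters per 100 words
    S = s / w * 100        # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

sample = "Retinopathy of prematurity is an eye disease in premature infants."
print(flesch_reading_ease(sample), flesch_kincaid_grade(sample), coleman_liau_index(sample))
```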
Collapse
Affiliation(s)
- Ceren Durmaz Engin
- Department of Ophthalmology, Izmir Democracy University, Buca Seyfi Demirsoy Education and Research Hospital, Izmir 35390, Turkey
- Department of Biomedical Technologies, Faculty of Engineering, Dokuz Eylul University, Izmir 35390, Turkey
| | - Ezgi Karatas
- Department of Ophthalmology, Agri Ibrahim Cecen University, Agri 04200, Turkey;
| | - Taylan Ozturk
- Department of Ophthalmology, Izmir Tinaztepe University, Izmir 35400, Turkey;
| |
Collapse
|
26
|
Nguyen TP, Carvalho B, Sukhdeo H, Joudi K, Guo N, Chen M, Wolpaw JT, Kiefer JJ, Byrne M, Jamroz T, Mootz AA, Reale SC, Zou J, Sultan P. Comparison of artificial intelligence large language model chatbots in answering frequently asked questions in anaesthesia. BJA OPEN 2024; 10:100280. [PMID: 38764485 PMCID: PMC11099318 DOI: 10.1016/j.bjao.2024.100280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/26/2023] [Accepted: 03/20/2024] [Indexed: 05/21/2024]
Abstract
Background Patients are increasingly using artificial intelligence (AI) chatbots to seek answers to medical queries. Methods Ten frequently asked questions in anaesthesia were posed to three AI chatbots: ChatGPT4 (OpenAI), Bard (Google), and Bing Chat (Microsoft). Each chatbot's answers were evaluated in a randomised, blinded order by five residency programme directors from 15 medical institutions in the USA. Three medical content quality categories (accuracy, comprehensiveness, safety) and three communication quality categories (understandability, empathy/respect, and ethics) were scored between 1 and 5 (1 representing worst, 5 representing best). Results ChatGPT4 and Bard outperformed Bing Chat (median [inter-quartile range] scores: 4 [3-4], 4 [3-4], and 3 [2-4], respectively; P<0.001 with all metrics combined). All AI chatbots performed poorly in accuracy (score of ≥4 by 58%, 48%, and 36% of experts for ChatGPT4, Bard, and Bing Chat, respectively), comprehensiveness (score ≥4 by 42%, 30%, and 12% of experts for ChatGPT4, Bard, and Bing Chat, respectively), and safety (score ≥4 by 50%, 40%, and 28% of experts for ChatGPT4, Bard, and Bing Chat, respectively). Notably, answers from ChatGPT4, Bard, and Bing Chat differed statistically in comprehensiveness (ChatGPT4, 3 [2-4] vs Bing Chat, 2 [2-3], P<0.001; and Bard 3 [2-4] vs Bing Chat, 2 [2-3], P=0.002). All large language model chatbots performed well with no statistical difference for understandability (P=0.24), empathy (P=0.032), and ethics (P=0.465). Conclusions In answering anaesthesia patient frequently asked questions, the chatbots perform well on communication metrics but are suboptimal for medical content metrics. Overall, ChatGPT4 and Bard were comparable to each other, both outperforming Bing Chat.
Collapse
Affiliation(s)
- Teresa P. Nguyen
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| | - Brendan Carvalho
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| | - Hannah Sukhdeo
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| | - Kareem Joudi
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| | - Nan Guo
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| | - Marianne Chen
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| | - Jed T. Wolpaw
- Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Jesse J. Kiefer
- Department of Anesthesiology and Critical Care Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
| | - Melissa Byrne
- Department of Anesthesiology, Perioperative and Pain Medicine, University of Michigan Ann Arbor School of Medicine, Ann Arbor, MI, USA
| | - Tatiana Jamroz
- Department of Anesthesiology, Perioperative and Pain Medicine, Cleveland Clinic Foundation and Hospitals, Cleveland, OH, USA
| | - Allison A. Mootz
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard School of Medicine, Boston, MA, USA
| | - Sharon C. Reale
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard School of Medicine, Boston, MA, USA
| | - James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Pervez Sultan
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
| |
Collapse
|
27
|
Shiraishi M, Tomioka Y, Miyakuni A, Moriwaki Y, Yang R, Oba J, Okazaki M. Generating Informed Consent Documents Related to Blepharoplasty Using ChatGPT. Ophthalmic Plast Reconstr Surg 2024; 40:316-320. [PMID: 38133626 DOI: 10.1097/iop.0000000000002574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2023]
Abstract
PURPOSE This study aimed to evaluate the performance of the popular artificial intelligence (AI) language model, Chat Generative Pre-trained Transformer (ChatGPT) (OpenAI, San Francisco, CA, U.S.A.), in generating an informed consent (IC) document for blepharoplasty. METHODS Two prompts were provided to ChatGPT to generate IC documents. Four board-certified plastic surgeons and four nonmedical staff members evaluated the AI-generated IC documents and the original IC document currently used in the clinical setting, assessing accuracy, informativeness, and accessibility. RESULTS Among board-certified plastic surgeons, the initial AI-generated IC document scored significantly lower than the original IC document in accuracy (p < 0.001), informativeness (p = 0.005), and accessibility (p = 0.021), while the revised AI-generated IC document scored lower than the original document in accuracy (p = 0.03) and accessibility (p = 0.021). Among nonmedical staff members, neither AI-generated IC document differed significantly from the original document in accuracy, informativeness, or accessibility. CONCLUSIONS The results showed that ChatGPT in its current form cannot be used as a standalone patient education resource. However, it has the potential to produce better IC documents if its use of professional terminology improves. This AI technology will eventually transform ophthalmic plastic surgery healthcare by enhancing patient education and decision-making via IC documents.
Collapse
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
| | | | | | | | | | | | | |
Collapse
|
28
|
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. [PMID: 38172581 PMCID: PMC11076576 DOI: 10.1038/s41433-023-02915-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 11/23/2023] [Accepted: 12/20/2023] [Indexed: 01/05/2024] Open
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing "AI-plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been released in the past few years. We discuss recent literature evaluating the role of these language models in medicine, with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are being generated rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Collapse
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | | | - Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
| | - Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
| |
Collapse
|
29
|
Momenaei B, Mansour HA, Kuriyan AE, Xu D, Sridhar J, Ting DSW, Yonekawa Y. ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management. Curr Opin Ophthalmol 2024; 35:205-209. [PMID: 38334288 DOI: 10.1097/icu.0000000000001036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/10/2024]
Abstract
PURPOSE OF REVIEW This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology in addition to exploring the limitations and ethical considerations associated with its application. RECENT FINDINGS ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting research productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across various ophthalmic disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. This proves beneficial for patients in accessing information and aids physicians in triaging as well as formulating differential diagnoses. Despite such benefits, ChatGPT has limitations that require acknowledgment including the potential risk of offering inaccurate or harmful information, dependence on outdated data, the necessity for a high level of education for data comprehension, and concerns regarding patient privacy and ethical considerations within the research domain. SUMMARY ChatGPT is a promising new tool that could contribute to ophthalmic healthcare education and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role with human expert oversight.
Collapse
Affiliation(s)
- Bita Momenaei
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Hana A Mansour
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Ajay E Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Jayanth Sridhar
- University of California Los Angeles, Los Angeles, California, USA
| | | | - Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| |
Collapse
|
30
|
Mihalache A, Huang RS, Popovic MM, Muni RH. Artificial intelligence chatbot and Academy Preferred Practice Pattern® Guidelines on cataract and glaucoma. J Cataract Refract Surg 2024; 50:534-535. [PMID: 38468154 DOI: 10.1097/j.jcrs.0000000000001317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2023] [Accepted: 09/08/2023] [Indexed: 03/13/2024]
Affiliation(s)
- Andrew Mihalache
- From the Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada (Mihalache, Huang); Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada (Popovic, Muni); Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada (Muni)
| | | | | | | |
Collapse
|
31
|
Biswas S, Davies LN, Sheppard AL, Logan NS, Wolffsohn JS. Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt 2024; 44:641-671. [PMID: 38404172 DOI: 10.1111/opo.13284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2023] [Revised: 01/23/2024] [Accepted: 01/25/2024] [Indexed: 02/27/2024]
Abstract
PURPOSE With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs in ophthalmic care, and how their abilities compare with those of their human counterparts, remains under-reported. RECENT FINDINGS Hitherto, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, suggesting clinical diagnoses and passing ophthalmology question-based examinations, among other tasks. LLMs' performance (median accuracy, %) is influenced by factors such as the model iteration, the prompts utilised and the domain. Human experts (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT-4 outperformed others in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI-based LLMs like ChatGPT are deemed more efficient than their human counterparts, these AIs are constrained by their nonspecific and outdated training, lack of access to current knowledge, generation of plausible-sounding 'fake' responses or hallucinations, inability to process images, lack of critical literature analysis, and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of these AI-based LLMs and their potential. SUMMARY Ophthalmic care professionals should take a conservative approach when using AI, as human judgement remains essential for clinical decision-making and for monitoring the accuracy of information. This review identified the ophthalmic applications and potential uses that need further exploration. With the advancement of LLMs, setting standards for benchmarking and promoting best practices is crucial. Potential clinical deployment requires evaluation of these LLMs beyond artificial settings, through clinical trials, to determine their usefulness in the real world.
Collapse
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| |
Collapse
|
32
|
Knebel D, Priglinger S, Scherer N, Klaas J, Siedlecki J, Schworm B. Assessment of ChatGPT in the Prehospital Management of Ophthalmological Emergencies - An Analysis of 10 Fictional Case Vignettes. Klin Monbl Augenheilkd 2024; 241:675-681. [PMID: 37890504 DOI: 10.1055/a-2149-0447] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/29/2023]
Abstract
BACKGROUND The artificial intelligence (AI)-based platform ChatGPT (Chat Generative Pre-Trained Transformer, OpenAI LP, San Francisco, CA, USA) has gained impressive popularity in recent months. Its performance on case vignettes of general medical (non-ophthalmological) emergencies has been assessed, with very encouraging results. The purpose of this study was to assess the performance of ChatGPT on ophthalmological emergency case vignettes in terms of the main outcome measures: triage accuracy, appropriateness of recommended prehospital measures, and overall potential to inflict harm on the user/patient. METHODS We wrote ten short, fictional case vignettes describing different acute ophthalmological symptoms. Each vignette was entered into ChatGPT five times with the same wording, following a standardized interaction pathway. The answers were analyzed following a systematic approach. RESULTS We observed a triage accuracy of 93.6%. Most answers contained only appropriate recommendations for prehospital measures. However, an overall potential to inflict harm on users/patients was present in 32% of answers. CONCLUSION ChatGPT should presently not be used as a stand-alone primary source of information about acute ophthalmological symptoms. As AI continues to evolve, its safety and efficacy in the prehospital management of ophthalmological emergencies must be reassessed regularly.
Collapse
Affiliation(s)
- Dominik Knebel
- Department of Ophthalmology, University Hospital, Ludwigs-Maximilians-Universität München, München, Germany
| | - Siegfried Priglinger
- Department of Ophthalmology, University Hospital, Ludwigs-Maximilians-Universität München, München, Germany
| | - Nicolas Scherer
- Department of Ophthalmology, University Hospital, Ludwigs-Maximilians-Universität München, München, Germany
| | - Julian Klaas
- Department of Ophthalmology, University Hospital, Ludwigs-Maximilians-Universität München, München, Germany
| | - Jakob Siedlecki
- Department of Ophthalmology, University Hospital, Ludwigs-Maximilians-Universität München, München, Germany
| | - Benedikt Schworm
- Department of Ophthalmology, University Hospital, Ludwigs-Maximilians-Universität München, München, Germany
| |
Collapse
|
33
|
Kaftan AN, Hussain MK, Naser FH. Response accuracy of ChatGPT 3.5, Copilot, and Gemini in interpreting biochemical laboratory data: a pilot study. Sci Rep 2024; 14:8233. [PMID: 38589613 PMCID: PMC11002004 DOI: 10.1038/s41598-024-58964-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Accepted: 04/05/2024] [Indexed: 04/10/2024] Open
Abstract
With the release of ChatGPT at the end of 2022, a new era of thinking and technology use began. Artificial intelligence (AI) models such as Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to impact every aspect of our lives, including laboratory data interpretation. This study aimed to assess the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Ten simulated patients' biochemical laboratory data, including serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein (LDL-c), and high-density lipoprotein (HDL-c), in addition to HbA1c, were interpreted by the three AIs, and their outputs were evaluated by three raters. The study was carried out using two approaches: the first encompassed all biochemical data, while the second contained only kidney function data. In the first approach, Copilot had the highest level of accuracy, followed by Gemini and ChatGPT-3.5. The Friedman test with Dunn's post-hoc analysis revealed that Copilot had the highest mean rank; pairwise comparisons showed significant differences for Copilot vs. ChatGPT-3.5 (P = 0.002) and vs. Gemini (P = 0.008). In the second approach, Copilot again had the highest accuracy, with the highest mean rank on the Friedman test with Dunn's post-hoc analysis. The Wilcoxon signed-rank test showed no difference (P = 0.5) in Copilot's responses between the full laboratory dataset and the kidney-function-only data. Copilot is more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5, and its consistent responses across different data subsets highlight its reliability in this context.
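The rank-based tests named in this abstract are standard and available in scipy; the sketch below applies them to made-up Likert ratings (placeholders, not the study's data). Dunn's post-hoc pairwise test is not in scipy itself but is available in, for example, the scikit-posthocs package.

```python
# Minimal sketch of the omnibus and paired tests named above, on made-up
# accuracy ratings (NOT the study's data). Dunn's post-hoc pairwise test is
# available separately, e.g. scikit_posthocs.posthoc_dunn.
from scipy.stats import friedmanchisquare, wilcoxon

# Ratings of the same 10 simulated cases under each model (placeholder values).
copilot = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]
gemini  = [4, 4, 3, 4, 4, 3, 4, 4, 3, 4]
chatgpt = [3, 4, 3, 3, 4, 3, 3, 4, 3, 3]

stat, p = friedmanchisquare(copilot, gemini, chatgpt)
print(f"Friedman: chi2 = {stat:.2f}, p = {p:.4f}")

# Paired comparison of one model's scores under two input conditions.
all_data    = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]
kidney_only = [5, 3, 4, 5, 4, 4, 4, 5, 4, 4]
stat, p = wilcoxon(all_data, kidney_only)
print(f"Wilcoxon signed-rank: W = {stat:.1f}, p = {p:.4f}")
```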
Collapse
|
34
|
Braun EM, Juhasz-Böss I, Solomayer EF, Truhn D, Keller C, Heinrich V, Braun BJ. Will I soon be out of my job? Quality and guideline conformity of ChatGPT therapy suggestions to patient inquiries with gynecologic symptoms in a palliative setting. Arch Gynecol Obstet 2024; 309:1543-1549. [PMID: 37975899 DOI: 10.1007/s00404-023-07272-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2023] [Accepted: 10/15/2023] [Indexed: 11/19/2023]
Abstract
PURPOSE The market and application possibilities for artificial intelligence are growing at high speed and are increasingly finding their way into gynecology. While the medical side is well represented in the current literature, the patient's perspective still lags behind. Therefore, the aim of this study was to have experts evaluate ChatGPT's recommendations in response to patient inquiries about possible therapy for leading gynecologic symptoms in a palliative setting. METHODS Case vignettes were constructed for 10 common concomitant symptoms of gynecologic oncology tumors in a palliative setting, and patient queries regarding therapy of these symptoms were generated as prompts for ChatGPT. Five experts in palliative care and gynecologic oncology evaluated the responses with respect to guideline adherence and applicability and identified advantages and disadvantages. RESULTS The overall rating of ChatGPT responses averaged 4.1 (5 = strongly agree; 1 = strongly disagree). The experts rated the guideline conformity of the therapy recommendations at an average of 4.0. ChatGPT sometimes omits relevant therapies and does not provide an individual assessment of the suggested therapies, but it does indicate that a physician consultation is additionally necessary. CONCLUSIONS In its freely available and thus, in principle, patient-accessible version, a language model such as ChatGPT can provide valid and largely guideline-compliant therapy recommendations. However, a medical expert's opinion remains indispensable for a complete therapy recommendation, for evaluating and individually adjusting the suggested therapies, and for filtering out possible incorrect recommendations.
Collapse
Affiliation(s)
- Eva-Marie Braun
- Center for Integrative Oncology, Die Filderklinik, Im Haberschlai 7, 70794, Filderstadt-Bonlanden, Germany.
| | - Ingolf Juhasz-Böss
- Department of Gynecology, University Medical Center Freiburg, Hugstetter Straße 55, 79106, Freiburg, Germany
| | - Erich-Franz Solomayer
- Department of Gynecology, Obstetrics and Reproductive Medicine, Saarland University Hospital, Kirrberger Straße, Building 9, 66421, Homburg, Germany
| | - Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Pauwelsstraße 30, 52074, Aachen, Germany
| | - Christiane Keller
- Center for Palliative Medicine and Pediatric Pain Therapy, Saarland University Hospital, Kirrberger Straße, Building 69, 66421, Homburg, Germany
| | - Vanessa Heinrich
- Department of Radiation Oncology, University Hospital Tübingen, Crona Kliniken, Hoppe-Seyler-Str. 3, 72076, Tübingen, Germany
| | - Benedikt Johannes Braun
- Department of Trauma and Reconstructive Surgery at the Eberhard Karls University Tübingen, BG Unfallklinik Tübingen, Schnarrenbergstrasse 95, 72076, Tübingen, Germany
| |
Collapse
|
35
|
Teixeira-Marques F, Medeiros N, Nazaré F, Alves S, Lima N, Ribeiro L, Gama R, Oliveira P. Exploring the role of ChatGPT in clinical decision-making in otorhinolaryngology: a ChatGPT designed study. Eur Arch Otorhinolaryngol 2024; 281:2023-2030. [PMID: 38345613 DOI: 10.1007/s00405-024-08498-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Accepted: 01/23/2024] [Indexed: 03/16/2024]
Abstract
PURPOSE Since the beginning of 2023, ChatGPT has emerged as a hot topic in healthcare research. Its potential as a valuable tool in clinical practice is compelling, particularly for improving clinical decision support by helping physicians make decisions based on the best medical knowledge available. We aimed to investigate ChatGPT's ability to identify, diagnose, and manage patients with otorhinolaryngology-related symptoms. METHODS A prospective, cross-sectional study was designed, based on an idea suggested by ChatGPT, to assess the level of agreement between ChatGPT and five otorhinolaryngologists (ENTs) on 20 reality-inspired clinical cases. The clinical cases were presented to the chatbot on two different occasions (ChatGPT-1 and ChatGPT-2) to assess its temporal stability. RESULTS The mean score of ChatGPT-1 was 4.4 (SD 1.2; min 1, max 5) and of ChatGPT-2 was 4.15 (SD 1.3; min 1, max 5), while the ENTs' mean score was 4.91 (SD 0.3; min 3, max 5). The Mann-Whitney U test revealed a statistically significant difference (p < 0.001) between each ChatGPT run's scores and the ENTs' scores. ChatGPT-1 and ChatGPT-2 gave different answers on five occasions. CONCLUSIONS Artificial intelligence will be an important instrument in clinical decision-making in the near future, and ChatGPT is the most promising chatbot so far. Although it needs further development to be used safely, there is room for improvement and potential to aid otorhinolaryngology residents and specialists in making the most correct decision for the patient.
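The Mann-Whitney U test reported here compares two independent samples of ordinal scores; a minimal sketch of the scipy call on placeholder Likert scores (not the study's data) follows.

```python
# Minimal sketch of a Mann-Whitney U test on Likert-scale agreement scores.
# The score vectors are placeholders, NOT the study's data.
from scipy.stats import mannwhitneyu

chatgpt_scores = [5, 4, 5, 3, 5, 5, 4, 5, 1, 5, 5, 4, 5, 5, 3, 5, 5, 4, 5, 5]
ent_scores     = [5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 3, 5, 5]

u, p = mannwhitneyu(chatgpt_scores, ent_scores, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")
```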
Collapse
Affiliation(s)
- Francisco Teixeira-Marques
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal.
| | - Nuno Medeiros
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Francisco Nazaré
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Sandra Alves
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Nuno Lima
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Leandro Ribeiro
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Rita Gama
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| | - Pedro Oliveira
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
| |
Collapse
|
36
|
Mark J, Subhi Y. Blinded by Stress: A Patient and Physician Perspective on Central Serous Chorioretinopathy. Ophthalmol Ther 2024; 13:861-866. [PMID: 38386185 PMCID: PMC10912400 DOI: 10.1007/s40123-024-00907-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 02/05/2024] [Indexed: 02/23/2024] Open
Abstract
This commentary is co-authored by a patient with central serous chorioretinopathy (CSC), which is the fourth most common exudative maculopathy. The patient, a young and profiled member of the Danish Parliament, kindly shares his experience living with stress, onset of symptoms, and the experience of being diagnosed with CSC and receiving photodynamic treatment. The experiences of the patient are put into perspective by an ophthalmologist.
Collapse
Affiliation(s)
- Jacob Mark
- Patient Author, Christiansborg, Copenhagen, Denmark
| | - Yousif Subhi
- Department of Clinical Research, University of Southern Denmark, J.B. Winsløws Vej 19.3, 5000, Odense C, Denmark.
- Department of Ophthalmology, Zealand University Hospital, Roskilde, Denmark.
- Department of Ophthalmology, Rigshospitalet, Glostrup, Denmark.
| |
Collapse
|
37
|
Gurnani B, Kaur K. Leveraging ChatGPT for ophthalmic education: A critical appraisal. Eur J Ophthalmol 2024; 34:323-327. [PMID: 37974429 DOI: 10.1177/11206721231215862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2023]
Abstract
In recent years, the advent of artificial intelligence (AI) has transformed many sectors, including medical education. This editorial critically appraises the integration of ChatGPT, a state-of-the-art AI language model, into ophthalmic education, focusing on its potential, limitations, and ethical considerations. The application of ChatGPT in teaching and training ophthalmologists presents an innovative method to offer real-time, customized learning experiences. Through a systematic analysis of both experimental and clinical data, this editorial examines how ChatGPT enhances engagement, understanding, and retention of complex ophthalmological concepts. The study also evaluates the efficacy of ChatGPT in simulating patient interactions and clinical scenarios, which can foster improved diagnostic and interpersonal skills. Despite the promising advantages, concerns regarding reliability, lack of personal touch, and potential biases in the AI-generated content are scrutinized. Ethical considerations concerning data privacy and potential misuse are also explored. The findings underline the need for carefully designed integration, continuous evaluation, and adherence to ethical guidelines to maximize benefits while mitigating risks. By shedding light on these multifaceted aspects, this paper contributes to the ongoing discourse on the incorporation of AI in medical education, offering valuable insights and guidance for educators, practitioners, and policymakers aiming to leverage modern technology for enhancing ophthalmic education.
Collapse
Affiliation(s)
- Bharat Gurnani
- Cataract, Cornea, Trauma, External Diseases, Ocular Surface and Refractive Services, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
| | - Kirandeep Kaur
- Cataract, Pediatric Ophthalmology and Strabismus, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Children Eye Care Centre, Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
| |
Collapse
|
38
|
Wei Q, Yao Z, Cui Y, Wei B, Jin Z, Xu X. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform 2024; 151:104620. [PMID: 38462064 DOI: 10.1016/j.jbi.2024.104620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 02/27/2024] [Accepted: 02/29/2024] [Indexed: 03/12/2024]
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and provide direction for future research. METHODS An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data were extracted on general study characteristics, question sources, conversation processes, assessment metrics, and the performance of ChatGPT. An evaluation framework for LLMs in medical inquiries was proposed by integrating insights from the selected literature. This study is registered with PROSPERO, CRD42023456327. RESULTS A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%; I² = 87%) in addressing medical queries. However, the studies varied in question source, question-asking process, and evaluation metrics. As per our proposed evaluation framework, many studies failed to report methodological details, such as the date of inquiry, the version of ChatGPT, and inter-rater consistency. CONCLUSION This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of study designs and insufficient reporting may affect the reliability of the results. Our proposed evaluation framework provides insights for the future study design and transparent reporting of LLMs in responding to medical questions.
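The pooled accuracy and I² reported here come from a random-effects meta-analysis of proportions. Below is a generic DerSimonian-Laird sketch, not the authors' exact pipeline: the per-study counts are made up, and raw proportions are pooled for simplicity, whereas published analyses often logit- or arcsine-transform proportions first.

```python
# Generic DerSimonian-Laird random-effects pooling of proportions with I².
# Per-study counts are made up; this is NOT the review's data or exact method.
import numpy as np

events = np.array([55, 120, 30, 80, 45])    # correct answers per study
totals = np.array([100, 200, 60, 150, 70])  # questions per study

p = events / totals
v = p * (1 - p) / totals       # per-study variance (raw proportion scale)
w = 1 / v                      # fixed-effect (inverse-variance) weights

p_fixed = np.sum(w * p) / np.sum(w)
Q = np.sum(w * (p - p_fixed) ** 2)       # Cochran's Q
df = len(p) - 1
I2 = max(0.0, (Q - df) / Q) * 100        # heterogeneity, %

tau2 = max(0.0, (Q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_re = 1 / (v + tau2)                    # random-effects weights
p_re = np.sum(w_re * p) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
lo, hi = p_re - 1.96 * se_re, p_re + 1.96 * se_re

print(f"Pooled accuracy = {p_re:.1%} (95% CI {lo:.1%}-{hi:.1%}), I² = {I2:.0f}%")
```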
Collapse
Affiliation(s)
- Qiuhong Wei
- Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China; Children Nutrition Research Center, Children's Hospital of Chongqing Medical University, Chongqing, China; National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Child Neurodevelopment and Cognitive Disorders, Chongqing, China
| | - Zhengxiong Yao
- Department of Neurology, Children's Hospital of Chongqing Medical University, Chongqing, China
| | - Ying Cui
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Bo Wei
- Department of Global Statistics and Data Science, BeiGene USA Inc., San Mateo, CA, USA
| | - Zhezhen Jin
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
| | - Ximing Xu
- Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
| |
Collapse
|
39
|
Nikdel M, Ghadimi H, Tavakoli M, Suh DW. Assessment of the Responses of the Artificial Intelligence-based Chatbot ChatGPT-4 to Frequently Asked Questions About Amblyopia and Childhood Myopia. J Pediatr Ophthalmol Strabismus 2024; 61:86-89. [PMID: 37882183 DOI: 10.3928/01913913-20231005-02] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/27/2023]
Abstract
PURPOSE To assess the responses of ChatGPT-4, the forerunner artificial intelligence-based chatbot, to frequently asked questions regarding two common pediatric ophthalmologic disorders, amblyopia and childhood myopia. METHODS Twenty-seven questions about amblyopia and 28 questions about childhood myopia were each asked of ChatGPT twice (110 questions in total). The responses were evaluated by two pediatric ophthalmologists as acceptable, incomplete, or unacceptable. RESULTS There was remarkable agreement (96.4%) between the two pediatric ophthalmologists in their assessment of the responses. ChatGPT provided acceptable responses to 93 of 110 (84.6%) questions in total (44 of 54 [81.5%] questions on amblyopia and 49 of 56 [87.5%] questions on childhood myopia). Seven of 54 (12.9%) responses to questions on amblyopia were graded as incomplete, compared with 4 of 56 (7.1%) responses to questions on childhood myopia. ChatGPT gave inappropriate responses to three questions each about amblyopia (5.6%) and childhood myopia (5.4%). The most noticeable inappropriate responses concerned the definition of reverse amblyopia and the refractive error threshold for prescribing spectacles to children with myopia. CONCLUSIONS ChatGPT has the potential to serve as an adjunct informational tool for pediatric ophthalmology patients and their caregivers, demonstrating relatively good performance by acceptably answering 84.6% of the most frequently asked questions about amblyopia and childhood myopia. [J Pediatr Ophthalmol Strabismus. 2024;61(2):86-89.].
Collapse
|
40
|
Yalla GR, Hyman N, Hock LE, Zhang Q, Shukla AG, Kolomeyer NN. Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures. Cureus 2024; 16:e56766. [PMID: 38650824 PMCID: PMC11034394 DOI: 10.7759/cureus.56766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/23/2024] [Indexed: 04/25/2024] Open
Abstract
Introduction With the potential for artificial intelligence (AI) chatbots to serve as the primary source of glaucoma information to patients, it is essential to characterize the information that chatbots provide such that providers can tailor discussions, anticipate patient concerns, and identify misleading information. Therefore, the purpose of this study was to evaluate glaucoma information from AI chatbots, including ChatGPT-4, Bard, and Bing, by analyzing response accuracy, comprehensiveness, readability, word count, and character count in comparison to each other and to glaucoma-related American Academy of Ophthalmology (AAO) patient materials. Methods Section headers from AAO glaucoma-related patient education brochures were adapted into question form and posed to each AI chatbot (ChatGPT-4, Bard, and Bing) five times. Two sets of responses from each chatbot were used to evaluate (1) the accuracy of the AI chatbot responses and the AAO brochure information and (2) the comprehensiveness of the AI chatbot responses relative to the AAO brochure information, each scored 1-5 by three independent glaucoma-trained ophthalmologists. Readability (assessed with the Flesch-Kincaid Grade Level (FKGL), corresponding to United States school grade levels), word count, and character count were determined for all chatbot responses and AAO brochure sections. Results Accuracy scores for AAO, ChatGPT, Bing, and Bard were 4.84, 4.26, 4.53, and 3.53, respectively. On direct comparison, AAO was more accurate than ChatGPT (p=0.002), and Bard was the least accurate (Bard versus AAO, p<0.001; Bard versus ChatGPT, p<0.002; Bard versus Bing, p=0.001). ChatGPT had the most comprehensive responses (ChatGPT versus Bing, p<0.001; ChatGPT versus Bard, p=0.008), with comprehensiveness scores for ChatGPT, Bing, and Bard at 3.32, 2.16, and 2.79, respectively. AAO information and Bard responses were at the most accessible readability levels (AAO versus ChatGPT, AAO versus Bing, Bard versus ChatGPT, Bard versus Bing, all p<0.0001), with readability levels for AAO, ChatGPT, Bing, and Bard at 8.11, 13.01, 11.73, and 7.90, respectively. Bing responses had the lowest word and character counts. Conclusion AI chatbot responses varied in accuracy, comprehensiveness, and readability. With accuracy and comprehensiveness scores below those of the AAO brochures and elevated readability levels, AI chatbots require improvements to be a more useful supplementary source of glaucoma information for patients. Physicians must be aware of these limitations such that patients are asked about their existing knowledge and questions and are then provided with clarifying and comprehensive information.
Collapse
Affiliation(s)
- Goutham R Yalla
- Department of Ophthalmology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Nicholas Hyman
- Department of Ophthalmology, Vagelos College of Physicians and Surgeons, Columbia University, New York, USA
- Department of Ophthalmology, Glaucoma Division, Columbia University Irving Medical Center, New York, USA
| | - Lauren E Hock
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Qiang Zhang
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
- Biostatistics Consulting Core, Vickie and Jack Farber Vision Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Aakriti G Shukla
- Department of Ophthalmology, Glaucoma Division, Columbia University Irving Medical Center, New York, USA
| | | |
Collapse
|
41
|
Høj S, Thomsen SF, Meteran H, Sigsgaard T, Meteran H. Artificial intelligence and allergic rhinitis: does ChatGPT increase or impair the knowledge? J Public Health (Oxf) 2024; 46:123-126. [PMID: 37968109 DOI: 10.1093/pubmed/fdad219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Revised: 09/14/2023] [Accepted: 10/06/2023] [Indexed: 11/17/2023] Open
Abstract
BACKGROUND Optimal management of allergic rhinitis requires patient education with easy access to accurate information. However, previous online platforms have provided misleading information. The demand for online medical information continues to grow, especially with the introduction of advanced chatbots like ChatGPT. METHODS This study aimed to evaluate the quality of information provided by ChatGPT regarding allergic rhinitis. A Likert scale was used to assess the accuracy of responses, ranging from 1 to 5. Four authors independently rated the responses from a healthcare professional's perspective. RESULTS A total of 20 questions covering various aspects of allergic rhinitis were asked. Among the answers, eight received a score of 5 (no inaccuracies), five received a score of 4 (minor non-harmful inaccuracies), six received a score of 3 (potentially misinterpretable inaccuracies) and one answer had a score of 2 (minor potentially harmful inaccuracies). CONCLUSIONS The variability in accuracy scores highlights the need for caution when relying solely on chatbots like ChatGPT for medical advice. Patients should consult qualified healthcare professionals and use online sources as a supplement. While ChatGPT has advantages in medical information delivery, its use should be approached with caution. ChatGPT can be useful for patient education but cannot replace healthcare professionals.
Collapse
Affiliation(s)
- Simon Høj
- Steno Diabetes Center Copenhagen, Copenhagen University Hospital, Herlev 2730, Denmark
- Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Copenhagen 2400, Denmark
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
| | - Simon F Thomsen
- Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Copenhagen 2400, Denmark
- Department of Biomedical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
| | - Hanieh Meteran
- Department of Internal Medicine, Section of Endocrinology, Copenhagen University Hospital-Hvidovre, Hvidovre 2650, Denmark
| | - Torben Sigsgaard
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
| | - Howraman Meteran
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
- Department of Internal Medicine, Respiratory Medicine Section, Copenhagen University Hospital-Hvidovre, Hvidovre 2650, Denmark
- Department of Respiratory Medicine, Zealand University Hospital Roskilde-Næstved, Næstved 4700, Denmark
| |
Collapse
|
42
|
Marshall RF, Mallem K, Xu H, Thorne J, Burkholder B, Chaon B, Liberman P, Berkenstock M. Investigating the Accuracy and Completeness of an Artificial Intelligence Large Language Model About Uveitis: An Evaluation of ChatGPT. Ocul Immunol Inflamm 2024:1-4. [PMID: 38394625 DOI: 10.1080/09273948.2024.2317417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Accepted: 02/06/2024] [Indexed: 02/25/2024]
Abstract
PURPOSE To assess the accuracy and completeness of ChatGPT-generated answers regarding uveitis description, prevention, treatment, and prognosis. METHODS Thirty-two uveitis-related questions were generated by a uveitis specialist and entered into ChatGPT 3.5. The answers were compiled into a survey and reviewed by five uveitis specialists using standardized Likert scales of accuracy and completeness. RESULTS In total, the median accuracy score for all the uveitis questions (n = 32) was 4.00 (between "more correct than incorrect" and "nearly all correct"), and the median completeness score was 2.00 ("adequate; addresses all aspects of the question and provides the minimum amount of information required to be considered complete"). The interrater variability assessment yielded a total kappa value of 0.0278 for accuracy and 0.0847 for completeness. CONCLUSION ChatGPT can provide relatively accurate responses to various questions related to uveitis; however, its answers are incomplete, with some inaccuracies. Its utility in providing medical information requires further validation and development before it can serve as a source of uveitis information for patients.
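The interrater kappa values reported here involve five raters, which is consistent with a multi-rater (Fleiss-type) kappa; a minimal sketch using statsmodels follows, with a random placeholder ratings matrix standing in for the specialists' actual scores.

```python
# Minimal sketch: Fleiss' kappa for five raters' Likert scores on 32 items.
# The ratings matrix is random placeholder data, NOT the study's ratings.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
# rows = 32 questions, columns = 5 raters, values = Likert categories 3-5
ratings = rng.integers(3, 6, size=(32, 5))

table, _ = aggregate_raters(ratings)   # subjects x categories count table
kappa = fleiss_kappa(table, method='fleiss')
print(f"Fleiss' kappa = {kappa:.3f}")  # near 0 for random (chance-level) data
```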
Collapse
Affiliation(s)
- Rayna F Marshall
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
| | - Krishna Mallem
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
| | - Hannah Xu
- University of California San Diego, San Diego, California, USA
| | - Jennifer Thorne
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Bryn Burkholder
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Benjamin Chaon
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Paulina Liberman
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Meghan Berkenstock
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| |
Collapse
|
43
|
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of currently demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24.0%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
Collapse
|
44
|
Hatia A, Doldo T, Parrini S, Chisci E, Cipriani L, Montagna L, Lagana G, Guenza G, Agosta E, Vinjolli F, Hoxha M, D’Amelio C, Favaretto N, Chisci G. Accuracy and Completeness of ChatGPT-Generated Information on Interceptive Orthodontics: A Multicenter Collaborative Study. J Clin Med 2024; 13:735. [PMID: 38337430 PMCID: PMC10856539 DOI: 10.3390/jcm13030735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Revised: 01/21/2024] [Accepted: 01/25/2024] [Indexed: 02/12/2024] Open
Abstract
Background: This study aims to investigate the accuracy and completeness of ChatGPT in answering questions and solving clinical scenarios of interceptive orthodontics. Materials and Methods: Ten specialized orthodontists from ten Italian postgraduate orthodontics schools developed 21 clinical open-ended questions encompassing all of the subspecialties of interceptive orthodontics, and 7 comprehensive clinical cases. Questions and scenarios were entered into ChatGPT-4, and the resulting answers were evaluated by the researchers using predefined accuracy (range 1-6) and completeness (range 1-3) Likert scales. Results: For the open-ended questions, the overall median score was 4.9/6 for accuracy and 2.4/3 for completeness. In addition, the reviewers rated the accuracy of open-ended answers as entirely correct (score 6 on the Likert scale) in 40.5% of cases and the completeness as entirely correct (score 3 on the Likert scale) in 50.5% of cases. As for the clinical cases, the overall median score was 4.9/6 for accuracy and 2.5/3 for completeness. Overall, the reviewers rated the accuracy of clinical case answers as entirely correct in 46% of cases and the completeness of clinical case answers as entirely correct in 54.3% of cases. Conclusions: The results showed a high level of accuracy and completeness in the AI's responses and a great ability to solve difficult clinical cases, but the answers were not 100% accurate and complete. ChatGPT is not yet sophisticated enough to replace the intellectual work of human beings.
Collapse
Affiliation(s)
- Arjeta Hatia
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Tiziana Doldo
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Stefano Parrini
- Oral Surgery Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy;
| | - Elettra Chisci
- Orthodontics Postgraduate School, University of Ferrara, 44121 Ferrara, Italy
| | - Linda Cipriani
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Livia Montagna
- Orthodontics Postgraduate School, University of Cagliari, 09121 Cagliari, Italy;
| | - Giuseppina Lagana
- Orthodontics Postgraduate School, “Sapienza” University of Rome, 00185 Rome, Italy;
| | - Guia Guenza
- Orthodontics Postgraduate School, University of Milano, 20019 Milan, Italy
| | - Edoardo Agosta
- Orthodontics Postgraduate School, University of Torino, 10024 Turin, Italy
| | - Franceska Vinjolli
- Orthodontics Postgraduate School, University of Roma Tor Vergata, 00133 Rome, Italy;
| | - Meladiona Hoxha
- Orthodontics Postgraduate School, “Cattolica” University of Rome, 00168 Rome, Italy;
| | - Claudio D’Amelio
- Orthodontics Postgraduate School, University of Chieti, 66100 Chieti, Italy;
| | - Nicolò Favaretto
- Orthodontics Postgraduate School, University of Trieste, 34100 Trieste, Italy
| | - Glauco Chisci
- Oral Surgery Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy;
| |
Collapse
|
45
|
Gravina AG, Pellegrino R, Cipullo M, Palladino G, Imperio G, Ventura A, Auletta S, Ciamarra P, Federico A. May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients' questions? An evidence-controlled analysis. World J Gastroenterol 2024; 30:17-33. [PMID: 38293321 PMCID: PMC10823903 DOI: 10.3748/wjg.v30.i1.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Revised: 12/07/2023] [Accepted: 12/28/2023] [Indexed: 01/06/2024] Open
Abstract
Artificial intelligence is increasingly entering everyday healthcare. Large language model (LLM) systems such as Chat Generative Pre-trained Transformer (ChatGPT) have become potentially accessible to everyone, including patients with inflammatory bowel diseases (IBD). However, significant ethical issues and pitfalls exist in innovative LLM tools. The hype generated by such systems may lead patients to place unwarranted trust in them. Therefore, it is necessary to understand whether LLMs (trendy ones, such as ChatGPT) can produce plausible medical information (MI) for patients. This review examined ChatGPT's potential to provide MI regarding questions commonly posed by patients with IBD to their gastroenterologists. A review of the outputs provided by ChatGPT showed that the tool has some attractive potential but also significant limitations in the currency and detail of its information, and it provided inaccurate information in some cases. Further studies and refinement of ChatGPT, possibly aligning its outputs with the leading medical evidence provided by reliable databases, are needed.
Affiliation(s)
- Antonietta Gerarda Gravina
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Raffaele Pellegrino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Marina Cipullo
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Giovanna Palladino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Giuseppe Imperio
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Andrea Ventura
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Salvatore Auletta
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Paola Ciamarra
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Alessandro Federico
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
46
Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140 DOI: 10.1097/aog.0000000000005446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/12/2023] [Indexed: 11/12/2023]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about capabilities, limitations, and risks. This article presents an overview of ChatGPT, and it highlights the current state of this technology for the medical field. The article seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education. It also provides suggestions on how to optimize the use of this tool.
47
Biswas S, Logan NS, Davies LN, Sheppard AL, Wolffsohn JS. Authors' Reply: Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia. Ophthalmic Physiol Opt 2024; 44:233-234. [PMID: 37635297 DOI: 10.1111/opo.13227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 08/17/2023] [Indexed: 08/29/2023]
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
48
Chang LC, Sun CC, Chen TH, Tsai DC, Lin HL, Liao LL. Evaluation of the quality and readability of ChatGPT responses to frequently asked questions about myopia in traditional Chinese language. Digit Health 2024; 10:20552076241277021. [PMID: 39229462 PMCID: PMC11369861 DOI: 10.1177/20552076241277021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2024] [Accepted: 08/05/2024] [Indexed: 09/05/2024] Open
Abstract
Introduction ChatGPT can serve as an adjunct informational tool for ophthalmologists and their patients. However, the reliability and readability of its responses to myopia-related queries in the Chinese language remain underexplored. Purpose This study aimed to evaluate the ability of ChatGPT to address frequently asked questions (FAQs) posed by parents and caregivers about myopia. Method Myopia-related FAQs were input three times into fresh ChatGPT sessions, and the responses were evaluated by 10 ophthalmologists using a Likert scale for appropriateness, usability, and clarity. The Chinese Readability Index Explorer (CRIE) was used to evaluate the readability of each response. Inter-rater reliability among the reviewers was examined using Cohen's kappa coefficient, and Spearman's rank correlation analysis and one-way analysis of variance were used to investigate the relationship between CRIE scores and each criterion. Results Forty-five percent of ChatGPT's Chinese-language responses were appropriate and usable, and only 35% met all the set criteria. The CRIE scores for the 20 ChatGPT responses ranged from 7.29 to 12.09, indicating a readability level equivalent to middle-to-high school. Responses about treatment efficacy and side effects were deficient on all three criteria. Conclusions The performance of ChatGPT in addressing pediatric myopia-related questions is currently suboptimal. As parents increasingly utilize digital resources to obtain health information, it has become crucial for eye care professionals to familiarize themselves with artificial intelligence-driven information on pediatric myopia.
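For readers less familiar with the statistics named in the Method section, the sketch below shows how Cohen's kappa, Spearman's rank correlation, and a one-way ANOVA are typically computed with standard Python libraries. All data here are randomly generated stand-ins, not the study's ratings; only the 20-response count and the reported CRIE range are borrowed from the abstract.

```python
import numpy as np
from scipy.stats import spearmanr, f_oneway
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)

# Two raters judging 20 responses as appropriate (1) or not (0).
rater_a = rng.integers(0, 2, size=20)
rater_b = rng.integers(0, 2, size=20)
print("Cohen's kappa:", cohen_kappa_score(rater_a, rater_b))

# CRIE readability scores (abstract reports a 7.29-12.09 range)
# against a mean Likert rating per response.
crie = rng.uniform(7.29, 12.09, size=20)
mean_rating = rng.uniform(1, 5, size=20)
rho, p = spearmanr(crie, mean_rating)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")

# One-way ANOVA comparing CRIE scores across three question categories.
group_1, group_2, group_3 = np.array_split(crie, 3)
f_stat, p = f_oneway(group_1, group_2, group_3)
print(f"ANOVA F = {f_stat:.2f}, p = {p:.3f}")
```

By convention, a kappa near 0 indicates chance-level agreement and values approaching 1 indicate near-perfect agreement.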
Affiliation(s)
- Li-Chun Chang
- School of Nursing, Chang Gung University of Science and Technology, Gueishan
- School of Nursing, College of Medicine, Chang Gung University, Tao-Yuan
- Department of Nursing, Linkou Chang Gung Memorial Hospital, Linkou
| | - Chi-Chin Sun
- Department of Ophthalmology, Chang Gung Memorial Hospital, Keelung
- School of Medicine, College of Medicine, Chang Gung University, Taoyuan
| | - Ting-Han Chen
- Chang Gung University of Science and Technology, Taoyuan City
| | - Der-Chong Tsai
- Faculty of Medicine, National Yang Ming Chiao Tung University, School of Medicine, Taipei
- Community Medicine Research Center and Institute of Public Health, National Yang Ming Chiao Tung University, Taipei
- Department of Ophthalmology, National Yang Ming Chiao Tung University Hospital, Yilan
| | - Hui-Ling Lin
- School of Nursing, Chang Gung University of Science and Technology, Gueishan
- School of Nursing, College of Medicine, Chang Gung University, Tao-Yuan
- Department of Nursing, Linkou Chang Gung Memorial Hospital, Linkou
- Taipei Medical University, Taipei
| | - Li-Ling Liao
- Department of Public Health, College of Health Science, Kaohsiung Medical University, Kaohsiung City
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung City
49
Ittarat M, Cheungpasitporn W, Chansangpetch S. Personalized Care in Eye Health: Exploring Opportunities, Challenges, and the Road Ahead for Chatbots. J Pers Med 2023; 13:1679. [PMID: 38138906 PMCID: PMC10744965 DOI: 10.3390/jpm13121679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/24/2023] Open
Abstract
In modern eye care, the adoption of ophthalmology chatbots stands out as a pivotal technological progression. These digital assistants present numerous benefits, such as better access to vital information, heightened patient interaction, and streamlined triaging. Recent evaluations have highlighted their performance in both the triage of ophthalmology conditions and ophthalmology knowledge assessment, underscoring their potential and areas for improvement. However, assimilating these chatbots into the prevailing healthcare infrastructures brings challenges. These encompass ethical dilemmas, legal compliance, seamless integration with electronic health records (EHR), and fostering effective dialogue with medical professionals. Addressing these challenges necessitates the creation of bespoke standards and protocols for ophthalmology chatbots. The horizon for these chatbots is illuminated by advancements and anticipated innovations, poised to redefine the delivery of eye care. The synergy of artificial intelligence (AI) and machine learning (ML) with chatbots amplifies their diagnostic prowess. Additionally, their capability to adapt linguistically and culturally ensures they can cater to a global patient demographic. In this article, we explore in detail the utilization of chatbots in ophthalmology, examining their accuracy, reliability, data protection, security, transparency, potential algorithmic biases, and ethical considerations. We provide a comprehensive review of their roles in the triage of ophthalmology conditions and knowledge assessment, emphasizing their significance and future potential in the field.
Affiliation(s)
- Mantapond Ittarat
- Surin Hospital and Surin Medical Education Center, Suranaree University of Technology, Surin 32000, Thailand;
| | | | - Sunee Chansangpetch
- Center of Excellence in Glaucoma, Chulalongkorn University, Bangkok 10330, Thailand;
- Department of Ophthalmology, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok 10330, Thailand
50
Kunze KN. Editorial Commentary: Recognizing and Avoiding Medical Misinformation Across Digital Platforms: Smoke, Mirrors (and Streaming). Arthroscopy 2023; 39:2454-2455. [PMID: 37981387 DOI: 10.1016/j.arthro.2023.06.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 06/27/2023] [Accepted: 06/30/2023] [Indexed: 11/21/2023]
Abstract
The evolution of social media and related online sources has substantially increased patients' ability to query and access publicly available information that may be relevant to a potential musculoskeletal condition of interest. Although increased accessibility to information has several purported benefits, including encouraging patients to become more invested in their care through self-teaching, a downside of the vast number of unregulated resources is the risk of misinformation. As health care providers, we have a moral and ethical obligation to mitigate this risk by directing patients to high-quality resources for medical information and by remaining aware of resources that are unreliable. To this end, a growing body of evidence suggests that YouTube lacks reliability and quality in terms of medical information concerning a variety of musculoskeletal conditions.