1. Naved BA, Luo Y. Contrasting rule and machine learning based digital self triage systems in the USA. NPJ Digit Med 2024;7:381. PMID: 39725711. DOI: 10.1038/s41746-024-01367-3.
Abstract
Patient smart access and self-triage systems have been in development for decades. Although no health system has yet published an LLM for processing self-reported patient data, many expert systems and computational models have been released to millions of users. This review is the first to summarize progress in the field, including an analysis of the exact self-triage solutions available on the websites of 647 health systems in the USA.
Affiliation(s)
- Bilal A Naved
- Department of Biomedical Engineering, Northwestern University McCormick School of Engineering, Chicago, IL, USA
- Department of Preventative Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Yuan Luo
- Department of Preventative Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA

2. Sabaner MC, Anguita R, Antaki F, Balas M, Boberg-Ans LC, Ferro Desideri L, Grauslund J, Hansen MS, Klefter ON, Potapenko I, Rasmussen MLR, Subhi Y. Opportunities and Challenges of Chatbots in Ophthalmology: A Narrative Review. J Pers Med 2024;14:1165. PMID: 39728077. DOI: 10.3390/jpm14121165.
Abstract
Artificial intelligence (AI) is becoming increasingly influential in ophthalmology, particularly through advancements in machine learning, deep learning, robotics, neural networks, and natural language processing (NLP). Among these, NLP-based chatbots are the most readily accessible and are driven by AI-based large language models (LLMs). These chatbots have facilitated new research avenues and have gained traction in both clinical and surgical applications in ophthalmology. They are also increasingly being utilized in studies on ophthalmology-related exams, particularly those containing multiple-choice questions (MCQs). This narrative review evaluates both the opportunities and the challenges of integrating chatbots into ophthalmology research, with separate assessments of studies involving open- and close-ended questions. While chatbots have demonstrated sufficient accuracy in handling MCQ-based studies, supporting their use in education, additional exam security measures are necessary. The research on open-ended question responses suggests that AI-based LLM chatbots could be applied across nearly all areas of ophthalmology. They have shown promise for addressing patient inquiries, offering medical advice, patient education, supporting triage, facilitating diagnosis and differential diagnosis, and aiding in surgical planning. However, the ethical implications, confidentiality concerns, physician liability, and issues surrounding patient privacy remain pressing challenges. Although AI has demonstrated significant promise in clinical patient care, it is currently most effective as a supportive tool rather than as a replacement for human physicians.
Affiliation(s)
- Mehmet Cem Sabaner
- Department of Ophthalmology, Kastamonu University, Training and Research Hospital, 37150 Kastamonu, Türkiye
- Rodrigo Anguita
- Department of Ophthalmology, Inselspital, University Hospital Bern, University of Bern, 3010 Bern, Switzerland
- Moorfields Eye Hospital National Health Service Foundation Trust, London EC1V 2PD, UK
- Fares Antaki
- Moorfields Eye Hospital National Health Service Foundation Trust, London EC1V 2PD, UK
- The CHUM School of Artificial Intelligence in Healthcare, Montreal, QC H2X 0A9, Canada
- Cole Eye Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Michael Balas
- Department of Ophthalmology & Vision Sciences, University of Toronto, Toronto, ON M5T 2S8, Canada
- Lorenzo Ferro Desideri
- Department of Ophthalmology, Inselspital, University Hospital Bern, University of Bern, 3010 Bern, Switzerland
- Graduate School for Health Sciences, University of Bern, 3012 Bern, Switzerland
- Jakob Grauslund
- Department of Ophthalmology, Odense University Hospital, 5000 Odense, Denmark
- Department of Clinical Research, University of Southern Denmark, 5230 Odense, Denmark
- Department of Ophthalmology, Vestfold Hospital Trust, 3103 Tønsberg, Norway
- Oliver Niels Klefter
- Department of Ophthalmology, Rigshospitalet, 2100 Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, 1172 Copenhagen, Denmark
- Ivan Potapenko
- Department of Ophthalmology, Rigshospitalet, 2100 Copenhagen, Denmark
- Marie Louise Roed Rasmussen
- Department of Ophthalmology, Rigshospitalet, 2100 Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, 1172 Copenhagen, Denmark
- Yousif Subhi
- Department of Clinical Research, University of Southern Denmark, 5230 Odense, Denmark
- Department of Ophthalmology, Rigshospitalet, 2100 Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, 1172 Copenhagen, Denmark

3. Kolokythas A, Dahan MH. Is Artificial Intelligence (AI) currently able to provide evidence-based scientific responses on methods that can improve the outcomes of embryo transfers? No. JBRA Assist Reprod 2024;28:629-638. PMID: 39254470. PMCID: PMC11622398. DOI: 10.5935/1518-0557.20240050.
Abstract
OBJECTIVE The rapid development of Artificial Intelligence (AI) has raised questions about its potential uses in different sectors of everyday life. Specifically in medicine, the question arose whether chatbots could be used as tools for clinical decision-making or patients' and physicians' education. To answer this question in the context of fertility, we conducted a test to determine whether current AI platforms can provide evidence-based responses regarding methods that can improve the outcomes of embryo transfers. METHODS We asked nine popular chatbots to write a 300-word scientific essay, outlining scientific methods that improve embryo transfer outcomes. We then gathered the responses and extracted the methods suggested by each chatbot. RESULTS Out of a total of 43 recommendations, which could be grouped into 19 similar categories, only 3/19 (15.8%) were evidence-based practices, those being "ultrasound-guided embryo transfer" in 7/9 (77.8%) chatbots, "single embryo transfer" in 4/9 (44.4%) and "use of a soft catheter" in 2/9 (22.2%), whereas some controversial responses like "preimplantation genetic testing" appeared frequently (6/9 chatbots; 66.7%), along with other debatable recommendations like "endometrial receptivity assay", "assisted hatching" and "time-lapse incubator". CONCLUSIONS Our results suggest that AI is not yet in a position to give evidence-based recommendations in the field of fertility, particularly concerning embryo transfer, since the vast majority of responses consisted of scientifically unsupported recommendations. As such, both patients and physicians should be wary of guiding care based on chatbot recommendations in infertility. Chatbot results might improve with time especially if trained from validated medical databases; however, this will have to be scientifically checked.
Affiliation(s)
- Argyrios Kolokythas
- McGill University Health Centre, Department of Obstetrics & Gynecology, Montreal, Canada
- Michael H. Dahan
- McGill University Health Centre, Department of Obstetrics & Gynecology, Montreal, Canada

4. Risser KM, Zhou MY, Koster KG, Tejawinata FI, Gu X, Steinemann TL. Contact Lens Regulation: Where Have We Been, Where are We Going? Eye Contact Lens 2024;50. PMID: 39569988. DOI: 10.1097/icl.0000000000001148.
Abstract
The Food and Drug Administration and the Federal Trade Commission influence the contact lens (CL) market, with the Food and Drug Administration regulating CLs as medical devices and the Federal Trade Commission dictating how they are prescribed and sold. Legislative oversight came to the forefront in 2004, when the Contact Lens Rule was introduced, drastically changing how CLs are prescribed and distributed. This article examines the evolution of CL regulations over the past two decades and discusses how regulation, such as allowing passive verification, has shaped the current and evolving CL market. We also explore how related products (decorative CLs, artificial tears) are regulated and compare US regulations with those abroad. Finally, we discuss how future technological advancements, including artificial intelligence, promise to change the CL industry and its regulation worldwide.
Affiliation(s)
- Kayleigh M Risser
- Case Western Reserve University School of Medicine (K.M.R., M.Y.Z., K.G.K., F.I.T., X.G., T.L.S.), Cleveland, OH; and MetroHealth Medical Center Division of Ophthalmology (T.L.S.), Cleveland, OH

5. Ming S, Yao X, Guo X, Guo Q, Xie K, Chen D, Lei B. Performance of ChatGPT in Ophthalmic Registration and Clinical Diagnosis: Cross-Sectional Study. J Med Internet Res 2024;26:e60226. PMID: 39541581. PMCID: PMC11605262. DOI: 10.2196/60226.
Abstract
BACKGROUND Artificial intelligence (AI) chatbots such as ChatGPT are expected to impact vision health care significantly. Their potential to optimize the consultation process and diagnostic capabilities across a range of ophthalmic subspecialties has yet to be fully explored. OBJECTIVE This study aims to investigate the performance of AI chatbots in recommending ophthalmic outpatient registration and diagnosing eye diseases within clinical case profiles. METHODS This cross-sectional study used clinical cases from Chinese Standardized Resident Training-Ophthalmology (2nd Edition). For each case, 2 profiles were created: patient with history (Hx) and patient with history and examination (Hx+Ex). These profiles served as independent queries for GPT-3.5 and GPT-4.0 (accessed from March 5 to 18, 2024). Similarly, 3 ophthalmic residents were posed the same profiles in a questionnaire format. The accuracy of recommending ophthalmic subspecialty registration was primarily evaluated using Hx profiles. The accuracy of the top-ranked diagnosis and the accuracy of the diagnosis within the top 3 suggestions (do-not-miss diagnosis) were assessed using Hx+Ex profiles. The gold standard for judgment was the published, official diagnosis. Characteristics of incorrect diagnoses by ChatGPT were also analyzed. RESULTS A total of 208 clinical profiles from 12 ophthalmic subspecialties were analyzed (104 Hx and 104 Hx+Ex profiles). For Hx profiles, GPT-3.5, GPT-4.0, and residents showed comparable accuracy in registration suggestions (66/104, 63.5%; 81/104, 77.9%; and 72/104, 69.2%, respectively; P=.07), with ocular trauma, retinal diseases, and strabismus and amblyopia achieving the top 3 accuracies. For Hx+Ex profiles, both GPT-4.0 and residents demonstrated higher diagnostic accuracy than GPT-3.5 (62/104, 59.6% and 63/104, 60.6% vs 41/104, 39.4%; P=.003 and P=.001, respectively). Accuracy for do-not-miss diagnoses also improved (79/104, 76% and 68/104, 65.4% vs 51/104, 49%; P<.001 and P=.02, respectively). The highest diagnostic accuracies were observed in glaucoma; lens diseases; and eyelid, lacrimal, and orbital diseases. GPT-4.0 recorded fewer incorrect top-3 diagnoses (25/42, 60% vs 53/63, 84%; P=.005) and more partially correct diagnoses (21/42, 50% vs 7/63, 11%; P<.001) than GPT-3.5, while GPT-3.5 produced more completely incorrect (27/63, 43% vs 7/42, 17%; P=.005) and more imprecise diagnoses (22/63, 35% vs 5/42, 12%; P=.009). CONCLUSIONS GPT-3.5 and GPT-4.0 showed intermediate performance in recommending ophthalmic subspecialties for registration. While GPT-3.5 underperformed, GPT-4.0 approached and numerically surpassed residents in differential diagnosis. AI chatbots show promise in facilitating ophthalmic patient registration. However, their integration into diagnostic decision-making requires more validation.
Affiliation(s)
- Shuai Ming
- Department of Ophthalmology, Henan Eye Institute, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
- Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
- Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
- Xi Yao
- Department of Ophthalmology, Henan Eye Institute, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
- Xiaohong Guo
- Department of Ophthalmology, Henan Eye Institute, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
- Qingge Guo
- Department of Ophthalmology, Henan Eye Institute, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
- Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
- Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
- Kunpeng Xie
- Department of Ophthalmology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Dandan Chen
- Department of Ophthalmology, Henan Eye Institute, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
- Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
- Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
- Bo Lei
- Department of Ophthalmology, Henan Eye Institute, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
- Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
- Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China

6. Elkarmi R, Abu-Ghazaleh S, Sonbol H, Haha O, Al-Haddad A, Hassona Y. ChatGPT for parents' education about early childhood caries: A friend or foe? Int J Paediatr Dent 2024. PMID: 39533165. DOI: 10.1111/ipd.13283.
Abstract
BACKGROUND With the increasing popularity of online sources for health information, parents may seek information related to early childhood caries (ECC) from artificial intelligence-based chatbots. AIM The aim of this article was to evaluate the usefulness, quality, reliability, and readability of ChatGPT answers to parents' questions about ECC. DESIGN Eighty questions commonly asked about ECC were compiled from experts and keyword research tools. ChatGPT 3.5 was asked these questions independently. The answers were evaluated by experts in paediatric dentistry. RESULTS ChatGPT provided "very useful" and "useful" responses to 82.5% of the questions. The mean global quality score was 4.3 ± 1 (good quality). The mean reliability score was 18.5 ± 8.9 (average to very good). The mean understandability score was 59.5% ± 13.8 (not highly understandable), and the mean actionability score was 40.5% ± 12.8 (low actionability). The mean Flesch-Kincaid reading ease score was 32% ± 25.7, and the mean Simple Measure of Gobbledygook index readability score was 15.3 ± 9.1 (indicating poor readability for the lay person). Misleading and false information was detected in some answers. CONCLUSION ChatGPT has significant potential as a tool for answering parents' questions about ECC. Concerns, however, do exist about the readability and actionability of the answers. The presence of false information should not be overlooked.
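To make the readability metrics reported above concrete, here is a minimal, hypothetical Python sketch (not the study's own code) of how Flesch-Kincaid and SMOG scores of this kind can be computed for a chatbot answer. It assumes the open-source textstat package; the sample answer text is invented for illustration.

```python
# Hypothetical sketch: scoring a chatbot answer's readability with the
# open-source "textstat" package, which implements the Flesch-Kincaid and
# SMOG formulas used in studies like this one.
import textstat

answer = (
    "Early childhood caries is tooth decay in baby teeth. "
    "It can start as soon as the first tooth appears. "
    "Brush your child's teeth twice a day with a small amount of fluoride toothpaste."
)

# Flesch Reading Ease: 0-100 scale, higher means easier to read.
print("Flesch Reading Ease:", textstat.flesch_reading_ease(answer))

# SMOG index: approximate years of education needed to understand the text.
print("SMOG index:", textstat.smog_index(answer))

# Flesch-Kincaid grade level: US school grade of the text.
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(answer))
```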
Affiliation(s)
- Rawan Elkarmi
- Department of Paediatric Dentistry, Orthodontics, and Preventive Dentistry, School of Dentistry, The University of Jordan, Amman, Jordan
- School of Dentistry, The University of Jordan, Amman, Jordan
- Suha Abu-Ghazaleh
- Department of Paediatric Dentistry, Orthodontics, and Preventive Dentistry, School of Dentistry, The University of Jordan, Amman, Jordan
- School of Dentistry, The University of Jordan, Amman, Jordan
- Hawazen Sonbol
- Department of Paediatric Dentistry, Orthodontics, and Preventive Dentistry, School of Dentistry, The University of Jordan, Amman, Jordan
- School of Dentistry, The University of Jordan, Amman, Jordan
- Ola Haha
- School of Dentistry, The University of Jordan, Amman, Jordan
- Alaa Al-Haddad
- School of Dentistry, The University of Jordan, Amman, Jordan
- Department of Prosthodontics, School of Dentistry, The University of Jordan, Amman, Jordan
- Yazan Hassona
- School of Dentistry, The University of Jordan, Amman, Jordan
- Department of Oral and Maxillofacial Surgery, Oral Medicine and Periodontology, School of Dentistry, The University of Jordan, Amman, Jordan

7. Marshall RF, Mallem K, Xu H, Thorne J, Burkholder B, Chaon B, Liberman P, Berkenstock M. Investigating the Accuracy and Completeness of an Artificial Intelligence Large Language Model About Uveitis: An Evaluation of ChatGPT. Ocul Immunol Inflamm 2024;32:2052-2055. PMID: 38394625. DOI: 10.1080/09273948.2024.2317417.
Abstract
PURPOSE To assess the accuracy and completeness of ChatGPT-generated answers regarding uveitis description, prevention, treatment, and prognosis. METHODS Thirty-two uveitis-related questions were generated by a uveitis specialist and inputted into ChatGPT 3.5. Answers were compiled into a survey and were reviewed by five uveitis specialists using standardized Likert scales of accuracy and completeness. RESULTS In total, the median accuracy score for all the uveitis questions (n = 32) was 4.00 (between "more correct than incorrect" and "nearly all correct"), and the median completeness score was 2.00 ("adequate, addresses all aspects of the question and provides the minimum amount of information required to be considered complete"). The interrater variability assessment had a total kappa value of 0.0278 for accuracy and 0.0847 for completeness. CONCLUSION ChatGPT can provide relatively high accuracy responses for various questions related to uveitis; however, the answers it provides are incomplete, with some inaccuracies. Its utility in providing medical information requires further validation and development prior to serving as a source of uveitis information for patients.
Affiliation(s)
- Rayna F Marshall
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Krishna Mallem
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Hannah Xu
- University of California San Diego, San Diego, California, USA
- Jennifer Thorne
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Bryn Burkholder
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Benjamin Chaon
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Paulina Liberman
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Meghan Berkenstock
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA

8. Rampat R, Debellemanière G, Gatinel D, Ting DSJ. Artificial intelligence applications in cataract and refractive surgeries. Curr Opin Ophthalmol 2024;35:480-486. PMID: 39259648. DOI: 10.1097/icu.0000000000001090.
Abstract
PURPOSE OF REVIEW This review highlights the recent advancements in the applications of artificial intelligence within the field of cataract and refractive surgeries. Given the rapid evolution of artificial intelligence technologies, it is essential to provide an updated overview of the significant strides and emerging trends in this field. RECENT FINDINGS Key themes include artificial intelligence-assisted diagnostics and intraoperative support, image analysis for anterior segment surgeries, development of artificial intelligence-based diagnostic scores and calculators for early disease detection and treatment planning, and integration of generative artificial intelligence for patient education and postoperative monitoring. SUMMARY The impact of artificial intelligence on cataract and refractive surgeries is becoming increasingly evident through improved diagnostic accuracy, enhanced patient education, and streamlined clinical workflows. These advancements hold significant implications for clinical practice, promising more personalized patient care and facilitating early disease detection and intervention. The review also highlights, however, that only some of this work reaches the clinical stage, and successful clinical integration may benefit from more focused attention.
Affiliation(s)
- Guillaume Debellemanière
- Department of Anterior Segment and Refractive Surgery, Rothschild Foundation Hospital, Paris, France
- Damien Gatinel
- Department of Anterior Segment and Refractive Surgery, Rothschild Foundation Hospital, Paris, France
- Darren S J Ting
- Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham
- Birmingham and Midland Eye Centre, Sandwell and West Birmingham NHS Trust, Birmingham
- Academic Ophthalmology, School of Medicine, University of Nottingham, Nottingham, UK

9. Kalaw FGP, Baxter SL. Ethical considerations for large language models in ophthalmology. Curr Opin Ophthalmol 2024;35:438-446. PMID: 39259616. PMCID: PMC11427135. DOI: 10.1097/icu.0000000000001083.
Abstract
PURPOSE OF REVIEW This review aims to summarize and discuss the ethical considerations regarding large language model (LLM) use in the field of ophthalmology. RECENT FINDINGS This review of 47 articles on LLM applications in ophthalmology highlights their diverse potential uses, including education, research, clinical decision support, and surgical assistance (as an aid in operative notes). We also review ethical considerations such as the inability of LLMs to interpret data accurately, the risk of promoting controversial or harmful recommendations, and breaches of data privacy. These concerns imply the need for cautious integration of artificial intelligence in healthcare, emphasizing human oversight, transparency, and accountability to mitigate risks and uphold ethical standards. SUMMARY The integration of LLMs in ophthalmology offers potential advantages such as aiding in clinical decision support and facilitating medical education through their ability to process queries and analyze ophthalmic imaging and clinical cases. However, their utilization also raises ethical concerns regarding data privacy, potential misinformation, and biases inherent in the datasets used. These concerns must be addressed in order to optimize the utility of LLMs in the healthcare setting and, more importantly, responsible and careful use by consumers should be promoted.
Affiliation(s)
- Fritz Gerald P Kalaw
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute
- Department of Biomedical Informatics, University of California San Diego Health System, University of California San Diego, La Jolla, California, USA
- Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute
- Department of Biomedical Informatics, University of California San Diego Health System, University of California San Diego, La Jolla, California, USA

10. Azzopardi M, Ng B, Logeswaran A, Loizou C, Cheong RCT, Gireesh P, Ting DSJ, Chong YJ. Artificial intelligence chatbots as sources of patient education material for cataract surgery: ChatGPT-4 versus Google Bard. BMJ Open Ophthalmol 2024;9:e001824. PMID: 39419585. PMCID: PMC11487885. DOI: 10.1136/bmjophth-2024-001824.
Abstract
OBJECTIVE To conduct a head-to-head comparative analysis of cataract surgery patient education material generated by Chat Generative Pre-trained Transformer (ChatGPT-4) and Google Bard. METHODS AND ANALYSIS 98 frequently asked questions on cataract surgery in English were taken in November 2023 from 5 trustworthy online patient information resources. 59 of these were curated (20 augmented for clarity and 39 duplicates excluded) and categorised into 3 domains: condition (n=15), preparation for surgery (n=21) and recovery after surgery (n=23). They were formulated into input prompts with 'prompt engineering'. Using the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form, four ophthalmologists independently graded ChatGPT-4 and Google Bard responses. The readability of responses was evaluated using a Flesch-Kincaid calculator. Responses were also subjectively examined for any inaccurate or harmful information. RESULTS Google Bard had a higher mean overall Flesch-Kincaid Level (8.02) compared with ChatGPT-4 (5.75) (p<0.001), also noted across all three domains. ChatGPT-4 had a higher overall PEMAT-P understandability score (85.8%) in comparison to Google Bard (80.9%) (p<0.001), which was also noted in the 'preparation for cataract surgery' (85.2% vs 75.7%; p<0.001) and 'recovery after cataract surgery' (86.5% vs 82.3%; p=0.004) domains. There was no statistically significant difference in overall (42.5% vs 44.2%; p=0.344) or individual domain actionability scores (p>0.10). None of the generated material contained dangerous information. CONCLUSION In comparison to Google Bard, ChatGPT-4 fared better overall, scoring higher on the PEMAT-P understandability scale and exhibiting more faithfulness to the prompt engineering instruction. Since input prompts might vary from real-world patient searches, follow-up studies with patient participation are required.
Affiliation(s)
- Benjamin Ng
- University of Oxford Christ Church, Oxford, UK
- Darren Shu Jeng Ting
- Academic Unit of Ophthalmology, University of Birmingham Institute of Inflammation and Ageing, Birmingham, UK
- Birmingham and Midland Eye Centre, Birmingham, UK
- Yu Jeat Chong
- Singapore Eye Research Institute, Singapore
- University of Cambridge, Cambridge, UK

11. Künzle P, Paris S. Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments. Clin Oral Investig 2024;28:575. PMID: 39373739. PMCID: PMC11458639. DOI: 10.1007/s00784-024-05968-w.
Abstract
OBJECTIVES The advent of artificial intelligence (AI) and large language model (LLM)-based AI applications (LLMAs) has tremendous implications for our society. This study analyzed the performance of LLMAs on solving restorative dentistry and endodontics (RDE) student assessment questions. MATERIALS AND METHODS 151 questions from a RDE question pool were prepared for prompting using LLMAs from OpenAI (ChatGPT-3.5, -4.0 and -4.0o) and Google (Gemini 1.0). Multiple-choice questions were sorted into four question subcategories, entered into LLMAs and answers recorded for analysis. P-value and chi-square statistical analyses were performed using Python 3.9.16. RESULTS The total answer accuracy of ChatGPT-4.0o was the highest, followed by ChatGPT-4.0, Gemini 1.0 and ChatGPT-3.5 (72%, 62%, 44% and 25%, respectively), with significant differences between all LLMAs except between the two GPT-4.0 models. Performance was highest on the direct restorations and caries subcategories, followed by indirect restorations and endodontics. CONCLUSIONS Overall, there are large performance differences among LLMAs. Only the ChatGPT-4 models achieved a success ratio that could be used with caution to support the dental academic curriculum. CLINICAL RELEVANCE While LLMAs could support clinicians in answering dental field-related questions, this capacity depends strongly on the employed model. The most performant model, ChatGPT-4.0o, achieved acceptable accuracy rates in some of the subject subcategories analyzed.
Affiliation(s)
- Paul Künzle
- Department of Operative, Preventive and Pediatric Dentistry, Charité - Universitätsmedizin Berlin, Aßmannshauser Str. 4-6, Berlin, 14197, Germany
- Sebastian Paris
- Department of Operative, Preventive and Pediatric Dentistry, Charité - Universitätsmedizin Berlin, Aßmannshauser Str. 4-6, Berlin, 14197, Germany

12. Wong M, Lim ZW, Pushpanathan K, Cheung CY, Wang YX, Chen D, Tham YC. Review of emerging trends and projection of future developments in large language models research in ophthalmology. Br J Ophthalmol 2024;108:1362-1370. PMID: 38164563. DOI: 10.1136/bjo-2023-324734.
Abstract
BACKGROUND Large language models (LLMs) are fast emerging as potent tools in healthcare, including ophthalmology. This systematic review offers a twofold contribution: it summarises current trends in ophthalmology-related LLM research and projects future directions for this burgeoning field. METHODS We systematically searched across various databases (PubMed, Europe PMC, Scopus and Web of Science) for articles related to LLM use in ophthalmology, published between 1 January 2022 and 31 July 2023. Selected articles were summarised, and categorised by type (editorial, commentary, original research, etc) and their research focus (eg, evaluating ChatGPT's performance in ophthalmology examinations or clinical tasks). FINDINGS We identified 32 articles meeting our criteria, published between January and July 2023, with a peak in June (n=12). Most were original research evaluating LLMs' proficiency in clinically related tasks (n=9). Studies demonstrated that ChatGPT-4.0 outperformed its predecessor, ChatGPT-3.5, in ophthalmology exams. Furthermore, ChatGPT excelled in constructing discharge notes (n=2), evaluating diagnoses (n=2) and answering general medical queries (n=6). However, it struggled with generating scientific articles or abstracts (n=3) and answering specific subdomain questions, especially those regarding specific treatment options (n=2). ChatGPT's performance relative to other LLMs (Google's Bard, Microsoft's Bing) varied by study design. Ethical concerns such as data hallucination (n=27), authorship (n=5) and data privacy (n=2) were frequently cited. INTERPRETATION While LLMs hold transformative potential for healthcare and ophthalmology, concerns over accountability, accuracy and data security remain. Future research should focus on application programming interface integration, comparative assessments of popular LLMs, their ability to interpret image-based data and the establishment of standardised evaluation frameworks.
Affiliation(s)
- Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Carol Y Cheung
- Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Ya Xing Wang
- Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital University of Medical Science, Beijing, China
- David Chen
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore
- Yih Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore

13. Jung H, Oh J, Stephenson KAJ, Joe AW, Mammo ZN. Prompt engineering with ChatGPT3.5 and GPT4 to improve patient education on retinal diseases. Can J Ophthalmol 2024:S0008-4182(24)00258-8. PMID: 39245293. DOI: 10.1016/j.jcjo.2024.08.010.
Abstract
OBJECTIVE To assess the effect of prompt engineering on the accuracy, comprehensiveness, readability, and empathy of large language model (LLM)-generated responses to patient questions regarding retinal disease. DESIGN Prospective qualitative study. PARTICIPANTS Retina specialists, ChatGPT3.5, and GPT4. METHODS Twenty common patient questions regarding 5 retinal conditions were input to ChatGPT3.5 and GPT4 either as stand-alone questions, preceded by an optimized prompt (prompt A), or preceded by prompt A with specified limits on length and grade reading level (prompt B). Accuracy and comprehensiveness were graded by 3 retina specialists on a Likert scale from 1 to 5 (1: very poor to 5: very good). Readability of responses was assessed using Readable.com, an online readability tool. RESULTS There were no significant differences between ChatGPT3.5 and GPT4 across any of the metrics tested. The median accuracy of responses to stand-alone, prompt A, and prompt B questions was 5.0, 5.0, and 4.0, respectively. The median comprehensiveness of responses to stand-alone, prompt A, and prompt B questions was 5.0, 5.0, and 4.0, respectively. The use of prompt B was associated with lower accuracy and comprehensiveness than responses to stand-alone or prompt A questions (p < 0.001). The average grade reading level of responses across both LLMs was 13.45, 11.5, and 10.3 for stand-alone, prompt A, and prompt B questions, respectively (p < 0.001). CONCLUSIONS Prompt engineering can significantly improve the readability of LLM-generated responses, although at the cost of reduced accuracy and comprehensiveness. Further study is needed to understand the utility and bioethical implications of LLMs as a patient educational resource.
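As an illustration of the prompt-engineering setup described above, the following sketch shows how a stand-alone patient question can be preceded by an engineered system prompt using the OpenAI Python SDK (v1.x). The prompt wording, model name, and question here are assumptions for illustration only; they are not the study's actual prompt A or prompt B.

```python
# Illustrative sketch (not the authors' code): a stand-alone patient question
# preceded by an engineered system prompt with length and reading-level limits.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "What is age-related macular degeneration?"  # hypothetical example

# Rough analogue of a "prompt B"-style instruction: optimized guidance plus
# explicit limits on length and grade reading level.
engineered_prompt = (
    "You are an ophthalmologist explaining retinal disease to a patient. "
    "Answer accurately and empathetically, in at most 150 words, "
    "at a grade 8 reading level."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": engineered_prompt},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```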
Affiliation(s)
- Hoyoung Jung
- Faculty of Medicine, University of British Columbia, Vancouver BC, Canada
- Jean Oh
- Faculty of Medicine, University of British Columbia, Vancouver BC, Canada
- Kirk A J Stephenson
- Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver BC, Canada
- Aaron W Joe
- Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver BC, Canada
- Zaid N Mammo
- Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver BC, Canada

14. Hassona Y, Alqaisi D, Al-Haddad A, Georgakopoulou EA, Malamos D, Alrashdan MS, Sawair F. How good is ChatGPT at answering patients' questions related to early detection of oral (mouth) cancer? Oral Surg Oral Med Oral Pathol Oral Radiol 2024;138:269-278. PMID: 38714483. DOI: 10.1016/j.oooo.2024.04.010.
Abstract
OBJECTIVES To examine the quality, reliability, readability, and usefulness of ChatGPT in promoting oral cancer early detection. STUDY DESIGN A total of 108 patient-oriented questions about oral cancer early detection were compiled from expert panels, professional societies, and web-based tools. Questions were categorized into 4 topic domains, and ChatGPT 3.5 was asked each question independently. ChatGPT answers were evaluated regarding quality, readability, actionability, and usefulness. Two experienced reviewers independently assessed each response. RESULTS Questions related to clinical appearance constituted 36.1% (n = 39) of the total questions. ChatGPT provided "very useful" responses to the majority of questions (75%; n = 81). The mean Global Quality Score was 4.24 ± 1.3 of 5. The mean reliability score was 23.17 ± 9.87 of 25. The mean understandability score was 76.6% ± 25.9% of 100, while the mean actionability score was 47.3% ± 18.9% of 100. The mean FKS reading ease score was 38.4% ± 29.9%, while the mean SMOG index readability score was 11.65 ± 8.4. No misleading information was identified among ChatGPT responses. CONCLUSION ChatGPT is an attractive and potentially useful resource for informing patients about early detection of oral cancer. Nevertheless, concerns do exist about the readability and actionability of the offered information.
Affiliation(s)
- Yazan Hassona
- Faculty of Dentistry, Centre for Oral Diseases Studies (CODS), Al-Ahliyya Amman University, Jordan; School of Dentistry, The University of Jordan, Jordan
- Dua'a Alqaisi
- School of Dentistry, The University of Jordan, Jordan
- Eleni A Georgakopoulou
- Molecular Carcinogenesis Group, Department of Histology and Embryology, Medical School, National and Kapodistrian University of Athens, Greece
- Dimitris Malamos
- Oral Medicine Clinic of the National Organization for the Provision of Health, Athens, Greece
- Mohammad S Alrashdan
- Department of Oral and Craniofacial Health Sciences, College of Dental Medicine, University of Sharjah, Sharjah, United Arab Emirates
- Faleh Sawair
- School of Dentistry, The University of Jordan, Jordan

15. Tao BKL, Hua N, Milkovich J, Micieli JA. ChatGPT-3.5 and Bing Chat in ophthalmology: an updated evaluation of performance, readability, and informative sources. Eye (Lond) 2024;38:1897-1902. PMID: 38509182. PMCID: PMC11226422. DOI: 10.1038/s41433-024-03037-w.
Abstract
BACKGROUND/OBJECTIVES Experimental investigation. Bing Chat's (Microsoft) integration with ChatGPT-4 (OpenAI) has conferred the capability of accessing online data past 2021. We investigate its performance against ChatGPT-3.5 on a multiple-choice question ophthalmology exam. SUBJECTS/METHODS In August 2023, ChatGPT-3.5 and Bing Chat were evaluated against 913 questions derived from the Academy's Basic and Clinical Science Course (BCSC) collection. For each response, the sub-topic, performance, Simple Measure of Gobbledygook readability score (measuring years of required education to understand a given passage), and cited resources were collected. The primary outcomes were the comparative scores between models and, qualitatively, the resources referenced by Bing Chat. Secondary outcomes included performance stratified by response readability, question type (explicit or situational), and BCSC sub-topic. RESULTS Across 913 questions, ChatGPT-3.5 scored 59.69% [95% CI 56.45, 62.94] while Bing Chat scored 73.60% [95% CI 70.69, 76.52]. Both models performed significantly better on explicit than on clinical reasoning questions. Both models performed better on general medicine questions than on ophthalmology subsections. Bing Chat referenced 927 online entities and provided at least one citation for 836 of the 913 questions. The use of more reliable (peer-reviewed) sources was associated with a higher likelihood of a correct response. The most-cited resources were eyewiki.aao.org, aao.org, wikipedia.org, and ncbi.nlm.nih.gov. Bing Chat showed significantly better readability than ChatGPT-3.5, averaging a reading level of grade 11.4 [95% CI 7.14, 15.7] versus 12.4 [95% CI 8.77, 16.1], respectively (p-value < 0.0001, ρ = 0.25). CONCLUSIONS The online access, improved readability, and citation feature of Bing Chat confer additional utility for ophthalmology learners. We recommend critical appraisal of cited sources during response interpretation.
Affiliation(s)
- Brendan Ka-Lok Tao
- Faculty of Medicine, The University of British Columbia, 317-2194 Health Sciences Mall, Vancouver, BC, V6T 1Z3, Canada
- Nicholas Hua
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- John Milkovich
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Jonathan Andrew Micieli
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Department of Ophthalmology and Vision Sciences, University of Toronto, 340 College Street, Toronto, ON, M5T 3A9, Canada
- Division of Neurology, Department of Medicine, University of Toronto, 6 Queen's Park Crescent West, Toronto, ON, M5S 3H2, Canada
- Kensington Vision and Research Center, 340 College Street, Toronto, ON, M5T 3A9, Canada
- St. Michael's Hospital, 36 Queen Street East, Toronto, ON, M5B 1W8, Canada
- Toronto Western Hospital, 399 Bathurst Street, Toronto, ON, M5T 2S8, Canada
- University Health Network, 190 Elizabeth Street, Toronto, ON, M5G 2C4, Canada

16. Momenaei B, Mansour HA, Kuriyan AE, Xu D, Sridhar J, Ting DSW, Yonekawa Y. ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management. Curr Opin Ophthalmol 2024;35:205-209. PMID: 38334288. DOI: 10.1097/icu.0000000000001036.
Abstract
PURPOSE OF REVIEW This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology in addition to exploring the limitations and ethical considerations associated with its application. RECENT FINDINGS ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting research productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across various ophthalmic disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. This proves beneficial for patients in accessing information and aids physicians in triaging as well as formulating differential diagnoses. Despite such benefits, ChatGPT has limitations that require acknowledgment including the potential risk of offering inaccurate or harmful information, dependence on outdated data, the necessity for a high level of education for data comprehension, and concerns regarding patient privacy and ethical considerations within the research domain. SUMMARY ChatGPT is a promising new tool that could contribute to ophthalmic healthcare education and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role with human expert oversight.
Affiliation(s)
- Bita Momenaei
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Hana A Mansour
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Jayanth Sridhar
- University of California Los Angeles, Los Angeles, California, USA
- Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania

17. Alexander AC, Somineni Raghupathy S, Surapaneni KM. An assessment of the capability of ChatGPT in solving clinical cases of ophthalmology using multiple choice and short answer questions. Adv Ophthalmol Pract Res 2024;4:95-97. PMID: 38666248. PMCID: PMC11043809. DOI: 10.1016/j.aopr.2024.01.005.
Affiliation(s)
- Anjana Christy Alexander
- Department of Ophthalmology, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu, India
- Suprithy Somineni Raghupathy
- Department of Ophthalmology, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu, India
- Krishna Mohan Surapaneni
- Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu, India
- Department of Medical Education, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, Tamil Nadu, India

18. Driban M, Yan A, Selvam A, Ong J, Vupparaboina KK, Chhablani J. Artificial intelligence in chorioretinal pathology through fundoscopy: a comprehensive review. Int J Retina Vitreous 2024;10:36. PMID: 38654344. PMCID: PMC11036694. DOI: 10.1186/s40942-024-00554-4.
Abstract
BACKGROUND Applications for artificial intelligence (AI) in ophthalmology are continually evolving. Fundoscopy is one of the oldest ocular imaging techniques but remains a mainstay in posterior segment imaging due to its prevalence, ease of use, and ongoing technological advancement. AI has been leveraged for fundoscopy to accomplish core tasks including segmentation, classification, and prediction. MAIN BODY In this article we provide a review of AI in fundoscopy applied to representative chorioretinal pathologies, including diabetic retinopathy and age-related macular degeneration, among others. We conclude with a discussion of future directions and current limitations. SHORT CONCLUSION As AI evolves, it will become increasingly essential for the modern ophthalmologist to understand its applications and limitations to improve patient outcomes and continue to innovate.
Affiliation(s)
- Matthew Driban
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Audrey Yan
- Department of Medicine, West Virginia School of Osteopathic Medicine, Lewisburg, WV, USA
- Amrish Selvam
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Michigan Medicine, University of Michigan, Ann Arbor, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA

19. Ghadiri N. Comment on: 'Comparison of GPT-3.5, GPT-4, and human user performance on a practice ophthalmology written examination' and 'ChatGPT in ophthalmology: the dawn of a new era?'. Eye (Lond) 2024;38:654-655. PMID: 37770530. PMCID: PMC10920659. DOI: 10.1038/s41433-023-02773-9.
Affiliation(s)
- Nima Ghadiri
- Department of Ophthalmology, Liverpool University Hospitals NHS Foundation Trust, Liverpool, UK.
- Department of Eye and Vision Science, University of Liverpool, Liverpool, UK

20. Gurnani B, Kaur K. Leveraging ChatGPT for ophthalmic education: A critical appraisal. Eur J Ophthalmol 2024;34:323-327. PMID: 37974429. DOI: 10.1177/11206721231215862.
Abstract
In recent years, the advent of artificial intelligence (AI) has transformed many sectors, including medical education. This editorial critically appraises the integration of ChatGPT, a state-of-the-art AI language model, into ophthalmic education, focusing on its potential, limitations, and ethical considerations. The application of ChatGPT in teaching and training ophthalmologists presents an innovative method to offer real-time, customized learning experiences. Through a systematic analysis of both experimental and clinical data, this editorial examines how ChatGPT enhances engagement, understanding, and retention of complex ophthalmological concepts. The study also evaluates the efficacy of ChatGPT in simulating patient interactions and clinical scenarios, which can foster improved diagnostic and interpersonal skills. Despite the promising advantages, concerns regarding reliability, lack of personal touch, and potential biases in the AI-generated content are scrutinized. Ethical considerations concerning data privacy and potential misuse are also explored. The findings underline the need for carefully designed integration, continuous evaluation, and adherence to ethical guidelines to maximize benefits while mitigating risks. By shedding light on these multifaceted aspects, this paper contributes to the ongoing discourse on the incorporation of AI in medical education, offering valuable insights and guidance for educators, practitioners, and policymakers aiming to leverage modern technology for enhancing ophthalmic education.
Affiliation(s)
- Bharat Gurnani
- Cataract, Cornea, Trauma, External Diseases, Ocular Surface and Refractive Services, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
- Kirandeep Kaur
- Cataract, Pediatric Ophthalmology and Strabismus, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Children Eye Care Centre, Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India

21. Balas M, Janic A, Daigle P, Nijhawan N, Hussain A, Gill H, Lahaie GL, Belliveau MJ, Crawford SA, Arjmand P, Ing EB. Evaluating ChatGPT on Orbital and Oculofacial Disorders: Accuracy and Readability Insights. Ophthalmic Plast Reconstr Surg 2024;40:217-222. PMID: 37989540. DOI: 10.1097/iop.0000000000002552.
Abstract
PURPOSE To assess the accuracy and readability of responses generated by the artificial intelligence model, ChatGPT (version 4.0), to questions related to 10 essential domains of orbital and oculofacial disease. METHODS A set of 100 questions related to the diagnosis, treatment, and interpretation of orbital and oculofacial diseases was posed to ChatGPT 4.0. Responses were evaluated by a panel of 7 experts based on appropriateness and accuracy, with performance scores measured on a 7-item Likert scale. Inter-rater reliability was determined via the intraclass correlation coefficient. RESULTS The artificial intelligence model demonstrated accurate and consistent performance across all 10 domains of orbital and oculofacial disease, with an average appropriateness score of 5.3/6.0 ("mostly appropriate" to "completely appropriate"). Domains of cavernous sinus fistula, retrobulbar hemorrhage, and blepharospasm had the highest domain scores (average scores of 5.5 to 5.6), while the proptosis domain had the lowest (average score of 5.0/6.0). The intraclass correlation coefficient was 0.64 (95% CI: 0.52 to 0.74), reflecting moderate inter-rater reliability. The responses exhibited a high reading-level complexity, representing the comprehension levels of a college or graduate education. CONCLUSIONS This study demonstrates the potential of ChatGPT 4.0 to provide accurate information in the field of ophthalmology, specifically orbital and oculofacial disease. However, challenges remain in ensuring accurate and comprehensive responses across all disease domains. Future improvements should focus on refining the model's correctness and eventually expanding the scope to visual data interpretation. Our results highlight the vast potential for artificial intelligence in educational and clinical ophthalmology contexts.
Collapse
Affiliation(s)
| | | | - Patrick Daigle
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Navdeep Nijhawan
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Ahsen Hussain
- Department of Ophthalmology and Visual Sciences, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Harmeet Gill
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Gabriela L Lahaie
- Department of Ophthalmology, Queen's University, Kingston, Ontario, Canada
| | - Michel J Belliveau
- Department of Ophthalmology, University of Ottawa and The Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
| | - Sean A Crawford
- Temerty Faculty of Medicine
- Division of Vascular Surgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
| | | | - Edsel B Ing
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology and Vision Sciences, University of Alberta, Edmonton, Alberta, Canada
22
Ćirković A, Katz T. Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study. JMIR Form Res 2023; 7:e51798. [PMID: 38153777 PMCID: PMC10784977 DOI: 10.2196/51798] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/14/2023] [Revised: 11/01/2023] [Accepted: 12/04/2023] [Indexed: 12/29/2023]
Abstract
BACKGROUND: Refractive surgery research aims to optimally precategorize patients by their suitability for various types of surgery. Recent advances have led to the development of artificial intelligence-powered algorithms, including machine learning approaches, to assess risks and enhance workflow. Large language models (LLMs) like ChatGPT-4 (OpenAI LP) have emerged as potential general artificial intelligence tools that can assist across various disciplines, possibly including refractive surgery decision-making. However, their actual capabilities in precategorizing refractive surgery patients based on real-world parameters remain unexplored.
OBJECTIVE: This exploratory study aimed to validate ChatGPT-4's capabilities in precategorizing refractive surgery patients based on commonly used clinical parameters. The goal was to assess whether ChatGPT-4's performance when categorizing batch inputs is comparable to those made by a refractive surgeon. A simple binary set of categories (patient suitable for laser refractive surgery or not) as well as a more detailed set were compared.
METHODS: Data from 100 consecutive patients from a refractive clinic were anonymized and analyzed. Parameters included age, sex, manifest refraction, visual acuity, and various corneal measurements and indices from Scheimpflug imaging. This study compared ChatGPT-4's performance with a clinician's categorizations using Cohen κ coefficient, a chi-square test, a confusion matrix, accuracy, precision, recall, F1-score, and receiver operating characteristic area under the curve.
RESULTS: A statistically significant noncoincidental accordance was found between ChatGPT-4 and the clinician's categorizations with a Cohen κ coefficient of 0.399 for 6 categories (95% CI 0.256-0.537) and 0.610 for binary categorization (95% CI 0.372-0.792). The model showed temporal instability and response variability, however. The chi-square test on 6 categories indicated an association between the 2 raters' distributions (χ²(5)=94.7, P<.001). Here, the accuracy was 0.68, precision 0.75, recall 0.68, and F1-score 0.70. For 2 categories, the accuracy was 0.88, precision 0.88, recall 0.88, F1-score 0.88, and area under the curve 0.79.
CONCLUSIONS: This study revealed that ChatGPT-4 exhibits potential as a precategorization tool in refractive surgery, showing promising agreement with clinician categorizations. However, its main limitations include, among others, dependency on solely one human rater, small sample size, the instability and variability of ChatGPT's (OpenAI LP) output between iterations and nontransparency of the underlying models. The results encourage further exploration into the application of LLMs like ChatGPT-4 in health care, particularly in decision-making processes that require understanding vast clinical data. Future research should focus on defining the model's accuracy with prompt and vignette standardization, detecting confounding factors, and comparing to other versions of ChatGPT-4 and other LLMs to pave the way for larger-scale validation and real-world implementation.
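A hedged sketch of the agreement metrics named in the methods above (Cohen κ, confusion matrix, accuracy, precision, recall, F1, chi-square, AUC), using invented labels rather than the study's data:

```python
# Illustrative only: comparing an LLM's category assignments against a
# clinician's reference labels. Labels and values are made up.
from sklearn.metrics import (cohen_kappa_score, confusion_matrix,
                             accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from scipy.stats import chi2_contingency

clinician = ["suitable", "unsuitable", "suitable", "suitable", "unsuitable", "suitable"]
chatgpt   = ["suitable", "unsuitable", "unsuitable", "suitable", "unsuitable", "suitable"]

print("kappa:", cohen_kappa_score(clinician, chatgpt))
cm = confusion_matrix(clinician, chatgpt, labels=["suitable", "unsuitable"])
print(cm)
print("accuracy:", accuracy_score(clinician, chatgpt))
# Weighted averaging mirrors multi-category reporting; binary also works.
print("precision:", precision_score(clinician, chatgpt, average="weighted"))
print("recall:", recall_score(clinician, chatgpt, average="weighted"))
print("F1:", f1_score(clinician, chatgpt, average="weighted"))

# Chi-square test of association between the two raters' label distributions.
chi2, p, dof, _ = chi2_contingency(cm)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")

# AUC normally needs scores or probabilities; with hard binary labels this
# line is purely illustrative.
y_true = [1 if c == "suitable" else 0 for c in clinician]
y_pred = [1 if c == "suitable" else 0 for c in chatgpt]
print("AUC (hard labels):", roc_auc_score(y_true, y_pred))
```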
Affiliation(s)
- Toam Katz
- Department of Ophthalmology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
23
Roberts RHR, Ali SR, Dobbs TD, Whitaker IS. Can Large Language Models Generate Outpatient Clinic Letters at First Consultation That Incorporate Complication Profiles From UK and USA Aesthetic Plastic Surgery Associations? Aesthet Surg J Open Forum 2023; 6:ojad109. [PMID: 38192329 PMCID: PMC10773662 DOI: 10.1093/asjof/ojad109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2024]
Abstract
The importance of written communication between clinicians and patients, especially in the wake of the Supreme Court case of Montgomery vs Lanarkshire, has led to a shift toward patient-centric care in the United Kingdom. This study investigates the use of large language models (LLMs) like ChatGPT and Google Bard in enhancing clinic letters with gold-standard complication profiles, aiming to improve patients' understanding and save clinicians' time in aesthetic plastic surgery. The aim of this study is to assess the effectiveness of LLMs in integrating complication profiles from authoritative sources into clinic letters, thus enhancing patient comprehension and clinician efficiency in aesthetic plastic surgery. Seven widely performed aesthetic procedures were chosen, and complication profiles were sourced from the British Association of Aesthetic Plastic Surgeons (BAAPS) and the American Society of Plastic Surgeons (ASPS). We evaluated the proficiency of ChatGPT4, ChatGPT3.5, and Google Bard in generating clinic letters which incorporated complication profiles from online resources. These letters were assessed for readability using an online tool, targeting a recommended sixth-grade reading level. ChatGPT4 achieved the highest compliance in integrating complication profiles from BAAPS and ASPS websites, with average readability grades between eighth and ninth. ChatGPT3.5 and Google Bard showed lower compliance, particularly when accessing paywalled content like the ASPS Informed Consent Bundle. In conclusion, LLMs, particularly ChatGPT4, show promise in enhancing patient communications in aesthetic plastic surgery by effectively incorporating standard complication profiles into clinic letters. This aids in informed decision making and time saving for clinicians. However, the study underscores the need for improvements in data accessibility, search capabilities, and ethical considerations for optimal LLM integration into healthcare communications. Future enhancements should focus on better interpretation of inaccessible formats and a Human in the Loop approach to combine Artificial Intelligence capabilities with clinician expertise.
Level of Evidence: 3
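A brief sketch of the readability check described above; the study used an online tool, so the textstat package and the sample letter text here are stand-in assumptions, not the study's pipeline.

```python
# Illustrative only: estimating the reading grade of a generated clinic letter
# with the common Flesch-Kincaid formulas via the textstat package.
import textstat  # pip install textstat

letter = (
    "Thank you for attending clinic today. We discussed breast augmentation. "
    "Possible complications include bleeding, infection, scarring, implant "
    "rupture and the need for further surgery."
)

grade = textstat.flesch_kincaid_grade(letter)
ease = textstat.flesch_reading_ease(letter)
print(f"Flesch-Kincaid grade: {grade:.1f} (a sixth-grade level is the usual target)")
print(f"Flesch reading ease: {ease:.1f}")
```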
Affiliation(s)
- Richard H R Roberts
- Corresponding Author: Dr Richard H.R. Roberts, Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea SA2 8PP, UK. E-mail:
24
Sakai D, Maeda T, Ozaki A, Kanda GN, Kurimoto Y, Takahashi M. Performance of ChatGPT in Board Examinations for Specialists in the Japanese Ophthalmology Society. Cureus 2023; 15:e49903. [PMID: 38174202 PMCID: PMC10763518 DOI: 10.7759/cureus.49903] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Accepted: 12/04/2023] [Indexed: 01/05/2024]
Abstract
We investigated the potential of ChatGPT in the ophthalmological field in the Japanese language using board examinations for specialists in the Japanese Ophthalmology Society. We tested GPT-3.5 and GPT-4-based ChatGPT on five sets of past board examination problems in July 2023. Japanese text was used as the prompt adopting two strategies: zero- and few-shot prompting. We compared the correct answer rate of ChatGPT with that of actual examinees, and the performance characteristics in 10 subspecialties were assessed. ChatGPT-3.5 and ChatGPT-4 correctly answered 112 (22.4%) and 229 (45.8%) out of 500 questions with simple zero-shot prompting, respectively, and ChatGPT-4 correctly answered 231 (46.2%) questions with few-shot prompting. The correct answer rates of ChatGPT-3.5 were approximately two to three times lower than those of the actual examinees for each examination set (p = 0.001). However, the correct answer rates for ChatGPT-4 were close to approximately 70% of those of the examinees. ChatGPT-4 had the highest correct answer rate (71.4% with zero-shot prompting and 61.9% with few-shot prompting) in "blepharoplasty, orbit, and ocular oncology," and the lowest answer rate (30.0% with zero-shot prompting and 23.3% with few-shot prompting) in "pediatric ophthalmology." We concluded that ChatGPT could be one of the advanced technologies for practical tools in Japanese ophthalmology.
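An illustrative sketch of the two prompting strategies compared above (zero-shot versus few-shot); the question text and worked example are invented, the prompts are written in English for readability although the study used Japanese, and no particular chat API is assumed.

```python
# Illustrative prompt construction only; nothing here is the study's code.
FEW_SHOT_EXAMPLES = """\
Question: Which layer of the cornea regenerates after a corneal abrasion? \
(a) Epithelium (b) Bowman layer (c) Stroma (d) Endothelium
Answer: a
"""

def zero_shot_prompt(question: str) -> str:
    # Zero-shot: only the instruction and the exam question.
    return (
        "You are taking an ophthalmology board examination. "
        "Answer with the single best option letter.\n\n" + question
    )

def few_shot_prompt(question: str) -> str:
    # Few-shot: the same instruction plus one or more worked examples.
    return (
        "You are taking an ophthalmology board examination. "
        "Answer with the single best option letter, as in the example.\n\n"
        + FEW_SHOT_EXAMPLES + "\n" + question
    )

q = ("Question: Which finding is most typical of retinitis pigmentosa? "
     "(a) Bone-spicule pigmentation (b) Cherry-red spot "
     "(c) Hard exudates (d) Cotton-wool spots\nAnswer:")
print(zero_shot_prompt(q))
print(few_shot_prompt(q))
```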
Affiliation(s)
- Daiki Sakai
- Department of Ophthalmology, Kobe City Eye Hospital, Kobe, JPN
- Department of Ophthalmology, Kobe City Medical Center General Hospital, Kobe, JPN
- Department of Surgery, Division of Ophthalmology, Kobe University Graduate School of Medicine, Kobe, JPN
- Tadao Maeda
- Department of Ophthalmology, Kobe City Eye Hospital, Kobe, JPN
- Atsuta Ozaki
- Department of Ophthalmology, Kobe City Eye Hospital, Kobe, JPN
- Department of Ophthalmology, Mie University Graduate School of Medicine, Tsu, JPN
- Genki N Kanda
- Department of Ophthalmology, Kobe City Eye Hospital, Kobe, JPN
- Laboratory for Biologically Inspired Computing, RIKEN Center for Biosystems Dynamics Research, Kobe, JPN
- Yasuo Kurimoto
- Department of Ophthalmology, Kobe City Eye Hospital, Kobe, JPN
- Department of Ophthalmology, Kobe City Medical Center General Hospital, Kobe, JPN
25
Pushpanathan K, Lim ZW, Er Yew SM, Chen DZ, Hui'En Lin HA, Lin Goh JH, Wong WM, Wang X, Jin Tan MC, Chang Koh VT, Tham YC. Popular large language model chatbots' accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience 2023; 26:108163. [PMID: 37915603 PMCID: PMC10616302 DOI: 10.1016/j.isci.2023.108163] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/07/2023] [Revised: 09/19/2023] [Accepted: 10/05/2023] [Indexed: 11/03/2023]
Abstract
In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were 'good'-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
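A rough sketch of how the reported proportions of 'good'-rated responses could be compared; the counts are back-calculated from the percentages above (37 questions per chatbot), and the chi-square comparison is illustrative rather than necessarily the authors' exact test.

```python
# Illustrative only: pairwise comparison of the share of 'good'-rated answers.
from scipy.stats import chi2_contingency

n_questions = 37
good = {"ChatGPT-4.0": 33, "ChatGPT-3.5": 22, "Google Bard": 15}  # ~89.2%, 59.5%, 40.5%

def compare(a: str, b: str) -> None:
    table = [
        [good[a], n_questions - good[a]],
        [good[b], n_questions - good[b]],
    ]
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"{a} vs {b}: chi2({dof}) = {chi2:.2f}, p = {p:.4f}")

compare("ChatGPT-4.0", "ChatGPT-3.5")
compare("ChatGPT-4.0", "Google Bard")
```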
Affiliation(s)
- Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Samantha Min Er Yew
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- David Ziyou Chen
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Hazel Anne Hui'En Lin
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Jocelyn Hui Lin Goh
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Wendy Meihua Wong
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Xiaofei Wang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing, China
- Advanced Innovation Centre for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Marcus Chun Jin Tan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Victor Teck Chang Koh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Yih-Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Ophthalmology and Visual Sciences Academic Clinical Programme (Eye ACP), Duke NUS Medical School, Singapore, Singapore
26
Jiao C, Edupuganti NR, Patel PA, Bui T, Sheth V. Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge. Cureus 2023; 15:e45700. [PMID: 37868408 PMCID: PMC10590143 DOI: 10.7759/cureus.45700] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Accepted: 09/20/2023] [Indexed: 10/24/2023]
Abstract
OBJECTIVE: We aim to compare the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT)-3.5 and ChatGPT-4.0 (OpenAI, San Francisco, CA, USA) in addressing multiple-choice ophthalmic case challenges.
METHODS AND ANALYSIS: Both models' accuracy was compared across different ophthalmology subspecialties using multiple-choice ophthalmic clinical cases drawn from the American Academy of Ophthalmology (AAO) "Diagnose This" question series. Additional analysis was based on image content, question difficulty, character length of the models' responses, and each model's alignment with responses from human respondents. The χ² test, Fisher's exact test, Student's t-test, and one-way analysis of variance (ANOVA) were conducted where appropriate, with p<0.05 considered significant.
RESULTS: GPT-4.0 significantly outperformed GPT-3.5 (75% versus 46%, p<0.01), with the most noticeable improvement in neuro-ophthalmology (100% versus 38%, p=0.03). While both models struggled with uveitis and refractive questions, GPT-4.0 excelled in other areas, such as pediatric questions (82%). In image-related questions, GPT-4.0 also displayed superior accuracy that trended toward significance (73% versus 46%, p=0.07). GPT-4.0 performed better with easier questions (93.8% (least difficult) versus 76.2% (middle) versus 53.3% (most), p=0.03) and generated more concise answers than GPT-3.5 (651.7±342.9 versus 1,112.9±328.8 characters, p<0.01). Moreover, GPT-4.0's answers were more in line with those of AAO respondents (57.3% versus 41.4%, p<0.01), showing a strong correlation between its accuracy and the proportion of AAO respondents who selected GPT-4.0's answer (ρ=0.713, p<0.01).
CONCLUSION AND RELEVANCE: Our study demonstrated that GPT-4.0 significantly outperforms GPT-3.5 in addressing ophthalmic case challenges, especially in neuro-ophthalmology, with improved accuracy even in image-related questions. These findings underscore the potential of advancing artificial intelligence (AI) models in enhancing ophthalmic diagnostics and medical education.
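An illustrative sketch of the statistical tests listed in the methods above, using made-up counts and values rather than the study's data:

```python
# Illustrative only: chi-square / Fisher's exact for accuracy comparisons,
# a t-test for response lengths, and Spearman's rho for agreement with AAO
# respondents. All numbers are hypothetical.
from scipy.stats import chi2_contingency, fisher_exact, ttest_ind, spearmanr

# Correct vs incorrect counts for GPT-4.0 and GPT-3.5 (hypothetical totals).
table = [[75, 25], [46, 54]]
chi2, p_chi, dof, _ = chi2_contingency(table)
odds, p_fisher = fisher_exact(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p_chi:.4f}; Fisher p = {p_fisher:.4f}")

# Character lengths of model answers (hypothetical samples).
len_gpt4 = [640, 700, 610, 655, 690]
len_gpt35 = [1100, 1180, 1050, 1120, 1090]
t, p_t = ttest_ind(len_gpt4, len_gpt35)
print(f"t = {t:.2f}, p = {p_t:.4f}")

# Correlation between per-question GPT-4.0 correctness and the share of AAO
# respondents choosing the same answer (hypothetical values).
gpt4_correct = [1, 1, 0, 1, 0, 1, 1, 0]
aao_share = [0.72, 0.65, 0.30, 0.80, 0.41, 0.67, 0.59, 0.25]
rho, p_rho = spearmanr(gpt4_correct, aao_share)
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.4f}")
```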
Affiliation(s)
- Cheng Jiao
- Ophthalmology, Augusta University Medical College of Georgia, Augusta, USA
- Neel R Edupuganti
- Ophthalmology, Augusta University Medical College of Georgia, Augusta, USA
- Parth A Patel
- Neurology, Augusta University Medical College of Georgia, Augusta, USA
- Tommy Bui
- Ophthalmology, Augusta University Medical College of Georgia, Augusta, USA
- Veeral Sheth
- Ophthalmology, University Retina and Macula Associates, Oak Forest, USA