1. Yang Z, Wang D, Zhou F, Song D, Zhang Y, Jiang J, Kong K, Liu X, Qiao Y, Chang RT, Han Y, Li F, Tham CC, Zhang X. Understanding Natural Language: Potential Application of Large Language Models to Ophthalmology. Asia Pac J Ophthalmol (Phila) 2024:100085. PMID: 39059558. DOI: 10.1016/j.apjo.2024.100085.
Abstract
Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation, similar to convolutional neural networks. The transformer architecture underlying generative artificial intelligence marks a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of parameters and training data (terabytes), LLMs exhibit remarkable human interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well-suited for roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the trajectory of LLMs and their potential implications for clinicians and patients. For clinicians, LLMs can be used for automated medical documentation, and given better inputs and extensive validation, they may in the future be able to diagnose and treat autonomously. For patient care, LLMs can be used for triage suggestions, summarization of medical documents, explanation of a patient's condition, and customization of patient education materials tailored to the patient's comprehension level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, this review briefly covers the many roles that LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.
Affiliation(s)
- Zefeng Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Deming Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Fengqi Zhou
- Ophthalmology, Mayo Clinic Health System, Eau Claire, Wisconsin, USA
- Diping Song
- Shanghai Artificial Intelligence Laboratory, Shanghai, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, The Chinese Academy of Sciences, Shenzhen, China
- Yinhang Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Jiaxuan Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Kangjie Kong
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Xiaoyi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Yu Qiao
- Shanghai Artificial Intelligence Laboratory, Shanghai, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, The Chinese Academy of Sciences, Shenzhen, China
- Robert T Chang
- Department of Ophthalmology, Byers Eye Institute at Stanford University, Palo Alto, CA, USA
- Ying Han
- Department of Ophthalmology, University of California, San Francisco, San Francisco, CA, USA
- Fei Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
- Clement C Tham
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China; Hong Kong Eye Hospital, Kowloon, Hong Kong SAR, China; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital, Shatin, Hong Kong SAR, China.
- Xiulan Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
2. Li J, Zong H, Wu E, Wu R, Peng Z, Zhao J, Yang L, Xie H, Shen B. Exploring the potential of artificial intelligence to enhance the writing of English academic papers by non-native English-speaking medical students - the educational application of ChatGPT. BMC Med Educ 2024; 24:736. PMID: 38982429. PMCID: PMC11232216. DOI: 10.1186/s12909-024-05738-y.
Abstract
BACKGROUND Academic paper writing holds significant importance in the education of medical students and poses a clear challenge for those whose first language is not English. This study aimed to investigate the effectiveness of employing large language models, particularly ChatGPT, in improving the English academic writing skills of these students. METHODS A cohort of 25 third-year medical students from China was recruited. The study consisted of two stages: first, the students wrote a mini paper; second, they revised the mini paper using ChatGPT within two weeks. The evaluation of the mini papers focused on three key dimensions: structure, logic, and language. The evaluation method incorporated both manual scoring and AI scoring using the ChatGPT-3.5 and ChatGPT-4 models. Additionally, we employed a questionnaire to gather feedback on students' experience of using ChatGPT. RESULTS After implementing ChatGPT for writing assistance, manual scores increased by a notable 4.23 points. Similarly, AI scores based on the ChatGPT-3.5 model increased by 4.82 points, while those based on the ChatGPT-4 model increased by 3.84 points. These results highlight the potential of large language models in supporting academic writing. Statistical analysis revealed no significant difference between manual scoring and ChatGPT-4 scoring, indicating the potential of ChatGPT-4 to assist teachers in the grading process. Feedback from the questionnaire indicated a generally positive response from students, with 92% acknowledging an improvement in the quality of their writing, 84% noting advancements in their language skills, and 76% recognizing the contribution of ChatGPT in supporting academic research. CONCLUSION The study highlights the efficacy of large language models like ChatGPT in augmenting the English academic writing proficiency of non-native speakers in medical education. Furthermore, it illustrates their potential to contribute to the educational evaluation process, particularly in environments where English is not the primary language.
Affiliation(s)
- Jiakun Li
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Hui Zong
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Erman Wu
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Department of Neurosurgery, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, China
- Rongrong Wu
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Zhufeng Peng
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Jing Zhao
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Lu Yang
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Hong Xie
- West China Hospital, West China School of Medicine, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, China.
- Bairong Shen
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China.
3. Kenney RC, Requarth TW, Jack AI, Hyman SW, Galetta SL, Grossman SN. AI in Neuro-Ophthalmology: Current Practice and Future Opportunities. J Neuroophthalmol 2024:00041327-990000000-00679. PMID: 38965655. DOI: 10.1097/wno.0000000000002205.
Abstract
BACKGROUND Neuro-ophthalmology frequently requires a complex and multi-faceted clinical assessment supported by sophisticated imaging techniques in order to assess disease status. The current approach to diagnosis requires substantial expertise and time. The emergence of AI has brought forth innovative solutions to streamline and enhance this diagnostic process, which is especially valuable given the shortage of neuro-ophthalmologists. Machine learning algorithms, in particular, have demonstrated significant potential in interpreting imaging data, identifying subtle patterns, and aiding clinicians in making more accurate and timely diagnoses, while also supplementing nonspecialist evaluations of neuro-ophthalmic disease. EVIDENCE ACQUISITION Electronic searches of published literature were conducted using PubMed and Google Scholar. A comprehensive search of the following terms was conducted within the Journal of Neuro-Ophthalmology: AI, artificial intelligence, machine learning, deep learning, natural language processing, computer vision, large language models, and generative AI. RESULTS This review aims to provide a comprehensive overview of the evolving landscape of AI applications in neuro-ophthalmology. It delves into the diverse applications of AI, from the analysis of optical coherence tomography (OCT) and fundus photography to the development of predictive models for disease progression. Additionally, the review explores the integration of generative AI into neuro-ophthalmic education and clinical practice. CONCLUSIONS We review the current state of AI in neuro-ophthalmology and its potentially transformative impact. The inclusion of AI in neuro-ophthalmic practice and research not only holds promise for improving diagnostic accuracy but also opens avenues for novel therapeutic interventions. We emphasize its potential to improve access to scarce subspecialty resources while examining the current challenges associated with integrating AI into clinical practice and research.
Affiliation(s)
- Rachel C Kenney
- Departments of Neurology (RCK, AJ, SH, SG, SNG), Population Health (RCK), and Ophthalmology (SG), New York University Grossman School of Medicine, New York, New York; and Vilcek Institute of Graduate Biomedical Sciences (TR), New York University Grossman School of Medicine, New York, New York
4. Rossettini G, Rodeghiero L, Corradi F, Cook C, Pillastrini P, Turolla A, Castellini G, Chiappinotto S, Gianola S, Palese A. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Med Educ 2024; 24:694. PMID: 38926809. PMCID: PMC11210096. DOI: 10.1186/s12909-024-05630-9.
Abstract
BACKGROUND Artificial intelligence (AI) chatbots are emerging educational tools for students in the healthcare sciences. However, assessing their accuracy is essential prior to adoption in educational settings. This study aimed to assess the accuracy of three AI chatbots (ChatGPT-4, Microsoft Copilot and Google Gemini) in predicting the correct answers in the Italian standardized entrance examination for healthcare science degrees (CINECA test). Secondarily, we assessed the narrative coherence of the AI chatbots' responses (i.e., text output) based on three qualitative metrics: the logical rationale behind the chosen answer, the presence of information internal to the question, and the presence of information external to the question. METHODS An observational cross-sectional design was performed in September 2023. The accuracy of the three chatbots was evaluated on the CINECA test, in which questions are formatted as multiple choice with a single best answer. The outcome was binary (correct or incorrect). A chi-squared test, followed by a post hoc analysis with Bonferroni correction, assessed differences in accuracy among the chatbots. A p-value of < 0.05 was considered statistically significant. A sensitivity analysis was performed, excluding answers that were not applicable (e.g., images). Narrative coherence was analyzed by absolute and relative frequencies of correct answers and errors. RESULTS Overall, of the 820 CINECA multiple-choice questions inputted into all chatbots, 20 questions could not be imported into ChatGPT-4 (n = 808) or Google Gemini (n = 808) due to technical limitations. We found statistically significant differences in the ChatGPT-4 vs Google Gemini and Microsoft Copilot vs Google Gemini comparisons (p-value < 0.001). The narrative coherence analysis revealed "Logical reasoning" as the prevalent pattern among correct answers (n = 622, 81.5%) and "Logical error" as the prevalent pattern among incorrect answers (n = 40, 88.9%). CONCLUSIONS Our main findings reveal that: (A) the AI chatbots performed well; (B) ChatGPT-4 and Microsoft Copilot performed better than Google Gemini; and (C) their narrative coherence is primarily logical. Although the AI chatbots showed promising accuracy in predicting the correct answers in the Italian standardized university entrance examination, we encourage candidates to incorporate this new technology cautiously, as a supplement to their learning rather than a primary resource. TRIAL REGISTRATION Not required.
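The statistical procedure this abstract describes (an omnibus chi-squared test on correct/incorrect counts, then pairwise post hoc comparisons with Bonferroni correction) can be sketched in a few lines. This is a generic illustration, not the authors' code: the counts below are hypothetical placeholders, and the `chi2_2x2` helper and `CRIT` critical value are assumptions introduced here (5.73 is the chi-squared df=1 critical value for a Bonferroni-adjusted alpha of 0.05/3).

```python
from itertools import combinations

# Hypothetical (correct, incorrect) counts per chatbot -- NOT the study's data.
results = {
    "ChatGPT-4": (700, 108),
    "Microsoft Copilot": (690, 130),
    "Google Gemini": (600, 208),
}

def chi2_2x2(a, b):
    """Pearson chi-squared statistic for a 2x2 table of (correct, incorrect) rows."""
    table = [a, b]
    row_totals = [sum(r) for r in table]
    col_totals = [a[0] + b[0], a[1] + b[1]]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Bonferroni correction: alpha = 0.05 / 3 pairwise comparisons,
# giving a chi-squared (df = 1) critical value of roughly 5.73.
CRIT = 5.73
for name_a, name_b in combinations(results, 2):
    stat = chi2_2x2(results[name_a], results[name_b])
    verdict = "significant" if stat > CRIT else "not significant"
    print(f"{name_a} vs {name_b}: chi2 = {stat:.2f} ({verdict})")
```

With these placeholder counts, the two comparisons against Gemini exceed the corrected threshold while ChatGPT-4 vs Copilot does not, mirroring the pattern of results the abstract reports.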
Affiliation(s)
- Giacomo Rossettini
- School of Physiotherapy, University of Verona, Verona, Italy.
- Department of Physiotherapy, Faculty of Sport Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Spain.
- Lia Rodeghiero
- Department of Rehabilitation, Hospital of Merano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Merano-Meran, Italy.
- Chad Cook
- Department of Orthopaedics, Duke University, Durham, NC, USA
- Duke Clinical Research Institute, Duke University, Durham, NC, USA
- Department of Population Health Sciences, Duke University, Durham, NC, USA
- Paolo Pillastrini
- Department of Biomedical and Neuromotor Sciences (DIBINEM), Alma Mater University of Bologna, Bologna, Italy
- Unit of Occupational Medicine, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy
- Andrea Turolla
- Department of Biomedical and Neuromotor Sciences (DIBINEM), Alma Mater University of Bologna, Bologna, Italy
- Unit of Occupational Medicine, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy
- Greta Castellini
- Unit of Clinical Epidemiology, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
- Silvia Gianola
- Unit of Clinical Epidemiology, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy.
- Alvisa Palese
- Department of Medical Sciences, University of Udine, Udine, Italy.
5. Paul S, Govindaraj S, Jk J. ChatGPT Versus National Eligibility cum Entrance Test for Postgraduate (NEET PG). Cureus 2024; 16:e63048. PMID: 39050297. PMCID: PMC11268980. DOI: 10.7759/cureus.63048.
Abstract
Introduction With both suspicion and excitement, artificial intelligence tools are being integrated into nearly every aspect of human existence, including the medical sciences and medical education. The newest large language model (LLM) in the class of autoregressive language models is ChatGPT. While ChatGPT's potential to revolutionize clinical practice and medical education is under investigation, further research is necessary to understand its strengths and limitations in this field comprehensively. Methods Two hundred National Eligibility cum Entrance Test for Postgraduate (NEET-PG) 2023 questions were gathered from various public education websites and individually entered into Microsoft Bing (GPT-4 Version 2.2.1). The Microsoft Bing chatbot is currently the only platform incorporating all of GPT-4's multimodal features, including image recognition. The results were subsequently analyzed. Results Out of 200 questions, ChatGPT-4 answered 129 (64.5%) correctly. The most frequently tested specialties were medicine (15%), obstetrics and gynecology (15%), general surgery (14%), and pathology (10%). Conclusion This study sheds light on how well GPT-4 performs on the NEET-PG entrance test. ChatGPT has potential as an adjunctive instrument within medical education and clinical settings. Its capacity to respond intelligently and accurately in complicated clinical scenarios demonstrates its versatility.
Affiliation(s)
- Sam Paul
- General Surgery, St John's Medical College Hospital, Bengaluru, IND
- Sridar Govindaraj
- Surgical Gastroenterology and Laparoscopy, St John's Medical College Hospital, Bengaluru, IND
- Jerisha Jk
- Pediatrics and Neonatology, Christian Medical College Ludhiana, Ludhiana, IND
6. Tailor PD, D'Souza HS, Li H, Starr MR. Vision of the future: large language models in ophthalmology. Curr Opin Ophthalmol 2024:00055735-990000000-00173. PMID: 38814572. DOI: 10.1097/icu.0000000000001062.
Abstract
PURPOSE OF REVIEW Large language models (LLMs) are rapidly entering the landscape of medicine in areas from patient interaction to clinical decision-making. This review discusses the evolving role of LLMs in ophthalmology, focusing on their current applications and future potential in enhancing ophthalmic care. RECENT FINDINGS LLMs in ophthalmology have demonstrated potential in improving patient communication and aiding preliminary diagnostics because of their ability to process complex language and generate human-like, domain-specific interactions. However, some studies have shown potential for harm, and there have been no prospective real-world studies evaluating the safety and efficacy of LLMs in practice. SUMMARY While current applications are largely theoretical and require rigorous safety testing before implementation, LLMs exhibit promise in augmenting the quality and efficiency of patient care. Challenges such as data privacy and user acceptance must be overcome before LLMs can be fully integrated into clinical practice.
Affiliation(s)
- Haley S D'Souza
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Hanzhou Li
- Department of Radiology, Emory University, Atlanta, Georgia, USA
- Matthew R Starr
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
7. Saygin M, Bekmezci M, Dinçer E. Artificial Intelligence Model ChatGPT-4: Entrepreneur Candidate and Entrepreneurship Example. F1000Res 2024; 13:308. PMID: 38845823. PMCID: PMC11153998. DOI: 10.12688/f1000research.144671.2.
Abstract
Background Although artificial intelligence technologies are still in their infancy, they inspire both hope and anxiety about the future. This study examines ChatGPT-4, one of the best-known artificial intelligence applications and one claimed to have a self-learning feature, in the context of business establishment processes. Methods The assessment questions in the Entrepreneurship Handbook, published open access by the Small and Medium Enterprises Development Organization of Turkey to guide entrepreneurial processes and shape the perception of entrepreneurship in Turkey, were posed to the artificial intelligence model ChatGPT-4 and analysed in three stages. The model's approach to solving the questions, and the answers it provided, could then be compared with the entrepreneurship literature. Results ChatGPT-4, itself a striking example of entrepreneurship, succeeded in answering the questions across the handbook's 16 modules in an original way, analysing them in depth. Conclusion The model also proved quite creative in developing new alternatives to the correct answers specified in the entrepreneurship handbook. The original contribution of this research is that it is among the first studies on artificial intelligence and entrepreneurship in the literature.
8. Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. PMID: 38172581. PMCID: PMC11076576. DOI: 10.1038/s41433-023-02915-z.
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing "AI-plagiarism" or the publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been developed in the past few years. We discuss recent literature evaluating the role of these language models in medicine, with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are emerging rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
9. Momenaei B, Mansour HA, Kuriyan AE, Xu D, Sridhar J, Ting DSW, Yonekawa Y. ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management. Curr Opin Ophthalmol 2024; 35:205-209. PMID: 38334288. DOI: 10.1097/icu.0000000000001036.
Abstract
PURPOSE OF REVIEW This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology, in addition to exploring the limitations and ethical considerations associated with its application. RECENT FINDINGS ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting research productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across various ophthalmic disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. This is beneficial for patients in accessing information and aids physicians in triaging as well as in formulating differential diagnoses. Despite such benefits, ChatGPT has limitations that require acknowledgment, including the potential risk of offering inaccurate or harmful information, dependence on outdated data, the high level of education necessary for data comprehension, and concerns regarding patient privacy and ethical considerations within the research domain. SUMMARY ChatGPT is a promising new tool that could contribute to ophthalmic healthcare education and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role with human expert oversight.
Affiliation(s)
- Bita Momenaei
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Hana A Mansour
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Jayanth Sridhar
- University of California Los Angeles, Los Angeles, California, USA
- Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
10. Biswas S, Davies LN, Sheppard AL, Logan NS, Wolffsohn JS. Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt 2024; 44:641-671. PMID: 38404172. DOI: 10.1111/opo.13284.
Abstract
PURPOSE With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs, and comparisons of the abilities of different LLMs with those of their human counterparts in ophthalmic care, remain under-reported. RECENT FINDINGS Hitherto, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, making clinical diagnoses and passing ophthalmology question-based examinations, among others. LLMs' performance (median accuracy, %) is influenced by factors such as the model iteration, the prompts utilised and the domain. A human expert (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT-4 outperformed the others in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI-based LLMs like ChatGPT are deemed more efficient than their human counterparts, these AIs are constrained by their nonspecific and outdated training, lack of access to current knowledge, generation of plausible-sounding 'fake' responses or hallucinations, inability to process images, lack of critical literature analysis, and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of LLMs and the potential of these AI-based tools. SUMMARY Ophthalmic care professionals should take a conservative approach when using AI, as human judgement remains essential for clinical decision-making and for monitoring the accuracy of information. This review identified the ophthalmic applications and potential usages that need further exploration. With the advancement of LLMs, setting standards for benchmarking and promoting best practices is crucial. Potential clinical deployment requires evaluating these LLMs beyond artificial settings, through clinical trials that determine their usefulness in the real world.
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| |
11
Haghighi T, Gholami S, Sokol JT, Kishnani E, Ahsaniyan A, Rahmanian H, Hedayati F, Leng T, Alam MN. EYE-Llama, an in-domain large language model for ophthalmology. bioRxiv: The Preprint Server for Biology 2024:2024.04.26.591355. [PMID: 38746183 PMCID: PMC11092466 DOI: 10.1101/2024.04.26.591355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]
Abstract
Background Training Large Language Models (LLMs) with in-domain data can significantly enhance their performance, leading to more accurate and reliable question-answering (QA) systems essential for supporting clinical decision-making and educating patients. Methods This study introduces LLMs trained on well-curated, in-domain ophthalmic datasets, together with an open-source ophthalmic language dataset for model training. Our LLMs (EYE-Llama) were first pre-trained on an ophthalmology-specific dataset including paper abstracts, textbooks, EyeWiki, and Wikipedia articles, and subsequently fine-tuned on a diverse range of QA datasets. The models at each stage were compared to baseline Llama 2, ChatDoctor, and ChatGPT (GPT-3.5) on four distinct test sets, evaluated quantitatively (accuracy, F1 score, and BERTScore) and qualitatively by two ophthalmologists. Results On the American Academy of Ophthalmology (AAO) test set, using BERTScore F1 as the metric, our models surpassed both Llama 2 and ChatDoctor and performed on par with ChatGPT, a model with 175 billion parameters (EYE-Llama: 0.57, Llama 2: 0.56, ChatDoctor: 0.56, ChatGPT: 0.57). On the MedMCQA test set, the fine-tuned models achieved higher accuracy than the Llama 2 and ChatDoctor models (EYE-Llama: 0.39, Llama 2: 0.33, ChatDoctor: 0.29), although ChatGPT outperformed EYE-Llama with an accuracy of 0.55. On the PubMedQA set, the fine-tuned model improved in accuracy over the Llama 2, ChatGPT, and ChatDoctor models (EYE-Llama: 0.96, Llama 2: 0.90, ChatGPT: 0.93, ChatDoctor: 0.92). Conclusion The study shows that pre-training and fine-tuning LLMs like EYE-Llama enhances their performance in specific medical domains.
Our EYE-Llama models surpass baseline Llama 2 in all evaluations, highlighting the effectiveness of specialized LLMs in medical QA systems. (Funded by NEI R15EY035804 and a UNC Charlotte Faculty Research Grant (MNA).)
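The QA metrics reported above (accuracy, F1 score, BERTScore) are standard evaluation measures. As an illustration, token-level F1 between a generated answer and a reference answer is conventionally computed as below; this is a minimal sketch of the common SQuAD-style formula, not the authors' actual scoring script:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 overlap between a generated answer and a reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

BERTScore replaces this exact-token overlap with cosine similarity of contextual embeddings, which is why it can credit paraphrased answers that token F1 misses.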
Affiliation(s)
- Tania Haghighi: Department of Electrical Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States; Department of Computer Science, Baha’i Institute for Higher Education, Tehran, Iran
- Sina Gholami: Department of Electrical Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States
- Jared Todd Sokol: Department of Ophthalmology, Stanford University School of Medicine, Stanford, CA, United States
- Enaika Kishnani: Department of Electrical Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States
- Adnan Ahsaniyan: Department of Computer Science, Baha’i Institute for Higher Education, Tehran, Iran
- Holakou Rahmanian: Department of Computer Science, Baha’i Institute for Higher Education, Tehran, Iran
- Fares Hedayati: Department of Computer Science, Baha’i Institute for Higher Education, Tehran, Iran
- Theodore Leng: Department of Ophthalmology, Stanford University School of Medicine, Stanford, CA, United States
- Minhaj Nur Alam: Department of Electrical Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States
12
Zandi R, Fahey JD, Drakopoulos M, Bryan JM, Dong S, Bryar PJ, Bidwell AE, Bowen RC, Lavine JA, Mirza RG. Exploring Diagnostic Precision and Triage Proficiency: A Comparative Study of GPT-4 and Bard in Addressing Common Ophthalmic Complaints. Bioengineering (Basel) 2024; 11:120. [PMID: 38391606 PMCID: PMC10886029 DOI: 10.3390/bioengineering11020120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Revised: 01/20/2024] [Accepted: 01/24/2024] [Indexed: 02/24/2024] Open
Abstract
In the modern era, patients often resort to the internet for answers to their health-related concerns, and clinics face challenges in providing timely responses to patient concerns. This has led to a need to investigate the capabilities of AI chatbots for ophthalmic diagnosis and triage. In this in silico study, 80 simulated patient complaints in ophthalmology with varying urgency levels and clinical descriptors were entered into both ChatGPT (GPT-4) and Bard in a systematic 3-step submission process asking the chatbots to triage, diagnose, and evaluate urgency. Three ophthalmologists graded chatbot responses. Chatbots were significantly better at ophthalmic triage than diagnosis (90.0% appropriate triage vs. 48.8% correct leading diagnosis; p < 0.001), and GPT-4 performed better than Bard for appropriate triage recommendations (96.3% vs. 83.8%; p = 0.008), grader satisfaction for patient use (81.3% vs. 55.0%; p < 0.001), and lower potential harm rates (6.3% vs. 20.0%; p = 0.010). More descriptors improved the accuracy of diagnosis for both GPT-4 and Bard. These results indicate that chatbots may not need to recognize the correct diagnosis to provide appropriate ophthalmic triage, and there is a potential utility of these tools in aiding patients or triage staff; however, they are not a replacement for professional ophthalmic evaluation or advice.
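The triage comparison above (96.3% vs. 83.8% appropriate triage across 80 vignettes per chatbot) can be checked approximately with a standard two-proportion z-test. A sketch, assuming the counts 77/80 and 67/80 implied by the reported percentages; the authors' exact statistical procedure may have differed:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test; returns (z statistic, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# GPT-4: 77/80 appropriate triage vs. Bard: 67/80
z, p = two_proportion_z(77, 80, 67, 80)
```

With these assumed counts the test yields a p-value in the same vicinity as the reported p = 0.008, consistent with the abstract's conclusion that the difference is significant.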
Affiliation(s)
- Roya Zandi: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Joseph D Fahey: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Michael Drakopoulos: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- John M Bryan: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Siyuan Dong: Division of Biostatistics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Paul J Bryar: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Ann E Bidwell: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- R Chris Bowen: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Jeremy A Lavine: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Rukhsana G Mirza: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
13
Sallam M, Al-Salahat K, Al-Ajlouni E. ChatGPT Performance in Diagnostic Clinical Microbiology Laboratory-Oriented Case Scenarios. Cureus 2023; 15:e50629. [PMID: 38107211 PMCID: PMC10725273 DOI: 10.7759/cureus.50629] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/16/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND Artificial intelligence (AI)-based tools can reshape healthcare practice. This includes ChatGPT, which is considered among the most popular AI-based conversational models. Nevertheless, the performance of different versions of ChatGPT needs further evaluation in different settings to assess its reliability and credibility in various healthcare-related tasks. Therefore, the current study aimed to assess the performance of the freely available ChatGPT-3.5 and the paid ChatGPT-4 in 10 diagnostic clinical microbiology case scenarios. METHODS The study followed the METRICS (Model, Evaluation, Timing/Transparency, Range/Randomization, Individual factors, Count, Specificity of the prompts/language) checklist for standardization of the design and reporting of AI-based studies in healthcare. The models, tested on December 3, 2023, were ChatGPT-3.5 and ChatGPT-4, and the ChatGPT-generated content was evaluated with the CLEAR tool (Completeness, Lack of false information, Evidence support, Appropriateness, and Relevance) on a 5-point Likert scale, with CLEAR scores ranging from 1 to 5. ChatGPT output was evaluated by two raters independently, and inter-rater agreement was assessed with Cohen's κ statistic. Ten diagnostic clinical microbiology laboratory case scenarios were created in English by three microbiologists at diverse levels of expertise, following an internal discussion of common cases observed in Jordan. Topics spanned bacteriology, mycology, parasitology, and virology. Specific prompts were tailored based on the CLEAR tool, and a new session was started for each case scenario. RESULTS Cohen's κ values for the five CLEAR items were 0.351-0.737 for ChatGPT-3.5 and 0.294-0.701 for ChatGPT-4, indicating fair to good agreement and suitability for analysis. Based on average CLEAR scores, ChatGPT-4 outperformed ChatGPT-3.5 (mean: 2.64±1.06 for ChatGPT-3.5 vs.
3.21±1.05 for ChatGPT-4; P=.012, t-test). Performance varied across CLEAR items, with the lowest scores on the "Relevance" item (2.15±0.71 for ChatGPT-3.5 and 2.65±1.16 for ChatGPT-4). A statistically significant difference across CLEAR items was seen only for ChatGPT-4, which performed best in "Completeness", "Lack of false information", and "Evidence support" (P=0.043). Both models performed worst on antimicrobial susceptibility testing (AST) queries and best on bacterial and mycologic identification. CONCLUSIONS Assessment of ChatGPT performance across different diagnostic clinical microbiology case scenarios showed that ChatGPT-4 outperformed ChatGPT-3.5. The performance of ChatGPT demonstrated noticeable variability depending on the specific topic evaluated. A primary shortcoming of both ChatGPT models was the tendency to generate irrelevant content lacking the needed focus. Although the overall ChatGPT performance in these diagnostic microbiology case scenarios might be described as "above average" at best, there remains significant potential for improvement, considering the identified limitations and unsatisfactory results in a few cases.
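Cohen's κ, as used above for inter-rater agreement, is defined as (p_o - p_e)/(1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal rating frequencies. A minimal unweighted sketch with hypothetical ratings (the study's actual data are not reproduced here, and weighted κ is often preferred for ordinal Likert scales):

```python
from collections import Counter

def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters match
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    return (observed - expected) / (1 - expected)
```

Values near 1 indicate near-perfect agreement, values near 0 indicate agreement no better than chance, which is why the study's 0.29-0.74 range is read as "fair to good".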
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, JOR
- Khaled Al-Salahat: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR
- Eyad Al-Ajlouni: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR
14
Antaki F, Milad D, Chia MA, Giguère CÉ, Touma S, El-Khoury J, Keane PA, Duval R. Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering. Br J Ophthalmol 2023:bjo-2023-324438. [PMID: 37923374 DOI: 10.1136/bjo-2023-324438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/01/2023] [Indexed: 11/07/2023]
Abstract
BACKGROUND Evidence on the performance of Generative Pre-trained Transformer 4 (GPT-4), a large language model (LLM), in the ophthalmology question-answering domain is needed. METHODS We tested GPT-4 on two 260-question multiple-choice question sets from the Basic and Clinical Science Course (BCSC) Self-Assessment Program and the OphthoQuestions question banks. We compared the accuracy of GPT-4 models with varying temperatures (creativity setting) and evaluated their responses in a subset of questions. We also compared the best-performing GPT-4 model to GPT-3.5 and to historical human performance. RESULTS GPT-4-0.3 (GPT-4 with a temperature of 0.3) achieved the highest accuracy among GPT-4 models, with 75.8% on the BCSC set and 70.0% on the OphthoQuestions set. The combined accuracy was 72.9%, which represents an 18.3% raw improvement in accuracy compared with GPT-3.5 (p<0.001). Human graders preferred responses from models with a temperature higher than 0 (more creative). Exam section, question difficulty and cognitive level were all predictive of GPT-4-0.3 answer accuracy. GPT-4-0.3's performance was numerically superior to human performance on the BCSC (75.8% vs 73.3%) and OphthoQuestions (70.0% vs 63.0%) sets, but the differences were not statistically significant (p=0.55 and p=0.09, respectively). CONCLUSION GPT-4, an LLM trained on non-ophthalmology-specific data, performs significantly better than its predecessor on simulated ophthalmology board-style exams. Remarkably, its performance tended to be superior to historical human performance, but that difference was not statistically significant in our study.
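The model entropy in the study's title captures how much a model's answer choices vary across repeated runs of the same question: lower entropy means more consistent answering, as expected at lower temperatures. One plausible formulation, not necessarily the authors' exact one, is the Shannon entropy of the per-question answer distribution:

```python
from collections import Counter
from math import log2

def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (bits) of a model's answer-choice distribution
    over repeated runs of the same multiple-choice question."""
    counts = Counter(answers)
    total = len(answers)
    # H = -sum(p * log2(p)) over the observed answer choices
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

A model that answers "A" on every run of a question has entropy 0 bits; one that splits evenly over four options reaches the 2-bit maximum for a four-choice item.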
Affiliation(s)
- Fares Antaki: Moorfields Eye Hospital NHS Foundation Trust, London, UK; Institute of Ophthalmology, UCL, London, UK; The CHUM School of Artificial Intelligence in Healthcare, Montreal, Quebec, Canada; Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada
- Daniel Milad: Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada; Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Mark A Chia: Moorfields Eye Hospital NHS Foundation Trust, London, UK; Institute of Ophthalmology, UCL, London, UK
- Carl-Édouard Giguère
- Samir Touma: Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada; Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Jonathan El-Khoury: Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada; Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Pearse A Keane: Moorfields Eye Hospital NHS Foundation Trust, London, UK; Institute of Ophthalmology, UCL, London, UK; NIHR Moorfields Biomedical Research Centre, London, UK
- Renaud Duval: Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada