1. Yang Z, Wang D, Zhou F, Song D, Zhang Y, Jiang J, Kong K, Liu X, Qiao Y, Chang RT, Han Y, Li F, Tham CC, Zhang X. Understanding Natural Language: Potential Application of Large Language Models to Ophthalmology. Asia Pac J Ophthalmol (Phila) 2024:100085. PMID: 39059558. DOI: 10.1016/j.apjo.2024.100085.
Abstract
Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation, similar to convolutional neural networks. The transformer architecture underlying generative artificial intelligence marks a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of parameters and training data (terabytes), LLMs exhibit remarkable human interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well-suited for roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the trajectory of LLMs and their potential implications for clinicians and patients. For clinicians, LLMs can be used for automated medical documentation, and given better inputs and extensive validation, they may in the future be able to diagnose and treat autonomously. For patient care, LLMs can be used for triage suggestions, summarization of medical documents, explanation of a patient's condition, and customization of patient education materials tailored to the patient's comprehension level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, this review briefly covers the many roles that LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.
Affiliation(s)
- Zefeng Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Deming Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Fengqi Zhou
- Ophthalmology, Mayo Clinic Health System, Eau Claire, Wisconsin, USA
- Diping Song
- Shanghai Artificial Intelligence Laboratory, Shanghai, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, The Chinese Academy of Sciences, Shenzhen, China
- Yinhang Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Jiaxuan Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Kangjie Kong
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Xiaoyi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Yu Qiao
- Shanghai Artificial Intelligence Laboratory, Shanghai, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, The Chinese Academy of Sciences, Shenzhen, China
- Robert T Chang
- Department of Ophthalmology, Byers Eye Institute at Stanford University, Palo Alto, CA, USA
- Ying Han
- Department of Ophthalmology, University of California, San Francisco, San Francisco, CA, USA
- Fei Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
- Clement C Tham
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China; Hong Kong Eye Hospital, Kowloon, Hong Kong SAR, China; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital, Shatin, Hong Kong SAR, China.
- Xiulan Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
2. Li J, Zong H, Wu E, Wu R, Peng Z, Zhao J, Yang L, Xie H, Shen B. Exploring the potential of artificial intelligence to enhance the writing of English academic papers by non-native English-speaking medical students - the educational application of ChatGPT. BMC Med Educ 2024; 24:736. PMID: 38982429. PMCID: PMC11232216. DOI: 10.1186/s12909-024-05738-y.
Abstract
BACKGROUND Academic paper writing holds significant importance in the education of medical students and poses a clear challenge for those whose first language is not English. This study aimed to investigate the effectiveness of employing large language models, particularly ChatGPT, in improving the English academic writing skills of these students. METHODS A cohort of 25 third-year medical students from China was recruited. The study consisted of two stages: first, the students wrote a mini paper; second, they revised the mini paper using ChatGPT within two weeks. The evaluation of the mini papers focused on three key dimensions: structure, logic, and language. The evaluation method incorporated both manual scoring and AI scoring using the ChatGPT-3.5 and ChatGPT-4 models. Additionally, we employed a questionnaire to gather feedback on students' experience of using ChatGPT. RESULTS After implementing ChatGPT for writing assistance, manual scores increased by a notable 4.23 points. Similarly, AI scores based on the ChatGPT-3.5 model increased by 4.82 points, while those based on the ChatGPT-4 model increased by 3.84 points. These results highlight the potential of large language models in supporting academic writing. Statistical analysis revealed no significant difference between manual scoring and ChatGPT-4 scoring, indicating the potential of ChatGPT-4 to assist teachers in the grading process. Feedback from the questionnaire indicated a generally positive response from students, with 92% acknowledging an improvement in the quality of their writing, 84% noting advancements in their language skills, and 76% recognizing the contribution of ChatGPT in supporting academic research. CONCLUSION The study highlights the efficacy of large language models like ChatGPT in augmenting the English academic writing proficiency of non-native speakers in medical education. Furthermore, it illustrates their potential to contribute to the educational evaluation process, particularly in environments where English is not the primary language.
Affiliation(s)
- Jiakun Li
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Hui Zong
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Erman Wu
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Department of Neurosurgery, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, China
- Rongrong Wu
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Zhufeng Peng
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Jing Zhao
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Lu Yang
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Hong Xie
- West China Hospital, West China School of Medicine, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, China.
- Bairong Shen
- Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China.
3. Kenney RC, Requarth TW, Jack AI, Hyman SW, Galetta SL, Grossman SN. AI in Neuro-Ophthalmology: Current Practice and Future Opportunities. J Neuroophthalmol 2024:00041327-990000000-00679. PMID: 38965655. DOI: 10.1097/wno.0000000000002205.
Abstract
BACKGROUND Neuro-ophthalmology frequently requires a complex and multi-faceted clinical assessment supported by sophisticated imaging techniques in order to assess disease status. The current approach to diagnosis requires substantial expertise and time. The emergence of AI has brought forth innovative solutions to streamline and enhance this diagnostic process, which is especially valuable given the shortage of neuro-ophthalmologists. Machine learning algorithms, in particular, have demonstrated significant potential in interpreting imaging data, identifying subtle patterns, and aiding clinicians in making more accurate and timely diagnoses, while also supplementing nonspecialist evaluations of neuro-ophthalmic disease. EVIDENCE ACQUISITION Electronic searches of published literature were conducted using PubMed and Google Scholar. A comprehensive search of the following terms was conducted within the Journal of Neuro-Ophthalmology: AI, artificial intelligence, machine learning, deep learning, natural language processing, computer vision, large language models, and generative AI. RESULTS This review aims to provide a comprehensive overview of the evolving landscape of AI applications in neuro-ophthalmology. It delves into the diverse applications of AI, from the analysis of optical coherence tomography (OCT) and fundus photography to the development of predictive models for disease progression. Additionally, the review explores the integration of generative AI into neuro-ophthalmic education and clinical practice. CONCLUSIONS We review the current state of AI in neuro-ophthalmology and its potentially transformative impact. The inclusion of AI in neuro-ophthalmic practice and research not only holds promise for improving diagnostic accuracy but also opens avenues for novel therapeutic interventions. We emphasize its potential to improve access to scarce subspecialty resources while examining the current challenges associated with integrating AI into clinical practice and research.
Affiliation(s)
- Rachel C Kenney
- Departments of Neurology (RCK, AJ, SH, SG, SNG), Population Health (RCK), and Ophthalmology (SG), New York University Grossman School of Medicine, New York, New York; and Vilcek Institute of Graduate Biomedical Sciences (TR), New York University Grossman School of Medicine, New York, New York
4. Rossettini G, Rodeghiero L, Corradi F, Cook C, Pillastrini P, Turolla A, Castellini G, Chiappinotto S, Gianola S, Palese A. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Med Educ 2024; 24:694. PMID: 38926809. PMCID: PMC11210096. DOI: 10.1186/s12909-024-05630-9.
Abstract
BACKGROUND Artificial intelligence (AI) chatbots are emerging educational tools for students in the healthcare sciences. However, assessing their accuracy is essential prior to adoption in educational settings. This study aimed to assess the accuracy of three AI chatbots (ChatGPT-4, Microsoft Copilot and Google Gemini) in predicting the correct answers in the Italian standardized entrance examination for healthcare science degrees (CINECA test). Secondarily, we assessed the narrative coherence of the AI chatbots' responses (i.e., text output) based on three qualitative metrics: the logical rationale behind the chosen answer, the presence of information internal to the question, and the presence of information external to the question. METHODS An observational cross-sectional design was performed in September 2023. The accuracy of the three chatbots was evaluated on the CINECA test, in which questions are formatted as multiple choice with a single best answer. The outcome was binary (correct or incorrect). A chi-squared test, followed by a post hoc analysis with Bonferroni correction, assessed differences in accuracy among the chatbots. A p-value of < 0.05 was considered statistically significant. A sensitivity analysis was performed, excluding answers that were not applicable (e.g., images). Narrative coherence was analyzed by absolute and relative frequencies of correct answers and errors. RESULTS Overall, of the 820 CINECA multiple-choice questions inputted into all chatbots, 20 questions could not be imported into ChatGPT-4 (n = 808) or Google Gemini (n = 808) due to technical limitations. We found statistically significant differences in the ChatGPT-4 vs Google Gemini and Microsoft Copilot vs Google Gemini comparisons (p-value < 0.001). The narrative coherence analysis revealed "Logical reasoning" as the prevalent pattern among correct answers (n = 622, 81.5%) and "Logical error" as the prevalent pattern among incorrect answers (n = 40, 88.9%). CONCLUSIONS Our main findings reveal that: (A) the AI chatbots performed well; (B) ChatGPT-4 and Microsoft Copilot performed better than Google Gemini; and (C) their narrative coherence is primarily logical. Although the AI chatbots showed promising accuracy in predicting the correct answers in the Italian standardized university entrance examination, we encourage candidates to incorporate this new technology cautiously, as a supplement to their learning rather than a primary resource. TRIAL REGISTRATION Not required.
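The statistical procedure this abstract describes (an omnibus chi-squared test on correct/incorrect counts, then pairwise post hoc comparisons with Bonferroni correction) can be sketched in a few lines. This is a generic illustration, not the authors' code: the counts below are hypothetical placeholders, and the `chi2_2x2` helper and `CRIT` critical value are assumptions introduced here (5.73 is the chi-squared df=1 critical value for a Bonferroni-adjusted alpha of 0.05/3).

```python
from itertools import combinations

# Hypothetical (correct, incorrect) counts per chatbot -- NOT the study's data.
results = {
    "ChatGPT-4": (700, 108),
    "Microsoft Copilot": (690, 130),
    "Google Gemini": (600, 208),
}

def chi2_2x2(a, b):
    """Pearson chi-squared statistic for a 2x2 table of (correct, incorrect) rows."""
    table = [a, b]
    row_totals = [sum(r) for r in table]
    col_totals = [a[0] + b[0], a[1] + b[1]]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Bonferroni correction: alpha = 0.05 / 3 pairwise comparisons,
# giving a chi-squared (df = 1) critical value of roughly 5.73.
CRIT = 5.73
for name_a, name_b in combinations(results, 2):
    stat = chi2_2x2(results[name_a], results[name_b])
    verdict = "significant" if stat > CRIT else "not significant"
    print(f"{name_a} vs {name_b}: chi2 = {stat:.2f} ({verdict})")
```

With these placeholder counts, the two comparisons against Gemini exceed the corrected threshold while ChatGPT-4 vs Copilot does not, mirroring the pattern of results the abstract reports.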
Affiliation(s)
- Giacomo Rossettini
- School of Physiotherapy, University of Verona, Verona, Italy.
- Department of Physiotherapy, Faculty of Sport Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Spain.
- Lia Rodeghiero
- Department of Rehabilitation, Hospital of Merano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Merano-Meran, Italy.
- Chad Cook
- Department of Orthopaedics, Duke University, Durham, NC, USA
- Duke Clinical Research Institute, Duke University, Durham, NC, USA
- Department of Population Health Sciences, Duke University, Durham, NC, USA
- Paolo Pillastrini
- Department of Biomedical and Neuromotor Sciences (DIBINEM), Alma Mater University of Bologna, Bologna, Italy
- Unit of Occupational Medicine, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy
- Andrea Turolla
- Department of Biomedical and Neuromotor Sciences (DIBINEM), Alma Mater University of Bologna, Bologna, Italy
- Unit of Occupational Medicine, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy
- Greta Castellini
- Unit of Clinical Epidemiology, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
- Silvia Gianola
- Unit of Clinical Epidemiology, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy.
- Alvisa Palese
- Department of Medical Sciences, University of Udine, Udine, Italy.
5. Paul S, Govindaraj S, Jk J. ChatGPT Versus National Eligibility cum Entrance Test for Postgraduate (NEET PG). Cureus 2024; 16:e63048. PMID: 39050297. PMCID: PMC11268980. DOI: 10.7759/cureus.63048.
Abstract
Introduction With both suspicion and excitement, artificial intelligence tools are being integrated into nearly every aspect of human existence, including the medical sciences and medical education. The newest large language model (LLM) in the class of autoregressive language models is ChatGPT. While ChatGPT's potential to revolutionize clinical practice and medical education is under investigation, further research is necessary to understand its strengths and limitations in this field comprehensively. Methods Two hundred National Eligibility cum Entrance Test for Postgraduate (NEET-PG) 2023 questions were gathered from various public education websites and individually entered into Microsoft Bing (GPT-4 Version 2.2.1). The Microsoft Bing chatbot is currently the only platform incorporating all of GPT-4's multimodal features, including image recognition. The results were subsequently analyzed. Results Out of 200 questions, ChatGPT-4 answered 129 (64.5%) correctly. The most frequently tested specialties were medicine (15%), obstetrics and gynecology (15%), general surgery (14%), and pathology (10%). Conclusion This study sheds light on how well GPT-4 performs on the NEET-PG entrance test. ChatGPT has potential as an adjunctive instrument within medical education and clinical settings. Its capacity to respond intelligently and accurately in complicated clinical scenarios demonstrates its versatility.
Affiliation(s)
- Sam Paul
- General Surgery, St John's Medical College Hospital, Bengaluru, IND
- Sridar Govindaraj
- Surgical Gastroenterology and Laparoscopy, St John's Medical College Hospital, Bengaluru, IND
- Jerisha Jk
- Pediatrics and Neonatology, Christian Medical College Ludhiana, Ludhiana, IND
6. Tailor PD, D'Souza HS, Li H, Starr MR. Vision of the future: large language models in ophthalmology. Curr Opin Ophthalmol 2024:00055735-990000000-00173. PMID: 38814572. DOI: 10.1097/icu.0000000000001062.
Abstract
PURPOSE OF REVIEW Large language models (LLMs) are rapidly entering the landscape of medicine in areas from patient interaction to clinical decision-making. This review discusses the evolving role of LLMs in ophthalmology, focusing on their current applications and future potential in enhancing ophthalmic care. RECENT FINDINGS LLMs in ophthalmology have demonstrated potential in improving patient communication and aiding preliminary diagnostics because of their ability to process complex language and generate human-like, domain-specific interactions. However, some studies have shown potential for harm, and there have been no prospective real-world studies evaluating the safety and efficacy of LLMs in practice. SUMMARY While current applications are largely theoretical and require rigorous safety testing before implementation, LLMs exhibit promise in augmenting the quality and efficiency of patient care. Challenges such as data privacy and user acceptance must be overcome before LLMs can be fully integrated into clinical practice.
Affiliation(s)
- Haley S D'Souza
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Hanzhou Li
- Department of Radiology, Emory University, Atlanta, Georgia, USA
- Matthew R Starr
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
7. Saygin M, Bekmezci M, Dinçer E. Artificial Intelligence Model ChatGPT-4: Entrepreneur Candidate and Entrepreneurship Example. F1000Res 2024; 13:308. PMID: 38845823. PMCID: PMC11153998. DOI: 10.12688/f1000research.144671.2.
Abstract
Background Although artificial intelligence technologies are still in their infancy, they inspire both hope and anxiety about the future. This study examines ChatGPT-4, one of the best-known artificial intelligence applications and one claimed to have a self-learning feature, in the context of business establishment processes. Methods The assessment questions in the Entrepreneurship Handbook, published open access by the Small and Medium Enterprises Development Organization of Turkey to guide entrepreneurial processes and shape the perception of entrepreneurship in Turkey, were posed to the artificial intelligence model ChatGPT-4 and analysed in three stages. The model's approach to solving the questions, and the answers it provided, could then be compared with the entrepreneurship literature. Results ChatGPT-4, itself a striking example of entrepreneurship, succeeded in answering the questions across the handbook's 16 modules in an original way, analysing them in depth. Conclusion The model also proved quite creative in developing new alternatives to the correct answers specified in the entrepreneurship handbook. The original contribution of this research is that it is among the first studies on artificial intelligence and entrepreneurship in the literature.
8. Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. PMID: 38172581. PMCID: PMC11076576. DOI: 10.1038/s41433-023-02915-z.
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing "AI-plagiarism" or the publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been developed in the past few years. We discuss recent literature evaluating the role of these language models in medicine, with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are emerging rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
9. Momenaei B, Mansour HA, Kuriyan AE, Xu D, Sridhar J, Ting DSW, Yonekawa Y. ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management. Curr Opin Ophthalmol 2024; 35:205-209. PMID: 38334288. DOI: 10.1097/icu.0000000000001036.
Abstract
PURPOSE OF REVIEW This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology, in addition to exploring the limitations and ethical considerations associated with its application. RECENT FINDINGS ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting research productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across various ophthalmic disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. This is beneficial for patients in accessing information and aids physicians in triaging as well as in formulating differential diagnoses. Despite such benefits, ChatGPT has limitations that require acknowledgment, including the potential risk of offering inaccurate or harmful information, dependence on outdated data, the high level of education necessary for data comprehension, and concerns regarding patient privacy and ethical considerations within the research domain. SUMMARY ChatGPT is a promising new tool that could contribute to ophthalmic healthcare education and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role with human expert oversight.
Affiliation(s)
- Bita Momenaei
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Hana A Mansour
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Jayanth Sridhar
- University of California Los Angeles, Los Angeles, California, USA
- Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
10. Biswas S, Davies LN, Sheppard AL, Logan NS, Wolffsohn JS. Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt 2024; 44:641-671. PMID: 38404172. DOI: 10.1111/opo.13284.
Abstract
PURPOSE With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs, and comparisons of the abilities of different LLMs with those of their human counterparts in ophthalmic care, remain under-reported. RECENT FINDINGS Hitherto, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, making clinical diagnoses and passing ophthalmology question-based examinations, among others. LLMs' performance (median accuracy, %) is influenced by factors such as the model iteration, the prompts utilised and the domain. A human expert (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT-4 outperformed the others in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI-based LLMs like ChatGPT are deemed more efficient than their human counterparts, these AIs are constrained by their nonspecific and outdated training, lack of access to current knowledge, generation of plausible-sounding 'fake' responses or hallucinations, inability to process images, lack of critical literature analysis, and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of LLMs and the potential of these AI-based tools. SUMMARY Ophthalmic care professionals should take a conservative approach when using AI, as human judgement remains essential for clinical decision-making and for monitoring the accuracy of information. This review identified the ophthalmic applications and potential usages that need further exploration. With the advancement of LLMs, setting standards for benchmarking and promoting best practices is crucial. Potential clinical deployment requires evaluating these LLMs beyond artificial settings, through clinical trials that determine their usefulness in the real world.
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| |
11
Haghighi T, Gholami S, Sokol JT, Kishnani E, Ahsaniyan A, Rahmanian H, Hedayati F, Leng T, Alam MN. EYE-Llama, an in-domain large language model for ophthalmology. bioRxiv: The Preprint Server for Biology 2024:2024.04.26.591355. [PMID: 38746183 PMCID: PMC11092466 DOI: 10.1101/2024.04.26.591355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]
Abstract
Background Training Large Language Models (LLMs) with in-domain data can significantly enhance their performance, leading to more accurate and reliable question-answering (QA) systems essential for supporting clinical decision-making and educating patients. Methods This study introduces LLMs trained on well-curated, in-domain ophthalmic datasets, together with an open-source ophthalmic language dataset for model training. Our LLMs (EYE-Llama) were first pre-trained on an ophthalmology-specific dataset including paper abstracts, textbooks, EyeWiki, and Wikipedia articles, and subsequently fine-tuned on a diverse range of QA datasets. The models at each stage were compared to baseline Llama 2, ChatDoctor, and ChatGPT (GPT-3.5) on four distinct test sets, evaluated quantitatively (accuracy, F1 score, and BERTScore) and qualitatively by two ophthalmologists. Results On the American Academy of Ophthalmology (AAO) test set, using BERTScore F1 as the metric, our models surpassed both Llama 2 and ChatDoctor and performed on par with ChatGPT, a model with 175 billion parameters (EYE-Llama: 0.57, Llama 2: 0.56, ChatDoctor: 0.56, ChatGPT: 0.57). On the MedMCQA test set, the fine-tuned models achieved higher accuracy than the Llama 2 and ChatDoctor models (EYE-Llama: 0.39, Llama 2: 0.33, ChatDoctor: 0.29), although ChatGPT outperformed EYE-Llama with an accuracy of 0.55. On the PubMedQA set, the fine-tuned model improved in accuracy over the Llama 2, ChatGPT, and ChatDoctor models (EYE-Llama: 0.96, Llama 2: 0.90, ChatGPT: 0.93, ChatDoctor: 0.92). Conclusion The study shows that pre-training and fine-tuning LLMs like EYE-Llama enhances their performance in specific medical domains.
Our EYE-Llama models surpass baseline Llama 2 in all evaluations, highlighting the effectiveness of specialized LLMs in medical QA systems. (Funded by NEI R15EY035804 and a UNC Charlotte Faculty Research Grant (MNA).)
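The QA metrics reported above (accuracy, F1 score, BERTScore) are standard evaluation measures. As an illustration, token-level F1 between a generated answer and a reference answer is conventionally computed as below; this is a minimal sketch of the common SQuAD-style formula, not the authors' actual scoring script:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 overlap between a generated answer and a reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

BERTScore replaces this exact-token overlap with cosine similarity of contextual embeddings, which is why it can credit paraphrased answers that token F1 misses.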
Affiliation(s)
- Tania Haghighi: Department of Electrical Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States; Department of Computer Science, Baha’i Institute for Higher Education, Tehran, Iran
- Sina Gholami: Department of Electrical Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States
- Jared Todd Sokol: Department of Ophthalmology, Stanford University School of Medicine, Stanford, CA, United States
- Enaika Kishnani: Department of Electrical Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States
- Adnan Ahsaniyan: Department of Computer Science, Baha’i Institute for Higher Education, Tehran, Iran
- Holakou Rahmanian: Department of Computer Science, Baha’i Institute for Higher Education, Tehran, Iran
- Fares Hedayati: Department of Computer Science, Baha’i Institute for Higher Education, Tehran, Iran
- Theodore Leng: Department of Ophthalmology, Stanford University School of Medicine, Stanford, CA, United States
- Minhaj Nur Alam: Department of Electrical Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States
12
Zandi R, Fahey JD, Drakopoulos M, Bryan JM, Dong S, Bryar PJ, Bidwell AE, Bowen RC, Lavine JA, Mirza RG. Exploring Diagnostic Precision and Triage Proficiency: A Comparative Study of GPT-4 and Bard in Addressing Common Ophthalmic Complaints. Bioengineering (Basel) 2024; 11:120. [PMID: 38391606 PMCID: PMC10886029 DOI: 10.3390/bioengineering11020120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Revised: 01/20/2024] [Accepted: 01/24/2024] [Indexed: 02/24/2024] Open
Abstract
In the modern era, patients often resort to the internet for answers to their health-related concerns, and clinics face challenges in providing timely responses to patient concerns. This has led to a need to investigate the capabilities of AI chatbots for ophthalmic diagnosis and triage. In this in silico study, 80 simulated patient complaints in ophthalmology with varying urgency levels and clinical descriptors were entered into both ChatGPT (GPT-4) and Bard in a systematic 3-step submission process asking the chatbots to triage, diagnose, and evaluate urgency. Three ophthalmologists graded chatbot responses. Chatbots were significantly better at ophthalmic triage than diagnosis (90.0% appropriate triage vs. 48.8% correct leading diagnosis; p < 0.001), and GPT-4 performed better than Bard for appropriate triage recommendations (96.3% vs. 83.8%; p = 0.008), grader satisfaction for patient use (81.3% vs. 55.0%; p < 0.001), and lower potential harm rates (6.3% vs. 20.0%; p = 0.010). More descriptors improved the accuracy of diagnosis for both GPT-4 and Bard. These results indicate that chatbots may not need to recognize the correct diagnosis to provide appropriate ophthalmic triage, and there is a potential utility of these tools in aiding patients or triage staff; however, they are not a replacement for professional ophthalmic evaluation or advice.
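The triage comparison above (96.3% vs. 83.8% appropriate triage across 80 vignettes per chatbot) can be checked approximately with a standard two-proportion z-test. A sketch, assuming the counts 77/80 and 67/80 implied by the reported percentages; the authors' exact statistical procedure may have differed:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test; returns (z statistic, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# GPT-4: 77/80 appropriate triage vs. Bard: 67/80
z, p = two_proportion_z(77, 80, 67, 80)
```

With these assumed counts the test yields a p-value in the same vicinity as the reported p = 0.008, consistent with the abstract's conclusion that the difference is significant.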
Affiliation(s)
- Roya Zandi: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Joseph D Fahey: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Michael Drakopoulos: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- John M Bryan: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Siyuan Dong: Division of Biostatistics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Paul J Bryar: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Ann E Bidwell: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- R Chris Bowen: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Jeremy A Lavine: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Rukhsana G Mirza: Department of Ophthalmology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
13
Sallam M, Al-Salahat K, Al-Ajlouni E. ChatGPT Performance in Diagnostic Clinical Microbiology Laboratory-Oriented Case Scenarios. Cureus 2023; 15:e50629. [PMID: 38107211 PMCID: PMC10725273 DOI: 10.7759/cureus.50629] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/16/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND Artificial intelligence (AI)-based tools can reshape healthcare practice. This includes ChatGPT, which is considered among the most popular AI-based conversational models. Nevertheless, the performance of different versions of ChatGPT needs further evaluation in different settings to assess its reliability and credibility in various healthcare-related tasks. Therefore, the current study aimed to assess the performance of the freely available ChatGPT-3.5 and the paid ChatGPT-4 in 10 diagnostic clinical microbiology case scenarios. METHODS The study followed the METRICS (Model, Evaluation, Timing/Transparency, Range/Randomization, Individual factors, Count, Specificity of the prompts/language) checklist for standardization of the design and reporting of AI-based studies in healthcare. The models, tested on December 3, 2023, were ChatGPT-3.5 and ChatGPT-4, and the ChatGPT-generated content was evaluated with the CLEAR tool (Completeness, Lack of false information, Evidence support, Appropriateness, and Relevance) on a 5-point Likert scale, with CLEAR scores ranging from 1 to 5. ChatGPT output was evaluated by two raters independently, and inter-rater agreement was assessed with Cohen's κ statistic. Ten diagnostic clinical microbiology laboratory case scenarios were created in English by three microbiologists at diverse levels of expertise, following an internal discussion of common cases observed in Jordan. Topics spanned bacteriology, mycology, parasitology, and virology. Specific prompts were tailored based on the CLEAR tool, and a new session was started for each case scenario. RESULTS Cohen's κ values for the five CLEAR items were 0.351-0.737 for ChatGPT-3.5 and 0.294-0.701 for ChatGPT-4, indicating fair to good agreement and suitability for analysis. Based on average CLEAR scores, ChatGPT-4 outperformed ChatGPT-3.5 (mean: 2.64±1.06 for ChatGPT-3.5 vs.
3.21±1.05 for ChatGPT-4; P=.012, t-test). Performance varied across CLEAR items, with the lowest scores on the "Relevance" item (2.15±0.71 for ChatGPT-3.5 and 2.65±1.16 for ChatGPT-4). A statistically significant difference across CLEAR items was seen only for ChatGPT-4, which performed best in "Completeness", "Lack of false information", and "Evidence support" (P=0.043). Both models performed worst on antimicrobial susceptibility testing (AST) queries and best on bacterial and mycologic identification. CONCLUSIONS Assessment of ChatGPT performance across different diagnostic clinical microbiology case scenarios showed that ChatGPT-4 outperformed ChatGPT-3.5. The performance of ChatGPT demonstrated noticeable variability depending on the specific topic evaluated. A primary shortcoming of both ChatGPT models was the tendency to generate irrelevant content lacking the needed focus. Although the overall ChatGPT performance in these diagnostic microbiology case scenarios might be described as "above average" at best, there remains significant potential for improvement, considering the identified limitations and unsatisfactory results in a few cases.
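Cohen's κ, as used above for inter-rater agreement, is defined as (p_o - p_e)/(1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal rating frequencies. A minimal unweighted sketch with hypothetical ratings (the study's actual data are not reproduced here, and weighted κ is often preferred for ordinal Likert scales):

```python
from collections import Counter

def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters match
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    return (observed - expected) / (1 - expected)
```

Values near 1 indicate near-perfect agreement, values near 0 indicate agreement no better than chance, which is why the study's 0.29-0.74 range is read as "fair to good".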
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, JOR
- Khaled Al-Salahat: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR
- Eyad Al-Ajlouni: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR
14
Antaki F, Milad D, Chia MA, Giguère CÉ, Touma S, El-Khoury J, Keane PA, Duval R. Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering. Br J Ophthalmol 2023:bjo-2023-324438. [PMID: 37923374 DOI: 10.1136/bjo-2023-324438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/01/2023] [Indexed: 11/07/2023]
Abstract
BACKGROUND Evidence on the performance of Generative Pre-trained Transformer 4 (GPT-4), a large language model (LLM), in the ophthalmology question-answering domain is needed. METHODS We tested GPT-4 on two 260-question multiple-choice question sets from the Basic and Clinical Science Course (BCSC) Self-Assessment Program and the OphthoQuestions question banks. We compared the accuracy of GPT-4 models with varying temperatures (creativity setting) and evaluated their responses in a subset of questions. We also compared the best-performing GPT-4 model to GPT-3.5 and to historical human performance. RESULTS GPT-4-0.3 (GPT-4 with a temperature of 0.3) achieved the highest accuracy among GPT-4 models, with 75.8% on the BCSC set and 70.0% on the OphthoQuestions set. The combined accuracy was 72.9%, which represents an 18.3% raw improvement in accuracy compared with GPT-3.5 (p<0.001). Human graders preferred responses from models with a temperature higher than 0 (more creative). Exam section, question difficulty and cognitive level were all predictive of GPT-4-0.3 answer accuracy. GPT-4-0.3's performance was numerically superior to human performance on the BCSC (75.8% vs 73.3%) and OphthoQuestions (70.0% vs 63.0%) sets, but the differences were not statistically significant (p=0.55 and p=0.09, respectively). CONCLUSION GPT-4, an LLM trained on non-ophthalmology-specific data, performs significantly better than its predecessor on simulated ophthalmology board-style exams. Remarkably, its performance tended to be superior to historical human performance, but that difference was not statistically significant in our study.
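The model entropy in the study's title captures how much a model's answer choices vary across repeated runs of the same question: lower entropy means more consistent answering, as expected at lower temperatures. One plausible formulation, not necessarily the authors' exact one, is the Shannon entropy of the per-question answer distribution:

```python
from collections import Counter
from math import log2

def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (bits) of a model's answer-choice distribution
    over repeated runs of the same multiple-choice question."""
    counts = Counter(answers)
    total = len(answers)
    # H = -sum(p * log2(p)) over the observed answer choices
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

A model that answers "A" on every run of a question has entropy 0 bits; one that splits evenly over four options reaches the 2-bit maximum for a four-choice item.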
Affiliation(s)
- Fares Antaki: Moorfields Eye Hospital NHS Foundation Trust, London, UK; Institute of Ophthalmology, UCL, London, UK; The CHUM School of Artificial Intelligence in Healthcare, Montreal, Quebec, Canada; Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada
- Daniel Milad: Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada; Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Mark A Chia: Moorfields Eye Hospital NHS Foundation Trust, London, UK; Institute of Ophthalmology, UCL, London, UK
- Carl-Édouard Giguère
- Samir Touma: Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada; Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Jonathan El-Khoury: Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada; Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Pearse A Keane: Moorfields Eye Hospital NHS Foundation Trust, London, UK; Institute of Ophthalmology, UCL, London, UK; NIHR Moorfields Biomedical Research Centre, London, UK
- Renaud Duval: Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada; Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada