1
Aljamaan F, Temsah MH, Altamimi I, Al-Eyadhy A, Jamal A, Alhasan K, Mesallam TA, Farahat M, Malki KH. Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study. JMIR Med Inform 2024; 12:e54345. [PMID: 39083799] [DOI: 10.2196/54345]
Abstract
BACKGROUND Artificial intelligence (AI) chatbots have recently been adopted by health care practitioners in medical practice. However, the output of these AI chatbots has been found to contain varying degrees of hallucination in both content and references. Such hallucinations cast doubt on their output and hinder their implementation. OBJECTIVE The aim of our study was to propose a reference hallucination score (RHS) to evaluate the authenticity of AI chatbots' citations. METHODS Six AI chatbots were challenged with the same 10 medical prompts, each requesting 10 references. The RHS is composed of 6 bibliographic items plus the reference's relevance to the prompt's keywords. The RHS was calculated for each reference, each prompt, and each type of prompt (basic vs complex). The average RHS was calculated for each AI chatbot and compared across the different types of prompts and AI chatbots. RESULTS Bard failed to generate any references. ChatGPT 3.5 and Bing generated the highest RHS (score=11), Elicit and SciSpace generated the lowest (score=1), and Perplexity generated an intermediate RHS (score=7). The highest degree of hallucination was observed for reference relevancy to the prompt keywords (308/500, 61.6%), while the lowest was for reference titles (169/500, 33.8%). ChatGPT and Bing had comparable RHS (β coefficient=-0.069; P=.32), while Perplexity had a significantly lower RHS than ChatGPT (β coefficient=-0.345; P<.001). AI chatbots generally had significantly higher RHS when prompted with scenario or complex-format prompts (β coefficient=0.486; P<.001). CONCLUSIONS The variation in RHS underscores the need for a robust reference evaluation tool to improve the authenticity of AI chatbots, and it highlights the importance of verifying their output and citations. Elicit and SciSpace showed negligible hallucination, while ChatGPT and Bing showed critical hallucination levels. The proposed RHS could contribute to ongoing efforts to enhance the general reliability of AI in medical research.
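The per-reference scoring this abstract describes can be sketched in code. This is a hedged illustration only: the six bibliographic items used here (authors, title, journal, year, volume/pages, DOI) and the binary one-point-per-hallucinated-item scoring are assumptions for demonstration, not the published RHS definition.

```python
# Illustrative sketch of a reference-hallucination-style score. Each reference
# is checked against 7 binary items (6 assumed bibliographic fields plus
# keyword relevance, per the abstract); higher totals mean more hallucination.
FIELDS = ["authors", "title", "journal", "year", "volume_pages", "doi"]

def reference_score(ref_checks: dict) -> int:
    """ref_checks maps an item name to True if that item was hallucinated."""
    items = FIELDS + ["keyword_relevance"]
    return sum(1 for item in items if ref_checks.get(item, False))

def chatbot_rhs(all_refs: list) -> float:
    """Average per-reference score across every reference a chatbot produced."""
    if not all_refs:
        return 0.0
    return sum(reference_score(r) for r in all_refs) / len(all_refs)

# Example: a fabricated title plus irrelevant keywords scores 2 for that
# reference; a fully verifiable reference scores 0.
refs = [
    {"title": True, "keyword_relevance": True},
    {},
]
print(chatbot_rhs(refs))  # → 1.0
```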
Affiliation(s)
- Fadi Aljamaan
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Ayman Al-Eyadhy
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Amr Jamal
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Khalid Alhasan
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Tamer A Mesallam
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
- Mohamed Farahat
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
- Khalid H Malki
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
2
Haltaufderheide J, Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ Digit Med 2024; 7:183. [PMID: 38977771] [PMCID: PMC11231310] [DOI: 10.1038/s41746-024-01157-x]
Abstract
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite potential benefits, researchers have underscored various ethical implications. While individual instances have garnered attention, a systematic and comprehensive overview of the practical applications currently being researched, and of the ethical issues connected to them, has been lacking. Against this background, this work maps the ethical landscape surrounding the current deployment of LLMs in medicine and healthcare through a systematic review. Electronic databases and preprint servers were queried using a comprehensive search strategy that generated 796 records. Studies were screened and extracted following a modified rapid review approach, and methodological quality was assessed using a hybrid approach. A meta-aggregative synthesis was performed for 53 records. Four general fields of application emerged, showcasing a dynamic exploration phase. The advantages of using LLMs are attributed to their capacity for data analysis, information provision, support in decision-making, mitigation of information loss, and enhancement of information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful or convincing but inaccurate content. Calls for ethical guidance and human oversight are recurrent. We suggest that the ethical guidance debate be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications, considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in healthcare. Additionally, critical inquiry is needed to evaluate the necessity and justification of LLMs' current experimental use.
Affiliation(s)
- Joschka Haltaufderheide
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
- Robert Ranisch
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
3
Brondani M, Alves C, Ribeiro C, Braga MM, Garcia RCM, Ardenghi T, Pattanaporn K. Artificial intelligence, ChatGPT, and dental education: Implications for reflective assignments and qualitative research. J Dent Educ 2024. [PMID: 38973069] [DOI: 10.1002/jdd.13663]
Abstract
INTRODUCTION Reflections enable students to gain additional value from a given experience. The use of Chat Generative Pre-training Transformer (ChatGPT, OpenAI Incorporated) has gained momentum, but its impact on dental education is understudied. OBJECTIVES To assess whether university instructors can differentiate reflections generated by ChatGPT from those generated by students, and whether the content of a thematic analysis generated by ChatGPT differs from that generated by qualitative researchers on the same reflections. METHODS Hard copies of 20 reflections (10 generated by undergraduate dental students and 10 generated by ChatGPT) were distributed to three instructors with at least 5 years of teaching experience, who were asked to label each reflection as either 'ChatGPT' or 'student'. Ten of these reflections (five generated by students and five generated by ChatGPT) were randomly selected and distributed to two qualitative researchers, who were asked to perform a brief thematic analysis with codes and themes. The same ten reflections were also thematically analyzed by ChatGPT. RESULTS The three instructors correctly determined whether a reflection was student or ChatGPT generated 85% of the time. Most misclassifications (40%) involved reflections generated by ChatGPT that the instructors judged to be written by students. The thematic analyses did not differ substantially when comparing the codes and themes produced by the two researchers with those generated by ChatGPT. CONCLUSIONS Instructors could differentiate between reflections generated by ChatGPT and by students most of the time. The overall content of a thematic analysis generated by ChatGPT did not differ from that generated by qualitative researchers. Overall, the promising applications of ChatGPT will likely generate a paradigm shift in (dental) health education, research, and practice.
Affiliation(s)
- Mario Brondani
- Faculty of Dentistry, Department of Oral Health Sciences, University of British Columbia, Vancouver, Canada
- Claudia Alves
- Faculty of Dentistry, Department of Dentistry II, Federal University of Maranhão, Sao Luis-Maranhao, Brazil
- Cecilia Ribeiro
- Faculty of Dentistry, Department of Dentistry II, Federal University of Maranhão, Sao Luis-Maranhao, Brazil
- Mariana M Braga
- Faculty of Dentistry, Department of Pediatric Dentistry, University of São Paulo, Sao Paulo, Brazil
- Renata C Mathes Garcia
- Faculty of Dentistry, Prosthodontic and Periodontic Department, University of Campinas, Sao Paulo, Brazil
- Thiago Ardenghi
- Faculty of Dentistry, Department of Pediatric Dentistry and Epidemiology, School of Dentistry, Federal University of Santa Maria, Santa Maria, Brazil
4
Pickens V, Maille J, Pitt WJ, Twombly Ellis J, Salgado S, Tims KM, Edwards CC, Peavy M, Williamson ZV, Musgrove TRT, Doherty E, Khadka A, Martin Ewert A, Sparks TC, Shrestha B, Scribner H, Balthazor N, Johnson RL, Markwardt C, Singh R, Constancio N, Hauri KC, Ternest JJ, Gula SW, Dillard D. Addressing emerging issues in entomology: 2023 student debates. J Insect Sci 2024; 24:11. [PMID: 39095324] [PMCID: PMC11296816] [DOI: 10.1093/jisesa/ieae080]
Abstract
The Entomological Society of America (ESA) Student Debates is an annual student competition at the ESA Annual Meeting, organized by members of the Student Debates Subcommittee (SDS) of the ESA Student Affairs Committee. In conjunction with the 2023 ESA Annual Meeting theme, 'Insects and influence: Advancing entomology's impact on people and policy', the theme of this year's student debates was 'Addressing emerging issues in entomology'. With the aid of the ESA membership, the SDS selected the following debate topics: (1) Should disclosure of artificial intelligence large language models in scientific writing always be required? and (2) Is it more important to prioritize honey bee or native pollinator health for long-term food security within North America? Four student teams from across the nation, each composed of 3-5 student members and a professional advisor, were assigned a topic and stance. Over the course of 5 months, all team members researched and prepared for their assigned topic before debating live with an opposing team at the 2023 ESA Annual Meeting in National Harbor, Maryland. SDS members additionally prepared and presented introductions for each debate topic to give the judges and audience unbiased background for assessing the teams' arguments. The result was an engaging discussion among the teams, judges, and audience members on emerging issues facing entomology and its impact on people and policy, such as scientific communication and food security, that brought attention to the complexities involved in debating topics concerning insects and influence.
Affiliation(s)
- Victoria Pickens
- Department of Entomology, Kansas State University, Manhattan, KS, USA
- Jacqueline Maille
- Department of Entomology, Kansas State University, Manhattan, KS, USA
- William Jacob Pitt
- Tree Fruit Research & Extension Center, Washington State University, Wenatchee, WA, USA
- Sara Salgado
- Department of Entomology and Nematology, University of Florida, Fort Pierce, FL, USA
- Kelly M Tims
- Department of Entomology, University of Georgia, Athens, GA, USA
- Malcolm Peavy
- Department of Entomology, University of Georgia, Athens, GA, USA
- Tyler R T Musgrove
- Department of Entomology, Louisiana State University, Baton Rouge, LA, USA
- Ethan Doherty
- Department of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, USA
- Department of Forestry and Environmental Sciences, Clemson University, Clemson, SC, USA
- Arjun Khadka
- Department of Entomology, Louisiana State University, Baton Rouge, LA, USA
- Tanner C Sparks
- Department of Entomology, Louisiana State University, Baton Rouge, LA, USA
- Bandana Shrestha
- Department of Entomology, Louisiana State University, Baton Rouge, LA, USA
- Hazel Scribner
- Department of Entomology, Kansas State University, Manhattan, KS, USA
- Navi Balthazor
- Department of Entomology, Kansas State University, Manhattan, KS, USA
- Rachel L Johnson
- Department of Entomology, Kansas State University, Manhattan, KS, USA
- Chip Markwardt
- Department of Entomology, Kansas State University, Manhattan, KS, USA
- Rupinder Singh
- Department of Entomology, Kansas State University, Manhattan, KS, USA
- Natalie Constancio
- Department of Entomology, Michigan State University, East Lansing, MI, USA
- Kayleigh C Hauri
- Department of Entomology, Michigan State University, East Lansing, MI, USA
- John J Ternest
- Department of Entomology and Nematology, University of Florida, Gainesville, FL, USA
- Scott W Gula
- Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA
- DeShae Dillard
- Department of Entomology, Michigan State University, East Lansing, MI, USA
5
Taesotikul S, Singhan W, Taesotikul T. ChatGPT vs pharmacy students in the pharmacotherapy time-limit test: A comparative study in Thailand. Curr Pharm Teach Learn 2024; 16:404-410. [PMID: 38641483] [DOI: 10.1016/j.cptl.2024.04.002]
Abstract
OBJECTIVES ChatGPT is an innovative artificial intelligence designed to enhance human activities and serve as a potent tool for information retrieval. This study aimed to evaluate the performance and limitations of ChatGPT on a fourth-year pharmacy student examination. METHODS This cross-sectional study was conducted in February 2023 at the Faculty of Pharmacy, Chiang Mai University, Thailand. The exam contained 16 multiple-choice questions and 2 short-answer questions, focusing on the classification and medical management of shock and electrolyte disorders. RESULTS ChatGPT answered 44% (8 of 18) of the questions correctly, whereas the students achieved a higher accuracy rate of 66% (12 of 18). These findings underscore that while the AI exhibits proficiency, it encounters limitations when confronted with specific queries derived from practical scenarios, in contrast to pharmacy students, who are free to explore and collaborate in ways that mirror real-world practice. CONCLUSIONS Users must exercise caution regarding ChatGPT's reliability, and AI-generated answers should be interpreted judiciously given potential limitations in multi-step analysis and reliance on outdated data. Future advancements in AI models, with refinements and tailored enhancements, offer the potential for improved performance.
Affiliation(s)
- Suthinee Taesotikul
- Department of Pharmaceutical Care, Faculty of Pharmacy, Chiang Mai University, Chiang Mai 50200, Thailand.
- Wanchana Singhan
- Department of Pharmaceutical Care, Faculty of Pharmacy, Chiang Mai University, Chiang Mai 50200, Thailand.
- Theerada Taesotikul
- Department of Biomedicine and Health Informatics, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom 73000, Thailand.
6
Wu J, Ma Y, Wang J, Xiao M. The Application of ChatGPT in Medicine: A Scoping Review and Bibliometric Analysis. J Multidiscip Healthc 2024; 17:1681-1692. [PMID: 38650670] [PMCID: PMC11034560] [DOI: 10.2147/jmdh.s463128]
Abstract
Purpose ChatGPT has a wide range of applications in the medical field. This review therefore aims to define the key issues and provide a comprehensive view of the literature on the application of ChatGPT in medicine. Methods This scoping review follows Arksey and O'Malley's five-stage framework. A comprehensive literature search of publications (30 November 2022 to 16 August 2023) was conducted across six databases, and relevant references were systematically catalogued. Attention was focused on the general characteristics of the articles, their fields of application, and the advantages and disadvantages of using ChatGPT. Descriptive statistics and narrative synthesis methods were used for data analysis. Results Of 3426 studies, 247 met the criteria for inclusion in this review. The majority of articles (31.17%) were from the United States. Editorials (43.32%) ranked first, followed by experimental studies (11.74%). The potential applications of ChatGPT in medicine are varied: the largest number of studies (45.75%) explored clinical practice, including assisting with clinical decision support and providing disease information and medical advice, followed by medical education (27.13%) and scientific research (16.19%). Among individual disciplines, radiology, surgery, and dentistry topped the list. However, ChatGPT in medicine also faces issues of data privacy, inaccuracy, and plagiarism. Conclusion The application of ChatGPT in medicine spans different disciplines and general application scenarios. ChatGPT has a paradoxical nature: it offers significant advantages but at the same time raises serious concerns about its application in healthcare settings. It is therefore imperative to develop theoretical frameworks that not only address its widespread use in healthcare but also facilitate comprehensive assessment, and that contribute to the development of strict and effective guidelines and regulatory measures.
Affiliation(s)
- Jie Wu
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Yingzhuo Ma
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Jun Wang
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Mingzhao Xiao
- Department of Urology, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
7
Currie GM, Hawk KE, Rohren EM. The potential role of artificial intelligence in sustainability of nuclear medicine. Radiography (Lond) 2024:S1078-8174(24)00067-1. [PMID: 38582701] [DOI: 10.1016/j.radi.2024.03.005]
Abstract
BACKGROUND Strategies targeted at the five pillars of sustainability (social, human, economic, ecological, and environmental) can be used to improve the sustainability of clinical or research practice in nuclear medicine. KEY FINDINGS While the core principle of sustainability is ensuring that depletion does not exceed regeneration, this manuscript considers the balance of benefits and detriments of artificial intelligence (AI) technologies across the five pillars of sustainability. Specifically, innovations such as AI, generative AI, and digital twins could enhance sustainability. While AI has the potential to address social asymmetry and inequity, driving the social and human pillars of sustainability, it also risks widening the equity gap. AI augmentation and generative AI present economic and environmental sustainability opportunities, and deep digital twins offer clinical and research benefits across the economic, ecological, and environmental pillars. CONCLUSION AI, digital twins, and generative AI offer potential benefits to sustainability in nuclear medicine. Despite these benefits, caution is advised because the technologies confront a number of challenges that could threaten sustainability. IMPLICATIONS FOR PRACTICE AI presents opportunities for improving the sustainability of nuclear medicine practice, although caution is recommended to avoid unintentionally undermining sustainability across the five pillars.
Affiliation(s)
- G M Currie
- Charles Sturt University, NSW, Australia; Baylor College of Medicine, Texas, USA.
- K E Hawk
- University of California San Diego, California, USA; Stanford University, California, USA
- E M Rohren
- Charles Sturt University, NSW, Australia; Baylor College of Medicine, Texas, USA
8
Lee TL, Ding J, Trivedi HM, Gichoya JW, Moon JT, Li H. Understanding Radiological Journal Views and Policies on Large Language Models in Academic Writing. J Am Coll Radiol 2024; 21:678-682. [PMID: 37558108] [DOI: 10.1016/j.jacr.2023.08.001]
Affiliation(s)
- Tai-Lin Lee
- Department of Radiology and Imaging Science, Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/heyttymonica
- Julia Ding
- Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/_juliading
- Hari M Trivedi
- Co-Director of Healthcare Innovations and Translational Informatics Lab, Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/HariTrivediMD
- Judy W Gichoya
- Co-Director of Healthcare Innovations and Translational Informatics Lab, Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/judywawira
- John T Moon
- Founder and PI of Biodesign & Innovation of Minimally-Invasive Technologies Lab, Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/johntmoon
- Hanzhou Li
- Department of Radiology and Imaging Science, Emory University School of Medicine, Atlanta, Georgia.
9
Shorey S, Mattar C, Pereira TLB, Choolani M. A scoping review of ChatGPT's role in healthcare education and research. Nurse Educ Today 2024; 135:106121. [PMID: 38340639] [DOI: 10.1016/j.nedt.2024.106121]
Abstract
OBJECTIVES To examine and consolidate the literature on the advantages and disadvantages of utilizing ChatGPT in healthcare education and research. DESIGN/METHODS We searched seven electronic databases (PubMed/Medline, CINAHL, Embase, PsycINFO, Scopus, ProQuest Dissertations and Theses Global, and Web of Science) from November 2022 until September 2023. This scoping review adhered to Arksey and O'Malley's framework and followed the reporting guidelines of the PRISMA-ScR checklist. For analysis, we employed Thomas and Harden's thematic synthesis framework. RESULTS A total of 100 studies were included. An overarching theme, 'Forging the Future: Bridging Theory and Integration of ChatGPT', emerged, accompanied by two main themes, (1) Enhancing Healthcare Education, Research, and Writing with ChatGPT and (2) Controversies and Concerns about ChatGPT in Healthcare Education, Research, and Writing, and seven subthemes. CONCLUSIONS Our review underscores the importance of acknowledging legitimate concerns about the potential misuse of ChatGPT, such as 'ChatGPT hallucinations', its limited understanding of specialized healthcare knowledge, its impact on teaching methods and assessments, confidentiality and security risks, and the controversial practice of crediting it as a co-author on scientific papers, among other considerations. It also recognizes the urgency of establishing timely guidelines and regulations, with the active engagement of relevant stakeholders, to ensure the responsible and safe implementation of ChatGPT's capabilities. We advocate the use of cross-verification techniques to enhance the precision and reliability of generated content, the adaptation of higher education curricula to incorporate ChatGPT's potential, educators' familiarization with the technology to improve their literacy and teaching approaches, and the development of innovative methods to detect ChatGPT usage. Furthermore, data protection measures should be prioritized when employing ChatGPT, and transparent reporting becomes crucial when integrating it into academic writing.
Affiliation(s)
- Shefaly Shorey
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
- Citra Mattar
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Travis Lanz-Brian Pereira
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Mahesh Choolani
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
10
Bains SS, Dubin JA, Hameed D, Sax OC, Douglas S, Mont MA, Nace J, Delanois RE. Use and Application of Large Language Models for Patient Questions Following Total Knee Arthroplasty. J Arthroplasty 2024:S0883-5403(24)00233-X. [PMID: 38490569] [DOI: 10.1016/j.arth.2024.03.017]
Abstract
BACKGROUND A consumer-focused health care model allows unprecedented access to information, but it equally warrants consideration of whether the patient health information provided is accurate and appropriate. Nurses play a large role in influencing patient satisfaction following total knee arthroplasty (TKA), but they come at a cost. One natural language artificial intelligence (AI) model, ChatGPT (Chat Generative Pre-trained Transformer), accumulated over 100 million users within months of launching. We therefore aimed to compare: (1) orthopaedic surgeons' evaluations of the appropriateness of answers to the most frequently asked patient questions after TKA; and (2) patients' comfort level with having their postoperative questions answered, using answers provided by arthroplasty-trained nurses and by ChatGPT. METHODS We prospectively created 60 questions based on the most commonly asked patient questions following TKA. Three fellowship-trained surgeons assessed the answers provided by arthroplasty-trained nurses and by ChatGPT-4 to each question, grading each response on clinical judgment as (1) 'appropriate', (2) 'inappropriate' if it contained inappropriate information, or (3) 'unreliable' if it provided inconsistent content. Patients' comfort level and trust in AI were assessed using Research Electronic Data Capture (REDCap) hosted at our local hospital. RESULTS The surgeons graded 44 of 60 (73.3%) responses from the arthroplasty-trained nurses and 44 of 60 (73.3%) from ChatGPT as 'appropriate'. Four nurse responses were graded 'inappropriate' and one 'unreliable'; for ChatGPT, five responses were graded 'inappropriate' and none 'unreliable'. Of the 253 patients surveyed, 136 (53.8%) were more comfortable with the answers provided by ChatGPT, compared to 86 (34.0%) who preferred the answers from arthroplasty-trained nurses, and 233 (92.1%) were uncertain whether they would trust AI to answer their postoperative questions. There were 127 patients (50.2%) who answered that knowing an answer had been provided by ChatGPT would change their comfort level in trusting it. CONCLUSIONS One potential use of ChatGPT is providing appropriate answers to patient questions after TKA. At our institution, cost expenditures could potentially be minimized while maintaining patient satisfaction. Successful implementation, however, depends on the ability to provide information that is credible and in accordance with the objectives of both physicians and patients. LEVEL OF EVIDENCE III.
Affiliation(s)
- Sandeep S Bains
- Rubin Institute for Advanced Orthopedics, LifeBridge Health, Sinai Hospital of Baltimore, Baltimore, Maryland
- Jeremy A Dubin
- Rubin Institute for Advanced Orthopedics, LifeBridge Health, Sinai Hospital of Baltimore, Baltimore, Maryland
- Daniel Hameed
- Rubin Institute for Advanced Orthopedics, LifeBridge Health, Sinai Hospital of Baltimore, Baltimore, Maryland
- Oliver C Sax
- Rubin Institute for Advanced Orthopedics, LifeBridge Health, Sinai Hospital of Baltimore, Baltimore, Maryland
- Scott Douglas
- Rubin Institute for Advanced Orthopedics, LifeBridge Health, Sinai Hospital of Baltimore, Baltimore, Maryland
- Michael A Mont
- Rubin Institute for Advanced Orthopedics, LifeBridge Health, Sinai Hospital of Baltimore, Baltimore, Maryland
- James Nace
- Rubin Institute for Advanced Orthopedics, LifeBridge Health, Sinai Hospital of Baltimore, Baltimore, Maryland
- Ronald E Delanois
- Rubin Institute for Advanced Orthopedics, LifeBridge Health, Sinai Hospital of Baltimore, Baltimore, Maryland
11
Padillah R. Ghostwriting: a reflection of academic dishonesty in the artificial intelligence era. J Public Health (Oxf) 2024; 46:e193-e194. [PMID: 37641487] [DOI: 10.1093/pubmed/fdad169]
Affiliation(s)
- Raup Padillah
- Guidance and Counselling, Universitas PGRI Banyuwangi, Banyuwangi 41482, Indonesia
12
Trinkley KE, An R, Maw AM, Glasgow RE, Brownson RC. Leveraging artificial intelligence to advance implementation science: potential opportunities and cautions. Implement Sci 2024; 19:17. [PMID: 38383393] [PMCID: PMC10880216] [DOI: 10.1186/s13012-024-01346-y]
Abstract
BACKGROUND The field of implementation science was developed to address the significant time delay between establishing an evidence-based practice and its widespread use. Although implementation science has contributed much toward bridging this gap, the evidence-to-practice chasm remains a challenge. There are some key aspects of implementation science in which advances are needed, including speed and assessing causality and mechanisms. The increasing availability of artificial intelligence applications offers opportunities to help address specific issues faced by the field of implementation science and expand its methods. MAIN TEXT This paper discusses the many ways artificial intelligence can address key challenges in applying implementation science methods while also considering potential pitfalls to the use of artificial intelligence. We answer the questions of "why" the field of implementation science should consider artificial intelligence, for "what" (the purpose and methods), and the "so what" (consequences and challenges). We describe specific ways artificial intelligence can address implementation science challenges related to (1) speed, (2) sustainability, (3) equity, (4) generalizability, (5) assessing context and context-outcome relationships, and (6) assessing causality and mechanisms. Examples are provided from global health systems, public health, and precision health that illustrate both potential advantages and hazards of integrating artificial intelligence applications into implementation science methods. We conclude by providing recommendations and resources for implementation researchers and practitioners to leverage artificial intelligence in their work responsibly. CONCLUSIONS Artificial intelligence holds promise to advance implementation science methods ("why") and accelerate its goals of closing the evidence-to-practice gap ("purpose").
However, artificial intelligence's potential unintended consequences must be considered and proactively monitored. Given the technical nature of artificial intelligence applications, as well as their potential impact on the field, transdisciplinary collaboration is needed and may suggest the need for a subset of implementation scientists cross-trained in both fields to ensure artificial intelligence is used optimally and ethically.
Affiliation(s)
- Katy E Trinkley
- Department of Family Medicine, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Adult and Child Center for Outcomes Research and Delivery Science Center, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Department of Biomedical Informatics, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Colorado Center for Personalized Medicine, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Ruopeng An
- Brown School and Division of Computational and Data Sciences at Washington University in St. Louis, St. Louis, MO, USA
- Anna M Maw
- Adult and Child Center for Outcomes Research and Delivery Science Center, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- School of Medicine, Division of Hospital Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Russell E Glasgow
- Department of Family Medicine, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Adult and Child Center for Outcomes Research and Delivery Science Center, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ross C Brownson
- Prevention Research Center, Brown School at Washington University in St. Louis, St. Louis, MO, USA
- Department of Surgery, Division of Public Health Sciences, and Alvin J. Siteman Cancer Center, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
13
Rahimi F, Talebi Bezmin Abadi A. ChatGPT and Corporations of Mega-journals Jeopardize the Norms That Underpin Academic Publishing. ARCHIVES OF IRANIAN MEDICINE 2024; 27:110-112. [PMID: 38619035 PMCID: PMC11017264 DOI: 10.34172/aim.2024.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2023] [Accepted: 12/03/2023] [Indexed: 04/16/2024]
Abstract
Those who participate in and contribute to academic publishing are affected by its evolution. Funding bodies, academic institutions, researchers and peer reviewers, junior scholars, freelance language editors, language-editing services, and journal editors are expected to enforce and uphold the ethical norms on which academic publishing is founded. Deviating from such norms threatens scholarly reputations, academic careers, and institutional standing; reduces publishers' true impact; squanders public funding; and erodes public trust in the academic enterprise. Rigorous review is paramount because peer-review norms guarantee that scientific findings are scrutinized before being publicized. Volunteer peer reviewers and guest journal editors devote an immense amount of unremunerated time to reviewing papers, voluntarily serving the scientific community and benefiting the publishers. Some mega-journals are motivated to mass-produce publications and attract funded projects instead of maintaining scientific rigor. The rapid development of mega-journals may diminish some traditional journals by outcompeting their impact. Artificial intelligence (AI) tools and algorithms such as ChatGPT may be misused to contribute to the mass production of publications that may not have been rigorously revised or peer reviewed. Maintaining the norms that guarantee scientific rigor and academic integrity enables the academic community to overcome new challenges such as mega-journals and AI tools.
Affiliation(s)
- Farid Rahimi
- Research School of Biology, The Australian National University, Ngunnawal and Ngambri Country, Canberra, Australia
- Amin Talebi Bezmin Abadi
- Department of Bacteriology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
14
Kienzle A, Niemann M, Meller S, Gwinner C. ChatGPT May Offer an Adequate Substitute for Informed Consent to Patients Prior to Total Knee Arthroplasty-Yet Caution Is Needed. J Pers Med 2024; 14:69. [PMID: 38248771 PMCID: PMC10821427 DOI: 10.3390/jpm14010069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 12/30/2023] [Accepted: 01/03/2024] [Indexed: 01/23/2024] Open
Abstract
Prior to undergoing total knee arthroplasty (TKA), surgeons are often confronted by patients with numerous questions regarding the procedure and the recovery process. Due to limited staff resources and mounting individual workloads, increased efficiency, e.g., using artificial intelligence (AI), is of growing interest. We comprehensively evaluated ChatGPT's orthopedic responses using the DISCERN instrument. Three independent orthopedic surgeons rated the responses across various criteria. We found consistently high scores, predominantly exceeding a score of three out of five in almost all categories, indicative of the quality and accuracy of the information provided. Notably, the AI demonstrated proficiency in conveying precise and reliable information on orthopedic topics. However, a notable observation pertains to the generation of non-existing references for certain claims. This study underscores the significance of critically evaluating references provided by ChatGPT and emphasizes the necessity of cross-referencing information from established sources. Overall, the findings contribute valuable insights into the performance of ChatGPT in delivering accurate orthopedic information for patients in clinical use while shedding light on areas warranting further refinement. Future iterations of natural language processing systems may be able to replace, in part or in entirety, the preoperative interactions, thereby optimizing the efficiency, accessibility, and standardization of patient communication.
Affiliation(s)
- Arne Kienzle
- Center for Musculoskeletal Surgery, Clinic for Orthopedics, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117 Berlin, Germany; (M.N.); (S.M.); (C.G.)
- Julius Wolff Institute and Center for Musculoskeletal Surgery, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 13353 Berlin, Germany
- Berlin Institute of Health at Charité—Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Clinician Scientist Program, 10117 Berlin, Germany
- Marcel Niemann
- Center for Musculoskeletal Surgery, Clinic for Orthopedics, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117 Berlin, Germany
- Sebastian Meller
- Center for Musculoskeletal Surgery, Clinic for Orthopedics, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117 Berlin, Germany
- Clemens Gwinner
- Center for Musculoskeletal Surgery, Clinic for Orthopedics, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117 Berlin, Germany
15
Khalifa AA, Ibrahim MA. Artificial intelligence (AI) and ChatGPT involvement in scientific and medical writing, a new concern for researchers. A scoping review. ARAB GULF JOURNAL OF SCIENTIFIC RESEARCH 2024. [DOI: 10.1108/agjsr-09-2023-0423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/30/2024]
Abstract
Purpose: The study aims to evaluate PubMed publications on ChatGPT or artificial intelligence (AI) involvement in scientific or medical writing and investigate whether ChatGPT or AI was used to create these articles or listed as authors. Design/methodology/approach: This scoping review was conducted according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines. A PubMed database search was performed for articles published between January 1 and November 29, 2023, using appropriate search terms; both authors performed screening and selection independently. Findings: From the initial search results of 127 articles, 41 were eligible for final analysis. Articles were published in 34 journals. Editorials were the most common article type, with 15 (36.6%) articles. Authors originated from 27 countries, and authors from the USA contributed the most, with 14 (34.1%) articles. The most discussed topic was AI tools and writing capabilities in 19 (46.3%) articles. AI or ChatGPT was involved in manuscript preparation in 31 (75.6%) articles. None of the articles listed AI or ChatGPT as an author, and in 19 (46.3%) articles, the authors acknowledged utilizing AI or ChatGPT. Practical implications: Researchers worldwide are concerned with AI or ChatGPT involvement in scientific research, specifically the writing process. The authors believe that precise and mature regulations will be developed soon by journals, publishers and editors, which will pave the way for the best usage of these tools. Originality/value: This scoping review presented published data on the use of AI or ChatGPT in various aspects of scientific research and writing, besides alluding to the advantages, disadvantages and implications of their usage.
16
Currie GM. GPT-4 in Nuclear Medicine Education: Does It Outperform GPT-3.5? J Nucl Med Technol 2023; 51:314-317. [PMID: 37852647 DOI: 10.2967/jnmt.123.266485] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 09/12/2023] [Indexed: 10/20/2023] Open
Abstract
The emergence of ChatGPT has challenged academic integrity in teaching institutions, including those providing nuclear medicine training. Although previous evaluations of ChatGPT have suggested a limited scope for academic writing, the March 2023 release of generative pretrained transformer (GPT)-4 promises enhanced capabilities that require evaluation. Methods: Examinations (final and calculation) and written assignments for nuclear medicine subjects were tested using GPT-3.5 and GPT-4. GPT-3.5 and GPT-4 responses were evaluated by Turnitin software for artificial intelligence scores, marked against standardized rubrics, and compared with the mean performance of student cohorts. Results: ChatGPT powered by GPT-3.5 performed poorly in calculation examinations (31.4%), compared with GPT-4 (59.1%). GPT-3.5 failed each of 3 written tasks (39.9%), whereas GPT-4 passed each task (56.3%). Conclusion: Although GPT-3.5 poses a minimal risk to academic integrity, its usefulness as a cheating tool can be significantly enhanced by GPT-4 but remains prone to hallucination and fabrication.
17
Currie G, Robbie S, Tually P. ChatGPT and Patient Information in Nuclear Medicine: GPT-3.5 Versus GPT-4. J Nucl Med Technol 2023; 51:307-313. [PMID: 37699647 DOI: 10.2967/jnmt.123.266151] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Revised: 07/13/2023] [Indexed: 09/14/2023] Open
Abstract
The GPT-3.5-powered ChatGPT was released in late November 2022 powered by the generative pretrained transformer (GPT) version 3.5. It has emerged as a readily accessible source of patient information ahead of medical procedures. Although ChatGPT has purported benefits for supporting patient education and information, actual capability has not been evaluated. Moreover, the March 2023 emergence of paid subscription access to GPT-4 promises further enhanced capabilities requiring evaluation. Methods: ChatGPT was used to generate patient information sheets suitable for gaining informed consent for 7 common procedures in nuclear medicine. Responses were generated independently for both GPT-3.5 and GPT-4 architectures. Specific procedures were selected that had a long-standing history of use to avoid any bias associated with the September 2021 learning cutoff that constrains both GPT-3.5 and GPT-4 architectures. Each information sheet was independently evaluated by 3 expert assessors and ranked on the basis of accuracy, appropriateness, currency, and fitness for purpose. Results: ChatGPT powered by GPT-3.5 provided patient information that was appropriate in terms of being patient-facing but lacked accuracy and currency and omitted important information. GPT-3.5 produced patient information deemed not fit for the purpose. GPT-4 provided patient information enhanced across appropriateness, accuracy, and currency, despite some omission of information. GPT-4 produced patient information that was largely fit for the purpose. Conclusion: Although ChatGPT powered by GPT-3.5 is accessible and provides plausible patient information, inaccuracies and omissions present a risk to patients and informed consent. Conversely, GPT-4 is more accurate and fit for the purpose but, at the time of writing, was available only through a paid subscription.
Affiliation(s)
- Geoff Currie
- School of Dentistry and Medical Sciences, Charles Sturt University, Wagga Wagga, New South Wales, Australia
- Stephanie Robbie
- Queensland X-Ray, St. Andrews Hospital, Toowoomba, Queensland, Australia
- Peter Tually
- School of Dentistry and Medical Sciences, Charles Sturt University, Wagga Wagga, New South Wales, Australia
- Telemed Health, Kalgoorlie, Western Australia, Australia
18
Botelho F, Tshimula JM, Poenaru D. Leveraging ChatGPT to Democratize and Decolonize Global Surgery: Large Language Models for Small Healthcare Budgets. World J Surg 2023; 47:2626-2627. [PMID: 37689598 DOI: 10.1007/s00268-023-07167-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/18/2023] [Indexed: 09/11/2023]
Affiliation(s)
- Fabio Botelho
- Departments of Pediatric Surgery and Surgical and Interventional Sciences, McGill University, Montreal, Canada
- Jean Marie Tshimula
- Departments of Pediatric Surgery and Surgical and Interventional Sciences, McGill University, Montreal, Canada
- Dan Poenaru
- Departments of Pediatric Surgery and Surgical and Interventional Sciences, McGill University, Montreal, Canada
19
Arrivé L, Minssen L, Ali A. ChatGPT risk of fabrication in literature searches. Comment on Br J Anaesth 2023; 131: e29-e30. Br J Anaesth 2023; 131:e172-e173. [PMID: 37625909 DOI: 10.1016/j.bja.2023.07.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2023] [Revised: 07/14/2023] [Accepted: 07/30/2023] [Indexed: 08/27/2023] Open
Affiliation(s)
- Lionel Arrivé
- Department of Radiology, Saint-Antoine Hospital, Assistance Publique - Hôpitaux de Paris (APHP) and Sorbonne University, Paris, France.
- Lise Minssen
- Department of Radiology, Saint-Antoine Hospital, Assistance Publique - Hôpitaux de Paris (APHP) and Sorbonne University, Paris, France
- Amal Ali
- Department of Radiology, Saint-Antoine Hospital, Assistance Publique - Hôpitaux de Paris (APHP) and Sorbonne University, Paris, France
20
Tong W, Guan Y, Chen J, Huang X, Zhong Y, Zhang C, Zhang H. Artificial intelligence in global health equity: an evaluation and discussion on the application of ChatGPT, in the Chinese National Medical Licensing Examination. Front Med (Lausanne) 2023; 10:1237432. [PMID: 38020160 PMCID: PMC10656681 DOI: 10.3389/fmed.2023.1237432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 10/09/2023] [Indexed: 12/01/2023] Open
Abstract
Background The demand for healthcare is increasing globally, with notable disparities in access to resources, especially in Asia, Africa, and Latin America. The rapid development of Artificial Intelligence (AI) technologies, such as OpenAI's ChatGPT, has shown promise in revolutionizing healthcare. However, potential challenges, including the need for specialized medical training, privacy concerns, and language bias, require attention. Methods To assess the applicability and limitations of ChatGPT in Chinese and English settings, we designed an experiment evaluating its performance in the 2022 National Medical Licensing Examination (NMLE) in China. For a standardized evaluation, we used the comprehensive written part of the NMLE, translated into English by a bilingual expert. All questions were input into ChatGPT, which provided answers and reasons for choosing them. Responses were evaluated for "information quality" using the Likert scale. Results ChatGPT demonstrated a correct response rate of 81.25% for Chinese and 86.25% for English questions. Logistic regression analysis showed that neither the difficulty nor the subject matter of the questions was a significant factor in AI errors. The Brier Scores, indicating predictive accuracy, were 0.19 for Chinese and 0.14 for English, indicating good predictive performance. The average quality score for English responses was excellent (4.43 points), slightly higher than for Chinese (4.34 points). Conclusion While AI language models like ChatGPT show promise for global healthcare, language bias is a key challenge. Ensuring that such technologies are robustly trained and sensitive to multiple languages and cultures is vital. Further research into AI's role in healthcare, particularly in areas with limited resources, is warranted.
Affiliation(s)
- Wenting Tong
- Department of Pharmacy, Gannan Healthcare Vocational College, Ganzhou, Jiangxi, China
- Yongfu Guan
- Department of Rehabilitation and Elderly Care, Gannan Healthcare Vocational College, Ganzhou, Jiangxi, China
- Jinping Chen
- Department of Rehabilitation and Elderly Care, Gannan Healthcare Vocational College, Ganzhou, Jiangxi, China
- Xixuan Huang
- Department of Mathematics, Xiamen University, Xiamen, Fujian, China
- Yuting Zhong
- Department of Anesthesiology, Gannan Medical University, Jiangxi, China
- Changrong Zhang
- Department of Chinese Medicine, Affiliated Hospital of Qinghai University, Xining, Qinghai, China
- Hui Zhang
- Department of Rehabilitation and Elderly Care, Gannan Healthcare Vocational College, Ganzhou, Jiangxi, China
- Chair of Endocrinology and Medical Sexology (ENDOSEX), Department of Experimental Medicine, University of Rome Tor Vergata, Rome, Italy