1. Zhui L, Yhap N, Liping L, Zhengjie W, Zhonghao X, Xiaoshu Y, Hong C, Xuexiu L, Wei R. Impact of Large Language Models on Medical Education and Teaching Adaptations. JMIR Med Inform 2024; 12:e55933. [PMID: 39087590] [PMCID: PMC11294775] [DOI: 10.2196/55933]
Abstract
Unlabelled This viewpoint article explores the transformative role of large language models (LLMs) in the field of medical education, highlighting their potential to enhance teaching quality, promote personalized learning paths, strengthen clinical skills training, optimize teaching assessment processes, boost the efficiency of medical research, and support continuing medical education. However, the use of LLMs entails certain challenges, such as questions regarding the accuracy of information, the risk of overreliance on technology, a lack of emotional recognition capabilities, and concerns related to ethics, privacy, and data security. This article emphasizes that to maximize the potential of LLMs and overcome these challenges, educators must exhibit leadership in medical education, adjust their teaching strategies flexibly, cultivate students' critical thinking, and emphasize the importance of practical experience, thus ensuring that students can use LLMs correctly and effectively. By adopting such a comprehensive and balanced approach, educators can train health care professionals who are proficient in the use of advanced technologies and who exhibit solid professional ethics and practical skills, thus laying a strong foundation for these professionals to overcome future challenges in the health care sector.
Affiliation(s)
- Li Zhui: Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Nina Yhap: Department of General Surgery, Queen Elizabeth Hospital, St Michael, Barbados
- Liu Liping: Department of Ultrasound, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Wang Zhengjie: Department of Nuclear Medicine, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xiong Zhonghao: Department of Acupuncture and Moxibustion, Chongqing Traditional Chinese Medicine Hospital, Chongqing, China
- Yuan Xiaoshu: Department of Anesthesia, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Cui Hong: Department of Anesthesia, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Liu Xuexiu: Department of Neonatology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ren Wei: Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
2. Shang L, Li R, Xue M, Guo Q, Hou Y. Evaluating the application of ChatGPT in China's residency training education: An exploratory study. Medical Teacher 2024:1-7. [PMID: 38994848] [DOI: 10.1080/0142159x.2024.2377808]
Abstract
OBJECTIVE The purpose of this study was to assess the utility of information generated by ChatGPT for residency education in China. METHODS We designed a three-step survey to evaluate the performance of ChatGPT in China's residency training education, covering residency final examination questions, patient cases, and resident satisfaction scores. First, 204 questions from the residency final exam were input into ChatGPT's interface to obtain the percentage of correct answers. Next, ChatGPT was asked to generate 20 clinical cases, which were subsequently evaluated by three instructors using a predesigned 5-point Likert scale. The quality of the cases was assessed based on criteria including clarity, relevance, logicality, credibility, and comprehensiveness. Finally, interaction sessions between 31 third-year residents and ChatGPT were conducted. Residents' perceptions of ChatGPT's feedback were assessed using a Likert scale, focusing on aspects such as ease of use, accuracy and completeness of responses, and its effectiveness in enhancing understanding of medical knowledge. RESULTS Our results showed ChatGPT-3.5 correctly answered 45.1% of exam questions. In the virtual patient cases, ChatGPT received mean ratings of 4.57 ± 0.50, 4.68 ± 0.47, 4.77 ± 0.46, 4.60 ± 0.53, and 3.95 ± 0.59 points for clarity, relevance, logicality, credibility, and comprehensiveness from clinical instructors, respectively. Among training residents, ChatGPT scored 4.48 ± 0.70, 4.00 ± 0.82, and 4.61 ± 0.50 points for ease of use, accuracy and completeness, and usefulness, respectively. CONCLUSION Our findings demonstrate ChatGPT's immense potential for personalized Chinese medical education.
Affiliation(s)
- Luxiang Shang: Department of Cardiology, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, China; Medical Science and Technology Innovation Center, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Rui Li: Shandong Provincial Center for Disease Control and Prevention, Jinan, China
- Mingyue Xue: Zane Cohen Centre for Digestive Diseases, Mount Sinai Hospital, Toronto, Canada
- Qilong Guo: Department of Cardiology, The Affiliated Hospital of Qingdao University (Pingdu), Qingdao, China
- Yinglong Hou: Department of Cardiology, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, China
3. Miao Y, Luo Y, Zhao Y, Li J, Liu M, Wang H, Chen Y, Wu Y. Performance of GPT-4 on Chinese Nursing Examination: Potentials for AI-Assisted Nursing Education Using Large Language Models. Nurse Educ 2024:00006223-990000000-00488. [PMID: 38981035] [DOI: 10.1097/nne.0000000000001679]
Abstract
BACKGROUND The performance of GPT-4 in nursing examinations within the Chinese context has not yet been thoroughly evaluated. OBJECTIVE To assess the performance of GPT-4 on multiple-choice and open-ended questions derived from nursing examinations in the Chinese context. METHODS The data sets of the Chinese National Nursing Licensure Examination spanning 2021 to 2023 were used to evaluate the accuracy of GPT-4 in multiple-choice questions. The performance of GPT-4 on open-ended questions was examined using 18 case-based questions. RESULTS For multiple-choice questions, GPT-4 achieved an accuracy of 71.0% (511/720). For open-ended questions, the responses were evaluated for cosine similarity, logical consistency, and information quality, all of which were found to be at a moderate level. CONCLUSION GPT-4 performed well at addressing queries on basic knowledge. However, it has notable limitations in answering open-ended questions. Nursing educators should weigh the benefits and challenges of GPT-4 for integration into nursing education.
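The open-ended responses above were scored in part by cosine similarity against reference answers. As an illustration of that metric only (the study does not report its text representation; the bag-of-words tokenization and the sample sentences below are assumptions for demonstration):

```python
from collections import Counter
import math

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity between two answer texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical reference answer and model response for a case-based question.
reference = "administer oxygen and monitor vital signs closely"
response = "monitor vital signs and administer oxygen"
print(round(cosine_similarity(reference, response), 3))  # high overlap, score near 1
```

Real evaluations typically use sentence embeddings rather than raw token counts, which better credit paraphrases; the scoring pipeline is the same once each answer is a vector.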
Affiliation(s)
- Yiqun Miao: School of Nursing, Capital Medical University, Beijing, China (Drs Miao, Luo, Zhao, Li, Liu, Wang, and Wu); School of Nursing, Johns Hopkins University, Baltimore, USA (Dr Chen)
4. Brondani M, Alves C, Ribeiro C, Braga MM, Garcia RCM, Ardenghi T, Pattanaporn K. Artificial intelligence, ChatGPT, and dental education: Implications for reflective assignments and qualitative research. J Dent Educ 2024. [PMID: 38973069] [DOI: 10.1002/jdd.13663]
Abstract
INTRODUCTION Reflections enable students to gain additional value from a given experience. The use of Chat Generative Pre-training Transformer (ChatGPT, OpenAI Incorporated) has gained momentum, but its impact on dental education is understudied. OBJECTIVES To assess whether university instructors can differentiate reflections generated by ChatGPT from those generated by students, and to assess whether the content of a thematic analysis generated by ChatGPT differs from that generated by qualitative researchers on the same reflections. METHODS Hardcopies of 20 reflections (10 generated by undergraduate dental students and 10 generated by ChatGPT) were distributed to three instructors who had at least 5 years of teaching experience. Instructors were asked to assign either 'ChatGPT' or 'student' to each reflection. Ten of these reflections (five generated by undergraduate dental students and five generated by ChatGPT) were randomly selected and distributed to two qualitative researchers, who were asked to perform a brief thematic analysis with codes and themes. The same ten reflections were also thematically analyzed by ChatGPT. RESULTS The three instructors correctly determined whether the reflections were student or ChatGPT generated 85% of the time. Most disagreements (40%) involved reflections generated by ChatGPT that the instructors believed were generated by students. The thematic analyses did not differ substantially when comparing the codes and themes produced by the two researchers with those generated by ChatGPT. CONCLUSIONS Instructors could differentiate between reflections generated by ChatGPT or by students most of the time. The overall content of a thematic analysis generated by the artificial intelligence program ChatGPT did not differ from that generated by qualitative researchers. Overall, the promising applications of ChatGPT will likely generate a paradigm shift in (dental) health education, research, and practice.
Affiliation(s)
- Mario Brondani: Faculty of Dentistry, Department of Oral Health Sciences, University of British Columbia, Vancouver, Canada
- Claudia Alves: Faculty of Dentistry, Department of Dentistry II, Federal University of Maranhão, Sao Luis-Maranhao, Brazil
- Cecilia Ribeiro: Faculty of Dentistry, Department of Dentistry II, Federal University of Maranhão, Sao Luis-Maranhao, Brazil
- Mariana M Braga: Faculty of Dentistry, Department of Pediatric Dentistry, University of São Paulo, Sao Paulo, Brazil
- Renata C Mathes Garcia: Faculty of Dentistry, Prosthodontic and Periodontic Department, University of Campinas, Sao Paulo, Brazil
- Thiago Ardenghi: Faculty of Dentistry, Department of Pediatric Dentistry and Epidemiology, School of Dentistry, Federal University of Santa Maria, Santa Maria, Brazil
5. Balasanjeevi G, Surapaneni KM. Comparison of ChatGPT version 3.5 & 4 for utility in respiratory medicine education using clinical case scenarios. Respir Med Res 2024; 85:101091. [PMID: 38657295] [DOI: 10.1016/j.resmer.2024.101091]
Abstract
Integration of ChatGPT in respiratory medicine presents a promising avenue for enhancing clinical practice and pedagogical approaches. This study compares the performance of ChatGPT versions 3.5 and 4 in respiratory medicine, emphasizing their potential in clinical decision support and medical education using clinical cases. Results indicate moderate performance, highlighting limitations in handling complex case scenarios. Compared to ChatGPT 3.5, version 4 showed greater promise as a pedagogical tool, providing interactive learning experiences. While ChatGPT may serve as a preliminary clinical decision support tool, caution is advised, stressing the need for ongoing validation. Future research should refine its clinical capabilities for optimal integration into medical education and practice.
Affiliation(s)
- Gayathri Balasanjeevi: Department of Tuberculosis & Respiratory Diseases, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai 600 123, Tamil Nadu, India
- Krishna Mohan Surapaneni: Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai 600 123, Tamil Nadu, India; Department of Medical Education, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai 600 123, Tamil Nadu, India
6. Temsah MH, Alhuzaimi AN, Almansour M, Aljamaan F, Alhasan K, Batarfi MA, Altamimi I, Alharbi A, Alsuhaibani AA, Alwakeel L, Alzahrani AA, Alsulaim KB, Jamal A, Khayat A, Alghamdi MH, Halwani R, Khan MK, Al-Eyadhy A, Nazer R. Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases. J Med Syst 2024; 48:54. [PMID: 38780839] [DOI: 10.1007/s10916-024-02072-0]
Abstract
Artificial Intelligence (AI), particularly AI-Generated Imagery, has the potential to impact medical and patient education. This research explores the use of AI-generated imagery, from text-to-images, in medical education, focusing on congenital heart diseases (CHD). Utilizing ChatGPT's DALL·E 3, the research aims to assess the accuracy and educational value of AI-created images for 20 common CHDs. In this study, we utilized DALL·E 3 to generate a comprehensive set of 110 images, comprising ten images depicting the normal human heart and five images for each of the 20 common CHDs. The generated images were evaluated by a diverse group of 33 healthcare professionals. This cohort included cardiology experts, pediatricians, non-pediatric faculty members, trainees (medical students, interns, pediatric residents), and pediatric nurses. Utilizing a structured framework, these professionals assessed each image for anatomical accuracy, the usefulness of in-picture text, its appeal to medical professionals, and the image's potential applicability in medical presentations. Each item was assessed on a 3-point Likert scale. This produced a total of 3630 image assessments. Most AI-generated cardiac images were rated poorly: 80.8% of images were rated as anatomically incorrect or fabricated, 85.2% as having incorrect text labels, and 78.1% as not usable for medical education. The nurses and medical interns were found to have a more positive perception of the AI-generated cardiac images than the faculty members, pediatricians, and cardiology experts. Complex congenital anomalies were significantly more prone to anatomical fabrication than simple cardiac anomalies. Significant challenges in image generation were identified.
Based on our findings, we recommend a vigilant approach towards the use of AI-generated imagery in medical education at present, underscoring the imperative for thorough validation and the importance of collaboration across disciplines. While we advise against its immediate integration until further validations are conducted, the study advocates for future AI-models to be fine-tuned with accurate medical data, enhancing their reliability and educational utility.
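The assessment totals above follow directly from the design: 33 evaluators each rating 110 images yields 3630 assessments, and the reported percentages can be converted back to approximate raw counts. A quick sanity check (the raw counts below are back-calculated assumptions chosen to match the reported rates, not figures from the study):

```python
# Total assessments: every evaluator rates every image once per criterion.
raters, images = 33, 110
total_assessments = raters * images
print(total_assessments)  # 3630

def pct(count: int, total: int) -> float:
    """Percentage of assessments, rounded to one decimal as in the abstract."""
    return round(100 * count / total, 1)

# Hypothetical raw counts consistent with the reported 80.8% / 85.2% / 78.1%.
print(pct(2933, total_assessments))  # anatomically incorrect or fabricated
print(pct(3093, total_assessments))  # incorrect text labels
print(pct(2835, total_assessments))  # not usable for medical education
```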
Affiliation(s)
- Mohamad-Hani Temsah: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Pediatric Department, King Saud University Medical City, King Saud University, Riyadh, Saudi Arabia; Evidence-Based Health Care & Knowledge Translation Research Chair, Family & Community Medicine Department, College of Medicine, King Saud University, 11362, Riyadh, Saudi Arabia
- Abdullah N Alhuzaimi: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Division of Pediatric Cardiology, Cardiac Science Department, College of Medicine, King Saud University Medical City, 11362, Riyadh, Saudi Arabia
- Mohammed Almansour: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Department of Medical Education, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Fadi Aljamaan: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Critical Care Department, King Saud University Medical City, Riyadh, Saudi Arabia
- Khalid Alhasan: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Pediatric Department, King Saud University Medical City, King Saud University, Riyadh, Saudi Arabia; Kidney & Pancreas Health Center, Organ Transplant Center of Excellence, King Faisal Specialist Hospital & Research Center, Riyadh, Saudi Arabia
- Munirah A Batarfi: Basic Medical Sciences, College of Medicine, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
- Amani Alharbi: Pediatric Department, King Saud University Medical City, King Saud University, Riyadh, Saudi Arabia
- Leena Alwakeel: Pediatric Department, King Saud University Medical City, King Saud University, Riyadh, Saudi Arabia
- Amr Jamal: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Evidence-Based Health Care & Knowledge Translation Research Chair, Family & Community Medicine Department, College of Medicine, King Saud University, 11362, Riyadh, Saudi Arabia; Department of Family and Community Medicine, King Saud University Medical City, 11362, Riyadh, Saudi Arabia
- Afnan Khayat: Health Information Management Department, Prince Sultan Military College of Health Sciences, Al Dhahran, Saudi Arabia
- Mohammed Hussien Alghamdi: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Division of Pediatric Cardiology, Cardiac Science Department, College of Medicine, King Saud University Medical City, 11362, Riyadh, Saudi Arabia; Department of Medical Education, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Rabih Halwani: Department of Clinical Sciences, College of Medicine, University of Sharjah, 27272, Sharjah, United Arab Emirates; Research Institute for Medical and Health Sciences, University of Sharjah, 27272, Sharjah, United Arab Emirates
- Muhammad Khurram Khan: Center of Excellence in Information Assurance, King Saud University, 11653, Riyadh, Saudi Arabia
- Ayman Al-Eyadhy: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Pediatric Department, King Saud University Medical City, King Saud University, Riyadh, Saudi Arabia
- Rakan Nazer: College of Medicine, King Saud University, Riyadh, Saudi Arabia; Department of Cardiac Science, King Fahad Cardiac Center, College of Medicine, King Saud University, Riyadh, Saudi Arabia
7. Moulin TC. Learning with AI Language Models: Guidelines for the Development and Scoring of Medical Questions for Higher Education. J Med Syst 2024; 48:45. [PMID: 38652327] [DOI: 10.1007/s10916-024-02069-9]
Abstract
In medical and biomedical education, traditional teaching methods often struggle to engage students and promote critical thinking. The use of AI language models has the potential to transform teaching and learning practices by offering an innovative, active learning approach that promotes intellectual curiosity and deeper understanding. To effectively integrate AI language models into biomedical education, it is essential for educators to understand the benefits and limitations of these tools and how they can be employed to achieve high-level learning outcomes. This article explores the use of AI language models in biomedical education, focusing on their application in both classroom teaching and learning assignments. Using the SOLO taxonomy as a framework, I discuss strategies for designing questions that challenge students to exercise critical thinking and problem-solving skills, even when assisted by AI models. Additionally, I propose a scoring rubric for evaluating student performance when collaborating with AI language models, ensuring a comprehensive assessment of their learning outcomes. AI language models offer a promising opportunity for enhancing student engagement and promoting active learning in the biomedical field. Understanding the potential use of these technologies allows educators to create learning experiences that fit their students' needs, encouraging intellectual curiosity and a deeper understanding of complex subjects. The application of these tools will be fundamental to providing more effective and engaging learning experiences for students in the future.
Affiliation(s)
- Thiago C Moulin: Department of Experimental Medical Science, Lund University, Lund, Sweden; Department of Surgical Sciences, Uppsala University, Uppsala, Sweden
8. Quttainah M, Mishra V, Madakam S, Lurie Y, Mark S. Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability Framework for Safe and Effective Large Language Models in Medical Education: Narrative Review and Qualitative Study. JMIR AI 2024; 3:e51834. [PMID: 38875562] [PMCID: PMC11077408] [DOI: 10.2196/51834]
Abstract
BACKGROUND The world has witnessed increased adoption of large language models (LLMs) in the last year. Although the products developed using LLMs have the potential to solve accessibility and efficiency problems in health care, there is a lack of available guidelines for developing LLMs for health care, especially for medical education. OBJECTIVE The aim of this study was to identify and prioritize the enablers for developing successful LLMs for medical education. We further evaluated the relationships among these identified enablers. METHODS A narrative review of the extant literature was first performed to identify the key enablers for LLM development. We additionally gathered the opinions of LLM users to determine the relative importance of these enablers using an analytical hierarchy process (AHP), which is a multicriteria decision-making method. Further, total interpretive structural modeling (TISM) was used to analyze the perspectives of product developers and ascertain the relationships and hierarchy among these enablers. Finally, the cross-impact matrix-based multiplication applied to a classification (MICMAC) approach was used to determine the relative driving and dependence powers of these enablers. A nonprobabilistic purposive sampling approach was used for recruitment of focus groups. RESULTS The AHP demonstrated that the most important enabler for LLMs was credibility, with a priority weight of 0.37, followed by accountability (0.27642) and fairness (0.10572). In contrast, usability, with a priority weight of 0.04, showed negligible importance. The results of TISM concurred with the findings of the AHP. The only striking difference between expert perspectives and user preference evaluation was that the product developers indicated that cost has the least importance as a potential enabler. The MICMAC analysis suggested that cost has a strong influence on other enablers. 
The inputs of the focus group were found to be reliable, with a consistency ratio less than 0.1 (0.084). CONCLUSIONS This study is the first to identify, prioritize, and analyze the relationships of enablers of effective LLMs for medical education. Based on the results of this study, we developed a comprehensible prescriptive framework, named CUC-FATE (Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability), for evaluating the enablers of LLMs in medical education. The study findings are useful for health care professionals, health technology experts, medical technology regulators, and policy makers.
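The AHP priority weights and the consistency ratio reported above come from a pairwise comparison matrix. A minimal sketch of that computation, using the common geometric-mean approximation of the principal eigenvector (the 3x3 matrix below is hypothetical, not the study's data, and only covers three of the seven CUC-FATE enablers):

```python
import math

# Saaty's Random Index values for matrix sizes 1..7.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}

def ahp(matrix):
    """Return (priority weights, consistency ratio) for a reciprocal pairwise matrix."""
    n = len(matrix)
    # Geometric mean of each row approximates the principal eigenvector.
    gmeans = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(gmeans)
    weights = [g / total for g in gmeans]
    # Estimate lambda_max from A·w, then CI = (lambda_max - n) / (n - 1), CR = CI / RI.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    return weights, ci / RI[n]

# Hypothetical judgments: credibility vs accountability vs usability.
A = [[1, 2, 5],
     [1 / 2, 1, 4],
     [1 / 5, 1 / 4, 1]]
w, cr = ahp(A)
print([round(x, 3) for x in w], round(cr, 3))
```

A consistency ratio below 0.1, as in the study's 0.084, indicates the pairwise judgments are coherent enough to trust the derived weights.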
Affiliation(s)
- Majdi Quttainah: College of Business Administration, Kuwait University, Kuwait, Kuwait
- Vinaytosh Mishra: College of Healthcare Management and Economics, Gulf Medical University, Ajman, United Arab Emirates
- Somayya Madakam: Information Technology, Birla Institute of Management Technology, Knowledge Park - II, Greater Noida, India
- Yotam Lurie: Department of Management, Ben-Gurion University, Negev, Israel
- Shlomo Mark: Department of Software Engineering, Shamoon College of Engineering, Ashdod, Israel
9. Shorey S, Mattar C, Pereira TLB, Choolani M. A scoping review of ChatGPT's role in healthcare education and research. Nurse Education Today 2024; 135:106121. [PMID: 38340639] [DOI: 10.1016/j.nedt.2024.106121]
Abstract
OBJECTIVES To examine and consolidate literature regarding the advantages and disadvantages of utilizing ChatGPT in healthcare education and research. DESIGN/METHODS We searched seven electronic databases (PubMed/Medline, CINAHL, Embase, PsycINFO, Scopus, ProQuest Dissertations and Theses Global, and Web of Science) from November 2022 until September 2023. This scoping review adhered to Arksey and O'Malley's framework and followed reporting guidelines outlined in the PRISMA-ScR checklist. For analysis, we employed Thomas and Harden's thematic synthesis framework. RESULTS A total of 100 studies were included. An overarching theme, "Forging the Future: Bridging Theory and Integration of ChatGPT" emerged, accompanied by two main themes (1) Enhancing Healthcare Education, Research, and Writing with ChatGPT, (2) Controversies and Concerns about ChatGPT in Healthcare Education Research and Writing, and seven subthemes. CONCLUSIONS Our review underscores the importance of acknowledging legitimate concerns related to the potential misuse of ChatGPT such as 'ChatGPT hallucinations', its limited understanding of specialized healthcare knowledge, its impact on teaching methods and assessments, confidentiality and security risks, and the controversial practice of crediting it as a co-author on scientific papers, among other considerations. Furthermore, our review also recognizes the urgency of establishing timely guidelines and regulations, along with the active engagement of relevant stakeholders, to ensure the responsible and safe implementation of ChatGPT's capabilities. We advocate for the use of cross-verification techniques to enhance the precision and reliability of generated content, the adaptation of higher education curricula to incorporate ChatGPT's potential, educators' need to familiarize themselves with the technology to improve their literacy and teaching approaches, and the development of innovative methods to detect ChatGPT usage. 
Furthermore, data protection measures should be prioritized when employing ChatGPT, and transparent reporting becomes crucial when integrating ChatGPT into academic writing.
Affiliation(s)
- Shefaly Shorey: Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Citra Mattar: Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Travis Lanz-Brian Pereira: Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Mahesh Choolani: Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
10. Gordon M, Daniel M, Ajiboye A, Uraiby H, Xu NY, Bartlett R, Hanson J, Haas M, Spadafore M, Grafton-Clarke C, Gasiea RY, Michie C, Corral J, Kwan B, Dolmans D, Thammasitboon S. A scoping review of artificial intelligence in medical education: BEME Guide No. 84. Medical Teacher 2024; 46:446-470. [PMID: 38423127] [DOI: 10.1080/0142159x.2024.2314198]
Abstract
BACKGROUND Artificial Intelligence (AI) is rapidly transforming healthcare, and there is a critical need for a nuanced understanding of how AI is reshaping teaching, learning, and educational practice in medical education. This review aimed to map the literature regarding AI applications in medical education, core areas of findings, potential candidates for formal systematic review, and gaps for future research. METHODS This rapid scoping review, conducted over 16 weeks, employed Arksey and O'Malley's framework and adhered to STORIES and BEME guidelines. A systematic and comprehensive search across PubMed/MEDLINE, EMBASE, and MedEdPublish was conducted without date or language restrictions. Publications included in the review spanned undergraduate, graduate, and continuing medical education, encompassing both original studies and perspective pieces. Data were charted by multiple author pairs and synthesized into various thematic maps and charts, ensuring a broad and detailed representation of the current landscape. RESULTS The review synthesized 278 publications, with a majority (68%) from North American and European regions. The studies covered diverse AI applications in medical education, such as AI for admissions, teaching, assessment, and clinical reasoning. The review highlighted AI's varied roles, from augmenting traditional educational methods to introducing innovative practices, and underscores the urgent need for ethical guidelines in AI's application in medical education. CONCLUSION The current literature has been charted. The findings underscore the need for ongoing research to explore uncharted areas and address potential risks associated with AI use in medical education. This work serves as a foundational resource for educators, policymakers, and researchers in navigating AI's evolving role in medical education. A framework to support future high-utility reporting, the FACETS framework, is proposed.
Affiliation(s)
- Morris Gordon: School of Medicine and Dentistry, University of Central Lancashire, Preston, UK; Blackpool Hospitals NHS Foundation Trust, Blackpool, UK
- Michelle Daniel: School of Medicine, University of California, San Diego, San Diego, CA, USA
- Aderonke Ajiboye: School of Medicine and Dentistry, University of Central Lancashire, Preston, UK
- Hussein Uraiby: Department of Cellular Pathology, University Hospitals of Leicester NHS Trust, Leicester, UK
- Nicole Y Xu: School of Medicine, University of California, San Diego, San Diego, CA, USA
- Rangana Bartlett: Department of Cognitive Science, University of California, San Diego, CA, USA
- Janice Hanson: Department of Medicine and Office of Education, School of Medicine, Washington University in Saint Louis, Saint Louis, MO, USA
- Mary Haas: Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Maxwell Spadafore: Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Colin Michie: School of Medicine and Dentistry, University of Central Lancashire, Preston, UK
- Janet Corral: Department of Medicine, University of Nevada Reno, School of Medicine, Reno, NV, USA
- Brian Kwan: School of Medicine, University of California, San Diego, San Diego, CA, USA
- Diana Dolmans: School of Health Professions Education, Faculty of Health, Maastricht University, Maastricht, The Netherlands
- Satid Thammasitboon: Center for Research, Innovation and Scholarship in Health Professions Education, Baylor College of Medicine, Houston, TX, USA
11
Yu J, Matava C. ChatGPT for Parents of Children Seeking Emergency Care - so much Hope, so much Caution. J Med Syst 2024; 48:17. [PMID: 38305947 DOI: 10.1007/s10916-024-02036-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 01/17/2024] [Indexed: 02/03/2024]
Affiliation(s)
- Julie Yu
- Department of Anesthesia and Pain Medicine, The Hospital for Sick Children, 555 University Avenue, Toronto, ON, Canada
- Department of Anesthesiology and Pain Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Clyde Matava
- Department of Anesthesia and Pain Medicine, The Hospital for Sick Children, 555 University Avenue, Toronto, ON, Canada
- Department of Anesthesiology and Pain Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
12
Ting DSJ, Tan TF, Ting DSW. ChatGPT in ophthalmology: the dawn of a new era? Eye (Lond) 2024; 38:4-7. [PMID: 37369764 PMCID: PMC10764795 DOI: 10.1038/s41433-023-02619-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2023] [Revised: 05/22/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023] Open
Affiliation(s)
- Darren Shu Jeng Ting
- Birmingham and Midland Eye Centre, Birmingham, UK
- Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
- Academic Ophthalmology, School of Medicine, University of Nottingham, Nottingham, UK
- Ting Fang Tan
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Singapore National Eye Centre, Singapore, Singapore
- Daniel Shu Wei Ting
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore, Singapore
- Singapore National Eye Centre, Singapore, Singapore
- Department of Ophthalmology and Visual Sciences, Duke-National University of Singapore Medical School, Singapore, Singapore
13
Huang CH, Hsiao HJ, Yeh PC, Wu KC, Kao CH. Performance of ChatGPT on Stage 1 of the Taiwanese medical licensing exam. Digit Health 2024; 10:20552076241233144. [PMID: 38371244 PMCID: PMC10874144 DOI: 10.1177/20552076241233144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Accepted: 01/25/2024] [Indexed: 02/20/2024] Open
Abstract
Introduction Since its release by OpenAI in November 2022, numerous studies have subjected ChatGPT to various tests to evaluate its performance on medical exams. The objective of this study was to evaluate ChatGPT's accuracy and logical reasoning across all 10 subjects featured in Stage 1 of the Senior Professional and Technical Examinations for Medical Doctors (SPTEMD) in Taiwan, with questions presented in both Chinese and English. Methods In this study, we tasked ChatGPT-4 with completing SPTEMD Stage 1. The model was presented with multiple-choice questions extracted from three separate tests conducted in February 2022, July 2022, and February 2023. These questions encompass 10 subjects: biochemistry and molecular biology, anatomy, embryology and developmental biology, histology, physiology, microbiology and immunology, parasitology, pharmacology, pathology, and public health. We then analyzed the model's accuracy for each subject. Results In all three tests, ChatGPT achieved scores surpassing the 60% passing threshold, with an overall average score of 87.8%. Its best performance was in biochemistry, where it garnered an average score of 93.8%. Conversely, the generative pre-trained transformer (GPT)-4 assistant performed less well on anatomy, parasitology, and embryology, and its scores were highly variable in embryology and parasitology. Conclusion ChatGPT has the potential not only to facilitate exam preparation but also to improve the accessibility of medical education and support continuing education for medical professionals. This study demonstrates ChatGPT's competence across the subjects of SPTEMD Stage 1 and suggests that it could be a helpful learning and exam-preparation tool for medical students and professionals.
Affiliation(s)
- Han-Jung Hsiao
- Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung
- Pei-Chun Yeh
- Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung
- Kuo-Chen Wu
- Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei
- Chia-Hung Kao
- Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung
- Graduate Institute of Biomedical Sciences, School of Medicine, College of Medicine, China Medical University, Taichung
- Department of Nuclear Medicine and PET Center, China Medical University Hospital, Taichung
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung
14
Zhang JS, Yoon C, Williams DKA, Pinkas A. Exploring the Usage of ChatGPT Among Medical Students in the United States. J Med Educ Curric Dev 2024; 11:23821205241264695. [PMID: 39092290 PMCID: PMC11292693 DOI: 10.1177/23821205241264695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Accepted: 06/08/2024] [Indexed: 08/04/2024]
Abstract
OBJECTIVES Chat Generative Pretrained Transformer (ChatGPT) is a large language model developed by OpenAI that has gained widespread interest. It has been cited for its potential impact on health care and its beneficial role in medical education. However, there is limited investigation into its use among medical students. In this study, we evaluated the frequency of ChatGPT use, motivations for use, and preference for ChatGPT over existing resources among medical students in the United States. METHODS Data were collected from an original survey of 14 questions assessing the frequency and context of ChatGPT use in medical education. The survey was distributed via email lists, group messaging applications, and classroom lectures to medical students across the United States. Responses were collected between August and October 2023. RESULTS One hundred thirty-one participants completed the survey and were included in the analysis. Overall, 48.9% of respondents reported having used ChatGPT in their medical studies. Among ChatGPT users, 43.7% reported using ChatGPT weekly, several times per week, or daily. ChatGPT was most used for writing, revising, editing, and summarizing; 37.5% and 41.3% of respondents, respectively, reported using ChatGPT for more than 25% of the time spent on these tasks. Among respondents who had not used ChatGPT, more than half reported being extremely unlikely or unlikely to use it across all surveyed scenarios. ChatGPT users reported being more likely to use ChatGPT than to consult professors or attendings directly (45.3%), textbooks (42.2%), or lectures (31.7%), and least likely to prefer it over the popular flashcard application Anki (11.1%) and medical education videos (9.5%). CONCLUSIONS ChatGPT is an increasingly popular resource among medical students, with many preferring it over traditional resources such as professors, textbooks, and lectures. Its impact on medical education will only continue to grow as its capabilities improve.
Affiliation(s)
- Christine Yoon
- Albert Einstein College of Medicine, Bronx, New York, USA
- Adi Pinkas
- Albert Einstein College of Medicine, Bronx, New York, USA
15
Madrid-García A, Rosales-Rosado Z, Freites-Nuñez D, Pérez-Sancristóbal I, Pato-Cour E, Plasencia-Rodríguez C, Cabeza-Osorio L, Abasolo-Alcázar L, León-Mateos L, Fernández-Gutiérrez B, Rodríguez-Rodríguez L. Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training. Sci Rep 2023; 13:22129. [PMID: 38092821 PMCID: PMC10719375 DOI: 10.1038/s41598-023-49483-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 12/08/2023] [Indexed: 12/17/2023] Open
Abstract
The emergence of large language models (LLMs) with remarkable performance, such as ChatGPT and GPT-4, has led to unprecedented uptake in the population. One of their most promising and studied applications is education, owing to their ability to understand and generate human-like text, which creates a multitude of opportunities for enhancing educational practices and outcomes. The objective of this study was twofold: to assess the accuracy of ChatGPT/GPT-4 in answering rheumatology questions from the access exam to specialized medical training in Spain (MIR), and to evaluate the medical reasoning these LLMs followed to answer those questions. A dataset, RheumaMIR, of 145 rheumatology-related questions extracted from the exams held between 2010 and 2023 was created for this purpose, used to prompt the LLMs, and publicly released. Six rheumatologists with clinical and teaching experience evaluated the clinical reasoning of the chatbots using a 5-point Likert scale, and their degree of agreement was analyzed. The association between variables that could influence the models' accuracy (i.e., year of the exam question, disease addressed, type of question, and genre) was studied. ChatGPT demonstrated a high level of performance in both accuracy, 66.43%, and clinical reasoning, median (Q1-Q3) 4.5 (2.33-4.67). However, GPT-4 performed better, with an accuracy of 93.71% and a median clinical reasoning score of 4.67 (4.5-4.83). These findings suggest that LLMs may serve as valuable tools in rheumatology education, aiding in exam preparation and supplementing traditional teaching methods.
Affiliation(s)
- Alfredo Madrid-García
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Zulema Rosales-Rosado
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Dalifer Freites-Nuñez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Inés Pérez-Sancristóbal
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Esperanza Pato-Cour
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Luis Cabeza-Osorio
- Medicina Interna, Hospital Universitario del Henares, Avenida de Marie Curie, 0, 28822, Madrid, Spain
- Facultad de Medicina, Universidad Francisco de Vitoria, Carretera Pozuelo, Km 1800, 28223, Madrid, Spain
- Lydia Abasolo-Alcázar
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Leticia León-Mateos
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Benjamín Fernández-Gutiérrez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Facultad de Medicina, Universidad Complutense de Madrid, Madrid, Spain
- Luis Rodríguez-Rodríguez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
16
Agarwal M, Goswami A, Sharma P. Evaluating ChatGPT-3.5 and Claude-2 in Answering and Explaining Conceptual Medical Physiology Multiple-Choice Questions. Cureus 2023; 15:e46222. [PMID: 37908959 PMCID: PMC10613833 DOI: 10.7759/cureus.46222] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/29/2023] [Indexed: 11/02/2023] Open
Abstract
Background Generative artificial intelligence (AI) systems such as ChatGPT-3.5 and Claude-2 may assist in explaining complex medical science topics. A few studies have shown that AI can solve complicated physiology problems that require critical thinking and analysis. However, further studies are required to validate the effectiveness of AI in answering conceptual multiple-choice questions (MCQs) in human physiology. Objective This study aimed to evaluate and compare the proficiency of ChatGPT-3.5 and Claude-2 in answering and explaining a curated set of MCQs in medical physiology. Methods In this cross-sectional study, a set of 55 MCQs drawn from 10 competencies of medical physiology was purposefully constructed to require comprehension, problem-solving, and analytical skills. The MCQs and a structured prompt for response generation were presented to ChatGPT-3.5 and Claude-2. The explanations provided by both AI systems were documented in an Excel spreadsheet. All three authors rated these explanations on a scale of 0 to 3: 0 for an incorrect explanation, 1 for a partially correct one, 2 for a correct explanation with some aspects missing, and 3 for a perfectly correct explanation. Both AI models were evaluated on their ability to choose the correct answer and to provide clear and comprehensive explanations of the MCQs. The Mann-Whitney U test was used to compare AI responses, and the Fleiss multi-rater kappa (κ) was used to determine score agreement among the three raters. The statistical significance level was set at P ≤ 0.05. Results Claude-2 answered 40 MCQs correctly, significantly more than the 26 correct responses from ChatGPT-3.5. The rating distribution for the explanations generated by Claude-2 was also significantly higher than that of ChatGPT-3.5. The κ values were 0.804 and 0.818 for Claude-2 and ChatGPT-3.5, respectively. Conclusion In answering and elucidating conceptual MCQs in medical physiology, Claude-2 surpassed ChatGPT-3.5. However, accessing Claude-2 from India requires the use of a virtual private network, which may raise security concerns.
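The study above has three raters score each explanation on a 0-3 scale and reports inter-rater agreement with the Fleiss multi-rater kappa. As a minimal sketch (not the authors' own code), Fleiss' kappa can be computed from a subjects-by-categories count matrix, where `counts[i][j]` is how many raters assigned item i to category j:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    counts[i][j] = number of raters who put subject i into category j;
    every row must sum to the same number of raters n (here, 3).
    """
    N = len(counts)            # number of rated items
    n = sum(counts[0])         # raters per item
    k = len(counts[0])         # number of categories

    # Per-item observed agreement P_i, then its mean P_bar
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N

    # Chance agreement P_e from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 3 explanations, 3 raters, rating categories 0-3.
# All three raters agree on every item, so kappa is 1.0.
ratings = [[0, 0, 3, 0],
           [0, 0, 0, 3],
           [3, 0, 0, 0]]
print(fleiss_kappa(ratings))  # 1.0 (perfect agreement)
```

The input data here are illustrative, not the study's ratings; with real data the function would be applied to the 55-row count matrix built from the three authors' scores.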
Affiliation(s)
- Mayank Agarwal
- Physiology, All India Institute of Medical Sciences, Raebareli, IND
- Ayan Goswami
- Physiology, Santiniketan Medical College, Bolpur, IND
- Priyanka Sharma
- Physiology, School of Medical Sciences & Research, Sharda University, Greater Noida, IND