1. Cherif H, Moussa C, Missaoui AM, Salouage I, Mokaddem S, Dhahri B. Appraisal of ChatGPT's Aptitude for Medical Education: Comparative Analysis With Third-Year Medical Students in a Pulmonology Examination. JMIR Med Educ 2024;10:e52818. [PMID: 39042876] [DOI: 10.2196/52818]
Abstract
BACKGROUND: The rapid evolution of ChatGPT has generated substantial interest and extensive discussion in both public and academic domains, particularly in the context of medical education.
OBJECTIVE: This study aimed to evaluate ChatGPT's performance on a pulmonology examination through a comparative analysis with third-year medical students.
METHODS: In this cross-sectional study, we compared two groups. The first comprised 244 third-year medical students who had previously taken our institution's 2020 pulmonology examination, which was administered in French. The second involved ChatGPT-3.5 in two separate sets of conversations: without contextualization (V1) and with contextualization (V2). In both V1 and V2, ChatGPT received the same set of questions administered to the students.
RESULTS: V1 demonstrated exceptional proficiency in radiology, microbiology, and thoracic surgery, surpassing the majority of medical students in these domains, but it struggled in pathology, pharmacology, and clinical pneumology. In contrast, V2 consistently delivered more accurate responses across question categories, regardless of specialization. ChatGPT performed worse than the medical students on multiple-choice questions, whereas V2 excelled at structured open-ended questions. Both ChatGPT conversations, particularly V2, outperformed students on questions of low and intermediate difficulty; students, by contrast, showed greater proficiency on highly challenging questions. V1 fell short of passing the examination, whereas V2 passed, outperforming 139 (62.1%) medical students.
CONCLUSIONS: While ChatGPT has access to a comprehensive web-based data set, its performance closely mirrors that of an average medical student. Outcomes are influenced by question format, item complexity, and contextual nuances. The model struggles in medical contexts requiring information synthesis, advanced analytical aptitude, and clinical judgment, as well as in non-English-language assessments and when confronted with data outside mainstream internet sources.
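The V1/V2 contrast above hinges on whether the model is told the exam context before seeing each question. A minimal sketch of that protocol is shown below, assuming the OpenAI Python client; the model name, context wording, and placeholder question are illustrative, not the study's actual materials (the study used the ChatGPT interface, not the API).

```python
# Minimal sketch (not the study's code): submit the same MCQ with (V2) and
# without (V1) a contextualizing system message, via the OpenAI Python client.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CONTEXT = ("You are taking a third-year medical school pulmonology "
           "examination written in French. Choose the single best answer.")

def ask(question: str, contextualize: bool) -> str:
    # V2 prepends the exam context as a system message; V1 sends the bare MCQ.
    messages = [{"role": "system", "content": CONTEXT}] if contextualize else []
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study evaluated ChatGPT-3.5
        messages=messages,
    )
    return response.choices[0].message.content

mcq = "Question: ... A) ... B) ... C) ... D) ..."  # placeholder item
v1_answer = ask(mcq, contextualize=False)
v2_answer = ask(mcq, contextualize=True)
```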
Affiliation(s)
- Hela Cherif: Faculté de Médecine de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Chirine Moussa: Faculté de Médecine de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Issam Salouage: Faculté de Médecine de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Salma Mokaddem: Faculté de Médecine de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Besma Dhahri: Faculté de Médecine de Tunis, Université de Tunis El Manar, Tunis, Tunisia

2. Altamimi I, Alhumimidi A, Alshehri S, Alrumayan A, Al-khlaiwi T, Meo SA, Temsah MH. The scientific knowledge of three large language models in cardiology: multiple-choice questions examination-based performance. Ann Med Surg (Lond) 2024;86:3261-3266. [PMID: 38846858] [PMCID: PMC11152788] [DOI: 10.1097/ms9.0000000000002120]
Abstract
Background: The integration of artificial intelligence (AI) chatbots such as Google's Bard, OpenAI's ChatGPT, and Microsoft's Bing Chatbot into academic and professional domains, including cardiology, has been evolving rapidly. Their application in educational and research frameworks, however, raises questions about their efficacy, particularly in specialized fields like cardiology. This study aims to evaluate the depth and accuracy of these AI chatbots' knowledge of cardiology using a multiple-choice question (MCQ) format.
Methods: This exploratory, cross-sectional study was conducted in November 2023 using a bank of 100 MCQs covering various cardiology topics, created from authoritative textbooks and question banks. These MCQs were used to assess the knowledge level of Google's Bard, Microsoft Bing, and ChatGPT 4.0. Each question was entered manually into the chatbots, ensuring no memory retention bias.
Results: ChatGPT 4.0 demonstrated the highest knowledge score in cardiology, with 87% accuracy, followed by Bing at 60% and Bard at 46%. Performance varied across cardiology subtopics, with ChatGPT consistently outperforming the others. Notably, the study revealed significant differences in the chatbots' proficiency across specific cardiology domains.
Conclusion: This study highlights a spectrum of efficacy among AI chatbots in disseminating cardiology knowledge. ChatGPT 4.0 emerged as a potential auxiliary educational resource in cardiology, surpassing traditional learning methods in some respects. However, the variability in performance among these AI systems underscores the need for cautious evaluation and continuous improvement, especially for chatbots like Bard, to ensure reliable and accurate dissemination of medical knowledge.
Affiliation(s)
- Ibraheem Altamimi: College of Medicine; Evidence-Based Health Care and Knowledge Translation Research Chair, Family and Community Medicine Department, College of Medicine, King Saud University
- Abdullah Alrumayan: College of Medicine, King Saud Bin Abdulaziz University for Health and Sciences, Riyadh, Saudi Arabia
- Mohamad-Hani Temsah: College of Medicine; Evidence-Based Health Care and Knowledge Translation Research Chair, Family and Community Medicine Department, College of Medicine, King Saud University; Pediatric Intensive Care Unit, Pediatric Department, College of Medicine, King Saud University Medical City

3. Shojaee-Mend H, Mohebbati R, Amiri M, Atarodi A. Evaluating the strengths and weaknesses of large language models in answering neurophysiology questions. Sci Rep 2024;14:10785. [PMID: 38734712] [PMCID: PMC11088627] [DOI: 10.1038/s41598-024-60405-y]
Abstract
Large language models (LLMs), like ChatGPT, Google's Bard, and Anthropic's Claude, showcase remarkable natural language processing capabilities. Evaluating their proficiency in specialized domains such as neurophysiology is crucial to understanding their utility in research, education, and clinical applications. This study assessed and compared the effectiveness of LLMs in answering neurophysiology questions in both English and Persian (Farsi), covering a range of topics and cognitive levels. Twenty questions spanning four topics (general, sensory system, motor system, and integrative) and two cognitive levels (lower-order and higher-order) were posed to the LLMs. Physiologists scored the essay-style answers on a scale of 0-5 points. Statistical analysis compared scores across models, languages, topics, and cognitive levels, and qualitative analysis identified reasoning gaps. Overall, the models performed well (mean score = 3.87/5), with no significant difference between languages or cognitive levels. Performance was strongest on the motor system (mean = 4.41) and weakest on integrative topics (mean = 3.35). Detailed qualitative analysis uncovered deficiencies in reasoning, discerning priorities, and knowledge integration. This study offers valuable insights into LLMs' capabilities and limitations in neurophysiology. The models are proficient on general questions but struggle with advanced reasoning and knowledge integration. Targeted training could address gaps in knowledge and causal reasoning. As LLMs evolve, rigorous domain-specific assessments will be crucial for evaluating advances in their performance.
Affiliation(s)
- Hassan Shojaee-Mend: Department of General Courses, Faculty of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran
- Reza Mohebbati: Department of Physiology, Faculty of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran
- Mostafa Amiri: Department of General Courses, Faculty of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran; Department of English Language and General Courses, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Alireza Atarodi: Department of Knowledge and Information Science, Paramedical College and Social Development & Health Promotion Research Center, Gonabad University of Medical Sciences, Gonabad, Iran

4. Hu G, Liu L, Xu D. On the Responsible Use of Chatbots in Bioinformatics. Genomics Proteomics Bioinformatics 2024;22:qzae002. [PMID: 38862428] [PMCID: PMC11104453] [DOI: 10.1093/gpbjnl/qzae002]
Affiliation(s)
- Gangqing Hu: Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
- Li Liu: College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Dong Xu: Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA

5. Hacking S. ChatGPT and Medicine: Together We Embrace the AI Renaissance. JMIR Bioinform Biotechnol 2024;5:e52700. [PMID: 38935938] [PMCID: PMC11135232] [DOI: 10.2196/52700]
Abstract
The generative artificial intelligence (AI) model ChatGPT holds transformative prospects in medicine. The development of such models has signaled the beginning of a new era where complex biological data can be made more accessible and interpretable. ChatGPT is a natural language processing tool that can process, interpret, and summarize vast data sets. It can serve as a digital assistant for physicians and researchers, aiding in integrating medical imaging data with other multiomics data and facilitating the understanding of complex biological systems. The physician's and AI's viewpoints emphasize the value of such AI models in medicine, providing tangible examples of how this could enhance patient care. The editorial also discusses the rise of generative AI, highlighting its substantial impact in democratizing AI applications for modern medicine. While AI may not supersede health care professionals, practitioners incorporating AI into their practices could potentially have a competitive edge.

6. Meo SA, Alotaibi M, Meo MZS, Meo MOS, Hamid M. Medical knowledge of ChatGPT in public health, infectious diseases, COVID-19 pandemic, and vaccines: multiple choice questions examination based performance. Front Public Health 2024;12:1360597. [PMID: 38711764] [PMCID: PMC11073538] [DOI: 10.3389/fpubh.2024.1360597]
Abstract
Background: At the beginning of 2023, the Chatbot Generative Pre-Trained Transformer (ChatGPT) gained remarkable public attention. There is much discussion about ChatGPT and its knowledge of the medical sciences; however, the literature evaluating ChatGPT's knowledge level in public health is lacking. This study therefore investigates the knowledge of ChatGPT in public health, infectious diseases, the COVID-19 pandemic, and its vaccines.
Methods: A multiple-choice question (MCQ) bank was established. The questions' contents were reviewed to confirm that they were appropriate to the topics. The MCQs were case-scenario based, with four sub-stems and a single correct answer. From the MCQ bank, 60 MCQs were selected: 30 on public health and infectious diseases, 17 on the COVID-19 pandemic, and 13 on COVID-19 vaccines. Each MCQ was entered manually, and ChatGPT was tasked with answering them to determine its knowledge level.
Results: Of the 60 MCQs in public health, infectious diseases, the COVID-19 pandemic, and vaccines, ChatGPT attempted all and scored 17/30 (56.66%) on public health and infectious diseases, 15/17 (88.23%) on COVID-19, and 12/13 (92.30%) on COVID-19 vaccines, for an overall score of 44/60 (73.33%). The observed proportions of correct answers in each section were statistically significant (p = 0.001). ChatGPT obtained satisfactory grades in all three domains of the examination.
Conclusion: ChatGPT has satisfactory knowledge of public health, infectious diseases, the COVID-19 pandemic, and its vaccines. In the future, ChatGPT may assist medical educators, academicians, and healthcare professionals in providing a better understanding of these topics.
Affiliation(s)
- Sultan Ayoub Meo: Department of Physiology, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Metib Alotaibi: Department of Medicine, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Mashhood Hamid: Department of Family and Community Medicine, College of Medicine, King Saud University, Riyadh, Saudi Arabia

7. Wang J, Ye Q, Liu L, Guo NL, Hu G. Scientific figures interpreted by ChatGPT: strengths in plot recognition and limits in color perception. NPJ Precis Oncol 2024;8:84. [PMID: 38580746] [PMCID: PMC10997760] [DOI: 10.1038/s41698-024-00576-z]
Abstract
Emerging studies underscore the promising capabilities of large language model-based chatbots in conducting basic bioinformatics data analyses. The recently added ability of ChatGPT to accept image inputs, known as GPT-4V(ision), motivated us to explore its efficacy in deciphering bioinformatics scientific figures. Our evaluation with examples from cancer research, including sequencing data analysis, multimodal network-based drug repositioning, and tumor clonal evolution, revealed that ChatGPT can proficiently explain different plot types and apply biological knowledge to enrich interpretations. However, it struggled to provide accurate interpretations when color perception and quantitative analysis of visual elements were involved. Furthermore, while the chatbot can draft figure legends and summarize findings from figures, stringent proofreading is imperative to ensure the accuracy and reliability of the content.
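For readers who want to reproduce this kind of figure-interpretation query programmatically, a minimal sketch follows, assuming the OpenAI Python client and a vision-capable model; the file name, model name, and prompt are illustrative assumptions rather than the authors' pipeline.

```python
# Minimal sketch (not the authors' pipeline): ask a vision-capable chat model
# to interpret a scientific figure supplied as a local image file.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("figure1.png", "rb") as f:  # hypothetical figure file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Explain this plot and summarize its main finding."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```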
Affiliation(s)
- Jinge Wang: Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
- Qing Ye: West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA
- Li Liu: College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Nancy Lan Guo: West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA; Department of Occupational and Environmental Health Sciences, West Virginia University, Morgantown, WV 26506, USA
- Gangqing Hu: Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA; West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA

8. Braun EM, Juhasz-Böss I, Solomayer EF, Truhn D, Keller C, Heinrich V, Braun BJ. Will I soon be out of my job? Quality and guideline conformity of ChatGPT therapy suggestions to patient inquiries with gynecologic symptoms in a palliative setting. Arch Gynecol Obstet 2024;309:1543-1549. [PMID: 37975899] [DOI: 10.1007/s00404-023-07272-6]
Abstract
PURPOSE: The market for artificial intelligence and its range of applications are currently growing at high speed, and AI is increasingly finding its way into gynecology. While the medical perspective is well represented in the current literature, the patient's perspective still lags behind. The aim of this study was therefore to have experts evaluate ChatGPT's recommendations in response to patient inquiries about the treatment of leading gynecologic symptoms in a palliative setting.
METHODS: Case vignettes were constructed for 10 common concomitant symptoms of gynecologic oncology tumors in a palliative setting, and patient queries regarding treatment of these symptoms were generated as prompts for ChatGPT. Five experts in palliative care and gynecologic oncology evaluated the responses with respect to guideline adherence and applicability and identified advantages and disadvantages.
RESULTS: The overall rating of ChatGPT responses averaged 4.1 (5 = strongly agree; 1 = strongly disagree). The experts rated the guideline conformity of the therapy recommendations at an average of 4.0. ChatGPT sometimes omits relevant therapies and does not provide an individual assessment of the suggested therapies, but it does indicate that a physician consultation is additionally necessary.
CONCLUSIONS: Language models such as ChatGPT can provide valid and largely guideline-compliant therapy recommendations in the freely available version that is in principle accessible to our patients. For a complete therapy recommendation, including an evaluation of the therapies, their individual adjustment, and the filtering of possible wrong recommendations, a medical expert's opinion remains indispensable.
Affiliation(s)
- Eva-Marie Braun: Center for Integrative Oncology, Die Filderklinik, Im Haberschlai 7, 70794 Filderstadt-Bonlanden, Germany
- Ingolf Juhasz-Böss: Department of Gynecology, University Medical Center Freiburg, Hugstetter Straße 55, 79106 Freiburg, Germany
- Erich-Franz Solomayer: Department of Gynecology, Obstetrics and Reproductive Medicine, Saarland University Hospital, Kirrberger Straße, Building 9, 66421 Homburg, Germany
- Daniel Truhn: Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Pauwelsstraße 30, 52074 Aachen, Germany
- Christiane Keller: Center for Palliative Medicine and Pediatric Pain Therapy, Saarland University Hospital, Kirrberger Straße, Building 69, 66421 Homburg, Germany
- Vanessa Heinrich: Department of Radiation Oncology, University Hospital Tübingen, Crona Kliniken, Hoppe-Seyler-Str. 3, 72076 Tübingen, Germany
- Benedikt Johannes Braun: Department of Trauma and Reconstructive Surgery, Eberhard Karls University Tübingen, BG Unfallklinik Tübingen, Schnarrenbergstrasse 95, 72076 Tübingen, Germany

9. Emmert-Streib F. Can ChatGPT understand genetics? Eur J Hum Genet 2024;32:371-372. [PMID: 37407734] [PMCID: PMC10999414] [DOI: 10.1038/s41431-023-01419-4]
Affiliation(s)
- Frank Emmert-Streib: Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland

10. McNeill A. Artificial intelligence - the next generation of sequencing? Eur J Hum Genet 2024;32:367-368. [PMID: 38584194] [PMCID: PMC10999430] [DOI: 10.1038/s41431-024-01595-x]
Affiliation(s)
- Alisdair McNeill: Division of Neuroscience and Neuroscience Institute, The University of Sheffield, Sheffield, UK; Sheffield Clinical Genetics Service, Sheffield Children's Hospital NHS Foundation Trust, Sheffield, UK

11. Wang L, Ge X, Liu L, Hu G. Code Interpreter for Bioinformatics: Are We There Yet? Ann Biomed Eng 2024;52:754-756. [PMID: 37482573] [DOI: 10.1007/s10439-023-03324-9]
Abstract
The Code Interpreter feature in ChatGPT has the potential to democratize data analysis for non-specialists. As bioinformaticians, we are impressed by its performance in data manipulation and visualization. However, bioinformatics tasks often require executing third-party packages, accessing annotation knowledgebases, and handling large datasets. Code Interpreter's exclusive support for Python, lack of an installation option for additional packages, inability to utilize external resources, and limited storage capacity could pose obstacles to its wide adoption in bioinformatics. To address these limitations, we advocate locally deployable, API-based systems for chatbot-aided bioinformatics applications.
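As one illustration of the locally deployable, API-based pattern advocated above, the sketch below assumes an OpenAI-compatible server running on localhost (for example, one exposing a /v1 endpoint); the endpoint URL, model name, and helper function are hypothetical.

```python
# Illustrative sketch, not the authors' system: a chatbot-aided bioinformatics
# helper that queries a locally deployed, OpenAI-compatible API server, so
# data and third-party tools stay on the local machine.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local server
    api_key="not-needed-locally",
)

def summarize_vcf(records: list[str]) -> str:
    """Ask the local model to summarize a handful of VCF records."""
    prompt = "Summarize these VCF records for a biologist:\n" + "\n".join(records)
    response = client.chat.completions.create(
        model="local-llm",  # whichever model the local server exposes
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage with two fabricated, format-valid VCF data lines:
print(summarize_vcf([
    "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=100",
    "chr2\t67890\t.\tT\tC\t99\tPASS\tDP=85",
]))
```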
Affiliation(s)
- Lei Wang: Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
- Xijin Ge: Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57007, USA
- Li Liu: College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Gangqing Hu: Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA

12. Duong D, Solomon BD. Response to correspondence regarding "Analysis of large-language model versus human performance for genetics questions". Eur J Hum Genet 2024;32:379-380. [PMID: 37582904] [PMCID: PMC10999417] [DOI: 10.1038/s41431-023-01444-3]
Affiliation(s)
- Dat Duong: Medical Genomics Unit, Medical Genetics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Benjamin D Solomon: Medical Genomics Unit, Medical Genetics Branch, National Human Genome Research Institute, Bethesda, MD, USA

13. Emmert-Streib F. Importance of critical thinking to understand ChatGPT. Eur J Hum Genet 2024;32:377-378. [PMID: 37582903] [PMCID: PMC10999413] [DOI: 10.1038/s41431-023-01443-4]
Affiliation(s)
- Frank Emmert-Streib: Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland

14. Bottomly D, McWeeney S. Just how transformative will AI/ML be for immuno-oncology? J Immunother Cancer 2024;12:e007841. [PMID: 38531545] [DOI: 10.1136/jitc-2023-007841]
Abstract
Immuno-oncology involves the study of approaches that harness the patient's immune system to fight malignancies. Immuno-oncology, like every other biomedical and clinical research field as well as clinical operations, is in the midst of technological revolutions that vastly increase the amount of available data. Recent advances in artificial intelligence and machine learning (AI/ML) have received much attention for their potential to harness available data to improve insights and outcomes in many areas, including immuno-oncology. In this review, we discuss important aspects to consider when evaluating the potential impact of AI/ML applications in the clinic. We highlight four clinical/biomedical challenges relevant to immuno-oncology and how the latest advancements in AI/ML may address them: (1) efficiency of clinical workflows; (2) curation of high-quality image data; (3) finding, extracting, and synthesizing text knowledge; and (4) small cohort sizes in immunotherapeutic evaluation. Finally, we outline how advancements in reinforcement and federated learning, as well as the development of best practices for ethical and unbiased data generation, are likely to drive future innovations.
Affiliation(s)
- Daniel Bottomly: Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, USA
- Shannon McWeeney: Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, USA

15. Stribling D, Xia Y, Amer MK, Graim KS, Mulligan CJ, Renne R. The model student: GPT-4 performance on graduate biomedical science exams. Sci Rep 2024;14:5670. [PMID: 38453979] [PMCID: PMC10920673] [DOI: 10.1038/s41598-024-55568-7]
Abstract
The GPT-4 large language model (LLM) and ChatGPT chatbot have emerged as accessible and capable tools for generating English-language text in a variety of formats. GPT-4 has previously performed well when applied to questions from multiple standardized examinations. However, further evaluation of trustworthiness and accuracy of GPT-4 responses across various knowledge domains is essential before its use as a reference resource. Here, we assess GPT-4 performance on nine graduate-level examinations in the biomedical sciences (seven blinded), finding that GPT-4 scores exceed the student average in seven of nine cases and exceed all student scores for four exams. GPT-4 performed very well on fill-in-the-blank, short-answer, and essay questions, and correctly answered several questions on figures sourced from published manuscripts. Conversely, GPT-4 performed poorly on questions with figures containing simulated data and those requiring a hand-drawn answer. Two GPT-4 answer-sets were flagged as plagiarism based on answer similarity and some model responses included detailed hallucinations. In addition to assessing GPT-4 performance, we discuss patterns and limitations in GPT-4 capabilities with the goal of informing design of future academic examinations in the chatbot era.
Affiliation(s)
- Daniel Stribling: Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, USA; UF Genetics Institute, University of Florida, Gainesville, FL 32610, USA; UF Health Cancer Center, University of Florida, Gainesville, FL 32610, USA
- Yuxing Xia: Department of Neuroscience, Center for Translational Research in Neurodegenerative Disease, College of Medicine, University of Florida, Gainesville, FL 32610, USA; Department of Neurology, UCLA, Los Angeles, CA 90095, USA
- Maha K Amer: Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, USA
- Kiley S Graim: Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL 32610, USA
- Connie J Mulligan: UF Genetics Institute, University of Florida, Gainesville, FL 32610, USA; Department of Anthropology, University of Florida, Gainesville, FL 32610, USA
- Rolf Renne: Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, USA; UF Genetics Institute, University of Florida, Gainesville, FL 32610, USA; UF Health Cancer Center, University of Florida, Gainesville, FL 32610, USA

16. Sisk BA, Antes AL, DuBois JM. An Overarching Framework for the Ethics of Artificial Intelligence in Pediatrics. JAMA Pediatr 2024;178:213-214. [PMID: 38165711] [DOI: 10.1001/jamapediatrics.2023.5761]
Abstract
This Viewpoint discusses the use of artificial intelligence in pediatrics.
Affiliation(s)
- Bryan A Sisk: Bioethics Research Center, Department of Medicine, Washington University School of Medicine, St Louis, Missouri; Division of Hematology/Oncology, Department of Pediatrics, Washington University School of Medicine, St Louis, Missouri
- Alison L Antes: Bioethics Research Center, Department of Medicine, Washington University School of Medicine, St Louis, Missouri
- James M DuBois: Bioethics Research Center, Department of Medicine, Washington University School of Medicine, St Louis, Missouri

17. Wei Q, Yao Z, Cui Y, Wei B, Jin Z, Xu X. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform 2024;151:104620. [PMID: 38462064] [DOI: 10.1016/j.jbi.2024.104620]
Abstract
OBJECTIVE: Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and to provide direction for future research.
METHODS: An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data were extracted on general study characteristics, question sources, conversation processes, assessment metrics, and the performance of ChatGPT. An evaluation framework for LLMs in medical inquiries was proposed by integrating insights from the selected literature. This study is registered with PROSPERO, CRD42023456327.
RESULTS: A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall pooled accuracy of 56% (95% CI: 51%-60%, I2 = 87%) in addressing medical queries. However, the studies varied in question resources, question-asking processes, and evaluation metrics. As per our proposed evaluation framework, many studies failed to report methodological details such as the date of inquiry, the version of ChatGPT, and inter-rater consistency.
CONCLUSION: This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of study designs and insufficient reporting may affect the reliability of the results. Our proposed evaluation framework provides insights for future study design and transparent reporting of LLMs in responding to medical questions.
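For context on how a pooled accuracy such as 56% (95% CI 51%-60%, I2 = 87%) is typically computed, the standard random-effects (DerSimonian-Laird) formulation is sketched below; the review does not state which estimator it used, so this is a conventional assumption rather than the authors' exact method.

```latex
% Random-effects pooling of k per-study accuracies \hat{p}_i with
% within-study variances v_i (DerSimonian-Laird; a conventional sketch).
w_i = \frac{1}{v_i}, \qquad
\hat{p}_{\mathrm{FE}} = \frac{\sum_i w_i \hat{p}_i}{\sum_i w_i}, \qquad
Q = \sum_{i=1}^{k} w_i \left(\hat{p}_i - \hat{p}_{\mathrm{FE}}\right)^2
\qquad
\hat{\tau}^2 = \max\!\left(0,\;
  \frac{Q - (k-1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}\right), \qquad
\hat{p}_{\mathrm{RE}} = \frac{\sum_i \hat{p}_i / (v_i + \hat{\tau}^2)}
                             {\sum_i 1 / (v_i + \hat{\tau}^2)}, \qquad
I^2 = \max\!\left(0,\; \frac{Q - (k-1)}{Q}\right)
```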
Affiliation(s)
- Qiuhong Wei: Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China; Children Nutrition Research Center, Children's Hospital of Chongqing Medical University, Chongqing, China; National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Child Neurodevelopment and Cognitive Disorders, Chongqing, China
- Zhengxiong Yao: Department of Neurology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ying Cui: Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Bo Wei: Department of Global Statistics and Data Science, BeiGene USA Inc., San Mateo, CA, USA
- Zhezhen Jin: Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Ximing Xu: Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China

18. Meyer A, Riese J, Streichert T. Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study. JMIR Med Educ 2024;10:e50965. [PMID: 38329802] [PMCID: PMC10884900] [DOI: 10.2196/50965]
Abstract
BACKGROUND: The potential of artificial intelligence (AI)-based large language models, such as ChatGPT, has gained significant attention in the medical field. This enthusiasm is driven not only by recent breakthroughs and improved accessibility but also by the prospect of democratizing medical knowledge and promoting equitable health care. However, ChatGPT's performance is substantially influenced by the input language, and given the growing public trust in this AI tool compared with traditional sources of information, investigating its medical accuracy across different languages is of particular importance.
OBJECTIVE: This study aimed to compare the performance of GPT-3.5 and GPT-4 with that of medical students on the written German medical licensing examination.
METHODS: To assess GPT-3.5's and GPT-4's medical proficiency, we used 937 original multiple-choice questions from 3 written German medical licensing examinations from October 2021, April 2022, and October 2022.
RESULTS: GPT-4 achieved an average score of 85% and ranked in the 92.8th, 99.5th, and 92.6th percentiles among medical students who took the same examinations in October 2021, April 2022, and October 2022, respectively. This represents a substantial improvement of 27% compared with GPT-3.5, which passed only 1 of the 3 examinations. While GPT-3.5 performed well on psychiatry questions, GPT-4 exhibited strengths in internal medicine and surgery but weakness in academic research.
CONCLUSIONS: The results highlight ChatGPT's remarkable improvement from moderate (GPT-3.5) to high competency (GPT-4) in answering medical licensing examination questions in German. While its predecessor GPT-3.5 was imprecise and inconsistent, GPT-4 demonstrates considerable potential to improve medical education and patient care, provided that medically trained users critically evaluate its results. As the replacement of search engines by AI tools seems possible in the future, further studies with nonprofessional questions are needed to assess the safety and accuracy of ChatGPT for the general population.
Affiliation(s)
- Annika Meyer: Institute for Clinical Chemistry, University Hospital Cologne, Cologne, Germany
- Janik Riese: Department of General Surgery, Visceral, Thoracic and Vascular Surgery, University Hospital Greifswald, Greifswald, Germany
- Thomas Streichert: Institute for Clinical Chemistry, University Hospital Cologne, Cologne, Germany

19. Tangadulrat P, Sono S, Tangtrakulwanich B. Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students' and Physicians' Perceptions. JMIR Med Educ 2023;9:e50658. [PMID: 38133908] [PMCID: PMC10770783] [DOI: 10.2196/50658]
Abstract
BACKGROUND: ChatGPT is a well-known large language model-based chatbot that could be used in many aspects of the medical field. However, some physicians are still unfamiliar with ChatGPT and are concerned about its benefits and risks.
OBJECTIVE: We aimed to evaluate the perceptions of physicians and medical students toward using ChatGPT in the medical field.
METHODS: A web-based questionnaire was sent to medical students, interns, residents, and attending staff, with questions regarding their perceptions of using ChatGPT in clinical practice and medical education. Participants were also asked to rate their perception of a ChatGPT-generated response about knee osteoarthritis.
RESULTS: Participants included 124 medical students, 46 interns, 37 residents, and 32 attending staff. After reading ChatGPT's response, 132 of the 239 (55.2%) participants rated the use of ChatGPT for clinical practice positively. The proportion of positive responses was significantly lower among graduated physicians (48/115, 42%) than among medical students (84/124, 68%; P<.001). Participants listed the lack of patient-specific treatment plans, the lack of updated evidence, and the language barrier as ChatGPT's pitfalls. Regarding the use of ChatGPT for medical education, the proportion of positive responses was also significantly lower among graduated physicians (71/115, 62%) than among medical students (103/124, 83.1%; P<.001). Participants were concerned that ChatGPT's responses were too superficial, might lack scientific evidence, and might need expert verification.
CONCLUSIONS: Medical students generally had a positive perception of using ChatGPT for guiding treatment and medical education, whereas graduated doctors were more cautious in this regard. Nonetheless, both groups positively perceived using ChatGPT to create patient educational materials.
Affiliation(s)
- Pasin Tangadulrat: Department of Orthopedics, Faculty of Medicine, Prince of Songkla University, Hatyai, Thailand
- Supinya Sono: Division of Family and Preventive Medicine, Faculty of Medicine, Prince of Songkla University, Hatyai, Thailand

20. Choi W. Assessment of the capacity of ChatGPT as a self-learning tool in medical pharmacology: a study using MCQs. BMC Med Educ 2023;23:864. [PMID: 37957666] [PMCID: PMC10644619] [DOI: 10.1186/s12909-023-04832-x]
Abstract
BACKGROUND: ChatGPT is a large language model developed by OpenAI that exhibits a remarkable ability to simulate human speech. This investigation evaluates the potential of ChatGPT as a standalone self-learning tool, with specific attention to its efficacy in answering multiple-choice questions (MCQs) and providing credible rationales for its responses.
METHODS: The study used 78 test items from the Korean Comprehensive Basic Medical Sciences Examination (K-CBMSE) for the years 2019 to 2021. The 78 items were translated from Korean to English and, with four lead-in prompts per item, yielded a total of 312 MCQs. The MCQs were submitted to ChatGPT, and the responses were analyzed for correctness, consistency, and relevance.
RESULTS: ChatGPT responded with an overall accuracy of 76.0%. Compared with its performance on recall and interpretation questions, the model performed poorly on problem-solving questions. ChatGPT offered correct rationales for 77.8% (182/234) of its responses, with errors arising primarily from faulty information and flawed reasoning. In terms of references, ChatGPT provided incorrect citations for 69.7% (191/274) of its responses. While the veracity of the reference paragraphs could not be ascertained, 77.0% (47/61) were deemed pertinent and accurate with respect to the answer key.
CONCLUSION: The current version of ChatGPT has limitations in accurately answering MCQs and in generating correct and relevant rationales, particularly when it comes to referencing. To avoid possible threats such as the spread of inaccuracies and the erosion of critical thinking skills, ChatGPT should be used with supervision.
Affiliation(s)
- Woong Choi: Department of Pharmacology, College of Medicine, Chungbuk National University, Cheongju, Chungbuk 28644, Korea

21. Wang J, Ye Q, Liu L, Lan Guo N, Hu G. Bioinformatics Illustrations Decoded by ChatGPT: The Good, The Bad, and The Ugly. bioRxiv 2023:2023.10.15.562423. [PMID: 37904927] [PMCID: PMC10614796] [DOI: 10.1101/2023.10.15.562423]
Abstract
Emerging studies underscore the promising capabilities of large language model-based chatbots in conducting fundamental bioinformatics data analyses. The recently added ability of ChatGPT to accept image inputs motivated us to explore its efficacy in deciphering bioinformatics illustrations. Our evaluation with examples from cancer research, including sequencing data analysis, multimodal network-based drug repositioning, and tumor clonal evolution, revealed that ChatGPT can proficiently explain different plot types and apply biological knowledge to enrich interpretations. However, it struggled to provide accurate interpretations when quantitative analysis of visual elements was involved. Furthermore, while the chatbot can draft figure legends and summarize findings from figures, stringent proofreading is imperative to ensure the accuracy and reliability of the content.
Affiliation(s)
- Jinge Wang: Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
- Qing Ye: West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA
- Li Liu: College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Nancy Lan Guo: West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA; Department of Occupational and Environmental Health Sciences, West Virginia University, Morgantown, WV 26506, USA
- Gangqing Hu: Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA; West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA

22. Meo SA, Al-Khlaiwi T, AbuKhalaf AA, Meo AS, Klonoff DC. The Scientific Knowledge of Bard and ChatGPT in Endocrinology, Diabetes, and Diabetes Technology: Multiple-Choice Questions Examination-Based Performance. J Diabetes Sci Technol 2023:19322968231203987. [PMID: 37798960] [DOI: 10.1177/19322968231203987]
Abstract
BACKGROUND: The present study aimed to investigate the knowledge level of Bard and ChatGPT in the areas of endocrinology, diabetes, and diabetes technology through a multiple-choice question (MCQ) examination format.
METHODS: A 100-MCQ bank was established, with questions drawn from physiology and medical textbooks and from academic examination pools in the areas of endocrinology, diabetes, and diabetes technology. The study team analyzed the MCQ contents to ensure they were relevant to these areas. Fifty MCQs covered endocrinology and 50 covered diabetes and diabetes technology. The knowledge level of Google's Bard and ChatGPT was then assessed with an MCQ-based examination.
RESULTS: In the endocrinology section, ChatGPT answered 29 of 50 questions correctly (58%), and Bard obtained the same score of 29/50 (58%). In the diabetes technology section, ChatGPT scored 23/50 (46%) and Bard 20/50 (40%). Overall, across the entire examination, ChatGPT obtained 52 of 100 marks (52%) and Bard 49 of 100 (49%). ChatGPT scored slightly higher than Bard, but neither achieved a satisfactory score of at least 60% in endocrinology or diabetes/diabetes technology.
CONCLUSIONS: The overall MCQ-based performance of ChatGPT was slightly better than that of Google's Bard, but neither achieved an appropriate score in endocrinology or diabetes/diabetes technology. The study indicates that Bard and ChatGPT have the potential to assist medical students and faculty in academic medical education settings, but both artificial intelligence tools need more up-to-date information in these fields.
Affiliation(s)
- Sultan Ayoub Meo: Department of Physiology, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Thamir Al-Khlaiwi: Department of Physiology, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Anusha Sultan Meo: The School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, UK
- David C Klonoff: Diabetes Research Institute, Mills-Peninsula Medical Center, San Mateo, CA, USA

23. Jeyaraman M, Ramasubramanian S, Balaji S, Jeyaraman N, Nallakumarasamy A, Sharma S. ChatGPT in action: Harnessing artificial intelligence potential and addressing ethical challenges in medicine, education, and scientific research. World J Methodol 2023;13:170-178. [PMID: 37771867] [PMCID: PMC10523250] [DOI: 10.5662/wjm.v13.i4.170]
Abstract
Artificial intelligence (AI) tools, like OpenAI's Chat Generative Pre-trained Transformer (ChatGPT), hold considerable potential in healthcare, academia, and diverse industries. Evidence demonstrates capability at a medical-student level on standardized tests, suggesting utility in medical education, radiology reporting, genetics research, data optimization, and drafting repetitive texts such as discharge summaries. Nevertheless, these tools should augment, not supplant, human expertise. Despite promising applications, ChatGPT faces limitations, including weakness in critical thinking tasks and a tendency to generate false references, necessitating stringent cross-verification. Ensuing concerns, such as potential misuse, bias, blind trust, and privacy, underscore the need for transparency, accountability, and clear policies. Evaluation of AI-generated content and preservation of academic integrity are critical. With responsible use, AI can significantly improve healthcare, academia, and industry without compromising integrity and research quality. For effective and ethical AI deployment, collaboration among AI developers, researchers, educators, and policymakers is vital. The development of domain-specific tools, guidelines, and regulations, and the facilitation of public dialogue, must underpin these endeavors to responsibly harness AI's potential.
Affiliation(s)
- Madhan Jeyaraman: Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
- Swaminathan Ramasubramanian: Department of General Medicine, Government Medical College, Omandurar Government Estate, Chennai 600018, Tamil Nadu, India
- Sangeetha Balaji: Department of General Medicine, Government Medical College, Omandurar Government Estate, Chennai 600018, Tamil Nadu, India
- Naveen Jeyaraman: Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
- Arulkumar Nallakumarasamy: Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
- Shilpa Sharma: Department of Paediatric Surgery, All India Institute of Medical Sciences, Delhi 110029, New Delhi, India

24. Chatterjee S, Bhattacharya M, Lee SS, Chakraborty C. Can artificial intelligence-strengthened ChatGPT or other large language models transform nucleic acid research? Mol Ther Nucleic Acids 2023;33:205-207. [PMID: 37727444] [PMCID: PMC10505907] [DOI: 10.1016/j.omtn.2023.06.019]
Affiliation(s)
- Srijan Chatterjee: Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do 24252, Republic of Korea
- Manojit Bhattacharya: Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha 756020, India
- Sang-Soo Lee: Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do 24252, Republic of Korea
- Chiranjib Chakraborty: Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal 700126, India

25. Solomon BD, Chung WK. Artificial intelligence and the impact on medical genetics. Am J Med Genet C Semin Med Genet 2023;193:e32060. [PMID: 37565625] [DOI: 10.1002/ajmg.c.32060]
Abstract
Virtually all areas of biomedicine will be increasingly affected by applications of artificial intelligence (AI). We discuss how AI may affect fields of medical genetics, including both clinicians and laboratorians. In addition to reviewing the anticipated impact, we provide recommendations for ways in which these groups may want to evolve in light of the influence of AI. We also briefly discuss how educational and training programs can play a key role in preparing the future workforce given these anticipated changes.
Affiliation(s)
- Benjamin D Solomon: Medical Genetics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Wendy K Chung: Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA

26. Alkuraya IF. Is artificial intelligence getting too much credit in medical genetics? Am J Med Genet C Semin Med Genet 2023;193:e32062. [PMID: 37606000] [DOI: 10.1002/ajmg.c.32062]
Abstract
Artificial intelligence has lately proven useful in the field of medical genetics. It is already being used to interpret genome sequences and diagnose patients based on facial recognition. More recently, large language models (LLMs) such as ChatGPT have been tested for their capacity to provide medical genetics information. ChatGPT was found to perform similarly to human respondents on factual and critical thinking questions, albeit with reduced accuracy on the latter. In particular, ChatGPT's performance on questions involving recurrence risk calculation was dismal, despite only having to deal with a single disease. To see whether challenging ChatGPT with more difficult problems might reveal its flaws and their bases, it was asked to solve recurrence risk problems involving two diseases instead of one. Interestingly, it correctly understood the mode of inheritance of recessive diseases, yet it incorrectly calculated the probability of having a healthy child. Other LLMs were also tested and showed similarly inconsistent performance. This highlights a major limitation for clinical use. While this shortcoming may be solved in the near future, LLMs may not yet be ready to serve as an effective clinical tool for communicating medical genetics information.
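To make concrete the kind of two-disease recurrence-risk arithmetic described, consider a hypothetical problem of the same form (not one of the article's actual test items): both parents are carriers for two unlinked autosomal recessive conditions A and B. For each condition the chance of an unaffected child is 3/4, and independence of the two loci gives the combined probability:

```latex
% Hypothetical two-disease recurrence-risk problem: both parents are
% carriers for two unlinked autosomal recessive conditions A and B.
P(\text{unaffected by } A) = \tfrac{3}{4}, \quad
P(\text{unaffected by } B) = \tfrac{3}{4}, \quad
P(\text{healthy child}) = \tfrac{3}{4} \times \tfrac{3}{4} = \tfrac{9}{16}
```

This is precisely the multiplication step a model must get right once the recessive mode of inheritance has been recognized.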

27. Meo SA, Al-Masri AA, Alotaibi M, Meo MZS, Meo MOS. ChatGPT Knowledge Evaluation in Basic and Clinical Medical Sciences: Multiple Choice Question Examination-Based Performance. Healthcare (Basel) 2023;11:2046. [PMID: 37510487] [PMCID: PMC10379728] [DOI: 10.3390/healthcare11142046]
Abstract
The Chatbot Generative Pre-Trained Transformer (ChatGPT) has garnered great attention from the public, academicians, and science communities. It responds with appropriate and articulate answers and explanations across various disciplines. Different perspectives exist on the use of ChatGPT in education, research, and healthcare, with some ambiguity around its acceptability and ideal uses. However, the literature is acutely lacking in assessments of ChatGPT's knowledge level in the medical sciences. Therefore, the present study aimed to investigate the knowledge level of ChatGPT in medical education, in both basic and clinical medical sciences, through multiple-choice question (MCQ) examination-based performance, and its impact on the medical examination system.

A subject-wise question bank was established with a pool of MCQs from various medical textbooks and university examination pools. The research team carefully reviewed the MCQ contents and ensured that the MCQs were relevant to the subjects' contents. Each question was scenario-based, with four sub-stems and a single correct answer. From this bank, 100 MCQs were randomly selected across basic medical sciences (50 MCQs) and clinical medical sciences (50 MCQs). The MCQs were entered manually one by one, and a fresh ChatGPT session was started for each entry to avoid memory retention bias. The first response obtained was taken as the final response. Based on a pre-determined answer key, responses were scored 0 (incorrect) or 1 (correct).

The results revealed that ChatGPT attempted all 100 MCQs and obtained 37/50 (74%) marks in basic medical sciences and 35/50 (70%) in clinical medical sciences, with an overall score of 72/100 (72%). It is concluded that ChatGPT obtained a satisfactory score in both basic and clinical medical sciences and demonstrated a degree of understanding and explanation. These findings suggest that ChatGPT may be able to assist medical students and faculty in medical education settings, given its potential as an innovation within the framework of medical sciences and education.
Affiliation(s)
- Sultan Ayoub Meo: Department of Physiology, College of Medicine, King Saud University, Riyadh 11461, Saudi Arabia
- Abeer A. Al-Masri: Department of Physiology, College of Medicine, King Saud University, Riyadh 11461, Saudi Arabia
- Metib Alotaibi: University Diabetes Unit, Department of Medicine, College of Medicine, King Saud University, Riyadh 11461, Saudi Arabia