1
Stadler RD, Sudah SY, Moverman MA, Denard PJ, Duralde XA, Garrigues GE, Klifto CS, Levy JC, Namdari S, Sanchez-Sotelo J, Menendez ME. Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers. Arthroscopy 2024:S0749-8063(24)00495-X. [PMID: 38992513] [DOI: 10.1016/j.arthro.2024.06.045]
Abstract
PURPOSE: To evaluate the extent to which experienced reviewers can accurately discern between artificial intelligence (AI)-generated and original research abstracts published in the field of shoulder and elbow surgery, and to compare their performance with that of an AI detection tool.
METHODS: Twenty-five shoulder- and elbow-related articles published in high-impact journals in 2023 were randomly selected. ChatGPT was prompted with only the abstract title to create an AI-generated version of each abstract. The resulting 50 abstracts were randomly distributed to and evaluated by 8 blinded peer reviewers with at least 5 years of experience. Reviewers were tasked with distinguishing between original and AI-generated text. A Likert scale assessed reviewer confidence for each interpretation, and the primary reason guiding assessment of generated text was collected. AI output detector (0%-100%) and plagiarism (0%-100%) scores were evaluated using GPTZero.
RESULTS: Reviewers correctly identified 62% of AI-generated abstracts and misclassified 38% of original abstracts as AI generated. GPTZero reported a significantly higher probability of AI output among generated abstracts (median, 56%; interquartile range [IQR], 51%-77%) than among original abstracts (median, 10%; IQR, 4%-37%; P < .01). Generated abstracts scored significantly lower on the plagiarism detector (median, 7%; IQR, 5%-14%) than original abstracts (median, 82%; IQR, 72%-92%; P < .01). Correct identification of AI-generated abstracts was predominantly attributed to the presence of unrealistic data/values, whereas misidentification of original abstracts as AI generated was most often attributed to writing style.
CONCLUSIONS: Experienced reviewers had difficulty distinguishing between human- and AI-generated research content within shoulder and elbow surgery. The presence of unrealistic data facilitated correct identification of AI abstracts, whereas misidentification of original abstracts was often ascribed to writing style.
CLINICAL RELEVANCE: With rapidly advancing AI, it is paramount that ethical standards of scientific reporting are upheld; it is therefore helpful to understand how well reviewers can identify AI-generated content.
Affiliation(s)
- Ryan D Stadler: Rutgers Robert Wood Johnson Medical School, New Brunswick, New Jersey, U.S.A.
- Suleiman Y Sudah: Department of Orthopaedic Surgery, Monmouth Medical Center, Monmouth, New Jersey, U.S.A.
- Michael A Moverman: Department of Orthopaedics, University of Utah School of Medicine, Salt Lake City, Utah, U.S.A.
- Grant E Garrigues: Midwest Orthopaedics at Rush University Medical Center, Chicago, Illinois, U.S.A.
- Christopher S Klifto: Department of Orthopaedic Surgery, Duke University School of Medicine, Durham, North Carolina, U.S.A.
- Jonathan C Levy: Levy Shoulder Center at Paley Orthopedic & Spine Institute, Boca Raton, Florida, U.S.A.
- Surena Namdari: Rothman Orthopaedic Institute at Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania, U.S.A.
- Mariano E Menendez: Department of Orthopaedics, University of California Davis, Sacramento, California, U.S.A.
2
Li J, Zong H, Wu E, Wu R, Peng Z, Zhao J, Yang L, Xie H, Shen B. Exploring the potential of artificial intelligence to enhance the writing of English academic papers by non-native English-speaking medical students: the educational application of ChatGPT. BMC Medical Education 2024; 24:736. [PMID: 38982429] [PMCID: PMC11232216] [DOI: 10.1186/s12909-024-05738-y]
Abstract
BACKGROUND: Academic paper writing holds significant importance in the education of medical students and poses a clear challenge for those whose first language is not English. This study aims to investigate the effectiveness of employing large language models, particularly ChatGPT, in improving the English academic writing skills of these students.
METHODS: A cohort of 25 third-year medical students from China was recruited. The study consisted of two stages. First, the students were asked to write a mini paper. Second, the students were asked to revise the mini paper using ChatGPT within two weeks. The evaluation of the mini papers focused on three key dimensions: structure, logic, and language. The evaluation method incorporated both manual scoring and AI scoring using the ChatGPT-3.5 and ChatGPT-4 models. Additionally, we employed a questionnaire to gather feedback on students' experience in using ChatGPT.
RESULTS: After implementing ChatGPT for writing assistance, manual scores increased by a notable 4.23 points. Similarly, AI scores based on the ChatGPT-3.5 model increased by 4.82 points, while those based on the ChatGPT-4 model increased by 3.84 points. These results highlight the potential of large language models in supporting academic writing. Statistical analysis revealed no significant difference between manual scoring and ChatGPT-4 scoring, indicating the potential of ChatGPT-4 to assist teachers in the grading process. Feedback from the questionnaire was generally positive: 92% of students acknowledged an improvement in the quality of their writing, 84% noted advancements in their language skills, and 76% recognized the contribution of ChatGPT in supporting academic research.
CONCLUSION: The study highlighted the efficacy of large language models like ChatGPT in augmenting the English academic writing proficiency of non-native speakers in medical education. Furthermore, it illustrated the potential of these models to contribute to the educational evaluation process, particularly in environments where English is not the primary language.
Affiliation(s)
- Jiakun Li: Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Hui Zong: Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Erman Wu: Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China; Department of Neurosurgery, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, China
- Rongrong Wu: Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Zhufeng Peng: Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Jing Zhao: Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Lu Yang: Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
- Hong Xie: West China Hospital, West China School of Medicine, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, China
- Bairong Shen: Department of Urology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China; Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China
3
Roberts MC, Holt KE, Del Fiol G, Baccarelli AA, Allen CG. Precision public health in the era of genomics and big data. Nat Med 2024; 30:1865-1873. [PMID: 38992127] [DOI: 10.1038/s41591-024-03098-0]
Abstract
Precision public health (PPH) considers the interplay between genetics, lifestyle and the environment to improve disease prevention, diagnosis and treatment at the population level, thereby delivering the right interventions to the right populations at the right time. In this Review, we explore the concept of PPH as the next generation of public health. We discuss the historical context of using individual-level data in public health interventions and examine recent advancements in how data from human and pathogen genomics and social, behavioral and environmental research, as well as artificial intelligence, have transformed public health. Real-world examples of PPH are discussed, emphasizing how these approaches are becoming a mainstay in public health, as well as outstanding challenges in their development, implementation and sustainability. Data sciences; ethical, legal and social implications research; capacity building; equity research; and implementation science will have a crucial role in realizing the potential for 'precision' to enhance traditional public health approaches.
Affiliation(s)
- Megan C Roberts: Division of Pharmaceutical Outcomes and Policy, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC, USA
- Kathryn E Holt: Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, UK; Department of Infectious Diseases, School of Translational Medicine, Monash University, Melbourne, Victoria, Australia
- Guilherme Del Fiol: Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, USA
- Andrea A Baccarelli: Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Caitlin G Allen: Department of Public Health Sciences, College of Medicine, Medical University of South Carolina, Charleston, SC, USA
4
Rossettini G, Rodeghiero L, Corradi F, Cook C, Pillastrini P, Turolla A, Castellini G, Chiappinotto S, Gianola S, Palese A. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Medical Education 2024; 24:694. [PMID: 38926809] [PMCID: PMC11210096] [DOI: 10.1186/s12909-024-05630-9]
Abstract
BACKGROUND: Artificial intelligence (AI) chatbots are emerging educational tools for students in the healthcare sciences. However, assessing their accuracy is essential prior to adoption in educational settings. This study aimed to assess the accuracy of three AI chatbots (ChatGPT-4, Microsoft Copilot and Google Gemini) in predicting the correct answers in the Italian entrance standardized examination test for healthcare science degrees (CINECA test). Secondarily, we assessed the narrative coherence of the AI chatbots' responses (i.e., text output) based on three qualitative metrics: the logical rationale behind the chosen answer, the presence of information internal to the question, and the presence of information external to the question.
METHODS: An observational cross-sectional study was performed in September 2023. The accuracy of the three chatbots was evaluated on the CINECA test, whose questions use a multiple-choice format with a single best answer. The outcome was binary (correct or incorrect). A chi-squared test and a post hoc analysis with Bonferroni correction assessed differences in accuracy among the chatbots. A p-value < 0.05 was considered statistically significant. A sensitivity analysis was performed, excluding answers that were not applicable (e.g., images). Narrative coherence was analyzed by absolute and relative frequencies of correct answers and errors.
RESULTS: Of the 820 CINECA multiple-choice questions inputted into all chatbots, 20 questions could not be imported into ChatGPT-4 (n = 808) or Google Gemini (n = 808) due to technical limitations. We found statistically significant differences in the ChatGPT-4 vs Google Gemini and Microsoft Copilot vs Google Gemini comparisons (p-value < 0.001). The narrative coherence analysis revealed "logical reasoning" as the prevalent pattern among correct answers (n = 622, 81.5%) and "logical error" as the prevalent pattern among incorrect answers (n = 40, 88.9%).
CONCLUSIONS: Our main findings are that: (A) the AI chatbots performed well; (B) ChatGPT-4 and Microsoft Copilot performed better than Google Gemini; and (C) their narrative coherence is primarily logical. Although the AI chatbots showed promising accuracy in predicting the correct answers in the Italian entrance university standardized examination test, we encourage candidates to incorporate this new technology cautiously, as a supplement to their learning rather than a primary resource.
TRIAL REGISTRATION: Not required.
Affiliation(s)
- Giacomo Rossettini: School of Physiotherapy, University of Verona, Verona, Italy; Department of Physiotherapy, Faculty of Sport Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Spain
- Lia Rodeghiero: Department of Rehabilitation, Hospital of Merano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Merano-Meran, Italy
- Chad Cook: Department of Orthopaedics, Duke University, Durham, NC, USA; Duke Clinical Research Institute, Duke University, Durham, NC, USA; Department of Population Health Sciences, Duke University, Durham, NC, USA
- Paolo Pillastrini: Department of Biomedical and Neuromotor Sciences (DIBINEM), Alma Mater University of Bologna, Bologna, Italy; Unit of Occupational Medicine, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy
- Andrea Turolla: Department of Biomedical and Neuromotor Sciences (DIBINEM), Alma Mater University of Bologna, Bologna, Italy; Unit of Occupational Medicine, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy
- Greta Castellini: Unit of Clinical Epidemiology, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
- Silvia Gianola: Unit of Clinical Epidemiology, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
- Alvisa Palese: Department of Medical Sciences, University of Udine, Udine, Italy
5
Buldur M, Sezer B. Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam. BMC Oral Health 2024; 24:605. [PMID: 38789962] [PMCID: PMC11127407] [DOI: 10.1186/s12903-024-04358-8]
Abstract
BACKGROUND: The use of artificial intelligence in the health sciences is becoming widespread. Patients are known to benefit from artificial intelligence applications on various health issues, especially since the pandemic. One of the most important issues in this regard is the accuracy of the information provided by artificial intelligence applications.
OBJECTIVE: The purpose of this study was to pose the frequently asked questions about dental amalgam, as determined by the United States Food and Drug Administration (FDA), which is one such information resource, to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4), and to compare the content of the application's answers with the FDA's answers.
METHODS: The questions were put to ChatGPT-4 on May 8th and May 16th, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The answers from the FDA webpage were also recorded. ChatGPT-4's responses and the FDA's responses were compared for content similarity in "Main Idea", "Quality Analysis", "Common Ideas", and "Inconsistent Ideas".
RESULTS: ChatGPT-4 provided similar responses at one-week intervals. In comparison with the FDA guidance, it provided answers with similar information content to the frequently asked questions. However, although there were some similarities in the general aspects of the recommendation regarding amalgam removal, the two texts were not the same, and they offered different perspectives on the replacement of fillings.
CONCLUSIONS: The findings of this study indicate that ChatGPT-4, an artificial intelligence-based application, encompasses current and accurate information regarding dental amalgam and its removal, and provides it to individuals seeking access to such information. Nevertheless, we believe that further studies are required to assess the validity and reliability of ChatGPT-4 across diverse subjects.
Affiliation(s)
- Mehmet Buldur: Department of Restorative Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Berkant Sezer: Department of Pediatric Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
6
Garcia Valencia OA, Thongprayoon C, Jadlowiec CC, Mao SA, Leeaphorn N, Budhiraja P, Craici IM, Gonzalez Suarez ML, Cheungpasitporn W. AI-driven translations for kidney transplant equity in Hispanic populations. Sci Rep 2024; 14:8511. [PMID: 38609476] [PMCID: PMC11014982] [DOI: 10.1038/s41598-024-59237-7]
Abstract
Access to kidney transplant information in Spanish remains a substantial health equity challenge facing the Hispanic community. This study evaluated ChatGPT's capabilities in translating 54 English kidney transplant frequently asked questions (FAQs) into Spanish using two versions of the AI model, GPT-3.5 and GPT-4.0. The FAQs included 19 from the Organ Procurement and Transplantation Network (OPTN), 15 from the National Health Service (NHS), and 20 from the National Kidney Foundation (NKF). Two native Spanish-speaking nephrologists, both of Mexican heritage, scored the translations for linguistic accuracy and for cultural sensitivity tailored to Hispanics using a 1-5 rubric. The inter-rater reliability of the evaluators, measured by Cohen's kappa, was 0.85. Overall linguistic accuracy was 4.89 ± 0.31 for GPT-3.5 versus 4.94 ± 0.23 for GPT-4.0 (p = 0.23, not significant). Both versions scored 4.96 ± 0.19 in cultural sensitivity (p = 1.00). By source, GPT-3.5 linguistic accuracy was 4.84 ± 0.37 (OPTN), 4.93 ± 0.26 (NHS), and 4.90 ± 0.31 (NKF); GPT-4.0 scored 4.95 ± 0.23 (OPTN), 4.93 ± 0.26 (NHS), and 4.95 ± 0.22 (NKF). For cultural sensitivity, GPT-3.5 scored 4.95 ± 0.23 (OPTN), 4.93 ± 0.26 (NHS), and 5.00 ± 0.00 (NKF), while GPT-4.0 scored 5.00 ± 0.00 (OPTN), 5.00 ± 0.00 (NHS), and 4.90 ± 0.31 (NKF). These high linguistic accuracy and cultural sensitivity scores demonstrate that ChatGPT effectively translated the English FAQs into Spanish across the three source systems. The findings suggest ChatGPT's potential to promote health equity by improving Spanish-language access to essential kidney transplant information. Additional research should evaluate its medical translation capabilities across diverse contexts and languages. These English-to-Spanish translations may increase access to vital transplant information for underserved Spanish-speaking Hispanic patients.
Affiliation(s)
- Oscar A Garcia Valencia: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Charat Thongprayoon: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Shennen A Mao: Division of Transplant Surgery, Department of Transplantation, Mayo Clinic, Jacksonville, FL, USA
- Napat Leeaphorn: Division of Transplant Surgery, Department of Transplantation, Mayo Clinic, Jacksonville, FL, USA; Department of Transplant, Mayo Clinic, Jacksonville, USA
- Pooja Budhiraja: Division of Transplant Surgery, Mayo Clinic, Phoenix, AZ, USA
- Iasmina M Craici: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Maria L Gonzalez Suarez: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Wisit Cheungpasitporn: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
7
Li C, Ye G, Jiang Y, Wang Z, Yu H, Yang M. Artificial intelligence in battling infectious diseases: A transformative role. J Med Virol 2024; 96:e29355. [PMID: 38179882] [DOI: 10.1002/jmv.29355]
Abstract
Infectious diseases have wrought immense havoc on human society and are widely regarded as adversaries that humanity cannot escape. In recent years, the advancement of artificial intelligence (AI) technology has ushered in a revolutionary era in infectious disease prevention and control, encompassing early warning of outbreaks, contact tracing, infection diagnosis, drug discovery, and the facilitation of drug design, alongside other facets of epidemic management. This article presents an overview of the use of AI systems in the field of infectious diseases, with a specific focus on their role during the COVID-19 pandemic. It also highlights the contemporary challenges that AI confronts within this domain and proposes strategies for their mitigation. The potential applications of AI across multiple domains must be further harnessed to strengthen its capacity to address future disease outbreaks effectively.
Affiliation(s)
- Chunhui Li: School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
- Guoguo Ye: Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for Infectious Disease, The Third People's Hospital of Shenzhen, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- Yinghan Jiang: School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
- Zhiming Wang: School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
- Haiyang Yu: Hangzhou Yalla Information Technology Service Co., Ltd., Hangzhou, People's Republic of China
- Minghui Yang: School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China