1. Gosak L, Štiglic G, Pruinelli L, Vrbnjak D. PICOT questions and search strategies formulation: A novel approach using artificial intelligence automation. J Nurs Scholarsh 2025; 57:5-16. PMID: 39582233; PMCID: PMC11771709; DOI: 10.1111/jnu.13036
Abstract
AIM The aim of this study was to evaluate and compare artificial intelligence (AI)-based large language models (LLMs) (ChatGPT-3.5, Bing, and Bard) with human-based formulations in generating relevant clinical queries, using comprehensive methodological evaluations. METHODS To interact with the major LLMs ChatGPT-3.5, Bing Chat, and Google Bard, scripts and prompts were designed to formulate PICOT (population, intervention, comparison, outcome, time) clinical questions and search strategies. The quality of the LLMs' responses was assessed using a descriptive approach and independent assessment by two researchers. To determine the number of hits, PubMed, Web of Science, Cochrane Library, and CINAHL Ultimate search results were imported separately, without search restrictions, using the search strings generated by the three LLMs and an additional one formulated by an expert. Hits from one of the scenarios were also exported for relevance evaluation; a single scenario was chosen to provide a focused analysis. Cronbach's alpha and the intraclass correlation coefficient (ICC) were also calculated. RESULTS In five different scenarios, ChatGPT-3.5 generated 11,859 hits, Bing 1,376,854, Bard 16,583, and an expert 5919. We then used the first scenario to assess the relevance of the obtained results. The human expert search approach yielded 65.22% (56/105) relevant articles. Bing was the most accurate AI-based LLM with 70.79% (63/89), followed by ChatGPT-3.5 with 21.05% (12/45) and Bard with 13.29% (42/316) relevant hits. Based on the assessment of two evaluators, ChatGPT-3.5 received the highest score (M = 48.50; SD = 0.71). Results showed a high level of agreement between the two evaluators. Although ChatGPT-3.5 showed a lower percentage of relevant hits than Bing, this reflects the nuanced evaluation criteria, in which the subjective evaluation prioritized contextual accuracy and quality over mere relevance.
CONCLUSION This study provides valuable insights into the ability of LLMs to formulate PICOT clinical questions and search strategies. AI-based LLMs, such as ChatGPT-3.5, demonstrate significant potential for augmenting clinical workflows, improving clinical query development, and supporting search strategies. However, the findings also highlight limitations that necessitate further refinement and continued human oversight. CLINICAL RELEVANCE AI could assist nurses in formulating PICOT clinical questions and search strategies. AI-based LLMs offer valuable support to healthcare professionals by improving the structure of clinical questions and enhancing search strategies, thereby significantly increasing the efficiency of information retrieval.
Affiliation(s)
- Lucija Gosak
- Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
- Gregor Štiglic
- Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
- Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
- Usher Institute, University of Edinburgh, Edinburgh, UK
- Lisiane Pruinelli
- College of Nursing and College of Medicine, University of Florida, Gainesville, Florida, USA
2. Carneiro MM. Artificial intelligence in scientific writing: sailing fair winds or between the devil and the deep blue sea? Women Health 2025; 65:1-3. PMID: 39722655; DOI: 10.1080/03630242.2025.2445890
Affiliation(s)
- Márcia Mendonça Carneiro
- Department of Obstetrics and Gynecology, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG, Brazil
- ORIGEN Center for Reproductive Medicine, Belo Horizonte, MG, Brazil
3. Seam N, Chotirmall SH, Martinez FJ, Halayko AJ, Harhay MO, Davis SD, Schumacker PT, Tighe RM, Burkart KM, Cooke C. Editorial Position of the American Thoracic Society Journal Family on the Evolving Role of Artificial Intelligence in Scientific Research and Review. Am J Respir Crit Care Med 2024; 211:1-3. PMID: 39680927; PMCID: PMC11755372; DOI: 10.1164/rccm.202411-2208ed
Affiliation(s)
- Nitin Seam
- National Institutes of Health, Bethesda, Maryland, United States
- Sanjay H Chotirmall
- Lee Kong Chian School of Medicine, Translational Respiratory Research Laboratory, Singapore
- Andrew J Halayko
- University of Manitoba, Section of Respiratory Diseases, Winnipeg, Manitoba, Canada
- University of Manitoba, Biology of Breathing Group, Children's Hospital Research Institute of Manitoba, Winnipeg, Manitoba, Canada
- Michael O Harhay
- University of Pennsylvania, Biostatistics, Epidemiology and Informatics, Philadelphia, Pennsylvania, United States
- Stephanie D Davis
- Riley Children's Hospital, Pediatrics, Indianapolis, Indiana, United States
- Paul T Schumacker
- Northwestern University, Pediatrics - Neonatology, Chicago, Illinois, United States
- Robert M Tighe
- Duke Medicine, Medicine, Durham, North Carolina, United States
- Colin Cooke
- University of Michigan, Pulmonary and Critical Care Medicine, Ann Arbor, Michigan, United States
4. Gauckler C, Werner MH. Artificial Intelligence: A Challenge to Scientific Communication. Klin Monbl Augenheilkd 2024; 241:1309-1321. PMID: 39637910; DOI: 10.1055/a-2418-5238
Abstract
Recent years have seen formidable advances in artificial intelligence. Developments include a large number of specialised systems, existing or planned, for scientific research, data analysis, translation, text production and design (including grammar checking and stylistic revision), plagiarism detection, and scientific review, in addition to general-purpose AI systems for searching the internet and generative AI systems for text, images, video, and musical compositions. These systems promise to make many aspects of work easier and simpler. Blind trust in AI systems, with uncritical and careless use of their results, is nonetheless dangerous: these systems have no inherent understanding of the content they process or generate, but merely simulate such understanding by reproducing statistical patterns extracted from their training data. This article discusses the potential and risks of using AI in scientific communication and explores possible systemic consequences of widespread AI implementation in this context.
Affiliation(s)
- Micha H Werner
- Institut für Philosophie, Universität Greifswald, Germany
5. Kabir A, Shah S, Haddad A, Raper DMS. Introducing Our Custom GPT: An Example of the Potential Impact of Personalized GPT Builders on Scientific Writing. World Neurosurg 2024; 193:461-468. PMID: 39442688; DOI: 10.1016/j.wneu.2024.10.041
Abstract
BACKGROUND The rapid progression of artificial intelligence (AI) and large language models (LLMs), such as ChatGPT, has increased their utility and popularity in various fields. Discourse about AI's potential role in different aspects of scientific literature, such as writing, data analysis, and literature review, is growing as the programs continue to improve. This study uses a recently released ChatGPT tool that allows users to create customized GPTs, highlighting the potential of custom GPTs tailored to preparing and writing research manuscripts. METHODS We developed two GPTs, Neurosurgical Research Paper Writer and Medi Research Assistant, through iterative refinement with ChatGPT 4.0's GPT Builder. This process included providing specific and thorough instructions, along with repeated testing and feedback-driven adjustments, to finalize a version of the model that fit our needs. RESULTS The GPT models we created efficiently and consistently produced accurate outputs from input prompts according to their specific configurations. They effectively analyzed and synthesized the literature they retrieved, producing reliable text written in a manner comparable to manuscripts authored by scientific professionals. CONCLUSIONS While the ability of modern AI to generate scientific manuscripts has shown significant progress, the persistence of fallacies and miscalculations suggests that GPT development requires extensive calibration before achieving greater reliability and consistency. Nevertheless, the prospective horizon of AI-driven research holds promise for streamlining the publication workflow and increasing accessibility to novel research.
Affiliation(s)
- Aymen Kabir
- Department of Neurological Surgery, University of California, San Francisco, California, USA
- Suraj Shah
- University of California, Berkeley, California, USA
- Alexander Haddad
- Department of Neurological Surgery, University of California, San Francisco, California, USA
- Daniel M S Raper
- Department of Neurological Surgery, University of California, San Francisco, California, USA
6. Filetti S, Fenza G, Gallo A. Research design and writing of scholarly articles: new artificial intelligence tools available for researchers. Endocrine 2024; 85:1104-1116. PMID: 39085566; DOI: 10.1007/s12020-024-03977-z
7. İlhan B, Gürses BO, Güneri P. Addressing Inequalities in Science: The Role of Language Learning Models in Bridging the Gap. Int Dent J 2024; 74:657-660. PMID: 38599934; PMCID: PMC11287170; DOI: 10.1016/j.identj.2024.01.026
Affiliation(s)
- Betül İlhan
- Department of Oral & Maxillofacial Radiology, Faculty of Dentistry, Ege University, Izmir, Turkey
- Barış Oğuz Gürses
- Department of Mechanical Engineering, Faculty of Engineering, Ege University, Izmir, Turkey
- Pelin Güneri
- Department of Oral & Maxillofacial Radiology, Faculty of Dentistry, Ege University, Izmir, Turkey
8. Ren D, Roland D. Arise robot overlords! A synergy of artificial intelligence in the evolution of scientific writing and publishing. Pediatr Res 2024; 96:576-578. PMID: 38627589; DOI: 10.1038/s41390-024-03217-0
Affiliation(s)
- Dennis Ren
- Division of Emergency Medicine, Children's National Hospital, Washington, DC, USA
- Damian Roland
- SAPPHIRE Group, Population Health Sciences, Leicester University, Leicester, UK
- Paediatric Emergency Medicine Leicester Academic (PEMLA) Group, Children's Emergency Department, Leicester Royal Infirmary, Leicester, UK
9. Aljamaan F, Temsah MH, Altamimi I, Al-Eyadhy A, Jamal A, Alhasan K, Mesallam TA, Farahat M, Malki KH. Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study. JMIR Med Inform 2024; 12:e54345. PMID: 39083799; PMCID: PMC11325115; DOI: 10.2196/54345
Abstract
BACKGROUND Artificial intelligence (AI) chatbots have recently been adopted in medical practice by health care practitioners. Notably, the output of these AI chatbots was found to contain varying degrees of hallucination in content and references. Such hallucinations raise doubts about their output and its implementation. OBJECTIVE The aim of our study was to propose a reference hallucination score (RHS) to evaluate the authenticity of AI chatbots' citations. METHODS Six AI chatbots were challenged with the same 10 medical prompts, requesting 10 references per prompt. The RHS is composed of 6 bibliographic items and the reference's relevance to the prompt's keywords. The RHS was calculated for each reference, prompt, and type of prompt (basic vs complex). The average RHS was calculated for each AI chatbot and compared across the different types of prompts and AI chatbots. RESULTS Bard failed to generate any references. ChatGPT 3.5 and Bing generated the highest RHS (score=11), while Elicit and SciSpace generated the lowest (score=1); Perplexity generated an intermediate RHS (score=7). The highest degree of hallucination was observed for reference relevancy to the prompt keywords (308/500, 61.6%), while the lowest was for reference titles (169/500, 33.8%). ChatGPT and Bing had comparable RHS (β coefficient=-0.069; P=.32), while Perplexity had a significantly lower RHS than ChatGPT (β coefficient=-0.345; P<.001). AI chatbots generally had significantly higher RHS when prompted with scenarios or complex-format prompts (β coefficient=0.486; P<.001). CONCLUSIONS The variation in RHS underscores the need for a robust reference evaluation tool and highlights the importance of verifying AI chatbots' output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed RHS could contribute to ongoing efforts to enhance the general reliability of AI in medical research.
Affiliation(s)
- Fadi Aljamaan
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Ayman Al-Eyadhy
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Amr Jamal
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Khalid Alhasan
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Tamer A Mesallam
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
- Mohamed Farahat
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
- Khalid H Malki
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
10. Molena KF, Macedo AP, Ijaz A, Carvalho FK, Gallo MJD, Wanderley Garcia de Paula e Silva F, de Rossi A, Mezzomo LA, Mugayar LRF, Queiroz AM. Assessing the Accuracy, Completeness, and Reliability of Artificial Intelligence-Generated Responses in Dentistry: A Pilot Study Evaluating the ChatGPT Model. Cureus 2024; 16:e65658. PMID: 39205730; PMCID: PMC11352766; DOI: 10.7759/cureus.65658
Abstract
BACKGROUND Artificial intelligence (AI) can be a tool for diagnosis and knowledge acquisition, particularly in dentistry, sparking debates on its application in clinical decision-making. OBJECTIVE This study aims to evaluate the accuracy, completeness, and reliability of responses generated by the Chatbot Generative Pre-Trained Transformer (ChatGPT) 3.5 in dentistry using expert-formulated questions. MATERIALS AND METHODS Experts were invited to create three questions, answers, and respective references according to their specialized fields of activity. A Likert scale was used to evaluate agreement levels between expert and ChatGPT responses. Statistical analysis compared descriptive and binary question groups in terms of accuracy and completeness. Questions with low accuracy underwent re-evaluation, and subsequent responses were compared for improvement. The Wilcoxon test was used (α = 0.05). RESULTS Ten experts across six dental specialties generated 30 binary and descriptive dental questions and references. The accuracy score had a median of 5.50 and a mean of 4.17; for completeness, the median was 2.00 and the mean 2.07. No difference was observed between descriptive and binary responses for accuracy or completeness. However, re-evaluated responses showed significant improvement in both accuracy (median 5.50 vs. 6.00; mean 4.17 vs. 4.80; p=0.042) and completeness (median 2.0 vs. 2.0; mean 2.07 vs. 2.30; p=0.011). References were more often incorrect than correct, with no differences between descriptive and binary questions. CONCLUSIONS ChatGPT initially demonstrated good accuracy and completeness, which further improved with machine learning (ML) over time. However, some inaccurate answers and references persisted. Human critical discernment remains essential for facing complex clinical cases and advancing theoretical knowledge and evidence-based practice.
Affiliation(s)
- Kelly F Molena
- Department of Pediatric Dentistry, School of Dentistry of Ribeirão Preto at University of São Paulo, Ribeirão Preto, BRA
- Ana P Macedo
- Department of Dental Materials and Prosthesis, School of Dentistry of Ribeirão Preto at University of São Paulo, Ribeirão Preto, BRA
- Anum Ijaz
- Department of Public Health, University of Illinois Chicago at College of Dentistry, Chicago, USA
- Fabrício K Carvalho
- Department of Pediatric Dentistry, School of Dentistry of Ribeirão Preto at University of São Paulo, Ribeirão Preto, BRA
- Maria Julia D Gallo
- Department of Pediatric Dentistry, School of Dentistry of Ribeirão Preto at University of São Paulo, Ribeirão Preto, BRA
- Andiara de Rossi
- Department of Dentistry, School of Dentistry of Ribeirão Preto at University of São Paulo, São Paulo, BRA
- Luis A Mezzomo
- Department of Restorative Dentistry, University of Illinois Chicago at College of Dentistry, Chicago, USA
- Leda Regina F Mugayar
- Department of Pediatric Dentistry, University of Illinois Chicago College of Dentistry, Chicago, USA
- Alexandra M Queiroz
- Department of Pediatric Dentistry, School of Dentistry of Ribeirão Preto at University of São Paulo, Ribeirão Preto, BRA
11. Pelletier J, Li Y, Cloessner E, Sistenich V, Maxwell N, Thomas M, Stoner D, Mwenze B, Manguvo A. Bridging Gaps: A Quality Improvement Project for the Continuing Medical Education on Stick (CMES) Program. Cureus 2024; 16:e62657. PMID: 39036234; PMCID: PMC11258952; DOI: 10.7759/cureus.62657
Abstract
BACKGROUND Aimed at bridging the gap in continuing medical education (CME) resource availability in low- and middle-income countries (LMICs), the "Continuing Medical Education on Stick" (CMES) program introduces two technological solutions: a universal serial bus (USB) drive and the CMES-Pi computer, providing access to monthly updated CME content without data cost. Feedback from users suggested a lack of content on tropical infectious diseases (IDs) and a predominance of content from a Western perspective, which may be less relevant in LMIC settings. METHODS This quality improvement project was intended to identify areas for improvement in the CMES database to better meet the educational needs of users. We compared the CMES content with the American Board of Emergency Medicine (ABEM) exam content outline to identify gaps. The curriculum map of the CMES library, encompassing content from 2019 to 2024, was reviewed. An anonymous survey was conducted among 47 global users to gather feedback on unmet educational needs and suggestions for content improvements. All healthcare workers who were members of the CMES WhatsApp group were eligible to participate. RESULTS The curriculum map included 2,572 items categorized into 23 areas. Comparison with the ABEM outline identified gaps in several clinical areas, including procedures, traumatic disorders, and geriatrics, which were underrepresented in the CMES library by 5%, 5%, and 4%, respectively, relative to the ABEM outline. Free responses from users highlighted a lack of content on practical skills, such as electrocardiogram (ECG) interpretation and management of tropical diseases. Respondents identified emergency medical services (EMS)/prehospital care (81%), diagnostic imaging (62%), and toxicology/pharmacology (40%) as the areas most beneficial for clinical practice. In response to user feedback, new content was added to the CMES platform on the management of sickle cell disease and on dermatologic conditions in darkly pigmented skin. Furthermore, a targeted podcast series called "ID for Users of the CMES Program (ID4U)" was launched, focusing on tropical and locally relevant ID, with episodes now being integrated into the CMES platform. CONCLUSIONS The project pinpointed critical gaps in emergency medicine (EM) content pertinent to LMICs and led to targeted enhancements of the CMES library. Ongoing updates will focus on including more prehospital medicine, diagnostic imaging, and toxicology content. Further user engagement and education on using the CMES platform will be implemented to maximize its educational impact. Future adaptations will prioritize local relevance over the ABEM curriculum to better serve the diverse needs of global users.
Affiliation(s)
- Jessica Pelletier
- Emergency Medicine, Washington University School of Medicine, St. Louis, USA
- Yan Li
- Center for Information Systems and Technology, Claremont Graduate University, Claremont, USA
- Emily Cloessner
- Emergency Medicine, Washington University School of Medicine, St. Louis, USA
- Nicholas Maxwell
- Emergency Medicine, Washington University School of Medicine, St. Louis, USA
- Manoj Thomas
- Business Management, University of Sydney, Darlington, AUS
- Deb Stoner
- Emergency Medicine, Evangelical Community Hospital, Lewisburg, USA
- Bethel Mwenze
- Emergency Medical Services, Samaritan Health Systems, Kampala, UGA
- Angellar Manguvo
- Department of Graduate Health Professions in Medicine, University of Missouri-Kansas City School of Medicine, Kansas City, USA
12. Howard FM, Li A, Riffon MF, Garrett-Mayer E, Pearson AT. Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023. JCO Clin Cancer Inform 2024; 8:e2400077. PMID: 38822755; PMCID: PMC11371107; DOI: 10.1200/cci.24.00077
Abstract
PURPOSE Artificial intelligence (AI) models can generate scientific abstracts that are difficult to distinguish from the work of human authors. The use of AI in scientific writing and the performance of AI detection tools are poorly characterized. METHODS We extracted text from published scientific abstracts from the ASCO 2021-2023 Annual Meetings. Likelihood of AI content was evaluated by three detectors: GPTZero, Originality.ai, and Sapling. Optimal thresholds for AI content detection were selected using 100 abstracts from before 2020 as negative controls and 100 produced by OpenAI's GPT-3 and GPT-4 models as positive controls. Logistic regression was used to evaluate the association of predicted AI content with submission year and abstract characteristics, and adjusted odds ratios (aORs) were computed. RESULTS Fifteen thousand five hundred and fifty-three abstracts met inclusion criteria. Across detectors, abstracts submitted in 2023 were significantly more likely to contain AI content than those in 2021 (aORs ranging from 1.79 with Originality to 2.37 with Sapling). Online-only publication and lack of a clinical trial number were consistently associated with AI content. With optimal thresholds, 99.5%, 96%, and 97% of GPT-3/4-generated abstracts were identified by GPTZero, Originality, and Sapling, respectively, and no sampled abstracts from before 2020 were classified as AI generated by the GPTZero and Originality detectors. Correlation between detectors was low to moderate, with Spearman correlation coefficients ranging from 0.14 for Originality and Sapling to 0.47 for Sapling and GPTZero. CONCLUSION There is an increasing signal of AI content in ASCO abstracts, coinciding with the growing popularity of generative AI models.
Affiliation(s)
- Frederick M. Howard
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL
- Anran Li
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL
- Mark F. Riffon
- Center for Research and Analytics, American Society of Clinical Oncology, Alexandria, VA
- Alexander T. Pearson
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL
13. Elston DM. Letter from the editor: Artificial intelligence and medical writing. J Am Acad Dermatol 2024:S0190-9622(24)00580-2. PMID: 38608868; DOI: 10.1016/j.jaad.2024.04.016
Affiliation(s)
- Dirk M Elston
- Department of Dermatology, Medical University of South Carolina, Charleston
14. Carlsson SV, Esteves SC, Grobet-Jeandin E, Masone MC, Ribal MJ, Zhu Y. Being a non-native English speaker in science and medicine. Nat Rev Urol 2024; 21:127-132. PMID: 38225458; DOI: 10.1038/s41585-023-00839-7
Affiliation(s)
- Sigrid V Carlsson
- Departments of Surgery (Urology Service) and Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Department of Urology, Sahlgrenska Academy at Gothenburg University, Gothenburg, Sweden
- Department of Translational Medicine, Division of Urological Cancers, Medical Faculty, Lund University, Lund, Sweden
- Sandro C Esteves
- ANDROFERT, Andrology and Human Reproduction Clinic, Campinas, SP, Brazil
- Department of Surgery (Division of Urology), University of Campinas (UNICAMP), Campinas, SP, Brazil
- Faculty of Health, Aarhus University, Aarhus, Denmark
- Maria J Ribal
- Uro-Oncology Unit, Hospital Clinic, University of Barcelona, Barcelona, Spain
- Yao Zhu
- Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China
15. Inam M, Sheikh S, Minhas AMK, Vaughan EM, Krittanawong C, Samad Z, Lavie CJ, Khoja A, D'Cruze M, Slipczuk L, Alarakhiya F, Naseem A, Haider AH, Virani SS. A review of top cardiology and cardiovascular medicine journal guidelines regarding the use of generative artificial intelligence tools in scientific writing. Curr Probl Cardiol 2024; 49:102387. PMID: 38185435; DOI: 10.1016/j.cpcardiol.2024.102387
Abstract
BACKGROUND Generative artificial intelligence (AI) tools have experienced rapid development over the last decade and are gaining increasing popularity as assistive models in academic writing. However, the ability of AI to generate reliable and accurate research articles is a topic of debate. Major scientific journals have issued policies regarding the contribution of AI tools to scientific writing. METHODS We conducted a review of the author and peer reviewer guidelines of the top 25 Cardiology and Cardiovascular Medicine journals as per the 2023 SCImago rankings. Data were obtained through reviewing journal websites and directly emailing the editorial offices. Descriptive data regarding journal characteristics were coded in SPSS. Subgroup analyses of the journal guidelines were conducted based on the publishing company policies. RESULTS Our analysis revealed that all scientific journals in our study permitted the documented use of AI in scientific writing, with certain limitations as per ICMJE recommendations. We found that AI tools cannot be included in the authorship or be used for image generation, and that all authors are required to assume full responsibility for their submitted and published work. The use of generative AI tools in the peer review process is strictly prohibited. CONCLUSION Guidelines regarding the use of generative AI in scientific writing are standardized, detailed, and unanimously followed by all journals in our study according to the recommendations set forth by international forums. It is imperative to ensure that these policies are carefully followed and updated to maintain scientific integrity.
Collapse
Affiliation(s)
- Maha Inam
- Office of the Vice Provost, Research, Aga Khan University, Karachi, Pakistan
- Sana Sheikh
- Department of Medicine, Aga Khan University, Karachi, Pakistan
- Abdul Mannan Khan Minhas
- Section of Cardiovascular Research, Department of Medicine, Baylor College of Medicine, Houston, TX, United States
- Elizabeth M Vaughan
- Section of Cardiovascular Research, Department of Medicine, Baylor College of Medicine, Houston, TX, United States; Department of Internal Medicine, UTMB, Galveston, TX, United States
- Chayakrit Krittanawong
- Leon H. Charney Division of Cardiology, New York University Langone Health, New York, NY, United States
- Zainab Samad
- Section of Cardiology, Department of Medicine, Aga Khan University Hospital, Karachi, Pakistan
- Carl J Lavie
- Department of Cardiovascular Diseases, John Ochsner Heart and Vascular Institute, Ochsner Clinical School, The University of Queensland School of Medicine, New Orleans, LA, United States
- Adeel Khoja
- Department of Medicine, Aga Khan University, Karachi, Pakistan; Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Melaine D'Cruze
- Institute for Educational Development, Aga Khan University Hospital, Karachi, Pakistan
- Leandro Slipczuk
- Cardiology Division, Montefiore Medical Center, Bronx, NY, United States; Albert Einstein College of Medicine, Bronx, NY, United States
- Azra Naseem
- Institute for Educational Development, Aga Khan University Hospital, Karachi, Pakistan
- Adil H Haider
- Dean's Office, Medical College, Aga Khan University Hospital, Karachi, Pakistan
- Salim S Virani
- Office of the Vice Provost, Research, Aga Khan University, Karachi, Pakistan; Section of Cardiovascular Research, Department of Medicine, Baylor College of Medicine, Houston, TX, United States; Section of Cardiology, Department of Medicine, Aga Khan University Hospital, Karachi, Pakistan; The Texas Heart Institute, Houston, TX, United States.
16
Bellini V, Semeraro F, Montomoli J, Cascella M, Bignami E. Between human and AI: assessing the reliability of AI text detection tools. Curr Med Res Opin 2024; 40:353-358. [PMID: 38265047 DOI: 10.1080/03007995.2024.2310086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 01/22/2024] [Indexed: 01/25/2024]
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT-4 have raised critical questions regarding their distinguishability from human-generated content. In this research, we evaluated the effectiveness of online detection tools in identifying ChatGPT-4 vs human-written text. METHODS Two texts produced by ChatGPT-4 using differing prompts and one text created by a human author were assessed using the following online detection tools: GPTZero, ZeroGPT, Writer ACD, and Originality. RESULTS The findings revealed notable variance in the detection capabilities of the employed tools. GPTZero and ZeroGPT exhibited inconsistent assessments regarding the AI origin of the texts. Writer ACD predominantly identified texts as human-written, whereas Originality consistently recognized the AI-generated content in both samples from ChatGPT-4, highlighting Originality's enhanced sensitivity to patterns characteristic of AI-generated text. CONCLUSION The study demonstrates that while automatic detection tools may discern texts generated by ChatGPT-4, significant variability exists in their accuracy. Our findings underscore an urgent need for more refined detection methodologies, both to ensure the authenticity and integrity of content in scientific and academic research and to prevent the misdetection of human-written content as AI-generated and vice versa.
Affiliation(s)
- Valentina Bellini
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy
- Federico Semeraro
- Department of Anesthesia, Intensive Care and Prehospital Emergency, Maggiore Hospital Carlo Alberto Pizzardi, Bologna, Italy
- Jonathan Montomoli
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Rimini, Italy
- Marco Cascella
- Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana", University of Salerno, Baronissi, Italy
- Elena Bignami
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy
17
Andrew A. Potential applications and implications of large language models in primary care. Fam Med Community Health 2024; 12:e002602. [PMID: 38290759 PMCID: PMC10828839 DOI: 10.1136/fmch-2023-002602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Accepted: 01/16/2024] [Indexed: 02/01/2024] Open
Abstract
The recent release of highly advanced generative artificial intelligence (AI) chatbots, including ChatGPT and Bard, which are powered by large language models (LLMs), has attracted growing mainstream interest in their diverse applications in clinical practice, including in health and healthcare. The potential applications of LLM-based programmes in the medical field range from assisting medical practitioners in improving their clinical decision-making and streamlining administrative paperwork to empowering patients to take charge of their own health. However, despite the broad range of benefits, the use of such AI tools also comes with several limitations and ethical concerns that warrant further consideration, encompassing issues related to privacy, data bias, and the accuracy and reliability of AI-generated information. Prior research has primarily centred on the broad applications of LLMs in medicine. To the author's knowledge, this is the first article that consolidates current and pertinent literature on LLMs to examine their potential in primary care. The objectives of this paper are not only to summarise the potential benefits, risks and challenges of using LLMs in primary care, but also to offer insights into considerations that primary care clinicians should take into account when deciding whether to adopt and integrate such technologies into their clinical practice.
Affiliation(s)
- Albert Andrew
- Medical Student, The University of Auckland School of Medicine, Auckland, New Zealand
18
Daungsupawong H, Wiwanitkit V. Artificial intelligence and the scientific writing of non-native English speakers. REVISTA DA ASSOCIACAO MEDICA BRASILEIRA (1992) 2024; 70:e20231291. [PMID: 38265353 PMCID: PMC10807046 DOI: 10.1590/1806-9282.20231291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/22/2023] [Accepted: 09/24/2023] [Indexed: 01/25/2024]
19
Khalifa AA, Ibrahim MA. Artificial intelligence (AI) and ChatGPT involvement in scientific and medical writing, a new concern for researchers. A scoping review. ARAB GULF JOURNAL OF SCIENTIFIC RESEARCH 2024. [DOI: 10.1108/agjsr-09-2023-0423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/30/2024]
Abstract
Purpose: The study aims to evaluate PubMed publications on ChatGPT or artificial intelligence (AI) involvement in scientific or medical writing and investigate whether ChatGPT or AI was used to create these articles or listed as an author.
Design/methodology/approach: This scoping review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines. A PubMed database search was performed for articles published between January 1 and November 29, 2023, using appropriate search terms; both authors performed screening and selection independently.
Findings: From the initial search results of 127 articles, 41 were eligible for final analysis. Articles were published in 34 journals. Editorials were the most common article type, with 15 (36.6%) articles. Authors originated from 27 countries, and authors from the USA contributed the most, with 14 (34.1%) articles. The most discussed topic was AI tools and writing capabilities, in 19 (46.3%) articles. AI or ChatGPT was involved in manuscript preparation in 31 (75.6%) articles. None of the articles listed AI or ChatGPT as an author, and in 19 (46.3%) articles the authors acknowledged utilizing AI or ChatGPT.
Practical implications: Researchers worldwide are concerned about AI or ChatGPT involvement in scientific research, specifically the writing process. The authors believe that precise and mature regulations will soon be developed by journals, publishers and editors, paving the way for the best usage of these tools.
Originality/value: This scoping review presented published data on the use of AI or ChatGPT in various aspects of scientific research and writing, and discussed the advantages, disadvantages and implications of their usage.
20
Jacques T, Sleiman R, Diaz MI, Dartus J. Artificial intelligence: Emergence and possible fraudulent use in medical publishing. Orthop Traumatol Surg Res 2023; 109:103709. [PMID: 37852535 DOI: 10.1016/j.otsr.2023.103709] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Accepted: 10/06/2023] [Indexed: 10/20/2023]
Affiliation(s)
- Thibaut Jacques
- IRIS Imagerie, 144, avenue de Dunkerque, 59000 Lille, France.
- Rita Sleiman
- Centre de recherche et d'innovation de Talan, 14, rue Pergolèse, 75116 Paris, France
- Manuel I Diaz
- Centre de recherche et d'innovation de Talan, 14, rue Pergolèse, 75116 Paris, France
- Julien Dartus
- Département universitaire de chirurgie orthopédique et traumatologique, hôpital Roger-Salengro, CHU de Lille ULR 4490, université de Lille, place de Verdun, 59037 Lille, France; U1008 - Controlled Drug Delivery Systems and Biomaterials, CHU de Lille, University Lille, Inserm, 59000 Lille, France