1. McDarby M, Mroz EL, Hahne J, Malling CD, Carpenter BD, Parker PA. "Hospice Care Could Be a Compassionate Choice": ChatGPT Responses to Questions About Decision Making in Advanced Cancer. J Palliat Med 2024. PMID: 39263979. DOI: 10.1089/jpm.2024.0256.
Abstract
Background: Patients with cancer use the internet to inform medical decision making. Objective: To examine the content of ChatGPT responses to a hypothetical patient question about decision making in advanced cancer. Design: We developed a medical advice-seeking vignette in English about a patient with metastatic melanoma. When inputting this vignette, we varied five characteristics (patient age, race, ethnicity, insurance status, and preexisting recommendation of hospice/the opinion of an adult daughter regarding the recommendation). ChatGPT responses (N = 96) were coded for mentions of: hospice care, palliative care, financial implications of treatment, second opinions, clinical trials, discussing the decision with loved ones, and discussing the decision with care providers. We conducted additional analyses to understand how ChatGPT described hospice and referenced the adult daughter. Data were analyzed using descriptive statistics and chi-square analysis. Results: Responses more frequently mentioned clinical trials for vignettes describing 45-year-old patients compared with 65- and 85-year-old patients. When vignettes mentioned a preexisting recommendation for hospice, responses more frequently mentioned seeking a second opinion and hospice care. ChatGPT's descriptions of hospice focused primarily on its ability to provide comfort and support. When vignettes referenced the daughter's opinion on the hospice recommendation, approximately one third of responses also referenced this, stating the importance of talking to her about treatment preferences and values. Conclusion: ChatGPT responses to questions about advanced cancer decision making can be heterogeneous based on demographic and clinical characteristics. Findings underscore the possible impact of this heterogeneity on treatment decision making in patients with cancer.
Affiliation(s)
- Meghan McDarby: Department of Psychiatry and Behavioral Sciences, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Emily L Mroz: Section of Geriatrics, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA; Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia, USA
- Jessica Hahne: Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri, USA
- Charlotte D Malling: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Brian D Carpenter: Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri, USA
- Patricia A Parker: Department of Psychiatry and Behavioral Sciences, Memorial Sloan Kettering Cancer Center, New York, New York, USA
2. Kayastha A, Lakshmanan K, Valentine MJ, Nguyen A, Dholakia K, Wang D. Lumbar disc herniation with radiculopathy: a comparison of NASS guidelines and ChatGPT. N Am Spine Soc J 2024; 19:100333. PMID: 39040948. PMCID: PMC11261487. DOI: 10.1016/j.xnsj.2024.100333.
Abstract
Background ChatGPT is an advanced language AI able to generate responses to clinical questions regarding lumbar disc herniation with radiculopathy. Artificial intelligence (AI) tools are increasingly being considered to assist clinicians in decision-making. This study compared ChatGPT-3.5 and ChatGPT-4.0 responses to established NASS clinical guidelines and evaluated concordance. Methods ChatGPT-3.5 and ChatGPT-4.0 were prompted with fifteen questions from the 2012 NASS Clinical Guidelines for the diagnosis and treatment of lumbar disc herniation with radiculopathy. Clinical questions, organized into categories, were entered verbatim as queries into ChatGPT. Language output was assessed by two independent authors on September 26, 2023 against four operationally defined parameters: accuracy, over-conclusiveness, supplementary information, and incompleteness. ChatGPT-3.5 and ChatGPT-4.0 performance was compared via chi-square analyses. Results Among the fifteen responses produced by ChatGPT-3.5, 7 (47%) were accurate, 7 (47%) were over-conclusive, 15 (100%) were supplementary, and 6 (40%) were incomplete. For ChatGPT-4.0, 10 (67%) were accurate, 5 (33%) were over-conclusive, 10 (67%) were supplementary, and 6 (40%) were incomplete. There was a statistically significant difference in supplementary information (100% vs. 67%; p=.014) between ChatGPT-3.5 and ChatGPT-4.0. Accuracy (47% vs. 67%; p=.269), over-conclusiveness (47% vs. 33%; p=.456), and incompleteness (40% vs. 40%; p=1.000) did not differ significantly between the two models. Both ChatGPT-3.5 and ChatGPT-4.0 yielded 100% accuracy in the definition and the history and physical examination categories. Diagnostic testing yielded 0% accuracy for ChatGPT-3.5 and 100% accuracy for ChatGPT-4.0. Nonsurgical interventions had 50% accuracy for ChatGPT-3.5 and 63% accuracy for ChatGPT-4.0. Surgical interventions resulted in 0% accuracy for ChatGPT-3.5 and 33% accuracy for ChatGPT-4.0.
Conclusions ChatGPT-4.0 provided less supplementary information and higher overall accuracy across question categories than ChatGPT-3.5. ChatGPT showed reasonable concordance with NASS guidelines, but clinicians should be cautious about using ChatGPT in its current state, as it does not safeguard against misinformation.
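The reported supplementary-information comparison (15/15 responses for ChatGPT-3.5 vs. 10/15 for ChatGPT-4.0) can be reproduced as a 2x2 chi-square test; a minimal sketch, assuming the study used an uncorrected (non-Yates) chi-square, which is what matches the reported p = .014:

```python
from scipy.stats import chi2_contingency

# 2x2 contingency table for the "supplementary information" parameter:
# rows = model (ChatGPT-3.5, ChatGPT-4.0), cols = (supplementary, not supplementary)
observed = [[15, 0],   # ChatGPT-3.5: 15/15 responses supplementary
            [10, 5]]   # ChatGPT-4.0: 10/15 responses supplementary

# correction=False (no Yates continuity correction) reproduces the reported p = .014
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 6.00, p = 0.014
```

The same construction applies to the accuracy, over-conclusiveness, and incompleteness comparisons, none of which reach significance at these sample sizes.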
Affiliation(s)
- Anh Nguyen: Kansas City University, Kansas City, MO, United States
- Daniel Wang: MedStar Health, Baltimore, MD, United States; Georgetown University Medical Center, Washington DC, United States
3. Liang Z, Li J, Tang Y, Zhang Y, Chen C, Li S, Wang X, Xu X, Zhuang Z, He S, Deng B. Predicting the risk category of thymoma with machine learning-based computed tomography radiomics signatures and their between-imaging phase differences. Sci Rep 2024; 14:19215. PMID: 39160177. PMCID: PMC11333573. DOI: 10.1038/s41598-024-69735-3.
Abstract
The aim of this study was to develop a medical imaging and comprehensive stacked learning-based method for predicting high- and low-risk thymoma. A total of 126 patients with thymomas and 5 patients with thymic carcinoma treated at our institution, comprising 65 low-risk patients and 66 high-risk patients, were retrospectively recruited. Among them, 78 patients composed the training cohort, while the remaining 53 patients formed the validation cohort. We extracted 1702 features each from the patients' arterial-, venous-, and plain-phase images. Pairwise subtraction of these features yielded 1702 arterial-venous, arterial-plain, and venous-plain difference features each. The Mann-Whitney U test, least absolute shrinkage and selection operator (LASSO), and SelectKBest methods were employed to select the best features from the training set. Six basic imaging models were built with a stacked ensemble learning algorithm: for each of the six feature sets, three machine learning algorithms (XGBoost, multilayer perceptron (MLP), and random forest) were combined by an XGBoost meta-learner. The XGBoost algorithm was then applied to the six basic imaging models to construct a combined radiomic model. Finally, the radiomic model was combined with clinical information to create a nomogram that could easily be used in clinical practice to predict the thymoma risk category. The areas under the curve (AUCs) of the combined radiomic model in the training and validation cohorts were 0.999 (95% CI 0.988-1.000) and 0.967 (95% CI 0.916-1.000), respectively, while those of the nomogram were 0.999 (95% CI 0.996-1.000) and 0.983 (95% CI 0.990-1.000). This study describes the application of CT-based radiomics in thymoma patients and proposes a nomogram for predicting the risk category for this disease, which could be advantageous for clinical decision-making for affected patients.
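The stacking scheme described (three base learners combined by a boosting meta-learner) can be sketched with scikit-learn. This is a minimal illustration, not the authors' pipeline: synthetic data stands in for the 1702 CT radiomics features, and `GradientBoostingClassifier` stands in for XGBoost, which is not assumed to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for one phase's feature set (the study had 78 training
# and 53 validation patients, roughly balanced low-/high-risk).
X, y = make_classification(n_samples=131, n_features=50, n_informative=10,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=53,
                                                  random_state=0)

# Base learners combined by a boosting meta-learner, mirroring the described
# XGBoost + MLP + random forest stack.
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("mlp", MLPClassifier(max_iter=2000, random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
stack.fit(X_train, y_train)
print(f"validation accuracy: {stack.score(X_val, y_val):.2f}")
```

The `cv=5` argument matters for the technique: the meta-learner is trained on out-of-fold predictions of the base models, which is what prevents the stack from simply memorizing the base learners' training-set fit.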
Affiliation(s)
- Zhu Liang: Department of Cardiothoracic Surgery, Affiliated Hospital of Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
- Jiamin Li: Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
- Yihan Tang: Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
- Yaxuan Zhang: Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
- Chunyuan Chen: Department of Cardiothoracic Surgery, Affiliated Hospital of Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
- Siyuan Li: Sun Yat-Sen University, Yuexiu District, Guangzhou, Guangdong, China
- Xuefeng Wang: Department of Cardiothoracic Surgery, Affiliated Hospital of Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
- Xinyan Xu: Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
- Ziye Zhuang: Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
- Shuyan He: Guangzhou Medical University, Panyu District, Guangzhou, Guangdong, China; Department of Radiology, Guangdong Women and Children Hospital, Guangzhou, China
- Biao Deng: Department of Cardiothoracic Surgery, Affiliated Hospital of Guangdong Medical University, Xiashan District, Zhanjiang, Guangdong, China
4. Young CC, Enichen E, Rao A, Hilker S, Butler A, Laird-Gion J, Succi MD. Pilot Study of Large Language Models as an Age-Appropriate Explanatory Tool for Chronic Pediatric Conditions. medRxiv [preprint] 2024:2024.08.06.24311544. PMID: 39148860. PMCID: PMC11326333. DOI: 10.1101/2024.08.06.24311544.
Abstract
A gap exists in patient education resources for children with chronic conditions. This pilot study assesses the capacity of large language models (LLMs) to deliver developmentally appropriate explanations of chronic conditions to pediatric patients. Two commonly used LLMs generated responses that accurately, appropriately, and effectively communicated complex medical information, making them a potentially valuable tool for enhancing patient understanding and engagement in clinical settings.
Affiliation(s)
- Cameron C. Young: Harvard Medical School, Boston, MA; Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA
- Elizabeth Enichen: Harvard Medical School, Boston, MA; Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA
- Arya Rao: Harvard Medical School, Boston, MA; Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA
- Sidney Hilker: Harvard Medical School, Boston, MA; Boston Children’s Hospital, Boston, MA
- Alex Butler: Harvard Medical School, Boston, MA; Boston Children’s Hospital, Boston, MA
- Jessica Laird-Gion: Harvard Medical School, Boston, MA; Boston Children’s Hospital, Boston, MA
- Marc D. Succi: Harvard Medical School, Boston, MA; Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA; Department of Radiology, Massachusetts General Hospital, Boston, MA
5. Haltaufderheide J, Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ Digit Med 2024; 7:183. PMID: 38977771. PMCID: PMC11231310. DOI: 10.1038/s41746-024-01157-x.
Abstract
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite potential benefits, researchers have underscored various ethical implications. While individual instances have garnered attention, a systematic and comprehensive overview of practical applications currently researched and ethical issues connected to them is lacking. Against this background, this work maps the ethical landscape surrounding the current deployment of LLMs in medicine and healthcare through a systematic review. Electronic databases and preprint servers were queried using a comprehensive search strategy which generated 796 records. Studies were screened and extracted following a modified rapid review approach. Methodological quality was assessed using a hybrid approach. For 53 records, a meta-aggregative synthesis was performed. Four general fields of applications emerged showcasing a dynamic exploration phase. Advantages of using LLMs are attributed to their capacity in data analysis, information provisioning, support in decision-making or mitigating information loss and enhancing information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful or convincing but inaccurate content. Calls for ethical guidance and human oversight are recurrent. We suggest that the ethical guidance debate should be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications. This involves considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in healthcare. Additionally, critical inquiry is needed to evaluate the necessity and justification of LLMs' current experimental use.
Affiliation(s)
- Joschka Haltaufderheide: Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
- Robert Ranisch: Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
6. Hoppe JM, Auer MK, Strüven A, Massberg S, Stremmel C. ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis. J Med Internet Res 2024; 26:e56110. PMID: 38976865. PMCID: PMC11263899. DOI: 10.2196/56110.
Abstract
BACKGROUND OpenAI's ChatGPT is a pioneering artificial intelligence (AI) in the field of natural language processing, and it holds significant potential in medicine for providing treatment advice. Additionally, recent studies have demonstrated promising results using ChatGPT for emergency medicine triage. However, its diagnostic accuracy in the emergency department (ED) has not yet been evaluated. OBJECTIVE This study compares the diagnostic accuracy of ChatGPT (GPT-3.5 and GPT-4) with that of the primary treating resident physicians in an ED setting. METHODS Among 100 adults admitted to our ED in January 2023 with internal medicine issues, diagnostic accuracy was assessed by comparing the diagnoses made by ED resident physicians and those made by ChatGPT with GPT-3.5 or GPT-4 against the final hospital discharge diagnosis, using a point system for grading accuracy. RESULTS The study enrolled 100 patients with a median age of 72 (IQR 58.5-82.0) years who were admitted to our internal medicine ED primarily for cardiovascular, endocrine, gastrointestinal, or infectious diseases. GPT-4 outperformed both GPT-3.5 (P<.001) and ED resident physicians (P=.01) in diagnostic accuracy for internal medicine emergencies. Furthermore, across various disease subgroups, GPT-4 consistently outperformed GPT-3.5 and resident physicians. It demonstrated significant superiority in cardiovascular (GPT-4 vs ED physicians: P=.03) and endocrine or gastrointestinal diseases (GPT-4 vs GPT-3.5: P=.01). However, in other categories, the differences were not statistically significant. CONCLUSIONS In this study, which compared the diagnostic accuracy of GPT-3.5, GPT-4, and ED resident physicians against a discharge diagnosis gold standard, GPT-4 outperformed both the resident physicians and its predecessor, GPT-3.5. Despite the retrospective design of the study and its limited sample size, the results underscore the potential of AI as a supportive diagnostic tool in ED settings.
Affiliation(s)
- Matthias K Auer: Department of Medicine IV, LMU University Hospital, Munich, Germany
- Anna Strüven: Department of Medicine I, LMU University Hospital, Munich, Germany; Munich Heart Alliance Partner Site, Deutsches Zentrum für Herz-Kreislaufforschung (German Centre for Cardiovascular Research), LMU University Hospital, Munich, Germany
- Steffen Massberg: Department of Medicine I, LMU University Hospital, Munich, Germany; Munich Heart Alliance Partner Site, Deutsches Zentrum für Herz-Kreislaufforschung (German Centre for Cardiovascular Research), LMU University Hospital, Munich, Germany
- Christopher Stremmel: Department of Medicine I, LMU University Hospital, Munich, Germany; Munich Heart Alliance Partner Site, Deutsches Zentrum für Herz-Kreislaufforschung (German Centre for Cardiovascular Research), LMU University Hospital, Munich, Germany
7. Mohammadi SS, Nguyen QD. A User-friendly Approach for the Diagnosis of Diabetic Retinopathy Using ChatGPT and Automated Machine Learning. Ophthalmol Sci 2024; 4:100495. PMID: 38690313. PMCID: PMC11059323. DOI: 10.1016/j.xops.2024.100495.
Abstract
Purpose To assess the capabilities of Chat Generative Pre-trained Transformer (ChatGPT) and Vertex AI in executing code-free preprocessing, training machine learning (ML) models, and analyzing data. Design Evaluation of diagnostic test or technology. Participants ChatGPT and Vertex AI, as a publicly available large language model and an ML platform, respectively. Methods ChatGPT was employed to improve the resolution of fundus photography images from the Methods to Evaluate Segmentation and Indexing Techniques in the field of Retinal Ophthalmology (Messidor-2) open-source dataset using the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique in Fiji software. Subsequently, Vertex AI, an automated ML (AutoML) platform, was utilized to develop two classification models. The first model served as a binary classifier for detecting the presence of diabetic retinopathy (DR), while the second determined its severity. Finally, ChatGPT was used to provide scripts for the R and Python programming languages for data analysis and was also directly employed to analyze the data in a code-free manner. Main Outcome Measures Evaluating the utility of ChatGPT in generating scripts for preprocessing images using Fiji and analyzing data in Python and R, and assessing its potential for code-free data analysis. Investigating the capability of Vertex AI to train image classification models for detection of DR and its severity. Results Two ML models were trained using 1740 images from the Messidor-2 database. The first model, designed to detect the severity of DR, achieved an area under the precision-recall curve (AUPRC) of 0.81, with a precision of 81.81% and recall of 72.83%. The second model, tailored for detection of the presence of DR, recorded a precision and recall of 84.48% with an AUPRC of 0.90.
Conclusions ChatGPT and Vertex AI have the potential to enable physicians without coding expertise to preprocess images, analyze data, and train ML models.
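The precision and recall figures reported above follow directly from the classifier's confusion counts. A minimal sketch, with hypothetical true-positive/false-positive/false-negative counts chosen only to illustrate how a value like 84.48% for both metrics can arise:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical confusion counts for a binary DR-presence classifier.
# With fp == fn, precision and recall coincide, as in the reported 84.48%.
precision, recall = precision_recall(tp=49, fp=9, fn=9)
print(f"precision = {precision:.2%}, recall = {recall:.2%}")  # 84.48% each
```

The AUPRC summarizes this trade-off across all classification thresholds rather than at the single operating point shown here.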
Affiliation(s)
- S. Saeed Mohammadi: Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, California
- Quan Dong Nguyen: Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, California
8. Mitchell J, Bennett TD. Navigating Complexity: Enhancing Pediatric Diagnostics With Large Language Models. Pediatr Crit Care Med 2024; 25:577-580. PMID: 38836714. PMCID: PMC11160974. DOI: 10.1097/pcc.0000000000003483.
Affiliation(s)
- James Mitchell: Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO
- Tellen D Bennett: Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO; Department of Pediatrics (Critical Care Medicine), University of Colorado School of Medicine, Aurora, CO
9. Preiksaitis C, Ashenburg N, Bunney G, Chu A, Kabeer R, Riley F, Ribeira R, Rose C. The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review. JMIR Med Inform 2024; 12:e53787. PMID: 38728687. PMCID: PMC11127144. DOI: 10.2196/53787.
Abstract
BACKGROUND Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM. OBJECTIVE Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs' potential applications within EM and identify directions for future research. Addressing this gap will allow for informed advancements in the field. METHODS Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs' use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data. RESULTS A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. 
We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs' outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs' capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills. CONCLUSIONS LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes. This review sets the stage for future advancements by identifying key research areas: prospective validation of LLM applications, establishing standards for responsible use, understanding provider and patient perceptions, and improving physicians' AI literacy. Effective integration of LLMs into EM will require collaborative efforts and thorough evaluation to ensure these technologies can be safely and effectively applied.
Affiliation(s)
- Carl Preiksaitis, Nicholas Ashenburg, Gabrielle Bunney, Andrew Chu, Rana Kabeer, Fran Riley, Ryan Ribeira, Christian Rose: Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
10. Makhoul M, Melkane AE, Khoury PE, Hadi CE, Matar N. A cross-sectional comparative study: ChatGPT 3.5 versus diverse levels of medical experts in the diagnosis of ENT diseases. Eur Arch Otorhinolaryngol 2024; 281:2717-2721. PMID: 38365990. DOI: 10.1007/s00405-024-08509-z.
Abstract
PURPOSE With recent advances in artificial intelligence (AI), it has become crucial to thoroughly evaluate its applicability in healthcare. This study aimed to assess the accuracy of ChatGPT in diagnosing ear, nose, and throat (ENT) pathology and to compare its performance to that of medical experts. METHODS We conducted a cross-sectional comparative study in which 32 ENT cases were presented to ChatGPT 3.5, ENT physicians, ENT residents, family medicine (FM) specialists, second-year medical students (Med2), and third-year medical students (Med3). Each participant provided three differential diagnoses. The study analyzed diagnostic accuracy rates and inter-rater agreement within and between participant groups and ChatGPT. RESULTS The accuracy rate of ChatGPT was 70.8%, not significantly different from that of ENT physicians or ENT residents. However, correctness rates differed significantly between ChatGPT and FM specialists (49.8%, p < 0.001) and between ChatGPT and medical students (Med2 47.5%, p < 0.001; Med3 47%, p < 0.001). Inter-rater agreement on the differential diagnosis between ChatGPT and each participant group was either poor or fair. In 68.75% of cases, ChatGPT failed to mention the most critical diagnosis. CONCLUSIONS ChatGPT demonstrated accuracy comparable to that of ENT physicians and ENT residents in diagnosing ENT pathology, outperforming FM specialists, Med2, and Med3. However, it showed limitations in identifying the most critical diagnosis.
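Inter-rater agreement of the kind graded "poor or fair" above is commonly quantified with Cohen's kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch with hypothetical diagnosis labels (not the study's data), comparing one top diagnosis per case between two raters:

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each label's marginal frequencies.
    chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    return (observed - chance) / (1 - chance)

# Hypothetical top diagnoses for 8 cases from ChatGPT and an ENT physician
gpt = ["otitis", "rhinitis", "polyp", "otitis", "laryngitis", "polyp", "otitis", "rhinitis"]
ent = ["otitis", "rhinitis", "otitis", "otitis", "laryngitis", "polyp", "polyp", "rhinitis"]
print(round(cohens_kappa(gpt, ent), 2))  # 0.65
```

By common benchmarks (e.g., Landis and Koch), kappa below 0.20 is "poor/slight" and 0.21-0.40 "fair", the range the study reports between ChatGPT and the participant groups.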
Affiliation(s)
- Mikhael Makhoul, Antoine E Melkane, Patrick El Khoury, Christopher El Hadi, Nayla Matar: Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
11. Ha LT, Kelley KD. Artificial Intelligence: Promise or Pitfalls? A Clinical Vignette of Real-Life ChatGPT Implementation in Perioperative Medicine. J Gen Intern Med 2024; 39:1063-1067. PMID: 38252252. DOI: 10.1007/s11606-024-08611-2.
Affiliation(s)
- Leslie Thienly Ha: Department of Internal Medicine, University of California, Davis, Davis, USA; Sacramento, USA
- Kristen D Kelley: Department of Internal Medicine, University of California, Davis, Davis, USA
12. Rao A, Kim J, Lie W, Pang M, Fuh L, Dreyer KJ, Succi MD. Proactive Polypharmacy Management Using Large Language Models: Opportunities to Enhance Geriatric Care. J Med Syst 2024; 48:41. PMID: 38632172. DOI: 10.1007/s10916-024-02058-y.
Abstract
Polypharmacy remains an important challenge for patients with extensive medical complexity. Given the primary care shortage and the aging population, effective polypharmacy management is crucial to address the increasing burden of care. The capacity of large language model (LLM)-based artificial intelligence to aid in polypharmacy management has yet to be evaluated. Here, we evaluate ChatGPT's performance in polypharmacy management via its deprescribing decisions in standardized clinical vignettes. We inputted several clinical vignettes, originally from a study of general practitioners' deprescribing decisions, into ChatGPT 3.5, a publicly available LLM, and evaluated its capacity for yes/no binary deprescribing decisions as well as list-based prompts in which the model was asked to choose which of several medications to deprescribe. We recorded ChatGPT responses to yes/no binary deprescribing prompts and the number and types of medications deprescribed. In yes/no binary deprescribing decisions, ChatGPT universally recommended deprescribing medications regardless of activities of daily living (ADL) status in patients with no underlying cardiovascular disease (CVD) history; in patients with CVD history, ChatGPT's answers varied by technical replicate. The total number of medications deprescribed ranged from 2.67 to 3.67 (out of 7) and did not vary with CVD status but increased linearly with severity of ADL impairment. Among medication types, ChatGPT preferentially deprescribed pain medications. ChatGPT's deprescribing decisions vary along the axes of ADL status, CVD history, and medication type, indicating some concordance of internal logic between general practitioners and the model. These results indicate that specifically trained LLMs may provide useful clinical support in polypharmacy management for primary care physicians.
Collapse
Affiliation(s)
- Arya Rao
- Harvard Medical School, Boston, MA, USA
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, USA
- Massachusetts General Hospital, Department of Radiology, 55 Fruit Street, Boston, MA, 02114, USA
| | - John Kim
- Harvard Medical School, Boston, MA, USA
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, USA
- Massachusetts General Hospital, Department of Radiology, 55 Fruit Street, Boston, MA, 02114, USA
| | - Winston Lie
- Harvard Medical School, Boston, MA, USA
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, USA
- Massachusetts General Hospital, Department of Radiology, 55 Fruit Street, Boston, MA, 02114, USA
| | - Michael Pang
- Harvard Medical School, Boston, MA, USA
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, USA
- Massachusetts General Hospital, Department of Radiology, 55 Fruit Street, Boston, MA, 02114, USA
| | - Lanting Fuh
- Massachusetts General Hospital, Department of Radiology, 55 Fruit Street, Boston, MA, 02114, USA
| | - Keith J Dreyer
- Harvard Medical School, Boston, MA, USA
- Data Science Office, Mass General Brigham, Boston, MA, USA
| | - Marc D Succi
- Harvard Medical School, Boston, MA, USA.
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, USA.
- Massachusetts General Hospital, Department of Radiology, 55 Fruit Street, Boston, MA, 02114, USA.
| |
Collapse
|
13
|
Sievert M, Aubreville M, Mueller SK, Eckstein M, Breininger K, Iro H, Goncalves M. Diagnosis of malignancy in oropharyngeal confocal laser endomicroscopy using GPT 4.0 with vision. Eur Arch Otorhinolaryngol 2024; 281:2115-2122. [PMID: 38329525 DOI: 10.1007/s00405-024-08476-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Accepted: 01/11/2024] [Indexed: 02/09/2024]
Abstract
PURPOSE Confocal laser endomicroscopy (CLE) is an imaging tool that has demonstrated potential for intraoperative, real-time, non-invasive, microscopic assessment of surgical margins of oropharyngeal squamous cell carcinoma (OPSCC). However, interpreting CLE images remains challenging. This study investigates the application of OpenAI's Generative Pretrained Transformer (GPT) 4.0 with Vision capabilities for automated classification of CLE images in OPSCC. METHODS CLE images of histologically confirmed SCC or healthy mucosa were retrieved and anonymized from a database of 12,809 CLE images from 5 patients with OPSCC. Using a training set of 16 images, a validation set of 139 images, comprising SCC (83 images, 59.7%) and healthy normal mucosa (56 images, 40.3%), was classified via the application programming interface (API) of GPT-4.0. The same set of images was also classified by CLE experts (two surgeons and one pathologist), who were blinded to the histology. Diagnostic metrics, the reliability of GPT, and inter-rater reliability were assessed. RESULTS Overall accuracy of the GPT model was 71.2%; the intra-rater agreement was κ = 0.837, indicating almost perfect agreement across the three runs of GPT-generated results. Human experts achieved an accuracy of 88.5% with a substantial level of agreement (κ = 0.773). CONCLUSIONS Though limited to a specific clinical framework, patient cohort, and image set, this study sheds light on some previously unexplored diagnostic capabilities of large language models using few-shot prompting. It suggests the model's ability to extrapolate information and classify CLE images with minimal example data. Whether future versions of the model can achieve clinically relevant diagnostic accuracy, especially on uncurated data sets, remains to be investigated.
Collapse
Affiliation(s)
- Matti Sievert
- Department of Otorhinolaryngology, Head and Neck Surgery, Friedrich Alexander University of Erlangen-Nuremberg, Erlangen University Hospital, Erlangen, Germany
| | | | - Sarina Katrin Mueller
- Department of Otorhinolaryngology, Head and Neck Surgery, Friedrich Alexander University of Erlangen-Nuremberg, Erlangen University Hospital, Erlangen, Germany
| | - Markus Eckstein
- Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, University Hospital, Erlangen, Germany
| | - Katharina Breininger
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Heinrich Iro
- Department of Otorhinolaryngology, Head and Neck Surgery, Friedrich Alexander University of Erlangen-Nuremberg, Erlangen University Hospital, Erlangen, Germany
| | - Miguel Goncalves
- Department of Otorhinolaryngology, Plastic and Aesthetic Operations, University Hospital Würzburg, Joseph-Schneider-Straße 11, 97080, Würzburg, Germany.
| |
Collapse
|
14
|
Ahmed W, Saturno M, Rajjoub R, Duey AH, Zaidat B, Hoang T, Restrepo Mejia M, Gallate ZS, Shrestha N, Tang J, Zapolsky I, Kim JS, Cho SK. ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis. Eur Spine J 2024:10.1007/s00586-024-08198-6. [PMID: 38489044 DOI: 10.1007/s00586-024-08198-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Revised: 02/01/2024] [Accepted: 02/17/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND CONTEXT Clinical guidelines, developed in concordance with the literature, are often used to guide surgeons' clinical decision making. Recent advancements in large language models and artificial intelligence (AI) in the medical field come with exciting potential. OpenAI's generative AI model, known as ChatGPT, can quickly synthesize information and generate responses grounded in medical literature, which may prove to be a useful tool in clinical decision making for spine care. The current literature has yet to investigate the ability of ChatGPT to assist clinical decision making with regard to degenerative spondylolisthesis. PURPOSE The study aimed to compare ChatGPT's concordance with the recommendations set forth by the North American Spine Society (NASS) Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis and to assess ChatGPT's accuracy within the context of the most recent literature. METHODS ChatGPT-3.5 and ChatGPT-4.0 were prompted with questions from the NASS Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis, and their recommendations were graded as "concordant" or "nonconcordant" relative to those put forth by NASS. A response was considered "concordant" when ChatGPT generated a recommendation that accurately reproduced all major points made in the NASS recommendation. Responses graded "nonconcordant" were further stratified into two subcategories, "insufficient" or "over-conclusive," to provide further insight into grading rationale. Responses between GPT-3.5 and GPT-4.0 were compared using chi-squared tests. RESULTS ChatGPT-3.5 answered 13 of NASS's 28 total clinical questions in concordance with NASS's guidelines (46.4%). 
Categorical breakdown is as follows: Definitions and Natural History (1/1, 100%), Diagnosis and Imaging (1/4, 25%), Outcome Measures for Medical Intervention and Surgical Treatment (0/1, 0%), Medical and Interventional Treatment (4/6, 66.7%), Surgical Treatment (7/14, 50%), and Value of Spine Care (0/2, 0%). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-3.5 generated a concordant response 66.7% of the time (6/9). However, ChatGPT-3.5's concordance dropped to 36.8% for clinical questions on which NASS did not provide a clear recommendation (7/19). A further breakdown of ChatGPT-3.5's nonconcordance with the guidelines revealed that the vast majority of its inaccurate recommendations were "over-conclusive" (12/15, 80%) rather than "insufficient" (3/15, 20%). ChatGPT-4.0 answered 19 (67.9%) of the 28 total questions in concordance with NASS guidelines (P = 0.177). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-4.0 generated a concordant response 66.7% of the time (6/9). ChatGPT-4.0's concordance held up at 68.4% for clinical questions on which NASS did not provide a clear recommendation (13/19, P = 0.104). CONCLUSIONS This study sheds light on the duality of LLM applications within clinical settings: accuracy and utility in some contexts versus inaccuracy and risk in others. ChatGPT was concordant for most clinical questions for which NASS offered recommendations. However, for questions without NASS best practices, ChatGPT generated answers that were either too general or inconsistent with the literature, and even fabricated data and citations. Thus, clinicians should exercise extreme caution when consulting ChatGPT for clinical recommendations, taking care to ensure its reliability within the context of recent literature.
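The reported GPT-3.5 versus GPT-4.0 comparison (13/28 vs. 19/28 concordant, P = 0.177) is consistent with a chi-squared test with Yates continuity correction on the 2x2 concordance table. A minimal standard-library sketch, assuming that test variant (the abstract does not specify which one was used):

```python
import math

def yates_chi2_p(a, b, c, d):
    """Chi-squared test with Yates continuity correction for the
    2x2 table [[a, b], [c, d]]; returns the two-sided p-value (1 df)."""
    n = a + b + c + d
    num = n * (abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Rows: GPT-3.5 (13 concordant, 15 nonconcordant), GPT-4.0 (19, 9)
p = yates_chi2_p(13, 15, 19, 9)
print(round(p, 3))  # ~0.177, matching the reported P value
```

Without the continuity correction the same table gives p of roughly 0.105, so the correction appears to account for the published value.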
Collapse
Affiliation(s)
- Wasil Ahmed
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | | | - Rami Rajjoub
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Akiro H Duey
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bashar Zaidat
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Timothy Hoang
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | | | | | - Nancy Shrestha
- Chicago Medical School at Rosalind Franklin University, North Chicago, IL, USA
| | - Justin Tang
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ivan Zapolsky
- Department of Orthopedics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, 10029, USA
| | - Jun S Kim
- Department of Orthopedics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, 10029, USA
| | - Samuel K Cho
- Department of Orthopedics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, 10029, USA.
| |
Collapse
|
15
|
Park YJ, Pillai A, Deng J, Guo E, Gupta M, Paget M, Naugler C. Assessing the research landscape and clinical utility of large language models: a scoping review. BMC Med Inform Decis Mak 2024; 24:72. [PMID: 38475802 DOI: 10.1186/s12911-024-02459-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2023] [Accepted: 02/12/2024] [Indexed: 03/14/2024] Open
Abstract
IMPORTANCE Large language models (LLMs) like OpenAI's ChatGPT are powerful generative systems that rapidly synthesize natural language responses. Research on LLMs has revealed their potential and pitfalls, especially in clinical settings. However, the evolving landscape of LLM research in medicine has left several gaps regarding their evaluation, application, and evidence base. OBJECTIVE This scoping review aims to (1) summarize current research evidence on the accuracy and efficacy of LLMs in medical applications, (2) discuss the ethical, legal, logistical, and socioeconomic implications of LLM use in clinical settings, (3) explore barriers and facilitators to LLM implementation in healthcare, (4) propose a standardized evaluation framework for assessing LLMs' clinical utility, and (5) identify evidence gaps and propose future research directions for LLMs in clinical applications. EVIDENCE REVIEW We screened 4,036 records from MEDLINE, EMBASE, CINAHL, medRxiv, bioRxiv, and arXiv from January 2023 (inception of the search) to June 26, 2023 for English-language papers and analyzed findings from 55 worldwide studies. Quality of evidence was reported based on the Oxford Centre for Evidence-based Medicine recommendations. FINDINGS Our results demonstrate that LLMs show promise in compiling patient notes, assisting patients in navigating the healthcare system, and to some extent, supporting clinical decision-making when combined with human oversight. However, their utilization is limited by biases in training data that may harm patients, the generation of inaccurate but convincing information, and ethical, legal, socioeconomic, and privacy concerns. We also identified a lack of standardized methods for evaluating LLMs' effectiveness and feasibility. CONCLUSIONS AND RELEVANCE This review thus highlights potential future directions and questions to address these limitations and to further explore LLMs' potential in enhancing healthcare delivery.
Collapse
Affiliation(s)
- Ye-Jean Park
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Cir, M5S 1A8, Toronto, ON, Canada.
| | - Abhinav Pillai
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
| | - Jiawen Deng
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Cir, M5S 1A8, Toronto, ON, Canada
| | - Eddie Guo
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
| | - Mehul Gupta
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
| | - Mike Paget
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
| | - Christopher Naugler
- Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, T2N 4N1, Calgary, AB, Canada
| |
Collapse
|
16
|
Tunçer G, Güçlü KG. How Reliable is ChatGPT as a Novel Consultant in Infectious Diseases and Clinical Microbiology? Infect Dis Clin Microbiol 2024; 6:55-59. [PMID: 38633442 PMCID: PMC11020004 DOI: 10.36519/idcm.2024.286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Accepted: 12/14/2023] [Indexed: 04/19/2024]
Abstract
Objective The study aimed to investigate the reliability of ChatGPT's answers to medical questions, including those sourced from patients and guideline recommendations. The focus was on evaluating ChatGPT's accuracy in responding to various types of infectious disease questions. Materials and Methods The study was conducted using 200 questions sourced from social media, experts, and guidelines related to various infectious diseases, including urinary tract infection, pneumonia, HIV, various types of hepatitis, COVID-19, skin infections, and tuberculosis. The questions were screened for clarity and consistency, and repetitive or unclear ones were excluded. Reference answers were based on guidelines from reputable sources such as the Infectious Diseases Society of America (IDSA), Centers for Disease Control and Prevention (CDC), European Association for the Study of the Liver (EASL), and Joint United Nations Programme on HIV/AIDS (UNAIDS) AIDSinfo. According to the scoring system, completely correct answers were given 1 point, and completely incorrect ones were given 4 points. To assess reproducibility, each question was posed twice on separate computers. Repeatability was determined by the consistency of the answers' scores. Results ChatGPT was asked 200 questions: 107 from social media platforms and 93 from guidelines. The questions covered a range of topics: urinary tract infections (n=18), pneumonia (n=22), HIV (n=39), hepatitis B and C (n=53), COVID-19 (n=11), skin and soft tissue infections (n=38), and tuberculosis (n=19). The lowest accuracy, 72%, was for urinary tract infection questions. ChatGPT answered 92% of social media platform questions correctly (scored 1 point) versus 69% of guideline questions (p=0.001; OR=5.48, 95% CI=2.29-13.11). Conclusion Artificial intelligence is widely used in the medical field by both healthcare professionals and patients. Although ChatGPT answers questions from social media platforms quite accurately, we recommend that healthcare professionals exercise caution when using it.
Collapse
Affiliation(s)
- Gülşah Tunçer
- Bilecik Training and Research Hospital, Bilecik, Türkiye
| | | |
Collapse
|
17
|
Zhou Y, Moon C, Szatkowski J, Moore D, Stevens J. Evaluating ChatGPT responses in the context of a 53-year-old male with a femoral neck fracture: a qualitative analysis. Eur J Orthop Surg Traumatol 2024; 34:927-955. [PMID: 37776392 PMCID: PMC10858115 DOI: 10.1007/s00590-023-03742-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Accepted: 09/18/2023] [Indexed: 10/02/2023]
Abstract
PURPOSE The integration of artificial intelligence (AI) tools, such as ChatGPT, in clinical medicine and medical education has gained significant attention due to their potential to support decision-making and improve patient care. However, there is a need to evaluate the benefits and limitations of these tools in specific clinical scenarios. METHODS This study used a case study approach within the field of orthopaedic surgery. A clinical case report featuring a 53-year-old male with a femoral neck fracture was used as the basis for evaluation. ChatGPT, a large language model, was asked to respond to clinical questions related to the case. The responses generated by ChatGPT were evaluated qualitatively, considering their relevance, justification, and alignment with the responses of real clinicians. Alternative dialogue protocols were also employed to assess the impact of additional prompts and contextual information on ChatGPT responses. RESULTS ChatGPT generally provided clinically appropriate responses to the questions posed in the clinical case report. However, the level of justification and explanation varied across the generated responses. Occasionally, clinically inappropriate responses and inconsistencies were observed across different dialogue protocols and on separate days. CONCLUSIONS The findings of this study highlight both the potential and limitations of using ChatGPT in clinical practice. While ChatGPT demonstrated the ability to provide relevant clinical information, the lack of consistent justification and occasional clinically inappropriate responses raise concerns about its reliability. These results underscore the importance of careful consideration and validation when using AI tools in healthcare. Further research and clinician training are necessary to effectively integrate AI tools like ChatGPT, ensuring their safe and reliable use in clinical decision-making.
Collapse
Affiliation(s)
- Yushy Zhou
- Department of Surgery, The University of Melbourne, St. Vincent's Hospital Melbourne, 29 Regent Street, Clinical Sciences Block Level 2, Melbourne, VIC, 3010, Australia.
- Department of Orthopaedic Surgery, St. Vincent's Hospital, Melbourne, Australia.
| | - Charles Moon
- Department of Orthopaedic Surgery, Cedars-Sinai Medical Centre, Los Angeles, CA, USA
| | - Jan Szatkowski
- Department of Orthopaedic Surgery, Indiana University Health Methodist Hospital, Indianapolis, IN, USA
| | - Derek Moore
- Santa Barbara Orthopedic Associates, Santa Barbara, CA, USA
| | - Jarrad Stevens
- Department of Orthopaedic Surgery, St. Vincent's Hospital, Melbourne, Australia
| |
Collapse
|
18
|
Tenner ZM, Cottone MC, Chavez MR. Harnessing the open access version of ChatGPT for enhanced clinical opinions. PLOS Digit Health 2024; 3:e0000355. [PMID: 38315648 PMCID: PMC10843476 DOI: 10.1371/journal.pdig.0000355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/21/2023] [Accepted: 01/11/2024] [Indexed: 02/07/2024]
Abstract
With the advent of Large Language Models (LLMs) like ChatGPT, the integration of Generative Artificial Intelligence (GAI) into clinical medicine is becoming increasingly feasible. This study aimed to evaluate the ability of the freely available ChatGPT-3.5 to generate complex differential diagnoses, comparing its output to case records of the Massachusetts General Hospital published in the New England Journal of Medicine (NEJM). Forty case records were presented to ChatGPT-3.5, prompting it to provide a differential diagnosis and then narrow it down to the most likely diagnosis. The results indicated that the final diagnosis was included in ChatGPT-3.5's original differential list in 42.5% of the cases. After narrowing, ChatGPT correctly determined the final diagnosis in 27.5% of the cases, demonstrating a decrease in accuracy compared to previous studies using common chief complaints. These findings emphasize the necessity for further investigation into the capabilities and limitations of LLMs in clinical scenarios while highlighting the potential role of GAI as an augmented clinical opinion. As GAI tools like ChatGPT grow and improve, physicians and other healthcare workers will likely find increasing support in generating differential diagnoses. However, continued exploration and regulation are essential to ensure the safe and effective integration of GAI into healthcare practice. Future studies may seek to compare newer versions of ChatGPT or investigate patient outcomes when physicians integrate this GAI technology. Understanding and expanding GAI's capabilities, particularly in differential diagnosis, may foster innovation and provide additional resources, especially in underserved areas of medicine.
Collapse
Affiliation(s)
- Zachary M. Tenner
- New York University Grossman Long Island School of Medicine, Mineola, New York, United States of America
| | - Michael C. Cottone
- New York University Grossman Long Island School of Medicine, Mineola, New York, United States of America
| | - Martin R. Chavez
- New York University Grossman Long Island School of Medicine, Mineola, New York, United States of America
- Department of Obstetrics and Gynecology, New York University Langone Health–Long Island, Mineola, New York, United States of America
| |
Collapse
|
19
|
Omiye JA, Gui H, Rezaei SJ, Zou J, Daneshjou R. Large Language Models in Medicine: The Potentials and Pitfalls : A Narrative Review. Ann Intern Med 2024; 177:210-220. [PMID: 38285984 DOI: 10.7326/m23-2772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/31/2024] Open
Abstract
Large language models (LLMs) are artificial intelligence models trained on vast text data to generate humanlike outputs. They have been applied to various tasks in health care, ranging from answering medical examination questions to generating clinical reports. With increasing institutional partnerships between companies producing LLMs and health systems, the real-world clinical application of these models is nearing realization. As these models gain traction, health care practitioners must understand what LLMs are, their development, their current and potential applications, and the associated pitfalls in a medical setting. This review, coupled with a tutorial, provides a comprehensive yet accessible overview of these areas with the aim of familiarizing health care professionals with the rapidly changing landscape of LLMs in medicine. Furthermore, the authors highlight active research areas in the field that promise to improve LLMs' usability in health care contexts.
Collapse
Affiliation(s)
- Jesutofunmi A Omiye
- Department of Dermatology and Department of Biomedical Data Science, Stanford University, Stanford, California (J.A.O., R.D.)
| | - Haiwen Gui
- Department of Dermatology, Stanford University, Stanford, California (H.G., S.J.R.)
| | - Shawheen J Rezaei
- Department of Dermatology, Stanford University, Stanford, California (H.G., S.J.R.)
| | - James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, California (J.Z.)
| | - Roxana Daneshjou
- Department of Dermatology and Department of Biomedical Data Science, Stanford University, Stanford, California (J.A.O., R.D.)
| |
Collapse
|
20
|
Padovan M, Cosci B, Petillo A, Nerli G, Porciatti F, Scarinci S, Carlucci F, Dell’Amico L, Meliani N, Necciari G, Lucisano VC, Marino R, Foddis R, Palla A. ChatGPT in Occupational Medicine: A Comparative Study with Human Experts. Bioengineering (Basel) 2024; 11:57. [PMID: 38247934 PMCID: PMC10813435 DOI: 10.3390/bioengineering11010057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2023] [Revised: 01/01/2024] [Accepted: 01/04/2024] [Indexed: 01/23/2024] Open
Abstract
The objective of this study is to evaluate ChatGPT's accuracy and reliability in answering complex medical questions related to occupational health and to explore the implications and limitations of AI in occupational health medicine. The study also provides recommendations for future research in this area and informs decision-makers about AI's impact on healthcare. A group of physicians was enlisted to create a dataset of questions and answers on Italian occupational medicine legislation. The physicians were divided into two teams, and each team member was assigned a different subject area. ChatGPT was used to generate answers for each question, with and without legislative context. The two teams then evaluated the human- and AI-generated answers in a blinded fashion, with each group reviewing the other group's work. Occupational physicians outperformed ChatGPT in generating accurate answers, as rated on a 5-point Likert scale, while the answers provided by ChatGPT with access to legislative texts were comparable to those of professional doctors. Still, we found that users tend to prefer answers generated by humans, indicating that while ChatGPT is useful, users still value the opinions of occupational medicine professionals.
Collapse
Affiliation(s)
- Martina Padovan
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Bianca Cosci
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Armando Petillo
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Gianluca Nerli
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Francesco Porciatti
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Sergio Scarinci
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Francesco Carlucci
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Letizia Dell’Amico
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Niccolò Meliani
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Gabriele Necciari
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Vincenzo Carmelo Lucisano
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Riccardo Marino
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | - Rudy Foddis
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy; (M.P.); (B.C.); (A.P.); (G.N.); (F.P.); (S.S.); (F.C.); (L.D.); (N.M.); (G.N.); (R.M.)
| | | |
Collapse
|
21
|
Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140 DOI: 10.1097/aog.0000000000005446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/12/2023] [Indexed: 11/12/2023]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about capabilities, limitations, and risks. This article presents an overview of ChatGPT, and it highlights the current state of this technology for the medical field. The article seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education. It also provides suggestions on how to optimize the use of this tool.
Collapse
|
22
|
Koranteng E, Rao A, Flores E, Lev M, Landman A, Dreyer K, Succi M. Empathy and Equity: Key Considerations for Large Language Model Adoption in Health Care. JMIR Med Educ 2023; 9:e51199. [PMID: 38153778 PMCID: PMC10884892 DOI: 10.2196/51199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/24/2023] [Revised: 10/01/2023] [Accepted: 10/14/2023] [Indexed: 12/29/2023]
Abstract
The growing presence of large language models (LLMs) in health care applications holds significant promise for innovative advancements in patient care. However, concerns about ethical implications and potential biases have been raised by various stakeholders. Here, we evaluate the ethics of LLMs in medicine along 2 key axes: empathy and equity. We outline the importance of these factors in novel models of care and develop frameworks for addressing these alongside LLM deployment.
Collapse
Affiliation(s)
- Arya Rao: Harvard Medical School, Boston, MA, United States
- Efren Flores: Harvard Medical School, Boston, MA, United States
- Michael Lev: Harvard Medical School, Boston, MA, United States
- Adam Landman: Harvard Medical School, Boston, MA, United States
- Keith Dreyer: Harvard Medical School, Boston, MA, United States
- Marc Succi: Massachusetts General Hospital, Boston, United States
23.
Alkhaaldi SMI, Kassab CH, Dimassi Z, Oyoun Alsoud L, Al Fahim M, Al Hageh C, Ibrahim H. Medical Student Experiences and Perceptions of ChatGPT and Artificial Intelligence: Cross-Sectional Study. JMIR Med Educ 2023; 9:e51302. [PMID: 38133911 PMCID: PMC10770787 DOI: 10.2196/51302]
Abstract
BACKGROUND Artificial intelligence (AI) has the potential to revolutionize the way medicine is learned, taught, and practiced, and medical education must prepare learners for these inevitable changes. Academic medicine has, however, been slow to embrace recent AI advances. Since its launch in November 2022, ChatGPT has emerged as a fast and user-friendly large language model that can assist health care professionals, medical educators, students, trainees, and patients. While many studies focus on the technology's capabilities, potential, and risks, there is a gap in studying the perspective of end users. OBJECTIVE The aim of this study was to gauge the experiences and perspectives of graduating medical students on ChatGPT and AI in their training and future careers. METHODS A cross-sectional web-based survey of recently graduated medical students was conducted in an international academic medical center between May 5, 2023, and June 13, 2023. Descriptive statistics were used to tabulate variable frequencies. RESULTS Of 325 applicants to the residency programs, 265 completed the survey (an 81.5% response rate). The vast majority of respondents denied using ChatGPT in medical school, with 20.4% (n=54) using it to help complete written assessments and only 9.4% using the technology in their clinical work (n=25). More students planned to use it during residency, primarily for exploring new medical topics and research (n=168, 63.4%) and exam preparation (n=151, 57%). Male students were significantly more likely to believe that AI will improve diagnostic accuracy (n=47, 51.7% vs n=69, 39.7%; P=.001), reduce medical error (n=53, 58.2% vs n=71, 40.8%; P=.002), and improve patient care (n=60, 65.9% vs n=95, 54.6%; P=.007). Previous experience with AI was significantly associated with positive AI perception in terms of improving patient care, decreasing medical errors and misdiagnoses, and increasing the accuracy of diagnoses (P=.001, P<.001, P=.008, respectively). 
CONCLUSIONS The surveyed medical students had minimal formal and informal experience with AI tools and limited perceptions of the potential uses of AI in health care but had overall positive views of ChatGPT and AI and were optimistic about the future of AI in medical education and health care. Structured curricula and formal policies and guidelines are needed to adequately prepare medical learners for the forthcoming integration of AI in medicine.
Affiliation(s)
- Saif M I Alkhaaldi: Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Carl H Kassab: Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Zakia Dimassi: Department of Medical Science, Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Leen Oyoun Alsoud: Department of Medical Science, Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Maha Al Fahim: Education Institute, Sheikh Khalifa Medical City, Abu Dhabi, United Arab Emirates
- Cynthia Al Hageh: Department of Medical Science, Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
- Halah Ibrahim: Department of Medical Science, Khalifa University College of Medicine and Health Sciences, Abu Dhabi, United Arab Emirates
24.
Bagde H, Dhopte A, Alam MK, Basri R. A systematic review and meta-analysis on ChatGPT and its utilization in medical and dental research. Heliyon 2023; 9:e23050. [PMID: 38144348 PMCID: PMC10746423 DOI: 10.1016/j.heliyon.2023.e23050]
Abstract
Since its release, ChatGPT has taken the world by storm with its utilization in various fields of life. This review's main goal was to offer a thorough and fact-based evaluation of ChatGPT's potential as a tool for medical and dental research, which could direct subsequent research and influence clinical practice. METHODS Several online databases were searched for relevant articles in accordance with the study objectives. A team of reviewers was assembled to devise the methodological framework for article inclusion and meta-analysis. RESULTS Eleven descriptive studies were considered for this review that evaluated the accuracy of ChatGPT in answering medical queries related to different domains such as systematic reviews, cancer, liver diseases, diagnostic imaging, education, and COVID-19 vaccination. The studies reported different accuracy ranges, from 18.3% to 100%, across various datasets and specialties. The meta-analysis showed an odds ratio (OR) of 2.25 and a relative risk (RR) of 1.47 with a 95% confidence interval (CI), indicating that the accuracy of ChatGPT in providing correct responses was significantly higher compared with the total responses for queries. However, significant heterogeneity was present among the studies, suggesting considerable variability in the effect sizes across the included studies. CONCLUSION The observations indicate that ChatGPT can provide appropriate solutions to questions in the medical and dental fields, but researchers and clinicians should cautiously assess its responses because they might not always be dependable. Overall, the importance of this study rests in shedding light on ChatGPT's accuracy in the medical and dental fields and emphasizing the need for additional investigation to enhance its performance.
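The pooled OR and RR above are standard effect measures computed from a 2x2 table. As a quick illustration of the two formulas, a minimal sketch with hypothetical correct/incorrect counts for two groups (not the review's data):

```python
# Odds ratio and relative risk from a 2x2 table [[a, b], [c, d]],
# where a/b are correct/incorrect counts in one group and c/d in the other.
# The counts below are hypothetical, for illustration only.

def odds_ratio(a, b, c, d):
    # OR = (a/b) / (c/d): ratio of the odds of a correct response
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    # RR = (a/(a+b)) / (c/(c+d)): ratio of the proportions correct
    return (a / (a + b)) / (c / (c + d))

a, b, c, d = 60, 40, 40, 60  # hypothetical counts
print(round(odds_ratio(a, b, c, d), 2))     # 2.25
print(round(relative_risk(a, b, c, d), 2))  # 1.5
```

With these made-up counts the OR (2.25) sits further from 1 than the RR (1.5), the usual pattern when the outcome is common, which is worth keeping in mind when reading pooled ORs as if they were risks.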
Affiliation(s)
- Hiroj Bagde: Department of Periodontology, Chhattisgarh Dental College and Research Institute, Rajnandgaon, Chhattisgarh, India
- Ashwini Dhopte: Department of Oral Medicine and Radiology, Chhattisgarh Dental College and Research Institute, Rajnandgaon, Chhattisgarh, India
- Mohammad Khursheed Alam: Preventive Dentistry Department, College of Dentistry, Jouf University, Sakaka, 72345, Saudi Arabia; Department of Dental Research Cell, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Chennai, India; Department of Public Health, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh
- Rehana Basri: Department of Internal Medicine, College of Medicine, Jouf University, Sakaka, 72345, Saudi Arabia
25.
Pagano S, Holzapfel S, Kappenschneider T, Meyer M, Maderbacher G, Grifka J, Holzapfel DE. Arthrosis diagnosis and treatment recommendations in clinical practice: an exploratory investigation with the generative AI model GPT-4. J Orthop Traumatol 2023; 24:61. [PMID: 38015298 PMCID: PMC10684473 DOI: 10.1186/s10195-023-00740-4]
Abstract
BACKGROUND The spread of artificial intelligence (AI) has led to transformative advancements in diverse sectors, including healthcare. Specifically, generative writing systems have shown potential in various applications, but their effectiveness in clinical settings has barely been investigated. In this context, we evaluated the proficiency of ChatGPT-4 in diagnosing gonarthrosis and coxarthrosis and recommending appropriate treatments compared with orthopaedic specialists. METHODS A retrospective review was conducted using anonymized medical records of 100 patients previously diagnosed with either knee or hip arthrosis. ChatGPT-4 was employed to analyse these historical records, formulating both a diagnosis and potential treatment suggestions. Subsequently, a comparative analysis was conducted to assess the concordance between the AI's conclusions and the original clinical decisions made by the physicians. RESULTS In diagnostic evaluations, ChatGPT-4 consistently aligned with the conclusions previously drawn by physicians. In terms of treatment recommendations, there was an 83% agreement between the AI and orthopaedic specialists. The therapeutic concordance was verified by a Cohen's kappa coefficient of 0.580 (p < 0.001), indicating a moderate-to-good level of agreement. In recommendations pertaining to surgical treatment, the AI demonstrated a sensitivity and specificity of 78% and 80%, respectively. Multivariable logistic regression demonstrated that reduced quality of life (OR 49.97, p < 0.001) and start-up pain (OR 12.54, p = 0.028) influenced ChatGPT-4's recommendation for surgery. CONCLUSION This study emphasises ChatGPT-4's notable potential in diagnosing conditions such as gonarthrosis and coxarthrosis and in aligning its treatment recommendations with those of orthopaedic specialists.
However, it is crucial to acknowledge that AI tools such as ChatGPT-4 are not meant to replace the nuanced expertise and clinical judgment of seasoned orthopaedic surgeons, particularly in complex decision-making scenarios regarding treatment indications. Due to the exploratory nature of the study, further research with larger patient populations and more complex diagnoses is necessary to validate the findings and explore the broader potential of AI in healthcare. LEVEL OF EVIDENCE Level III evidence.
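The agreement statistics in this abstract (Cohen's kappa, sensitivity, specificity) all derive from a 2x2 confusion table comparing the AI's call with the specialists' decision. The sketch below is illustrative only: the counts are hypothetical, chosen to yield values of the same magnitude as those reported, and are not the study's data.

```python
# Agreement statistics for two binary raters (e.g. AI vs. specialist,
# surgery yes/no), from a 2x2 confusion table. Counts are hypothetical.

def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fn + fp + tn
    po = (tp + tn) / n                          # observed agreement
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)   # chance agreement on "yes"
    p_no = ((fp + tn) / n) * ((fn + tn) / n)    # chance agreement on "no"
    pe = p_yes + p_no                           # total chance agreement
    return (po - pe) / (1 - pe)

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Hypothetical confusion table (not the study's data).
tp, fn, fp, tn = 39, 11, 10, 40
print(round(cohens_kappa(tp, fn, fp, tn), 3))      # 0.58
print(sensitivity(tp, fn), specificity(tn, fp))    # 0.78 0.8
```

Kappa discounts the agreement two raters would reach by chance, which is why an observed agreement of 79% here collapses to a "moderate" kappa of 0.58.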
Affiliation(s)
- Stefano Pagano: Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Sabrina Holzapfel: Department of Neonatology, University Children's Hospital Regensburg, Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
- Tobias Kappenschneider: Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Matthias Meyer: Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Günther Maderbacher: Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Joachim Grifka: Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
- Dominik Emanuel Holzapfel: Department of Orthopaedic Surgery, University of Regensburg, Asklepios Klinikum, Bad Abbach, Germany
26.
Wong RSY, Ming LC, Raja Ali RA. The Intersection of ChatGPT, Clinical Medicine, and Medical Education. JMIR Med Educ 2023; 9:e47274. [PMID: 37988149 DOI: 10.2196/47274]
Abstract
As we progress deeper into the digital age, the robust development and application of advanced artificial intelligence (AI) technology, specifically generative language models like ChatGPT (OpenAI), have potential implications in all sectors including medicine. This viewpoint article aims to present the authors' perspective on the integration of AI models such as ChatGPT in clinical medicine and medical education. The unprecedented capacity of ChatGPT to generate human-like responses, refined through Reinforcement Learning with Human Feedback, could significantly reshape the pedagogical methodologies within medical education. Through a comprehensive review and the authors' personal experiences, this viewpoint article elucidates the pros, cons, and ethical considerations of using ChatGPT within clinical medicine and notably, its implications for medical education. This exploration is crucial in a transformative era where AI could potentially augment human capability in the process of knowledge creation and dissemination, potentially revolutionizing medical education and clinical practice. The importance of maintaining academic integrity and professional standards is highlighted. The relevance of establishing clear guidelines for the responsible and ethical use of AI technologies in clinical medicine and medical education is also emphasized.
Affiliation(s)
- Rebecca Shin-Yee Wong: Department of Medical Education, School of Medical and Life Sciences, Sunway University, Selangor, Malaysia; Faculty of Medicine, Nursing and Health Sciences, SEGi University, Petaling Jaya, Malaysia
- Long Chiau Ming: School of Medical and Life Sciences, Sunway University, Selangor, Malaysia
- Raja Affendi Raja Ali: School of Medical and Life Sciences, Sunway University, Selangor, Malaysia; GUT Research Group, Faculty of Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
27.
Truhn D, Weber CD, Braun BJ, Bressem K, Kather JN, Kuhl C, Nebelung S. A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports. Sci Rep 2023; 13:20159. [PMID: 37978240 PMCID: PMC10656559 DOI: 10.1038/s41598-023-47500-2]
Abstract
Large language models (LLMs) have shown potential in various applications, including clinical practice. However, their accuracy and utility in providing treatment recommendations for orthopedic conditions remain to be investigated. Thus, this pilot study aims to evaluate the validity of treatment recommendations generated by GPT-4 for common knee and shoulder orthopedic conditions using anonymized clinical MRI reports. A retrospective analysis was conducted using 20 anonymized clinical MRI reports, with varying severity and complexity. Treatment recommendations were elicited from GPT-4 and evaluated by two board-certified specialty-trained senior orthopedic surgeons. Their evaluation focused on semiquantitative gradings of accuracy and clinical utility and potential limitations of the LLM-generated recommendations. GPT-4 provided treatment recommendations for 20 patients (mean age, 50 years ± 19 [standard deviation]; 12 men) with acute and chronic knee and shoulder conditions. The LLM produced largely accurate and clinically useful recommendations. However, limited awareness of a patient's overall situation, a tendency to incorrectly appreciate treatment urgency, and largely schematic and unspecific treatment recommendations were observed and may reduce its clinical usefulness. In conclusion, LLM-based treatment recommendations are largely adequate and not prone to 'hallucinations', yet inadequate in particular situations. Critical guidance by healthcare professionals is obligatory, and independent use by patients is discouraged, given the dependency on precise data input.
Grants
- ODELIA, 101057091 European Union's Horizon Europe programme
- COMFORT, 101079894 European Union's Horizon Europe programme
- TR 1700/7-1 Deutsche Forschungsgemeinschaft
- NE 2136/3-1 Deutsche Forschungsgemeinschaft
- DEEP LIVER, ZMVI1-2520DAT111 Bundesministerium für Gesundheit
- #70113864 Max-Eder-Programme of the German Cancer Aid
- PEARL, 01KD2104C German Federal Ministry of Education and Research
- CAMINO, 01EO2101 German Federal Ministry of Education and Research
- SWAG, 01KD2215A German Federal Ministry of Education and Research
- TRANSFORM LIVER, 031L0312A German Federal Ministry of Education and Research
- TANGERINE, 01KT2302 through ERA-NET Transcan German Federal Ministry of Education and Research
- SECAI, 57616814 Deutscher Akademischer Austauschdienst
- Transplant.KI, 01VSF21048 German Federal Joint Committee
- GENIAL, 101096312 European Union's Horizon Europe and innovation programme
- NIHR, NIHR213331 National Institute for Health and Care Research
- RWTH Aachen University (3131)
Affiliation(s)
- Daniel Truhn: Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Pauwels Street 30, 52074, Aachen, Germany
- Christian D Weber: Department of Orthopaedics and Trauma Surgery, University Hospital RWTH Aachen, Aachen, Germany
- Benedikt J Braun: University Hospital Tuebingen on Behalf of the Eberhard-Karls-University Tuebingen, BG Hospital, Schnarrenbergstr. 95, Tübingen, Germany
- Keno Bressem: Department of Radiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Hindenburgdamm 30, 12203, Berlin, Germany
- Jakob N Kather: Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Department of Medicine I, University Hospital Dresden, Dresden, Germany; Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany; Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
- Christiane Kuhl: Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Pauwels Street 30, 52074, Aachen, Germany
- Sven Nebelung: Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Pauwels Street 30, 52074, Aachen, Germany
28.
Gödde D, Nöhl S, Wolf C, Rupert Y, Rimkus L, Ehlers J, Breuckmann F, Sellmann T. A SWOT (Strengths, Weaknesses, Opportunities, and Threats) Analysis of ChatGPT in the Medical Literature: Concise Review. J Med Internet Res 2023; 25:e49368. [PMID: 37865883 PMCID: PMC10690535 DOI: 10.2196/49368]
Abstract
BACKGROUND ChatGPT is a 175-billion-parameter natural language processing model that is already involved in scientific content and publications. Its influence ranges from providing quick access to information on medical topics and assisting in generating medical and scientific articles and papers, to performing medical data analyses and even interpreting complex data sets. OBJECTIVE The future role of ChatGPT became a matter of debate shortly after its release and remains uncertain. This review aimed to analyze the role of ChatGPT in the medical literature during the first 3 months after its release. METHODS We performed a concise review of literature published in PubMed from December 1, 2022, to March 31, 2023. To find all publications related to ChatGPT or considering ChatGPT, the search term was kept simple ("ChatGPT" in AllFields). All publications available as full text in German or English were included. All accessible publications were evaluated according to specifications by the author team (eg, impact factor, publication modus, article type, publication speed, and type of ChatGPT integration or content). The conclusions of the articles were used for later SWOT (strengths, weaknesses, opportunities, and threats) analysis. All data were analyzed on a descriptive basis. RESULTS Of 178 studies in total, 160 met the inclusion criteria and were evaluated. The average impact factor was 4.423 (range 0-96.216), and the average publication speed was 16 (range 0-83) days. Among the articles, there were 77 editorials (48.1%), 43 essays (26.9%), 21 studies (13.1%), 6 reviews (3.8%), 6 case reports (3.8%), 6 news items (3.8%), and 1 meta-analysis (0.6%). Of those, 54.4% (n=87) were published as open access, with 5% (n=8) provided on preprint servers. Over 400 quotes with information on strengths, weaknesses, opportunities, and threats were detected. By far the largest share (n=142, 34.8%) related to weaknesses.
ChatGPT excels in its ability to express ideas clearly and formulate general contexts comprehensibly. It performs so well that even experts in the field have difficulty identifying abstracts generated by ChatGPT. However, the time-limited scope and the need for corrections by experts were mentioned as weaknesses and threats of ChatGPT. Opportunities include assistance in formulating medical issues for nonnative English speakers, as well as the possibility of timely participation in the development of such artificial intelligence tools since it is in its early stages and can therefore still be influenced. CONCLUSIONS Artificial intelligence tools such as ChatGPT are already part of the medical publishing landscape. Despite their apparent opportunities, policies and guidelines must be implemented to ensure benefits in education, clinical practice, and research and protect against threats such as scientific misconduct, plagiarism, and inaccuracy.
Affiliation(s)
- Daniel Gödde: Department of Pathology and Molecularpathology, Helios University Hospital Wuppertal, Witten/Herdecke University, Witten, Germany
- Sophia Nöhl: Faculty of Health, Witten/Herdecke University, Witten, Germany
- Carina Wolf: Faculty of Health, Witten/Herdecke University, Witten, Germany
- Yannick Rupert: Faculty of Health, Witten/Herdecke University, Witten, Germany
- Lukas Rimkus: Faculty of Health, Witten/Herdecke University, Witten, Germany
- Jan Ehlers: Department of Didactics and Education Research in the Health Sector, Faculty of Health, Witten/Herdecke University, Witten, Germany
- Frank Breuckmann: Department of Cardiology and Vascular Medicine, West German Heart and Vascular Center Essen, University Duisburg-Essen, Essen, Germany; Department of Cardiology, Pneumology, Neurology and Intensive Care Medicine, Klinik Kitzinger Land, Kitzingen, Germany
- Timur Sellmann: Department of Anaesthesiology I, Witten/Herdecke University, Witten, Germany; Department of Anaesthesiology and Intensive Care Medicine, Evangelisches Krankenhaus BETHESDA zu Duisburg, Duisburg, Germany
29.
Yu P, Xu H, Hu X, Deng C. Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration. Healthcare (Basel) 2023; 11:2776. [PMID: 37893850 PMCID: PMC10606429 DOI: 10.3390/healthcare11202776]
Abstract
Generative artificial intelligence (AI) and large language models (LLMs), exemplified by ChatGPT, are promising for revolutionizing data and information management in healthcare and medicine. However, there is scant literature guiding their integration for non-AI professionals. This study conducts a scoping literature review to address the critical need for guidance on integrating generative AI and LLMs into healthcare and medical practices. It elucidates the distinct mechanisms underpinning these technologies, such as reinforcement learning from human feedback (RLHF) and techniques such as few-shot learning and chain-of-thought reasoning, which differentiate them from traditional, rule-based AI systems. Realizing their benefits requires an inclusive, collaborative co-design process that engages all pertinent stakeholders, including clinicians and consumers. Although global research is examining both opportunities and challenges, including ethical and legal dimensions, LLMs offer promising advancements in healthcare by enhancing data management, information retrieval, and decision-making processes. Continued innovation in data acquisition, model fine-tuning, prompt strategy development, evaluation, and system implementation is imperative for realizing the full potential of these technologies. Organizations should proactively engage with these technologies to improve healthcare quality, safety, and efficiency, adhering to ethical and legal guidelines for responsible application.
Affiliation(s)
- Ping Yu: School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia
- Hua Xu: Section of Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, Fl 9, New Haven, CT 06510, USA
- Xia Hu: Department of Computer Science, Rice University, P.O. Box 1892, Houston, TX 77251-1892, USA
- Chao Deng: School of Medical, Indigenous and Health Sciences, University of Wollongong, Wollongong, NSW 2522, Australia
30.
Miao H, Li C, Wang J. A Future of Smarter Digital Health Empowered by Generative Pretrained Transformer. J Med Internet Res 2023; 25:e49963. [PMID: 37751243 PMCID: PMC10565615 DOI: 10.2196/49963]
Abstract
Generative pretrained transformer (GPT) tools have been thriving, as ignited by the remarkable success of OpenAI's recent chatbot product. GPT technology offers countless opportunities to significantly improve or renovate current health care research and practice paradigms, especially digital health interventions and digital health-enabled clinical care, and a future of smarter digital health can thus be expected. In particular, GPT technology can be incorporated through various digital health platforms in homes and hospitals embedded with numerous sensors, wearables, and remote monitoring devices. In this viewpoint paper, we highlight recent research progress that depicts the future picture of a smarter digital health ecosystem through GPT-facilitated centralized communications, automated analytics, personalized health care, and instant decision-making.
Affiliation(s)
- Hongyu Miao: College of Nursing, Florida State University, Tallahassee, FL, United States
- Chengdong Li: College of Nursing, Florida State University, Tallahassee, FL, United States
- Jing Wang: College of Nursing, Florida State University, Tallahassee, FL, United States
31.
Sallam M, Salim NA, Barakat M, Al-Mahzoum K, Al-Tammemi AB, Malaeb D, Hallit R, Hallit S. Assessing Health Students' Attitudes and Usage of ChatGPT in Jordan: Validation Study. JMIR Med Educ 2023; 9:e48254. [PMID: 37578934 PMCID: PMC10509747 DOI: 10.2196/48254]
Abstract
BACKGROUND ChatGPT is a conversational large language model that has the potential to revolutionize knowledge acquisition. However, the impact of this technology on the quality of education is still unknown considering the risks and concerns surrounding ChatGPT use. Therefore, it is necessary to assess the usability and acceptability of this promising tool. As an innovative technology, the intention to use ChatGPT can be studied in the context of the technology acceptance model (TAM). OBJECTIVE This study aimed to develop and validate a TAM-based survey instrument called TAME-ChatGPT (Technology Acceptance Model Edited to Assess ChatGPT Adoption) that could be employed to examine the successful integration and use of ChatGPT in health care education. METHODS The survey tool was created based on the TAM framework. It comprised 13 items for participants who had heard of ChatGPT but did not use it and 23 items for participants who used ChatGPT. Using a convenience sampling approach, the survey link was circulated electronically among university students between February and March 2023. Exploratory factor analysis (EFA) was used to assess the construct validity of the survey instrument. RESULTS The final sample comprised 458 respondents, most of them undergraduate students (n=442, 96.5%). Only 109 (23.8%) respondents had heard of ChatGPT prior to participation, and only 55 (11.3%) self-reported ChatGPT use before the study. EFA on the attitude and usage scales showed significant Bartlett tests of sphericity (P<.001) and adequate Kaiser-Meyer-Olkin measures (0.823 for the attitude scale and 0.702 for the usage scale), confirming the factorability of the correlation matrices. The EFA showed that 3 constructs explained a cumulative total of 69.3% of the variance in the attitude scale, and these subscales represented perceived risks, attitude to technology/social influence, and anxiety.
For the ChatGPT usage scale, EFA showed that 4 constructs explained a cumulative total of 72% variance in the data and comprised the perceived usefulness, perceived risks, perceived ease of use, and behavior/cognitive factors. All the ChatGPT attitude and usage subscales showed good reliability with Cronbach α values >.78 for all the deduced subscales. CONCLUSIONS The TAME-ChatGPT demonstrated good reliability, validity, and usefulness in assessing health care students' attitudes toward ChatGPT. The findings highlighted the importance of considering risk perceptions, usefulness, ease of use, attitudes toward technology, and behavioral factors when adopting ChatGPT as a tool in health care education. This information can aid the stakeholders in creating strategies to support the optimal and ethical use of ChatGPT and to identify the potential challenges hindering its successful implementation. Future research is recommended to guide the effective adoption of ChatGPT in health care education.
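The Cronbach α values quoted above summarize the internal consistency of each subscale. A minimal, self-contained sketch of the statistic, using made-up item scores rather than the survey data:

```python
# Cronbach's alpha from raw item scores. `items` is a list of per-item
# score lists (one inner list per questionnaire item, respondents in the
# same order in each). The example data below are made up.

def cronbach_alpha(items):
    k = len(items)        # number of items
    n = len(items[0])     # number of respondents

    def variance(xs):     # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    sum_item_var = sum(variance(item) for item in items)
    totals = [sum(item[j] for item in items) for j in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Three perfectly correlated items give the maximum alpha of 1.0;
# uncorrelated items would pull alpha toward 0.
print(round(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]), 6))
```

Values above roughly 0.7-0.8, like the >.78 reported for the TAME-ChatGPT subscales, are conventionally read as acceptable-to-good reliability.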
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
- Nesreen A Salim
- Prosthodontic Department, School of Dentistry, The University of Jordan, Amman, Jordan
- Prosthodontic Department, Jordan University Hospital, Amman, Jordan
- Muna Barakat
- Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Middle East University Research Unit, Middle East University, Amman, Jordan
- Kholoud Al-Mahzoum
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
- Ala'a B Al-Tammemi
- Migration Health Division, International Organization for Migration, The United Nations Migration Agency, Amman, Jordan
- Diana Malaeb
- College of Pharmacy, Gulf Medical University, Ajman, United Arab Emirates
- Rabih Hallit
- School of Medicine and Medical Sciences, Holy Spirit University of Kaslik, Jounieh, Lebanon
- Department of Infectious Disease, Bellevue Medical Center, Mansourieh, Lebanon
- Department of Infectious Disease, Notre Dame des Secours, University Hospital Center, Byblos, Lebanon
- Souheil Hallit
- School of Medicine and Medical Sciences, Holy Spirit University of Kaslik, Jounieh, Lebanon
- Research Department, Psychiatric Hospital of the Cross, Jal Eddib, Lebanon
32
Russe MF, Fink A, Ngo H, Tran H, Bamberg F, Reisert M, Rau A. Performance of ChatGPT, human radiologists, and context-aware ChatGPT in identifying AO codes from radiology reports. Sci Rep 2023; 13:14215. [PMID: 37648742 PMCID: PMC10468502 DOI: 10.1038/s41598-023-41512-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2023] [Accepted: 08/28/2023] [Indexed: 09/01/2023] Open
Abstract
While radiologists can describe a fracture's morphology and complexity with ease, translation into classification systems such as the Arbeitsgemeinschaft Osteosynthesefragen (AO) Fracture and Dislocation Classification Compendium is more challenging. We tested the performance of generic chatbots and of chatbots provided with specific knowledge of the AO classification via a vector index, and compared both with human readers. On the 100 radiological reports we created based on random AO codes, chatbots provided AO codes significantly faster than humans (mean 3.2 s per case vs. 50 s per case, p < .001), though they did not reach human performance (maximum chatbot performance of 86% correct full AO codes vs. 95% for human readers). In general, chatbots based on GPT-4 outperformed those based on GPT-3.5-Turbo. Furthermore, we found that providing specific knowledge substantially enhances a chatbot's performance and consistency: the context-aware chatbot based on GPT-4 provided consistent correct full AO codes in 71% of cases, compared with 2% for the generic GPT-4 chatbot. This provides evidence that refining ChatGPT and providing it with specific context will be the next essential step in harnessing its power.
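The "vector index" approach summarized above, retrieving the most relevant classification snippets and prepending them to the prompt, can be sketched minimally. Everything below is an illustrative stand-in: the snippet texts are invented (not the real AO compendium), and the bag-of-words "embedding" with cosine ranking is a toy substitute for the learned dense embeddings such systems actually use:

```python
import math
from collections import Counter

# Hypothetical mini "vector index" of classification snippets (invented text)
SNIPPETS = [
    "2R3A: radius, diaphyseal segment, simple fracture",
    "2R3B: radius, diaphyseal segment, wedge fracture",
    "44B: malleolar segment, infrasyndesmotic fibula fracture",
]

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use learned dense vectors
    return Counter(text.lower().replace(",", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(report: str, k: int = 1) -> str:
    # Rank snippets by similarity to the report, prepend the top k as context
    q = embed(report)
    top = sorted(SNIPPETS, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]
    context = "\n".join(top)
    return f"Relevant AO definitions:\n{context}\n\nReport: {report}\nAO code:"

print(build_prompt("Simple diaphyseal fracture of the radius"))
```

The point of the design is that the model no longer has to recall the classification from its training data; the definitions it needs arrive inside the prompt, which is consistent with the large consistency gain the study reports for the context-aware chatbot.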
Affiliation(s)
- Maximilian F Russe
- Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Str. 64, 79106, Freiburg, Germany.
- Anna Fink
- Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Str. 64, 79106, Freiburg, Germany
- Helen Ngo
- Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Str. 64, 79106, Freiburg, Germany
- Hien Tran
- Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Str. 64, 79106, Freiburg, Germany
- Fabian Bamberg
- Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Str. 64, 79106, Freiburg, Germany
- Marco Reisert
- Department of Stereotactic and Functional Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Medical Physics, Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Alexander Rau
- Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Str. 64, 79106, Freiburg, Germany
- Department of Neuroradiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
33
Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, Landman A, Dreyer K, Succi MD. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study. J Med Internet Res 2023; 25:e48659. [PMID: 37606976 PMCID: PMC10481210 DOI: 10.2196/48659] [Citation(s) in RCA: 65] [Impact Index Per Article: 65.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/23/2023] Open
Abstract
BACKGROUND Large language model (LLM)-based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks, as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated. OBJECTIVE This study aimed to evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. METHODS We entered all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. Accuracy was measured as the proportion of correct responses to the questions posed within the clinical vignettes tested, as calculated by human scorers. We further conducted linear regression to assess the factors contributing to ChatGPT's performance on clinical tasks. RESULTS ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis, with an accuracy of 76.9% (95% CI 67.8%-86.1%), and the lowest performance in generating an initial differential diagnosis, with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%; P<.001) and clinical management (β=-7.4%; P=.02) question types. CONCLUSIONS ChatGPT achieves impressive accuracy in clinical decision-making, with increasing strength as it gains more clinical information at its disposal. In particular, ChatGPT demonstrates the greatest accuracy in tasks of final diagnosis as compared to initial diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set.
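Accuracy figures like those above are proportions with confidence intervals; a minimal sketch of the normal-approximation computation, using hypothetical counts (the study's exact question denominators are not restated here, so these numbers are illustrative only):

```python
import math

def proportion_ci(correct: int, total: int, z: float = 1.96):
    """Point estimate and normal-approximation 95% CI for an accuracy proportion."""
    p = correct / total
    half = z * math.sqrt(p * (1 - p) / total)  # half-width of the interval
    return p, max(0.0, p - half), min(1.0, p + half)

# Hypothetical example: 287 of 400 scored responses correct
p, lo, hi = proportion_ci(287, 400)
print(f"{p:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

Note that for small samples or proportions near 0 or 1, interval methods such as Wilson's score interval behave better than this simple normal approximation.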
Affiliation(s)
- Arya Rao
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Michael Pang
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- John Kim
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Meghana Kamineni
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Winston Lie
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Anoop K Prasad
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Adam Landman
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Brigham and Women's Hospital, Boston, MA, United States
- Keith Dreyer
- Harvard Medical School, Boston, MA, United States
- Data Science Office, Mass General Brigham, Boston, MA, United States
- Marc D Succi
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Mass General Brigham Innovation, Mass General Brigham, Boston, MA, United States
34
Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med 2023; 29:1930-1940. [PMID: 37460753 DOI: 10.1038/s41591-023-02448-8] [Citation(s) in RCA: 335] [Impact Index Per Article: 335.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2023] [Accepted: 06/08/2023] [Indexed: 08/17/2023]
Abstract
Large language models (LLMs) can respond to free-text queries without being specifically trained in the task in question, causing excitement and concern about their use in healthcare settings. ChatGPT is a generative artificial intelligence (AI) chatbot produced through sophisticated fine-tuning of an LLM, and other tools are emerging through similar developmental processes. Here we outline how LLM applications such as ChatGPT are developed, and we discuss how they are being leveraged in clinical settings. We consider the strengths and limitations of LLMs and their potential to improve the efficiency and effectiveness of clinical, educational and research work in medicine. LLM chatbots have already been deployed in a range of biomedical contexts, with impressive but mixed results. This review acts as a primer for interested clinicians, who will determine if and how LLM technology is used in healthcare for the benefit of patients and practitioners.
Affiliation(s)
- Arun James Thirunavukarasu
- University of Cambridge School of Clinical Medicine, Cambridge, UK
- Corpus Christi College, University of Cambridge, Cambridge, UK
- Darren Shu Jeng Ting
- Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
- Birmingham and Midland Eye Centre, Birmingham, UK
- Academic Ophthalmology, School of Medicine, University of Nottingham, Nottingham, UK
- Kabilan Elangovan
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Laura Gutierrez
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Ting Fang Tan
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Department of Ophthalmology and Visual Sciences, Duke-National University of Singapore Medical School, Singapore, Singapore
- Daniel Shu Wei Ting
- Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Department of Ophthalmology and Visual Sciences, Duke-National University of Singapore Medical School, Singapore, Singapore
- Byers Eye Institute, Stanford University, Palo Alto, CA, USA
35
Beaulieu-Jones BR, Shah S, Berrigan MT, Marwaha JS, Lai SL, Brat GA. Evaluating Capabilities of Large Language Models: Performance of GPT4 on Surgical Knowledge Assessments. medRxiv [Preprint] 2023:2023.07.16.23292743. [PMID: 37502981 PMCID: PMC10371188 DOI: 10.1101/2023.07.16.23292743] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Background Artificial intelligence (AI) has the potential to dramatically alter healthcare by enhancing how we diagnose and treat disease. One promising AI model is ChatGPT, a large general-purpose language model trained by OpenAI. The chat interface has shown robust, human-level performance on several professional and academic benchmarks. We sought to probe its performance and stability over time on surgical case questions. Methods We evaluated the performance of ChatGPT-4 on two surgical knowledge assessments: the Surgical Council on Resident Education (SCORE) and a second commonly used knowledge assessment, referred to as Data-B. Questions were entered in two formats: open-ended and multiple choice. ChatGPT outputs were assessed for accuracy and insights by surgeon evaluators. We categorized reasons for model errors and assessed the stability of performance on repeat encounters. Results A total of 167 SCORE and 112 Data-B questions were presented to the ChatGPT interface. ChatGPT correctly answered 71% and 68% of multiple-choice SCORE and Data-B questions, respectively. For both open-ended and multiple-choice questions, approximately two-thirds of ChatGPT responses contained non-obvious insights. Common reasons for inaccurate responses included: inaccurate information in a complex question (n=16, 36.4%); inaccurate information in a fact-based question (n=11, 25.0%); and accurate information with a circumstantial discrepancy (n=6, 13.6%). Upon repeat query, the answer selected by ChatGPT varied for 36.4% of inaccurately answered questions; the response accuracy changed for 6/16 questions. Conclusion Consistent with prior findings, we demonstrate robust near- or above-human-level performance of ChatGPT within the surgical domain. Unique to this study, we demonstrate substantial inconsistency in ChatGPT responses upon repeat query. This finding warrants future consideration and presents an opportunity to further train these models to provide safe and consistent responses. Without mental and/or conceptual models, it is unclear whether language models such as ChatGPT could safely assist clinicians in providing care.
Affiliation(s)
- Brendin R Beaulieu-Jones
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Sahaj Shah
- Geisinger Commonwealth School of Medicine, Scranton, PA
- Jayson S Marwaha
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Shuo-Lun Lai
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Gabriel A Brat
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
36
Liu J, Wang C, Liu S. Utility of ChatGPT in Clinical Practice. J Med Internet Res 2023; 25:e48568. [PMID: 37379067 PMCID: PMC10365580 DOI: 10.2196/48568] [Citation(s) in RCA: 79] [Impact Index Per Article: 79.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2023] [Revised: 05/29/2023] [Accepted: 06/15/2023] [Indexed: 06/29/2023] Open
Abstract
ChatGPT is receiving increasing attention and has a variety of application scenarios in clinical practice. In clinical decision support, ChatGPT has been used to generate accurate differential diagnosis lists, support clinical decision-making, optimize clinical decision support, and provide insights for cancer screening decisions. In addition, ChatGPT has been used for intelligent question-answering to provide reliable information about diseases and medical queries. In terms of medical documentation, ChatGPT has proven effective in generating patient clinical letters, radiology reports, medical notes, and discharge summaries, improving efficiency and accuracy for health care providers. Future research directions include real-time monitoring and predictive analytics, precision medicine and personalized treatment, the role of ChatGPT in telemedicine and remote health care, and integration with existing health care systems. Overall, ChatGPT is a valuable tool that complements the expertise of health care providers and improves clinical decision-making and patient care. However, ChatGPT is a double-edged sword, and we need to carefully consider and study its benefits and potential dangers. In this viewpoint, we discuss recent advances in ChatGPT research in clinical practice and suggest possible risks and challenges of using ChatGPT in clinical practice. This will help guide and support future research on ChatGPT-like artificial intelligence tools in health care.
Affiliation(s)
- Jialin Liu
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Medical Informatics, West China Medical School, Chengdu, China
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
- Changyu Wang
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- West China College of Stomatology, Sichuan University, Chengdu, China
- Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
37
Temsah MH, Aljamaan F, Malki KH, Alhasan K, Altamimi I, Aljarbou R, Bazuhair F, Alsubaihin A, Abdulmajeed N, Alshahrani FS, Temsah R, Alshahrani T, Al-Eyadhy L, Alkhateeb SM, Saddik B, Halwani R, Jamal A, Al-Tawfiq JA, Al-Eyadhy A. ChatGPT and the Future of Digital Health: A Study on Healthcare Workers' Perceptions and Expectations. Healthcare (Basel) 2023; 11:1812. [PMID: 37444647 PMCID: PMC10340744 DOI: 10.3390/healthcare11131812] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 06/14/2023] [Accepted: 06/19/2023] [Indexed: 07/15/2023] Open
Abstract
This study aimed to assess the knowledge, attitudes, and intended practices of healthcare workers (HCWs) in Saudi Arabia towards ChatGPT, an artificial intelligence (AI) Chatbot, within the first three months after its launch. We also aimed to identify potential barriers to AI Chatbot adoption among healthcare professionals. A cross-sectional survey was conducted among 1057 HCWs in Saudi Arabia, distributed electronically via social media channels from 21 February to 6 March 2023. The survey evaluated HCWs' familiarity with ChatGPT-3.5, their satisfaction, intended future use, and perceived usefulness in healthcare practice. Of the respondents, 18.4% had used ChatGPT for healthcare purposes, while 84.1% of non-users expressed interest in utilizing AI Chatbots in the future. Most participants (75.1%) were comfortable with incorporating ChatGPT into their healthcare practice. HCWs perceived the Chatbot to be useful in various aspects of healthcare, such as medical decision-making (39.5%), patient and family support (44.7%), medical literature appraisal (48.5%), and medical research assistance (65.9%). A majority (76.7%) believed ChatGPT could positively impact the future of healthcare systems. Nevertheless, concerns about credibility and the source of information provided by AI Chatbots (46.9%) were identified as the main barriers. Although HCWs recognize ChatGPT as a valuable addition to digital health in the early stages of adoption, addressing concerns regarding accuracy, reliability, and medicolegal implications is crucial. Therefore, due to their unreliability, the current forms of ChatGPT and other Chatbots should not be used for diagnostic or treatment purposes without human expert oversight. Ensuring the trustworthiness and dependability of AI Chatbots is essential for successful implementation in healthcare settings. Future research should focus on evaluating the clinical outcomes of ChatGPT and benchmarking its performance against other AI Chatbots.
Affiliation(s)
- Mohamad-Hani Temsah
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Pediatric Department, King Saud University Medical City, King Saud University, Riyadh 11411, Saudi Arabia
- Evidence-Based Health Care & Knowledge Translation Research Chair, King Saud University, Riyadh 11587, Saudi Arabia
- Fadi Aljamaan
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Critical Care Department, King Saud University Medical City, Riyadh 11411, Saudi Arabia
- Khalid H. Malki
- Research Chair of Voice, Swallowing, and Communication Disorders, ENT Department, College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Khalid Alhasan
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Pediatric Department, King Saud University Medical City, King Saud University, Riyadh 11411, Saudi Arabia
- Solid Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Center, Riyadh 11564, Saudi Arabia
- Ibraheem Altamimi
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Razan Aljarbou
- Pediatric Department, King Saud University Medical City, King Saud University, Riyadh 11411, Saudi Arabia
- Faisal Bazuhair
- Pediatric Department, King Saud University Medical City, King Saud University, Riyadh 11411, Saudi Arabia
- Abdulmajeed Alsubaihin
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Pediatric Department, King Saud University Medical City, King Saud University, Riyadh 11411, Saudi Arabia
- Naif Abdulmajeed
- Pediatric Department, King Saud University Medical City, King Saud University, Riyadh 11411, Saudi Arabia
- Pediatric Nephrology Department, Prince Sultan Military Medical City, Riyadh 12233, Saudi Arabia
- Fatimah S. Alshahrani
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Division of Infectious Diseases, Department of Internal Medicine, College of Medicine, King Saud University, Riyadh 11451, Saudi Arabia
- Reem Temsah
- College of Pharmacy, Alfaisal University, Riyadh 11533, Saudi Arabia
- Turki Alshahrani
- Pediatric Department, King Saud University Medical City, King Saud University, Riyadh 11411, Saudi Arabia
- Lama Al-Eyadhy
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Basema Saddik
- Sharjah Institute of Medical Research, University of Sharjah, Sharjah 27272, United Arab Emirates
- Department of Community and Family Medicine, College of Medicine, University of Sharjah, Sharjah 27272, United Arab Emirates
- School of Population Health, Faculty of Medicine & Health, UNSW Sydney, Sydney, NSW 2052, Australia
- Rabih Halwani
- Sharjah Institute of Medical Research, University of Sharjah, Sharjah 27272, United Arab Emirates
- Department of Clinical Sciences, College of Medicine, University of Sharjah, Sharjah 27272, United Arab Emirates
- Amr Jamal
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Evidence-Based Health Care & Knowledge Translation Research Chair, King Saud University, Riyadh 11587, Saudi Arabia
- Department of Family and Community Medicine, King Saud University Medical City, Riyadh 11411, Saudi Arabia
- Jaffar A. Al-Tawfiq
- Specialty Internal Medicine and Quality Department, Johns Hopkins Aramco Healthcare, Dhahran 34465, Saudi Arabia
- Infectious Disease Division, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Infectious Disease Division, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21218, USA
- Ayman Al-Eyadhy
- College of Medicine, King Saud University, Riyadh 11587, Saudi Arabia
- Pediatric Department, King Saud University Medical City, King Saud University, Riyadh 11411, Saudi Arabia
38
Shoja MM, Van de Ridder JMM, Rajput V. The Emerging Role of Generative Artificial Intelligence in Medical Education, Research, and Practice. Cureus 2023; 15:e40883. [PMID: 37492829 PMCID: PMC10363933 DOI: 10.7759/cureus.40883] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Accepted: 06/24/2023] [Indexed: 07/27/2023] Open
Abstract
Recent breakthroughs in generative artificial intelligence (GAI) and the emergence of transformer-based large language models such as Chat Generative Pre-trained Transformer (ChatGPT) have the potential to transform healthcare education, research, and clinical practice. This article examines the current trends in using GAI models in medicine, outlining their strengths and limitations. It is imperative to develop further consensus-based guidelines to govern the appropriate use of GAI, not only in medical education but also in research, scholarship, and clinical practice.
Affiliation(s)
- Vijay Rajput
- Medical Education, Dr. Kiran C. Patel College of Allopathic Medicine, Nova Southeastern University, Fort Lauderdale, USA
39
Deik A. Potential Benefits and Perils of Incorporating ChatGPT to the Movement Disorders Clinic. J Mov Disord 2023; 16:158-162. [PMID: 37258279 PMCID: PMC10236019 DOI: 10.14802/jmd.23072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Revised: 04/18/2023] [Accepted: 04/21/2023] [Indexed: 06/02/2023] Open
Affiliation(s)
- Andres Deik
- Parkinson’s Disease and Movement Disorders Center, Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
40
Hamed E, Eid A, Alberry M. Exploring ChatGPT's Potential in Facilitating Adaptation of Clinical Guidelines: A Case Study of Diabetic Ketoacidosis Guidelines. Cureus 2023; 15:e38784. [PMID: 37303347 PMCID: PMC10249915 DOI: 10.7759/cureus.38784] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/09/2023] [Indexed: 06/13/2023] Open
Abstract
Background This study aimed to evaluate the efficacy of ChatGPT, an advanced natural language processing model, in adapting and synthesizing clinical guidelines for diabetic ketoacidosis (DKA) by comparing and contrasting different guideline sources. Methodology We employed a comprehensive comparison approach and examined three reputable guideline sources: Diabetes Canada Clinical Practice Guidelines Expert Committee (2018), Emergency Management of Hyperglycaemia in Primary Care, and Joint British Diabetes Societies (JBDS) 02 The Management of Diabetic Ketoacidosis in Adults. Data extraction focused on diagnostic criteria, risk factors, signs and symptoms, investigations, and treatment recommendations. We compared the synthesized guidelines generated by ChatGPT and identified any misreporting or non-reporting errors. Results ChatGPT was capable of generating a comprehensive table comparing the guidelines. However, multiple recurrent errors, including misreporting and non-reporting errors, were identified, rendering the results unreliable. Additionally, inconsistencies were observed in the repeated reporting of data. The study highlights the limitations of using ChatGPT for the adaptation of clinical guidelines without expert human intervention. Conclusions Although ChatGPT demonstrates the potential for the synthesis of clinical guidelines, the presence of multiple recurrent errors and inconsistencies underscores the need for expert human intervention and validation. Future research should focus on improving the accuracy and reliability of ChatGPT, as well as exploring its potential applications in other areas of clinical practice and guideline development.
Affiliation(s)
- Ehab Hamed
- Qatar University Health Center, Primary Health Care Corporation, Doha, QAT
- Ahmad Eid
- Umm Slal Health Center, Primary Health Care Corporation, Doha, QAT