1
Carroll AN, Storms LA, Malempati C, Shanavas RV, Badarudeen S. Generative Artificial Intelligence and Prompt Engineering: A Primer for Orthopaedic Surgeons. JBJS Rev 2024; 12:01874474-202410000-00002. [PMID: 39361780 DOI: 10.2106/jbjs.rvw.24.00122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
» Generative artificial intelligence (AI), a rapidly evolving field, has the potential to revolutionize orthopedic care by enhancing diagnostic accuracy, treatment planning, and patient management through data-driven insights and personalized strategies.
» Unlike traditional AI, generative AI has the potential to generate relevant information for orthopaedic surgeons when instructed through prompts, automating tasks such as literature reviews, streamlining workflows, predicting health outcomes, and improving patient interactions.
» Prompt engineering is essential for crafting effective prompts for large language models (LLMs), ensuring accurate and reliable AI-generated outputs, and promoting ethical decision-making in clinical settings.
» Orthopaedic surgeons can choose between various prompt types, including open-ended, focused, and choice-based prompts, to tailor AI responses for specific clinical tasks to enhance the precision and utility of generated information.
» Understanding the limitations of LLMs, such as token limits, context windows, and hallucinations, is crucial for orthopaedic surgeons to effectively use generative AI while addressing ethical concerns related to bias, privacy, and accountability.
Affiliation(s)
- Amber N Carroll
- College of Medicine, University of Kentucky, Lexington, Kentucky
- Lewis A Storms
- College of Medicine, University of Kentucky, Lexington, Kentucky
- Chaitu Malempati
- Department of Orthopaedic Surgery and Sports Medicine, University of Kentucky, Lexington, Kentucky
- Sameer Badarudeen
- Department of Orthopaedic Surgery and Sports Medicine, University of Kentucky, Lexington, Kentucky
2
Chen CJ, Sobol K, Hickey C, Raphael J. The Comparative Performance of Large Language Models on the Hand Surgery Self-Assessment Examination. Hand (N Y) 2024:15589447241279460. [PMID: 39324769 PMCID: PMC11559719 DOI: 10.1177/15589447241279460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024]
Abstract
BACKGROUND: Generative artificial intelligence (AI) models have emerged as capable of producing human-like responses and have showcased their potential in general medical specialties. This study explores the performance of AI systems on the American Society for Surgery of the Hand (ASSH) Self-Assessment Exams (SAE).
METHODS: ChatGPT 4.0 and Bing AI were evaluated on a set of multiple-choice questions drawn from the ASSH SAE online question bank spanning 5 years (2019-2023). Each system was evaluated with 999 questions. Images and video links were inserted into question prompts to allow for complete AI interpretation. The performance of both systems was standardized using the May 2023 version of ChatGPT 4.0 and Microsoft Bing AI, both of which had web browsing and image capabilities.
RESULTS: ChatGPT 4.0 scored an average of 66.5% on the ASSH questions. Bing AI scored higher, with an average of 75.3%. Bing AI outperformed ChatGPT 4.0 by an average of 8.8%. As a benchmark, a minimum passing score of 50% was required for continuing medical education credit. Both ChatGPT 4.0 and Bing AI had poorer performance on video-type and image-type questions on analysis of variance testing. Responses from both models contained elements from sources such as PubMed, Journal of Hand Surgery, and American Academy of Orthopedic Surgeons.
CONCLUSIONS: ChatGPT 4.0 with browsing and Bing AI can both be anticipated to achieve passing scores on the ASSH SAE. Generative AI, with its ability to provide logical responses and literature citations, presents a convincing argument for use as an interactive learning aid and educational tool.
Affiliation(s)
- Clark J. Chen
- Albert Einstein Healthcare Network, Philadelphia, PA, USA
- Keenan Sobol
- Albert Einstein Healthcare Network, Philadelphia, PA, USA
- Connor Hickey
- Albert Einstein Healthcare Network, Philadelphia, PA, USA
- James Raphael
- Albert Einstein Healthcare Network, Philadelphia, PA, USA
3
Yao JJ, Aggarwal M, Lopez RD, Namdari S. Current Concepts Review: Large Language Models in Orthopaedics: Definitions, Uses, and Limitations. J Bone Joint Surg Am 2024:00004623-990000000-01136. [PMID: 38896652 DOI: 10.2106/jbjs.23.01417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
➤ Large language models are a subset of artificial intelligence. Large language models are powerful tools that excel in natural language text processing and generation.
➤ There are many potential clinical, research, and educational applications of large language models in orthopaedics, but the development of these applications needs to be focused on patient safety and the maintenance of high standards.
➤ There are numerous methodological, ethical, and regulatory concerns with regard to the use of large language models. Orthopaedic surgeons need to be aware of the controversies and advocate for an alignment of these models with patient and caregiver priorities.
Affiliation(s)
- Jie J Yao
- Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ryan D Lopez
- Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
- Surena Namdari
- Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
4
Pressman SM, Borna S, Gomez-Cabello CA, Haider SA, Haider C, Forte AJ. AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research. Healthcare (Basel) 2024; 12:825. [PMID: 38667587 PMCID: PMC11050155 DOI: 10.3390/healthcare12080825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/02/2024] [Accepted: 04/09/2024] [Indexed: 04/28/2024] Open
Abstract
INTRODUCTION: As large language models receive greater attention in medical research, the investigation of ethical considerations is warranted. This review aims to explore surgery literature to identify ethical concerns surrounding these artificial intelligence models and evaluate how autonomy, beneficence, nonmaleficence, and justice are represented within these ethical discussions to provide insights in order to guide further research and practice.
METHODS: A systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Five electronic databases were searched in October 2023. Eligible studies included surgery-related articles that focused on large language models and contained adequate ethical discussion. Study details, including specialty and ethical concerns, were collected.
RESULTS: The literature search yielded 1179 articles, with 53 meeting the inclusion criteria. Plastic surgery, orthopedic surgery, and neurosurgery were the most represented surgical specialties. Autonomy was the most explicitly cited ethical principle. The most frequently discussed ethical concern was accuracy (n = 45, 84.9%), followed by bias, patient confidentiality, and responsibility.
CONCLUSION: The ethical implications of using large language models in surgery are complex and evolving. The integration of these models into surgery necessitates continuous ethical discourse to ensure responsible and ethical use, balancing technological advancement with human dignity and safety.
Affiliation(s)
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Syed A. Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Clifton Haider
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN 55905, USA
- Antonio J. Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
5
Ansari N, Babaei V, Najafpour MM. Enhancing catalysis studies with chat generative pre-trained transformer (ChatGPT): Conversation with ChatGPT. Dalton Trans 2024; 53:3534-3547. [PMID: 38275279 DOI: 10.1039/d3dt04178f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024]
Abstract
The progress made in natural language processing (NLP) and large language models (LLMs), such as generative pre-trained transformers (GPT), has provided exciting opportunities for enhancing research across various fields. Within the realm of catalysis studies, GPT-driven models present valuable support in expediting the exploration and comprehension of catalytic processes. This research underscores the significance of ChatGPT in catalysis research, emphasizing its prowess as a valuable tool for furthering scientific inquiries. Using an outstanding oxygen evolution reaction (OER) catalyst as a case study, it suggests that scientists can leverage ChatGPT to extract deeper insights and brainstorm innovative approaches to grasp the mechanism better and refine current systems.
Affiliation(s)
- Navid Ansari
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Vahid Babaei
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Mohammad Mahdi Najafpour
- Department of Chemistry, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
- Center of Climate Change and Global Warming, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
- Research Center for Basic Sciences & Modern Technologies (RBST), Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
6
Andrew A. Potential applications and implications of large language models in primary care. Fam Med Community Health 2024; 12:e002602. [PMID: 38290759 PMCID: PMC10828839 DOI: 10.1136/fmch-2023-002602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 01/16/2024] [Indexed: 02/01/2024] Open
Abstract
The recent release of highly advanced generative artificial intelligence (AI) chatbots, including ChatGPT and Bard, which are powered by large language models (LLMs), has attracted growing mainstream interest in their diverse applications in clinical practice, including in health and healthcare. The potential applications of LLM-based programmes in the medical field range from assisting medical practitioners in improving their clinical decision-making and streamlining administrative paperwork to empowering patients to take charge of their own health. However, despite the broad range of benefits, the use of such AI tools also comes with several limitations and ethical concerns that warrant further consideration, encompassing issues related to privacy, data bias, and the accuracy and reliability of information generated by AI. The focus of prior research has primarily centred on the broad applications of LLMs in medicine. To the author's knowledge, this is the first article that consolidates current and pertinent literature on LLMs to examine their potential in primary care. The objectives of this paper are not only to summarise the potential benefits, risks and challenges of using LLMs in primary care, but also to offer insights into considerations that primary care clinicians should take into account when deciding to adopt and integrate such technologies into their clinical practice.
Affiliation(s)
- Albert Andrew
- Medical Student, The University of Auckland School of Medicine, Auckland, New Zealand
7
Fabijan A, Polis B, Fabijan R, Zakrzewski K, Nowosławska E, Zawadzka-Fabijan A. Artificial Intelligence in Scoliosis Classification: An Investigation of Language-Based Models. J Pers Med 2023; 13:1695. [PMID: 38138922 PMCID: PMC10744696 DOI: 10.3390/jpm13121695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 12/03/2023] [Accepted: 12/07/2023] [Indexed: 12/24/2023] Open
Abstract
Open-source artificial intelligence models are finding free application in various industries, including computer science and medicine. Their clinical potential, especially in assisting diagnosis and therapy, is the subject of increasingly intensive research. Due to the growing interest in AI for diagnostics, we conducted a study evaluating the abilities of AI models, including ChatGPT, Microsoft Bing, and Scholar AI, in classifying single-curve scoliosis based on radiological descriptions. Fifty-six posturographic images depicting single-curve scoliosis were selected and assessed by two independent neurosurgery specialists, who classified them as mild, moderate, or severe based on Cobb angles. Subsequently, descriptions were developed that accurately characterized the degree of spinal deformation, based on the measured values of Cobb angles. These descriptions were then provided to AI language models to assess their proficiency in diagnosing spinal pathologies. The artificial intelligence models conducted classification using the provided data. Our study also focused on identifying specific sources of information and criteria applied in their decision-making algorithms, aiming for a deeper understanding of the determinants influencing AI decision processes in scoliosis classification. The classification quality of the predictions was evaluated using performance evaluation metrics such as sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and balanced accuracy. Our study strongly supported our hypothesis, showing that among four AI models, ChatGPT 4 and Scholar AI Premium excelled in classifying single-curve scoliosis with perfect sensitivity and specificity. These models demonstrated unmatched rater concordance and excellent performance metrics. 
In comparing real and AI-generated scoliosis classifications, they showed impeccable precision in all posturographic images, indicating total accuracy (1.0, MAE = 0.0) and remarkable inter-rater agreement, with a perfect Fleiss' Kappa score. This was consistent across scoliosis cases with a Cobb's angle range of 11-92 degrees. Despite high accuracy in classification, each model used an incorrect angular range for the mild stage of scoliosis. Our findings highlight the immense potential of AI in analyzing medical data sets. However, the diversity in competencies of AI models indicates the need for their further development to more effectively meet specific needs in clinical practice.
Affiliation(s)
- Artur Fabijan
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Bartosz Polis
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Krzysztof Zakrzewski
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Emilia Nowosławska
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
- Agnieszka Zawadzka-Fabijan
- Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland
8
Ray PP. Revisiting the need for the use of GPT in surgery and medicine. Tech Coloproctol 2023; 27:959-960. [PMID: 37498419 DOI: 10.1007/s10151-023-02847-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 07/22/2023] [Indexed: 07/28/2023]
Affiliation(s)
- P P Ray
- Sikkim University, Gangtok, India.
9
Ghim JL, Ahn S. Transforming clinical trials: the emerging roles of large language models. Transl Clin Pharmacol 2023; 31:131-138. [PMID: 37810626 PMCID: PMC10551746 DOI: 10.12793/tcp.2023.31.e16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 09/12/2023] [Accepted: 09/14/2023] [Indexed: 10/10/2023] Open
Abstract
Clinical trials are essential for medical research, but they often face challenges in matching patients to trials and planning. Large language models (LLMs) offer a promising solution, signaling a transformative shift in the field of clinical trials. This review explores the multifaceted applications of LLMs within clinical trials, focusing on five main areas expected to be implemented in the near future: enhancing patient-trial matching, streamlining clinical trial planning, analyzing free text narratives for coding and classification, assisting in technical writing tasks, and providing cognizant consent via LLM-powered chatbots. While the application of LLMs is promising, it poses challenges such as accuracy validation and legal concerns. The convergence of LLMs with clinical trials has the potential to revolutionize the efficiency of clinical trials, paving the way for innovative methodologies and enhancing patient engagement. However, this development requires careful consideration and investment to overcome potential hurdles.
Affiliation(s)
- Jong-Lyul Ghim
- Department of Clinical Pharmacology, Inje University Busan Paik Hospital, Busan 47392, Korea
- Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 47392, Korea
- Sangzin Ahn
- Center for Personalized Precision Medicine of Tuberculosis, Inje University College of Medicine, Busan 47392, Korea
- Department of Pharmacology and Pharmacogenomics Research Center, Inje University College of Medicine, Busan 47392, Korea