1. Lam RX, Sonia He ZM, Thapar R, Wang M, Birhiray DG, Milad M, Ouellette L, Ghilzai U, Cushing TJ, Price MB, Ronna BB, Atassi OH, Perkins CH, Dawson JR, Granberry WM, Harrington MA, Dirschl DR, Deveza LR. A Review of Medical Ethics in Orthopaedic Surgery: Current Foci and Future Considerations. J Bone Joint Surg Am 2025:00004623-990000000-01457. PMID: 40403094. DOI: 10.2106/jbjs.24.01137.
Abstract
➢ Medical ethics education is a required component of orthopaedic surgery resident training per Accreditation Council for Graduate Medical Education (ACGME) guidelines, although no standardized curriculum currently exists.
➢ Beyond the 4 principles of bioethics (autonomy, beneficence, nonmaleficence, and justice), additional ethical concepts relevant to orthopaedic care include utilitarianism, deontology, virtue ethics, moral intuitionism, microethics, and narrative ethics.
➢ Ethical themes identified in the literature relevant to orthopaedic surgery include the ethics involved in medical decision-making, the use of new technologies, caring for vulnerable patients, performing high-stakes procedures, the impact of trainee status on patient care, and patient attitudes regarding conflicts of interest.
➢ Ethical themes that we sought to identify in the literature but found lacking include the ethics of providing orthopaedic care in low-resource settings, orthopaedic entrepreneurship, disability ethics, mistreatment of trainees by their supervisors, and the ethics involved in recognizing and reporting child and elder abuse.
Affiliation(s)
- Ryan X Lam
- School of Medicine, Baylor College of Medicine, Houston, Texas
- Ruhi Thapar
- School of Medicine, Baylor College of Medicine, Houston, Texas
- Maggie Wang
- School of Medicine, Baylor College of Medicine, Houston, Texas
- Matthew Milad
- School of Medicine, Baylor College of Medicine, Houston, Texas
- Umar Ghilzai
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- Tucker J Cushing
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- M Brent Price
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- Brenden B Ronna
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- Omar H Atassi
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- John R Dawson
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- William M Granberry
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- Melvyn A Harrington
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- Douglas R Dirschl
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
- Lorenzo R Deveza
- Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, Texas
2. Kalem M, Balaban K, Kocaoğlu H, Kından P, Şahin E. Evaluation of ChatGPT responses to common patient questions on ankle fusion. Foot Ankle Surg 2025:S1268-7731(25)00117-1. PMID: 40360354. DOI: 10.1016/j.fas.2025.05.001.
Abstract
BACKGROUND The study assessed the quality and readability of ChatGPT's responses to common patient questions about ankle fusion. METHODS Twenty-five frequently asked questions about ankle fusion were posed to ChatGPT 4.0 individually, and the responses were assessed using the accuracy score developed by Mika et al. and the DISCERN tool. Readability was evaluated using the Flesch-Kincaid grade level, Gunning Fog, Coleman-Liau, and Simple Measure of Gobbledygook indexes. RESULTS ChatGPT's responses were generally of acceptable quality, with a mean accuracy score of 2, indicating that responses were satisfactory and required only minimal clarification, along with a DISCERN score of 49.78, which is considered fair. However, the readability level was high, with a mean grade level of 11.6. CONCLUSIONS ChatGPT showed promise as a resource for answering common patient questions about ankle fusion, providing mostly valid information; however, a high reading level was required to understand its responses. LEVEL OF EVIDENCE N/A.
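The readability indexes named in the Methods are closed-form formulas over word, sentence, and syllable counts; the Flesch-Kincaid grade level, for example, is 0.39(words/sentences) + 11.8(syllables/words) - 15.59. Below is a minimal Python sketch of that index, assuming plain-text input and a naive vowel-group syllable counter (published tools use dictionary-based syllable counts, so exact scores will differ):

```python
import re

def count_syllables(word: str) -> int:
    # Naive vowel-group counter; dictionary-based counters are more accurate.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # treat a final silent 'e' as non-syllabic
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    # FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# Hypothetical ChatGPT-style answer, not taken from the study.
answer = ("Ankle fusion, or arthrodesis, permanently joins the bones of the "
          "ankle joint to reduce pain caused by arthritis.")
print(round(flesch_kincaid_grade(answer), 1))
```

A grade-level result near 11.6, as reported above, means the text demands roughly an 11th-to-12th-grade reading ability, well above the 6th-to-8th-grade level usually recommended for patient materials.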
Affiliation(s)
- Mahmut Kalem
- Department of Orthopedics and Traumatology, Faculty of Medicine, Ankara University, Ankara, Turkey.
- Kamil Balaban
- Department of Orthopedics and Traumatology, Ministry of Health Finike State Hospital, Antalya, Turkey.
- Hakan Kocaoğlu
- Department of Orthopedics and Traumatology, Faculty of Medicine, Ankara University, Ankara, Turkey.
- Peri Kından
- Department of Orthopedics and Traumatology, Faculty of Medicine, Ankara University, Ankara, Turkey.
- Ercan Şahin
- Department of Orthopedics and Traumatology, Faculty of Medicine, Bulent Ecevit University, Zonguldak, Turkey.
3. Milner JD, Quinn MS, Schmitt P, Knebel A, Henstenburg J, Nasreddine A, Boulos AR, Schiller JR, Eberson CP, Cruz AI. Performance of Artificial Intelligence in Addressing Questions Regarding the Management of Pediatric Supracondylar Humerus Fractures. Journal of the Pediatric Orthopaedic Society of North America 2025;11:100164. PMID: 40432855. PMCID: PMC12088213. DOI: 10.1016/j.jposna.2025.100164.
Abstract
Background The vast accessibility of artificial intelligence (AI) has enabled its utilization in medicine to improve patient education, augment patient-physician communication, support research efforts, and enhance medical student education. However, there is significant concern that these models may provide responses that are incorrect, biased, or lacking the nuance and complexity required for best-practice clinical decision-making. There is currently a paucity of literature comparing the quality and reliability of AI-generated responses. The purpose of this study was to assess the ability of ChatGPT and Gemini to generate responses addressing the 2022 American Academy of Orthopaedic Surgeons (AAOS) current practice guidelines on pediatric supracondylar humerus fractures. We hypothesized that both ChatGPT and Gemini would demonstrate high-quality, evidence-based responses with no significant difference between the models across evaluation criteria. Methods The responses from ChatGPT and Gemini to prompts based on the 14 AAOS guidelines were evaluated by seven fellowship-trained pediatric orthopaedic surgeons using a questionnaire assessing five key characteristics on a scale from 1 to 5. The prompts were categorized into nonoperative or preoperative management and diagnosis, surgical timing and technique, and rehabilitation and prevention. Statistical analysis included mean scores, standard deviations, and two-sided t-tests comparing the performance of ChatGPT and Gemini; scores were then evaluated for inter-rater reliability. Results Both models performed consistently across the criteria, with high mean scores in all criteria except evidence-based responses. Mean scores were highest for clarity (ChatGPT: 3.745 ± 0.237; Gemini: 4.388 ± 0.154) and lowest for evidence-based responses (ChatGPT: 1.816 ± 0.181; Gemini: 3.765 ± 0.229). Gemini had higher mean scores in every criterion, achieving statistically higher ratings for relevance (P = .03) and evidence-based responses (P < .001), while the two large language models (LLMs) performed comparably in accuracy, clarity, and completeness (P > .05). Conclusions ChatGPT and Gemini produced responses aligned with the 2022 AAOS current practice guidelines for pediatric supracondylar humerus fractures. Gemini outperformed ChatGPT across all criteria, with the greatest difference seen in the evidence-based category. This study emphasizes the potential for LLMs, particularly Gemini, to provide pertinent clinical information for managing pediatric supracondylar humerus fractures.
Key Concepts
(1) The accessibility of artificial intelligence has enabled its utilization in medicine to improve patient education, support research efforts, enhance medical student education, and augment patient-physician communication.
(2) There is significant concern that artificial intelligence may provide responses that are incorrect, biased, or lacking the nuance and complexity required for best-practice clinical decision-making.
(3) There is a paucity of literature comparing the quality and reliability of AI-generated responses regarding management of pediatric supracondylar humerus fractures.
(4) In our study, both ChatGPT and Gemini produced responses that were well aligned with the AAOS current practice guidelines for pediatric supracondylar humerus fractures; however, Gemini outperformed ChatGPT across all criteria, with the greatest difference in scores seen in the evidence-based category.
Level of Evidence: Level II.
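As a point of reference, the two-sided t-test described in the Methods above maps onto a standard SciPy call. The sketch below uses simulated 1-5 ratings (7 raters by 14 prompts per model, echoing the group sizes but not the actual data), and the unequal-variance Welch form is an assumption, since the abstract does not specify the variant:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated 1-5 ratings for one criterion: 7 raters x 14 prompts per model.
# Means loosely echo the reported "evidence-based" scores; not real data.
chatgpt_scores = rng.normal(loc=1.8, scale=0.5, size=7 * 14).clip(1, 5)
gemini_scores = rng.normal(loc=3.8, scale=0.5, size=7 * 14).clip(1, 5)

# Two-sided t-test; equal_var=False selects Welch's unequal-variance form.
t_stat, p_value = stats.ttest_ind(chatgpt_scores, gemini_scores, equal_var=False)
print(f"ChatGPT mean={chatgpt_scores.mean():.2f}, "
      f"Gemini mean={gemini_scores.mean():.2f}, t={t_stat:.2f}, p={p_value:.2g}")
```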
Affiliation(s)
- John D. Milner
- Department of Orthopaedic Surgery, Brown University, Warren Alpert Medical School, Providence, RI, USA
- Matthew S. Quinn
- Department of Orthopaedic Surgery, Brown University, Warren Alpert Medical School, Providence, RI, USA
- Phillip Schmitt
- Department of Orthopaedic Surgery, Brown University, Warren Alpert Medical School, Providence, RI, USA
- Ashley Knebel
- Department of Orthopaedic Surgery, Brown University, Warren Alpert Medical School, Providence, RI, USA
- Adam Nasreddine
- Division of Sports Medicine, Boston Children's Hospital, Boston, MA, USA
- Alexandre R. Boulos
- Department of Orthopaedic Surgery, Brown University, Warren Alpert Medical School, Providence, RI, USA
- Jonathan R. Schiller
- Department of Orthopaedic Surgery, Brown University, Warren Alpert Medical School, Providence, RI, USA
- Craig P. Eberson
- Department of Orthopaedic Surgery, Brown University, Warren Alpert Medical School, Providence, RI, USA
- Aristides I. Cruz
- Division of Sports Medicine, Boston Children's Hospital, Boston, MA, USA
4. Gencer B, Arzu U, Orhan SS, Dinçal T, Ekinci M. Evaluation of ChatGPT Responses About Sexual Activity After Total Hip Arthroplasty: A Comparative Study with Observers of Different Experience Levels. J Clin Med 2025;14:2942. PMID: 40363974. PMCID: PMC12072486. DOI: 10.3390/jcm14092942.
Abstract
Background/Objectives: Despite the rising tendency to rely on ChatGPT for medical counselling, it is imperative to evaluate ChatGPT's capacity to address sensitive subjects that patients often hesitate to discuss with their physicians. The objective of this study was to have orthopaedic surgeons with varying degrees of experience evaluate ChatGPT's recommendations regarding sexual activity after total hip arthroplasty (THA), alongside assessment with standardized scoring systems. Methods: Four patient scenarios were developed, reflecting different ages and indications for THA. Twenty-four questions were posed to ChatGPT 4.0, and the responses were evaluated by three orthopaedic surgeons; all responses were also scored using defined standardized scales. Results: No response was rated 'faulty' or 'partial' by any observer. The lowest mean score was given by the orthopaedic surgeon with less than five years of experience, while the highest mean score was given by the orthopaedic surgeon with more than 15 years of experience who was not actively working in the field of arthroplasty. Across scenarios, scores generally decreased as the scenarios became more specialized, although this trend was not statistically significant (p > 0.05). Conclusions: ChatGPT shows potential as a supplementary resource for addressing sensitive postoperative questions such as sexual activity after THA. However, its limitations in providing nuanced, patient-specific recommendations highlight the need for further refinement. While ChatGPT can support general patient education, expert clinical guidance remains essential for addressing complex or individualized concerns.
Affiliation(s)
- Batuhan Gencer
- Department of Orthopedics and Traumatology, Marmara University Pendik Training and Research Hospital, 34890 Istanbul, Turkey; (U.A.); (S.S.O.); (T.D.); (M.E.)
5. Heisinger S, Salzmann SN, Senker W, Aspalter S, Oberndorfer J, Matzner MP, Stienen MN, Motov S, Huber D, Grohs JG. ChatGPT's Performance in Spinal Metastasis Cases-Can We Discuss Our Complex Cases with ChatGPT? J Clin Med 2024;13:7864. PMID: 39768787. PMCID: PMC11727723. DOI: 10.3390/jcm13247864.
Abstract
Background: The integration of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT-4, is transforming healthcare. ChatGPT's potential to assist in decision-making for complex cases, such as spinal metastasis treatment, is promising but largely untested. Precise and personalized treatment is essential, especially for cancer patients who develop spinal metastases. This study examines ChatGPT-4's performance in treatment planning for spinal metastasis cases compared with experienced spine surgeons. Materials and Methods: Five spinal metastasis cases were randomly selected from the recent literature. Five spine surgeons and ChatGPT-4 were then asked to provide treatment recommendations for each case in a standardized manner. Responses were analyzed for frequency distribution, inter-rater agreement, and subjective rater opinions. Results: ChatGPT's treatment recommendations aligned with the majority of human raters in 73% of treatment choices, with moderate to substantial agreement on systemic therapy, pain management, and supportive care. However, ChatGPT tended towards generalized statements, which the raters noted throughout. Agreement among raters improved in sensitivity analyses excluding ChatGPT, particularly for controversial areas such as surgical intervention and palliative care. Conclusions: ChatGPT shows potential in aligning with experienced surgeons on certain aspects of spinal metastasis treatment. However, its generalized approach highlights limitations, suggesting that training with specific clinical guidelines could enhance its utility in complex case management. Further studies are necessary to refine AI applications in personalized healthcare decision-making.
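The "moderate to substantial agreement" phrasing follows the Landis-Koch convention for chance-corrected agreement coefficients (0.41-0.60 moderate, 0.61-0.80 substantial). The abstract does not name the statistic used, so the sketch below assumes pairwise Cohen's kappa on invented categorical treatment choices:

```python
from sklearn.metrics import cohen_kappa_score

# Invented treatment choices for five cases from two raters; the labels
# and values are illustrative only and do not come from the study.
surgeon = ["surgery", "radiotherapy", "systemic", "palliative", "surgery"]
chatgpt = ["surgery", "radiotherapy", "systemic", "surgery", "surgery"]

# Cohen's kappa corrects raw percent agreement for agreement expected
# by chance; here 4/5 raw agreement yields kappa of about 0.71.
print(f"kappa = {cohen_kappa_score(surgeon, chatgpt):.2f}")
```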
Affiliation(s)
- Stephan Heisinger
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Stephan N. Salzmann
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Wolfgang Senker
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Stefan Aspalter
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Johannes Oberndorfer
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Michael P. Matzner
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Martin N. Stienen
- Spine Center of Eastern Switzerland & Department of Neurosurgery, Kantonsspital St. Gallen, Medical School of St. Gallen, University of St. Gallen, 9000 St. Gallen, Switzerland
- Stefan Motov
- Spine Center of Eastern Switzerland & Department of Neurosurgery, Kantonsspital St. Gallen, Medical School of St. Gallen, University of St. Gallen, 9000 St. Gallen, Switzerland
- Dominikus Huber
- Division of Oncology, Department of Medicine I, Medical University of Vienna, 1090 Vienna, Austria
- Josef Georg Grohs
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
6. Frodl A, Fuchs A, Yilmaz T, Izadpanah K, Schmal H, Siegel M. ChatGPT as a Source for Patient Information on Patellofemoral Surgery-A Comparative Study Amongst Laymen, Doctors, and Experts. Clin Pract 2024;14:2376-2384. PMID: 39585014. PMCID: PMC11587161. DOI: 10.3390/clinpract14060186.
Abstract
INTRODUCTION In November 2022, OpenAI launched ChatGPT for public use through a free online platform. ChatGPT is an artificial intelligence (AI) chatbot trained on a broad dataset encompassing a wide range of topics, including the medical literature. Its usability in the medical field and the quality of AI-generated responses are widely discussed and the subject of current investigations. Patellofemoral pain is one of the most common conditions among young adults, often prompting patients to seek advice. This study examines the quality of ChatGPT as a source of information on patellofemoral conditions and surgery, hypothesizing that populations with different levels of expertise in patellofemoral disorders would evaluate ChatGPT-generated responses differently. METHODS A comparison was conducted between laymen, doctors (non-orthopedic), and experts in patellofemoral disorders based on a list of 12 questions. These questions were divided into descriptive and recommendatory categories, with each category further split into basic and advanced content. The questions were used to prompt ChatGPT in April 2024 using the ChatGPT 4.0 engine, and the answers were evaluated with a custom tool inspired by the Ensuring Quality Information for Patients (EQIP) instrument. Evaluations were performed independently by laymen, non-orthopedic doctors, and experts, and the results were statistically analyzed using the Mann-Whitney U test; a p-value of less than 0.05 was considered statistically significant. RESULTS The study included data from seventeen participants: four experts in patellofemoral disorders, seven non-orthopedic doctors, and six laymen. Experts rated the answers lower on average than non-experts, with significant differences observed in the ratings of descriptive answers as complexity increased. The average score for experts was 29.3 ± 5.8, whereas non-experts averaged 35.3 ± 5.7. For recommendatory answers, experts also gave lower ratings, particularly for more complex questions. CONCLUSION ChatGPT provides good-quality answers to questions concerning patellofemoral disorders, although answers to more complex questions were rated lower by patellofemoral experts than by non-experts. This study highlights the potential of ChatGPT as a complementary tool for patient information on patellofemoral disorders, although answer quality fluctuates with question complexity, which non-experts might not recognize. The lack of personalized recommendations and the problem of "AI hallucinations" remain challenges; human expertise and judgement, especially from trained healthcare experts, remain irreplaceable.
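For reference, the Mann-Whitney U test named in the Methods corresponds to a standard SciPy call. The rating vectors below are invented, matching only the group sizes (4 experts vs. 13 non-experts), not the study's scores:

```python
from scipy.stats import mannwhitneyu

# Invented EQIP-style sum scores for one answer; only the group sizes
# (4 experts vs. 7 doctors + 6 laymen) mirror the study.
expert_scores = [28, 31, 24, 34]
non_expert_scores = [37, 33, 40, 29, 36, 35, 38, 31, 34, 39, 32, 33, 36]

# Rank-based two-sided test; suited to small ordinal samples where a
# t-test's normality assumption is doubtful.
u_stat, p_value = mannwhitneyu(expert_scores, non_expert_scores,
                               alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```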
Affiliation(s)
- Andreas Frodl
- Department of Orthopedic Surgery and Traumatology, Freiburg University Hospital, Albert Ludwigs University Freiburg, Hugstetter Straße 55, 79106 Freiburg, Germany
- Andreas Fuchs
- Department of Orthopedic Surgery and Traumatology, Freiburg University Hospital, Albert Ludwigs University Freiburg, Hugstetter Straße 55, 79106 Freiburg, Germany
- Tayfun Yilmaz
- Department of Orthopedic Surgery and Traumatology, Freiburg University Hospital, Albert Ludwigs University Freiburg, Hugstetter Straße 55, 79106 Freiburg, Germany
- Kaywan Izadpanah
- Department of Orthopedic Surgery and Traumatology, Freiburg University Hospital, Albert Ludwigs University Freiburg, Hugstetter Straße 55, 79106 Freiburg, Germany
- Hagen Schmal
- Department of Orthopedic Surgery and Traumatology, Freiburg University Hospital, Albert Ludwigs University Freiburg, Hugstetter Straße 55, 79106 Freiburg, Germany
- Department of Orthopedic Surgery, University Hospital Odense, Sdr. Boulevard 29, 5000 Odense, Denmark
- Markus Siegel
- Department of Orthopedic Surgery and Traumatology, Freiburg University Hospital, Albert Ludwigs University Freiburg, Hugstetter Straße 55, 79106 Freiburg, Germany