1
Fiedler B, Azua EN, Phillips T, Ahmed AS. ChatGPT performance on the American Shoulder and Elbow Surgeons maintenance of certification exam. J Shoulder Elbow Surg 2024; 33:1888-1893. [PMID: 38580067 DOI: 10.1016/j.jse.2024.02.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 01/24/2024] [Accepted: 02/12/2024] [Indexed: 04/07/2024]
Abstract
BACKGROUND While multiple studies have tested the ability of large language models (LLMs), such as ChatGPT, to pass standardized medical exams at different levels of training, LLMs have never been tested on surgical sub-specialty examinations, such as the American Shoulder and Elbow Surgeons (ASES) Maintenance of Certification (MOC). The purpose of this study was to compare results of ChatGPT 3.5, GPT-4, and fellowship-trained surgeons on the 2023 ASES MOC self-assessment exam. METHODS ChatGPT 3.5 and GPT-4 were subjected to the same set of text-only questions from the ASES MOC exam, and GPT-4 was additionally subjected to image-based MOC exam questions. Question responses from both models were compared against the correct answers. Performance of both models was compared to corresponding average human performance on the same question subsets. One-sided proportion z-tests were used to analyze the data. RESULTS Humans performed significantly better than ChatGPT 3.5 on exclusively text-based questions (76.4% vs. 60.8%, P = .044). Humans also performed significantly better than GPT-4 on image-based questions (73.9% vs. 53.2%, P = .019). There was no significant difference between humans and GPT-4 on text-based questions (76.4% vs. 66.7%, P = .136). Accounting for all questions, humans significantly outperformed GPT-4 (75.3% vs. 60.2%, P = .012). GPT-4 did not perform significantly better than ChatGPT 3.5 on text-only questions (66.7% vs. 60.8%, P = .268). DISCUSSION Although human performance was overall superior, ChatGPT demonstrated the capacity to analyze orthopedic information and answer specialty-specific questions on the ASES MOC exam for both text and image-based questions. With continued advancements in deep learning, LLMs may someday rival exam performance of fellowship-trained surgeons.
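The one-sided proportion z-test named in METHODS can be illustrated with a minimal sketch. The abstract does not report the number of questions in each subset, so the counts used here are hypothetical placeholders, and the pooled-variance form of the test is an assumption about the variant the authors applied. The function name is illustrative only.

```python
import math


def one_sided_two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Test H1: proportion A > proportion B using a pooled standard error.

    Returns (z, p_value), where p_value is the upper-tail probability
    of the standard normal distribution at z.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both groups are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Upper-tail normal probability: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts (NOT from the study): 80/100 correct vs. 60/100 correct.
z, p = one_sided_two_proportion_z_test(80, 100, 60, 100)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

Because the test is one-sided, swapping the two groups changes the p-value to its complement, so the direction of the hypothesis has to be fixed before the data are examined.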
Affiliation(s)
- Benjamin Fiedler
  - Baylor College of Medicine, Joseph Barnhart Department of Orthopedic Surgery, Houston, TX, USA
- Eric N Azua
  - Baylor College of Medicine, Joseph Barnhart Department of Orthopedic Surgery, Houston, TX, USA
- Todd Phillips
  - Baylor College of Medicine, Joseph Barnhart Department of Orthopedic Surgery, Houston, TX, USA
- Adil Shahzad Ahmed
  - Baylor College of Medicine, Joseph Barnhart Department of Orthopedic Surgery, Houston, TX, USA
2
Sassi M, Villa Corta M, Pisani MG, Nicodemi G, Schena E, Pecchia L, Longo UG. Advanced Home-Based Shoulder Rehabilitation: A Systematic Review of Remote Monitoring Devices and Their Therapeutic Efficacy. Sensors (Basel) 2024; 24:2936. [PMID: 38733040 PMCID: PMC11086333 DOI: 10.3390/s24092936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 04/30/2024] [Accepted: 05/02/2024] [Indexed: 05/13/2024]
Abstract
Shoulder pain represents the most frequently reported musculoskeletal disorder, often leading to significant functional impairment and pain, impacting quality of life. Home-based rehabilitation programs offer a more accessible and convenient route to effective treatment of shoulder disorders, addressing logistical and financial constraints associated with traditional physiotherapy. The aim of this systematic review is to report the monitoring devices currently proposed and tested for shoulder rehabilitation in home settings. The research question was formulated using the PICO approach, and the PRISMA guidelines were applied to ensure a transparent methodology for the systematic review process. A comprehensive search of PubMed and Scopus was conducted, and studies published from 2014 through 2023 were included. Three different tools (i.e., the RoB 2 version of the Cochrane risk-of-bias tool, the Joanna Briggs Institute (JBI) Critical Appraisal tool, and the ROBINS-I tool) were used to assess the risk of bias. Fifteen studies fulfilled the inclusion criteria and were included. The results showed that wearable systems represent a promising solution as remote monitoring technologies, offering quantitative and clinically meaningful insights into the progress of individuals within a rehabilitation pathway. Recent trends indicate a growing use of low-cost, non-intrusive visual tracking devices, such as camera-based monitoring systems, within the domain of tele-rehabilitation. The integration of home-based monitoring devices alongside traditional rehabilitation methods is attracting significant attention, offering broader access to high-quality care and potentially reducing healthcare costs associated with in-person therapy.
Affiliation(s)
- Martina Sassi
  - Department of Engineering, Università Campus Bio-Medico di Roma, Via Álvaro del Portillo, 21, 00128 Rome, Italy; (M.S.); (E.S.); (L.P.)
  - Fondazione Policlinico Universitario Campus Bio-Medico di Roma, Via Álvaro del Portillo, 200, 00128 Rome, Italy
- Mariajose Villa Corta
  - Research Unit of Orthopaedic and Trauma Surgery, Department of Medicine and Surgery, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo, 21, 00128 Rome, Italy; (M.V.C.); (M.G.P.); (G.N.)
- Matteo Giuseppe Pisani
  - Research Unit of Orthopaedic and Trauma Surgery, Department of Medicine and Surgery, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo, 21, 00128 Rome, Italy; (M.V.C.); (M.G.P.); (G.N.)
- Guido Nicodemi
  - Research Unit of Orthopaedic and Trauma Surgery, Department of Medicine and Surgery, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo, 21, 00128 Rome, Italy; (M.V.C.); (M.G.P.); (G.N.)
- Emiliano Schena
  - Department of Engineering, Università Campus Bio-Medico di Roma, Via Álvaro del Portillo, 21, 00128 Rome, Italy; (M.S.); (E.S.); (L.P.)
- Leandro Pecchia
  - Department of Engineering, Università Campus Bio-Medico di Roma, Via Álvaro del Portillo, 21, 00128 Rome, Italy; (M.S.); (E.S.); (L.P.)
- Umile Giuseppe Longo
  - Fondazione Policlinico Universitario Campus Bio-Medico di Roma, Via Álvaro del Portillo, 200, 00128 Rome, Italy
  - Research Unit of Orthopaedic and Trauma Surgery, Department of Medicine and Surgery, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo, 21, 00128 Rome, Italy; (M.V.C.); (M.G.P.); (G.N.)
3
Yang L, Oeding JF, de Marinis R, Marigi E, Sanchez-Sotelo J. Deep learning to automatically classify very large sets of preoperative and postoperative shoulder arthroplasty radiographs. J Shoulder Elbow Surg 2024; 33:773-780. [PMID: 37879598 DOI: 10.1016/j.jse.2023.09.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/15/2023] [Revised: 09/06/2023] [Accepted: 09/10/2023] [Indexed: 10/27/2023]
Abstract
BACKGROUND Joint arthroplasty registries usually lack information on medical imaging owing to the laborious process of observing and recording, as well as the lack of standard methods to transfer the imaging information to the registries, which can limit the investigation of various research questions. Artificial intelligence (AI) algorithms can automate imaging-feature identification with high accuracy and efficiency. With the purpose of enriching shoulder arthroplasty registries with organized imaging information, it was hypothesized that an automated AI algorithm could be developed to classify and organize preoperative and postoperative radiographs from shoulder arthroplasty patients according to laterality, radiographic projection, and implant type. METHODS This study used a cohort of 2303 shoulder radiographs from 1724 shoulder arthroplasty patients. Two observers manually labeled all radiographs according to (1) laterality (left or right), (2) projection (anteroposterior, axillary, or lateral), and (3) whether the radiograph was a preoperative radiograph or showed an anatomic total shoulder arthroplasty or a reverse shoulder arthroplasty. All these labeled radiographs were randomly split into developmental and testing sets at the patient level and based on stratification. By use of 10-fold cross-validation, a 3-task deep-learning algorithm was trained on the developmental set to classify the 3 aforementioned characteristics. The trained algorithm was then evaluated on the testing set using quantitative metrics and visual evaluation techniques. RESULTS The trained algorithm perfectly classified laterality (F1 scores [harmonic mean values of precision and sensitivity] of 100% on the testing set). When classifying the imaging projection, the algorithm achieved F1 scores of 99.2%, 100%, and 100% on anteroposterior, axillary, and lateral views, respectively. When classifying the implant type, the model achieved F1 scores of 100%, 95.2%, and 100% on preoperative radiographs, anatomic total shoulder arthroplasty radiographs, and reverse shoulder arthroplasty radiographs, respectively. Visual evaluation using integrated maps showed that the algorithm focused on the relevant patient body and prosthesis parts for classification. It took the algorithm 20.3 seconds to analyze 502 images. CONCLUSIONS We developed an efficient, accurate, and reliable AI algorithm to automatically identify key imaging features of laterality, imaging view, and implant type in shoulder radiographs. This algorithm represents the first step to automatically classify and organize shoulder radiographs on a large scale in very little time, which will profoundly enrich shoulder arthroplasty registries.
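The bracketed definition in RESULTS (F1 as the harmonic mean of precision and sensitivity) can be written out as a short sketch. The function name and the counts in the usage line are illustrative assumptions; the abstract reports only the final F1 percentages, not the underlying confusion-matrix counts.

```python
def f1_score_from_counts(true_pos, false_pos, false_neg):
    """Compute F1 for one class as the harmonic mean of precision and sensitivity."""
    if true_pos == 0:
        # No correct positive predictions: F1 is conventionally taken as 0 here.
        return 0.0
    precision = true_pos / (true_pos + false_pos)    # correct among predicted positives
    sensitivity = true_pos / (true_pos + false_neg)  # correct among actual positives (recall)
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts (NOT from the study) for a single class.
print(f1_score_from_counts(true_pos=8, false_pos=2, false_neg=2))
```

For a multi-class task such as projection or implant type, a per-class F1 like this would be computed by treating each class in turn as the positive class.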
Affiliation(s)
- Linjun Yang
  - Orthopedic Surgery Artificial Intelligence Laboratory, Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA; Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Jacob F Oeding
  - Orthopedic Surgery Artificial Intelligence Laboratory, Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Rodrigo de Marinis
  - Orthopedic Surgery Artificial Intelligence Laboratory, Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Erick Marigi
  - Orthopedic Surgery Artificial Intelligence Laboratory, Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Joaquin Sanchez-Sotelo
  - Orthopedic Surgery Artificial Intelligence Laboratory, Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
4
Oeding JF, Yang L, Sanchez-Sotelo J, Camp CL, Karlsson J, Samuelsson K, Pearle AD, Ranawat AS, Kelly BT, Pareek A. A practical guide to the development and deployment of deep learning models for the orthopaedic surgeon: Part III, focus on registry creation, diagnosis, and data privacy. Knee Surg Sports Traumatol Arthrosc 2024; 32:518-528. [PMID: 38426614 DOI: 10.1002/ksa.12085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/22/2024] [Accepted: 01/23/2024] [Indexed: 03/02/2024]
Abstract
Deep learning is a subset of artificial intelligence (AI) with enormous potential to transform orthopaedic surgery. As has already become evident with the deployment of Large Language Models (LLMs) like ChatGPT (OpenAI Inc.), deep learning can rapidly enter clinical and surgical practices. As such, it is imperative that orthopaedic surgeons acquire a deeper understanding of the technical terminology, capabilities and limitations associated with deep learning models. The focus of this series thus far has been providing surgeons with an overview of the steps needed to implement a deep learning-based pipeline, emphasizing some of the important technical details for surgeons to understand as they encounter, evaluate or lead deep learning projects. However, this series would be remiss without providing practical examples of how deep learning models have begun to be deployed and highlighting the areas where the authors feel deep learning may have the most profound potential. While computer vision applications of deep learning were the focus of Parts I and II, due to the enormous impact that natural language processing (NLP) has had in recent months, NLP-based deep learning models are also discussed in this final part of the series. In this review, three applications that the authors believe can be impacted the most by deep learning but with which many surgeons may not be familiar are discussed: (1) registry construction, (2) diagnostic AI and (3) data privacy. Deep learning-based registry construction will be essential for the development of more impactful clinical applications, with diagnostic AI being one of those applications likely to augment clinical decision-making in the near future. As the applications of deep learning continue to grow, the protection of patient information will become increasingly essential; as such, applications of deep learning to enhance data privacy are likely to become more important than ever before. Level of Evidence: Level IV.
Affiliation(s)
- Jacob F Oeding
  - School of Medicine, Mayo Clinic Alix School of Medicine, Rochester, Minnesota, USA
  - Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Linjun Yang
  - Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, Minnesota, USA
- Christopher L Camp
  - Department of Orthopedic Surgery, Mayo Clinic, Rochester, Minnesota, USA
- Jón Karlsson
  - Department of Orthopaedics, Sahlgrenska University Hospital, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
- Kristian Samuelsson
  - Department of Orthopaedics, Sahlgrenska University Hospital, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
- Andrew D Pearle
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
- Anil S Ranawat
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
- Bryan T Kelly
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
- Ayoosh Pareek
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
5
Oeding JF, Krych AJ, Pearle AD, Kelly BT, Kunze KN. Medical Imaging Applications Developed Using Artificial Intelligence Demonstrate High Internal Validity Yet Are Limited in Scope and Lack External Validation. Arthroscopy 2024:S0749-8063(24)00099-9. [PMID: 38325497 DOI: 10.1016/j.arthro.2024.01.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 01/21/2024] [Accepted: 01/29/2024] [Indexed: 02/09/2024]
Abstract
PURPOSE To (1) review definitions and concepts necessary to interpret applications of deep learning (DL; a domain of artificial intelligence that leverages neural networks to make predictions on media inputs such as images) and (2) identify knowledge and translational gaps in the literature to provide insight into specific areas for improvement as adoption of this technology continues. METHODS A comprehensive search of the literature was performed in December 2023 for articles regarding the use of DL in sports medicine. For each study, information regarding the joint of focus, specific anatomic structure/pathology to which DL was applied, imaging modality utilized, source of images used for model training and testing, data set size, model performance, and whether the DL model was externally validated was recorded. A numerical scale was used to rate each DL model's clinical impact, with 1 corresponding to proof-of-concept studies with little to no direct clinical impact and 5 corresponding to practice-changing clinical impact and ready for clinical deployment. RESULTS Fifty-five studies were identified, all of which were published within the past 5 years, while 82% were published within the past 3 years. Of the DL models identified, 84% were developed for classification tasks, 9% for automated measurements, and 7% for segmentation. A total of 62% of studies utilized magnetic resonance imaging as the imaging modality, 25% radiographs, and 7% ultrasound, while 1 study each used computed tomography, arthroscopic images, or arthroscopic video. Sixty-five percent of studies focused on the detection of tears (anterior cruciate ligament [ACL], rotator cuff [RC], and meniscus). Diagnostic performance, as determined by the area under the receiver operating characteristic curve (AUROC), ranged from 0.81 to 0.99 for ACL tears (excellent to near perfect), from 0.83 to 0.94 for RC tears (excellent), and from 0.75 to 0.96 for meniscus tears (acceptable to excellent). In addition, 3 studies focused on detection of cartilage lesions had AUROC ranging from 0.90 to 0.92 (excellent performance). However, only 4 (7%) studies externally validated their models, suggesting that they may not be generalizable or may not perform well when applied to populations other than that used to develop the model. Finally, the mean clinical impact score was 2 (range, 1-3) on a scale of 1 to 5, corresponding to limited clinical applicability. CONCLUSIONS DL models in orthopaedic sports medicine show generally excellent performance (high internal validity) but require external validation to facilitate clinical deployment. In addition, current models have low clinical applicability and fail to advance the field due to a focus on routine tasks and a narrow conceptual framework. LEVEL OF EVIDENCE Level IV, scoping review of Level I to IV studies.
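The AUROC values reported in RESULTS can be read as the probability that a model scores a randomly chosen positive case (for example, a torn ligament) higher than a randomly chosen negative case. The sketch below computes AUROC from that pairwise definition. It is an illustration only: the function name, labels, and scores are hypothetical, and the reviewed studies may have computed the metric with other tooling.

```python
def auroc_pairwise(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    labels: iterable of 0 (negative) or 1 (positive).
    scores: model outputs, where a higher score means "more likely positive".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical labels and scores (NOT from any reviewed study).
print(auroc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration; rank-based formulations give the same value more efficiently on large test sets.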
Affiliation(s)
- Jacob F Oeding
  - Mayo Clinic Alix School of Medicine, Rochester, Minnesota, U.S.A.
- Aaron J Krych
  - Department of Orthopaedic Surgery, Mayo Clinic, Rochester, Minnesota, U.S.A.
- Andrew D Pearle
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, U.S.A.
- Bryan T Kelly
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, U.S.A.
- Kyle N Kunze
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, U.S.A.
6
Bi AS, Kunze KN, Jazrawi LM. Editorial Commentary: Artificial Intelligence Models Show Impressive Results for Musculoskeletal Pathology Detection. Arthroscopy 2024; 40:579-580. [PMID: 38296452 DOI: 10.1016/j.arthro.2023.07.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 02/08/2024]
Abstract
An important domain of artificial intelligence is deep learning, which comprises computer vision tasks used for recognizing complex patterns in orthopaedic imaging, thus automating the identification of pathology. Purported benefits include an expedited clinical workflow; improved performance and consistency in diagnostic tasks; decreased time allocation burden; augmentation of diagnostic performance; decreased inter-reader discrepancies in measurements and diagnosis as a function of reducing subjectivity in the setting of differences in imaging quality, resolution, penetrance, or orientation; and the ability to function autonomously without rest (unlike human observers). Detection may include the presence or absence of an entity or identification of a specific landmark. Within the field of musculoskeletal health, such capabilities have been shown across a wide range of tasks such as detecting the presence or absence of a rotator cuff tear or automatically identifying the center of the hip joint. The clinical relevance and success of these research endeavors have led to a plethora of novel algorithms. However, few of these algorithms have been externally validated, and evidence remains inconclusive as to whether they provide a diagnostic benefit when compared with the current, human gold standard.
7
Kunze KN, Williams RJ, Ranawat AS, Pearle AD, Kelly BT, Karlsson J, Martin RK, Pareek A. Artificial intelligence (AI) and large data registries: Understanding the advantages and limitations of contemporary data sets for use in AI research. Knee Surg Sports Traumatol Arthrosc 2024; 32:13-18. [PMID: 38226678 DOI: 10.1002/ksa.12018] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/09/2023] [Accepted: 11/27/2023] [Indexed: 01/17/2024]
Affiliation(s)
- Kyle N Kunze
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
- Riley J Williams
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
- Anil S Ranawat
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
- Andrew D Pearle
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
- Bryan T Kelly
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
- Jon Karlsson
  - Department of Orthopaedics, Sahlgrenska University Hospital, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
- R Kyle Martin
  - Department of Orthopedic Surgery, University of Minnesota, Minneapolis, Minnesota, USA
- Ayoosh Pareek
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
  - Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA