Reference Citation Analysis

For: Groot OQ, Bindels BJJ, Ogink PT, Kapoor ND, Twining PK, Collins AK, Bongers MER, Lans A, Oosterhoff JHF, Karhade AV, Verlaan JJ, Schwab JH. Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review. Acta Orthop 2021;92:385-393. [PMID: 33870837 PMCID: PMC8436968 DOI: 10.1080/17453674.2021.1910448] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/31/2023]

Cited by Other Article(s)

Farrow L, Zhong M, Anderson L. Use of natural language processing techniques to predict patient selection for total hip and knee arthroplasty from radiology reports. Bone Joint J 2024;106-B:688-695. [PMID: 38945535 DOI: 10.1302/0301-620x.106b7.bjj-2024-0136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]

Abstract

Aims

To examine whether natural language processing (NLP) using a clinically based large language model (LLM) could be used to predict patient selection for total hip or total knee arthroplasty (THA/TKA) from routinely available free-text radiology reports.

Methods

Data pre-processing and analyses were conducted according to the Artificial intelligence to Revolutionize the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project protocol. This included use of de-identified Scottish regional clinical data of patients referred for consideration of THA/TKA, held in a secure data environment designed for artificial intelligence (AI) inference. Only preoperative radiology reports were included. NLP algorithms were based on the freely available GatorTron model, an LLM trained on over 82 billion words of de-identified clinical text. Two inference tasks were performed: assessment after model fine-tuning (50 epochs and three cycles of k-fold cross-validation), and external validation.

Results

For THA, there were 5,558 patient radiology reports included, of which 4,137 were used for model training and testing, and 1,421 for external validation. Following training, model performance demonstrated average (mean across three folds) accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC) values of 0.850 (95% confidence interval (CI) 0.833 to 0.867), 0.813 (95% CI 0.785 to 0.841), and 0.847 (95% CI 0.822 to 0.872), respectively. For TKA, 7,457 patient radiology reports were included, with 3,478 used for model training and testing, and 3,152 for external validation. Performance metrics included accuracy, F1 score, and AUROC values of 0.757 (95% CI 0.702 to 0.811), 0.543 (95% CI 0.479 to 0.607), and 0.717 (95% CI 0.657 to 0.778), respectively. There was a notable deterioration in performance on external validation in both cohorts.

Conclusion

The use of routinely available preoperative radiology reports provides promising potential to help screen suitable candidates for THA, but not for TKA. The external validation results demonstrate the importance of further model testing and training when confronted with new clinical cohorts.
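
The fold-averaged metrics reported in this abstract (accuracy, F1 score, AUROC) can be illustrated with a minimal sketch. The abstract does not state how the confidence intervals were derived, so the normal-approximation interval below is an assumption, and with only three folds it is a rough guide at best. All data, the 0.5 decision threshold, and the function names are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_with_ci(values, z=1.96):
    """Mean across folds with an assumed normal-approximation 95% interval."""
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical held-out labels and predicted probabilities for three folds.
folds = [
    ([1, 0, 1, 0, 1], [0.9, 0.2, 0.6, 0.4, 0.3]),
    ([1, 1, 0, 0, 0], [0.8, 0.7, 0.6, 0.1, 0.2]),
    ([0, 1, 0, 1, 1], [0.3, 0.9, 0.4, 0.8, 0.55]),
]

fold_auroc = [auroc(y, s) for y, s in folds]
fold_acc = [accuracy(y, [int(p >= 0.5) for p in s]) for y, s in folds]
print("AUROC (mean, lower, upper):", mean_with_ci(fold_auroc))
print("Accuracy (mean, lower, upper):", mean_with_ci(fold_acc))
```

The pairwise AUROC computation is quadratic in the number of cases; it is used here only because it makes the definition explicit.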

la Roi-Teeuw HM, van Royen FS, de Hond A, Zahra A, de Vries S, Bartels R, Carriero AJ, van Doorn S, Dunias ZS, Kant I, Leeuwenberg T, Peters R, Veerhoek L, van Smeden M, Luijken K. Don't be misled: 3 misconceptions about external validation of clinical prediction models. J Clin Epidemiol 2024;172:111387. [PMID: 38729274 DOI: 10.1016/j.jclinepi.2024.111387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/24/2024] [Accepted: 05/02/2024] [Indexed: 05/12/2024]

Affiliation(s)

Hannah M la Roi-Teeuw Department of General Practice and Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands.
Florien S van Royen Department of General Practice and Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Anne de Hond Department of Epidemiology and Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Anum Zahra Department of Epidemiology and Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Sjoerd de Vries Department of Digital Health, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands; Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
Richard Bartels Department of Digital Health, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands; Department of Data Science and Biostatistics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Alex J Carriero Department of Epidemiology and Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Sander van Doorn Department of General Practice and Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Zoë S Dunias Department of Epidemiology and Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Ilse Kant Department of Digital Health, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Tuur Leeuwenberg Department of Epidemiology and Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Ruben Peters Department of Digital Health, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Laura Veerhoek Department of Digital Health, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Maarten van Smeden Department of Epidemiology and Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands; Department of Data Science and Biostatistics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
Kim Luijken Department of Epidemiology and Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands

Karimi AH, Langberg J, Malige A, Rahman O, Abboud JA, Stone MA. Accuracy of machine learning to predict the outcomes of shoulder arthroplasty: a systematic review. Arthroplasty 2024;6:26. [PMID: 38702749 PMCID: PMC11069283 DOI: 10.1186/s42836-024-00244-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Accepted: 02/26/2024] [Indexed: 05/06/2024]

Cho JH, Kim M, Nam HS, Park SY, Lee YS. Age and medial compartmental OA were important predictors of the lateral compartmental OA in the discoid lateral meniscus: Analysis using machine learning approach. Knee Surg Sports Traumatol Arthrosc 2024. [PMID: 38651559 DOI: 10.1002/ksa.12196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 03/16/2024] [Accepted: 03/28/2024] [Indexed: 04/25/2024]

Abstract

PURPOSE

The objective of this study was to develop a machine learning model to predict lateral compartment osteoarthritis (OA) in the discoid lateral meniscus (DLM) and to use it to identify factors contributing to lateral compartment OA, with a key focus on the patient's age.

METHODS

Data were collected from 611 patients with symptomatic DLM diagnosed using magnetic resonance imaging between April 2003 and May 2022. Twenty features, including demographic, clinical and radiological data, and six algorithms were used to develop the predictive machine learning models. Shapley additive explanation (SHAP) analysis was performed on the best model, in addition to subgroup analyses according to age.

RESULTS

The extreme gradient boosting classifier was identified as the best prediction model, with an area under the receiver operating characteristic curve (AUROC) of 0.968, the highest among all the models, regardless of age (AUROC of 0.977 in the younger age group and 0.937 in the older age group). In the SHAP analysis, the most predictive feature was age, followed by the presence of medial compartment OA. In the subgroup analysis, the most predictive feature was age in the younger age group, whereas it was the presence of medial compartment OA in the older age group.

CONCLUSION

The machine learning model developed in this study showed a high predictive performance with regard to predicting lateral compartment OA of the DLM. Age was identified as the most important factor, followed by medial compartment OA. In subgroup analysis, medial compartmental OA was found to be the most important factor in the older age group, whereas age remained the most important factor in the younger age group. These findings provide insights that may prove useful for the establishment of strategies for the treatment of patients with symptomatic DLM.

LEVEL OF EVIDENCE

Level III.

Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, van Smeden M, Boulesteix AL, Camaradou JC, Celi LA, Denaxas S, Denniston AK, Glocker B, Golub RM, Harvey H, Heinze G, Hoffman MM, Kengne AP, Lam E, Lee N, Loder EW, Maier-Hein L, Mateen BA, McCradden MD, Oakden-Rayner L, Ordish J, Parnell R, Rose S, Singh K, Wynants L, Logullo P. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:e078378. [PMID: 38626948 PMCID: PMC11019967 DOI: 10.1136/bmj-2023-078378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/17/2024] [Indexed: 04/19/2024]

Affiliation(s)

Gary S Collins Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
Karel G M Moons Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
Paula Dhiman Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
Richard D Riley Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
Andrew L Beam Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA
Ben Van Calster Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Biomedical Data Science, Leiden University Medical Centre, Leiden, Netherlands
Marzyeh Ghassemi Department of Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Xiaoxuan Liu Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
Johannes B Reitsma Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
Maarten van Smeden Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
Anne-Laure Boulesteix Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University of Munich and Munich Centre of Machine Learning, Germany
Jennifer Catherine Camaradou Patient representative, Health Data Research UK patient and public involvement and engagement group; Patient representative, University of East Anglia, Faculty of Health Sciences, Norwich Research Park, Norwich, UK
Leo Anthony Celi Beth Israel Deaconess Medical Center, Boston, MA, USA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA
Spiros Denaxas Institute of Health Informatics, University College London, London, UK; British Heart Foundation Data Science Centre, London, UK
Alastair K Denniston National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK; Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
Ben Glocker Department of Computing, Imperial College London, London, UK
Robert M Golub Northwestern University Feinberg School of Medicine, Chicago, IL, USA
Hugh Harvey Hardian Health, Haywards Heath, UK
Georg Heinze Section for Clinical Biometrics, Centre for Medical Data Science, Medical University of Vienna, Vienna, Austria
Michael M Hoffman Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada
André Pascal Kengne Department of Medicine, University of Cape Town, Cape Town, South Africa
Emily Lam Patient representative, Health Data Research UK patient and public involvement and engagement group
Naomi Lee National Institute for Health and Care Excellence, London, UK
Elizabeth W Loder The BMJ, London, UK; Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Lena Maier-Hein Department of Intelligent Medical Systems, German Cancer Research Centre, Heidelberg, Germany
Bilal A Mateen Institute of Health Informatics, University College London, London, UK; Wellcome Trust, London, UK; Alan Turing Institute, London, UK
Melissa D McCradden Department of Bioethics, Hospital for Sick Children Toronto, ON, Canada; Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
Lauren Oakden-Rayner Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
Johan Ordish Medicines and Healthcare products Regulatory Agency, London, UK
Richard Parnell Patient representative, Health Data Research UK patient and public involvement and engagement group
Sherri Rose Department of Health Policy and Center for Health Policy, Stanford University, Stanford, CA, USA
Karandeep Singh Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
Laure Wynants Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
Patricia Logullo Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK

Bhandarkar AR, Onyedimma C, Jarrah RM, Ibrahim S, Fu S, Liu H, Bydon M. An Integrated Voice Recognition and Natural Language Processing Platform to Automatically Extract Thoracolumbar Injury Classification Score Features From Radiology Reports. World Neurosurg 2024;183:e243-e249. [PMID: 38103686 DOI: 10.1016/j.wneu.2023.12.065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 12/10/2023] [Accepted: 12/11/2023] [Indexed: 12/19/2023]

Abstract

BACKGROUND

Many predictive models for estimating clinical outcomes after spine surgery have been reported in the literature. However, implementation of predictive scores in practice is limited by the time-intensive nature of manually abstracting relevant predictors. In this study, we designed natural language processing (NLP) algorithms to automate data abstraction for the thoracolumbar injury classification score (TLICS).

METHODS

We retrieved the radiology reports of all Mayo Clinic patients with an International Classification of Diseases, 9th or 10th revision, code corresponding to a fracture of the thoracolumbar spine between January 2005 and October 2020. Annotated data were used to train an N-gram NLP model using machine learning methods, including random forest, stepwise linear discriminant analysis, k-nearest neighbors, and penalized logistic regression models.

RESULTS

A total of 1085 spine radiology reports were included in our analysis. Our dataset included 483 compression, 401 burst, 103 translational/rotational, and 98 distraction fractures. A total of 103 reports had documented an injury of the posterior ligamentous complex. The overall accuracy of the random forest model for fracture morphology feature detection was 76.96% versus 65.90% in the stepwise linear discriminant analysis, 50.69% in the k-nearest neighbors, and 62.67% in the penalized logistic regression. The overall accuracy to detect posterior ligamentous complex integrity was highest in the random forest model at 83.41%. Our random forest model was implemented in the backend of a web application in which users can dictate reports and have TLICS features automatically extracted.

CONCLUSIONS

We have developed a machine learning NLP model for extracting TLICS features from radiology reports, which we deployed in a web application that can be integrated into clinical practice.

Lee C, Tseng T, Chang R, Yen H, Chen Y, Chen Y, Wu C, Hu M, Yen M, Bongers M, Groot OQ, Lai C, Lin W. Psoas muscle area is an independent survival prognosticator in patients undergoing surgery for long-bone metastases. Cancer Med 2024;13:e7072. [PMID: 38457220 PMCID: PMC10922028 DOI: 10.1002/cam4.7072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 02/02/2024] [Accepted: 02/20/2024] [Indexed: 03/09/2024]

Abstract

BACKGROUND

Predictive analytics is gaining popularity as an aid to treatment planning for patients with bone metastases, whose expected survival should be considered. Decreased psoas muscle area (PMA), a morphometric indicator of suboptimal nutritional status, has been associated with mortality in various cancers, but has never been integrated into current survival prediction algorithms (SPA) for patients with skeletal metastases. This study investigates whether decreased PMA predicts worse survival in patients with extremity metastases and whether incorporating PMA into three modern SPAs (PATHFx, SORG-NG, and SORG-MLA) improves their performance.

METHODS

One hundred eighty-five patients surgically treated for long-bone metastases between 2014 and 2019 were divided into three PMA tertiles (small, medium, and large) based on their psoas size on CT. Kaplan-Meier, multivariable regression, and Cox proportional hazards analyses were employed to compare survival between tertiles and examine factors associated with mortality. Logistic regression analysis was used to assess whether incorporating adjusted PMA values enhanced the three SPAs' discriminatory abilities. The clinical utility of incorporating PMA into these SPAs was evaluated by decision curve analysis (DCA).

RESULTS

Patients with small PMA had worse 90-day and 1-year survival after surgery (log-rank test p < 0.001). Patients in the large PMA group had a higher chance of surviving 90 days (odds ratio, OR, 3.72, p = 0.02) and 1 year than those in the small PMA group (OR 3.28, p = 0.004). All three SPAs had increased AUC after incorporation of adjusted PMA. DCA indicated increased net benefits at threshold probabilities >0.5 after the addition of adjusted PMA to these SPAs.

CONCLUSIONS

Decreased PMA on CT is associated with worse survival in surgically treated patients with extremity metastases, even after controlling for three contemporary SPAs. Physicians should consider the additional prognostic value of PMA on survival in patients undergoing consideration for operative management due to extremity metastases.
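
The decision curve analysis mentioned in this abstract rests on the net benefit statistic, which weighs true positives against false positives at a chosen threshold probability. The sketch below uses the standard net benefit formula; the labels, predicted probabilities, and function names are hypothetical and do not come from the study.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on predictions at or above `threshold`.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and model probabilities.
y = [1, 1, 0, 0, 1, 0, 0, 1]
p = [0.9, 0.6, 0.7, 0.2, 0.8, 0.1, 0.4, 0.55]

# A decision curve plots net benefit over a range of thresholds.
for t in (0.3, 0.5, 0.7):
    print(t, round(net_benefit(y, p, t), 3), round(net_benefit_treat_all(y, t), 3))
```

A model adds clinical value at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference).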

Riley RD, Snell KIE, Archer L, Ensor J, Debray TPA, van Calster B, van Smeden M, Collins GS. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ 2024;384:e074821. [PMID: 38253388 DOI: 10.1136/bmj-2023-074821] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/24/2024]

Dijkstra H, van de Kuit A, de Groot T, Canta O, Groot OQ, Oosterhoff JH, Doornberg JN. Systematic review of machine-learning models in orthopaedic trauma. Bone Jt Open 2024;5:9-19. [PMID: 38226447 PMCID: PMC10790183 DOI: 10.1302/2633-1462.51.bjo-2023-0095.r1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]

Abstract

Aims

Machine-learning (ML) prediction models in orthopaedic trauma hold great promise in assisting clinicians in various tasks, such as personalized risk stratification. However, an overview of current applications and a critical appraisal against peer-reviewed guidelines are lacking. The objectives of this study are to 1) provide an overview of current ML prediction models in orthopaedic trauma; 2) evaluate the completeness of reporting following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement; and 3) assess the risk of bias following the Prediction model Risk Of Bias Assessment Tool (PROBAST).

Methods

A systematic search screening 3,252 studies identified 45 ML-based prediction models in orthopaedic trauma up to January 2023. Transparent reporting was assessed with the TRIPOD statement and risk of bias with the PROBAST tool.

Results

A total of 40 studies reported on training and internal validation; four studies performed both development and external validation, and one study performed only external validation. The most commonly reported outcomes were mortality (33%, 15/45) and length of hospital stay (9%, 4/45), and the majority of prediction models were developed in the hip fracture population (60%, 27/45). The overall median completeness for the TRIPOD statement was 62% (interquartile range 30 to 81%). The overall risk of bias in the PROBAST tool was low in 24% (11/45), high in 69% (31/45), and unclear in 7% (3/45) of the studies. High risk of bias was mainly due to analysis domain concerns including small datasets with low number of outcomes, complete-case analysis in case of missing data, and no reporting of performance measures.

Conclusion

The results of this study showed that despite a myriad of potential clinically useful applications, a substantial part of ML studies in orthopaedic trauma lack transparent reporting, and are at high risk of bias. These problems must be resolved by following established guidelines to instil confidence in ML models among patients and clinicians. Otherwise, there will remain a sizeable gap between the development of ML prediction models and their clinical application in our day-to-day orthopaedic trauma practice.
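
The proportions quoted in the Results of this abstract can be reproduced from the reported counts. The short sketch below is only an arithmetic check; the counts are those stated in the abstract, while the variable and function names are illustrative.

```python
TOTAL_STUDIES = 45


def percent(count, total=TOTAL_STUDIES):
    """Share of studies as a whole-number percentage."""
    return round(100 * count / total)


# Counts as reported in the abstract.
study_design = {
    "training and internal validation": 40,
    "development and external validation": 4,
    "external validation only": 1,
}
risk_of_bias = {"low": 11, "high": 31, "unclear": 3}

# Both breakdowns should account for all 45 studies.
print(sum(study_design.values()), sum(risk_of_bias.values()))

for label, count in risk_of_bias.items():
    print(f"{label}: {count}/{TOTAL_STUDIES} = {percent(count)}%")
```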

Chen SF, Su CC, Huang CC, Ogink PT, Yen HK, Groot OQ, Hu MH. External validation of machine learning algorithm predicting prolonged opioid prescriptions in opioid-naïve lumbar spine surgery patients using a Taiwanese cohort. J Formos Med Assoc 2023;122:1321-1330. [PMID: 37453900 DOI: 10.1016/j.jfma.2023.06.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 06/26/2023] [Accepted: 06/30/2023] [Indexed: 07/18/2023]

Groot OQ. CORR Insights®: Development and Validation of a Convolutional Neural Network Model to Predict a Pathologic Fracture in the Proximal Femur Using Abdomen and Pelvis CT Images of Patients With Advanced Cancer. Clin Orthop Relat Res 2023;481:2257-2259. [PMID: 37638845 PMCID: PMC10566951 DOI: 10.1097/corr.0000000000002828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 07/25/2023] [Indexed: 08/29/2023]

Karnuta JM, Shaikh HJF, Murphy MP, Brown NM, Pearle AD, Nawabi DH, Chen AF, Ramkumar PN. Artificial Intelligence for Automated Implant Identification in Knee Arthroplasty: A Multicenter External Validation Study Exceeding 3.5 Million Plain Radiographs. J Arthroplasty 2023;38:2004-2008. [PMID: 36940755 DOI: 10.1016/j.arth.2023.03.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 03/13/2023] [Accepted: 03/14/2023] [Indexed: 03/23/2023]

Abstract

BACKGROUND

Surgical management of complications following knee arthroplasty demands accurate and timely identification of implant manufacturer and model. Automated image processing using deep machine learning has been previously developed and internally validated; however, external validation is essential prior to scaling clinical implementation for generalizability.

METHODS

We trained, validated, and externally tested a deep learning system to classify knee arthroplasty systems as one of the 9 models from 4 manufacturers derived from 4,724 original, retrospectively collected anteroposterior plain knee radiographs across 3 academic referral centers. From these radiographs, 3,568 were used for training, 412 for validation, and 744 for external testing. Augmentation was applied to the training set (n = 3,568,000) to increase model robustness. Performance was determined by the area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy. Implant identification processing speed was calculated. The training and testing sets were drawn from statistically different populations of implants (P < .001).

RESULTS

After 1,000 training epochs by the deep learning system, the system discriminated 9 implant models with a mean area under the receiver operating characteristic curve of 0.989, accuracy of 97.4%, sensitivity of 89.2%, and specificity of 99.0% in the external testing dataset of 744 anteroposterior radiographs. The software classified implants at a mean speed of 0.02 seconds per image.

CONCLUSION

An artificial intelligence-based software for identifying knee arthroplasty implants demonstrated excellent internal and external validation. Although continued surveillance is necessary with implant library expansion, this software represents a responsible and meaningful clinical application of artificial intelligence with immediate potential to globally scale and assist in preoperative planning prior to revision knee arthroplasty.

Padash S, Mickley JP, Vera-Garcia DV, Nugen F, Khosravi B, Erickson BJ, Wyles CC, Taunton MJ. An Overview of Machine Learning in Orthopedic Surgery: An Educational Paper. J Arthroplasty 2023;38:1938-1942. [PMID: 37598786 PMCID: PMC10601337 DOI: 10.1016/j.arth.2023.08.043] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/01/2023] [Revised: 08/10/2023] [Accepted: 08/11/2023] [Indexed: 08/22/2023]

Chen TLW, Buddhiraju A, Seo HH, Subih MA, Tuchinda P, Kwon YM. Internal and External Validation of the Generalizability of Machine Learning Algorithms in Predicting Non-home Discharge Disposition Following Primary Total Knee Joint Arthroplasty. J Arthroplasty 2023;38:1973-1981. [PMID: 36764409 DOI: 10.1016/j.arth.2023.01.065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 01/24/2023] [Accepted: 01/31/2023] [Indexed: 02/12/2023]

Abstract

BACKGROUND

Nonhome discharge disposition following primary total knee arthroplasty (TKA) is associated with a higher rate of complications and constitutes a socioeconomic burden on the health care system. While existing algorithms predicting nonhome discharge disposition varied in degrees of mathematical complexity and prediction power, their capacity to generalize predictions beyond the development dataset remains limited. Therefore, this study aimed to establish the machine learning model generalizability by performing internal and external validations using nation-scale and institutional cohorts, respectively.

METHODS

Four machine learning models were trained using the national cohort. Recursive feature elimination and hyper-parameter tuning were applied. Internal validation was achieved through five-fold cross-validation during model training. The trained models' performance was externally validated using the institutional cohort and assessed by discrimination, calibration, and clinical utility.

RESULTS

The national (424,354 patients) and institutional (10,196 patients) cohorts had non-home discharge rates of 19.4 and 36.4%, respectively. The areas under the receiver operating characteristic curve of the model predictions were 0.83 to 0.84 during internal validation and increased to 0.88 to 0.89 during external validation. Artificial neural network and histogram-based gradient boosting elicited the best performance with a mean area under the receiver operating characteristic curve of 0.89, calibration slope of 1.39, and Brier score of 0.14, which indicated that the two models were robust in distinguishing non-home discharge and well-calibrated with accurate predictions of the probabilities. The low inter-dataset similarity indicated reliable external validation. Length of stay, age, body mass index, and sex were the strongest predictors of discharge destination after primary TKA.

CONCLUSION

The machine learning models demonstrated excellent predictive performance during both internal and external validations, supporting their generalizability across different patient cohorts and potential applicability in the clinical workflow.

Karnuta JM, Murphy MP, Luu BC, Ryan MJ, Haeberle HS, Brown NM, Iorio R, Chen AF, Ramkumar PN. Artificial Intelligence for Automated Implant Identification in Total Hip Arthroplasty: A Multicenter External Validation Study Exceeding Two Million Plain Radiographs. J Arthroplasty 2023;38:1998-2003.e1. [PMID: 35271974 DOI: 10.1016/j.arth.2022.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Revised: 02/23/2022] [Accepted: 03/01/2022] [Indexed: 02/02/2023] Open

Abstract

BACKGROUND

The surgical management of complications after total hip arthroplasty (THA) necessitates accurate identification of the femoral implant manufacturer and model. Automated image processing using deep learning has been previously developed and internally validated; however, external validation is necessary prior to responsible application of artificial intelligence (AI)-based technologies.

METHODS

We trained, validated, and externally tested a deep learning system to classify femoral-sided THA implants as one of the 8 models from 2 manufacturers derived from 2,954 original, deidentified, retrospectively collected anteroposterior plain radiographs across 3 academic referral centers and 13 surgeons. From these radiographs, 2,117 were used for training, 249 for validation, and 588 for external testing. Augmentation was applied to the training set (n = 2,117,000) to increase model robustness. Performance was evaluated by area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy. Implant identification processing speed was calculated.

RESULTS

The training and testing sets were drawn from statistically different populations of implants (P < .001). After 1,000 training epochs by the deep learning system, the system discriminated 8 implant models with a mean area under the receiver operating characteristic curve of 0.991, accuracy of 97.9%, sensitivity of 88.6%, and specificity of 98.9% in the external testing dataset of 588 anteroposterior radiographs. The software classified implants at a mean speed of 0.02 seconds per image.

CONCLUSION

An AI-based software demonstrated excellent internal and external validation. Although continued surveillance is necessary with implant library expansion, this software represents responsible and meaningful clinical application of AI with immediate potential to globally scale and assist in preoperative planning prior to revision THA.

Hsieh H, Yen H, Tseng T, Pan Y, Liao M, Fu S, Yen M, Jaw F, Lin W, Hu M, Yang S, Groot OQ, Schoenfeld AJ. Determining patients with spinal metastases suitable for surgical intervention: A cost-effective analysis. Cancer Med 2023;12:20059-20069. [PMID: 37749979 PMCID: PMC10587930 DOI: 10.1002/cam4.6576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 09/04/2023] [Accepted: 09/12/2023] [Indexed: 09/27/2023] Open

Abstract

BACKGROUND

Both nonoperative and operative treatments for spinal metastasis are expensive interventions. Patients' expected 3-month survival is believed to be a key factor to determine the most suitable treatment. However, to the best of our knowledge, no previous study lends support to the hypothesis. We sought to determine the cost-effectiveness of operative and nonoperative interventions, stratified by patients' predicted probability of 3-month survival.

METHODS

A Markov model with four defined health states was used to estimate the quality-adjusted life years (QALYs) and costs of operative intervention with postoperative radiotherapy and of radiotherapy alone (palliative low-dose external beam radiotherapy) for spine metastases. Transition probabilities for the model, including the risks of mortality and functional deterioration, were obtained from secondary data and our institutional data. Willingness-to-pay thresholds were prespecified at $100,000 and $150,000. The analyses were conducted from a health system perspective, censored after a 5-year simulation, with outcomes discounted at 3% per year. Sensitivity analyses were conducted to test the robustness of the study design.
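
A minimal sketch of a Markov cohort simulation with discounting may help make the method concrete. Only the four-state structure, the 5-year horizon, and the 3% discount rate come from the abstract. The state names, transition probabilities, utilities, costs, and the choice of annual cycles are hypothetical placeholders, not the study's inputs.

```python
def run_markov_cohort(transition, utilities, costs, start, n_cycles=5, discount=0.03):
    """Return discounted (QALYs, cost) per patient for a Markov cohort model.

    transition[i][j] is the per-cycle probability of moving from state i to j.
    Assumes annual cycles, so utilities are QALYs accrued per cycle.
    """
    n = len(start)
    dist = list(start)          # share of the cohort in each state
    total_qalys = 0.0
    total_cost = 0.0
    for cycle in range(n_cycles):
        weight = 1.0 / (1.0 + discount) ** cycle  # discount factor for this cycle
        total_qalys += weight * sum(d * u for d, u in zip(dist, utilities))
        total_cost += weight * sum(d * c for d, c in zip(dist, costs))
        # advance the cohort by one cycle
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return total_qalys, total_cost

if __name__ == "__main__":
    # Hypothetical four-state example; names and numbers are placeholders.
    states = ["ambulatory", "non-ambulatory", "bedridden", "dead"]
    transition = [
        [0.70, 0.15, 0.05, 0.10],
        [0.05, 0.60, 0.15, 0.20],
        [0.00, 0.05, 0.55, 0.40],
        [0.00, 0.00, 0.00, 1.00],  # death is absorbing
    ]
    utilities = [0.75, 0.50, 0.25, 0.00]
    costs = [5_000, 12_000, 20_000, 0]
    start = [1.0, 0.0, 0.0, 0.0]
    qalys, cost = run_markov_cohort(transition, utilities, costs, start)
    print(f"QALYs={qalys:.2f}, cost={cost:.0f}")
```

Running the function once per strategy, each with its own transition matrix and costs, yields the pair of QALY and cost totals that a cost-effectiveness comparison needs.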

RESULTS

The incremental cost-effectiveness ratios were $140,907 per QALY for patients with a 3-month survival probability >50%, $3,178,510 per QALY for patients with a 3-month survival probability <50%, and $168,385 per QALY for patients with independent ambulatory status and a 3-month survival probability >50%.
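
For readers unfamiliar with the metric, the following sketch shows how an incremental cost-effectiveness ratio (ICER) is computed and compared with a willingness-to-pay threshold. The function names are illustrative assumptions. The three ICER values and the two thresholds are taken from the abstract; the underlying costs and QALYs are not reported there, so the loop only compares the published ratios with the thresholds.

```python
def icer(cost_new, cost_ref, qaly_new, qaly_ref):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("no QALY difference; ICER is undefined")
    return (cost_new - cost_ref) / delta_qaly

def within_threshold(icer_value, willingness_to_pay):
    """True if the ICER does not exceed the willingness-to-pay threshold.

    Assumes the new strategy is both more costly and more effective.
    """
    return icer_value <= willingness_to_pay

if __name__ == "__main__":
    reported = {
        "3-month survival probability >50%": 140_907,
        "3-month survival probability <50%": 3_178_510,
        "independent ambulatory status, survival probability >50%": 168_385,
    }
    for label, value in reported.items():
        print(label, within_threshold(value, 100_000), within_threshold(value, 150_000))
```

On these reported figures, only the ICER of $140,907 per QALY falls under a threshold, and only under the higher one of $150,000.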

CONCLUSIONS

This study emphasizes the need to choose patients carefully and estimate preoperative survival for those with spinal metastases. In addition to reaffirming previous research regarding the influence of ambulatory status on cost-effectiveness, our study goes a step further by highlighting that operative intervention with postoperative radiotherapy could be more cost-effective than radiotherapy alone for patients with a better survival outlook. Accurate survival prediction tools and larger future studies could offer more detailed insights for clinical decisions.

Affiliation(s)

Hsiang‐Chieh Hsieh: Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan; Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan; Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsinchu, Taiwan
Hung‐Kuan Yen: Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan; Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsinchu, Taiwan; Department of Medical Education, National Taiwan University Hospital, Hsinchu, Taiwan
Ting‐En Tseng: Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Yu‐Ting Pan: Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
Min‐Tsun Liao: Division of Cardiology, Department of Internal Medicine, National Taiwan University Hospital, Hsinchu, Taiwan
Shau‐Huai Fu: Department of Orthopaedic Surgery, National Taiwan University Hospital, Douliu, Taiwan
Mao‐Hsu Yen: Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan
Fu‐Shan Jaw: Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
Wei‐Hsin Lin: Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Ming‐Hsiao Hu: Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan; Department of Orthopaedics, College of Medicine, National Taiwan University, Taipei, Taiwan
Shu‐Hua Yang: Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan; Department of Orthopaedics, College of Medicine, National Taiwan University, Taipei, Taiwan
Olivier Q. Groot: Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA; Department of Orthopaedics, University Medical Center Utrecht, Utrecht, Netherlands
Andrew J. Schoenfeld: Department of Orthopaedic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA

Kokkinakis S, Kritsotakis EI, Paterakis K, Karali GA, Malikides V, Kyprianou A, Papalexandraki M, Anastasiadis CS, Zoras O, Drakos N, Kehagias I, Kehagias D, Gouvas N, Kokkinos G, Pozotou I, Papatheodorou P, Frantzeskou K, Schizas D, Syllaios A, Palios IM, Nastos K, Perdikaris M, Michalopoulos NV, Margaris I, Lolis E, Dimopoulou G, Panagiotou D, Nikolaou V, Glantzounis GK, Pappas-Gogos G, Tepelenis K, Zacharioudakis G, Tsaramanidis S, Patsarikas I, Stylianidis G, Giannos G, Karanikas M, Kofina K, Markou M, Chrysos E, Lasithiotakis K. Prospective multicenter external validation of postoperative mortality prediction tools in patients undergoing emergency laparotomy. J Trauma Acute Care Surg 2023;94:847-856. [PMID: 36726191 DOI: 10.1097/ta.0000000000003904] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/03/2023]

Abstract

BACKGROUND

Accurate preoperative risk assessment in emergency laparotomy (EL) is valuable for informed decision making and rational use of resources. Available risk prediction tools have not been validated adequately across diverse health care settings. Herein, we report a comparative external validation of four widely cited prognostic models.

METHODS

A multicenter cohort was prospectively composed of consecutive patients undergoing EL in 11 Greek hospitals from January 2020 to May 2021 using the National Emergency Laparotomy Audit (NELA) inclusion criteria. Thirty-day mortality risk predictions were calculated using the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP), NELA, Portsmouth Physiological and Operative Severity Score for the Enumeration of Mortality and Morbidity (P-POSSUM), and Predictive Optimal Trees in Emergency Surgery Risk tools. Surgeons' assessment of postoperative mortality using predefined cutoffs was recorded, and a surgeon-adjusted ACS-NSQIP prediction was calculated when the original model's prediction was relatively low. Predictive performances were compared using scaled Brier scores, discrimination and calibration measures and plots, and decision curve analysis. Heterogeneity across hospitals was assessed by random-effects meta-analysis.

RESULTS

A total of 631 patients were included, and 30-day mortality was 16.3%. The ACS-NSQIP and its surgeon-adjusted version had the highest scaled Brier scores. All models presented high discriminative ability, with concordance statistics ranging from 0.79 for P-POSSUM to 0.85 for NELA. However, except for the surgeon-adjusted ACS-NSQIP (Hosmer-Lemeshow test, p = 0.742), all models were poorly calibrated (p < 0.001). Decision curve analysis revealed superior clinical utility of the ACS-NSQIP. Following recalibration, predictive accuracy improved for all models, but ACS-NSQIP retained the lead. Between-hospital heterogeneity was lowest for the ACS-NSQIP model and highest for P-POSSUM.
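
The scaled Brier score mentioned above rescales the Brier score against a reference model that assigns every patient the observed event rate, so that 1 means perfect prediction and 0 means no improvement over that reference. The sketch below uses this common definition; the abstract does not state the exact formula the authors used, and the function names are illustrative.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def scaled_brier_score(y_true, y_prob):
    """1 - Brier / Brier_null, where the null model predicts the event rate."""
    event_rate = sum(y_true) / len(y_true)
    brier_null = event_rate * (1 - event_rate)
    if brier_null == 0:
        raise ValueError("all outcomes are identical; scaled Brier is undefined")
    return 1 - brier_score(y_true, y_prob) / brier_null

if __name__ == "__main__":
    # Toy outcomes and predicted risks (hypothetical values)
    y = [1, 0, 0, 1, 0, 0]
    p = [0.8, 0.2, 0.1, 0.6, 0.3, 0.2]
    print(round(scaled_brier_score(y, p), 3))
```

Higher values indicate better overall performance; a negative value means the model does worse than predicting the event rate for everyone.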

CONCLUSION

The ACS-NSQIP tool was most accurate for mortality predictions after EL in a broad external validation cohort, demonstrating utility for facilitating preoperative risk management in the Greek health care system. Subjective surgeon assessments of patient prognosis may optimize ACS-NSQIP predictions.

LEVEL OF EVIDENCE

Diagnostic Test/Criteria; Level II.

Lans A, Kanbier LN, Bernstein DN, Groot OQ, Ogink PT, Tobert DG, Verlaan JJ, Schwab JH. Social determinants of health in prognostic machine learning models for orthopaedic outcomes: A systematic review. J Eval Clin Pract 2023;29:292-299. [PMID: 36099267 DOI: 10.1111/jep.13765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 08/22/2022] [Accepted: 08/27/2022] [Indexed: 11/26/2022]

Abstract

RATIONALE

Social determinants of health (SDOH) are being considered more frequently when providing orthopaedic care due to their impact on treatment outcomes. Simultaneously, prognostic machine learning (ML) models that facilitate clinical decision making have become popular tools in the field of orthopaedic surgery. When ML-driven tools are developed, it is important that the perpetuation of potential disparities is minimized. One approach is to consider SDOH during model development. To date, it remains unclear whether and how existing prognostic ML models for orthopaedic outcomes consider SDOH variables.

OBJECTIVE

To investigate whether prognostic ML models for orthopaedic surgery outcomes account for SDOH, and to what extent SDOH variables are included in the final models.

METHODS

A systematic search was conducted in PubMed, Embase and Cochrane for studies published up to 17 November 2020. Two reviewers independently extracted SDOH features using the PROGRESS+ framework (place of residence, race/ethnicity, occupation, gender/sex, religion, education, social capital, socioeconomic status, 'Plus+' age, disability, and sexual orientation).

RESULTS

The search yielded 7138 studies, of which 59 met the inclusion criteria. Across all studies, 96% (57/59) considered at least one PROGRESS+ factor during development. The most common factors were age (95%; 56/59) and gender/sex (96%; 57/59). Differential effect analyses, such as subgroup analysis, covariate adjustment, and baseline comparison, were rarely reported (10%; 6/59). The majority of models included age (92%; 54/59) and gender/sex (69%; 41/59) as final input variables. However, factors such as insurance status (7%; 4/59), marital status (7%; 4/59) and income (3%; 2/59) were seldom included.

CONCLUSION

The current level of reporting and consideration of SDOH during the development of prognostic ML models for orthopaedic outcomes is limited. Healthcare providers should be critical of the models they consider using and knowledgeable regarding the quality of model development, such as adherence to recognized methodological standards. Future efforts should aim to avoid bias and disparities when developing ML-driven applications for orthopaedics.

Buddhiraju A, Chen TLW, Subih MA, Seo HH, Esposito JG, Kwon YM. Validation and Generalizability of Machine Learning Models for the Prediction of Discharge Disposition Following Revision Total Knee Arthroplasty. J Arthroplasty 2023;38:S253-S258. [PMID: 36849013 DOI: 10.1016/j.arth.2023.02.054] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/06/2022] [Revised: 02/16/2023] [Accepted: 02/20/2023] [Indexed: 03/01/2023] Open

Abstract

BACKGROUND

Postoperative discharge to facilities accounts for over 33% of the $2.7 billion revision total knee arthroplasty (TKA)-associated annual expenditures and is associated with increased complications when compared to home discharge. Prior studies predicting discharge disposition using advanced machine learning (ML) have been limited due to a lack of generalizability and validation. This study aimed to establish ML model generalizability by externally validating its prediction for nonhome discharge following revision TKA using national and institutional databases.

METHODS

The national and institutional cohorts comprised 52,533 and 1,628 patients, respectively, with 20.6 and 19.4% nonhome discharge rates. Five ML models were trained and internally validated (five-fold cross-validation) on a large national dataset. Subsequently, external validation was performed on our institutional dataset. Model performance was assessed using discrimination, calibration, and clinical utility. Global predictor importance plots and local surrogate models were used for interpretation.

RESULTS

The strongest predictors of nonhome discharge were patient age, body mass index, and surgical indication. The area under the receiver operating characteristic curve increased from internal to external validation and ranged between 0.77 and 0.79. Artificial neural network was the best predictive model for identifying patients at risk for nonhome discharge (area under the receiver operating characteristic curve = 0.78), and also the most accurate (calibration slope = 0.93, intercept = 0.02, and Brier score = 0.12).

CONCLUSION

All five ML models demonstrated good-to-excellent discrimination, calibration, and clinical utility on external validation, with artificial neural network being the best model for predicting discharge disposition following revision TKA. Our findings establish the generalizability of ML models developed using data from a national database. The integration of these predictive models into clinical workflow may assist in optimizing discharge planning, bed management, and cost containment associated with revision TKA.

Oosterhoff JHF, Karhade AV, Groot OQ, Schwab JH, Heng M, Klang E, Prat D. Intercontinental validation of a clinical prediction model for predicting 90-day and 2-year mortality in an Israeli cohort of 2033 patients with a femoral neck fracture aged 65 or above. Eur J Trauma Emerg Surg 2023;49:1545-1553. [PMID: 36757419 DOI: 10.1007/s00068-023-02237-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 01/27/2023] [Indexed: 02/10/2023]

Abstract

PURPOSE

Mortality prediction in elderly femoral neck fracture patients is valuable in treatment decision-making. A previously developed and internally validated clinical prediction model shows promise in identifying patients at risk of 90-day and 2-year mortality. Validation in an independent cohort is required to assess generalizability, especially in geographically distinct regions. Therefore, we asked: is the SORG Orthopaedic Research Group (SORG) femoral neck fracture mortality algorithm externally valid in an Israeli cohort to predict 90-day and 2-year mortality?

METHODS

We previously developed a prediction model in 2022 for estimating the risk of mortality in femoral neck fracture patients using a multicenter institutional cohort of 2,478 patients from the USA. The model included the following input variables that are available on clinical admission: age, male gender, creatinine level, absolute neutrophil count, hemoglobin level, international normalized ratio (INR), congestive heart failure (CHF), displaced fracture, hemiplegia, chronic obstructive pulmonary disease (COPD), history of cerebrovascular accident (CVA) and beta-blocker use. To assess generalizability, we used an intercontinental institutional cohort from the Sheba Medical Center in Israel (level I trauma center), queried between June 2008 and February 2022. Generalizability of the model was assessed using discrimination, calibration, Brier score, and decision curve analysis.

RESULTS

The validation cohort included 2,033 patients, aged 65 years or above, who underwent femoral neck fracture surgery. Most patients were female (64.8%; n = 1317), the median age was 81 years (interquartile range = 75-86), and 80.4% (n = 1635) of patients sustained a displaced fracture (Garden III/IV). The 90-day mortality was 9.4% (n = 190) and 2-year mortality was 30.0% (n = 610). Despite numerous baseline differences, the model performed acceptably in the validation cohort on discrimination (c-statistic 0.67 for 90-day, 0.67 for 2-year), calibration, Brier score, and decision curve analysis.

CONCLUSIONS

The previously developed SORG femoral neck fracture mortality algorithm demonstrated good performance in an independent intercontinental population. The current iteration should not be relied on for patient care, although it suggests potential utility in assessing patients at low risk for 90-day or 2-year mortality. Further studies should evaluate this tool in a prospective setting and evaluate its feasibility and efficacy in clinical practice. The algorithm can be freely accessed: https://sorg-apps.shinyapps.io/hipfracturemortality/ .

LEVEL OF EVIDENCE

Level III, Prognostic study.

Jadresic MC, Baker JF. Predicting complications of spine surgery: external validation of three models. Spine J 2022;22:1801-1810. [PMID: 35870799 DOI: 10.1016/j.spinee.2022.07.092] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/20/2022] [Revised: 06/24/2022] [Accepted: 07/14/2022] [Indexed: 02/03/2023]

Abstract

BACKGROUND CONTEXT

Numerous prediction tools are available for estimating postoperative risk following spine surgery. External validation and comparison of these tools is critical prior to clinical use. No model for adverse events after spine surgery has undergone decision curve analysis.

PURPOSE

External validation, comparison, and decision curve analysis of 3 previously described models [SpineSage, Risk Assessment Tool (RAT), National Surgical Quality Improvement Program Risk Calculator (NSQIP)] for predicting 30-day postoperative complications after spine surgery.

STUDY DESIGN

Retrospective cohort study.

PATIENT SAMPLE

Three hundred fifteen patients who underwent spine surgery at a tertiary academic surgical center in New Zealand between January 2019 and April 2020.

OUTCOME MEASURES

As defined by each risk prediction tool and objectively using the Comprehensive Complication Index.

METHODS

The risk of postoperative complication was retrospectively calculated for each patient according to the 3 models. Overall model fit, calibration, discrimination, and decision curve analysis for each model were assessed in line with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidelines.

RESULTS

100 (35%) patients experienced complications. SpineSage and RAT were well calibrated, whereas NSQIP systematically underestimated risk. Area under the curve was greatest for SpineSage (0.75) compared with the NSQIP (0.72) and the RAT (0.69). Decision curve analysis showed that SpineSage resulted in the greatest net benefit across all risk thresholds.

CONCLUSIONS

Of the models studied, SpineSage most accurately predicted risk and can be expected to perform better than a strategy of treating all patients if the patient or surgeon deems a complication risk >10% significant. NSQIP may not be suitable for clinical use in our local population.
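
The comparison with a "treat all" strategy at a 10% risk threshold comes from decision curve analysis, which weighs true positives against false positives at a chosen threshold probability. The sketch below uses the standard net-benefit formula. The function names and the toy data are assumptions made for illustration and are unrelated to the study cohort.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # false positives are down-weighted by the odds of the threshold
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

if __name__ == "__main__":
    # Toy outcomes and predicted risks (hypothetical values)
    y = [1, 1, 0, 0, 0, 1, 0, 0]
    p = [0.60, 0.35, 0.05, 0.12, 0.08, 0.40, 0.03, 0.20]
    t = 0.10
    print(net_benefit(y, p, t), net_benefit_treat_all(y, t))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero, the net benefit of treating no one.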

Oosterhoff JHF, Oberai T, Karhade AV, Doornberg JN, Kerkhoffs GM, Jaarsma RL, Schwab JH, Heng M. Does the SORG Orthopaedic Research Group Hip Fracture Delirium Algorithm Perform Well on an Independent Intercontinental Cohort of Patients With Hip Fractures Who Are 60 Years or Older? Clin Orthop Relat Res 2022;480:2205-2213. [PMID: 35561268 PMCID: PMC10476833 DOI: 10.1097/corr.0000000000002246] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/20/2021] [Accepted: 04/22/2022] [Indexed: 01/31/2023]

Abstract

BACKGROUND

Postoperative delirium in patients aged 60 years or older with hip fractures adversely affects clinical and functional outcomes. The economic cost of delirium is estimated to be as high as USD 25,000 per patient, with a total budgetary impact between USD 6.6 billion and USD 82.4 billion annually in the United States alone. Forty percent of delirium episodes are preventable, and accurate risk stratification can decrease the incidence and improve clinical outcomes in patients. A previously developed clinical prediction model (the SORG Orthopaedic Research Group hip fracture delirium machine-learning algorithm) is highly accurate on internal validation (in 28,207 patients with hip fractures aged 60 years or older in a US cohort) in identifying at-risk patients, and it can facilitate the best use of preventive interventions; however, it has not been tested in an independent population. For an algorithm to be useful in real life, it must be valid externally, meaning that it must perform well in a patient cohort different from the cohort used to "train" it. Despite many promising machine-learning prediction models and many promising delirium models, only a few have been externally validated, and even fewer in international validation studies.

QUESTION/PURPOSE

Does the SORG hip fracture delirium algorithm, initially trained on a database from the United States, perform well on external validation in patients aged 60 years or older in Australia and New Zealand?

METHODS

We previously developed a model in 2021 for assessing risk of delirium in hip fracture patients using records of 28,207 patients obtained from the American College of Surgeons National Surgical Quality Improvement Program. Variables in the original model included age, American Society of Anesthesiologists (ASA) class, functional status (independent or partially or totally dependent for any activities of daily living), preoperative dementia, preoperative delirium, and preoperative need for a mobility aid. To assess whether this model could be applied elsewhere, we used records from an international hip fracture registry. Between June 2017 and December 2018, 6672 patients older than 60 years of age in Australia and New Zealand were treated surgically for a femoral neck, intertrochanteric hip, or subtrochanteric hip fracture and entered into the Australian & New Zealand Hip Fracture Registry. Patients were excluded if they had a pathological hip fracture or septic shock. Of all patients, 6% (402 of 6672) did not meet the inclusion criteria, leaving 94% (6270 of 6672) of patients available for inclusion in this retrospective analysis. Seventy-one percent (4249 of 5986) of patients were aged 80 years or older, after accounting for 5% (284 of 6270) of missing values; 68% (4292 of 6266) were female, after accounting for 0.06% (4 of 6270) of missing values, and 83% (4690 of 5661) of patients were classified as ASA III/IV, after accounting for 10% (609 of 6270) of missing values. Missing data were imputed using the missForest methodology. In total, 39% (2467 of 6270) of patients developed postoperative delirium. The performance of the SORG hip fracture delirium algorithm on the validation cohort was assessed by discrimination, calibration, Brier score, and a decision curve analysis.
Discrimination, measured by the area under the receiver operating characteristic curve (c-statistic), reflects the model's ability to distinguish patients who experienced the outcome from those who did not and ranges from 0.5 to 1.0, with 1.0 indicating the highest discrimination and 0.5 the lowest. Calibration plots the predicted versus the observed probabilities; a perfect plot has an intercept of 0 and a slope of 1. The Brier score is a composite of discrimination and calibration, with 0 indicating perfect prediction and 1 the poorest.
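
The discrimination and Brier score definitions above can be written out in a few lines of plain Python. This is a sketch under the usual textbook definitions, with illustrative function names, and not the authors' code. The `null_brier` helper assumes the null model assigns every patient the observed event rate; applied to the delirium rate reported in this abstract (2467 of 6270), it gives about 0.24, in line with the null-model Brier score quoted in the Results.

```python
def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in pos:            # O(n^2) pairwise comparison, fine for a sketch
        for p_nonevent in neg:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def null_brier(event_rate):
    """Brier score of a model that predicts the event rate for everyone."""
    return event_rate * (1 - event_rate)

if __name__ == "__main__":
    # Event rate taken from the abstract: 2467 of 6270 patients
    print(round(null_brier(2467 / 6270), 2))
```

Comparing a model's Brier score with the null value shows whether the model adds information beyond the overall event rate.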

RESULTS

The SORG hip fracture algorithm, when applied to an external patient cohort, distinguished between patients at low risk and patients at moderate to high risk of developing postoperative delirium. The SORG hip fracture algorithm performed with a c-statistic of 0.74 (95% confidence interval 0.73 to 0.76). The calibration plot showed high accuracy in the lower predicted probabilities (intercept -0.28, slope 0.52) and a Brier score of 0.22 (the null model Brier score was 0.24). The decision curve analysis showed that the model can be beneficial compared with no model or compared with characterizing all patients as at risk for developing delirium.

CONCLUSION

Algorithms developed with machine learning are a potential tool for refining treatment of at-risk patients. If high-risk patients can be reliably identified, resources can be appropriately directed toward their care. Although the current iteration of SORG should not be relied on for patient care, it suggests potential utility in assessing risk. Further assessment in different populations, made easier by international collaborations and standardization of registries, would be useful in the development of universally valid prediction models. The model can be freely accessed at: https://sorg-apps.shinyapps.io/hipfxdelirium/ .

LEVEL OF EVIDENCE

Level III, therapeutic study.

Yen HK, Chiang H. Letter to the Editor: CORR Synthesis: When Should We Be Skeptical of Clinical Prediction Models? Clin Orthop Relat Res 2022;480:2271-2273. [PMID: 36083689 PMCID: PMC9556068 DOI: 10.1097/corr.0000000000002395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Accepted: 08/16/2022] [Indexed: 01/31/2023]

Hsieh HC, Lai YH, Lee CC, Yen HK, Tseng TE, Yang JJ, Lin SY, Hu MH, Hou CH, Yang RS, Wedin R, Forsberg JA, Lin WH. Can a Bayesian belief network for survival prediction in patients with extremity metastases (PATHFx) be externally validated in an Asian cohort of 356 surgically treated patients? Acta Orthop 2022;93:721-731. [PMID: 36083697 PMCID: PMC9463636 DOI: 10.2340/17453674.2022.4545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Indexed: 01/31/2023] Open

Abstract

BACKGROUND AND PURPOSE

Predicted survival may influence the treatment decision for patients with skeletal extremity metastasis, and PATHFx was designed to predict the likelihood of a patient dying in the next 24 months. However, the performance of prediction models could have ethnogeographical variations. We asked if PATHFx generalized well to our Taiwanese cohort consisting of 356 surgically treated patients with extremity metastasis.

PATIENTS AND METHODS

We included 356 patients who underwent surgery for skeletal extremity metastasis in a tertiary center in Taiwan between 2014 and 2019 to validate PATHFx's survival predictions at 6 different time points. Model performance was assessed by concordance index (c-index), calibration analysis, decision curve analysis (DCA), Brier score, and model consistency (MC).

RESULTS

The c-indexes for the 1-, 3-, 6-, 12-, 18-, and 24-month survival estimations were 0.71, 0.66, 0.65, 0.69, 0.68, and 0.67, respectively. The calibration analysis demonstrated positive calibration intercepts for survival predictions at all 6 timepoints, indicating PATHFx tended to underestimate the actual survival. The Brier scores for the 6 models were all less than their respective null model's. DCA demonstrated that only the 6-, 12-, 18-, and 24-month predictions appeared useful for clinical decision-making across a wide range of threshold probabilities. The MC was < 0.9 when the 6- and 12-month models were compared with the 12-month and 18-month models, respectively.

INTERPRETATION

In this Asian cohort, PATHFx's performance was not as encouraging as those of prior validation studies. Clinicians should be cognizant of the potential decline in validity of any tools designed using data outside their particular patient population. Developers of survival prediction tools such as PATHFx might refine their algorithms using data from diverse, contemporary patients that is more reflective of the world's population.

Jeong HW, Kim M, Choi HG, Park SY, Lee YS. Development of a machine learning model to predict lateral hinge fractures by analyzing patient factors before open wedge high tibial osteotomy. Knee Surg Sports Traumatol Arthrosc 2022:10.1007/s00167-022-07137-6. [PMID: 36036269 DOI: 10.1007/s00167-022-07137-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 08/19/2022] [Indexed: 11/25/2022]

A machine learning algorithm for predicting prolonged postoperative opioid prescription after lumbar disc herniation surgery. An external validation study using 1,316 patients from a Taiwanese cohort. Spine J 2022;22:1119-1130. [PMID: 35202784 DOI: 10.1016/j.spinee.2022.02.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/02/2022] [Revised: 01/31/2022] [Accepted: 02/14/2022] [Indexed: 02/03/2023]

Abstract

BACKGROUND CONTEXT

Preoperative prediction of prolonged postoperative opioid prescription helps identify patients for increased surveillance after surgery. The SORG machine learning model has been developed and successfully tested using 5,413 patients from the United States (US) to predict the risk of prolonged opioid prescription after surgery for lumbar disc herniation. However, external validation is an often-overlooked element in the process of incorporating prediction models in current clinical practice. This cannot be stressed enough in prediction models where medicolegal and cultural differences may play a major role.

PURPOSE

The authors aimed to investigate the generalizability of the SORG prediction model, developed in US patients, to a Taiwanese patient cohort.

STUDY DESIGN

Retrospective study at a large academic medical center in Taiwan.

PATIENT SAMPLE

A total of 1,316 patients aged 20 years or older who underwent initial operative management for lumbar disc herniation between 2010 and 2018.

OUTCOME MEASURES

The primary outcome of interest was prolonged opioid prescription, defined as opioid prescription continuing until at least 90 to 180 days after the first surgery for lumbar disc herniation at our institution.

METHODS

Baseline characteristics were compared between the external validation cohort and the original development cohort. Discrimination (area under the receiver operating characteristic curve and area under the precision-recall curve), calibration, overall performance (Brier score), and decision curve analysis were used to assess the performance of the SORG ML algorithm in the validation cohort. This study had no funding source or conflicts of interest.
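
As a rough illustration of three of the performance measures named above, the following minimal sketch computes the area under the receiver operating characteristic curve, the Brier score, and the net benefit used in decision curve analysis. The function names and the four-patient toy data are hypothetical and are not taken from the study. The area under the precision-recall curve and the calibration fit are omitted for brevity.

```python
def auc(probs, outcomes):
    """Chance that a random event case is ranked above a random non-event case (ties count 0.5)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier(probs, outcomes):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def net_benefit(probs, outcomes, threshold):
    """Net benefit at one threshold: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

# Hypothetical toy data: predicted risks and observed outcomes for four patients.
probs = [0.9, 0.6, 0.4, 0.2]
outcomes = [1, 0, 1, 0]

print(auc(probs, outcomes))               # discrimination
print(brier(probs, outcomes))             # overall performance
print(net_benefit(probs, outcomes, 0.5))  # one point on a decision curve
```

A decision curve repeats the net benefit calculation over a range of thresholds and compares the model with the treat-all and treat-none strategies.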

RESULTS

Overall, 1,316 patients were identified, of whom 41 (3.1%) had sustained postoperative opioid prescription. The validation cohort differed from the development cohort on several variables: 93% of Taiwanese patients received NSAIDs preoperatively compared with 22% of US patients, while 30% of Taiwanese patients received opioids versus 25% in the US. Despite these differences, the SORG prediction model retained good discrimination (area under the receiver operating characteristic curve of 0.76 and area under the precision-recall curve of 0.33) and good overall performance (Brier score of 0.028 compared with a null model Brier score of 0.030) while somewhat overestimating the chance of prolonged opioid use (calibration slope of 1.07 and calibration intercept of -0.87). Decision curve analysis showed the SORG model was suitable for clinical use.
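
The reported null model Brier score can be checked from the event rate alone. This short sketch assumes the null model assigns the observed prevalence to every patient, in which case its Brier score reduces to prevalence × (1 − prevalence). The variable names are illustrative.

```python
events, n_patients = 41, 1316

# Observed prevalence of prolonged opioid prescription in the validation cohort.
prevalence = events / n_patients

# Assumed null model: predict the prevalence for everyone.
# Its Brier score is then prevalence * (1 - prevalence).
null_brier = prevalence * (1.0 - prevalence)

print(round(prevalence, 3), round(null_brier, 3))  # 0.031 0.03
```

Under that assumption the result agrees with the reported value of 0.030, so the model's Brier score of 0.028 is a modest improvement over the prevalence-only baseline.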

CONCLUSIONS

Despite differences at baseline and a very strict opioid policy, the SORG algorithm for prolonged opioid use after surgery for lumbar disc herniation has good discriminative ability and good overall performance in a Han Chinese patient group in Taiwan. This freely available digital application can be used to identify high-risk patients and tailor prevention policies for these patients that may mitigate the long-term adverse consequences of opioid dependence: https://sorg-apps.shinyapps.io/lumbardiscopioid/.

Hunter S, Kioa G, Baker JF. Predictive Algorithms in the Diagnosis and Management of Pediatric Hip and Periarticular Infection. J Bone Joint Surg Am 2022;104:649-658. [PMID: 35167503 DOI: 10.2106/jbjs.21.01040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]

An Evolution Gaining Momentum—The Growing Role of Artificial Intelligence in the Diagnosis and Treatment of Spinal Diseases. Diagnostics (Basel) 2022;12:diagnostics12040836. [PMID: 35453884 PMCID: PMC9025301 DOI: 10.3390/diagnostics12040836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2022] [Revised: 03/23/2022] [Accepted: 03/28/2022] [Indexed: 11/17/2022]

Body Composition Predictors of Adverse Postoperative Events in Patients Undergoing Surgery for Long Bone Metastases. J Am Acad Orthop Surg Glob Res Rev 2022;6:01979360-202203000-00010. [PMID: 35262530 PMCID: PMC8913089 DOI: 10.5435/jaaosglobal-d-22-00001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Accepted: 01/03/2022] [Indexed: 11/23/2022]