1. Pan YT, Lin YP, Yen HK, Yen HH, Huang CC, Hsieh HC, Janssen S, Hu MH, Lin WH, Groot OQ. Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases? Clin Orthop Relat Res 2024; 482:1710-1721. [PMID: 38517402 PMCID: PMC11343550 DOI: 10.1097/corr.0000000000003030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 02/09/2024] [Indexed: 03/23/2024]
Abstract
BACKGROUND Bone metastasis in advanced cancer is challenging because of pain, functional issues, and reduced life expectancy. Treatment planning is complex, with consideration of factors such as location, symptoms, and prognosis. Prognostic models help guide treatment choices, with Skeletal Oncology Research Group machine-learning algorithms (SORG-MLAs) showing promise in predicting survival for initial spinal metastases and extremity metastases treated with surgery or radiotherapy. Improved therapies extend patient lifespans, increasing the risk of subsequent skeletal-related events (SREs). Patients experiencing subsequent SREs often suffer from disease progression, indicating a deteriorating condition. For these patients, a thorough evaluation, including accurate survival prediction, is essential to determine the most appropriate treatment and avoid aggressive surgical treatment for patients with a poor survival likelihood. However, some variables in the SORG prediction model, such as tumor histology, visceral metastasis, and previous systemic therapies, might remain consistent between initial and subsequent SREs. Given the prognostic difference between patients with and without a subsequent SRE, the efficacy of established prognostic models, originally designed for individuals with an initial SRE, in addressing a subsequent SRE remains uncertain. Therefore, it is crucial to verify the model's utility for subsequent SREs. QUESTION/PURPOSE We aimed to evaluate the reliability of the SORG-MLAs for survival prediction in patients undergoing surgery or radiotherapy for a subsequent SRE for whom both the initial and subsequent SREs occurred in the spine or extremities.
METHODS We retrospectively included 738 patients who were 20 years or older who received surgery or radiotherapy for initial and subsequent SREs at a tertiary referral center and local hospital in Taiwan between 2010 and 2019. We excluded 74 patients whose initial SRE was in the spine and in whom the subsequent SRE occurred in the extremities and 37 patients whose initial SRE was in the extremities and the subsequent SRE was in the spine. The rationale was that different SORG-MLAs were exclusively designed for patients who had an initial spine metastasis and those who had an initial extremity metastasis, irrespective of whether they experienced metastatic events in other areas (for example, a patient experiencing an extremity SRE before his or her spinal SRE would also be regarded as a candidate for an initial spinal SRE). Because these patients were already validated in previous studies, we excluded them in case we overestimated our result. Five patients with malignant primary bone tumors and 38 patients in whom the metastasis's origin could not be identified were excluded, leaving 584 patients for analysis. The 584 included patients were categorized into two subgroups based on the location of initial and subsequent SREs: the spine group (68% [399]) and extremity group (32% [185]). No patients were lost to follow-up. Patient data at the time they presented with a subsequent SRE were collected, and survival predictions at this timepoint were calculated using the SORG-MLAs. Multiple imputation with the Missforest technique was conducted five times to impute the missing proportions of each predictor. The effectiveness of SORG-MLAs was gauged through several statistical measures, including discrimination (measured by the area under the receiver operating characteristic curve [AUC]), calibration, overall performance (Brier score), and decision curve analysis. 
Discrimination refers to the model's ability to differentiate between those with the event and those without the event. An AUC ranges from 0.5 to 1.0, with 0.5 indicating discrimination no better than chance and 1.0 indicating perfect discrimination. An AUC of 0.7 is considered clinically acceptable discrimination. Calibration is the comparison between the frequency of observed events and the predicted probabilities. In an ideal calibration, the observed and predicted survival rates should be congruent. The logarithm of the observed-to-expected survival ratio [log(O:E)] offers insight into the model's overall calibration by considering the total number of observed (O) and expected (E) events. The Brier score measures the mean squared difference between the predicted probability of possible outcomes for each individual and the observed outcomes, ranging from 0 to 1, with 0 indicating perfect overall performance and 1 indicating the worst performance. Moreover, the prevalence of the outcome should be considered, so a null-model Brier score was also calculated by assigning a probability equal to the prevalence of the outcome (in this case, the actual survival rate) to each patient. The benefit of the prediction model is determined by comparing its Brier score with that of the null model. If a prediction model's Brier score is lower than the null model's Brier score, the prediction model is deemed to have good performance. A decision curve analysis was performed to evaluate each model's "net benefit," which weighs the true positive rate against the false positive rate at a given "threshold probability," the ratio of risk to benefit of an intervention, derived from a comprehensive clinical evaluation and a well-discussed shared-decision process. A good predictive model should yield a higher net benefit than default strategies (treating all patients and treating no patients) across a range of threshold probabilities.
RESULTS For the spine group, the algorithms displayed acceptable AUC results (median AUCs of 0.69 to 0.72) for 42-day, 90-day, and 1-year survival predictions after treatment for a subsequent SRE. In contrast, the extremity group showed median AUCs ranging from 0.65 to 0.73 for the corresponding survival periods. All Brier scores were lower than those of their null model, indicating the SORG-MLAs' good overall performances for both cohorts. The SORG-MLAs yielded a net benefit for both cohorts; however, they overestimated 1-year survival probabilities in patients with a subsequent SRE in the spine, with a median log(O:E) of -0.60 (95% confidence interval -0.77 to -0.42). CONCLUSION The SORG-MLAs maintain satisfactory discriminatory capacity and offer considerable net benefits through decision curve analysis, indicating their continued viability as prediction tools in this clinical context. However, the algorithms overestimate 1-year survival rates for patients with a subsequent SRE of the spine, warranting consideration of specific patient groups. Clinicians and surgeons should exercise caution when using the SORG-MLAs for survival prediction in these patients and remain aware of potential mispredictions when tailoring treatment plans, with a preference for less invasive treatments. Ultimately, this study emphasizes the importance of enhancing prognostic algorithms and developing innovative tools for patients with subsequent SREs as the life expectancy in patients with bone metastases continues to improve and healthcare providers will encounter these patients more often in daily practice. LEVEL OF EVIDENCE Level III, prognostic study.
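The Brier score, its null-model counterpart, and log(O:E) described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and toy data are hypothetical, outcomes are coded 1 = survived, and the natural logarithm is assumed because the abstract does not state the base.

```python
import math


def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def null_brier_score(observed):
    """Brier score of a model that assigns the outcome prevalence to every patient."""
    prevalence = sum(observed) / len(observed)
    return brier_score([prevalence] * len(observed), observed)


def log_observed_to_expected(predicted, observed):
    """log(O:E): observed survivors over the sum of predicted survival probabilities."""
    return math.log(sum(observed) / sum(predicted))


# Hypothetical toy cohort (assumption for illustration only): 1 = survived.
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
observed = [1, 1, 0, 1, 0, 0]

model_brier = brier_score(predicted, observed)
null_brier = null_brier_score(observed)
log_oe = log_observed_to_expected(predicted, observed)
print(model_brier, null_brier, log_oe)
```

With these toy values the model's Brier score is below the null-model score, and log(O:E) is negative, which is the direction the abstract describes as overestimated survival.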
Affiliation(s)
- Yu-Ting Pan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
- Yen-Po Lin
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Hung-Kuan Yen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
  - Department of Medical Education, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Hung-Ho Yen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
- Chi-Ching Huang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
- Hsiang-Chieh Hsieh
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Stein Janssen
  - Department of Orthopedic Surgery and Sports Medicine, Amsterdam University Medical Centers, Amsterdam, the Netherlands
- Ming-Hsiao Hu
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Wei-Hsin Lin
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Olivier Q Groot
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, the Netherlands
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA
2. Buddhiraju A, Shimizu MR, Seo HH, Chen TLW, RezazadehSaatlou M, Huang Z, Kwon YM. Generalizability of machine learning models predicting 30-day unplanned readmission after primary total knee arthroplasty using a nationally representative database. Med Biol Eng Comput 2024; 62:2333-2341. [PMID: 38558351 DOI: 10.1007/s11517-024-03075-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 03/15/2024] [Indexed: 04/04/2024]
Abstract
Unplanned readmission after primary total knee arthroplasty (TKA) costs an average of US $39,000 per episode and negatively impacts patient outcomes. Although predictive machine learning (ML) models show promise for risk stratification in specific populations, existing studies do not address model generalizability. This study aimed to establish the generalizability of previous institutionally developed ML models to predict 30-day readmission following primary TKA using a national database. Data from 424,354 patients from the ACS-NSQIP database were used to develop and validate four ML models to predict 30-day readmission risk after primary TKA. Individual model performance was assessed and compared based on discrimination, accuracy, calibration, and clinical utility. Length of stay (> 2.5 days), body mass index (BMI) (> 33.21 kg/m²), and operation time (> 93 min) were important determinants of 30-day readmission. All ML models demonstrated equally good accuracy, calibration, and discriminatory ability (Brier score, ANN = RF = HGB = NEPLR = 0.03; ANN, slope = 0.90, intercept = -0.11; RF, slope = 0.93, intercept = -0.12; HGB, slope = 0.90, intercept = -0.12; NEPLR, slope = 0.77, intercept = 0.01; AUC, ANN = RF = HGB = NEPLR = 0.78). This study validates the generalizability of four previously developed ML algorithms in predicting readmission risk in patients undergoing TKA and offers surgeons an opportunity to reduce readmissions by optimizing discharge planning, BMI, and surgical efficiency.
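Discrimination is reported in this abstract as an AUC. As a hedged sketch of what that number measures, the AUC can be computed as the fraction of (event, non-event) pairs in which the event case received the higher predicted risk, with ties counted as half. The function name and toy data are hypothetical and are not taken from the study.

```python
def auc_by_pairs(scores, labels):
    """AUC as the proportion of concordant (positive, negative) pairs; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy data (assumption): 1 = readmitted within 30 days.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

toy_auc = auc_by_pairs(scores, labels)
print(toy_auc)
```

The pairwise form is quadratic in cohort size, so it is suited to illustration rather than to a database of several hundred thousand patients.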
Affiliation(s)
- Anirudh Buddhiraju
  - Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA, 02114, USA
- Michelle Riyo Shimizu
  - Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA, 02114, USA
- Henry Hojoon Seo
  - Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA, 02114, USA
- Tony Lin-Wei Chen
  - Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA, 02114, USA
  - Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, 999077, Hong Kong SAR, China
- MohammadAmin RezazadehSaatlou
  - Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA, 02114, USA
- Ziwei Huang
  - Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA, 02114, USA
- Young-Min Kwon
  - Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA, 02114, USA
3. Lee CC, Chen CW, Yen HK, Lin YP, Lai CY, Wang JL, Groot OQ, Janssen SJ, Schwab JH, Hsu FM, Lin WH. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone. Clin Orthop Relat Res 2024:00003086-990000000-01687. [PMID: 39051924 DOI: 10.1097/corr.0000000000003185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Accepted: 06/20/2024] [Indexed: 07/27/2024]
Abstract
BACKGROUND Survival estimation for patients with symptomatic skeletal metastases ideally should be made before a type of local treatment has already been determined. Currently available survival prediction tools, however, were generated using data from patients treated either operatively or with local radiation alone, raising concerns about whether they would generalize well to all patients presenting for assessment. The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA), trained with institution-based data of surgically treated patients, and the Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy model (METSSS), trained with registry-based data of patients treated with radiotherapy alone, are two of the most recently developed survival prediction models, but they have not been tested on patients whose local treatment strategy is not yet decided. QUESTIONS/PURPOSES (1) Which of these two survival prediction models performed better in a mixed cohort made up both of patients who received local treatment with surgery followed by radiotherapy and who had radiation alone for symptomatic bone metastases? (2) Which model performed better among patients whose local treatment consisted of only palliative radiotherapy? (3) Are laboratory values used by SORG-MLA, which are not included in METSSS, independently associated with survival after controlling for predictions made by METSSS? METHODS Between 2010 and 2018, we provided local treatment for 2113 adult patients with skeletal metastases in the extremities at an urban tertiary referral academic medical center using one of two strategies: (1) surgery followed by postoperative radiotherapy or (2) palliative radiotherapy alone. Every patient's survivorship status was ascertained either by their medical records or the national death registry from the Taiwanese National Health Insurance Administration. 
After applying a priori designated exclusion criteria, 91% (1920) were analyzed here. Among them, 48% (920) of the patients were female, and the median (IQR) age was 62 years (53 to 70 years). Lung was the most common primary tumor site (41% [782]), and 59% (1128) of patients had other skeletal metastases in addition to the treated lesion(s). In general, the indications for surgery were the presence of a complete pathologic fracture or an impending pathologic fracture, defined as having a Mirels score of ≥ 9, in patients with an American Society of Anesthesiologists (ASA) classification of less than or equal to IV and who were considered fit for surgery. The indications for radiotherapy were relief of pain, local tumor control, prevention of skeletal-related events, and any combination of the above. In all, 84% (1610) of the patients received palliative radiotherapy alone as local treatment for the target lesion(s), and 16% (310) underwent surgery followed by postoperative radiotherapy. Neither METSSS nor SORG-MLA was used at the point of care to aid clinical decision-making during the treatment period. Survival was retrospectively estimated by these two models to test their potential for providing survival probabilities. We first compared SORG to METSSS in the entire population. Then, we repeated the comparison in patients who received local treatment with palliative radiation alone. We assessed model performance by area under the receiver operating characteristic curve (AUROC), calibration analysis, Brier score, and decision curve analysis (DCA). The AUROC measures discrimination, which is the ability to distinguish patients with the event of interest (such as death at a particular time point) from those without. AUROC typically ranges from 0.5 to 1.0, with 0.5 indicating random guessing and 1.0 a perfect prediction, and in general, an AUROC of ≥ 0.7 indicates adequate discrimination for clinical use. 
Calibration refers to the agreement between the predicted outcomes (in this case, survival probabilities) and the actual outcomes, with a perfect calibration curve having an intercept of 0 and a slope of 1. A positive intercept indicates that the actual survival is generally underestimated by the prediction model, and a negative intercept suggests the opposite (overestimation). When comparing models, an intercept closer to 0 typically indicates better calibration. Calibration can also be summarized as log(O:E), the logarithm scale of the ratio of observed (O) to expected (E) survivors. A log(O:E) > 0 signals an underestimation (the observed survival is greater than the predicted survival); and a log(O:E) < 0 indicates the opposite (the observed survival is lower than the predicted survival). A model with a log(O:E) closer to 0 is generally considered better calibrated. The Brier score is the mean squared difference between the model predictions and the observed outcomes, and it ranges from 0 (best prediction) to 1 (worst prediction). The Brier score captures both discrimination and calibration, and it is considered a measure of overall model performance. In Brier score analysis, the "null model" assigns a predicted probability equal to the prevalence of the outcome and represents a model that adds no new information. A prediction model should achieve a Brier score at least lower than the null-model Brier score to be considered as useful. The DCA was developed as a method to determine whether using a model to inform treatment decisions would do more good than harm. It plots the net benefit of making decisions based on the model's predictions across all possible risk thresholds (or cost-to-benefit ratios) in relation to the two default strategies of treating all or no patients. 
The care provider can decide on an acceptable risk threshold for the proposed treatment in an individual and assess the corresponding net benefit to determine whether consulting with the model is superior to adopting the default strategies. Finally, we examined whether laboratory data, which were not included in the METSSS model, would have been independently associated with survival after controlling for the METSSS model's predictions by using the multivariable logistic and Cox proportional hazards regression analyses. RESULTS Between the two models, only SORG-MLA achieved adequate discrimination (an AUROC of > 0.7) in the entire cohort (of patients treated operatively or with radiation alone) and in the subgroup of patients treated with palliative radiotherapy alone. SORG-MLA outperformed METSSS by a wide margin on discrimination, calibration, and Brier score analyses in not only the entire cohort but also the subgroup of patients whose local treatment consisted of radiotherapy alone. In both the entire cohort and the subgroup, DCA demonstrated that SORG-MLA provided more net benefit compared with the two default strategies (of treating all or no patients) and compared with METSSS when risk thresholds ranged from 0.2 to 0.9 at both 90 days and 1 year, indicating that using SORG-MLA as a decision-making aid was beneficial when a patient's individualized risk threshold for opting for treatment was 0.2 to 0.9. Higher albumin, lower alkaline phosphatase, lower calcium, higher hemoglobin, lower international normalized ratio, higher lymphocytes, lower neutrophils, lower neutrophil-to-lymphocyte ratio, lower platelet-to-lymphocyte ratio, higher sodium, and lower white blood cells were independently associated with better 1-year and overall survival after adjusting for the predictions made by METSSS. 
CONCLUSION Based on these discoveries, clinicians might choose to consult SORG-MLA instead of METSSS for survival estimation in patients with long-bone metastases presenting for evaluation of local treatment. Basing a treatment decision on the predictions of SORG-MLA could be beneficial when a patient's individualized risk threshold for opting to undergo a particular treatment strategy ranged from 0.2 to 0.9. Future studies might investigate relevant laboratory items when constructing or refining a survival estimation model because these data demonstrated prognostic value independent of the predictions of the METSSS model, and future studies might also seek to keep these models up to date using data from diverse, contemporary patients undergoing both modern operative and nonoperative treatments. LEVEL OF EVIDENCE Level III, diagnostic study.
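The decision curve analysis described in the abstract above compares a model's net benefit with the two default strategies at each risk threshold. A minimal sketch follows, assuming the commonly used definition of net benefit (true positives per patient minus false positives per patient weighted by the odds of the threshold). The function names and toy data are hypothetical and this is not the authors' implementation.

```python
def net_benefit(risk, event, threshold):
    """Net benefit of intervening on patients whose predicted risk meets the threshold."""
    n = len(event)
    true_pos = sum(1 for r, e in zip(risk, event) if r >= threshold and e == 1)
    false_pos = sum(1 for r, e in zip(risk, event) if r >= threshold and e == 0)
    odds = threshold / (1 - threshold)  # cost-to-benefit weighting of a false positive
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(event, threshold):
    """Default strategy: intervene on every patient regardless of predicted risk."""
    prevalence = sum(event) / len(event)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy cohort (assumption): 1 = event occurred. Treat-none is always 0.
risk = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
event = [1, 1, 0, 1, 0, 0]

nb_model = net_benefit(risk, event, threshold=0.5)
nb_all = net_benefit_treat_all(event, threshold=0.5)
print(nb_model, nb_all, 0.0)
```

Repeating the calculation over a grid of thresholds (for example 0.2 to 0.9, the range highlighted in the abstract) would trace out the decision curve for each strategy.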
Affiliation(s)
- Chia-Che Lee
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Chih-Wei Chen
  - Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Hung-Kuan Yen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu branch, Hsinchu, Taiwan
- Yen-Po Lin
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu branch, Hsinchu, Taiwan
- Cheng-Yo Lai
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu branch, Hsinchu, Taiwan
- Jaw-Lin Wang
  - Department of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Olivier Q Groot
  - Department of Orthopedic Surgery and Sports Medicine, Amsterdam University Medical Centers, Amsterdam, the Netherlands
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, the Netherlands
- Stein J Janssen
  - Department of Orthopedic Surgery and Sports Medicine, Amsterdam University Medical Centers, Amsterdam, the Netherlands
- Joseph H Schwab
  - Department of Orthopedics and Neurosurgery, Cedars Sinai Hospital, Los Angeles, CA, USA
- Feng-Ming Hsu
  - Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital, Taipei, Taiwan
  - Graduate Institute of Oncology, National Taiwan University College of Medicine, Taipei, Taiwan
  - Department of Radiation Oncology, National Taiwan University Cancer Center, Taipei, Taiwan
- Wei-Hsin Lin
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
4. Farrow L, Zhong M, Anderson L. Use of natural language processing techniques to predict patient selection for total hip and knee arthroplasty from radiology reports. Bone Joint J 2024; 106-B:688-695. [PMID: 38945535 DOI: 10.1302/0301-620x.106b7.bjj-2024-0136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Aims To examine whether natural language processing (NLP) using a clinically based large language model (LLM) could be used to predict patient selection for total hip or total knee arthroplasty (THA/TKA) from routinely available free-text radiology reports. Methods Data pre-processing and analyses were conducted according to the Artificial intelligence to Revolutionize the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project protocol. This included use of de-identified Scottish regional clinical data of patients referred for consideration of THA/TKA, held in a secure data environment designed for artificial intelligence (AI) inference. Only preoperative radiology reports were included. NLP algorithms were based on the freely available GatorTron model, an LLM trained on over 82 billion words of de-identified clinical text. Two inference tasks were performed: assessment after model fine-tuning (50 epochs and three cycles of k-fold cross validation), and external validation. Results For THA, there were 5,558 patient radiology reports included, of which 4,137 were used for model training and testing, and 1,421 for external validation. Following training, model performance demonstrated average (mean across three folds) accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC) values of 0.850 (95% confidence interval (CI) 0.833 to 0.867), 0.813 (95% CI 0.785 to 0.841), and 0.847 (95% CI 0.822 to 0.872), respectively. For TKA, 7,457 patient radiology reports were included, with 3,478 used for model training and testing, and 3,152 for external validation. Performance metrics included accuracy, F1 score, and AUROC values of 0.757 (95% CI 0.702 to 0.811), 0.543 (95% CI 0.479 to 0.607), and 0.717 (95% CI 0.657 to 0.778), respectively. There was a notable deterioration in performance on external validation in both cohorts.
Conclusion The use of routinely available preoperative radiology reports provides promising potential to help screen suitable candidates for THA, but not for TKA. The external validation results demonstrate the importance of further model testing and training when confronted with new clinical cohorts.
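The abstract above reports accuracy and F1 score side by side, and for TKA the two diverge noticeably. A minimal sketch of how both are derived from the same confusion counts is given below; the function name and toy labels are hypothetical and are only meant to show why accuracy can stay moderate while F1 is lower when positives are the minority class.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # F1 is the harmonic mean of precision and recall, written here in count form.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Hypothetical toy labels (assumption): 1 = selected for arthroplasty.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

toy_accuracy, toy_f1 = accuracy_and_f1(y_true, y_pred)
print(toy_accuracy, toy_f1)
```

In this toy case most true negatives are classified correctly, which lifts accuracy, while the missed positives pull F1 down.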
Affiliation(s)
- Luke Farrow
  - Grampian Orthopaedics, Aberdeen Royal Infirmary, Aberdeen, UK
  - Institute of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
- Mingjun Zhong
  - Institute of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
- Lesley Anderson
  - Institute of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
5. Ritter D, Denard PJ, Raiss P, Wijdicks CA, Bachmaier S. Preoperative 3-dimensional computed tomography bone density measures provide objective bone quality classifications for stemless anatomic total shoulder arthroplasty. J Shoulder Elbow Surg 2024; 33:1503-1511. [PMID: 38182017 DOI: 10.1016/j.jse.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 10/26/2023] [Accepted: 11/12/2023] [Indexed: 01/07/2024]
Abstract
BACKGROUND Reproducible methods for determining adequate bone densities for stemless anatomic total shoulder arthroplasty (aTSA) are currently lacking. The purpose of this study was to evaluate the utility of preoperative computed tomography (CT) imaging for assessing the bone density of the proximal humerus for supportive differentiation in the decision making for stemless humeral component implantation. It was hypothesized that preoperative 3-dimensional (3-D) CT bone density measures provide objective classifications of the bone quality for stemless aTSA. METHODS A 3-part study was performed that included the analysis of cadaveric humerus CT scans followed by retrospective application to a clinical cohort and classification with a machine learning model. Thirty cadaveric humeri were evaluated with clinical CT and micro-CT (μCT) imaging. Phantom-calibrated CT data were used to extract 3-D regions of interest and defined radiographic scores. The final image processing script was applied retrospectively to a clinical cohort (n = 150) that had a preoperative CT and intraoperative bone density assessment using the "thumb test," followed by placement of an anatomic stemmed or stemless humeral component. Postscan patient-specific calibration was used to improve the functionality and accuracy of the density analysis. A machine learning model (Support vector machine [SVM]) was utilized to improve the classification of bone densities for a stemless humeral component. RESULTS The image processing of clinical CT images demonstrated good to excellent accuracy for cylindrical cancellous bone densities (metaphysis [ICC = 0.986] and epiphysis [ICC = 0.883]). Patient-specific internal calibration significantly reduced biases and unwanted variance compared with standard HU CT scans (P < .0001). The SVM showed optimized prediction accuracy compared with conventional statistics with an accuracy of 73.9% and an AUC of 0.83 based on the intraoperative decision of the surgeon. 
The SVM model based on density clusters increased the accuracy of the bone quality classification to 87.3% with an AUC of 0.93. CONCLUSIONS Preoperative CT imaging allows accurate evaluation of the bone densities in the proximal humerus. Three-dimensional regions of interest, rescaling using patient-specific calibration, and a machine learning model resulted in good to excellent prediction for objective bone quality classification. This approach may provide an objective tool extending preoperative selection criteria for stemless humeral component implantation.
Affiliation(s)
- Daniel Ritter
  - Department of Orthopedic Research, Arthrex GmbH, Munich, Germany
  - Department of Orthopaedics and Trauma Surgery, Musculoskeletal University Center Munich (MUM), University Hospital, LMU Munich, Munich, Germany
- Coen A Wijdicks
  - Department of Orthopedic Research, Arthrex GmbH, Munich, Germany
- Samuel Bachmaier
  - Department of Orthopedic Research, Arthrex GmbH, Munich, Germany
6. Chen Y, Zhang S, Tang N, George DM, Huang T, Tang J. Using Google web search to analyze and evaluate the application of ChatGPT in femoroacetabular impingement syndrome. Front Public Health 2024; 12:1412063. [PMID: 38883198 PMCID: PMC11176516 DOI: 10.3389/fpubh.2024.1412063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 05/23/2024] [Indexed: 06/18/2024] Open
Abstract
Background Chat Generative Pre-trained Transformer (ChatGPT) is a new machine learning tool that allows patients to access health information online, and it is compared here with Google, the most commonly used search engine in the United States. Patients can use ChatGPT to better understand medical issues. This study compared the two search engines based on: (i) frequently asked questions (FAQs) about Femoroacetabular Impingement Syndrome (FAI), (ii) the corresponding answers to these FAQs, and (iii) the most common FAQs yielding a numerical response. Purpose To assess the suitability of ChatGPT as an online health information resource for patients by replicating their internet searches. Study design Cross-sectional study. Methods The same keywords were used to search the 10 most common questions about FAI on both Google and ChatGPT. The responses from both search engines were recorded and analyzed. Results Of the 20 questions, 8 (40%) were similar. Among the 10 questions searched on Google, 7 were provided by a medical practice. For numerical questions, there was a notable difference in answers between Google and ChatGPT for 3 out of the top 5 most common questions (60%). Expert evaluation indicated that 67.5% of experts were satisfied or highly satisfied with the accuracy of ChatGPT's descriptions of both conservative and surgical treatment options for FAI. Additionally, 62.5% of experts were satisfied or highly satisfied with the safety of the information provided. Regarding the etiology of FAI, including cam and pincer impingements, 52.5% of experts expressed satisfaction or high satisfaction with ChatGPT's explanations. Overall, 62.5% of experts affirmed that ChatGPT could serve effectively as a reliable medical resource for initial information retrieval. Conclusion This study confirms that ChatGPT, despite being a new tool, shows significant potential as a supplementary resource for health information on FAI.
Expert evaluations commend its capacity to provide accurate and comprehensive responses, valued by medical professionals for relevance and safety. Nonetheless, continuous improvements in its medical content's depth and precision are recommended for ongoing reliability. While ChatGPT offers a promising alternative to traditional search engines, meticulous validation is imperative before it can be fully embraced as a trusted medical resource.
Affiliation(s)
- Yifan Chen
  - Orthopaedic Department, The Second Xiangya Hospital of Central South University, Changsha, China
- Shengqun Zhang
  - Orthopaedic Department, The Second Xiangya Hospital of Central South University, Changsha, China
- Ning Tang
  - Orthopaedic Department, The Third Xiangya Hospital of Central South University, Changsha, China
- Tianlong Huang
  - Orthopaedic Department, The Second Xiangya Hospital of Central South University, Changsha, China
- JinPing Tang
  - Department of Orthopaedics, The Third People's Hospital of Chenzhou, Chenzhou, Hunan, China
7
Li S, Bao YG, Wu B. Letter to the editor regarding the article "artificial intelligence and computer-assisted navigation for shoulder surgery". J Orthop Surg (Hong Kong) 2024; 32:10225536241263656. [PMID: 38871346 DOI: 10.1177/10225536241263656] [Indexed: 06/15/2024]
Affiliation(s)
- Shu Li
  - Department of Clinical Medicine, Jining Medical University, Jining City, China
- Yong-Gang Bao
  - Department of Clinical Medicine, Jining Medical University, Jining City, China
- Bin Wu
  - Department of Orthopedics, Affiliated Hospital of Jining Medical University, Jining City, China
8
Yang J, Ardavanis KS, Slack KE, Fernando ND, Della Valle CJ, Hernandez NM. Chat Generative Pretrained Transformer (ChatGPT) and Bard: Artificial Intelligence Does not yet Provide Clinically Supported Answers for Hip and Knee Osteoarthritis. J Arthroplasty 2024; 39:1184-1190. [PMID: 38237878 DOI: 10.1016/j.arth.2024.01.029] [Received: 08/20/2023] [Revised: 01/08/2024] [Accepted: 01/11/2024] [Indexed: 02/22/2024]
Abstract
BACKGROUND Advancements in artificial intelligence (AI) have led to the creation of large language models (LLMs), such as Chat Generative Pretrained Transformer (ChatGPT) and Bard, that analyze online resources to synthesize responses to user queries. Despite their popularity, the accuracy of LLM responses to medical questions remains unknown. This study aimed to compare the responses of ChatGPT and Bard regarding treatments for hip and knee osteoarthritis with the American Academy of Orthopaedic Surgeons (AAOS) Evidence-Based Clinical Practice Guidelines (CPGs) recommendations. METHODS Both ChatGPT (Open AI) and Bard (Google) were queried regarding 20 treatments (10 for hip and 10 for knee osteoarthritis) from the AAOS CPGs. Responses were classified by 2 reviewers as being in "Concordance," "Discordance," or "No Concordance" with AAOS CPGs. A Cohen's Kappa coefficient was used to assess inter-rater reliability, and Chi-squared analyses were used to compare responses between LLMs. RESULTS Overall, ChatGPT and Bard provided responses that were concordant with the AAOS CPGs for 16 (80%) and 12 (60%) treatments, respectively. Notably, ChatGPT and Bard encouraged the use of non-recommended treatments in 30% and 60% of queries, respectively. There were no differences in performance when evaluating by joint or by recommended versus non-recommended treatments. Studies were referenced in 6 (30%) of the Bard responses and none (0%) of the ChatGPT responses. Of the 6 Bard responses, studies could only be identified for 1 (16.7%). Of the remaining, 2 (33.3%) responses cited studies in journals that did not exist, 2 (33.3%) cited studies that could not be found with the information given, and 1 (16.7%) provided links to unrelated studies. CONCLUSIONS Both ChatGPT and Bard do not consistently provide responses that align with the AAOS CPGs. Consequently, physicians and patients should temper expectations on the guidance AI platforms can currently provide.
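The inter-rater reliability statistic named in the Methods above, Cohen's kappa, can be illustrated with a minimal sketch. The ratings below are hypothetical and are not data from the study, and the function name `cohens_kappa` is an assumption made for illustration only.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    Undefined (division by zero) if chance agreement is exactly 1.
    """
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical classifications of six responses by two reviewers
# (C = Concordance, D = Discordance, N = No Concordance).
rater_a = ["C", "C", "D", "N", "C", "D"]
rater_b = ["C", "C", "D", "C", "C", "N"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; the abstract does not report the value obtained in the study.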
Affiliation(s)
- JaeWon Yang
  - Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
- Kyle S Ardavanis
  - Department of Orthopaedic Surgery, Madigan Medical Center, Tacoma, Washington
- Katherine E Slack
  - Elson S. Floyd College of Medicine, Washington State University, Spokane, Washington
- Navin D Fernando
  - Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
- Craig J Della Valle
  - Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, Illinois
- Nicholas M Hernandez
  - Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
9
Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, van Smeden M, Boulesteix AL, Camaradou JC, Celi LA, Denaxas S, Denniston AK, Glocker B, Golub RM, Harvey H, Heinze G, Hoffman MM, Kengne AP, Lam E, Lee N, Loder EW, Maier-Hein L, Mateen BA, McCradden MD, Oakden-Rayner L, Ordish J, Parnell R, Rose S, Singh K, Wynants L, Logullo P. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024; 385:e078378. [PMID: 38626948 PMCID: PMC11019967 DOI: 10.1136/bmj-2023-078378] [Accepted: 01/17/2024] [Indexed: 04/19/2024]
Affiliation(s)
- Gary S Collins
  - Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Karel G M Moons
  - Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Paula Dhiman
  - Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Richard D Riley
  - Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Andrew L Beam
  - Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA
- Ben Van Calster
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Department of Biomedical Data Science, Leiden University Medical Centre, Leiden, Netherlands
- Marzyeh Ghassemi
  - Department of Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Xiaoxuan Liu
  - Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Johannes B Reitsma
  - Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
  - Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University of Munich and Munich Centre of Machine Learning, Germany
- Jennifer Catherine Camaradou
  - Patient representative, Health Data Research UK patient and public involvement and engagement group
  - Patient representative, University of East Anglia, Faculty of Health Sciences, Norwich Research Park, Norwich, UK
- Leo Anthony Celi
  - Beth Israel Deaconess Medical Center, Boston, MA, USA
  - Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, UK
  - British Heart Foundation Data Science Centre, London, UK
- Alastair K Denniston
  - National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
  - Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Ben Glocker
  - Department of Computing, Imperial College London, London, UK
- Robert M Golub
  - Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Georg Heinze
  - Section for Clinical Biometrics, Centre for Medical Data Science, Medical University of Vienna, Vienna, Austria
- Michael M Hoffman
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Emily Lam
  - Patient representative, Health Data Research UK patient and public involvement and engagement group
- Naomi Lee
  - National Institute for Health and Care Excellence, London, UK
- Elizabeth W Loder
  - The BMJ, London, UK
  - Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Lena Maier-Hein
  - Department of Intelligent Medical Systems, German Cancer Research Centre, Heidelberg, Germany
- Bilal A Mateen
  - Institute of Health Informatics, University College London, London, UK
  - Wellcome Trust, London, UK
  - Alan Turing Institute, London, UK
- Melissa D McCradden
  - Department of Bioethics, Hospital for Sick Children Toronto, ON, Canada
  - Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Lauren Oakden-Rayner
  - Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Johan Ordish
  - Medicines and Healthcare products Regulatory Agency, London, UK
- Richard Parnell
  - Patient representative, Health Data Research UK patient and public involvement and engagement group
- Sherri Rose
  - Department of Health Policy and Center for Health Policy, Stanford University, Stanford, CA, USA
- Karandeep Singh
  - Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
- Laure Wynants
  - Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
- Patricia Logullo
  - Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
10
Toh ZA, Berg B, Han QYC, Hey HWD, Pikkarainen M, Grotle M, He HG. Clinical Decision Support System Used in Spinal Disorders: Scoping Review. J Med Internet Res 2024; 26:e53951. [PMID: 38502157 PMCID: PMC10988379 DOI: 10.2196/53951] [Received: 10/28/2023] [Revised: 01/29/2024] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
BACKGROUND Spinal disorders are highly prevalent worldwide with high socioeconomic costs. This cost is associated with the demand for treatment and productivity loss, prompting the exploration of technologies to improve patient outcomes. Clinical decision support systems (CDSSs) are computerized systems that are increasingly used to facilitate safe and efficient health care. Their applications range in depth and can be found across health care specialties. OBJECTIVE This scoping review aims to explore the use of CDSSs in patients with spinal disorders. METHODS We used the Joanna Briggs Institute methodological guidance for this scoping review and reported according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) statement. Databases, including PubMed, Embase, Cochrane, CINAHL, Web of Science, Scopus, ProQuest, and PsycINFO, were searched from inception until October 11, 2022. The included studies examined the use of digitalized CDSSs in patients with spinal disorders. RESULTS A total of 4 major CDSS functions were identified from 31 studies: preventing unnecessary imaging (n=8, 26%), aiding diagnosis (n=6, 19%), aiding prognosis (n=11, 35%), and recommending treatment options (n=6, 20%). Most studies used the knowledge-based system. Logistic regression was the most commonly used method, followed by decision tree algorithms. The use of CDSSs to aid in the management of spinal disorders was generally accepted over the threat to physicians' clinical decision-making autonomy. CONCLUSIONS Although the effectiveness was frequently evaluated by examining the agreement between the decisions made by the CDSSs and the health care providers, comparing the CDSS recommendations with actual clinical outcomes would be preferable. 
In addition, future studies on CDSS development should focus on system integration, considering end user's needs and preferences, and external validation and impact studies to assess effectiveness and generalizability. TRIAL REGISTRATION OSF Registries osf.io/dyz3f; https://osf.io/dyz3f.
Affiliation(s)
- Zheng An Toh
  - National University Hospital, National University Health System, Singapore, Singapore
- Bjørnar Berg
  - Centre for Intelligent Musculoskeletal Health, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
- Hwee Weng Dennis Hey
  - Division of Orthopaedic Surgery, National University Hospital, National University Health System, Singapore, Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Minna Pikkarainen
  - Department of Rehabilitation and Health Technology, Oslo Metropolitan University, Oslo, Norway
  - Martti Ahtisaari Institute, Oulu Business School, Oulu University, Oulu, Finland
  - Department of Product Design, Oslo Metropolitan University, Oslo, Norway
- Margreth Grotle
  - Centre for Intelligent Musculoskeletal Health, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
  - Department of Research and Innovation, Division of Clinical Neuroscience, Oslo University Hospital, Oslo, Norway
- Hong-Gu He
  - Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
11
Dijkstra H, van de Kuit A, de Groot T, Canta O, Groot OQ, Oosterhoff JH, Doornberg JN. Systematic review of machine-learning models in orthopaedic trauma. Bone Jt Open 2024; 5:9-19. [PMID: 38226447 PMCID: PMC10790183 DOI: 10.1302/2633-1462.51.bjo-2023-0095.r1] [Indexed: 01/17/2024]
Abstract
Aims Machine-learning (ML) prediction models in orthopaedic trauma hold great promise in assisting clinicians in various tasks, such as personalized risk stratification. However, an overview of current applications and critical appraisal to peer-reviewed guidelines is lacking. The objectives of this study are to 1) provide an overview of current ML prediction models in orthopaedic trauma; 2) evaluate the completeness of reporting following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement; and 3) assess the risk of bias following the Prediction model Risk Of Bias Assessment Tool (PROBAST) tool. Methods A systematic search screening 3,252 studies identified 45 ML-based prediction models in orthopaedic trauma up to January 2023. The TRIPOD statement assessed transparent reporting and the PROBAST tool the risk of bias. Results A total of 40 studies reported on training and internal validation; four studies performed both development and external validation, and one study performed only external validation. The most commonly reported outcomes were mortality (33%, 15/45) and length of hospital stay (9%, 4/45), and the majority of prediction models were developed in the hip fracture population (60%, 27/45). The overall median completeness for the TRIPOD statement was 62% (interquartile range 30 to 81%). The overall risk of bias in the PROBAST tool was low in 24% (11/45), high in 69% (31/45), and unclear in 7% (3/45) of the studies. High risk of bias was mainly due to analysis domain concerns including small datasets with low number of outcomes, complete-case analysis in case of missing data, and no reporting of performance measures. Conclusion The results of this study showed that despite a myriad of potential clinically useful applications, a substantial part of ML studies in orthopaedic trauma lack transparent reporting, and are at high risk of bias. 
These problems must be resolved by following established guidelines to instil confidence in ML models among patients and clinicians. Otherwise, there will remain a sizeable gap between the development of ML prediction models and their clinical application in our day-to-day orthopaedic trauma practice.
Affiliation(s)
- Hidde Dijkstra
  - Department of Orthopaedic Surgery, University Medical Centre Groningen, Groningen, Netherlands
  - University Center for Geriatric Medicine, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
- Anouk van de Kuit
  - Department of Orthopaedic Surgery, University Medical Centre Groningen, Groningen, Netherlands
- Tom de Groot
  - Department of Orthopaedic Surgery, University Medical Centre Groningen, Groningen, Netherlands
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Olga Canta
  - Department of Orthopaedic Surgery, University Medical Centre Groningen, Groningen, Netherlands
- Olivier Q. Groot
  - Department of Orthopaedic Surgery, University Medical Centre Utrecht, University of Utrecht, Utrecht, Netherlands
- Jacobien H. Oosterhoff
  - Department of Engineering Systems & Services, Faculty Technology Policy and Management, Delft University of Technology, Delft, Netherlands
- Job N. Doornberg
  - Department of Orthopaedic Surgery, University Medical Centre Groningen, Groningen, Netherlands
  - Department of Orthopaedic Trauma Surgery, Flinders Medical Center, Flinders University, Adelaide, Australia
12
Huang CC, Peng KP, Hsieh HC, Groot OQ, Yen HK, Tsai CC, Karhade AV, Lin YP, Kao YT, Yang JJ, Dai SH, Huang CC, Chen CW, Yen MH, Xiao FR, Lin WH, Verlaan JJ, Schwab JH, Hsu FM, Wong T, Yang RS, Yang SH, Hu MH. Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm. Clin Orthop Relat Res 2024; 482:143-157. [PMID: 37306629 PMCID: PMC10723864 DOI: 10.1097/corr.0000000000002706] [Received: 07/27/2022] [Revised: 01/20/2023] [Accepted: 05/01/2023] [Indexed: 06/13/2023]
Abstract
BACKGROUND The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA) was developed to predict the survival of patients with spinal metastasis. The algorithm was successfully tested in five international institutions using 1101 patients from different continents. The incorporation of 18 prognostic factors strengthens its predictive ability but limits its clinical utility because some prognostic factors might not be clinically available when a clinician wishes to make a prediction. QUESTIONS/PURPOSES We performed this study to (1) evaluate the SORG-MLA's performance with missing data and (2) develop an internet-based application to impute the missing data. METHODS A total of 2768 patients were included in this study. The data of 617 patients who were treated surgically were intentionally erased, and the data of the other 2151 patients who were treated with radiotherapy and medical treatment were used to impute the artificially missing data. Compared with those who were treated nonsurgically, patients undergoing surgery were younger (median 59 years [IQR 51 to 67 years] versus median 62 years [IQR 53 to 71 years]) and had a higher proportion of patients with at least three spinal metastatic levels (77% [474 of 617] versus 72% [1547 of 2151]), more neurologic deficit (normal American Spinal Injury Association [E] 68% [301 of 443] versus 79% [1227 of 1561]), higher BMI (23 kg/m² [IQR 20 to 25 kg/m²] versus 22 kg/m² [IQR 20 to 25 kg/m²]), higher platelet count (240 × 10³/µL [IQR 173 to 327 × 10³/µL] versus 227 × 10³/µL [IQR 165 to 302 × 10³/µL]), higher lymphocyte count (15 × 10³/µL [IQR 9 to 21 × 10³/µL] versus 14 × 10³/µL [IQR 8 to 21 × 10³/µL]), lower serum creatinine level (0.7 mg/dL [IQR 0.6 to 0.9 mg/dL] versus 0.8 mg/dL [IQR 0.6 to 1.0 mg/dL]), less previous systemic therapy (19% [115 of 617] versus 24% [526 of 2151]), fewer Charlson comorbidities other than cancer (28% [170 of 617] versus 36% [770 of 2151]), and longer median 
survival. The two patient groups did not differ in other regards. These findings aligned with our institutional philosophy of selecting patients for surgical intervention based on their level of favorable prognostic factors such as BMI or lymphocyte counts and lower levels of unfavorable prognostic factors such as white blood cell counts or serum creatinine level, as well as the degree of spinal instability and severity of neurologic deficits. This approach aims to identify patients with better survival outcomes and prioritize their surgical intervention accordingly. Seven factors (serum albumin and alkaline phosphatase levels, international normalized ratio, lymphocyte and neutrophil counts, and the presence of visceral or brain metastases) were considered possible missing items based on five previous validation studies and clinical experience. Artificially missing data were imputed using the missForest imputation technique, which was previously applied and successfully tested to fit the SORG-MLA in validation studies. Discrimination, calibration, overall performance, and decision curve analysis were applied to evaluate the SORG-MLA's performance. The discrimination ability was measured with an area under the receiver operating characteristic curve. It ranges from 0.5 to 1.0, with 0.5 indicating the worst discrimination and 1.0 indicating perfect discrimination. An area under the curve of 0.7 is considered clinically acceptable discrimination. Calibration refers to the agreement between the predicted outcomes and actual outcomes. An ideal calibration model will yield predicted survival rates that are congruent with the observed survival rates. The Brier score measures the squared difference between the actual outcome and predicted probability, which captures calibration and discrimination ability simultaneously. A Brier score of 0 indicates perfect prediction, whereas a Brier score of 1 indicates the poorest prediction. 
A decision curve analysis was performed for the 6-week, 90-day, and 1-year prediction models to evaluate their net benefit across different threshold probabilities. Using the results from our analysis, we developed an internet-based application that facilitates real-time data imputation for clinical decision-making at the point of care. This tool allows healthcare professionals to efficiently and effectively address missing data, ensuring that patient care remains optimal at all times. RESULTS Generally, the SORG-MLA demonstrated good discriminatory ability, with areas under the curve greater than 0.7 in most cases, and good overall performance, with up to 25% improvement in Brier scores in the presence of one to three missing items. The only exceptions were albumin level and lymphocyte count, because the SORG-MLA's performance was reduced when these two items were missing, indicating that the SORG-MLA might be unreliable without these values. The model tended to underestimate the patient survival rate. As the number of missing items increased, the model's discriminatory ability was progressively impaired, and a marked underestimation of patient survival rates was observed. Specifically, when three items were missing, the number of actual survivors was up to 1.3 times greater than the number of expected survivors, while only 10% discrepancy was observed when only one item was missing. When either two or three items were omitted, the decision curves exhibited substantial overlap, indicating a lack of consistent disparities in performance. This finding suggests that the SORG-MLA consistently generates accurate predictions, regardless of the two or three items that are omitted. We developed an internet application ( https://sorg-spine-mets-missing-data-imputation.azurewebsites.net/ ) that allows the use of SORG-MLA with up to three missing items. 
CONCLUSION The SORG-MLA generally performed well in the presence of one to three missing items, except for serum albumin level and lymphocyte count (which are essential for adequate predictions, even using our modified version of the SORG-MLA). We recommend that future studies should develop prediction models that allow for their use when there are missing data, or provide a means to impute those missing data, because some data are not available at the time a clinical decision must be made. CLINICAL RELEVANCE The results suggested the algorithm could be helpful when a radiologic evaluation owing to a lengthy waiting period cannot be performed in time, especially in situations when an early operation could be beneficial. It could help orthopaedic surgeons to decide whether to intervene palliatively or extensively, even when the surgical indication is clear.
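The performance measures defined in the Methods above (area under the receiver operating characteristic curve, Brier score, and the net benefit used in decision curve analysis) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the outcomes and predicted probabilities are hypothetical, the function names are invented for this example, and none of it is the authors' code or the SORG-MLA itself.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between outcome (0 or 1) and predicted probability.
    0 is a perfect prediction; 1 is the poorest."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a randomly chosen positive case receives a higher
    predicted probability than a randomly chosen negative case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit at one threshold probability (0 < threshold < 1):
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical data: 1 = event occurred, with a model's predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print(f"Brier score: {brier_score(y_true, y_prob):.3f}")
print(f"AUC: {auc(y_true, y_prob):.3f}")
print(f"Net benefit at 0.5: {net_benefit(y_true, y_prob, 0.5):.3f}")
```

A decision curve plots `net_benefit` across a range of thresholds and compares it with the treat-all and treat-none strategies; the single threshold used here is only for illustration.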
Affiliation(s)
- Chi-Ching Huang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
- Kuang-Ping Peng
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Hsiang-Chieh Hsieh
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Olivier Q. Groot
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, the Netherlands
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA
- Hung-Kuan Yen
  - Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Cheng-Chen Tsai
  - Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
- Aditya V. Karhade
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA
- Yen-Po Lin
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Yin-Tien Kao
  - Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
- Jiun-Jen Yang
  - Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
- Shih-Hsiang Dai
  - Department of International Business, National Taiwan University, Taipei, Taiwan
- Chuan-Ching Huang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Chih-Wei Chen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Mao-Hsu Yen
  - Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan
- Fu-Ren Xiao
  - Division of Neurosurgery, Department of Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Wei-Hsin Lin
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Jorrit-Jan Verlaan
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, the Netherlands
- Joseph H. Schwab
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA
- Feng-Ming Hsu
  - Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital, Taipei, Taiwan
  - Graduate Institute of Oncology, National Taiwan University College of Medicine, Taipei, Taiwan
  - Department of Radiation Oncology, National Taiwan University Cancer Center, Taipei, Taiwan
- Tzehong Wong
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
- Rong-Sen Yang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Shu-Hua Yang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopedics, National Taiwan University College of Medicine, Taipei, Taiwan
- Ming-Hsiao Hu
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Orthopedics, National Taiwan University College of Medicine, Taipei, Taiwan
13
Lee KS, Jung SH, Kim DH, Chung SW, Yoon JP. Artificial intelligence- and computer-assisted navigation for shoulder surgery. J Orthop Surg (Hong Kong) 2024; 32:10225536241243166. [PMID: 38546214 DOI: 10.1177/10225536241243166] [Indexed: 08/28/2024]
Abstract
Background: Over the last few decades, shoulder surgery has advanced rapidly, with ongoing exploration and development of innovative technological approaches. In the coming years, technologies such as robot-assisted surgery, virtual reality, artificial intelligence, patient-specific instrumentation, and innovative preoperative and perioperative planning tools will continue to drive the medical field toward new frontiers, and shoulder surgery is expected to see significant breakthroughs as a result. Main body: Recent advancements and technological innovations in the field were comprehensively analyzed. We aimed to provide a detailed overview of the current landscape, emphasizing the roles of these technologies. Computer-assisted surgery utilizing robotic- or image-guided technologies is widely adopted in various orthopedic specialties. The most advanced components of computer-assisted surgery are navigation and robotic systems, with functions and applications that are continuously expanding. Surgical navigation requires a visual system that presents real-time positional data on surgical instruments or implants in relation to the target bone, displayed on a computer monitor. There are three primary categories of surgical planning that utilize navigation systems. The first category involves volumetric images, such as ultrasound echograms, computed tomography, and magnetic resonance images. The second category is based on intraoperative fluoroscopic images, and the third incorporates kinetic information about joints or morphometric data about the target bones acquired intraoperatively. Conclusion: The rapid integration of artificial intelligence and deep learning into the medical domain has a significant and transformative influence. Numerous studies utilizing deep learning-based diagnostics in orthopedics have reported remarkable achievements and performance.
Affiliation(s)
- Kang-San Lee
  - Department of Orthopaedic Surgery, School of Medicine, Kyungpook National University, Daegu, Korea
- Seung Ho Jung
  - Department of Orthopaedic Surgery, School of Medicine, Kyungpook National University, Daegu, Korea
- Dong-Hyun Kim
  - Department of Orthopaedic Surgery, School of Medicine, Kyungpook National University, Daegu, Korea
- Seok Won Chung
  - Department of Orthopaedic Surgery, School of Medicine, Konkuk University Medical Center, Seoul, Korea
- Jong Pil Yoon
  - Department of Orthopaedic Surgery, School of Medicine, Kyungpook National University, Daegu, Korea
14
Chiasakul T, Lam BD, McNichol M, Robertson W, Rosovsky RP, Lake L, Vlachos IS, Adamski A, Reyes N, Abe K, Zwicker JI, Patell R. Artificial intelligence in the prediction of venous thromboembolism: A systematic review and pooled analysis. Eur J Haematol 2023; 111:951-962. [PMID: 37794526 PMCID: PMC10900245 DOI: 10.1111/ejh.14110] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2023] [Revised: 09/16/2023] [Accepted: 09/18/2023] [Indexed: 10/06/2023]
Abstract
BACKGROUND Accurate diagnostic and prognostic predictions of venous thromboembolism (VTE) are crucial for VTE management. Artificial intelligence (AI) enables autonomous identification of the most predictive patterns from large complex data. Although evidence regarding its performance in VTE prediction is emerging, a comprehensive analysis of performance is lacking. AIMS To systematically review the performance of AI in the diagnosis and prediction of VTE and compare it to clinical risk assessment models (RAMs) or logistic regression models. METHODS A systematic literature search was performed using PubMed, MEDLINE, EMBASE, and Web of Science from inception to April 20, 2021. Search terms included "artificial intelligence" and "venous thromboembolism." Eligibility criteria were original studies evaluating AI in the prediction of VTE in adults and reporting one of the following outcomes: sensitivity, specificity, positive predictive value, negative predictive value, or area under the receiver operating characteristic curve (AUC). Risk of bias was assessed using the PROBAST tool. An unpaired t-test was performed to compare the mean AUC from AI versus conventional methods (RAMs or logistic regression models). RESULTS A total of 20 studies were included. The number of participants ranged from 31 to 111,888. The AI-based models included artificial neural network (six studies), support vector machines (four studies), Bayesian methods (one study), super learner ensemble (one study), genetic programming (one study), unspecified machine learning models (two studies), and multiple machine learning models (five studies). Twelve studies (60%) had both training and testing cohorts. Among 14 studies (70%) where AUCs were reported, the mean AUCs for AI versus conventional methods were 0.79 (95% CI: 0.74-0.85) versus 0.61 (95% CI: 0.54-0.68), respectively (p < .001).
However, the good to excellent discriminative performance of AI methods is unlikely to be replicated when used in clinical practice, because most studies had high risk of bias due to missing data handling and outcome determination. CONCLUSION The use of AI appears to improve the accuracy of diagnostic and prognostic prediction of VTE over conventional risk models; however, there was a high risk of bias observed across studies. Future studies should focus on transparent reporting, external validation, and clinical application of these models.
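The comparison of mean AUCs described in the Methods above can be illustrated with a minimal sketch of an unpaired (Student's, pooled-variance) t statistic. The abstract does not say which t-test variant or software was used, so the pooled-variance form is an assumption, and the AUC values below are hypothetical placeholders, not data from the review.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Return (t, df) for Student's unpaired t-test with pooled variance.

    Assumes two independent samples with roughly equal variances.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical per-study AUCs for illustration only (NOT from the review).
ai_aucs = [0.70, 0.75, 0.80, 0.85, 0.90]
conventional_aucs = [0.50, 0.55, 0.60, 0.65, 0.70]

t_stat, df = unpaired_t(ai_aucs, conventional_aucs)
print(f"t = {t_stat:.2f}, df = {df}")
```

The p value would then be read from a t distribution with `df` degrees of freedom; a statistics package would normally be used for that step.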
Affiliation(s)
- Thita Chiasakul
  - Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
  - Division of Hemostasis and Thrombosis, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
  - Division of Hematology, Faculty of Medicine, Department of Medicine, Center of Excellence in Translational Hematology, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Bangkok, Thailand
- Barbara D Lam
  - Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
  - Division of Hemostasis and Thrombosis, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Megan McNichol
  - Division of Knowledge Services, Department of Information Services (M.M.), Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
- William Robertson
  - National Blood Clot Alliance, Philadelphia, Pennsylvania, USA
  - Department of Emergency Healthcare, College of Health Professions, Weber State University, Ogden, Utah, USA
- Rachel P Rosovsky
  - Division of Hematology/Oncology, Department of Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Leslie Lake
  - National Blood Clot Alliance, Philadelphia, Pennsylvania, USA
- Ioannis S Vlachos
  - Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Alys Adamski
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Nimia Reyes
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Karon Abe
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Jeffrey I Zwicker
  - Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
  - Division of Hemostasis and Thrombosis, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Medicine, Hematology Service, Memorial Sloan Kettering Cancer Center, New York City, New York, USA
- Rushad Patell
  - Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
  - Division of Hemostasis and Thrombosis, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
15
Regazzoni P, Jupiter JB, Liu WC, Fernández dell’Oca AA. Evidence-Based Surgery: What Can Intra-Operative Images Contribute? J Clin Med 2023; 12:6809. [PMID: 37959274 PMCID: PMC10649165 DOI: 10.3390/jcm12216809] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/22/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/15/2023]
Abstract
Evidence-based medicine integrates results from randomized controlled trials (RCTs) and meta-analyses, combining the best external evidence with individual clinical expertise and patients' preferences. However, RCTs of surgery differ from those of medicine in that surgical performance is often assumed to be consistent. Yet, evaluating whether each surgery is performed to the same standard is quite challenging. As a primary issue, the novelty of this review is to emphasize-with a focus on orthopedic trauma-the advantage of having complete intra-operative image documentation, allowing the direct evaluation of the quality of the intra-operative technical performance. The absence of complete intra-operative image documentation leads to the inhomogeneity of case series, yielding inconsistent results due to the impossibility of a secondary analysis. Thus, comparisons and the reproduction of studies are difficult. Access to complete intra-operative image data in surgical RCTs allows not only secondary analysis but also comparisons with similar cases. Such complete data can be included in electronic papers. Offering these data to peers-in an accessible link-when presenting papers facilitates the selection process and improves publications for readers. Additionally, having access to the full set of image data for all presented cases serves as a rich resource for learning. It enables the reader to sift through the information and pinpoint the details that are most relevant to their individual needs, allowing them to potentially incorporate this knowledge into daily practice. A broad use of the concept of complete intra-operative image documentation is pivotal for bridging the gap between clinical research findings and real-world applications. Enhancing the quality of surgical RCTs would facilitate the equalization of evidence acquisition in both internal medicine and surgery. 
A joint effort by surgeons, scientific societies, publishers, and healthcare authorities is needed to support these ideas, meet the economic requirements, and overcome the mental obstacles to their realization.
Affiliation(s)
- Pietro Regazzoni
  - Department of Trauma Surgery, University Hospital Basel, 4031 Basel, Switzerland
- Jesse B. Jupiter
  - Hand and Arm Center, Department of Orthopedics, Massachusetts General Hospital, Boston, MA 02114, USA
- Wen-Chih Liu
  - Hand and Arm Center, Department of Orthopedics, Massachusetts General Hospital, Boston, MA 02114, USA
  - Department of Orthopedics, Kaohsiung Medical University Hospital, Kaohsiung 80756, Taiwan
  - School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 80756, Taiwan
- Alberto A. Fernández dell’Oca
  - Department of Traumatology, Hospital Britanico, Montevideo 11600, Uruguay
  - Residency Program in Traumatology and Orthopedics, University of Montevideo, Montevideo 11600, Uruguay
16
Karlin EA, Lin CC, Meftah M, Slover JD, Schwarzkopf R. The Impact of Machine Learning on Total Joint Arthroplasty Patient Outcomes: A Systemic Review. J Arthroplasty 2023; 38:2085-2095. [PMID: 36441039 DOI: 10.1016/j.arth.2022.10.039] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/28/2022] [Revised: 10/19/2022] [Accepted: 10/24/2022] [Indexed: 11/27/2022]
Abstract
BACKGROUND Supervised machine learning techniques have been increasingly applied to predict patient outcomes after hip and knee arthroplasty procedures. The purpose of this study was to systematically review the applications of supervised machine learning techniques to predict patient outcomes after primary total hip and knee arthroplasty. METHODS A comprehensive literature search using the electronic databases MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials, and Cochrane Database of Systematic Reviews was conducted in July of 2021. The inclusion criteria were studies that utilized supervised machine learning techniques to predict patient outcomes after primary total hip or knee arthroplasty. RESULTS Search criteria yielded n = 30 relevant studies. Topics of study included patient complications (n = 6), readmissions (n = 1), revision (n = 2), patient-reported outcome measures (n = 4), patient satisfaction (n = 4), inpatient status and length of stay (LOS) (n = 9), opioid usage (n = 3), and patient function (n = 1). Studies involved TKA (n = 12), THA (n = 11), or a combination (n = 7). Less than 35% of predictive outcomes had an area under the receiver operating characteristic curve (AUC) in the excellent or outstanding range. Additionally, only 9 of the studies found improvement over logistic regression, and only 9 studies were externally validated. CONCLUSION Supervised machine learning algorithms are powerful tools that have been increasingly applied to predict patient outcomes after total hip and knee arthroplasty. However, these algorithms should be evaluated in the context of prognostic accuracy, comparison to traditional statistical techniques for outcome prediction, and application to populations outside the training set. While machine learning algorithms have been received with considerable interest, they should be critically assessed and validated prior to clinical adoption.
Affiliation(s)
- Elan A Karlin
  - MedStar Georgetown University Hospital, Washington, District of Columbia
- Charles C Lin
  - Department of Orthopedic Surgery, NYU Langone Health, New York, New York
- Morteza Meftah
  - Department of Orthopedic Surgery, NYU Langone Health, New York, New York
- James D Slover
  - Department of Orthopedic Surgery, NYU Langone Health, New York, New York
- Ran Schwarzkopf
  - Department of Orthopedic Surgery, NYU Langone Health, New York, New York
17
Karnuta JM, Murphy MP, Luu BC, Ryan MJ, Haeberle HS, Brown NM, Iorio R, Chen AF, Ramkumar PN. Artificial Intelligence for Automated Implant Identification in Total Hip Arthroplasty: A Multicenter External Validation Study Exceeding Two Million Plain Radiographs. J Arthroplasty 2023; 38:1998-2003.e1. [PMID: 35271974 DOI: 10.1016/j.arth.2022.03.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/08/2022] [Revised: 02/23/2022] [Accepted: 03/01/2022] [Indexed: 02/02/2023]
Abstract
BACKGROUND The surgical management of complications after total hip arthroplasty (THA) necessitates accurate identification of the femoral implant manufacturer and model. Automated image processing using deep learning has been previously developed and internally validated; however, external validation is necessary prior to responsible application of artificial intelligence (AI)-based technologies. METHODS We trained, validated, and externally tested a deep learning system to classify femoral-sided THA implants as one of the 8 models from 2 manufacturers derived from 2,954 original, deidentified, retrospectively collected anteroposterior plain radiographs across 3 academic referral centers and 13 surgeons. From these radiographs, 2,117 were used for training, 249 for validation, and 588 for external testing. Augmentation was applied to the training set (n = 2,117,000) to increase model robustness. Performance was evaluated by area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy. Implant identification processing speed was calculated. RESULTS The training and testing sets were drawn from statistically different populations of implants (P < .001). After 1,000 training epochs by the deep learning system, the system discriminated 8 implant models with a mean area under the receiver operating characteristic curve of 0.991, accuracy of 97.9%, sensitivity of 88.6%, and specificity of 98.9% in the external testing dataset of 588 anteroposterior radiographs. The software classified implants at a mean speed of 0.02 seconds per image. CONCLUSION An AI-based software demonstrated excellent internal and external validation. Although continued surveillance is necessary with implant library expansion, this software represents responsible and meaningful clinical application of AI with immediate potential to globally scale and assist in preoperative planning prior to revision THA.
Affiliation(s)
- Jaret M Karnuta
  - Orthopaedic Machine Learning Laboratory, Orthopaedic Intelligence LLC, Cleveland Heights, OH; Department of Orthopaedic Surgery, University of Pennsylvania, Philadelphia, PA
- Michael P Murphy
  - Department of Orthopaedic Surgery & Rehabilitation, Loyola University Medical Center, Chicago, IL
- Bryan C Luu
  - Orthopaedic Machine Learning Laboratory, Orthopaedic Intelligence LLC, Cleveland Heights, OH; Department of Orthopaedic Surgery, Baylor College of Medicine, Houston, TX
- Michael J Ryan
  - Orthopaedic Machine Learning Laboratory, Orthopaedic Intelligence LLC, Cleveland Heights, OH
- Heather S Haeberle
  - Orthopaedic Machine Learning Laboratory, Orthopaedic Intelligence LLC, Cleveland Heights, OH; Sports Medicine Institute, Hospital for Special Surgery, New York, NY
- Nicholas M Brown
  - Department of Orthopaedic Surgery & Rehabilitation, Loyola University Medical Center, Chicago, IL
- Richard Iorio
  - Department of Orthopaedic Surgery, Brigham & Women's Hospital, Boston, MA
- Antonia F Chen
  - Department of Orthopaedic Surgery, Brigham & Women's Hospital, Boston, MA
- Prem N Ramkumar
  - Orthopaedic Machine Learning Laboratory, Orthopaedic Intelligence LLC, Cleveland Heights, OH; Sports Medicine Institute, Hospital for Special Surgery, New York, NY; Department of Orthopaedic Surgery, Brigham & Women's Hospital, Boston, MA
18
Chen T, Liu C, Zhang Z, Liang T, Zhu J, Zhou C, Wu S, Yao Y, Huang C, Zhang B, Feng S, Wang Z, Huang S, Sun X, Chen L, Zhan X. Using Machine Learning to Predict Surgical Site Infection After Lumbar Spine Surgery. Infect Drug Resist 2023; 16:5197-5207. [PMID: 37581167 PMCID: PMC10423613 DOI: 10.2147/idr.s417431] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/30/2023] [Accepted: 07/26/2023] [Indexed: 08/16/2023]
Abstract
Objective The objective of this study was to utilize machine learning techniques to analyze perioperative factors and identify blood glucose levels that can predict the occurrence of surgical site infection following posterior lumbar spinal surgery. Methods A total of 4019 patients receiving lumbar internal fixation surgery from an institute were enrolled between June 2012 and February 2021. First, the filtered data were randomized into the test and verification groups. Second, in the test group, specific variables were screened using logistic regression analysis, Lasso regression analysis, support vector machine, and random forest. Specific variables obtained using the four methods were intersected, and a dynamic model was constructed. ROC and calibration curves were constructed to assess model performance. Finally, internal model performance was verified in the verification group using ROC and calibration curves. Results The data from 4019 patients were collected. In total, 1327 eligible cases were selected. By combining logistic regression analysis with three machine learning algorithms, this study identified four predictors associated with SSI, namely Modic changes, sebum thickness, hemoglobin, and glucose. Using this information, a prediction model was developed and visually represented. Then, we constructed ROC and calibration curves using the test group; the area under the ROC curve was 0.988. Further, calibration curve analysis revealed favorable consistency of nomogram-predicted values compared with real measurements. The C-index of our model was 0.986 (95% CI 0.981-0.994). Finally, we used the validation group to validate the model internally; the AUC was 0.987. Calibration curve analysis revealed favorable consistency of nomogram-predicted values compared with real measurements. The C-index was 0.982 (95% CI 0.974-0.999). 
Conclusion Logistic regression analysis and machine learning were employed to select four risk factors: Modic changes, sebum thickness, hemoglobin, and glucose. Then, a dynamic prediction model was constructed to help clinicians simplify the monitoring and prevention of SSI.
Affiliation(s)
- Tianyou Chen
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Chong Liu
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Zide Zhang
  - Spine Ward, Liuzhou People’s Hospital, Liuzhou, People’s Republic of China
- Tuo Liang
  - Spine Ward, Liuzhou People’s Hospital, Liuzhou, People’s Republic of China
- Jichong Zhu
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Chenxing Zhou
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Shaofeng Wu
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Yuanlin Yao
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Chengqian Huang
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Bin Zhang
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Sitan Feng
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Zequn Wang
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Shengsheng Huang
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Xuhua Sun
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Liyi Chen
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
- Xinli Zhan
  - Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
19
Dubin JA, Bains SS, Chen Z, Hameed D, Nace J, Mont MA, Delanois RE. Using a Google Web Search Analysis to Assess the Utility of ChatGPT in Total Joint Arthroplasty. J Arthroplasty 2023; 38:1195-1202. [PMID: 37040823 DOI: 10.1016/j.arth.2023.04.007] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Received: 02/07/2023] [Revised: 03/22/2023] [Accepted: 04/03/2023] [Indexed: 04/13/2023]
Abstract
BACKGROUND Rapid technological advancements have laid the foundations for the use of artificial intelligence in medicine. The promise of machine learning (ML) lies in its potential ability to improve treatment decision making, predict adverse outcomes, and streamline the management of perioperative healthcare. In an increasingly consumer-focused health care model, unprecedented access to information may extend to patients using ChatGPT to gain insight into medical questions. The main objective of our study was to replicate a patient's internet search in order to assess the appropriateness of ChatGPT, a novel machine learning tool released in 2022 that provides dialogue responses to queries, in comparison to Google Web Search, the most widely used search engine in the United States today, as a resource for patients for online health information. For the 2 different search engines, we compared i) the most frequently asked questions (FAQs) associated with total knee arthroplasty (TKA) and total hip arthroplasty (THA) by question type and topic; ii) the answers to the most frequently asked questions; as well as iii) the FAQs yielding a numerical response. METHODS A Google web search was performed with the following search terms: "total knee replacement" and "total hip replacement." These terms were individually entered and the first 10 FAQs were extracted along with the source of the associated website for each question. The following statements were inputted into ChatGPT: 1) "Perform a google search with the search term 'total knee replacement' and record the 10 most FAQs related to the search term" as well as 2) "Perform a google search with the search term 'total hip replacement' and record the 10 most FAQs related to the search term." A Google web search was repeated with the same search terms to identify the first 10 FAQs that included a numerical response for both "total knee replacement" and "total hip replacement."
These questions were then inputted into ChatGPT and the questions and answers were recorded. RESULTS Across all search terms, 5 of 20 (25%) questions were similar between the Google web search and ChatGPT. For the Google web search, 13 of 20 (65%) answers were provided by commercial websites. For ChatGPT, 15 of 20 (75%) questions were answered by government websites, the most frequent being PubMed. Among the numerical questions, 11 of 20 (55%) FAQs yielded different responses between the Google web search and ChatGPT. CONCLUSION A comparison of the FAQs from a Google web search with the attempted replication by ChatGPT revealed heterogeneous questions and responses for open and discrete questions. ChatGPT remains a trending potential resource for patients, but it needs further corroboration until its ability to provide credible information is verified and concordant with the goals of the physician and the patient alike.
Affiliation(s)
- Jeremy A Dubin
  - LifeBridge Health, Sinai Hospital of Baltimore, Rubin Institute for Advanced Orthopedics, Baltimore, Maryland
- Sandeep S Bains
  - LifeBridge Health, Sinai Hospital of Baltimore, Rubin Institute for Advanced Orthopedics, Baltimore, Maryland
- Zhongming Chen
  - LifeBridge Health, Sinai Hospital of Baltimore, Rubin Institute for Advanced Orthopedics, Baltimore, Maryland
- Daniel Hameed
  - LifeBridge Health, Sinai Hospital of Baltimore, Rubin Institute for Advanced Orthopedics, Baltimore, Maryland
- James Nace
  - LifeBridge Health, Sinai Hospital of Baltimore, Rubin Institute for Advanced Orthopedics, Baltimore, Maryland
- Michael A Mont
  - LifeBridge Health, Sinai Hospital of Baltimore, Rubin Institute for Advanced Orthopedics, Baltimore, Maryland
- Ronald E Delanois
  - LifeBridge Health, Sinai Hospital of Baltimore, Rubin Institute for Advanced Orthopedics, Baltimore, Maryland
20
Langenhuijsen LFS, Janse RJ, Venema E, Kent DM, van Diepen M, Dekker FW, Steyerberg EW, de Jong Y. Systematic metareview of prediction studies demonstrates stable trends in bias and low PROBAST inter-rater agreement. J Clin Epidemiol 2023; 159:159-173. [PMID: 37142166 DOI: 10.1016/j.jclinepi.2023.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/30/2023] [Accepted: 04/25/2023] [Indexed: 05/06/2023]
Abstract
OBJECTIVES To (1) explore trends of risk of bias (ROB) in prediction research over time following key methodological publications, using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and (2) assess the inter-rater agreement of the PROBAST. STUDY DESIGN AND SETTING PubMed and Web of Science were searched for reviews with extractable PROBAST scores on domain and signaling question (SQ) level. ROB trends were visually correlated with yearly citations of key publications. Inter-rater agreement was assessed using Cohen's Kappa. RESULTS One hundred and thirty nine systematic reviews were included, of which 85 reviews (containing 2,477 single studies) on domain level and 54 reviews (containing 2,458 single studies) on SQ level. High ROB was prevalent, especially in the Analysis domain, and overall trends of ROB remained relatively stable over time. The inter-rater agreement was low, both on domain (Kappa 0.04-0.26) and SQ level (Kappa -0.14 to 0.49). CONCLUSION Prediction model studies are at high ROB and time trends in ROB as assessed with the PROBAST remain relatively stable. These results might be explained by key publications having no influence on ROB or recency of key publications. Moreover, the trend may suffer from the low inter-rater agreement and ceiling effect of the PROBAST. The inter-rater agreement could potentially be improved by altering the PROBAST or providing training on how to apply the PROBAST.
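The inter-rater agreement statistic named in this abstract, Cohen's Kappa, can be sketched as follows. This is a minimal illustration of the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e), and not the authors' analysis code; the function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical risk-of-bias judgements for six studies (illustration only).
reviewer_1 = ["high", "high", "high", "low", "low", "high"]
reviewer_2 = ["high", "low", "high", "low", "high", "high"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.25
```

In this toy example the raters agree on 4 of 6 items (about 67%), yet kappa is only 0.25 because much of that agreement is expected by chance given how often both raters choose "high".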
Affiliation(s)
- Roemer J Janse
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Esmee Venema
  - Department of Public Health, Erasmus MC University Medical Center, Rotterdam, The Netherlands; Department of Emergency Medicine, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- David M Kent
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Merel van Diepen
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Friedo W Dekker
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Ewout W Steyerberg
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Ype de Jong
  - Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands
21
Lans A, Kanbier LN, Bernstein DN, Groot OQ, Ogink PT, Tobert DG, Verlaan JJ, Schwab JH. Social determinants of health in prognostic machine learning models for orthopaedic outcomes: A systematic review. J Eval Clin Pract 2023; 29:292-299. [PMID: 36099267 DOI: 10.1111/jep.13765] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/26/2022] [Revised: 08/22/2022] [Accepted: 08/27/2022] [Indexed: 11/26/2022]
Abstract
RATIONALE Social determinants of health (SDOH) are being considered more frequently when providing orthopaedic care due to their impact on treatment outcomes. Simultaneously, prognostic machine learning (ML) models that facilitate clinical decision making have become popular tools in the field of orthopaedic surgery. When ML-driven tools are developed, it is important that the perpetuation of potential disparities is minimized. One approach is to consider SDOH during model development. To date, it remains unclear whether and how existing prognostic ML models for orthopaedic outcomes consider SDOH variables. OBJECTIVE To investigate whether prognostic ML models for orthopaedic surgery outcomes account for SDOH, and to what extent SDOH variables are included in the final models. METHODS A systematic search was conducted in PubMed, Embase and Cochrane for studies published up to 17 November 2020. Two reviewers independently extracted SDOH features using the PROGRESS+ framework (place of residence, race/ethnicity, occupation, gender/sex, religion, education, social capital, socioeconomic status, 'Plus+' age, disability, and sexual orientation). RESULTS The search yielded 7138 studies, of which 59 met the inclusion criteria. Across all studies, 96% (57/59) considered at least one PROGRESS+ factor during development. The most common factors were age (95%; 56/59) and gender/sex (96%; 57/59). Differential effect analyses, such as subgroup analysis, covariate adjustment, and baseline comparison, were rarely reported (10%; 6/59). The majority of models included age (92%; 54/59) and gender/sex (69%; 41/59) as final input variables. However, factors such as insurance status (7%; 4/59), marital status (7%; 4/59) and income (3%; 2/59) were seldom included. CONCLUSION The current level of reporting and consideration of SDOH during the development of prognostic ML models for orthopaedic outcomes is limited.
Healthcare providers should be critical of the models they consider using and knowledgeable regarding the quality of model development, such as adherence to recognized methodological standards. Future efforts should aim to avoid bias and disparities when developing ML-driven applications for orthopaedics.
Affiliation(s)
- Amanda Lans
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Orthopaedic Surgery, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Laura N Kanbier
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- David N Bernstein
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Olivier Q Groot
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Orthopaedic Surgery, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Paul T Ogink
  - Department of Orthopaedic Surgery, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Daniel G Tobert
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Jorrit-Jan Verlaan
  - Department of Orthopaedic Surgery, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Joseph H Schwab
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
|
22
|
Bhashyam AR, Challa ST, Thomas H, Rodriguez EK, Weaver MJ. Clinic follow-up of orthopaedic trauma patients during and after the post-surgical global period: a retrospective cohort study. BMC Musculoskelet Disord 2023; 24:120. [PMID: 36782143 PMCID: PMC9926540 DOI: 10.1186/s12891-023-06218-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2022] [Accepted: 02/02/2023] [Indexed: 02/15/2023] Open
Abstract
BACKGROUND Insurance status is important as medical expenses may decrease the likelihood of follow-up after musculoskeletal trauma, especially for low-income populations. However, it is unknown what insurance factors are associated with follow-up care. In this study, we assessed the association between insurance plan benefits, the end of the post-surgical global period, and follow-up after musculoskeletal injury. METHODS This is a retrospective cohort study of 394 patients with isolated extremity fractures who were treated at three level-I trauma centers over four months in 2018. Paired t-tests were utilized to assess the likelihood of follow-up in relation to the 90-day post-surgical global period. Regression analysis was used to assess factors associated with the likelihood of follow-up. Supervised machine learning algorithms were used to develop predictive models of follow-up after the post-surgical global period. RESULTS Our final analysis included 328 patients. Likelihood of follow-up did not significantly change while within the post-surgical global period. When comparing follow-up within and outside of the post-surgical global period, there was a 20.1% decrease in follow-up between the 6-weeks and 6-month time points (68.3% versus 48.2%, respectively; p < 0.0001). Medicaid insurance compared to Medicare (OR 0.27, 95% confidence interval (CI) = [0.09, 0.84], p = 0.02) was a predictor of decreased likelihood of follow-up at 6-months post-operatively. CONCLUSIONS Our study demonstrates a statistically significant decrease in follow-up for orthopaedic trauma patients after the post-surgical global period, particularly for patients with Medicaid or Private insurance.
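As a rough consistency check on the odds ratio reported above, the sketch below back-calculates a standard error and a two-sided p-value from the published interval. It assumes a Wald-type 95% interval that is symmetric on the log-odds scale; the abstract does not say how the interval was computed, and the function name `p_from_or_ci` is purely illustrative.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided p-value from an odds ratio and its 95% CI.

    Assumes a Wald interval symmetric on the log scale (an assumption,
    not something stated in the abstract).
    """
    # Width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Values taken from the abstract: OR 0.27, 95% CI [0.09, 0.84].
p_value = p_from_or_ci(0.27, 0.09, 0.84)
print(f"approximate two-sided p = {p_value:.3f}")
```

Under that assumption the back-calculated value lands close to the reported p = 0.02, which suggests the interval and the p-value are mutually consistent.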
Affiliation(s)
- Abhiram R. Bhashyam
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Orthopaedic Trauma Initiative, Harvard Medical School, 55 Fruit St, Boston, MA 02114 USA
- Sravya T. Challa
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA USA
- Hannah Thomas
  - Harvard Medical School, Boston, MA USA
- Edward K. Rodriguez
  - Department of Orthopaedic Surgery, Beth Israel Deaconess Medical Center, Harvard Orthopaedic Trauma Initiative, Harvard Medical School, Boston, MA USA
- Michael J. Weaver
  - Department of Orthopaedic Surgery, Brigham and Women’s Hospital, Harvard Orthopaedic Trauma Initiative, Harvard Medical School, Boston, MA USA
|
23
|
Lans A, Bales JR, Fourman MS, Borkhetaria PP, Verlaan JJ, Schwab JH. Health Literacy in Orthopedic Surgery: A Systematic Review. HSS J 2023; 19:120-127. [PMID: 36776507 PMCID: PMC9837407 DOI: 10.1177/15563316221110536] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/06/2022] [Accepted: 05/20/2022] [Indexed: 02/14/2023]
Abstract
Background: Limited health literacy has been associated with adverse health outcomes. Undergoing orthopedic surgery often requires patients to make complex decisions and adhere to complicated instructions, suggesting that health literacy skills might have a profound impact on orthopedic surgery outcomes. Purpose: We sought to review the literature for studies investigating the level of health literacy in patients undergoing orthopedic surgery and also to assess how those studies report factors affecting health equity. Methods: We conducted a systematic search of PubMed, Embase, and Cochrane Library for all health literacy studies published in the orthopedic surgery literature up to February 8, 2022. Search terms included synonyms for health literacy and for all orthopedic surgery subspecialties. Two reviewers independently extracted study data in addition to indicators of equity reporting using the PROGRESS+ checklist (Place of Residence, Race/Ethnicity, Occupation, Gender/sex, Religion, Education, Social capital, Socioeconomic status, plus age, disability, and sexual orientation). Results: The search resulted in 616 studies; 9 studies remained after exclusion criteria were applied. Most studies were of arthroplasty (4/9; 44%) or trauma (3/9; 33%) patients. Validated health literacy assessments were used in 4 of the included studies, and only 3 studies reported the rate of limited health literacy in the patients studied, which ranged between 34% and 38.5%. At least one PROGRESS+ item was reported in 88% (8/9) of the studies. Conclusions: We found a paucity of appropriately designed studies that used validated measures of health literacy in the field of orthopedic surgery. The potential impact of health literacy on orthopedic patients and their outcomes has yet to be elucidated. Thoughtful, high-quality trials across diverse demographics and geographies are warranted.
Affiliation(s)
- Amanda Lans
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Orthopaedic Surgery, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- John R. Bales
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Mitchell S. Fourman
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Pranati P. Borkhetaria
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Jorrit-Jan Verlaan
  - Department of Orthopaedic Surgery, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Joseph H. Schwab
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
|
24
|
Yen HK, Hu MH, Zijlstra H, Groot OQ, Hsieh HC, Yang JJ, Karhade AV, Chen PC, Chen YH, Huang PH, Chen YH, Xiao FR, Verlaan JJ, Schwab JH, Yang RS, Yang SH, Lin WH, Hsu FM. Prognostic significance of lab data and performance comparison by validating survival prediction models for patients with spinal metastases after radiotherapy. Radiother Oncol 2022; 175:159-166. [PMID: 36067909 DOI: 10.1016/j.radonc.2022.08.029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/23/2022] [Revised: 07/14/2022] [Accepted: 08/28/2022] [Indexed: 12/17/2022]
Abstract
BACKGROUND AND PURPOSE Well-performing survival prediction models (SPMs) help patients and healthcare professionals to choose treatment aligning with prognosis. This retrospective study aims to investigate the prognostic impacts of laboratory data and to compare the performances of Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy (METSSS) model, New England Spinal Metastasis Score (NESMS), and Skeletal Oncology Research Group machine learning algorithm (SORG-MLA) for spinal metastases (SM). MATERIALS AND METHODS From 2010 to 2018, patients who received radiotherapy (RT) for SM at a tertiary center were enrolled and the data were retrospectively collected. Multivariate logistic and Cox-proportional-hazard regression analyses were used to assess the association between laboratory values and survival. The area under receiver-operating characteristics curve (AUROC), calibration analysis, Brier score, and decision curve analysis were used to evaluate the performance of SPMs. RESULTS A total of 2786 patients were included for analysis. The 90-day and 1-year survival rates after RT were 70.4% and 35.7%, respectively. Higher albumin, hemoglobin, or lymphocyte count were associated with better survival, while higher alkaline phosphatase, white blood cell count, neutrophil count, neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, or international normalized ratio were associated with poor prognosis. SORG-MLA has the best discrimination (AUROC 90-day, 0.78; 1-year 0.76), best calibrations, and the lowest Brier score (90-day 0.16; 1-year 0.18). The decision curve of SORG-MLA is above the other two competing models with threshold probabilities from 0.1 to 0.8. CONCLUSION Laboratory data are of prognostic significance in survival prediction after RT for SM. Machine learning-based model SORG-MLA outperforms statistical regression-based model METSSS model and NESMS in survival predictions.
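The evaluation measures named in this abstract (AUROC, Brier score, and the net benefit plotted in a decision curve) can be sketched in a few lines. The following is a minimal pure-Python illustration of their standard definitions, not the study's code; the function names and the four-patient toy data are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUROC).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability; a decision curve plots this
    quantity across a range of thresholds."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data: outcomes and predicted probabilities for four patients.
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.7]
print(brier_score(outcomes, predicted))
print(auroc(outcomes, predicted))
print(net_benefit(outcomes, predicted, 0.5))
```

A lower Brier score and a higher AUROC indicate better performance. A model whose net-benefit curve lies above its competitors over a threshold range, as reported here for SORG-MLA between 0.1 and 0.8, would lead to better decisions at those thresholds.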
Affiliation(s)
- Hung-Kuan Yen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan; Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsinchu, Taiwan; Department of Medical Education, National Taiwan University Hospital, Hsin-Chu Branch, Hsinchu, Taiwan
- Ming-Hsiao Hu
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Hester Zijlstra
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, Netherlands; Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, United States
- Olivier Q Groot
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, Netherlands; Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, United States
- Hsiang-Chieh Hsieh
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsinchu, Taiwan
- Jiun-Jen Yang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Aditya V Karhade
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, United States
- Po-Chao Chen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Yu-Han Chen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Po-Hao Huang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Yu-Hung Chen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Fu-Ren Xiao
  - Division of Neurosurgery, Department of Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Jorrit-Jan Verlaan
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, Netherlands
- Joseph H Schwab
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, United States
- Rong-Sen Yang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Shu-Hua Yang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Wei-Hsin Lin
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan.
- Feng-Ming Hsu
  - Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital, Taipei, Taiwan; Graduate Institute of Oncology, National Taiwan University College of Medicine, Taipei, Taiwan; Department of Radiation Oncology, National Taiwan University Cancer Center, Taipei, Taiwan.
|
25
|
Pan Y, Zhang Q, Zhang Y, Ge X, Gao X, Yang S, Xu J. Lane-change intention prediction using eye-tracking technology: A systematic review. Applied Ergonomics 2022; 103:103775. [PMID: 35500523 DOI: 10.1016/j.apergo.2022.103775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 03/16/2022] [Accepted: 04/12/2022] [Indexed: 06/14/2023]
Abstract
The aim of this study is to identify the best practices and future research directions for driver lane-change intention (DLCI) prediction using eye-tracking technologies based on a systematic literature review. We searched five academic literature databases and then conducted an in-depth review, structured coding, and analysis of 40 relevant articles. The literature on DLCI prediction is summarized in terms of input features, feature extraction and prediction time windows, labeling methods, and machine learning algorithms. The results show that eye tracking data features along with other data sources can be useful inputs for the prediction of DLCI. Major challenges in this line of research include determining the optimal time window for feature extraction and developing and evaluating the appropriate machine learning algorithm. Suggestions for future research and practice for DLCI prediction in intelligent vehicles are discussed.
Affiliation(s)
- Yunxian Pan
  - Center for Psychological Sciences, Zhejiang University, Hangzhou, Zhejiang Province, PR China
- Qinyu Zhang
  - Center for Psychological Sciences, Zhejiang University, Hangzhou, Zhejiang Province, PR China
- Yifan Zhang
  - Center for Psychological Sciences, Zhejiang University, Hangzhou, Zhejiang Province, PR China
- Xianliang Ge
  - Center for Psychological Sciences, Zhejiang University, Hangzhou, Zhejiang Province, PR China
- Xiaoqing Gao
  - Center for Psychological Sciences, Zhejiang University, Hangzhou, Zhejiang Province, PR China
- Jie Xu
  - Center for Psychological Sciences, Zhejiang University, Hangzhou, Zhejiang Province, PR China.
|
26
|
Smith H, Sweeting M, Morris T, Crowther MJ. A scoping methodological review of simulation studies comparing statistical and machine learning approaches to risk prediction for time-to-event data. Diagn Progn Res 2022; 6:10. [PMID: 35650647 PMCID: PMC9161606 DOI: 10.1186/s41512-022-00124-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/01/2021] [Accepted: 03/01/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND There is substantial interest in the adaptation and application of so-called machine learning approaches to prognostic modelling of censored time-to-event data. These methods must be compared and evaluated against existing methods in a variety of scenarios to determine their predictive performance. A scoping review of how machine learning methods have been compared to traditional survival models is important to identify the comparisons that have been made and issues where they are lacking, biased towards one approach or misleading. METHODS We conducted a scoping review of research articles published between 1 January 2000 and 2 December 2020 using PubMed. Eligible articles were those that used simulation studies to compare statistical and machine learning methods for risk prediction with a time-to-event outcome in a medical/healthcare setting. We focus on data-generating mechanisms (DGMs), the methods that have been compared, the estimands of the simulation studies, and the performance measures used to evaluate them. RESULTS A total of ten articles were identified as eligible for the review. Six of the articles evaluated a method that was developed by the authors, four of which were machine learning methods, and the results almost always stated that this developed method's performance was equivalent to or better than the other methods compared. Comparisons were often biased towards the novel approach, with the majority only comparing against a basic Cox proportional hazards model, and in scenarios where it is clear it would not perform well. In many of the articles reviewed, key information was unclear, such as the number of simulation repetitions and how performance measures were calculated. CONCLUSION It is vital that method comparisons are unbiased and comprehensive, and this should be the goal even if realising it is difficult. 
Fully assessing how newly developed methods perform and how they compare to a variety of traditional statistical methods for prognostic modelling is imperative as these methods are already being applied in clinical contexts. Evaluations of the performance and usefulness of recently developed methods for risk prediction should be continued and reporting standards improved as these methods become increasingly popular.
Affiliation(s)
- Hayley Smith
  - Department of Health Sciences, University of Leicester, Leicester, LE1 7RH UK
- Michael Sweeting
  - Department of Health Sciences, University of Leicester, Leicester, LE1 7RH UK
  - Statistical Innovation, Oncology Biometrics, Oncology R&D, AstraZeneca, Cambridge, UK
- Tim Morris
  - MRC Clinical Trials Unit at UCL, 90 High Holborn, London, WC1V 6LJ UK
- Michael J. Crowther
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
|
27
|
Polce EM, Kunze KN, Dooley MS, Piuzzi NS, Boettner F, Sculco PK. Efficacy and Applications of Artificial Intelligence and Machine Learning Analyses in Total Joint Arthroplasty: A Call for Improved Reporting. J Bone Joint Surg Am 2022; 104:821-832. [PMID: 35045061 DOI: 10.2106/jbjs.21.00717] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 02/01/2023]
Abstract
BACKGROUND There has been a considerable increase in total joint arthroplasty (TJA) research using machine learning (ML). Therefore, the purposes of this study were to synthesize the applications and efficacies of ML reported in the TJA literature, and to assess the methodological quality of these studies. METHODS PubMed, OVID/MEDLINE, and Cochrane libraries were queried in January 2021 for articles regarding the use of ML in TJA. Study demographics, topic, primary and secondary outcomes, ML model development and testing, and model presentation and validation were recorded. The TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines were used to assess the methodological quality. RESULTS Fifty-five studies were identified: 31 investigated clinical outcomes and resource utilization; 11, activity and motion surveillance; 10, imaging detection; and 3, natural language processing. For studies reporting the area under the receiver operating characteristic curve (AUC), the median AUC (and range) was 0.80 (0.60 to 0.97) among 26 clinical outcome studies, 0.99 (0.83 to 1.00) among 6 imaging-based studies, and 0.88 (0.76 to 0.98) among 3 activity and motion surveillance studies. Twelve studies compared ML to logistic regression, with 9 (75%) reporting that ML was superior. The average number of TRIPOD guidelines met was 11.5 (range: 5 to 18), with 38 (69%) meeting greater than half of the criteria. Presentation and explanation of the full model for individual predictions and assessments of model calibration were poorly reported (<30%). CONCLUSIONS The performance of ML models was good to excellent when applied to a wide variety of clinically relevant outcomes in TJA. However, reporting of certain key methodological and model presentation criteria was inadequate. 
Despite the recent surge in TJA literature utilizing ML, the lack of consistent adherence to reporting guidelines needs to be addressed to bridge the gap between model development and clinical implementation.
Affiliation(s)
- Evan M Polce
  - University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
- Kyle N Kunze
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, NY
- Matthew S Dooley
  - University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
- Nicolas S Piuzzi
  - Department of Orthopaedic Surgery, Cleveland Clinic, Cleveland, Ohio
- Friedrich Boettner
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, NY
- Peter K Sculco
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, NY
|
28
|
Devana SK, Shah AA, Lee C, Gudapati V, Jensen AR, Cheung E, Solorzano C, van der Schaar M, SooHoo NF. Development of a Machine Learning Algorithm for Prediction of Complications and Unplanned Readmission Following Reverse Total Shoulder Arthroplasty. J Shoulder Elb Arthroplast 2022; 5:24715492211038172. [PMID: 35330785 PMCID: PMC8938598 DOI: 10.1177/24715492211038172] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/07/2021] [Revised: 06/21/2021] [Accepted: 07/20/2021] [Indexed: 11/22/2022] Open
Abstract
Background Reverse total shoulder arthroplasty (rTSA) offers tremendous promise for the treatment of complex pathologies beyond the scope of anatomic total shoulder arthroplasty but is associated with a higher rate of major postoperative complications. We aimed to design and validate a machine learning (ML) model to predict major postoperative complications or readmission following rTSA. Methods We retrospectively reviewed California's Office of Statewide Health Planning and Development database for patients who underwent rTSA between 2015 and 2017. We implemented logistic regression (LR), extreme gradient boosting (XGBoost), gradient boosting machines, adaptive boosting, and random forest classifiers in Python and trained these models using 64 binary, continuous, and discrete variables to predict the occurrence of at least one major postoperative complication or readmission following primary rTSA. Models were validated using the standard metrics of area under the receiver operating characteristic (AUROC) curve, area under the precision–recall curve (AUPRC), and Brier scores. The key factors for the top-performing model were determined. Results Of 2799 rTSAs performed during the study period, 152 patients (5%) had at least 1 major postoperative complication or 30-day readmission. XGBoost had the highest AUROC and AUPRC of 0.681 and 0.129, respectively. The key predictive features in this model were patients with a history of implant complications, protein-calorie malnutrition, and a higher number of comorbidities. Conclusion Our study reports an ML model for the prediction of major complications or 30-day readmission following rTSA. XGBoost outperformed traditional LR models and also identified key predictive features of complications and readmission.
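One way to read the reported AUPRC is against the event prevalence, because a classifier that ranks patients at random has an expected precision equal to the prevalence at every recall level. The sketch below does that arithmetic with the counts from the abstract. The comparison itself is an interpretive aid added here, not an analysis reported by the authors, and the function names are illustrative.

```python
def prevalence(events, total):
    """Fraction of cases with the outcome; the chance-level AUPRC baseline."""
    return events / total


def lift_over_chance(auprc, events, total):
    """Ratio of an observed AUPRC to the chance-level baseline (prevalence)."""
    return auprc / prevalence(events, total)


# Counts and AUPRC as reported in the abstract: 152 events among 2799 rTSAs.
baseline = prevalence(152, 2799)
lift = lift_over_chance(0.129, 152, 2799)
print(f"chance-level AUPRC ~ {baseline:.3f}, reported AUPRC is {lift:.1f}x that")
```

Read this way, an AUPRC of 0.129 is a little more than twice the chance level for a 5% event rate, which puts the headline number in context alongside the AUROC of 0.681.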
Affiliation(s)
- Sai K Devana
  - David Geffen School of Medicine UCLA, Los Angeles, CA, USA
- Akash A Shah
  - David Geffen School of Medicine UCLA, Los Angeles, CA, USA
- Varun Gudapati
  - David Geffen School of Medicine UCLA, Los Angeles, CA, USA
- Edward Cheung
  - David Geffen School of Medicine UCLA, Los Angeles, CA, USA
|
29
|
Huang AW, Haslberger M, Coulibaly N, Galárraga O, Oganisian A, Belbasis L, Panagiotou OA. Multivariable prediction models for health care spending using machine learning: a protocol of a systematic review. Diagn Progn Res 2022; 6:4. [PMID: 35321760 PMCID: PMC8943988 DOI: 10.1186/s41512-022-00119-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/09/2021] [Accepted: 01/18/2022] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND With rising cost pressures on health care systems, machine-learning (ML)-based algorithms are increasingly used to predict health care costs. Despite their potential advantages, the successful implementation of these methods could be undermined by biases introduced in the design, conduct, or analysis of studies seeking to develop and/or validate ML models. The utility of such models may also be negatively affected by poor reporting of these studies. In this systematic review, we aim to evaluate the reporting quality, methodological characteristics, and risk of bias of ML-based prediction models for individual-level health care spending. METHODS We will systematically search PubMed and Embase to identify studies developing, updating, or validating ML-based models to predict an individual's health care spending for any medical condition, over any time period, and in any setting. We will exclude prediction models of aggregate-level health care spending, models used to infer causality, models using radiomics or speech parameters, models of non-clinically validated predictors (e.g., genomics), and cost-effectiveness analyses without predicting individual-level health care spending. We will extract data based on the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies (CHARMS), previously published research, and relevant recommendations. We will assess the adherence of ML-based studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and examine the inclusion of transparency and reproducibility indicators (e.g. statements on data sharing). To assess the risk of bias, we will apply the Prediction model Risk Of Bias Assessment Tool (PROBAST). Findings will be stratified by study design, ML methods used, population characteristics, and medical field. 
DISCUSSION Our systematic review will appraise the quality, reporting, and risk of bias of ML-based models for individualized health care cost prediction. This review will provide an overview of the available models and give insights into the strengths and limitations of using ML methods for the prediction of health spending.
Affiliation(s)
- Andrew W Huang
  - Department of Health Services, Policy and Practice, Brown University School of Public Health, Rhode Island, Providence, USA.
- Martin Haslberger
  - QUEST Center, Berlin Institute of Health, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Neto Coulibaly
  - Department of Health Services, Policy and Practice, Brown University School of Public Health, Rhode Island, Providence, USA
- Omar Galárraga
  - Department of Health Services, Policy and Practice, Brown University School of Public Health, Rhode Island, Providence, USA
- Arman Oganisian
  - Department of Biostatistics, Brown University School of Public Health, Providence, Rhode Island, USA
- Lazaros Belbasis
  - Meta-Research Innovation Center Berlin, QUEST Center, Berlin Institute of Health, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Orestis A Panagiotou
  - Department of Health Services, Policy and Practice, Brown University School of Public Health, Rhode Island, Providence, USA
  - Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, Rhode Island, USA
|
30
|
Body Composition Predictors of Adverse Postoperative Events in Patients Undergoing Surgery for Long Bone Metastases. J Am Acad Orthop Surg Glob Res Rev 2022; 6:01979360-202203000-00010. [PMID: 35262530 PMCID: PMC8913089 DOI: 10.5435/jaaosglobal-d-22-00001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Accepted: 01/03/2022] [Indexed: 11/23/2022]
Abstract
Body composition assessed using opportunistic CT has been recently identified as a predictor of outcome in patients with cancer. The purpose of this study was to determine whether the cross-sectional area (CSA) and the attenuation of abdominal subcutaneous adipose tissue, visceral adipose tissue (VAT), and paraspinous and abdominal muscles are the predictors of length of hospital stay, 30-day postoperative complications, and revision surgery in patients treated for long bone metastases.
|
31
|
Guzman-Vilca WC, Castillo-Cara M, Carrillo-Larco RM. Development, validation and application of a machine learning model to estimate salt consumption in 54 countries. eLife 2022; 11:72930. [PMID: 34984979 PMCID: PMC8789317 DOI: 10.7554/elife.72930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2021] [Accepted: 12/15/2021] [Indexed: 11/13/2022] Open
Abstract
Global targets to reduce salt intake have been proposed but their monitoring is challenged by the lack of population-based data on salt consumption. We developed a machine learning (ML) model to predict salt consumption at the population level based on simple predictors and applied this model to national surveys in 54 countries. We used 21 surveys with spot urine samples for the ML model derivation and validation; we developed a supervised ML regression model based on: sex, age, weight, height, systolic and diastolic blood pressure. We applied the ML model to 54 new surveys to quantify the mean salt consumption in the population. The pooled dataset in which we developed the ML model included 49,776 people. Overall, there were no substantial differences between the observed and ML-predicted mean salt intake (p<0.001). The pooled dataset where we applied the ML model included 166,677 people; the predicted mean salt consumption ranged from 6.8 g/day (95% CI: 6.8-6.8 g/day) in Eritrea to 10.0 g/day (95% CI: 9.9-10.0 g/day) in American Samoa. The countries with the highest predicted mean salt intake were in Western Pacific. The lowest predicted intake was found in Africa. The country-specific predicted mean salt intake was within reasonable difference from the best available evidence. A ML model based on readily available predictors estimated daily salt consumption with good accuracy. This model could be used to predict mean salt consumption in the general population where urine samples are not available.
|
32
|
Groot OQ, Bindels BJJ, Ogink PT, Kapoor ND, Twining PK, Collins AK, Bongers MER, Lans A, Oosterhoff JHF, Karhade AV, Verlaan JJ, Schwab JH. Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review. Acta Orthop 2021; 92:385-393. [PMID: 33870837 PMCID: PMC8436968 DOI: 10.1080/17453674.2021.1910448] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/31/2023]
Abstract
Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparent reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting for Individual Prognosis or Diagnosis (TRIPOD) guidelines. Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion was determined by using 59 ML prediction models with only internal validation in orthopedic surgical outcome published up until June 18, 2020, previously identified by our group. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. The TRIPOD guidelines assessed transparent reporting. Results - We included 18 studies externally validating 10 different ML prediction models of the 59 available ML models after screening 4,682 studies. All external validations identified in this review retained good discrimination. Other key performance measures were provided in only 3 studies, rendering overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), with 6 items being reported in less than 4/18 of the studies. Interpretation - Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, limiting a transparent examination of model performance. Further prospective studies are needed to validate or refute the myriad of predictive ML models in orthopedics while adhering to existing guidelines. This ensures clinicians can take full advantage of validated and clinically implementable ML decision tools.
Affiliation(s)
- Olivier Q Groot
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
  - Department of Orthopedic Surgery, University Medical Center Utrecht, Utrecht University, The Netherlands
- Bas J J Bindels
  - Department of Orthopedic Surgery, University Medical Center Utrecht, Utrecht University, The Netherlands
- Paul T Ogink
  - Department of Orthopedic Surgery, University Medical Center Utrecht, Utrecht University, The Netherlands
- Neal D Kapoor
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
- Peter K Twining
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
- Austin K Collins
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
- Michiel E R Bongers
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
- Amanda Lans
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
  - Department of Orthopedic Surgery, University Medical Center Utrecht, Utrecht University, The Netherlands
- Jacobien H F Oosterhoff
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
- Aditya V Karhade
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
- Jorrit-Jan Verlaan
  - Department of Orthopedic Surgery, University Medical Center Utrecht, Utrecht University, The Netherlands
- Joseph H Schwab
  - Orthopedic Oncology Service, Massachusetts General Hospital, Harvard Medical School, Boston, USA
|