Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Groot OQ, Ogink PT, Lans A, Twining PK, Kapoor ND, DiGiovanni W, Bindels BJJ, Bongers MER, Oosterhoff JHF, Karhade AV, Oner FC, Verlaan J, Schwab JH. Machine learning prediction models in orthopedic surgery: A systematic review in transparent reporting. J Orthop Res 2022;40:475-483. [PMID: 33734466 PMCID: PMC9290012 DOI: 10.1002/jor.25036] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 01/15/2021] [Revised: 03/10/2021] [Accepted: 03/15/2021] [Indexed: 02/04/2023]

Cited by Other Article(s)

Pan YT, Lin YP, Yen HK, Yen HH, Huang CC, Hsieh HC, Janssen S, Hu MH, Lin WH, Groot OQ. Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases? Clin Orthop Relat Res 2024;482:1710-1721. [PMID: 38517402 PMCID: PMC11343550 DOI: 10.1097/corr.0000000000003030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 02/09/2024] [Indexed: 03/23/2024]

Abstract

BACKGROUND

Bone metastasis in advanced cancer is challenging because of pain, functional issues, and reduced life expectancy. Treatment planning is complex, with consideration of factors such as location, symptoms, and prognosis. Prognostic models help guide treatment choices, with Skeletal Oncology Research Group machine-learning algorithms (SORG-MLAs) showing promise in predicting survival for initial spinal metastases and extremity metastases treated with surgery or radiotherapy. Improved therapies extend patient lifespans, increasing the risk of subsequent skeletal-related events (SREs). Patients experiencing subsequent SREs often suffer from disease progression, indicating a deteriorating condition. For these patients, a thorough evaluation, including accurate survival prediction, is essential to determine the most appropriate treatment and avoid aggressive surgical treatment for patients with a poor survival likelihood. However, some variables in the SORG prediction model, such as tumor histology, visceral metastasis, and previous systemic therapies, might remain consistent between initial and subsequent SREs. Given the prognostic difference between patients with and without a subsequent SRE, the efficacy of established prognostic models (originally designed for individuals with an initial SRE) in addressing a subsequent SRE remains uncertain. Therefore, it is crucial to verify the model's utility for subsequent SREs.

QUESTION/PURPOSE

We aimed to evaluate the reliability of the SORG-MLAs for survival prediction in patients undergoing surgery or radiotherapy for a subsequent SRE for whom both the initial and subsequent SREs occurred in the spine or extremities.

METHODS

We retrospectively included 738 patients aged 20 years or older who received surgery or radiotherapy for initial and subsequent SREs at a tertiary referral center and local hospital in Taiwan between 2010 and 2019. We excluded 74 patients whose initial SRE was in the spine and in whom the subsequent SRE occurred in the extremities and 37 patients whose initial SRE was in the extremities and the subsequent SRE was in the spine. The rationale was that different SORG-MLAs were exclusively designed for patients who had an initial spine metastasis and those who had an initial extremity metastasis, irrespective of whether they experienced metastatic events in other areas (for example, a patient experiencing an extremity SRE before his or her spinal SRE would also be regarded as a candidate for an initial spinal SRE). Because these patients were already validated in previous studies, we excluded them to avoid overestimating our results. Five patients with malignant primary bone tumors and 38 patients in whom the metastasis's origin could not be identified were excluded, leaving 584 patients for analysis. The 584 included patients were categorized into two subgroups based on the location of initial and subsequent SREs: the spine group (68% [399]) and extremity group (32% [185]). No patients were lost to follow-up. Patient data at the time they presented with a subsequent SRE were collected, and survival predictions at this timepoint were calculated using the SORG-MLAs. Multiple imputation with the missForest technique was conducted five times to impute the missing proportions of each predictor. The effectiveness of SORG-MLAs was gauged through several statistical measures, including discrimination (measured by the area under the receiver operating characteristic curve [AUC]), calibration, overall performance (Brier score), and decision curve analysis. Discrimination refers to the model's ability to differentiate between those with the event and those without the event.
In practice, an AUC ranges from 0.5 to 1.0, with 0.5 indicating discrimination no better than chance and 1.0 indicating perfect discrimination. An AUC of at least 0.7 is considered clinically acceptable discrimination. Calibration is the comparison between the frequency of observed events and the predicted probabilities. In an ideal calibration, the observed and predicted survival rates should be congruent. The logarithm of the observed-to-expected survival ratio [log(O:E)] offers insight into the model's overall calibration by considering the total number of observed (O) and expected (E) events. The Brier score measures the mean squared difference between the predicted probability of possible outcomes for each individual and the observed outcomes, ranging from 0 to 1, with 0 indicating perfect overall performance and 1 indicating the worst performance. Moreover, the prevalence of the outcome should be considered, so a null-model Brier score was also calculated by assigning a probability equal to the prevalence of the outcome (in this case, the actual survival rate) to each patient. The benefit of the prediction model is determined by comparing its Brier score with that of the null model. If a prediction model's Brier score is lower than the null model's Brier score, the prediction model is deemed to have good performance. A decision curve analysis was performed to evaluate each model's "net benefit," which weighs true positives against false positives at a given "threshold probability"; the threshold reflects the ratio of risk to benefit of an intervention and is derived from a comprehensive clinical evaluation and a well-discussed shared decision-making process. A good predictive model should yield a higher net benefit than default strategies (treating all patients and treating no patients) across a range of threshold probabilities.
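The decision curve logic described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the commonly used net benefit formulation (true positives per patient minus false positives per patient, weighted by the odds of the threshold probability), and the function names and toy data are hypothetical.

```python
from typing import Sequence


def net_benefit(predicted_risk: Sequence[float], event: Sequence[int], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(predicted_risk) != len(event) or not event:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(event)
    true_pos = sum(1 for p, e in zip(predicted_risk, event) if p >= threshold and e == 1)
    false_pos = sum(1 for p, e in zip(predicted_risk, event) if p >= threshold and e == 0)
    # False positives are down-weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(event: Sequence[int], threshold: float) -> float:
    """Net benefit of the default strategy of treating every patient."""
    prevalence = sum(event) / len(event)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy cohort: 1 = event occurred, 0 = no event.
event = [1, 1, 0, 1, 0, 0, 1, 0]
predicted_risk = [0.9, 0.7, 0.2, 0.6, 0.4, 0.1, 0.8, 0.3]

model_nb = net_benefit(predicted_risk, event, threshold=0.5)
treat_all_nb = net_benefit_treat_all(event, threshold=0.5)
treat_none_nb = 0.0  # treating nobody yields no true or false positives

print(model_nb, treat_all_nb, treat_none_nb)
```

Repeating the calculation over a grid of thresholds and plotting the three values against the threshold gives the decision curve; the model is useful at thresholds where its net benefit exceeds both default strategies.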

RESULTS

For the spine group, the algorithms displayed acceptable AUC results (median AUCs of 0.69 to 0.72) for 42-day, 90-day, and 1-year survival predictions after treatment for a subsequent SRE. In contrast, the extremity group showed median AUCs ranging from 0.65 to 0.73 for the corresponding survival periods. All Brier scores were lower than those of their null model, indicating the SORG-MLAs' good overall performances for both cohorts. The SORG-MLAs yielded a net benefit for both cohorts; however, they overestimated 1-year survival probabilities in patients with a subsequent SRE in the spine, with a median log(O:E) of -0.60 (95% confidence interval -0.77 to -0.42).
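To make the reported log(O:E) easier to interpret, the sketch below computes the statistic from toy data and converts a log(O:E) back to an observed-to-expected ratio. It assumes a natural logarithm, which the abstract does not specify, and the function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def log_observed_to_expected(predicted_survival: Sequence[float], survived: Sequence[int]) -> float:
    """log(O:E): log of observed survivors over the sum of predicted survival probabilities."""
    observed = sum(survived)
    expected = sum(predicted_survival)
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected counts must be positive")
    return math.log(observed / expected)


# Hypothetical toy cohort: 1 = alive at the time point, 0 = deceased.
survived = [1, 0, 1, 1, 0, 0]
predicted_survival = [0.8, 0.7, 0.6, 0.9, 0.5, 0.5]

log_oe = log_observed_to_expected(predicted_survival, survived)
print(log_oe)  # negative: fewer survivors observed than predicted (overestimation)

# Back-transforming the median value reported above, assuming a natural log.
reported_ratio = math.exp(-0.60)
print(reported_ratio)
```

Under the natural-log assumption, a log(O:E) of -0.60 corresponds to observed survivors numbering roughly 55% of those the model expected.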

CONCLUSION

The SORG-MLAs maintain satisfactory discriminatory capacity and offer considerable net benefits through decision curve analysis, indicating their continued viability as prediction tools in this clinical context. However, the algorithms overestimate 1-year survival rates for patients with a subsequent SRE of the spine, warranting consideration of specific patient groups. Clinicians and surgeons should exercise caution when using the SORG-MLAs for survival prediction in these patients and remain aware of potential mispredictions when tailoring treatment plans, with a preference for less invasive treatments. Ultimately, this study emphasizes the importance of enhancing prognostic algorithms and developing innovative tools for patients with subsequent SREs as the life expectancy in patients with bone metastases continues to improve and healthcare providers will encounter these patients more often in daily practice.

LEVEL OF EVIDENCE

Level III, prognostic study.

Buddhiraju A, Shimizu MR, Seo HH, Chen TLW, RezazadehSaatlou M, Huang Z, Kwon YM. Generalizability of machine learning models predicting 30-day unplanned readmission after primary total knee arthroplasty using a nationally representative database. Med Biol Eng Comput 2024;62:2333-2341. [PMID: 38558351 DOI: 10.1007/s11517-024-03075-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 03/15/2024] [Indexed: 04/04/2024]

Lee CC, Chen CW, Yen HK, Lin YP, Lai CY, Wang JL, Groot OQ, Janssen SJ, Schwab JH, Hsu FM, Lin WH. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone. Clin Orthop Relat Res 2024:00003086-990000000-01687. [PMID: 39051924 DOI: 10.1097/corr.0000000000003185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Accepted: 06/20/2024] [Indexed: 07/27/2024]

Abstract

BACKGROUND

Survival estimation for patients with symptomatic skeletal metastases ideally should be made before a type of local treatment has been determined. Currently available survival prediction tools, however, were generated using data from patients treated either operatively or with local radiation alone, raising concerns about whether they would generalize well to all patients presenting for assessment. The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA), trained with institution-based data of surgically treated patients, and the Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy model (METSSS), trained with registry-based data of patients treated with radiotherapy alone, are two of the most recently developed survival prediction models, but they have not been tested on patients whose local treatment strategy is not yet decided.

QUESTIONS/PURPOSES

(1) Which of these two survival prediction models performed better in a mixed cohort made up both of patients who received local treatment with surgery followed by radiotherapy and who had radiation alone for symptomatic bone metastases? (2) Which model performed better among patients whose local treatment consisted of only palliative radiotherapy? (3) Are laboratory values used by SORG-MLA, which are not included in METSSS, independently associated with survival after controlling for predictions made by METSSS?

METHODS

Between 2010 and 2018, we provided local treatment for 2113 adult patients with skeletal metastases in the extremities at an urban tertiary referral academic medical center using one of two strategies: (1) surgery followed by postoperative radiotherapy or (2) palliative radiotherapy alone. Every patient's survivorship status was ascertained either by their medical records or the national death registry from the Taiwanese National Health Insurance Administration. After applying a priori designated exclusion criteria, 91% (1920) were analyzed here. Among them, 48% (920) of the patients were female, and the median (IQR) age was 62 years (53 to 70 years). Lung was the most common primary tumor site (41% [782]), and 59% (1128) of patients had other skeletal metastases in addition to the treated lesion(s). In general, the indications for surgery were the presence of a complete pathologic fracture or an impending pathologic fracture, defined as having a Mirels score of ≥ 9, in patients with an American Society of Anesthesiologists (ASA) classification of less than or equal to IV and who were considered fit for surgery. The indications for radiotherapy were relief of pain, local tumor control, prevention of skeletal-related events, and any combination of the above. In all, 84% (1610) of the patients received palliative radiotherapy alone as local treatment for the target lesion(s), and 16% (310) underwent surgery followed by postoperative radiotherapy. Neither METSSS nor SORG-MLA was used at the point of care to aid clinical decision-making during the treatment period. Survival was retrospectively estimated by these two models to test their potential for providing survival probabilities. We first compared SORG-MLA to METSSS in the entire population. Then, we repeated the comparison in patients who received local treatment with palliative radiation alone.
We assessed model performance by area under the receiver operating characteristic curve (AUROC), calibration analysis, Brier score, and decision curve analysis (DCA). The AUROC measures discrimination, which is the ability to distinguish patients with the event of interest (such as death at a particular time point) from those without. AUROC typically ranges from 0.5 to 1.0, with 0.5 indicating random guessing and 1.0 a perfect prediction, and in general, an AUROC of ≥ 0.7 indicates adequate discrimination for clinical use. Calibration refers to the agreement between the predicted outcomes (in this case, survival probabilities) and the actual outcomes, with a perfect calibration curve having an intercept of 0 and a slope of 1. A positive intercept indicates that the actual survival is generally underestimated by the prediction model, and a negative intercept suggests the opposite (overestimation). When comparing models, an intercept closer to 0 typically indicates better calibration. Calibration can also be summarized as log(O:E), the logarithm of the ratio of observed (O) to expected (E) survivors. A log(O:E) > 0 signals an underestimation (the observed survival is greater than the predicted survival), and a log(O:E) < 0 indicates the opposite (the observed survival is lower than the predicted survival). A model with a log(O:E) closer to 0 is generally considered better calibrated. The Brier score is the mean squared difference between the model predictions and the observed outcomes, and it ranges from 0 (best prediction) to 1 (worst prediction). The Brier score captures both discrimination and calibration, and it is considered a measure of overall model performance. In Brier score analysis, the "null model" assigns a predicted probability equal to the prevalence of the outcome and represents a model that adds no new information. A prediction model should achieve a Brier score lower than the null-model Brier score to be considered useful.
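The Brier score comparison against the null model can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation; the function names and toy data are hypothetical.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def null_brier_score(observed: Sequence[int]) -> float:
    """Brier score of a model that predicts the outcome prevalence for every patient."""
    prevalence = sum(observed) / len(observed)
    return brier_score([prevalence] * len(observed), observed)


# Hypothetical toy cohort: 1 = outcome occurred, 0 = outcome did not occur.
observed = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [0.9, 0.7, 0.2, 0.6, 0.4, 0.1, 0.8, 0.3]

model_brier = brier_score(predicted, observed)
null_brier = null_brier_score(observed)

# The model adds information only if its score is below the null-model score.
print(model_brier, null_brier, model_brier < null_brier)
```

Because the null-model score depends on prevalence, the same model Brier score can look good in one cohort and poor in another, which is why the comparison is made within each cohort.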
The DCA was developed as a method to determine whether using a model to inform treatment decisions would do more good than harm. It plots the net benefit of making decisions based on the model's predictions across all possible risk thresholds (or cost-to-benefit ratios) in relation to the two default strategies of treating all or no patients. The care provider can decide on an acceptable risk threshold for the proposed treatment in an individual and assess the corresponding net benefit to determine whether consulting with the model is superior to adopting the default strategies. Finally, we used multivariable logistic and Cox proportional hazards regression analyses to examine whether laboratory data, which were not included in the METSSS model, were independently associated with survival after controlling for the METSSS model's predictions.

RESULTS

Between the two models, only SORG-MLA achieved adequate discrimination (an AUROC of > 0.7) in the entire cohort (of patients treated operatively or with radiation alone) and in the subgroup of patients treated with palliative radiotherapy alone. SORG-MLA outperformed METSSS by a wide margin on discrimination, calibration, and Brier score analyses in not only the entire cohort but also the subgroup of patients whose local treatment consisted of radiotherapy alone. In both the entire cohort and the subgroup, DCA demonstrated that SORG-MLA provided more net benefit compared with the two default strategies (of treating all or no patients) and compared with METSSS when risk thresholds ranged from 0.2 to 0.9 at both 90 days and 1 year, indicating that using SORG-MLA as a decision-making aid was beneficial when a patient's individualized risk threshold for opting for treatment was 0.2 to 0.9. Higher albumin, lower alkaline phosphatase, lower calcium, higher hemoglobin, lower international normalized ratio, higher lymphocytes, lower neutrophils, lower neutrophil-to-lymphocyte ratio, lower platelet-to-lymphocyte ratio, higher sodium, and lower white blood cells were independently associated with better 1-year and overall survival after adjusting for the predictions made by METSSS.

CONCLUSION

Based on these discoveries, clinicians might choose to consult SORG-MLA instead of METSSS for survival estimation in patients with long-bone metastases presenting for evaluation of local treatment. Basing a treatment decision on the predictions of SORG-MLA could be beneficial when a patient's individualized risk threshold for opting to undergo a particular treatment strategy ranged from 0.2 to 0.9. Future studies might investigate relevant laboratory items when constructing or refining a survival estimation model because these data demonstrated prognostic value independent of the predictions of the METSSS model, and future studies might also seek to keep these models up to date using data from diverse, contemporary patients undergoing both modern operative and nonoperative treatments.

LEVEL OF EVIDENCE

Level III, diagnostic study.

Farrow L, Zhong M, Anderson L. Use of natural language processing techniques to predict patient selection for total hip and knee arthroplasty from radiology reports. Bone Joint J 2024;106-B:688-695. [PMID: 38945535 DOI: 10.1302/0301-620x.106b7.bjj-2024-0136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]

Abstract

Aims

To examine whether natural language processing (NLP) using a clinically based large language model (LLM) could be used to predict patient selection for total hip or total knee arthroplasty (THA/TKA) from routinely available free-text radiology reports.

Methods

Data pre-processing and analyses were conducted according to the Artificial intelligence to Revolutionize the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project protocol. This included use of de-identified Scottish regional clinical data of patients referred for consideration of THA/TKA, held in a secure data environment designed for artificial intelligence (AI) inference. Only preoperative radiology reports were included. NLP algorithms were based on the freely available GatorTron model, an LLM trained on over 82 billion words of de-identified clinical text. Two inference tasks were performed: assessment after model fine-tuning (50 epochs and three cycles of k-fold cross validation), and external validation.

Results

For THA, there were 5,558 patient radiology reports included, of which 4,137 were used for model training and testing, and 1,421 for external validation. Following training, model performance demonstrated average (mean across three folds) accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC) values of 0.850 (95% confidence interval (CI) 0.833 to 0.867), 0.813 (95% CI 0.785 to 0.841), and 0.847 (95% CI 0.822 to 0.872), respectively. For TKA, 7,457 patient radiology reports were included, with 3,478 used for model training and testing, and 3,152 for external validation. Performance metrics included accuracy, F1 score, and AUROC values of 0.757 (95% CI 0.702 to 0.811), 0.543 (95% CI 0.479 to 0.607), and 0.717 (95% CI 0.657 to 0.778), respectively. There was a notable deterioration in performance on external validation in both cohorts.

Conclusion

The use of routinely available preoperative radiology reports provides promising potential to help screen suitable candidates for THA, but not for TKA. The external validation results demonstrate the importance of further model testing and training when confronted with new clinical cohorts.

Ritter D, Denard PJ, Raiss P, Wijdicks CA, Bachmaier S. Preoperative 3-dimensional computed tomography bone density measures provide objective bone quality classifications for stemless anatomic total shoulder arthroplasty. J Shoulder Elbow Surg 2024;33:1503-1511. [PMID: 38182017 DOI: 10.1016/j.jse.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 10/26/2023] [Accepted: 11/12/2023] [Indexed: 01/07/2024]

Abstract

BACKGROUND

Reproducible methods for determining adequate bone densities for stemless anatomic total shoulder arthroplasty (aTSA) are currently lacking. The purpose of this study was to evaluate the utility of preoperative computed tomography (CT) imaging for assessing the bone density of the proximal humerus for supportive differentiation in the decision making for stemless humeral component implantation. It was hypothesized that preoperative 3-dimensional (3-D) CT bone density measures provide objective classifications of the bone quality for stemless aTSA.

METHODS

A 3-part study was performed that included the analysis of cadaveric humerus CT scans followed by retrospective application to a clinical cohort and classification with a machine learning model. Thirty cadaveric humeri were evaluated with clinical CT and micro-CT (μCT) imaging. Phantom-calibrated CT data were used to extract 3-D regions of interest and defined radiographic scores. The final image processing script was applied retrospectively to a clinical cohort (n = 150) that had a preoperative CT and intraoperative bone density assessment using the "thumb test," followed by placement of an anatomic stemmed or stemless humeral component. Postscan patient-specific calibration was used to improve the functionality and accuracy of the density analysis. A machine learning model (support vector machine [SVM]) was used to improve the classification of bone densities for a stemless humeral component.

RESULTS

The image processing of clinical CT images demonstrated good to excellent accuracy for cylindrical cancellous bone densities (metaphysis [ICC = 0.986] and epiphysis [ICC = 0.883]). Patient-specific internal calibration significantly reduced biases and unwanted variance compared with standard HU CT scans (P < .0001). The SVM showed optimized prediction accuracy compared with conventional statistics with an accuracy of 73.9% and an AUC of 0.83 based on the intraoperative decision of the surgeon. The SVM model based on density clusters increased the accuracy of the bone quality classification to 87.3% with an AUC of 0.93.

CONCLUSIONS

Preoperative CT imaging allows accurate evaluation of the bone densities in the proximal humerus. Three-dimensional regions of interest, rescaling using patient-specific calibration, and a machine learning model resulted in good to excellent prediction for objective bone quality classification. This approach may provide an objective tool extending preoperative selection criteria for stemless humeral component implantation.

Chen Y, Zhang S, Tang N, George DM, Huang T, Tang J. Using Google web search to analyze and evaluate the application of ChatGPT in femoroacetabular impingement syndrome. Front Public Health 2024;12:1412063. [PMID: 38883198 PMCID: PMC11176516 DOI: 10.3389/fpubh.2024.1412063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 05/23/2024] [Indexed: 06/18/2024]

Abstract

Background

Chat Generative Pre-trained Transformer (ChatGPT) is a new machine learning tool that allows patients to access health information online and is examined here in comparison with Google, the most commonly used search engine in the United States. Patients can use ChatGPT to better understand medical issues. This study compared the two search engines based on: (i) frequently asked questions (FAQs) about Femoroacetabular Impingement Syndrome (FAI), (ii) the corresponding answers to these FAQs, and (iii) the most common FAQs yielding a numerical response.

Purpose

To assess the suitability of ChatGPT as an online health information resource for patients by replicating their internet searches.

Study design

Cross-sectional study.

Methods

The same keywords were used to search the 10 most common questions about FAI on both Google and ChatGPT. The responses from both search engines were recorded and analyzed.

Results

Of the 20 questions, 8 (40%) were similar. Among the 10 questions searched on Google, 7 were provided by a medical practice. For numerical questions, there was a notable difference in answers between Google and ChatGPT for 3 out of the top 5 most common questions (60%). Expert evaluation indicated that 67.5% of experts were satisfied or highly satisfied with the accuracy of ChatGPT's descriptions of both conservative and surgical treatment options for FAI. Additionally, 62.5% of experts were satisfied or highly satisfied with the safety of the information provided. Regarding the etiology of FAI, including cam and pincer impingements, 52.5% of experts expressed satisfaction or high satisfaction with ChatGPT's explanations. Overall, 62.5% of experts affirmed that ChatGPT could serve effectively as a reliable medical resource for initial information retrieval.

Conclusion

This study confirms that ChatGPT, despite being a new tool, shows significant potential as a supplementary resource for health information on FAI. Expert evaluations commend its capacity to provide accurate and comprehensive responses, valued by medical professionals for relevance and safety. Nonetheless, continuous improvements in its medical content's depth and precision are recommended for ongoing reliability. While ChatGPT offers a promising alternative to traditional search engines, meticulous validation is imperative before it can be fully embraced as a trusted medical resource.

Li S, Bao YG, Wu B. Letter to the editor regarding the article "artificial intelligence and computer-assisted navigation for shoulder surgery". J Orthop Surg (Hong Kong) 2024;32:10225536241263656. [PMID: 38871346 DOI: 10.1177/10225536241263656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]

Yang J, Ardavanis KS, Slack KE, Fernando ND, Della Valle CJ, Hernandez NM. Chat Generative Pretrained Transformer (ChatGPT) and Bard: Artificial Intelligence Does not yet Provide Clinically Supported Answers for Hip and Knee Osteoarthritis. J Arthroplasty 2024;39:1184-1190. [PMID: 38237878 DOI: 10.1016/j.arth.2024.01.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 01/08/2024] [Accepted: 01/11/2024] [Indexed: 02/22/2024]

Abstract

BACKGROUND

Advancements in artificial intelligence (AI) have led to the creation of large language models (LLMs), such as Chat Generative Pretrained Transformer (ChatGPT) and Bard, that analyze online resources to synthesize responses to user queries. Despite their popularity, the accuracy of LLM responses to medical questions remains unknown. This study aimed to compare the responses of ChatGPT and Bard regarding treatments for hip and knee osteoarthritis with the American Academy of Orthopaedic Surgeons (AAOS) Evidence-Based Clinical Practice Guidelines (CPGs) recommendations.

METHODS

Both ChatGPT (OpenAI) and Bard (Google) were queried regarding 20 treatments (10 for hip and 10 for knee osteoarthritis) from the AAOS CPGs. Responses were classified by 2 reviewers as being in "Concordance," "Discordance," or "No Concordance" with AAOS CPGs. Cohen's kappa coefficient was used to assess inter-rater reliability, and chi-squared analyses were used to compare responses between LLMs.
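The inter-rater reliability step can be illustrated with a short sketch of Cohen's kappa for two reviewers and the three categories named above. The ratings below are hypothetical toy data, not the study's, and the function name is an assumption made for illustration.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


C, D, N = "Concordance", "Discordance", "No Concordance"

# Hypothetical classifications of 10 responses by two reviewers.
reviewer_1 = [C, C, D, N, C, D, C, C, N, D]
reviewer_2 = [C, C, D, N, D, D, C, C, N, C]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this toy example the reviewers agree on 8 of 10 responses, but kappa is lower than 0.8 because some of that agreement would be expected by chance alone.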

RESULTS

Overall, ChatGPT and Bard provided responses that were concordant with the AAOS CPGs for 16 (80%) and 12 (60%) treatments, respectively. Notably, ChatGPT and Bard encouraged the use of non-recommended treatments in 30% and 60% of queries, respectively. There were no differences in performance when evaluating by joint or by recommended versus non-recommended treatments. Studies were referenced in 6 (30%) of the Bard responses and none (0%) of the ChatGPT responses. Of the 6 Bard responses, studies could only be identified for 1 (16.7%). Of the remaining, 2 (33.3%) responses cited studies in journals that did not exist, 2 (33.3%) cited studies that could not be found with the information given, and 1 (16.7%) provided links to unrelated studies.

CONCLUSIONS

Neither ChatGPT nor Bard consistently provides responses that align with the AAOS CPGs. Consequently, physicians and patients should temper expectations on the guidance AI platforms can currently provide.

Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, van Smeden M, Boulesteix AL, Camaradou JC, Celi LA, Denaxas S, Denniston AK, Glocker B, Golub RM, Harvey H, Heinze G, Hoffman MM, Kengne AP, Lam E, Lee N, Loder EW, Maier-Hein L, Mateen BA, McCradden MD, Oakden-Rayner L, Ordish J, Parnell R, Rose S, Singh K, Wynants L, Logullo P. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:e078378. [PMID: 38626948 PMCID: PMC11019967 DOI: 10.1136/bmj-2023-078378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/17/2024] [Indexed: 04/19/2024]

Affiliation(s)

Gary S Collins: Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
Karel G M Moons: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
Paula Dhiman: Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
Richard D Riley: Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
Andrew L Beam: Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA
Ben Van Calster: Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Biomedical Data Science, Leiden University Medical Centre, Leiden, Netherlands
Marzyeh Ghassemi: Department of Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Xiaoxuan Liu: Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
Johannes B Reitsma: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
Maarten van Smeden: Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
Anne-Laure Boulesteix: Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University of Munich and Munich Centre of Machine Learning, Germany
Jennifer Catherine Camaradou: Patient representative, Health Data Research UK patient and public involvement and engagement group; Patient representative, University of East Anglia, Faculty of Health Sciences, Norwich Research Park, Norwich, UK
Leo Anthony Celi: Beth Israel Deaconess Medical Center, Boston, MA, USA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA
Spiros Denaxas Institute of Health Informatics, University College London, London, UK British Heart Foundation Data Science Centre, London, UK
Alastair K Denniston National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
Ben Glocker Department of Computing, Imperial College London, London, UK
Robert M Golub Northwestern University Feinberg School of Medicine, Chicago, IL, USA
Hugh Harvey Hardian Health, Haywards Heath, UK
Georg Heinze Section for Clinical Biometrics, Centre for Medical Data Science, Medical University of Vienna, Vienna, Austria
Michael M Hoffman Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada Department of Computer Science, University of Toronto, Toronto, ON, Canada Vector Institute for Artificial Intelligence, Toronto, ON, Canada
André Pascal Kengne Department of Medicine, University of Cape Town, Cape Town, South Africa
Emily Lam Patient representative, Health Data Research UK patient and public involvement and engagement group
Naomi Lee National Institute for Health and Care Excellence, London, UK
Elizabeth W Loder The BMJ, London, UK Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Lena Maier-Hein Department of Intelligent Medical Systems, German Cancer Research Centre, Heidelberg, Germany
Bilal A Mateen Institute of Health Informatics, University College London, London, UK Wellcome Trust, London, UK Alan Turing Institute, London, UK
Melissa D McCradden Department of Bioethics, Hospital for Sick Children Toronto, ON, Canada Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
Lauren Oakden-Rayner Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
Johan Ordish Medicines and Healthcare products Regulatory Agency, London, UK
Richard Parnell Patient representative, Health Data Research UK patient and public involvement and engagement group
Sherri Rose Department of Health Policy and Center for Health Policy, Stanford University, Stanford, CA, USA
Karandeep Singh Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
Laure Wynants Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
Patricia Logullo Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK

Toh ZA, Berg B, Han QYC, Hey HWD, Pikkarainen M, Grotle M, He HG. Clinical Decision Support System Used in Spinal Disorders: Scoping Review. J Med Internet Res 2024;26:e53951. [PMID: 38502157 PMCID: PMC10988379 DOI: 10.2196/53951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 01/29/2024] [Accepted: 02/10/2024] [Indexed: 03/20/2024]

Abstract

BACKGROUND

Spinal disorders are highly prevalent worldwide with high socioeconomic costs. This cost is associated with the demand for treatment and productivity loss, prompting the exploration of technologies to improve patient outcomes. Clinical decision support systems (CDSSs) are computerized systems that are increasingly used to facilitate safe and efficient health care. Their applications range in depth and can be found across health care specialties.

OBJECTIVE

This scoping review aims to explore the use of CDSSs in patients with spinal disorders.

METHODS

We used the Joanna Briggs Institute methodological guidance for this scoping review and reported according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) statement. Databases, including PubMed, Embase, Cochrane, CINAHL, Web of Science, Scopus, ProQuest, and PsycINFO, were searched from inception until October 11, 2022. The included studies examined the use of digitalized CDSSs in patients with spinal disorders.

RESULTS

A total of 4 major CDSS functions were identified from 31 studies: preventing unnecessary imaging (n=8, 26%), aiding diagnosis (n=6, 19%), aiding prognosis (n=11, 35%), and recommending treatment options (n=6, 20%). Most studies used the knowledge-based system. Logistic regression was the most commonly used method, followed by decision tree algorithms. The use of CDSSs to aid in the management of spinal disorders was generally accepted over the threat to physicians' clinical decision-making autonomy.

CONCLUSIONS

Although the effectiveness was frequently evaluated by examining the agreement between the decisions made by the CDSSs and the health care providers, comparing the CDSS recommendations with actual clinical outcomes would be preferable. In addition, future studies on CDSS development should focus on system integration, considering end user's needs and preferences, and external validation and impact studies to assess effectiveness and generalizability.

TRIAL REGISTRATION

OSF Registries osf.io/dyz3f; https://osf.io/dyz3f.

Dijkstra H, van de Kuit A, de Groot T, Canta O, Groot OQ, Oosterhoff JH, Doornberg JN. Systematic review of machine-learning models in orthopaedic trauma. Bone Jt Open 2024;5:9-19. [PMID: 38226447 PMCID: PMC10790183 DOI: 10.1302/2633-1462.51.bjo-2023-0095.r1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]

Abstract

Aims

Machine-learning (ML) prediction models in orthopaedic trauma hold great promise in assisting clinicians in various tasks, such as personalized risk stratification. However, an overview of current applications and critical appraisal to peer-reviewed guidelines is lacking. The objectives of this study are to 1) provide an overview of current ML prediction models in orthopaedic trauma; 2) evaluate the completeness of reporting following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement; and 3) assess the risk of bias following the Prediction model Risk Of Bias Assessment Tool (PROBAST) tool.

Methods

A systematic search screening 3,252 studies identified 45 ML-based prediction models in orthopaedic trauma up to January 2023. The TRIPOD statement assessed transparent reporting and the PROBAST tool the risk of bias.

Results

A total of 40 studies reported on training and internal validation; four studies performed both development and external validation, and one study performed only external validation. The most commonly reported outcomes were mortality (33%, 15/45) and length of hospital stay (9%, 4/45), and the majority of prediction models were developed in the hip fracture population (60%, 27/45). The overall median completeness for the TRIPOD statement was 62% (interquartile range 30 to 81%). The overall risk of bias in the PROBAST tool was low in 24% (11/45), high in 69% (31/45), and unclear in 7% (3/45) of the studies. High risk of bias was mainly due to analysis domain concerns including small datasets with low number of outcomes, complete-case analysis in case of missing data, and no reporting of performance measures.

Conclusion

The results of this study showed that despite a myriad of potential clinically useful applications, a substantial part of ML studies in orthopaedic trauma lack transparent reporting, and are at high risk of bias. These problems must be resolved by following established guidelines to instil confidence in ML models among patients and clinicians. Otherwise, there will remain a sizeable gap between the development of ML prediction models and their clinical application in our day-to-day orthopaedic trauma practice.

Huang CC, Peng KP, Hsieh HC, Groot OQ, Yen HK, Tsai CC, Karhade AV, Lin YP, Kao YT, Yang JJ, Dai SH, Huang CC, Chen CW, Yen MH, Xiao FR, Lin WH, Verlaan JJ, Schwab JH, Hsu FM, Wong T, Yang RS, Yang SH, Hu MH. Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm. Clin Orthop Relat Res 2024;482:143-157. [PMID: 37306629 PMCID: PMC10723864 DOI: 10.1097/corr.0000000000002706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 01/20/2023] [Accepted: 05/01/2023] [Indexed: 06/13/2023]

Abstract

BACKGROUND

The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA) was developed to predict the survival of patients with spinal metastasis. The algorithm was successfully tested in five international institutions using 1101 patients from different continents. The incorporation of 18 prognostic factors strengthens its predictive ability but limits its clinical utility because some prognostic factors might not be clinically available when a clinician wishes to make a prediction.

QUESTIONS/PURPOSES

We performed this study to (1) evaluate the SORG-MLA's performance with missing data and (2) develop an internet-based application to impute the missing data.

METHODS

A total of 2768 patients were included in this study. The data of 617 patients who were treated surgically were intentionally erased, and the data of the other 2151 patients who were treated with radiotherapy and medical treatment were used to impute the artificially missing data. Compared with those who were treated nonsurgically, patients undergoing surgery were younger (median 59 years [IQR 51 to 67 years] versus median 62 years [IQR 53 to 71 years]) and had a higher proportion of patients with at least three spinal metastatic levels (77% [474 of 617] versus 72% [1547 of 2151]), more neurologic deficit (normal American Spinal Injury Association [E] 68% [301 of 443] versus 79% [1227 of 1561]), higher BMI (23 kg/m² [IQR 20 to 25 kg/m²] versus 22 kg/m² [IQR 20 to 25 kg/m²]), higher platelet count (240 × 10³/µL [IQR 173 to 327 × 10³/µL] versus 227 × 10³/µL [IQR 165 to 302 × 10³/µL]), higher lymphocyte count (15 × 10³/µL [IQR 9 to 21 × 10³/µL] versus 14 × 10³/µL [IQR 8 to 21 × 10³/µL]), lower serum creatinine level (0.7 mg/dL [IQR 0.6 to 0.9 mg/dL] versus 0.8 mg/dL [IQR 0.6 to 1.0 mg/dL]), less previous systemic therapy (19% [115 of 617] versus 24% [526 of 2151]), fewer Charlson comorbidities other than cancer (28% [170 of 617] versus 36% [770 of 2151]), and longer median survival. The two patient groups did not differ in other regards. These findings aligned with our institutional philosophy of selecting patients for surgical intervention based on their level of favorable prognostic factors such as BMI or lymphocyte counts and lower levels of unfavorable prognostic factors such as white blood cell counts or serum creatinine level, as well as the degree of spinal instability and severity of neurologic deficits. This approach aims to identify patients with better survival outcomes and prioritize their surgical intervention accordingly.
Seven factors (serum albumin and alkaline phosphatase levels, international normalized ratio, lymphocyte and neutrophil counts, and the presence of visceral or brain metastases) were considered possible missing items based on five previous validation studies and clinical experience. Artificially missing data were imputed using the missForest imputation technique, which was previously applied and successfully tested to fit the SORG-MLA in validation studies. Discrimination, calibration, overall performance, and decision curve analysis were applied to evaluate the SORG-MLA's performance. The discrimination ability was measured with an area under the receiver operating characteristic curve. It ranges from 0.5 to 1.0, with 0.5 indicating the worst discrimination and 1.0 indicating perfect discrimination. An area under the curve of 0.7 is considered clinically acceptable discrimination. Calibration refers to the agreement between the predicted outcomes and actual outcomes. An ideal calibration model will yield predicted survival rates that are congruent with the observed survival rates. The Brier score measures the squared difference between the actual outcome and predicted probability, which captures calibration and discrimination ability simultaneously. A Brier score of 0 indicates perfect prediction, whereas a Brier score of 1 indicates the poorest prediction. A decision curve analysis was performed for the 6-week, 90-day, and 1-year prediction models to evaluate their net benefit across different threshold probabilities. Using the results from our analysis, we developed an internet-based application that facilitates real-time data imputation for clinical decision-making at the point of care. This tool allows healthcare professionals to efficiently and effectively address missing data, ensuring that patient care remains optimal at all times.
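As an illustration of the two performance measures defined above, the following is a minimal sketch, not the authors' implementation. It computes the Brier score as the mean squared difference between the actual outcome and the predicted probability, and the area under the receiver operating characteristic curve as the proportion of survivor/non-survivor pairs that the model ranks correctly. The function names and the six example patients are hypothetical and are not taken from the study.

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between actual outcomes (0 or 1) and predicted probabilities."""
    if not outcomes or len(outcomes) != len(probabilities):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(squared_errors)


def roc_auc(outcomes, probabilities):
    """Area under the ROC curve via pairwise comparison; tied predictions count as half."""
    positives = [p for y, p in zip(outcomes, probabilities) if y == 1]
    negatives = [p for y, p in zip(outcomes, probabilities) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical data: 1 = survived to the time point, 0 = did not.
example_outcomes = [1, 0, 1, 1, 0, 0]
example_probabilities = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]

if __name__ == "__main__":
    print("Brier score:", round(brier_score(example_outcomes, example_probabilities), 4))
    print("AUC:", round(roc_auc(example_outcomes, example_probabilities), 4))
```

A perfect model gives a Brier score of 0, and a model that cannot separate the two groups gives an area under the curve of 0.5, matching the interpretation given in the abstract.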

RESULTS

Generally, the SORG-MLA demonstrated good discriminatory ability, with areas under the curve greater than 0.7 in most cases, and good overall performance, with up to 25% improvement in Brier scores in the presence of one to three missing items. The only exceptions were albumin level and lymphocyte count, because the SORG-MLA's performance was reduced when these two items were missing, indicating that the SORG-MLA might be unreliable without these values. The model tended to underestimate the patient survival rate. As the number of missing items increased, the model's discriminatory ability was progressively impaired, and a marked underestimation of patient survival rates was observed. Specifically, when three items were missing, the number of actual survivors was up to 1.3 times greater than the number of expected survivors, while only 10% discrepancy was observed when only one item was missing. When either two or three items were omitted, the decision curves exhibited substantial overlap, indicating a lack of consistent disparities in performance. This finding suggests that the SORG-MLA consistently generates accurate predictions, regardless of the two or three items that are omitted. We developed an internet application (https://sorg-spine-mets-missing-data-imputation.azurewebsites.net/) that allows the use of SORG-MLA with up to three missing items.

CONCLUSION

The SORG-MLA generally performed well in the presence of one to three missing items, except for serum albumin level and lymphocyte count (which are essential for adequate predictions, even using our modified version of the SORG-MLA). We recommend that future studies should develop prediction models that allow for their use when there are missing data, or provide a means to impute those missing data, because some data are not available at the time a clinical decision must be made.

CLINICAL RELEVANCE

The results suggested the algorithm could be helpful when a radiologic evaluation owing to a lengthy waiting period cannot be performed in time, especially in situations when an early operation could be beneficial. It could help orthopaedic surgeons to decide whether to intervene palliatively or extensively, even when the surgical indication is clear.

Affiliation(s)

Chi-Ching Huang Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
Kuang-Ping Peng Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
Hsiang-Chieh Hsieh Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
Olivier Q. Groot Department of Orthopaedics, University Medical Center Utrecht, Utrecht, the Netherlands Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA
Hung-Kuan Yen Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
Cheng-Chen Tsai Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
Aditya V. Karhade Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA
Yen-Po Lin Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
Yin-Tien Kao Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
Jiun-Jen Yang Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan
Shih-Hsiang Dai Department of International Business, National Taiwan University, Taipei, Taiwan
Chuan-Ching Huang Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
Chih-Wei Chen Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
Mao-Hsu Yen Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan
Fu-Ren Xiao Division of Neurosurgery, Department of Surgery, National Taiwan University Hospital, Taipei, Taiwan
Wei-Hsin Lin Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Jorrit-Jan Verlaan Department of Orthopaedics, University Medical Center Utrecht, Utrecht, the Netherlands
Joseph H. Schwab Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, USA
Feng-Ming Hsu Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital, Taipei, Taiwan Graduate Institute of Oncology, National Taiwan University College of Medicine, Taipei, Taiwan Department of Radiation Oncology, National Taiwan University Cancer Center, Taipei, Taiwan
Tzehong Wong Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsin-Chu, Taiwan
Rong-Sen Yang Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Shu-Hua Yang Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan Department of Orthopedics, National Taiwan University College of Medicine, Taipei, Taiwan
Ming-Hsiao Hu Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan Department of Orthopedics, National Taiwan University College of Medicine, Taipei, Taiwan

Lee KS, Jung SH, Kim DH, Chung SW, Yoon JP. Artificial intelligence- and computer-assisted navigation for shoulder surgery. J Orthop Surg (Hong Kong) 2024;32:10225536241243166. [PMID: 38546214 DOI: 10.1177/10225536241243166] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/28/2024]

Abstract

Background: Over the last few decades, shoulder surgery has undergone rapid advancements, with ongoing exploration and the development of innovative technological approaches. In the coming years, technologies such as robot-assisted surgeries, virtual reality, artificial intelligence, patient-specific instrumentation, and different innovative perioperative and preoperative planning tools will continue to fuel a revolution in the medical field, thereby pushing it toward new frontiers and unprecedented advancements. In relation to this, shoulder surgery will experience significant breakthroughs.

Main body: Recent advancements and technological innovations in the field were comprehensively analyzed. We aimed to provide a detailed overview of the current landscape, emphasizing the roles of technologies. Computer-assisted surgery utilizing robotic- or image-guided technologies is widely adopted in various orthopedic specialties. The most advanced components of computer-assisted surgery are navigation and robotic systems, with functions and applications that are continuously expanding. Surgical navigation requires a visual system that presents real-time positional data on surgical instruments or implants in relation to the target bone, displayed on a computer monitor. There are three primary categories of surgical planning that utilize navigation systems. The first category involves volumetric images, such as ultrasound echogram, computed tomography, and magnetic resonance images. The second category is based on intraoperative fluoroscopic images, and the third category incorporates kinetic information about joints or morphometric data about the target bones acquired intraoperatively.

Conclusion: The rapid integration of artificial intelligence and deep learning into the medical domain has a significant and transformative influence. Numerous studies utilizing deep learning-based diagnostics in orthopedics have reported remarkable achievements and performance.

Chiasakul T, Lam BD, McNichol M, Robertson W, Rosovsky RP, Lake L, Vlachos IS, Adamski A, Reyes N, Abe K, Zwicker JI, Patell R. Artificial intelligence in the prediction of venous thromboembolism: A systematic review and pooled analysis. Eur J Haematol 2023;111:951-962. [PMID: 37794526 PMCID: PMC10900245 DOI: 10.1111/ejh.14110] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2023] [Revised: 09/16/2023] [Accepted: 09/18/2023] [Indexed: 10/06/2023]

Abstract

BACKGROUND

Accurate diagnostic and prognostic predictions of venous thromboembolism (VTE) are crucial for VTE management. Artificial intelligence (AI) enables autonomous identification of the most predictive patterns from large complex data. Although evidence regarding its performance in VTE prediction is emerging, a comprehensive analysis of performance is lacking.

AIMS

To systematically review the performance of AI in the diagnosis and prediction of VTE and compare it to clinical risk assessment models (RAMs) or logistic regression models.

METHODS

A systematic literature search was performed using PubMed, MEDLINE, EMBASE, and Web of Science from inception to April 20, 2021. Search terms included "artificial intelligence" and "venous thromboembolism." Eligible criteria were original studies evaluating AI in the prediction of VTE in adults and reporting one of the following outcomes: sensitivity, specificity, positive predictive value, negative predictive value, or area under receiver operating curve (AUC). Risks of bias were assessed using the PROBAST tool. Unpaired t-test was performed to compare the mean AUC from AI versus conventional methods (RAMs or logistic regression models).
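The comparison of mean AUCs described above can be sketched as follows. This is a minimal illustration that assumes the classic pooled-variance (Student) form of the unpaired t-test; the abstract does not state which variant was used. The function name and the AUC values in the example are hypothetical and are not data from the review.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance unpaired t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t_value = (mean(group_a) - mean(group_b)) / standard_error
    return t_value, n_a + n_b - 2


# Hypothetical per-study AUCs, for illustration only.
ai_aucs = [0.80, 0.75, 0.82, 0.79]
conventional_aucs = [0.60, 0.65, 0.58, 0.61]

if __name__ == "__main__":
    t_value, dof = unpaired_t_statistic(ai_aucs, conventional_aucs)
    print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

The p value is then read from a t distribution with the returned degrees of freedom; that step is left out here to keep the sketch free of third-party dependencies.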

RESULTS

A total of 20 studies were included. Number of participants ranged from 31 to 111 888. The AI-based models included artificial neural network (six studies), support vector machines (four studies), Bayesian methods (one study), super learner ensemble (one study), genetic programming (one study), unspecified machine learning models (two studies), and multiple machine learning models (five studies). Twelve studies (60%) had both training and testing cohorts. Among 14 studies (70%) where AUCs were reported, the mean AUC for AI versus conventional methods were 0.79 (95% CI: 0.74-0.85) versus 0.61 (95% CI: 0.54-0.68), respectively (p < .001). However, the good to excellent discriminative performance of AI methods is unlikely to be replicated when used in clinical practice, because most studies had high risk of bias due to missing data handling and outcome determination.

CONCLUSION

The use of AI appears to improve the accuracy of diagnostic and prognostic prediction of VTE over conventional risk models; however, there was a high risk of bias observed across studies. Future studies should focus on transparent reporting, external validation, and clinical application of these models.

Affiliation(s)

Thita Chiasakul Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA Division of Hemostasis and Thrombosis, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA Division of Hematology, Faculty of Medicine, Department of Medicine, Center of Excellence in Translational Hematology, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Bangkok, Thailand
Barbara D Lam Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA Division of Hemostasis and Thrombosis, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
Megan McNichol Division of Knowledge Services, Department of Information Services (M.M.), Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
William Robertson National Blood Clot Alliance, Philadelphia, Pennsylvania, USA Department of Emergency Healthcare, College of Health Professions, Weber State University, Ogden, Utah, USA
Rachel P Rosovsky Division of Hematology/Oncology, Department of Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
Leslie Lake National Blood Clot Alliance, Philadelphia, Pennsylvania, USA
Ioannis S Vlachos Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
Alys Adamski Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Nimia Reyes Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Karon Abe Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Jeffrey I Zwicker Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA Division of Hemostasis and Thrombosis, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA Department of Medicine, Hematology Service, Memorial Sloan Kettering Cancer Center, New York City, New York, USA
Rushad Patell Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA Division of Hemostasis and Thrombosis, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA

Regazzoni P, Jupiter JB, Liu WC, Fernández dell’Oca AA. Evidence-Based Surgery: What Can Intra-Operative Images Contribute? J Clin Med 2023;12:6809. [PMID: 37959274 PMCID: PMC10649165 DOI: 10.3390/jcm12216809] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/22/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/15/2023]

Abstract

Evidence-based medicine integrates results from randomized controlled trials (RCTs) and meta-analyses, combining the best external evidence with individual clinical expertise and patients' preferences. However, RCTs of surgery differ from those of medicine in that surgical performance is often assumed to be consistent. Yet, evaluating whether each surgery is performed to the same standard is quite challenging. As a primary issue, the novelty of this review is to emphasize, with a focus on orthopedic trauma, the advantage of having complete intra-operative image documentation, allowing the direct evaluation of the quality of the intra-operative technical performance. The absence of complete intra-operative image documentation leads to the inhomogeneity of case series, yielding inconsistent results due to the impossibility of a secondary analysis. Thus, comparisons and the reproduction of studies are difficult. Access to complete intra-operative image data in surgical RCTs allows not only secondary analysis but also comparisons with similar cases. Such complete data can be included in electronic papers. Offering these data to peers, in an accessible link, when presenting papers facilitates the selection process and improves publications for readers. Additionally, having access to the full set of image data for all presented cases serves as a rich resource for learning. It enables the reader to sift through the information and pinpoint the details that are most relevant to their individual needs, allowing them to potentially incorporate this knowledge into daily practice. A broad use of the concept of complete intra-operative image documentation is pivotal for bridging the gap between clinical research findings and real-world applications. Enhancing the quality of surgical RCTs would facilitate the equalization of evidence acquisition in both internal medicine and surgery.
Joint effort by surgeons, scientific societies, publishers, and healthcare authorities is needed to support the ideas, implement economic requirements, and overcome the mental obstacles to its realization.

Karlin EA, Lin CC, Meftah M, Slover JD, Schwarzkopf R. The Impact of Machine Learning on Total Joint Arthroplasty Patient Outcomes: A Systemic Review. J Arthroplasty 2023;38:2085-2095. [PMID: 36441039 DOI: 10.1016/j.arth.2022.10.039] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/28/2022] [Revised: 10/19/2022] [Accepted: 10/24/2022] [Indexed: 11/27/2022]

Abstract

BACKGROUND

Supervised machine learning techniques have been increasingly applied to predict patient outcomes after hip and knee arthroplasty procedures. The purpose of this study was to systematically review the applications of supervised machine learning techniques to predict patient outcomes after primary total hip and knee arthroplasty.

METHODS

A comprehensive literature search using the electronic databases MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials, and Cochrane Database of Systematic Reviews was conducted in July of 2021. The inclusion criteria were studies that utilized supervised machine learning techniques to predict patient outcomes after primary total hip or knee arthroplasty.

RESULTS

Search criteria yielded n = 30 relevant studies. Topics of study included patient complications (n = 6), readmissions (n = 1), revision (n = 2), patient-reported outcome measures (n = 4), patient satisfaction (n = 4), inpatient status and length of stay (LOS) (n = 9), opioid usage (n = 3), and patient function (n = 1). Studies involved TKA (n = 12), THA (n = 11), or a combination (n = 7). Fewer than 35% of predictive outcomes had an area under the receiver operating characteristic curve (AUC) in the excellent or outstanding range. Additionally, only 9 of the studies found improvement over logistic regression, and only 9 studies were externally validated.

CONCLUSION

Supervised machine learning algorithms are powerful tools that have been increasingly applied to predict patient outcomes after total hip and knee arthroplasty. However, these algorithms should be evaluated in the context of prognostic accuracy, comparison to traditional statistical techniques for outcome prediction, and application to populations outside the training set. While machine learning algorithms have been received with considerable interest, they should be critically assessed and validated prior to clinical adoption.

Karnuta JM, Murphy MP, Luu BC, Ryan MJ, Haeberle HS, Brown NM, Iorio R, Chen AF, Ramkumar PN. Artificial Intelligence for Automated Implant Identification in Total Hip Arthroplasty: A Multicenter External Validation Study Exceeding Two Million Plain Radiographs. J Arthroplasty 2023;38:1998-2003.e1. [PMID: 35271974 DOI: 10.1016/j.arth.2022.03.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/08/2022] [Revised: 02/23/2022] [Accepted: 03/01/2022] [Indexed: 02/02/2023]

Abstract

BACKGROUND

The surgical management of complications after total hip arthroplasty (THA) necessitates accurate identification of the femoral implant manufacturer and model. Automated image processing using deep learning has been previously developed and internally validated; however, external validation is necessary prior to responsible application of artificial intelligence (AI)-based technologies.

METHODS

We trained, validated, and externally tested a deep learning system to classify femoral-sided THA implants as one of the 8 models from 2 manufacturers derived from 2,954 original, deidentified, retrospectively collected anteroposterior plain radiographs across 3 academic referral centers and 13 surgeons. From these radiographs, 2,117 were used for training, 249 for validation, and 588 for external testing. Augmentation was applied to the training set (n = 2,117,000) to increase model robustness. Performance was evaluated by area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy. Implant identification processing speed was calculated.

RESULTS

The training and testing sets were drawn from statistically different populations of implants (P < .001). After 1,000 training epochs by the deep learning system, the system discriminated 8 implant models with a mean area under the receiver operating characteristic curve of 0.991, accuracy of 97.9%, sensitivity of 88.6%, and specificity of 98.9% in the external testing dataset of 588 anteroposterior radiographs. The software classified implants at a mean speed of 0.02 seconds per image.

CONCLUSION

An AI-based software demonstrated excellent internal and external validation. Although continued surveillance is necessary with implant library expansion, this software represents responsible and meaningful clinical application of AI with immediate potential to globally scale and assist in preoperative planning prior to revision THA.
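
The sensitivity, specificity, and accuracy figures reported above come from a multi-class problem (8 implant models). The abstract does not say how the per-class values were combined, so the following is only a minimal sketch that assumes one-vs-rest counting with an unweighted (macro) average. The function name `one_vs_rest_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def one_vs_rest_metrics(y_true, y_pred):
    """Macro-averaged sensitivity, specificity, and accuracy.

    Each class is treated in turn as "positive" and all others as
    "negative" (one-vs-rest); the per-class values are then averaged
    without weighting. This averaging scheme is an assumption.
    """
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    sens, spec, acc = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = n - tp - fn - fp
        # Guard against classes with no positives or no negatives.
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        acc.append((tp + tn) / n)
    k = len(classes)
    return sum(sens) / k, sum(spec) / k, sum(acc) / k


# Toy labels for three hypothetical implant models (not study data).
truth = ["A", "A", "B", "B", "C", "C"]
preds = ["A", "B", "B", "B", "C", "A"]
sensitivity, specificity, accuracy = one_vs_rest_metrics(truth, preds)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} accuracy={accuracy:.3f}")
```

Under one-vs-rest counting, most images are true negatives for any given class, which is one reason specificity and accuracy can sit well above sensitivity in a multi-class setting.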

Chen T, Liu C, Zhang Z, Liang T, Zhu J, Zhou C, Wu S, Yao Y, Huang C, Zhang B, Feng S, Wang Z, Huang S, Sun X, Chen L, Zhan X. Using Machine Learning to Predict Surgical Site Infection After Lumbar Spine Surgery. Infect Drug Resist 2023;16:5197-5207. [PMID: 37581167 PMCID: PMC10423613 DOI: 10.2147/idr.s417431] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/30/2023] [Accepted: 07/26/2023] [Indexed: 08/16/2023]

Abstract

Objective

The objective of this study was to utilize machine learning techniques to analyze perioperative factors and identify blood glucose levels that can predict the occurrence of surgical site infection following posterior lumbar spinal surgery.

Methods

A total of 4019 patients receiving lumbar internal fixation surgery from an institute were enrolled between June 2012 and February 2021. First, the filtered data were randomized into the test and verification groups. Second, in the test group, specific variables were screened using logistic regression analysis, Lasso regression analysis, support vector machine, and random forest. Specific variables obtained using the four methods were intersected, and a dynamic model was constructed. ROC and calibration curves were constructed to assess model performance. Finally, internal model performance was verified in the verification group using ROC and calibration curves.

Results

The data from 4019 patients were collected. In total, 1327 eligible cases were selected. By combining logistic regression analysis with three machine learning algorithms, this study identified four predictors associated with SSI, namely Modic changes, sebum thickness, hemoglobin, and glucose. Using this information, a prediction model was developed and visually represented. Then, we constructed ROC and calibration curves using the test group; the area under the ROC curve was 0.988. Further, calibration curve analysis revealed favorable consistency of nomogram-predicted values compared with real measurements. The C-index of our model was 0.986 (95% CI 0.981-0.994). Finally, we used the validation group to validate the model internally; the AUC was 0.987. Calibration curve analysis revealed favorable consistency of nomogram-predicted values compared with real measurements. The C-index was 0.982 (95% CI 0.974-0.999).

Conclusion

Logistic regression analysis and machine learning were employed to select four risk factors: Modic changes, sebum thickness, hemoglobin, and glucose. Then, a dynamic prediction model was constructed to help clinicians simplify the monitoring and prevention of SSI.
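
The methods above describe intersecting the variables selected by four screening methods. A minimal sketch of that intersection step follows. The per-method selections are invented placeholders for illustration; only the four-predictor result matches what the abstract reports, and `consensus_features` is a hypothetical helper name.

```python
def consensus_features(selections):
    """Return the features chosen by every method, sorted for stable output."""
    sets = [set(features) for features in selections.values()]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical per-method selections (placeholders, not study output).
selected = {
    "logistic_regression": {"modic_changes", "sebum_thickness", "hemoglobin", "glucose", "age"},
    "lasso": {"modic_changes", "sebum_thickness", "hemoglobin", "glucose", "bmi"},
    "svm": {"modic_changes", "sebum_thickness", "hemoglobin", "glucose"},
    "random_forest": {"modic_changes", "sebum_thickness", "hemoglobin", "glucose", "age", "bmi"},
}
print(consensus_features(selected))
```

Taking the intersection keeps only predictors that all four methods agree on, which favors a small model at the cost of discarding variables that any single method rejects.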

Affiliation(s)

Tianyou Chen Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Chong Liu Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Zide Zhang Spine Ward, Liuzhou People’s Hospital, Liuzhou, People’s Republic of China
Tuo Liang Spine Ward, Liuzhou People’s Hospital, Liuzhou, People’s Republic of China
Jichong Zhu Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Chenxing Zhou Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Shaofeng Wu Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Yuanlin Yao Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Chengqian Huang Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Bin Zhang Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Sitan Feng Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Zequn Wang Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Shengsheng Huang Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Xuhua Sun Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Liyi Chen Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China
Xinli Zhan Department of Spine and Osteopathy Ward, the First Affiliated Hospital of Guangxi Medical University, Nanning, People’s Republic of China

Dubin JA, Bains SS, Chen Z, Hameed D, Nace J, Mont MA, Delanois RE. Using a Google Web Search Analysis to Assess the Utility of ChatGPT in Total Joint Arthroplasty. J Arthroplasty 2023;38:1195-1202. [PMID: 37040823 DOI: 10.1016/j.arth.2023.04.007] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Received: 02/07/2023] [Revised: 03/22/2023] [Accepted: 04/03/2023] [Indexed: 04/13/2023]

Abstract

BACKGROUND

Rapid technological advancements have laid the foundations for the use of artificial intelligence in medicine. The promise of machine learning (ML) lies in its potential ability to improve treatment decision making, predict adverse outcomes, and streamline the management of perioperative healthcare. In an increasingly consumer-focused health care model, unprecedented access to information may extend to patients using ChatGPT to gain insight into medical questions. The main objective of our study was to replicate a patient's internet search in order to assess the appropriateness of ChatGPT, a novel machine learning tool released in 2022 that provides dialogue responses to queries, in comparison to Google Web Search, the most widely used search engine in the United States today, as a resource for patients for online health information. For the 2 different search engines, we compared i) the most frequently asked questions (FAQs) associated with total knee arthroplasty (TKA) and total hip arthroplasty (THA) by question type and topic; ii) the answers to the most frequently asked questions; and iii) the FAQs yielding a numerical response.

METHODS

A Google web search was performed with the following search terms: "total knee replacement" and "total hip replacement." These terms were individually entered and the first 10 FAQs were extracted along with the source of the associated website for each question. The following statements were inputted into ChatGPT: 1) "Perform a google search with the search term 'total knee replacement' and record the 10 most FAQs related to the search term" as well as 2) "Perform a google search with the search term 'total hip replacement' and record the 10 most FAQs related to the search term." A Google web search was repeated with the same search terms to identify the first 10 FAQs that included a numerical response for both "total knee replacement" and "total hip replacement." These questions were then inputted into ChatGPT and the questions and answers were recorded.

RESULTS

Across all search terms, 5 of 20 (25%) questions were similar between the Google web search and ChatGPT. For the Google web search, 13 of 20 (65%) questions were provided by commercial websites. For ChatGPT, 15 of 20 (75%) questions were answered by government websites, the most frequent being PubMed. Among numerical questions, 11 of 20 (55%) FAQs yielded different responses between the Google web search and ChatGPT.

CONCLUSION

A comparison of the FAQs from a Google web search with the attempted replication by ChatGPT revealed heterogeneous questions and responses for both open and discrete questions. ChatGPT remains a trending potential resource for patients, but it needs further corroboration until its ability to provide credible information is verified and shown to be concordant with the goals of physician and patient alike.

Langenhuijsen LFS, Janse RJ, Venema E, Kent DM, van Diepen M, Dekker FW, Steyerberg EW, de Jong Y. Systematic metareview of prediction studies demonstrates stable trends in bias and low PROBAST inter-rater agreement. J Clin Epidemiol 2023;159:159-173. [PMID: 37142166 DOI: 10.1016/j.jclinepi.2023.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/30/2023] [Accepted: 04/25/2023] [Indexed: 05/06/2023]

Lans A, Kanbier LN, Bernstein DN, Groot OQ, Ogink PT, Tobert DG, Verlaan JJ, Schwab JH. Social determinants of health in prognostic machine learning models for orthopaedic outcomes: A systematic review. J Eval Clin Pract 2023;29:292-299. [PMID: 36099267 DOI: 10.1111/jep.13765] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/26/2022] [Revised: 08/22/2022] [Accepted: 08/27/2022] [Indexed: 11/26/2022]

Abstract

RATIONALE

Social determinants of health (SDOH) are being considered more frequently when providing orthopaedic care due to their impact on treatment outcomes. Simultaneously, prognostic machine learning (ML) models that facilitate clinical decision making have become popular tools in the field of orthopaedic surgery. When ML-driven tools are developed, it is important that the perpetuation of potential disparities is minimized. One approach is to consider SDOH during model development. To date, it remains unclear whether and how existing prognostic ML models for orthopaedic outcomes consider SDOH variables.

OBJECTIVE

To investigate whether prognostic ML models for orthopaedic surgery outcomes account for SDOH, and to what extent SDOH variables are included in the final models.

METHODS

A systematic search was conducted in PubMed, Embase and Cochrane for studies published up to 17 November 2020. Two reviewers independently extracted SDOH features using the PROGRESS+ framework (place of residence, race/ethnicity, occupation, gender/sex, religion, education, social capital, socioeconomic status, plus age, disability, and sexual orientation).

RESULTS

The search yielded 7138 studies, of which 59 met the inclusion criteria. Across all studies, 96% (57/59) considered at least one PROGRESS+ factor during development. The most common factors were age (95%; 56/59) and gender/sex (96%; 57/59). Differential effect analyses, such as subgroup analysis, covariate adjustment, and baseline comparison, were rarely reported (10%; 6/59). The majority of models included age (92%; 54/59) and gender/sex (69%; 41/59) as final input variables. However, factors such as insurance status (7%; 4/59), marital status (7%; 4/59) and income (3%; 2/59) were seldom included.

CONCLUSION

The current level of reporting and consideration of SDOH during the development of prognostic ML models for orthopaedic outcomes is limited. Healthcare providers should be critical of the models they consider using and knowledgeable regarding the quality of model development, such as adherence to recognized methodological standards. Future efforts should aim to avoid bias and disparities when developing ML-driven applications for orthopaedics.

Bhashyam AR, Challa ST, Thomas H, Rodriguez EK, Weaver MJ. Clinic follow-up of orthopaedic trauma patients during and after the post-surgical global period: a retrospective cohort study. BMC Musculoskelet Disord 2023;24:120. [PMID: 36782143 PMCID: PMC9926540 DOI: 10.1186/s12891-023-06218-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2022] [Accepted: 02/02/2023] [Indexed: 02/15/2023]

Lans A, Bales JR, Fourman MS, Borkhetaria PP, Verlaan JJ, Schwab JH. Health Literacy in Orthopedic Surgery: A Systematic Review. HSS J 2023;19:120-127. [PMID: 36776507 PMCID: PMC9837407 DOI: 10.1177/15563316221110536] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/06/2022] [Accepted: 05/20/2022] [Indexed: 02/14/2023]

Abstract

Background: Limited health literacy has been associated with adverse health outcomes. Undergoing orthopedic surgery often requires patients to make complex decisions and adhere to complicated instructions, suggesting that health literacy skills might have a profound impact on orthopedic surgery outcomes.

Purpose: We sought to review the literature for studies investigating the level of health literacy in patients undergoing orthopedic surgery and also to assess how those studies report factors affecting health equity.

Methods: We conducted a systematic search of PubMed, Embase, and Cochrane Library for all health literacy studies published in the orthopedic surgery literature up to February 8, 2022. Search terms included synonyms for health literacy and for all orthopedic surgery subspecialties. Two reviewers independently extracted study data in addition to indicators of equity reporting using the PROGRESS+ checklist (Place of Residence, Race/Ethnicity, Occupation, Gender/sex, Religion, Education, Social capital, Socioeconomic status, plus age, disability, and sexual orientation).

Results: The search resulted in 616 studies; 9 studies remained after exclusion criteria were applied. Most studies were of arthroplasty (4/9; 44%) or trauma (3/9; 33%) patients. Validated health literacy assessments were used in 4 of the included studies, and only 3 studies reported the rate of limited health literacy in the patients studied, which ranged between 34% and 38.5%. At least one PROGRESS+ item was reported in 88% (8/9) of the studies.

Conclusions: We found a paucity of appropriately designed studies that used validated measures of health literacy in the field of orthopedic surgery. The potential impact of health literacy on orthopedic patients and their outcomes has yet to be elucidated. Thoughtful, high-quality trials across diverse demographics and geographies are warranted.

Yen HK, Hu MH, Zijlstra H, Groot OQ, Hsieh HC, Yang JJ, Karhade AV, Chen PC, Chen YH, Huang PH, Chen YH, Xiao FR, Verlaan JJ, Schwab JH, Yang RS, Yang SH, Lin WH, Hsu FM. Prognostic significance of lab data and performance comparison by validating survival prediction models for patients with spinal metastases after radiotherapy. Radiother Oncol 2022;175:159-166. [PMID: 36067909 DOI: 10.1016/j.radonc.2022.08.029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/23/2022] [Revised: 07/14/2022] [Accepted: 08/28/2022] [Indexed: 12/17/2022]

Abstract

BACKGROUND AND PURPOSE

Well-performing survival prediction models (SPMs) help patients and healthcare professionals to choose treatment aligning with prognosis. This retrospective study aims to investigate the prognostic impact of laboratory data and to compare the performance of the Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy (METSSS) model, the New England Spinal Metastasis Score (NESMS), and the Skeletal Oncology Research Group machine learning algorithm (SORG-MLA) for spinal metastases (SM).

MATERIALS AND METHODS

From 2010 to 2018, patients who received radiotherapy (RT) for SM at a tertiary center were enrolled and the data were retrospectively collected. Multivariate logistic and Cox-proportional-hazard regression analyses were used to assess the association between laboratory values and survival. The area under receiver-operating characteristics curve (AUROC), calibration analysis, Brier score, and decision curve analysis were used to evaluate the performance of SPMs.

RESULTS

A total of 2786 patients were included for analysis. The 90-day and 1-year survival rates after RT were 70.4% and 35.7%, respectively. Higher albumin, hemoglobin, or lymphocyte count were associated with better survival, while higher alkaline phosphatase, white blood cell count, neutrophil count, neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, or international normalized ratio were associated with poor prognosis. SORG-MLA has the best discrimination (AUROC 90-day, 0.78; 1-year 0.76), best calibrations, and the lowest Brier score (90-day 0.16; 1-year 0.18). The decision curve of SORG-MLA is above the other two competing models with threshold probabilities from 0.1 to 0.8.

CONCLUSION

Laboratory data are of prognostic significance in survival prediction after RT for SM. The machine learning-based SORG-MLA outperforms the statistical regression-based METSSS model and NESMS in survival prediction.
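
Three of the performance measures named in the methods above (AUROC, Brier score, and the net benefit plotted in a decision curve) can be written out in a few lines. The sketch below uses the standard textbook definitions in plain Python; it is not the study's code, the function names are hypothetical, and the four toy predictions are invented for illustration.

```python
def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability (one point on a decision curve)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Toy data (invented): 1 = event occurred, 0 = event did not occur.
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.7]
print(auroc(outcomes, predicted))
print(brier_score(outcomes, predicted))
print(net_benefit(outcomes, predicted, 0.5))
```

A higher AUROC means better discrimination and a lower Brier score means better overall accuracy of the predicted probabilities. A decision curve repeats the net benefit calculation across a range of thresholds, such as the 0.1 to 0.8 range mentioned in the results.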

Affiliation(s)

Hung-Kuan Yen Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan; Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsinchu, Taiwan; Department of Medical Education, National Taiwan University Hospital, Hsin-Chu Branch, Hsinchu, Taiwan
Ming-Hsiao Hu Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Hester Zijlstra Department of Orthopaedics, University Medical Center Utrecht, Utrecht, Netherlands; Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, United States
Olivier Q Groot Department of Orthopaedics, University Medical Center Utrecht, Utrecht, Netherlands; Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, United States
Hsiang-Chieh Hsieh Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Hsinchu, Taiwan
Jiun-Jen Yang Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Aditya V Karhade Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, United States
Po-Chao Chen Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Yu-Han Chen Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Po-Hao Huang Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Yu-Hung Chen Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Fu-Ren Xiao Division of Neurosurgery, Department of Surgery, National Taiwan University Hospital, Taipei, Taiwan
Jorrit-Jan Verlaan Department of Orthopaedics, University Medical Center Utrecht, Utrecht, Netherlands
Joseph H Schwab Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, MA, United States
Rong-Sen Yang Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Shu-Hua Yang Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Wei-Hsin Lin Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan
Feng-Ming Hsu Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital, Taipei, Taiwan; Graduate Institute of Oncology, National Taiwan University College of Medicine, Taipei, Taiwan; Department of Radiation Oncology, National Taiwan University Cancer Center, Taipei, Taiwan

Pan Y, Zhang Q, Zhang Y, Ge X, Gao X, Yang S, Xu J. Lane-change intention prediction using eye-tracking technology: A systematic review. Appl Ergon 2022;103:103775. [PMID: 35500523 DOI: 10.1016/j.apergo.2022.103775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 03/16/2022] [Accepted: 04/12/2022] [Indexed: 06/14/2023]

Smith H, Sweeting M, Morris T, Crowther MJ. A scoping methodological review of simulation studies comparing statistical and machine learning approaches to risk prediction for time-to-event data. Diagn Progn Res 2022;6:10. [PMID: 35650647 PMCID: PMC9161606 DOI: 10.1186/s41512-022-00124-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/01/2021] [Accepted: 03/01/2022] [Indexed: 12/24/2022]

Abstract

BACKGROUND

There is substantial interest in the adaptation and application of so-called machine learning approaches to prognostic modelling of censored time-to-event data. These methods must be compared and evaluated against existing methods in a variety of scenarios to determine their predictive performance. A scoping review of how machine learning methods have been compared to traditional survival models is important to identify the comparisons that have been made and issues where they are lacking, biased towards one approach or misleading.

METHODS

We conducted a scoping review of research articles published between 1 January 2000 and 2 December 2020 using PubMed. Eligible articles were those that used simulation studies to compare statistical and machine learning methods for risk prediction with a time-to-event outcome in a medical/healthcare setting. We focus on data-generating mechanisms (DGMs), the methods that have been compared, the estimands of the simulation studies, and the performance measures used to evaluate them.

RESULTS

A total of ten articles were identified as eligible for the review. Six of the articles evaluated a method that was developed by the authors, four of which were machine learning methods, and the results almost always stated that this developed method's performance was equivalent to or better than the other methods compared. Comparisons were often biased towards the novel approach, with the majority only comparing against a basic Cox proportional hazards model, and in scenarios where it is clear it would not perform well. In many of the articles reviewed, key information was unclear, such as the number of simulation repetitions and how performance measures were calculated.

CONCLUSION

It is vital that method comparisons are unbiased and comprehensive, and this should be the goal even if realising it is difficult. Fully assessing how newly developed methods perform and how they compare to a variety of traditional statistical methods for prognostic modelling is imperative as these methods are already being applied in clinical contexts. Evaluations of the performance and usefulness of recently developed methods for risk prediction should be continued and reporting standards improved as these methods become increasingly popular.

Polce EM, Kunze KN, Dooley MS, Piuzzi NS, Boettner F, Sculco PK. Efficacy and Applications of Artificial Intelligence and Machine Learning Analyses in Total Joint Arthroplasty: A Call for Improved Reporting. J Bone Joint Surg Am 2022;104:821-832. [PMID: 35045061 DOI: 10.2106/jbjs.21.00717] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 02/01/2023]

Abstract

BACKGROUND

There has been a considerable increase in total joint arthroplasty (TJA) research using machine learning (ML). Therefore, the purposes of this study were to synthesize the applications and efficacies of ML reported in the TJA literature, and to assess the methodological quality of these studies.

METHODS

PubMed, OVID/MEDLINE, and Cochrane libraries were queried in January 2021 for articles regarding the use of ML in TJA. Study demographics, topic, primary and secondary outcomes, ML model development and testing, and model presentation and validation were recorded. The TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines were used to assess the methodological quality.

RESULTS

Fifty-five studies were identified: 31 investigated clinical outcomes and resource utilization; 11, activity and motion surveillance; 10, imaging detection; and 3, natural language processing. For studies reporting the area under the receiver operating characteristic curve (AUC), the median AUC (and range) was 0.80 (0.60 to 0.97) among 26 clinical outcome studies, 0.99 (0.83 to 1.00) among 6 imaging-based studies, and 0.88 (0.76 to 0.98) among 3 activity and motion surveillance studies. Twelve studies compared ML to logistic regression, with 9 (75%) reporting that ML was superior. The average number of TRIPOD guidelines met was 11.5 (range: 5 to 18), with 38 (69%) meeting greater than half of the criteria. Presentation and explanation of the full model for individual predictions and assessments of model calibration were poorly reported (<30%).

CONCLUSIONS

The performance of ML models was good to excellent when applied to a wide variety of clinically relevant outcomes in TJA. However, reporting of certain key methodological and model presentation criteria was inadequate. Despite the recent surge in TJA literature utilizing ML, the lack of consistent adherence to reporting guidelines needs to be addressed to bridge the gap between model development and clinical implementation.

Devana SK, Shah AA, Lee C, Gudapati V, Jensen AR, Cheung E, Solorzano C, van der Schaar M, SooHoo NF. Development of a Machine Learning Algorithm for Prediction of Complications and Unplanned Readmission Following Reverse Total Shoulder Arthroplasty. J Shoulder Elb Arthroplast 2022;5:24715492211038172. [PMID: 35330785 PMCID: PMC8938598 DOI: 10.1177/24715492211038172] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/07/2021] [Revised: 06/21/2021] [Accepted: 07/20/2021] [Indexed: 11/22/2022]

Huang AW, Haslberger M, Coulibaly N, Galárraga O, Oganisian A, Belbasis L, Panagiotou OA. Multivariable prediction models for health care spending using machine learning: a protocol of a systematic review. Diagn Progn Res 2022;6:4. [PMID: 35321760 PMCID: PMC8943988 DOI: 10.1186/s41512-022-00119-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Accepted: 01/18/2022] [Indexed: 12/11/2022] Open

Abstract

BACKGROUND

With rising cost pressures on health care systems, machine-learning (ML)-based algorithms are increasingly used to predict health care costs. Despite their potential advantages, the successful implementation of these methods could be undermined by biases introduced in the design, conduct, or analysis of studies seeking to develop and/or validate ML models. The utility of such models may also be negatively affected by poor reporting of these studies. In this systematic review, we aim to evaluate the reporting quality, methodological characteristics, and risk of bias of ML-based prediction models for individual-level health care spending.

METHODS

We will systematically search PubMed and Embase to identify studies developing, updating, or validating ML-based models to predict an individual's health care spending for any medical condition, over any time period, and in any setting. We will exclude prediction models of aggregate-level health care spending, models used to infer causality, models using radiomics or speech parameters, models of non-clinically validated predictors (e.g., genomics), and cost-effectiveness analyses without predicting individual-level health care spending. We will extract data based on the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies (CHARMS), previously published research, and relevant recommendations. We will assess the adherence of ML-based studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and examine the inclusion of transparency and reproducibility indicators (e.g., statements on data sharing). To assess the risk of bias, we will apply the Prediction model Risk Of Bias Assessment Tool (PROBAST). Findings will be stratified by study design, ML methods used, population characteristics, and medical field.

DISCUSSION

Our systematic review will appraise the quality, reporting, and risk of bias of ML-based models for individualized health care cost prediction. This review will provide an overview of the available models and give insights into the strengths and limitations of using ML methods for the prediction of health spending.

Body Composition Predictors of Adverse Postoperative Events in Patients Undergoing Surgery for Long Bone Metastases. J Am Acad Orthop Surg Glob Res Rev 2022;6:01979360-202203000-00010. [PMID: 35262530 PMCID: PMC8913089 DOI: 10.5435/jaaosglobal-d-22-00001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2021] [Accepted: 01/03/2022] [Indexed: 11/23/2022]

Guzman-Vilca WC, Castillo-Cara M, Carrillo-Larco RM. Development, validation and application of a machine learning model to estimate salt consumption in 54 countries. eLife 2022;11:72930. [PMID: 34984979 PMCID: PMC8789317 DOI: 10.7554/elife.72930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2021] [Accepted: 12/15/2021] [Indexed: 11/13/2022] Open

Groot OQ, Bindels BJJ, Ogink PT, Kapoor ND, Twining PK, Collins AK, Bongers MER, Lans A, Oosterhoff JHF, Karhade AV, Verlaan JJ, Schwab JH. Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review. Acta Orthop 2021;92:385-393. [PMID: 33870837 PMCID: PMC8436968 DOI: 10.1080/17453674.2021.1910448] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/31/2023] Open