1
Zabat MA, Giakas AM, Hohmann AL, Lonner JH. Interpreting the Current Literature on Outcomes of Robotic-Assisted Versus Conventional Total Knee Arthroplasty Using Fragility Analysis: A Systematic Review and Cross-Sectional Study of Randomized Controlled Trials. J Arthroplasty 2024; 39:1882-1887. [PMID: 38309638 DOI: 10.1016/j.arth.2024.01.044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 01/18/2024] [Accepted: 01/24/2024] [Indexed: 02/05/2024]
Abstract
BACKGROUND Fragility analysis is a method of further characterizing outcomes in terms of the stability of statistical findings. This study assesses the statistical fragility of recent randomized controlled trials (RCTs) evaluating robotic-assisted versus conventional total knee arthroplasty (RA-TKA versus C-TKA). METHODS We queried PubMed for RCTs comparing alignment, function, and outcomes between RA-TKA and C-TKA. Fragility index (FI) and reverse fragility index (RFI) (collectively, "FI") were calculated for dichotomous outcomes as the number of outcome reversals needed to change statistical significance. Fragility quotient (FQ) was calculated by dividing the FI by the sample size for that outcome event. Median FI and FQ were calculated for all outcomes collectively as well as for each individual outcome. Subanalyses were performed to assess FI and FQ based on outcome event type and statistical significance, as well as study loss to follow-up and year of publication. RESULTS The overall median FI was 3.0 (interquartile range, [IQR] 1.0 to 6.3) and the median reverse fragility index was 3.0 (IQR 2.0 to 4.0). The overall median FQ was 0.027 (IQR 0.012 to 0.050). Loss to follow-up was greater than FI for 23 of the 38 outcomes assessed. CONCLUSIONS A small number of alternative outcomes is often enough to reverse the statistical significance of findings in RCTs evaluating dichotomous outcomes in RA-TKA versus C-TKA. We recommend reporting FI and FQ alongside P values to improve the interpretability of RCT results.
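The FI and FQ definitions in this abstract can be illustrated with a minimal sketch. The abstract does not name the significance test, so a two-sided Fisher exact test at alpha = 0.05 is assumed here, along with the common convention of reversing outcomes in the arm with the lower event rate. The function names and the 0/50 versus 10/50 trial are hypothetical.

```python
from math import comb


def fisher_exact_p(e1, n1, e2, n2):
    """Two-sided Fisher exact p-value for e1/n1 events versus e2/n2 events."""
    total_events = e1 + e2
    denom = comb(n1 + n2, total_events)

    def prob(x):
        # Hypergeometric probability of x events in arm 1 with margins fixed.
        return comb(n1, x) * comb(n2, total_events - x) / denom

    p_obs = prob(e1)
    lo, hi = max(0, total_events - n2), min(n1, total_events)
    # Sum every table that is no more likely than the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Count non-event -> event reversals in the lower-rate arm needed to
    lift p to alpha or above. Returns 0 if the result is already non-significant."""
    if e1 * n2 > e2 * n1:  # relabel so that arm 1 has the lower event rate
        e1, n1, e2, n2 = e2, n2, e1, n1
    flips = 0
    # Assumed convention: stop as soon as significance is lost
    # (or the arm has no non-events left to reverse).
    while fisher_exact_p(e1, n1, e2, n2) < alpha and e1 < n1:
        e1 += 1
        flips += 1
    return flips


# Hypothetical trial: 0/50 events versus 10/50 events.
fi = fragility_index(0, 50, 10, 50)
fq = fi / (50 + 50)  # fragility quotient: FI divided by the sample size
print(fi, fq)
```

Dividing by the total sample size makes the FQ comparable across trials of different sizes, which is why the abstract reports both figures.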
Affiliation(s)
- Michelle A Zabat
- Department of Orthopaedic Surgery, NYU Langone Orthopaedic Hospital, New York, New York
- Alec M Giakas
- Rothman Orthopaedic Institute at Thomas Jefferson University, Philadelphia, Pennsylvania
- Alexandra L Hohmann
- Rothman Orthopaedic Institute at Thomas Jefferson University, Philadelphia, Pennsylvania
- Jess H Lonner
- Rothman Orthopaedic Institute at Thomas Jefferson University, Philadelphia, Pennsylvania
2
Dennstädt F, Zink J, Putora PM, Hastings J, Cihoric N. Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain. Syst Rev 2024; 13:158. [PMID: 38879534 PMCID: PMC11180407 DOI: 10.1186/s13643-024-02575-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 05/30/2024] [Indexed: 06/19/2024]
Abstract
BACKGROUND Systematically screening published literature to determine the relevant publications to synthesize in a review is a time-consuming and difficult task. Large language models (LLMs) are an emerging technology with promising capabilities for the automation of language-related tasks that may be useful for such a purpose. METHODS LLMs were used as part of an automated system to evaluate the relevance of publications to a certain topic based on defined criteria and based on the title and abstract of each publication. A Python script was created to generate structured prompts consisting of text strings for instruction, title, abstract, and relevant criteria to be provided to an LLM. The relevance of a publication was evaluated by the LLM on a Likert scale (low relevance to high relevance). By specifying a threshold, different classifiers for inclusion/exclusion of publications could then be defined. The approach was used with four different openly available LLMs on ten published data sets of biomedical literature reviews and on a newly human-created data set for a hypothetical new systematic literature review. RESULTS The performance of the classifiers varied depending on the LLM being used and on the data set analyzed. Regarding sensitivity/specificity, the classifiers yielded 94.48%/31.78% for the FlanT5 model, 97.58%/19.12% for the OpenHermes-NeuralChat model, 81.93%/75.19% for the Mixtral model and 97.58%/38.34% for the Platypus 2 model on the ten published data sets. The same classifiers yielded 100% sensitivity at a specificity of 12.58%, 4.54%, 62.47%, and 24.74% on the newly created data set. Changing the standard settings of the approach (minor adaption of instruction prompt and/or changing the range of the Likert scale from 1-5 to 1-10) had a considerable impact on the performance. 
CONCLUSIONS LLMs can be used to evaluate the relevance of scientific publications to a certain review topic and classifiers based on such an approach show some promising results. To date, little is known about how well such systems would perform if used prospectively when conducting systematic literature reviews and what further implications this might have. However, it is likely that in the future researchers will increasingly use LLMs for evaluating and classifying scientific publications.
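The thresholding step described in the methods (a Likert relevance score turned into an include/exclude decision) can be sketched as follows. This is not the authors' script: the function names, scores, and labels are invented for illustration, and the LLM call that would produce the scores is left out.

```python
def classify(likert_scores, threshold):
    """Include a publication when its relevance score meets the threshold."""
    return [score >= threshold for score in likert_scores]


def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) of include decisions against
    human reference labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    positives = sum(actual)
    negatives = len(actual) - positives
    sens = tp / positives if positives else float("nan")
    spec = tn / negatives if negatives else float("nan")
    return sens, spec


# Hypothetical 1-5 Likert scores and human relevance labels for six abstracts.
scores = [5, 4, 2, 1, 3, 2]
relevant = [True, True, False, False, True, False]

# Each threshold defines a different classifier over the same scores.
results = {t: sensitivity_specificity(classify(scores, t), relevant) for t in (2, 3, 4)}
for threshold, (sens, spec) in results.items():
    print(threshold, round(sens, 2), round(spec, 2))
```

Lowering the threshold trades specificity for sensitivity, the same trade-off visible in the per-model figures reported in the results.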
Affiliation(s)
- Fabio Dennstädt
- Department of Radiation Oncology, Cantonal Hospital of St. Gallen, St. Gallen, Switzerland.
- Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
- Johannes Zink
- Institute for Computer Science, University of Würzburg, Würzburg, Germany
- Paul Martin Putora
- Department of Radiation Oncology, Cantonal Hospital of St. Gallen, St. Gallen, Switzerland
- Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
- Janna Hastings
- Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
- School of Medicine, University of St. Gallen, St. Gallen, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nikola Cihoric
- Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
3
Meade M, Buchan L, Stark M, Woods B. Evidence-Based Medicine and Observational Studies. Clin Spine Surg 2024; 37:242-244. [PMID: 37941105 DOI: 10.1097/bsd.0000000000001550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Accepted: 10/03/2023] [Indexed: 11/10/2023]
Abstract
Evidence-based medicine drives medical decision-making in the modern era and has historically favored randomized controlled trials. Despite their prominence, randomized controlled trials have multiple disadvantages when applied to spinal surgery. Observational studies are popular in the spinal surgery literature and take various forms, such as retrospective studies and prospective cohort studies. For researchers, learners, and practicing spine surgeons, this paper describes options for study design when applied to spinal surgery.
Affiliation(s)
- Matthew Meade
- Division of Orthopaedic Surgery, Jefferson Health, Stratford, NJ
- Levi Buchan
- Division of Orthopaedic Surgery, Jefferson Health, Stratford, NJ
- Michael Stark
- Division of Orthopaedic Surgery, Jefferson Health, Stratford, NJ
- Barrett Woods
- The Rothman Institute at Thomas Jefferson University, Philadelphia, PA
4
Brown AN, Yendluri A, Lawrence KW, Cordero JK, Moucha CS, Hayden BL, Parisien RL. The Statistical Fragility of Tranexamic Acid Use in the Orthopaedic Surgery Literature: A Systematic Review of Randomized Controlled Trials. J Am Acad Orthop Surg 2024; 32:508-515. [PMID: 38574390 DOI: 10.5435/jaaos-d-23-00503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 02/15/2024] [Indexed: 04/06/2024]
Abstract
INTRODUCTION Randomized controlled trials (RCTs) represent the highest level of evidence in orthopaedic surgery literature, although the robustness of statistical findings in these trials may be unreliable. We used the fragility index (FI), reverse fragility index (rFI), and fragility quotient (FQ) to evaluate the statistical stability of outcomes reported in RCTs that assess the use of tranexamic acid (TXA) across orthopaedic subspecialties. METHODS PubMed, EMBASE, and MEDLINE were queried for RCTs (2010-present) reporting dichotomous outcomes with study groups stratified by TXA administration. The FI and rFI were defined as the number of outcome event reversals needed to alter the significance level of significant and nonsignificant outcomes, respectively. FQ was determined by dividing the FI or rFI by sample size. Subgroup analyses were conducted based on orthopaedic subspecialty. RESULTS Six hundred five RCTs were screened with 108 studies included for analysis comprising 192 total outcomes. The median FI of the 192 outcomes was 4 (IQR 2 to 5) with an associated FQ of 0.03 (IQR 0.019 to 0.050). 45 outcomes were reported as statistically significant with a median FI of 1 (IQR 1 to 5) and associated FQ of 0.02 (IQR 0.011 to 0.034). 147 outcomes were reported as nonsignificant with a median rFI of 4 (IQR 3 to 5) and associated FQ of 0.04 (IQR 0.023 to 0.051). The adult reconstruction, trauma, and spine subspecialties had a median FI of 4. Sports had a median FI of 3. Shoulder and elbow and foot and ankle had median FIs of 6. DISCUSSION Statistical outcomes reported in RCTs on the use of TXA in orthopaedic surgery are fragile. Reversal of a few outcomes is sufficient to alter statistical significance. We recommend reporting FI, rFI, and FQ metrics to aid in interpreting the outcomes reported in comparative trials.
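The rFI definition given in the methods (reversals needed to make a nonsignificant outcome significant) can be sketched as below. The abstract does not state which test or reversal convention was used. This sketch assumes a two-sided Fisher exact test at alpha = 0.05 and converts events to non-events in the arm with the lower event rate. All names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_p(e1, n1, e2, n2):
    """Two-sided Fisher exact p-value for e1/n1 events versus e2/n2 events."""
    total_events = e1 + e2
    denom = comb(n1 + n2, total_events)

    def prob(x):
        # Hypergeometric probability of x events in arm 1 with margins fixed.
        return comb(n1, x) * comb(n2, total_events - x) / denom

    p_obs = prob(e1)
    lo, hi = max(0, total_events - n2), min(n1, total_events)
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def reverse_fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Count event -> non-event reversals in the lower-rate arm needed to
    push p below alpha. Returns 0 if the result is already significant and
    None if the arm runs out of events first (an assumed convention)."""
    if e1 * n2 > e2 * n1:  # relabel so that arm 1 has the lower event rate
        e1, n1, e2, n2 = e2, n2, e1, n1
    flips = 0
    while fisher_exact_p(e1, n1, e2, n2) >= alpha:
        if e1 == 0:
            return None
        e1 -= 1
        flips += 1
    return flips


# Hypothetical nonsignificant outcome: 3/50 events versus 10/50 events.
rfi = reverse_fragility_index(3, 50, 10, 50)
fq = rfi / (50 + 50)  # quotient: rFI divided by the sample size
print(rfi, fq)
```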
Affiliation(s)
- Ashley N Brown
- From the Icahn School of Medicine at Mount Sinai, New York, NY (Brown, Yendluri, Cordero, Moucha, Hayden, Parisien), and the Boston University School of Medicine, Boston, MA (Lawrence)
5
Proal JD, Moon AS, Kwon B. The fragility index and reverse fragility index of FDA investigational device exemption trials in spinal fusion surgery: a systematic review. Eur Spine J 2024:10.1007/s00586-024-08317-3. [PMID: 38802596 DOI: 10.1007/s00586-024-08317-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/20/2024] [Accepted: 05/16/2024] [Indexed: 05/29/2024]
Abstract
PURPOSE FDA investigational device exemption (IDE) studies are considered a gold standard for assessing the safety and efficacy of novel devices through RCTs. The fragility index (FI) has emerged as a means to assess the robustness of statistically significant study results and, inversely, the reverse fragility index (RFI) for non-significant differences. Previous authors have defined results as fragile if loss to follow-up is greater than the FI or RFI. The aim of this study was to assess the FI, RFI, and robustness of data supplied by IDE studies in spinal surgery. METHODS This was a systematic review of the literature. Inclusion criteria included randomized controlled trials with dichotomous outcome measures conducted under IDE guidelines between 2000 and 2023. FI and RFI were calculated by successively changing events to non-events until the outcome changed to non-significance or significance, respectively. The fragility quotient (FQ) and reverse fragility quotient (RFQ) were calculated by dividing the FI and RFI, respectively, by the sample size. RESULTS Thirty-two studies met inclusion criteria with a total of 40 unique outcome measures; 240 outcomes were analyzed. Twenty-six studies reported 96 statistically significant results. The median FI was 6 (IQR: 3-9.25), and the number of patients lost to follow-up was greater than the FI in 99.0% (95/96) of results. The average FQ was 0.027. Thirty studies reported 144 statistically insignificant results and a median RFI of 6 (IQR: 4-8). The average RFQ extrapolated was 0.021, and loss to follow-up was greater than the RFI in 98.6% (142/144) of results. CONCLUSIONS IDE studies in spine surgery are surprisingly fragile given their reputations, large sample sizes, and intent to establish safety in investigational devices. This study found a median FI and RFI of 6. The number of patients lost to follow-up was greater than the FI and RFI in 98.8% (237/240) of reported outcomes. FQ and RFQ tell us that changes of two to three patients per hundred can flip the significance of reported outcomes. This is an important reminder of the limitations of RCTs. Analysis of fragility in future studies may help clarify the strength of the relationship between reported data and their conclusions.
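The "two to three patients per hundred" reading follows directly from the reported quotients. As a short worked step, using the abstract's definitions with n denoting the sample size (the symbol n is introduced here only for illustration):

```latex
\[
\mathrm{FQ} = \frac{\mathrm{FI}}{n}, \qquad \mathrm{RFQ} = \frac{\mathrm{RFI}}{n}
\]
\[
0.027 \times 100 = 2.7, \qquad 0.021 \times 100 = 2.1
\]
```

That is, roughly 2.7 outcome reversals per 100 patients for significant results and 2.1 per 100 for non-significant ones.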
Affiliation(s)
- Joshua D Proal
- Tufts University School of Medicine, 145 Harrison Ave, Boston, MA, 02111, USA.
- Andrew S Moon
- Department of Orthopedic Surgery, Tufts Medical Center, Tufts University School of Medicine, 800 Washington St, Tufts MC Box #306, Boston, MA, 02111, USA
- Brian Kwon
- New England Baptist Hospital, Department of Orthopaedic Surgery, 125 Parker Hill Ave, Boston, MA, 02120, USA
6
Oeding JF, Krych AJ, Camp CL, Varady NH. The Number of Patients Lost to Follow-up May Exceed the Fragility Index of a Randomized Controlled Trial Without Reversing Statistical Significance: A Systematic Review and Statistical Model. Arthroscopy 2024:S0749-8063(24)00366-9. [PMID: 38777001 DOI: 10.1016/j.arthro.2024.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 04/21/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
PURPOSE To 1) analyze trends in the publishing of statistical fragility index (FI)-based systematic reviews in the orthopaedic literature, including the prevalence of misleading or inaccurate statements related to the statistical fragility of randomized controlled trials (RCTs) and patients lost to follow-up (LTF) and 2) determine whether RCTs with relatively "low" FIs are truly as sensitive to patients LTF as previously portrayed in the literature. METHODS All FI-based studies published in the orthopaedic literature were identified using the Cochrane Database of Systematic Reviews, Web of Science Core Collection, PubMed, and MEDLINE databases. All articles involving application of the FI or reverse FI (RFI) to study the statistical fragility of studies in orthopaedics were eligible for inclusion in the study. Study characteristics, median FIs and sample sizes, and misleading or inaccurate statements related to the FI and patients LTF were recorded. Misleading or inaccurate statements were defined as those basing conclusions of trial fragility on the false assumption that adding patients LTF back to a trial has the same statistical effect as existing patients in a trial experiencing the opposite outcome and were determined by two authors. A theoretical RCT with a sample size of 100, p-value of 0.006, and an FI of 4 was used to evaluate the difference in effect on statistical significance between flipping outcome events of patients already included in the trial (the FI) vs. adding patients LTF back to the trial to demonstrate the true sensitivity of RCTs to patients LTF. RESULTS Of the 39 FI-based studies, 37 (95%) directly compared the FI to the number of patients lost to follow-up. Of these, 22 (59%) included a statement regarding the FI and patients LTF that was determined to be inaccurate or misleading. 
In the theoretical RCT, a reversal of significance was not observed until 7 patients LTF (nearly twice the FI) were added to the trial in the distribution of maximal significance reversal. CONCLUSIONS The claim that any RCT in which the number of patients LTF exceeds the FI could potentially have its significance reversed simply by maintaining study follow-ups is commonly inaccurate and prevalent in orthopaedic studies applying the FI. Patients LTF and the FI are not equivalent. The minimum number of patients LTF required to flip the significance of a typical RCT was demonstrated to be greater than the FI, suggesting RCTs with relatively "low" FIs may not be as sensitive to patients LTF as previously portrayed in the literature; however, only a holistic approach that considers the context in which the trial was conducted, potential biases, and study results can determine the merits of any particular RCT.
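The distinction drawn here, between reversing outcomes of enrolled patients (the FI) and adding lost patients back to the trial, can be explored with a small sketch. It does not reproduce the authors' statistical model or their theoretical trial. It assumes a two-sided Fisher exact test at alpha = 0.05 and reads "maximal significance reversal" as: every patient added to the lower-rate arm has the event, and every patient added to the higher-rate arm does not. The function names and the 0/50 versus 10/50 trial are hypothetical.

```python
from math import comb


def fisher_exact_p(e1, n1, e2, n2):
    """Two-sided Fisher exact p-value for e1/n1 events versus e2/n2 events."""
    total_events = e1 + e2
    denom = comb(n1 + n2, total_events)

    def prob(x):
        return comb(n1, x) * comb(n2, total_events - x) / denom

    p_obs = prob(e1)
    lo, hi = max(0, total_events - n2), min(n1, total_events)
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Reversals of enrolled patients (non-event -> event, lower-rate arm)."""
    if e1 * n2 > e2 * n1:
        e1, n1, e2, n2 = e2, n2, e1, n1
    flips = 0
    while fisher_exact_p(e1, n1, e2, n2) < alpha and e1 < n1:
        e1 += 1
        flips += 1
    return flips


def min_ltf_to_reverse(e1, n1, e2, n2, alpha=0.05, max_added=200):
    """Smallest number of ADDED patients that lifts p to alpha or above under
    the most unfavorable split described in the lead-in. Arm sizes grow,
    unlike in the FI. Returns None if no split within max_added works."""
    if e1 * n2 > e2 * n1:
        e1, n1, e2, n2 = e2, n2, e1, n1
    for k in range(max_added + 1):
        for i in range(k + 1):  # i join the lower-rate arm, k - i join the other
            if fisher_exact_p(e1 + i, n1 + i, e2, n2 + (k - i)) >= alpha:
                return k
    return None


# Hypothetical trial: 0/50 events versus 10/50 events.
fi = fragility_index(0, 50, 10, 50)
ltf = min_ltf_to_reverse(0, 50, 10, 50)
print(fi, ltf)
```

Because an added patient enlarges the arm instead of replacing an existing non-event, it moves the event rate less than a reversal does. The abstract's conclusion that patients LTF and the FI are not equivalent rests on the same asymmetry.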
Affiliation(s)
- Jacob F Oeding
- School of Medicine, Mayo Clinic Alix School of Medicine, Rochester, MN, USA; Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
- Aaron J Krych
- Department of Orthopaedic Surgery, Mayo Clinic, Rochester, MN, USA
- Nathan H Varady
- Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, NY, USA
7
Suresh NV, Go BC, Fritz CG, Harris J, Ahluwalia V, Xu K, Lu J, Rajasekaran K. The fragility index: how robust are the outcomes of head and neck cancer randomised, controlled trials? J Laryngol Otol 2024; 138:451-456. [PMID: 37795709 PMCID: PMC10950446 DOI: 10.1017/s0022215123001755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 08/12/2023] [Accepted: 08/29/2023] [Indexed: 10/06/2023]
Abstract
BACKGROUND The fragility index represents the minimum number of patients required to convert an outcome from statistically significant to insignificant. This report assesses the fragility index of head and neck cancer randomised, controlled trials. METHODS Studies were extracted from PubMed/Medline, Scopus, Embase and Cochrane databases. RESULTS Overall, 123 randomised, controlled trials were included. The sample size and fragility index medians (interquartile ranges) were 103 (56-213) and 2 (0-5), respectively. The fragility index exceeded the number of patients lost to follow up in 42.3 per cent (n = 52) of studies. A higher fragility index correlated with higher sample size (r = 0.514, p < 0.001), number of events (r = 0.449, p < 0.001) and statistical significance via p-value (r = -0.367, p < 0.001). CONCLUSION Head and neck cancer randomised, controlled trials demonstrated low fragility index values, in which statistically significant results could be nullified by altering the outcomes of just two patients, on average. Future head and neck oncology randomised, controlled trials should report the fragility index in order to provide insight into statistical robustness.
Affiliation(s)
- Neeraj V Suresh
- Department of Otorhinolaryngology – Head and Neck Surgery, University of Pennsylvania, Philadelphia, PA, USA
- Department of Otolaryngology – Head and Neck Surgery, Yale University, New Haven, CT, USA
- Beatrice C Go
- Department of Otorhinolaryngology – Head and Neck Surgery, University of Pennsylvania, Philadelphia, PA, USA
- Christian G Fritz
- Department of Otorhinolaryngology – Head and Neck Surgery, University of Pennsylvania, Philadelphia, PA, USA
- Jacob Harris
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Vinayak Ahluwalia
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Katherine Xu
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Joseph Lu
- Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA, USA
- Karthik Rajasekaran
- Department of Otorhinolaryngology – Head and Neck Surgery, University of Pennsylvania, Philadelphia, PA, USA
- Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA, USA
8
Stern BZ, Poeran J. Statistics in Brief: The Fragility Index. Clin Orthop Relat Res 2023; 481:1288-1291. [PMID: 36862056 PMCID: PMC10263243 DOI: 10.1097/corr.0000000000002622] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/19/2022] [Accepted: 02/10/2023] [Indexed: 03/03/2023]
Affiliation(s)
- Brocha Z. Stern
- Leni and Peter W. May Department of Orthopaedics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Healthcare Delivery Science, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jashvant Poeran
- Leni and Peter W. May Department of Orthopaedics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Healthcare Delivery Science, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
9
Geisler FH, Moghaddamjou A, Wilson JRF, Fehlings MG. Methylprednisolone in acute traumatic spinal cord injury: case-matched outcomes from the NASCIS2 and Sygen historical spinal cord injury studies with contemporary statistical analysis. J Neurosurg Spine 2023; 38:595-606. [PMID: 36640098 DOI: 10.3171/2022.12.spine22713] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/22/2022] [Accepted: 12/12/2022] [Indexed: 01/15/2023]
Abstract
OBJECTIVE Methylprednisolone (MP) to treat acute traumatic spinal cord injury (ATSCI) remains controversial since the release of the second National Acute Spinal Cord Injury Study (NASCIS2) in 1990. As two historical studies, NASCIS2 and Sygen in ATSCI, used identical MP dosages, it was possible to construct a new case-level pooled ATSCI data set satisfying contemporary criteria and able to clarify the effect of MP. METHODS The new pooled data set was first modernized by excluding patients with injury levels caudal to T10, lower-extremity American Spinal Injury Association (ASIA) motor scores (LEMSs) ≥ 46, Glasgow Coma Scale scores ≤ 11, and age < 15 or > 75 years, and then standardized to the ASIA grading and scoring format. A new updated NASCIS2 data set from this pooled data set contained 31.6% fewer patients than the 1990 NASCIS2 data set. RESULTS In the new pooled data set, recovery of LEMSs from baseline to 26 weeks, the primary outcome variable, was separated statistically into five different injury severity cohorts (p < 0.0001). The severity cohorts contained groups with severe floor (62.9%) and ceiling (10.7%) effects, which do not contribute to drug effects. The new NASCIS2 data set duplicated the p value for MP versus placebo in the sub-subgroup analysis of MP initiated ≤ 8 hours (the subgroup) and recovery of motor function on only the right side of the body (a further subgroup within the ≤ 8-hour subgroup), presented as the positive MP effect in the original NASCIS2 reporting. However, current statistical interpretation considers results seen only in post hoc sub-subgroups, without multi-test corrections, to be random effects without clinical significance. The combined case-level pooled data set from the NASCIS2 and Sygen studies increased the MP group from 106 to 431 patients, creating a new MP combined group. 
This new data set served as a surrogate for a contemporary MP study and found that administration of MP did not enhance ASIA motor score improvement in the lower extremities at 26 weeks. Secondary analysis of descending ASIA motor and sensory cervical neurological levels in cervical ATSCI patients at 26 weeks also found no MP drug effect. CONCLUSIONS Analysis of both the new updated NASCIS2 data set and the new case-matched pooled data set from two historical ATSCI studies revealed that administration of MP after spinal cord injury did not demonstrate any enhancement in neurological recovery at 26 weeks. The results of this analysis warrant review by clinical guideline groups.
Affiliation(s)
- Fred H Geisler
- Department of Medical Imaging, College of Medicine at the University of Saskatchewan, Saskatoon, Saskatchewan
- Ali Moghaddamjou
- Division of Neurosurgery, Department of Surgery, University of Toronto and Spinal Program, Toronto Western Hospital, University Health Network, Toronto, Ontario, Canada
- Jamie R F Wilson
- Department of Neurosurgery, College of Medicine, University of Nebraska Medical Center, Omaha, Nebraska
- Michael G Fehlings
- Division of Neurosurgery, Department of Surgery, University of Toronto and Spinal Program, Toronto Western Hospital, University Health Network, Toronto, Ontario, Canada
10
Lee Y, Samarasinghe Y, Chen LH, Jong A, Hapugall A, Javidan A, McKechnie T, Doumouras A, Hong D. Fragility of statistically significant findings from randomized trials in comparing laparoscopic versus robotic abdominopelvic surgeries. Surg Endosc 2023:10.1007/s00464-023-10063-4. [PMID: 37095233 DOI: 10.1007/s00464-023-10063-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 04/01/2023] [Indexed: 04/26/2023]
Abstract
BACKGROUND The utility of the robotic over the laparoscopic approach has been an area of debate across all surgical specialties over the past decade. The fragility index (FI) is a metric that evaluates the fragility of randomized controlled trial (RCT) findings by altering the status of patients from an event to non-event until significance is lost. This study aims to evaluate the robustness of RCTs comparing laparoscopic and robotic abdominopelvic surgeries through the FI. METHODS A search was conducted in MEDLINE and EMBASE for RCTs with dichotomous outcomes comparing laparoscopic and robot-assisted surgery in general surgery, gynecology, and urology. The FI and reverse fragility index (RFI) metrics were used to assess the strength of findings reported by RCTs, and bivariate correlation was conducted to analyze relationships between FI and trial characteristics. RESULTS A total of 21 RCTs were included, with a median sample size of 89 participants (interquartile range [IQR] 62-126). The median FI was 2 (IQR 0-15) and median RFI 5.5 (IQR 4-8.5). The median FI was 3 (IQR 1-15) for general surgery (n = 7), 2 (IQR 0.5-3.5) for gynecology (n = 4), and 0 (IQR 0-8.5) for urology RCTs (n = 4). Correlation was found between increasing FI and decreasing p-value, but not sample size, number of outcome events, journal impact factor, loss to follow-up, or risk of bias. CONCLUSION RCTs comparing laparoscopic and robotic abdominal surgery did not prove to be very robust. While possible advantages of robotic surgery may be emphasized, it remains novel and requires further concrete RCT data.
Affiliation(s)
- Yung Lee
- Division of General Surgery, McMaster University, Hamilton, ON, Canada
- Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Lucy H Chen
- Division of General Surgery, McMaster University, Hamilton, ON, Canada
- Audrey Jong
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Akithma Hapugall
- Division of General Surgery, McMaster University, Hamilton, ON, Canada
- Arshia Javidan
- Division of Vascular Surgery, University of Toronto, Toronto, ON, Canada
- Tyler McKechnie
- Division of General Surgery, McMaster University, Hamilton, ON, Canada
- Department of Health Research Methods and Evidence, McMaster University, Hamilton, ON, Canada
- Dennis Hong
- Division of General Surgery, McMaster University, Hamilton, ON, Canada.
- Division of General Surgery, St. Joseph's Healthcare, 50 Charlton Avenue East, Hamilton, ON, L8N 4A6, Canada.
11
The efficiency of machine learning-assisted platform for article screening in systematic reviews in orthopaedics. Int Orthop 2023; 47:551-556. [PMID: 36562816 DOI: 10.1007/s00264-022-05672-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
PURPOSE With the development of machine learning and artificial intelligence, various platforms were developed to aid in the time-consuming process of article screening in systematic reviews. We aim to analyze the efficiency of a machine learning-assisted platform as an end-user to aid in the screening of the articles for selection into systematic review in orthopaedic surgery. METHODS We included three previously published systematic reviews in the field of orthopaedics of increasing levels of difficulty in the structure of the research question to assess the efficiency of a platform with active-learning technology for article screening. We compared the efficiency of the platform compared to the traditional screening and also across the various scenarios tested. We performed five iterations for each review analyzed. The outcome parameters analyzed were the work saved at 95% recall (WSS-95), work saved at 100% recall (WSS-100), and relevant records found after screening the first 30% of the total records (RRF-30). RESULTS The machine learning-assisted screening significantly improved the rate of identifying the relevant records compared to the traditional screening method (p<0.001). The WSS-95 for the easy, intermediate, and advanced screening scenarios were 78%, 59%, and 38%, respectively. The WSS-100 for the easy, intermediate, and advanced screening scenarios were 75%, 48%, and 7%, respectively. The RRF-30 for the easy, intermediate, and advanced screening scenarios were 97%, 86%, and 64%, respectively. We noted a significant reduction (p<0.001) in the efficiency with the increasing level of difficulty of the screening scenarios. CONCLUSION The machine learning platform is significantly better than the traditional method as an assistive technology to aid in article screening. However, the efficiency of the platform significantly decreases as the complexity of the research question increases.
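The three outcome parameters can be sketched from a ranked screening order. The abstract does not give the platform's exact formulas, so this assumes the commonly used definition of work saved over sampling (the fraction of records left unscreened when the target recall is reached, minus the fraction random screening would leave) and treats RRF as the share of relevant records found within the first 30% screened. The function names and the ten-record ranking are hypothetical.

```python
def wss_at_recall(ranked_labels, recall=0.95):
    """Work saved over sampling at the given recall.

    ranked_labels: 1 for relevant, 0 for irrelevant, in screening order.
    """
    total = len(ranked_labels)
    relevant = sum(ranked_labels)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        # Small tolerance guards against floating-point rounding in recall * relevant.
        if found >= recall * relevant - 1e-9:
            return (total - screened) / total - (1 - recall)
    return 0.0  # only reached when there are no records to screen


def rrf_at(ranked_labels, fraction=0.30):
    """Share of relevant records found after screening the first `fraction`."""
    cutoff = round(fraction * len(ranked_labels))
    return sum(ranked_labels[:cutoff]) / sum(ranked_labels)


# Hypothetical ranking of ten records, three of them relevant.
ranking = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
wss95 = wss_at_recall(ranking, 0.95)
wss100 = wss_at_recall(ranking, 1.00)
rrf30 = rrf_at(ranking, 0.30)
print(round(wss95, 2), round(wss100, 2), round(rrf30, 2))
```

In this toy ranking the last relevant record appears at position six, so four of ten records would not need screening at either recall level.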
12
Milto AJ, Negri CE, Baker J, Thuppal S. The Statistical Fragility of Foot and Ankle Surgery Randomized Controlled Trials. J Foot Ankle Surg 2022; 62:191-196. [PMID: 36182644 DOI: 10.1053/j.jfas.2022.08.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/02/2022] [Revised: 08/16/2022] [Accepted: 08/27/2022] [Indexed: 02/03/2023]
Abstract
Fragility index (FI) is a metric used to interpret the results of randomized controlled trials (RCTs) and describes the number of subjects that would need to be switched from event to non-event for a result to no longer be significant. Studies that analyze the FI of RCTs in various orthopedic subspecialties have shown the RCTs to be largely underpowered and highly fragile. However, FI has not been assessed in foot and ankle RCTs. The MEDLINE and Embase online databases were searched from 1/1/2011 through 11/19/2021 for RCTs involving foot and ankle conditions. FI, fragility quotient (FQ), and the difference between the FI and the number of subjects lost to follow-up were calculated. Spearman correlation was performed to determine the relationship between sample size and FI. Overall, 1262 studies were identified, of which 18 were included in the final analysis. The median sample size was 65 (interquartile range [IQR] 57-95.5), the median FI was 2 (IQR 1-2.5), and the median FQ was 0.026 (IQR 0.012-0.033). Ten of 15 (67%) studies with non-zero FI values had FI values less than the number of subjects lost to follow-up. There was a linear association between FI and sample size (R² = 0.495, p-value: .031). This study demonstrates that RCTs in the field of foot and ankle surgery are highly fragile, similar to other orthopedic subspecialties.
Affiliation(s)
- Anthony J Milto
- Division of Orthopedics and Rehabilitation, Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL; Center for Clinical Research, Southern Illinois University School of Medicine, Springfield, IL
- Cecily E Negri
- Division of Orthopedics and Rehabilitation, Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL
- Jeffrey Baker
- Division of Orthopedics and Rehabilitation, Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL
- Sowmyanarayanan Thuppal
- Division of Orthopedics and Rehabilitation, Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL; Center for Clinical Research, Southern Illinois University School of Medicine, Springfield, IL.
|
13
|
Fackler NP, Karasavvidis T, Ehlers CB, Callan KT, Lai WC, Parisien RL, Wang D. The Statistical Fragility of Operative vs Nonoperative Management for Achilles Tendon Rupture: A Systematic Review of Comparative Studies. Foot Ankle Int 2022; 43:1331-1339. [PMID: 36004430 PMCID: PMC9527367 DOI: 10.1177/10711007221108078] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 02/01/2023]
Abstract
BACKGROUND The statistical significance of randomized controlled trials (RCTs) and comparative studies is often conveyed utilizing the P value. However, P values are an imperfect measure, and a small number of outcome reversals may be enough to alter statistical significance. The interpretation of the statistical strength of these studies may be aided by the inclusion of a Fragility Index (FI) and Fragility Quotient (FQ). This study examines the statistical stability of studies comparing operative vs nonoperative management for Achilles tendon rupture. METHODS A systematic search was performed of 10 orthopaedic journals between 2000 and 2021 for comparative studies focusing on management of Achilles tendon rupture reporting dichotomous outcome measures. FI for each outcome was determined by the number of event reversals necessary to alter significance (P < .05). FQ was calculated by dividing the FI by the respective sample size. Additional subgroup analyses were performed. RESULTS Of 8020 studies screened, 1062 met initial search criteria, with 17 comparative studies ultimately included for analysis, 10 of which were RCTs. A total of 40 outcomes were examined. Overall, the median FI was 2.5 (interquartile range [IQR] 2-4), the mean FI was 2.90 (±1.58), the median FQ was 0.032 (IQR 0.012-0.069), and the mean FQ was 0.049 (±0.062). The FI was less than the number of patients lost to follow-up for 78% of outcomes. CONCLUSION Studies examining the efficacy of operative vs nonoperative management of Achilles tendon rupture may not be as statistically stable as previously thought. The average number of outcome reversals needed to alter the significance of a given study was 2.90. Future analyses may benefit from the inclusion of a fragility index and a fragility quotient in their statistical analyses.
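The summary statistics reported here (median FI, median FQ, and the share of outcomes whose FI is smaller than the loss to follow-up) follow directly from per-outcome values. The sketch below shows that arithmetic under stated assumptions: each outcome is supplied as a tuple of an already computed FI, its sample size, and the number lost to follow-up. The function name and the three example outcomes are hypothetical and are not data from the study.

```python
from statistics import median


def summarize_fragility(outcomes):
    """Summarize a list of (fi, sample_size, lost_to_follow_up) tuples.

    FQ is taken as FI divided by the sample size for that outcome, as the
    abstract describes.
    """
    fis = [fi for fi, _, _ in outcomes]
    fqs = [fi / n for fi, n, _ in outcomes]
    # Outcomes where more patients were lost than the FI would require.
    ltfu_exceeds = sum(1 for fi, _, lost in outcomes if lost > fi)
    return {
        "median_fi": median(fis),
        "median_fq": median(fqs),
        "share_ltfu_gt_fi": ltfu_exceeds / len(outcomes),
    }


# Hypothetical outcomes: (FI, sample size, patients lost to follow-up).
example_outcomes = [(2, 60, 5), (4, 120, 3), (3, 50, 8)]
print(summarize_fragility(example_outcomes))
```

Dividing by sample size is what lets an FI of 4 in a 120-patient study be compared with an FI of 2 in a 60-patient study: both give the same FQ.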
Affiliation(s)
- Nathan P. Fackler
- University of California, Irvine, CA, USA
- Georgetown University School of Medicine, Washington, DC, USA
- Dean Wang
- University of California, Irvine, CA, USA
- Dean Wang, MD, University of California, Irvine, 101 The City Drive South, Pavilion III, Building 29A, Orange, CA 92686, USA.
|
14
|
Fackler NP, Ehlers CB, Callan KT, Amirhekmat A, Smith EJ, Parisien RL, Wang D. Statistical Fragility of Single-Row Versus Double-Row Anchoring for Rotator Cuff Repair: A Systematic Review of Comparative Studies. Orthop J Sports Med 2022; 10:23259671221093391. [PMID: 35571970 PMCID: PMC9096204 DOI: 10.1177/23259671221093391] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/06/2022] [Accepted: 02/17/2022] [Indexed: 01/08/2023] Open
Abstract
Background: Comparative studies and randomized controlled trials (RCTs) often use the P (probability) value to convey the statistical significance of their findings. P values are an imperfect measure, however, and a small number of outcome reversals can be enough to alter statistical significance. The inclusion of a fragility index (FI) and fragility quotient (FQ) may aid in the interpretation of a study’s statistical strength. Purpose/Hypothesis: The purpose of this study was to examine the statistical stability of studies comparing single-row to double-row rotator cuff repair. It was hypothesized that the findings of these studies would be vulnerable to a small number of outcome event reversals, often fewer than the number of patients lost to follow-up. Study Design: Systematic review; Level of evidence, 3. Methods: We analyzed comparative studies and RCTs on primary single-row versus double-row rotator cuff repair that were published between 2000 and 2021 in 10 leading orthopaedic journals. Statistical significance was defined as P < .05. The FI for each outcome was determined by the number of event reversals necessary to alter significance. The FQ was calculated by dividing the FI by the respective sample size. Results: Of 4896 studies screened, 22 comparative studies, 10 of which were RCTs, were ultimately included for analysis. A total of 74 outcomes were examined. Overall, the median FI was 2 (interquartile range [IQR], 1-3), and the median FQ was 0.035 (IQR, 0.020-0.057). The mean FI was 2.55 ± 1.29, and the mean FQ was 0.043 ± 0.027. In 64% of outcomes, the FI was less than the number of patients lost to follow-up. Additionally, 81% of significant outcomes needed just a single outcome reversal to lose their significance. Conclusion: Over half of the studies currently used to guide clinical practice have a number of patients lost to follow-up greater than their FI. The results of these studies should be interpreted within the context of these limitations. Future analyses may benefit from the inclusion of the FI and the FQ in their statistical analyses.
Affiliation(s)
- Nathan P. Fackler
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
- Georgetown University School of Medicine, Washington, DC, USA
- Cooper B. Ehlers
- Department of Orthopaedic Surgery, University of California, San Diego, San Diego, California, USA
- Kylie T. Callan
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
- Arya Amirhekmat
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
- Eric J. Smith
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
- Dean Wang
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
|
15
|
Itaya T, Isobe Y, Suzuki S, Koike K, Nishigaki M, Yamamoto Y. The Fragility of Statistically Significant Results in Randomized Clinical Trials for COVID-19. JAMA Netw Open 2022; 5:e222973. [PMID: 35302631 PMCID: PMC8933746 DOI: 10.1001/jamanetworkopen.2022.2973] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022] Open
Abstract
IMPORTANCE Interpreting results from randomized clinical trials (RCTs) for COVID-19, which have been published rapidly and in vast numbers, is challenging during a pandemic. OBJECTIVE To evaluate the robustness of statistically significant findings from RCTs for COVID-19 using the fragility index. DESIGN, SETTING, AND PARTICIPANTS This cross-sectional study included COVID-19 trial articles that randomly assigned patients 1:1 into 2 parallel groups and reported at least 1 binary outcome as significant in the abstract. A systematic search was conducted using PubMed to identify RCTs on COVID-19 published until August 7, 2021. EXPOSURES Trial characteristics, such as type of intervention (treatment drug, vaccine, or others), number of outcome events, and sample size. MAIN OUTCOMES AND MEASURES Fragility index. RESULTS Of the 47 RCTs for COVID-19 included, 36 (77%) were studies of the effects of treatment drugs, 5 (11%) were studies of vaccines, and 6 (13%) were of other interventions. A total of 138 235 participants were included in these trials. The median (IQR) fragility index of the included trials was 4 (1-11). The medians (IQRs) of the fragility indexes of RCTs of treatment drugs, vaccines, and other interventions were 2.5 (1-6), 119 (61-139), and 4.5 (1-18), respectively. The fragility index among more than half of the studies was less than 1% of each sample size, although the fragility index as a proportion of events needing to change would be much higher. CONCLUSIONS AND RELEVANCE This cross-sectional study found a relatively small number of events (a median of 4) would be required to change the results of COVID-19 RCTs from statistically significant to not significant. These findings suggest that health care professionals and policy makers should not rely heavily on individual results of RCTs for COVID-19.
Affiliation(s)
- Takahiro Itaya
- Department of Healthcare Epidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan
- Yotsuha Isobe
- Department of Human Health Sciences, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Sayoko Suzuki
- Department of Human Health Sciences, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Kanako Koike
- Department of Medical Genetics, International University of Health and Welfare Graduate School, Tokyo, Japan
- Masakazu Nishigaki
- Department of Medical Genetics, International University of Health and Welfare Graduate School, Tokyo, Japan
- Yosuke Yamamoto
- Department of Healthcare Epidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan
|
16
|
Marasco D, Russo J, Izzo A, Vallefuoco S, Coppola F, Patel S, Smeraglia F, Balato G, Mariconda M, Bernasconi A. Static versus dynamic fixation of distal tibiofibular syndesmosis: a systematic review of overlapping meta-analyses. Knee Surg Sports Traumatol Arthrosc 2021; 29:3534-3542. [PMID: 34455448 DOI: 10.1007/s00167-021-06721-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/24/2021] [Accepted: 08/24/2021] [Indexed: 11/26/2022]
Abstract
PURPOSE Multiple Level I meta-analyses have been conducted comparing traditional static vs. more recently introduced dynamic strategies of fixation for injuries of the distal tibiofibular syndesmosis (TFS). The aim of this review was to assess their robustness and methodological quality, providing support in the choice of a treatment strategy in case of TFS injury using the highest level of evidence. METHODS In this systematic review, conducted in accordance with the PRISMA guidelines, meta-analyses/systematic reviews comparing static and dynamic fixation methods after acute TFS injury were identified. The robustness of studies was evaluated using the fragility index (FI) for meta-analysis and the fragility quotient (FQ). The risk of bias was evaluated using the Assessment of Multiple Systematic Reviews (AMSTAR) instrument. Finally, the Jadad algorithm was applied to select the study that provided the highest quality of evidence to develop recommendations for the fixation strategy of these lesions. RESULTS Out of 1,302 records, four Level I meta-analyses were included in this study. Analyzing the statistically significant dichotomous outcomes, the median FI was 3.5 (IQR, 2 to 5.5; range, 1 to 9), while the median FQ was 1.9% (IQR, 1 to 3.5; range 0.35 to 4.4). In total, 37% of outcomes had an FI of 2 or less and 75% of outcomes had an FI of 4 or less. According to the AMSTAR score and Jadad algorithm, the largest meta-analysis was selected as the highest evidence provided so far. CONCLUSION The meta-analyses with statistically significant dichotomous outcomes comparing dynamic and static fixation for treating injuries of the distal tibiofibular syndesmosis are fragile, with a change in fewer than four patients or less than 2% of the study population sufficient to reverse a significant outcome to nonsignificant. LEVEL OF EVIDENCE Level I.
Affiliation(s)
- Domenico Marasco
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Jacopo Russo
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Antonio Izzo
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Salvatore Vallefuoco
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Francesco Coppola
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Shelain Patel
- Foot and Ankle Unit, Royal National Orthopaedic Hospital, Stanmore, UK
- Francesco Smeraglia
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Giovanni Balato
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Massimo Mariconda
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Alessio Bernasconi
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy.
|