1
Proal JD, Moon AS, Kwon B. The fragility index and reverse fragility index of FDA investigational device exemption trials in spinal fusion surgery: a systematic review. Eur Spine J 2024:10.1007/s00586-024-08317-3. [PMID: 38802596 DOI: 10.1007/s00586-024-08317-3] [Received: 01/09/2024] [Revised: 04/20/2024] [Accepted: 05/16/2024] [Indexed: 05/29/2024]
Abstract
PURPOSE FDA investigational device exemption (IDE) studies are considered a gold standard for assessing the safety and efficacy of novel devices through RCTs. The fragility index (FI) has emerged as a means to assess the robustness of statistically significant study results and, inversely, the reverse fragility index (RFI) for non-significant differences. Previous authors have defined results as fragile if loss to follow-up is greater than the FI or RFI. The aim of this study was to assess the FI, RFI, and robustness of data supplied by IDE studies in spinal surgery. METHODS This was a systematic review of the literature. Inclusion criteria were randomized controlled trials with dichotomous outcome measures conducted under IDE guidelines between 2000 and 2023. FI and RFI were calculated by successively changing events to non-events until the outcome changed to non-significance or significance, respectively. The fragility quotient (FQ) and reverse fragility quotient (RFQ) were calculated by dividing the FI and RFI, respectively, by the sample size. RESULTS Thirty-two studies met inclusion criteria with a total of 40 unique outcome measures; 240 outcomes were analyzed. Twenty-six studies reported 96 statistically significant results. The median FI was 6 (IQR: 3-9.25), and the number of patients lost to follow-up was greater than the FI in 99.0% (95/96) of results. The average FQ was 0.027. Thirty studies reported 144 statistically nonsignificant results, with a median RFI of 6 (IQR: 4-8). The average extrapolated RFQ was 0.021, and loss to follow-up was greater than the RFI in 98.6% (142/144) of results. CONCLUSIONS IDE studies in spine surgery are surprisingly fragile given their reputations, large sample sizes, and intent to establish safety in investigational devices. This study found a median FI and RFI of 6. The number of patients lost to follow-up was greater than the FI and RFI in 98.8% (237/240) of reported outcomes. FQ and RFQ tell us that changes of two to three patients per hundred can flip the significance of reported outcomes. This is an important reminder of the limitations of RCTs. Analysis of fragility in future studies may help clarify the strength of the relationship between reported data and their conclusions.
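The FI, RFI, and FQ calculations described in the methods above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes a two-sided Fisher exact test at alpha = 0.05, assumes that outcomes are flipped one patient at a time in the arm with the lower event rate, and uses a made-up 10-patient trial. The function names `fisher_p`, `fragility_index`, and `reverse_fragility_index` are hypothetical.

```python
from math import comb


def fisher_p(events_a, n_a, events_b, n_b):
    """Two-sided Fisher exact p-value for a 2x2 table, using exact integer weights."""
    total_events = events_a + events_b
    lo = max(0, total_events - n_b)
    hi = min(n_a, total_events)
    # Hypergeometric weight of each possible event count in arm A.
    weights = {k: comb(n_a, k) * comb(n_b, total_events - k) for k in range(lo, hi + 1)}
    observed = weights[events_a]
    # Sum the tables that are no more likely than the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n_a + n_b, total_events)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Flips needed to lose significance (0 if the result is not significant)."""
    # Orient the table so that arm A has the lower event rate (assumed convention).
    if events_a * n_b > events_b * n_a:
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while fisher_p(events_a, n_a, events_b, n_b) < alpha and events_a < n_a:
        events_a += 1  # one non-event becomes an event in the lower-rate arm
        flips += 1
    return flips


def reverse_fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Flips needed to reach significance (None if it cannot be reached)."""
    if events_a * n_b > events_b * n_a:
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while fisher_p(events_a, n_a, events_b, n_b) >= alpha:
        if events_a == 0:
            return None
        events_a -= 1  # one event becomes a non-event in the lower-rate arm
        flips += 1
    return flips


# Made-up trial: 0/5 events in one arm, 5/5 in the other.
fi = fragility_index(0, 5, 5, 5)
fq = fi / (5 + 5)  # fragility quotient = FI divided by total sample size
rfi = reverse_fragility_index(2, 5, 5, 5)
print(f"FI = {fi}, FQ = {fq:.2f}, RFI = {rfi}")
```

Read this way, the reported average FQ of 0.027 corresponds to roughly 2.7 outcome flips per 100 enrolled patients, and the average RFQ of 0.021 to roughly 2.1 per 100.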
Affiliation(s)
- Joshua D Proal
  - Tufts University School of Medicine, 145 Harrison Ave, Boston, MA, 02111, USA
- Andrew S Moon
  - Department of Orthopedic Surgery, Tufts Medical Center, Tufts University School of Medicine, 800 Washington St, Tufts MC Box #306, Boston, MA, 02111, USA
- Brian Kwon
  - New England Baptist Hospital, Department of Orthopaedic Surgery, 125 Parker Hill Ave, Boston, MA, 02120, USA
2
Oeding JF, Krych AJ, Camp CL, Varady NH. The Number of Patients Lost to Follow-up May Exceed the Fragility Index of a Randomized Controlled Trial Without Reversing Statistical Significance: A Systematic Review and Statistical Model. Arthroscopy 2024:S0749-8063(24)00366-9. [PMID: 38777001 DOI: 10.1016/j.arthro.2024.05.006] [Received: 11/13/2023] [Revised: 04/21/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
PURPOSE To 1) analyze trends in the publishing of statistical fragility index (FI)-based systematic reviews in the orthopaedic literature, including the prevalence of misleading or inaccurate statements related to the statistical fragility of randomized controlled trials (RCTs) and patients lost to follow-up (LTF) and 2) determine whether RCTs with relatively "low" FIs are truly as sensitive to patients LTF as previously portrayed in the literature. METHODS All FI-based studies published in the orthopaedic literature were identified using the Cochrane Database of Systematic Reviews, Web of Science Core Collection, PubMed, and MEDLINE databases. All articles involving application of the FI or reverse FI (RFI) to study the statistical fragility of studies in orthopaedics were eligible for inclusion in the study. Study characteristics, median FIs and sample sizes, and misleading or inaccurate statements related to the FI and patients LTF were recorded. Misleading or inaccurate statements were defined as those basing conclusions of trial fragility on the false assumption that adding patients LTF back to a trial has the same statistical effect as existing patients in a trial experiencing the opposite outcome and were determined by two authors. A theoretical RCT with a sample size of 100, p-value of 0.006, and an FI of 4 was used to evaluate the difference in effect on statistical significance between flipping outcome events of patients already included in the trial (the FI) vs. adding patients LTF back to the trial to demonstrate the true sensitivity of RCTs to patients LTF. RESULTS Of the 39 FI-based studies, 37 (95%) directly compared the FI to the number of patients lost to follow-up. Of these, 22 (59%) included a statement regarding the FI and patients LTF that was determined to be inaccurate or misleading. 
In the theoretical RCT, a reversal of significance was not observed until 7 patients LTF (nearly twice the FI) were added to the trial in the distribution of maximal significance reversal. CONCLUSIONS The claim that any RCT in which the number of patients LTF exceeds the FI could potentially have its significance reversed simply by maintaining study follow-ups is commonly inaccurate and prevalent in orthopaedic studies applying the FI. Patients LTF and the FI are not equivalent. The minimum number of patients LTF required to flip the significance of a typical RCT was demonstrated to be greater than the FI, suggesting RCTs with relatively "low" FIs may not be as sensitive to patients LTF as previously portrayed in the literature; however, only a holistic approach that considers the context in which the trial was conducted, potential biases, and study results can determine the merits of any particular RCT.
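The distinction drawn above, between flipping the outcome of a patient already in the trial and adding a lost patient back, can be made concrete with a toy calculation. The sketch below is not the authors' statistical model. It assumes a two-sided Fisher exact test at alpha = 0.05, a made-up 12-patient trial (0/6 vs 6/6 events), and a worst-case rule in which every returning patient has the outcome that most weakens the result: an event in the lower-rate arm, a non-event in the higher-rate arm. All function names are hypothetical.

```python
from math import comb


def fisher_p(events_a, n_a, events_b, n_b):
    """Two-sided Fisher exact p-value for a 2x2 table, using exact integer weights."""
    total_events = events_a + events_b
    lo = max(0, total_events - n_b)
    hi = min(n_a, total_events)
    weights = {k: comb(n_a, k) * comb(n_b, total_events - k) for k in range(lo, hi + 1)}
    observed = weights[events_a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n_a + n_b, total_events)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Outcome flips among enrolled patients needed to lose significance."""
    if events_a * n_b > events_b * n_a:
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while fisher_p(events_a, n_a, events_b, n_b) < alpha and events_a < n_a:
        events_a += 1  # flipping changes an outcome but not the arm size
        flips += 1
    return flips


def min_ltf_to_reverse(events_a, n_a, events_b, n_b, alpha=0.05, max_added=50):
    """Smallest number of returning LTF patients that removes significance."""
    if events_a * n_b > events_b * n_a:
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    for added in range(max_added + 1):
        # Try every split of the returning patients between the two arms.
        for to_a in range(added + 1):
            to_b = added - to_a
            # to_a patients rejoin arm A with events; to_b rejoin arm B without.
            # Unlike a flip, each returning patient also enlarges the arm.
            p = fisher_p(events_a + to_a, n_a + to_a, events_b, n_b + to_b)
            if p >= alpha:
                return added
    return None


fi = fragility_index(0, 6, 6, 6)
ltf = min_ltf_to_reverse(0, 6, 6, 6)
print(f"FI = {fi}, minimum LTF patients to reverse significance = {ltf}")
```

For this toy table the FI is 2, whereas four returning patients are needed even under the worst-case rule, which echoes the point that patients LTF and the FI are not equivalent. Other tables will give different ratios.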
Affiliation(s)
- Jacob F Oeding
  - School of Medicine, Mayo Clinic Alix School of Medicine, Rochester, MN, USA; Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Aaron J Krych
  - Department of Orthopaedic Surgery, Mayo Clinic, Rochester, MN, USA
- Nathan H Varady
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, NY, USA
3
Ahn BJ, Quinn M, Zhao L, He EW, Dworkin M, Naphade O, Byrne RA, Molino J, Blankenhorn B. Statistical Fragility Analysis of Open Reduction Internal Fixation vs Primary Arthrodesis to Treat Lisfranc Injuries: A Systematic Review. Foot Ankle Int 2024; 45:298-308. [PMID: 38327213 DOI: 10.1177/10711007231224797] [Indexed: 02/09/2024]
Abstract
BACKGROUND There is a lack of consensus on the use of open reduction internal fixation (ORIF) vs primary arthrodesis (PA) in the management of Lisfranc injuries. Statistical fragility represents the number of events needed to flip statistical significance and provides context to interpret P values of outcomes from conflicting studies. The current study evaluates the statistical fragility of existing research with an outcome-specific approach to provide statistical clarity to the ORIF vs PA discussion. We hypothesized that statistical fragility analysis would offer clinically relevant insight when interpreting conflicting outcomes regarding ORIF vs PA management of Lisfranc injuries. METHODS All comparative studies, RCTs, and case series investigating ORIF vs PA management of Lisfranc injuries published through October 5, 2023, were identified. Descriptive characteristics, dichotomous outcomes, and continuous outcomes were extracted. The fragility index (FI) and continuous fragility index (CFI) were calculated as the number of event reversals needed to alter significance. Outcomes were categorized by clinical relevance, and median FI and CFI were reported. RESULTS A total of 244 studies were screened. Ten studies and 67 outcomes (44 dichotomous, 23 continuous) were included in the fragility analysis. Of the 10 studies, 4 claimed that PA correlated with superior outcomes compared to ORIF with regard to functional scores and return to function outcomes. Of these 4 studies, 3 were statistically robust. Six studies claimed that PA and ORIF had no differences in outcomes, of which only 2 were statistically robust. CONCLUSION The overall research regarding ORIF vs PA is relatively robust compared with other orthopaedic areas of controversy. Although the full statistical context of each article must be considered, studies supporting PA superiority with regard to functional scores and return to function metrics were found to be statistically robust.
Outcome-specific analysis revealed moderate fragility in several clinically relevant outcomes such as functional score, return to function, and wound complications.
Affiliation(s)
- Benjamin J Ahn
  - The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Matthew Quinn
  - Department of Orthopaedic Surgery, The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Leon Zhao
  - Department of Orthopaedic Surgery, The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Elaine W He
  - The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Myles Dworkin
  - Department of Orthopaedic Surgery, The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Om Naphade
  - The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Rory A Byrne
  - Department of Orthopaedic Surgery, The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Janine Molino
  - The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Brad Blankenhorn
  - Department of Orthopaedic Surgery, The Warren Alpert Medical School of Brown University, Providence, RI, USA
4
Wang A, Kwon D, Kim E, Oleru O, Seyidova N, Taub PJ. Statistical fragility of outcomes in acellular dermal matrix literature: A systematic review of randomized controlled trials. J Plast Reconstr Aesthet Surg 2024; 91:284-292. [PMID: 38432086 PMCID: PMC10984759 DOI: 10.1016/j.bjps.2024.02.047] [Received: 12/12/2023] [Accepted: 02/04/2024] [Indexed: 03/05/2024]
Abstract
BACKGROUND Acellular dermal matrix (ADM) is commonly used in plastic and reconstructive surgery. With the abundance of randomized controlled trials (RCTs) reporting P-values for ADM outcomes, this study used the fragility index (FI), reverse fragility index (rFI), and fragility quotient (FQ) to evaluate the statistical stability of the outcomes in ADM RCTs. METHODS PubMed, Embase, SCOPUS, Medline, and Cochrane databases were reviewed for ADM RCTs (2003-present) reporting a dichotomous, categorical outcome. FI and rFI (event reversals influencing outcome significance) and FQ (standardized fragility) were calculated and reported as medians. Subgroup analysis was performed based on intervention types. RESULTS Among the 127 studies screened, 56 RCTs with 579 outcomes were included. The median FI was 4 (3-5) and the median FQ was 0.04 (0.03-0.07). Only 101 outcomes were statistically significant, with a median FI of 3 (1-6) and FQ of 0.04 (0.02-0.08). The nonsignificant outcomes had a median FI of 4 (3-5) and FQ of 0.04 (0.03-0.07). Notably, in 26% of the outcomes the number of patients lost to follow-up equaled or exceeded the FI. Based on the intervention type, the median FIs showed minor fluctuations but remained low. CONCLUSIONS Outcomes from ADM-related RCTs were statistically fragile. Slight outcome reversals or maintenance of patient follow-up can alter the significance of results. Therefore, future researchers are encouraged to jointly report FI, FQ, and P-values to offer a comprehensive view of the robustness of the ADM literature.
Affiliation(s)
- Anya Wang
  - Icahn School of Medicine at Mount Sinai, Division of Plastic and Reconstructive Surgery, New York, NY 10029, USA
- Daniel Kwon
  - Icahn School of Medicine at Mount Sinai, Division of Plastic and Reconstructive Surgery, New York, NY 10029, USA
- Esther Kim
  - Icahn School of Medicine at Mount Sinai, Division of Plastic and Reconstructive Surgery, New York, NY 10029, USA
- Olachi Oleru
  - Icahn School of Medicine at Mount Sinai, Division of Plastic and Reconstructive Surgery, New York, NY 10029, USA
- Nargiz Seyidova
  - Icahn School of Medicine at Mount Sinai, Division of Plastic and Reconstructive Surgery, New York, NY 10029, USA
- Peter J Taub
  - Icahn School of Medicine at Mount Sinai, Division of Plastic and Reconstructive Surgery, New York, NY 10029, USA
5
Cote MP, Asnis P, Hutchinson ID, Berkson E. Editorial Commentary: The Statistical Fragility Index of Medical Trials Is Low By Design: Critical Evaluation of Confidence Intervals Is Required. Arthroscopy 2024; 40:1006-1008. [PMID: 38219106 DOI: 10.1016/j.arthro.2023.10.010] [Received: 10/11/2023] [Accepted: 10/12/2023] [Indexed: 01/15/2024]
Abstract
The Fragility Index (FI) provides the number of patients whose outcome would need to have changed for the results of a clinical trial to no longer be statistically significant. Although it's a well-intended and easily interpreted metric, its calculation is based on reversing a significant finding and therefore its interpretation is only relevant in the domain of statistical significance. A well-designed clinical trial includes an a priori sample size calculation that aims to find the bare minimum of patients needed to obtain statistical significance. Such trials are fragile by design! Examining the robustness of clinical trials requires an estimation of uncertainty, rather than a misconstrued, dichotomous focus on statistical significance. Confidence intervals (CIs) provide a range of values that are compatible with a study's data and help determine the precision of results and the compatibility of the data with different hypotheses. The width of the CI speaks to the precision of the results, and the extent to which the values contained within have potential to be clinically important. Finally, one should not assume that a large FI indicates robust findings. Poorly executed trials are prone to bias, leading to large effects, and therefore, small P values, and a large FI. Let's move our future focus from the FI toward the CI.
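As an illustration of the interval-based reading recommended above, the sketch below computes a 95% CI for the difference in event rates between two arms. It is only one possible approach: it assumes the simple Wald (normal approximation) interval, chosen here for brevity, and the trial counts (10/50 vs 20/50) are made up. The function name `risk_difference_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Wald confidence interval for the difference in event rates (arm A minus arm B)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Made-up trial: 10/50 events in arm A, 20/50 in arm B.
low, high = risk_difference_ci(10, 50, 20, 50)
print(f"Risk difference 95% CI: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

The interval runs from about -0.375 to -0.025. It excludes zero, but its width shows that the data are compatible with both a large and a very small difference, which is information a single FI value does not convey.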
Affiliation(s)
- Eric Berkson
  - Boston, Massachusetts, U.S.A.; Foxborough, Massachusetts, U.S.A.