1. Cizik AM, Zhang C, Presson AP, Randall D, Kazmers NH. Linking QuickDASH and PROMIS Upper-Extremity Computer-Adaptive Test Scores in Hand Surgery: A Crosswalk Study. J Hand Surg Am 2024; 49:664-674. [PMID: 38795102 DOI: 10.1016/j.jhsa.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/22/2024] [Accepted: 04/10/2024] [Indexed: 05/27/2024]
Abstract
PURPOSE Assessment of patient-reported outcome measures (PROMs) for hand and upper-extremity surgery patients using measures such as the Quick Disabilities of the Arm, Shoulder, and Hand (qDASH), as well as general measures including the Patient-Reported Outcomes Measurement Information System Upper Extremity Physical Function domain via a Computer-Adaptive Test (PROMIS UE CAT), has become commonplace. The aim of this study was to link, for crosswalking, the qDASH measure to both versions of the PROMIS UE CAT (v1.2 and v2.0). METHODS We included 18,944 hand and upper-extremity patients who completed both versions of the PROMIS UE CAT and the qDASH at the same clinical encounter. Patients with shoulder pathology were excluded. Score linkage was performed using the R package equate, and multiple equating models (linear regression, identity, mean, linear, equipercentile, and circle-arc models) were used to establish crosswalk tables. RESULTS Mean qDASH and PROMIS UE CAT v1.2 scores were 38.2 (SD = 23.1) and 36.6 (SD = 9.8), respectively. Mean qDASH and PROMIS UE CAT v2.0 scores were 37.3 (SD = 21.8) and 38.3 (SD = 10.4), respectively. Pearson correlations showed very strong linear relationships between the qDASH and the PROMIS UE CAT v1.2 and PROMIS UE CAT v2.0 (r = -0.83 [-0.84, -0.92] and r = -0.80 [-0.81, -0.80], respectively). For the equipercentile equating models, the intraclass correlation coefficient (ICC) indicated very strong positive agreement between the linked measures, with ICC = 0.85 (0.84, 0.86) for the qDASH-UE CAT v1.2 crosswalk and ICC = 0.83 (0.82, 0.84) for the qDASH-UE CAT v2.0 crosswalk. CONCLUSIONS The linkages establish crosswalk tables using equipercentile equating models to convert the PROMIS UE CAT v1.2 and v2.0 scores to the qDASH and vice versa. CLINICAL RELEVANCE This study provides crosswalk tables for commonly collected PROMs in hand surgery, increasing the comparability of results between centers using different PROMs to study the same conditions or treatments.
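As a rough illustration of what linking two scales involves, the sketch below applies linear equating, one of the models the abstract lists, to the means and SDs reported for the qDASH and PROMIS UE CAT v1.2. It is not the equipercentile model behind the study's crosswalk tables. The function names, the sign flip used to account for the negative correlation between the scales, and the lack of clamping to valid score ranges are all assumptions made for illustration, so the output should not be treated as a published conversion.

```python
# Illustrative linear equating between qDASH and PROMIS UE CAT v1.2.
# Means and SDs are the values reported in the abstract; the function
# names and the reversed direction are assumptions for this sketch.

QDASH_MEAN, QDASH_SD = 38.2, 23.1    # higher qDASH = more disability
PROMIS_MEAN, PROMIS_SD = 36.6, 9.8   # higher PROMIS UE = better function


def qdash_to_promis_linear(qdash: float) -> float:
    """Map a qDASH score onto the PROMIS scale by matching z-scores.

    The two scales run in opposite directions (r is negative), so the
    z-score is sign-flipped before rescaling.
    """
    z = (qdash - QDASH_MEAN) / QDASH_SD
    return PROMIS_MEAN - z * PROMIS_SD


def promis_to_qdash_linear(promis: float) -> float:
    """Inverse mapping, from the PROMIS scale back to the qDASH scale."""
    z = (promis - PROMIS_MEAN) / PROMIS_SD
    return QDASH_MEAN - z * QDASH_SD


if __name__ == "__main__":
    for score in (10.0, 38.2, 61.3):
        print(score, round(qdash_to_promis_linear(score), 1))
```

Linear equating forces a straight-line relationship between the two scales; equipercentile equating instead matches percentile ranks, which allows the conversion to bend where the score distributions differ in shape.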
Affiliation(s)
- Amy M Cizik
  - Department of Orthopaedics, Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, UT
- Chong Zhang
  - Division of Epidemiology, Department of Internal Medicine, Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, UT
- Angela P Presson
  - Division of Epidemiology, Department of Internal Medicine, Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, UT
- Dustin Randall
  - Department of Orthopaedics, Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, UT
- Nikolas H Kazmers
  - Department of Orthopaedics, Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, UT
2. Sierevelt IN, van Kampen PM, Terwee CB, Nolte PA, Kerkhoffs GMMJ, Haverkamp D. The minimal important change is not a universal fixed value across diagnoses when using the FAOS and FAAM in patients undergoing elective foot and ankle surgery. Knee Surg Sports Traumatol Arthrosc 2024. [PMID: 38860725 DOI: 10.1002/ksa.12308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/21/2024] [Accepted: 05/28/2024] [Indexed: 06/12/2024]
Abstract
PURPOSE This study aimed to calculate region- and diagnosis-specific minimal important changes (MICs) of the Foot and Ankle Outcome Score (FAOS) and the Foot and Ankle Ability Measure (FAAM) in patients requiring foot and ankle surgery and to assess their variability across different foot and ankle diagnoses. METHODS The study used routinely collected data from patients undergoing elective foot and ankle surgery. Patients had been invited to complete the FAOS and FAAM preoperatively and at 3-6 months after surgery, along with two anchor questions encompassing change in pain and daily function. Patients were categorised according to region of pathology and subsequent diagnoses. MICs were calculated using the predictive modelling (MIC_PRED) and receiver operating characteristic curve (MIC_ROC) methods and evaluated according to strict credibility criteria. RESULTS Substantial variability of the MICs between the forefoot and ankle/hindfoot regions was observed, as well as among specific foot and ankle diagnoses, with MIC_PRED and MIC_ROC values ranging from 7.8 to 25.5 points and 9.4 to 27.8 points, respectively. Despite differences between MIC_ROC and MIC_PRED estimates, both calculation methods exhibited largely consistent patterns of variation across subgroups, with forefoot conditions systematically showing smaller MICs than ankle/hindfoot conditions. Most MICs demonstrated high credibility; however, the majority of the MICs for the FAOS symptoms subscale and forefoot conditions exhibited insufficient or low credibility. CONCLUSION The MICs of the FAOS and FAAM vary across foot and ankle diagnoses in patients undergoing elective foot and ankle surgery and should not be used as a universal fixed value, but recognised as contextual parameters. This can help clinicians and researchers interpret FAOS and FAAM change scores more accurately. LEVEL OF EVIDENCE Level IV.
Affiliation(s)
- Inger N Sierevelt
  - Department of Orthopedic Surgery, Xpert Clinics, Amsterdam, The Netherlands
  - Department of Orthopedic Surgery, Spaarnegasthuis Academy, Hoofddorp, The Netherlands
- Paulien M van Kampen
  - Department of Research and Innovation, Bergman Clinics, Naarden, The Netherlands
- Caroline B Terwee
  - Department of Epidemiology and Data Science, Amsterdam UMC, Amsterdam, The Netherlands
- Peter A Nolte
  - Department of Orthopedic Surgery, Spaarnegasthuis Academy, Hoofddorp, The Netherlands
- Gino M M J Kerkhoffs
  - Department of Orthopedic Surgery and Sports Medicine, Amsterdam Movement Sciences, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Daniel Haverkamp
  - Department of Orthopedic Surgery, Xpert Clinics, Amsterdam, The Netherlands
3. Cella D, Nolla K, Peipert JD. The challenge of using patient reported outcome measures in clinical practice: how do we get there? J Patient Rep Outcomes 2024; 8:35. [PMID: 38512362 PMCID: PMC10957801 DOI: 10.1186/s41687-024-00711-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 03/06/2024] [Indexed: 03/23/2024] Open
Abstract
BACKGROUND As patient-reported outcome measures (PROMs) become available to clinicians for routine clinical decision-making, many wonder how to define a meaningful change in a patient's PROM score. Some PROMs have a specific threshold that indicates meaningful change, but since those numbers are based on population averages, they do not necessarily apply to the varying experiences of each individual patient. Rather than viewing this as a weakness of PROMs, it is worth considering how clinicians use other existing measures in clinical decision-making, and whether PROMs can be used similarly. BODY In an informal survey, 43 clinicians reported using measures such as weight, blood pressure, and blood chemistry to inform clinical decision-making. Although clinicians were very consistent about what constituted a meaningful change for some measures (e.g., ECOG performance status), other measures showed considerable variability (e.g., weight), often informed by their specialization (for example, differing thresholds for meaningful weight change in adult primary care, pediatrics, and oncology). For interpreting change in measures, they relied on clinical experience (44%), published literature (38%), and established guidelines (35%). In open-response comments, many clarified that the results of any measure had to be taken in the context of each individual patient before making treatment decisions. In short, clinicians already apply individualized clinical judgment when interpreting score changes in existing clinical measures. As clinicians gain familiarity with PROMs, PROMs will likely be utilized in the same way. CONCLUSION Like other clinical measures from weight to blood chemistry, change in a PROM score is but one piece of a patient's clinical story. Rather than relying on a hard-and-fast number for defining clinically meaningful change in a PROM score, providers should, and many already do, consider the full scope of a patient's experience as they make treatment decisions.
Affiliation(s)
- David Cella
  - Feinberg School of Medicine, Northwestern University, 625 N. Michigan Ave, 2100, Chicago, IL, 60611, USA
- Kyle Nolla
  - Feinberg School of Medicine, Northwestern University, 625 N. Michigan Ave, 2100, Chicago, IL, 60611, USA
- John Devin Peipert
  - Feinberg School of Medicine, Northwestern University, 625 N. Michigan Ave, 2100, Chicago, IL, 60611, USA
4. Hays RD, Reise SP, Herman PM. Estimating individual health-related quality of life changes in low back pain patients. BMC Musculoskelet Disord 2023; 24:961. [PMID: 38082389 PMCID: PMC10712133 DOI: 10.1186/s12891-023-07093-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 12/04/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND There is a need to evaluate different options for estimating individual change in health-related quality of life for patients with low back pain. METHODS This was a secondary analysis of data collected at baseline and 6 weeks later in a randomized trial of 749 adults with low back pain receiving usual medical care (UMC) or UMC plus chiropractic care at a small hospital at a military training site or two large military medical centers. The mean age was 31; 76% were male and 67% were White. The study participants completed the Patient-Reported Outcomes Measurement Information System (PROMIS®)-29 v1.0 physical function, pain interference, pain intensity, fatigue, sleep disturbance, depression, anxiety, satisfaction with participation in social roles, physical summary, and mental health summary scores (T-scored with mean = 50 and standard deviation (SD) = 10 in the U.S. general population). RESULTS Reliability estimates at baseline ranged from 0.700 to 0.969. Six-week test-retest intraclass correlation estimates were substantially lower than these estimates: the median test-retest intraclass correlation for the two-way mixed-effects model was 0.532. Restricting the test-retest reliability estimates to the subset who reported they were about the same as at baseline on a retrospective rating of change item increased the median test-retest reliability to 0.686. The amount of individual change that was statistically significant varied by how reliability was estimated and which SD was used. The smallest change needed was found when internal consistency reliability and the SD at baseline were used. When these values were used, the amount of change needed to be statistically significant (p < .05) at the individual level ranged from 3.33 (mental health summary scale) to 12.30 (pain intensity item) T-score points.
CONCLUSIONS We recommend that in research studies estimates of the magnitude of individual change needed for statistical significance be provided for multiple reliability and standard deviation estimates. Whenever possible, patients should be classified based on whether they 1) improved significantly and perceived they got better, 2) improved significantly but did not perceive they were better, 3) did not improve significantly but felt they got better, or 4) did not improve significantly or report getting better.
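The dependence of the significance threshold on reliability and SD can be sketched with the standard reliable change formula, threshold = z × SD × √2 × √(1 − reliability). This is a minimal illustration that assumes this common formulation; the abstract does not state the exact computation used. The function names, the example inputs, and the convention that a positive `improvement` value means the patient got better are hypothetical. The second function mirrors the four-way classification recommended in the conclusions.

```python
import math


def significant_change_threshold(sd: float, reliability: float, z: float = 1.96) -> float:
    """Smallest individual change that is significant at the given z level.

    Assumes the standard reliable change formulation: the standard error
    of measurement is SD * sqrt(1 - reliability), and the standard error
    of a difference between two scores is sqrt(2) times that value.
    """
    sem = sd * math.sqrt(1.0 - reliability)
    return z * math.sqrt(2.0) * sem


def classify(improvement: float, threshold: float, perceived_better: bool) -> int:
    """Return group 1-4 as described in the abstract's conclusions.

    `improvement` is assumed to be signed so that positive means better
    (for symptom scales where lower is better, negate the raw change).
    """
    improved = improvement >= threshold
    if improved and perceived_better:
        return 1  # improved significantly and perceived getting better
    if improved:
        return 2  # improved significantly but did not perceive it
    if perceived_better:
        return 3  # no significant improvement but felt better
    return 4      # no significant improvement and no perceived benefit


if __name__ == "__main__":
    # Hypothetical inputs: SD of 10 T-score points, two reliability values.
    for rel in (0.70, 0.90):
        print(rel, round(significant_change_threshold(10.0, rel), 2))
```

Because the threshold shrinks as reliability rises, swapping a lower test-retest estimate for an internal consistency estimate changes which patients are counted as significantly improved, which is consistent with the recommendation to report thresholds under multiple reliability and SD estimates.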
Affiliation(s)
- Ron D Hays
  - Division of General Internal Medicine & Health Services Research, UCLA Department of Medicine, 1100 Glendon Avenue, Los Angeles, CA, 90024, USA
5. Tang X, Schalet BD, Peipert JD, Cella D. Does Scoring Method Impact Estimation of Significant Individual Changes Assessed by Patient-Reported Outcome Measures? Comparing Classical Test Theory Versus Item Response Theory. Value Health 2023; 26:1518-1524. [PMID: 37315768 DOI: 10.1016/j.jval.2023.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 05/25/2023] [Accepted: 06/01/2023] [Indexed: 06/16/2023]
Abstract
OBJECTIVES This study aimed to examine the ability of classical test theory (CTT) and item response theory (IRT) scores assessed by Patient-Reported Outcomes Measurement Information System® (PROMIS®) measures to identify significant individual changes in the setting of clinical studies, using both simulated and empirical data. METHODS We used simulated data to compare the estimation of significant individual changes between CTT and IRT scores across different conditions and a clinical trial data set to verify the simulation results. We calculated reliable change indexes to estimate significant individual changes. RESULTS For small true change, IRT scores showed a slightly higher rate of classifying change groups than CTT scores and were comparable with CTT scores for a shorter test length. Additionally, IRT scores were found to have a prominent advantage in the classification rates of change groups for medium to high true change over CTT scores. Such an advantage became prominent in a longer test length. The empirical data analysis results using an anchor-based approach further supported the above findings that IRT scores can more accurately classify participants into change groups than CTT scores. CONCLUSIONS Given that IRT scores perform better, or at least comparably, in most conditions, we recommend using IRT scores to estimate significant individual changes and identify responders to treatment. This study provides evidence-based guidance in detecting individual changes based on CTT and IRT scores under various measurement conditions and leads to recommendations for identifying responders to treatment for participants in clinical trials.
Affiliation(s)
- Xiaodan Tang
  - Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Benjamin David Schalet
  - Department of Epidemiology and Data Science, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands
- John Devin Peipert
  - Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- David Cella
  - Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
6. Peipert JD, Goble S, Isaacson J, Tang X, Wallace K, Coleman RL, Ledermann JA, Cella D. Patient-reported outcomes of maintenance rucaparib in patients with recurrent ovarian carcinoma in ARIEL3, a phase III, randomized, placebo-controlled trial. Gynecol Oncol 2023; 175:1-7. [PMID: 37262961 DOI: 10.1016/j.ygyno.2023.05.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 05/15/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]
Abstract
PURPOSE To compare NFOSI-18 Disease Related Symptoms - Physical (DRS-P), Total score, and side effect bother between maintenance rucaparib (600 mg twice daily) vs. placebo in the phase III ARIEL3 trial. METHODS ARIEL3 (NCT01968213) included patients with ovarian carcinoma who responded to second-line or later platinum-based chemotherapy. The NFOSI-18 DRS-P and Total scales were secondary endpoints. The NFOSI-18 contains a side effect impact item (GP5): "I am bothered by side effects of treatment." We compared treatment arms on change from baseline of DRS-P and Total scores using mixed models with repeated measures (MMRM). Time to first and confirmed deterioration of NFOSI-18 DRS-P and Total scales were analyzed using Cox regression. We also calculated the proportion of patients reporting moderate to high side effect bother on GP5. RESULTS In the intention-to-treat (ITT) cohort, mean change from baseline favored placebo. Compared to placebo, rucaparib was associated with higher risk of deterioration [e.g., 4-point deteriorator definition hazard ratio (HR): 1.85; 95% CI: 1.46, 2.36; median time to first deterioration on DRS-P: 1.9 vs. 7.0 months]. Confirmed deterioration results resembled those for first deterioration. Proportions of patients reporting moderate/high side effect bother on GP5 fluctuated around 20% across treatment cycles. Results in BRCA mutant and homologous recombination deficient cohorts were generally similar to those from the ITT cohort. CONCLUSION This placebo-controlled study in the maintenance therapy setting provides a unique view of the impact of PARP inhibition on the patient-reported outcomes that are commonly used in ovarian cancer clinical trials. Information regarding the adverse side effect impact of PARP inhibitors should be weighed against their clinical benefit.
Affiliation(s)
- John Devin Peipert
  - Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Xiaodan Tang
  - Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Katrine Wallace
  - Clovis Oncology, Boulder, CO, USA; Division of Epidemiology and Biostatistics, University of Illinois Chicago School of Public Health, Chicago, IL, USA
- David Cella
  - Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
7. Trigg A, Lenderking WR, Boehnke JR. Introduction to the special section: "Methodologies and considerations for meaningful change". Qual Life Res 2023; 32:1223-1230. [PMID: 37027088 DOI: 10.1007/s11136-023-03413-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 04/08/2023]
Affiliation(s)
- Andrew Trigg
  - Medical Affairs Statistics, Bayer plc, Reading, UK
- Jan R Boehnke
  - School of Health Sciences, University of Dundee, 11 Airlie Place, Dundee, DD1 4HJ, UK
8. Minimally important changes do not always reflect minimally important change; moreover, there is no need for them. Qual Life Res 2023; 32:1403-1404. [PMID: 36780034 DOI: 10.1007/s11136-023-03366-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/02/2023] [Indexed: 02/14/2023]
9. Gwaltney C, Stokes J, Aiudi A, Mazar I, Ollis S, Love E, Karaa A, Houts CR, Wirth RJ, Shields AL. Psychometric performance of the Primary Mitochondrial Myopathy Symptom Assessment (PMMSA) in a randomized, double-blind, placebo-controlled crossover study in subjects with mitochondrial disease. J Patient Rep Outcomes 2022; 6:129. [PMID: 36562873 PMCID: PMC9789285 DOI: 10.1186/s41687-022-00534-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/17/2022] [Accepted: 12/14/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND The Primary Mitochondrial Myopathy Symptom Assessment (PMMSA) is a 10-item patient-reported outcome (PRO) measure designed to assess the severity of mitochondrial disease symptoms. Analyses of data from a clinical trial with PMM patients were conducted to evaluate the psychometric properties of the PMMSA and to provide score interpretation guidelines for the measure. METHODS The PMMSA was completed as a daily diary for approximately 14 weeks by individuals in a Phase 2 randomized, placebo-controlled crossover trial evaluating the safety, tolerability, and efficacy of subcutaneous injections of elamipretide in patients with mitochondrial disease. In addition to the PMMSA, performance-based assessments, clinician ratings, and other PRO measures were also completed. Descriptive statistics, psychometric analyses, and score interpretation guidelines were evaluated for the PMMSA. RESULTS Participants (N = 30) had a mean age of 45.3 years, with the majority of the sample being female (n = 25, 83.3%) and non-Hispanic white (n = 29, 96.6%). The 10 PMMSA items assessing a diverse symptomology were not found to form a single underlying construct. However, four items assessing tiredness and muscle weakness were grouped into a "general fatigue" domain score. The PMMSA Fatigue 4 summary score (4FS) demonstrated stable test-retest scores, internal consistency, correlations with the scores produced by reference measures, and the ability to differentiate between different global health levels. Changes on the PMMSA 4FS were also related to change scores produced by the reference measures. PMMSA severity scores were higher for the symptom rated as "most bothersome" by each subject relative to the remaining nine PMMSA items (most bothersome symptom mean = 2.88 vs. 2.18 for other items). Distribution- and anchor-based evaluations suggested that reduction in weekly scores between 0.79 and 2.14 (scale range: 4-16) may represent a meaningful change on the PMMSA 4FS and reduction in weekly scores between 0.03 and 0.61 may represent a responder for each of the remaining six non-fatigue items, scored independently. CONCLUSIONS Upon evaluation of its psychometric properties, the PMMSA, specifically the 4FS domain, demonstrated strong reliability and construct-related validity. The PMMSA can be used to evaluate treatment benefit in clinical trials with individuals with PMM. TRIAL REGISTRATION ClinicalTrials.gov identifier NCT02805790; registered June 20, 2016; https://clinicaltrials.gov/ct2/show/NCT02805790.
Affiliation(s)
- Chad Gwaltney
  - Gwaltney Consulting Group, 1 Bucks Trail, Westerly, RI USA
- Jonathan Stokes
  - Adelphi Values (or employed at Adelphi Values at time of conduct of research), Boston, MA USA
- Anthony Aiudi
  - Stealth BioTherapeutics Inc., Newton, MA USA
- Iyar Mazar
  - Adelphi Values (or employed at Adelphi Values at time of conduct of research), Boston, MA USA
- Sarah Ollis
  - Adelphi Values (or employed at Adelphi Values at time of conduct of research), Boston, MA USA
- Emily Love
  - Adelphi Values (or employed at Adelphi Values at time of conduct of research), Boston, MA USA
- Amel Karaa
  - Massachusetts General Hospital, Boston, MA USA
- R. J. Wirth
  - Vector Psychometric Group LLC, Chapel Hill, NC USA
- Alan L. Shields
  - Adelphi Values (or employed at Adelphi Values at time of conduct of research), Boston, MA USA
10. Terluin B. Likely change indexes do not always index likely change; moreover, there is no need for them. Qual Life Res 2022; 32:1401-1402. [PMID: 36469213 DOI: 10.1007/s11136-022-03314-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/29/2022] [Indexed: 12/12/2022]