1. Vach W, Saxer F. Anchor-based minimal important difference values are often sensitive to the distribution of the change score. Qual Life Res 2024; 33:1223-1232. PMID: 38319488; PMCID: PMC11045581; DOI: 10.1007/s11136-024-03610-6.
Abstract
PURPOSE Anchor-based studies are currently the most popular approach to determining a minimal important difference (MID) value for an outcome variable. However, a variety of construction methods for such values exist, which constitutes a challenge to the field. To distinguish between more and less adequate construction methods, meaningful minimal requirements can be helpful. For example, MID values should not reflect the intervention(s) the patients were exposed to in the study used for construction, as the values should later allow interventions to be compared. This requires that they are not sensitive to the distribution of the observed change score. This study investigates to what degree established construction methods fulfil this minimal requirement. METHODS Six construction methods were considered, covering both very popular and recently suggested methods. The sensitivity of MID values to the distribution of the change score was investigated for these six methods in a simulation study. RESULTS Five of the six construction methods yielded MID values that are sensitive to the distribution of the change score to a degree that questions their usefulness. Insensitivity can be obtained by using construction methods based solely on an estimate of the conditional distribution of the anchor variable given the change score. CONCLUSION In future, the computation of MID values should be based on construction methods that avoid sensitivity to the distribution of the change score.
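The sensitivity described above can be illustrated with a toy simulation (not the paper's actual design; the change-score distributions and the deterministic anchor rule below are hypothetical): under one fixed anchor rule, the widely used "mean change" construction yields different MID values depending on the distribution of the change score.

```python
import random
import statistics

def mean_change_mid(changes, anchor_improved):
    """'Mean change' MID construction: the mean change score among
    patients whose anchor rating is 'minimally improved'."""
    return statistics.mean(c for c, imp in zip(changes, anchor_improved) if imp)

def simulate_mid(population_mean_change, n=5000):
    """Draw change scores and apply a deterministic anchor rule:
    patients with a true change between 2 and 8 points rate themselves
    minimally improved (a deliberate simplification; real anchors are noisy)."""
    rng = random.Random(0)
    changes = [rng.gauss(population_mean_change, 5.0) for _ in range(n)]
    improved = [2.0 < c < 8.0 for c in changes]
    return mean_change_mid(changes, improved)

mid_stable = simulate_mid(0.0)     # population with little average change
mid_improving = simulate_mid(5.0)  # population with substantial average change
# Same anchor rule, different change-score distribution -> different MID.
```

A distribution-insensitive construction would, per the authors' conclusion, condition on the change score rather than average over whichever patients happen to improve in a given study.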
Affiliation(s)
- Werner Vach
- Department of Environmental Sciences, University of Basel, Spalenring 145, CH-4055, Basel, Switzerland.
- Basel Academy for Quality and Research in Medicine, Basel, Switzerland.
- Franziska Saxer
- Medical Faculty, University of Basel, Basel, Switzerland.
- Novartis Institutes for Biomedical Research, Basel, Switzerland.
2. Clarke NA, Braverman J, Worthy G, Shaw JW, Bennett B, Dhanda D, Cocks K. A Review of Meaningful Change Thresholds for EORTC QLQ-C30 and FACT-G Within Oncology. Value Health 2024; 27:458-468. PMID: 38191023; DOI: 10.1016/j.jval.2023.12.012.
Abstract
OBJECTIVES This literature review provides an overview of meaningful change thresholds for the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (QLQ-C30) and the Functional Assessment of Cancer Therapy - General (FACT-G) used across hematological cancers and solid tumors (melanoma, lung, bladder, and prostate). METHODS Embase, MEDLINE, and PubMed were searched to identify relevant oncology publications from 2016 to 2021. Label claims from the US Food and Drug Administration and the European Medicines Agency for 7 recently approved drugs (pembrolizumab, atezolizumab, glasdegib, gilteritinib, tisagenlecleucel, axicabtagene ciloleucel, and daratumumab plus hyaluronidase-fihj) were reviewed. RESULTS Publications providing guidance on meaningful change thresholds for the QLQ-C30 displayed a growing trend away from broad "legacy" thresholds (10 points for all QLQ-C30 scales) toward deriving "contemporary" thresholds (eg, subscale specific, population specific). Contemporary publications generally provide guidance on selecting thresholds for specific scales that distinguish between improvement and worsening (eg, for the QLQ-C30 subscales). This trend was not as clear for the FACT-G, with less new guidance available. Most clinical trials used in regulatory label submissions have used thresholds of 10 points for the QLQ-C30 subscales and 3 to 7 points for the FACT-G total score. Despite the availability of more recent guidelines, contemporary meaningful change thresholds seem slow to emerge in the published literature and regulatory labels. CONCLUSIONS Trialists should consider using contemporary thresholds, rather than legacy thresholds, for QLQ-C30 endpoints. Thresholds derived in a similar patient population should be used where available. Further work is required to provide these across a broader range of cancer sites.
Affiliation(s)
- Nathan A Clarke
- Statistics and Programming, Adelphi Values, Bollington, Cheshire, England, UK.
- Julia Braverman
- Worldwide Health and Economic Outcomes Research, Bristol Myers Squibb, Princeton, NJ, USA.
- Gill Worthy
- Statistics and Programming, Adelphi Values, Bollington, Cheshire, England, UK.
- James W Shaw
- Worldwide Health and Economic Outcomes Research, Bristol Myers Squibb, Princeton, NJ, USA.
- Bryan Bennett
- Worldwide Health and Economic Outcomes Research, Bristol Myers Squibb, Uxbridge, England, UK.
- Devender Dhanda
- Worldwide Health and Economic Outcomes Research, Bristol Myers Squibb, Princeton, NJ, USA.
- Kim Cocks
- Statistics and Programming, Adelphi Values, Bollington, Cheshire, England, UK.
3. Wells JR, Hillier A, Holland R, Mwacalimba K, Noli C, Panter C, Tatlock S, Wright A. Development and validation of a questionnaire to assess owner and canine quality-of-life and treatment satisfaction in canine allergic dermatitis. Vet Dermatol 2024. PMID: 38361109; DOI: 10.1111/vde.13242.
Abstract
BACKGROUND Animal and owner quality-of-life (QoL) are pivotal in treatment decisions. Accurate measurement of owner-reported QoL and treatment satisfaction (TS) supports the evaluation of disease burden and treatment benefit. OBJECTIVES To develop and evaluate an owner-completed canine dermatitis QoL and TS questionnaire (CDQoL-TSQ) in allergic dogs. MATERIALS AND METHODS The CDQoL-TSQ was drafted following a review of existing measures and expert input. Content validity was assessed through interviews with owners of allergic dogs. Psychometric properties of the QoL domains (Canine QoL, Owner QoL) were evaluated, and guidance for score interpretation was derived. RESULTS Twenty dog owners were interviewed; item wording was amended after the first 10 interviews. Data from 211 owners were used in the psychometric evaluation. The Canine QoL domain demonstrated strong internal consistency (α = 0.89), test-retest reliability (ICC2,1 = 0.844), moderate convergent validity (r = 0.41) and moderate-to-high known-groups validity (effect size 0.37-0.64). The Owner QoL domain demonstrated strong internal consistency (α = 0.73), high convergent validity (r = 0.63) and moderate-to-high known-groups validity (0.43-0.63); test-retest reliability approached moderate strength (ICC2,1 = 0.490). Group-level interpretation analysis showed a minimal important difference of 7.0-13.6 points for dogs and 13.0-13.6 points for owners. At the individual level, a change of 6.3 or 12.5 points for dogs, and 12.5 or 18.8 points for owners, indicates a response. CONCLUSIONS AND CLINICAL RELEVANCE The CDQoL-TSQ is a two-part assessment of QoL and TS in canine allergic dermatitis. The QoL questionnaire demonstrated validity and reliability, and interpretation guidance for its scores was derived, making it suitable for use in research and practice. The TS module is suitable for clinical use to improve owner-veterinarian communication.
Affiliation(s)
- J R Wells
- Patient-Centered Outcomes, Adelphi Values Ltd, Bollington, UK.
- Clinical Outcomes Assessment Department, Sanofi, UK.
- C Noli
- Servizi Dermatologici Veterinari, Peveragno, Italy.
- C Panter
- Patient-Centered Outcomes, Adelphi Values Ltd, Bollington, UK.
- S Tatlock
- Patient-Centered Outcomes, Adelphi Values Ltd, Bollington, UK.
- A Wright
- Zoetis, Parsippany, New Jersey, USA.
4. Terluin B, Trigg A, Fromy P, Schuller W, Terwee CB, Bjorner JB. Estimating anchor-based minimal important change using longitudinal confirmatory factor analysis. Qual Life Res 2023. PMID: 38151593; DOI: 10.1007/s11136-023-03577-w.
Abstract
PURPOSE The minimal important change (MIC) is defined as the smallest within-individual change in a patient-reported outcome measure (PROM) that patients on average perceive as important. We describe a method to estimate this value based on longitudinal confirmatory factor analysis (LCFA). The method is evaluated in simulated and real data and compared with a recently published method based on longitudinal item response theory (LIRT). We also examined the effect of sample size on the bias and precision of the estimate. METHODS We simulated 108 samples with various characteristics in which the true MIC was simulated as the mean of individual MICs, and estimated MICs based on LCFA and LIRT. Additionally, both MICs were estimated in existing PROMIS Pain Behavior data from 909 patients. In another set of 3888 simulated samples with sample sizes of 125, 250, 500, and 1000, we estimated LCFA-based MICs. RESULTS The MIC was recovered equally well with the LCFA method as with the LIRT method, but the LCFA analyses were more than 50 times faster. In the Pain Behavior data (with higher scores indicating more pain behavior), the LCFA-based MIC for improvement was estimated to be 2.85 points (on a simple sum scale ranging from 14 to 42), whereas the LIRT-based MIC was estimated to be 2.60. The sample-size simulations showed that smaller samples decreased the precision of the LCFA-based MIC and increased the risk of model non-convergence. CONCLUSION The MIC can be estimated accurately using LCFA, but sample sizes should preferably be greater than 125.
Affiliation(s)
- Berend Terluin
- Department of General Practice, Amsterdam UMC, Vrije Universiteit Amsterdam, de Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands.
- Amsterdam Public Health research institute, Amsterdam, The Netherlands.
- Andrew Trigg
- Medical Affairs Statistics, Bayer plc, Reading, UK.
- Piper Fromy
- SeeingTheta, 2 Chemin des Vaux, 49400, Saumur, France.
- Wouter Schuller
- Department of Epidemiology and Data Science, Amsterdam UMC, Vrije Universiteit Amsterdam, de Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands.
- Spine Clinic, Provinciale weg 152-154, 1506 ME, Zaandam, The Netherlands.
- Caroline B Terwee
- Amsterdam Public Health research institute, Amsterdam, The Netherlands.
- Department of Epidemiology and Data Science, Amsterdam UMC, Vrije Universiteit Amsterdam, de Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands.
- Jakob B Bjorner
- QualityMetric, Johnston, Rhode Island, USA.
- Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
- National Research Centre for the Working Environment, Copenhagen, Denmark.
5. Trigg A, Lenderking WR, Boehnke JR. Introduction to the special section: "Methodologies and considerations for meaningful change". Qual Life Res 2023; 32:1223-1230. PMID: 37027088; DOI: 10.1007/s11136-023-03413-1.
Affiliation(s)
- Andrew Trigg
- Medical Affairs Statistics, Bayer plc, Reading, UK.
- Jan R Boehnke
- School of Health Sciences, University of Dundee, 11 Airlie Place, Dundee, DD1 4HJ, UK.
6. Estimating meaningful thresholds for multi-item questionnaires using item response theory. Qual Life Res 2023; 32:1819-1830. PMID: 36780033; PMCID: PMC10172229; DOI: 10.1007/s11136-023-03355-8.
Abstract
PURPOSE Meaningful thresholds are needed to interpret patient-reported outcome measure (PROM) results. This paper introduces a new method, based on item response theory (IRT), to estimate such thresholds. The performance of the method is examined in simulated datasets and two real datasets, and compared with other methods. METHODS The IRT method involves fitting an IRT model to the PROM items and an anchor item indicating the criterion state of interest. The difficulty parameter of the anchor item represents the meaningful threshold on the latent trait. The latent threshold is then linked to the corresponding expected PROM score. We simulated 4500 item-response datasets for a 10-item PROM and an anchor item. The datasets varied with respect to the mean and standard deviation of the latent trait and the reliability of the anchor item. The real datasets consisted of a depression scale with a clinical depression diagnosis as anchor variable and a pain scale with a patient acceptable symptom state (PASS) question as anchor variable. RESULTS The new IRT method recovered the true thresholds accurately across the simulated datasets. The other methods, except one, produced biased threshold estimates whenever the prevalence of the criterion state differed from 0.5. The adjusted predictive modeling method matched the new IRT method (also in the real datasets) but showed some residual bias if the prevalence was smaller than 0.3 or greater than 0.7. CONCLUSIONS The new IRT method accurately recovers meaningful (interpretational) thresholds for multi-item questionnaires, provided that the data satisfy the assumptions for IRT analysis.
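The linking step described in the METHODS (from the anchor item's difficulty parameter to an expected PROM score) can be sketched for the simpler dichotomous 2PL case; the item parameters and the latent threshold of 0.3 below are made-up illustrations, not values from the paper.

```python
import math

def expected_sum_score(theta, items):
    """Expected sum score of a dichotomous 2PL scale at latent level theta.
    items: iterable of (discrimination, difficulty) pairs."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for a, b in items)

# Hypothetical 10-item PROM
items = [(1.5, -1.0 + 0.2 * k) for k in range(10)]

# If the anchor item's difficulty (the latent threshold) were 0.3, the
# corresponding meaningful threshold on the observed score metric is:
threshold_on_score_scale = expected_sum_score(0.3, items)
```

Because the expected score is monotone in theta, the latent threshold maps to a unique observed-score threshold.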
7. Terluin B, Terwee C, Eekhout I. Minimal Clinically Important Difference Estimates Are Biased by Adjusting for Baseline Severity, Not by Regression to the Mean. J Athl Train 2022; 57:1122-1123. PMID: 36656305; PMCID: PMC9875704; DOI: 10.4085/1062-6050-1006.22.
8. Bjorner JB, Terluin B, Trigg A, Hu J, Brady KJS, Griffiths P. Establishing thresholds for meaningful within-individual change using longitudinal item response theory. Qual Life Res 2022; 32:1267-1276. PMID: 35870045; PMCID: PMC10123029; DOI: 10.1007/s11136-022-03172-5.
Abstract
PURPOSE Thresholds for meaningful within-individual change (MWIC) are useful for interpreting patient-reported outcome measures (PROM). Transition ratings (TR) have been recommended as anchors to establish MWIC. Traditional statistical methods for analyzing MWIC, such as mean change analysis, receiver operating characteristic (ROC) analysis, and predictive modeling, ignore problems of floor/ceiling effects and measurement error in the PROM scores and the TR item. We present a novel approach to MWIC estimation for multi-item scales using longitudinal item response theory (LIRT). METHODS A Graded Response LIRT model for baseline and follow-up PROM data was expanded to include a TR item measuring latent change. The LIRT threshold parameter for the TR established the MWIC threshold on the latent metric, from which the observed PROM score MWIC threshold was estimated. We compared the LIRT approach and traditional methods using an example dataset with baseline and three follow-up assessments differing by magnitude of score improvement, variance of score improvement, and baseline-follow-up score correlation. RESULTS The LIRT model provided good fit to the data. LIRT estimates of the observed PROM MWIC varied between 3 and 4 points of score improvement. In contrast, results from traditional methods varied from 2 to 10 points and were strongly associated with the proportion of self-rated improvement. The best agreement between methods was seen when approximately 50% rated their health as improved. CONCLUSION Results from traditional analyses of anchor-based MWIC are impacted by study conditions. LIRT constitutes a promising and more robust analytic approach to identifying thresholds for MWIC.
9. Perspective on Riddle and Dumenci's 'Commentary on finding meaning in patient-reported outcome change scores: a seemingly unquenchable thirst for understanding'. Osteoarthritis Cartilage 2022; 30:773-774. PMID: 35358699; DOI: 10.1016/j.joca.2022.03.009.
10. Improved adjusted minimal important change took reliability of transition ratings into account. J Clin Epidemiol 2022; 148:48-53. DOI: 10.1016/j.jclinepi.2022.04.018.
11. Liebmann EP, Resnick SG, Hoff RA, Katz IR. Interpreting patient reports of perceived change during treatment for depression: Findings from the Veterans Outcome Assessment survey. Psychiatry Res 2022; 309:114402. PMID: 35114571; DOI: 10.1016/j.psychres.2022.114402.
Abstract
This study addressed ongoing questions about the meaning of patients' perceptions of change during treatment. The study used data from the Veterans Outcome Assessment survey for patients with a depressive disorder, without mental health comorbidities, treated in Department of Veterans Affairs general mental health clinics (n = 694). Perceived changes in problems/symptoms, other domains, and the quality of communication with providers were evaluated with items from the Experience of Care & Health Outcomes (ECHO) survey. Depressive symptoms were measured with the Patient Health Questionnaire-9 (PHQ-9). Linear regression models evaluated associations of perceived change at 3-months post-baseline with observed change in PHQ-9 scores, scores on other patient-reported outcome measures (PROMs), and ratings of communication with providers. Patients' reports of their clinical condition at follow-up together with ratings of communication accounted for approximately one-third of the variance in patients' perceptions of change. Adding change-scores based on baseline and follow-up scores on the PHQ-9 and other PROMs did not improve model fit. The findings suggest that patient reports of perceived change during treatment reflect their current clinical state and their experience of care more closely than actual changes in the PHQ-9 or other PROMs.
Affiliation(s)
- Edward P Liebmann
- VA Connecticut Healthcare System, West Haven, CT, United States; Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States.
- Sandra G Resnick
- VA Office of Mental Health and Suicide Prevention, Northeast Program Evaluation Center, West Haven, CT, United States; Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States.
- Rani A Hoff
- VA Office of Mental Health and Suicide Prevention, Northeast Program Evaluation Center, West Haven, CT, United States; Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States.
- Ira R Katz
- Department of Veterans Affairs, VA Office of Mental Health and Suicide Prevention, Washington, DC, United States.
12. Vanier A, Leroy M, Hardouin JB. Toward a rigorous assessment of the statistical performances of methods to estimate the Minimal Important Difference of Patient-Reported Outcomes: a protocol for a large-scale simulation study. Methods 2022; 204:396-409. PMID: 35202798; DOI: 10.1016/j.ymeth.2022.02.006.
Abstract
Interpreting observed changes over time in Patient-Reported Outcome (PRO) measures is still considered a challenge. Indeed, concluding that an observed change at the group level is statistically significant does not necessarily mean that the change is meaningful from the perspective of the patient. To help interpret within- and/or between-group changes in the measure over time, it is useful to estimate the Minimal Important Difference (MID) of the instrument, the smallest change value that patients perceive as meaningful. In the last 30 years, a plethora of methods and estimators have been proposed to derive this MID value using clinical data from samples of patients. MIDs for hundreds of PROs have been estimated, frequently with substantial variability in the results depending on the method used. Nonetheless, a rigorous assessment of the statistical performances of the many proposed estimation methods by experimental design, such as a Monte Carlo study, has never been performed. The purpose of this paper is to thoroughly describe a protocol for a large-scale simulation study designed to investigate the statistical performances, especially bias against a true population value, of the commonly proposed MID estimators. The paper describes how the investigated methods and estimators were retained after a systematic review; how a conceptual model was designed that formally defines the true population MID value; and how this conceptual model was translated into a model simulating item responses to a hypothetical PRO at two measurement times, along with responses to a Patient Global Rating of Change at the second time, under the constraint of a known true MID value. A statistical analysis plan is presented to determine whether working hypotheses about appropriate MID estimators will be verified. Strengths, assumptions, and limits of the simulation model are exposed. Finally, we show how this protocol could be the basis for fostering future methodological research on the issue of interpreting changes in PRO measures.
Affiliation(s)
- Antoine Vanier
- Inserm - University of Nantes - University of Tours, UMR U1246 Sphere "Methods in Patient-centered Outcomes and Health Research", Nantes 44200, France; Haute Autorité de Santé, Assessment and Access to Innovation Direction, Pharmaceutical Drugs Assessment Department, Saint-Denis 93210, France.
- Maxime Leroy
- University Hospital of Nantes, Unit of Methodology and Biostatistics, Nantes 44000, France.
- Jean-Benoit Hardouin
- Inserm - University of Nantes - University of Tours, UMR U1246 Sphere "Methods in Patient-centered Outcomes and Health Research", Nantes 44200, France; University Hospital of Nantes, Unit of Methodology and Biostatistics, Nantes 44000, France.
13. Terluin B, Griffiths P, Trigg A, Terwee CB, Bjorner JB. Present state bias in transition ratings was accurately estimated in simulated and real data. J Clin Epidemiol 2021; 143:128-136. PMID: 34965478; DOI: 10.1016/j.jclinepi.2021.12.024.
Abstract
OBJECTIVE Patient-reported transition ratings are supposed to reflect the change between a previous baseline health state and the present follow-up state, but may reflect the present state to a greater extent. This so-called 'present state bias' (PSB) potentially threatens the validity of transition ratings. Several criteria have been proposed to assess PSB. We examined how well these criteria perform and to what extent confirmatory factor analysis (CFA) for categorical data provides an accurate assessment of the degree of PSB. STUDY DESIGN AND SETTING We simulated multiple samples with baseline and follow-up item responses to a hypothetical questionnaire, together with transition ratings. The samples varied with respect to various distributional characteristics and the degree of PSB. The performance of criteria proposed in the literature, and of a new CFA-based criterion, was evaluated by the proportion of explained variance in PSB. In addition, four real datasets were analyzed. RESULTS The known criteria explained 36-74% of the variance in PSB. The new CFA-based criterion, namely the ratio of the factor loadings of the transition ratings plus one, explained 81-98% of the variance in PSB across the samples. CONCLUSION Present state bias in transition ratings can be estimated accurately using CFA.
Affiliation(s)
- Berend Terluin
- Department of General Practice, Amsterdam Public Health research institute, Amsterdam UMC, Vrije Universiteit Amsterdam, de Boelelaan 1117, 1081 HV Amsterdam, The Netherlands.
- Andrew Trigg
- Patient-Centered Outcomes, Adelphi Values, Adelphi Mill, Bollington, Cheshire, SK10 5JB United Kingdom.
- Caroline B Terwee
- Department of Epidemiology and Data Science, Amsterdam Public Health research institute, Amsterdam UMC, Vrije Universiteit Amsterdam, de Boelelaan 1117, 1081 HV Amsterdam, The Netherlands.
- Jakob B Bjorner
- QualityMetric, Johnston, Rhode Island, USA; Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
14. Triangulation of multiple meaningful change thresholds for patient-reported outcome scores. Qual Life Res 2021; 30:2755-2764. PMID: 34319532; DOI: 10.1007/s11136-021-02957-4.
Abstract
PURPOSE The notion of what constitutes meaningful differences or changes in patient-reported outcome scores is represented by meaningful change thresholds (MCTs). Applying multiple methods to estimate MCTs inevitably results in a range of estimates; however, a single estimate or small range is sought in practice to enable consistent interpretation of scores. While current recommendations for triangulation are appropriate in principle, the vital step of moving from all estimates to a single value or small range lacks clarity and is subjective in nature. This article reviews current triangulation approaches and provides more robust recommendations than are currently available. METHODS Current approaches to triangulation are described and discussed. Anchor-based estimates are focussed upon because they are recognized as the most valid and developed approach. Recommendations for triangulation are provided. RESULTS A correlation-weighted average of MCT estimates is recommended to triangulate multiple MCT estimates derived from a single study into a single value, with increased weighting given to stronger anchor measures. The choice of method to triangulate estimates from several published studies depends heavily on the availability of information within the publications. MCTs designed for between-group differences, within-group changes, and within-individual changes should be considered separately. CONCLUSION The recommendations in this article provide a reliable and transparent approach to triangulation when a single value is sought, based on meta-analytic approaches. This approach is preferable to a simple mean of estimates in which all are weighted equally, or to 'eyeballing' plotted estimates, which is unreliable. We encourage researchers to adopt these methods, but to remain aware of the limitations of each method and of further nuances in study design that result in heterogeneity. Sensitivity analyses with a range of plausible values are encouraged; however, the recommendations provide a suitable starting value for inferences. Unresolved issues in triangulation, requiring further exploration, are highlighted.
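The recommended correlation-weighted average can be sketched as follows; weighting each estimate by the absolute correlation of its anchor with the change score is one natural reading of "increased weighting given to stronger anchor measures", and all numbers are hypothetical.

```python
def triangulate_mct(estimates, anchor_correlations):
    """Triangulate several anchor-based MCT estimates into a single value,
    weighting each estimate by its anchor's strength (absolute
    anchor/change-score correlation)."""
    weights = [abs(r) for r in anchor_correlations]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total

# Three hypothetical anchor-based MCT estimates for one scale
single_value = triangulate_mct([5.0, 8.0, 10.0], [0.50, 0.35, 0.30])
```

With equal correlations this reduces to the simple, equally weighted mean that the article argues against defaulting to.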
15. Griffiths P, Terluin B, Trigg A, Schuller W, Bjorner JB. A confirmatory factor analysis approach was found to accurately estimate the reliability of transition ratings. J Clin Epidemiol 2021; 141:36-45. PMID: 34464687; DOI: 10.1016/j.jclinepi.2021.08.029.
Abstract
INTRODUCTION Transition ratings (TRs) are single-item measures that ask patients to report on their health change. They allow a simple assessment of improvement or deterioration and are frequently used as an "anchor" to determine interpretation thresholds on a patient-reported outcome measure (PROM). Despite their widespread use, a routinely applicable method to assess their reliability is lacking. This paper introduces a method to estimate the reliability of TRs based on confirmatory factor analysis (CFA) for categorical data. METHOD We modelled longitudinal PROM data as independent factors representing Time 1 and Time 2 in a CFA model. PROM items taken at Time 1 (T1) loaded on the first factor, while the same items taken at Time 2 (T2) loaded on the second. The TR item loaded onto both the T1 and T2 factors. Three models with various constraints on the loadings and thresholds were examined. The communality (R2) statistic was used as a measure of the TR reliability. The approach was evaluated using simulated data and exemplified in four empirical datasets. RESULTS The simplest CFA model, without constraints on the item loadings and thresholds, performed equivalently to models with constraints on loadings and thresholds over time. Further constraining the TR item loadings to be equal and opposite over time caused biased TR reliability estimates if the T1 and T2 loadings differed in magnitude. In the four empirical datasets, the reliability of the TRs ranged from 0.27 to 0.48. In three examples the TR had a numerically stronger loading on T2 than on T1. DISCUSSION AND CONCLUSIONS The results support the use of the proposed method for understanding the reliability of TRs. The empirical results reflect the typical range of reliability previously reported for single items. Methodological considerations to improve TR reliability are presented, and developments of this method are posited.
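Because the Time 1 and Time 2 factors in the model above are independent, the TR item's communality reduces to the sum of its squared standardized loadings. A minimal sketch (the loadings are hypothetical; a real analysis would estimate them from a categorical-data CFA):

```python
def tr_reliability(loading_t1, loading_t2):
    """Communality (R^2) of a transition-rating item loading on two
    orthogonal factors (Time 1 and Time 2), used as its reliability.
    Assumes standardized loadings."""
    return loading_t1 ** 2 + loading_t2 ** 2

# A TR typically loads negatively on Time 1 and positively on Time 2
rel = tr_reliability(-0.40, 0.55)
```

The value for these illustrative loadings falls inside the 0.27-0.48 range the paper reports for its four empirical datasets.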
Affiliation(s)
- Berend Terluin
- Department of General Practice, Amsterdam Public Health research institute, Amsterdam UMC, Vrije Universiteit Amsterdam, de Boelelaan 1117, Amsterdam, 1081 HV, The Netherlands.
- Wouter Schuller
- Amsterdam UMC, Location VUmc, Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, de Boelelaan 1117, Amsterdam, The Netherlands; Spine Clinic, Provincialeweg 152-154, Zaandam, 1506 ME, The Netherlands.
- Jakob Bue Bjorner
- QualityMetric, LLC, Johnston, RI, USA; Department of Public Health, University of Copenhagen, Copenhagen, Denmark; National Research Centre for the Working Environment, Copenhagen, Denmark.