51
|
Young T, Yang Y, Brazier JE, Tsuchiya A, Coyne K. The first stage of developing preference-based measures: constructing a health-state classification using Rasch analysis. Qual Life Res 2008; 18:253-65. [PMID: 19082759 DOI: 10.1007/s11136-008-9428-0] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Received: 03/31/2008] [Accepted: 11/20/2008] [Indexed: 10/21/2022]
Abstract
OBJECTIVE To set out the methodological process for using Rasch analysis alongside classical psychometric methods in the development of a health-state classification that is amenable to valuation. METHODS The overactive bladder questionnaire is used to illustrate a five step process for deriving a reduced health-state classification from an existing non-preference-based health-related quality-of-life instrument. Step I uses factor analysis to establish instrument dimensions, step II excludes items that do not meet the initial validation process and step III uses criteria based on Rasch analysis and other psychometric testing to select the final items for the health-state classification. In step IV, item levels are examined and Rasch analysis is used to explore the possibility of reducing the number of item levels. Step V repeats steps I-IV on alternative data sets in order to validate the selection of items for the health-state classification. RESULTS The techniques described enable the construction of a five-dimension health-state classification, the OAB-5D, amenable to valuation tasks that will allow the derivation of preference weights. CONCLUSIONS The health-related quality of life of patients with conditions like overactive bladder can be valued and quality adjustment weights estimated for calculation of quality-adjusted life years.
Affiliation(s)
- Tracey Young
- School of Health and Related Research, HEDS, University of Sheffield, Regent Court, 30 Regent Street, Sheffield, S1 4DA, UK.
|
52
|
Tsutsumi A, Iwata N, Wakita T, Kumagai R, Noguchi H, Kawakami N. Improving the measurement accuracy of the effort-reward imbalance scales. Int J Behav Med 2008; 15:109-19. [PMID: 18569129 DOI: 10.1080/10705500801929718] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]
Abstract
BACKGROUND The effort-reward imbalance (ERI) scale items are answered in a two-step process, but the justification is questioned for the formulation of summary measure by combining information rated in two steps. PURPOSE To examine whether the basic prerequisites of the ERI scales are empirically satisfied and to seek ways to improve the rating procedure. METHODS A polytomous item response theory (IRT) model was applied to the responses of 20,256 workers who completed the ERI scales. To determine the most appropriate statistical justification, three alternative scoring algorithms were compared with regard to the test properties revealed by the IRT analyses and efficiencies of screening performance and criterion validity against depressive symptomatology. RESULTS The rated raw-score units did not reflect the hypothesized order of lowest stress levels to highest stress levels. Exchanging or collapsing the lowest two categories of a Likert scaled item, where data of different quality are combined, solved this problem, thereby making the test content more appropriate. The modified rating improved the efficiencies of screening performance and the correlation of the stress summary measures against health criterion, i.e., depression. CONCLUSION An avoidable measurement error exists in the current ERI scales. Modifying the rating procedure can improve the measurement accuracy.
Affiliation(s)
- Akizumi Tsutsumi
- Occupational Health Training Center, University of Occupational and Environmental Health, Noboru Iwata, Japan.
|
53
|
Deriving utility scores from the SF-36 health instrument using Rasch analysis. Qual Life Res 2008; 17:1183-93. [DOI: 10.1007/s11136-008-9395-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 12/09/2007] [Accepted: 09/04/2008] [Indexed: 10/21/2022]
|
54
|
Kwan JW, Cronkite RC, Yiu A, Goldstein MK, Kazis L, Cheung RC. The impact of chronic hepatitis C and co-morbid illnesses on health-related quality of life. Qual Life Res 2008; 17:715-24. [PMID: 18427949 DOI: 10.1007/s11136-008-9344-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 11/17/2007] [Accepted: 04/01/2008] [Indexed: 11/30/2022]
Abstract
OBJECTIVES Determine the relative impact of chronic hepatitis C (CHC) and co-morbid illnesses on health-related quality of life (HRQoL) in 3023 randomly selected veterans with known hepatitis C virus antibody (anti-HCV) status who previously completed a veteran-specific HRQoL questionnaire (SF-36V). METHODS Multiple regression analyses were performed to measure the relative contribution of anti-HCV status, four demographic variables, and ten common medical and six psychiatric co-morbidities to HRQoL between 303 anti-HCV(+) and 2720 anti-HCV(-) patients. RESULTS Anti-HCV(+) veterans were younger, reported a lower HRQoL on seven of eight 36-Item Short Form Health Survey for Veterans (SF-36V) subscales (P ≤ 0.001) and the mental component summary (MCS) scale (P < 0.001). The ten medical and six psychiatric co-morbidities had variable impact on predicting lower HRQoL in both groups. After adjusting for demographic variables and co-morbid illnesses, we found that anti-HCV(+) patients reported a significantly lower MCS score (P < 0.001) and a trend toward a lower physical component summary (PCS) score (P < 0.07) compared to anti-HCV(-) veterans. Among the anti-HCV(+) veterans, co-morbid medical illnesses contributed to impaired PCS but not to MCS. CONCLUSIONS Veterans with CHC were younger than HCV(-) veterans and hence less likely to have other co-morbid medical illnesses. Medical co-morbidities seen in those veterans with CHC contribute to impaired PCS but not MCS. Anti-HCV(+) status negatively affects HRQoL, particularly MCS, independently of medical and psychiatric co-morbidities.
Affiliation(s)
- Jeffrey W Kwan
- Division of Gastroenterology and Hepatology, Department of Medicine, VA Palo Alto Health Care System, (154C), 3801 Miranda Avenue, Palo Alto, CA 94304, USA
|
55
|
Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol 2008; 61:17-33. [PMID: 18083459 DOI: 10.1016/j.jclinepi.2006.06.025] [Citation(s) in RCA: 366] [Impact Index Per Article: 22.9] [Received: 01/30/2006] [Revised: 05/24/2006] [Accepted: 06/06/2006] [Indexed: 01/18/2023]
Abstract
OBJECTIVE The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. STUDY DESIGN AND SETTING Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. RESULTS IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. CONCLUSION The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
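The precision figures in this abstract can be cross-checked with the usual conversion between a standard error and reliability, reliability = 1 - SE^2 / SD^2. The sketch below assumes that conversion and the SD of 10 stated for the norming metric; the function name is illustrative and does not come from the paper.

```python
def reliability_from_se(se: float, sd: float = 10.0) -> float:
    """Reliability implied by a standard error `se` on a scale whose
    population standard deviation is `sd` (assumed formula: 1 - se^2 / sd^2)."""
    return 1.0 - (se / sd) ** 2


# Standard error bound quoted in the abstract, on the metric normed to SD = 10.
implied = reliability_from_se(2.2)
print(f"implied reliability: {implied:.4f}")
```

Under these assumptions a standard error just below 2.2 corresponds to a reliability just above 0.95, which agrees with the two figures the abstract reports together.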
|
56
|
Improvements in short-form measures of health status: Introduction to a series. J Clin Epidemiol 2008; 61:1-5. [DOI: 10.1016/j.jclinepi.2007.08.008] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Received: 10/16/2006] [Revised: 08/02/2007] [Accepted: 08/09/2007] [Indexed: 01/22/2023]
|
57
|
Sikorskii A, Given CW, Given B, Jeon S, Decker V, Decker D, Champion V, McCorkle R. Symptom management for cancer patients: a trial comparing two multimodal interventions. J Pain Symptom Manage 2007; 34:253-64. [PMID: 17618080 PMCID: PMC2043403 DOI: 10.1016/j.jpainsymman.2006.11.018] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Received: 07/31/2006] [Revised: 11/16/2006] [Accepted: 11/29/2006] [Indexed: 11/25/2022]
Abstract
The results of a randomized controlled trial that tested the effects of eight-week, six-contact multidimensional interactive interventions for symptom management are presented. Four hundred and thirty-five cancer patients with solid tumors undergoing chemotherapy were randomized to receive either nurse-assisted symptom management (NASM) or automated telephone symptom management (ATSM). A prior trial established the effectiveness of NASM compared with conventional care. Seventeen symptoms commonly experienced by patients undergoing chemotherapy were rated on a scale from 0 to 10 and were evaluated at baseline, at each of the six intervention contacts, and postintervention observation at 10 weeks. Both groups achieved significant reduction in symptom severity over baseline, and there was no difference between groups on symptom severity at 10 weeks. Randomization accounted for possible reductions in severity due to response shifts. Severity of symptoms reported by patients at each of the six intervention contacts was measured using a Rasch model. Symptom pattern was different for lung and non-lung cancer patients, and they were analyzed separately. Longitudinal analyses revealed that lung cancer patients with greater symptom severity withdrew from later intervention contacts of the ATSM. The results suggest that both NASM and ATSM achieved a clinically significant reduction in symptom severity. The NASM may be more effective than ATSM in retaining lung cancer patients in the intervention. Further testing of ATSM supplemented by NASM for patients with severe symptoms is warranted.
Affiliation(s)
- Alla Sikorskii
- College of Nursing, Michigan State University, East Lansing, Michigan 48824, USA
|
58
|
Fries JF, Bruce B, Bjorner J, Rose M. More relevant, precise, and efficient items for assessment of physical function and disability: moving beyond the classic instruments. Ann Rheum Dis 2007; 65 Suppl 3:iii16-21. [PMID: 17038464 PMCID: PMC1798376 DOI: 10.1136/ard.2006.059279] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/04/2022]
Abstract
OBJECTIVES Patient reported outcomes (PROs) have become standard study endpoints. However, little attention has been given to using item improvement to advance PRO performance which could improve precision, clarity, patient relevance, and information content of "physical function/disability" items and thus the performance of resulting instruments. METHODS The present study included 1860 physical function/disability items from 165 instruments. Item formulations were assessed by frequency of use, modified Delphi consensus, respondent judgement of clarity and importance, and item response theory (IRT). Data from 1100 rheumatoid arthritis, osteoarthritis, and normal ageing subjects, using qualitative item review, focus groups, cognitive interviews, and patient survey were used to achieve a unique item pool that was clear, reliable, sensitive to change, readily translatable, devoid of floor and ceiling limitations, contained unidimensional subdomains, and had maximal information content. RESULTS A "present tense" time frame was used most frequently, better understood, more readily translated, and more directly estimated the latent trait of disability. Items in the "past tense" had 80-90% false negatives (p<0.001). The best items were brief, clear, and contained a single construct. Responses with four to five options were preferred by both experts and respondents. The term physical function may be preferable to the term disability because of fewer floor effects. IRT analyses of "disability" suggest four independent subdomains (mobility, dexterity, axial, and compound) with factor loadings of 0.81-0.99. CONCLUSIONS Major improvement in performance of items and instruments is possible, and may have the effect of substantially reducing sample size requirements for clinical trials.
Affiliation(s)
- J F Fries
- Stanford University, 1000 Welch Road, Suite 203, Palo Alto, CA 94304, USA.
|
59
|
Brodersen J, McKenna SP, Doward LC, Thorsen H. Measuring the psychosocial consequences of screening. Health Qual Life Outcomes 2007; 5:3. [PMID: 17210071 PMCID: PMC1770907 DOI: 10.1186/1477-7525-5-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 12/19/2006] [Accepted: 01/08/2007] [Indexed: 11/13/2022]
Abstract
The last three decades have seen a dramatic rise in the implementation of screening programmes for cancer in industrialised countries. However, in contrast to screening for infectious diseases, most cancer screening programmes only have the potential to reduce mortality; they cannot lower the incidence of cancer in a population. In fact, most cancer screening programmes have been shown to increase the incidence of the disease as a consequence of over-diagnosis. A further dilemma of cancer screening programmes is that they do not distinguish between healthy people and those with disease. Rather, they identify a continuum of disease severity. Consequently, many healthy people who have abnormal screening tests are wrongly diagnosed. Indeed, studies have demonstrated that for each screening-prevented death from cancer, at least 200 false-positive results are given. Therefore, screening has the potential to be harmful as well as beneficial. The psychosocial consequences of false-positive screening results cannot be determined by diagnostic tests or by other technical means. Instead, patient reported outcome measures must be employed. To measure the outcomes of screening accurately and comprehensively, patient reported outcome measures have to capture the nature and extent of the psychosocial consequences and how these change over time. The outcome measures used must have high content validity and their psychometric properties should be determined prior to their use in the specific population. In particular, it is important to establish unidimensionality, additivity and item ordering through the application of Item Response Theory.
Affiliation(s)
- John Brodersen
- Department and Research Unit of General Practice, University of Copenhagen, Oester Farimagsgade 5, 24Q, Postbox 2099, DK-1014 Copenhagen, Denmark
- Stephen P McKenna
- Galen Research, Enterprise House, Manchester Science Park, Lloyd Street North, Manchester M15 6SE, UK
- Lynda C Doward
- Galen Research, Enterprise House, Manchester Science Park, Lloyd Street North, Manchester M15 6SE, UK
- Hanne Thorsen
- Department and Research Unit of General Practice, University of Copenhagen, Oester Farimagsgade 5, 24Q, Postbox 2099, DK-1014 Copenhagen, Denmark
|
60
|
Taylor WJ, McPherson KM. Using Rasch analysis to compare the psychometric properties of the Short Form 36 physical function score and the Health Assessment Questionnaire disability index in patients with psoriatic arthritis and rheumatoid arthritis. Arthritis Rheum 2007; 57:723-9. [PMID: 17530670 DOI: 10.1002/art.22770] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Indexed: 12/21/2022]
Abstract
OBJECTIVE Item-response theory is increasingly used in the development of robust measurement tools. The extent to which the Health Assessment Questionnaire (HAQ) disability index (DI) and Short Form 36 (SF-36) physical functioning subscale (PF) fit a Rasch model in psoriatic arthritis (PsA) is uncertain. Our objective was to compare the psychometric properties of the HAQ DI and SF-36 PF in PsA and rheumatoid arthritis (RA) using Rasch analysis. METHODS Patients with RA (n = 142) and PsA (n = 134) were identified from a disease register based at a regional rheumatology service that serves a population of approximately 400,000 individuals. Responses to the HAQ DI and SF-36 PF were analyzed for item fit, differential item functioning (DIF), scale length (item separation), floor effects, and item difficulty by fitting the data to a Rasch model. The extent to which each instrument measured the same concept (disability) was also assessed in the PsA cohort using the Rasch model. RESULTS Item separation was much better for the SF-36 PF than the HAQ DI in PsA (9.12 logits versus 2.06 logits). There was evidence of marked DIF for the HAQ DI items activities, grip, and rising and relatively minor DIF for 4 items of the SF-36 PF. The distribution of SF-36 PF was better than HAQ DI in PsA, with floor effects of 3.1% versus 30.4%. Common person equating demonstrated that the 2 instruments measure the same construct in PsA. CONCLUSION The SF-36 PF has significant psychometric advantages over the HAQ DI in PsA.
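The role of item spread on the logit scale can be illustrated with the dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between person location and item difficulty. This is a simplified sketch: the instruments in the study use polytomous items, and the item difficulties and person location below are hypothetical values chosen for illustration, not estimates from the paper.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """P(endorse) under the dichotomous Rasch model; both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical item difficulties: a short scale and a long scale (in logits).
narrow_scale = [-1.0, 0.0, 1.0]
wide_scale = [-4.5, 0.0, 4.5]

person = -3.0  # hypothetical person located near the bottom of the continuum
for name, scale in (("narrow", narrow_scale), ("wide", wide_scale)):
    probs = [round(rasch_probability(person, b), 3) for b in scale]
    print(name, probs)
```

With the narrow scale, a person at -3 logits is unlikely to endorse any item, which is the pattern that produces floor effects. The wider scale still contains an item that such a person will probably endorse, so the scale can distinguish people at that level.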
Affiliation(s)
- William J Taylor
- Wellington School of Medicine and Health Sciences, University of Otago, Newtown, Wellington, New Zealand.
|
61
|
Hahn EA, Bode RK, Du H, Cella D. Evaluating linguistic equivalence of patient-reported outcomes in a cancer clinical trial. Clin Trials 2006; 3:280-90. [PMID: 16895045 DOI: 10.1191/1740774506cn148oa] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
BACKGROUND In order to make meaningful cross-cultural or cross-linguistic comparisons of health-related quality of life (HRQL) or to pool international research data, it is essential to create unbiased measures that can detect clinically important differences. When HRQL scores differ between cultural/linguistic groups, it is important to determine whether this reflects real group differences, or is the result of systematic measurement variability. PURPOSE To investigate the linguistic measurement equivalence of a cancer-specific HRQL questionnaire, and to conduct a sensitivity analysis of treatment differences in HRQL in a clinical trial. METHODS Patients with newly diagnosed chronic myelogenous leukemia (n = 1049) completed serial HRQL assessments in an international Phase III trial. Two types of differential item functioning (uniform and non-uniform) were evaluated using item response theory and classical test theory approaches. A sensitivity analysis was conducted to compare HRQL between treatment arms using items without evidence of differential functioning. RESULTS Among 27 items, nine (33%) did not exhibit any evidence of differential functioning in both linguistic comparisons (English versus French, English versus German). Although 18 items functioned differently, there was no evidence of systematic bias. In a sensitivity analysis, adjustment for differential functioning affected the magnitude, but not the direction or interpretation of clinical trial treatment arm differences. LIMITATIONS Sufficient sample sizes were available for only three of the eight language groups. Identification of differential functioning in two-thirds of the items suggests that current psychometric methods may be too sensitive. CONCLUSIONS Enhanced methodologies are needed to differentiate trivial from substantive differential item functioning. Systematic variability in HRQL across different groups can be evaluated for its effect upon clinical trial results; a practice recommended when data are pooled across cultural or linguistic groups to make conclusions about treatment effects.
Affiliation(s)
- Elizabeth A Hahn
- Center on Outcomes, Research and Education, Evanston Northwestern Healthcare, Evanston, Illinois, USA.
|
62
|
Fries JF. The promise of the future, updated: better outcome tools, greater relevance, more efficient study, lower research costs. Future Rheumatol 2006. [DOI: 10.2217/17460816.1.4.415] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
|
63
|
Garamendi E, Pesudovs K, Stevens MJ, Elliott DB. The Refractive Status and Vision Profile: Evaluation of psychometric properties and comparison of Rasch and summated Likert-scaling. Vision Res 2006; 46:1375-83. [PMID: 16105674 DOI: 10.1016/j.visres.2005.07.007] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Received: 03/24/2005] [Revised: 07/05/2005] [Accepted: 07/06/2005] [Indexed: 10/25/2022]
Abstract
The psychometric properties of the Refractive Status and Vision Profile (RSVP) questionnaire were evaluated using Rasch analysis. Ninety-one myopic patients from a refractive surgery clinic and general optometric practice completed the RSVP. Rasch analysis of the RSVP ordinal data was performed to examine for unidimensionality and item reduction. The traditional Likert-scoring system was compared with a Rasch-scored RSVP and a reduced item Rasch-scored RSVP. Rasch analysis of the original RSVP showed poor targeting of item difficulty to patient quality of life, items with a ceiling effect and underutilized response categories. Combining the underutilized response scales and removal of redundant and misfitting items improved the internal consistency and targeting of the RSVP, and the reduced 20-item Rasch scored RSVP showed greater relative precision over standard Likert scoring in discriminating between the two subject groups. A Rasch scaled quality of life questionnaire is recommended for use in refractive outcomes research.
Affiliation(s)
- Estibaliz Garamendi
- Department of Optometry, University of Bradford, Richmond Road, Bradford, West Yorkshire, BD7 1DP, United Kingdom
|
64
|
Koh ET, Leong KP, Tsou IYY, Lim VH, Pong LY, Chong SY, Seow A. The reliability, validity and sensitivity to change of the Chinese version of SF-36 in oriental patients with rheumatoid arthritis. Rheumatology (Oxford) 2006; 45:1023-8. [PMID: 16495318 DOI: 10.1093/rheumatology/kel051] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022]
Abstract
OBJECTIVE To assess the reliability, validity and sensitivity to change of a Chinese version of the 36-item Short-Form Health Survey (SF-36) in Chinese-speaking patients with rheumatoid arthritis (RA) in Singapore. METHODS The psychometric properties of the Chinese Hong Kong standard version of the SF-36 were assessed in 401 RA patients. The construct validity of the Chinese SF-36 was assessed by comparison with the American College of Rheumatology (ACR) functional status, a validated Chinese Health Assessment Questionnaire (C-HAQ) and markers of RA activity and severity. RESULTS The overall Cronbach's coefficient alpha was 0.921, reflecting excellent internal consistency. The instrument showed reasonable test-retest reliability except in the social functioning (SF) subscale. There was a significant ceiling effect in the role physical (RP), SF and role emotional (RE) subscales and a floor effect in the RP and RE subscales. Physical function (PF) and SF were strongly correlated with C-HAQ and patient's assessment of RA activity [Pearson's correlation coefficient (r) ranging from -0.41 to -0.53] and moderately correlated with ACR functional status (r = -0.35 and -0.3, respectively). Weak correlations were also found between the Chinese SF-36 and markers of RA activity, deformed joint count and radiographic damage. PF and SF were the subscales most responsive to change in quality of life (QOL). CONCLUSION The Chinese SF-36 showed reasonable reliability, criterion validity and responsiveness with limitations in certain subscales. Overall, the physical domains and PF in particular may be the most ideal psychometric measures of QOL in RA.
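The internal-consistency statistic reported here, Cronbach's coefficient alpha, can be computed from item-level scores with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The following is a minimal sketch using that textbook formula; the data layout and the toy scores are assumptions for illustration and are unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for `items`, a list of per-item score lists.

    Each inner list holds one item's scores for the same respondents,
    in the same order. Sample variances (n - 1 denominator) are used.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sums
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical scores: three items answered by four respondents.
toy_items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 2, 3, 4],
]
print(round(cronbach_alpha(toy_items), 3))
```

In the toy data the items rise and fall together perfectly, so alpha reaches its maximum of 1. Real questionnaire data give lower values, and values above roughly 0.9 are conventionally described as excellent.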
Affiliation(s)
- E T Koh
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, 11 Jalan Tan Tock Seng, Singapore 308433.
|
65
|
Adams R, Rosier M, Campbell D, Ruffin R. Assessment of an asthma quality of life scale using item-response theory. Respirology 2006; 10:587-93. [PMID: 16268911 DOI: 10.1111/j.1440-1843.2005.00754.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
OBJECTIVE Health-related quality of life (HRQOL) scores produced by simply summing individual item values have been criticized for lacking linearity and for not being equally discriminating across the range of scores. Differences in summed scores may depend more on their starting point on the scale rather than actual differences in the underlying dimension of HRQOL, making it difficult to judge what a particular score means. We sought to examine the usefulness of an alternative method of scoring questionnaires, item-response theory (IRT), for the clinical interpretation of a modified version of the Marks Asthma Quality of Life Questionnaire (MAQLQ-M). METHODOLOGY Using the MAQLQ-M, we surveyed 293 adults with moderate to severe asthma, managed at two university teaching hospitals in Adelaide, South Australia. Scores obtained by usual summative Likert-type scores were compared to estimates using the partial credit method of IRT. RESULTS We found a non-linear relationship between raw, summative scores and the IRT estimates. The departure from linearity was marked for summative scores below 3.0 and above 5.0 (range 1.0-7.0), values that included half of the study patients. Summative scoring did not produce scores at the interval level of measurement. CONCLUSION For an equivalent difference measured in the underlying dimension of actual HRQOL using IRT, traditional summated scores showed a much smaller difference in scores at both the lower and upper end of HRQOL, than at the mid-range of HRQOL. Caution should be used when interpreting HRQOL surveys scored in the usual summative manner. Advantages may be gained by using IRT as an alternative method of scoring HRQOL questionnaires.
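The non-linear relationship between summed scores and IRT estimates follows from the shape of the model itself. The sketch below computes the expected score of a single seven-category item under the partial credit model and compares the raw-score change produced by the same one-logit step at the extreme and in the middle of the latent scale. The step difficulties are hypothetical, chosen only to make the pattern visible. They are not estimates from the MAQLQ-M.

```python
import math


def pcm_expected_score(theta: float, thresholds: list[float]) -> float:
    """Expected category score (0..m) under the partial credit model.

    Category k has log-numerator sum_{j<=k} (theta - thresholds[j-1]),
    with category 0 fixed at 0.
    """
    log_numerators = [0.0]
    for delta in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - delta))
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return sum(k * w for k, w in enumerate(weights)) / total


# Hypothetical step difficulties for a 7-category item (scores 0..6).
thresholds_demo = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]

# The same one-logit change, once at the low extreme and once mid-scale.
low_gain = pcm_expected_score(-4.0, thresholds_demo) - pcm_expected_score(-5.0, thresholds_demo)
mid_gain = pcm_expected_score(1.0, thresholds_demo) - pcm_expected_score(0.0, thresholds_demo)
print(f"raw-score gain at the low end: {low_gain:.3f}")
print(f"raw-score gain mid-scale:      {mid_gain:.3f}")
```

Adding 1 to the expected score maps it onto a 1.0-7.0 response range like the one described in the abstract. Under these assumptions an equal change in the latent trait moves the raw score much less near the ends of the scale than in the middle, which is the compression the authors describe.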
Affiliation(s)
- Robert Adams
- The Health Observatory, The University of Adelaide, The Queen Elizabeth Hospital Campus, Woodville, South Australia.
|
66
|
Pesudovs K, Garamendi E, Elliott DB. A Quality of Life Comparison of People Wearing Spectacles or Contact Lenses or Having Undergone Refractive Surgery. J Refract Surg 2006; 22:19-27. [PMID: 16447932 DOI: 10.3928/1081-597x-20060101-07] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.4] [Indexed: 11/20/2022]
Abstract
PURPOSE To demonstrate the use of the Quality of Life Impact of Refractive Correction (QIRC) questionnaire for comparing the quality of life of pre-presbyopic individuals with refractive correction by spectacles, contact lenses, or refractive surgery. METHODS The 20-item QIRC questionnaire was administered to 104 spectacle wearers, 104 contact lens wearers, and 104 individuals who had undergone refractive surgery (N = 312). These groups were similar for gender, ethnicity, socioeconomic status, and refractive error. The main outcome measure was QIRC overall score (scaled from 0 to 100), a measure of refractive correction related quality of life. Groups were compared for overall QIRC score and on each question by analysis of variance, adjusted for age, with post hoc significance testing (Scheffé). RESULTS On average, refractive surgery patients scored significantly better (mean QIRC score 50.2 ± 6.3, F(2,309) = 15.18, P < .001) than contact lens wearers (46.7 ± 5.5, post hoc P < .001) who were in turn significantly better than spectacle wearers (44.1 ± 5.9, post hoc P < .01). Convenience questions chiefly drove the differences between groups, although functioning, symptoms, economic concerns, health concerns, and well being were also important. Spectacle wearers with low strength prescriptions (46.18 ± 5.05) scored significantly better than those with medium strength prescriptions (42.74 ± 6.08, F(2,190) = 3.66, P < .05, post hoc P < .05). A small number (n = 7, 6.7%) of refractive surgery patients experienced postoperative complications, which impacted quality of life (37.86 ± 2.13). CONCLUSIONS Quality of life was lowest in spectacle wearers, particularly those with higher corrections. Contact lens wearers had significantly better QIRC score than spectacle wearers. Refractive surgery patients scored significantly better than both. However, this was accompanied by a small risk of poor quality of life due to postoperative complications. The QIRC is an effective outcome measure for quality of life impact of refractive correction.
Affiliation(s)
- Konrad Pesudovs
- Department of Ophthalmology, Flinders Medical Centre and Flinders University, Adelaide, South Australia, Australia.
|
67
|
Petersen MA, Groenvold M, Aaronson N, Brenne E, Fayers P, Nielsen JD, Sprangers M, Bjorner JB. Scoring based on item response theory did not alter the measurement ability of EORTC QLQ-C30 scales. J Clin Epidemiol 2005; 58:902-8. [PMID: 16085193 DOI: 10.1016/j.jclinepi.2005.02.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/24/2003] [Revised: 06/09/2004] [Accepted: 02/14/2005] [Indexed: 01/22/2023]
Abstract
BACKGROUND AND OBJECTIVES Most health-related quality-of-life questionnaires include multi-item scales. Scale scores are usually estimated as simple sums of the item scores. However, scoring procedures utilizing more information from the items might improve measurement abilities, and thereby reduce the needed sample sizes. We investigated whether item response theory (IRT)-based scoring improved the measurement abilities of the EORTC QLQ-C30 physical functioning, emotional functioning, and fatigue scales. METHODS Using a database of 13,010 subjects we estimated the relative validities of IRT scoring compared to sum scoring of the scales. RESULTS The mean relative validities were 1.04 (physical), 1.03 (emotional), and 0.97 (fatigue). None of these were significantly larger than 1. Thus, no gain in measurement abilities using IRT scoring was found for these scales. Possible explanations include that the items in the scales are not constructed for IRT scoring and that the scales are relatively short. CONCLUSION IRT scoring of the three longest EORTC QLQ-C30 scales did not improve measurement abilities compared to the traditional sum scoring of the scales.
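The abstract does not spell out how relative validity was computed. One common definition in this literature is the ratio of the F statistics obtained when two scoring methods are each used to separate the same known groups, so that a value above 1 favours the new scoring. The sketch below assumes that definition; the function names and the toy data are hypothetical.

```python
from statistics import mean


def one_way_f(groups: list[list[float]]) -> float:
    """F statistic of a one-way analysis of variance across `groups`."""
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def relative_validity(new_scoring: list[list[float]],
                      reference_scoring: list[list[float]]) -> float:
    """Ratio of F statistics (assumed definition); > 1 favours `new_scoring`."""
    return one_way_f(new_scoring) / one_way_f(reference_scoring)


# Hypothetical scores for the same two known groups under two scoring methods.
sum_scores = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
irt_scores = [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]
print(round(relative_validity(irt_scores, sum_scores), 2))
```

A ratio close to 1, like the values of 0.97 to 1.04 reported above, means both scoring methods discriminate between the groups about equally well.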
Affiliation(s)
- Morten Aa Petersen
- The research unit, Department of Palliative Medicine, Bispebjerg Hospital, Bispebjerg bakke 23, 2400 Copenhagen, Denmark.
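As a rough illustration of the relative validity comparison reported in this abstract, the sketch below takes the ratio of one-way ANOVA F statistics obtained from two scorings of the same scale across known groups. The abstract does not state the formula used; the F-ratio definition, the function names, and the toy data are assumptions made for illustration only.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of scores)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of known groups
    n = len(all_scores)    # total number of subjects
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_validity(candidate_groups, reference_groups):
    """Ratio of F statistics: candidate scoring over reference scoring (assumed definition)."""
    return f_statistic(candidate_groups) / f_statistic(reference_groups)


# Hypothetical scores for the same two patient groups under two scoring methods.
sum_scores = [[1, 2, 3], [4, 5, 6]]
irt_scores = [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]
rv = relative_validity(irt_scores, sum_scores)
```

Under this definition a value above 1 would indicate that the candidate scoring separates the groups more sharply than the reference scoring; the abstract reports values close to 1 for all three scales.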
|
68
|
de Vet HCW, Adèr HJ, Terwee CB, Pouwer F. Are factor analytical techniques used appropriately in the validation of health status questionnaires? A systematic review on the quality of factor analysis of the SF-36. Qual Life Res 2005; 14:1203-18; discussion 1219-21, 1223-4. [PMID: 16047498 DOI: 10.1007/s11136-004-5742-3] [Citation(s) in RCA: 188] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
Factor analysis is widely used to evaluate whether questionnaire items can be grouped into clusters representing different dimensions of the construct under study. This review focuses on the appropriate use of factor analysis. The Medical Outcomes Study Short Form-36 (SF-36) is used as an example. Articles were systematically searched and assessed according to a number of criteria for appropriate use and reporting. Twenty-eight studies were identified: exploratory factor analysis was performed in 22 studies, confirmatory factor analysis was performed in five studies and in one study both were performed. Substantial shortcomings were found in the reporting and justification of the methods applied. In 15 of the 23 studies in which exploratory factor analysis was performed, confirmatory factor analysis would have been more appropriate. Cross-validation was rarely performed. Presentation of the results and conclusions was often incomplete. Some of our results are specific to the SF-36, but the finding that both the application and the reporting of factor analysis leave much room for improvement probably applies to other health status questionnaires as well. Optimal reporting and justification of methods is crucial for correct interpretation of the results and verification of the conclusions. Our list of criteria may be useful for journal editors, reviewers and researchers who have to assess publications in which factor analysis is applied.
Affiliation(s)
- Henrica C W de Vet
- Institute for Research in Extramural Medicine, VU University Medical Center, Amsterdam, The Netherlands.
|
69
|
Garamendi E, Pesudovs K, Elliott DB. Changes in quality of life after laser in situ keratomileusis for myopia. J Cataract Refract Surg 2005; 31:1537-43. [PMID: 16129288 DOI: 10.1016/j.jcrs.2004.12.059] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/22/2004] [Indexed: 11/25/2022]
Abstract
PURPOSE To measure quality of life (QoL) outcome in prepresbyopic myopic patients having laser in situ keratomileusis (LASIK) refractive surgery using the Quality of Life Impact of Refractive Correction (QIRC) questionnaire and to compare the QoL of preoperative patients with a sample of spectacle and contact lens wearers not considering refractive surgery. SETTING Department of Optometry, University of Bradford, Bradford, and Ultralase, Leeds, West Yorkshire, United Kingdom. METHODS The validated QIRC questionnaire was prospectively completed by 66 patients before and 3 months after LASIK. Patients had myopia greater than 0.50 diopters (D) (range -0.75 to -10.50 D) and were aged 16 to 39 years. Patients were also directly asked to evaluate their QoL after surgery. RESULTS Overall QIRC scores improved after LASIK from a mean of 40.07 ± 4.30 (SD) to 53.09 ± 5.25 (F(1,130)=172.65, P<.001). Greater improvements occurred in women (53.83 ± 5.46) than in men (49.39 ± 5.94; F(1,64)=9.37, P<.005). Overall, 15 of the 20 questions (especially convenience, health concerns, and well-being questions) showed significantly improved scores (P<.05). Patients who "strongly agreed" (53.96 ± 4.91, n=33) or "agreed" (51.78 ± 6.19, n=23) that their QoL had improved had significantly higher QIRC scores than those who "neither agreed nor disagreed" (44.36 ± 4.97, n=5) or "strongly disagreed" (42.82, n=1) (F(1,60)=11.24, P<.001). The matched group not contemplating LASIK scored 42.41 ± 3.89 on QIRC overall. CONCLUSIONS Large improvements in QIRC QoL scores were found after LASIK for myopia in the majority of patients, with greater improvements in women. A small number of patients (4.5%) had decreased QIRC QoL scores, and these were associated with complications. People presenting for LASIK scored measurably poorer than matched patients not contemplating refractive surgery.
Affiliation(s)
- Estibaliz Garamendi
- Department of Optometry, University of Bradford, Richmond Road, Bradford, West Yorkshire, United Kingdom.
|
70
|
|
71
|
Talley NJ, Wiklund I. Patient reported outcomes in gastroesophageal reflux disease: an overview of available measures. Qual Life Res 2005; 14:21-33. [PMID: 15789938 DOI: 10.1007/s11136-004-0613-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
Gastroesophageal reflux disease (GERD) is a common, chronic disorder. The main symptom of GERD is heartburn, although a diverse range of symptoms can be associated with the disease including acid regurgitation and epigastric pain. GERD is also a risk factor for Barrett's oesophagus and esophageal adenocarcinoma. The impact of GERD symptoms on patients' lives can be profound and is unrelated to the presence or absence of esophagitis. The impact of GERD can be measured by assessing the patient perspective using Patient Reported Outcomes (PROs). There are two categories of questionnaires that can be used to measure the effect of GERD on health-related quality of life (HRQoL), namely generic and disease or treatment specific. The use of PRO instruments has become more accepted in the assessment of disease treatment. Well-designed instruments that assess physical, psychological and emotional factors can provide clinicians with the data that will promote effective management decisions for the treatment of GERD. The most frequently used instruments in GERD are reviewed here, in terms of their psychometric properties.
Affiliation(s)
- Nicholas J Talley
- Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA.
|
72
|
Abstract
Summary. The IRES patient questionnaire (Indicators of Rehabilitation Status) is one of the most frequently used assessment instruments in medical rehabilitation in Germany. Using item response theory, the short form IRES-24 was developed from the 144-item long form; it covers the dimensions 'psychological well-being', 'functioning in everyday life', 'somatic health', and 'pain'. Item selection ensured that the response behaviour on the items of each scale can be modelled by a single latent dimension and ordered thresholds. The item fit index Q takes very good values for all items. Mixed Rasch analysis shows that, for each scale, different classes of patients must be distinguished for the assumption of person homogeneity to hold. On the scales 'psychological well-being' and 'functioning in everyday life', the classes differ in the response tendencies 'tendency toward the middle' and 'tendency toward extreme values'. In the domain 'somatic health', different latent trait dimensions are captured in the classes. For the 'pain' scale there is a small group of patients (8.3%) for whom the Rasch criteria are only partly met. Patients' membership of the latent classes can be predicted very well from the variability of the scale items, the frequency of extreme values, and the levels on the scale items. A two-stage diagnostic procedure is proposed in which class membership is determined first and the class-specific person parameters are determined second. The IRES-24 enables economical and reliable screening of the indicators that matter for treatment planning and evaluation in medical rehabilitation.
|
73
|
Pesudovs K, Coster DJ. Correspondence. Patient-centred outcomes of cataract surgery in Australia. Clin Exp Ophthalmol 2005; 33:228. [PMID: 15807841 DOI: 10.1111/j.1442-9071.2005.00985.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
74
|
Von Korff M, Katon W, Lin EHB, Simon G, Ludman E, Oliver M, Ciechanowski P, Rutter C, Bush T. Potentially modifiable factors associated with disability among people with diabetes. Psychosom Med 2005; 67:233-40. [PMID: 15784788 DOI: 10.1097/01.psy.0000155662.82621.50] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Abstract
OBJECTIVE This article seeks to identify potentially modifiable factors associated with disability among people with diabetes. STUDY DESIGN AND SETTING Among people with diabetes (N = 4357) in a large health maintenance organization, disease severity, psychologic and behavioral risk factors for disability were assessed. Disability was evaluated by the WHO Disability Assessment Scale (WHO-DAS-II), the SF-36 Social Functioning scale, and days of reduced household work. RESULTS Depression was associated with a tenfold increase in elevated WHO-DAS-II and low SF-36 Social Functioning scores, and a fourfold increase in 20+ days of reduced household work. Minor depression and the presence of three or more diabetic complications were associated with approximately a twofold increase in disability risk. Diabetic symptoms, chronic disease comorbidity, and reduced exercise were also associated with disability. CONCLUSION Among people with diabetes, depression, diabetic complications, and exercise are potentially modifiable factors associated with disability. This suggests that integrated, biopsychosocial approaches may be needed to understand and to ameliorate disability among people with diabetes.
Affiliation(s)
- Michael Von Korff
- Center for Health Studies, Group Health Cooperative, Seattle, WA 98101, USA.
|
75
|
Ware JE, Gandek B, Sinclair SJ, Bjorner JB. Item response theory and computerized adaptive testing: Implications for outcomes measurement in rehabilitation. Rehabil Psychol 2005. [DOI: 10.1037/0090-5550.50.1.71] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
76
|
Abstract
OBJECTIVE The global index of safety (GIS) is an adverse event (AE) based instrument designed to evaluate the safety profile of drugs. This paper presents the evaluation of the inter-rater reliability and validity of a 94-item GIS for antipsychotics through Rasch analysis. RESEARCH DESIGN AND METHODS A total of 194 psychiatrists participating in an outpatient pharmacoepidemiologic study of olanzapine in schizophrenia rated the severity that each AE would have on a 5-point scale. Reliability was determined through a paired comparison design involving the new independent ratings of 101 different psychiatrists participating in another study of olanzapine in acute inpatient units. Spearman's, Pearson's and Intra-class correlation (ICC) coefficients were used to estimate the inter-rater reliability of the AE weights. Validity was analyzed through the Rasch rating scale model. RESULTS Reliability coefficient estimates were excellent (Spearman = 0.99, Pearson = 0.99, ICC = 0.98), supporting the inter-rater reliability of the item weights. Through goodness-of-fit statistics and the investigation of the hierarchy of item calibrations, Rasch analysis confirmed the validity of the instrument. CONCLUSION The data presented here on inter-rater reliability estimates of adverse events related to antipsychotic drugs indicate that GIS is a promising alternative for the evaluation of the safety profile of drugs.
Affiliation(s)
- L Prieto
- Health Outcomes Research Unit, Eli Lilly & Co., Alcobendas (Madrid), Spain.
|
77
|
Wolfe F, Michaud K, Pincus T. Development and validation of the health assessment questionnaire II: A revised version of the health assessment questionnaire. Arthritis Rheum 2004; 50:3296-305. [PMID: 15476213 DOI: 10.1002/art.20549] [Citation(s) in RCA: 191] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
OBJECTIVE The Health Assessment Questionnaire (HAQ) has become the most common tool for measuring functional status in rheumatology. However, the HAQ is long (34 questions, including 20 concerning activities of daily living and 14 relating to the use of aids and devices) and somewhat burdensome to score, has some floor effects, and has psychometric problems relating to linearity and confusing items. We undertook this study to develop and validate a revised version of the HAQ (the HAQ-II). METHODS Using Rasch analysis and a 31-question item bank, including 20 HAQ items, the 10-item HAQ-II was developed. Five original items from the HAQ were retained. We studied the HAQ-II in 14,038 patients with rheumatic disease over a 2-year period to determine its validity and reliability. RESULTS The HAQ-II was reliable (reliability of 0.88, compared with 0.83 for the HAQ), measured disability over a longer scale than the HAQ, and had no nonfitting items and no gaps. Compared with the HAQ, modified HAQ, and Medical Outcomes Study Short Form 36 physical function scale, the HAQ-II was as well correlated or better correlated with clinical and outcome variables. The HAQ-II performed as well as the HAQ in a clinical trial and in prediction of mortality and work disability. The mean difference between the HAQ and HAQ-II scores was 0.02 units. CONCLUSION The HAQ-II is a reliable and valid 10-item questionnaire that performs at least as well as the HAQ and is simpler to administer and score. Conversion from HAQ to HAQ-II and from HAQ-II to HAQ for research purposes is simple and reliable. The HAQ-II can be used in all places where the HAQ is now used, and it may prove to be easier to use in the clinic.
|
78
|
Pesudovs K, Garamendi E, Elliott DB. The Quality of Life Impact of Refractive Correction (QIRC) Questionnaire: Development and Validation. Optom Vis Sci 2004; 81:769-77. [PMID: 15557851 DOI: 10.1097/00006324-200410000-00009] [Citation(s) in RCA: 149] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND The purpose of the study was to develop a questionnaire that could quantify the quality of life (QOL) of people with refractive correction by spectacles, contact lenses, and refractive surgery in the prepresbyopic age group. METHODS The questionnaire was developed and validated using traditional methods and Rasch analysis. A 90-item pilot questionnaire was developed through extensive literature search and use of professional and lay focus groups. Pilot study data were obtained from 306 subjects for item reduction to produce the 20-item Quality of Life Impact of Refractive Correction (QIRC) questionnaire. Validity and reliability studies (test-retest reliability with intraclass correlation coefficient and Bland-Altman limits of agreement, and internal consistency with Rasch fit statistics, factor analysis, and Cronbach's alpha) were performed from data of an additional 312 subjects. RESULTS Rasch analysis demonstrated QIRC has good precision, reliability, and internal consistency (person separation, 2.03; reliability, 0.80; root-mean-square measurement error, 3.25; mean square ± SD infit, 0.99 ± 0.38; outfit, 1.00 ± 0.39; item infit range, 0.70 to 1.24; and item outfit range, 0.78 to 1.32). The items (mean score, 50.3 ± 7.3) were well targeted to the subjects (mean score, 47.8 ± 5.5) with a mean difference of 2.45 (scale range, 0 to 100) units. Test-retest reliability (intraclass correlation coefficient, 0.88; coefficient of repeatability, ±6.85 units), factor loading range (0.40 to 0.76), and Cronbach's alpha (0.78) also indicated the reliability and validity of QIRC. CONCLUSIONS The 20-item QIRC questionnaire, which quantifies the QOL of people with refractive correction by spectacles, contact lenses, and refractive surgery in the prepresbyopic age group, was developed using Rasch analysis and shown to be valid and reliable. The use of Rasch scaling allows scores to be treated as a valid continuous variable. QIRC has broad applicability for cross-sectional and outcomes research.
Affiliation(s)
- Konrad Pesudovs
- Department of Optometry, University of Bradford, Bradford, West Yorkshire, United Kingdom.
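Cronbach's alpha, one of the internal consistency statistics named in this abstract, can be sketched with the textbook formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The function name and the toy response matrix below are hypothetical; this is not the authors' analysis code and the data are not from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a response matrix: one row per respondent, one column per item."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four people to three items.
responses = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
```

Alpha approaches 1 as the items covary more strongly; it says nothing about unidimensionality, which is why the abstract reports Rasch fit statistics and factor analysis alongside it.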
|
79
|
Abstract
BACKGROUND The SF-12 is a multidimensional generic measure of health-related quality of life. It has become widely used in clinical trials and routine outcome assessment because of its brevity and psychometric performance, but it cannot be used in economic evaluation in its current form. OBJECTIVES We sought to derive a preference-based measure of health from the SF-12 for use in economic evaluation and to compare it with the original SF-36 preference-based index. RESEARCH DESIGN The SF-12 was revised into a 6-dimensional health state classification (SF-6D [SF-12]) based on an item selection process designed to ensure the minimum loss of descriptive information. SUBJECTS A sample of 241 states defined by the SF-6D (of 7500) have been valued by a representative sample of 611 members of the UK general population using the standard gamble (SG) technique. ANALYSIS Models are estimated of the relationship between the SF-6D (SF-12) and SG values and evaluated in terms of their coefficients, overall fit, and the ability to predict SG values for all health states. RESULTS The models have produced significant coefficients for levels of the SF-6D (SF-12), which are robust across model specification. The coefficients are similar to those of the SF-36 version and achieve similar levels of fit. There are concerns with some inconsistent estimates and these have been merged to produce the final recommended model. As for the SF-36 model, there is evidence of over prediction of the value of the poorest health states. CONCLUSIONS The SF-12 index provides a useful tool for researchers and policy makers wishing to assess the cost-effectiveness of interventions.
Affiliation(s)
- John E Brazier
- Health Economics and Decision Science Group, School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom
|
80
|
Doward LC, Meads DM, Thorsen H. Requirements for quality of life instruments in clinical research. Value Health 2004; 7 Suppl 1:S13-S16. [PMID: 15367238 DOI: 10.1111/j.1524-4733.2004.7s104.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
The ability to produce high quality instruments for the assessment of quality of life has advanced considerably in recent years. As the science progresses it has become clear that certain standards must be met if outcome measures are to be capable of providing useful, reliable, and valid information within the context of clinical studies and trials. This paper specifies what these standards are with particular reference to theoretical basis, practicality, acceptability to respondents, unidimensionality, scaling and psychometric properties, and cultural validity and equivalence. The paper also indicates how failure to achieve such standards results in measures that are inaccurate and insensitive to true changes in outcome.
|
81
|
Alonso J, Ferrer M, Gandek B, Ware JE, Aaronson NK, Mosconi P, Rasmussen NK, Bullinger M, Fukuhara S, Kaasa S, Leplège A. Health-related quality of life associated with chronic conditions in eight countries: results from the International Quality of Life Assessment (IQOLA) Project. Qual Life Res 2004; 13:283-98. [PMID: 15085901 DOI: 10.1023/b:qure.0000018472.46236.05] [Citation(s) in RCA: 519] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
CONTEXT Few studies and no international comparisons have examined the impact of multiple chronic conditions on populations using a comprehensive health-related quality of life (HRQL) questionnaire. OBJECTIVE The impact of common chronic conditions on HRQL among the general populations of eight countries was assessed. DESIGN Cross-sectional mail and interview surveys were conducted. PARTICIPANTS AND SETTING Representative samples of the adult general population of eight countries (Denmark, France, Germany, Italy, Japan, The Netherlands, Norway and the United States) were evaluated. Sample sizes ranged from 2031 to 4084. MAIN OUTCOME MEASURES Self-reported prevalence of chronic conditions (including allergies, arthritis, congestive heart failure, chronic lung disease, hypertension, diabetes, and ischemic heart disease), sociodemographic data and the SF-36 Health Survey were obtained. The SF-36 scale and summary scores were estimated for individuals with and without selected chronic conditions and compared across countries using multivariate linear regression analyses. Adjustments were made for age, gender, marital status, education and the mode of SF-36 administration. RESULTS More than half (55.1%) of the pooled sample reported at least one chronic condition, and 30.2% had more than one. Hypertension, allergies and arthritis were the most frequently reported conditions. The effect of ischemic heart disease on many of the physical health scales was noteworthy, as was the impact of diabetes on general health, or arthritis on bodily pain scale scores. Arthritis, chronic lung disease and congestive heart failure were the conditions with a higher impact on SF-36 physical summary score, whereas for hypertension and allergies, HRQL impact was low (compared with a typical person without chronic conditions, deviation scores were around -4 points for the first group and -1 for the second). Differences between chronic conditions in terms of their impact on SF-36 mental summary score were low (deviation scores ranged between -1 and -2). CONCLUSIONS Arthritis has the highest HRQL impact in the general population of the countries studied due to the combination of a high deviation score on physical scales and a high frequency. Impact of chronic conditions on HRQL was roughly similar across countries, despite important variation in prevalence. The use of HRQL measures such as the SF-36 should be useful to better characterize the global burden of disease.
Affiliation(s)
- Jordi Alonso
- Health Services Research Unit, Institut Municipal d'Investigació Mèdica (IMIM-IMAS), Barcelona, Spain.
|
82
|
Hedner E, Carlsson J, Kulich KR, Stigendal L, Ingelgård A, Wiklund I. An instrument for measuring health-related quality of life in patients with Deep Venous Thrombosis (DVT): development and validation of Deep Venous Thrombosis Quality of Life (DVTQOL) questionnaire. Health Qual Life Outcomes 2004; 2:30. [PMID: 15214965 PMCID: PMC471562 DOI: 10.1186/1477-7525-2-30] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2004] [Accepted: 06/23/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Few studies have evaluated patient-reported outcomes in connection with a primary event of deep venous thrombosis, partly due to a lack of disease-specific measures. The aim here was to develop a disease-specific health-related quality of life (HRQL) measure, the deep venous thrombosis quality of life questionnaire (DVTQOL), for patients with recent exposition and treatment of proximal deep venous thrombosis. METHODS A total of 121 consecutive outpatients (50% male; mean age 61.2 ± 14 years) treated with warfarin (Waran®) for symptomatic proximal deep venous thrombosis were included in the study. Patients completed the SF-36, EQ-5D and the pilot version of the DVTQOL. RESULTS Items with high ceiling or floor effects, items with factor loadings below 0.50, and items loading on several factors were removed from the pilot version of the DVTQOL. In addition, overlapping and redundant items identified by the Rasch analysis were excluded. The final DVTQOL questionnaire consists of 29 items composing six dimensions depicting problems with: emotional distress; symptoms (e.g. pain, swollen ankles, cramp, bruising); limitation in physical activity; hassle with coagulation monitoring; sleep disturbance; and dietary problems. The internal consistency reliability was high (alpha values ranged from 0.79 to 0.93). The relevant domains of the SF-36 and EQ-5D correlated significantly with the DVTQOL, thereby confirming its construct validity. CONCLUSIONS The DVTQOL is a short and user-friendly instrument with good reliability and validity. Its test-retest reliability and responsiveness to change in clinical trials, however, must be explored.
Affiliation(s)
- Ewa Hedner
- Experimental Medicine, AstraZeneca, R&D, SE-431 83 Mölndal, Sweden
- Lennart Stigendal
- Section for Coagulation Disorders, Department of Internal Medicine, Sahlgrenska University Hospital, Göteborg, Sweden
|
83
|
Twisk J, Proper K. Evaluation of the results of a randomized controlled trial: how to define changes between baseline and follow-up. J Clin Epidemiol 2004; 57:223-8. [PMID: 15066681 DOI: 10.1016/j.jclinepi.2003.07.009] [Citation(s) in RCA: 167] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/31/2003] [Indexed: 11/18/2022]
Abstract
OBJECTIVE The most common way to evaluate the effect of an intervention is to compare the intervention and nonintervention groups regarding the change in the outcome variable between baseline and follow-up; however, there are many different ways to define "changes." The purpose of this article is to demonstrate how different definitions of "change" used in the analysis can influence the results of a study. STUDY DESIGN AND SETTING Two different randomized controlled trials were used as examples. RESULTS The results of the analyses showed that for continuous outcome variables, analysis of covariance seems to be the most appropriate because it corrects for the phenomenon of regression to the mean. For dichotomous outcome variables, multinomial logistic regression analysis with all possible changes over time as outcome seems to be the most appropriate, especially because of its straightforward interpretation. CONCLUSION A different definition of "change" can lead to different results in the evaluation of the effect of an intervention.
Affiliation(s)
- Jos Twisk
- Department of Clinical Epidemiology and Biostatistics, VU University Medical Centre, Amsterdam, The Netherlands.
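The contrast this abstract draws between a simple change-score comparison and analysis of covariance can be sketched as follows. The ANCOVA estimate uses the standard equal-slopes form (difference in follow-up means minus the pooled within-group slope times the difference in baseline means). All function names and the toy trial data are hypothetical and are not taken from the article.

```python
from statistics import mean


def change_score_effect(base_t, follow_t, base_c, follow_c):
    """Treatment effect as the difference in mean change (follow-up minus baseline)."""
    return (mean(follow_t) - mean(base_t)) - (mean(follow_c) - mean(base_c))


def ancova_effect(base_t, follow_t, base_c, follow_c):
    """Treatment effect on follow-up scores, adjusted for baseline (equal-slopes ANCOVA)."""
    sxy = sxx = 0.0
    # Pool the within-group cross-products to estimate the common baseline slope.
    for base, follow in ((base_t, follow_t), (base_c, follow_c)):
        mb, mf = mean(base), mean(follow)
        sxy += sum((b - mb) * (f - mf) for b, f in zip(base, follow))
        sxx += sum((b - mb) ** 2 for b in base)
    slope = sxy / sxx
    return (mean(follow_t) - mean(follow_c)) - slope * (mean(base_t) - mean(base_c))


# Toy trial: follow-up = 0.5 * baseline + 10, plus 3 in the treated arm,
# with the control arm starting from a higher baseline.
base_t, follow_t = [10, 20, 30], [18, 23, 28]
base_c, follow_c = [20, 30, 40], [20, 25, 30]

naive = change_score_effect(base_t, follow_t, base_c, follow_c)
adjusted = ancova_effect(base_t, follow_t, base_c, follow_c)
```

In this constructed example the built-in treatment effect is 3. The change-score comparison is distorted by the baseline imbalance combined with regression to the mean, while the baseline-adjusted estimate recovers the built-in value.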
|
84
|
Norquist JM, Fitzpatrick R, Dawson J, Jenkinson C. Comparing Alternative Rasch-Based Methods vs Raw Scores in Measuring Change in Health. Med Care 2004; 42:I25-36. [PMID: 14707753 DOI: 10.1097/01.mlr.0000103530.13056.88] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES To compare alternative Rasch-based approaches to the assessment of change over time through the example of an outcome measure used in total hip replacement surgery. SUBJECTS Preoperative data were collected on 1424 patients receiving total hip replacement surgery; 1221 (86%) were sent follow-up questionnaires 1 year after surgery. MEASURES The 12-item Oxford Hip Score (OHS) questionnaire administered preoperatively and 1-year postoperatively. METHODS Subscales of the OHS for pain and functional impairment were examined for unidimensionality and item invariance. Two criteria were used to examine Rasch-based measurement of the 2 subscales. Advantages of Rasch measurement were examined in terms of whether it produced improved discrimination of outcomes of patients (1) undergoing different levels of complexity of surgery; and (2) reporting different retrospective judgments of the success of their surgery. Using the method of relative precision in relation to groups of patients distinguished in these 2 ways, change scores using Likert scoring methods were compared with 2 Rasch scoring methods: (1) separate analyses of the 2 time points; and (2) a common scale analysis obtained by stacking patients from the 2 time points. RESULTS Less evidence for item invariance over time was found for the pain subscale. Other evidence supported treating subscales as unidimensional. Whichever Rasch scoring method was used, some gains in precision over standard Likert scoring were obtained in discriminating between groups of patients. CONCLUSIONS The evidence from the current study suggests that there may be some gains in sensitivity to change of outcome measures from different Rasch-based scoring approaches.
Affiliation(s)
- Josephine M Norquist
- Department of Public Health, Institute of Health Sciences, University of Oxford, Oxford, UK
|
85
|
Holman R, Glas CAW, de Haan RJ. Power analysis in randomized clinical trials based on item response theory. Control Clin Trials 2003; 24:390-410. [PMID: 12865034 DOI: 10.1016/s0197-2456(03)00061-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
Patient relevant outcomes, measured using questionnaires, are becoming increasingly popular endpoints in randomized clinical trials (RCTs). Recently, interest in the use of item response theory (IRT) to analyze the responses to such questionnaires has increased. In this paper, we used a simulation study to examine the small sample behavior of a test statistic designed to examine the difference in average latent trait level between two groups when the two-parameter logistic IRT model for binary data is used. The simulation study was extended to examine the relationship between the number of patients required in each arm of an RCT, the number of items used to assess them, and the power to detect minimal, moderate, and substantial treatment effects. The results show that the number of patients required in each arm of an RCT varies with the number of items used to assess the patients. However, as long as at least 20 items are used, the number of items barely affects the number of patients required in each arm of an RCT to detect effect sizes of 0.5 and 0.8 with a power of 80%. In addition, the number of items used has more effect on the number of patients required to detect an effect size of 0.2 with a power of 80%. For instance, if only five randomly selected items are used, it is necessary to include 950 patients in each arm, but if 50 items are used, only 450 are required in each arm. These results indicate that if an RCT is to be designed to detect small effects, it is inadvisable to use very short instruments analyzed using IRT. Finally, the SF-36, SF-12, and SF-8 instruments were considered in the same framework. Since these instruments consist of items scored in more than two categories, slightly different results were obtained.
Affiliation(s)
- Rebecca Holman
- Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, Amsterdam, The Netherlands.
|
86
|
Fortinsky RH, Garcia RI, Joseph Sheehan T, Madigan EA, Tullai-McGuinness S. Measuring disability in Medicare home care patients: application of Rasch modeling to the outcome and assessment information set. Med Care 2003; 41:601-15. [PMID: 12719685 DOI: 10.1097/01.mlr.0000062553.63745.7a] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.9] [Indexed: 10/18/2022]
Abstract
BACKGROUND The Outcome and Assessment Information Set (OASIS) is the universal clinical assessment tool for adult nonmaternity patients receiving skilled care at home from Medicare-certified home health agencies in the United States. Anticipating increased use of OASIS data for research purposes, this article explored the usefulness of Rasch modeling to address disability measurement challenges presented by the unique response category structure of the seven activities of daily living (ADL) and eight instrumental ADL (IADL) items in the OASIS. OBJECTIVES To illustrate how Rasch model statistics can be used to evaluate OASIS ADL and IADL item unidimensionality and model fit; to illustrate how Rasch modeling simultaneously estimates ADL and IADL item difficulty, thresholds between item response categories, and person disability; and to compare Rasch estimates of item difficulty and person disability scores to estimates based on more conventional Likert scoring techniques. SUBJECTS Medicare-eligible home health care patients (n = 583) served by one of 12 home care agencies in Ohio between November 1999 and September 2000. MEASURES ADL and IADL items were measured three ways: according to the original OASIS scoring (raw Likert); transformed raw Likert scores accounting for the nonuniform item structure (corrected Likert); and Rasch Partial Credit model scores. RESULTS The items bathing and telephone use showed evidence of unexpected response patterns; recoding of these items was necessary for good Rasch model fit. Partial Credit model results revealed that interval distances between response categories varied widely across the 15 ADL and IADL items. When ADL and IADL items were ranked by level of difficulty, results were similar between Rasch and corrected Likert measurement approaches; however, corrected Likert person scores were found to be nonlinear at highest and lowest disability levels when plotted against Rasch person scores. 
CONCLUSIONS Rasch modeling can help improve the precision of disability measurement in Medicare home care patients when using ADL and IADL items from the OASIS instrument.
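The Partial Credit model mentioned above lets every item carry its own set of thresholds between response categories, which is why interval distances can differ from item to item. A minimal sketch of the category probabilities follows; the function name and the example values are hypothetical and are not taken from the OASIS analysis.

```python
from math import exp


def pcm_probabilities(theta, thresholds):
    """Category probabilities under the Rasch Partial Credit model.

    theta      -- person measure in logits
    thresholds -- step difficulties delta_1..delta_m for one item
    Returns probabilities for categories 0..m (they sum to 1).
    """
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with three categories and unevenly spaced thresholds
print(pcm_probabilities(theta=0.5, thresholds=[-1.0, 2.0]))
```

With a single threshold the function reduces to the dichotomous Rasch model, which is a convenient sanity check.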
Affiliation(s)
- Richard H Fortinsky
- Center on Aging, University of Connecticut Health Center, Farmington 06030-5215, USA.
|
87
|
Bjorner JB, Ware JE, Kosinski M. The potential synergy between cognitive models and modern psychometric models. Qual Life Res 2003; 12:261-74. [PMID: 12769138 DOI: 10.1023/a:1023295421684] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
Analyses of cognitive aspects of survey methodology (CASM) and psychometric analysis are two methods that are able to complement each other. We use concrete examples to illustrate how psychometric analyses can test hypotheses from CASM. The psychometrics framework recognizes that survey responses are affected by other factors than the concept being assessed, for example by cognitive factors and processes. Such factors are subsumed under the concept of measurement error. Possible sources of measurement error can be tested, e.g. by randomized experiments. A standard way to reduce measurement error is to ask several questions about the same concept and combine the answers into a multi-item scale that is more precise than the individual items. Techniques like structural equation models use the item correlations to assess the magnitude of measurement error and to test the assumptions behind the multi-item scale, e.g. the effect of common response choices and item time frames. A central problem in modern psychometrics is how to model the mapping of the continuous latent variable onto the item response choice categories. This is achieved by threshold models (e.g. item response models and structural equation models for categorical data). These models can, for example, analyze the impact of mode of administration, test whether the items function in the same way for all people (measurement invariance/differential item functioning) and examine the consistency of responses from any single person. Such analyses provide new possibilities for combining psychometrics and cognitive methods.
Affiliation(s)
- Jakob B Bjorner
- National Institute of Occupational Health, Copenhagen, Denmark.
|
88
|
van der Heijden PGM, van Buuren S, Fekkes M, Radder J, Verrips E. Unidimensionality and reliability under Mokken scaling of the Dutch language version of the SF-36. Qual Life Res 2003; 12:189-98. [PMID: 12639065 DOI: 10.1023/a:1022269315437] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]
Abstract
The sub-scales of the SF-36 in the Dutch National Study are investigated with respect to unidimensionality and reliability. It is argued that these properties deserve separate treatment. For unidimensionality we use a non-parametric model from item response theory, called the Mokken scaling model, and compute the corresponding scalability coefficients. We estimate reliability under the Mokken model, assuming that the items are double homogeneous, and compare it to Cronbach's alpha. The scalability of the sub-scale general health perceptions is medium (H = 0.46), and for the other sub-scales it is strong (H ≥ 0.6). The reliability in terms of alpha indicates that all sub-scales can be used in basic research (alpha > 0.70), but that only physical functioning can be used for clinical applications of quality of life (alpha > 0.90). The relative merits of our approach are discussed.
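Cronbach's alpha, the comparison statistic in this abstract, can be computed directly from a respondents-by-items score matrix. The sketch below uses the usual formula with sample variances; the function name and the toy data are hypothetical, and the Mokken scalability coefficient H is not computed here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item scale answered by three respondents
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
print(round(alpha, 3))
```

The result can then be read against the cut-offs quoted in the abstract (above 0.70 for basic research, above 0.90 for clinical applications).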
|
89
|
Fitzpatrick R, Norquist JM, Dawson J, Jenkinson C. Rasch scoring of outcomes of total hip replacement. J Clin Epidemiol 2003; 56:68-74. [PMID: 12589872 DOI: 10.1016/s0895-4356(02)00532-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 11/27/2022]
Abstract
We examined whether there are advantages in terms of outcome assessment of using Rasch methods of scoring the 12-item Oxford Hip Score questionnaire over conventionally summed scores. Data were collected on patients receiving total hip replacement surgery. Three patient groups were created according to surgery type: primary, revision, and re-revision; two groups were created according to satisfaction with surgery: very satisfied and dissatisfied. Analyses were performed to test the relative precision (RP) of Rasch scoring versus conventionally summed scores in discriminating the groups experiencing different types of surgery and level of satisfaction. At the 1-year follow-up, RP ratios favored the Rasch scoring method in both tests of discrimination. Considerable gains in precision were achieved with Rasch scoring methods when groups were compared in a cross-sectional way. Alternative approaches to scoring questionnaires should be investigated to better assess comparisons over time.
Affiliation(s)
- Ray Fitzpatrick
- Department of Public Health, Institute of Health Sciences, University of Oxford, Headington, Oxford OX3 7LF, United Kingdom.
|
90
|
Extending the Range of Functional Assessment in Older Adults: Development of the Late-Life Function and Disability Instrument. J Aging Phys Act 2002. [DOI: 10.1123/japa.10.4.453] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
As a preliminary step in developing the physical-functioning measure of the Late-Life Function and Disability Instrument (LLFDI), the authors compared its items with the physical-functioning items (PF-10) on the SF-36 Health Survey. They compared the item coverage, hierarchy, and scale-separation properties of the PF-10 items with those of the physical-functioning items of the LLFDI. Both questionnaires were administered to 50 community-dwelling older adults. A partial-credit, 1-parameter, item-response-theory model was used to scale the items. The LLFDI improved the range of ability of daily activities that was encompassed by the PF-10 items by 46%. By sequentially deleting new items with poor fit to the overall scale and items with redundant content, the authors developed a scale more capable of accurately assessing low-functioning activities. The LLFDI function component incorporates a broader content range and better person and item separation than the PF-10 items. It appears to have potential as a comprehensive functional-activity assessment for community-dwelling older adults.
|
91
|
LaValley MP, Felson DT. Statistical presentation and analysis of ordered categorical outcome data in rheumatology journals. Arthritis and Rheumatism 2002; 47:255-9. [PMID: 12115154 DOI: 10.1002/art.10453] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
OBJECTIVE To assess the appropriateness of presentation of summary measures and analysis of ordered categorical (ordinal) data in three rheumatology journals in 1999, and to consider differences between basic and clinical science articles. METHODS Six hundred forty-four full-length articles from the 1999 editions of 3 rheumatology journals were evaluated for inclusion of an ordinal outcome. Articles were classified as basic or clinical science, and the appropriateness of presentation and analysis of the ordinal outcome were assessed. Chi-square tests were used to evaluate difference in percentages. RESULTS Ordinal outcomes were identified in 175 (27.2%) of 644 articles. Only 69 (39.4%) had appropriate data presentation, and 111 (63.4%) had appropriate data analysis. Appropriate presentation was seen less commonly in the basic science rather than the clinical science articles, but differences in the occurrence of appropriate analysis were not seen. CONCLUSION Ordinal data are common in rheumatology articles, but presentation usually does not conform to recommended guidelines.
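The percentages in the RESULTS can be reproduced from the counts given. The abstract does not state the denominator for the presentation and analysis figures; the arithmetic below assumes it is the 175 articles with an ordinal outcome, which matches the reported values. The variable and function names are illustrative only.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_articles = 644
with_ordinal_outcome = 175
appropriate_presentation = 69
appropriate_analysis = 111

print(percent(with_ordinal_outcome, total_articles))            # share of all articles
print(percent(appropriate_presentation, with_ordinal_outcome))  # share of ordinal-outcome articles
print(percent(appropriate_analysis, with_ordinal_outcome))      # share of ordinal-outcome articles
```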
Affiliation(s)
- Michael P LaValley
- Boston University School of Public Health, Boston University School of Medicine, Boston, Massachusetts, USA.
|
92
|
Hart DL, Wright BD. Development of an index of physical functional health status in rehabilitation. Arch Phys Med Rehabil 2002; 83:655-65. [PMID: 11994805 DOI: 10.1053/apmr.2002.31178] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.8] [Indexed: 12/26/2022]
Abstract
OBJECTIVE To describe (1) the development of an index of physical functional health status (FHS) and (2) its hierarchical structure, unidimensionality, reproducibility of item calibrations, and practical application. DESIGN Rasch analysis of existing data sets. SETTING A total of 715 acute, orthopedic outpatient centers and 62 long-term care facilities in 41 states participating with Focus On Therapeutic Outcomes, Inc. PATIENTS A convenience sample of 92,343 patients (40% male; mean age ± standard deviation [SD], 48 ± 17 y; range, 14-99 y) seeking rehabilitation between 1993 and 1999. INTERVENTIONS Not applicable. MAIN OUTCOME MEASURES Patients completed self-report health status surveys at admission and discharge. The Medical Outcomes Study 36-Item Short-Form Health Survey's physical functioning scale (PF-10) is the foundation of the physical FHS. The Oswestry Low Back Pain Disability Questionnaire, Neck Disability Index, Lysholm Knee Questionnaire, items pertinent to patients with upper-extremity impairments, and items pertinent to patients with more involved neuromusculoskeletal impairments were cocalibrated into the PF-10. RESULTS The final FHS item bank contained 36 items (patient separation, 2.3; root mean square measurement error, 5.9; mean square ± SD infit, 0.9 ± 0.5; outfit, 0.9 ± 0.9). Analyses supported empirical item hierarchy, unidimensionality, reproducibility of item calibrations, and content and construct validity of the FHS-36. CONCLUSIONS Results support the reliability and validity of FHS-36 measures in the present sample. Analyses show the potential for a dynamic, computer-controlled, adaptive survey for FHS assessment applicable for group analysis and clinical decision making for individual patients.
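The abstract reports a patient separation index but no reliability coefficient. In Rasch measurement the two are conventionally linked by R = G² / (1 + G²), where G is the separation index. The sketch below applies that conventional conversion; the resulting figure is an illustration and is not a value reported by the authors.

```python
def separation_to_reliability(separation):
    """Convert a Rasch separation index G to separation reliability.

    Uses the conventional relation R = G^2 / (1 + G^2).
    """
    return separation ** 2 / (1 + separation ** 2)


# Patient separation of 2.3, as reported in the abstract
print(round(separation_to_reliability(2.3), 2))
```

Under this relation a separation of 2.3 corresponds to a reliability of roughly 0.84.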
Affiliation(s)
- Dennis L Hart
- Focus On Therapeutic Outcomes, Inc, White Stone, VA 22578-2403, USA.
|
93
|
Cella D, Bullinger M, Scott C, Barofsky I. Group vs individual approaches to understanding the clinical significance of differences or changes in quality of life. Mayo Clin Proc 2002; 77:384-92. [PMID: 11936936 DOI: 10.4065/77.4.384] [Citation(s) in RCA: 147] [Impact Index Per Article: 6.7] [Indexed: 11/23/2022]
Abstract
This article focuses on the traversing of group and individual levels of quality-of-life data. A deductive approach is used to address the extent to which group data can be used to estimate clinical significance at the individual level. An inductive approach is used to evaluate the extent to which individual change data can be brought to the group level to define clinical significance. Both approaches have benefits and drawbacks. This article addresses how clinical significance can be defined for an individual when the threshold for meaningfulness is drawn from group data. It also addresses the condition under which one can use the same threshold difference for group vs individual differences or changes. A sample inductive approach explores the means to identify a clinically significant result or change, with use of insights from cognitive psychology. In most deductive approaches, the identification of a clinically significant difference or change requires identification of a criterion (or at least an interpretable anchor) against which the significance of a change in respondent score is compared.
Affiliation(s)
- David Cella
- Center on Outcomes, Research and Education, Evanston Northwestern Healthcare, Northwestern University, Ill, USA
|
94
|
Rabeneck L, Cook KF, Wristers K, Souchek J, Menke T, Wray NP. SODA (severity of dyspepsia assessment): a new effective outcome measure for dyspepsia-related health. J Clin Epidemiol 2001; 54:755-65. [PMID: 11470383 DOI: 10.1016/s0895-4356(00)00365-6] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
The aim of this research was to develop and evaluate an instrument for measuring dyspepsia-related health to serve as the primary outcome measure for randomized clinical trials. Building on our previous work we developed SODA (Severity of Dyspepsia Assessment), a multidimensional dyspepsia measure. We evaluated SODA by administering it at enrollment and seven follow-up visits to 98 patients with dyspepsia who were randomized to a 6-week course of omeprazole versus placebo and followed over 1 year. The mean age was 53 years, and six patients (6%) were women. Median Cronbach's alpha reliability estimates over the eight visits for the SODA Pain Intensity, Non-Pain Symptoms, and Satisfaction scales were 0.97, 0.90, and 0.92, respectively. The mean change scores for all three scales discriminated between patients who reported they were improved versus those who were unchanged, providing evidence of validity. The effect sizes for the Pain Intensity (.98) and Satisfaction (.87) scales were large, providing evidence for responsiveness. The effect size for the Non-Pain Symptoms scale was small (.24), indicating lower responsiveness in this study sample. SODA is a new, effective instrument for measuring dyspepsia-related health. SODA is multidimensional and responsive to clinically meaningful change with demonstrated reliability and validity.
Affiliation(s)
- L Rabeneck
- Department of Veterans Affairs (VA) Health Services Research and Development (HSR & D) Field Program, the VA Rehabilitation R&D Center, and Baylor College of Medicine, Houston, TX 77030, USA.
|
95
|
Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine (Phila Pa 1976) 2000; 25:3186-91. [PMID: 11124735 DOI: 10.1097/00007632-200012150-00014] [Citation(s) in RCA: 6692] [Impact Index Per Article: 278.8] [Indexed: 02/06/2023]
Affiliation(s)
- D E Beaton
- Institute for Work and Health, Toronto ON, Canada.
|
96
|
Response to Hays et al and McHorney and Cohen: Practical Implications of Item Response Theory and Computerized Adaptive Testing. Med Care 2000. [DOI: 10.1097/00005650-200009002-00011] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.1] [Indexed: 11/27/2022]
|
97
|
Bjørner J. Quality of Life Assessment in Clinical Trials; Methods and Practice. Maurice J. Staquet, Ron D. Hays and Peter M. Fayers (eds), Oxford University Press, New York, 1998. No. of pages: x+360. Price: £29.50. ISBN 0-1926-2785-6. Stat Med 2000. [DOI: 10.1002/(sici)1097-0258(20000530)19:10<1382::aid-sim403>3.0.co;2-e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
98
|
Williams JI. Ready, set, stop: reflections on assessing quality of life and the WHOQOL-100 (U.S. version). World Health Organization Quality of Life. J Clin Epidemiol 2000; 53:13-7. [PMID: 10693898 DOI: 10.1016/s0895-4356(99)00122-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Affiliation(s)
- J I Williams
- Institute of Clinical Evaluative Sciences, University of Toronto, Department of Family & Community Medicine, Ontario, Canada.
|
99
|
Bonomi AE, Patrick DL, Bushnell DM, Martin M. Quality of life measurement: will we ever be satisfied? J Clin Epidemiol 2000; 53:19-23. [PMID: 10693899 DOI: 10.1016/s0895-4356(99)00121-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.0] [Indexed: 10/18/2022]
Affiliation(s)
- A E Bonomi
- University of Washington, Department of Health Services, Seattle, USA
|
100
|
Wagner AK, Wyss K, Gandek B, Kilima PM, Lorenz S, Whiting D. A Kiswahili version of the SF-36 Health Survey for use in Tanzania: translation and tests of scaling assumptions. Qual Life Res 1999; 8:101-10. [PMID: 10457743 DOI: 10.1023/a:1026441415079] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
Abstract
The objective of the study was to translate and adapt the SF-36 Health Survey for use in Tanzania and to test the psychometric properties of the Kiswahili SF-36. A cross-sectional study was conducted as part of a household survey of a representative sample of the adult population of Dar es Salaam, Tanzania. The IQOLA method of forward and backward translation was used to translate the SF-36 into Kiswahili. The translated questionnaire was administered by trained interviewers to 3,802 adults (50% women, mean (SD) age 31 (13) years, 50% married and 60% with primary education). Data quality and psychometric assumptions underlying the scoring of the eight SF-36 scales were evaluated for the entire sample and separately for the least educated subgroup (n = 402), using multitrait scaling analysis. Forward and backward translation procedures resulted in a Kiswahili SF-36 that was considered conceptually equivalent to the US English SF-36. Data quality was excellent: only 1.2% of respondents were excluded because they answered less than half of the items for one or more scales; ninety percent of respondents answered mutually exclusive items consistently. Median item-scale correlations across the eight scales ranged from 0.47 to 0.81 for the entire sample. Median scaling success rates were 100% (range 87.5-100.0). The median internal consistency reliability of the eight scales for the entire sample was 0.81 (range 0.70-0.92). Floor effects were low and ceiling effects were high on five of the eight scales. Results for n = 402 people without formal education did not differ substantially from those of the entire sample. The results of data quality and psychometric tests support the scoring of the eight scales using standard scoring algorithms. The Kiswahili translation of the SF-36 may be useful in estimating the health of people in Dar es Salaam. Evidence for the validity of the SF-36 for use in Tanzania needs to be accumulated.
Affiliation(s)
- A K Wagner
- Health Institute, New England Medical Center, Boston, MA 02111, USA
|