1
Gandek B, Roos EM, Franklin PD, Ware JE. A 12-item short form of the Hip disability and Osteoarthritis Outcome Score (HOOS-12): tests of reliability, validity and responsiveness. Osteoarthritis Cartilage 2019; 27:754-761. [PMID: 30419279 DOI: 10.1016/j.joca.2018.09.017] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 04/17/2018] [Revised: 08/31/2018] [Accepted: 09/18/2018] [Indexed: 02/02/2023]
Abstract
OBJECTIVE To evaluate reliability, validity and responsiveness of HOOS-12, a 12-item short form of the 40-item Hip disability and Osteoarthritis Outcome Score (HOOS). HOOS-12 provides Pain, Function and Quality of Life (QOL) scale scores and a summary hip impact score. DESIGN Data from 1,273 FORCE-TJR hip osteoarthritis (OA) patients who completed HOOS before and 6 and 12 months after total hip replacement (THR) were analyzed. HOOS-12 includes a pain frequency item and three items measuring pain during increasingly difficult (sitting/lying, walking, stairs) activities; function items about standing, rising from sitting, getting in/out of a car, and walking on an uneven surface; and the 4-item HOOS QOL scale. Percent computable scale scores, floor and ceiling effects, internal consistency reliability, validity (scale correlations, tests of known groups validity using one-way analysis of variance (ANOVA)), and responsiveness (effect sizes (ES), standardized response means (SRM)) were compared for HOOS-12, full-length HOOS, HOOS-PS and HOOS, JR. RESULTS Internal consistency reliability was above 0.70 for all HOOS-12 scales and above 0.90 for the HOOS-12 Summary score. Validity and responsiveness of HOOS-12 Pain, Function and QOL scales were satisfactory, and tests based on them reached conclusions similar to those based on comparable full-length HOOS scales. The HOOS-12 Summary score was highly responsive in discriminating between groups who differed in global ratings of post-THR change in physical capabilities and had high ES and SRM. CONCLUSIONS HOOS-12 was a reliable and valid alternative to HOOS in THR patients with moderate to severe OA and provided three domain-specific and summary hip impact scores with substantially reduced respondent burden.
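The two responsiveness indices named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions (ES as mean change divided by the standard deviation of baseline scores, SRM as mean change divided by the standard deviation of change scores); the abstract does not spell out its formulas, and the scores below are hypothetical values, not study data.

```python
from statistics import mean, stdev


def effect_size(pre, post):
    """ES: mean change divided by the SD of the baseline (pre) scores."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(pre)


def standardized_response_mean(pre, post):
    """SRM: mean change divided by the SD of the change scores."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)


# Hypothetical 0-100 scores (higher = better) for five patients; not study data.
pre = [40.0, 35.0, 50.0, 45.0, 30.0]
post = [80.0, 70.0, 85.0, 90.0, 75.0]

es = effect_size(pre, post)
srm = standardized_response_mean(pre, post)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

Because the two indices use different denominators, they can rank instruments differently when baseline variability and change variability diverge, which is one reason studies like this one report both.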
Affiliation(s)
- B Gandek
  - University of Massachusetts Medical School, Worcester, MA, USA; John Ware Research Group, Watertown, MA, USA.
- E M Roos
  - Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark.
- P D Franklin
  - University of Massachusetts Medical School, Worcester, MA, USA.
- J E Ware
  - University of Massachusetts Medical School, Worcester, MA, USA; John Ware Research Group, Watertown, MA, USA.

2
Gandek B, Roos EM, Franklin PD, Ware JE. Item selection for 12-item short forms of the Knee injury and Osteoarthritis Outcome Score (KOOS-12) and Hip disability and Osteoarthritis Outcome Score (HOOS-12). Osteoarthritis Cartilage 2019; 27:746-753. [PMID: 30593867 DOI: 10.1016/j.joca.2018.11.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/17/2018] [Revised: 11/16/2018] [Accepted: 11/27/2018] [Indexed: 02/02/2023]
Abstract
OBJECTIVE To develop 12-item short forms (KOOS-12, HOOS-12) of the 42-item Knee injury and Osteoarthritis Outcome Score (KOOS) and 40-item Hip disability and Osteoarthritis Outcome Score (HOOS) that represent the full-length instruments sufficiently to provide joint-specific pain, function and quality of life (QOL) domain and summary joint impact scores. This paper describes KOOS-12 and HOOS-12 item selection. Subsequent papers will examine KOOS-12 and HOOS-12 reliability, validity and responsiveness. DESIGN Items were selected based on qualitative information from patients, clinicians and KOOS/HOOS translators and analysis of data from 1,395 knee osteoarthritis (OA) and 1,281 hip OA patients from the FORCE-TJR cohort who completed KOOS or HOOS before and after total joint replacement (TJR). Item response theory models and computerized adaptive test (CAT) simulations were used to identify items that best measured patients' levels of pain and function pre- and post-TJR. KOOS-12/HOOS-12 items were selected based on content, coverage of a wide measurement range, high item information, item usage in CAT simulations, scale-level properties (reliability, validity, responsiveness), and qualitative information. RESULTS KOOS-12 and HOOS-12 each included a pain frequency item and three items measuring pain during increasingly difficult activities (sitting/lying, walking, up/down stairs); function items about standing, rising from sitting, getting in/out of a car, and twisting/pivoting (KOOS-12) or walking on an uneven surface (HOOS-12); and the original 4-item QOL scale. CONCLUSIONS This study demonstrated the benefits of examining patient-reported outcome measures using modern psychometric methods, to create short forms with diverse content that provide domain-specific and summary joint impact scores.
Affiliation(s)
- B Gandek
  - University of Massachusetts Medical School, Worcester, MA, USA; John Ware Research Group, Watertown, MA, USA.
- E M Roos
  - Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark.
- P D Franklin
  - University of Massachusetts Medical School, Worcester, MA, USA.
- J E Ware
  - University of Massachusetts Medical School, Worcester, MA, USA; John Ware Research Group, Watertown, MA, USA.

3
Gandek B, Roos EM, Franklin PD, Ware JE. A 12-item short form of the Knee injury and Osteoarthritis Outcome Score (KOOS-12): tests of reliability, validity and responsiveness. Osteoarthritis Cartilage 2019; 27:762-770. [PMID: 30716536 DOI: 10.1016/j.joca.2019.01.011] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 04/17/2018] [Revised: 01/22/2019] [Accepted: 01/23/2019] [Indexed: 02/02/2023]
Abstract
OBJECTIVE To evaluate reliability, validity and responsiveness of KOOS-12, a 12-item short form of the 42-item Knee injury and Osteoarthritis Outcome Score (KOOS) that provides Pain, Function and Quality of Life (QOL) scale scores and a summary knee impact score. DESIGN Data from 1,392 knee osteoarthritis (OA) patients from the FORCE-TJR research cohort who completed KOOS before and 6 and 12 months after total knee replacement (TKR) were analyzed. KOOS-12 includes a pain frequency item and three items measuring pain during increasingly difficult (sitting/lying, walking, stairs) activities; function items about standing, rising from sitting, getting in/out of a car, and twisting/pivoting; and the 4-item KOOS QOL scale. Percent computable scale scores, floor and ceiling effects, internal consistency reliability, validity (scale correlations, tests of known groups validity using one-way analysis of variance (ANOVA)) and responsiveness (effect sizes, standardized response means) were compared for the KOOS-12, full-length KOOS, KOOS-PS and KOOS, JR. RESULTS Internal consistency reliability was above 0.70 for all KOOS-12 scales and ≥0.90 for the KOOS-12 Summary score. Validity and responsiveness of KOOS-12 Pain, Function and QOL scales were satisfactory, and tests based on them reached conclusions similar to those based on comparable full-length KOOS scales. The KOOS-12 Summary score was most responsive in discriminating between groups who differed in global ratings of post-TKR change in physical capabilities and had the highest effect sizes and standardized response means. CONCLUSIONS KOOS-12 was a reliable and valid alternative to KOOS in TKR patients with moderate to severe OA and provided three domain-specific and summary knee impact scores with substantially reduced respondent burden.
Affiliation(s)
- B Gandek
  - University of Massachusetts Medical School, Worcester, MA, USA; John Ware Research Group, Watertown, MA, USA.
- E M Roos
  - Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark.
- P D Franklin
  - University of Massachusetts Medical School, Worcester, MA, USA.
- J E Ware
  - University of Massachusetts Medical School, Worcester, MA, USA; John Ware Research Group, Watertown, MA, USA.

4
Kosinski M, Bjorner JB, Ware JE, Batenhorst A, Cady RK. The responsiveness of headache impact scales scored using 'classical' and 'modern' psychometric methods: a re-analysis of three clinical trials. Qual Life Res 2004; 12:903-12. [PMID: 14651411 DOI: 10.1023/a:1026111029376] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
BACKGROUND While item response theory (IRT) offers many theoretical advantages over classical test theory in the construction and scoring of patient-based measures of health, few studies compare scales constructed from both methodologies head to head. OBJECTIVE Compare the responsiveness to treatment of migraine-specific scales scored using summated rating scale methods vs. IRT methods. METHODS The data came from three clinical studies of migraine treatment that used the Migraine Specific Quality of Life Questionnaire (MSQ). Five methods of quantifying responsiveness were used to evaluate and compare changes from pre- to post-treatment in MSQ scales scored using Likert and IRT scaling methods. RESULTS Changes in all MSQ scale scores from pre- to post-treatment were highly significant in all three studies. A single index scored from the MSQ using IRT methods was determined to be more responsive than any one of the MSQ subscales across the five methods used to quantify responsiveness. Across 13 of the 15 tests (5 responsiveness methods × 3 studies) conducted, the single index scored from the MSQ using IRT methods was the most responsive measure. CONCLUSIONS IRT methods increased the responsiveness of the MSQ to the treatment of migraine. The results agree with the psychometric evidence that suggests that it is feasible to score a single index from the MSQ using IRT methods. This approach warrants further testing with other measures of migraine impact.
Affiliation(s)
- M Kosinski
  - QualityMetric Incorporated, Lincoln, RI, USA.

5
Kosinski M, Bayliss MS, Bjorner JB, Ware JE, Garber WH, Batenhorst A, Cady R, Dahlöf CGH, Dowson A, Tepper S. A six-item short-form survey for measuring headache impact: the HIT-6. Qual Life Res 2003; 12:963-74. [PMID: 14651415 DOI: 10.1023/a:1026119331193] [Citation(s) in RCA: 823] [Impact Index Per Article: 39.2] [Indexed: 11/12/2022]
Abstract
BACKGROUND Migraine and other severe headaches can cause suffering and reduce functioning and productivity. Patients are the best source of information about such impact. OBJECTIVE To develop a new short form (HIT-6) for assessing the impact of headaches that has broad content coverage but is brief as well as reliable and valid enough to use in screening and monitoring patients in clinical research and practice. METHODS HIT-6 items were selected from an existing item pool of 54 items and from 35 items suggested by clinicians. Items were selected and modified based on content validity, item response theory (IRT) information functions, item internal consistency, distributions of scores, clinical validity, and linguistic analyses. The HIT-6 was evaluated in an Internet-based survey of headache sufferers (n = 1103) who were members of America Online (AOL). After 14 days, 540 participated in a follow-up survey. RESULTS HIT-6 covers six content categories represented in widely used surveys of headache impact. Internal consistency, alternate forms, and test-retest reliability estimates of HIT-6 were 0.89, 0.90, and 0.80, respectively. Individual patient score confidence intervals (95%) of approximately ±5 were observed for 88% of all respondents. In tests of validity in discriminating across diagnostic and headache severity groups, relative validity (RV) coefficients of 0.82 and 1.00 were observed for HIT-6, in comparison with the Total Score. Patient-level classifications based on HIT-6 were accurate 88.7% of the time at the recommended cut-off score for a probability of migraine diagnosis. HIT-6 was responsive to self-reported changes in headache impact. CONCLUSIONS The IRT model estimated for a 'pool' of items from widely used measures of headache impact was useful in constructing an efficient, reliable, and valid 'static' short form (HIT-6) for use in screening and monitoring patient outcomes.
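A relative validity coefficient of the kind reported here can be sketched as follows. This assumes the common convention in which RV is the ratio of one-way ANOVA F statistics, with the comparator (here the Total Score) in the denominator; the abstract does not state the formula, and the grouped scores are hypothetical, not study data.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_validity(groups_measure, groups_reference):
    """RV: F for the measure of interest divided by F for the reference measure."""
    return one_way_f(groups_measure) / one_way_f(groups_reference)


# Hypothetical scores in three severity groups; not study data.
reference_scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # e.g. a full-length total score
short_form_scores = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]  # e.g. a short form

rv = relative_validity(short_form_scores, reference_scores)
print(f"RV = {rv:.2f}")
```

Under this convention an RV of 1.00 means the short form separates the known groups as well as the reference measure, and values below 1.00 indicate the proportion of that discriminating power retained.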
Affiliation(s)
- M Kosinski
  - QualityMetric Incorporated, Lincoln, RI 02865, USA.

6
Abstract
In response to questions raised about the "accuracy" of SF-36 physical (PCS) and mental (MCS) component summary scores, particularly extremely high and low scores, we briefly comment on: how they were developed, how they are scored, the factor content of the eight SF-36 subscales, cross-tabulations between item-level responses and extreme summary scores, and published and new tests of their empirical validity. Published cross-tabulations between SF-36 items and PCS and MCS scores, reanalyses of public datasets (N = 5919), and preliminary results from the Medicare Health Outcomes Survey (HOS) (N = 172,314) yielded little or no evidence in support of Taft's hypothesis that extreme scores are an invalid artifact of some negative scoring weights. For example, in the HOS, those (N = 432) with "unexpected" PCS scores worse than 20 (which, according to Taft, indicate better mental health rather than worse physical health) were about 25% more likely to die within two years, in comparison with those scoring in the next highest (21-30) category. In this test and in all other empirical tests, results of predictions supported the validity of extreme PCS and MCS scores. We recommend against the interpretation of average differences smaller than one point in studies that seek to detect "false" measurement and we again repeat our 7-year-old recommendation that results based on summary measures should be thoroughly compared with the SF-36 profile before drawing conclusions. To facilitate such comparisons, scoring utilities and user-friendly graphs for SF-36 profiles and physical and mental summary scores (both orthogonal and oblique scoring algorithms) have been made available on the Internet at www.sf-36.com/test.
Affiliation(s)
- J E Ware
  - Quality Metric, Inc, Lincoln, RI 02865, USA.

7
Manocchia M, Keller S, Ware JE. Sleep problems, health-related quality of life, work functioning and health care utilization among the chronically ill. Qual Life Res 2002; 10:331-45. [PMID: 11763246 DOI: 10.1023/a:1012299519637] [Citation(s) in RCA: 141] [Impact Index Per Article: 6.4] [Indexed: 11/12/2022]
Abstract
OBJECTIVES To provide a comprehensive assessment of whether sleep problems among the chronically ill are associated with decrements in functional health and well-being, decreases in work functioning and increases in the use of health care services. DESIGN Cross-sectional survey of patients from the Medical Outcomes Study (MOS), an observational study of functional health and well-being. Chronically ill patients (n = 3484) were sampled from health maintenance organizations, large multi-specialty groups, and solo or single-specialty group practices in Boston, Los Angeles, and Chicago. Chronic illness subgroups include: clinical depression (n = 527), congestive heart failure (n = 229), diabetes (n = 577), recent myocardial infarction (n = 170), hypertension (n = 2206), asthma (n = 84), back problems (n = 771), and arthritis (n = 672). ANCOVA analyses of the relationship between sleep problems and SF-36 scales and summaries were performed. In addition, a 'relative impact' analysis determined what scales or summaries were most associated with sleep problems. MAIN OUTCOME MEASURES Eight scales and two summary measures from the SF-36 Health Survey, work productivity and work quality measures and self-reports of health care utilization. RESULTS Comparing chronically ill patients with no sleep problems to those with mild, moderate, or severe sleep problems revealed a direct association between sleep problems and decrements in health-related quality of life (HRQOL) as measured by SF-36 scales and summaries (MANOVA F = 24.1; d.f. = 24; p ≤ 0.0001). In addition, significant differences in HRQOL were found when comparing patients with and without sleep problems within most of the disease groups studied. The relative impact analysis revealed that measures of mental health and the mental health summary were most associated with sleep problem severity in the total sample and chronic disease subsets, in comparison with measures of physical health. In addition, monotonic relationships were found between severity of sleep problems and decreases in work productivity and increases in health care utilization, as expected. CONCLUSIONS The analyses revealed that sleep problems go hand in hand with poorer mental health, diminished work productivity and work quality and greater use of health care services. Sleep problems, therefore, may be a significant confounding factor in the interpretation of health outcomes among patients with chronic diseases.
Affiliation(s)
- M Manocchia
  - Health and Addictions Research, Inc., Boston, MA 02116, USA.

8
Heiligenstein JH, Ware JE, Beusterien KM, Roback PJ, Andrejasich C, Tollefson GD. Acute effects of fluoxetine versus placebo on functional health and well-being in late-life depression. Int Psychogeriatr 2001; 7 Suppl:125-37. [PMID: 8580388 DOI: 10.1017/s1041610295002407] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 01/31/2023]
Abstract
In a randomized 6-week trial comparing fluoxetine with placebo, the Medical Outcomes Study 36-Item Short-Form Health Status Survey (SF-36) scales were used to measure the effects of treatment on functional health and well-being among elderly (age ≥ 60 years) outpatients with major depression. In the fluoxetine and placebo groups, 261 and 271 patients, respectively, completed the SF-36 before treatment and at Weeks 3 and 6. Compared with national norms for individuals over age 60, study patients before treatment exhibited baseline decrements on the following SF-36 scales: mental health, role limitations due to emotional problems, social functioning, vitality, role limitations due to physical problems, and bodily pain. Analyses of SF-36 change scores from baseline to Week 6 revealed that the fluoxetine group improved more than the placebo group across all scales. Differences in changes of scores between groups were significant (p < .05), favoring the fluoxetine group for the scales of mental health, role limitations due to emotional problems, physical functioning, and bodily pain. Improvements observed in the fluoxetine group were both clinically and socially significant.
Affiliation(s)
- J H Heiligenstein
  - Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana, USA

9
Neary MP, Cort S, Bayliss MS, Ware JE. Sustained virologic response is associated with improved health-related quality of life in relapsed chronic hepatitis C patients. Semin Liver Dis 2001; 19 Suppl 1:77-85. [PMID: 10349695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
Although evidence of virologic elimination, normalization of serum alanine aminotransferase levels, and reduction in liver inflammation are the principal therapeutic outcome goals in chronic hepatitis C patients, improvement in health-related quality of life (HQL) is also an important aspect of therapeutic outcome. In a recent report of chronic hepatitis C patients treated for 24 weeks with interferon, sustained virologic response (24 weeks post-treatment) was associated with improvement in HQL compared with nonresponse. We report on the relationship between sustained virologic response and Hepatitis Quality-of-Life Questionnaire (HQLQ) survey results of patients who relapsed after a previous course of interferon alfa who were subsequently treated with recombinant interferon alfa-2b (rIFN-alpha 2b) either alone or in combination with ribavirin. The HQLQ was administered at baseline, at treatment Weeks 12 and 24, and at follow-up Weeks 12 and 24. All patients received rIFN-alpha 2b 3 million International Units by subcutaneous injection three times weekly plus either oral ribavirin (1,000 or 1,200 mg) or placebo daily for 24 weeks. At baseline, patients scored lower than adjusted population norms in HQL. Relative to patients treated with rIFN-alpha 2b monotherapy, patients receiving combination therapy showed better HQL in 6 of 13 domains. Furthermore, sustained virologic response in either treatment group was associated with improvement in the scores of both generic and hepatitis-specific HQL survey domains. These results indicate that successful therapeutic resolution of hepatitis C infection improves HQL as assessed by generic and hepatitis C-specific measures of functional health and well-being. Furthermore, improvements in HQL outcome measures may predict reduced demand for health care resources and greater productivity in the workplace.
Affiliation(s)
- M P Neary
  - Schering-Plough Research Institute, Kenilworth, New Jersey 07033, USA

10
Abstract
BACKGROUND Asthma treatment has broadened from managing clinical markers to incorporate factors that are most meaningful to patients, collectively called health-related quality of life (HQL). OBJECTIVE To develop an asthma-specific HQL tool, meeting demands for brevity, usefulness and measurement precision. METHODS The 20-item Sydney Asthma Quality of Life Questionnaire (AQLQ) and six additional items were studied using factor analysis, reliability and validity tests among asthma patients 14 and older. RESULTS The 15-item Integrated Therapeutics Group Asthma Short Form (ITG-ASF) retains the validity of the AQLQ with improved scaling properties and interpretability. The ITG-ASF yields six scores: Symptom-Free Index, Functioning with Asthma, Psychosocial Impact of Asthma, Asthma Energy, Asthma-Confidence in Health, and a Total. All items correlated 0.40 or higher with their hypothesized scales and passed discriminant validity tests, with scaling success rates from 75 to 100%. Reliability exceeded the minimum of 0.70 for group comparisons. Ceiling and floor effects were acceptable. Scales were valid in relation to changes in asthma severity and lung function. The best predictor of asthma severity (National Asthma Education and Prevention Program (NAEPP) staging) was the Symptom-Free Index. A Spanish translation is available; a Chinese-American translation is forthcoming. The reading grade level is 4.8. CONCLUSIONS Development of the ITG-ASF was a data-driven process maximizing measurement precision and breadth while minimizing burden. The ITG-ASF is a brief, comprehensive and empirically valid tool that complements traditional markers of the outcomes of asthma care.
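The 0.70 reliability threshold mentioned in this abstract is usually applied to an internal consistency coefficient. The sketch below assumes Cronbach's alpha is the coefficient in question (the abstract says only "reliability") and uses hypothetical item responses, not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of responses per item, all the same length
    (position i in every list belongs to respondent i).
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses from four respondents to a 3-item scale; not study data.
scale_items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]

alpha = cronbach_alpha(scale_items)
print(f"alpha = {alpha:.3f}, meets 0.70 minimum: {alpha >= 0.70}")
```

Alpha rises with the number of items and with the average inter-item correlation, which is why short forms need well-chosen items to stay above the group-comparison minimum.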

11
McHutchison JG, Ware JE, Bayliss MS, Pianko S, Albrecht JK, Cort S, Yang I, Neary MP. The effects of interferon alpha-2b in combination with ribavirin on health related quality of life and work productivity. J Hepatol 2001; 34:140-7. [PMID: 11211891 DOI: 10.1016/s0168-8278(00)00026-x] [Citation(s) in RCA: 168] [Impact Index Per Article: 7.3] [Indexed: 12/20/2022]
Abstract
BACKGROUND/AIMS Interferon plus ribavirin is the most effective therapy for chronic hepatitis C. The aim of this study was to evaluate the effect of chronic hepatitis C and therapy on health-related quality of life and work functioning. METHODS Nine hundred and twelve patients with hepatitis C infection were randomized in a controlled trial of Interferon alpha 2b 3 MU tiw for 24 or 48 weeks plus ribavirin 1000-1200 mg or placebo. Questionnaire-based assessments of health-related quality of life and work functioning were performed before, during, and after treatment. Outcome measures included the SF-36 Health Survey and additional generic and specific scales. Work functioning was assessed as missed days, shorter hours or less productivity at work. RESULTS Pre-treatment, patients had significant impairment in five of eight SF-36 concepts compared to matched population norms. Sustained responders had a return to normal for four of these five concepts. Quality of life did not improve in non-responders. Improvements in histology, viral load or ALT values predicted improvements in quality of life. Sustained responders also had improvements in work functioning and productivity. CONCLUSIONS Hepatitis C patients had impaired quality of life. After combination therapy, sustained virologic responders achieved benefits in their quality of life and work functioning.
Affiliation(s)
- J G McHutchison
  - Scripps Clinic and Research Foundation, La Jolla, CA 92037, USA.

12
Affiliation(s)
- J E Ware
  - QualityMetric, Inc.; Lincoln, Rhode Island 02865, USA.

13

14
Ware JE, Bjorner JB, Kosinski M. Practical implications of item response theory and computerized adaptive testing: a brief summary of ongoing studies of widely used headache impact scales. Med Care 2000; 38:II73-82. [PMID: 10982092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Affiliation(s)
- J E Ware
  - QualityMetric, Inc, Lincoln, Rhode Island 02865, USA.

15
Kosinski M, Zhao SZ, Dedhiya S, Osterhaus JT, Ware JE. Determining minimally important changes in generic and disease-specific health-related quality of life questionnaires in clinical trials of rheumatoid arthritis. Arthritis Rheum 2000; 43:1478-87. [PMID: 10902749 DOI: 10.1002/1529-0131(200007)43:7<1478::aid-anr10>3.0.co;2-m] [Citation(s) in RCA: 372] [Impact Index Per Article: 15.5] [Indexed: 01/01/2023]
Abstract
OBJECTIVE To define clinically meaningful changes in 2 widely used health-related quality of life (HQL) instruments in studies of patients with rheumatoid arthritis (RA). METHODS Patients with RA (n = 693) who were enrolled in 2 double-blind, placebo-controlled clinical trials completed the Short Form 36 (SF-36) modified health survey and the Health Assessment Questionnaire (HAQ) disability index at baseline and 6-week followup assessments. Data on 5 RA severity measures were also collected at baseline and at 6 weeks (patient and physician global assessments, joint swelling and tenderness counts, and global pain assessment). Comparison of changes in the SF-36 scales and HAQ scores was made between groups of patients known to differ in the level of change on each RA severity measure. RESULTS With few exceptions, changes in the SF-36 and HAQ scores were different between patients who differed in the level of change on each RA severity measure. Changes in the SF-36 and HAQ scores were more strongly related to changes in the patient and physician global assessments and patient pain assessment than to changes in the joint swelling and tenderness counts. CONCLUSION Based on these results, minimally important changes in the SF-36 scales and HAQ disability scores were determined, which will be useful in interpreting HQL results in clinical trials.
Affiliation(s)
- M Kosinski
  - QualityMetric, Inc., Lincoln, Rhode Island, USA

16

17
Graham DM, Blaiss MS, Bayliss MS, Espindle DM, Ware JE. Impact of changes in asthma severity on health-related quality of life in pediatric and adult asthma patients: results from the asthma outcomes monitoring system. Allergy Asthma Proc 2000; 21:151-8. [PMID: 10892517 DOI: 10.2500/108854100778148990] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
Abstract
The goals of asthma treatment have broadened beyond managing traditional clinical markers of disease severity, and now include a focus on benefits of treatment in terms that are most meaningful to patients. Measurement of both generic and disease-specific health-related quality of life (HQL) is advocated because each provides complementary information about how the condition affects everyday functioning and well-being and whether treatments have their intended effects. The purpose of this study was to determine the impact of changes in asthma severity (defined using NHLBI/NAEPP severity staging) on patient-assessed HQL. Two hundred and thirty-three pediatric asthma patients and 269 adult asthma patients were evaluated in a one-year observational study. Analyses were performed to compare the generic and asthma-specific scores for patients whose asthma severity improved, stayed the same, or worsened over one year. The asthma-specific scales are sensitive to changes in disease severity. Of the generic scales, those tapping areas of physical health are more affected than the mental/emotional scales. This confirms that HQL measures are responsive to changes in asthma severity. They complement traditional clinical markers used to evaluate changes in a patient's disease state and thus give the physician another useful tool in following the clinical progress of the child with asthma.
Affiliation(s)
- D M Graham
  - Department of Pediatrics and Medicine, University of Tennessee, Memphis, USA

18
Moinpour CM, Lovato LC, Thompson IM, Ware JE, Ganz PA, Patrick DL, Shumaker SA, Donaldson GW, Ryan A, Coltman CA. Profile of men randomized to the prostate cancer prevention trial: baseline health-related quality of life, urinary and sexual functioning, and health behaviors. J Clin Oncol 2000; 18:1942-53. [PMID: 10784636 DOI: 10.1200/jco.2000.18.9.1942] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.8] [Indexed: 11/20/2022]
Abstract
PURPOSE To describe men who agreed to be randomized to the Prostate Cancer Prevention Trial (PCPT), a 7-year, double-blind placebo-controlled study of the efficacy of finasteride in preventing prostate cancer. METHODS Comprehensive health-related quality-of-life data are presented for 18,882 randomized PCPT participants. RESULTS PCPT participants are highly educated, middle to upper income, and primarily white (92%). Participants reported healthy lifestyles. The mean American Urological Association Symptom Index score was well below the maximum entry score of less than 19; existing urinary symptoms were generally not bothersome. The scores for two sexual functioning scales could range from 0 to 100, with higher scores reflecting worse sexual functioning. The mean score for the Sexual Problem Scale was 19.2 out of 100, and the mean Sexual Activities Scale was 44.1 out of 100. Scores for seven of the eight Medical Outcomes Study 36-item Short-Form Health Survey scales (higher scores are better) were 10 to 20 points higher than those reported by a general population sample and differed minimally by race but not by age. Previously reported associations between sexual dysfunction and hypertension, diabetes, and depression were also observed. Men who never smoked reported less sexual dysfunction than did those who either had quit or still smoked. CONCLUSION Individuals who are likely to enroll in primary prevention trials have a high socioeconomic status, healthy lifestyle behaviors, and better health than the general population. These data help oncologists design chemoprevention trials with respect to the selection of health-related quality-of-life assessments and recruitment strategies.
Affiliation(s)
- C M Moinpour
- Southwest Oncology Group Statistical Center, Division of Public Health Sciences, and Clinical Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA.
|
19
|
Safran DG, Rogers WH, Tarlov AR, Inui T, Taira DA, Montgomery JE, Ware JE, Slavin CP. Organizational and financial characteristics of health plans: are they related to primary care performance? Arch Intern Med 2000; 160:69-76. [PMID: 10632307 DOI: 10.1001/archinte.160.1.69] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
BACKGROUND Primary care performance has been shown to differ under different models of health care delivery, even among various models of managed care. Pervasive changes in our nation's health care delivery systems, including the emergence of new forms of managed care, compel more current data. OBJECTIVE To compare the primary care received by patients in each of 5 models of managed care (managed indemnity, point of service, network-model health maintenance organization [HMO], group-model HMO, and staff-model HMO) and identify specific characteristics of health plans associated with performance differences. METHODS Cross-sectional observational study of Massachusetts adults who reported having a regular personal physician and for whom plan-type was known (n = 6018). Participants completed a validated questionnaire measuring 7 defining characteristics of primary care. Senior health plan executives provided information about financial and nonfinancial features of the plan's contractual arrangements with physicians. RESULTS The managed indemnity system performed most favorably, with the highest adjusted mean scores for 8 of 10 measures (P<.05). Point of service and network-model HMO performance equaled the indemnity system on many measures. Staff-model HMOs performed least favorably, with adjusted mean scores that were lowest or statistically equivalent to the lowest score on all 10 scales. Among network-model HMOs, several features of the plan's contractual arrangement with physicians (ie, capitated physician payment, extensive use of clinical practice guidelines, financial incentives concerning patient satisfaction) were significantly associated with performance (P<.05). 
CONCLUSIONS With US employers and purchasers having largely rejected traditional indemnity insurance as unaffordable, the results suggest that the current momentum toward open-model managed care plans is consistent with goals for high-quality primary care, but that the effects of specific financial and nonfinancial incentives used by plans must continue to be examined.
Affiliation(s)
- D G Safran
- Health Institute, New England Medical Center, Department of Medicine, Tufts University, Boston, Mass. 02111, USA.
|
20
|
Scott-Lennox JA, Wu AW, Boyer JG, Ware JE. Reliability and validity of French, German, Italian, Dutch, and UK English translations of the Medical Outcomes Study HIV Health Survey. Med Care 1999; 37:908-25. [PMID: 10493469 DOI: 10.1097/00005650-199909000-00007] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.7] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Test the reliability and validity of 5 translations of the 35-item version of the MOS HIV for use in multinational clinical trials. RESEARCH DESIGN Investigators in five countries followed a standardized protocol and recruited HIV+ patients stratified by disease stage: asymptomatic; symptomatic; and AIDS. During routine clinic visits, patients completed the MOS HIV and a checklist of HIV-related symptoms. Clinicians reported patients' demographics, most recent CD4+ count and disease stage. SUBJECTS Three hundred and sixty-three HIV+ outpatients attending AIDS clinics in The Netherlands, France, Germany, Italy, and England. MEASURES Dutch, French, German, Italian, and UK English translations of the MOS HIV, CD4+ cell count, and the SCL-57. RESULTS All translations recruited roughly equal proportions of each disease stage, although the number of patients recruited differed by translation (n: German = 92; French = 86; Italian = 88; UK English = 72; and Dutch = 25). Internal consistency reliability was similar across translations and adequate (alpha >.70) for all scales except for Mental Health in the French sample. Multi-trait analyses supported structural validity of the MOS HIV scales in each translation. Principal component analysis of scale scores identified 2 dimensions for all translations except German. For all translations, scores were significantly correlated with symptom severity scores but were uncorrelated with CD4+ cell counts. CONCLUSIONS In general, the 5 translations of the MOS HIV had similar psychometric properties to those reported in the validation study for the original US English version of the MOS HIV. With some revision, these translations promise to provide useful quality of life data from HIV+ subjects in clinical trials.
|
21
|
Ware JE. John E. Ware Jr. on health status and quality of life assessment and the next generation of outcomes measurement. Interview by Marcia Stevic and Katie Berry. J Healthc Qual 1999; 21:12-7. [PMID: 10620879 DOI: 10.1111/j.1945-1474.1999.tb00984.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
John E. Ware Jr., PhD, is a founder of QualityMetric, Inc., as well as its president and chief scientific officer. He also is executive director of the Health Assessment Lab at the Health Institute, New England Medical Center, and holds professorships at Harvard University and at Tufts University School of Medicine. For 14 years, he was senior research psychologist at RAND, where he developed health status and patient satisfaction measures used in the Health Insurance Experiment. He also was principal investigator for the Medical Outcomes Study (MOS), which developed the SF-36 Health Survey and other tools widely used in monitoring patient outcomes. A coauthor of papers from the MOS that received the Association for Health Services Research (AHSR) Article of the Year Award in 1993, Dr. Ware has received numerous awards for work in the field of outcomes research. He now is principal investigator of the International Quality of Life Assessment Project, which is translating and validating the SF-36 Health Survey for use in 45 countries. Dr. Ware also is developing the next generation of patient-based assessments that use advances in computer technology to provide very brief measures, yet meet the standard of precision necessary for use on an individual patient basis.
Affiliation(s)
- J E Ware
- Health Assessment Lab, Health Institute, New England Medical Center, USA
|
22
|
Ware JE, Bayliss MS, Mannocchia M, Davis GL. Health-related quality of life in chronic hepatitis C: impact of disease and treatment response. The Interventional Therapy Group. Hepatology 1999; 30:550-5. [PMID: 10421667 DOI: 10.1002/hep.510300203] [Citation(s) in RCA: 223] [Impact Index Per Article: 8.9] [Indexed: 12/24/2022]
Abstract
Hepatitis C infects nearly 4 million Americans. Most have chronic hepatitis C (CHC), which progresses to cirrhosis in about 20% of patients. Interferon treatment leads to transient responses in about 40% of patients and apparent eradication of infection in 7% to 40% of patients. In this report, we document the impact of CHC on health-related quality of life (HQL), and changes in HQL among treatment responders. Three hundred twenty-four CHC patients from 10 countries who had relapsed after responding to interferon-alfa therapy were randomized to monotherapy (IFN alfa-2b + placebo) or combination therapy (IFN alfa-2b + ribavirin), treated for 24 weeks, and followed up for 24 weeks. HQL was assessed using the Hepatitis Quality of Life Questionnaire (HQLQ), containing the generic SF-36 Health Survey, three additional generic scales, and two hepatitis-specific scales. Before treatment, CHC patients were impaired in 5 of 8 SF-36 concepts (physical functioning, role-physical, general health, vitality, and social functioning) in comparison with matched population norms. Sustained virological response (SVR) to treatment yielded improvements on three generic scales (vitality, social functioning, and health distress) and the CHC-specific health distress scale. Overall response to treatment (SVR plus histological improvement) yielded the same pattern of improvements with additional gains in generic general health and CHC-specific limitations. Successful treatment of CHC improved HQL as measured by both CHC-specific and generic measures of functional health and well being.
Affiliation(s)
- J E Ware
- Health Assessment Lab, New England Medical Center, Boston, MA, USA.
|
23
|
Abstract
OBJECTIVE The SF-36 Arthritis-Specific Health Index (ASHI) was constructed to improve the responsiveness of the SF-36 Health Survey to changes in the severity of arthritis through the use of arthritis-specific scoring algorithms. This study compared the responsiveness of the ASHI and other generic scales and summary measures scored from the SF-36 in clinical trials of health outcomes for patients with arthritis. METHODS Longitudinal data for patients (n = 835) participating in four placebo-controlled trials were analyzed. Study participants had at least a 6-month history of moderate to severe osteoarthritis or rheumatoid arthritis of the knee or hip. All had undergone a washout period of 3 to 14 days before baseline assessment to bring about a flare state in osteoarthritis or rheumatoid arthritis symptoms. Their average age was 60 years, and 72% were female. Responders and nonresponders were classified on the basis of physician assessments of changes in arthritis severity, with blinding as to treatment group; treated and untreated (placebo) groups were also compared. For the SF-36 ASHI, the generic physical (PCS) and mental (MCS) component summary measures, and each of the eight subscales scored from the SF-36 (acute version), change scores were computed by subtracting scores before treatment from scores at 2-week follow-up. To evaluate empirical validity, analyses of variance were performed. For each measure, an F-ratio was computed for the comparison between clinically defined groups of responders and nonresponders and between groups of patients assigned to placebo versus drug therapy. Relative validity (RV) coefficients were computed for the ASHI in comparison with PCS, MCS, and the best SF-36 scale to determine which was more responsive.
RESULTS In analyses of each of the four trials and all trials combined, RV coefficients for the ASHI were higher than those for both of the generic SF-36 summary measures and for the most valid SF-36 scale (Bodily Pain), with only one exception. Across 40 tests of validity in distinguishing treated from untreated patients, the ASHI was 5% to 19% more valid than the best SF-36 scale (RV = 1.05-1.19; RV = 1.10 in all trials combined). The generic summary measures (PCS and MCS) were much less valid in these tests (RV = 0.67 and 0.27, respectively). In analyses of responders and nonresponders, RV coefficients for the ASHI ranged from 0.70 to 1.22 (RV = 1.04 in all trials combined), in comparison with the best SF-36 subscale, which was always Bodily Pain. RV coefficients were lower for PCS (RV = 0.75) and much lower for the MCS (RV = 0.18) in comparisons of treatment outcomes based on all trials combined. CONCLUSION The ASHI appears to be more valid than the eight SF-36 scales and PCS and MCS summary measures for purposes of distinguishing between treated and untreated patients and between clinical responders and nonresponders. This study demonstrates the feasibility of improving the validity of the SF-36 through the use of arthritis-specific scoring while retaining the option of generic scoring, which makes it possible to also compare results across diseases and treatments.
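The relative validity comparison described in this abstract can be illustrated with a minimal sketch. It assumes that RV is the ratio of a measure's one-way ANOVA F-ratio to the F-ratio of a reference scale, as the METHODS text suggests. The change scores, measure names used as dictionary keys, and function names below are hypothetical and are not taken from the study.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups (each a list of scores)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)          # number of groups
    n = len(all_scores)      # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_validity(f_by_measure, reference):
    """Divide each measure's F-ratio by the F-ratio of the reference measure."""
    ref = f_by_measure[reference]
    return {name: f / ref for name, f in f_by_measure.items()}


# Hypothetical 2-week change scores: (responders, nonresponders) per measure.
change_scores = {
    "Bodily Pain": ([12.0, 18.0, 15.0, 20.0], [2.0, 5.0, 1.0, 4.0]),
    "MCS": ([3.0, 6.0, 1.0, 5.0], [2.0, 4.0, 0.0, 5.0]),
}

f_values = {name: f_ratio(list(groups)) for name, groups in change_scores.items()}
rv = relative_validity(f_values, reference="Bodily Pain")
```

By construction the reference scale has RV = 1.00, so a value above 1 indicates a measure that separates the groups better than the reference and a value below 1 indicates one that separates them less well.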
Affiliation(s)
- S D Keller
- Health Assessment Lab, Health Institute, New England Medical Center, Boston, MA, USA.
|
24
|
Kosinski M, Keller SD, Ware JE, Hatoum HT, Kong SX. The SF-36 Health Survey as a generic outcome measure in clinical trials of patients with osteoarthritis and rheumatoid arthritis: relative validity of scales in relation to clinical measures of arthritis severity. Med Care 1999; 37:MS23-39. [PMID: 10335741 DOI: 10.1097/00005650-199905001-00003] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.4] [Indexed: 12/12/2022]
Abstract
OBJECTIVE To evaluate the validity of SF-36 Health Survey (SF-36) scale scores and summary measure scores to describe the health burden of arthritis and to be responsive to clinical indicators of arthritis severity used in four clinical trials. METHODS Adults participating in four double-blinded, placebo-controlled clinical trials of therapy for osteoarthritis or rheumatoid arthritis were administered the SF-36 concurrent with clinical measures of disease severity (n = 1,016). Data were collected before treatment and 2 weeks after treatment. Mean SF-36 scores for all patients with arthritis at baseline were compared to a sociodemographically equivalent national norm to test the ability of the SF-36 to describe the burden of arthritis. To test the responsiveness of SF-36 scores to clinical measures of arthritis severity, mean SF-36 scale scores were compared across patients differing in arthritis severity before treatment. Two-week mean SF-36 change scores were compared across patients who improved in arthritis severity (responders) versus patients who did not improve (nonresponders). F-statistics and relative validity coefficients were computed to determine how well each SF-36 scale and summary measure discriminated among arthritis severity levels and distinguished treatment responders from nonresponders, relative to the best scale. RESULTS Large and statistically significant differences in mean SF-36 scale scores and summary measures were found such that trial participants scored in worse health than a sociodemographically equivalent US general population norm. In addition, the largest SF-36 scale scores were found to significantly differ across clinically defined levels of arthritis severity. Finally, it was found that the SF-36 scales that best discriminate among arthritis severity groups cross-sectionally were also best at discriminating treatment responders from nonresponders. 
CONCLUSION Results of this study support the validity of the SF-36 to document the health burden of arthritis and as a measure of generic health outcome for clinical trials of alternative treatments for osteoarthritis and rheumatoid arthritis patients.
Affiliation(s)
- M Kosinski
- Health Assessment Lab, Health Institute, New England Medical Center, Boston, MA 02111, USA
|
25
|
Kosinski M, Keller SD, Hatoum HT, Kong SX, Ware JE. The SF-36 Health Survey as a generic outcome measure in clinical trials of patients with osteoarthritis and rheumatoid arthritis: tests of data quality, scaling assumptions and score reliability. Med Care 1999; 37:MS10-22. [PMID: 10335740 DOI: 10.1097/00005650-199905001-00002] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.2] [Indexed: 11/26/2022]
Abstract
OBJECTIVE To evaluate the psychometric assumptions underlying the construction and scoring of SF-36 scales and summary measures among clinical trial participants with arthritis. METHODS Cross-sectional SF-36 data from the baseline assessment of adult patients (n = 1,016) participating in four placebo-controlled clinical trials of treatment for arthritis were analyzed with blinding as to treatment. Tests of the completeness of data, scaling assumptions, internal-consistency reliability, and factor structure of SF-36 scales were performed for the combined sample. Eligible participants had at least a 6-month history of moderate to severe osteoarthritis or rheumatoid arthritis of the knee or hip. Participants meeting inclusion criteria had undergone a washout period of 3-14 days before baseline assessment to bring about a flare state in osteoarthritis or rheumatoid arthritis symptoms. Baseline sample sizes for the three osteoarthritis trials were n = 121, n = 341, and n = 187. The baseline sample size for the rheumatoid arthritis trial was n = 367. The average age of participants was 60 years, and the majority were females (72%). Measured were functional health and well-being scales and physical and mental health summary measures from the SF-36 Health Survey acute form. RESULTS Missing responses ranged from 0.0% to 1.5% across SF-36 items, and scale scores could be computed for 96.8% to 100% of participants across trials. In all four trials, item internal consistency tests were passed (91.4%-97.1%) and item discriminant validity tests were passed (96.9%-100.0%). Across the four trials, internal-consistency reliability coefficients ranged from a low of 0.75 to a high of 0.91 for the eight scales (median = 0.84), exceeding the minimum standards for group comparisons. Ceiling effects were minimal for most scales, and floor effects were noteworthy for the role physical and role emotional scales. 
Physical and mental health factors identified in previous studies were replicated. CONCLUSION The SF-36 Health Survey proved to be a psychometrically sound tool for the assessment of the health status of adult participants in clinical trials of arthritis.
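The internal-consistency reliability coefficients reported above can be illustrated with a short sketch. It assumes the coefficient is Cronbach's alpha, which the abstract does not state explicitly. The item responses and the function name are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item responses."""
    k = len(item_scores[0])
    # Transpose so that each tuple holds all responses to one item.
    items = list(zip(*item_scores))
    sum_item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses of five participants to a three-item scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]

alpha = cronbach_alpha(responses)
meets_group_standard = alpha >= 0.70  # 0.70 threshold for group comparisons
```

Alpha rises toward 1 as the items covary more strongly relative to their individual variances, which is why scales whose items move together reach the higher end of the reported range.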
Affiliation(s)
- M Kosinski
- Health Assessment Lab, Health Institute, New England Medical Center, Boston, MA 02111, USA
|
26
|
Abstract
As shown here, general health measures cover much of the content included in arthritis-specific measures, but are they equally sensitive to changes in disease condition? We reviewed the literature on the most widely used general health measure, the SF-36 Health Survey, to see if the empirical evidence supported its validity for use in arthritis patients. As of this writing, there was no documentation of the sensitivity of the SF-36 to short-term changes in arthritic condition over the course of clinical trials and few studies that compared the sensitivity of the SF-36 to arthritis-specific measures. The empirical research reported in this special supplement contributes to the literature on the use of the SF-36 in arthritis patients and demonstrates methods of studying the validity of general health measures to monitor change in specific conditions.
Affiliation(s)
- S D Keller
- Health Assessment Lab, Health Institute, New England Medical Center, Boston, MA, USA.
|
27
|
Abstract
An arthritis-specific health index (ASHI) for the SF-36 Health Survey was developed by studying its responsiveness to changes in clinical indicators of arthritis severity. Longitudinal data from 1,076 patients participating in four placebo-controlled trials were analyzed. All had at least a 6-month history of moderate to severe osteoarthritis or rheumatoid arthritis of the knee or hip. All had undergone a washout period of 3 to 14 days before baseline assessment to bring about a flare state in osteoarthritis or rheumatoid arthritis symptoms. Their average age was 60 years and 72% were female. Change scores for the eight-scale SF-36 health profile (acute version) and five arthritis-specific measures of disease severity (knee pain on weight bearing, time to walk 50 feet, physician global evaluation of symptom severity and impact, patient global evaluation of symptom severity and impact, and pain intensity visual analogue scale) were computed by subtracting scores before treatment from scores at two-week follow-up. Canonical correlation methods were used to derive weights for changes in SF-36 scales to score a single index (ASHI) that maximized its correlation with changes in the set of five clinical measures of arthritis severity. The weights used to score the ASHI were cross-validated in a 25% holdout group (N = 144) from the first two osteoarthritis trials and in two additional osteoarthritis and rheumatoid arthritis trials (N = 530). Only one SF-36 canonical variate (ASHI) correlated significantly (F = 4.69, P < 0.0001) with the clinical canonical variate that served as the "criterion" measure of change in the severity of arthritis. Changes in the ASHI and clinical canonical variate were substantially correlated in the developmental sample (r = 0.628, P < 0.0001) and on cross-validation (r = 0.629, P < 0.0001). The clinical canonical variate correlated highly (r = 0.75-0.88) with changes in all but one of the five clinical measures (50-foot walk; r = 0.41). 
The pattern of correlations between changes in SF-36 scales and the ASHI indicated that ASHI is primarily a measure of bodily pain (r = 0.92) and other aspects of physical and role functioning and well-being (r = 0.69 for Role-Physical, r = 0.68 for Physical Functioning, r = 0.52 for Social Functioning, and r = 0.51 for Vitality). The patterns of correlations between SF-36 scales and the ASHI were very similar across developmental and cross-validation samples. This research demonstrates the feasibility and generalizability of a single ASHI scored from changes in responses to the SF-36 Health Survey. The generic SF-36 health profile, which has already been shown to be useful in comparing arthritis with other diseases and treatments, can also be scored specifically to make it more useful in studies of osteoarthritis and rheumatoid arthritis.
Affiliation(s)
- J E Ware
- Health Assessment Lab, Health Institute, New England Medical Center, Boston, MA 02111, USA.
|
28
|
Keller SD, Ware JE, Gandek B, Aaronson NK, Alonso J, Apolone G, Bjorner JB, Brazier J, Bullinger M, Fukuhara S, Kaasa S, Leplège A, Sanson-Fisher RW, Sullivan M, Wood-Dauphinee S. Testing the equivalence of translations of widely used response choice labels: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:933-44. [PMID: 9817110 DOI: 10.1016/s0895-4356(98)00084-5] [Citation(s) in RCA: 74] [Impact Index Per Article: 2.8] [Indexed: 11/25/2022]
Abstract
The similarity in meaning assigned to response choice labels from the SF-36 Health Survey (SF-36) was evaluated across countries. Convenience samples of judges (range, 10 to 117; median = 48) from 13 countries rated translations of response choice labels, using a variation of the Thurstone method of equal appearing intervals. Judges marked a point on a 10-cm line representing the magnitude of a response choice label (e.g., "good" relative to the anchors of "poor" and "excellent"). Ratings were evaluated to determine the ordinal consistency of response choice labels within a response scale; the degree to which differences between adjacent response choice labels were equal interval; and the amount of variance due to response choice label, country, judge, and interaction between response choice label and country. Results confirmed the hypothesized ordering of response choice labels; the percentage of ordinal pairs ranged from 88.7% to 100% (median = 98.2%) across countries and response scales. Examination of the average magnitudes of response choice labels supported the "quasi-interval" nature of the scales. Analysis of variance (ANOVA) results supported the generalizability of response choice magnitudes across countries; labels explained 64% to 77% of the variance in ratings, and country explained 1% to 3%. These results support the equivalence of SF-36 response choice labels across countries. Departures from the assumption of equal intervals, when observed, were similar across countries and were greatest for the two response scales that are recalibrated under standard SF-36 scoring. Results provide justification for scoring translations of individual items using standard SF-36 scoring; whether these items form the same scales in other countries as they do in the United States is evaluated with tests of scaling assumptions.
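One way to read the "percentage of ordinal pairs" statistic is sketched below. The abstract does not define the calculation, so this assumes it is the share of adjacent label pairs that a judge placed in the hypothesized order on the 10-cm line. The ratings and the function name are hypothetical.

```python
def percent_ordinal_pairs(ratings_by_judge):
    """Percentage of adjacent label pairs rated in the hypothesized order.

    Each inner list holds one judge's ratings (in cm) for the labels of a
    response scale, listed from the hypothesized lowest to highest label.
    """
    ordered = 0
    total = 0
    for ratings in ratings_by_judge:
        for lower, higher in zip(ratings, ratings[1:]):
            total += 1
            if lower < higher:
                ordered += 1
    return 100.0 * ordered / total


# Hypothetical ratings for poor, fair, good, very good, excellent.
judges = [
    [0.0, 2.5, 5.0, 7.5, 10.0],
    [0.0, 3.0, 4.5, 8.0, 10.0],
    [0.0, 2.0, 6.0, 5.5, 10.0],  # this judge reversed "good" and "very good"
]

pct = percent_ordinal_pairs(judges)
```

With three judges and four adjacent pairs each, one reversal leaves 11 of 12 pairs in order.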
Affiliation(s)
- S D Keller
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
|
29
|
Keller SD, Ware JE, Bentler PM, Aaronson NK, Alonso J, Apolone G, Bjorner JB, Brazier J, Bullinger M, Kaasa S, Leplège A, Sullivan M, Gandek B. Use of structural equation modeling to test the construct validity of the SF-36 Health Survey in ten countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1179-88. [PMID: 9817136 DOI: 10.1016/s0895-4356(98)00110-3] [Citation(s) in RCA: 178] [Impact Index Per Article: 6.8] [Indexed: 11/28/2022]
Abstract
A crucial prerequisite to the use of the SF-36 Health Survey in multinational studies is the reproduction of the conceptual model underlying its scoring and interpretation. Structural equation modeling (SEM) was used to test these aspects of the construct validity of the SF-36 in ten IQOLA countries: Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, the United Kingdom, and the United States. Data came from general population surveys fielded to gather normative data. Measurement and structural models developed in the United States were cross-validated in random halves of the sample in each country. SEM analyses supported the eight first-order factor model of health that underlies the scoring of SF-36 scales and two second-order factors that are the basis for summary physical and mental health measures. A single third-order factor was also observed in support of the hypothesis that all responses to the SF-36 are generated by a single, underlying construct--health. In addition, a third second-order factor, interpreted as general well-being, was shown to improve the fit of the model. This model (including eight first-order factors, three second-order factors, and one third-order factor) was cross-validated using a holdout sample within the United States and in each of the nine other countries. These results confirm the hypothesized relationships between SF-36 items and scales and justify their scoring in each country using standard algorithms. Results also suggest that SF-36 scales and summary physical and mental health measures will have similar interpretations across countries. The practical implications of a third second-order SF-36 factor (general well-being) warrant further study.
Affiliation(s)
- S D Keller
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts, USA
|
30
|
Raczek AE, Ware JE, Bjorner JB, Gandek B, Haley SM, Aaronson NK, Apolone G, Bech P, Brazier JE, Bullinger M, Sullivan M. Comparison of Rasch and summated rating scales constructed from SF-36 physical functioning items in seven countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1203-14. [PMID: 9817138 DOI: 10.1016/s0895-4356(98)00112-7] [Citation(s) in RCA: 118] [Impact Index Per Article: 4.5] [Indexed: 12/31/2022]
Abstract
Rasch models for polytomous items were used to assess the scaling assumptions and compare item response patterns in the 10-item SF-36 physical functioning scale (PF-10) for general population respondents in Denmark, Germany, Italy, the Netherlands, Sweden, the United Kingdom, and the United States. The Rasch model of physical functioning developed in the United States was compared to models for other countries, and each country was compared to a multinational composite. Strong scale congruence across the seven countries was demonstrated; items that varied between countries and from the composite may reflect unique cultural response patterns or differences in translation. Scoring algorithms based on the Rasch model for each country were superior to the current Likert scoring in tests of relative validity (RV) in discriminating among age groups in all countries. In relation to the Likert PF-10 scoring (RV = 1.00), scores estimated using the Rasch rating scale model achieved a median RV of 1.31 (range: 1.01-1.59), while the Rasch partial credit model attained a median RV of 1.44 (range: 1.01-2.23). Rasch models hold good potential for improving health status measures, estimating individual scores when responses to scale items are missing, and equating scores across countries.
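The Rasch partial credit model named in this abstract can be sketched as follows. This is an illustration of the standard model form, not the authors' estimation procedure. It assumes a three-category item, which gives two thresholds, and the ability and threshold values are hypothetical. The rating scale model is the special case in which the thresholds of every item share one common step structure.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta is the person's ability (here, physical functioning) in logits.
    thresholds holds the item's step difficulties; an item with m steps has
    m + 1 response categories scored 0..m.
    """
    cumulative = [0.0]  # the empty sum for category 0
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_item_score(theta, thresholds):
    """Expected category score for the item at ability theta."""
    probs = pcm_probabilities(theta, thresholds)
    return sum(score * p for score, p in enumerate(probs))


# Hypothetical step difficulties for a three-category item.
item_thresholds = [-1.0, 0.5]

low_ability = expected_item_score(-2.0, item_thresholds)
high_ability = expected_item_score(2.0, item_thresholds)
```

Because the model gives a probability for every category at any ability level, a person's score can be estimated from whichever items were answered, which is the property behind the abstract's point about missing responses.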
Affiliation(s)
- A E Raczek
- School of Education, Boston College, Chestnut Hill, Massachusetts, USA
31
Bullinger M, Alonso J, Apolone G, Leplège A, Sullivan M, Wood-Dauphinee S, Gandek B, Wagner A, Aaronson N, Bech P, Fukuhara S, Kaasa S, Ware JE. Translating health status questionnaires and evaluating their quality: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:913-23. [PMID: 9817108 DOI: 10.1016/s0895-4356(98)00082-1] [Citation(s) in RCA: 611] [Impact Index Per Article: 23.5] [Indexed: 12/13/2022]
Abstract
This article describes the methods adopted by the International Quality of Life Assessment (IQOLA) project to translate the SF-36 Health Survey. Translation methods included the production of forward and backward translations, use of difficulty and quality ratings, pilot testing, and cross-cultural comparison of the translation work. Experience to date suggests that the SF-36 can be adapted for use in other countries with relatively minor changes to the content of the form, providing support for the use of these translations in multinational clinical trials and other studies. The most difficult items to translate were physical functioning items, which used examples of activities and distances that are not common outside of the United States; items that used colloquial expressions such as pep or blue; and the social functioning items. Quality ratings were uniformly high across countries. While the IQOLA approach to translation and validation was developed for use with the SF-36, it is applicable to other translation efforts.
Affiliation(s)
- M Bullinger
- Abteilung Für Medizinische Psychologie, Universitätskrankenhaus Eppendorf, Hamburg, Germany
32
Gandek B, Ware JE. Methods for validating and norming translations of health status questionnaires: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:953-9. [PMID: 9817112 DOI: 10.1016/s0895-4356(98)00086-9] [Citation(s) in RCA: 213] [Impact Index Per Article: 8.2] [Indexed: 11/23/2022]
Abstract
This article briefly summarizes methods used in the empirical validation of translations of the SF-36 Health Survey. In addition, information about the IQOLA Project norming protocol and 13 general population norming samples analyzed in this supplement is provided.
Affiliation(s)
- B Gandek
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
33
Abstract
Cross-sectional data from a representative sample of the general population in Japan were analyzed to test the validity of Japanese SF-36 Health Survey scales as measures of physical and mental health. Results from psychometric and clinical tests of validity were compared. Principal components analyses were used to test for the hypothesized physical and mental dimensions of health and the pattern of scale correlations with those components. To test the clinical validity of SF-36 scale scores, self-reports of chronic medical conditions and the Zung Self-Rating Depression Scale were used to create mutually exclusive groups differing in the severity of physical and mental conditions. The pattern of correlations between the SF-36 scales and the two empirically derived components generally confirmed hypotheses for most scales. Results of psychometric and clinical tests of validity were in agreement for the Physical Functioning, Role-Physical, Vitality, Social Functioning, and Mental Health scales. Relatively less agreement between psychometric and clinical tests of validity was observed for the Bodily Pain, General Health, and Role-Emotional scales, and the physical and mental health factor content of those scales was not consistent with hypotheses. In clinical tests of validity, the General Health, Bodily Pain, and Physical Functioning scales were the most valid scales in discriminating between groups with and without a severe physical condition. Scales that correlated highest with mental health in the components analysis (Mental Health and Vitality) also were most valid in discriminating between groups with and without depression. The results of this study provide preliminary interpretation guidelines for all SF-36 scales, although caution is recommended in the interpretation of the Role-Emotional, Bodily Pain, and General Health scales pending further studies in Japan.
Affiliation(s)
- S Fukuhara
- Graduate School of Medicine and Education, The University of Tokyo, Japan
34
Ware JE, Gandek B. Methods for testing data quality, scaling assumptions, and reliability: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:945-52. [PMID: 9817111 DOI: 10.1016/s0895-4356(98)00085-7] [Citation(s) in RCA: 432] [Impact Index Per Article: 16.6] [Indexed: 02/06/2023]
Abstract
Following the translation development stage, the second research stage of the IQOLA Project tests the assumptions underlying item scoring and scale construction. This article provides detailed information on the research methods used by the IQOLA Project to evaluate data quality, scaling and scoring assumptions, and the reliability of the SF-36 scales. Tests include evaluation of item and scale-level descriptive statistics; examination of the equality of item-scale correlations, item internal consistency and item discriminant validity; and estimation of scale score reliability using internal consistency and test-retest methods. Results from these tests are used to determine if standard algorithms for the construction and scoring of the eight SF-36 scales can be used in each country and to provide information that can be used in translation improvement.
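Two of the tests named in this abstract, item internal consistency and internal consistency reliability, can be sketched as follows. This is a minimal illustration that assumes the item-scale correlation is corrected for overlap (the item is removed from the scale total before correlating) and that reliability is estimated with Cronbach's alpha. The function names and response data are hypothetical, and the 0.40 and 0.70 cut-offs are the ones quoted in other abstracts in this list.

```python
from statistics import mean, variance


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns (one list per item)."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]     # summated scale score
    item_var = sum(variance(col) for col in items)   # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))


def corrected_item_scale_r(items, i):
    """Correlate item i with the sum of the remaining items (overlap removed)."""
    rest = [sum(resp) - resp[i] for resp in zip(*items)]
    return pearson(items[i], rest)


# Hypothetical responses of five people to a three-item scale.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(items)
r_first = corrected_item_scale_r(items, 0)
print(f"alpha = {alpha:.2f} (0.70 is the cut-off quoted in this list)")
print(f"corrected item-scale r for item 1 = {r_first:.2f} (cut-off 0.40)")
```

Removing the item from its own scale total keeps the correlation from being inflated by the item correlating with itself, which matters most for short scales.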
Affiliation(s)
- J E Ware
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
35
Abstract
Statistical analyses of Differential Item Functioning (DIF) can be used for rigorous translation evaluations. DIF techniques test whether each item functions in the same way, irrespective of the country, language, or culture of the respondents. For a given level of health, the score on any item should be independent of nationality. This requirement can be tested through contingency-table methods, which are efficient for analyzing all types of items. We investigated DIF in the Danish translation of the SF-36 Health Survey, using two general population samples (USA, n = 1,506; Denmark, n = 3,950). DIF was identified for 12 out of 35 items. These results agreed with independent ratings of translation quality, but the statistical techniques were more sensitive. When included in scales, the items exhibiting DIF had only a little impact on conclusions about cross-national differences in health in the general population. However, if used as single items, the DIF items could seriously bias results from cross-national comparisons. Also, the DIF items might have larger impact on cross-national comparison of groups with poorer health status. We conclude that analysis of DIF is useful for evaluating questionnaire translations.
Affiliation(s)
- J B Bjorner
- Institute of Public Health, University of Copenhagen, Denmark
36
Gandek B, Ware JE, Aaronson NK, Apolone G, Bjorner JB, Brazier JE, Bullinger M, Kaasa S, Leplege A, Prieto L, Sullivan M. Cross-validation of item selection and scoring for the SF-12 Health Survey in nine countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1171-8. [PMID: 9817135 DOI: 10.1016/s0895-4356(98)00109-7] [Citation(s) in RCA: 2061] [Impact Index Per Article: 79.3] [Indexed: 11/20/2022]
Abstract
Data from general population surveys (n = 1483 to 9151) in nine European countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) were analyzed to cross-validate the selection of questionnaire items for the SF-12 Health Survey and scoring algorithms for 12-item physical and mental component summary measures. In each country, multiple regression methods were used to select 12 SF-36 items that best reproduced the physical and mental health summary scores for the SF-36 Health Survey. Summary scores then were estimated with 12 items in three ways: using standard (U.S.-derived) SF-12 items and scoring algorithms; standard items and country-specific scoring; and country-specific sets of 12 items and scoring. Replication of the 36-item summary measures by the 12-item summary measures was then evaluated through comparison of mean scores and the strength of product-moment correlations. Product-moment correlations between SF-36 summary measures and SF-12 summary measures (standard and country-specific) were very high, ranging from 0.94-0.96 and 0.94-0.97 for the physical and mental summary measures, respectively. Mean 36-item summary measures and comparable 12-item summary measures were within 0.0 to 1.5 points (median = 0.5 points) in each country and were comparable across age groups. Because of the high degree of correspondence between summary physical and mental health measures estimated using the SF-12 and SF-36, it appears that the SF-12 will prove to be a practical alternative to the SF-36 in these countries, for purposes of large group comparisons in which the focus is on overall physical and mental health outcomes.
Affiliation(s)
- B Gandek
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
37
Ware JE, Gandek B, Kosinski M, Aaronson NK, Apolone G, Brazier J, Bullinger M, Kaasa S, Leplège A, Prieto L, Sullivan M, Thunedborg K. The equivalence of SF-36 summary health scores estimated using standard and country-specific algorithms in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1167-70. [PMID: 9817134 DOI: 10.1016/s0895-4356(98)00108-5] [Citation(s) in RCA: 436] [Impact Index Per Article: 16.8] [Indexed: 12/13/2022]
Abstract
Data from general population surveys (n = 1771 to 9151) in nine European countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) were analyzed to test the algorithms used to score physical and mental component summary measures (PCS-36/MCS-36) based on the SF-36 Health Survey. Scoring coefficients for principal components were estimated independently in each country using identical methods of factor extraction and orthogonal rotation. PCS-36 and MCS-36 scores were also estimated using standard (U.S.-derived) scoring algorithms, and results were compared. Product-moment correlations between scores estimated from standard and country-specific scoring coefficients were very high (0.98 to 1.00) for both physical and mental health components in all countries. As hypothesized for orthogonal components, correlations between physical and mental components within each country were very low (0.00 to 0.12) for both estimation methods. Mean scores for PCS-36 differed by as much as 3.0 points across countries using standard scoring, and mean scores for MCS-36 differed across countries by as much as 6.4 points. In view of the high degree of equivalence observed within each country, using standard and country-specific algorithms, we recommend use of standard scoring algorithms for purposes of multinational studies involving these 10 countries.
Affiliation(s)
- J E Ware
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
38
Gandek B, Ware JE, Aaronson NK, Alonso J, Apolone G, Bjorner J, Brazier J, Bullinger M, Fukuhara S, Kaasa S, Leplège A, Sullivan M. Tests of data quality, scaling assumptions, and reliability of the SF-36 in eleven countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1149-58. [PMID: 9817132 DOI: 10.1016/s0895-4356(98)00106-1] [Citation(s) in RCA: 299] [Impact Index Per Article: 11.5] [Indexed: 12/18/2022]
Abstract
Data from general population samples in 11 countries (n = 1483 to 9151) were used to assess data quality and test the assumptions underlying the construction and scoring of multi-item scales from the SF-36 Health Survey. Across all countries, the rate of item-level missing data generally was low, although slightly higher for items printed in the grid format. In each country, item means generally were clustered as hypothesized within scales. Correlations between items and hypothesized scales were greater than 0.40 with one exception, supporting item internal consistency. Items generally correlated significantly higher with their own scale than with competing scales, supporting item discriminant validity. Scales could be constructed for 93-100% of respondents. Internal consistency reliability of the eight SF-36 scales was above 0.70 for all scales, with two exceptions. Floor effects were low for all except the two role functioning scales; ceiling effects were high for both role functioning scales and also were noteworthy for the Physical Functioning, Bodily Pain, and Social Functioning scales in some countries. These results support the construction and scoring of the SF-36 translations in these 11 countries using the method of summated ratings.
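The percent-computable and floor and ceiling figures reported in this abstract can be illustrated with a small sketch. It assumes scale scores run from 0 to 100, that a respondent whose scale could not be constructed is stored as `None`, and that floor and ceiling effects are the percentages of computable scores at the minimum and maximum. The function names and the sample scores are hypothetical.

```python
def percent_computable(scores):
    """Percent of respondents for whom a scale score could be constructed."""
    return 100.0 * sum(s is not None for s in scores) / len(scores)


def floor_ceiling(scores, lo=0.0, hi=100.0):
    """Percent of computable scores at the scale minimum and maximum."""
    valid = [s for s in scores if s is not None]
    floor_pct = 100.0 * sum(s == lo for s in valid) / len(valid)
    ceiling_pct = 100.0 * sum(s == hi for s in valid) / len(valid)
    return floor_pct, ceiling_pct


# Hypothetical scale scores for ten respondents (None = score not computable).
scores = [100, 100, 75, 0, None, 50, 100, 25, None, 100]

computable = percent_computable(scores)
floor_pct, ceiling_pct = floor_ceiling(scores)
print(f"computable: {computable:.1f}%")
print(f"floor: {floor_pct:.1f}%, ceiling: {ceiling_pct:.1f}%")
```

A large ceiling percentage, as the abstract reports for the role functioning scales, means many respondents already score at the top of the scale, so the scale cannot register further improvement for them.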
Affiliation(s)
- B Gandek
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
39
Abstract
This article presents information about the development and evaluation of the SF-36 Health Survey, a 36-item generic measure of health status. It summarizes studies of reliability and validity and provides administrative and interpretation guidelines for the SF-36. A brief history of the International Quality of Life Assessment (IQOLA) Project is also included.
Affiliation(s)
- J E Ware
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
40
Ware JE, Kosinski M, Gandek B, Aaronson NK, Apolone G, Bech P, Brazier J, Bullinger M, Kaasa S, Leplège A, Prieto L, Sullivan M. The factor structure of the SF-36 Health Survey in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1159-65. [PMID: 9817133 DOI: 10.1016/s0895-4356(98)00107-3] [Citation(s) in RCA: 461] [Impact Index Per Article: 17.7] [Indexed: 12/29/2022]
Abstract
Studies of the factor structure of the SF-36 Health Survey are an important step in its construct validation. Its structure is also the psychometric basis for scoring physical and mental health summary scales, which are proving useful in simplifying and interpreting statistical analyses. To test the generalizability of the SF-36 factor structure, product-moment correlations among the eight SF-36 Health Survey scales were estimated for representative samples of general populations in each of 10 countries. Matrices were independently factor analyzed using identical methods to test for hypothesized physical and mental health components, and results were compared with those published for the United States. Following simple orthogonal rotation of two principal components, they were easily interpreted as dimensions of physical and mental health in all countries. These components accounted for 76% to 85% of the reliable variance in scale scores across nine European countries, in comparison with 82% in the United States. Similar patterns of correlations between the eight scales and the components were observed across all countries and across age and gender subgroups within each country. Correlations with the physical component were highest (0.64 to 0.86) for the Physical Functioning, Role Physical, and Bodily Pain scales, whereas the Mental Health, Role Emotional, and Social Functioning scales correlated highest (0.62 to 0.91) with the mental component. Secondary correlations for both clusters of scales were much lower. Scales measuring General Health and Vitality correlated moderately with both physical and mental health components. These results support the construct validity of the SF-36 translations and the scoring of physical and mental health components in all countries studied.
Affiliation(s)
- J E Ware
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
41
Wagner AK, Gandek B, Aaronson NK, Acquadro C, Alonso J, Apolone G, Bullinger M, Bjorner J, Fukuhara S, Kaasa S, Leplège A, Sullivan M, Wood-Dauphinee S, Ware JE. Cross-cultural comparisons of the content of SF-36 translations across 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:925-32. [PMID: 9817109 DOI: 10.1016/s0895-4356(98)00083-3] [Citation(s) in RCA: 192] [Impact Index Per Article: 7.4] [Indexed: 12/18/2022]
Abstract
Increasingly, translated and culturally adapted health-related quality of life measures are being used in cross-cultural research. To assess comparability of results, researchers need to know the comparability of the content of the questionnaires used in different countries. Based on an item-by-item discussion among International Quality of Life Assessment (IQOLA) investigators of the content of the translated versions of the SF-36 in 10 countries, we discuss the difficulties that arose in translating the SF-36. We also review the solutions identified by IQOLA investigators to translate items and response choices so that they are appropriate within each country as well as comparable across countries. We relate problems and solutions to ratings of difficulty and conceptual equivalence for each item. The most difficult items to translate were physical functioning items that refer to activities not common outside the United States and items that use colloquial expressions in the source version. Identifying the origin of the source items, their meaning to American English-speaking respondents and American English synonyms, in response to country-specific translation issues, greatly helped the translation process. This comparison of the content of translated SF-36 items suggests that the translations are culturally appropriate and comparable in their content.
Affiliation(s)
- A K Wagner
- Health Assessment Lab at the Health Institute, New England Medical Center, Boston, Massachusetts 02111, USA
42
Ware JE. A conversation with John E. Ware, Jr., PhD. Manag Care Interface 1998; 11:64-7. [PMID: 10186008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Affiliation(s)
- J E Ware
- New England Medical Center, Boston, MA, USA
43
Safran DG, Taira DA, Rogers WH, Kosinski M, Ware JE, Tarlov AR. Linking primary care performance to outcomes of care. J Fam Pract 1998; 47:213-220. [PMID: 9752374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2023]
Abstract
BACKGROUND Substantial research links many of the defining characteristics of primary care to important outcomes; yet little is known about the relative importance of each characteristic, and several characteristics have not been examined. These analyses evaluate the relationship between seven defining elements of primary care (accessibility, continuity, comprehensiveness, integration, clinical interaction, interpersonal treatment, and trust) and three outcomes (adherence to physician's advice, patient satisfaction, and improved health status). METHODS Data were derived from a cross-sectional observational study of adults employed by the Commonwealth of Massachusetts (N = 7204). All patients completed a validated questionnaire, the Primary Care Assessment Survey. Regression methods were used to examine the association between each primary care characteristic (11 summary scales measuring 7 elements of care) and each outcome. RESULTS Physicians' comprehensive ("whole person") knowledge of patients and patients' trust in their physician were the variables most strongly associated with adherence, and trust was the variable most strongly associated with patients' satisfaction with their physician. With other factors equal, adherence rates were 2.6 times higher among patients with whole-person knowledge scores in the 95th percentile compared with the 5th percentile (44.0% adherence vs 16.8% adherence, P < .001). The likelihood of complete satisfaction was 87.5% for those with 95th percentile trust scores compared with 0.4% for patients with 5th percentile trust scores (P < .001). The leading correlates of self-reported health improvements were integration of care, thoroughness of physical examinations, communication, comprehensive knowledge of patients, and trust (P < .001). CONCLUSIONS Patients' trust in their physician and physicians' knowledge of patients are leading correlates of three important outcomes of care. 
The results are noteworthy in the context of pervasive changes in our nation's health care system that are widely viewed as threatening to the quality of physician-patient relationships.
Affiliation(s)
- D G Safran
- Health Institute, New England Medical Center, Boston, MA 02111, USA.
44
Landgraf JM, Maunsell E, Speechley KN, Bullinger M, Campbell S, Abetz L, Ware JE. Canadian-French, German and UK versions of the Child Health Questionnaire: methodology and preliminary item scaling results. Qual Life Res 1998; 7:433-45. [PMID: 9691723 DOI: 10.1023/a:1008810004694] [Citation(s) in RCA: 272] [Impact Index Per Article: 10.5] [Indexed: 02/08/2023]
Abstract
Using emerging international guidelines, stringent procedures were used to develop and evaluate Canadian-French, German and UK translations/adaptions of the 50 item, parent-completed Child Health Questionnaire (CHQ-PF50). Multitrait analysis was used to evaluate the convergent and discriminant validity of the hypothesized item sets across countries relative to the results obtained for a representative sample of children in the US. Cronbach's alpha coefficient was used to estimate the internal consistency reliability for each of the health scales. Floor and ceiling effects were also examined. Seventy-nine percent of all the item-scale correlations achieved acceptable internal consistency (0.40 or higher). The tests of the item convergent and discriminant validity were successful at least 87% of the time across all scales and countries. Equal item variance was observed 90% of the time across all countries. The reliability coefficients ranged from a low of 0.43 (parental time impact, Canadian English) to a high of 0.97 (physical functioning index, Canadian French) across all scales (median 0.80). Negligible floor effects were observed across countries. Noteworthy ceiling effects were observed, as expected, for the hypothesized physical scales (mean effect 73%). Conversely, fewer ceiling effects were observed for the psychosocial scales (range 3-17% behaviour-parental emotional impact). The item-scaling results obtained in these pilot studies support the psychometric properties of the American-English CHQ-PF50 and its respective translations.
45
Damiano AM, Pastores GM, Ware JE. The health-related quality of life of adults with Gaucher's disease receiving enzyme replacement therapy: results from a retrospective study. Qual Life Res 1998; 7:373-86. [PMID: 9691718 DOI: 10.1023/a:1008814105603] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.5] [Indexed: 02/08/2023]
Abstract
Few studies have reported on the effect of Gaucher's disease on patient-reported, health-related quality of life (HRQoL) and we do not know how the HRQoL burden of Gaucher's disease compares to that of other chronic conditions, what areas of HRQoL are most affected or how the course of change in HRQoL compares with that observed for other conditions or for the general adult population. The purpose of this study was to estimate (1) the HRQoL burden associated with Gaucher's disease managed by enzyme replacement therapy (ERT), (2) recalled changes in HRQoL since ERT initiation and (3) risk factors predictive of HRQoL outcomes. We sampled 212 patients with Gaucher's disease recruited from 146 physicians prescribing ERT in the US. The patients were at least 14 years of age and had been on ERT from 1 to 51 months. The mean (SD) age of the participants was 45 (17) years. Forty-nine percent had had a prior splenectomy and 26% had had a joint replacement. We administered the SF-36 Health Survey (SF-36) and three questions about changes in physical, mental and general HRQoL since starting ERT. The patients with Gaucher's disease scored significantly worse than the age- and gender-adjusted US norms on five of the eight SF-36 subscales (p < 0.05). Age (p < 0.0001) and joint replacement (p < 0.001) were negatively associated with physical health. The presence of an intact spleen (p < 0.01) and a longer duration of ERT (p < 0.01) were associated with better mental health. When asked about changes in HRQoL since starting ERT, at least half of the patients reported fewer limitations in physical activities (53%), better general health perceptions (77%) and less negative emotions (49%) at the time of the interview. 
Patients who had been receiving ERT for approximately 4 years recalled four and five times more improvement in general HRQoL in comparison with recalled changes over a 4 year period among adults in the general population (p < 0.001) and a congestive heart failure population (p < 0.01), respectively. Odds ratios (ORs) revealed that female patients were more likely to report improvements in general HRQoL than males (OR = 4.50 and 95% CI = 2.19-9.25) and 45 year old patients were less likely to report improvements than 35 year olds (OR = 0.76 and 95% CI = 0.62-0.94). Relative to patients who had been receiving ERT for 1 year, those who had been receiving ERT for 2 and 4 years were 1.40 (95% CI = 1.06-1.84) and 2.75 (95% CI = 1.20-6.27) times more likely to report improvements in general HRQoL, respectively. In summary, patients with Gaucher's disease on ERT reported an improvement in HRQoL that was greater than that reported by patients with other chronic diseases. However, Gaucher's patients treated for up to 51 months scored below equivalent adults in the general population. The risk factors, including age and history of splenectomy and joint replacement, warrant further study. Standardized HRQoL measures are likely to prove useful in understanding better the outcomes from the Gaucher's patient's perspective.
Affiliation(s)
- A M Damiano
- Outcomes Studies Group, Covance Health Economics and Outcome Services Inc., Washington, DC 20005-3934, USA
46
Abstract
OBJECTIVES The authors examine the data quality and measurement performance of the Primary Care Assessment Survey (PCAS), a patient-completed questionnaire that operationalizes formal definitions of primary care, including the definition recently proposed by the Institute of Medicine Committee on the Future of Primary Care. METHODS The PCAS measures seven domains of care through 11 summary scales: accessibility (organizational, financial), continuity (longitudinal, visit-based), comprehensiveness (contextual knowledge of patient, preventive counseling), integration, clinical interaction (clinician-patient communication, thoroughness of physical examinations), interpersonal treatment, and trust. Data from a study of Massachusetts state employees (n = 6094) were used to evaluate key measurement properties of the 11 PCAS scales. Analyses were performed on the combined population and for each of the 16 subgroups defined according to sociodemographic and health characteristics. RESULTS The 11 PCAS scales demonstrated consistently strong measurement characteristics across all subgroups of this adult population. Tests of scaling assumptions for summated rating scales were well satisfied by all Likert-scaled measures. Assessment of data completeness, scale score dispersion characteristics, and inter-scale correlations provide strong evidence for the soundness of all scales, and for the value of separately measuring and interpreting these concepts. CONCLUSIONS With public and private sector policies increasingly emphasizing the importance of primary care, the need for tools to evaluate and improve primary care performance is clear. The PCAS has excellent measurement properties, and performs consistently well across varied segments of the adult population. Widespread application of an assessment methodology, such as the PCAS, will afford an empiric basis through which to measure, monitor, and continuously improve primary care.
Affiliation(s)
- D G Safran
- The Health Institute, Boston, Massachusetts 02111, USA
47
Ware JE, Kemp JP, Buchner DA, Singer AE, Nolop KB, Goss TF. The responsiveness of disease-specific and generic health measures to changes in the severity of asthma among adults. Qual Life Res 1998; 7:235-44. [PMID: 9584554 DOI: 10.1023/a:1024946316424] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
The objective of the study was to compare the validity of asthma-specific and generic health outcome measures in relation to changes in the severity of asthma and to treatment. Adult patients (n = 142) participating in a randomized placebo-controlled trial at six clinics were assessed at baseline, prior to the withdrawal (placebo) or continuation of treatment with Vanceril and again after 8 weeks. The criterion measures of change in severity included pulmonary function expressed as the percent predicted FEV1, five physician-assessed asthma severity measures (cough, chest tightness, wheezing, shortness of breath and overall condition) and two patient-assessed severity measures (night-time symptoms and overall symptoms). The 8-week change scores were estimated for all generic and specific measures and the results were compared across groups of patients who did and did not change in terms of clinical criteria of disease severity and across treatment groups. The responsiveness of each generic and specific measure was estimated independently using the relative validity (RV) methodology, which compares F-ratios for the mean change scores across measures in analyses of the same comparison groups. RV coefficients estimate how much worse each measure discriminated between comparison groups, relative to the best measure (RV = 1.0). Four standardized asthma-specific measures and a total scale score (based on the Marks questionnaire), an individualized asthma-specific scale measuring limitations in activities most important to each patient (based on the Juniper method) and two newly developed scales measuring physical and psychosocial symptoms were used as outcome measures. Generic health outcome measures included eight functional health and well-being scales as well as the physical and mental health summary scales from the SF-36 health survey.
A standardized asthma-specific scale was most valid in discriminating between groups of patients who did and did not change according to all of the clinical criterion variables studied and in discriminating between treated and untreated groups. Different scales performed best, depending on the clinical criterion. The asthma-specific Marks breathlessness scale was significant in all nine comparisons (RV = 0.62-1.0) and was most valid in discriminating between groups in six of nine tests. The overall scale also performed well in all comparisons (RV = 0.58-1.0). The newly-developed physical symptoms scale was significant in discriminating between groups in eight out of nine tests (RV = 0.52-1.0) and was most valid in three of the nine, including the treatment comparison. The psychosocial impact scale discriminated significantly in eight of the nine comparisons (RV = 0.16-0.38), but was less valid than other specific measures. The asthma-specific individualized activities scale discriminated significantly in seven of the nine tests, but performed less well than the other specific measures (RV = 0.21-0.35) and was not significant in the treatment comparison. One or more SF-36 scales discriminated significantly between groups in all nine comparisons. Two of those scales (physical functioning and role-physical) were consistently more valid than the others (RV = 0.17 and 0.58, respectively) and were the only two generic scales that discriminated between groups of patients defined in terms of changes in FEV1 (RV = 0.26-0.58). The SF-36 physical summary scale discriminated significantly between groups in all nine comparisons (RV = 0.19-0.61) and was the most valid generic measure in the treatment comparison (RV = 0.55). The SF-36 mental summary scale was significant only for the two patient-assessed changes in disease severity (RV = 0.31 and 0.32) and for physician-assessed overall severity (RV = 0.12). 
A comprehensive battery of generic and specific measures is likely to be most useful in understanding the impact of changes in disease severity on the functional health and well-being of adults with asthma, a
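The relative validity (RV) comparison described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming RV is each measure's one-way ANOVA F-ratio divided by the largest F-ratio obtained for the same comparison groups, which matches the abstract's "best measure (RV = 1.0)". The function names, scale names, and change scores are hypothetical and are not data from the study.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups of change scores."""
    k = len(groups)                      # number of comparison groups
    n = sum(len(g) for g in groups)      # total number of patients
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_validity(f_by_measure):
    """Divide every F-ratio by the largest one, so the best measure has RV = 1.0."""
    best = max(f_by_measure.values())
    return {name: f / best for name, f in f_by_measure.items()}


# Hypothetical change scores: [improved group, unchanged group] for each measure.
change_scores = {
    "specific_scale": [[10, 12, 14], [2, 3, 4]],
    "generic_scale": [[6, 9, 12], [3, 5, 7]],
}
f_values = {name: f_ratio(groups) for name, groups in change_scores.items()}
rv = relative_validity(f_values)
print(rv)
```

Because every measure is tested on the same comparison groups, the F-ratios share degrees of freedom, so their ratio can be read as relative discriminating power.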
Affiliation(s)
- J E Ware
- Health Assessment Laboratory, New England Medical Center Hospital, Boston, MA 02111, USA
|
48
|
Nelson EC, McHorney CA, Manning WG, Rogers WH, Zubkoff M, Greenfield S, Ware JE, Tarlov AR. A longitudinal study of hospitalization rates for patients with chronic disease: results from the Medical Outcomes Study. Health Serv Res 1998; 32:759-74. [PMID: 9460485 PMCID: PMC1070232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023] Open
Abstract
OBJECTIVE To prospectively compare inpatient and outpatient utilization rates between prepaid (PPD) and fee-for-service (FFS) insurance coverage for patients with chronic disease. DATA SOURCE/STUDY SETTING Data from the Medical Outcomes Study, a longitudinal observational study of chronic disease patients conducted in Boston, Chicago, and Los Angeles. STUDY DESIGN A four-year prospective study of resource utilization among 1,681 patients under treatment for hypertension, diabetes, myocardial infarction, or congestive heart failure in the practices of 367 clinicians. DATA COLLECTION/EXTRACTION METHODS Insurance payment system (PPD or FFS), hospitalizations, and office visits were obtained from patient reports. Disease and severity indicators, sociodemographics, and self-reported functional status were used to adjust for patient mix and to compute expected utilization rates. PRINCIPAL FINDINGS Compared to FFS, PPD patients had 31 percent fewer observed hospitalizations before adjustment for patient differences (p = .005) and 15 percent fewer hospitalizations than expected after adjustment (p = .078). The observed rate of FFS hospitalizations exceeded the expected rate by 9 percent. These results are not explained by system differences in patient mix or trends in hospital use over four years. Half of the PPD/FFS difference in hospitalization rate is due to intrinsic characteristics of the payment system itself. CONCLUSIONS PPD patients with chronic medical conditions followed prospectively over four years, after extensive patient-mix adjustment, had 15 percent fewer hospitalizations than their FFS counterparts owing to differences intrinsic to the insurance reimbursement system.
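The "fewer than expected" and "exceeded the expected rate" figures above are observed-versus-expected comparisons. The sketch below shows that arithmetic only. The counts are hypothetical values chosen to reproduce the reported percentages, not the study's data, and the patient-mix adjustment model that would produce the expected values is not shown.

```python
def percent_difference(observed, expected):
    """Percent by which an observed rate differs from the expected rate."""
    return 100.0 * (observed - expected) / expected


# Hypothetical hospitalization counts per 100 expected, for illustration only.
ppd = percent_difference(observed=85, expected=100)    # prepaid
ffs = percent_difference(observed=109, expected=100)   # fee-for-service
print(f"PPD: {ppd:+.0f}% vs expected, FFS: {ffs:+.0f}% vs expected")
```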
Affiliation(s)
- E C Nelson
- Community and Family Medicine, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
|
49
|
Bayliss MS, Gandek B, Bungay KM, Sugano D, Hsu MA, Ware JE. A questionnaire to assess the generic and disease-specific health outcomes of patients with chronic hepatitis C. Qual Life Res 1998; 7:39-55. [PMID: 9481150 DOI: 10.1023/a:1008884805251] [Citation(s) in RCA: 125] [Impact Index Per Article: 4.8] [Indexed: 02/06/2023]
Abstract
A 69-item questionnaire measuring generic functioning and well-being and disease-specific health outcomes was developed and tested using the pre-treatment data from patients with chronic hepatitis C (CHC) participating in two randomized trials of interferon alpha-2b (n = 157). The questionnaire included all eight scales from the SF-36 and measures of nine other generic and disease-specific health concepts. Psychometric tests confirmed the assumptions underlying the construction and scoring of all generic and disease-specific scales. Cross-sectional tests of 'known groups' validity showed that CHC patients scored worse on the generic scales than patients with other chronic conditions and worse than a healthy general population. The generic and disease-specific scale scores were lower in the presence of physical findings of CHC, as hypothesized, but only the physical functioning and bodily pain scales were linked to cirrhosis or extreme alanine aminotransferase (ALT) ratios. This instrument will be useful in studies of health outcome among patients with CHC, a condition whose health burden appears to have been underestimated in studies to date.
Affiliation(s)
- M S Bayliss
- Health Institute, New England Medical Center, Boston, MA 02111, USA.
|
50
|
Abstract
A growing scientific literature highlights concern about the influence of social bias in medical care. Differential treatment of male and female patients has been among the documented concerns. Yet, little is known about the extent to which differential treatment of male and female patients reflects the influence of social bias or of more acceptable factors, such as different patient preferences or different anticipated outcomes of care. This paper attempts to ascertain the underlying basis for an observed differential in physicians' tendency to advise activity restrictions for male and female patients. We explore the extent to which the gender-based treatment differential is attributable to: (1) patients' health profile, (2) patients' role responsibilities, (3) patients' illness behaviors, and (4) physician characteristics. These four categories of variables correspond to four prominent social science hypotheses concerning gender differences in health and health care utilization (i.e., biological basis hypothesis, fixed role hypothesis, socialization hypothesis, physician bias hypothesis). Data are drawn from the Medical Outcomes Study (MOS), a longitudinal observational study of 1546 patients of 349 physicians practicing in three U.S. cities. Multivariate logistic regression is used to evaluate the likelihood of physician-prescribed activity restrictions for male and female patients, and to explore the absolute and relative influence of patient and physician factors on the observed treatment differential. Results reveal that the odds of prescribed activity restrictions are 3.6 times higher for female patients than for males with equivalent characteristics. The observed differential is not explained by differences in male and female patients' health or role responsibilities. Gender differences in illness behavior and physician gender biases both appear to contribute to the observed differential.
Female patients exhibit more illness behavior than males, and these behaviors increase physicians' tendency to prescribe activity restrictions. After accounting for illness behavior differences and all other factors, the odds of prescribed activity restrictions among female patients of male physicians are four times those of equivalent male patients of those physicians. Medical practice, education, and research must strive to identify and remove the likely unconscious role of social bias in medical decision making.
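As a reading aid for the odds figures in this abstract, the sketch below shows how an adjusted odds ratio relates to a logistic regression coefficient and what it implies for probabilities. Only the 3.6 value comes from the abstract. The coefficient is back-calculated from it, the 10% baseline probability is a hypothetical assumption, and the function names are illustrative.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def probability(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


beta_female = math.log(3.6)    # back-calculated from the reported odds ratio
baseline_p = 0.10              # hypothetical probability for male patients

ratio = odds_ratio(beta_female)
female_p = probability(odds(baseline_p) * ratio)
print(f"odds ratio = {ratio:.1f}, implied probability = {female_p:.3f}")
```

Under this assumed baseline the implied probability rises from 0.10 to about 0.29, which is less than 3.6 times the baseline. An odds ratio is not a ratio of probabilities.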
Affiliation(s)
- D G Safran
- Health Institute, New England Medical Center, Boston, MA 02111, USA
|