26
|
Keller SD, Majkut TC, Kosinski M, Ware JE. Monitoring health outcomes among patients with arthritis using the SF-36 Health Survey: overview. Med Care 1999; 37:MS1-9. [PMID: 10335739 DOI: 10.1097/00005650-199905001-00001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
Abstract
As shown here, general health measures cover much of the content included in arthritis-specific measures, but are they equally sensitive to changes in disease condition? We reviewed the literature on the most widely used general health measure, the SF-36 Health Survey, to see if the empirical evidence supported its validity for use in arthritis patients. As of this writing, there was no documentation of the sensitivity of the SF-36 to short-term changes in arthritic condition over the course of clinical trials and few studies that compared the sensitivity of the SF-36 to arthritis-specific measures. The empirical research reported in this special supplement contributes to the literature on the use of the SF-36 in arthritis patients and demonstrates methods of studying the validity of general health measures to monitor change in specific conditions.
|
27
|
Ware JE, Keller SD, Hatoum HT, Kong SX. The SF-36 Arthritis-Specific Health Index (ASHI): I. Development and cross-validation of scoring algorithms. Med Care 1999; 37:MS40-50. [PMID: 10335742 DOI: 10.1097/00005650-199905001-00004] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.7] [Indexed: 01/09/2023]
Abstract
An arthritis-specific health index (ASHI) for the SF-36 Health Survey was developed by studying its responsiveness to changes in clinical indicators of arthritis severity. Longitudinal data from 1,076 patients participating in four placebo-controlled trials were analyzed. All had at least a 6-month history of moderate to severe osteoarthritis or rheumatoid arthritis of the knee or hip. All had undergone a washout period of 3 to 14 days before baseline assessment to bring about a flare state in osteoarthritis or rheumatoid arthritis symptoms. Their average age was 60 years and 72% were female. Change scores for the eight-scale SF-36 health profile (acute version) and five arthritis-specific measures of disease severity (knee pain on weight bearing, time to walk 50 feet, physician global evaluation of symptom severity and impact, patient global evaluation of symptom severity and impact, and pain intensity visual analogue scale) were computed by subtracting scores before treatment from scores at two-week follow-up. Canonical correlation methods were used to derive weights for changes in SF-36 scales to score a single index (ASHI) that maximized its correlation with changes in the set of five clinical measures of arthritis severity. The weights used to score the ASHI were cross-validated in a 25% holdout group (N = 144) from the first two osteoarthritis trials and in two additional osteoarthritis and rheumatoid arthritis trials (N = 530). Only one SF-36 canonical variate (ASHI) correlated significantly (F = 4.69, P < 0.0001) with the clinical canonical variate that served as the "criterion" measure of change in the severity of arthritis. Changes in the ASHI and clinical canonical variate were substantially correlated in the developmental sample (r = 0.628, P < 0.0001) and on cross-validation (r = 0.629, P < 0.0001). The clinical canonical variate correlated highly (r = 0.75-0.88) with changes in all but one of the five clinical measures (50-foot walk; r = 0.41). 
The pattern of correlations between changes in SF-36 scales and the ASHI indicated that ASHI is primarily a measure of bodily pain (r = 0.92) and other aspects of physical and role functioning and well-being (r = 0.69 for Role-Physical, r = 0.68 for Physical Functioning, r = 0.52 for Social Functioning, and r = 0.51 for Vitality). The patterns of correlations between SF-36 scales and the ASHI were very similar across developmental and cross-validation samples. This research demonstrates the feasibility and generalizability of a single ASHI scored from changes in responses to the SF-36 Health Survey. The generic SF-36 health profile, which has already been shown to be useful in comparing arthritis with other diseases and treatments, can also be scored specifically to make it more useful in studies of osteoarthritis and rheumatoid arthritis.
|
28
|
Keller SD, Ware JE, Gandek B, Aaronson NK, Alonso J, Apolone G, Bjorner JB, Brazier J, Bullinger M, Fukuhara S, Kaasa S, Leplège A, Sanson-Fisher RW, Sullivan M, Wood-Dauphinee S. Testing the equivalence of translations of widely used response choice labels: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:933-44. [PMID: 9817110 DOI: 10.1016/s0895-4356(98)00084-5] [Citation(s) in RCA: 74] [Impact Index Per Article: 2.8] [Indexed: 11/25/2022]
Abstract
The similarity in meaning assigned to response choice labels from the SF-36 Health Survey (SF-36) was evaluated across countries. Convenience samples of judges (range, 10 to 117; median = 48) from 13 countries rated translations of response choice labels, using a variation of the Thurstone method of equal appearing intervals. Judges marked a point on a 10-cm line representing the magnitude of a response choice label (e.g., "good" relative to the anchors of "poor" and "excellent"). Ratings were evaluated to determine the ordinal consistency of response choice labels within a response scale; the degree to which differences between adjacent response choice labels were equal interval; and the amount of variance due to response choice label, country, judge, and interaction between response choice label and country. Results confirmed the hypothesized ordering of response choice labels; the percentage of ordinal pairs ranged from 88.7% to 100% (median = 98.2%) across countries and response scales. Examination of the average magnitudes of response choice labels supported the "quasi-interval" nature of the scales. Analysis of variance (ANOVA) results supported the generalizability of response choice magnitudes across countries; labels explained 64% to 77% of the variance in ratings, and country explained 1% to 3%. These results support the equivalence of SF-36 response choice labels across countries. Departures from the assumption of equal intervals, when observed, were similar across countries and were greatest for the two response scales that are recalibrated under standard SF-36 scoring. Results provide justification for scoring translations of individual items using standard SF-36 scoring; whether these items form the same scales in other countries as they do in the United States is evaluated with tests of scaling assumptions.
|
29
|
Keller SD, Ware JE, Bentler PM, Aaronson NK, Alonso J, Apolone G, Bjorner JB, Brazier J, Bullinger M, Kaasa S, Leplège A, Sullivan M, Gandek B. Use of structural equation modeling to test the construct validity of the SF-36 Health Survey in ten countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1179-88. [PMID: 9817136 DOI: 10.1016/s0895-4356(98)00110-3] [Citation(s) in RCA: 178] [Impact Index Per Article: 6.8] [Indexed: 11/28/2022]
Abstract
A crucial prerequisite to the use of the SF-36 Health Survey in multinational studies is the reproduction of the conceptual model underlying its scoring and interpretation. Structural equation modeling (SEM) was used to test these aspects of the construct validity of the SF-36 in ten IQOLA countries: Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, the United Kingdom, and the United States. Data came from general population surveys fielded to gather normative data. Measurement and structural models developed in the United States were cross-validated in random halves of the sample in each country. SEM analyses supported the eight first-order factor model of health that underlies the scoring of SF-36 scales and two second-order factors that are the basis for summary physical and mental health measures. A single third-order factor was also observed in support of the hypothesis that all responses to the SF-36 are generated by a single, underlying construct: health. In addition, a third second-order factor, interpreted as general well-being, was shown to improve the fit of the model. This model (including eight first-order factors, three second-order factors, and one third-order factor) was cross-validated using a holdout sample within the United States and in each of the nine other countries. These results confirm the hypothesized relationships between SF-36 items and scales and justify their scoring in each country using standard algorithms. Results also suggest that SF-36 scales and summary physical and mental health measures will have similar interpretations across countries. The practical implications of a third second-order SF-36 factor (general well-being) warrant further study.
|
30
|
Raczek AE, Ware JE, Bjorner JB, Gandek B, Haley SM, Aaronson NK, Apolone G, Bech P, Brazier JE, Bullinger M, Sullivan M. Comparison of Rasch and summated rating scales constructed from SF-36 physical functioning items in seven countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1203-14. [PMID: 9817138 DOI: 10.1016/s0895-4356(98)00112-7] [Citation(s) in RCA: 118] [Impact Index Per Article: 4.5] [Indexed: 12/31/2022]
Abstract
Rasch models for polytomous items were used to assess the scaling assumptions and compare item response patterns in the 10-item SF-36 physical functioning scale (PF-10) for general population respondents in Denmark, Germany, Italy, the Netherlands, Sweden, the United Kingdom, and the United States. The Rasch model of physical functioning developed in the United States was compared to models for other countries, and each country was compared to a multinational composite. Strong scale congruence across the seven countries was demonstrated; items that varied between countries and from the composite may reflect unique cultural response patterns or differences in translation. Scoring algorithms based on the Rasch model for each country were superior to the current Likert scoring in tests of relative validity (RV) in discriminating among age groups in all countries. In relation to the Likert PF-10 scoring (RV = 1.00), scores estimated using the Rasch rating scale model achieve a median RV of 1.31 (range: 1.01-1.59), while the Rasch partial credit model attained a median RV of 1.44 (range: 1.01-2.23). Rasch models hold good potential for improving health status measures, estimating individual scores when responses to scale items are missing, and equating scores across countries.
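The relative validity comparison reported above can be illustrated with a small sketch. It assumes that RV is computed as the ratio of one-way ANOVA F statistics (candidate scoring divided by reference scoring) across the same known groups; that definition is common in this literature but is not stated in the abstract. The function names and the toy scores are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    k = len(groups)                      # number of groups (e.g., age bands)
    n = sum(len(g) for g in groups)      # total number of respondents
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_validity(candidate_groups, reference_groups):
    """Assumed definition: F for the candidate scoring over F for the reference.

    The reference scoring therefore has RV = 1.00 by construction.
    """
    return one_way_f(candidate_groups) / one_way_f(reference_groups)


# Hypothetical scores for two age groups under two scoring methods.
reference = [[1, 2, 3], [4, 5, 6]]   # stands in for the reference scoring
candidate = [[1, 2, 3], [7, 8, 9]]   # stands in for an alternative scoring
rv = relative_validity(candidate, reference)
```

Under this assumed definition, an RV above 1.00 means the candidate scoring separates the groups more sharply than the reference scoring does.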
|
31
|
Bullinger M, Alonso J, Apolone G, Leplège A, Sullivan M, Wood-Dauphinee S, Gandek B, Wagner A, Aaronson N, Bech P, Fukuhara S, Kaasa S, Ware JE. Translating health status questionnaires and evaluating their quality: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:913-23. [PMID: 9817108 DOI: 10.1016/s0895-4356(98)00082-1] [Citation(s) in RCA: 611] [Impact Index Per Article: 23.5] [Indexed: 12/13/2022]
Abstract
This article describes the methods adopted by the International Quality of Life Assessment (IQOLA) project to translate the SF-36 Health Survey. Translation methods included the production of forward and backward translations, use of difficulty and quality ratings, pilot testing, and cross-cultural comparison of the translation work. Experience to date suggests that the SF-36 can be adapted for use in other countries with relatively minor changes to the content of the form, providing support for the use of these translations in multinational clinical trials and other studies. The most difficult items to translate were physical functioning items, which used examples of activities and distances that are not common outside of the United States; items that used colloquial expressions such as pep or blue; and the social functioning items. Quality ratings were uniformly high across countries. While the IQOLA approach to translation and validation was developed for use with the SF-36, it is applicable to other translation efforts.
|
32
|
Gandek B, Ware JE. Methods for validating and norming translations of health status questionnaires: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:953-9. [PMID: 9817112 DOI: 10.1016/s0895-4356(98)00086-9] [Citation(s) in RCA: 212] [Impact Index Per Article: 8.2] [Indexed: 11/23/2022]
Abstract
This article briefly summarizes methods used in the empirical validation of translations of the SF-36 Health Survey. In addition, information about the IQOLA Project norming protocol and 13 general population norming samples analyzed in this supplement is provided.
|
33
|
Ware JE, Gandek B. Methods for testing data quality, scaling assumptions, and reliability: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:945-52. [PMID: 9817111 DOI: 10.1016/s0895-4356(98)00085-7] [Citation(s) in RCA: 432] [Impact Index Per Article: 16.6] [Indexed: 02/06/2023]
Abstract
Following the translation development stage, the second research stage of the IQOLA Project tests the assumptions underlying item scoring and scale construction. This article provides detailed information on the research methods used by the IQOLA Project to evaluate data quality, scaling and scoring assumptions, and the reliability of the SF-36 scales. Tests include evaluation of item and scale-level descriptive statistics; examination of the equality of item-scale correlations, item internal consistency and item discriminant validity; and estimation of scale score reliability using internal consistency and test-retest methods. Results from these tests are used to determine if standard algorithms for the construction and scoring of the eight SF-36 scales can be used in each country and to provide information that can be used in translation improvement.
|
34
|
Fukuhara S, Ware JE, Kosinski M, Wada S, Gandek B. Psychometric and clinical tests of validity of the Japanese SF-36 Health Survey. J Clin Epidemiol 1998; 51:1045-53. [PMID: 9817122 DOI: 10.1016/s0895-4356(98)00096-1] [Citation(s) in RCA: 663] [Impact Index Per Article: 25.5] [Indexed: 11/22/2022]
Abstract
Cross-sectional data from a representative sample of the general population in Japan were analyzed to test the validity of Japanese SF-36 Health Survey scales as measures of physical and mental health. Results from psychometric and clinical tests of validity were compared. Principal components analyses were used to test for the hypothesized physical and mental dimensions of health and the pattern of scale correlations with those components. To test the clinical validity of SF-36 scale scores, self-reports of chronic medical conditions and the Zung Self-Rating Depression Scale were used to create mutually exclusive groups differing in the severity of physical and mental conditions. The pattern of correlations between the SF-36 scales and the two empirically derived components generally confirmed hypotheses for most scales. Results of psychometric and clinical tests of validity were in agreement for the Physical Functioning, Role-Physical, Vitality, Social Functioning, and Mental Health scales. Relatively less agreement between psychometric and clinical tests of validity was observed for the Bodily Pain, General Health, and Role-Emotional scales, and the physical and mental health factor content of those scales was not consistent with hypotheses. In clinical tests of validity, the General Health, Bodily Pain, and Physical Functioning scales were the most valid scales in discriminating between groups with and without a severe physical condition. Scales that correlated highest with mental health in the components analysis (Mental Health and Vitality) also were most valid in discriminating between groups with and without depression. The results of this study provide preliminary interpretation guidelines for all SF-36 scales, although caution is recommended in the interpretation of the Role-Emotional, Bodily Pain, and General Health scales pending further studies in Japan.
|
35
|
Bjorner JB, Kreiner S, Ware JE, Damsgaard MT, Bech P. Differential item functioning in the Danish translation of the SF-36. J Clin Epidemiol 1998; 51:1189-202. [PMID: 9817137 DOI: 10.1016/s0895-4356(98)00111-5] [Citation(s) in RCA: 155] [Impact Index Per Article: 6.0] [Indexed: 10/17/2022]
Abstract
Statistical analyses of Differential Item Functioning (DIF) can be used for rigorous translation evaluations. DIF techniques test whether each item functions in the same way, irrespective of the country, language, or culture of the respondents. For a given level of health, the score on any item should be independent of nationality. This requirement can be tested through contingency-table methods, which are efficient for analyzing all types of items. We investigated DIF in the Danish translation of the SF-36 Health Survey, using two general population samples (USA, n = 1,506; Denmark, n = 3,950). DIF was identified for 12 out of 35 items. These results agreed with independent ratings of translation quality, but the statistical techniques were more sensitive. When included in scales, the items exhibiting DIF had only a little impact on conclusions about cross-national differences in health in the general population. However, if used as single items, the DIF items could seriously bias results from cross-national comparisons. Also, the DIF items might have larger impact on cross-national comparison of groups with poorer health status. We conclude that analysis of DIF is useful for evaluating questionnaire translations.
|
36
|
Ware JE, Gandek B, Kosinski M, Aaronson NK, Apolone G, Brazier J, Bullinger M, Kaasa S, Leplège A, Prieto L, Sullivan M, Thunedborg K. The equivalence of SF-36 summary health scores estimated using standard and country-specific algorithms in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1167-70. [PMID: 9817134 DOI: 10.1016/s0895-4356(98)00108-5] [Citation(s) in RCA: 436] [Impact Index Per Article: 16.8] [Indexed: 12/13/2022]
Abstract
Data from general population surveys (n = 1771 to 9151) in nine European countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) were analyzed to test the algorithms used to score physical and mental component summary measures (PCS-36/MCS-36) based on the SF-36 Health Survey. Scoring coefficients for principal components were estimated independently in each country using identical methods of factor extraction and orthogonal rotation. PCS-36 and MCS-36 scores were also estimated using standard (U.S.-derived) scoring algorithms, and results were compared. Product-moment correlations between scores estimated from standard and country-specific scoring coefficients were very high (0.98 to 1.00) for both physical and mental health components in all countries. As hypothesized for orthogonal components, correlations between physical and mental components within each country were very low (0.00 to 0.12) for both estimation methods. Mean scores for PCS-36 differed by as much as 3.0 points across countries using standard scoring, and mean scores for MCS-36 differed across countries by as much as 6.4 points. In view of the high degree of equivalence observed within each country, using standard and country-specific algorithms, we recommend use of standard scoring algorithms for purposes of multinational studies involving these 10 countries.
|
37
|
Gandek B, Ware JE, Aaronson NK, Apolone G, Bjorner JB, Brazier JE, Bullinger M, Kaasa S, Leplège A, Prieto L, Sullivan M. Cross-validation of item selection and scoring for the SF-12 Health Survey in nine countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1171-8. [PMID: 9817135 DOI: 10.1016/s0895-4356(98)00109-7] [Citation(s) in RCA: 2080] [Impact Index Per Article: 80.0] [Indexed: 11/20/2022]
Abstract
Data from general population surveys (n = 1483 to 9151) in nine European countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) were analyzed to cross-validate the selection of questionnaire items for the SF-12 Health Survey and scoring algorithms for 12-item physical and mental component summary measures. In each country, multiple regression methods were used to select 12 SF-36 items that best reproduced the physical and mental health summary scores for the SF-36 Health Survey. Summary scores then were estimated with 12 items in three ways: using standard (U.S.-derived) SF-12 items and scoring algorithms; standard items and country-specific scoring; and country-specific sets of 12 items and scoring. Replication of the 36-item summary measures by the 12-item summary measures was then evaluated through comparison of mean scores and the strength of product-moment correlations. Product-moment correlations between SF-36 summary measures and SF-12 summary measures (standard and country-specific) were very high, ranging from 0.94-0.96 and 0.94-0.97 for the physical and mental summary measures, respectively. Mean 36-item summary measures and comparable 12-item summary measures were within 0.0 to 1.5 points (median = 0.5 points) in each country and were comparable across age groups. Because of the high degree of correspondence between summary physical and mental health measures estimated using the SF-12 and SF-36, it appears that the SF-12 will prove to be a practical alternative to the SF-36 in these countries, for purposes of large group comparisons in which the focus is on overall physical and mental health outcomes.
|
38
|
Gandek B, Ware JE, Aaronson NK, Alonso J, Apolone G, Bjorner J, Brazier J, Bullinger M, Fukuhara S, Kaasa S, Leplège A, Sullivan M. Tests of data quality, scaling assumptions, and reliability of the SF-36 in eleven countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1149-58. [PMID: 9817132 DOI: 10.1016/s0895-4356(98)00106-1] [Citation(s) in RCA: 299] [Impact Index Per Article: 11.5] [Indexed: 12/18/2022]
Abstract
Data from general population samples in 11 countries (n = 1483 to 9151) were used to assess data quality and test the assumptions underlying the construction and scoring of multi-item scales from the SF-36 Health Survey. Across all countries, the rate of item-level missing data generally was low, although slightly higher for items printed in the grid format. In each country, item means generally were clustered as hypothesized within scales. Correlations between items and hypothesized scales were greater than 0.40 with one exception, supporting item internal consistency. Items generally correlated significantly higher with their own scale than with competing scales, supporting item discriminant validity. Scales could be constructed for 93-100% of respondents. Internal consistency reliability of the eight SF-36 scales was above 0.70 for all scales, with two exceptions. Floor effects were low for all except the two role functioning scales; ceiling effects were high for both role functioning scales and also were noteworthy for the Physical Functioning, Bodily Pain, and Social Functioning scales in some countries. These results support the construction and scoring of the SF-36 translations in these 11 countries using the method of summated ratings.
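The item internal consistency test described above can be sketched as follows. This is a minimal illustration, assuming that each item is correlated with the sum of the other items in its hypothesized scale (a correction for overlap that the abstract does not spell out) and that the result is compared with the 0.40 criterion mentioned in the text. The function names and the toy responses are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_correlations(items):
    """Correlate each item with the sum of the remaining items in its scale.

    `items` is a list of columns, one per item, each holding one score per
    respondent. Leaving the item out of the scale total is the assumed
    correction for overlap.
    """
    n_respondents = len(items[0])
    correlations = []
    for i, item in enumerate(items):
        rest = [
            sum(col[r] for j, col in enumerate(items) if j != i)
            for r in range(n_respondents)
        ]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical responses: three items answered by five respondents.
scale_items = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [1, 3, 2, 5, 4]]
item_scale_r = corrected_item_scale_correlations(scale_items)
passes_criterion = [r > 0.40 for r in item_scale_r]
```

Item discriminant validity would extend the same idea by comparing each item's correlation with its own scale against its correlations with the competing scales.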
|
39
|
Ware JE, Gandek B. Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project. J Clin Epidemiol 1998; 51:903-12. [PMID: 9817107 DOI: 10.1016/s0895-4356(98)00081-x] [Citation(s) in RCA: 1651] [Impact Index Per Article: 63.5] [Indexed: 12/25/2022]
Abstract
This article presents information about the development and evaluation of the SF-36 Health Survey, a 36-item generic measure of health status. It summarizes studies of reliability and validity and provides administrative and interpretation guidelines for the SF-36. A brief history of the International Quality of Life Assessment (IQOLA) Project is also included.
|
40
|
Ware JE, Kosinski M, Gandek B, Aaronson NK, Apolone G, Bech P, Brazier J, Bullinger M, Kaasa S, Leplège A, Prieto L, Sullivan M. The factor structure of the SF-36 Health Survey in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1159-65. [PMID: 9817133 DOI: 10.1016/s0895-4356(98)00107-3] [Citation(s) in RCA: 461] [Impact Index Per Article: 17.7] [Indexed: 12/29/2022]
Abstract
Studies of the factor structure of the SF-36 Health Survey are an important step in its construct validation. Its structure is also the psychometric basis for scoring physical and mental health summary scales, which are proving useful in simplifying and interpreting statistical analyses. To test the generalizability of the SF-36 factor structure, product-moment correlations among the eight SF-36 Health Survey scales were estimated for representative samples of general populations in each of 10 countries. Matrices were independently factor analyzed using identical methods to test for hypothesized physical and mental health components, and results were compared with those published for the United States. Following simple orthogonal rotation of two principal components, they were easily interpreted as dimensions of physical and mental health in all countries. These components accounted for 76% to 85% of the reliable variance in scale scores across nine European countries, in comparison with 82% in the United States. Similar patterns of correlations between the eight scales and the components were observed across all countries and across age and gender subgroups within each country. Correlations with the physical component were highest (0.64 to 0.86) for the Physical Functioning, Role Physical, and Bodily Pain scales, whereas the Mental Health, Role Emotional, and Social Functioning scales correlated highest (0.62 to 0.91) with the mental component. Secondary correlations for both clusters of scales were much lower. Scales measuring General Health and Vitality correlated moderately with both physical and mental health components. These results support the construct validity of the SF-36 translations and the scoring of physical and mental health components in all countries studied.
|
41
|
Wagner AK, Gandek B, Aaronson NK, Acquadro C, Alonso J, Apolone G, Bullinger M, Bjorner J, Fukuhara S, Kaasa S, Leplège A, Sullivan M, Wood-Dauphinee S, Ware JE. Cross-cultural comparisons of the content of SF-36 translations across 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:925-32. [PMID: 9817109 DOI: 10.1016/s0895-4356(98)00083-3] [Citation(s) in RCA: 194] [Impact Index Per Article: 7.5] [Indexed: 12/18/2022]
Abstract
Increasingly, translated and culturally adapted health-related quality of life measures are being used in cross-cultural research. To assess comparability of results, researchers need to know the comparability of the content of the questionnaires used in different countries. Based on an item-by-item discussion among International Quality of Life Assessment (IQOLA) investigators of the content of the translated versions of the SF-36 in 10 countries, we discuss the difficulties that arose in translating the SF-36. We also review the solutions identified by IQOLA investigators to translate items and response choices so that they are appropriate within each country as well as comparable across countries. We relate problems and solutions to ratings of difficulty and conceptual equivalence for each item. The most difficult items to translate were physical functioning items that refer to activities not common outside the United States and items that use colloquial expressions in the source version. Identifying the origin of the source items, their meaning to American English-speaking respondents and American English synonyms, in response to country-specific translation issues, greatly helped the translation process. This comparison of the content of translated SF-36 items suggests that the translations are culturally appropriate and comparable in their content.
|
42
|
Ware JE. A conversation with John E. Ware, Jr., PhD. MANAGED CARE INTERFACE 1998; 11:64-7. [PMID: 10186008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
|
43
|
Safran DG, Taira DA, Rogers WH, Kosinski M, Ware JE, Tarlov AR. Linking primary care performance to outcomes of care. THE JOURNAL OF FAMILY PRACTICE 1998; 47:213-220. [PMID: 9752374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2023]
Abstract
BACKGROUND Substantial research links many of the defining characteristics of primary care to important outcomes; yet little is known about the relative importance of each characteristic, and several characteristics have not been examined. These analyses evaluate the relationship between seven defining elements of primary care (accessibility, continuity, comprehensiveness, integration, clinical interaction, interpersonal treatment, and trust) and three outcomes (adherence to physician's advice, patient satisfaction, and improved health status). METHODS Data were derived from a cross-sectional observational study of adults employed by the Commonwealth of Massachusetts (N = 7204). All patients completed a validated questionnaire, the Primary Care Assessment Survey. Regression methods were used to examine the association between each primary care characteristic (11 summary scales measuring 7 elements of care) and each outcome. RESULTS Physicians' comprehensive ("whole person") knowledge of patients and patients' trust in their physician were the variables most strongly associated with adherence, and trust was the variable most strongly associated with patients' satisfaction with their physician. With other factors equal, adherence rates were 2.6 times higher among patients with whole-person knowledge scores in the 95th percentile compared with the 5th percentile (44.0% adherence vs 16.8% adherence, P < .001). The likelihood of complete satisfaction was 87.5% for those with 95th percentile trust scores compared with 0.4% for patients with 5th percentile trust scores (P < .001). The leading correlates of self-reported health improvements were integration of care, thoroughness of physical examinations, communication, comprehensive knowledge of patients, and trust (P < .001). CONCLUSIONS Patients' trust in their physician and physicians' knowledge of patients are leading correlates of three important outcomes of care. 
The results are noteworthy in the context of pervasive changes in our nation's health care system that are widely viewed as threatening to the quality of physician-patient relationships.
|
44
|
Landgraf JM, Maunsell E, Speechley KN, Bullinger M, Campbell S, Abetz L, Ware JE. Canadian-French, German and UK versions of the Child Health Questionnaire: methodology and preliminary item scaling results. Qual Life Res 1998; 7:433-45. [PMID: 9691723 DOI: 10.1023/a:1008810004694] [Citation(s) in RCA: 272] [Impact Index Per Article: 10.5] [Indexed: 02/08/2023]
Abstract
Using emerging international guidelines, stringent procedures were used to develop and evaluate Canadian-French, German and UK translations/adaptions of the 50 item, parent-completed Child Health Questionnaire (CHQ-PF50). Multitrait analysis was used to evaluate the convergent and discriminant validity of the hypothesized item sets across countries relative to the results obtained for a representative sample of children in the US. Cronbach's alpha coefficient was used to estimate the internal consistency reliability for each of the health scales. Floor and ceiling effects were also examined. Seventy-nine percent of all the item-scale correlations achieved acceptable internal consistency (0.40 or higher). The tests of the item convergent and discriminant validity were successful at least 87% of the time across all scales and countries. Equal item variance was observed 90% of the time across all countries. The reliability coefficients ranged from a low of 0.43 (parental time impact, Canadian English) to a high of 0.97 (physical functioning index, Canadian French) across all scales (median 0.80). Negligible floor effects were observed across countries. Noteworthy ceiling effects were observed, as expected, for the hypothesized physical scales (mean effect 73%). Conversely, fewer ceiling effects were observed for the psychosocial scales (range 3-17% behaviour-parental emotional impact). The item-scaling results obtained in these pilot studies support the psychometric properties of the American-English CHQ-PF50 and its respective translations.
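The internal consistency estimate named in this abstract, Cronbach's alpha, can be illustrated with a minimal sketch. The item scores below are invented for illustration and are not CHQ-PF50 data; the function name and data layout are assumptions made for this example.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: list of per-item score lists, one inner list per item,
    each holding the scores of the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n_respondents = len(items[0])
    # Scale total per respondent: sum of that respondent's item scores.
    totals = [sum(item[i] for item in items) for i in range(n_respondents)]
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical scores for three items answered by five respondents.
example_items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5],
]
alpha = cronbach_alpha(example_items)
```

Because the three invented items rise and fall together perfectly, alpha here sits at its upper bound of 1; real scales, such as those reported above (0.43 to 0.97), fall below that.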
|
45
|
Damiano AM, Pastores GM, Ware JE. The health-related quality of life of adults with Gaucher's disease receiving enzyme replacement therapy: results from a retrospective study. Qual Life Res 1998; 7:373-86. [PMID: 9691718 DOI: 10.1023/a:1008814105603] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.5] [Indexed: 02/08/2023]
Abstract
Few studies have reported on the effect of Gaucher's disease on patient-reported, health-related quality of life (HRQoL) and we do not know how the HRQoL burden of Gaucher's disease compares to that of other chronic conditions, what areas of HRQoL are most affected or how the course of change in HRQoL compares with that observed for other conditions or for the general adult population. The purpose of this study was to estimate (1) the HRQoL burden associated with Gaucher's disease managed by enzyme replacement therapy (ERT), (2) recalled changes in HRQoL since ERT initiation and (3) risk factors predictive of HRQoL outcomes. We sampled 212 patients with Gaucher's disease recruited from 146 physicians prescribing ERT in the US. The patients were at least 14 years of age and had been on ERT from 1 to 51 months. The mean (SD) age of the participants was 45 (17) years. Forty-nine percent had had a prior splenectomy and 26% had had a joint replacement. We administered the SF-36 Health Survey (SF-36) and three questions about changes in physical, mental and general HRQoL since starting ERT. The patients with Gaucher's disease scored significantly worse than the age- and gender-adjusted US norms on five of the eight SF-36 subscales (p < 0.05). Age (p < 0.0001) and joint replacement (p < 0.001) were negatively associated with physical health. The presence of an intact spleen (p < 0.01) and a longer duration of ERT (p < 0.01) were associated with better mental health. When asked about changes in HRQoL since starting ERT, at least half of the patients reported fewer limitations in physical activities (53%), better general health perceptions (77%) and less negative emotions (49%) at the time of the interview. 
Patients who had been receiving ERT for approximately 4 years recalled four and five times more improvement in general HRQoL in comparison with recalled changes over a 4 year period among adults in the general population (p < 0.001) and a congestive heart failure population (p < 0.01), respectively. Odds ratios (ORs) revealed that female patients were more likely to report improvements in general HRQoL than males (OR = 4.50 and 95% CI = 2.19-9.25) and 45 year old patients were less likely to report improvements than 35 year olds (OR = 0.76 and 95% CI = 0.62-0.94). Relative to patients who had been receiving ERT for 1 year, those who had been receiving ERT for 2 and 4 years were 1.40 (95% CI = 1.06-1.84) and 2.75 (95% CI = 1.20-6.27) times more likely to report improvements in general HRQoL, respectively. In summary, patients with Gaucher's disease on ERT reported an improvement in HRQoL that was greater than that reported by patients with other chronic diseases. However, Gaucher's patients treated for up to 51 months scored below equivalent adults in the general population. The risk factors, including age and history of splenectomy and joint replacement, warrant further study. Standardized HRQoL measures are likely to prove useful in understanding better the outcomes from the Gaucher's patient's perspective.
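The two duration odds ratios reported above (1.40 for 2 years and 2.75 for 4 years, each relative to 1 year) are what one would expect if duration of ERT entered a logistic model as a single linear term, so that the odds ratio for a 3-year difference is the 1-year odds ratio cubed. The abstract does not state the model form, so this is an assumption; the helper name below is hypothetical.

```python
import math


def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for a difference of `units`, given the odds ratio for a
    one-unit difference, assuming a log-linear (logistic) effect."""
    return math.exp(units * math.log(or_per_unit))


# Reported: OR = 1.40 for 2 vs 1 year of ERT (a one-year difference).
or_one_year = 1.40
# Implied OR for 4 vs 1 year (a three-year difference) under the assumption.
or_three_years = scale_odds_ratio(or_one_year, 3)
# Same scaling applied to the reported confidence limits 1.06 and 1.84.
ci_three_years = (scale_odds_ratio(1.06, 3), scale_odds_ratio(1.84, 3))
```

Under that assumption the implied value is about 2.74, close to the reported 2.75, and the scaled limits (about 1.19 and 6.23) are close to the reported 1.20 and 6.27; the small gaps are consistent with rounding of the published one-year figures.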
|
46
|
Safran DG, Kosinski M, Tarlov AR, Rogers WH, Taira DH, Lieberman N, Ware JE. The Primary Care Assessment Survey: tests of data quality and measurement performance. Med Care 1998; 36:728-39. [PMID: 9596063 DOI: 10.1097/00005650-199805000-00012] [Citation(s) in RCA: 444] [Impact Index Per Article: 17.1] [Indexed: 02/07/2023]
Abstract
OBJECTIVES The authors examine the data quality and measurement performance of the Primary Care Assessment Survey (PCAS), a patient-completed questionnaire that operationalizes formal definitions of primary care, including the definition recently proposed by the Institute of Medicine Committee on the Future of Primary Care. METHODS The PCAS measures seven domains of care through 11 summary scales: accessibility (organizational, financial), continuity (longitudinal, visit-based), comprehensiveness (contextual knowledge of patient, preventive counseling), integration, clinical interaction (clinician-patient communication, thoroughness of physical examinations), interpersonal treatment, and trust. Data from a study of Massachusetts state employees (n = 6094) were used to evaluate key measurement properties of the 11 PCAS scales. Analyses were performed on the combined population and for each of the 16 subgroups defined according to sociodemographic and health characteristics. RESULTS The 11 PCAS scales demonstrated consistently strong measurement characteristics across all subgroups of this adult population. Tests of scaling assumptions for summated rating scales were well satisfied by all Likert-scaled measures. Assessment of data completeness, scale score dispersion characteristics, and inter-scale correlations provide strong evidence for the soundness of all scales, and for the value of separately measuring and interpreting these concepts. CONCLUSIONS With public and private sector policies increasingly emphasizing the importance of primary care, the need for tools to evaluate and improve primary care performance is clear. The PCAS has excellent measurement properties, and performs consistently well across varied segments of the adult population. Widespread application of an assessment methodology, such as the PCAS, will afford an empiric basis through which to measure, monitor, and continuously improve primary care.
|
47
|
Ware JE, Kemp JP, Buchner DA, Singer AE, Nolop KB, Goss TF. The responsiveness of disease-specific and generic health measures to changes in the severity of asthma among adults. Qual Life Res 1998; 7:235-44. [PMID: 9584554 DOI: 10.1023/a:1024946316424] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
The objective of the study was to compare the validity of asthma-specific and generic health outcome measures in relation to changes in the severity of asthma and to treatment. Adult patients (n = 142) participating in a randomized placebo-controlled trial at six clinics were assessed at baseline, prior to the withdrawal (placebo) or continuation of treatment with Vanceril, and again after 8 weeks. The criterion measures of change in severity included pulmonary function expressed as the percent predicted FEV1, five physician-assessed asthma severity measures (cough, chest tightness, wheezing, shortness of breath and overall condition) and two patient-assessed severity measures (night-time symptoms and overall symptoms). The 8-week change scores were estimated for all generic and specific measures and the results were compared across groups of patients who did and did not change in terms of clinical criteria of disease severity and across treatment groups. The responsiveness of each generic and specific measure was estimated independently using the relative validity (RV) methodology, which compares F-ratios for the mean change scores across measures in analyses of the same comparison groups. RV coefficients estimate how much worse each measure discriminated between comparison groups, relative to the best measure (RV = 1.0). Four standardized asthma-specific measures and a total scale score (based on the Marks questionnaire), an individualized asthma-specific scale measuring limitations in activities most important to each patient (based on the Juniper method) and two newly developed scales measuring physical and psychosocial symptoms were used as outcome measures. Generic health outcome measures included eight functional health and well-being scales as well as the physical and mental health summary scales from the SF-36 health survey. 
A standardized asthma-specific scale was most valid in discriminating between groups of patients who did and did not change according to all of the clinical criterion variables studied and in discriminating between treated and untreated groups. Different scales performed best, depending on the clinical criterion. The asthma-specific Marks breathlessness scale was significant in all nine comparisons (RV = 0.62-1.0) and was most valid in discriminating between groups in six of nine tests. The overall scale also performed well in all comparisons (RV = 0.58-1.0). The newly-developed physical symptoms scale was significant in discriminating between groups in eight out of nine tests (RV = 0.52-1.0) and was most valid in three of the nine, including the treatment comparison. The psychosocial impact scale discriminated significantly in eight of the nine comparisons (RV = 0.16-0.38), but was less valid than other specific measures. The asthma-specific individualized activities scale discriminated significantly in seven of the nine tests, but performed less well than the other specific measures (RV = 0.21-0.35) and was not significant in the treatment comparison. One or more SF-36 scales discriminated significantly between groups in all nine comparisons. Two of those scales (physical functioning and role-physical) were consistently more valid than the others (RV = 0.17 and 0.58, respectively) and were the only two generic scales that discriminated between groups of patients defined in terms of changes in FEV1 (RV = 0.26-0.58). The SF-36 physical summary scale discriminated significantly between groups in all nine comparisons (RV = 0.19-0.61) and was the most valid generic measure in the treatment comparison (RV = 0.55). The SF-36 mental summary scale was significant only for the two patient-assessed changes in disease severity (RV = 0.31 and 0.32) and for physician-assessed overall severity (RV = 0.12). 
A comprehensive battery of generic and specific measures is likely to be most useful in understanding the impact of changes in disease severity on the functional health and well-being of adults with asthma, a
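The relative validity (RV) calculation described in this abstract, in which each measure's F-ratio is divided by the largest F-ratio obtained for the same comparison groups, can be sketched as follows. The F-ratios and measure labels below are invented for illustration and are not results from the study.

```python
def relative_validity(f_ratios):
    """Return RV coefficients: each measure's F-ratio divided by the
    largest F-ratio for the same comparison, so the best measure has RV = 1.0."""
    best = max(f_ratios.values())
    return {measure: f / best for measure, f in f_ratios.items()}


# Hypothetical F-ratios for one comparison (changed vs unchanged patients).
example_f_ratios = {
    "asthma_specific_scale": 20.0,
    "physical_symptoms_scale": 10.0,
    "generic_physical_summary": 5.0,
}
rv = relative_validity(example_f_ratios)
```

With these made-up inputs the best measure gets RV = 1.0 and a measure whose F-ratio is one quarter as large gets RV = 0.25, which is how values such as RV = 0.19-0.61 in the abstract are read.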
|
48
|
Nelson EC, McHorney CA, Manning WG, Rogers WH, Zubkoff M, Greenfield S, Ware JE, Tarlov AR. A longitudinal study of hospitalization rates for patients with chronic disease: results from the Medical Outcomes Study. Health Serv Res 1998; 32:759-74. [PMID: 9460485 PMCID: PMC1070232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
OBJECTIVE To prospectively compare inpatient and outpatient utilization rates between prepaid (PPD) and fee-for-service (FFS) insurance coverage for patients with chronic disease. DATA SOURCE/STUDY SETTING Data from the Medical Outcomes Study, a longitudinal observational study of chronic disease patients conducted in Boston, Chicago, and Los Angeles. STUDY DESIGN A four-year prospective study of resource utilization among 1,681 patients under treatment for hypertension, diabetes, myocardial infarction, or congestive heart failure in the practices of 367 clinicians. DATA COLLECTION/EXTRACTION METHODS Insurance payment system (PPD or FFS), hospitalizations, and office visits were obtained from patient reports. Disease and severity indicators, sociodemographics, and self-reported functional status were used to adjust for patient mix and to compute expected utilization rates. PRINCIPAL FINDINGS Compared to FFS, PPD patients had 31 percent fewer observed hospitalizations before adjustment for patient differences (p = .005) and 15 percent fewer hospitalizations than expected after adjustment (p = .078). The observed rate of FFS hospitalizations exceeded the expected rate by 9 percent. These results are not explained by system differences in patient mix or trends in hospital use over four years. Half of the PPD/FFS difference in hospitalization rate is due to intrinsic characteristics of the payment system itself. CONCLUSIONS PPD patients with chronic medical conditions followed prospectively over four years, after extensive patient-mix adjustment, had 15 percent fewer hospitalizations than their FFS counterparts owing to differences intrinsic to the insurance reimbursement system.
|
49
|
Bayliss MS, Gandek B, Bungay KM, Sugano D, Hsu MA, Ware JE. A questionnaire to assess the generic and disease-specific health outcomes of patients with chronic hepatitis C. Qual Life Res 1998; 7:39-55. [PMID: 9481150 DOI: 10.1023/a:1008884805251] [Citation(s) in RCA: 125] [Impact Index Per Article: 4.8] [Indexed: 02/06/2023]
Abstract
A 69-item questionnaire measuring generic functioning and well-being and disease-specific health outcomes was developed and tested using the pre-treatment data from patients with chronic hepatitis C (CHC) participating in two randomized trials of interferon alpha-2b (n = 157). The questionnaire included all eight scales from the SF-36 and measures of nine other generic and disease-specific health concepts. Psychometric tests confirmed the assumptions underlying the construction and scoring of all generic and disease-specific scales. Cross-sectional tests of 'known groups' validity showed that CHC patients scored worse on the generic scales than patients with other chronic conditions and worse than a healthy general population. The generic and disease-specific scale scores were lower in the presence of physical findings of CHC, as hypothesized, but only the physical functioning and bodily pain scales were linked to cirrhosis or extreme alanine aminotransferase (ALT) ratios. This instrument will be useful in studies of health outcome among patients with CHC, a condition whose health burden appears to have been underestimated in studies to date.
|
50
|
Safran DG, Rogers WH, Tarlov AR, McHorney CA, Ware JE. Gender differences in medical treatment: the case of physician-prescribed activity restrictions. Soc Sci Med 1997; 45:711-22. [PMID: 9226794 DOI: 10.1016/s0277-9536(96)00405-4] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.2] [Indexed: 02/04/2023]
Abstract
A growing scientific literature highlights concern about the influence of social bias in medical care. Differential treatment of male and female patients has been among the documented concerns. Yet, little is known about the extent to which differential treatment of male and female patients reflects the influence of social bias or of more acceptable factors, such as different patient preferences or different anticipated outcomes of care. This paper attempts to ascertain the underlying basis for an observed differential in physicians' tendency to advise activity restrictions for male and female patients. We explore the extent to which the gender-based treatment differential is attributable to: (1) patients' health profile, (2) patients' role responsibilities, (3) patients' illness behaviors, and (4) physician characteristics. These four categories of variables correspond to four prominent social science hypotheses concerning gender differences in health and health care utilization (i.e., biological basis hypothesis, fixed role hypothesis, socialization hypothesis, physician bias hypothesis). Data are drawn from the Medical Outcomes Study (MOS), a longitudinal observational study of 1546 patients of 349 physicians practicing in three U.S. cities. Multivariate logistic regression is used to evaluate the likelihood of physician-prescribed activity restrictions for male and female patients, and to explore the absolute and relative influence of patient and physician factors on the observed treatment differential. Results reveal that the odds of prescribed activity restrictions are 3.6 times higher for female patients than for males with equivalent characteristics. The observed differential is not explained by differences in male and female patients' health or role responsibilities. Gender differences in illness behavior and physician gender biases both appear to contribute to the observed differential. 
Female patients exhibit more illness behavior than males, and these behaviors increase physicians' tendency to prescribe activity restrictions. After accounting for illness behavior differences and all other factors, the odds of prescribed activity restrictions among female patients of male physicians are four times those of equivalent male patients of those physicians. Medical practice, education, and research must strive to identify and remove the likely unconscious role of social bias in medical decision making.
|