1. Ware J, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996; 34:220-33. [PMID: 8628042 DOI: 10.1097/00005650-199603000-00003] [Citation(s) in RCA: 12612] [Impact Index Per Article: 434.9]
Abstract
Regression methods were used to select and score 12 items from the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) to reproduce the Physical Component Summary and Mental Component Summary scales in the general US population (n=2,333). The resulting 12-item short-form (SF-12) achieved multiple R squares of 0.911 and 0.918 in predictions of the SF-36 Physical Component Summary and SF-36 Mental Component Summary scores, respectively. Scoring algorithms derived in the general population and used to score 12-item versions of the two components (Physical Component Summary and Mental Component Summary) achieved R squares of 0.905 with the SF-36 Physical Component Summary and 0.938 with the SF-36 Mental Component Summary when cross-validated in the Medical Outcomes Study. Test-retest (2-week) correlations of 0.89 and 0.76 were observed for the 12-item Physical Component Summary and the 12-item Mental Component Summary, respectively, in the general US population (n=232). Twenty cross-sectional and longitudinal tests of empirical validity previously published for the 36-item short-form scales and summary measures were replicated for the 12-item Physical Component Summary and the 12-item Mental Component Summary, including comparisons between patient groups known to differ or to change in terms of the presence and seriousness of physical and mental conditions, acute symptoms, age and aging, self-reported 1-year changes in health, and recovery from depression. In 14 validity tests involving physical criteria, relative validity estimates for the 12-item Physical Component Summary ranged from 0.43 to 0.93 (median=0.67) in comparison with the best 36-item short-form scale. Relative validity estimates for the 12-item Mental Component Summary in 6 tests involving mental criteria ranged from 0.60 to 1.07 (median=0.97) in relation to the best 36-item short-form scale. Average scores for the 2 summary measures, and those for most scales in the 8-scale profile based on the 12-item short-form, closely mirrored those for the 36-item short-form, although standard errors were nearly always larger for the 12-item short-form.
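Relative validity (RV) compares how well a measure separates criterion groups against the best-performing scale. A minimal Python sketch, assuming RV is computed as the ratio of a measure's one-way ANOVA F statistic to the F statistic of the reference (best) scale, as is common in this literature; the group scores in the usage lines are hypothetical.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of scores)."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand_mean = sum(scores) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: weighted squared distance of group means from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def relative_validity(f_measure, f_reference):
    """RV below 1 means the measure discriminates less well than the reference scale."""
    return f_measure / f_reference


# Hypothetical scores for the same two criterion groups on a short form and a reference scale.
short_form = one_way_f([[52, 55, 49, 58], [44, 47, 41, 46]])
reference = one_way_f([[53, 56, 50, 59], [41, 44, 38, 43]])
print(round(relative_validity(short_form, reference), 2))
```

Because RV is a ratio of F statistics for the same groups, it is unitless and can be compared across scales scored on different metrics.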
2. McHorney CA, Ware JE, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993; 31:247-63. [PMID: 8450681 DOI: 10.1097/00005650-199303000-00006] [Citation(s) in RCA: 4646] [Impact Index Per Article: 145.2]
Abstract
Cross-sectional data from the Medical Outcomes Study (MOS) were analyzed to test the validity of the MOS 36-Item Short-Form Health Survey (SF-36) scales as measures of physical and mental health constructs. Results from traditional psychometric and clinical tests of validity were compared. Principal components analysis was used to test for hypothesized physical and mental health dimensions. For purposes of clinical tests of validity, clinical criteria defined mutually exclusive adult patient groups differing in severity of medical and psychiatric conditions. Scales shown in the components analysis to primarily measure physical health (physical functioning and role limitations-physical) best distinguished groups differing in severity of chronic medical condition and had the most pure physical health interpretation. Scales shown to primarily measure mental health (mental health and role limitations-emotional) best distinguished groups differing in the presence and severity of psychiatric disorders and had the most pure mental health interpretation. The social functioning, vitality, and general health perceptions scales measured both physical and mental health components and, thus, had the most complex interpretation. These results are useful in establishing guidelines for the interpretation of each scale and in documenting the size of differences between clinical groups that should be considered very large.
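The principal components analysis described here works on the correlations among the SF-36 scales. As a dependency-free sketch (an illustrative choice; statistical packages would use a full eigendecomposition), the first two components of a correlation matrix can be extracted by power iteration with deflation, with each loading equal to an eigenvector entry scaled by the square root of its eigenvalue. The 2x2 matrix in the usage line is a toy example, not MOS data.

```python
import math


def _mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _dominant_eigenpair(m, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(m)
    # Uneven deterministic start vector, so it is unlikely to be orthogonal
    # to the dominant eigenvector.
    v = [1.0 + 0.1 * i for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = _mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v'Mv gives the eigenvalue for the unit vector v.
    eigenvalue = sum(a * b for a, b in zip(v, _mat_vec(m, v)))
    return eigenvalue, v


def two_component_loadings(corr):
    """Loadings of each variable on the first two principal components."""
    n = len(corr)
    lam1, v1 = _dominant_eigenpair(corr)
    # Deflation: subtract the first component so the second becomes dominant.
    deflated = [[corr[i][j] - lam1 * v1[i] * v1[j] for j in range(n)] for i in range(n)]
    lam2, v2 = _dominant_eigenpair(deflated)
    loadings = [(v1[i] * math.sqrt(lam1), v2[i] * math.sqrt(max(lam2, 0.0))) for i in range(n)]
    return loadings, (lam1, lam2)


# Toy correlation matrix (not MOS data): two scales correlated at 0.6.
loadings, eigenvalues = two_component_loadings([[1.0, 0.6], [0.6, 1.0]])
```

With real data the input would be the 8x8 matrix of scale intercorrelations, and the two loadings per scale would be inspected for a physical versus mental pattern.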
3. McHorney CA, Ware JE, Lu JF, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care 1994; 32:40-66. [PMID: 8277801 DOI: 10.1097/00005650-199401000-00004] [Citation(s) in RCA: 3184] [Impact Index Per Article: 102.7]
Abstract
The widespread use of standardized health surveys is predicated on the largely untested assumption that scales constructed from those surveys will satisfy minimum psychometric requirements across diverse population groups. Data from the Medical Outcomes Study (MOS) were used to evaluate data completeness and quality, test scaling assumptions, and estimate internal-consistency reliability for the eight scales constructed from the MOS SF-36 Health Survey. Analyses were conducted among 3,445 patients and were replicated across 24 subgroups differing in sociodemographic characteristics, diagnosis, and disease severity. For each scale, item-completion rates were high across all groups (88% to 95%), but tended to be somewhat lower among the elderly, those with less than a high school education, and those in poverty. On average, surveys were complete enough to compute scale scores for more than 96% of the sample. Across patient groups, all scales passed tests for item-internal consistency (97% passed) and item-discriminant validity (92% passed). Reliability coefficients ranged from a low of 0.65 to a high of 0.94 across scales (median = 0.85) and varied somewhat across patient subgroups. Floor effects were negligible except for the two role disability scales. Noteworthy ceiling effects were observed for both role disability scales and the social functioning scale. These findings support the use of the SF-36 survey across the diverse populations studied and identify population groups in which use of standardized health status measures may or may not be problematic.
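Internal-consistency reliability in studies of this kind is typically estimated with Cronbach's alpha, and floor and ceiling effects as the percentage of respondents at the lowest and highest possible scale scores. A minimal Python sketch, assuming complete item data and a raw scale score equal to the simple sum of its items; all response data below are hypothetical.

```python
def _sample_variance(xs):
    """Variance with an n - 1 denominator."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def cronbach_alpha(responses):
    """responses: one list of item scores per respondent (no missing values)."""
    k = len(responses[0])
    item_columns = [list(col) for col in zip(*responses)]  # transpose to per-item columns
    sum_item_var = sum(_sample_variance(col) for col in item_columns)
    total_var = _sample_variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def floor_ceiling(scale_scores, lowest, highest):
    """Percent of respondents at the lowest and at the highest possible score."""
    n = len(scale_scores)
    floor = 100.0 * sum(1 for s in scale_scores if s == lowest) / n
    ceiling = 100.0 * sum(1 for s in scale_scores if s == highest) / n
    return floor, ceiling


# Hypothetical 3-item scale answered by five respondents.
data = [[1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 5, 4], [5, 5, 5]]
alpha = cronbach_alpha(data)
pct_floor, pct_ceiling = floor_ceiling([0, 0, 25, 50, 100], lowest=0, highest=100)
```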
4. [Article type: Review] [Citation(s) in RCA: 2808]
5. Stewart AL, Hays RD, Ware JE. The MOS short-form general health survey. Reliability and validity in a patient population. Med Care 1988; 26:724-35. [PMID: 3393032 DOI: 10.1097/00005650-198807000-00007] [Citation(s) in RCA: 2248] [Impact Index Per Article: 60.8]
6. Gandek B, Ware JE, Aaronson NK, Apolone G, Bjorner JB, Brazier JE, Bullinger M, Kaasa S, Leplege A, Prieto L, Sullivan M. Cross-validation of item selection and scoring for the SF-12 Health Survey in nine countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1171-8. [PMID: 9817135 DOI: 10.1016/s0895-4356(98)00109-7] [Citation(s) in RCA: 2183] [Impact Index Per Article: 80.9]
Abstract
Data from general population surveys (n = 1483 to 9151) in nine European countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) were analyzed to cross-validate the selection of questionnaire items for the SF-12 Health Survey and scoring algorithms for 12-item physical and mental component summary measures. In each country, multiple regression methods were used to select 12 SF-36 items that best reproduced the physical and mental health summary scores for the SF-36 Health Survey. Summary scores then were estimated with 12 items in three ways: using standard (U.S.-derived) SF-12 items and scoring algorithms; standard items and country-specific scoring; and country-specific sets of 12 items and scoring. Replication of the 36-item summary measures by the 12-item summary measures was then evaluated through comparison of mean scores and the strength of product-moment correlations. Product-moment correlations between SF-36 summary measures and SF-12 summary measures (standard and country-specific) were very high, ranging from 0.94 to 0.96 and from 0.94 to 0.97 for the physical and mental summary measures, respectively. Mean 36-item summary measures and comparable 12-item summary measures were within 0.0 to 1.5 points (median = 0.5 points) in each country and were comparable across age groups. Because of the high degree of correspondence between summary physical and mental health measures estimated using the SF-12 and SF-36, it appears that the SF-12 will prove to be a practical alternative to the SF-36 in these countries, for purposes of large group comparisons in which the focus is on overall physical and mental health outcomes.
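The two replication criteria named here, the product-moment correlation between 36-item and 12-item summary scores and the difference between their means, can be computed as follows. A minimal Python sketch; the paired scores are hypothetical, not IQOLA data.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference(x, y):
    """Mean of x minus mean of y, in score points."""
    return sum(x) / len(x) - sum(y) / len(y)


# Hypothetical physical summary scores for the same five respondents.
pcs36 = [42.1, 48.5, 51.0, 55.3, 58.9]
pcs12 = [43.0, 47.9, 50.2, 56.1, 58.0]
r = pearson_r(pcs36, pcs12)
diff = mean_difference(pcs36, pcs12)
```

A high correlation alone does not guarantee agreement (one score could be shifted by a constant), which is why the mean difference is reported alongside it.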
7. Ware JE, Gandek B. Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project. J Clin Epidemiol 1998; 51:903-12. [PMID: 9817107 DOI: 10.1016/s0895-4356(98)00081-x] [Citation(s) in RCA: 1730] [Impact Index Per Article: 64.1]
Abstract
This article presents information about the development and evaluation of the SF-36 Health Survey, a 36-item generic measure of health status. It summarizes studies of reliability and validity and provides administrative and interpretation guidelines for the SF-36. A brief history of the International Quality of Life Assessment (IQOLA) Project is also included.
8. Kaplan SH, Greenfield S, Ware JE. Assessing the effects of physician-patient interactions on the outcomes of chronic disease. Med Care 1989; 27:S110-27. [PMID: 2646486 DOI: 10.1097/00005650-198903001-00010] [Citation(s) in RCA: 1244] [Impact Index Per Article: 34.6]
Abstract
Growing interest in the doctor-patient relationship focuses attention on the specific elements of that relationship that affect patients' health outcomes. Data are presented for four clinical trials conducted in varied practice settings among chronically ill patients differing markedly in sociodemographic characteristics. These trials demonstrated that "better health" measured physiologically (blood pressure or blood sugar), behaviorally (functional status), or more subjectively (evaluations of overall health status) was consistently related to specific aspects of physician-patient communication. We conclude that the physician-patient relationship may be an important influence on patients' health outcomes and must be taken into account in light of current changes in the health care delivery system that may place this relationship at risk.
[Article type: Clinical Trial]
9. Stewart AL, Greenfield S, Hays RD, Wells K, Rogers WH, Berry SD, McGlynn EA, Ware JE. Functional status and well-being of patients with chronic conditions. Results from the Medical Outcomes Study. JAMA 1989. [PMID: 2754790 DOI: 10.1001/jama.1989.03430070055030] [Citation(s) in RCA: 1094] [Impact Index Per Article: 30.4]
Abstract
Enhancing daily functioning and well-being is an increasingly advocated goal in the treatment of patients with chronic conditions. We evaluated the functioning and well-being of 9385 adults at the time of office visits to 362 physicians in three US cities, using brief surveys completed by both patients and physicians. For eight of nine common chronic medical conditions, patients with the condition showed markedly worse physical, role, and social functioning; mental health; health perceptions; and/or bodily pain compared with patients with no chronic conditions. Each condition had a unique profile among the various health components. Hypertension had the least overall impact; heart disease and patient-reported gastrointestinal disorders had the greatest impact. Patients with multiple conditions showed greater decrements in functioning and well-being than those with only one condition. Substantial variations in functioning and well-being within each chronic condition group remain to be explained.
[Article type: Comparative Study]
10.
Abstract
An intervention was developed to increase patient involvement in care. Using a treatment algorithm as a guide, patients were helped to read their medical record and coached to ask questions and negotiate medical decisions with their physicians during a 20-minute session before their regularly scheduled visit. In a randomized controlled trial we compared this intervention with a standard educational session of equal length in a clinic for patients with ulcer disease. Six to eight weeks after the trial, patients in the experimental group reported fewer limitations in physical and role-related activities (p less than 0.05), preferred a more active role in medical decision-making, and were as satisfied with their care as the control group. Analysis of audiotapes of physician-patient interactions showed that patients in the experimental group were twice as effective as control patients in obtaining information from physicians (p less than 0.05). Results of the intervention included increased involvement in the interaction with the physician, fewer limitations imposed by the disease on patients' functional ability, and increased preference for active involvement in medical decision-making.
[Article type: Clinical Trial] [Citation(s) in RCA: 933]
11. Veit CT, Ware JE. The structure of psychological distress and well-being in general populations. J Consult Clin Psychol 1983; 51:730-42. [PMID: 6630688 DOI: 10.1037/0022-006x.51.5.730] [Citation(s) in RCA: 923] [Impact Index Per Article: 22.0]
12. Kosinski M, Bayliss MS, Bjorner JB, Ware JE, Garber WH, Batenhorst A, Cady R, Dahlöf CGH, Dowson A, Tepper S. A six-item short-form survey for measuring headache impact: the HIT-6. Qual Life Res 2003; 12:963-74. [PMID: 14651415 DOI: 10.1023/a:1026119331193] [Citation(s) in RCA: 894] [Impact Index Per Article: 40.6]
Abstract
BACKGROUND: Migraine and other severe headaches can cause suffering and reduce functioning and productivity. Patients are the best source of information about such impact.
OBJECTIVE: To develop a new short form (HIT-6) for assessing the impact of headaches that has broad content coverage but is brief as well as reliable and valid enough to use in screening and monitoring patients in clinical research and practice.
METHODS: HIT-6 items were selected from an existing item pool of 54 items and from 35 items suggested by clinicians. Items were selected and modified based on content validity, item response theory (IRT) information functions, item internal consistency, distributions of scores, clinical validity, and linguistic analyses. The HIT-6 was evaluated in an Internet-based survey of headache sufferers (n = 1103) who were members of America Online (AOL). After 14 days, 540 participated in a follow-up survey.
RESULTS: HIT-6 covers six content categories represented in widely used surveys of headache impact. Internal consistency, alternate forms, and test-retest reliability estimates of HIT-6 were 0.89, 0.90, and 0.80, respectively. Individual patient score confidence intervals (95%) of approximately +/-5 were observed for 88% of all respondents. In tests of validity in discriminating across diagnostic and headache severity groups, relative validity (RV) coefficients of 0.82 and 1.00 were observed for HIT-6, in comparison with the Total Score. Patient-level classifications based on HIT-6 were accurate 88.7% of the time at the recommended cut-off score for a probability of migraine diagnosis. HIT-6 was responsive to self-reported changes in headache impact.
CONCLUSIONS: The IRT model estimated for a 'pool' of items from widely used measures of headache impact was useful in constructing an efficient, reliable, and valid 'static' short form (HIT-6) for use in screening and monitoring patient outcomes.
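The abstract does not state how HIT-6 responses are scored. A minimal sketch, assuming the response weights commonly documented for HIT-6 (never = 6, rarely = 8, sometimes = 10, very often = 11, always = 13), which give totals from 36 to 78; these weights are an assumption here and should be checked against the official scoring instructions before use.

```python
# Assumed response weights (commonly documented HIT-6 scoring; verify against
# the official instructions). Six items give totals from 36 to 78.
HIT6_WEIGHTS = {"never": 6, "rarely": 8, "sometimes": 10, "very often": 11, "always": 13}


def hit6_total(responses):
    """Sum the weighted responses to the six HIT-6 items."""
    if len(responses) != 6:
        raise ValueError("HIT-6 has exactly six items")
    return sum(HIT6_WEIGHTS[r.strip().lower()] for r in responses)


# Hypothetical respondent.
score = hit6_total(["Never", "Rarely", "Sometimes", "Very often", "Always", "Never"])
```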
13. Sullivan M, Karlsson J, Ware JE. The Swedish SF-36 Health Survey--I. Evaluation of data quality, scaling assumptions, reliability and construct validity across general populations in Sweden. Soc Sci Med 1995; 41:1349-58. [PMID: 8560302 DOI: 10.1016/0277-9536(95)00125-q] [Citation(s) in RCA: 884] [Impact Index Per Article: 29.5]
Abstract
We document the applicability of the SF-36 Health Survey, which was translated into Swedish using methods later adopted as International Quality of Life Assessment (IQOLA) Project procedures. To test its appropriateness for use in Sweden, it was administered through mail-out/mail-back questionnaires in seven general population studies with an average response rate of 68%. The 8930 respondents varied by gender (48.2% men), age (range 15-93 years, mean age 42.7), marital status, education, socio-economic status, and geographical area. Psychometric methods used in the evaluation of the SF-36 in the U.S. were replicated. Over 90% of respondents had complete item data for each of the eight SF-36 scales, although more missing data were observed for subjects 75 years and over. Scale scores could be computed for the vast majority of respondents (95% and over), and for slightly fewer in the oldest subgroup. Item-internal consistency was consistently high across socio-demographic subgroups and the eight scales. Most reliability estimates exceeded the 0.80 level. The highest reliability was observed for the Bodily Pain Scale, where all subgroups met the 0.90 level recommended for individual comparisons; coefficients at or above 0.90 were also observed in most subgroups for the Physical Functioning Scale. Tests of scaling assumptions, including hypothesized item groupings, which reflect the construct validity of scales, were consistently favorable across subgroups, although lower rates were noted in the oldest age group. In conclusion, these studies have yielded empirical evidence supporting the feasibility of a non-English language reproduction of the SF-36 Health Survey. The Swedish SF-36 is ready for further evaluation.
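The 0.90 reliability level for individual comparisons matters because reliability determines the width of the confidence band around one person's score. A minimal sketch using the classical test theory standard error of measurement, SEM = SD * sqrt(1 - reliability), and a normal-approximation 95% band; the SD of 10 (a norm-based metric) is an assumption for illustration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def individual_ci(score, sd, reliability, z=1.96):
    """Approximate 95% confidence band around one person's observed score."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return score - half_width, score + half_width


# Hypothetical norm-based metric (SD = 10): compare reliability 0.80 with 0.90.
for rel in (0.80, 0.90):
    low, high = individual_ci(50.0, 10.0, rel)
    print(rel, round(low, 1), round(high, 1))
```

Under these assumptions, raising reliability from 0.80 to 0.90 narrows the half-width from about 8.8 to about 6.2 points.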
14. Greenfield S, Kaplan SH, Ware JE, Yano EM, Frank HJ. Patients' participation in medical care: effects on blood sugar control and quality of life in diabetes. J Gen Intern Med 1988; 3:448-57. [PMID: 3049968 DOI: 10.1007/bf02595921] [Citation(s) in RCA: 761] [Impact Index Per Article: 20.6]
Abstract
To maximize disease control, patients must participate effectively in their medical care. The authors developed an intervention designed to increase the involvement of patients in medical decision making. In a 20-minute session just before the regular visit to a physician, a clinic assistant reviewed the medical record of each experimental patient with him/her, guided by a diabetes algorithm. Using systematic prompts, the assistant encouraged patients to use the information gained to negotiate medical decisions with the doctor. A randomized trial was conducted in two university hospital clinics to compare this intervention with standard educational materials in sessions of equal length. The mean pre-intervention glycosylated hemoglobin (HbA1) values were 10.6 +/- 2.1% for 33 experimental patients and 10.3 +/- 2.0% for 26 controls. After the intervention the mean levels were 9.1 +/- 1.9% in the experimental group (p less than 0.01) and 10.6 +/- 2.22% for controls. Analysis of audiotapes of the visits to the physician showed the experimental patients were twice as effective as controls in eliciting information from the physician. Experimental patients reported significantly fewer functional limitations. The authors conclude that the intervention is feasible and that it changes patient behavior, improves blood sugar control, and decreases functional limitations.
[Article type: Clinical Trial]
15. Ware JE, Snyder MK, Wright WR, Davies AR. Defining and measuring patient satisfaction with medical care. Eval Program Plann 1983; 6:247-63. [PMID: 10267253 DOI: 10.1016/0149-7189(83)90005-8] [Citation(s) in RCA: 702] [Impact Index Per Article: 16.7]
Abstract
This paper describes the development of Form II of the Patient Satisfaction Questionnaire (PSQ), a self-administered survey instrument designed for use in general population studies. The PSQ contains 55 Likert-type items that measure attitudes toward the more salient characteristics of doctors and medical care services (technical and interpersonal skills of providers, waiting time for appointments, office waits, emergency care, costs of care, insurance coverage, availability of hospitals, and other resources) and satisfaction with care in general. Scales are balanced to control for acquiescent response set. Scoring rules for 18 multi-item subscales and eight global scales were standardized following replication of item analyses in four field tests. Internal-consistency and test-retest estimates indicate satisfactory reliability for studies involving group comparisons. The PSQ content represents well the characteristics of providers and services described most often in the literature and in responses to open-ended questions. Empirical tests of validity have also produced generally favorable results.
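Balancing a scale means mixing favorably and unfavorably worded items, then reverse-scoring the unfavorable ones, so that a respondent who simply agrees with everything (acquiescent response set) does not receive an extreme score. A minimal sketch, assuming 5-point agree-disagree responses where 5 = strongly agree; the 4-item keying below is hypothetical, not the PSQ's actual item set.

```python
def reverse_score(response, low=1, high=5):
    """Reverse-key a Likert response so that a high score always means more satisfied."""
    return low + high - response


def balanced_scale_score(responses, reverse_keyed, low=1, high=5):
    """Sum a scale in which the items at the indexes in reverse_keyed are worded unfavorably."""
    return sum(reverse_score(r, low, high) if i in reverse_keyed else r
               for i, r in enumerate(responses))


# Hypothetical 4-item scale: items 1 and 3 are negatively worded.
# A respondent who agrees with everything (5 on every item) lands at the
# midpoint of the 4-20 range instead of the top.
acquiescent = balanced_scale_score([5, 5, 5, 5], reverse_keyed={1, 3})
```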
16. Fukuhara S, Ware JE, Kosinski M, Wada S, Gandek B. Psychometric and clinical tests of validity of the Japanese SF-36 Health Survey. J Clin Epidemiol 1998; 51:1045-53. [PMID: 9817122 DOI: 10.1016/s0895-4356(98)00096-1] [Citation(s) in RCA: 683] [Impact Index Per Article: 25.3]
Abstract
Cross-sectional data from a representative sample of the general population in Japan were analyzed to test the validity of Japanese SF-36 Health Survey scales as measures of physical and mental health. Results from psychometric and clinical tests of validity were compared. Principal components analyses were used to test for the hypothesized physical and mental dimensions of health and the pattern of scale correlations with those components. To test the clinical validity of SF-36 scale scores, self-reports of chronic medical conditions and the Zung Self-Rating Depression Scale were used to create mutually exclusive groups differing in the severity of physical and mental conditions. The pattern of correlations between the SF-36 scales and the two empirically derived components generally confirmed hypotheses for most scales. Results of psychometric and clinical tests of validity were in agreement for the Physical Functioning, Role-Physical, Vitality, Social Functioning, and Mental Health scales. Relatively less agreement between psychometric and clinical tests of validity was observed for the Bodily Pain, General Health, and Role-Emotional scales, and the physical and mental health factor content of those scales was not consistent with hypotheses. In clinical tests of validity, the General Health, Bodily Pain, and Physical Functioning scales were the most valid scales in discriminating between groups with and without a severe physical condition. Scales that correlated highest with mental health in the components analysis (Mental Health and Vitality) also were most valid in discriminating between groups with and without depression. The results of this study provide preliminary interpretation guidelines for all SF-36 scales, although caution is recommended in the interpretation of the Role-Emotional, Bodily Pain, and General Health scales pending further studies in Japan.
17. Bullinger M, Alonso J, Apolone G, Leplège A, Sullivan M, Wood-Dauphinee S, Gandek B, Wagner A, Aaronson N, Bech P, Fukuhara S, Kaasa S, Ware JE. Translating health status questionnaires and evaluating their quality: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:913-23. [PMID: 9817108 DOI: 10.1016/s0895-4356(98)00082-1] [Citation(s) in RCA: 647] [Impact Index Per Article: 24.0]
Abstract
This article describes the methods adopted by the International Quality of Life Assessment (IQOLA) project to translate the SF-36 Health Survey. Translation methods included the production of forward and backward translations, use of difficulty and quality ratings, pilot testing, and cross-cultural comparison of the translation work. Experience to date suggests that the SF-36 can be adapted for use in other countries with relatively minor changes to the content of the form, providing support for the use of these translations in multinational clinical trials and other studies. The most difficult items to translate were the physical functioning items, which used examples of activities and distances that are not common outside of the United States; items that used colloquial expressions such as "pep" or "blue"; and the social functioning items. Quality ratings were uniformly high across countries. While the IQOLA approach to translation and validation was developed for use with the SF-36, it is applicable to other translation efforts.
18. McHorney CA, Ware JE, Rogers W, Raczek AE, Lu JF. The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts. Results from the Medical Outcomes Study. Med Care 1992; 30:MS253-65. [PMID: 1583937 DOI: 10.1097/00005650-199205001-00025] [Citation(s) in RCA: 473] [Impact Index Per Article: 14.3]
Abstract
This study estimated the validity and relative precision (RP) of four methods (MOS long- and short-form scales, global items, and COOP Poster Charts) in measuring six general health concepts. The authors also tested whether, and how precisely, each method discriminated relatively well adult patients (N = 638) from those with only severe chronic medical conditions (N = 168) and those with only psychiatric conditions (N = 163), as clinically defined. For comparisons between the well group and both the medical and psychiatric groups, RP estimates favored long-form over short-form multi-item scales, and favored multi-item scales over single-item global measures and poster charts. In relation to long forms, short-form multi-item scales achieved a median RP of 0.93; RP estimates for global items and poster charts were 0.81 and 0.67, respectively. Variations in RP across methods and concepts were linked to differences in the coarseness of measurement scales, reliability, and content (including the effects of chart illustrations). These variations in RP have implications for the interpretation of scores, the statistical power of comparisons between clinical groups, and the size of confidence intervals around individual patient scores.
19. Ware JE, Kosinski M, Gandek B, Aaronson NK, Apolone G, Bech P, Brazier J, Bullinger M, Kaasa S, Leplège A, Prieto L, Sullivan M. The factor structure of the SF-36 Health Survey in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1159-65. [PMID: 9817133 DOI: 10.1016/s0895-4356(98)00107-3] [Citation(s) in RCA: 464] [Impact Index Per Article: 17.2]
Abstract
Studies of the factor structure of the SF-36 Health Survey are an important step in its construct validation. Its structure is also the psychometric basis for scoring physical and mental health summary scales, which are proving useful in simplifying and interpreting statistical analyses. To test the generalizability of the SF-36 factor structure, product-moment correlations among the eight SF-36 Health Survey scales were estimated for representative samples of general populations in each of 10 countries. Matrices were independently factor analyzed using identical methods to test for hypothesized physical and mental health components, and results were compared with those published for the United States. Following simple orthogonal rotation of two principal components, they were easily interpreted as dimensions of physical and mental health in all countries. These components accounted for 76% to 85% of the reliable variance in scale scores across nine European countries, in comparison with 82% in the United States. Similar patterns of correlations between the eight scales and the components were observed across all countries and across age and gender subgroups within each country. Correlations with the physical component were highest (0.64 to 0.86) for the Physical Functioning, Role Physical, and Bodily Pain scales, whereas the Mental Health, Role Emotional, and Social Functioning scales correlated highest (0.62 to 0.91) with the mental component. Secondary correlations for both clusters of scales were much lower. Scales measuring General Health and Vitality correlated moderately with both physical and mental health components. These results support the construct validity of the SF-36 translations and the scoring of physical and mental health components in all countries studied.
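The factor structure is described here as the basis for scoring the physical and mental summary measures. A minimal sketch of the general three-step, norm-based approach (standardize each scale against population norms, weight by factor score coefficients and sum, then rescale to mean 50 and SD 10); the norms and coefficients below are deliberately artificial placeholders, not the published SF-36 values, and only two scales are used for brevity.

```python
def summary_score(scale_scores, norm_means, norm_sds, coefficients):
    """Norm-based summary score on a T-score metric (mean 50, SD 10)."""
    aggregate = 0.0
    for scale, weight in coefficients.items():
        # 1. Standardize the scale score against population norms.
        z = (scale_scores[scale] - norm_means[scale]) / norm_sds[scale]
        # 2. Weight by the factor score coefficient and accumulate.
        aggregate += weight * z
    # 3. Rescale the weighted sum to mean 50, SD 10.
    return 50.0 + 10.0 * aggregate


# PLACEHOLDER values for two scales only; not the published SF-36 norms or coefficients.
means = {"PF": 80.0, "MH": 80.0}
sds = {"PF": 20.0, "MH": 20.0}
physical_weights = {"PF": 0.5, "MH": -0.2}
score = summary_score({"PF": 100.0, "MH": 80.0}, means, sds, physical_weights)
```

Country-specific scoring, as compared in the IQOLA papers, would change only the norms and coefficients passed in, not the steps.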
20. Safran DG, Kosinski M, Tarlov AR, Rogers WH, Taira DH, Lieberman N, Ware JE. The Primary Care Assessment Survey: tests of data quality and measurement performance. Med Care 1998; 36:728-39. [PMID: 9596063 DOI: 10.1097/00005650-199805000-00012] [Citation(s) in RCA: 456] [Impact Index Per Article: 16.9]
Abstract
OBJECTIVES: The authors examine the data quality and measurement performance of the Primary Care Assessment Survey (PCAS), a patient-completed questionnaire that operationalizes formal definitions of primary care, including the definition recently proposed by the Institute of Medicine Committee on the Future of Primary Care.
METHODS: The PCAS measures seven domains of care through 11 summary scales: accessibility (organizational, financial), continuity (longitudinal, visit-based), comprehensiveness (contextual knowledge of patient, preventive counseling), integration, clinical interaction (clinician-patient communication, thoroughness of physical examinations), interpersonal treatment, and trust. Data from a study of Massachusetts state employees (n = 6094) were used to evaluate key measurement properties of the 11 PCAS scales. Analyses were performed on the combined population and for each of the 16 subgroups defined according to sociodemographic and health characteristics.
RESULTS: The 11 PCAS scales demonstrated consistently strong measurement characteristics across all subgroups of this adult population. Tests of scaling assumptions for summated rating scales were well satisfied by all Likert-scaled measures. Assessment of data completeness, scale score dispersion characteristics, and inter-scale correlations provides strong evidence for the soundness of all scales, and for the value of separately measuring and interpreting these concepts.
CONCLUSIONS: With public and private sector policies increasingly emphasizing the importance of primary care, the need for tools to evaluate and improve primary care performance is clear. The PCAS has excellent measurement properties, and performs consistently well across varied segments of the adult population. Widespread application of an assessment methodology, such as the PCAS, will afford an empiric basis through which to measure, monitor, and continuously improve primary care.
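The PCAS measures are described as summated (Likert) rating scales. A common convention for reporting such scales, used for the SF-36 scales, is to sum the item responses and rescale linearly so that the possible range runs from 0 to 100. The abstract does not state the PCAS scoring rules, so this sketch assumes that convention and 1-5 response options purely for illustration.

```python
def transform_0_100(raw, lowest_possible, highest_possible):
    """Linearly rescale a raw summated score so the possible range becomes 0-100."""
    return 100.0 * (raw - lowest_possible) / (highest_possible - lowest_possible)


def likert_scale_score(item_responses, item_low=1, item_high=5):
    """Summated rating: add the items, then rescale to 0-100."""
    k = len(item_responses)
    raw = sum(item_responses)
    return transform_0_100(raw, k * item_low, k * item_high)


score = likert_scale_score([3, 4, 2])  # hypothetical three-item scale
```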
21
Ware JE, Gandek B. Methods for testing data quality, scaling assumptions, and reliability: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:945-52. [PMID: 9817111 DOI: 10.1016/s0895-4356(98)00085-7] [Citation(s) in RCA: 440] [Impact Index Per Article: 16.3] [Indexed: 02/06/2023]
Abstract
Following the translation development stage, the second research stage of the IQOLA Project tests the assumptions underlying item scoring and scale construction. This article provides detailed information on the research methods used by the IQOLA Project to evaluate data quality, scaling and scoring assumptions, and the reliability of the SF-36 scales. Tests include evaluation of item and scale-level descriptive statistics; examination of the equality of item-scale correlations, item internal consistency and item discriminant validity; and estimation of scale score reliability using internal consistency and test-retest methods. Results from these tests are used to determine if standard algorithms for the construction and scoring of the eight SF-36 scales can be used in each country and to provide information that can be used in translation improvement.
22
Ware JE, Gandek B, Kosinski M, Aaronson NK, Apolone G, Brazier J, Bullinger M, Kaasa S, Leplège A, Prieto L, Sullivan M, Thunedborg K. The equivalence of SF-36 summary health scores estimated using standard and country-specific algorithms in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51:1167-70. [PMID: 9817134 DOI: 10.1016/s0895-4356(98)00108-5] [Citation(s) in RCA: 436] [Impact Index Per Article: 16.1] [Indexed: 12/13/2022]
Abstract
Data from general population surveys (n = 1771 to 9151) in nine European countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) were analyzed to test the algorithms used to score physical and mental component summary measures (PCS-36/MCS-36) based on the SF-36 Health Survey. Scoring coefficients for principal components were estimated independently in each country using identical methods of factor extraction and orthogonal rotation. PCS-36 and MCS-36 scores were also estimated using standard (U.S.-derived) scoring algorithms, and results were compared. Product-moment correlations between scores estimated from standard and country-specific scoring coefficients were very high (0.98 to 1.00) for both physical and mental health components in all countries. As hypothesized for orthogonal components, correlations between physical and mental components within each country were very low (0.00 to 0.12) for both estimation methods. Mean scores for PCS-36 differed by as much as 3.0 points across countries using standard scoring, and mean scores for MCS-36 differed across countries by as much as 6.4 points. In view of the high degree of equivalence observed within each country between scores from standard and country-specific algorithms, we recommend using the standard scoring algorithms in multinational studies involving these 10 countries.
23
Wu AW, Rubin HR, Mathews WC, Ware JE, Brysk LT, Hardy WD, Bozzette SA, Spector SA, Richman DD. A health status questionnaire using 30 items from the Medical Outcomes Study. Preliminary validation in persons with early HIV infection. Med Care 1991; 29:786-98. [PMID: 1875745 DOI: 10.1097/00005650-199108000-00011] [Citation(s) in RCA: 430] [Impact Index Per Article: 12.6] [Indexed: 12/29/2022]
Abstract
Many current health status instruments either are too long to use in many acquired immune deficiency syndrome (AIDS) clinical trials or omit important concepts. In this study, human immunodeficiency virus (HIV)-relevant items developed for the Medical Outcomes Study (MOS) from subscales for cognitive function, energy/fatigue, health distress, and a single quality of life item were added to a portion of the MOS Short-form General Health Survey. The resulting 30-item questionnaire reliably and distinctly measured ten aspects of health and took less than 5 minutes to complete. To test its validity, this modified measure was used to compare the health of 73 subjects with asymptomatic HIV infection and 44 with early AIDS-related complex (ARC). Compared with ARC subjects, asymptomatic individuals reported superior overall health, less pain, and better physical function, role function, cognitive function, and quality of life (rank-sum, P < 0.02). Asymptomatic subjects' scores were higher on most subscales than the age-adjusted scores of MOS outpatients with hypertension, diabetes, recent myocardial infarction, or depression; ARC patients scored closest to hypertensive patients. This instrument, containing a subset of the MOS measures of health-related quality of life, may be a useful outcome measure for AIDS clinical trials.
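The group comparison above used a rank-sum test. A minimal Python sketch of the Wilcoxon rank-sum statistic follows, with mid-ranks for ties and a two-sided p-value from the large-sample normal approximation. It omits the tie correction to the variance and exact small-sample p-values, so it is an illustration rather than a substitute for a statistics package; the function name and the sample values are hypothetical.

```python
import math


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Wilcoxon rank-sum statistic W for sample x, plus a two-sided p-value.

    Tied values share the average (mid) rank. The p-value uses the normal
    approximation and ignores the tie correction to the variance.
    """
    pooled = sorted(x + y)
    midrank: dict[float, float] = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based positions i+1..j
        i = j
    n1, n2 = len(x), len(y)
    w = sum(midrank[v] for v in x)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p_two_sided


# Made-up 0-100 scale scores for two small groups.
asymptomatic = [80, 75, 90, 85, 70, 95]
symptomatic = [60, 65, 72, 55, 68]
w, p = rank_sum_test(asymptomatic, symptomatic)
print(w, round(p, 4))
```

Because the test works on ranks rather than raw values, it suits bounded, often skewed health-status scale scores better than a t-test does.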
[Comparative Study]
24
Abstract
In response to questions raised about the "accuracy" of SF-36 physical (PCS) and mental (MCS) component summary scores, particularly extremely high and low scores, we briefly comment on: how they were developed, how they are scored, the factor content of the eight SF-36 subscales, cross-tabulations between item-level responses and extreme summary scores, and published and new tests of their empirical validity. Published cross-tabulations between SF-36 items and PCS and MCS scores, reanalyses of public datasets (N = 5919), and preliminary results from the Medicare Health Outcomes Survey (HOS) (N = 172,314) yielded little or no evidence in support of Taft's hypothesis that extreme scores are an invalid artifact of some negative scoring weights. For example, in the HOS, those (N = 432) with "unexpected" PCS scores worse than 20 (which, according to Taft, indicate better mental health rather than worse physical health) were about 25% more likely to die within two years, in comparison with those scoring in the next highest (21-30) category. In this test and in all other empirical tests, results of predictions supported the validity of extreme PCS and MCS scores. We recommend against the interpretation of average differences smaller than one point in studies that seek to detect "false" measurement, and we repeat our 7-year-old recommendation that results based on summary measures should be thoroughly compared with the SF-36 profile before drawing conclusions. To facilitate such comparisons, scoring utilities and user-friendly graphs for SF-36 profiles and physical and mental summary scores (both orthogonal and oblique scoring algorithms) have been made available on the Internet at www.sf-36.com/test.
[Comparative Study] [Citation(s) in RCA: 422]
25
Abstract
OBJECTIVES To identify physician and practice characteristics associated with a physician's propensity to involve patients in diagnostic and treatment decisions, or participatory decision-making style. DESIGN A representative cross-sectional sample of patients participating in the Medical Outcomes Study characterized each physician's style by using a self-reported questionnaire. A single averaged style score was generated for each physician. Style scores were compared among physicians who differed in age, sex, minority status, specialty, primary care training or training in interviewing skills, satisfaction with professional autonomy, and practice volume. SETTINGS Solo practices, multispecialty groups, and health maintenance organizations in Boston, Chicago, and Los Angeles. PARTICIPANTS 7730 patients sampled over 9 days from the practices of 300 physicians. Physicians were practicing general internal medicine, family medicine, cardiology, and endocrinology. MEASUREMENTS Participatory decision-making style was measured using a three-item scale on a questionnaire that was completed by patients after their office visit. Physician and practice characteristics were reported by physicians on self-administered questionnaires. RESULTS Among patients of physicians who were rated in the lowest (least participatory) quartile, one third changed physicians in the following year; among patients of physicians who were rated in the highest quartile, only 15% changed physicians. Higher scores were associated with greater patient satisfaction. Physicians who had had primary care training or training in interviewing skills scored higher than those without such training. Physicians in higher-volume practices were rated as less participatory than those in lower-volume practices. Physicians who were satisfied with their level of professional autonomy were rated as more participatory than those who were dissatisfied.
CONCLUSION Participatory decision-making style is influenced by physicians' background, training, practice volume, and professional autonomy. Because participatory decision-making style is related to patient satisfaction and loyalty to the physician, cost-containment strategies that reduce time with patients and decrease physician autonomy may result in suboptimal patient outcomes.
[Citation(s) in RCA: 401]
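The design above summarizes many patient ratings into one averaged style score per physician and then compares physicians by quartile. The Python sketch below shows one straightforward way to do that aggregation. The data layout (a list of `(physician_id, score)` pairs), the rank-based quartile rule, and the function names are illustrative assumptions, not the study's documented procedure.

```python
from collections import defaultdict
from statistics import mean


def physician_style_scores(ratings: list[tuple[str, float]]) -> dict[str, float]:
    """Average patient-reported scale scores into one score per physician."""
    by_physician: dict[str, list[float]] = defaultdict(list)
    for physician_id, score in ratings:
        by_physician[physician_id].append(score)
    return {pid: mean(scores) for pid, scores in by_physician.items()}


def quartile_of(scores: dict[str, float]) -> dict[str, int]:
    """Assign quartile 1 (lowest) to 4 (highest) by rank of the averaged score.

    Ties are broken by input order, since sorted() is stable.
    """
    ordered = sorted(scores, key=lambda pid: scores[pid])
    n = len(ordered)
    return {pid: 1 + (4 * rank) // n for rank, pid in enumerate(ordered)}


# Hypothetical patient ratings on a 0-100 participatory-style scale.
style = physician_style_scores(
    [("A", 60), ("A", 80), ("B", 40), ("C", 90), ("D", 55)]
)
print(style)
print(quartile_of(style))
```

Averaging before ranking treats the physician, not the patient, as the unit of analysis, which matches comparing physician characteristics across style quartiles.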