1. Nieser KJ, Harris AHS. Comparing methods for assessing the reliability of health care quality measures. Stat Med 2024. PMID: 39145538. DOI: 10.1002/sim.10197.
Abstract
Quality measurement plays an increasing role in U.S. health care. Measures inform quality improvement efforts, public reporting of variations in quality of care across providers and hospitals, and high-stakes financial decisions. To be meaningful in these contexts, measures should be reliable and not heavily impacted by chance variations in sampling or measurement. Several different methods are used in practice by measure developers and endorsers to evaluate reliability; however, there is uncertainty and debate over differences between these methods and their interpretations. We review methods currently used in practice, pointing out differences that can lead to disparate reliability estimates. We compare estimates from 14 different methods in the case of two sets of mental health quality measures within a large health system. We find that estimates can differ substantially and that these discrepancies widen when sample size is reduced.
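As a concrete illustration of how accepted methods can disagree, here is a minimal sketch comparing two estimators in common use, a signal-to-noise estimate and a Spearman-Brown-adjusted split-half estimate. The toy data and function names are illustrative, and whether these two are among the paper's 14 methods is not stated here.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: binary pass/fail outcomes for 40 providers with made-up
# pass rates and panel sizes (for illustration only).
providers = [rng.binomial(1, p, size=int(n))
             for p, n in zip(rng.beta(8, 12, size=40),
                             rng.integers(20, 200, size=40))]

def snr_reliability(groups):
    """Signal-to-noise estimate: between-provider variance of observed rates
    over between-provider variance plus average sampling variance."""
    rates = np.array([g.mean() for g in groups])
    ns = np.array([len(g) for g in groups])
    within = np.mean(rates * (1 - rates) / ns)           # mean sampling noise
    between = max(np.var(rates, ddof=1) - within, 0.0)   # method of moments
    return between / (between + within)

def split_half_reliability(groups, n_splits=200):
    """Split-half estimate: correlate provider rates computed on random
    half-samples, then Spearman-Brown-adjust back to the full sample size."""
    corrs = []
    for _ in range(n_splits):
        halves = [rng.permutation(g) for g in groups]
        r1 = np.array([h[: len(h) // 2].mean() for h in halves])
        r2 = np.array([h[len(h) // 2:].mean() for h in halves])
        r = np.corrcoef(r1, r2)[0, 1]
        corrs.append(2 * r / (1 + r))  # Spearman-Brown correction
    return float(np.mean(corrs))

print(snr_reliability(providers), split_half_reliability(providers))
```

Rerunning the sketch with smaller panel sizes widens the gap between the two estimates, mirroring the pattern the abstract reports.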
Affiliations
- Kenneth J Nieser
- Center for Innovation to Implementation, VA Palo Alto Health Care System, Menlo Park, California
- Stanford-Surgery Policy Improvement Research and Education Center, Department of Surgery, Stanford University, Stanford, California
- Alex H S Harris
- Center for Innovation to Implementation, VA Palo Alto Health Care System, Menlo Park, California
- Stanford-Surgery Policy Improvement Research and Education Center, Department of Surgery, Stanford University, Stanford, California

2. Hartman N, Shahinian VB, Ashby VB, Price KJ, He K. Limitations of the Inter-Unit Reliability: A Set of Practical Examples. Health Services and Outcomes Research Methodology 2024; 24:156-169. PMID: 39145149. PMCID: PMC11323040. DOI: 10.1007/s10742-023-00307-0.
Abstract
Healthcare quality measures are statistics that serve to evaluate healthcare providers and identify those that need to improve their care. Before using these measures in clinical practice, developers and reviewers assess measure reliability, which describes the degree to which differences in the measure values reflect actual variation in healthcare quality, as opposed to random noise. The Inter-Unit Reliability (IUR) is a popular statistic for assessing reliability, and it describes the proportion of total variation in a measure that is attributable to between-provider variation. However, Kalbfleisch, He, Xia, and Li (2018) [Health Services and Outcomes Research Methodology, 18, 215-225] have argued that the IUR has a severe limitation in that some of the between-provider variation may be unrelated to quality of care. In this paper, we illustrate the practical implications of this limitation through several concrete examples. We show that certain best practices in measure development, such as careful risk adjustment and exclusion of unstable measure values, can decrease the sample IUR value. These findings uncover potential negative consequences of discarding measures with IUR values below some arbitrary threshold.
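For readers unfamiliar with the statistic, here is a minimal method-of-moments sketch of the sample IUR under a normal approximation; the function name and toy data are illustrative, not taken from the paper. The toy example mimics the paper's point that better risk adjustment can lower the sample IUR even as the measure improves.

```python
import numpy as np

def sample_iur(values, sampling_vars):
    """Sample inter-unit reliability: the share of total variation across
    providers that is attributable to between-provider variation."""
    total_var = np.var(values, ddof=1)               # total variation
    within_var = np.mean(sampling_vars)              # average sampling noise
    between_var = max(total_var - within_var, 0.0)   # method-of-moments signal
    return between_var / total_var if total_var > 0 else 0.0

# Toy illustration: risk adjustment removes between-provider variation that
# is driven by patient mix rather than quality, shrinking the IUR.
rng = np.random.default_rng(0)
quality = rng.normal(0, 0.3, size=100)    # true quality signal
casemix = rng.normal(0, 0.8, size=100)    # variation driven by patient mix
noise = rng.normal(0, 0.5, size=100)      # sampling noise
unadjusted = quality + casemix + noise
adjusted = quality + noise                # case-mix variation removed
sampling_vars = np.full(100, 0.25)        # known sampling variance (0.5**2)
print(sample_iur(unadjusted, sampling_vars))  # higher IUR
print(sample_iur(adjusted, sampling_vars))    # lower IUR, better measure
```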
Affiliations
- Nicholas Hartman
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, U.S.A
- Kidney Epidemiology and Cost Center, University of Michigan, Ann Arbor, MI, U.S.A
- Vahakn B. Shahinian
- Kidney Epidemiology and Cost Center, University of Michigan, Ann Arbor, MI, U.S.A
- Division of Nephrology, University of Michigan, Ann Arbor, MI, U.S.A
- Valarie B. Ashby
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, U.S.A
- Kidney Epidemiology and Cost Center, University of Michigan, Ann Arbor, MI, U.S.A
- Katrina J. Price
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, U.S.A
- Kidney Epidemiology and Cost Center, University of Michigan, Ann Arbor, MI, U.S.A
- Kevin He
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, U.S.A
- Kidney Epidemiology and Cost Center, University of Michigan, Ann Arbor, MI, U.S.A

3. Austin PC, Fang J, Yu B, Kapral MK. Examining Hospital Variation on Multiple Indicators of Stroke Quality of Care. Circ Cardiovasc Qual Outcomes 2020; 13:e006968. PMID: 33238729. PMCID: PMC7742217. DOI: 10.1161/circoutcomes.120.006968.
Abstract
BACKGROUND: Provider profiling involves comparing the performance of hospitals on indicators of quality of care. Typically, provider profiling examines the performance of hospitals on each quality indicator in isolation. Consequently, one cannot formally examine whether hospitals that have poor performance on one indicator also have poor performance on a second indicator.
METHODS: We used a Bayesian multivariate response random effects logistic regression model to simultaneously examine variation and covariation in multiple binary indicators across hospitals. We considered 7 binary patient-level indicators of quality of care for patients presenting to hospital with a diagnosis of acute stroke. We examined between-hospital variation in these 7 indicators across 86 hospitals in Ontario, Canada.
RESULTS: The number of patients eligible for each indicator ranged from 1321 to 14,079. There were 7 pairs of indicators for which there was a strong correlation between a hospital's performance on the two indicators. Twenty-nine of the 86 hospitals had a probability higher than 0.90 of having worse performance than average on at least 4 of the 7 indicators. Seven of the 86 hospitals had a probability higher than 0.90 of having worse performance than average on at least 5 indicators. Fourteen of the 86 hospitals had a probability higher than 0.50 of having worse performance than average on at least 6 indicators. No hospital had a probability higher than 0.50 of having worse performance than average on all 7 indicators.
CONCLUSIONS: These findings suggest that a small number of hospitals perform poorly on at least half of the quality indicators and that certain indicators tend to cluster together. The described methods allow quality improvement initiatives to be targeted at these hospitals.
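As a sketch of the kind of model described, with notation that is mine rather than the paper's: let y_{ijk} indicate whether patient i in hospital j met indicator k. The seven hospital-level random effects are coupled through a shared covariance matrix:

```latex
\begin{aligned}
y_{ijk} \mid p_{ijk} &\sim \mathrm{Bernoulli}(p_{ijk}), \\
\operatorname{logit}(p_{ijk}) &= \beta_k + u_{jk}, \qquad k = 1, \ldots, 7, \\
(u_{j1}, \ldots, u_{j7})^{\top} &\sim \mathcal{N}_{7}(\mathbf{0}, \Sigma).
\end{aligned}
```

The off-diagonal entries of \Sigma carry the covariation of interest: a strong positive correlation between u_{jk} and u_{jk'} means hospitals performing poorly on indicator k tend to perform poorly on indicator k' as well, and the posterior distribution of each u_{jk} yields the probabilities of worse-than-average performance reported in the results.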
Affiliations
- Peter C Austin
- ICES, Toronto, ON, Canada
- Institute of Health Policy, Management and Evaluation, University of Toronto, ON, Canada
- Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Jiming Fang
- ICES, Toronto, ON, Canada
- Bing Yu
- ICES, Toronto, ON, Canada
- Moira K Kapral
- ICES, Toronto, ON, Canada
- Institute of Health Policy, Management and Evaluation, University of Toronto, ON, Canada
- Department of Medicine, University of Toronto, ON, Canada

4. Defining and estimating the reliability of physician quality measures in hierarchical logistic regression models. Health Services and Outcomes Research Methodology 2020. DOI: 10.1007/s10742-020-00226-4.

5. He K, Kalbfleisch JD, Yang Y, Fei Z. Inter-unit reliability for nonlinear models. Stat Med 2018; 38:844-854. DOI: 10.1002/sim.8005.
Affiliations
- Kevin He
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Yuan Yang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Zhe Fei
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan

6. Staggs VS, Cramer E. Can Nursing Units With High Fall Rates Be Identified Using One Year of Data? Reliability of Fall Rates as a Function of the Number of Quarters on Which They Are Based. Res Nurs Health 2016; 40:80-87. PMID: 27687008. DOI: 10.1002/nur.21770.
Abstract
Reliability, the extent to which multiple measurements of a target yield similar results, is critical in comparing healthcare provider quality. Hospital unit fall rates are widely tracked and used for benchmarking, but their reliability is not well studied. Our purpose was twofold: to estimate fall rate reliability, both in terms of signal (between-unit variability) relative to noise (within-unit variability) and in terms of the accuracy with which units can be classified as high-fall-rate units; and to assess reliability as a function of the number of quarters of data used to compute fall rates. Using year 2013 data from 11,765 critical care, step-down, medical, surgical, medical-surgical, and rehabilitation units in 1,552 US hospitals, we identified high-fall-rate units, computed units' signal-noise reliability, and simulated data to assess the accuracy of high-fall-rate unit classification as a function of quarters of data. When critical care units were excluded, median unit-type signal-noise reliabilities for annual total and injurious fall rates ranged from .74 to .82 and from .53 to .68, respectively. In simulation, seven quarters of data were sufficient to achieve top-decile misclassification rates at or below 10% for all unit types except critical care. Top-quartile misclassification rates were higher; even 12 quarters of data did not consistently yield top-quartile misclassification rates below 10%. In the absence of long-term data, and for units with low patient volume and unit types with very low fall rates, comparison with a unit's own historical data may be more helpful for quality monitoring than attempting to rank it among its peers.
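A minimal sketch of the kind of classification simulation described; the gamma-Poisson data-generating process and all parameter values here are assumptions for illustration, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: 2,000 units whose true fall rates average about
# 3 falls per 1,000 patient-days, with 5,000 patient-days per quarter.
n_units, days_per_quarter = 2000, 5000
true_rates = rng.gamma(shape=9.0, scale=1 / 3, size=n_units) / 1000

def top_decile_misclassification(n_quarters):
    """Share of truly top-decile (highest-rate) units that are missed when
    units are ranked on fall rates observed over n_quarters of data."""
    exposure = n_quarters * days_per_quarter
    observed_rates = rng.poisson(true_rates * exposure) / exposure
    cutoff = int(0.9 * n_units)
    true_top = set(np.argsort(true_rates)[cutoff:])
    observed_top = set(np.argsort(observed_rates)[cutoff:])
    return 1 - len(true_top & observed_top) / len(true_top)

for quarters in (1, 4, 7, 12):
    print(quarters, round(top_decile_misclassification(quarters), 3))
```

Misclassification falls as quarters accumulate; how many quarters are enough depends on a unit type's volume and underlying rate, which is the dependence the paper quantifies.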
Affiliations
- Vincent S Staggs
- Research Faculty, Health Services and Outcomes Research, Children's Mercy Hospitals and Clinics, Associate Professor, School of Medicine, University of Missouri-Kansas City, Kansas City, MO 64108
- Emily Cramer
- Research Assistant Professor, School of Nursing, University of Kansas Medical Center, Kansas City, KS

7. Staggs VS. Deviations in Monthly Staffing and Injurious Assaults Against Staff and Patients on Psychiatric Units. Res Nurs Health 2016; 39:347-352. PMID: 27304990. DOI: 10.1002/nur.21735.
Abstract
It is widely thought that low staffing levels are associated with higher risk of psychiatric inpatient violence. The purpose of this study was to determine whether the odds of an injurious assault are higher in months in which unit staffing levels are higher or lower relative to the unit's average, using a design that allows each unit to serve as its own control. Using 2011-2013 National Database of Nursing Quality Indicators data from 480 adult and 90 geriatric units in 361 US hospitals, monthly assault odds were modeled as functions of unit staffing. Monthly RN and non-RN staffing (hours per patient day) were categorized as very low, low, average, high, or very high, based on deviation from the unit's average staffing across study months. Endpoints were binary indicators for one or more injurious assaults against staff during the month and for one or more injurious assaults against patients during the month. Despite large sample sizes, neither RN nor non-RN staffing was a statistically significant predictor of the odds of assault, nor was there a consistent trend of assault odds being higher at below- or above-average staffing levels. There was little evidence that monthly deviation in unit staffing is associated with the odds of an injurious assault on a unit. This suggests that staffing-assault associations in previous studies of monthly data are largely attributable to between-unit rather than within-unit staffing differences. Hospitals may need to look beyond below- or above-average nurse staffing as a cause of assaults.
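A hypothetical sketch of the staffing categorization step; the z-score cutoffs (±0.5 and ±1.5 standard deviations within each unit) are my assumption, since the abstract does not give the actual category definitions:

```python
import numpy as np
import pandas as pd

def categorize_staffing(df):
    """Label each unit-month's staffing relative to that unit's own average,
    so each unit serves as its own control."""
    z = df.groupby("unit_id")["hours_per_patient_day"].transform(
        lambda x: (x - x.mean()) / x.std(ddof=1)
    )
    bins = [-np.inf, -1.5, -0.5, 0.5, 1.5, np.inf]
    labels = ["very low", "low", "average", "high", "very high"]
    return pd.cut(z, bins=bins, labels=labels)

# Toy usage: six months of staffing for one unit
df = pd.DataFrame({"unit_id": [1] * 6,
                   "hours_per_patient_day": [5.0, 5.2, 4.8, 6.5, 3.6, 5.1]})
df["staffing_level"] = categorize_staffing(df)
```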
Affiliations
- Vincent S Staggs
- Senior Biostatistician, Health Services and Outcomes Research, Children's Mercy Hospitals and Clinics.
- Assistant Professor, School of Medicine, University of Missouri-Kansas City, 2401 Gillham Road, Kansas City, MO, 64108.

8. Staggs VS, Cramer E. Reliability of Pressure Ulcer Rates: How Precisely Can We Differentiate Among Hospital Units, and Does the Standard Signal-Noise Reliability Measure Reflect This Precision? Res Nurs Health 2016; 39:298-305. PMID: 27223598. PMCID: PMC5089619. DOI: 10.1002/nur.21727.
Abstract
Hospital performance reports often include rankings of unit pressure ulcer rates. Differentiating among units on the basis of quality requires reliable measurement. Our objectives were to describe and apply methods for assessing the reliability of hospital-acquired pressure ulcer rates and to evaluate a standard signal-noise reliability measure as an indicator of the precision of differentiation among units. Quarterly pressure ulcer data from 8,199 critical care, step-down, medical, surgical, and medical-surgical nursing units from 1,299 US hospitals were analyzed. Using beta-binomial models, we estimated between-unit variability (signal) and within-unit variability (noise) in annual unit pressure ulcer rates. Signal-noise reliability was computed as the ratio of between-unit variability to the total of between- and within-unit variability. To assess the precision of differentiation among units based on ranked pressure ulcer rates, we simulated data to estimate the probabilities of a unit's observed pressure ulcer rate rank in a given sample falling within five and ten percentiles of its true rank, and the probabilities of units with ulcer rates in the highest quartile and highest decile being identified as such. We assessed the signal-noise measure as an indicator of differentiation precision by computing its correlations with these probabilities. Pressure ulcer rates based on a single year of quarterly or weekly prevalence surveys were too susceptible to noise to allow for precise differentiation among units, and signal-noise reliability was a poor indicator of the precision of differentiation. To ensure precise differentiation on the basis of true differences, alternative methods of assessing reliability should be applied to measures purported to differentiate among providers or units based on quality.
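A minimal sketch of the signal-noise computation under a beta-binomial model: with unit-level true rates p ~ Beta(a, b) and n patients surveyed, the ratio of between-unit variance to total variance reduces to n / (n + a + b), since Var(p) = ab / ((a+b)^2 (a+b+1)) and the average sampling variance is E[p(1-p)]/n = ab / (n (a+b)(a+b+1)). The fitting routine and toy data below are illustrative (SciPy's betabinom, not necessarily the authors' software):

```python
import numpy as np
from scipy import optimize, stats

def fit_beta_binomial(events, trials):
    """Fit Beta(a, b) parameters to unit-level counts by maximum likelihood."""
    def neg_loglik(log_params):
        a, b = np.exp(log_params)  # keep a, b positive
        return -np.sum(stats.betabinom.logpmf(events, trials, a, b))
    res = optimize.minimize(neg_loglik, x0=[0.0, 3.0], method="Nelder-Mead")
    return np.exp(res.x)

def signal_noise_reliability(n, a, b):
    """Between-unit variance over total variance for a unit with n patients."""
    return n / (n + a + b)

# Toy data: 30 units with true pressure ulcer rates drawn from Beta(2, 60)
rng = np.random.default_rng(1)
trials = rng.integers(100, 300, size=30)
events = rng.binomial(trials, rng.beta(2, 60, size=30))
a, b = fit_beta_binomial(events, trials)
print(signal_noise_reliability(trials, a, b))  # one reliability per unit
```

The paper's caution applies even to this sketch: a high average value of this ratio does not by itself guarantee that observed ranks track true ranks, which is why the authors check precision by simulation.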
Affiliations
- Vincent S Staggs
- Health Services and Outcomes Research, Children's Mercy Hospitals and Clinics, School of Medicine, University of Missouri-Kansas City, 2401 Gillham Road, Kansas City, MO, 64108
- Emily Cramer
- School of Nursing, University of Kansas Medical Center, Kansas City, KS