1. Hengelbrock J, Rauh J, Cederbaum J, Kähler M, Höhle M. Hospital profiling using Bayesian decision theory. Biometrics 2023; 79:2757-2769. PMID: 36401573. DOI: 10.1111/biom.13798.
Abstract
For evaluating the quality of care provided by hospitals, special interest lies in the identification of performance outliers. The classification of healthcare providers as outliers or non-outliers is a decision under uncertainty, because the true quality is unknown and can only be inferred from an observed result of a quality indicator. We propose to embed the classification of healthcare providers into a Bayesian decision-theoretic framework that enables the derivation of optimal decision rules with respect to the expected decision consequences. We propose paradigmatic utility functions for two typical purposes of hospital profiling: the external reporting of healthcare quality and the initiation of change in care delivery. We make use of funnel plots to illustrate and compare the resulting optimal decision rules and argue that sensitivity and specificity of the resulting decision rules should be analyzed. We then apply the proposed methodology to the area of hip replacement surgeries by analyzing data from 1,277 hospitals in Germany that performed over 180,000 such procedures in 2017. Our setting illustrates that the classification of outliers can be highly dependent upon the underlying utilities. We conclude that analyzing the classification of hospitals as a decision-theoretic problem helps to derive transparent and justifiable decision rules. The methodology for classifying quality indicator results is implemented in an R package (iqtigbdt) and is available on GitHub.
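The paper's core move, choosing the classification that maximizes expected utility under the posterior for a provider's true rate, can be sketched roughly as follows (a minimal Python illustration, not the iqtigbdt implementation; the utility values and the Beta posterior are hypothetical):

```python
import random

random.seed(1)

def classify_outlier(posterior_draws, threshold,
                     u_tp=1.0, u_fp=-4.0, u_fn=-1.0, u_tn=0.0):
    """Flag a provider as a performance outlier iff the expected utility of
    flagging exceeds that of not flagging, given posterior draws of its
    true event rate. Utilities are illustrative: u_fp penalizes wrongly
    flagging a good provider, u_fn penalizes missing a true outlier."""
    # Posterior probability that the true rate exceeds the outlier threshold
    q = sum(d > threshold for d in posterior_draws) / len(posterior_draws)
    eu_flag = q * u_tp + (1 - q) * u_fp
    eu_no_flag = q * u_fn + (1 - q) * u_tn
    return eu_flag > eu_no_flag, q

# Hypothetical hospital: Beta(12, 88) posterior for its complication rate
draws = [random.betavariate(12, 88) for _ in range(10_000)]
flag, q = classify_outlier(draws, threshold=0.10)
```

With these utilities the rule reduces to flagging whenever the posterior outlier probability exceeds (u_tn - u_fp) / ((u_tp - u_fn) + (u_tn - u_fp)) = 2/3, which makes explicit how the classification depends on the chosen utilities.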
Affiliation(s)
- Johannes Hengelbrock: Federal Institute for Quality Assurance and Transparency in Healthcare, Berlin, Germany
- Johannes Rauh: Federal Institute for Quality Assurance and Transparency in Healthcare, Berlin, Germany
- Jona Cederbaum: Federal Institute for Quality Assurance and Transparency in Healthcare, Berlin, Germany
- Maximilian Kähler: Federal Institute for Quality Assurance and Transparency in Healthcare, Berlin, Germany
- Michael Höhle: Federal Institute for Quality Assurance and Transparency in Healthcare, Berlin, Germany; Department of Mathematics, Stockholm University, Stockholm, Sweden
2. Austin PC, Lee DS, Leckie G. Comparing a multivariate response Bayesian random effects logistic regression model with a latent variable item response theory model for provider profiling on multiple binary indicators simultaneously. Stat Med 2020; 39:1390-1406. PMID: 32043653. PMCID: PMC7187268. DOI: 10.1002/sim.8484.
Abstract
Provider profiling entails comparing the performance of hospitals on indicators of quality of care. Many common indicators of healthcare quality are binary (eg, short‐term mortality, use of appropriate medications). Typically, provider profiling examines the variation in each indicator in isolation across hospitals. We developed Bayesian multivariate response random effects logistic regression models that allow one to simultaneously examine variation and covariation in multiple binary indicators across hospitals. Use of this model allows for (i) determining the probability that a hospital has poor performance on a single indicator; (ii) determining the probability that a hospital has poor performance on multiple indicators simultaneously; (iii) determining, by using the Mahalanobis distance, how far the performance of a given hospital is from that of an average hospital. We illustrate the utility of the method by applying it to 10 881 patients hospitalized with acute myocardial infarction at 102 hospitals. We considered six binary patient‐level indicators of quality of care: use of reperfusion, assessment of left ventricular ejection fraction, measurement of cardiac troponins, use of acetylsalicylic acid within 6 hours of hospital arrival, use of beta‐blockers within 12 hours of hospital arrival, and survival to 30 days after hospital admission. When considering the five measures evaluating processes of care, we found that there was a strong correlation between a hospital's performance on one indicator and its performance on a second indicator for five of the 10 possible comparisons. We compared inferences made using this approach with those obtained using a latent variable item response theory model.
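The third use the authors list, measuring how far a hospital sits from the average hospital, rests on the Mahalanobis distance of the hospital's vector of random effects under the estimated between-hospital covariance. A small self-contained sketch for the two-indicator case (the effect and covariance values below are made up for illustration):

```python
def mahalanobis2(effect, cov):
    """Mahalanobis distance of a hospital's 2-dimensional random-effect
    vector from the average hospital (the origin), given the 2x2
    between-hospital covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # invert the 2x2 covariance by hand
    inv = ((d / det, -b / det), (-c / det, a / det))
    x, y = effect
    quad = x * (inv[0][0] * x + inv[0][1] * y) + y * (inv[1][0] * x + inv[1][1] * y)
    return quad ** 0.5

# Hospital above average on one indicator, below average on the other
d = mahalanobis2((0.5, -0.3), ((0.25, 0.10), (0.10, 0.16)))
```

With the identity covariance this reduces to the ordinary Euclidean distance; the covariance term is what lets correlated indicators share information.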
Affiliation(s)
- Peter C Austin: ICES, Toronto, Canada; Institute of Health Management, Policy and Evaluation, University of Toronto, Toronto, Canada; Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, Canada
- Douglas S Lee: ICES, Toronto, Canada; Institute of Health Management, Policy and Evaluation, University of Toronto, Toronto, Canada; Department of Medicine, University of Toronto, Toronto, Canada; Peter Munk Cardiac Centre and Joint Department of Medical Imaging, University Health Network, Toronto, Canada
- George Leckie: Centre for Multilevel Modeling, University of Bristol, Bristol, UK
3. de la Guardia FH, Hwang J, Adams JL, Paddock SM. Loss function-based evaluation of physician report cards. Health Services and Outcomes Research Methodology 2018. DOI: 10.1007/s10742-018-0179-2.
4. Adams M, Braun J, Bucher HU, Puhan MA, Bassler D, Von Wyl V. Comparison of three different methods for risk adjustment in neonatal medicine. BMC Pediatr 2017; 17:106. PMID: 28415984. PMCID: PMC5392992. DOI: 10.1186/s12887-017-0861-5.
Abstract
BACKGROUND Quality improvement in health care requires identification of areas in need of improvement by comparing processes and patient outcomes within and between health care providers. It is critical to adjust for different case-mix and outcome risks of patient populations but it is currently unclear which approach has higher validity and how limitations need to be dealt with. Our aim was to compare 3 approaches towards risk adjustment for 7 different major quality indicators in neonatal intensive care (21 models). METHODS We compared an indirect standardization, logistic regression and multilevel approach. Parameters for risk adjustment were chosen according to literature and the condition that they may not depend on processes performed by treating clinics. Predictive validity was tested using the mean Brier Score and by comparing area under curve (AUC) using high quality population based data separated into training and validation sets. Changes in attributional validity were analysed by comparing the effect of the models on the observed-to-expected ratios of the clinics in standardized mortality/morbidity ratio charts. RESULTS Risk adjustment based on indirect standardization revealed inferior c-statistics but superior Brier scores for 3 of 7 outcomes. Logistic regression and multilevel modelling were equivalent to one another. C-statistics revealed that predictive validity was high for 8 and acceptable for 11 of the 21 models. Yet, the effect of all forms of risk adjustment on any clinic's comparison with the standard was small, even though there was clear risk heterogeneity between clinics. CONCLUSIONS All three approaches to risk adjustment revealed comparable results. The limited effect of risk adjustment on clinic comparisons indicates a small case-mix influence on observed outcomes, but also a limited ability to isolate quality improvement potential based on risk-adjustment models. 
Rather than relying on methodological approaches, we recommend that clinics build small collaboratives and compare their indicators in both risk-adjusted and unadjusted form. This allows the residual risk differences within networks to be investigated and discussed qualitatively. The predictive validity should be quantified and reported, and stratification into risk groups should be more widely used to correct for confounding.
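The two predictive-validity measures used in this comparison can be computed without any modeling machinery; a plain-Python sketch (illustrative, not the authors' code):

```python
def brier(probs, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome;
    lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def auc(probs, outcomes):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a randomly chosen case is ranked above a randomly
    chosen non-case (ties count one half)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# As in the paper, these should be evaluated on a held-out validation set,
# never on the data used to fit the risk-adjustment model.
probs, outcomes = [0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]
```

The two measures can disagree, as the abstract reports: the AUC only depends on the ranking of predictions, while the Brier score also rewards calibration.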
Affiliation(s)
- Mark Adams: Division of Neonatology, University Hospital Zurich and University of Zurich, Wagistrasse 14, 8952 Schlieren, Switzerland; Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Julia Braun: Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Hans Ulrich Bucher: Division of Neonatology, University Hospital Zurich and University of Zurich, Wagistrasse 14, 8952 Schlieren, Switzerland
- Milo Alan Puhan: Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Dirk Bassler: Division of Neonatology, University Hospital Zurich and University of Zurich, Wagistrasse 14, 8952 Schlieren, Switzerland
- Viktor Von Wyl: Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
5. Hubbard RA, Benjamin-Johnson R, Onega T, Smith-Bindman R, Zhu W, Fenton JJ. Classification accuracy of claims-based methods for identifying providers failing to meet performance targets. Stat Med 2014; 34:93-105. PMID: 25302935. DOI: 10.1002/sim.6318.
Abstract
Quality assessment is critical for healthcare reform, but data sources are lacking for measurement of many important healthcare outcomes. With over 49 million people covered by Medicare as of 2010, Medicare claims data offer a potentially valuable source that could be used in targeted health care quality improvement efforts. However, little is known about the operating characteristics of provider profiling methods using claims-based outcome measures that may estimate provider performance with error. Motivated by the example of screening mammography performance, we compared approaches to identifying providers failing to meet guideline targets using Medicare claims data. We used data from the Breast Cancer Surveillance Consortium and linked Medicare claims to compare claims-based and clinical estimates of cancer detection rate. We then demonstrated the performance of claims-based estimates across a broad range of operating characteristics using simulation studies. We found that identification of poor-performing providers was extremely sensitive to algorithm specificity, with no approach identifying more than 65% of poor-performing providers when claims-based measures had specificity of 0.995 or less. We conclude that claims have the potential to contribute important information on healthcare outcomes to quality improvement efforts. However, to achieve this potential, development of highly accurate claims-based outcome measures should remain a priority.
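The mechanism behind that sensitivity can be mimicked with a toy simulation: each true outcome event is captured in claims with some sensitivity, while each non-event generates a spurious claims event with probability one minus specificity, so even a specificity of 0.995 adds a non-trivial false-event rate when the true event rate is low. (The parameters below are illustrative, not the paper's simulation design.)

```python
import random

random.seed(0)

def observed_rate(true_rate, n, sens, spec):
    """Event rate as measured from claims: true events are captured with
    probability `sens`; non-events generate false claims events with
    probability 1 - spec."""
    events = sum(random.random() < true_rate for _ in range(n))
    seen = sum(random.random() < sens for _ in range(events))
    seen += sum(random.random() < 1 - spec for _ in range(n - events))
    return seen / n

# With a low true rate (here 0.4%), a specificity of 0.995 alone contributes
# roughly 0.5% spurious events, on the same order as the signal itself.
r = observed_rate(0.004, 50_000, 0.85, 0.995)
```

This is why claims-based estimates of a rare outcome like cancer detection can drift far from the clinical truth even when the measure looks highly specific.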
Affiliation(s)
- Rebecca A Hubbard: Group Health Research Institute, Seattle, WA, USA; Department of Biostatistics, University of Washington, Seattle, WA, USA
6. He Y, Selck F, Normand SLT. On the accuracy of classifying hospitals on their performance measures. Stat Med 2014; 33:1081-1103. PMID: 24122879. PMCID: PMC6400472. DOI: 10.1002/sim.6012.
Abstract
The evaluation, comparison, and public report of health care provider performance is essential to improving the quality of health care. Hospitals, as one type of provider, are often classified into quality tiers (e.g., top or suboptimal) based on their performance data for various purposes. However, potential misclassification might lead to detrimental effects for both consumers and payers. Although such risk has been highlighted by applied health services researchers, a systematic investigation of statistical approaches has been lacking. We assess and compare the expected accuracy of several commonly used classification methods: unadjusted hospital-level averages, shrinkage estimators under a random-effects model accommodating between-hospital variation, and two others based on posterior probabilities. Assuming that performance data follow a classic one-way random-effects model with unequal sample size per hospital, we derive accuracy formulae for these classification approaches and gain insight into how the misclassification might be affected by various factors such as reliability of the data, hospital-level sample size distribution, and cutoff values between quality tiers. The case of binary performance data is also explored using Monte Carlo simulation strategies. We apply the methods to real data and discuss the practical implications.
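Of the estimators compared, the shrinkage estimator under the one-way random-effects model has the simplest closed form; a sketch with illustrative variance components (the numbers are invented, not the paper's):

```python
def shrink(hosp_mean, n, grand_mean, var_between, var_within):
    """Empirical-Bayes (shrinkage) estimate of a hospital's true mean under
    a one-way random-effects model. The weight on the hospital's own
    average is its reliability, which grows with sample size n, so small
    hospitals are pulled more strongly toward the grand mean."""
    reliability = var_between / (var_between + var_within / n)
    return reliability * hosp_mean + (1 - reliability) * grand_mean

# Two hospitals with the same raw average but very different volumes
small = shrink(0.20, n=10, grand_mean=0.10, var_between=0.01, var_within=1.0)
large = shrink(0.20, n=1000, grand_mean=0.10, var_between=0.01, var_within=1.0)
```

The small hospital's estimate ends up much closer to the grand mean than the large hospital's, which is one reason classification into quality tiers can flip depending on whether raw averages or shrunken estimates are used, and why the hospital-level sample size distribution matters for misclassification.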
Affiliation(s)
- Yulei He: Office of Research and Methodology, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782, USA
7. Austin PC, Reeves MJ. Effect of provider volume on the accuracy of hospital report cards. Circ Cardiovasc Qual Outcomes 2014; 7:299-305. DOI: 10.1161/circoutcomes.113.000685.
Affiliation(s)
- Peter C. Austin: Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada; Institute of Health Management, Policy and Evaluation, University of Toronto; Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, Canada
- Mathew J. Reeves: Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI
8. Katzan IL, Spertus J, Bettger JP, Bravata DM, Reeves MJ, Smith EE, Bushnell C, Higashida RT, Hinchey JA, Holloway RG, Howard G, King RB, Krumholz HM, Lutz BJ, Yeh RW. Risk adjustment of ischemic stroke outcomes for comparing hospital performance: a statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke 2014; 45:918-944. PMID: 24457296. DOI: 10.1161/01.str.0000441948.35804.77.
Abstract
BACKGROUND AND PURPOSE Stroke is the fourth-leading cause of death and a leading cause of long-term major disability in the United States. Measuring outcomes after stroke has important policy implications. The primary goals of this consensus statement are to (1) review statistical considerations when evaluating models that define hospital performance in providing stroke care; (2) discuss the benefits, limitations, and potential unintended consequences of using various outcome measures when evaluating the quality of ischemic stroke care at the hospital level; (3) summarize the evidence on the role of specific clinical and administrative variables, including patient preferences, in risk-adjusted models of ischemic stroke outcomes; (4) provide recommendations on the minimum list of variables that should be included in risk adjustment of ischemic stroke outcomes for comparisons of quality at the hospital level; and (5) provide recommendations for further research. METHODS AND RESULTS This statement gives an overview of statistical considerations for the evaluation of hospital-level outcomes after stroke and provides a systematic review of the literature for the following outcome measures for ischemic stroke at 30 days: functional outcomes, mortality, and readmissions. Data on outcomes after stroke have primarily involved studies conducted at an individual patient level rather than a hospital level. On the basis of the available information, the following factors should be included in all hospital-level risk-adjustment models: age, sex, stroke severity, comorbid conditions, and vascular risk factors. Because stroke severity is the most important prognostic factor for individual patients and appears to be a significant predictor of hospital-level performance for 30-day mortality, inclusion of a stroke severity measure in risk-adjustment models for 30-day outcome measures is recommended. 
Risk-adjustment models that do not include stroke severity or other recommended variables must provide classification of hospital performance comparable to that of models that include these variables. Stroke severity and other variables that are included in risk-adjustment models should be standardized across sites, so that their reliability and accuracy are equivalent. There is a pressing need for research in multiple areas to better identify methods and metrics to evaluate outcomes of stroke care. CONCLUSIONS There are a number of important methodological challenges in undertaking risk-adjusted outcome comparisons to assess the quality of stroke care in different hospitals. It is important for stakeholders to recognize these challenges and for there to be a concerted approach to improving the methods for quality assessment and improvement.
9. The relationship between the C-statistic of a risk-adjustment model and the accuracy of hospital report cards: a Monte Carlo study. Med Care 2013; 51:275-284. PMID: 23295579. DOI: 10.1097/mlr.0b013e31827ff0dc.
Abstract
BACKGROUND Hospital report cards, in which outcomes following the provision of medical or surgical care are compared across health care providers, are being published with increasing frequency. Essential to the production of these reports is risk-adjustment, which allows investigators to account for differences in the distribution of patient illness severity across different hospitals. Logistic regression models are frequently used for risk adjustment in hospital report cards. Many applied researchers use the c-statistic (equivalent to the area under the receiver operating characteristic curve) of the logistic regression model as a measure of the credibility and accuracy of hospital report cards. OBJECTIVES To determine the relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards. RESEARCH DESIGN Monte Carlo simulations were used to examine this issue. We examined the influence of 3 factors on the accuracy of hospital report cards: the c-statistic of the logistic regression model used for risk adjustment, the number of hospitals, and the number of patients treated at each hospital. The parameters used to generate the simulated datasets came from analyses of patients hospitalized with a diagnosis of acute myocardial infarction in Ontario, Canada. RESULTS The c-statistic of the risk-adjustment model had, at most, a very modest impact on the accuracy of hospital report cards, whereas the number of patients treated at each hospital had a much greater impact. CONCLUSIONS The c-statistic of a risk-adjustment model should not be used to assess the accuracy of a hospital report card.
10. Variation in New Zealand hospital outcomes: combining hierarchical Bayesian modeling and propensity score methods for hospital performance comparisons. Health Services and Outcomes Research Methodology 2012. DOI: 10.1007/s10742-012-0079-9.
11. Austin PC. Are (the log-odds of) hospital mortality rates normally distributed? Implications for studying variations in outcomes of medical care. J Eval Clin Pract 2009; 15:514-523. PMID: 19522906. DOI: 10.1111/j.1365-2753.2008.01053.x.
Abstract
RATIONALE Hierarchical regression models are increasingly being used to examine variations in outcomes following the provision of medical care across providers. These models frequently assume a normal distribution for the provider-specific random effects. The appropriateness of this assumption for examining variations in health care outcomes has never been explicitly tested. AIMS AND OBJECTIVES To compare hierarchical logistic regression models in which the provider-specific random effects followed either a normal distribution or a mixture of three normal distributions. METHODS We used data on 18,825 patients admitted to 109 hospitals in Ontario with a diagnosis of acute myocardial infarction. We used the Deviance Information Criterion, Bayes factors and predictive distributions to compare the evidence between the two competing models. RESULTS There was strong evidence that the distribution of hospital-specific log-odds of mortality was a mixture of three normal distributions compared to the evidence that it was normal. In some scenarios, the hospital-specific posterior tail probabilities of unacceptably high mortality were lower when a logistic-normal model was fit compared to when a logistic-mixture of normal distributions model was fit. Additionally, in these same scenarios, fewer hospitals were classified as having higher than acceptable mortality when the logistic-mixture of three normal distributions was used. CONCLUSIONS These findings have important consequences for those who use hierarchical models to examine variations in outcomes of medical care across providers, since the mixture of three normal distributions model indicated that variations in outcomes across providers were greater than indicated by the logistic-normal model.
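The competing random-effects assumption is easy to state generatively: each hospital's log-odds of mortality is drawn from a three-component mixture of normals instead of a single normal. A sketch (the component weights, means, and spreads below are invented, not the fitted Ontario values):

```python
import random

random.seed(2)

def sample_logodds(weights, means, sds):
    """Draw one hospital-specific log-odds of mortality from a mixture of
    normal distributions; weights must sum to 1."""
    u, acc = random.random(), 0.0
    for w, m, s in zip(weights, means, sds):
        acc += w
        if u <= acc:
            return random.gauss(m, s)
    return random.gauss(means[-1], sds[-1])  # guard against float rounding

# Bulk of hospitals around one level, plus smaller low- and high-mortality groups
hospitals = [sample_logodds((0.15, 0.7, 0.15), (-2.6, -2.2, -1.8), (0.1, 0.1, 0.1))
             for _ in range(109)]
```

Under such a mixture the spread of provider effects is wider than a single normal with the same bulk would suggest, which is the paper's point about understating variation across providers.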
Affiliation(s)
- Peter C Austin: Institute for Clinical Evaluative Sciences, Department of Public Health Sciences, University of Toronto, Toronto, ON, Canada
12. Gajewski BJ, Mahnken JD, Dunton N. Improving quality indicator report cards through Bayesian modeling. BMC Med Res Methodol 2008; 8:77. PMID: 19017399. PMCID: PMC2596790. DOI: 10.1186/1471-2288-8-77.
Abstract
Background The National Database for Nursing Quality Indicators® (NDNQI®) was established in 1998 to assist hospitals in monitoring indicators of nursing quality (e.g., falls and pressure ulcers). Hospitals participating in NDNQI transmit data from nursing units to an NDNQI data repository. Data are summarized and published in reports that allow participating facilities to compare the results for their units with those from other units across the nation. A disadvantage of this reporting scheme is that the sampling variability is not explicit. For example, suppose a small nursing unit has 2 of 10 patients (a rate of 20%) with pressure ulcers. Should the nursing unit immediately undertake a quality improvement plan because of the rate difference from the national average (7%)? Methods In this paper, we propose approximating 95% credible intervals (CrIs) for unit-level data using statistical models that account for the variability in unit rates for report cards. Results Bayesian CrIs communicate the level of uncertainty of estimates more clearly to decision makers than significance tests. Conclusion A benefit of this approach is that nursing units would be better able to distinguish problematic or beneficial trends from fluctuations likely due to chance.
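The 2-of-10 example can be reproduced with a conjugate Beta posterior; a rough Monte Carlo sketch (the uniform prior is our simplifying assumption, not necessarily the models used in the paper):

```python
import random

random.seed(3)

def beta_cri(events, n, level=0.95, draws=100_000):
    """Equal-tailed credible interval for a unit's true rate from the
    Beta(events + 1, n - events + 1) posterior under a uniform prior,
    approximated by Monte Carlo rather than a closed-form quantile."""
    sample = sorted(random.betavariate(events + 1, n - events + 1)
                    for _ in range(draws))
    lo = sample[int((1 - level) / 2 * draws)]
    hi = sample[int((1 + level) / 2 * draws) - 1]
    return lo, hi

lo, hi = beta_cri(2, 10)  # the unit with 2 pressure ulcers in 10 patients
```

The interval is wide (roughly 6% to 52%) and contains the 7% national average, which is exactly the abstract's point: the observed 20% by itself does not justify immediately triggering a quality improvement plan.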
Affiliation(s)
- Byron J Gajewski: Department of Biostatistics, School of Medicine, University of Kansas Medical Center, Kansas City, KS, USA
13. Austin PC. Bayes rules for optimally using Bayesian hierarchical regression models in provider profiling to identify high-mortality hospitals. BMC Med Res Methodol 2008; 8:30. PMID: 18474094. PMCID: PMC2415179. DOI: 10.1186/1471-2288-8-30.
Abstract
Background There is a growing trend towards the production of "hospital report-cards" in which hospitals with higher than acceptable mortality rates are identified. Several commentators have advocated for the use of Bayesian hierarchical models in provider profiling. Several researchers have shown that some degree of misclassification will result when hospital report cards are produced. The impact of misclassifying hospital performance can be quantified using different loss functions. Methods We propose several families of loss functions for hospital report cards and then develop Bayes rules for these families of loss functions. The resultant Bayes rules minimize the expected loss arising from misclassifying hospital performance. We develop Bayes rules for generalized 1-0 loss functions, generalized absolute error loss functions, and for generalized squared error loss functions. We then illustrate the application of these decision rules on a sample of 19,757 patients hospitalized with an acute myocardial infarction at 163 hospitals. Results We found that the number of hospitals classified as having higher than acceptable mortality is affected by the relative penalty assigned to false negatives compared to false positives. However, the choice of loss function family had a lesser impact upon which hospitals were identified as having higher than acceptable mortality. Conclusion The design of hospital report cards can be placed in a decision-theoretic framework. This allows researchers to minimize costs arising from the misclassification of hospitals. The choice of loss function can affect the classification of a small number of hospitals.
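The central finding, that the relative false-negative versus false-positive penalty drives how many hospitals are flagged, is visible even in a toy version of the generalized 1-0 rule (the posterior probabilities below are invented for illustration):

```python
def n_flagged(post_probs, cost_fp, cost_fn):
    """Number of hospitals flagged as high-mortality under a 1-0 loss: the
    Bayes rule flags a hospital iff its posterior probability of
    unacceptably high mortality exceeds cost_fp / (cost_fp + cost_fn)."""
    cutoff = cost_fp / (cost_fp + cost_fn)
    return sum(p > cutoff for p in post_probs)

# Illustrative posterior probabilities of unacceptably high mortality
probs = [0.10, 0.35, 0.55, 0.80, 0.95]

equal_costs = n_flagged(probs, cost_fp=1.0, cost_fn=1.0)  # cutoff 0.5
fn_costly = n_flagged(probs, cost_fp=1.0, cost_fn=4.0)    # cutoff 0.2
```

Raising the penalty on false negatives lowers the cutoff from 0.5 to 0.2 and flags one additional hospital here, mirroring the paper's observation that the relative penalties, more than the loss-function family, determine the classification.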
Affiliation(s)
- Peter C Austin: Institute for Clinical Evaluative Sciences, Toronto, Ontario