1
Kim C, Tec M, Zigler C. Bayesian nonparametric adjustment of confounding. Biometrics 2023; 79:3252-3265. [PMID: 36718599 DOI: 10.1111/biom.13833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 01/19/2023] [Indexed: 02/01/2023]
Abstract
Analysis of observational studies increasingly confronts the challenge of determining which of a possibly high-dimensional set of available covariates are required to satisfy the assumption of ignorable treatment assignment for estimation of causal effects. We propose a Bayesian nonparametric approach that simultaneously (1) prioritizes inclusion of adjustment variables in accordance with existing principles of confounder selection; (2) estimates causal effects in a manner that permits complex relationships among confounders, exposures, and outcomes; and (3) provides causal estimates that account for uncertainty in the nature of confounding. The proposal relies on specifying multiple Bayesian additive regression tree (BART) models, linked by a common prior distribution that accrues posterior selection probability to covariates on the basis of their association with both the exposure and the outcome of interest. Extensive simulation studies demonstrate that the proposed method performs well relative to similarly motivated methods in a variety of scenarios. We deploy the method to investigate the causal effect of emissions from coal-fired power plants on ambient air pollution concentrations, where the prospect of confounding by local and regional meteorological factors introduces uncertainty about the confounding role of a high-dimensional set of measured variables. Ultimately, we show that the proposed method produces more efficient and more consistent results across adjacent years than alternative methods, lending strength to the evidence of a causal relationship between SO2 emissions and ambient particulate pollution.
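A minimal sketch of how a shared prior can favor covariates that matter to both models. It assumes (our assumption, not necessarily the authors' exact specification) a DART-style sparse Dirichlet prior on splitting probabilities, s ~ Dirichlet(α/p, ..., α/p), shared by an exposure tree ensemble and an outcome tree ensemble. Because the prior is common, its conjugate update pools the split counts from both ensembles, so a covariate used by both models gains more posterior splitting probability than one used by only one. The function name and the split counts are hypothetical.

```python
def shared_split_probabilities(counts_exposure, counts_outcome, alpha=1.0):
    """Posterior mean of splitting probabilities shared by two tree ensembles.

    Hypothetical sketch (assumed DART-style prior): s ~ Dirichlet(alpha/p, ..., alpha/p)
    is one prior shared by an exposure model and an outcome model, so the
    conjugate update adds the split counts from both models before normalizing.
    """
    p = len(counts_exposure)
    if p == 0 or len(counts_outcome) != p:
        raise ValueError("both models must split on the same p covariates")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Covariates used by both models accumulate split counts from both.
    pooled = [m_x + m_y for m_x, m_y in zip(counts_exposure, counts_outcome)]
    # Dirichlet(alpha/p + m_1, ..., alpha/p + m_p) has mean (alpha/p + m_j) / (alpha + sum(m)).
    total = alpha + sum(pooled)
    return [(alpha / p + m) / total for m in pooled]


# Hypothetical split counts for four covariates:
# covariate 0 is used by both models, 1 only by the outcome model,
# 2 only by the exposure model, and 3 by neither.
probs = shared_split_probabilities([10, 0, 5, 0], [8, 6, 0, 0])
print([round(s, 3) for s in probs])
```

In this toy input, covariate 0 receives the largest share of splitting probability because it accrues counts from both models, while covariate 3 keeps only its small prior share α/p.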
Affiliation(s)
- Chanmin Kim
- Department of Statistics, SungKyunKwan University, Seoul, South Korea
- Mauricio Tec
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Corwin Zigler
- Department of Statistics and Data Science, The University of Texas, Austin, Texas, USA
2
Talbot D, Massamba VK. A descriptive review of variable selection methods in four epidemiologic journals: there is still room for improvement. Eur J Epidemiol 2019; 34:725-730. [PMID: 31161279 DOI: 10.1007/s10654-019-00529-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 10/09/2018] [Accepted: 05/24/2019] [Indexed: 11/29/2022]
Abstract
A review of epidemiological papers conducted in 2009 concluded that several studies employed variable selection methods susceptible to introducing bias and yielding inadequate inferences. Many new confounder selection methods have been developed since then. The goal of this study was to provide an updated descriptive portrait of the variable selection methods epidemiologists use to analyze observational data. Studies published in four major epidemiological journals in 2015 were reviewed. Only articles with a predictive or explicative objective that reported on the analysis of individual-level data were included. The method(s) employed for selecting variables were extracted from the retained articles. A total of 975 articles were retrieved and 299 met the eligibility criteria, 292 of which pursued an explicative objective. Among those, 146 studies (50%) reported using prior knowledge or causal graphs for selecting variables, 34 (12%) used change-in-effect-estimate methods, 26 (9%) used stepwise approaches, 16 (5%) employed univariate analyses, 5 (2%) used various other methods, and 107 (37%) did not provide sufficient details to allow classification (more than one method could be employed in a single article). Despite being less frequent than in the previous review, stepwise and univariable analyses, which are susceptible to introducing bias and producing inadequate inferences, were still prevalent. Moreover, 37% of studies did not provide sufficient details to assess how variables were selected. We thus believe there is still room for improvement in the variable selection methods used by epidemiologists and in their reporting.
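The reported percentages use the 292 explicative studies as the denominator, and because a single article could report more than one method, the categories overlap. A short arithmetic check (category labels paraphrased from the abstract; the function name is hypothetical) makes both points explicit:

```python
# Counts reported in the abstract for the 292 studies with an explicative objective.
EXPLICATIVE_TOTAL = 292
method_counts = {
    "prior knowledge or causal graphs": 146,
    "change in effect estimate": 34,
    "stepwise": 26,
    "univariate analyses": 16,
    "other methods": 5,
    "insufficient detail": 107,
}


def rounded_percentages(counts, denominator):
    """Share of the denominator in each category, rounded to whole percent."""
    return {name: round(100 * n / denominator) for name, n in counts.items()}


percentages = rounded_percentages(method_counts, EXPLICATIVE_TOTAL)
# Categories overlap (one article could use several methods),
# so the counts sum to more than the number of studies.
print(percentages)
print(sum(method_counts.values()), "method reports across", EXPLICATIVE_TOTAL, "studies")
```

The recomputed percentages match those in the abstract, and the counts total 334, more than the 292 studies, which is consistent with the note that more than one method could be employed in a single article.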
Affiliation(s)
- Denis Talbot
- Département de médecine sociale et préventive, Faculté de médecine, Université Laval, 1050, avenue de la Médecine, Pavillon Ferdinand-Vandry, room 2454, Quebec, QC, G1V 0A6, Canada; Unité santé des populations et pratiques optimales en santé, CHU de Québec - Université Laval Research Center, Quebec, QC, Canada
- Victoria Kubuta Massamba
- Département de médecine sociale et préventive, Faculté de médecine, Université Laval, 1050, avenue de la Médecine, Pavillon Ferdinand-Vandry, room 2454, Quebec, QC, G1V 0A6, Canada; Unité santé des populations et pratiques optimales en santé, CHU de Québec - Université Laval Research Center, Quebec, QC, Canada
3
Antonelli J, Zigler C, Dominici F. Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research. Biostatistics 2018; 18:553-568. [PMID: 28334230 DOI: 10.1093/biostatistics/kxx003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/21/2016] [Accepted: 01/06/2017] [Indexed: 11/12/2022]
Abstract
In comparative effectiveness research, we are often interested in estimating an average causal effect from large observational data (the main study). These data often do not measure all the necessary confounders. In many cases, an extensive set of additional covariates is measured for a smaller and non-representative population (the validation study). In this setting, standard approaches for missing data imputation might not be adequate due to the large number of missing covariates in the main data relative to the smaller sample size of the validation data. We propose a Bayesian approach to estimate the average causal effect in the main study that borrows information from the validation study to improve confounding adjustment. Our approach combines ideas of Bayesian model averaging, confounder selection, and missing data imputation into a single framework. It allows for different treatment effects in the main study and in the validation study, and propagates the uncertainty due to missing data imputation and confounder selection when estimating the average causal effect (ACE) in the main study. We compare our method to several existing approaches via simulation. We apply our method to a study examining the effect of surgical resection on survival among 10,396 Medicare beneficiaries with a brain tumor when additional covariate information is available on 2,220 patients in SEER-Medicare. We find that the estimated ACE decreases by 30% when incorporating additional information from SEER-Medicare.
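The point about propagating confounder-selection uncertainty can be illustrated with the generic Bayesian model averaging identities: the averaged ACE is the probability-weighted mean of model-specific estimates, and its variance adds the spread between models to the average within-model variance. This is not the authors' guided imputation procedure; the function name and the numbers are hypothetical.

```python
def model_averaged_ace(model_probs, means, variances):
    """Combine model-specific posterior summaries of the ACE by model averaging.

    model_probs: posterior probabilities of the candidate adjustment models (sum to 1)
    means, variances: posterior mean and variance of the ACE under each model
    """
    if not (len(model_probs) == len(means) == len(variances)):
        raise ValueError("inputs must have the same length")
    if abs(sum(model_probs) - 1.0) > 1e-9:
        raise ValueError("model probabilities must sum to 1")
    mean = sum(w * m for w, m in zip(model_probs, means))
    # Law of total variance: average within-model variance plus the
    # spread of the model-specific means around the averaged mean.
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in zip(model_probs, means, variances))
    return mean, var


# Hypothetical summaries from two equally probable adjustment models that disagree.
mean, var = model_averaged_ace([0.5, 0.5], [1.0, 3.0], [0.25, 0.25])
print(mean, var)
```

Here the between-model term contributes 1.0 to the total variance of 1.25; committing to a single adjustment model would report only 0.25 and understate the uncertainty.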
Affiliation(s)
- Joseph Antonelli
- Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
- Corwin Zigler
- Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
- Francesca Dominici
- Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
4
Keller JP, Rice KM. Selecting Shrinkage Parameters for Effect Estimation: The Multi-Ethnic Study of Atherosclerosis. Am J Epidemiol 2018; 187:358-365. [PMID: 28992037 DOI: 10.1093/aje/kwx225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2016] [Accepted: 04/24/2017] [Indexed: 11/14/2022]
Abstract
We present a method for improving estimation in linear regression models in samples of moderate size, using shrinkage techniques. Our work connects the theory of causal inference, which describes how variable adjustment should be performed with large samples, with shrinkage estimators such as ridge regression and the least absolute shrinkage and selection operator (LASSO), which can perform better in sample sizes seen in epidemiologic practice. Shrinkage methods reduce mean squared error by trading off some amount of bias for a reduction in variance. However, when inference is the goal, there are no standard methods for choosing the penalty "tuning" parameters that govern these tradeoffs. We propose selecting the penalty parameters for these shrinkage estimators by minimizing bias and variance in future similar data sets drawn from the posterior predictive distribution. Our method provides both the point estimate of interest and corresponding standard error estimates. Through simulations, we demonstrate that it can achieve better mean squared error than using cross-validation for penalty parameter selection. We apply our method to a cross-sectional analysis of the association between smoking and carotid intima-media thickness in the Multi-Ethnic Study of Atherosclerosis (multiple US locations, 2000-2002) and compare it with similar analyses of these data.
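To see which trade-off the tuning parameter governs, consider a hypothetical two-variable case: an exposure x and a single confounder z, with a ridge penalty λ applied only to the confounder's coefficient (our assumption; the abstract does not say which coefficients are penalized). At λ = 0 the exposure coefficient is the fully adjusted least-squares estimate; as λ grows, the confounder coefficient shrinks toward zero and the exposure coefficient moves toward the unadjusted estimate. This sketch does not implement the authors' posterior-predictive rule for choosing λ, and the function names are hypothetical.

```python
def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def ridge_exposure_coefficient(x, z, y, lam):
    """Exposure coefficient from a ridge fit of y on (x, z) that penalizes only z.

    Hypothetical sketch: one exposure x, one confounder z, no penalty on x.
    All variables are centered, so the intercept drops out.
    """
    n = len(x)
    if n < 2 or len(z) != n or len(y) != n:
        raise ValueError("x, z and y must have the same length (at least 2)")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    xc = [v - mx for v in x]
    zc = [v - mz for v in z]
    yc = [v - my for v in y]
    sxx, sxz, szz = _dot(xc, xc), _dot(xc, zc), _dot(zc, zc)
    sxy, szy = _dot(xc, yc), _dot(zc, yc)
    # Normal equations [[sxx, sxz], [sxz, szz + lam]] [beta, gamma]' = [sxy, szy]',
    # solved for the exposure coefficient beta by Cramer's rule.
    det = sxx * (szz + lam) - sxz ** 2
    if det == 0:
        raise ValueError("singular design")
    return (sxy * (szz + lam) - sxz * szy) / det


# Hypothetical noiseless data: y = 2x + 3z, with x and z correlated.
x = [-2, -1, 0, 1, 2]
z = [-1, -1, 0, 1, 1]
y = [2 * a + 3 * b for a, b in zip(x, z)]
for lam in (0.0, 4.0, 1e9):
    print(lam, ridge_exposure_coefficient(x, z, y, lam))
```

With these data the estimate is 2.0 at λ = 0 (the adjusted effect) and approaches 3.8 (the slope from regressing y on x alone) as λ grows, so λ directly trades confounding bias against the variance reduction that shrinkage buys.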
Affiliation(s)
- Joshua P Keller
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland
- Kenneth M Rice
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington
5
Wang C, Liu J, Fardo DW. Causal effect estimation in sequencing studies: a Bayesian method to account for confounder adjustment uncertainty. BMC Proc 2016; 10:411-415. [PMID: 27980670 PMCID: PMC5133506 DOI: 10.1186/s12919-016-0064-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
Estimating the causal effect of a single nucleotide variant (SNV) on clinical phenotypes is of interest in many genetic studies. The effect estimate may be confounded by other SNVs as a result of linkage disequilibrium, as well as by demographic and clinical characteristics. Because a large number of these other variables, which we call potential confounders, are collected, it is challenging to select and adjust for the variables that truly confound the causal effect. The Bayesian adjustment for confounding (BAC) method has been proposed as a general method to estimate the average causal effect in the presence of a large number of potential confounders under the assumption of no unmeasured confounders. In this paper, we explore the application of BAC in genetic studies using the Genetic Analysis Workshop 19 exome sequencing data. Our results show that BAC can efficiently estimate the causal effect of genetic variants with adjustment for confounding. Consequently, BAC may serve as a useful tool for analyzing genome-wide association study data to assess the causal effect of genetic variants and the impact of potential interventions.
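BAC links the exposure and outcome models through a prior dependence parameter, usually written ω. As we understand the formulation of Wang et al. (2012), the prior odds that a covariate enters the outcome model equal ω when that covariate is in the exposure model and 1 when it is not, so a large ω pushes covariates associated with the exposure into the outcome model. A minimal sketch of that conditional prior, with a hypothetical function name:

```python
import math


def outcome_inclusion_prior(in_exposure_model, omega):
    """Prior probability that a covariate enters the outcome model (BAC-style sketch).

    Assumes prior odds of inclusion equal omega when the covariate is in the
    exposure model and 1 when it is not; omega = math.inf forces inclusion.
    """
    if not omega > 0:  # also rejects NaN
        raise ValueError("omega must be positive")
    if not in_exposure_model:
        return 0.5  # prior odds of 1
    if math.isinf(omega):
        return 1.0  # limiting case: always adjust for exposure predictors
    return omega / (1.0 + omega)  # prior odds of omega


for omega in (1.0, 10.0, math.inf):
    print(omega, outcome_inclusion_prior(True, omega), outcome_inclusion_prior(False, omega))
```

With ω = 1 the two models are selected independently; increasing ω makes inclusion in the outcome model increasingly likely for any covariate that predicts the exposure, which is how BAC prioritizes potential confounders.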
Affiliation(s)
- Chi Wang
- Department of Biostatistics, College of Public Health, University of Kentucky, 725 Rose St, Lexington, KY 40536 USA
- Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, 800 Rose St, Lexington, KY 40536 USA
- Jinpeng Liu
- Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, 800 Rose St, Lexington, KY 40536 USA
- David W. Fardo
- Department of Biostatistics, College of Public Health, University of Kentucky, 725 Rose St, Lexington, KY 40536 USA
6
Wang C, Dominici F, Parmigiani G, Zigler CM. Accounting for uncertainty in confounder and effect modifier selection when estimating average causal effects in generalized linear models. Biometrics 2015; 71:654-65. [PMID: 25899155 DOI: 10.1111/biom.12315] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/01/2014] [Revised: 02/01/2015] [Accepted: 03/01/2015] [Indexed: 12/25/2022]
Abstract
Confounder selection and adjustment are essential elements of assessing the causal effect of an exposure or treatment in observational studies. Building upon work by Wang et al. (2012, Biometrics 68, 661-671) and Lefebvre et al. (2014, Statistics in Medicine 33, 2797-2813), we propose and evaluate a Bayesian method to estimate average causal effects in studies with a large number of potential confounders, relatively few observations, likely interactions between confounders and the exposure of interest, and uncertainty about which confounders and interaction terms should be included. Our method is applicable to any exposure and outcome that can be modeled with generalized linear models. In this general setting, estimation of the average causal effect differs from estimation of the exposure coefficient in the outcome model because of noncollapsibility. We implement a Bayesian bootstrap procedure to integrate over the distribution of potential confounders and to estimate the causal effect. Our method permits estimation of both the overall population causal effect and effects in specified subpopulations, providing a clear characterization of heterogeneous exposure effects that may vary considerably across covariate profiles. Simulation studies demonstrate that the proposed method performs well in small-sample situations with 100-150 observations and 50 covariates. The method is applied to data on 15,060 US Medicare beneficiaries diagnosed with a malignant brain tumor between 2000 and 2009 to evaluate whether surgery reduces hospital readmissions within 30 days of diagnosis.
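Noncollapsibility and Bayesian bootstrap integration over the covariate distribution can both be sketched for a logistic outcome model with a single covariate. The coefficients below are treated as fixed and hypothetical; in the full method they would be posterior draws, averaged over confounder and interaction models, which this sketch omits. Each bootstrap draw reweights the observed covariate values with Dirichlet(1, ..., 1) weights, generated here as normalized Exp(1) variables.

```python
import math
import random


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def bayesian_bootstrap_ace(c, beta0, beta_x, beta_c, n_draws=1000, seed=0):
    """Draws of the marginal risk difference E[Y(1)] - E[Y(0)].

    Each draw reweights the observed covariate values c with Dirichlet(1, ..., 1)
    weights (the Bayesian bootstrap). The logistic coefficients are held fixed,
    a simplification for illustration only.
    """
    rng = random.Random(seed)
    # Individual-level risk differences with exposure set to 1 versus 0.
    diffs = [expit(beta0 + beta_x + beta_c * ci) - expit(beta0 + beta_c * ci) for ci in c]
    draws = []
    for _ in range(n_draws):
        # Normalized Exp(1) variables form a Dirichlet(1, ..., 1) draw.
        g = [rng.expovariate(1.0) for _ in c]
        total = sum(g)
        draws.append(sum(gi * d for gi, d in zip(g, diffs)) / total)
    return draws


def marginal_log_odds_ratio(c, beta0, beta_x, beta_c):
    """Log odds ratio of the covariate-averaged risks (equal weights)."""
    m1 = sum(expit(beta0 + beta_x + beta_c * ci) for ci in c) / len(c)
    m0 = sum(expit(beta0 + beta_c * ci) for ci in c) / len(c)
    return math.log(m1 / (1 - m1)) - math.log(m0 / (1 - m0))


c = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical covariate values
draws = bayesian_bootstrap_ace(c, beta0=0.0, beta_x=1.0, beta_c=2.0)
print(sum(draws) / len(draws))
# The conditional log odds ratio is 1.0, but the marginal one is smaller
# (noncollapsibility), even with the same covariate values under both exposures.
print(marginal_log_odds_ratio(c, 0.0, 1.0, 2.0))
```

The marginal log odds ratio here is roughly 0.4, well below the conditional coefficient of 1.0, which is why the average causal effect must be computed by averaging predictions over the covariate distribution rather than read off the exposure coefficient.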
Affiliation(s)
- Chi Wang
- Department of Biostatistics, University of Kentucky, Lexington, Kentucky, U.S.A.; Markey Cancer Center, University of Kentucky, Lexington, Kentucky, U.S.A.
- Francesca Dominici
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, U.S.A.
- Giovanni Parmigiani
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, U.S.A.; Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, U.S.A.
- Corwin Matthew Zigler
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, U.S.A.