1
Liu Y, Gao Q, Wei K, Huang C, Wang C, Yu Y, Qin G, Wang T. High-dimensional generalized median adaptive lasso with application to omics data. Brief Bioinform 2024; 25:bbae059. [PMID: 38436558 PMCID: PMC10939310 DOI: 10.1093/bib/bbae059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 01/03/2024] [Indexed: 03/05/2024] Open
Abstract
Recently, there has been a growing interest in variable selection for causal inference within the context of high-dimensional data. However, when the outcome exhibits a skewed distribution, ensuring the accuracy of variable selection and causal effect estimation might be challenging. Here, we introduce the generalized median adaptive lasso (GMAL) for covariate selection to achieve an accurate estimation of causal effect even when the outcome follows skewed distributions. A distinctive feature of our proposed method is that we utilize a linear median regression model for constructing penalty weights, thereby maintaining the accuracy of variable selection and causal effect estimation even when the outcome presents extremely skewed distributions. Simulation results showed that our proposed method performs comparably to existing methods in variable selection when the outcome follows a symmetric distribution. Besides, the proposed method exhibited obvious superiority over the existing methods when the outcome follows a skewed distribution. Meanwhile, our proposed method consistently outperformed the existing methods in causal estimation, as indicated by smaller root-mean-square error. We also utilized the GMAL method on a deoxyribonucleic acid methylation dataset from the Alzheimer's disease (AD) neuroimaging initiative database to investigate the association between cerebrospinal fluid tau protein levels and the severity of AD.
Affiliation(s)
- Yahang Liu
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Qian Gao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, China
- Kecheng Wei
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Chen Huang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Ce Wang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Yongfu Yu
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
  - Key Laboratory of Public Health Safety of Ministry of Education, Key Laboratory for Health Technology Assessment, National Commission of Health, Fudan University, Shanghai, China
- Guoyou Qin
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
  - Key Laboratory of Public Health Safety of Ministry of Education, Key Laboratory for Health Technology Assessment, National Commission of Health, Fudan University, Shanghai, China
- Tong Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, China
2
Shin H, Antonelli J. Improved inference for doubly robust estimators of heterogeneous treatment effects. Biometrics 2023; 79:3140-3152. [PMID: 36745745 DOI: 10.1111/biom.13837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 01/16/2023] [Accepted: 01/30/2023] [Indexed: 02/08/2023]
Abstract
We propose a doubly robust approach to characterizing treatment effect heterogeneity in observational studies. We develop a frequentist inferential procedure that utilizes posterior distributions for both the propensity score and outcome regression models to provide valid inference on the conditional average treatment effect even when high-dimensional or nonparametric models are used. We show that our approach leads to conservative inference in finite samples or under model misspecification and provides a consistent variance estimator when both models are correctly specified. In simulations, we illustrate the utility of these results in difficult settings such as high-dimensional covariate spaces or highly flexible models for the propensity score and outcome regression. Lastly, we analyze environmental exposure data from NHANES to identify how the effects of these exposures vary by subject-level characteristics.
Affiliation(s)
- Heejun Shin
  - Department of Statistics, University of Florida, Gainesville, Florida, USA
- Joseph Antonelli
  - Department of Statistics, University of Florida, Gainesville, Florida, USA
3
Tyrer P, Sharp C. Establishing efficacy and effectiveness in the treatment of personality disorders. Personal Ment Health 2023; 17:295-299. [PMID: 37957135 DOI: 10.1002/pmh.1595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/15/2023]
Affiliation(s)
- Peter Tyrer
  - Division of Psychiatry, Imperial College, London, UK
- Carla Sharp
  - Department of Psychology, University of Houston, Houston, Texas, USA
4
Li F, Ding P, Mealli F. Bayesian causal inference: a critical review. Philos Trans A Math Phys Eng Sci 2023; 381:20220153. [PMID: 36970828 DOI: 10.1098/rsta.2022.0153] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/28/2022] [Accepted: 10/23/2022] [Indexed: 06/18/2023]
Abstract
This paper provides a critical review of the Bayesian perspective of causal inference based on the potential outcomes framework. We review the causal estimands, assignment mechanism, the general structure of Bayesian inference of causal effects and sensitivity analysis. We highlight issues that are unique to Bayesian causal inference, including the role of the propensity score, the definition of identifiability, the choice of priors in both low- and high-dimensional regimes. We point out the central role of covariate overlap and more generally the design stage in Bayesian causal inference. We extend the discussion to two complex assignment mechanisms: instrumental variable and time-varying treatments. We identify the strengths and weaknesses of the Bayesian approach to causal inference. Throughout, we illustrate the key concepts via examples. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Affiliation(s)
- Fan Li
  - Duke University, Durham, NC, USA
- Peng Ding
  - University of California, Berkeley, CA, USA
5
Papadogeorgou G. Discussion on "Spatial+: a novel approach to spatial confounding" by Emiko Dupont, Simon N. Wood, and Nicole H. Augustin. Biometrics 2022; 78:1305-1308. [PMID: 35712896 DOI: 10.1111/biom.13655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2021] [Revised: 09/23/2021] [Accepted: 10/12/2021] [Indexed: 12/30/2022]
Abstract
I congratulate Dupont, Wood, and Augustin (DWA hereon) for providing an easy-to-implement method for estimation in the presence of spatial confounding, and for addressing some of the complicated aspects on the topic. I discuss conceptual and operational issues that are fundamental to inference in spatial settings: (i) the target quantity and its interpretability, (ii) the nonspatial aspect of covariates and their relative spatial scales, and (iii) the impact of spatial smoothing. While DWA provide some insights on these issues, I believe that the audience might benefit from a deeper discussion.
6
Gao Q, Zhang Y, Sun H, Wang T. Evaluation of propensity score methods for causal inference with high-dimensional covariates. Brief Bioinform 2022; 23:6603435. [PMID: 35667004 DOI: 10.1093/bib/bbac227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2021] [Revised: 05/11/2022] [Accepted: 05/17/2022] [Indexed: 11/12/2022] Open
Abstract
In recent work, researchers have paid considerable attention to the estimation of causal effects in observational studies with a large number of covariates, which makes the unconfoundedness assumption plausible. In this paper, we review propensity score (PS) methods developed in high-dimensional settings and broadly group them into model-based methods that extend models for prediction to causal inference and balance-based methods that combine covariate balancing constraints. We conducted systematic simulation experiments to evaluate these two types of methods, and studied whether the use of balancing constraints further improved estimation performance. Our comparison methods were post-double-selection (PDS), double-index PS (DiPS), outcome-adaptive LASSO (OAL), group LASSO and doubly robust estimation (GLiDeR), high-dimensional covariate balancing PS (hdCBPS), regularized calibrated estimators (RCAL) and approximate residual balancing method (balanceHD). For the four model-based methods, simulation studies showed that GLiDeR was the most stable approach, with high estimation accuracy and precision, followed by PDS, OAL and DiPS. For balance-based methods, hdCBPS performed similarly to GLiDeR in terms of accuracy, and outperformed balanceHD and RCAL. These findings imply that PS methods do not benefit appreciably from covariate balancing constraints in high-dimensional settings. In conclusion, we recommend the preferential use of GLiDeR and hdCBPS approaches for estimating causal effects in high-dimensional settings; however, further studies on the construction of valid confidence intervals are required.
Affiliation(s)
- Qian Gao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yu Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Hongwei Sun
  - Department of Health Statistics, School of Public Health and Management, Binzhou Medical University, Yantai, China
- Tong Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
7
Antonelli J, Papadogeorgou G, Dominici F. Causal inference in high dimensions: A marriage between Bayesian modeling and good frequentist properties. Biometrics 2022; 78:100-114. [PMID: 33349923 PMCID: PMC8209114 DOI: 10.1111/biom.13417] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/12/2018] [Revised: 12/02/2020] [Accepted: 12/04/2020] [Indexed: 11/30/2022]
Abstract
We introduce a framework for estimating causal effects of binary and continuous treatments in high dimensions. We show how posterior distributions of treatment and outcome models can be used together with doubly robust estimators. We propose an approach to uncertainty quantification for the doubly robust estimator, which utilizes posterior distributions of model parameters and (1) results in good frequentist properties in small samples, (2) is based on a single run of a Markov chain Monte Carlo (MCMC) algorithm, and (3) improves over frequentist measures of uncertainty which rely on asymptotic properties. We consider a flexible framework for modeling the treatment and outcome processes within the Bayesian paradigm that reduces model dependence, accommodates nonlinearity, and achieves dimension reduction of the covariate space. We illustrate the ability of the proposed approach to flexibly estimate causal effects in high dimensions and appropriately quantify uncertainty. We show that our proposed variance estimation strategy is consistent when both models are correctly specified, and we see empirically that it performs well in finite samples and under model misspecification. Finally, we estimate the effect of continuous environmental exposures on cholesterol and triglyceride levels.
Affiliation(s)
- Joseph Antonelli
  - Department of Statistics, University of Florida, Gainesville, FL, 32611
- Francesca Dominici
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
8
Tang D, Kong D, Pan W, Wang L. Ultra‐high dimensional variable selection for doubly robust causal inference. Biometrics 2022. [DOI: 10.1111/biom.13625] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/01/2021] [Revised: 12/01/2021] [Accepted: 12/16/2021] [Indexed: 11/30/2022]
Affiliation(s)
- Dingke Tang
  - Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada
- Dehan Kong
  - Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada
- Wenliang Pan
  - Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, China
- Linbo Wang
  - Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada
9
Gao Q, Zhang Y, Liang J, Sun H, Wang T. High-dimensional generalized propensity score with application to omics data. Brief Bioinform 2021; 22:6354024. [PMID: 34410351 DOI: 10.1093/bib/bbab331] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/25/2021] [Revised: 07/26/2021] [Accepted: 07/27/2021] [Indexed: 01/09/2023] Open
Abstract
Propensity score (PS) methods are popular when estimating causal effects in non-randomized studies. Drawing causal conclusions relies on the unconfoundedness assumption. This assumption is untestable and is considered more plausible if a large number of pre-treatment covariates are included in the analysis. However, previous studies have shown that including unnecessary covariates in PS models can lead to bias and efficiency loss. With the ever-increasing amounts of available data, such as omics data, there is often little prior knowledge of the exact set of important covariates. Therefore, variable selection for causal inference in high-dimensional settings has received considerable attention in recent years. However, recent studies have focused mainly on binary treatments. In this study, we considered continuous treatments and proposed the generalized outcome-adaptive LASSO (GOAL) to select covariates that can provide an unbiased and statistically efficient estimation. Simulation studies showed that when the outcome model was linear, the GOAL selected almost all true confounders and predictors of outcome and excluded other covariates. The accuracy and precision of the estimates were close to ideal. Furthermore, the GOAL is robust to model misspecification. We applied the GOAL to seven DNA methylation datasets from the Gene Expression Omnibus database, which covered four brain regions, to estimate the causal effects of epigenetic aging acceleration on the incidence of Alzheimer's disease.
Affiliation(s)
- Qian Gao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yu Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Jie Liang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Hongwei Sun
  - Department of Health Statistics, School of Public Health and Management, Binzhou Medical University, Yantai, China
- Tong Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
10
Schnell PM, Papadogeorgou G. Mitigating unobserved spatial confounding when estimating the effect of supermarket access on cardiovascular disease deaths. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1377] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
11
Affiliation(s)
- Edward I. George
  - Department of Statistics, University of Pennsylvania, Philadelphia, PA
12
Koslovsky MD, Hoffman KL, Daniel CR, Vannucci M. A Bayesian model of microbiome data for simultaneous identification of covariate associations and prediction of phenotypic outcomes. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1354] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
13
Bai R, Moran GE, Antonelli JL, Chen Y, Boland MR. Spike-and-Slab Group Lassos for Grouped Regression and Sparse Generalized Additive Models. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1765784] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Affiliation(s)
- Ray Bai
  - Department of Statistics, University of South Carolina, Columbia, SC
- Gemma E. Moran
  - Data Science Institute, Columbia University, New York, NY
- Yong Chen
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA
- Mary R. Boland
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA
14
Papadogeorgou G, Dominici F. A causal exposure response function with local adjustment for confounding: Estimating health effects of exposure to low levels of ambient fine particulate matter. Ann Appl Stat 2020; 14:850-871. [PMID: 33649709 PMCID: PMC7914396 DOI: 10.1214/20-aoas1330] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Abstract
In the last two decades, ambient levels of air pollution have declined substantially. At the same time, the Clean Air Act mandates that the National Ambient Air Quality Standards (NAAQS) must be routinely assessed to protect populations based on the latest science. Therefore, researchers should continue to address the following question: is exposure to levels of air pollution below the NAAQS harmful to human health? Furthermore, the contentious nature surrounding environmental regulations urges us to cast this question within a causal inference framework. Several parametric and semi-parametric regression approaches have been used to estimate the exposure-response (ER) curve between long-term exposure to ambient air pollution concentrations and health outcomes. However, most of the existing approaches are not formulated within a formal framework for causal inference, adjust for the same set of potential confounders across all levels of exposure, and do not account for model uncertainty regarding covariate selection and the shape of the ER. In this paper, we introduce a Bayesian framework for the estimation of a causal ER curve called LERCA (Local Exposure Response Confounding Adjustment), which a) allows for different confounders and different strength of confounding at the different exposure levels; and b) propagates model uncertainty regarding confounders' selection and the shape of the ER. Importantly, LERCA provides a principled way of assessing the observed covariates' confounding importance at different exposure levels, providing researchers with important information regarding the set of variables to measure and adjust for in regression models. Using simulation studies, we show that state of the art approaches perform poorly in estimating the ER curve in the presence of local confounding. 
LERCA is used to evaluate the relationship between long-term exposure to ambient PM2.5, a key regulated pollutant, and cardiovascular hospitalizations for 5,362 zip codes in the continental U.S. and located near a pollution monitoring site, while adjusting for a potentially varying set of confounders across the exposure range. Our data set includes rich health, weather, demographic, and pollution information for the years of 2011-2013. The estimated exposure-response curve is increasing indicating that higher ambient concentrations lead to higher cardiovascular hospitalization rates, and ambient PM2.5 was estimated to lead to an increase in cardiovascular hospitalization rates when focusing at the low exposure range. Our results indicate that there is no threshold for the effect of PM2.5 on cardiovascular hospitalizations.
Affiliation(s)
- Francesca Dominici
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston MA 02115
15
Abstract
PURPOSE OF REVIEW: Data science is an exploding trans-disciplinary field that aims to harness the power of data to gain information or insights on researcher-defined topics of interest. In this paper we review how data science can help advance environmental health research. RECENT FINDINGS: We discuss the concepts of computationally scalable handling of Big Data and the design of efficient research data platforms, and how data science can provide solutions for methodological challenges in environmental health research, such as high-dimensional outcomes and exposures, and prediction models. Finally, we discuss tools for reproducible research. SUMMARY: In this paper we present opportunities to improve environmental research capabilities by embracing data science, and the pitfalls that environmental health researchers should avoid when employing data scientific approaches. Throughout the paper, we emphasize the need for environmental health researchers to collaborate more closely with biostatisticians and data scientists to ensure robust and interpretable results.
Affiliation(s)
- Danielle Braun
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA
16
Antonelli J, Parmigiani G, Dominici F. High-Dimensional Confounding Adjustment Using Continuous Spike and Slab Priors. Bayesian Anal 2019; 14:805-828. [PMID: 32431779 PMCID: PMC7236769 DOI: 10.1214/18-ba1131] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/11/2023]
Abstract
In observational studies, estimation of a causal effect of a treatment on an outcome relies on proper adjustment for confounding. If the number of the potential confounders (p) is larger than the number of observations (n), then direct control for all potential confounders is infeasible. Existing approaches for dimension reduction and penalization are generally aimed at predicting the outcome, and are less suited for estimation of causal effects. Under standard penalization approaches (e.g. Lasso), if a variable Xj is strongly associated with the treatment T but weakly with the outcome Y, the coefficient βj will be shrunk towards zero thus leading to confounding bias. Under the assumption of a linear model for the outcome and sparsity, we propose continuous spike and slab priors on the regression coefficients βj corresponding to the potential confounders Xj. Specifically, we introduce a prior distribution that does not heavily shrink to zero the coefficients (βj's) of the Xj's that are strongly associated with T but weakly associated with Y. We compare our proposed approach to several state of the art methods proposed in the literature. Our proposed approach has the following features: 1) it reduces confounding bias in high dimensional settings; 2) it shrinks towards zero coefficients of instrumental variables; and 3) it achieves good coverages even in small sample sizes. We apply our approach to the National Health and Nutrition Examination Survey (NHANES) data to estimate the causal effects of persistent pesticide exposure on triglyceride levels.
Affiliation(s)
- Joseph Antonelli
  - Department of Statistics, University of Florida, 102 Griffin-Floyd Hall, P.O. Box 118545, Gainesville, FL, 32611, USA
- Giovanni Parmigiani
  - Department of Biostatistics and Computational Biology, CLS 11007, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA
- Francesca Dominici
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA