1
Xu Z, Li C, Chi S, Yang T, Wei P. Speeding up interval estimation for R2-based mediation effect of high-dimensional mediators via cross-fitting. Biostatistics 2024; 26:kxae037. [PMID: 39412139 PMCID: PMC11823199 DOI: 10.1093/biostatistics/kxae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 08/22/2024] [Accepted: 08/28/2024] [Indexed: 10/30/2024]
Abstract
Mediation analysis is a useful tool in investigating how molecular phenotypes such as gene expression mediate the effect of exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects in opposite directions in the presence of high-dimensional omics mediators. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure that relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the R2 measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares regressions for the variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the R2 measure. Extensive simulation studies demonstrated that this proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in R package CFR2M.
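The two-stage idea in this abstract (screen mediators on one subsample, estimate by ordinary least squares on the other, then swap) can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the R2 measure is taken to be the common variance-based form R2(Y~M) + R2(Y~X) - R2(Y~X+M), and a plain marginal-correlation ranking stands in for iSIS. All function names are hypothetical, and this is not the CFR2M implementation.

```python
import numpy as np


def r_squared(y, design):
    """R^2 of an OLS fit of y on the columns of `design` plus an intercept."""
    Z = np.column_stack([np.ones(len(y)), design])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - resid.var() / y.var()


def r2_mediation(y, x, M):
    """Assumed definition: R2(Y~M) + R2(Y~X) - R2(Y~X+M)."""
    x = x.reshape(-1, 1)
    return r_squared(y, M) + r_squared(y, x) - r_squared(y, np.column_stack([x, M]))


def screen(y, M, keep):
    """Stand-in for iSIS: keep the `keep` columns of M most correlated with y."""
    Mc = M - M.mean(axis=0)
    yc = y - y.mean()
    corr = (Mc.T @ yc) / (np.linalg.norm(Mc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:keep]


def cross_fitted_r2(y, x, M, keep, seed=0):
    """Select mediators on one half, estimate on the other, swap, and average."""
    rng = np.random.default_rng(seed)
    halves = np.array_split(rng.permutation(len(y)), 2)
    estimates = []
    for select_on, estimate_on in [(0, 1), (1, 0)]:
        s, e = halves[select_on], halves[estimate_on]
        chosen = screen(y[s], M[s], keep)
        estimates.append(r2_mediation(y[e], x[e], M[e][:, chosen]))
    return float(np.mean(estimates))


def simulate(n=400, p=50, seed=1):
    """Toy data: the first three of p candidate mediators carry the effect of x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    M = rng.normal(size=(n, p))
    M[:, :3] += x[:, None]
    y = M[:, :3].sum(axis=1) + rng.normal(size=n)
    return y, x, M
```

Calling `cross_fitted_r2(*simulate(), keep=5)` returns a single point estimate. Selecting on one half and estimating on the other keeps the selection step from reusing the data that the variance estimates are computed on, which is the source of bias the abstract refers to.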
Affiliation(s)
- Zhichao Xu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Houston, TX 77030, United States
- Chunlin Li
- Department of Statistics, Iowa State University, 2438 Osborn Dr, Ames, IA 50011, United States
- Sunyi Chi
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Houston, TX 77030, United States
- Tianzhong Yang
- Division of Biostatistics and Health Data Science, University of Minnesota, 2221 University Ave SE, Minneapolis, MN 55455, United States
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Houston, TX 77030, United States
2
Xu Z, Wei P. A novel statistical framework for meta-analysis of total mediation effect with high-dimensional omics mediators in large-scale genomic consortia. PLoS Genet 2024; 20:e1011483. [PMID: 39561194 PMCID: PMC11614268 DOI: 10.1371/journal.pgen.1011483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 12/03/2024] [Accepted: 11/03/2024] [Indexed: 11/21/2024]
Abstract
Meta-analysis is used to aggregate the effects of interest across multiple studies, while its methodology is largely underexplored in mediation analysis, particularly in estimating the total mediation effect of high-dimensional omics mediators. Large-scale genomic consortia, such as the Trans-Omics for Precision Medicine (TOPMed) program, comprise multiple cohorts with diverse technologies to elucidate the genetic architecture and biological mechanisms underlying complex human traits and diseases. Leveraging the recently established asymptotic standard error of the R-squared (R2)-based mediation effect estimation for high-dimensional omics mediators, we have developed a novel meta-analysis framework requiring only summary statistics and allowing inter-study heterogeneity. Whereas the proposed meta-analysis can uniquely evaluate and account for potential effect heterogeneity across studies due to, for example, varying genomic profiling platforms, our extensive simulations showed that the developed method was more computationally efficient and yielded satisfactory operating characteristics comparable to analysis of the pooled individual-level data when there was no inter-study heterogeneity. We applied the developed method to 8 TOPMed studies with over 5800 participants to estimate the mediation effects of gene expression on age-related variation in systolic blood pressure and sex-related variation in high-density lipoprotein (HDL) cholesterol. The proposed method is available in R package MetaR2M on GitHub.
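To make "requiring only summary statistics and allowing inter-study heterogeneity" concrete, here is a minimal Python sketch of textbook inverse-variance pooling with a DerSimonian-Laird random-effects step. It takes one estimate and one standard error per study as input. Whether MetaR2M uses exactly these estimators is an assumption, and the function name `meta_analyze` is hypothetical.

```python
import numpy as np


def meta_analyze(estimates, std_errors):
    """Pool per-study estimates using only their values and standard errors."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    k = len(est)

    # Fixed-effect (inverse-variance) pooling.
    w = 1.0 / se**2
    fixed = np.sum(w * est) / np.sum(w)
    fixed_se = np.sqrt(1.0 / np.sum(w))

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = np.sum(w * (est - fixed) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects pooling: each study's variance is inflated by tau^2.
    w_re = 1.0 / (se**2 + tau2)
    random = np.sum(w_re * est) / np.sum(w_re)
    random_se = np.sqrt(1.0 / np.sum(w_re))

    return {
        "fixed": float(fixed),
        "fixed_se": float(fixed_se),
        "q": float(q),
        "tau2": float(tau2),
        "random": float(random),
        "random_se": float(random_se),
    }
```

When the studies agree, `tau2` is truncated at zero and the random-effects result coincides with the fixed-effect one. When they disagree, the random-effects standard error widens to reflect the heterogeneity.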
Affiliation(s)
- Zhichao Xu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
3
Derkach A, Kantor ED, Sampson JN, Pfeiffer RM. Mediation analysis using incomplete information from publicly available data sources. Stat Med 2024; 43:2695-2712. [PMID: 38606437 DOI: 10.1002/sim.10076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 03/08/2024] [Accepted: 03/25/2024] [Indexed: 04/13/2024]
Abstract
Our work was motivated by the question of whether, and to what extent, well-established risk factors mediate the racial disparity observed for colorectal cancer (CRC) incidence in the United States. Mediation analysis examines the relationships between an exposure, a mediator and an outcome. All available methods require access to a single complete data set with these three variables. However, because population-based studies usually include few non-White participants, these approaches have limited utility in answering our motivating question. Recently, we developed novel methods to integrate several data sets with incomplete information for mediation analysis. These methods have two limitations: (i) they only consider a single mediator and (ii) they require a data set containing individual-level data on the mediator and exposure (and possibly confounders) obtained by independent and identically distributed sampling from the target population. Here, we propose a new method for mediation analysis with several different data sets that accommodates complex survey and registry data, and allows for multiple mediators. The proposed approach yields unbiased causal effect estimates and confidence intervals with nominal coverage in simulations. We apply our method to data from U.S. cancer registries, a U.S.-population-representative survey and summary-level odds-ratio estimates, to rigorously evaluate what proportion of the difference in CRC risk between non-Hispanic Whites and Blacks is mediated by three potentially modifiable risk factors (CRC screening history, body mass index, and regular aspirin use).
Affiliation(s)
- Andriy Derkach
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York
- Elizabeth D Kantor
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York
- Joshua N Sampson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland
- Ruth M Pfeiffer
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland
4
Xu Z, Wei P. A novel statistical framework for meta-analysis of total mediation effect with high-dimensional omics mediators in large-scale genomic consortia. bioRxiv 2024:2024.04.29.591700. [PMID: 38746374 PMCID: PMC11092451 DOI: 10.1101/2024.04.29.591700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Meta-analysis is used to aggregate the effects of interest across multiple studies, while its methodology is largely underexplored in mediation analysis, particularly in estimating the total mediation effect of high-dimensional omics mediators. Large-scale genomic consortia, such as the Trans-Omics for Precision Medicine (TOPMed) program, comprise multiple cohorts with diverse technologies to elucidate the genetic architecture and biological mechanisms underlying complex human traits and diseases. Leveraging the recently established asymptotic standard error of the R-squared (R2)-based mediation effect estimation for high-dimensional omics mediators, we have developed a novel meta-analysis framework requiring only summary statistics and allowing inter-study heterogeneity. Whereas the proposed meta-analysis can uniquely evaluate and account for potential effect heterogeneity across studies due to, for example, varying genomic profiling platforms, our extensive simulations showed that the developed method was more computationally efficient and yielded satisfactory operating characteristics comparable to analysis of the pooled individual-level data when there was no inter-study heterogeneity. We applied the developed method to 8 TOPMed studies with over 5800 participants to estimate the mediation effects of gene expression on age-related variation in systolic blood pressure and sex-related variation in high-density lipoprotein (HDL) cholesterol. The proposed method is available in R package MetaR2M on GitHub.
Affiliation(s)
- Zhichao Xu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
5
He Y, Song PXK, Xu G. Adaptive bootstrap tests for composite null hypotheses in the mediation pathway analysis. J R Stat Soc Series B Stat Methodol 2024; 86:411-434. [PMID: 38746015 PMCID: PMC11090400 DOI: 10.1093/jrsssb/qkad129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2021] [Revised: 10/15/2023] [Accepted: 10/16/2023] [Indexed: 05/16/2024]
Abstract
Mediation analysis aims to assess if, and how, a certain exposure influences an outcome of interest through intermediate variables. This problem has recently gained a surge of attention due to the tremendous need for such analyses in scientific fields. Testing for the mediation effect (ME) is greatly challenged by the fact that the underlying null hypothesis (i.e. the absence of MEs) is composite. Most existing mediation tests are overly conservative and thus underpowered. To overcome this significant methodological hurdle, we develop an adaptive bootstrap testing framework that can accommodate different types of composite null hypotheses in the mediation pathway analysis. Applied to the product of coefficients test and the joint significance test, our adaptive testing procedures provide type I error control under the composite null, resulting in much improved statistical power compared to existing tests. Both theoretical properties and numerical examples of the proposed methodology are discussed.
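For orientation, the two classical tests named in this abstract can be sketched in Python as follows. This shows only the standard product-of-coefficients (Sobel) test and the joint significance test, not the authors' adaptive bootstrap. The symbols are assumptions for illustration: `alpha_hat` is the estimated exposure-to-mediator coefficient, `beta_hat` the estimated mediator-to-outcome coefficient, and both are treated as approximately normal with the given standard errors.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def joint_significance_p(alpha_hat, se_alpha, beta_hat, se_beta):
    """Joint significance test: reject only if both paths are significant,
    so the p-value is the larger of the two path-wise p-values."""
    return max(two_sided_p(alpha_hat / se_alpha), two_sided_p(beta_hat / se_beta))


def sobel_p(alpha_hat, se_alpha, beta_hat, se_beta):
    """Product-of-coefficients (Sobel) test with a first-order delta-method
    standard error for alpha_hat * beta_hat."""
    se_prod = math.sqrt(alpha_hat**2 * se_beta**2 + beta_hat**2 * se_alpha**2)
    if se_prod == 0.0:
        # Both estimates exactly zero: no evidence of mediation.
        return 1.0
    return two_sided_p(alpha_hat * beta_hat / se_prod)
```

The null hypothesis of no mediation is composite because it holds whenever either coefficient is zero. Both tests above are calibrated for only part of that null, which is the usual explanation for their conservativeness when both coefficients are near zero.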
Affiliation(s)
- Yinqiu He
- Department of Statistics, University of Wisconsin, Madison, WI, USA
- Peter X K Song
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
6
Fritz J, Huang T, Depner CM, Zeleznik OA, Cespedes Feliciano EM, Li W, Stone KL, Manson JE, Clish C, Sofer T, Schernhammer E, Rexrode K, Redline S, Wright KP, Vetter C. Sleep duration, plasma metabolites, and obesity and diabetes: a metabolome-wide association study in US women. Sleep 2023; 46:zsac226. [PMID: 36130143 PMCID: PMC9832513 DOI: 10.1093/sleep/zsac226] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/25/2022] [Revised: 08/08/2022] [Indexed: 01/16/2023]
Abstract
Short and long sleep duration are associated with adverse metabolic outcomes, such as obesity and diabetes. We evaluated cross-sectional differences in metabolite levels between women with self-reported habitual short (<7 h), medium (7-8 h), and long (≥9 h) sleep duration to delineate potential underlying biological mechanisms. In total, 210 metabolites were measured via liquid chromatography-mass spectrometry in 9207 women from the Nurses' Health Study (NHS; N = 5027), the NHSII (N = 2368), and the Women's Health Initiative (WHI; N = 2287). Twenty metabolites were consistently (i.e. p_raw < .05 in ≥2 cohorts) and/or strongly (p_FDR < .05 in at least one cohort) associated with short sleep duration after multi-variable adjustment. Specifically, levels of two lysophosphatidylethanolamines, four lysophosphatidylcholines, hydroxyproline and phenylacetylglutamine were higher compared to medium sleep duration, while levels of one diacylglycerol and eleven triacylglycerols (TAGs; all with ≥3 double bonds) were lower. Moreover, enrichment analysis assessing associations of metabolites with short sleep based on biological categories demonstrated significantly increased acylcarnitine levels for short sleep. A metabolite score for short sleep duration based on 12 LASSO-regression selected metabolites was not significantly associated with prevalent and incident obesity and diabetes. Associations of single metabolites with long sleep duration were less robust. However, enrichment analysis demonstrated significant enrichment scores for four lipid classes, all of which (most markedly TAGs) were of opposite sign than the scores for short sleep. Habitual short sleep exhibits a signature on the human plasma metabolome which is different from medium and long sleep. However, we could not detect a direct link of this signature with obesity and diabetes risk.
Affiliation(s)
- Josef Fritz
- Circadian and Sleep Epidemiology Laboratory, Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, USA
- Department of Medical Statistics, Informatics and Health Economics, Medical University of Innsbruck, Innsbruck, Austria
- Tianyi Huang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Christopher M Depner
- Department of Health and Kinesiology, University of Utah, Salt Lake City, UT, USA
- Oana A Zeleznik
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Wenjun Li
- Department of Public Health, School of Health Sciences, University of Massachusetts Lowell, Lowell, MA, USA
- Katie L Stone
- California Pacific Medical Center Research Institute, San Francisco, CA, USA
- JoAnn E Manson
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Division of Preventive Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Clary Clish
- Metabolomics Platform, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Tamar Sofer
- Division of Sleep and Circadian Disorders, Harvard Medical School, Brigham and Women’s Hospital, Boston, MA, USA
- Eva Schernhammer
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Center for Public Health, Medical University of Vienna, Vienna, Austria
- Kathryn Rexrode
- Division of Preventive Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Division of Women’s Health, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Susan Redline
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Kenneth P Wright
- Sleep and Chronobiology Laboratory, Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, USA
- Céline Vetter
- Circadian and Sleep Epidemiology Laboratory, Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, USA
7
Zhang H, Zheng Y, Hou L, Zheng C, Liu L. Mediation analysis for survival data with high-dimensional mediators. Bioinformatics 2021; 37:3815-3821. [PMID: 34343267 PMCID: PMC8570823 DOI: 10.1093/bioinformatics/btab564] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/01/2021] [Revised: 07/18/2021] [Accepted: 07/29/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Mediation analysis has become a prevalent method to identify causal pathway(s) between an independent variable and a dependent variable through intermediate variable(s). However, little work has been done when the intermediate variables (mediators) are high-dimensional and the outcome is a survival endpoint. In this paper, we introduce a novel method to identify potential mediators in a causal framework of high-dimensional Cox regression. RESULTS: We first reduce the data dimension through a mediation-based sure independence screening method. A de-biased Lasso inference procedure is used for Cox's regression parameters. We adopt a multiple-testing procedure to accurately control the false discovery rate when testing high-dimensional mediation hypotheses. Simulation studies are conducted to demonstrate the performance of our method. We apply this approach to explore the mediation mechanisms of 379 330 DNA methylation markers between smoking and overall survival among lung cancer patients in The Cancer Genome Atlas lung cancer cohort. Two methylation sites (cg08108679 and cg26478297) are identified as potential mediating epigenetic markers. AVAILABILITY AND IMPLEMENTATION: Our proposed method is available with the R package HIMA at https://cran.r-project.org/web/packages/HIMA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Haixiang Zhang
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Yinan Zheng
- Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA
- Lifang Hou
- Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA
- Cheng Zheng
- Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Lei Liu
- Division of Biostatistics, Washington University in St. Louis, St. Louis, MO 63110, USA
8
Loh WW, Moerkerke B, Loeys T, Vansteelandt S. Nonlinear mediation analysis with high-dimensional mediators whose causal structure is unknown. Biometrics 2020; 78:46-59. [PMID: 33215694 DOI: 10.1111/biom.13402] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/20/2020] [Revised: 10/28/2020] [Accepted: 11/11/2020] [Indexed: 11/28/2022]
Abstract
With multiple possible mediators on the causal pathway from a treatment to an outcome, we consider the problem of decomposing the effects along multiple possible causal path(s) through each distinct mediator. Under a path-specific effects framework, such fine-grained decompositions necessitate stringent assumptions, such as correctly specifying the causal structure among the mediators, and no unobserved confounding among the mediators. In contrast, interventional direct and indirect effects for multiple mediators can be identified under much weaker conditions, while providing scientifically relevant causal interpretations. Nonetheless, current estimation approaches require (correctly) specifying a model for the joint mediator distribution, which can be difficult when there is a high-dimensional set of possibly continuous and noncontinuous mediators. In this article, we avoid the need to model this distribution, by developing a definition of interventional effects previously suggested for longitudinal mediation. We propose a novel estimation strategy that uses nonparametric estimates of the (counterfactual) mediator distributions. Noncontinuous outcomes can be accommodated using nonlinear outcome models. Estimation proceeds via Monte Carlo integration. The procedure is illustrated using publicly available genomic data to assess the causal effect of a microRNA expression on the 3-month mortality of brain cancer patients that is potentially mediated by expression values of multiple genes.
Affiliation(s)
- Wen Wei Loh
- Department of Data Analysis, Ghent University, Gent, Belgium
- Tom Loeys
- Department of Data Analysis, Ghent University, Gent, Belgium
- Stijn Vansteelandt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK