1
Li R, Zhu X, Lee S. Model Selection for Exposure-Mediator Interaction. Data Science in Science 2024; 3:2360892. [PMID: 38947225 PMCID: PMC11210705 DOI: 10.1080/26941899.2024.2360892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 05/23/2024] [Indexed: 07/02/2024]
Abstract
In mediation analysis, the exposure often influences the mediating effect, i.e., there is an interaction between exposure and mediator on the dependent variable. When the mediator is high-dimensional, it is necessary to identify non-zero mediators (M) and exposure-by-mediator (X-by-M) interactions. Although several high-dimensional mediation methods can naturally handle X-by-M interactions, research is scarce on preserving the underlying hierarchical structure between the main effects and the interactions. To fill this knowledge gap, we develop the XMInt procedure to select M and X-by-M interactions in the high-dimensional mediator setting while preserving the hierarchical structure. Our proposed method employs a sequential regularization-based forward-selection approach to identify the mediators and their hierarchically preserved interactions with exposure. Our numerical experiments showed promising selection results. Further, we applied our method to ADNI morphological data and examined the role of cortical thickness and subcortical volumes in the effect of amyloid-beta accumulation on cognitive performance, which could be helpful in understanding the brain compensation mechanism.
Affiliation(s)
- Ruiyang Li
  - Department of Biostatistics, Columbia University, New York, USA
- Xi Zhu
  - Department of Psychiatry, Columbia University, New York, USA
  - Mental Health Data Science, New York State Psychiatric Institute and Research Foundation for Mental Hygiene, Inc., New York, USA
- Seonjoo Lee
  - Department of Biostatistics, Columbia University, New York, USA
  - Department of Psychiatry, Columbia University, New York, USA
  - Mental Health Data Science, New York State Psychiatric Institute and Research Foundation for Mental Hygiene, Inc., New York, USA
2
Hu W, Chen S, Cai J, Yang Y, Yan H, Chen F. High-dimensional mediation analysis for continuous outcome with confounders using overlap weighting method in observational epigenetic study. BMC Med Res Methodol 2024; 24:125. [PMID: 38831262 PMCID: PMC11145821 DOI: 10.1186/s12874-024-02254-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 05/22/2024] [Indexed: 06/05/2024]
Abstract
BACKGROUND Mediation analysis is a powerful tool for identifying factors that mediate the causal pathway from exposure to health outcomes. It has been extended to study a large number of potential mediators in high-dimensional data settings. Confounding is inevitable in observational studies, so adjusting for potential confounders is an essential part of high-dimensional mediation analysis (HDMA). Although propensity score (PS)-related methods such as propensity score regression adjustment (PSR) and inverse probability weighting (IPW) have been proposed to tackle this problem, PS-based methods can produce biased estimates when the propensity score distribution is extreme. METHODS In this article, we integrated the overlap weighting (OW) technique into the HDMA workflow and proposed a concise and powerful high-dimensional mediation analysis procedure consisting of OW confounding adjustment, sure independence screening (SIS), de-biased Lasso penalization, and joint-significance testing under the mixture null distribution. We compared the proposed method with the existing method consisting of PS-based confounding adjustment, SIS, minimax concave penalty (MCP) variable selection, and classical joint-significance testing. RESULTS Simulation studies demonstrate that the proposed procedure has the best performance in mediator selection and estimation. The proposed procedure yielded the highest true positive rate, an acceptable false discovery proportion level, and a lower mean square error. In the empirical study based on the GSE117859 dataset in the Gene Expression Omnibus database, the proposed method indicated that smoking history may reduce the estimated natural killer (NK) cell level through the mediation effect of some methylation markers, mainly methylation sites cg13917614 in the CNP gene and cg16893868 in the LILRA2 gene.
CONCLUSIONS The proposed method has higher power, sufficient false discovery rate control, and precise mediation effect estimation. Meanwhile, it is feasible to be implemented with the presence of confounders. Hence, our method is worth considering in HDMA studies.
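To make the contrast between overlap weighting and inverse probability weighting concrete, the following minimal Python sketch computes both sets of weights from a given propensity score. It uses the standard textbook definitions (overlap weight 1 - e for exposed units and e for unexposed units; IPW 1/e and 1/(1 - e)). The function names and the toy numbers are illustrative assumptions and are not taken from the article.

```python
def overlap_weights(treatment, propensity):
    """Overlap weights: 1 - e(x) for exposed units, e(x) for unexposed units."""
    return [1.0 - e if t == 1 else e for t, e in zip(treatment, propensity)]


def ipw_weights(treatment, propensity):
    """Inverse probability weights: 1 / e(x) for exposed, 1 / (1 - e(x)) for unexposed."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(treatment, propensity)]


# Toy data (hypothetical): two units have extreme propensity scores.
treatment = [1, 1, 0, 0]
propensity = [0.02, 0.50, 0.50, 0.98]

ow = overlap_weights(treatment, propensity)
ipw = ipw_weights(treatment, propensity)

print("overlap weights:", ow)
print("IPW weights:    ", ipw)
```

In this toy example the overlap weights stay within [0, 1], while the inverse probability weights of the two units with extreme propensity scores become very large, which is the instability the abstract refers to.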
Affiliation(s)
- Weiwei Hu
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
- Shiyu Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
- Jiaxin Cai
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
- Yuhui Yang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
- Hong Yan
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
- Fangyao Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
  - Department of Radiology, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
3
Cai Q, Fu Y, Lyu C, Wang Z, Rao S, Alvarez JA, Bai Y, Kang J, Yu T. A new framework for exploratory network mediator analysis in omics data. Genome Res 2024; 34:642-654. [PMID: 38719472 PMCID: PMC11146592 DOI: 10.1101/gr.278684.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/11/2024] [Indexed: 06/01/2024]
Abstract
Omics methods are widely used in basic biology and translational medicine research. More and more omics data are collected to explain the impact of certain risk factors on clinical outcomes. To explain the mechanism of the risk factors, a core question is how to find the genes/proteins/metabolites that mediate their effects on the clinical outcome. Mediation analysis is a modeling framework to study the relationship between risk factors and pathological outcomes, via mediator variables. However, high-dimensional omics data are far more challenging than traditional data: (1) From tens of thousands of genes, can we overcome the curse of dimensionality to reliably select a set of mediators? (2) How do we ensure that the selected mediators are functionally consistent? (3) Many biological mechanisms contain nonlinear effects. How do we include nonlinear effects in the high-dimensional mediation analysis? (4) How do we consider multiple risk factors at the same time? To meet these challenges, we propose a new exploratory mediation analysis framework, medNet, which focuses on finding mediators through predictive modeling. We propose new definitions for predictive exposure, predictive mediator, and predictive network mediator, using a statistical hypothesis testing framework to identify predictive exposures and mediators. Additionally, two heuristic search algorithms are proposed to identify network mediators, essentially subnetworks in the genome-scale biological network that mediate the effects of single or multiple exposures. We applied medNet on a breast cancer data set and a metabolomics data set combined with food intake questionnaire data. It identified functionally consistent network mediators for the exposures' impact on the outcome, facilitating data interpretation.
Affiliation(s)
- Qingpo Cai
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30322, USA
- Yinghao Fu
  - Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
  - School of Medicine, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
- Cheng Lyu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30322, USA
- Zihe Wang
  - Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
- Shun Rao
  - Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
- Jessica A Alvarez
  - Department of Medicine, Emory University, Atlanta, Georgia 30322, USA
- Yun Bai
  - School of Medicine, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
- Jian Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Tianwei Yu
  - Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
4
Chi S, Flowers CR, Li Z, Huang X, Wei P. MASH: Mediation Analysis of Survival Outcome and High-Dimensional Omics Mediators with Application to Complex Diseases. bioRxiv 2024:2023.08.22.554286. [PMID: 37662296 PMCID: PMC10473652 DOI: 10.1101/2023.08.22.554286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Environmental exposures such as cigarette smoking influence health outcomes through intermediate molecular phenotypes, such as the methylome, transcriptome, and metabolome. Mediation analysis is a useful tool for investigating the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposures and health outcomes. However, little work has been done on mediation analysis when the mediators are high-dimensional and the outcome is a survival endpoint, and none of it has provided a robust measure of total mediation effect. To this end, we propose an estimation procedure for Mediation Analysis of Survival outcome and High-dimensional omics mediators (MASH) based on sure independence screening for putative mediator variable selection and a second-moment-based measure of total mediation effect for survival data analogous to the R² measure in a linear model. Extensive simulations showed good performance of MASH in estimating the total mediation effect and identifying true mediators. By applying MASH to the metabolomics data of 1919 subjects in the Framingham Heart Study, we identified five metabolites as mediators of the effect of cigarette smoking on coronary heart disease risk (total mediation effect, 51.1%) and two metabolites as mediators between smoking and risk of cancer (total mediation effect, 50.7%). Application of MASH to a diffuse large B-cell lymphoma genomics data set identified copy-number variations for eight genes as mediators between the baseline International Prognostic Index score and overall survival.
Affiliation(s)
- Sunyi Chi
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Christopher R Flowers
  - Department of Lymphoma, The University of Texas MD Anderson Cancer Center, Houston, USA
- Ziyi Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Xuelin Huang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peng Wei
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
5
Zhang H, Hong X, Zheng Y, Hou L, Zheng C, Wang X, Liu L. High-dimensional quantile mediation analysis with application to a birth cohort study of mother-newborn pairs. Bioinformatics 2024; 40:btae055. [PMID: 38290773 PMCID: PMC10873903 DOI: 10.1093/bioinformatics/btae055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 02/16/2024] [Accepted: 02/16/2024] [Indexed: 02/01/2024]
Abstract
MOTIVATION There has been substantial recent interest in developing methodology for high-dimensional mediation analysis. Yet, the majority of mediation statistical methods lean heavily on mean regression, which limits their ability to fully capture the complex mediating effects across the outcome distribution. To bridge this gap, we propose a novel approach for selecting and testing mediators throughout the full range of the outcome distribution spectrum. RESULTS The proposed high-dimensional quantile mediation model provides a comprehensive insight into how potential mediators impact outcomes via their mediation pathways. This method's efficacy is demonstrated through extensive simulations. The study presents a real-world data application examining the mediating effects of DNA methylation on the relationship between maternal smoking and offspring birthweight. AVAILABILITY AND IMPLEMENTATION Our method offers a publicly available and user-friendly function qHIMA(), which can be accessed through the R package HIMA at https://CRAN.R-project.org/package=HIMA.
Affiliation(s)
- Haixiang Zhang
  - Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Xiumei Hong
  - Department of Population, Family and Reproductive Health, Center On the Early Life Origins of Disease, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, United States
- Yinan Zheng
  - Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, United States
- Lifang Hou
  - Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, United States
- Cheng Zheng
  - Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Xiaobin Wang
  - Department of Population, Family and Reproductive Health, Center On the Early Life Origins of Disease, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, United States
  - Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States
- Lei Liu
  - Division of Biostatistics, Washington University in St. Louis, St. Louis, MO 63110, United States
6
Dai R, Zheng C. False discovery rate-controlled multiple testing for union null hypotheses: a knockoff-based approach. Biometrics 2023; 79:3497-3509. [PMID: 36854821 PMCID: PMC10460825 DOI: 10.1111/biom.13848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Accepted: 02/17/2023] [Indexed: 03/02/2023]
Abstract
False discovery rate (FDR) controlling procedures provide important statistical guarantees for replicability in signal identification based on multiple hypothesis testing. In many fields of study, FDR controlling procedures are used in high-dimensional (HD) analyses to discover features that are truly associated with the outcome. In some recent applications, data on the same set of candidate features are independently collected in multiple different studies. For example, gene expression data are collected at different facilities and with different cohorts, to identify the genetic biomarkers of multiple types of cancers. These studies provide us with opportunities to identify signals by considering information from different sources (with potential heterogeneity) jointly. This paper is about how to provide FDR control guarantees for the tests of union null hypotheses of conditional independence. We present a knockoff-based variable selection method (Simultaneous knockoffs) to identify mutual signals from multiple independent datasets, providing exact FDR control guarantees under finite sample settings. This method can work with very general model settings and test statistics. We demonstrate the performance of this method with extensive numerical studies and two real-data examples.
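As background for the knockoff-based selection described above, the sketch below shows the data-dependent threshold of the standard single-dataset knockoff+ filter. This is only the generic building block, not the Simultaneous knockoffs procedure itself, whose construction for multiple datasets is not spelled out in the abstract. The feature statistics `W` are assumed to be given (large positive values favor a real signal), and the function name and numbers are hypothetical.

```python
def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= q.

    Returns infinity when no candidate threshold satisfies the bound,
    in which case nothing is selected.
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)
        positives = sum(1 for w in W if w >= t)
        if (1 + negatives) / max(positives, 1) <= q:
            return t
    return float("inf")


# Hypothetical feature statistics for nine candidate features.
W = [5.0, 4.0, 3.5, 3.0, -0.5, 2.0, -1.0, 1.5, 0.2]
threshold = knockoff_plus_threshold(W, q=0.25)
selected = [j for j, w in enumerate(W) if w >= threshold]

print("threshold:", threshold)
print("selected feature indices:", selected)
```

The ratio being bounded is an estimate of the false discovery proportion: features whose knockoff copy looks more important than the original (negative `W`) stand in for false positives among the features with large positive `W`.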
Affiliation(s)
- Ran Dai
  - Department of Biostatistics, University of Nebraska Medical Center, Omaha, Nebraska, U.S.A
7
Liao Y, Deng Y, Yu X, Zhang P, Liu R. The mediating role of AKT/ERK/JNK signaling on the malignant phenotype of microcystin-LR in gastric adenocarcinoma cells. Food Chem Toxicol 2023; 182:114174. [PMID: 37949205 DOI: 10.1016/j.fct.2023.114174] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/17/2023] [Revised: 10/23/2023] [Accepted: 11/02/2023] [Indexed: 11/12/2023]
Abstract
Microcystin-leucine arginine (MC-LR), a widely distributed and highly toxic environmental pollutant, plays crucial roles in cancer malignancy by activating characteristically toxic signaling pathways. Traditional animal-based toxicity evaluation methods have proven insufficient for identifying the specific role of these signaling pathways. Therefore, this study aimed to uncover the regulatory relationship between the toxic pathways and the progression of gastric cancer (GC). The findings provide novel avenues for conducting in vitro toxicity tests based on the investigated pathways. We found that MC-LR promoted the migration and invasion of SGC-7901 cells while simultaneously inhibiting their apoptosis in a dose-dependent manner. This observed cytotoxicity was primarily mediated through the AKT, JNK, and ERK signaling pathways. By using a mediation analysis model, we determined that AKT and ERK exhibited competitive effects in MC-LR-treated GC malignancy, while AKT and JNK acted independently from one another. This study establishes an in vitro toxicity test model of MC-LR based on toxicity-related pathways and underscores the pivotal roles of AKT, ERK, and JNK signaling in MC-LR toxicity. The findings offer a novel, fundamental framework for conducting chemical toxicity risk assessment.
Affiliation(s)
- Yinghao Liao
  - Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing, 210009, China
- Yali Deng
  - Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing, 210009, China
  - Center for Disease Control and Prevention of Huizhou, No. 10, Fumin Road, Huizhou, 516003, Guangdong, China
- Xiaojin Yu
  - Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing, 210009, China
- Peng Zhang
  - Huzhou Center for Disease Prevention and Control, Huzhou, 313000, China
- Ran Liu
  - Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing, 210009, China
8
Chen F, Hu W, Cai J, Chen S, Si A, Zhang Y, Liu W. Instrumental variable-based high-dimensional mediation analysis with unmeasured confounders for survival data in the observational epigenetic study. Front Genet 2023; 14:1092489. [PMID: 36816039 PMCID: PMC9932046 DOI: 10.3389/fgene.2023.1092489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 01/16/2023] [Indexed: 02/04/2023]
Abstract
Background: High-dimensional mediation analysis is frequently conducted to explore the role of epigenetic modifiers between exposure and health outcome. However, the problem of high-dimensional mediation analysis with unmeasured confounders for survival outcomes in observational studies has not been well solved. Methods: In this study, we proposed an instrumental variable-based approach for high-dimensional mediation analysis with unmeasured confounders in survival analysis for epigenetic studies. We used Sobel's test, the joint test, and the bootstrap method to test the mediation effect. A comprehensive simulation study was conducted to decide the best test strategy. An empirical study based on DNA methylation data of lung cancer patients was conducted to illustrate the performance of the proposed method. Results: The simulation study suggested that the proposed method performed well in identifying mediating factors. The estimation of the mediation effect by the proposed approach is also reliable, with less bias than the classical approach. In the empirical study, using the proposed approach, we identified two DNA methylation signatures, cg21926276 and cg26387355, with mediation effects of 0.226 (95% CI: 0.108-0.344) and 0.158 (95% CI: 0.065-0.251) between smoking and lung cancer. Conclusion: The proposed method performed well in both simulation and empirical studies and could be an effective statistical tool for high-dimensional mediation analysis.
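For readers unfamiliar with Sobel's test, one of the three tests named in the Methods, the sketch below shows its generic first-order form for a single mediator: the indirect effect a*b divided by the standard error sqrt(a² se_b² + b² se_a²), compared against a standard normal distribution. How the path coefficients are obtained in the instrumental-variable survival setting of the article is not covered here, and the input numbers are made up for illustration.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """First-order Sobel test for the indirect effect a * b.

    a, se_a: exposure-to-mediator coefficient and its standard error.
    b, se_b: mediator-to-outcome coefficient and its standard error.
    """
    indirect = a * b
    se = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    z = indirect / se
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return indirect, se, z, p_value


# Hypothetical path estimates.
indirect, se, z, p_value = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"indirect={indirect:.3f} se={se:.4f} z={z:.3f} p={p_value:.4f}")
```

The normal approximation is known to be conservative because the product of two estimates is not normally distributed, which is one reason joint tests and the bootstrap are commonly compared against it.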
Affiliation(s)
- Fangyao Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, Shaanxi, China
  - Department of Radiology, First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Weiwei Hu
  - Department of Radiology, First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Jiaxin Cai
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, Shaanxi, China
- Shiyu Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, Shaanxi, China
- Aima Si
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, Shaanxi, China
- Yuxiang Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, Shaanxi, China
- Wei Liu
  - Department of Cell Biology and Genetics, School of Basic Medical Science, Xi’an Jiaotong University Health Science Center, Xi’an, Shaanxi, China
  - Correspondence: Wei Liu
9
Han Q, Wang Y, Sun N, Chu J, Hu W, Shen Y. Mediation analysis method review of high throughput data. Stat Appl Genet Mol Biol 2023; 22:sagmb-2023-0031. [PMID: 38015771 DOI: 10.1515/sagmb-2023-0031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 11/11/2023] [Indexed: 11/30/2023]
Abstract
High-throughput technologies have made high-dimensional settings increasingly common, providing opportunities for the development of high-dimensional mediation methods. We aimed to provide useful guidance for researchers using high-dimensional mediation analysis, and ideas for biostatisticians developing it, by summarizing and discussing recent advances in the field. Mediation analysis still faces many challenges when single and multiple mediation analyses are extended to high-dimensional settings. High-dimensional mediation methods attempt to address these issues by screening true mediators, estimating mediation effects through variable selection, reducing the mediator dimension to resolve correlations between variables, and using composite null hypothesis testing. Although these problems have been solved to some extent, some challenges remain. First, the correlations between mediators are rarely considered when variables are selected for mediation. Second, dimension reduction without incorporating prior biological knowledge makes the results difficult to interpret. In addition, a method of sensitivity analysis for the strict sequential ignorability assumption in high-dimensional mediation analysis is still lacking. An analyst needs to consider the applicability of each method when using it, while a biostatistician could consider extensions and improvements in the methodology.
Affiliation(s)
- Qiang Han
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Yu Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Na Sun
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Jiadong Chu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Wei Hu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Yueping Shen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
10
Tian P, Yao M, Huang T, Liu Z. CoxMKF: a knockoff filter for high-dimensional mediation analysis with a survival outcome in epigenetic studies. Bioinformatics 2022; 38:5229-5235. [PMID: 36255264 DOI: 10.1093/bioinformatics/btac687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 10/11/2022] [Accepted: 10/17/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION It is of scientific interest to identify DNA methylation CpG sites that might mediate the effect of an environmental exposure on a survival outcome in high-dimensional mediation analysis. However, there is a lack of powerful statistical methods that can provide a guarantee of false discovery rate (FDR) control in finite-sample settings. RESULTS In this article, we propose a novel method called CoxMKF, which applies aggregation of multiple knockoffs to a Cox proportional hazards model for a survival outcome with high-dimensional mediators. The proposed CoxMKF can achieve FDR control even in finite-sample settings, which is particularly advantageous when the sample size is not large. Moreover, our proposed CoxMKF can overcome the randomness of the unstable model-X knockoffs. Our simulation results show that CoxMKF controls FDR well in finite samples. We further apply CoxMKF to a lung cancer dataset from The Cancer Genome Atlas (TCGA) project with 754 subjects and 365 306 DNA methylation CpG sites, and identify four DNA methylation CpG sites that might mediate the effect of smoking on the overall survival among lung cancer patients. AVAILABILITY AND IMPLEMENTATION The R package CoxMKF is publicly available at https://github.com/MinhaoYaooo/CoxMKF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peixin Tian
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong SAR 999077, China
- Minhao Yao
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong SAR 999077, China
- Tao Huang
  - Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
  - Key Laboratory of Molecular Cardiovascular Sciences (Peking University), Ministry of Education, Beijing 100191, China
- Zhonghua Liu
  - Department of Biostatistics, Columbia University, New York, NY 10032, USA
11
Luo L, Yan Y, Cui Y, Yuan X, Yu Z. Linear high-dimensional mediation models adjusting for confounders using propensity score method. Front Genet 2022; 13:961148. [PMID: 36299590 PMCID: PMC9589256 DOI: 10.3389/fgene.2022.961148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022]
Abstract
High-dimensional mediation analysis has been developed to study whether epigenetic phenotype in a high-dimensional data form would mediate the causal pathway of exposure to disease. However, most existing models are designed based on the assumption that there are no confounders between the exposure, the mediators, and the outcome. In practice, this assumption may not be feasible since high-dimensional mediation analysis (HIMA) tends to be observational where a randomized controlled trial (RCT) cannot be conducted for some economic or ethical reasons. Thus, to deal with the confounders in HIMA cases, we proposed three propensity score-related approaches named PSR (propensity score regression), PSW (propensity score weighting), and PSU (propensity score union) to adjust for the confounder bias in HIMA, and compared them with the traditional covariate regression method. The procedures mainly include four parts: calculating the propensity score, sure independence screening, MCP (minimax concave penalty) variable selection, and joint-significance testing. Simulation results show that the PSU model is the most recommended. Applying our models to the TCGA lung cancer dataset, we find that smoking may lead to lung disease through the mediation effect of some specific DNA-methylation sites, including site Cg24480765 in gene RP11-347H15.2 and site Cg22051776 in gene KLF3.
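The final step of the four-part procedure, joint-significance testing, can be illustrated with a short sketch. In its classical form, a mediator is declared significant only if both the exposure-to-mediator path and the mediator-to-outcome path are significant, so the joint p-value is the maximum of the two path p-values. The abstract does not state which multiple-testing correction the authors apply, so the fixed significance level below, like the function name and the p-values, is an assumption for illustration.

```python
def joint_significance(p_alpha, p_beta, level=0.05):
    """Classical joint-significance (max-P) test for each candidate mediator.

    p_alpha[k]: p-value for the exposure -> mediator k path.
    p_beta[k]:  p-value for the mediator k -> outcome path.
    Returns the joint p-values and the indices of mediators at or below `level`.
    """
    if len(p_alpha) != len(p_beta):
        raise ValueError("p_alpha and p_beta must have the same length")
    p_joint = [max(pa, pb) for pa, pb in zip(p_alpha, p_beta)]
    selected = [k for k, p in enumerate(p_joint) if p <= level]
    return p_joint, selected


# Hypothetical path p-values for three candidate mediators.
p_alpha = [0.001, 0.300, 0.004]
p_beta = [0.020, 0.001, 0.200]
p_joint, selected = joint_significance(p_alpha, p_beta)

print("joint p-values:", p_joint)
print("selected mediators:", selected)
```

Only the first mediator is retained here: the second fails on the exposure-to-mediator path and the third on the mediator-to-outcome path, even though each has one highly significant path.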
Affiliation(s)
- Linghao Luo
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Yuting Yan
  - Jinmai Community Service Center, Guiyang, China
- Yidan Cui
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Xin Yuan
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Zhangsheng Yu
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
  - Clinical Research Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Correspondence: Zhangsheng Yu
12
High-dimensional causal mediation analysis based on partial linear structural equation models. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
13
|
Zhang L, Jiang H, Zhu Z, Liu J, Li B. Integrating CRISPR/Cas within isothermal amplification for point-of-Care Assay of nucleic acid. Talanta 2022; 243:123388. [DOI: 10.1016/j.talanta.2022.123388] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/06/2021] [Revised: 03/03/2022] [Accepted: 03/11/2022] [Indexed: 12/14/2022]
|
14
|
Cui Y, Luo C, Luo L, Yu Z. High-Dimensional Mediation Analysis Based on Additive Hazards Model for Survival Data. Front Genet 2021; 12:771932. [PMID: 35003213 PMCID: PMC8734376 DOI: 10.3389/fgene.2021.771932] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/07/2021] [Accepted: 10/19/2021] [Indexed: 11/13/2022]
Abstract
Mediation analysis has been extensively used to identify potential pathways between exposure and outcome. However, analytical methods for high-dimensional mediation analysis of survival data remain underdeveloped, especially for non-Cox model approaches. We propose a procedure that combines "two-step" variable selection with indirect effect estimation for the additive hazards model with high-dimensional mediators. We first apply sure independence screening and smoothly clipped absolute deviation (SCAD) regularization to select mediators. Then we use the Sobel test and the Benjamini-Hochberg (BH) method for indirect effect hypothesis testing. Simulation results demonstrate good performance, with a higher true-positive rate and accuracy, as well as a lower false-positive rate. We apply the proposed procedure to analyze DNA methylation markers mediating the effect of smoking on survival time of lung cancer patients in a TCGA (The Cancer Genome Atlas) cohort study. The real data application identifies four mediating CpGs, three of which are newly found.
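The testing step described in this abstract, a Sobel test per mediator followed by BH adjustment, can be illustrated with a short generic sketch. It assumes that path estimates `a` (exposure to mediator) and `b` (mediator to outcome) and their standard errors are already available from fitted models; it does not reproduce the additive hazards fitting itself, and the function names are hypothetical.

```python
import math


def sobel_p_value(a, se_a, b, se_b):
    """Two-sided p-value of the Sobel z statistic for the indirect effect a*b."""
    z = (a * b) / math.sqrt(a * a * se_b * se_b + b * b * se_a * se_a)
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical path estimates (a, se_a, b, se_b) for three selected mediators.
paths = [(0.5, 0.1, 0.4, 0.1), (0.05, 0.1, 0.4, 0.1), (0.3, 0.1, 0.02, 0.1)]
p_raw = [sobel_p_value(*p) for p in paths]
p_adj = benjamini_hochberg(p_raw)
significant = [k for k, p in enumerate(p_adj) if p < 0.05]
```

Mediators whose adjusted p-value falls below the chosen false discovery rate level are then reported as significant.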
Affiliation(s)
- Yidan Cui
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Chengwen Luo
  - Public Laboratory, Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Linhai, Zhejiang, China
- Linghao Luo
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Zhangsheng Yu
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
  - Clinical Research Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
|
15
|
Zhang H, Zheng Y, Hou L, Zheng C, Liu L. Mediation analysis for survival data with high-dimensional mediators. Bioinformatics 2021; 37:3815-3821. [PMID: 34343267 PMCID: PMC8570823 DOI: 10.1093/bioinformatics/btab564] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/01/2021] [Revised: 07/18/2021] [Accepted: 07/29/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Mediation analysis has become a prevalent method to identify causal pathway(s) between an independent variable and a dependent variable through intermediate variable(s). However, little work has been done when the intermediate variables (mediators) are high-dimensional and the outcome is a survival endpoint. In this paper, we introduce a novel method to identify potential mediators in a causal framework of high-dimensional Cox regression.
RESULTS: We first reduce the data dimension through a mediation-based sure independence screening method. A de-biased Lasso inference procedure is used for Cox's regression parameters. We adopt a multiple-testing procedure to accurately control the false discovery rate when testing high-dimensional mediation hypotheses. Simulation studies are conducted to demonstrate the performance of our method. We apply this approach to explore the mediation mechanisms of 379 330 DNA methylation markers between smoking and overall survival among lung cancer patients in The Cancer Genome Atlas lung cancer cohort. Two methylation sites (cg08108679 and cg26478297) are identified as potential mediating epigenetic markers.
AVAILABILITY AND IMPLEMENTATION: Our proposed method is available with the R package HIMA at https://cran.r-project.org/web/packages/HIMA/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
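The dimension-reduction step mentioned here can be illustrated with a simplified screening sketch. This is not the mediation-based statistic used by the authors or by the HIMA package: as an assumption for illustration, mediators are ranked by absolute marginal Pearson correlation with the exposure, and the default number retained, ceil(n / log n), is an illustrative choice. The function names are hypothetical.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation of two equal-length, non-constant sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def sis_screen(exposure, mediators, d=None):
    """Indices of the d mediators most correlated with the exposure.

    `mediators` is a list of columns, one list of n values per candidate.
    """
    n = len(exposure)
    if d is None:
        d = math.ceil(n / math.log(n))  # illustrative default, an assumption
    scores = [abs(pearson(exposure, m)) for m in mediators]
    ranked = sorted(range(len(mediators)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:d])


# Hypothetical toy data: six subjects, three candidate mediators.
exposure = [1, 2, 3, 4, 5, 6]
mediators = [
    [2, 4, 6, 8, 10, 12],      # strongly positively related to the exposure
    [1, -1, 1, -1, 1, -1],     # weakly related
    [6, 5, 4, 3, 2, 1.5],      # strongly negatively related
]
kept = sis_screen(exposure, mediators, d=2)
```

Only the retained mediators would then be passed to the penalized or de-biased regression step, which keeps that step computationally feasible when the number of candidates is far larger than the sample size.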
Affiliation(s)
- Haixiang Zhang
  - Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Yinan Zheng
  - Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA
- Lifang Hou
  - Department of Preventive Medicine, Northwestern University, Chicago, IL 60611, USA
- Cheng Zheng
  - Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Lei Liu
  - Division of Biostatistics, Washington University in St. Louis, St. Louis, MO 63110, USA
|
16
|
Yang T, Niu J, Chen H, Wei P. Estimation of total mediation effect for high-dimensional omics mediators. BMC Bioinformatics 2021; 22:414. [PMID: 34425752 PMCID: PMC8381496 DOI: 10.1186/s12859-021-04322-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/28/2020] [Accepted: 08/10/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Environmental exposures can regulate intermediate molecular phenotypes, such as gene expression, by different mechanisms and thereby lead to various health outcomes. It is of significant scientific interest to unravel the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposure and traits. Mediation analysis is an important tool for investigating such relationships. However, it has mainly focused on low-dimensional settings, and there is a lack of a good measure of the total mediation effect. Here, we extend an R-squared (R²) effect size measure, originally proposed in the single-mediator setting, to the moderate- and high-dimensional mediator settings in the mixed model framework.
RESULTS: Based on extensive simulations, we compare our measure and estimation procedure with several frequently used mediation measures, including product, proportion, and ratio measures. Our R²-based second-moment measure has small bias and variance under the correctly specified model. To mitigate potential bias induced by non-mediators, we examine two variable selection procedures, i.e., iterative sure independence screening and false discovery rate control, to exclude the non-mediators. We establish the consistency of the proposed estimation procedures and introduce a resampling-based confidence interval. By applying the proposed estimation procedure, we found that 38% of the age-related variations in systolic blood pressure can be explained by gene expression profiles in the Framingham Heart Study of 1711 individuals. An R package "RsqMed" is available on CRAN.
CONCLUSION: R-squared (R²) is an effective and efficient measure of the total mediation effect, especially in the high-dimensional setting.
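For orientation, the single-mediator R² measure that this abstract builds on is commonly written as the variance in the outcome explained by the mediator, plus that explained by the exposure, minus that explained by both together. The sketch below computes this from pairwise correlations. The formula is stated here as an assumption, since the abstract does not spell it out, and the sketch does not implement the mixed-model extension or the RsqMed package; `r2_mediated` is a hypothetical name.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation of two equal-length, non-constant sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def r2_mediated(x, m, y):
    """Single-mediator R^2 measure: R2(y~m) + R2(y~x) - R2(y~x+m)."""
    r_yx = pearson(y, x)
    r_ym = pearson(y, m)
    r_xm = pearson(x, m)
    # R^2 of the two-predictor regression, expressed via pairwise correlations.
    r2_full = (r_yx ** 2 + r_ym ** 2 - 2 * r_yx * r_ym * r_xm) / (1 - r_xm ** 2)
    return r_ym ** 2 + r_yx ** 2 - r2_full


# Hypothetical data where x and m are uncorrelated: nothing is shared, so the
# measure is zero even though both variables predict y.
x = [-1, 1, -1, 1]
m = [-1, -1, 1, 1]
y = [a + b for a, b in zip(x, m)]
no_overlap = r2_mediated(x, m, y)
```

The measure captures the outcome variance that the exposure and mediator explain jointly, which is why it vanishes when the two are uncorrelated.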
Affiliation(s)
- Tianzhong Yang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, USA
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, USA
  - Division of Biostatistics, University of Minnesota, Minneapolis, USA
- Jingbo Niu
  - Section of Nephrology, Baylor College of Medicine, Houston, USA
- Han Chen
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, USA
  - Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, USA
- Peng Wei
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, USA
|
17
|
Shao Z, Wang T, Zhang M, Jiang Z, Huang S, Zeng P. IUSMMT: Survival mediation analysis of gene expression with multiple DNA methylation exposures and its application to cancers of TCGA. PLoS Comput Biol 2021; 17:e1009250. [PMID: 34464378 PMCID: PMC8437300 DOI: 10.1371/journal.pcbi.1009250] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/13/2021] [Revised: 09/13/2021] [Accepted: 07/06/2021] [Indexed: 02/07/2023]
Abstract
Effective and powerful survival mediation models are currently lacking. To partly fill this knowledge gap, we focus on mediation analysis that includes multiple DNA methylations acting as exposures, one gene expression as the mediator, and one survival time as the outcome. We proposed IUSMMT (intersection-union survival mixture-adjusted mediation test) to effectively examine the existence of a mediation effect by fitting an empirical three-component mixture null distribution. With extensive simulation studies, we demonstrated the advantage of IUSMMT over existing methods. We applied IUSMMT to ten TCGA cancers and identified multiple genes that exhibited mediating effects. We further revealed that most of the identified regions, in which genes behaved as active mediators, were cancer type-specific and exhibited full mediation from DNA methylation CpG sites to the survival risk of various types of cancers. Overall, IUSMMT represents an effective and powerful alternative for survival mediation analysis; our results also provide new insights into the functional role of DNA methylation and gene expression in cancer progression/prognosis and suggest potential therapeutic targets for future clinical practice.
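The intersection-union principle in the method's name can be shown with a plain, unadjusted sketch: a mediation effect is declared only if both the exposure-to-mediator path and the mediator-to-outcome path are significant, so the test's p-value is the larger of the two path p-values. The mixture-adjusted null distribution that distinguishes IUSMMT is not implemented here; the function names and the Bonferroni cutoff are illustrative assumptions.

```python
def intersection_union_p(p_exposure_path, p_outcome_path):
    """Max-P p-value for each gene: both paths must be significant."""
    if len(p_exposure_path) != len(p_outcome_path):
        raise ValueError("path p-value lists must have equal length")
    return [max(pa, pb) for pa, pb in zip(p_exposure_path, p_outcome_path)]


def select_mediators(p_exposure_path, p_outcome_path, level=0.05):
    """Indices whose max-P value passes a Bonferroni-corrected threshold."""
    p_joint = intersection_union_p(p_exposure_path, p_outcome_path)
    if not p_joint:
        return []
    cutoff = level / len(p_joint)
    return [k for k, p in enumerate(p_joint) if p < cutoff]


# Hypothetical path p-values for three genes.
p_a = [0.001, 0.200, 0.004]
p_b = [0.010, 0.001, 0.300]
selected = select_mediators(p_a, p_b)
```

The plain max-P test is known to be conservative under the composite null, where at most one path is non-zero, which is the motivation for fitting an empirical null as the abstract describes.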
Affiliation(s)
- Zhonghe Shao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Meng Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Zhou Jiang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Shuiping Huang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
|
18
|
Yu Z, Cui Y, Wei T, Ma Y, Luo C. High-Dimensional Mediation Analysis With Confounders in Survival Models. Front Genet 2021; 12:688871. [PMID: 34262599 PMCID: PMC8273300 DOI: 10.3389/fgene.2021.688871] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/31/2021] [Accepted: 06/07/2021] [Indexed: 12/02/2022]
Abstract
Mediation analysis is a common statistical method for investigating the mechanism of environmental exposures on health outcomes. Previous studies have extended mediation models with a single mediator to high-dimensional mediator selection. It is often assumed that there are no confounders that influence the relations among the exposure, mediator, and outcome. This is not realistic for observational studies. To accommodate potential confounders, we propose a concise and efficient high-dimensional mediation analysis procedure using the propensity score for adjustment. Results from simulation studies demonstrate that the proposed procedure performs well in mediator selection and effect estimation compared with methods that ignore all confounders. Of note, as the sample size increases, the performance of variable selection and mediation effect estimation becomes as good as that of the method which includes all confounders as covariates in the mediation model. By applying this procedure to a TCGA lung cancer data set, we find that lung cancer patients with a serious smoking history have an increased risk of death via the methylation markers cg21926276 and cg20707991, with significant hazard ratios of 1.2093 (95% CI: 1.2019-1.2167) and 1.1388 (95% CI: 1.1339-1.1438), respectively.
Affiliation(s)
- Zhangsheng Yu
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
  - Clinical Research Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yidan Cui
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Ting Wei
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Yanran Ma
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Chengwen Luo
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
|
19
|
Zeng P, Shao Z, Zhou X. Statistical methods for mediation analysis in the era of high-throughput genomics: Current successes and future challenges. Comput Struct Biotechnol J 2021; 19:3209-3224. [PMID: 34141140 PMCID: PMC8187160 DOI: 10.1016/j.csbj.2021.05.042] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/10/2021] [Revised: 05/21/2021] [Accepted: 05/21/2021] [Indexed: 12/12/2022]
Abstract
Mediation analysis investigates the intermediate mechanism through which an exposure exerts its influence on the outcome of interest. Mediation analysis is becoming increasingly popular in high-throughput genomics studies where a common goal is to identify molecular-level traits, such as gene expression or methylation, which actively mediate the genetic or environmental effects on the outcome. Mediation analysis in genomics studies is particularly challenging, however, owing to the large number of potential mediators measured in these studies as well as the composite null nature of the mediation effect hypothesis. Indeed, while the standard univariate and multivariate mediation methods are well established for analyzing one or multiple mediators, they are not well suited for genomics studies with a large number of mediators and often yield conservative p-values and limited power. Consequently, over the past few years many new high-dimensional mediation methods have been developed for analyzing the large number of potential mediators collected in high-throughput genomics studies. In this work, we present a thorough review of these important recent methodological advances in high-dimensional mediation analysis. Specifically, we describe in detail more than ten high-dimensional mediation methods, focusing on their motivations, basic modeling ideas, specific modeling assumptions, practical successes, methodological limitations, as well as future directions. We hope our review will serve as useful guidance for statisticians and computational biologists who develop methods of high-dimensional mediation analysis as well as for analysts who apply mediation methods to high-throughput genomics studies.
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhonghe Shao
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor 48109, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor 48109, MI, USA
|