1
Ozminkowski S, Solís‐Lemus C. Identifying microbial drivers in biological phenotypes with a Bayesian network regression model. Ecol Evol 2024; 14:e11039. [PMID: 38774136 PMCID: PMC11106058 DOI: 10.1002/ece3.11039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 01/29/2024] [Accepted: 02/03/2024] [Indexed: 05/24/2024] Open
Abstract
In Bayesian Network Regression models, networks are considered the predictors of continuous responses. These models have been successfully used in brain research to identify regions in the brain that are associated with specific human traits, yet their potential to elucidate microbial drivers in biological phenotypes for microbiome research remains unknown. In particular, microbial networks are challenging due to their high dimension and high sparsity compared to brain networks. Furthermore, unlike in brain connectome research, in microbiome research it is usually expected that the presence of microbes has an effect on the response (main effects), not just the interactions. Here, we develop the first thorough investigation of whether Bayesian Network Regression models are suitable for microbial datasets on a variety of synthetic and real data under diverse biological scenarios. We test whether the Bayesian Network Regression model that accounts only for interaction effects (edges in the network) is able to identify key drivers (microbes) in phenotypic variability. We show that this model is indeed able to identify influential nodes and edges in the microbial networks that drive changes in the phenotype for most biological settings, but we also identify scenarios where this method performs poorly, which allows us to provide practical advice for domain scientists aiming to apply these tools to their datasets. BNR models provide a framework for microbiome researchers to identify connections between microbes and measured phenotypes. We enable the use of this statistical model by providing an easy-to-use implementation, publicly available as a Julia package at https://github.com/solislemuslab/BayesianNetworkRegression.jl.
Affiliation(s)
- Samuel Ozminkowski
- Department of Statistics and Wisconsin Institute for Discovery, University of Wisconsin‐Madison, Madison, Wisconsin, USA
- Claudia Solís‐Lemus
- Department of Plant Pathology and Wisconsin Institute for Discovery, University of Wisconsin‐Madison, Madison, Wisconsin, USA
2
Kim J, Shen J, Wang A, Mehrotra DV, Ko S, Zhou JJ, Zhou H. VCSEL: Prioritizing SNP-set by penalized variance component selection. Ann Appl Stat 2021; 15:1652-1672. [DOI: 10.1214/21-aoas1491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Juhyun Kim
- Department of Biostatistics, University of California, Los Angeles
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc
- Anran Wang
- Biostatistics and Research Decision Sciences, Merck & Co., Inc
- Seyoon Ko
- Department of Biostatistics, University of California, Los Angeles
- Jin J. Zhou
- Department of Medicine, University of California, Los Angeles
- Hua Zhou
- Department of Biostatistics, University of California, Los Angeles
3
Chi JT, Ipsen ICF, Hsiao TH, Lin CH, Wang LS, Lee WP, Lu TP, Tzeng JY. SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based Gene-Environment Interaction Tests in Biobank Data. Front Genet 2021; 12:710055. [PMID: 34795690 PMCID: PMC8593472 DOI: 10.3389/fgene.2021.710055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Accepted: 09/13/2021] [Indexed: 11/13/2022] Open
Abstract
The explosion of biobank data offers unprecedented opportunities for gene-environment interaction (G×E) studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, which are a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of multiple variants from a biologically meaningful unit (e.g., gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based G×E tests, to permit G×E VC tests for biobank-scale data. SEAGLE employs modern matrix computations to calculate the test statistic and p-value of the G×E VC test in a computationally efficient fashion, without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10^5, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate the performance of SEAGLE using extensive simulations. We illustrate its utility by conducting genome-wide gene-based G×E analysis on the Taiwan Biobank data to explore the interaction of gene and physical activity status on body mass index.
Affiliation(s)
- Jocelyn T. Chi
- Department of Statistics, North Carolina State University, Raleigh, NC, United States
- Ilse C. F. Ipsen
- Department of Mathematics, North Carolina State University, Raleigh, NC, United States
- Tzu-Hung Hsiao
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Ching-Heng Lin
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Tzu-Pin Lu
- Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Jung-Ying Tzeng
- Department of Statistics, North Carolina State University, Raleigh, NC, United States
- Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
4
Liu JZ, Deng W, Lee J, Lin PID, Valeri L, Christiani DC, Bellinger DC, Wright RO, Mazumdar MM, Coull BA. A Cross-validated Ensemble Approach to Robust Hypothesis Testing of Continuous Nonlinear Interactions: Application to Nutrition-Environment Studies. J Am Stat Assoc 2021; 117:561-573. [PMID: 36310839 PMCID: PMC9611147 DOI: 10.1080/01621459.2021.1962889] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/21/2019] [Revised: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 01/03/2023]
Abstract
Gene-environment and nutrition-environment studies often involve testing of high-dimensional interactions between two sets of variables, each having potentially complex nonlinear main effects on an outcome. Construction of a valid and powerful hypothesis test for such an interaction is challenging, due to the difficulty in constructing an efficient and unbiased estimator for the complex, nonlinear main effects. In this work we address this problem by proposing a Cross-validated Ensemble of Kernels (CVEK) that learns the space of appropriate functions for the main effects using a cross-validated ensemble approach. With a carefully chosen library of base kernels, CVEK flexibly estimates the form of the main-effect functions from the data, and encourages test power by guarding against over-fitting under the alternative. The method is motivated by a study on the interaction between metal exposures in utero and maternal nutrition on children's neurodevelopment in rural Bangladesh. The proposed tests identified evidence of an interaction between minerals and vitamins intake and arsenic and manganese exposures. Results suggest that the detrimental effects of these metals are most pronounced at low intake levels of the nutrients, suggesting nutritional interventions in pregnant women could mitigate the adverse impacts of in utero metal exposures on children's neurodevelopment.
Affiliation(s)
- Jeremiah Zhe Liu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Wenying Deng
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jane Lee
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
- Pi-i Debby Lin
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Linda Valeri
- Department of Biostatistics, Columbia Mailman School of Public Health, New York, New York, USA
- David C. Christiani
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- David C. Bellinger
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
- Robert O. Wright
- Department of Environmental Medicine and Public Health, Icahn School of Medicine, New York, NY, USA
- Maitreyi M. Mazumdar
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
- Brent A. Coull
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
5
Integration of peripheral transcriptomics, genomics, and interactomics following trauma identifies causal genes for symptoms of post-traumatic stress and major depression. Mol Psychiatry 2021; 26:3077-3092. [PMID: 33963278 DOI: 10.1038/s41380-021-01084-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/05/2020] [Revised: 02/26/2021] [Accepted: 03/26/2021] [Indexed: 02/03/2023]
Abstract
Posttraumatic stress disorder (PTSD) is a debilitating syndrome with substantial morbidity and mortality that occurs in the aftermath of trauma. Symptoms of major depressive disorder (MDD) are also a frequent consequence of trauma exposure. Identifying novel risk markers in the immediate aftermath of trauma is a critical step for the identification of novel biological targets to understand mechanisms of pathophysiology and prevention, as well as the determination of patients most at risk who may benefit from immediate intervention. Our study utilizes a novel approach to computationally integrate blood-based transcriptomics, genomics, and interactomics to understand the development of risk vs. resilience in the months following trauma exposure. In a two-site longitudinal, observational prospective study, we assessed over 10,000 individuals and enrolled >700 subjects in the immediate aftermath of trauma (average 5.3 h post-trauma (range 0.5-12 h)) in the Grady Memorial Hospital (Atlanta) and Jackson Memorial Hospital (Miami) emergency departments. RNA expression data and 6-month follow-up data were available for 366 individuals, while genotype, transcriptome, and phenotype data were available for 297 patients. To maximize our power and understanding of genes and pathways that predict risk vs. resilience, we utilized a set-cover approach to capture fluctuations of gene expression of PTSD or depression-converting patients and non-converting trauma-exposed controls to find representative sets of disease-relevant dysregulated genes. We annotated such genes with their corresponding expression quantitative trait loci and applied a variant of a current flow algorithm to identify genes that potentially were causal for the observed dysregulation of disease genes involved in the development of depression and PTSD symptoms after trauma exposure. 
We obtained a final list of 11 driver causal genes related to MDD symptoms, 13 genes for PTSD symptoms, and 22 genes in PTSD and/or MDD. We observed that these individual or combined disorders shared ESR1, RUNX1, PPARA, and WWOX as driver causal genes, while other genes appeared to be causal driver in the PTSD only or MDD only cases. A number of these identified causal pathways have been previously implicated in the biology or genetics of PTSD and MDD, as well as in preclinical models of amygdala function and fear regulation. Our work provides a promising set of initial pathways that may underlie causal mechanisms in the development of PTSD or MDD in the aftermath of trauma.
6
Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits. Sci Rep 2021; 11:7431. [PMID: 33795796 PMCID: PMC8016937 DOI: 10.1038/s41598-021-86871-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2020] [Accepted: 03/22/2021] [Indexed: 11/30/2022] Open
Abstract
After the genome-wide association studies (GWAS) era, whole-genome sequencing is highly engaged in identifying the association of complex traits with rare variations. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and the superior statistical power. The FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered concordant and discordant regions compared to the methods without considering gene by environment interactions.
7
Zhang H, Zhao N, Mehrotra DV, Shen J. Composite Kernel Association Test (CKAT) for SNP-set joint assessment of genotype and genotype-by-treatment interaction in Pharmacogenetics studies. Bioinformatics 2020; 36:3162-3168. [PMID: 32101275 DOI: 10.1093/bioinformatics/btaa125] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/07/2019] [Revised: 02/14/2020] [Accepted: 02/19/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION It is of substantial interest to discover novel genetic markers that influence drug response in order to develop personalized treatment strategies that maximize therapeutic efficacy and safety. To help enable such discoveries, we focus on testing the association between the cumulative effect of multiple single nucleotide polymorphisms (SNPs) in a particular genomic region and a drug response of interest. However, the existing methods are either computationally inefficient or unable to control type I error and provide decent power for whole exome or genome analysis in Pharmacogenetics (PGx) studies with small sample sizes. RESULTS In this article, we propose the Composite Kernel Association Test (CKAT), a flexible and robust kernel machine-based approach to jointly test the genetic main effect and SNP-treatment interaction effect for SNP-sets in PGx assessments embedded within randomized clinical trials. An analytic procedure is developed to accurately calculate the P-value so that computationally intensive procedures (e.g. permutation or perturbation) can be avoided. We evaluate CKAT through extensive simulation studies and application to the gene-level association test of the reduction in Clostridium difficile infection recurrence in patients treated with bezlotoxumab. The results demonstrate that the proposed CKAT controls type I error well for PGx studies, is efficient for whole exome/genome association analysis and provides better power performance than existing methods across multiple scenarios. AVAILABILITY AND IMPLEMENTATION The R package CKAT is publicly available on CRAN https://cran.r-project.org/web/packages/CKAT/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hong Zhang
- Biostatistics and Research Decision Sciences, Merck Research Laboratories, Merck & Co., Inc., Rahway, NJ 07065, USA
- Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck Research Laboratories, Merck & Co., Inc., North Wales, PA 19454, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck Research Laboratories, Merck & Co., Inc., Rahway, NJ 07065, USA
8
Liu L, Wang P, Meng J, Chen L, Zhu W, Ma W. A permutation method for detecting trend correlations in rare variant association studies. Genet Res (Camb) 2019; 101:e13. [PMID: 31831092 PMCID: PMC7044977 DOI: 10.1017/s0016672319000120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/30/2019] [Revised: 09/25/2019] [Accepted: 11/07/2019] [Indexed: 01/14/2023] Open
Abstract
In recent years, there has been an increasing interest in detecting disease-related rare variants in sequencing studies. Numerous studies have shown that common variants can only explain a small proportion of the phenotypic variance for complex diseases. More and more evidence suggests that some of this missing heritability can be explained by rare variants. Considering the importance of rare variants, researchers have proposed a considerable number of methods for identifying the rare variants associated with complex diseases. Extensive research has been carried out on testing the association between rare variants and dichotomous, continuous or ordinal traits. So far, however, there has been little discussion about the case in which both genotypes and phenotypes are ordinal variables. This paper introduces a method based on the γ-statistic, called OV-RV, for examining disease-related rare variants when both genotypes and phenotypes are ordinal. At present, little is known about the asymptotic distribution of the γ-statistic when conducting association analyses for rare variants. One advantage of OV-RV is that it provides a robust estimation of the distribution of the γ-statistic by employing the permutation approach proposed by Fisher. We also perform extensive simulations to investigate the numerical performance of OV-RV under various model settings. The simulation results reveal that OV-RV is valid and efficient; namely, it controls the type I error approximately at the pre-specified significance level and achieves greater power at the same significance level. We also apply OV-RV for rare variant association studies of diastolic blood pressure.
Affiliation(s)
- Lifeng Liu
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China
- Pengfei Wang
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
- Jingbo Meng
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
- Lili Chen
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China
- Wensheng Zhu
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
- Weijun Ma
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China
9
Zhao N, Zhang H, Clark JJ, Maity A, Wu MC. Composite kernel machine regression based on likelihood ratio test for joint testing of genetic and gene–environment interaction effect. Biometrics 2019; 75:625-637. [DOI: 10.1111/biom.13003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/08/2018] [Accepted: 10/09/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Haoyu Zhang
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Jennifer J. Clark
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, North Carolina
- Michael C. Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
10
Shao F, Wang Y, Zhao Y, Yang S. Identifying and exploiting gene-pathway interactions from RNA-seq data for binary phenotype. BMC Genet 2019; 20:36. [PMID: 30890140 PMCID: PMC6423879 DOI: 10.1186/s12863-019-0739-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/24/2018] [Accepted: 03/12/2019] [Indexed: 11/29/2022] Open
Abstract
Background RNA sequencing (RNA-seq) technology has identified multiple differentially expressed (DE) genes associated with complex disease; however, these genes explain only a modest part of the variance. The omnigenic model assumes that disease may be driven by genes with indirect relevance to disease and be propagated by functional pathways. Here, we focus on identifying the interactions between the external genes and functional pathways, referred to as gene-pathway interactions (GPIs). Specifically, relying on the relationship between the garrote kernel machine (GKM) and variance component test and permutations for the empirical distributions of score statistics, we propose an efficient analysis procedure, Permutation based gEne-pAthway interaction identification in binary phenotype (PEA). Results Various simulations show that PEA has well-calibrated type I error rates and higher power than the traditional likelihood ratio test (LRT). In addition, we apply the gene set enrichment algorithms and PEA to identify the GPIs from a pan-cancer dataset (GES68086). These GPIs and genes possibly further illustrate the potential etiology of cancers, most of which are identified and some external genes and significant pathways are consistent with previous studies. Conclusions PEA is an efficient tool for identifying the GPIs from RNA-seq data. It can be further extended to identify the interactions between one variable and one functional set of other omics data for binary phenotypes. Electronic supplementary material The online version of this article (10.1186/s12863-019-0739-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Fang Shao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
- Yaqi Wang
- Department of Pharmacy Informatics, School of Science, China Pharmaceutical University, 24 Tongjia Xiang, Nanjing, Jiangsu, People's Republic of China
- Yang Zhao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
- Sheng Yang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
11
Larson NB, Chen J, Schaid DJ. A review of kernel methods for genetic association studies. Genet Epidemiol 2019; 43:122-136. [PMID: 30604442 DOI: 10.1002/gepi.22180] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/27/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/17/2022]
Abstract
Evaluating the association of multiple genetic variants with a trait of interest by use of kernel-based methods has made a significant impact on how genetic association analyses are conducted. An advantage of kernel methods is that they tend to be robust when the genetic variants have effects that are a mixture of positive and negative effects, as well as when there is a small fraction of causal variants. Another advantage is that kernel methods fit within the framework of mixed models, providing flexible ways to adjust for additional covariates that influence traits. Herein, we review the basic ideas behind the use of kernel methods for genetic association analysis as well as recent methodological advancements for different types of traits, multivariate traits, pedigree data, and longitudinal data. Finally, we discuss opportunities for future research.
Affiliation(s)
- Nicholas B Larson
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Jun Chen
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
12
Gauderman WJ, Mukherjee B, Aschard H, Hsu L, Lewinger JP, Patel CJ, Witte JS, Amos C, Tai CG, Conti D, Torgerson DG, Lee S, Chatterjee N. Update on the State of the Science for Analytical Methods for Gene-Environment Interactions. Am J Epidemiol 2017; 186:762-770. [PMID: 28978192 PMCID: PMC5859988 DOI: 10.1093/aje/kwx228] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 11/07/2016] [Revised: 04/24/2017] [Accepted: 04/25/2017] [Indexed: 12/14/2022] Open
Abstract
The analysis of gene-environment interaction (G×E) may hold the key for further understanding the etiology of many complex traits. The current availability of high-volume genetic data, the wide range in types of environmental data that can be measured, and the formation of consortiums of multiple studies provide new opportunities to identify G×E but also new analytical challenges. In this article, we summarize several statistical approaches that can be used to test for G×E in a genome-wide association study. These include traditional models of G×E in a case-control or quantitative trait study as well as alternative approaches that can provide substantially greater power. The latest methods for analyzing G×E with gene sets and with data in a consortium setting are summarized, as are issues that arise due to the complexity of environmental data. We provide some speculation on why detecting G×E in a genome-wide association study has thus far been difficult. We conclude with a description of software programs that can be used to implement most of the methods described in the paper.
Affiliation(s)
- W. James Gauderman
- Correspondence to Dr. W. James Gauderman, Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, 2001 North Soto Street, 202-K, Los Angeles, CA 90032 (e-mail: )