1
Behr M, Kumbier K, Cordova-Palomera A, Aguirre M, Ronen O, Ye C, Ashley E, Butte AJ, Arnaout R, Brown B, Priest J, Yu B. Learning epistatic polygenic phenotypes with Boolean interactions. PLoS One 2024; 19:e0298906. [PMID: 38625909 PMCID: PMC11020961 DOI: 10.1371/journal.pone.0298906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 01/31/2024] [Indexed: 04/18/2024]
Abstract
Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and genome-wide large-scale data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R and novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a variant previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.
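The contrast the abstract draws between multiplicative regression terms and Boolean interactions can be illustrated with a small sketch. This is not the epiTree implementation; the function names and the dosage thresholds `t1` and `t2` are hypothetical and chosen only for illustration.

```python
import numpy as np

def multiplicative_term(g1, g2):
    """Product of two genotype dosages (0, 1, or 2 minor alleles)."""
    return np.asarray(g1) * np.asarray(g2)

def boolean_and_term(g1, g2, t1=1, t2=1):
    """Boolean AND rule: 1 when both dosages reach their thresholds, else 0.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return ((np.asarray(g1) >= t1) & (np.asarray(g2) >= t2)).astype(int)

# Five hypothetical individuals, two variants.
g1 = np.array([0, 1, 2, 2, 1])
g2 = np.array([0, 1, 1, 2, 0])

prod = multiplicative_term(g1, g2)  # grows with dosage: 0, 1, 2, 4, 0
rule = boolean_and_term(g1, g2)     # saturates at 1:    0, 1, 1, 1, 0
```

The product keeps growing with dosage, while the Boolean rule only records whether both conditions hold, which is the kind of threshold behavior a tree split can represent directly.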
Affiliation(s)
- Merle Behr
  - Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany
- Karl Kumbier
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States of America
- Matthew Aguirre
  - Department of Pediatrics, Stanford Medicine, Stanford, CA, United States of America
  - Department of Biomedical Data Science, Stanford Medicine, Stanford, CA, United States of America
- Omer Ronen
  - Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America
- Chengzhong Ye
  - Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America
- Euan Ashley
  - Division of Cardiovascular Medicine, Stanford Medicine, Stanford, CA, United States of America
- Atul J. Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States of America
- Rima Arnaout
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States of America
  - Division of Cardiology, Department of Medicine, University of California, San Francisco, San Francisco, CA, United States of America
- Ben Brown
  - Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America
  - Biosciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- James Priest
  - Department of Pediatrics, Stanford Medicine, Stanford, CA, United States of America
- Bin Yu
  - Department of Statistics, University of California at Berkeley, Berkeley, CA, United States of America
  - Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California at Berkeley, Berkeley, CA, United States of America
2
Richard-St-Hilaire A, Gamache I, Pelletier J, Grenier JC, Poujol R, Hussin JG. Signatures of Co-evolution and Co-regulation in the CYP3A and CYP4F Genes in Humans. Genome Biol Evol 2024; 16:evad236. [PMID: 38207129 PMCID: PMC10805436 DOI: 10.1093/gbe/evad236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/13/2024]
Abstract
Cytochromes P450 (CYP450) are hemoproteins generally involved in detoxifying the body of xenobiotic molecules. They participate in the metabolism of many drugs, and genetic polymorphisms in humans have been found to impact drug responses and metabolic functions. In this study, we investigate the genetic diversity of CYP450 genes. We found that two clusters, CYP3A and CYP4F, are notably differentiated across human populations, with evidence for selective pressures acting on both clusters: we found signals of recent positive selection in CYP3A and CYP4F genes and signals of balancing selection in CYP4F genes. Furthermore, an extensive amount of unusual linkage disequilibrium is detected in this latter cluster, indicating co-evolution signatures among CYP4F genes. Several of the selective signals uncovered co-localize with expression quantitative trait loci (eQTL), which could suggest epistasis acting on co-regulation in these gene families. In particular, we detected a potential co-regulation event between CYP3A5 and CYP3A43, a gene whose function remains poorly characterized. We further identified a causal relationship between CYP3A5 expression and reticulocyte count through Mendelian randomization analyses, potentially involving a regulatory region displaying a selective signal specific to African populations. Our findings linking natural selection and gene expression in CYP3A and CYP4F subfamilies are of importance in understanding population differences in metabolism of nutrients and drugs.
Affiliation(s)
- Alex Richard-St-Hilaire
  - Département de biochimie et médecine moléculaire, Université de Montréal, Montreal, QC, Canada
  - Sainte-Justine Hospital, Research Center, Montreal, QC, Canada
- Isabel Gamache
  - Département de biochimie et médecine moléculaire, Université de Montréal, Montreal, QC, Canada
  - Montreal Heart Institute, Research Center, Montreal, QC, Canada
- Justin Pelletier
  - Département de biochimie et médecine moléculaire, Université de Montréal, Montreal, QC, Canada
  - McGill CERC in Genomic Medicine, McGill University, Montreal, Canada
- Raphaël Poujol
  - Montreal Heart Institute, Research Center, Montreal, QC, Canada
- Julie G Hussin
  - Montreal Heart Institute, Research Center, Montreal, QC, Canada
  - Département de médecine, Université de Montréal, Montreal, QC, Canada
  - Mila-Quebec AI institute, Montreal, QC, Canada
3
Yashin AI, Wu D, Arbeev K, Yashkin AP, Akushevich I, Bagley O, Duan M, Ukraintseva S. Roles of interacting stress-related genes in lifespan regulation: insights for translating experimental findings to humans. Journal of Translational Genetics and Genomics 2021; 5:357-379. [PMID: 34825130 PMCID: PMC8612394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
AIM Experimental studies have provided ample evidence that caloric/dietary restriction may improve health and increase the lifespan of laboratory animals, and that the interplay among molecules that sense cellular stress signals and those regulating cell survival can play a crucial role in cell response to nutritional stressors. However, it is unclear whether the interplay among corresponding genes also plays a role in human health and lifespan. METHODS Literature on the roles of cellular stressors, such as amino acid deprivation, and of the integrated stress response (ISR) pathway in health and aging has been reviewed. Single nucleotide polymorphisms (SNPs) in two candidate genes (GCN2/EIF2AK4 and CHOP/DDIT3) that are closely involved in the cellular stress response to amino acid starvation have been selected using information from experimental studies. Associations of these SNPs and their interactions with human survival in the Health and Retirement Study data have been estimated. The impact of collective associations of multiple interacting SNP pairs on survival has been evaluated using a recently developed composite index: the SNP-specific Interaction Polygenic Risk Score (SIPRS). RESULTS Significant interactions have been found between SNPs from the GCN2/EIF2AK4 and CHOP/DDIT3 genes that were associated with survival 85+ compared to survival between ages 75 and 85 in the total sample (males and females combined) and in females only. This may reflect sex differences in genetic regulation of the human lifespan. Highly statistically significant associations of SIPRS [constructed for rs16970024 (GCN2/EIF2AK4) and rs697221 (CHOP/DDIT3)] with survival in both sexes have also been found in this study. CONCLUSION Identifying associations of genetic interactions with human survival is an important step in translating knowledge from experimental to human aging research. The significant associations of multiple SNPxSNP interactions in ISR genes with survival to the oldest old age found in this study can help uncover mechanisms of multifactorial regulation of human lifespan and its heterogeneity.
4
Dannemann M, He Z, Heide C, Vernot B, Sidow L, Kanton S, Weigert A, Treutlein B, Pääbo S, Kelso J, Camp JG. Human Stem Cell Resources Are an Inroad to Neandertal DNA Functions. Stem Cell Reports 2020; 15:214-225. [PMID: 32559457 PMCID: PMC7363959 DOI: 10.1016/j.stemcr.2020.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/30/2019] [Revised: 05/21/2020] [Accepted: 05/22/2020] [Indexed: 02/07/2023]
Abstract
Induced pluripotent stem cells (iPSCs) from diverse humans offer the potential to study human functional variation in controlled culture environments. A portion of this variation originates from an ancient admixture between modern humans and Neandertals, which introduced alleles that left a phenotypic legacy on individual humans today. Here, we show that a large iPSC repository harbors extensive Neandertal DNA, including alleles that contribute to human phenotypes and diseases, encode hundreds of amino acid changes, and alter gene expression in specific tissues. We provide a database of the inferred introgressed Neandertal alleles for each individual iPSC line, together with the annotation of the predicted functional variants. We also show that transcriptomic data from organoids generated from iPSCs can be used to track Neandertal-derived RNA over developmental processes. Human iPSC resources provide an opportunity to experimentally explore Neandertal DNA function and its contribution to present-day phenotypes, and potentially study Neandertal traits.
Affiliation(s)
- Michael Dannemann
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany; Institute of Genomics, University of Tartu, Tartu, Estonia
- Zhisong He
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Christian Heide
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Benjamin Vernot
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Leila Sidow
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Sabina Kanton
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anne Weigert
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Barbara Treutlein
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany; Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Svante Pääbo
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Janet Kelso
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- J Gray Camp
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany; Institute of Molecular and Clinical Ophthalmology Basel, Basel, Switzerland; Department of Ophthalmology, University of Basel, Basel, Switzerland
5
Wen J, Ford CT, Janies D, Shi X. A parallelized strategy for epistasis analysis based on Empirical Bayesian Elastic Net models. Bioinformatics 2020; 36:3803-3810. [PMID: 32227194 DOI: 10.1093/bioinformatics/btaa216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/19/2019] [Revised: 03/05/2020] [Accepted: 03/26/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION Epistasis reflects the distortion on a particular trait or phenotype resulting from the combinatorial effect of two or more genes or genetic variants. Epistasis is an important genetic foundation underlying quantitative traits in many organisms as well as in complex human diseases. However, there are two major barriers to identifying epistasis using large genomic datasets. One is that epistasis analysis will induce over-fitting of an over-saturated model given the high dimensionality of a genomic dataset. Therefore, the problem of identifying epistasis demands efficient statistical methods. The second barrier comes from the intensive computing time for epistasis analysis, even when the appropriate model and data are specified. RESULTS In this study, we combine statistical techniques and computational techniques to scale up epistasis analysis using Empirical Bayesian Elastic Net (EBEN) models. Specifically, we first apply a matrix manipulation strategy for pre-computing the correlation matrix and pre-filter to narrow down the search space for epistasis analysis. We then develop a parallelized approach to further accelerate the modeling process. Our experiments on synthetic and empirical genomic data demonstrate that our parallelized methods offer a speedup of tens of fold over the classical EBEN method, which runs sequentially. We applied our parallelized approach to a yeast dataset, and we were able to identify both main and epistatic effects of genetic variants associated with traits such as fitness. AVAILABILITY AND IMPLEMENTATION The software is available at github.com/shilab/parEBEN.
Affiliation(s)
- Jia Wen
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Colby T Ford
  - Department of Bioinformatics and Genomics, College of Computing and Informatics; School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Daniel Janies
  - Department of Bioinformatics and Genomics, College of Computing and Informatics
- Xinghua Shi
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
6
Pedruzzi G, Barlukova A, Rouzine IM. Evolutionary footprint of epistasis. PLoS Comput Biol 2018; 14:e1006426. [PMID: 30222748 PMCID: PMC6177197 DOI: 10.1371/journal.pcbi.1006426] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/17/2018] [Revised: 10/09/2018] [Accepted: 08/09/2018] [Indexed: 11/18/2022]
Abstract
Variation of an inherited trait across a population cannot be explained by additive contributions of relevant genes, due to epigenetic effects and biochemical interactions (epistasis). Detecting epistasis in genomic data still represents a significant challenge that requires a better understanding of epistasis from the mechanistic point of view. Using a standard Wright-Fisher model of bi-allelic asexual population, we study how compensatory epistasis affects the process of adaptation. The main result is a universal relationship between four haplotype frequencies of a single site pair in a genome, which depends only on the epistasis strength of the pair defined regarding Darwinian fitness. We demonstrate the existence, at any time point, of a quasi-equilibrium between epistasis and disorder (entropy) caused by random genetic drift and mutation. We verify the accuracy of these analytic results by Monte-Carlo simulation over a broad range of parameters, including the topology of the interacting network. Thus, epistasis assists the evolutionary transit through evolutionary hurdles leaving marks at the level of haplotype disequilibrium. The method allows determining selection coefficient for each site and the epistasis strength of each pair from a sequence set. The resulting ability to detect clusters of deleterious mutation close to full compensation is essential for biomedical applications. These findings help to understand the role of epistasis in multiple compensatory mutations in viral resistance to antivirals and immune response.
Affiliation(s)
- Gabriele Pedruzzi
  - Sorbonne Université, Institute de Biologie Paris-Seine, Laboratoire de Biologie Computationelle et Quantitative, Paris, France
- Ayuna Barlukova
  - Sorbonne Université, Institute de Biologie Paris-Seine, Laboratoire de Biologie Computationelle et Quantitative, Paris, France
- Igor M. Rouzine
  - Sorbonne Université, Institute de Biologie Paris-Seine, Laboratoire de Biologie Computationelle et Quantitative, Paris, France
7
Lee C. Genome-Wide Expression Quantitative Trait Loci Analysis Using Mixed Models. Front Genet 2018; 9:341. [PMID: 30186313 PMCID: PMC6110903 DOI: 10.3389/fgene.2018.00341] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/09/2018] [Accepted: 08/09/2018] [Indexed: 01/22/2023]
Abstract
Expression quantitative trait loci (eQTLs) are important for understanding the genetic basis of cellular activities and complex phenotypes. Genome-wide eQTL analyses can be effectively conducted by employing a mixed model. The mixed model includes random polygenic effects with variability, which can be estimated by the covariance structure of pairwise genomic similarity among individuals based on genotype information for nucleotide sequence variants. This increases the accuracy of identifying eQTLs by avoiding population stratification. Its extensive use will accelerate our understanding of the genetics of gene expression and complex phenotypes. An overview of genome-wide eQTL analyses using mixed model methodology is provided, including discussions of both theoretical and practical issues. The advantages of employing mixed models are also discussed in this review.
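The "pairwise genomic similarity among individuals" that drives the random polygenic effect can be sketched as a genomic relationship matrix. The version below is one common construction (standardize each variant, then average cross-products); it is an assumption for illustration, not necessarily the estimator used in the review, and the function name and toy genotypes are hypothetical.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """Pairwise genomic similarity from an (individuals x variants) dosage matrix.

    Each variant is centered and scaled to unit variance, monomorphic
    variants are dropped, and similarity is the average cross-product.
    """
    G = np.asarray(genotypes, dtype=float)
    Z = G - G.mean(axis=0)          # center each variant
    sd = Z.std(axis=0)              # population standard deviation per variant
    keep = sd > 0                   # drop variants with no variation
    Z = Z[:, keep] / sd[keep]
    return Z @ Z.T / Z.shape[1]

# Four hypothetical individuals, three variants (the middle one is monomorphic).
G = [[0, 1, 2],
     [1, 1, 0],
     [2, 1, 1],
     [0, 1, 2]]
K = genomic_relationship_matrix(G)
```

In a mixed model, a matrix like `K` would serve as the covariance structure of the random polygenic effect, so that individuals with similar genotypes are modeled as having correlated expression levels.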
Affiliation(s)
- Chaeyoung Lee
  - Department of Bioinformatics and Life Science, Soongsil University, Seoul, South Korea
8
Wen J, Quitadamo A, Hall B, Shi X. Epistasis analysis of microRNAs on pathological stages in colon cancer based on an Empirical Bayesian Elastic Net method. BMC Genomics 2017. [PMID: 29513198 PMCID: PMC5657052 DOI: 10.1186/s12864-017-4130-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
Abstract
Background Colon cancer is a leading cause of worldwide cancer death. It has become clear that microRNAs (miRNAs) play a role in the progress of colon cancer, and understanding the effect of miRNAs on tumorigenesis could lead to better prognosis and improved treatment. However, most studies have focused on differentially expressed miRNAs between tumor and non-tumor samples or between stages in tumor tissue. Limited work has been conducted to study the interactions or epistasis between miRNAs and how this epistasis affects tumor progression. In this study, we investigate the main and pairwise epistatic effects of miRNAs on the pathological stages of colon cancer using datasets from The Cancer Genome Atlas. Results We develop a workflow composed of multiple steps for feature selection based on the Empirical Bayesian Elastic Net (EBEN) method. First, we identify the main effects using a model with only main effects on the phenotype. Second, a corrected phenotype is calculated by removing the significant main effects from the original phenotype. Third, we select features with epistatic effects on the corrected phenotype. Finally, we run the full model with main and epistatic effects on the previously selected main and epistatic features. Using the multi-step workflow, we identify a set of miRNAs with main and epistatic effects on the pathological stages of colon cancer. Many of the miRNAs with main effects on colon cancer have been previously reported to be associated with colon cancer, and the majority of the epistatic miRNAs share common target genes that could explain their epistatic effect on the pathological stages of colon cancer. We also find that many of the target genes of the detected miRNAs are associated with colon cancer. Gene Ontology enrichment analysis of the experimentally validated targets of main and epistatic miRNAs shows that these target genes are enriched for biological processes associated with cancer progression. Conclusion Our results provide a set of candidate miRNAs associated with colon cancer progression that could have potential translational and therapeutic utility. Our analysis workflow offers a new opportunity to efficiently explore epistatic interactions among genetic and epigenetic factors that could be associated with human diseases. Furthermore, our workflow is flexible and can be applied to analyze the main and epistatic effects of various genetic and epigenetic factors on a wide range of phenotypes. Electronic supplementary material The online version of this article (10.1186/s12864-017-4130-7) contains supplementary material, which is available to authorized users.
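The four-step workflow in the Results can be sketched as follows. To keep the example self-contained, ordinary least squares with a hard coefficient threshold stands in for EBEN at every step; that substitution, the function names, the threshold, and the simulated data are all assumptions made for illustration and do not reproduce the authors' method.

```python
import itertools
import numpy as np

def ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def multi_step_selection(X, y, threshold=0.5):
    """Main effects, corrected phenotype, epistatic screen, then full refit."""
    # Step 1: main-effect-only model; keep features with large coefficients.
    _, beta = ols(X, y)
    main = [j for j in range(X.shape[1]) if abs(beta[j]) > threshold]
    # Step 2: corrected phenotype with the selected main effects removed.
    y_corr = y - X[:, main] @ beta[main]
    # Step 3: screen pairwise products against the corrected phenotype.
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    P = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    _, gamma = ols(P, y_corr)
    epi = [pairs[k] for k in range(len(pairs)) if abs(gamma[k]) > threshold]
    # Step 4: refit one model on the selected main and epistatic features.
    cols = [X[:, j] for j in main] + [X[:, i] * X[:, j] for i, j in epi]
    _, full = ols(np.column_stack(cols), y) if cols else (None, np.array([]))
    return main, epi, full

# Simulated data: one main effect (feature 0) and one interaction (1 x 2).
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 4))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(5000)
main, epi, full = multi_step_selection(X, y)
```

Screening interactions against the corrected phenotype rather than the raw one keeps strong main effects from dominating the epistatic step, which is the motivation the abstract gives for steps two and three.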
Affiliation(s)
- Jia Wen
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Andrew Quitadamo
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Benika Hall
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Xinghua Shi
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
9
Crawford L, Zeng P, Mukherjee S, Zhou X. Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLoS Genet 2017; 13:e1006869. [PMID: 28746338 PMCID: PMC5550000 DOI: 10.1371/journal.pgen.1006869] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 08/01/2016] [Revised: 08/09/2017] [Accepted: 06/15/2017] [Indexed: 12/13/2022]
Abstract
Epistasis, commonly defined as the interaction between multiple genes, is an important genetic component underlying phenotypic variation. Many statistical methods have been developed to model and identify epistatic interactions between genetic variants. However, because of the large combinatorial search space of interactions, most epistasis mapping methods face enormous computational challenges and often suffer from low statistical power due to multiple test correction. Here, we present a novel, alternative strategy for mapping epistasis: instead of directly identifying individual pairwise or higher-order interactions, we focus on mapping variants that have non-zero marginal epistatic effects, that is, the combined pairwise interaction effects between a given variant and all other variants. By testing marginal epistatic effects, we can identify candidate variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with standard epistatic mapping procedures. Our method is based on a variance component model, and relies on a recently developed variance component estimation method for efficient parameter inference and p-value computation. We refer to our method as the "MArginal ePIstasis Test", or MAPIT. With simulations, we show how MAPIT can be used to estimate and test marginal epistatic effects, produce calibrated test statistics under the null, and facilitate the detection of pairwise epistatic interactions. We further illustrate the benefits of MAPIT in a QTL mapping study by analyzing the gene expression data of over 400 individuals from the GEUVADIS consortium.
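The definition of a marginal epistatic effect given above can be written out as a short derivation showing how it leads to a variance component test. The notation below (y, x_l, beta_l, alpha_l, D_k, K_k, sigma^2) is assumed here for illustration and does not come from the abstract; the published paper should be consulted for the authors' exact model.

```latex
% Phenotype with additive effects plus all pairwise interactions involving variant k
\mathbf{y} = \mu + \sum_{l=1}^{p} \mathbf{x}_l \beta_l
           + \sum_{l \neq k} (\mathbf{x}_k \circ \mathbf{x}_l)\,\alpha_l
           + \boldsymbol{\varepsilon}

% Marginal epistatic effect of variant k: the combined interaction term
\mathbf{g}_k = \sum_{l \neq k} (\mathbf{x}_k \circ \mathbf{x}_l)\,\alpha_l
             = \mathbf{D}_k \mathbf{X}_{-k} \boldsymbol{\alpha},
\qquad \mathbf{D}_k = \operatorname{diag}(\mathbf{x}_k)

% Assuming independent \alpha_l \sim \mathcal{N}(0, \sigma^2/(p-1)):
\mathbf{g}_k \sim \mathcal{N}\!\left(\mathbf{0},\; \sigma^2\, \mathbf{D}_k \mathbf{K}_k \mathbf{D}_k\right),
\qquad \mathbf{K}_k = \frac{\mathbf{X}_{-k}\mathbf{X}_{-k}^{\top}}{p-1}

% Variant k has no marginal epistatic effect under
H_0 : \sigma^2 = 0
```

Under these assumptions a single variance parameter per variant summarizes its interactions with every other variant, so p tests replace the roughly p(p-1)/2 tests needed for an exhaustive pairwise scan.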
Affiliation(s)
- Lorin Crawford
  - Department of Biostatistics, Brown University, Providence, Rhode Island, United States of America
  - Center for Statistical Sciences, Brown University, Providence, Rhode Island, United States of America
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Ping Zeng
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Sayan Mukherjee
  - Department of Statistical Science, Duke University, Durham, North Carolina, United States of America
  - Department of Computer Science, Duke University, Durham, North Carolina, United States of America
  - Department of Mathematics, Duke University, Durham, North Carolina, United States of America
  - Department of Bioinformatics & Biostatistics, Duke University, Durham, North Carolina, United States of America
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
10
Xu K, Jin L, Xiong M. Functional regression method for whole genome eQTL epistasis analysis with sequencing data. BMC Genomics 2017; 18:385. [PMID: 28521784 PMCID: PMC5436462 DOI: 10.1186/s12864-017-3777-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/21/2016] [Accepted: 05/09/2017] [Indexed: 12/02/2022]
Abstract
Background Epistasis plays an essential role in understanding regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used for microarray-measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. Methods We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. Results By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. Conclusions The proposed FRGM for epistasis analysis of RNA-seq can capture isoform and position-level information and will have broad application. Both simulations and real data analysis highlight the potential for the FRGM to be a good choice for epistasis analysis with sequencing data. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3777-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kelin Xu
  - State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China; School of Data Science and Institute for Big Data, Fudan University, Shanghai, 200433, China
- Li Jin
  - State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Momiao Xiong
  - State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China; Department of Biostatistics, Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA; Human Genetics Center, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, TX, 77225, USA
11
Botzman M, Nachshon A, Brodt A, Gat-Viks I. POEM: Identifying Joint Additive Effects on Regulatory Circuits. Front Genet 2016; 7:48. [PMID: 27148351 PMCID: PMC4835676 DOI: 10.3389/fgene.2016.00048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/17/2015] [Accepted: 03/17/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION Expression Quantitative Trait Locus (eQTL) mapping tackles the problem of identifying variations in DNA sequence that affect the transcriptional regulatory network. Major computational efforts are aimed at characterizing the joint effects of several eQTLs acting in concert to govern the expression of the same genes. Yet, progress toward a comprehensive prediction of such joint effects is limited. For example, existing eQTL methods commonly discover interacting loci affecting the expression levels of a module of co-regulated genes. Such "modularization" approaches, however, are focused on epistatic relations and thus have limited utility for the case of additive (non-epistatic) effects. RESULTS Here we present POEM (Pairwise effect On Expression Modules), a methodology for identifying pairwise eQTL effects on gene modules. POEM is specifically designed to achieve high performance in the case of additive joint effects. We applied POEM to transcription profiles measured in bone marrow-derived dendritic cells across a population of genotyped mice. Our study reveals widespread additive, trans-acting pairwise effects on gene modules, characterizes their organizational principles, and highlights high-order interconnections between modules within the immune signaling network. These analyses elucidate the central role of additive pairwise effects in regulatory circuits, and provide computational tools for future investigations into the interplay between eQTLs. AVAILABILITY The software described in this article is available at csgi.tau.ac.il/POEM/.
Affiliation(s)
- Maya Botzman
- Department of Cell Research and Immunology, The George S. Wise Faculty of Life Sciences, Tel Aviv University Tel Aviv, Israel
| | - Aharon Nachshon
- Department of Cell Research and Immunology, The George S. Wise Faculty of Life Sciences, Tel Aviv University Tel Aviv, Israel
| | - Avital Brodt
- Department of Cell Research and Immunology, The George S. Wise Faculty of Life Sciences, Tel Aviv University Tel Aviv, Israel
| | - Irit Gat-Viks
- Department of Cell Research and Immunology, The George S. Wise Faculty of Life Sciences, Tel Aviv University Tel Aviv, Israel
| |
|
12
|
An interaction quantitative trait loci tool implicates epistatic functional variants in an apoptosis pathway in smallpox vaccine eQTL data. Genes Immun 2016; 17:244-50. [PMID: 27052692 DOI: 10.1038/gene.2016.15] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/05/2015] [Revised: 12/06/2015] [Accepted: 01/04/2016] [Indexed: 12/17/2022]
Abstract
Expression quantitative trait loci (eQTL) studies have functionalized nucleic acid variants through the regulation of gene expression. Although most eQTL studies only examine the effects of single variants on transcription, a more complex process of variant-variant interaction (epistasis) may regulate transcription. Herein, we describe a tool called interaction QTL (iQTL) designed to efficiently detect epistatic interactions that regulate gene expression. To maximize biological relevance and minimize the computational and hypothesis testing burden, iQTL restricts interactions such that one variant is within a user-defined proximity of the transcript (cis-regulatory). We apply iQTL to a data set of 183 smallpox vaccine study participants with genome-wide association study and gene expression data from unstimulated samples and samples stimulated by inactivated vaccinia virus. While computing only 0.15% of possible interactions, we identify 11 probe sets whose expression is regulated through a variant-variant interaction. We highlight the functional epistatic interactions among apoptosis-related genes, DIABLO, TRAPPC4 and FADD, in the context of smallpox vaccination. We also use an integrative network approach to characterize these iQTL interactions in a posterior network of known prior functional interactions. iQTL is an efficient, open-source tool to analyze variant interactions in eQTL studies, providing better understanding of the function of epistasis in immune response and other complex phenotypes.
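The cis restriction described above can be illustrated with a minimal sketch. It is not the iQTL implementation: the function names, the `(chromosome, position)` variant encoding, and the reading of the restriction as "at least one variant of the pair lies within the window" are all assumptions made for illustration.

```python
from itertools import combinations

def is_cis(variant, transcript, window):
    """Return True if the variant lies within `window` bases of the transcript.

    variant    -- (chromosome, position); hypothetical encoding
    transcript -- (chromosome, start, end); hypothetical encoding
    """
    chrom, pos = variant
    t_chrom, t_start, t_end = transcript
    return chrom == t_chrom and t_start - window <= pos <= t_end + window

def cis_restricted_pairs(variants, transcript, window):
    """Keep only variant pairs in which at least one member is cis.

    Assumption: "one variant is cis-regulatory" is read as "at least one".
    """
    cis = [is_cis(v, transcript, window) for v in variants]
    return [
        (i, j)
        for i, j in combinations(range(len(variants)), 2)
        if cis[i] or cis[j]
    ]

# Toy data: two variants near the transcript, three far away or on another chromosome.
variants = [("1", 100), ("1", 5_000), ("1", 900_000), ("2", 150), ("2", 7_000)]
transcript = ("1", 1_000, 2_000)
pairs = cis_restricted_pairs(variants, transcript, window=5_000)
all_pairs = len(variants) * (len(variants) - 1) // 2
print(f"{len(pairs)} of {all_pairs} pairs retained")
```

With k cis variants among n, the retained count is C(n, 2) - C(n - k, 2), so the tested fraction shrinks quickly when k is small relative to n, which is consistent with the small share of interactions the abstract reports computing.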
|
13
|
Niel C, Sinoquet C, Dina C, Rocheleau G. A survey about methods dedicated to epistasis detection. Front Genet 2015; 6:285. [PMID: 26442103 PMCID: PMC4564769 DOI: 10.3389/fgene.2015.00285] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 05/13/2015] [Accepted: 08/27/2015] [Indexed: 12/25/2022] Open
Abstract
During the past decade, findings of genome-wide association studies (GWAS) improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and a SNP taken individually via single-locus tests. However, geneticists admit this is an oversimplified approach to tackle the complexity of underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered. Unfortunately, epistasis detection gives rise to analytic challenges since analyzing every SNP combination is at present impractical at a genome-wide scale. In this review, we will present the main strategies recently proposed to detect epistatic interactions, along with their operating principle. Some of these methods are exhaustive, such as multifactor dimensionality reduction, likelihood ratio-based tests or receiver operating characteristic curve analysis; some are non-exhaustive, such as machine learning techniques (random forests, Bayesian networks) or combinatorial optimization approaches (ant colony optimization, computational evolution system).
Affiliation(s)
- Clément Niel
- Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, Ecole Polytechnique de l'Université de Nantes Nantes, France
| | - Christine Sinoquet
- Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, University of Nantes Nantes, France
| | - Christian Dina
- Institut du Thorax, Institut National de la Santé et de la Recherche Médicale UMR 1087, Centre National de la Recherche Scientifique UMR 6291, University of Nantes Nantes, France
| | - Ghislain Rocheleau
- European Genomic Institute for Diabetes FR3508, Centre National de la Recherche Scientifique UMR 8199, Lille 2 University Lille, France
| |
|
14
|
Okser S, Pahikkala T, Airola A, Salakoski T, Ripatti S, Aittokallio T. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet 2014; 10:e1004754. [PMID: 25393026 PMCID: PMC4230844 DOI: 10.1371/journal.pgen.1004754] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Indexed: 01/14/2023] Open
Affiliation(s)
- Sebastian Okser
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Tapio Pahikkala
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Antti Airola
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Tapio Salakoski
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Samuli Ripatti
- Hjelt Institute, University of Helsinki, Helsinki, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
| | - Tero Aittokallio
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| |
|
15
|
Kang M, Zhang C, Chun HW, Ding C, Liu C, Gao J. eQTL epistasis: detecting epistatic effects and inferring hierarchical relationships of genes in biological pathways. Bioinformatics 2014; 31:656-64. [PMID: 25359893 DOI: 10.1093/bioinformatics/btu727] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/04/2023]
Abstract
MOTIVATION Epistasis is the interaction among multiple genetic variants. It has emerged as an explanation for the 'missing heritability' that the marginal genetic effects found by genome-wide association studies do not account for, and as a way to understand the hierarchical relationships between genes in genetic pathways. Fisher's geometric model is commonly used to detect epistatic effects. However, despite the substantial successes of many studies with the model, it often fails to discover the functional dependence between genes in an epistasis study, which plays an important role in inferring hierarchical relationships of genes in a biological pathway. RESULTS We demonstrate the limitations of Fisher's model in a simulation study and in an application to biological data. Then, we propose a novel generic epistasis model that provides a flexible solution for various putative biological epistatic models in practice. The proposed method enables one to efficiently characterize the functional dependence between genes. Moreover, we suggest a statistical strategy for determining a recessive or dominant link among epistatic expression quantitative trait loci, which makes it possible to infer the hierarchical relationships. The proposed method is assessed by simulation experiments in various settings and is applied to human brain data regarding schizophrenia. AVAILABILITY AND IMPLEMENTATION The MATLAB source codes are publicly available at: http://biomecis.uta.edu/epistasis.
Affiliation(s)
- Mingon Kang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 66012, USA and Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA
| | - Chunling Zhang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 66012, USA and Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA
| | - Hyung-Wook Chun
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 66012, USA and Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA
| | - Chris Ding
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 66012, USA and Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA
| | - Chunyu Liu
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 66012, USA and Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA
| | - Jean Gao
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 66012, USA and Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA
| |
|
16
|
Abstract
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
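To make "statistical interaction" concrete, here is a minimal sketch for the simplest case of two binary loci, using the standard additive-scale interaction contrast of the four genotype cell means. The function name, the binary genotype coding, and the toy phenotype values are assumptions for illustration and are not taken from the Review.

```python
from statistics import mean

def interaction_contrast(samples):
    """Additive-scale interaction between two binary loci.

    samples -- iterable of (a, b, y) with a, b in {0, 1} and y the phenotype.
    Returns m11 - m10 - m01 + m00, where m_ab is the mean phenotype in the
    genotype cell (a, b). Under a purely additive model this is zero.
    Raises statistics.StatisticsError if any of the four cells is empty.
    """
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for a, b, y in samples:
        cells[(a, b)].append(y)
    m = {cell: mean(values) for cell, values in cells.items()}
    return m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]

# Toy phenotypes: y = 1 + 2a + 3b (additive) versus y = 1 + 2a + 3b + 4ab (epistatic).
genotypes = [(a, b) for a in (0, 1) for b in (0, 1)]
additive = [(a, b, 1 + 2 * a + 3 * b) for a, b in genotypes]
epistatic = [(a, b, 1 + 2 * a + 3 * b + 4 * a * b) for a, b in genotypes]

additive_contrast = interaction_contrast(additive)
epistatic_contrast = interaction_contrast(epistatic)
print(additive_contrast, epistatic_contrast)
```

The contrast equals the coefficient of the multiplicative term in a linear model with main effects, which is why a nonzero value depends on the scale on which the phenotype is measured; this is one of the interpretation hazards the Review refers to.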
|
17
|
Genetic architecture of ethanol-responsive transcriptome variation in Saccharomyces cerevisiae strains. Genetics 2014; 198:369-82. [PMID: 24970865 DOI: 10.1534/genetics.114.167429] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 12/21/2022] Open
Abstract
Natural variation in gene expression is pervasive within and between species, and it likely explains a significant fraction of phenotypic variation between individuals. Phenotypic variation in acute systemic responses can also be leveraged to reveal physiological differences in how individuals perceive and respond to environmental perturbations. We previously found extensive variation in the transcriptomic response to acute ethanol exposure in two wild isolates and a common laboratory strain of Saccharomyces cerevisiae. Many expression differences persisted across several modules of coregulated genes, implicating trans-acting systemic differences in ethanol sensing and/or response. Here, we conducted expression QTL mapping of the ethanol response in two strain crosses to identify the genetic basis for these differences. To understand systemic differences, we focused on "hotspot" loci that affect many transcripts in trans. Candidate causal regulators contained within hotspots implicate upstream regulators as well as downstream effectors of the ethanol response. Overlap in hotspot targets revealed additive genetic effects of trans-acting loci as well as "epi-hotspots," in which epistatic interactions between two loci affected the same suites of downstream targets. One epi-hotspot implicated interactions between Mkt1p and proteins linked to translational regulation, prompting us to show that Mkt1p localizes to P bodies upon ethanol stress in a strain-specific manner. Our results provide a glimpse into the genetic architecture underlying natural variation in a stress response and present new details on how yeast respond to ethanol stress.
|
18
|
Chen GK, Guo Y. Discovering epistasis in large scale genetic association studies by exploiting graphics cards. Front Genet 2013; 4:266. [PMID: 24348518 PMCID: PMC3848199 DOI: 10.3389/fgene.2013.00266] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/27/2013] [Accepted: 11/16/2013] [Indexed: 11/13/2022] Open
Abstract
Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable because even modest size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed to explicitly overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPU). We include tutorials on GPU technology, which will convey why they are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores that are available on only a single GPU. Whereas high end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well due to the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
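The combinatorial burden mentioned above can be checked with a short calculation. This is only a sketch of the arithmetic: the helper name is hypothetical, and it assumes an exhaustive screen that tests every unordered combination of markers exactly once.

```python
import math

def n_interaction_tests(n_markers: int, order: int = 2) -> int:
    """Number of unordered marker combinations of the given order.

    Assumption: an exhaustive screen fits one model per combination.
    """
    return math.comb(n_markers, order)

n = 500_000  # panel size quoted in the abstract
pairwise = n_interaction_tests(n, 2)
three_way = n_interaction_tests(n, 3)
print(f"pairwise tests:  {pairwise:,}")
print(f"three-way tests: {three_way:.3e}")
```

For 500k markers the exhaustive pairwise count is about 1.25 × 10^11, and each additional order multiplies the count by roughly the panel size divided by the order, which is why exhaustive screens beyond pairs are rarely attempted.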
Affiliation(s)
- Gary K Chen
- Division of Biostatistics, Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
| | - Yunfei Guo
- Division of Biostatistics, Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA; Zilkha Neurogenetic Institute, University of Southern California Los Angeles, CA, USA
| |
|