1
Aygün N, Liang D, Crouse WL, Keele GR, Love MI, Stein JL. Inferring cell-type-specific causal gene regulatory networks during human neurogenesis. Genome Biol 2023; 24:130. [PMID: 37254169 PMCID: PMC10230710 DOI: 10.1186/s13059-023-02959-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Accepted: 05/05/2023] [Indexed: 06/01/2023] Open
Abstract
BACKGROUND Genetic variation influences both chromatin accessibility, assessed in chromatin accessibility quantitative trait loci (caQTL) studies, and gene expression, assessed in expression QTL (eQTL) studies. Genetic variants can impact either nearby genes (cis-eQTLs) or distal genes (trans-eQTLs). Colocalization between caQTL and eQTL, or cis- and trans-eQTLs suggests that they share causal variants. However, pairwise colocalization between these molecular QTLs does not guarantee a causal relationship. Mediation analysis can be applied to assess the evidence supporting causality versus independence between molecular QTLs. Given that the function of QTLs can be cell-type-specific, we performed mediation analyses to find epigenetic and distal regulatory causal pathways for genes within two major cell types of the developing human cortex, progenitors and neurons. RESULTS We find that the expression of 168 and 38 genes is mediated by chromatin accessibility in progenitors and neurons, respectively. We also find that the expression of 11 and 12 downstream genes is mediated by upstream genes in progenitors and neurons. Moreover, we discover that a genetic locus associated with inter-individual differences in brain structure shows evidence for mediation of SLC26A7 through chromatin accessibility, identifying molecular mechanisms of a common variant association to a brain trait. CONCLUSIONS In this study, we identify cell-type-specific causal gene regulatory networks whereby the impacts of variants on gene expression were mediated by chromatin accessibility or distal gene expression. Identification of these causal paths will enable identifying and prioritizing actionable regulatory targets perturbing these key processes during neurodevelopment.
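The mediation logic summarized in this abstract (variant, then chromatin accessibility, then gene expression) can be illustrated with a minimal regression-based sketch. This is a generic product-of-coefficients illustration on simulated data, not the authors' analysis; the `ols` helper, the variable names, and the effect sizes are assumptions made for the example.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated toy data (hypothetical, not taken from the study).
rng = np.random.default_rng(0)
n = 2000
genotype = rng.binomial(2, 0.3, size=n).astype(float)
accessibility = 0.8 * genotype + rng.normal(size=n)    # variant -> chromatin peak
expression = 0.5 * accessibility + rng.normal(size=n)  # peak -> gene (full mediation)

total = ols(expression, genotype)[1]                     # total effect of the variant
a = ols(accessibility, genotype)[1]                      # variant -> mediator
_, direct, b = ols(expression, genotype, accessibility)  # direct effect, mediator -> gene
indirect = a * b                                         # mediated (indirect) effect

print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

Under full mediation, the direct effect shrinks toward zero once accessibility is included in the model, and the indirect effect `a * b` accounts for essentially all of the total effect. With partial mediation or independence, a substantial direct effect would remain.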
Affiliation(s)
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Dan Liang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Wesley L Crouse
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Gregory R Keele
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
- Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.

2
Gewirtz AD, Townes FW, Engelhardt BE. Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues. Life Sci Alliance 2022; 5:e202101297. [PMID: 35977827 PMCID: PMC9387650 DOI: 10.26508/lsa.202101297] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2021] [Revised: 07/15/2022] [Accepted: 07/18/2022] [Indexed: 11/24/2022] Open
Abstract
Expression quantitative trait loci (eQTLs), or single-nucleotide polymorphisms that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene-variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multimodal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure where one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA sequencing samples to correspond to a single individual's genotype. By using raw count data, our model avoids possible adulteration via normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across 10 tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities and identify associations within and across tissue types. We identify 4,645 cis-eQTLs and 995 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. Our code is freely available at https://github.com/gewirtz/TBLDA.
Affiliation(s)
- Ariel Dh Gewirtz
- Lewis-Sigler Institute of Integrative Genomics, Princeton University, Princeton, NJ, USA
- F William Townes
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Barbara E Engelhardt
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Gladstone Institutes, San Francisco, CA, USA

3
Gao C, Wei H, Zhang K. LORSEN: Fast and Efficient eQTL Mapping With Low Rank Penalized Regression. Front Genet 2021; 12:690926. [PMID: 34868194 PMCID: PMC8636089 DOI: 10.3389/fgene.2021.690926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2021] [Accepted: 10/08/2021] [Indexed: 12/02/2022] Open
Abstract
Characterization of genetic variations that are associated with gene expression levels is essential to understand cellular mechanisms that underlie human complex traits. Expression quantitative trait loci (eQTL) mapping attempts to identify genetic variants, such as single nucleotide polymorphisms (SNPs), that affect the expression of one or more genes. With the availability of a large volume of gene expression data, it is necessary and important to develop fast and efficient statistical and computational methods to perform eQTL mapping for such large-scale data. In this paper, we proposed a new method, the low rank penalized regression method (LORSEN), for eQTL mapping. We evaluated and compared the performance of LORSEN with two existing methods for eQTL mapping using extensive simulations as well as real data from the HapMap3 project. Simulation studies showed that our method outperformed two commonly used methods for eQTL mapping, LORS and FastLORS, in many scenarios in terms of area under the curve (AUC). We illustrated the usefulness of our method by applying it to SNP variant data and gene expression levels on four chromosomes from the HapMap3 project.
Affiliation(s)
- Cheng Gao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Hairong Wei
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, United States
- Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States

4
Ruffieux H, Fairfax BP, Nassiri I, Vigorito E, Wallace C, Richardson S, Bottolo L. EPISPOT: An epigenome-driven approach for detecting and interpreting hotspots in molecular QTL studies. Am J Hum Genet 2021; 108:983-1000. [PMID: 33909991 PMCID: PMC8206410 DOI: 10.1016/j.ajhg.2021.04.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/13/2020] [Accepted: 04/08/2021] [Indexed: 12/27/2022] Open
Abstract
We present EPISPOT, a fully joint framework which exploits large panels of epigenetic annotations as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, EPISPOT accommodates functional information for both cis and trans actions, including QTL hotspot effects. It effectively couples simultaneous QTL analysis of thousands of genetic variants and molecular traits with hypothesis-free selection of biologically interpretable annotations which directly contribute to the QTL effects. This unified, epigenome-aided learning boosts statistical power and sheds light on the regulatory basis of the uncovered hits; EPISPOT therefore marks an essential step toward improving the challenging detection and functional interpretation of trans-acting genetic variants and hotspots. We illustrate the advantages of EPISPOT in simulations emulating real-data conditions and in a monocyte expression QTL study, which confirms known hotspots and finds other signals, as well as plausible mechanisms of action. In particular, by highlighting the role of monocyte DNase-I sensitivity sites from >150 epigenetic annotations, we clarify the mediation effects and cell-type specificity of major hotspots close to the lysozyme gene. Our approach forgoes the daunting and underpowered task of one-annotation-at-a-time enrichment analyses for prioritizing cis and trans QTL hits and is tailored to any transcriptomic, proteomic, or metabolomic QTL problem. By enabling principled epigenome-driven QTL mapping transcriptome-wide, EPISPOT helps progress toward a better functional understanding of genetic regulation.
Affiliation(s)
- Hélène Ruffieux
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK.
- Benjamin P Fairfax
- Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK
- Isar Nassiri
- Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK
- Elena Vigorito
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK
- Chris Wallace
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge CB2 0AW, UK
- Sylvia Richardson
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; The Alan Turing Institute, London NW1 2DB, UK
- Leonardo Bottolo
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; The Alan Turing Institute, London NW1 2DB, UK; Department of Medical Genetics, University of Cambridge, Cambridge CB2 0QQ, UK

5
Banerjee S, Simonetti FL, Detrois KE, Kaphle A, Mitra R, Nagial R, Söding J. Tejaas: reverse regression increases power for detecting trans-eQTLs. Genome Biol 2021; 22:142. [PMID: 33957961 PMCID: PMC8101255 DOI: 10.1186/s13059-021-02361-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/09/2020] [Accepted: 04/22/2021] [Indexed: 12/18/2022] Open
Abstract
Trans-acting expression quantitative trait loci (trans-eQTLs) account for ≥70% expression heritability and could therefore facilitate uncovering mechanisms underlying the origination of complex diseases. Identifying trans-eQTLs is challenging because of small effect sizes, tissue specificity, and a severe multiple-testing burden. Tejaas predicts trans-eQTLs by performing L2-regularized “reverse” multiple regression of each SNP on all genes, aggregating evidence from many small trans-effects while being unaffected by the strong expression correlations. Combined with a novel unsupervised k-nearest neighbor method to remove confounders, Tejaas predicts 18851 unique trans-eQTLs across 49 tissues from GTEx. They are enriched in open chromatin, enhancers, and other regulatory regions. Many overlap with disease-associated SNPs, pointing to tissue-specific transcriptional regulation mechanisms.
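The "reverse" regression idea in this abstract (regress each SNP on all genes with an L2 penalty, so that many small trans-effects are pooled) can be sketched as follows. This is a simplified illustration on simulated data and does not reproduce the Tejaas test statistic or its null model; the function name `ridge_fit_fraction`, the penalty value, the effect sizes, and the permutation comparison are all assumptions made for the example.

```python
import numpy as np

def ridge_fit_fraction(expr, snp, lam):
    """Fraction of SNP variance captured by a ridge regression of the SNP on all genes."""
    X = (expr - expr.mean(axis=0)) / expr.std(axis=0)  # standardize each gene
    y = (snp - snp.mean()) / snp.std()                 # standardize the genotype
    n_genes = X.shape[1]
    # Closed-form ridge solution: (X'X + lam * I)^-1 X'y
    beta = np.linalg.solve(X.T @ X + lam * np.eye(n_genes), X.T @ y)
    return float(y @ (X @ beta) / (y @ y))

# Simulated toy data (hypothetical): one SNP with modest effects on 200 of 500 genes.
rng = np.random.default_rng(1)
n_samples, n_genes = 300, 500
snp = rng.binomial(2, 0.3, size=n_samples).astype(float)
snp_std = (snp - snp.mean()) / snp.std()
expr = rng.normal(size=(n_samples, n_genes))
expr[:, :200] += 0.3 * snp_std[:, None]

lam = float(n_genes)  # assumed penalty strength, chosen only for illustration
q_obs = ridge_fit_fraction(expr, snp, lam)
# Permuting the genotype breaks the SNP-gene link and gives a crude reference distribution.
q_perm = [ridge_fit_fraction(expr, rng.permutation(snp), lam) for _ in range(20)]

print(f"observed={q_obs:.3f} permuted max={max(q_perm):.3f}")
```

Because the SNP is the response and all genes enter as predictors at once, evidence from many weakly affected genes accumulates in a single fit per SNP, rather than being spread across one test per SNP-gene pair.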
Affiliation(s)
- Saikat Banerjee
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany.
- Franco L Simonetti
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
- Kira E Detrois
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany; Georg-August University, Göttingen, 37075, Germany
- Anubhav Kaphle
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany; Georg-August University, Göttingen, 37075, Germany
- Johannes Söding
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany; Campus-Institut Data Science (CIDAS), University of Göttingen, Göttingen, 37073, Germany; Cluster of Excellence "Multiscale Bioimaging" (MBExC), University of Göttingen, Göttingen, 37075, Germany

6
Fan Y, Zhu H, Song Y, Peng Q, Zhou X. Efficient and effective control of confounding in eQTL mapping studies through joint differential expression and Mendelian randomization analyses. Bioinformatics 2021; 37:296-302. [PMID: 32790868 PMCID: PMC8058772 DOI: 10.1093/bioinformatics/btaa715] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/26/2020] [Revised: 07/09/2020] [Accepted: 08/06/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Identifying cis-acting genetic variants associated with gene expression levels, an analysis commonly referred to as expression quantitative trait loci (eQTL) mapping, is an important first step toward understanding the genetic determinants of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for controlling confounding effects in eQTL mapping studies is the probabilistic estimation of expression residuals (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which are then included in the subsequent eQTL mapping analysis. However, it is computationally challenging to determine the optimal number of PEER factors used for eQTL mapping. In particular, the standard approach to determine the optimal number of PEER factors examines one number at a time and chooses the number that optimizes eQTL discovery. Unfortunately, this standard approach involves multiple repetitive eQTL mapping procedures that are computationally expensive, restricting its use in the large-scale eQTL mapping studies that are being collected today. RESULTS Here, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors used for eQTL mapping studies. Instead of performing repetitive eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a similar number of PEER factors required for eQTL mapping analysis as the standard approach but is two orders of magnitude faster. The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain in the number of eQTL-harboring genes (eGenes) discovered as compared to the previous GTEx recommendation, which does not attempt to determine the tissue-specific optimal number of PEER factors. AVAILABILITY AND IMPLEMENTATION Our method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at www.xzlab.org/software.html. All R scripts used in this study are also available at this site. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yue Fan
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, China; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Huanhuan Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Yanyi Song
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Qinke Peng
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA

7
Proteome-wide Systems Genetics to Identify Functional Regulators of Complex Traits. Cell Syst 2021; 12:5-22. [PMID: 33476553 DOI: 10.1016/j.cels.2020.10.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/22/2020] [Revised: 09/15/2020] [Accepted: 10/07/2020] [Indexed: 02/08/2023]
Abstract
Proteomic technologies now enable the rapid quantification of thousands of proteins across genetically diverse samples. Integration of these data with systems-genetics analyses is a powerful approach to identify new regulators of economically important or disease-relevant phenotypes in various populations. In this review, we summarize the latest proteomic technologies and discuss technical challenges for their use in population studies. We demonstrate how the analysis of correlation structure and loci mapping can be used to identify genetic factors regulating functional protein networks and complex traits. Finally, we provide an extensive summary of the use of proteome-wide systems genetics throughout fungi, plant, and animal kingdoms and discuss the power of this approach to identify candidate regulators and drug targets in large human consortium studies.

8
Kolberg L, Kerimov N, Peterson H, Alasoo K. Co-expression analysis reveals interpretable gene modules controlled by trans-acting genetic variants. eLife 2020; 9:e58705. [PMID: 32880574 PMCID: PMC7470823 DOI: 10.7554/elife.58705] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/08/2020] [Accepted: 08/20/2020] [Indexed: 12/16/2022] Open
Abstract
Understanding the causal processes that contribute to disease onset and progression is essential for developing novel therapies. Although trans-acting expression quantitative trait loci (trans-eQTLs) can directly reveal cellular processes modulated by disease variants, detecting trans-eQTLs remains challenging due to their small effect sizes. Here, we analysed gene expression and genotype data from six blood cell types from 226 to 710 individuals. We used co-expression modules inferred from gene expression data with five methods as traits in trans-eQTL analysis to limit multiple testing and improve interpretability. In addition to replicating three established associations, we discovered a novel trans-eQTL near SLC39A8 regulating a module of metallothionein genes in LPS-stimulated monocytes. Interestingly, this effect was mediated by a transient cis-eQTL present only in early LPS response and lost before the trans effect appeared. Our analyses highlight how co-expression combined with functional enrichment analysis improves the identification and prioritisation of trans-eQTLs when applied to emerging cell-type-specific datasets.
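Using a co-expression module as a single trait, as this abstract describes, is often done by summarizing the module with its first principal component (an "eigengene") and testing that summary against genotype. The sketch below is a generic illustration on simulated data; the paper inferred modules with five different methods that are not reproduced here, and the function name, module size, and effect sizes are assumptions.

```python
import numpy as np

def module_eigengene(module_expr):
    """First principal component scores of a samples-by-genes module matrix."""
    Z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * S[0]

# Simulated toy data (hypothetical): 20 genes share a latent factor driven by one SNP.
rng = np.random.default_rng(3)
n = 1000
snp = rng.binomial(2, 0.3, size=n).astype(float)
snp_std = (snp - snp.mean()) / snp.std()
latent = 0.5 * snp_std + rng.normal(size=n)
module_expr = 0.7 * latent[:, None] + rng.normal(size=(n, 20))

eigengene = module_eigengene(module_expr)
# The sign of a principal component is arbitrary, so only the magnitude is meaningful.
r = np.corrcoef(eigengene, snp)[0, 1]

print(f"|correlation between eigengene and genotype| = {abs(r):.3f}")
```

Testing one eigengene per module instead of every gene separately is what limits the multiple-testing burden: here a single association test replaces 20 gene-level tests.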
Affiliation(s)
- Liis Kolberg
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia

9
Ramdhani S, Navarro E, Udine E, Efthymiou AG, Schilder BM, Parks M, Goate A, Raj T. Tensor decomposition of stimulated monocyte and macrophage gene expression profiles identifies neurodegenerative disease-specific trans-eQTLs. PLoS Genet 2020; 16:e1008549. [PMID: 32012164 PMCID: PMC7018232 DOI: 10.1371/journal.pgen.1008549] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/11/2019] [Revised: 02/13/2020] [Accepted: 12/02/2019] [Indexed: 01/10/2023] Open
Abstract
Recent human genetic studies suggest that cells of the innate immune system have a primary role in the pathogenesis of neurodegenerative diseases. However, the results from these studies often do not elucidate how the genetic variants affect the biology of these cells to modulate disease risk. Here, we applied a tensor decomposition method to uncover disease associated gene networks linked to distal genetic variation in stimulated human monocyte and macrophage gene expression profiles. We report robust evidence that some disease associated genetic variants affect the expression of multiple genes in trans. These include a Parkinson's disease locus influencing the expression of genes mediated by a protease that controls lysosomal function, and Alzheimer's disease loci influencing the expression of genes involved in type 1 interferon signaling, myeloid phagocytosis, and complement cascade pathways. Overall, we uncover gene networks in induced innate immune cells linked to disease associated genetic variants, which may help elucidate the underlying biology of disease.
Affiliation(s)
- Satesh Ramdhani
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Elisa Navarro
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Evan Udine
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Anastasia G. Efthymiou
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Brian M. Schilder
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Madison Parks
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Alison Goate
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Towfique Raj
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America

10
Shan N, Wang Z, Hou L. Identification of trans-eQTLs using mediation analysis with multiple mediators. BMC Bioinformatics 2019; 20:126. [PMID: 30925861 PMCID: PMC6440281 DOI: 10.1186/s12859-019-2651-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/27/2022] Open
Abstract
Background Mapping expression quantitative trait loci (eQTLs) has provided insight into gene regulation. Compared to cis-eQTLs, the regulatory mechanisms of trans-eQTLs are less well understood. Previous studies suggest that trans-eQTLs may regulate expression of remote genes by altering the expression of nearby genes. Trans-association has been studied in mediation analysis with a single mediator. However, prior applications with one mediator are prone to model misspecification due to correlations between genes. Motivated by the observation that trans-eQTLs are more likely to associate with more than one cis-gene than randomly selected SNPs in the GTEx dataset, we developed a computational method to identify trans-eQTLs that are mediated by multiple mediators. Results We proposed two hypothesis tests for testing the total mediation effect (TME) and the component-wise mediation effects (CME), respectively. We demonstrated in simulation studies that the type I error rates were controlled in both tests despite model misspecification. The TME test was more powerful than the CME test when the two mediation effects were in the same direction, while the CME test was more powerful than the TME test when the two mediation effects were in opposite directions. Multiple mediator analysis had increased power to detect mediated trans-eQTLs, especially in large samples. In the HapMap3 data, we identified 11 mediated trans-eQTLs that were not detected by the single mediator analysis in the combined samples of African populations. Moreover, the mediated trans-eQTLs in the HapMap3 samples are more likely to be trait-associated SNPs. In terms of computation, although there is no limit on the number of mediators in our model, the analysis takes more time as mediators are added. In the analysis of the HapMap3 samples, we included at most 5 cis-gene mediators. The majority of the trios we considered have one or two mediators. Conclusions Trans-eQTLs are more likely to associate with multiple cis-genes than randomly selected SNPs. Mediation analysis with multiple mediators improves the power to identify mediated trans-eQTLs, especially in large samples. Electronic supplementary material The online version of this article (10.1186/s12859-019-2651-6) contains supplementary material, which is available to authorized users.
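The contrast between the total mediation effect (TME) and the component-wise mediation effects (CME) can be made concrete with point estimates from ordinary least squares. The sketch below computes only the estimates, with each component taken as the product of the SNP-to-mediator and mediator-to-target coefficients and the total as their sum; the paper's hypothesis tests are not reproduced, and the function name, simulated data, and effect sizes are assumptions made for the example.

```python
import numpy as np

def mediation_effects(snp, mediators, target):
    """Return (component-wise effects a_j * b_j, their sum) from two OLS fits."""
    ones = np.ones(len(snp))
    # a_j: effect of the SNP on each cis-gene mediator (one regression per column).
    a = np.linalg.lstsq(np.column_stack([ones, snp]), mediators, rcond=None)[0][1]
    # b_j: effect of each mediator on the trans gene, adjusting for the SNP
    # and for the other mediators.
    coef = np.linalg.lstsq(np.column_stack([ones, snp, mediators]), target, rcond=None)[0]
    b = coef[2:]
    cme = a * b
    return cme, float(cme.sum())

# Simulated toy data (hypothetical): two cis-gene mediators of one trans gene.
rng = np.random.default_rng(2)
n = 3000
snp = rng.binomial(2, 0.3, size=n).astype(float)
mediators = 0.6 * snp[:, None] + rng.normal(size=(n, 2))

target_same = 0.5 * mediators[:, 0] + 0.5 * mediators[:, 1] + rng.normal(size=n)
target_opp = 0.5 * mediators[:, 0] - 0.5 * mediators[:, 1] + rng.normal(size=n)

cme_same, tme_same = mediation_effects(snp, mediators, target_same)
cme_opp, tme_opp = mediation_effects(snp, mediators, target_opp)

print("same direction:    ", np.round(cme_same, 3), round(tme_same, 3))
print("opposite direction:", np.round(cme_opp, 3), round(tme_opp, 3))
```

When the two mediated paths point in opposite directions, the components remain clearly nonzero while their sum is close to zero, which is consistent with the abstract's report that the CME test is more powerful in that setting and the TME test is more powerful when the paths agree.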
Affiliation(s)
- Nayang Shan
- Center for Statistical Science, Tsinghua University, Beijing, 100084, China; Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06510, USA
- Lin Hou
- Center for Statistical Science, Tsinghua University, Beijing, 100084, China; Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China; MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, 100084, China

11
Causal Queries from Observational Data in Biological Systems via Bayesian Networks: An Empirical Study in Small Networks. Methods Mol Biol 2018. [PMID: 30547398 DOI: 10.1007/978-1-4939-8882-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Biological networks are a very convenient modeling and visualization tool to discover knowledge from modern high-throughput genomics and post-genomics data sets. Indeed, biological entities are not isolated but are components of complex multilevel systems. We go one step further and advocate for the consideration of causal representations of the interactions in living systems. We present the causal formalism and bring it out in the context of biological networks, when the data is observational. We also discuss its ability to decipher the causal information flow as observed in gene expression. We also illustrate our exploration by experiments on small simulated networks as well as on a real biological data set.

12
Green R, Ireton RC, Gale M. Interferon-stimulated genes: new platforms and computational approaches. Mamm Genome 2018; 29:593-602. [PMID: 29982912 DOI: 10.1007/s00335-018-9755-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/09/2018] [Accepted: 06/22/2018] [Indexed: 12/12/2022]
Abstract
Interferon-stimulated genes (ISGs) are the effectors of interferon (IFN) actions and play major roles in innate immune defense against microbial infection. During virus infection, ISGs impart antiviral actions to control virus replication and spread but can also contribute to disease pathology if their expression is unchecked. Antiviral ISGs have been identified by a variety of biochemical, genetic, and virologic methods. New computational approaches are expanding and redefining ISGs as responders to a variety of stimuli beyond IFNs, including virus infection, stress, and other events that induce cytokines. These studies reveal that the expression of ISG subsets links to interferon regulatory factors (IRFs), NF-kB, and other transcription factors that impart gene expression in specific cell types independently of IFNs, including stem cells and other cell types where ISGs are constitutively expressed. Here, we provide a broad overview of ISGs, define virus-induced genes (VSGs), and discuss the application of computational approaches and bioinformatics platforms to evaluate the functional role of ISGs in epigenetics, immune programming, and vaccine responses.
Affiliation(s)
- Richard Green
  - Department of Immunology and the Center for Innate Immunity and Immune Disease (CIIID), University of Washington, Seattle, WA, USA
- Reneé C Ireton
  - Department of Immunology and the Center for Innate Immunity and Immune Disease (CIIID), University of Washington, Seattle, WA, USA
- Michael Gale
  - Department of Immunology and the Center for Innate Immunity and Immune Disease (CIIID), University of Washington, Seattle, WA, USA
|
13
|
Battle A, Brown CD, Engelhardt BE, Montgomery SB. Genetic effects on gene expression across human tissues. Nature 2017; 550:204-213. [PMID: 29022597 PMCID: PMC5776756 DOI: 10.1038/nature24277] [Citation(s) in RCA: 2534] [Impact Index Per Article: 362.0] [Received: 09/08/2016] [Accepted: 09/15/2017] [Indexed: 12/12/2022]
Abstract
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
Affiliation(s)
- Alexis Battle
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Christopher D Brown
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Barbara E Engelhardt
  - Department of Computer Science and Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey 08540, USA
- Stephen B Montgomery
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
  - Department of Pathology, Stanford University, Stanford, California 94305, USA
|
14
|
Elliott HR, Shihab HA, Lockett GA, Holloway JW, McRae AF, Smith GD, Ring SM, Gaunt TR, Relton CL. Role of DNA Methylation in Type 2 Diabetes Etiology: Using Genotype as a Causal Anchor. Diabetes 2017; 66:1713-1722. [PMID: 28246294 PMCID: PMC5860189 DOI: 10.2337/db16-0874] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/18/2016] [Accepted: 02/21/2017] [Indexed: 12/24/2022]
Abstract
Several studies have investigated the relationship between genetic variation and DNA methylation with respect to type 2 diabetes, but it is unknown if DNA methylation is a mediator in the disease pathway or if it is altered in response to disease state. This study uses genotypic information as a causal anchor to help decipher the likely role of DNA methylation measured in peripheral blood in the etiology of type 2 diabetes. Illumina HumanMethylation450 BeadChip data were generated on 1,018 young individuals from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. In stage 1, 118 unique associations between published type 2 diabetes single nucleotide polymorphisms (SNPs) and genome-wide methylation (methylation quantitative trait loci [mQTLs]) were identified. In stage 2, a further 226 mQTLs were identified between 202 additional independent non-type 2 diabetes SNPs and CpGs identified in stage 1. Where possible, associations were replicated in independent cohorts of similar age. We discovered that around half of known type 2 diabetes SNPs are associated with variation in DNA methylation and postulated that methylation could either be on a causal pathway to future disease or could be a noncausal biomarker. For one locus (KCNQ1), we were able to provide further evidence that methylation is likely to be on the causal pathway to disease in later life.
Affiliation(s)
- Hannah R Elliott
  - MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, U.K.
- Hashem A Shihab
  - MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, U.K.
- Gabrielle A Lockett
  - Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, U.K.
- John W Holloway
  - Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, U.K.
  - Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, U.K.
- Allan F McRae
  - Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
  - The University of Queensland Diamantina Institute, Translational Research Institute, University of Queensland, Brisbane, Queensland, Australia
- George Davey Smith
  - MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, U.K.
- Susan M Ring
  - MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, U.K.
- Tom R Gaunt
  - MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, U.K.
- Caroline L Relton
  - MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, U.K.
|
15
|
Casale FP, Horta D, Rakitsch B, Stegle O. Joint genetic analysis using variant sets reveals polygenic gene-context interactions. PLoS Genet 2017; 13:e1006693. [PMID: 28426829 PMCID: PMC5398484 DOI: 10.1371/journal.pgen.1006693] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/13/2016] [Accepted: 03/15/2017] [Indexed: 01/28/2023] Open
Abstract
Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use simulations to validate iSet before applying the model to the analysis of genotype-environment interactions in an eQTL study. Our model retrieves a larger number of interactions than alternative methods and reveals that up to 20% of cases show context-specific configurations of causal variants. Finally, we apply iSet to test for sub-group specific genetic effects in human lipid levels in a large human cohort, where we identify a gene-sex interaction for C-reactive protein that is missed by alternative methods.
Genetic effects on phenotypes can depend on external contexts, including environment. Statistical tests for identifying such interactions are important to understand how individual genetic variants may act in different contexts. Interaction effects can either be studied using measurements of a given phenotype in different contexts, under the same genetic backgrounds, or by stratifying a population into subgroups. Here, we derive a method based on linear mixed models that can be applied to both of these designs. iSet enables testing for interactions between context and sets of variants, and accounts for polygenic effects. We validate our model using simulations, before applying it to the genetic analysis of gene expression studies and genome-wide association studies of human blood lipid levels. We find that modeling interactions with variant sets offers increased power, thereby uncovering interactions that cannot be detected by alternative methods.
Affiliation(s)
- Francesco Paolo Casale
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
  - * E-mail: (FPC); (OS)
- Danilo Horta
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
- Barbara Rakitsch
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
- Oliver Stegle
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
  - * E-mail: (FPC); (OS)
|
16
|
Angermueller C, Pärnamaa T, Parts L, Stegle O. Deep learning for computational biology. Mol Syst Biol 2016; 12:878. [PMID: 27474269 PMCID: PMC4965871 DOI: 10.15252/msb.20156651] [Citation(s) in RCA: 669] [Impact Index Per Article: 83.6] [Received: 04/11/2016] [Revised: 06/02/2016] [Accepted: 06/06/2016] [Indexed: 12/11/2022] Open
Abstract
Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them, and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background of what deep learning is, and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists when and how to make the most use of this new technology.
Affiliation(s)
- Christof Angermueller
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Tanel Pärnamaa
  - Department of Computer Science, University of Tartu, Tartu, Estonia
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Leopold Parts
  - Department of Computer Science, University of Tartu, Tartu, Estonia
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Oliver Stegle
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|