1. Khan M, Ludl AA, Bankier S, Björkegren JLM, Michoel T. Prediction of causal genes at GWAS loci with pleiotropic gene regulatory effects using sets of correlated instrumental variables. arXiv 2024; arXiv:2401.06261v2. [PMID: 38259344 PMCID: PMC10802687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Multivariate Mendelian randomization (MVMR) is a statistical technique that uses sets of genetic instruments to estimate the direct causal effects of multiple exposures on an outcome of interest. At genomic loci with pleiotropic gene regulatory effects, that is, loci where the same genetic variants are associated to multiple nearby genes, MVMR can potentially be used to predict candidate causal genes. However, consensus in the field dictates that the genetic instruments in MVMR must be independent (not in linkage disequilibrium), which is usually not possible when considering a group of candidate genes from the same locus. Here we used causal inference theory to show that MVMR with correlated instruments satisfies the instrumental set condition. This is a classical result by Brito and Pearl (2002) for structural equation models that guarantees the identifiability of individual causal effects in situations where multiple exposures collectively, but not individually, separate a set of instrumental variables from an outcome variable. Extensive simulations confirmed the validity and usefulness of these theoretical results. Importantly, the causal effect estimates remained unbiased and their variance small even when instruments are highly correlated, while bias introduced by horizontal pleiotropy or LD matrix sampling error was comparable to standard MR. We applied MVMR with correlated instrumental variable sets at genome-wide significant loci for coronary artery disease (CAD) risk using expression Quantitative Trait Loci (eQTL) data from seven vascular and metabolic tissues in the STARNET study. Our method predicts causal genes at twelve loci, each associated with multiple colocated genes in multiple tissues. We confirm causal roles for PHACTR1 and ADAMTS7 in arterial tissues, among others. 
However, the extensive degree of regulatory pleiotropy across tissues and the limited number of causal variants in each locus still require that MVMR is run on a tissue-by-tissue basis, and testing all gene-tissue pairs with cis-eQTL associations at a given locus in a single model to predict causal gene-tissue combinations remains infeasible. Our results show that within tissues, MVMR with dependent, as opposed to independent, sets of instrumental variables significantly expands the scope for predicting causal genes in disease risk loci with pleiotropic regulatory effects. However, considering risk loci with regulatory pleiotropy that also spans across tissues remains an unsolved problem.
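As a hedged illustration of the kind of summary-statistics calculation that MVMR with correlated instruments involves, the sketch below applies a generalized least squares estimator that weights the instrument-outcome associations by the LD matrix. It is not the authors' implementation: the function name `mvmr_gls`, the toy LD matrix, the standard errors, and all effect sizes are assumptions made purely for illustration, and the outcome associations are generated without noise so that the true effects are recovered exactly.

```python
import numpy as np


def mvmr_gls(beta_exposure, beta_outcome, se_outcome, ld):
    """Generalized least squares estimate of direct causal effects.

    beta_exposure : (m, k) instrument-exposure associations (m variants, k genes)
    beta_outcome  : (m,)   instrument-outcome associations
    se_outcome    : (m,)   standard errors of the outcome associations
    ld            : (m, m) correlation (LD) matrix between the instruments
    """
    # Covariance of the outcome associations implied by LD between variants.
    omega = np.outer(se_outcome, se_outcome) * ld
    omega_inv_b = np.linalg.solve(omega, beta_exposure)   # Omega^{-1} B
    lhs = beta_exposure.T @ omega_inv_b                   # B^T Omega^{-1} B
    rhs = omega_inv_b.T @ beta_outcome                    # B^T Omega^{-1} gamma
    theta = np.linalg.solve(lhs, rhs)
    theta_cov = np.linalg.inv(lhs)                        # covariance of theta
    return theta, theta_cov


# Toy locus (all values hypothetical): six variants in strong LD, two genes.
m = 6
idx = np.arange(m)
ld = 0.8 ** np.abs(idx[:, None] - idx[None, :])           # correlated instruments
rng = np.random.default_rng(0)
beta_exposure = rng.normal(size=(m, 2))                   # pleiotropic eQTL effects
theta_true = np.array([0.5, 0.0])                         # only gene 1 is causal
beta_outcome = beta_exposure @ theta_true                 # noise-free outcome effects
se_outcome = np.full(m, 0.05)

theta_hat, theta_cov = mvmr_gls(beta_exposure, beta_outcome, se_outcome, ld)
print(theta_hat)
```

In this toy setting both genes are associated with every variant, so neither is separated from the outcome by its own instruments alone; the two exposures are only identified jointly, which is the situation the abstract describes.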
Affiliation(s)
- Mariyam Khan
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Adriaan-Alexander Ludl
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Sean Bankier
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Johan LM Björkegren
  - Department of Medicine (Huddinge), Karolinska Institutet, Huddinge, Sweden
  - Department of Genetics & Genomic Sciences/Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Tom Michoel
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
2. Tan JY, Marques AC. The activity of human enhancers is modulated by the splicing of their associated lncRNAs. PLoS Comput Biol 2022; 18:e1009722. [PMID: 35015755 PMCID: PMC8803168 DOI: 10.1371/journal.pcbi.1009722] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Revised: 01/31/2022] [Accepted: 12/05/2021] [Indexed: 11/19/2022] Open
Abstract
Pervasive enhancer transcription is at the origin of more than half of all long noncoding RNAs in humans. Transcription of enhancer-associated long noncoding RNAs (elncRNAs) contributes to their cognate enhancer activity and gene expression regulation in cis. Recently, splicing of elncRNAs was shown to be associated with elevated enhancer activity. However, whether splicing of elncRNA transcripts is a mere consequence of accessibility at highly active enhancers or whether elncRNA splicing directly impacts enhancer function remains unanswered. We analysed genetically driven changes in elncRNA splicing in humans to address this outstanding question. We showed that splicing-related motifs within multi-exonic elncRNAs evolved under selective constraints during human evolution, suggesting that the processing of these transcripts is unlikely to have resulted from transcription across spurious splice sites. Using a genome-wide and unbiased approach, we used nucleotide variants as independent genetic factors to directly assess the causal relationships that underpin elncRNA splicing and cognate enhancer activity. We found that the splicing of most elncRNAs is associated with changes in chromatin signatures at cognate enhancers and target mRNA expression. We provide evidence that efficient and conserved processing of enhancer-associated elncRNAs contributes to enhancer activity.
Most, if not all, active enhancers are transcribed, giving rise to a plethora of transcripts, including enhancer-associated long noncoding RNAs (elncRNAs). Changes in elncRNA levels impact cognate enhancer activity. Recently, splicing of elncRNAs has also been found to associate with enhancer activity. Whether this association reflects a contribution of elncRNA splicing to increased enhancer activity, or is simply the consequence of increased chromatin accessibility that promotes transcriptional elongation and allows for spurious splicing events, remains unknown. We show that natural selection has acted, at the species and population level, to preserve DNA elements required for frequent and efficient elncRNA splicing. Importantly, using a genome-wide and unbiased statistical population genomics approach, we demonstrate that elncRNA splicing is associated with cognate enhancer function, contributing to chromatin status and enhancer activity. Our results provide strong evidence that efficient elncRNA splicing contributes to enhancer activity genome-wide.
Affiliation(s)
- Jennifer Yihong Tan
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - * E-mail: (JYT); (ACM)
- Ana Claudia Marques
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - * E-mail: (JYT); (ACM)
3. Rijlaarsdam J, Barker ED, Caserini C, Koopman-Verhoeff ME, Mulder RH, Felix JF, Cecil CA. Genome-wide DNA methylation patterns associated with general psychopathology in children. J Psychiatr Res 2021; 140:214-220. [PMID: 34118639 PMCID: PMC8578013 DOI: 10.1016/j.jpsychires.2021.05.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/04/2021] [Revised: 04/22/2021] [Accepted: 05/20/2021] [Indexed: 12/29/2022]
Abstract
Psychiatric symptoms are interrelated and found to be largely captured by a general psychopathology factor (GPF). Although epigenetic mechanisms, such as DNA methylation (DNAm), have been linked to individual psychiatric outcomes, associations with GPF remain unclear. Using data from 440 children aged 10 years participating in the Generation R Study, we examined the associations of DNAm with both general and specific (internalizing, externalizing) factors of psychopathology. Genome-wide DNAm levels, measured in peripheral blood using the Illumina 450K array, were clustered into wider co-methylation networks ('modules') using a weighted gene co-expression network analysis. One co-methylated module was associated with GPF after multiple testing correction, while none was associated with the specific factors. This module comprised 218 CpG probes, of which 198 mapped onto different genes. The CpG most strongly driving the association with GPF was annotated to FZD1, a gene that has been implicated in schizophrenia and wider neurological processes. Associations between the probes contained in the co-methylated module and GPF were supported in an independent sample of children from the Avon Longitudinal Study of Parents and Children (ALSPAC), as evidenced by significant correlations in effect sizes. These findings might contribute to improving our understanding of dynamic molecular processes underlying complex psychiatric phenotypes.
Affiliation(s)
- Jolien Rijlaarsdam
  - Department of Child and Adolescent Psychiatry/Psychology, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- Edward D. Barker
  - Department of Psychology, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK
- Chiara Caserini
  - Department of Psychology, Sigmund Freud University, Milan, Italy
- M. Elisabeth Koopman-Verhoeff
  - Department of Child and Adolescent Psychiatry/Psychology, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
  - The Generation R Study Group, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- Rosa H. Mulder
  - Department of Child and Adolescent Psychiatry/Psychology, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
  - The Generation R Study Group, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
  - Department of Pediatrics, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- Janine F. Felix
  - Department of Child and Adolescent Psychiatry/Psychology, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
  - The Generation R Study Group, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- Charlotte A.M. Cecil
  - Department of Child and Adolescent Psychiatry/Psychology, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
  - Department of Epidemiology, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
  - Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
4. Ludl AA, Michoel T. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. Mol Omics 2021; 17:241-251. [PMID: 33438713 DOI: 10.1039/d0mo00140f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022]
Abstract
Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1012 segregants from a cross between two budding yeast strains, and the Yeastract database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods on the other hand contain false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. 
Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses, and for genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
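The contrast between instrumental variable and mediation-based reasoning can be sketched on simulated data. The example below is a toy illustration under stated assumptions, not Findr and not the analysis in the paper: the variable names, sample size, and effect sizes are all hypothetical. A genotype (`eqtl`) affects gene A, gene A affects gene B with a known effect, and a hidden factor confounds both genes. The instrumental variable (Wald ratio) estimate stays close to the true effect, ordinary regression is biased, and the residual correlation that a mediation test would examine is clearly non-zero even though A truly regulates B.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000                     # hypothetical sample size
true_effect = 0.7              # hypothetical causal effect of gene A on gene B

# Simulated data: eQTL -> A -> B, with a hidden confounder acting on A and B.
eqtl = rng.binomial(2, 0.5, size=n).astype(float)
hidden = rng.normal(size=n)
gene_a = 1.0 * eqtl + hidden + rng.normal(size=n)
gene_b = true_effect * gene_a + hidden + rng.normal(size=n)


def cov(x, y):
    """Sample covariance between two 1-D arrays."""
    return np.cov(x, y)[0, 1]


# Ordinary regression of B on A is biased by the hidden confounder.
ols_estimate = cov(gene_a, gene_b) / cov(gene_a, gene_a)

# Instrumental variable (Wald ratio) estimate using the eQTL as instrument.
iv_estimate = cov(eqtl, gene_b) / cov(eqtl, gene_a)

# Mediation-style check: is the eQTL still correlated with B after
# regressing out A? Confounding leaves a residual correlation here.
residual_b = (gene_b - gene_b.mean()) - ols_estimate * (gene_a - gene_a.mean())
mediation_residual_corr = np.corrcoef(eqtl, residual_b)[0, 1]

print(f"OLS estimate: {ols_estimate:.2f}")
print(f"IV estimate:  {iv_estimate:.2f}")
print(f"eQTL-residual correlation: {mediation_residual_corr:.2f}")
```

The simulation deliberately omits genomic linkage between instruments, which the abstract identifies as the main source of false positives for instrumental variable methods.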
Affiliation(s)
- Adriaan-Alexander Ludl
  - Computational Biology Unit, Department of Informatics, University of Bergen, PO Box 7803, 5020 Bergen, Norway
5. Ha MJ, Sun W. Estimation of high-dimensional directed acyclic graphs with surrogate intervention. Biostatistics 2020; 21:659-675. [PMID: 30596892 DOI: 10.1093/biostatistics/kxy080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2017] [Revised: 11/18/2018] [Accepted: 11/25/2018] [Indexed: 11/15/2022] Open
Abstract
Directed acyclic graphs (DAGs) have been used to describe causal relationships between variables. The standard method for determining such relations uses interventional data. For complex systems with high-dimensional data, however, such interventional data are often not available. Therefore, it is desirable to estimate causal structure from observational data without subjecting variables to interventions. Observational data can be used to estimate the skeleton of a DAG and the directions of a limited number of edges. We develop a Bayesian framework to estimate a DAG using surrogate interventional data, where the interventions are applied to a set of external variables, and thus such interventions are considered to be surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are the expression of genes, the external variables are DNA variations, and interventions are applied to DNA variants during the process of a randomly selected DNA allele being passed to a child from either parent. Our method, surrogate intervention recovery of a DAG (sirDAG), first constructs a DAG skeleton using penalized regressions and subsequent partial correlation tests, and then estimates the posterior probabilities of all the edge directions after incorporating DNA variant data. We demonstrate the utility of sirDAG by simulation and an application to an eQTL study of 550 breast cancer patients.
Affiliation(s)
- Min Jin Ha
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX, USA
- Wei Sun
  - Program in Biostatistics and Bioinformatics, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, USA
6. Fadason T, Schierding W, Kolbenev N, Liu J, Ingram JR, O’Sullivan JM. Reconstructing the blood metabolome and genotype using long-range chromatin interactions. Metabol Open 2020; 6:100035. [PMID: 32812909 PMCID: PMC7424797 DOI: 10.1016/j.metop.2020.100035] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/20/2020] [Revised: 03/15/2020] [Accepted: 03/16/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Maintenance of tight controls on circulating blood metabolites is crucial to normal, healthy tissue and organismal function. A number of single nucleotide polymorphisms (SNPs) have been associated with changes in the levels of blood metabolites. However, the impacts of the metabolite-associated SNPs are largely unknown because they fall within non-coding regions of the genome. OBJECTIVE: We aimed to identify genes and tissues that are linked to changes in circulating blood metabolites by characterizing genome-wide spatial regulatory interactions involving blood metabolite-associated SNPs. METHOD: We systematically integrated chromatin interaction (Hi-C), expression quantitative trait loci (eQTL), gene ontology, drug interaction, and literature-supported connections to deconvolute the genetic regulatory influences of 145 blood metabolite-associated SNPs. FINDINGS: We identified 577 genes that are regulated by 130 distal and proximal metabolite-associated SNPs across 48 different human tissues. The affected genes are enriched in categories that include metabolism, enzymes, plasma proteins, disease development, and potential drug targets. Our results suggest that regulatory interactions in other tissues contribute to the modulation of blood metabolites. CONCLUSIONS: The spatial SNP-gene-metabolite associations identified in this study expand the list of genes and tissues that are influenced by metabolite-associated SNPs and improve our understanding of the molecular mechanisms underlying pathologic blood metabolite levels.
Affiliation(s)
- Tayaza Fadason
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- William Schierding
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- Nikolai Kolbenev
  - The Department of Computer Science, The University of Auckland, Auckland, New Zealand
- Jiamou Liu
  - The Department of Computer Science, The University of Auckland, Auckland, New Zealand
- Justin M. O’Sullivan
  - The Liggins Institute, The University of Auckland, Auckland, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
7. Davis JP, Vadlamudi S, Roman TS, Zeynalzadeh M, Iyengar AK, Mohlke KL. Enhancer deletion and allelic effects define a regulatory molecular mechanism at the VLDLR cholesterol GWAS locus. Hum Mol Genet 2019; 28:888-895. [PMID: 30445632 PMCID: PMC6400044 DOI: 10.1093/hmg/ddy385] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/13/2018] [Revised: 10/26/2018] [Accepted: 11/02/2018] [Indexed: 02/06/2023] Open
Abstract
Total cholesterol (TC) and low-density lipoprotein cholesterol (LDL-C) are heritable risk factors for cardiovascular disease, yet the molecular mechanisms underlying the majority of blood lipid-associated genome-wide association study signals remain elusive. One association signal is located in intron 3 of VLDLR; rs3780181-A is a risk allele associated (P ≤ 2 × 10⁻⁹) with increased TC and LDL-C. We investigated variants, genes and mechanisms underlying this association signal. We used a functional genetic approach to show that the intronic region spanning rs3780181 exhibited 1.6- to 7.6-fold enhancer activity in human HepG2 hepatocyte, THP-1 monocyte and Simpson-Golabi-Behmel Syndrome (SGBS) preadipocyte cells and that the rs3780181-A risk allele showed significantly less enhancer activity compared with the G allele, consistent with the direction of an expression quantitative trait locus in liver. In addition, rs3780181 alleles showed differential binding to multiple nuclear proteins, including stronger IRF2 binding to the rs3780181 G allele. We used a CRISPR-Cas9 approach to delete 475 and 663 bp of the putative enhancer element in HEK293T kidney cells; compared to expression in mock-edited cell lines, the homozygous enhancer deletion cell lines showed significantly (P < 0.04) decreased expression of VLDLR (1.2-fold), as well as 1.5-fold decreased expression of SMARCA2, located 388 kb away. Together, these results identify an enhancer of VLDLR expression and suggest that altered binding of one or more factors bound to rs3780181 alleles decreases enhancer activity and reduces at least VLDLR expression, leading to increased TC and LDL-C.
Affiliation(s)
- James P Davis
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Tamara S Roman
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Monica Zeynalzadeh
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Apoorva K Iyengar
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Karen L Mohlke
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
8. The arms race between man and Mycobacterium tuberculosis: Time to regroup. Infect Genet Evol 2018; 66:361-375. [DOI: 10.1016/j.meegid.2017.08.021] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/13/2017] [Revised: 08/21/2017] [Accepted: 08/22/2017] [Indexed: 12/12/2022]
9. Wang L, Michoel T. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data. PLoS Comput Biol 2017; 13:e1005703. [PMID: 28821014 PMCID: PMC5576763 DOI: 10.1371/journal.pcbi.1005703] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 02/02/2017] [Revised: 08/30/2017] [Accepted: 07/26/2017] [Indexed: 02/07/2023] Open
Abstract
Mapping gene expression as a quantitative trait using whole genome-sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr.
Understanding how genetic variation between individuals determines variation in observable traits or disease risk is one of the core aims of genetics. It is known that genetic variation often affects gene regulatory DNA elements and directly causes variation in expression of nearby genes. This effect in turn cascades down to other genes via the complex pathways and gene interaction networks that ultimately govern how cells operate in an ever-changing environment. In theory, when genetic variation and gene expression levels are measured simultaneously in a large number of individuals, the causal effects of genes on each other can be inferred using statistical models similar to those used in randomized controlled trials. We developed a novel method and ultra-fast software, Findr, which, unlike existing methods, takes into account the complex but unknown network context when predicting causality between specific gene pairs. Findr’s predictions have a significantly higher overlap with known gene networks compared to existing methods, using both simulated and real data. Findr is also nearly a million times faster, and hence the only software in its class that can handle modern datasets where the expression levels of tens of thousands of genes are simultaneously measured in hundreds to thousands of individuals.
Affiliation(s)
- Lingfei Wang
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, United Kingdom
- Tom Michoel
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, United Kingdom