1
Ludl AA, Michoel T. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. Mol Omics 2021; 17:241-251. [PMID: 33438713 DOI: 10.1039/d0mo00140f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022]
Abstract
Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1012 segregants from a cross between two budding yeast strains, and the Yeastract database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods on the other hand contain false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. 
Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses, and for genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
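The contrast between the two criteria described in this abstract can be illustrated with a toy simulation. The sketch below is not Findr and does not reproduce its likelihood-ratio tests. It assumes a hypothetical linear model in which a haploid genotype E at a local eQTL affects gene A, A affects gene B with effect `beta`, and an unobserved factor H may affect both genes. The function names and all parameter values are illustrative assumptions. The Wald ratio stands in for the instrumental variable estimate, and the partial correlation of E and B given A stands in for the mediation criterion.

```python
import numpy as np

def simulate_cross(n, beta, confounding, seed=0):
    """Toy segregant data: E -> A -> B, plus an optional hidden confounder H."""
    rng = np.random.default_rng(seed)
    E = rng.integers(0, 2, size=n).astype(float)  # haploid genotype at A's local eQTL
    H = rng.normal(size=n)                          # unobserved factor acting on A and B
    A = E + confounding * H + rng.normal(size=n)
    B = beta * A + confounding * H + rng.normal(size=n)
    return E, A, B

def wald_ratio(E, A, B):
    """Instrumental variable estimate of the A -> B effect: cov(E, B) / cov(E, A)."""
    return np.cov(E, B)[0, 1] / np.cov(E, A)[0, 1]

def partial_corr(x, y, z):
    """Correlation of x and y after regressing both on z (with an intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

for c in (0.0, 1.0):
    E, A, B = simulate_cross(n=20000, beta=0.5, confounding=c)
    print(f"confounding={c}: Wald ratio={wald_ratio(E, A, B):.3f}, "
          f"partial corr(E, B | A)={partial_corr(E, B, A):.3f}")
```

In this toy model the Wald ratio stays close to `beta` in both settings, because E is independent of H. The mediation criterion expects E and B to be uncorrelated given A. That holds without confounding, but with a hidden confounder, conditioning on A leaves a residual correlation that becomes detectable at large sample sizes. This is one way a true A -> B edge can fail a mediation test.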
Affiliation(s)
- Adriaan-Alexander Ludl
- Computational Biology Unit, Department of Informatics, University of Bergen, PO Box 7803, 5020 Bergen, Norway.
2
Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Front Genet 2019; 10:524. [PMID: 31214249 PMCID: PMC6558187 DOI: 10.3389/fgene.2019.00524] [Citation(s) in RCA: 143] [Impact Index Per Article: 28.6] [Received: 08/07/2018] [Accepted: 05/13/2019] [Indexed: 12/11/2022] [Open Access]
Abstract
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases such interventions are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Affiliation(s)
- Clark Glymour
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Kun Zhang
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Peter Spirtes
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
3
Abstract
Gaussian process dynamical systems (GPDS) represent Bayesian nonparametric approaches to inference of nonlinear dynamical systems, and provide a principled framework for the learning of biological networks from multiple perturbed time series measurements of gene or protein expression. Such approaches are able to capture the full richness of complex ODE models, and can be scaled for inference in moderately large systems containing hundreds of genes. Related hierarchical approaches allow for inference from multiple datasets in which the underlying generative networks are assumed to have been rewired, either by context-dependent changes in network structure, evolutionary processes, or synthetic manipulation. These approaches can also be used to leverage experimentally determined network structures from one species into another where the network structure is unknown. Collectively, these methods provide a comprehensive and flexible platform for inference from a diverse range of data, with applications in systems and synthetic biology, as well as spatiotemporal modelling of embryo development. In this chapter we provide an overview of GPDS approaches and highlight their applications in the biological sciences, with accompanying tutorials available as a Jupyter notebook from https://github.com/cap76/GPDS .
Affiliation(s)
- Iulia Gherman
- Warwick Integrative Synthetic Biology Centre, School of Engineering, University of Warwick, Coventry, UK
- Anastasiya Sybirna
- Wellcome/CRUK Gurdon Institute, University of Cambridge, Cambridge, UK
- Wellcome/MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Physiology, Development and Neuroscience Department, University of Cambridge, Cambridge, UK
- David L Wild
- Department of Statistics and Systems Biology Centre, University of Warwick, Coventry, UK
4
Shahdoust M, Pezeshk H, Mahjub H, Sadeghi M. F-MAP: A Bayesian approach to infer the gene regulatory network using external hints. PLoS One 2017; 12:e0184795. [PMID: 28938012 PMCID: PMC5609748 DOI: 10.1371/journal.pone.0184795] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/22/2017] [Accepted: 08/31/2017] [Indexed: 01/07/2023] [Open Access]
Abstract
The common topological features of gene regulatory networks in related species suggest that the network of one species can be reconstructed using additional information from the gene expression profiles of related species. We present an algorithm, named F-MAP, that reconstructs the gene regulatory network by applying knowledge about gene interactions from related species. Our algorithm sets up a Bayesian framework to estimate the precision matrix of one species' microarray gene expression dataset in order to infer the Gaussian graphical model of the network. The conjugate Wishart prior is used, and the information from related species is applied to estimate the hyperparameters of the prior distribution by factor analysis. Applying the proposed algorithm to six related species of Drosophila shows that the precision of the reconstructed networks is improved considerably compared to the precision of networks constructed by other Bayesian approaches.
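The conjugate update behind such a framework can be sketched in a few lines. This is a minimal illustration of the standard Wishart-Gaussian posterior for a precision matrix, not the F-MAP implementation. The prior scale matrix and degrees of freedom are set by hand here, an assumption made for illustration, whereas the abstract describes estimating the hyperparameters from related species by factor analysis. For zero-mean Gaussian data with scatter matrix S from n samples and a prior Wishart(ν, V) on the precision matrix, the posterior is Wishart(ν + n, (V⁻¹ + S)⁻¹), with mean (ν + n)(V⁻¹ + S)⁻¹. Centering the data first is a simplification of the zero-mean assumption.

```python
import numpy as np

def posterior_mean_precision(X, prior_scale, prior_df):
    """Posterior mean of the precision matrix under a conjugate Wishart prior.

    X is (samples x genes); prior_scale and prior_df are hand-set hyperparameters
    (an assumption of this sketch, not the factor-analysis estimates of the paper).
    """
    Xc = X - X.mean(axis=0)            # centre so the zero-mean model roughly applies
    n = Xc.shape[0]
    scatter = Xc.T @ Xc
    post_scale = np.linalg.inv(np.linalg.inv(prior_scale) + scatter)
    return (prior_df + n) * post_scale

def partial_correlations(precision):
    """Edge weights of the Gaussian graphical model: -w_ij / sqrt(w_ii * w_jj)."""
    d = np.sqrt(np.diag(precision))
    pc = -precision / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

# Toy chain network g0 - g1 - g2: g0 and g2 are conditionally independent given g1.
true_precision = np.array([[2.0, -1.0, 0.0],
                           [-1.0, 2.0, -1.0],
                           [0.0, -1.0, 2.0]])
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(true_precision), size=5000)

omega_hat = posterior_mean_precision(X, prior_scale=np.eye(3), prior_df=5)
pcor = partial_correlations(omega_hat)
print(np.round(pcor, 2))
```

Off-diagonal partial correlations near zero correspond to absent edges in the Gaussian graphical model. With many samples the prior has little influence, so an informative prior matters most when samples are scarce relative to the number of genes.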
Affiliation(s)
- Maryam Shahdoust
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Hossein Mahjub
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
5
Reyes PFL, Michoel T, Joshi A, Devailly G. Meta-analysis of Liver and Heart Transcriptomic Data for Functional Annotation Transfer in Mammalian Orthologs. Comput Struct Biotechnol J 2017; 15:425-432. [PMID: 29187960 PMCID: PMC5691612 DOI: 10.1016/j.csbj.2017.08.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/31/2017] [Revised: 08/10/2017] [Accepted: 08/11/2017] [Indexed: 11/30/2022] [Open Access]
Abstract
Functional annotation transfer across multi-gene family orthologs can lead to functional misannotations. We hypothesised that co-expression networks will help predict functional orthologs amongst complex homologous gene families. To explore the use of transcriptomic data available in the public domain to identify the functionally equivalent orthologs among all predicted orthologs, we collected genome-wide expression data in mouse and rat liver from over 1500 experiments with varied treatments. We used a hyper-graph clustering method to identify clusters of orthologous genes co-expressed in both mouse and rat. We validated these clusters by analysing expression profiles in each species separately, and demonstrating a high overlap. We then focused on genes in 18 homology groups with one-to-many or many-to-many relationships between the two species, to discriminate between functionally equivalent and non-equivalent orthologs. Finally, we validated the method in an independent tissue by applying it to heart transcriptomic data (over 1400 experiments) collected in rat and mouse.
Affiliation(s)
- Tom Michoel
- The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
- Anagha Joshi
- The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
- Guillaume Devailly
- The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
6
Koch C, Konieczka J, Delorey T, Lyons A, Socha A, Davis K, Knaack SA, Thompson D, O'Shea EK, Regev A, Roy S. Inference and Evolutionary Analysis of Genome-Scale Regulatory Networks in Large Phylogenies. Cell Syst 2017; 4:543-558.e8. [PMID: 28544882 PMCID: PMC5515301 DOI: 10.1016/j.cels.2017.04.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 09/23/2016] [Revised: 02/20/2017] [Accepted: 04/26/2017] [Indexed: 11/22/2022]
Abstract
Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data, to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates study of regulatory network evolutionary dynamics across multiple poorly studied species.
Affiliation(s)
- Christopher Koch
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Jay Konieczka
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Toni Delorey
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Ana Lyons
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Amanda Socha
- Dartmouth College, Biology Department, Hanover, NH 03755, USA
- Kathleen Davis
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Sara A Knaack
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Erin K O'Shea
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
- Howard Hughes Medical Institute, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Department of Molecular and Cellular Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
7
Fused Regression for Multi-source Gene Regulatory Network Inference. PLoS Comput Biol 2016; 12:e1005157. [PMID: 27923054 PMCID: PMC5140053 DOI: 10.1371/journal.pcbi.1005157] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 04/22/2016] [Accepted: 09/20/2016] [Indexed: 12/03/2022] [Open Access]
Abstract
Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms) and single cell types. We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method’s utility in learning from data collected on different experimental platforms. Gene regulatory networks describing related biological processes are thought to share conserved interaction structure. This assumption motivates a great deal of work in model systems–where discovery of gene regulation may be more experimentally tractable–but is difficult to directly evaluate using existing methods. The presence of shared structure in a well studied model system or process should make the problem of network inference in a related process easier, but this information is not often applied to the discovery of global gene regulatory networks. Further, to be able to successfully translate findings between different organisms, it is important to be able to identify where regulatory structure is different. 
We provide a method based on penalized fused regression for inferring gene regulatory networks given prior knowledge about the similarity of interactions in each network. This method is demonstrated on synthetic data, and applied to the problem of inferring networks in distantly related bacterial organisms. We then introduce an extension of the method to deal with the condition of uncertainty over the degree of regulatory conservation by simultaneously inferring gene conservation and interaction weights.
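The idea of a fusion penalty can be made concrete with a small example. The sketch below assumes squared (ridge-style) penalties for both shrinkage and fusion, chosen here because the resulting problem has a closed-form solution. It is a simplified illustration and not the authors' implementation. The function name `fused_ridge`, the `pairs` argument listing which coefficient in network 1 is fused to which coefficient in network 2 (for example through orthology), and all parameter values are hypothetical. The objective is the sum of squared residuals in both networks, plus `lam_ridge` times the squared norm of all coefficients, plus `lam_fuse` times the squared difference of each fused pair.

```python
import numpy as np

def fused_ridge(X1, y1, X2, y2, pairs, lam_ridge, lam_fuse):
    """Jointly fit two linear models whose paired coefficients are pulled together.

    pairs: list of (i, j) meaning coefficient i of model 1 is fused to
    coefficient j of model 2. Unpaired coefficients are only ridge-penalised.
    """
    p1, p2 = X1.shape[1], X2.shape[1]
    # Block-diagonal normal equations for the two separate ridge problems.
    A = np.zeros((p1 + p2, p1 + p2))
    A[:p1, :p1] = X1.T @ X1
    A[p1:, p1:] = X2.T @ X2
    A += lam_ridge * np.eye(p1 + p2)
    # Each fused pair adds lam_fuse * (b1[i] - b2[j])**2 to the objective.
    for i, j in pairs:
        a, b = i, p1 + j
        A[a, a] += lam_fuse
        A[b, b] += lam_fuse
        A[a, b] -= lam_fuse
        A[b, a] -= lam_fuse
    rhs = np.concatenate([X1.T @ y1, X2.T @ y2])
    beta = np.linalg.solve(A, rhs)
    return beta[:p1], beta[p1:]

# Toy data: a well-sampled network and a poorly sampled, related one.
rng = np.random.default_rng(2)
true_beta = np.array([1.0, -2.0, 0.0, 0.5])
X1 = rng.normal(size=(200, 4))
y1 = X1 @ true_beta + rng.normal(size=200)
X2 = rng.normal(size=(6, 4))
y2 = X2 @ true_beta + rng.normal(size=6)
pairs = [(k, k) for k in range(4)]  # assume one-to-one orthology of all regulators

b1, b2 = fused_ridge(X1, y1, X2, y2, pairs, lam_ridge=0.1, lam_fuse=50.0)
print(np.round(b1, 2), np.round(b2, 2))
```

Setting `lam_fuse` to zero recovers two independent ridge regressions, while a very large value forces paired coefficients to be nearly equal. Leaving a coefficient out of `pairs` corresponds to an interaction with no known counterpart in the other network.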
8
Macedo LMF, Nunes FMF, Freitas FCP, Pires CV, Tanaka ED, Martins JR, Piulachs MD, Cristino AS, Pinheiro DG, Simões ZLP. MicroRNA signatures characterizing caste-independent ovarian activity in queen and worker honeybees (Apis mellifera L.). Insect Molecular Biology 2016; 25:216-26. [PMID: 26853694 DOI: 10.1111/imb.12214] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 05/20/2023]
Abstract
Queen and worker honeybees differ profoundly in reproductive capacity. The queen of this complex society, with 200 highly active ovarioles in each ovary, is the fertile caste, whereas the workers have approximately 20 ovarioles as a result of receiving a different diet during larval development. In a regular queenright colony, the workers have inactive ovaries and do not reproduce. However, if the queen is sensed to be absent, some of the workers activate their ovaries, producing viable haploid eggs that develop into males. Here, a deep-sequenced ovary transcriptome library of reproductive workers was used as supporting data to assess the dynamic expression of the regulatory molecules and microRNAs (miRNAs) of reproductive and nonreproductive honeybee females. In this library, most of the differentially expressed miRNAs are related to ovary physiology or oogenesis. When we quantified the dynamic expression of 19 miRNAs in the active and inactive worker ovaries and compared their expression in the ovaries of virgin and mated queens, we noted that some miRNAs (miR-1, miR-31a, miR-13b, miR-125, let-7 RNA, miR-100, miR-276, miR-12, miR-263a, miR-306, miR-317, miR-92a and miR-9a) could be used to identify reproductive and nonreproductive statuses independent of caste. Furthermore, integrative gene networks suggested that some candidate miRNAs function in the process of ovary activation in worker bees.
Affiliation(s)
- L M F Macedo
- Departamento De Genética, Faculdade De Medicina De Ribeirão Preto, Universidade De São Paulo, Ribeirão Preto, Brazil
- F M F Nunes
- Departamento De Genética E Evolução, Centro De Ciências Biológicas E Da Saúde, Universidade Federal De São Carlos, São Carlos, Brazil
- F C P Freitas
- Departamento De Genética, Faculdade De Medicina De Ribeirão Preto, Universidade De São Paulo, Ribeirão Preto, Brazil
- C V Pires
- Departamento De Genética, Faculdade De Medicina De Ribeirão Preto, Universidade De São Paulo, Ribeirão Preto, Brazil
- E D Tanaka
- Departamento De Genética, Faculdade De Medicina De Ribeirão Preto, Universidade De São Paulo, Ribeirão Preto, Brazil
- J R Martins
- Departamento De Genética, Faculdade De Medicina De Ribeirão Preto, Universidade De São Paulo, Ribeirão Preto, Brazil
- M-D Piulachs
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
- A S Cristino
- The University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Australia
- D G Pinheiro
- Departamento De Tecnologia, Faculdade De Ciências Agrárias E Veterinárias, Universidade Estadual Paulista, Jaboticabal, Brazil
- Z L P Simões
- Departamento De Biologia, Faculdade De Filosofia, Ciências E Letras De Ribeirão Preto, Universidade De São Paulo, Ribeirão Preto, Brazil
9
Penfold CA, Millar JBA, Wild DL. Inferring orthologous gene regulatory networks using interspecies data fusion. Bioinformatics 2015; 31:i97-105. [PMID: 26072515 PMCID: PMC4765882 DOI: 10.1093/bioinformatics/btv267] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022] [Open Access]
Abstract
Motivation: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved ‘hypernetwork’. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. Results: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase. Availability and implementation: MATLAB code is available from http://go.warwick.ac.uk/systemsbiology/software/. Contact: d.l.wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher A Penfold
- Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
- Jonathan B A Millar
- Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
- David L Wild
- Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK