1
Castro DM, de Veaux NR, Miraldi ER, Bonneau R. Multi-study inference of regulatory networks for more accurate models of gene regulation. PLoS Comput Biol 2019; 15:e1006591. [PMID: 30677040 PMCID: PMC6363223 DOI: 10.1371/journal.pcbi.1006591] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 03/06/2018] [Revised: 02/05/2019] [Accepted: 10/23/2018] [Indexed: 12/16/2022]
Abstract
Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell-types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method initially estimates the activities of transcription factors, and subsequently, infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets. Due to increasing availability of biological data, methods to properly integrate data generated across the globe become essential for extracting reproducible insights into relevant research questions. 
In this work, we developed a framework to reconstruct gene regulatory networks from expression datasets generated in separate studies—and thus, because of technical variation (different dates, handlers, laboratories, protocols etc…), challenging to integrate. Since regulatory mechanisms are often shared across conditions, we hypothesized that drawing conclusions from various data sources would improve performance of gene regulatory network inference. By transferring knowledge among regulatory models, our method is able to detect weaker patterns that are conserved across datasets, while also being able to detect dataset-unique interactions. We also allow incorporation of prior knowledge on network structure to favor models that are somewhat similar to the prior itself. Using two model organisms, we show that joint network inference outperforms inference from a single dataset. We also demonstrate that our method is robust to false edges in the prior and to low condition overlap across datasets, and that it can outperform current data integration strategies.
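The decomposition of coefficients into conserved and dataset-specific parts, as described in the abstract above, can be sketched as follows. The notation is hypothetical and not taken from the abstract: d indexes the D datasets, y_g^{(d)} is the expression of target gene g, X^{(d)} holds the estimated transcription factor activities, B collects the conserved components b_g^{(d)}, and S collects the dataset-specific components s_g^{(d)}. The specific penalty norms are an assumption in the style of block-sparse multitask regression and may differ from the authors' formulation.

```latex
% Hypothetical notation; the penalty norms are an assumption, not quoted from the paper.
\begin{aligned}
w_g^{(d)} &= b_g^{(d)} + s_g^{(d)}, \qquad d = 1, \dots, D \\
(\hat{B}, \hat{S}) &= \arg\min_{B,\,S} \sum_{d=1}^{D}
  \bigl\| y_g^{(d)} - X^{(d)} \bigl( b_g^{(d)} + s_g^{(d)} \bigr) \bigr\|_2^2
  + \lambda_b \, \|B\|_{1,\infty} + \lambda_s \, \|S\|_{1,1}
\end{aligned}
```

Under this reading, the block penalty on B encourages a regulator to be selected or dropped jointly across datasets, while the element-wise penalty on S permits interactions unique to one dataset. Adaptive penalties for prior knowledge could be modeled by scaling the penalty per interaction, which is again an assumption here.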
Affiliation(s)
- Nicholas R de Veaux
- Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA
- Emily R Miraldi
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Divisions of Immunobiology & Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
- Richard Bonneau
- New York University, New York, NY 10003, USA
- Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA
2
Todorov H, Cannoodt R, Saelens W, Saeys Y. Network Inference from Single-Cell Transcriptomic Data. Methods Mol Biol 2019; 1883:235-249. [PMID: 30547403 DOI: 10.1007/978-1-4939-8882-2_10] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022]
Abstract
Recent technological breakthroughs in single-cell RNA sequencing are revolutionizing modern experimental design in biology. The increasing size of the single-cell expression data from which networks can be inferred allows identifying more complex, non-linear dependencies between genes. Moreover, the inter-cellular variability that is observed in single-cell expression data can be used to infer not only one global network representing all the cells, but also numerous regulatory networks that are more specific to certain conditions. By experimentally perturbing certain genes, the deconvolution of the true contribution of these genes can also be greatly facilitated. In this chapter, we will therefore tackle the advantages of single-cell transcriptomic data and show how new methods exploit this novel data type to enhance the inference of gene regulatory networks.
Affiliation(s)
- Helena Todorov
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Centre International de Recherche en Infectiologie, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Univ Lyon, Lyon, France
- Robrecht Cannoodt
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Wouter Saelens
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
3
Hempel S, Koseska A, Nikoloski Z, Kurths J. Unraveling gene regulatory networks from time-resolved gene expression data - a measures comparison study. BMC Bioinformatics 2011; 12:292. [PMID: 21771321 PMCID: PMC3161045 DOI: 10.1186/1471-2105-12-292] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 03/11/2011] [Accepted: 07/19/2011] [Indexed: 11/25/2022]
Abstract
BACKGROUND Inferring regulatory interactions between genes from transcriptomics time-resolved data, yielding reverse engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes coupled with short and often noisy time-resolved read-outs of the system renders the reverse engineering a challenging task. Therefore, the development and assessment of methods which are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications.
RESULTS Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank and symbol based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has been shown to be valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as information-theoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in this study.
CONCLUSIONS Our study is intended to serve as a guide for choosing a particular combination of similarity measures and scoring schemes suitable for reconstruction of gene regulatory networks from short time series data. We show that further improvement of algorithms for reverse engineering can be obtained if one considers measures that are rooted in the study of symbolic dynamics or ranks, in contrast to the application of common similarity measures which do not consider the temporal character of the employed data. Moreover, we establish that the asymmetric weighting scoring scheme together with symbol based measures (for low noise level) and rank based measures (for high noise level) are the most suitable choices.
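As a rough illustration of the relevance network approach with a rank-based measure, the sketch below scores every gene pair with Spearman rank correlation and keeps pairs above a threshold. Spearman correlation is used only as a familiar representative of rank-based measures; it is an assumption, not necessarily one of the 21 measures evaluated in the study, and the function names and the threshold are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relevance_network(series, threshold):
    """Keep undirected gene pairs whose |Spearman rho| reaches the threshold.

    series: dict mapping gene name -> list of expression values over time.
    """
    genes = sorted(series)
    edges = []
    for idx, gene_a in enumerate(genes):
        for gene_b in genes[idx + 1:]:
            rho = spearman(series[gene_a], series[gene_b])
            if abs(rho) >= threshold:
                edges.append((gene_a, gene_b, rho))
    return edges


# Toy time series (made-up values, for illustration only).
toy_series = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 8, 16, 32],
    "g3": [5, 1, 4, 2, 3],
}
toy_edges = relevance_network(toy_series, threshold=0.8)
```

A symmetric measure like this one yields undirected edges; recovering directionality would need an asymmetric measure or scoring scheme, which this sketch does not attempt.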
Affiliation(s)
- Sabrina Hempel
- Interdisciplinary Center for Dynamics of Complex Systems, University of Potsdam, Campus Golm, Karl-Liebknecht-Str. 24, D-14476 Potsdam, Germany
- Potsdam Institute for Climate Impact Research (PIK), Telegraphenberg A 31, D-14473 Potsdam, Germany
- Department of Physics, Humboldt University of Berlin, Campus Adlershof, Newtonstr. 15, D-12489 Berlin, Germany
- Aneta Koseska
- Interdisciplinary Center for Dynamics of Complex Systems, University of Potsdam, Campus Golm, Karl-Liebknecht-Str. 24, D-14476 Potsdam, Germany
- Zoran Nikoloski
- Systems Biology and Mathematical Modeling Group, Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, D-14476 Potsdam, Germany
- Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 25, D-14476 Potsdam, Germany
- Jürgen Kurths
- Potsdam Institute for Climate Impact Research (PIK), Telegraphenberg A 31, D-14473 Potsdam, Germany
- Department of Physics, Humboldt University of Berlin, Campus Adlershof, Newtonstr. 15, D-12489 Berlin, Germany
- Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen AB243UE, UK
4
Nikoloski Z, May P, Selbig J. Algebraic connectivity may explain the evolution of gene regulatory networks. J Theor Biol 2010; 267:7-14. [PMID: 20682325 DOI: 10.1016/j.jtbi.2010.07.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/18/2009] [Revised: 07/21/2010] [Accepted: 07/21/2010] [Indexed: 11/26/2022]
Abstract
Gene expression is a result of the interplay between the structure, type, kinetics, and specificity of gene regulatory interactions, whose diversity gives rise to the variety of life forms. As the dynamic behavior of gene regulatory networks depends on their structure, here we attempt to determine structural reasons which, despite the similarities in global network properties, may explain the large differences in organismal complexity. We demonstrate that the algebraic connectivity, the smallest non-trivial eigenvalue of the Laplacian, of the directed gene regulatory networks decreases with the increase of organismal complexity, and may therefore explain the difference between the variety of analyzed regulatory networks. In addition, our results point out that, for the species considered in this study, evolution favours decreasing concentration of strategically positioned feed forward loops, so that the network as a whole can increase the specificity towards changing environments. Moreover, contrary to the existing results, we show that the average degree, the length of the longest cascade, and the average cascade length of gene regulatory networks cannot recover the evolutionary relationships between organisms. Whereas the dynamical properties of special subnetworks are relatively well understood, there is still limited knowledge about the evolutionary reasons for the already identified design principles pertaining to these special subnetworks, underlying the global quantitative features of gene regulatory networks of different organisms. The behavior of the algebraic connectivity, which we show valid on gene regulatory networks extracted from curated databases, can serve as an additional evolutionary principle of organism-specific regulatory networks.
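For readers unfamiliar with the quantity, the following minimal sketch computes the algebraic connectivity of a small graph as the second-smallest eigenvalue of its Laplacian. It assumes the network is first symmetrized into an undirected graph; the abstract refers to directed networks, and the authors' exact Laplacian definition is not given there, so this is an illustration rather than their procedure. The function name is hypothetical.

```python
import numpy as np


def algebraic_connectivity(edges, n_nodes):
    """Second-smallest Laplacian eigenvalue of the symmetrized graph.

    edges: iterable of (source, target) node indices in range(n_nodes).
    Assumption: edge direction is ignored and self-loops are dropped.
    """
    adjacency = np.zeros((n_nodes, n_nodes))
    for source, target in edges:
        if source != target:
            adjacency[source, target] = 1.0
            adjacency[target, source] = 1.0
    # Graph Laplacian L = D - A, with D the diagonal degree matrix.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(laplacian)
    return float(eigenvalues[1])


# A three-node cascade and a fully connected triangle (toy examples).
cascade_value = algebraic_connectivity([(0, 1), (1, 2)], 3)
triangle_value = algebraic_connectivity([(0, 1), (1, 2), (2, 0)], 3)
```

The value is zero for a disconnected graph and grows as the graph becomes harder to split, which is why it is often read as a measure of how tightly a network is connected.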
Affiliation(s)
- Zoran Nikoloski
- Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Brandenburg, Germany.
5
Cell state switching factors and dynamical patterning modules: complementary mediators of plasticity in development and evolution. J Biosci 2009; 34:553-72. [DOI: 10.1007/s12038-009-0074-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 01/05/2023]
6
Newman SA, Bhat R, Mezentseva NV. Cell state switching factors and dynamical patterning modules: complementary mediators of plasticity in development and evolution. J Biosci 2009. [DOI: 10.1007/s12038-009-0001-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
7
Amoutzias GD, Pichler EE, Mian N, De Graaf D, Imsiridou A, Robinson-Rechavi M, Bornberg-Bauer E, Robertson DL, Oliver SG. A protein interaction atlas for the nuclear receptors: properties and quality of a hub-based dimerisation network. BMC Syst Biol 2007; 1:34. [PMID: 17672894 PMCID: PMC1971058 DOI: 10.1186/1752-0509-1-34] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 03/29/2007] [Accepted: 07/31/2007] [Indexed: 12/16/2022]
Abstract
BACKGROUND The nuclear receptors are a large family of eukaryotic transcription factors that constitute major pharmacological targets. They exert their combinatorial control through homotypic heterodimerisation. Elucidation of this dimerisation network is vital in order to understand the complex dynamics and potential cross-talk involved. RESULTS Phylogeny, protein-protein interactions, protein-DNA interactions and gene expression data have been integrated to provide a comprehensive and up-to-date description of the topology and properties of the nuclear receptor interaction network in humans. We discriminate between DNA-binding and non-DNA-binding dimers, and provide a comprehensive interaction map, that identifies potential cross-talk between the various pathways of nuclear receptors. CONCLUSION We infer that the topology of this network is hub-based, and much more connected than previously thought. The hub-based topology of the network and the wide tissue expression pattern of NRs create a highly competitive environment for the common heterodimerising partners. Furthermore, a significant number of negative feedback loops is present, with the hub protein SHP [NR0B2] playing a major role. We also compare the evolution, topology and properties of the nuclear receptor network with the hub-based dimerisation network of the bHLH transcription factors in order to identify both unique themes and ubiquitous properties in gene regulation. In terms of methodology, we conclude that such a comprehensive picture can only be assembled by semi-automated text-mining, manual curation and integration of data from various sources.
Affiliation(s)
- Gregory D Amoutzias
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Department of Ecology and Evolution, University of Lausanne & Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Discovery Information, AstraZeneca R&D Boston, 35 Gatehouse Drive, Waltham, MA 02451, USA
- Bioinformatics & Evolutionary Genomics, Department of Plant Systems Biology, VIB/Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
- Elgar E Pichler
- Discovery Information, AstraZeneca R&D Boston, 35 Gatehouse Drive, Waltham, MA 02451, USA
- David De Graaf
- Discovery Information, AstraZeneca R&D Boston, 35 Gatehouse Drive, Waltham, MA 02451, USA
- Pfizer RTC Cambridge, Cambridge, MA, USA
- Anastasia Imsiridou
- Higher Technological Educational Institute of Thessaloniki, 63200 Nea Moudania, Halkidiki, Greece
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne & Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Erich Bornberg-Bauer
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Bioinformatics Division, Institute for Evolution and Biodiversity, School of Biological Sciences, University of Muenster, Schlossplatz 4, D48149, Muenster, Germany
- David L Robertson
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Stephen G Oliver
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
8
Brynildsen MP, Wu TY, Jang SS, Liao JC. Biological network mapping and source signal deduction. Bioinformatics 2007; 23:1783-91. [PMID: 17495996 DOI: 10.1093/bioinformatics/btm246] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION Many biological networks, including transcriptional regulation, metabolism, and the absorbance spectra of metabolite mixtures, can be represented in a bipartite fashion. Key to understanding these bipartite networks are the network architecture and governing source signals. Such information is often implicitly imbedded in the data. Here we develop a technique, network component mapping (NCM), to deduce bipartite network connectivity and regulatory signals from data without any need for prior information. RESULTS We demonstrate the utility of our approach by analyzing UV-vis spectra from mixtures of metabolites and gene expression data from Saccharomyces cerevisiae. From UV-vis spectra, hidden mixing networks and pure component spectra (sources) were deduced to a higher degree of resolution with our method than other current bipartite techniques. Analysis of S. cerevisiae gene expression from two separate environmental conditions (zinc and DTT treatment) yielded transcription networks consistent with ChIP-chip derived network connectivity. Due to the high degree of noise in gene expression data, the transcription network for many genes could not be inferred. However, with relatively clean expression data, our technique was able to deduce hidden transcription networks and instances of combinatorial regulation. These results suggest that NCM can deduce correct network connectivity from relatively accurate data. For noisy data, NCM yields the sparsest network capable of explaining the data. In addition, partial knowledge of the network topology can be incorporated into NCM as constraints. AVAILABILITY Algorithm available on request from the authors. Soon to be posted on the web, http://www.seas.ucla.edu/~liaoj/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark P Brynildsen
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095, USA
9
Sanguinetti G, Lawrence ND, Rattray M. Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities. Bioinformatics 2006; 22:2775-81. [PMID: 16966362 DOI: 10.1093/bioinformatics/btl473] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone when trying to develop models of cellular processes. Recent experimental high-throughput techniques, such as Chromatin Immunoprecipitation (ChIP) provide important information about the architecture of the regulatory networks in the cell. However, it is very difficult to measure the concentration levels of transcription factor proteins and determine their regulatory effect on gene transcription. It is therefore an important computational challenge to infer these quantities using gene expression data and network architecture data. RESULTS We develop a probabilistic state space model that allows genome-wide inference of both transcription factor protein concentrations and their effect on the transcription rates of each target gene from microarray data. We use variational inference techniques to learn the model parameters and perform posterior inference of protein concentrations and regulatory strengths. The probabilistic nature of the model also means that we can associate credibility intervals to our estimates, as well as providing a tool to detect which binding events lead to significant regulation. We demonstrate our model on artificial data and on two yeast datasets in which the network structure has previously been obtained using ChIP data. Predictions from our model are consistent with the underlying biology and offer novel quantitative insights into the regulatory structure of the yeast cell. AVAILABILITY MATLAB code is available from http://umber.sbs.man.ac.uk/resources/puma
Affiliation(s)
- Guido Sanguinetti
- Department of Computer Science, Regent Court 211 Portobello Road, Sheffield, S1 4DP, UK.
10
Tan N, Ouyang Q. Design of a network with state stability. J Theor Biol 2005; 240:592-8. [PMID: 16343546 DOI: 10.1016/j.jtbi.2005.10.019] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 05/12/2005] [Revised: 08/30/2005] [Accepted: 10/26/2005] [Indexed: 11/28/2022]
Abstract
Designing a network with given functions, or reconstructing a network from its dynamical behavior, is an important problem in the study of complex systems. In this paper, we put forward certain principles for constructing a network with state stability. We show that a necessary and sufficient condition for designing networks with a global fixed point is that active nodes inhibit inactive nodes, while the latter activate the former directly or indirectly. We also designed networks based on basic modules, where each basic module consists of a sub-network; the modules communicate through an inhibition link from each activator in the lower module to the inhibitor of the upper module. We found that long activation links, i.e. indirect activation links, are important to the formation of the convergence trajectory. We believe that these principles may help us to understand the topology of biological networks.
Affiliation(s)
- Ning Tan
- School of Physics and the Center for Theoretical Biology, Peking University, 100871 Beijing, China