51
Siahpirani AF, Chasman D, Roy S. Integrative Approaches for Inference of Genome-Scale Gene Regulatory Networks. Methods Mol Biol 2019; 1883:161-194. [PMID: 30547400 DOI: 10.1007/978-1-4939-8882-2_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
Abstract
Transcriptional regulatory networks specify the regulatory proteins of target genes that control the context-specific expression levels of genes. With our ability to profile the different types of molecular components of cells under different conditions, we are now uniquely positioned to infer regulatory networks in diverse biological contexts such as different cell types, tissues, and time points. In this chapter, we cover two main classes of computational methods to integrate different types of information to infer genome-scale transcriptional regulatory networks. The first class of methods focuses on integrative methods for specifically inferring connections between transcription factors and target genes by combining gene expression data with regulatory edge-specific knowledge. The second class of methods integrates upstream signaling networks with transcriptional regulatory networks by combining gene expression data with protein-protein interaction networks and proteomic datasets. We conclude with a section on practical applications of a network inference algorithm to infer a genome-scale regulatory network.
Affiliation(s)
- Alireza Fotuhi Siahpirani
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
52
Causal Queries from Observational Data in Biological Systems via Bayesian Networks: An Empirical Study in Small Networks. Methods Mol Biol 2018. [PMID: 30547398 DOI: 10.1007/978-1-4939-8882-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Biological networks are a very convenient modeling and visualization tool to discover knowledge from modern high-throughput genomics and post-genomics data sets. Indeed, biological entities are not isolated but are components of complex multilevel systems. We go one step further and advocate for the consideration of causal representations of the interactions in living systems. We present the causal formalism and bring it out in the context of biological networks, when the data is observational. We also discuss its ability to decipher the causal information flow as observed in gene expression. We also illustrate our exploration by experiments on small simulated networks as well as on a real biological data set.
53
Xu T, Ou-Yang L, Hu X, Zhang XF. Identifying Gene Network Rewiring by Integrating Gene Expression and Gene Network Data. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:2079-2085. [PMID: 29994068 DOI: 10.1109/tcbb.2018.2809603] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
Exploring the rewiring pattern of gene regulatory networks between different pathological states is an important task in bioinformatics. Although a number of computational approaches have been developed to infer differential networks from high-throughput data, most of them focus only on gene expression data, neglecting the valuable static gene regulatory network data accumulated by recent biomedical research. In this study, we propose a new Gaussian graphical model-based method to infer differential networks by integrating gene expression and static gene regulatory network data. We first evaluate the empirical performance of our method by comparing it with state-of-the-art methods on simulated data. We also apply our method to The Cancer Genome Atlas data to identify gene network rewiring between ovarian cancers with different platinum responses, and between breast cancers of the luminal A and basal-like subtypes. Hub genes in the estimated differential networks rediscover known genes associated with platinum resistance in ovarian cancer and signatures of the breast cancer intrinsic subtypes.
54
Kuzmanovski V, Todorovski L, Džeroski S. Extensive evaluation of the generalized relevance network approach to inferring gene regulatory networks. Gigascience 2018; 7:5099470. [PMID: 30239704 PMCID: PMC6420648 DOI: 10.1093/gigascience/giy118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/03/2017] [Accepted: 09/11/2018] [Indexed: 01/15/2023]
Abstract
Background The generalized relevance network approach to network inference reconstructs network links based on the strength of associations between data in individual network nodes. It can reconstruct undirected networks, i.e., relevance networks, sensu stricto, as well as directed networks, referred to as causal relevance networks. The generalized approach allows the use of an arbitrary measure of pairwise association between nodes, an arbitrary scoring scheme that transforms the associations into weights of the network links, and a method for inferring the directions of the links. While this makes the approach powerful and flexible, it introduces the challenge of finding a combination of components that would perform well on a given inference task. Results We address this challenge by performing an extensive empirical analysis of the performance of 114 variants of the generalized relevance network approach on 47 tasks of gene network inference from time-series data and 39 tasks of gene network inference from steady-state data. We compare the different variants in a multi-objective manner, considering their ranking in terms of different performance metrics. The results suggest a set of recommendations that provide guidance for selecting an appropriate variant of the approach in different data settings. Conclusions The association measures based on correlation, combined with a particular scoring scheme of asymmetric weighting, lead to optimal performance of the relevance network approach in the general case. In the two special cases of inference tasks involving short time-series data and/or large networks, association measures based on identifying qualitative trends in the time series are more appropriate.
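The generalized approach combines a pairwise association measure with a scoring scheme that turns associations into link weights. As a rough illustration only, the Python sketch below builds the simplest undirected variant: Pearson correlation as the association measure, absolute correlation as the link weight, and a fixed cut-off. The 0.8 threshold, the function names, and the toy profiles are assumptions rather than details taken from the paper, and the sketch does not reproduce the asymmetric weighting scheme or the direction-inference step evaluated by the authors.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no association signal
    return sxy / math.sqrt(sxx * syy)


def relevance_network(profiles, threshold=0.8):
    """Undirected relevance network: keep gene pairs with |r| >= threshold.

    profiles: dict mapping gene name -> list of expression values
    (one value per sample, same sample order for every gene).
    Returns (gene_a, gene_b, weight) tuples sorted by decreasing weight.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        w = abs(pearson(profiles[a], profiles[b]))  # scoring scheme: |r|
        if w >= threshold:
            edges.append((a, b, w))
    return sorted(edges, key=lambda e: -e[2])


# Hypothetical toy profiles over five samples.
profiles = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],  # perfectly correlated with g1
    "g3": [3, 1, 4, 1, 5],   # weakly correlated with both (|r| about 0.35)
}
edges = relevance_network(profiles, threshold=0.8)
# edges == [("g1", "g2", 1.0)]
```

Replacing `pearson` with another pairwise measure, or `abs` with another scoring function, corresponds to picking a different variant of the generalized approach in the sense described above.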
Affiliation(s)
- Vladimir Kuzmanovski
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
- Ljupco Todorovski
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
- Faculty of Public Administration, University of Ljubljana, Gosarjeva ulica 5, 1000 Ljubljana, Slovenia
- Sašo Džeroski
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
55
Barbosa S, Niebel B, Wolf S, Mauch K, Takors R. A guide to gene regulatory network inference for obtaining predictive solutions: Underlying assumptions and fundamental biological and data constraints. Biosystems 2018; 174:37-48. [PMID: 30312740 DOI: 10.1016/j.biosystems.2018.10.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/10/2018] [Revised: 10/05/2018] [Accepted: 10/08/2018] [Indexed: 02/07/2023]
Abstract
The study of biological systems at the system level has become a reality thanks to increasingly powerful computational approaches able to handle ever larger datasets. Uncovering the dynamic nature of gene regulatory networks, in order to attain a system-level understanding and improve the predictive power of biological models, is an important research field in systems biology. The task presents several challenges, since the problem is combinatorial in nature and depends strongly on several biological constraints as well as on the intended application. Given the intrinsically interdisciplinary nature of gene regulatory network inference, we review the currently available approaches, their challenges, and their limitations. We propose guidelines for selecting the most appropriate method in light of its underlying assumptions and of fundamental biological and data constraints.
Affiliation(s)
- Sara Barbosa
- Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
- Bastian Niebel
- Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
- Sebastian Wolf
- Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
- Klaus Mauch
- Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
- Ralf Takors
- Institute of Biochemical Engineering, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
56
Siahpirani AF, Roy S. A prior-based integrative framework for functional transcriptional regulatory network inference. Nucleic Acids Res 2017; 45:e21. [PMID: 27794550 PMCID: PMC5389674 DOI: 10.1093/nar/gkw963] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 11/29/2015] [Accepted: 10/12/2016] [Indexed: 12/16/2022]
Abstract
Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred by such methods have low overlap with experimentally derived networks (e.g. from ChIP-chip and transcription factor (TF) knockouts). Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and identifies core TFs whose targets are predictable from expression. Multiple factors make it difficult to identify the targets of other TFs, including network architecture and insufficient variation in TF mRNA levels. Finally, we demonstrate the utility of our inference algorithm for inferring stress-specific regulatory networks and for regulator prioritization.
Affiliation(s)
- Alireza F Siahpirani
- Department of Computer Sciences, University of Wisconsin-Madison, 1210 W. Dayton St. Madison, WI 53706-1613, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Discovery Building 330 North Orchard St. Madison, WI 53715, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, K6/446 Clinical Sciences Center 600 Highland Avenue Madison, WI 53792-4675, USA
57
Franks AM, Markowetz F, Airoldi EM. Refining Cellular Pathway Models Using an Ensemble of Heterogeneous Data Sources. Ann Appl Stat 2018; 12:1361-1384. [PMID: 36506698 PMCID: PMC9733905 DOI: 10.1214/16-aoas915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with the results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment-specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Affiliation(s)
- Alexander M Franks
- Department of Statistics and Applied Probability, University of California, Santa Barbara, South Hall, Santa Barbara, California 93106, USA
- Florian Markowetz
- Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Edoardo M Airoldi
- Fox School of Business, Department of Statistical Science, Temple University, Center for Data Science, 1810 Liacouras Walk, Philadelphia, Pennsylvania 19122, USA
58
Identifiability and Reconstruction of Biochemical Reaction Networks from Population Snapshot Data. Processes (Basel) 2018. [DOI: 10.3390/pr6090136] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/17/2022]
Abstract
Inference of biochemical network models from experimental data is a crucial problem in systems and synthetic biology that includes not only parameter calibration but also identification of unknown interactions. Stochastic modelling from single-cell data is known to improve the identifiability of reaction network parameters for specific systems. However, general results are lacking, and the advantage over deterministic, population-average approaches has not been explored for network reconstruction. In this work, we study identifiability and propose new reconstruction methods for biochemical interaction networks. Focusing on population-snapshot data and on networks with reaction rates affine in the state, we first derive general methods for testing structural identifiability in parameter estimation, and demonstrate them together with practical identifiability in an in silico reporter-gene case study. In the same framework, we next develop a two-step approach to the reconstruction of unknown interaction networks. We use it to compare the network reconstruction performance achievable in a deterministic and a stochastic setting, showing the advantage of the latter, and demonstrate it on population-snapshot data from a simulated example.
59
Kim OD, Rocha M, Maia P. A Review of Dynamic Modeling Approaches and Their Application in Computational Strain Optimization for Metabolic Engineering. Front Microbiol 2018; 9:1690. [PMID: 30108559 PMCID: PMC6079213 DOI: 10.3389/fmicb.2018.01690] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 03/27/2018] [Accepted: 07/06/2018] [Indexed: 12/03/2022]
Abstract
Mathematical modeling is a key process for describing the behavior of biological networks. One of the most difficult challenges is to build models that allow quantitative predictions of cell states over time. Recently, this issue has begun to be tackled through novel in silico approaches, such as the reconstruction of dynamic models, the use of phenotype prediction methods, and pathway design via efficient strain optimization algorithms. The use of dynamic models, which include detailed kinetic information about the biological systems, potentially increases the scope of the applications and the accuracy of phenotype predictions. New efforts in metabolic engineering aim to bridge the gap between this approach and other paradigms of mathematical modeling, such as constraint-based approaches. These strategies take advantage of the best features of each method and deal with the most notable limitation, the lack of available experimental information, which affects the accuracy and feasibility of solutions. Parameter estimation helps to solve this problem, but adds computational cost to the overall process. Moreover, the existing approaches have limitations in scalability, flexibility, and simulation convergence time, among others. The aim is to establish a trade-off between the size of the model and the accuracy of the solutions. In this work, we review the state of the art of dynamic modeling and related methods used for metabolic engineering applications, including approaches based on hybrid modeling. We describe approaches developed to handle the mathematical formulation and the underlying optimization algorithms, and that address phenotype prediction by including available kinetic rate laws of metabolic processes. Then, we discuss how these have been used and combined as the basis for computational strain optimization methods for metabolic engineering, and how they lead to bi-level schemes that can be used in industry, including a consideration of their limitations.
Affiliation(s)
- Osvaldo D Kim
- SilicoLife Lda, Braga, Portugal
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
- Miguel Rocha
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
60
Villaverde AF, Becker K, Banga JR. PREMER: A Tool to Infer Biological Networks. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1193-1202. [PMID: 28981423 DOI: 10.1109/tcbb.2017.2758786] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/07/2023]
Abstract
Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, elucidating certain features, such as distinguishing between direct and indirect interactions or determining the direction of a causal link, requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end, we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information-theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open-source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux, and OS X (https://sites.google.com/site/premertoolbox/).
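PREMER's core routines are written in FORTRAN 90 and exposed through Python and MATLAB/Octave interfaces that are not described here, so the sketch below is not PREMER code and does not use its API. It only illustrates, in plain Python, the basic quantity that information-theoretic inference methods start from: a plug-in (maximum-likelihood) estimate of mutual information, I(X;Y) = H(X) + H(Y) - H(X,Y), between two discretized expression profiles. The equal-width binning, the default of three bins, and the function names are illustrative assumptions. Separating direct from indirect links or inferring direction, as the abstract notes, needs conditional or multidimensional quantities that this sketch does not compute.

```python
import math
from collections import Counter


def entropy(symbols):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired discrete samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def discretize(values, bins=3):
    """Equal-width binning of a continuous profile into levels 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant profile: a single level
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`; clamp it to the top bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical profiles of two genes over four samples.
a = discretize([0.1, 0.2, 0.9, 1.0], bins=2)  # [0, 0, 1, 1]
b = discretize([5.0, 4.0, 1.0, 0.0], bins=2)  # [1, 1, 0, 0]
mi = mutual_information(a, b)  # 1.0 bit: anti-correlated, yet fully dependent
```

Unlike correlation, mutual information is insensitive to the sign of the dependence, which is why the anti-correlated pair above still scores the maximum of one bit for two balanced levels.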
61
Rougny A, Gloaguen P, Langonné N, Reiter E, Crépieux P, Poupon A, Froidevaux C. A logic-based method to build signaling networks and propose experimental plans. Sci Rep 2018; 8:7830. [PMID: 29777117 PMCID: PMC5959848 DOI: 10.1038/s41598-018-26006-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/08/2018] [Accepted: 04/27/2018] [Indexed: 11/15/2022]
Abstract
With the dramatic increase in the diversity and sheer quantity of biological data generated, comprehensive signaling networks that include precise mechanisms can no longer be constructed manually. In this context, we propose a logic-based method that allows large signaling networks to be built automatically. Our method is based on a set of expert rules that make explicit the reasoning biologists use when interpreting experimental results from a wide variety of experiment types. These rules allow formulating all the conclusions that can be inferred from a set of experimental results, and thus building all the possible networks that explain these results. Moreover, given a hypothesis, our system proposes experimental plans to carry out in order to validate or invalidate it. To evaluate the performance of our method, we applied our framework to the reconstruction of the FSHR-induced and the EGFR-induced signaling networks. The FSHR is known to induce the transactivation of the EGFR, but very little is known about the resulting FSH- and EGF-dependent network. We built a single network using data underlying both networks. This leads to a new hypothesis on the activation of MEK by p38MAPK, which we validate experimentally. These preliminary results represent a first step in demonstrating cross-talk between these two major MAP kinase pathways.
Affiliation(s)
- Adrien Rougny
- Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Aomi, Tokyo, 135-0064, Japan
- Laboratoire de Recherche en Informatique UMR CNRS 8623, Université Paris-Sud, Université Paris-Saclay, Orsay Cedex, 91405, France
- Pauline Gloaguen
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France
- Nathalie Langonné
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France
- CNRS; Université François-Rabelais de Tours, UMR 7292, 37032, Tours, France
- Eric Reiter
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France
- Pascale Crépieux
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France
- Anne Poupon
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France
- Christine Froidevaux
- Laboratoire de Recherche en Informatique UMR CNRS 8623, Université Paris-Sud, Université Paris-Saclay, Orsay Cedex, 91405, France
62
Tu JJ, Ou-Yang L, Hu X, Zhang XF. Identifying gene network rewiring by combining gene expression and gene mutation data. IEEE/ACM Trans Comput Biol Bioinform 2018; 16:1042-1048. [PMID: 29993891 DOI: 10.1109/tcbb.2018.2834529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Understanding how gene dependency networks rewire between different disease states is an important task in genomic research. Although many computational methods have been proposed to undertake this task via differential network analysis, most of them are designed for a predefined data type. With the development of the high throughput technologies, gene activity measurements can be collected from different aspects (e.g., mRNA expression and DNA mutation). Different data types might share some common characteristics and include certain unique properties. New methods are needed to explore the similarity and difference between differential networks estimated from different data types. In this study, we develop a new differential network inference model which identifies gene network rewiring by combining gene expression and gene mutation data. Similarity and difference between different data types are learned via a group bridge penalty function. Simulation studies have demonstrated that our method consistently outperforms the competing methods. We also apply our method to identify gene network rewiring associated with ovarian cancer platinum resistance. There are certain differential edges common to both data types and some differential edges unique to individual data types. Hub genes in the differential networks inferred by our method play important roles in ovarian cancer drug resistance.
63
Mall R, Cerulo L, Garofano L, Frattini V, Kunji K, Bensmail H, Sabedot TS, Noushmehr H, Lasorella A, Iavarone A, Ceccarelli M. RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes. Nucleic Acids Res 2018; 46:e39. [PMID: 29361062 PMCID: PMC6283452 DOI: 10.1093/nar/gky015] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/03/2017] [Accepted: 01/06/2018] [Indexed: 01/05/2023]
Abstract
We propose a generic framework for gene regulatory network (GRN) inference, approached as a feature selection problem. GRNs obtained using machine learning (ML) techniques are often dense, whereas real GRNs are rather sparse. We use an optimal L-curve criterion, inspired by Tikhonov regularization, that uses the edge-weight distribution for a given target gene to determine the optimal set of transcription factors (TFs) associated with it. Our framework allows the incorporation of a mechanistic active binding network based on cis-regulatory motif analysis. We evaluate our regularization framework in conjunction with two non-linear ML techniques, namely gradient boosting machines (GBM) and random forests (GENIE), resulting in regularized feature-selection-based methods called RGBM and RGENIE, respectively. RGBM has been used to identify the main transcription factors that are causally involved as master regulators of the gene expression signature activated in FGFR3-TACC3-positive glioblastoma. Here, we show that RGBM identifies the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators characterizing the difference between the G-CIMP-high and G-CIMP-low subtypes and between PA-like and LGm6-GBM, thus providing a clue to the as-yet undetermined nature of the transcriptional events among these subtypes.
Affiliation(s)
- Raghvendra Mall
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Luigi Cerulo
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
- Luciano Garofano
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
- Veronique Frattini
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
- Khalid Kunji
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Thais S Sabedot
- Department of Neurosurgery, Brain Tumor Center, Henry Ford Health System, Detroit, MI, USA
- Department of Genetics (CISBi/NAP), Department of Surgery and Anatomy, Ribeirão Preto Medical School, University of Sao Paulo, Monte Alegre, Ribeirao Preto, Brazil
- Houtan Noushmehr
- Department of Neurosurgery, Brain Tumor Center, Henry Ford Health System, Detroit, MI, USA
- Department of Genetics (CISBi/NAP), Department of Surgery and Anatomy, Ribeirão Preto Medical School, University of Sao Paulo, Monte Alegre, Ribeirao Preto, Brazil
- Anna Lasorella
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, New York 10032, USA
- Department of Pediatrics, Columbia University Medical Center, New York, New York 10032, USA
- Antonio Iavarone
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, New York 10032, USA
- Department of Neurology, Columbia University Medical Center, New York, New York 10032, USA
- Michele Ceccarelli
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
64
Muñoz S, Carrillo M, Azpeitia E, Rosenblueth DA. Griffin: A Tool for Symbolic Inference of Synchronous Boolean Molecular Networks. Front Genet 2018; 9:39. [PMID: 29559993 PMCID: PMC5845696 DOI: 10.3389/fgene.2018.00039] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/20/2017] [Accepted: 01/29/2018] [Indexed: 11/30/2022]
Abstract
Boolean networks are important models of biochemical systems, located at the high end of the abstraction spectrum. A number of Boolean gene networks have been inferred following essentially the same method. Such a method first considers experimental data for a typically underdetermined “regulation” graph. Next, Boolean networks are inferred by using biological constraints, such as a desired set of (fixed-point or cyclic) attractors, to narrow the search space. We describe Griffin, a computer tool enhancing this method. Griffin incorporates a number of well-established algorithms, such as Dubrova and Teslenko's algorithm for finding attractors in synchronous Boolean networks. In addition, a formal definition of regulation allows Griffin to employ “symbolic” techniques, able to represent both large sets of network states and Boolean constraints. We observe that when the set of attractors is required to be an exact set, prohibiting additional attractors, a naive Boolean coding of this constraint may be infeasible. Such cases may be intractable even with symbolic methods, as the number of Boolean constraints may be astronomically large. To overcome this problem, we employ an Artificial Intelligence technique known as “clause learning”, which considerably increases Griffin's scalability. Without clause learning, only toy examples prohibiting additional attractors are solvable: only one of the seven queries reported here is answered. With clause learning, by contrast, all seven queries are answered. We illustrate Griffin with three case studies drawn from the Arabidopsis thaliana literature. Griffin is available at: http://turing.iimas.unam.mx/griffin.
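Griffin relies on symbolic techniques and on Dubrova and Teslenko's algorithm, neither of which is reproduced here. As a minimal point of reference for what "attractors of a synchronous Boolean network" means, the Python sketch below finds them by exhaustively following the trajectory from every one of the 2^n states. This brute-force enumeration is only practical for small networks, which is exactly the scalability problem that symbolic methods address. The function name and the rule representation (one Python callable per node) are illustrative assumptions.

```python
from itertools import product


def attractors(update_rules, n):
    """Enumerate all attractors of a synchronous Boolean network by brute force.

    update_rules: list of n callables; update_rules[i](state) gives the next
    value of node i from the current state (a tuple of n bools).
    Returns a set of attractors, each a frozenset of states; a fixed point
    is an attractor containing a single state.
    """
    def step(state):
        # Synchronous update: every node reads the same current state.
        return tuple(bool(f(state)) for f in update_rules)

    found = set()
    for start in product((False, True), repeat=n):
        visited = set()
        state = start
        while state not in visited:  # the state space is finite, so this ends
            visited.add(state)
            state = step(state)
        # The first repeated state necessarily lies on the attractor cycle.
        cycle = [state]
        nxt = step(state)
        while nxt != state:
            cycle.append(nxt)
            nxt = step(nxt)
        found.add(frozenset(cycle))
    return found


# Hypothetical two-node network in which each node copies the other:
# A' = B, B' = A.
swap = [lambda s: s[1], lambda s: s[0]]
result = attractors(swap, 2)
# Two fixed points, {(False, False)} and {(True, True)},
# plus the 2-cycle {(False, True), (True, False)}.
```

Requiring an exact set of attractors, as discussed in the abstract, amounts to checking that a candidate network's `attractors(...)` equals the desired set with nothing extra; enumerating candidates this way quickly becomes intractable, which motivates Griffin's symbolic encoding.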
Affiliation(s)
- Stalin Muñoz
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Facultad de Ingeniería, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Maestría en Ciencias de la Complejidad, Universidad Autónoma de la Ciudad de México, Mexico City, Mexico
- Miguel Carrillo
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Eugenio Azpeitia
- Institut National de Recherche en Informatique et en Automatique Project-Team Virtual Plants, Inria, CIRAD, INRA, Montpellier, France.,Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| | - David A Rosenblueth
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico.,Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de Mexico, Mexico City, Mexico
65
Aloraini A, ElSawy KM. Potential Breast Anticancer Drug Targets Revealed by Differential Gene Regulatory Network Analysis and Molecular Docking: Neoadjuvant Docetaxel Drug as a Case Study. Cancer Inform 2018; 17:1176935118755354. [PMID: 29449773 PMCID: PMC5808968 DOI: 10.1177/1176935118755354] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/24/2017] [Accepted: 01/04/2018] [Indexed: 01/19/2023]
Abstract
Understanding gene-gene interaction and its causal relationship to protein-protein interaction is a viable route for understanding drug action at the genetic level, but this route is largely hindered by the inability to robustly map gene regulatory networks. Here, we use biological prior knowledge of family-to-family gene interactions available in the KEGG database to reveal individual gene-to-gene interaction networks that underlie the gene expression profiles of 2 cell line data sets, sensitive and resistant to the neoadjuvant breast anticancer drug docetaxel. Comparison of the topology of the 2 networks revealed that the resistant network is highly connected, with 2 large domains of connectivity: one in which the RAF1 and MAP2K2 genes form hubs of connectivity and another in which the RAS gene is highly connected. In contrast, the sensitive network is highly disrupted, with a lower degree of connectivity. Using protein-protein and drug-protein docking techniques, we investigated the interactions of the neoadjuvant docetaxel drug with the proteins encoded by the genes whose interactions underlie the disruption of the sensitive network topology. We found that the sensitive network is likely to be disrupted by interaction of the neoadjuvant docetaxel drug with the DAXX and FGR1 proteins, which is consistent with the observed accumulation of cytoplasmic DAXX and overexpression of FGR1 precursors in cancer cell lines. This indicates that the DAXX and FGR1 proteins could be potential targets for the neoadjuvant docetaxel drug. The work, therefore, provides a new route for understanding the drug's mode of action from the viewpoint of the change in the topology of gene-gene regulatory networks, and a new avenue for bridging the gap between gene-gene interactions and protein-protein interactions, which could have deep implications for mainstream drug development protocols.
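The topology comparison described above (overall connectivity, hub genes, fragmentation) can be sketched on two edge lists. The edge lists `RESISTANT` and `SENSITIVE` below are hypothetical toy data with placeholder gene names, not the KEGG-derived networks from the study, and the three summaries (mean degree, number of connected components, highest-degree hubs) are common choices assumed here rather than the authors' exact measures.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def degrees(edges: List[Edge]) -> Counter:
    """Undirected degree of every gene that appears in the edge list."""
    deg: Counter = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def n_components(edges: List[Edge]) -> int:
    """Number of connected components, computed with union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(x) for x in list(parent)})


def compare(net_a: List[Edge], net_b: List[Edge], top: int = 2) -> Dict[str, tuple]:
    """Side-by-side topology summary of two (non-empty) networks."""
    da, db = degrees(net_a), degrees(net_b)
    return {
        "edges": (len(net_a), len(net_b)),
        "mean_degree": (2 * len(net_a) / len(da), 2 * len(net_b) / len(db)),
        "components": (n_components(net_a), n_components(net_b)),
        "hubs": (
            [g for g, _ in da.most_common(top)],
            [g for g, _ in db.most_common(top)],
        ),
    }


# Hypothetical toy networks (gene names are placeholders).
RESISTANT = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"),
             ("G5", "G6"), ("G5", "G7"), ("G5", "G1")]
SENSITIVE = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]

if __name__ == "__main__":
    for key, value in compare(RESISTANT, SENSITIVE).items():
        print(key, value)
```

In this toy pair, the first network forms a single connected component organized around two hubs (G1 and G5), while the second splits into three isolated pairs; this is the kind of qualitative contrast the abstract draws between the resistant and sensitive networks.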
Affiliation(s)
- Adel Aloraini
- Department of Computer Science, Qassim University, Buraydah, Saudi Arabia
- Karim M ElSawy
- York Centre for Complex Systems Analysis (YCCSA), University of York, York, UK; Department of Chemistry, College of Science, Qassim University, Buraydah, Saudi Arabia
66
Sehl ME, Wicha MS. Modeling of Interactions between Cancer Stem Cells and their Microenvironment: Predicting Clinical Response. Methods Mol Biol 2018; 1711:333-349. [PMID: 29344897 PMCID: PMC6322404 DOI: 10.1007/978-1-4939-7493-1_16] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 04/15/2023]
Abstract
Mathematical models of cancer stem cells are useful in translational cancer research for facilitating the understanding of tumor growth dynamics and for predicting treatment response and resistance to combined targeted therapies. In this chapter, we describe appealing aspects of different methods used in mathematical oncology and discuss compelling questions in oncology that can be addressed with these modeling techniques. We describe a simplified version of a model of the breast cancer stem cell niche, illustrate the visualization of the model, and apply stochastic simulation to generate full distributions and average trajectories of cell type populations over time. We further discuss the advent of single-cell data in studying cancer stem cell heterogeneity and how these data can be integrated with modeling to advance understanding of the dynamics of invasive and proliferative populations during cancer progression and response to therapy.
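Stochastic simulation of cell-type populations is usually done with Gillespie's direct method. The sketch below applies it to a deliberately simplified, hypothetical niche model (stem-cell self-renewal, asymmetric division into a differentiated cell, and loss of differentiated cells); the reactions and the rate names `k_renew`, `k_diff` and `k_death` are assumptions for illustration, not the chapter's breast cancer stem cell model. Repeating the simulation with fresh random draws gives the full distribution of outcomes, and averaging the runs gives mean endpoints or trajectories.

```python
import random
from typing import List, Tuple

# Hypothetical toy reactions (S = stem cells, D = differentiated cells):
#   S -> S + S   self-renewal          propensity k_renew * S
#   S -> S + D   asymmetric division   propensity k_diff  * S
#   D -> (none)  loss                  propensity k_death * D


def gillespie(s0: int, d0: int, k_renew: float, k_diff: float, k_death: float,
              t_end: float, rng: random.Random) -> List[Tuple[float, int, int]]:
    """One exact stochastic trajectory as (time, S, D) tuples, via the direct method."""
    t, s, d = 0.0, s0, d0
    traj = [(t, s, d)]
    while True:
        props = [k_renew * s, k_diff * s, k_death * d]
        total = sum(props)
        if total == 0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # waiting time ~ Exponential(total propensity)
        if t > t_end:
            break
        r = rng.random() * total  # pick a reaction proportionally to its propensity
        if r < props[0]:
            s += 1
        elif r < props[0] + props[1]:
            d += 1
        else:
            d -= 1
        traj.append((t, s, d))
    return traj


def mean_final(n_runs: int, seed: int = 0, **params) -> Tuple[float, float]:
    """Average final (S, D) over independent runs with a reproducible RNG."""
    rng = random.Random(seed)
    finals = [gillespie(rng=rng, **params)[-1] for _ in range(n_runs)]
    return (sum(f[1] for f in finals) / n_runs, sum(f[2] for f in finals) / n_runs)


if __name__ == "__main__":
    print(mean_final(200, s0=5, d0=0, k_renew=0.1, k_diff=0.5,
                     k_death=0.3, t_end=10.0))
```

Seeding one `random.Random` instance keeps the ensemble reproducible while still giving each run different draws; collecting the full `traj` lists instead of only the last entries would give the distributions over time.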
Affiliation(s)
- Mary E Sehl
- Division of Hematology-Oncology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Max S Wicha
- Department of Internal Medicine, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI, 48109, USA.
67
Tian Z, Guo M, Wang C, Liu X, Wang S. Refine gene functional similarity network based on interaction networks. BMC Bioinformatics 2017; 18:550. [PMID: 29297381 PMCID: PMC5751769 DOI: 10.1186/s12859-017-1969-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]
Abstract
BACKGROUND In recent years, biological interaction networks have become the basis of many studies and have been applied successfully in many settings. Some typical networks, such as protein-protein interaction networks, have already been investigated systematically. However, little work has so far addressed the construction of gene functional similarity networks. In this research, we aim to build a highly reliable gene functional similarity network to promote its further application. RESULTS Here, we propose a novel method to construct and refine the gene functional similarity network. It consists of three main steps. First, we establish an integrated gene functional similarity network based on different functional similarity calculation methods. Then, we construct a reference gene-gene association network based on protein-protein interaction networks. Finally, we remove spurious edges from the integrated gene functional similarity network with the help of the reference gene-gene association network. Experimental results indicate that the refined gene functional similarity network (RGFSN) exhibits a scale-free, small-world and modular architecture, with a degree distribution that best fits a power law. In addition, we conduct a protein complex prediction experiment for human based on RGFSN and achieve an outstanding result, which suggests that the network is highly reliable and widely applicable. CONCLUSIONS Our approach offers insight into constructing and refining gene functional similarity networks and can be applied to build other high-quality biological networks.
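The three steps above can be sketched in a few lines. Everything specific here is an assumption for illustration: similarity scores from different methods are integrated by simple averaging with a fixed threshold, the reference association network links two genes if they interact directly or share an interaction partner in the PPI network, and refinement keeps only integrated edges that the reference supports. The helper names (`integrate`, `ppi_reference`, `refine`) are hypothetical, and the published method's integration and refinement rules may differ.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Pair = FrozenSet[str]  # undirected gene pair


def pair(a: str, b: str) -> Pair:
    return frozenset((a, b))


def integrate(score_tables: List[Dict[Pair, float]], threshold: float) -> Set[Pair]:
    """Step 1: average each pair's similarity over methods (missing = 0.0), then threshold."""
    pairs = set().union(*score_tables)
    return {
        p for p in pairs
        if sum(t.get(p, 0.0) for t in score_tables) / len(score_tables) >= threshold
    }


def ppi_reference(ppi_edges: List[Pair], two_hop: bool = True) -> Set[Pair]:
    """Step 2: associate genes that interact directly or (optionally) share a PPI partner."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for p in ppi_edges:
        a, b = tuple(p)
        adj[a].add(b)
        adj[b].add(a)
    reference = set(ppi_edges)
    if two_hop:
        for partners in adj.values():
            for a, b in combinations(sorted(partners), 2):
                reference.add(pair(a, b))
    return reference


def refine(integrated: Set[Pair], reference: Set[Pair]) -> Set[Pair]:
    """Step 3: drop integrated edges that have no support in the reference network."""
    return integrated & reference


if __name__ == "__main__":
    # Hypothetical similarity scores from two methods and a tiny PPI network.
    method_a = {pair("A", "B"): 0.9, pair("A", "C"): 0.8, pair("C", "D"): 0.7}
    method_b = {pair("A", "B"): 0.7, pair("A", "C"): 0.6, pair("C", "D"): 0.9}
    ppi = [pair("A", "X"), pair("X", "B"), pair("C", "Y")]
    kept = refine(integrate([method_a, method_b], 0.75), ppi_reference(ppi))
    print(sorted(sorted(p) for p in kept))
```

In the toy run, C-D passes the similarity threshold but is removed because C and D have no direct or shared PPI partner, while A-B survives because both interact with X.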
Affiliation(s)
- Zhen Tian
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Maozu Guo
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044 People’s Republic of China
- Chunyu Wang
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Xiaoyan Liu
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Shiming Wang
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
68
Xiong W, Wang C, Zhang X, Yang Q, Shao R, Lai J, Du C. Highly interwoven communities of a gene regulatory network unveil topologically important genes for maize seed development. Plant J 2017; 92:1143-1156. [PMID: 29072883 DOI: 10.1111/tpj.13750] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/02/2017] [Revised: 10/10/2017] [Accepted: 10/17/2017] [Indexed: 06/07/2023]
Abstract
The complex interactions between transcription factors (TFs) and their target genes in a spatially and temporally specific manner are crucial to all cellular processes. Reconstruction of gene regulatory networks (GRNs) from gene expression profiles can help to decipher TF-gene regulations in a variety of contexts; however, the inevitable prediction errors of GRNs hinder optimal data mining of RNA-Seq transcriptome profiles. Here we perform an integrative study of Zea mays (maize) seed development in order to identify key genes in a complex developmental process. First, we reverse engineered a GRN from 78 maize seed transcriptome profiles. Then, we studied collective gene interaction patterns and uncovered highly interwoven network communities as the building blocks of the GRN. One community, composed of mostly unknown genes interacting with opaque2, brittle endosperm1 and shrunken2, contributes to seed phenotypes. Another community, composed mostly of genes expressed in the basal endosperm transfer layer, is responsible for nutrient transport. We further integrated our inferred GRN with gene expression patterns in different seed compartments and at various developmental stages and pathways. The integration facilitated a biological interpretation of the GRN. Our yeast one-hybrid assays verified six out of eight TF-promoter bindings in the reconstructed GRN. This study identified topologically important genes in interwoven network communities that may be crucial to maize seed development.
Affiliation(s)
- Wenwei Xiong
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450002, China
- Department of Biology, Montclair State University, Montclair, NJ, 07043, USA
- Chunlei Wang
- National Maize Improvement Center, China Agricultural University, Beijing, 100083, China
- Xiangbo Zhang
- National Maize Improvement Center, China Agricultural University, Beijing, 100083, China
- Qinghua Yang
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450002, China
- Ruixin Shao
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450002, China
- Jinsheng Lai
- National Maize Improvement Center, China Agricultural University, Beijing, 100083, China
- Chunguang Du
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450002, China
- Department of Biology, Montclair State University, Montclair, NJ, 07043, USA
69
Yuan H, Xi R, Chen C, Deng M. Differential network analysis via lasso penalized D-trace loss. Biometrika 2017. [DOI: 10.1093/biomet/asx049] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 01/04/2023]
70
Zhang XF, Ou-Yang L, Yan H. Node-based differential network analysis in genomics. Comput Biol Chem 2017; 69:194-201. [DOI: 10.1016/j.compbiolchem.2017.03.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/24/2017] [Accepted: 03/27/2017] [Indexed: 12/26/2022]
71
Tripathi S, Lloyd-Price J, Ribeiro A, Yli-Harja O, Dehmer M, Emmert-Streib F. sgnesR: An R package for simulating gene expression data from an underlying real gene network structure considering delay parameters. BMC Bioinformatics 2017; 18:325. [PMID: 28676075 PMCID: PMC5496254 DOI: 10.1186/s12859-017-1731-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/23/2016] [Accepted: 06/15/2017] [Indexed: 01/04/2023]
Abstract
Background sgnesR (Stochastic Gene Network Expression Simulator in R) is an R package that provides an interface to simulate gene expression data from a given gene network using the stochastic simulation algorithm (SSA). The package provides various options for delay parameters, which can easily be included in reactions as promoter delay, RNA delay and protein delay. A user can tune these parameters to model various types of reactions within a cell. As examples, we present two network models used to generate expression profiles. We also demonstrate the inference of networks and the evaluation of association measures for edge and non-edge components from the generated expression profiles. Results The purpose of sgnesR is to provide an easy-to-use, fast tool for generating realistic gene expression data from biologically relevant, user-selected networks. Conclusions sgnesR is freely available for academic use. The R package has been tested for R 3.2.0 under Linux, Windows and Mac OS X. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1731-8) contains supplementary material, which is available to authorized users.
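To show how a delay enters a stochastic simulation, the sketch below implements a common delayed-SSA scheme for non-consuming delays in Python; sgnesR itself is an R package, and its interface is not reproduced here. Transcription of a single hypothetical gene initiates at rate `k_tx`, each transcript appears only `tau` time units later, and transcripts degrade at rate `k_deg`. Pending completions are kept in a priority queue; when one falls before the next drawn reaction time, the clock jumps to it and a fresh waiting time is drawn, which is valid because exponential waiting times are memoryless.

```python
import heapq
import random
from typing import List, Tuple


def delayed_ssa(k_tx: float, k_deg: float, tau: float, t_end: float,
                rng: random.Random) -> List[Tuple[float, int]]:
    """Toy delayed SSA for one gene; returns (time, mRNA count) after each event.

    Initiation fires at rate k_tx but adds its mRNA only after delay tau;
    each mRNA is degraded at rate k_deg. Requires k_tx > 0.
    """
    t, m = 0.0, 0
    pending: List[float] = []  # min-heap of scheduled mRNA completion times
    traj = [(t, m)]
    while t < t_end:
        total = k_tx + k_deg * m
        dt = rng.expovariate(total)
        if pending and pending[0] <= t + dt:
            # A delayed completion comes first: apply it, then redraw next loop.
            t = heapq.heappop(pending)
            if t > t_end:
                break
            m += 1
        else:
            t += dt
            if t > t_end:
                break
            if rng.random() * total < k_tx:
                heapq.heappush(pending, t + tau)  # initiation now, product later
            else:
                m -= 1  # degradation of one mRNA
        traj.append((t, m))
    return traj


if __name__ == "__main__":
    traj = delayed_ssa(k_tx=2.0, k_deg=0.1, tau=5.0, t_end=200.0,
                       rng=random.Random(0))
    print("events:", len(traj), "final (time, mRNA):", traj[-1])
```

Because every product is scheduled `tau` after its initiation, no mRNA can exist before `t = tau`; promoter or protein delays would follow the same pattern with their own queues.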
Affiliation(s)
- Shailesh Tripathi
- Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
- Jason Lloyd-Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, USA; Laboratory of Biosystem Dynamics, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
- Andre Ribeiro
- Laboratory of Biosystem Dynamics, Department of Signal Processing, Tampere University of Technology, Tampere, Finland; Institute of Biosciences and Medical Technology, Tampere, Finland
- Olli Yli-Harja
- Institute of Biosciences and Medical Technology, Tampere, Finland; Computational Systems Biology, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
- Matthias Dehmer
- Institute for Theoretical Informatics, Mathematics and Operations Research, Department of Computer Science, Universität der Bundeswehr München, Munich, Germany
- Frank Emmert-Streib
- Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland; Institute of Biosciences and Medical Technology, Tampere, Finland
72
Reverse engineering highlights potential principles of large gene regulatory network design and learning. NPJ Syst Biol Appl 2017. [PMID: 28649444 PMCID: PMC5481436 DOI: 10.1038/s41540-017-0019-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]
Abstract
Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. Several techniques are presently used to experimentally assay transcription factor-to-target relationships, providing important information about real gene regulatory network connections. These techniques include classical ChIP-seq, yeast one-hybrid, or, more recently, DAP-seq or TARGET technologies. These techniques are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory network-simulating engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that is able to simulate large gene regulatory networks (containing 10⁴ genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale-free properties (built ex nihilo). In combination with a supervised (accepting prior knowledge) support vector machine algorithm, we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks, and in particular demonstrate that prior-knowledge structure is crucial for accurate learning, and (ii) draw conclusions to guide the design of future experiments aimed at learning gene regulatory networks.
By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity hold true on real data (Escherichia coli K14 network reconstruction using network and transcriptomic data), we show that the formalism used to build FRANK can, to some extent, be a reasonable model for gene regulatory networks in real cells. This work by Carré et al addresses central questions in biology: how are very large gene regulatory networks (GRNs) organized, how do they generate stable gene expression, and how can they be learnt using machine learning algorithms? In this work, the authors developed an algorithm able to simulate large GRNs. From these networks, they simulate stable or oscillating gene expression and highlight some mathematical rules controlling such collective behavior (of several thousand genes). They discuss the resulting hypotheses concerning the organization of GRNs in real cells. Using this simulation tool, the authors also demonstrate that it is likely possible to computationally learn GRNs from transcriptomic data and prior knowledge of the network (actual known connections obtained from yeast one-hybrid or ChIP-seq, for instance). They particularly highlight the crucial importance of the prior-knowledge structure for the capacity to learn large GRNs.
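FRANK's own construction rules are not given in this abstract. As a generic illustration of how a large directed network with a heavy-tailed out-degree (a few regulators with many targets) can be grown, the sketch below uses preferential attachment: each new gene receives `m` regulatory edges from earlier genes chosen with probability proportional to out-degree + 1. This is a standard growth model swapped in for illustration, not FRANK; the function name, `m` and the +1 offset are assumptions.

```python
import random
from typing import List, Tuple


def preferential_attachment_grn(n_genes: int, m: int,
                                seed: int = 0) -> List[Tuple[int, int]]:
    """Grow a directed (regulator, target) edge list by preferential attachment.

    Genes 0..m-1 are founders with no incoming edges; every later gene i receives
    exactly m edges from distinct earlier genes, chosen with weight out_degree + 1.
    """
    if not 0 < m < n_genes:
        raise ValueError("need 0 < m < n_genes")
    rng = random.Random(seed)
    out_deg = [0] * n_genes
    edges: List[Tuple[int, int]] = []
    for i in range(m, n_genes):
        candidates = range(i)
        weights = [out_deg[j] + 1 for j in candidates]
        chosen = set()
        while len(chosen) < m:  # rejection sampling for m distinct regulators
            chosen.add(rng.choices(candidates, weights=weights)[0])
        for reg in sorted(chosen):
            edges.append((reg, i))
            out_deg[reg] += 1
    return edges


if __name__ == "__main__":
    n = 1000
    edges = preferential_attachment_grn(n, 2, seed=1)
    out_deg = [0] * n
    for reg, _ in edges:
        out_deg[reg] += 1
    print("edges:", len(edges), "max out-degree:", max(out_deg),
          "mean out-degree:", len(edges) / n)
```

With a fixed seed the output is reproducible. Early genes tend to accumulate far more targets than the mean, which is the hub-dominated, approximately scale-free pattern referred to above; the +1 offset lets genes with no targets yet still be chosen.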
73
De Souza Jacomini R, Martins DC, Da Silva FL, Costa AHR. GeNICE: A Novel Framework for Gene Network Inference by Clustering, Exhaustive Search, and Multivariate Analysis. J Comput Biol 2017. [PMID: 28636461 DOI: 10.1089/cmb.2017.0022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/25/2023]
Abstract
Gene network (GN) inference from temporal gene expression data is a crucial and challenging problem in systems biology. Expression data sets usually consist of dozens of temporal samples, while networks consist of thousands of genes, rendering many inference methods infeasible in practice. To improve the scalability of GN inference methods, we propose a novel framework called GeNICE, based on probabilistic GNs; the main novelty is the introduction of a clustering procedure to group genes with related expression profiles and to provide an approximate solution with reduced computational complexity. We use the defined clusters to perform an exhaustive search to retrieve the best predictor gene subsets for each target gene, according to multivariate criterion functions. GeNICE greatly reduces the search space because predictor candidates are restricted to one gene per cluster. Finally, a multivariate analysis is performed for each defined predictor subset to retrieve minimal subsets and to simplify the network. In our experiments with in silico generated data sets, GeNICE achieved a substantial reduction in computational time compared with solutions without the clustering step, while preserving the gene expression prediction accuracy even when the number of clusters is small (about 50) relative to the number of genes (on the order of thousands). For a Plasmodium falciparum microarray data set, the prediction accuracy achieved by GeNICE was roughly 97%, while the respective topologies involving glycolytic and apicoplast seed genes had a very large intramodularity, very small interconnection between modules, and some module hub genes, reflecting small-world and scale-free topological properties, as expected.
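The search-space reduction can be illustrated with a small sketch. Two assumptions are made: each cluster is represented by a single gene supplied by the caller (the clustering step itself is omitted), and predictor subsets are scored by the conditional entropy of the target at time t+1 given the predictors at time t, estimated from a discretized time series. GeNICE's exact criterion functions are not reproduced here; note also that plain conditional entropy becomes optimistic for large subsets when samples are few, which is why practical criteria usually add a penalty.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Series = Dict[str, Sequence[int]]  # gene -> discretized expression over time


def cond_entropy(data: Series, target: str, predictors: Tuple[str, ...]) -> float:
    """H(target[t+1] | predictors[t]) in bits, from observed transitions."""
    n = len(data[target]) - 1  # number of transitions t -> t+1
    table: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for t in range(n):
        key = tuple(data[p][t] for p in predictors)
        table[key][data[target][t + 1]] += 1
    h = 0.0
    for counts in table.values():
        total = sum(counts.values())
        for c in counts.values():
            h -= (c / n) * math.log2(c / total)
    return h


def best_predictors(data: Series, target: str, representatives: List[str],
                    max_k: int) -> Tuple[Tuple[str, ...], float]:
    """Exhaustive search over subsets of cluster representatives (one gene per cluster).

    Smaller subsets win ties, so the result is a minimal best-scoring subset.
    """
    best: Tuple[Tuple[str, ...], float] = ((), cond_entropy(data, target, ()))
    for k in range(1, max_k + 1):
        for subset in combinations(representatives, k):
            h = cond_entropy(data, target, subset)
            if h < best[1] - 1e-12:
                best = (subset, h)
    return best


# Hypothetical toy series: C copies A with a one-step lag; B is unrelated.
TOY: Series = {
    "A": [0, 1, 1, 0, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 0, 1, 1, 0],
    "C": [0, 0, 1, 1, 0, 1, 0, 0],
}

if __name__ == "__main__":
    print(best_predictors(TOY, "C", representatives=["A", "B"], max_k=2))
```

With c clusters and subsets of size up to k, the search visits on the order of C(c, k) subsets instead of C(n, k) over all n genes, which is where the speed-up comes from. On the toy series the search returns `(('A',), 0.0)`: A alone determines C.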
74
Guo W, Calixto CPG, Tzioutziou N, Lin P, Waugh R, Brown JWS, Zhang R. Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size. BMC Syst Biol 2017; 11:62. [PMID: 28629365 PMCID: PMC5477119 DOI: 10.1186/s12918-017-0440-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/04/2016] [Accepted: 06/09/2017] [Indexed: 12/18/2022]
Abstract
BACKGROUND Co-expression has been widely used to identify novel regulatory relationships using high-throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on small or medium-sized networks of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments due to their cost and the amount of work required. Thus, the performance of co-expression network analysis methods on large co-expression networks consisting of a few thousand nodes, with only a small number of profiles with a single perturbation (conditions that more accurately reflect normal experiments), is largely uncharacterized. METHODS We propose a novel network inference method based on Relevance Low order Partial Correlation (RLowPC). RLowPC uses a two-step approach: it first selects high-confidence edges, reducing the search space by picking only the top-ranked genes from an initial partial correlation analysis, and then computes the partial correlations in the confined search space by removing only the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association. RESULTS We selected six co-expression-based methods with good performance in evaluation studies from the literature: Partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. The evaluation of these methods was carried out on simulated time-series data with various network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refining the top-weighted edges in the pre-inferred partial correlation networks using RLowPC.
We found improved performance by partitioning large networks into smaller co-expressed modules when assessing the method performance within these modules. CONCLUSIONS The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top ranked predictions. Partitioning large networks into smaller highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC .
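The two-step idea can be illustrated with first-order partial correlations, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). This is a simplified stand-in, not the RLowPC package: candidate edges are the top-ranked gene pairs by absolute Pearson correlation (the package starts from a partial correlation network instead), and each candidate edge is then scored by the smallest absolute first-order partial correlation obtained when controlling, in turn, for each shared neighbour in the candidate network. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Expr = Dict[str, Sequence[float]]  # gene -> expression profile


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(rxy: float, rxz: float, ryz: float) -> float:
    """First-order partial correlation r_xy.z (assumes |rxz|, |ryz| < 1)."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def top_edges(expr: Expr, n_top: int) -> List[Tuple[str, str]]:
    """Step 1: keep the n_top gene pairs with the largest |Pearson r|."""
    pairs = combinations(sorted(expr), 2)
    ranked = sorted(pairs, key=lambda p: -abs(pearson(expr[p[0]], expr[p[1]])))
    return ranked[:n_top]


def refine_edges(expr: Expr, edges: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Step 2: score each candidate edge by its weakest |r_xy.z| over shared neighbours z."""
    nbrs: Dict[str, Set[str]] = defaultdict(set)
    for x, y in edges:
        nbrs[x].add(y)
        nbrs[y].add(x)

    def r(a: str, b: str) -> float:
        return pearson(expr[a], expr[b])

    scores: Dict[Tuple[str, str], float] = {}
    for x, y in edges:
        score = abs(r(x, y))
        for z in nbrs[x] & nbrs[y]:
            score = min(score, abs(partial_corr(r(x, y), r(x, z), r(y, z))))
        scores[(x, y)] = score
    return scores


# Hypothetical toy data: z drives both x and y; x and y share no other signal.
TOY: Expr = {
    "z": [1, 2, 3, 4, 5, 6],
    "x": [2, 1, 3, 4, 4, 7],
    "y": [2, 3, 1, 2, 6, 7],
}

if __name__ == "__main__":
    for edge, s in refine_edges(TOY, top_edges(TOY, 3)).items():
        print(edge, round(s, 3))
```

In this toy example, the direct x-y correlation (about 0.7) vanishes once z is controlled for, so x-y would be flagged as an indirect edge, whereas x-z keeps a strong partial correlation. Restricting the conditioning set to shared neighbours of the candidate network is what keeps the computation cheap for large networks.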
Affiliation(s)
- Wenbin Guo
- Information and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Cristiane P G Calixto
- Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Nikoleta Tzioutziou
- Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Ping Lin
- Division of Mathematics, University of Dundee, Nethergate, Dundee, Scotland, DD1 4HN, UK
- Robbie Waugh
- Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- John W S Brown
- Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA, UK
- Runxuan Zhang
- Information and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, DD2 5DA, UK.
75
Barlow SM, Maron JL, Alterovitz G, Song D, Wilson BJ, Jegatheesan P, Govindaswami B, Lee J, Rosner AO. Somatosensory Modulation of Salivary Gene Expression and Oral Feeding in Preterm Infants: Randomized Controlled Trial. JMIR Res Protoc 2017; 6:e113. [PMID: 28615158 PMCID: PMC5489710 DOI: 10.2196/resprot.7712] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/20/2017] [Accepted: 04/28/2017] [Indexed: 12/13/2022]
Abstract
BACKGROUND Despite numerous medical advances in the care of at-risk preterm neonates, oral feeding still represents one of the first and most advanced neurological challenges facing this delicate population. Objective, quantitative, and noninvasive assessment tools, as well as neurotherapeutic strategies, are greatly needed in order to improve feeding and developmental outcomes. Pulsed pneumatic orocutaneous stimulation has been shown to improve nonnutritive sucking (NNS) skills in preterm infants who exhibit delayed or disordered nipple feeding behaviors. Separately, the study of the salivary transcriptome in neonates has helped identify biomarkers directly linked to successful neonatal oral feeding behavior. The combination of noninvasive treatment strategies and transcriptomic analysis represents an integrative approach to oral feeding in which rapid technological advances and personalized transcriptomics can safely and noninvasively be brought to the bedside to inform medical care decisions and improve care and outcomes. OBJECTIVE The study aimed to conduct a multicenter randomized controlled trial (RCT) to combine molecular and behavioral methods in an experimental conceptualization approach to map the effects of PULSED somatosensory stimulation on salivary gene expression in the context of the acquisition of oral feeding habits in high-risk human neonates. The aims of this study represent the first attempt to combine noninvasive treatment strategies and transcriptomic assessments of high-risk extremely preterm infants (EPI) to (1) improve oral feeding behavior and skills, (2) further our understanding of the gene ontology of biologically diverse pathways related to oral feeding, (3) use gene expression data to personalize neonatal care and individualize treatment strategies and the timing of interventions, and (4) improve long-term developmental outcomes.
METHODS A total of 180 extremely preterm infants from three neonatal intensive care units (NICUs) will be randomized to receive either PULSED or SHAM (non-pulsing) orocutaneous intervention simultaneously with tube feedings 3 times per day for 4 weeks, beginning at 30 weeks postconceptional age. Infants will also be assessed 3 times per week for NNS performance, and multiple saliva samples will be obtained each week for transcriptomic analysis, until infants have achieved full oral feeding status. At 18 months corrected age (CA), infants will undergo neurodevelopmental follow-up testing, the results of which will be correlated with feeding outcomes in the neonatal and postnatal periods and with gene expression data and intervention status. RESULTS The ongoing National Institutes of Health funded randomized controlled trial R01HD086088 is actively recruiting participants. The expected completion date of the study is 2021. CONCLUSIONS Differential salivary gene expression profiles in response to orosensory entrainment intervention are expected to lead to the development of individualized interventions for the diagnosis and management of oral feeding in preterm infants. TRIAL REGISTRATION ClinicalTrials.gov NCT02696343; https://clinicaltrials.gov/ct2/show/NCT02696343 (Archived by WebCite at http://www.webcitation.org/6r5NbJ9Ym).
Affiliation(s)
- Steven Michael Barlow
- Center for Brain, Biology, and Behavior, Department of Special Education and Communication Disorders, Biological Systems Engineering, University of Nebraska, Lincoln, NE, United States
- Jill Lamanna Maron
- Tufts Medical Center, Division of Neonatology, Department of Pediatrics, Boston, MA, United States
- Gil Alterovitz
- Center for Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Dongli Song
- Division of Neonatology, Department of Pediatrics, Santa Clara Valley Medical Center, San Jose, CA, United States
- Bernard Joseph Wilson
- CHI Health St. Elizabeth, Division of Neonatal-Perinatal Medicine, Lincoln, NE, United States
- Priya Jegatheesan
- Division of Neonatology, Department of Pediatrics, Santa Clara Valley Medical Center, San Jose, CA, United States
- Balaji Govindaswami
- Division of Neonatology, Department of Pediatrics, Santa Clara Valley Medical Center, San Jose, CA, United States
- Jaehoon Lee
- IMMAP, Department of Educational Psychology and Leadership, Texas Tech University, Lubbock, TX, United States
- Austin Oder Rosner
- Tufts Medical Center, Division of Neonatology, Department of Pediatrics, Boston, MA, United States
76
Koch C, Konieczka J, Delorey T, Lyons A, Socha A, Davis K, Knaack SA, Thompson D, O'Shea EK, Regev A, Roy S. Inference and Evolutionary Analysis of Genome-Scale Regulatory Networks in Large Phylogenies. Cell Syst 2017; 4:543-558.e8. [PMID: 28544882 PMCID: PMC5515301 DOI: 10.1016/j.cels.2017.04.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 09/23/2016] [Revised: 02/20/2017] [Accepted: 04/26/2017] [Indexed: 11/22/2022]
Abstract
Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates the study of regulatory network evolutionary dynamics across multiple poorly studied species.
Affiliation(s)
- Christopher Koch
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Jay Konieczka
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Toni Delorey
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Ana Lyons
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Amanda Socha
- Dartmouth College, Biology department, Hanover, NH 03755, USA
- Kathleen Davis
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Sara A Knaack
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Erin K O'Shea
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
- Howard Hughes Medical Institute, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Department of Molecular and Cellular Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
77
Pirkl M, Diekmann M, van der Wees M, Beerenwinkel N, Fröhlich H, Markowetz F. Inferring modulators of genetic interactions with epistatic nested effects models. PLoS Comput Biol 2017; 13:e1005496. [PMID: 28406896 PMCID: PMC5407847 DOI: 10.1371/journal.pcbi.1005496] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/06/2016] [Revised: 04/27/2017] [Accepted: 04/03/2017] [Indexed: 12/27/2022]
Abstract
Maps of genetic interactions can dissect functional redundancies in cellular networks. Gene expression profiles as high-dimensional molecular readouts of combinatorial perturbations provide a detailed view of genetic interactions, but can be hard to interpret if different gene sets respond in different ways (called mixed epistasis). Here we test the hypothesis that mixed epistasis between a gene pair can be explained by the action of a third gene that modulates the interaction. We have extended the framework of Nested Effects Models (NEMs), a type of graphical model specifically tailored to analyze high-dimensional gene perturbation data, to incorporate logical functions that describe interactions between regulators on downstream genes and proteins. We benchmark our approach in the controlled setting of a simulation study and show high accuracy in inferring the correct model. In an application to data from deletion mutants of kinases and phosphatases in S. cerevisiae we show that epistatic NEMs can point to modulators of genetic interactions. Our approach is implemented in the R-package 'epiNEM' available from https://github.com/cbg-ethz/epiNEM and https://bioconductor.org/packages/epiNEM/.
Affiliation(s)
- Martin Pirkl
- ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Madeline Diekmann
- ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Niko Beerenwinkel
- ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Holger Fröhlich
- Bonn-Aachen International Center for IT (B-IT), University of Bonn, Bonn, Germany
- UCB Biosciences GmbH, Monheim, Germany
- Florian Markowetz
- University of Cambridge, Cancer Research UK Cambridge Institute, Cambridge, United Kingdom
78
Trescher S, Münchmeyer J, Leser U. Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization. BMC Syst Biol 2017; 11:41. [PMID: 28347313 PMCID: PMC5369021 DOI: 10.1186/s12918-017-0419-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/16/2016] [Accepted: 03/08/2017] [Indexed: 12/28/2022]
Abstract
Background: Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Elucidation of regulatory mechanisms can be approached by a multitude of experimental methods, yet integration of the resulting heterogeneous, large, and noisy data sets into comprehensive, tissue- or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed which model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes and other factors. Subsequent optimization finds those parameters that minimize the divergence of predicted and measured expression intensities. In various settings, these methods produced promising results in terms of estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they vastly differ in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem and the data sets used for evaluation. Results: Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of the presented methods based on publicly available data sets. Conclusions: The results show that all methods seem to find biologically relevant information. However, we also observe that the mutual result overlaps are very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, yet also to identify common shortcomings and necessary extensions enabling focused research on the critical points.
Electronic supplementary material: The online version of this article (doi:10.1186/s12918-017-0419-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Saskia Trescher
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
- Jannes Münchmeyer
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
- Ulf Leser
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
79
Nam S. Databases and tools for constructing signal transduction networks in cancer. BMB Rep 2017; 50:12-19. [PMID: 27502015 PMCID: PMC5319659 DOI: 10.5483/bmbrep.2017.50.1.135] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/08/2016] [Indexed: 12/22/2022]
Abstract
Traditionally, biologists have devoted their careers to studying individual biological entities of their own interest, partly due to a lack of available data regarding those entities. Large, high-throughput data, too complex for conventional processing methods (i.e., “big data”), have accumulated in cancer biology and are freely available in public data repositories. Such challenges urge biologists to inspect their biological entities of interest using novel approaches, beginning with data retrieval from repositories. Essentially, these revolutionary changes demand new interpretations of huge datasets at a systems level, through so-called “systems biology”. One representative application of systems biology is to generate a biological network from high-throughput big data, providing a global map of molecular events associated with specific phenotype changes. In this review, we introduce the repositories of cancer big data and cutting-edge systems biology tools for network generation and improved identification of therapeutic targets.
Affiliation(s)
- Seungyoon Nam
- Department of Life Sciences, Gachon University, Seongnam 13120; Department of Genome Medicine and Science, College of Medicine, Gachon University; Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Incheon 21565, Korea
80
Henriques D, Villaverde AF, Rocha M, Saez-Rodriguez J, Banga JR. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Comput Biol 2017; 13:e1005379. [PMID: 28166222 PMCID: PMC5319798 DOI: 10.1371/journal.pcbi.1005379] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/11/2016] [Revised: 02/21/2017] [Accepted: 01/24/2017] [Indexed: 11/19/2022]
Abstract
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e., uncovering static interaction networks): it builds dynamic models (based on ordinary differential equations), which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e., not used in the training). For this task, SELDOM's ensemble prediction is not only consistently better than predictions from individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
Affiliation(s)
- David Henriques
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Alejandro F. Villaverde
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Centre of Biological Engineering, University of Minho, Braga, Portugal
- Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, Portugal
- Julio Saez-Rodriguez
- Joint Research Center for Computational Biomedicine, RWTH-Aachen University, Aachen, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Julio R. Banga
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
81
Guo S, Jiang Q, Chen L, Guo D. Gene regulatory network inference using PLS-based methods. BMC Bioinformatics 2016; 17:545. [PMID: 28031031 PMCID: PMC5192600 DOI: 10.1186/s12859-016-1398-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 04/12/2016] [Accepted: 12/01/2016] [Indexed: 12/12/2022]
Abstract
Background: Inferring the topology of gene regulatory networks (GRNs) from microarray gene expression data has many potential applications, such as identifying candidate drug targets and providing valuable insights into biological processes. It remains a challenge because the data are noisy and high-dimensional, and there is a large number of potential interactions. Results: We introduce an ensemble gene regulatory network inference method, PLSNET, which decomposes the GRN inference problem with p genes into p subproblems and solves each subproblem using a partial least squares (PLS)-based feature selection algorithm. A statistical technique is then used to refine the predictions. The proposed method was evaluated on the DREAM4 and DREAM5 benchmark datasets and achieved higher accuracy than the winners of those competitions and other state-of-the-art GRN inference methods. Conclusions: Superior accuracy achieved on different benchmark datasets, including both in silico and in vivo networks, shows that PLSNET reaches state-of-the-art performance. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1398-6) contains supplementary material, which is available to authorized users.
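The per-target decomposition described in the PLSNET abstract (p genes, p regression subproblems) can be illustrated with a minimal sketch. It assumes a dictionary mapping gene names to expression vectors and scores each candidate regulator by the absolute coefficient of a one-component PLS1 fit; with a single latent component the PLS weights are proportional to the covariance X^T y. This is not PLSNET itself: its PLS-based feature selection, ensemble, and statistical refinement step are not reproduced, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def center(values: List[float]) -> List[float]:
    """Subtract the mean so that the PLS fit operates on centered data."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pls1_one_component(predictors: List[List[float]], response: List[float]) -> List[float]:
    """Regression coefficients of a PLS1 model with a single latent component."""
    xs = [center(col) for col in predictors]
    y = center(response)
    # Weight vector w is proportional to X^T y (covariance of each predictor with y).
    w = [sum(a * b for a, b in zip(col, y)) for col in xs]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        return [0.0] * len(predictors)
    w = [v / norm for v in w]
    # Latent score t = X w, then regress the centered response on t.
    t = [sum(w[j] * xs[j][i] for j in range(len(xs))) for i in range(len(y))]
    tt = sum(v * v for v in t)
    if tt == 0.0:
        return [0.0] * len(predictors)
    q = sum(a * b for a, b in zip(t, y)) / tt
    return [wj * q for wj in w]


def infer_edges(expr: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """Split p-gene inference into p regressions (one per target) and rank edges."""
    genes = sorted(expr)
    edges: List[Tuple[str, str, float]] = []
    for target in genes:
        regulators = [g for g in genes if g != target]
        coefs = pls1_one_component([expr[g] for g in regulators], expr[target])
        edges.extend((reg, target, abs(c)) for reg, c in zip(regulators, coefs))
    # Highest-scoring (regulator, target) pairs first.
    return sorted(edges, key=lambda e: e[2], reverse=True)
```

For example, with `expr = {"A": [1, 2, 3, 4, 5, 6], "B": [2, 4, 6, 8, 10, 12], "C": [1, 0, -1, -1, 0, 1]}`, `infer_edges(expr)[0]` is `("A", "B", 2.0)`, because B is exactly twice A and C is uncorrelated with both.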
Affiliation(s)
- Shun Guo
- Department of Electronic Engineering, Xiamen University, Fujian, 361005, China
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
- Qingshan Jiang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000, China
- Lifei Chen
- School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117, China
- Donghui Guo
- Department of Electronic Engineering, Xiamen University, Fujian, 361005, China
82
Young WC, Raftery AE, Yeung KY. A posterior probability approach for gene regulatory network inference in genetic perturbation data. Math Biosci Eng 2016; 13:1241-1251. [PMID: 27775378 DOI: 10.3934/mbe.2016041] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/06/2023]
Abstract
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach.
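The posterior-probability idea can be shown with Bayes' rule applied to a single knockdown measurement. The sketch assumes a hypothetical two-component model in which the target's standardized expression change z follows N(effect, 1) if the knocked-down gene regulates it and N(0, 1) otherwise; the prior could, for instance, be raised for edges supported by TRANSFAC or JASPAR. The model, the default effect size, and the function names are illustrative assumptions, not the likelihood used by Young et al.

```python
import math


def normal_pdf(x: float, mean: float, sd: float = 1.0) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def edge_posterior(z: float, prior: float, effect: float = 2.0) -> float:
    """P(edge | z) by Bayes' rule under a hypothetical model:
    z ~ N(effect, 1) if the edge exists, z ~ N(0, 1) otherwise."""
    like_edge = normal_pdf(z, effect)
    like_null = normal_pdf(z, 0.0)
    numerator = prior * like_edge
    return numerator / (numerator + (1.0 - prior) * like_null)
```

When z lies halfway between the two means (z = 1 with effect = 2) the two likelihoods are equal and the posterior equals the prior; larger z pushes the posterior toward 1, and a higher prior shifts every posterior upward.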
Affiliation(s)
- William Chad Young
- University of Washington, Department of Statistics, Box 354322, Seattle, WA 98195-4322, United States
83
Kannan V, Tegner J. Adaptive input data transformation for improved network reconstruction with information theoretic algorithms. Stat Appl Genet Mol Biol 2016; 15:507-520. [PMID: 27875324 DOI: 10.1515/sagmb-2016-0013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
We propose a novel systematic procedure of non-linear data transformation for an adaptive algorithm in the context of network reverse-engineering using information theoretic methods. Our methodology is rooted in elucidating and correcting for the specific biases in the estimation techniques for mutual information (MI) given a finite sample of data. These are, in turn, tied to the lack of well-defined bounds for numerical estimation of MI for continuous probability distributions from finite data. The nature and properties of the inevitable bias are described, complemented by several examples illustrating their form and variation. We propose an adaptive partitioning scheme for MI estimation that effectively transforms the sample data using parameters determined from its local and global distribution, guaranteeing a more robust and reliable reconstruction algorithm. Together with a normalized measure (Shared Information Metric) we report considerably enhanced performance both for in silico and real-world biological networks. We also find that the recovery of true interactions is better in particular for an intermediate range of false positive rates, suggesting that our algorithm is less vulnerable to spurious signals of association.
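To make the finite-sample estimation problem concrete, the sketch below estimates mutual information (in nats) with the plug-in estimator after an equal-frequency (quantile) partition of each variable, one simple data-adaptive binning. It is a stand-in, not the authors' adaptive partitioning scheme or their Shared Information Metric, and it does not correct the positive bias of the plug-in estimator; breaking ties by sample index is a simplifying assumption, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def equal_frequency_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign each sample to one of k bins holding (nearly) equal numbers of samples.
    Ties are broken by sample index (a simplification)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def plugin_mi(xb: Sequence[int], yb: Sequence[int]) -> float:
    """Plug-in (maximum-likelihood) mutual information, in nats, from binned samples."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mi_equal_frequency(x: Sequence[float], y: Sequence[float], k: int = 4) -> float:
    """Bin both variables into k equal-frequency bins, then apply the plug-in estimator."""
    return plugin_mi(equal_frequency_bins(x, k), equal_frequency_bins(y, k))
```

For `x = list(range(100))` and k = 4, `mi_equal_frequency(x, x)` equals ln 4 (about 1.386), the maximum for four equally populated bins, while pairing x with the periodic sequence `[i % 4 for i in range(100)]` gives a value close to zero (about 0.002).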
84
Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy. PLoS One 2016; 11:e0166115. [PMID: 27829000 PMCID: PMC5102470 DOI: 10.1371/journal.pone.0166115] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/07/2016] [Accepted: 10/24/2016] [Indexed: 12/18/2022]
Abstract
Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the "large p, small n" problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Finally, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We applied our method to five different datasets and compared it with five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method.
85
Han SW, Chen G, Cheon MS, Zhong H. Estimation of Directed Acyclic Graphs Through Two-stage Adaptive Lasso for Gene Network Inference. J Am Stat Assoc 2016; 111:1004-1019. [PMID: 28239216 DOI: 10.1080/01621459.2016.1142880] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 10/22/2022]
Abstract
Graphical models are a popular approach to find dependence and conditional independence relationships between gene expressions. Directed acyclic graphs (DAGs) are a special class of directed graphical models, where all the edges are directed edges and contain no directed cycles. The DAGs are well known models for discovering causal relationships between genes in gene regulatory networks. However, estimating DAGs without assuming a known ordering is challenging due to high dimensionality, the acyclic constraints, and the presence of equivalence classes for observational data. To overcome these challenges, we propose a two-stage adaptive Lasso approach, called NS-DIST, which performs neighborhood selection (NS) in stage 1, and then estimates DAGs by the Discrete Improving Search with Tabu (DIST) algorithm within the selected neighborhood. Simulation studies are presented to demonstrate the effectiveness of the method and its computational efficiency. Two real data examples are used to demonstrate the practical usage of our method for gene regulatory network inference.
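Stage 1 of NS-DIST is a neighborhood-selection step. The sketch below shows generic lasso neighborhood selection in the style of Meinshausen and Bühlmann: one gene is regressed on all others with coordinate-descent lasso, and genes with nonzero coefficients form its candidate neighborhood. It assumes centered data stored as a dictionary of columns and uses a plain rather than adaptive lasso penalty; the DIST stage that searches for a DAG within the neighborhoods is not shown, and all names are hypothetical.

```python
from typing import Dict, List


def soft_threshold(rho: float, lam: float) -> float:
    """Lasso shrinkage operator S(rho, lam)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float, n_iter: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    X is a list of predictor columns; data are assumed centered (no intercept)."""
    n, p = len(y), len(X)
    b = [0.0] * p
    resid = list(y)  # residual r = y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = X[j]
            sq = sum(v * v for v in col) / n
            if sq == 0.0:
                continue
            # rho_j = (1/n) x_j^T (r + x_j b_j): fit of x_j to its partial residual
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / sq
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                b[j] = new
    return b


def neighborhood(data: Dict[str, List[float]], node: str, lam: float) -> List[str]:
    """Regress one node on all others and keep predictors with nonzero coefficients."""
    others = [g for g in sorted(data) if g != node]
    coefs = lasso_cd([data[g] for g in others], data[node], lam)
    return [g for g, c in zip(others, coefs) if c != 0.0]
```

With `x1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]`, `x2 = [1, 0, -1, -1, 0, 1]` (orthogonal to x1) and `y = 3 * x1`, `neighborhood(data, "y", 0.1)` returns `["x1"]`, while a large penalty such as 100 empties the neighborhood. In neighborhood selection the per-node neighborhoods are then symmetrized (for example with an AND or OR rule) before any further search.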
Affiliation(s)
- Sung Won Han
- Division of Biostatistics, Departments of Population Health, New York University, New York, NY, USA, 10016
- Gong Chen
- Pharmaceutical Sciences, Pharma Early Research and Development, Roche Innovation Center New York, New York, NY, USA
- Myun-Seok Cheon
- School of Industrial and System Engineering, Georgia Institute of Technology, Atlanta, GA, USA, 30332
- Hua Zhong
- Division of Biostatistics, Departments of Population Health, New York University, New York, NY, USA, 10016
86
Banf M, Rhee SY. Computational inference of gene regulatory networks: Approaches, limitations and opportunities. Biochim Biophys Acta Gene Regul Mech 2016; 1860:41-52. [PMID: 27641093 DOI: 10.1016/j.bbagrm.2016.09.003] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 04/13/2016] [Revised: 09/08/2016] [Accepted: 09/08/2016] [Indexed: 10/21/2022]
Abstract
Gene regulatory networks lie at the core of cell function control. In E. coli and S. cerevisiae, the study of gene regulatory networks has led to the discovery of regulatory mechanisms responsible for the control of cell growth, differentiation and responses to environmental stimuli. In plants, computational rendering of gene regulatory networks is gaining momentum, thanks to the recent availability of high-quality genomes and transcriptomes and development of computational network inference approaches. Here, we review current techniques, challenges and trends in gene regulatory network inference and highlight challenges and opportunities for plant science. We provide plant-specific application examples to guide researchers in selecting methodologies that suit their particular research questions. Given the interdisciplinary nature of gene regulatory network inference, we tried to cater to both biologists and computer scientists to help them engage in a dialogue about concepts and caveats in network inference. Specifically, we discuss problems and opportunities in heterogeneous data integration for eukaryotic organisms and common caveats to be considered during network model evaluation. This article is part of a Special Issue entitled: Plant Gene Regulatory Mechanisms and Networks, edited by Dr. Erich Grotewold and Dr. Nathan Springer.
Affiliation(s)
- Michael Banf
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States
- Seung Y Rhee
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States
87
Ud-Dean SMM, Heise S, Klamt S, Gunawan R. TRaCE+: Ensemble inference of gene regulatory networks from transcriptional expression profiles of gene knock-out experiments. BMC Bioinformatics 2016; 17:252. [PMID: 27342648 PMCID: PMC4919846 DOI: 10.1186/s12859-016-1137-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/01/2016] [Accepted: 06/12/2016] [Indexed: 12/26/2022]
Abstract
Background: The inference of gene regulatory networks (GRNs) from transcriptional expression profiles is challenging, predominantly due to its underdetermined nature. One important consequence of underdetermination is the existence of many possible solutions to this inference. Our previously proposed ensemble inference algorithm TRaCE addressed this issue by inferring an ensemble of network directed graphs (digraphs) using differential gene expressions from gene knock-out (KO) experiments. However, TRaCE could not deal with the mode of the transcriptional regulations (activation or repression), an important feature of GRNs. Results: In this work, we developed a new algorithm called TRaCE+ for the inference of an ensemble of signed GRN digraphs from transcriptional expression data of gene KO experiments. The sign of the edges indicates whether the regulation is an activation (positive) or a repression (negative). TRaCE+ generates the upper and lower bounds of the ensemble, which define uncertain regulatory interactions that could not be verified by the data. As demonstrated in the case studies using the Escherichia coli GRN and 100-gene gold-standard GRNs from the DREAM 4 network inference challenge, by accounting for regulatory signs, TRaCE+ could extract more information from the KO data than TRaCE, leading to fewer uncertain edges. Importantly, iterating TRaCE+ with an optimal design of gene KOs could resolve the underdetermination of GRN inference in far fewer KO experiments than using TRaCE. Conclusions: TRaCE+ expands the applications of the ensemble GRN inference strategy by accounting for the mode of the gene regulatory interactions. In comparison to TRaCE, TRaCE+ enables a better utilization of gene KO data, thereby reducing the cost of tackling underdetermined GRN inference. TRaCE+ subroutines for MATLAB are freely available at the following website: http://www.cabsel.ethz.ch/tools/trace.html.
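The idea of bounding an ensemble of signed digraphs can be illustrated with a signed transitive-reduction step: a direct edge inferred from a knock-out response is dropped when an indirect two-step path of the same sign already explains that response. This is a hypothetical simplification for illustration, not the TRaCE+ algorithm; it assumes an acyclic, transitively closed accessibility digraph with edge signs already assigned (+1 activation, -1 repression), and the names are invented for this sketch.

```python
from typing import Dict, Tuple

# (regulator, target) -> +1 for activation, -1 for repression (hypothetical encoding)
SignedEdges = Dict[Tuple[str, str], int]


def signed_transitive_reduction(acc: SignedEdges) -> SignedEdges:
    """Drop edge i->j when some intermediate k gives a path i->k->j of the same sign.
    Assumes an acyclic, transitively closed accessibility digraph."""
    nodes = {node for edge in acc for node in edge}
    reduced: SignedEdges = {}
    for (i, j), sign in acc.items():
        # Missing edges count as 0, so their product can never equal a +/-1 sign.
        explained = any(
            k != i and k != j and acc.get((i, k), 0) * acc.get((k, j), 0) == sign
            for k in nodes
        )
        if not explained:
            reduced[(i, j)] = sign
    return reduced
```

If A activates B and B represses C, a repressive A->C response is explained by the path A->B->C and is removed, whereas an activating A->C response contradicts that path's sign and is kept; this is how signs can make the remaining uncertain set smaller than an unsigned reduction would.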
Affiliation(s)
- S M Minhaz Ud-Dean
- Institute for Chemical and Bioengineering, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sandra Heise
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Steffen Klamt
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Rudiyanto Gunawan
- Institute for Chemical and Bioengineering, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
88
Mayer G, Marcus K, Eisenacher M, Kohl M. Boolean modeling techniques for protein co-expression networks in systems medicine. Expert Rev Proteomics 2016; 13:555-69. [PMID: 27105325 DOI: 10.1080/14789450.2016.1181546] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
Abstract
Introduction: Application of systems biology/systems medicine approaches is promising for proteomics/biomedical research, but requires selection of an adequate modeling type. Areas covered: This article reviews the existing Boolean network modeling approaches, which, in comparison with alternative modeling techniques, offer several advantages for the processing of proteomics data. Application of methods for inference, reduction and validation of protein co-expression networks that are derived from quantitative high-throughput proteomics measurements is presented. It is also shown how Boolean models can be used to derive system-theoretic characteristics that describe both the dynamical behavior of such networks as a whole and the properties of different cell states (e.g., healthy or diseased cell states). Furthermore, application of methods derived from control theory is proposed in order to simulate the effects of therapeutic interventions on such networks, which is a promising approach for the computer-assisted discovery of biomarkers and drug targets. Finally, the clinical application of Boolean modeling analyses is discussed. Expert commentary: Boolean modeling of proteomics data is still in its infancy. Progress in this field strongly depends on provision of a repository with public access to relevant reference models. Also required are community-supported standards that facilitate input of both proteomics and patient-related data (e.g., age, gender, laboratory results, etc.).
Affiliation(s)
- Gerhard Mayer
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Katrin Marcus
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Martin Eisenacher
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Michael Kohl
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
89
Hou J, Acharya L, Zhu D, Cheng J. An overview of bioinformatics methods for modeling biological pathways in yeast. Brief Funct Genomics 2016; 15:95-108. [PMID: 26476430 PMCID: PMC5065356 DOI: 10.1093/bfgp/elv040] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022]
Abstract
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein-protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed.
90
Inferring causal molecular networks: empirical assessment through a community-based effort. Nat Methods 2016; 13:310-8. [PMID: 26901648 PMCID: PMC4854847 DOI: 10.1038/nmeth.3773] [Citation(s) in RCA: 138] [Impact Index Per Article: 17.3] [Received: 02/26/2015] [Accepted: 01/21/2016] [Indexed: 01/08/2023]
Abstract
The HPN-DREAM community challenge assessed the ability of computational methods to infer causal molecular networks, focusing specifically on the task of inferring causal protein signaling networks in cancer cell lines. It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
91
Evaluating network inference methods in terms of their ability to preserve the topology and complexity of genetic networks. Semin Cell Dev Biol 2016; 51:44-52. [PMID: 26851626 DOI: 10.1016/j.semcdb.2016.01.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/07/2016] [Accepted: 01/07/2016] [Indexed: 12/26/2022]
Abstract
Network inference is a rapidly advancing field, with new methods being proposed on a regular basis. Understanding the advantages and limitations of different network inference methods is key to applying them effectively in different circumstances. The common structural properties shared by diverse networks naturally pose a challenge when it comes to devising accurate inference methods, but surprisingly, there is a paucity of comparison and evaluation methods. Historically, each new method has been tested only against gold-standard (true-value) networks, either purpose-designed synthetic networks or real-world (validated) biological networks. In this paper we aim to assess the impact of taking topological and information content into consideration when evaluating the final accuracy of an inference procedure. Specifically, we compare the best inference methods, in both graph-theoretic and information-theoretic terms, for preserving the topological properties and original information content of synthetic and biological networks. New methods for performance comparison are introduced by borrowing ideas from gene set enrichment analysis and by applying concepts from algorithmic complexity. Experimental results show that no individual algorithm outperforms all others in all cases, and the challenging, non-trivial nature of network inference is evident in the struggle of some algorithms to perform better than random guesswork. Special care should therefore be taken to match the method to the purpose at hand. Finally, we show that evaluations on data generated from different underlying topologies have different signatures that can be used to better choose a network reconstruction method.
92
Ness RO, Sachs K, Vitek O. From Correlation to Causality: Statistical Approaches to Learning Regulatory Relationships in Large-Scale Biomolecular Investigations. J Proteome Res 2016; 15:683-90. [PMID: 26731284 DOI: 10.1021/acs.jproteome.5b00911] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Causal inference, the task of uncovering regulatory relationships between components of biomolecular pathways and networks, is a primary goal of many high-throughput investigations. Statistical associations between observed protein concentrations can suggest an enticing number of hypotheses regarding the underlying causal interactions, but when do such associations reflect the underlying causal biomolecular mechanisms? The goal of this perspective is to provide suggestions for causal inference in large-scale experiments, which utilize high-throughput technologies such as mass-spectrometry-based proteomics. We describe in nontechnical terms the pitfalls of inference in large data sets and suggest methods to overcome these pitfalls and reliably find regulatory associations.
Affiliation(s)
- Robert O Ness
- Department of Statistics, Purdue University, West Lafayette, Indiana 47907-2066, United States; College of Science, College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115, United States
- Karen Sachs
- School of Medicine, Stanford University, Palo Alto, California 94305, United States
- Olga Vitek
- College of Science, College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115, United States
93
Liu F, Heiner M, Yang M. Representing network reconstruction solutions with colored Petri nets. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.04.112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
94
Petrovskiy ED, Saik OV, Tiys ES, Lavrik IN, Kolchanov NA, Ivanisenko VA. Prediction of tissue-specific effects of gene knockout on apoptosis in different anatomical structures of human brain. BMC Genomics 2015; 16 Suppl 13:S3. [PMID: 26693857 PMCID: PMC4686796 DOI: 10.1186/1471-2164-16-s13-s3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
Abstract
BACKGROUND An important issue in target identification for drug design is the tissue-specific effect of inhibiting target genes. Assessing the tissue-specific effect of suppressing gene activity is especially relevant in studies of the brain, because significant variability in gene expression levels among different brain areas has been well documented. RESULTS A method is proposed for constructing statistical models to predict the potential effect of knocking out target genes on the expression of genes involved in the regulation of apoptosis in various brain regions. The model connects the expression of the objective group of genes with the expression of the target gene by means of machine learning models trained on available expression data. Information about the interactions between target and objective genes is determined by reconstructing a target-centric gene network. The STRING and ANDSystem databases are used to reconstruct the gene networks. The developed models have been used to analyse the knockout effects of more than 7,500 target genes on the expression of 1,900 objective genes associated with the Gene Ontology category "apoptotic process". The tissue-specific effect was calculated for 12 main anatomical structures of the human brain. Initial values of gene expression in these anatomical structures were taken from the Allen Brain Atlas database. The predicted effects of suppressing target-gene activity on apoptosis, averaged over all brain structures, were in good agreement with experimental data on siRNA inhibition. CONCLUSIONS This theoretical paper presents an approach that can be used to assess the tissue-specific effect of a gene knockout on the expression of genes in the studied biological process in various structures of the brain.
Genes that, according to the model's predictions, have the highest tissue-specific effects on the apoptosis network can be considered potential pharmacological targets for drugs that would have a strong effect on a specific area of the brain and a much weaker effect on other brain structures. Further experiments are needed to confirm the potential findings of the method.
Affiliation(s)
- Evgeny D Petrovskiy
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- International Tomography Center, The Siberian Branch of the Russian Academy of Sciences, Institutskaya 3A, Novosibirsk, 630090, Russia
- Olga V Saik
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- Evgeny S Tiys
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- Inna N Lavrik
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- Otto von Guericke University Magdeburg, Medical Faculty, Department Translational Inflammation Research, Institute of Experimental Internal Medicine, Pfälzer Platz, Building 28, Magdeburg, 39106, Germany
- Nikolay A Kolchanov
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- Vladimir A Ivanisenko
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
95
Ceci M, Pio G, Kuzmanovski V, Džeroski S. Semi-Supervised Multi-View Learning for Gene Network Reconstruction. PLoS One 2015; 10:e0144031. [PMID: 26641091 PMCID: PMC4671612 DOI: 10.1371/journal.pone.0144031] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 08/04/2015] [Accepted: 11/12/2015] [Indexed: 12/30/2022]
Abstract
The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for this task have been proposed in the literature. It has recently been observed, however, that no single inference method performs optimally across all datasets. It has also been shown that integrating predictions from multiple inference methods is more robust and yields high performance across diverse datasets. Inspired by this research, in this paper we propose a machine learning solution that learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it to carry substantial benefits. These come from automatic adaptation to patterns in the outputs of the individual inference methods, so that regulatory interactions can be identified more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised, ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in a multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827.
Affiliation(s)
- Michelangelo Ceci
- Dept. of Computer Science, University of Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
- Gianvito Pio
- Dept. of Computer Science, University of Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
- Vladimir Kuzmanovski
- Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia
- Sašo Džeroski
- Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia
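Ceci et al. learn how to combine the outputs of many inference methods. For comparison, the simplest unsupervised way to integrate such predictions is to average each edge's rank across methods; the sketch below implements that rank-averaging baseline, not the semi-supervised multi-view algorithm described in the abstract. The method names and scores are hypothetical, and every method is assumed to score the same candidate edges.

```python
def rank_average(predictions):
    """Combine edge scores from several inference methods by mean rank.

    predictions: dict mapping method name -> dict(edge -> score), where a
    higher score means more confidence. All methods must score the same edges.
    Returns edges ordered from best (lowest mean rank) to worst.
    """
    methods = list(predictions.values())
    edges = sorted(methods[0])  # deterministic base order for tie-breaking
    total_rank = {e: 0 for e in edges}
    for scores in methods:
        # Rank 1 = highest score within this method; ties broken by edge name.
        ordered = sorted(edges, key=lambda e: (-scores[e], e))
        for rank, e in enumerate(ordered, start=1):
            total_rank[e] += rank
    mean_rank = {e: total_rank[e] / len(methods) for e in edges}
    return sorted(edges, key=lambda e: (mean_rank[e], e))


# Hypothetical scores from three methods for three candidate edges.
predictions = {
    "correlation": {("G1", "G2"): 0.8, ("G1", "G3"): 0.3, ("G2", "G3"): 0.5},
    "mutual_info": {("G1", "G2"): 0.6, ("G1", "G3"): 0.7, ("G2", "G3"): 0.1},
    "regression": {("G1", "G2"): 0.9, ("G1", "G3"): 0.4, ("G2", "G3"): 0.2},
}
print(rank_average(predictions))  # -> [('G1', 'G2'), ('G1', 'G3'), ('G2', 'G3')]
```

Working with ranks sidesteps the fact that different methods report scores on incompatible scales; a learned combiner, as in the paper, can go further and weight methods by how reliable their output patterns are.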
96
Ghanbari M, Lasserre J, Vingron M. Reconstruction of gene networks using prior knowledge. BMC Syst Biol 2015; 9:84. [PMID: 26589494 PMCID: PMC4654848 DOI: 10.1186/s12918-015-0233-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/12/2014] [Accepted: 11/11/2015] [Indexed: 01/08/2023]
Abstract
Background Reconstructing gene regulatory networks (GRNs) from expression data is a challenging task that has become essential to understanding complex regulatory mechanisms in cells. The major issues are the usually very high ratio of the number of genes to the sample size, and the noise in the available data. Integrating biological prior knowledge into the learning process is a natural and promising way to partially compensate for the lack of reliable expression data and to increase the accuracy of network reconstruction algorithms. Results In this manuscript, we present PriorPC, a new algorithm based on the PC algorithm. The PC algorithm is one of the most popular methods for Bayesian network reconstruction. The result of PC is known to depend on the order in which conditional independence tests are processed, especially for large networks. PriorPC uses prior knowledge to exclude unlikely edges from network estimation and introduces a particular ordering for the conditional independence tests. We show on synthetic data that the structural accuracy of networks obtained with PriorPC is greatly improved compared to PC. Conclusion PriorPC improves the structural accuracy of inferred gene networks by using soft priors, which assign to each edge a probability of existence. It is robust to false priors, which are unavoidable in the context of biological data. PriorPC is also fast and scales well to large networks, which is important for its applicability to real data. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0233-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mahsa Ghanbari
- Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, Berlin, D-14195, Germany.
- Julia Lasserre
- Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, Berlin, D-14195, Germany.
- Martin Vingron
- Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, Berlin, D-14195, Germany.
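The PriorPC abstract names two ingredients: discarding edges with a low prior probability, and imposing an order on the conditional independence tests of the PC algorithm. The sketch below illustrates both on the skeleton (undirected) phase of PC, using Fisher's z-test on partial correlations. The threshold `min_prior`, the default prior of 0.5 for unlisted pairs, and the rule of testing the least likely edges first are illustrative assumptions; the paper's ordering and its use of soft priors may differ.

```python
import itertools
import math
from statistics import NormalDist

import numpy as np


def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given the columns in cond."""
    idx = [i, j, *cond]
    prec = np.linalg.pinv(np.cov(data[:, idx], rowvar=False))
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def independent(data, i, j, cond, alpha):
    """Fisher z-test; True if i and j look independent given cond."""
    n = data.shape[0]
    r = max(min(partial_corr(data, i, j, cond), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value > alpha


def prior_pc_skeleton(data, prior, alpha=0.05, min_prior=0.1):
    """Skeleton phase of the PC algorithm, guided by edge priors (sketch).

    prior: dict frozenset({i, j}) -> prior probability that the edge exists
    (hypothetical input format). Pairs below min_prior are excluded up front.
    """
    def p(edge):
        return prior.get(edge, 0.5)

    n_vars = data.shape[1]
    adj = {v: set() for v in range(n_vars)}
    for i, j in itertools.combinations(range(n_vars), 2):
        if p(frozenset((i, j))) >= min_prior:
            adj[i].add(j)
            adj[j].add(i)

    level = 0  # size of the conditioning sets tested in this round
    while any(len(nbrs) - 1 >= level for nbrs in adj.values()):
        # Assumed ordering: test the edges with the weakest prior first.
        edges = sorted({frozenset((i, j)) for i in adj for j in adj[i]},
                       key=lambda e: (p(e), sorted(e)))
        for e in edges:
            i, j = sorted(e)
            # Search conditioning sets drawn from either endpoint's neighbors.
            separated = any(
                independent(data, i, j, cond, alpha)
                for a, b in ((i, j), (j, i))
                for cond in itertools.combinations(sorted(adj[a] - {b}), level)
            )
            if separated:
                adj[i].discard(j)
                adj[j].discard(i)
        level += 1
    return {frozenset((i, j)) for i in adj for j in adj[i]}


# Simulated chain X0 -> X1 -> X2: X0 and X2 are independent given X1.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = 0.8 * x0 + rng.normal(scale=0.5, size=2000)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=2000)
data = np.column_stack([x0, x1, x2])

skeleton = prior_pc_skeleton(data, prior={}, alpha=0.001)
with_prior = prior_pc_skeleton(data, prior={frozenset((0, 2)): 0.01})
print(sorted(sorted(e) for e in skeleton))
```

Because adjacency sets shrink as edges are removed within a round, the order of the tests changes which conditioning sets are tried later, which is the order dependence the abstract refers to. The second call shows the other ingredient: a low prior removes the 0-2 edge before any test is run.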
97
Wang X, Alshawaqfeh M, Dang X, Wajid B, Noor A, Qaraqe M, Serpedin E. An Overview of NCA-Based Algorithms for Transcriptional Regulatory Network Inference. Microarrays (Basel) 2015; 4:596-617. [PMID: 27600242 PMCID: PMC4996402 DOI: 10.3390/microarrays4040596] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/01/2015] [Revised: 10/07/2015] [Accepted: 11/11/2015] [Indexed: 01/08/2023]
Abstract
In systems biology, the regulation of gene expression involves a complex network of regulators. Transcription factors (TFs) are an important component of this network: they are proteins that control which genes in the genome are turned on or off by binding to specific DNA sequences. Transcriptional regulatory networks (TRNs) describe gene expression as a function of regulatory inputs specified by interactions between proteins and DNA. A complete understanding of TRNs helps to predict a variety of biological processes and to diagnose, characterize and eventually develop more efficient therapies. Recent advances in high-throughput biological technologies, such as DNA microarrays and next-generation sequencing (NGS), have made it possible to infer transcription factor activities (TFAs) and TF-gene regulations. Network component analysis (NCA) is an efficient computational framework for TRN inference from microarray data, ChIP-on-chip data and prior information about TF-gene regulation. However, NCA suffers from several shortcomings, and several algorithms based on the NCA framework have recently been proposed to overcome them. This paper first reviews the computational principles behind NCA and then surveys the state-of-the-art NCA-based algorithms proposed in the literature for TRN reconstruction.
Affiliation(s)
- Xu Wang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Mustafa Alshawaqfeh
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Xuan Dang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Bilal Wajid
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Amina Noor
- Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA.
- Marwa Qaraqe
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Erchin Serpedin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
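NCA models a genes-by-samples log-expression matrix E as E ≈ AP, where A (genes by TFs) holds connectivity strengths whose zero pattern is fixed in advance from prior knowledge such as ChIP-on-chip data, and P (TFs by samples) holds the TF activities. The sketch below fits this model by alternating least squares, in the spirit of the original NCA decomposition; it assumes a small noise-free toy problem and does not check the identifiability conditions that NCA places on A. The function and variable names are hypothetical.

```python
import numpy as np


def nca(E, mask, n_iter=500, seed=0):
    """Fit E ~ A @ P with A constrained to the zero pattern of mask (sketch).

    E:    genes x samples log-expression matrix.
    mask: genes x TFs boolean matrix; True where the TF may regulate the gene.
    Returns A (genes x TFs) and P (TFs x samples).
    """
    rng = np.random.default_rng(seed)
    A = np.where(mask, rng.normal(size=mask.shape), 0.0)
    for _ in range(n_iter):
        # With A fixed, the TF activities P are an ordinary least-squares fit.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # With P fixed, refit each gene's row of A using only its allowed TFs,
        # so entries outside the mask stay exactly zero.
        for g in range(E.shape[0]):
            tfs = np.flatnonzero(mask[g])
            if tfs.size:
                A[g, tfs] = np.linalg.lstsq(P[tfs].T, E[g], rcond=None)[0]
    # A @ P is unchanged if A's columns and P's rows are rescaled inversely,
    # so fix the scale by giving each TF activity profile unit norm.
    scale = np.linalg.norm(P, axis=1)
    scale[scale == 0] = 1.0
    return A * scale, P / scale[:, None]


# Toy problem: 7 genes, 2 TFs, 6 samples; the mask plays the ChIP-chip role.
rng = np.random.default_rng(1)
mask = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]],
                dtype=bool)
A_true = np.where(mask, rng.uniform(0.5, 2.0, size=mask.shape), 0.0)
P_true = rng.normal(size=(2, 6))
E = A_true @ P_true

A_hat, P_hat = nca(E, mask)
residual = np.linalg.norm(E - A_hat @ P_hat) / np.linalg.norm(E)
print(f"relative residual: {residual:.2e}")
```

The mask is what distinguishes this from an unconstrained matrix factorization: fixing the zeros of A is what lets the recovered rows of P be interpreted as activities of specific TFs (up to scale), which plain factorizations such as PCA do not provide.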
98
Peterson CB, Stingo FC, Vannucci M. Joint Bayesian variable and graph selection for regression models with network-structured predictors. Stat Med 2015; 35:1017-31. [PMID: 26514925 DOI: 10.1002/sim.6792] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 12/01/2014] [Revised: 10/12/2015] [Accepted: 10/14/2015] [Indexed: 01/09/2023]
Abstract
In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival.
Affiliation(s)
- Christine B Peterson
- Department of Health Research and Policy, Stanford University, Stanford, CA, 94305, U.S.A
- Francesco C Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, U.S.A
- Marina Vannucci
- Department of Statistics, Rice University, Houston, TX, 77005, U.S.A
99
Nepomuceno JA, Troncoso A, Aguilar-Ruiz JS. Scatter search-based identification of local patterns with positive and negative correlations in gene expression data. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.06.019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023]
100
Agostinho NB, Machado KS, Werhli AV. Inference of regulatory networks with a convergence improved MCMC sampler. BMC Bioinformatics 2015; 16:306. [PMID: 26399857 PMCID: PMC4581096 DOI: 10.1186/s12859-015-0734-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/07/2015] [Accepted: 09/09/2015] [Indexed: 12/01/2022]
Abstract
Background One of the goals of the systems biology community is to have a detailed map of all biological interactions in an organism. One small yet important step in this direction is the creation of biological networks from post-genomic data. Bayesian networks are a very promising model for the inference of regulatory networks in systems biology. Usually, Bayesian networks are sampled with a Markov chain Monte Carlo (MCMC) sampler in the structure space. Unfortunately, conventional MCMC sampling schemes are often slow to mix and converge. To improve MCMC convergence, an alternative method is proposed, tested with different sets of data, and compared with the traditional MCMC sampling scheme. Results In the proposed method, Graphical Gaussian Models (GGMs), a simpler and faster method for the inference of regulatory networks, are integrated into Bayesian network inference through a hierarchical Bayesian model. In this manner, structural information obtained from the data with GGMs is taken into account in the MCMC scheme, improving mixing and convergence. The proposed method is tested with three types of data, two from simulated models and one from real data. The results are compared with those of the traditional MCMC sampling scheme in terms of network recovery accuracy and convergence. Compared with a traditional MCMC scheme, the proposed method shows improved convergence, leading to better network reconstruction with fewer MCMC iterations. Conclusions The proposed method is a viable alternative for improving the mixing and convergence of traditional MCMC schemes. It allows Bayesian networks to be used with an MCMC sampler that needs fewer iterations. The proposed method always converged earlier than the traditional MCMC scheme.
We observe an improvement in the accuracy of the recovered networks for the Gaussian simulated data, but this improvement is absent for both the real data and the data simulated from ODEs. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0734-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nilzair B Agostinho
- Centro de Ciências Computacionais - C3 Universidade Federal do Rio Grande - FURG, Campus Carreiros, Rio Grande, Brazil.
- Karina S Machado
- Centro de Ciências Computacionais - C3 Universidade Federal do Rio Grande - FURG, Campus Carreiros, Rio Grande, Brazil.
- Adriano V Werhli
- Centro de Ciências Computacionais - C3 Universidade Federal do Rio Grande - FURG, Campus Carreiros, Rio Grande, Brazil.
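The Agostinho et al. abstract uses GGMs as a fast source of structural information. A GGM links two genes when their partial correlation, read off the inverse covariance (precision) matrix Omega as rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), is far from zero. The sketch below computes this with a plain matrix inverse, which assumes more samples than genes (the shrinkage estimators usually needed for high-dimensional expression data are not shown); the `threshold` for calling an edge is a hypothetical parameter, and how the paper feeds GGM output into its hierarchical MCMC prior is not reproduced here.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix from a samples x genes data matrix."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def ggm_edges(data, threshold=0.1):
    """Undirected edges (i, j), i < j, with |partial correlation| > threshold."""
    pcor = partial_correlations(data)
    n_genes = pcor.shape[0]
    return {
        (i, j)
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
        if abs(pcor[i, j]) > threshold
    }


# Simulated chain X0 -> X1 -> X2: X0 and X2 are correlated, but their
# partial correlation given X1 is close to zero, so no 0-2 edge is expected.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = 0.8 * x0 + rng.normal(scale=0.5, size=2000)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=2000)
data = np.column_stack([x0, x1, x2])
edges = ggm_edges(data)
print(sorted(edges))
```

This is why a GGM is cheap compared with structure MCMC: a single matrix inversion yields a score for every gene pair at once, whereas a Bayesian network sampler must explore the space of graph structures.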