1
Karlebach G, Robinson PN. Computing Minimal Boolean Models of Gene Regulatory Networks. J Comput Biol 2024; 31:117-127. [PMID: 37889991 DOI: 10.1089/cmb.2023.0122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2023]
Abstract
Models of gene regulatory networks (GRNs) capture the dynamics of the regulatory processes that occur within the cell as a means to understanding the variability observed in gene expression between different conditions. Arguably the simplest mathematical construct used for modeling is the Boolean network, which dictates a set of logical rules for transition between states described as Boolean vectors. Due to the complexity of gene regulation and the limitations of experimental technologies, in most cases knowledge about regulatory interactions and Boolean states is partial. In addition, the logical rules themselves are not known a priori. Our goal in this work is to create an algorithm that finds the network that fits the data optimally, and identify the network states that correspond to the noise-free data. We present a novel methodology for integrating experimental data and performing a search for the optimal consistent structure via optimization of a linear objective function under a set of linear constraints. In addition, we extend our methodology into a heuristic that alleviates the computational complexity of the problem for datasets that are generated by single-cell RNA-Sequencing (scRNA-Seq). We demonstrate the effectiveness of these tools using simulated data, and in addition a publicly available scRNA-Seq dataset and the GRN that is associated with it. Our methodology will enable researchers to obtain a better understanding of the dynamics of GRNs and their biological role.
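As a rough illustration of the Boolean network construct described in this abstract, the sketch below applies a set of logical rules to a Boolean state vector. The three-gene rules, the names `RULES`, `step`, `trajectory`, and `mismatches`, and the choice of synchronous updates are assumptions made for the example; they are not taken from the paper, and the mismatch count is only one possible way to score how well a model state fits a measurement.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical three-gene network; the rules are illustrative only.
RULES: Dict[int, Callable[[State], int]] = {
    0: lambda s: s[2],               # gene 0 copies gene 2
    1: lambda s: s[0] & (1 - s[2]),  # gene 1 = gene 0 AND NOT gene 2
    2: lambda s: s[0] | s[1],        # gene 2 = gene 0 OR gene 1
}


def step(state: State) -> State:
    """Apply every rule to the current state at once (synchronous update)."""
    return tuple(RULES[i](state) for i in range(len(state)))


def trajectory(state: State, n_steps: int) -> List[State]:
    """Return the initial state followed by n_steps successive states."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


def mismatches(observed: State, model_state: State) -> int:
    """Count entries where a (possibly noisy) measurement disagrees with a model state."""
    return sum(o != m for o, m in zip(observed, model_state))


path = trajectory((1, 0, 0), 3)
```

In this toy network the state (1, 0, 1) maps to itself, so the trajectory settles into a fixed point after two updates.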
Affiliation(s)
- Guy Karlebach
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
2
Xavier JB, Monk JM, Poudel S, Norsigian CJ, Sastry AV, Liao C, Bento J, Suchard MA, Arrieta-Ortiz ML, Peterson EJ, Baliga NS, Stoeger T, Ruffin F, Richardson RA, Gao CA, Horvath TD, Haag AM, Wu Q, Savidge T, Yeaman MR. Mathematical models to study the biology of pathogens and the infectious diseases they cause. iScience 2022; 25:104079. [PMID: 35359802 PMCID: PMC8961237 DOI: 10.1016/j.isci.2022.104079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/18/2022]
Abstract
Mathematical models have many applications in infectious diseases: epidemiologists use them to forecast outbreaks and design containment strategies; systems biologists use them to study complex processes sustaining pathogens, from the metabolic networks empowering microbial cells to ecological networks in the microbiome that protects its host. Here, we (1) review important models relevant to infectious diseases and (2) draw parallels among models ranging widely in scale. We end by discussing a minimal set of information for a model to promote its use by others and to enable predictions that help us better fight pathogens and the diseases they cause.
Affiliation(s)
- Joao B. Xavier
  - Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Saugat Poudel
  - Department of Bioengineering, UC San Diego, San Diego, CA, USA
- Anand V. Sastry
  - Department of Bioengineering, UC San Diego, San Diego, CA, USA
- Chen Liao
  - Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Jose Bento
  - Computer Science Department, Boston College, Chestnut Hill, MA, USA
- Marc A. Suchard
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
- Thomas Stoeger
  - Department of Chemical and Biological Engineering; Northwestern University, Evanston, IL 60208, USA
  - Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Chicago, IL, USA
- Felicia Ruffin
  - Division of Infectious Diseases, Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Reese A.K. Richardson
  - Department of Chemical and Biological Engineering; Northwestern University, Evanston, IL 60208, USA
  - Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Chicago, IL, USA
- Catherine A. Gao
  - Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Chicago, IL, USA
  - Division of Pulmonary and Critical Care, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Thomas D. Horvath
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Pathology, Texas Children’s Microbiome Center, Texas Children’s Hospital, Houston, TX 77030, USA
- Anthony M. Haag
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Pathology, Texas Children’s Microbiome Center, Texas Children’s Hospital, Houston, TX 77030, USA
- Qinglong Wu
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Pathology, Texas Children’s Microbiome Center, Texas Children’s Hospital, Houston, TX 77030, USA
- Tor Savidge
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Pathology, Texas Children’s Microbiome Center, Texas Children’s Hospital, Houston, TX 77030, USA
- Michael R. Yeaman
  - David Geffen School of Medicine at UCLA & Lundquist Institute for Infection & Immunity at Harbor UCLA Medical Center, Los Angeles, CA, USA
3
A review of methods for the reconstruction and analysis of integrated genome-scale models of metabolism and regulation. Biochem Soc Trans 2020; 48:1889-1903. [DOI: 10.1042/bst20190840] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/08/2020] [Revised: 07/16/2020] [Accepted: 08/21/2020] [Indexed: 02/07/2023]
Abstract
The current survey aims to describe the main methodologies for extending the reconstruction and analysis of genome-scale metabolic models and phenotype simulation with Flux Balance Analysis mathematical frameworks, via the integration of Transcriptional Regulatory Networks and/or gene expression data. Although the surveyed methods are aimed at improving phenotype simulations obtained from these models, the perspective of reconstructing integrated genome-scale models of metabolism and gene expression for diverse prokaryotes is still an open challenge.
4
Condition-Specific Modeling of Biophysical Parameters Advances Inference of Regulatory Networks. Cell Rep 2019; 23:376-388. [PMID: 29641998 PMCID: PMC5987223 DOI: 10.1016/j.celrep.2018.03.048] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/25/2017] [Revised: 01/12/2018] [Accepted: 03/12/2018] [Indexed: 12/31/2022]
Abstract
Large-scale inference of eukaryotic transcription-regulatory networks remains challenging. One underlying reason is that existing algorithms typically ignore crucial regulatory mechanisms, such as RNA degradation and post-transcriptional processing. Here, we describe InfereCLaDR, which incorporates such elements and advances prediction in Saccharomyces cerevisiae. First, InfereCLaDR employs a high-quality Gold Standard dataset that we use separately as prior information and for model validation. Second, InfereCLaDR explicitly models transcription factor activity and RNA half-lives. Third, it introduces expression subspaces to derive condition-responsive regulatory networks for every gene. InfereCLaDR’s final network is validated by known data and trends and results in multiple insights. For example, it predicts long half-lives for transcripts of the nucleic acid metabolism genes and members of the cytosolic chaperonin complex as targets of the proteasome regulator Rpn4p. InfereCLaDR demonstrates that more biophysically realistic modeling of regulatory networks advances prediction accuracy both in eukaryotes and prokaryotes.
5
Barbosa S, Niebel B, Wolf S, Mauch K, Takors R. A guide to gene regulatory network inference for obtaining predictive solutions: Underlying assumptions and fundamental biological and data constraints. Biosystems 2018; 174:37-48. [PMID: 30312740 DOI: 10.1016/j.biosystems.2018.10.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/10/2018] [Revised: 10/05/2018] [Accepted: 10/08/2018] [Indexed: 02/07/2023]
Abstract
The study of biological systems at a system level has become a reality due to increasingly powerful computational approaches able to handle increasingly large datasets. Uncovering the dynamic nature of gene regulatory networks in order to attain a system-level understanding and improve the predictive power of biological models is an important research field in systems biology. The task itself presents several challenges, since the problem is combinatorial in nature and depends strongly on several biological constraints and on the intended application. Given the intrinsically interdisciplinary nature of gene regulatory network inference, we present a review of the currently available approaches, their challenges and limitations. We propose guidelines to select the most appropriate method considering the underlying assumptions and fundamental biological and data constraints.
Affiliation(s)
- Sara Barbosa
  - Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
- Bastian Niebel
  - Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
- Sebastian Wolf
  - Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
- Klaus Mauch
  - Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
- Ralf Takors
  - Institute of Biochemical Engineering, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
6
Yamada T, Akimitsu N. Contributions of regulated transcription and mRNA decay to the dynamics of gene expression. Wiley Interdiscip Rev RNA 2018; 10:e1508. [PMID: 30276972 DOI: 10.1002/wrna.1508] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 04/29/2018] [Revised: 08/06/2018] [Accepted: 08/27/2018] [Indexed: 12/21/2022]
Abstract
Organisms have acquired sophisticated regulatory networks that control gene expression in response to cellular perturbations. Understanding of the mechanisms underlying the coordinated changes in gene expression in response to external and internal stimuli is a fundamental issue in biology. Recent advances in high-throughput technologies have enabled the measurement of diverse biological information, including gene expression levels, kinetics of gene expression, and interactions among gene expression regulatory molecules. By coupling these technologies with quantitative modeling, we can now uncover the biological roles and mechanisms of gene regulation at the system level. This review consists of two parts. First, we focus on the methods using uridine analogs that measure synthesis and decay rates of RNAs, which demonstrate how cells dynamically change the regulation of gene expression in response to both internal and external cues. Second, we discuss the underlying mechanisms of these changes in kinetics, including the functions of transcription factors and RNA-binding proteins. Overall, this review will help to clarify a system-level view of gene expression programs in cells. This article is categorized under: Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs; RNA Turnover and Surveillance > Regulation of RNA Stability; RNA Methods > RNA Analyses in vitro and In Silico.
Affiliation(s)
- Toshimichi Yamada
  - Department of Molecular and Cellular Biochemistry, Meiji Pharmaceutical University, Tokyo, Japan
7
van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. Gene co-expression analysis for functional classification and gene-disease predictions. Brief Bioinform 2018; 19:575-592. [PMID: 28077403 PMCID: PMC6054162 DOI: 10.1093/bib/bbw139] [Citation(s) in RCA: 431] [Impact Index Per Article: 71.8] [Received: 09/12/2016] [Revised: 12/01/2016] [Indexed: 01/06/2023]
Abstract
Gene co-expression networks can be used to associate genes of unknown function with biological processes, to prioritize candidate disease genes or to discern transcriptional regulatory programmes. With recent advances in transcriptomics and next-generation sequencing, co-expression networks constructed from RNA sequencing data also enable the inference of functions and disease associations for non-coding genes and splice variants. Although gene co-expression networks typically do not provide information about causality, emerging methods for differential co-expression analysis are enabling the identification of regulatory genes underlying various phenotypes. Here, we introduce and guide researchers through a (differential) co-expression analysis. We provide an overview of methods and tools used to create and analyse co-expression networks constructed from gene expression data, and we explain how these can be used to identify genes with a regulatory role in disease. Furthermore, we discuss the integration of other data types with co-expression networks and offer future perspectives of co-expression analysis.
Affiliation(s)
- Sipko van Dam
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Urmo Võsa
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Lude Franke
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
8
Leifeld T, Zhang Z, Zhang P. Identification of Boolean Network Models From Time Series Data Incorporating Prior Knowledge. Front Physiol 2018; 9:695. [PMID: 29937735 PMCID: PMC6002699 DOI: 10.3389/fphys.2018.00695] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/01/2018] [Accepted: 05/18/2018] [Indexed: 01/24/2023]
Abstract
Motivation: Mathematical models take an important place in science and engineering. A model can help scientists to explain the dynamic behavior of a system and to understand the functionality of system components. Since the length of a time series and the number of replicates are limited by the cost of experiments, Boolean networks, as a structurally simple and parameter-free logical model for gene regulatory networks, have attracted the interest of many scientists. In order to fit into the biological context and to lower the data requirements, biological prior knowledge is taken into consideration during the inference procedure. In the literature, the existing identification approaches can only deal with a subset of the possible types of prior knowledge.
Results: We propose a new approach to identify Boolean networks from time series data incorporating prior knowledge, such as partial network structure, canalizing property, and positive and negative unateness. Using the vector form of Boolean variables and applying a generalized matrix multiplication called the semi-tensor product (STP), each Boolean function can be equivalently converted into a matrix expression. Based on this, the identification problem is reformulated as an integer linear programming problem to reveal, in a computationally efficient way, the system matrix of a Boolean model whose dynamics are consistent with the important dynamics captured in the data. By using prior knowledge, the number of candidate functions can be reduced during the inference. Hence, identification incorporating prior knowledge is especially suitable for small time series datasets and for data without sufficient stimuli. The proposed approach is illustrated with the help of a biological model of the network of oxidative stress response.
Conclusions: The combination of an efficient reformulation of the identification problem with the possibility to incorporate various types of prior knowledge enables the application of computational model inference to systems with a limited amount of time series data. The general applicability of this methodological approach makes it suitable for a variety of biological systems and of general interest for biological and medical research.
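To make the role of prior knowledge concrete, the following sketch enumerates all Boolean functions of two regulators, keeps those consistent with a handful of observed transitions, and then filters by positive unateness. It is a brute-force stand-in, not the semi-tensor product and integer linear programming formulation the abstract describes; the data in `transitions` and all function names are hypothetical.

```python
from itertools import product


def candidate_functions(n_inputs):
    """Yield every Boolean function of n_inputs as a dict: input tuple -> output bit."""
    inputs = list(product((0, 1), repeat=n_inputs))
    for outputs in product((0, 1), repeat=len(inputs)):
        yield dict(zip(inputs, outputs))


def consistent(f, transitions):
    """True if f reproduces every observed (regulator state, next target value) pair."""
    return all(f[x] == y for x, y in transitions)


def positive_unate_in(f, i):
    """True if raising input i from 0 to 1 never lowers the output."""
    for x in f:
        if x[i] == 0:
            raised = x[:i] + (1,) + x[i + 1:]
            if f[x] > f[raised]:
                return False
    return True


# Hypothetical data: (state of two regulators at time t, target value at time t+1).
transitions = [((0, 0), 0), ((1, 0), 1)]

fits = [f for f in candidate_functions(2) if consistent(f, transitions)]
fits_with_prior = [
    f for f in fits if all(positive_unate_in(f, i) for i in range(2))
]
```

With these two observations, four functions fit the data, and the unateness prior narrows them to two (the first regulator alone, or the OR of both), which illustrates how prior knowledge shrinks the candidate set when the time series is short.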
Affiliation(s)
- Ping Zhang
  - Institute of Automatic Control, Technische Universität Kaiserslautern, Kaiserslautern, Germany
9
He B, Tan K. Understanding transcriptional regulatory networks using computational models. Curr Opin Genet Dev 2016; 37:101-108. [PMID: 26950762 DOI: 10.1016/j.gde.2016.02.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 09/15/2015] [Revised: 01/29/2016] [Accepted: 02/08/2016] [Indexed: 01/06/2023]
Abstract
Transcriptional regulatory networks (TRNs) encode instructions for animal development and physiological responses. Recent advances in genomic technologies and computational modeling have revolutionized our ability to construct models of TRNs. Here, we survey current computational methods for inferring TRN models using genome-scale data. We discuss their advantages and limitations. We summarize representative TRNs constructed using genome-scale data in both normal and disease development. We discuss lessons learned about the structure/function relationship of TRNs, based on examining various large-scale TRN models. Finally, we outline some open questions regarding TRNs, including how to improve model accuracy by integrating complementary data types, how to infer condition-specific TRNs, and how to compare TRNs across conditions and species in order to understand their structure/function relationship.
Affiliation(s)
- Bing He
  - Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA
- Kai Tan
  - Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA; Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA
10
Modelling the immune system response to epithelial wound infections. J Theor Biol 2016; 393:158-69. [DOI: 10.1016/j.jtbi.2015.12.030] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/08/2014] [Revised: 10/01/2015] [Accepted: 12/25/2015] [Indexed: 12/15/2022]
11
Picchetti T, Chiquet J, Elati M, Neuvial P, Nicolle R, Birmelé E. A model for gene deregulation detection using expression data. BMC Syst Biol 2015; 9 Suppl 6:S6. [PMID: 26679516 PMCID: PMC4674863 DOI: 10.1186/1752-0509-9-s6-s6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]
Abstract
In tumoral cells, gene regulation mechanisms are severely altered. Genes that do not react normally to their regulators' activity can provide explanations for the tumoral behavior, and be characteristic of cancer subtypes. We thus propose a statistical methodology to identify the misregulated genes given a reference network and gene expression data. Our model is based on a regulatory process in which all genes are allowed to be deregulated. We derive an EM algorithm where the hidden variables correspond to the status (under/over/normally expressed) of the genes and where the E-step is solved thanks to a message passing algorithm. Our procedure provides posterior probabilities of deregulation in a given sample for each gene. We assess the performance of our method by numerical experiments on simulations and on a bladder cancer data set.
12
Liu ZP. Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data. Curr Genomics 2015; 16:3-22. [PMID: 25937810 PMCID: PMC4412962 DOI: 10.2174/1389202915666141110210634] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 03/07/2014] [Revised: 09/05/2014] [Accepted: 09/05/2014] [Indexed: 12/17/2022]
Abstract
Transcriptional regulation plays vital roles in many fundamental biological processes. Reverse engineering of genome-wide regulatory networks from high-throughput transcriptomic data provides a promising way to characterize the global scenario of regulatory relationships between regulators and their targets. In this review, we summarize and categorize the main frameworks and methods currently available for inferring transcriptional regulatory networks from microarray gene expression profiling data. We give an overview of each strategy and introduce representative methods. Their assumptions, advantages, shortcomings, and possible improvements and extensions are also discussed.
Affiliation(s)
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
13
Xu H, Ang YS, Sevilla A, Lemischka IR, Ma'ayan A. Construction and validation of a regulatory network for pluripotency and self-renewal of mouse embryonic stem cells. PLoS Comput Biol 2014; 10:e1003777. [PMID: 25122140 PMCID: PMC4133156 DOI: 10.1371/journal.pcbi.1003777] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Received: 04/07/2014] [Accepted: 06/27/2014] [Indexed: 11/22/2022]
Abstract
A 30-node signed and directed network responsible for self-renewal and pluripotency of mouse embryonic stem cells (mESCs) was extracted from several prior ChIP-Seq and knockdown-followed-by-expression studies. The underlying regulatory logic among network components was then learned using the initial network topology and single cell gene expression measurements from mESCs cultured in serum/LIF or serum-free 2i/LIF conditions. Comparing the learned network regulatory logic derived from cells cultured in serum/LIF vs. 2i/LIF revealed differential roles for Nanog, Oct4/Pou5f1, Sox2, Esrrb and Tcf3. Overall, gene expression in the serum/LIF condition was more variable than in the 2i/LIF but mostly consistent across the two conditions. Expression levels for most genes in single cells were bimodal across the entire population and this motivated a Boolean modeling approach. In silico predictions derived from removal of nodes from the Boolean dynamical model were validated with experimental single and combinatorial RNA interference (RNAi) knockdowns of selected network components. Quantitative post-RNAi expression level measurements of remaining network components showed good agreement with the in silico predictions. Computational removal of nodes from the Boolean network model was also used to predict lineage specification outcomes. In summary, data integration, modeling, and targeted experiments were used to improve our understanding of the regulatory topology that controls mESC fate decisions as well as to develop robust directed lineage specification protocols.

For this study we first constructed, from data available in the public domain, a directed and signed network consisting of 15 pluripotency regulators and 15 lineage commitment markers that extensively interact to regulate mouse embryonic stem cell fate decisions. Given the connectivity structure of this network, the underlying regulatory logic was learned using single cell gene expression measurements of mESCs cultured in two different conditions. With connectivity and logic learned, the network was then simulated using a dynamic Boolean logic framework. Such simulations enabled prediction of knockdown effects on the overall activity of the network. Such predictions were validated by single and combinatorial RNA interference experiments followed by expression measurements. Finally, lineage specification outcomes upon single and combinatorial gene knockdowns were predicted for all possible knockdown combinations.
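The in silico knockdown idea described here can be sketched as a Boolean simulation in which the removed node is held at 0 while the remaining nodes are updated. The three-node network below (two generic pluripotency factors `P1` and `P2` and one lineage marker `L`), its rules, and the function `simulate` are hypothetical and chosen only for illustration; they are not the 30-node network or the learned logic from the study, and synchronous updating is an assumption.

```python
def simulate(rules, state, knocked_down=(), max_steps=50):
    """Update all genes synchronously until a state repeats.

    Genes listed in knocked_down are held at 0 throughout, which mimics
    removing the node from the model.
    """
    state = {g: (0 if g in knocked_down else v) for g, v in state.items()}
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        nxt = {g: int(rule(state)) for g, rule in rules.items()}
        for g in knocked_down:
            nxt[g] = 0
        state = nxt
    return state


# Hypothetical logic: P1 and P2 sustain each other and jointly repress L.
rules = {
    "P1": lambda s: s["P2"],
    "P2": lambda s: s["P1"],
    "L": lambda s: not (s["P1"] and s["P2"]),
}
start = {"P1": 1, "P2": 1, "L": 0}

wild_type = simulate(rules, start)
p1_knockdown = simulate(rules, start, knocked_down=("P1",))
```

In this toy model the unperturbed network stays in the state with both factors on and the lineage marker off, whereas holding `P1` at 0 switches `P2` off and the lineage marker on. Comparing such predicted end states with post-knockdown measurements is the kind of validation the abstract describes.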
Affiliation(s)
- Huilei Xu
  - Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - Department of Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Yen-Sin Ang
  - Department of Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Ana Sevilla
  - Department of Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Ihor R. Lemischka
  - Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - Department of Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - * E-mail: (IRL); (AM)
- Avi Ma'ayan
  - Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - * E-mail: (IRL); (AM)
14
Harnessing diversity towards the reconstructing of large scale gene regulatory networks. PLoS Comput Biol 2013; 9:e1003361. [PMID: 24278007 PMCID: PMC3836705 DOI: 10.1371/journal.pcbi.1003361] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 06/20/2013] [Accepted: 10/10/2013] [Indexed: 12/17/2022]
Abstract
Elucidating gene regulatory networks (GRNs) from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provides significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression data associated with a known regulatory network is similar to that associated with an unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks.

Elucidating gene regulatory networks is crucial to understand disease mechanisms at the system level. A large number of algorithms have been developed to infer gene regulatory networks from gene-expression datasets. If you remember the success of IBM's Watson in the “Jeopardy!” quiz show, the critical features of Watson were the use of very large numbers of heterogeneous algorithms generating various hypotheses and the selection of one of them as the answer. We took a similar approach, “TopkNet”, to see if a “Wisdom of Crowd” approach can be applied to network reconstruction. We discovered that “Wisdom of Crowd” is a powerful approach where integration of optimal algorithms for a given dataset can achieve better results than the best individual algorithm. However, such an analysis raises the question “How to choose optimal algorithms for a given dataset?” We found that similarity among gene-expression datasets is a key to selecting optimal algorithms, i.e., if dataset A, for which optimal algorithms are known, is similar to dataset B, the optimal algorithms for dataset A may also be optimal for dataset B. Thus, our “TopkNet” together with a similarity measure among datasets can provide a powerful strategy towards harnessing “Wisdom of Crowd” in high-quality reconstruction of gene regulatory networks.
15
Karlebach G. Inferring Boolean network states from partial information. EURASIP J Bioinform Syst Biol 2013; 2013:11. [PMID: 24006954 PMCID: PMC3850440 DOI: 10.1186/1687-4153-2013-11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/30/2013] [Accepted: 08/26/2013] [Indexed: 01/14/2023]
Abstract
Networks of molecular interactions regulate key processes in living cells. Therefore, understanding their functionality is a high priority in advancing biological knowledge. Boolean networks are often used to describe cellular networks mathematically and are fitted to experimental datasets. The fitting often results in ambiguities since the interpretation of the measurements is not straightforward and since the data contain noise. In order to facilitate a more reliable mapping between datasets and Boolean networks, we develop an algorithm that infers network trajectories from a dataset distorted by noise. We analyze our algorithm theoretically and demonstrate its accuracy using simulation and microarray expression data.
Affiliation(s)
- Guy Karlebach
  - German Cancer Research Institute (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69121, Germany
16
Abstract
Since the first emergence of protein-protein interaction networks more than a decade ago, they have been viewed as static scaffolds of the signaling-regulatory events taking place in cells, and their analysis has been mainly confined to topological aspects. Recently, functional models of these networks have been suggested, ranging from Boolean to constraint-based methods. However, learning such models from large-scale data remains a formidable task, and most modeling approaches rely on extensive human curation. Here we provide a generic approach to learning Boolean models automatically from data. We apply our approach to growth and inflammatory signaling systems in humans and show how the learning phase can improve the fit of the model to experimental data, remove spurious interactions, and lead to better understanding of the system at hand.
Affiliation(s)
- Roded Sharan
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
17
Abstract
Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize performance, data requirements, and inherent biases of different inference approaches offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Thereby, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally test 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
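One simple way to integrate predictions from several inference methods, in the spirit of the community approach summarized above, is to average the rank each method assigns to every candidate interaction. The sketch below assumes rank averaging as the aggregation rule and uses made-up edge lists from three hypothetical methods; the function name and the handling of edges a method does not report are illustrative choices.

```python
from collections import defaultdict


def average_rank_consensus(rankings):
    """Combine ranked edge lists into one list ordered by mean rank.

    An edge that a method does not report is given that method's worst
    rank plus one. Ties are broken by the edge tuple itself.
    """
    edges = {edge for ranked in rankings for edge in ranked}
    totals = defaultdict(float)
    for ranked in rankings:
        position = {edge: i + 1 for i, edge in enumerate(ranked)}
        for edge in edges:
            totals[edge] += position.get(edge, len(ranked) + 1)
    return sorted(edges, key=lambda e: (totals[e] / len(rankings), e))


# Hypothetical predictions (regulator, target), most confident first.
method_1 = [("A", "B"), ("B", "C"), ("A", "C")]
method_2 = [("B", "C"), ("A", "B"), ("C", "D")]
method_3 = [("A", "B"), ("C", "D"), ("B", "C")]

consensus = average_rank_consensus([method_1, method_2, method_3])
```

Edges that several methods rank highly rise to the top of the consensus list, while an edge reported by a single method falls to the bottom.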