51
Pinello L, Lo Bosco G, Hanlon B, Yuan GC. A motif-independent metric for DNA sequence specificity. BMC Bioinformatics 2011; 12:408. [PMID: 22017798 PMCID: PMC3267244 DOI: 10.1186/1471-2105-12-408] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 07/26/2011] [Accepted: 10/21/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genome-wide mapping of protein-DNA interactions has been widely used to investigate biological functions of the genome. An important question is to what extent such interactions are regulated at the DNA sequence level. However, current investigation is hampered by the lack of computational methods for systematically evaluating sequence specificity. RESULTS We present a simple, unbiased quantitative measure for DNA sequence specificity called the Motif Independent Measure (MIM). By analyzing both simulated and real experimental data, we found that the MIM can be used to detect sequence specificity independent of the presence of transcription factor (TF) binding motifs. We also found that the level of specificity associated with H3K4me1 target sequences is highly cell-type specific and highest in embryonic stem (ES) cells. We predicted H3K4me1 target sequences by using the N-score model and found that the prediction accuracy is indeed high in ES cells. The software to compute the MIM is freely available at: https://github.com/lucapinello/mim. CONCLUSIONS Our method provides a unified framework for quantifying DNA sequence specificity and serves as a guide for development of sequence-based prediction models.
Affiliation(s)
- Luca Pinello
- Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
52
Abstract
In a network orientation problem, one is given a mixed graph, consisting of directed and undirected edges, and a set of source-target vertex pairs. The goal is to orient the undirected edges so that a maximum number of pairs admit a directed path from the source to the target. This NP-complete problem arises in the context of analyzing physical networks of protein-protein and protein-DNA interactions. While the latter are naturally directed from a transcription factor to a gene, the direction of signal flow in protein-protein interactions is often unknown or cannot be measured en masse. One then tries to infer this information by using causality data on pairs of genes such that the perturbation of one gene changes the expression level of the other gene. Here we provide a first polynomial-size ILP formulation for this problem, which can be efficiently solved on current networks. We apply our algorithm to orient protein-protein interactions in yeast and measure our performance using edges with known orientations. We find that our algorithm achieves high accuracy and coverage in the orientation, outperforming simplified algorithmic variants that do not use information on edge directions. The obtained orientations can lead to a better understanding of the structure and function of the network.
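To make the orientation objective concrete, here is a minimal brute-force sketch. It is not the ILP formulation described in the abstract: it simply enumerates every orientation of a toy mixed graph, which is feasible only for a handful of undirected edges. The function names and the toy nodes are hypothetical and chosen for illustration.

```python
from itertools import product


def reachable(adj, source, target):
    """Return True if target can be reached from source along directed edges."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def best_orientation(directed, undirected, pairs):
    """Try all 2^k orientations of the undirected edges (exponential time).

    Returns the maximum number of (source, target) pairs connected by a
    directed path, together with the first orientation achieving it.
    """
    best_score, best_choice = -1, None
    for flips in product((False, True), repeat=len(undirected)):
        oriented = [(v, u) if flip else (u, v)
                    for (u, v), flip in zip(undirected, flips)]
        adj = {}
        for u, v in list(directed) + oriented:
            adj.setdefault(u, []).append(v)
        score = sum(reachable(adj, s, t) for s, t in pairs)
        if score > best_score:
            best_score, best_choice = score, oriented
    return best_score, best_choice


# Toy instance: one directed protein-DNA edge, two undirected protein-protein edges.
directed = [("TF", "gene")]
undirected = [("A", "B"), ("B", "TF")]
pairs = [("A", "gene"), ("B", "gene")]
score, orientation = best_orientation(directed, undirected, pairs)
print(score, orientation)
```

In the toy instance both pairs can be satisfied at once. With a single undirected edge and the two opposing pairs (A, B) and (B, A), at most one pair can be satisfied, which is the kind of conflict that makes the general problem hard.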
Affiliation(s)
- Dana Silverbush
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
53
Cloots L, Marchal K. Network-based functional modeling of genomics, transcriptomics and metabolism in bacteria. Curr Opin Microbiol 2011; 14:599-607. [DOI: 10.1016/j.mib.2011.09.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 06/17/2011] [Revised: 08/28/2011] [Accepted: 09/05/2011] [Indexed: 01/10/2023]
54
Dorn B, Hüffner F, Krüger D, Niedermeier R, Uhlmann J. Exploiting bounded signal flow for graph orientation based on cause-effect pairs. Algorithms Mol Biol 2011; 6:21. [PMID: 21867496 PMCID: PMC3189099 DOI: 10.1186/1748-7188-6-21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/22/2011] [Accepted: 08/25/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND We consider the following problem: Given an undirected network and a set of sender-receiver pairs, direct all edges such that the maximum number of "signal flows" defined by the pairs can be routed respecting edge directions. This problem has applications in understanding cell regulation mechanisms based on protein interactions. Since this problem is NP-hard, research so far has concentrated on polynomial-time approximation algorithms and tractable special cases. RESULTS We take the viewpoint of parameterized algorithmics and examine several parameters related to the maximum signal flow over vertices or edges. We provide several fixed-parameter tractability results, and in one case a sharp complexity dichotomy between a linear-time solvable case and a slightly more general NP-hard case. We examine the value of these parameters for several real-world network instances. CONCLUSIONS Several biologically relevant special cases of the NP-hard problem can be solved to optimality. In this way, parameterized analysis yields both deeper insight into the computational complexity and practical solving strategies.
Affiliation(s)
- Britta Dorn
- Fakultät für Mathematik und Wirtschaftswissenschaften, Universität Ulm, Ulm, Germany
- Falk Hüffner
- Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Berlin, Germany
- Dominikus Krüger
- Institut für Theoretische Informatik, Universität Ulm, Ulm, Germany
- Rolf Niedermeier
- Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Berlin, Germany
- Johannes Uhlmann
- Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Berlin, Germany
55
Zhang B, Shi Z, Duncan DT, Prodduturi N, Marnett LJ, Liebler DC. Relating protein adduction to gene expression changes: a systems approach. Mol Biosyst 2011; 7:2118-27. [PMID: 21594272 DOI: 10.1039/c1mb05014a] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/21/2022]
Abstract
Modification of proteins by reactive electrophiles such as 4-hydroxy-2-nonenal (HNE) plays a critical role in oxidant-associated human diseases. However, little is known about protein adduction and the mechanism by which protein damage elicits adaptive effects and toxicity. We developed a systems approach for relating protein adduction to gene expression changes through the integration of protein adduction, gene expression, protein-DNA interaction, and protein-protein interaction data. Using a random walk strategy, we expanded a list of responsive transcription factors inferred from gene expression studies to upstream signaling networks, which in turn allowed overlaying protein adduction data on the network for the prediction of stress sensors and their associated regulatory mechanisms. We demonstrated the general applicability of transcription factor-based signaling network inference using 103 known pathways. Applying our workflow to gene expression and protein adduction data from HNE treatment not only rediscovered known mechanisms of electrophile stress but also generated novel hypotheses regarding protein damage sensors. Although developed for analyzing protein adduction data, the framework can be easily adapted for phosphoproteomics and other types of protein modification data.
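The abstract does not specify which random walk variant is used. As an illustration of how a seed list of transcription factors can be expanded to nearby network nodes, the following minimal sketch implements a generic random walk with restart on an undirected graph. The restart probability, the iteration count, and the toy node names are assumptions made for this example, not values from the paper.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, iterations=100):
    """Score nodes by proximity to the seed set via power iteration.

    edges: iterable of (u, v) pairs, treated as undirected.
    seeds: set of nodes where the walk starts and restarts; every seed
    is assumed to appear in at least one edge.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    # Restart distribution: uniform over the seed nodes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            # Spread the remaining mass evenly over the neighbors of n.
            share = (1.0 - restart) * prob[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        prob = nxt
    return prob


# Toy network: two seed transcription factors share an upstream kinase.
edges = [("TF1", "KinaseA"), ("TF2", "KinaseA"),
         ("KinaseA", "Receptor"), ("Receptor", "Other")]
scores = random_walk_with_restart(edges, seeds={"TF1", "TF2"})
for node, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {value:.3f}")
```

Nodes that sit close to many seeds, such as the shared kinase here, receive higher scores than distant nodes, which is what makes this kind of walk useful for proposing upstream signaling candidates.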
Affiliation(s)
- Bing Zhang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA.
56
Lan A, Smoly IY, Rapaport G, Lindquist S, Fraenkel E, Yeger-Lotem E. ResponseNet: revealing signaling and regulatory networks linking genetic and transcriptomic screening data. Nucleic Acids Res 2011; 39:W424-9. [PMID: 21576238 PMCID: PMC3125767 DOI: 10.1093/nar/gkr359] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Indexed: 12/23/2022]
Abstract
Cellular response to stimuli is typically complex and involves both regulatory and metabolic processes. Large-scale experimental efforts to identify components of these processes often comprise genetic screening and transcriptomic profiling assays. We previously established that, in yeast, genetic screens tend to identify response regulators, while transcriptomic profiling assays tend to identify components of metabolic processes. ResponseNet is a network-optimization approach that integrates the results from these assays with data on known molecular interactions. Specifically, ResponseNet identifies a high-probability sub-network, composed of signaling and regulatory molecular interaction paths, through which putative response regulators may lead to the measured transcriptomic changes. Computationally, this is achieved by formulating a minimum-cost flow optimization problem and solving it efficiently using linear programming tools. The ResponseNet web server offers a simple interface for applying ResponseNet. Users can upload weighted lists of proteins and genes and obtain a sparse, weighted, molecular interaction sub-network connecting their data. The predicted sub-network and its gene ontology enrichment analysis are presented graphically or as text. Consequently, the ResponseNet web server enables researchers who were previously limited to separate analyses of their distinct, large-scale experiments to meaningfully integrate their data and substantially expand their understanding of the underlying cellular response. ResponseNet is available at http://bioinfo.bgu.ac.il/respnet.
Affiliation(s)
- Alex Lan
- Department of Computer Science, Department of Software Engineering, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, Department of Biology, Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and Department of Clinical Biochemistry and National Center for Biotechnology in the Negev, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel
- Ilan Y. Smoly
- Department of Computer Science, Department of Software Engineering, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, Department of Biology, Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and Department of Clinical Biochemistry and National Center for Biotechnology in the Negev, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel
- Guy Rapaport
- Department of Computer Science, Department of Software Engineering, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, Department of Biology, Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and Department of Clinical Biochemistry and National Center for Biotechnology in the Negev, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel
- Susan Lindquist
- Department of Computer Science, Department of Software Engineering, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, Department of Biology, Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and Department of Clinical Biochemistry and National Center for Biotechnology in the Negev, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel
- Ernest Fraenkel
- Department of Computer Science, Department of Software Engineering, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, Department of Biology, Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and Department of Clinical Biochemistry and National Center for Biotechnology in the Negev, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel
- Esti Yeger-Lotem
- Department of Computer Science, Department of Software Engineering, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel, Whitehead Institute for Biomedical Research, Cambridge, MA 02142, Department of Biology, Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and Department of Clinical Biochemistry and National Center for Biotechnology in the Negev, Ben-Gurion University of The Negev, Beer-Sheva 84105, Israel
- *To whom correspondence should be addressed. Tel/Fax: +972 8 6428675;
57
Xu H, Schaniel C, Lemischka IR, Ma'ayan A. Toward a complete in silico, multi-layered embryonic stem cell regulatory network. Wiley Interdiscip Rev Syst Biol Med 2011; 2:708-33. [PMID: 20890967 DOI: 10.1002/wsbm.93] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 12/18/2022]
Abstract
Recent efforts in systematically profiling embryonic stem (ES) cells have yielded a wealth of high-throughput data. Complementarily, emerging databases and computational tools facilitate ES cell studies and further pave the way toward the in silico reconstruction of regulatory networks encompassing multiple molecular layers. Here, we briefly survey databases, algorithms, and software tools used to organize and analyze high-throughput experimental data collected to study mammalian cellular systems with a focus on ES cells. The vision of using heterogeneous data to reconstruct a complete multi-layered ES cell regulatory network is discussed. This review also provides an accompanying manually extracted dataset of different types of regulatory interactions from low-throughput experimental ES cell studies available at http://amp.pharm.mssm.edu/iscmid/literature.
Affiliation(s)
- Huilei Xu
- Department of Gene and Cell Medicine and The Black Family Stem Cell Institute, Mount Sinai School of Medicine, New York, NY 10029, USA
58
Silverbush D, Elberfeld M, Sharan R. Optimally Orienting Physical Networks. Lecture Notes in Computer Science 2011. [DOI: 10.1007/978-3-642-20036-6_39] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022]
59
60
Liu Q, Tan Y, Huang T, Ding G, Tu Z, Liu L, Li Y, Dai H, Xie L. TF-centered downstream gene set enrichment analysis: Inference of causal regulators by integrating TF-DNA interactions and protein post-translational modifications information. BMC Bioinformatics 2010; 11 Suppl 11:S5. [PMID: 21172055 PMCID: PMC3024863 DOI: 10.1186/1471-2105-11-s11-s5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
BACKGROUND Inference of causal regulators responsible for gene expression changes under different conditions is of great importance but remains rather challenging. To date, most approaches use direct binding targets of transcription factors (TFs) to associate TFs with expression profiles. However, the low overlap between binding targets of a TF and the affected genes of the TF knockout limits the power of those methods. RESULTS We developed a TF-centered downstream gene set enrichment analysis approach to identify potential causal regulators responsible for expression changes. We constructed hierarchical and multi-layer regulation models to derive possible downstream gene sets of a TF using not only TF-DNA interactions, but also, for the first time, post-translational modification (PTM) information. We verified our method in one expression dataset of large-scale TF knockout and another dataset involving both TF knockout and TF overexpression. Compared with the flat model using TF-DNA interactions alone, our method correctly identified five more actually perturbed TFs in large-scale TF knockout data and six more perturbed TFs in overexpression data. Potential regulatory pathways downstream of three perturbed regulators (SNF1, AFT1 and SUT1) were given to demonstrate the power of multilayer regulation models integrating TF-DNA interactions and PTM information. Additionally, our method successfully identified known important TFs and inferred some novel potential TFs involved in the transition from fermentative to glycerol-based respiratory growth and in the pheromone response. Downstream regulation pathways of SUT1 and AFT1 were also supported by the mRNA and/or phosphorylation changes of their mediating TFs and/or “modulator” proteins. CONCLUSIONS The results suggest that in addition to direct transcription, indirect transcription and post-translational regulation are also responsible for the effects of TF perturbation, especially for TF overexpression. Many TFs inferred by our method are supported by the literature. Multiple TF regulation models could lead to new hypotheses for future experiments. Our method provides a valuable framework for analyzing gene expression data to identify causal regulators in the context of TF-DNA interactions and PTM information.
Affiliation(s)
- Qi Liu
- School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
61
Gitter A, Klein-Seetharaman J, Gupta A, Bar-Joseph Z. Discovering pathways by orienting edges in protein interaction networks. Nucleic Acids Res 2010; 39:e22. [PMID: 21109539 PMCID: PMC3045580 DOI: 10.1093/nar/gkq1207] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.4] [Indexed: 11/18/2022]
Abstract
Modern experimental technology enables the identification of the sensory proteins that interact with the cells’ environment or various pathogens. Expression and knockdown studies can determine the downstream effects of these interactions. However, when attempting to reconstruct the signaling networks and pathways between these sources and targets, one faces a substantial challenge. Although pathways are directed, high-throughput protein interaction data are undirected. In order to utilize the available data, we need methods that can orient protein interaction edges and discover high-confidence pathways that explain the observed experimental outcomes. We formalize the orientation problem in weighted protein interaction graphs as an optimization problem and present three approximation algorithms based on either weighted Boolean satisfiability solvers or probabilistic assignments. We use these algorithms to identify pathways in yeast. Our approach recovers twice as many known signaling cascades as a recent unoriented signaling pathway prediction technique and over 13 times as many as an existing network orientation algorithm. The discovered paths match several known signaling pathways and suggest new mechanisms that are not currently present in signaling databases. For some pathways, including the pheromone signaling pathway and the high-osmolarity glycerol pathway, our method suggests interesting and novel components that extend current annotations.
Affiliation(s)
- Anthony Gitter
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
62
Jaimovich A, Friedman N. From large-scale assays to mechanistic insights: computational analysis of interactions. Curr Opin Biotechnol 2010; 22:87-93. [PMID: 21109421 DOI: 10.1016/j.copbio.2010.10.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/26/2010] [Accepted: 10/27/2010] [Indexed: 01/17/2023]
Abstract
Activity in the living cell is carried out by a vast network of interactions between macromolecules. These include interactions between proteins that form a functional complex, a protein modifying another protein in a transient interaction, a transcription factor that binds a specific DNA locus triggering a change in chromatin or transcription, and so on. Characterization of these interactions in terms of timing, context, and function is crucial for understanding how cells carry out basic biological processes. Recent years have seen the introduction of many assays for probing these interactions in a systematic and large-scale manner. However, there is a large gap between assay results and understanding of biological systems. The challenge for computational methods is to bridge this gap by combining results of different assays and introducing statistical methodologies. In this review we discuss recent advances in approaches dealing with these challenges, and key directions for the future.
Affiliation(s)
- Ariel Jaimovich
- School of Computer Science & Engineering, Hebrew University of Jerusalem, Jerusalem, Israel
63
The carbon assimilation network in Escherichia coli is densely connected and largely sign-determined by directions of metabolic fluxes. PLoS Comput Biol 2010; 6:e1000812. [PMID: 20548959 PMCID: PMC2883603 DOI: 10.1371/journal.pcbi.1000812] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 12/08/2009] [Accepted: 05/07/2010] [Indexed: 11/30/2022]
Abstract
Gene regulatory networks consist of direct interactions but also include indirect interactions mediated by metabolites and signaling molecules. We describe how these indirect interactions can be derived from a model of the underlying biochemical reaction network, using weak time-scale assumptions in combination with sensitivity criteria from metabolic control analysis. We apply this approach to a model of the carbon assimilation network in Escherichia coli. Our results show that the derived gene regulatory network is densely connected, contrary to what is usually assumed. Moreover, the network is largely sign-determined, meaning that the signs of the indirect interactions are fixed by the flux directions of biochemical reactions, independently of specific parameter values and rate laws. An inversion of the fluxes following a change in growth conditions may, however, affect the signs of the indirect interactions. This leads to a feedback structure that is at the same time robust to changes in the kinetic properties of enzymes and that has the flexibility to accommodate radical changes in the environment.
The regulation of gene expression is tightly interwoven with metabolism and signal transduction. A realistic view of gene regulatory networks should therefore not only include direct interactions resulting from transcription regulation, but also indirect regulatory interactions mediated by metabolic effectors and signaling molecules. Ignoring these indirect interactions during the analysis of the network dynamics may cause crucial feedback loops to be missed. We present a method for systematically deriving indirect interactions from a model of the underlying biochemical reaction network, using weak time-scale assumptions in combination with sensitivity criteria from metabolic control analysis. This approach leads to novel insights as exemplified here on the carbon assimilation network of E. coli. We show that the derived gene regulatory network is densely connected, that the signs of the indirect interactions are largely fixed by the direction of metabolic fluxes, and that a change in flux direction may invert the sign of indirect interactions. Therefore the feedback structure of the network is much more complex than usually assumed; it appears robust to changes in the kinetic properties of its components and it can be flexibly rewired when the environment changes.
64
Joshi A, Van Parys T, Van de Peer Y, Michoel T. Characterizing regulatory path motifs in integrated networks using perturbational data. Genome Biol 2010; 11:R32. [PMID: 20230615 PMCID: PMC2864572 DOI: 10.1186/gb-2010-11-3-r32] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/19/2009] [Revised: 10/01/2009] [Accepted: 03/11/2010] [Indexed: 01/12/2023]
Abstract
Pathicular, a Cytoscape plugin for analysing cellular responses to transcription factor perturbations, is presented. We introduce Pathicular (http://bioinformatics.psb.ugent.be/software/details/Pathicular), a Cytoscape plugin for studying the cellular response to perturbations of transcription factors by integrating perturbational expression data with transcriptional, protein-protein and phosphorylation networks. Pathicular searches for 'regulatory path motifs', short paths in the integrated physical networks which occur significantly more often than expected between transcription factors and their targets in the perturbational data. A case study in Saccharomyces cerevisiae identifies eight regulatory path motifs and demonstrates their biological significance.
Affiliation(s)
- Anagha Joshi
- Department of Plant Systems Biology, VIB, Technologiepark 927, Gent, Belgium.
65
66
Karlebach G, Shamir R. Minimally perturbing a gene regulatory network to avoid a disease phenotype: the glioma network as a test case. BMC Syst Biol 2010; 4:15. [PMID: 20184733 PMCID: PMC2851584 DOI: 10.1186/1752-0509-4-15] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 12/03/2009] [Accepted: 02/25/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mathematical modeling of biological networks is an essential part of Systems Biology. Developing and using such models in order to understand gene regulatory networks is a major challenge. RESULTS We present an algorithm that determines the smallest perturbations required for manipulating the dynamics of a network formulated as a Petri net, in order to cause or avoid a specified phenotype. By modifying McMillan's unfolding algorithm, we handle partial knowledge and reduce computation cost. The methodology is demonstrated on a glioma network. Out of the single gene perturbations, activation of glutathione S-transferase P (GSTP1) gene was by far the most effective in blocking the cancer phenotype. Among pairs of perturbations, NFkB and TGF-beta had the largest joint effect, in accordance with their role in the EMT process. CONCLUSION Our method allows perturbation analysis of regulatory networks and can overcome incomplete information. It can help in identifying drug targets and in prioritizing perturbation experiments.
Affiliation(s)
- Guy Karlebach
- Tel-Aviv University, Haim Levanon St., 69978, Tel-Aviv, Israel.
67
Yip KY, Alexander RP, Yan KK, Gerstein M. Improved reconstruction of in silico gene regulatory networks by integrating knockout and perturbation data. PLoS One 2010; 5:e8121. [PMID: 20126643 PMCID: PMC2811182 DOI: 10.1371/journal.pone.0008121] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Received: 06/12/2009] [Accepted: 10/13/2009] [Indexed: 11/23/2022]
Abstract
We performed computational reconstruction of the in silico gene regulatory networks in the DREAM3 Challenges. Our task was to learn the networks from two types of data, namely gene expression profiles in deletion strains (the ‘deletion data’) and time series trajectories of gene expression after some initial perturbation (the ‘perturbation data’). In the course of developing the prediction method, we observed that the two types of data contained different and complementary information about the underlying network. In particular, deletion data allow for the detection of direct regulatory activities with strong responses upon the deletion of the regulator while perturbation data provide richer information for the identification of weaker and more complex types of regulation. We applied different techniques to learn the regulation from the two types of data. For deletion data, we learned a noise model to distinguish real signals from random fluctuations using an iterative method. For perturbation data, we used differential equations to model the change of expression levels of a gene along the trajectories due to the regulation of other genes. We tried different models, and combined their predictions. The final predictions were obtained by merging the results from the two types of data. A comparison with the actual regulatory networks suggests that our approach is effective for networks with a range of different sizes. The success of the approach demonstrates the importance of integrating heterogeneous data in network reconstruction.
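The abstract mentions an iteratively learned noise model for the deletion data without giving its form. As a rough stand-in, and not the authors' model, the sketch below uses iterative sigma clipping: it estimates the noise spread from the values currently treated as noise, flags values far outside that spread as signals, and repeats until the flagged set stops changing. The threshold, the zero-centered noise assumption, and the toy log-ratios are all assumptions made for this example.

```python
import math


def flag_signals(log_ratios, threshold=2.0, max_rounds=20):
    """Return indices whose expression change stands out from the noise.

    log_ratios: expression changes of genes in one deletion strain relative
    to wild type (hypothetical input format). Noise is assumed to be
    centered at zero.
    """
    signal = set()
    for _ in range(max_rounds):
        noise = [x for i, x in enumerate(log_ratios) if i not in signal]
        if len(noise) < 2:
            break
        # Root-mean-square deviation from zero of the presumed-noise values.
        sigma = math.sqrt(sum(x * x for x in noise) / len(noise))
        if sigma == 0:
            break
        updated = {i for i, x in enumerate(log_ratios)
                   if abs(x) > threshold * sigma}
        if updated == signal:
            break  # converged: the flagged set no longer changes
        signal = updated
    return sorted(signal)


# Toy deletion profile: two strong responders among small fluctuations.
log_ratios = [0.1, -0.2, 0.05, 3.0, -0.15, 0.2, -2.8, 0.0, 0.1, -0.1]
print(flag_signals(log_ratios))
```

Re-estimating the spread after removing flagged values matters because strong responders inflate the first noise estimate. In the toy profile the initial spread is dominated by the two large values, and the second round yields a much tighter estimate while leaving the flagged set unchanged.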
Affiliation(s)
- Kevin Y. Yip
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
- Roger P. Alexander
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Koon-Kiu Yan
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Mark Gerstein
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
68
Przytycka TM, Singh M, Slonim DK. Toward the dynamic interactome: it's about time. Brief Bioinform 2010; 11:15-29. [PMID: 20061351 PMCID: PMC2810115 DOI: 10.1093/bib/bbp057] [Citation(s) in RCA: 144] [Impact Index Per Article: 10.3] [Received: 07/20/2009] [Revised: 11/01/2009] [Indexed: 11/14/2022]
Abstract
Dynamic molecular interactions play a central role in regulating the functioning of cells and organisms. The availability of experimentally determined large-scale cellular networks, along with other high-throughput experimental data sets that provide snapshots of biological systems at different times and conditions, is increasingly helpful in elucidating interaction dynamics. Here we review the beginnings of a new subfield within computational biology, one focused on the global inference and analysis of the dynamic interactome. This burgeoning research area, which entails a shift from static to dynamic network analysis, promises to be a major step forward in our ability to model and reason about cellular function and behavior.
Affiliation(s)
- Teresa M Przytycka
- National Center of Biotechnology Information, NLM, NIH, 8000 Rockville Pike, Bethesda MD 20814, USA.
69
Peleg T, Yosef N, Ruppin E, Sharan R. Network-free inference of knockout effects in yeast. PLoS Comput Biol 2010; 6:e1000635. [PMID: 20066032 PMCID: PMC2795781 DOI: 10.1371/journal.pcbi.1000635] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/22/2009] [Accepted: 12/07/2009] [Indexed: 11/19/2022] Open
Abstract
Perturbation experiments, in which a certain gene is knocked out and the expression levels of other genes are observed, constitute a fundamental step in uncovering the intricate wiring diagrams in the living cell and elucidating the causal roles of genes in signaling and regulation. Here we present a novel framework for analyzing large cohorts of gene knockout experiments and their genome-wide effects on expression levels. We devise clustering-like algorithms that identify groups of genes that behave similarly with respect to the knockout data, and utilize them to predict knockout effects and to annotate physical interactions between proteins as inhibiting or activating. Differing from previous approaches, our prediction approach does not depend on physical network information; the latter is used only for the annotation task. Consequently, it is both more efficient and of wider applicability than previous methods. We evaluate our approach using a large scale collection of gene knockout experiments in yeast, comparing it to the state-of-the-art SPINE algorithm. In cross validation tests, our algorithm exhibits superior prediction accuracy, while at the same time increasing the coverage by over 25-fold. Significant coverage gains are obtained also in the annotation of the physical network.
Affiliation(s)
- Tal Peleg
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Nir Yosef
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Eytan Ruppin
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- School of Medicine, Tel-Aviv University, Tel-Aviv, Israel
- Roded Sharan
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
70
Huang SSC, Fraenkel E. Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks. Sci Signal 2009; 2:ra40. [PMID: 19638617 DOI: 10.1126/scisignal.2000350] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.2] [Indexed: 12/17/2022]
Abstract
Cellular signaling and regulatory networks underlie fundamental biological processes such as growth, differentiation, and response to the environment. Although there are now various high-throughput methods for studying these processes, knowledge of them remains fragmentary. Typically, the majority of hits identified by transcriptional, proteomic, and genetic assays lie outside of the expected pathways. These unexpected components of the cellular response are often the most interesting, because they can provide new insights into biological processes and potentially reveal new therapeutic approaches. However, they are also the most difficult to interpret. We present a technique, based on the Steiner tree problem, that uses previously reported protein-protein and protein-DNA interactions to determine how these hits are organized into functionally coherent pathways, revealing many components of the cellular response that are not readily apparent in the original data. Applied simultaneously to phosphoproteomic and transcriptional data for the yeast pheromone response, it identifies changes in diverse cellular processes that extend far beyond the expected pathways.
Affiliation(s)
- Shao-Shan Carol Huang
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
71
Leach SM, Tipney H, Feng W, Baumgartner WA, Kasliwal P, Schuyler RP, Williams T, Spritz RA, Hunter L. Biomedical discovery acceleration, with applications to craniofacial development. PLoS Comput Biol 2009; 5:e1000215. [PMID: 19325874 PMCID: PMC2653649 DOI: 10.1371/journal.pcbi.1000215] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Received: 09/25/2008] [Accepted: 02/12/2009] [Indexed: 01/17/2023] Open
Abstract
The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
Affiliation(s)
- Sonia M. Leach
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Hannah Tipney
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Weiguo Feng
- Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- William A. Baumgartner
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Priyanka Kasliwal
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Ronald P. Schuyler
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Trevor Williams
- Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- Richard A. Spritz
- Human Medical Genetics Program, University of Colorado at Denver, Denver, Colorado, United States of America
- Lawrence Hunter
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
72
Modeling the temporal interplay of molecular signaling and gene expression by using dynamic nested effects models. Proc Natl Acad Sci U S A 2009; 106:6447-52. [PMID: 19329492 DOI: 10.1073/pnas.0809822106] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022] Open
Abstract
Cellular decision making in differentiation, proliferation, or cell death is mediated by molecular signaling processes, which control the regulation and expression of genes. Vice versa, the expression of genes can trigger the activity of signaling pathways. We introduce and describe a statistical method called the Dynamic Nested Effects Model (D-NEM) for analyzing the temporal interplay of cell signaling and gene expression. D-NEMs are Bayesian models of signal propagation in a network. They decompose observed time delays of multistep signaling processes into single steps. Time delays are assumed to be exponentially distributed. Rate constants of signal propagation are model parameters, whose joint posterior distribution is assessed via Gibbs sampling. They hold information on the interplay of different forms of biological signal propagation. Molecular signaling in the cytoplasm acts at high rates, direct signal propagation via transcription and translation acts at intermediate rates, while secondary effects operate at low rates. D-NEMs allow the dissection of biological processes into signaling and expression events, and analysis of cellular signal flow. An application of D-NEMs to embryonic stem cell development in mice reveals a feed-forward loop dominated network, which stabilizes the differentiated state of cells and points to Nanog as the key sensitizer of stem cells for differentiation stimuli.
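As a rough illustration of the delay model described in this abstract, the following Python sketch simulates the total delay of a multistep signaling path as a sum of independent, exponentially distributed single-step delays. It is not the D-NEM implementation and performs no Gibbs sampling. The function names and the three rate constants are hypothetical values chosen only to mimic fast, intermediate, and slow propagation.

```python
import random


def simulate_path_delay(rates, rng):
    """Draw one total delay for a path whose steps have the given rate constants.

    Each step delay is an independent exponential variate, so the total is
    simply the sum of the single-step delays.
    """
    return sum(rng.expovariate(rate) for rate in rates)


def expected_path_delay(rates):
    """Mean total delay: an exponential step with rate k has mean 1/k."""
    return sum(1.0 / rate for rate in rates)


# Hypothetical rate constants (arbitrary time units): a fast cytoplasmic
# signaling step, an intermediate transcription/translation step, and a
# slow secondary effect.
rates = [10.0, 1.0, 0.1]

rng = random.Random(0)  # fixed seed so the simulation is reproducible
samples = [simulate_path_delay(rates, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)

print(f"expected mean delay:  {expected_path_delay(rates):.2f}")
print(f"simulated mean delay: {empirical_mean:.2f}")
```

The slowest step dominates the total delay, which is one intuition for why rate constants could separate signaling events from secondary expression effects.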
73
Tu Z, Argmann C, Wong KK, Mitnaul LJ, Edwards S, Sach IC, Zhu J, Schadt EE. Integrating siRNA and protein-protein interaction data to identify an expanded insulin signaling network. Genome Res 2009; 19:1057-67. [PMID: 19261841 DOI: 10.1101/gr.087890.108] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 12/25/2022]
Abstract
Insulin resistance is one of the dominant symptoms of type 2 diabetes (T2D). Although the molecular mechanisms leading to this resistance are largely unknown, experimental data support that the insulin signaling pathway is impaired in patients who are insulin resistant. To identify novel components/modulators of the insulin signaling pathway, we designed siRNAs targeting over 300 genes and tested the effects of knocking down these genes in an insulin-dependent, anti-lipolysis assay in 3T3-L1 adipocytes. For 126 genes, significant changes in free fatty acid release were observed. However, due to off-target effects (in addition to other limitations), high-throughput RNAi-based screens in cell-based systems generate significant amounts of noise. Therefore, to obtain a more reliable set of genes from the siRNA hits in our screen, we developed and applied a novel network-based approach that elucidates the mechanisms of action for the true positive siRNA hits. Our analysis results in the identification of a core network underlying the insulin signaling pathway that is more significantly enriched for genes previously associated with insulin resistance than the set of genes annotated in the KEGG database as belonging to the insulin signaling pathway. We experimentally validated one of the predictions, S1pr2, as a novel candidate gene for T2D.
Affiliation(s)
- Zhidong Tu
- Rosetta Inpharmatics, a wholly owned subsidiary of Merck & Co., Inc., Seattle, Washington 98109, USA
74
Yeger-Lotem E, Riva L, Su LJ, Gitler AD, Cashikar AG, King OD, Auluck PK, Geddie ML, Valastyan JS, Karger DR, Lindquist S, Fraenkel E. Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nat Genet 2009; 41:316-23. [PMID: 19234470 DOI: 10.1038/ng.337] [Citation(s) in RCA: 221] [Impact Index Per Article: 14.7] [Received: 08/07/2008] [Accepted: 01/27/2009] [Indexed: 02/07/2023]
Abstract
Cells respond to stimuli by changes in various processes, including signaling pathways and gene expression. Efforts to identify components of these responses increasingly depend on mRNA profiling and genetic library screens. By comparing the results of these two assays across various stimuli, we found that genetic screens tend to identify response regulators, whereas mRNA profiling frequently detects metabolic responses. We developed an integrative approach that bridges the gap between these data using known molecular interactions, thus highlighting major response pathways. We used this approach to reveal cellular pathways responding to the toxicity of alpha-synuclein, a protein implicated in several neurodegenerative disorders including Parkinson's disease. For this we screened an established yeast model to identify genes that when overexpressed alter alpha-synuclein toxicity. Bridging these data and data from mRNA profiling provided functional explanations for many of these genes and identified previously unknown relations between alpha-synuclein toxicity and basic cellular pathways.
Affiliation(s)
- Esti Yeger-Lotem
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
75
A factor graph nested effects model to identify networks from genetic perturbations. PLoS Comput Biol 2009; 5:e1000274. [PMID: 19180177 PMCID: PMC2613752 DOI: 10.1371/journal.pcbi.1000274] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 06/30/2008] [Accepted: 12/12/2008] [Indexed: 11/26/2022] Open
Abstract
Complex phenotypes such as the transformation of a normal population of cells into cancerous tissue result from a series of molecular triggers gone awry. We describe a method that searches for a genetic network consistent with expression changes observed under the knock-down of a set of genes that share a common role in the cell, such as a disease phenotype. The method extends the Nested Effects Model of Markowetz et al. (2005) by using a probabilistic factor graph to search for a network representing interactions among these silenced genes. The method also expands the network by attaching new genes at specific downstream points, providing candidates for subsequent perturbations to further characterize the pathway. We investigated an extension provided by the factor graph approach in which the model distinguishes between inhibitory and stimulatory interactions. We found that the extension yielded significant improvements in recovering the structure of simulated and Saccharomyces cerevisiae networks. We applied the approach to discover a signaling network among genes involved in a human colon cancer cell invasiveness pathway. The method predicts several genes with new roles in the invasiveness process. We knocked down two genes identified by our approach and found that both knock-downs produce loss of invasive potential in a colon cancer cell line. Nested effects models may be a powerful tool for inferring regulatory connections and genes that operate in normal and disease-related processes.
Biological processes are the result of the actions and interactions of many genes and the proteins that they encode. Our knowledge of interactions for many biological processes is limited, especially for cancer where genomic alterations may create entirely novel pathways not present in normal tissue. Perturbing gene expression (for example, by deleting a gene) has long been used as a tool in molecular biology to elucidate interactions but is very expensive and labor intensive. The search for new genes that may participate can be a daunting “fishing expedition.” We have devised a tool that automatically infers interactions using high-throughput gene expression data. When a gene is silenced, it causes other genes to be switched on or off, which provides clues about the pathway(s) in which the gene acts. Our method uses the genomewide on/off states as a fingerprint to detect interactions among a set of silenced genes. We were able to elucidate a network of interactions for several genes implicated in metastatic colon cancer. Genes newly connected to the network were found to operate in cancer cell invasion in human cells, validating the approach. Thus, the method enables an efficient discovery of the networks that underlie biological processes such as carcinogenesis.
76
Huang Y, Tienda-Luna IM, Wang Y. A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks. IEEE Signal Processing Magazine 2009; 26:76-97. [PMID: 20046885 PMCID: PMC2763329 DOI: 10.1109/msp.2008.930647] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 05/28/2023]
Abstract
Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research.
Affiliation(s)
- Yufei Huang
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX 78249-0669,
77
Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 2008; 9:770-80. [PMID: 18797474 DOI: 10.1038/nrm2503] [Citation(s) in RCA: 568] [Impact Index Per Article: 35.5] [Indexed: 12/16/2022]
Abstract
Gene regulatory networks have an important role in every process of life, including cell differentiation, metabolism, the cell cycle and signal transduction. By understanding the dynamics of these networks we can shed light on the mechanisms of diseases that occur when these cellular processes are dysregulated. Accurate prediction of the behaviour of regulatory networks will also speed up biotechnological projects, as such predictions are quicker and cheaper than lab experiments. Computational methods, both for supporting the development of network models and for the analysis of their functionality, have already proved to be a valuable research tool.
Affiliation(s)
- Guy Karlebach
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
78
Abstract
Background To date, the reconstruction of gene regulatory networks from gene expression data has primarily relied on the correlation between the expression of transcription regulators and that of target genes. Results We developed a network reconstruction method based on quantities that are closely related to the biophysical properties of TF-TF interaction, TF-DNA binding and transcriptional activation and repression. The Network-Identifier method utilized a thermodynamic model for gene regulation to infer regulatory relationships from multiple time course gene expression datasets. Applied to five datasets of differentiating embryonic stem cells, Network-Identifier identified a gene regulatory network among 87 transcription regulator genes. This network suggests that Oct4, Sox2 and Klf4 indirectly repress lineage specific differentiation genes by activating transcriptional repressors of Ctbp2, Rest and Mtf2.
79
Abstract
During a decade of proof-of-principle analysis in model organisms, protein networks have been used to further the study of molecular evolution, to gain insight into the robustness of cells to perturbation, and for assignment of new protein functions. Following these analyses, and with the recent rise of protein interaction measurements in mammals, protein networks are increasingly serving as tools to unravel the molecular basis of disease. We review promising applications of protein networks to disease in four major areas: identifying new disease genes; the study of their network properties; identifying disease-related subnetworks; and network-based disease classification. Applications in infectious disease, personalized medicine, and pharmacology are also forthcoming as the available protein network information improves in quality and coverage.
Affiliation(s)
- Trey Ideker
- Department of Bioengineering, University of California at San Diego, La Jolla, California 92093, USA
80
Veber P, Guziolowski C, Le Borgne M, Radulescu O, Siegel A. Inferring the role of transcription factors in regulatory networks. BMC Bioinformatics 2008; 9:228. [PMID: 18460200 PMCID: PMC2422845 DOI: 10.1186/1471-2105-9-228] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/24/2007] [Accepted: 05/06/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. RESULTS We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions), by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions. CONCLUSION Our approach does not require accurate expression levels or time series. Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments are enough to determine a significant portion of regulatory effects. This is a key practical asset compared to statistical methods for network reconstruction. We demonstrate that our approach is able to provide accurate predictions, even when the network is incomplete and the data is noisy.
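The abstract names a consistency rule between a network and signs of variation but does not spell it out. The Python sketch below assumes one simple reading: the sign of a target's variation must be explained by the sign of a regulator's variation multiplied by the sign of the edge. Under that assumption, an edge into a target with a single regulator is forced to be an inducer (+1) or a repressor (-1) by any profile in which both genes vary. The function name, the gene labels, and the toy profiles are hypothetical, and this is not the authors' algorithm.

```python
from collections import defaultdict


def infer_edge_signs(edges, profiles):
    """Infer edge signs forced by sign consistency (illustrative assumption).

    edges:    list of (regulator, target) pairs with unknown sign.
    profiles: list of dicts mapping gene -> +1 (up) or -1 (down).
    Returns (inferred, inconsistent): inferred maps an edge to +1 or -1;
    inconsistent holds edges whose forced sign differs between profiles.
    """
    regulators = defaultdict(list)
    for regulator, target in edges:
        regulators[target].append(regulator)

    inferred, inconsistent = {}, set()
    for profile in profiles:
        for target, regs in regulators.items():
            if len(regs) != 1:
                continue  # several regulators: this profile alone forces nothing
            regulator = regs[0]
            if target not in profile or regulator not in profile:
                continue  # unobserved variation gives no constraint
            sign = profile[target] * profile[regulator]
            edge = (regulator, target)
            if inferred.setdefault(edge, sign) != sign:
                inconsistent.add(edge)

    for edge in inconsistent:
        inferred.pop(edge, None)
    return inferred, inconsistent


# Hypothetical toy network and two expression profiles.
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("C", "D")]
profiles = [
    {"A": 1, "B": -1, "C": -1, "D": 1},
    {"A": -1, "B": 1, "C": 1},
]
inferred, inconsistent = infer_edge_signs(edges, profiles)
print(inferred)       # {('A', 'B'): -1, ('B', 'C'): 1}
print(inconsistent)   # set()
```

Targets with several regulators (gene D here) are skipped, which hints at why only part of a network can be annotated from a limited set of profiles.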
Affiliation(s)
- Philippe Veber
- Centre INRIA Rennes Bretagne Atlantique, IRISA, Rennes, France.
81
Shachar R, Ungar L, Kupiec M, Ruppin E, Sharan R. A systems-level approach to mapping the telomere length maintenance gene circuitry. Mol Syst Biol 2008; 4:172. [PMID: 18319724 PMCID: PMC2290934 DOI: 10.1038/msb.2008.13] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 01/21/2008] [Accepted: 01/28/2008] [Indexed: 11/09/2022] Open
Abstract
The ends of eukaryotic chromosomes are protected by telomeres, nucleoprotein structures that are essential for chromosomal stability and integrity. Understanding how telomere length is controlled has significant medical implications, especially in the fields of aging and cancer. Two recent systematic genome-wide surveys measuring the telomere length of deleted mutants in the yeast Saccharomyces cerevisiae have identified hundreds of telomere length maintenance (TLM) genes, which span a large array of functional categories and different localizations within the cell. This study presents a novel general method that integrates large-scale screening mutant data with protein–protein interaction information to rigorously chart the cellular subnetwork underlying the function investigated. Applying this method to the yeast telomere length control data, we identify pathways that connect the TLM proteins to the telomere-processing machinery, and predict new TLM genes and their effect on telomere length. We experimentally validate some of these predictions, demonstrating that our method is remarkably accurate. Our results both uncover the complex cellular network underlying TLM and validate a new method for inferring such networks.
Affiliation(s)
- Rafi Shachar
- School of Computer Science, Tel Aviv University, Ramat Aviv, Israel
82
Chen G, Larsen P, Almasri E, Dai Y. Rank-based edge reconstruction for scale-free genetic regulatory networks. BMC Bioinformatics 2008; 9:75. [PMID: 18237422 PMCID: PMC2275249 DOI: 10.1186/1471-2105-9-75] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 03/19/2007] [Accepted: 01/31/2008] [Indexed: 11/12/2022] Open
Abstract
Background The reconstruction of genetic regulatory networks from microarray gene expression data has been a challenging task in bioinformatics. Various approaches to this problem have been proposed; however, they do not take into account the topological characteristics of the targeted networks while reconstructing them. Results In this study, an algorithm that exploits the scale-free topology of networks was proposed based on the modification of a rank-based algorithm for network reconstruction. The new algorithm was evaluated with the use of both simulated and microarray gene expression data. The results demonstrated that the proposed algorithm outperforms the original rank-based algorithm. In addition, in comparison with the Bayesian Network approach, the results show that the proposed algorithm gives much better recovery of the underlying network when the sample size is much smaller than the number of genes. Conclusion The proposed algorithm is expected to be useful in the reconstruction of biological networks whose degree distributions follow the scale-free topology.
Affiliation(s)
- Guanrao Chen
- Department of Computer Science (MC152), University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607, USA.
83
Tan K, Tegner J, Ravasi T. Integrated approaches to uncovering transcription regulatory networks in mammalian cells. Genomics 2008; 91:219-31. [PMID: 18191937 DOI: 10.1016/j.ygeno.2007.11.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 09/17/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 11/16/2022]
Abstract
Integrative systems biology has emerged as an exciting research approach in molecular biology and functional genomics that involves the integration of genomics, proteomics, and metabolomics datasets. These endeavors establish a systematic paradigm by which to interrogate, model, and iteratively refine our knowledge of the regulatory events within a cell. Here we review the latest technologies available to collect high-throughput measurements of a cellular state as well as the most successful methods for the integration and interrogation of these measurements. In particular we will focus on methods available to infer transcription regulatory networks in mammals.
Affiliation(s)
- Kai Tan
- Department of Bioengineering, Jacobs School of Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
84
Abstract
Common human diseases like obesity and diabetes are driven by complex networks of genes and any number of environmental factors. To understand this complexity in hopes of identifying targets and developing drugs against disease, a systematic approach is required to elucidate the genetic and environmental factors and interactions among and between these factors, and to establish how these factors induce changes in gene networks that in turn lead to disease. The explosion of large-scale, high-throughput technologies in the biological sciences has enabled researchers to take a more systems biology approach to study complex traits like disease. Genotyping of hundreds of thousands of DNA markers and profiling tens of thousands of molecular phenotypes simultaneously in thousands of individuals is now possible, and this scale of data is making it possible for the first time to reconstruct whole gene networks associated with disease. In the following sections, we review different approaches for integrating genetic, expression, and clinical data to infer causal relationships among gene expression traits and between expression and disease traits. We further review methods to integrate these data in a more comprehensive manner to identify common pathways shared by the causal factors driving disease, including the reconstruction of association and probabilistic causal networks. Particular attention is paid to integrating diverse information to refine these types of networks so that they are more predictive. To highlight these different approaches in practice, we step through an example on how Insig2 was identified as a causal factor for plasma cholesterol levels in mice.
85
Chen G, Larsen P, Almasri E, Dai Y. Sample scale-free gene regulatory network using gene ontology. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2006:5523-6. [PMID: 17946312 DOI: 10.1109/iembs.2006.259261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Currently there are various approaches to the reconstruction of gene regulatory networks from different sources of data. However, none of these methods explicitly incorporates the scale-free property, one of the most important features of the targeted network, into its algorithm. In this paper, several network sampling strategies are explored on a set assembled from previously published gene interactions in yeast, with the aim of reconstructing regulatory networks that are scale-free.
Affiliation(s)
- Guanrao Chen
- Department of Computer Science, University of Illinois at Chicago, IL 60607, USA.
86
Gong Y, Zhang Z. Alternative Pathway Approach for Automating Analysis and Validation of Cell Perturbation Networks and Design of Perturbation Experiments. Ann N Y Acad Sci 2007; 1115:267-85. [DOI: 10.1196/annals.1407.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023]
87
Abstract
In this review we give an overview of computational and statistical methods to reconstruct cellular networks. Although this area of research is vast and fast developing, we show that most currently used methods can be organized by a few key concepts. The first part of the review deals with conditional independence models including Gaussian graphical models and Bayesian networks. The second part discusses probabilistic and graph-based methods for data from experimental interventions and perturbations.
Affiliation(s)
- Florian Markowetz
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Princeton University, Lewis-Sigler Institute for Integrative Genomics and Dept. of Computer Science, Princeton, NJ 08544, USA
- Rainer Spang
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Present affiliation: University Regensburg, Institute of Functional Genomics, Josef-Engert-Str. 9, 93053 Regensburg, Germany
88
Beyer A, Bandyopadhyay S, Ideker T. Integrating physical and genetic maps: from genomes to interaction networks. Nat Rev Genet 2007; 8:699-710. [PMID: 17703239 PMCID: PMC2811081 DOI: 10.1038/nrg2144] [Citation(s) in RCA: 161] [Impact Index Per Article: 9.5] [Indexed: 12/23/2022]
Abstract
Physical and genetic mapping data have become as important to network biology as they once were to the Human Genome Project. Integrating physical and genetic networks currently faces several challenges: increasing the coverage of each type of network; establishing methods to assemble individual interaction measurements into contiguous pathway models; and annotating these pathways with detailed functional information. A particular challenge involves reconciling the wide variety of interaction types that are currently available. For this purpose, recent studies have sought to classify genetic and physical interactions along several complementary dimensions, such as ordered versus unordered, alleviating versus aggravating, and first versus second degree.
Affiliation(s)
- Andreas Beyer
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
89
Markowetz F, Kostka D, Troyanskaya OG, Spang R. Nested effects models for high-dimensional phenotyping screens. Bioinformatics 2007; 23:i305-12. [PMID: 17646311 DOI: 10.1093/bioinformatics/btm178] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION In high-dimensional phenotyping screens, a large number of cellular features is observed after perturbing genes by knockouts or RNA interference. Comprehensive analysis of perturbation effects is one of the most powerful techniques for attributing functions to genes, but not much work has been done so far to adapt statistical and computational methodology to the specific needs of large-scale and high-dimensional phenotyping screens. RESULTS We introduce and compare probabilistic methods to efficiently infer a genetic hierarchy from the nested structure of observed perturbation effects. These hierarchies elucidate the structures of signaling pathways and regulatory networks. Our methods achieve two goals: (1) they reveal clusters of genes with highly similar phenotypic profiles, and (2) they order (clusters of) genes according to subset relationships between phenotypes. We evaluate our algorithms in the controlled setting of simulation studies and show their practical use in two experimental scenarios: (1) a data set investigating the response to microbial challenge in Drosophila melanogaster, and (2) a compendium of expression profiles of Saccharomyces cerevisiae knockout strains. We show that our methods identify biologically justified genetic hierarchies of perturbation effects. AVAILABILITY The software used in our analysis is freely available in the R package 'nem' from www.bioconductor.org.
Affiliation(s)
- Florian Markowetz
- Lewis-Sigler Institute for Integrative Genomics and Department of Computer Science, Princeton University, Princeton, NJ, 08544, USA
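The subset idea in the abstract above (genes ordered by subset relationships between their phenotypes, and genes with the same profile clustered together) can be illustrated with a small deterministic sketch. This is not the probabilistic method of the paper or the API of the 'nem' package: it assumes noise-free effect sets, and the function names and toy data are hypothetical.

```python
from itertools import permutations


def subset_hierarchy(effects):
    """Return (upstream, downstream) pairs where the downstream gene's
    effect set is a proper subset of the upstream gene's effect set."""
    edges = []
    for a, b in permutations(effects, 2):
        if effects[b] < effects[a]:  # proper-subset test on Python sets
            edges.append((a, b))
    return sorted(edges)


def equivalence_clusters(effects):
    """Group genes whose perturbation affects exactly the same phenotypes."""
    clusters = {}
    for gene, affected in effects.items():
        clusters.setdefault(frozenset(affected), []).append(gene)
    return sorted(sorted(genes) for genes in clusters.values())


# Hypothetical, noise-free toy data: gene -> phenotypes changed by its knockdown.
effects = {
    "geneA": {"p1", "p2", "p3", "p4"},
    "geneB": {"p1", "p2"},
    "geneC": {"p1", "p2"},
    "geneD": {"p3"},
}

hierarchy = subset_hierarchy(effects)
clusters = equivalence_clusters(effects)
print(hierarchy)
print(clusters)
```

In this toy example geneA sits above the other three genes, and geneB and geneC fall into one cluster. Real screens are noisy, which is why the paper scores hierarchies probabilistically rather than testing exact set inclusion.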
90
Ourfali O, Shlomi T, Ideker T, Ruppin E, Sharan R. SPINE: a framework for signaling-regulatory pathway inference from cause-effect experiments. Bioinformatics 2007; 23:i359-66. [PMID: 17646318 DOI: 10.1093/bioinformatics/btm170] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION The complex program of gene expression allows the cell to cope with changing genetic, developmental and environmental conditions. The accumulating large-scale measurements of gene knockout effects and molecular interactions allow us to begin to uncover regulatory and signaling pathways within the cell that connect causal to affected genes on a network of physical interactions. RESULTS We present a novel framework, SPINE, for Signaling-regulatory Pathway INferencE. The framework aims at explaining gene expression experiments in which a gene is knocked out and as a result multiple genes change their expression levels. To this end, an integrated network of protein-protein and protein-DNA interactions is constructed, and signaling pathways connecting the causal gene to the affected genes are searched for in this network. The reconstruction problem is translated into that of assigning an activation/repression attribute with each protein so as to explain (in expectation) a maximum number of the knockout effects observed. We provide an integer programming formulation for the latter problem and solve it using a commercial solver. We validate the method by applying it to a yeast subnetwork that is involved in mating. In cross-validation tests, SPINE obtains very high accuracy in predicting knockout effects (99%). Next, we apply SPINE to the entire yeast network to predict protein effects and reconstruct signaling and regulatory pathways. Overall, we are able to infer 861 paths with confidence and assign effects to 183 genes. The predicted effects are found to be in high agreement with current biological knowledge. AVAILABILITY The algorithm and data are available at http://cs.tau.ac.il/~roded/SPINE.html.
Affiliation(s)
- Oved Ourfali
- School of Computer Science, School of Medicine, Tel-Aviv University, Tel-Aviv, Israel
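The sign-assignment problem described in the abstract above can be sketched on a toy scale with exhaustive search in place of the integer program. This is an illustration under stated assumptions, not the SPINE formulation: the consistency rule (a knockout's predicted effect is the negated product of the signs along a path) is a simplification chosen here, and all names and data are hypothetical.

```python
from itertools import product
from math import prod


def explained(signs, paths, observed):
    """A knockout effect counts as explained if at least one candidate path
    predicts the observed direction of change (+1 up, -1 down)."""
    for path in paths:
        # Assumed rule: removing the source negates the signal that the
        # path (source and intermediates) would otherwise have delivered.
        predicted = -prod(signs[p] for p in path)
        if predicted == observed:
            return True
    return False


def best_assignment(proteins, experiments):
    """Try every activator (+1) / repressor (-1) labelling and keep the one
    that explains the most knockout effects. Exponential in len(proteins)."""
    best_score, best_signs = -1, None
    for values in product((+1, -1), repeat=len(proteins)):
        signs = dict(zip(proteins, values))
        score = sum(explained(signs, paths, obs) for paths, obs in experiments)
        if score > best_score:
            best_score, best_signs = score, signs
    return best_score, best_signs


# Hypothetical toy input: (candidate paths from the knocked-out gene up to,
# but excluding, the affected gene; observed change in the affected gene).
proteins = ["K", "X", "Y"]
experiments = [
    ([["K", "X"]], -1),
    ([["K", "X", "Y"]], +1),
    ([["K", "Y"]], +1),
]

score, signs = best_assignment(proteins, experiments)
print(score, signs)
```

Exhaustive search only works for a handful of proteins. The abstract's integer programming formulation is what makes the yeast-scale network tractable.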
91
Abstract
Network analysis of living systems is an essential component of contemporary systems biology. It aims to assemble the mutual dependencies between interacting system elements into an integrated view of whole-system functioning. In the following chapter we describe the existing classification of what are referred to as biological networks and show how complex interdependencies in biological systems can be represented in the simpler form of network graphs. Further structural analysis of the assembled biological network yields knowledge about the functioning of the entire biological system. We touch upon aspects of network structure such as the connectivity of network elements and the degree distribution, node centrality, clustering coefficient, network diameter and average path length. Networks are analyzed as static entities, or the dynamical behavior of the underlying biological systems may be considered. A description of mathematical and computational approaches for determining the dynamics of regulatory networks is provided. Causality, another characteristic feature of a dynamically functioning biosystem, can also be addressed in the reconstruction of biological networks; we give examples of how this integration is accomplished. Further questions about network dynamics and evolution can be approached by means of network comparison. Network analysis gives rise to new global hypotheses on systems functionality and to reductionist findings of novel molecular interactions; both depend on the reliability of the network reconstructions and have to be tested in subsequent experiments. We provide a collection of useful links for the analysis of biological networks.
Affiliation(s)
- Victoria J Nikiforova
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
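Several of the structural measures named in the abstract above (degree distribution, clustering coefficient, network diameter, average path length) can be computed in a few lines for a small undirected graph. The sketch below uses only the standard library. The toy graph and function names are hypothetical, and the path statistics assume that only reachable pairs are counted.

```python
from collections import deque


def degree_distribution(adj):
    """Map each degree k to the number of nodes that have k neighbours."""
    counts = {}
    for neighbours in adj.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    return counts


def clustering_coefficient(adj, node):
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (diameter, average path length) over reachable ordered pairs."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, (total / pairs if pairs else 0.0)


# Hypothetical toy network: a triangle A-B-C with a pendant node D on C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(degree_distribution(adj))
print(clustering_coefficient(adj, "C"))
print(path_statistics(adj))
```

For networks of realistic size a dedicated graph library would be the usual choice. The point here is only to make the definitions concrete.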
92
Shi Y, Mitchell T, Bar-Joseph Z. Inferring pairwise regulatory relationships from multiple time series datasets. Bioinformatics 2007; 23:755-63. [PMID: 17237067 DOI: 10.1093/bioinformatics/btl676] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Time series expression experiments have emerged as a popular method for studying a wide range of biological systems under a variety of conditions. One advantage of such data is the ability to infer regulatory relationships using time lag analysis. However, such analysis in a single experiment may result in many false positives due to the small number of time points and the large number of genes. Extending these methods to simultaneously analyze several time series datasets is challenging since under different experimental conditions biological systems may behave faster or slower making it hard to rely on the actual duration of the experiment. RESULTS We present a new computational model and an associated algorithm to address the problem of inferring time-lagged regulatory relationships from multiple time series expression experiments with varying (unknown) time-scales. Our proposed algorithm uses a set of known interacting pairs to compute a temporal transformation between every two datasets. Using this temporal transformation we search for new interacting pairs. As we show, our method achieves a much lower false-positive rate compared to previous methods that use time series expression data for pairwise regulatory relationship discovery. Some of the new predictions made by our method can be verified using other high throughput data sources and functional annotation databases. AVAILABILITY Matlab implementation is available from the supporting website: http://www.cs.cmu.edu/~yanxins/regulation_inference/index.html.
Affiliation(s)
- Yanxin Shi
- Machine Learning Department, Language Technologies Institute, Computer Science Department and Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
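The time lag analysis mentioned in the abstract above can be illustrated, for a single dataset, by shifting the target profile against the regulator profile and keeping the shift with the strongest correlation. This is a plain lagged Pearson correlation, not the paper's algorithm: it omits the temporal transformation between datasets, and the function names and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lag(regulator, target, max_lag):
    """Return (lag, r) maximising |r| between regulator[t] and target[t + lag].

    max_lag should leave at least a few overlapping time points."""
    best = (0, pearson(regulator, target))
    for lag in range(1, max_lag + 1):
        r = pearson(regulator[:-lag], target[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Hypothetical toy profiles: the target repeats the regulator two steps later.
regulator = [0, 1, 3, 2, 0, 1, 4, 2]
target = [1, 0, 0, 1, 3, 2, 0, 1]

lag, r = best_lag(regulator, target, max_lag=3)
print(lag, round(r, 3))
```

With so few time points many gene pairs will show a high lagged correlation by chance, which is the false-positive problem the abstract raises and the motivation for combining several datasets.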
93
Dai X, Souza ATD, Dai H, Lewis DL, Lee CK, Spencer AG, Herweijer H, Hagstrom JE, Linsley PS, Bassett DE, Ulrich RG, He YD. PPARalpha siRNA-treated expression profiles uncover the causal sufficiency network for compound-induced liver hypertrophy. PLoS Comput Biol 2007; 3:e30. [PMID: 17335344 PMCID: PMC1808491 DOI: 10.1371/journal.pcbi.0030030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/12/2006] [Accepted: 01/02/2007] [Indexed: 11/18/2022] Open
Abstract
Uncovering pathways underlying drug-induced toxicity is a fundamental objective in the field of toxicogenomics. Developing mechanism-based toxicity biomarkers requires the identification of such novel pathways and the order of their sufficiency in causing a phenotypic response. Genome-wide RNA interference (RNAi) phenotypic screening has emerged as an effective tool in unveiling the genes essential for specific cellular functions and biological activities. However, eliciting the relative contribution of and sufficiency relationships among the genes identified remains challenging. In the rodent, the most widely used animal model in preclinical studies, it is unrealistic to exhaustively examine all potential interactions by RNAi screening. Application of existing computational approaches to infer regulatory networks with biological outcomes in the rodent is limited by the requirements for a large number of targeted perturbations. Therefore, we developed a two-step relay method that requires only one targeted perturbation for genome-wide de novo pathway discovery. Using expression profiles in response to small interfering RNAs (siRNAs) against the gene for peroxisome proliferator-activated receptor alpha (Ppara), our method unveiled the potential causal sufficiency order network for liver hypertrophy in the rodent. The validity of the inferred 16 causal transcripts or 15 known genes for PPARalpha-induced liver hypertrophy is supported by their ability to predict non-PPARalpha-induced liver hypertrophy with 84% sensitivity and 76% specificity. Simulation shows that the probability of achieving such predictive accuracy without the inferred causal relationship is exceedingly small (p < 0.005). Five of the most sufficient causal genes have been previously disrupted in mouse models; the resulting phenotypic changes in the liver support the inferred causal roles in liver hypertrophy.
Our results demonstrate the feasibility of defining pathways mediating drug-induced toxicity from siRNA-treated expression profiles. When combined with phenotypic evaluation, our approach should help to unleash the full potential of siRNAs in systematically unveiling the molecular mechanism of biological events.
Affiliation(s)
- Xudong Dai
- Informatics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- * To whom correspondence should be addressed. E-mail: (XD); (YDH)
- Angus T. De Souza
- Preclinical Molecular Profiling, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Hongyue Dai
- Informatics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- David L Lewis
- Mirus Bio Corporation, Madison, Wisconsin, United States of America
- Chang-kyu Lee
- Informatics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Andy G Spencer
- Mirus Bio Corporation, Madison, Wisconsin, United States of America
- Hans Herweijer
- Mirus Bio Corporation, Madison, Wisconsin, United States of America
- Jim E Hagstrom
- Mirus Bio Corporation, Madison, Wisconsin, United States of America
- Peter S Linsley
- Cancer Biology, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Douglas E Bassett
- Informatics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Roger G Ulrich
- Preclinical Molecular Profiling, Rosetta Inpharmatics, Seattle, Washington, United States of America
- Yudong D He
- Informatics, Rosetta Inpharmatics, Seattle, Washington, United States of America
- * To whom correspondence should be addressed. E-mail: (XD); (YDH)
94
Tegnér J, Björkegren J. Perturbations to uncover gene networks. Trends Genet 2007; 23:34-41. [PMID: 17098324 DOI: 10.1016/j.tig.2006.11.003] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Received: 02/14/2006] [Revised: 09/12/2006] [Accepted: 11/02/2006] [Indexed: 11/19/2022]
Abstract
After the major achievements of the DNA sequencing projects, an equally important challenge now is to uncover the functional relationships among genes (i.e. gene networks). It has become increasingly clear that computational algorithms are crucial for extracting meaningful information from the massive amount of data generated by high-throughput genome-wide technologies. Here, we summarise how systems identification algorithms, originating from physics and control theory, have been adapted for use in biology. We also explain how experimental perturbations combined with genome-wide measurements are being used to uncover gene networks. Perturbation techniques could pave the way for identifying gene networks in more complex settings such as multifactorial diseases and for improving the efficacy of drug evaluation.
Affiliation(s)
- Jesper Tegnér
- Division of Computational Biology, Department of Physics, Chemistry and Biology, The Institute of Technology, Linköping University, SE-581 83 Linköping, Sweden.
95
Abstract
Reliable and comprehensive maps of molecular pathways are indispensable for guiding complex biomedical experiments. Such maps are typically assembled from myriads of disparate research reports and are replete with inconsistencies due to variations in experimental conditions and/or errors. It is often an intractable task to manually verify internal consistency over a large collection of experimental statements. To automate large-scale reconciliation efforts, we propose a random-arcs-and-nodes model where both nodes (tissue-specific states of biological molecules) and arcs (interactions between them) are represented with random variables. We show how to obtain a non-contradictory model of a molecular network by computing the joint distribution for arc and node variables, and then apply our methodology to a realistic network, generating a set of experimentally testable hypotheses. This network, derived from an automated analysis of over 3,000 full-text research articles, includes genes that have been hypothetically linked to four neurological disorders: Alzheimer's disease, autism, bipolar disorder, and schizophrenia. We estimated that approximately 10% of the published molecular interactions are logically incompatible. Our approach can be directly applied to an array of diverse problems including those encountered in molecular biology, ecology, economics, politics, and sociology.
Affiliation(s)
- Andrey Rzhetsky
- Department of Biomedical Informatics, Center for Computational Biology and Bioinformatics and Joint Centers for Systems Biology, Columbia University, New York, New York, USA.
96
Yeang CH, Vingron M. A joint model of regulatory and metabolic networks. BMC Bioinformatics 2006; 7:332. [PMID: 16820044 PMCID: PMC1559649 DOI: 10.1186/1471-2105-7-332] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 12/10/2005] [Accepted: 07/04/2006] [Indexed: 11/10/2022] Open
Abstract
Background Gene regulation and metabolic reactions are two primary activities of life. Although much work has been dedicated to studying each system, the coupling between them is less well understood. To bridge this gap, we propose a joint model of gene regulation and metabolic reactions. Results We integrate regulatory and metabolic networks by adding links specifying the feedback control from the substrates of metabolic reactions to enzyme gene expressions. We adopt two alternative approaches to build those links: inferring the links between metabolites and transcription factors to fit the data or explicitly encoding the general hypotheses of feedback control as links between metabolites and enzyme expressions. Perturbation data are explained by paths in the joint network if the predicted response along the paths is consistent with the observed response. The consistency requirement for explaining the perturbation data imposes constraints on the attributes in the network such as the functions of links and the activities of paths. We build a probabilistic graphical model over the attributes to specify these constraints, and apply an inference algorithm to identify the attribute values which optimally explain the data. The inferred models allow us to 1) identify the feedback links between metabolites and regulators and their functions, 2) identify the active paths responsible for relaying perturbation effects, 3) computationally test the general hypotheses pertaining to the feedback control of enzyme expressions, 4) evaluate the advantage of an integrated model over separate systems. Conclusion The modeling results provide insight about the mechanisms of the coupling between the two systems and possible "design rules" pertaining to enzyme gene regulation. The model can be used to investigate the less well-probed systems and generate consistent hypotheses and predictions for further validation.
Affiliation(s)
- Chen-Hsiang Yeang
- Center for Biomolecular Science & Engineering, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Martin Vingron
- Max-Planck Institute for Molecular Genetics, 73 Ihnerstraße, Berlin, Germany
97
Yeang CH, Jaakkola T. Modeling the combinatorial functions of multiple transcription factors. J Comput Biol 2006; 13:463-80. [PMID: 16597252 DOI: 10.1089/cmb.2006.13.463] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022] Open
Abstract
A considerable fraction of gene promoters are bound by multiple transcription factors. It is therefore important to understand how such factors interact in regulating the genes. In this paper, we propose a computational method to identify groups of co-regulated genes and the corresponding regulatory programs of multiple transcription factors from protein-DNA binding and gene expression data. The key concept is to characterize a regulatory program in terms of two properties of individual transcription factors: the function of a regulator as an activator or a repressor, and its direction of effectiveness as necessary or sufficient. We apply a greedy algorithm to find the regulatory models which best explain the available data. Empirical analysis indicates that the inferred regulatory models agree with known combinatorial interactions between regulators and are robust against various parameter choices.
Affiliation(s)
- Chen-Hsiang Yeang
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, 95064, USA.
98
Gat-Viks I, Tanay A, Raijman D, Shamir R. A probabilistic methodology for integrating knowledge and experiments on biological networks. J Comput Biol 2006; 13:165-81. [PMID: 16597233 DOI: 10.1089/cmb.2006.13.165] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 10/24/2022] Open
Abstract
Biological systems are traditionally studied by focusing on a specific subsystem, building an intuitive model for it, and refining the model using results from carefully designed experiments. Modern experimental techniques provide massive data on the global behavior of biological systems, and systematically using these large datasets for refining existing knowledge is a major challenge. Here we introduce an extended computational framework that combines formalization of existing qualitative models, probabilistic modeling, and integration of high-throughput experimental data. Using our methods, it is possible to interpret genomewide measurements in the context of prior knowledge on the system, to assign statistical meaning to the accuracy of such knowledge, and to learn refined models with improved fit to the experiments. Our model is represented as a probabilistic factor graph, and the framework accommodates partial measurements of diverse biological elements. We study the performance of several probabilistic inference algorithms and show that hidden model variables can be reliably inferred even in the presence of feedback loops and complex logic. We show how to refine prior knowledge on combinatorial regulatory relations using hypothesis testing and derive p-values for learned model features. We test our methodology and algorithms on a simulated model and on two real yeast models. In particular, we use our method to explore uncharacterized relations among regulators in the yeast response to hyper-osmotic shock and in the yeast lysine biosynthesis system. Our integrative approach to the analysis of biological regulation is demonstrated to synergistically combine qualitative and quantitative evidence into concrete biological predictions.
Affiliation(s)
- Irit Gat-Viks
- School of Computer Science, Tel-Aviv University, Israel.
99
Workman CT, Mak HC, McCuine S, Tagne JB, Agarwal M, Ozier O, Begley TJ, Samson LD, Ideker T. A systems approach to mapping DNA damage response pathways. Science 2006; 312:1054-9. [PMID: 16709784 PMCID: PMC2811083 DOI: 10.1126/science.1122088] [Citation(s) in RCA: 203] [Impact Index Per Article: 11.3] [Indexed: 01/28/2023]
Abstract
Failure of cells to respond to DNA damage is a primary event associated with mutagenesis and environmental toxicity. To map the transcriptional network controlling the damage response, we measured genomewide binding locations for 30 damage-related transcription factors (TFs) after exposure of yeast to methyl-methanesulfonate (MMS). The resulting 5272 TF-target interactions revealed extensive changes in the pattern of promoter binding and identified damage-specific binding motifs. As systematic functional validation, we identified interactions for which the target changed expression in wild-type cells in response to MMS but was nonresponsive in cells lacking the TF. Validated interactions were assembled into causal pathway models that provide global hypotheses of how signaling, transcription, and phenotype are integrated after damage.
Affiliation(s)
- H. Craig Mak
- University of California San Diego, La Jolla, CA 92093, USA
- Scott McCuine
- University of California San Diego, La Jolla, CA 92093, USA
- Jean-Bosco Tagne
- Whitehead Institute for Biomedical Research, Cambridge, MA 02139, USA
- Maya Agarwal
- University of California San Diego, La Jolla, CA 92093, USA
- Owen Ozier
- Whitehead Institute for Biomedical Research, Cambridge, MA 02139, USA
- Thomas J. Begley
- University of Albany–State University at New York, Rensselaer, NY 12144, USA
- Leona D. Samson
- Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Trey Ideker
- University of California San Diego, La Jolla, CA 92093, USA
- To whom correspondence should be addressed.
100
Barrett CL, Palsson BO. Iterative reconstruction of transcriptional regulatory networks: an algorithmic approach. PLoS Comput Biol 2006; 2:e52. [PMID: 16710450 PMCID: PMC1463018 DOI: 10.1371/journal.pcbi.0020052] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 07/26/2005] [Accepted: 04/05/2006] [Indexed: 11/25/2022] Open
Abstract
The number of complete, publicly available genome sequences is now greater than 200, and this number is expected to rapidly grow in the near future as metagenomic and environmental sequencing efforts escalate and the cost of sequencing drops. In order to make use of this data for understanding particular organisms and for discerning general principles about how organisms function, it will be necessary to reconstruct their various biochemical reaction networks. Principal among these will be transcriptional regulatory networks. Given the physical and logical complexity of these networks, the various sources of (often noisy) data that can be utilized for their elucidation, the monetary costs involved, and the huge number of potential experiments (~10^12) that can be performed, experiment design algorithms will be necessary for synthesizing the various computational and experimental data to maximize the efficiency of regulatory network reconstruction. This paper presents an algorithm for experimental design to systematically and efficiently reconstruct transcriptional regulatory networks. It is meant to be applied iteratively in conjunction with an experimental laboratory component. The algorithm is presented here in the context of reconstructing transcriptional regulation for metabolism in Escherichia coli, and, through a retrospective analysis with previously performed experiments, we show that the produced experiment designs conform to how a human would design experiments. The algorithm is able to utilize probability estimates based on a wide range of computational and experimental sources to suggest experiments with the highest potential of discovering the greatest amount of new regulatory knowledge.
In recent years, the exploration of life has been bolstered through the advent of whole genome sequencing. This new data source significantly enables the reconstruction of genome-scale metabolic networks. After a metabolic reconstruction, it will be necessary to discover the genetic control mechanisms that operate within an organism. Transcriptional regulatory network (TRN) reconstruction is costly both in terms of time and money, so it is critical that the reconstruction efforts be made as efficient as possible. Experiments must be designed so that the most new regulatory knowledge is discovered in each experiment. The huge number of possible experiments (~10^12) and the vast amount of heterogeneous data available for designing experiments overwhelms the human ability to assimilate. The authors have developed an algorithm that utilizes a mathematical model of a reconstructed metabolic network integrated with a partially reconstructed TRN to identify the experiment designs with the highest potential of yielding the most new regulatory knowledge. The authors show that the produced experiment designs are similar to those a human expert would produce, and that the algorithm has a facility to incorporate any relevant data source to design such experiments.
Affiliation(s)
- Christian L Barrett
- Bioengineering Department, University of California San Diego, La Jolla, California, United States of America
- Bernhard O Palsson
- Bioengineering Department, University of California San Diego, La Jolla, California, United States of America
- * To whom correspondence should be addressed. E-mail: