301
|
Song M, Ouyang Z, Liu ZL. Discrete dynamical system modelling for gene regulatory networks of 5-hydroxymethylfurfural tolerance for ethanologenic yeast. IET Syst Biol 2009; 3:203-18. [PMID: 19449980 DOI: 10.1049/iet-syb.2008.0089] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 01/19/2023]
Abstract
Composed of linear difference equations, a discrete dynamical system (DDS) model was designed to reconstruct transcriptional regulations in gene regulatory networks (GRNs) for ethanologenic yeast Saccharomyces cerevisiae in response to 5-hydroxymethylfurfural (HMF), a bioethanol conversion inhibitor. The modelling aims to identify a system of linear difference equations that represents temporal interactions among significantly expressed genes. Power stability is imposed on a system model under the normal condition in the absence of the inhibitor. Non-uniform sampling, typical in a time-course experimental design, is addressed by a log-time domain interpolation. A statistically significant DDS model of the yeast GRN, derived from time-course gene expression measurements under exposure to HMF, revealed several verified transcriptional regulation events. These events implicate Yap1 and Pdr3, transcription factors whose regulatory roles are consistently reported by other studies or postulated by independent sequence motif analysis, suggesting their involvement in yeast tolerance and detoxification of the inhibitor.
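The three ingredients named in this abstract (a linear difference equation, a stability condition, and log-time interpolation of unevenly spaced samples) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the stability check assumes that "power stability" means the powers of the system matrix stay bounded (which a spectral radius below 1 guarantees), and the closed-form eigenvalue formula only covers a 2x2 toy matrix.

```python
import cmath
import math


def log_time_interpolate(times, values, n_points):
    """Linearly interpolate samples onto a grid that is uniform in log(time).

    Assumes times are strictly increasing and positive, and n_points >= 2.
    """
    log_t = [math.log(t) for t in times]
    spacing = (log_t[-1] - log_t[0]) / (n_points - 1)
    grid = [log_t[0] + k * spacing for k in range(n_points)]
    out = []
    j = 0
    for g in grid:
        # Advance to the segment [log_t[j], log_t[j + 1]] that contains g.
        while j < len(log_t) - 2 and g > log_t[j + 1]:
            j += 1
        w = (g - log_t[j]) / (log_t[j + 1] - log_t[j])
        out.append((1 - w) * values[j] + w * values[j + 1])
    return [math.exp(g) for g in grid], out


def step(A, x):
    """One update of the linear difference equation x(t+1) = A x(t)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def spectral_radius_2x2(A):
    """Largest eigenvalue magnitude of a 2x2 matrix, from trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


# Toy two-gene system (values are illustrative, not from the paper).
A = [[0.5, 0.2], [0.1, 0.4]]
is_stable = spectral_radius_2x2(A) < 1.0
```

Interpolating in log-time gives evenly spaced points for designs that sample densely early and sparsely late, so a single matrix A can be applied at a constant step.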
Affiliation(s)
- M Song
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA.
|
302
|
Abstract
Organisms must continually adapt to changing cellular and environmental factors (e.g., oxygen levels) by altering their gene expression patterns. At the same time, all organisms must have stable gene expression patterns that are robust to small fluctuations in environmental factors and genetic variation. Learning and characterizing the structure and dynamics of Regulatory Networks (RNs), on a whole-genome scale, is a key problem in systems biology. Here, we review the challenges associated with inferring RNs in a solely data-driven manner and concisely discuss the implications and contingencies of possible procedures that can be used, focusing specifically on one such procedure, the Inferelator. Importantly, the Inferelator explicitly models the temporal component of regulation, can learn the interactions between transcription factors and environmental factors, and attaches a statistically meaningful weight to every edge. The result of the Inferelator is a dynamical model of the RN that can be used to model the time-evolution of cell state.
|
303
|
Prevalence of transcription promoters within archaeal operons and coding sequences. Mol Syst Biol 2009; 5:285. [PMID: 19536208 PMCID: PMC2710873 DOI: 10.1038/msb.2009.42] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 11/20/2008] [Accepted: 05/13/2009] [Indexed: 01/21/2023]
Abstract
Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.
|
304
|
Michoel T, De Smet R, Joshi A, Van de Peer Y, Marchal K. Comparative analysis of module-based versus direct methods for reverse-engineering transcriptional regulatory networks. BMC Systems Biology 2009; 3:49. [PMID: 19422680 PMCID: PMC2684101 DOI: 10.1186/1752-0509-3-49] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 08/21/2008] [Accepted: 05/07/2009] [Indexed: 12/20/2022]
Abstract
BACKGROUND: A myriad of methods to reverse-engineer transcriptional regulatory networks have been developed in recent years. Direct methods directly reconstruct a network of pairwise regulatory interactions while module-based methods predict a set of regulators for modules of coexpressed genes treated as a single unit. To date, there has been no systematic comparison of the relative strengths and weaknesses of both types of methods. RESULTS: We have compared a recently developed module-based algorithm, LeMoNe (Learning Module Networks), to a mutual information based direct algorithm, CLR (Context Likelihood of Relatedness), using benchmark expression data and databases of known transcriptional regulatory interactions for Escherichia coli and Saccharomyces cerevisiae. A global comparison using recall versus precision curves hides the topologically distinct nature of the inferred networks and is not informative about the specific subtasks for which each method is most suited. Analysis of the degree distributions and a regulator specific comparison show that CLR is 'regulator-centric', making true predictions for a higher number of regulators, while LeMoNe is 'target-centric', recovering a higher number of known targets for fewer regulators, with limited overlap in the predicted interactions between both methods. Detailed biological examples in E. coli and S. cerevisiae are used to illustrate these differences and to prove that each method is able to infer parts of the network where the other fails. Biological validation of the inferred networks cautions against over-interpreting recall and precision values computed using incomplete reference networks. CONCLUSION: Our results indicate that module-based and direct methods retrieve largely distinct parts of the underlying transcriptional regulatory networks. The choice of algorithm should therefore be based on the particular biological problem of interest and not on global metrics which cannot be transferred between organisms. The development of sound statistical methods for integrating the predictions of different reverse-engineering strategies emerges as an important challenge for future research.
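The recall versus precision comparison discussed in this abstract can be sketched as follows. This is an illustrative helper with a hypothetical name and toy edge labels, not the evaluation code used in the study; it assumes predicted edges are already ranked by confidence and that edges are (regulator, target) tuples.

```python
def precision_recall(ranked_edges, reference_edges):
    """Return (precision, recall) after each prediction in a ranked edge list."""
    reference = set(reference_edges)
    hits = 0
    curve = []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in reference:
            hits += 1
        curve.append((hits / k, hits / len(reference)))
    return curve


# Toy example: two of the three predictions appear in the reference network.
predicted = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g2")]
known = [("tfA", "g1"), ("tfB", "g2"), ("tfC", "g3")]
curve = precision_recall(predicted, known)
```

A prediction that is absent from the reference set counts as a false positive here even if it is biologically correct, which is why values computed against an incomplete reference network should be read with caution.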
Affiliation(s)
- Tom Michoel
- Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Gent, Belgium.
|
305
|
Watkinson J, Liang KC, Wang X, Zheng T, Anastassiou D. Inference of regulatory gene interactions from expression data using three-way mutual information. Ann N Y Acad Sci 2009; 1158:302-13. [PMID: 19348651 DOI: 10.1111/j.1749-6632.2008.03757.x] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Indexed: 11/29/2022]
Abstract
This paper describes the technique designated best performer in the 2nd conference on Dialogue for Reverse Engineering Assessments and Methods (DREAM2) Challenge 5 (unsigned genome-scale network prediction from blinded microarray data). Existing algorithms use the pairwise correlations of the expression levels of genes, which provide valuable but insufficient information for the inference of regulatory interactions. Here we present a computational approach based on the recently developed context likelihood of relatedness (CLR) algorithm, extracting additional complementary information using the information theoretic measure of synergy and assigning a score to each ordered pair of genes measuring the degree of confidence that the first gene regulates the second. When tested on a set of publicly available Escherichia coli gene-expression data with an assumed ground truth, the synergy-augmented CLR (SA-CLR) algorithm had significantly improved prediction performance when compared to CLR. There is also enhanced potential for biological discovery as a result of the identification of the most likely synergistic partner genes involved in the interactions.
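To make the synergy measure concrete, the sketch below computes it for discretized toy data, assuming the common definition Syn(A, B; C) = I(A, B; C) - I(A; C) - I(B; C). It does not reproduce the SA-CLR scoring itself; the function names are hypothetical and the plug-in entropy estimate is only suitable for illustration.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of hashable observations."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def synergy(a, b, c):
    """Information about c carried jointly by (a, b) beyond their individual parts."""
    joint_ab = list(zip(a, b))
    return (mutual_information(joint_ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


# XOR-like toy target: neither input alone is informative, the pair is.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 1, 1, 0]
pairwise = mutual_information(a, c)   # 0 bits
joint_gain = synergy(a, b, c)         # 1 bit
```

The XOR case shows why pairwise measures can miss an interaction entirely: each input has zero mutual information with the target, yet the pair determines it completely.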
|
306
|
Michoel T, De Smet R, Joshi A, Marchal K, Van de Peer Y. Reverse-engineering transcriptional modules from gene expression data. Ann N Y Acad Sci 2009; 1158:36-43. [PMID: 19348630 DOI: 10.1111/j.1749-6632.2008.03943.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
"Module networks" are a framework to learn gene regulatory networks from expression data using a probabilistic model in which coregulated genes share the same parameters and conditional distributions. We present a method to infer ensembles of such networks and an averaging procedure to extract the statistically most significant modules and their regulators. We show that the inferred probabilistic models extend beyond the dataset used to learn the models.
Affiliation(s)
- Tom Michoel
- Department of Plant Systems Biology, VIB, Gent, Belgium.
|
307
|
Lemmens K, De Bie T, Dhollander T, De Keersmaecker SC, Thijs IM, Schoofs G, De Weerdt A, De Moor B, Vanderleyden J, Collado-Vides J, Engelen K, Marchal K. DISTILLER: a data integration framework to reveal condition dependency of complex regulons in Escherichia coli. Genome Biol 2009; 10:R27. [PMID: 19265557 PMCID: PMC2690998 DOI: 10.1186/gb-2009-10-3-r27] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 10/16/2008] [Revised: 01/15/2009] [Accepted: 03/06/2009] [Indexed: 11/13/2022]
Abstract
DISTILLER, a data integration framework for the inference of transcriptional module networks, is presented and used to investigate the condition dependency and modularity in Escherichia coli networks. Experimental validation of predicted targets for the well-studied fumarate nitrate reductase regulator showed the effectiveness of our approach in Escherichia coli. In addition, the condition dependency and modularity of the inferred transcriptional network were studied. Surprisingly, the level of regulatory complexity seemed lower than that which would be expected from RegulonDB, indicating that complex regulatory programs tend to decrease the degree of modularity.
Affiliation(s)
- Karen Lemmens
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
|
308
|
Bickel DR, Montazeri Z, Hsieh PC, Beatty M, Lawit SJ, Bate NJ. Gene network reconstruction from transcriptional dynamics under kinetic model uncertainty: a case for the second derivative. Bioinformatics 2009; 25:772-9. [PMID: 19218351 PMCID: PMC2654806 DOI: 10.1093/bioinformatics/btp028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
Motivation: Measurements of gene expression over time enable the reconstruction of transcriptional networks. However, Bayesian networks and many other current reconstruction methods rely on assumptions that conflict with the differential equations that describe transcriptional kinetics. Practical approximations of kinetic models would enable inferring causal relationships between genes from expression data of microarray, tag-based and conventional platforms, but conclusions are sensitive to the assumptions made. Results: The representation of a sufficiently large portion of genome enables computation of an upper bound on how much confidence one may place in influences between genes on the basis of expression data. Information about which genes encode transcription factors is not necessary but may be incorporated if available. The methodology is generalized to cover cases in which expression measurements are missing for many of the genes that might control the transcription of the genes of interest. The assumption that the gene expression level is roughly proportional to the rate of translation led to better empirical performance than did either the assumption that the gene expression level is roughly proportional to the protein level or the Bayesian model average of both assumptions. Availability: http://www.oisb.ca points to R code implementing the methods (R Development Core Team 2004). Contact: dbickel@uottawa.ca Supplementary information: http://www.davidbickel.com
Affiliation(s)
- David R Bickel
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, ON K1H 8M5, Canada.
|
309
|
Carrera J, Rodrigo G, Jaramillo A. Model-based redesign of global transcription regulation. Nucleic Acids Res 2009; 37:e38. [PMID: 19188257 PMCID: PMC2655681 DOI: 10.1093/nar/gkp022] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
Synthetic biology aims at the design or redesign of biological systems. In particular, one possible goal could be the rewiring of the transcription regulation network by exchanging the endogenous promoters. To achieve this objective, we have adapted current methods to the inference of a model based on ordinary differential equations that is able to predict the network response after a major change in its topology. Our procedure utilizes microarray data for training. We have experimentally validated our inferred global regulatory model in Escherichia coli by predicting transcriptomic profiles under new perturbations. We have also tested our methodology in silico by providing accurate predictions of the underlying networks from expression data generated with artificial genomes. In addition, we have shown the predictive power of our methodology by obtaining the gene profile in experimental redesigns of the E. coli genome, where the transcriptional network was rewired by means of knockouts of master regulators or by upregulating transcription factors controlled by different promoters. Our approach is compatible with most network inference methods, allowing computational exploration of future genome-wide redesign experiments in synthetic biology.
Affiliation(s)
- Javier Carrera
- Instituto de Biologia Molecular y Celular de Plantas, CSIC, Instituto de Aplicaciones en Tecnologias de la Informacion y las Comunicaciones Avanzadas, Universidad Politecnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
|
310
|
Affiliation(s)
- Debopriya Das
- Life Sciences Division, Ernest O Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
|
311
|
Abstract
In this study, a reverse-engineering strategy was used to infer and analyze the structure and function of an aging and glucose repressed gene regulatory network in the budding yeast Saccharomyces cerevisiae. The method uses transcriptional perturbations to model the functional interactions between genes as a system of first-order ordinary differential equations. The resulting network model correctly identified the known interactions of key regulators in a 10-gene network from the Snf1 signaling pathway, which is required for expression of glucose-repressed genes upon calorie restriction. The majority of interactions predicted by the network model were confirmed using promoter-reporter gene fusions in gene-deletion mutants and chromatin immunoprecipitation experiments, revealing a more complex network architecture than previously appreciated. The reverse-engineered network model also predicted an unexpected role for transcriptional regulation of the SNF1 gene by hexose kinase enzyme/transcriptional repressor Hxk2, Mediator subunit Med8, and transcriptional repressor Mig1. These interactions were validated experimentally and used to design new experiments demonstrating Snf1 and its transcriptional regulators Hxk2 and Mig1 as modulators of chronological lifespan. This work demonstrates the value of using network inference methods to identify and characterize the regulators of complex phenotypes, such as aging.
|
312
|
Joshi A, De Smet R, Marchal K, Van de Peer Y, Michoel T. Module networks revisited: computational assessment and prioritization of model predictions. Bioinformatics 2009; 25:490-6. [PMID: 19136553 DOI: 10.1093/bioinformatics/btn658] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Indexed: 02/07/2023]
Abstract
MOTIVATION: The solution of high-dimensional inference and prediction problems in computational biology is almost always a compromise between mathematical theory and practical constraints, such as limited computational resources. As time progresses, computational power increases but well-established inference methods often remain locked in their initial suboptimal solution. RESULTS: We revisit the approach of Segal et al. to infer regulatory modules and their condition-specific regulators from gene expression data. In contrast to their direct optimization-based solution, we use a more representative centroid-like solution extracted from an ensemble of possible statistical models to explain the data. The ensemble method automatically selects a subset of most informative genes and builds a quantitatively better model for them. Genes which cluster together in the majority of models produce functionally more coherent modules. Regulators which are consistently assigned to a module are more often supported by literature, but a single model always contains many regulator assignments not supported by the ensemble. Reliably detecting condition-specific or combinatorial regulation is particularly hard in a single optimum but can be achieved using ensemble averaging. AVAILABILITY: All software developed for this study is available from http://bioinformatics.psb.ugent.be/software.
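The idea of keeping genes that cluster together in the majority of models can be sketched with a simple co-clustering frequency. This is a simplified stand-in for the centroid-like solution the abstract mentions, not the authors' procedure; the function names, the dictionary representation of a clustering, and the 0.5 majority threshold are all assumptions made for illustration.

```python
from itertools import combinations


def coclustering_frequency(ensemble):
    """Fraction of clusterings in which each gene pair shares a cluster.

    Each clustering is a dict mapping gene name to cluster label; all
    clusterings are assumed to cover the same genes.
    """
    counts = {}
    for clustering in ensemble:
        for g, h in combinations(sorted(clustering), 2):
            if clustering[g] == clustering[h]:
                counts[(g, h)] = counts.get((g, h), 0) + 1
    return {pair: n / len(ensemble) for pair, n in counts.items()}


def consensus_pairs(ensemble, threshold=0.5):
    """Gene pairs that co-cluster in more than `threshold` of the models."""
    freq = coclustering_frequency(ensemble)
    return sorted(pair for pair, f in freq.items() if f > threshold)


# Three toy models: g1 and g2 always co-cluster, g3 only occasionally joins them.
ensemble = [
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 0, "g2": 0, "g3": 0},
    {"g1": 1, "g2": 1, "g3": 0},
]
stable = consensus_pairs(ensemble)
```

Cluster labels are compared only within a model, so the arbitrary numbering of clusters in different models does not affect the result.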
Affiliation(s)
- Anagha Joshi
- Department of Plant Systems Biology, VIB, Ghent University, Technologiepark 927, B-9052 Gent, Belgium
|
313
|
Abstract
Attaining a detailed understanding of the various biological networks in an organism lies at the core of the emerging discipline of systems biology. A precise description of the relationships formed between genes, mRNA molecules, and proteins is a necessary step toward a complete description of the dynamic behavior of an organism at the cellular level, and toward intelligent, efficient, and directed modification of an organism. The importance of understanding such regulatory, signaling, and interaction networks has fueled the development of numerous in silico inference algorithms, as well as new experimental techniques and a growing collection of public databases. The Software Environment for BIological Network Inference (SEBINI) has been created to provide an interactive environment for the deployment, evaluation, and improvement of algorithms used to reconstruct the structure of biological regulatory and interaction networks. SEBINI can be used to analyze high-throughput gene expression, protein abundance, or protein activation data via a suite of state-of-the-art network inference algorithms. It also allows algorithm developers to compare and train network inference methods on artificial networks and simulated gene expression perturbation data. SEBINI can therefore be used by software developers wishing to evaluate, refine, or combine inference techniques, as well as by bioinformaticians analyzing experimental data. Networks inferred from the SEBINI software platform can be further analyzed using the Collective Analysis of Biological Interaction Networks (CABIN) tool, which is exploratory data analysis software that enables integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources. The collection of edges in a public database, along with the confidence held in each edge (if available), can be fed into CABIN as one "evidence network," using the Cytoscape SIF file format. Using CABIN, one may increase the confidence in individual edges in a network inferred by an algorithm in SEBINI, as well as extend such a network by combining it with species-specific or generic information, e.g., known protein-protein interactions or target genes identified for known transcription factors. Thus, the combined SEBINI-CABIN toolkit aids in the more accurate reconstruction of biological networks, with less effort, in less time. A demonstration web site for SEBINI can be accessed from https://www.emsl.pnl.gov/SEBINI/RootServlet. Source code and PostgreSQL database schema are available under open source license. Contact: ronald.taylor@pnl.gov. For commercial use, some algorithms included in SEBINI require licensing from the original developers. CABIN can be downloaded from http://www.sysbio.org/dataresources/cabin.stm. Contact: mudita.singhal@pnl.gov.
Affiliation(s)
- Ronald Taylor
- Computational Biology and Bioinformatics Group, Computational and Informational Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA.
|
314
|
Gene regulatory network inference: data integration in dynamic models-a review. Biosystems 2008; 96:86-103. [PMID: 19150482 DOI: 10.1016/j.biosystems.2008.12.004] [Citation(s) in RCA: 404] [Impact Index Per Article: 25.3] [Received: 03/12/2008] [Revised: 11/05/2008] [Accepted: 12/09/2008] [Indexed: 12/19/2022]
Abstract
Systems biology aims to develop mathematical models of biological systems by integrating experimental and theoretical techniques. During the last decade, many systems biological approaches based on genome-wide data have been developed to unravel the complexity of gene regulation. This review deals with the reconstruction of gene regulatory networks (GRNs) from experimental data through computational methods. Standard GRN inference methods primarily use gene expression data derived from microarrays. However, the incorporation of additional information from heterogeneous data sources, e.g. genome sequence and protein-DNA interaction data, clearly supports the network inference process. This review focuses on promising modelling approaches that use such diverse types of molecular biological information. In particular, approaches are discussed that enable the modelling of the dynamics of gene regulatory systems. The review provides an overview of common modelling schemes and learning algorithms and outlines current challenges in GRN modelling.
|
315
|
Abstract
Learning regulatory networks from genomics data is an important problem with applications spanning all of biology and biomedicine. Functional genomics projects offer a cost-effective means of greatly expanding the completeness of our regulatory models, and for some prokaryotic organisms they offer a means of learning accurate models that incorporate the majority of the genome. There are, however, several reasons to believe that regulatory network inference is beyond our current reach, such as (i) the combinatorics of the problem, (ii) factors we can't (or don't often) collect genome-wide measurements for and (iii) dynamics that elude cost-effective experimental designs. Recent works have demonstrated the ability to reconstruct large fractions of prokaryotic regulatory networks from compendiums of genomics data; they have also demonstrated that these global regulatory models can be used to predict the dynamics of the transcriptome. We review an overall strategy for the reconstruction of global networks based on these results in microbial systems.
|
316
|
Ortega F, Sameith K, Turan N, Compton R, Trevino V, Vannucci M, Falciani F. Models and computational strategies linking physiological response to molecular networks from large-scale data. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2008; 366:3067-3089. [PMID: 18559319 DOI: 10.1098/rsta.2008.0085] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/26/2023]
Abstract
An important area of research in systems biology involves the analysis and integration of genome-wide functional datasets. In this context, a major goal is the identification of a putative molecular network controlling physiological response from experimental data. With very fragmentary mechanistic information, this is a challenging task. A number of methods have been developed, each one with the potential to address an aspect of the problem. Here, we review some of the most widely used methodologies and report new results in support of the usefulness of modularization and other modelling techniques in identifying components of the molecular networks that are predictive of physiological response. We also discuss how system identification in biology could be approached, using a combination of methodologies that aim to reconstruct the relationship between molecular pathways and physiology at different levels of the organizational complexity of the molecular network.
Affiliation(s)
- Fernando Ortega
- School of Biosciences and IBR, University of Birmingham, Birmingham B15 2TT, UK
|
317
|
Cosgrove EJ, Zhou Y, Gardner TS, Kolaczyk ED. Predicting gene targets of perturbations via network-based filtering of mRNA expression compendia. Bioinformatics 2008; 24:2482-90. [PMID: 18779235 DOI: 10.1093/bioinformatics/btn476] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION: DNA microarrays are routinely applied to study diseased or drug-treated cell populations. A critical challenge is distinguishing the genes directly affected by these perturbations from the hundreds of genes that are indirectly affected. Here, we developed a sparse simultaneous equation model (SSEM) of mRNA expression data and applied Lasso regression to estimate the model parameters, thus constructing a network model of gene interaction effects. This inferred network model was then used to filter data from a given experimental condition of interest and predict the genes directly targeted by that perturbation. RESULTS: Our proposed SSEM-Lasso method demonstrated substantial improvement in sensitivity compared with other tested methods for predicting the targets of perturbations in both simulated datasets and microarray compendia. In simulated data, for two different network types, and over a wide range of signal-to-noise ratios, our algorithm demonstrated a 167% increase in sensitivity on average for the top 100 ranked genes, compared with the next best method. Our method also performed well in identifying targets of genetic perturbations in microarray compendia, with up to a 24% improvement in sensitivity on average for the top 100 ranked genes. The overall performance of our network-filtering method shows promise for identifying the direct targets of genetic dysregulation in cancer and disease from expression profiles. AVAILABILITY: Microarray data are available at the Many Microbe Microarrays Database (M3D, http://m3d.bu.edu). Algorithm scripts are available at the Gardner Lab website (http://gardnerlab.bu.edu/SSEMLasso).
Affiliation(s)
- Elissa J Cosgrove
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
|
318
|
Models from experiments: combinatorial drug perturbations of cancer cells. Mol Syst Biol 2008; 4:216. [PMID: 18766176 PMCID: PMC2564730 DOI: 10.1038/msb.2008.53] [Citation(s) in RCA: 139] [Impact Index Per Article: 8.7] [Received: 12/11/2007] [Accepted: 07/14/2008] [Indexed: 12/26/2022]
Abstract
We present a novel method for deriving network models from molecular profiles of perturbed cellular systems. The network models aim to predict quantitative outcomes of combinatorial perturbations, such as drug pair treatments or multiple genetic alterations. Mathematically, we represent the system by a set of nodes, representing molecular concentrations or cellular processes, a perturbation vector and an interaction matrix. After perturbation, the system evolves in time according to differential equations with built-in nonlinearity, similar to Hopfield networks, capable of representing epistasis and saturation effects. For a particular set of experiments, we derive the interaction matrix by minimizing a composite error function, aiming at accuracy of prediction and simplicity of network structure. To evaluate the predictive potential of the method, we performed 21 drug pair treatment experiments in a human breast cancer cell line (MCF7) with observation of phospho-proteins and cell cycle markers. The best derived network model rediscovered known interactions and contained interesting predictions. Possible applications include the discovery of regulatory interactions, the design of targeted combination therapies and the engineering of molecular biological networks.
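The ingredients listed in this abstract (nodes, a perturbation vector, an interaction matrix, and Hopfield-like nonlinear dynamics) can be illustrated with a small simulation. The specific equation below, dx_i/dt = beta_i * tanh(sum_j W_ij x_j + u_i) - alpha_i * x_i, is an assumed form chosen for illustration and is not taken from the abstract; the function name, the parameters alpha and beta, and the forward Euler integrator are likewise assumptions.

```python
import math


def simulate(W, u, alpha, beta, x0, dt=0.01, n_steps=2000):
    """Integrate the assumed saturating network dynamics with forward Euler.

    W     interaction matrix (list of rows); W[i][j] is the effect of node j on node i
    u     perturbation vector, one entry per node
    alpha per-node decay rates (assumed parameter)
    beta  per-node response amplitudes (assumed parameter)
    x0    initial state, measured relative to the unperturbed system
    """
    x = list(x0)
    for _ in range(n_steps):
        drive = [sum(w * xj for w, xj in zip(row, x)) + ui
                 for row, ui in zip(W, u)]
        x = [xi + dt * (b * math.tanh(d) - a * xi)
             for xi, d, a, b in zip(x, drive, alpha, beta)]
    return x


# Two nodes: node 0 receives a perturbation and activates node 1.
W = [[0.0, 0.0], [1.5, 0.0]]
single = simulate(W, [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
double = simulate(W, [2.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
```

Because tanh saturates, doubling the perturbation in this toy system less than doubles the steady-state response, which is the kind of non-additive behaviour a purely linear model cannot represent.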
|
319
|
Murali TM, Rivera CG. Network Legos: Building Blocks of Cellular Wiring Diagrams. J Comput Biol 2008; 15:829-44. [DOI: 10.1089/cmb.2007.0139] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Affiliation(s)
- T. M. Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA
- Corban G. Rivera
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA
|
320
|
Knijnenburg TA, Wessels LFA, Reinders MJT. Combinatorial influence of environmental parameters on transcription factor activity. Bioinformatics 2008; 24:i172-81. [PMID: 18586711 PMCID: PMC2718633 DOI: 10.1093/bioinformatics/btn155] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
Motivation: Cells receive a wide variety of environmental signals, which are often processed combinatorially to generate specific genetic responses. Changes in transcript levels, as observed across different environmental conditions, can, to a large extent, be attributed to changes in the activity of transcription factors (TFs). However, in unraveling these transcription regulation networks, the actual environmental signals are often not incorporated into the model, simply because they have not been measured. The unquantified heterogeneity of the environmental parameters across microarray experiments frustrates regulatory network inference. Results: We propose an inference algorithm that models the influence of environmental parameters on gene expression. The approach is based on a yeast microarray compendium of chemostat steady-state experiments. Chemostat cultivation enables the accurate control and measurement of many of the key cultivation parameters, such as nutrient concentrations, growth rate and temperature. The observed transcript levels are explained by inferring the activity of TFs in response to combinations of cultivation parameters. The interplay between activated enhancers and repressors that bind a gene promoter determines the possible up- or downregulation of the gene. The model is translated into a linear integer optimization problem. The resulting regulatory network identifies the combinatorial effects of environmental parameters on TF activity and gene expression. Availability: The Matlab code is available from the authors upon request. Contact: t.a.knijnenburg@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- T A Knijnenburg
- Information and Communication Theory Group, Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands.
321
Van PT, Schmid AK, King NL, Kaur A, Pan M, Whitehead K, Koide T, Facciotti MT, Goo YA, Deutsch EW, Reiss DJ, Mallick P, Baliga NS. Halobacterium salinarum NRC-1 PeptideAtlas: toward strategies for targeted proteomics and improved proteome coverage. J Proteome Res 2008; 7:3755-64. [PMID: 18652504 DOI: 10.1021/pr800031f] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]
Abstract
The relatively small numbers of proteins and fewer possible post-translational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a PeptideAtlas (PA) covering 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636 000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has highlighted plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low abundance of proteins as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.
Affiliation(s)
- Phu T Van
- Institute for Systems Biology, 1441 North 34th Street, Seattle, Washington 98103, USA
322
Hood L, Rowen L, Galas DJ, Aitchison JD. Systems biology at the Institute for Systems Biology. Briefings in Functional Genomics and Proteomics 2008; 7:239-48. [PMID: 18579616 DOI: 10.1093/bfgp/eln027] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 01/07/2023]
Abstract
Systems biology represents an experimental approach to biology that attempts to study biological systems in a holistic rather than an atomistic manner. Ideally this involves gathering dynamic and global data sets as well as phenotypic data from different levels of the biological information hierarchy, integrating them and modeling them graphically and/or mathematically to generate mechanistic explanations for the emergent systems properties. This requires that the biological frontiers drive the development of new measurement and visualization technologies and the pioneering of new computational and mathematical tools, all of which requires a cross-disciplinary environment composed of biologists, chemists, computer scientists, engineers, mathematicians, physicists, and physicians speaking common discipline languages. The Institute for Systems Biology has aspired to pioneer and seamlessly integrate each of these concepts.
Affiliation(s)
- Leroy Hood
- Institute for Systems Biology, Seattle, WA 98103, USA
323
Bertin PN, Médigue C, Normand P. Advances in environmental genomics: towards an integrated view of micro-organisms and ecosystems. Microbiology-SGM 2008; 154:347-359. [PMID: 18227239 DOI: 10.1099/mic.0.2007/011791-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 12/18/2022]
Abstract
Microbial genome sequencing has, for the first time, made accessible all the components needed for both the elaboration and the functioning of a cell. Associated with other global methods such as protein and mRNA profiling, genomics has considerably extended our knowledge of physiological processes and their diversity not only in human, animal and plant pathogens but also in environmental isolates. At a higher level of complexity, the so-called meta approaches have recently shown great promise in investigating microbial communities, including uncultured micro-organisms. Combined with classical methods of physico-chemistry and microbiology, these endeavours should provide us with an integrated view of how micro-organisms adapt to particular ecological niches and participate in the dynamics of ecosystems.
Affiliation(s)
- Philippe N Bertin
- Génétique Moléculaire, Génomique et Microbiologie, Université Louis Pasteur, UMR7156 CNRS, Strasbourg, France
- Philippe Normand
- Ecologie Microbienne, Université Claude Bernard - Lyon 1, UMR5557 CNRS, Villeurbanne, France
324
Ramsey SA, Klemm SL, Zak DE, Kennedy KA, Thorsson V, Li B, Gilchrist M, Gold ES, Johnson CD, Litvak V, Navarro G, Roach JC, Rosenberger CM, Rust AG, Yudkovsky N, Aderem A, Shmulevich I. Uncovering a macrophage transcriptional program by integrating evidence from motif scanning and expression dynamics. PLoS Comput Biol 2008; 4:e1000021. [PMID: 18369420 PMCID: PMC2265556 DOI: 10.1371/journal.pcbi.1000021] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.9] [Received: 07/27/2007] [Accepted: 02/04/2008] [Indexed: 01/04/2023]
Abstract
Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation.
Macrophages play a vital role in host defense against infection by recognizing pathogens through pattern recognition receptors, such as the Toll-like receptors (TLRs), and mounting an immune response. Stimulation of TLRs initiates a complex transcriptional program in which induced transcription factor genes dynamically regulate downstream genes. Microarray-based transcriptional profiling has proved useful for mapping such transcriptional programs in simpler model organisms; however, mammalian systems present difficulties such as post-translational regulation of transcription factors, combinatorial gene regulation, and a paucity of available gene-knockout expression data. Additional evidence sources, such as DNA sequence-based identification of transcription factor binding sites, are needed. In this work, we computationally inferred a transcriptional network for TLR-stimulated murine macrophages. Our approach combined sequence scanning with time-course expression data in a probabilistic framework. Expression data were analyzed using the time-lagged correlation. A novel, unbiased method was developed to assess the significance of the time-lagged correlation. The inferred network of associations between transcription factor genes and co-expressed gene clusters was validated with targeted ChIP-on-chip experiments, and yielded insights into the macrophage activation program, including a potential novel regulator. Our general approach could be used to analyze other complex mammalian systems for which time-course expression data are available.
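The time-lagged correlation mentioned in this abstract can be sketched as follows. This is a generic illustration that assumes uniformly sampled time courses and a plain Pearson correlation; the function names and the synthetic profiles are hypothetical, and the paper's significance test is not reproduced here.

```python
import numpy as np

def lagged_correlation(tf, target, lag):
    """Pearson correlation between tf[t] and target[t + lag], with lag >= 0 in samples."""
    tf = np.asarray(tf, dtype=float)
    target = np.asarray(target, dtype=float)
    if lag > 0:
        tf, target = tf[:-lag], target[lag:]
    return float(np.corrcoef(tf, target)[0, 1])

def best_lag(tf, target, max_lag):
    """Return the (lag, correlation) pair with the largest absolute correlation."""
    scores = [(lag, lagged_correlation(tf, target, lag)) for lag in range(max_lag + 1)]
    return max(scores, key=lambda s: abs(s[1]))

# Synthetic example: the target repeats the regulator's profile two samples later.
t = np.arange(20)
tf_profile = np.sin(0.5 * t)
target_profile = np.sin(0.5 * (t - 2))

lag, r = best_lag(tf_profile, target_profile, max_lag=4)
print(lag, r)
```

A high correlation at a positive lag is only suggestive of a regulator acting on a target, which is why the abstract pairs it with motif scanning and a dedicated significance test.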
Affiliation(s)
- Stephen A. Ramsey
- Institute for Systems Biology, Seattle, Washington, United States of America
- * E-mail: (SR); (AA); (IS)
- Sandy L. Klemm
- Institute for Systems Biology, Seattle, Washington, United States of America
- Daniel E. Zak
- Institute for Systems Biology, Seattle, Washington, United States of America
- Kathleen A. Kennedy
- Institute for Systems Biology, Seattle, Washington, United States of America
- Vesteinn Thorsson
- Institute for Systems Biology, Seattle, Washington, United States of America
- Bin Li
- Institute for Systems Biology, Seattle, Washington, United States of America
- Mark Gilchrist
- Institute for Systems Biology, Seattle, Washington, United States of America
- Elizabeth S. Gold
- Institute for Systems Biology, Seattle, Washington, United States of America
- Carrie D. Johnson
- Institute for Systems Biology, Seattle, Washington, United States of America
- Vladimir Litvak
- Institute for Systems Biology, Seattle, Washington, United States of America
- Garnet Navarro
- Institute for Systems Biology, Seattle, Washington, United States of America
- Jared C. Roach
- Institute for Systems Biology, Seattle, Washington, United States of America
- Alistair G. Rust
- Institute for Systems Biology, Seattle, Washington, United States of America
- Natalya Yudkovsky
- Institute for Systems Biology, Seattle, Washington, United States of America
- Alan Aderem
- Institute for Systems Biology, Seattle, Washington, United States of America
- * E-mail: (SR); (AA); (IS)
- Ilya Shmulevich
- Institute for Systems Biology, Seattle, Washington, United States of America
- * E-mail: (SR); (AA); (IS)
325
A predictive model for transcriptional control of physiology in a free living cell. Cell 2008; 131:1354-65. [PMID: 18160043 DOI: 10.1016/j.cell.2007.10.053] [Citation(s) in RCA: 256] [Impact Index Per Article: 16.0] [Received: 08/10/2007] [Revised: 09/27/2007] [Accepted: 10/31/2007] [Indexed: 12/18/2022]
Abstract
The environment significantly influences the dynamic expression and assembly of all components encoded in the genome of an organism into functional biological networks. We have constructed a model for this process in Halobacterium salinarum NRC-1 through the data-driven discovery of regulatory and functional interrelationships among approximately 80% of its genes and key abiotic factors in its hypersaline environment. Using relative changes in 72 transcription factors and 9 environmental factors (EFs) this model accurately predicts dynamic transcriptional responses of all these genes in 147 newly collected experiments representing completely novel genetic backgrounds and environments, suggesting a remarkable degree of network completeness. Using this model we have constructed and tested hypotheses critical to this organism's interaction with its changing hypersaline environment. This study supports the claim that the high degree of connectivity within biological and EF networks will enable the construction of similar models for any organism from relatively modest numbers of experiments.
326
Hood L. A personal journey of discovery: developing technology and changing biology. Annual Review of Analytical Chemistry (Palo Alto, Calif.) 2008; 1:1-43. [PMID: 20636073 DOI: 10.1146/annurev.anchem.1.031207.113113] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 05/19/2023]
Abstract
This autobiographical article describes my experiences in developing chemically based, biological technologies for deciphering biological information: DNA, RNA, proteins, interactions, and networks. The instruments developed include protein and DNA sequencers and synthesizers, as well as ink-jet technology for synthesizing DNA chips. Diverse new strategies for doing biology also arose from novel applications of these instruments. The functioning of these instruments can be integrated to generate powerful new approaches to cloning and characterizing genes from a small amount of protein sequence or to using gene sequences to synthesize peptide fragments so as to characterize various properties of the proteins. I also discuss the five paradigm changes in which I have participated: the development and integration of biological instrumentation; the human genome project; cross-disciplinary biology; systems biology; and predictive, personalized, preventive, and participatory (P4) medicine. Finally, I discuss the origins, the philosophy, some accomplishments, and the future trajectories of the Institute for Systems Biology.
Affiliation(s)
- Lee Hood
- Institute for Systems Biology, Seattle, Washington 98103, USA.
327
Capobianco E. Model validation for gene selection and regulation maps. Funct Integr Genomics 2007; 8:87-99. [PMID: 18064499 DOI: 10.1007/s10142-007-0066-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/05/2007] [Revised: 10/09/2007] [Accepted: 10/14/2007] [Indexed: 10/22/2022]
Abstract
Consider the problem of investigating the structure of a set of sample points in a very high dimensional (Euclidean) space. This case is paradigmatic, for instance, in postgenomic applications. The high dimensionality and small sample size make statistical inference and optimization difficult problems, such that selecting a model or choosing a learning algorithm face the evidence that currently no consensus guidelines exist. Usually, the intervention of linear or nonlinear projection method is required to map the observations into a low-dimensional space with the most salient data features preserved. This step usually involves computing statistics from the low-dimensional projected space of features and then inferring on the highly dimensional original structures (the genes). This work deals with model validation for gene selection and regulation dynamics. The analysis is conducted through a mix of quantitative methods and qualitative aspects. A regularized inference approach is employed based on dimensionality reduction, data denoising, and feature extraction tasks. Each task requires the implementation of statistics and machine learning algorithms. We focus on the complex problem of inferring the coregulation from the coexpression gene dynamics in the presence of limited biological information and time course perturbation experiments. In particular, both separation and interference gene dynamics are considered and validated to design the most coherent underlying transcriptional regulatory map.
Affiliation(s)
- Enrico Capobianco
- CRS4 Bioinformatics Laboratory, Technology Park of Sardinia, Pula, Cagliari, Sardinia, Italy.
328
Reiss DJ, Facciotti MT, Baliga NS. Model-based deconvolution of genome-wide DNA binding. Bioinformatics 2007; 24:396-403. [PMID: 18056063 DOI: 10.1093/bioinformatics/btm592] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]
Abstract
Motivation: Chromatin immunoprecipitation followed by hybridization to a genomic tiling microarray (ChIP-chip) is a routinely used protocol for localizing the genomic targets of DNA-binding proteins. The resolution to which binding sites in this assay can be identified is commonly considered to be limited by two factors: (1) the resolution at which the genomic targets are tiled in the microarray and (2) the large and variable lengths of the immunoprecipitated DNA fragments. Results: We have developed a generative model of binding sites in ChIP-chip data and an approach, MeDiChI, for efficiently and robustly learning that model from diverse data sets. We have evaluated MeDiChI's performance using simulated data, as well as on several diverse ChIP-chip data sets collected on widely different tiling array platforms for two different organisms (Saccharomyces cerevisiae and Halobacterium salinarium NRC-1). We find that MeDiChI accurately predicts binding locations to a resolution greater than that of the probe spacing, even for overlapping peaks, and can increase the effective resolution of tiling array data by a factor of 5 or better. Moreover, the method's performance on simulated data provides insights into effectively optimizing the experimental design for increased binding site localization accuracy and efficacy. Availability: MeDiChI is available as an open-source R package, including all data, from http://baliga.systemsbiology.net/medichi.
Affiliation(s)
- David J Reiss
- Institute for Systems Biology, 1441 N. 34th St. Seattle, WA 98103-8904, USA.
329
Abstract
The identification, purification and characterization of cancer stem cells (CSCs) holds tremendous promise for improving the treatment of cancer. Mounting evidence is demonstrating that only certain tumour cells (i.e. the CSCs) can give rise to tumours when injected and that these purified cell populations generate heterogeneous tumours. While the cell of origin is still not determined definitively, specific molecular markers for populations containing these CSCs have been found for leukaemia, brain cancer and breast cancer, among others. Systems approaches, particularly molecular profiling, have proven to be of great utility for cancer diagnosis and characterization. These approaches also hold significant promise for identifying distinctive properties of the CSCs, and progress is already being made.
330
Blanding CR, Simmons SJ, Casati P, Walbot V, Stapleton AE. Coordinated regulation of maize genes during increasing exposure to ultraviolet radiation: identification of ultraviolet-responsive genes, functional processes and associated potential promoter motifs. Plant Biotechnology Journal 2007; 5:677-95. [PMID: 17924934 DOI: 10.1111/j.1467-7652.2007.00282.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 05/03/2023]
Abstract
Genetic gain in the yield of modern maize reflects increased stress tolerance. The manipulation of genes for deliberate alterations in tolerance relies on an understanding of the regulation and components of stress responses. Transcriptome analysis of an ultraviolet (UV) radiation time course with paired treatment and control measurements yielded groups of coordinately regulated genes and gene ontology processes. A comparison of the patterns of gene expression with patterns of morphological changes allowed the identification of physiologically relevant gene expression regulons. A set of genes significantly affected by UV radiation in maize leaves was selected by linear modelling plus order-restricted inference profile matches. This gene list was used to find upstream sequence motifs that predict the UV regulation of maize gene expression.
Affiliation(s)
- Carletha R Blanding
- Department of Biology and Marine Biology, University of North Carolina at Wilmington, 601 S. College, Wilmington, NC 28403, USA
331
Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 2007; 5:e8. [PMID: 17214507 PMCID: PMC1764438 DOI: 10.1371/journal.pbio.0050008] [Citation(s) in RCA: 980] [Impact Index Per Article: 57.6] [Received: 05/03/2006] [Accepted: 11/07/2006] [Indexed: 11/19/2022]
Abstract
Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental validation of the performance of these methods at the genome scale has remained elusive. Here we assess the global performance of four existing classes of inference algorithms using 445 Escherichia coli Affymetrix arrays and 3,216 known E. coli regulatory interactions from RegulonDB. We also developed and applied the context likelihood of relatedness (CLR) algorithm, a novel extension of the relevance networks class of algorithms. CLR demonstrates an average precision gain of 36% relative to the next-best performing algorithm. At a 60% true positive rate, CLR identifies 1,079 regulatory interactions, of which 338 were in the previously known network and 741 were novel predictions. We tested the predicted interactions for three transcription factors with chromatin immunoprecipitation, confirming 21 novel interactions and verifying our RegulonDB-based performance estimates. CLR also identified a regulatory link providing central metabolic control of iron transport, which we confirmed with real-time quantitative PCR. The compendium of expression data compiled in this study, coupled with RegulonDB, provides a valuable model system for further improvement of network inference algorithms using experimental data.
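The abstract describes CLR only as an extension of relevance networks. The sketch below shows one common way to express the idea of scoring each gene pair against the background of both genes' similarity profiles: each mutual-information value is converted to a row-wise z-score, negative values are clipped to zero, and the two z-scores are combined. The exact formula, the inclusion of the diagonal in the background, and the toy matrix are assumptions made for illustration; estimating mutual information from expression data is not shown.

```python
import numpy as np

def clr_scores(mi):
    """Background-correct a symmetric mutual-information matrix in the CLR style (sketch)."""
    mi = np.asarray(mi, dtype=float)
    mu = mi.mean(axis=1, keepdims=True)      # per-gene background mean
    sigma = mi.std(axis=1, keepdims=True)    # per-gene background spread
    sigma[sigma == 0] = 1.0                  # avoid division by zero for flat rows
    z = np.maximum(0.0, (mi - mu) / sigma)   # z[i, j]: pair (i, j) against gene i's background
    return np.sqrt(z ** 2 + z.T ** 2)        # combine both genes' views of the pair

# Hypothetical mutual-information values for four genes.
mi = np.array([[0.0, 0.9, 0.1, 0.1],
               [0.9, 0.0, 0.1, 0.2],
               [0.1, 0.1, 0.0, 0.1],
               [0.1, 0.2, 0.1, 0.0]])

scores = clr_scores(mi)
top_pair = np.unravel_index(np.argmax(scores), scores.shape)
print(scores.round(2), top_pair)
```

Ranking pairs by such a background-corrected score, and not by raw mutual information, is intended to down-weight genes that are weakly similar to everything.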
Affiliation(s)
- Jeremiah J Faith
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Boris Hayete
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Joshua T Thaden
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Boston University School of Medicine, Boston, Massachusetts, United States of America
- Ilaria Mogno
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Department of Computer and Systems Science A. Ruberti, University of Rome, La Sapienza, Rome, Italy
- Jamey Wierzbowski
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Cellicon Biotechnologies, Boston, Massachusetts, United States of America
- Guillaume Cottarel
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Cellicon Biotechnologies, Boston, Massachusetts, United States of America
- Simon Kasif
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- James J Collins
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Timothy S Gardner
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail:
332
Abstract
In this review we give an overview of computational and statistical methods to reconstruct cellular networks. Although this area of research is vast and fast developing, we show that most currently used methods can be organized by a few key concepts. The first part of the review deals with conditional independence models including Gaussian graphical models and Bayesian networks. The second part discusses probabilistic and graph-based methods for data from experimental interventions and perturbations.
Affiliation(s)
- Florian Markowetz
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Princeton University, Lewis-Sigler Institute for Integrative Genomics and Dept. of Computer Science, Princeton, NJ 08544, USA
- Rainer Spang
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Present affiliation: University Regensburg, Institute of Functional Genomics, Josef-Engert-Str. 9, 93053 Regensburg, Germany
333
Schmid AK, Reiss DJ, Kaur A, Pan M, King N, Van PT, Hohmann L, Martin DB, Baliga NS. The anatomy of microbial cell state transitions in response to oxygen. Genome Res 2007; 17:1399-413. [PMID: 17785531 PMCID: PMC1987344 DOI: 10.1101/gr.6728007] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Indexed: 11/25/2022]
Abstract
Adjustment of physiology in response to changes in oxygen availability is critical for the survival of all organisms. However, the chronology of events and the regulatory processes that determine how and when changes in environmental oxygen tension result in an appropriate cellular response are not well understood at a systems level. Therefore, transcriptome, proteome, ATP, and growth changes were analyzed in a halophilic archaeon to generate a temporal model that describes the cellular events that drive the transition between the organism's two opposing cell states of anoxic quiescence and aerobic growth. According to this model, upon oxygen influx, an initial burst of protein synthesis precedes ATP and transcription induction, rapidly driving the cell out of anoxic quiescence, culminating in the resumption of growth. This model also suggests that quiescent cells appear to remain actively poised for energy production from a variety of different sources. Dynamic temporal analysis of relationships between transcription and translation of key genes suggests several important mechanisms for cellular sustenance under anoxia as well as specific instances of post-transcriptional regulation.
Affiliation(s)
- Amy K. Schmid
- Institute for Systems Biology, Seattle, Washington 98103, USA
- David J. Reiss
- Institute for Systems Biology, Seattle, Washington 98103, USA
- Amardeep Kaur
- Institute for Systems Biology, Seattle, Washington 98103, USA
- Min Pan
- Institute for Systems Biology, Seattle, Washington 98103, USA
- Nichole King
- Institute for Systems Biology, Seattle, Washington 98103, USA
- Phu T. Van
- Institute for Systems Biology, Seattle, Washington 98103, USA
- Laura Hohmann
- Institute for Systems Biology, Seattle, Washington 98103, USA
- Daniel B. Martin
- Institute for Systems Biology, Seattle, Washington 98103, USA
- Divisions of Human Biology and Clinical Research, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, USA
- Nitin S. Baliga
- Institute for Systems Biology, Seattle, Washington 98103, USA
- Corresponding author. Fax (206) 732-1299
334
Price ND, Shmulevich I. Biochemical and statistical network models for systems biology. Curr Opin Biotechnol 2007; 18:365-70. [PMID: 17681779 PMCID: PMC2034526 DOI: 10.1016/j.copbio.2007.07.009] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Received: 05/10/2007] [Accepted: 07/12/2007] [Indexed: 11/19/2022]
Abstract
The normal and abnormal behavior of a living cell is governed by complex networks of interacting biomolecules. Models of these networks allow us to make predictions about cellular behavior under a variety of environmental cues. In this review, we focus on two broad classes of such models: biochemical network models and statistical inference models. In particular, we discuss a number of modeling approaches in the context of the assumptions that they entail, the types of data required for their inference, and the range of their applicability.
335
Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D. How to infer gene networks from expression profiles. Mol Syst Biol 2007; 3:78. [PMID: 17299415 PMCID: PMC1828749 DOI: 10.1038/msb4100120] [Citation(s) in RCA: 442] [Impact Index Per Article: 26.0] [Received: 05/30/2006] [Accepted: 12/18/2006] [Indexed: 02/01/2023]
Abstract
Inferring, or 'reverse-engineering', gene networks can be defined as the process of identifying gene interactions from experimental data through computational analysis. Gene expression data from microarrays are typically used for this purpose. Here we compared different reverse-engineering algorithms for which ready-to-use software was available and that had been tested on experimental data sets. We show that reverse-engineering algorithms are indeed able to correctly infer regulatory interactions among genes, at least when one performs perturbation experiments complying with the algorithm requirements. These algorithms are superior to classic clustering algorithms for the purpose of finding regulatory interactions among genes, and, although further improvements are needed, have reached a discreet performance for being practically useful.
Affiliation(s)
- Mukesh Bansal
- Telethon Institute of Genetics and Medicine, Via P Castellino, Naples, Italy
- European School of Molecular Medicine, Naples, Italy
- Vincenzo Belcastro
- Department of Natural Sciences, University of Naples 'Federico II', Naples, Italy
- Alberto Ambesi-Impiombato
- Telethon Institute of Genetics and Medicine, Via P Castellino, Naples, Italy
- Department of Neuroscience, University of Naples 'Federico II', Naples, Italy
- Diego di Bernardo
- Telethon Institute of Genetics and Medicine, Via P Castellino, Naples, Italy
- European School of Molecular Medicine, Naples, Italy
- Systems Biology Lab, Telethon Institute of Genetics and Medicine, Via P Castellino 111, Naples 18131, Italy. Tel.: +39 081 6132 319; Fax: +39 081 6132 351;
336
Affiliation(s)
- Boris Hayete
- Bioinformatics Program and Center for BioDynamics, Boston University, Boston, MA, USA
337
Ernst J, Vainas O, Harbison CT, Simon I, Bar-Joseph Z. Reconstructing dynamic regulatory maps. Mol Syst Biol 2007; 3:74. [PMID: 17224918 PMCID: PMC1800355 DOI: 10.1038/msb4100115] [Citation(s) in RCA: 160] [Impact Index Per Article: 9.4] [Received: 08/17/2006] [Accepted: 11/15/2006] [Indexed: 02/07/2023] Open
Abstract
Even simple organisms have the ability to respond to internal and external stimuli. This response is carried out by a dynamic network of protein-DNA interactions that allows the specific regulation of genes needed for the response. We have developed a novel computational method that uses an input-output hidden Markov model to model these regulatory networks while taking into account their dynamic nature. Our method works by identifying bifurcation points, places in the time series where the expression of a subset of genes diverges from the rest of the genes. These points are annotated with the transcription factors regulating these transitions resulting in a unified temporal map. Applying our method to study yeast response to stress, we derive dynamic models that are able to recover many of the known aspects of these responses. Predictions made by our method have been experimentally validated leading to new roles for Ino4 and Gcn4 in controlling yeast response to stress. The temporal cascade of factors reveals common pathways and highlights differences between master and secondary factors in the utilization of network motifs and in condition-specific regulation.
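To make the notion of a bifurcation point concrete, here is a minimal sketch that finds the first time point at which the mean expression of two gene groups diverges. It is a simplified illustration only: the groups are supplied by hand, whereas the abstract describes inferring them with an input-output hidden Markov model, which this sketch does not attempt. The function name `first_divergence`, the gene labels, the threshold, and all expression values are hypothetical.

```python
from statistics import mean


def first_divergence(profiles, group_a, group_b, threshold):
    """Return the index of the first time point at which the mean expression
    of group_a and group_b differs by more than `threshold`, or None if the
    two groups never diverge that far."""
    n_times = len(next(iter(profiles.values())))
    for t in range(n_times):
        mean_a = mean(profiles[g][t] for g in group_a)
        mean_b = mean(profiles[g][t] for g in group_b)
        if abs(mean_a - mean_b) > threshold:
            return t
    return None


# Hypothetical log-ratio expression profiles over five time points
# (made-up numbers, not data from the paper).
profiles = {
    "geneA": [0.0, 0.1, 0.9, 1.8, 2.1],
    "geneB": [0.0, 0.2, 1.1, 2.0, 2.3],
    "geneC": [0.0, 0.1, 0.0, -0.1, 0.1],
    "geneD": [0.0, 0.0, 0.2, 0.1, 0.0],
}

# Assumed grouping: genes A and B respond to the stimulus, C and D do not.
split = first_divergence(
    profiles, ["geneA", "geneB"], ["geneC", "geneD"], threshold=0.5
)
print("first divergent time index:", split)
```

In the published approach, each such split would additionally be annotated with the transcription factors associated with it; that step needs protein-DNA interaction data and is outside the scope of this sketch.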
Affiliation(s)
- Jason Ernst
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Oded Vainas
- Department of Molecular Biology, Hebrew University Medical School, Jerusalem, Israel
- Itamar Simon
- Department of Molecular Biology, Hebrew University Medical School, Jerusalem, Israel
- Ziv Bar-Joseph
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Computer Science, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
338
Gilchrist M, Thorsson V, Li B, Rust AG, Korb M, Roach JC, Kennedy K, Hai T, Bolouri H, Aderem A. Systems biology approaches identify ATF3 as a negative regulator of Toll-like receptor 4. Nature 2006; 441:173-8. [PMID: 16688168 DOI: 10.1038/nature04768] [Citation(s) in RCA: 611] [Impact Index Per Article: 33.9] [Received: 02/28/2006] [Accepted: 03/29/2006] [Indexed: 11/09/2022]
Abstract
The innate immune system is absolutely required for host defence, but, uncontrolled, it leads to inflammatory disease. This control is mediated, in part, by cytokines that are secreted by macrophages. Immune regulation is extraordinarily complex, and can be best investigated with systems approaches (that is, using computational tools to predict regulatory networks arising from global, high-throughput data sets). Here we use cluster analysis of a comprehensive set of transcriptomic data derived from Toll-like receptor (TLR)-activated macrophages to identify a prominent group of genes that appear to be regulated by activating transcription factor 3 (ATF3), a member of the CREB/ATF family of transcription factors. Network analysis predicted that ATF3 is part of a transcriptional complex that also contains members of the nuclear factor (NF)-kappaB family of transcription factors. Promoter analysis of the putative ATF3-regulated gene cluster demonstrated an over-representation of closely apposed ATF3 and NF-kappaB binding sites, which was verified by chromatin immunoprecipitation and hybridization to a DNA microarray. This cluster included important cytokines such as interleukin (IL)-6 and IL-12b. ATF3 and Rel (a component of NF-kappaB) were shown to bind to the regulatory regions of these genes upon macrophage activation. A kinetic model of Il6 and Il12b messenger RNA expression as a function of ATF3 and NF-kappaB promoter binding predicted that ATF3 is a negative regulator of Il6 and Il12b transcription, and this hypothesis was validated using Atf3-null mice. ATF3 seems to inhibit Il6 and Il12b transcription by altering chromatin structure, thereby restricting access to transcription factors. Because ATF3 is itself induced by lipopolysaccharide, it seems to regulate TLR-stimulated inflammatory responses as part of a negative-feedback loop.
Affiliation(s)
- Mark Gilchrist
- Institute for Systems Biology, Seattle, Washington 98103, USA
339
Reiss DJ, Baliga NS, Bonneau R. Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics 2006; 7:280. [PMID: 16749936 PMCID: PMC1502140 DOI: 10.1186/1471-2105-7-280] [Citation(s) in RCA: 153] [Impact Index Per Article: 8.5] [Received: 05/12/2006] [Accepted: 06/02/2006] [Indexed: 12/23/2022] Open
Abstract
Background: The learning of global genetic regulatory networks from expression data is a severely under-constrained problem that is aided by reducing the dimensionality of the search space by means of clustering genes into putatively co-regulated groups, as opposed to those that are simply co-expressed. Because genes may be co-regulated only across a subset of all observed experimental conditions, biclustering (clustering of genes and conditions) is more appropriate than standard clustering. Co-regulated genes are also often functionally (physically, spatially, genetically, and/or evolutionarily) associated, and such a priori known or pre-computed associations can provide support for appropriately grouping genes. One important association is the presence of one or more common cis-regulatory motifs. In organisms where these motifs are not known, their de novo detection, integrated into the clustering algorithm, can help to guide the process towards more biologically parsimonious solutions.
Results: We have developed an algorithm, cMonkey, that detects putative co-regulated gene groupings by integrating the biclustering of gene expression data and various functional associations with the de novo detection of sequence motifs.
Conclusion: We have applied this procedure to the archaeon Halobacterium NRC-1, as part of our efforts to decipher its regulatory network. In addition, we used cMonkey on public data for three organisms in the other two domains of life: Helicobacter pylori, Saccharomyces cerevisiae, and Escherichia coli. The biclusters detected by cMonkey both recapitulated known biology and enabled novel predictions (some for Halobacterium were subsequently confirmed in the laboratory). For example, it identified the bacteriorhodopsin regulon, assigned additional genes to this regulon with apparently unrelated function, and detected its known promoter motif. We have performed a thorough comparison of cMonkey results against other clustering methods, and find that cMonkey biclusters are more parsimonious with all available evidence for co-regulation.
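The argument for biclustering over standard clustering can be illustrated with a small sketch: a gene set that is tightly co-expressed across a subset of conditions can look incoherent when scored across all conditions. The code below uses mean pairwise Pearson correlation as a stand-in coherence score. This is not cMonkey's scoring scheme, which according to the abstract also integrates functional associations and de novo motif detection. The helper names, gene labels, and expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_correlation(expression, genes, conditions):
    """Average Pearson correlation over all gene pairs, restricted to the
    given condition indices (a stand-in coherence score, not cMonkey's)."""
    pairs = list(combinations(genes, 2))
    total = 0.0
    for g1, g2 in pairs:
        x = [expression[g1][c] for c in conditions]
        y = [expression[g2][c] for c in conditions]
        total += pearson(x, y)
    return total / len(pairs)


# Hypothetical expression values over six conditions (made-up numbers).
# The three genes move together in conditions 0-2 and independently in 3-5.
expression = {
    "g1": [1, 2, 3, 0, 5, 1],
    "g2": [2, 4, 6, 4, 0, 3],
    "g3": [0, 1, 2, 2, 2, 0],
}
genes = ["g1", "g2", "g3"]

inside = mean_pairwise_correlation(expression, genes, [0, 1, 2])
overall = mean_pairwise_correlation(expression, genes, range(6))
print(f"coherence within conditions 0-2: {inside:.3f}")
print(f"coherence across all conditions: {overall:.3f}")
```

A clustering method that scores genes across every condition would see only the lower overall value and could miss this group, which is the motivation the abstract gives for searching over genes and conditions jointly.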
Affiliation(s)
- David J Reiss
- Institute for Systems Biology, 1441 N. 34th St. Seattle, WA 98103-8904, USA
- Nitin S Baliga
- Institute for Systems Biology, 1441 N. 34th St. Seattle, WA 98103-8904, USA
- Richard Bonneau
- New York University Dept. of Biology, Dept. of Computer Science, New York, USA
340
Kaur A, Pan M, Meislin M, Facciotti MT, El-Gewely R, Baliga NS. A systems view of haloarchaeal strategies to withstand stress from transition metals. Genome Res 2006; 16:841-54. [PMID: 16751342 PMCID: PMC1484451 DOI: 10.1101/gr.5189606] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Indexed: 12/20/2022]
Abstract
Given that transition metals are essential cofactors in central biological processes, misallocation of the wrong metal ion to a metalloprotein can have resounding and often detrimental effects on diverse aspects of cellular physiology. Therefore, in an attempt to characterize unique and shared responses to chemically similar metals, we have reconstructed physiological behaviors of Halobacterium NRC-1, an archaeal halophile, in sublethal levels of Mn(II), Fe(II), Co(II), Ni(II), Cu(II), and Zn(II). Over 20% of all genes responded transiently within minutes of exposure to Fe(II), perhaps reflecting immediate large-scale physiological adjustments to maintain homeostasis. At steady state, each transition metal induced growth arrest, attempts to minimize oxidative stress, toxic ion scavenging, increased protein turnover and DNA repair, and modulation of active ion transport. While several of these constitute generalized stress responses, up-regulation of active efflux of Co(II), Ni(II), Cu(II), and Zn(II), down-regulation of Mn(II) uptake and up-regulation of Fe(II) chelation, confer resistance to the respective metals. We have synthesized all of these discoveries into a unified systems-level model to provide an integrated perspective of responses to six transition metals with emphasis on experimentally verified regulatory mechanisms. Finally, through comparisons across global transcriptional responses to different metals, we provide insights into putative in vivo metal selectivity of metalloregulatory proteins and demonstrate that a systems approach can help rapidly unravel novel metabolic potential and regulatory programs of poorly studied organisms.
Affiliation(s)
- Amardeep Kaur
- Institute for Systems Biology, Seattle, Washington 98103-8904 USA
- Min Pan
- Institute for Systems Biology, Seattle, Washington 98103-8904 USA
- Megan Meislin
- Institute for Systems Biology, Seattle, Washington 98103-8904 USA
- Nitin S. Baliga
- Institute for Systems Biology, Seattle, Washington 98103-8904 USA
- Corresponding author. E-mail ; fax (206) 732-1299