151
|
Integrating Transcriptomic and Proteomic Data Using Predictive Regulatory Network Models of Host Response to Pathogens. PLoS Comput Biol 2016; 12:e1005013. [PMID: 27403523 PMCID: PMC4942116 DOI: 10.1371/journal.pcbi.1005013] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 06/29/2015] [Accepted: 06/06/2016] [Indexed: 12/17/2022] Open
Abstract
Mammalian host response to pathogenic infections is controlled by a complex regulatory network connecting regulatory proteins such as transcription factors and signaling proteins to target genes. An important challenge in infectious disease research is to understand molecular similarities and differences in mammalian host response to diverse sets of pathogens. Recently, systems biology studies have produced rich collections of omic profiles measuring host response to infectious agents such as influenza viruses at multiple levels. To gain a comprehensive understanding of the regulatory network driving host response to multiple infectious agents, we integrated host transcriptomes and proteomes using a network-based approach. Our approach combines expression-based regulatory network inference, structured-sparsity based regression, and network information flow to infer putative physical regulatory programs for expression modules. We applied our approach to identify regulatory networks, modules and subnetworks that drive host response to multiple influenza infections. The inferred regulatory network and modules are significantly enriched for known pathways of immune response and implicate apoptosis, splicing, and interferon signaling processes in the differential response to viral infections of different pathogenicities. We used the learned network to prioritize regulators and study virus and time-point specific networks. RNAi-based knockdown of predicted regulators, which include several previously unknown regulators, had a significant impact on viral replication. Taken together, our integrated analysis identified novel module level patterns that capture strain and pathogenicity-specific patterns of expression and helped identify important regulators of host response to influenza infection.
An important challenge in infectious disease research is to understand how the human immune system responds to different types of pathogenic infections. An important component of mounting a proper response is the transcriptional regulatory network that specifies the context-specific gene expression program in the host cell. However, our understanding of this regulatory network and how it drives context-specific transcriptional programs is incomplete. To address this gap, we performed a network-based analysis of host response to influenza viruses that integrated high-throughput mRNA and protein measurements and protein-protein interaction networks to identify virus and pathogenicity-specific modules and their upstream physical regulatory programs. We inferred regulatory networks for human cell line and mouse host systems, which recapitulated several known regulators and pathways of the immune response and viral life cycle. We used the networks to study time point and strain-specific subnetworks and to prioritize important regulators of host response. We predicted several novel regulators, both at the mRNA and protein levels, and experimentally verified their role in the virus life cycle based on their ability to significantly impact viral replication.
|
152
|
Gao C, McDowell IC, Zhao S, Brown CD, Engelhardt BE. Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering. PLoS Comput Biol 2016; 12:e1004791. [PMID: 27467526 PMCID: PMC4965098 DOI: 10.1371/journal.pcbi.1004791] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 05/14/2015] [Accepted: 02/03/2016] [Indexed: 01/15/2023] Open
Abstract
Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. BicMix recovers latent structure with higher precision across diverse simulation scenarios than state-of-the-art biclustering methods. Further, we develop a principled method to recover context specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data, and we find tissue specific gene networks. We validate these findings by using our tissue specific networks to identify trans-eQTLs specific to one of four primary tissues.
Affiliation(s)
- Chuan Gao: Department of Statistical Science, Duke University, Durham, North Carolina, United States of America
- Ian C. McDowell: Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Shiwen Zhao: Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Christopher D. Brown: Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Barbara E. Engelhardt: Department of Computer Science, Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey, United States of America
|
153
|
Effective gene expression data generation framework based on multi-model approach. Artif Intell Med 2016; 70:41-61. [PMID: 27431036 DOI: 10.1016/j.artmed.2016.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/20/2015] [Accepted: 05/27/2016] [Indexed: 11/20/2022]
Abstract
OBJECTIVE To overcome the shortage of samples in gene expression data sets, which have thousands of genes but only a small number of samples and therefore challenge the computational methods that use them. METHODS AND MATERIAL This paper introduces a multi-model artificial gene expression data generation framework where different gene regulatory network (GRN) models contribute to the final set of samples based on the characteristics of their underlying paradigms. In the first stage, we build different GRN models, and sample data from each of them separately. Then, we pool the generated samples into a rich set of gene expression samples, and finally select the best of the generated samples with a multi-objective selection method that measures sample quality from three aspects: compatibility, diversity and coverage. We use four alternative GRN models, namely, ordinary differential equations, probabilistic Boolean networks, multi-objective genetic algorithm and hierarchical Markov model. RESULTS We conducted a comprehensive set of experiments based on both real-life biological and synthetic gene expression data sets. We show that our multi-objective sample selection mechanism effectively combines samples from different models having up to 95% compatibility, 10% diversity and 50% coverage. We show that the samples generated by our framework have up to 1.5x higher compatibility, 2x higher diversity and 2x higher coverage than the samples generated by the individual models that the multi-model framework uses. Moreover, the results show that the GRNs inferred from the samples generated by our framework can have 2.4x higher precision, 12x higher recall, and 5.4x higher f-measure values than the GRNs inferred from the original gene expression samples.
CONCLUSIONS We show that we can significantly improve the quality of generated gene expression samples by integrating different computational models into one unified framework without dealing with the complex internal details of each individual model. Moreover, the rich set of artificial gene expression samples is able to capture some biological relations that cannot be captured even by the original gene expression data set.
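The precision, recall and f-measure figures quoted in this abstract compare inferred GRN edges against a reference network. A minimal sketch of that comparison follows, assuming edges are represented as (regulator, target) pairs; the function name `edge_scores` and the toy gene names are illustrative assumptions, not taken from the paper.

```python
def edge_scores(predicted, reference):
    """Precision, recall and F-measure of predicted edges against reference edges.

    Both arguments are iterables of (regulator, target) pairs (an assumed
    representation; the paper does not specify one).
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy networks (hypothetical genes): two of three predicted edges are correct.
reference = {("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g4")}
predicted = {("g1", "g2"), ("g2", "g3"), ("g4", "g2")}
precision, recall, f_measure = edge_scores(predicted, reference)
```

Edge direction matters here: ("g4", "g2") does not match any reference edge, so it counts as a false positive.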
|
154
|
Cox process representation and inference for stochastic reaction-diffusion processes. Nat Commun 2016; 7:11729. [PMID: 27222432 PMCID: PMC4894951 DOI: 10.1038/ncomms11729] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/16/2015] [Accepted: 04/26/2016] [Indexed: 01/30/2023] Open
Abstract
Complex behaviour in many systems arises from the stochastic interactions of spatially distributed particles or agents. Stochastic reaction–diffusion processes are widely used to model such behaviour in disciplines ranging from biology to the social sciences, yet they are notoriously difficult to simulate and calibrate to observational data. Here we use ideas from statistical physics and machine learning to provide a solution to the inverse problem of learning a stochastic reaction–diffusion process from data. Our solution relies on a non-trivial connection between stochastic reaction–diffusion processes and spatio-temporal Cox processes, a well-studied class of models from computational statistics. This connection leads to an efficient and flexible algorithm for parameter inference and model selection. Our approach shows excellent accuracy on numeric and real data examples from systems biology and epidemiology. Our work provides both insights into spatio-temporal stochastic systems, and a practical solution to a long-standing problem in computational modelling. Stochastic reaction-diffusion systems are used for modelling spatial dynamics in many disciplines, but parameter inference and model selection remain challenging. Here the authors offer a solution enabled by a connection between reaction-diffusion and the well-studied spatio-temporal Cox processes.
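For readers unfamiliar with the term, a Cox process is a Poisson process whose intensity is itself random. The sketch below simulates the simplest temporal special case, a mixed Poisson process with a gamma-distributed rate held constant over time. It is only an illustration of the definition, not the spatio-temporal construction or the inference algorithm of the paper; the function name and all parameter values are assumptions.

```python
import random


def sample_mixed_poisson(horizon, shape, scale, rng):
    """Draw a random rate, then event times of a Poisson process with that rate.

    The rate is gamma distributed (an arbitrary illustrative choice) and is
    constant on [0, horizon), which makes this a very simple Cox process.
    """
    rate = rng.gammavariate(shape, scale)
    times = []
    t = rng.expovariate(rate)  # waiting time to the first event
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)  # exponential inter-event times
    return rate, times


rng = random.Random(0)
rate, events = sample_mixed_poisson(horizon=10.0, shape=2.0, scale=1.0, rng=rng)
```

Repeating the call gives event counts that are more variable than those of a fixed-rate Poisson process, because the rate is redrawn each time.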
|
155
|
Lobo D, Morokuma J, Levin M. Computational discovery and in vivo validation of hnf4 as a regulatory gene in planarian regeneration. Bioinformatics 2016; 32:2681-5. [DOI: 10.1093/bioinformatics/btw299] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/08/2016] [Accepted: 05/04/2016] [Indexed: 11/14/2022] Open
|
156
|
Durant F, Lobo D, Hammelman J, Levin M. Physiological controls of large-scale patterning in planarian regeneration: a molecular and computational perspective on growth and form. Regeneration (Oxford, England) 2016; 3:78-102. [PMID: 27499881 PMCID: PMC4895326 DOI: 10.1002/reg2.54] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 12/15/2015] [Revised: 02/18/2016] [Accepted: 02/22/2016] [Indexed: 12/12/2022]
Abstract
Planaria are complex metazoans that repair damage to their bodies and cease remodeling when a correct anatomy has been achieved. This model system offers a unique opportunity to understand how large-scale anatomical homeostasis emerges from the activities of individual cells. Much progress has been made on the molecular genetics of stem cell activity in planaria. However, recent data also indicate that the global pattern is regulated by physiological circuits composed of ionic and neurotransmitter signaling. Here, we overview the multi-scale problem of understanding pattern regulation in planaria, with specific focus on bioelectric signaling via ion channels and gap junctions (electrical synapses), and computational efforts to extract explanatory models from functional and molecular data on regeneration. We present a perspective that interprets results in this fascinating field using concepts from dynamical systems theory and computational neuroscience. Serving as a tractable nexus between genetic, physiological, and computational approaches to pattern regulation, planarian pattern homeostasis harbors many deep insights for regenerative medicine, evolutionary biology, and engineering.
Affiliation(s)
- Fallon Durant: Department of Biology, Allen Discovery Center at Tufts University, Tufts Center for Regenerative and Developmental Biology, Tufts University, MA 02155, USA
- Daniel Lobo: Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Jennifer Hammelman: Department of Biology, Allen Discovery Center at Tufts University, Tufts Center for Regenerative and Developmental Biology, Tufts University, MA 02155, USA
- Michael Levin: Department of Biology, Allen Discovery Center at Tufts University, Tufts Center for Regenerative and Developmental Biology, Tufts University, MA 02155, USA
|
157
|
Tanevski J, Todorovski L, Džeroski S. Learning stochastic process-based models of dynamical systems from knowledge and data. BMC Systems Biology 2016; 10:30. [PMID: 27005698 PMCID: PMC4802653 DOI: 10.1186/s12918-016-0273-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/01/2015] [Accepted: 03/06/2016] [Indexed: 01/02/2023]
Abstract
Background Identifying a proper model structure, using methods that address both structural and parameter uncertainty, is a crucial problem within the systems approach to biology. And yet, it has a marginal presence in the recent literature. While many existing approaches integrate methods for simulation and parameter estimation of a single model to address parameter uncertainty, only a few of them address structural uncertainty at the same time. The methods for handling structural uncertainty often oversimplify the problem by allowing the human modeler to explicitly enumerate a relatively small number of alternative model structures. On the other hand, process-based modeling methods provide flexible modular formalisms for specifying large classes of plausible model structures, but their scope is limited to deterministic models. Here, we aim at extending the scope of process-based modeling methods to inductively learn stochastic models from knowledge and data. Results We combine the flexibility of process-based modeling in terms of addressing structural uncertainty with the benefits of stochastic modeling. The proposed method combines search through the space of plausible model structures, the parsimony principle and parameter estimation to identify a model with optimal structure and parameters. We illustrate the utility of the proposed method on four stochastic modeling tasks in two domains: gene regulatory networks and epidemiology. Within the first domain, using synthetically generated data, the method successfully recovers the structure and parameters of known regulatory networks from simulations. In the epidemiology domain, the method successfully reconstructs previously established models of epidemic outbreaks from real, sparse and noisy measurement data.
Conclusions The method represents a unified approach to modeling dynamical systems that allows for flexible formalization of the space of candidate model structures, deterministic and stochastic interpretation of model dynamics, and automated induction of model structure and parameters from data. The method is able to reconstruct models of dynamical systems from synthetic and real data.
Affiliation(s)
- Jovan Tanevski: Jožef Stefan Institute, Jamova cesta 39, Ljubljana, 1000, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, 1000, Slovenia
- Ljupčo Todorovski: University of Ljubljana, Gosarjeva ulica 5, Ljubljana, 1000, Slovenia
- Sašo Džeroski: Jožef Stefan Institute, Jamova cesta 39, Ljubljana, 1000, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, Ljubljana, 1000, Slovenia
|
158
|
Lobo D, Hammelman J, Levin M. MoCha: Molecular Characterization of Unknown Pathways. J Comput Biol 2016; 23:291-7. [PMID: 26950055 DOI: 10.1089/cmb.2015.0211] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022] Open
Abstract
Automated methods for the reverse-engineering of complex regulatory networks are paving the way for the inference of mechanistic comprehensive models directly from experimental data. These novel methods can infer not only the relations and parameters of the known molecules defined in their input datasets, but also unknown components and pathways identified as necessary by the automated algorithms. Identifying the molecular nature of these unknown components is a crucial step for making testable predictions and experimentally validating the models, yet no specific and efficient tools exist to aid in this process. To this end, we present here MoCha (Molecular Characterization), a tool optimized for the search of unknown proteins and their pathways from a given set of known interacting proteins. MoCha uses the comprehensive dataset of protein-protein interactions provided by the STRING database, which currently includes more than a billion interactions from over 2,000 organisms. MoCha is highly optimized, performing typical searches within seconds. We demonstrate the use of MoCha with the characterization of unknown components from reverse-engineered models from the literature. MoCha is useful for working on network models by hand or as a downstream step of a model inference engine workflow and represents a valuable and efficient tool for the characterization of unknown pathways using known data from thousands of organisms. MoCha and its source code are freely available online under the GPLv3 license.
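The kind of search this abstract describes, finding candidate unknown proteins from a set of known interacting proteins, can be illustrated with a set intersection over an interaction map. This is a simplified sketch under stated assumptions, not MoCha's actual algorithm or the STRING data format: the dictionary `toy`, the protein labels and the function `common_partners` are all hypothetical.

```python
def common_partners(interactions, known):
    """Return proteins outside `known` that interact with every protein in `known`.

    `interactions` maps each protein to the set of its interaction partners
    (a toy stand-in for a protein-protein interaction database).
    """
    known = set(known)
    if not known:
        return set()
    neighbor_sets = [interactions.get(protein, set()) for protein in known]
    return set.intersection(*neighbor_sets) - known


# Hypothetical interaction map: X is the only partner shared by A, B and C.
toy = {
    "A": {"B", "X", "Y"},
    "B": {"A", "X"},
    "C": {"X", "Y"},
    "X": {"A", "B", "C"},
    "Y": {"A", "C"},
}
candidates = common_partners(toy, ["A", "B", "C"])
```

A real search would also need to weigh interaction confidence and handle indirect paths, which this sketch leaves out.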
Affiliation(s)
- Daniel Lobo: Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland
- Jennifer Hammelman: Center for Regenerative and Developmental Biology, and Department of Biology, Tufts University, Medford, Massachusetts
- Michael Levin: Center for Regenerative and Developmental Biology, and Department of Biology, Tufts University, Medford, Massachusetts
|
159
|
He B, Tan K. Understanding transcriptional regulatory networks using computational models. Curr Opin Genet Dev 2016; 37:101-108. [PMID: 26950762 DOI: 10.1016/j.gde.2016.02.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 09/15/2015] [Revised: 01/29/2016] [Accepted: 02/08/2016] [Indexed: 01/06/2023]
Abstract
Transcriptional regulatory networks (TRNs) encode instructions for animal development and physiological responses. Recent advances in genomic technologies and computational modeling have revolutionized our ability to construct models of TRNs. Here, we survey current computational methods for inferring TRN models using genome-scale data. We discuss their advantages and limitations. We summarize representative TRNs constructed using genome-scale data in both normal and disease development. We discuss lessons learned about the structure/function relationship of TRNs, based on examining various large-scale TRN models. Finally, we outline some open questions regarding TRNs, including how to improve model accuracy by integrating complementary data types, how to infer condition-specific TRNs, and how to compare TRNs across conditions and species in order to understand their structure/function relationship.
Affiliation(s)
- Bing He: Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA
- Kai Tan: Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA; Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA
|
160
|
Omranian N, Eloundou-Mbebi JMO, Mueller-Roeber B, Nikoloski Z. Gene regulatory network inference using fused LASSO on multiple data sets. Sci Rep 2016; 6:20533. [PMID: 26864687 PMCID: PMC4750075 DOI: 10.1038/srep20533] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 05/19/2015] [Accepted: 01/06/2016] [Indexed: 01/14/2023] Open
Abstract
Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed into a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions.
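Constraints (1) and (2) of this abstract correspond to the two penalty terms of a generic fused LASSO objective: an L1 penalty that keeps each gene's regulator set small, and an L1 penalty on coefficient differences that keeps networks from different data sets similar. The sketch below evaluates such an objective for one target gene. It is a textbook-style form, not the authors' exact formulation (constraint (3) is not modeled), and the function name, argument layout and penalty weights are assumptions.

```python
def fused_lasso_objective(X_sets, y_sets, betas, lam_sparse, lam_fuse):
    """Evaluate a generic fused LASSO objective over several data sets.

    X_sets[k] is a list of rows of regulator expression for data set k,
    y_sets[k] the target gene's expression, betas[k] the coefficients.
    """
    total = 0.0
    for X, y, beta in zip(X_sets, y_sets, betas):
        # Squared prediction error on this data set.
        for row, target in zip(X, y):
            pred = sum(x * b for x, b in zip(row, beta))
            total += (target - pred) ** 2
        # Sparsity: few regulators per gene.
        total += lam_sparse * sum(abs(b) for b in beta)
    # Fusion: coefficients of consecutive data sets should be similar.
    for beta_a, beta_b in zip(betas, betas[1:]):
        total += lam_fuse * sum(abs(a - b) for a, b in zip(beta_a, beta_b))
    return total


# Toy example with two data sets and two candidate regulators.
X_sets = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
y_sets = [[1.0, 0.0], [2.0]]
betas = [[1.0, 0.0], [1.0, 0.5]]
value = fused_lasso_objective(X_sets, y_sets, betas, lam_sparse=0.1, lam_fuse=0.2)
```

In the toy example the first data set is fit exactly, the second leaves a residual of 0.5, and the two coefficient vectors differ in one entry, so all three terms contribute to `value`.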
Affiliation(s)
- Nooshin Omranian: Systems Biology and Mathematical Modelling Group, Max Planck Institute for Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam, Germany; Department of Molecular Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Haus 20, 14476 Potsdam, Germany
- Jeanne M. O. Eloundou-Mbebi: Systems Biology and Mathematical Modelling Group, Max Planck Institute for Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam, Germany
- Bernd Mueller-Roeber: Department of Molecular Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Haus 20, 14476 Potsdam, Germany
- Zoran Nikoloski: Systems Biology and Mathematical Modelling Group, Max Planck Institute for Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam, Germany
|
161
|
Ruyssinck J, Demeester P, Dhaene T, Saeys Y. Netter: re-ranking gene network inference predictions using structural network properties. BMC Bioinformatics 2016; 17:76. [PMID: 26862054 PMCID: PMC4746913 DOI: 10.1186/s12859-016-0913-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/14/2015] [Accepted: 01/20/2016] [Indexed: 11/18/2022] Open
Abstract
Background Many algorithms have been developed to infer the topology of gene regulatory networks from gene expression data. These methods typically produce a ranking of links between genes with associated confidence scores, after which a certain threshold is chosen to produce the inferred topology. However, the structural properties of the predicted network do not resemble those typical for a gene regulatory network, as most algorithms only take into account connections found in the data and do not include known graph properties in their inference process. This lowers the prediction accuracy of these methods, limiting their usability in practice. Results We propose a post-processing algorithm which is applicable to any confidence ranking of regulatory interactions obtained from a network inference method which can use, inter alia, graphlets and several graph-invariant properties to re-rank the links into a more accurate prediction. To demonstrate the potential of our approach, we re-rank predictions of six different state-of-the-art algorithms using three simple network properties as optimization criteria and show that Netter can improve the predictions made on both artificially generated data and the DREAM4 and DREAM5 benchmarks. Additionally, the DREAM5 E. coli community prediction inferred from real expression data is further improved. Furthermore, Netter compares favorably to other post-processing algorithms and is not restricted to correlation-like predictions. Lastly, we demonstrate that the performance increase is robust for a wide range of parameter settings. Netter is available at http://bioinformatics.intec.ugent.be. Conclusions Network inference from high-throughput data is a long-standing challenge. In this work, we present Netter, which can further refine network predictions based on a set of user-defined graph properties. Netter is a flexible system which can be applied in unison with any method producing a ranking from omics data.
It can be tailored to specific prior knowledge by expert users but can also be applied in general use cases. In conclusion, we believe that Netter is an interesting second step in the network inference process to further increase the quality of prediction. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-0913-0) contains supplementary material, which is available to authorized users.
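The starting point this abstract describes, a confidence ranking of links that is thresholded into a network whose structural properties can then be inspected, can be sketched as follows. This shows only the thresholding step and one simple property (regulator out-degree); it is not Netter's re-ranking optimization, and the function names and toy scores are assumptions.

```python
from collections import Counter


def top_k_network(ranked_links, k):
    """Keep the k highest-scoring (regulator, target, score) links as edges."""
    ordered = sorted(ranked_links, key=lambda link: link[2], reverse=True)
    return [(regulator, target) for regulator, target, _ in ordered[:k]]


def out_degrees(edges):
    """Count how many targets each regulator has in the thresholded network."""
    return Counter(regulator for regulator, _ in edges)


# Hypothetical confidence ranking from some inference method.
ranking = [
    ("tf1", "g1", 0.9),
    ("tf2", "g1", 0.4),
    ("tf1", "g2", 0.8),
    ("tf3", "g3", 0.7),
    ("tf1", "g3", 0.2),
]
network = top_k_network(ranking, 3)
degrees = out_degrees(network)
```

A post-processing method in the spirit of the abstract would compare properties like `degrees` with what is expected of a gene regulatory network and adjust the ranking accordingly.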
Affiliation(s)
- Joeri Ruyssinck: Department of Information Technology, Ghent University - iMinds, IBCN research group iGent Technologiepark 15, Ghent, B-9052, Belgium; Bioinformatics Institute Ghent, Ghent University - VIB, Ghent, B-9000, Belgium
- Piet Demeester: Department of Information Technology, Ghent University - iMinds, IBCN research group iGent Technologiepark 15, Ghent, B-9052, Belgium; Bioinformatics Institute Ghent, Ghent University - VIB, Ghent, B-9000, Belgium
- Tom Dhaene: Department of Information Technology, Ghent University - iMinds, IBCN research group iGent Technologiepark 15, Ghent, B-9052, Belgium; Bioinformatics Institute Ghent, Ghent University - VIB, Ghent, B-9000, Belgium
- Yvan Saeys: Data Mining and Modelling for Biomedicine group, VIB Inflammation Research Center, Ghent, Belgium; Department of Internal Medicine, Ghent University, Ghent, Belgium
|
162
|
Niu Z, Chasman D, Eisfeld AJ, Kawaoka Y, Roy S. Multi-task consensus clustering of genome-wide transcriptomes from related biological conditions. Bioinformatics 2016; 32:1509-17. [PMID: 26801959 DOI: 10.1093/bioinformatics/btw007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/06/2015] [Accepted: 01/04/2016] [Indexed: 12/27/2022] Open
Abstract
MOTIVATION Identifying the shared and pathogen-specific components of host transcriptional regulatory programs is important for understanding the principles of regulation of immune response. Recent efforts in systems biology studies of infectious diseases have resulted in a large collection of datasets measuring host transcriptional response to various pathogens. Computational methods to identify and compare gene expression modules across different infections offer a powerful way to identify strain-specific and shared components of the regulatory program. An important challenge is to identify statistically robust gene expression modules as well as to reliably detect genes that change their module memberships between infections. RESULTS We present MULCCH (MULti-task spectral Consensus Clustering for Hierarchically related tasks), a consensus extension of a multi-task clustering algorithm to infer high-confidence strain-specific host response modules under infections from multiple virus strains. On simulated data, MULCCH more accurately identifies genes exhibiting pathogen-specific patterns compared to non-consensus and nonmulti-task clustering approaches. Application of MULCCH to mammalian transcriptional response to a panel of influenza viruses showed that our method identifies clusters with greater coherence compared to non-consensus methods. Further, MULCCH derived clusters are enriched for several immune system-related processes and regulators. In summary, MULCCH provides a reliable module-based approach to identify molecular pathways and gene sets characterizing commonality and specificity of host response to viruses of different pathogenicities. AVAILABILITY AND IMPLEMENTATION The source code is available at https://bitbucket.org/roygroup/mulcch CONTACT sroy@biostat.wisc.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhen Niu: Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA; Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Deborah Chasman: Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Amie J Eisfeld: Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, 53711, USA
- Yoshihiro Kawaoka: Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, 53711, USA; Division of Virology, Department of Microbiology and Immunology, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Sushmita Roy: Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA; Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
|
163
|
Villaverde AF, Becker K, Banga JR. PREMER: Parallel Reverse Engineering of Biological Networks with Information Theory. Computational Methods in Systems Biology 2016. [DOI: 10.1007/978-3-319-45177-0_21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2023]
|
164
|
Petrovskiy ED, Saik OV, Tiys ES, Lavrik IN, Kolchanov NA, Ivanisenko VA. Prediction of tissue-specific effects of gene knockout on apoptosis in different anatomical structures of human brain. BMC Genomics 2015; 16 Suppl 13:S3. [PMID: 26693857 PMCID: PMC4686796 DOI: 10.1186/1471-2164-16-s13-s3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND: An important issue in target identification for drug design is the tissue-specific effect of inhibiting target genes. Assessing the tissue-specific effect of suppressing gene activity is especially relevant in studies of the brain, because significant variability in gene expression levels among different areas of the brain has been well documented. RESULTS: A method is proposed for constructing statistical models to predict the potential effect of the knockout of target genes on the expression of genes involved in the regulation of apoptosis in various brain regions. The model connects the expression of the objective group of genes with expression of the target gene by means of machine learning models trained on available expression data. Information about the interactions between target and objective genes is determined by reconstruction of a target-centric gene network. The STRING and ANDSystem databases are used for the reconstruction of gene networks. The developed models have been used to analyse gene knockout effects of more than 7,500 target genes on the expression of 1,900 objective genes associated with the Gene Ontology category "apoptotic process". The tissue-specific effect was calculated for 12 main anatomical structures of the human brain. Initial values of gene expression in these anatomical structures were taken from the Allen Brain Atlas database. The results of the predictions of the effect of suppressing the activity of target genes on apoptosis, calculated on average for all brain structures, were in good agreement with experimental data on siRNA-inhibition. CONCLUSIONS: This theoretical paper presents an approach that can be used to assess the tissue-specific effect of gene knockout on the expression of genes in the studied biological process in various structures of the brain. Genes that, according to the predictions of the model, have the highest values of tissue-specific effects on the apoptosis network can be considered as potential pharmacological targets for the development of drugs that would potentially have a strong effect on a specific area of the brain and a much weaker effect on other brain structures. Further experiments should be performed to confirm the potential findings of the method.
Affiliation(s)
- Evgeny D Petrovskiy
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- International Tomography Center, The Siberian Branch of the Russian Academy of Sciences, Institutskaya 3A, Novosibirsk, 630090, Russia
- Olga V Saik
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- Evgeny S Tiys
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- Inna N Lavrik
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- Otto von Guericke University Magdeburg, Medical Faculty, Department Translational Inflammation Research, Institute of Experimental Internal Medicine, Pfälzer Platz, Building 28, Magdeburg, 39106, Germany
- Nikolay A Kolchanov
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia
- Vladimir A Ivanisenko
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090, Russia

165
Arrieta-Ortiz ML, Hafemeister C, Bate AR, Chu T, Greenfield A, Shuster B, Barry SN, Gallitto M, Liu B, Kacmarczyk T, Santoriello F, Chen J, Rodrigues CDA, Sato T, Rudner DZ, Driks A, Bonneau R, Eichenberger P. An experimentally supported model of the Bacillus subtilis global transcriptional regulatory network. Mol Syst Biol 2015; 11:839. [PMID: 26577401 PMCID: PMC4670728 DOI: 10.15252/msb.20156236] [Citation(s) in RCA: 137] [Impact Index Per Article: 15.2] [Indexed: 12/21/2022] Open
Abstract
Organisms from all domains of life use gene regulation networks to control cell growth, identity, function, and responses to environmental challenges. Although accurate global regulatory models would provide critical evolutionary and functional insights, they remain incomplete, even for the best studied organisms. Efforts to build comprehensive networks are confounded by challenges including network scale, degree of connectivity, complexity of organism–environment interactions, and difficulty of estimating the activity of regulatory factors. Taking advantage of the large number of known regulatory interactions in Bacillus subtilis and two transcriptomics datasets (including one with 38 separate experiments collected specifically for this study), we use a new combination of network component analysis and model selection to simultaneously estimate transcription factor activities and learn a substantially expanded transcriptional regulatory network for this bacterium. In total, we predict 2,258 novel regulatory interactions and recall 74% of the previously known interactions. We obtained experimental support for 391 (out of 635 evaluated) novel regulatory edges (62% accuracy), thus significantly increasing our understanding of various cell processes, such as spore formation.
Affiliation(s)
- Mario L Arrieta-Ortiz
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Christoph Hafemeister
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Ashley Rose Bate
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Timothy Chu
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Alex Greenfield
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Bentley Shuster
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Samantha N Barry
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Matthew Gallitto
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Brian Liu
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Thadeous Kacmarczyk
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Francis Santoriello
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Jie Chen
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Tsutomu Sato
- Department of Frontier Bioscience, Hosei University, Koganei, Tokyo, Japan
- David Z Rudner
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA, USA
- Adam Driks
- Department of Microbiology and Immunology, Stritch School of Medicine, Loyola University Chicago, Maywood, IL, USA
- Richard Bonneau
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA Courant Institute of Mathematical Science, Computer Science Department, New York, NY, USA Simons Foundation, Simons Center for Data Analysis, New York, NY, USA
- Patrick Eichenberger
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA

166
Henriques D, Rocha M, Saez-Rodriguez J, Banga JR. Reverse engineering of logic-based differential equation models using a mixed-integer dynamic optimization approach. Bioinformatics 2015; 31:2999-3007. [PMID: 26002881 PMCID: PMC4565031 DOI: 10.1093/bioinformatics/btv314] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 08/05/2014] [Revised: 05/12/2015] [Accepted: 05/15/2015] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Systems biology models can be used to test new hypotheses formulated on the basis of previous knowledge or new experimental data, contradictory with a previously existing model. New hypotheses often come in the shape of a set of possible regulatory mechanisms. This search is usually not limited to finding a single regulation link, but rather a combination of links subject to great uncertainty or no information about the kinetic parameters. RESULTS: In this work, we combine a logic-based formalism, to describe all the possible regulatory structures for a given dynamic model of a pathway, with mixed-integer dynamic optimization (MIDO). This framework aims to simultaneously identify the regulatory structure (represented by binary parameters) and the real-valued parameters that are consistent with the available experimental data, resulting in a logic-based differential equation model. The alternative to this would be to perform real-valued parameter estimation for each possible model structure, which is not tractable for models of the size presented in this work. The performance of the method presented here is illustrated with several case studies: a synthetic pathway problem of signaling regulation, a two-component signal transduction pathway in bacterial homeostasis, and a signaling network in liver cancer cells. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: julio@iim.csic.es or saezrodriguez@ebi.ac.uk.
Affiliation(s)
- David Henriques
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, C/Eduardo Cabello 6, 36208 Vigo, Spain, Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal and European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Miguel Rocha
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, C/Eduardo Cabello 6, 36208 Vigo, Spain, Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal and European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Julio Saez-Rodriguez
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, C/Eduardo Cabello 6, 36208 Vigo, Spain, Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal and European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Julio R Banga
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, C/Eduardo Cabello 6, 36208 Vigo, Spain, Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal and European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK

167
Trejo Banos D, Millar AJ, Sanguinetti G. A Bayesian approach for structure learning in oscillating regulatory networks. Bioinformatics 2015; 31:3617-24. [PMID: 26177966 PMCID: PMC4817140 DOI: 10.1093/bioinformatics/btv414] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/25/2014] [Accepted: 07/07/2015] [Indexed: 12/26/2022] Open
Abstract
Motivation: Oscillations lie at the core of many biological processes, from the cell cycle, to circadian oscillations and developmental processes. Time-keeping mechanisms are essential to enable organisms to adapt to varying conditions in environmental cycles, from day/night to seasonal. Transcriptional regulatory networks are one of the mechanisms behind these biological oscillations. However, while identifying cyclically expressed genes from time series measurements is relatively easy, determining the structure of the interaction network underpinning the oscillation is a far more challenging problem. Results: Here, we explicitly leverage the oscillatory nature of the transcriptional signals and present a method for reconstructing network interactions tailored to this special but important class of genetic circuits. Our method is based on projecting the signal onto a set of oscillatory basis functions using a Discrete Fourier Transform. We build a Bayesian Hierarchical model within a frequency domain linear model in order to enforce sparsity and incorporate prior knowledge about the network structure. Experiments on real and simulated data show that the method can lead to substantial improvements over competing approaches if the oscillatory assumption is met, and remains competitive also in cases it is not. Availability: DSS, experiment scripts and data are available at http://homepages.inf.ed.ac.uk/gsanguin/DSS.zip. Contact: d.trejo-banos@sms.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
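The projection step mentioned in the abstract above (expressing a time series in an oscillatory basis via a Discrete Fourier Transform) can be illustrated with a minimal sketch. It is not the authors' DSS code and it omits the Bayesian hierarchical model entirely. The function names `dft` and `dominant_period` and the synthetic 24-hour signal are hypothetical, chosen only for illustration.

```python
import cmath
import math


def dft(signal):
    """Plain O(n^2) Discrete Fourier Transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def dominant_period(signal, sampling_interval=1.0):
    """Return (frequency index, period) of the strongest non-constant component.

    Hypothetical helper: only the positive frequencies 1..n//2 are searched,
    and index 0 (the mean of the signal) is skipped.
    """
    n = len(signal)
    coeffs = dft(signal)
    k_best = max(range(1, n // 2 + 1), key=lambda k: abs(coeffs[k]))
    return k_best, n * sampling_interval / k_best


# Synthetic expression profile (an assumption, not data from the paper):
# 48 hourly samples of a gene oscillating with a 24-hour period around a baseline.
expression = [2.0 + math.sin(2 * math.pi * t / 24.0) for t in range(48)]
k, period = dominant_period(expression)
print(k, period)
```

In this toy case the signal completes exactly two cycles in the sampling window, so the strongest coefficient sits at frequency index 2. Real expression data are noisy and rarely align with a single Fourier bin, which is part of what a sparsity-enforcing model in the frequency domain would have to handle.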
Affiliation(s)
- Daniel Trejo Banos
- School of Informatics, University of Edinburgh, 10 Crichton St, Edinburgh EH8 9AB, UK
- Andrew J Millar
- SynthSys-Systems and Synthetic Biology, University of Edinburgh, CH Waddington Building, King's Buildings, Mayfield Road, Edinburgh EH9 3JD, UK and School of Biological Sciences, University of Edinburgh, Darwin Building, King's Buildings, Mayfield Road, Edinburgh EH9 3JR, UK
- Guido Sanguinetti
- School of Informatics, University of Edinburgh, 10 Crichton St, Edinburgh EH8 9AB, UK, SynthSys-Systems and Synthetic Biology, University of Edinburgh, CH Waddington Building, King's Buildings, Mayfield Road, Edinburgh EH9 3JD, UK and

168
Ciaccio MF, Chen VC, Jones RB, Bagheri N. The DIONESUS algorithm provides scalable and accurate reconstruction of dynamic phosphoproteomic networks to reveal new drug targets. Integr Biol (Camb) 2015; 7:776-91. [PMID: 26057728 PMCID: PMC4511116 DOI: 10.1039/c5ib00065c] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/01/2023]
Abstract
Many drug candidates fail in clinical trials due to an incomplete understanding of how small-molecule perturbations affect cell phenotype. Cellular responses can be non-intuitive due to systems-level properties such as redundant pathways caused by co-activation of multiple receptor tyrosine kinases. We therefore created a scalable algorithm, DIONESUS, based on partial least squares regression with variable selection to reconstruct a cellular signaling network in a human carcinoma cell line driven by EGFR overexpression. We perturbed the cells with 26 diverse growth factors and/or small molecules chosen to activate or inhibit specific subsets of receptor tyrosine kinases. We then quantified the abundance of 60 phosphosites at four time points using a modified microwestern array, a high-confidence assay of protein abundance and modification. DIONESUS, after being validated using three in silico networks, was applied to connect perturbations, phosphorylation, and cell phenotype from the high-confidence, microwestern dataset. We identified enhancement of STAT1 activity as a potential strategy to treat EGFR-hyperactive cancers and PTEN as a target of the antioxidant, N-acetylcysteine. Quantification of the relationship between drug dosage and cell viability in a panel of triple-negative breast cancer cell lines validated proposed therapeutic strategies.
Affiliation(s)
- Mark F Ciaccio
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA.

169
Inferring regulatory networks from experimental morphological phenotypes: a computational method reverse-engineers planarian regeneration. PLoS Comput Biol 2015; 11:e1004295. [PMID: 26042810 PMCID: PMC4456145 DOI: 10.1371/journal.pcbi.1004295] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 11/24/2014] [Accepted: 04/21/2015] [Indexed: 01/18/2023] Open
Abstract
Transformative applications in biomedicine require the discovery of complex regulatory networks that explain the development and regeneration of anatomical structures, and reveal what external signals will trigger desired changes of large-scale pattern. Despite recent advances in bioinformatics, extracting mechanistic pathway models from experimental morphological data is a key open challenge that has resisted automation. The fundamental difficulty of manually predicting emergent behavior of even simple networks has limited the models invented by human scientists to pathway diagrams that show necessary subunit interactions but do not reveal the dynamics that are sufficient for complex, self-regulating pattern to emerge. To finally bridge the gap between high-resolution genetic data and the ability to understand and control patterning, it is critical to develop computational tools to efficiently extract regulatory pathways from the resultant experimental shape phenotypes. For example, planarian regeneration has been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model has yet been found by human scientists that explains more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. We present a method to infer the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. We demonstrated our approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature; by analyzing all the datasets together, our system inferred the first systems-biology comprehensive dynamical model explaining patterning in planarian regeneration. This method provides an automated, highly generalizable framework for identifying the underlying control mechanisms responsible for the dynamic regulation of growth and form.

Developmental and regenerative biology experiments are producing a huge number of morphological phenotypes from functional perturbation experiments. However, existing pathway models do not generally explain the dynamic regulation of anatomical shape due to the difficulty of inferring and testing non-linear regulatory networks responsible for appropriate form, shape, and pattern. We present a method that automates the discovery and testing of regulatory networks explaining morphological outcomes directly from the resultant phenotypes, producing network models as testable hypotheses explaining regeneration data. Our system integrates a formalization of the published results in planarian regeneration, an in silico simulator in which the patterning properties of regulatory networks can be quantitatively tested in a regeneration assay, and a machine learning module that evolves networks whose behavior in this assay optimally matches the database of planarian results. We applied our method to explain the key experiments in planarian regeneration, and discovered the first comprehensive model of anterior-posterior patterning in planaria under surgical, pharmacological, and genetic manipulations. Beyond the planarian data, our approach is readily generalizable to facilitate the discovery of testable regulatory networks in developmental biology and biomedicine, and represents the first developmental model discovered de novo from morphological outcomes by an automated system.

170
Inferring Broad Regulatory Biology from Time Course Data: Have We Reached an Upper Bound under Constraints Typical of In Vivo Studies? PLoS One 2015; 10:e0127364. [PMID: 25984725 PMCID: PMC4435750 DOI: 10.1371/journal.pone.0127364] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/08/2014] [Accepted: 04/13/2015] [Indexed: 12/21/2022] Open
Abstract
There is a growing appreciation for the network biology that regulates the coordinated expression of molecular and cellular markers; however, questions persist regarding the identifiability of these networks. Here we explore some of the issues relevant to recovering directed regulatory networks from time course data collected under experimental constraints typical of in vivo studies. NetSim simulations of sparsely connected biological networks were used to evaluate two simple feature selection techniques used in the construction of linear Ordinary Differential Equation (ODE) models, namely truncation of terms versus latent vector projection. Performance was compared with ODE-based Time Series Network Identification (TSNI) integral, and the information-theoretic Time-Delay ARACNE (TD-ARACNE). Projection-based techniques and TSNI integral outperformed truncation-based selection and TD-ARACNE on aggregate networks with edge densities of 10-30%, i.e. transcription factor, protein-protein cliques and immune signaling networks. All were more robust to noise than truncation-based feature selection. Performance was comparable on the in silico 10-node DREAM 3 network, a 5-node Yeast synthetic network designed for In vivo Reverse-engineering and Modeling Assessment (IRMA) and a 9-node human HeLa cell cycle network of similar size and edge density. Performance was more sensitive to the number of time courses than to sample frequency and extrapolated better to larger networks by grouping experiments. In all cases performance declined rapidly in larger networks with lower edge density. Limited recovery and high false positive rates obtained overall bring into question our ability to generate informative time course data rather than the design of any particular reverse engineering algorithm.

171
Darnell CL, Schmid AK. Systems biology approaches to defining transcription regulatory networks in halophilic archaea. Methods 2015; 86:102-14. [PMID: 25976837 DOI: 10.1016/j.ymeth.2015.04.034] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 03/09/2015] [Revised: 04/27/2015] [Accepted: 04/28/2015] [Indexed: 12/31/2022] Open
Abstract
To survive complex and changing environmental conditions, microorganisms use gene regulatory networks (GRNs) composed of interacting regulatory transcription factors (TFs) to control the timing and magnitude of gene expression. Genome-wide datasets, such as transcriptomics and protein-DNA interactions, and experiments such as high-throughput growth curves, facilitate the construction of GRNs and provide insight into TF interactions occurring under stress. Systems biology approaches integrate these datasets into models of GRN architecture as well as statistical and/or dynamical models to understand the function of networks occurring in cells. Previously, these types of studies have focused on traditional model organisms (e.g. Escherichia coli, yeast). However, recent advances in archaeal genetics and other tools have enabled a systems approach to understanding GRNs in these relatively less studied archaeal model organisms. In this report, we outline a systems biology workflow for generating and integrating data focusing on the TF regulator. We discuss experimental design, outline the process of data collection, and provide the tools required to produce high confidence regulons for the TFs of interest. We provide a case study as an example of this workflow, describing the construction of a GRN centered on multi-TF coordinate control of gene expression governing the oxidative stress response in the hypersaline-adapted archaeon Halobacterium salinarum.
Affiliation(s)
- Amy K Schmid
- Biology Department, Duke University, Durham, NC 27708, USA; Center for Systems Biology, Duke University, Durham, NC 27708, USA.

172
Liu ZP. Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data. Curr Genomics 2015; 16:3-22. [PMID: 25937810 PMCID: PMC4412962 DOI: 10.2174/1389202915666141110210634] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 03/07/2014] [Revised: 09/05/2014] [Accepted: 09/05/2014] [Indexed: 12/17/2022] Open
Abstract
Transcriptional regulation plays vital roles in many fundamental biological processes. Reverse engineering of genome-wide regulatory networks from high-throughput transcriptomic data provides a promising way to characterize the global scenario of regulatory relationships between regulators and their targets. In this review, we summarize and categorize the main frameworks and methods currently available for inferring transcriptional regulatory networks from microarray gene expression profiling data. We give an overview of each strategy and introduce representative methods. Their assumptions, advantages, shortcomings, and possible improvements and extensions are also discussed.
Affiliation(s)
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China

173
Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol 2015; 11:e1004226. [PMID: 25950956 PMCID: PMC4423992 DOI: 10.1371/journal.pcbi.1004226] [Citation(s) in RCA: 778] [Impact Index Per Article: 86.4] [Received: 08/19/2014] [Accepted: 03/02/2015] [Indexed: 11/19/2022] Open
Abstract
16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional. Counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information (or assumptions) are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies. 
SPIEC-EASI outperforms state-of-the-art methods to recover edges and network properties on synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial associations using data from the American Gut project.
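The compositionality problem described in the abstract above can be made concrete with a small sketch. The centered log-ratio (clr) transform is one standard transformation from compositional data analysis; it is shown here only as an illustration, and whether it matches the exact preprocessing used should be checked against the paper itself. The function name `clr`, the pseudocount handling, and the example counts are hypothetical.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's OTU counts.

    Each value becomes log(count) minus the mean of the log counts, i.e. the
    log of the count divided by the sample's geometric mean. The pseudocount
    is an assumption added here so that zero counts do not break the
    logarithm; with pseudocount=0.0 a zero count raises ValueError.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical OTU counts for one sample, and the same community sequenced
# ten times deeper (an assumption for illustration, not data from the paper).
sample = [120, 30, 5, 845]
deeper = [10 * c for c in sample]

clr_values = clr(sample)
print(clr_values)
# Without a pseudocount the transform ignores sequencing depth entirely:
print(clr(sample, pseudocount=0.0))
print(clr(deeper, pseudocount=0.0))
```

The transformed values of a sample always sum to zero, and without a pseudocount the two samples above give the same result even though their raw counts differ tenfold. That is the sense in which such transforms remove the dependence on total read count before any association network is estimated.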
Affiliation(s)
- Zachary D. Kurtz
- Departments of Microbiology and Medicine, New York University School of Medicine, New York, New York, United States of America
- Christian L. Müller
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
- Emily R. Miraldi
- Departments of Microbiology and Medicine, New York University School of Medicine, New York, New York, United States of America
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
- Dan R. Littman
- Departments of Microbiology and Medicine, New York University School of Medicine, New York, New York, United States of America
- Martin J. Blaser
- Departments of Microbiology and Medicine, New York University School of Medicine, New York, New York, United States of America
- Richard A. Bonneau
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
- Simons Center for Data Analysis, Simons Foundation, New York, New York, United States of America

174
Linde J, Schulze S, Henkel SG, Guthke R. Data- and knowledge-based modeling of gene regulatory networks: an update. EXCLI JOURNAL 2015; 14:346-78. [PMID: 27047314 PMCID: PMC4817425 DOI: 10.17179/excli2015-168] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 01/29/2015] [Accepted: 02/10/2015] [Indexed: 02/01/2023]
Abstract
Gene regulatory network inference is a systems biology approach which predicts interactions between genes with the help of high-throughput data. In this review, we present current and updated network inference methods focusing on novel techniques for data acquisition, network inference assessment, network inference for interacting species and the integration of prior knowledge. Following the advance of Next-Generation-Sequencing of cDNAs derived from RNA samples (RNA-Seq), we discuss in detail its application to network inference. Furthermore, we present progress for large-scale or even full-genomic network inference as well as for small-scale condensed network inference and review advances in the evaluation of network inference methods by crowdsourcing. Finally, we reflect on the current availability of data and prior knowledge sources and give an outlook for the inference of gene regulatory networks that reflect interacting species, in particular pathogen-host interactions.
Collapse
Affiliation(s)
- Jörg Linde
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
| | - Sylvie Schulze
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
| | | | - Reinhard Guthke
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
| |
|
175
|
Abstract
Behaviours of complex biomolecular systems are often irreducible to the elementary properties of their individual components. Explanatory and predictive mathematical models are therefore useful for fully understanding and precisely engineering cellular functions. The development and analyses of these models require their adaptation to the problems that need to be solved and the type and amount of available genetic or molecular data. Quantitative and logic modelling are among the main methods currently used to model molecular and gene networks. Each approach comes with inherent advantages and weaknesses. Recent developments show that hybrid approaches will become essential for further progress in synthetic biology and in the development of virtual organisms.
Affiliation(s)
- Nicolas Le Novère
- Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
| |
|
176
|
An integrated approach to reconstructing genome-scale transcriptional regulatory networks. PLoS Comput Biol 2015; 11:e1004103. [PMID: 25723545 PMCID: PMC4344238 DOI: 10.1371/journal.pcbi.1004103] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/09/2014] [Accepted: 12/23/2014] [Indexed: 11/24/2022] Open
Abstract
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. 
Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.
The ever-growing amount of genomic data enables the assembly of large-scale network models that can provide important new insights into living systems. However, assembly and validation of such large-scale models can be challenging, since we often lack sufficient information to make accurate predictions. This work describes a new approach for constructing large-scale transcriptional regulatory networks of individual cells. We show that the reconstructed network captures a significantly larger fraction of cellular regulatory processes than networks generated by other existing approaches. We predict this approach, with appropriate refinements, will allow reconstruction of large-scale transcriptional network models for a variety of other organisms. As we work towards modeling the function of cells or complex ecosystems, individually reconstructed network models of signaling, information transfer and metabolism can be integrated to provide high-information predictions and insights not otherwise obtainable.
|
177
|
Schulze S, Henkel SG, Driesch D, Guthke R, Linde J. Computational prediction of molecular pathogen-host interactions based on dual transcriptome data. Front Microbiol 2015; 6:65. [PMID: 25705211 PMCID: PMC4319478 DOI: 10.3389/fmicb.2015.00065] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 12/11/2014] [Accepted: 01/19/2015] [Indexed: 11/13/2022] Open
Abstract
Inference of inter-species gene regulatory networks based on gene expression data is an important computational method to predict pathogen-host interactions (PHIs). Both the experimental setup and the nature of PHIs exhibit certain characteristics. First, besides an environmental change, the battle between pathogen and host leads to a constantly changing environment and thus complex gene expression patterns. Second, there might be a delay until one of the organisms reacts. Third, toward later time points only one organism may survive, leading to missing gene expression data for the other organism. Here, we account for PHI characteristics by extending NetGenerator, a network inference tool that predicts gene regulatory networks from gene expression time series data. We tested multiple modeling scenarios regarding the stimuli functions of the interaction network based on a benchmark example. We show that modeling perturbation of a PHI network by multiple stimuli better represents the underlying biological phenomena. Furthermore, we utilized the benchmark example to test the influence of missing data points on the inference performance. Our results suggest that PHI network inference with missing data is possible, but we recommend providing complete time series data. Finally, we extended the NetGenerator tool to incorporate gene- and time-point-specific variances, because complex PHIs may lead to high variance in expression data. Sample variances are directly considered in the objective function of NetGenerator and indirectly by testing the robustness of interactions based on variance-dependent disturbance of gene expression values. We evaluated the method of variance incorporation on dual RNA sequencing (RNA-Seq) data of Mus musculus dendritic cells incubated with Candida albicans and validated our method by predicting previously verified PHIs as robust interactions.
Affiliation(s)
- Sylvie Schulze
- Department of Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knoell-Institute Jena, Germany
| | | | | | - Reinhard Guthke
- Department of Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knoell-Institute Jena, Germany
| | - Jörg Linde
- Department of Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knoell-Institute Jena, Germany
| |
|
178
|
Abstract
The succession of protein activation and deactivation mediated by phosphorylation and dephosphorylation events constitutes a key mechanism of molecular information transfer in cellular systems. To deduce the details of those molecular information cascades and networks has been a central goal pursued by both experimental and computational approaches. Many computational network reconstruction methods employing an array of different statistical learning methods have been developed to infer phosphorylation networks based on different types of molecular data sets such as protein sequence, protein structure, or phosphoproteomics data. In this chapter, different computational network inference methods and resources for biological network reconstruction with a particular focus on phosphorylation networks are surveyed.
|
179
|
Wang F, Tian Z, Wei H. Genomic expression profiling of NK cells in health and disease. Eur J Immunol 2014; 45:661-78. [PMID: 25476835 DOI: 10.1002/eji.201444998] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/04/2014] [Revised: 10/01/2014] [Accepted: 12/01/2014] [Indexed: 12/15/2022]
Abstract
NK cells are important components of innate and adaptive immunity. Functionally, they play key roles in host defense against tumors and infectious pathogens. Within the past few years, genomic-scale experiments have provided us with a plethora of gene expression data that reveal an extensive molecular and biological map underlying gene expression programs. In order to better explore and take advantage of existing datasets, we review here the genomic expression profiles of NK cells and their subpopulations in resting or stimulated states, in diseases, and in different organs; moreover, we contrast these expression data to those of other lymphocytes. We have also compiled a comprehensive list of genomic profiling studies of both human and murine NK cells in this review.
Affiliation(s)
- Fuyan Wang
- Institute of Immunology, School of Life Sciences and Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, China; Diabetes Center, School of Medicine, Ningbo University, Ningbo, China
| | | | | |
|
180
|
Jain S, Gitter A, Bar-Joseph Z. Multitask learning of signaling and regulatory networks with application to studying human response to flu. PLoS Comput Biol 2014; 10:e1003943. [PMID: 25522349 PMCID: PMC4270428 DOI: 10.1371/journal.pcbi.1003943] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/03/2014] [Accepted: 09/28/2014] [Indexed: 01/04/2023] Open
Abstract
Reconstructing regulatory and signaling response networks is one of the major goals of systems biology. While several successful methods have been suggested for this task, some integrating large and diverse datasets, these methods have so far been applied to reconstruct a single response network at a time, even when studying and modeling related conditions. To improve network reconstruction we developed MT-SDREM, a multi-task learning method which jointly models networks for several related conditions. In MT-SDREM, parameters are jointly constrained across the networks while still allowing for condition-specific pathways and regulation. We formulate the multi-task learning problem and discuss methods for optimizing the joint target function. We applied MT-SDREM to reconstruct dynamic human response networks for three flu strains: H1N1, H5N1 and H3N2. Our multi-task learning method was able to identify known and novel factors and genes, improving upon prior methods that model each condition independently. The MT-SDREM networks were also better at identifying proteins whose removal affects viral load, indicating that joint learning can still lead to accurate, condition-specific networks. Supporting website with MT-SDREM implementation: http://sb.cs.cmu.edu/mtsdrem
To understand why some flu strains are more virulent than others, researchers attempt to profile and model the molecular human response to these strains and identify similarities and differences between the resulting models. So far, the modeling and analysis part has been done independently for each strain and the results contrasted in a post-processing step. Here we present a new method, termed MT-SDREM, that simultaneously models the response to all strains, allowing us to identify both the core response elements that are shared among the strains and factors that are uniquely activated or repressed by individual strains. We applied this method to study the human response to three flu strains: H1N1, H3N2 and H5N1. As we show, the method was able to correctly identify several common and known factors regulating immune response to such strains and also identified unique factors for each of the strains. The models reconstructed by the simultaneous analysis method improved upon those generated by methods that model each strain response separately. Our joint models can be used to identify strain-specific treatments as well as treatments that are likely to be effective against all three strains.
Affiliation(s)
- Siddhartha Jain
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Anthony Gitter
- Microsoft Research, Cambridge, Massachusetts, United States of America
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Ziv Bar-Joseph
- Lane Center for Computational Biology and Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| |
|
181
|
Abstract
Mathematical models of natural systems are abstractions of much more complicated processes. Developing informative and realistic models of such systems typically involves suitable statistical inference methods, domain expertise, and a modicum of luck. Except for cases where physical principles provide sufficient guidance, it will also be generally possible to come up with a large number of potential models that are compatible with a given natural system and any finite amount of data generated from experiments on that system. Here we develop a computational framework to systematically evaluate potentially vast sets of candidate differential equation models in light of experimental and prior knowledge about biological systems. This topological sensitivity analysis enables us to evaluate quantitatively the dependence of model inferences and predictions on the assumed model structures. Failure to consider the impact of structural uncertainty introduces biases into the analysis and potentially gives rise to misleading conclusions.
|
182
|
Knaack SA, Siahpirani AF, Roy S. A pan-cancer modular regulatory network analysis to identify common and cancer-specific network components. Cancer Inform 2014; 13:69-84. [PMID: 25374456 PMCID: PMC4213198 DOI: 10.4137/cin.s14058] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/07/2014] [Revised: 09/22/2014] [Accepted: 09/24/2014] [Indexed: 12/19/2022] Open
Abstract
Many human diseases, including cancer, are the result of perturbations to transcriptional regulatory networks that control context-specific expression of genes. Comparing multiple cancer types is a powerful way to illuminate the common and specific network features of this family of diseases. Recent efforts from The Cancer Genome Atlas (TCGA) have generated large collections of functional genomic data sets for multiple types of cancers. An emerging challenge is to devise computational approaches that systematically compare these genomic data sets across different cancer types and identify common and cancer-specific network components. We present a module- and network-based characterization of transcriptional patterns in six different cancers being studied in TCGA: breast, colon, rectal, kidney, ovarian, and endometrial. Our approach uses a recently developed regulatory network reconstruction algorithm, modular regulatory network learning with per gene information (MERLIN), within a stability selection framework to predict regulators for individual genes and gene modules. Our module-based analysis identifies a common theme of immune system processes in each cancer study, with modules statistically enriched for immune response processes as well as targets of key immune response regulators from the interferon regulatory factor (IRF) and signal transducer and activator of transcription (STAT) families. Comparison of the inferred regulatory networks from each cancer type identified a core regulatory network that included genes involved in chromatin remodeling, cell cycle, and immune response. Regulatory network hubs included genes with known roles in specific cancer types as well as genes with potentially novel roles in different cancer types. 
Overall, our integrated module and network analysis recapitulated known themes in cancer biology and additionally revealed novel regulatory hubs that suggest a complex interplay of immune response, cell cycle, and chromatin remodeling across multiple cancers.
Affiliation(s)
- Sara A Knaack
- Wisconsin Institute for Discovery, University of Wisconsin, Madison, WI, USA
| | - Alireza Fotuhi Siahpirani
- Wisconsin Institute for Discovery, University of Wisconsin, Madison, WI, USA; Department of Computer Sciences, University of Wisconsin, Madison, WI, USA
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin, Madison, WI, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
| |
|
183
|
Bilal E, Sakellaropoulos T, Melas IN, Messinis DE, Belcastro V, Rhrissorrakrai K, Meyer P, Norel R, Iskandar A, Blaese E, Rice JJ, Peitsch MC, Hoeng J, Stolovitzky G, Alexopoulos LG, Poussin C. A crowd-sourcing approach for the construction of species-specific cell signaling networks. Bioinformatics 2014; 31:484-91. [PMID: 25294919 PMCID: PMC4325542 DOI: 10.1093/bioinformatics/btu659] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022] Open
Abstract
Motivation: Animal models are important tools in drug discovery and for understanding human biology in general. However, many drugs that initially show promising results in rodents fail in later stages of clinical trials. Understanding the commonalities and differences between human and rat cell signaling networks can lead to better experimental designs, improved allocation of resources and ultimately better drugs. Results: The sbv IMPROVER Species-Specific Network Inference challenge was designed to use the power of the crowds to build two species-specific cell signaling networks given phosphoproteomics, transcriptomics and cytokine data generated from NHBE and NRBE cells exposed to various stimuli. A common literature-inspired reference network with 220 nodes and 501 edges was also provided as prior knowledge from which challenge participants could add or remove edges but not nodes. Such a large network inference challenge not based on synthetic simulations but on real data presented unique difficulties in scoring and interpreting the results. Because any prior knowledge about the networks was already provided to the participants for reference, novel ways for scoring and aggregating the results were developed. Two human and rat consensus networks were obtained by combining all the inferred networks. Further analysis showed that major signaling pathways were conserved between the two species with only isolated components diverging, as in the case of ribosomal S6 kinase RPS6KA1. Overall, the consensus between inferred edges was relatively high with the exception of the downstream targets of transcription factors, which seemed more difficult to predict. Contact: ebilal@us.ibm.com or gustavo@us.ibm.com. Supplementary information: Supplementary data are available at Bioinformatics online.
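Illustrative note: the abstract above states that consensus networks were obtained by combining all the inferred networks, but it does not say how. One simple way to combine edge lists is a vote over edges, sketched below in Python. The voting rule, the `min_fraction` parameter, the function name `consensus_network` and the example edges are all assumptions made for illustration; they are not taken from the challenge's scoring or aggregation procedure.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # directed edge as (source, target)


def consensus_network(networks: Iterable[Set[Edge]], min_fraction: float = 0.5) -> Set[Edge]:
    """Keep every edge that appears in at least `min_fraction` of the networks.

    Assumed aggregation rule (simple edge voting), not the published method.
    """
    networks = list(networks)
    if not networks:
        return set()
    # Each network is a set, so an edge gets at most one vote per network.
    votes = Counter(edge for net in networks for edge in net)
    threshold = min_fraction * len(networks)
    return {edge for edge, count in votes.items() if count >= threshold}


# Hypothetical submissions from four participants (edge names are made up).
submissions = [
    {("EGFR", "MAPK1"), ("MAPK1", "RPS6KA1")},
    {("EGFR", "MAPK1"), ("AKT1", "MTOR")},
    {("EGFR", "MAPK1"), ("MAPK1", "RPS6KA1"), ("AKT1", "MTOR")},
    {("EGFR", "MAPK1")},
]

# Require support from at least 75% of the submissions.
consensus = consensus_network(submissions, min_fraction=0.75)
print(sorted(consensus))
```

Raising `min_fraction` trades recall for agreement: a stricter threshold keeps only edges that most participants inferred, while a looser one retains edges with partial support.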
Affiliation(s)
- Erhan Bilal
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Theodore Sakellaropoulos
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Ioannis N Melas
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Dimitris E Messinis
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Vincenzo Belcastro
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Kahn Rhrissorrakrai
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Pablo Meyer
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Raquel Norel
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Anita Iskandar
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Elise Blaese
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - John J Rice
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Manuel C Peitsch
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Julia Hoeng
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Gustavo Stolovitzky
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Leonidas G Alexopoulos
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | - Carine Poussin
- IBM Research, Computational Biology Center, Yorktown Heights, NY 10598, USA, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki, Greece, National Technical University of Athens, Heroon Polytechniou 9, Zografou, 15780, Greece and Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
| | | |
|
184
|
Studham ME, Tjärnberg A, Nordling TEM, Nelander S, Sonnhammer ELL. Functional association networks as priors for gene regulatory network inference. Bioinformatics 2014; 30:i130-8. [PMID: 24931976 PMCID: PMC4058914 DOI: 10.1093/bioinformatics/btu285] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 12/11/2022]
Abstract
Motivation: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. Results: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data. Contact: matthew.studham@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew E Studham
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| | - Andreas Tjärnberg
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| | - Torbjörn E M Nordling
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, SwedenStockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| | - Sven Nelander
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| | - Erik L L Sonnhammer
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, SwedenStockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, SwedenStockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| |
Collapse
|
185
|
van Dam JCJ, Schaap PJ, Martins dos Santos VAP, Suárez-Diez M. Integration of heterogeneous molecular networks to unravel gene-regulation in Mycobacterium tuberculosis. BMC Syst Biol 2014; 8:111. [PMID: 25279447 PMCID: PMC4181829 DOI: 10.1186/s12918-014-0111-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/15/2014] [Accepted: 09/05/2014] [Indexed: 12/23/2022]
Abstract
BACKGROUND Different methods have been developed to infer regulatory networks from heterogeneous omics datasets and to construct co-expression networks. Each algorithm produces different networks and efforts have been devoted to automatically integrate them into consensus sets. However, each separate set has an intrinsic value that is diluted and partly lost when building a consensus network. Here we present a methodology to generate co-expression networks and, instead of a consensus network, we propose an integration framework where the different networks are kept and analysed with additional tools to efficiently combine the information extracted from each network. RESULTS We developed a workflow to efficiently analyse information generated by different inference and prediction methods. Our methodology relies on providing the user the means to simultaneously visualise and analyse the coexisting networks generated by different algorithms, heterogeneous datasets, and a suite of analysis tools. As a showcase, we have analysed the gene co-expression networks of Mycobacterium tuberculosis generated using over 600 expression experiments. Regarding DNA damage repair, we identified SigC as a key control element, 12 new targets for LexA, an updated LexA binding motif, and a potential mismatch repair system. We expanded the DevR regulon with 27 genes while identifying 9 targets wrongly assigned to this regulon. We discovered 10 new genes linked to zinc uptake and a new regulatory mechanism for ZuR. The use of co-expression networks to perform system level analysis allows the development of custom-made methodologies. As showcases we implemented a pipeline to integrate ChIP-seq data and another method to uncover multiple regulatory layers.
CONCLUSIONS Our workflow is based on representing the multiple types of information as network representations and presenting these networks in a synchronous framework that allows their simultaneous visualization while keeping specific associations from the different networks. By simultaneously exploring these networks and metadata, we gained insights into regulatory mechanisms in M. tuberculosis that could not be obtained through the separate analysis of each data type.
Affiliation(s)
- Jesse CJ van Dam
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
- Peter J Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
- Vitor AP Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
- LifeGlimmer GmbH, Markelstrasse 38, Berlin, Germany
- María Suárez-Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
|
186
|
Abstract
Genomic analysis of H. salinarum indicated that the de novo pathway for aromatic amino acid (AroAA) biosynthesis does not follow the classical pathway but begins from non-classical precursors, as is the case for M. jannaschii. The first two steps in the pathway were predicted to be carried out by genes OE1472F and OE1475F, while the 3rd step follows the canonical pathway involving gene OE1477R. The functions of these genes and their products were tested by biochemical and genetic methods. In this study, we provide evidence that supports the role of proteins OE1472F and OE1475F catalyzing consecutive enzymatic reactions leading to the production of 3-dehydroquinate (DHQ), after which AroAA production proceeds via the canonical pathway starting with the formation of DHS (dehydroshikimate), catalyzed by the product of ORF OE1477R. Nutritional requirements and AroAA uptake studies of the mutants gave results that were consistent with the proposed roles of these ORFs in AroAA biosynthesis. DNA microarray data indicated that the 13 genes of the canonical pathway appear to be utilised for AroAA biosynthesis in H. salinarum, as they are differentially expressed when cells are grown in medium lacking AroAA.
|
187
|
A data-driven approach to reverse engineering customer engagement models: towards functional constructs. PLoS One 2014; 9:e102768. [PMID: 25036766 PMCID: PMC4103885 DOI: 10.1371/journal.pone.0102768] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 05/01/2014] [Accepted: 06/01/2014] [Indexed: 11/19/2022] Open
Abstract
Online consumer behavior in general, and online customer engagement with brands in particular, has become a major focus of research activity, fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven, and much debate about the concept of Customer Engagement and its related constructs persists in the literature. In this paper, we propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research on consumer behavior using questionnaire data, or for studies investigating other types of human behavior. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The ‘communities’ of questionnaire items that emerge from our community detection method form possible ‘functional constructs’ inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such ‘functional constructs’, suggesting that the method proposed here could be adopted as a new data-driven way of modeling human behavior.
|
188
|
Brooks AN, Reiss DJ, Allard A, Wu WJ, Salvanha DM, Plaisier CL, Chandrasekaran S, Pan M, Kaur A, Baliga NS. A system-level model for the microbial regulatory genome. Mol Syst Biol 2014; 10:740. [PMID: 25028489 PMCID: PMC4299497 DOI: 10.15252/msb.20145160] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 12/20/2022] Open
Abstract
Microbes can tailor transcriptional responses to diverse environmental challenges despite having streamlined genomes and a limited number of regulators. Here, we present data-driven models that capture the dynamic interplay of the environment and genome-encoded regulatory programs of two types of prokaryotes: Escherichia coli (a bacterium) and Halobacterium salinarum (an archaeon). The models reveal how the genome-wide distributions of cis-acting gene regulatory elements and the conditional influences of transcription factors at each of those elements encode programs for eliciting a wide array of environment-specific responses. We demonstrate how these programs partition transcriptional regulation of genes within regulons and operons to re-organize gene-gene functional associations in each environment. The models capture fitness-relevant co-regulation by different transcriptional control mechanisms acting across the entire genome, to define a generalized, system-level organizing principle for prokaryotic gene regulatory networks that goes well beyond existing paradigms of gene regulation. An online resource (http://egrin2.systemsbiology.net) has been developed to facilitate multiscale exploration of conditional gene regulation in the two prokaryotes.
Affiliation(s)
- Aaron N Brooks
- Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Antoine Allard
- Département de Physique, de Génie Physique et d'Optique, Université Laval, Québec, QC, Canada
- Wei-Ju Wu
- Institute for Systems Biology, Seattle, WA, USA
- Diego M Salvanha
- Institute for Systems Biology, Seattle, WA, USA; LabPIB, Department of Computing and Mathematics FFCLRP-USP, University of Sao Paulo, Ribeirao Preto, Brazil
- Min Pan
- Institute for Systems Biology, Seattle, WA, USA
- Nitin S Baliga
- Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA; Departments of Microbiology and Biology, University of Washington, Seattle, WA, USA; Lawrence Berkeley National Laboratories, Berkeley, CA, USA
|
189
|
Hernández-Prieto MA, Semeniuk TA, Futschik ME. Toward a systems-level understanding of gene regulatory, protein interaction, and metabolic networks in cyanobacteria. Front Genet 2014; 5:191. [PMID: 25071821 PMCID: PMC4079066 DOI: 10.3389/fgene.2014.00191] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/29/2014] [Accepted: 06/11/2014] [Indexed: 12/21/2022] Open
Abstract
Cyanobacteria are essential primary producers in marine ecosystems, playing an important role in both carbon and nitrogen cycles. In the last decade, various genome sequencing and metagenomic projects have generated large amounts of genetic data for cyanobacteria. This wealth of data provides researchers with a new basis for the study of molecular adaptation, ecology and evolution of cyanobacteria, as well as for developing biotechnological applications. It also facilitates the use of multiplex techniques, i.e., expression profiling by high-throughput technologies such as microarrays, RNA-seq, and proteomics. However, exploration and analysis of these data is challenging, and often requires advanced computational methods. The data also need to be integrated into our existing framework of knowledge before they can be used to draw reliable biological conclusions. Here, systems biology provides important tools. In particular, the construction and analysis of molecular networks has emerged as a powerful systems-level framework with which to integrate such data and to better understand biologically relevant processes in these organisms. In this review, we provide an overview of the advances and experimental approaches undertaken using multiplex data from genomic, transcriptomic, proteomic, and metabolomic studies in cyanobacteria. Furthermore, we summarize currently available web-based tools dedicated to cyanobacteria, i.e., CyanoBase, CyanoEXpress, ProPortal, Cyanorak, CyanoBIKE, and CINPER. Finally, we present a case study for the freshwater model cyanobacterium, Synechocystis sp. PCC6803, to show the power of meta-analysis, and the potential to extrapolate acquired knowledge to the ecologically important marine cyanobacteria genus, Prochlorococcus.
Affiliation(s)
- Trudi A Semeniuk
- Systems Biology and Bioinformatics Laboratory, IBB-CBME, University of Algarve Faro, Portugal
- Matthias E Futschik
- Systems Biology and Bioinformatics Laboratory, IBB-CBME, University of Algarve Faro, Portugal; Centre of Marine Sciences, University of Algarve Faro, Portugal
|
190
|
Inferring gene regulatory networks by integrating ChIP-seq/chip and transcriptome data via LASSO-type regularization methods. Methods 2014; 67:294-303. [DOI: 10.1016/j.ymeth.2014.03.006] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 11/21/2013] [Revised: 03/04/2014] [Accepted: 03/05/2014] [Indexed: 01/14/2023] Open
|
191
|
Sudhakar P, Reck M, Wang W, He FQ, Wagner-Döbler I, Dobler IW, Zeng AP. Construction and verification of the transcriptional regulatory response network of Streptococcus mutans upon treatment with the biofilm inhibitor carolacton. BMC Genomics 2014; 15:362. [PMID: 24884510 PMCID: PMC4048456 DOI: 10.1186/1471-2164-15-362] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 11/27/2013] [Accepted: 04/17/2014] [Indexed: 11/26/2022] Open
Abstract
Background Carolacton is a newly identified secondary metabolite causing altered cell morphology and death of Streptococcus mutans biofilm cells. To unravel key regulators mediating these effects, the transcriptional regulatory response network of S. mutans biofilms upon carolacton treatment was constructed and analyzed. A systems biological approach integrating time-resolved transcriptomic data, reverse engineering, transcription factor binding sites, and experimental validation was carried out. Results The co-expression response network constructed from transcriptomic data using the reverse engineering algorithm called the Trend Correlation method consisted of 8284 gene pairs. The regulatory response network inferred by superimposing transcription factor binding site information into the co-expression network comprised 329 putative transcriptional regulatory interactions and could be classified into 27 sub-networks each co-regulated by a transcription factor. These sub-networks were significantly enriched with genes sharing common functions. The regulatory response network displayed global hierarchy and network motifs as observed in model organisms. The sub-networks modulated by the pyrimidine biosynthesis regulator PyrR, the glutamine synthetase repressor GlnR, the cysteine metabolism regulator CysR, global regulators CcpA and CodY and the two component system response regulators VicR and MbrC among others could putatively be related to the physiological effect of carolacton. The predicted interactions from the regulatory network between MbrC, known to be involved in cell envelope stress response, and the murMN-SMU_718c genes encoding peptidoglycan biosynthetic enzymes were experimentally confirmed using Electro Mobility Shift Assays. Furthermore, gene deletion mutants of five predicted key regulators from the response networks were constructed and their sensitivities towards carolacton were investigated. 
Deletion of cysR, the node having the highest connectivity among the regulators chosen from the regulatory network, resulted in a mutant which was insensitive to carolacton thus demonstrating not only the essentiality of cysR for the response of S. mutans biofilms to carolacton but also the relevance of the predicted network. Conclusion The network approach used in this study revealed important regulators and interactions as part of the response mechanisms of S. mutans biofilm cells to carolacton. It also opens a door for further studies into novel drug targets against streptococci. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-362) contains supplementary material, which is available to authorized users.
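The step of superimposing transcription factor binding site information onto a co-expression network, as described in the abstract above, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the data layout (a list of gene pairs and a dictionary of predicted targets per regulator) and the toy gene labels are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple


def superimpose_binding_sites(
    coexpressed_pairs: Iterable[Tuple[str, str]],
    tf_targets: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Keep only co-expressed pairs supported by a predicted binding site.

    Each retained pair is oriented as (regulator, target).
    `tf_targets` maps a hypothetical regulator name to the set of genes
    with a predicted binding site for that regulator.
    """
    edges: Set[Tuple[str, str]] = set()
    for gene_a, gene_b in coexpressed_pairs:
        # Co-expression is undirected, so check both orientations.
        if gene_b in tf_targets.get(gene_a, set()):
            edges.add((gene_a, gene_b))
        if gene_a in tf_targets.get(gene_b, set()):
            edges.add((gene_b, gene_a))
    return sorted(edges)


def group_by_regulator(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group directed edges into one sub-network (target list) per regulator."""
    subnetworks: Dict[str, List[str]] = {}
    for regulator, target in edges:
        subnetworks.setdefault(regulator, []).append(target)
    return subnetworks


# Toy input with made-up labels (not real S. mutans genes).
toy_pairs = [("tfA", "g1"), ("g2", "tfA"), ("g1", "g2"), ("tfB", "g3")]
toy_sites = {"tfA": {"g1", "g2"}, "tfB": {"g4"}}
toy_edges = superimpose_binding_sites(toy_pairs, toy_sites)
toy_subnetworks = group_by_regulator(toy_edges)
```

In this toy input, the pair between g1 and g2 is dropped because neither gene is a regulator, and the tfB pair is dropped because g3 has no predicted tfB site. Grouping the surviving edges by regulator mirrors the idea of sub-networks each co-regulated by one transcription factor.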
Affiliation(s)
- Irene W Dobler
- Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, 21073 Hamburg, Germany.
|
192
|
Henderson J, Michailidis G. Network reconstruction using nonparametric additive ODE models. PLoS One 2014; 9:e94003. [PMID: 24732037 PMCID: PMC3986056 DOI: 10.1371/journal.pone.0094003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 12/09/2013] [Accepted: 03/13/2014] [Indexed: 01/05/2023] Open
Abstract
Network representations of biological systems are widespread and reconstructing unknown networks from data is a focal problem for computational biologists. For example, the series of biochemical reactions in a metabolic pathway can be represented as a network, with nodes corresponding to metabolites and edges linking reactants to products. In a different context, regulatory relationships among genes are commonly represented as directed networks with edges pointing from influential genes to their targets. Reconstructing such networks from data is a challenging problem receiving much attention in the literature. There is a particular need for approaches tailored to time-series data and not reliant on direct intervention experiments, as the former are often more readily available. In this paper, we introduce an approach to reconstructing directed networks based on dynamic systems models. Our approach generalizes commonly used ODE models based on linear or nonlinear dynamics by extending the functional class for the functions involved from parametric to nonparametric models. Concomitantly, we limit the complexity by imposing an additive structure on the estimated slope functions. Thus the submodel associated with each node is a sum of univariate functions. These univariate component functions form the basis for a novel coupling metric that we define in order to quantify the strength of proposed relationships and hence rank potential edges. We show the utility of the method by reconstructing networks using simulated data from computational models for the glycolytic pathway of Lactococcus lactis and a gene network regulating the pluripotency of mouse embryonic stem cells. For purposes of comparison, we also assess reconstruction performance using gene networks from the DREAM challenges.
We compare our method to those that similarly rely on dynamic systems models and use the results to attempt to disentangle the distinct roles of linearity, sparsity, and derivative estimation.
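The additive structure described in the abstract above can be written out as a worked equation. This is a sketch only: the symbols $x_i$, $\mu_i$, $f_{ij}$, $a_{ij}$ and $p$ are notation assumed here for illustration and are not taken from the paper, and the paper's coupling metric is not reproduced because the abstract does not define it.

```latex
% Additive ODE submodel for node i (notation assumed for illustration):
% the slope of x_i is a sum of univariate functions of the other nodes.
\[
  \frac{d x_i(t)}{dt} \;=\; \mu_i + \sum_{j=1}^{p} f_{ij}\bigl(x_j(t)\bigr),
  \qquad i = 1, \dots, p .
\]
% A linear ODE model is the special case in which every component is linear:
\[
  f_{ij}(x_j) \;=\; a_{ij}\, x_j .
\]
```

Under this reading, a directed edge from node $j$ to node $i$ is a candidate whenever the component $f_{ij}$ is not identically zero, which is why the univariate components are a natural basis for ranking potential edges.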
Affiliation(s)
- James Henderson
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
- George Michailidis
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
|
193
|
Wang YXR, Huang H. Review on statistical methods for gene network reconstruction using expression data. J Theor Biol 2014; 362:53-61. [PMID: 24726980 DOI: 10.1016/j.jtbi.2014.03.040] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 01/26/2014] [Revised: 03/29/2014] [Accepted: 03/31/2014] [Indexed: 12/16/2022]
Abstract
Network modeling has proven to be a fundamental tool in analyzing the inner workings of a cell. It has revolutionized our understanding of biological processes and made significant contributions to the discovery of disease biomarkers. Much effort has been devoted to reconstructing various types of biochemical networks using functional genomic datasets generated by high-throughput technologies. This paper discusses statistical methods used to reconstruct gene regulatory networks using gene expression data. In particular, we highlight progress made and challenges yet to be met in the problems involved in estimating gene interactions, inferring causality and modeling temporal changes of regulation behaviors. As rapid advances in technologies have made available diverse, large-scale genomic data, we also survey methods of incorporating all these additional data to achieve better, more accurate inference of gene networks.
Affiliation(s)
- Y X Rachel Wang
- Department of Statistics, University of California, Berkeley, CA 94720, USA.
- Haiyan Huang
- Department of Statistics, University of California, Berkeley, CA 94720, USA.
|
194
|
Unifying immunology with informatics and multiscale biology. Nat Immunol 2014; 15:118-27. [PMID: 24448569 DOI: 10.1038/ni.2787] [Citation(s) in RCA: 128] [Impact Index Per Article: 12.8] [Received: 09/11/2013] [Accepted: 11/14/2013] [Indexed: 12/14/2022]
Abstract
The immune system is a highly complex and dynamic system. Historically, the most common scientific and clinical practice has been to evaluate its individual components. This kind of approach cannot always expose the interconnecting pathways that control immune-system responses and does not reveal how the immune system works across multiple biological systems and scales. High-throughput technologies can be used to measure thousands of parameters of the immune system at a genome-wide scale. These system-wide surveys yield massive amounts of quantitative data that provide a means to monitor and probe immune-system function. New integrative analyses can help synthesize and transform these data into valuable biological insight. Here we review some of the computational analysis tools for high-dimensional data and how they can be applied to immunology.
|
195
|
Macklin DN, Ruggero NA, Covert MW. The future of whole-cell modeling. Curr Opin Biotechnol 2014; 28:111-5. [PMID: 24556244 DOI: 10.1016/j.copbio.2014.01.012] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 09/30/2013] [Revised: 01/19/2014] [Accepted: 01/20/2014] [Indexed: 12/21/2022]
Abstract
Integrated whole-cell modeling is poised to make a dramatic impact on molecular and systems biology, bioengineering, and medicine, once certain obstacles are overcome. From our group's experience building a whole-cell model of Mycoplasma genitalium, we identified several significant challenges to building models of more complex cells. Here we review and discuss these challenges in seven areas: first, experimental interrogation; second, data curation; third, model building and integration; fourth, accelerated computation; fifth, analysis and visualization; sixth, model validation; and seventh, collaboration and community development. Surmounting these challenges will require the cooperation of an interdisciplinary group of researchers to create increasingly sophisticated whole-cell models and make data, models, and simulations more accessible to the wider community.
Affiliation(s)
- Derek N Macklin
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Nicholas A Ruggero
- Department of Chemical Engineering, Stanford University, Stanford, CA, USA
- Markus W Covert
- Department of Bioengineering, Stanford University, Stanford, CA, USA.
|
196
|
Dimitrakopoulou K, Dimitrakopoulos GN, Wilk E, Tsimpouris C, Sgarbas KN, Schughart K, Bezerianos A. Influenza A immunomics and public health omics: the dynamic pathway interplay in host response to H1N1 infection. OMICS 2014; 18:167-83. [PMID: 24512282 DOI: 10.1089/omi.2013.0062] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 02/06/2023]
Abstract
Towards unraveling the influenza A (H1N1) immunome, this work aims at constructing the murine host response pathway interactome. To accomplish that, an ensemble of dynamic and time-varying Gene Regulatory Network Inference methodologies was recruited to set a confident interactome based on mouse time series transcriptome data (day 1-day 60). The proposed H1N1 interactome demonstrated significant transformations among activated and suppressed pathways in time. Enhanced interplay was observed at day 1, while the maximal network complexity was reached at day 8 (correlated with viral clearance and iBALT tissue formation) and one interaction was present at day 40. Next, we searched for common interactivity features between the murine-adapted PR8 strain and other influenza A subtypes/strains. For this, two other interactomes, describing the murine host response against H5N1 and H1N1pdm, were constructed, which in turn validated many of the observed interactions (in the period day 1-day 7). The H1N1 interactome revealed the role of cell cycle both in innate and adaptive immunity (day 1-day 14). Also, pathogen sensory pathways (e.g., RIG-I) displayed long-lasting association with cytokine/chemokine signaling (until day 8). Interestingly, the above observations were also supported by the H5N1 and H1N1pdm models. It also elucidated the enhanced coupling of the activated innate pathways with the suppressed PPAR signaling to keep low inflammation until viral clearance (until day 14). Further, it showed that interactions reflecting phagocytosis processes continued long after the viral clearance and the establishment of adaptive immunity (day 8-day 40). Additionally, interactions involving B cell receptor pathway were evident since day 1. These results collectively inform the emerging field of public health omics and future clinical studies aimed at deciphering dynamic host responses to infectious agents.
|
197
|
Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014; 11:20130505. [PMID: 24307566 PMCID: PMC3869153 DOI: 10.1098/rsif.2013.0505] [Citation(s) in RCA: 163] [Impact Index Per Article: 16.3] [Received: 06/07/2013] [Accepted: 11/12/2013] [Indexed: 12/17/2022] Open
Abstract
The interplay of mathematical modelling with experiments is one of the central elements in systems biology. The aim of reverse engineering is to infer, analyse and understand, through this interplay, the functional and regulatory mechanisms of biological systems. Reverse engineering is not exclusive to systems biology and has been studied in different areas, such as inverse problem theory, machine learning, nonlinear physics, (bio)chemical kinetics, control theory and optimization, among others. However, it seems that many of these areas have been relatively closed to outsiders. In this contribution, we aim to compare and highlight the different perspectives and contributions from these fields, with emphasis on two key questions: (i) why are reverse engineering problems so hard to solve, and (ii) what methods are available for the particular problems arising from systems biology?
Affiliation(s)
- Julio R. Banga
- BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo 36208, Spain
|
198
|
Zheng Z, Christley S, Chiu WT, Blitz IL, Xie X, Cho KWY, Nie Q. Inference of the Xenopus tropicalis embryonic regulatory network and spatial gene expression patterns. BMC Syst Biol 2014; 8:3. [PMID: 24397936 PMCID: PMC3896677 DOI: 10.1186/1752-0509-8-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/16/2013] [Accepted: 12/19/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND During embryogenesis, signaling molecules produced by one cell population direct gene regulatory changes in neighboring cells and influence their developmental fates and spatial organization. One of the earliest events in the development of the vertebrate embryo is the establishment of three germ layers, consisting of the ectoderm, mesoderm and endoderm. Attempts to measure gene expression in vivo in different germ layers and cell types are typically complicated by the heterogeneity of cell types within biological samples (i.e., embryos), as the responses of individual cell types are intermingled into an aggregate observation of heterogeneous cell types. Here, we propose a novel method to elucidate gene regulatory circuits from these aggregate measurements in embryos of the frog Xenopus tropicalis using gene network inference algorithms and then test the ability of the inferred networks to predict spatial gene expression patterns. RESULTS We use two inference models with different underlying assumptions that incorporate existing network information, an ODE model for steady-state data and a Markov model for time series data, and contrast the performance of the two models. We apply our method to both control and knockdown embryos at multiple time points to reconstruct the core mesoderm and endoderm regulatory circuits. Those inferred networks are then used in combination with known dorsal-ventral spatial expression patterns of a subset of genes to predict spatial expression patterns for other genes. Both models are able to predict spatial expression patterns for some of the core mesoderm and endoderm genes, but interestingly of different gene subsets, suggesting that neither model is sufficient to recapitulate all of the spatial patterns, yet they are complementary for the patterns that they do capture. 
CONCLUSION The presented methodology of gene network inference combined with spatial pattern prediction provides an additional layer of validation to elucidate the regulatory circuits controlling the spatial-temporal dynamics in embryonic development.
Affiliation(s)
- Qing Nie
- Department of Mathematics, University of California, Irvine, CA 92697, USA.
|
199
|
Wang Z, Wang Y, Wang N, Wang J, Wang Z, Vallejos CE, Wu R. Towards a comprehensive picture of the genetic landscape of complex traits. Brief Bioinform 2014; 15:30-42. [PMID: 22930650 PMCID: PMC3896925 DOI: 10.1093/bib/bbs049] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/21/2012] [Accepted: 07/09/2012] [Indexed: 12/11/2022] Open
Abstract
The formation of phenotypic traits, such as biomass production, tumor volume and viral abundance, undergoes a complex process in which interactions between genes and developmental stimuli take place at each level of biological organization from cells to organisms. Traditional studies emphasize the impact of genes by directly linking DNA-based markers with static phenotypic values. Functional mapping, derived to detect genes that control developmental processes using growth equations, has proven powerful for addressing questions about the roles of genes in development. By treating phenotypic formation as a cohesive system using differential equations, a different approach, systems mapping, dissects the system into interconnected elements and then maps genes that determine a web of interactions among these elements, facilitating our understanding of the genetic machineries for phenotypic development. Here, we argue that genetic mapping can play a more important role in studying the genotype-phenotype relationship by filling the gaps in the biochemical and regulatory process from DNA to end-point phenotype. We describe a new framework, named network mapping, to study the genetic architecture of complex traits by integrating the regulatory networks that cause a high-order phenotype. Network mapping makes use of a system of differential equations to quantify the rule by which transcriptional, proteomic and metabolomic components interact with each other to organize into a functional whole. The synthesis of functional mapping, systems mapping and network mapping provides a novel avenue to decipher a comprehensive picture of the genetic landscape of complex phenotypes that underlie economically and biomedically important traits.
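As a rough illustration of the kind of differential-equation system the abstract above refers to, one generic form is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: $x_i(t)$ stands for the abundance of the $i$-th transcriptional, proteomic or metabolomic component, $f_i$ for its interaction rule, and $\theta_i^{(g)}$ for rate parameters that may depend on the genotype $g$ at a marker.

```latex
\frac{dx_i}{dt} = f_i\bigl(x_1, \dots, x_m;\ \theta_i^{(g)}\bigr),
\qquad i = 1, \dots, m
```

Under this reading, mapping a gene would amount to testing whether the parameters $\theta_i^{(g)}$ differ between genotypes, which is a hedged interpretation rather than the authors' stated procedure.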
Affiliation(s)
- Zhong Wang
- Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA 17033, USA.
|
200
|
Lopes M, Bontempi G. Experimental assessment of static and dynamic algorithms for gene regulation inference from time series expression data. Front Genet 2013; 4:303. [PMID: 24400020 PMCID: PMC3872039 DOI: 10.3389/fgene.2013.00303] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/30/2013] [Accepted: 12/10/2013] [Indexed: 11/13/2022] Open
Abstract
Accurate inference of causal gene regulatory networks from gene expression data is an open bioinformatics challenge. Gene interactions are dynamical processes, and consequently we can expect the effect of any regulatory action to occur after a certain temporal lag. However, this lag is unknown a priori, and temporal aspects require specific inference algorithms. In this paper we aim to assess the impact of taking temporal aspects into consideration on the final accuracy of the inference procedure. In particular, we compare the accuracy of static algorithms, where no dynamic aspect is considered, to that of fixed lag and adaptive lag algorithms in three inference tasks from microarray expression data. Experimental results show that network inference algorithms that take dynamics into account perform consistently better than static ones, once the considered lags are properly chosen. However, no individual algorithm stands out in all three inference tasks, and the challenging nature of network inference tasks is evidenced, as a large number of the assessed algorithms do not perform better than random.
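The distinction drawn above between static, fixed lag and adaptive lag scoring can be sketched with a toy example. This is a minimal illustration using plain Pearson correlation on synthetic data; it is not one of the algorithms benchmarked in the paper, and the function names, the delay of 2 time points and the noise level are all assumptions chosen for the example.

```python
import numpy as np

def lagged_score(x, y, lag):
    """Absolute Pearson correlation between x[t] and y[t + lag]."""
    if lag == 0:
        a, b = x, y
    else:
        a, b = x[:-lag], y[lag:]
    return abs(np.corrcoef(a, b)[0, 1])

def static_score(x, y):
    """Static scoring: ignores time ordering (lag 0 only)."""
    return lagged_score(x, y, 0)

def adaptive_lag_score(x, y, max_lag):
    """Adaptive scoring: try lags 0..max_lag and keep the best one."""
    scores = [lagged_score(x, y, lag) for lag in range(max_lag + 1)]
    best = int(np.argmax(scores))
    return scores[best], best

# Synthetic data (assumption): the target follows the regulator
# with a delay of 2 time points plus a small amount of noise.
rng = np.random.default_rng(0)
n, true_lag = 200, 2
regulator = rng.normal(size=n)
target = np.empty(n)
target[:true_lag] = rng.normal(size=true_lag)
target[true_lag:] = regulator[:-true_lag] + 0.1 * rng.normal(size=n - true_lag)

s_static = static_score(regulator, target)
s_fixed = lagged_score(regulator, target, true_lag)
s_adapt, best_lag = adaptive_lag_score(regulator, target, max_lag=5)

print(f"static: {s_static:.2f}  fixed lag {true_lag}: {s_fixed:.2f}  "
      f"adaptive: {s_adapt:.2f} at lag {best_lag}")
```

In this toy setting the static score misses the dependency because the regulator and target are only related across time, while the lagged scores recover it; a fixed lag helps only if it is chosen well, which mirrors the caveat in the abstract.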
Affiliation(s)
- Miguel Lopes
- Machine Learning Group, Computer Science Department, Université Libre de Bruxelles, Bruxelles, Belgium; Interuniversity Institute of Bioinformatics in Brussels (IB)2, Brussels, Belgium
- Gianluca Bontempi
- Machine Learning Group, Computer Science Department, Université Libre de Bruxelles, Bruxelles, Belgium; Interuniversity Institute of Bioinformatics in Brussels (IB)2, Brussels, Belgium
|