1
Kim D, Tran A, Kim HJ, Lin Y, Yang JYH, Yang P. Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data. NPJ Syst Biol Appl 2023; 9:51. [PMID: 37857632 PMCID: PMC10587078 DOI: 10.1038/s41540-023-00312-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/02/2023] [Indexed: 10/21/2023]
Abstract
Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview on the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.
Affiliation(s)
- Daniel Kim
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Andy Tran
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
- Hani Jieun Kim
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Yingxin Lin
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
- Pengyi Yang
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
2
Lee AJ, Reiter T, Doing G, Oh J, Hogan DA, Greene CS. Using genome-wide expression compendia to study microorganisms. Comput Struct Biotechnol J 2022; 20:4315-4324. [PMID: 36016717 PMCID: PMC9396250 DOI: 10.1016/j.csbj.2022.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 08/07/2022] [Accepted: 08/07/2022] [Indexed: 11/30/2022]
Abstract
A gene expression compendium is a heterogeneous collection of gene expression experiments assembled from data collected for diverse purposes. The widely varied experimental conditions and genetic backgrounds across samples create a tremendous opportunity for gaining a systems-level understanding of the transcriptional responses that influence phenotypes. Variety in experimental design is particularly important for studying microbes, where the transcriptional responses integrate many signals and demonstrate plasticity across strains, including responses to which nutrients are available and which microbes are present. Advances in high-throughput measurement technology have made it feasible to construct compendia for many microbes. In this review we discuss how these compendia are constructed and analyzed to reveal transcriptional patterns.
Affiliation(s)
- Alexandra J. Lee
  - Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA, USA
- Taylor Reiter
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO, USA
- Georgia Doing
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Julia Oh
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Deborah A. Hogan
  - Department of Microbiology and Immunology, Geisel School of Medicine, Dartmouth, Hanover, NH, USA
- Casey S. Greene
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO, USA
3
Saint-André V. Computational biology approaches for mapping transcriptional regulatory networks. Comput Struct Biotechnol J 2021; 19:4884-4895. [PMID: 34522292 PMCID: PMC8426465 DOI: 10.1016/j.csbj.2021.08.028] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/24/2021] [Revised: 08/16/2021] [Accepted: 08/16/2021] [Indexed: 12/13/2022]
Abstract
Transcriptional Regulatory Networks (TRNs) are mainly responsible for the cell-type- or cell-state-specific expression of gene sets from the same DNA sequence. However, so far there are no precise maps of TRNs available for each cell-type or cell-state, and no ideal tool to map those networks clearly and in full from biological samples. In this review, major approaches and tools to map TRNs from high-throughput data are presented, depending on the type of methods or data used to infer them, and their advantages and limitations are discussed. After summarizing the main principles defining the topology and structure–function relationships in TRNs, an overview of the extensive work done to map TRNs from bulk transcriptomic data will be presented by type of methodological approach. Most recent modellings of TRNs using other types of molecular data or integrating different data types, including single-cell RNA-sequencing and chromatin information, will then be discussed, before briefly concluding with improvements expected to come in the field.
Affiliation(s)
- Violaine Saint-André
  - Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, Paris, France
4
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
  - School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
  - Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley
5
Larmuseau M, Verbeke LPC, Marchal K. Associating expression and genomic data using co-occurrence measures. Biol Direct 2019; 14:10. [PMID: 31072345 PMCID: PMC6507230 DOI: 10.1186/s13062-019-0240-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/16/2018] [Accepted: 04/10/2019] [Indexed: 12/11/2022]
Abstract
Recent technological evolutions have led to an exponential increase in data in all the omics fields. It is expected that integration of these different data sources will drastically enhance our knowledge of the biological mechanisms behind genomic diseases such as cancer. However, the integration of different omics data still remains a challenge. In this work we propose an intuitive workflow for the integrative analysis of expression, mutation and copy number data taken from the METABRIC study on breast cancer. First, we present evidence that the expression profile of many important breast cancer genes consists of two modes or ‘regimes’, which contain important clinical information. Then, we show how the co-occurrence of these expression regimes can be used as an association measure between genes and validate our findings on the TCGA-BRCA study. Finally, we demonstrate how these co-occurrence measures can also be applied to link expression regimes to genomic aberrations, providing a more complete, integrative view on breast cancer. As a case study, an integrative analysis of the identified MLPH-FOXA1 association is performed, illustrating that the obtained expression associations are intimately linked to the underlying genomic changes. Reviewers: This article was reviewed by Dirk Walther, Francisco Garcia and Isabel Nepomuceno.
Affiliation(s)
- Maarten Larmuseau
  - Department of Information Technology, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
- Lieven P C Verbeke
  - Department of Plant Biotechnology and Bioinformatics, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
6
Ko Y, Kim J, Rodriguez-Zas SL. Markov chain Monte Carlo simulation of a Bayesian mixture model for gene network inference. Genes Genomics 2019; 41:547-555. [PMID: 30741379 DOI: 10.1007/s13258-019-00789-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/16/2018] [Accepted: 01/21/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND: Simultaneous measurement of expression levels for thousands of genes provides rich information about many different aspects of biological mechanisms. A major computational challenge is to find methods to extract new biological insights from this wealth of data. Complex biological processes are often regulated under various conditions or circumstances, and the associated gene interactions change dynamically depending on the biological context. Thus, inferring such dynamic relationships between genes while accounting for biological conditions is very challenging. METHOD: In this study, we propose a comprehensive and integrated approach to infer the dynamic relationships between genes and evaluate this approach on three distinct gene networks. RESULTS: This study demonstrates the advantage of integrating Markov chain Monte Carlo (MCMC) simulation into a Bayesian mixture model to overcome the high-dimension, low sample size (HDLSS) problem as well as to identify context-specific biological modules. Such biological modules were identified through the summarization of sampled network structures obtained from MCMC simulation. CONCLUSION: This novel approach gives a comprehensive understanding of the dynamically regulated biological modules.
Affiliation(s)
- Younhee Ko
  - Division of Biomedical Engineering, Hankuk University of Foreign Studies, Gyeonggi-do, 17035, South Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Sandra L Rodriguez-Zas
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
7
Pannier L, Merino E, Marchal K, Collado-Vides J. Effect of genomic distance on coexpression of coregulated genes in E. coli. PLoS One 2017; 12:e0174887. [PMID: 28419102 PMCID: PMC5395161 DOI: 10.1371/journal.pone.0174887] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 12/13/2016] [Accepted: 03/16/2017] [Indexed: 12/26/2022]
Abstract
In prokaryotes, genomic distance is a feature that in addition to coregulation affects coexpression. Several observations, such as genomic clustering of highly coexpressed small regulons, support the idea that coexpression behavior of coregulated genes is affected by the distance between the coregulated genes. However, the specific contribution of distance in addition to coregulation in determining the degree of coexpression has not yet been studied systematically. In this work, we exploit the rich information in RegulonDB to study how the genomic distance between coregulated genes affects their degree of coexpression, measured by pairwise similarity of expression profiles obtained under a large number of conditions. We observed that, in general, coregulated genes display higher degrees of coexpression as they are more closely located on the genome. This contribution of genomic distance in determining the degree of coexpression was relatively small compared to the degree of coexpression that was determined by the tightness of the coregulation (degree of overlap of regulatory programs) but was shown to be evolutionarily constrained. In addition, the distance effect was sufficient to guarantee coexpression of coregulated genes that are located at very short distances, irrespective of their tightness of coregulation. This is partly but definitely not always because the close distance is also the cause of the coregulation. In cases where it is not, we hypothesize that the effect of the distance on coexpression could be caused by the fact that coregulated genes closely located to each other are also relatively more equidistantly located from their common TF and therefore subject to more similar levels of TF molecules. The absolute genomic distance of the coregulated genes to their common TF-coding gene tends to be less important in determining the degree of coexpression. Our results pinpoint the importance of taking into account the combined effect of distance and coregulation when studying prokaryotic coexpression and transcriptional regulation.
Affiliation(s)
- Lucia Pannier
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Enrique Merino
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Kathleen Marchal
  - Department of Microbial and Molecular Systems, KU Leuven, Centre of Microbial and Plant Genetics, Leuven, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark, Ghent, Belgium
  - Department of Information Technology, Ghent University, IMinds, Ghent, Belgium
  - Department of Genetics, University of Pretoria, Hatfield Campus, Pretoria, South Africa
- Julio Collado-Vides
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
8
Taghipour S, Zarrineh P, Ganjtabesh M, Nowzari-Dalini A. Improving protein complex prediction by reconstructing a high-confidence protein-protein interaction network of Escherichia coli from different physical interaction data sources. BMC Bioinformatics 2017; 18:10. [PMID: 28049415 PMCID: PMC5209909 DOI: 10.1186/s12859-016-1422-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/14/2016] [Accepted: 12/12/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Although different protein-protein physical interaction (PPI) datasets exist for Escherichia coli, no common methodology exists to integrate these datasets and extract reliable modules reflecting the existing biological processes and protein complexes. The naïve Bayesian formula is a widely accepted method for integrating different PPI datasets into a single weighted PPI network, but detecting proper weights in such a network is still a major problem. RESULTS: In this paper, we proposed a new methodology to integrate various physical PPI datasets into a single weighted PPI network in such a way that the modules detected in the PPI network exhibit the highest similarity to available functional modules. We used the co-expression modules as functional modules, and we showed that direct functional modules detected from Gene Ontology terms could be used as an alternative dataset. After running this integration methodology over six different physical PPI datasets, orthologous high-confidence interactions from a related organism and two AP-MS PPI datasets gained high weights in the integrated networks, while the weights for one AP-MS PPI dataset and two other datasets derived from public databases converged to zero. The majority of detected modules formed around one or a few hub proteins. Still, a large number of highly interacting protein modules were detected which are functionally relevant and are likely to form protein complexes. CONCLUSIONS: We provided a new high-confidence protein complex prediction method supported by functional studies and literature mining.
Affiliation(s)
- Shirin Taghipour
  - Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, P.O.Box: 14155-6455, Tehran, Iran
- Peyman Zarrineh
  - Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, P.O.Box: 14155-6455, Tehran, Iran
- Mohammad Ganjtabesh
  - Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, P.O.Box: 14155-6455, Tehran, Iran
- Abbas Nowzari-Dalini
  - Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, P.O.Box: 14155-6455, Tehran, Iran
9
Žurauskienė J, Kirk PDW, Stumpf MPH. A graph theoretical approach to data fusion. Stat Appl Genet Mol Biol 2016; 15:107-22. [PMID: 26992203 DOI: 10.1515/sagmb-2016-0016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The rapid development of high throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. Therefore, it is increasingly being recognized that we can gain deeper understanding about underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or (for fast, approximate, and massive scale data fusion) can naturally switch to more heuristic modeling techniques. An advantage of the proposed approach is that each dataset can initially be modeled independently (in parallel), before applying a fast post-processing step to perform data integration. This allows us to incorporate new experimental data in an online fashion, without having to rerun all of the analysis. We first demonstrate the applicability of our tool on artificial data, and then on examples from the literature, which include yeast cell cycle, breast cancer and sporadic inclusion body myositis datasets.
10
Liu Q, Song R, Li J. Inference of gene interaction networks using conserved subsequential patterns from multiple time course gene expression datasets. BMC Genomics 2015; 16 Suppl 12:S4. [PMID: 26681650 PMCID: PMC4682423 DOI: 10.1186/1471-2164-16-s12-s4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/02/2022]
Abstract
Motivation Deciphering gene interaction networks (GINs) from time-course gene expression (TCGx) data is highly valuable to understand gene behaviors (e.g., activation, inhibition, time-lagged causality) at the system level. Existing methods usually use a global or local proximity measure to infer GINs from a single dataset. As the noise contained in a single data set is hardly self-resolved, the results are sometimes not reliable. Also, these proximity measurements cannot handle the co-existence of the various in vivo positive, negative and time-lagged gene interactions. Methods and results We propose to infer reliable GINs from multiple TCGx datasets using a novel conserved subsequential pattern of gene expression. A subsequential pattern is a maximal subset of genes sharing positive, negative or time-lagged correlations of one expression template on their own subsets of time points. Based on these patterns, a GIN can be built from each of the datasets. It is assumed that reliable gene interactions would be detected repeatedly. We thus use conserved gene pairs from the individual GINs of the multiple TCGx datasets to construct a reliable GIN for a species. We apply our method on six TCGx datasets related to yeast cell cycle, and validate the reliable GINs using protein interaction networks, biopathways and transcription factor-gene regulations. We also compare the reliable GINs with those GINs reconstructed by a global proximity measure Pearson correlation coefficient method from single datasets. It has been demonstrated that our reliable GINs achieve much better prediction performance especially with much higher precision. The functional enrichment analysis also suggests that gene sets in a reliable GIN are more functionally significant. Our method is especially useful to decipher GINs from multiple TCGx datasets related to less studied organisms where little knowledge is available except gene expression data.
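The conservation step described in this abstract, keeping only gene pairs that recur across the per-dataset networks, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `conserved_pairs`, the `min_support` threshold, and the toy gene labels are all hypothetical, and the edges are treated as undirected for simplicity.

```python
from collections import Counter
from itertools import chain


def conserved_pairs(networks, min_support=2):
    """Return gene pairs found in at least `min_support` networks.

    `networks` is a list of edge lists, one per dataset; each edge is a
    (gene, gene) tuple. Edges are treated as undirected (an assumption).
    """
    # Deduplicate within each network so a pair counts once per dataset.
    per_network = ({frozenset(edge) for edge in net} for net in networks)
    counts = Counter(chain.from_iterable(per_network))
    return {pair for pair, n in counts.items() if n >= min_support}


# Toy example with hypothetical gene labels (not data from the paper).
toy_networks = [
    [("A", "B"), ("C", "D")],
    [("B", "A"), ("E", "A")],
    [("D", "C"), ("A", "B")],
]
reliable = conserved_pairs(toy_networks, min_support=2)
```

Raising `min_support` trades recall for precision, which matches the abstract's premise that reliable interactions should be detected repeatedly.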
11
Arrieta-Ortiz ML, Hafemeister C, Bate AR, Chu T, Greenfield A, Shuster B, Barry SN, Gallitto M, Liu B, Kacmarczyk T, Santoriello F, Chen J, Rodrigues CDA, Sato T, Rudner DZ, Driks A, Bonneau R, Eichenberger P. An experimentally supported model of the Bacillus subtilis global transcriptional regulatory network. Mol Syst Biol 2015; 11:839. [PMID: 26577401 PMCID: PMC4670728 DOI: 10.15252/msb.20156236] [Citation(s) in RCA: 138] [Impact Index Per Article: 15.3] [Indexed: 12/21/2022]
Abstract
Organisms from all domains of life use gene regulation networks to control cell growth, identity, function, and responses to environmental challenges. Although accurate global regulatory models would provide critical evolutionary and functional insights, they remain incomplete, even for the best studied organisms. Efforts to build comprehensive networks are confounded by challenges including network scale, degree of connectivity, complexity of organism–environment interactions, and difficulty of estimating the activity of regulatory factors. Taking advantage of the large number of known regulatory interactions in Bacillus subtilis and two transcriptomics datasets (including one with 38 separate experiments collected specifically for this study), we use a new combination of network component analysis and model selection to simultaneously estimate transcription factor activities and learn a substantially expanded transcriptional regulatory network for this bacterium. In total, we predict 2,258 novel regulatory interactions and recall 74% of the previously known interactions. We obtained experimental support for 391 (out of 635 evaluated) novel regulatory edges (62% accuracy), thus significantly increasing our understanding of various cell processes, such as spore formation.
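The 62% figure quoted in this abstract follows directly from the two counts it reports. As a quick check (the variable names are illustrative only):

```python
# Counts taken from the abstract: novel edges with experimental support,
# out of all novel edges that were evaluated.
supported_edges = 391
evaluated_edges = 635

support_rate = supported_edges / evaluated_edges
print(f"{support_rate:.1%}")  # fraction of evaluated novel edges with support
```

The ratio is about 0.616, which rounds to the 62% given in the abstract.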
Affiliation(s)
- Mario L Arrieta-Ortiz
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Christoph Hafemeister
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Ashley Rose Bate
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Timothy Chu
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Alex Greenfield
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Bentley Shuster
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Samantha N Barry
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Matthew Gallitto
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Brian Liu
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Thadeous Kacmarczyk
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Francis Santoriello
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Jie Chen
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Tsutomu Sato
  - Department of Frontier Bioscience, Hosei University, Koganei, Tokyo, Japan
- David Z Rudner
  - Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA, USA
- Adam Driks
  - Department of Microbiology and Immunology, Stritch School of Medicine, Loyola University Chicago, Maywood, IL, USA
- Richard Bonneau
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
  - Courant Institute of Mathematical Science, Computer Science Department, New York, NY, USA
  - Simons Foundation, Simons Center for Data Analysis, New York, NY, USA
- Patrick Eichenberger
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
12
Reiss DJ, Plaisier CL, Wu WJ, Baliga NS. cMonkey2: Automated, systematic, integrated detection of co-regulated gene modules for any organism. Nucleic Acids Res 2015; 43:e87. [PMID: 25873626 PMCID: PMC4513845 DOI: 10.1093/nar/gkv300] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/17/2014] [Revised: 03/05/2015] [Accepted: 03/26/2015] [Indexed: 12/25/2022]
Abstract
The cMonkey integrated biclustering algorithm identifies conditionally co-regulated modules of genes (biclusters). cMonkey integrates various orthogonal pieces of information which support evidence of gene co-regulation, and optimizes biclusters to be supported simultaneously by one or more of these prior constraints. The algorithm served as the cornerstone for constructing the first global, predictive Environmental Gene Regulatory Influence Network (EGRIN) model for a free-living cell, and has now been applied to many more organisms. However, due to its computational inefficiencies, long run-time and complexity of various input data types, cMonkey was not readily usable by the wider community. To address these primary concerns, we have significantly updated the cMonkey algorithm and refactored its implementation, improving its usability and extendibility. These improvements provide a fully functioning and user-friendly platform for building co-regulated gene modules and the tools necessary for their exploration and interpretation. We show, via three separate analyses of data for E. coli, M. tuberculosis and H. sapiens, that the updated algorithm and inclusion of novel scoring functions for new data types (e.g. ChIP-seq and transcription factor over-expression [TFOE]) improve discovery of biologically informative co-regulated modules. The complete cMonkey2 software package, including source code, is available at https://github.com/baliga-lab/cmonkey2.
Affiliation(s)
- David J Reiss
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Wei-Ju Wu
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Nitin S Baliga
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
  - Department of Microbiology, University of Washington, Seattle, WA 98103, USA
13
Dong X, Yambartsev A, Ramsey SA, Thomas LD, Shulzhenko N, Morgun A. Reverse enGENEering of Regulatory Networks from Big Data: A Roadmap for Biologists. Bioinform Biol Insights 2015; 9:61-74. [PMID: 25983554 PMCID: PMC4415676 DOI: 10.4137/bbi.s12467] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 11/04/2014] [Revised: 02/16/2015] [Accepted: 02/17/2015] [Indexed: 12/29/2022]
Abstract
Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics datasets is how to transform these data into biological knowledge, for example, how to use these data to answer questions such as: Which functional pathways are involved in cell differentiation? Which genes should we target to stop cancer? Network analysis is a powerful and general approach to solve this problem consisting of two fundamental stages, network reconstruction, and network interrogation. Here we provide an overview of network analysis including a step-by-step guide on how to perform and use this approach to investigate a biological question. In this guide, we also include the software packages that we and others employ for each of the steps of a network analysis workflow.
Affiliation(s)
- Xiaoxi Dong
- College of Pharmacy, Oregon State University, Corvallis, OR, USA
| | - Anatoly Yambartsev
- Department of Statistics, Institute of Mathematics and Statistics, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Stephen A Ramsey
- School of Electrical Engineering and Computer Science, Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA; College of Veterinary Medicine, Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
| | - Lina D Thomas
- Department of Statistics, Institute of Mathematics and Statistics, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Natalia Shulzhenko
- College of Veterinary Medicine, Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
| | - Andrey Morgun
- College of Pharmacy, Oregon State University, Corvallis, OR, USA
|
14
|
An integrated approach to reconstructing genome-scale transcriptional regulatory networks. PLoS Comput Biol 2015; 11:e1004103. [PMID: 25723545 PMCID: PMC4344238 DOI: 10.1371/journal.pcbi.1004103] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/09/2014] [Accepted: 12/23/2014] [Indexed: 11/24/2022] Open
Abstract
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. 
Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions. The ever growing amount of genomic data enables the assembly of large-scale network models that can provide important new insights into living systems. However, assembly and validation of such large-scale models can be challenging, since we often lack sufficient information to make accurate predictions. This work describes a new approach for constructing large-scale transcriptional regulatory networks of individual cells. We show that the reconstructed network captures a significantly larger fraction of cellular regulatory processes than networks generated by other existing approaches. We predict this approach, with appropriate refinements, will allow reconstruction of large-scale transcriptional network models for a variety of other organisms. As we work towards modeling the function of cells or complex ecosystems, individually reconstructed network models of signaling, information transfer and metabolism, can be integrated to provide high information predictions and insights not otherwise obtainable.
|
15
|
Gouthu S, O'Neil ST, Di Y, Ansarolia M, Megraw M, Deluc LG. A comparative study of ripening among berries of the grape cluster reveals an altered transcriptional programme and enhanced ripening rate in delayed berries. J Exp Bot 2014; 65:5889-902. [PMID: 25135520 PMCID: PMC4203125 DOI: 10.1093/jxb/eru329] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 05/20/2023]
Abstract
Transcriptional studies in relation to fruit ripening generally aim to identify the transcriptional states associated with physiological ripening stages and the transcriptional changes between stages within the ripening programme. In non-climacteric fruits such as grape, all ripening-related genes involved in this programme have not been identified, mainly due to the lack of mutants for comparative transcriptomic studies. A feature in grape cluster ripening (Vitis vinifera cv. Pinot noir), where all berries do not initiate the ripening at the same time, was exploited to study their shifted ripening programmes in parallel. Berries that showed marked ripening state differences in a véraison-stage cluster (ripening onset) ultimately reached similar ripeness states toward maturity, indicating the flexibility of the ripening programme. The expression variance between these véraison-stage berry classes, where 11% of the genes were found to be differentially expressed, was reduced significantly toward maturity, resulting in the synchronization of their transcriptional states. Defined quantitative expression changes (transcriptional distances) not only existed between the véraison transitional stages, but also between the véraison to maturity stages, regardless of the berry class. It was observed that lagging berries complete their transcriptional programme in a shorter time through altered gene expressions and ripening-related hormone dynamics, and enhance the rate of physiological ripening progression. Finally, the reduction in expression variance of genes can identify new genes directly associated with ripening and also assess the relevance of gene activity to the phase of the ripening programme.
Affiliation(s)
- Satyanarayana Gouthu
- Oregon Wine Research Institute, Oregon State University, Corvallis, OR 97331, USA; Department of Horticulture, College of Agricultural Sciences, Oregon State University, Corvallis, OR 97331, USA
| | - Shawn T O'Neil
- Center For Genome Research and Biocomputing, Oregon State University, Corvallis, OR 97331, USA
| | - Yanming Di
- Department of Statistics, College of Sciences, Oregon State University, Corvallis, OR 97331, USA
| | - Mitra Ansarolia
- Department of Botany and Plant Pathology, College of Agricultural Sciences, Oregon State University, Corvallis, OR 97331, USA
| | - Molly Megraw
- Department of Botany and Plant Pathology, College of Agricultural Sciences, Oregon State University, Corvallis, OR 97331, USA
| | - Laurent G Deluc
- Oregon Wine Research Institute, Oregon State University, Corvallis, OR 97331, USA; Department of Horticulture, College of Agricultural Sciences, Oregon State University, Corvallis, OR 97331, USA
|
16
|
Genome-scale co-expression network comparison across Escherichia coli and Salmonella enterica serovar Typhimurium reveals significant conservation at the regulon level of local regulators despite their dissimilar lifestyles. PLoS One 2014; 9:e102871. [PMID: 25101984 PMCID: PMC4125155 DOI: 10.1371/journal.pone.0102871] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 11/03/2013] [Accepted: 06/24/2014] [Indexed: 01/01/2023] Open
Abstract
Availability of genome-wide gene expression datasets provides the opportunity to study gene expression across different organisms under a plethora of experimental conditions. In our previous work, we developed an algorithm called COMODO (COnserved MODules across Organisms) that identifies conserved expression modules between two species. In the present study, we expanded COMODO to detect the co-expression conservation across three organisms by adapting the statistics behind it. We applied COMODO to study expression conservation/divergence between Escherichia coli, Salmonella enterica, and Bacillus subtilis. We observed that some parts of the regulatory interaction networks were conserved between E. coli and S. enterica, especially in the regulon of local regulators. However, such conservation was not observed between the regulatory interaction networks of B. subtilis and the two other species. We found co-expression conservation on a number of genes involved in quorum sensing, but almost no conservation for genes involved in pathogenicity across E. coli and S. enterica, which could partially explain their different lifestyles. We concluded that despite their different lifestyles, no significant rewiring has occurred at the level of local regulons involved for instance, and notable conservation can be detected in signaling pathways and stress sensing in the phylogenetically close species S. enterica and E. coli. Moreover, conservation of local regulons seems to depend on the evolutionary time of divergence across species, disappearing at larger distances as shown by the comparison with B. subtilis. Global regulons follow a different trend and show major rewiring even at the limited evolutionary distance that separates E. coli and S. enterica.
|
17
|
Ghosh S, Matsuoka Y, Asai Y, Hsin KY, Kitano H. Toward an integrated software platform for systems pharmacology. Biopharm Drug Dispos 2014; 34:508-26. [PMID: 24150748 PMCID: PMC4253131 DOI: 10.1002/bdd.1875] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 09/29/2013] [Accepted: 10/06/2013] [Indexed: 01/19/2023]
Abstract
Understanding complex biological systems requires the extensive support of computational tools. This is particularly true for systems pharmacology, which aims to understand the action of drugs and their interactions in a systems context. Computational models play an important role as they can be viewed as an explicit representation of biological hypotheses to be tested. A series of software and data resources are used for model development, verification and exploration of the possible behaviors of biological systems using the model that may not be possible or not cost effective by experiments. Software platforms play a dominant role in creativity and productivity support and have transformed many industries, techniques that can be applied to biology as well. Establishing an integrated software platform will be the next important step in the field. © 2013 The Authors. Biopharmaceutics & Drug Disposition published by John Wiley & Sons, Ltd.
Affiliation(s)
- Samik Ghosh
- The Systems Biology Institute, 5F Falcon Building, 5-6-9 Shirokanedai, Minato, Tokyo, 108-0071, Japan
- Disease Systems Modeling Laboratory, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-Cho, Tsurumi, Yokohama, 230-0045, Japan
- * Correspondence to: The Systems Biology Institute, 5F Falcon Building, 5-6-9 Shirokanedai, Minato, Tokyo 108–0071 Japan., E-mail: ;
| | - Yukiko Matsuoka
- The Systems Biology Institute, 5F Falcon Building, 5-6-9 Shirokanedai, Minato, Tokyo, 108-0071, Japan
- JST ERATO Kawaoka Infection-induced Host Response Project, 4-6-1 Shirokanedai, Minato, Tokyo, 108-8639, Japan
| | - Yoshiyuki Asai
- Okinawa Institute of Science and Technology, 1919-1, Tancha, Onna-son, Kunigami, Okinawa, 904-0412, Japan
| | - Kun-Yi Hsin
- Okinawa Institute of Science and Technology, 1919-1, Tancha, Onna-son, Kunigami, Okinawa, 904-0412, Japan
| | - Hiroaki Kitano
- The Systems Biology Institute, 5F Falcon Building, 5-6-9 Shirokanedai, Minato, Tokyo, 108-0071, Japan
- Disease Systems Modeling Laboratory, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-Cho, Tsurumi, Yokohama, 230-0045, Japan
- Okinawa Institute of Science and Technology, 1919-1, Tancha, Onna-son, Kunigami, Okinawa, 904-0412, Japan
- * Correspondence to: The Systems Biology Institute, 5F Falcon Building, 5-6-9 Shirokanedai, Minato, Tokyo 108–0071 Japan., E-mail: ;
|
18
|
Brooks AN, Reiss DJ, Allard A, Wu WJ, Salvanha DM, Plaisier CL, Chandrasekaran S, Pan M, Kaur A, Baliga NS. A system-level model for the microbial regulatory genome. Mol Syst Biol 2014; 10:740. [PMID: 25028489 PMCID: PMC4299497 DOI: 10.15252/msb.20145160] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 12/20/2022] Open
Abstract
Microbes can tailor transcriptional responses to diverse environmental challenges despite having streamlined genomes and a limited number of regulators. Here, we present data-driven models that capture the dynamic interplay of the environment and genome-encoded regulatory programs of two types of prokaryotes: Escherichia coli (a bacterium) and Halobacterium salinarum (an archaeon). The models reveal how the genome-wide distributions of cis-acting gene regulatory elements and the conditional influences of transcription factors at each of those elements encode programs for eliciting a wide array of environment-specific responses. We demonstrate how these programs partition transcriptional regulation of genes within regulons and operons to re-organize gene-gene functional associations in each environment. The models capture fitness-relevant co-regulation by different transcriptional control mechanisms acting across the entire genome, to define a generalized, system-level organizing principle for prokaryotic gene regulatory networks that goes well beyond existing paradigms of gene regulation. An online resource (http://egrin2.systemsbiology.net) has been developed to facilitate multiscale exploration of conditional gene regulation in the two prokaryotes.
Affiliation(s)
- Aaron N Brooks
- Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Antoine Allard
- Département de Physique, de Génie Physique et d'Optique, Université Laval, Québec, QC, Canada
| | - Wei-Ju Wu
- Institute for Systems Biology, Seattle, WA, USA
| | - Diego M Salvanha
- Institute for Systems Biology, Seattle, WA, USA; LabPIB, Department of Computing and Mathematics FFCLRP-USP, University of Sao Paulo, Ribeirao Preto, Brazil
| | - Min Pan
- Institute for Systems Biology, Seattle, WA, USA
| | - Nitin S Baliga
- Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA; Departments of Microbiology and Biology, University of Washington, Seattle, WA, USA; Lawrence Berkeley National Laboratories, Berkeley, CA, USA
|
19
|
Tran TP, Ong E, Hodges AP, Paternostro G, Piermarocchi C. Prediction of kinase inhibitor response using activity profiling, in vitro screening, and elastic net regression. BMC Syst Biol 2014; 8:74. [PMID: 24961498 PMCID: PMC4094402 DOI: 10.1186/1752-0509-8-74] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 04/30/2014] [Accepted: 06/18/2014] [Indexed: 11/10/2022]
Abstract
Background Many kinase inhibitors have been approved as cancer therapies. Recently, libraries of kinase inhibitors have been extensively profiled, thus providing a map of the strength of action of each compound on a large number of its targets. These profiled libraries define drug-kinase networks that can predict the effectiveness of untested drugs and elucidate the roles of specific kinases in different cellular systems. Predictions of drug effectiveness based on a comprehensive network model of cellular signalling are difficult, due to our partial knowledge of the complex biological processes downstream of the targeted kinases. Results We have developed the Kinase Inhibitors Elastic Net (KIEN) method, which integrates information contained in drug-kinase networks with in vitro screening. The method uses the in vitro cell response of single drugs and drug pair combinations as a training set to build linear and nonlinear regression models. Besides predicting the effectiveness of untested drugs, the KIEN method identifies sets of kinases that are statistically associated to drug sensitivity in a given cell line. We compared different versions of the method, which is based on a regression technique known as elastic net. Data from two-drug combinations led to predictive models, and we found that predictivity can be improved by applying logarithmic transformation to the data. The method was applied to the A549 lung cancer cell line, and we identified specific kinases known to have an important role in this type of cancer (TGFBR2, EGFR, PHKG1 and CDK4). A pathway enrichment analysis of the set of kinases identified by the method showed that axon guidance, activation of Rac, and semaphorin interactions pathways are associated to a selective response to therapeutic intervention in this cell line. 
Conclusions We have proposed an integrated experimental and computational methodology, called KIEN, that identifies the role of specific kinases in the drug response of a given cell line. The method will facilitate the design of new kinase inhibitors and the development of therapeutic interventions with combinations of many inhibitors.
|
20
|
Turkarslan S, Wurtmann EJ, Wu WJ, Jiang N, Bare JC, Foley K, Reiss DJ, Novichkov P, Baliga NS. Network portal: a database for storage, analysis and visualization of biological networks. Nucleic Acids Res 2013; 42:D184-90. [PMID: 24271392 PMCID: PMC3964938 DOI: 10.1093/nar/gkt1190] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/17/2023] Open
Abstract
The ease of generating high-throughput data has enabled investigations into organismal complexity at the systems level through the inference of networks of interactions among the various cellular components (genes, RNAs, proteins and metabolites). The wider scientific community, however, currently has limited access to tools for network inference, visualization and analysis because these tasks often require advanced computational knowledge and expensive computing resources. We have designed the network portal (http://networks.systemsbiology.net) to serve as a modular database for the integration of user uploaded and public data, with inference algorithms and tools for the storage, visualization and analysis of biological networks. The portal is fully integrated into the Gaggle framework to seamlessly exchange data with desktop and web applications and to allow the user to create, save and modify workspaces, and it includes social networking capabilities for collaborative projects. While the current release of the database contains networks for 13 prokaryotic organisms from diverse phylogenetic clades (4678 co-regulated gene modules, 3466 regulators and 9291 cis-regulatory motifs), it will be rapidly populated with prokaryotic and eukaryotic organisms as relevant data become available in public repositories and through user input. The modular architecture, simple data formats and open API support community development of the portal.
Affiliation(s)
- Serdar Turkarslan
- Institute for Systems Biology, Seattle, WA 98109, USA and Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
21
|
Meysman P, Sonego P, Bianco L, Fu Q, Ledezma-Tejeida D, Gama-Castro S, Liebens V, Michiels J, Laukens K, Marchal K, Collado-Vides J, Engelen K. COLOMBOS v2.0: an ever expanding collection of bacterial expression compendia. Nucleic Acids Res 2013; 42:D649-53. [PMID: 24214998 PMCID: PMC3965013 DOI: 10.1093/nar/gkt1086] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 11/25/2022] Open
Abstract
The COLOMBOS database (http://www.colombos.net) features comprehensive organism-specific cross-platform gene expression compendia of several bacterial model organisms and is supported by a fully interactive web portal and an extensive web API. COLOMBOS was originally published in PLoS One, and COLOMBOS v2.0 includes both an update of the expression data, by expanding the previously available compendia and by adding compendia for several new species, and an update of the surrounding functionality, with improved search and visualization options and novel tools for programmatic access to the database. The scope of the database has also been extended to incorporate RNA-seq data in our compendia by a dedicated analysis pipeline. We demonstrate the validity and robustness of this approach by comparing the same RNA samples measured in parallel using both microarrays and RNA-seq. As far as we know, COLOMBOS currently hosts the largest homogenized gene expression compendia available for seven bacterial model organisms.
Affiliation(s)
- Pieter Meysman
- Department of Mathematics and Computer Science, University of Antwerp, B-2020 Antwerp, Belgium, Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, B-2650 Edegem, Belgium, Department of Computational Biology, Research and Innovation Center, Fondazione Edmund Mach, San Michele all'Adige, Trento (TN) 38010, Italy, Department of Microbial and Molecular Sciences, KU Leuven, Leuven B-3001, Belgium, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62210, Mexico, Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent 9052, Belgium and Department of Information Technology, IMinds, Ghent University, Gent 9052, Belgium
|
22
|
De Maeyer D, Renkens J, Cloots L, De Raedt L, Marchal K. PheNetic: network-based interpretation of unstructured gene lists in E. coli. Mol Biosyst 2013; 9:1594-603. [PMID: 23591551 DOI: 10.1039/c3mb25551d] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
At the present time, omics experiments are commonly used in wet lab practice to identify leads involved in interesting phenotypes. These omics experiments often result in unstructured gene lists, the interpretation of which in terms of pathways or the mode of action is challenging. To aid in the interpretation of such gene lists, we developed PheNetic, a decision theoretic method that exploits publicly available information, captured in a comprehensive interaction network to obtain a mechanistic view of the listed genes. PheNetic selects from an interaction network the sub-networks highlighted by these gene lists. We applied PheNetic to an Escherichia coli interaction network to reanalyse a previously published KO compendium, assessing gene expression of 27 E. coli knock-out mutants under mild acidic conditions. Being able to unveil previously described mechanisms involved in acid resistance demonstrated both the performance of our method and the added value of our integrated E. coli network. PheNetic is available at .
Affiliation(s)
- Dries De Maeyer
- Center of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
|
23
|
Faria JP, Overbeek R, Xia F, Rocha M, Rocha I, Henry CS. Genome-scale bacterial transcriptional regulatory networks: reconstruction and integrated analysis with metabolic models. Brief Bioinform 2013; 15:592-611. [DOI: 10.1093/bib/bbs071] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 01/09/2023] Open
|
24
|
|
25
|
Baitaluk M, Kozhenkov S, Ponomarenko J. An integrative approach to inferring gene regulatory module networks. PLoS One 2012; 7:e52836. [PMID: 23285197 PMCID: PMC3527610 DOI: 10.1371/journal.pone.0052836] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/06/2012] [Accepted: 11/22/2012] [Indexed: 12/31/2022] Open
Abstract
Background Gene regulatory networks (GRNs) provide insight into the mechanisms of differential gene expression at a system level. However, the methods for inference, functional analysis and visualization of gene regulatory modules and GRNs require the user to collect heterogeneous data from many sources using numerous bioinformatics tools. This makes the analysis expensive and time-consuming. Results In this work, the BiologicalNetworks application–the data integration and network based research environment–was extended with tools for inference and analysis of gene regulatory modules and networks. The backend database of the application integrates public data on gene expression, pathways, transcription factor binding sites, gene and protein sequences, and functional annotations. Thus, all data essential for the gene regulation analysis can be mined publicly. In addition, the user’s data can either be integrated in the database and become public, or kept private within the application. The capabilities to analyze multiple gene expression experiments are also provided. Conclusion The generated modular networks, regulatory modules and binding sites can be visualized and further analyzed within this same application. The developed tools were applied to the mouse model of asthma and the OCT4 regulatory network in embryonic stem cells. Developed methods and data are available through the Java application from BiologicalNetworks program at http://www.biologicalnetworks.org.
Affiliation(s)
- Michael Baitaluk
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
| | - Sergey Kozhenkov
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
| | - Julia Ponomarenko
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
- * E-mail:
|
26
|
Abstract
Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize performance, data requirements, and inherent biases of different inference approaches offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Thereby, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally test 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
|
27
|
Beg QK, Zampieri M, Klitgord N, Collins SB, Altafini C, Serres MH, Segrè D. Detection of transcriptional triggers in the dynamics of microbial growth: application to the respiratorily versatile bacterium Shewanella oneidensis. Nucleic Acids Res 2012; 40:7132-49. [PMID: 22638572 PMCID: PMC3424579 DOI: 10.1093/nar/gks467] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022] Open
Abstract
The capacity of microorganisms to respond to variable external conditions requires a coordination of environment-sensing mechanisms and decision-making regulatory circuits. Here, we seek to understand the interplay between these two processes by combining high-throughput measurement of time-dependent mRNA profiles with a novel computational approach that searches for key genetic triggers of transcriptional changes. Our approach helped us understand the regulatory strategies of a respiratorily versatile bacterium with promising bioenergy and bioremediation applications, Shewanella oneidensis, in minimal and rich media. By comparing expression profiles across these two conditions, we unveiled components of the transcriptional program that depend mainly on the growth phase. Conversely, by integrating our time-dependent data with a previously available large compendium of static perturbation responses, we identified transcriptional changes that cannot be explained solely by internal network dynamics, but are rather triggered by specific genes acting as key mediators of an environment-dependent response. These transcriptional triggers include known and novel regulators that respond to carbon, nitrogen and oxygen limitation. Our analysis suggests a sequence of physiological responses, including a coupling between nitrogen depletion and glycogen storage, partially recapitulated through dynamic flux balance analysis, and experimentally confirmed by metabolite measurements. Our approach is broadly applicable to other systems.
Affiliation(s)
- Qasim K Beg
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
|
28
|
Fu Q, Lemmens K, Sanchez-Rodriguez A, Thijs IM, Meysman P, Sun H, Fierro AC, Engelen K, Marchal K. Directed module detection in a large-scale expression compendium. Methods Mol Biol 2012; 804:131-165. [PMID: 22144152 DOI: 10.1007/978-1-61779-361-5_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
Public online microarray databases contain tremendous amounts of expression data. Mining these data sources can provide a wealth of information on the underlying transcriptional networks. In this chapter, we illustrate how the web services COLOMBOS and DISTILLER can be used to identify condition-dependent coexpression modules by exploring compendia of public expression data. COLOMBOS is designed for user-specified query-driven analysis, whereas DISTILLER generates a global regulatory network overview. The user is guided through both web services by means of a case study in which condition-dependent coexpression modules comprising a gene of interest (i.e., "directed") are identified.
Affiliation(s)
- Qiang Fu
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium
|
29
|
Van Deun K, Wilderjans TF, van den Berg RA, Antoniadis A, Van Mechelen I. A flexible framework for sparse simultaneous component based data integration. BMC Bioinformatics 2011; 12:448. [PMID: 22085701 PMCID: PMC3283562 DOI: 10.1186/1471-2105-12-448] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/14/2011] [Accepted: 11/15/2011] [Indexed: 12/05/2022] Open
Abstract
Background High-throughput data are complex, and methods that reveal the structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays the challenge is often to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because contributions of each of the biomolecules (transcripts, proteins) have to be taken into account. Results We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, the sparse group lasso, and the elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks. Conclusion Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses, and second, interpretation of the results is highly facilitated by their sparseness. The approach offered is flexible and allows the block structure to be taken into account in different ways. As such, structures can be found that are exclusively tied to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach). Availability The additional file contains a MATLAB implementation of the sparse simultaneous component method.
Affiliation(s)
- Katrijn Van Deun
- Center for Computational Systems Biology SymBioSys, Katholieke Universiteit Leuven, 3000 Leuven, Belgium.
|
30
|
Abstract
Understanding complex biological systems requires extensive support from software tools. Such tools are needed at each step of a systems biology computational workflow, which typically consists of data handling, network inference, deep curation, dynamical simulation and model analysis. In addition, there are now efforts to develop integrated software platforms, so that tools that are used at different stages of the workflow and by different researchers can easily be used together. This Review describes the types of software tools that are required at different stages of systems biology research and the current options that are available for systems biology researchers. We also discuss the challenges and prospects for modelling the effects of genetic changes on physiology and the concept of an integrated platform.
|
31
|
Cloots L, Marchal K. Network-based functional modeling of genomics, transcriptomics and metabolism in bacteria. Curr Opin Microbiol 2011; 14:599-607. [DOI: 10.1016/j.mib.2011.09.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 06/17/2011] [Revised: 08/28/2011] [Accepted: 09/05/2011] [Indexed: 01/10/2023]
|
32
|
Engelen K, Fu Q, Meysman P, Sánchez-Rodríguez A, De Smet R, Lemmens K, Fierro AC, Marchal K. COLOMBOS: access port for cross-platform bacterial expression compendia. PLoS One 2011; 6:e20938. [PMID: 21779320 PMCID: PMC3136457 DOI: 10.1371/journal.pone.0020938] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 02/25/2011] [Accepted: 05/13/2011] [Indexed: 12/26/2022] Open
Abstract
Background Microarrays are the main technology for large-scale transcriptional gene expression profiling, but the large bodies of data available in public databases are not useful due to the large heterogeneity. There are several initiatives that attempt to bundle these data into expression compendia, but such resources for bacterial organisms are scarce and limited to integration of experiments from the same platform or to indirect integration of per experiment analysis results. Methodology/Principal Findings We have constructed comprehensive organism-specific cross-platform expression compendia for three bacterial model organisms (Escherichia coli, Bacillus subtilis, and Salmonella enterica serovar Typhimurium) together with an access portal, dubbed COLOMBOS, that not only provides easy access to the compendia, but also includes a suite of tools for exploring, analyzing, and visualizing the data within these compendia. It is freely available at http://bioi.biw.kuleuven.be/colombos. The compendia are unique in directly combining expression information from different microarray platforms and experiments, and we illustrate the potential benefits of this direct integration with a case study: extending the known regulon of the Fur transcription factor of E. coli. The compendia also incorporate extensive annotations for both genes and experimental conditions; these heterogeneous data are functionally integrated in the COLOMBOS analysis tools to interactively browse and query the compendia not only for specific genes or experiments, but also metabolic pathways, transcriptional regulation mechanisms, experimental conditions, biological processes, etc. Conclusions/Significance We have created cross-platform expression compendia for several bacterial organisms and developed a complementary access port COLOMBOS, that also serves as a convenient expression analysis tool to extract useful biological information. 
This work is relevant to a large community of microbiologists by facilitating the use of publicly available microarray experiments to support their research.
Affiliation(s)
- Kristof Engelen
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Heverlee-Leuven, Belgium
- Qiang Fu
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Heverlee-Leuven, Belgium
- Pieter Meysman
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Heverlee-Leuven, Belgium
- Aminael Sánchez-Rodríguez
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Heverlee-Leuven, Belgium
- Riet De Smet
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Heverlee-Leuven, Belgium
- Karen Lemmens
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Heverlee-Leuven, Belgium
- Ana Carolina Fierro
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Heverlee-Leuven, Belgium
- Kathleen Marchal
- Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Heverlee-Leuven, Belgium
|
33
|
De Smet R, Marchal K. An ensemble biclustering approach for querying gene expression compendia with experimental lists. Bioinformatics 2011; 27:1948-56. [PMID: 21593133 DOI: 10.1093/bioinformatics/btr307] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Query-based biclustering techniques allow interrogating a gene expression compendium with a given gene or gene list. They do so by searching for genes in the compendium that have a profile close to the average expression profile of the genes in this query-list. As it can often not be guaranteed that the genes in a long query-list will all be mutually coexpressed, it is advisable to use each gene separately as a query. This approach, however, leaves the user with a tedious post-processing of partially redundant biclustering results. The fact that for each query-gene multiple parameter settings need to be tested in order to detect the 'most optimal bicluster size' adds to the redundancy problem. RESULTS To aid with this post-processing, we developed an ensemble approach to be used in combination with query-based biclustering. The method relies on a specifically designed consensus matrix in which the biclustering outcomes for multiple query-genes and for different possible parameter settings are merged in a statistically robust way. Clustering of this matrix results in distinct, non-redundant consensus biclusters that maximally reflect the information contained within the original query-based biclustering results. The usefulness of the developed approach is illustrated on a biological case study in Escherichia coli. AVAILABILITY AND IMPLEMENTATION Compiled Matlab code is available from http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Information_DeSmet_2011/.
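The query step described in this abstract (finding genes whose profile is close to the average profile of the query list) can be illustrated with a minimal sketch. It is not the authors' method: it assumes Pearson correlation as the closeness measure, scores genes across all conditions instead of selecting a condition subset as a true biclustering would, and uses a hypothetical toy compendium and hypothetical names (`pearson`, `query_search`, `threshold`).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0

def query_search(compendium, query_genes, threshold=0.9):
    """Return genes whose profile correlates with the mean query profile.

    compendium maps gene name -> list of expression values (one per condition).
    threshold is an illustrative cut-off, not a value taken from the paper.
    """
    profiles = [compendium[g] for g in query_genes]
    # Average the query genes condition by condition.
    mean_profile = [sum(col) / len(col) for col in zip(*profiles)]
    scores = {g: pearson(p, mean_profile) for g, p in compendium.items()}
    hits = [g for g, r in scores.items() if r >= threshold]
    return sorted(hits, key=lambda g: -scores[g])

# Hypothetical toy compendium: four genes measured under four conditions.
compendium = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with the query
    "geneD": [1.1, 2.1, 2.9, 4.2],   # noisy copy of geneA
}
result = query_search(compendium, ["geneA", "geneB"])
```

In this toy case the query genes and `geneD` are retrieved while `geneC` is not. Running such a search once per query gene and per threshold produces the partially redundant result sets that the ensemble approach is designed to merge.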
Affiliation(s)
- Riet De Smet
- Department of Plant Systems Biology, VIB, Ghent University, Technologiepark 927, Ghent, Belgium
|
34
|
Zhao H, Cloots L, Van den Bulcke T, Wu Y, De Smet R, Storms V, Meysman P, Engelen K, Marchal K. Query-based biclustering of gene expression data using Probabilistic Relational Models. BMC Bioinformatics 2011; 12 Suppl 1:S37. [PMID: 21342568 PMCID: PMC3044293 DOI: 10.1186/1471-2105-12-s1-s37] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Open
Abstract
Background With the availability of large-scale expression compendia it is now possible to view one's own findings in the light of what is already available and to retrieve genes with an expression profile similar to a set of genes of interest (i.e., a query or seed set) for a subset of conditions. To that end, a query-based strategy is needed that maximally exploits the coexpression behaviour of the seed genes to guide the biclustering, but that at the same time is robust against the presence of noisy genes in the seed set, as seed genes are often assumed, but not guaranteed, to be coexpressed in the queried compendium. Therefore, we developed ProBic, a query-based biclustering strategy based on Probabilistic Relational Models (PRMs) that exploits the use of prior distributions to extract the information contained within the seed set. Results We applied ProBic on a large-scale Escherichia coli compendium to extend partially described regulons with potentially novel members. We compared ProBic's performance with previously published query-based biclustering algorithms, namely ISA and QDB, from the perspective of bicluster expression quality, robustness of the outcome against noisy seed sets and biological relevance. This comparison shows that ProBic is able to retrieve biologically relevant, high-quality biclusters that retain their seed genes and that it is particularly strong in handling noisy seeds. Conclusions ProBic is a query-based biclustering algorithm developed in a flexible framework, designed to detect biologically relevant, high-quality biclusters that retain relevant seed genes even in the presence of noise or when dealing with low-quality seed sets.
Affiliation(s)
- Hui Zhao
- Microbial and Molecular Systems, KU Leuven, Leuven 3001, Belgium.
|
35
|
Zarrineh P, Fierro AC, Sánchez-Rodríguez A, De Moor B, Engelen K, Marchal K. COMODO: an adaptive coclustering strategy to identify conserved coexpression modules between organisms. Nucleic Acids Res 2010; 39:e41. [PMID: 21149270 PMCID: PMC3074154 DOI: 10.1093/nar/gkq1275] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 01/17/2023] Open
Abstract
Increasingly large-scale expression compendia for different species are becoming available. By exploiting the modularity of the coexpression network, these compendia can be used to identify biological processes for which the expression behavior is conserved over different species. However, comparing module networks across species is not trivial. The definition of a biologically meaningful module is not a fixed one, and changing the distance threshold that defines the degree of coexpression gives rise to different modules. As a result, when comparing modules across species, many different partially overlapping conserved module pairs exist, and deciding which pair is most relevant is hard. Therefore, we developed a method referred to as conserved modules across organisms (COMODO) that uses an objective selection criterion to identify conserved expression modules between two species. The method takes microarray data and a gene homology map as input and outputs pairs of conserved modules, searching for the pair of modules for which the number of shared homologs is statistically most significant relative to the size of the linked modules. To demonstrate its principle, we applied COMODO to study coexpression conservation between the two well-studied bacteria Escherichia coli and Bacillus subtilis. COMODO is available at: http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Information_Zarrineh_2010/comodo/index.html.
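The selection criterion in this abstract (homolog overlap that is most significant relative to module size) can be made concrete with a small sketch. The abstract does not name the statistic, so the hypergeometric tail probability below is an assumption standing in for it, and the function names, candidate tuples and universe size are hypothetical.

```python
from math import comb

def hypergeom_tail(shared, size_a, size_b, universe):
    """P(X >= shared) for a hypergeometric overlap.

    Assumed model: `universe` homologous gene pairs exist in total,
    module A covers `size_a` of them, module B covers `size_b` of them,
    and X counts the pairs covered by both modules by chance.
    """
    upper = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(shared, upper + 1)
    )
    return favourable / comb(universe, size_b)

def most_significant_pair(candidates, universe):
    """Pick the candidate (name, shared, size_a, size_b) with the smallest p-value."""
    return min(
        candidates,
        key=lambda c: hypergeom_tail(c[1], c[2], c[3], universe),
    )

# Hypothetical example: the same 3 shared homologs, found at two thresholds.
candidates = [
    ("tight", 3, 3, 3),   # small modules, every member shared
    ("loose", 3, 6, 6),   # larger modules, same overlap
]
best = most_significant_pair(candidates, universe=10)
p_tight = hypergeom_tail(3, 3, 3, 10)
```

With equal overlap, the smaller module pair is the more surprising one under this model. That is the sense in which significance is judged relative to module size, and it avoids favouring a looser distance threshold merely because it yields larger modules.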
Affiliation(s)
- Peyman Zarrineh
- Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
|
36
|
Meysman P, Dang TH, Laukens K, De Smet R, Wu Y, Marchal K, Engelen K. Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli. Nucleic Acids Res 2010; 39:e6. [PMID: 21051340 PMCID: PMC3025552 DOI: 10.1093/nar/gkq1071] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022] Open
Abstract
Recognition of genomic binding sites by transcription factors can occur through base-specific recognition, or by recognition of variations within the structure of the DNA macromolecule. In this article, we investigate what information can be retrieved from local DNA structural properties that is relevant to transcription factor binding and that cannot be captured by the nucleotide sequence alone. More specifically, we explore the benefit of employing the structural characteristics of DNA to create binding-site models that encompass indirect recognition for the Escherichia coli model organism. We developed a novel methodology [Conditional Random fields of Smoothed Structural Data (CRoSSeD)], based on structural scales and conditional random fields to model and predict regulator binding sites. The value of relying on local structural-DNA properties is demonstrated by improved classifier performance on a large number of biological datasets, and by the detection of novel binding sites which could be validated by independent data sources, and which could not be identified using sequence data alone. We further show that the CRoSSeD-binding-site models can be related to the actual molecular mechanisms of the transcription factor DNA binding, and thus cannot only be used for prediction of novel sites, but might also give valuable insights into unknown binding mechanisms of transcription factors.
Affiliation(s)
- Pieter Meysman
- Department of Microbial and Molecular Systems, KU Leuven, Leuven Heverlee, Belgium
|
37
|
De Smet R, Marchal K. Advantages and limitations of current network inference methods. Nat Rev Microbiol 2010; 8:717-29. [PMID: 20805835 DOI: 10.1038/nrmicro2419] [Citation(s) in RCA: 312] [Impact Index Per Article: 22.3] [Indexed: 12/19/2022]
Abstract
Network inference, which is the reconstruction of biological networks from high-throughput data, can provide valuable information about the regulation of gene expression in cells. However, it is an underdetermined problem, as the number of interactions that can be inferred exceeds the number of independent measurements. Different state-of-the-art tools for network inference use specific assumptions and simplifications to deal with underdetermination, and these influence the inferences. The outcome of network inference therefore varies between tools and can be highly complementary. Here we categorize the available tools according to the strategies that they use to deal with the problem of underdetermination. Such categorization allows an insight into why a certain tool is more appropriate for the specific research question or data set at hand.
Affiliation(s)
- Riet De Smet
- Centre of Microbial and Plant Genetics/Bioinformatics, Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Leuven, Belgium
|
38
|
Kint G, Fierro C, Marchal K, Vanderleyden J, De Keersmaecker SCJ. Integration of ‘omics’ data: does it lead to new insights into host–microbe interactions? Future Microbiol 2010; 5:313-28. [DOI: 10.2217/fmb.10.1] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 01/26/2023] Open
Abstract
The interaction between both beneficial and pathogenic microbes and their host has been the subject of many studies. Although the field of systems biology is rapidly evolving, the use of a systems biology approach by means of high-throughput techniques to study host–microbe interactions is just beginning to be explored. In this review, we discuss some of the most recently developed high-throughput ‘omics’ techniques and their use in the context of host–microbe interaction. Moreover, we highlight studies combining several techniques that are pioneering the integration of ‘omics’ data related to host–microbe interactions. Finally, we list the major challenges ahead for successful systems biology research on host–microbe interactions.
Affiliation(s)
- Gwendoline Kint
- Centre of Microbial & Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Carolina Fierro
- Centre of Microbial & Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Kathleen Marchal
- Centre of Microbial & Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Jos Vanderleyden
- Centre of Microbial & Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
|
39
|
Przytycka TM, Singh M, Slonim DK. Toward the dynamic interactome: it's about time. Brief Bioinform 2010; 11:15-29. [PMID: 20061351 PMCID: PMC2810115 DOI: 10.1093/bib/bbp057] [Citation(s) in RCA: 147] [Impact Index Per Article: 10.5] [Received: 07/20/2009] [Revised: 11/01/2009] [Indexed: 11/14/2022] Open
Abstract
Dynamic molecular interactions play a central role in regulating the functioning of cells and organisms. The availability of experimentally determined large-scale cellular networks, along with other high-throughput experimental data sets that provide snapshots of biological systems at different times and conditions, is increasingly helpful in elucidating interaction dynamics. Here we review the beginnings of a new subfield within computational biology, one focused on the global inference and analysis of the dynamic interactome. This burgeoning research area, which entails a shift from static to dynamic network analysis, promises to be a major step forward in our ability to model and reason about cellular function and behavior.
Affiliation(s)
- Teresa M Przytycka
- National Center for Biotechnology Information, NLM, NIH, 8000 Rockville Pike, Bethesda, MD 20814, USA
|
40
|
Huttenhower C, Mutungu KT, Indik N, Yang W, Schroeder M, Forman JJ, Troyanskaya OG, Coller HA. Detailing regulatory networks through large scale data integration. Bioinformatics 2009; 25:3267-74. [PMID: 19825796 DOI: 10.1093/bioinformatics/btp588] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Much of a cell's regulatory response to changing environments occurs at the transcriptional level. Particularly in higher organisms, transcription factors (TFs), microRNAs and epigenetic modifications can combine to form a complex regulatory network. Part of this system can be modeled as a collection of regulatory modules: co-regulated genes, the conditions under which they are co-regulated and sequence-level regulatory motifs. RESULTS We present the Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction (COALESCE) system for regulatory module prediction. The algorithm is efficient enough to discover expression biclusters and putative regulatory motifs in metazoan genomes (>20,000 genes) and very large microarray compendia (>10,000 conditions). Using Bayesian data integration, it can also include diverse supporting data types such as evolutionary conservation or nucleosome placement. We validate its performance using a functional evaluation of co-clustered genes, known yeast and Escherichia coli TF targets, synthetic data and various metazoan data compendia. In all cases, COALESCE performs as well as or better than current biclustering and motif prediction tools, with high accuracy in functional and TF/target assignments and zero false positives on synthetic data. COALESCE provides an efficient and flexible platform within which large, diverse data collections can be integrated to predict metazoan regulatory networks. AVAILABILITY Source code (C++) is available at http://function.princeton.edu/sleipnir, and supporting data and a web interface are provided at http://function.princeton.edu/coalesce. CONTACT ogt@cs.princeton.edu; hcoller@princeton.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Curtis Huttenhower
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
|
41
|
Fadda A, Fierro AC, Lemmens K, Monsieurs P, Engelen K, Marchal K. Inferring the transcriptional network of Bacillus subtilis. Mol Biosyst 2009; 5:1840-52. [PMID: 20023724 DOI: 10.1039/b907310h] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 11/21/2022]
Abstract
The adaptation of bacteria to the vigorous environmental changes they undergo is crucial to their survival. They achieve this adaptation partly via intricate regulation of the transcription of their genes. In this study, we infer the transcriptional network of the Gram-positive model organism, Bacillus subtilis. We use a data integration workflow, exploiting both motif and expression data, towards the generation of condition-dependent transcriptional modules. In building the motif data, we rely on both known and predicted information. Known motifs were derived from DBTBS, while predicted motifs were generated by a de novo motif detection method that utilizes comparative genomics. The expression data consists of a compendium of microarrays across different platforms. Our results indicate that a considerable part of the B. subtilis network is yet undiscovered; we could predict 417 new regulatory interactions for known regulators and 453 interactions for yet uncharacterized regulators. The regulators in our network showed a preference for regulating modules in certain environmental conditions. Also, substantial condition-dependent intra-operonic regulation seems to take place. Global regulators seem to require functional flexibility to attain their roles by acting as both activators and repressors.
Affiliation(s)
- Abeer Fadda
- Department of Microbial and Molecular Systems, KULeuven, Kasteelpark Arenberg 20, 3001 Heverlee, Belgium
|
42
|
Sun H, Lemmens K, Van den Bulcke T, Engelen K, De Moor B, Marchal K. ViTraM: visualization of transcriptional modules. Bioinformatics 2009; 25:2450-1. [DOI: 10.1093/bioinformatics/btp400] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
|