1
Gao Z, Zhao R, Ruan J. A genome-wide cis-regulatory element discovery method based on promoter sequences and gene co-expression networks. BMC Genomics 2013; 14 Suppl 1:S4. [PMID: 23368633 PMCID: PMC3549801 DOI: 10.1186/1471-2164-14-s1-s4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]
Abstract
Background Deciphering cis-regulatory networks has become an attractive yet challenging task. This paper presents a simple method for cis-regulatory network discovery which aims to avoid some of the common problems of previous approaches. Results Using promoter sequences and gene expression profiles as input, rather than clustering the genes by the expression data, our method utilizes co-expression neighborhood information for each individual gene, thereby overcoming the disadvantages of current clustering based models which may miss specific information for individual genes. In addition, rather than using a motif database as an input, it implements a simple motif count table for each enumerated k-mer for each gene promoter sequence. Thus, it can be used for species where previous knowledge of cis-regulatory motifs is unknown and has the potential to discover new transcription factor binding sites. Applications on Saccharomyces cerevisiae and Arabidopsis have shown that our method has a good prediction accuracy and outperforms a phylogenetic footprinting approach. Furthermore, the top ranked gene-motif regulatory clusters are evidently functionally co-regulated, and the regulatory relationships between the motifs and the enriched biological functions can often be confirmed by literature. Conclusions Since this method is simple and gene-specific, it can be readily utilized for insufficiently studied species or flexibly used as an additional step or data source for previous transcription regulatory networks discovery models.
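The motif count table mentioned in this abstract can be illustrated with a short sketch. It is a minimal illustration and not the authors' implementation: the function name `kmer_count_table`, the toy promoter sequences, and the choices to count overlapping occurrences on one strand only are all assumptions made here.

```python
from itertools import product


def kmer_count_table(promoters, k):
    """Return {gene: {kmer: count}} over all 4**k enumerated k-mers.

    Assumptions of this sketch: overlapping occurrences are counted,
    only the given strand is scanned, and windows containing symbols
    other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    table = {}
    for gene, seq in promoters.items():
        seq = seq.upper()
        counts = dict.fromkeys(kmers, 0)
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in counts:
                counts[word] += 1
        table[gene] = counts
    return table


# Hypothetical promoter sequences, for illustration only.
promoters = {"geneA": "ACGTACGT", "geneB": "TTTTNACG"}
table = kmer_count_table(promoters, 3)
```

Each row of such a table could then be compared across the co-expression neighbors of a gene; how the published method scores that comparison is not specified in the abstract.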
Affiliation(s)
- Zhen Gao
- Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA.
2
Brohée S, Janky R, Abdel-Sater F, Vanderstocken G, André B, van Helden J. Unraveling networks of co-regulated genes on the sole basis of genome sequences. Nucleic Acids Res 2011; 39:6340-58. [PMID: 21572103 PMCID: PMC3159452 DOI: 10.1093/nar/gkr264] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 02/06/2023]
Abstract
With the growing number of available microbial genome sequences, regulatory signals can now be revealed as conserved motifs in promoters of orthologous genes (phylogenetic footprints). A next challenge is to unravel genome-scale regulatory networks. Using as sole input genome sequences, we predicted cis-regulatory elements for each gene of the yeast Saccharomyces cerevisiae by discovering over-represented motifs in the promoters of their orthologs in 19 Saccharomycetes species. We then linked all genes displaying similar motifs in their promoter regions and inferred a co-regulation network including 56,919 links between 3171 genes. Comparison with annotated regulons highlights the high predictive value of the method: a majority of the top-scoring predictions correspond to already known co-regulations. We also show that this inferred network is as accurate as a co-expression network built from hundreds of transcriptome microarray experiments. Furthermore, we experimentally validated 14 among 16 new functional links between orphan genes and known regulons. This approach can be readily applied to unravel gene regulatory networks from hundreds of microbial genomes for which no other information is available except the sequence. Long-term benefits can easily be perceived when considering the exponential increase of new genome sequences.
Affiliation(s)
- Sylvain Brohée
- Lab. Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles (ULB), CP 263, Campus Plaine, Bld du Triomphe, 1050 Brussels, Belgium
3
Uncovering gene regulatory networks from time-series microarray data with variational Bayesian structural expectation maximization. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2010:71312. [PMID: 18309364 DOI: 10.1155/2007/71312] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/01/2006] [Revised: 12/04/2006] [Accepted: 05/11/2007] [Indexed: 11/17/2022]
Abstract
We investigate in this paper reverse engineering of gene regulatory networks from time-series microarray data. We apply dynamic Bayesian networks (DBNs) for modeling cell cycle regulations. In developing a network inference algorithm, we focus on soft solutions that can provide a posteriori probability (APP) of network topology. In particular, we propose a variational Bayesian structural expectation maximization algorithm that can learn the posterior distribution of the network model parameters and topology jointly. We also show how the obtained APPs of the network topology can be used in a Bayesian data integration strategy to integrate two different microarray data sets. The proposed VBSEM algorithm has been tested on yeast cell cycle data sets. To evaluate the confidence of the inferred networks, we apply a moving block bootstrap method. The inferred network is validated by comparing it to the KEGG pathway map.
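The moving block bootstrap named in this abstract resamples a time series in contiguous blocks so that short-range temporal dependence is preserved within each block. The sketch below shows only that generic resampling step, under assumptions made here (overlapping blocks of a fixed length, truncation to the original length); the function name and the toy series are hypothetical, and the way the paper turns resamples into confidence values is not shown.

```python
import random


def moving_block_bootstrap(series, block_len, rng=None):
    """Draw one bootstrap replicate of a time series.

    Overlapping blocks of block_len consecutive points are sampled
    with replacement and concatenated until the replicate has the
    same length as the input (the last block may be truncated).
    """
    rng = rng or random.Random()
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    starts = range(n - block_len + 1)  # every admissible block start
    replicate = []
    while len(replicate) < n:
        s = rng.choice(starts)
        replicate.extend(series[s:s + block_len])
    return replicate[:n]


# Hypothetical expression time course, for illustration only.
time_course = [0.1, 0.4, 0.9, 1.3, 0.8, 0.2, -0.3, -0.6]
replicate = moving_block_bootstrap(time_course, block_len=3, rng=random.Random(0))
```

Repeating the draw many times and re-running the inference on each replicate would give an empirical frequency for each inferred edge.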
4
Ruan J, Deng Y, Perkins EJ, Zhang W. An ensemble learning approach to reverse-engineering transcriptional regulatory networks from time-series gene expression data. BMC Genomics 2009; 10 Suppl 1:S8. [PMID: 19594885 PMCID: PMC2709269 DOI: 10.1186/1471-2164-10-s1-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND One of the most challenging tasks in the post-genomic era is to reconstruct the transcriptional regulatory networks. The goal is to reveal, for each gene that responds to a certain biological event, which transcription factors affect its expression, and how a set of transcription factors coordinate to accomplish temporal and spatial specific regulations. RESULTS Here we propose a supervised machine learning approach to address these questions. We focus our study on the gene transcriptional regulation of the cell cycle in the budding yeast, thanks to the large amount of data available and relatively well-understood biology, although the main ideas of our method can be applied to other data as well. Our method starts with building an ensemble of decision trees for each microarray data to capture the association between the expression levels of yeast genes and the binding of transcription factors to gene promoter regions, as determined by chromatin immunoprecipitation microarray (ChIP-chip) experiment. Cross-validation experiments show that the method is more accurate and reliable than the naive decision tree algorithm and several other ensemble learning methods. From the decision tree ensembles, we extract logical rules that explain how a set of transcription factors act in concert to regulate the expression of their targets. We further compute a profile for each rule to show its regulation strengths at different time points. We also propose a spline interpolation method to integrate the rule profiles learned from several time series expression data sets that measure the same biological process. We then combine these rule profiles to build a transcriptional regulatory network for the yeast cell cycle. Compared to the results in the literature, our method correctly identifies all major known yeast cell cycle transcription factors, and assigns them into appropriate cell cycle phases. 
Our method also identifies many interesting synergetic relationships among these transcription factors, most of which are well known, while many of the rest can also be supported by other evidences. CONCLUSION The high accuracy of our method indicates that our method is valid and robust. As more gene expression and transcription factor binding data become available, we believe that our method is useful for reconstructing large-scale transcriptional regulatory networks in other species as well.
Affiliation(s)
- Jianhua Ruan
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA.
5
On the impact of entropy estimation on transcriptional regulatory network inference based on mutual information. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2009:308959. [PMID: 19148299 PMCID: PMC3171423 DOI: 10.1155/2009/308959] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 05/31/2008] [Accepted: 10/08/2008] [Indexed: 11/17/2022]
Abstract
The reverse engineering of transcription regulatory networks from expression data is gaining large interest in the bioinformatics community. An important family of inference techniques is represented by algorithms based on information theoretic measures which rely on the computation of pairwise mutual information. This paper aims to study the impact of the entropy estimator on the quality of the inferred networks. This is done by means of a comprehensive study which takes into consideration three state-of-the-art mutual information algorithms: ARACNE, CLR, and MRNET. Two different setups are considered in this work. The first one considers a set of 12 synthetically generated datasets to compare 8 different entropy estimators and three network inference algorithms. The two methods emerging as the most accurate ones from the first set of experiments are the MRNET method combined with the newly applied Spearman correlation and the CLR method combined with the Pearson correlation. The validation of these two techniques is then carried out on a set of 10 public domain microarray datasets measuring the transcriptional regulatory activity in the yeast organism.
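To make the role of the entropy estimator concrete, the sketch below computes pairwise mutual information as I(X;Y) = H(X) + H(Y) - H(X,Y) using the simplest option, a plug-in (empirical frequency) estimator on equal-width bins. This is only one possible estimator, chosen here for brevity; the abstract does not say which eight estimators were compared, and the function names, bin count, and toy vectors are assumptions of this sketch.

```python
from collections import Counter
from math import log2


def discretize(values, n_bins):
    """Map real values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(symbols):
    """Plug-in entropy (in bits) from empirical symbol frequencies."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(x, y, n_bins=3):
    """Estimate I(X;Y) = H(X) + H(Y) - H(X,Y) on discretized data."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


# Hypothetical expression profiles of two genes over six samples.
gene_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
gene_y = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
mi = mutual_information(gene_x, gene_y)
```

Algorithms such as ARACNE, CLR, and MRNET take a matrix of such pairwise values as input, which is why the choice of estimator can change the inferred network.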
6
Kontos K, Godard P, André B, van Helden J, Bontempi G. Machine learning techniques to identify putative genes involved in nitrogen catabolite repression in the yeast Saccharomyces cerevisiae. BMC Proc 2008; 2 Suppl 4:S5. [PMID: 19091052 PMCID: PMC2654973 DOI: 10.1186/1753-6561-2-s4-s5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Nitrogen is an essential nutrient for all life forms. Like most unicellular organisms, the yeast Saccharomyces cerevisiae transports and catabolizes good nitrogen sources in preference to poor ones. Nitrogen catabolite repression (NCR) refers to this selection mechanism. All known nitrogen catabolite pathways are regulated by four regulators. The ultimate goal is to infer the complete nitrogen catabolite pathways. Bioinformatics approaches offer the possibility to identify putative NCR genes and to discard uninteresting genes. RESULTS We present a machine learning approach where the identification of putative NCR genes in the yeast Saccharomyces cerevisiae is formulated as a supervised two-class classification problem. Classifiers predict whether genes are NCR-sensitive or not from a large number of variables related to the GATA motif in the upstream non-coding sequences of the genes. The positive and negative training sets are composed of annotated NCR genes and manually-selected genes known to be insensitive to NCR, respectively. Different classifiers and variable selection methods are compared. We show that all classifiers make significant and biologically valid predictions by comparing these predictions to annotated and putative NCR genes, and by performing several negative controls. In particular, the inferred NCR genes significantly overlap with putative NCR genes identified in three genome-wide experimental and bioinformatics studies. CONCLUSION These results suggest that our approach can successfully identify potential NCR genes. Hence, the dimensionality of the problem of identifying all genes involved in NCR is drastically reduced.
Affiliation(s)
- Kevin Kontos
- Machine Learning Group, Département d'Informatique, Faculté des Sciences, Université Libre de Bruxelles (ULB), Boulevard du Triomphe CP 212, 1050 Brussels, Belgium.
7
Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat Protoc 2008; 3:1589-603. [PMID: 18802440 DOI: 10.1038/nprot.2008.98] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 01/29/2023]
Abstract
This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
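The binomial significance mentioned in this abstract can be sketched as an upper-tail probability: given a word's expected probability per position, how likely is it to observe at least the counted number of occurrences? The code below is a generic illustration and not the RSAT implementation; the function name, the background probability, the counts, and the multiplication by the number of tested words are all assumptions made here.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(occurrences, positions, prior):
    """P(X >= occurrences) for X ~ Binomial(positions, prior).

    Terms are computed in log space so that large numbers of
    positions do not overflow.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_p, log_q = log(prior), log1p(-prior)
    total = 0.0
    for i in range(max(occurrences, 0), positions + 1):
        log_pmf = (lgamma(positions + 1) - lgamma(i + 1)
                   - lgamma(positions - i + 1)
                   + i * log_p + (positions - i) * log_q)
        total += exp(log_pmf)
    return min(total, 1.0)


# Hypothetical numbers: a hexamer seen 12 times in 5000 scanned positions,
# with a uniform background probability of 1 / 4**6 per position.
p_value = binomial_tail(occurrences=12, positions=5000, prior=1 / 4**6)
# Correcting for the 4**6 hexamers tested (an assumption of this sketch).
e_value = p_value * 4**6
```

Under-representation would be assessed with the lower tail, P(X <= occurrences), in the same way.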
8
Holloway DT, Kon M, DeLisi C. In silico regulatory analysis for exploring human disease progression. Biol Direct 2008; 3:24. [PMID: 18564415 PMCID: PMC2464594 DOI: 10.1186/1745-6150-3-24] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/04/2008] [Accepted: 06/18/2008] [Indexed: 12/24/2022]
Abstract
Background An important goal in bioinformatics is to unravel the network of transcription factors (TFs) and their targets. This is important in the human genome, where many TFs are involved in disease progression. Here, classification methods are applied to identify new targets for 152 transcriptional regulators using publicly-available targets as training examples. Three types of sequence information are used: composition, conservation, and overrepresentation.
Results Starting with 8817 TF-target interactions we predict an additional 9333 targets for 152 TFs. Randomized classifiers make few predictions (~2/18660) indicating that our predictions for many TFs are significantly enriched for true targets. An enrichment score is calculated and used to filter new predictions. Two case-studies for the TFs OCT4 and WT1 illustrate the usefulness of our predictions:
- Many predicted OCT4 targets fall into the Wnt-pathway. This is consistent with known biology as OCT4 is developmentally related and Wnt pathway plays a role in early development.
- Beginning with 15 known targets, 354 predictions are made for WT1. WT1 has a role in formation of Wilms' tumor. Chromosomal regions previously implicated in Wilms' tumor by cytological evidence are statistically enriched in predicted WT1 targets. These findings may shed light on Wilms' tumor progression, suggesting that the tumor progresses either by loss of WT1 or by loss of regions harbouring its targets.
- Targets of WT1 are statistically enriched for cancer related functions including metastasis and apoptosis. Among new targets are BAX and PDE4B, which may help mediate the established anti-apoptotic effects of WT1.
- Of the thirteen TFs found which co-regulate genes with WT1 (p ≤ 0.02), 8 have been previously implicated in cancer. The regulatory-network for WT1 targets in genomic regions relevant to Wilms' tumor is provided.
Conclusion We have assembled a set of features for the targets of human TFs and used them to develop classifiers for the determination of new regulatory targets. Many predicted targets are consistent with the known biology of their regulators, and new targets for the Wilms' tumor regulator, WT1, are proposed. We speculate that Wilms' tumor development is mediated by chromosomal rearrangements in the location of WT1 targets.
Reviewers This article was reviewed by Trey Ideker, Vladimir A. Kuznetsov (nominated by Frank Eisenhaber), and Tzachi Pilpel.
Affiliation(s)
- Dustin T Holloway
- Molecular Biology Cell Biology and Biochemistry Department, Boston University, 5 Cummington Street, Boston, USA
9
Classifying transcription factor targets and discovering relevant biological features. Biol Direct 2008; 3:22. [PMID: 18513408 PMCID: PMC2441612 DOI: 10.1186/1745-6150-3-22] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 05/25/2008] [Accepted: 05/30/2008] [Indexed: 01/04/2023]
Abstract
Background An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties. Principal Findings (1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1(4), Ino2(2.6), Yaf1(2.4), and Yap6(2.4). (2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression. (3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties. (4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. 
Cell-cycle and growth related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter. Conclusion Postprocessing of regulatory-classifier results can provide high quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web-server, including the full transcriptional network, which can be analyzed using VisAnt network analysis suite. Reviewers This article was reviewed by Igor Jouline, Todd Mockler (nominated by Valerian Dolja), and Sandor Pongor.
10
Godard P, Urrestarazu A, Vissers S, Kontos K, Bontempi G, van Helden J, André B. Effect of 21 different nitrogen sources on global gene expression in the yeast Saccharomyces cerevisiae. Mol Cell Biol 2007; 27:3065-86. [PMID: 17308034 PMCID: PMC1899933 DOI: 10.1128/mcb.01084-06] [Citation(s) in RCA: 186] [Impact Index Per Article: 10.9] [Received: 06/16/2006] [Revised: 07/24/2006] [Accepted: 01/16/2007] [Indexed: 11/20/2022]
Abstract
We compared the transcriptomes of Saccharomyces cerevisiae cells growing under steady-state conditions on 21 unique sources of nitrogen. We found 506 genes differentially regulated by nitrogen and estimated the activation degrees of all identified nitrogen-responding transcriptional controls according to the nitrogen source. One main group of nitrogenous compounds supports fast growth and a highly active nitrogen catabolite repression (NCR) control. Catabolism of these compounds typically yields carbon derivatives directly assimilable by a cell's metabolism. Another group of nitrogen compounds supports slower growth, is associated with excretion by cells of nonmetabolizable carbon compounds such as fusel oils, and is characterized by activation of the general control of amino acid biosynthesis (GAAC). Furthermore, NCR and GAAC appear interlinked, since expression of the GCN4 gene encoding the transcription factor that mediates GAAC is subject to NCR. We also observed that several transcriptional-regulation systems are active under a wider range of nitrogen supply conditions than anticipated. Other transcriptional-regulation systems acting on genes not involved in nitrogen metabolism, e.g., the pleiotropic-drug resistance and the unfolded-protein response systems, also respond to nitrogen. We have completed the lists of target genes of several nitrogen-sensitive regulons and have used sequence comparison tools to propose functions for about 20 orphan genes. Similar studies conducted for other nutrients should provide a more complete view of alternative metabolic pathways in yeast and contribute to the attribution of functions to many other orphan genes.
Affiliation(s)
- Patrice Godard
- Physiologie Moléculaire de la Cellule, IBMM, Université Libre de Bruxelles, Rue des Pr. Jeener et Brachet 12, 6041 Gosselies, Belgium
11
Holloway DT, Kon M, DeLisi C. Machine learning for regulatory analysis and transcription factor target prediction in yeast. SYSTEMS AND SYNTHETIC BIOLOGY 2007; 1:25-46. [PMID: 19003435 PMCID: PMC2533145 DOI: 10.1007/s11693-006-9003-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]
Abstract
High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps-the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. 
We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.
Affiliation(s)
- Dustin T. Holloway
- Molecular Biology Cell Biology and Biochemistry, Boston University, Boston, MA 02215 USA
- Mark Kon
- Department of Mathematics and Statistics, Boston University, Boston, MA 02215 USA
- Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA
- Charles DeLisi
- Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA
12
Azuaje F, Wang H, Zheng H, Bodenreider O, Chesneau A. Predictive Integration of Gene Ontology-Driven Similarity and Functional Interactions. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON DATA MINING 2006; 2006:114-119. [PMID: 25698910 DOI: 10.1109/icdmw.2006.130] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Abstract
There is a need to develop methods to automatically incorporate prior knowledge to support the prediction and validation of novel functional associations. One such important source is represented by the Gene Ontology (GO)™ and the many model organism databases of gene products annotated to the GO. We investigated quantitative relationships between the GO-driven similarity of genes and their functional interactions by analyzing different types of associations in Saccharomyces cerevisiae and Caenorhabditis elegans. Interacting genes exhibited significantly higher levels of GO-driven similarity (GOS) in comparison to random pairs of genes used as a surrogate for negative interactions. The Biological Process hierarchy provides more reliable results for co-regulatory and protein-protein interactions. GOS represent a relevant resource to support prediction of functional networks in combination with other resources.
Affiliation(s)
- Haiying Wang
- School of Computing and Mathematics, University of Ulster, UK
- Huiru Zheng
- School of Computing and Mathematics, University of Ulster, UK
- Alban Chesneau
- High-Throughput Protein Technologies Group, EMBL-Grenoble, France
13
Simonis N, Gonze D, Orsi C, van Helden J, Wodak SJ. Modularity of the transcriptional response of protein complexes in yeast. J Mol Biol 2006; 363:589-610. [PMID: 16973176 DOI: 10.1016/j.jmb.2006.06.024] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 11/17/2005] [Revised: 05/14/2006] [Accepted: 06/12/2006] [Indexed: 11/24/2022]
Abstract
A comprehensive study is performed on the condition-dependent expression of genes coding for the components of hand curated multi-protein complexes of the yeast Saccharomyces cerevisiae, in order to identify coherent transcriptional modules within these complexes. Such modules are defined as groups of genes within complexes whose expression profiles under a common set of experimental conditions allow us to discriminate them from random sets of genes. Our analysis reveals that complexes such as the cytoplasmic ribosome, the proteasome and the respiration chain complexes previously characterized as "stable" or "permanent" represent transcriptional modules that are coherently up or down-regulated in many different conditions. Overall however, some level of coherent expression is detected only in 71 out of the total of 113 complexes with at least five different protein components that could be reliably analyzed. Of these, 26 behave as coherently expressed transcriptional modules encompassing all the components of the complex. In another 15, at least half of the components make up such modules and in ten, few or no modules are detected. In an additional 20 complexes coherent expression is detected, but in too few conditions to enable reliable module detection. Interestingly, the transcriptional modules, when detected, often correspond to one or more known sub-complexes with specific functions. Furthermore, detected modules are generally consistent with transcriptional modules identified on the basis of predicted cis-regulatory sequence motifs. Also, groups of genes shared between complexes that carry out related functions tend to be part of overlapping transcriptional modules identified in these complexes. Together these findings suggest that transcriptional modules may represent basic functional and evolutionary building blocs of protein complexes.
Affiliation(s)
- Nicolas Simonis
- Service de Conformation des Macromolécules Biologiques, Centre de Biologie Structurale et Bioinformatique, CP 263, Université Libre de Bruxelles, Bld. du Triomphe B-1050 Bruxelles, Belgium
14
Abstract
My encounter with Jacques Monod has shaped my scientific career. After a short incursion in the biochemistry of strict anaerobes, and after elucidating the biosynthetic pathway leading from aspartate to threonine in Escherichia coli, I joined his laboratory. With him and Howard Rickenberg, I discovered the stereospecific permeability of galactosides and amino acids (permeases). After this intermezzo, I returned to the analysis of biosynthetic pathways and of their regulation by allosteric feedback inhibition and repression in E. coli. Among others, my studies led to the discovery of the tryptophan and methionine repressors, to the incorporation of amino acid analogues in proteins, including selenomethionine (which much later led to progress in protein crystallography), to the definition of isofunctional and multifunctional enzymes, and to the elucidation of the primary structure of most of the enzymes leading to threonine and methionine.
Affiliation(s)
- Georges N Cohen
- Institut Pasteur, Centre National de la Recherche Scientifique, Paris 75015, France.
15
Gonze D, Pinloche S, Gascuel O, van Helden J. Discrimination of yeast genes involved in methionine and phosphate metabolism on the basis of upstream motifs. Bioinformatics 2005; 21:3490-500. [PMID: 15998664 DOI: 10.1093/bioinformatics/bti558] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION In yeast, methionine and phosphate metabolism are regulated by the complexes Met4p/Met28p/Cbf1p and Pho4p, respectively. The binding sites for these factors share a common core CACGTG. We evaluate our capability to discriminate phosphate- and methionine-responding genes on the basis of putative regulatory elements, despite the similarity between Met4p/Met28p/Cbf1p and Pho4p consensus. RESULTS We scanned upstream regions of methionine, phosphate and control genes with position-specific weight matrices for Pho4p, Met4p/Met28p/Cbf1p and Met31p/Met32p, and applied discriminant analysis to classify genes according to matrix matching scores. This analysis showed that matrix scores provided a good discrimination between phosphate, methionine and control genes. The optimal parameters have then been used to predict phosphate and methionine regulation at a genome scale. The genome-scale analysis predicts 37 genes as methionine-regulated and 40 as phosphate-regulated. We compare the predictive results with high throughput data and discuss the difference. AVAILABILITY The programs for sequence retrieval and analysis, as well as the complete data and results, are available on the website on regulatory sequence analysis tools (http://rsat.scmbb.ulb.ac.be/rsat/). CONTACT jvanheld@scmbb.ulb.ac.be SUPPLEMENTARY INFORMATION The complete datasets and results are available at http://rsat.scmbb.ulb.ac.be/rsat/data/published_data/Gonze_MET_PHO/
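The matrix-scanning step described in this abstract can be sketched as follows. This is a minimal illustration, not the RSAT implementation: the log-odds construction, the pseudocount, the uniform background of 0.25, and the function names `build_pssm` and `best_match` are all assumptions, and the discriminant analysis step is not shown. Sequences are assumed to contain only uppercase A, C, G and T, and only the forward strand is scanned.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position-specific scoring matrix from aligned sites.

    `sites` is a list of equal-length strings; each column becomes a dict
    mapping a base to log2(smoothed frequency / background frequency).
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm


def best_match(pssm, sequence):
    """Slide the matrix along `sequence` and return (best score, start index)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pssm, window))
        if score > best[0]:
            best = (score, start)
    return best
```

In a setting like the one described, the best score of each matrix over a gene's upstream region would become one feature per gene, and a classifier would then separate gene groups on those features.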
Affiliation(s)
- Didier Gonze
- Service de Conformation des Macromolécules Biologiques et de Bioinformatique, Université Libre de Bruxelles, CP 263, Campus Plaine, Blvd du Triomphe, B-1050 Bruxelles, Belgium
|
16
|
Ruan J, Zhang W. CAGER: classification analysis of gene expression regulation using multiple information sources. BMC Bioinformatics 2005; 6:114. [PMID: 15890068 PMCID: PMC1174863 DOI: 10.1186/1471-2105-6-114] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 11/03/2004] [Accepted: 05/12/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Many classification approaches have been applied to analyzing transcriptional regulation of gene expression. These methods build models that can explain a gene's expression level from the regulatory elements (features) on its promoter sequence. Different types of features, such as experimentally verified binding motifs, motifs discovered by computer programs, or transcription factor binding data measured with Chromatin Immunoprecipitation (ChIP) assays, have been used towards this goal. Each type of feature has been shown to be successful in modeling gene transcriptional regulation under certain conditions. However, no comparison has been made to evaluate the relative merit of these features. Furthermore, most publicly available classification tools were not designed specifically for modeling transcriptional regulation, and do not allow the user to combine different types of features. RESULTS In this study, we use a specific classification method, decision trees, to model transcriptional regulation in yeast with features based on predefined motifs, automatically identified motifs, ChIP-chip data, or their combinations. We compare the accuracies and stability of these models, and analyze their capabilities in identifying functionally related genes. Furthermore, we design and implement a user-friendly web server called CAGER (Classification Analysis of Gene Expression Regulation) that integrates several software components for automated analysis of transcriptional regulation using decision trees. Finally, we use CAGER to study the transcriptional regulation of Arabidopsis genes in response to abscisic acid, and report some interesting new results. CONCLUSION Models built with ChIP-chip data suffer from low accuracies when the condition under which gene expression is measured is significantly different from the condition under which the ChIP experiment is conducted. Models built with automatically identified motifs can sometimes discover new features, but their modeling accuracies may have been over-estimated in previous studies. Furthermore, models built with automatically identified motifs are not stable with respect to noise. A combination of ChIP-chip data and predefined motifs can substantially improve modeling accuracies, and is effective in identifying true regulons. The CAGER web server, which is freely available at http://cic.cs.wustl.edu/CAGER/, allows the user to select combinations of different feature types for building decision trees, and interact with the models graphically. We believe that it will be a useful tool to facilitate the discovery of gene transcriptional regulatory networks.
Affiliation(s)
- Jianhua Ruan
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Weixiong Zhang
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
|
17
|
Güldener U, Münsterkötter M, Kastenmüller G, Strack N, van Helden J, Lemer C, Richelles J, Wodak SJ, García-Martínez J, Pérez-Ortín JE, Michael H, Kaps A, Talla E, Dujon B, André B, Souciet JL, De Montigny J, Bon E, Gaillardin C, Mewes HW. CYGD: the Comprehensive Yeast Genome Database. Nucleic Acids Res 2005; 33:D364-8. [PMID: 15608217 PMCID: PMC540007 DOI: 10.1093/nar/gki053] [Citation(s) in RCA: 208] [Impact Index Per Article: 10.9] [Indexed: 11/15/2022]
Abstract
The Comprehensive Yeast Genome Database (CYGD) compiles a comprehensive data resource for information on the cellular functions of the yeast Saccharomyces cerevisiae and related species, chosen as the best understood model organism for eukaryotes. The database serves as a common resource generated by a European consortium, going beyond the provision of sequence information and functional annotations on individual genes and proteins. In addition, it provides information on the physical and functional interactions among proteins as well as other genetic elements. These cellular networks include metabolic and regulatory pathways, signal transduction and transport processes as well as co-regulated gene clusters. As more yeast genomes are published, their annotation becomes greatly facilitated using S. cerevisiae as a reference. CYGD provides a way of exploring related genomes with the aid of the S. cerevisiae genome as a backbone and SIMAP, the Similarity Matrix of Proteins. The comprehensive resource is available under http://mips.gsf.de/genre/proj/yeast/.
Affiliation(s)
- U Güldener
- Institute for Bioinformatics, GSF National Research Center for Environment and Health, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany
|
18
|
Simonis N, van Helden J, Cohen GN, Wodak SJ. Transcriptional regulation of protein complexes in yeast. Genome Biol 2004; 5:R33. [PMID: 15128447 PMCID: PMC416469 DOI: 10.1186/gb-2004-5-5-r33] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 11/26/2003] [Revised: 03/30/2004] [Accepted: 04/06/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND Multiprotein complexes play an essential role in many cellular processes. But our knowledge of the mechanism of their formation, regulation and lifetimes is very limited. We investigated transcriptional regulation of protein complexes in yeast using two approaches. First, known regulons, manually curated or identified by genome-wide screens, were mapped onto the components of multiprotein complexes. The complexes comprised manually curated ones and those characterized by high-throughput analyses. Second, putative regulatory sequence motifs were identified in the upstream regions of the genes involved in individual complexes and regulons were predicted on the basis of these motifs. RESULTS Only a very small fraction of the analyzed complexes (5-6%) have subsets of their components mapping onto known regulons. Likewise, regulatory motifs are detected in only about 8-15% of the complexes, and in those, about half of the components are on average part of predicted regulons. In the manually curated complexes, the so-called 'permanent' assemblies have a larger fraction of their components belonging to putative regulons than 'transient' complexes. For the noisier set of complexes identified by high-throughput screens, valuable insights are obtained into the function and regulation of individual genes. CONCLUSIONS A small fraction of the known multiprotein complexes in yeast seems to have at least a subset of their components co-regulated on the transcriptional level. Preliminary analysis of the regulatory motifs for these components suggests that the corresponding genes are likely to be co-regulated either together or in smaller subgroups, indicating that transcriptionally regulated modules might exist within complexes.
Affiliation(s)
- Nicolas Simonis
- Service de Conformation des Macromolécules Biologiques, Centre de Biologie Structurale et Bioinformatique, CP 263, Université Libre de Bruxelles, Bld du Triomphe, B-1050 Bruxelles, Belgium
- Jacques van Helden
- Service de Conformation des Macromolécules Biologiques, Centre de Biologie Structurale et Bioinformatique, CP 263, Université Libre de Bruxelles, Bld du Triomphe, B-1050 Bruxelles, Belgium
- George N Cohen
- Institut Pasteur, Unité d'Expression des Gènes Eucaryotes, Institut Pasteur, rue du Docteur Roux, 75724 Paris Cedex 15, France
- Shoshana J Wodak
- Service de Conformation des Macromolécules Biologiques, Centre de Biologie Structurale et Bioinformatique, CP 263, Université Libre de Bruxelles, Bld du Triomphe, B-1050 Bruxelles, Belgium
|