Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Simonis N, Wodak SJ, Cohen GN, van Helden J. Combining pattern discovery and discriminant analysis to predict gene co-regulation. Bioinformatics 2004;20:2370-9. [PMID: 15073004 DOI: 10.1093/bioinformatics/bth252] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]

Cited by Other Article(s)

Gao Z, Zhao R, Ruan J. A genome-wide cis-regulatory element discovery method based on promoter sequences and gene co-expression networks. BMC Genomics 2013;14 Suppl 1:S4. [PMID: 23368633 PMCID: PMC3549801 DOI: 10.1186/1471-2164-14-s1-s4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]

Brohée S, Janky R, Abdel-Sater F, Vanderstocken G, André B, van Helden J. Unraveling networks of co-regulated genes on the sole basis of genome sequences. Nucleic Acids Res 2011;39:6340-58. [PMID: 21572103 PMCID: PMC3159452 DOI: 10.1093/nar/gkr264] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 02/06/2023]

Uncovering gene regulatory networks from time-series microarray data with variational Bayesian structural expectation maximization. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2010:71312. [PMID: 18309364 DOI: 10.1155/2007/71312] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/01/2006] [Revised: 12/04/2006] [Accepted: 05/11/2007] [Indexed: 11/17/2022]

Ruan J, Deng Y, Perkins EJ, Zhang W. An ensemble learning approach to reverse-engineering transcriptional regulatory networks from time-series gene expression data. BMC Genomics 2009;10 Suppl 1:S8. [PMID: 19594885 PMCID: PMC2709269 DOI: 10.1186/1471-2164-10-s1-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]

Abstract

BACKGROUND

One of the most challenging tasks in the post-genomic era is to reconstruct the transcriptional regulatory networks. The goal is to reveal, for each gene that responds to a certain biological event, which transcription factors affect its expression, and how a set of transcription factors coordinate to accomplish temporal and spatial specific regulations.

RESULTS

Here we propose a supervised machine learning approach to address these questions. We focus our study on the gene transcriptional regulation of the cell cycle in the budding yeast, thanks to the large amount of data available and relatively well-understood biology, although the main ideas of our method can be applied to other data as well. Our method starts with building an ensemble of decision trees for each microarray data set to capture the association between the expression levels of yeast genes and the binding of transcription factors to gene promoter regions, as determined by chromatin immunoprecipitation microarray (ChIP-chip) experiments. Cross-validation experiments show that the method is more accurate and reliable than the naive decision tree algorithm and several other ensemble learning methods. From the decision tree ensembles, we extract logical rules that explain how a set of transcription factors act in concert to regulate the expression of their targets. We further compute a profile for each rule to show its regulation strengths at different time points. We also propose a spline interpolation method to integrate the rule profiles learned from several time series expression data sets that measure the same biological process. We then combine these rule profiles to build a transcriptional regulatory network for the yeast cell cycle. Compared to the results in the literature, our method correctly identifies all major known yeast cell cycle transcription factors, and assigns them to the appropriate cell cycle phases. Our method also identifies many interesting synergetic relationships among these transcription factors, most of which are well known, while many of the rest can also be supported by other evidence.

CONCLUSION

The high accuracy of our method indicates that it is valid and robust. As more gene expression and transcription factor binding data become available, we believe that our method will be useful for reconstructing large-scale transcriptional regulatory networks in other species as well.
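
The ensemble idea summarized in this abstract can be illustrated with a small sketch. The code below is not the authors' algorithm: it assumes binary TF-binding features, uses one-feature decision stumps instead of full decision trees, and combines them by bootstrap resampling and majority vote. All names (train_stump, train_ensemble, the toy rows and labels) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """Pick the single binary feature (and polarity) with the fewest errors."""
    best = None
    for j in range(len(rows[0])):
        for polarity in (0, 1):
            # The stump predicts 1 when feature j equals `polarity`, else 0.
            errors = sum((1 if r[j] == polarity else 0) != y
                         for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, j, polarity)
    return best[1], best[2]

def predict_stump(stump, row):
    j, polarity = stump
    return 1 if row[j] == polarity else 0

def train_ensemble(rows, labels, n_stumps=25, seed=0):
    """Train each stump on a bootstrap resample of the training genes."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_stumps):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(train_stump([rows[i] for i in idx],
                                    [labels[i] for i in idx]))
    return ensemble

def predict_ensemble(ensemble, row):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]

# Toy data (made up): each row is the binding (1) or absence (0) of three
# hypothetical TFs at a gene's promoter; the label is 1 if the gene is
# up-regulated. Here the label simply follows the first TF.
rows = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0)]
labels = [1, 1, 0, 0, 1, 0]

ensemble = train_ensemble(rows, labels)
prediction_bound = predict_ensemble(ensemble, (1, 1, 1))
prediction_unbound = predict_ensemble(ensemble, (0, 1, 1))
```

An odd number of stumps is used so that the binary majority vote cannot tie. A real analysis would use deeper trees and cross-validation, as the abstract describes.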

On the impact of entropy estimation on transcriptional regulatory network inference based on mutual information. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2009:308959. [PMID: 19148299 PMCID: PMC3171423 DOI: 10.1155/2009/308959] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 05/31/2008] [Accepted: 10/08/2008] [Indexed: 11/17/2022]

Kontos K, Godard P, André B, van Helden J, Bontempi G. Machine learning techniques to identify putative genes involved in nitrogen catabolite repression in the yeast Saccharomyces cerevisiae. BMC Proc 2008;2 Suppl 4:S5. [PMID: 19091052 PMCID: PMC2654973 DOI: 10.1186/1753-6561-2-s4-s5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]

Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat Protoc 2008;3:1589-603. [PMID: 18802440 DOI: 10.1038/nprot.2008.98] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 01/29/2023]

Holloway DT, Kon M, DeLisi C. In silico regulatory analysis for exploring human disease progression. Biol Direct 2008;3:24. [PMID: 18564415 PMCID: PMC2464594 DOI: 10.1186/1745-6150-3-24] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/04/2008] [Accepted: 06/18/2008] [Indexed: 12/24/2022]

Abstract

Background

An important goal in bioinformatics is to unravel the network of transcription factors (TFs) and their targets. This is important in the human genome, where many TFs are involved in disease progression. Here, classification methods are applied to identify new targets for 152 transcriptional regulators using publicly-available targets as training examples. Three types of sequence information are used: composition, conservation, and overrepresentation.

Results

Starting with 8817 TF-target interactions, we predict an additional 9333 targets for 152 TFs. Randomized classifiers make few predictions (~2/18660), indicating that our predictions for many TFs are significantly enriched for true targets. An enrichment score is calculated and used to filter new predictions.

Two case-studies for the TFs OCT4 and WT1 illustrate the usefulness of our predictions:

• Many predicted OCT4 targets fall into the Wnt pathway. This is consistent with known biology, as OCT4 is developmentally related and the Wnt pathway plays a role in early development.

• Beginning with 15 known targets, 354 predictions are made for WT1. WT1 has a role in formation of Wilms' tumor. Chromosomal regions previously implicated in Wilms' tumor by cytological evidence are statistically enriched in predicted WT1 targets. These findings may shed light on Wilms' tumor progression, suggesting that the tumor progresses either by loss of WT1 or by loss of regions harbouring its targets.

• Targets of WT1 are statistically enriched for cancer related functions including metastasis and apoptosis. Among new targets are BAX and PDE4B, which may help mediate the established anti-apoptotic effects of WT1.

• Of the thirteen TFs found which co-regulate genes with WT1 (p ≤ 0.02), 8 have been previously implicated in cancer. The regulatory-network for WT1 targets in genomic regions relevant to Wilms' tumor is provided.

Conclusion

We have assembled a set of features for the targets of human TFs and used them to develop classifiers for the determination of new regulatory targets. Many predicted targets are consistent with the known biology of their regulators, and new targets for the Wilms' tumor regulator, WT1, are proposed. We speculate that Wilms' tumor development is mediated by chromosomal rearrangements in the location of WT1 targets.

Reviewers

This article was reviewed by Trey Ideker, Vladimir A. Kuznetsov (nominated by Frank Eisenhaber), and Tzachi Pilpel.

Classifying transcription factor targets and discovering relevant biological features. Biol Direct 2008;3:22. [PMID: 18513408 PMCID: PMC2441612 DOI: 10.1186/1745-6150-3-22] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 05/25/2008] [Accepted: 05/30/2008] [Indexed: 01/04/2023]

Abstract

Background

An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties.

Principal Findings

(1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1 (4), Ino2 (2.6), Yaf1 (2.4), and Yap6 (2.4).

(2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression.

(3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties.

(4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. Cell-cycle and growth related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter.
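
The gain figures quoted in finding (1) are ratios of total targets to previously known targets. A minimal sketch of that arithmetic follows, assuming that the total is the sum of known and newly predicted targets with no overlap between the two sets; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def target_gain(known, predicted_new):
    """Ratio of total targets (known + newly predicted) to known targets."""
    if known <= 0:
        raise ValueError("need at least one previously known target")
    return (known + predicted_new) / known

# Hypothetical counts: 10 known targets and 30 new predictions
# give 40 targets in total, i.e. a four-fold gain.
example_gain = target_gain(known=10, predicted_new=30)
```

A ratio of 1.0 means that no new targets were predicted; a ratio above 2 means that the classifier proposed more new targets than were previously known.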

Conclusion

Postprocessing of regulatory-classifier results can provide high quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web-server, including the full transcriptional network, which can be analyzed using the VisANT network analysis suite.

Reviewers

This article was reviewed by Igor Jouline, Todd Mockler (nominated by Valerian Dolja), and Sandor Pongor.

Godard P, Urrestarazu A, Vissers S, Kontos K, Bontempi G, van Helden J, André B. Effect of 21 different nitrogen sources on global gene expression in the yeast Saccharomyces cerevisiae. Mol Cell Biol 2007;27:3065-86. [PMID: 17308034 PMCID: PMC1899933 DOI: 10.1128/mcb.01084-06] [Citation(s) in RCA: 186] [Impact Index Per Article: 10.9] [Received: 06/16/2006] [Revised: 07/24/2006] [Accepted: 01/16/2007] [Indexed: 11/20/2022]

Holloway DT, Kon M, DeLisi C. Machine learning for regulatory analysis and transcription factor target prediction in yeast. SYSTEMS AND SYNTHETIC BIOLOGY 2007;1:25-46. [PMID: 19003435 PMCID: PMC2533145 DOI: 10.1007/s11693-006-9003-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]

Abstract

High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps: the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.
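
The abstract above compares classifiers by sensitivity, specificity and positive predictive value. For readers unfamiliar with these terms, the following sketch computes all three from confusion-matrix counts using their standard definitions. The function name and the example counts are hypothetical and do not reproduce any figure from the paper; the code assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, positive predictive value).

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives and false negatives.
    """
    sensitivity = tp / (tp + fn)  # fraction of true targets recovered
    specificity = tn / (tn + fp)  # fraction of non-targets rejected
    ppv = tp / (tp + fp)          # fraction of predictions that are correct
    return sensitivity, specificity, ppv

# Hypothetical counts for one TF classifier evaluated on 100 promoters:
# 50 true targets (40 found, 10 missed) and 50 non-targets (5 called wrongly).
sensitivity, specificity, ppv = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

With these made-up counts, sensitivity is 40/50, specificity is 45/50 and positive predictive value is 40/45.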

Azuaje F, Wang H, Zheng H, Bodenreider O, Chesneau A. Predictive Integration of Gene Ontology-Driven Similarity and Functional Interactions. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON DATA MINING 2006;2006:114-119. [PMID: 25698910 DOI: 10.1109/icdmw.2006.130] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]

Simonis N, Gonze D, Orsi C, van Helden J, Wodak SJ. Modularity of the transcriptional response of protein complexes in yeast. J Mol Biol 2006;363:589-610. [PMID: 16973176 DOI: 10.1016/j.jmb.2006.06.024] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 11/17/2005] [Revised: 05/14/2006] [Accepted: 06/12/2006] [Indexed: 11/24/2022]

Abstract

A comprehensive study is performed on the condition-dependent expression of genes coding for the components of hand-curated multi-protein complexes of the yeast Saccharomyces cerevisiae, in order to identify coherent transcriptional modules within these complexes. Such modules are defined as groups of genes within complexes whose expression profiles under a common set of experimental conditions allow us to discriminate them from random sets of genes. Our analysis reveals that complexes such as the cytoplasmic ribosome, the proteasome and the respiration chain complexes previously characterized as "stable" or "permanent" represent transcriptional modules that are coherently up or down-regulated in many different conditions. Overall, however, some level of coherent expression is detected only in 71 out of the total of 113 complexes with at least five different protein components that could be reliably analyzed. Of these, 26 behave as coherently expressed transcriptional modules encompassing all the components of the complex. In another 15, at least half of the components make up such modules and in ten, few or no modules are detected. In an additional 20 complexes coherent expression is detected, but in too few conditions to enable reliable module detection. Interestingly, the transcriptional modules, when detected, often correspond to one or more known sub-complexes with specific functions. Furthermore, detected modules are generally consistent with transcriptional modules identified on the basis of predicted cis-regulatory sequence motifs. Also, groups of genes shared between complexes that carry out related functions tend to be part of overlapping transcriptional modules identified in these complexes. Together these findings suggest that transcriptional modules may represent basic functional and evolutionary building blocks of protein complexes.

Cohen GN. LOOKING BACK. Annu Rev Microbiol 2005;59:1-17. [PMID: 16153161 DOI: 10.1146/annurev.micro.59.030804.121329] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]

Gonze D, Pinloche S, Gascuel O, van Helden J. Discrimination of yeast genes involved in methionine and phosphate metabolism on the basis of upstream motifs. Bioinformatics 2005;21:3490-500. [PMID: 15998664 DOI: 10.1093/bioinformatics/bti558] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]

Ruan J, Zhang W. CAGER: classification analysis of gene expression regulation using multiple information sources. BMC Bioinformatics 2005;6:114. [PMID: 15890068 PMCID: PMC1174863 DOI: 10.1186/1471-2105-6-114] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 11/03/2004] [Accepted: 05/12/2005] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Many classification approaches have been applied to analyzing transcriptional regulation of gene expression. These methods build models that can explain a gene's expression level from the regulatory elements (features) on its promoter sequence. Different types of features, such as experimentally verified binding motifs, motifs discovered by computer programs, or transcription factor binding data measured with Chromatin Immunoprecipitation (ChIP) assays, have been used towards this goal. Each type of feature has been shown to be successful in modeling gene transcriptional regulation under certain conditions. However, no comparison has been made to evaluate the relative merit of these features. Furthermore, most publicly available classification tools were not designed specifically for modeling transcriptional regulation, and do not allow the user to combine different types of features.

RESULTS

In this study, we use a specific classification method, decision trees, to model transcriptional regulation in yeast with features based on predefined motifs, automatically identified motifs, ChIP-chip data, or their combinations. We compare the accuracies and stability of these models, and analyze their capabilities in identifying functionally related genes. Furthermore, we design and implement a user-friendly web server called CAGER (Classification Analysis of Gene Expression Regulation) that integrates several software components for automated analysis of transcriptional regulation using decision trees. Finally, we use CAGER to study the transcriptional regulation of Arabidopsis genes in response to abscisic acid, and report some interesting new results.

CONCLUSION

Models built with ChIP-chip data suffer from low accuracies when the condition under which gene expression is measured is significantly different from the condition under which the ChIP experiment is conducted. Models built with automatically identified motifs can sometimes discover new features, but their modeling accuracies may have been over-estimated in previous studies. Furthermore, models built with automatically identified motifs are not stable with respect to noise. A combination of ChIP-chip data and predefined motifs can substantially improve modeling accuracies, and is effective in identifying true regulons. The CAGER web server, which is freely available at http://cic.cs.wustl.edu/CAGER/, allows the user to select combinations of different feature types for building decision trees, and interact with the models graphically. We believe that it will be a useful tool to facilitate the discovery of gene transcriptional regulatory networks.

Güldener U, Münsterkötter M, Kastenmüller G, Strack N, van Helden J, Lemer C, Richelles J, Wodak SJ, García-Martínez J, Pérez-Ortín JE, Michael H, Kaps A, Talla E, Dujon B, André B, Souciet JL, De Montigny J, Bon E, Gaillardin C, Mewes HW. CYGD: the Comprehensive Yeast Genome Database. Nucleic Acids Res 2005;33:D364-8. [PMID: 15608217 PMCID: PMC540007 DOI: 10.1093/nar/gki053] [Citation(s) in RCA: 208] [Impact Index Per Article: 10.9] [Indexed: 11/15/2022]

Simonis N, van Helden J, Cohen GN, Wodak SJ. Transcriptional regulation of protein complexes in yeast. Genome Biol 2004;5:R33. [PMID: 15128447 PMCID: PMC416469 DOI: 10.1186/gb-2004-5-5-r33] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 11/26/2003] [Revised: 03/30/2004] [Accepted: 04/06/2004] [Indexed: 11/10/2022]