51
Waleev T, Shtokalo D, Konovalova T, Voss N, Cheremushkin E, Stegmaier P, Kel-Margoulis O, Wingender E, Kel A. Composite Module Analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res 2006; 34:W541-5. [PMID: 16845066 PMCID: PMC1538785 DOI: 10.1093/nar/gkl342]
Abstract
Composite Module Analyst (CMA) is a novel software tool for identifying promoter-enhancer models based on the composition of transcription factor (TF) binding sites and their pairs. CMA is closely interconnected with the TRANSFAC database. In particular, CMA uses the positional weight matrix (PWM) library collected in TRANSFAC and can therefore search for a large variety of different TF binding sites. We model the structure of long gene regulatory regions by a Boolean function that joins several local modules, each consisting of co-localized TF binding sites. Given a set of co-regulated genes as input, CMA builds the promoter model and optimizes its parameters automatically by applying a genetic-regression algorithm. The algorithm uses a multicomponent fitness function that combines several statistical criteria in a weighted linear function. We show examples of successful application of CMA to microarray data on transcription profiling of TNF-alpha stimulated primary human endothelial cells. The CMA web server is freely accessible at http://www.gene-regulation.com/pub/programs/cma/CMA.html. An advanced version of CMA is also part of the commercial system ExPlain™ (www.biobase.de), designed for causal analysis of gene expression data.
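The model structure described in this abstract (a Boolean function over local modules of co-localized sites, scored by a weighted linear fitness) can be made concrete with a small sketch. This is not CMA's implementation: it assumes modules are joined by OR, that a module is present when one hit of each of its matrices falls inside a window of `window` bp, and that the fitness combines three hypothetical criteria (hit rate in co-regulated promoters, hit rate in background promoters, model size) with made-up weights. The matrix names `NFKB`, `AP1` and `SP1` are placeholders for PWM hits.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# A PWM hit in one promoter: (matrix name, position in bp). Hypothetical representation.
Hit = Tuple[str, int]


@dataclass(frozen=True)
class Module:
    """A local module: every listed matrix must hit within `window` bp of the others."""
    matrices: Tuple[str, ...]
    window: int

    def present(self, hits: Sequence[Hit]) -> bool:
        positions = [[p for name, p in hits if name == m] for m in self.matrices]
        if any(not ps for ps in positions):
            return False
        # Try each hit position as the left edge of a window of length `window`.
        for _, start in hits:
            if all(any(start <= p <= start + self.window for p in ps) for ps in positions):
                return True
        return False


def model_true(modules: Sequence[Module], hits: Sequence[Hit]) -> bool:
    """Boolean promoter model, assumed here to be an OR over the modules."""
    return any(m.present(hits) for m in modules)


def fitness(modules: Sequence[Module],
            positives: Sequence[Sequence[Hit]],
            negatives: Sequence[Sequence[Hit]],
            weights: Tuple[float, float, float] = (1.0, 1.0, 0.01)) -> float:
    """Weighted linear combination of hypothetical statistical criteria."""
    tp_rate = sum(model_true(modules, h) for h in positives) / len(positives)
    fp_rate = sum(model_true(modules, h) for h in negatives) / len(negatives)
    size = sum(len(m.matrices) for m in modules)  # penalize overly complex models
    w_tp, w_fp, w_size = weights
    return w_tp * tp_rate - w_fp * fp_rate - w_size * size


if __name__ == "__main__":
    model = [Module(("NFKB", "AP1"), window=50)]
    positives = [[("NFKB", 10), ("AP1", 40)], [("NFKB", 100), ("AP1", 120)]]
    negatives = [[("NFKB", 10), ("AP1", 300)], [("SP1", 5)]]
    print(fitness(model, positives, negatives))  # 1.0 - 0.0 - 0.01 * 2
```

The linear form keeps the criteria separable, so their relative importance can be tuned through the weights alone.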
Affiliation(s)
- T. Waleev
- A.P. Ershov's Institute of Informatics Systems, 6 Lavrentiev avenue, 630090 Novosibirsk, Russia
- D. Shtokalo
- A.P. Ershov's Institute of Informatics Systems, 6 Lavrentiev avenue, 630090 Novosibirsk, Russia
- T. Konovalova
- Institute of Cytology and Genetics, Novosibirsk, Russia
- N. Voss
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
- E. Cheremushkin
- A.P. Ershov's Institute of Informatics Systems, 6 Lavrentiev avenue, 630090 Novosibirsk, Russia
- P. Stegmaier
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
- O. Kel-Margoulis
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
- E. Wingender
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
- Department Bioinformatics, UKG/University Göttingen, Goldschmidtstr. 1, D-37077 Göttingen, Germany
- A. Kel
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
- To whom correspondence should be addressed. Tel: +49-5331-858441; Fax: +49-5331-858470.
52
Abstract
TiProD is a database of human promoter sequences for which some functional features are known. It allows users to query individual promoters and the expression patterns they mediate, as well as the gene expression signatures of individual tissues, and to retrieve sets of promoters according to their tissue-specific activity or to the individual Gene Ontology terms assigned to the corresponding genes. We have defined a measure of tissue specificity that allows the user to discriminate between ubiquitously and specifically expressed genes. The database is accessible at http://tiprod.cbi.pku.edu.cn:8080/index.html.
Affiliation(s)
- Klaus Hornischer
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
- Alexander Kel
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
- Edgar Wingender
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
- Department of Bioinformatics, UKG, University of Göttingen, Goldschmidtstrasse 1, D-37077 Göttingen, Germany
- To whom correspondence should be addressed. Tel: +49 (0) 551 39 14912; Fax: +49 (0) 551 39 14914.
53
Krull M, Pistor S, Voss N, Kel A, Reuter I, Kronenberg D, Michael H, Schwarzer K, Potapov A, Choi C, Kel-Margoulis O, Wingender E. TRANSPATH: an information resource for storing and visualizing signaling pathways and their pathological aberrations. Nucleic Acids Res 2006; 34:D546-51. [PMID: 16381929 PMCID: PMC1347469 DOI: 10.1093/nar/gkj107]
Abstract
TRANSPATH is a database of signal transduction events. It provides information about signaling molecules, their reactions and the pathways these reactions constitute. The representation of signaling molecules is organized in a number of orthogonal hierarchies reflecting the classification of the molecules, their species-specific or generic features, and their post-translational modifications. Reactions are similarly organized hierarchically in a three-layer architecture, differentiating between reactions that are evidenced by individual publications, generalizations of these reactions used to construct species-independent 'reference pathways', and the 'semantic projections' of these pathways. A number of search and browse options allow easy access to the database contents, which can be visualized with the tool PathwayBuilder™. The module PathoSign adds data about pathologically relevant mutations in signaling components, including their genotypes and phenotypes. TRANSPATH and PathoSign can be used as an encyclopaedia, in the educational process, for visualization and modeling of signal transduction networks, and for the analysis of gene expression data. TRANSPATH Public 6.0 is freely accessible to users from non-profit organizations at http://www.gene-regulation.com/pub/databases.html.
Affiliation(s)
- Mathias Krull
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany.
54
Kel A, Konovalova T, Waleev T, Cheremushkin E, Kel-Margoulis O, Wingender E. Composite Module Analyst: a fitness-based tool for identification of transcription factor binding site combinations. Bioinformatics 2006; 22:1190-7. [PMID: 16473870 DOI: 10.1093/bioinformatics/btl041]
Abstract
MOTIVATION Functionally related genes involved in the same molecular-genetic, biochemical or physiological process are often regulated coordinately. Such regulation is provided by the precisely organized binding of multiple specific proteins [transcription factors (TFs)] to their target sites (cis-elements) in the regulatory regions of genes. Cis-element combinations provide a structural basis for the generation of unique patterns of gene expression. RESULTS Here we present a new approach for defining promoter models based on the composition of TF binding sites and their pairs. We use a multicomponent fitness function to select the promoter model that best fits the observed gene expression profile. We demonstrate examples of successful application of the fitness function, combined with a genetic algorithm, to the analysis of functionally related or co-expressed genes, as well as testing on simulated and permuted data. AVAILABILITY The CMA program is freely available for non-commercial users. URL http://www.gene-regulation.com/pub/programs.html#CMAnalyst. It is also part of the commercial system ExPlain (www.biobase.de), designed for causal analysis of gene expression data.
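Selecting a promoter model with a fitness function and a genetic algorithm can be illustrated with a generic GA. In this hedged sketch a candidate model is a bit vector over a small, hypothetical matrix library, and a promoter matches when it contains every selected matrix. The fitness (match rate in co-expressed promoters minus match rate in background promoters, with a small size penalty), truncation selection, one-point crossover, bit-flip mutation and elitism are illustrative choices, not the published CMA operators.

```python
import random
from typing import AbstractSet, List, Sequence, Tuple

Genome = Tuple[int, ...]


def evaluate(genome: Genome, library: Sequence[str],
             positives: Sequence[AbstractSet[str]],
             negatives: Sequence[AbstractSet[str]]) -> float:
    """Hypothetical fitness: TP rate - FP rate - 0.01 * number of selected matrices."""
    required = {m for bit, m in zip(genome, library) if bit}
    tp_rate = sum(required <= s for s in positives) / len(positives)
    fp_rate = sum(required <= s for s in negatives) / len(negatives)
    return tp_rate - fp_rate - 0.01 * len(required)


def genetic_search(library: Sequence[str],
                   positives: Sequence[AbstractSet[str]],
                   negatives: Sequence[AbstractSet[str]],
                   pop_size: int = 20, generations: int = 30,
                   mutation_rate: float = 0.1, seed: int = 0) -> Tuple[Genome, float]:
    """Evolve bit-vector models; assumes pop_size >= 2 and len(library) >= 2."""
    rng = random.Random(seed)
    n = len(library)

    def fit(g: Genome) -> float:
        return evaluate(g, library, positives, negatives)

    population: List[Genome] = [tuple(rng.randint(0, 1) for _ in range(n))
                                for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fit, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]    # truncation selection
        children: List[Genome] = [ranked[0]]          # elitism: keep the best model
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(n):                        # bit-flip mutation
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
            children.append(tuple(child))
        population = children
    best = max(population, key=fit)
    return best, fit(best)
```

Because the best model is always copied into the next generation, the best fitness never decreases from one generation to the next; the random operators only control how quickly better models are found.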
Affiliation(s)
- A Kel
- BIOBASE GmbH, Halchtersche Str. 33, D-38304 Wolfenbüttel, Germany.
55
Abstract
MOTIVATION Mathematical models of the cell cycle can contribute to an understanding of its basic mechanisms. Modern simulation tools make the analysis of key components and their interactions very effective. This paper focuses on the role of small modules and feedbacks in the gene-protein network governing the G1/S transition in mammalian cells. Mutations in this network may lead to uncontrolled cell proliferation. Bifurcation analysis helps to identify the key components of this extremely complex interaction network. RESULTS We identify various positive and negative feedback loops in the network controlling the G1/S transition. We show that the positive feedback regulation of E2F1 and a double activator-inhibitor module can lead to bistability. Extensions of the core module preserve essential features such as bistability. The complete model exhibits a transcritical bifurcation in addition to bistability. We relate these bifurcations to the cell cycle checkpoint and the G1/S phase transition point. Thus, core modules can explain major features of the complex G1/S network and have a robust decision-making function.
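How positive feedback can produce bistability is easiest to see in a one-variable toy model. The sketch below is not the paper's G1/S model: it assumes a single hypothetical self-activating factor x (standing in loosely for E2F1 autoregulation) with basal synthesis `k0`, Hill-type feedback synthesis (`k1`, `K`, `n`) and linear degradation `d`, all with arbitrary parameter values. It scans dx/dt for sign changes, refines each steady state by bisection, and classifies stability from the sign of the local slope.

```python
from typing import Callable, List, Tuple


def rate(x: float, k0: float = 0.05, k1: float = 1.0, K: float = 0.5,
         n: int = 4, d: float = 1.0) -> float:
    """dx/dt for a toy self-activating transcription factor x (hypothetical parameters)."""
    return k0 + k1 * x**n / (K**n + x**n) - d * x


def bisect(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-10) -> float:
    """Locate a sign change of f inside [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def steady_states(f: Callable[[float], float] = rate, x_max: float = 2.0,
                  step: float = 1e-3, h: float = 1e-6) -> List[Tuple[float, bool]]:
    """Return (x*, is_stable) for every steady state found on [0, x_max]."""
    xs = [i * step for i in range(int(round(x_max / step)) + 1)]
    states = []
    for a, b in zip(xs, xs[1:]):
        if (f(a) > 0) != (f(b) > 0):
            root = bisect(f, a, b)
            slope = (f(root + h) - f(root - h)) / (2 * h)
            states.append((root, slope < 0))  # in 1-D, negative slope means stable
    return states


if __name__ == "__main__":
    for x_star, stable in steady_states():
        print(f"x* = {x_star:.3f} ({'stable' if stable else 'unstable'})")
```

With these parameters the scan finds a low stable state near x = 0.05 and a high stable state just below x = 1, separated by an unstable threshold near x = 0.45: the system "decides" between two states depending on which side of the threshold it starts.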
Affiliation(s)
- Maciej Swat
- Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstrasse 43, Berlin, D-10115, Germany.
56
Sakaki Y, Watanabe H, Taylor T, Hattori M, Fujiyama A, Toyoda A, Kuroki Y, Itoh T, Saitou N, Oota S, Kim CG, Kitano T, Lehrach H, Yaspo ML, Sudbrak R, Kahla A, Reinhardt R, Kube M, Platzer M, Taenzer S, Galgoczy P, Kel A, Blöcker H, Scharfe M, Nordsiek G, Hellmann I, Khaitovich P, Pääbo S, Chen Z, Wang SY, Ren SX, Zhang XL, Zheng HJ, Zhu GF, Wang BF, Zhao GP, Tsai SF, Wu K, Liu TT, Hsiao KJ, Park HS, Lee YS, Cheong JE, Choi SH. Human versus chimpanzee chromosome-wide sequence comparison and its evolutionary implication. Cold Spring Harb Symp Quant Biol 2004; 68:455-60. [PMID: 15338648 DOI: 10.1101/sqb.2003.68.455]
Affiliation(s)
- Y Sakaki
- RIKEN, Genomic Sciences Center, Yokohama 230-0045, Japan
57
Kel A, Reymann S, Matys V, Nettesheim P, Wingender E, Borlak J. A Novel Computational Approach for the Prediction of Networked Transcription Factors of Aryl Hydrocarbon-Receptor-Regulated Genes. Mol Pharmacol 2004; 66:1557-72. [PMID: 15342792 DOI: 10.1124/mol.104.001677]
Abstract
A novel computational method based on a genetic algorithm was developed to study the composite structure of promoters of coexpressed genes. Our method enabled the identification of combinations of multiple transcription factor binding sites that regulate the concerted expression of genes. In this article, we study genes whose expression is regulated by a ligand-activated transcription factor, the aryl hydrocarbon receptor (AhR), which mediates responses to a variety of toxins. The AhR-mediated change in expression of AhR target genes was measured by oligonucleotide microarrays and by reverse transcription-polymerase chain reaction in human and rat hepatocytes. Promoters and long-distance regulatory regions (>10 kb) of AhR-responsive genes were analyzed by the genetic algorithm and a variety of other computational methods. Rules were established on the local oligonucleotide context in the flanks of the AhR binding sites, on the occurrence of clusters of AhR recognition elements, and on the presence in the promoters of specific combinations of multiple binding sites for the transcription factors cooperating in the AhR regulatory network. These rules were applied to search for as yet unknown Ah-receptor target genes. Experimental evidence is presented demonstrating the high fidelity of this novel in silico approach.
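One of the rules mentioned here concerns clusters of AhR recognition elements. A simple way to look for such clusters is to scan both strands for a core consensus and count sites in a sliding window. The sketch below assumes the commonly cited xenobiotic/dioxin response element core 5'-TNGCGTG-3' (reverse complement 5'-CACGCNA-3'); the window size and minimum site count are hypothetical, and the paper's actual rules, which also use flanking context, are not reproduced.

```python
import re
from typing import List, Tuple

# Assumed XRE core 5'-TNGCGTG-3' and its reverse complement 5'-CACGCNA-3'.
# The lookahead lets overlapping matches be reported.
XRE = re.compile(r"(?=(T[ACGT]GCGTG|CACGC[ACGT]A))")


def xre_sites(seq: str) -> List[int]:
    """Start positions of core matches on either strand, in ascending order."""
    return [m.start() for m in XRE.finditer(seq.upper())]


def xre_clusters(seq: str, window: int = 50, min_sites: int = 2) -> List[Tuple[int, int]]:
    """(start, count) for each site that opens a window holding >= min_sites sites."""
    sites = xre_sites(seq)
    clusters = []
    for i, start in enumerate(sites):
        count = sum(1 for p in sites[i:] if p < start + window)
        if count >= min_sites:
            clusters.append((start, count))
    return clusters


if __name__ == "__main__":
    demo = "AAAA" + "TTGCGTG" + "AAAAA" + "CACGCTA" + "A" * 200 + "TAGCGTG"
    print(xre_sites(demo), xre_clusters(demo))
```

In the demo sequence the first two elements (one per strand) lie 12 bp apart and form a cluster, while the third, about 200 bp downstream, stands alone.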
58
Kel A, Tikunov Y, Voss N, Wingender E. Recognition of multiple patterns in unaligned sets of sequences: comparison of kernel clustering method with other methods. Bioinformatics 2004; 20:1512-6. [PMID: 15231544 DOI: 10.1093/bioinformatics/bth111]
Abstract
MOTIVATION Transcription factor binding sites often differ significantly in their primary sequence and can hardly be aligned. A single set of sites often contains several subsets of sequences that follow not just one but several different patterns. Sensitive methods are therefore needed to reveal multiple patterns in unaligned sets of sequences. RESULTS We developed a novel method for the analysis of unaligned sets of sequences based on kernel estimation. The method is able to reveal 'multiple local patterns', i.e. a set of weight matrices. Each weight matrix characterizes a pattern that can be found in a significant subset of the sequences under analysis. The method has been compared with several other pattern discovery methods, such as Gibbs sampling, MEME, CONSENSUS, MULTIPROFILER and PROJECTION. The kernel method showed the best performance in terms of how close the revealed weight matrices are to the original ones. We applied the kernel method to analyze three samples of promoters (cell-cycle, T-cell and muscle-specific). We compared the revealed multiple patterns with the TRANSFAC library of weight matrices and found a strong similarity to several weight matrices for transcription factors known to be involved in the corresponding specific gene regulation. AVAILABILITY The program is available for online use at: http://www.biobase.de/cgi-bin/biobase/cbs2/bin/template.cgi?template=cbscall.html
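The output of the method is a set of weight matrices. For readers unfamiliar with them, the sketch below shows the standard way a weight matrix is built from a set of already aligned sites and used to score a sequence. It does not implement the kernel estimation itself; it assumes a log2-odds matrix with a pseudocount of 0.5 and a uniform background, forward-strand scanning only, and sequences containing only A, C, G and T. The example sites are illustrative AP-1-like sequences.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def build_pwm(sites: Sequence[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Log2-odds weight matrix against an assumed uniform (0.25) background."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25)
                    for b in BASES})
    return pwm


def score(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds for one window of the same length as the matrix."""
    return sum(col[b] for col, b in zip(pwm, window.upper()))


def best_hit(pwm: List[Dict[str, float]], seq: str) -> Tuple[int, float]:
    """Start and score of the highest-scoring window on the forward strand."""
    w = len(pwm)
    scores = [score(pwm, seq[i:i + w]) for i in range(len(seq) - w + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    pwm = build_pwm(["TGACTCA", "TGAGTCA", "TGACTCA"])
    print(best_hit(pwm, "GGGTGACTCAGGG"))  # best window starts at position 3
```

Comparing revealed matrices with a library, as done with TRANSFAC in the abstract, then amounts to comparing such column distributions rather than individual sites.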
Affiliation(s)
- A Kel
- Institute of Cytology and Genetics SB RAN, Lavrentyev pr., 10, 630090, Novosibirsk, Russia.
59
Choi C, Crass T, Kel A, Kel-Margoulis O, Krull M, Pistor S, Potapov A, Voss N, Wingender E. Consistent re-modeling of signaling pathways and its implementation in the TRANSPATH database. Genome Inform 2004; 15:244-54. [PMID: 15706510]
Abstract
The data model of the signaling pathway database TRANSPATH has been re-engineered into a three-layer model comprising experimental evidence and summarized pathway information, both in a mechanistically detailed manner, and a "semantic" projection for the abstract overview. Each molecule is described in the context of a certain reaction in the multidimensional space of posttranslational modification, molecular family relationships, and the biological species of its origin. The new model makes the data better suited for reconstructing signaling pathways and networks and for mapping expression data, for instance from microarray experiments, onto regulatory networks.
Affiliation(s)
- Claudia Choi
- BIOBASE GmbH, Halchtersche Str.33, D-38304 Wolfenbüttel, Germany.
60
Abstract
Gene expression in higher organisms is, to a large degree, controlled at the level of transcription, where DNA-binding proteins (transcription factors) play an influential role in gene regulation. This is achieved through various mechanisms, including those that involve silencer and enhancer regions. Variation in these regulatory regions, as well as in the genes encoding the transcription factors, has been shown to generate functional effects at the molecular, cellular, and neurobehavioral levels. The aim of the present paper is two-fold. First, for the sake of clarity and to reintroduce the terminology to Behavior Genetics readers, we review the concepts of gene structure, gene expression, and gene regulation. Second, using distinct bioinformatic tools, we set out to identify transcription factors that could be involved in the transcriptional regulation of genes known to be associated with aggressive behavior in mice. The results of this in silico study reveal common putative transcription factor binding sites among the set of genes investigated (especially for SRY), suggesting similar molecular transcriptional mechanisms.
Affiliation(s)
- Ursula M D'Souza
- MRC Social, Genetics and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, United Kingdom.
61
Abstract
Phylogenetic footprinting is an efficient approach for revealing potential transcription factor binding sites in promoter sequences. The idea is based on the assumption that functional sites in promoters should evolve much more slowly than other regions that do not bear any conserved function. Therefore, potential transcription factor (TF) binding sites found in evolutionarily conserved regions of promoters are more likely to be "real" sites. The most difficult step of phylogenetic footprinting is the alignment of promoter sequences between different organisms (e.g. human and mouse). Conventional alignment methods often cannot align promoters because of the high level of sequence variability. We have developed a new alignment method that takes into account similarity in the distribution of potential binding sites (motif-based alignment). This method has been used effectively for promoter alignment and for revealing new potential binding sites for various transcription factors. We performed a systematic phylogenetic footprinting of human/mouse conserved non-coding sequences (CNS), which revealed 60 thousand potential binding sites in the human and mouse genomes. We have developed a database of the predicted potential TF binding sites. AVAILABILITY http://compel.bionet.nsc.ru/FunSite/footprint/; www.gene-regulation.com/.
Affiliation(s)
- E Cheremushkin
- Institute of Cytology & Genetics SB RAN, 10 Lavrentyev pr., 630090, Novosibirsk, Russia
62
Farnham P, Graveel C, Kirmizis A, Bartley S, Kel A, Kel-Margoulis O, Wingender E, Zhang M, Jatkoe T, Madore S. Use of chromatin immunoprecipitation to study transcriptional deregulation in cancer cells. Nat Genet 2001. [DOI: 10.1038/87076]
63
Kel A, Kel-Margoulis O, Babenko V, Wingender E. Recognition of NFATp/AP-1 composite elements within genes induced upon the activation of immune cells. J Mol Biol 1999; 288:353-76. [PMID: 10329147 DOI: 10.1006/jmbi.1999.2684]
Abstract
Composite elements are regulatory modules of promoters or enhancers that consist of binding sites for two different but synergizing transcription factors. A well-studied example is provided by nuclear factor of activated T-cells (NFAT) sites, which are composite elements of an NFATp/c and an activating protein 1 (AP-1) binding site. We have developed a computational approach to identify potential NFAT target genes which (a) comprises an improved method to scan for individual NFAT composite elements; (b) considers positional effects relative to transcription start sites; and (c) involves cluster analysis of potential NFAT composite elements. All three steps progressively helped to discriminate T-cell-specific promoter sequences from other functional regions (coding and intronic sequences) of the same genes, from promoters of muscle-specific genes, and from random sequences. Using this approach, we identified potential NFAT composite elements in promoters of cytokine genes and their receptors, as well as in promoters of genes for AP-1 family members, Ca2+-binding proteins and some other components of the regulatory network operating in activated T-cells and other immune cells. The method developed can be adapted to characterize and identify other composite elements as well. The program for recognition of NFAT composite elements is available through the World Wide Web (http://compel.bionet.nsc.ru/FunSite/CompelScan.html and http://transfac.gbf.de/dbsearch/funsitep/s_comp.html).
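The idea of a composite element (two different sites that only count when they occur together at a constrained spacing) can be sketched with a toy scanner. This is a deliberate simplification of step (a) above, not the paper's method, which uses weight matrices and further refinements: it assumes the commonly cited NFAT core 5'-GGAAA-3' and the AP-1 (TRE) consensus 5'-TGA[C/G]TCA-3', scans the forward strand only, and requires the AP-1 site to start 0 to `max_gap` bp downstream of the NFAT core, where `max_gap` is a hypothetical parameter.

```python
import re
from typing import List, Tuple

# Assumed consensus cores; lookaheads allow overlapping matches.
NFAT = re.compile(r"(?=GGAAA)")
AP1 = re.compile(r"(?=TGA[CG]TCA)")
NFAT_LEN = 5


def composite_elements(seq: str, max_gap: int = 5) -> List[Tuple[int, int]]:
    """(nfat_start, ap1_start) pairs with AP-1 starting 0..max_gap bp after the NFAT core."""
    seq = seq.upper()
    nfat_sites = [m.start() for m in NFAT.finditer(seq)]
    ap1_sites = [m.start() for m in AP1.finditer(seq)]
    pairs = []
    for n in nfat_sites:
        for a in ap1_sites:
            gap = a - (n + NFAT_LEN)  # bases between the end of NFAT and the start of AP-1
            if 0 <= gap <= max_gap:
                pairs.append((n, a))
    return pairs


if __name__ == "__main__":
    print(composite_elements("CCGGAAAAATGACTCACC"))               # close pair: reported
    print(composite_elements("GGAAA" + "C" * 30 + "TGACTCA"))     # too far apart: none
```

Requiring the pair, rather than either site alone, is what gives composite-element scanning its extra specificity over single-site searches.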
Affiliation(s)
- A Kel
- Institute of Cytology and Genetics, pr. Lavrentyeva-10, 630090, Novosibirsk, Russia.
64
Kel A, Ptitsyn A, Babenko V, Meier-Ewert S, Lehrach H. A genetic algorithm for designing gene family-specific oligonucleotide sets used for hybridization: the G protein-coupled receptor protein superfamily. Bioinformatics 1998; 14:259-70. [PMID: 9614269 DOI: 10.1093/bioinformatics/14.3.259]
Abstract
MOTIVATION Massive oligonucleotide hybridization is one of the most promising technologies for functional genome analysis. The critical point is to design appropriate sets of oligonucleotides that can be used effectively for identification by hybridization. RESULTS Using a genetic algorithm approach, we have attempted to design sets of oligo probes capable of identifying new genes belonging to a defined gene family within a cDNA or genomic library. The approach is not limited by oligonucleotide length and admits the letter 'N' in the structure of the selected oligonucleotides. One of its major advantages is that only low homology is required, so that functional families of sequences with little homology can be identified. We have designed the oligonucleotide sets that are most selective for the cDNA clones of transmembrane G protein-coupled receptors (GPCRs), a large family of proteins that form part of a modular system of extracellular signal transduction to the intracellular second messenger pathways. The accuracy of identification has been checked on the EST library containing 713 870 cDNA sequences. A set of 15 oligos between 7 and 14 bases in length correctly identified 70% of the GPCR cDNA collection sequences with 0.02% false positives. AVAILABILITY The developed software is available via ftp at ftp://ftp.bionet.nsc.ru/pub/biology/ and on the Web page http://www.bionet.nsc.ru/SRCG/Oligoselector/. CONTACT kel@.bionet.nsc.ru; sebastian.meier-ewert@gpc-ag.com
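The quantities reported here (fraction of family sequences identified, false-positive rate) are what a genetic algorithm would optimize when choosing an oligo set. The sketch below only evaluates a given set; it does not reproduce the published search. It assumes that 'N' in an oligo matches any base, that only the given strand is searched, and that a clone is assigned to the family when at least `min_hits` of the oligos occur in it. The abstract does not state the identification rule, so `min_hits` and the example sequences are hypothetical.

```python
import re
from typing import List, Pattern, Sequence, Tuple


def oligo_regex(oligo: str) -> Pattern[str]:
    """Translate an oligo that may contain 'N' into a regex ('N' matches any base)."""
    return re.compile(oligo.upper().replace("N", "[ACGT]"))


def identified(seq: str, patterns: Sequence[Pattern[str]], min_hits: int) -> bool:
    """Hypothetical rule: a clone belongs to the family if >= min_hits oligos occur in it."""
    seq = seq.upper()
    return sum(1 for p in patterns if p.search(seq)) >= min_hits


def evaluate_oligo_set(oligos: Sequence[str], family: Sequence[str],
                       others: Sequence[str], min_hits: int = 1) -> Tuple[float, float]:
    """Return (fraction of family sequences identified, false-positive fraction)."""
    patterns: List[Pattern[str]] = [oligo_regex(o) for o in oligos]
    sensitivity = sum(identified(s, patterns, min_hits) for s in family) / len(family)
    false_pos = sum(identified(s, patterns, min_hits) for s in others) / len(others)
    return sensitivity, false_pos


if __name__ == "__main__":
    family = ["AAGCTAC", "GCGATTAG", "CCCC"]
    others = ["TTAGC", "AAAA"]
    print(evaluate_oligo_set(["GCNA", "TTAG"], family, others, min_hits=1))
    print(evaluate_oligo_set(["GCNA", "TTAG"], family, others, min_hits=2))
```

Raising `min_hits` trades sensitivity for specificity, which is the balance a fitness function for the oligo search has to strike.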
Affiliation(s)
- A Kel
- Institute of Cytology and Genetics, pr.Lavrentyeva 10, 630090, Novosibirsk, Russia.