51
Abstract
In this chapter, we present a brief overview of current knowledge about the promoters of plant microRNAs (miRNAs), and provide a step-by-step guide for predicting plant miRNA promoter elements using known transcription factor binding motifs. The approach to promoter element prediction is based on a carefully constructed collection of Positional Weight Matrices (PWMs) for known transcription factors (TFs) in Arabidopsis. A key concept of the method is to use scoring thresholds for potential binding sites that are appropriate to each individual transcription factor. While the procedure can be applied to search for Transcription Factor Binding Sites (TFBSs) in any pol-II promoter region, it is particularly practical for the case of plant miRNA promoters where upstream sequence regions and binding sites are not readily available in existing databases. The majority of the material described in this chapter is available for download at http://microrna.gr.
Affiliation(s)
- Molly Megraw
- Department of Genetics, Center for Bioinformatics, School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
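The thresholded PWM scan summarized in the abstract above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the uniform background, the pseudocount, the function names (`log_odds_matrix`, `scan`, `scan_all`) and the toy motif are all assumptions introduced here. The point it shows is that each transcription factor carries its own score cutoff rather than sharing one global threshold.

```python
import math

# Assumed uniform background nucleotide frequencies (hypothetical choice).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(counts, pseudocount=1.0):
    """Turn per-position base counts into a log2-odds weight matrix.

    counts: list of dicts, one per motif position, mapping base -> count.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(
                ((column.get(base, 0) + pseudocount) / total) / BACKGROUND[base]
            )
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, site, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BACKGROUND for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


def scan_all(sequence, pwms, thresholds):
    """Scan one promoter with several PWMs, each with its own threshold."""
    return {tf: scan(sequence, pwms[tf], thresholds[tf]) for tf in pwms}


# Toy TATA-like motif built from made-up counts.
toy_pwm = log_odds_matrix([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
toy_hits = scan_all("GGTATAGG", {"TF_A": toy_pwm}, {"TF_A": 6.0})
```

In practice the per-factor thresholds would be calibrated against known binding sites for each factor; the value 6.0 here is arbitrary.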
52
Damert A, Raiz J, Horn AV, Löwer J, Wang H, Xing J, Batzer MA, Löwer R, Schumann GG. 5'-Transducing SVA retrotransposon groups spread efficiently throughout the human genome. Genome Res 2009; 19:1992-2008. [PMID: 19652014 DOI: 10.1101/gr.093435.109] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.5] [Indexed: 02/02/2023]
Abstract
SVA elements represent the youngest family of hominid non-LTR retrotransposons, which alter the human genome continuously. They stand out due to their organization as composite repetitive elements. To draw conclusions on the assembly process that led to the current organization of SVA elements and on their transcriptional regulation, we initiated our study by assessing differences in structures of the 116 SVA elements located on human chromosome 19. We classified SVA elements into seven structural variants, including novel variants like 3'-truncated elements and elements with 5'-flanking sequence transductions. We established a genome-wide inventory of 5'-transduced SVA elements encompassing approximately 8% of all human SVA elements. The diversity of 5' transduction events found indicates transcriptional control of their SVA source elements by a multitude of external cellular promoters in germ cells in the course of their evolution and suggests that SVA elements might be capable of acquiring 5' promoter sequences. Our data indicate that SVA-mediated 5' transduction events involve alternative RNA splicing at cryptic splice sites. We analyzed one remarkably successful human-specific SVA 5' transduction group in detail because it includes at least 32% of all SVA subfamily F members. An ancient retrotransposition event brought an SVA insertion under transcriptional control of the MAST2 gene promoter, giving rise to the primal source element of this group. Members of this group are currently transcribed. Here we show that SVA-mediated 5' transduction events lead to structural diversity of SVA elements and represent a novel source of genomic rearrangements contributing to genomic diversity.
Affiliation(s)
- Annette Damert
- Fachgebiet PR2/Retroelemente, Paul-Ehrlich-Institut, D-63225 Langen, Germany
53
Gan Y, Guan J, Zhou S. A pattern-based nearest neighbor search approach for promoter prediction using DNA structural profiles. Bioinformatics 2009; 25:2006-12. [PMID: 19515962 DOI: 10.1093/bioinformatics/btp359] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Identification of core promoters is a key clue in understanding gene regulations. However, due to the diverse nature of promoter sequences, the accuracy of existing prediction approaches for non-CpG island (simply CGI)-related promoters is not as high as that for CGI-related promoters. This consequently leads to a low genome-wide promoter prediction accuracy. RESULTS: In this article, we first systematically analyze the similarities and differences between the two types of promoters (CGI- and non-CGI-related) from a novel structural perspective, and then devise a unified framework, called PNNP (Pattern-based Nearest Neighbor search for Promoter), to predict both CGI- and non-CGI-related promoters based on their structural features. Our comparative analysis on the structural characteristics of promoters reveals two interesting facts: (i) the structural values of CGI- and non-CGI-related promoters are quite different, but they exhibit nearly similar structural patterns; (ii) the structural patterns of promoters are obviously different from those of non-promoter sequences, though the sequences have almost similar structural values. Extensive experiments demonstrate that the proposed PNNP approach is effective in capturing the structural patterns of promoters, and can significantly improve genome-wide performance of promoter prediction, especially non-CGI-related promoter prediction. AVAILABILITY: The implementation of the program PNNP is available at http://admis.tongji.edu.cn/Projects/pnnp.aspx.
Affiliation(s)
- Yanglan Gan
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
54
Shelenkov A, Korotkov E. Search of regular sequences in promoters from eukaryotic genomes. Comput Biol Chem 2009; 33:196-204. [DOI: 10.1016/j.compbiolchem.2009.03.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 06/26/2008] [Revised: 02/08/2009] [Accepted: 03/18/2009] [Indexed: 12/14/2022]
55
Uher R, Huezo-Diaz P, Perroud N, Smith R, Rietschel M, Mors O, Hauser J, Maier W, Kozel D, Henigsberg N, Barreto M, Placentino A, Dernovsek MZ, Schulze TG, Kalember P, Zobel A, Czerski PM, Larsen ER, Souery D, Giovannini C, Gray JM, Lewis CM, Farmer A, Aitchison KJ, McGuffin P, Craig I. Genetic predictors of response to antidepressants in the GENDEP project. Pharmacogenomics J 2009; 9:225-33. [PMID: 19365399 DOI: 10.1038/tpj.2009.12] [Citation(s) in RCA: 148] [Impact Index Per Article: 9.9] [Indexed: 12/17/2022]
Abstract
The objective of the Genome-based Therapeutic Drugs for Depression study is to investigate the function of variations in genes encoding key proteins in serotonin, norepinephrine, neurotrophic and glucocorticoid signaling in determining the response to serotonin-reuptake-inhibiting and norepinephrine-reuptake-inhibiting antidepressants. A total of 116 single nucleotide polymorphisms in 10 candidate genes were genotyped in 760 adult patients with moderate-to-severe depression, treated with escitalopram (a serotonin reuptake inhibitor) or nortriptyline (a norepinephrine reuptake inhibitor) for 12 weeks in an open-label part-randomized multicenter study. The effect of genetic variants on change in depressive symptoms was evaluated using mixed linear models. Several variants in a serotonin receptor gene (HTR2A) predicted response to escitalopram with one marker (rs9316233) explaining 1.1% of variance (P=0.0016). Variants in the norepinephrine transporter gene (SLC6A2) predicted response to nortriptyline, and variants in the glucocorticoid receptor gene (NR3C1) predicted response to both antidepressants. Two HTR2A markers remained significant after hypothesis-wide correction for multiple testing. A false discovery rate of 0.106 for the three strongest associations indicated that the multiple findings are unlikely to be false positives. The pattern of associations indicated a degree of specificity with variants in genes encoding proteins in serotonin signaling influencing response to the serotonin-reuptake-inhibiting escitalopram, genes encoding proteins in norepinephrine signaling influencing response to the norepinephrine-reuptake-inhibiting nortriptyline and a common pathway gene influencing response to both antidepressants. The single marker associations explained only a small proportion of variance in response to antidepressants, indicating a need for a multivariate approach to prediction.
Affiliation(s)
- Rudolf Uher
- MRC Social Genetic and Developmental Psychiatry Center, Institute of Psychiatry, King's College London, London, UK.
56
Simpson C, Thomas C, Findlay K, Bayer E, Maule AJ. An Arabidopsis GPI-anchor plasmodesmal neck protein with callose binding activity and potential to regulate cell-to-cell trafficking. Plant Cell 2009; 21:581-94. [PMID: 19223515 PMCID: PMC2660613 DOI: 10.1105/tpc.108.060145] [Citation(s) in RCA: 223] [Impact Index Per Article: 14.9] [Received: 04/21/2008] [Revised: 01/05/2009] [Accepted: 01/26/2009] [Indexed: 05/18/2023]
Abstract
Plasmodesmata (Pds) traverse the cell wall to establish a symplastic continuum through most of the plant. Rapid and reversible deposition of callose in the cell wall surrounding the Pd apertures is proposed to provide a regulatory process through physical constriction of the symplastic channel. We identified members within a larger family of X8 domain-containing proteins that targeted to Pds. This subgroup of proteins contains signal sequences for a glycosylphosphatidylinositol linkage to the extracellular face of the plasma membrane. We focused our attention on three closely related members of this family, two of which specifically bind to 1,3-beta-glucans (callose) in vitro. We named this family of proteins Pd callose binding proteins (PDCBs). Yellow fluorescent protein-PDCB1 was found to localize to the neck region of Pds with potential to provide a structural anchor between the plasma membrane component of Pds and the cell wall. PDCB1, PDCB2, and PDCB3 had overlapping and widespread patterns of expression, but neither single nor combined insertional mutants for PDCB2 and PDCB3 showed any visible phenotype. However, increased expression of PDCB1 led to an increase in callose accumulation and a reduction of green fluorescent protein (GFP) movement in a GFP diffusion assay, identifying a potential association between PDCB-mediated callose deposition and plant cell-to-cell communication.
Affiliation(s)
- Clare Simpson
- John Innes Centre, Norwich Research Park, Colney, Norwich, Norfolk NR4 7UH, United Kingdom
57
Rangannan V, Bansal M. Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition. Mol Biosyst 2009; 5:1758-69. [DOI: 10.1039/b906535k] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/21/2022]
58
Wang M, Bao YL, Wu Y, Yu CL, Meng X, Xu HP, Li YX. Identification and characterization of the human testes-specific protease 50 gene promoter. DNA Cell Biol 2008; 27:307-14. [PMID: 18462069 DOI: 10.1089/dna.2007.0692] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
Testes-specific protease 50 (TSP50) has been identified as one of the testis-specific proteins that is expressed at high levels in approximately 92% of human breast cancer samples, making it an attractive molecular marker and a potential target for diagnosis and therapy. However, little is known about the transcriptional mechanisms controlling TSP50 gene expression. In the present study, we have characterized the 5' regulatory region of the TSP50 gene in order to understand the molecular mechanisms regulating its expression. Analysis with a series of deletions demonstrated that a 624-bp region was essential for the basal promoter activity of the TSP50 gene. Further analysis results indicated that the two fragment regions +231 to +251 and -22 to -8, especially the putative Sp1 binding site (+237 to +239) and the putative CCAAT/enhancer binding protein (C/EBP) binding site (-15 to -13), are more important for the basal transcription activity of the human TSP50 promoter. Overexpression of Sp1 and C/EBPbeta transcriptional factors upregulated the activities of the TSP50 promoter. Taken together, these results will help to better understand the role of the TSP50 gene in signal-dependent transcriptional regulation, and to develop new reagents for therapeutic downregulation of the TSP50 gene in human breast cancer.
Affiliation(s)
- Miao Wang
- Institute of Genetics and Cytology, Northeast Normal University, ChangChun, China
59
Abstract
The formation of diverse cell types from an invariant set of genes is governed by biochemical and molecular processes that regulate gene activity. A complete understanding of the regulatory mechanisms of gene expression is the major function of genomics. Computational genomics is a rapidly emerging area for deciphering the regulation of metazoan genes as well as interpreting the results of high-throughput screening. The integration of computer science with biology has expedited molecular modelling and processing of large-scale data inputs such as microarrays, analysis of genomes, transcriptomes and proteomes. Many bioinformaticians have developed various algorithms for predicting transcriptional regulatory mechanisms from the sequence, gene expression and interaction data. This review contains compiled information of various computational methods adopted to dissect gene expression pathways.
Affiliation(s)
- Vibha Rani
- Department of Biotechnology, Jaypee Institute of Information Technology University, A-10, Sector 62, Noida 210 307, India.
60
Abstract
As the number of sequenced genomes increases, the ability to deduce genome function becomes increasingly salient. For many genome sequences, the only annotation that will be available for the foreseeable future will be based on computational predictions and comparisons with functional elements in related species. Here we discuss computational approaches for automated genome-wide annotation of functional elements in mammalian genomes. These include methods for ab initio and comparative gene-structure predictions. Gene features such as intron splice sites, 3' untranslated regions, promoters, and cis-regulatory elements are discussed, as is a novel method for predicting DNaseI hypersensitive sites. Recent methodologies for predicting noncoding RNA genes, including microRNA genes and their targets, are also reviewed.
Affiliation(s)
- Steven J M Jones
- Genome Sciences Centre, British Columbia Cancer Research Center, Vancouver, British Columbia, V5Z 1L3, Canada.
61
Pérez A, Lankas F, Luque FJ, Orozco M. Towards a molecular dynamics consensus view of B-DNA flexibility. Nucleic Acids Res 2008; 36:2379-94. [PMID: 18299282 PMCID: PMC2367714 DOI: 10.1093/nar/gkn082] [Citation(s) in RCA: 130] [Impact Index Per Article: 8.1] [Received: 12/03/2007] [Revised: 02/07/2008] [Accepted: 02/08/2008] [Indexed: 01/05/2023]
Abstract
We present a systematic study of B-DNA flexibility in aqueous solution using long-scale molecular dynamics simulations with the two more recent versions of nucleic acids force fields (CHARMM27 and parmbsc0) using four long duplexes designed to contain several copies of each individual base pair step. Our study highlights some differences between parmbsc0 and CHARMM27 families of simulations, but also extensive agreement in the representation of DNA flexibility. We also performed additional simulations with the older AMBER force fields parm94 and parm99, corrected for non-canonical backbone flips. Taken together, the results allow us to draw for the first time a consensus molecular dynamics picture of B-DNA flexibility.
Affiliation(s)
- Alberto Pérez, Filip Lankas, F. Javier Luque, Modesto Orozco
- Joint IRB-BSC Program on Computational Biology, Institute of Research in Biomedicine, Parc Científic de Barcelona, Josep Samitier 1-5, Barcelona 08028, Barcelona Supercomputing Centre, Jordi Girona 31, Edifici Torre Girona. Barcelona 08034, Departament de Fisicoquímica, Facultat de Farmàcia, Avgda Diagonal sn, Barcelona 08028, Spain, Laboratory for Computation and Visualization in Mathematics and Mechanics, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland, Centre for Complex Molecular Systems and Biomolecules, Institute of Organic Chemistry and Biochemistry Flemingovo nam. 2, 166 10 Praha 6, Czech Republic, National Institute of Bioinformatics, Parc Científic de Barcelona, Josep Samitier 1-5 and Departament de Bioquímica, Facultat de Biología, Avgda Diagonal 647, Barcelona 08028, Spain
62
Goñi JR, Pérez A, Torrents D, Orozco M. Determining promoter location based on DNA structure first-principles calculations. Genome Biol 2007; 8:R263. [PMID: 18072969 PMCID: PMC2246265 DOI: 10.1186/gb-2007-8-12-r263] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Received: 09/12/2007] [Revised: 11/24/2007] [Accepted: 12/11/2007] [Indexed: 11/25/2022]
Abstract
A new method is presented which predicts promoter regions based on atomistic molecular dynamics simulations of small oligonucleotides, without requiring information on sequence conservation or features. The method works independently of gene structure conservation and orthology and of the presence of detectable sequence features. Results obtained with our method confirm the existence of a hidden physical code that modulates genome expression.
Affiliation(s)
- J Ramon Goñi
- Institute for Research in Biomedicine, Parc Científic de Barcelona, Josep Samitier, Barcelona 08028, Spain
63
Korb M, Rust AG, Thorsson V, Battail C, Li B, Hwang D, Kennedy KA, Roach JC, Rosenberger CM, Gilchrist M, Zak D, Johnson C, Marzolf B, Aderem A, Shmulevich I, Bolouri H. The Innate Immune Database (IIDB). BMC Immunol 2008; 9:7. [PMID: 18321385 PMCID: PMC2268913 DOI: 10.1186/1471-2172-9-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 06/11/2007] [Accepted: 03/05/2008] [Indexed: 02/04/2023]
Abstract
BACKGROUND: As part of a National Institute of Allergy and Infectious Diseases funded collaborative project, we have performed over 150 microarray experiments measuring the response of C57/BL6 mouse bone marrow macrophages to toll-like receptor stimuli. These microarray expression profiles are available freely from our project web site . Here, we report the development of a database of computationally predicted transcription factor binding sites and related genomic features for a set of over 2000 murine immune genes of interest. Our database, which includes microarray co-expression clusters and a host of web-based query, analysis and visualization facilities, is available freely via the internet. It provides a broad resource to the research community, and a stepping stone towards the delineation of the network of transcriptional regulatory interactions underlying the integrated response of macrophages to pathogens. DESCRIPTION: We constructed a database indexed on genes and annotations of the immediate surrounding genomic regions. To facilitate both gene-specific and systems biology oriented research, our database provides the means to analyze individual genes or an entire genomic locus. Although our focus to date has been on mammalian toll-like receptor signaling pathways, our database structure is not limited to this subject, and is intended to be broadly applicable to immunology. By focusing on selected immune-active genes, we were able to perform computationally intensive expression and sequence analyses that would currently be prohibitive if applied to the entire genome. Using six complementary computational algorithms and methodologies, we identified transcription factor binding sites based on the Position Weight Matrices available in TRANSFAC. For one example transcription factor (ATF3) for which experimental data is available, over 50% of our predicted binding sites coincide with genome-wide chromatin immunoprecipitation (ChIP-chip) results. Our database can be interrogated via a web interface. Genomic annotations and binding site predictions can be automatically viewed with a customized version of the Argo genome browser. CONCLUSION: We present the Innate Immune Database (IIDB) as a community resource for immunologists interested in gene regulatory systems underlying innate responses to pathogens. The database website can be freely accessed at .
Affiliation(s)
- Martin Korb
- Institute for Systems Biology, 1441 North 34th Street, Seattle, Washington 98103-8904, USA.
64
Won HH, Kim MJ, Kim S, Kim JW. EnsemPro: An ensemble approach to predicting transcription start sites in human genomic DNA sequences. Genomics 2008; 91:259-66. [DOI: 10.1016/j.ygeno.2007.11.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 07/11/2007] [Revised: 10/31/2007] [Accepted: 11/07/2007] [Indexed: 11/17/2022]
65
Abeel T, Saeys Y, Bonnet E, Rouzé P, Van de Peer Y. Generic eukaryotic core promoter prediction using structural features of DNA. Genome Res 2008; 18:310-23. [PMID: 18096745 PMCID: PMC2203629 DOI: 10.1101/gr.6991408] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.3] [Received: 08/03/2007] [Accepted: 11/14/2007] [Indexed: 11/24/2022]
Abstract
Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, the accurate identification and delineation of promoter regions is important for several reasons, such as improving genome annotation and devising experiments to study and understand transcriptional regulation. Current methods to identify the core region of promoters require large amounts of high-quality training data and often behave like black box models that output predictions that are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs. Moreover, it is fast, simple in design, and has no size constraints, and the results are easily interpretable. We compared our approach with 14 current state-of-the-art implementations using human gene and transcription start site data and analyzed the ENCODE region in more detail. We also validated our method on 12 additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.
Affiliation(s)
- Thomas Abeel, Yvan Saeys, Eric Bonnet, Pierre Rouzé, Yves Van de Peer
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Laboratoire Associé de l’INRA (France), Ghent University, 9052 Gent, Belgium (Pierre Rouzé only)
66
Rajangam AS, Yang H, Teeri TT, Arvestad L. Evolution of a domain conserved in microtubule-associated proteins of eukaryotes. Adv Appl Bioinform Chem 2008; 1:51-69. [PMID: 21918606 PMCID: PMC3169935 DOI: 10.2147/aabc.s3211] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
The microtubule network, the major organelle of the eukaryotic cytoskeleton, is involved in cell division and differentiation but also with many other cellular functions. In plants, microtubules seem to be involved in the ordered deposition of cellulose microfibrils by a so far unknown mechanism. Microtubule-associated proteins (MAP) typically contain various domains targeting or binding proteins with different functions to microtubules. Here we have investigated a proposed microtubule-targeting domain, TPX2, first identified in the Kinesin-like protein 2 in Xenopus. A TPX2 containing microtubule binding protein, PttMAP20, has been recently identified in poplar tissues undergoing xylogenesis. Furthermore, the herbicide 2,6-dichlorobenzonitrile (DCB), which is a known inhibitor of cellulose synthesis, was shown to bind specifically to PttMAP20. It is thus possible that PttMAP20 may have a role in coupling cellulose biosynthesis and the microtubular networks in poplar secondary cell walls. In order to get more insight into the occurrence, evolution and potential functions of TPX2-containing proteins we have carried out bioinformatic analysis for all genes so far found to encode TPX2 domains with special reference to poplar PttMAP20 and its putative orthologs in other plants.
Affiliation(s)
- Alex S Rajangam
- KTH Biotechnology, Swedish Center for Biomimetic Fiber Engineering, AlbaNova, Stockholm, Sweden
67
Koenig SF, Lattanzio R, Mansperger K, Rupp RA, Wedlich D, Gradl D. Autoregulation of XTcf-4 depends on a Lef/Tcf site on the XTcf-4 promoter. Genesis 2008; 46:81-6. [DOI: 10.1002/dvg.20363] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
68
Wang J, Ungar LH, Tseng H, Hannenhalli S. MetaProm: a neural network based meta-predictor for alternative human promoter prediction. BMC Genomics 2007; 8:374. [PMID: 17941982 PMCID: PMC2194789 DOI: 10.1186/1471-2164-8-374] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 05/23/2007] [Accepted: 10/17/2007] [Indexed: 01/21/2023]
Abstract
BACKGROUND: De novo eukaryotic promoter prediction is important for discovering novel genes and understanding gene regulation. In spite of the great advances made in the past decade, recent studies revealed that the overall performances of the current promoter prediction programs (PPPs) are still poor, and predictions made by individual PPPs do not overlap each other. Furthermore, most PPPs are trained and tested on the most-upstream promoters; their performances on alternative promoters have not been assessed. RESULTS: In this paper, we evaluate the performances of current major promoter prediction programs (i.e., PSPA, FirstEF, McPromoter, DragonGSF, DragonPF, and FProm) using 42,536 distinct human gene promoters on a genome-wide scale, and with emphasis on alternative promoters. We describe an artificial neural network (ANN) based meta-predictor program that integrates predictions from the current PPPs and the predicted promoters' relation to CpG islands. Our specific analysis of recently discovered alternative promoters reveals that although only 41% of the 3' most promoters overlap a CpG island, 74% of 5' most promoters overlap a CpG island. CONCLUSION: Our assessment of six PPPs on 1.06 × 10^9 bp of human genome sequence reveals the specific strengths and weaknesses of individual PPPs. Our meta-predictor outperforms any individual PPP in sensitivity and specificity. Furthermore, we discovered that the 5' alternative promoters are more likely to be associated with a CpG island.
Affiliation(s)
- Junwen Wang
- Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
|
69
|
Vashist A, Kulikowski CA, Muchnik I. Ortholog clustering on a multipartite graph. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007; 4:17-27. [PMID: 17277410 DOI: 10.1109/tcbb.2007.1004]
Abstract
We present a method for automatically extracting groups of orthologous genes from a large set of genomes by a new clustering algorithm on a weighted multipartite graph. The method assigns a score to an arbitrary subset of genes from multiple genomes to assess the orthologous relationships between genes in the subset. This score is computed using sequence similarities between the member genes and the phylogenetic relationship between the corresponding genomes. An ortholog cluster is found as the subset with the highest score, so ortholog clustering is formulated as a combinatorial optimization problem. The algorithm for finding an ortholog cluster runs in time O(|E| + |V| log |V|), where V and E are the sets of vertices and edges, respectively, in the graph. However, if we discretize the similarity scores into a constant number of bins, the runtime improves to O(|E| + |V|). The proposed method was applied to seven complete eukaryote genomes on which the manually curated database of eukaryotic ortholog clusters, KOG, is constructed. A comparison of our results with the manually curated ortholog clusters shows that our clusters are well correlated with the existing clusters.
Affiliation(s)
- Akshay Vashist
- Department of Computer Science, Rutgers-The State University of New Jersey, Piscataway 08854, USA.
|
70
|
Abstract
Chinese hamster ovary (CHO) cells are a prevalent tool in biological research and are among the most widely used host cell lines for production of recombinant therapeutic proteins. While research in other organisms has been revolutionized through the development of DNA sequence-based tools, the lack of comparable genomic resources for the Chinese hamster has impeded similar work in CHO cell lines. A comparative genomics approach, based upon the completely sequenced mouse genome, can facilitate genomic work in this important organism. Using chromosome synteny to define regions of conserved linkage between Chinese hamster and mouse chromosomes, a working scaffold for the Chinese hamster genome has been developed. Mapping CHO and Chinese hamster sequences to the mouse genome creates direct access to relevant information in public databases. Additionally, mapping gene expression data onto a chromosome scaffold affords the ability to interpret information in a genomic context, potentially revealing important structural and regulatory features in the Chinese hamster genome. Further development of this genomic scaffold will provide opportunities to use biomolecular tools for research in CHO cell lines today and will be an asset to future efforts to sequence the Chinese hamster genome.
Affiliation(s)
- Katie F Wlaschin
- Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Avenue SE, Minneapolis, Minnesota 55455-0132, USA
|
71
|
Xie X, Wu S, Lam KM, Yan H. PromoterExplorer: an effective promoter identification method based on the AdaBoost algorithm. Bioinformatics 2006; 22:2722-8. [PMID: 17000749 DOI: 10.1093/bioinformatics/btl482]
Abstract
MOTIVATION Promoter prediction is important for the analysis of gene regulations. Although a number of promoter prediction algorithms have been reported in literature, significant improvement in prediction accuracy remains a challenge. In this paper, an effective promoter identification algorithm, which is called PromoterExplorer, is proposed. In our approach, we analyze the different roles of various features, that is, local distribution of pentamers, positional CpG island features and digitized DNA sequence, and then combine them to build a high-dimensional input vector. A cascade AdaBoost-based learning procedure is adopted to select the most 'informative' or 'discriminating' features to build a sequence of weak classifiers, which are combined to form a strong classifier so as to achieve a better performance. The cascade structure used for identification can also reduce the false positive. RESULTS PromoterExplorer is tested based on large-scale DNA sequences from different databases, including the EPD, DBTSS, GenBank and human chromosome 22. Experimental results show that consistent and promising performance can be achieved.
Affiliation(s)
- Xudong Xie
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong
|
72
|
Li QZ, Lin H. The recognition and prediction of σ70 promoters in Escherichia coli K-12. J Theor Biol 2006; 242:135-41. [PMID: 16603195 DOI: 10.1016/j.jtbi.2006.02.007] [Received: 09/09/2005] [Revised: 02/05/2006] [Accepted: 02/10/2006]
Abstract
A conservation analysis of the 683 most recent experimentally verified sigma(70) promoter sequences of Escherichia coli K-12 shows that conserved hexamer segments at different sites play a key role in promoter regions. Based on this finding, a novel position-correlation scoring matrix (PCSM) algorithm for predicting sigma(70) promoters is presented. The predictive capacity of the algorithm is tested by 10-fold cross-validation. The results show that the overall prediction accuracy (sensitivity) and specificity are 91% and 81%, respectively. Using the 683 experimentally verified sigma(70) promoters as the training set, the complete E. coli K-12 sequence (4,639,221 bp) was searched. All 683 experimentally verified sigma(70) promoters were identified, and some additional possible promoters were predicted.
Affiliation(s)
- Qian-Zhong Li
- Department of Physics, Laboratory of Theoretical Biophysics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021, China.
|
73
|
Bajic VB, Brent MR, Brown RH, Frankish A, Harrow J, Ohler U, Solovyev VV, Tan SL. Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment. Genome Biol 2006; 7 Suppl 1:S3.1-13. [PMID: 16925837 PMCID: PMC1810552 DOI: 10.1186/gb-2006-7-s1-s3]
Abstract
BACKGROUND This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. RESULTS The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. CONCLUSION The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.
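The distance-tolerant evaluation described in this abstract can be illustrated with a minimal Python sketch. It assumes one plausible matching rule: a reference TSS counts as found if any prediction lies within the maximum allowed distance, and a prediction counts as correct if it lies within that distance of any reference TSS. The function name, the input lists, and the matching rule are illustrative assumptions and are not taken from the EGASP protocol.

```python
from bisect import bisect_left


def evaluate_predictions(predicted, reference, max_dist=1000):
    """Return (sensitivity, ppv) for predicted vs. reference TSS positions.

    Assumed rule (hypothetical, for illustration only): a position is
    'matched' if some position in the other list lies within max_dist nt.
    """
    def nearest_distance(pos, sorted_positions):
        # Distance from pos to the closest element of a non-empty sorted list.
        i = bisect_left(sorted_positions, pos)
        candidates = sorted_positions[max(i - 1, 0):i + 1]
        return min(abs(pos - c) for c in candidates)

    pred_sorted = sorted(predicted)
    ref_sorted = sorted(reference)
    if not pred_sorted or not ref_sorted:
        return 0.0, 0.0

    # Reference TSSs recovered by at least one prediction.
    found_refs = sum(nearest_distance(r, pred_sorted) <= max_dist
                     for r in ref_sorted)
    # Predictions that fall close to at least one reference TSS.
    true_preds = sum(nearest_distance(p, ref_sorted) <= max_dist
                     for p in pred_sorted)

    sensitivity = found_refs / len(ref_sorted)
    ppv = true_preds / len(pred_sorted)
    return sensitivity, ppv
```

Counting matches separately from the reference side and from the prediction side keeps the two measures independent, so several predictions near one TSS do not inflate sensitivity.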
Affiliation(s)
- Vladimir B Bajic
- South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa.
|
74
|
Blanco E, Messeguer X, Smith TF, Guigó R. Transcription factor map alignment of promoter regions. PLoS Comput Biol 2006; 2:e49. [PMID: 16733547 PMCID: PMC1464811 DOI: 10.1371/journal.pcbi.0020049] [Received: 10/31/2005] [Accepted: 03/31/2006]
Abstract
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels--to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human-mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
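The idea of aligning sequences of binding-factor labels rather than nucleotides can be sketched in Python as follows. This is a plain Needleman-Wunsch global alignment over label lists, swapped in for illustration; it is not the restriction-map-derived algorithm the authors adapted, and it ignores site positions and site scores. The function name and the scoring parameters are hypothetical.

```python
def align_tf_labels(map_a, map_b, match=2, mismatch=-1, gap=-1):
    """Globally align two lists of TF labels.

    Returns (score, pairs), where pairs holds (label_a, label_b) tuples
    and None marks a gap. Simplified stand-in, not the published method.
    """
    n, m = len(map_a), len(map_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning map_a[i-1] with map_b[j-1].
        return match if map_a[i - 1] == map_b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((map_a[i - 1], map_b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((map_a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, map_b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

For example, aligning `["SP1", "TBP", "NFKB"]` with `["SP1", "NFKB"]` pairs the two shared labels and places a gap opposite `TBP`, even though the underlying nucleotide sequences need not resemble each other.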
Affiliation(s)
- Enrique Blanco
- Research Group in Biomedical Informatics, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
|
75
|
Xuan Z, Zhao F, Wang J, Chen G, Zhang MQ. Genome-wide promoter extraction and analysis in human, mouse, and rat. Genome Biol 2005; 6:R72. [PMID: 16086854 PMCID: PMC1273639 DOI: 10.1186/gb-2005-6-8-r72] [Received: 03/29/2005] [Revised: 05/23/2005] [Accepted: 07/11/2005]
Abstract
Large-scale and high-throughput genomics research needs reliable and comprehensive genome-wide promoter annotation resources. We have conducted a systematic investigation on how to improve mammalian promoter prediction by incorporating both transcript and conservation information. This enabled us to build a better multispecies promoter annotation pipeline and hence to create CSHLmpd (Cold Spring Harbor Laboratory Mammalian Promoter Database) for the biomedical research community, which can act as a starting reference system for more refined functional annotations.
Affiliation(s)
- Zhenyu Xuan
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Fang Zhao
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Jinhua Wang
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Gengxin Chen
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Michael Q Zhang
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
|
76
|
Vishnevsky OV, Kolchanov NA. ARGO: a web system for the detection of degenerate motifs and large-scale recognition of eukaryotic promoters. Nucleic Acids Res 2005; 33:W417-22. [PMID: 15980502 PMCID: PMC1160220 DOI: 10.1093/nar/gki459] [Received: 02/14/2005] [Revised: 04/13/2005] [Accepted: 04/13/2005]
Abstract
Reliable recognition of the promoters in eukaryotic genomes remains an open issue. This is largely owing to the poor understanding of the features of the structural-functional organization of the eukaryotic promoters essential for their function and recognition. However, it was demonstrated that detection of ensembles of regulatory signals characteristic of specific promoter groups increases the accuracy of promoter recognition and prediction of specific expression features of the queried genes. The ARGO_Motifs package was developed for the detection of sets of region-specific degenerate oligonucleotide motifs in the regulatory regions of the eukaryotic genes. The ARGO_Viewer package was developed for the recognition of tissue-specific gene promoters based on the presence and distribution of oligonucleotide motifs obtained by the ARGO_Motifs program. Analysis and recognition of tissue-specific promoters in five gene samples demonstrated high quality of promoter recognition. The public version of the ARGO system is available at http://wwwmgs2.bionet.nsc.ru/argo/ and http://emj-pc.ics.uci.edu/argo/.
Affiliation(s)
- Oleg V Vishnevsky
- Institute of Cytology and Genetics, SB RAS Lavrentyev Avenue, 10, Novosibirsk, 630090, Russia.
|
77
|
Barta E, Sebestyén E, Pálfy TB, Tóth G, Ortutay CP, Patthy L. DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants. Nucleic Acids Res 2005; 33:D86-90. [PMID: 15608291 PMCID: PMC540051 DOI: 10.1093/nar/gki097]
Abstract
DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21 061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically.
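Collecting "up to 3000 bp" upstream of a first exon can be sketched in Python as below. The coordinate convention (0-based, half-open, forward-strand coordinates), the function names, and the handling of the minus strand are assumptions made for this illustration; the abstract does not describe how DoOP implements the extraction.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string containing A, C, G, T, or N."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, exon_start, exon_end, strand, max_len=3000):
    """Return up to max_len bp immediately upstream of a first exon.

    Assumed convention (hypothetical): the exon occupies
    chrom_seq[exon_start:exon_end] in forward-strand coordinates.
    The result is always reported 5'->3' on the gene's own strand.
    """
    if strand == "+":
        # Upstream lies to the left; clip at the start of the sequence.
        begin = max(exon_start - max_len, 0)
        return chrom_seq[begin:exon_start]
    if strand == "-":
        # Upstream lies to the right on the forward strand; clip at the end.
        end = min(exon_end + max_len, len(chrom_seq))
        return reverse_complement(chrom_seq[exon_end:end])
    raise ValueError("strand must be '+' or '-'")
```

Clipping at the sequence boundaries is what makes the segment "up to" the requested length: genes near a contig end simply yield a shorter upstream region.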
Affiliation(s)
- Endre Barta
- Agricultural Biotechnology Center, Gödöllo, Szent-Györgyi Albert u. 4, H-2100, Hungary.
|
78
|
Shahmuradov IA, Solovyev VV, Gammerman AJ. Plant promoter prediction with confidence estimation. Nucleic Acids Res 2005; 33:1069-76. [PMID: 15722481 PMCID: PMC549412 DOI: 10.1093/nar/gki247] [Received: 09/28/2004] [Revised: 12/15/2004] [Accepted: 01/24/2005]
Abstract
Accurate prediction of promoters is fundamental to understanding gene expression patterns, where confidence estimation is one of the main requirements. Using recently developed transductive confidence machine (TCM) techniques, we developed a new program, TSSP-TCM, for the prediction of plant promoters that also provides the confidence of each prediction. The program was trained on 132 and 104 sequences and tested on 40 and 25 sequences (containing TATA and TATA-less promoters, respectively) with known transcription start sites (TSSs). As negative training samples for TCM learning we used coding and intron sequences of plant genes annotated in GenBank. In the test set of TATA promoters, the program correctly predicted the TSS for 35 out of 40 (87.5%) genes with a median deviation of several base pairs from the true site location. For 25 TATA-less promoters, TSSs were predicted for 21 out of 25 (84%) genes, including 14 cases of 5 bp distance between annotated and predicted TSSs. Using the TSSP-TCM program we annotated promoters in the whole Arabidopsis genome. The predicted promoters were in good agreement with the start position of known Arabidopsis mRNAs. Thus, the TCM technique has produced a plant-oriented promoter prediction tool of high accuracy. The TSSP-TCM program and annotated promoters are available at http://mendel.cs.rhul.ac.uk/mendel.php?topic=fgen.
Affiliation(s)
- V. V. Solovyev
- Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK
- Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY 10549, USA
- A. J. Gammerman
- Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK
|
79
|
Bajic VB, Tan SL, Suzuki Y, Sugano S. Promoter prediction analysis on the whole human genome. Nat Biotechnol 2004; 22:1467-73. [PMID: 15529174 DOI: 10.1038/nbt1032]
Abstract
Promoter prediction programs (PPPs) are important for in silico gene discovery without support from expressed sequence tag (EST)/cDNA/mRNA sequences, in the analysis of gene regulation and in genome annotation. Contrary to previous expectations, a comprehensive analysis of PPPs reveals that no program simultaneously achieves sensitivity and a positive predictive value >65%. PPP performances deduced from a limited number of chromosomes or smaller data sets do not hold when evaluated at the level of the whole genome, with serious inaccuracy of predictions for non-CpG-island-related promoters. Some PPPs even perform worse than, or close to, pure random guessing.
Affiliation(s)
- Vladimir B Bajic
- Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613 Singapore.
|
80
|
Burden S, Lin YX, Zhang R. Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics 2004; 21:601-7. [PMID: 15454410 DOI: 10.1093/bioinformatics/bti047]
Abstract
MOTIVATION Although a great deal of research has been undertaken in the area of promoter prediction, prediction techniques are still not fully developed. Many algorithms tend to exhibit poor specificity, generating many false positives, or poor sensitivity. The neural network prediction program NNPP2.2 is one such example. RESULTS To improve the NNPP2.2 prediction technique, the distance between the transcription start site (TSS) associated with the promoter and the translation start site (TLS) of the subsequent gene coding region has been studied for Escherichia coli K12 bacteria. An empirical probability distribution that is consistent for all E. coli promoters has been established. This information is combined with the results from NNPP2.2 to create a new technique called TLS-NNPP, which improves the specificity of promoter prediction. The technique is shown to be effective using E. coli DNA sequences; however, it is applicable to any organism for which a set of promoters has been experimentally defined. AVAILABILITY The data used in this project and the prediction results for the tested sequences can be obtained from http://www.uow.edu.au/~yanxia/E_Coli_paper/SBurden_Results.xls CONTACT alh98@uow.edu.au.
Affiliation(s)
- S Burden
- Department of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW 2522, Australia.
|
81
|
Abstract
Transfer of SXT, a Vibrio cholerae-derived integrating conjugative element that encodes multiple antibiotic resistance genes, is repressed by SetR, a lambda434 cI-related repressor. Here we identify divergent promoters between s086 and setR that drive expression of the regulators of SXT transfer. One transcript encodes the activators of transfer, setC and setD. The second transcript codes for SetR and, like the cI transcript of lambda, is leaderless. SetR binds to four operators located between setR and s086; the locations and relative affinities of these sites suggest a model for regulation of SXT transfer.
Affiliation(s)
- John W Beaber
- Department of Microbiology, Tufts University School of Medicine, 136 Harrison Ave., Jaharis 425, Boston, MA 02111, USA
|
82
|
Schwartz YB, Boykova T, Belyaeva ES, Ashburner M, Zhimulev IF. Molecular characterization of the singed wings locus of Drosophila melanogaster. BMC Genet 2004; 5:15. [PMID: 15189568 PMCID: PMC446189 DOI: 10.1186/1471-2156-5-15] [Received: 03/03/2004] [Accepted: 06/09/2004]
Abstract
Background Hormones frequently guide animal development via the induction of cascades of gene activities, whose products further amplify an initial hormonal stimulus. In Drosophila the transformation of the larva into the pupa and the subsequent metamorphosis to the adult stage is triggered by changes in the titer of the steroid hormone 20-hydroxyecdysone. singed wings (swi) is the only gene known in Drosophila melanogaster for which mutations specifically interrupt the transmission of the regulatory signal from early to late ecdysone inducible genes. Results We have characterized the singed wings locus, showing it to correspond to EG:171E4.2 (CG3095). swi encodes a predicted 68.5-kDa protein that contains N-terminal histidine-rich and threonine-rich domains, a cysteine-rich C-terminal region and two leucine-rich repeats. The SWI protein has a close homolog in D. melanogaster, defining a new family of SWI-like proteins, and is conserved in D. pseudoobscura. A lethal mutation, swi^t476, shows a severe disruption of the ecdysone pathway and is a C>Y substitution in one of the two conserved CysXCys motifs that are common to SWI and the Drosophila Toll-4 protein. Conclusions It is not entirely clear from the present molecular analysis how the SWI protein may function in the ecdysone induced cascade. Currently all predictions agree that SWI is very unlikely to be a nuclear protein. Thus it probably exercises its control of "late" ecdysone genes indirectly. Apparently the genetic regulation of ecdysone signaling is much more complex than was previously anticipated.
Affiliation(s)
- Yuri B Schwartz
- Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, 630090, Russia
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Department of Zoology, University of Geneva, Geneva, 1205, Switzerland
- Tatiana Boykova
- Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, 630090, Russia
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Elena S Belyaeva
- Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, 630090, Russia
- Michael Ashburner
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Igor F Zhimulev
- Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk, 630090, Russia
|