1
Abstract
The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massively parallel mapping of promoter elements, we still mainly rely on bioinformatics tools to predict such elements in bacterial genomes. Additionally, despite many different prediction tools having become popular to identify bacterial promoters, no systematic comparison of such tools has been performed. Here, we performed a systematic comparison between several widely used promoter prediction tools (BPROM, bTSSfinder, BacPP, CNNProm, IBBP, Virtual Footprint, iPro70-FMWin, 70ProPred, iPromoter-2L, and MULTiPly) using well-defined sequence data sets and standardized metrics to determine how well those tools performed relative to each other. For this, we used data sets of experimentally validated promoters from Escherichia coli and a control data set composed of randomly generated sequences with similar nucleotide distributions. We compared the performance of the tools using metrics such as specificity, sensitivity, accuracy, and Matthews correlation coefficient (MCC). We show that the widely used BPROM presented the worst performance among the compared tools, while four tools (CNNProm, iPro70-FMWin, 70ProPred, and iPromoter-2L) offered high predictive power. Of these tools, iPro70-FMWin exhibited the best results for most of the metrics used. We present here some potentials and limitations of available tools, and we hope that future work can build upon our effort to systematically characterize this useful class of bioinformatics tools.
IMPORTANCE The correct mapping of promoter elements is a crucial step in microbial genomics. Also, when combining new DNA elements into synthetic sequences, predicting the potential generation of new promoter sequences is critical. Over the last years, many bioinformatics tools have been created to allow users to predict promoter elements in a sequence or genome of interest. Here, we assess the predictive power of some of the main prediction tools available using well-defined promoter data sets. Using Escherichia coli as a model organism, we demonstrated that while some tools are biased toward AT-rich sequences, others are very efficient in identifying real promoters with low false-negative rates. We hope the potentials and limitations presented here will help the microbiology community to choose promoter prediction tools among many available alternatives.
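The four metrics named in this abstract (sensitivity, specificity, accuracy, and MCC) all follow from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts."""
    # Sensitivity: fraction of real promoters that were predicted as promoters.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of control sequences correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = classification_metrics(tp=90, fp=10, tn=90, fn=10)
```

Unlike accuracy, MCC stays informative when the promoter and control sets differ greatly in size, which is one common reason to report it alongside the other three.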
2
Liu X, Guo Z, He T, Ren M. Prediction and analysis of prokaryotic promoters based on sequence features. Biosystems 2020; 197:104218. [PMID: 32755610 DOI: 10.1016/j.biosystems.2020.104218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/25/2020] [Revised: 07/03/2020] [Accepted: 07/21/2020] [Indexed: 10/23/2022]
Abstract
Promoter recognition is an important but difficult part of functional genomic annotation. Many studies have been carried out to address this issue. However, they still cannot meet application needs. Most of the methods exhibit specificity, and the objects analyzed are relatively simple, especially for prokaryotes. Hence, more research on prokaryotic promoters is needed. In this study, the similarity between gene expression and the transmission of information inspired us to analyze promoter sequences by calculating the information content of the sequences and the correlation between sequences in the subregion. We also calculated other sequence features as supplements, such as the Hurst exponent, GC content, and sequence bending property. Then, we employed an artificial neural network to build a classifier and applied it to identify promoters in three organisms, Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa. The experiments on the benchmark test set indicate that our method has good capability to distinguish promoters from randomly selected nonpromoters. The maximal AUC for the classifier is 0.90, and the minimal AUC score is 0.80. Additionally, cross-species experiments were conducted. The cross-species experiments on the three organisms yielded an AUC of 0.8, suggesting that our approach has better generalization ability, which is conducive to revealing the more common characteristics of prokaryotic promoters.
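Two of the sequence features mentioned in this abstract, GC content and information content, can be illustrated with a few lines of code. The sketch below uses plain Shannon entropy over k-mers as a stand-in for "information content"; this is an assumption for illustration, since the abstract does not specify the authors' exact formula, and both function names are hypothetical.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def shannon_entropy(seq, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer distribution."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    # Sum p * log2(1/p) over the observed k-mers.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical example sequence, for illustration only.
example = "TTGACAATTAATCATCGGCTCGTATAATGTGTGGA"
features = [gc_content(example), shannon_entropy(example, k=1)]
```

Features of this kind would then be concatenated into a vector and passed to a classifier such as the artificial neural network described above.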
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.
- Zhirui Guo
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Ting He
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Meixiang Ren
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
3
Iwaki H, Yamamoto T, Hasegawa Y. Isolation of marine xylene-utilizing bacteria and characterization of Halioxenophilus aromaticivorans gen. nov., sp. nov. and its xylene degradation gene cluster. FEMS Microbiol Lett 2019; 365:4867970. [PMID: 29462302 DOI: 10.1093/femsle/fny042] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/06/2018] [Accepted: 02/15/2018] [Indexed: 11/13/2022]
Abstract
Seven xylene-utilizing bacterial strains were isolated from seawater collected off the coast of Japan. Analysis of 16S rRNA gene sequences indicated that six isolates were most closely related to the marine bacterial genera Alteromonas, Marinobacter or Aestuariibacter. The sequence of the remaining strain, KU68FT, showed low similarity to the 16S rRNA gene sequences of known bacteria with validly published names, the most similar species being Maricurvus nonylphenolicus strain KU41ET (92.6% identity). On the basis of physiological, chemotaxonomic and phylogenetic data, strain KU68FT is suggested to represent a novel species of a new genus in the family Cellvibrionaceae of the order Cellvibrionales within the Gammaproteobacteria, for which the name Halioxenophilus aromaticivorans gen. nov., sp. nov. is proposed. The type strain of Halioxenophilus aromaticivorans is KU68FT (=JCM 19134T = KCTC 32387T). PCR and sequence analysis revealed that strain KU68FT possesses an entire set of genes encoding the enzymes for the upper xylene methyl-monooxygenase pathway, xylCMABN, resembling the gene set of the terrestrial Pseudomonas putida strain mt-2.
Affiliation(s)
- Taisei Yamamoto
- Department of Life Science & Biotechnology, Kansai University, 3-3-35 Yamate-cho, Suita, Osaka 564-8680, Japan
- Yoshie Hasegawa
- Department of Life Science & Biotechnology, Kansai University, 3-3-35 Yamate-cho, Suita, Osaka 564-8680, Japan
4
Zhang M, Li F, Marquez-Lago TT, Leier A, Fan C, Kwoh CK, Chou KC, Song J, Jia C. MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics 2019; 35:2957-2965. [PMID: 30649179 PMCID: PMC6736106 DOI: 10.1093/bioinformatics/btz016] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 10/30/2018] [Revised: 12/09/2018] [Accepted: 01/05/2019] [Indexed: 12/22/2022]
Abstract
MOTIVATION Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, the precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly took into account the sequences themselves, including both local information such as k-tuple nucleotide composition, dinucleotide-based auto covariance and global information of the entire samples based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best unique type of feature prediction results, in combination with other types of features that were subsequently added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly can achieve a better prediction performance on 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be used as a useful tool that will facilitate the discovery of both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
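Of the encodings listed in this abstract, k-tuple nucleotide composition is the simplest to show concretely: each sequence becomes a fixed-length vector of k-mer frequencies. The sketch below is a generic version of that idea, not the MULTiPly implementation; the function name and the choice to skip k-mers containing non-ACGT characters are assumptions.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Windows with ambiguous bases (e.g. N) are skipped: an assumption.
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical example: a 16-dimensional dinucleotide feature vector.
vector = ktuple_composition("TTGACATATAAT", k=2)
```

Because the vector length depends only on k, sequences of different lengths map to comparable feature vectors, which is what makes this encoding convenient as classifier input.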
Affiliation(s)
- Meng Zhang
- School of Science, Dalian Maritime University, Dalian, China
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Tatiana T Marquez-Lago
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- André Leier
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Cunshuo Fan
- College of Information Engineering, Northwest A&F University, Yangling, China
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian, China
- College of Information Engineering, Northwest A&F University, Yangling, China
5
Abstract
In this issue of the Journal of Bacteriology, Hustmyer and colleagues describe a new method for rapidly generating reporter libraries (Hustmyer citation). This RAIL technique (Rapid Arbitrary PCR Insertion Libraries) uses arbitrary PCR and isothermal DNA assembly to insert random fragments of promoter regions into reporter plasmids, resulting in libraries that can be screened to identify regions required for gene expression. This technique will likely be useful for a number of different genetic applications.
Affiliation(s)
- Jyl S Matson
- Department of Medical Microbiology and Immunology, University of Toledo College of Medicine and Life Sciences, Toledo, OH
6
Chen Y, Ho JML, Shis DL, Gupta C, Long J, Wagner DS, Ott W, Josić K, Bennett MR. Tuning the dynamic range of bacterial promoters regulated by ligand-inducible transcription factors. Nat Commun 2018; 9:64. [PMID: 29302024 PMCID: PMC5754348 DOI: 10.1038/s41467-017-02473-5] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 06/15/2017] [Accepted: 12/01/2017] [Indexed: 11/09/2022]
Abstract
One challenge for synthetic biologists is the predictable tuning of genetic circuit regulatory components to elicit desired outputs. Gene expression driven by ligand-inducible transcription factor systems must exhibit the correct ON and OFF characteristics: appropriate activation and leakiness in the presence and absence of inducer, respectively. However, the dynamic range of a promoter (i.e., absolute difference between ON and OFF states) is difficult to control. We report a method that tunes the dynamic range of ligand-inducible promoters to achieve desired ON and OFF characteristics. We build combinatorial sets of AraC- and LasR-regulated promoters containing -10 and -35 sites from synthetic and Escherichia coli promoters. Four sequence combinations with diverse dynamic ranges were chosen to build multi-input transcriptional logic gates regulated by two and three ligand-inducible transcription factors (LacI, TetR, AraC, XylS, RhlR, LasR, and LuxR). This work enables predictable control over the dynamic range of regulatory components.
Affiliation(s)
- Ye Chen
- Department of Biosciences, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Joanne M L Ho
- Department of Biosciences, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- David L Shis
- Department of Biosciences, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Chinmaya Gupta
- Department of Mathematics, University of Houston, 4800 Calhoun Road, Houston, TX, 77204, USA
- James Long
- Department of Biosciences, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Daniel S Wagner
- Department of Biosciences, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- William Ott
- Department of Mathematics, University of Houston, 4800 Calhoun Road, Houston, TX, 77204, USA
- Krešimir Josić
- Department of Biosciences, Rice University, 6100 Main Street, Houston, TX, 77005, USA; Department of Mathematics, University of Houston, 4800 Calhoun Road, Houston, TX, 77204, USA; Department of Biology and Biochemistry, University of Houston, 4800 Calhoun Road, Houston, TX, 77204, USA
- Matthew R Bennett
- Department of Biosciences, Rice University, 6100 Main Street, Houston, TX, 77005, USA; Department of Bioengineering, Rice University, 6100 Main Street, Houston, TX, 77005, USA
7
Yus E, Yang JS, Sogues A, Serrano L. A reporter system coupled with high-throughput sequencing unveils key bacterial transcription and translation determinants. Nat Commun 2017; 8:368. [PMID: 28848232 PMCID: PMC5573727 DOI: 10.1038/s41467-017-00239-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/18/2017] [Accepted: 06/09/2017] [Indexed: 12/24/2022]
Abstract
Quantitative analysis of the sequence determinants of transcription and translation regulation is relevant for systems and synthetic biology. To identify these determinants, researchers have developed different methods of screening random libraries using fluorescent reporters or antibiotic resistance genes. Here, we have implemented a generic approach called ELM-seq (expression level monitoring by DNA methylation) that overcomes the technical limitations of such classic reporters. ELM-seq uses DamID (Escherichia coli DNA adenine methylase as a reporter coupled with methylation-sensitive restriction enzyme digestion and high-throughput sequencing) to enable in vivo quantitative analyses of upstream regulatory sequences. Using the genome-reduced bacterium Mycoplasma pneumoniae, we show that ELM-seq has a large dynamic range and causes minimal toxicity. We use ELM-seq to determine key sequences (known and putatively novel) of promoter and untranslated regions that influence transcription and translation efficiency. Applying ELM-seq to other organisms will help us to further understand gene expression and guide synthetic biology. Quantitative analysis of how DNA sequence determines transcription and translation regulation is of interest to systems and synthetic biologists. Here the authors present ELM-seq, which uses Dam activity as reporter for high-throughput analysis of promoter and 5’-UTR regions.
Affiliation(s)
- Eva Yus
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Doctor Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jae-Seong Yang
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Doctor Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Adrià Sogues
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Doctor Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institut Pasteur, Unité de Microbiologie Structurale (CNRS) UMR 3528, Université Paris Diderot, 25 rue du Docteur Roux, Paris, 75724, France
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Doctor Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, Barcelona, 08010, Spain
8
Shahmuradov IA, Mohamad Razali R, Bougouffa S, Radovanovic A, Bajic VB. bTSSfinder: a novel tool for the prediction of promoters in cyanobacteria and Escherichia coli. Bioinformatics 2017; 33:334-340. [PMID: 27694198 PMCID: PMC5408793 DOI: 10.1093/bioinformatics/btw629] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 07/21/2016] [Accepted: 09/27/2016] [Indexed: 12/01/2022]
Abstract
MOTIVATION The computational search for promoters in prokaryotes remains an attractive problem in bioinformatics. Despite the attention it has received for many years, the problem has not been addressed satisfactorily. In any bacterial genome, the transcription start site is chosen mostly by the sigma (σ) factor proteins, which control the gene activation. The majority of published bacterial promoter prediction tools target σ70 promoters in Escherichia coli. Moreover, no σ-specific classification of promoters is available for prokaryotes other than for E. coli. RESULTS Here, we introduce bTSSfinder, a novel tool that predicts putative promoters for five classes of σ factors in Cyanobacteria (σA, σC, σH, σG and σF) and for five classes of σ factors in E. coli (σ70, σ38, σ32, σ28 and σ24). Compared with currently available tools, bTSSfinder achieves higher accuracy (MCC = 0.86, F1-score = 0.93, versus MCC = 0.59, F1-score = 0.79 for the next best tool) and covers multiple classes of promoters. AVAILABILITY AND IMPLEMENTATION bTSSfinder is available standalone and online at http://www.cbrc.kaust.edu.sa/btssfinder. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ilham Ayub Shahmuradov
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Rozaimi Mohamad Razali
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Salim Bougouffa
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Aleksandar Radovanovic
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Vladimir B Bajic
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
9
Shahmuradov IA, Umarov RK, Solovyev VV. TSSPlant: a new tool for prediction of plant Pol II promoters. Nucleic Acids Res 2017; 45:e65. [PMID: 28082394 PMCID: PMC5416875 DOI: 10.1093/nar/gkw1353] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 09/18/2016] [Revised: 12/16/2016] [Accepted: 12/27/2016] [Indexed: 11/22/2022]
Abstract
Our current knowledge of eukaryotic promoters indicates a complex architecture that is often composed of numerous functional motifs. Most known promoters include multiple and in some cases mutually exclusive transcription start sites (TSSs). Moreover, TSS selection depends on cell/tissue, development stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences of a wide spectrum of plant genomes. The tool was developed by using large promoter collections from ppdb and PlantProm DB. It utilizes eighteen significant compositional and signal features of plant promoter sequences selected in this study, which feed an artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy compared to the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80 and F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.
Affiliation(s)
- Ilham A. Shahmuradov
- King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Institute of Molecular Biology and Biotechnologies, ANAS, 2 Matbuat strasse, Baku AZ1073, Azerbaijan
- Ramzan Kh. Umarov
- King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
10
Nikolic M, Stankovic T, Djordjevic M. Contribution of bacterial promoter elements to transcription start site detection accuracy. J Bioinform Comput Biol 2016; 15:1650038. [PMID: 27908222 DOI: 10.1142/s0219720016500384] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Accurately detecting transcription start sites (TSS) is a starting point for understanding gene transcription, and an important ingredient in a number of applications necessary for functional gene annotation, such as gene and operon predictions. Available methods for TSS detection in bacteria use very different descriptions of the bacterial promoter structure, and all of them show low accuracy. It is therefore unclear which promoter features should be included in TSS recognition, and how their accuracy impacts the search detection. We here address this question for [Formula: see text] and [Formula: see text] (an alternative σ factor) promoters in E. coli. We find that the −35 element, which is considered exchangeable and is often not included in TSS search, contributes to the search accuracy equally (for [Formula: see text]) or more (for [Formula: see text]) than the ubiquitous −10 element. Surprisingly, the sequence of the spacer between the −35 and −10 promoter elements, which is commonly included in TSS detection, significantly decreases the search accuracy for [Formula: see text] promoters. However, the spacer sequence improves the search accuracy for [Formula: see text] promoters, which we attribute to the presence of sequence conservation. Overall, there is as much as [Formula: see text]50% false positive reduction for optimally implemented promoter features in [Formula: see text], underlining the necessity for accurate promoter element alignments.
Affiliation(s)
- Milos Nikolic
- Faculty of Biology, University of Belgrade, Studentski trg 16 Belgrade, 11000, Serbia
- Tamara Stankovic
- Faculty of Biology, University of Belgrade, Studentski trg 16 Belgrade, 11000, Serbia; Interdisciplinary PhD program in Biophysics, University of Belgrade, Studentski trg 1, 11000, Serbia
- Marko Djordjevic
- Faculty of Biology, University of Belgrade, Studentski trg 16 Belgrade, 11000, Serbia
11
Prediction and identification of an acid-inducible promoter from Lactococcus lactis ssp. cremoris MG1363. Food Sci Biotechnol 2015. [DOI: 10.1007/s10068-015-0227-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
12
Lloréns-Rico V, Lluch-Senar M, Serrano L. Distinguishing between productive and abortive promoters using a random forest classifier in Mycoplasma pneumoniae. Nucleic Acids Res 2015; 43:3442-53. [PMID: 25779052 PMCID: PMC4402517 DOI: 10.1093/nar/gkv170] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 11/26/2014] [Accepted: 02/22/2015] [Indexed: 12/01/2022]
Abstract
Distinguishing between promoter-like sequences in bacteria that belong to true or abortive promoters, or to those that do not initiate transcription at all, is one of the important challenges in transcriptomics. To address this problem, we have studied the genome-reduced bacterium Mycoplasma pneumoniae, for which the RNAs associated with transcriptional start sites have been recently experimentally identified. We determined the contribution to transcription events of different genomic features: the –10, extended –10 and –35 boxes, the UP element, the bases surrounding the –10 box and the nearest-neighbor free energy of the promoter region. Using a random forest classifier and the aforementioned features transformed into scores, we could distinguish between true, abortive promoters and non-promoters with good –10 box sequences. The methods used in this characterization of promoters can be extended to other bacteria and have important applications for promoter design in bacterial genome engineering.
Affiliation(s)
- Verónica Lloréns-Rico
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Maria Lluch-Senar
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Luis Serrano
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain
13
Panyukov VV, Ozoline ON. Promoters of Escherichia coli versus promoter islands: function and structure comparison. PLoS One 2013; 8:e62601. [PMID: 23717391 PMCID: PMC3661553 DOI: 10.1371/journal.pone.0062601] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 02/21/2013] [Accepted: 03/23/2013] [Indexed: 12/21/2022]
Abstract
Expression of bacterial genes takes place under the control of RNA polymerase with exchangeable σ-subunits and multiple transcription factors. A typical promoter region contains one or several overlapping promoters. In the latter case promoters have the same or different σ-specificity and are often subjected to different regulatory stimuli. Genes transcribed from multiple promoters have, on average, higher expression levels. However, recently in the genome of Escherichia coli we found 78 regions with an extremely large number of potential transcription start points (promoter islands, PIs). It was shown that all PIs interact with RNA polymerase in vivo and are able to form transcriptionally competent open complexes both in vitro and in vivo, but their transcriptional activity measured by oligonucleotide microarrays was very low, if any. Here we confirmed transcriptional defectiveness of PIs by analyzing the 5'-end specific RNA-seq data, but showed their ability to produce short oligos (9-14 bases). This combination of functional properties indicated a deliberate suppression of transcriptional activity within PIs. According to our data this suppression may be due to a specific conformation of the DNA double helix, which provides an ideal platform for interaction with both RNA polymerase and the histone-like nucleoid protein H-NS. The genomic DNA of E. coli therefore contains several dozen sites optimized by evolution for staying in a heterochromatin-like state. Since almost all promoter islands are associated with horizontally acquired genes, we offer them as specific components of bacterial evolution involved in acquisition of foreign genetic material by turning off the expression of toxic or useless aliens or by providing an optimal promoter for beneficial genes. The putative molecular mechanism underlying the appearance of promoter islands within recipient genomes is discussed.
Affiliation(s)
- Valeriy V. Panyukov
- Department of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Olga N. Ozoline
- Department of Functional Genomics and Cellular Stress, Institute of Cell Biophysics, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Department of Cell Biology, Pushchino State Institute of Natural Sciences, Pushchino, Moscow Region, Russian Federation
14
Todt TJ, Wels M, Bongers RS, Siezen RS, van Hijum SAFT, Kleerebezem M. Genome-wide prediction and validation of sigma70 promoters in Lactobacillus plantarum WCFS1. PLoS One 2012; 7:e45097. [PMID: 23028780 PMCID: PMC3447810 DOI: 10.1371/journal.pone.0045097] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/26/2012] [Accepted: 08/14/2012] [Indexed: 11/18/2022]
Abstract
BACKGROUND In prokaryotes, sigma factors are essential for directing the transcription machinery towards promoters. Various sigma factors have been described that recognize and bind to specific DNA sequence motifs in promoter sequences. The canonical sigma factor σ⁷⁰ is commonly involved in transcription of the cell's housekeeping genes, which is mediated by the conserved σ⁷⁰ promoter sequence motifs. In this study the σ⁷⁰-promoter sequences in Lactobacillus plantarum WCFS1 were predicted using a genome-wide analysis. The accuracy of the transcriptionally-active part of this promoter prediction was subsequently evaluated by correlating locations of predicted promoters with transcription start sites inferred from the 5'-ends of transcripts detected by high-resolution tiling array transcriptome datasets.
RESULTS To identify σ⁷⁰-related promoter sequences, we performed a genome-wide sequence motif scan of the L. plantarum WCFS1 genome focussing on the regions upstream of protein-encoding genes. We obtained several highly conserved motifs including those resembling the conserved σ⁷⁰-promoter consensus. Position weight matrix-based models of the recovered σ⁷⁰-promoter sequence motif were employed to identify 3874 motifs with significant similarity (p-value < 10⁻⁴) to the model motif in the L. plantarum genome. Genome-wide transcript information deduced from whole-genome tiling-array transcriptome datasets was used to infer transcription start sites (TSSs) from the 5'-end of transcripts. By this procedure, 1167 putative TSSs were identified that were used to corroborate the transcriptionally active fraction of these predicted promoters. In total, 568 predicted promoters were found in proximity (≤ 40 nucleotides) of the putative TSSs, showing a highly significant co-occurrence of predicted promoter and TSS (p-value < 10⁻²⁶³).
CONCLUSIONS High-resolution tiling arrays provide a suitable source to infer TSSs at a genome-wide level, and allow experimental verification of in silico predicted promoter sequence motifs.
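The co-occurrence step described in this abstract (counting predicted promoters that lie within 40 nucleotides of a putative TSS) can be illustrated with a minimal Python sketch. The function name `promoters_near_tss`, the toy coordinates, and the decision to ignore strand are assumptions made for illustration; they are not taken from the study.

```python
from bisect import bisect_left


def promoters_near_tss(promoter_positions, tss_positions, max_distance=40):
    """Return predicted promoter positions within max_distance nt of any TSS.

    Positions are plain integer genome coordinates; strand is ignored
    (a simplifying assumption of this sketch).
    """
    tss_sorted = sorted(tss_positions)
    supported = []
    for pos in promoter_positions:
        i = bisect_left(tss_sorted, pos)
        # Only the nearest TSS on each side of pos can be the closest one.
        neighbours = tss_sorted[max(i - 1, 0):i + 1]
        if any(abs(pos - tss) <= max_distance for tss in neighbours):
            supported.append(pos)
    return supported


# Toy coordinates (hypothetical, not data from the study).
predicted = [100, 500, 1200, 3000]
tss = [130, 1250, 2900]
print(promoters_near_tss(predicted, tss))
```

Sorting the TSS list once and using binary search keeps the lookup cheap when thousands of predicted motifs are compared against about a thousand TSSs, as in the numbers quoted above.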
Affiliation(s)
- Tilman J. Todt
  - Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
  - HAN University of Applied Sciences, Institute of Applied Sciences, Nijmegen, The Netherlands
- Michiel Wels
  - Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
  - NIZO food research, Ede, The Netherlands
  - TI Food and Nutrition, Wageningen, The Netherlands
  - Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
- Roger S. Bongers
  - NIZO food research, Ede, The Netherlands
  - TI Food and Nutrition, Wageningen, The Netherlands
- Roland S. Siezen
  - Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
  - HAN University of Applied Sciences, Institute of Applied Sciences, Nijmegen, The Netherlands
  - NIZO food research, Ede, The Netherlands
  - TI Food and Nutrition, Wageningen, The Netherlands
  - Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
  - Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Sacha A. F. T. van Hijum
  - Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
  - NIZO food research, Ede, The Netherlands
  - TI Food and Nutrition, Wageningen, The Netherlands
  - Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
  - Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
  - * E-mail:
- Michiel Kleerebezem
  - NIZO food research, Ede, The Netherlands
  - TI Food and Nutrition, Wageningen, The Netherlands
  - Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
  - Wageningen University, Host Microbe Interactomics Group, Wageningen, The Netherlands
15
Kim D, Hong JSJ, Qiu Y, Nagarajan H, Seo JH, Cho BK, Tsai SF, Palsson BØ. Comparative analysis of regulatory elements between Escherichia coli and Klebsiella pneumoniae by genome-wide transcription start site profiling. PLoS Genet 2012; 8:e1002867. [PMID: 22912590 PMCID: PMC3415461 DOI: 10.1371/journal.pgen.1002867] [Citation(s) in RCA: 111] [Impact Index Per Article: 9.3] [Received: 01/14/2012] [Accepted: 06/14/2012] [Indexed: 01/08/2023]
Abstract
Genome-wide transcription start site (TSS) profiles of the enterobacteria Escherichia coli and Klebsiella pneumoniae were experimentally determined through modified 5′ RACE followed by deep sequencing of intact primary mRNA. This identified 3,746 and 3,143 TSSs for E. coli and K. pneumoniae, respectively. Experimentally determined TSSs were then used to define promoter regions and 5′ UTRs upstream of coding genes. Comparative analysis of these regulatory elements revealed the use of multiple TSSs, identical sequence motifs of promoter and Shine-Dalgarno sequence, reflecting conserved gene expression apparatuses between the two species. In both species, over 70% of primary transcripts were expressed from operons having orthologous genes during exponential growth. However, expressed orthologous genes in E. coli and K. pneumoniae showed a strikingly different organization of upstream regulatory regions with only 20% identical promoters with TSSs in both species. Over 40% of promoters had TSSs identified in only one species, despite conserved promoter sequences existing in the other species. 662 conserved promoters having TSSs in both species resulted in the same number of comparable 5′ UTR pairs, and that regulatory element was found to be the most variant region in sequence among promoter, 5′ UTR, and ORF. In K. pneumoniae, 48 sRNAs were predicted and 36 of them were expressed during exponential growth. Among them, 34 orthologous sRNAs between two species were analyzed in depth, and the analysis showed that many sRNAs of K. pneumoniae, including pleiotropic sRNAs such as rprA, arcZ, and sgrS, may work in the same way as in E. coli. These results reveal a new dimension of comparative genomics such that a comparison of two genomes needs to be comprehensive over all levels of genome organization. 
In order to investigate similarities and differences of closely related species, most of the comparative genomics studies focus on comparing the gene contents either shared or specific for each genome. However, it is also important to investigate the differences in non-coding regulatory elements because they influence the transcriptional and post-transcriptional processes. Thus, we performed a genome-wide profiling of transcription start sites (TSSs) in two species, E. coli K-12 MG1655 and K. pneumoniae MGH78578. Experimental identification of TSSs is important for precise definition of promoter regions and 5′ untranslated regions upstream of coding genes. Comparative analysis of these regulatory elements revealed the use of multiple TSSs, identical sequence motifs of promoter and Shine-Dalgarno sequence. However, we observed that the upstream regulatory regions of the majority of operons having orthologous genes were organized with different usage of promoters and TSSs, resulting in diverse and complex gene regulation. We also found that the 5′ UTR is the least conserved regulatory element in sequence between the two species. Moreover, 34 orthologous sRNAs between E. coli and K. pneumoniae were analyzed in depth. The analysis suggested many of K. pneumoniae sRNAs might regulate the target genes as in E. coli.
Affiliation(s)
- Donghyuk Kim
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Jay Sung-Joong Hong
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Yu Qiu
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Harish Nagarajan
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Joo-Hyun Seo
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Byung-Kwan Cho
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Shih-Feng Tsai
  - Division of Molecular and Genomic Medicine, National Health Research Institutes, Miaoli, Taiwan
- Bernhard Ø. Palsson
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
  - * E-mail:
16
Redefining Escherichia coli σ⁷⁰ promoter elements: -15 motif as a complement of the -10 motif. J Bacteriol 2011; 193:6305-14. [PMID: 21908667 DOI: 10.1128/jb.05947-11] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022]
Abstract
Classical elements of σ⁷⁰ bacterial promoters include the -35 element (₋₃₅TTGACA₋₃₀), the -10 element (₋₁₂TATAAT₋₇), and the extended -10 element (₋₁₅TG₋₁₄). Although the -35 element, the extended -10 element, and the upstream-most base in the -10 element (₋₁₂T) interact with σ⁷⁰ in double-stranded DNA (dsDNA) form, the downstream bases in the -10 motif (₋₁₁ATAAT₋₇) are responsible for σ⁷⁰-single-stranded DNA (ssDNA) interactions. In order to directly reflect this correspondence, an extension of the extended -10 element to a so-called -15 element (₋₁₅TGnT₋₁₂) has been recently proposed. I investigated here the sequence specificity of the proposed -15 element and its relationship to other promoter elements. I found a previously undetected significant conservation of ₋₁₃G and a high degeneracy at ₋₁₅T. I therefore defined the -15 element as a degenerate motif, which, together with the conserved stretch of sequence between -15 and -12, allows treating this element analogously to -35 and -10 elements. Furthermore, the strength of the -15 element inversely correlates with the strengths of the -35 element and -10 element, whereas no such complementation between other promoter elements was found. Despite the direct involvement of -15 element in σ⁷⁰-dsDNA interactions, I found a significantly stronger tendency of this element to complement weak -10 elements that are involved in σ⁷⁰-ssDNA interactions. This finding is in contrast to the established view, according to which the -15 element provides a sufficient number of σ⁷⁰-dsDNA interactions, and suggests that the main parameter determining a functional promoter is the overall promoter strength.
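As an illustration of the coordinate convention used in this abstract, the sketch below checks whether a sequence carries the proposed -15 element (TGnT at positions -15 to -12), given the index of its transcription start site. The function name, the example sequence, and the strict pattern are hypothetical; in particular, the strict pattern ignores the degeneracy at -15 that the abstract reports.

```python
import re


def minus15_element(seq, tss_index):
    """Return the bases at positions -15..-12 if they match TGnT, else None.

    tss_index is the 0-based index of position +1 in seq. Promoter
    numbering has no position 0, so position -k sits at tss_index - k.
    """
    start = tss_index - 15
    if start < 0:
        return None  # not enough upstream sequence
    window = seq[start:tss_index - 11]  # positions -15, -14, -13, -12
    return window if re.fullmatch(r"TG[ACGT]T", window) else None


# Hypothetical sequence: TGC at -15..-13, TATAAT at -12..-7, TSS at index 35.
example = "A" * 20 + "TGC" + "TATAAT" + "A" * 6 + "G" + "C" * 5
print(minus15_element(example, 35))
```

The last T of TGnT and the first T of the -10 hexamer are the same base (position -12), which is why the two motifs overlap in the example sequence.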
17
Bland C, Newsome AS, Markovets AA. Promoter prediction in E. coli based on SIDD profiles and Artificial Neural Networks. BMC Bioinformatics 2010; 11 Suppl 6:S17. [PMID: 20946600 PMCID: PMC3026364 DOI: 10.1186/1471-2105-11-s6-s17] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND One of the major challenges in biology is the correct identification of promoter regions. Computational methods based on motif searching have been the traditional approach taken. Recent studies have shown that DNA structural properties, such as curvature, stacking energy, and stress-induced duplex destabilization (SIDD), are also useful in promoter prediction. In this paper, the currently used SIDD energy threshold method is compared to the proposed artificial neural network (ANN) approach for finding promoters based on SIDD profile data.
RESULTS When compared to the SIDD threshold prediction method, artificial neural networks showed noticeable improvements for precision, recall, and F-score over a range of values. The maximal F-score was 62.3 for the ANN classifier and 56.8 for the threshold-based classifier.
CONCLUSIONS Artificial neural networks were used to predict promoters based on SIDD profile data. Results using this technique were an improvement over the previous SIDD threshold approach. Over a wide range of precision-recall values, artificial neural networks were more capable of identifying distinctive characteristics of promoter regions than threshold-based methods.
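For readers unfamiliar with the metric, the F-score quoted in this abstract is the harmonic mean of precision and recall, and the "maximal" value is the best one found along a precision-recall curve. A minimal sketch follows; the function names and the example precision-recall pairs are hypothetical and do not reproduce the study's data.

```python
def f_score(precision, recall, beta=1.0):
    """Return the F-beta score; beta=1 gives the usual harmonic mean (F1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_f_score(pr_pairs):
    """Return the largest F1 over an iterable of (precision, recall) pairs."""
    return max(f_score(p, r) for p, r in pr_pairs)


# Hypothetical operating points of a classifier at three thresholds.
pairs = [(0.9, 0.3), (0.7, 0.6), (0.5, 0.8)]
print(round(100 * best_f_score(pairs), 1))  # scaled to 0-100 as in the abstract
```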
Affiliation(s)
- Charles Bland
  - Department of Natural Sciences and Environmental Health, Mississippi Valley State University, 14000 Hwy 82 West, Itta Bena, Mississippi 38941, USA
18
Hopman CTP, Speijer D, van der Ende A, Pannekoek Y. Identification of a novel anti-sigmaE factor in Neisseria meningitidis. BMC Microbiol 2010; 10:164. [PMID: 20525335 PMCID: PMC2893595 DOI: 10.1186/1471-2180-10-164] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 01/20/2010] [Accepted: 06/04/2010] [Indexed: 08/30/2023]
Abstract
Background Fine-tuning the expression of genes is a prerequisite for the strictly human pathogen Neisseria meningitidis to survive hostile growth conditions and establish disease. Many bacterial species respond to stress by using alternative σ factors which, in complex with RNA polymerase holoenzyme, recognize specific promoter determinants. σE, encoded by rpoE (NMB2144) in meningococci, is known to be essential in mounting responses to environmental challenges in many pathogens. Here we identified genes belonging to the σE regulon of meningococci.
Results We show that meningococcal σE is part of the polycistronic operon NMB2140-NMB2145 and is autoregulated. In addition, we demonstrate that σE controls expression of methionine sulfoxide reductase (MsrA/MsrB). Moreover, we provide evidence that the activity of σE is under control of NMB2145, directly downstream of rpoE. The protein encoded by NMB2145 is structurally related to anti-sigma domain (ASD) proteins and characterized by a zinc-containing anti-σ factor (ZAS) motif, a hallmark of a specific class of Zn2+-binding ASD proteins acting as anti-σ factors. We demonstrate that Cys residues in ZAS, as well as the Cys residue at position 4, are essential for anti-σE activity of NMB2145, as found for a minority of members of the ZAS family that are predicted to act in the cytoplasm and to respond to oxidative stimuli. However, exposure of cells to oxidative stimuli did not result in altered expression of σE.
Conclusions Together, our results demonstrate that meningococci express a functional transcriptionally autoregulated σE factor, the activity of which is controlled by a novel meningococcal anti-σ factor belonging to the ZAS family.
Affiliation(s)
- Carla Th P Hopman
  - Academic Medical Center, Center for Infection and Immunity Amsterdam (CINIMA), Department of Medical Microbiology, Amsterdam, the Netherlands
19
Predicting strength and function for promoters of the Escherichia coli alternative sigma factor, sigmaE. Proc Natl Acad Sci U S A 2010; 107:2854-9. [PMID: 20133665 DOI: 10.1073/pnas.0915066107] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.9] [Indexed: 02/07/2023]
Abstract
Sequenced bacterial genomes provide a wealth of information but little understanding of transcriptional regulatory circuits largely because accurate prediction of promoters is difficult. We examined two important issues for accurate promoter prediction: (1) the ability to predict promoter strength and (2) the sequence properties that distinguish between active and weak/inactive promoters. We addressed promoter prediction using natural core promoters recognized by the well-studied alternative sigma factor, Escherichia coli sigma(E), as a representative of group 4 sigmas, the largest sigma group. To evaluate the contribution of sequence to promoter strength and function, we used modular position weight matrix models comprised of each promoter motif and a penalty score for suboptimal motif location. We find that a combination of select modules is moderately predictive of promoter strength and that imposing minimal motif scores distinguished active from weak/inactive promoters. The combined -35/-10 score is the most important predictor of activity. Our models also identified key sequence features associated with active promoters. A conserved "AAC" motif in the -35 region is likely to be a general predictor of function for promoters recognized by group 4 sigmas. These results provide valuable insights into sequences that govern promoter strength, distinguish active and inactive promoters for the first time, and are applicable to both in vivo and in vitro measures of promoter strength.
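The modular scoring idea described in this abstract (a position weight matrix per motif plus a penalty for suboptimal motif spacing) can be sketched as follows. Everything concrete here is an assumption made for illustration: the toy matrices built from made-up consensus strings, the spacer range, the optimal spacer of 16, and the penalty of 0.5 per base are not the study's models or parameters.

```python
def toy_pwm(consensus, match=1.0, mismatch=-1.0):
    """Build a toy PWM: the consensus base scores `match`, any other base `mismatch`."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]


def pwm_score(pwm, site):
    """Sum the per-position weights of `site` under `pwm`."""
    return sum(column[base] for column, base in zip(pwm, site))


def promoter_score(seq, pwm35, pwm10, spacers=range(15, 18),
                   optimal_spacer=16, penalty=0.5):
    """Return (score, start, spacer) of the best -35/-10 placement, or None.

    score = PWM(-35) + PWM(-10) - penalty * |spacer - optimal_spacer|
    """
    best = None
    n35, n10 = len(pwm35), len(pwm10)
    for start in range(len(seq)):
        for spacer in spacers:
            start10 = start + n35 + spacer
            end10 = start10 + n10
            if end10 > len(seq):
                continue  # the -10 site would run past the end of the sequence
            score = (pwm_score(pwm35, seq[start:start + n35])
                     + pwm_score(pwm10, seq[start10:end10])
                     - penalty * abs(spacer - optimal_spacer))
            if best is None or score > best[0]:
                best = (score, start, spacer)
    return best


# Hypothetical consensus strings, used only to build toy matrices.
pwm35 = toy_pwm("GAACTT")
pwm10 = toy_pwm("TCTGA")
print(promoter_score("GAACTT" + "A" * 16 + "TCTGA", pwm35, pwm10))
```

A minimal-score cutoff on each module, as the abstract describes for separating active from weak or inactive promoters, could be added by discarding placements whose individual PWM scores fall below chosen thresholds.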
20
Mallios RR, Ojcius DM, Ardell DH. An iterative strategy combining biophysical criteria and duration hidden Markov models for structural predictions of Chlamydia trachomatis sigma66 promoters. BMC Bioinformatics 2009; 10:271. [PMID: 19715597 PMCID: PMC2743672 DOI: 10.1186/1471-2105-10-271] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 02/11/2009] [Accepted: 08/28/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase sigma-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort. RESULTS Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis sigma66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase sigma66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability. 
CONCLUSION This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase sigma-factor/DNA binding collaboratively, contribute to a sequence's ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis sigma66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes.
Affiliation(s)
- Ronna R Mallios
  - School of Natural Sciences, University of California, Merced, CA 95344, USA.
21
Shavkunov KS, Masulis IS, Tutukina MN, Deev AA, Ozoline ON. Gains and unexpected lessons from genome-scale promoter mapping. Nucleic Acids Res 2009; 37:4919-31. [PMID: 19528070 PMCID: PMC2731890 DOI: 10.1093/nar/gkp490] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022]
Abstract
Potential promoters in the genome of Escherichia coli were searched for with the pattern recognition software PlatProm and classified on the basis of their positions relative to gene borders. Besides the expected promoters located in front of coding sequences, we found a considerable number of intragenic promoter-like signals with a putative ability to drive either antisense or alternative transcription, and revealed unusual genomic regions with an extremely high density of predicted transcription start points (promoter ‘islands’), some of which are located in coding sequences. PlatProm scores converted into the probability of RNA polymerase binding showed a certain correlation with the enzyme retention registered by the ChIP-on-chip technique; however, in ‘dense’ regions the correlation coefficient is lower than across the entire genome. Experimental verification confirmed the ability of RNA polymerase to interact and form multiple open complexes within the promoter ‘island’ associated with appY, yet transcription efficiency was lower than might be expected. Analysis of expression data revealed the same tendency for other promoter ‘islands’, thus suggesting a functional relevance of non-productive RNA polymerase binding. Our data indicate that the genomic DNA of E. coli is enriched in numerous unusual promoter-like sites whose biological role is yet to be understood.
Affiliation(s)
- K S Shavkunov
  - Institute of Cell Biophysics of the Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russian Federation
22
Zhang J, Li E, Olsen GJ. Protein-coding gene promoters in Methanocaldococcus (Methanococcus) jannaschii. Nucleic Acids Res 2009; 37:3588-601. [PMID: 19359364 PMCID: PMC2699501 DOI: 10.1093/nar/gkp213] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
Although Methanocaldococcus (Methanococcus) jannaschii was the first archaeon to have its genome sequenced, little is known about the promoters of its protein-coding genes. To expand our knowledge, we have experimentally identified 131 promoters for 107 protein-coding genes in this genome by mapping their transcription start sites. Compared to previously identified promoters, more than half of which are from genes for stable RNAs, the protein-coding gene promoters are qualitatively similar in overall sequence pattern, but statistically different at several positions due to greater variation among their sequences. Relative binding affinity for general transcription factors was measured for 12 of these promoters by competition electrophoretic mobility shift assays. These promoters bind the factors less tightly than do most tRNA gene promoters. When a position weight matrix (PWM) was constructed from the protein gene promoters, factor binding affinities correlated with corresponding promoter PWM scores. We show that the PWM based on our data more accurately predicts promoters in the genome and transcription start sites than could be done with the previously available data. We also introduce a PWM logo, which visually displays the implications of observing a given base at a position in a sequence.
Affiliation(s)
- Jian Zhang
  - Department of Microbiology, University of Illinois at Urbana-Champaign, 601 South Goodwin Avenue, Urbana, IL 61801, USA
23
Dekhtyar M, Morin A, Sakanyan V. Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes. BMC Bioinformatics 2008; 9:233. [PMID: 18471287 PMCID: PMC2412878 DOI: 10.1186/1471-2105-9-233] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 11/19/2007] [Accepted: 05/09/2008] [Indexed: 11/17/2022]
Abstract
BACKGROUND Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. RESULTS We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I sigma70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the alpha subunit, and then optimally-separated patterns of -35 and -10 boxes, required for interaction with the sigma70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. CONCLUSION The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression, and individual strong promoters in bacteria of medical and/or economic significance.
Affiliation(s)
- Amelie Morin
  - Laboratoire de Biotechnologie, UMR CNRS 6204, Université de Nantes, 2 rue de la Houssinière, 44322 Nantes, France
- Vehary Sakanyan
  - Laboratoire de Biotechnologie, UMR CNRS 6204, Université de Nantes, 2 rue de la Houssinière, 44322 Nantes, France
  - ProtNeteomix, 2 rue de la Houssinière, 44322 Nantes, France
24
Wang Z, Jin L, Węgrzyn G, Węgrzyn A. Screening of the osmotic pressure-inducible promoter regions from the whole genome of Escherichia coli by using a novel cloning method. Biotechnol Lett 2008; 30:707-11. [DOI: 10.1007/s10529-007-9583-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/04/2007] [Revised: 10/22/2007] [Accepted: 10/24/2007] [Indexed: 11/27/2022]
25
SIGffRid: a tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics. BMC Bioinformatics 2008; 9:73. [PMID: 18237374 PMCID: PMC2375139 DOI: 10.1186/1471-2105-9-73] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 06/01/2007] [Accepted: 01/31/2008] [Indexed: 11/10/2022]
Abstract
Background Many programs have been developed to identify transcription factor binding sites. However, most of them are not able to infer two-word motifs with variable spacer lengths. This case is encountered for RNA polymerase Sigma (σ) Factor Binding Sites (SFBSs) usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Our goal is to design an algorithm detecting SFBS by using combinational and statistical constraints deduced from biological observations. Results We describe a new approach to identify SFBSs by comparing two related bacterial genomes. The method, named SIGffRid (SIGma Factor binding sites Finder using R'MES to select Input Data), performs a simultaneous analysis of pairs of promoter regions of orthologous genes. SIGffRid uses a prior identification of over-represented patterns in whole genomes as selection criteria for potential -35 and -10 boxes. These patterns are then grouped using pairs of short seeds (of which one is possibly gapped), allowing a variable-length spacer between them. Next, the motifs are extended guided by statistical considerations, a feature that ensures a selection of motifs with statistically relevant properties. We applied our method to the pair of related bacterial genomes of Streptomyces coelicolor and Streptomyces avermitilis. Cross-check with the well-defined SFBSs of the SigR regulon in S. coelicolor is detailed, validating the algorithm. SFBSs for HrdB and BldN were also found; and the results suggested some new targets for these σ factors. In addition, consensus motifs for BldD and new SFBSs binding sites were defined, overlapping previously proposed consensuses. Relevant tests were carried out also on bacteria with moderate GC content (i.e. Escherichia coli/Salmonella typhimurium and Bacillus subtilis/Bacillus licheniformis pairs). Motifs of house-keeping σ factors were found as well as other SFBSs such as that of SigW in Bacillus strains. 
Conclusion We demonstrate that our approach, combining statistical and biological criteria, successfully predicts SFBSs. The versatility of the method allows the recognition of other kinds of two-box regulatory sites.
26
Sorokin AA, Osipov AA, Beskaravainyi PM, Kamzolova SG. Analysis of the nucleotide sequence and electrostatic potential distribution in the Escherichia coli genome. Biophysics (Nagoya-shi) 2007. [DOI: 10.1134/s0006350907020042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
27
Abstract
Information theory was used to build a promoter model that accounts for the -10, the -35 and the uncertainty of the gap between them on a common scale. Helical face assignment indicated that base -7, rather than -11, of the -10 may be flipping to initiate transcription. We found that the sequence conservation of sigma70 binding sites is 6.5 +/- 0.1 bits. Some promoters lack a -35 region, but have a 6.7 +/- 0.2 bit extended -10, almost the same information as the bipartite promoter. These results and similarities between the contacts in the extended -10 binding and the -35 suggest that the flexible bipartite sigma factor evolved from a simpler polymerase. Binding predicted by the bipartite model is enriched around 35 bases upstream of the translational start. This distance is the smallest 5' mRNA leader necessary for ribosome binding, suggesting that selective pressure minimizes transcript length. The promoter model was combined with models of the transcription factors Fur and Lrp to locate new promoters, to quantify promoter strengths, and to predict activation and repression. Finally, the DNA-bending proteins Fis, H-NS and IHF frequently have sites within one DNA persistence length from the -35, so bending allows distal activators to reach the polymerase.
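The "bits" of sequence conservation mentioned in this abstract are conventionally computed per aligned position as 2 minus the Shannon entropy of the observed bases, then summed across the site. The sketch below shows that calculation under stated simplifications: the function name and the toy alignment are hypothetical, the small-sample correction is omitted, and the uncertainty of the variable gap between the -35 and -10 is not modelled.

```python
import math
from collections import Counter


def information_content(sites):
    """Return per-position information (bits) for equal-length aligned DNA sites.

    Each position contributes 2 - H, where H is the Shannon entropy of the
    base frequencies and 2 bits is the maximum for a four-letter alphabet.
    """
    n = len(sites)
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    bits = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


# Toy alignment of -10-like hexamers (hypothetical, not real binding sites).
aligned = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
per_position = information_content(aligned)
print([round(b, 2) for b in per_position], round(sum(per_position), 2))
```

With only a handful of sites the uncorrected estimate overstates conservation, which is why published analyses apply a small-sample correction before reporting totals.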
Affiliation(s)
- Thomas D. Schneider
  - To whom correspondence should be addressed. Tel: +1 301 846 5581; Fax: +1 301 846 5598;
28
Mann S, Li J, Chen YPP. A pHMM-ANN based discriminative approach to promoter identification in prokaryote genomic contexts. Nucleic Acids Res 2006; 35:e12. [PMID: 17170007 PMCID: PMC1802591 DOI: 10.1093/nar/gkl1024] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 09/13/2006] [Revised: 10/25/2006] [Accepted: 11/14/2006] [Indexed: 11/14/2022]
Abstract
The computational approach for identifying promoters on increasingly large genomic sequences has led to many false positives. The biological significance of promoter identification lies in the ability to locate true promoters with and without prior sequence contextual knowledge. Prior approaches to promoter modelling have involved artificial neural networks (ANNs) or hidden Markov models (HMMs), each producing adequate results on small scale identification tasks, i.e. narrow upstream regions. In this work, we present an architecture to support prokaryote promoter identification on large scale genomic sequences, i.e. not limited to narrow upstream regions. The significant contribution involved the hybrid formed via aggregation of the profile HMM with the ANN, via Viterbi scoring optimizations. The benefit obtained using this architecture includes the modelling ability of the profile HMM with the ability of the ANN to associate elements composing the promoter. We present the high effectiveness of the hybrid approach in comparison to profile HMMs and ANNs when used separately. The contribution of Viterbi optimizations is also highlighted for supporting the hybrid architecture in which gains in sensitivity (+0.3), specificity (+0.65) and precision (+0.54) are achieved over existing approaches.
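The gains reported above are expressed as sensitivity, specificity and precision. For reference, a minimal Python sketch of how these three quantities follow from classification counts is given below. The counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and precision from confusion counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # positive predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }


# Hypothetical counts for a promoter / non-promoter test set.
example = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```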
Affiliation(s)
- Scott Mann
- School of Engineering and Information Technology, Deakin University, Victoria, Australia
- Jinyan Li
- Institute for Infocomm Research, Singapore 119613
- Yi-Ping Phoebe Chen
- School of Engineering and Information Technology, Deakin University, Victoria, Australia
- Australian Research Council Centre in Bioinformatics, Melbourne, Australia
29
Sorokin AA, Osypov AA, Dzhelyadin TR, Beskaravainy PM, Kamzolova SG. Electrostatic properties of promoter recognized by E. coli RNA polymerase Esigma70. J Bioinform Comput Biol 2006; 4:455-67. [PMID: 16819795 DOI: 10.1142/s0219720006002077] [Received: 10/01/2005] [Revised: 01/04/2006] [Accepted: 01/05/2006]
Abstract
A comparative analysis of electrostatic patterns for 359 sigma70-specific promoters and 359 nonpromoter regions on the electrostatic map of the Escherichia coli genome was carried out. It was found that DNA is not a uniformly charged molecule. There are some local inhomogeneities in its electrostatic profile which correlate with promoter sequences. Electrostatic patterns of promoter DNAs can be specified due to the presence of some distinctive motifs which differ for different promoter groups and may be involved as signal elements in differential recognition of various promoters by the enzyme. Some specific electrostatic elements which are responsible for modulating promoter activities due to ADP-ribosylation of RNA polymerase alpha-subunit were found in far upstream regions of T4 phage early promoters and E. coli ribosomal promoters.
Affiliation(s)
- Anatoly A Sorokin
- Laboratory of Mechanisms of the Cell Genom Functioning, Institute of Cell Biophysics RAS, Pushchino, 142290, Russia.
30
Ozoline ON, Deev AA. Predicting antisense RNAs in the genomes of Escherichia coli and Salmonella typhimurium using promoter-search algorithm PlatProm. J Bioinform Comput Biol 2006; 4:443-54. [PMID: 16819794 DOI: 10.1142/s0219720006001916] [Received: 09/29/2005] [Revised: 12/29/2005] [Accepted: 01/13/2006]
Abstract
The pattern recognition software PlatProm, which takes into consideration both sequence-specific and structure-specific features in the genetic environment of promoter sites and identifies transcription start points with very high accuracy, was used to reveal potentially transcribed regions in the genomes of two bacterial species. Along with the expected promoters located upstream of coding sequences, PlatProm identified several hundred very similar signals in other intergenic regions and within coding sequences. Homologous genes of Escherichia coli and Salmonella typhimurium containing potential promoters on the template strand are suggested as putative targets for regulation by antisense RNA products (aRNAs).
Affiliation(s)
- Olga N Ozoline
- Institute of Cell Biophysics, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia.
31
Wang H, Benham CJ. Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress. BMC Bioinformatics 2006; 7:248. [PMID: 16677393 PMCID: PMC1468432 DOI: 10.1186/1471-2105-7-248] [Received: 01/20/2006] [Accepted: 05/05/2006]
Abstract
BACKGROUND In our previous studies, we found that the sites in prokaryotic genomes which are most susceptible to duplex destabilization under the negative superhelical stresses that occur in vivo are statistically highly significantly associated with intergenic regions that are known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters. RESULTS We show that the propensity for stress-induced DNA duplex destabilization (SIDD) is closely associated with specific promoter regions. The extent of destabilization in promoter-containing regions is found to be bimodally distributed. When compared with DNA curvature, deformability, thermostability or sequence motif scores within the -10 region, SIDD is found to be the most informative DNA property regarding promoter locations in the E. coli K12 genome. SIDD properties alone perform better at detecting promoter regions than other programs trained on this genome. Because this approach has a very low false positive rate, it can be used to predict with high confidence the subset of promoters that are strongly destabilized. When SIDD properties are combined with -10 motif scores in a linear classification function, they predict promoter regions with better than 80% accuracy. When these methods were tested with promoter and non-promoter sequences from Bacillus subtilis, they achieved similar or higher accuracies. We also present a strictly SIDD-based predictor for annotating promoter sequences in complete microbial genomes. CONCLUSION In this report we show that the propensity to undergo stress-induced duplex destabilization (SIDD) is a distinctive structural attribute of many prokaryotic promoter sequences. 
We have developed methods to identify promoter sequences in prokaryotic genomes that use SIDD either as a sole predictor or in combination with other DNA structural and sequence properties. Although these methods cannot predict all the promoter-containing regions in a genome, they do find large sets of potential regions that have high probabilities of being true positives. This approach could be especially valuable for annotating those genomes about which there is limited experimental data.
Affiliation(s)
- Huiquan Wang
- UC Davis Genome Center, University of California, One Shields Avenue, Davis, CA 95616, USA
- Craig J Benham
- UC Davis Genome Center, University of California, One Shields Avenue, Davis, CA 95616, USA
32
Rhodius VA, Suh WC, Nonaka G, West J, Gross CA. Conserved and variable functions of the sigmaE stress response in related genomes. PLoS Biol 2006; 4:e2. [PMID: 16336047 PMCID: PMC1312014 DOI: 10.1371/journal.pbio.0040002] [Received: 03/17/2005] [Accepted: 10/13/2005]
Abstract
Bacteria often cope with environmental stress by inducing alternative sigma (σ) factors, which direct RNA polymerase to specific promoters, thereby inducing a set of genes called a regulon to combat the stress. To understand the conserved and organism-specific functions of each σ, it is necessary to be able to predict their promoters, so that their regulons can be followed across species. However, the variability of promoter sequences and motif spacing makes their prediction difficult. We developed and validated an accurate promoter prediction model for Escherichia coli σE, which enabled us to predict a total of 89 unique σE-controlled transcription units in E. coli K-12 and eight related genomes. σE controls the envelope stress response in E. coli K-12. The portion of the regulon conserved across genomes is functionally coherent, ensuring the synthesis, assembly, and homeostasis of lipopolysaccharide and outer membrane porins, the key constituents of the outer membrane of Gram-negative bacteria. The larger variable portion is predicted to perform pathogenesis-associated functions, suggesting that σE provides organism-specific functions necessary for optimal host interaction. The success of our promoter prediction model for σE suggests that it will be applicable for the prediction of promoter elements for many alternative σ factors. A model for predicting the variable promoter sequences associated with the bacterial stress response is developed and used to identify constituents of the transcriptional response to σE.
Affiliation(s)
- Virgil A Rhodius
- 1 Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
- Won Chul Suh
- 1 Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
- Gen Nonaka
- 1 Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
- Joyce West
- 1 Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
- Carol A Gross
- 1 Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
- 2 Department of Cell and Tissue Biology, University of California, San Francisco, California, United States of America
33
A novel method for prokaryotic promoter prediction based on DNA stability. BMC Bioinformatics 2005; 6:1. [PMID: 15631638 PMCID: PMC545949 DOI: 10.1186/1471-2105-6-1] [Received: 09/02/2004] [Accepted: 01/05/2005]
Abstract
Background In the post-genomic era, correct gene prediction has become one of the biggest challenges in genome annotation. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. This work presents a novel prokaryotic promoter prediction method based on DNA stability. Results The promoter region is less stable and hence more prone to melting as compared to other genomic regions. Our analysis shows that a method of promoter prediction based on the differences in the stability of DNA sequences in the promoter and non-promoter region works much better compared to existing prokaryotic promoter prediction programs, which are based on sequence motif searches. At present the method works optimally for genomes such as that of Escherichia coli, which have near 50 % G+C composition and also performs satisfactorily in case of other prokaryotic promoters. Conclusions Our analysis clearly shows that the change in stability of DNA seems to provide a much better clue than usual sequence motifs, such as Pribnow box and -35 sequence, for differentiating promoter region from non-promoter regions. To a certain extent, it is more general and is likely to be applicable across organisms. Hence incorporation of such features in addition to the signature motifs can greatly improve the presently available promoter prediction programs.
34
Pisanti N, Crochemore M, Grossi R, Sagot MF. Bases of motifs for generating repeated patterns with wild cards. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2005; 2:40-50. [PMID: 17044163 DOI: 10.1109/tcbb.2005.5]
Abstract
Motif inference represents one of the most important areas of research in computational biology, and one of its oldest ones. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms, or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive evidence in favor of either, and recent work has attempted to integrate the two types into a single model. In this paper, we address the formal issue in relation to motifs as patterns. This is essential to get at a better understanding of motifs in general. In particular, we consider a promising idea that was recently proposed, which attempted to avoid the combinatorial explosion in the number of motifs by means of a generator set for the motifs. Instead of exhibiting a complete list of motifs satisfying some input constraints, what is produced is a basis of such motifs from which all the other ones can be generated. We study the computational cost of determining such a basis of repeated motifs with wild cards in a sequence. We give new upper and lower bounds on such a cost, introducing a notion of basis that is provably contained in (and, thus, smaller) than previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all bases defined so far grows exponentially with the quorum, that is, with the minimal number of times a motif must appear in a sequence, something unnoticed in previous work. We show that there is no hope to efficiently compute such bases unless the quorum is fixed.
Affiliation(s)
- Nadia Pisanti
- Dipartimento di Informatica, Università di Pisa, Italy.
35
Wang H, Noordewier M, Benham CJ. Stress-induced DNA duplex destabilization (SIDD) in the E. coli genome: SIDD sites are closely associated with promoters. Genome Res 2004; 14:1575-84. [PMID: 15289476 PMCID: PMC509266 DOI: 10.1101/gr.2080004]
Abstract
We present the first analysis of stress-induced DNA duplex destabilization (SIDD) in a complete chromosome, the Escherichia coli K12 genome. We used a newly developed method to calculate the locations and extents of stress-induced destabilization to single-base resolution at superhelix density sigma = -0.06. We find that SIDD sites in this genome show a statistically highly significant tendency to avoid coding regions. And among intergenic regions, those that either contain documented promoters or occur between divergently transcribing coding regions, and hence may be inferred to contain promoters, are associated with strong SIDD sites in a statistically highly significant manner. Intergenic regions located between convergently transcribing genes, which are inferred not to contain promoters, are not significantly enriched for destabilized sites. Statistical analysis shows that a strongly destabilized intergenic region has an 80% chance of containing a promoter, whereas an intergenic region that does not contain a strong SIDD site has only a 24% chance. We describe how these observations may illuminate specific mechanisms of regulation, and assist in the computational identification of promoter locations in prokaryotes.
Affiliation(s)
- Huiquan Wang
- UC Davis Genome Center, University of California, Davis, California 95616, USA
36
Zinin NV, Serkina AV, Gelfand MS, Shevelev AB, Sineoky SP. Gene cloning, expression and characterization of novel phytase from Obesumbacterium proteus. FEMS Microbiol Lett 2004. [DOI: 10.1111/j.1574-6968.2004.tb09659.x]
37
Huerta AM, Collado-Vides J. Sigma70 promoters in Escherichia coli: specific transcription in dense regions of overlapping promoter-like signals. J Mol Biol 2003; 333:261-78. [PMID: 14529615 DOI: 10.1016/j.jmb.2003.07.017]
Abstract
We present here a computational analysis showing that sigma70 house-keeping promoters are located within zones with high densities of promoter-like signals in Escherichia coli, and we introduce strategies that allow for the correct computer prediction of sigma70 promoters. Based on 599 experimentally verified promoters of E.coli K-12, we generated and evaluated more than 200 weight matrices optimizing different criteria to obtain the best recognition matrices. The alignments generating the best statistical models did not fully correspond with the canonical sigma70 model. However, matrices that correspond to such a canonical model performed better as tools for prediction. We tested the predictive capacity of these matrices on 250 bp long regions upstream of gene starts, where 90% of the known promoters occur. The computational matrix models generated an average of 38 promoter-like signals within each 250 bp region. In more than 50% of the cases, the true promoter does not have the best score within the region. We observed, in fact, that real promoters occur mostly within regions with high densities of overlapping putative promoters. We evaluated several strategies to identify promoters. The best one uses an intrinsic score of the -10 and -35 hexamers that form the promoter as well as an extrinsic score that uses the distribution of promoters from the start of the gene. We were able to identify 86% true promoters correctly, generating an average of 4.7 putative promoters per region as output, of which 3.7, on average, exist in clusters, as a series of overlapping potentially competing RNA polymerase-binding sites. As far as we know, this is the highest predictive capability reported so far. This high signal density is found mainly within regions upstream of genes, contrasting with coding regions and regions located between convergently transcribed genes. 
These results are consistent with experimental evidence that shows the existence of multiple overlapping promoter sites that become functional under particular conditions. This density is probably the consequence of a rich number of vestiges of promoters in evolution. We suggest that transcriptional regulators as well as other functional promoters play an important role in keeping these latent signals suppressed.
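The weight-matrix scanning described in this abstract can be sketched in a few lines of Python. The example below is an illustrative sketch under stated assumptions, not the authors' implementation: the matrices are log-odds scores built from a handful of hypothetical hexamers with a pseudocount, the background is uniform, and the allowed spacer range of 15 to 19 bp is an assumption. The extrinsic score based on distance from the gene start is not modelled.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan_bipartite(seq, pwm35, pwm10, spacers=range(15, 20)):
    """Score every (-35 start, spacer) pair; return hits sorted best first."""
    hits = []
    n35, n10 = len(pwm35), len(pwm10)
    for start in range(len(seq) - n35 + 1):
        s35 = score(pwm35, seq[start:start + n35])
        for gap in spacers:
            pos10 = start + n35 + gap
            if pos10 + n10 > len(seq):
                continue
            s10 = score(pwm10, seq[pos10:pos10 + n10])
            hits.append((s35 + s10, start, gap))
    return sorted(hits, reverse=True)


# Hypothetical training hexamers and test sequence, for illustration only.
pwm35 = build_pwm(["TTGACA", "TTGACT", "TTTACA"])
pwm10 = build_pwm(["TATAAT", "TATGAT", "TACAAT"])
toy_seq = "GC" + "TTGACA" + "CG" * 8 + "C" + "TATAAT" + "GC"
hits = scan_bipartite(toy_seq, pwm35, pwm10)
best_score, best_start, best_gap = hits[0]
```

Even on this short toy sequence the scan returns many candidate pairs, which loosely mirrors the abstract's point that promoter-like signals are dense and that the best-scoring signal need not be the functional promoter.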
Affiliation(s)
- Araceli M Huerta
- Program of Computational Genomics, Nitrogen Fixation Center, UNAM, Cuernavaca, AP 565-A, Morelos 62100, Mexico
38
Rombauts S, Florquin K, Lescot M, Marchal K, Rouzé P, van de Peer Y. Computational approaches to identify promoters and cis-regulatory elements in plant genomes. Plant Physiology 2003; 132:1162-76. [PMID: 12857799 PMCID: PMC167057 DOI: 10.1104/pp.102.017715] [Received: 11/14/2002] [Revised: 01/10/2003] [Accepted: 03/17/2003]
Abstract
The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called "search by signal" methods) and the delineation of promoters by considering both sequence content and structural features ("search by content" methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5'-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. 
Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of "putative" CpG and CpNpG islands in plants.
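CpG island detection of the kind discussed here usually rests on two window statistics: G+C fraction and the observed/expected CpG ratio, where the expected count is (number of C × number of G) / window length. A minimal Python sketch of both statistics is given below. The window size and step are arbitrary assumptions, and no thresholds are applied because, as the abstract notes, vertebrate parameters do not carry over to plants.

```python
def gc_fraction(window):
    """Fraction of G and C bases in a window."""
    window = window.upper()
    if not window:
        return 0.0
    return (window.count("G") + window.count("C")) / len(window)


def cpg_obs_exp(window):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    window = window.upper()
    c = window.count("C")
    g = window.count("G")
    if c == 0 or g == 0:
        return 0.0
    return window.count("CG") * len(window) / (c * g)


def sliding_stats(seq, size=200, step=50):
    """Yield (start, G+C fraction, CpG obs/exp) for each full window."""
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        yield start, gc_fraction(window), cpg_obs_exp(window)


# Hypothetical sequence: a CpG-rich block flanked by AT-rich DNA.
toy_seq = "AT" * 100 + "CG" * 100 + "AT" * 100
profile = list(sliding_stats(toy_seq))
```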
Affiliation(s)
- Stephane Rombauts
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, B-9000 Gent, Belgium
39
Heyduk T, Niedziela-Majka A. Fluorescence resonance energy transfer analysis of Escherichia coli RNA polymerase and polymerase-DNA complexes. Biopolymers 2002; 61:201-13. [PMID: 11987181 DOI: 10.1002/bip.10139]
Abstract
Fluorescence resonance energy transfer (FRET) is a technique allowing measurements of atomic-scale distances in dilute solutions of macromolecules under native conditions. This feature makes FRET a powerful tool to study complicated biological assemblies. In this report we review the applications of FRET to studies of transcription initiation by Escherichia coli RNA polymerase. The versatility of FRET for studies of a large macromolecular assembly such as RNA polymerase is illustrated by examples of using FRET to address several different aspects of transcription initiation by polymerase. FRET has been used to determine the architecture of polymerase, its complex with single-stranded DNA, and the conformation of promoter fragment bound to polymerase. FRET has also been used as a binding assay to determine the thermodynamics of promoter DNA fragment binding to the polymerase. Functional conformational changes in the specificity subunit of polymerase responsible for the modulation of the promoter binding activity of the enzyme and the mechanistic aspects of the transition from the initiation to the elongation complex were also investigated.
Affiliation(s)
- T Heyduk
- Edward A. Doisy Department of Biochemistry and Molecular Biology, St. Louis University Medical School, 1402 S. Grand Blvd., MO 63104, USA.
40
Lewis SE, Searle SMJ, Harris N, Gibson M, Lyer V, Richter J, Wiel C, Bayraktaroglu L, Birney E, Crosby MA, Kaminker JS, Matthews BB, Prochnik SE, Smithy CD, Tupy JL, Rubin GM, Misra S, Mungall CJ, Clamp ME. Apollo: a sequence annotation editor. Genome Biol 2002; 3:RESEARCH0082. [PMID: 12537571 PMCID: PMC151184 DOI: 10.1186/gb-2002-3-12-research0082] [Received: 10/07/2002] [Revised: 11/13/2002] [Accepted: 11/23/2002]
Abstract
The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.
Affiliation(s)
- S E Lewis
- Department of Molecular and Cellular Biology, Life Sciences Addition, University of California, Berkeley, CA 94720-3200, USA.
41
Schneider TD. Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation. Nucleic Acids Res 2001; 29:4881-91. [PMID: 11726698 PMCID: PMC96701 DOI: 10.1093/nar/29.23.4881]
Abstract
The sequence logo for DNA binding sites of the bacteriophage P1 replication protein RepA shows unusually high sequence conservation (approximately 2 bits) at a minor groove that faces RepA. However, B-form DNA can support only 1 bit of sequence conservation via contacts into the minor groove. The high conservation in RepA sites therefore implies a distorted DNA helix with direct or indirect contacts to the protein. Here I show that a high minor groove conservation signature also appears in sequence logos of sites for other replication origin binding proteins (Rts1, DnaA, P4 alpha, EBNA1, ORC) and promoter binding proteins (sigma(70), sigma(D) factors). This finding implies that DNA binding proteins generally use non-B-form DNA distortion such as base flipping to initiate replication and transcription.
Affiliation(s)
- T D Schneider
- National Cancer Institute at Frederick, Laboratory of Experimental and Computational Biology, Building 469, PO Box B, Frederick, MD 21702-1201, USA.
42
Chan CL, Gross CA. The anti-initial transcribed sequence, a portable sequence that impedes promoter escape, requires sigma70 for function. J Biol Chem 2001; 276:38201-9. [PMID: 11481327 DOI: 10.1074/jbc.m104764200]
Abstract
The anti-sequence, a portable element extending from +1 to +15 of the transcript, is sufficient to prevent promoter escape from a variety of strong sigma70 promoters. We show here that this sequence does not function with even the strongest sigma32 promoter. Moreover, a particular class of substitutions in sigma70 that disrupt interaction between Region 2.2 of sigma70 and a coiled-coil motif in the beta'-subunit of RNA polymerase antagonizes the function of the anti-element. This same group of mutants prevents lambdaQ-mediated anti-termination at the lambdaP(R') promoter. At this promoter, interaction of sigma70 with the non-template strand of the initial transcribed sequence (ITS) is required to promote the pause prerequisite for anti-termination. These mutants prevent pausing because they are defective in this recognition event. By analogy, we suggest that interaction of sigma70 with the non-template strand of the anti-ITS is required for function of this portable element, thus explaining why neither sigma32 nor the Region 2.2 sigma70 mutants mediate anti-function. Support for the analogy with the lambdaP(R') promoter comes from preliminary experiments suggesting that the anti-ITS, like the lambdaP(R') ITS, is bipartite.
Affiliation(s)
- C L Chan
- Department of Stomatology, University of California, San Francisco, 94143, USA
43
Forward KR, Willey BM, Low DE, McGeer A, Kapala MA, Kapala MM, Burrows LL. Molecular mechanisms of cefoxitin resistance in Escherichia coli from the Toronto area hospitals. Diagn Microbiol Infect Dis 2001; 41:57-63. [PMID: 11687315 DOI: 10.1016/s0732-8893(01)00278-4]
Abstract
Escherichia coli may become resistant to cephamycins and oxyimino cephalosporins by virtue of promoter and attenuator mutations or because they have acquired mobilized beta-lactamases from other gram-negative bacilli. This study examined Canadian strains to determine how often promoter and/or attenuator mutations account for this mechanism of resistance and the extent to which clonal spread of these organisms has occurred. We sequenced the promoter and attenuator region of 30 strains resistant to cefoxitin. Twenty-two strains had promoter mutations, 26 had attenuator mutations. Most promoter mutations resulted either in a change in the -35 promoter region towards the E. coli sigma 70 consensus sequence or in the creation of a new consensus hexamer upstream. Eight strains had mutations that increased the typical ampC 16-nucleotide spacer region to the consensus 17- or an 18-nucleotide sequence. Of the attenuator mutations, most did not substantially affect the attenuator loop. Several of the mutations have previously been described in South Africa, Scandinavia, and France. There was evidence that strains bearing certain mutations were clonally disseminated; however, the 11 strains bearing a complex set of attenuator mutations were not. The majority of cephamycin-resistant E. coli strains in Toronto have attenuator and/or promoter mutations upstream of the chromosomal ampC gene.
Collapse
Affiliation(s)
- K R Forward
- Dalhousie University, Halifax, Nova Scotia, Canada.
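The spacer measurement mentioned in the abstract above (the distance between the -35 and -10 hexamers) can be illustrated with a minimal Python sketch. It assumes exact matches to the widely cited E. coli sigma 70 consensus hexamers TTGACA (-35) and TATAAT (-10); the function name `spacer_length` and the toy sequences are hypothetical and are not taken from the study, and real ampC promoters deviate from the consensus, so exact matching is a simplification.

```python
def spacer_length(seq, box35="TTGACA", box10="TATAAT"):
    """Return the number of bases between the first -35 box and the
    next -10 box downstream of it, or None if either box is missing.

    Exact string matching only; this is an illustrative simplification.
    """
    seq = seq.upper()
    start35 = seq.find(box35)
    if start35 == -1:
        return None
    end35 = start35 + len(box35)
    start10 = seq.find(box10, end35)
    if start10 == -1:
        return None
    return start10 - end35


# Toy sequences (hypothetical): a 16-base and a 17-base spacer.
toy16 = "GG" + "TTGACA" + "C" * 16 + "TATAAT" + "GG"
toy17 = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"

print(spacer_length(toy16))
print(spacer_length(toy17))
```

Under these assumptions, a single-base insertion between the two hexamers moves the spacer from 16 to 17 bases, which is the kind of change the abstract reports.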
44
Argaman L, Hershberg R, Vogel J, Bejerano G, Wagner EG, Margalit H, Altuvia S. Novel small RNA-encoding genes in the intergenic regions of Escherichia coli. Curr Biol 2001; 11:941-50. [PMID: 11448770 DOI: 10.1016/s0960-9822(01)00270-6]
Abstract
BACKGROUND: Small, untranslated RNA molecules were identified initially in bacteria, but examples can be found in all kingdoms of life. These RNAs carry out diverse functions, and many of them are regulators of gene expression. Genes encoding small, untranslated RNAs are difficult to detect experimentally or to predict by traditional sequence analysis approaches. Thus, in spite of the rising recognition that such RNAs may play key roles in bacterial physiology, many of the small RNAs known to date were discovered fortuitously. RESULTS: To search the Escherichia coli genome sequence for genes encoding small RNAs, we developed a computational strategy employing transcription signals and genomic features of the known small RNA-encoding genes. The search, for which we used rather restrictive criteria, has led to the prediction of 24 putative sRNA-encoding genes, of which 23 were tested experimentally. Here we report on the discovery of 14 genes encoding novel small RNAs in E. coli and their expression patterns under a variety of physiological conditions. Most of the newly discovered RNAs are abundant. Interestingly, the expression level of a significant number of these RNAs increases upon entry into stationary phase. CONCLUSIONS: Based on our results, we conclude that small RNAs are much more widespread than previously imagined and that these versatile molecules may play important roles in the fine-tuning of cell responses to changing environments.
Affiliation(s)
- L Argaman
- Department of Molecular Genetics and Biotechnology, The Hebrew University-Hadassah Medical School, 91120, Jerusalem, Israel
45
Abstract
Very little is understood of the structure of mycoplasma promoters, and this limits interpretation of genomic sequence data in these species. In this study the transcriptional start points of 22 genes of Mycoplasma pneumoniae were identified and the regions 5' to the start point compared. Although a strong consensus -10 region could be seen, there was only a weak consensus in the -35 region. A high proportion of transcripts had heterogeneous 5'-ends, and characterisation of the sequence of the 5'-ends of two transcripts established that the heterogeneity was derived from initiation of transcription at reduced levels between 1 and 4 bases 5' to the major starting point. In addition to this apparently unique feature, a high proportion of transcripts lacked a 5' untranslated leader region that could contain a ribosomal binding site. Such leaderless transcripts are seen rarely in other bacterial species. Although the promoter regions for a number of members of lipoprotein multigene families were examined, no obvious explanation for regulation of expression was apparent. Using the data from this study, an improved matrix for prediction of M. pneumoniae promoters was derived. Application of this matrix to the sequences immediately 3' and 5' to each predicted start codon in the genome suggested that most M. pneumoniae transcriptional start points were likely to occur between 5 and 30 bases 5' to the start codon.
Affiliation(s)
- J Weiner
- Zentrum für Molekulare Biologie Heidelberg, Mikrobiologie, Universität Heidelberg, 69120 Heidelberg, Germany
46
Shtatland T, Gill SC, Javornik BE, Johansson HE, Singer BS, Uhlenbeck OC, Zichi DA, Gold L. Interactions of Escherichia coli RNA with bacteriophage MS2 coat protein: genomic SELEX. Nucleic Acids Res 2000; 28:E93. [PMID: 11058143 PMCID: PMC113162 DOI: 10.1093/nar/28.21.e93]
Abstract
Genomic SELEX is a method for studying the network of nucleic acid-protein interactions within any organism. Here we report the discovery of several interesting and potentially biologically important interactions using genomic SELEX. We have found that bacteriophage MS2 coat protein binds several Escherichia coli mRNA fragments more tightly than it binds the natural, well-studied, phage mRNA site. MS2 coat protein binds mRNA fragments from rffG (involved in formation of lipopolysaccharide in the bacterial outer membrane), ebgR (lactose utilization repressor), as well as from several other genes. Genomic SELEX may yield experimentally induced artifacts, such as molecules in which the fixed sequences participate in binding. We describe several methods (annealing of oligonucleotides complementary to fixed sequences or switching fixed sequences) to eliminate some, or almost all, of these artifacts. Such methods may be useful tools for both randomized sequence SELEX and genomic SELEX.
MESH Headings
- Artifacts
- Bacteriophages
- Base Sequence
- Binding Sites
- Capsid/metabolism
- Capsid Proteins
- Computational Biology
- Consensus Sequence
- Genes, Bacterial/genetics
- Genome, Bacterial
- Genomic Library
- Nucleic Acid Conformation
- Nucleic Acid Hybridization
- Oligodeoxyribonucleotides/genetics
- Oligodeoxyribonucleotides/metabolism
- Polymerase Chain Reaction
- Protein Binding
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Bacterial/metabolism
- RNA, Messenger/chemistry
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- RNA, Viral/genetics
- RNA, Viral/metabolism
- RNA-Binding Proteins/metabolism
- Sensitivity and Specificity
- Substrate Specificity
- Transcription, Genetic
Affiliation(s)
- T Shtatland
- Department of Molecular, University of Colorado, Boulder, CO 80309-0347, USA
47
Holt PJ, Williams RE, Jordan KN, Lowe CR, Bruce NC. Cloning, sequencing and expression in Escherichia coli of the primary alcohol dehydrogenase gene from Thermoanaerobacter ethanolicus JW200. FEMS Microbiol Lett 2000; 190:57-62. [PMID: 10981690 DOI: 10.1111/j.1574-6968.2000.tb09262.x]
Abstract
The structural gene, adhA, for a thermostable primary alcohol dehydrogenase was cloned from Thermoanaerobacter ethanolicus JW200. Constitutive expression from its own promoter was observed in Escherichia coli. The nucleotide sequence of adhA corresponded to an open reading frame of 1197 bp, encoding a polypeptide of 399 amino acids with a calculated Mr of 43 192. Amino acid sequence analysis showed 67-69% identity with alcohol dehydrogenases from two archaeal species and 29-37% identity with bacterial type III alcohol dehydrogenases. This represents the first reported cloning of an alcohol dehydrogenase from a bacterial species that is both thermostable and active against primary long-chain alcohols.
MESH Headings
- Alcohol Dehydrogenase/genetics
- Alcohol Dehydrogenase/isolation & purification
- Alcohol Dehydrogenase/metabolism
- Bacteria, Anaerobic/enzymology
- Bacteria, Anaerobic/genetics
- Bacteria, Anaerobic/growth & development
- Base Sequence
- Cloning, Molecular
- Coculture Techniques
- Enzyme Stability
- Escherichia coli/enzymology
- Escherichia coli/genetics
- Genes, Bacterial
- Gram-Positive Asporogenous Rods, Irregular/enzymology
- Gram-Positive Asporogenous Rods, Irregular/genetics
- Gram-Positive Asporogenous Rods, Irregular/growth & development
- Molecular Sequence Data
- Promoter Regions, Genetic
- Sequence Analysis, DNA
Affiliation(s)
- P J Holt
- Institute of Biotechnology, University of Cambridge, UK
48
Lee S, Garfinkel MD. Characterization of Drosophila OVO protein DNA binding specificity using random DNA oligomer selection suggests zinc finger degeneration. Nucleic Acids Res 2000; 28:826-34. [PMID: 10637336 PMCID: PMC102545 DOI: 10.1093/nar/28.3.826]
Abstract
The Drosophila melanogaster ovo locus codes for several tissue- and stage-specific proteins that all possess a common C-terminal array of four C2H2 zinc fingers. Three fingers conform to the motif framework and are evolutionarily conserved; the fourth diverges considerably. The ovo genetic function affects germ cell viability, sex identity and oogenesis, while the overlapping svb function is a key selector for epidermal structures under the control of wnt and EGF receptor signaling. We isolated synthetic DNA oligomers bound by the OVO zinc finger array from a high-complexity starting population and derived a statistically significant 9 bp DNA consensus sequence, which is nearly identical to a consensus derived from several Drosophila genes known or suspected to be regulated by the ovo function in vivo. The DNA consensus recognized by Drosophila OVO protein is atypical for zinc finger proteins in that it does not conform to many of the 'rules' for the interaction of amino acid contact residues and DNA bases. Additionally, our results suggest that only three of the OVO zinc fingers contribute to DNA-binding specificity.
Affiliation(s)
- S Lee
- Division of Biology, Illinois Institute of Technology, Chicago, IL 60616, USA
49
Abstract
This paper presents a survey of currently available mathematical models and algorithmic methods for identifying promoter sequences. The methods concern both searching a genome for a previously defined consensus and extracting a consensus from a set of sequences. Such methods were often tailored for either eukaryotes or prokaryotes, although this does not preclude use of the same method for both types of organisms. The survey therefore covers all methods; however, emphasis is placed on prokaryotic promoter sequence identification. Illustrative applications of the main extraction algorithms are given for three bacteria.
Affiliation(s)
- A Vanet
- Institut de biologie physico-chimique, Paris, France
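As a rough illustration of the first class of methods named in the survey abstract above (searching a sequence for a previously defined consensus), the following Python sketch slides a window along a sequence and reports positions that match a consensus within a mismatch tolerance. It is not one of the surveyed algorithms; the function name `find_consensus_hits`, the tolerance parameter, and the toy sequence are all hypothetical, and the -10 hexamer TATAAT is used only as a familiar example.

```python
def find_consensus_hits(seq, consensus, max_mismatches=1):
    """Return (position, window, mismatches) for every window of seq
    that differs from consensus at no more than max_mismatches positions.

    Positions are 0-based offsets into seq. Only the given strand is scanned.
    """
    seq = seq.upper()
    consensus = consensus.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Hamming distance between the window and the consensus
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Toy sequence (hypothetical): one exact match and one single-mismatch match.
toy = "CCCTATAATCCCTACAATCCC"
for hit in find_consensus_hits(toy, "TATAAT", max_mismatches=1):
    print(hit)
```

A weight-matrix scorer would replace the mismatch count with a per-position score; the fixed-tolerance version is shown here only because it is the simplest form of consensus search.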
50
Huang X, Gaballa A, Cao M, Helmann JD. Identification of target promoters for the Bacillus subtilis extracytoplasmic function sigma factor, sigma W. Mol Microbiol 1999; 31:361-71. [PMID: 9987136 DOI: 10.1046/j.1365-2958.1999.01180.x]
Abstract
The Bacillus subtilis sigW gene encodes an extracytoplasmic function (ECF) sigma factor that is expressed in early stationary phase from a sigW-dependent autoregulatory promoter, PW. Using a consensus-based search procedure, we have identified 15 operons preceded by promoters similar in sequence to PW. At least 14 of these promoters are dependent on sigma W both in vivo and in vitro as judged by lacZ reporter fusions, run-off transcription assays and nucleotide resolution start site mapping. We conclude that sigma W controls a regulon of more than 30 genes, many of which encode membrane proteins of unknown function. The sigma W regulon includes a penicillin binding protein (PBP4*) and a co-transcribed amino acid racemase (RacX), homologues of signal peptide peptidase (YteI), flotillin (YuaG), ABC transporters (YknXYZ), non-haem bromoperoxidase (YdjP), epoxide hydrolase (YfhM) and three small peptides with structural similarities to bacteriocin precursor polypeptides. We suggest that sigma W activates a large stationary-phase regulon that functions in detoxification, production of anti-microbial compounds or both.
Affiliation(s)
- X Huang
- Section of Microbiology, Cornell University, Ithaca, NY 14853-8101, USA