1
Benvenuti JL, Casa PL, Pessi de Abreu F, Martinez GS, de Avila E Silva S. From straight to curved: A historical perspective of DNA shape. Prog Biophys Mol Biol 2024; 193:46-54. [PMID: 39260792 DOI: 10.1016/j.pbiomolbio.2024.09.002] [Received: 06/13/2024] [Revised: 07/30/2024] [Accepted: 09/04/2024] [Indexed: 09/13/2024]
Abstract
DNA is the macromolecule responsible for storing the genetic information of a cell, and it has intrinsic properties such as deformability, stability and curvature. DNA curvature plays an important role in gene transcription and, consequently, in the subsequent production of proteins, a fundamental cellular process. With recent advances in bioinformatics and theoretical biology, it has become possible to analyze and understand the involvement of DNA curvature as a discriminatory characteristic of gene promoter regions. These regions act as sites where RNA polymerase (RNAP) binds to initiate transcription. This review describes how curvature arises and highlights its importance in predicting promoters. It also discusses the potential of DNA curvature as a distinguishing feature for promoter prediction tools and outlines the calculation procedures described by other researchers. This work may support further studies directed towards the enhancement of promoter prediction software.
Affiliation(s)
- Jean Lucas Benvenuti
  - Universidade de Caxias do Sul, Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil
- Pedro Lenz Casa
  - Universidade de Caxias do Sul, Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil
- Fernanda Pessi de Abreu
  - Universidade de Caxias do Sul, Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil; Instituto de Biociências, Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
2
Coppens L, Wicke L, Lavigne R. SAPPHIRE.CNN: Implementation of dRNA-seq-driven, species-specific promoter prediction using convolutional neural networks. Comput Struct Biotechnol J 2022; 20:4969-4974. [PMID: 36147675 PMCID: PMC9478156 DOI: 10.1016/j.csbj.2022.09.006] [Received: 06/06/2022] [Revised: 09/03/2022] [Accepted: 09/05/2022] [Indexed: 11/22/2022]
Abstract
Data availability is a consistent bottleneck for the development of bacterial species-specific promoter prediction software. In this work we leverage genome-wide promoter datasets generated with dRNA-seq in the Gram-negative bacteria Pseudomonas aeruginosa and Salmonella enterica for promoter prediction. Convolutional neural networks are presented as an optimal architecture for model training and are further modified and tailored for promoter prediction. The resulting predictors reach high binary accuracies (95% and 94.9%) on test sets and outperform each other when predicting promoters in their associated species. SAPPHIRE.CNN is available online and can also be downloaded to run locally. Our results indicate a dependency of binary promoter classification on an organism’s GC content and a decreased performance of our classifiers on genera they were not trained for, further supporting the need for dedicated, species-specific promoter classification tools.
Affiliation(s)
- Lucas Coppens
  - Department of Bioengineering and Imperial College Centre for Synthetic Biology, Imperial College London, London, UK; Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21, Box 2462, 3001 Leuven, Belgium
- Laura Wicke
  - Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21, Box 2462, 3001 Leuven, Belgium; Institute for Molecular Infection Biology (IMIB), Medical Faculty, University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
- Rob Lavigne
  - Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21, Box 2462, 3001 Leuven, Belgium
3
Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T, Webb GI, Zou Q, Coin LJM, Song J. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction. Brief Bioinform 2022; 23:6502561. [PMID: 35021193 PMCID: PMC8921625 DOI: 10.1093/bib/bbab551] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 11/30/2021] [Indexed: 01/13/2023]
Abstract
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date, benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to assist the research community to assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoter, 61 for eukaryotic promoter, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.
Affiliation(s)
- Cangzhi Jia
  - Corresponding authors: Jiangning Song, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia. E-mail: ; Lachlan J.M. Coin, Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia. E-mail: ; Quan Zou, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China. E-mail: ; Cangzhi Jia, School of Science, Dalian Maritime University, Dalian 116026, China. E-mail:
- Geoffrey I Webb
  - Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Quan Zou
- Lachlan J M Coin
- Jiangning Song
4
Orlov MA, Sorokin AA. DNA sequence, physics, and promoter function: Analysis of high-throughput data on T7 promoter variants activity. J Bioinform Comput Biol 2021; 18:2040001. [PMID: 32404013 DOI: 10.1142/s0219720020400016] [Indexed: 11/18/2022]
Abstract
RNA polymerase/promoter recognition represents a basic problem of molecular biology. Decades of effort have been made in the area, and yet certain challenges persist. The use of suitable model subjects is pivotal for this research. The system of T7 bacteriophage RNA polymerase and the native T7 promoter is an exceptional example for the purpose. Moreover, it has been studied the most and has been successfully applied in biotechnology and bioengineering, owing to both the structural simplicity and the high specificity of this molecular duo. Despite the highly similar sequences of distinct native T7 promoters, T7 RNA polymerase is capable of binding the respective promoter in a highly specific and adjustable manner. One explanation is that the process relies primarily on DNA physical properties rather than nucleotide sequence. Here, we address the issue by analyzing the massive dataset recently published by Komura and colleagues. That initial study employed Next Generation Sequencing (NGS) to quantify the activity of promoter variants, including ones with multiple substitutions. Our analysis found a substantial bias in the simultaneous occurrence of single-nucleotide sequence alterations: the highest rate of co-occurrence was observed within the specificity loop of the binding region, and the lowest in the initiation region of the promoter. If both the location and the kind of nucleotides involved in a replacement (both initial and resulting) are taken into consideration, N to A substitutions are the most preferred across the whole 19 b.p.-long sequence. At the same time, N to C substitutions are tolerated only at a crucial position in the recognition loop of the binding region, and N to G substitutions are uniformly the least tolerated. The complete set of variants was then split into groups with mutations (1) exclusively in the binding region; (2) exclusively in the melting region; (3) in both regions. Of these three groups, the second comprises extremely few variants (46, versus over one thousand and over six thousand in the other two groups). Yet these are all promoters with substantial to high activity. Group two appeared heterogeneous by primary sequence; indeed, upon further subdivision into above-average and below-average activity subgroups, the first was found to comprise promoters with negligible conservation at the -2 position of the melting region, and the second was hardly conserved in this region at all. When attention is drawn to the perfect consensus sequence of the class III T7 promoter with the -2 nucleotide randomized (all four nucleotides are present in one to several copies in the previously published source dataset), the picture becomes even more pronounced. We therefore suggest that mutations at this position do not cause significant changes in promoter activity. At the same time, such modifications dramatically change the DNA physical properties calculated in our study (namely electrostatic potential and propensity to bend). One possible suggestion is that the -2 nucleotide might function as a generic switch; if so, the substitution -2A to -2T has important regulatory consequences. Notably, the -2 b.p. is the most evidently different nucleotide between class II and class III promoters of the T7 genome, and it also distinguishes the class III promoter of the T7 genome from promoters of its relative but reproductively isolated bacteriophage T3. In other words, it appears feasible that mutation at the -2 nucleotide does not impede promoter activity yet alters the promoter's physical properties, thus affecting differential RNA polymerase/promoter interaction.
Affiliation(s)
- Mikhail A Orlov
  - Institute of Cell Biophysics of RAS, 3 Institutskaya str., Poushchino, 142290, Russia
- Anatoly A Sorokin
  - Institute of Cell Biophysics of RAS, 3 Institutskaya str., Poushchino, 142290, Russia
5
Ma M, Welch RD, Garza AG. The σ54 system directly regulates bacterial natural product genes. Sci Rep 2021; 11:4771. [PMID: 33637792 PMCID: PMC7910581 DOI: 10.1038/s41598-021-84057-4] [Received: 11/17/2020] [Accepted: 02/05/2021] [Indexed: 01/31/2023]
Abstract
Bacterial-derived polyketide and non-ribosomal peptide natural products are crucial sources of therapeutics and yet little is known about the conditions that favor activation of natural product genes or the regulatory machinery controlling their transcription. Recent findings suggest that the σ54 system, which includes σ54-loaded RNA polymerase and transcriptional activators called enhancer binding proteins (EBPs), might be a common regulator of natural product genes. Here, we explored this idea by analyzing a selected group of putative σ54 promoters identified in Myxococcus xanthus natural product gene clusters. We show that mutations in putative σ54-RNA polymerase binding regions and in putative Nla28 EBP binding sites dramatically reduce in vivo promoter activities in growing and developing cells. We also show in vivo promoter activities are reduced in a nla28 mutant, that Nla28 binds to wild-type fragments of these promoters in vitro, and that in vitro binding is lost when the Nla28 binding sites are mutated. Together, our results indicate that M. xanthus uses σ54 promoters for transcription of at least some of its natural product genes. Interestingly, the vast majority of experimentally confirmed and putative σ54 promoters in M. xanthus natural product loci are located within genes and not in intergenic sequences.
Affiliation(s)
- Muqing Ma
  - Department of Biology, Syracuse University, 107 College Place, Syracuse, NY 13244, USA
- Roy D. Welch
  - Department of Biology, Syracuse University, 107 College Place, Syracuse, NY 13244, USA
- Anthony G. Garza
  - Department of Biology, Syracuse University, 107 College Place, Syracuse, NY 13244, USA
6
Mishra A, Dhanda S, Siwach P, Aggarwal S, Jayaram B. A novel method SEProm for prokaryotic promoter prediction based on DNA structure and energetics. Bioinformatics 2020; 36:2375-2384. [PMID: 31909789 DOI: 10.1093/bioinformatics/btz941] [Received: 05/24/2019] [Revised: 11/08/2019] [Accepted: 01/02/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Despite conservation in general architecture of promoters and protein-DNA interaction interface of RNA polymerases among various prokaryotes, identification of promoter regions in the whole genome sequences remains a daunting challenge. The available tools for promoter prediction do not seem to address the problem satisfactorily, apparently because the biochemical nature of promoter signals is yet to be understood fully. Using 28 structural and 3 energetic parameters, we found that prokaryotic promoter regions have a unique structural and energy state, quite distinct from that of coding regions and the information for this signature state is in-built in their sequences. We developed a novel promoter prediction tool from these 31 parameters using various statistical techniques. RESULTS Here, we introduce SEProm, a novel tool that is developed by studying and utilizing the in-built structural and energy information of DNA sequences, which is applicable to all prokaryotes including archaea. Compared to five most recent, diverged and current best available tools, SEProm performs much better, predicting promoters with an 'F-value' of 82.04 and 'Precision' of 81.08. The next best 'F-value' was obtained with PromPredict (72.14) followed by BProm (68.37). On the basis of 'Precision' value, the next best 'Precision' was observed for Pepper (75.39) followed by PromPredict (72.01). SEProm maintained the lead even when comparison was done on two test organisms (not involved in training for SEProm). AVAILABILITY AND IMPLEMENTATION The software is freely available with easy to follow instructions (www.scfbio-iitd.res.in/software/TSS_Predict.jsp). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
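The 'F-value' and 'Precision' figures quoted in this abstract are confusion-matrix metrics. The following is a minimal sketch, assuming the F-value is the usual F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name and the counts are hypothetical and are not taken from the SEProm paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) as percentages from raw prediction counts.

    tp: predicted promoters that are real promoters
    fp: predicted promoters that are not real promoters
    fn: real promoters the tool failed to predict
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Assumption: F-value = harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f1


# Hypothetical counts chosen only to illustrate the arithmetic.
p, r, f = precision_recall_f1(tp=80, fp=20, fn=20)
```

Because F1 combines precision and recall, two tools with similar precision can still differ noticeably in F-value when one of them misses more true promoters.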
Affiliation(s)
- Akhilesh Mishra
  - Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
- Sahil Dhanda
  - Supercomputing Facility for Bioinformatics & Computational Biology
- Priyanka Siwach
  - Supercomputing Facility for Bioinformatics & Computational Biology; Department of Biotechnology, Chaudhary Devi Lal University, Sirsa 125055, India
- Shruti Aggarwal
  - Supercomputing Facility for Bioinformatics & Computational Biology
- B Jayaram
  - Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India; Department of Chemistry, Indian Institute of Technology, New Delhi 110016, India
7
Chedin F, Benham CJ. Emerging roles for R-loop structures in the management of topological stress. J Biol Chem 2020; 295:4684-4695. [PMID: 32107311 DOI: 10.1074/jbc.rev119.006364] [Indexed: 01/14/2023]
Abstract
R-loop structures are a prevalent class of alternative non-B DNA structures that form during transcription upon invasion of the DNA template by the nascent RNA. R-loops form universally in the genomes of organisms ranging from bacteriophages, bacteria, and yeasts to plants and animals, including mammals. A growing body of work has linked these structures to both physiological and pathological processes, in particular to genome instability. The rising interest in R-loops is placing new emphasis on understanding the fundamental physicochemical forces driving their formation and stability. Pioneering work in Escherichia coli revealed that DNA topology, in particular negative DNA superhelicity, plays a key role in driving R-loops. A clear role for DNA sequence was later uncovered. Here, we review and synthesize available evidence on the roles of DNA sequence and DNA topology in controlling R-loop formation and stability. Factoring in recent developments in R-loop modeling and single-molecule profiling, we propose a coherent model accounting for the interplay between DNA sequence and DNA topology in driving R-loop structure formation. This model reveals R-loops in a new light as powerful and reversible topological stress relievers, an insight that significantly expands the repertoire of R-loops' potential biological roles under both normal and aberrant conditions.
Affiliation(s)
- Frederic Chedin
  - Department of Molecular and Cellular Biology, University of California, Davis, California 95616; Genome Center, University of California, Davis, California 95616
- Craig J Benham
  - Genome Center, University of California, Davis, California 95616; Departments of Mathematics and Biomedical Engineering, University of California, Davis, California 95616
8
Orlov M, Garanina I, Fisunov GY, Sorokin A. Comparative Analysis of Mycoplasma gallisepticum vlhA Promoters. Front Genet 2018; 9:569. [PMID: 30519256 PMCID: PMC6258824 DOI: 10.3389/fgene.2018.00569] [Received: 08/08/2018] [Accepted: 11/06/2018] [Indexed: 12/15/2022]
Abstract
Mycoplasma gallisepticum is an intracellular parasite of the class Mollicutes that affects the respiratory tract of poultry. M. gallisepticum features numerous variable lipoprotein hemagglutinin genes (vlhA) that play a role in immune escape. The vlhA promoters have a set of distinct properties in comparison to promoters of the other genes. They carry a variable GAA repeat region approximately 40 nt upstream of the transcription start site, and they have been considered active only in the presence of exactly 12 GAA repeats. The mechanisms of vlhA expression regulation and GAA number variation have not been described. Here we tried to understand these mechanisms using different computational methods. We conducted a comparative analysis among several M. gallisepticum strains. Nucleotide sequence analysis showed the presence of highly conserved regions flanking the repeated trinucleotides that are not linked to GAA number variation. vlhA genes with 12 GAA repeats and their orthologs in 12 M. gallisepticum strains are more conserved than other vlhA genes and have a narrower GAA number distribution. We conducted a comparative analysis of physicochemical profiles of M. gallisepticum vlhA and sigma-70 promoters. Stress-induced duplex destabilization (SIDD) profiles showed that the sigma-70 group is characterized by the sharp maxima common to prokaryotic promoters, while vlhA promoters are hardly destabilized, with the region between the GAA repeats and the transcription start site having zero opening probability. Electrostatic potential profiles of vlhA promoters indicate the presence of distinct patterns that appear to govern the initial stages of specific DNA-protein recognition. Open state dynamics profiles of vlhA demonstrate a pattern that might facilitate transcription bubble formation. The data obtained could be the basis for experimental identification of the mechanisms of phase variation in M. gallisepticum.
Affiliation(s)
- Mikhail Orlov
  - Institute of Cell Biophysics, Russian Academy of Sciences, Pushchino, Russia
- Irina Garanina
  - Federal Research and Clinical Center of Physical-Chemical Medicine, Federal Medical-Biological Agency, Moscow, Russia
- Gleb Y Fisunov
  - Federal Research and Clinical Center of Physical-Chemical Medicine, Federal Medical-Biological Agency, Moscow, Russia
- Anatoly Sorokin
  - Institute of Cell Biophysics, Russian Academy of Sciences, Pushchino, Russia
9
He W, Jia C, Duan Y, Zou Q. 70ProPred: a predictor for discovering sigma70 promoters based on combining multiple features. BMC Syst Biol 2018; 12:44. [PMID: 29745856 PMCID: PMC5998878 DOI: 10.1186/s12918-018-0570-1] [Indexed: 12/12/2022]
Abstract
BACKGROUND The promoter is an important sequence regulation element that is in charge of gene transcription initiation. In prokaryotes, σ70 promoters regulate the transcription of most genes. Promoter recognition has been a crucial part of gene structure recognition. It is also the core issue in constructing gene transcriptional regulation networks. Even with the successful completion of genome sequencing for an increasing number of microbial species, the accurate identification of σ70 promoter regions in DNA sequences is not easy. RESULTS In order to improve the prediction accuracy of σ70 promoters in prokaryotes, a promoter recognition model, 70ProPred, was established. In this work, two sequence-based features, position-specific trinucleotide propensity based on single-stranded characteristic (PSTNPss) and electron-ion potential values for trinucleotides (PseEIIP), were assessed to build the best prediction model. It was found that 79 features of PSTNPss combined with 64 features of PseEIIP obtained the best performance for σ70 promoter identification, with an accuracy of 95.56% and a Matthews correlation coefficient (MCC) of 0.90. CONCLUSION The jackknife tests showed that 70ProPred outperforms the existing σ70 promoter prediction approaches in terms of accuracy and stability. Additionally, this approach can be extended to predict promoters of other species. To assist experimental biologists, an online web server for the proposed method was established, which is freely available at http://server.malab.cn/70ProPred/.
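The 64 PseEIIP features mentioned in this abstract correspond to the 64 possible trinucleotides. The following is a minimal sketch, assuming the commonly used formulation in which each trinucleotide's value is the sum of its three nucleotide EIIP values multiplied by its normalized frequency in the sequence. The abstract does not spell out this formula, the per-nucleotide values are the commonly cited ones, and the function name `pse_eiip` and the example sequence are hypothetical.

```python
from itertools import product

# Commonly cited per-nucleotide EIIP values (assumption: the same values
# are used by the predictor described in the abstract).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# All 64 trinucleotides in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def pse_eiip(seq: str) -> list[float]:
    """Return a 64-dimensional vector: EIIP(trinucleotide) * frequency."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    # Slide a window of width 3 over the sequence, skipping ambiguous bases.
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [
        sum(EIIP[base] for base in tri) * counts[tri] / total
        for tri in TRINUCLEOTIDES
    ]


# Hypothetical example sequence, not taken from the paper's dataset.
features = pse_eiip("TTGACAATTAATCATCGGCTCGTATAATG")
```

The vector has a fixed length regardless of sequence length, so it can be concatenated with other features before training a classifier.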
Affiliation(s)
- Wenying He
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300072 China
- Cangzhi Jia
  - Department of Mathematics, Dalian Maritime University, Dalian, 116026 China
- Yucong Duan
  - College of Information and Technology, Hainan University, Haikou, 570228 China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300072 China
10
Ryasik A, Orlov M, Zykova E, Ermak T, Sorokin A. Bacterial promoter prediction: Selection of dynamic and static physical properties of DNA for reliable sequence classification. J Bioinform Comput Biol 2018; 16:1840003. [DOI: 10.1142/s0219720018400036] [Indexed: 11/18/2022]
Abstract
Predicting promoter activity of DNA fragment is an important task for computational biology. Approaches using physical properties of DNA to predict bacterial promoters have recently gained a lot of attention. To select an adequate set of physical properties for training a classifier, various characteristics of DNA molecule should be taken into consideration. Here, we present a systematic approach that allows us to select less correlated properties for classification by means of both correlation and cophenetic coefficients as well as concordance matrices. To prove this concept, we have developed the first classifier that uses not only sequence and static physical properties of DNA fragment, but also dynamic properties of DNA open states. Therefore, the best performing models with accuracy values up to 90% for all types of sequences were obtained. Furthermore, we have demonstrated that the classifier can serve as a reliable tool enabling promoter DNA fragments to be distinguished from promoter islands despite the similarity of their nucleotide sequences.
Affiliation(s)
- Artem Ryasik
  - Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
- Mikhail Orlov
  - Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
- Evgenia Zykova
  - Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
  - Department of Applied Research Informatization, State Institute of Information Technologies and Telecommunications (SIIT&T Informika), per. Brusov 21 st.2, Moscow, 125009, Russia
- Timofei Ermak
  - Laboratory of Molecular Genetics Systems, Institute of Cytology and Genetics, pr. Akademika Lavrentyeva 10, Novosibirsk 630090, Russia
- Anatoly Sorokin
  - Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
11
Umarov RK, Solovyev VV. Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks. PLoS One 2017; 12:e0171410. [PMID: 28158264 PMCID: PMC5291440 DOI: 10.1371/journal.pone.0171410] [Received: 10/16/2016] [Accepted: 01/20/2017] [Indexed: 11/18/2022]
Abstract
Accurate computational identification of promoters remains a challenge, as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we utilize Convolutional Neural Networks (CNN) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build their predictive models. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that a CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The Bacillus subtilis promoter identification CNN model achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs for identification of two well-known promoter classes (TATA and non-TATA promoters). CNN models recognize these complex functional regions well. For human promoters, the Sn/Sp/CC of prediction reached 0.95/0.98/0.90 on TATA and 0.90/0.98/0.89 on non-TATA promoter sequences, respectively. For Arabidopsis we observed Sn/Sp/CC of 0.95/0.97/0.91 (TATA) and 0.94/0.94/0.86 (non-TATA). Thus, the developed CNN models, implemented in the CNNProm program, demonstrated the ability of the deep learning approach to grasp complex promoter sequence characteristics and achieve significantly higher accuracy compared to the previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. As the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in sequences of many other, and especially newly sequenced, genomes. The CNNProm program is available to run at the web server http://www.softberry.com.
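The Sn, Sp and CC values reported in this abstract can be computed from a confusion matrix. The following is a minimal sketch, assuming Sn is sensitivity, Sp is specificity and CC is the Matthews correlation coefficient. The abstract does not define these abbreviations, and the function name and counts are hypothetical.

```python
import math


def sn_sp_cc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, correlation coefficient)."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # fraction of promoters found
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # fraction of non-promoters rejected
    # Assumption: CC is the Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, cc


# Hypothetical balanced test set: 100 promoters and 100 non-promoters.
sn, sp, cc = sn_sp_cc(tp=90, tn=96, fp=4, fn=10)
```

Unlike Sn and Sp, CC depends on the ratio of positive to negative examples, so the same Sn/Sp pair can yield different CC values on test sets with different class balance.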
Affiliation(s)
- Ramzan Kh. Umarov
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
12
|
Nikolic M, Stankovic T, Djordjevic M. Contribution of bacterial promoter elements to transcription start site detection accuracy. J Bioinform Comput Biol 2016; 15:1650038. [PMID: 27908222 DOI: 10.1142/s0219720016500384] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Accurately detecting transcription start sites (TSS) is a starting point for understanding gene transcription, and an important ingredient in a number of applications necessary for functional gene annotation, such as gene and operon prediction. Available methods for TSS detection in bacteria use very different descriptions of the bacterial promoter structure, and all of them show low accuracy. It is therefore unclear which promoter features should be included in TSS recognition, and how their accuracy impacts the search. We here address this question for [Formula: see text] and [Formula: see text] (an alternative σ factor) promoters in E. coli. We find that the −35 element, which is considered exchangeable and is often not included in TSS search, contributes to the search accuracy equally (for [Formula: see text]) or more (for [Formula: see text]) than the ubiquitous −10 element. Surprisingly, the sequence of the spacer between the −35 and −10 promoter elements, which is commonly included in TSS detection, significantly decreases the search accuracy for [Formula: see text] promoters. However, the spacer sequence improves the search accuracy for [Formula: see text] promoters, which we attribute to the presence of sequence conservation. Overall, there is as much as [Formula: see text]50% false positive reduction for optimally implemented promoter features in [Formula: see text], underlining the necessity for accurate promoter element alignments.
Affiliation(s)
- Milos Nikolic
- Faculty of Biology, University of Belgrade, Studentski trg 16 Belgrade, 11000, Serbia
- Tamara Stankovic
- Faculty of Biology, University of Belgrade, Studentski trg 16 Belgrade, 11000, Serbia; Interdisciplinary PhD program in Biophysics, University of Belgrade, Studentski trg 1, 11000, Serbia
- Marko Djordjevic
- Faculty of Biology, University of Belgrade, Studentski trg 16 Belgrade, 11000, Serbia
|
13
|
Kumar A, Manivelan V, Bansal M. Structural features of DNA are conserved in the promoter region of orthologous genes across different strains of Helicobacter pylori. FEMS Microbiol Lett 2016; 363:fnw207. [DOI: 10.1093/femsle/fnw207] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Accepted: 08/25/2016] [Indexed: 12/19/2022] Open
|
14
|
Abbas MM, Mohie-Eldin MM, EL-Manzalawy Y. Assessing the effects of data selection and representation on the development of reliable E. coli sigma 70 promoter region predictors. PLoS One 2015; 10:e0119721. [PMID: 25803493 PMCID: PMC4372424 DOI: 10.1371/journal.pone.0119721] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/12/2014] [Accepted: 01/26/2015] [Indexed: 11/27/2022] Open
Abstract
As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors, including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and use them to experimentally examine the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three criteria, a prediction model might perform very well in cross-validation experiments while its performance on independent test data is very poor. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data were obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers in both cross-validation and independent test performance evaluation experiments.
Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods.
Affiliation(s)
- Mostafa M. Abbas
- KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar
- Yasser EL-Manzalawy
- Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
- College of Information Sciences, Penn State University, University Park, United States of America
|
15
|
Lloréns-Rico V, Lluch-Senar M, Serrano L. Distinguishing between productive and abortive promoters using a random forest classifier in Mycoplasma pneumoniae. Nucleic Acids Res 2015; 43:3442-53. [PMID: 25779052 PMCID: PMC4402517 DOI: 10.1093/nar/gkv170] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 11/26/2014] [Accepted: 02/22/2015] [Indexed: 12/01/2022] Open
Abstract
Distinguishing between promoter-like sequences in bacteria that belong to true or abortive promoters, or to those that do not initiate transcription at all, is one of the important challenges in transcriptomics. To address this problem, we have studied the genome-reduced bacterium Mycoplasma pneumoniae, for which the RNAs associated with transcriptional start sites have been recently experimentally identified. We determined the contribution to transcription events of different genomic features: the –10, extended –10 and –35 boxes, the UP element, the bases surrounding the –10 box and the nearest-neighbor free energy of the promoter region. Using a random forest classifier and the aforementioned features transformed into scores, we could distinguish between true, abortive promoters and non-promoters with good –10 box sequences. The methods used in this characterization of promoters can be extended to other bacteria and have important applications for promoter design in bacterial genome engineering.
Affiliation(s)
- Verónica Lloréns-Rico
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Maria Lluch-Senar
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Luis Serrano
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain
|
16
|
Notari DL, Molin A, Davanzo V, Picolotto D, Ribeiro HG, Silva SDAE. IntergenicDB: a database for intergenic sequences. Bioinformation 2014; 10:381-3. [PMID: 25097383 PMCID: PMC4110431 DOI: 10.6026/97320630010381] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/08/2014] [Accepted: 05/24/2014] [Indexed: 12/01/2022] Open
Abstract
A whole genome contains not only coding regions but also non-coding regions. These are located between the end of a given coding region and the beginning of the following coding region. For this reason, the information about the gene regulation process lies in intergenic regions. There is no easy way to obtain intergenic regions from currently available databases. IntergenicDB was developed to integrate data on intergenic regions and their gene-related information from NCBI databases. The main goal of IntergenicDB is to offer a user-friendly database for intergenic sequences of bacterial genomes.
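The definition used here, an intergenic region as the stretch between the end of one coding region and the start of the next, can be sketched in a few lines. This is not the IntergenicDB implementation; the function name, the 1-based inclusive coordinate convention and the example coordinates are assumptions, and strand and genome circularity are ignored.

```python
def intergenic_regions(genes, min_length=1):
    """Return gaps between consecutive coding regions.

    genes: iterable of (start, end) pairs, 1-based inclusive, on one replicon.
    Overlapping or adjacent genes produce no gap.
    """
    gaps = []
    prev_end = None
    for start, end in sorted(genes):
        if prev_end is not None and start - prev_end - 1 >= min_length:
            gaps.append((prev_end + 1, start - 1))
        # Track the furthest end seen so far so that overlaps are handled.
        prev_end = end if prev_end is None else max(prev_end, end)
    return gaps

# Hypothetical coordinates: two overlapping genes, a gap, two adjacent genes.
example_genes = [(100, 400), (350, 600), (700, 900), (901, 1200)]
gaps = intergenic_regions(example_genes)
print(gaps)
```

In practice the coordinates would come from a genome annotation file rather than a hand-written list.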
Affiliation(s)
- Daniel Luis Notari
- Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas, 1130 - CEP 95070-560 - Caxias do Sul, Rio Grande do Sul, Brasil; Instituto de Biotecnologia, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas, 1130 - CEP 95070-560 - Caxias do Sul, Rio Grande do Sul, Brasil
- Aurione Molin
- Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas, 1130 - CEP 95070-560 - Caxias do Sul, Rio Grande do Sul, Brasil
- Vanessa Davanzo
- Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas, 1130 - CEP 95070-560 - Caxias do Sul, Rio Grande do Sul, Brasil
- Douglas Picolotto
- Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas, 1130 - CEP 95070-560 - Caxias do Sul, Rio Grande do Sul, Brasil
- Helena Graziottin Ribeiro
- Centro de Computação e Tecnologia da Informação, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas, 1130 - CEP 95070-560 - Caxias do Sul, Rio Grande do Sul, Brasil
- Scheila de Avila E Silva
- Instituto de Biotecnologia, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas, 1130 - CEP 95070-560 - Caxias do Sul, Rio Grande do Sul, Brasil
|
17
|
Meysman P, Collado-Vides J, Morett E, Viola R, Engelen K, Laukens K. Structural properties of prokaryotic promoter regions correlate with functional features. PLoS One 2014; 9:e88717. [PMID: 24516674 PMCID: PMC3918002 DOI: 10.1371/journal.pone.0088717] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/11/2013] [Accepted: 01/10/2014] [Indexed: 12/31/2022] Open
Abstract
The structural properties of the DNA molecule are known to play a critical role in transcription. In this paper, the structural profiles of promoter regions were studied within the context of their diversity and their function for eleven prokaryotic species: Escherichia coli, Klebsiella pneumoniae, Salmonella Typhimurium, Pseudomonas aeruginosa, Geobacter sulfurreducens, Helicobacter pylori, Chlamydophila pneumoniae, Synechocystis sp., Synechococcus elongatus, Bacillus anthracis, and the archaeon Sulfolobus solfataricus. The main anchor points for these promoter regions were transcription start sites identified through high-throughput experiments or collected within large curated databases. Prokaryotic promoter regions were found to be less stable and less flexible than the genomic mean across all studied species. However, direct comparison between species revealed differences in their structural profiles that cannot be explained solely by the difference in genomic GC content. In addition, comparison with functional data revealed that there are patterns in the promoter structural profiles that can be linked to specific functional loci, such as sigma factor regulation or transcription factor binding. Interestingly, a novel structural element clearly visible near the transcription start site was found in genes associated with essential cellular functions and growth in several species. Our analyses reveal the great diversity in promoter structural profiles both between and within prokaryotic species. We observed relationships between structural diversity and functional features that are interesting prospects for further research into as yet uncharacterized functional loci defined by DNA structural properties.
Affiliation(s)
- Pieter Meysman
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Julio Collado-Vides
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Enrique Morett
- Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Instituto Nacional de Medicina Genómica, Mexico City, Mexico
- Roberto Viola
- Department of Computational Biology, Fondazione Edmund Mach, San Michele all’Adige, Trento, Italy
- Kristof Engelen
- Department of Computational Biology, Fondazione Edmund Mach, San Michele all’Adige, Trento, Italy
- Kris Laukens
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
|
18
|
Bansal M, Kumar A, Yella VR. Role of DNA sequence based structural features of promoters in transcription initiation and gene expression. Curr Opin Struct Biol 2014; 25:77-85. [PMID: 24503515 DOI: 10.1016/j.sbi.2014.01.007] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 12/05/2013] [Accepted: 01/07/2014] [Indexed: 11/18/2022]
Abstract
Regulatory information for transcription initiation is present in a stretch of genomic DNA called the promoter region, which is located upstream of the transcription start site (TSS) of the gene. The promoter region interacts with different transcription factors and RNA polymerase to initiate transcription and contains short stretches of transcription factor binding sites (TFBSs), as well as structurally unique elements. Recent experimental and computational analyses of promoter sequences show that they often have non-B-DNA structural motifs, as well as some structural properties, such as stability, bendability, nucleosome positioning preference and curvature, that are conserved across a class of organisms. Here, we briefly describe these structural features, the differences observed in various organisms and their possible role in the regulation of gene expression.
Affiliation(s)
- Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
- Aditya Kumar
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
19
|
Meysman P, Marchal K, Engelen K. DNA structural properties in the classification of genomic transcription regulation elements. Bioinform Biol Insights 2012; 6:155-68. [PMID: 22837642 PMCID: PMC3399529 DOI: 10.4137/bbi.s9426] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022] Open
Abstract
It has long been known that DNA molecules encode information at various levels. The most basic level comprises the base sequence itself and is primarily important for the encoding of proteins and direct base recognition by DNA-binding proteins. A more elusive level consists of the local structural properties of the DNA molecule, wherein the DNA sequence only plays an indirect supportive role. These properties are nevertheless an important factor in a large number of biomolecular processes and can be considered informative signals for the presence of a variety of genomic features. Several recent studies have unequivocally shown the benefit of relying on such DNA properties for modeling and predicting genomic features as diverse as transcription start sites, transcription factor binding sites, or nucleosome occupancy. This review is meant to provide an overview of the key aspects of these DNA conformational and physicochemical properties. To illustrate their potential added value compared to relying solely on the nucleotide sequence in genomics studies, we discuss their application in research on transcription regulation mechanisms as representative cases.
Affiliation(s)
- Pieter Meysman
- Department of Molecular and Microbial Systems, KULeuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
|
20
|
Redefining Escherichia coli σ(70) promoter elements: -15 motif as a complement of the -10 motif. J Bacteriol 2011; 193:6305-14. [PMID: 21908667 DOI: 10.1128/jb.05947-11] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022] Open
Abstract
Classical elements of σ(70) bacterial promoters include the -35 element ((-35)TTGACA(-30)), the -10 element ((-12)TATAAT(-7)), and the extended -10 element ((-15)TG(-14)). Although the -35 element, the extended -10 element, and the upstream-most base in the -10 element ((-12)T) interact with σ(70) in double-stranded DNA (dsDNA) form, the downstream bases in the -10 motif ((-11)ATAAT(-7)) are responsible for σ(70)-single-stranded DNA (ssDNA) interactions. In order to directly reflect this correspondence, an extension of the extended -10 element to a so-called -15 element ((-15)TGnT(-12)) has been recently proposed. I investigated here the sequence specificity of the proposed -15 element and its relationship to other promoter elements. I found a previously undetected significant conservation of (-13)G and a high degeneracy at (-15)T. I therefore defined the -15 element as a degenerate motif, which, together with the conserved stretch of sequence between -15 and -12, allows treating this element analogously to -35 and -10 elements. Furthermore, the strength of the -15 element inversely correlates with the strengths of the -35 element and -10 element, whereas no such complementation between other promoter elements was found. Despite the direct involvement of -15 element in σ(70)-dsDNA interactions, I found a significantly stronger tendency of this element to complement weak -10 elements that are involved in σ(70)-ssDNA interactions. This finding is in contrast to the established view, according to which the -15 element provides a sufficient number of σ(70)-dsDNA interactions, and suggests that the main parameter determining a functional promoter is the overall promoter strength.
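The element coordinates quoted in this abstract (-35 element at -35 to -30, -10 element at -12 to -7, extended -10 TG at -15 and -14) imply a 17 bp spacer between the two hexamers. The sketch below scans a sequence for that layout by counting mismatches to the two consensus hexamers. It is a toy consensus search, not the analysis performed in the paper; the function names, the mismatch threshold and the allowed spacer range of 16 to 18 bp are assumptions made for illustration.

```python
MINUS_35 = "TTGACA"  # consensus quoted in the abstract, positions -35..-30
MINUS_10 = "TATAAT"  # consensus quoted in the abstract, positions -12..-7

def mismatches(site, consensus):
    """Count positions where a candidate site differs from the consensus."""
    return sum(a != b for a, b in zip(site, consensus))

def scan_sigma70_like(seq, max_mismatch=2, spacers=(16, 17, 18)):
    """Return (i, spacer, j, m35, m10, has_tg) for each candidate pair.

    i and j are 0-based start indices of the -35 and -10 hexamers.
    has_tg reports a TG at -15/-14, i.e. three and two bases before j.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS_35) + 1):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break
            m10 = mismatches(seq[j:j + 6], MINUS_10)
            if m10 <= max_mismatch:
                has_tg = seq[j - 3:j - 1] == "TG"
                hits.append((i, spacer, j, m35, m10, has_tg))
    return hits

# Synthetic sequence: perfect hexamers, 17 bp spacer ending in TG + one base.
demo = "GG" + MINUS_35 + "A" * 14 + "TGC" + MINUS_10 + "GG"
hits = scan_sigma70_like(demo, max_mismatch=0)
print(hits)
```

A position weight matrix would capture the degeneracy discussed in the abstract better than a mismatch count; the consensus comparison is used here only to keep the example short.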
|
21
|
Rangannan V, Bansal M. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes. BMC Res Notes 2011; 4:257. [PMID: 21781326 PMCID: PMC3160392 DOI: 10.1186/1756-0500-4-257] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 01/27/2011] [Accepted: 07/22/2011] [Indexed: 12/19/2022] Open
Abstract
Background: As more and more genomes are being sequenced, an overview of their genomic features and annotation of their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings: Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content, the relative stability based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and that of the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion: PromBase is a database which contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics studies and help the experimentalist to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible for academic and non-academic users via the worldwide web http://nucleix.mbu.iisc.ernet.in/prombase/.
Affiliation(s)
- Vetriselvi Rangannan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560 012, India.
|
22
|
Sershen CL, Mell JC, Madden SM, Benham CJ. Superhelical duplex destabilization and the recombination position effect. PLoS One 2011; 6:e20798. [PMID: 21695263 PMCID: PMC3111454 DOI: 10.1371/journal.pone.0020798] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/15/2010] [Accepted: 05/12/2011] [Indexed: 11/19/2022] Open
Abstract
The susceptibility to recombination of a plasmid inserted into a chromosome varies with its genomic position. This recombination position effect is known to correlate with the average G+C content of the flanking sequences. Here we propose that this effect could be mediated by changes in the susceptibility to superhelical duplex destabilization that would occur. We use standard nonparametric statistical tests, regression analysis and principal component analysis to identify statistically significant differences in the destabilization profiles calculated for the plasmid in different contexts, and correlate the results with their measured recombination rates. We show that the flanking sequences significantly affect the free energy of denaturation at specific sites interior to the plasmid. These changes correlate well with experimentally measured variations of the recombination rates within the plasmid. This correlation of recombination rate with superhelical destabilization properties of the inserted plasmid DNA is stronger than that with average G+C content of the flanking sequences. This model suggests a possible mechanism by which flanking sequence base composition, which is not itself a context-dependent attribute, can affect recombination rates at positions within the plasmid.
Affiliation(s)
- Cheryl L Sershen
- Baylor College of Medicine, Houston, Texas, United States of America.
|
23
|
Herbig A, Nieselt K. nocoRNAc: characterization of non-coding RNAs in prokaryotes. BMC Bioinformatics 2011; 12:40. [PMID: 21281482 PMCID: PMC3230914 DOI: 10.1186/1471-2105-12-40] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 08/17/2010] [Accepted: 01/31/2011] [Indexed: 11/10/2022] Open
Abstract
Background: Interest in non-coding RNAs (ncRNAs) has risen steadily during the past few years because of the wide spectrum of biological processes in which they are involved. This has led to the discovery of numerous ncRNA genes across many species. However, for most organisms the non-coding transcriptome still remains unexplored to a great extent. Various experimental techniques for the identification of ncRNA transcripts are available, but as these methods are costly and time-consuming, there is a need for computational methods that allow the detection of functional RNAs in complete genomes in order to suggest elements for further experiments. Several programs for the genome-wide prediction of functional RNAs have been developed, but most of them predict a genomic locus with no indication of whether the element is transcribed or not. Results: We present nocoRNAc, a program for the genome-wide prediction of ncRNA transcripts in bacteria. nocoRNAc incorporates various procedures for the detection of transcriptional features, which are then integrated with functional ncRNA loci to determine the transcript coordinates. We applied RNAz and nocoRNAc to the genome of Streptomyces coelicolor and detected more than 800 putative ncRNA transcripts, most of them located antisense to protein-coding regions. Using a custom-designed microarray we profiled the expression of about 400 of these elements and found more than 300 to be transcribed; 38 of them are predicted novel ncRNA genes in intergenic regions. The expression patterns of many ncRNAs are as complex as those of the protein-coding genes; in particular, many antisense ncRNAs show a high expression correlation with their protein-coding partner. Conclusions: We have developed nocoRNAc, a framework that facilitates the automated characterization of functional ncRNAs. nocoRNAc increases the confidence of predicted ncRNA loci, especially if they contain transcribed ncRNAs.
nocoRNAc is not restricted to intergenic regions, but is applicable to the prediction of ncRNA transcripts in whole microbial genomes. The software as well as a user guide and example data is available at http://www.zbit.uni-tuebingen.de/pas/nocornac.htm.
Affiliation(s)
- Alexander Herbig
- Center for Bioinformatics Tübingen, University of Tübingen, Sand 14, 72076 Tübingen, Germany
|
24
|
Apostolaki A, Kalosakas G. Targets of DNA-binding proteins in bacterial promoter regions present enhanced probabilities for spontaneous thermal openings. Phys Biol 2011; 8:026006. [DOI: 10.1088/1478-3975/8/2/026006] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
|
25
|
Identification of TATA and TATA-less promoters in plant genomes by integrating diversity measure, GC-Skew and DNA geometric flexibility. Genomics 2010; 97:112-20. [PMID: 21112384 DOI: 10.1016/j.ygeno.2010.11.002] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Received: 03/09/2010] [Revised: 11/05/2010] [Accepted: 11/12/2010] [Indexed: 11/20/2022]
Abstract
Accurate identification of core promoters is important for gaining insight into eukaryotic transcription regulation. In this study, the authors focused on biologically realistic promoter prediction in plant genomes. By analyzing the correlative conservation, GC-compositional bias and specific structural patterns of TATA and TATA-less promoters in PlantPromDB, a hybrid multi-feature approach based on a support vector machine (SVM) for predicting the two types of promoters was developed by integrating local word content, GC-Skew and DNA geometric flexibility. Compared with the TSSP-TCM program on the same test dataset, better prediction results were obtained. In particular, for TATA-less promoters the accuracy is 10% higher than that of the TSSP-TCM program. The good performance of the hybrid promoters and the experimental data also indicate that our method has the ability to locate the promoter region of the plant genome.
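GC-Skew, one of the three feature families named in this abstract, is commonly computed as (G - C)/(G + C) over a sliding window. A minimal sketch of that calculation follows; the window and step sizes are assumptions, since the abstract does not state how the feature was windowed, and the function name is hypothetical.

```python
def gc_skew(seq, window=100, step=50):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C).

    Windows that contain neither G nor C get a skew of 0.0.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        profile.append((start, skew))
    return profile

# Tiny synthetic example: three G and one C in a single 6-base window.
profile = gc_skew("GGGCAT", window=6, step=1)
print(profile)
```

The resulting per-window values could be concatenated with other features before being passed to a classifier such as an SVM.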
|
26
|
Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theory Biosci 2010; 130:91-100. [DOI: 10.1007/s12064-010-0114-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 08/04/2010] [Accepted: 10/23/2010] [Indexed: 12/27/2022]
|
27
|
Rangannan V, Bansal M. High-quality annotation of promoter regions for 913 bacterial genomes. Bioinformatics 2010; 26:3043-50. [PMID: 20956245 DOI: 10.1093/bioinformatics/btq577] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 01/04/2023]
Abstract
MOTIVATION The number of bacterial genomes being sequenced is increasing very rapidly and hence, it is crucial to have procedures for rapid and reliable annotation of their functional elements such as promoter regions, which control the expression of each gene or each transcription unit of the genome. The present work addresses this requirement and presents a generic method applicable across organisms. RESULTS Relative stability of the DNA double helical sequences has been used to discriminate promoter regions from non-promoter regions. Based on the difference in stability between neighboring regions, an algorithm has been implemented to predict promoter regions on a large scale over 913 microbial genome sequences. The average free energy values for the promoter regions as well as their downstream regions are found to differ, depending on their GC content. Threshold values to identify promoter regions have been derived using sequences flanking a subset of translation start sites from all microbial genomes and then used to predict promoters over the complete genome sequences. An average recall value of 72% (which indicates the percentage of protein and RNA coding genes with predicted promoter regions assigned to them) and precision of 56% is achieved over the 913 microbial genome dataset. AVAILABILITY The binary executable for 'PromPredict' algorithm (implemented in PERL and supported on Linux and MS Windows) and the predicted promoter data for all 913 microbial genomes are available at http://nucleix.mbu.iisc.ernet.in/prombase/.
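The idea described here, comparing the average duplex stability of a candidate region with that of its downstream neighbour, can be sketched with a nearest-neighbour free energy table. This is not the PromPredict algorithm: the window sizes, the function names and the use of a raw difference instead of GC-dependent thresholds are assumptions. The dinucleotide values are the SantaLucia (1998) unified ΔG°37 parameters in kcal/mol, included for illustration; check them against the published table, and note that the tool may use a different parameter set.

```python
# Nearest-neighbour free energies (kcal/mol), keyed by 5'->3' dinucleotide.
# Assumed parameter set: SantaLucia 1998 unified values (verify before use).
NN_DG = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}

def mean_free_energy(seq):
    """Average free energy per dinucleotide step; unknown steps are skipped."""
    seq = seq.upper()
    steps = [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)
             if seq[i:i + 2] in NN_DG]
    return sum(steps) / len(steps) if steps else 0.0

def stability_difference(seq, pos, upstream=100, downstream=100):
    """Mean free energy upstream of pos minus mean free energy downstream.

    A positive value means the upstream window is less stable (less
    negative free energy) than the downstream window.
    """
    up = seq[max(0, pos - upstream):pos]
    down = seq[pos:pos + downstream]
    return mean_free_energy(up) - mean_free_energy(down)

# Synthetic example: AT-rich stretch followed by a GC-rich stretch.
toy = "AT" * 50 + "GC" * 50
diff = stability_difference(toy, pos=100)
print(round(diff, 3))
```

A real predictor would compare such differences against thresholds calibrated for the GC content of the genome, as the abstract describes.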
|
28
|
Bland C, Newsome AS, Markovets AA. Promoter prediction in E. coli based on SIDD profiles and Artificial Neural Networks. BMC Bioinformatics 2010; 11 Suppl 6:S17. [PMID: 20946600 PMCID: PMC3026364 DOI: 10.1186/1471-2105-11-s6-s17] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND One of the major challenges in biology is the correct identification of promoter regions. Computational methods based on motif searching have been the traditional approach. Recent studies have shown that DNA structural properties, such as curvature, stacking energy, and stress-induced duplex destabilization (SIDD), are also useful in promoter prediction. In this paper, the currently used SIDD energy threshold method is compared to the proposed artificial neural network (ANN) approach for finding promoters based on SIDD profile data. RESULTS When compared to the SIDD threshold prediction method, artificial neural networks showed noticeable improvements in precision, recall, and F-score over a range of values. The maximal F-score was 62.3 for the ANN classifier and 56.8 for the threshold-based classifier. CONCLUSIONS Artificial neural networks were used to predict promoters based on SIDD profile data. Results using this technique were an improvement over the previous SIDD threshold approach. Over a wide range of precision-recall values, artificial neural networks were more capable of identifying distinctive characteristics of promoter regions than threshold-based methods.
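The comparison reported here rests on precision, recall and the maximal F-score obtained while sweeping a decision threshold. A minimal sketch of that sweep is given below, assuming the usual F-score, 2PR/(P + R); the function names, labels and scores are hypothetical and are not taken from the paper.

```python
def precision_recall_f(labels, scores, threshold):
    """Precision, recall and F-score when scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

def best_f_score(labels, scores):
    """Return (max F-score, threshold) over all observed score values."""
    return max((precision_recall_f(labels, scores, t)[2], t)
               for t in sorted(set(scores)))

# Hypothetical classifier outputs: 1 = promoter, 0 = non-promoter.
labels = [1, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.1]
best_f, best_threshold = best_f_score(labels, scores)
print(best_f, best_threshold)
```

The abstract reports F-scores on a 0 to 100 scale; the sketch uses the 0 to 1 scale, so its values would be multiplied by 100 for comparison.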
Affiliation(s)
- Charles Bland
- Department Natural Sciences and Environmental Health, Mississippi Valley State University, 14000 Hwy 82 West, Itta Bena, Mississippi 38941, USA
|
29
|
Vollbrecht E, Duvick J, Schares JP, Ahern KR, Deewatthanawong P, Xu L, Conrad LJ, Kikuchi K, Kubinec TA, Hall BD, Weeks R, Unger-Wallace E, Muszynski M, Brendel VP, Brutnell TP. Genome-wide distribution of transposed Dissociation elements in maize. Plant Cell 2010; 22:1667-85. [PMID: 20581308 PMCID: PMC2910982 DOI: 10.1105/tpc.109.073452] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Received: 12/12/2009] [Revised: 04/09/2010] [Accepted: 06/09/2010] [Indexed: 05/18/2023]
Abstract
The maize (Zea mays) transposable element Dissociation (Ds) was mobilized for large-scale genome mutagenesis and to study its endogenous biology. Starting from a single donor locus on chromosome 10, over 1500 elements were distributed throughout the genome and positioned on the maize physical map. Genetic strategies to enrich for both local and unlinked insertions were used to distribute Ds insertions. Global, regional, and local insertion site trends were examined. We show that Ds transposed to both linked and unlinked sites and displayed a nonuniform distribution on the genetic map around the donor r1-sc:m3 locus. Comparison of Ds and Mutator insertions reveals distinct target preferences, which provide functional complementarity of the two elements for gene tagging in maize. In particular, Ds displays a stronger preference for insertions within exons and introns, whereas Mutator insertions are more enriched in promoters and 5'-untranslated regions. Ds has no strong target site consensus sequence, but we identified properties of the DNA molecule inherent to its local structure that may influence Ds target site selection. We discuss the utility of Ds for forward and reverse genetics in maize and provide evidence that genes within a 2- to 3-centimorgan region flanking Ds insertions will serve as optimal targets for regional mutagenesis.
Affiliation(s)
- Erik Vollbrecht
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa 50011, USA.
|
30
|
Dineen DG, Wilm A, Cunningham P, Higgins DG. High DNA melting temperature predicts transcription start site location in human and mouse. Nucleic Acids Res 2009; 37:7360-7. [PMID: 19820114 PMCID: PMC2794178 DOI: 10.1093/nar/gkp821] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022]
Abstract
The accurate computational prediction of transcription start sites (TSS) in vertebrate genomes is a difficult problem. The physicochemical properties of DNA can be computed in various ways, and many combinations of DNA features have been tested in the past for use as predictors of transcription. We looked in detail at melting temperature, the temperature at which the two strands of DNA separate, taking into account the cooperative nature of this process. We find that peaks in melting temperature correspond closely to experimentally determined transcription start sites in human and mouse chromosomes. Using melting temperature alone, with simple thresholding, we can predict TSS with an accuracy that is competitive with the most accurate state-of-the-art TSS prediction methods. Accuracy is measured using both experimentally and manually determined TSS. The method works especially well with CpG island-containing promoters, but also works when CpG islands are absent. This result is clear evidence of the important role of the physical properties of DNA in the process of transcription. It also points to the importance of including melting temperature as prior information in TSS prediction methods.
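The "simple thresholding" step described above can be sketched in Python as follows. This is only an illustration: it assumes a per-position melting temperature profile has already been computed elsewhere (the cooperative melting calculation itself is not shown), and the function name `call_peaks` and the threshold value are hypothetical.

```python
def call_peaks(profile, threshold):
    """Return the index of the maximum within each contiguous run above threshold.

    profile: sequence of per-position melting temperatures
             (ASSUMPTION: precomputed by some melting model).
    threshold: cutoff above which a position counts as part of a peak region.
    """
    values = list(profile)
    peaks = []
    run_start = None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, value in enumerate(values + [float("-inf")]):
        if value > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            peaks.append(max(range(run_start, i), key=lambda j: values[j]))
            run_start = None
    return peaks


if __name__ == "__main__":
    tm = [70, 72, 85, 90, 86, 71, 70, 88, 92, 75]
    print(call_peaks(tm, threshold=80))
```

Each reported index is a candidate TSS location; in practice the candidates would be compared against annotated start sites within some distance tolerance.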
Affiliation(s)
- David G Dineen
- Complex and Adaptive Systems Laboratory (CASL), University College Dublin, Belfield, Dublin 4, Ireland.
|
31
|
Mallios RR, Ojcius DM, Ardell DH. An iterative strategy combining biophysical criteria and duration hidden Markov models for structural predictions of Chlamydia trachomatis sigma66 promoters. BMC Bioinformatics 2009; 10:271. [PMID: 19715597 PMCID: PMC2743672 DOI: 10.1186/1471-2105-10-271] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 02/11/2009] [Accepted: 08/28/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase sigma-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort. RESULTS Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis sigma66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase sigma66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability. 
CONCLUSION This strategy and the resulting model support the conjecture that DNA biophysical properties and RNA polymerase sigma-factor/DNA binding collaboratively contribute to a sequence's ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis sigma66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is well suited to organisms with few identified promoters and relatively small genomes.
Affiliation(s)
- Ronna R Mallios
- School of Natural Sciences, University of California, Merced, CA 95344, USA.
|
32
|
Shavkunov KS, Masulis IS, Tutukina MN, Deev AA, Ozoline ON. Gains and unexpected lessons from genome-scale promoter mapping. Nucleic Acids Res 2009; 37:4919-31. [PMID: 19528070 PMCID: PMC2731890 DOI: 10.1093/nar/gkp490] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022]
Abstract
Potential promoters in the genome of Escherichia coli were searched for with the pattern recognition software PlatProm and classified on the basis of their positions relative to gene borders. Besides the expected promoters located in front of the coding sequences, we found a considerable number of intragenic promoter-like signals with a putative ability to drive either antisense or alternative transcription, and revealed unusual genomic regions with an extremely high density of predicted transcription start points (promoter ‘islands’), some of which are located in coding sequences. PlatProm scores converted into the probability of RNA polymerase binding showed some correlation with the enzyme retention registered by the ChIP-on-chip technique; however, in ‘dense’ regions the correlation coefficient is lower than across the entire genome. Experimental verification confirmed the ability of RNA polymerase to interact and form multiple open complexes within the promoter ‘island’ associated with appY, yet transcription efficiency was lower than might be expected. Analysis of expression data revealed the same tendency for other promoter ‘islands’, suggesting a functional relevance of non-productive RNA polymerase binding. Our data indicate that the genomic DNA of E. coli is enriched in numerous unusual promoter-like sites whose biological role is yet to be understood.
Affiliation(s)
- K S Shavkunov
- Institute of Cell Biophysics of the Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russian Federation
|
33
|
Coculescu BI. Antimicrobial resistance induced by genetic changes. J Med Life 2009; 2:114-23. [PMID: 20108530 PMCID: PMC3018982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Decoding the mechanisms of antibiotic resistance is essential to fighting a phenomenon that grows every day because of the uncontrolled, excessive, and often unjustified use of antimicrobial substances. It has become a public health matter, alongside the resistance of Mycobacterium tuberculosis to tuberculostatic drugs and the spread of the AIDS virus, that affects not only European countries but the entire globe. This paper presents the gene mutations that occur in the bacterial chromosome and induce resistance to antibiotics.
Affiliation(s)
- Bogdan-Ioan Coculescu
- Microbiology, Parasitology and Virology Laboratory, Centre of Prophylactic Medicine Bucharest, Romania.
|
34
|
Williams JA, Carnes AE, Hodgson CP. Plasmid DNA vaccine vector design: impact on efficacy, safety and upstream production. Biotechnol Adv 2009; 27:353-70. [PMID: 19233255 DOI: 10.1016/j.biotechadv.2009.02.003] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.1] [Received: 11/26/2008] [Revised: 02/02/2009] [Accepted: 02/07/2009] [Indexed: 10/21/2022]
Abstract
Critical molecular and cellular biological factors that affect the design of licensable DNA vaccine vectors, which combine high yield and integrity during bacterial production with increased expression in mammalian cells, are reviewed. Food and Drug Administration (FDA), World Health Organization (WHO), and European Medicines Agency (EMEA) regulatory guidance documents are discussed as they relate to vector design and plasmid fermentation. While all new vectors will require extensive preclinical testing to validate safety and performance prior to clinical use, the regulatory testing burden for follow-on products can be reduced by combining carefully designed synthetic genes with existing validated vector backbones. A flowchart for the creation of new synthetic genes, combining rational design with bioinformatics, is presented. The biology of plasmid replication is reviewed, and process engineering strategies that reduce metabolic burden are discussed. Using recently developed low-metabolic-burden seed stock and fermentation strategies, optimized vectors can now be manufactured in high yields exceeding 2 g/L, with specific plasmid yields of 5% of total dry cell weight.
|
35
|
Askary A, Masoudi-Nejad A, Sharafi R, Mizbani A, Parizi SN, Purmasjedi M. N4: A precise and highly sensitive promoter predictor using neural network fed by nearest neighbors. Genes Genet Syst 2009; 84:425-30. [DOI: 10.1266/ggs.84.425] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/23/2022]
Affiliation(s)
- Amjad Askary
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics and COE in Biomathematics, University of Tehran
- Department of Biotechnology, College of Science, University of Tehran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics and COE in Biomathematics, University of Tehran
- Roozbeh Sharafi
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics and COE in Biomathematics, University of Tehran
- Amir Mizbani
- Department of Biotechnology, College of Science, University of Tehran
- Malihe Purmasjedi
- Department of Biotechnology, College of Science, University of Tehran
|
36
|
Rangannan V, Bansal M. Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition. Mol Biosyst 2009; 5:1758-69. [DOI: 10.1039/b906535k] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/21/2022]
|
37
|
Dekhtyar M, Morin A, Sakanyan V. Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes. BMC Bioinformatics 2008; 9:233. [PMID: 18471287 PMCID: PMC2412878 DOI: 10.1186/1471-2105-9-233] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 11/19/2007] [Accepted: 05/09/2008] [Indexed: 11/17/2022]
Abstract
BACKGROUND Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. RESULTS We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I sigma70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the alpha subunit, and then optimally-separated patterns of -35 and -10 boxes, required for interaction with the sigma70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. CONCLUSION The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression, and individual strong promoters in bacteria of medical and/or economic significance.
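The consecutive three-pattern match described above (UP-element, then the -35 box, then a suitably spaced -10 box) can be sketched in Python. This is a simplified illustration, not the published triad pattern algorithm: the consensus hexamers TTGACA and TATAAT are the standard sigma70 consensus sequences, while the mismatch limit, the 15 to 19 bp spacer range, and the use of A+T richness of the 20 bp upstream of the -35 box as a stand-in for a UP-element pattern are all assumptions.

```python
MINUS35 = "TTGACA"  # sigma70 -35 consensus
MINUS10 = "TATAAT"  # sigma70 -10 consensus


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def at_fraction(s):
    """Fraction of A/T bases in s (0.0 for an empty string)."""
    return sum(c in "AT" for c in s) / len(s) if s else 0.0


def find_triads(seq, max_mm=1, spacers=range(15, 20), up_len=20, up_min_at=0.7):
    """Return (pos35, spacer, pos10) tuples for candidate promoters.

    ASSUMPTIONS: max_mm, spacers, up_len and up_min_at are hypothetical
    parameters; the UP-element is approximated by A+T richness upstream.
    """
    seq = seq.upper()
    hits = []
    for i in range(up_len, len(seq) - 5):
        if mismatches(seq[i:i + 6], MINUS35) > max_mm:
            continue
        if at_fraction(seq[i - up_len:i]) < up_min_at:
            continue
        for sp in spacers:
            j = i + 6 + sp
            box10 = seq[j:j + 6]
            if len(box10) == 6 and mismatches(box10, MINUS10) <= max_mm:
                hits.append((i, sp, j))
    return hits


if __name__ == "__main__":
    demo = "A" * 20 + "TTGACA" + "G" * 17 + "TATAAT" + "GGGG"
    print(find_triads(demo))
```

Checking the cheapest condition first (the -35 hexamer) keeps the scan linear in sequence length for a fixed spacer range.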
Affiliation(s)
- Amelie Morin
- Laboratoire de Biotechnologie, UMR CNRS 6204, Université de Nantes, 2 rue de la Houssinière, 44322 Nantes, France
- Vehary Sakanyan
- Laboratoire de Biotechnologie, UMR CNRS 6204, Université de Nantes, 2 rue de la Houssinière, 44322 Nantes, France
- ProtNeteomix, 2 rue de la Houssinière, 44322 Nantes, France
|
38
|
Wang Z, Jin L, Węgrzyn G, Węgrzyn A. Screening of the osmotic pressure-inducible promoter regions from the whole genome of Escherichia coli by using a novel cloning method. Biotechnol Lett 2008; 30:707-11. [DOI: 10.1007/s10529-007-9583-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/04/2007] [Revised: 10/22/2007] [Accepted: 10/24/2007] [Indexed: 11/27/2022]
|
39
|
Wang H, Benham CJ. Superhelical destabilization in regulatory regions of stress response genes. PLoS Comput Biol 2008; 4:e17. [PMID: 18208321 PMCID: PMC2211533 DOI: 10.1371/journal.pcbi.0040017] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 08/21/2007] [Accepted: 12/03/2007] [Indexed: 11/18/2022]
Abstract
Stress-induced DNA duplex destabilization (SIDD) analysis exploits the known structural and energetic properties of DNA to predict sites that are susceptible to strand separation under negative superhelical stress. When this approach was used to calculate the SIDD profile of the entire Escherichia coli K12 genome, it was found that strongly destabilized sites occur preferentially in intergenic regions that are either known or inferred to contain promoters, but rarely occur in coding regions. Here, we investigate whether the genes grouped in different functional categories have characteristic SIDD properties in their upstream flanks. We report that strong SIDD sites in the E. coli K12 genome are statistically significantly overrepresented in the upstream regions of genes encoding transcriptional regulators. In particular, the upstream regions of genes that directly respond to physiological and environmental stimuli are more destabilized than are those regions of genes that are not involved in these responses. Moreover, if a pathway is controlled by a transcriptional regulator whose gene has a destabilized 5′ flank, then the genes (operons) in that pathway also usually contain strongly destabilized SIDD sites in their 5′ flanks. We observe this statistically significant association of SIDD sites with upstream regions of genes functioning in transcription in 38 of 43 genomes of free-living bacteria, but in only four of 18 genomes of endosymbionts or obligate parasitic bacteria. These results suggest that strong SIDD sites 5′ to participating genes may be involved in transcriptional responses to environmental changes, which are known to transiently alter superhelicity. We propose that these SIDD sites are active and necessary participants in superhelically mediated regulatory mechanisms governing changes in the global pattern of gene expression in prokaryotes in response to physiological or environmental changes. 
DNA in vivo experiences regulated amounts of untwisting stress. If sufficiently large, these stresses can destabilize the double helix at specific locations. These sites then become favored locations for strand separations. Gene expression and DNA replication, the two major jobs of DNA, both require the strands of the duplex to be separated. Thus, events that affect the ease of strand separation can regulate the initiation of these processes. Stress-induced DNA duplex destabilization (SIDD) has been implicated in mechanisms regulating several biological processes, including the initiation of gene expression and replication. We have developed computational methods that accurately predict the locations and extents of destabilization within genomic DNA sequences that occur in response to specified stress levels. Here, we report that the easily destabilized sites we find in the Escherichia coli K12 genome are statistically significantly overrepresented in the upstream regions of genes encoding proteins that regulate transcription. In particular, the regions upstream of genes that directly respond to physiological and environmental stimuli are more destabilized than are those regions of genes that are not involved in these responses. These results suggest that strong SIDD sites upstream of participating genes may be involved in transcriptional responses to environmental changes.
Affiliation(s)
- Huiquan Wang
- UC Davis Genome Center, University of California Davis, Davis, California, United States of America
- Craig J Benham
- UC Davis Genome Center, University of California Davis, Davis, California, United States of America
|
40
|
Abeel T, Saeys Y, Bonnet E, Rouzé P, Van de Peer Y. Generic eukaryotic core promoter prediction using structural features of DNA. Genome Res 2008; 18:310-23. [PMID: 18096745 PMCID: PMC2203629 DOI: 10.1101/gr.6991408] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.3] [Received: 08/03/2007] [Accepted: 11/14/2007] [Indexed: 11/24/2022]
Abstract
Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, the accurate identification and delineation of promoter regions is important for several reasons, such as improving genome annotation and devising experiments to study and understand transcriptional regulation. Current methods to identify the core region of promoters require large amounts of high-quality training data and often behave like black box models that output predictions that are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs. Moreover, it is fast, simple in design, and has no size constraints, and the results are easily interpretable. We compared our approach with 14 current state-of-the-art implementations using human gene and transcription start site data and analyzed the ENCODE region in more detail. We also validated our method on 12 additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.
Affiliation(s)
- Thomas Abeel
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Yvan Saeys
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Eric Bonnet
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Pierre Rouzé
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Laboratoire Associé de l’INRA (France), Ghent University, 9052 Gent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
|
41
|
Janga SC, Collado-Vides J. Structure and evolution of gene regulatory networks in microbial genomes. Res Microbiol 2007; 158:787-94. [PMID: 17996425 DOI: 10.1016/j.resmic.2007.09.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 06/07/2007] [Revised: 08/07/2007] [Accepted: 09/17/2007] [Indexed: 12/24/2022]
Abstract
With the availability of genome sequences for hundreds of microbial genomes, it has become possible to address several questions from a comparative perspective to understand the structure and function of regulatory systems, at least in model organisms. Recent studies have focused on topological properties and the evolution of regulatory networks and their components. Our understanding of natural networks is paving the way to embedding synthetic regulatory systems into organisms, allowing us to expand the natural diversity of living systems to an extent we had never before anticipated.
Affiliation(s)
- Sarath Chandra Janga
- Program of Computational Genomics, CCG-UNAM, Apdo Postal 565-A, Cuernavaca, Morelos, 62100 Mexico.
|
42
|
Liu F, Tøstesen E, Sundet JK, Jenssen TK, Bock C, Jerstad GI, Thilly WG, Hovig E. The human genomic melting map. PLoS Comput Biol 2007; 3:e93. [PMID: 17511513 PMCID: PMC1868775 DOI: 10.1371/journal.pcbi.0030093] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 12/21/2006] [Accepted: 04/11/2007] [Indexed: 11/19/2022]
Abstract
In a living cell, the antiparallel double-stranded helix of DNA is a dynamically changing structure. The structure relates to interactions between and within the DNA strands, and to the array of other macromolecules that constitutes functional chromatin. It is only through its changing conformations that DNA can organize and structure a large number of cellular functions. In particular, DNA must locally uncoil, or melt, and become single-stranded for DNA replication, repair, recombination, and transcription to occur. It has previously been shown that this melting occurs cooperatively, whereby several base pairs act in concert to generate melting bubbles, and in this way constitute a domain that behaves as a unit with respect to local DNA single-strandedness. We have applied a melting map calculation to the complete human genome, which provides information about the propensities of forming local bubbles determined from the whole sequence, and present a first report on its basic features, the extent of cooperativity, and correlations to various physical and biological features of the human genome. Globally, the melting map covaries very strongly with GC content. Most importantly, however, cooperativity of DNA denaturation causes this correlation to be weaker at resolutions below 500 bp. This is also the resolution level at which most structural and biological processes occur, signifying the importance of the informational content inherent in the genomic melting map. The human DNA melting map may be further explored at http://meltmap.uio.no.
Affiliation(s)
- Fang Liu
- Department of Tumor Biology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
- PubGene AS, Vinderen, Oslo, Norway
- Eivind Tøstesen
- Department of Tumor Biology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
- Christoph Bock
- Max-Planck-Institut für Informatik, Saarbrücken, Germany
- Geir Ivar Jerstad
- Department of Tumor Biology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
- William G Thilly
- Biological Engineering Division, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Eivind Hovig
- Department of Tumor Biology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
- Institute of Informatics, University of Oslo, Norway
- Medical Informatics, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
|