1
|
Wang Y, Peng Q, Mou X, Wang X, Li H, Han T, Sun Z, Wang X. A successful hybrid deep learning model aiming at promoter identification. BMC Bioinformatics 2022; 23:206. [PMID: 35641900 PMCID: PMC9158169 DOI: 10.1186/s12859-022-04735-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 05/16/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The zone adjacent to a transcription start site (TSS), namely, the promoter, is primarily involved in the process of DNA transcription initiation and regulation. As a result, proper promoter identification is critical for further understanding the mechanism of the networks controlling genomic regulation. A number of methodologies for the identification of promoters have been proposed. Nonetheless, due to the great heterogeneity existing in promoters, the results of these procedures are still unsatisfactory. In order to establish additional discriminative characteristics and properly recognize promoters, we developed the hybrid model for promoter identification (HMPI), a hybrid deep learning model that can characterize both the native sequences of promoters and the morphological outline of promoters at the same time. We developed the HMPI to combine a method called the PSFN (promoter sequence features network), which characterizes native promoter sequences and deduces sequence features, with a technique referred to as the DSPN (deep structural profiles network), which is specially structured to model the promoters in terms of their structural profile and to deduce their structural attributes.
RESULTS The HMPI was applied to human, plant and Escherichia coli K-12 strain datasets, and the findings showed that the HMPI was successful at extracting the features of the promoter while greatly enhancing the promoter identification performance. In addition, after the improvements of synthetic sampling, transfer learning and label smoothing regularization, the improved HMPI models achieved good results in identifying subtypes of promoters on prokaryotic promoter datasets.
CONCLUSIONS The results showed that the HMPI was successful at extracting the features of promoters while greatly enhancing the performance of identifying promoters on both eukaryotic and prokaryotic datasets, and the improved HMPI models are good at identifying subtypes of promoters on prokaryotic promoter datasets. The HMPI is additionally adaptable to different biological functional sequences, allowing for the addition of new features or models.
Affiliation(s)
- Ying Wang
  - Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Qinke Peng
  - Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Xu Mou
  - Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Xinyuan Wang
  - Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Haozhou Li
  - Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Tian Han
  - Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Zhao Sun
  - Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
- Xiao Wang
  - Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China
|
2
|
Martinez GS, de Ávila e Silva S, Kumar A, Pérez-Rueda E. DNA structural and physical properties reveal peculiarities in promoter sequences of the bacterium Escherichia coli K-12. SN APPLIED SCIENCES 2021. [DOI: 10.1007/s42452-021-04713-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
Abstract
The gene transcription of bacteria starts with a promoter sequence being recognized by a transcription factor found in the RNAP enzyme; this process is assisted through the conservation of nucleotides as well as other factors governing these intergenic regions. Faced with this, the coding of genetic information into physical aspects of the DNA such as enthalpy, stability, and base-pair stacking could suggest promoter activity as well as support the differentiation of promoter and non-promoter data. In this work, a total of 3131 promoter sequences associated with six different sigma factors in the bacterium E. coli were converted into numeric attributes; a strong set of control sequences, consisting of a shuffled version of the original sequences as well as coding regions, is also provided. Then, the parameterized genetic information was normalized and exhaustively analyzed through statistical tests. The results suggest that strong signals in the promoter sequences match the binding site of transcription factor proteins, indicating that promoter activity is well represented by its conversion into physical attributes. Moreover, the features tested in this report conveyed significant variances between promoter and control data, enabling these features to be employed in bacterial promoter classification. The results produced here may aid in bacterial promoter recognition by providing a robust set of biological inferences.
|
3
|
Orlov MA, Sorokin AA. DNA sequence, physics, and promoter function: Analysis of high-throughput data On T7 promoter variants activity. J Bioinform Comput Biol 2021; 18:2040001. [PMID: 32404013 DOI: 10.1142/s0219720020400016] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
RNA polymerase/promoter recognition represents a basic problem of molecular biology. Decades-long efforts were made in the area, and yet certain challenges persist. The usage of certain most suitable model subjects is pivotal for the research. System of T7 bacteriophage RNA-polymerase/T7 native promoter represents an exceptional example for the purpose. Moreover, it has been studied the most and successfully applied to aims of biotechnology and bioengineering. Both structural simplicity and high specificity of this molecular duo are the reason for this. Despite highly similar sequences of distinct T7 native promoters, the T7 RNA-polymerase enzyme is capable of binding respective promoter in a highly specific and adjustable manner. One explanation here is that the process relies primarily on DNA physical properties rather than nucleotide sequence. Here, we address the issue by analyzing massive data recently published by Komura and colleagues. This initial study employed Next Generation Sequencing (NGS) in order to quantify activity of promoter variants including ones with multiple substitutions. As a result of our work substantial bias in simultaneous occurrence of single-nucleotide sequence alterations was found: the highest rate of co-occurrence was evidenced within specificity loop of binding region while the lowest - in initiation region of promoter. If both location and a kind of nucleotides involved in replacement (both initial and resulting) are taken into consideration, one can easily note that N to A substitutions are most preferred ones across the whole 19 b.p.-long sequence. At the same time, N to C are tolerated only at crucial position in recognition loop of binding region, and N to G are uniformly least tolerable. Later in this work the complete set of variants was split into groups with mutations (1) exclusively in binding region; (2) exclusively in melting region; (3) in both regions. 
Among these three groups, the second comprises extremely few variants, far fewer than the other two groups (46 versus over one thousand and over six thousand). Yet these are all promoters with substantial to high activity. Group two appeared heterogeneous by primary sequence; indeed, upon further subdivision into above-average and below-average activity subgroups, the first was found to comprise promoters with negligible conservation at the -2 position of the melting region, while the second was hardly conserved in this region at all. When we turn to the perfect consensus sequence of the class III T7 promoter with the -2 nucleotide randomized (all four nucleotides are present in one to several copies in the previously published source dataset), the picture becomes even more pronounced. We therefore suggest that mutations at this position do not cause significant changes in promoter activity. At the same time, such modifications dramatically change the DNA physical properties that were calculated in our study (namely electrostatic potential and propensity to bend). One possible suggestion here is that the -2 nucleotide might function as a generic switch; if so, the substitution -2A to -2T has important regulatory consequences. This is in line with the fact that the -2 b.p. is the most evidently different nucleotide between class II and class III promoters of the T7 genome and that it also distinguishes the class III promoter of the T7 genome from promoters of its relative but reproductively isolated bacteriophage T3. In other words, it appears feasible that mutation at the -2 nucleotide does not impede promoter activity yet alters its physical properties, thus affecting differential RNA polymerase/promoter interaction.
Affiliation(s)
- Mikhail A Orlov
  - Institute of Cell Biophysics of RAS, 3 Institutskaya str., Poushchino, 142290, Russia
- Anatoly A Sorokin
  - Institute of Cell Biophysics of RAS, 3 Institutskaya str., Poushchino, 142290, Russia
|
4
|
Ryasik A, Orlov M, Zykova E, Ermak T, Sorokin A. Bacterial promoter prediction: Selection of dynamic and static physical properties of DNA for reliable sequence classification. J Bioinform Comput Biol 2018; 16:1840003. [DOI: 10.1142/s0219720018400036] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
Abstract
Predicting promoter activity of DNA fragment is an important task for computational biology. Approaches using physical properties of DNA to predict bacterial promoters have recently gained a lot of attention. To select an adequate set of physical properties for training a classifier, various characteristics of DNA molecule should be taken into consideration. Here, we present a systematic approach that allows us to select less correlated properties for classification by means of both correlation and cophenetic coefficients as well as concordance matrices. To prove this concept, we have developed the first classifier that uses not only sequence and static physical properties of DNA fragment, but also dynamic properties of DNA open states. Therefore, the best performing models with accuracy values up to 90% for all types of sequences were obtained. Furthermore, we have demonstrated that the classifier can serve as a reliable tool enabling promoter DNA fragments to be distinguished from promoter islands despite the similarity of their nucleotide sequences.
Affiliation(s)
- Artem Ryasik
  - Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
- Mikhail Orlov
  - Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
- Evgenia Zykova
  - Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
  - Department of Applied Research Informatization, State Institute of Information Technologies and Telecommunications (SIIT&T Informika), per. Brusov 21 st.2, Moscow, 125009, Russia
- Timofei Ermak
  - Laboratory of Molecular Genetics Systems, Institute of Cytology and Genetics, pr. Akademika Lavrentyeva 10, Novosibirsk 630090, Russia
- Anatoly Sorokin
  - Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
|
5
|
Rawal K, Ramaswamy R. Genome-wide analysis of mobile genetic element insertion sites. Nucleic Acids Res 2011; 39:6864-78. [PMID: 21609951 PMCID: PMC3167599 DOI: 10.1093/nar/gkr337] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022] Open
Abstract
Mobile genetic elements (MGEs) account for a significant fraction of eukaryotic genomes and are implicated in altered gene expression and disease. We present an efficient computational protocol for MGE insertion site analysis. ELAN, the suite of tools described here uses standard techniques to identify different MGEs and their distribution on the genome. One component, DNASCANNER analyses known insertion sites of MGEs for the presence of signals that are based on a combination of local physical and chemical properties. ISF (insertion site finder) is a machine-learning tool that incorporates information derived from DNASCANNER. ISF permits classification of a given DNA sequence as a potential insertion site or not, using a support vector machine. We have studied the genomes of Homo sapiens, Mus musculus, Drosophila melanogaster and Entamoeba histolytica via a protocol whereby DNASCANNER is used to identify a common set of statistically important signals flanking the insertion sites in the various genomes. These are used in ISF for insertion site prediction, and the current accuracy of the tool is over 65%. We find similar signals at gene boundaries and splice sites. Together, these data are suggestive of a common insertion mechanism that operates in a variety of eukaryotes.
Affiliation(s)
- Kamal Rawal
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110 067, India
|
6
|
Shavkunov KS, Masulis IS, Tutukina MN, Deev AA, Ozoline ON. Gains and unexpected lessons from genome-scale promoter mapping. Nucleic Acids Res 2009; 37:4919-31. [PMID: 19528070 PMCID: PMC2731890 DOI: 10.1093/nar/gkp490] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022] Open
Abstract
Potential promoters in the genome of Escherichia coli were searched by pattern recognition software PlatProm and classified on the basis of positions relative to gene borders. Beside the expected promoters located in front of the coding sequences we found a considerable amount of intragenic promoter-like signals with a putative ability to drive either antisense or alternative transcription and revealed unusual genomic regions with extremely high density of predicted transcription start points (promoter ‘islands’), some of which are located in coding sequences. PlatProm scores converted into probability of RNA polymerase binding demonstrated certain correlation with the enzyme retention registered by ChIP-on-chip technique; however, in ‘dense’ regions the value of correlation coefficient is lower than throughout the entire genome. Experimental verification confirmed the ability of RNA polymerase to interact and form multiple open complexes within promoter ‘island’ associated with appY, yet transcription efficiency was lower than might be expected. Analysis of expression data revealed the same tendency for other promoter ‘islands’, thus assuming functional relevance of non-productive RNA polymerase binding. Our data indicate that genomic DNA of E. coli is enriched by numerous unusual promoter-like sites with biological role yet to be understood.
Affiliation(s)
- K S Shavkunov
  - Institute of Cell Biophysics, of Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russian Federation
|
7
|
Shelenkov A, Korotkov E. Search of regular sequences in promoters from eukaryotic genomes. Comput Biol Chem 2009; 33:196-204. [DOI: 10.1016/j.compbiolchem.2009.03.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 06/26/2008] [Revised: 02/08/2009] [Accepted: 03/18/2009] [Indexed: 12/14/2022]
|
8
|
Sclavi B. Opening the DNA at the Promoter; The Energetic Challenge. RNA POLYMERASES AS MOLECULAR MOTORS 2009. [DOI: 10.1039/9781847559982-00038] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/28/2023]
Affiliation(s)
- Bianca Sclavi
  - LBPA UMR 8113 du CNRS ENS Cachan 61 Avenue du Président Wilson 94235 Cachan France
|
9
|
Mruk I, Rajesh P, Blumenthal RM. Regulatory circuit based on autogenous activation-repression: roles of C-boxes and spacer sequences in control of the PvuII restriction-modification system. Nucleic Acids Res 2007; 35:6935-52. [PMID: 17933763 PMCID: PMC2175313 DOI: 10.1093/nar/gkm837] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 01/30/2023] Open
Abstract
Type II restriction-modification (R-M) systems comprise a restriction endonuclease (REase) and a protective methyltransferase (MTase). After R-M genes enter a new cell, MTase must appear before REase or the chromosome will be cleaved. PvuII and some other R-M systems achieve this delay by cotranscribing the REase gene with the gene for an autogenous transcription activator (the controlling or 'C' protein C.PvuII). This study reveals, through in vivo titration, that C.PvuII is not only an activator but also a repressor for its own gene. In other systems, this type of circuit can result in oscillatory behavior. Despite the use of identical, symmetrical C protein-binding sequences (C-boxes) in the left and right operators, C.PvuII showed higher in vitro affinity for O(L) than for O(R), implicating the spacer sequences in this difference. Mutational analysis associated the repression with O(R), which overlaps the promoter -35 hexamer but is otherwise dispensable for activation. A nonrepressing mutant exhibited poor establishment in new cells. Comparing promoter-operator regions from PvuII and 29 R-M systems controlled by C proteins revealed that the most-highly conserved sequence is the tetranucleotide spacer separating O(L) from O(R). Any changes in that spacer reduced the stability of C.PvuII-operator complexes and abolished activation.
Affiliation(s)
- Iwona Mruk
  - Department of Medical Microbiology and Immunology, University of Toledo Health Sciences Campus, Toledo, OH 43614-2598, USA
|
10
|
Wang J, Hannenhalli S. Generalizations of Markov model to characterize biological sequences. BMC Bioinformatics 2005; 6:219. [PMID: 16144548 PMCID: PMC1236913 DOI: 10.1186/1471-2105-6-219] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 04/29/2005] [Accepted: 09/06/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The currently used kth order Markov models estimate the probability of generating a single nucleotide conditional upon the immediately preceding (gap = 0) k units. However, this neither takes into account the joint dependency of multiple neighboring nucleotides, nor does it consider the long range dependency with gap > 0. RESULT We describe a configurable tool to explore generalizations of the standard Markov model. We evaluated whether the sequence classification accuracy can be improved by using an alternative set of model parameters. The evaluation was done on four classes of biological sequences--CpG-poor promoters, all promoters, exons and nucleosome positioning sequences. Using di- and tri-nucleotide as the model unit significantly improved the sequence classification accuracy relative to the standard single nucleotide model. In the case of nucleosome positioning sequences, optimal accuracy was achieved at a gap length of 4. Furthermore in the plot of classification accuracy versus the gap, a periodicity of 10-11 bps was observed which might indicate structural preferences in the nucleosome positioning sequence. The tool is implemented in Java and is available for download at ftp://ftp.pcbi.upenn.edu/GMM/. CONCLUSION Markov modeling is an important component of many sequence analysis tools. We have extended the standard Markov model to incorporate joint and long range dependencies between the sequence elements. The proposed generalizations of the Markov model are likely to improve the overall accuracy of sequence analysis tools.
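The gap generalization described in this abstract can be illustrated with a minimal sketch. It is not the authors' Java tool: the function names `train_gapped_markov` and `log_likelihood`, the add-one smoothing, and the single-nucleotide model unit are assumptions made for illustration only. With `gap=0` the sketch reduces to the standard kth order model.

```python
from collections import defaultdict
import math

ALPHABET = "ACGT"

def train_gapped_markov(seqs, k=2, gap=0):
    """Count how often each nucleotide follows a k-mer context that ends
    `gap` positions before it (gap=0 is the standard k-th order model)."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(k + gap, len(s)):
            context = s[i - gap - k : i - gap]
            counts[context][s[i]] += 1
    return counts

def log_likelihood(seq, counts, k=2, gap=0):
    """Sum of log P(x_i | gapped context); add-one smoothing is an
    assumption of this sketch, not something stated in the abstract."""
    total = 0.0
    for i in range(k + gap, len(seq)):
        context = seq[i - gap - k : i - gap]
        ctx_counts = counts.get(context, {})
        n = sum(ctx_counts.values())
        p = (ctx_counts.get(seq[i], 0) + 1) / (n + len(ALPHABET))
        total += math.log(p)
    return total
```

For classification, one such model would be trained per sequence class, and a query sequence assigned to the class whose model gives the higher log-likelihood.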
Affiliation(s)
- Junwen Wang
  - Penn Center for Bioinformatics, Department of Genetics, University of Pennsylvania Philadelphia, PA 19104-6021, USA
- Sridhar Hannenhalli
  - Penn Center for Bioinformatics, Department of Genetics, University of Pennsylvania Philadelphia, PA 19104-6021, USA
|
11
|
Fukue Y, Sumida N, Nishikawa JI, Ohyama T. Core promoter elements of eukaryotic genes have a highly distinctive mechanical property. Nucleic Acids Res 2004; 32:5834-40. [PMID: 15520466 PMCID: PMC528791 DOI: 10.1093/nar/gkh905] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022] Open
Abstract
In spite of the abundant data on DNA sequence, the mechanical aspects of promoter DNA remain poorly understood. We classified 1871 human and 196 mouse RNA polymerase II promoters and investigated average flexibility profiles of the human promoters containing either a TATA box or an initiator (Inr) sequence only. Here, we show that TATA boxes and Inr sequences have a common anomalous mechanical property: they are comprised of distinctively flexible and rigid sequences, compared with the other parts of the promoter region. The +2 position in the Inr consensus sequence does not favor adenine to keep the high flexibility and thus this position is more accurately represented as 'T, G, C>>A'. Additionally, it was also found that DNA region upstream of TATA box or Inr sequence is more rigid than region downstream of each element. These properties may function as a marker for recognition by TATA-binding protein and Inr-binding protein.
Affiliation(s)
- Yoshiro Fukue
  - Department of Biology, Faculty of Science and Engineering, Konan University, 8-9-1 Okamoto, Higashinada-ku, Kobe 658-8501, Japan
|
12
|
Hosid S, Trifonov EN, Bolshoy A. Sequence periodicity of Escherichia coli is concentrated in intergenic regions. BMC Mol Biol 2004; 5:14. [PMID: 15333140 PMCID: PMC516772 DOI: 10.1186/1471-2199-5-14] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 12/31/2003] [Accepted: 08/26/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Sequence periodicity with a period close to the DNA helical repeat is a very basic genomic property. This genomic feature was demonstrated for many prokaryotic genomes. The Escherichia coli sequences display the period close to 11 base pairs. RESULTS Here we demonstrate that practically only ApA/TpT dinucleotides contribute to overall dinucleotide periodicity in Escherichia coli. The noncoding sequences reveal this periodicity much more prominently compared to protein-coding sequences. The sequence periodicity of ApC/GpT, ApT and GpC dinucleotides along the Escherichia coli K-12 is found to be located as well mainly within the intergenic regions. CONCLUSIONS The observed concentration of the dinucleotide sequence periodicity in the intergenic regions of E. coli suggests that the periodicity is a typical property of prokaryotic intergenic regions. We suppose that this preferential distribution of dinucleotide periodicity serves many biological functions; first of all, the regulation of transcription.
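One simple way to look for the dinucleotide periodicity described in this abstract is to count how often the same dinucleotide recurs at each distance. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function name `dinucleotide_distance_counts` and the `max_dist` cutoff are hypothetical, and a plain distance count stands in for whatever periodicity measure the paper actually used.

```python
def dinucleotide_distance_counts(seq, dinucs=("AA", "TT"), max_dist=30):
    """counts[d] = number of position pairs (i, i + d) at which the same
    dinucleotide from `dinucs` starts, for 1 <= d <= max_dist."""
    seq = seq.upper()
    counts = [0] * (max_dist + 1)
    for dinuc in dinucs:
        # Start positions of this dinucleotide in the sequence.
        starts = {i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc}
        for i in starts:
            for dist in range(1, max_dist + 1):
                if i + dist in starts:
                    counts[dist] += 1
    return counts
```

In a sequence with a period close to 11 base pairs, the counts would be expected to peak near distances of 11 and 22 relative to neighboring distances.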
Affiliation(s)
- Sergey Hosid
  - Genome Diversity Center, Institute of Evolution, University of Haifa, Mt. Carmel 31905 ISRAEL
- Edward N Trifonov
  - Genome Diversity Center, Institute of Evolution, University of Haifa, Mt. Carmel 31905 ISRAEL
- Alexander Bolshoy
  - Genome Diversity Center, Institute of Evolution, University of Haifa, Mt. Carmel 31905 ISRAEL
|
13
|
Liu M, Tolstorukov M, Zhurkin V, Garges S, Adhya S. A mutant spacer sequence between -35 and -10 elements makes the Plac promoter hyperactive and cAMP receptor protein-independent. Proc Natl Acad Sci U S A 2004; 101:6911-6. [PMID: 15118087 PMCID: PMC406441 DOI: 10.1073/pnas.0401929101] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022] Open
Abstract
To determine whether the spacer region between the -35 and -10 elements plays any sequence-specific role, we randomized the GC-rich sequence ((-20)CCGGCTCG(-13)) within the spacer region of the cAMP-dependent lac promoter and selected an activator-independent mutant, which showed extraordinarily high intrinsic activity. The hyperactive promoter is obtained by incorporation of a specific 10-bp-long AT-rich DNA sequence within the spacer, referred to as the -15 sequence, which must be juxtaposed to the upstream end of the -10 sequence for the hyperactivity. The transcription enhancement functions only in the presence of a -35 element. The spacer sequence enhanced both RNA polymerase binding and open complex formation. Isolated in the lac promoter, it also enhanced transcription when placed at two other unrelated promoters. Sequence analysis shows a low GC content and an abundance of stereochemically flexible TG:CA and TA:TA dimeric steps in the -18/-9 region and a strong correlation between the presence of flexible dimeric steps in this region and the intrinsic strength of the promoter.
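The two sequence measures named at the end of this abstract, GC content and the abundance of flexible TG:CA and TA:TA dimeric steps, can be computed with a short sketch. The function names are hypothetical, both functions assume a non-empty single-strand sequence (at least two bases for the step fraction), and counting TG, CA and TA on one strand is an assumed reading of the "TG:CA and TA:TA" notation.

```python
# TG and CA are the two strand readings of the TG:CA step; TA:TA is self-complementary.
FLEXIBLE_STEPS = {"TG", "CA", "TA"}

def gc_content(seq):
    """Fraction of G or C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def flexible_step_fraction(seq):
    """Fraction of overlapping dinucleotide steps that are TG, CA or TA."""
    seq = seq.upper()
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return sum(step in FLEXIBLE_STEPS for step in steps) / len(steps)

# The wild-type GC-rich spacer segment quoted in the abstract.
print(gc_content("CCGGCTCG"))              # 0.875
print(flexible_step_fraction("CCGGCTCG"))  # 0.0
```

The quoted wild-type segment contains none of the flexible steps, which is consistent with the contrast the abstract draws against the AT-rich hyperactive spacer.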
Affiliation(s)
- Mofang Liu
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
14
|
Beloin C, Jeusset J, Revet B, Mirambeau G, Le Hégarat F, Le Cam E. Contribution of DNA conformation and topology in right-handed DNA wrapping by the Bacillus subtilis LrpC protein. J Biol Chem 2003; 278:5333-42. [PMID: 12458218 DOI: 10.1074/jbc.m207489200] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Indexed: 12/23/2022] Open
Abstract
The Bacillus subtilis LrpC protein belongs to the Lrp/AsnC family of transcriptional regulators. It binds the upstream region of the lrpC gene and autoregulates its expression. In this study, we have dissected the mechanisms that govern the interaction of LrpC with DNA by electrophoretic mobility shift assay, electron microscopy, and atomic force microscopy. LrpC is a structure-specific DNA binding protein that forms stable complexes with curved sequences containing phased A tracts and wraps DNA to form spherical, nucleosome-like structures. Formation of such wraps, initiated by cooperative binding of LrpC to DNA, results from optimal protein/protein interactions specified by the DNA conformation. In addition, we have demonstrated that LrpC constrains positive supercoils by wrapping the DNA in a right-handed superhelix, as visualized by electron microscopy.
Affiliation(s)
- Christophe Beloin
  - Institut de Génétique et Microbiologie, Université Paris XI, Unité Mixte Recherche 8621, Bâtiment 360, 91405 Orsay Cedex, France
|
15
|
Thayer KM, Beveridge DL. Hidden Markov models from molecular dynamics simulations on DNA. Proc Natl Acad Sci U S A 2002; 99:8642-7. [PMID: 12072566 PMCID: PMC124344 DOI: 10.1073/pnas.132148699] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Abstract
An enhanced bioinformatics tool incorporating the participation of molecular structure as well as sequence in protein DNA recognition is proposed and tested. Boltzmann probability models of sequence-dependent DNA structure from all-atom molecular dynamics simulations were obtained and incorporated into hidden Markov models (HMMs) that can recognize molecular structural signals as well as sequence in protein-DNA binding sites on a genome. The binding of catabolite activator protein (CAP) to cognate DNA sequences was used as a prototype case for implementation and testing of the method. The results indicate that even HMMs based on probabilistic roll/tilt dinucleotide models of sequence-dependent DNA structure have some capability to discriminate between known CAP binding and nonbinding sites and to predict putative CAP binding sites in unknowns. Restricting HMMs to sequence only in regions of strong consensus in which the protein makes base specific contacts with the cognate DNA further improved the discriminatory capabilities of the HMMs. Comparison of results with controls based on sequence only indicates that extending the definition of consensus from sequence to structure improves the transferability of the HMMs, and provides further supportive evidence of a role for dynamical molecular structure as well as sequence in genomic regulatory mechanisms.
Affiliation(s)
- Kelly M Thayer
  - Department of Molecular Biology and Biochemistry, Wesleyan University, Middletown, CT 06457, USA
|
16
|
Masulis IS, Buckin VA, Ozoline ON. Flexible elements in the structure of promoter DNA as probed by cationic surfactant binding. J Biomol Struct Dyn 2002; 19:919-27. [PMID: 11922845 DOI: 10.1080/07391102.2002.10506794] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
Abstract
A susceptibility of promoter DNA for adaptive conformational transitions has been studied using a cationic surfactant dodecyltrimethylammonium bromide (C(12)TAB) as a model DNA-binding ligand. DNAse 1 and KMnO(4) were utilized as structure-specific reagents. Both reagents revealed ligand-induced perturbations in the double helix of promoters T7A1 and T7D. These conformational transitions appeared to be strongly associated with pyrimidine-purine steps, which have non-random distribution within RNA polymerase contact region of the promoter DNA and are present in the binding sites for a majority of transcription regulation proteins. Potential flexibility of these elements creates therefore a specific media for transcription complex formation. Molecular mechanism of DNA interaction with C(12)TAB is discussed.
Affiliation(s)
- I S Masulis
  - Institute of Cell Biophysics, Russian Academy of Sciences Pushchino, Moscow Region, 142290 Russia
|
17
|
Miyano M, Kawashima T, Ohyama T. A common feature shared by bent DNA structures locating in the eukaryotic promoter region. Mol Biol Rep 2001; 28:53-61. [PMID: 11710566 DOI: 10.1023/a:1011999730828] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Eukaryotic promoters often contain a bent DNA structure, suggesting that this structure plays some role in transcription. To reveal the role, we need more information on the promoters that contain or flank a bent DNA structure. In this study, we collected such promoters by the following approach: we first isolated human genomic DNA fragments that contained at least one bent DNA structure, then shotgun cloned them into a promoter trap vector, screened DNA fragments that functioned as a promoter, and finally found the promoters of interest by determining the bent DNA locus and the region expressing promoter activity. From 1,187 recombinant plasmids, we isolated 51 that showed promoter activity. Structural and functional analyses of 10 randomly selected clones with inserts of 548-913 bp demonstrated 11 sequences that could drive transcription. Unexpectedly, all of these clones met our purpose: i.e., each segment that showed a promoter activity (67-179 bp) was very close to the bent DNA structure (spanning about 150 bp in all clones), and in some cases overlapped it. More interestingly, these bent DNA structures all had a superhelical writhe. We propose a hypothesis that in the bent-DNA-containing eukaryotic promoters, bent DNA organizes local chromatin infrastructure appropriately for transcription initiation.
Affiliation(s)
- M Miyano
- Department of Biology, Faculty of Science, Konan University, Kobe, Japan
|
18
|
Ma Q, Wang J, Shasha D, Wu C. DNA sequence classification via an expectation maximization algorithm and neural networks: a case study. IEEE Trans Syst Man Cybern C Appl Rev 2001. [DOI: 10.1109/5326.983930] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
|
19
|
Roychoudhury M, Sitlani A, Lapham J, Crothers DM. Global structure and mechanical properties of a 10-bp nucleosome positioning motif. Proc Natl Acad Sci U S A 2000; 97:13608-13. [PMID: 11095739 PMCID: PMC17623 DOI: 10.1073/pnas.250476297] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022] Open
Abstract
The method of DNA cyclization kinetics reveals special properties of the TATAAACGCC sequence motif found in DNA sequences that have high affinity for core histones. Replacement of 30 bp of generic DNA by three 10-bp repeats of the motif in small cyclization constructs increases cyclization rates by two orders of magnitude. We document a 13° bend in the motif and characterize the direction of curvature. The bending force constant is smaller by nearly 2-fold and there is a 35% decrease in the twist modulus, relative to generic DNA. These features are the likely source of the high affinity for bending around core histones to form nucleosomes. Our results establish a protocol for determination of the ensemble-averaged global solution structure and mechanical properties of any approximately 10-bp DNA sequence element of interest, providing information complementary to that from NMR and crystallographic structural studies.
Affiliation(s)
- M Roychoudhury
- Department of Chemistry, P.O. Box 208107, Yale University, New Haven, CT 06520, USA
|
20
|
Pedersen AG, Jensen LJ, Brunak S, Staerfeldt HH, Ussery DW. A DNA structural atlas for Escherichia coli. J Mol Biol 2000; 299:907-30. [PMID: 10843847 DOI: 10.1006/jmbi.2000.3787] [Citation(s) in RCA: 178] [Impact Index Per Article: 7.4] [Indexed: 11/22/2022]
Abstract
We have performed a computational analysis of DNA structural features in 18 fully sequenced prokaryotic genomes using models for DNA curvature, DNA flexibility, and DNA stability. The structural values that are computed for the Escherichia coli chromosome are significantly different from (and generally more extreme than) those expected from the nucleotide composition. To aid this analysis, we have constructed tools that plot structural measures for all positions in a long DNA sequence (e.g. an entire chromosome) in the form of color-coded wheels (http://www.cbs.dtu.dk/services/GenomeAtlas/). We find that these "structural atlases" are useful for the discovery of interesting features that may then be investigated in more depth using statistical methods. From investigation of the E. coli structural atlas, we discovered a genome-wide trend, where an extended region encompassing the terminus displays a high level of curvature, a low level of flexibility, and a low degree of helix stability. The same situation is found in the distantly related Gram-positive bacterium Bacillus subtilis, suggesting that the phenomenon is biologically relevant. Based on a search for long DNA segments where all the independent structural measures agree, we have found a set of 20 regions with identical and very extreme structural properties. Due to their strong inherent curvature, we suggest that these may function as topological domain boundaries by efficiently organizing plectonemically supercoiled DNA. Interestingly, we find that in practically all the investigated eubacterial and archaeal genomes, there is a trend for promoter DNA being more curved, less flexible, and less stable than DNA in coding regions and in intergenic DNA without promoters. This trend is present regardless of the absolute levels of the structural parameters, and we suggest that this may be related to the requirement for helix unwinding during initiation of transcription, or perhaps to the previously observed location of promoters at the apex of plectonemically supercoiled DNA. We have also analyzed the structural similarities between groups of genes by clustering all RNA and protein-encoding genes in E. coli, based on the average structural parameters. We find that most ribosomal genes (protein-encoding as well as rRNA genes) cluster together, and we suggest that DNA structure may play a role in the transcription of these highly expressed genes.
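The idea of computing a structural measure for every position in a long sequence can be sketched as a sliding-window average over dinucleotide step values. This is only an illustration under stated assumptions: the function name, the window size and the `TOY_SCALE` table are hypothetical, and the toy scale (1.0 for steps made only of A/T, 0.0 otherwise) is a placeholder, not one of the curvature, flexibility or stability models used in the cited work.

```python
# Minimal sketch: per-position structural profile as a sliding-window mean
# of dinucleotide step values. All names and values here are illustrative
# assumptions; a real analysis would substitute a published scale.
from itertools import product

# Toy scale: 1.0 for steps composed only of A/T, 0.0 otherwise (placeholder).
TOY_SCALE = {
    a + b: 1.0 if a in "AT" and b in "AT" else 0.0
    for a, b in product("ACGT", repeat=2)
}


def window_profile(seq, scale, window=5):
    """Average the per-step values of `scale` over each run of `window` steps.

    Only the bases A, C, G and T are supported; other symbols raise KeyError.
    """
    s = seq.upper()
    steps = [scale[s[i:i + 2]] for i in range(len(s) - 1)]
    if window > len(steps):
        return []
    return [
        sum(steps[i:i + window]) / window
        for i in range(len(steps) - window + 1)
    ]


if __name__ == "__main__":
    profile = window_profile("AATTGGCC", TOY_SCALE, window=3)
    print(profile)  # one value per window position along the sequence
```

Profiles computed this way for promoter, coding and intergenic regions could then be compared, which is the kind of comparison the abstract reports.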
MESH Headings
- Bacterial Proteins/genetics
- Base Pairing/genetics
- Color
- Computational Biology
- Computer Simulation
- Crystallography, X-Ray
- DNA, Bacterial/chemistry
- DNA, Bacterial/genetics
- DNA, Superhelical/chemistry
- DNA, Superhelical/genetics
- Deoxyribonuclease I/metabolism
- Escherichia coli/genetics
- Genes, Bacterial/genetics
- Genome, Bacterial
- Models, Molecular
- Nucleic Acid Conformation
- Nucleosomes/chemistry
- Nucleosomes/genetics
- Pattern Recognition, Automated
- Phylogeny
- Pliability
- Promoter Regions, Genetic/genetics
- RNA, Bacterial/genetics
- Software
- Statistics as Topic
- Thermodynamics
Affiliation(s)
- A G Pedersen
- Center for Biological Sequence Analysis, Department of Biotechnology, The Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark
|
21
|
Ozoline ON, Fujita N, Ishihama A. Transcription activation mediated by the carboxyl-terminal domain of the RNA polymerase alpha-subunit. Multipoint monitoring using a fluorescent probe. J Biol Chem 2000; 275:1119-27. [PMID: 10625654 DOI: 10.1074/jbc.275.2.1119] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022] Open
Abstract
Conformational changes within the carboxyl-terminal domain of the Escherichia coli RNA polymerase alpha-subunit (alpha-CTD) upon interaction with the DNA UP element or the transcription factor cAMP receptor protein (CRP) were studied by monitoring the spectral parameters of a fluorescent dye, fluorescein mercuric acetate, conjugated to various positions of alpha-CTD. When fluorescein mercuric acetate was conjugated to Cys located on helix I and the loop between helices III and IV, the spectral changes typical for DNA interaction were observed for the RNA polymerase-promoter binary complex with UP element-dependent rrnBP1 and the ternary complex with the CRP-dependent uxuAB promoter in the presence of cAMP/CRP. Likewise, the chemical nuclease iron-(p-bromoacetamidobenzyl)-EDTA conjugated to Cys-269 or Cys-272 introduced CRP-dependent cleavage of the uxuAB promoter, as in the case of rrnBP1 (Murakami, K., Owens, J. T., Belyaeva, T. A., Meares, C. F., Busby, S. J. W., and Ishihama, A. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 11274-11278), indicating that CRP rearranges the topology of the DNA contact surface in alpha-CTD. Conformational changes in alpha-CTD were also observed upon formation of a binary complex with the uxuAB (in the absence of CRP) and factor-independent T7D promoters. The spectral changes suggested that helix IV of alpha-CTD approaches the negatively charged phosphate moiety of DNA. In agreement with this prediction, iron-(p-bromoacetamidobenzyl)-EDTA conjugated to Cys-309 induced extensive DNA cleavage upstream from the uxuAB promoter -35 element. We propose that helix IV of alpha-CTD is involved in direct interaction with some promoters.
Affiliation(s)
- O N Ozoline
- Department of Molecular Genetics, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
|