1
Vanaja A, Yella VR. Delineation of the DNA Structural Features of Eukaryotic Core Promoter Classes. ACS OMEGA 2022; 7:5657-5669. [PMID: 35224327 PMCID: PMC8867553 DOI: 10.1021/acsomega.1c04603] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/23/2021] [Accepted: 01/27/2022] [Indexed: 05/02/2023]
Abstract
Eukaryotic transcription is orchestrated from a DNA region known as the core promoter. Multifarious and punctilious core promoter signals, viz., TATA-box, Inr, BREs, and Pause Button, are associated with a subset of genes and regulate their spatiotemporal expression. However, the core promoter architecture linked with these signals has not been investigated exhaustively for several species. In this study, we attempted to envisage the adaptive binding landscape of the transcription initiation machinery as a function of DNA structure. To this end, we deployed a set of k-mer-based DNA structural estimates and regular expression models derived from experiments, molecular dynamics simulations, and theoretical frameworks, together with high-throughput promoter data sets retrieved from the eukaryotic promoter database. We categorized protein-coding gene core promoters based on characteristic motifs at precise locations and analyzed the B-DNA structural properties and non-B-DNA structural motifs for 15 different eukaryotic genomes. We observed that Inr, BREd, and no-motif classes display common patterns of DNA sequence and structural environment. TATA-containing, BREu, and Pause Button classes show deviant behavior, with the TATA class displaying varied axial and twisting flexibility while BREu and Pause Button leaned toward G-quadruplex motif enrichment. Intriguingly, DNA meltability and shape signals are conserved irrespective of the presence or absence of distinct core promoter motifs in the majority of species. Altogether, here we delineated the conserved DNA structural signals associated with several promoter classes that may contribute to the chromatin configuration, orchestration of transcription machinery, and DNA duplex melting during the transcription process.
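As a rough illustration of what a "k-mer based DNA structural estimate" can look like, the sketch below maps each dinucleotide step of a sequence to a score and smooths the result with a sliding window. The function names and the scoring table are assumptions made for this example: the table simply counts G/C bases per step and is not one of the experimentally or computationally derived parameter sets used in the study.

```python
from typing import Dict, List

# Toy lookup table (an assumption for illustration): each dinucleotide step
# is scored by how many of its two bases are G or C. A real analysis would
# substitute a published structural parameter table here.
TOY_STEP_SCORE: Dict[str, float] = {
    a + b: float((a in "GC") + (b in "GC"))
    for a in "ACGT"
    for b in "ACGT"
}


def dinucleotide_profile(seq: str, table: Dict[str, float]) -> List[float]:
    """Score every overlapping dinucleotide step of seq using table.

    Assumes seq contains only A, C, G, T (case-insensitive); any other
    symbol raises KeyError.
    """
    seq = seq.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def sliding_mean(values: List[float], window: int) -> List[float]:
    """Average values over every full window of the given length."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    steps = dinucleotide_profile("ATGCGCTATAAAAG", TOY_STEP_SCORE)
    print(sliding_mean(steps, window=5))
```

Passing the table as an argument keeps the profile code independent of any particular parameter set, so the same two functions could be reused for different structural properties.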
Affiliation(s)
- Akkinepally Vanaja
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- Tel: +91-863-2399999, Extn-1021. Website: https://www.kluniversity.in/bt/faculty-list.aspx
2
Yella VR, Vanaja A, Kulandaivelu U, Kumar A. Delving into Eukaryotic Origins of Replication Using DNA Structural Features. ACS OMEGA 2020; 5:13601-13611. [PMID: 32566825 PMCID: PMC7301376 DOI: 10.1021/acsomega.0c00441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/31/2020] [Accepted: 05/15/2020] [Indexed: 05/18/2023]
Abstract
DNA replication in eukaryotes is an intricate process, which is precisely synchronized by a set of regulatory proteins, and the replication fork emanates from discrete sites on chromatin called origins of replication (Oris). These spots are considered the gateway to chromosomal replication and are stereotyped by sequence motifs. The cognate sequences are found in only a small subset of origin regions or are entirely absent in different metazoans. Alternatively, the use of DNA secondary structural features can provide additional information compared to the primary sequence. In this article, we report the trends in DNA sequence-based structural properties of origin sequences in nine eukaryotic systems representing different families of life. Biologically relevant DNA secondary structural properties, namely, stability, propeller twist, flexibility, and minor groove shape, were studied in the sequences flanking replication start sites. Results indicate that Oris in yeasts show lower stability, more rigidity, and narrow minor groove preferences compared to the genomic sequences surrounding them. Yeast Oris also show a preference for A-tracts and the promoter element TATA box in the vicinity of replication start sites. In contrast, Drosophila melanogaster, humans, and Arabidopsis thaliana do not have such features in their Oris and instead show a high preponderance of G-rich sequence motifs such as putative G-quadruplexes or i-motifs and CpG islands. Our extensive study applies DNA structural feature computation to delve into origins of replication across organisms ranging from yeasts to mammals and including a plant. Insights from this study would be significant in understanding origin architecture and help in designing new algorithms for predicting DNA trans-acting factor recognition events.
Affiliation(s)
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522502, Andhra Pradesh, India
- Akkinepally Vanaja
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522502, Andhra Pradesh, India
- KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- Umasankar Kulandaivelu
- KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, Andhra Pradesh, India
- Aditya Kumar
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India
3
Singh VK, Krishnamachari A. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome. GENOMICS DATA 2016; 9:130-6. [PMID: 27508123 PMCID: PMC4971157 DOI: 10.1016/j.gdata.2016.07.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/26/2016] [Revised: 06/27/2016] [Accepted: 07/06/2016] [Indexed: 01/08/2023]
Abstract
Genome-wide experimental studies in Saccharomyces cerevisiae reveal that an autonomously replicating sequence (ARS) requires an essential ARS consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites, and the rest are considered dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content- and context-based analysis was performed on a set of replicating ACS sequences that bind the origin recognition complex (ORC), denoted ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence-dependent thermodynamic and DNA structural profiles, and their positions were considered for characterizing ORC-ACS and nrACS. The analysis reveals that ORC-ACS show marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within their nucleosome-free region. Profound differences in conformational features, such as DNA helical twist, inclination angle, and stacking energy, were observed between ORC-ACS and nrACS. The distribution of ACS motifs in the non-coding segments shows that ORC-ACS are located farther from the adjacent gene start position than nrACS, thereby providing an accessible environment for ORC proteins. Our approach is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.
4
Meysman P, Marchal K, Engelen K. DNA structural properties in the classification of genomic transcription regulation elements. Bioinform Biol Insights 2012; 6:155-68. [PMID: 22837642 PMCID: PMC3399529 DOI: 10.4137/bbi.s9426] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Abstract
It has been long known that DNA molecules encode information at various levels. The most basic level comprises the base sequence itself and is primarily important for the encoding of proteins and direct base recognition by DNA-binding proteins. A more elusive level consists of the local structural properties of the DNA molecule wherein the DNA sequence only plays an indirect supportive role. These properties are nevertheless an important factor in a large number of biomolecular processes and can be considered as informative signals for the presence of a variety of genomic features. Several recent studies have unequivocally shown the benefit of relying on such DNA properties for modeling and predicting genomic features as diverse as transcription start sites, transcription factor binding sites, or nucleosome occupancy. This review is meant to provide an overview of the key aspects of these DNA conformational and physicochemical properties. To illustrate their potential added value compared to relying solely on the nucleotide sequence in genomics studies, we discuss their application in research on transcription regulation mechanisms as representative cases.
Affiliation(s)
- Pieter Meysman
- Department of Molecular and Microbial Systems, KULeuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
5
Abstract
The origin recognition complex (ORC) was first discovered in the baker's yeast in 1992. Identification of ORC opened up a path for subsequent molecular level investigations on how eukaryotic cells initiate and control genome duplication each cell cycle. Twenty years after the first biochemical isolation, ORC is now taking on a three-dimensional shape, although a very blurry shape at the moment, thanks to the recent electron microscopy and image reconstruction efforts. In this chapter, we outline the current biochemical knowledge about ORC from several eukaryotic systems, with emphasis on the most recent structural and biochemical studies. Despite many species-specific properties, an emerging consensus is that ORC is an ATP-dependent machine that recruits other key proteins to form pre-replicative complexes (pre-RCs) at many origins of DNA replication, enabling the subsequent initiation of DNA replication in S phase.
Affiliation(s)
- Huilin Li
- Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY 11794, USA, and Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA. Tel: 631-344-2931, Fax: 631-344-3407
- Bruce Stillman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA. Tel: 516-367-8383
6
Zeng J, Zhao XY, Cao XQ, Yan H. SCS: signal, context, and structure features for genome-wide human promoter recognition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2010; 7:550-562. [PMID: 20671324 DOI: 10.1109/tcbb.2008.95] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 05/29/2023]
Abstract
This paper integrates signal, context, and structure features for genome-wide human promoter recognition, which is important in improving genome annotation and analyzing transcriptional regulation without experimental support from ESTs, cDNAs, or mRNAs. First, CpG islands are salient biological signals associated with approximately 50 percent of mammalian promoters. Second, the genomic context of promoters may have biological significance, which is based on n-mers (sequences n bases long) and their statistics estimated from training samples. Third, sequence-dependent DNA flexibility originates from DNA 3D structures and plays an important role in guiding transcription factors to the target site in promoters. Employing decision trees, we combine the above signal, context, and structure features to build a hierarchical promoter recognition system called SCS. Experimental results on controlled data sets and the entire human genome demonstrate that SCS is significantly superior in terms of sensitivity and specificity as compared to other state-of-the-art methods. The SCS promoter recognition system is available online as supplemental materials for academic use and can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.95.
Affiliation(s)
- Jia Zeng
- School of Computer Science and Technology, Soochow University, Suzhou, China.
7
Zeng J, Zhu S, Liew AWC, Yan H. Multiconstrained gene clustering based on generalized projections. BMC Bioinformatics 2010; 11:164. [PMID: 20356386 PMCID: PMC3098054 DOI: 10.1186/1471-2105-11-164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/08/2009] [Accepted: 03/31/2010] [Indexed: 11/10/2022]
Abstract
Background: Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints for an optimal clustering solution remains an unsolved problem.
Results: We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence belonging to more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods.
Conclusions: The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for soft clustering solutions.
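The iterative projection idea described in this abstract can be sketched generically as follows. This is a minimal illustration of projection onto convex sets with two toy constraint sets chosen for this example (a box and a fixed-sum hyperplane, which together describe a soft cluster-membership vector). It is not the paper's MGC method, and none of the function names or constraint sets come from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Projector = Callable[[Vector], Vector]


def project_onto_box(x: Vector, lo: float = 0.0, hi: float = 1.0) -> Vector:
    """Euclidean projection onto the box lo <= x_i <= hi (clip each entry)."""
    return [min(max(v, lo), hi) for v in x]


def project_onto_sum(x: Vector, total: float = 1.0) -> Vector:
    """Euclidean projection onto the hyperplane sum(x) == total."""
    shift = (total - sum(x)) / len(x)
    return [v + shift for v in x]


def pocs(x0: Vector, projectors: Sequence[Projector],
         max_iter: int = 1000, tol: float = 1e-10) -> Vector:
    """Cycle through the projectors until one full sweep barely moves x."""
    x = list(x0)
    for _ in range(max_iter):
        previous = x
        for project in projectors:
            x = project(x)
        if max(abs(a - b) for a, b in zip(x, previous)) < tol:
            break
    return x


if __name__ == "__main__":
    # Start from an arbitrary vector and look for a point whose entries lie
    # in [0, 1] (up to the tolerance) and sum to 1.
    print(pocs([2.0, -1.0, 0.5], [project_onto_box, project_onto_sum]))
```

Because each constraint only needs its own projector, adding a further constraint amounts to appending one more function to the list, which is the property the abstract highlights for combining heterogeneous constraints.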
Affiliation(s)
- Jia Zeng
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
8
Zeng J, Cao XQ, Zhao H, Yan H. Finding human promoter groups based on DNA physical properties. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2009; 80:041917. [PMID: 19905352 DOI: 10.1103/physreve.80.041917] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/07/2009] [Revised: 08/24/2009] [Indexed: 05/28/2023]
Abstract
DNA rigidity is an important physical property originating from the DNA three-dimensional structure. Although the general DNA rigidity patterns in human promoters have been investigated, their distinct roles in transcription are largely unknown. In this paper, we discover four highly distinct human promoter groups based on similarity of their rigidity profiles. First, we find that all promoter groups conserve relatively rigid DNAs at the canonical TATA box [a consensus TATA(A/T)A(A/T) sequence] position, which are important physical signals in binding transcription factors. Second, we find that the genes activated by each group of promoters share significant biological functions based on their gene ontology annotations. Finally, we find that these human promoter groups correlate with the tissue-specific gene expression.
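The TATA box consensus quoted in this abstract, TATA(A/T)A(A/T), translates directly into a regular expression. The sketch below scans one strand of a sequence for matches; the function name is an assumption for this example, and the code makes no attempt to restrict matches to the canonical position upstream of the transcription start site.

```python
import re
from typing import List, Tuple

# TATA(A/T)A(A/T) written as a regex. The lookahead wrapper lets finditer
# report matches that overlap each other.
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each consensus match in seq.

    Only the strand given is scanned; the reverse complement is not checked.
    """
    return [
        (match.start(), match.group(1))
        for match in TATA_CONSENSUS.finditer(seq.upper())
    ]


if __name__ == "__main__":
    print(find_tata_boxes("ggcTATAAAAgg"))
```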
Affiliation(s)
- Jia Zeng
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
9
Zeng J, Zhu S, Yan H. Towards accurate human promoter recognition: a review of currently used sequence features and classification methods. Brief Bioinform 2009; 10:498-508. [PMID: 19531545 DOI: 10.1093/bib/bbp027] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
This review describes important advances that have been made during the past decade for genome-wide human promoter recognition. Interest in promoter recognition algorithms on a genome-wide scale is worldwide and touches on a number of practical systems that are important in analysis of gene regulation and in genome annotation without experimental support of ESTs, cDNAs or mRNAs. The main focus of this review is on feature extraction and model selection for accurate human promoter recognition, with descriptions of what they are, what has been accomplished, and what remains to be done.
Affiliation(s)
- Jia Zeng
- Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong.
10
Abstract
This paper discovers consensus physical signals around eukaryotic splice sites, transcription start sites, and replication origin start and end sites on a genome-wide scale based on their DNA flexibility profiles calculated by three different flexibility models. These salient physical signals are localized highly rigid and flexible DNAs, which may play important roles in protein-DNA recognition by the sliding search mechanism. These physical signals lead us to a detailed hypothetical view of the search process in which a DNA-binding protein first finds a genomic region close to the target site from an arbitrary starting location by three-dimensional (3D) hopping and intersegment transfer mechanisms for long distances, and subsequently uses the one-dimensional (1D) sliding mechanism facilitated by the localized highly rigid DNAs to accurately locate the target flexible binding site within short distances of 30 bp (base pairs). Guided by these physical signals, DNA-binding proteins rapidly search the entire genome to recognize a specific target site via the 3D-to-1D pathway. Our findings also show that current promoter prediction programs (PPPs) based on DNA physical properties may suffer from many false positives because other functional sites such as splice sites and replication origins have physical signals similar to those of promoters.
11
Current awareness on yeast. Yeast 2009. [DOI: 10.1002/yea.1619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]