Reference Citation Analysis

For: Kumar A, Bansal M. Unveiling DNA structural features of promoters associated with various types of TSSs in prokaryotic transcriptomes and their role in gene expression. DNA Res 2017;24:25-35. [PMID: 27803028 PMCID: PMC5381344 DOI: 10.1093/dnares/dsw045] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/12/2016] [Accepted: 09/23/2016] [Indexed: 01/28/2023]

Cited by Other Article(s)

Paul S, Olymon K, Martinez GS, Sarkar S, Yella VR, Kumar A. MLDSPP: Bacterial Promoter Prediction Tool Using DNA Structural Properties with Machine Learning and Explainable AI. J Chem Inf Model 2024;64:2705-2719. [PMID: 38258978 DOI: 10.1021/acs.jcim.3c02017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]

Abstract

Bacterial promoters play a crucial role in gene expression by serving as docking sites for the transcription initiation machinery. However, accurately identifying promoter regions in bacterial genomes remains a challenge due to their diverse architecture and variations. In this study, we propose MLDSPP (Machine Learning and Duplex Stability based Promoter prediction in Prokaryotes), a machine learning-based promoter prediction tool, to comprehensively screen bacterial promoter regions in 12 diverse genomes. We leveraged biologically relevant and informative DNA structural properties, such as DNA duplex stability and base stacking, and state-of-the-art machine learning (ML) strategies to gain insights into promoter characteristics. We evaluated several machine learning models, including Support Vector Machines, Random Forests, and XGBoost, and assessed their performance using accuracy, precision, recall, specificity, F1 score, and MCC metrics. Our findings reveal that XGBoost outperformed other models and current state-of-the-art promoter prediction tools, namely Sigma70pred and iPromoter2L, achieving F1-scores >95% in most systems. Significantly, the use of one-hot encoding for representing nucleotide sequences complements these structural features, enhancing our XGBoost model's predictive capabilities. To address the challenge of model interpretability, we incorporated explainable AI techniques using Shapley values. This enhancement allows for a better understanding and interpretation of the predictions of our model. In conclusion, our study presents MLDSPP as a novel, generic tool for predicting promoter regions in bacteria, utilizing original downstream sequences as nonpromoter controls. This tool has the potential to significantly advance the field of bacterial genomics and contribute to our understanding of gene regulation in diverse bacterial systems.
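The abstract above mentions one-hot encoding of nucleotide sequences and evaluation by F1 score and MCC. The following is a minimal illustrative sketch of those two generic ideas only; it is not the MLDSPP implementation. The function names `one_hot_encode` and `f1_and_mcc`, the A/C/G/T column order, and the handling of ambiguous bases are assumptions made for this example.

```python
from math import sqrt

# Assumed column order for the encoding; the abstract does not specify one.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return a flat list with four 0/1 entries per base (order A, C, G, T).

    Any other symbol (for example N) is encoded as four zeros, which is an
    assumption of this sketch.
    """
    encoded = []
    for base in sequence.upper():
        encoded.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return encoded


def f1_and_mcc(tp, fp, tn, fn):
    """Compute the F1 score and Matthews correlation coefficient (MCC)
    from confusion-matrix counts, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    print(one_hot_encode("TATAAT"))
    print(f1_and_mcc(tp=90, fp=5, tn=95, fn=10))
```

In a setup like the one the abstract describes, a vector of this kind would be concatenated with structural features such as duplex stability before being passed to a classifier; how the authors actually combine them is not stated here.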

Dey U, Olymon K, Banik A, Abbas E, Yella VR, Kumar A. DNA structural properties of DNA binding sites for 21 transcription factors in the mycobacterial genome. Front Cell Infect Microbiol 2023;13:1147544. [PMID: 37396305 PMCID: PMC10312376 DOI: 10.3389/fcimb.2023.1147544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 05/19/2023] [Indexed: 07/04/2023]

Genome-Wide Prediction of Transcription Start Sites in Conifers. Int J Mol Sci 2022;23:1735. [PMID: 35163661 PMCID: PMC8836283 DOI: 10.3390/ijms23031735] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/30/2021] [Revised: 01/30/2022] [Accepted: 02/01/2022] [Indexed: 02/04/2023]

Abstract

The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Instead, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have been left out of the limelight and, therefore, have been poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in the four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some of the features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and the study of the regulatory regions to understand gene regulation in gymnosperms.

Casa PL, de Abreu FP, Benvenuti JL, Martinez GS, de Avila e Silva S. Beyond consensual motifs: an analysis of DNA curvature within Escherichia coli promoters. Biologia (Bratisl) 2022. [DOI: 10.1007/s11756-021-00999-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]

Martinez GS, de Ávila e Silva S, Kumar A, Pérez-Rueda E. DNA structural and physical properties reveal peculiarities in promoter sequences of the bacterium Escherichia coli K-12. SN Appl Sci 2021. [DOI: 10.1007/s42452-021-04713-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021;8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022]

Mishra A, Siwach P, Misra P, Dhiman S, Pandey AK, Srivastava P, Jayaram B. Intron exon boundary junctions in human genome have in-built unique structural and energetic signals. Nucleic Acids Res 2021;49:2674-2683. [PMID: 33621338 PMCID: PMC7969029 DOI: 10.1093/nar/gkab098] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/06/2019] [Revised: 01/21/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022]

Yella VR, Vanaja A, Kulandaivelu U, Kumar A. Delving into Eukaryotic Origins of Replication Using DNA Structural Features. ACS Omega 2020;5:13601-13611. [PMID: 32566825 PMCID: PMC7301376 DOI: 10.1021/acsomega.0c00441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/31/2020] [Accepted: 05/15/2020] [Indexed: 05/18/2023]

Abstract

DNA replication in eukaryotes is an intricate process, which is precisely synchronized by a set of regulatory proteins, and the replication fork emanates from discrete sites on chromatin called origins of replication (Oris). These spots are considered as the gateway to chromosomal replication and are stereotyped by sequence motifs. The cognate sequences are noticeable in a small group of entire origin regions or totally absent across different metazoans. Alternatively, the use of DNA secondary structural features can provide additional information compared to the primary sequence. In this article, we report the trends in DNA sequence-based structural properties of origin sequences in nine eukaryotic systems representing different families of life. Biologically relevant DNA secondary structural properties, namely, stability, propeller twist, flexibility, and minor groove shape were studied in the sequences flanking replication start sites. Results indicate that Oris in yeasts show lower stability, more rigidity, and narrow minor groove preferences compared to genomic sequences surrounding them. Yeast Oris also show preference for A-tracts and the promoter element TATA box in the vicinity of replication start sites. On the contrary, Drosophila melanogaster, humans, and Arabidopsis thaliana do not have such features in their Oris, and instead, they show high preponderance of G-rich sequence motifs such as putative G-quadruplexes or i-motifs and CpG islands. Our extensive study applies the DNA structural feature computation to delve into origins of replication across organisms ranging from yeasts to mammals and including a plant. Insights from this study would be significant in understanding origin architecture and help in designing new algorithms for predicting DNA trans-acting factor recognition events.

Das S, Bansal M. Variation of gene expression in plants is influenced by gene architecture and structural properties of promoters. PLoS One 2019;14:e0212678. [PMID: 30908494 PMCID: PMC6433290 DOI: 10.1371/journal.pone.0212678] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/26/2018] [Accepted: 02/07/2019] [Indexed: 12/03/2022]

Mishra A, Siwach P, Misra P, Jayaram B, Bansal M, Olson WK, Thayer KM, Beveridge DL. Toward a Universal Structural and Energetic Model for Prokaryotic Promoters. Biophys J 2018;115:1180-1189. [PMID: 30172386 DOI: 10.1016/j.bpj.2018.08.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/15/2018] [Revised: 07/28/2018] [Accepted: 08/02/2018] [Indexed: 01/04/2023]

Abstract

With almost no consensus promoter sequence in prokaryotes, recruitment of RNA polymerase (RNAP) to precise transcriptional start sites (TSSs) has remained an unsolved puzzle. Uncovering the underlying mechanism is critical for understanding the principle of gene regulation. We attempted to search the hidden code in ∼16,500 promoters of 12 prokaryotes representing two kingdoms in their structure and energetics. Twenty-eight fundamental parameters of DNA structure including backbone angles, basepair axis, and interbasepair and intrabasepair parameters were used, and information was extracted from x-ray crystallography data. Three parameters (solvation energy, hydrogen-bond energy, and stacking energy) were selected for creating energetics profiles using in-house programs. DNA of promoter regions was found to be inherently designed to undergo a change in every parameter undertaken for the study, in all prokaryotes. The change starts from some distance upstream of TSSs and continues past some distance from TSS, hence giving a signature state to promoter regions. These signature states might be the universal hidden codes recognized by RNAP. This observation was reiterated when randomly selected promoter sequences (with little sequence conservation) were subjected to structure generation; all developed into very similar three-dimensional structures quite distinct from those of conventional B-DNA and coding sequences. Fine structural details at important motifs (viz. -11, -35, and -75 positions relative to TSS) of promoters reveal novel to our knowledge and pointed insights for RNAP interaction at these locations; it could be correlated with how some particular structural changes at the -11 region may allow insertion of RNAP amino acids in interbasepair space as well as facilitate the flipping out of bases from the DNA duplex.

Identification of putative promoters in 48 eukaryotic genomes on the basis of DNA free energy. Sci Rep 2018. [PMID: 29540741 PMCID: PMC5852095 DOI: 10.1038/s41598-018-22129-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 01/30/2023]