Reference Citation Analysis

For: Yella VR, Vanaja A, Kulandaivelu U, Kumar A. Delving into Eukaryotic Origins of Replication Using DNA Structural Features. ACS Omega 2020;5:13601-13611. [PMID: 32566825 PMCID: PMC7301376 DOI: 10.1021/acsomega.0c00441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/31/2020] [Accepted: 05/15/2020] [Indexed: 05/18/2023]

Cited by Other Article(s)

Paul S, Olymon K, Martinez GS, Sarkar S, Yella VR, Kumar A. MLDSPP: Bacterial Promoter Prediction Tool Using DNA Structural Properties with Machine Learning and Explainable AI. J Chem Inf Model 2024;64:2705-2719. [PMID: 38258978 DOI: 10.1021/acs.jcim.3c02017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]

Abstract

Bacterial promoters play a crucial role in gene expression by serving as docking sites for the transcription initiation machinery. However, accurately identifying promoter regions in bacterial genomes remains a challenge due to their diverse architecture and variations. In this study, we propose MLDSPP (Machine Learning and Duplex Stability based Promoter prediction in Prokaryotes), a machine learning-based promoter prediction tool, to comprehensively screen bacterial promoter regions in 12 diverse genomes. We leveraged biologically relevant and informative DNA structural properties, such as DNA duplex stability and base stacking, and state-of-the-art machine learning (ML) strategies to gain insights into promoter characteristics. We evaluated several machine learning models, including Support Vector Machines, Random Forests, and XGBoost, and assessed their performance using accuracy, precision, recall, specificity, F1 score, and MCC metrics. Our findings reveal that XGBoost outperformed other models and current state-of-the-art promoter prediction tools, namely Sigma70pred and iPromoter2L, achieving F1-scores >95% in most systems. Significantly, the use of one-hot encoding for representing nucleotide sequences complements these structural features, enhancing our XGBoost model's predictive capabilities. To address the challenge of model interpretability, we incorporated explainable AI techniques using Shapley values. This enhancement allows for a better understanding and interpretation of the predictions of our model. In conclusion, our study presents MLDSPP as a novel, generic tool for predicting promoter regions in bacteria, utilizing original downstream sequences as nonpromoter controls. This tool has the potential to significantly advance the field of bacterial genomics and contribute to our understanding of gene regulation in diverse bacterial systems.
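The abstract above mentions one-hot encoding of nucleotide sequences and evaluation by accuracy, precision, recall, specificity, F1 score, and MCC. As a minimal sketch of those two ideas, and not the MLDSPP implementation, the following uses only the Python standard library. The function names `one_hot` and `binary_metrics`, the A/C/G/T column order, and the all-zero row for ambiguous bases are assumptions made for illustration.

```python
import math

NUCLEOTIDES = "ACGT"  # column order of the encoding (an assumption)


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element 0/1 rows, one row per base."""
    rows = []
    for base in sequence.upper():
        # Bases outside A/C/G/T (e.g. N) become an all-zero row (an assumption).
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return rows


def binary_metrics(tp, fp, tn, fn):
    """Compute the six metrics named in the abstract from confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```

For example, `one_hot("ACGT")` yields the four rows of a 4x4 identity matrix, and `binary_metrics(tp, fp, tn, fn)` takes the counts of true positives, false positives, true negatives, and false negatives from any classifier's predictions.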

Bétermier M, Klobutcher LA, Orias E. Programmed chromosome fragmentation in ciliated protozoa: multiple means to chromosome ends. Microbiol Mol Biol Rev 2023;87:e0018422. [PMID: 38009915 PMCID: PMC10732028 DOI: 10.1128/mmbr.00184-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]

Kari H, Bandi SMS, Kumar A, Yella VR. DeePromClass: Delineator for Eukaryotic Core Promoters Employing Deep Neural Networks. IEEE/ACM Trans Comput Biol Bioinform 2023;20:802-807. [PMID: 35353704 DOI: 10.1109/tcbb.2022.3163418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Vanaja A, Yella VR. Delineation of the DNA Structural Features of Eukaryotic Core Promoter Classes. ACS Omega 2022;7:5657-5669. [PMID: 35224327 PMCID: PMC8867553 DOI: 10.1021/acsomega.1c04603] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/23/2021] [Accepted: 01/27/2022] [Indexed: 05/02/2023]

Sarkar S, Dey U, Khohliwe TB, Yella VR, Kumar A. Analysis of nucleoid-associated protein-binding regions reveals DNA structural features influencing genome organization in Mycobacterium tuberculosis. FEBS Lett 2021;595:2504-2521. [PMID: 34387867 DOI: 10.1002/1873-3468.14178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2021] [Revised: 08/01/2021] [Accepted: 08/11/2021] [Indexed: 11/10/2022]

Symphony of the DNA flexibility and sequence environment orchestrates p53 binding to its responsive elements. Gene 2021;803:145892. [PMID: 34375633 DOI: 10.1016/j.gene.2021.145892] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/14/2021] [Revised: 07/26/2021] [Accepted: 08/05/2021] [Indexed: 11/23/2022]

Dey U, Sarkar S, Teronpi V, Yella VR, Kumar A. G-quadruplex motifs are functionally conserved in cis-regulatory regions of pathogenic bacteria: An in-silico evaluation. Biochimie 2021;184:40-51. [PMID: 33548392 DOI: 10.1016/j.biochi.2021.01.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/14/2020] [Revised: 01/28/2021] [Accepted: 01/29/2021] [Indexed: 02/06/2023]

Wu F, Yang R, Zhang C, Zhang L. A deep learning framework combined with word embedding to identify DNA replication origins. Sci Rep 2021;11:844. [PMID: 33436981 PMCID: PMC7804333 DOI: 10.1038/s41598-020-80670-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/23/2020] [Accepted: 12/24/2020] [Indexed: 01/29/2023]

Abstract

DNA replication influences the inheritance of genetic information in the DNA life cycle. As the distribution of replication origins (ORIs) is the major determinant in precisely regulating the replication process, the correct identification of ORIs is significant in giving an insightful understanding of DNA replication mechanisms and the regulatory mechanisms of genetic expression. For eukaryotes in particular, multiple ORIs exist in each of their gene sequences to complete replication in a reasonable period of time. To simplify the identification of eukaryotic ORIs, most existing methods are built on traditional machine learning algorithms and target gene sequences of a fixed length. Consequently, the identification results are not satisfactory, i.e. there is still great room for improvement. To overcome the limitations of previous studies, this paper develops sequence segmentation methods and employs the word embedding technique 'Word2vec' to convert gene sequences into word vectors, thereby capturing the inner correlations of gene sequences with different lengths. Then, a deep learning framework to perform the ORI identification task is constructed from a convolutional neural network with an embedding layer. On the basis of the analysis of a similarity dimensionality-reduction diagram, Word2vec can effectively transform the inner relationships among words into numerical features. For the four species in this study, the best models are obtained with overall accuracies of 0.975, 0.765, 0.885, 0.967, Matthews correlation coefficients of 0.940, 0.530, 0.771, 0.934, and AUCs of 0.975, 0.800, 0.888, 0.981, which indicate that the proposed predictor has a stable ability and provides a high confidence coefficient to classify both ORIs and non-ORIs. Compared with state-of-the-art methods, the proposed predictor can achieve ORI identification with significant improvement. It is therefore reasonable to anticipate that the proposed method will make a useful high-throughput tool for genome analysis.
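The abstract above describes segmenting gene sequences into "words" before a word-embedding step. As a minimal sketch of that preprocessing idea only, and not the authors' method, the following splits a sequence into k-mers and maps them to integer ids suitable for an embedding layer. The function names, the choice of k-mers as the segmentation scheme, and the `k` and `stride` parameters are assumptions for illustration; the Word2vec training and the convolutional network are not shown.

```python
def kmer_words(sequence, k=3, stride=1):
    """Split a DNA string into k-length words, advancing `stride` bases each step."""
    seq = sequence.upper()
    # A trailing fragment shorter than k is dropped.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(words):
    """Assign each distinct word an integer id, in sorted order for reproducibility."""
    return {word: idx for idx, word in enumerate(sorted(set(words)))}


def encode(words, vocab):
    """Replace each word by its id; words missing from the vocabulary are skipped."""
    return [vocab[w] for w in words if w in vocab]


if __name__ == "__main__":
    words = kmer_words("ATGCGT", k=3, stride=1)
    vocab = build_vocab(words)
    print(words)                 # ['ATG', 'TGC', 'GCG', 'CGT']
    print(encode(words, vocab))  # [0, 3, 2, 1]
```

Because the words are produced by a sliding window, sequences of different lengths simply yield different numbers of words, which is one way a fixed-length input requirement can be avoided.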
