1
Durge AR, Shrimankar DD, Sawarkar AD. Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective. Curr Genomics 2022; 23:299-317. [PMID: 36778194 PMCID: PMC9878859 DOI: 10.2174/1389202923666220927105311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Revised: 08/29/2022] [Accepted: 09/01/2022] [Indexed: 11/22/2022] Open
Abstract
Genome sequences indicate a wide variety of characteristics, including species and sub-species type, genotype, diseases, growth indicators, and yield quality. To analyze and study the characteristics of genome sequences across different species, researchers have proposed various deep learning models, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Multilayer Perceptrons (MLPs), which vary in evaluation performance, area of application, and the species processed. Because the algorithmic implementations differ widely, it is difficult for research programmers to select the best genome processing model for their application. To facilitate this selection, the paper reviews a wide variety of such models and compares their performance in terms of accuracy, area of application, computational complexity, processing delay, precision, and recall. The review thus presents deep learning and machine learning models whose accuracies differ across applications. For multiple genomic data, Repeated Incremental Pruning to Produce Error Reduction with Support Vector Machine (Ripper SVM) achieves 99.7% accuracy; for cancer genomic data, the CNN Bayesian method achieves 99.27% accuracy; and for Covid genome analysis, Bidirectional Long Short-Term Memory with CNN (BiLSTM CNN) achieves the highest accuracy, 99.95%. The precision and recall of the different models are reviewed in the same way. Finally, the paper concludes with some observations on the genomic processing models and recommends applications for their efficient use.
Affiliation(s)
- Aditi R. Durge
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
- Deepti D. Shrimankar
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India. Address correspondence to this author at the Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India; Tel: 9860606477; E-mail:
- Ankush D. Sawarkar
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
2
Cheng X, Wang J, Li Q, Liu T. BiLSTM-5mC: A Bidirectional Long Short-Term Memory-Based Approach for Predicting 5-Methylcytosine Sites in Genome-Wide DNA Promoters. Molecules 2021; 26:7414. [PMID: 34946497 PMCID: PMC8704614 DOI: 10.3390/molecules26247414] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/30/2021] [Accepted: 12/04/2021] [Indexed: 12/04/2022] Open
Abstract
An important cause of cancer proliferation is the change in DNA methylation patterns, characterized by the localized hypermethylation of the promoters of tumor-suppressor genes together with an overall decrease in the level of 5-methylcytosine (5mC). Therefore, identifying the 5mC sites in the promoters is a critical step towards further understanding the diverse functions of DNA methylation in genetic diseases such as cancers and aging. However, most wet-lab experimental techniques for detecting 5mC sites are time consuming and laborious. In this study, we proposed a deep learning-based approach, called BiLSTM-5mC, for accurately identifying 5mC sites in genome-wide DNA promoters. First, we randomly divided the negative samples into 11 subsets of equal size, each of which can form a balanced subset when combined with the same number of positive samples. Then, two types of feature vectors, encoded by the one-hot method and by the nucleotide property and frequency (NPF) method, were fed into a bidirectional long short-term memory (BiLSTM) network and a fully connected layer to train the 22 submodels. Finally, the outputs of these models were integrated to predict 5mC sites by using the majority vote strategy. Our experimental results demonstrated that BiLSTM-5mC outperformed existing methods based on the same independent dataset.
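The data-handling steps named in this abstract (splitting the negatives into 11 equal subsets, one-hot encoding, and majority voting) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names, the fixed seed, the all-zero vector for unknown bases, and the tie-breaking rule are assumptions, and the NPF encoding and the BiLSTM submodels themselves are omitted.

```python
import random
from collections import Counter

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat list with four 0/1 values per base."""
    encoded = []
    for base in seq.upper():
        # Assumption: bases outside A/C/G/T (e.g. N) become an all-zero vector.
        encoded.extend(1 if base == b else 0 for b in BASES)
    return encoded


def balanced_subsets(positives, negatives, n_subsets=11, seed=0):
    """Shuffle the negatives, cut them into n_subsets equal parts, and pair
    each part with the full positive set to form one training subset."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsets  # assumption: leftover negatives are dropped
    return [
        (list(positives), shuffled[i * size:(i + 1) * size])
        for i in range(n_subsets)
    ]


def majority_vote(labels):
    """Return the label predicted by the most submodels.

    Assumption: ties go to the label seen first in `labels`; the abstract
    does not say how ties among the 22 submodels are resolved.
    """
    return Counter(labels).most_common(1)[0][0]
```

Under this reading, each of the 11 balanced subsets would be encoded twice (one-hot and NPF), which is consistent with the 22 submodels the abstract mentions; a subset is balanced only when the negatives outnumber the positives by about 11 to 1.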
Affiliation(s)
- Xin Cheng
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; (X.C.); (Q.L.)
- Jun Wang
- School of Software Technology, Zhejiang University, Ningbo 315048, China;
- Qianyue Li
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; (X.C.); (Q.L.)
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; (X.C.); (Q.L.)
- Correspondence: ; Tel.: +86-21-61900624
3
Guo G, Pan K, Fang S, Ye L, Tong X, Wang Z, Xue X, Zhang H. Advances in mRNA 5-methylcytosine modifications: Detection, effectors, biological functions, and clinical relevance. Molecular Therapy. Nucleic Acids 2021; 26:575-593. [PMID: 34631286 PMCID: PMC8479277 DOI: 10.1016/j.omtn.2021.08.020] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 02/07/2023]
Abstract
5-methylcytosine (m5C) post-transcriptional modifications affect the maturation, stability, and translation of the mRNA molecule. These modifications play an important role in many physiological and pathological processes, including stress response, tumorigenesis, tumor cell migration, embryogenesis, and viral replication. Recently, there has been a better understanding of the biological implications of m5C modification owing to the rapid development and optimization of detection technologies, including liquid chromatography-tandem mass spectrometry (LC-MS/MS) and RNA-BisSeq. Further, predictive models (such as PEA-m5C, m5C-PseDNC, and DeepMRMP) for the identification of potential m5C modification sites have also emerged. In this review, we summarize the current experimental detection methods and predictive models for mRNA m5C modifications, focusing on their advantages and limitations. We systematically surveyed the latest research on the effectors related to mRNA m5C modifications and their biological functions in multiple species. Finally, we discuss the physiological effects and pathological significance of m5C modifications in multiple diseases, as well as their therapeutic potential, thereby providing new perspectives for disease treatment and prognosis.
Affiliation(s)
- Gangqiang Guo
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Kan Pan
- First Clinical College, Wenzhou Medical University, Wenzhou, China
- Su Fang
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Lele Ye
- Department of Gynecologic Oncology, Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Xinya Tong
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Zhibin Wang
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Xiangyang Xue
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
- Huidi Zhang
- Department of Nephrology, The First Affiliated Hospital, Wenzhou Medical University, Wenzhou, China
4
Schaefer MR. The Regulation of RNA Modification Systems: The Next Frontier in Epitranscriptomics? Genes (Basel) 2021; 12:345. [PMID: 33652758 PMCID: PMC7996938 DOI: 10.3390/genes12030345] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/11/2021] [Revised: 02/22/2021] [Accepted: 02/24/2021] [Indexed: 12/12/2022] Open
Abstract
RNA modifications, long considered to be molecular curiosities embellishing just abundant and non-coding RNAs, have now moved into the focus of both academic and applied research. Dedicated research efforts (epitranscriptomics) aim at deciphering the underlying principles by determining RNA modification landscapes and investigating the molecular mechanisms that establish, interpret and modulate the information potential of RNA beyond the combination of four canonical nucleotides. This has resulted in mapping various epitranscriptomes at high resolution and in cataloguing the effects caused by aberrant RNA modification circuitry. While the scope of the obtained insights has been complex and exciting, most of current epitranscriptomics appears to be stuck in the process of producing data, with very few efforts to disentangle cause from consequence when studying a specific RNA modification system. This article discusses various knowledge gaps in this field with the aim to raise one specific question: how are the enzymes regulated that dynamically install and modify RNA modifications? Furthermore, various technologies will be highlighted whose development and use might allow identifying specific and context-dependent regulators of epitranscriptomic mechanisms. Given the complexity of individual epitranscriptomes, determining their regulatory principles will become crucially important, especially when aiming at modifying specific aspects of an epitranscriptome both for experimental and, potentially, therapeutic purposes.
Affiliation(s)
- Matthias R Schaefer
- Centre for Anatomy & Cell Biology, Division of Cell- and Developmental Biology, Medical University of Vienna, Schwarzspanierstrasse 17, Haus C, 1st Floor, 1090 Vienna, Austria
5
Chen W. Computational RNA Epigenetics. Curr Genomics 2020; 21:2. [PMID: 32655292 PMCID: PMC7324888 DOI: 10.2174/138920292101200305145123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
Affiliation(s)
- Wei Chen
- Guest Editor, Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China, and Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China; E-mail: