1
Han K, Wang J, Chu Y, Liao Q, Ding Y, Zheng D, Wan J, Guo X, Zou Q. Deep learning based method for predicting DNA N6-methyladenosine sites. Methods 2024; 230:91-98. [PMID: 39097179 DOI: 10.1016/j.ymeth.2024.07.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 07/22/2024] [Accepted: 07/29/2024] [Indexed: 08/05/2024] Open
Abstract
DNA N6-methyladenine (6mA) plays an important role in many biological processes, and accurately identifying its sites helps to understand its biological effects more comprehensively. Traditional experimental methods are very labor-intensive, and traditional machine learning methods appear insufficient as 6mA methylation databases grow progressively larger. We therefore propose a deep learning-based method, a multi-scale convolutional model based on global response normalization (CG6mA), to solve the 6mA site prediction problem. The method is compared with other methods on three different benchmark datasets, and the results show that our model achieves better prediction results.
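The abstract names global response normalization (GRN) without giving its formula. As a rough illustration only, the sketch below applies the GRN formulation popularized by ConvNeXt V2 to a toy one-dimensional feature map. The function name, the `gamma` and `beta` defaults, and the channel-by-position layout are assumptions for illustration, not CG6mA's actual implementation.

```python
import math

def global_response_norm(x, gamma=1.0, beta=0.0, eps=1e-6):
    """Apply GRN to x, a list of channels holding per-position activations.

    Assumed layout: x[channel][position]. gamma and beta would normally be
    learned per channel; scalars are used here to keep the sketch short.
    """
    # Global aggregation: L2 norm of each channel over all positions.
    norms = [math.sqrt(sum(v * v for v in channel)) for channel in x]
    # Divisive normalization: each channel's norm relative to the mean norm.
    mean_norm = sum(norms) / len(norms)
    scales = [n / (mean_norm + eps) for n in norms]
    # Calibrate the input and keep a residual connection.
    return [[gamma * v * s + beta + v for v in channel]
            for channel, s in zip(x, scales)]

# Toy feature map: one active channel and one silent channel.
features = [[3.0, 4.0], [0.0, 0.0]]
normalized = global_response_norm(features)
```

In this toy input the active channel is amplified relative to the silent one, which is the channel-competition effect GRN is meant to provide.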
Affiliation(s)
- Ke Han: School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
- Jianchun Wang: School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
- Ying Chu: School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
- Qian Liao: School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
- Yijie Ding: Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Dequan Zheng: School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
- Jie Wan: Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin 150001, China
- Xiaoyi Guo: Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Quan Zou: Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
2
Hong Z, Xu Y, Wu J. Bisphenol A: Epigenetic effects on the male reproductive system and male offspring. Reprod Toxicol 2024; 129:108656. [PMID: 39004383 DOI: 10.1016/j.reprotox.2024.108656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 06/23/2024] [Accepted: 07/03/2024] [Indexed: 07/16/2024]
Abstract
Bisphenol A (BPA) is a commonly used organic compound. Over the past decades, many studies have examined the mechanisms of BPA toxicity, with BPA-induced alterations in epigenetic modifications receiving considerable attention. Particularly in the male reproductive system, abnormal alterations in epigenetic markers can adversely affect reproductive function. Furthermore, these changes in epigenetic markers can be transmitted to offspring through the father. Here, we review the effects of BPA exposure on various epigenetic markers in the male reproductive system, including DNA methylation, histone modifications, and noncoding RNA, as well as associated changes in male reproductive function. We also review the effects of paternal BPA exposure on epigenetic modification patterns in offspring.
Affiliation(s)
- Zhilin Hong: The Center of Clinical Laboratory, the Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian 362000, PR China
- Yingpei Xu: Department of Reproductive Medicine, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, Fujian 364000, PR China
- Jinxiang Wu: Department of Reproductive Medicine, the Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian 362000, PR China
3
Khan S, Uddin I, Khan M, Iqbal N, Alshanbari HM, Ahmad B, Khan DM. Sequence based model using deep neural network and hybrid features for identification of 5-hydroxymethylcytosine modification. Sci Rep 2024; 14:9116. [PMID: 38643305 DOI: 10.1038/s41598-024-59777-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/15/2024] [Indexed: 04/22/2024] Open
Abstract
RNA modifications are pivotal in the development of newly synthesized structures, showcasing a vast array of alterations across various RNA classes. Among these, 5-hydroxymethylcytosine (5HMC) stands out, playing a crucial role in gene regulation and epigenetic changes, yet its detection through conventional methods proves cumbersome and costly. To address this, we propose Deep5HMC, a robust learning model leveraging machine learning algorithms and discriminative feature extraction techniques for accurate 5HMC sample identification. Our approach integrates seven feature extraction methods and various machine learning algorithms, including Random Forest, Naive Bayes, Decision Tree, and Support Vector Machine. Through K-fold cross-validation, our model achieved a notable 84.07% accuracy rate, surpassing previous models by 7.59%, signifying its potential in early cancer and cardiovascular disease diagnosis. This study underscores the promise of Deep5HMC in offering insights for improved medical assessment and treatment protocols, marking a significant advancement in RNA modification analysis.
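The abstract reports accuracy obtained through K-fold cross-validation but does not state the value of K or whether the data were shuffled. The following minimal sketch shows the general procedure under assumed settings. The helper names and the majority-class baseline are hypothetical stand-ins, not the Deep5HMC pipeline.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validated_accuracy(samples, labels, k, fit, predict):
    """Average held-out accuracy over k folds; fit/predict are caller-supplied."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical baseline: always predict the most frequent training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

folds = list(k_fold_indices(10, 5))
accuracy = cross_validated_accuracy(list(range(10)), [1] * 10, 5,
                                    fit_majority, predict_majority)
```

In practice the samples would be shuffled (ideally stratified by class) before splitting, and any feature selection would be fitted inside each training fold so that the held-out fold stays unseen.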
Affiliation(s)
- Salman Khan: Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Islam Uddin: Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mukhtaj Khan: Department of Information Technology, The University of Haripur, Haripur, Pakistan
- Nadeem Iqbal: Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Huda M Alshanbari: Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
- Bakhtiyar Ahmad: Higher Education Department Afghanistan, Kabul, Afghanistan
- Dost Muhammad Khan: Department of Statistics, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
4
Lin L, Zhao Y, Zheng Q, Zhang J, Li H, Wu W. Epigenetic targeting of autophagy for cancer: DNA and RNA methylation. Front Oncol 2023; 13:1290330. [PMID: 38148841 PMCID: PMC10749975 DOI: 10.3389/fonc.2023.1290330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 11/28/2023] [Indexed: 12/28/2023] Open
Abstract
Autophagy, a crucial cellular mechanism responsible for degradation and recycling of intracellular components, is modulated by an intricate network of molecular signals. Its paradoxical involvement in oncogenesis, acting as both a tumor suppressor and promoter, has been underscored in recent studies. Central to this regulatory network are the epigenetic modifications of DNA and RNA methylation, notably the presence of N6-methyldeoxyadenosine (6mA) in genomic DNA and N6-methyladenosine (m6A) in eukaryotic mRNA. The 6mA modification in genomic DNA adds an extra dimension of epigenetic regulation, potentially impacting the transcriptional dynamics of genes linked to autophagy and, especially, cancer. Conversely, m6A modification, governed by methyltransferases and demethylases, influences mRNA stability, processing, and translation, affecting genes central to autophagic pathways. As we delve deeper into the complexities of autophagy regulation, the importance of these methylation modifications grows more evident. The interplay of 6mA, m6A, and autophagy points to a layered regulatory mechanism, illuminating cellular reactions to a range of conditions. This review delves into the nexus between DNA 6mA and RNA m6A methylation and their influence on autophagy in cancer contexts. By closely examining these epigenetic markers, we underscore their promise as therapeutic avenues, suggesting novel approaches for cancer intervention through autophagy modulation.
Affiliation(s)
- Luobin Lin: Guangdong Province Key Laboratory of Biotechnology Drug Candidates, School of Life Sciences and Biopharmaceuticals, Guangdong Pharmaceutical University, Guangzhou, Guangdong, China
- Yuntao Zhao: Guangdong Province Key Laboratory of Biotechnology Drug Candidates, School of Life Sciences and Biopharmaceuticals, Guangdong Pharmaceutical University, Guangzhou, Guangdong, China
- Qinzhou Zheng: Guangdong Province Key Laboratory of Biotechnology Drug Candidates, School of Life Sciences and Biopharmaceuticals, Guangdong Pharmaceutical University, Guangzhou, Guangdong, China
- Jiayang Zhang: Guangdong Province Key Laboratory of Biotechnology Drug Candidates, School of Life Sciences and Biopharmaceuticals, Guangdong Pharmaceutical University, Guangzhou, Guangdong, China
- Huaqin Li: School of Health Sciences, Guangzhou Xinhua University, Guangzhou, Guangdong, China
- Wenmei Wu: Guangdong Province Key Laboratory of Biotechnology Drug Candidates, School of Life Sciences and Biopharmaceuticals, Guangdong Pharmaceutical University, Guangzhou, Guangdong, China
5
Sultana A, Mitu SJ, Pathan MN, Uddin MN, Uddin MA, Aryal S. 4mC-CGRU: Identification of N4-Methylcytosine (4mC) sites using convolution gated recurrent unit in Rosaceae genome. Comput Biol Chem 2023; 107:107974. [PMID: 37944386 DOI: 10.1016/j.compbiolchem.2023.107974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 09/22/2023] [Accepted: 10/24/2023] [Indexed: 11/12/2023]
Abstract
DNA N4-methylcytosine (4mC) is an epigenetic modification that affects several biological functions without altering the DNA nucleotides, including DNA conformation, cell development, replication, stability, and DNA structural changes. 4mC plays a critical role in restriction-modification systems, preventing restriction enzymes from damaging self-DNA. Existing studies have mainly focused on hand-crafted features to identify 4mC locations, but these methods are inefficient because they are time-consuming and costly. In this work, we propose 4mC-CGRU, a deep learning-based computational model with a standard encoding method that learns features autonomously to identify 4mC sites from DNA sequences in the Rosaceae genome, particularly in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca). The proposed model combines a convolutional neural network (CNN) with a gated recurrent unit (GRU) network. The CNN extracts useful features from the datasets and the GRU classifies the DNA sequences. Thus, our approach can automatically extract important features to detect the relevant sites from DNA sequences. The performance analysis shows that the proposed model consistently outperforms state-of-the-art works in detecting 4mC sites.
Affiliation(s)
- Abida Sultana: Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh
- Sadia Jannat Mitu: Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh
- Md Naimul Pathan: Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh
- Mohammed Nasir Uddin: Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh
- Md Ashraf Uddin: School of Information Technology, Deakin University Geelong, Australia
- Sunil Aryal: School of Information Technology, Deakin University Geelong, Australia
6
Fan Y, Xiong H, Sun G. DeepASDPred: a CNN-LSTM-based deep learning method for Autism spectrum disorders risk RNA identification. BMC Bioinformatics 2023; 24:261. [PMID: 37349705 DOI: 10.1186/s12859-023-05378-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2023] [Accepted: 06/06/2023] [Indexed: 06/24/2023] Open
Abstract
BACKGROUND Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders characterized by difficulty communicating with society and others, behavioral difficulties, and a brain that processes information differently than normal. Genetics has a strong impact on ASD associated with early onset and distinctive signs. Currently, all known ASD risk genes are able to encode proteins, and some de novo mutations disrupting protein-coding genes have been demonstrated to cause ASD. Next-generation sequencing technology enables high-throughput identification of ASD risk RNAs. However, these efforts are time-consuming and expensive, so an efficient computational model for ASD risk gene prediction is necessary. RESULTS In this study, we propose DeepASDPred, a predictor for ASD risk RNA based on deep learning. First, we encode the RNA transcript sequences with K-mer features and fuse them with the corresponding gene expression values to construct a feature matrix. After combining the chi-square test and logistic regression to select the best feature subset, we input the selected features into a binary classification model built from a convolutional neural network and long short-term memory for training and classification. The results of tenfold cross-validation showed that our method outperformed the state-of-the-art methods. The dataset and source code are freely available at https://github.com/Onebear-X/DeepASDPred. CONCLUSIONS Our experimental results show that DeepASDPred has outstanding performance in identifying ASD risk RNA genes.
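The K-mer encoding and feature fusion steps described in the RESULTS section can be sketched as follows. This is a minimal illustration under stated assumptions: the choice of k, the normalization by window count, and the simple concatenation in `fuse_features` are guesses at a reasonable layout, not the settings of DeepASDPred.

```python
from itertools import product

def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer counts in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def fuse_features(sequence, expression_values, k=3):
    """Concatenate the k-mer vector with expression values (assumed layout)."""
    return kmer_frequencies(sequence, k) + list(expression_values)

# "AUGAUG" contains the 2-mers AU, UG, GA, AU, UG.
vec = kmer_frequencies("AUGAUG", k=2)
fused = fuse_features("AUGAUG", [1.5, 2.0], k=2)
```

The vector length grows as 4^k, which is one reason a feature selection step such as the chi-square test mentioned in the abstract becomes useful for larger k.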
Affiliation(s)
- Yongxian Fan: School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
- Hui Xiong: School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
- Guicong Sun: School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
7
Yang S, Yang Z, Yang J. 4mCBERT: A computing tool for the identification of DNA N4-methylcytosine sites by sequence- and chemical-derived information based on ensemble learning strategies. Int J Biol Macromol 2023; 231:123180. [PMID: 36646347 DOI: 10.1016/j.ijbiomac.2023.123180] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/19/2022] [Revised: 11/26/2022] [Accepted: 12/30/2022] [Indexed: 01/15/2023]
Abstract
N4-methylcytosine (4mC) is an important DNA chemical modification discovered in recent years that plays critical roles in gene expression regulation, defense against invading genetic elements, genomic imprinting, and so on. Identifying 4mC sites from DNA sequence segments contributes to discovering more novel modification patterns. In this paper, we present a model called 4mCBERT that encodes DNA sequence segments using sequence characteristics (one-hot, electron-ion interaction pseudopotential, nucleotide chemical property, and word2vec) and chemical information (physicochemical properties (PCP) and chemical bidirectional encoder representations from transformers (chemical BERT)), and employs an ensemble learning framework to develop a prediction model. PCP and chemical BERT features are constructed and applied to 4mC site prediction for the first time, and they contribute positively to identifying 4mC. In terms of the Matthews correlation coefficient, 4mCBERT significantly outperformed other state-of-the-art models on six independent benchmark datasets, A. thaliana, C. elegans, D. melanogaster, E. coli, G. pickeringii, and G. subterraneus, by 4.32 % to 24.39 %, 2.52 % to 31.65 %, 2 % to 16.49 %, 6.63 % to 35.15 %, 8.59 % to 61.85 %, and 8.45 % to 34.45 %, respectively. Moreover, 4mCBERT is designed to allow users to predict 4mC sites and retrain 4mC prediction models. In brief, 4mCBERT shows higher performance on six benchmark datasets by incorporating sequence- and chemical-driven information and is available at http://cczubio.top/4mCBERT and https://github.com/abcair/4mCBERT.
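Three of the sequence encodings named in the abstract (one-hot, electron-ion interaction pseudopotential, and nucleotide chemical property) are simple per-base lookups. The sketch below uses values and conventions that are commonly cited in the literature; whether 4mCBERT uses exactly these tables, this column order, or this concatenation is an assumption.

```python
# One-hot encoding with an assumed A, C, G, T column order.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}

# Commonly cited electron-ion interaction pseudopotential (EIIP) values.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# Nucleotide chemical property (NCP) as (ring structure, functional group,
# hydrogen bonding); this is one common convention, assumed here.
NCP = {"A": [1, 1, 1], "C": [0, 1, 0], "G": [1, 0, 0], "T": [0, 0, 1]}

def encode(sequence):
    """Return one 8-dimensional row per base: one-hot + NCP + EIIP."""
    rows = []
    for base in sequence.upper():
        # A KeyError here signals an unsupported base such as N.
        rows.append(ONE_HOT[base] + NCP[base] + [EIIP[base]])
    return rows

matrix = encode("acgt")
```

Each sequence segment of length L becomes an L x 8 matrix that a downstream classifier can consume directly or after flattening.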
Affiliation(s)
- Sen Yang: School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou 213164, China; The Affiliated Changzhou No 2 People's Hospital of Nanjing Medical University, Changzhou 213164, China
- Zexi Yang: School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou 213164, China
- Jun Yang: School of Educational Sciences, Yili Normal University, Yining 835000, China
8
Comparison of polyphenolic profile and antioxidant capacity of Prunus subgenus Cerasus L. species from Turkey. Eur Food Res Technol 2023. [DOI: 10.1007/s00217-023-04219-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
9
Han K, Wang J, Wang Y, Zhang L, Yu M, Xie F, Zheng D, Xu Y, Ding Y, Wan J. A review of methods for predicting DNA N6-methyladenine sites. Brief Bioinform 2023; 24:6887111. [PMID: 36502371 DOI: 10.1093/bib/bbac514] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/18/2022] [Revised: 10/07/2022] [Accepted: 10/27/2022] [Indexed: 12/14/2022] Open
Abstract
Deoxyribonucleic acid (DNA) N6-methyladenine (6mA) plays a vital role in various biological processes, and the accurate identification of its sites can provide a more comprehensive understanding of its biological effects. There are several methods for 6mA site prediction. With the continuous development of technology, traditional techniques with high costs and low efficiency are gradually being replaced by computational methods. The widely used computational methods can be divided into two categories: traditional machine learning and deep learning methods. We first list some existing experimental methods for identifying 6mA sites, then analyze the general process from sequence input to results in computational methods and review existing model architectures. Finally, the results are summarized and compared to help subsequent researchers choose the most suitable method for their work.
Affiliation(s)
- Ke Han: School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China; College of Pharmacy, Harbin University of Commerce, Harbin, 150076, China
- Jianchun Wang: School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Yu Wang: School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Lei Zhang: School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Mengyao Yu: School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Fang Xie: School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Dequan Zheng: School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Yaoqun Xu: School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Yijie Ding: Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Jie Wan: Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin, 150001, China
10
Nabeel Asim M, Ali Ibrahim M, Fazeel A, Dengel A, Ahmed S. DNA-MP: a generalized DNA modifications predictor for multiple species based on powerful sequence encoding method. Brief Bioinform 2023; 24:6931721. [PMID: 36528802 DOI: 10.1093/bib/bbac546] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/14/2022] [Revised: 11/06/2022] [Accepted: 11/12/2022] [Indexed: 12/23/2022] Open
Abstract
Accurate prediction of deoxyribonucleic acid (DNA) modifications is essential to explore and discern the process of cell differentiation, gene expression and epigenetic regulation. Several computational approaches have been proposed for type-specific DNA modification prediction. Two recent generalized computational predictors are capable of detecting three different types of DNA modifications; however, type-specific and generalized modification predictors produce limited performance across multiple species, mainly due to the use of ineffective sequence encoding methods. This paper presents a generalized computational approach, "DNA-MP", that can more precisely predict three different DNA modifications across multiple species. The proposed DNA-MP approach uses a powerful encoding method, "position specific nucleotides occurrence based on modification and non-modification class densities normalized difference" (POCD-ND), to generate statistical representations of DNA sequences, and a deep forest classifier for modification prediction. The POCD-ND encoder generates statistical representations by extracting position-specific distributional information of nucleotides in the DNA sequences. We perform a comprehensive intrinsic and extrinsic evaluation of the proposed encoder and compare its performance with the 32 most widely used encoding methods on 17 benchmark DNA modification prediction datasets of 12 different species using 10 different machine learning classifiers. Overall, with all classifiers, the proposed POCD-ND encoder outperforms the 32 existing encoders. Furthermore, combined over 5-fold cross-validation benchmark datasets and independent test sets, the proposed DNA-MP predictor outperforms state-of-the-art type-specific and generalized modification predictors by an average accuracy of 7% across 4mC datasets, 1.35% across 5hmC datasets and 10% across 6mA datasets.
To facilitate the scientific community, the DNA-MP web application is available at https://sds_genetic_analysis.opendfki.de/DNA_Modifications/.
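The abstract gives the name of the POCD-ND encoder but not its formula. The sketch below implements one plausible reading of that name: per-position nucleotide densities are computed separately for the modification and non-modification classes, and each base is scored by the normalized difference of the two densities. All function names and the exact normalization are assumptions; the published encoder may differ.

```python
from collections import Counter

def position_densities(sequences):
    """Per-position nucleotide frequencies for equal-length sequences."""
    densities = []
    for pos in range(len(sequences[0])):
        counts = Counter(seq[pos] for seq in sequences)
        densities.append({base: c / len(sequences)
                          for base, c in counts.items()})
    return densities

def fit_scores(positives, negatives):
    """Score each (position, base) by (d_pos - d_neg) / (d_pos + d_neg)."""
    scores = []
    for pos_d, neg_d in zip(position_densities(positives),
                            position_densities(negatives)):
        table = {}
        for base in set(pos_d) | set(neg_d):
            dp, dn = pos_d.get(base, 0.0), neg_d.get(base, 0.0)
            # dp + dn > 0 because the base occurs in at least one class.
            table[base] = (dp - dn) / (dp + dn)
        scores.append(table)
    return scores

def encode(sequence, scores):
    """Map each base to its score; bases unseen in training map to 0.0."""
    return [table.get(base, 0.0) for base, table in zip(sequence, scores)]

# Toy training sets: two modified and two unmodified sequences.
scores = fit_scores(["AC", "AG"], ["TC", "AC"])
encoded = encode("AG", scores)
```

Because the scores depend on class labels, they would need to be fitted on training folds only and then applied to held-out sequences, otherwise label information leaks into the evaluation.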
Affiliation(s)
- Muhammad Nabeel Asim: Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Muhammad Ali Ibrahim: Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Ahtisham Fazeel: Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Andreas Dengel: Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Sheraz Ahmed: German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
11
Rehman MU, Tayara H, Zou Q, Chong KT. i6mA-Caps: a CapsuleNet-based framework for identifying DNA N6-methyladenine sites. Bioinformatics 2022; 38:3885-3891. [PMID: 35771648 DOI: 10.1093/bioinformatics/btac434] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/14/2022] [Revised: 05/19/2022] [Accepted: 06/28/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION DNA N6-methyladenine (6mA) has been demonstrated in recent research to have an essential function in epigenetic modification in eukaryotic species. 6mA has been linked to various biological processes. It is critical to create a new algorithm that can rapidly and reliably detect 6mA sites in genomes to investigate their biological roles. The identification of 6mA marks in the genome is the first and most important step in understanding the underlying molecular processes, as well as their regulatory functions. RESULTS In this article, we propose a novel computational tool called i6mA-Caps, a CapsuleNet-based framework for identifying DNA N6-methyladenine sites. The proposed framework uses a single encoding scheme for numerical representation of the DNA sequence. The numerical data is then used by a set of convolution layers to extract low-level features. These features are then used by the capsule network to extract intermediate-level and later high-level features to classify the 6mA sites. The proposed network is evaluated on three datasets belonging to three genomes: Rosaceae, rice and Arabidopsis thaliana. The proposed method attained an accuracy of 96.71%, 94% and 86.83% on the independent Rosaceae, rice and A. thaliana datasets, respectively. The proposed framework exhibited improved results compared with existing top-of-the-line methods. AVAILABILITY AND IMPLEMENTATION A user-friendly web server is available for biological experts at: http://nsclbio.jbnu.ac.kr/tools/i6mA-Caps/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mobeen Ur Rehman: Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara: School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Kil To Chong: Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
12
Abbas Z, Tayara H, Chong KT. ZayyuNet - A Unified Deep Learning Model for the Identification of Epigenetic Modifications Using Raw Genomic Sequences. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2533-2544. [PMID: 34038365 DOI: 10.1109/tcbb.2021.3083789] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/12/2023]
Abstract
Epigenetic modifications have a vital role in gene expression and are linked to cellular processes such as differentiation, development, and tumorigenesis. Thus, the availability of reliable and accurate methods for identifying and defining these changes facilitates greater insights into the regulatory mechanisms that rely on epigenetic modifications. The current experimental methods provide a genome-wide identification of epigenetic modifications; however, they are expensive and time-consuming. To date, several machine learning methods have been proposed for identifying modifications such as DNA N6-Methyladenine (6mA), RNA N6-Methyladenosine (m6A), DNA N4-methylcytosine (4mC), and RNA pseudouridine (Ψ). However, these methods are task-specific computational tools and require different encoding representations of DNA/RNA sequences. In this study, we propose a unified deep learning model, called ZayyuNet, for the identification of various epigenetic modifications. The proposed model is based on an architecture called SpinalNet, inspired by the human somatosensory system, that can efficiently receive large inputs and achieve better performance. The proposed model has been evaluated on various epigenetic modifications such as 6mA, m6A, 4mC, and Ψ, and the results achieved outperform current state-of-the-art models. A user-friendly web server has been built and made freely available at http://nsclbio.jbnu.ac.kr/tools/ZayyuNet/.
13
Jiménez-Ramírez IA, Pijeira-Fernández G, Moreno-Cálix DM, De-la-Peña C. Same modification, different location: the mythical role of N6-adenine methylation in plant genomes. Planta 2022; 256:9. [PMID: 35696004 DOI: 10.1007/s00425-022-03926-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/16/2022] [Accepted: 05/23/2022] [Indexed: 06/15/2023]
Abstract
The present review summarizes recent advances in the understanding of 6mA in DNA as an emergent epigenetic mark with distinctive characteristics, discusses its importance in plant genomes, and highlights its chemical nature and functions. Adenine methylation is an epigenetic modification present in DNA (6mA) and RNA (m6A) that has a regulatory function in many cellular processes. This modification occurs through a reversible reaction that covalently binds a methyl group, usually at the N6 position of the purine ring. This modification carries biophysical properties that affect the stability of nucleic acids as well as their binding affinity with other molecules. DNA 6mA has been related to genome stability, gene expression, DNA replication, and repair mechanisms. Recent advances have shown that 6mA in plant genomes is related to development and stress response. In this review, we present recent advances in the understanding of 6mA in DNA as an emergent epigenetic mark with distinctive characteristics. We discuss the key elements of this modification, focusing mainly on its importance in plant genomes. Furthermore, we highlight its chemical nature and the regulatory effects that it exerts on gene expression and plant development. Finally, we emphasize the functions of 6mA in photosynthesis, stress, and flowering.
Affiliation(s)
- Irma A Jiménez-Ramírez: Unidad de Bioquímica y Biología Molecular de Plantas, Centro de Investigación Científica de Yucatán, Calle 43 No. 130 x 32 y 34. Col. Chuburná de Hidalgo, 97205, Mérida, Yucatán, Mexico
- Gema Pijeira-Fernández: Unidad de Bioquímica y Biología Molecular de Plantas, Centro de Investigación Científica de Yucatán, Calle 43 No. 130 x 32 y 34. Col. Chuburná de Hidalgo, 97205, Mérida, Yucatán, Mexico
- Delia M Moreno-Cálix: Unidad de Bioquímica y Biología Molecular de Plantas, Centro de Investigación Científica de Yucatán, Calle 43 No. 130 x 32 y 34. Col. Chuburná de Hidalgo, 97205, Mérida, Yucatán, Mexico
- Clelia De-la-Peña: Centro de Investigación Científica de Yucatán, Unidad de Biotecnología, Calle 43 No. 130 x 32 y 34. Col. Chuburná de Hidalgo, 97205, Mérida, Yucatán, Mexico
14
|
Tang X, Zheng P, Li X, Wu H, Wei DQ, Liu Y, Huang G. Deep6mAPred: A CNN and Bi-LSTM-based deep learning method for predicting DNA N6-methyladenosine sites across plant species. Methods 2022; 204:142-150. [PMID: 35477057 DOI: 10.1016/j.ymeth.2022.04.011] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/18/2022] [Revised: 04/16/2022] [Accepted: 04/20/2022] [Indexed: 12/11/2022] Open
Abstract
DNA N6-methyladenine (6mA) is a key DNA modification, which plays versatile roles in the cellular processes, including regulation of gene expression, DNA repair, and DNA replication. DNA 6mA is closely associated with many diseases in the mammals and with growth as well as development of plants. Precisely detecting DNA 6mA sites is of great importance to exploration of 6mA functions. Although many computational methods have been presented for DNA 6mA prediction, there is still a wide gap in the practical application. We presented a convolution neural network (CNN) and bi-directional long-short term memory (Bi-LSTM)-based deep learning method (Deep6mAPred) for predicting DNA 6mA sites across plant species. The Deep6mAPred stacked the CNNs and the Bi-LSTMs in a paralleling manner instead of a series-connection manner. The Deep6mAPred also employed the attention mechanism for improving the representations of sequences. The Deep6mAPred reached an accuracy of 0.9556 over the independent rice dataset, far outperforming the state-of-the-art methods. The tests across plant species showed that the Deep6mAPred is of a remarkable advantage over the state of the art methods. We developed a user-friendly web application for DNA 6mA prediction, which is freely available at http://106.13.196.152:7001/ for all the scientific researchers. The Deep6mAPred would enrich tools to predict DNA 6mA sites and speed up the exploration of DNA modification.
Affiliation(s)
- Xingyu Tang
- School of Electrical Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
- Peijie Zheng
- School of Electrical Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
- Xueyong Li
- School of Electrical Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
- Hongyan Wu
- The Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Dong-Qing Wei
- The Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
- Yuewu Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha, Hunan 410081, China
- Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang, Hunan 422000, China.
|
15
|
Yu L, Zhang Y, Xue L, Liu F, Chen Q, Luo J, Jing R. Systematic Analysis and Accurate Identification of DNA N4-Methylcytosine Sites by Deep Learning. Front Microbiol 2022; 13:843425. [PMID: 35401453 PMCID: PMC8989013 DOI: 10.3389/fmicb.2022.843425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2021] [Accepted: 02/21/2022] [Indexed: 11/13/2022] Open
Abstract
DNA N4-methylcytosine (4mC) is a pivotal epigenetic modification that plays an essential role in DNA replication, repair, expression and differentiation. To gain insight into the biological functions of 4mC, it is critical to identify its modification sites in the genome. Deep learning has become increasingly popular in recent years and is frequently employed for 4mC site identification. However, a systematic analysis of how to build predictive models using deep learning techniques is still lacking. In this work, we first summarized all existing deep learning-based predictors and systematically analyzed their models, features, and datasets. Then, using a typical standard dataset with three species (A. thaliana, C. elegans, and D. melanogaster), we assessed the contribution of different model architectures, encoding methods and the attention mechanism in establishing a deep learning-based model for 4mC site prediction. After a series of optimizations, a convolutional-recurrent neural network architecture using one-hot encoding and the attention mechanism achieved the best overall prediction performance. Extensive comparison experiments were conducted based on the same dataset. This work will be helpful for researchers who would like to build 4mC prediction models using deep learning in the future.
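The one-hot encoding named in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the helper name `one_hot_encode`, the A/C/G/T column order, and the all-zero row for unknown bases are assumptions made here for illustration only.

```python
# Hypothetical helper (not from the cited paper): one-hot encode a DNA sequence.
# Assumption: columns are ordered A, C, G, T; any other symbol maps to all zeros.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide in the sequence."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = NUCLEOTIDES.find(base)  # -1 when the base is not A, C, G, or T
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("ACGTN")
```

Each sequence of length L becomes an L x 4 matrix, which is the kind of input a convolutional or recurrent layer would typically consume.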
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
- Yonglin Zhang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
- Li Xue
- School of Public Health, Southwest Medical University, Luzhou, China
- Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
- Qi Chen
- Department of Endocrinology and Metabolism, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China; Department of Pharmacy, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu, China
|
16
|
Tsukiyama S, Hasan MM, Deng HW, Kurata H. BERT6mA: prediction of DNA N6-methyladenine site using deep learning-based approaches. Brief Bioinform 2022; 23:6539171. [PMID: 35225328 PMCID: PMC8921755 DOI: 10.1093/bib/bbac053] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 12/08/2021] [Revised: 01/28/2022] [Accepted: 01/31/2022] [Indexed: 01/29/2023] Open
Abstract
N6-methyladenine (6mA) is associated with important roles in DNA replication, DNA repair, transcription, and regulation of gene expression. Several experimental methods were used to identify DNA modifications. However, these experimental methods are costly and time-consuming. To detect 6mA and complement these shortcomings of experimental methods, we proposed a novel deep learning approach called BERT6mA. To compare BERT6mA with other deep learning approaches, we used benchmark datasets including 11 species. BERT6mA presented the highest AUCs in eight species in independent tests. Furthermore, BERT6mA showed performance higher than or comparable with the state-of-the-art models, while it showed poor performance in a few species with a small sample size. To overcome this issue, pretraining and fine-tuning between two species were applied to BERT6mA. The pretrained and fine-tuned models on specific species presented higher performance than other models even for the species with a small sample size. In addition to the prediction, we analyzed the attention weights generated by BERT6mA to reveal how the BERT6mA model extracts critical features responsible for the 6mA prediction. To facilitate biological sciences, the BERT6mA online web server and its source codes are freely accessible at https://github.com/kuratahiroyuki/BERT6mA.git, respectively.
Affiliation(s)
- Sho Tsukiyama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Hong-Wen Deng
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Hiroyuki Kurata
- Corresponding author: Hiroyuki Kurata, Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan. Tel: 81-948-29-7828; E-mail:
|
17
|
Shi H, Li S, Su X. Plant6mA: a predictor for predicting N6-methyladenine sites with lightweight structure in plant genomes. Methods 2022; 204:126-131. [DOI: 10.1016/j.ymeth.2022.02.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 02/20/2022] [Accepted: 02/24/2022] [Indexed: 10/19/2022] Open
|
18
|
Teng Z, Zhao Z, Li Y, Tian Z, Guo M, Lu Q, Wang G. i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting. FRONTIERS IN PLANT SCIENCE 2022; 13:845835. [PMID: 35237293 PMCID: PMC8882731 DOI: 10.3389/fpls.2022.845835] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/30/2021] [Accepted: 01/24/2022] [Indexed: 05/17/2023]
Abstract
DNA N6-Methyladenine (6mA) is a common epigenetic modification, which plays significant roles in the growth and development of plants. It is crucial to identify 6mA sites for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites of plants. Firstly, DNA sequences were coded into six feature vectors with diverse strategies based on density, physicochemical properties, and position of nucleotides, respectively. To find the best coding strategy, the feature vectors were compared on several machine learning classifiers. The results suggested that the position of nucleotides has a significant positive effect on 6mA site identification. Thus, the dinucleotide one-hot strategy, which describes the position characteristics of nucleotides well, was employed to extract DNA features in our method. Secondly, DNA sequences of Rosaceae were divided randomly into a training dataset and a test dataset. Finally, i6mA-vote was constructed by combining five different base-classifiers under a majority voting strategy and trained on the Rosaceae training dataset. The i6mA-vote was evaluated on the task of predicting 6mA sites from the genomes of Rosaceae, Rice, and Arabidopsis separately. In Rosaceae, the performances of i6mA-vote were 0.955 on accuracy (ACC), 0.909 on Matthews correlation coefficient (MCC), 0.955 on sensitivity (SN), and 0.954 on specificity (SP). Those indicators, in the order of ACC, MCC, SN, SP, were 0.882, 0.774, 0.961, and 0.803 on Rice while they were 0.798, 0.617, 0.666, and 0.929 on Arabidopsis. According to the indicators, our method was effective and better than other concerned methods. The results also illustrated that i6mA-vote performs well in 6mA site prediction not only within species but also across plant species. Moreover, it can be seen that the specificity is distinctly lower than the sensitivity in Rice while it is just the opposite in Arabidopsis. This may result from sequence similarity among Rosaceae, Rice and Arabidopsis.
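The majority voting step and the four reported indicators (ACC, MCC, SN, SP) can be sketched as follows. This is a generic illustration, not the i6mA-vote implementation: the function names, the toy predictions, and the toy labels are all hypothetical, and the five base classifiers are represented only by their 0/1 outputs.

```python
import math


def majority_vote(predictions_per_classifier):
    """Combine 0/1 predictions from several classifiers; a sample is 1 if more than half vote 1."""
    n_classifiers = len(predictions_per_classifier)
    n_samples = len(predictions_per_classifier[0])
    return [
        1 if 2 * sum(p[i] for p in predictions_per_classifier) > n_classifiers else 0
        for i in range(n_samples)
    ]


def binary_metrics(y_true, y_pred):
    """Compute ACC, MCC, SN (sensitivity), and SP (specificity) from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / len(y_true),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "SN": tp / (tp + fn) if (tp + fn) else 0.0,
        "SP": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy data (hypothetical): five base classifiers, four samples.
votes = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
]
combined = majority_vote(votes)
metrics = binary_metrics([1, 1, 0, 0], combined)
```

Using an odd number of base classifiers, as in the abstract's five, avoids ties in the vote.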
Affiliation(s)
- Zhixia Teng
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Zhengnan Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Yanjuan Li
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Zhen Tian
- College of Information Engineering, Zhengzhou University, Zhengzhou, China
- Maozu Guo
- College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- *Correspondence: Qianzi Lu,
- Guohua Wang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Guohua Wang,
|
19
|
Rehman MU, Tayara H, Chong KT. DCNN-4mC: Densely connected neural network based N4-methylcytosine site prediction in multiple species. Comput Struct Biotechnol J 2021; 19:6009-6019. [PMID: 34849205 PMCID: PMC8605313 DOI: 10.1016/j.csbj.2021.10.034] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/28/2021] [Revised: 10/27/2021] [Accepted: 10/28/2021] [Indexed: 01/17/2023] Open
Abstract
DNA N4-methylcytosine (4mC), a significant genetic modification, holds a dominant role in controlling different biological functions, i.e., DNA replication, DNA repair, gene regulation and gene expression levels. The identification of 4mC sites is important for gaining insight into different organic mechanisms. However, obtaining modification predictions from experimental methods is a challenging task because the techniques are expensive and time-consuming. Therefore, computational tools can be a great option for modification identification. Various computational tools have been proposed in the literature, but their generalization and prediction performance require improvement. For this reason, we have proposed a neural network based tool named DCNN-4mC for identifying 4mC sites. The proposed model involves a set of neural network layers with a skip connection, which allows the shallow features to be shared with the dense layers. The skip connection has made it possible to gather crucial information regarding 4mC sites. In the literature, different models are employed on different species; hence, in many cases different datasets are available for a single species. In this research, we have combined all available datasets to create a single benchmark dataset for every species. To the best of our knowledge, no model in the literature is employed on more than six different species. To ensure the generalizability of DCNN-4mC, we have used 12 different species for performance evaluation. The DCNN-4mC tool has attained 2% to 14% higher accuracy than state-of-the-art tools on all available datasets of different species. Furthermore, independent test datasets were also used, and DCNN-4mC has overall yielded high performance on them as well.
Affiliation(s)
- Mobeen Ur Rehman
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Department of Avionics Engineering, Air University, Islamabad 44000, Pakistan
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Corresponding author at: School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea (Hilal Tayara); Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea. (Kil To Chong)
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
- Corresponding author at: School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea (Hilal Tayara); Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea. (Kil To Chong)
|
20
|
Ao C, Gao L, Yu L. Research progress in predicting DNA methylation modifications and the relation with human diseases. Curr Med Chem 2021; 29:822-836. [PMID: 34533438 DOI: 10.2174/0929867328666210917115733] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/24/2021] [Revised: 07/05/2021] [Accepted: 07/11/2021] [Indexed: 11/22/2022]
Abstract
DNA methylation is an important mode of regulation in epigenetic mechanisms, and it is one of the research foci in the field of epigenetics. DNA methylation modification affects a series of biological processes, such as eukaryotic cell growth, differentiation and transformation mechanisms, by regulating gene expression. In this review, we systematically summarized the DNA methylation databases, prediction tools for DNA methylation modification, machine learning algorithms for predicting DNA methylation modification, and the relationship between DNA methylation modification and diseases such as hypertension, Alzheimer's disease, diabetic nephropathy, and cancer. An in-depth understanding of DNA methylation mechanisms can promote accurate prediction of DNA methylation modifications and the treatment and diagnosis of related diseases.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
|
21
|
i4mC-EL: Identifying DNA N4-Methylcytosine Sites in the Mouse Genome Using Ensemble Learning. BIOMED RESEARCH INTERNATIONAL 2021; 2021:5515342. [PMID: 34159192 PMCID: PMC8187051 DOI: 10.1155/2021/5515342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/09/2021] [Accepted: 05/21/2021] [Indexed: 12/03/2022]
Abstract
As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) plays a crucial role in controlling gene replication, expression, cell cycle, DNA replication, and differentiation. The accurate identification of 4mC sites is necessary to understand biological functions. In the paper, we use ensemble learning to develop a model named i4mC-EL to identify 4mC sites in the mouse genome. Firstly, a multifeature encoding scheme consisting of Kmer and EIIP was adopted to describe the DNA sequences. Secondly, on the basis of the multifeature encoding scheme, we developed a stacked ensemble model, in which four machine learning algorithms, namely, BayesNet, NaiveBayes, LibSVM, and Voted Perceptron, were utilized to implement an ensemble of base classifiers that produce intermediate results as input of the metaclassifier, Logistic. The experimental results on the independent test dataset demonstrate that the overall predictive accuracy of i4mC-EL is 82.19%, which is better than the existing methods. The user-friendly website implementing i4mC-EL can be accessed freely at the following.
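A multifeature encoding of the kind named here (Kmer plus EIIP) might look like the sketch below. The abstract does not give the exact scheme, so this is an assumption-laden illustration: the function names are hypothetical, k = 2 is chosen arbitrarily, and the two feature lists are simply concatenated. The EIIP values are the commonly cited electron-ion interaction pseudopotentials for the four nucleotides.

```python
from itertools import product

# Commonly cited EIIP values per nucleotide; their use here is illustrative only.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def kmer_frequencies(sequence, k=2):
    """Return normalized k-mer frequencies in lexicographic order (AA, AC, ..., TT for k=2)."""
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(sequence) - k + 1  # number of sliding windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = sequence[i:i + k]
        if window in counts:  # windows with non-ACGT symbols are skipped
            counts[window] += 1
    return [counts[m] / total for m in kmers]


def eiip_encode(sequence):
    """Map each nucleotide to its EIIP value; unknown symbols map to 0.0 (an assumption)."""
    return [EIIP.get(base, 0.0) for base in sequence.upper()]


# Hypothetical combined feature vector for a toy sequence.
features = kmer_frequencies("ACGTAC", k=2) + eiip_encode("ACGTAC")
```

The resulting vector would then be passed to the base classifiers of a stacked ensemble; how the cited work actually combines the two encodings is not stated in the abstract.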
|
22
|
iRG-4mC: Neural Network Based Tool for Identification of DNA 4mC Sites in Rosaceae Genome. Symmetry (Basel) 2021. [DOI: 10.3390/sym13050899] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022] Open
Abstract
DNA N4-Methylcytosine is a genetic modification process which has an essential role in changing different biological processes such as DNA conformation, DNA replication, DNA stability, cell development and structural alteration in DNA. Due to its negative effects, it is important to identify the modified 4mC sites. Further, methylcytosine may develop at any cytosine residue; however, clonal gene expression patterns are most likely transmitted just for cytosine residues in strand-symmetrical sequences. For this reason, many different experiments have been introduced, but they proved not to be a viable choice due to time limitations and high expenses. Therefore, there is still a need for an efficient computational method for 4mC site identification. With this in mind, in this research we have proposed an efficient model for the Fragaria vesca (F. vesca) and Rosa chinensis (R. chinensis) genomes. The proposed iRG-4mC tool is developed based on a neural network architecture with two encoding schemes to identify the 4mC sites. The iRG-4mC predictor outperformed the existing state-of-the-art computational model by an accuracy difference of 9.95% on F. vesca (training dataset), 8.7% on R. chinensis (training dataset), 6.2% on F. vesca (independent dataset) and 10.6% on R. chinensis (independent dataset). We have also established a webserver which is freely accessible for the research community.
|
23
|
Rahman CR, Amin R, Shatabda S, Toaha MSI. A convolution based computational approach towards DNA N6-methyladenine site identification and motif extraction in rice genome. Sci Rep 2021; 11:10357. [PMID: 33990665 PMCID: PMC8121938 DOI: 10.1038/s41598-021-89850-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/07/2020] [Accepted: 05/04/2021] [Indexed: 12/23/2022] Open
Abstract
DNA N6-methylation (6mA) in the adenine nucleotide is a post-replication modification responsible for many biological functions. Automated and accurate computational methods can help to identify 6mA sites in long genomes, saving significant time and money. Our study develops a convolutional neural network (CNN) based tool, i6mA-CNN, capable of identifying 6mA sites in the rice genome. Our model coordinates among multiple types of features such as a PseAAC (Pseudo Amino Acid Composition) inspired customized feature vector, multiple one-hot representations and dinucleotide physicochemical properties. It achieves an auROC (area under Receiver Operating Characteristic curve) score of 0.98 with an overall accuracy of 93.97% using fivefold cross validation on the benchmark dataset. Finally, we evaluate our model on three other plant genome 6mA site identification test datasets. Results suggest that our proposed tool is able to generalize its ability of 6mA site identification on plant genomes irrespective of plant species. An algorithm for potential motif extraction and a feature importance analysis procedure are two by-products of this research. The web tool for this research can be found at: https://cutt.ly/dgp3QTR.
Affiliation(s)
- Ruhul Amin
- United International University, Dhaka, Bangladesh
|
24
|
Zeng R, Cheng S, Liao M. 4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism. Front Cell Dev Biol 2021; 9:664669. [PMID: 34041243 PMCID: PMC8141656 DOI: 10.3389/fcell.2021.664669] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/05/2021] [Accepted: 03/17/2021] [Indexed: 01/10/2023] Open
Abstract
DNA methylation is one of the most extensive epigenetic modifications. DNA 4mC modification plays a key role in regulating chromatin structure and gene expression. In this study, we proposed a generic 4mC computational predictor, namely, 4mCPred-MTL using multi-task learning coupled with Transformer to predict 4mC sites in multiple species. In this predictor, we utilize a multi-task learning framework, in which each task is to train species-specific data based on Transformer. Extensive experimental results show that our multi-task predictive model can significantly improve the performance of the model based on single task and outperform existing methods on benchmarking comparison. Moreover, we found that our model can sufficiently capture better characteristics of 4mC sites as compared to existing commonly used feature descriptors, demonstrating the strong feature learning ability of our model. Therefore, based on the above results, it can be expected that our 4mCPred-MTL can be a useful tool for research communities of interest.
Affiliation(s)
- Rao Zeng
- Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
- Song Cheng
- Department of Thoracic Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Minghong Liao
- Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
|
25
|
Chachar S, Liu J, Zhang P, Riaz A, Guan C, Liu S. Harnessing Current Knowledge of DNA N6-Methyladenosine From Model Plants for Non-model Crops. Front Genet 2021; 12:668317. [PMID: 33995495 PMCID: PMC8118384 DOI: 10.3389/fgene.2021.668317] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/18/2021] [Accepted: 04/06/2021] [Indexed: 12/12/2022] Open
Abstract
Epigenetic modifications alter gene activity and function by changing the chromosomal architecture through DNA methylation/demethylation or histone modifications, without causing any change in the DNA sequence. In plants, DNA cytosine methylation (5mC) is vital for various pathways such as gene regulation, transposon suppression, DNA repair, replication, transcription, and recombination. Thanks to recent advances in high throughput sequencing (HTS) technologies for epigenomic “Big Data” generation, accumulated studies have revealed the occurrence of another novel DNA methylation mark, N6-methyladenosine (6mA), which is highly present on gene bodies and mainly activates gene expression in model plants such as the eudicot Arabidopsis (Arabidopsis thaliana) and the monocot rice (Oryza sativa). However, in non-model crops, the occurrence and importance of 6mA remain largely unknown, with only limited reports in a few species, such as Rosaceae (wild strawberry) and soybean (Glycine max). Given the aforementioned vital roles of 6mA in plants, we summarize the latest advances in DNA 6mA modification and investigate the historical, known and vital functions of 6mA in plants. We also consider advanced artificial-intelligence biotechnologies that improve extraction and prediction of 6mA concepts. In this Review, we discuss the potential challenges that may hinder exploitation of 6mA, and give future goals of 6mA from model plants to non-model crops.
Affiliation(s)
- Sadaruddin Chachar
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Horticulture, Northwest A&F University, Yangling, China; Department of Biotechnology, Faculty of Crop Production, Sindh Agriculture University, Tandojam, Pakistan
- Jingrong Liu
- College of Mathematics and Statistics, Northwest Normal University, Lanzhou, China
- Pingxian Zhang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Adeel Riaz
- Department of Biochemistry, Faculty of Life Sciences, University of Okara, Okara, Pakistan
- Changfei Guan
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Horticulture, Northwest A&F University, Yangling, China
- Shuyuan Liu
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Horticulture, Northwest A&F University, Yangling, China
|
26
|
Li M, Xiao Y, Mount S, Liu Z. An Atlas of Genomic Resources for Studying Rosaceae Fruits and Ornamentals. FRONTIERS IN PLANT SCIENCE 2021; 12:644881. [PMID: 33868343 PMCID: PMC8047320 DOI: 10.3389/fpls.2021.644881] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/22/2020] [Accepted: 02/22/2021] [Indexed: 05/12/2023]
Abstract
Rosaceae, a large plant family of more than 3,000 species, consists of many economically important fruit and ornamental crops, including peach, apple, strawberry, raspberry, cherry, and rose. These horticultural crops are not only important economic drivers in many regions of the world, but also major sources of human nutrition. Additionally, due to the diversity of fruit types in Rosaceae, this plant family offers excellent opportunities for investigations into fleshy fruit diversity, evolution, and development. With the development of high-throughput sequencing technologies and computational tools, an increasing number of high-quality genomes and transcriptomes of Rosaceae species have become available and will greatly facilitate Rosaceae research and breeding. This review summarizes major genomic resources and genome research progress in Rosaceae, highlights important databases, and suggests areas for further improvement. The availability of these big data resources will greatly accelerate research progress and enhance the agricultural productivity of Rosaceae.
Affiliation(s)
- Zhongchi Liu
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, United States
|
27
|
Khanal J, Tayara H, Zou Q, Chong KT. Identifying DNA N4-methylcytosine sites in the rosaceae genome with a deep learning model relying on distributed feature representation. Comput Struct Biotechnol J 2021; 19:1612-1619. [PMID: 33868598 PMCID: PMC8042287 DOI: 10.1016/j.csbj.2021.03.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/09/2021] [Revised: 03/12/2021] [Accepted: 03/13/2021] [Indexed: 12/11/2022] Open
Abstract
DNA N4-methylcytosine (4mC), an epigenetic modification found in prokaryotic and eukaryotic species, is involved in numerous biological functions, including host defense, transcription regulation, gene expression, and DNA replication. To identify 4mC sites, previous computational studies mostly focused on finding hand-crafted features. This area of research, therefore, would benefit from the development of a computational approach that relies on automatic feature selection to identify relevant sites. We here report 4mC-w2vec, a computational method that learned automatic feature discrimination in the Rosaceae genomes, especially in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca), based on distributed feature representation and through the word embedding technique ‘word2vec’. While a few bioinformatics tools are currently employed to identify 4mC sites in these genomes, their prediction performance is inadequate. Our system processed 4mC and non-4mC sites through a word embedding process, including sub-word information of its biological words through k-mer, which then served as features that were fed into a double layer of convolutional neural network (CNN) to classify whether the sample sequences contained 4mCs or non-4mCs sites. Our tool demonstrated performance superior to current tools that use the same genomic datasets. Additionally, 4mC-w2vec is effective for balanced and imbalanced class datasets alike, and the online web-server is currently available at: http://nsclbio.jbnu.ac.kr/tools/4mC-w2vec/.
Affiliation(s)
- Jhabindra Khanal
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
| | - Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
| |
|
28
|
Lombó M, Herráez P. The effects of endocrine disruptors on the male germline: an intergenerational health risk. Biol Rev Camb Philos Soc 2021; 96:1243-1262. [PMID: 33660399 DOI: 10.1111/brv.12701] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/05/2020] [Revised: 02/17/2021] [Accepted: 02/19/2021] [Indexed: 12/22/2022]
Abstract
Environmental pollution is becoming one of the major concerns of society. Among the emerging contaminants, endocrine-disrupting chemicals (EDCs), a large group of toxicants, have been the subject of many scientific studies. Besides the capacity of these compounds to interfere with the endocrine system, they have also been reported to exert both genotoxic and epigenotoxic effects. Given that spermatogenesis is a coordinated process that requires the involvement of several steroid hormones and that entails deep changes in the chromatin, such as DNA compaction and epigenetic remodelling, it could be affected by male exposure to EDCs. A great deal of evidence highlights that these compounds have detrimental effects on male reproductive health, including alterations to sperm motility, sexual function, and gonad development. This review focuses on the consequences of paternal exposure to such chemicals for future generations, which still remain poorly known. Historically, spermatozoa have long been considered as mere vectors delivering the paternal haploid genome to the oocyte. Only recently have they been understood to harbour genetic and epigenetic information that plays a remarkable role during offspring early development and long-term health. This review examines the different modes of action by which the spermatozoa represent a key target for EDCs, and analyses the consequences of environmentally induced changes in sperm genetic and epigenetic information for subsequent generations.
Affiliation(s)
- Marta Lombó
- Department of Animal Reproduction, INIA, Puerta de Hierro 18, Madrid, 28040, Spain
| | - Paz Herráez
- Department of Molecular Biology, Faculty of Biology, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
| |
|
29
|
Li Z, Jiang H, Kong L, Chen Y, Lang K, Fan X, Zhang L, Pian C. Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species. PLoS Comput Biol 2021; 17:e1008767. [PMID: 33600435 PMCID: PMC7924747 DOI: 10.1371/journal.pcbi.1008767] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/11/2020] [Revised: 03/02/2021] [Accepted: 02/03/2021] [Indexed: 12/25/2022] Open
Abstract
N6-methyladenine (6mA) is an important DNA modification form associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites are cost-ineffective, which implies a great need for new computational methods for this problem. In this paper, we developed, without requiring any prior knowledge of 6mA and manually crafted sequence features, a deep learning framework named Deep6mA to identify DNA 6mA sites, and its performance is superior to other DNA 6mA prediction tools. Specifically, the 5-fold cross-validation on a benchmark dataset of rice gives the sensitivity and specificity of Deep6mA as 92.96% and 95.06%, respectively, and the overall prediction accuracy is 94%. Importantly, we find that the sequences with 6mA sites share similar patterns across different species. The model trained with rice data predicts well the 6mA sites of three other species: Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which means the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main source of its regulating downstream gene expression.
DNA N6 methyladenine (6mA) is a newly recognized methylation modification in eukaryotes. It exists widely and conservatively in organisms, and its modification level changes dynamically in the whole life cycle. This study proposes an algorithm based on a deep learning framework including LSTM and CNN to predict 6mA sites. The results showed that our method could accurately predict the 6mA sites in different species, which means DNA sub-sequences containing 6mA sites among species have certain conservation. Importantly, we found that 6mA methylation in most different species is more likely to occur on the GAGG motif. In addition, we also found that 6mA is rich in the promoter’s TATA box, which may be a mechanism of regulating downstream gene expression.
Affiliation(s)
- Zutan Li
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
| | - Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou, China
| | - Lingpeng Kong
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
| | - Yuanyuan Chen
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
| | - Kun Lang
- College of Information Science & Technology, Nanjing Agricultural University, Nanjing, China
| | - Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Liangyun Zhang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
- * E-mail: (LYZ); (CP)
| | - Cong Pian
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
- * E-mail: (LYZ); (CP)
| |
|
30
|
Hasan MM, Shoombuatong W, Kurata H, Manavalan B. Critical evaluation of web-based DNA N6-methyladenine site prediction tools. Brief Funct Genomics 2021; 20:258-272. [PMID: 33491072 DOI: 10.1093/bfgp/elaa028] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/31/2020] [Revised: 12/11/2020] [Accepted: 12/15/2020] [Indexed: 12/13/2022] Open
Abstract
Methylation of DNA N6-methyladenosine (6mA) is a type of epigenetic modification that plays pivotal roles in various biological processes. The accurate genome-wide identification of 6mA is a challenging task that leads to understanding the biological functions. For the last 5 years, a number of bioinformatics approaches and tools for 6mA site prediction have been established, and some of them are easily accessible as web applications. Nevertheless, the accurate genome-wide identification of 6mA is still one of the challenging tasks on the way to understanding its biological functions. Especially in practical applications, these tools have implemented diverse encoding schemes, machine learning algorithms and feature selection methods, whereas few systematic performance comparisons of 6mA site predictors have been reported. In this review, 11 publicly available 6mA predictors were evaluated with seven different species-specific datasets (Arabidopsis thaliana, Tolypocladium, Diospyros lotus, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans and Escherichia coli). Of those, few species are close homologs, and the remaining datasets are distant sequences. Our independent validation tests demonstrated that Meta-i6mA and MM-6mAPred models for A. thaliana, Tolypocladium, S. cerevisiae and D. melanogaster achieved excellent overall performance when compared with their counterparts. However, none of the existing methods were suitable for E. coli, C. elegans and D. lotus. The feasibility of the existing predictors is also discussed for the seven species. Our evaluation provides useful guidelines for the development of 6mA site predictors and helps biologists select suitable prediction tools.
Affiliation(s)
| | - Watshara Shoombuatong
- Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics in the Kyushu Institute of Technology, Japan
| | | |
|
31
|
Khanal J, Lim DY, Tayara H, Chong KT. i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome. Genomics 2020; 113:582-592. [PMID: 33010390 DOI: 10.1016/j.ygeno.2020.09.054] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 08/08/2020] [Revised: 09/09/2020] [Accepted: 09/23/2020] [Indexed: 01/09/2023]
Abstract
DNA N6-methyladenine (6mA) is an epigenetic modification that plays a vital role in a variety of cellular processes in both eukaryotes and prokaryotes. Accurate information on 6mA sites in the Rosaceae genome may assist in understanding genomic 6mA distributions and various biological functions such as epigenetic inheritance. Various studies have shown the possibility of identifying 6mA sites through experiments, but the procedures are time-consuming and costly. To overcome the drawbacks of experimental methods, we propose an accurate computational paradigm based on a machine learning (ML) technique to identify 6mA sites in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca). To improve the performance of the proposed model and to avoid overfitting, a recursive feature elimination with cross-validation (RFECV) strategy is used to extract the optimal number of features (ONF) subset from five different DNA sequence encoding schemes, i.e., Binary Encoding (BE), Ring-Function-Hydrogen-Chemical Properties (RFHC), Electron-Ion-Interaction Pseudo Potentials of Nucleotides (EIIP), Dinucleotide Physicochemical Properties (DPCP), and Trinucleotide Physicochemical Properties (TPCP). Subsequently, we use the ONF subset to train a two-layer ML-based stacking model to create a bioinformatics tool named 'i6mA-stack'. This tool outperforms its peer tool in general and is currently available at http://nsclbio.jbnu.ac.kr/tools/i6mA-stack/.
Affiliation(s)
- Jhabindra Khanal
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
| | - Dae Young Lim
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
| | - Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea.
| | - Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea.
| |
|
32
|
Manavalan B, Hasan MM, Basith S, Gosu V, Shin TH, Lee G. Empirical Comparison and Analysis of Web-Based DNA N4-Methylcytosine Site Prediction Tools. Mol Ther Nucleic Acids 2020; 22:406-420. [PMID: 33230445 PMCID: PMC7533314 DOI: 10.1016/j.omtn.2020.09.010] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/16/2020] [Accepted: 09/11/2020] [Indexed: 12/12/2022]
Abstract
DNA N4-methylcytosine (4mC) is a crucial epigenetic modification involved in various biological processes. Accurate genome-wide identification of these sites is critical for improving our understanding of their biological functions and mechanisms. As experimental methods for 4mC identification are tedious, expensive, and labor-intensive, several machine learning-based approaches have been developed for genome-wide detection of such sites in multiple species. However, the predictions projected by these tools are difficult to quantify and compare. To date, no systematic performance comparison of 4mC tools has been reported. The aim of this study was to compare and critically evaluate 12 publicly available 4mC site prediction tools according to species specificity, based on a huge independent validation dataset. The tools 4mCCNN (Escherichia coli), DNA4mC-LIP (Arabidopsis thaliana), iDNA-MS (Fragaria vesca), DNA4mC-LIP and 4mCCNN (Drosophila melanogaster), and four tools for Caenorhabditis elegans achieved excellent overall performance compared with their counterparts. However, none of the existing methods was suitable for Geoalkalibacter subterraneus, Geobacter pickeringii, and Mus musculus, thereby limiting their practical applicability. Model transferability to five species and non-transferability to three species are also discussed. The presented evaluation will assist researchers in selecting appropriate prediction tools that best suit their purpose and provide useful guidelines for the development of improved 4mC predictors in the future.
Affiliation(s)
- Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
| | - Vijayakumar Gosu
- Department of Animal Biotechnology, Jeonbuk National University, Jeonju 54896, Republic of Korea
| | - Tae-Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea; Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
| |
|
33
|
Hasan MM, Basith S, Khatun MS, Lee G, Manavalan B, Kurata H. Meta-i6mA: an interspecies predictor for identifying DNA N6-methyladenine sites of plant genomes by exploiting informative features in an integrative machine-learning framework. Brief Bioinform 2020; 22:5903398. [PMID: 32910169 DOI: 10.1093/bib/bbaa202] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Received: 06/11/2020] [Revised: 08/06/2020] [Accepted: 08/06/2020] [Indexed: 12/13/2022] Open
Abstract
DNA N6-methyladenine (6mA) represents important epigenetic modifications, which are responsible for various cellular processes. The accurate identification of 6mA sites is one of the challenging tasks in genome analysis, which leads to an understanding of their biological functions. To date, several species-specific machine learning (ML)-based models have been proposed, but the majority of them were not tested on other species. Hence, their practical application to other plant species is quite limited. In this study, we explored 10 different feature encoding schemes, with the goal of capturing key characteristics around 6mA sites. We selected five feature encoding schemes based on physicochemical and position-specific information that possesses high discriminative capability. The resultant feature sets were inputted to six commonly used ML methods (random forest, support vector machine, extremely randomized tree, logistic regression, naïve Bayes and AdaBoost). The Rosaceae genome was employed to train the above classifiers, which generated 30 baseline models. To integrate their individual strengths, Meta-i6mA was proposed, which combined the baseline models using the meta-predictor approach. In extensive independent tests, Meta-i6mA showed high Matthews correlation coefficient values of 0.918, 0.827 and 0.635 on Rosaceae, rice and Arabidopsis thaliana, respectively, and outperformed the existing predictors. We anticipate that Meta-i6mA can be applied across different plant species. Furthermore, we developed an online user-friendly web server, which is available at http://kurata14.bio.kyutech.ac.jp/Meta-i6mA/.
Affiliation(s)
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Republic of Korea
| | - Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Japan
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Republic of Korea
| | | | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics in the Kyushu Institute of Technology, Japan
| |
|
34
|
Yuan DH, Xing JF, Luan MW, Ji KK, Guo J, Xie SQ, Zhang YM. DNA N6-Methyladenine Modification in Wild and Cultivated Soybeans Reveals Different Patterns in Nucleus and Cytoplasm. Front Genet 2020; 11:736. [PMID: 32849778 PMCID: PMC7398112 DOI: 10.3389/fgene.2020.00736] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/01/2019] [Accepted: 06/18/2020] [Indexed: 01/16/2023] Open
Abstract
DNA 6mA modification, an important newly discovered epigenetic mark, plays a crucial role in organisms and has been attracting more and more attention in recent years. The soybean is economically the most important bean in the world, providing vegetable protein for millions of people. However, the distribution pattern and function of 6mA in soybean are still unknown. In this study, we decoded 6mA modification to single-nucleotide resolution in wild and cultivated soybeans, and compared the 6mA differences between cytoplasmic and nuclear genomes and between wild and cultivated soybeans. The motif of 6mA in the nuclear genome was conserved across the two kinds of soybeans, and ANHGA was the most dominant motif in wild and cultivated soybeans. Genes with 6mA modification in the nucleus had higher expression than those without modification. Interestingly, 6mA distribution patterns in the cytoplasm of each soybean were significantly different from those in the nucleus, which was reported for the first time in soybean. Our research provides new insight into the in-depth analysis of cytoplasmic genomic DNA modification in plants.
Affiliation(s)
- De-Hui Yuan
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Jian-Feng Xing
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Mei-Wei Luan
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Kai-Kai Ji
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Jun Guo
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Shang-Qian Xie
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Yuan-Ming Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| |
|
35
|
Xu H, Jia P, Zhao Z. Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning. Brief Bioinform 2020; 22:5856341. [PMID: 32578842 DOI: 10.1093/bib/bbaa099] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 02/24/2020] [Revised: 04/16/2020] [Accepted: 05/02/2020] [Indexed: 12/11/2022] Open
Abstract
DNA N4-methylcytosine (4mC) modification represents a novel epigenetic regulation. It is involved in various cellular processes, including DNA replication, cell cycle and gene expression, among others. In addition to experimental identification of 4mC sites, in silico prediction of 4mC sites in the genome has emerged as an alternative and promising approach. In this study, we first reviewed the current progress in the computational prediction of 4mC sites and systematically evaluated the predictive capacity of eight conventional machine learning algorithms as well as 12 feature types commonly used in previous studies in six species. Using a representative benchmark dataset, we investigated the contribution of feature selection and stacking approach to the model construction, and found that feature optimization and proper reinforcement learning could improve the performance. We next recollected newly added 4mC sites in the six species' genomes and developed a novel deep learning-based 4mC site predictor, namely Deep4mC. Deep4mC applies convolutional neural networks with four representative features. For species with small numbers of samples, we extended our deep learning framework with a bootstrapping method. Our evaluation indicated that Deep4mC could obtain high accuracy and robust performance with the average area under curve (AUC) values greater than 0.9 in all species (range: 0.9005-0.9722). In comparison, Deep4mC achieved an AUC value improvement from 10.14 to 46.21% when compared to previous tools in these six species. A user-friendly web server (https://bioinfo.uth.edu/Deep4mC) was built for predicting putative 4mC sites in a genome.
Affiliation(s)
- Haodong Xu
- Center for Precision Health, School of Biomedical Informatics
| | - Peilin Jia
- Center for Precision Health, School of Biomedical Informatics
| | - Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics
| |
|
36
|
Zheng X, Shi M, Wang J, Yang N, Wang K, Xi J, Wu C, Xi T, Zheng J, Zhang J. Isoform Sequencing Provides Insight Into Freezing Response of Common Wheat (Triticum aestivum L.). Front Genet 2020; 11:462. [PMID: 32595694 PMCID: PMC7300213 DOI: 10.3389/fgene.2020.00462] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/27/2019] [Accepted: 04/14/2020] [Indexed: 12/12/2022] Open
Abstract
The objective of the study is to reveal the freezing tolerance mechanisms of wheat by combining the emerging single-molecule real-time (SMRT) sequencing technology PacBio Sequel and Illumina sequencing. Commercial semiwinter wheat Zhoumai 18 was exposed to -6°C for 4 h at the four-leaf stage. Leaves of the control group and freezing-treated group were used to perform cDNA library construction. PacBio SMRT sequencing yielded 51,570 high-quality isoforms from leaves of the control sample of Zhoumai 18, encoded by 20,366 gene loci. In total, 73,695 transcript isoforms, corresponding to 23,039 genes, were identified from the freezing-treated leaves. Compared with transcripts from the International Wheat Genome Sequencing Consortium RefSeq v1.1, 57,667 novel isoforms were discovered, which were annotated to 21,672 known gene loci, as well as 3,399 novel gene loci. Transcriptome characteristics including alternative splicing events, alternative polyadenylation sites, transcription factors, and fusion transcripts were also analyzed. Freezing-responsive genes and signals were uncovered and proved that the ICE-ERF-COR pathway and ABA signal transduction play a vital role in the freezing response of wheat. In this study, PacBio sequencing and Illumina sequencing were applied to investigate the freezing tolerance in common wheat, and the transcriptome results provide insights into the molecular regulation mechanisms under freezing treatment.
Affiliation(s)
- Xingwei Zheng
- Institute of Cotton Research, Shanxi Agricultural University, Yuncheng, China
| | - Mengmeng Shi
- Institute of Wheat Research, Shanxi Agricultural University, Linfen, China
| | - Jian Wang
- Institute of Cotton Research, Shanxi Agricultural University, Yuncheng, China
| | - Na Yang
- Institute of Cotton Research, Shanxi Agricultural University, Yuncheng, China
| | - Ke Wang
- Institute of Cotton Research, Shanxi Agricultural University, Yuncheng, China
| | - Jilong Xi
- Institute of Cotton Research, Shanxi Agricultural University, Yuncheng, China
| | - Caixia Wu
- Institute of Wheat Research, Shanxi Agricultural University, Linfen, China
| | - Tianyuan Xi
- Institute of Cotton Research, Shanxi Agricultural University, Yuncheng, China
| | - Jun Zheng
- Institute of Wheat Research, Shanxi Agricultural University, Linfen, China
| | - Jiancheng Zhang
- Institute of Cotton Research, Shanxi Agricultural University, Yuncheng, China
| |
|
37
|
Hasan MM, Manavalan B, Shoombuatong W, Khatun MS, Kurata H. i6mA-Fuse: improved and robust prediction of DNA 6mA sites in the Rosaceae genome by fusing multiple feature representation. Plant Mol Biol 2020; 103:225-234. [PMID: 32140819 DOI: 10.1007/s11103-020-00988-y] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 02/06/2020] [Accepted: 02/29/2020] [Indexed: 05/28/2023]
Abstract
DNA N6-methyladenine (6mA) is one of the most vital epigenetic modifications and is involved in controlling the expression levels of various genes. With the avalanche of DNA sequences generated in numerous databases, the accurate identification of 6mA plays an essential role in understanding molecular mechanisms. Because the experimental approaches are time-consuming and costly, it is desirable to develop a computational model for rapidly and accurately identifying 6mA. To the best of our knowledge, we first proposed a computational model named i6mA-Fuse to predict 6mA sites from the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca. We implemented five encoding schemes, i.e., mononucleotide binary, dinucleotide binary, k-space spectral nucleotide, k-mer, and electron-ion interaction pseudo potential compositions, to build five single-encoding random forest (RF) models. The i6mA-Fuse uses a linear regression model to combine the predicted probability scores of the five single-encoding RF models. The resultant species-specific i6mA-Fuse achieved remarkably high performances with AUCs of 0.982 and 0.978 and with MCCs of 0.869 and 0.858 on the independent datasets of Rosa chinensis and Fragaria vesca, respectively. In the F. vesca-specific i6mA-Fuse, the MBE and EIIP contributed 75% and 25% of the total prediction; in the R. chinensis-specific i6mA-Fuse, Kmer, MBE, and EIIP contributed 15%, 65%, and 20% of the total prediction. To assist high-throughput prediction for DNA 6mA identification, the i6mA-Fuse is publicly accessible at https://kurata14.bio.kyutech.ac.jp/i6mA-Fuse/.
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
| | | | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
| |
|
38
|
Zeng R, Liao M. Developing a Multi-Layer Deep Learning Based Predictive Model to Identify DNA N4-Methylcytosine Modifications. Front Bioeng Biotechnol 2020; 8:274. [PMID: 32373597 PMCID: PMC7186498 DOI: 10.3389/fbioe.2020.00274] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/15/2020] [Accepted: 03/16/2020] [Indexed: 12/21/2022] Open
Abstract
DNA N4-methylcytosine modification (4mC) plays an essential role in a variety of biological processes. Therefore, accurate identification of the 4mC distribution at genome scale is important for systematically understanding its biological functions. In this study, we present Deep4mcPred, a multi-layer deep learning based predictive model to identify DNA N4-methylcytosine modifications. In this predictor, we for the first time integrate a residual network and a recurrent neural network to build a multi-layer deep learning predictive system. As compared to existing predictors using traditional machine learning, our proposed method has two advantages. First, our deep learning framework does not need to specify the features when training the predictive model. It can automatically learn the high-level features and capture the characteristic specificity of 4mC sites, which helps to distinguish true 4mC sites from non-4mC sites. Second, our deep learning method outperforms the traditional machine learning predictors in benchmarking comparisons, demonstrating that the proposed Deep4mcPred is more effective in DNA 4mC site prediction. Moreover, via experimental comparison, we found that the attention mechanism introduced into the deep learning framework is useful for capturing the critical features. Additionally, we developed a webserver implementing the proposed method for academic use by the research community, which is now available at http://server.malab.cn/Deep4mcPred.
Affiliation(s)
- Rao Zeng
- Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
| | - Minghong Liao
- Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
| |
|
39
|
Li Y, Zhang XM, Luan MW, Xing JF, Chen J, Xie SQ. Distribution Patterns of DNA N6-Methyladenosine Modification in Non-coding RNA Genes. Front Genet 2020; 11:268. [PMID: 32265991 PMCID: PMC7105833 DOI: 10.3389/fgene.2020.00268] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/29/2019] [Accepted: 03/05/2020] [Indexed: 01/20/2023] Open
Abstract
N6-methyladenosine (6mA) DNA modification plays an important role in the epigenetic regulation of gene expression. Aberrant expression of non-coding genes, which are important regulatory elements of gene expression, is related to many diseases. However, the distribution and potential functions of 6mA modification in non-coding RNA (ncRNA) genes are still unknown. In this study, we analyzed the 6mA distribution of ncRNA genes and compared it with that of protein-coding genes in four species (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens) using single-molecule real-time (SMRT) sequencing data. The results indicated that the consensus motifs of short nucleotides at 6mA locations were highly conserved in the four species, and that non-coding genes were less likely to be methylated than protein-coding genes. Notably, 6mA-methylated lncRNA genes were expressed at significantly lower levels than genes without methylation in A. thaliana (p = 3.295e-4), D. melanogaster (p = 3.439e-11), and H. sapiens (p = 9.087e-3). The detection and distribution profiling of 6mA modification in ncRNA regions from the four species suggest that 6mA modifications may affect their expression levels.
Affiliation(s)
- Yu Li
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
- Xiao-Ming Zhang
- College of Grassland, Resources and Environment, Inner Mongolia Agricultural University, Huhhot, China
- Mei-Wei Luan
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
- Jian-Feng Xing
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
- Jianguo Chen
- School of Life Sciences, Hubei University, Wuhan, China
- Shang-Qian Xie
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
|
40
|
Karanthamalai J, Chodon A, Chauhan S, Pandi G. DNA N6-Methyladenine Modification in Plant Genomes-A Glimpse into Emerging Epigenetic Code. Plants (Basel) 2020; 9:E247. [PMID: 32075056 PMCID: PMC7076483 DOI: 10.3390/plants9020247] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/20/2020] [Revised: 02/09/2020] [Accepted: 02/11/2020] [Indexed: 02/08/2023]
Abstract
N6-methyladenine (6mA) is a DNA base modification at the N6 position of adenine; it has recently resurfaced as a potentially reversible epigenetic mark in eukaryotes. Although present, 6mA was long considered absent because its levels were undetectable. However, with recent advances in detection methods, a considerable 6mA distribution has been identified across plant genomes. Unlike 5-methylcytosine (5mC) in gene promoters, 6mA does not have a definitive role in repression but appears to have divergent regulatory effects on gene expression. Although little is known about 6mA, the available evidence suggests that it functions in plant development, tissue differentiation, and the regulation of gene expression. This review article emphasizes the research advances in DNA 6mA modifications, their identification, available databases, and analysis tools, as well as the significance of 6mA in plant development and cellular functions and future perspectives for research.
Affiliation(s)
- Gopal Pandi
- Department of Plant Biotechnology, School of Biotechnology, Madurai Kamaraj University, Madurai 625021, Tamil Nadu, India; (J.K.); (A.C.); (S.C.)
|
41
|
Xie SQ, Xing JF, Zhang XM, Liu ZY, Luan MW, Zhu J, Ling P, Xiao CL, Song XQ, Zheng J, Chen Y. N6-Methyladenine DNA Modification in the Woodland Strawberry (Fragaria vesca) Genome Reveals a Positive Relationship With Gene Transcription. Front Genet 2020; 10:1288. [PMID: 31998359 PMCID: PMC6967393 DOI: 10.3389/fgene.2019.01288] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/28/2019] [Accepted: 11/22/2019] [Indexed: 01/24/2023] Open
Abstract
N6-methyladenine (6mA) DNA modification has been detected in several eukaryotic organisms, where it plays important roles in gene regulation and epigenetic memory maintenance. However, the genome-wide distribution patterns and potential functions of 6mA DNA modification in woodland strawberry (Fragaria vesca) remain largely unknown. Here, we examined the 6mA landscape in the F. vesca genome using single-molecule real-time sequencing technology and found that 6mA modification sites were broadly distributed across the woodland strawberry genome. The pattern of 6mA distribution in long non-coding RNAs was significantly different from that in protein-coding genes. The 6mA modification influenced gene transcription and was positively associated with gene expression, which was validated by computational and experimental analyses. Our study provides new insights into DNA methylation in F. vesca.
Affiliation(s)
- Shang-Qian Xie
- Key Laboratory of Ministry of Education for Genetics and Germplasm Innovation of Tropical Special Trees and Ornamental Plants, Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Natural Rubber Cooperative Innovation Centre of Hainan Province & Ministry of Education of China, Hainan University, Haikou, China
- Jian-Feng Xing
- Key Laboratory of Ministry of Education for Genetics and Germplasm Innovation of Tropical Special Trees and Ornamental Plants, Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Natural Rubber Cooperative Innovation Centre of Hainan Province & Ministry of Education of China, Hainan University, Haikou, China
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Xiao-Ming Zhang
- Institute of Wheat Research, Shanxi Academy of Agricultural Sciences, Linfen, China
- Zhao-Yu Liu
- Key Laboratory of Ministry of Education for Genetics and Germplasm Innovation of Tropical Special Trees and Ornamental Plants, Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Natural Rubber Cooperative Innovation Centre of Hainan Province & Ministry of Education of China, Hainan University, Haikou, China
- Mei-Wei Luan
- Key Laboratory of Ministry of Education for Genetics and Germplasm Innovation of Tropical Special Trees and Ornamental Plants, Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Natural Rubber Cooperative Innovation Centre of Hainan Province & Ministry of Education of China, Hainan University, Haikou, China
- Jie Zhu
- Key Laboratory of Ministry of Education for Genetics and Germplasm Innovation of Tropical Special Trees and Ornamental Plants, Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Natural Rubber Cooperative Innovation Centre of Hainan Province & Ministry of Education of China, Hainan University, Haikou, China
- Peng Ling
- Key Laboratory of Ministry of Education for Genetics and Germplasm Innovation of Tropical Special Trees and Ornamental Plants, Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Natural Rubber Cooperative Innovation Centre of Hainan Province & Ministry of Education of China, Hainan University, Haikou, China
- Chuan-Le Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Xi-Qiang Song
- Key Laboratory of Ministry of Education for Genetics and Germplasm Innovation of Tropical Special Trees and Ornamental Plants, Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Natural Rubber Cooperative Innovation Centre of Hainan Province & Ministry of Education of China, Hainan University, Haikou, China
- Jun Zheng
- Institute of Wheat Research, Shanxi Academy of Agricultural Sciences, Linfen, China
- Ying Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
|
42
|
Hasan MM, Manavalan B, Khatun MS, Kurata H. i4mC-ROSE, a bioinformatics tool for the identification of DNA N4-methylcytosine sites in the Rosaceae genome. Int J Biol Macromol 2019; 157:752-758. [PMID: 31805335 DOI: 10.1016/j.ijbiomac.2019.12.009] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 10/22/2019] [Revised: 11/29/2019] [Accepted: 12/02/2019] [Indexed: 12/18/2022]
Abstract
One of the most important epigenetic modifications is N4-methylcytosine, which regulates many biological processes, including DNA replication and chromosome stability. Identification of N4-methylcytosine sites is pivotal for understanding specific biological functions. Herein, we developed the first bioinformatics tool, called i4mC-ROSE, for identifying N4-methylcytosine sites in the genomes of Fragaria vesca and Rosa chinensis in the Rosaceae. It uses a random forest classifier with six encoding methods that cover various aspects of DNA sequence information. The i4mC-ROSE predictor achieves area under the curve scores of 0.883 and 0.889 for the two genomes during cross-validation. Moreover, i4mC-ROSE outperforms the other classifiers tested in this study when objectively evaluated on the independent datasets. The proposed i4mC-ROSE tool can serve users' demand for the prediction of 4mC sites in the Rosaceae genome. The i4mC-ROSE predictor and the datasets used are publicly accessible at http://kurata14.bio.kyutech.ac.jp/i4mC-ROSE/.
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 443380, Republic of Korea
- Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
|
43
|
i6mA-DNCP: Computational Identification of DNA N6-Methyladenine Sites in the Rice Genome Using Optimized Dinucleotide-Based Features. Genes (Basel) 2019; 10:genes10100828. [PMID: 31635172 PMCID: PMC6826501 DOI: 10.3390/genes10100828] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/27/2019] [Revised: 10/16/2019] [Accepted: 10/18/2019] [Indexed: 12/22/2022] Open
Abstract
DNA N6-methyladenine (6mA) plays an important role in regulating the gene expression of eukaryotes. Accurate identification of 6mA sites may assist in understanding genomic 6mA distributions and biological functions. Various experimental methods have been applied to detect 6mA sites genome-wide, but they are time-consuming and expensive, so computational methods that rapidly identify 6mA sites are needed. In this paper, a new machine learning-based method, i6mA-DNCP, is proposed for identifying 6mA sites in the rice genome. Dinucleotide composition and dinucleotide-based DNA properties were first employed to represent DNA sequences. After a specially designed DNA property selection process, a bagging classifier was used to build the prediction model. The jackknife test on a benchmark dataset demonstrated that i6mA-DNCP could obtain 84.43% sensitivity, 88.86% specificity, 86.65% accuracy, a Matthews correlation coefficient (MCC) of 0.734, and an area under the receiver operating characteristic curve (AUC) of 0.926. Moreover, three independent datasets were established to assess the generalization ability of the method. Extensive experiments validated the effectiveness of i6mA-DNCP.
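The dinucleotide composition mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the i6mA-DNCP implementation: the function name `dinucleotide_composition`, the alphabetical ordering of the 16 dinucleotides, and the choice to skip pairs containing non-ACGT characters are all assumptions made for illustration, and the dinucleotide-based DNA properties and property selection step described above are not covered.

```python
from itertools import product

# The 16 possible dinucleotides in a fixed (alphabetical) order: AA, AC, ..., TT.
# This ordering is an illustrative assumption, not taken from the paper.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_composition(seq: str) -> list[float]:
    """Return the frequency of each overlapping dinucleotide in seq.

    Hypothetical helper: pairs containing characters other than A, C, G, T
    (for example N) are skipped and do not count toward the total.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid dinucleotide found; return an all-zero vector.
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


# A toy 6-nt sequence with overlapping pairs AC, CG, GT, TA, AC.
features = dinucleotide_composition("ACGTAC")
```

Each sequence is mapped to a 16-dimensional frequency vector that sums to 1 when at least one valid pair is present; such a vector could then be passed to any standard classifier.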
|