1
Hu F, Gao J, Zheng J, Kwoh C, Jia C. N-GlycoPred: A hybrid deep learning model for accurate identification of N-glycosylation sites. Methods 2024; 227:48-57. [PMID: 38734394 DOI: 10.1016/j.ymeth.2024.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/16/2024] [Accepted: 05/03/2024] [Indexed: 05/13/2024]
Abstract
Studies have shown that protein glycosylation in cells reflects the real-time dynamics of biological processes, and the occurrence and development of many diseases are closely related to protein glycosylation. Abnormal protein glycosylation can serve as a potential diagnostic and prognostic marker of disease, as a therapeutic target, and as a new entry point for exploring pathogenesis. To address the large differences in the prediction results of previous models across species, we constructed N-GlycoPred, a hybrid deep learning model based on dual-layer convolution, a paired attention mechanism and BiLSTM, for accurate identification of N-glycosylation sites. Using one-hot encoding or the AAindex, we selected the optimal combination of features and deep learning framework separately for human and mouse to refine the models. On six independent test datasets, N-GlycoPred achieved an average AUC of 0.9553, which is 0.23% higher than that of MusiteDeep. These comparison results indicate that our model can serve as a powerful tool for biological researchers to prescreen N-glycosylation sites.
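The abstract does not specify N-GlycoPred's input window or padding scheme. As a minimal Python sketch, assuming a symmetric window of hypothetical half-width `w` around each candidate asparagine and a `-` padding symbol, one-hot encoding of a candidate site could look like the following. The AAindex alternative would replace each indicator row with a vector of physicochemical property values. Candidate positions here come from the commonly cited N-X-S/T sequon (X not proline), an illustrative choice rather than N-GlycoPred's documented procedure.

```python
from typing import List

# 20 standard amino acids plus "-" as a padding symbol (an assumption of this
# sketch, not necessarily the alphabet used by N-GlycoPred).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def window(seq: str, center: int, w: int) -> str:
    """Return the 2*w+1 residues around `center`, padding off-sequence positions with '-'."""
    return "".join(
        seq[i] if 0 <= i < len(seq) else "-"
        for i in range(center - w, center + w + 1)
    )


def one_hot(fragment: str) -> List[List[int]]:
    """Encode each residue as a 21-dimensional indicator vector."""
    rows = []
    for aa in fragment:
        row = [0] * len(ALPHABET)
        row[INDEX.get(aa, INDEX["-"])] = 1  # unknown residues map to the pad symbol
        rows.append(row)
    return rows


def candidate_sites(seq: str) -> List[int]:
    """Positions of N in an N-X-S/T sequon where X is not proline."""
    return [
        i for i in range(len(seq) - 2)
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST"
    ]


if __name__ == "__main__":
    protein = "MKNGSAVNPTL"  # hypothetical example sequence
    for site in candidate_sites(protein):
        frag = window(protein, site, w=3)
        matrix = one_hot(frag)
        print(site, frag, len(matrix), len(matrix[0]))  # prints: 2 -MKNGSA 7 21
```

The second N in the example is skipped because it is followed by proline; a real predictor would feed the resulting matrix to its convolutional layers.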
Affiliation(s)
- Fengzhu Hu
- School of Science, Dalian Maritime University, Dalian 116026, China
- Jie Gao
- School of Science, Dalian Maritime University, Dalian 116026, China
- Jia Zheng
- School of Science, Dalian Maritime University, Dalian 116026, China
- Cheekeong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
2
Shi M, Wang C, Wang P, Yun F, Liu Z, Ye F, Wei L, Liao W. Role of methylation in vernalization and photoperiod pathway: a potential flowering regulator? Hortic Res 2023; 10:uhad174. [PMID: 37841501 PMCID: PMC10569243 DOI: 10.1093/hr/uhad174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 08/23/2023] [Indexed: 10/17/2023]
Abstract
Recognized as a pivotal developmental transition, flowering marks the continuation of a plant's life cycle. Vernalization and photoperiod are two major flowering pathways that orchestrate numerous florigenic signals. Methylation, including histone, DNA and RNA methylation, is one of the recent foci of research on plant development. A considerable number of studies suggest that methylation may play an increasingly recognized regulatory role in plant flowering by altering the expression of relevant genes without altering the underlying genetic sequence. However, few reviews have addressed the following questions: whether and how methylation acts on vernalization- and photoperiod-induced flowering before and after FLOWERING LOCUS C (FLC) reactivation; what role RNA methylation plays in vernalization- and photoperiod-induced flowering; how methylation participates simultaneously in both vernalization- and photoperiod-induced flowering; whether methylation memory under the vernalization/photoperiod pathway is heritable; and whether and how methylation can replace vernalization/photoinduction to regulate flowering. Our review provides insight into the crosstalk among the genetic control of the flowering gene network, methylation (methyltransferases/demethylases) and external signals (cold, light, sRNA and phytohormones) in the vernalization and photoperiod pathways. Existing evidence that RNA methylation may play a regulatory role in vernalization- and photoperiod-induced flowering is gathered and presented here for the first time. This review speculates on and discusses the possibility of substituting methylation for vernalization and photoinduction to promote flowering. Current evidence is used to discuss whether methylation reagents could become molecular-level flowering regulators in the future.
Affiliation(s)
- Meimei Shi
- College of Horticulture, Gansu Agricultural University, Lanzhou 730070, China
- Chunlei Wang
- College of Horticulture, Gansu Agricultural University, Lanzhou 730070, China
- Peng Wang
- Vegetable and Flower Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Fahong Yun
- College of Horticulture, Gansu Agricultural University, Lanzhou 730070, China
- Zhiya Liu
- College of Horticulture, Gansu Agricultural University, Lanzhou 730070, China
- Fujin Ye
- College of Horticulture, Gansu Agricultural University, Lanzhou 730070, China
- Lijuan Wei
- College of Horticulture, Gansu Agricultural University, Lanzhou 730070, China
- Weibiao Liao
- College of Horticulture, Gansu Agricultural University, Lanzhou 730070, China
3
Sun X, Guo Y, Zhang Y, Zhao P, Wang Z, Wei Z, Qiao H. Colon Cancer-Related Genes Identification and Function Study Based on Single-Cell Multi-Omics Integration. Front Cell Dev Biol 2021; 9:789587. [PMID: 34901030 PMCID: PMC8657154 DOI: 10.3389/fcell.2021.789587] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/05/2021] [Accepted: 11/01/2021] [Indexed: 12/13/2022]
Abstract
Single-cell transcriptomes and DNA methylation profiles of colon cancer are used to identify marker genes and to improve diagnoses and therapies. Seven colon cancer subtypes are recognized from single-cell RNA sequencing data, and differentially expressed genes regulated by dysregulated methylation are identified as marker genes for the different types of colon cancer. Compared with normal colon cells, the marker genes of each type show pronounced specificity, especially the genes upregulated in tumors. Functional enrichment analysis of the marker genes suggests a possible relation between colon cancer and nervous system disease; moreover, a weakened immune system is confirmed in colon cancer. The elevated expression of markers and the reduced methylation in colon cancer promote tumor development through broad mechanisms, so that no biological process is commonly enriched across the different types.
Affiliation(s)
- Xuepu Sun
- The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Yu Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yu Zhang
- Department of Neurosurgery, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Peng Zhao
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Zhaoqing Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Zheng Wei
- The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Haiquan Qiao
- The First Affiliated Hospital of Harbin Medical University, Harbin, China
4
Malebary SJ, Alzahrani E, Khan YD. A comprehensive tool for accurate identification of methyl-Glutamine sites. J Mol Graph Model 2021; 110:108074. [PMID: 34768228 DOI: 10.1016/j.jmgm.2021.108074] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/01/2021] [Revised: 10/15/2021] [Accepted: 11/02/2021] [Indexed: 11/16/2022]
Abstract
Methylation is a biochemical process involved in nearly all functions of the human body. Glutamine is considered an indispensable amino acid that is susceptible to methylation via post-translational modification (PTM). Recent research has shown that methylation plays a significant role in the progression of most types of cancer. Therefore, an effective method is needed to predict, accurately and inexpensively, the glutamine sites susceptible to methylation. This study aims to formulate a method that predicts such sites with high accuracy. Various computationally intelligent classifiers were employed for its formulation and evaluation. Rigorous validation shows that deep learning performs best compared with the other classifiers. The accuracy (ACC) and the area under the receiver operating characteristic curve (AUC) obtained by 10-fold cross-validation were 0.962 and 0.981, respectively, while jackknife testing gave 0.968 and 0.980. These results indicate that the proposed methodology works well for the prediction of methyl-glutamine sites. The code of the web server developed for predicting methyl-glutamine sites is freely available at https://github.com/s20181080001/WebServer.git and can easily be set up by any intermediate-level Python user.
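The two validation schemes reported above differ only in how the data are partitioned: 10-fold cross-validation holds out a tenth of the samples at a time, while the jackknife holds out one sample at a time. The following Python sketch shows that index bookkeeping together with the usual rank-based definition of AUC and plain accuracy. The helper names are hypothetical and the contiguous, unshuffled split is a simplification; this is not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, test) pairs.

    The jackknife (leave-one-out) test is the special case k == n. Real
    pipelines normally shuffle, and often stratify, before splitting.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds get one extra sample so that every index is used once.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


if __name__ == "__main__":
    print(len(kfold_indices(100, 10)))   # 10-fold: 10 train/test splits
    print(len(kfold_indices(100, 100)))  # jackknife: 100 splits, one test sample each
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Because the jackknife trains n separate models, it is far more expensive than 10-fold cross-validation, which is one reason the two are often reported side by side.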
Affiliation(s)
- Sharaf J Malebary
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 344, Rabigh, 21911, Saudi Arabia.
- Ebraheem Alzahrani
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah, 21589, Saudi Arabia.
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan.
5
Improved protein relative solvent accessibility prediction using deep multi-view feature learning framework. Anal Biochem 2021; 631:114358. [PMID: 34478704 DOI: 10.1016/j.ab.2021.114358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Revised: 08/22/2021] [Accepted: 08/25/2021] [Indexed: 11/20/2022]
Abstract
The accurate prediction of the relative solvent accessibility of a protein is critical to understanding its 3D structure and biological function. In this study, a novel deep multi-view feature learning (DMVFL) framework is developed to accurately predict the relative solvent accessibility of proteins. It integrates three different neural network units (a bidirectional long short-term memory recurrent neural network, squeeze-and-excitation, and a fully-connected hidden layer) with four sequence-based single-view features (position-specific scoring matrix, position-specific frequency matrix, predicted secondary structure, and roughly predicted three-state relative solvent accessibility probability). On the basis of this framework, a new protein relative solvent accessibility predictor called DMVFL-RSA was proposed; it employs a customized multiple feedback mechanism that helps to extract the discriminative information embedded in the four single-view features. In benchmark tests on the TEST524 and CASP14-derived (CASP14set) datasets, DMVFL-RSA outperforms other existing state-of-the-art protein relative solvent accessibility predictors when predicting two-state (exposure threshold of 25%), three-state (exposure thresholds of 9% and 36%), and four-state (exposure thresholds of 4%, 25%, and 50%) discrete values. For real-valued prediction on TEST524 and CASP14set, DMVFL-RSA also achieves high Pearson correlation coefficient values, indicating a positive correlation between the predicted and native relative solvent accessibility. Detailed analyses show that the major advantages of DMVFL-RSA lie in the high efficiency of the DMVFL framework, the applied multiple feedback mechanism, and the strong sensitivity of the sequence-based features. The web server of DMVFL-RSA is freely available at https://jun-csbio.github.io/DMVFL-RSA/ for academic use.
The standalone package of DMVFL-RSA is downloadable at https://github.com/XueQiangFan/DMVFL-RSA.
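The two-, three- and four-state labels above are obtained by thresholding a real-valued relative solvent accessibility (RSA), and real-valued predictions are scored with the Pearson correlation coefficient. A minimal Python sketch follows, assuming RSA is expressed as a fraction in [0, 1] and that a value equal to a threshold falls into the more exposed state; the abstract does not state the boundary convention, and the function names are hypothetical.

```python
from bisect import bisect_right
from math import sqrt
from typing import Sequence

# Exposure thresholds quoted in the abstract, as fractions.
TWO_STATE = (0.25,)
THREE_STATE = (0.09, 0.36)
FOUR_STATE = (0.04, 0.25, 0.50)


def discretize(rsa: float, thresholds: Sequence[float]) -> int:
    """Map an RSA value to a state index (0 = most buried).

    bisect_right counts the thresholds <= rsa, so a value exactly on a
    threshold is assigned to the more exposed state (assumed convention).
    """
    return bisect_right(thresholds, rsa)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and native real-valued RSA."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    print([discretize(v, THREE_STATE) for v in (0.05, 0.2, 0.6)])  # [0, 1, 2]
    print(pearson([0.1, 0.3, 0.5], [0.12, 0.28, 0.55]))
```

Keeping the thresholds as data lets the same function produce every discrete scheme from one real-valued prediction.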
6
Yang X, Ye X, Li X, Wei L. iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool. Front Genet 2021; 12:663572. [PMID: 33868390 PMCID: PMC8044371 DOI: 10.3389/fgene.2021.663572] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Accepted: 03/02/2021] [Indexed: 02/04/2023]
Abstract
Motivation: DNA N4-methylcytosine (4mC) and N6-methyladenine (6mA) are two important DNA modifications that play crucial roles in a variety of biological processes. Accurate identification of these modifications is essential to better understand their biological functions and mechanisms. However, existing methods to identify 4mC or 6mA sites are all single-task methods, meaning that each can identify only one modification in one species. It is therefore desirable to develop a computational method that identifies modification sites in multiple species simultaneously. Results: In this study, we proposed a computational method, called iDNA-MT, to identify 4mC sites and 6mA sites, respectively, in multiple species. iDNA-MT mainly employs multi-task learning coupled with bidirectional gated recurrent units (BGRU) to capture the information shared among different species directly from DNA primary sequences. Comparative experiments on two benchmark datasets, each containing multiple species, show that for identifying either 4mC or 6mA sites in multiple species, iDNA-MT outperforms other state-of-the-art single-task methods. These promising results demonstrate that iDNA-MT has great potential as a powerful and practically useful tool for accurately identifying DNA modifications.
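Multi-task learning of the kind described for iDNA-MT can be implemented with hard parameter sharing: one encoder shared by all tasks (here, species) plus a separate output head per task, trained on the sum of the per-task losses. The toy Python sketch below illustrates only that structure. A single tanh layer stands in for the BGRU encoder, the species names are hypothetical, and the gradient-based training loop is omitted.

```python
import math
import random
from typing import Dict, List, Tuple

NUCLEOTIDES = "ACGT"


def one_hot_dna(seq: str) -> List[float]:
    """Flattened one-hot encoding of a DNA fragment (4 values per base)."""
    vec: List[float] = []
    for base in seq:
        vec.extend(1.0 if base == b else 0.0 for b in NUCLEOTIDES)
    return vec


class SharedMultiTaskModel:
    """Toy hard-parameter-sharing model: one shared layer, one logistic head per task."""

    def __init__(self, input_dim: int, hidden: int, tasks: List[str], seed: int = 0):
        rng = random.Random(seed)
        # Shared encoder weights (stand-in for a BGRU), used by every task.
        self.shared = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
                       for _ in range(hidden)]
        # Task-specific output weights, one vector per species.
        self.heads: Dict[str, List[float]] = {
            t: [rng.uniform(-0.1, 0.1) for _ in range(hidden)] for t in tasks
        }

    def encode(self, x: List[float]) -> List[float]:
        """Shared representation, identical for all tasks."""
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.shared]

    def predict(self, x: List[float], task: str) -> float:
        """Probability that the sample is a modification site for the given task."""
        z = sum(w * h for w, h in zip(self.heads[task], self.encode(x)))
        return 1.0 / (1.0 + math.exp(-z))


def joint_loss(model: SharedMultiTaskModel,
               batches: Dict[str, List[Tuple[List[float], int]]]) -> float:
    """Sum of per-task binary cross-entropies.

    Minimizing this sum (not shown) would update the shared layer with data
    from every task, while each head sees only its own task's samples.
    """
    total = 0.0
    for task, samples in batches.items():
        for x, y in samples:
            p = min(max(model.predict(x, task), 1e-12), 1.0 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


if __name__ == "__main__":
    model = SharedMultiTaskModel(input_dim=4 * 41, hidden=8,
                                 tasks=["species_A", "species_B"])
    x = one_hot_dna("A" * 41)
    print(model.predict(x, "species_A"), model.predict(x, "species_B"))
```

The design choice being illustrated is that species with little data still benefit from features the shared encoder learns on the other species.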
Affiliation(s)
- Xiao Yang
- School of Software, Shandong University, Jinan, China
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Xuehong Li
- Department of Rehabilitation, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Lesong Wei
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
7
He S, Guo F, Zou Q, Ding H. MRMD2.0: A Python Tool for Machine Learning with Feature Ranking and Reduction. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200503030350] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Indexed: 11/22/2022]
Abstract
Aims:
The study aims to find a way to reduce the dimensionality of the dataset.
Background:
Dimensionality reduction is a key issue in the machine learning process. It not only improves prediction performance but can also recommend intrinsic features and help to explore the biological meaning of the machine learning “black box”.
Objective:
A variety of feature selection algorithms are used to select data features to achieve dimensionality reduction.
Methods:
First, MRMD2.0 integrated 7 popular feature ranking algorithms using a PageRank strategy. Second, the optimal dimensionality was determined with a forward adding strategy.
Result:
We have achieved good results in our experiments.
Conclusion:
MRMD2.0 has been tested in several studies and performed well. In addition, it can draw performance curves as a function of feature dimensionality, so users who are willing to sacrifice some accuracy for fewer features can select the dimensionality from these curves.
Other:
We developed user-friendly Python tools together with a web server. Users can upload their CSV, ARFF or LIBSVM format files, and the web server then ranks the features and finds the optimal dimensionality.
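The two MRMD2.0 steps, merging several feature rankings with PageRank and then choosing the dimensionality by forward adding, can be sketched in Python as follows. The graph construction (in every ranking, each feature links to the feature ranked immediately above it), the damping factor 0.85 and the `evaluate` callback are assumptions made for illustration; MRMD2.0's own implementation may build its graph and score subsets differently.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def pagerank_aggregate(
    rankings: Sequence[Sequence[str]], d: float = 0.85, iters: int = 100
) -> List[str]:
    """Merge several feature rankings (best first) into one ordering via PageRank."""
    nodes = sorted({f for r in rankings for f in r})
    # Weighted out-links: each feature "votes" for the one ranked just above it.
    out: Dict[str, Dict[str, float]] = {f: defaultdict(float) for f in nodes}
    for r in rankings:
        for better, worse in zip(r, r[1:]):
            out[worse][better] += 1.0
    n = len(nodes)
    score = {f: 1.0 / n for f in nodes}
    for _ in range(iters):
        # Score held by nodes without out-links (dangling) is spread uniformly.
        dangling = sum(score[f] for f in nodes if not out[f])
        new = {f: (1.0 - d) / n + d * dangling / n for f in nodes}
        for src in nodes:
            total = sum(out[src].values())
            for dst, w in out[src].items():
                new[dst] += d * score[src] * w / total
        score = new
    # Highest PageRank first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda f: (-score[f], f))


def forward_adding(ranked: Sequence[str],
                   evaluate: Callable[[List[str]], float]) -> List[str]:
    """Score the top-1, top-2, ... prefixes and return the best (shortest on ties)."""
    best: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        s = evaluate(list(ranked[:k]))
        if s > best_score:
            best, best_score = list(ranked[:k]), s
    return best


if __name__ == "__main__":
    rankings = [["f1", "f2", "f3"], ["f1", "f3", "f2"], ["f1", "f2", "f3"]]
    merged = pagerank_aggregate(rankings)
    # Hypothetical scoring: reward useful features, penalize dimensionality.
    useful = {"f1", "f2"}
    print(merged, forward_adding(merged, lambda s: len(useful & set(s)) - 0.1 * len(s)))
```

In practice `evaluate` would train a classifier on the selected columns and return a cross-validated score; recording that score for every k yields the performance curve described above.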
Affiliation(s)
- Shida He
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
8
Dai C, Feng P, Cui L, Su R, Chen W, Wei L. Iterative feature representation algorithm to improve the predictive performance of N7-methylguanosine sites. Brief Bioinform 2020; 22:5964186. [PMID: 33169141 DOI: 10.1093/bib/bbaa278] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/18/2020] [Revised: 09/11/2020] [Accepted: 09/21/2020] [Indexed: 01/13/2023]
Abstract
MOTIVATION: N7-methylguanosine (m7G) is an important epigenetic modification that plays an essential role in regulating gene expression. Accurate identification of m7G modifications will therefore help to reveal, and deepen our understanding of, their potential functional mechanisms. Although high-throughput experimental methods can precisely locate m7G sites, they remain costly. It is therefore necessary to develop new methods to identify m7G sites. RESULTS: In this work, using an iterative feature representation algorithm, we developed a machine learning-based method, namely m7G-IFL, to identify m7G sites. To demonstrate its superiority, m7G-IFL was evaluated and compared with existing predictors. The results demonstrate that our predictor outperforms existing predictors in accuracy for identifying m7G sites. By analyzing and comparing the features used in the predictors, we found that the positive and negative samples were more clearly separated in our feature space than in the feature spaces of existing predictors. This indicates that, through the iterative feature learning process, our features capture more discriminative information and thus contribute to the improved predictive performance.
Affiliation(s)
- Chichi Dai
- Bachelor of Engineering in Software Engineering from Sichuan University
- Lizhen Cui
- School of Software, Shandong University, the Deputy Director of the E-Commerce Research Center
- Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wei Chen
- School of Life Sciences, North China University of Science and Technology, 21 Bohai Road, Caofeidian Xincheng, Tangshan 063210, China
- Leyi Wei
- Computer Science from Xiamen University, China
9
Dou L, Li X, Zhang L, Xiang H, Xu L. iGlu_AdaBoost: Identification of Lysine Glutarylation Using the AdaBoost Classifier. J Proteome Res 2020; 20:191-201. [PMID: 33090794 DOI: 10.1021/acs.jproteome.0c00314] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/01/2023]
Abstract
Lysine glutarylation is a newly reported post-translational modification (PTM) that plays significant roles in regulating metabolic and mitochondrial processes. Accurate identification of protein glutarylation is the first step toward better investigating its molecular functions and various applications. Because traditional biological sequencing techniques are time-consuming and expensive, and protein data are growing explosively, building precise computational models to rapidly identify glutarylation is a popular and feasible solution. In this work, we proposed a novel AdaBoost-based predictor called iGlu_AdaBoost to distinguish glutarylation from non-glutarylation sequences. To build the model, the top 37 features were chosen from a total of 1768 combined features, including 188D, the composition of k-spaced amino acid pairs (CKSAAP), and enhanced amino acid composition (EAAC), using Chi2 ranking followed by incremental feature selection (IFS). With the help of the hybrid-sampling method SMOTE-Tomek, the AdaBoost algorithm achieved satisfactory recall, specificity, and AUC values of 87.48%, 72.49%, and 0.89, respectively, in 10-fold cross-validation, and of 72.73%, 71.92%, and 0.63 on the independent test. Further feature analysis suggested that the positively charged amino acids R and K play critical roles in glutarylation recognition. Our model showed good generalization ability and consistent prediction results for positive and negative samples, comparable to four published tools. The proposed predictor is an efficient tool for finding potential glutarylation sites and provides helpful suggestions for further research on glutarylation mechanisms and related disease treatments.
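CKSAAP, one of the feature groups combined in iGlu_AdaBoost, records how often each of the 400 ordered amino acid pairs occurs with k residues between them. A minimal Python sketch follows; the default gaps k = 0 to 3 and the normalization by the number of valid pairs are assumptions, since the abstract does not give the parameters used. In a Chi2-plus-IFS pipeline, the resulting columns would be ranked by their Chi2 statistic and added to the model one at a time.

```python
from itertools import product
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(seq: str, gaps: Iterable[int] = (0, 1, 2, 3)) -> Dict[str, float]:
    """Composition of k-spaced amino acid pairs.

    For each gap k, count the residue pairs (seq[i], seq[i + k + 1]) and divide
    by the number of valid pairs, giving 400 features per gap.
    """
    features: Dict[str, float] = {}
    for k in gaps:
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
                total += 1
        for pair in PAIRS:
            features[f"{pair}.gap{k}"] = counts[pair] / total if total else 0.0
    return features


if __name__ == "__main__":
    f = cksaap("ACAC", gaps=(0, 1))
    print(len(f), f["AC.gap0"], f["AA.gap1"])
```

With four gaps the encoding yields 1600 features per fragment, which illustrates why a ranking-plus-IFS step is needed to reduce the combined feature set.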
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150000, China
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen 518172, China
- Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China