1. Han C, Fu S, Chen M, Gou Y, Liu D, Zhang C, Huang X, Xiao L, Zhao M, Zhang J, Xiao Q, Peng D, Xue Y. GPSD: a hybrid learning framework for the prediction of phosphatase-specific dephosphorylation sites. Brief Bioinform 2024; 26:bbae694. [PMID: 39749667 PMCID: PMC11695897 DOI: 10.1093/bib/bbae694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Revised: 11/30/2024] [Accepted: 12/17/2024] [Indexed: 01/04/2025] Open
Abstract
Protein phosphorylation is dynamically and reversibly regulated by protein kinases and protein phosphatases, and plays an essential role in orchestrating a wide range of biological processes. Although a number of tools have been developed for predicting kinase-specific phosphorylation sites (p-sites), computational prediction of phosphatase-specific dephosphorylation sites remains a great challenge. In this study, we manually curated 4393 experimentally identified site-specific phosphatase-substrate relationships for 3463 dephosphorylation sites occurring on phosphoserine, phosphothreonine, and/or phosphotyrosine residues, from the literature and public databases. Then, we developed a hybrid learning framework, the group-based prediction system for the prediction of phosphatase-specific dephosphorylation sites (GPSD). For model training, we integrated 10 types of sequence features and utilized three types of machine learning methods: penalized logistic regression, deep neural networks, and transformer neural networks. First, a pretrained model was constructed using 561 416 nonredundant p-sites and then fine-tuned to generate computational models for predicting general dephosphorylation sites. In addition, 103 individual phosphatase-specific predictors were constructed via transfer learning and meta-learning. For site prediction, one or multiple protein sequences in FASTA format can be submitted, and the prediction results are shown together with additional annotations, such as protein-protein interactions, structural information, and disorder propensity. The online service of GPSD is freely available at https://gpsd.biocuckoo.cn/. We believe that GPSD can serve as a valuable tool for further analysis of dephosphorylation.
Affiliation(s)
- Cheng Han
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Shanshan Fu
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Miaomiao Chen
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Yujie Gou
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Dan Liu
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Chi Zhang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Xinhe Huang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Leming Xiao
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Miaoying Zhao
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Jiayi Zhang
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Qiang Xiao
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Di Peng
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China
- Yu Xue
- Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan, Hubei 430074, China

2. Yoodee S, Thongboonkerd V. Bioinformatics and computational analyses of kidney stone modulatory proteins lead to solid experimental evidence and therapeutic potential. Biomed Pharmacother 2023; 159:114217. [PMID: 36623450 DOI: 10.1016/j.biopha.2023.114217] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/12/2022] [Revised: 12/26/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023] Open
Abstract
In recent biomedical research, bioinformatics and computational analyses have played essential roles in examining experimental findings and database information. Several bioinformatic tools have been developed and made publicly available for analyzing protein sequences, structures, functional motifs/domains, and interaction networks. Such properties are very helpful in defining the biochemical and functional roles of the protein(s) of interest. During the past few decades, bioinformatics and computational biotechnology have been widely applied to kidney stone research. This review summarizes commonly used tools and evidence of bioinformatics and computational biotechnology applied to kidney stone disease (KSD), with special emphasis on analyses of the stone modulatory proteins that play critical roles in kidney stone formation. Such analyses lead to solid experimental evidence demonstrating the mechanisms underlying their stone modulatory activities. The findings obtained from such analyses may also lead to a better understanding of KSD pathogenesis and to further development of new therapeutic and preventive strategies.
Affiliation(s)
- Sunisa Yoodee
- Medical Proteomics Unit, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Visith Thongboonkerd
- Medical Proteomics Unit, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand.

3. The Tyrosine Phosphatase SHP2: A New Target for Insulin Resistance? Biomedicines 2022; 10:biomedicines10092139. [PMID: 36140242 PMCID: PMC9495760 DOI: 10.3390/biomedicines10092139] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2022] [Revised: 08/26/2022] [Accepted: 08/28/2022] [Indexed: 11/17/2022] Open
Abstract
The SH2-containing protein tyrosine phosphatase 2 (SHP2) plays essential roles in fundamental signaling pathways, conferring on it versatile physiological functions during development and in homeostasis maintenance, and leading to major pathological outcomes when dysregulated. Many studies have documented that SHP2 modulation disrupts glucose homeostasis, pointing to a relationship between its dysfunction and insulin resistance, and to the therapeutic potential of targeting it. While studies from cellular or tissue-specific models reported both positive and negative effects of SHP2 on insulin resistance, recent data from integrated systems argue for an insulin resistance-promoting role for SHP2, and therefore a therapeutic benefit of its inhibition. In this review, we summarize the general knowledge of SHP2’s molecular, cellular, and physiological functions, explain the pathophysiological impact of its dysfunctions, and then discuss its protective or promoting roles in insulin resistance as well as the potency and limitations of its pharmacological modulation.

4. Wang H, Wang S, Zhang Y, Bi S, Zhu X. A brief review of machine learning methods for RNA methylation sites prediction. Methods 2022; 203:399-421. [DOI: 10.1016/j.ymeth.2022.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/31/2021] [Revised: 02/15/2022] [Accepted: 03/01/2022] [Indexed: 02/07/2023] Open

5. Pang Y, Yao L, Jhong JH, Wang Z, Lee TY. AVPIden: a new scheme for identification and functional prediction of antiviral peptides based on machine learning approaches. Brief Bioinform 2021; 22:6323205. [PMID: 34279599 DOI: 10.1093/bib/bbab263] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 03/23/2021] [Revised: 06/07/2021] [Accepted: 06/21/2021] [Indexed: 02/06/2023] Open
Abstract
Antiviral peptide (AVP) is a kind of antimicrobial peptide (AMP) that has the potential ability to fight against virus infection. Machine learning-based prediction with a computational biology approach can facilitate the development of the novel therapeutic agents. In this study, we proposed a double-stage classification scheme, named AVPIden, for predicting the AVPs and their functional activities against different viruses. The first stage is to distinguish the AVP from a broad-spectrum peptide collection, including not only the regular peptides (non-AMP) but also the AMPs without antiviral functions (non-AVP). The second stage is responsible for characterizing one or more virus families or species that the AVP targets. Imbalanced learning is utilized to improve the performance of prediction. The AVPIden uses multiple descriptors to precisely demonstrate the peptide properties and adopts explainable machine learning strategies based on Shapley value to exploit how the descriptors impact the antiviral activities. Finally, the evaluation performance of the proposed model suggests its ability to predict the antivirus activities and their potential functions against six virus families (Coronaviridae, Retroviridae, Herpesviridae, Paramyxoviridae, Orthomyxoviridae, Flaviviridae) and eight kinds of virus (FIV, HCV, HIV, HPIV3, HSV1, INFVA, RSV, SARS-CoV). The AVPIden gives an option for reinforcing the development of AVPs with the computer-aided method and has been deployed at http://awi.cuhk.edu.cn/AVPIden/.
Affiliation(s)
- Yuxuan Pang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China
- Lantian Yao
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China
- Jhih-Hua Jhong
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China
- Zhuo Wang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China
- Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China

6. Chaudhari M, Thapa N, Ismail H, Chopade S, Caragea D, Köhn M, Newman RH, Kc DB. DTL-DephosSite: Deep Transfer Learning Based Approach to Predict Dephosphorylation Sites. Front Cell Dev Biol 2021; 9:662983. [PMID: 34249915 PMCID: PMC8264445 DOI: 10.3389/fcell.2021.662983] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/02/2021] [Accepted: 05/20/2021] [Indexed: 11/17/2022] Open
Abstract
Phosphorylation, which is mediated by protein kinases and opposed by protein phosphatases, is an important post-translational modification that regulates many cellular processes, including cellular metabolism, cell migration, and cell division. Due to its essential role in cellular physiology, a great deal of attention has been devoted to identifying sites of phosphorylation on cellular proteins and understanding how modification of these sites affects their cellular functions. This has led to the development of several computational methods designed to predict sites of phosphorylation based on a protein’s primary amino acid sequence. In contrast, much less attention has been paid to dephosphorylation and its role in regulating the phosphorylation status of proteins inside cells. Indeed, to date, dephosphorylation site prediction tools have been restricted to a few tyrosine phosphatases. To fill this knowledge gap, we have employed a transfer learning strategy to develop a deep learning-based model to predict sites that are likely to be dephosphorylated. Based on independent test results, our model, which we termed DTL-DephosSite, achieved efficiency scores for phosphoserine/phosphothreonine residues of 84%, 84% and 0.68 with respect to sensitivity (SN), specificity (SP) and Matthews correlation coefficient (MCC). Similarly, DTL-DephosSite exhibited efficiency scores of 75%, 88% and 0.64 for phosphotyrosine residues with respect to SN, SP, and MCC.
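
The three scores quoted in this abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name `sn_sp_mcc` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math

def sn_sp_mcc(tp, fn, tn, fp):
    """Return sensitivity, specificity and Matthews correlation coefficient."""
    sn = tp / (tp + fn)          # fraction of true sites that are recovered
    sp = tn / (tn + fp)          # fraction of non-sites that are rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc

# Hypothetical counts for a balanced test set of 100 positives and 100 negatives.
sn, sp, mcc = sn_sp_mcc(tp=84, fn=16, tn=84, fp=16)
print(f"SN={sn:.2f} SP={sp:.2f} MCC={mcc:.2f}")
```

For a balanced set like the hypothetical one above, SN = SP = 0.84 works out to MCC = 0.68, which shows how the three figures relate; whether the paper's independent test set was balanced is not stated here.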
Affiliation(s)
- Meenal Chaudhari
- Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC, United States
- Niraj Thapa
- Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC, United States
- Hamid Ismail
- Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC, United States
- Sandhya Chopade
- Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC, United States
- Doina Caragea
- Department of Computer Science, Kansas State University, Manhattan, KS, United States
- Maja Köhn
- Faculty of Biology, Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg, Germany
- Robert H Newman
- Department of Biology, North Carolina A&T State University, Greensboro, NC, United States
- Dukka B Kc
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS, United States

7. Liu T, Chen JM, Zhang D, Zhang Q, Peng B, Xu L, Tang H. ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features. Front Cell Dev Biol 2021; 8:621144. [PMID: 33490085 PMCID: PMC7820372 DOI: 10.3389/fcell.2020.621144] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/25/2020] [Accepted: 11/24/2020] [Indexed: 01/24/2023] Open
Abstract
Apolipoproteins are a group of plasma proteins that are associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer's disease, and diabetes. In order to investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to accurately identify and classify apolipoproteins. Although it is possible to identify apolipoproteins accurately through biochemical experiments, such experiments are expensive and time-consuming. This work aims to establish a high-efficiency and high-accuracy prediction model for recognition of apolipoproteins and their subfamilies. We first constructed a high-quality benchmark dataset including 270 apolipoproteins and 535 non-apolipoproteins. Based on the dataset, pseudo-amino acid composition (PseAAC) and composition of k-spaced amino acid pairs (CKSAAP) were used as input vectors. To improve the prediction accuracy and eliminate redundant information, analysis of variance (ANOVA) was used to rank the features, and incremental feature selection was utilized to obtain the best feature subset. A support vector machine (SVM) was used to construct the classification model, which achieved an accuracy of 97.27%, sensitivity of 96.30%, and specificity of 97.76% for discriminating apolipoproteins from non-apolipoproteins in 10-fold cross-validation. In addition, the same process was repeated to generate a new model for predicting apolipoprotein subfamilies. The new model achieved an overall accuracy of 95.93% in 10-fold cross-validation. Based on our proposed model, a convenient webserver called ApoPred was established, which can be freely accessed at http://tang-biolab.com/server/ApoPred/service.html. We expect that this work will contribute to apolipoprotein function research and drug development in relevant diseases.
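
The CKSAAP descriptor mentioned above counts pairs of residues separated by a fixed number of intervening positions. The sketch below illustrates one common formulation; the function name `cksaap`, the maximum gap `k_max`, and the normalization by the number of windows are assumptions for illustration, since the abstract does not give the settings used in ApoPred.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def cksaap(seq, k_max=2):
    """Frequencies of residue pairs separated by 0..k_max residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_windows = len(seq) - k - 1      # number of (i, i + k + 1) positions
        for i in range(max(n_windows, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:            # skip non-standard residues
                counts[pair] += 1
        total = max(n_windows, 1)         # avoid division by zero
        features.extend(counts[p] / total for p in pairs)
    return features

# Hypothetical peptide, used only to show the size of the feature vector.
vec = cksaap("MKAVLLALLA", k_max=2)
print(len(vec))  # 3 gap values x 400 pairs = 1200 features
```

Because the vector grows by 400 entries per gap value, a ranking step such as the ANOVA-based selection described in the abstract is a natural way to keep only the informative pairs.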
Affiliation(s)
- Ting Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Jia-Mao Chen
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Dan Zhang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Qian Zhang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Bowen Peng
- Division of international Cooperation, Health Commission of Sichuan Province, Chengdu, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Central Nervous System Drug Key Laboratory of Sichuan Province, Luzhou, China

8. Liu X, Wang L, Li J, Hu J, Zhang X. Mal-Prec: computational prediction of protein Malonylation sites via machine learning based feature integration: Malonylation site prediction. BMC Genomics 2020; 21:812. [PMID: 33225896 PMCID: PMC7682087 DOI: 10.1186/s12864-020-07166-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/23/2020] [Accepted: 10/20/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases such as Type 2 Diabetes Mellitus and different types of cancers. Compared with experimental identification of malonylation sites, computational prediction is a time-effective process with comparatively low costs. RESULTS In this study, we proposed a novel computational model called Mal-Prec (Malonylation Prediction) for malonylation site prediction through the combination of Principal Component Analysis (PCA) and Support Vector Machine (SVM). One-hot encoding, physicochemical properties, and composition of k-spaced amino acid pairs were first used to extract sequence features. PCA was then applied to select optimal feature subsets while SVM was adopted to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec can achieve better prediction performance compared with other approaches. AUC (area under the receiver operating characteristic curves) analysis achieved 96.47 and 90.72% on 5-fold cross-validation of independent data sets, respectively. CONCLUSION Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec, together with the data sets used in this study.
Affiliation(s)
- Xin Liu
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004 Jiangsu China
- Liang Wang
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004 Jiangsu China
- Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, School of Pharmacy, Xuzhou Medical University, Xuzhou, 221000 Jiangsu China
- Jian Li
- School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70118 USA
- Junfeng Hu
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004 Jiangsu China
- Xiao Zhang
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004 Jiangsu China

9. Liu X, Liu Z, Mao X, Li Q. m7GPredictor: An improved machine learning-based model for predicting internal m7G modifications using sequence properties. Anal Biochem 2020; 609:113905. [DOI: 10.1016/j.ab.2020.113905] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/07/2020] [Revised: 07/24/2020] [Accepted: 08/05/2020] [Indexed: 12/21/2022]

10. Zhou Y, Cui Q, Zhou Y. NmSEER V2.0: a prediction tool for 2'-O-methylation sites based on random forest and multi-encoding combination. BMC Bioinformatics 2019; 20:690. [PMID: 31874624 PMCID: PMC6929462 DOI: 10.1186/s12859-019-3265-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/03/2022] Open
Abstract
Background: 2′-O-methylation (2′-O-me or Nm) is a post-transcriptional RNA modification at the 2′-hydroxyl group, which is common in mRNAs and various non-coding RNAs. Previous studies revealed the significance of Nm in multiple biological processes. With Nm receiving increasing attention, a technique termed Nm-seq was developed to profile Nm sites, mainly in mRNA, with single-nucleotide resolution and high sensitivity. In a recent work, supported by the Nm-seq data, we reported an in silico method for predicting Nm sites, which relies on nucleotide sequence information, and established an online server named NmSEER. More recently, a more confident dataset produced by refined Nm-seq became available. Therefore, in this work, we redesigned the prediction model to achieve a more robust performance on the new data. Results: We redesigned the prediction model from two perspectives: the machine learning algorithm and the combination of multiple encoding schemes. After optimization by 5-fold cross-validation tests and evaluation by an independent test, random forest was selected as the most robust algorithm. Meanwhile, one-hot encoding, together with position-specific dinucleotide sequence profile and K-nucleotide frequency encoding, were collectively applied to build the final predictor. Conclusions: The updated predictor, named NmSEER V2.0, achieves an accurate prediction performance (AUROC = 0.862) and has been deployed on a new server, which is freely available at http://www.rnanut.net/nmseer-v2/.
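
Two of the encodings named in this abstract, one-hot encoding and K-nucleotide frequency, can be combined by simple concatenation. The following is a rough sketch under stated assumptions: the function names, the choice of k = 2, and the example window are hypothetical, and the position-specific dinucleotide sequence profile used by NmSEER V2.0 is not reproduced here because it depends on training-set statistics.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

def one_hot(seq):
    """Flattened one-hot encoding; unknown symbols map to all zeros."""
    return [1 if base == n else 0 for base in seq for n in NUCLEOTIDES]

def knf(seq, k=2):
    """Frequency of every possible k-mer over all windows of the sequence."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    n_windows = len(seq) - k + 1
    for i in range(max(n_windows, 0)):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = max(n_windows, 1)             # avoid division by zero
    return [counts[m] / total for m in kmers]

# Hypothetical 9-nt window around a candidate site.
window = "GGACUAGCU"
features = one_hot(window) + knf(window, k=2)
print(len(features))  # 9 * 4 + 16 = 52
```

The one-hot part preserves position information while the frequency part summarizes composition, which is one plausible reason for combining several encodings before training a classifier such as a random forest.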
Affiliation(s)
- Yiran Zhou
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
- Qinghua Cui
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
- Center of Bioinformatics, Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yuan Zhou
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China

11. Liu Z, Dong W, Jiang W, He Z. csDMA: an improved bioinformatics tool for identifying DNA 6mA modifications via Chou's 5-step rule. Sci Rep 2019; 9:13109. [PMID: 31511570 PMCID: PMC6739324 DOI: 10.1038/s41598-019-49430-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 04/26/2019] [Accepted: 08/24/2019] [Indexed: 12/31/2022] Open
Abstract
DNA N6-methyldeoxyadenosine (6mA) modifications were first found more than 60 years ago but were thought to be widespread only in prokaryotes and unicellular eukaryotes. With the development of high-throughput sequencing technology, 6mA modifications were found in different multicellular eukaryotes by using experimental methods. However, the experimental methods are time-consuming and costly, which makes it necessary to develop computational methods instead. In this study, a machine learning-based prediction tool, named csDMA, was developed for predicting 6mA modifications. Firstly, three feature encoding schemes, Motif, Kmer, and Binary, were used to generate the feature matrix. Secondly, different algorithms were tested in the prediction model, and the ExtraTrees model achieved the best AUC of 0.878 in 5-fold cross-validation on the training dataset. In addition, the ExtraTrees model achieved the best AUC of 0.893 on the independent testing dataset. Finally, we compared our method with state-of-the-art predictors, and the results showed that our model achieved better performance than existing tools.
Affiliation(s)
- Ze Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Wei Dong
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Wei Jiang
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Zili He
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, 712100, Shaanxi, China

12. Zhang S, Li X, Fan C, Wu Z, Liu Q. Application of Machine Learning Techniques to Predict Protein Phosphorylation Sites. Lett Org Chem 2019. [DOI: 10.2174/1570178615666180907150928] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Protein phosphorylation is one of the most important post-translational modifications of proteins. Almost all processes that regulate the life activities of an organism, as well as almost all physiological and pathological processes, involve protein phosphorylation. In this paper, we summarize the specific implementation and application of the methods used in protein phosphorylation site prediction, such as the support vector machine algorithm, random forest, Jensen-Shannon divergence combined with quadratic discriminant analysis, the Adaboost algorithm, increment of diversity with quadratic discriminant analysis, the modified CKSAAP algorithm, the Bayes classifier combined with phosphorylation sequences enrichment analysis, least absolute shrinkage and selection operator, stochastic search variable selection, partial least squares and deep learning. On this basis, we use the k-nearest neighbor algorithm with the BLOSUM80 matrix method to predict phosphorylation sites. Firstly, we construct the dataset and remove redundancy from the positive and negative samples, that is, we remove protein sequences with similarity of more than 30%. Next, the proposed method is evaluated using four metrics: sensitivity (Sn), specificity (Sp), accuracy (ACC) and Matthews correlation coefficient (MCC). Finally, tenfold cross-validation is employed to evaluate this method. The result, which is verified by tenfold cross-validation, shows that the average values of Sn, Sp, ACC and MCC for the three types of amino acid (serine, threonine, and tyrosine) are 90.44%, 86.95%, 88.74% and 0.7742, respectively. A comparison with the predictive performance of PhosphoSVM and Musite reveals that the prediction performance of the proposed method is better, and that it has the advantages of simplicity, practicality and low time complexity in classification.
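
The tenfold cross-validation used for evaluation here partitions the samples into ten disjoint folds, each serving once as the test set. The following is a minimal sketch of such a split; the function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions for illustration and do not reflect the authors' implementation.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k disjoint, roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # deal indices round-robin
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 25 samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
print(len(splits))
```

Averaging a metric such as Sn, Sp, ACC or MCC over the ten test folds gives the kind of mean values reported in the abstract.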
Collapse
Affiliation(s)
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Xian Li
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Chengcheng Fan
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Zhehui Wu
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Qian Liu
- Centre for Biostatistics, School of Health Sciences, The University of Manchester, Manchester, M13 9PL, United Kingdom
|
13
|
Wang X, Yan R. RFAthM6A: a new tool for predicting m6A sites in Arabidopsis thaliana. Plant Molecular Biology 2018; 96:327-337. [PMID: 29340952 DOI: 10.1007/s11103-018-0698-9] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 09/30/2017] [Accepted: 01/05/2018] [Indexed: 06/07/2023]
Abstract
We curated a reliable dataset of m6A sites in Arabidopsis thaliana, built competitive models for predicting m6A sites, extracted predominant rules from the prediction models and analyzed the most important features. In biological RNA, approximately 150 chemical modifications have been discovered, of which N6-methyladenine (m6A) is the most prevalent and abundant. This modification plays an essential role in a myriad of biological mechanisms and regulates RNA localization, nuclear export, translation, stability, alternative splicing, and other processes. However, m6A-seq and other wet-lab techniques do not easily facilitate accurate and complete determination of m6A sites across the transcriptome. Therefore, the use of computational methods to establish accurate models for predicting m6A sites is essential. In this work, we manually curated a reliable dataset of m6A sites and non-m6A sites and developed a new tool called RFAthM6A for predicting m6A sites in Arabidopsis thaliana. Briefly, RFAthM6A consists of four independent models named RFPSNSP, RFPSDSP, RFKSNPF and RFKNF. Strict benchmarks show that the AUC values of the four models reached 0.894, 0.914, 0.920 and 0.926, respectively, in fivefold cross-validation, and that the prediction performance of RFPSDSP, RFKSNPF and RFKNF exceeded that of three previously reported models (AthMethPre, M6ATH and RAM-NPPS). A linear combination of the prediction scores of RFPSDSP, RFKSNPF and RFKNF improved the prediction performance. We also extracted several predominant rules that underlie m6A site identification from the trained models. Furthermore, the most important features of the predictors for m6A site identification were analyzed in depth. To facilitate use of our proposed models by interested researchers, all the source code and datasets are publicly deposited at https://github.com/nongdaxiaofeng/RFAthM6A.
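The abstract reports that linearly combining the scores of several models improved performance as measured by AUC. The sketch below illustrates that general idea under stated assumptions: the weights, score values and function names (`auc`, `combine_scores`) are hypothetical and are not taken from RFAthM6A or its repository.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combine_scores(score_lists, weights):
    """Weighted sum of per-model scores for each sample (hypothetical helper)."""
    return [
        sum(w * s for w, s in zip(weights, sample_scores))
        for sample_scores in zip(*score_lists)
    ]


# Toy data: two imaginary models that make different mistakes.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]
model_b = [0.6, 0.8, 0.3, 0.7]
combined = combine_scores([model_a, model_b], weights=[0.5, 0.5])

auc_a, auc_b, auc_combined = auc(labels, model_a), auc(labels, model_b), auc(labels, combined)
```

In this toy case each model misranks one pair while the averaged score ranks all pairs correctly; in practice the weights would be chosen on held-out data rather than fixed by hand.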
Affiliation(s)
- Xiaofeng Wang
  - College of Mathematics and Computer Science, Shanxi Normal University, Linfen, 041004, China
- Renxiang Yan
  - Institute of Applied Genomics, School of Biological Sciences and Engineering, Fuzhou University, Fuzhou, 350002, China
|
14
|
Wang H, Chen X, Li C, Liu Y, Yang F, Wang C. Sequence-Based Prediction of Cysteine Reactivity Using Machine Learning. Biochemistry 2017; 57:451-460. [DOI: 10.1021/acs.biochem.7b00897] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 01/04/2023]
Affiliation(s)
- Haobo Wang
  - Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing 100871, China
  - Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Xuemin Chen
  - Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing 100871, China
  - Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Can Li
  - Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Yuan Liu
  - Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing 100871, China
  - Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Fan Yang
  - Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing 100871, China
  - Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Chu Wang
  - Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing 100871, China
  - Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
|
15
|
Abram CL, Lowell CA. Shp1 function in myeloid cells. J Leukoc Biol 2017; 102:657-675. [PMID: 28606940 DOI: 10.1189/jlb.2mr0317-105r] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 03/17/2017] [Revised: 05/01/2017] [Accepted: 05/02/2017] [Indexed: 01/28/2023] [Open Access]
Abstract
The motheaten mouse was first described in 1975 as a model of systemic inflammation and autoimmunity, as a result of immune system dysregulation. The phenotype was later ascribed to mutations in the cytoplasmic tyrosine phosphatase Shp1. This phosphatase is expressed widely throughout the hematopoietic system and has been shown to impact a multitude of cell signaling pathways. The determination of which cell types contribute to the different aspects of the phenotype caused by global Shp1 loss or mutation and which pathways within these cell types are regulated by Shp1 is important to further our understanding of immune system regulation. In this review, we focus on the role of Shp1 in myeloid cells and how its dysregulation affects immune function, which can impact human disease.
Affiliation(s)
- Clare L Abram
  - Department of Laboratory Medicine and Immunology Program, University of California, San Francisco, California, USA
- Clifford A Lowell
  - Department of Laboratory Medicine and Immunology Program, University of California, San Francisco, California, USA
|
16
|
Wang X, Yan R, Li J, Song J. SOHPRED: a new bioinformatics tool for the characterization and prediction of human S-sulfenylation sites. Molecular BioSystems 2016; 12:2849-58. [DOI: 10.1039/c6mb00314a] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 11/21/2022]
Abstract
SOHPRED is a new and competitive bioinformatics tool for characterizing and predicting human S-sulfenylation sites.
Affiliation(s)
- Xiaofeng Wang
  - College of Mathematics and Computer Science, Shanxi Normal University, Linfen 041004, China
- Renxiang Yan
  - Institute of Applied Genomics, School of Biological Sciences and Engineering, Fuzhou University, Fuzhou 350002, China
- Jinyan Li
  - Advanced Analytics Institute and Centre for Health Technologies, University of Technology Sydney, Ultimo, Australia
- Jiangning Song
  - Infection and Immunity Program, Biomedicine Discovery Institute and The Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
|