Cited by Other Article(s)

Chang X, Zhu Y, Chen Y, Li L. DeepNphos: A deep-learning architecture for prediction of N-phosphorylation sites. Comput Biol Med 2024;170:108079. [PMID: 38295472 DOI: 10.1016/j.compbiomed.2024.108079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 01/25/2024] [Accepted: 01/27/2024] [Indexed: 02/02/2024]

Abstract

MOTIVATION

Phosphorylation, a prevalent post-translational modification, plays a crucial role in regulating cellular activities. This process encompasses O-phosphorylation (e.g., phosphoserine) and N-phosphorylation (e.g., phospho-lysine (pK), phospho-arginine (pR), and phospho-histidine (pH)). While significant research has focused on O-phosphorylation, resulting in the development of various algorithms for predicting O-phosphorylation sites with commendable performance, there has been a notable absence of models designed to predict N-phosphorylation sites. This study introduces an integrated model named DeepNphos, designed to predict N-phosphorylation sites. This model is developed based on the analysis of thousands of experimentally identified pK, pR and pH sites.

RESULTS

The Convolutional Neural Network (CNN) model with One-Hot encoding features performs favorably compared with other models when predicting pK, pR, and pH sites. Additionally, pK exhibits similarities to other lysine modification types, and integrating the CNN model with a deep-transfer learning (DTL) strategy based on tens of thousands of known lysine modification sites could enhance pK prediction performance. In contrast, pR exhibits little similarity to other arginine modification types, and the integration of DTL has minimal impact on pR prediction performance. Furthermore, the DTL strategy was not incorporated for predicting pH sites, given the scarcity of histidine modification sites other than pH. The final classifiers for predicting pK, pR, and pH sites achieve AUC values of 0.856, 0.805, and 0.802, respectively, in ten-fold cross-validation. Overall, DeepNphos is the first classifier for predicting N-phosphorylation sites and is accessible at https://github.com/ChangXulinmessi/DeepNPhos.
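The One-Hot encoding mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DeepNphos implementation: the function name `one_hot_encode`, the residue ordering in `AMINO_ACIDS`, and the choice to map unknown or padding symbols to an all-zero row are assumptions made here for illustration only.

```python
from typing import List

# Assumed alphabet: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(window: str, alphabet: str = AMINO_ACIDS) -> List[List[int]]:
    """Encode a peptide window as a (length x alphabet size) 0/1 matrix.

    Symbols outside the alphabet (for example a '-' padding character)
    are encoded as an all-zero row. That convention is an assumption.
    """
    index = {aa: i for i, aa in enumerate(alphabet)}
    matrix = []
    for residue in window.upper():
        row = [0] * len(alphabet)
        if residue in index:
            row[index[residue]] = 1
        matrix.append(row)
    return matrix


# Hypothetical 4-residue window: alanine, lysine, histidine, padding.
encoded = one_hot_encode("AKH-")
```

A matrix of this shape is the kind of input a one-dimensional CNN would typically consume, with one row per sequence position and one channel per residue type.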

Poretsky E, Andorf CM, Sen TZ. PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models. PLANT DIRECT 2023;7:e554. [PMID: 38124705 PMCID: PMC10732782 DOI: 10.1002/pld3.554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 11/20/2023] [Accepted: 11/26/2023] [Indexed: 12/23/2023]

Pakhrin SC, Pokharel S, Pratyush P, Chaudhari M, Ismail HD, Kc DB. LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model. J Proteome Res 2023;22:2548-2557. [PMID: 37459437 DOI: 10.1021/acs.jproteome.2c00667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]

Bao W, Gu Y, Chen B, Yu H. Golgi_DF: Golgi proteins classification with deep forest. Front Neurosci 2023;17:1197824. [PMID: 37250391 PMCID: PMC10213405 DOI: 10.3389/fnins.2023.1197824] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 03/31/2023] [Accepted: 04/19/2023] [Indexed: 05/31/2023]

Ahmed F, Dehzangi I, Hasan MM, Shatabda S. Accurately predicting microbial phosphorylation sites using evolutionary and structural features. Gene 2023;851:146993. [DOI: 10.1016/j.gene.2022.146993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 10/05/2022] [Accepted: 10/14/2022] [Indexed: 11/27/2022]

Liu S, Cui C, Chen H, Liu T. Ensemble learning-based feature selection for phosphorylation site detection. Front Genet 2022;13:984068. [PMID: 36338976 PMCID: PMC9634105 DOI: 10.3389/fgene.2022.984068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/01/2022] [Accepted: 10/05/2022] [Indexed: 11/18/2022]

Niu M, Zou Q. SgRNA-RF: Identification of SgRNA On-Target Activity With Imbalanced Datasets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:2442-2453. [PMID: 33979289 DOI: 10.1109/tcbb.2021.3079116] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]

Wang X, Zhang Z, Zhang C, Meng X, Shi X, Qu P. TransPhos: A Deep-Learning Model for General Phosphorylation Site Prediction Based on Transformer-Encoder Architecture. Int J Mol Sci 2022;23:ijms23084263. [PMID: 35457080 PMCID: PMC9029334 DOI: 10.3390/ijms23084263] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/10/2022] [Revised: 04/04/2022] [Accepted: 04/09/2022] [Indexed: 02/06/2023]

Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. RESEARCH 2022. [DOI: 10.34133/research.0011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]

Zhang S, Shi H. iR5hmcSC: Identifying RNA 5-hydroxymethylcytosine with multiple features based on stacking learning. Comput Biol Chem 2021;95:107583. [PMID: 34562726 DOI: 10.1016/j.compbiolchem.2021.107583] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/24/2021] [Revised: 09/02/2021] [Accepted: 09/12/2021] [Indexed: 01/27/2023]

Li Y, Pu F, Wang J, Zhou Z, Zhang C, He F, Ma Z, Zhang J. Machine Learning Methods in Prediction of Protein Palmitoylation Sites: A Brief Review. Curr Pharm Des 2021;27:2189-2198. [PMID: 33183190 DOI: 10.2174/1381612826666201112142826] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/13/2020] [Accepted: 07/27/2020] [Indexed: 11/22/2022]

Lv H, Dao FY, Zulfiqar H, Lin H. DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of SARS-CoV-2 infection using a deep learning-based approach. Brief Bioinform 2021;22:6310410. [PMID: 34184738 PMCID: PMC8406875 DOI: 10.1093/bib/bbab244] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 04/08/2019] [Revised: 05/18/2020] [Accepted: 06/03/2021] [Indexed: 11/14/2022]

Jamal S, Ali W, Nagpal P, Grover A, Grover S. Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins. J Transl Med 2021;19:218. [PMID: 34030700 PMCID: PMC8142496 DOI: 10.1186/s12967-021-02851-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/01/2020] [Accepted: 04/18/2021] [Indexed: 12/11/2022]

Abstract

BACKGROUND

Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases (most commonly neurological disorders, Alzheimer's disease, and Parkinson's disease), thus necessitating the prediction of S/T/Y residues that can be phosphorylated in an uncharacterized amino acid sequence. Despite a surplus of sequencing data, current experimental methods of PTM identification are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited, owing to substrate specificity, performance, and the diversity of its features.

METHODS

In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, the minimum redundancy/maximum relevance approach, and the symmetrical uncertainty method were employed to extract the most informative features to train the models.
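The symmetrical uncertainty criterion named above is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information. The following is a minimal sketch for discrete features under that standard definition; it is not the authors' code, and the function names `entropy` and `symmetrical_uncertainty` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1.

    Returns 0.0 when both variables are constant (zero total entropy),
    which is a convention assumed here to avoid division by zero.
    """
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Toy data: a feature identical to the label, and one independent of it.
labels = [0, 0, 1, 1]
su_identical = symmetrical_uncertainty([0, 0, 1, 1], labels)
su_independent = symmetrical_uncertainty([0, 1, 0, 1], labels)
```

Features scoring near 1 against the class label carry most of its information, while features near 0 are candidates for removal; continuous features would first need to be discretized.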

RESULTS

The RF and SVM models generated using diverse feature types in the present study were highly accurate as is evident from good values for different statistical measures. Moreover, independent test sets and benchmark validations indicated that the proposed method clearly outperformed the existing methods, demonstrating its ability to accurately predict protein phosphorylation.

CONCLUSIONS

The results obtained in the present work indicate that the proposed computational methodology can be effectively used for predicting putative phosphorylation sites, further facilitating the discovery of the mechanisms of various biological processes.

Li A, Deng Y, Tan Y, Chen M. A Transfer Learning-Based Approach for Lysine Propionylation Prediction. Front Physiol 2021;12:658633. [PMID: 33967828 PMCID: PMC8096918 DOI: 10.3389/fphys.2021.658633] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/27/2021] [Accepted: 03/15/2021] [Indexed: 12/12/2022]

Yao Y, Zhang S, Liang Y. iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2021;32:317-331. [PMID: 33730950 DOI: 10.1080/1062936x.2021.1895884] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/21/2020] [Accepted: 02/23/2021] [Indexed: 06/12/2023]

Niu M, Lin Y, Zou Q. sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks. PLANT MOLECULAR BIOLOGY 2021;105:483-495. [PMID: 33385273 DOI: 10.1007/s11103-020-01102-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 05/17/2020] [Accepted: 12/01/2020] [Indexed: 06/12/2023]

Dai R, Zhang W, Tang W, Wynendaele E, Zhu Q, Bin Y, De Spiegeleer B, Xia J. BBPpred: Sequence-Based Prediction of Blood-Brain Barrier Peptides with Feature Representation Learning and Logistic Regression. J Chem Inf Model 2021;61:525-534. [PMID: 33426873 DOI: 10.1021/acs.jcim.0c01115] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 12/17/2022]

Yang L, Jiao X. Distinguishing Enzymes and Non-enzymes Based on Structural Information with an Alignment Free Approach. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200324134037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]

Guo L, Wang Y, Xu X, Cheng KK, Long Y, Xu J, Li S, Dong J. DeepPSP: A Global-Local Information-Based Deep Neural Network for the Prediction of Protein Phosphorylation Sites. J Proteome Res 2020;20:346-356. [PMID: 33241931 DOI: 10.1021/acs.jproteome.0c00431] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]

Abstract

Identification of phosphorylation sites is an important step in the function study and drug design of proteins. In recent years, there have been increasing applications of the computational method in the identification of phosphorylation sites because of its low cost and high speed. Most of the currently available methods focus on using local information around potential phosphorylation sites for prediction and do not take the global information of the protein sequence into consideration. Here, we demonstrated that the global information of protein sequences may also be critical for phosphorylation site prediction. In this paper, a new deep neural network model, called DeepPSP, was proposed for the prediction of protein phosphorylation sites. In the DeepPSP model, two parallel modules were introduced to extract both local and global features from protein sequences. Two squeeze-and-excitation blocks and one bidirectional long short-term memory block were introduced into each module to capture effective representations of the sequences. Comparative studies were carried out to evaluate the performance of DeepPSP and four other prediction methods using public data sets. The F1-score, area under receiver operating characteristic curves (AUROC), and area under precision-recall curves (AUPRC) of DeepPSP were found to be 0.4819, 0.82, and 0.50, respectively, for S/T general site prediction and 0.4206, 0.73, and 0.39, respectively, for Y general site prediction. Compared with the MusiteDeep method, the F1-score, AUROC, and AUPRC of DeepPSP were found to increase by 8.6, 2.5, and 8.7%, respectively, for S/T general site prediction and by 20.6, 5.8, and 18.2%, respectively, for Y general site prediction. Among the tested methods, the developed DeepPSP method was also found to produce the best results for different kinase-specific site predictions including CDK, mitogen-activated protein kinase, CAMK, AGC, and CMGC. Taken together, the developed DeepPSP method may offer a more accurate phosphorylation site prediction by including global information. It may serve as an alternative model with better performance and interpretability for protein phosphorylation site prediction.

Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020;20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]

Identification of Latent Oncogenes with a Network Embedding Method and Random Forest. BIOMED RESEARCH INTERNATIONAL 2020;2020:5160396. [PMID: 33029511 PMCID: PMC7530476 DOI: 10.1155/2020/5160396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/20/2020] [Revised: 09/09/2020] [Accepted: 09/14/2020] [Indexed: 12/29/2022]

Du Z, He Y, Li J, Uversky VN. DeepAdd: Protein function prediction from k-mer embedding and additional features. Comput Biol Chem 2020;89:107379. [PMID: 33011616 DOI: 10.1016/j.compbiolchem.2020.107379] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/13/2019] [Revised: 09/15/2020] [Accepted: 09/17/2020] [Indexed: 10/23/2022]

Ahmed S, Kabir M, Arif M, Khan ZU, Yu DJ. DeepPPSite: A deep learning-based model for analysis and prediction of phosphorylation sites using efficient sequence information. Anal Biochem 2020;612:113955. [PMID: 32949607 DOI: 10.1016/j.ab.2020.113955] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 02/20/2020] [Revised: 08/30/2020] [Accepted: 09/11/2020] [Indexed: 12/29/2022]

Abstract

Phosphorylation is a ubiquitous type of post-translational modification (PTM) that occurs in both eukaryotic and prokaryotic cells, wherein a phosphate group binds to amino acid residues. These specific residues, i.e., serine (S), threonine (T), and tyrosine (Y), exhibit diverse functions at the molecular level. Recent studies have determined that some diseases such as cancer, diabetes, and neurodegenerative diseases are caused by abnormal phosphorylation. Based on its potential applications in biological research and drug development, the large-scale identification of phosphorylation sites has attracted interest. Existing wet-lab technologies for targeting phosphorylation sites are expensive and time-consuming. Thus, computational algorithms that can efficiently accelerate the annotation of phosphorylation sites from massive protein sequences are needed. Numerous machine learning-based methods have been implemented for phosphorylation site prediction. However, despite extensive efforts, existing computational approaches continue to have inadequate performance, particularly in terms of overall accuracy (ACC), Matthews correlation coefficient (MCC), and AUC. In this paper, we report a novel deep learning-based predictor to overcome these performance hurdles, DeepPPSite, which was constructed using a stacked long short-term memory recurrent network for predicting phosphorylation sites. The proposed technique expediently learns the protein representations from conjoint protein descriptors. The experimental results indicated that our model achieved superior performance on the training dataset for S, T, and Y, with MCC values of 0.608, 0.602, and 0.558, respectively, using a 10-fold cross-validation test. We further determined the generalization efficacy of the proposed predictor DeepPPSite by conducting a rigorous independent test. The predictive MCC values were 0.358, 0.356, and 0.350 for the S, T, and Y phosphorylation sites, respectively. Rigorous cross-validation and independent validation tests for the three types of phosphorylation sites demonstrated that the designed DeepPPSite tool significantly outperforms state-of-the-art methods.
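For readers unfamiliar with the MCC values reported above, the Matthews correlation coefficient can be computed from a binary confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below uses that standard formula; the function name `matthews_corrcoef_from_counts` and the example counts are hypothetical and are not taken from the DeepPPSite paper.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when the denominator is zero,
    a common convention that is assumed here.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for illustration only.
mcc_perfect = matthews_corrcoef_from_counts(tp=50, tn=50, fp=0, fn=0)
mcc_chance = matthews_corrcoef_from_counts(tp=25, tn=25, fp=25, fn=25)
mcc_inverted = matthews_corrcoef_from_counts(tp=0, tn=0, fp=50, fn=50)
```

Because MCC uses all four cells of the confusion matrix, it is often preferred over accuracy on imbalanced data, which is typical for phosphorylation site prediction.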

Hidden dynamic signatures drive substrate selectivity in the disordered phosphoproteome. Proc Natl Acad Sci U S A 2020;117:23606-23616. [PMID: 32900925 PMCID: PMC7519349 DOI: 10.1073/pnas.1921473117] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]

Abstract

The discovery that more than 40% of the eukaryotic proteome is intrinsically disordered, and that these disordered segments are enriched in phosphorylation sites, suggests that conformational heterogeneity may be important to kinase selectivity. Indeed, phosphorylation prediction programs reliant on classic notions of conserved sequence information (i.e., “vertical information”) are only partially effective. We find that the conformational equilibrium of the phosphorylatable site, whose information is embedded in sequence-averaged energetic and structural properties of the protein (i.e., “horizontal information”), plays a major role in distinguishing phosphorylatable versus nonphosphorylatable sites. In fact, employing both horizontal and vertical information produces a state-of-the-art phosphorylation predictor, wherein the conformational equilibrium of the disordered chain is the dominant contributor.

Phosphorylation sites are hyperabundant in the eukaryotic disordered proteome, suggesting that conformational fluctuations play a major role in determining to what extent a kinase interacts with a particular substrate. In biophysical terms, substrate selectivity may be determined not just by the structural–chemical complementarity between the kinase and its protein substrates but also by the free energy difference between the conformational ensembles that are, or are not, recognized by the kinase. To test this hypothesis, we developed a statistical-thermodynamics-based informatics framework, which allows us to probe for the contribution of equilibrium fluctuations to phosphorylation, as evaluated by the ability to predict Ser/Thr/Tyr phosphorylation sites in the disordered proteome. Essential to this framework is a decomposition of substrate sequence information into two types: vertical information encoding conserved kinase specificity motifs and horizontal information encoding substrate conformational equilibrium that is embedded, but often not apparent, within position-specific conservation patterns. We find not only that conformational fluctuations play a major role but also that they are the dominant contribution to substrate selectivity. In fact, the main substrate classifier distinguishing selectivity is the magnitude of change in local compaction of the disordered chain upon phosphorylation of these mostly singly phosphorylated sites. In addition to providing fundamental insights into the consequences of phosphorylation across the proteome, our approach provides a statistical-thermodynamic strategy for partitioning any sequence-based search into contributions from structural–chemical complementarity and those from changes in conformational equilibrium.

Savage SR, Zhang B. Using phosphoproteomics data to understand cellular signaling: a comprehensive guide to bioinformatics resources. Clin Proteomics 2020;17:27. [PMID: 32676006 PMCID: PMC7353784 DOI: 10.1186/s12014-020-09290-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 12/01/2019] [Accepted: 07/04/2020] [Indexed: 12/19/2022]

Song C, Yang B. Use Chou’s 5-Step Rule to Classify Protein Modification Sites with Neural Network. SCIENTIFIC PROGRAMMING 2020;2020:1-7. [DOI: 10.1155/2020/8894633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]

Luo F, Wang M, Liu Y, Zhao XM, Li A. DeepPhos: prediction of protein phosphorylation sites with deep learning. Bioinformatics 2020;35:2766-2773. [PMID: 30601936 PMCID: PMC6691328 DOI: 10.1093/bioinformatics/bty1051] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 09/14/2018] [Revised: 11/19/2018] [Accepted: 12/12/2018] [Indexed: 11/28/2022]

Wang D, Liang Y, Xu D. Capsule network for protein post-translational modification site prediction. Bioinformatics 2020;35:2386-2394. [PMID: 30520972 DOI: 10.1093/bioinformatics/bty977] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 05/07/2018] [Revised: 10/13/2018] [Accepted: 12/05/2018] [Indexed: 11/12/2022]

Long H, Sun Z, Li M, Fu HY, Lin MC. Predicting Protein Phosphorylation Sites Based on Deep Learning. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190902154332] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/22/2022]

Ju Z, Wang SY. Prediction of 2-hydroxyisobutyrylation sites by integrating multiple sequence features with ensemble support vector machine. Comput Biol Chem 2020;87:107280. [PMID: 32505881 DOI: 10.1016/j.compbiolchem.2020.107280] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/30/2019] [Revised: 05/05/2020] [Accepted: 05/07/2020] [Indexed: 10/24/2022]

Yang Y, Peng X, Ying P, Tian J, Li J, Ke J, Zhu Y, Gong Y, Zou D, Yang N, Wang X, Mei S, Zhong R, Gong J, Chang J, Miao X. AWESOME: a database of SNPs that affect protein post-translational modifications. Nucleic Acids Res 2020;47:D874-D880. [PMID: 30215764 PMCID: PMC6324025 DOI: 10.1093/nar/gky821] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 07/23/2018] [Accepted: 09/04/2018] [Indexed: 12/19/2022]

Affiliation(s)

Yang Yang, Xiating Peng, Pingting Ying, Jianbo Tian, Jiaoyuan Li, Juntao Ke, Ying Zhu, Yajie Gong, Danyi Zou, Nan Yang, Xiaoyang Wang, Shufang Mei, Rong Zhong, Jing Gong, Jiang Chang, Xiaoping Miao: Key Laboratory for Environment and Health (Ministry of Education), Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, 430030, China

Collapse

CirRNAPL: A web server for the identification of circRNA based on extreme learning machine. Comput Struct Biotechnol J 2020;18:834-842. [PMID: 32308930 PMCID: PMC7153170 DOI: 10.1016/j.csbj.2020.03.028] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2019] [Revised: 03/29/2020] [Accepted: 03/29/2020] [Indexed: 12/27/2022] Open

Hou R, Wang L, Wu YJ. Predicting ATP-Binding Cassette Transporters Using the Random Forest Method. Front Genet 2020;11:156. [PMID: 32269586 PMCID: PMC7109328 DOI: 10.3389/fgene.2020.00156] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2019] [Accepted: 02/11/2020] [Indexed: 12/21/2022] Open

Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Front Bioeng Biotechnol 2020;8:134. [PMID: 32175316 PMCID: PMC7054385 DOI: 10.3389/fbioe.2020.00134] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2020] [Accepted: 02/10/2020] [Indexed: 12/21/2022] Open

Ru X, Wang L, Li L, Ding H, Ye X, Zou Q. Exploration of the correlation between GPCRs and drugs based on a learning to rank algorithm. Comput Biol Med 2020;119:103660. [PMID: 32090901 DOI: 10.1016/j.compbiomed.2020.103660] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2019] [Revised: 02/04/2020] [Accepted: 02/12/2020] [Indexed: 02/01/2023]

Huang Q, Zhang J, Wei L, Guo F, Zou Q. 6mA-RicePred: A Method for Identifying DNA N ⁶-Methyladenine Sites in the Rice Genome Based on Feature Fusion. FRONTIERS IN PLANT SCIENCE 2020;11:4. [PMID: 32076430 PMCID: PMC7006724 DOI: 10.3389/fpls.2020.00004] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/04/2019] [Accepted: 01/06/2020] [Indexed: 06/01/2023]

Yu L, Xu F, Gao L. Predict New Therapeutic Drugs for Hepatocellular Carcinoma Based on Gene Mutation and Expression. Front Bioeng Biotechnol 2020;8:8. [PMID: 32047745 PMCID: PMC6997129 DOI: 10.3389/fbioe.2020.00008] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 11/21/2019] [Accepted: 01/07/2020] [Indexed: 02/01/2023] Open

Cai J, Wang D, Chen R, Niu Y, Ye X, Su R, Xiao G, Wei L. A Bioinformatics Tool for the Prediction of DNA N6-Methyladenine Modifications Based on Feature Fusion and Optimization Protocol. Front Bioeng Biotechnol 2020;8:502. [PMID: 32582654 PMCID: PMC7287168 DOI: 10.3389/fbioe.2020.00502] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/17/2020] [Accepted: 04/29/2020] [Indexed: 01/04/2023] Open

Robust Prediction of Single and Multiple Point Protein Mutations Stability Changes. Biomolecules 2019;10:biom10010067. [PMID: 31906171 PMCID: PMC7023245 DOI: 10.3390/biom10010067] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/09/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 11/16/2022] Open

Li F, Wang Y, Li C, Marquez-Lago TT, Leier A, Rawlings ND, Haffari G, Revote J, Akutsu T, Chou KC, Purcell AW, Pike RN, Webb GI, Ian Smith A, Lithgow T, Daly RJ, Whisstock JC, Song J. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform 2019;20:2150-2166. [PMID: 30184176 PMCID: PMC6954447 DOI: 10.1093/bib/bby077] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 06/14/2018] [Revised: 07/26/2018] [Accepted: 08/01/2018] [Indexed: 01/06/2023] Open

Abstract

The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.

Affiliation(s)

Fuyi Li: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
Yanan Wang: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
Chen Li: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich 8093, Switzerland
Tatiana T Marquez-Lago: Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
André Leier: Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
Neil D Rawlings: EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
Gholamreza Haffari: Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
Jerico Revote: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
Tatsuya Akutsu: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
Kuo-Chen Chou: Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
Anthony W Purcell: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
Robert N Pike: La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
Geoffrey I Webb: Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
A Ian Smith: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
Trevor Lithgow: Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria 3800, Australia
Roger J Daly: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
James C Whisstock: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
Jiangning Song: Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia

Rao B, Zhou C, Zhang G, Su R, Wei L. ACPred-Fuse: fusing multi-view information improves the prediction of anticancer peptides. Brief Bioinform 2019;21:1846-1855. [DOI: 10.1093/bib/bbz088] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 04/24/2019] [Revised: 06/06/2019] [Accepted: 06/22/2019] [Indexed: 02/04/2023] Open

Arif M, Ali F, Ahmad S, Kabir M, Ali Z, Hayat M. Pred-BVP-Unb: Fast prediction of bacteriophage Virion proteins using un-biased multi-perspective properties with recursive feature elimination. Genomics 2019;112:1565-1574. [PMID: 31526842 DOI: 10.1016/j.ygeno.2019.09.006] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 07/19/2019] [Revised: 08/27/2019] [Accepted: 09/11/2019] [Indexed: 10/26/2022]

Maiti S, Hassan A, Mitra P. Boosting phosphorylation site prediction with sequence feature-based machine learning. Proteins 2019;88:284-291. [PMID: 31412138 DOI: 10.1002/prot.25801] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/07/2019] [Revised: 07/13/2019] [Accepted: 08/08/2019] [Indexed: 12/13/2022]

Li H, Guan Y. Machine learning empowers phosphoproteome prediction in cancers. Bioinformatics 2019;36:859-864. [PMID: 31410451 PMCID: PMC7868059 DOI: 10.1093/bioinformatics/btz639] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/21/2019] [Revised: 07/25/2019] [Accepted: 08/12/2019] [Indexed: 02/06/2023] Open

Wei L, Xing P, Shi G, Ji Z, Zou Q. Fast Prediction of Protein Methylation Sites Using a Sequence-Based Feature Selection Technique. IEEE/ACM Trans Comput Biol Bioinform 2019;16:1264-1273. [PMID: 28222000 DOI: 10.1109/tcbb.2017.2670558] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Indexed: 05/09/2023]

Wei L, Zhou C, Su R, Zou Q. PEPred-Suite: improved and robust prediction of therapeutic peptides using adaptive feature representation learning. Bioinformatics 2019;35:4272-4280. [DOI: 10.1093/bioinformatics/btz246] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 08/12/2018] [Revised: 01/28/2019] [Accepted: 04/11/2019] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Prediction of therapeutic peptides is critical for the discovery of novel and efficient peptide-based therapeutics. Computational methods, especially machine learning based methods, have been developed for addressing this need. However, most existing methods are peptide-specific; currently, there is no generic predictor for multiple peptide types. Moreover, it is still challenging to extract informative feature representations from the perspective of primary sequences.

RESULTS

In this study, we have developed PEPred-Suite, a bioinformatics tool for the generic prediction of therapeutic peptides. In PEPred-Suite, we introduce an adaptive feature representation strategy that can learn the most representative features for different peptide types. To be specific, we train diverse sequence-based feature descriptors, integrate the learnt class information into our features, and utilize a two-step feature optimization strategy based on the area under receiver operating characteristic curve to extract the most discriminative features. Using the learnt representative features, we trained eight random forest models for eight different types of functional peptides, respectively. Benchmarking results showed that as compared with existing predictors, PEPred-Suite achieves better and robust performance for different peptides. As far as we know, PEPred-Suite is currently the first tool that is capable of predicting so many peptide types simultaneously. In addition, our work demonstrates that the learnt features can reliably predict different peptides.

AVAILABILITY AND IMPLEMENTATION

The user-friendly webserver implementing the proposed PEPred-Suite is freely accessible at http://server.malab.cn/PEPred-Suite.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Chen W, Song X, Lin H. Combinatorial Pattern of Histone Modifications in Exon Skipping Event. Front Genet 2019;10:122. [PMID: 30833963 PMCID: PMC6387913 DOI: 10.3389/fgene.2019.00122] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/19/2018] [Accepted: 02/04/2019] [Indexed: 11/18/2022] Open

Zhang Y, Dong D, Li D, Lu L, Li J, Zhang Y, Chen L. Computational Method for the Identification of Molecular Metabolites Involved in Cereal Hull Color Variations. Comb Chem High Throughput Screen 2019;21:760-770. [DOI: 10.2174/1386207322666190129105441] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/14/2018] [Revised: 08/02/2018] [Accepted: 08/16/2018] [Indexed: 11/22/2022]

Zou Q, Xing P, Wei L, Liu B. Gene2vec: gene subsequence embedding for prediction of mammalian N⁶-methyladenosine sites from mRNA. RNA 2019;25:205-218. [PMID: 30425123 PMCID: PMC6348985 DOI: 10.1261/rna.069112.118] [Citation(s) in RCA: 303] [Impact Index Per Article: 60.6] [Received: 10/03/2018] [Accepted: 11/01/2018] [Indexed: 05/20/2023]

Li Y, Niu M, Zou Q. ELM-MHC: An Improved MHC Identification Method with Extreme Learning Machine Algorithm. J Proteome Res 2019;18:1392-1401. [DOI: 10.1021/acs.jproteome.9b00012] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 01/28/2023]