Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Luo F, Wang M, Liu Y, Zhao XM, Li A. DeepPhos: prediction of protein phosphorylation sites with deep learning. Bioinformatics 2020;35:2766-2773. [PMID: 30601936 PMCID: PMC6691328 DOI: 10.1093/bioinformatics/bty1051] [Citation(s) in RCA: 105] [Impact Index Per Article: 26.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Revised: 11/19/2018] [Accepted: 12/12/2018] [Indexed: 11/28/2022] Open

For:	Luo F, Wang M, Liu Y, Zhao XM, Li A. DeepPhos: prediction of protein phosphorylation sites with deep learning. Bioinformatics 2020;35:2766-2773. [PMID: 30601936 PMCID: PMC6691328 DOI: 10.1093/bioinformatics/bty1051] [Citation(s) in RCA: 105] [Impact Index Per Article: 26.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Revised: 11/19/2018] [Accepted: 12/12/2018] [Indexed: 11/28/2022] Open

Number

Cited by Other Article(s)

Houles T, Yoon SO, Roux PP. The expanding landscape of canonical and non-canonical protein phosphorylation. Trends Biochem Sci 2024:S0968-0004(24)00191-9. [PMID: 39266329 DOI: 10.1016/j.tibs.2024.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2024] [Revised: 08/01/2024] [Accepted: 08/14/2024] [Indexed: 09/14/2024]

Ahmed F, Sharma A, Shatabda S, Dehzangi I. DeepPhoPred: Accurate Deep Learning Model to Predict Microbial Phosphorylation. Proteins 2024. [PMID: 39239684 DOI: 10.1002/prot.26734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Revised: 06/27/2024] [Accepted: 07/15/2024] [Indexed: 09/07/2024]

Deng Q, Zhang J, Liu J, Liu Y, Dai Z, Zou X, Li Z. Identifying Protein Phosphorylation Site-Disease Associations Based on Multi-Similarity Fusion and Negative Sample Selection by Convolutional Neural Network. Interdiscip Sci 2024;16:649-664. [PMID: 38457108 DOI: 10.1007/s12539-024-00615-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 03/09/2024]

Abstract

As one of the most important post-translational modifications (PTMs), protein phosphorylation plays a key role in a variety of biological processes. Many studies have shown that protein phosphorylation is associated with various human diseases. Therefore, identifying protein phosphorylation site-disease associations can help to elucidate the pathogenesis of disease and discover new drug targets. Networks of sequence similarity and Gaussian interaction profile kernel similarity were constructed for phosphorylation sites, as well as networks of disease semantic similarity, disease symptom similarity and Gaussian interaction profile kernel similarity were constructed for diseases. To effectively combine different phosphorylation sites and disease similarity information, random walk with restart algorithm was used to obtain the topology information of the network. Then, the diffusion component analysis method was utilized to obtain the comprehensive phosphorylation site similarity and disease similarity. Meanwhile, the reliable negative samples were screened based on the Euclidean distance method. Finally, a convolutional neural network (CNN) model was constructed to identify potential associations between phosphorylation sites and diseases. Based on tenfold cross-validation, the evaluation indicators were obtained including accuracy of 93.48%, specificity of 96.82%, sensitivity of 90.15%, precision of 96.62%, Matthew's correlation coefficient of 0.8719, area under the receiver operating characteristic curve of 0.9786 and area under the precision-recall curve of 0.9836. Additionally, most of the top 20 predicted disease-related phosphorylation sites (19/20 for Alzheimer's disease; 20/16 for neuroblastoma) were verified by literatures and databases. These results show that the proposed method has an outstanding prediction performance and a high practical value.

Collapse

Li W, Lin H, Huang Z, Xie S, Zhou Y, Gong R, Jiang Q, Xiang C, Huang J. DOTAD: A Database of Therapeutic Antibody Developability. Interdiscip Sci 2024;16:623-634. [PMID: 38530613 DOI: 10.1007/s12539-024-00613-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2023] [Revised: 01/25/2024] [Accepted: 01/27/2024] [Indexed: 03/28/2024]

Grunfeld N, Levine E, Libby E. Experimental measurement and computational prediction of bacterial Hanks-type Ser/Thr signaling system regulatory targets. Mol Microbiol 2024;122:152-164. [PMID: 38167835 PMCID: PMC11219531 DOI: 10.1111/mmi.15220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Revised: 12/15/2023] [Accepted: 12/17/2023] [Indexed: 01/05/2024]

Yu Z, Yu J, Wang H, Zhang S, Zhao L, Shi S. PhosAF: An integrated deep learning architecture for predicting protein phosphorylation sites with AlphaFold2 predicted structures. Anal Biochem 2024;690:115510. [PMID: 38513769 DOI: 10.1016/j.ab.2024.115510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2023] [Revised: 03/14/2024] [Accepted: 03/18/2024] [Indexed: 03/23/2024]

Gutierrez CS, Kassim AA, Gutierrez BD, Raines RT. Sitetack: A Deep Learning Model that Improves PTM Prediction by Using Known PTMs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.03.596298. [PMID: 38895359 PMCID: PMC11185516 DOI: 10.1101/2024.06.03.596298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/21/2024]

Chen Z, Ge R, Wang C, Elazab A, Fu X, Min W, Qin F, Jia G, Fan X. Identification of important gene signatures in schizophrenia through feature fusion and genetic algorithm. Mamm Genome 2024;35:241-255. [PMID: 38512459 DOI: 10.1007/s00335-024-10034-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2023] [Accepted: 02/07/2024] [Indexed: 03/23/2024]

Ke J, Zhao J, Li H, Yuan L, Dong G, Wang G. Prediction of protein N-terminal acetylation modification sites based on CNN-BiLSTM-attention model. Comput Biol Med 2024;174:108330. [PMID: 38588617 DOI: 10.1016/j.compbiomed.2024.108330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 03/06/2024] [Accepted: 03/17/2024] [Indexed: 04/10/2024]

Abstract

N-terminal acetylation is one of the most common and important post-translational modifications (PTM) of eukaryotic proteins. PTM plays a crucial role in various cellular processes and disease pathogenesis. Thus, the accurate identification of N-terminal acetylation modifications is important to gain insight into cellular processes and other possible functional mechanisms. Although some algorithmic models have been proposed, most have been developed based on traditional machine learning algorithms and small training datasets. Their practical applications are limited. Nevertheless, deep learning algorithmic models are better at handling high-throughput and complex data. In this study, DeepCBA, a model based on the hybrid framework of convolutional neural network (CNN), bidirectional long short-term memory network (BiLSTM), and attention mechanism deep learning, was constructed to detect the N-terminal acetylation sites. The DeepCBA was built as follows: First, a benchmark dataset was generated by selecting low-redundant protein sequences from the Uniport database and further reducing the redundancy of the protein sequences using the CD-HIT tool. Subsequently, based on the skip-gram model in the word2vec algorithm, tripeptide word vector features were generated on the benchmark dataset. Finally, the CNN, BiLSTM, and attention mechanism were combined, and the tripeptide word vector features were fed into the stacked model for multiple rounds of training. The model performed excellently on independent dataset test, with accuracy and area under the curve of 80.51% and 87.36%, respectively. Altogether, DeepCBA achieved superior performance compared with the baseline model, and significantly outperformed most existing predictors. Additionally, our model can be used to identify disease loci and drug targets.

Collapse

Zahiri Z, Mehrshad N, Mehrshad M. DF-Phos: Prediction of Protein Phosphorylation Sites by Deep Forest. J Biochem 2024;175:447-456. [PMID: 38153271 DOI: 10.1093/jb/mvad116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/29/2023] Open

Ertelt M, Mulligan VK, Maguire JB, Lyskov S, Moretti R, Schiffner T, Meiler J, Schoeder CT. Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of proteins. PLoS Comput Biol 2024;20:e1011939. [PMID: 38484014 DOI: 10.1371/journal.pcbi.1011939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 03/26/2024] [Accepted: 02/20/2024] [Indexed: 03/27/2024] Open

Zhou Z, Yeung W, Soleymani S, Gravel N, Salcedo M, Li S, Kannan N. Using explainable machine learning to uncover the kinase-substrate interaction landscape. Bioinformatics 2024;40:btae033. [PMID: 38244571 PMCID: PMC10868336 DOI: 10.1093/bioinformatics/btae033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 12/09/2023] [Accepted: 01/17/2024] [Indexed: 01/22/2024] Open

Abstract

MOTIVATION

Phosphorylation, a post-translational modification regulated by protein kinase enzymes, plays an essential role in almost all cellular processes. Understanding how each of the nearly 500 human protein kinases selectively phosphorylates their substrates is a foundational challenge in bioinformatics and cell signaling. Although deep learning models have been a popular means to predict kinase-substrate relationships, existing models often lack interpretability and are trained on datasets skewed toward a subset of well-studied kinases.

RESULTS

Here we leverage recent peptide library datasets generated to determine substrate specificity profiles of 300 serine/threonine kinases to develop an explainable Transformer model for kinase-peptide interaction prediction. The model, trained solely on primary sequences, achieved state-of-the-art performance. Its unique multitask learning paradigm built within the model enables predictions on virtually any kinase-peptide pair, including predictions on 139 kinases not used in peptide library screens. Furthermore, we employed explainable machine learning methods to elucidate the model's inner workings. Through analysis of learned embeddings at different training stages, we demonstrate that the model employs a unique strategy of substrate prediction considering both substrate motif patterns and kinase evolutionary features. SHapley Additive exPlanation (SHAP) analysis reveals key specificity determining residues in the peptide sequence. Finally, we provide a web interface for predicting kinase-substrate associations for user-defined sequences and a resource for visualizing the learned kinase-substrate associations.

AVAILABILITY AND IMPLEMENTATION

All code and data are available at https://github.com/esbgkannan/Phosformer-ST. Web server is available at https://phosformer.netlify.app.

Collapse

Shrestha P, Kandel J, Tayara H, Chong KT. DL-SPhos: Prediction of serine phosphorylation sites using transformer language model. Comput Biol Med 2024;169:107925. [PMID: 38183701 DOI: 10.1016/j.compbiomed.2024.107925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2023] [Revised: 12/21/2023] [Accepted: 01/01/2024] [Indexed: 01/08/2024]

Song T, Yang Q, Qu P, Qiao L, Wang X. Attenphos: General Phosphorylation Site Prediction Model Based on Attention Mechanism. Int J Mol Sci 2024;25:1526. [PMID: 38338804 PMCID: PMC10855885 DOI: 10.3390/ijms25031526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Revised: 01/18/2024] [Accepted: 01/23/2024] [Indexed: 02/12/2024] Open

Skinnider MA, Akinlaja MO, Foster LJ. Mapping protein states and interactions across the tree of life with co-fractionation mass spectrometry. Nat Commun 2023;14:8365. [PMID: 38102123 PMCID: PMC10724252 DOI: 10.1038/s41467-023-44139-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2023] [Accepted: 12/01/2023] [Indexed: 12/17/2023] Open

Poretsky E, Andorf CM, Sen TZ. PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models. PLANT DIRECT 2023;7:e554. [PMID: 38124705 PMCID: PMC10732782 DOI: 10.1002/pld3.554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Revised: 11/20/2023] [Accepted: 11/26/2023] [Indexed: 12/23/2023]

Esmaili F, Pourmirzaei M, Ramazi S, Shojaeilangari S, Yavari E. A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023;21:1266-1285. [PMID: 37863385 PMCID: PMC11082408 DOI: 10.1016/j.gpb.2023.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 01/16/2023] [Accepted: 03/23/2023] [Indexed: 10/22/2023]

López-Correa JM, König C, Vellido A. GPCR molecular dynamics forecasting using recurrent neural networks. Sci Rep 2023;13:20995. [PMID: 38017062 PMCID: PMC10684758 DOI: 10.1038/s41598-023-48346-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2023] [Accepted: 11/25/2023] [Indexed: 11/30/2023] Open

Xie J, Quan L, Wang X, Wu H, Jin Z, Pan D, Chen T, Wu T, Lyu Q. DeepMPSF: A Deep Learning Network for Predicting General Protein Phosphorylation Sites Based on Multiple Protein Sequence Features. J Chem Inf Model 2023;63:7258-7271. [PMID: 37931253 DOI: 10.1021/acs.jcim.3c00996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2023]

Abstract

Phosphorylation, as one of the most important post-translational modifications, plays a key role in various cellular physiological processes and disease occurrences. In recent years, computer technology has been gradually applied to the prediction of protein phosphorylation sites. However, most existing methods rely on simple protein sequence features that provide limited contextual information. To overcome this limitation, we propose DeepMPSF, a phosphorylation site prediction model based on multiple protein sequence features. There are two types of features: sequence semantic features, which comprise protein residue type information and relative position information within protein sequence, and protein background biophysical features, which include global semantic information containing more comprehensive protein background information obtained from pretrained models. To extract these features, DeepMPSF employs two separate subnetworks: the S71SFE module and the BBFE module, which automatically extract high-level semantic features. Our model incorporates a learning strategy for handling imbalanced datasets through ensemble learning during training and prediction. DeepMPSF is trained and evaluated on a well-established dataset of human proteins. Comparing the analysis with other benchmark methods reveals that DeepMPSF outperforms in predicting both S/T residues and Y residues. In particular, DeepMPSF showed excellent generalization performance in cross-species blind test performance, with an average improvement of 5.63%/5.72%, 22.28%/25.94%, 20.11%/17.49%, and 26.40%/28.33% for Mus musculus/Rattus norvegicus test sets in area under curves (AUCs) of ROC curve, AUC of the PR curve, F1-score, and MCC metrics, respectively. Furthermore, it also shows excellent performance in the latest updated case of natural proteins with functional phosphorylation sites. Through an ablation study and visual analysis, we uncover that the design of different feature modules significantly contributes to the accurate classification of DeepMPSF, which provides valuable insights for predicting phosphorylation sites and offers effective support for future downstream research.

Collapse

Pham NT, Phan LT, Seo J, Kim Y, Song M, Lee S, Jeon YJ, Manavalan B. Advancing the accuracy of SARS-CoV-2 phosphorylation site detection via meta-learning approach. Brief Bioinform 2023;25:bbad433. [PMID: 38058187 PMCID: PMC10753650 DOI: 10.1093/bib/bbad433] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Revised: 10/30/2023] [Accepted: 11/05/2023] [Indexed: 12/08/2023] Open

Abstract

The worldwide appearance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has generated significant concern and posed a considerable challenge to global health. Phosphorylation is a common post-translational modification that affects many vital cellular functions and is closely associated with SARS-CoV-2 infection. Precise identification of phosphorylation sites could provide more in-depth insight into the processes underlying SARS-CoV-2 infection and help alleviate the continuing COVID-19 crisis. Currently, available computational tools for predicting these sites lack accuracy and effectiveness. In this study, we designed an innovative meta-learning model, Meta-Learning for Serine/Threonine Phosphorylation (MeL-STPhos), to precisely identify protein phosphorylation sites. We initially performed a comprehensive assessment of 29 unique sequence-derived features, establishing prediction models for each using 14 renowned machine learning methods, ranging from traditional classifiers to advanced deep learning algorithms. We then selected the most effective model for each feature by integrating the predicted values. Rigorous feature selection strategies were employed to identify the optimal base models and classifier(s) for each cell-specific dataset. To the best of our knowledge, this is the first study to report two cell-specific models and a generic model for phosphorylation site prediction by utilizing an extensive range of sequence-derived features and machine learning algorithms. Extensive cross-validation and independent testing revealed that MeL-STPhos surpasses existing state-of-the-art tools for phosphorylation site prediction. We also developed a publicly accessible platform at https://balalab-skku.org/MeL-STPhos. We believe that MeL-STPhos will serve as a valuable tool for accelerating the discovery of serine/threonine phosphorylation sites and elucidating their role in post-translational regulation.

Collapse

Kumari S, Gupta R, Ambasta RK, Kumar P. Emerging trends in post-translational modification: Shedding light on Glioblastoma multiforme. Biochim Biophys Acta Rev Cancer 2023;1878:188999. [PMID: 37858622 DOI: 10.1016/j.bbcan.2023.188999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 10/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]

Liang Z, Liu T, Li Q, Zhang G, Zhang B, Du X, Liu J, Chen Z, Ding H, Hu G, Lin H, Zhu F, Luo C. Deciphering the functional landscape of phosphosites with deep neural network. Cell Rep 2023;42:113048. [PMID: 37659078 DOI: 10.1016/j.celrep.2023.113048] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2023] [Revised: 07/11/2023] [Accepted: 08/11/2023] [Indexed: 09/04/2023] Open

Affiliation(s)

Zhongjie Liang Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
Tonghai Liu Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
Qi Li Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
Guangyu Zhang School of Computer Science and Technology, Soochow University, Suzhou 215006, China
Bei Zhang State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
Xikun Du Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
Jingqiu Liu State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
Zhifeng Chen State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
Hong Ding State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
Guang Hu Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
Hao Lin School of Computer Science and Technology, Soochow University, Suzhou 215006, China
Fei Zhu School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
Cheng Luo Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China; State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China; School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China; School of Pharmacy, Fujian Medical University, Fuzhou 350122, China.

Collapse

Pakhrin SC, Pokharel S, Pratyush P, Chaudhari M, Ismail HD, Kc DB. LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model. J Proteome Res 2023;22:2548-2557. [PMID: 37459437 DOI: 10.1021/acs.jproteome.2c00667] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/05/2023]

Raslan MA, Raslan SA, Shehata EM, Mahmoud AS, Sabri NA. Advances in the Applications of Bioinformatics and Chemoinformatics. Pharmaceuticals (Basel) 2023;16:1050. [PMID: 37513961 PMCID: PMC10384252 DOI: 10.3390/ph16071050] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 07/19/2023] [Accepted: 07/20/2023] [Indexed: 07/30/2023] Open

Manzoori S, Farahani AHK, Moradi MH, Kazemi-Bonchenari M. Detecting SNP markers discriminating horse breeds by deep learning. Sci Rep 2023;13:11592. [PMID: 37464049 DOI: 10.1038/s41598-023-38601-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Accepted: 07/11/2023] [Indexed: 07/20/2023] Open

Abstract

The assignment of an individual to the true population of origin using a low-panel of discriminant SNP markers is one of the most important applications of genomic data for practical use. The aim of this study was to evaluate the potential of different Artificial Neural Networks (ANNs) approaches consisting Deep Neural Networks (DNN), Garson and Olden methods for feature selection of informative SNP markers from high-throughput genotyping data, that would be able to trace the true breed of unknown samples. The total of 795 animals from 37 breeds, genotyped by using the Illumina SNP 50k Bead chip were used in the current study and principal component analysis (PCA), log-likelihood ratios (LLR) and Neighbor-Joining (NJ) were applied to assess the performance of different assignment methods. The results revealed that the DNN, Garson, and Olden methods are able to assign individuals to true populations with 4270, 4937, and 7999 SNP markers, respectively. The PCA was used to determine how the animals allocated to the groups using all genotyped markers available on 50k Bead chip and the subset of SNP markers identified with different methods. The results indicated that all SNP panels are able to assign individuals into their true breeds. The success percentage of genetic assignment for different methods assessed by different levels of LLR showed that the success rate of 70% in the analysis was obtained by three methods with the number of markers of 110, 208, and 178 tags for DNN, Garson, and Olden methods, respectively. Also the results showed that DNN performed better than other two approaches by achieving 93% accuracy at the most stringent threshold. Finally, the identified SNPs were successfully used in independent out-group breeds consisting 120 individuals from eight breeds and the results indicated that these markers are able to correctly allocate all unknown samples to true population of origin. Furthermore, the NJ tree of allele-sharing distances on the validation dataset showed that the DNN has a high potential for feature selection. In general, the results of this study indicated that the DNN technique represents an efficient strategy for selecting a reduced pool of highly discriminant markers for assigning individuals to the true population of origin.

Collapse

Kim Y, Lee H. PINNet: a deep neural network with pathway prior knowledge for Alzheimer's disease. Front Aging Neurosci 2023;15:1126156. [PMID: 37520124 PMCID: PMC10380929 DOI: 10.3389/fnagi.2023.1126156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2022] [Accepted: 06/20/2023] [Indexed: 08/01/2023] Open

Zhang G, Tang Q, Feng P, Chen W. IPs-GRUAtt: An attention-based bidirectional gated recurrent unit network for predicting phosphorylation sites of SARS-CoV-2 infection. MOLECULAR THERAPY. NUCLEIC ACIDS 2023;32:28-35. [PMID: 36908648 PMCID: PMC9968446 DOI: 10.1016/j.omtn.2023.02.027] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Accepted: 02/22/2023] [Indexed: 02/27/2023]

Varshney N, Mishra AK. Deep Learning in Phosphoproteomics: Methods and Application in Cancer Drug Discovery. Proteomes 2023;11:proteomes11020016. [PMID: 37218921 DOI: 10.3390/proteomes11020016] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/24/2023] Open

Wang C, Yang Q. ScerePhoSite: An interpretable method for identifying fungal phosphorylation sites in proteins using sequence-based features. Comput Biol Med 2023;158:106798. [PMID: 36966555 DOI: 10.1016/j.compbiomed.2023.106798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2023] [Revised: 03/03/2023] [Accepted: 03/20/2023] [Indexed: 03/31/2023]

Wang M, Yan L, Jia J, Lai J, Zhou H, Yu B. DE-MHAIPs: Identification of SARS-CoV-2 phosphorylation sites based on differential evolution multi-feature learning and multi-head attention mechanism. Comput Biol Med 2023;160:106935. [PMID: 37120990 PMCID: PMC10140648 DOI: 10.1016/j.compbiomed.2023.106935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2023] [Revised: 03/12/2023] [Accepted: 04/13/2023] [Indexed: 05/02/2023]

Zhou H, Tan W, Shi S. DeepGpgs: a novel deep learning framework for predicting arginine methylation sites combined with Gaussian prior and gated self-attention mechanism. Brief Bioinform 2023;24:7000314. [PMID: 36694944 DOI: 10.1093/bib/bbad018] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 12/26/2022] [Accepted: 01/04/2023] [Indexed: 01/26/2023] Open

Zhou Z, Yeung W, Gravel N, Salcedo M, Soleymani S, Li S, Kannan N. Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions. Bioinformatics 2023;39:7000331. [PMID: 36692152 PMCID: PMC9900213 DOI: 10.1093/bioinformatics/btad046] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Revised: 01/16/2023] [Accepted: 01/23/2023] [Indexed: 01/25/2023] Open

Abstract

MOTIVATION

The human genome encodes over 500 distinct protein kinases which regulate nearly all cellular processes by the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics studies have identified thousands of phosphorylation sites across species, information on the specific kinases that phosphorylate these sites is currently lacking for the vast majority of phosphosites. Recently, there has been a major focus on the development of computational models for predicting kinase-substrate associations. However, most current models only allow predictions on a subset of well-studied kinases. Furthermore, the utilization of hand-curated features and imbalances in training and testing datasets pose unique challenges in the development of accurate predictive models for kinase-specific phosphorylation prediction. Motivated by the recent development of universal protein language models which automatically generate context-aware features from primary sequence information, we sought to develop a unified framework for kinase-specific phosphosite prediction, allowing for greater investigative utility and enabling substrate predictions at the whole kinome level.

RESULTS

We present a deep learning model for kinase-specific phosphosite prediction, termed Phosformer, which predicts the probability of phosphorylation given an arbitrary pair of unaligned kinase and substrate peptide sequences. We demonstrate that Phosformer implicitly learns evolutionary and functional features during training, removing the need for feature curation and engineering. Further analyses reveal that Phosformer also learns substrate specificity motifs and is able to distinguish between functionally distinct kinase families. Benchmarks indicate that Phosformer exhibits significant improvements compared to the state-of-the-art models, while also presenting a more generalized, unified, and interpretable predictive framework.

AVAILABILITY AND IMPLEMENTATION

Code and data are available at https://github.com/esbgkannan/phosformer.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Xiao D, Chen C, Yang P. Computational systems approach towards phosphoproteomics and their downstream regulation. Proteomics 2023;23:e2200068. [PMID: 35580145 DOI: 10.1002/pmic.202200068] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2022] [Revised: 04/26/2022] [Accepted: 05/03/2022] [Indexed: 11/07/2022]

Li Z, Gao E, Zhou J, Han W, Xu X, Gao X. Applications of deep learning in understanding gene regulation. CELL REPORTS METHODS 2023;3:100384. [PMID: 36814848 PMCID: PMC9939384 DOI: 10.1016/j.crmeth.2022.100384] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]

Affiliation(s)

Zhongxiao Li Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Elva Gao The KAUST School, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Juexiao Zhou Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Wenkai Han Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Xiaopeng Xu Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Xin Gao Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia

Collapse

Ullah M, Hadi F, Song J, Yu DJ. PScL-2LSAESM: bioimage-based prediction of protein subcellular localization by integrating heterogeneous features with the two-level SAE-SM and mean ensemble method. Bioinformatics 2023;39:6839969. [PMID: 36413068 PMCID: PMC9947927 DOI: 10.1093/bioinformatics/btac727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Revised: 11/02/2022] [Accepted: 11/21/2022] [Indexed: 11/23/2022] Open

Abstract

MOTIVATION

Over the past decades, a variety of in silico methods have been developed to predict protein subcellular localization within cells. However, a common and major challenge in the design and development of such methods is how to effectively utilize the heterogeneous feature sets extracted from bioimages. In this regards, limited efforts have been undertaken.

RESULTS

We propose a new two-level stacked autoencoder network (termed 2L-SAE-SM) to improve its performance by integrating the heterogeneous feature sets. In particular, in the first level of 2L-SAE-SM, each optimal heterogeneous feature set is fed to train our designed stacked autoencoder network (SAE-SM). All the trained SAE-SMs in the first level can output the decision sets based on their respective optimal heterogeneous feature sets, known as 'intermediate decision' sets. Such intermediate decision sets are then ensembled using the mean ensemble method to generate the 'intermediate feature' set for the second-level SAE-SM. Using the proposed framework, we further develop a novel predictor, referred to as PScL-2LSAESM, to characterize image-based protein subcellular localization. Extensive benchmarking experiments on the latest benchmark training and independent test datasets collected from the human protein atlas databank demonstrate the effectiveness of the proposed 2L-SAE-SM framework for the integration of heterogeneous feature sets. Moreover, performance comparison of the proposed PScL-2LSAESM with current state-of-the-art methods further illustrates that PScL-2LSAESM clearly outperforms the existing state-of-the-art methods for the task of protein subcellular localization.

AVAILABILITY AND IMPLEMENTATION

https://github.com/csbio-njust-edu/PScL-2LSAESM.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Ahmed F, Dehzangi I, Hasan MM, Shatabda S. Accurately predicting microbial phosphorylation sites using evolutionary and structural features. Gene 2023;851:146993. [DOI: 10.1016/j.gene.2022.146993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Revised: 10/05/2022] [Accepted: 10/14/2022] [Indexed: 11/27/2022]

Jiang H, Shang S, Sha Y, Zhang L, He N, Li L. EdeepSADPr: an extensive deep-learning architecture for prediction of the in situ crosstalks of serine phosphorylation and ADP-ribosylation. Front Cell Dev Biol 2023;11:1149535. [PMID: 37187615 PMCID: PMC10175571 DOI: 10.3389/fcell.2023.1149535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2023] [Accepted: 04/17/2023] [Indexed: 05/17/2023] Open

Abstract

The in situ post-translational modification (PTM) crosstalk refers to the interactions between different types of PTMs that occur on the same residue site of a protein. The crosstalk sites generally have different characteristics from those with the single PTM type. Studies targeting the latter's features have been widely conducted, while studies on the former's characteristics are rare. For example, the characteristics of serine phosphorylation (pS) and serine ADP-ribosylation (SADPr) have been investigated, whereas those of their in situ crosstalks (pSADPr) are unknown. In this study, we collected 3,250 human pSADPr, 7,520 SADPr, 151,227 pS and 80,096 unmodified serine sites and explored the features of the pSADPr sites. We found that the characteristics of pSADPr sites are more similar to those of SADPr compared to pS or unmodified serine sites. Moreover, the crosstalk sites are likely to be phosphorylated by some kinase families (e.g., AGC, CAMK, STE and TKL) rather than others (e.g., CK1 and CMGC). Additionally, we constructed three classifiers to predict pSADPr sites from the pS dataset, the SADPr dataset and the protein sequences separately. We built and evaluated five deep-learning classifiers in ten-fold cross-validation and independent test datasets. We also used the classifiers as base classifiers to develop a few stacking-based ensemble classifiers to improve performance. The best classifiers had the AUC values of 0.700, 0.914 and 0.954 for recognizing pSADPr sites from the SADPr, pS and unmodified serine sites, respectively. The lowest prediction accuracy was achieved by separating pSADPr and SADPr sites, which is consistent with the observation that pSADPr's characteristics are more similar to those of SADPr than the rest. Finally, we developed an online tool for extensively predicting human pSADPr sites based on the CNN_OH classifier, dubbed EdeepSADPr. It is freely available through http://edeepsadpr.bioinfogo.org/. We expect our investigation will promote a comprehensive understanding of crosstalks.

Collapse

A Novel Capsule Network with Attention Routing to Identify Prokaryote Phosphorylation Sites. Biomolecules 2022;12:biom12121854. [PMID: 36551282 PMCID: PMC9775645 DOI: 10.3390/biom12121854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Revised: 12/07/2022] [Accepted: 12/09/2022] [Indexed: 12/14/2022] Open

Abstract

By denaturing proteins and promoting the formation of multiprotein complexes, protein phosphorylation has important effects on the activity of protein functional molecules and cell signaling. The regulation of protein phosphorylation allows microbes to respond rapidly and reversibly to specific environmental stimuli or niches, which is closely related to the molecular mechanisms of bacterial drug resistance. Accurate prediction of phosphorylation sites (p-site) of prokaryotes can contribute to addressing bacterial resistance and providing new perspectives for developing novel antibacterial drugs. Most existing studies focus on human phosphorylation sites, while tools targeting phosphorylation site identification of prokaryotic proteins are still relatively scarce. This study designs a capsule network-based prediction technique for p-site in prokaryotes. To address the poor scalability and unreliability of dynamic routing processes in the output space of capsule networks, a more reliable way is introduced to learn the consistency between capsules. We incorporate a self-attention mechanism into the routing algorithm to capture the global information of the capsule, reducing the computational effort while enriching the representation capability of the capsule. Aiming at the weak robustness of the model, EcapsP improves the prediction accuracy and stability by introducing shortcuts and unconditional reconfiguration. In addition, the study compares and analyzes the prediction performance based on word vectors, physicochemical properties, and mixing characteristics in predicting serine (Ser/S), threonine (Thr/T), and tyrosine (Tyr/Y) p-site. The comprehensive experimental results show that the accuracy of the developed technique is close to 70% for the identification of the three phosphorylation sites in prokaryotes. Importantly, in side-by-side comparisons with other state-of-the-art predictors, our method improves the Matthews correlation coefficient (MCC) by approximately 7%. The results demonstrate the superiority of EcapsP in terms of high performance and reliability.

Collapse

Zeng Y, Liu D, Wang Y. Identification of phosphorylation site using S-padding strategy based convolutional neural network. Health Inf Sci Syst 2022;10:29. [PMID: 36124094 PMCID: PMC9481819 DOI: 10.1007/s13755-022-00196-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 10/14/2022] Open

Zhao J, Zhuang M, Liu J, Zhang M, Zeng C, Jiang B, Wu J, Song X. pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties. BMC Bioinformatics 2022;23:399. [PMID: 36171552 PMCID: PMC9520798 DOI: 10.1186/s12859-022-04938-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Accepted: 09/16/2022] [Indexed: 11/17/2022] Open

Ullah M, Hadi F, Song J, Yu DJ. PScL-DDCFPred: an ensemble deep learning-based approach for characterizing multiclass subcellular localization of human proteins from bioimage data. Bioinformatics 2022;38:4019-4026. [PMID: 35771606 PMCID: PMC9890309 DOI: 10.1093/bioinformatics/btac432] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Revised: 06/03/2022] [Accepted: 06/28/2022] [Indexed: 02/04/2023] Open

Mini-review: Recent advances in post-translational modification site prediction based on deep learning. Comput Struct Biotechnol J 2022;20:3522-3532. [PMID: 35860402 PMCID: PMC9284371 DOI: 10.1016/j.csbj.2022.06.045] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 11/23/2022] Open

Zhu F, Yang S, Meng F, Zheng Y, Ku X, Luo C, Hu G, Liang Z. Leveraging Protein Dynamics to Identify Functional Phosphorylation Sites using Deep Learning Models. J Chem Inf Model 2022;62:3331-3345. [PMID: 35816597 DOI: 10.1021/acs.jcim.2c00484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022;10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open

Abstract

Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.

Collapse

Affiliation(s)

Jalil Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Luis Ochoa-Toledo Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Mario Javier Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Atocha Aliseda Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Fernando Pérez-Escamirosa Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Nelly F. Altamirano-Bustamante Instituto Nacional de Pediatría, Mexico City, Mexico
Francine Ochoa-Fernández Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Ricardo Zamora-Solís Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Sebastián Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Cristina Revilla-Monsalve Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Nicolás Kemper-Valverde Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Myriam M. Altamirano-Bustamante Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico

Collapse

DeepDA-Ace: A Novel Domain Adaptation Method for Species-Specific Acetylation Site Prediction. MATHEMATICS 2022. [DOI: 10.3390/math10142364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]

Ma R, Li S, Li W, Yao L, Huang HD, Lee TY. KinasePhos 3.0: Redesign and Expansion of the Prediction on Kinase-specific Phosphorylation Sites. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00081-X. [PMID: 35781048 PMCID: PMC10373160 DOI: 10.1016/j.gpb.2022.06.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/21/2021] [Revised: 05/30/2022] [Accepted: 06/27/2022] [Indexed: 06/04/2023]

Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022;2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

Wang B, Wang M, Zhang H, Xu J, Hou J, Zhu Y. Canine Adenovirus 1 Isolation Bioinformatics Analysis of the Fiber. Front Cell Infect Microbiol 2022;12:879360. [PMID: 35770071 PMCID: PMC9235841 DOI: 10.3389/fcimb.2022.879360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022] Open

A Novel

N

-Gram-Based Image Classification Model and Its Applications in Diagnosing Thyroid Nodule and Retinal OCT Images. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022;2022:3151554. [PMID: 35547561 PMCID: PMC9085325 DOI: 10.1155/2022/3151554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/06/2021] [Revised: 04/14/2022] [Accepted: 04/16/2022] [Indexed: 11/18/2022]

Prediction of GPCR activity using Machine Learning. Comput Struct Biotechnol J 2022;20:2564-2573. [PMID: 35685352 PMCID: PMC9163700 DOI: 10.1016/j.csbj.2022.05.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Revised: 05/08/2022] [Accepted: 05/09/2022] [Indexed: 11/20/2022] Open