1
Sun X, Wu Z, Su J, Li C. GraphPBSP: Protein binding site prediction based on Graph Attention Network and pre-trained model ProstT5. Int J Biol Macromol 2024; 282:136933. [PMID: 39471921 DOI: 10.1016/j.ijbiomac.2024.136933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 10/21/2024] [Accepted: 10/24/2024] [Indexed: 11/01/2024]
Abstract
Protein-protein/peptide interactions play crucial roles in various biological processes, and exploring them attracts wide attention. However, accurately predicting their binding sites remains a challenging task. Here, we develop GraphPBSP, a model based on a Graph Attention Network combined with a Convolutional Neural Network and a Multilayer Perceptron for protein-protein/peptide binding site prediction. It uses several feature types derived from protein sequence and structure, including an interface residue pairwise propensity developed by us and sequence embeddings obtained from the recent pre-trained model ProstT5, alongside physicochemical properties and structural features. To the best of our knowledge, this is the first use of ProstT5 sequence embeddings and residue pairwise propensity for protein-protein/peptide binding site prediction. Additionally, we propose a spatial neighbor-based feature statistic method that captures key information from spatially neighboring residues and significantly improves the model's prediction ability. For model training, a multi-scale objective function is constructed, which enhances the learning capability across samples of the same or different classes. On multiple protein-protein/peptide binding site test sets, GraphPBSP outperforms currently available state-of-the-art methods. Its performance on protein-DNA/RNA binding site test sets also demonstrates good generalization ability. In conclusion, GraphPBSP is a promising method that can offer valuable information for protein engineering and drug design.
Affiliation(s)
- Xiaohan Sun
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.
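The GraphPBSP abstract above mentions a spatial neighbor-based feature statistic but does not say which statistic or neighborhood definition is used. As a rough illustration only, the sketch below assumes a simple mean over all residues within a distance cutoff; the function name `neighbor_mean_features`, the 10 Å default cutoff, and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def neighbor_mean_features(coords, features, cutoff=10.0):
    """For each residue, average the feature vectors of all residues
    (itself included) whose coordinates lie within `cutoff` of it.

    Assumption: a plain mean inside a fixed radius; the paper's actual
    statistic may differ.
    """
    dim = len(features[0])
    pooled = []
    for ci in coords:
        # Indices of residues within the cutoff radius of the current residue.
        nbrs = [j for j, cj in enumerate(coords) if math.dist(ci, cj) <= cutoff]
        pooled.append(
            [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        )
    return pooled

# Toy input: three residues, the third far away from the other two.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
features = [[1.0, 0.0], [3.0, 2.0], [10.0, 10.0]]
pooled = neighbor_mean_features(coords, features, cutoff=10.0)
```

In this toy case the first two residues share a neighborhood and therefore receive the same pooled vector, while the isolated third residue keeps its own features.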
2
Hu J, Chen KX, Rao B, Ni JY, Thafar MA, Albaradei S, Arif M. Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism. Anal Biochem 2024; 694:115637. [PMID: 39121938 DOI: 10.1016/j.ab.2024.115637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 07/28/2024] [Accepted: 08/06/2024] [Indexed: 08/12/2024]
Abstract
Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and advancing drug discovery. To address this problem, extensive research efforts have been made to design more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, resulting in low computational efficiency and low predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, for protein-peptide binding residue prediction using protein sequence only. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models that can extract two different high-latent feature representations from protein sequences relevant for protein structures and functions. A novel feature fusion module is then designed in E2EPep to fuse and optimize the above two feature representations of binding residues. In addition, we have also designed E2EPep+, which integrates the E2EPep and PepBCL models, to improve the prediction performance. Experimental results on two independent testing data sets demonstrate that E2EPep and E2EPep+ achieve average AUC values of 0.846 and 0.842, respectively, with an average Matthews correlation coefficient that is significantly higher than that of most existing sequence-based methods and comparable to that of the state-of-the-art structure-based predictors. Detailed data analysis shows that the primary strength of E2EPep lies in the effectiveness of its feature representation, which uses a cross-attention mechanism to fuse the embeddings generated by two fine-tuned protein language models. The standalone package of E2EPep and E2EPep+ can be obtained at https://github.com/ckx259/E2EPep.git for academic use only.
Affiliation(s)
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China; Center for AI and Computational Biology, Suzhou Institution of Systems Medicine, Suzhou, 215123, China.
- Kai-Xin Chen
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Bing Rao
  - School of Information & Electrical Engineering, Hangzhou City University, Hangzhou, 310015, China
- Jing-Yuan Ni
  - NUIST Reading Academy, Nanjing University of Information Science & Technology, Nanjing, 210044, China
- Maha A Thafar
  - Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
- Somayah Albaradei
  - Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, 34110, Qatar.
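The E2EPep abstract above attributes the method's strength to fusing two language-model embeddings with cross-attention. The sketch below shows only the generic scaled dot-product form of cross-attention in plain Python, with one embedding supplying the queries and the other supplying keys and values. It omits the learned projection matrices a real model would have, and the names `cross_attention`, `emb_a`, and `emb_b` are hypothetical, so it should be read as an illustration of the mechanism and not as the E2EPep fusion module.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key
    rows and returns the attention-weighted average of the value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return out

# Toy per-residue embeddings (3 residues, 2 dimensions) standing in for the
# outputs of two different protein language models.
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Residues in emb_a query the residues in emb_b.
fused = cross_attention(emb_a, emb_b, emb_b)
```

Each fused row is a convex combination of the rows of `emb_b`, weighted by how strongly the corresponding `emb_a` residue matches each `emb_b` residue.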
3
Huang J, Li W, Xiao B, Zhao C, Zheng H, Li Y, Wang J. PepCA: Unveiling protein-peptide interaction sites with a multi-input neural network model. iScience 2024; 27:110850. [PMID: 39391726 PMCID: PMC11465048 DOI: 10.1016/j.isci.2024.110850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/13/2024] [Accepted: 08/27/2024] [Indexed: 10/12/2024]
Abstract
The protein-peptide interaction plays a pivotal role in fields such as drug development, yet remains underexplored experimentally and challenging to model computationally. Herein, we introduce PepCA, a sequence-based approach for predicting peptide-binding sites on proteins. A primary obstacle in predicting peptide-protein interactions is the difficulty in acquiring precise protein structures, coupled with the uncertainty of polypeptide configurations. To address this, we first encode protein sequences using the Evolutionary Scale Modeling 2 (ESM-2) pre-trained model to extract latent structural information. Additionally, we have developed a multi-input coattention mechanism to concurrently update the encoding of both peptide and protein residues. PepCA integrates this module within an encoder-decoder structure. This model's high precision in identifying binding sites significantly advances the field of computational biology, offering vital insights for peptide drug development and protein science.
Affiliation(s)
- Junxiong Huang
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Weikang Li
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Bin Xiao
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Chunqing Zhao
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Hancheng Zheng
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
- Yingrui Li
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, UK
  - Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Jun Wang
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau, China
  - Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
4
Shafiee S, Fathi A, Taherzadeh G. DP-site: A dual deep learning-based method for protein-peptide interaction site prediction. Methods 2024; 229:17-29. [PMID: 38871095 DOI: 10.1016/j.ymeth.2024.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/22/2024] [Accepted: 06/01/2024] [Indexed: 06/15/2024]
Abstract
BACKGROUND Protein-peptide interaction prediction is important for several applications, including understanding biological processes, protein function, and abnormal cellular behaviors, as well as drug discovery and disease treatment. Over the years, studies have shown that experimental methods have improved the identification of this bio-molecular interaction. However, predicting protein-peptide interactions using these methods is laborious, time-consuming, dependent on third-party tools, and costly. METHOD To address these drawbacks, this study introduces a computational framework called DP-Site. The proposed framework combines a dual pipeline with a combination predictor. Pipeline 1 embeds a deep convolutional neural network for feature extraction and classification. Pipeline 2 includes a deep long short-term memory-based network and a random forest classifier for feature extraction and classification. In this investigation, the evolutionary, structure-based, sequence-based, and physicochemical information of proteins is utilized for identifying protein-peptide interaction at the residue level. RESULTS The proposed method is evaluated with both ten-fold cross-validation and independent test sets. The robust and consistent results between cross-validation and independent test sets confirm the ability of the proposed method to predict peptide binding residues in proteins. Moreover, experimental findings demonstrate that DP-Site has significantly outperformed other state-of-the-art sequence-based and structure-based methods. The proposed method achieves a remarkable balance between a specificity of 0.799 and a sensitivity of 0.770, along with the best F-measure of 0.661 and the highest precision of 0.580 using an independent test set.
CONCLUSIONS The outcome of various experiments confirms the proficiency of the proposed method, which outperforms state-of-the-art sequence-based and structure-based methods in terms of the mentioned criteria. DP-Site can be accessed at https://github.com/shafiee 95/shima.shafiee.DP-Site.
Affiliation(s)
- Shima Shafiee
  - Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran.
- Abdolhossein Fathi
  - Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran.
- Ghazaleh Taherzadeh
  - Department of Math, Physics, and Computer Science, Wilkes University, Pennsylvania, USA.
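The DP-Site abstract above quotes specificity, sensitivity, precision, and F-measure together. As a quick consistency check, the sketch below assumes the F-measure is the usual F1 score (the harmonic mean of precision and sensitivity) and recomputes it from the quoted precision and sensitivity. The function names are illustrative, and the confusion-matrix helper is included only to show where the four quantities come from.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)

def confusion_metrics(tp, fp, tn, fn):
    """Residue-level metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues recovered
    precision = tp / (tp + fp)    # fraction of predicted binders that are real
    return sensitivity, specificity, precision, f_measure(precision, sensitivity)

# Values quoted in the abstract: precision 0.580, sensitivity 0.770.
f1 = f_measure(0.580, 0.770)
```

Under that assumption the recomputed value comes out near 0.66, which agrees with the reported 0.661 once rounding of the quoted inputs is taken into account.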
5
Indiran AP, Fatima H, Chattopadhyay S, Ramadoss S, Radhakrishnan Y. UmamiPreDL: Deep learning model for umami taste prediction of peptides using BERT and CNN. Comput Biol Chem 2024; 111:108116. [PMID: 38823360 DOI: 10.1016/j.compbiolchem.2024.108116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 04/24/2024] [Accepted: 05/28/2024] [Indexed: 06/03/2024]
Abstract
Taste is crucial in driving food choice and preference. Umami is one of the basic tastes, defined by the characteristic deliciousness and mouthfulness that it imparts to foods. Identification of ingredients that enhance umami taste is of significant value to the food industry. Various models have been shown to predict umami taste using feature encodings derived from traditional molecular descriptors such as amphiphilic pseudo-amino acid composition, dipeptide composition, and composition-transition-distribution. The highest reported accuracy, 90.5%, was recently achieved through a novel model architecture. Here, we propose the use of biological sequence transformers such as ProtBert and ESM2, trained on the UniRef databases, as the feature encoder block. By combining two encoders and two classifiers, four model architectures were developed. Among the four models, the ProtBert-CNN model outperformed the others, with an accuracy of 95% on 5-fold cross-validation data and 94% on independent data.
Affiliation(s)
- Arun Pandiyan Indiran
  - ITC Life Sciences and Technology Centre, Peenya Industrial Area, 1st Phase, Bengaluru 560058, India
- Humaira Fatima
  - ITC Life Sciences and Technology Centre, Peenya Industrial Area, 1st Phase, Bengaluru 560058, India
- Sureshkumar Ramadoss
  - ITC Life Sciences and Technology Centre, Peenya Industrial Area, 1st Phase, Bengaluru 560058, India; ITC Infotech India Limited, Bengaluru 560005, India
- Yashwanth Radhakrishnan
  - ITC Life Sciences and Technology Centre, Peenya Industrial Area, 1st Phase, Bengaluru 560058, India.
6
Zhu C, Zhang C, Shang T, Zhang C, Zhai S, Cao L, Xu Z, Su Z, Song Y, Su A, Li C, Duan H. GAPS: a geometric attention-based network for peptide binding site identification by the transfer learning approach. Brief Bioinform 2024; 25:bbae297. [PMID: 38990514 PMCID: PMC11238429 DOI: 10.1093/bib/bbae297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/28/2024] [Accepted: 06/07/2024] [Indexed: 07/12/2024]
Abstract
Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites is a critical task. The traditional experimental methods for studying these binding sites are labor-intensive and time-consuming, and computational tools have been developed to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To address these drawbacks, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy is implemented to leverage protein-protein binding site information and enhance the recognition of protein-peptide binding sites, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments, including predicting apo protein-peptide, protein-cyclic peptide, and AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.
Affiliation(s)
- Cheng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Chengyun Zhang
  - AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Tianfeng Shang
  - AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Chenhao Zhang
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Silong Zhai
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Lujing Cao
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Zhenyu Xu
  - AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Zhihao Su
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Ying Song
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- An Su
  - College of Chemical Engineering, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Chengxi Li
  - College of Chemical and Biological Engineering, Zhejiang University, Yuhangtang Road, Xihu District, Hangzhou 310027, China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
7
Yin S, Mi X, Shukla D. Leveraging machine learning models for peptide-protein interaction prediction. RSC Chem Biol 2024; 5:401-417. [PMID: 38725911 PMCID: PMC11078210 DOI: 10.1039/d3cb00208j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 02/07/2024] [Indexed: 05/12/2024]
Abstract
Peptides play a pivotal role in a wide range of biological activities by participating in up to 40% of protein-protein interactions in cellular processes. They also demonstrate remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes by traditional computational approaches, such as docking and molecular dynamics simulations, remains a challenge due to the high computational cost, the flexible nature of peptides, and the limited structural information on peptide-protein complexes. In recent years, the surge of available biological data has given rise to an increasing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to the challenges associated with traditional computational approaches. Furthermore, they offer enhanced accuracy, robustness, and interpretability in their predictive outcomes. This review presents a comprehensive overview of machine learning and deep learning models that have emerged in recent years for the prediction of peptide-protein interactions.
Affiliation(s)
- Song Yin
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign Urbana 61801 Illinois USA
- Xuenan Mi
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign Urbana IL 61801 USA
- Diwakar Shukla
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign Urbana 61801 Illinois USA
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign Urbana IL 61801 USA
  - Department of Bioengineering, University of Illinois Urbana-Champaign Urbana IL 61801 USA
8
Zhang J, Wang R, Wei L. MucLiPred: Multi-Level Contrastive Learning for Predicting Nucleic Acid Binding Residues of Proteins. J Chem Inf Model 2024; 64:1050-1065. [PMID: 38301174 DOI: 10.1021/acs.jcim.3c01471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Protein-molecule interactions play a crucial role in various biological functions, with their accurate prediction being pivotal for drug discovery and design processes. Traditional methods for predicting protein-molecule interactions are limited. Some can only predict interactions with a specific molecule, restricting their applicability, while others target multiple molecule types but fail to efficiently process diverse interaction information, leading to complexity and inefficiency. This study presents a novel deep learning model, MucLiPred, equipped with a dual contrastive learning mechanism aimed at improving the prediction of multiple molecule-protein interactions and the identification of potential molecule-binding residues. The residue-level paradigm focuses on differentiating binding from non-binding residues, illuminating detailed local interactions. The type-level paradigm, meanwhile, analyzes overarching contexts of molecule types, like DNA or RNA, ensuring that representations of identical molecule types gravitate closer in the representational space, bolstering the model's proficiency in discerning interaction motifs. This dual approach enables comprehensive multi-molecule predictions, elucidating the relationships among different molecule types and strengthening precise protein-molecule interaction predictions. Empirical evidence demonstrates MucLiPred's superiority over existing models in robustness and prediction accuracy. The integration of dual contrastive learning techniques amplifies its capability to detect potential molecule-binding residues with precision. Further optimization, separating representational and classification tasks, has markedly improved its performance. MucLiPred thus represents a significant advancement in protein-molecule interaction prediction, setting a new precedent for future research in this field.
Affiliation(s)
- Jiashuo Zhang
  - School of Software, Shandong University, Jinan 250101, China
- Ruheng Wang
  - School of Software, Shandong University, Jinan 250101, China
- Leyi Wei
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
9
Fang Y, Jiang Y, Wei L, Ma Q, Ren Z, Yuan Q, Wei DQ. DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model. Bioinformatics 2023; 39:btad718. [PMID: 38015872 PMCID: PMC10723037 DOI: 10.1093/bioinformatics/btad718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 11/04/2023] [Accepted: 11/27/2023] [Indexed: 11/30/2023]
Abstract
MOTIVATION Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and for drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS In this study, DeepProSite is presented as a new framework for identifying protein binding sites that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses a Graph Transformer and formulates binding site prediction as graph node classification. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting on unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of binding residues is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.
Affiliation(s)
- Yitian Fang
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
  - Peng Cheng Laboratory, Shenzhen 518055, China
- Yi Jiang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Leyi Wei
  - School of Software, Shandong University, Jinan, Shandong 250100, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
  - Peng Cheng Laboratory, Shenzhen 518055, China
10
Chandra A, Sharma A, Dehzangi I, Tsunoda T, Sattar A. PepCNN deep learning tool for predicting peptide binding residues in proteins using sequence, structural, and language model features. Sci Rep 2023; 13:20882. [PMID: 38016996 PMCID: PMC10684570 DOI: 10.1038/s41598-023-47624-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2023] [Accepted: 11/16/2023] [Indexed: 11/30/2023]
Abstract
Protein-peptide interactions play a crucial role in various cellular processes and are implicated in abnormal cellular behaviors leading to diseases such as cancer. Therefore, understanding these interactions is vital for both functional genomics and drug discovery efforts. Despite a significant increase in the availability of protein-peptide complexes, experimental methods for studying these interactions remain laborious, time-consuming, and expensive. Computational methods offer a complementary approach but often fall short in terms of prediction accuracy. To address these challenges, we introduce PepCNN, a deep learning-based prediction model that incorporates structural and sequence-based information from primary protein sequences. By utilizing a combination of half-sphere exposure, position-specific scoring matrices from a multiple-sequence alignment tool, and embeddings from a pre-trained protein language model, PepCNN outperforms state-of-the-art methods in terms of specificity, precision, and AUC. The PepCNN software and datasets are publicly available at https://github.com/abelavit/PepCNN.git.
Affiliation(s)
- Abel Chandra
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.
- Alok Sharma
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.
  - Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan.
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Iman Dehzangi
  - Department of Computer Science, Rutgers University, Camden, NJ, USA
  - Center for Computational and Integrative Biology, Rutgers University, Camden, USA
- Tatsuhiko Tsunoda
  - Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Abdul Sattar
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
11
Liu S, Liang Y, Li J, Yang S, Liu M, Liu C, Yang D, Zuo Y. Integrating reduced amino acid composition into PSSM for improving copper ion-binding protein prediction. Int J Biol Macromol 2023:124993. [PMID: 37307968 DOI: 10.1016/j.ijbiomac.2023.124993] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 05/12/2023] [Accepted: 05/19/2023] [Indexed: 06/14/2023]
Abstract
Copper ion-binding proteins play an essential role in metabolic processes and are critical factors in many diseases, such as breast cancer, lung cancer, and Menkes disease. Many algorithms have been developed for predicting metal ion classification and binding sites, but none have been applied to copper ion-binding proteins. In this study, we developed a copper ion-binding protein classifier, RPCIBP, which integrates the reduced amino acid composition into the position-specific score matrix (PSSM). The reduced amino acid composition filters out a large number of useless evolutionary features, improving the operational efficiency and predictive ability of the model (feature dimension reduced from 2900 to 200, ACC raised from 83% to 85.1%). Compared with the basic model using only three sequence feature extraction methods (ACC in the training set between 73.8% and 86.2%, ACC in the test set between 69.3% and 87.5%), the model integrating the evolutionary features of the reduced amino acid composition showed higher accuracy and robustness (ACC in the training set between 83.1% and 90.8%, ACC in the test set between 79.1% and 91.9%). The best copper ion-binding protein classifiers, selected by the feature selection process, were deployed in a user-friendly web server (http://bioinfor.imu.edu.cn/RPCIBP). RPCIBP can accurately predict copper ion-binding proteins, which is convenient for further structural and functional studies and conducive to mechanism exploration and targeted drug development.
Collapse
Affiliation(s)
- Shanghua Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
| | - Yuchao Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China
- Jinzhao Li
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Siqi Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Ming Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Chengfang Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Dezhi Yang
- Inner Mongolia International Mongolian Hospital, Hohhot 010065, China.
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China.
12
|
Kotb HM, Davey NE. xProtCAS: A Toolkit for Extracting Conserved Accessible Surfaces from Protein Structures. Biomolecules 2023; 13:906. [PMID: 37371487 DOI: 10.3390/biom13060906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 06/29/2023] Open
Abstract
The identification of protein surfaces required for interaction with other biomolecules broadens our understanding of protein function, their regulation by post-translational modification, and the deleterious effect of disease mutations. Protein interaction interfaces are often identifiable as patches of conserved residues on a protein's surface. However, finding conserved accessible surfaces on folded regions requires an understanding of the protein structure to discriminate between functional and structural constraints on residue conservation. With the emergence of deep learning methods for protein structure prediction, high-quality structural models are now available for any protein. In this study, we introduce tools to identify conserved surfaces on AlphaFold2 structural models. We define autonomous structural modules from the structural models and convert these modules to a graph encoding residue topology, accessibility, and conservation. Conserved surfaces are then extracted using a novel eigenvector centrality-based approach. We apply the tool to the human proteome identifying hundreds of uncharacterised yet highly conserved surfaces, many of which contain clinically significant mutations. The xProtCAS tool is available as open-source Python software and an interactive web server.
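To make the centrality idea concrete, here is a minimal sketch of eigenvector centrality computed by power iteration on a small residue-contact graph. It is not the xProtCAS implementation: the function name, the toy adjacency matrix, and the use of unweighted edges are assumptions. In a setting like the one described, edge weights could hypothetically encode accessibility or conservation.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-9):
    """Power iteration on (A + I); the identity shift avoids oscillation on bipartite graphs."""
    n = len(adjacency)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply by (A + I): each node keeps its own score and adds its neighbours' scores.
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(value * value for value in updated) ** 0.5
        if norm == 0:
            return scores
        updated = [value / norm for value in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Toy graph: residues 0, 1, 2 form a triangle; residue 3 touches only residue 0.
contacts = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
centrality = eigenvector_centrality(contacts)
```

In this toy graph the best-connected residue (index 0) receives the highest score and the pendant residue (index 3) the lowest, which is the ranking behaviour a centrality-based surface extraction would rely on.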
Affiliation(s)
- Hazem M Kotb
- Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Norman E Davey
- Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
13
|
Lyu Y, He R, Hu J, Wang C, Gong X. Prediction of the tetramer protein complex interaction based on CNN and SVM. Front Genet 2023; 14:1076904. [PMID: 36777731 PMCID: PMC9909274 DOI: 10.3389/fgene.2023.1076904] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/25/2022] [Accepted: 01/16/2023] [Indexed: 01/27/2023] Open
Abstract
Protein-protein interactions play an important role in life activities. Studying them helps to better understand the mechanism of protein complex interaction, which is crucial for drug design, protein function annotation and three-dimensional structure prediction of protein complexes. In this paper, we study the tetramer protein complex interaction. The research has two parts. The first part predicts the interaction between chains of the tetramer protein complex: we proposed a feature map to represent a sample generated by two chains of the complex and constructed a Convolutional Neural Network (CNN) model to predict the interaction between chains. The AUC value on the testing set is 0.6263, which indicates that our model can be used to predict the interaction between chains of the tetramer protein complex. The second part predicts the tetramer protein complex interface residue pairs: we proposed a Support Vector Machine (SVM) ensemble method based on under-sampling to predict these residue pairs. In the top 10 predictions, when at least one protein-protein interaction interface is correctly predicted, the accuracy of our method is 82.14%. The result shows that our method is effective for the prediction of the tetramer protein complex interface residue pairs.
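The under-sampling ensemble idea can be sketched as follows: each base model is trained on all interface (positive) pairs plus an equally sized random draw of non-interface (negative) pairs, and the base models' outputs are averaged. This is only an illustrative outline under assumed names (`balanced_subsets`, `ensemble_score`); the base learners here are placeholder callables rather than trained SVMs, and the paper's actual sampling and voting rules are not given in the abstract.

```python
import random


def balanced_subsets(positives, negatives, n_models, seed=0):
    """Build one balanced training set per base model by under-sampling the negatives.

    Assumes there are at least as many negatives as positives.
    """
    rng = random.Random(seed)
    return [
        list(positives) + rng.sample(list(negatives), len(positives))
        for _ in range(n_models)
    ]


def ensemble_score(models, sample):
    """Average the scores of all base models for one candidate residue pair."""
    return sum(model(sample) for model in models) / len(models)


# Toy data: 3 interface pairs versus 30 non-interface pairs.
toy_positives = [("pos", i) for i in range(3)]
toy_negatives = [("neg", i) for i in range(30)]
toy_subsets = balanced_subsets(toy_positives, toy_negatives, n_models=5)

# Placeholder "models": any callable returning a score would fit here.
toy_models = [lambda pair: 1.0, lambda pair: 0.0]
toy_score = ensemble_score(toy_models, ("pos", 0))
```

Because each subset is balanced, no single base model is dominated by the far more numerous non-interface pairs, while the ensemble as a whole still sees many different negatives.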
Affiliation(s)
- Yanfen Lyu
- Department of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan, China
- Ruonan He
- School of Information, Renmin University of China, Beijing, China
- Jingjing Hu
- Department of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan, China
- Chunxia Wang
- School of Landscape and Ecological Engineering, Hebei University of Engineering, Handan, China. Correspondence: Chunxia Wang; Xinqi Gong
- Xinqi Gong
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, School of Math, Renmin University of China, Beijing, China; Beijing Academy of Artificial Intelligence, Beijing, China. Correspondence: Chunxia Wang; Xinqi Gong
14
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged however by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
15
|
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
16
|
Abdin O, Nim S, Wen H, Kim PM. PepNN: a deep attention model for the identification of peptide binding sites. Commun Biol 2022; 5:503. [PMID: 35618814 PMCID: PMC9135736 DOI: 10.1038/s42003-022-03445-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/14/2022] [Accepted: 05/03/2022] [Indexed: 11/09/2022] Open
Abstract
Protein-peptide interactions play a fundamental role in many cellular processes, but remain underexplored experimentally and difficult to model computationally. Here, we present PepNN-Struct and PepNN-Seq, structure and sequence-based approaches for the prediction of peptide binding sites on a protein. A main difficulty for the prediction of peptide-protein interactions is the flexibility of peptides and their tendency to undergo conformational changes upon binding. Motivated by this, we developed reciprocal attention to simultaneously update the encodings of peptide and protein residues while enforcing symmetry, allowing for information flow between the two inputs. PepNN integrates this module with modern graph neural network layers and a series of transfer learning steps are used during training to compensate for the scarcity of peptide-protein complex information. We show that PepNN-Struct achieves consistently high performance across different benchmark datasets. We also show that PepNN makes reasonable peptide-agnostic predictions, allowing for the identification of novel peptide binding proteins.
Affiliation(s)
- Osama Abdin
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Satra Nim
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Han Wen
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Philip M Kim
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada.
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 3E1, Canada.
17
|
Wang R, Jin J, Zou Q, Nakai K, Wei L. Predicting protein-peptide binding residues via interpretable deep learning. Bioinformatics 2022; 38:3351-3360. [PMID: 35604077 DOI: 10.1093/bioinformatics/btac352] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/15/2021] [Revised: 04/13/2022] [Accepted: 05/18/2022] [Indexed: 11/14/2022] Open
Abstract
Identifying protein-peptide binding residues is fundamentally important for understanding the mechanisms of protein function and for drug discovery. Although several computational methods have been developed, they rely heavily on third-party tools or information for feature design, which easily results in low computational efficacy and low predictive performance. To address these limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representation from Transformers)-based Contrastive Learning framework to predict protein-peptide binding residues based on protein sequences only. PepBCL is an end-to-end predictive model that is independent of designed features. Specifically, we introduce a well pre-trained protein language model that can automatically extract and learn high-latent representations of protein sequences relevant to protein structure and function. Further, we design a novel contrastive learning module to optimize the feature representations of binding residues underlying the imbalanced dataset. We demonstrate that our proposed method significantly outperforms the state-of-the-art methods under benchmarking comparison and achieves more robust performance. Moreover, we found that the performance can be further improved by integrating traditional features with our learnt features. Our results highlight the flexibility and adaptability of deep learning-based protein language models in capturing both conserved and non-conserved sequential characteristics of peptide-binding residues. Interestingly, we demonstrate that peptide-binding residues in local sequential regions have more specific sequential patterns than other protein-ligand binding residues, which potentially reflects functional differences. Finally, to facilitate the use of our method, we establish an online predictive platform implementing the proposed PepBCL, which is now available at http://server.wei-group.net/PepBCL/.
Availability: https://github.com/Ruheng-W/PepBCL. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruheng Wang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Junru Jin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
- Kenta Nakai
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
18
|
Kozlovskii I, Popov P. Protein-Peptide Binding Site Detection Using 3D Convolutional Neural Networks. J Chem Inf Model 2021; 61:3814-3823. [PMID: 34292750 DOI: 10.1021/acs.jcim.1c00475] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
Peptides and peptide-based molecules represent a promising therapeutic modality targeting intracellular protein-protein interactions, potentially combining the beneficial properties of biologics and small-molecule drugs. Protein-peptide complexes occupy a unique niche of interaction interfaces with respect to protein-protein and protein-small molecule complexes. Protein-peptide binding site identification resembles image object detection, a field that has been revolutionized by computer vision techniques. We present a new protein-peptide binding site detection method called BiteNetPp that harnesses the power of a 3D convolutional neural network. Our method employs a tensor-based representation of spatial protein structures, which is fed to a 3D convolutional neural network, resulting in probability scores and coordinates of the binding "hot spots" in the input structures. We used the domain adaptation technique to fine-tune a model trained on protein-small molecule complexes using a manually curated set of protein-peptide structures. BiteNetPp consistently outperforms existing state-of-the-art methods in the independent test benchmark. It takes less than a second to analyze a single protein structure, making BiteNetPp suitable for the large-scale analysis of protein-peptide binding sites.
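A tensor-based representation of a structure can be pictured as a voxel grid with one channel per atom type, which is the kind of input a 3D convolutional network consumes. The sketch below is a simplified illustration under stated assumptions: the function name `voxelize`, the four element channels, the 1 Å voxel size, and the use of plain occupancy counts are all hypothetical and are not taken from BiteNetPp.

```python
def voxelize(atoms, origin, grid_size, voxel=1.0, channels=("C", "N", "O", "S")):
    """Count atoms per voxel, with one grid_size^3 grid per element channel.

    atoms: iterable of (element, x, y, z) tuples; elements outside `channels`
    and atoms outside the grid are skipped.
    """
    channel_of = {element: index for index, element in enumerate(channels)}
    grid = [
        [[[0.0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
        for _ in channels
    ]
    for element, x, y, z in atoms:
        channel = channel_of.get(element)
        if channel is None:
            continue
        # Map each coordinate to a voxel index relative to the grid origin.
        i, j, k = (int((coord - start) // voxel) for coord, start in zip((x, y, z), origin))
        if all(0 <= index < grid_size for index in (i, j, k)):
            grid[channel][i][j][k] += 1.0
    return grid


toy_atoms = [
    ("C", 0.5, 0.5, 0.5),
    ("N", 1.2, 0.1, 2.9),
    ("H", 0.0, 0.0, 0.0),     # not in the channel list, skipped
    ("O", 10.0, 10.0, 10.0),  # outside the 4 x 4 x 4 grid, skipped
]
toy_grid = voxelize(toy_atoms, origin=(0.0, 0.0, 0.0), grid_size=4)
```

The resulting nested list has shape (channels, grid_size, grid_size, grid_size); in practice such a grid would be converted to an array type supported by the chosen deep learning framework.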
Affiliation(s)
- Igor Kozlovskii
- iMolecule, Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- Petr Popov
- iMolecule, Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
19
|
Puentes PR, Henao MC, Torres CE, Gómez SC, Gómez LA, Burgos JC, Arbeláez P, Osma JF, Muñoz-Camargo C, Reyes LH, Cruz JC. Design, Screening, and Testing of Non-Rational Peptide Libraries with Antimicrobial Activity: In Silico and Experimental Approaches. Antibiotics (Basel) 2020; 9:E854. [PMID: 33265897 PMCID: PMC7759991 DOI: 10.3390/antibiotics9120854] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/05/2020] [Revised: 11/20/2020] [Accepted: 11/23/2020] [Indexed: 12/13/2022] Open
Abstract
One of the challenges of modern biotechnology is to find new routes to mitigate the resistance to conventional antibiotics. Antimicrobial peptides (AMPs) are an alternative type of biomolecule, naturally present in a wide variety of organisms, with the capacity to overcome the current microorganism resistance threat. Here, we reviewed our recent efforts to develop a new library of non-rationally produced AMPs that relies on the inherent diversity of bacterial genomes and compared it with rationally designed libraries. Our approach is based on a four-stage workflow process that incorporates the interplay of recent developments in four major emerging technologies: artificial intelligence, molecular dynamics, surface-display in microorganisms, and microfluidics. Implementing this framework is challenging because, to obtain reliable results, the in silico algorithms to search for candidate AMPs need to overcome issues of the state-of-the-art approaches that limit the possibilities for multi-space data distribution analyses in extremely large databases. We expect to tackle this challenge by using a recently developed classification algorithm based on deep learning models that rely on convolutional layers and gated recurrent units. This will be complemented by carefully tailored molecular dynamics simulations to elucidate specific interactions with lipid bilayers. Candidate AMPs will be recombinantly expressed on the surface of microorganisms for further screening via different droplet-based microfluidic strategies to identify AMPs with the desired lytic abilities. We believe that the proposed approach opens opportunities for searching and screening bioactive peptides for other applications.
Affiliation(s)
- Paola Ruiz Puentes
- Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogota DC 111711, Colombia; (P.R.P.); (P.A.)
- Department of Biomedical Engineering, Universidad de los Andes, Bogota DC 111711, Colombia; (C.E.T.); (S.C.G.); (L.A.G.); (C.M.-C.)
- María C. Henao
- Grupo de Diseño de Productos y Procesos, Department of Chemical and Food Engineering, Universidad de los Andes, Bogota DC 111711, Colombia
- Carlos E. Torres
- Department of Biomedical Engineering, Universidad de los Andes, Bogota DC 111711, Colombia; (C.E.T.); (S.C.G.); (L.A.G.); (C.M.-C.)
- Saúl C. Gómez
- Department of Biomedical Engineering, Universidad de los Andes, Bogota DC 111711, Colombia; (C.E.T.); (S.C.G.); (L.A.G.); (C.M.-C.)
- Laura A. Gómez
- Department of Biomedical Engineering, Universidad de los Andes, Bogota DC 111711, Colombia; (C.E.T.); (S.C.G.); (L.A.G.); (C.M.-C.)
- Juan C. Burgos
- Chemical Engineering Program, Universidad de Cartagena, Cartagena 130015, Colombia
- Pablo Arbeláez
- Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogota DC 111711, Colombia; (P.R.P.); (P.A.)
- Department of Biomedical Engineering, Universidad de los Andes, Bogota DC 111711, Colombia; (C.E.T.); (S.C.G.); (L.A.G.); (C.M.-C.)
- Johann F. Osma
- Department of Electrical and Electronic Engineering, Universidad de los Andes, Bogota DC 111711, Colombia
- Carolina Muñoz-Camargo
- Department of Biomedical Engineering, Universidad de los Andes, Bogota DC 111711, Colombia; (C.E.T.); (S.C.G.); (L.A.G.); (C.M.-C.)
- Luis H. Reyes
- Grupo de Diseño de Productos y Procesos, Department of Chemical and Food Engineering, Universidad de los Andes, Bogota DC 111711, Colombia
- Juan C. Cruz
- Department of Biomedical Engineering, Universidad de los Andes, Bogota DC 111711, Colombia; (C.E.T.); (S.C.G.); (L.A.G.); (C.M.-C.)
- School of Chemical Engineering and Advanced Materials, The University of Adelaide, Adelaide 5005, Australia