1
Feng Z, Huang W, Li H, Zhu H, Kang Y, Li Z. DGCPPISP: a PPI site prediction model based on dynamic graph convolutional network and two-stage transfer learning. BMC Bioinformatics 2024; 25:252. [PMID: 39085781 PMCID: PMC11293074 DOI: 10.1186/s12859-024-05864-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Accepted: 07/10/2024] [Indexed: 08/02/2024] Open
Abstract
BACKGROUND Proteins play a pivotal role in the diverse array of biological processes, making the precise prediction of protein-protein interaction (PPI) sites critical to numerous disciplines including biology, medicine and pharmacy. While deep learning methods have progressively been implemented for the prediction of PPI sites within proteins, the task of enhancing their predictive performance remains an arduous challenge. RESULTS In this paper, we propose a novel PPI site prediction model (DGCPPISP) based on a dynamic graph convolutional neural network and a two-stage transfer learning strategy. Initially, we implement the transfer learning from dual perspectives, namely feature input and model training that serve to supply efficacious prior knowledge for our model. Subsequently, we construct a network designed for the second stage of training, which is built on the foundation of dynamic graph convolution. CONCLUSIONS To evaluate its effectiveness, the performance of the DGCPPISP model is scrutinized using two benchmark datasets. The ensuing results demonstrate that DGCPPISP outshines competing methods in terms of performance. Specifically, DGCPPISP surpasses the second-best method, EGRET, by margins of 5.9%, 10.1%, and 13.3% for F1-measure, AUPRC, and MCC metrics respectively on Dset_186_72_PDB164. Similarly, on Dset_331, it eclipses the performance of the runner-up method, HN-PPISP, by 14.5%, 19.8%, and 29.9% respectively.
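The F1-measure and MCC figures quoted in this abstract are both derived from the confusion matrix of per-residue predictions. The following is a minimal sketch of how those two metrics are conventionally computed, assuming binary labels (1 = interaction site, 0 = non-site); the function name `f1_and_mcc` is hypothetical and is not taken from the DGCPPISP code.

```python
import math


def f1_and_mcc(y_true, y_pred):
    """Return (F1, MCC) for binary labels; hypothetical helper for illustration."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Precision and recall, guarding against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 0, 1, 0, 0, 0, 0, 0]
    print(f1_and_mcc(labels, preds))
```

Because interaction sites are usually a small minority of residues, MCC and F1 are more informative than plain accuracy, which is presumably why the abstract reports them alongside AUPRC.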
Affiliation(s)
- Zijian Feng
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, 313000, Zhejiang, China
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, Zhejiang, China
- Weihong Huang
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, 313000, Zhejiang, China
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, Zhejiang, China
- Haohao Li
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, Zhejiang, China
- Hancan Zhu
- School of Mathematics, Physics and Information, Shaoxing University, Shaoxing, 312000, Zhejiang, China
- Yanlei Kang
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, 313000, Zhejiang, China
- Zhong Li
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, 313000, Zhejiang, China.
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, Zhejiang, China.
2
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
3
Cong H, Liu H, Cao Y, Liang C, Chen Y. Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinformatics 2023; 24:456. [PMID: 38053020 DOI: 10.1186/s12859-023-05592-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Accepted: 11/30/2023] [Indexed: 12/07/2023] Open
Abstract
BACKGROUND Protein-protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding window technique, which simply merges all the features of residues into a vector. The importance of some key residues may be weakened in the feature vector, leading to poor performance. RESULTS We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For a residue, PPINet extracts the features of the targeted residue and its context separately. These two types of features are processed by two paths in the network and combined to form a protein representation, where the two types of features are of relatively equal importance. The model ensembling technique is applied to make use of more features. The base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented, by which our model achieves significant improvement on highly unbalanced data. CONCLUSION The proposed method is evaluated on a fused dataset constructed from Dset_186, Dset_72, and PDBset_164, as well as the public Dset_448 dataset. Our method outperforms current state-of-the-art methods. In the most important metrics, such as AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrate that the improvement is essentially due to the ensemble model and, in particular, the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet.
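The contrast drawn in this abstract, between a sliding window that merges every residue's features into one vector and a scheme that keeps the target residue separate from its context, can be illustrated with a short sketch. It assumes each residue is described by a fixed-length feature list and that positions beyond the sequence ends are zero-padded; the function names and the padding choice are assumptions for illustration and do not reproduce the PPINet implementation.

```python
def window_rows(features, index, half_width):
    """Rows for positions index-half_width..index+half_width, zero-padded at the ends."""
    dim = len(features[0])
    pad = [0.0] * dim
    rows = []
    for j in range(index - half_width, index + half_width + 1):
        rows.append(features[j] if 0 <= j < len(features) else pad)
    return rows


def flat_window(features, index, half_width):
    """Classic sliding window: all rows concatenated into one vector."""
    return [x for row in window_rows(features, index, half_width) for x in row]


def target_and_context(features, index, half_width):
    """Keep the target residue's features apart from its neighbours' features."""
    rows = window_rows(features, index, half_width)
    target = list(rows[half_width])
    context = [x for k, row in enumerate(rows) if k != half_width for x in row]
    return target, context


if __name__ == "__main__":
    # Three residues with two features each (toy values).
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(flat_window(feats, 0, 1))
    print(target_and_context(feats, 0, 1))
```

In the flat form the target residue occupies only a small slice of a long vector, whereas the split form lets a model give the target and its context separate processing paths, which is the motivation the abstract describes.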
Affiliation(s)
- Hanhan Cong
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
- School of Information Science and Engineering, Shandong Normal University, Jinan, China.
- Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China.
- Yi Cao
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
4
Popov P, Kalinin R, Buslaev P, Kozlovskii I, Zaretckii M, Karlov D, Gabibov A, Stepanov A. Unraveling viral drug targets: a deep learning-based approach for the identification of potential binding sites. Brief Bioinform 2023; 25:bbad459. [PMID: 38113077 PMCID: PMC10783863 DOI: 10.1093/bib/bbad459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/10/2023] [Accepted: 11/22/2023] [Indexed: 12/21/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has spurred a wide range of approaches to control and combat the disease. However, selecting an effective antiviral drug target remains a time-consuming challenge. Computational methods offer a promising solution by efficiently reducing the number of candidates. In this study, we propose a structure- and deep learning-based approach that identifies vulnerable regions in viral proteins corresponding to drug binding sites. Our approach takes into account the protein dynamics, accessibility and mutability of the binding site and the putative mechanism of action of the drug. We applied this technique to validate drug targeting toward severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein S. Our findings reveal a conformation- and oligomer-specific glycan-free binding site proximal to the receptor binding domain. This site comprises topologically important amino acid residues. Molecular dynamics simulations of Spike in complex with candidate drug molecules bound to the potential binding sites indicate an equilibrium shifted toward the inactive conformation compared with drug-free simulations. Small molecules targeting this binding site have the potential to prevent the closed-to-open conformational transition of Spike, thereby allosterically inhibiting its interaction with human angiotensin-converting enzyme 2 receptor. Using a pseudotyped virus-based assay with a SARS-CoV-2 neutralizing antibody, we identified a set of hit compounds that exhibited inhibition at micromolar concentrations.
Affiliation(s)
- Petr Popov
- Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
- School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
- Roman Kalinin
- M.M. Shemyakin and Yu.A. Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, 117997, Russia
- Pavel Buslaev
- Nanoscience Center and Department of Chemistry, University of Jyväskylä, 40014, Jyväskylä, Finland
- Igor Kozlovskii
- Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
- School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
- Mark Zaretckii
- Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
- School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
- Dmitry Karlov
- School of Pharmacy, Medical Biology Centre, Queen’s University Belfast, Street, Belfast, BT9 7BL Northern Ireland, U.K
- Alexander Gabibov
- M.M. Shemyakin and Yu.A. Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, 117997, Russia
- Alexey Stepanov
- Department of Chemistry, The Scripps Research Institute, 10550 North Torrey Pines Road MB-10, La Jolla, 92037, CA, USA
5
Nikam R, Yugandhar K, Gromiha MM. DeepBSRPred: deep learning-based binding site residue prediction for proteins. Amino Acids 2023; 55:1305-1316. [PMID: 36574037 DOI: 10.1007/s00726-022-03228-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 12/15/2022] [Indexed: 12/28/2022]
Abstract
MOTIVATION Protein-protein interactions (PPIs) govern several cellular activities. Amino acid residues located at the interface are known as binding sites, and information about binding sites helps to understand the binding affinities and functions of protein-protein complexes. RESULTS We have developed a deep neural network-based method, DeepBSRPred, for predicting the binding sites using protein sequence information and predicted structures from AlphaFold2. Specific sequence and structure-based features include position-specific scoring matrix (PSSM), solvent accessible surface area, conservation score and amino acid properties, and residue depth, respectively. Our method predicted the binding sites with an average F1 score of 0.73 in a dataset of 1236 proteins. Further, we compared the performance with other existing methods in the literature using four benchmark datasets and our method outperformed those methods. AVAILABILITY AND IMPLEMENTATION The DeepBSRPred web server can be found at https://web.iitm.ac.in/bioinfo2/deepbsrpred/index.html, along with all datasets used in this study. The trained models, the DeepBSRPred standalone source code, and the feature computation pipeline are freely available at https://web.iitm.ac.in/bioinfo2/deepbsrpred/download.html.
Affiliation(s)
- Rahul Nikam
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Kumar Yugandhar
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Department of Computational Biology, Cornell University, New York, NY, USA
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India.
- Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan.
6
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. Research (Washington, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential in the research of protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from low predictive accuracy or a limited scope of application. Specifically, some methods learned only global or local sequential features, leading to low predictive accuracy, while others achieved improved performance by extracting residue interactions from structures but were limited in their application scope because of their heavy dependence on precise structure information. There is an urgent need to develop a method that integrates comprehensive information to realize proteome-wide accurate profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, is proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS is unique in (a) extracting residue interactions from protein sequences with transformer and (b) further integrating global and local sequential features with the ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
Affiliation(s)
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhimeng Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Shuiyang Shi
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
7
Liu T, Gao H, Ren X, Xu G, Liu B, Wu N, Luo H, Wang Y, Tu T, Yao B, Guan F, Teng Y, Huang H, Tian J. Protein-protein interaction and site prediction using transfer learning. Brief Bioinform 2023; 24:bbad376. [PMID: 37870286 DOI: 10.1093/bib/bbad376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/14/2023] [Accepted: 10/02/2023] [Indexed: 10/24/2023] Open
Abstract
Advanced language models have enabled us to recognize protein-protein interactions (PPIs) and interaction sites using protein sequences or structures. Here, we trained the MindSpore ProteinBERT (MP-BERT) model, a Bidirectional Encoder Representation from Transformers, using protein pairs as inputs, making it suitable for identifying PPIs and their respective interaction sites. The pretrained model (MP-BERT) was fine-tuned as MPB-PPI (MP-BERT on PPI) and demonstrated its superiority over the state-of-the-art models on diverse benchmark datasets for predicting PPIs. Moreover, the model's capability to recognize PPIs was evaluated across multiple organisms. An amalgamated organism model was designed, exhibiting a high level of generalization across the majority of organisms and attaining an accuracy of 92.65%. The model was also customized to predict interaction site propensity by fine-tuning it with PPI site data as MPB-PPISP. Our method facilitates the prediction of both PPIs and their interaction sites, thereby illustrating the potency of transfer learning in dealing with the protein pair task.
Affiliation(s)
- Tuoyu Liu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Han Gao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiaopu Ren
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Guoshun Xu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bo Liu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Ningfeng Wu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Huiying Luo
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yuan Wang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tao Tu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bin Yao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Feifei Guan
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yue Teng
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing 100071, China
- Huoqing Huang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Jian Tian
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
8
Rehman AU, Khurshid B, Ali Y, Rasheed S, Wadood A, Ng HL, Chen HF, Wei Z, Luo R, Zhang J. Computational approaches for the design of modulators targeting protein-protein interactions. Expert Opin Drug Discov 2023; 18:315-333. [PMID: 36715303 PMCID: PMC10149343 DOI: 10.1080/17460441.2023.2171396] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/06/2022] [Accepted: 01/18/2023] [Indexed: 01/31/2023]
Abstract
BACKGROUND Protein-protein interactions (PPIs) are intriguing targets for designing novel small-molecule inhibitors. The role of PPIs in various infectious and neurodegenerative disorders makes them potential therapeutic targets. Although PPIs have been portrayed as undruggable targets because of their flat surfaces, disorder, and lack of grooves, recent progress in computational biology has led researchers to reconsider PPIs in drug discovery. AREAS COVERED In this review, we introduce in-silico methods used to identify PPI interfaces and present an in-depth overview of various computational methodologies that are successfully applied to annotate the PPIs. We also discuss several successful case studies that use computational tools to understand PPI modulation and the key roles of PPIs in various physiological processes. EXPERT OPINION Computational methods face challenges due to the inherent flexibility of proteins, which makes them expensive and results in the use of rigid models. This problem becomes more significant in PPIs due to their flexible and flat interfaces. Computational methods like molecular dynamics (MD) simulation and machine learning can integrate the chemical structure data into biochemical and can be used for target identification and modulation. These computational methodologies have been crucial in understanding the structure of PPIs, designing PPI modulators, discovering new drug targets, and predicting treatment outcomes.
Affiliation(s)
- Ashfaq Ur Rehman
- Departments of Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, Materials Science and Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California Irvine, Irvine, California, USA
- Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Medicinal Bioinformatics Center, Shanghai Jiao-Tong University School of Medicine, Shanghai, Zhejiang, China
- Beenish Khurshid
- Department of Biochemistry, Abdul Wali Khan University Mardan, Pakistan
- Yasir Ali
- National Center for Bioinformatics, Quaid-e-Azam University, Islamabad, Pakistan
- Salman Rasheed
- National Center for Bioinformatics, Quaid-e-Azam University, Islamabad, Pakistan
- Abdul Wadood
- Department of Biochemistry, Abdul Wali Khan University Mardan, Pakistan
- Ho-Leung Ng
- Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas, USA
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, Zhejiang, China
- Zhiqiang Wei
- Medicinal Chemistry and Bioinformatics Center, Ocean University of China, Qingdao, Shandong, China
- Ray Luo
- Departments of Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, Materials Science and Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California Irvine, Irvine, California, USA
- Jian Zhang
- Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Medicinal Bioinformatics Center, Shanghai Jiao-Tong University School of Medicine, Shanghai, Zhejiang, China
- School of Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, Henan, China
9
Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) exist widely. Although they lack well-defined structures, they participate in many important biological processes. They are also widely related to human diseases and have become potential targets in drug discovery. However, there is a big gap between the number of experimental annotations related to IDPs/IDRs and their actual number. In recent decades, computational methods related to IDPs/IDRs have been developed vigorously for different tasks, including predicting IDPs/IDRs, their binding modes, their binding sites, and their molecular functions. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Chongjiao Ren
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Wenda Wang
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Jiashan Li
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Xinqi Gong
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Beijing Academy of Intelligence, Beijing 100083, China
10
Kang Y, Xu Y, Wang X, Pu B, Yang X, Rao Y, Chen J. HN-PPISP: a hybrid network based on MLP-Mixer for protein-protein interaction site prediction. Brief Bioinform 2023; 24:6833645. [PMID: 36403092 DOI: 10.1093/bib/bbac480] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/05/2022] [Revised: 09/16/2022] [Accepted: 10/09/2022] [Indexed: 11/21/2022] Open
Abstract
MOTIVATION Biological experimental approaches to protein-protein interaction (PPI) site prediction are critical for understanding the mechanisms of biochemical processes but are time-consuming and laborious. With the development of Deep Learning (DL) techniques, the most popular Convolutional Neural Networks (CNN)-based methods have been proposed to address these problems. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in protein sequences. Current methods cannot efficiently explore the nature of Position Specific Scoring Matrix (PSSM), secondary structure and raw protein sequences by processing them all together. For PPI site prediction, how to effectively model the PPI context with attention to prediction remains an open problem. In addition, the long-distance dependencies of PPI features are important, which is very challenging for many CNN-based methods because the innate ability of CNN is difficult to outperform auto-regressive models like Transformers. RESULTS To effectively mine the properties of PPI features, a novel hybrid neural network named HN-PPISP is proposed, which integrates a Multi-layer Perceptron Mixer (MLP-Mixer) module for local feature extraction and a two-stage multi-branch module for global feature capture. The model merits Transformer, TextCNN and Bi-LSTM as a powerful alternative for PPI site prediction. On the one hand, this is the first application of an advanced Transformer (i.e. MLP-Mixer) with a hybrid network for sequence-based PPI prediction. On the other hand, unlike existing methods that treat global features altogether, the proposed two-stage multi-branch hybrid module firstly assigns different attention scores to the input features and then encodes the feature through different branch modules. 
In the first stage, different improved attention modules are hybridized to extract features from the raw protein sequences, secondary structure and PSSM, respectively. In the second stage, a multi-branch network is designed to aggregate information from both branches in parallel. The two branches encode the features and extract dependencies through several operations such as TextCNN, Bi-LSTM and different activation functions. Experimental results on real-world public datasets show that our model consistently achieves state-of-the-art performance over seven remarkable baselines. AVAILABILITY The source code of HN-PPISP model is available at https://github.com/ylxu05/HN-PPISP.
Affiliation(s)
- Yan Kang
- National Pilot School of Software, Yunnan University, Kunming, 650091, P.R. China
- Yulong Xu
- National Pilot School of Software, Yunnan University, Kunming, 650091, P.R. China
- Xinchao Wang
- National Pilot School of Software, Yunnan University, Kunming, 650091, P.R. China
- Bin Pu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, P.R. China
- Xuekun Yang
- National Pilot School of Software, Yunnan University, Kunming, 650091, P.R. China
- Yulong Rao
- National Pilot School of Software, Yunnan University, Kunming, 650091, P.R. China
- Jianguo Chen
- School of Software Engineering, Sun Yat-Sen University, Zhuhai, 519082, P.R. China
11
Li K, Quan L, Jiang Y, Li Y, Zhou Y, Wu T, Lyu Q. ctP2ISP: Protein-Protein Interaction Sites Prediction Using Convolution and Transformer With Data Augmentation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:297-306. [PMID: 35213314 DOI: 10.1109/tcbb.2022.3154413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, it remains a challenging task to make accurate predictions, as the small amount of training data and severe imbalanced classification reduce the performance of computational methods. We design a deep learning method named ctP2ISP to improve the prediction of protein-protein interaction sites. ctP2ISP employs Convolution and Transformer to extract information and enhance information perception so that semantic features can be mined to identify protein-protein interaction sites. A weighting loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To efficiently reuse the information in the training set, a preprocessing of data augmentation with an improved sample-oriented sampling strategy is applied. The trained ctP2ISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctP2ISP outperforms all other competing methods on the balance metrics: F1, MCC, and AUPRC. In particular, our prediction on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
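The weighted loss mentioned in this abstract can be illustrated with a class-weighted binary cross-entropy, a common way to counter the scarcity of interface residues. This is a minimal sketch, not the ctP2ISP implementation: the function name `weighted_bce` and the weight values are hypothetical and chosen only for illustration.

```python
import math


def weighted_bce(probs, labels, pos_weight=5.0, neg_weight=1.0):
    """Weighted mean binary cross-entropy over residues.

    probs  -- predicted interface probabilities, one per residue
    labels -- 1 for an interface residue, 0 otherwise
    The weights are illustrative assumptions, not values from the paper.
    """
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        w = pos_weight if y == 1 else neg_weight
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# A missed interface residue costs more than a missed non-interface residue.
missed_positive = weighted_bce([0.1, 0.1], [1, 0])
missed_negative = weighted_bce([0.9, 0.9], [1, 0])
```

With `pos_weight` larger than `neg_weight`, errors on the minority (interface) class dominate the loss, which discourages a model from predicting the majority class everywhere.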
12
Guo Z, Yamaguchi R. Machine learning methods for protein-protein binding affinity prediction in protein design. Frontiers in Bioinformatics 2022; 2:1065703. [PMID: 36591334 PMCID: PMC9800603 DOI: 10.3389/fbinf.2022.1065703] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/10/2022] [Accepted: 12/01/2022] [Indexed: 12/23/2022] Open
Abstract
Protein-protein interactions govern a wide range of biological activity. A proper estimation of the protein-protein binding affinity is vital to design proteins with high specificity and binding affinity toward a target protein, which has a variety of applications including antibody design in immunotherapy, enzyme engineering for reaction optimization, and construction of biosensors. However, experimental and theoretical modelling methods are time-consuming, hinder the exploration of the entire protein space, and deter the identification of optimal proteins that meet the requirements of practical applications. In recent years, the rapid development in machine learning methods for protein-protein binding affinity prediction has revealed the potential of a paradigm shift in protein design. Here, we review the prediction methods and associated datasets and discuss the requirements and construction methods of binding affinity prediction models for protein design.
Affiliation(s)
- Zhongliang Guo
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
- Rui Yamaguchi
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan; Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan. Correspondence: Rui Yamaguchi
13
Gu L, Li B, Ming D. A multilayer dynamic perturbation analysis method for predicting ligand-protein interactions. BMC Bioinformatics 2022; 23:456. [PMID: 36324073 PMCID: PMC9628359 DOI: 10.1186/s12859-022-04995-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/09/2022] [Accepted: 10/19/2022] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND Ligand-protein interactions play a key role in defining protein function, and detecting natural ligands for a given protein is thus a very important bioengineering task. In particular, with the rapid development of AI-based structure prediction algorithms, batch structural models with high reliability and accuracy can be obtained at low cost, giving rise to the urgent requirement for the prediction of natural ligands based on protein structures. In recent years, although several structure-based methods have been developed to predict ligand-binding pockets and ligand-binding sites, accurate and rapid methods are still lacking, especially for the prediction of ligand-binding regions and the spatial extension of ligands in the pockets. RESULTS In this paper, we proposed a multilayer dynamics perturbation analysis (MDPA) method for predicting ligand-binding regions based solely on protein structure, which is an extended version of our previously developed fast dynamic perturbation analysis (FDPA) method. In MDPA/FDPA, ligand binding tends to occur in regions that cause large changes in protein conformational dynamics. MDPA, examined using a standard validation dataset of ligand-protein complexes, yielded an averaged ligand-binding site prediction Matthews coefficient of 0.40, with a prediction precision of at least 50% for 71% of the cases. In particular, for 80% of the cases, the predicted ligand-binding region overlaps the natural ligand by at least 50%. The method was also compared with other state-of-the-art structure-based methods. CONCLUSIONS MDPA is a structure-based method to detect ligand-binding regions on protein surface. Our calculations suggested that a range of spaces inside the protein pockets has subtle interactions with the protein, which can significantly impact on the overall dynamics of the protein. 
This work provides a valuable tool as a starting point upon which further docking and analysis methods can be used for natural ligand detection in protein functional annotation. The source code of MDPA method is freely available at: https://github.com/mingdengming/mdpa.
Affiliation(s)
- Lin Gu
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangbei New District, Nanjing City, 211816 Jiangsu, People’s Republic of China
- Bin Li
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangbei New District, Nanjing City, 211816 Jiangsu, People’s Republic of China
- Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangbei New District, Nanjing City, 211816 Jiangsu, People’s Republic of China
14
Prediction of Protein-Protein Interaction Sites by Multifeature Fusion and RF with mRMR and IFS. Disease Markers 2022; 2022:5892627. [PMID: 36246558 PMCID: PMC9553539 DOI: 10.1155/2022/5892627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/17/2022]
Abstract
Prediction of protein-protein interaction (PPI) sites is one of the most perplexing problems in drug discovery and computational biology. Although significant progress has been made by combining different machine learning techniques with a variety of distinct characteristics, the problem still remains unresolved. In this study, a technique for PPI sites is presented using a random forest (RF) algorithm followed by the minimum redundancy maximal relevance (mRMR) approach, and the method of incremental feature selection (IFS). Physicochemical properties of proteins and the features of the residual disorder, sequence conservation, secondary structure, and solvent accessibility are incorporated. Five 3D structural characteristics are also used to predict PPI sites. Analysis of features shows that 3D structural features such as relative solvent-accessible surface area (RASA) and surface curvature (SC) help in the prediction of PPI sites. Results show that the performance of the proposed predictor is superior to several other state-of-the-art predictors, whose average prediction accuracy is 81.44%, sensitivity is 82.17%, and specificity is 80.71%, respectively. The proposed predictor is expected to become a helpful tool for finding PPI sites, and the feature analysis presented in this study will give useful insights into protein interaction mechanisms.
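The accuracy, sensitivity, and specificity figures quoted in this abstract, and the MCC reported by several neighbouring entries, all derive from a binary confusion matrix. The sketch below shows the standard definitions; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    sensitivity = safe_div(tp, tp + fn)  # recall on interface residues
    specificity = safe_div(tn, tn + fp)  # recall on non-interface residues
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only: 100 interface and 100 non-interface residues.
metrics = confusion_metrics(tp=80, fp=30, tn=70, fn=20)
```

On a balanced test set like this one, accuracy is the mean of sensitivity and specificity; on imbalanced PPI-site data the three can diverge sharply, which is one reason MCC is often reported alongside them.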
15
Topcu E, Ridgeway NH, Biggar KK. PeSA 2.0: A software tool for peptide specificity analysis implementing positive and negative motifs and motif-based peptide scoring. Comput Biol Chem 2022; 101:107753. [PMID: 35998543 DOI: 10.1016/j.compbiolchem.2022.107753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 08/05/2022] [Accepted: 08/09/2022] [Indexed: 11/26/2022]
Abstract
There are a vast number of molecular interactions that occur at the cellular level. Among these molecular interactions, interactions between multiple proteins are a widely studied area of research due to the importance of these interactions in cellular function and their potential in drug development. PeSA is a desktop application developed to facilitate the in vitro peptide study analysis to predict protein-protein interactions. PeSA can effortlessly generate visual outputs like motifs, bar charts, and visual matrices. Our implementation of PeSA version 2.0 includes additional tools, including the ability to further score peptide lists for consensus amongst interactions. The software is also able to design de novo peptides based on sequence motifs (sequence generator), which can be used to help design additional experiments for motif validation. Further, the efficacy of the sequence generator was validated using the lysine methyltransferase, SETD8, to identify new substrates of methylation based on motif-based predictions developed using PeSA2.0.
Affiliation(s)
- Emine Topcu
- Institute of Biochemistry and Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario K1N 5B6, Canada
- Nashira H Ridgeway
- Institute of Biochemistry and Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario K1N 5B6, Canada
- Kyle K Biggar
- Institute of Biochemistry and Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario K1N 5B6, Canada
16
Multi-task learning to leverage partially annotated data for PPI interface prediction. Sci Rep 2022; 12:10487. [PMID: 35729253 PMCID: PMC9213449 DOI: 10.1038/s41598-022-13951-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 05/31/2022] [Indexed: 11/29/2022] Open
Abstract
Protein protein interactions (PPI) are crucial for protein functioning, nevertheless predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as the PPI interface annotations, are scarce: only for about one-third of all protein structures residue-based PPI interface annotations are available. If we want to use a deep learning strategy, we have to overcome the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start with the multi-task model architecture, and adapted it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residue. Our results show that the multi-task learning strategy significantly outperforms single task approaches. Moreover, only the multi-task strategy is able to effectively learn over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important, if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations—with data extension—reaches the same performances as the single-task learner on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset where the protein’s functional properties of interest are only partially annotated.
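Handling missing annotations in the cost function, as this abstract describes, is often done by masking: each task's loss is averaged only over residues that carry a label for that task. The sketch below illustrates that idea under stated assumptions. It is not the authors' code; the names `masked_multitask_loss`, `bce`, and the task keys are hypothetical, and missing labels are represented as `None`.

```python
import math


def bce(p, y):
    """Binary cross-entropy for a single prediction."""
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def masked_multitask_loss(preds, targets, task_weights=None):
    """Sum of per-task mean losses, ignoring residues with no label.

    preds        -- dict: task name -> list of predicted probabilities
    targets      -- dict: task name -> list of 0/1 labels, None if missing
    task_weights -- optional dict: task name -> weight (default 1.0)
    """
    task_weights = task_weights or {}
    total = 0.0
    for task, task_preds in preds.items():
        labelled = [(p, y) for p, y in zip(task_preds, targets[task])
                    if y is not None]
        if not labelled:
            continue  # no annotation for this task: it contributes nothing
        task_loss = sum(bce(p, y) for p, y in labelled) / len(labelled)
        total += task_weights.get(task, 1.0) * task_loss
    return total


# A protein with structural labels but no PPI interface annotation.
preds = {"ppi": [0.5, 0.5], "buried": [0.5, 0.5]}
targets = {"ppi": [None, None], "buried": [1, 0]}
loss = masked_multitask_loss(preds, targets)
```

Because unlabelled tasks are skipped rather than treated as negatives, proteins that lack PPI annotations can still contribute training signal through the auxiliary tasks.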
17
Kim J, Kim RJ, Lee SB, Suh MC. Protein-protein interactions in fatty acid elongase complexes are important for very-long-chain fatty acid synthesis. Journal of Experimental Botany 2022; 73:3004-3017. [PMID: 35560210 DOI: 10.1093/jxb/erab543] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/12/2021] [Accepted: 12/10/2021] [Indexed: 06/15/2023]
Abstract
Fatty acid elongase (FAE), which catalyzes the synthesis of very-long-chain fatty acids (VLCFAs), is a multiprotein complex; however, little is known about its quaternary structure. In this study, bimolecular fluorescence complementation and/or yeast two-hybrid assays showed that homo-interactions were observed in β-ketoacyl-CoA synthases (KCS2, KCS9, and KCS6), Eceriferum2-like proteins [CER2 and CER2-Like2 (C2L2)], and FAE complex proteins (KCR1, PAS2, ECR, and PAS1), except for CER2-Like1 (C2L1). Hetero-interactions were observed between KCSs (KCS2, KCS9, and KCS6), between CER2-LIKEs (CER2, C2L2, and C2L1), and between FAE complex proteins (KCR1, PAS2, ECR, and PAS1). PAS1 interacts with FAE complex proteins (KCR1, PAS2, and ECR), but not with KCSs (KCS2, KCS9, and KCS6) and CER2-LIKEs (CER2, C2L2, and C2L1). Asp308 and Arg309-Arg311 of KCS9 were essential for the homo-interactions of KCS9 and hetero-interactions between KCS9 and PAS2 or ECR. Asp339 of KCS9 is involved in its homo- and hetero-interactions with ECR. Complementation analysis of the Arabidopsis kcs9 mutant by the expression of amino acid-substituted KCS9 mutant genes showed that Asp308 and Asp339 of KCS9 are involved in the synthesis of C24 VLCFAs from C22. This study suggests that protein-protein interaction in FAE complexes is important for VLCFA synthesis and provides insight into the quaternary structure of FAE complexes for efficient synthesis of VLCFAs.
Affiliation(s)
- Juyoung Kim
- Department of Bioenergy Science and Technology, Chonnam National University, Gwangju 61186, Republic of Korea
- Ryeo Jin Kim
- Department of Life Science, Sogang University, Seoul 04107, Republic of Korea
- Saet Buyl Lee
- Department of Agricultural Biotechnology, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Republic of Korea
- Mi Chung Suh
- Department of Life Science, Sogang University, Seoul 04107, Republic of Korea
18
Tahir M, Khan F, Hayat M, Alshehri MD. An effective machine learning-based model for the prediction of protein–protein interaction sites in health systems. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07024-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
19
Elhabashy H, Merino F, Alva V, Kohlbacher O, Lupas AN. Exploring protein-protein interactions at the proteome level. Structure 2022; 30:462-475. [DOI: 10.1016/j.str.2022.02.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/09/2021] [Revised: 10/26/2021] [Accepted: 02/02/2022] [Indexed: 02/08/2023]
20
Mahbub S, Bayzid MS. EGRET: edge aggregated graph attention networks and transfer learning improve protein-protein interaction site prediction. Brief Bioinform 2022; 23:6518045. [PMID: 35106547 DOI: 10.1093/bib/bbab578] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/13/2021] [Revised: 11/25/2021] [Accepted: 12/16/2021] [Indexed: 12/18/2022] Open
Abstract
MOTIVATION Protein-protein interactions (PPIs) are central to most biological processes. However, reliable identification of PPI sites using conventional experimental methods is slow and expensive. Therefore, great efforts are being put into computational methods to identify PPI sites. RESULTS We present Edge Aggregated GRaph Attention NETwork (EGRET), a highly accurate deep learning-based method for PPI site prediction, where we have used an edge aggregated graph attention network to effectively leverage the structural information. We, for the first time, have used transfer learning in PPI site prediction. Our proposed edge aggregated network, together with transfer learning, has achieved notable improvement over the best alternate methods. Furthermore, we systematically investigated EGRET's network behavior to provide insights about the causes of its decisions. AVAILABILITY EGRET is freely available as an open source project at https://github.com/Sazan-Mahbub/EGRET. CONTACT shams_bayzid@cse.buet.ac.bd.
Affiliation(s)
- Sazan Mahbub
- Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
21
Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. Advances in Protein Chemistry and Structural Biology 2022; 130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.
Affiliation(s)
- Florencio Pazos
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Madrid, Spain
22
Tang M, Wu L, Yu X, Chu Z, Jin S, Liu J. Prediction of Protein-Protein Interaction Sites Based on Stratified Attentional Mechanisms. Front Genet 2021; 12:784863. [PMID: 34880910 PMCID: PMC8647646 DOI: 10.3389/fgene.2021.784863] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/28/2021] [Accepted: 10/08/2021] [Indexed: 11/19/2022] Open
Abstract
Proteins are the basic substances that undertake human life activities, and they often perform their biological functions through interactions with other biological macromolecules, such as cell transmission and signal transduction. Predicting the interaction sites between proteins can deepen the understanding of the principle of protein interactions, but traditional experimental methods are time-consuming and labor-intensive. In this study, a new hierarchical attention network structure, named HANPPIS, by adding six effective features of protein sequence, position-specific scoring matrix (PSSM), secondary structure, pre-training vector, hydrophilic, and amino acid position, is proposed to predict protein–protein interaction (PPI) sites. The experiment proved that our model has obtained very effective results, which was better than the existing advanced calculation methods. More importantly, we used the double-layer attention mechanism to improve the interpretability of the model and to a certain extent solved the problem of the “black box” of deep neural networks, which can be used as a reference for location positioning on the biological level.
Affiliation(s)
- Minli Tang
- Department of Computer Science and Technology, Xiamen University, Xiamen, China; School of Big Data Engineering, Kaili University, Kaili, China
- Longxin Wu
- Department of Computer Science and Technology, Xiamen University, Xiamen, China
- Xinyu Yu
- Department of Computer Science and Technology, Xiamen University, Xiamen, China
- Zhaoqi Chu
- Department of Instrumental and Electrical Engineering, School of Aerospace Engineering, Xiamen University, Xiamen, China
- Shuting Jin
- Department of Computer Science and Technology, Xiamen University, Xiamen, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Juan Liu
- Department of Instrumental and Electrical Engineering, School of Aerospace Engineering, Xiamen University, Xiamen, China
23
Wang P, Zhang G, Yu ZG, Huang G. A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites. Front Genet 2021; 12:752732. [PMID: 34764983 PMCID: PMC8576272 DOI: 10.3389/fgene.2021.752732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 09/20/2021] [Indexed: 11/29/2022] Open
Abstract
Knowledge about protein-protein interactions is beneficial in understanding cellular mechanisms. Protein-protein interactions are usually determined according to their protein-protein interaction sites. Due to the limitations of current techniques, it is still a challenging task to detect protein-protein interaction sites. In this article, we presented a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model served as a feature extractor to remove redundant information from protein sequences. The Extreme Gradient Boosting algorithm was used to construct a classifier for predicting protein-protein interaction sites. The DeepPPISP-XGB achieved the following results: area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and area under the precision-recall curve of 0.339, being competitive with the state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.
Affiliation(s)
- Pan Wang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Guiyang Zhang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
24
Chand GB, Kumar S, Azad GK. Molecular assessment of proteins encoded by the mitochondrial genome of Clarias batrachus and Clarias gariepinus. Biochem Biophys Rep 2021; 26:100985. [PMID: 33855227 PMCID: PMC8024883 DOI: 10.1016/j.bbrep.2021.100985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2020] [Revised: 03/06/2021] [Accepted: 03/09/2021] [Indexed: 11/25/2022] Open
Abstract
The population of catfish, Clarias batrachus has substantially diminished in various countries and studies show that another related species Clarias gariepinus is replacing it. The better adaptability and survivability of C. gariepinus over C. batrachus could be attributed to the metabolic differences between these two species, which is primarily regulated by mitochondrial activities. To understand the reasons behind this phenomenon, we performed in silico analyses to decipher the differences between the proteins encoded by the mitochondrial genome of these two related species. Our analysis revealed that out of thirteen, twelve proteins encoded by the mitochondrial genome of these two species have substantial variations between them. We characterised these variations by analysing their effect on secondary structure, intrinsic disorder predisposition, and functional impact on protein and stability parameters. Our data show that most of the parameters are changing between these two closely related species. Altogether, we demonstrate the molecular insights into the mitochondrial genome-encoded proteins of these two species and predict their effect on protein function and stability that might be helping C. gariepinus to gain survivability better than the C. batrachus.
Affiliation(s)
- Sushant Kumar
- Department of Zoology, Patna University, Patna, Bihar, 800005, India
25
Li Y, Golding GB, Ilie L. DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics 2021; 37:896-904. [PMID: 32840562 DOI: 10.1093/bioinformatics/btaa750] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 03/30/2020] [Revised: 08/14/2020] [Accepted: 08/19/2020] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein-protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods. RESULTS We propose DEep Learning Prediction of Highly probable protein Interaction sites (DELPHI), a new sequence-based deep learning suite for PPI-binding sites prediction. DELPHI has an ensemble structure which combines a CNN and a RNN component with fine tuning technique. Three novel features, HSP, position information and ProtVec are used in addition to nine existing ones. We comprehensively compare DELPHI to nine state-of-the-art programmes on five datasets, and DELPHI outperforms the competing methods in all metrics even though its training dataset shares the least similarities with the testing datasets. In the most important metrics, AUPRC and MCC, it surpasses the second best programmes by as much as 18.5% and 27.7%, respectively. We also demonstrated that the improvement is essentially due to using the ensemble model and, especially, the three new features. Using DELPHI it is shown that there is a strong correlation with protein-binding residues (PBRs) and sites with strong evolutionary conservation. In addition, DELPHI's predicted PBR sites closely match known data from Pfam. DELPHI is available as open-sourced standalone software and web server. AVAILABILITY AND IMPLEMENTATION The DELPHI web server can be found at delphi.csd.uwo.ca/, with all datasets and results in this study. The trained models, the DELPHI standalone source code, and the feature computation pipeline are freely available at github.com/lucian-ilie/DELPHI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yiwei Li
- Department of Computer Science, The University of Western Ontario, London, ON N6A 5B7, Canada
- G Brian Golding
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- Lucian Ilie
- Department of Computer Science, The University of Western Ontario, London, ON N6A 5B7, Canada
26
Hou Q, Stringer B, Waury K, Capel H, Haydarlou R, Xue F, Abeln S, Heringa J, Feenstra KA. SeRenDIP-CE: Sequence-based Interface Prediction for Conformational Epitopes. Bioinformatics 2021; 37:3421-3427. [PMID: 33974039 PMCID: PMC8136078 DOI: 10.1093/bioinformatics/btab321] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/19/2020] [Revised: 03/26/2021] [Accepted: 04/26/2021] [Indexed: 11/21/2022] Open
Abstract
Motivation Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen’s epitope region, as a special type of protein–protein interaction (PPI) interface. The ubiquitous availability of sequence data, allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments toward the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. Results We collected and curated a high quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching AUC 0.694. Further training on the combined heterodimer and epitope datasets, improves our final predictor to AUC 0.703 on the epitope test set. This is better than the best state-of-the-art sequence-based epitope predictor BepiPred-2.0. On one solved antibody–antigen structure of the COVID19 virus spike receptor binding domain, our predictor reaches AUC 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and only requires a single antigen sequence as input, which will help make the method immediately applicable in a wide range of biomedical and biomolecular research. Availability and implementation Webserver, source code and datasets at www.ibi.vu.nl/programs/serendipwww/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qingzhen Hou
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250002, P. R. China.,National institute of health data science of China, Shandong University, Shandong 250002, P. R. China
| | - Bas Stringer
- IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
| | - Katharina Waury
- IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
| | - Henriette Capel
- IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
| | - Reza Haydarlou
- IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
| | - Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250002, P. R. China.,National institute of health data science of China, Shandong University, Shandong 250002, P. R. China
| | - Sanne Abeln
- IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
| | - Jaap Heringa
- IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands.,AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam
| | - K Anton Feenstra
- IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands.,AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam
| |
|
27
|
Zhang J, Ghadermarzi S, Kurgan L. Prediction of protein-binding residues: dichotomy of sequence-based methods developed using structured complexes versus disordered proteins. Bioinformatics 2021; 36:4729-4738. [PMID: 32860044 DOI: 10.1093/bioinformatics/btaa573] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/07/2020] [Revised: 05/22/2020] [Accepted: 06/10/2020] [Indexed: 01/08/2023] Open
Abstract
MOTIVATION There are over 30 sequence-based predictors of the protein-binding residues (PBRs). They use either structure-annotated or disorder-annotated training datasets, potentially creating a dichotomy where the structure-/disorder-specific models may not be able to cross-over to accurately predict the other type. Moreover, the structure-trained predictors were shown to substantially cross-predict PBRs among residues that interact with non-protein partners (nucleic acids and small ligands). We address these issues by performing a first-of-its-kind comparative study of a representative collection of disorder- and structure-trained predictors using a comprehensive benchmark set with the structure- and disorder-derived annotations of PBRs (to analyze the cross-over) and the protein-, nucleic acid- and small ligand-binding proteins (to study the cross-predictions). RESULTS Three predictors provide accurate results: SCRIBER, ANCHOR and disoRDPbind. Some of the structure-trained methods make accurate predictions on the structure-annotated proteins. Similarly, the disorder-trained predictors predict well on the disorder-annotated proteins. However, the considered predictors generally fail to cross-over, with the exception of SCRIBER. Our study also reveals that virtually all methods substantially cross-predict PBRs, except for SCRIBER for the structure-annotated proteins and disoRDPbind for the disorder-annotated proteins. We formulate a novel hybrid predictor, hybridPBRpred, that combines results produced by disoRDPbind and SCRIBER to accurately predict disorder- and structure-annotated PBRs. HybridPBRpred generates accurate results that cross-over structure- and disorder-annotated proteins and produces a relatively low amount of cross-predictions, offering an accurate alternative to predict PBRs. AVAILABILITY AND IMPLEMENTATION HybridPBRpred webserver, benchmark dataset and supplementary information are available at http://biomine.cs.vcu.edu/servers/hybridPBRpred/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
| | - Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
28
|
Slater O, Miller B, Kontoyianni M. Decoding Protein-protein Interactions: An Overview. Curr Top Med Chem 2021; 20:855-882. [PMID: 32101126 DOI: 10.2174/1568026620666200226105312] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/13/2019] [Revised: 11/27/2019] [Accepted: 11/27/2019] [Indexed: 12/24/2022]
Abstract
Drug discovery has focused on the paradigm "one drug, one target" for a long time. However, small molecules can act at multiple macromolecular targets, which serves as the basis for drug repurposing. In an effort to expand the target space, and given advances in X-ray crystallography, protein-protein interactions have become an emerging focus area of drug discovery enterprises. Proteins interact with other biomolecules and it is this intricate network of interactions that determines the behavior of the system and its biological processes. In this review, we briefly discuss networks in disease, followed by computational methods for protein-protein complex prediction. Computational methodologies and techniques employed towards objectives such as protein-protein docking, protein-protein interactions, and interface predictions are described extensively. Docking aims at producing a complex between proteins, while interface predictions identify a subset of residues on one protein that could interact with a partner, and protein-protein interaction sites address whether two proteins interact. In addition, approaches to predict hot spots and binding sites are presented along with a representative example of our internal project on the chemokine CXC receptor 3 B-isoform and predictive modeling with IP10 and PF4.
Affiliation(s)
- Olivia Slater
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
| | - Bethany Miller
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
| | - Maria Kontoyianni
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
| |
|
29
|
Wojciechowska N, Bagniewska-Zadworna A, Minicka J, Michalak KM, Kalemba EM. Localization and Dynamics of the Methionine Sulfoxide Reductases MsrB1 and MsrB2 in Beech Seeds. Int J Mol Sci 2021; 22:E402. [PMID: 33401671 PMCID: PMC7795007 DOI: 10.3390/ijms22010402] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2020] [Revised: 12/23/2020] [Accepted: 12/29/2020] [Indexed: 11/24/2022] Open
Abstract
Beech seeds are produced irregularly, and there is a need for long-term storage of these seeds for forest management practices. Accumulated reactive oxygen species broadly oxidize molecules, including amino acids, such as methionine, thereby contributing to decreased seed viability. Methionine oxidation can be reversed by the activity of methionine sulfoxide reductases (Msrs), which are enzymes involved in the regulation of many developmental processes and stress responses. Two types of Msrs, MsrB1 and MsrB2, were investigated in beech seeds to determine their abundance and localization. MsrB1 and MsrB2 were detected in the cortical cells and the outer area of the vascular cylinder of the embryonic axes as well as in the epidermis and parenchyma cells of cotyledons. The abundances of MsrB1 and MsrB2 decreased during long-term storage. Ultrastructural analyses have demonstrated the accumulation of these proteins in protein storage vacuoles and in the cytoplasm, especially in close proximity to the cell membrane. In silico predictions of possible Msr interactions supported our findings. In this study, we investigate the contribution of MsrB1 and MsrB2 locations in the regulation of seed viability and suggest that MsrB2 is linked with the longevity of beech seeds via association with proper utilization of storage material.
Affiliation(s)
- Natalia Wojciechowska
- Institute of Dendrology, Polish Academy of Sciences, Parkowa 5, 62-035 Kórnik, Poland
- Department of General Botany, Institute of Experimental Biology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznańskiego 6, 61-614 Poznań, Poland; (A.B.-Z.); (K.M.M.)
| | - Agnieszka Bagniewska-Zadworna
- Department of General Botany, Institute of Experimental Biology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznańskiego 6, 61-614 Poznań, Poland; (A.B.-Z.); (K.M.M.)
| | - Julia Minicka
- Department of Virology and Bacteriology, Institute of Plant Protection, Władysława Węgorka 20, 60-318 Poznań, Poland;
| | - Kornel M. Michalak
- Department of General Botany, Institute of Experimental Biology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznańskiego 6, 61-614 Poznań, Poland; (A.B.-Z.); (K.M.M.)
| | - Ewa M. Kalemba
- Institute of Dendrology, Polish Academy of Sciences, Parkowa 5, 62-035 Kórnik, Poland
| |
|
30
|
Preto AJ, Matos-Filipe P, de Almeida JG, Mourão J, Moreira IS. Predicting Hot Spots Using a Deep Neural Network Approach. Methods Mol Biol 2021; 2190:267-288. [PMID: 32804371 DOI: 10.1007/978-1-0716-0826-5_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Targeting protein-protein interactions is a challenging and crucial task of the drug discovery process. A good starting point for rational drug design is the identification of hot spots (HS) at protein-protein interfaces, typically conserved residues that contribute most significantly to the binding. In this chapter, we depict point-by-point an in-house pipeline used for HS prediction using only sequence-based features from the well-known SpotOn dataset of soluble proteins (Moreira et al., Sci Rep 7:8007, 2017), through the implementation of a deep neural network. The presented pipeline is divided into three steps: (1) feature extraction, (2) deep learning classification, and (3) model evaluation. We present all the available resources, including code snippets, the main dataset, and the free and open-source modules/packages necessary for full replication of the protocol. The users should be able to develop an HS prediction model with accuracy, precision, recall, and AUROC of 0.96, 0.93, 0.91, and 0.86, respectively.
Affiliation(s)
- António J Preto
- Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Institute for Interdisciplinary Research, University of Coimbra, Coimbra, Portugal
| | - Pedro Matos-Filipe
- Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
| | - José G de Almeida
- Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
| | - Joana Mourão
- Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Institute for Interdisciplinary Research, University of Coimbra, Coimbra, Portugal
| | - Irina S Moreira
- Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal.
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal.
- University of Coimbra, Department of Life Sciences, University of Coimbra, Coimbra, Portugal.
| |
|
31
|
Zhang F, Shi W, Zhang J, Zeng M, Li M, Kurgan L. PROBselect: accurate prediction of protein-binding residues from proteins sequences via dynamic predictor selection. Bioinformatics 2020; 36:i735-i744. [DOI: 10.1093/bioinformatics/btaa806] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 09/07/2020] [Indexed: 12/13/2022] Open
Abstract
Motivation
Knowledge of protein-binding residues (PBRs) improves our understanding of protein−protein interactions, contributes to the prediction of protein functions and facilitates protein−protein docking calculations. While many sequence-based predictors of PBRs were published, they offer modest levels of predictive performance and most of them cross-predict residues that interact with other partners. One unexplored option to improve the predictive quality is to design consensus predictors that combine results produced by multiple methods.
Results
We empirically investigate predictive performance of a representative set of nine predictors of PBRs. We report substantial differences in predictive quality when these methods are used to predict individual proteins, which contrast with the dataset-level benchmarks that are currently used to assess and compare these methods. Our analysis provides new insights for the cross-prediction concern, dissects complementarity between predictors and demonstrates that predictive performance of the top methods depends on unique characteristics of the input protein sequence. Using these insights, we developed PROBselect, a first-of-its-kind consensus predictor of PBRs. Our design is based on the dynamic predictor selection at the protein level, where the selection relies on regression-based models that accurately estimate predictive performance of selected predictors directly from the sequence. Empirical assessment using a low-similarity test dataset shows that PROBselect provides significantly improved predictive quality when compared with the current predictors and conventional consensuses that combine residue-level predictions. Moreover, PROBselect informs the users about the expected predictive quality for the prediction generated from a given input protein.
Availability and implementation
PROBselect is available at http://bioinformatics.csu.edu.cn/PROBselect/home/index.
Supplementary information
Supplementary data are available at Bioinformatics online.
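The protein-level dynamic predictor selection described in the Results can be sketched roughly as follows: each candidate predictor is paired with a model that estimates its expected quality from the sequence, and the predictor with the highest estimate is run. This is an illustrative sketch only, not the PROBselect implementation; the names `select_and_predict`, `predictor_a`, `predictor_b` and the toy estimators are hypothetical, and real estimators would be trained regression models.

```python
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[str], List[float]]   # sequence -> per-residue scores
Estimator = Callable[[str], float]         # sequence -> expected quality


def select_and_predict(
    sequence: str,
    predictors: Dict[str, Predictor],
    estimators: Dict[str, Estimator],
) -> Tuple[str, float, List[float]]:
    """Run only the predictor whose estimated quality is highest for this
    protein, and report that estimate alongside the prediction."""
    expected = {name: estimators[name](sequence) for name in predictors}
    best = max(expected, key=lambda name: expected[name])
    return best, expected[best], predictors[best](sequence)


# Hypothetical stand-ins for real predictors and their quality estimators.
toy_predictors: Dict[str, Predictor] = {
    "predictor_a": lambda seq: [0.1] * len(seq),
    "predictor_b": lambda seq: [0.9] * len(seq),
}
toy_estimators: Dict[str, Estimator] = {
    # Assumed toy rule: predictor_a is expected to do well on lysine-rich input.
    "predictor_a": lambda seq: seq.count("K") / len(seq),
    "predictor_b": lambda seq: 0.3,
}

chosen, quality, scores = select_and_predict("KKKA", toy_predictors, toy_estimators)
```

Returning the estimate together with the scores mirrors the abstract's point that the user is told what predictive quality to expect for a given input protein.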
Affiliation(s)
- Fuhao Zhang
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Wenbo Shi
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Min Zeng
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Min Li
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
32
|
Bartocci A, Gillet N, Jiang T, Szczepaniak F, Dumont E. Molecular Dynamics Approach for Capturing Calixarene-Protein Interactions: The Case of Cytochrome C. J Phys Chem B 2020; 124:11371-11378. [PMID: 33270456 DOI: 10.1021/acs.jpcb.0c08482] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022]
Abstract
Functionalized supramolecular cages are of growing importance in biology and biochemistry. They have recently been proposed as efficient auxiliaries to obtain high-resolution cocrystallized proteins. Here, we propose a molecular dynamics investigation of the supramolecular association of sulfonated calix-[8]-arenes to cytochrome c starting from initially distant proteins and ligands. We characterize two main binding sites for the sulfonated calixarene on the cytochrome c surface which are in perfect agreement with the previous experiments with regard to the structure (comparison with the X-ray structure PDB 6GD8) and the binding free energies [comparison between the molecular mechanics Poisson-Boltzmann surface area analysis and the isothermal titration calorimetry measurements]. The per-residue decomposition of the interaction energies reveals the detailed picture of this electrostatically driven association and notably the role of arginine R13 as a bridging residue between the two main anchoring sites. In addition, the analysis of the residue behavior by means of a supervised machine learning protocol unveils the formation of a hydrogen bond network far from the binding sites, increasing the rigidity of the protein. This study paves the way toward an automated procedure to predict the supramolecular protein-cage association, with the possibility of a computational screening of new promising derivatives for controlled protein assembly and protein surface recognition processes.
Affiliation(s)
- Alessio Bartocci
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France
| | - Natacha Gillet
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France
| | - Tao Jiang
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France
| | - Florence Szczepaniak
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France
| | - Elise Dumont
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Claude Bernard Lyon 1, Laboratoire de Chimie, F-69342 Lyon, France.,Institut Universitaire de France, 5 Rue Descartes, 75005 Paris, France
| |
|
33
|
Wang X, Lin L, Lan B, Wang Y, Du L, Chen X, Li Q, Liu K, Hu M, Xue Y, Roberts AI, Shao C, Melino G, Shi Y, Wang Y. IGF2R-initiated proton rechanneling dictates an anti-inflammatory property in macrophages. SCIENCE ADVANCES 2020; 6:6/48/eabb7389. [PMID: 33239287 PMCID: PMC7688333 DOI: 10.1126/sciadv.abb7389] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/15/2020] [Accepted: 10/13/2020] [Indexed: 05/10/2023]
Abstract
Metabolic traits of macrophages can be rewired by insulin-like growth factor 2 (IGF2); however, how IGF2 modulates macrophage cellular dynamics and functionality remains unclear. We demonstrate that IGF2 exhibits dual and opposing roles in controlling inflammatory phenotypes in macrophages by regulating glucose metabolism, relying on the dominant activation of the IGF2 receptor (IGF2R) by low-dose IGF2 (L-IGF2) and IGF1R by high-dose IGF2. IGF2R activation leads to proton rechanneling to the mitochondrial intermembrane space and enables sustained oxidative phosphorylation. Mechanistically, L-IGF2 induces nucleus translocation of IGF2R that promotes Dnmt3a-mediated DNA methylation by activating GSK3α/β and subsequently impairs expression of vacuolar-type H+-ATPase (v-ATPase). This sequestrated assembly of v-ATPase inhibits the channeling of protons to lysosomes and leads to their rechanneling to mitochondria. An IGF2R-specific IGF2 mutant induces only the anti-inflammatory response and inhibits colitis progression. Together, our findings highlight a previously unidentified role of IGF2R activation in dictating anti-inflammatory macrophages.
Affiliation(s)
- Xuefeng Wang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
- The First Affiliated Hospital of Soochow University and State Key Laboratory of Radiation Medicine and Protection, Institutes for Translational Medicine, Soochow University, 199 Renai Road, Suzhou, Jiangsu 215123, China
| | - Liangyu Lin
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
| | - Bin Lan
- Shanghai Jiao Tong University School of Medicine, Shanghai Center for Systems Biomedicine Research, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
| | - Yu Wang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
| | - Liming Du
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
| | - Xiaodong Chen
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
| | - Qing Li
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
| | - Keli Liu
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
| | - Mingyuan Hu
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
| | - Yueqing Xue
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
| | - Arthur I Roberts
- Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ 08901, USA
| | - Changshun Shao
- The First Affiliated Hospital of Soochow University and State Key Laboratory of Radiation Medicine and Protection, Institutes for Translational Medicine, Soochow University, 199 Renai Road, Suzhou, Jiangsu 215123, China
| | - Gerry Melino
- Department of Experimental Medicine, TOR, University of Rome Tor Vergata, Rome 00133, Italy
| | - Yufang Shi
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China.
- The First Affiliated Hospital of Soochow University and State Key Laboratory of Radiation Medicine and Protection, Institutes for Translational Medicine, Soochow University, 199 Renai Road, Suzhou, Jiangsu 215123, China
- Department of Experimental Medicine, TOR, University of Rome Tor Vergata, Rome 00133, Italy
| | - Ying Wang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China.
| |
|
34
|
Kober DL, Stuchell-Brereton MD, Kluender CE, Dean HB, Strickland MR, Steinberg DF, Nelson SS, Baban B, Holtzman DM, Frieden C, Alexander-Brett J, Roberson ED, Song Y, Brett TJ. Functional insights from biophysical study of TREM2 interactions with apoE and Aβ 1-42. Alzheimers Dement 2020; 17:10.1002/alz.12194. [PMID: 33090700 PMCID: PMC8026773 DOI: 10.1002/alz.12194] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 02/19/2020] [Revised: 08/12/2020] [Accepted: 08/20/2020] [Indexed: 12/20/2022]
Abstract
INTRODUCTION Triggering receptor expressed on myeloid cells-2 (TREM2) is an immune receptor expressed on microglia that also can become soluble (sTREM2). How TREM2 engages different ligands remains poorly understood. METHODS We used comprehensive biolayer interferometry (BLI) analysis to investigate TREM2 and sTREM2 interactions with apolipoprotein E (apoE) and monomeric amyloid beta (Aβ) (mAβ42). RESULTS TREM2 engagement of apoE was protein mediated with little effect of lipidation, showing slight affinity differences between isoforms (E4 > E3 > E2). Another family member, TREML2, did not bind apoE. Disease-linked TREM2 variants within a "basic patch" minimally impact apoE binding. Instead, TREM2 uses a unique hydrophobic surface to bind apoE, which requires the apoE hinge region. TREM2 and sTREM2 directly bind mAβ42 and potently inhibit Aβ42 polymerization, suggesting a potential role for soluble sTREM2 in preventing AD pathogenesis. DISCUSSION These findings demonstrate that TREM2 has at least two ligand-binding surfaces that might be therapeutic targets and uncovers a potential function for sTREM2 in directly inhibiting Aβ polymerization.
Affiliation(s)
- Daniel L. Kober
- Molecular Microbiology and Microbial Pathogenesis Program, Washington University School of Medicine, St. Louis, Missouri 63110
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Melissa D. Stuchell-Brereton
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Colin E. Kluender
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri 63110
- Biochemistry, Biophysics, and Structural Biology Program, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Hunter B. Dean
- Center for Neurodegeneration and Experimental Therapeutics, Alzheimer’s Disease Center, Departments of Neurology and Neurobiology, University of Alabama at Birmingham, AL 35294
- Department of Biomedical Engineering, University of Alabama at Birmingham, AL 35294
| | - Michael R. Strickland
- Department of Neurology, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Deborah F. Steinberg
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Samantha S. Nelson
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Berevan Baban
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110
| | - David M. Holtzman
- Department of Neurology, Washington University School of Medicine, St. Louis, Missouri 63110
- Charles F. and Joanne Knight Alzheimer’s Disease Research Center, Washington University School of Medicine, St. Louis, Missouri 63110
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Carl Frieden
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Jennifer Alexander-Brett
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri 63110
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri 63110
| | - Erik D. Roberson
- Center for Neurodegeneration and Experimental Therapeutics, Alzheimer’s Disease Center, Departments of Neurology and Neurobiology, University of Alabama at Birmingham, AL 35294
| | - Yuhua Song
- Department of Biomedical Engineering, University of Alabama at Birmingham, AL 35294
| | - Tom J. Brett
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri 63110
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, Missouri 63110
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, Missouri 63110
| |
|
35
|
Zeng M, Zhang F, Wu FX, Li Y, Wang J, Li M. Protein-protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 2020; 36:1114-1120. [PMID: 31593229 DOI: 10.1093/bioinformatics/btz699] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 05/03/2019] [Revised: 07/25/2019] [Accepted: 09/04/2019] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION Protein-protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming. Thus, many computational approaches have been proposed to predict PPI sites. Existing computational methods usually use local contextual features to predict PPI sites. Actually, global features of protein sequences are critical for PPI site prediction. RESULTS A new end-to-end deep learning framework, named DeepPPISP, through combining local contextual and global sequence features, is proposed for PPI site prediction. For local contextual features, we use a sliding window to capture features of neighbors of a target amino acid as in previous studies. For global sequence features, a text convolutional neural network is applied to extract features from the whole protein sequence. Then the local contextual and global sequence features are combined to predict PPI sites. By integrating local contextual and global sequence features, DeepPPISP achieves the state-of-the-art performance, which is better than the other competing methods. In order to investigate if global sequence features are helpful in our deep learning model, we remove or change some components in DeepPPISP. Detailed analyses show that global sequence features play important roles in DeepPPISP. AVAILABILITY AND IMPLEMENTATION The DeepPPISP web server is available at http://bioinformatics.csu.edu.cn/PPISP/. The source code can be obtained from https://github.com/CSUBioGroup/DeepPPISP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
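The sliding-window step used for local contextual features can be illustrated with a short sketch: for each target residue, the feature vectors of its sequence neighbours are concatenated, with zero padding at the termini. This is not the DeepPPISP code; the function name `sliding_windows`, the `half_width` parameter, the zero-padding choice and the toy feature values are all assumptions made for illustration.

```python
from typing import List


def sliding_windows(features: List[List[float]], half_width: int) -> List[List[float]]:
    """Return, for every residue, the concatenated features of the residues
    in [i - half_width, i + half_width], zero-padded past the sequence ends."""
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    if not features:
        return []
    dim = len(features[0])
    pad = [0.0] * dim  # stands in for positions outside the sequence
    windows: List[List[float]] = []
    for i in range(len(features)):
        window: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            window.extend(features[j] if 0 <= j < len(features) else pad)
        windows.append(window)
    return windows


# Toy protein with 4 residues and 2 made-up features per residue.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
local_context = sliding_windows(toy_features, half_width=1)
```

Each output row has `(2 * half_width + 1) * dim` values, so the window size directly sets the input width of whatever model consumes the local features; the global sequence features in the abstract come from a separate network over the whole sequence.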
Affiliation(s)
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
| | - Fuhao Zhang
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon SKS7N5A9, Canada
| | - Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
|
36
|
Lin X, Zhang X, Xu X. Efficient Classification of Hot Spots and Hub Protein Interfaces by Recursive Feature Elimination and Gradient Boosting. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1525-1534. [PMID: 31380766 DOI: 10.1109/tcbb.2019.2931717] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 06/10/2023]
Abstract
Proteins are not isolated biological molecules; they have specific three-dimensional structures and interact with other proteins to perform their functions. A small number of residues (hot spots) in protein-protein interactions (PPIs) play a vital role in influencing and controlling biological processes. This paper uses boosting and gradient boosting algorithms, combined with two feature selection strategies, to classify hot spots on three common datasets and two hub protein datasets. First, correlation-based feature selection is used to remove highly correlated features and improve prediction accuracy. Then, recursive feature elimination based on a support vector machine (SVM-RFE) is adopted to select the optimal feature subset and improve training performance. Finally, boosting and gradient boosting (G-boosting) methods are invoked to generate classification results. Gradient boosting is capable of obtaining an excellent model by reducing the loss function in the gradient direction to avoid overfitting. Five datasets from different protein databases are used to verify our models in the experiments. Experimental results show that the proposed classification models perform competitively with existing classification methods.
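The first step of this pipeline, removing highly correlated features, can be sketched as follows. This is a simplified stand-in, not the authors' procedure: the greedy keep-or-drop rule, the Pearson criterion, and the `threshold=0.9` default are all assumptions chosen for illustration, and the later SVM-RFE and gradient boosting steps are not shown.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def drop_correlated(columns: Sequence[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Greedily keep a feature only if it is not strongly correlated with any kept one."""
    kept: List[int] = []
    for idx, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(idx)
    return kept


# Hypothetical feature columns (one list per feature, one value per residue).
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # a rescaled copy of feature 0
    [1.0, -1.0, 1.0, -1.0],
]
kept = drop_correlated(columns)
```

In this toy input, feature 1 is a rescaled copy of feature 0 and is dropped, while feature 2 is only weakly correlated with feature 0 and is kept.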
|
37
|
Savojardo C, Martelli PL, Casadio R. Protein–Protein Interaction Methods and Protein Phase Separation. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-011720-104428] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/09/2022]
Abstract
In the last decade, newly developed experimental methods have made it possible to highlight that macromolecules in the cell milieu physically interact to support physiology. This has shifted the problem of protein–protein interaction from a microscopic, electron-density scale to a mesoscopic one. Further, nowadays there is increasing evidence that proteins in the nucleus and in the cytoplasm can aggregate in membraneless organelles for different physiological reasons. In this scenario, it is urgent to face the problem of biomolecule functional annotation with efficient computational methods, suited to extract knowledge from reliable data and transfer information across different domains of investigation. Here, we revise the present state of the art of our knowledge of protein–protein interaction and the computational methods that differently implement it. Furthermore, we explore experimental and computational features of a set of proteins involved in phase separation.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
- Institute of Biomembranes, Bioenergetics, and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), 70126 Bari, Italy
|
38
|
Zhang J, Kurgan L. SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics 2020; 35:i343-i353. [PMID: 31510679 PMCID: PMC6612887 DOI: 10.1093/bioinformatics/btz324] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Indexed: 01/11/2023] Open
Abstract
Motivation Accurate predictions of protein-binding residues (PBRs) enhances understanding of molecular-level rules governing protein–protein interactions, helps protein–protein docking and facilitates annotation of protein functions. Recent studies show that current sequence-based predictors of PBRs severely cross-predict residues that interact with other types of protein partners (e.g. RNA and DNA) as PBRs. Moreover, these methods are relatively slow, prohibiting genome-scale use. Results We propose a novel, accurate and fast sequence-based predictor of PBRs that minimizes the cross-predictions. Our SCRIBER (SeleCtive pRoteIn-Binding rEsidue pRedictor) method takes advantage of three innovations: comprehensive dataset that covers multiple types of binding residues, novel types of inputs that are relevant to the prediction of PBRs, and an architecture that is tailored to reduce the cross-predictions. The dataset includes complete protein chains and offers improved coverage of binding annotations that are transferred from multiple protein–protein complexes. We utilize innovative two-layer architecture where the first layer generates a prediction of protein-binding, RNA-binding, DNA-binding and small ligand-binding residues. The second layer re-predicts PBRs by reducing overlap between PBRs and the other types of binding residues produced in the first layer. Empirical tests on an independent test dataset reveal that SCRIBER significantly outperforms current predictors and that all three innovations contribute to its high predictive performance. SCRIBER reduces cross-predictions by between 41% and 69% and our conservative estimates show that it is at least 3 times faster. We provide putative PBRs produced by SCRIBER for the entire human proteome and use these results to hypothesize that about 14% of currently known human protein domains bind proteins. Availability and implementation SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/. 
Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China.,Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
|
39
|
Zhu H, Du X, Yao Y. ConvsPPIS: Identifying Protein-protein Interaction Sites by an Ensemble Convolutional Neural Network with Feature Graph. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191105155713] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 11/22/2022]
Abstract
Background/Objective: Protein-protein interactions are essential for most cellular processes, and thus unveiling how proteins interact is a crucial question that can be better understood by recognizing which residues participate in the interaction. Although many computational approaches have been proposed to predict interface residues, their feature perspective and model learning ability are not enough to achieve ideal results. Our objective is therefore to improve predictive performance by considering the feature perspective and a new learning algorithm.
Method: In this study, we proposed an ensemble deep convolutional neural network, which explores the context and positional context of consecutive residues within a protein sub-sequence. Specifically, unlike the feature view of previous methods, ConvsPPIS uses evolutionary, physicochemical, and structural protein characteristics to construct a separate feature graph for each. After that, three independent deep convolutional neural networks are trained, one on each type of feature graph, to learn the underlying pattern in the sub-sequence. Lastly, we integrated those three deep networks into an ensemble predictor that leverages the complementary information of those features to predict potential interface residues.
Results: Comparative experiments were conducted using 10-fold cross-validation. The results indicated that ConvsPPIS achieved superior performance on the DBv5-Sel dataset with an accuracy of 88%. Additional experiments on the CAPRI-Alone dataset demonstrated that ConvsPPIS also has better prediction performance.
Conclusion: The ConvsPPIS method provided a new perspective to capture protein feature expression for identifying protein-protein interaction sites. The results proved the superiority of this method.
Affiliation(s)
- Huaixu Zhu
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Xiuquan Du
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Yu Yao
- School of Computer Science and Technology, Anhui University, Hefei, China
|
40
|
Hehenberger E, Eitel M, Fortunato SAV, Miller DJ, Keeling PJ, Cahill MA. Early eukaryotic origins and metazoan elaboration of MAPR family proteins. Mol Phylogenet Evol 2020; 148:106814. [PMID: 32278076 DOI: 10.1016/j.ympev.2020.106814] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/04/2019] [Revised: 03/24/2020] [Accepted: 04/01/2020] [Indexed: 01/01/2023]
Abstract
The membrane-associated progesterone receptor (MAPR) family consists of heme-binding proteins containing a cytochrome b5 (cytb5) domain characterized by the presence of a MAPR-specific interhelical insert region (MIHIR) between helices 3 and 4 of the canonical cytb5-domain fold. Animals possess three MAPR genes (PGRMC-like, Neuferricin and Neudesin). Here we show that all three animal MAPR genes were already present in the common ancestor of the opisthokonts (comprising animals and fungi as well as related single-celled taxa). All three MAPR genes acquired extensions C-terminal to the cytb5 domain, either before or with the evolution of animals. The archetypical MAPR protein, progesterone receptor membrane component 1 (PGRMC1), contains phosphorylated tyrosines Y139 and Y180. The combination of Y139/Y180 appeared in the common ancestor of cnidarians and bilaterians, along with an early embryological organizer and synapsed neurons, and is strongly conserved in all bilaterian animals. A predicted protein interaction motif in the PGRMC1 MIHIR is potentially regulated by Y139 phosphorylation. A multilayered model of animal MAPR function acquisition includes some pre-metazoan functions (e.g., heme binding and cytochrome P450 interactions) and some acquired animal-specific functions that involve regulation of strongly conserved protein interaction motifs acquired by animals (Metazoa). This study provides a conceptual framework for future studies, against which especially PGRMC1's multiple functions can perhaps be stratified and functionally dissected.
Affiliation(s)
- Elisabeth Hehenberger
- Department of Botany, University of British Columbia, 3529-6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
| | - Michael Eitel
- Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Sofia A V Fortunato
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD 4811, Australia
| | - David J Miller
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD 4811, Australia
| | - Patrick J Keeling
- Department of Botany, University of British Columbia, 3529-6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
| | - Michael A Cahill
- School of Biomedical Sciences, Charles Sturt University, Wagga Wagga, NSW 2678, Australia; ACRF Department of Cancer Biology and Therapeutics, The John Curtin School of Medical Research, Canberra, ACT 2601, Australia.
|
41
|
Deng A, Zhang H, Wang W, Zhang J, Fan D, Chen P, Wang B. Developing Computational Model to Predict Protein-Protein Interaction Sites Based on the XGBoost Algorithm. Int J Mol Sci 2020; 21:E2274. [PMID: 32218345 PMCID: PMC7178137 DOI: 10.3390/ijms21072274] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 02/03/2020] [Revised: 03/10/2020] [Accepted: 03/23/2020] [Indexed: 12/27/2022] Open
Abstract
The study of protein-protein interaction is of great biological significance, and the prediction of protein-protein interaction sites can promote the understanding of cell biological activity and will be helpful for drug development. However, uneven distribution between interaction and non-interaction sites is common because only a small number of protein interactions have been confirmed by experimental techniques, which greatly affects the predictive capability of computational methods. In this work, two imbalanced data processing strategies based on XGBoost algorithm were proposed to re-balance the original dataset from inherent relationship between positive and negative samples for the prediction of protein-protein interaction sites. Herein, a feature extraction method was applied to represent the protein interaction sites based on evolutionary conservatism of proteins, and the influence of overlapping regions of positive and negative samples was considered in prediction performance. Our method showed good prediction performance, such as prediction accuracy of 0.807 and MCC of 0.614, on an original dataset with 10,455 surface residues but only 2297 interface residues. Experimental results demonstrated the effectiveness of our XGBoost-based method.
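Because this abstract reports both accuracy and MCC on an imbalanced dataset, a short sketch of how the two metrics are computed from a confusion matrix may help. The counts below are hypothetical and are not taken from the paper; they only show why MCC is more informative than accuracy when negatives dominate.

```python
from math import sqrt


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical balanced case: both metrics look reasonable.
balanced_acc = accuracy(tp=40, fp=10, tn=40, fn=10)
balanced_mcc = mcc(tp=40, fp=10, tn=40, fn=10)

# Hypothetical imbalanced case: predicting "non-interface" for every residue
# gives high accuracy, but MCC shows the predictor carries no signal.
trivial_acc = accuracy(tp=0, fp=0, tn=90, fn=10)
trivial_mcc = mcc(tp=0, fp=0, tn=90, fn=10)
```

The second case is why rebalancing strategies like those the abstract describes are usually evaluated with MCC as well as accuracy.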
Affiliation(s)
- Aijun Deng
- Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (Anhui University of Technology), Ministry of Education, Ma'anshan 243002, China
- School of Metallurgical Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Department of Engineering, University of Leicester, Leicester LE1 7RH, UK
| | - Huan Zhang
- School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
| | - Wenyan Wang
- School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
| | - Jun Zhang
- Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
| | - Dingdong Fan
- School of Metallurgical Engineering, Anhui University of Technology, Ma'anshan 243032, China
| | - Peng Chen
- Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
| | - Bing Wang
- Key Laboratory of Metallurgical Emission Reduction & Resources Recycling (Anhui University of Technology), Ministry of Education, Ma'anshan 243002, China
- School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China
- Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei 230032, China
|
42
|
Qiu J, Bernhofer M, Heinzinger M, Kemper S, Norambuena T, Melo F, Rost B. ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence. J Mol Biol 2020; 432:2428-2443. [PMID: 32142788 DOI: 10.1016/j.jmb.2020.02.026] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/28/2019] [Revised: 02/17/2020] [Accepted: 02/23/2020] [Indexed: 11/29/2022]
Abstract
The intricate details of how proteins bind to proteins, DNA, and RNA are crucial for the understanding of almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we described a new, comprehensive system of in silico methods that take only protein sequence as input to predict binding of protein to DNA, RNA, and other proteins. Firstly, we needed to develop several new methods to predict whether or not proteins bind (per-protein prediction). Secondly, we developed independent methods that predict which residues bind (per-residue). Not requiring three-dimensional information, the system can predict the actual binding residue. The system combined homology-based inference with machine learning and motif-based profile-kernel approaches with word-based (ProtVec) solutions to machine learning protein level predictions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (±one standard error) corresponding to a 1.8 fold improvement over random (best classification for protein-protein with F1 = 91 ± 0.8%). Standard neural networks for per-residue binding residue predictions appeared best for DNA-binding (Q2 = 81 ± 0.9%) followed by RNA-binding (Q2 = 80 ± 1%) and worst for protein-protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through github (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).
Affiliation(s)
- Jiajun Qiu
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany.
| | - Michael Bernhofer
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
| | - Michael Heinzinger
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
| | - Sofie Kemper
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
| | - Tomas Norambuena
- Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Francisco Melo
- Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile; Institute of Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Burkhard Rost
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; Columbia University, Department of Biochemistry and Molecular Biophysics, 701 West, 168th Street, New York, NY, 10032, USA; Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; Germany & Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354 Freising, Germany
|
43
|
Ayanlaja AA, Ji G, Wang J, Gao Y, Cheng B, Kanwore K, Zhang L, Xiong Y, Kambey PA, Gao D. Doublecortin undergo nucleocytoplasmic transport via the RanGTPase signaling to promote glioma progression. Cell Commun Signal 2020; 18:24. [PMID: 32050972 PMCID: PMC7017634 DOI: 10.1186/s12964-019-0485-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/08/2019] [Accepted: 11/20/2019] [Indexed: 12/13/2022] Open
Abstract
Background Nuclear translocation of several oncogenic proteins have previously been reported, but neither the translocation of doublecortin (DCX) nor the mechanism involved has been studied. DCX is a neuronal microtubule-associated protein (MAP) that is crucial for adult neurogenesis and neuronal migration and has been associated with poor prognosis in gliomas. Methods We probed DCX expression in different grades of glioma tissues and conventional cells via western blotting. Then we analyzed the expression pattern in the Oncomine cancer profiling database. Confocal Immunofluorescence was used to detect DCX expression in the cellular compartments, while subcellular fractionation was probed via western blotting. Pulse shape height analysis was utilized to verify DCX localization in a larger population of cells. Co-immunoprecipitation was used in detecting DCX-import receptors interactions. To probe for DCX functions, stable cells expressing high DCX expression or knockdown were generated using CRISPR-Cas9 viral transfection, while plasmid site-directed mutant constructs were used to validate putative nuclear localization sequence (NLS) predicted via conventional algorithms and comparison with classical NLSs. in-silico modeling was performed to validate DCX interactions with import receptors via the selected putative NLS. Effects of DCX high expression, knockdown, mutation, and/or deletion of putative NLS sites were probed via Boyden’s invasion assay and wound healing migration assays, and viability was detected by CCK8 assays in-vitro, while xenograft tumor model was performed in nude mice. Results DCX undergoes nucleocytoplasmic movement via the RanGTPase signaling pathway with an NLS located on the N-terminus between serine47-tyrosine70. This translocation could be stimulated by MARK’s phosphorylation of the serine 47 residue flanking the NLS due to aberrant expression of glial cell line-derived neurotrophic factor (GDNF). 
High expression and nuclear accumulation of DCX improve invasive glioma abilities in-vitro and in-vivo. Moreover, knocking down or blocking DCX nuclear import attenuates invasiveness and proliferation of glioma cells. Conclusion Collectively, this study highlights a remarkable phenomenon in glioma, hence revealing potential glioma dependencies on DCX expression, which is amenable to targeted therapy.
Affiliation(s)
- Abiola Abdulrahman Ayanlaja
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China
| | - Guanquan Ji
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China.,Department of Neurosurgery, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, China.,Department of Neurosurgery, The Third Affiliated Hospital of Henan University of Science and Technology, Henan, China
| | - Jie Wang
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China
| | - Yue Gao
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China
| | - Bo Cheng
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China
| | - Kouminin Kanwore
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China
| | - Lin Zhang
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China
| | - Ye Xiong
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China
| | - Piniel Alphayo Kambey
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China
| | - Dianshuai Gao
- Department of Neurobiology and Anatomy, Key Laboratory of Neurobiology, Xuzhou Medical University, 209, Tongshan Road, Xuzhou, 221004, China.
|
44
|
Xie Z, Deng X, Shu K. Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int J Mol Sci 2020; 21:E467. [PMID: 31940793 PMCID: PMC7013409 DOI: 10.3390/ijms21020467] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/24/2019] [Revised: 12/23/2019] [Accepted: 01/08/2020] [Indexed: 12/20/2022] Open
Abstract
Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which is the basis of a variety of biological processes. Experimental methods to solve PPI sites are expensive and time-consuming, which has led to the development of different kinds of prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to improve the positive samples. Our method obtains a remarkable result of the area under the curve (AUC) = 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that there are considerable false-positive PPI sites in the positive samples defined by the distance between residue atoms.
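The AUC figure quoted in this abstract is the area under the ROC curve. A minimal sketch of one standard way to compute it, as the probability that a randomly chosen positive residue scores higher than a randomly chosen negative one, is given below. The labels and scores are made up for illustration and have no connection to the paper's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue labels (1 = interface) and predicted scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of residues, so it suits small illustrations; rank-based formulations give the same value more efficiently on large datasets.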
Affiliation(s)
- Zengyan Xie
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
| | | | - Kunxian Shu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
|
45
|
Barreto CAV, Baptista SJ, Preto AJ, Matos-Filipe P, Mourão J, Melo R, Moreira I. Prediction and targeting of GPCR oligomer interfaces. Progress in Molecular Biology and Translational Science 2020; 169:105-149. [PMID: 31952684 DOI: 10.1016/bs.pmbts.2019.11.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022]
Abstract
GPCR oligomerization has emerged as a hot topic in the GPCR field in the last years. Receptors that are part of these oligomers can influence each other's function, although it is not yet entirely understood how these interactions work. The existence of such a highly complex network of interactions between GPCRs generates the possibility of alternative targets for new therapeutic approaches. However, challenges still exist in the characterization of these complexes, especially at the interface level. Different experimental approaches, such as FRET or BRET, are usually combined to study GPCR oligomer interactions. Computational methods have been applied as a useful tool for retrieving information from GPCR sequences and the few X-ray-resolved oligomeric structures that are accessible, as well as for predicting new and trustworthy GPCR oligomeric interfaces. Machine-learning (ML) approaches have recently helped with some hindrances of other methods. By joining and evaluating multiple structure-, sequence- and co-evolution-based features on the same algorithm, it is possible to dilute the issues of particular structures and residues that arise from the experimental methodology into all-encompassing algorithms capable of accurately predict GPCR-GPCR interfaces. All these methods used as a single or a combined approach provide useful information about GPCR oligomerization and its role in GPCR function and dynamics. Altogether, we present experimental, computational and machine-learning methods used to study oligomers interfaces, as well as strategies that have been used to target these dynamic complexes.
Affiliation(s)
- Carlos A V Barreto
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
| | - Salete J Baptista
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, LRS, Portugal
| | - António José Preto
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
| | - Pedro Matos-Filipe
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
| | - Joana Mourão
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Institute for Interdisciplinary Research, University of Coimbra, Coimbra, Portugal
| | - Rita Melo
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, LRS, Portugal
| | - Irina Moreira
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Science and Technology Faculty, University of Coimbra, Coimbra, Portugal.
|
46
|
Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 2019; 20:723. [PMID: 31847804 PMCID: PMC6918593 DOI: 10.1186/s12859-019-3220-8] [Citation(s) in RCA: 241] [Impact Index Per Article: 48.2] [Received: 05/03/2019] [Accepted: 11/13/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Predicting protein function and structure from sequence is an important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both these problems are addressed by the new methodology introduced here. RESULTS We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and for some proteins even beat the best. Thus, the embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, i.e. microbiome or metaproteome analysis. CONCLUSION Transfer-learning succeeded in extracting information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available on the level of a single sequence.
Affiliation(s)
- Michael Heinzinger
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Ahmed Elnaggar
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Yu Wang
  - Leibniz Supercomputing Centre, Boltzmannstr. 1, 85748, Garching/Munich, Germany
- Christian Dallago
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Dmitrii Nechaev
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Florian Matthes
  - TUM Department of Informatics, Software Engineering and Business Information Systems, Boltzmannstr. 1, 85748, Garching/Munich, Germany
- Burkhard Rost
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
  - Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
|
47
|
Batra V, Maheshwarappa A, Dagar K, Kumar S, Soni A, Kumaresan A, Kumar R, Datta TK. Unusual interplay of contrasting selective pressures on β-defensin genes implicated in male fertility of the Buffalo (Bubalus bubalis). BMC Evol Biol 2019; 19:214. [PMID: 31771505 PMCID: PMC6878701 DOI: 10.1186/s12862-019-1535-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/07/2019] [Accepted: 10/22/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The buffalo, despite its superior milk-producing ability, suffers from reproductive limitations that constrain its lifetime productivity. Male sub-fertility, manifested as low conception rates (CRs), is a major concern in buffaloes. The epididymal sperm surface-binding proteins which participate in the sperm surface remodelling (SSR) events affect the survival and performance of the spermatozoa in the female reproductive tract (FRT). A mutation in an epididymal secreted protein, beta-defensin 126 (DEFB-126/BD-126), a class-A beta-defensin (CA-BD), resulted in decreased CRs in human cohorts across the globe. To better understand the role of CA-BDs in buffalo reproduction, this study aimed to identify the BD genes for characterization of the selection pressure(s) acting on them, and to identify the most abundant CA-BD transcript in the buffalo male reproductive tract (MRT) for predicting its reproductive functional significance. RESULTS Despite the low protein sequence homology with their orthologs, the CA-BDs have maintained the molecular framework and the structural core vital to their biological functions. Their coding-sequences in ruminants revealed evidence of pervasive purifying and episodic diversifying selection pressures. The buffalo CA-BD genes were expressed in the major reproductive and non-reproductive tissues exhibiting spatial variations. The Buffalo BD-129 (BuBD-129) was the most abundant and the longest CA-BD in the distal-MRT segments and was predicted to be heavily O-glycosylated. CONCLUSIONS The maintenance of the structural core, despite the sequence divergence, indicated the conservation of the molecular functions of the CA-BDs. The expression of the buffalo CA-BDs in both the distal-MRT segments and non-reproductive tissues indicates the retention of the primordial microbicidal activity, which was also predicted by in silico sequence analyses. However, the observed spatial variations in their expression across the MRT hint at their region-specific roles. Their comparison across mammalian species revealed a pattern in which the various CA-BDs appeared to follow dissimilar evolutionary paths. This pattern appears to maintain only the highly efficacious CA-BD alleles and diversify their functional repertoire in the ruminants. Our preliminary results and analyses indicated that BuBD-129 could be the functional ortholog of the primate DEFB-126. Further studies are warranted to assess its molecular functions to elucidate its role in immunity, reproduction and fertility.
Affiliation(s)
- Vipul Batra
  - Animal Genomics Lab, National Dairy Research Institute, Karnal, 132001, India
- Komal Dagar
  - Animal Genomics Lab, National Dairy Research Institute, Karnal, 132001, India
- Sandeep Kumar
  - Animal Genomics Lab, National Dairy Research Institute, Karnal, 132001, India
- Apoorva Soni
  - Animal Genomics Lab, National Dairy Research Institute, Karnal, 132001, India
- A Kumaresan
  - Theriogenology Lab, SRS of NDRI, Bengaluru, 560030, India
- Rakesh Kumar
  - Animal Genomics Lab, National Dairy Research Institute, Karnal, 132001, India
- T K Datta
  - Animal Genomics Lab, National Dairy Research Institute, Karnal, 132001, India
|
48
|
Perry GML. 'Fat's chances': Loci for phenotypic dispersion in plasma leptin in mouse models of diabetes mellitus. PLoS One 2019; 14:e0222654. [PMID: 31661517 PMCID: PMC6818960 DOI: 10.1371/journal.pone.0222654] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/01/2019] [Accepted: 09/04/2019] [Indexed: 01/29/2023] Open
Abstract
Background Leptin, a critical mediator of feeding, metabolism and diabetes, is expressed on an incidental basis according to satiety. The genetic regulation of leptin should similarly be episodic. Methodology Data from three mouse cohorts hosted by the Jackson Laboratory – 402 (174F, 228M) F2 Dilute Brown non-Agouti (DBA/2)×DU6i intercrosses, 142 Non Obese Diabetic (NOD/ShiLtJ×(NOD/ShiLtJ×129S1/SvImJ.H2g7) N2 backcross females, and 204 male Nonobese Nondiabetic (NON)×New Zealand Obese (NZO/HlLtJ) reciprocal backcrosses – were used to test for loci associated with absolute residuals in plasma leptin and arcsin-transformed percent fat (‘phenotypic dispersion’; PDpLep and PDAFP). Individual data from 1,780 mice from 43 inbred strains were also used to estimate genetic variances and covariances for dispersion in each trait. Principal findings Several loci for PDpLep were detected, including possibly syntenic Chr 17 loci, but there was only a single position on Chr 6 for PDAFP. Coding SNPs in genes linked to the consensus Chr 17 PDpLep locus occurred in immunological and cancer genes, genes linked to diabetes and energy regulation, post-transcriptional processors and vomeronasal variants. There was evidence of intersexual differences in the genetic architecture of PDpLep. PDpLep had moderate heritability (h_s^2 = 0.29) and PDAFP low heritability (h_s^2 = 0.12); dispersion in these traits was highly genetically correlated (r = 0.8). Conclusions Greater genetic variance for dispersion in plasma leptin, a physiological trait, may reflect its more ephemeral nature compared to body fat, an accrued progressive character. Genetic effects on incidental phenotypes such as leptin might be effectively characterized with randomization-detection methodologies in addition to classical approaches, helping identify incipient or borderline cases or providing new therapeutic targets.
Affiliation(s)
- Guy M. L. Perry
- Department of Biology, University of Prince Edward Island, Charlottetown, PEI, Canada
|
49
|
Zhang B, Li J, Quan L, Chen Y, Lü Q. Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.013] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 11/24/2022]
|
50
|
Sarkar D, Saha S. Machine-learning techniques for the prediction of protein–protein interactions. J Biosci 2019. [DOI: 10.1007/s12038-019-9909-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 02/06/2023]
|