1
ProB-Site: Protein Binding Site Prediction Using Local Features. Cells 2022; 11:cells11132117. [PMID: 35805201 PMCID: PMC9266162 DOI: 10.3390/cells11132117] [Received: 05/19/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022]
Abstract
Protein–protein interactions (PPIs) are responsible for various essential biological processes, and knowledge of them can help in developing new drugs against diseases. Various experimental methods have been employed to identify PPIs; however, their application is limited by their cost and time consumption. Computational methods are therefore considered a viable alternative for this crucial task. Various techniques that use the sequential information of amino acids in a protein sequence, including machine learning and deep learning techniques, have been explored in the literature. The efficiency of current interaction-site prediction still leaves room for improvement. Hence, a deep neural network-based model, ProB-site, is proposed. ProB-site uses the sequential information of a protein to predict its binding sites. The proposed model uses evolutionary information and predicted structural information extracted from the protein sequence, generating three unique feature sets for every amino acid in a protein sequence. These feature sets are then fed to their respective sub-CNN architectures to acquire complex features. Finally, the acquired features are concatenated and classified using fully connected layers. This methodology performed better than state-of-the-art techniques because of the selection of the best features and the consideration of the local information of each amino acid.
2
Multi-task learning to leverage partially annotated data for PPI interface prediction. Sci Rep 2022; 12:10487. [PMID: 35729253 PMCID: PMC9213449 DOI: 10.1038/s41598-022-13951-2] [Received: 01/17/2022] [Accepted: 05/31/2022]
Abstract
Protein-protein interactions (PPIs) are crucial for protein functioning, yet predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. To use a deep learning strategy, we have to overcome this limited data availability. Here we use a multi-task learning strategy that can handle missing data. We started with a multi-task model architecture and adapted it to handle missing data carefully in the cost function. As related learning tasks, we include prediction of secondary structure, solvent accessibility, and buried residues. Our results show that the multi-task learning strategy significantly outperforms single-task approaches. Moreover, only the multi-task strategy is able to learn effectively over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations (with data extension) reaches the same performance as the single-task learner trained on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset in which the protein's functional properties of interest are only partially annotated.
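Handling missing data in the cost function can be illustrated with a masked loss. The following is a minimal Python sketch, not the authors' implementation: it assumes every task is a per-residue binary task (secondary structure and solvent accessibility would in practice need multi-class or regression losses), and the names `masked_bce` and `multitask_loss` are hypothetical. Residues whose label is missing (`None`) contribute nothing, so a protein without PPI annotations can still train the auxiliary tasks.

```python
import math
from typing import Dict, List, Optional

def masked_bce(probs: List[float], labels: List[Optional[int]]) -> float:
    """Mean binary cross-entropy over the residues whose label is known.

    A label of None marks a missing annotation; such residues are skipped.
    Returns 0.0 if no residue in this task is annotated.
    """
    eps = 1e-7  # clip probabilities to avoid log(0)
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # missing annotation: no contribution to the loss
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0

def multitask_loss(task_probs: Dict[str, List[float]],
                   task_labels: Dict[str, List[Optional[int]]],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of the per-task masked losses (equal weights by default)."""
    if weights is None:
        weights = {task: 1.0 for task in task_probs}
    return sum(weights[t] * masked_bce(task_probs[t], task_labels[t])
               for t in task_probs)

# Hypothetical example: the PPI task is annotated for one residue only,
# while the auxiliary "buried" task is fully annotated.
probs = {"ppi": [0.9, 0.2, 0.6], "buried": [0.1, 0.8, 0.5]}
labels = {"ppi": [1, None, None], "buried": [0, 1, 0]}
print(round(multitask_loss(probs, labels), 4))
```

Because missing labels are skipped rather than treated as negatives, adding structure-only proteins to the training set changes the auxiliary losses without biasing the PPI loss.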
3
Casadio R, Martelli PL, Savojardo C. Machine learning solutions for predicting protein–protein interactions. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1618]
Affiliation(s)
- Rita Casadio
- Biocomputing Group University of Bologna Bologna Italy
4
Wang P, Zhang G, Yu ZG, Huang G. A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites. Front Genet 2021; 12:752732. [PMID: 34764983 PMCID: PMC8576272 DOI: 10.3389/fgene.2021.752732] [Received: 08/03/2021] [Accepted: 09/20/2021]
Abstract
Knowledge about protein-protein interactions is beneficial for understanding cellular mechanisms. Protein-protein interactions are usually determined according to their protein-protein interaction sites. Owing to the limitations of current techniques, detecting protein-protein interaction sites remains a challenging task. In this article, we present a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model served as a feature extractor to remove redundant information from protein sequences, and the Extreme Gradient Boosting (XGBoost) algorithm was used to construct a classifier for predicting protein-protein interaction sites. DeepPPISP-XGB achieved an area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and an area under the precision-recall curve of 0.339, which is competitive with state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.
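The per-residue metrics reported above can be computed as in the following small Python sketch, given for illustration only. AUROC is computed as the probability that a randomly chosen interaction-site residue is scored above a randomly chosen non-site residue (ties count one half). The 0.5 threshold used for recall is an assumption, since the abstract does not state the cut-off. The quadratic pairwise loop suits toy data; a real evaluation would typically use a library routine.

```python
from typing import List

def auroc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

def recall(scores: List[float], labels: List[int], threshold: float = 0.5) -> float:
    """Fraction of true interaction sites predicted as sites (score >= threshold)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    positives = sum(1 for y in labels if y == 1)
    return tp / positives if positives else 0.0

# Toy per-residue scores and labels (1 = interaction site), for illustration only.
scores = [0.9, 0.4, 0.7, 0.2]
labels = [1, 1, 0, 0]
print(auroc(scores, labels), recall(scores, labels))  # 0.75 0.5
```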
Affiliation(s)
- Pan Wang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Guiyang Zhang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
5
Zeng M, Zhang F, Wu FX, Li Y, Wang J, Li M. Protein-protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 2020; 36:1114-1120. [PMID: 31593229 DOI: 10.1093/bioinformatics/btz699] [Received: 05/03/2019] [Revised: 07/25/2019] [Accepted: 09/04/2019]
Abstract
MOTIVATION Protein-protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming. Thus, many computational approaches have been proposed to predict PPI sites. Existing computational methods usually use local contextual features to predict PPI sites. However, global features of protein sequences are also critical for PPI site prediction. RESULTS A new end-to-end deep learning framework, named DeepPPISP, which combines local contextual and global sequence features, is proposed for PPI site prediction. For local contextual features, we use a sliding window to capture the features of the neighbors of a target amino acid, as in previous studies. For global sequence features, a text convolutional neural network is applied to extract features from the whole protein sequence. The local contextual and global sequence features are then combined to predict PPI sites. By integrating local contextual and global sequence features, DeepPPISP achieves state-of-the-art performance, outperforming the other competing methods. To investigate whether global sequence features are helpful in our deep learning model, we remove or change some components of DeepPPISP. Detailed analyses show that global sequence features play important roles in DeepPPISP. AVAILABILITY AND IMPLEMENTATION The DeepPPISP web server is available at http://bioinformatics.csu.edu.cn/PPISP/. The source code can be obtained from https://github.com/CSUBioGroup/DeepPPISP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
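The sliding-window idea for local contextual features can be sketched in a few lines of Python. This is an illustrative sketch, not DeepPPISP's code. The per-residue feature vectors, the `half_width` parameter, and zero-padding at the sequence termini are assumptions made here. Each residue's local vector is the concatenation of its own features and those of its neighbors inside the window, so its length is `(2 * half_width + 1) * dim`.

```python
from typing import List

def sliding_windows(features: List[List[float]], half_width: int) -> List[List[float]]:
    """Concatenate each residue's features with those of its +/- half_width neighbors.

    Positions that fall outside the sequence are zero-padded, so every
    output vector has length (2 * half_width + 1) * dim.
    """
    if not features:
        return []
    n, dim = len(features), len(features[0])
    zero = [0.0] * dim
    windows = []
    for i in range(n):
        window: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            window.extend(features[j] if 0 <= j < n else zero)
        windows.append(window)
    return windows

# Three residues with one (hypothetical) feature each, window half-width 1.
print(sliding_windows([[1.0], [2.0], [3.0]], 1))
# [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 0.0]]
```

A fixed window gives every residue an input of the same size, which is what allows a per-residue classifier to be applied; the global text-CNN branch described above supplies the whole-sequence context that such a window cannot see.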
Affiliation(s)
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Fuhao Zhang
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon SKS7N5A9, Canada
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
6
Large scale analyses of genotype-phenotype relationships of glycine decarboxylase mutations and neurological disease severity. PLoS Comput Biol 2020; 16:e1007871. [PMID: 32421718 PMCID: PMC7259800 DOI: 10.1371/journal.pcbi.1007871] [Received: 12/17/2019] [Revised: 05/29/2020] [Accepted: 04/13/2020]
Abstract
Monogenetic diseases provide a unique opportunity for studying complex clinical states that underlie neurological severity. Loss of glycine decarboxylase (GLDC) can severely impact neurological development, as seen in non-ketotic hyperglycinemia (NKH). NKH is a neuro-metabolic disorder lacking quantitative predictors of disease states. It is characterized by elevated glycine, seizures, and failure to thrive, but glycine reduction often fails to confer neurological benefit, suggesting the need for alternative tools to distinguish severe from attenuated disease. A major challenge has been that there are 255 unique disease-causing missense mutations in GLDC, of which 206 remain entirely uncharacterized. Here we report a Multiparametric Mutation Score (MMS), developed by combining in silico predictions of stability, evolutionary conservation, and protein interaction models, that is suitable for assessing 251 of the 255 mutations. In addition, we created a quantitative scale of clinical disease severity comprising four major disease domains (seizure, cognitive failure, muscular and motor control, and brain malformation) to comprehensively score patient symptoms identified in 131 clinical reports published over the last 15 years. The resulting patient Clinical Outcomes Scores (COS) were used to optimize the MMS for biological and clinical relevance and to yield a patient Weighted Multiparametric Mutation Score (WMMS) that separates severe from attenuated neurological disease (p = 1.2 × 10⁻⁵). Our study provides a basis for developing quantitative tools to predict the clinical severity of neurological disease, as well as a clinical scale for monitoring disease progression, which is needed to evaluate new treatments for NKH.
Neurodegenerative disorders frequently have diverse, severe symptoms and health outcomes that can be difficult to predict. The rare disease non-ketotic hyperglycinemia (NKH) additionally has a wide range of disease-causing mutations in glycine decarboxylase (GLDC), a protein that breaks down glycine. However, measuring glycine is not sufficient to predict disease outcome. A method to predict whether a mutation will cause severe or milder forms of NKH would be very helpful both for understanding the disease and for developing treatments. We used computational approaches to develop a mutation score that comprehensively predicts how mutations decrease GLDC function. After training against clinical data, the score was able to predict whether a mutation will cause severe or attenuated disease. This study demonstrates the power of computational and multidisciplinary analyses to advance the understanding and treatment of genetically caused neurodegenerative diseases.
7
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform 2016; 17:117-31. [PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027] [Received: 01/29/2015] [Revised: 03/18/2015]
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determining the residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming; thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We distinguish between methods that address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically, based on required input and prediction principles, to provide an overview of the field.
8
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Received: 08/23/2014] [Accepted: 01/07/2015]
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most recent methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility of PPIS identification for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
9
Carl N, Hodošček M, Vehar B, Konc J, Brooks BR, Janežič D. Correlating protein hot spot surface analysis using ProBiS with simulated free energies of protein-protein interfacial residues. J Chem Inf Model 2012; 52:2541-9. [PMID: 23009716 DOI: 10.1021/ci3003254]
Abstract
A protocol was developed for the computational determination of the contribution of interfacial amino acid residues to the free energy of protein-protein binding. Thermodynamic integration, based on molecular dynamics simulation in CHARMM, was used to determine the free energy associated with single point mutations to glycine in a protein-protein interface. The hot spot amino acids found in this way were then correlated to structural similarity scores detected by the ProBiS algorithm for local structural alignment. We find that amino acids with high structural similarity scores contribute on average -3.19 kcal/mol to the free energy of protein-protein binding and are thus correlated with hot spot residues, while residues with low similarity scores contribute on average only -0.43 kcal/mol. This suggests that the local structural alignment method provides a good approximation of the contribution of a residue to the free energy of binding and is particularly useful for detection of hot spots in proteins with known structures but undetermined protein-protein complexes.
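Thermodynamic integration estimates a free-energy difference as ΔG = ∫₀¹ ⟨∂U/∂λ⟩_λ dλ, where the ensemble averages come from simulations at discrete values of the coupling parameter λ (here, morphing a residue into glycine). The Python sketch below integrates such averages with the trapezoidal rule and combines the mutation performed in the complex and in the unbound protein into a change in binding free energy through the usual thermodynamic cycle. It is a generic illustration under these assumptions, not the CHARMM protocol used in the study, and the λ grid and averages shown are made-up numbers.

```python
from typing import Sequence

def ti_free_energy(lambdas: Sequence[float], mean_dudl: Sequence[float]) -> float:
    """Trapezoidal estimate of dG = integral over lambda in [0, 1] of <dU/dlambda>."""
    if len(lambdas) != len(mean_dudl) or len(lambdas) < 2:
        raise ValueError("need matching lambda and <dU/dlambda> values (at least two)")
    if lambdas[0] != 0.0 or lambdas[-1] != 1.0:
        raise ValueError("the lambda grid must span 0 to 1")
    return sum((lambdas[k + 1] - lambdas[k]) * (mean_dudl[k] + mean_dudl[k + 1]) / 2.0
               for k in range(len(lambdas) - 1))

def binding_ddg(lambdas: Sequence[float],
                dudl_complex: Sequence[float],
                dudl_unbound: Sequence[float]) -> float:
    """Thermodynamic cycle: ddG_bind = dG_mutation(complex) - dG_mutation(unbound)."""
    return ti_free_energy(lambdas, dudl_complex) - ti_free_energy(lambdas, dudl_unbound)

# Made-up <dU/dlambda> averages (kcal/mol) at five lambda windows, for illustration.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(binding_ddg(grid, [4.0, 3.0, 2.5, 2.0, 1.5], [2.0, 1.5, 1.0, 0.5, 0.0]))  # 1.5625
```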
Affiliation(s)
- Nejc Carl
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
10
Weisel M, Bitter HM, Diederich F, So WV, Kondru R. PROLIX: rapid mining of protein-ligand interactions in large crystal structure databases. J Chem Inf Model 2012; 52:1450-61. [PMID: 22582806 DOI: 10.1021/ci300034x]
Abstract
A central problem in structure-based drug design is understanding protein-ligand interactions quantitatively and qualitatively. Several recent studies have highlighted from a qualitative perspective the nature of these interactions and their utility in drug discovery. However, a common limitation is a lack of adequate tools to mine these interactions comprehensively, since exhaustive searches of the Protein Data Bank are time-consuming and difficult to perform. Consequently, fundamental questions remain unanswered: How unique or how common are the protein-ligand interactions observed in a given drug design project when compared to all complexed structures in the Protein Data Bank? Which interaction patterns might explain the affinity of a tool compound toward unwanted targets? To answer these questions and to enable the systematic and comprehensive study of protein-ligand interactions, we introduce PROLIX (Protein Ligand Interaction Explorer), a tool that uses sophisticated fingerprint representations of protein-ligand interaction patterns for rapid data mining in large crystal structure databases. Our implementation strategy pursues a branch-and-bound technique that enables mining against thousands of complexes within a few seconds. Key elements of PROLIX include (i) an intuitive interface that enables users to formulate complex queries easily, (ii) exceptional speed for results retrieval, and (iii) a sophisticated results summarization. Herein we describe the algorithms developed to enable complex queries and fast retrieval of search results, as well as the intuitive aspects of the user interface and summarization viewer.
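The basic idea of fingerprint-based interaction mining can be illustrated with bit vectors. In this simplified Python sketch, each complex's protein-ligand interactions are encoded as bits in an integer, and a query matches every complex whose fingerprint contains all of the query's bits. The interaction labels and PDB-style identifiers are hypothetical, and the exhaustive linear scan shown here is a deliberate simplification: it reproduces neither PROLIX's fingerprint design nor its branch-and-bound search.

```python
from typing import Dict, Iterable, List

def encode(interactions: Iterable[str], vocabulary: Dict[str, int]) -> int:
    """Set one bit per interaction type present in a complex."""
    bits = 0
    for name in interactions:
        bits |= 1 << vocabulary[name]
    return bits

def screen(database: Dict[str, int], query: int) -> List[str]:
    """Return complexes whose fingerprint contains every bit set in the query."""
    return sorted(name for name, fp in database.items() if (fp & query) == query)

# Hypothetical interaction vocabulary and fingerprint database.
vocab = {"hbond_hinge": 0, "pi_stack_phe": 1, "salt_bridge_asp": 2}
db = {
    "1abc": encode(["hbond_hinge", "pi_stack_phe"], vocab),
    "2def": encode(["salt_bridge_asp"], vocab),
    "3ghi": encode(["hbond_hinge", "pi_stack_phe", "salt_bridge_asp"], vocab),
}
print(screen(db, encode(["hbond_hinge", "pi_stack_phe"], vocab)))  # ['1abc', '3ghi']
```

The subset test `(fp & query) == query` is a single bitwise operation per complex, which is why fingerprints make database-wide screening cheap even before any pruning strategy is added.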
Affiliation(s)
- Martin Weisel
- Discovery Chemistry, Hoffmann-La Roche, Inc., 340 Kingsland Street, Nutley, New Jersey 07110, USA.
11
Jordan RA, EL-Manzalawy Y, Dobbs D, Honavar V. Predicting protein-protein interface residues using local surface structural similarity. BMC Bioinformatics 2012; 13:41. [PMID: 22424103 PMCID: PMC3386866 DOI: 10.1186/1471-2105-13-41] [Received: 09/12/2011] [Accepted: 03/18/2012]
Abstract
BACKGROUND Identification of the residues in protein-protein interaction sites has a significant impact on problems such as drug discovery. Motivated by the observation that the set of interface residues of a protein tends to be conserved even among remote structural homologs, we introduce PrISE, a family of local structural similarity-based computational methods for predicting protein-protein interface residues. RESULTS We present a novel representation of the surface residues of a protein in the form of structural elements. Each structural element consists of a central residue and its surface neighbors. The PrISE family of interface prediction methods uses a representation of structural elements that captures the atomic composition and accessible surface area of the residues that make up each structural element. For each structural element in the query protein, each PrISE method identifies a collection of similar structural elements in its repository and weights them according to their similarity to the query's structural element. PrISEL relies on the similarity between structural elements (i.e., local structural similarity). PrISEG relies on the similarity between protein surfaces (i.e., general structural similarity). PrISEC combines local and general structural similarity to predict interface residues. These predictors label the central residue of a structural element in a query protein as an interface residue if a weighted majority of the structural elements similar to it are interface residues, and as a non-interface residue otherwise. The results of our experiments on three representative benchmark datasets show that PrISEC outperforms PrISEL and PrISEG, and that PrISEC is highly competitive with state-of-the-art structure-based methods for predicting protein-protein interface residues.
Our comparison of PrISEC with PredUs, a recently developed method for predicting the interface residues of a query protein based on the known interface residues of its (global) structural homologs, shows that performance superior or comparable to that of PredUs can be obtained using only local surface structural similarity. PrISEC is available as a web server at http://prise.cs.iastate.edu/. CONCLUSIONS Local surface structural similarity-based methods offer a simple, efficient, and effective approach to predicting protein-protein interface residues.
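The labeling rule described in this abstract is a similarity-weighted majority vote. The Python sketch below assumes the similar repository elements have already been retrieved and given non-negative similarity weights; how PrISE derives those weights from atomic composition and accessible surface area is not reproduced, and the function name is hypothetical. Following the abstract, a residue is labeled as interface only when the interface side holds a strict weighted majority, so ties fall to non-interface.

```python
from typing import Iterable, Tuple

def weighted_majority_label(neighbors: Iterable[Tuple[float, bool]]) -> bool:
    """Label a query element's central residue from similar repository elements.

    neighbors: (similarity_weight, central_residue_is_interface) pairs.
    Returns True (interface) only if the interface weight strictly exceeds
    the non-interface weight; otherwise False (non-interface).
    """
    interface = non_interface = 0.0
    for weight, is_interface in neighbors:
        if is_interface:
            interface += weight
        else:
            non_interface += weight
    return interface > non_interface

# Hypothetical similar elements: one strong interface match, two weaker non-interface ones.
print(weighted_majority_label([(0.9, True), (0.3, False), (0.4, False)]))  # True
```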
Affiliation(s)
- Rafael A Jordan
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Department of Systems and Computer Engineering, Pontificia Universidad Javeriana, Cali, Colombia
- Yasser EL-Manzalawy
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
- Drena Dobbs
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Vasant Honavar
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
12
Guo F, Li SC, Wang L. Protein-protein binding sites prediction by 3D structural similarities. J Chem Inf Model 2011; 51:3287-94. [PMID: 22077765 DOI: 10.1021/ci200206n]
Abstract
Identifying the location of binding sites on proteins is of fundamental importance for a wide range of applications including molecular docking, de novo drug design, structure identification, and comparison of functional sites. In this paper, we develop an efficient approach for finding binding sites between proteins. Our approach consists of four steps: local sequence alignment, protein surface detection, 3D structure comparison, and candidate binding site selection. A comparison of our method with the LSA algorithm shows that the binding sites predicted by our method are somewhat closer to the actual binding sites in the protein-protein complexes. The software package is available at http://sites.google.com/site/guofeics/pro-bs for noncommercial use.
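The first step of this pipeline, local sequence alignment, is classically done with the Smith-Waterman algorithm. The Python sketch below returns only the best local alignment score and uses a simple match/mismatch scheme with a linear gap penalty. These scoring values are assumptions for illustration (protein alignments normally use a substitution matrix such as BLOSUM62 and affine gaps), and the abstract does not say which aligner or parameters the method actually uses.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best local score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start fresh anywhere: this is what makes it local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

# A short fragment found inside a longer (hypothetical) sequence.
print(smith_waterman("MKTAYIAK", "TAYI"))  # 8
```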
Affiliation(s)
- Fei Guo
- School of Computer Science and Technology, Shandong University, Jinan 250101, Shandong, China
13
Craig IR, Pfleger C, Gohlke H, Essex JW, Spiegel K. Pocket-space maps to identify novel binding-site conformations in proteins. J Chem Inf Model 2011; 51:2666-79. [PMID: 21910474 DOI: 10.1021/ci200168b]
Abstract
The identification of novel binding-site conformations can greatly assist the progress of structure-based ligand design projects. Diverse pocket shapes drive medicinal chemistry to explore a broader chemical space and thus present additional opportunities to overcome key drug discovery issues such as potency, selectivity, toxicity, and pharmacokinetics. We report a new automated approach to diverse pocket selection, PocketAnalyzer(PCA), which applies principal component analysis and clustering to the output of a grid-based pocket detection algorithm. Since the approach works directly with pocket shape descriptors, it is free from some of the problems hampering methods that are based on proxy shape descriptors, e.g., a set of atomic positional coordinates. The approach is technically straightforward and allows simultaneous analysis of mutants, isoforms, and protein structures derived from multiple sources with different residue numbering schemes. The PocketAnalyzer(PCA) approach is illustrated by the compilation of diverse sets of pocket shapes for aldose reductase and viral neuraminidase. In both cases this allows identification of novel computationally derived binding-site conformations that are yet to be observed crystallographically. Indeed, known inhibitors capable of exploiting these novel binding-site conformations are subsequently identified, thereby demonstrating the utility of PocketAnalyzer(PCA) for rationalizing and improving the understanding of the molecular basis of protein-ligand interaction and bioactivity. A Python program implementing the PocketAnalyzer(PCA) approach is available for download under an open-source license (http://sourceforge.net/projects/papca/ or http://cpclab.uni-duesseldorf.de/downloads).
Affiliation(s)
- Ian R Craig
- Novartis Institutes for Biomedical Research, Wimblehurst Road, Horsham, West Sussex RH12 5AB, UK.
14
Xue LC, Dobbs D, Honavar V. HomPPI: a class of sequence homology based protein-protein interface prediction methods. BMC Bioinformatics 2011; 12:244. [PMID: 21682895 PMCID: PMC3213298 DOI: 10.1186/1471-2105-12-244] [Received: 05/25/2010] [Accepted: 06/17/2011]
Abstract
BACKGROUND Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question of whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. RESULTS We studied more than 300,000 pairwise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified the sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (non-partner-specific HomPPI), which can be used to predict the interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers, including those that exploit the structure of the query proteins.
The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. CONCLUSIONS Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
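The basic step behind homology-based interface inference, transferring known interface labels from a homolog to the query through a pairwise alignment, can be sketched as follows. This is a single-template simplification under stated assumptions: the alignment is given as two gapped strings with '-' for gaps, labels are 0/1 per template residue, and the sequence-similarity criteria and multi-homolog handling that HomPPI relies on are not modeled. The function name is hypothetical.

```python
from typing import List, Optional, Sequence

def transfer_interface_labels(query_aln: str, template_aln: str,
                              template_labels: Sequence[int]) -> List[Optional[int]]:
    """Map a template's per-residue interface labels onto the query residues.

    Returns one entry per (ungapped) query residue: the label of the aligned
    template residue, or None where the query residue faces a gap.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    if sum(c != "-" for c in template_aln) != len(template_labels):
        raise ValueError("need exactly one label per template residue")
    out: List[Optional[int]] = []
    t_idx = 0  # index into template_labels (ungapped template positions)
    for q_char, t_char in zip(query_aln, template_aln):
        if t_char != "-":
            label: Optional[int] = template_labels[t_idx]
            t_idx += 1
        else:
            label = None  # no template residue to copy a label from
        if q_char != "-":
            out.append(label)
    return out

# Hypothetical alignment: query "ACGT" against template "ACWG" (1 = interface).
print(transfer_interface_labels("AC-GT", "ACWG-", [1, 0, 1, 1]))  # [1, 0, 1, None]
```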
Affiliation(s)
- Li C Xue
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA.
15
Carl N, Konc J, Vehar B, Janezic D. Protein-protein binding site prediction by local structural alignment. J Chem Inf Model 2010; 50:1906-13. [PMID: 20919700 DOI: 10.1021/ci100265x]
Abstract
Generalization of an earlier algorithm has led to the development of new local structural alignment algorithms for prediction of protein-protein binding sites. The algorithms use maximum cliques on protein graphs to define structurally similar protein regions. The search for structural neighbors in the new algorithms has been extended to all the proteins in the PDB and the query protein is compared to more than 60,000 proteins or over 300,000 single-chain structures. The resulting structural similarities are combined and used to predict the protein binding sites. This study shows that the location of protein binding sites can be predicted by comparing only local structural similarities irrespective of general protein folds.
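Local structural alignment by maximum cliques is typically framed on a product (correspondence) graph: each vertex pairs a surface element of the query with a compatible element of another protein, and edges join pairs whose mutual geometry is consistent, so a maximum clique corresponds to the largest mutually consistent set of correspondences. The Python sketch below finds a maximum clique with the textbook Bron-Kerbosch algorithm with pivoting. It is an illustration under these assumptions, not necessarily the clique algorithm used by the authors; building the product graph from real structures is left out, and the toy graph is hypothetical.

```python
from typing import Dict, Hashable, Set

def maximum_clique(adj: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Return one maximum clique of an undirected graph {vertex: set of neighbors}."""
    best: Set[Hashable] = set()

    def expand(r: Set[Hashable], p: Set[Hashable], x: Set[Hashable]) -> None:
        nonlocal best
        if not p and not x:  # r cannot be extended: it is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        # Pivot on the vertex adjacent to most candidates, to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return best

# Toy correspondence graph: vertices 1-3 form a triangle, vertex 4 hangs off 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(maximum_clique(graph)))  # [1, 2, 3]
```

Bron-Kerbosch enumerates all maximal cliques and the sketch keeps the largest; its worst case is exponential, which is why dedicated maximum-clique solvers with tighter bounds are preferred for large product graphs.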
Affiliation(s)
- Nejc Carl
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
16
Konc J, Janezic D. ProBiS: a web server for detection of structurally similar protein binding sites. Nucleic Acids Res 2010; 38:W436-40. [PMID: 20504855 PMCID: PMC2896105 DOI: 10.1093/nar/gkq479]
Abstract
A web server, ProBiS, freely available at http://probis.cmm.ki.si, is presented. This provides access to the program ProBiS (Protein Binding Sites), which detects protein binding sites based on local structural alignments. Detailed instructions and user guidelines for use of ProBiS are available at the server under 'HELP' and selected examples are provided under 'EXAMPLES'.
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
17
Konc J, Janezic D. ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics 2010; 26:1160-8. [PMID: 20305268 PMCID: PMC2859123 DOI: 10.1093/bioinformatics/btq100]
Abstract
Motivation: To exploit locally similar 3D patterns of physicochemical properties on the protein surface for detecting binding sites that may lack sequence and global structural conservation.
Results: An algorithm, ProBiS, is described that detects structurally similar sites on protein surfaces by local surface structure alignment. It compares the query protein to members of a database of protein 3D structures and detects, with sub-residue precision, structurally similar sites as patterns of physicochemical properties on the protein surface. Using an efficient maximum clique algorithm, the program identifies proteins that share local structural similarities with the query protein and generates structure-based alignments of these proteins with the query. Structural similarity scores are calculated for the query protein's surface residues and are expressed as different colors on the query protein surface. The algorithm has been used successfully to detect protein–protein, protein–small ligand and protein–DNA binding sites.
Availability: The software is available as a web tool, free of charge for academic users, at http://probis.cmm.ki.si
Contact: dusa@cmm.ki.si
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Ljubljana, Slovenia
18
Moritsugu K, Njunda BM, Smith JC. Theory and Normal-Mode Analysis of Change in Protein Vibrational Dynamics on Ligand Binding. J Phys Chem B 2009; 114:1479-85. [DOI: 10.1021/jp909677p]
Affiliation(s)
- Kei Moritsugu, Brigitte M. Njunda, Jeremy C. Smith
- Center for Molecular Biophysics, University of Tennessee/Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, Tennessee 37831; Research Program for Computational Science, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; and Computational Molecular Biophysics, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer Feld 368, Heidelberg 69120, Germany
19
Miller BT, Singh RP, Klauda JB, Hodoscek M, Brooks BR, Woodcock HL. CHARMMing: a new, flexible web portal for CHARMM. J Chem Inf Model 2008; 48:1920-9. [PMID: 18698840 DOI: 10.1021/ci800133b]
Abstract
A new web portal for the CHARMM macromolecular modeling package, CHARMMing (CHARMM interface and graphics, http://www.charmming.org), is presented. This tool provides a user-friendly interface for the preparation, submission, monitoring, and visualization of molecular simulations (i.e., energy minimization, solvation, and dynamics). The infrastructure used to implement the web application is described. Two additional programs have been developed and integrated with CHARMMing: GENRTF, which is employed to define structural features not supported by the standard CHARMM force field, and a job broker, which is used to provide a portable method for using grid and cluster computing with CHARMMing. The use of the program is described with three proteins: 1YJP, 1O1O, and 1UFY. Source code is provided allowing CHARMMing to be downloaded, installed, and used by supercomputing centers and research groups that have a CHARMM license. Although no software can replace a scientist's own judgment and experience, CHARMMing eases the introduction of newcomers to the molecular modeling discipline by providing a graphical method for running simulations.
Affiliation(s)
- Benjamin T Miller
- Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA