1
Viswanathan R, Fajardo E, Steinberg G, Haller M, Fiser A. Protein-protein binding supersites. PLoS Comput Biol 2019; 15:e1006704. [PMID: 30615604 PMCID: PMC6336348 DOI: 10.1371/journal.pcbi.1006704] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/28/2018] [Revised: 01/17/2019] [Accepted: 12/05/2018] [Indexed: 11/19/2022] Open
Abstract
The lack of a deep understanding of how proteins interact remains an important roadblock in advancing efforts to identify binding partners and uncover the corresponding regulatory mechanisms of the functions they mediate. Understanding protein-protein interactions is also essential for designing specific chemical modifications to develop new reagents and therapeutics. We explored the hypothesis that protein interaction sites serve as generic binding sites for non-cognate protein ligands, just as has been observed for small-molecule-binding sites in the past. Using extensive computational docking experiments on a test set of 241 protein complexes, we found that indeed there is a strong preference for non-cognate ligands to bind to the cognate binding site of a receptor. This observation appears to be robust to variations in docking programs, types of non-cognate protein probes, sizes of binding patches, relative sizes of binding patches and full-length proteins, and the exploration of obligate and non-obligate complexes. The accuracy of the docking scoring function appears to play a role in defining the correct site. The frequency of interaction of unrelated probes recognizing the binding interface was utilized in a simple prediction algorithm that showed accuracy competitive with other state-of-the-art methods.
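The abstract above mentions a prediction algorithm based on how often unrelated probes contact the binding interface. The following is only a minimal sketch of that general idea, not the authors' implementation: the function name `rank_residues_by_contact_frequency`, the representation of each docked pose as a set of contacted receptor residue numbers, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def rank_residues_by_contact_frequency(poses):
    """Rank receptor residues by the fraction of docked poses that contact them.

    `poses` is an iterable of collections of receptor residue IDs, one per
    docked pose of a non-cognate probe (hypothetical input format).
    Returns a list of (residue_id, frequency) sorted by descending frequency,
    with ties broken by residue ID.
    """
    counts = Counter()
    n_poses = 0
    for contacted in poses:
        counts.update(set(contacted))  # count each residue once per pose
        n_poses += 1
    if n_poses == 0:
        return []
    return sorted(
        ((res, c / n_poses) for res, c in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy data: four docked poses, three of which cluster around residues 11 and 12.
poses = [{10, 11, 12}, {11, 12, 40}, {11, 12, 13}, {70, 71}]
ranking = rank_residues_by_contact_frequency(poses)
top = [res for res, _ in ranking[:2]]
```

In this sketch the most frequently contacted residues would be taken as the predicted interface; how many residues to keep, and how contacts are defined, are choices the abstract does not specify.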
Affiliation(s)
- Raji Viswanathan
- Department of Chemistry, Yeshiva University, New York, NY, United States of America
- Eduardo Fajardo
- Departments of Systems & Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, NY, United States of America
- Gabriel Steinberg
- Department of Chemistry, Yeshiva University, New York, NY, United States of America
- Matthew Haller
- Department of Chemistry, Yeshiva University, New York, NY, United States of America
- Andras Fiser
- Departments of Systems & Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, NY, United States of America
2
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 08/23/2014] [Accepted: 01/07/2015] [Indexed: 12/19/2022] Open
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
3
Chemical specificity and conformational flexibility in proteinase-inhibitor interaction: scaffolds for promiscuous binding. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2014; 116:151-7. [PMID: 25151636 DOI: 10.1016/j.pbiomolbio.2014.08.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2014] [Revised: 07/26/2014] [Accepted: 08/02/2014] [Indexed: 11/24/2022]
Abstract
One of the most important roles of proteins in the cellular milieu is recognition of other biomolecules, including other proteins. Protein-protein complexes are involved in many essential cellular processes. Interfaces of protein-protein complexes are traditionally known to be conserved in evolution and less flexible than other solvent-interacting tertiary structural surfaces. But many examples are emerging where these features do not hold. An understanding of the interplay between flexibility and sequence conservation is emerging, providing a fresh dimension to the paradigm of the sequence-structure-function relationship. The functional manifestation of the inter-relation between sequence conservation and flexibility of the interface is exemplified in this review using proteinase-inhibitor protein complexes.
4
Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A, Tress ML. Progress and challenges in predicting protein-protein interaction sites. Brief Bioinform 2009; 10:233-46. [PMID: 19346321 DOI: 10.1093/bib/bbp021] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.0] [Indexed: 11/14/2022] Open
Abstract
The identification of protein-protein interaction sites is an essential intermediate step for mutant design and the prediction of protein networks. In recent years a significant number of methods have been developed to predict these interface residues and here we review the current status of the field. Progress in this area requires a clear view of the methodology applied, the data sets used for training and testing the systems, and the evaluation procedures. We have analysed the impact of a representative set of features and algorithms and highlighted the problems inherent in generating reliable protein data sets and in the posterior analysis of the results. Although it is clear that there have been some improvements in methods for predicting interacting sites, several major bottlenecks remain. Proteins in complexes are still under-represented in the structural databases and in particular many proteins involved in transient complexes are still to be crystallized. We provide suggestions for effective feature selection, and make it clear that community standards for testing, training and performance measures are necessary for progress in the field.
Affiliation(s)
- Iakes Ezkurdia
- Centro Nacional de Biotechnolgia, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
5
Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A. Joint evolutionary trees: a large-scale method to predict protein interfaces based on sequence sampling. PLoS Comput Biol 2009; 5:e1000267. [PMID: 19165315 PMCID: PMC2613531 DOI: 10.1371/journal.pcbi.1000267] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 05/27/2008] [Accepted: 12/04/2008] [Indexed: 11/18/2022] Open
Abstract
The Joint Evolutionary Trees (JET) method detects protein interfaces, the core
residues involved in the folding process, and residues susceptible to
site-directed mutagenesis and relevant to molecular recognition. The approach,
based on the Evolutionary Trace (ET) method, introduces a novel way to treat
evolutionary information. Families of homologous sequences are analyzed through
a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple
alignment and impacts of weakly homologous sequences on distance tree
construction. The sampling method makes sequence analysis more sensitive to
functional and structural importance of individual residues by avoiding effects
of the overrepresentation of highly homologous sequences and improves
computational efficiency. A carefully designed clustering method is parametrized
on the target structure to detect and extend patches on protein surfaces into
predicted interaction sites. Clustering takes into account residues'
physical-chemical properties as well as conservation. Large-scale application of
JET requires the system to be adjustable for different datasets and to guarantee
predictions even if the signal is low. Flexibility was achieved by a careful
treatment of the number of retrieved sequences, the amino acid distance between
sequences, and the selective thresholds for cluster identification. An iterative
version of JET (iJET) that guarantees finding the most likely interface residues
is proposed as the appropriate tool for large-scale predictions. Tests are
carried out on the Huang database of 62 heterodimer, homodimer, and transient
complexes and on 265 interfaces belonging to signal transduction proteins,
enzymes, inhibitors, antibodies, antigens, and others. A specific set of
proteins chosen for their special functional and structural properties
illustrate JET behavior on a large variety of interactions covering proteins,
ligands, DNA, and RNA. JET is compared at a large scale to ET and to Consurf,
Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant
improvement in performance and computational efficiency is shown.

Information obtained on the structure of macromolecular complexes is important
for identifying functionally important partners but also for determining how
such interactions will be perturbed by natural or engineered site mutations.
Hence, to fully understand or control biological processes we need to predict in
the most accurate manner protein interfaces for a protein structure, possibly
without knowing its partners. Joint Evolutionary Trees (JET) is a method
designed to detect very different types of interactions of a protein with
another protein, ligands, DNA, and RNA. It uses a carefully designed sampling
method, making sequence analysis more sensitive to the functional and structural
importance of individual residues, and a clustering method parametrized on the
target structure for the detection of patches on protein surfaces and their
extension into predicted interaction sites. JET is a large-scale method, highly
accurate and potentially applicable to search for protein partners.
Affiliation(s)
- Stefan Engelen
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
- Ladislas A. Trojan
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
- Richard Lavery
- Institut de Biologie et Chimie des Protéines, CNRS UMR 5086/IFR 128/Université de Lyon, Lyon, France
- Alessandra Carbone
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
6
Kanamori E, Murakami Y, Tsuchiya Y, Standley DM, Nakamura H, Kinoshita K. Docking of protein molecular surfaces with evolutionary trace analysis. Proteins 2008; 69:832-8. [PMID: 17803239 DOI: 10.1002/prot.21737] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
Abstract
We have developed a new method to predict protein-protein complexes based on the shape complementarity of the molecular surfaces, along with sequence conservation obtained by evolutionary trace (ET) analysis. The docking is achieved by optimization of an object function that evaluates the degree of shape complementarity weighted by the conservation of the interacting residues. The optimization is carried out using a genetic algorithm in combination with Monte Carlo sampling. We applied this method to CAPRI targets and evaluated the performance systematically. Our method achieved native-like predictions in several cases. In addition, we have analyzed the feasibility of the ET method for docking simulations, and found that the conservation information was useful only in a limited category of proteins (signal-related proteins and enzymes).
Affiliation(s)
- Eiji Kanamori
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, Japan
7
Kufareva I, Budagyan L, Raush E, Totrov M, Abagyan R. PIER: protein interface recognition for structural proteomics. Proteins 2007; 67:400-17. [PMID: 17299750 DOI: 10.1002/prot.21233] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Indexed: 11/06/2022]
Abstract
Recent advances in structural proteomics call for development of fast and reliable automatic methods for prediction of functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected on the protein surface. The common drawback of such methods is their limited applicability to proteins with a sparse set of sequential homologs, as well as their inability to detect interfaces in evolutionarily variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, which is based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved an overall precision of 60% at the recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with 25% precision at 50% recall expected from random residue function assignment). For 70% of proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall. The calculation only took seconds for an average 300-residue protein. The authors demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually resulted in a deteriorated prediction. Thorough benchmarking using other datasets from literature showed that PIER yielded improved performance as compared with several alignment-free or alignment-dependent predictions. The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
Affiliation(s)
- Irina Kufareva
- Scripps Research Institute, La Jolla, California 92037, USA
8
Liang S, Zhang C, Liu S, Zhou Y. Protein binding site prediction using an empirical scoring function. Nucleic Acids Res 2006; 34:3698-707. [PMID: 16893954 PMCID: PMC1540721 DOI: 10.1093/nar/gkl454] [Citation(s) in RCA: 194] [Impact Index Per Article: 10.8] [Indexed: 11/12/2022] Open
Abstract
Most biological processes are mediated by interactions between proteins and their interacting partners including proteins, nucleic acids and small molecules. This work establishes a method called PINUP for binding site prediction of monomeric proteins. With only two weight parameters to optimize, PINUP produces not only 42.2% coverage of actual interfaces (percentage of correctly predicted interface residues in actual interface residues) but also 44.5% accuracy in predicted interfaces (percentage of correctly predicted interface residues in the predicted interface residues) in a cross validation using a 57-protein dataset. By comparison, the expected accuracy via random prediction (percentage of actual interface residues in surface residues) is only 15%. The binding sites of the 57-protein set are found to be easier to predict than those of an independent test set of 68 proteins. The average coverage and accuracy for this independent test set are 30.5% and 29.4%, respectively. The significant gain of PINUP over expected random prediction is attributed to (i) effective residue-energy score and accessible-surface-area-dependent interface-propensity, (ii) isolation of functional constraints contained in the conservation score from the structural constraints through the combination of residue-energy score (for structural constraints) and conservation score and (iii) a consensus region built on top-ranked initial patches.
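The coverage and accuracy measures defined in parentheses in the abstract above can be written down directly. The sketch below follows those two definitions only; the function name `coverage_and_accuracy` and the toy residue sets are assumptions for illustration and are not taken from PINUP.

```python
def coverage_and_accuracy(predicted, actual):
    """Compute interface coverage and accuracy from two residue-ID collections.

    coverage = correctly predicted interface residues / actual interface residues
    accuracy = correctly predicted interface residues / predicted interface residues
    Both are returned as fractions in [0, 1]; an empty denominator yields 0.0.
    """
    predicted, actual = set(predicted), set(actual)
    correct = predicted & actual
    coverage = len(correct) / len(actual) if actual else 0.0
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    return coverage, accuracy


# Toy example: 4 predicted residues, 5 actual interface residues, 2 in common.
cov, acc = coverage_and_accuracy({1, 2, 3, 4}, {3, 4, 5, 6, 7})
```

In more common terminology, coverage as defined here corresponds to recall and accuracy to precision, which makes these figures comparable with the precision and recall values reported in other entries of this list.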
Affiliation(s)
- Yaoqi Zhou
- To whom correspondence should be addressed. Tel: +1 716 829 2985; Fax: +1 716 829 2344;
9
de Vries SJ, van Dijk ADJ, Bonvin AMJJ. WHISCY: what information does surface conservation yield? Application to data-driven docking. Proteins 2006; 63:479-89. [PMID: 16450362 DOI: 10.1002/prot.20842] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.5] [Indexed: 11/07/2022]
Abstract
Protein-protein interactions play a key role in biological processes. Identifying the interacting residues is a first step toward understanding these interactions at a structural level. In this study, the interface prediction program WHISCY is presented. It combines surface conservation and structural information to predict protein-protein interfaces. The accuracy of the predictions is more than three times higher than a random prediction. These predictions have been combined with another interface prediction program, ProMate [Neuvirth et al. J Mol Biol 2004;338:181-199], resulting in an even more accurate predictor. The usefulness of the predictions was tested using the data-driven docking program HADDOCK [Dominguez et al. J Am Chem Soc 2003;125:1731-1737] in an unbound docking experiment, with the goal of generating as many near-native structures as possible. Unrefined rigid body docking solutions within 10 Å ligand RMSD from the true structure were generated for 22 out of 25 docked complexes. For 18 complexes, more than 100 of the 8000 generated models were correct. Our results demonstrate the potential of using interface predictions to drive protein-protein docking.
Affiliation(s)
- Sjoerd J de Vries
- Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
10
Bradford JR, Needham CJ, Bulpitt AJ, Westhead DR. Insights into protein-protein interfaces using a Bayesian network prediction method. J Mol Biol 2006; 362:365-86. [PMID: 16919296 DOI: 10.1016/j.jmb.2006.07.028] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.7] [Received: 02/09/2006] [Revised: 06/15/2006] [Accepted: 07/13/2006] [Indexed: 11/26/2022]
Abstract
Identifying the interface between two interacting proteins provides important clues to the function of a protein, and is becoming increasingly relevant to drug discovery. Here, surface patch analysis was combined with a Bayesian network to predict protein-protein binding sites with a success rate of 82% on a benchmark dataset of 180 proteins, improving by 6% on previous work and well above the 36% that would be achieved by a random method. A comparable success rate was achieved even when evolutionary information was missing, a further improvement on our previous method which was unable to handle incomplete data automatically. In a case study of the Mog1p family, we showed that our Bayesian network method can aid the prediction of previously uncharacterised binding sites and provide important clues to protein function. On Mog1p itself a putative binding site involved in the SLN1-SKN7 signal transduction pathway was detected, as was a Ran binding site, previously characterized solely by conservation studies, even though our automated method operated without using homologous proteins. On the remaining members of the family (two structural genomics targets, and a protein involved in the photosystem II complex in higher plants) we identified novel binding sites with little correspondence to those on Mog1p. These results suggest that members of the Mog1p family bind to different proteins and probably have different functions despite sharing the same overall fold. We also demonstrated the applicability of our method to drug discovery efforts by successfully locating a number of binding sites involved in the protein-protein interaction network of papilloma virus infection. In a separate study, we attempted to distinguish between the two types of binding site, obligate and non-obligate, within our dataset using a second Bayesian network. This proved difficult although some separation was achieved on the basis of patch size, electrostatic potential and conservation. Such was the similarity between the two interacting patch types that we were able to use obligate binding site properties to predict the location of non-obligate binding sites and vice versa.
Affiliation(s)
- James R Bradford
- Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, UK
11
Chelliah V, Blundell TL, Fernández-Recio J. Efficient Restraints for Protein–Protein Docking by Comparison of Observed Amino Acid Substitution Patterns with those Predicted from Local Environment. J Mol Biol 2006; 357:1669-82. [PMID: 16488431 DOI: 10.1016/j.jmb.2006.01.001] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Received: 05/10/2005] [Revised: 11/28/2005] [Accepted: 01/03/2006] [Indexed: 11/28/2022]
Abstract
The discovery that the functions of most eukaryotic gene products are mediated through multi-protein complexes makes the prediction of protein interactions one of the most important current challenges in structural biology. Rigid-body docking methods can generate a large number of alternative candidates, but it is difficult to discriminate the near-native interactions from the large number of false positives. Many different scoring functions have been developed for this purpose, but in most cases, experimental and biological information is still required for accurate predictions. We explore here the use of evolutionary restraints in evaluating rigid-body docking geometries. In order to identify potential interface residues we identify functional residues based on the comparison of observed amino acid substitutions with those predicted from local environment. The interface residues identified by this method are correctly located in 85% of the cases. These predicted interface residues are used to define distance restraints that help to score rigid-body docking solutions. We have developed the pyDockRST software, which uses the percentage of satisfied distance restraints, together with the electrostatics and desolvation binding energy, to identify correct docking orientations. This methodology dramatically improves the docking results when compared to the use of energy criteria alone, and is able to find the correct orientation within the top 20 docking solutions in 80% of the cases.
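The abstract above describes scoring docking solutions by the percentage of satisfied distance restraints. The sketch below illustrates one plausible way to compute such a percentage and is not the pyDockRST implementation: the function name `percent_satisfied`, the rule that a restraint is satisfied when any partner atom lies within a cutoff of the predicted interface residue, and the 6.0 Å default cutoff are all assumptions chosen for illustration.

```python
import math


def percent_satisfied(restraint_coords, partner_coords, cutoff=6.0):
    """Return the percentage of restraints satisfied by a docking pose.

    `restraint_coords`: (x, y, z) positions of predicted interface residues
    on one protein (hypothetical representation, one point per residue).
    `partner_coords`: (x, y, z) positions of atoms of the docked partner.
    A restraint counts as satisfied if any partner atom is within `cutoff`
    (an assumed threshold, in the same units as the coordinates).
    """
    if not restraint_coords:
        return 0.0
    satisfied = sum(
        1
        for r in restraint_coords
        if any(math.dist(r, p) <= cutoff for p in partner_coords)
    )
    return 100.0 * satisfied / len(restraint_coords)


# Toy pose: two of the three restrained residues are near the partner.
restraints = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
partner = [(3.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
pct = percent_satisfied(restraints, partner)
```

According to the abstract, this percentage is used together with electrostatics and desolvation binding energy terms; how the terms are weighted and combined is not stated there and is left out of the sketch.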
Affiliation(s)
- Vijayalakshmi Chelliah
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
12
Cherkasov A, Lee SJ, Nandan D, Reiner NE. Large-scale survey for potentially targetable indels in bacterial and protozoan proteins. Proteins 2006; 62:371-80. [PMID: 16315289 DOI: 10.1002/prot.20631] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
Abstract
Our previous results demonstrated that some essential, housekeeping proteins from pathogenic microorganisms may contain sizable insertions-deletions in their sequences (compared to close human homologs) that can be responsible for unexpected virulence properties. For example, we found that indel-bearing elongation factor-1alpha from several pathogenic protozoa can activate a human tyrosine phosphatase SHP-1 leading to deactivation of macrophages. On the one hand, these findings allowed development of a strategy for targeting some indel-containing pathogen proteins that have similar human counterparts. On the other hand, the results raised numerous questions regarding the nature and implications of sequence indels in pathogen proteins. In the present study, we conducted a large-scale survey of indels in proteins from 136 bacterial and protozoan genomes. It has been established that sizable insertions and deletions occur in approximately 5-10% of bacterial proteins with close human homologs, while proteins from the protozoan pathogens such as Trypanosoma cruzi, Plasmodium falciparum, and Leishmania donovani exhibit elevated indel content that can reach up to 25%. The finding suggested that the occurrence of sequence indels may be involved in the evolution of pathogenic mechanisms in these protozoa.
Affiliation(s)
- Artem Cherkasov
- Division of Infectious Diseases, Department of Medicine, University of British Columbia, Faculty of Medicine, Vancouver Coastal Health Research Institute, Vancouver, British Columbia, Canada.
13
Burgoyne NJ, Jackson RM. Predicting protein interaction sites: binding hot-spots in protein–protein and protein–ligand interfaces. Bioinformatics 2006; 22:1335-42. [PMID: 16522669 DOI: 10.1093/bioinformatics/btl079] [Citation(s) in RCA: 132] [Impact Index Per Article: 7.3] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Protein assemblies are currently poorly represented in structural databases and their structural elucidation is a key goal in biology. Here we analyse clefts in protein surfaces, likely to correspond to binding 'hot-spots', and rank them according to sequence conservation and simple measures of physical properties including hydrophobicity, desolvation, electrostatic and van der Waals potentials, to predict which are involved in binding in the native complex. RESULTS: The resulting differences between predicting binding-sites at protein-protein and protein-ligand interfaces are striking. There is a high level of prediction accuracy (≤93%) for protein-ligand interactions, based on the following attributes: van der Waals potential, electrostatic potential, desolvation and surface conservation. Generally, the prediction accuracy for protein-protein interactions is lower, with the exception of enzymes. Our results show that the ease of cleft desolvation is strongly predictive of interfaces and strongly maintained across all classes of protein-binding interface.
Affiliation(s)
- Nicholas J Burgoyne
- Institute of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
14
Bradford JR, Westhead DR. Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2004; 21:1487-94. [PMID: 15613384 DOI: 10.1093/bioinformatics/bti242] [Citation(s) in RCA: 289] [Impact Index Per Article: 14.5] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Structural genomics projects are beginning to produce protein structures with unknown function; therefore, accurate, automated predictors of protein function are required if all these structures are to be properly annotated in reasonable time. Identifying the interface between two interacting proteins provides important clues to the function of a protein and can reduce the search space required by docking algorithms to predict the structures of complexes. RESULTS: We have combined a support vector machine (SVM) approach with surface patch analysis to predict protein-protein binding sites. Using a leave-one-out cross-validation procedure, we were able to successfully predict the location of the binding site on 76% of our dataset made up of proteins with both transient and obligate interfaces. With heterogeneous cross-validation, where we trained the SVM on transient complexes to predict on obligate complexes (and vice versa), we still achieved comparable success rates to the leave-one-out cross-validation, suggesting that sufficient properties are shared between transient and obligate interfaces. AVAILABILITY: A web application based on the method can be found at http://www.bioinformatics.leeds.ac.uk/ppi_pred. The dataset of 180 proteins used in this study is also available via the same web site. CONTACT: westhead@bmb.leeds.ac.uk SUPPLEMENTARY INFORMATION: http://www.bioinformatics.leeds.ac.uk/ppi-pred/supp-material.
Affiliation(s)
- James R Bradford
- School of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK