301
|
Guo F, Li SC, Wang L, Zhu D. Protein-protein binding site identification by enumerating the configurations. BMC Bioinformatics 2012; 13:158. [PMID: 22768846 PMCID: PMC3478195 DOI: 10.1186/1471-2105-13-158] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 12/08/2011] [Accepted: 06/15/2012] [Indexed: 11/10/2022] Open
Abstract
Background: The ability to predict protein-protein binding sites has a wide range of applications, including signal transduction studies, de novo drug design, structure identification and comparison of functional sites. The interface in a complex involves two structurally matched protein subunits, and the binding sites can be predicted by identifying structural matches at protein surfaces. Results: We propose a method which enumerates “all” the configurations (or poses) between two proteins (3D coordinates of the two subunits in a complex) and evaluates each configuration by the interaction between its components using the Atomic Contact Energy function. The enumeration is achieved efficiently by exploring a set of rigid transformations. Our approach incorporates a surface identification technique and a method for avoiding clashes of two subunits when computing rigid transformations. When the optimal transformations according to the Atomic Contact Energy function are identified, the corresponding binding sites are given as predictions. Our results show that this approach consistently performs better than other methods in binding site identification. Conclusions: Our method achieved a higher success rate than other methods, with the prediction quality improved in terms of both accuracy and coverage. Moreover, our method is able to predict the configurations of two binding proteins, whereas most other methods predict only the binding sites. The software package is available at http://sites.google.com/site/guofeics/dobi for non-commercial use.
Affiliation(s)
- Fei Guo
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
|
302
|
Vihinen M. How to evaluate performance of prediction methods? Measures and their interpretation in variation effect analysis. BMC Genomics 2012; 13 Suppl 4:S2. [PMID: 22759650 PMCID: PMC3303716 DOI: 10.1186/1471-2164-13-s4-s2] [Citation(s) in RCA: 155] [Impact Index Per Article: 12.9] [Indexed: 11/26/2022] Open
Abstract
Background: Prediction methods are increasingly used in biosciences to forecast diverse features and characteristics. Binary two-state classifiers are the most common applications. They are usually based on machine learning approaches. For the end user it is often problematic to evaluate the true performance and applicability of computational tools, as some knowledge of computer science and statistics is needed. Results: Instructions are given on how to interpret and compare method evaluation results. Systematic analysis of method performance requires established benchmark datasets, which contain cases with known outcomes, and suitable evaluation measures. The criteria for benchmark datasets are discussed along with their implementation in VariBench, a benchmark database for variations. No single measure can describe all aspects of method performance. Predictions of genetic variation effects at the DNA, RNA and protein levels are important because information about variants can be produced much faster than their disease relevance can be experimentally verified. Numerous prediction tools have therefore been developed; however, systematic analyses of their performance and comparisons between them have only just started to emerge. Conclusions: The end users of prediction tools should be able to understand how evaluation is done and how to interpret the results. Six main performance evaluation measures are introduced: sensitivity, specificity, positive predictive value, negative predictive value, accuracy and Matthews correlation coefficient. Together with receiver operating characteristic (ROC) analysis, they provide a good picture of the performance of methods and allow their objective and quantitative comparison. A checklist of items to look at is provided. Comparisons of methods for missense variant tolerance, protein stability changes due to amino acid substitutions, and effects of variations on mRNA splicing are presented.
Affiliation(s)
- Mauno Vihinen
- Institute of Biomedical Technology, University of Tampere, Finland.
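The six measures named in the Vihinen abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions, not taken from the paper.

```python
import math

def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def binary_metrics(tp, fp, tn, fn):
    """Six common measures for a two-state classifier (hypothetical helper).

    tp, fp, tn, fn are counts of true/false positives and negatives.
    """
    # Matthews correlation coefficient: 0.0 by convention if any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "ppv": _ratio(tp, tp + fp),           # positive predictive value
        "npv": _ratio(tn, tn + fn),           # negative predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all six together, rather than accuracy alone, matters most when the two classes are unbalanced, which is the situation the abstract warns about when it says that no single measure describes every aspect of performance.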
|
303
|
Khashan R, Zheng W, Tropsha A. Scoring protein interaction decoys using exposed residues (SPIDER): a novel multibody interaction scoring function based on frequent geometric patterns of interfacial residues. Proteins 2012; 80:2207-17. [PMID: 22581643 DOI: 10.1002/prot.24110] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 01/11/2012] [Revised: 04/05/2012] [Accepted: 04/20/2012] [Indexed: 01/14/2023]
Abstract
Accurate prediction of the structure of protein-protein complexes in computational docking experiments remains a formidable challenge. It has been recognized that identifying native or native-like poses among multiple decoys is the major bottleneck of the current scoring functions used in docking. We have developed a novel multibody pose-scoring function that has no theoretical limit on the number of residues contributing to the individual interaction terms. We use a coarse-grain representation of a protein-protein complex where each residue is represented by its side chain centroid. We apply a computational geometry approach called Almost-Delaunay tessellation that transforms protein-protein complexes into a residue contact network, or an undirected graph where vertex-residues are nodes connected by edges. This treatment forms a family of interfacial graphs representing a dataset of protein-protein complexes. We then employ a frequent subgraph mining approach to identify common interfacial residue patterns that appear in at least a subset of native protein-protein interfaces. The geometrical parameters and frequency of occurrence of each "native" pattern in the training set are used to develop the new SPIDER scoring function. SPIDER was validated using the standard "ZDOCK" benchmark dataset, which was not used in the development of SPIDER. We demonstrate that the SPIDER scoring function ranks native and native-like poses above geometrical decoys and that it outperforms the popular ZRANK scoring function. SPIDER was ranked among the top scoring functions in a recent round of the CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein-protein docking methods.
Affiliation(s)
- Raed Khashan
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA
|
304
|
Pérez-Cano L, Jiménez-García B, Fernández-Recio J. A protein-RNA docking benchmark (II): Extended set from experimental and homology modeling data. Proteins 2012; 80:1872-82. [DOI: 10.1002/prot.24075] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 10/27/2011] [Revised: 02/28/2012] [Accepted: 03/30/2012] [Indexed: 12/13/2022]
|
305
|
Arbitrary protein-protein docking targets biologically relevant interfaces. BMC BIOPHYSICS 2012; 5:7. [PMID: 22559010 PMCID: PMC3441232 DOI: 10.1186/2046-1682-5-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 01/06/2012] [Accepted: 04/11/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein-protein recognition is of fundamental importance in the vast majority of biological processes. However, it has already been demonstrated that it is very hard to distinguish true complexes from false complexes in so-called cross-docking experiments, where binary protein complexes are separated and the isolated proteins are all docked against each other and scored. Does this result, at least in part, reflect a physical reality? False complexes could reflect possible nonspecific or weak associations. RESULTS In this paper, we investigate the twilight zone of protein-protein interactions, building on an interesting outcome of cross-docking experiments: false complexes seem to favor residues from the true interaction site, suggesting that randomly chosen partners dock in a non-random fashion on protein surfaces. Here, we carry out arbitrary docking of a non-redundant data set of 198 proteins, with more than 300 randomly chosen "probe" proteins. We investigate the tendency of arbitrary partners to aggregate at localized regions of the protein surfaces, the shape and compositional bias of the generated interfaces, and the potential of this property to predict biologically relevant binding sites. We show that the non-random localization of arbitrary partners after protein-protein docking is a generic feature of protein structures. The interfaces generated in this way are not systematically planar or curved, but tend to be closer than average to the center of the proteins. These results can be used to predict biological interfaces with an AUC value of up to 0.69 alone, and 0.72 when used in combination with evolutionary information. An appropriate choice of random partners and number of docking models makes this method computationally practical. It is also noted that nonspecific interfaces can point to alternate interaction sites in the case of proteins with multiple interfaces. We illustrate the usefulness of arbitrary docking using PEBP (Phosphatidylethanolamine binding protein), a kinase inhibitor with multiple partners. CONCLUSIONS An approach using arbitrary docking, and based solely on physical properties, can successfully identify biologically pertinent protein interfaces.
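The AUC values quoted in the abstract above (0.69 and 0.72) are areas under a ROC curve. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive item outscores a randomly chosen negative one. The per-residue scores and interface labels below are invented for illustration, and the function name `roc_auc` is an assumption; the paper's own scoring scheme is not reproduced here.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: higher means "more likely positive".
    labels: 1 for positives (e.g. interface residues), 0 for negatives.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical per-residue scores and interface labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values around 0.7 indicate a useful but far from decisive signal. The quadratic loop is kept for clarity; a rank-based formula would be preferable for long score lists.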
|
306
|
A holistic in silico approach to predict functional sites in protein structures. Bioinformatics 2012; 28:1845-50. [DOI: 10.1093/bioinformatics/bts269] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022] Open
|
307
|
Garcia-Garcia J, Bonet J, Guney E, Fornes O, Planas J, Oliva B. Networks of Protein-Protein Interactions: From Uncertainty to Molecular Details. Mol Inform 2012; 31:342-62. [PMID: 27477264 DOI: 10.1002/minf.201200005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 01/16/2012] [Accepted: 03/09/2012] [Indexed: 11/08/2022]
Abstract
Proteins are the bricks and mortar of cells. The work of proteins is structural and functional, as they are the principal element of the organization of the cell architecture, but they also play a relevant role in its metabolism and regulation. To perform all these functions, proteins need to interact with each other and with other bio-molecules, either to form complexes or to recognize precise targets of their action. For instance, a particular transcription factor may activate one gene or another depending on its interactions with other proteins and not only with DNA. Hence, the ability of a protein to interact with other bio-molecules, and the partners it has at each particular time and location, can be crucial for characterizing the role of a protein. Proteins rarely act alone; rather, they constitute a mingled network of physical interactions or other types of relationships (such as metabolic and regulatory) or signaling cascades. In this context, understanding the function of a protein requires recognizing the members of its neighborhood and grasping how they associate, both at the systemic and atomic level. The network of physical interactions between the proteins of a system, cell or organism is defined as the interactome. The purpose of this review is to deepen the description of interactomes at different levels of detail: from the molecular structure of complexes to the global topology of the network of interactions. The approaches and techniques applied experimentally and computationally to attain each level are described. The limits of each technique and its integration into a model network, the challenges and current problems of completeness of an interactome, and the reliability of the interactions are reviewed and summarized. Finally, the application of the current knowledge of protein-protein interactions to modern network medicine and protein function annotation is also explored.
Affiliation(s)
- Javier Garcia-Garcia
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Jaume Bonet
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Emre Guney
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Oriol Fornes
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Joan Planas
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Baldo Oliva
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
|
308
|
Qin S, Pang X, Zhou HX. Automated prediction of protein association rate constants. Structure 2012; 19:1744-51. [PMID: 22153497 DOI: 10.1016/j.str.2011.10.015] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Received: 08/08/2011] [Revised: 10/12/2011] [Accepted: 10/26/2011] [Indexed: 02/06/2023]
Abstract
The association rate constants ($k_a$) of proteins with other proteins or other macromolecular targets are a fundamental biophysical property. Observed rate constants span over ten orders of magnitude, from 1 to $10^{10}$ M$^{-1}$s$^{-1}$. Protein association can be rate limited either by the diffusional approach of the subunits to form a transient complex, with near-native separation and orientation but without short-range native interactions, or by the subsequent conformational rearrangement to form the native complex. Our transient-complex theory showed promise in predicting $k_a$ in the diffusion-limited regime. Here, we develop it into a web server called TransComp (http://pipe.sc.fsu.edu/transcomp/) and report on the server's accuracy and robustness based on applications to over 100 protein complexes. We expect this server to be a valuable tool for systems biology applications and for kinetic characterization of protein-protein and protein-nucleic acid association in general.
Affiliation(s)
- Sanbo Qin
- Department of Physics and Institute of Molecular Biophysics, Tallahassee, FL 32306, USA
|
309
|
Swapna LS, Bhaskara RM, Sharma J, Srinivasan N. Roles of residues in the interface of transient protein-protein complexes before complexation. Sci Rep 2012; 2:334. [PMID: 22451863 PMCID: PMC3312204 DOI: 10.1038/srep00334] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 12/08/2011] [Accepted: 03/07/2012] [Indexed: 12/26/2022] Open
Abstract
Transient protein-protein interactions play crucial roles in all facets of cellular physiology. Here, using an analysis of known 3-D structures of transient protein-protein complexes and their corresponding uncomplexed forms, together with energy calculations, we seek to understand the roles of protein-protein interfacial residues in the unbound forms. We show that there are conformationally near-invariant and evolutionarily conserved interfacial residues that are rigid and account for ∼65% of the core interface. Interestingly, some of these residues contribute significantly to the stabilization of the interface structure in the uncomplexed form. Such residues have a strong energetic basis to perform the dual roles of stabilizing the structure of the uncomplexed form as well as the complex once formed, while maintaining their rigid nature throughout. This feature is evolutionarily well conserved at both the structural and sequence levels. We believe this analysis has general bearing on the prediction of interfaces and the understanding of molecular recognition.
|
310
|
Vreven T, Hwang H, Pierce BG, Weng Z. Prediction of protein-protein binding free energies. Protein Sci 2012; 21:396-404. [PMID: 22238219 DOI: 10.1002/pro.2027] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Received: 10/20/2011] [Revised: 12/23/2011] [Accepted: 01/04/2012] [Indexed: 11/09/2022]
Abstract
We present an energy function for predicting binding free energies of protein-protein complexes, using the three-dimensional structures of the complex and unbound proteins as input. Our function is a linear combination of nine terms and achieves a correlation coefficient of 0.63 with experimental measurements when tested on a benchmark of 144 complexes using leave-one-out cross validation. Although we systematically tested both atomic and residue-based scoring functions, the selected function is dominated by residue-based terms. Our function is stable for subsets of the benchmark stratified by experimental pH and extent of conformational change upon complex formation, with correlation coefficients ranging from 0.61 to 0.66.
Affiliation(s)
- Thom Vreven
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
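The Vreven et al. abstract describes a linear combination of energy terms evaluated by leave-one-out cross validation and summarized by a correlation coefficient. The sketch below shows that evaluation protocol in general form, assuming NumPy is available. It uses three synthetic descriptor columns and random data rather than the paper's nine terms or its benchmark, and the names `loo_predictions` and `pearson_r` are hypothetical.

```python
import numpy as np

def loo_predictions(X, y):
    """Leave-one-out predictions from an ordinary least-squares linear model.

    X: (n_samples, n_terms) descriptor matrix; y: (n_samples,) target values.
    Each sample is predicted by a model fitted on all the other samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.hstack([X, np.ones((n, 1))])  # extra column of ones for the intercept
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # boolean mask that drops sample i
        weights, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ weights
    return preds

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in data: 30 "complexes", 3 made-up energy terms.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

r = pearson_r(loo_predictions(X, y), y)
print(f"leave-one-out correlation: {r:.3f}")
```

Because each prediction comes from a model that never saw the held-out complex, the resulting correlation is a less optimistic estimate than one computed on the training fit, which is presumably why the abstract reports it that way.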
|
311
|
Flores SC, Bernauer J, Shin S, Zhou R, Huang X. Multiscale modeling of macromolecular biosystems. Brief Bioinform 2012; 13:395-405. [DOI: 10.1093/bib/bbr077] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
|
312
|
|
313
|
Huang W, Liu H. Optimized grid-based protein-protein docking as a global search tool followed by incorporating experimentally derivable restraints. Proteins 2011; 80:691-702. [PMID: 22190391 DOI: 10.1002/prot.23223] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/20/2011] [Revised: 10/10/2011] [Accepted: 10/12/2011] [Indexed: 12/16/2022]
Abstract
Unbound protein docking, or the computational prediction of the structure of a protein complex from the structures of its separated components, is important but still challenging. A practical approach toward reliable results for unbound docking is to incorporate experimentally derived information with computation. To this end, a truly systematic search of the global docking space is desirable. Fast Fourier transform (FFT) docking is a systematic search method with high computational efficiency. However, when FFT is used to perform unbound docking, possible conformational changes upon binding must be treated implicitly. To better accommodate the implicit treatment of conformational flexibility, we develop a rational approach to optimize "softened" parameters for FFT docking. In connection with the increased "softness" of the parameters in this global search step, we use a revised rule to select candidate models from the search results. For complexes designated as of low and medium difficulty for unbound docking, these adaptations of the original FTDOCK program lead to substantial improvements of the global search results. Finally, we show that models resulting from FFT-based global search can be further filtered with restraints derivable from nuclear magnetic resonance (NMR) chemical shift perturbation or mutagenesis experiments, leading to a small set of models that can be feasibly refined and evaluated using computationally more expensive methods and that still include high-ranking near-native conformations.
Affiliation(s)
- Wei Huang
- School of Life Sciences and Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China (USTC), Hefei, Anhui 230027, People's Republic of China
|
314
|
Masone D, Vaca ICD, Pons C, Recio JF, Guallar V. H-bond network optimization in protein-protein complexes: are all-atom force field scores enough? Proteins 2011; 80:818-24. [PMID: 22113891 DOI: 10.1002/prot.23239] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/16/2011] [Revised: 10/03/2011] [Accepted: 10/20/2011] [Indexed: 11/08/2022]
Abstract
Structural prediction of protein-protein complexes given the structures of the two interacting compounds in their unbound state is a key problem in biophysics. In addition to the problem of sampling near-native orientations, one of the main modeling difficulties is to discriminate true from false positives. Here, we present a hierarchical protocol for docking refinement that is able to discriminate near-native poses from a group of docking candidates. The main idea is to combine efficient sampling of the full-system hydrogen bond network and side chains with an all-atom force field and a surface generalized Born implicit solvent. We tested our method on a set of twenty-two complexes containing a near-native solution within the top 100 docking poses, obtaining a near-native solution as the top pose in 70% of the cases. We show that H-bond networks optimized with all-atom force fields significantly improve state-of-the-art scoring functions.
Affiliation(s)
- Diego Masone
- Joint BSC-IRB Research Program in Computational Biology, Barcelona Supercomputing Center, 08034 Barcelona, Spain
|
315
|
Benchmarks for flexible and rigid transcription factor-DNA docking. BMC STRUCTURAL BIOLOGY 2011; 11:45. [PMID: 22044637 PMCID: PMC3262759 DOI: 10.1186/1472-6807-11-45] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/09/2011] [Accepted: 11/01/2011] [Indexed: 12/27/2022]
Abstract
BACKGROUND Structural insight from transcription factor-DNA (TF-DNA) complexes is of paramount importance to our understanding of the affinity and specificity of TF-DNA interaction, and to the development of structure-based prediction of TF binding sites. Yet the majority of TF-DNA complexes remain unsolved despite the considerable experimental efforts being made. Computational docking represents a promising alternative to bridge the gap. To facilitate the study of TF-DNA docking, carefully designed benchmarks are needed for performance evaluation and identification of the strengths and weaknesses of docking algorithms. RESULTS We constructed two benchmarks, for flexible and rigid TF-DNA docking respectively, using a unified non-redundant set of 38 test cases. The test cases encompass diverse fold families and are classified into easy and hard groups with respect to the degree of difficulty in TF-DNA docking. The major parameters used to classify expected docking difficulty in flexible docking are the conformational differences between bound and unbound TFs and the interaction strength between TFs and DNA. For rigid docking, in which the starting structure is a bound TF conformation, only interaction strength is considered. CONCLUSIONS We believe these benchmarks are important for the development of better interaction potentials and TF-DNA docking algorithms, which has important implications for structure-based prediction of transcription factor binding sites and for drug design.
|
316
|
Moal IH, Agius R, Bates PA. Protein-protein binding affinity prediction on a diverse set of structures. Bioinformatics 2011; 27:3002-9. [PMID: 21903632 DOI: 10.1093/bioinformatics/btr513] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Indexed: 02/11/2024] Open
Abstract
MOTIVATION Accurate binding free energy functions for protein-protein interactions are imperative for a wide range of purposes. Their construction is predicated upon ascertaining the factors that influence binding and their relative importance. A recent benchmark of binding affinities has allowed, for the first time, the evaluation and construction of binding free energy models using a diverse set of complexes, and a systematic assessment of our ability to model the energetics of conformational changes. RESULTS We construct a large set of molecular descriptors using commonly available tools, introducing the use of energetic factors associated with conformational changes and disorder to order transitions, as well as features calculated on structural ensembles. The descriptors are used to train and test a binding free energy model using a consensus of four machine learning algorithms, whose performance constitutes a significant improvement over the other state of the art empirical free energy functions tested. The internal workings of the learners show how the descriptors are used, illuminating the determinants of protein-protein binding. AVAILABILITY The molecular descriptor set and descriptor values for all complexes are available in the Supplementary Material. A web server for the learners and coordinates for the bound and unbound structures can be accessed from the website: http://bmm.cancerresearchuk.org/~Affinity. CONTACT paul.bates@cancer.org.uk. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Iain H Moal
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London WC2A 3LY, UK
|
317
|
Zellner H, Staudigel M, Trenner T, Bittkowski M, Wolowski V, Icking C, Merkl R. Prescont: Predicting protein-protein interfaces utilizing four residue properties. Proteins 2011; 80:154-68. [DOI: 10.1002/prot.23172] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 07/11/2011] [Revised: 08/18/2011] [Accepted: 08/29/2011] [Indexed: 12/26/2022]
|
318
|
La D, Kihara D. A novel method for protein-protein interaction site prediction using phylogenetic substitution models. Proteins 2011; 80:126-41. [PMID: 21989996 DOI: 10.1002/prot.23169] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 01/08/2011] [Revised: 07/07/2011] [Accepted: 08/17/2011] [Indexed: 11/10/2022]
Abstract
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the mutational behavior leading to weak sequence conservation poses significant challenges for protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through the comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution as compared with other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which utilizes the substitution models to predict protein-protein binding sites of proteins with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interfaces and nonbinding surfaces. BindML is shown to perform well compared to alternative methods for protein binding interface prediction. The methodology developed in this study is versatile in the sense that it can be generally applied to predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins.
Affiliation(s)
- David La
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA
|
319
|
Melquiond AS, Karaca E, Kastritis PL, Bonvin AM. Next challenges in protein-protein docking: from proteome to interactome and beyond. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2011. [DOI: 10.1002/wcms.91] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/11/2022]
|
320
|
Luo X, Lü Q, Wu H, Yang L, Huang X, Qian P, Fu G. Automatic prediction of flexible regions improves the accuracy of protein-protein docking models. J Mol Model 2011; 18:2199-208. [DOI: 10.1007/s00894-011-1231-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/10/2011] [Accepted: 08/22/2011] [Indexed: 11/28/2022]
|
321
|
Pons C, Glaser F, Fernandez-Recio J. Prediction of protein-binding areas by small-world residue networks and application to docking. BMC Bioinformatics 2011; 12:378. [PMID: 21943333 PMCID: PMC3189935 DOI: 10.1186/1471-2105-12-378] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 03/30/2011] [Accepted: 09/26/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein-protein interactions are involved in most cellular processes, and their detailed physico-chemical and structural characterization is needed in order to understand their function at the molecular level. In-silico docking tools can complement experimental techniques, providing three-dimensional structural models of such interactions at atomic resolution. In several recent studies, protein structures have been modeled as networks (or graphs), where the nodes represent residues and the connecting edges their interactions. From such networks, it is possible to calculate different topology-based values for each of the nodes, and to identify protein regions with high centrality scores, which are known to positively correlate with key functional residues, hot spots, and protein-protein interfaces. RESULTS Here we show that this correlation can be efficiently used for the scoring of rigid-body docking poses. When integrated into the pyDock energy-based docking method, the new combined scoring function significantly improved the results of the individual components as shown on a standard docking benchmark. This improvement was particularly remarkable for specific protein complexes, depending on the shape, size, type, or flexibility of the proteins involved. CONCLUSIONS The network-based representation of protein structures can be used to identify protein-protein binding regions and to efficiently score docking poses, complementing energy-based approaches.
Affiliation(s)
- Carles Pons
- Joint BSC-IRB research programme in Computational Biology, Barcelona Supercomputing Center, Barcelona 08034, Spain
|
322
|
Accelerating protein docking in ZDOCK using an advanced 3D convolution library. PLoS One 2011; 6:e24657. [PMID: 21949741 PMCID: PMC3176283 DOI: 10.1371/journal.pone.0024657] [Citation(s) in RCA: 440] [Impact Index Per Article: 33.8] [Received: 07/14/2011] [Accepted: 08/15/2011] [Indexed: 11/19/2022] Open
Abstract
Computational prediction of the 3D structures of molecular interactions is a challenging area, often requiring significant computational resources to produce structural predictions with atomic-level accuracy. This can be particularly burdensome when modeling large sets of interactions, macromolecular assemblies, or interactions between flexible proteins. We previously developed a protein docking program, ZDOCK, which uses a fast Fourier transform to perform a 3D search of the spatial degrees of freedom between two molecules. By utilizing a pairwise statistical potential in the ZDOCK scoring function, there were notable gains in docking accuracy over previous versions, but this improvement in accuracy came at a substantial computational cost. In this study, we incorporated a recently developed 3D convolution library into ZDOCK, and additionally modified ZDOCK to dynamically orient the input proteins for more efficient convolution. These modifications resulted in an average of over 8.5-fold improvement in running time when tested on 176 cases in a newly released protein docking benchmark, as well as substantially less memory usage, with no loss in docking accuracy. We also applied these improvements to a previous version of ZDOCK that uses a simpler non-pairwise atomic potential, yielding an average speed improvement of over 5-fold on the docking benchmark, while maintaining predictive success. This permits the utilization of ZDOCK for more intensive tasks such as docking flexible molecules and modeling of interactomes, and can be run more readily by those with limited computational resources.
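To make the idea of a grid-based translational search concrete, the sketch below scores every cyclic translation of a small "ligand" grid against a "receptor" grid by direct summation. This is not ZDOCK's code or scoring function: an FFT-based program obtains the same correlation values for all shifts at once through Fourier transforms, whereas this sketch uses a brute-force loop for readability. The grid size, the weights, and the function name are hypothetical.

```python
from itertools import product


def correlation_scores(receptor, ligand, n):
    """Score every cyclic translation of `ligand` on an n x n x n grid.

    Both grids map (x, y, z) -> weight; cells that are absent count as 0.
    The direct sum below equals the correlation that an FFT-based search
    evaluates for all shifts simultaneously.
    """
    scores = {}
    for shift in product(range(n), repeat=3):
        total = 0.0
        for (x, y, z), weight in ligand.items():
            cell = ((x + shift[0]) % n, (y + shift[1]) % n, (z + shift[2]) % n)
            total += receptor.get(cell, 0.0) * weight
        scores[shift] = total
    return scores


# Hypothetical toy grids: the receptor core penalises overlap (-10),
# while its six surface cells reward contact (+1).
n = 4
receptor = {
    (1, 1, 1): -10.0,
    (0, 1, 1): 1.0, (2, 1, 1): 1.0,
    (1, 0, 1): 1.0, (1, 2, 1): 1.0,
    (1, 1, 0): 1.0, (1, 1, 2): 1.0,
}
ligand = {(0, 0, 0): 1.0, (1, 0, 0): 1.0}  # a two-cell probe

scores = correlation_scores(receptor, ligand, n)
best_score = max(scores.values())
```

The direct sum visits every shift and every ligand cell, which is why FFT-based evaluation, and faster convolution libraries such as the one described in the abstract, matter for realistic grid sizes.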
|
323
|
Kochańczyk M. Prediction of functionally important residues in globular proteins from unusual central distances of amino acids. BMC Struct Biol 2011; 11:34. [PMID: 21923943 PMCID: PMC3188475 DOI: 10.1186/1472-6807-11-34] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/22/2011] [Accepted: 09/18/2011] [Indexed: 12/12/2022]
Abstract
BACKGROUND Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Besides constructing a better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues. RESULTS Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in an information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested the capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites, achieving success rates comparable to several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi. CONCLUSIONS Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. Such residues often turn out to be directly involved in binding ligands or interfacing with other proteins.
Affiliation(s)
- Marek Kochańczyk
- Faculty of Physics, Jagiellonian University, ul. Reymonta 4, 30-059 Krakow, Poland.
|
324
|
Ghoorah AW, Devignes MD, Smaïl-Tabbone M, Ritchie DW. Spatial clustering of protein binding sites for template based protein docking. Bioinformatics 2011; 27:2820-7. [DOI: 10.1093/bioinformatics/btr493] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/14/2022] Open
|
325
|
Vreven T, Hwang H, Weng Z. Integrating atom-based and residue-based scoring functions for protein-protein docking. Protein Sci 2011; 20:1576-86. [PMID: 21739500 DOI: 10.1002/pro.687] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 03/22/2011] [Revised: 06/03/2011] [Accepted: 06/15/2011] [Indexed: 12/30/2022]
Abstract
Most scoring functions for protein-protein docking algorithms are either atom-based or residue-based, with the former being able to produce higher quality structures and the latter being more tolerant to conformational changes upon binding. Earlier, we developed the ZRANK algorithm for reranking docking predictions, with a scoring function that contained only atom-based terms. Here we combine ZRANK's atom-based potentials with five residue-based potentials published by other labs, as well as an atom-based potential, IFACE, that we published after ZRANK. We simultaneously optimized the weights for selected combinations of terms in the scoring function, using decoys generated with the protein-protein docking algorithm ZDOCK. We performed rigorous cross validation of the combinations using 96 test cases from a docking benchmark. Judged by the integrative success rate of making 1000 predictions per complex, addition of IFACE and the best residue-based pair potential reduced the number of cases without a correct prediction by 38% and 27% relative to ZDOCK and ZRANK, respectively. Thus, combining residue-based and atom-based potentials into a scoring function can improve performance for protein-protein docking. The resulting scoring function is called IRAD (integration of residue- and atom-based potentials for docking) and is available at http://zlab.umassmed.edu.
Affiliation(s)
- Thom Vreven
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
|
326
|
Karaca E, Bonvin AMJJ. A multidomain flexible docking approach to deal with large conformational changes in the modeling of biomolecular complexes. Structure 2011; 19:555-65. [PMID: 21481778 DOI: 10.1016/j.str.2011.01.014] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 10/08/2010] [Revised: 01/03/2011] [Accepted: 01/10/2011] [Indexed: 10/18/2022]
Abstract
Binding-induced backbone and large-scale conformational changes represent one of the major challenges in the modeling of biomolecular complexes by docking. To address this challenge, we have developed a flexible multidomain docking protocol that follows a "divide-and-conquer" approach to model both large-scale domain motions and small- to medium-scale interfacial rearrangements: the flexible binding partner is treated as an assembly of subparts/domains that are docked simultaneously making use of HADDOCK's multidomain docking ability. For this, the flexible molecules are cut at hinge regions predicted using an elastic network model. The performance of this approach is demonstrated on a benchmark covering an unprecedented range of conformational changes of 1.5 to 19.5 Å. We show from a statistical survey of known complexes that the cumulative sum of eigenvalues obtained from the elastic network has some predictive power to indicate the extent of the conformational change to be expected.
Affiliation(s)
- Ezgi Karaca
- Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
|
327
|
Hwang H, Vreven T, Whitfield TW, Wiehe K, Weng Z. A machine learning approach for the prediction of protein surface loop flexibility. Proteins 2011; 79:2467-74. [PMID: 21633973 DOI: 10.1002/prot.23070] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/01/2011] [Revised: 03/30/2011] [Accepted: 04/19/2011] [Indexed: 11/11/2022]
Abstract
Proteins often undergo conformational changes when binding to each other. A major fraction of backbone conformational changes involves motion on the protein surface, particularly in loops. Accounting for the motion of protein surface loops represents a challenge for protein-protein docking algorithms. A first step in addressing this challenge is to distinguish protein surface loops that are likely to undergo backbone conformational changes upon protein-protein binding (mobile loops) from those that are not (stationary loops). In this study, we developed a machine learning strategy based on support vector machines (SVMs). Our SVM uses three features of loop residues in the unbound protein structures (Ramachandran angles, crystallographic B-factors, and relative accessible surface area) to distinguish mobile loops from stationary ones. This method yields an average prediction accuracy of 75.3% compared with a random prediction accuracy of 50%, and an average of 0.79 area under the receiver operating characteristic (ROC) curve using cross-validation. Testing the method on an independent dataset, we obtained a prediction accuracy of 70.5%. Finally, we applied the method to 11 complexes that involve members from the Ras superfamily and achieved prediction accuracy of 92.8% for the Ras superfamily proteins and 74.4% for their binding partners.
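The two figures of merit quoted in this abstract, prediction accuracy and area under the ROC curve, can be computed from any classifier's decision scores. The sketch below is a generic illustration, not the authors' evaluation code; the labels, scores, threshold, and function names are hypothetical (1 stands for a mobile loop, 0 for a stationary one).

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win; this pairwise definition equals the
    area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.0):
    """Fraction of examples whose thresholded score matches the label."""
    predicted = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical decision values for six loops (not data from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [1.2, 0.4, -0.3, 0.6, -0.8, -1.5]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_accuracy = accuracy(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, whereas the AUC summarises ranking quality over all thresholds, which is presumably why the abstract reports both.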
Affiliation(s)
- Howook Hwang
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
|
328
|
Zhang C, Lai L. SDOCK: a global protein-protein docking program using stepwise force-field potentials. J Comput Chem 2011; 32:2598-612. [PMID: 21618559 DOI: 10.1002/jcc.21839] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 01/27/2011] [Revised: 03/24/2011] [Accepted: 04/16/2011] [Indexed: 11/10/2022]
Abstract
The fast Fourier transform (FFT) method limits the forms of scoring functions in global protein-protein docking. On the other hand, force field potentials can effectively describe the energy hypersurface of biological macromolecules. In this study, we developed a new protein-protein docking program, SDOCK, that incorporates van der Waals attractive potential, geometric collision, screened electrostatic potential, and Lazaridis-Karplus desolvation energy into the scoring function in the global searching process. Stepwise potentials were generated from the corresponding continuous forms to treat structural flexibility. After optimization of the atom solvation parameters and the weights of different potential terms based on a new docking test set that contains 142 cases with small or moderate conformational changes upon binding, SDOCK slightly outperformed the well-known FFT-based global docking program ZDOCK 3.0. Among the 142 cases tested, 52.8% gave at least one near-native solution in the top 100 solutions. SDOCK was also tested on six blind testing cases in Critical Assessment of Predicted Interactions rounds 13 to 18. In all six cases, the near-native solutions could be found within the top 350 solutions. Because the SDOCK approach performs global docking based on force-field potentials, one of its advantages is that it provides global binding free energy surface profiles for further analysis. The efficiency of the program is also comparable with that of other FFT-based protein-protein docking programs. SDOCK is available for noncommercial applications at http://mdl.ipc.pku.edu.cn/cgi-bin/down.cgi.
Affiliation(s)
- Changsheng Zhang
- Beijing National Laboratory for Molecular Sciences, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
|
329
|
Mitra P, Pal D. PRUNE and PROBE--two modular web services for protein-protein docking. Nucleic Acids Res 2011; 39:W229-34. [PMID: 21576226 PMCID: PMC3125751 DOI: 10.1093/nar/gkr317] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
Protein–protein docking programs typically perform four major tasks: (i) generation of docking poses, (ii) selection of a subset of poses, (iii) their structural refinement and (iv) scoring and ranking for the final assessment of the true quaternary structure. Although the tasks can be integrated or performed in a serial order, they are by nature modular, allowing an opportunity to substitute one algorithm with another. We have implemented two modular web services, (i) PRUNE: to select a subset of docking poses generated during sampling search (http://pallab.serc.iisc.ernet.in/prune) and (ii) PROBE: to refine, score and rank them (http://pallab.serc.iisc.ernet.in/probe). The former uses a new interface-area-based edge-scoring function to eliminate >95% of the poses generated during docking search. In contrast to other multi-parameter-based screening functions, this single-parameter-based elimination reduces the computational time significantly, in addition to increasing the chances of selecting native-like models in the top rank list. The PROBE server performs ranking of pruned poses, after structure refinement and scoring using a regression model for geometric compatibility and normalized interaction energy. While web services similar to PROBE are infrequent, no web service akin to PRUNE has been described before. Both servers are publicly accessible and free to use.
Affiliation(s)
- Pralay Mitra
- Bioinformatics Centre, Indian Institute of Science, Bangalore 560 012, India
|
330
|
Tuncbag N, Gursoy A, Keskin O. Prediction of protein-protein interactions: unifying evolution and structure at protein interfaces. Phys Biol 2011; 8:035006. [PMID: 21572173 DOI: 10.1088/1478-3975/8/3/035006] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 01/02/2023]
Abstract
The vast majority of the chores in the living cell involve protein-protein interactions. Providing details of protein interactions at the residue level and incorporating them into protein interaction networks are crucial toward the elucidation of a dynamic picture of cells. Despite the rapid increase in the number of structurally known protein complexes, we are still far away from a complete network. Given experimental limitations, computational modeling of protein interactions is a prerequisite to proceed on the way to complete structural networks. In this work, we focus on the question 'how do proteins interact?' rather than 'which proteins interact?' and we review structure-based protein-protein interaction prediction approaches. As a sample approach for modeling protein interactions, we detail PRISM, which combines structural similarity and evolutionary conservation in protein interfaces to infer structures of complexes in the protein interaction network. This will ultimately help us to understand the role of protein interfaces in predicting bound conformations.
Affiliation(s)
- Nurcan Tuncbag
- Koc University, Center for Computational Biology and Bioinformatics, and College of Engineering, Rumelifeneri Yolu, 34450 Sariyer Istanbul, Turkey
|
331
|
Kastritis PL, Moal IH, Hwang H, Weng Z, Bates PA, Bonvin AMJJ, Janin J. A structure-based benchmark for protein-protein binding affinity. Protein Sci 2011; 20:482-91. [PMID: 21213247 PMCID: PMC3064828 DOI: 10.1002/pro.580] [Citation(s) in RCA: 221] [Impact Index Per Article: 17.0] [Received: 11/17/2010] [Revised: 12/15/2010] [Accepted: 12/16/2010] [Indexed: 11/06/2022]
Abstract
We have assembled a nonredundant set of 144 protein-protein complexes that have high-resolution structures available for both the complexes and their unbound components, and for which dissociation constants have been measured by biophysical methods. The set is diverse in terms of the biological functions it represents, with complexes that involve G-proteins and receptor extracellular domains, as well as antigen/antibody, enzyme/inhibitor, and enzyme/substrate complexes. It is also diverse in terms of the partners' affinity for each other, with $K_d$ ranging between $10^{-5}$ and $10^{-14}$ M. Nine pairs of entries represent closely related complexes that have a similar structure, but a very different affinity, each pair comprising a cognate and a noncognate assembly. The unbound structures of the component proteins being available, conformation changes can be assessed. They are significant in most of the complexes, and large movements or disorder-to-order transitions are frequently observed. The set may be used to benchmark biophysical models aiming to relate affinity to structure in protein-protein interactions, taking into account the reactants and the conformation changes that accompany the association reaction, instead of just the final product.
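For readers who think in terms of free energies rather than dissociation constants, the affinity range quoted above can be converted with the standard relation ΔG = RT ln(Kd), taking a 1 M standard state. The sketch below is a worked conversion only; the temperature of 298.15 K is an assumption (the abstract does not state measurement temperatures), and the function name is hypothetical.

```python
from math import log

GAS_CONSTANT = 1.987204e-3  # kcal / (mol * K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M); the default temperature is an assumed value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT * temperature * log(kd_molar)


# End points of the affinity range quoted in the abstract.
weakest = binding_free_energy(1e-5)    # about -6.8 kcal/mol
tightest = binding_free_energy(1e-14)  # about -19.1 kcal/mol
```

Under these assumptions the benchmark spans roughly 12 kcal/mol in binding free energy.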
Affiliation(s)
- Panagiotis L Kastritis
- Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, 3584 CH Utrecht, The Netherlands
- Iain H Moal
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, Lincoln's Inn Fields Laboratories, London WC2A 3LY, United Kingdom
- Howook Hwang
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605
- Paul A Bates
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, Lincoln's Inn Fields Laboratories, London WC2A 3LY, United Kingdom
- Alexandre M J J Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, 3584 CH Utrecht, The Netherlands
- Joël Janin
- Yeast Structural Genomics, IBBMC UMR 8619, Université Paris-Sud, 91405 Orsay, France
|
332
|
Bastard K, Saladin A, Prévost C. Accounting for large amplitude protein deformation during in silico macromolecular docking. Int J Mol Sci 2011; 12:1316-33. [PMID: 21541061 PMCID: PMC3083708 DOI: 10.3390/ijms12021316] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/10/2010] [Revised: 01/07/2011] [Accepted: 02/08/2011] [Indexed: 12/23/2022] Open
Abstract
Rapid progress of theoretical methods and computer calculation resources has turned in silico methods into a conceivable tool to predict the 3D structure of macromolecular assemblages, starting from the structure of their separate elements. Still, some classes of complexes represent a real challenge for macromolecular docking methods. In these complexes, protein parts like loops or domains undergo large amplitude deformations upon association, thus remodeling the surface accessible to the partner protein or DNA. We discuss the problems linked with managing such rearrangements in docking methods and we review strategies that are presently being explored, as well as their limitations and success.
Affiliation(s)
- Karine Bastard
- LABIS, Genoscope, CEA, 2 rue Gaston Cremieux, F-91057 Evry Cedex, France
- Adrien Saladin
- MTI, INSERM UMR-M 973, Paris Diderot-Paris 7 University, Bât Lamarck, 35 rue Hélène Brion, F-75205 Paris Cedex 13, France
- Chantal Prévost
- LBT-UPR 9080 CNRS, IBPC, 13 rue Pierre et Marie Curie, F-75005 Paris, France
- Author to whom correspondence should be addressed; Tel.: +33-(0)1 58 41 51 71; Fax: +33-(0)1 58 415 026
|
333
|
Rawat N, Biswas P. Shape, flexibility and packing of proteins and nucleic acids in complexes. Phys Chem Chem Phys 2011; 13:9632-43. [DOI: 10.1039/c1cp00027f] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
|