151
|
de Vries SJ, Bonvin AMJJ. CPORT: a consensus interface predictor and its performance in prediction-driven docking with HADDOCK. PLoS One 2011; 6:e17695. [PMID: 21464987 PMCID: PMC3064578 DOI: 10.1371/journal.pone.0017695] [Citation(s) in RCA: 233] [Impact Index Per Article: 17.9] [Received: 11/15/2010] [Accepted: 02/08/2011] [Indexed: 11/19/2022]
Abstract
Background: Macromolecular complexes are the molecular machines of the cell. Knowledge at the atomic level is essential to understand and influence their function. However, their number is huge and a significant fraction is extremely difficult to study using classical structural methods such as NMR and X-ray crystallography. Therefore, the importance of large-scale computational approaches in structural biology is evident. This study combines two of these computational approaches, interface prediction and docking, to obtain atomic-level structures of protein-protein complexes, starting from their unbound components. Methodology/Principal Findings: Here we combine six interface prediction web servers into a consensus method called CPORT (Consensus Prediction Of interface Residues in Transient complexes). We show that CPORT gives more stable and reliable predictions than each of the individual predictors on its own. A protocol was developed to integrate CPORT predictions into our data-driven docking program HADDOCK. For cases where experimental information is limited, this prediction-driven docking protocol presents an alternative to ab initio docking, the docking of complexes without the use of any information. Prediction-driven docking was performed on a large and diverse set of protein-protein complexes in a blind manner. Our results indicate that the performance of the HADDOCK-CPORT combination is competitive with ZDOCK-ZRANK, a state-of-the-art ab initio docking/scoring combination. Finally, the original interface predictions could be further improved by interface post-prediction (contact analysis of the docking solutions). Conclusions/Significance: The current study shows that blind, prediction-driven docking using CPORT and HADDOCK is competitive with ab initio docking methods. This is encouraging since prediction-driven docking represents the absolute bottom line for data-driven docking: any additional biological knowledge will greatly improve the results obtained by prediction-driven docking alone. Finally, the fact that original interface predictions could be further improved by interface post-prediction suggests that prediction-driven docking has not yet been pushed to the limit. A web server for CPORT is freely available at http://haddock.chem.uu.nl/services/CPORT.
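As a rough illustration of what combining several interface predictors into a consensus can look like, the sketch below uses plain vote counting. This is an assumption made for illustration only: the abstract does not say how CPORT weights or combines its six servers, and the function name `consensus_interface`, the `min_votes` parameter, and the residue numbers are all hypothetical.

```python
from collections import Counter


def consensus_interface(predictions, min_votes):
    """Return residues flagged as interface by at least `min_votes` predictors.

    `predictions` is a list of residue-number collections, one per predictor.
    Simple vote counting is an illustrative assumption, not CPORT's actual rule.
    """
    votes = Counter()
    for residues in predictions:
        votes.update(set(residues))  # each predictor votes once per residue
    return sorted(r for r, n in votes.items() if n >= min_votes)


# Toy output of three hypothetical predictors for one protein.
predictions = [
    {10, 11, 12, 40},
    {11, 12, 13},
    {12, 40, 41},
]
consensus = consensus_interface(predictions, min_votes=2)
print(consensus)
```

Raising `min_votes` trades coverage for reliability: fewer residues are returned, but each one is supported by more predictors.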
Affiliation(s)
- Sjoerd J de Vries
- Faculty of Science, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands.
|
152
|
Hamer R, Luo Q, Armitage JP, Reinert G, Deane CM. i-Patch: interprotein contact prediction using local network information. Proteins 2011; 78:2781-97. [PMID: 20635422 DOI: 10.1002/prot.22792] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]
Abstract
Biological processes are commonly controlled by precise protein-protein interactions. These connections rely on specific amino acids at the binding interfaces. Here we predict the binding residues of such interprotein complexes. We have developed a suite of methods, i-Patch, which predict the interprotein contact sites by considering the two proteins as a network, with residues as nodes and contacts as edges. i-Patch starts with two proteins, A and B, which are assumed to interact, but for which the structure of the complex is not available. However, we assume that for each protein, we have a reference structure and a multiple sequence alignment of homologues. i-Patch then uses the propensities of patches of residues to interact, to predict interprotein contact sites. i-Patch outperforms several other tested algorithms for prediction of interprotein contact sites. It gives 59% precision with 20% recall on a blind test set of 31 protein pairs. Combining the i-Patch scores with an existing correlated mutation algorithm, McBASC, using a logistic model gave little improvement. Results from a case study, on bacterial chemotaxis protein complexes, demonstrate that our predictions can identify contact residues, as well as suggesting unknown interfaces in multiprotein complexes.
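The precision and recall figures quoted above can be read as set overlaps between predicted and true contact residues. The following minimal sketch shows the arithmetic on made-up residue numbers chosen to reproduce a 60% precision, 20% recall situation; the function name and the data are hypothetical and are not taken from the i-Patch test set.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall of a predicted residue set against the true set."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: 5 residues predicted, 3 of them among 15 true contacts.
predicted = {1, 2, 3, 4, 5}
actual = set(range(3, 18))  # residues 3..17
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A predictor with high precision and low recall, as in this toy case, returns few residues but most of them are correct.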
Affiliation(s)
- Rebecca Hamer
- Oxford Centre for Integrative Systems Biology, Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
153
|
Molecular dynamics simulation of the Staphylococcus aureus YsxC protein: molecular insights into ribosome assembly and allosteric inhibition of the protein. J Mol Model 2011; 17:3129-49. [PMID: 21360172 DOI: 10.1007/s00894-011-0998-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/04/2010] [Accepted: 01/26/2011] [Indexed: 12/30/2022]
Abstract
YsxC from Staphylococcus aureus is a member of the GTPase protein family, and is involved in the ribosomal assembly and stability of this microorganism through its interactions with the L17, S2 and S10 ribosomal proteins. Inhibition of its interactions with L17, S2, S10 and the β' subunit of RNA polymerase influences ribosomal assembly, which may affect the growth of the microorganism. This makes YsxC a novel target for the design of inhibitors to treat the disease caused by S. aureus. Understanding the interaction mechanism between YsxC and its partners would aid in the identification of potential catalytic residues, which could then be targeted to inhibit its function. Accordingly, in the present study, an in silico analysis of the interactions between YsxC and L17, S2 and S10 was performed, and the potential residues involved in these interactions were identified. Based on the simulation results, a possible mechanism for the interactions between these proteins was also proposed. Finally, six ligands from among a library of 81,000 chemical molecules were found to interact with parts of the G2 and switch II regions of the YsxC protein. Moreover, their interactions with the YsxC protein were observed to provoke changes at its GTP-binding site, which suggests that the binding of these ligands leads to a reduction in GTPase activity, and they were also found to affect the interactions of YsxC with its partners. This observation indicates that the proposed interacting site of YsxC may act as an allosteric site, and disrupting interactions at this site might lead to novel allosteric inhibition of the YsxC protein.
|
154
|
Geppert T, Hoy B, Wessler S, Schneider G. Context-Based Identification of Protein-Protein Interfaces and “Hot-Spot” Residues. Chem Biol 2011; 18:344-53. [DOI: 10.1016/j.chembiol.2011.01.005] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 10/08/2010] [Revised: 12/03/2010] [Accepted: 01/05/2011] [Indexed: 02/07/2023]
|
155
|
de Vries SJ, Melquiond ASJ, Kastritis PL, Karaca E, Bordogna A, van Dijk M, Rodrigues JPGLM, Bonvin AMJJ. Strengths and weaknesses of data-driven docking in critical assessment of prediction of interactions. Proteins 2011; 78:3242-9. [PMID: 20718048 DOI: 10.1002/prot.22814] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
Abstract
The recent CAPRI rounds have introduced new docking challenges in the form of protein-RNA complexes, multiple alternative interfaces, and an unprecedented number of targets for which homology modeling was required. We present here the performance of HADDOCK and its web server in the CAPRI experiment and discuss the strengths and weaknesses of data-driven docking. HADDOCK was successful for 6 out of 9 complexes (6 out of 11 targets) and accurately predicted the individual interfaces for two more complexes. The HADDOCK server, which is the first allowing the simultaneous docking of generic multi-body complexes, was successful in 4 out of 7 complexes in which it participated. In the scoring experiment, we predicted the highest number of targets of any group. The main weakness of data-driven docking revealed by these last CAPRI results is its vulnerability to incorrect experimental data related to the interface or the stoichiometry of the complex. At the same time, the use of experimental and/or predicted information is also the strength of our approach, as evidenced by those targets for which accurate experimental information was available (e.g., the 10 three-star predictions for T40). Even when the models show a wrong orientation, the individual interfaces are generally well predicted, with an average coverage of 60% ± 26% over all targets. This makes data-driven docking particularly valuable in a biological context to guide experimental studies such as targeted mutagenesis.
Affiliation(s)
- Sjoerd J de Vries
- NMR Research Group, Bijvoet Center for Biomolecular Research, Utrecht University, 3584 CH Utrecht, The Netherlands
|
156
|
Abstract
In CAPRI rounds 13-19, we submitted models that are of acceptable or higher quality for 6 of the total of 13 targets. This success builds on our record in previous CAPRI rounds. The docking problem can be divided into two steps. In the first, translational/rotational and conformational space is searched to generate a pool of docked poses; the success of this search step is measured by whether near-native poses are included in the pool. In the second step, the pool is selected for near-native poses. In our previous assessment of CAPRI results, we suggested that the search problem is largely solved; a remaining problem is to select near-native poses. Our work in these new rounds of CAPRI was guided by this assessment. To solve the selection problem, we used an assortment of criteria on the interfaces of candidate poses. In one extreme, represented by T29, with very little known interface information, our criterion for top models was based on interface prediction. Poses in which the predicted interface residues occurred in interfaces were selected. Our model 1 for T29 was of medium quality. In the other extreme, represented by T40, with reliably known interface information, our selection was solely based on such information. Nine of the ten models submitted for T40 were of high (3 models), medium (4 models), and acceptable (2 models) quality. Our strategy of mixing predicted and known interface information appears to be widely applicable for the selection of near-native poses.
Affiliation(s)
- Sanbo Qin
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, USA
|
157
|
Carl N, Konc J, Vehar B, Janezic D. Protein-protein binding site prediction by local structural alignment. J Chem Inf Model 2011; 50:1906-13. [PMID: 20919700 DOI: 10.1021/ci100265x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 02/08/2023]
Abstract
Generalization of an earlier algorithm has led to the development of new local structural alignment algorithms for prediction of protein-protein binding sites. The algorithms use maximum cliques on protein graphs to define structurally similar protein regions. The search for structural neighbors in the new algorithms has been extended to all the proteins in the PDB and the query protein is compared to more than 60,000 proteins or over 300,000 single-chain structures. The resulting structural similarities are combined and used to predict the protein binding sites. This study shows that the location of protein binding sites can be predicted by comparing only local structural similarities irrespective of general protein folds.
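The abstract refers to maximum cliques on protein graphs. As a generic illustration of clique finding, and not a reconstruction of the authors' algorithm, the sketch below enumerates maximal cliques with the textbook Bron-Kerbosch recursion (without pivoting) and picks the largest one. The toy graph and the function names are hypothetical; how the paper actually builds its protein graphs is not described in the abstract.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph (Bron-Kerbosch, no pivoting).

    `adj` maps each vertex to the set of its neighbours.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield clique  # nothing can extend this clique, so it is maximal
            return
        for v in list(candidates):
            yield from expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}  # v has been fully explored
            excluded = excluded | {v}

    yield from expand(set(), set(adj), set())


def maximum_clique(adj):
    """Return one largest clique among all maximal cliques."""
    return max(maximal_cliques(adj), key=len)


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
largest = maximum_clique(graph)
print(sorted(largest))
```

Exhaustive enumeration like this is exponential in the worst case, so it is only practical for small graphs.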
Affiliation(s)
- Nejc Carl
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
|
158
|
Chennamsetty N, Voynov V, Kayser V, Helk B, Trout BL. Prediction of protein binding regions. Proteins 2010; 79:888-97. [DOI: 10.1002/prot.22926] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/14/2010] [Revised: 09/23/2010] [Accepted: 10/13/2010] [Indexed: 11/07/2022]
|
159
|
Launay G, Simonson T. A large decoy set of protein-protein complexes produced by flexible docking. J Comput Chem 2010; 32:106-20. [DOI: 10.1002/jcc.21604] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
|
160
|
Kanamori E, Igarashi S, Osawa M, Fukunishi Y, Shimada I, Nakamura H. Structure determination of a protein assembly by amino acid selective cross-saturation. Proteins 2010; 79:179-90. [PMID: 20954264 DOI: 10.1002/prot.22871] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/28/2010] [Revised: 08/25/2010] [Accepted: 08/27/2010] [Indexed: 11/10/2022]
Abstract
The amino acid selective cross-saturation (ASCS) method not only provides information about the interface of a protein assembly through a spin relaxation experiment, but also identifies the amino acid residues in the acceptor protein that are located close to the selectively labeled amino acid residues in the donor protein. Here, a new method was developed to build a precise structural model of a protein assembly that satisfies the experimental ASCS values, using simulated annealing computation. This method was applied to the ubiquitin-yeast ubiquitin hydrolase 1 (Ub-YUH1) complex to build a precise complex structure compatible with that determined by X-ray crystallography.
Affiliation(s)
- Eiji Kanamori
- Japan Biological Informatics Consortium (JBIC), Koto-ku, Tokyo 135-0064, Japan.
|
161
|
Lensink MF, Wodak SJ. Blind predictions of protein interfaces by docking calculations in CAPRI. Proteins 2010; 78:3085-95. [DOI: 10.1002/prot.22850] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.2] [Indexed: 12/30/2022]
|
162
|
Fiorucci S, Zacharias M. Prediction of protein-protein interaction sites using electrostatic desolvation profiles. Biophys J 2010; 98:1921-30. [PMID: 20441756 DOI: 10.1016/j.bpj.2009.12.4332] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 07/12/2009] [Revised: 11/30/2009] [Accepted: 12/23/2009] [Indexed: 01/15/2023]
Abstract
Protein-protein complex formation involves removal of water from the interface region. Surface regions with a small free energy penalty for water removal or desolvation may correspond to preferred interaction sites. A method to calculate the electrostatic free energy of placing a neutral low-dielectric probe at various protein surface positions has been designed and applied to characterize putative interaction sites. Based on solutions of the finite-difference Poisson equation, this method also includes long-range electrostatic contributions and the protein solvent boundary shape in contrast to accessible-surface-area-based solvation energies. Calculations on a large set of proteins indicate that in many cases (>90%), the known binding site overlaps with one of the six regions of lowest electrostatic desolvation penalty (overlap with the lowest desolvation region for 48% of proteins). Since the onset of electrostatic desolvation occurs even before direct protein-protein contact formation, it may help guide proteins toward the binding region in the final stage of complex formation. It is interesting that the probe desolvation properties associated with residue types were found to depend to some degree on whether the residue was outside of or part of a binding site. The probe desolvation penalty was on average smaller if the residue was part of a binding site compared to other surface locations. Applications to several antigen-antibody complexes demonstrated that the approach might be useful not only to predict protein interaction sites in general but to map potential antigenic epitopes on protein surfaces.
Affiliation(s)
- Sébastien Fiorucci
- School of Engineering and Science, Jacobs University Bremen, Bremen, Germany.
|
163
|
Chen P, Li J. Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information. BMC Bioinformatics 2010; 11:402. [PMID: 20667087 PMCID: PMC2921408 DOI: 10.1186/1471-2105-11-402] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 03/25/2010] [Accepted: 07/28/2010] [Indexed: 01/09/2023]
Abstract
BACKGROUND: Protein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize their interaction sites; however, only a small proportion of protein complexes have been successfully resolved due to the high cost. Therefore, it is important to improve the performance for predicting protein interaction sites based on primary sequence alone. RESULTS: We propose a new idea to construct an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, where SVMs train on different pairs of positive (interface sites) and negative (non-interface sites) subsets. The subsets, which have roughly the same sizes, are grouped in the order of accessible surface area change before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors to make the identification of interface residues more accurate. An ensemble of ten SVMs achieves an MCC improvement of around 8% and an F1 improvement of around 9% over that of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profiles outperforms that based on the sequence profile or the hydropathy scale alone. As our method uses a small number of features to encode the input vectors, our model is simpler, faster and more accurate than the existing methods. CONCLUSIONS: The integrative profile combining hydrophobic and evolutionary information contributes most to the protein-protein interaction prediction. Results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves the prediction performance. AVAILABILITY: Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.
Affiliation(s)
- Peng Chen
- Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, 639798 Singapore
|
164
|
Abstract
With the advent of Systems Biology, the prediction of whether two proteins form a complex has become a problem of increased importance. A variety of experimental techniques have been applied to the problem, but three-dimensional structural information has not been widely exploited. Here we explore the range of applicability of such information by analyzing the extent to which the location of binding sites on protein surfaces is conserved among structural neighbors. We find, as expected, that interface conservation is most significant among proteins that have a clear evolutionary relationship, but that there is a significant level of conservation even among remote structural neighbors. This finding is consistent with recent evidence that information available from structural neighbors, independent of classification, should be exploited in the search for functional insights. The value of such structural information is highlighted through the development of a new protein interface prediction method, PredUs, that identifies what residues on protein surfaces are likely to participate in complexes with other proteins. The performance of PredUs, as measured through comparisons with other methods, suggests that relationships across protein structure space can be successfully exploited in the prediction of protein-protein interactions.
|
165
|
Guharoy M, Chakrabarti P. Conserved residue clusters at protein-protein interfaces and their use in binding site identification. BMC Bioinformatics 2010; 11:286. [PMID: 20507585 PMCID: PMC2894039 DOI: 10.1186/1471-2105-11-286] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 02/16/2010] [Accepted: 05/27/2010] [Indexed: 12/30/2022]
Abstract
BACKGROUND: Biological evolution conserves protein residues that are important for structure and function. Both protein stability and function often require a certain degree of structural co-operativity between spatially neighboring residues and it has previously been shown that conserved residues occur clustered together in protein tertiary structures, enzyme active sites and protein-DNA interfaces. Residues comprising protein interfaces are often more conserved compared to those occurring elsewhere on the protein surface. We investigate the extent to which conserved residues within protein-protein interfaces are clustered together in three dimensions. RESULTS: Out of 121 and 392 interfaces in homodimers and heterocomplexes, 96.7 and 86.7%, respectively, have the conserved positions clustered within the overall interface region. The significance of this clustering was established in comparison to what is seen for the subsets of the same size of randomly selected residues from the interface. Conserved residues occurring in larger interfaces could often be sub-divided into two or more distinct sub-clusters. These structural cluster(s) comprising conserved residues indicate functionally important regions within the protein-protein interface that can be targeted for further structural and energetic analysis by experimental scanning mutagenesis. Almost 60% of experimental hot spot residues (with ΔΔG > 2 kcal/mol) were localized to these conserved residue clusters. An analysis of the residue types that are enriched within these conserved subsets compared to the overall interface showed that hydrophobic and aromatic residues are favored, but charged residues (both positive and negative) are less common. The potential use of this method for discriminating binding sites (interfaces) versus random surface patches was explored by comparing the clustering of conserved residues within each of these regions; in about 50% of cases the true interface is ranked among the top 10% of all surface patches. CONCLUSIONS: Protein-protein interaction sites are much larger than small molecule binding sites, but conserved residues are still not randomly distributed over the whole interface and are distinctly clustered. The clustered nature of evolutionarily conserved residues within interfaces as compared to those within other surface patches not involved in binding has important implications for the identification of protein-protein binding sites and would have applications in docking studies.
Affiliation(s)
- Mainak Guharoy
- Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata, India
|
166
|
Chae MH, Krull F, Lorenzen S, Knapp EW. Predicting protein complex geometries with a neural network. Proteins 2010; 78:1026-39. [PMID: 19938153 DOI: 10.1002/prot.22626] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/24/2022]
Abstract
A major challenge of the protein docking problem is to define scoring functions that can distinguish near-native protein complex geometries from a large number of non-native geometries (decoys) generated with noncomplexed protein structures (unbound docking). In this study, we have constructed a neural network that employs the information from atom-pair distance distributions of a large number of decoys to predict protein complex geometries. We found that docking prediction can be significantly improved using two different types of polar hydrogen atoms. To train the neural network, 2000 near-native decoys of even distance distribution were used for each of the 185 considered protein complexes. The neural network normalizes the information from different protein complexes using an additional protein complex identity input neuron for each complex. The parameters of the neural network were determined such that they mimic a scoring funnel in the neighborhood of the native complex structure. The neural network approach avoids the reference state problem, which occurs in deriving knowledge-based energy functions for scoring. We show that a distance-dependent atom pair potential performs much better than a simple atom-pair contact potential. We have compared the performance of our scoring function with other empirical and knowledge-based scoring functions such as ZDOCK 3.0, ZRANK, ITScore-PP, EMPIRE, and RosettaDock. In spite of the simplicity of the method and its functional form, our neural network-based scoring function achieves a reasonable performance in rigid-body unbound docking of proteins.
Affiliation(s)
- Myong-Ho Chae
- Department of Biology, University of Science, Unjong-District, Pyongyang, DPR Korea
|
167
|
Liu B, Wang X, Lin L, Tang B, Dong Q, Wang X. Prediction of protein binding sites in protein structures using hidden Markov support vector machine. BMC Bioinformatics 2009; 10:381. [PMID: 19925685 PMCID: PMC2785799 DOI: 10.1186/1471-2105-10-381] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 03/08/2009] [Accepted: 11/20/2009] [Indexed: 01/08/2023]
Abstract
Background: Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines, conditional random field, etc. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance. Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats the protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random field. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. Conclusion: The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for the hidden Markov support vector machine is linear with the number of training samples by using the cutting-plane algorithm.
Affiliation(s)
- Bin Liu
- Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, PR China.
|
168
|
Bordner AJ. Predicting protein-protein binding sites in membrane proteins. BMC Bioinformatics 2009; 10:312. [PMID: 19778442 PMCID: PMC2761413 DOI: 10.1186/1471-2105-10-312] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 05/12/2009] [Accepted: 09/24/2009] [Indexed: 01/15/2023]
Abstract
Background: Many integral membrane proteins, like their non-membrane counterparts, form either transient or permanent multi-subunit complexes in order to carry out their biochemical function. Computational methods that provide structural details of these interactions are needed since, despite their importance, relatively few structures of membrane protein complexes are available. Results: We present a method for predicting which residues are in protein-protein binding sites within the transmembrane regions of membrane proteins. The method uses a Random Forest classifier trained on residue type distributions and evolutionary conservation for individual surface residues, followed by spatial averaging of the residue scores. The prediction accuracy achieved for membrane proteins is comparable to that for non-membrane proteins. Also, like previous results for non-membrane proteins, the accuracy is significantly higher for residues distant from the binding site boundary. Furthermore, a predictor trained on non-membrane proteins was found to yield poor accuracy on membrane proteins, as expected from the different distribution of surface residue types between the two classes of proteins. Thus, although the same procedure can be used to predict binding sites in membrane and non-membrane proteins, separate predictors trained on each class of proteins are required. Finally, the contribution of each residue property to the overall prediction accuracy is analyzed and prediction examples are discussed. Conclusion: Given a membrane protein structure and a multiple alignment of related sequences, the presented method gives a prioritized list of which surface residues participate in intramembrane protein-protein interactions. The method has potential applications in guiding the experimental verification of membrane protein interactions, structure-based drug discovery, and also in constraining the search space for computational methods, such as protein docking or threading, that predict membrane protein complex structures.
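The spatial averaging step mentioned in the abstract can be pictured as smoothing each residue's score over its structural neighbours. The sketch below is one possible reading under stated assumptions: a fixed distance cutoff, an unweighted mean, and one representative coordinate per residue. The abstract does not specify any of these, and the function name, cutoff, coordinates, and scores are hypothetical.

```python
import math


def spatially_averaged_scores(coords, scores, radius):
    """Replace each residue score by the mean score of all residues within `radius`.

    `coords` holds one (x, y, z) point per residue; the residue itself always
    counts as its own neighbour, so the mean is never over an empty set.
    """
    averaged = []
    for center in coords:
        nearby = [
            score
            for point, score in zip(coords, scores)
            if math.dist(center, point) <= radius
        ]
        averaged.append(sum(nearby) / len(nearby))
    return averaged


# Toy data: two residues 3 units apart and one isolated residue.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
scores = [0.9, 0.1, 0.5]
smoothed = spatially_averaged_scores(coords, scores, radius=5.0)
print(smoothed)
```

Smoothing of this kind suppresses isolated high scores and favours contiguous surface patches, at the cost of blurring the boundary of the predicted site.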
Affiliation(s)
- Andrew J Bordner
- Mayo Clinic, 13400 East Shea Boulevard, Scottsdale, AZ 85259, USA.
|
169
|
Using Support Vector Machine Combined with Post-processing Procedure to Improve Prediction of Interface Residues in Transient Complexes. Protein J 2009; 28:369-74. [DOI: 10.1007/s10930-009-9203-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
|
170
|
Giard J, Ambroise J, Gala JL, Macq B. Regression applied to protein binding site prediction and comparison with classification. BMC Bioinformatics 2009; 10:276. [PMID: 19728868 PMCID: PMC2749839 DOI: 10.1186/1471-2105-10-276] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/26/2009] [Accepted: 09/03/2009] [Indexed: 11/13/2022] Open
Abstract
Background The structural genomics centers provide hundreds of protein structures of unknown function. Therefore, developing methods enabling the determination of a protein function automatically is imperative. The determination of a protein function can be achieved by studying the network of its physical interactions. In this context, identifying a potential binding site between proteins is of primary interest. In the literature, methods for predicting a potential binding site location generally are based on classification tools. The aim of this paper is to show that regression tools are more efficient than classification tools for patches based binding site predictors. For this purpose, we developed a patches based binding site localization method usable with either regression or classification tools. Results We compared predictive performances of regression tools with performances of machine learning classifiers. Using leave-one-out cross-validation, we showed that regression tools provide better predictions than classification ones. Among regression tools, Multilayer Perceptron ranked highest in the quality of predictions. We compared also the predictive performance of our patches based method using Multilayer Perceptron with the performance of three other methods usable through a web server. Our method performed similarly to the other methods. Conclusion Regression is more efficient than classification when applied to our binding site localization method. When it is possible, using regression instead of classification for other existing binding site predictors will probably improve results. Furthermore, the method presented in this work is flexible because the size of the predicted binding site is adjustable. This adaptability is useful when either false positive or negative rates have to be limited.
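One way to picture the adjustable binding-site size described in this abstract is to rank surface patches by a regression score and keep the top k. The sketch below is hypothetical: the function name `select_binding_patches`, the patch identifiers, and the scores are invented for illustration and do not come from the paper.

```python
def select_binding_patches(patch_scores, k):
    """Return the ids of the k patches with the highest predicted score.
    `patch_scores` maps a patch id to a regression output (assumed:
    higher means more likely to belong to the binding site)."""
    ranked = sorted(patch_scores, key=patch_scores.get, reverse=True)
    return ranked[:k]

# Made-up regression outputs for four surface patches.
patch_scores = {"p1": 0.12, "p2": 0.81, "p3": 0.47, "p4": 0.66}
small_site = select_binding_patches(patch_scores, k=1)  # stricter: limits false positives
large_site = select_binding_patches(patch_scores, k=3)  # looser: limits false negatives
```

A classifier that outputs only a binary label offers no such ranking, which is one reason a continuous score makes the predicted site size easy to tune.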
Affiliation(s)
- Joachim Giard
- Communications and Remote Sensing Laboratory, Université Catholique de Louvain, Place du Levant 2, 1348 Louvain-la-Neuve, Belgium.
|
171
|
Exploiting three kinds of interface propensities to identify protein binding sites. Comput Biol Chem 2009; 33:303-11. [DOI: 10.1016/j.compbiolchem.2009.07.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 09/01/2008] [Revised: 06/22/2009] [Accepted: 07/01/2009] [Indexed: 11/21/2022]
|
172
|
Improved Prediction of Protein Binding Sites from Sequences Using Genetic Algorithm. Protein J 2009; 28:273-80. [DOI: 10.1007/s10930-009-9192-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
|
173
|
Varecha M, Zimmermann M, Amrichová J, Ulman V, Matula P, Kozubek M. Prediction of localization and interactions of apoptotic proteins. J Biomed Sci 2009; 16:59. [PMID: 19580669 PMCID: PMC2714591 DOI: 10.1186/1423-0127-16-59] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 03/24/2009] [Accepted: 07/06/2009] [Indexed: 01/14/2023] Open
Abstract
During apoptosis several mitochondrial proteins are released. Some of them participate in caspase-independent nuclear DNA degradation, especially apoptosis-inducing factor (AIF) and endonuclease G (endoG). Another interesting protein, which was expected to act similarly as AIF due to the high sequence homology with AIF is AIF-homologous mitochondrion-associated inducer of death (AMID). We studied the structure, cellular localization, and interactions of several proteins in silico and also in cells using fluorescent microscopy. We found the AMID protein to be cytoplasmic, most probably incorporated into the cytoplasmic side of the lipid membranes. Bioinformatic predictions were conducted to analyze the interactions of the studied proteins with each other and with other possible partners. We conducted molecular modeling of proteins with unknown 3D structures. These models were then refined by MolProbity server and employed in molecular docking simulations of interactions. Our results show data acquired using a combination of modern in silico methods and image analysis to understand the localization, interactions and functions of proteins AMID, AIF, endonuclease G, and other apoptosis-related proteins.
Affiliation(s)
- Miroslav Varecha
- Centre for Biomedical Image Analysis, Faculty of Informatics, Masaryk University, Botanická 68a, Brno 60200, Czech Republic.
|
174
|
Grosdidier S, Fernández-Recio J. Docking and scoring: applications to drug discovery in the interactomics era. Expert Opin Drug Discov 2009; 4:673-86. [DOI: 10.1517/17460440903002067] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/05/2022]
|
175
|
Du X, Cheng J, Song J. Identifying protein-protein interaction sites using covering algorithm. Int J Mol Sci 2009; 10:2190-2202. [PMID: 19564948 PMCID: PMC2695276 DOI: 10.3390/ijms10052190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/17/2009] [Revised: 04/30/2009] [Accepted: 05/13/2009] [Indexed: 12/03/2022] Open
Abstract
Identification of protein-protein interface residues is crucial for structural biology. This paper proposes a covering algorithm for predicting protein-protein interface residues with features including protein sequence profile and residue accessible area. The method exploits the characteristics of covering algorithms, namely simplicity, low complexity and high accuracy on high-dimensional data. In 5-fold cross-validation on 61 protein chains, the covering algorithm achieves performance comparable to a support vector machine and maximum entropy on our dataset: an overall accuracy of 69.62%, a correlation coefficient (CC) of 0.2893, 58.83% specificity and 56.12% sensitivity on the Complete dataset, and an overall accuracy of 60.86%, a CC of 0.2144, 53.34% specificity and 65.59% sensitivity on the Trim dataset. This result indicates that the covering algorithm is a powerful and robust protein-protein interaction site prediction method that can guide biologists in designing specific experiments on proteins. Examination of the predictions in the context of the 3-dimensional structures of proteins demonstrates the effectiveness of this method.
Affiliation(s)
- Xiuquan Du, Jiaxing Cheng, Jie Song
- The Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Anhui, China
|
176
|
Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A, Tress ML. Progress and challenges in predicting protein-protein interaction sites. Brief Bioinform 2009; 10:233-46. [PMID: 19346321 DOI: 10.1093/bib/bbp021] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.0] [Indexed: 11/14/2022] Open
Abstract
The identification of protein-protein interaction sites is an essential intermediate step for mutant design and the prediction of protein networks. In recent years a significant number of methods have been developed to predict these interface residues and here we review the current status of the field. Progress in this area requires a clear view of the methodology applied, the data sets used for training and testing the systems, and the evaluation procedures. We have analysed the impact of a representative set of features and algorithms and highlighted the problems inherent in generating reliable protein data sets and in the posterior analysis of the results. Although it is clear that there have been some improvements in methods for predicting interacting sites, several major bottlenecks remain. Proteins in complexes are still under-represented in the structural databases and in particular many proteins involved in transient complexes are still to be crystallized. We provide suggestions for effective feature selection, and make it clear that community standards for testing, training and performance measures are necessary for progress in the field.
Affiliation(s)
- Iakes Ezkurdia
- Centro Nacional de Biotecnología, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
177
|
Nimrod G, Schushan M, Steinberg DM, Ben-Tal N. Detection of functionally important regions in "hypothetical proteins" of known structure. Structure 2009; 16:1755-63. [PMID: 19081051 DOI: 10.1016/j.str.2008.10.017] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 04/16/2008] [Revised: 10/16/2008] [Accepted: 10/19/2008] [Indexed: 10/21/2022]
Abstract
Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.
Affiliation(s)
- Guy Nimrod
- Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
|
178
|
Identifying protein–protein interaction sites in transient complexes with temperature factor, sequence profile and accessible surface area. Amino Acids 2009; 38:263-70. [DOI: 10.1007/s00726-009-0245-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 10/05/2008] [Accepted: 01/21/2009] [Indexed: 11/26/2022]
|
179
|
Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A. Joint evolutionary trees: a large-scale method to predict protein interfaces based on sequence sampling. PLoS Comput Biol 2009; 5:e1000267. [PMID: 19165315 PMCID: PMC2613531 DOI: 10.1371/journal.pcbi.1000267] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 05/27/2008] [Accepted: 12/04/2008] [Indexed: 11/18/2022] Open
Abstract
The Joint Evolutionary Trees (JET) method detects protein interfaces, the core residues involved in the folding process, and residues susceptible to site-directed mutagenesis and relevant to molecular recognition. The approach, based on the Evolutionary Trace (ET) method, introduces a novel way to treat evolutionary information. Families of homologous sequences are analyzed through a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple alignment and impacts of weakly homologous sequences on distance tree construction. The sampling method makes sequence analysis more sensitive to functional and structural importance of individual residues by avoiding effects of the overrepresentation of highly homologous sequences and improves computational efficiency. A carefully designed clustering method is parametrized on the target structure to detect and extend patches on protein surfaces into predicted interaction sites. Clustering takes into account residues' physical-chemical properties as well as conservation. Large-scale application of JET requires the system to be adjustable for different datasets and to guarantee predictions even if the signal is low. Flexibility was achieved by a careful treatment of the number of retrieved sequences, the amino acid distance between sequences, and the selective thresholds for cluster identification. An iterative version of JET (iJET) that guarantees finding the most likely interface residues is proposed as the appropriate tool for large-scale predictions. Tests are carried out on the Huang database of 62 heterodimer, homodimer, and transient complexes and on 265 interfaces belonging to signal transduction proteins, enzymes, inhibitors, antibodies, antigens, and others. A specific set of proteins chosen for their special functional and structural properties illustrate JET behavior on a large variety of interactions covering proteins, ligands, DNA, and RNA. JET is compared at a large scale to ET and to Consurf, Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant improvement in performance and computational efficiency is shown.
Information obtained on the structure of macromolecular complexes is important for identifying functionally important partners but also for determining how such interactions will be perturbed by natural or engineered site mutations. Hence, to fully understand or control biological processes we need to predict in the most accurate manner protein interfaces for a protein structure, possibly without knowing its partners. Joint Evolutionary Trees (JET) is a method designed to detect very different types of interactions of a protein with another protein, ligands, DNA, and RNA. It uses a carefully designed sampling method, making sequence analysis more sensitive to the functional and structural importance of individual residues, and a clustering method parametrized on the target structure for the detection of patches on protein surfaces and their extension into predicted interaction sites. JET is a large-scale method, highly accurate and potentially applicable to search for protein partners.
Affiliation(s)
- Stefan Engelen
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
- Ladislas A. Trojan
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
- Richard Lavery
- Institut de Biologie et Chimie des Protéines, CNRS UMR 5086/IFR 128/Université de Lyon, Lyon, France
- Alessandra Carbone
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, Paris, France
- INSERM, U511, Paris, France
|
180
|
Chen XW, Jeong JC. Sequence-based prediction of protein interaction sites with an integrative method. Bioinformatics 2009; 25:585-91. [DOI: 10.1093/bioinformatics/btp039] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Indexed: 11/14/2022] Open
|
181
|
Moreira IS, Fernandes PA, Ramos MJ. Protein-protein docking dealing with the unknown. J Comput Chem 2009; 31:317-42. [DOI: 10.1002/jcc.21276] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 01/02/2023]
|
182
|
Li N, Sun Z, Jiang F. Prediction of protein-protein binding site by using core interface residue and support vector machine. BMC Bioinformatics 2008; 9:553. [PMID: 19102736 PMCID: PMC2627892 DOI: 10.1186/1471-2105-9-553] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 08/12/2008] [Accepted: 12/22/2008] [Indexed: 12/04/2022] Open
Abstract
Background The prediction of protein-protein binding sites can provide structural annotation for the protein interaction data from proteomics studies. This is very important for the biological application of the rapidly increasing protein interaction data. Moreover, methods for predicting protein interaction sites can also provide crucial information for improving the speed and accuracy of protein docking methods. Results In this work, we describe a binding site prediction method based on designing a new residue neighbour profile and selecting only the core-interface residues for SVM training. The residue neighbour profile includes both the sequential and the spatial neighbour residues of an interface residue, which gives a more complete description of the physical and chemical characteristics surrounding the interface residue. The concept of core interface is applied in selecting the interface residues for training the SVM models, which is shown to result in better discrimination between the core interface and other residues. The best SVM model trained was tested on a test set of 50 randomly selected proteins. The sensitivity, specificity, and MCC for the prediction of the core interface residues were 60.6%, 53.4%, and 0.243, respectively. Our prediction results on this test set were compared with those of three other binding site prediction methods and found to be better. Furthermore, our method was tested on the 101 unbound proteins from the protein-protein interaction benchmark v2.0. The sensitivity, specificity, and MCC of this test were 57.5%, 32.5%, and 0.168, respectively. Conclusion By improving both the descriptions of the interface residues and their surrounding environment and the training strategy, better SVM models were obtained and shown to outperform previous methods. Our tests on the unbound protein structures suggest further improvement is possible.
Affiliation(s)
- Nan Li
- Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, PR China.
|
183
|
Abstract
Protein–DNA/RNA/protein interactions play critical roles in many biological functions. Previous studies have focused on the different features characterizing the different macromolecule-binding sites and approaches to detect these sites. However, no common unique signature of these sites had been reported. Thus, this work aims to provide a ‘common’ principle dictating the location of the different macromolecule-binding sites founded upon fundamental principles of binding thermodynamics. To achieve this aim, a comprehensive set of structurally nonhomologous DNA-, RNA-, obligate protein- and nonobligate protein-binding proteins, both free and bound to their respective macromolecules, was created and a novel strategy for detecting clusters of residues with electrostatic or steric strain given the protein structure was developed. The results show that regardless of the macromolecule type, the binding strength and conformational changes upon binding, macromolecule-binding sites are energetically less stable than nonmacromolecule-binding sites. They also reveal new energetic features distinguishing DNA- from RNA-binding sites and obligate protein- from nonobligate protein-binding sites in both free/bound protein structures.
Affiliation(s)
- Yao Chi Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
|
184
|
|
185
|
Igarashi S, Osawa M, Takeuchi K, Ozawa SI, Shimada I. Amino acid selective cross-saturation method for identification of proximal residue pairs in a protein-protein complex. J Am Chem Soc 2008; 130:12168-76. [PMID: 18707104 DOI: 10.1021/ja804062t] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
We describe an NMR-based approach, the amino acid selective cross-saturation (ASCS) method, to identify the pairs of the interface residues of protein-protein complexes. ASCS uses a "cross-saturation (CS)-donor" protein, in which only one amino acid is selectively ¹H-labeled in a ²H-background, and a "CS-acceptor" protein with uniform ²H, ¹⁵N labeling. Irradiation of the ¹H-labeled amino acid, which exists only in the donor, decreases the intensity of the ¹H-¹⁵N HSQC signals of the acceptor residues proximal to the ¹H-labeled CS-source residue(s) through the CS phenomenon. Given the three-dimensional structure of each protein in the complex, but not the complex structure, the combinatorial analysis of multiple ASCS results specifies the CS-source residue(s), based on the spatial complementarity between the CS-source residues on the CS donor and the cross-saturated amide protons on the acceptor. NMR investigations of the labeling selectivity and efficiency in an E. coli host, which are critical for ASCS, revealed that Ala, Arg, His, Ile, Leu, Lys, Met, Phe, Pro, Trp, and Tyr are selectively labeled with a high ¹H/²H ratio. The observation of the ASCS was then confirmed using the known structure of the yeast ubiquitin (Ub) and yeast ubiquitin hydrolase 1 (YUH1). Conversely, reasonable candidates for the CS-source residues were suggested by the analysis of the ASCS results, with reference to the individual structures of YUH1 and Ub. The pairwise distance information between the CS-source residues and the cross-saturated amide groups obtained by ASCS will be useful for modeling protein-protein complexes.
Affiliation(s)
- Shunsuke Igarashi
- Graduate School of Pharmaceutical Sciences, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
|
186
|
Identification of protein interaction partners and protein-protein interaction sites. J Mol Biol 2008; 382:1276-89. [PMID: 18708070 DOI: 10.1016/j.jmb.2008.08.002] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Received: 03/19/2008] [Revised: 07/10/2008] [Accepted: 08/01/2008] [Indexed: 01/10/2023]
Abstract
Rigid-body docking has become quite successful in predicting the correct conformations of binary protein complexes, at least when the constituent proteins do not undergo large conformational changes upon binding. However, determining whether two given proteins interact is a more difficult problem. Successful docking procedures often give equally good scores for proteins that do not interact experimentally. This is the case for the multiple minimization approach we use here. An analysis of the results where all proteins within a set are docked with all other proteins (complete cross-docking) shows that the predictions can be greatly improved if the location of the correct binding interface on each protein is known, since the experimental complexes are much more likely to bring these two interfaces into contact, at the same time as yielding good interaction energy scores. While various methods exist for identifying binding interfaces, it is shown that simply studying the interaction of all potential protein pairs within a data set can itself help to identify the correct interfaces.
|
187
|
Kundrotas PJ, Lensink MF, Alexov E. Homology-based modeling of 3D structures of protein–protein complexes using alignments of modified sequence profiles. Int J Biol Macromol 2008; 43:198-208. [DOI: 10.1016/j.ijbiomac.2008.05.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 04/13/2008] [Revised: 05/09/2008] [Accepted: 05/12/2008] [Indexed: 11/25/2022]
|
188
|
Zhang SW, Chen W, Yang F, Pan Q. Using Chou's pseudo amino acid composition to predict protein quaternary structure: a sequence-segmented PseAAC approach. Amino Acids 2008; 35:591-8. [PMID: 18427713 DOI: 10.1007/s00726-008-0086-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.4] [Received: 02/19/2008] [Accepted: 02/28/2008] [Indexed: 12/11/2022]
Abstract
In the protein universe, many proteins are composed of two or more polypeptide chains, generally referred to as subunits, which associate through noncovalent interactions and, occasionally, disulfide bonds to form protein quaternary structures. It has long been known that the functions of proteins are closely related to their quaternary structures; some examples include enzymes, hemoglobin, DNA polymerase, and ion channels. However, it is extremely labor-expensive and even impossible to quickly determine the structures of hundreds of thousands of protein sequences solely from experiments. Since the number of protein sequences entering databanks is increasing rapidly, it is highly desirable to develop computational methods for classifying the quaternary structures of proteins from their primary sequences. Since the concept of Chou's pseudo amino acid composition (PseAAC) was introduced, a variety of approaches, such as residue conservation scores, von Neumann entropy, multiscale energy, autocorrelation function, moment descriptors, and cellular automata, have been utilized to formulate the PseAAC for predicting different attributes of proteins. Here, in a different approach, a sequence-segmented PseAAC is introduced to represent protein samples. Meanwhile, multiclass SVM classifier modules were adopted to classify protein quaternary structures. As a demonstration, the dataset constructed by Chou and Cai [(2003) Proteins 53:282-289] was adopted as a benchmark dataset. The overall jackknife success rates thus obtained were 88.2-89.1%, indicating that the new approach is quite promising for predicting protein quaternary structure.
Affiliation(s)
- Shao-Wu Zhang
- College of Automation, Northwestern Polytechnical University, 710072, Xi'an, China.
|
189
|
Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Bridging protein local structures and protein functions. Amino Acids 2008; 35:627-50. [PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/21/2008] [Accepted: 03/10/2008] [Indexed: 12/11/2022]
Abstract
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. In particular, we emphasize the newly developed structure-based methods, which are able to identify local structural motifs and reveal their relationship with protein functions. These methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.
Affiliation(s)
- Zhi-Ping Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100080, Beijing, China
|
190
|
Martin J, Regad L, Lecornet H, Camproux AC. Structural deformation upon protein-protein interaction: a structural alphabet approach. BMC Struct Biol 2008; 8:12. [PMID: 18307769 PMCID: PMC2315654 DOI: 10.1186/1472-6807-8-12] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 06/15/2007] [Accepted: 02/28/2008] [Indexed: 11/26/2022]
Abstract
Background In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. Results In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Conclusion Our study provides qualitative information about induced fit. These results could be of help for flexible docking.
Affiliation(s)
- Juliette Martin
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM UMRS726/Université Denis Diderot Paris 7, F-75005 Paris, France.
|
191
|
Abstract
Docking of unbound protein structures into a complex has gained significant progress in recent years, but nonetheless still poses a great challenge. We have pursued a holistic approach to docking which brings together effective methods at different stages. First, protein-protein interaction sites are predicted or obtained from experimental studies in the literature. Interface prediction/experimental data are then used to guide the generation of docked poses or to rank docked poses generated from an unbiased search. Finally, selected models are refined by lengthy molecular dynamics (MD) simulations in explicit water. For CAPRI target T27, we used information on interaction sites as input to drive docking and as a filter to rank docked poses. Lead candidates were then clustered according to RMSD among them. From the clustering, 10 models were selected and subject to refinement by MD simulations. Our Model 7 is rated number one among all submissions according to L_rmsd. Six of our other submissions are rated acceptable. As scorer, eight of our submissions are rated acceptable.
Affiliation(s)
- Sanbo Qin
- Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, USA
|
192
|
de Vries SJ, van Dijk ADJ, Krzeminski M, van Dijk M, Thureau A, Hsu V, Wassenaar T, Bonvin AMJJ. HADDOCK versus HADDOCK: new features and performance of HADDOCK2.0 on the CAPRI targets. Proteins 2008; 69:726-33. [PMID: 17803234 DOI: 10.1002/prot.21723] [Citation(s) in RCA: 464] [Impact Index Per Article: 29.0] [Indexed: 11/06/2022]
Abstract
Here we present version 2.0 of HADDOCK, which incorporates considerable improvements and new features. HADDOCK is now able to model not only protein-protein complexes but also other kinds of biomolecular complexes and multi-component (N > 2) systems. In the absence of any experimental and/or predicted information to drive the docking, HADDOCK now offers two additional ab initio docking modes based on either random patch definition or center-of-mass restraints. The docking protocol has been considerably improved, supporting, among others, solvated docking, automatic definition of semi-flexible regions, and inclusion of a desolvation energy term in the scoring scheme. The performance of HADDOCK2.0 is evaluated on the targets of rounds 4-11, run in a semi-automated mode using the original information we used in our CAPRI submissions. This enables a direct assessment of the progress made since the previous versions. Although HADDOCK performed very well in CAPRI (65% and 71% success rates, overall and for unbound targets only, respectively), a substantial improvement was achieved with HADDOCK2.0.
Affiliation(s)
- Sjoerd J de Vries
- Bijvoet Center for Biomolecular Research, Science Faculty, Utrecht University, 3584CH, Utrecht, The Netherlands
|
193
|
Nguyen C, Gardiner KJ, Cios KJ. A hidden Markov model for predicting protein interfaces. J Bioinform Comput Biol 2007; 5:739-53. [PMID: 17688314 DOI: 10.1142/s0219720007002722] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/31/2006] [Revised: 12/26/2006] [Indexed: 11/18/2022]
Abstract
Protein-protein interactions play a defining role in protein function. Identifying the sites of interaction in a protein is a critical problem for understanding its functional mechanisms, as well as for drug design. To predict sites within a protein chain that participate in protein complexes, we have developed a novel method based on the Hidden Markov Model, which combines several biological characteristics of the sequences neighboring a target residue: structural information, accessible surface area, and transition probability among amino acids. We have evaluated the method using 5-fold cross-validation on 139 unique proteins and demonstrated precision of 66% and recall of 61% in identifying interfaces. These results are better than those achieved by other methods used for identification of interfaces.
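The precision and recall figures quoted in this abstract are per-residue classification metrics. As a minimal sketch of how such figures are typically computed (the function name and the residue numbers are hypothetical, and this is not the authors' evaluation code):

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted versus true interface residues.

    Both arguments are iterables of residue identifiers; the identifiers
    used below are illustrative only.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: fraction of predicted residues that are truly at the interface.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true interface residues that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Toy example: 4 residues predicted, 5 truly at the interface, 2 in common.
p, r = precision_recall({10, 11, 12, 40}, {11, 12, 13, 14, 15})
print(p, r)
```

In a cross-validated setting such as the one described, these values would be computed on each held-out fold and then averaged; how the averaging was done is not stated in the abstract.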
Affiliation(s)
- Cao Nguyen
- Department of Computer Science and Engineering, University of Colorado at Denver and Health Sciences, Denver, CO 80217, USA.
|
194
|
Abstract
In a cell, it has been estimated that each protein on average interacts with roughly 10 others, resulting in tens of thousands of proteins known or suspected to have interaction partners; of these, only a tiny fraction have solved protein structures. To partially address this problem, we have developed M-TASSER, a hierarchical method to predict protein quaternary structure from sequence that involves template identification by multimeric threading, followed by multimer model assembly and refinement. The final models are selected by structure clustering. M-TASSER has been tested on a benchmark set comprising 241 dimers having templates with weak sequence similarity and 246 without multimeric templates in the dimer library. Of the total of 207 targets predicted to interact as dimers, 165 (80%) were correctly assigned as interacting with a true positive rate of 68% and a false positive rate of 17%. The initial best template structures have an average root mean-square deviation to native of 5.3, 6.7, and 7.4 Å for the monomer, interface, and dimer structures. The final model shows on average a root mean-square deviation improvement of 1.3, 1.3, and 1.5 Å over the initial template structure for the monomer, interface, and dimer structures, with refinement evident for 87% of the cases. Thus, we have developed a promising approach to predict full-length quaternary structure for proteins that have weak sequence similarity to proteins of solved quaternary structure.
Affiliation(s)
- Jeffrey Skolnick
- Address reprint requests to Jeffrey Skolnick, Tel.: 404-407-8975; Fax: 404-385-7478.
|
195
|
Abstract
A number of complementary methods have been developed for predicting protein-protein interaction sites. We sought to increase prediction robustness and accuracy by combining results from different predictors, and report here a meta web server, meta-PPISP, that is built on three individual web servers: cons-PPISP (http://pipe.scs.fsu.edu/ppisp.html), Promate (http://bioportal.weizmann.ac.il/promate), and PINUP (http://sparks.informatics.iupui.edu/PINUP/). A linear regression method, using the raw scores of the three servers as input, was trained on a set of 35 nonhomologous proteins. Cross validation showed that meta-PPISP outperforms all three individual servers. At coverages identical to those of the individual methods, the accuracy of meta-PPISP is higher by 4.8 to 18.2 percentage points. Similar improvements in accuracy are also seen on CAPRI and other targets. AVAILABILITY meta-PPISP can be accessed at http://pipe.scs.fsu.edu/meta-ppisp.html
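The combination step described in this abstract, a linear regression over the raw scores of three predictors, can be sketched roughly as follows. This is an illustration under stated assumptions, not the meta-PPISP implementation: the data are synthetic, the function names are hypothetical, and an ordinary least-squares fit with an intercept is assumed because the abstract does not specify the regression details.

```python
import numpy as np


def fit_meta_predictor(scores, labels):
    """Least-squares weights (one per predictor, plus an intercept).

    scores: (n_residues, n_predictors) array of raw per-residue scores.
    labels: (n_residues,) array, 1.0 for interface residues, 0.0 otherwise.
    """
    design = np.column_stack([scores, np.ones(len(scores))])
    weights, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return weights


def meta_score(scores, weights):
    """Combined score for each residue under the fitted weights."""
    design = np.column_stack([scores, np.ones(len(scores))])
    return design @ weights


# Synthetic stand-in for three servers' raw scores (assumption: each score
# is a noisy version of the true label).
rng = np.random.default_rng(0)
labels = (rng.random(200) < 0.3).astype(float)
scores = labels[:, None] + rng.normal(0.0, 1.0, size=(200, 3))

weights = fit_meta_predictor(scores, labels)
combined = meta_score(scores, weights)
print(weights)
```

In practice the weights would be fitted on a training set and applied to held-out proteins, as in the cross validation the abstract reports; the sketch fits and scores the same data only to stay short.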
Affiliation(s)
- Sanbo Qin
- Institute of Molecular Biophysics, School of Computational Science, Florida State University, Tallahassee, Florida 32306, USA
|
196
|
Brock K, Talley K, Coley K, Kundrotas P, Alexov E. Optimization of electrostatic interactions in protein-protein complexes. Biophys J 2007; 93:3340-52. [PMID: 17693468 PMCID: PMC2072065 DOI: 10.1529/biophysj.107.112367] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 12/21/2022] Open
Abstract
In this article, we present a statistical analysis of the electrostatic properties of 298 protein-protein complexes and 356 domain-domain structures extracted from the previously developed database of protein complexes (ProtCom, http://www.ces.clemson.edu/compbio/protcom). For each structure in the dataset we calculated the total electrostatic energy of the binding and its two components, Coulombic and reaction field energy. It was found that in a vast majority of the cases (>90%), the total electrostatic component of the binding energy was unfavorable. At the same time, the Coulombic component of the binding energy was found to favor the complex formation while the reaction field component of the binding energy opposed the binding. It was also demonstrated that the components in a wild-type (WT) structure are optimized/anti-optimized with respect to the corresponding distributions arising from random shuffling of the charged side chains. The degree of this optimization was assessed through the Z-score of the WT energy with respect to the random distribution. It was found that the Z-scores of Coulombic interactions peak at a considerably negative value for all 654 cases considered while the Z-score of the reaction field energy varied among different types of complexes. All these findings indicate that the Coulombic interactions within WT protein-protein complexes are optimized to favor the complex formation while the total electrostatic energy predominantly opposes the binding. This observation was used to discriminate WT structures among sets of structural decoys and showed that the electrostatic component of the binding energy is not a good discriminator of the WT, while Coulombic or reaction field energies perform better depending upon the decoy set used.
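The Z-score used here measures how far the wild-type energy lies from the distribution obtained by shuffling charged side chains. A minimal sketch, assuming the usual definition (WT energy minus the mean of the shuffled energies, divided by their standard deviation); the function name is hypothetical, the energies are made-up numbers, and the use of the sample standard deviation is an assumption the abstract does not confirm:

```python
import statistics


def z_score(wt_energy, shuffled_energies):
    """Z-score of the wild-type energy against a shuffled-charge distribution.

    A strongly negative value means the WT energy is more favorable than
    most random arrangements of the charged side chains.
    """
    mean = statistics.fmean(shuffled_energies)
    spread = statistics.stdev(shuffled_energies)  # sample standard deviation
    if spread == 0:
        raise ValueError("shuffled energies have zero spread")
    return (wt_energy - mean) / spread


# Made-up energies in arbitrary units: the shuffled set has mean 0 and
# sample standard deviation 2.
print(z_score(-6.0, [-2.0, 0.0, 2.0]))
```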
Affiliation(s)
- Kelly Brock
- South Carolina Governor School for Science and Mathematics, Hartsville, South Carolina, USA
|
197
|
Kundrotas P, Alexov E. Predicting interacting and interfacial residues using continuous sequence segments. Int J Biol Macromol 2007; 41:615-23. [PMID: 17850859 DOI: 10.1016/j.ijbiomac.2007.08.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/05/2007] [Revised: 07/31/2007] [Accepted: 08/01/2007] [Indexed: 01/07/2023]
Abstract
Development of sequence-based methods for predicting putative interfacial residues is an extremely important task in modeling 3D structures of protein-protein complexes. In the present paper we used non-gapped sequence segments to predict both interacting and interfacial residues. We demonstrated that continuous sequence segments do occur at protein-protein interfaces and showed that continuous interacting interfacial segments (CIIS) of length nine are present, on average, in approximately 37% of the complexes in our dataset. Our results indicate that CIIS consist mostly of interacting strands and/or loops, while CIIS involving helices are scarce. We performed scoring of CIIS using four different scoring mechanisms and found that scores of CIIS differ significantly from the scores calculated for random stretches of residues. We argue that such a statistical difference, inferred through the corresponding Z-scores, could be used for detecting putative interfacial residue segments without using any structural information. This hypothesis was tested on our dataset, and benchmarking resulted in 10-60% prediction accuracy depending on the type of benchmarking and scoring scheme used in the calculations. Such predictions, which do not depend on the availability of the 3D structures of monomers, can be quite valuable in modeling 3D structures of obligatory complexes, for which structures of separated monomers do not exist.
Affiliation(s)
- Petras Kundrotas
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634, United States
|
198
|
Chung JL, Wang W, Bourne PE. High-throughput identification of interacting protein-protein binding sites. BMC Bioinformatics 2007; 8:223. [PMID: 17594507 PMCID: PMC1925121 DOI: 10.1186/1471-2105-8-223] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 12/21/2006] [Accepted: 06/27/2007] [Indexed: 11/23/2022] Open
Abstract
Background With the advent of increasing sequence and structural data, a number of methods have been proposed to locate putative protein binding sites from protein surfaces. Therefore, methods that are able to identify whether these binding sites interact are needed. Results We have developed a new method using a machine learning approach to detect if protein binding sites, once identified, interact with each other. The method exploits information relating to sequence and structural complementarity across protein interfaces and has been tested on a non-redundant data set consisting of 584 homo-dimers and 198 hetero-dimers extracted from the PDB. Results indicate that 87.4% of the interacting binding sites and 68.6% of the non-interacting binding sites were correctly identified. Furthermore, we built a pipeline that links this method to a modified version of our previously developed method that predicts the location of binding sites. Conclusion We have demonstrated that this high-throughput pipeline is capable of identifying binding sites for proteins, their interacting binding sites and, ultimately, their binding partners on a large scale.
Affiliation(s)
- Jo-Lan Chung
- Department of Chemistry and Biochemistry, University of California, San Diego, Gilman Drive, La Jolla, CA 92093-0743, USA
- San Diego Supercomputer Center, University of California, San Diego, Gilman Drive, La Jolla, CA 92093-0743, USA
- Wei Wang
- Department of Chemistry and Biochemistry, University of California, San Diego, Gilman Drive, La Jolla, CA 92093-0743, USA
- Philip E Bourne
- Department of Pharmacology, University of California, San Diego, Gilman Drive, La Jolla, CA 92093-0743, USA
- San Diego Supercomputer Center, University of California, San Diego, Gilman Drive, La Jolla, CA 92093-0743, USA
|
199
|
Abstract
MOTIVATION Proteins function through interactions with other proteins and biomolecules. Protein-protein interfaces hold key information toward molecular understanding of protein function. In the past few years, there have been intensive efforts in developing methods for predicting protein interface residues. A review that presents the current status of interface prediction, gives an overview of its applications and projects future developments is in order. SUMMARY Interface prediction methods rely on a wide range of sequence, structural and physical attributes that distinguish interface residues from non-interface surface residues. The input data are manipulated into either a numerical value or a probability representing the potential for a residue to be inside a protein interface. Predictions are now satisfactory for complex-forming proteins that are well represented in the Protein Data Bank, but less so for under-represented ones. Future developments will be directed at tackling problems such as building structural models for multi-component structural complexes.
Affiliation(s)
- Huan-Xiang Zhou
- Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, USA.
|
200
|
Abstract
The side chains of the 20 types of amino acids, owing to a large extent to their different physical properties, have characteristic distributions in interior/surface regions of individual proteins and in interface/non-interface portions of protein surfaces that bind proteins or nucleic acids. These distributions have important structural and functional implications. We have developed accurate methods for predicting the solvent accessibility of amino acids from a protein sequence and for predicting interface residues from the structure of a protein-binding or DNA-binding protein. The methods are called WESA, cons-PPISP and DISPLAR, respectively. The web servers of these methods are now available at http://pipe.scs.fsu.edu. To illustrate the utility of these web servers, cons-PPISP and DISPLAR predictions are used to construct a structural model for a multicomponent protein–DNA complex.
Affiliation(s)
- Huan-Xiang Zhou
- *To whom correspondence should be addressed. +1 850 645 1336; +1 850 644 7244
|