26
|
Lensink MF, Velankar S, Kryshtafovych A, Huang SY, Schneidman-Duhovny D, Sali A, Segura J, Fernandez-Fuentes N, Viswanath S, Elber R, Grudinin S, Popov P, Neveu E, Lee H, Baek M, Park S, Heo L, Rie Lee G, Seok C, Qin S, Zhou HX, Ritchie DW, Maigret B, Devignes MD, Ghoorah A, Torchala M, Chaleil RAG, Bates PA, Ben-Zeev E, Eisenstein M, Negi SS, Weng Z, Vreven T, Pierce BG, Borrman TM, Yu J, Ochsenbein F, Guerois R, Vangone A, Rodrigues JPGLM, van Zundert G, Nellen M, Xue L, Karaca E, Melquiond ASJ, Visscher K, Kastritis PL, Bonvin AMJJ, Xu X, Qiu L, Yan C, Li J, Ma Z, Cheng J, Zou X, Shen Y, Peterson LX, Kim HR, Roy A, Han X, Esquivel-Rodriguez J, Kihara D, Yu X, Bruce NJ, Fuller JC, Wade RC, Anishchenko I, Kundrotas PJ, Vakser IA, Imai K, Yamada K, Oda T, Nakamura T, Tomii K, Pallara C, Romero-Durana M, Jiménez-García B, Moal IH, Férnandez-Recio J, Joung JY, Kim JY, Joo K, Lee J, Kozakov D, Vajda S, Mottarella S, Hall DR, Beglov D, Mamonov A, Xia B, Bohnuud T, Del Carpio CA, Ichiishi E, Marze N, Kuroda D, Roy Burman SS, Gray JJ, Chermak E, Cavallo L, Oliva R, Tovchigrechko A, Wodak SJ. Prediction of homoprotein and heteroprotein complexes by protein docking and template-based modeling: A CASP-CAPRI experiment. Proteins 2016; 84 Suppl 1:323-48. [PMID: 27122118 PMCID: PMC5030136 DOI: 10.1002/prot.25007] [Citation(s) in RCA: 116] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2015] [Revised: 12/30/2015] [Accepted: 02/02/2016] [Indexed: 12/26/2022]
Abstract
We present the results for CAPRI Round 30, the first joint CASP-CAPRI experiment, which brought together experts from the protein structure prediction and protein-protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact-sized interfaces, represented a confounding factor. For those, a much poorer prediction performance was achieved, while nonetheless often providing helpful clues on the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology-built subunit models and the smaller pair-wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323-348. © 2016 Wiley Periodicals, Inc.
Collapse
|
27
|
Anishchenko I, Badal V, Dauzhenka T, Das M, Tuzikov AV, Kundrotas PJ, Vakser IA. Genome-Wide Structural Modeling of Protein-Protein Interactions. BIOINFORMATICS RESEARCH AND APPLICATIONS 2016. [DOI: 10.1007/978-3-319-38782-6_8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
28
|
Badal VD, Kundrotas PJ, Vakser IA. Text Mining for Protein Docking. PLoS Comput Biol 2015; 11:e1004630. [PMID: 26650466 PMCID: PMC4674139 DOI: 10.1371/journal.pcbi.1004630] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2015] [Accepted: 10/29/2015] [Indexed: 11/18/2022] Open
Abstract
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~ 25% complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. Protein interactions are central for many cellular processes. Physical characterization of these interactions is essential for understanding of life processes and applications in biology and medicine. Because of the inherent limitations of experimental techniques and rapid development of computational power and methodology, computer modeling is a tool of choice in many studies. Publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for modeling of proteins and protein complexes. A major paradigm shift in modeling of protein complexes is emerging due to the rapidly expanding amount of such information, which can be used as modeling constraints. Text mining has been widely used in recreating networks of protein interactions, as well as in detecting small molecule binding sites on proteins. Combining and expanding these two well-developed areas of research, we applied the text mining to physical modeling of protein complexes (protein docking). Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information. The results show that correct information on binding can be obtained for about half of protein complexes. The extracted constraints were incorporated in a modeling procedure, significantly improving its performance.
Collapse
|
29
|
Kirys T, Ruvinsky AM, Singla D, Tuzikov AV, Kundrotas PJ, Vakser IA. Simulated unbound structures for benchmarking of protein docking in the DOCKGROUND resource. BMC Bioinformatics 2015; 16:243. [PMID: 26227548 PMCID: PMC4521349 DOI: 10.1186/s12859-015-0672-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2015] [Accepted: 07/10/2015] [Indexed: 11/10/2022] Open
Abstract
Background Proteins play an important role in biological processes in living organisms. Many protein functions are based on interaction with other proteins. The structural information is important for adequate description of these interactions. Sets of protein structures determined in both bound and unbound states are essential for benchmarking of the docking procedures. However, the number of such proteins in PDB is relatively small. A radical expansion of such sets is possible if the unbound structures are computationally simulated. Results The Dockground public resource provides data to improve our understanding of protein–protein interactions and to assist in the development of better tools for structural modeling of protein complexes, such as docking algorithms and scoring functions. A large set of simulated unbound protein structures was generated from the bound structures. The modeling protocol was based on 1 ns Langevin dynamics simulation. The simulated structures were validated on the ensemble of experimentally determined unbound and bound structures. The set is intended for large scale benchmarking of docking algorithms and scoring functions. Conclusions A radical expansion of the unbound protein docking benchmark set was achieved by simulating the unbound structures. The simulated unbound structures were selected according to criteria from systematic comparison of experimentally determined bound and unbound structures. The set is publicly available at http://dockground.compbio.ku.edu.
Collapse
|
30
|
Anishchenko I, Kundrotas PJ, Tuzikov AV, Vakser IA. Structural templates for comparative protein docking. Proteins 2015; 83:1563-70. [PMID: 25488330 DOI: 10.1002/prot.24736] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2014] [Revised: 11/15/2014] [Accepted: 11/26/2014] [Indexed: 11/07/2022]
Abstract
Structural characterization of protein-protein interactions is important for understanding life processes. Because of the inherent limitations of experimental techniques, such characterization requires computational approaches. Along with the traditional protein-protein docking (free search for a match between two proteins), comparative (template-based) modeling of protein-protein complexes has been gaining popularity. Its development puts an emphasis on full and partial structural similarity between the target protein monomers and the protein-protein complexes previously determined by experimental techniques (templates). The template-based docking relies on the quality and diversity of the template set. We present a carefully curated, nonredundant library of templates containing 4950 full structures of binary complexes and 5936 protein-protein interfaces extracted from the full structures at 12 Å distance cut-off. Redundancy in the libraries was removed by clustering the PDB structures based on structural similarity. The value of the clustering threshold was determined from the analysis of the clusters and the docking performance on a benchmark set. High structural quality of the interfaces in the template and validation sets was achieved by automated procedures and manual curation. The library is included in the Dockground resource for molecular recognition studies at http://dockground.bioinformatics.ku.edu.
Collapse
|
31
|
Anishchenko I, Kundrotas PJ, Tuzikov AV, Vakser IA. Protein models docking benchmark 2. Proteins 2015; 83:891-7. [PMID: 25712716 DOI: 10.1002/prot.24784] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2014] [Revised: 01/30/2015] [Accepted: 02/14/2015] [Indexed: 12/28/2022]
Abstract
Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have predefined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native C(α) RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the "real case scenario," as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu.
Collapse
|
32
|
Lensink MF, Moal IH, Bates PA, Kastritis PL, Melquiond ASJ, Karaca E, Schmitz C, van Dijk M, Bonvin AMJJ, Eisenstein M, Jiménez-García B, Grosdidier S, Solernou A, Pérez-Cano L, Pallara C, Fernández-Recio J, Xu J, Muthu P, Praneeth Kilambi K, Gray JJ, Grudinin S, Derevyanko G, Mitchell JC, Wieting J, Kanamori E, Tsuchiya Y, Murakami Y, Sarmiento J, Standley DM, Shirota M, Kinoshita K, Nakamura H, Chavent M, Ritchie DW, Park H, Ko J, Lee H, Seok C, Shen Y, Kozakov D, Vajda S, Kundrotas PJ, Vakser IA, Pierce BG, Hwang H, Vreven T, Weng Z, Buch I, Farkash E, Wolfson HJ, Zacharias M, Qin S, Zhou HX, Huang SY, Zou X, Wojdyla JA, Kleanthous C, Wodak SJ. Blind prediction of interfacial water positions in CAPRI. Proteins 2014; 82:620-32. [PMID: 24155158 PMCID: PMC4582081 DOI: 10.1002/prot.24439] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2013] [Revised: 09/16/2013] [Accepted: 09/26/2013] [Indexed: 12/30/2022]
Abstract
We report the first assessment of blind predictions of water positions at protein-protein interfaces, performed as part of the critical assessment of predicted interactions (CAPRI) community-wide experiment. Groups submitting docking predictions for the complex of the DNase domain of colicin E2 and Im2 immunity protein (CAPRI Target 47), were invited to predict the positions of interfacial water molecules using the method of their choice. The predictions-20 groups submitted a total of 195 models-were assessed by measuring the recall fraction of water-mediated protein contacts. Of the 176 high- or medium-quality docking models-a very good docking performance per se-only 44% had a recall fraction above 0.3, and a mere 6% above 0.5. The actual water positions were in general predicted to an accuracy level no better than 1.5 Å, and even in good models about half of the contacts represented false positives. This notwithstanding, three hotspot interface water positions were quite well predicted, and so was one of the water positions that is believed to stabilize the loop that confers specificity in these complexes. Overall the best interface water predictions was achieved by groups that also produced high-quality docking models, indicating that accurate modelling of the protein portion is a determinant factor. The use of established molecular mechanics force fields, coupled to sampling and optimization procedures also seemed to confer an advantage. Insights gained from this analysis should help improve the prediction of protein-water interactions and their role in stabilizing protein complexes.
Collapse
|
33
|
Kundrotas PJ, Vakser IA. Global and local structural similarity in protein-protein complexes: implications for template-based docking. Proteins 2013; 81:2137-42. [PMID: 23946125 DOI: 10.1002/prot.24392] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2013] [Revised: 07/23/2013] [Accepted: 08/02/2013] [Indexed: 02/02/2023]
Abstract
The increasing amount of structural information on protein-protein interactions makes it possible to predict the structure of protein-protein complexes by comparison/alignment of the interacting proteins to the ones in cocrystallized complexes. In the predictions based on structure similarity, the template search is performed by structural alignment of the target interactors with the entire structures or with the interface only of the subunits in cocrystallized complexes. This study investigates the scope of the structural similarity that facilitates the detection of a broad range of templates significantly divergent from the targets. The analysis of the target-template similarity is based on models of protein-protein complexes in a large representative set of heterodimers. The similarity of the biological and crystal packing interfaces, dissimilar interface structural motifs in overall similar structures, interface similarity to the full structure, and local similarity away from the interface were analyzed. The structural similarity at the protein-protein interfaces only was observed in ~25% of target-template pairs with sequence identity <20% and primarily homodimeric templates. For ~50% of the target-template pairs, the similarity at the interface was accompanied by the similarity of the whole structure. However, the structural similarity at the interfaces was still stronger than that of the noninterface parts. The study provides insights into structural and functional diversity of protein-protein complexes, and relative performance of the interface and full structure alignment in docking.
Collapse
|
34
|
Anishchenko I, Kundrotas PJ, Tuzikov AV, Vakser IA. Protein models: the Grand Challenge of protein docking. Proteins 2013; 82:278-87. [PMID: 23934791 DOI: 10.1002/prot.24385] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2013] [Revised: 07/16/2013] [Accepted: 07/26/2013] [Indexed: 12/28/2022]
Abstract
Characterization of life processes at the molecular level requires structural details of protein-protein interactions (PPIs). The number of experimentally determined protein structures accounts only for a fraction of known proteins. This gap has to be bridged by modeling, typically using experimentally determined structures as templates to model related proteins. The fraction of experimentally determined PPI structures is even smaller than that for the individual proteins, due to a larger number of interactions than the number of individual proteins, and a greater difficulty of crystallizing protein-protein complexes. The approaches to structural modeling of PPI (docking) often have to rely on modeled structures of the interactors, especially in the case of large PPI networks. Structures of modeled proteins are typically less accurate than the ones determined by X-ray crystallography or nuclear magnetic resonance. Thus the utility of approaches to dock these structures should be assessed by thorough benchmarking, specifically designed for protein models. To be credible, such benchmarking has to be based on carefully curated sets of structures with levels of distortion typical for modeled proteins. This article presents such a suite of models built for the benchmark set of the X-ray structures from the Dockground resource (http://dockground.bioinformatics.ku.edu) by a combination of homology modeling and Nudged Elastic Band method. For each monomer, six models were generated with predefined C(α) root mean square deviation from the native structure (1, 2, …, 6 Å). The sets and the accompanying data provide a comprehensive resource for the development of docking methodology for modeled proteins.
Collapse
|
35
|
Kundrotas PJ, Vakser IA, Janin J. Structural templates for modeling homodimers. Protein Sci 2013; 22:1655-63. [PMID: 23996787 DOI: 10.1002/pro.2361] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2013] [Revised: 08/23/2013] [Accepted: 08/23/2013] [Indexed: 12/17/2022]
Abstract
Oligomeric proteins are more abundant in nature than monomeric proteins, and involved in all biological processes. In the absence of an experimental structure, their subunits can be modeled from their sequence like monomeric proteins, but reliable procedures to build the oligomeric assembly are scarce. Template-based methods, which start from known protein structures, are commonly applied to model subunits. We present a method to model homodimers that relies on a structural alignment of the subunits, and test it on a set of 511 target structures recently released by the Protein Data Bank, taking as templates the earlier released structures of 3108 homodimeric proteins (H-set), and 2691 monomeric proteins that form dimer-like assemblies in crystals (M-set). The structural alignment identifies a H-set template for 97% of the targets, and in half of the cases, it yields a correct model of the dimer geometry and residue-residue contacts in the target. It also identifies a M-set template for most of the targets, and some of the crystal dimers are very similar to the target homodimers. The procedure efficiently detects homology at low levels of sequence identities, and points to erroneous quaternary structures in the Protein Data Bank. The high coverage of the target set suggests that the content of the Protein Data Bank already approaches the structural diversity of protein assemblies in nature, and that template-based methods should become the choice method for modeling oligomeric as well as monomeric proteins.
Collapse
|
36
|
Kundrotas PJ, Vakser IA. Protein-protein alternative binding modes do not overlap. Protein Sci 2013; 22:1141-5. [PMID: 23775945 DOI: 10.1002/pro.2295] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2013] [Revised: 06/01/2013] [Accepted: 06/03/2013] [Indexed: 11/09/2022]
Abstract
Proteins often bind other proteins in more than one way. Thus alternative binding modes is an essential feature of protein interactions. Such binding modes may be detected by X-ray crystallography and thus reflected in Protein Data Bank. The alternative binding is often observed not for the protein itself but for its structural homolog. The results of this study based on the analysis of a comprehensive set of co-crystallized protein-protein complexes show that the alternative binding modes generally do not overlap, but are spatially separated. This effect is based on molecular recognition characteristics of the protein structures. The results are also in excellent agreement with the intermolecular energy funnel size estimates obtained previously by an independent methodology. The results provide an important insight into the principles of protein association, as well as potential guidelines for modeling of protein complexes and the design of protein interfaces.
Collapse
|
37
|
Kundrotas PJ, Zhu Z, Vakser IA. GWIDD: a comprehensive resource for genome-wide structural modeling of protein-protein interactions. Hum Genomics 2012; 6:7. [PMID: 23245398 PMCID: PMC3500202 DOI: 10.1186/1479-7364-6-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2012] [Accepted: 07/11/2012] [Indexed: 11/10/2022] Open
Abstract
Protein-protein interactions are a key component of life processes. The knowledge of the three-dimensional structure of these interactions is important for understanding protein function. Genome-Wide Docking Database (http://gwidd.bioinformatics.ku.edu) offers an extensive source of data for structural studies of protein-protein complexes on genome scale. The current release of the database combines the available experimental data on the structure and characteristics of protein interactions with structural modeling of protein complexes for 771 organisms spanned over the entire universe of life from viruses to humans. The interactions are stored in a relational database with user-friendly interface that includes various search options. The search results can be interactively previewed; the structures, downloaded, along with the interaction characteristics.
Collapse
|
38
|
Sinha R, Kundrotas PJ, Vakser IA. Protein docking by the interface structure similarity: how much structure is needed? PLoS One 2012; 7:e31349. [PMID: 22348074 PMCID: PMC3278447 DOI: 10.1371/journal.pone.0031349] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2011] [Accepted: 01/08/2012] [Indexed: 11/19/2022] Open
Abstract
The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for the success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for the structure alignment-based docking applications. The results showed that structural areas corresponding to the cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 Å, the success rate for the benchmark set of 99 protein complexes, did not increase significantly for higher accuracy models, and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and a likely best choice for the large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for the docking approaches, including high-throughput applications to modeled structures.
Collapse
|
39
|
Sinha R, Kundrotas PJ, Vakser IA. Docking by structural similarity at protein-protein interfaces. Proteins 2011; 78:3235-41. [PMID: 20715056 DOI: 10.1002/prot.22812] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
Rapid accumulation of experimental data on protein-protein complexes drives the paradigm shift in protein docking from "traditional," template free approaches to template based techniques. Homology docking algorithms based on sequence similarity between target and template complexes can account for up to 20% of known protein-protein interactions. When highly homologous templates for the target complex are not available, but the structure of the target monomers is known, docking by local structural alignment may provide an adequate solution. Such an algorithm was developed based on the structural comparison of monomers to cocrystallized interfaces. A library of the interfaces was generated from cocrystallized protein-protein complexes in PDB. The partial structure alignment algorithm was validated on the DOCKGROUND benchmark sets. The optimal performance of the partial (interface) structure alignment was achieved with the interface residues defined by 12 Å distance across the interface. Overall, the partial structure alignment yielded more accurate models than the full structure alignment. Most templates identified by the partial structure alignment had low sequence identity to the target, which makes them hard to detect by sequence-based methods. The results indicate that the structure alignment techniques provide a much needed addition to the docking arsenal, with the combined structure alignment and template free docking success rate significantly surpassing that of the free docking alone.
Collapse
|
40
|
Kundrotas PJ, Anishchenko I, Tuzikov AV, Vakser I. Docking Benchmark Set of Protein Models. Biophys J 2011. [DOI: 10.1016/j.bpj.2010.12.1947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022] Open
|
41
|
Kundrotas PJ, Vakser IA. Accuracy of protein-protein binding sites in high-throughput template-based modeling. PLoS Comput Biol 2010; 6:e1000727. [PMID: 20369011 PMCID: PMC2848539 DOI: 10.1371/journal.pcbi.1000727] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2009] [Accepted: 03/01/2010] [Indexed: 11/18/2022] Open
Abstract
The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as low as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criteria had RMSD<5 Å, the accuracy suitable for low-resolution template free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å<RMSD<10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, ∼50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes. Protein-protein interactions play a central role in life processes at the molecular level. The structural information on these interactions is essential for our understanding of these processes and our ability to design drugs to cure diseases. Limitations of experimental techniques to determine the structure of protein-protein complexes leave the vast majority of these complexes to be determined by computational modeling. The modeling is also important for revealing the mechanisms of the complex formation. The 3D modeling of protein complexes (protein docking) relies on the structure of the individual proteins for the prediction of their assembly. Thus the structural accuracy of the individual proteins, which often are models themselves, is critical for the docking. For the docking purposes, the accuracy of the binding sites is obviously essential, whereas the accuracy of the non-binding regions is less critical. In our study, we systematically analyze the accuracy of the binding sites in protein models produced by high-throughput techniques suitable for large-scale (e.g., genome-wide) studies. The results indicate that this accuracy is adequate for the low- to medium-resolution docking of a significant part of known protein-protein complexes.
Collapse
|
42
|
Abstract
Structural information on interacting proteins is important for understanding life processes at the molecular level. Genome-wide docking database is an integrated resource for structural studies of protein-protein interactions on the genome scale, which combines the available experimental data with models obtained by docking techniques. Current database version (August 2009) contains 25 559 experimental and modeled 3D structures for 771 organisms spanned over the entire universe of life from viruses to humans. Data are organized in a relational database with user-friendly search interface allowing exploration of the database content by a number of parameters. Search results can be interactively previewed and downloaded as PDB-formatted files, along with the information relevant to the specified interactions. The resource is freely available at http://gwidd.bioinformatics.ku.edu.
Collapse
|
43
|
Kundrotas PJ, Lensink MF, Alexov E. Homology-based modeling of 3D structures of protein–protein complexes using alignments of modified sequence profiles. Int J Biol Macromol 2008; 43:198-208. [DOI: 10.1016/j.ijbiomac.2008.05.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2008] [Revised: 05/09/2008] [Accepted: 05/12/2008] [Indexed: 11/25/2022]
|
44
|
Kundrotas PJ, Alexov E. PROTCOM: searchable database of protein complexes enhanced with domain-domain structures. Nucleic Acids Res 2006; 35:D575-9. [PMID: 17071962 PMCID: PMC1635331 DOI: 10.1093/nar/gkl768] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
The database of protein complexes (PROTCOM) is a compilation of known 3D structures of protein–protein complexes enriched with artificially created domain–domain structures using the available entries in the Protein Data Bank. The domain–domain structures are generated by parsing single chain structures into loosely connected domains and are important features of the database. The database () could be used for benchmarking purposes of the docking and other algorithms for predicting 3D structures of protein–protein complexes. The database can be utilized as a template database in the homology or threading methods for modeling the 3D structures of unknown protein–protein complexes. PROTCOM provides the scientific community with an integrated set of tools for browsing, searching, visualizing and downloading a pool of protein complexes. The user is given the option to select a subset of entries using a combination of up to 10 different criteria. As on July 2006 the database contains 1770 entries, each of which consists of the known 3D structures and additional relevant information that can be displayed either in text-only or in visual mode.
Collapse
|
45
|
Kundrotas PJ, Alexov E. Predicting 3D structures of transient protein-protein complexes by homology. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2006; 1764:1498-511. [PMID: 16963323 DOI: 10.1016/j.bbapap.2006.08.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/11/2006] [Revised: 07/27/2006] [Accepted: 08/03/2006] [Indexed: 11/26/2022]
Abstract
The paper reports a homology based approach for predicting the 3D structures of full length hetero protein complexes. We have created a database of templates that includes structures of hetero protein-protein complexes as well as domain-domain structures (), which allowed us to expand the template pool up to 418 two-chain entries (at 40% sequence identity). Two protocols were tested-a protocol based on position specific Blast search (Protocol-I) and a protocol based on structural similarity of monomers (Protocol-II). All possible combinations of two monomers (350,284 pairs) in the ProtCom database were subjected to both protocols to predict if they form complexes. The predictions were benchmarked against the ProtCom database resulting to false-true positives ratios of approximately 5:1 and approximately 7:1 and recovery of 19% and 86%, respectively for protocols I and II. From 350,284 trials Protocol-I made only approximately 500 wrong predictions resulting to 0.5% error. In addition, though it was shown that artificially created domain-domain structures can in principle be good templates for modeling full length protein complexes, more sensitive methods are needed to detect homology relations. The quality of the models was assessed using two different criteria such as interfacial residues and overall RMSD. It was found that there is no correlation between these two measures. In many cases the interface residues were predicted correctly, but the overall RMSD was over 6 A and vice versa.
Collapse
|
46
|
Abstract
Statistical electrostatic analysis of 37 protein-protein complexes extracted from the previously developed database of protein complexes (ProtCom, http://www.ces.clemson.edu/compbio/protcom) is presented. It is shown that small interfaces have a higher content of charged and polar groups compared to large interfaces. In a vast majority of the cases the average pKa shifts for acidic residues induced by the complex formation are negative, indicating that complex formation stabilizes their ionizable states, whereas the histidines are predicted to destabilize the complex. The individual pKa shifts show the same tendency since 80% of the interfacial acidic groups were found to lower their pKas, whereas only 25% of histidines raise their pKa upon the complex formation. The interfacial groups have been divided into three sets according to the mechanism of their pKa shift, and statistical analysis of each set was performed. It was shown that the optimum pH values (pH of maximal stability) of the complex tend to be the same as the optimum pH values of the complex components. This finding can be used in the homology-based prediction of the 3D structures of protein complexes, especially when one needs to evaluate and rank putative models. It is more likely for a model to be correct if both components of the model complex and the entire complex have the same or at least similar values of the optimum pH.
Collapse
|
47
|
Kundrotas PJ. Statistical Studies of Flexible Nonhomogeneous Polypeptide Chains. Biomacromolecules 2005; 6:3010-7. [PMID: 16283721 DOI: 10.1021/bm0503266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Unfolded proteins attract increasing attention nowadays because of the accumulation of experimental evidence that they play an important role in different biological processes. Therefore, studies of various statistical properties of flexible protein-like polypeptide chains are becoming increasingly important as well. This paper presents distributions (histograms) of distances between atoms of titratable residues for flexible polypeptide chains with various residue compositions and with the hard-spheres potential taken into consideration. The factors influencing the parameters of the obtained histograms have been identified and analyzed. It was found that the sensitivity of the distributions with respect to the internal structure of intermediate residues increases with the number of residues between the considered charged residues. It was shown that branching at C(beta) atoms of the side chains of the intermediate residues is among the most considerable factors influencing the shape of the distance distribution and the average distance between atoms in flexible chains. Despite the model simplicity, the results of the calculations can be applied for systems with other types of interactions presented, and this was demonstrated for the charge-charge interactions. In particular, it was shown that those interactions have a significant effect on distances between the unlike charges, while such an effect for the like charges is much less pronounced. The comparison of predictions made on the basis of the presented calculations to some experimental data is also given, and possible applications of the theoretical concept described in the paper are discussed.
Collapse
|
48
|
Kundrotas PJ, Karshikoff A. Charge sequence coding in statistical modeling of unfolded proteins. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2004; 1702:1-8. [PMID: 15450845 DOI: 10.1016/j.bbapap.2004.07.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/22/2004] [Revised: 06/18/2004] [Accepted: 07/01/2004] [Indexed: 11/30/2022]
Abstract
Unfolded proteins recently attracted attention due to accumulation of experimental evidences for their significant role in different life processes. Modeling of electrostatic interactions (EI) in unfolded state of proteins is becoming increasingly important as well. In this paper, we stress on the importance of how the sequence of charged residues of a given protein is incorporated into the models for calculation of EI in the unfolded state. On the basis of the distributions of distances between titratable sites of charged residues calculated for polypeptide chains of various compositions, it was found that the distance distribution for a pair of residues, located close to each other along the sequence of a protein, depends on what residues constitute the pair in question. It was concluded that the consideration of these residue-specific distributions is essential for a statistical model to be accurate from the physical point of view. It was suggested that use of distance intervals in the spherical model of unfolded proteins accounts better for the charge sequence than the set of single distance values. This was illustrated by comparison of the pK values of the titratable groups of the unfolded N-terminal SH3 domain of the Drosophila protein drk to the available experimental data.
Collapse
|
49
|
Kundrotas PJ, Karshikoff A. Effects of charge–charge interactions on dimensions of unfolded proteins: A Monte Carlo study. J Chem Phys 2003. [DOI: 10.1063/1.1588996] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
50
|
Kundrotas PJ, Karshikoff A. Modeling of denatured state for calculation of the electrostatic contribution to protein stability. Protein Sci 2002; 11:1681-6. [PMID: 12070320 PMCID: PMC2373658 DOI: 10.1110/ps.4690102] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
Abstract
Existing models of the denatured state of proteins consider only one possible spatial distribution of protein charges and therefore are applicable to a limited number of cases. In this article, a more general framework for the modeling of the denatured state is proposed. It is based on the assumption that the titratable groups of an unfolded protein can adopt a quasi-random distribution restricted by the protein sequence. The model was applied for the calculations of electrostatic interactions in two proteins, barnase and N-terminal domain of the ribosomal protein L9. The calculated free energy of denaturation, DeltaG(pH), reproduces the experimental data better than the commonly used null approximation (NA). It was shown that the seemingly good agreement with experimental data obtained by NA originates from the compensatory effect between the pairwise electrostatic interactions and the desolvation energy of the individual sites. It was also found that the ionization properties of denatured proteins are influenced by the protein sequence.
Collapse
|