51
Kundrotas P, Alexov E. Predicting interacting and interfacial residues using continuous sequence segments. Int J Biol Macromol 2007; 41:615-23. [PMID: 17850859 DOI: 10.1016/j.ijbiomac.2007.08.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/05/2007] [Revised: 07/31/2007] [Accepted: 08/01/2007] [Indexed: 01/07/2023]
Abstract
Development of sequence-based methods for predicting putative interfacial residues is an extremely important task in modeling 3D structures of protein-protein complexes. In the present paper we used non-gapped sequence segments to predict both interacting and interfacial residues. We demonstrated that continuous sequence segments do occur at protein-protein interfaces and showed that continuous interacting interfacial segments (CIIS) of length nine are present, on average, in approximately 37% of the complexes in our dataset. Our results indicate that CIIS consist mostly of interacting strands and/or loops, while CIIS involving helices are scarce. We scored CIIS using four different scoring mechanisms and found that the scores of CIIS differ significantly from the scores calculated for random stretches of residues. We argue that this statistical difference, inferred through the corresponding Z-scores, could be used for detecting putative interfacial residue segments without using any structural information. This hypothesis was tested on our dataset, and benchmarking resulted in 10-60% prediction accuracy, depending on the type of benchmarking and the scoring scheme used in the calculations. Such predictions, which do not depend on the availability of the 3D structures of monomers, can be quite valuable in modeling 3D structures of obligatory complexes, for which structures of separated monomers do not exist.
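The Z-score idea in this abstract can be illustrated with a minimal sketch. It is not the authors' scoring scheme: the per-residue `propensity` table, the function names, and the way random stretches are drawn (residues sampled from the same sequence) are all assumptions made for illustration; only the window length of nine comes from the abstract.

```python
import random
import statistics


def segment_score(segment, propensity):
    """Sum a hypothetical per-residue propensity over a segment."""
    return sum(propensity.get(aa, 0.0) for aa in segment)


def z_score(value, background):
    """Standard score of `value` against a list of background scores."""
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    # A flat background carries no signal, so report 0 instead of dividing by 0.
    return (value - mu) / sigma if sigma > 0 else 0.0


def scan(sequence, propensity, length=9, n_random=200, seed=0):
    """Return (start, Z-score) for every non-gapped window of `length` residues."""
    rng = random.Random(seed)
    residues = list(sequence)
    # Assumption: a "random stretch" is `length` residues drawn from the sequence.
    background = [
        segment_score(rng.sample(residues, length), propensity)
        for _ in range(n_random)
    ]
    return [
        (i, z_score(segment_score(sequence[i:i + length], propensity), background))
        for i in range(len(sequence) - length + 1)
    ]
```

Windows with a high Z-score would be the candidate interfacial segments under this toy scheme; the sequence must contain at least `length` residues.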
Affiliation(s)
- Petras Kundrotas
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634, United States
52
Zhang S, Jin G, Zhang XS, Chen L. Discovering functions and revealing mechanisms at molecular level from biological networks. Proteomics 2007; 7:2856-69. [PMID: 17703505 DOI: 10.1002/pmic.200700095] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.6] [Indexed: 01/28/2023]
Abstract
With the increasing accumulation of data from high-throughput technologies, the study of biomolecular networks has become one of the key focuses in systems biology and bioinformatics. In particular, various types of molecular networks (e.g., protein-protein interaction (PPI) networks, gene regulatory networks (GRN), metabolic networks (MN), and gene coexpression networks (GCEN)) have been extensively investigated, and those studies demonstrate great potential to discover basic functions and to reveal essential mechanisms for various biological phenomena by understanding biological systems not at the individual component level but at a system-wide level. Recent studies on networks have generated prolific research on many aspects of living organisms. In this paper, we aim to review the recent developments on topics related to molecular networks in a comprehensive manner, with special emphasis on the computational aspect. The survey covers global topological properties and local structural characteristics, network motifs, network comparison and query, detection of functional modules and network motifs, function prediction from network analysis, and inference of molecular networks from biological data, as well as representative databases and software tools.
Affiliation(s)
- Shihua Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
53
Probabilistic prediction and ranking of human protein-protein interactions. BMC Bioinformatics 2007; 8:239. [PMID: 17615067 PMCID: PMC1939716 DOI: 10.1186/1471-2105-8-239] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.9] [Received: 05/10/2007] [Accepted: 07/05/2007] [Indexed: 11/24/2022] [Open Access]
Abstract
Background: Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger proteome in human. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%. Results: The prediction of human protein-protein interactions was investigated by combining orthogonal protein features within a probabilistic framework. The features include co-expression, orthology to known interacting proteins and the full-Bayesian combination of subcellular localization, co-occurrence of domains and post-translational modifications. A novel scoring function for local network topology was also investigated. This topology feature greatly enhanced the predictions and, together with the full-Bayes combined features, made the largest contribution to the predictions. Using a conservative threshold, our most accurate predictor identifies 37606 human interactions, 32892 (80%) of which are not present in other publicly available large human interaction datasets, thus substantially increasing the coverage of the human interaction map. A subset of the 32892 novel predicted interactions has been independently validated. Comparison of the prediction dataset to other available human interaction datasets estimates the false positive rate of the new method to be below 80%, which is competitive with other methods. Since the new method scores and ranks all human protein pairs, smaller subsets of higher quality can be generated, thus leading to even lower false positive prediction rates. Conclusion: The set of interactions predicted in this work increases the coverage of the human interaction map and will help determine the highest confidence human interactions.
54
Kufareva I, Budagyan L, Raush E, Totrov M, Abagyan R. PIER: protein interface recognition for structural proteomics. Proteins 2007; 67:400-17. [PMID: 17299750 DOI: 10.1002/prot.21233] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Indexed: 11/06/2022]
Abstract
Recent advances in structural proteomics call for development of fast and reliable automatic methods for prediction of functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected on protein surface. The common drawback of such methods is their limited applicability to the proteins with a sparse set of sequential homologs, as well as inability to detect interfaces in evolutionary variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, which is based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved the overall precision of 60% at the recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with 25% precision at 50% recall expected from random residue function assignment). For 70% of proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall. The calculation only took seconds for an average 300-residue protein. The authors demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually resulted in a deteriorated prediction. Thorough benchmarking using other datasets from literature showed that PIER yielded improved performance as compared with several alignment-free or alignment-dependent predictions. 
The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
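The residue-level precision and recall figures quoted in this abstract can be made concrete with a short sketch. The function names and the example residue numbers are hypothetical; the random baseline rests on the assumption that residues are labeled uniformly at random, in which case the expected precision equals the fraction of candidate residues that truly belong to the interface.

```python
def precision_recall(predicted, actual):
    """Residue-level precision and recall for one protein.

    `predicted` and `actual` are collections of residue identifiers.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def random_precision(n_interface, n_candidates):
    """Expected precision when residues are labeled uniformly at random."""
    return n_interface / n_candidates


# Hypothetical example: 4 predicted residues, 4 true interface residues, 2 shared.
print(precision_recall({10, 11, 12, 13}, {11, 12, 40, 41}))
# Hypothetical example: 25 interface residues among 100 candidate residues.
print(random_precision(25, 100))
```

Under that assumption, a random baseline of 25% precision would correspond to roughly one in four candidate residues lying in an interface.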
Affiliation(s)
- Irina Kufareva
- Scripps Research Institute, La Jolla, California 92037, USA
55
Jefferson ER, Walsh TP, Roberts TJ, Barton GJ. SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein-Protein Interactions. Nucleic Acids Res 2007; 35:D580-9. [PMID: 17202171 PMCID: PMC1899103 DOI: 10.1093/nar/gkl836] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022] [Open Access]
Abstract
SNAPPI-DB, a high performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) is described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without the requirement for complex SQL statements are provided via an object oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features which are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: .
Affiliation(s)
- Geoffrey J. Barton
- To whom correspondence should be addressed. Tel: +44 01382 385860; Fax: +44 01382 385764;
56
Abstract
Motivation: Observation of co-crystallized protein-protein complexes and low-resolution protein-protein docking studies suggest the existence of a binding-related anisotropic shape characteristic of protein-protein complexes. Results: Our study systematically assessed the global shape of proteins in a non-redundant database of co-crystallized protein-protein complexes by measuring the distance of the surface residues to the protein's center of mass. The results show that on average the binding site residues are closer to the center of mass than the non-binding surface residues. Thus, the study directly detects an important and simple binding-related characteristic of protein shapes. The results provide an insight into one of the fundamental properties of protein structure and association.
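The measurement described in this abstract, the distance of surface residues to the center of mass, can be sketched as follows. This is an illustrative simplification, not the study's protocol: it assumes one coordinate per residue, equal masses for all residues, and that surface residues have already been split into binding and non-binding groups; all function names are hypothetical.

```python
import math


def center_of_mass(coords):
    """Centroid of 3D points, assuming every point has the same mass."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def mean_distance_to_center(coords, center):
    """Average Euclidean distance from each point to `center`."""
    return sum(math.dist(c, center) for c in coords) / len(coords)


def compare_surface_groups(all_coords, binding, non_binding):
    """Mean distance to the center of mass for binding and non-binding residues."""
    center = center_of_mass(all_coords)
    return (
        mean_distance_to_center(binding, center),
        mean_distance_to_center(non_binding, center),
    )
```

A first value smaller than the second would match the trend the abstract reports, namely binding-site residues lying closer to the center of mass on average.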
Affiliation(s)
- George Nicola
- Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Rd, La Jolla, CA 92037, USA
57
Ertekin A, Nussinov R, Haliloglu T. Association of putative concave protein-binding sites with the fluctuation behavior of residues. Protein Sci 2007; 15:2265-77. [PMID: 17008715 PMCID: PMC2242393 DOI: 10.1110/ps.051815006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
Abstract
Here, we propose a binding site prediction method based on the high frequency end of the spectrum in the native state of the protein structural dynamics. The spectrum is obtained using an elastic network model (GNM). High frequency vibrating (HFV) residues are determined from the dynamics of the fastest modes. HFV residue clusters and the associated surface patch residues are tested for their likelihood to be located at the binding interfaces using two different data sets, the Benchmark Set of mainly enzymes and antigen/antibodies and the Cluster Set of more diverse structures. The binding interface is identified to be within 7.5 Å of the HFV residue clusters in the Benchmark Set and Cluster Set for 77% and 70% of the structures, respectively. The success rate increases to 88% and 84%, respectively, by using the surface patches. The results suggest that concave binding interfaces, typically those of enzyme-binding sites, are enriched in HFV residues. Thus, we expect that the association of HFV residues with the interfaces is mostly for enzymes. If, however, a binding region has invaginations and cavities, as in some of the antigen/antibodies and in cases in the Cluster data set, we expect it would be detected there too. This implies that binding sites possess several (inter-related) properties such as cavities, high packing density, conservation, and disposition for hotspots at binding surfaces. It further suggests that the high frequency vibrating residue-based approach is a potential tool for identification of regions likely to serve as protein-binding sites. The software is available at http://www.prc.boun.edu.tr/PRC/software.html.
Affiliation(s)
- Asli Ertekin
- Polymer Research Center and Chemical Engineering Department, Bogazici University, Bebek 34342, Istanbul, Turkey
58
Kundrotas PJ, Alexov E. PROTCOM: searchable database of protein complexes enhanced with domain-domain structures. Nucleic Acids Res 2006; 35:D575-9. [PMID: 17071962 PMCID: PMC1635331 DOI: 10.1093/nar/gkl768] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022] [Open Access]
Abstract
The database of protein complexes (PROTCOM) is a compilation of known 3D structures of protein–protein complexes enriched with artificially created domain–domain structures using the available entries in the Protein Data Bank. The domain–domain structures are generated by parsing single chain structures into loosely connected domains and are important features of the database. The database () could be used for benchmarking docking and other algorithms for predicting 3D structures of protein–protein complexes. The database can also be utilized as a template database in homology or threading methods for modeling the 3D structures of unknown protein–protein complexes. PROTCOM provides the scientific community with an integrated set of tools for browsing, searching, visualizing and downloading a pool of protein complexes. The user is given the option to select a subset of entries using a combination of up to 10 different criteria. As of July 2006 the database contains 1770 entries, each of which consists of the known 3D structures and additional relevant information that can be displayed either in text-only or in visual mode.
Affiliation(s)
- Emil Alexov
- To whom correspondence should be addressed. Tel: +1 864 656 5307; Fax: +1 864 656 0805;
59
Sen TZ, Cheng H, Kloczkowski A, Jernigan RL. A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Sci 2006; 15:2499-506. [PMID: 17001039 PMCID: PMC2242411 DOI: 10.1110/ps.062125306] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]
Abstract
The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may significantly benefit from more accurate secondary structure predictions. Although there are many different secondary structure prediction methods available in the literature, their cross-validated prediction accuracy is generally <80%. In order to increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a certain sequence identity threshold, the FDM method is applied for the prediction. The remainder of the fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that the value 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q(3) ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.
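The selection rule described in this abstract (FDM for fragments above the sequence identity threshold, GOR V for the rest) and the Q3 measure can be sketched as follows. This is a simplified illustration, not the CDM implementation: it assumes non-overlapping fragments that are simply concatenated, and the dictionary keys `identity`, `fdm`, and `gor` are hypothetical names for precomputed per-fragment predictions. The default threshold of 50% is the optimum reported in the abstract.

```python
def consensus_prediction(fragments, threshold=50.0):
    """Pick, per fragment, the FDM or GOR V prediction based on sequence identity.

    Each fragment is a dict with hypothetical keys:
      "identity": percent identity to the best matching PDB fragment,
      "fdm":      secondary structure string predicted by FDM,
      "gor":      secondary structure string predicted by GOR V.
    """
    chosen = []
    for fragment in fragments:
        source = "fdm" if fragment["identity"] > threshold else "gor"
        chosen.append(fragment[source])
    return "".join(chosen)


def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, a fragment with 80% identity would take its labels from `fdm`, while one with 30% identity would fall back to `gor`; `q3` then scores the concatenated string against the observed secondary structure.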
Affiliation(s)
- Taner Z Sen
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-3020, USA.
60
Kundrotas PJ, Alexov E. Predicting 3D structures of transient protein-protein complexes by homology. Biochimica et Biophysica Acta - Proteins and Proteomics 2006; 1764:1498-511. [PMID: 16963323 DOI: 10.1016/j.bbapap.2006.08.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 05/11/2006] [Revised: 07/27/2006] [Accepted: 08/03/2006] [Indexed: 11/26/2022]
Abstract
The paper reports a homology-based approach for predicting the 3D structures of full-length hetero protein complexes. We have created a database of templates that includes structures of hetero protein-protein complexes as well as domain-domain structures (), which allowed us to expand the template pool up to 418 two-chain entries (at 40% sequence identity). Two protocols were tested: a protocol based on position-specific Blast search (Protocol-I) and a protocol based on structural similarity of monomers (Protocol-II). All possible combinations of two monomers (350,284 pairs) in the ProtCom database were subjected to both protocols to predict if they form complexes. The predictions were benchmarked against the ProtCom database, resulting in false-true positives ratios of approximately 5:1 and approximately 7:1 and recovery of 19% and 86% for protocols I and II, respectively. From 350,284 trials Protocol-I made only approximately 500 wrong predictions, resulting in 0.5% error. In addition, though it was shown that artificially created domain-domain structures can in principle be good templates for modeling full-length protein complexes, more sensitive methods are needed to detect homology relations. The quality of the models was assessed using two different criteria: interfacial residues and overall RMSD. It was found that there is no correlation between these two measures. In many cases the interface residues were predicted correctly, but the overall RMSD was over 6 Å, and vice versa.
Affiliation(s)
- Petras J Kundrotas
- Computational Biophysics and Bioinformatics, Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
61
Lise S, Walker-Taylor A, Jones DT. Docking protein domains in contact space. BMC Bioinformatics 2006; 7:310. [PMID: 16790041 PMCID: PMC1559650 DOI: 10.1186/1471-2105-7-310] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/20/2006] [Accepted: 06/21/2006] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Many biological processes involve the physical interaction between protein domains. Understanding these functional associations requires knowledge of the molecular structure. Experimental investigations, though, present considerable difficulties, and there is therefore a need for accurate and reliable computational methods. In this paper we present a novel method that seeks to dock protein domains using a contact map representation. Rather than providing a full three-dimensional model of the complex, the method predicts contacting residues across the interface. We use a scoring function that combines structural, physicochemical and evolutionary information, where each potential residue contact is assigned a value according to the scoring function, and the hypothesis is that the real configuration of contacts is the one that maximizes the score. The search is performed with a simulated annealing algorithm directly in contact space. Results: We have tested the method on interacting domain pairs that are part of the same protein (intra-molecular domains). We show that it correctly predicts some contacts and that predicted residues tend to be significantly closer to each other than other pairs of residues in the same domains. Moreover, we find that predicted contacts can often discriminate the best model (or the native structure, if present) among a set of optimal solutions generated by a standard docking procedure. Conclusion: Contact docking appears feasible and able to complement other computational methods for the prediction of protein-protein interactions. Compared with more standard docking algorithms, it might be more suitable for handling protein conformational changes and for predicting complexes starting from protein models.
Affiliation(s)
- Stefano Lise
- Department of Biochemistry and Molecular Biology, University College London, UK
- David T Jones
- Department of Biochemistry and Molecular Biology, University College London, UK
- Department of Computer Science, University College London, UK
62
Abstract
The understanding of protein-protein interactions is a major goal in the postgenomic era. The prediction of interaction from sequence and the subsequent generation of full-length dimeric models is therefore of great interest especially because the number of structurally characterized protein-protein complexes is sparse. A quality assessment of a benchmark comprised of 170 weakly homologous dimeric target-template pairs is presented. They are predicted in a two-step method, similar to the previously described MULTIPROSPECTOR algorithm: each target sequence is assigned to a monomeric template structure by threading; then, those templates that belong to the same physically interacting dimer template are selected. Additionally we use structural alignments as the "gold standard" to assess the percentage of correctly assigned monomer and dimer templates and to evaluate the threading results with a focus on the quality of the alignments in the interfacial region. This work aims to give a quantitative picture of the quality of dimeric threading. Except for one, all monomer templates are identified correctly, but approximately 40% of the dimer templates are still problematic or incorrect. Preliminary results for three full-length dimeric models generated with the TASSER method show on average a significant improvement of the final model over the initial template.
Affiliation(s)
- Vera Grimm
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York, USA
63
Grigoryan G, Keating AE. Structure-based Prediction of bZIP Partnering Specificity. J Mol Biol 2006; 355:1125-42. [PMID: 16359704 DOI: 10.1016/j.jmb.2005.11.036] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Received: 09/13/2005] [Revised: 11/10/2005] [Accepted: 11/11/2005] [Indexed: 10/25/2022]
Abstract
Predicting protein interaction specificity from sequence is an important goal in computational biology. We present a model for predicting the interaction preferences of coiled-coil peptides derived from bZIP transcription factors that performs very well when tested against experimental protein microarray data. We used only sequence information to build atomic-resolution structures for 1711 dimeric complexes, and evaluated these with a variety of functions based on physics, learned empirical weights or experimental coupling energies. A purely physical model, similar to those used for protein design studies, gave reasonable performance. The results were improved significantly when helix propensities were used in place of a structurally explicit model to represent the unfolded reference state. Further improvement resulted upon accounting for residue-residue interactions in competing states in a generic way. Purely physical structure-based methods had difficulty capturing core interactions accurately, especially those involving polar residues such as asparagine. When these terms were replaced with weights from a machine-learning approach, the resulting model was able to correctly order the stabilities of over 6000 pairs of complexes with greater than 90% accuracy. The final model is physically interpretable, and suggests specific pairs of residues that are important for bZIP interaction specificity. Our results illustrate the power and potential of structural modeling as a method for predicting protein interactions and highlight obstacles that must be overcome to reach quantitative accuracy using a de novo approach. Our method shows unprecedented performance in predicting protein-protein interaction specificity accurately using structural modeling and suggests that predicting coiled-coil interactions generally may be within reach.
64
Kosinski J, Steindorf I, Bujnicki JM, Giron-Monzon L, Friedhoff P. Analysis of the quaternary structure of the MutL C-terminal domain. J Mol Biol 2005; 351:895-909. [PMID: 16024043 DOI: 10.1016/j.jmb.2005.06.044] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 03/24/2005] [Revised: 06/14/2005] [Accepted: 06/17/2005] [Indexed: 11/29/2022]
Abstract
The dimeric DNA mismatch repair protein MutL has a key function in communicating mismatch recognition by MutS to downstream repair processes. Dimerization of MutL is mediated by the C-terminal domain, while activity of the protein is modulated by the ATP-dependent dimerization of the highly conserved N-terminal domain. Recently, a crystal structure analysis of the Escherichia coli MutL C-terminal dimerization domain has been reported and a model for the biological dimer was proposed. In this model, dimerization is mediated by the internal (In) subdomain comprising residues 475-569. Here, we report a computational analysis of all protein interfaces observed in the crystal structure and suggest that the biological dimer interface is formed by a hydrophobic surface patch of the external (Ex) subdomain (residues 432-474 and 570-615). Moreover, sequence analysis revealed that this surface patch is conserved among the MutL proteins. To test this hypothesis, single and double-cysteine variants of MutL were generated and tested for their ability to be cross-linked with chemical cross-linkers of various size. Finally, deletion of the C-terminal residues 605-615 abolished homodimerization. The biochemical data are fully compatible with a revised model for the biological dimer, which has important implications for understanding the heterodimerization of eukaryotic MutL homologues, modeling the MutL holoenzyme and predicting protein-protein interaction sites.
Affiliation(s)
- Jan Kosinski
- Institut für Biochemie FB 08, Justus-Liebig Universität, Giessen D-35392, Germany