101
Basu S, Bhattacharyya D, Banerjee R. Self-complementarity within proteins: bridging the gap between binding and folding. Biophys J 2012; 102:2605-14. [PMID: 22713576 DOI: 10.1016/j.bpj.2012.04.029] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 12/08/2011] [Revised: 03/30/2012] [Accepted: 04/17/2012] [Indexed: 01/09/2023] Open
Abstract
Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable if not better than most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors.
Affiliation(s)
- Sankar Basu
- Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, Kolkata, India
102
Ceres N, Lavery R. Coarse-grain Protein Models. Innovations in Biomolecular Modeling and Simulations 2012. [DOI: 10.1039/9781849735049-00219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/30/2023]
Abstract
Coarse-graining is a powerful approach for modeling biomolecules that, over the last few decades, has been extensively applied to proteins. Coarse-grain models offer access to large systems and to slow processes without becoming computationally unmanageable. In addition, they are very versatile, enabling both the protein representation and the energy function to be adapted to the biological problem in hand. This review concentrates on modeling soluble proteins and their assemblies. It presents an overview of the coarse-grain representations, of the associated interaction potentials, and of the optimization procedures used to define them. It then shows how coarse-grain models have been used to understand processes involving proteins, from their initial folding to their functional properties, their binary interactions, and the assembly of large complexes.
Affiliation(s)
- N. Ceres
- Bases Moléculaires et Structurales des Systèmes Infectieux Université Lyon1/CNRS UMR 5086, IBCP, 7 Passage du Vercors, 69367, Lyon France
- R. Lavery
- Bases Moléculaires et Structurales des Systèmes Infectieux Université Lyon1/CNRS UMR 5086, IBCP, 7 Passage du Vercors, 69367, Lyon France
103
Zimmermann MT, Leelananda SP, Kloczkowski A, Jernigan RL. Combining statistical potentials with dynamics-based entropies improves selection from protein decoys and docking poses. J Phys Chem B 2012; 116:6725-31. [PMID: 22490366 DOI: 10.1021/jp2120143] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Protein structure prediction and protein-protein docking are important and widely used tools, but methods to confidently evaluate the quality of a predicted structure or binding pose have had limited success. Typically, either knowledge-based or physics-based energy functions are employed to evaluate a set of predicted structures (termed "decoys" in structure prediction and "poses" in docking), with the lowest energy structure being assumed to be the one closest to the native state. While successful for many cases, failures are still common. Thus, improvements to structure evaluation methods are essential for future improvements. In this work, we combine multibody statistical potentials with dynamics models, evaluating fluctuation-based entropies that include contributions from the entire structure. This leads to enhanced selection of native-like structures for CASP9 decoys, refined ClusPro docking poses, as well as large sets of docking poses from the Benchmark 3.0 and Dockground data sets. The data used include both bound and unbound docking, and positive results are found for each type. Not only does this method yield improved average results, but for high quality docking poses, we often pick the best pose.
Affiliation(s)
- Michael T Zimmermann
- Bioinformatics and Computational Biology Interdepartmental Graduate Program, Iowa State University, Ames, Iowa 50011, USA
104
Fan H, Periole X, Mark AE. Mimicking the action of folding chaperones by Hamiltonian replica-exchange molecular dynamics simulations: application in the refinement of de novo models. Proteins 2012; 80:1744-54. [PMID: 22411697 DOI: 10.1002/prot.24068] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 12/07/2011] [Revised: 02/11/2012] [Accepted: 03/03/2012] [Indexed: 12/25/2022]
Abstract
The efficiency of using a variant of Hamiltonian replica-exchange molecular dynamics (Chaperone H-replica-exchange molecular dynamics [CH-REMD]) for the refinement of protein structural models generated de novo is investigated. In CH-REMD, the interaction between the protein and its environment, specifically, the electrostatic interaction between the protein and the solvating water, is varied leading to cycles of partial unfolding and refolding mimicking some aspects of folding chaperones. In 10 of the 15 cases examined, the CH-REMD approach sampled structures in which the root-mean-square deviation (RMSD) of secondary structure elements (SSE-RMSD) with respect to the experimental structure was more than 1.0 Å lower than the initial de novo model. In 14 of the 15 cases, the improvement was more than 0.5 Å. The ability of three different statistical potentials to identify near-native conformations was also examined. Little correlation between the SSE-RMSD of the sampled structures with respect to the experimental structure and any of the scoring functions tested was found. The most effective scoring function tested was the DFIRE potential. Using the DFIRE potential, the SSE-RMSD of the best scoring structures was on average 0.3 Å lower than the initial model. Overall the work demonstrates that targeted enhanced-sampling techniques such as CH-REMD can lead to the systematic refinement of protein structural models generated de novo but that improved potentials for the identification of near-native structures are still needed.
Affiliation(s)
- Hao Fan
- Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158-2330, USA
105
Cossio P, Granata D, Laio A, Seno F, Trovato A. A simple and efficient statistical potential for scoring ensembles of protein structures. Sci Rep 2012. [DOI: 10.1038/srep00351] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022] Open
106
Gront D, Kmiecik S, Blaszczyk M, Ekonomiuk D, Koliński A. Optimization of protein models. Wiley Interdisciplinary Reviews: Computational Molecular Science 2012. [DOI: 10.1002/wcms.1090] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 12/25/2022]
Affiliation(s)
- Dominik Gront
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Sebastian Kmiecik
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Maciej Blaszczyk
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Dariusz Ekonomiuk
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Andrzej Koliński
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
107
Zhou H, Skolnick J. GOAP: a generalized orientation-dependent, all-atom statistical potential for protein structure prediction. Biophys J 2012; 101:2043-52. [PMID: 22004759 DOI: 10.1016/j.bpj.2011.09.012] [Citation(s) in RCA: 201] [Impact Index Per Article: 16.8] [Received: 07/15/2011] [Revised: 09/07/2011] [Accepted: 09/09/2011] [Indexed: 12/18/2022] Open
Abstract
An accurate scoring function is a key component for successful protein structure prediction. To address this important unsolved problem, we develop a generalized orientation and distance-dependent all-atom statistical potential. The new statistical potential, generalized orientation-dependent all-atom potential (GOAP), depends on the relative orientation of the planes associated with each heavy atom in interacting pairs. GOAP is a generalization of previous orientation-dependent potentials that consider only representative atoms or blocks of side-chain or polar atoms. GOAP is decomposed into distance- and angle-dependent contributions. The DFIRE distance-scaled finite ideal gas reference state is employed for the distance-dependent component of GOAP. GOAP was tested on 11 commonly used decoy sets containing 278 targets, and recognized 226 native structures as best from the decoys, whereas DFIRE recognized 127 targets. The major improvement comes from decoy sets that have homology-modeled structures that are close to native (all within ∼4.0 Å) or from the ROSETTA ab initio decoy set. For these two kinds of decoys, orientation-independent DFIRE or only side-chain orientation-dependent RWplus performed poorly. Although the OPUS-PSP block-based orientation-dependent, side-chain atom contact potential performs much better (recognizing 196 targets) than DFIRE, RWplus, and dDFIRE, it is still ∼15% worse than GOAP. Thus, GOAP is a promising advance in knowledge-based, all-atom statistical potentials. GOAP is available for download at http://cssb.biology.gatech.edu/GOAP.
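For orientation, the distance-dependent component named in this abstract can be written in the form usually quoted for DFIRE. This is a sketch drawn from the general DFIRE literature, not from the abstract itself; the symbols $N_{\mathrm{obs}}$, $r_{\mathrm{cut}}$, $\Delta r$, $\alpha$ and $\eta$ are introduced here for illustration, and their numerical values should be taken from the DFIRE publication rather than from this note.

```latex
\bar{u}(i,j,r) =
\begin{cases}
-\eta\, R T \,\ln
  \dfrac{N_{\mathrm{obs}}(i,j,r)}
        {\left(\dfrac{r}{r_{\mathrm{cut}}}\right)^{\alpha}
         \dfrac{\Delta r}{\Delta r_{\mathrm{cut}}}\,
         N_{\mathrm{obs}}(i,j,r_{\mathrm{cut}})},
  & r < r_{\mathrm{cut}}, \\[2ex]
0, & r \ge r_{\mathrm{cut}}.
\end{cases}
```

Here $N_{\mathrm{obs}}(i,j,r)$ is the number of atom pairs of types $i$ and $j$ observed in the distance shell at $r$, $\Delta r$ is the shell width, $r_{\mathrm{cut}}$ is the interaction cutoff, $\alpha$ is an empirically fitted exponent, and $\eta$ is a scaling constant. GOAP, as described above, adds an angle-dependent term on top of this distance-dependent part.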
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA
108
Mirzaie M, Sadeghi M. Distance-dependent atomic knowledge-based force in protein fold recognition. Proteins 2012; 80:683-90. [DOI: 10.1002/prot.24011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/13/2011] [Revised: 11/15/2011] [Accepted: 12/06/2011] [Indexed: 11/08/2022]
109
Masso M. Generation of atomic four-body statistical potentials derived from the Delaunay tessellation of protein structures. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2012; 2012:6321-6324. [PMID: 23367374 DOI: 10.1109/embc.2012.6347439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Delaunay tessellation of the atomic coordinates for a crystallographic protein structure yields an aggregate of non-overlapping and space-filling irregular tetrahedral simplices. The vertices of each simplex objectively identify a quadruplet of nearest neighbor atoms in the protein. Here we apply Delaunay tessellation to 1417 high-resolution structures of single chains that share low sequence identity, for the purpose of determining the relative frequencies of occurrence for all possible nearest neighbor atomic quadruplet types. Alternative distributions are explored by varying two fundamental parameters: atomic alphabet selection and cutoff length for admissible simplex edges. The distributions are then converted to four-body potential functions by implementing the inverted Boltzmann principle, which requires calculating the distribution of the reference state. Two alternative definitions for the reference state are presented, which introduces a third parameter, and we derive and compare an array of such potential functions. These knowledge-based statistical potentials based on higher-order interactions complement and generalize the more commonly encountered atom-pair potentials, for which a number of approaches are described in the literature.
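The conversion step in this abstract (relative frequencies of quadruplet types turned into potentials by the inverted Boltzmann principle) can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the pseudocount, the sign convention (negative means more frequent than the reference) and the toy counts are all assumptions made here, and the tessellation itself is taken as already done.

```python
import math
from collections import Counter


def quadruplet_type(atoms):
    """Order-independent label for four nearest-neighbor atom types (hypothetical helper)."""
    return tuple(sorted(atoms))


def inverse_boltzmann(observed, reference, pseudocount=1.0):
    """Score each quadruplet type as -ln(f_obs / f_ref).

    `observed` and `reference` map quadruplet types to counts. The
    pseudocount (an assumption of this sketch) avoids log(0) for types
    missing from either distribution.
    """
    types = set(observed) | set(reference)
    n_obs = sum(observed.values()) + pseudocount * len(types)
    n_ref = sum(reference.values()) + pseudocount * len(types)
    scores = {}
    for t in types:
        f_obs = (observed.get(t, 0) + pseudocount) / n_obs
        f_ref = (reference.get(t, 0) + pseudocount) / n_ref
        scores[t] = -math.log(f_obs / f_ref)
    return scores


# Toy data: simplices as they might come out of a tessellation (made up).
simplices = [("C", "C", "N", "O"), ("O", "C", "N", "C"), ("C", "C", "C", "C")]
observed = Counter(quadruplet_type(q) for q in simplices)
reference = Counter({("C", "C", "N", "O"): 1, ("C", "C", "C", "C"): 2})

scores = inverse_boltzmann(observed, reference)
```

With these toy counts the C/C/N/O quadruplet is over-represented relative to the reference and receives a negative score, while C/C/C/C receives a positive one. The choice of reference state, which the abstract treats as a separate parameter, enters only through the `reference` counts.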
Affiliation(s)
- Majid Masso
- Laboratory for Structural Bioinformatics, School of Systems Biology, George Mason University, Manassas, VA 20110, USA.
110
Li H, Zhou Y. Fold helical proteins by energy minimization in dihedral space and a DFIRE-based statistical energy function. J Bioinform Comput Biol 2011; 3:1151-70. [PMID: 16278952 DOI: 10.1142/s0219720005001430] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/27/2004] [Revised: 04/12/2005] [Accepted: 04/21/2005] [Indexed: 11/18/2022]
Abstract
Statistical energy functions are discrete (or stepwise) energy functions that lack van der Waals repulsion. As a result, they are often applied directly to a given structure (native or decoy) without further energy minimization being performed to the structure. However, the full benefit (or hidden defect) of an energy function cannot be revealed without energy minimization. This paper tests a recently developed, all-atom statistical energy function by energy minimization with a fixed secondary helical structure in dihedral space. This is accomplished by combining the statistical energy function based on a distance-scaled finite ideal-gas reference (DFIRE) state with a simple repulsive interaction and an improper torsion energy function. The energy function was used to minimize 2000 random initial structures of 41 small and medium-sized helical proteins in a dihedral space with a fixed helical region. Results indicate that near-native structures for most studied proteins can be obtained by minimization alone. The average minimum root-mean-squared distance (rmsd) from the native structure for all 41 proteins is 4.1 Å. The energy function (together with a simple clustering of similar structures) also makes a reasonable selection of near-native structures from minimized structures. The average rmsd value and the average rank for the best structure in the top five is 6.8 Å and 2.4, respectively. The accuracy of the structures sampled and the structure selections can be improved significantly with the removal of flexible terminal regions in rmsd calculations and in minimization and with the increase in the number of minimizations. The minimized structures form an excellent decoy set for testing other energy functions because most structures are well-packed with minimum hard-core overlaps with correct hydrophobic/hydrophilic partitioning. They are available online at .
Affiliation(s)
- Hongzhi Li
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, New York 14214, USA.
111
Fan H, Schneidman-Duhovny D, Irwin JJ, Dong G, Shoichet BK, Sali A. Statistical potential for modeling and ranking of protein-ligand interactions. J Chem Inf Model 2011; 51:3078-92. [PMID: 22014038 DOI: 10.1021/ci200377u] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/13/2023]
Abstract
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
Affiliation(s)
- Hao Fan
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA
112
Sakae Y, Okamoto Y. Protein force-field parameters optimized with the Protein Data Bank I: force-field optimizations. Journal of Theoretical & Computational Chemistry 2011. [DOI: 10.1142/s0219633604001082] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
We optimized five existing sets of force-field parameters for protein systems by our recently proposed method. The five force fields are AMBER parm94, AMBER parm96, AMBER parm99, CHARMM version 22, and OPLS-AA. The method consists of minimizing the sum of the square of the force acting on each atom in the proteins with the structures from the Protein Data Bank (PDB). We selected the partial-charge and backbone torsion-energy parameters for this optimization, and 100 molecules from the PDB were used. We gave detailed comparisons of the optimized force fields and found that there is a tendency of convergence towards the same function for the torsion-energy term.
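The objective described in this abstract (the sum of squared forces on each atom, evaluated at PDB structures) can be written compactly. The notation below is introduced here for illustration and is only a sketch of the stated idea; any per-molecule normalization or weighting used in the paper is not given in the abstract and is therefore omitted.

```latex
F(\boldsymbol{\theta}) \;=\; \sum_{m=1}^{N} \sum_{i \in m}
  \left| \mathbf{f}_{i}^{(m)}(\boldsymbol{\theta}) \right|^{2},
\qquad
\mathbf{f}_{i}^{(m)}(\boldsymbol{\theta})
  \;=\; -\,\frac{\partial E_{\mathrm{total}}^{(m)}(\boldsymbol{\theta})}{\partial \mathbf{x}_{i}}
```

Here $m$ runs over the $N$ PDB molecules (100 in this study), $i$ over the atoms of molecule $m$, $\mathbf{x}_i$ is the position of atom $i$ in the deposited structure, and $\boldsymbol{\theta}$ collects the parameters being optimized (partial charges and backbone torsion-energy parameters). Minimizing $F$ with respect to $\boldsymbol{\theta}$ pushes the force field toward treating the experimental structures as stationary points of the energy.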
Affiliation(s)
- Yoshitake Sakae
- Department of Functional Molecular Science, The Graduate University for Advanced Studies, Okazaki, Aichi 444-8585, Japan
- Department of Theoretical Studies, Institute for Molecular Science, Okazaki, Aichi 444-8585, Japan
- Yuko Okamoto
- Department of Functional Molecular Science, The Graduate University for Advanced Studies, Okazaki, Aichi 444-8585, Japan
- Department of Theoretical Studies, Institute for Molecular Science, Okazaki, Aichi 444-8585, Japan
113
Wang Q, Shang Y, Xu D. Improving a consensus approach for protein structure selection by removing redundancy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1708-15. [PMID: 21519117 DOI: 10.1109/tcbb.2011.75] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/30/2023]
Abstract
In protein tertiary structure prediction, a crucial step is to select near-native structures from a large number of predicted structural models. Over the years, extensive research has been conducted for the protein structure selection problem with most approaches focusing on developing more accurate energy or scoring functions. Despite significant advances in this area, the discerning power of current approaches is still unsatisfactory. In this paper, we propose a novel consensus-based algorithm for the selection of predicted protein structures. Given a set of predicted models, our method first removes redundant structures to derive a subset of reference models. Then, a structure is ranked based on its average pairwise similarity to the reference models. Using the CASP8 data set containing a large collection of predicted models for 122 targets, we compared our method with the best CASP8 quality assessment (QA) servers, which are all consensus based, and showed that our QA scores correlate better with the GDT-TSs than those of the CASP8 QA servers. We also compared our method with the state-of-the-art scoring functions and showed its improved performance for near-native model selection. The GDT-TSs of the top models picked by our method are on average more than 8 percent better than the ones selected by the best performing scoring function.
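The two steps named in this abstract (remove redundant structures to obtain reference models, then rank each structure by its average pairwise similarity to those references) can be sketched as below. This is an illustrative sketch only: the greedy filter, the 0.95 threshold, the function names and the toy similarity matrix are assumptions made here, not the published algorithm, and the similarity values stand in for a measure such as GDT-TS scaled to [0, 1].

```python
def select_references(sim, threshold=0.95):
    """Greedy redundancy filter (assumed scheme): keep a model only if its
    similarity to every already-kept model is below `threshold`."""
    refs = []
    for i in range(len(sim)):
        if all(sim[i][j] < threshold for j in refs):
            refs.append(i)
    return refs


def consensus_scores(sim, refs):
    """Score each model by its mean similarity to the reference models,
    skipping the model itself when it is one of the references."""
    scores = []
    for i in range(len(sim)):
        values = [sim[i][j] for j in refs if j != i]
        scores.append(sum(values) / len(values) if values else 0.0)
    return scores


def rank_models(sim, threshold=0.95):
    """Return model indices from best to worst, plus scores and references."""
    refs = select_references(sim, threshold)
    scores = consensus_scores(sim, refs)
    order = sorted(range(len(sim)), key=lambda i: scores[i], reverse=True)
    return order, scores, refs


# Toy symmetric similarity matrix (made up): models 0 and 1 are near-duplicates,
# model 3 is an outlier.
sim = [
    [1.00, 0.98, 0.60, 0.20],
    [0.98, 1.00, 0.60, 0.20],
    [0.60, 0.60, 1.00, 0.30],
    [0.20, 0.20, 0.30, 1.00],
]
order, scores, refs = rank_models(sim)
```

In this toy case the filter keeps models 0, 2 and 3 as references and the outlier ranks last. A near-duplicate of a reference (model 1 here) still scores well against its own representative, so a fuller implementation would need to decide how to treat that case.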
Affiliation(s)
- Qingguo Wang
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO 65211, USA.
114
Choi Y, Deane CM. Predicting antibody complementarity determining region structures without classification. Molecular BioSystems 2011; 7:3327-34. [PMID: 22011953 DOI: 10.1039/c1mb05223c] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 01/12/2023]
Abstract
Antibodies are used extensively in medical and biological research. Their complementarity determining regions (CDRs) define the majority of their antigen binding functionality. CDR structures have been intensively studied and classified (canonical structures). Here we show that CDR structure prediction is no different from the standard loop structure prediction problem and predict them without classification. FREAD, a successful database loop prediction technique, is able to produce accurate predictions for all CDR loops (0.81, 0.42, 0.96, 0.98, 0.88 and 2.25 Å RMSD for CDR-L1 to CDR-H3). In order to overcome the relatively poor predictions of CDR-H3, we developed two variants of FREAD, one focused on sequence similarity (FREAD-S) and another which includes contact information (ConFREAD). Both of the methods improve accuracy for CDR-H3 to 1.34 Å and 1.23 Å respectively. The FREAD variants are also tested on homology models and compared to RosettaAntibody (CDR-H3 prediction on models: 1.98 and 2.62 Å for ConFREAD and RosettaAntibody respectively). CDRs are known to change their structural conformations upon binding the antigen. Traditional CDR classifications are based on sequence similarity and do not account for such environment changes. Using a set of antigen-free and antigen-bound structures, we compared our FREAD variants. ConFREAD which includes contact information successfully discriminates the bound and unbound CDR structures and achieves an accuracy of 1.35 Å for bound structures of CDR-H3.
Affiliation(s)
- Yoonjoo Choi
- Department of Statistics, Oxford University, 1 South Parks Road, Oxford OX1 3TG, UK
115
Zhang J, Wang Q, Vantasin K, Zhang J, He Z, Kosztin I, Shang Y, Xu D. A multilayer evaluation approach for protein structure prediction and model quality assessment. Proteins 2011; 79 Suppl 10:172-84. [PMID: 21997706 DOI: 10.1002/prot.23184] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 04/01/2011] [Revised: 08/26/2011] [Accepted: 09/05/2011] [Indexed: 01/03/2023]
Abstract
Protein tertiary structures are essential for studying functions of proteins at the molecular level. An indispensable approach for protein structure solution is computational prediction. Most protein structure prediction methods generate candidate models first and select the best candidates by model quality assessment (QA). In many cases, good models can be produced, but the QA tools fail to select the best ones from the candidate model pool. Because of incomplete understanding of protein folding, each QA method only reflects partial facets of a structure model and thus has limited discerning power, with no one method consistently outperforming the others. In this article, we developed a set of new QA methods, including two QA methods for evaluating target/template alignments, a molecular dynamics (MD)-based QA method, and three consensus QA methods with selected references to reveal new facets of protein structures complementary to the existing methods. Moreover, the underlying relationships among different QA methods were analyzed and then integrated into a multilayer evaluation approach to guide the model generation and model selection in prediction. All methods are integrated and implemented into an innovative and improved prediction system hereafter referred to as MUFOLD. In CASP8 and CASP9, MUFOLD has demonstrated the proof of the principles in terms of both QA discerning power and structure prediction accuracy.
Affiliation(s)
- Jingfen Zhang
- Department of Computer Science, University of Missouri, Columbia, MO, USA
116
Zhou W, Yan H. Prediction of DNA-binding protein based on statistical and geometric features and support vector machines. Proteome Sci 2011; 9 Suppl 1:S1. [PMID: 22166014 PMCID: PMC3289070 DOI: 10.1186/1477-5956-9-s1-s1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022] Open
Abstract
Background: Previous studies on protein-DNA interaction mostly focused on the bound structure of DNA-binding proteins, but few paid enough attention to the unbound structures. As more new proteins are discovered, it is useful and imperative to develop algorithms for the functional prediction of unbound proteins. In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and extract useful statistical and geometric features, and use structural alignment and support vector machines for the prediction of unbound DNA-binding proteins. Results: The performance of our method is evaluated by discriminating a set of 104 DNA-binding proteins from 401 non-DNA-binding proteins. In the same test, the proposed method outperforms the other method using conditional probability. The results achieved by our proposed method (precision, 83.33%; accuracy, 86.53%; MCC, 0.5368) demonstrate its good performance. Conclusions: In this study we develop an effective method for the prediction of protein-DNA interactions based on statistical and geometric features and support vector machines. Our results show that interface surface features play an important role in protein-DNA interaction. Our technique is able to predict unbound DNA-binding proteins and to discriminate DNA-binding proteins from proteins that bind other molecules.
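The three figures reported in this abstract (precision, accuracy and MCC, the Matthews correlation coefficient) all follow from a binary confusion matrix. The sketch below shows the standard definitions; the function name is hypothetical and the counts in the example are made up for illustration, since the abstract does not report the confusion matrix behind its numbers.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Precision, accuracy and Matthews correlation coefficient from the
    counts of true/false positives and negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "accuracy": accuracy, "mcc": mcc}


# Illustrative counts only (not the paper's data).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC is worth reporting alongside accuracy on a set like the one described (104 positives against 401 negatives) because accuracy alone can look high on imbalanced data even when the minority class is poorly predicted.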
Affiliation(s)
- Weiqiang Zhou
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong.
117
Wang Q, Vantasin K, Xu D, Shang Y. MUFOLD-WQA: A new selective consensus method for quality assessment in protein structure prediction. Proteins 2011; 79 Suppl 10:185-95. [PMID: 21997748 DOI: 10.1002/prot.23185] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 02/02/2011] [Revised: 08/25/2011] [Accepted: 08/27/2011] [Indexed: 11/07/2022]
Abstract
Assessing the quality of predicted models is essential in protein tertiary structure prediction. In past Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments, consensus quality assessment (QA) methods have been shown to be very effective, outperforming single-model methods and other competing approaches by a large margin. In the consensus QA approach, the quality score of a model is typically estimated from its pairwise structural similarity to a set of reference models. In CASP8, the differences among the top QA servers were mostly in the selection of the reference models. In this article, we present a new consensus method "SelCon" based on two key ideas: (1) to adaptively select appropriate reference models based on the attributes of the whole set of predicted models and (2) to weigh different reference models differently, and in particular not to use models that are too similar to or too different from the candidate model as its references. We have developed several reference selection functions in SelCon and obtained improved QA results over existing QA methods in experiments using CASP7 and CASP8 data. In the recently completed CASP9 in 2010, the new method was implemented in our MUFOLD-WQA server. Both the official CASP9 assessment and our in-house evaluation showed that MUFOLD-WQA performed very well and achieved top performances in both the global structure QA and top-model selection categories in CASP9.
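The second idea in this abstract (weight reference models differently, and drop those that are too similar to or too different from the candidate) can be illustrated with a simple band-pass weight. This is a hedged sketch, not one of the SelCon reference selection functions: the 0/1 weight, the `low` and `high` thresholds, the function names and the toy matrix are all assumptions introduced here.

```python
def band_weight(similarity, low=0.3, high=0.95):
    """Hypothetical weight: use a reference only if its similarity to the
    candidate lies inside [low, high]."""
    return 1.0 if low <= similarity <= high else 0.0


def selective_consensus(sim, weight=band_weight):
    """Weighted mean similarity of each model to the other models, with the
    weight of each reference depending on its similarity to the candidate."""
    scores = []
    for i, row in enumerate(sim):
        num = 0.0
        den = 0.0
        for j, s in enumerate(row):
            if j == i:
                continue  # a model is never its own reference
            w = weight(s)
            num += w * s
            den += w
        scores.append(num / den if den > 0 else 0.0)
    return scores


# Toy symmetric similarity matrix (made up): models 0 and 1 are near-duplicates,
# model 3 is far from everything else.
sim = [
    [1.00, 0.98, 0.60, 0.20],
    [0.98, 1.00, 0.60, 0.20],
    [0.60, 0.60, 1.00, 0.30],
    [0.20, 0.20, 0.30, 1.00],
]
scores = selective_consensus(sim)
```

With this weight the near-duplicate pair no longer inflates each other's score: models 0 and 1 are scored only against model 2. Any other weight function with the same signature (for example a smooth taper) can be passed in through the `weight` argument.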
Affiliation(s)
- Qingguo Wang
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
118
Wainreb G, Wolf L, Ashkenazy H, Dehouck Y, Ben-Tal N. Protein stability: a single recorded mutation aids in predicting the effects of other mutations in the same amino acid site. Bioinformatics 2011; 27:3286-92. [PMID: 21998155 PMCID: PMC3223369 DOI: 10.1093/bioinformatics/btr576] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 12/22/2022]
Abstract
Motivation: Accurate prediction of protein stability is important for understanding the molecular underpinnings of diseases and for the design of new proteins. We introduce a novel approach for the prediction of changes in protein stability that arise from a single-site amino acid substitution; the approach uses available data on mutations occurring in the same position and in other positions. Our algorithm, named Pro-Maya (Protein Mutant stAbilitY Analyzer), combines a collaborative filtering baseline model, Random Forests regression and a diverse set of features. Pro-Maya predicts the stability free energy difference of mutant versus wild type, denoted as ΔΔG. Results: We evaluated our algorithm extensively using cross-validation on two previously utilized datasets of single amino acid mutations and a (third) validation set. The results indicate that using known ΔΔG values of mutations at the query position improves the accuracy of ΔΔG predictions for other mutations in that position. The accuracy of our predictions in such cases significantly surpasses that of similar methods, achieving, e.g., a Pearson's correlation coefficient of 0.79 and a root mean square error of 0.96 on the validation set. Because Pro-Maya uses a diverse set of features, including predictions using two other methods, it also performs slightly better than other methods in the absence of additional experimental data on the query positions. Availability: Pro-Maya is freely available via web server at http://bental.tau.ac.il/ProMaya. Contact: nirb@tauex.tau.ac.il; wolf@cs.tau.ac.il. Supplementary Information: Supplementary data are available at Bioinformatics online.
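The two accuracy measures quoted in this abstract, Pearson's correlation coefficient and root mean square error between predicted and experimental ΔΔG values, can be computed as in the sketch below. The function names and the four data points are invented for illustration and have no connection to the Pro-Maya datasets.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up ΔΔG values (same units for both lists).
experimental = [-1.0, 0.5, 2.0, -2.5]
predicted = [-0.5, 0.0, 1.5, -2.0]

r = pearson_r(experimental, predicted)
error = rmse(experimental, predicted)
```

The two measures are complementary: the correlation captures whether predictions follow the trend of the experimental values, while the RMSE captures how far off they are in absolute terms. In the toy data every prediction is off by 0.5, which still leaves the correlation high.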
Affiliation(s)
- Gilad Wainreb
- Department of Biochemistry and Molecular Biology, Tel-Aviv University, Ramat Aviv 69978, Israel
|
119
|
Hu C, Koehl P, Max N. PackHelix: a tool for helix-sheet packing during protein structure prediction. Proteins 2011; 79:2828-43. [PMID: 21905109 PMCID: PMC3172692 DOI: 10.1002/prot.23108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2011] [Revised: 04/18/2011] [Accepted: 05/13/2011] [Indexed: 11/09/2022]
Abstract
The three-dimensional structure of a protein is organized around the packing of its secondary structure elements. Predicting the topology and constructing the geometry of structural motifs involving α-helices and/or β-strands are therefore key steps for accurate prediction of protein structure. While many efforts have focused on how to pack helices and on how to sample exhaustively the topologies and geometries of multiple strands forming a β-sheet in a protein, there has been little progress on generating native-like packings of helices on sheets. We describe a method that can generate the packing of multiple helices on a given β-sheet for αβα sandwich type protein folds. This method mines the results of a statistical analysis of the conformations of αβ(2) motifs in protein structures to provide input values for the geometric attributes of the packing of a helix on a sheet. It then proceeds with a geometric builder that generates multiple arrangements of the helices on the sheet of interest by sampling through these values and performing consistency checks that guarantee proper loop geometry between the helices and the strands, minimal number of collisions between the helices, and proper formation of a hydrophobic core. The method is implemented as a module of ProteinShop. Our results show that it produces structures that are within 4-6 Å RMSD of the native one, regardless of the number of helices that need to be packed, though this number may increase if the protein has several helices between two consecutive strands in the sequence that pack on the sheet formed by these two strands.
Affiliation(s)
- Chengcheng Hu
- Department of Computer Science, University of California, Davis, CA 95616
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616
- Nelson Max
- Department of Computer Science, University of California, Davis, CA 95616
|
120
|
Moughon SE, Samudrala R. LoCo: a novel main chain scoring function for protein structure prediction based on local coordinates. BMC Bioinformatics 2011; 12:368. [PMID: 21920038 PMCID: PMC3184297 DOI: 10.1186/1471-2105-12-368] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/28/2011] [Accepted: 09/15/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Successful protein structure prediction requires accurate low-resolution scoring functions so that protein main chain conformations that are close to the native can be identified. Once that is accomplished, a more detailed and time-consuming treatment to produce all-atom models can be undertaken. The earliest low-resolution scoring used simple distance-based "contact potentials," but more recently, the relative orientations of interacting amino acids have been taken into account to improve performance. RESULTS We developed a new knowledge-based scoring function, LoCo, that locates the interaction partners of each individual residue within a local coordinate system based only on the position of its main chain N, Cα and C atoms. LoCo was trained on a large set of experimentally determined structures and optimized using standard sets of modeled structures, or "decoys." No structure used to train or optimize the function was included among those used to test it. When tested against 29 other published main chain functions on a group of 77 commonly used decoy sets, our function outperformed all others in Cα RMSD rank of the best-scoring decoy, with statistically significant p-values < 0.05 for 26 out of the 29 other functions considered. LoCo is fast, requiring on average less than 6 microseconds per residue for interaction and scoring on commonly-used computer hardware. CONCLUSIONS Our function demonstrates an unmatched combination of accuracy, speed, and simplicity and shows excellent promise for protein structure prediction. Broader applications may include protein-protein interactions and protein design.
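The idea of placing interaction partners in a per-residue local coordinate system defined by the main chain N, Cα and C atoms can be sketched as follows. The axis convention used here (origin at Cα, x along Cα→C, z normal to the N-Cα-C plane) is an assumption made for illustration; the abstract does not state which convention LoCo uses, and the function names are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return (a[0] / norm, a[1] / norm, a[2] / norm)

def local_frame(n, ca, c):
    """Right-handed orthonormal axes built from one residue's N, CA and C atoms."""
    x = _unit(_sub(c, ca))              # x axis points from CA to C
    z = _unit(_cross(x, _sub(n, ca)))   # z axis is normal to the N-CA-C plane
    y = _cross(z, x)                    # y axis completes the right-handed frame
    return x, y, z

def to_local(point, ca, frame):
    """Express a global point in the residue's local frame (origin at CA)."""
    d = _sub(point, ca)
    return tuple(_dot(d, axis) for axis in frame)

# Idealised toy coordinates, not taken from a real structure.
frame = local_frame(n=(0.0, 1.0, 0.0), ca=(0.0, 0.0, 0.0), c=(1.0, 0.0, 0.0))
print(to_local((2.0, 3.0, 4.0), (0.0, 0.0, 0.0), frame))
```

Because the frame moves with the residue, the local coordinates of a partner are unchanged by a rigid translation or rotation of the whole structure, which is what makes such a description usable as a scoring feature.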
Affiliation(s)
- Stewart E Moughon
- Department of Microbiology, University of Washington, Box 357735, Seattle, Washington 98195-7242, USA.
|
121
|
Lertkiatmongkol P, Jenwitheesuk E, Rongnoparut P. Homology modeling of mosquito cytochrome P450 enzymes involved in pyrethroid metabolism: insights into differences in substrate selectivity. BMC Res Notes 2011; 4:321. [PMID: 21892968 PMCID: PMC3228512 DOI: 10.1186/1756-0500-4-321] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 04/13/2011] [Accepted: 09/06/2011] [Indexed: 11/29/2022] Open
Abstract
Background Cytochrome P450 enzymes (P450s) have been implicated in insecticide resistance. Anopheles minimus mosquito P450 isoforms CYP6AA3 and CYP6P7 are capable of metabolizing pyrethroid insecticides; however, CYP6P8 lacks activity against this class of compounds. Findings Homology models of the three An. minimus P450 enzymes were constructed using the multiple template alignment method. The predicted model structures were compared with one another and used for molecular docking with insecticides, and the docking results were compared with those of in vitro enzymatic assays. The three model structures comprise common P450 folds, but differences in the geometry of their active-site cavities and substrate access channels are prominent. The CYP6AA3 model has a large active site, allowing it to accommodate multiple conformations of pyrethroids. The predicted CYP6P7 active site is more constrained and less accessible to binding of pyrethroids. Moreover, the predicted hydrophobic interface in the active-site cavities of CYP6AA3 and CYP6P7 may contribute to their substrate selectivity. The absence of CYP6P8 activity toward pyrethroids appears to be due to its small substrate access channel and the presence of R114 and R216, which may prevent access of pyrethroids to the enzyme heme center. Conclusions Differences in active site topologies among the CYP6AA3, CYP6P7, and CYP6P8 enzymes may impact substrate binding and selectivity. Information obtained using homology models has the potential to enhance the understanding of pyrethroid metabolism and detoxification mediated by P450 enzymes.
Affiliation(s)
- Panida Lertkiatmongkol
- Department of Biochemistry, Faculty of Science, Mahidol University, Phayatai, Bangkok 10400, Thailand.
|
122
|
Shi X, Zhang J, He Z, Shang Y, Xu D. A sampling-based method for ranking protein structural models by integrating multiple scores and features. Curr Protein Pept Sci 2011; 12:540-8. [PMID: 21787308 PMCID: PMC4368063 DOI: 10.2174/138920311796957658] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/01/2011] [Revised: 04/01/2011] [Accepted: 05/04/2011] [Indexed: 11/22/2022]
Abstract
One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained on different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score, are synthesized by a sampling method. Finally, another integrated RBF model ranks the structural models according to the features of the sampling distribution. We tested the proposed method on two different datasets: the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test results show that our method outperforms any individual scoring function in both best-model selection and overall correlation between the predicted ranking and the actual ranking of structural quality.
Affiliation(s)
- Xiaohu Shi
- College of Computer Science and Technology, Jilin University, Jilin, Changchun 130012, China
|
123
|
Liu S, Vakser IA. DECK: Distance and environment-dependent, coarse-grained, knowledge-based potentials for protein-protein docking. BMC Bioinformatics 2011; 12:280. [PMID: 21745398 PMCID: PMC3145612 DOI: 10.1186/1471-2105-12-280] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 03/18/2011] [Accepted: 07/11/2011] [Indexed: 11/13/2022] Open
Abstract
Background Computational approaches to protein-protein docking typically include scoring aimed at improving the rank of the near-native structure relative to the false-positive matches. Knowledge-based potentials improve modeling of protein complexes by taking advantage of the rapidly increasing amount of experimentally derived information on protein-protein association. An essential element of knowledge-based potentials is defining the reference state for an optimal description of the residue-residue (or atom-atom) pairs in the non-interaction state. Results The study presents a new Distance- and Environment-dependent, Coarse-grained, Knowledge-based (DECK) potential for scoring of protein-protein docking predictions. Training sets of protein-protein matches were generated based on bound and unbound forms of proteins taken from the DOCKGROUND resource. Each residue was represented by a pseudo-atom in the geometric center of the side chain. To capture the long-range and the multi-body interactions, residues in different secondary structure elements at protein-protein interfaces were considered as different residue types. Five reference states for the potentials were defined and tested. The optimal reference state was selected and the cutoff effect on the distance-dependent potentials investigated. The potentials were validated on the docking decoys sets, showing better performance than the existing potentials used in scoring of protein-protein docking results. Conclusions A novel residue-based statistical potential for protein-protein docking was developed and validated on docking decoy sets. The results show that the scoring function DECK can successfully identify near-native protein-protein matches and thus is useful in protein docking. In addition to the practical application of the potentials, the study provides insights into the relative utility of the reference states, the scope of the distance dependence, and the coarse-graining of the potentials.
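The role of the reference state in a distance-dependent knowledge-based potential can be illustrated with the generic inverse-Boltzmann form, E = -ln(f_obs / f_ref) in units of kT. This is a sketch of the general technique only: the abstract does not give the DECK functional form or its selected reference state, and the function name, the pseudocount and the bin counts below are hypothetical.

```python
import math

def statistical_potential(observed, reference, pseudocount=1e-6):
    """Per-distance-bin energies (in kT) from observed and reference pair counts.

    A bin that is populated more often than the reference state predicts
    gets a negative (favourable) energy; a depleted bin gets a positive one.
    """
    if len(observed) != len(reference) or not observed:
        raise ValueError("count lists must be non-empty and of equal length")
    total_obs = sum(observed)
    total_ref = sum(reference)
    energies = []
    for n_obs, n_ref in zip(observed, reference):
        # The pseudocount keeps the logarithm finite for empty bins.
        f_obs = n_obs / total_obs + pseudocount
        f_ref = n_ref / total_ref + pseudocount
        energies.append(-math.log(f_obs / f_ref))
    return energies

# Made-up counts for one residue-type pair over four distance bins.
print(statistical_potential([5, 40, 30, 25], [25, 25, 25, 25]))
```

Changing the `reference` counts while keeping `observed` fixed shifts every energy, which is why the choice of reference state is treated as a central design decision for such potentials.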
Affiliation(s)
- Shiyong Liu
- Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
|
124
|
Free energies for coarse-grained proteins by integrating multibody statistical contact potentials with entropies from elastic network models. J Struct Funct Genomics 2011; 12:137-47. [PMID: 21674234 DOI: 10.1007/s10969-011-9113-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/18/2010] [Accepted: 05/26/2011] [Indexed: 01/02/2023]
Abstract
We propose a novel method for calculating free energy for coarse-grained models of proteins by combining our newly developed multibody potentials with entropies computed from elastic network models of proteins. Multibody potentials have been of much interest recently because they take into account three-dimensional interactions related to residue packing and capture the cooperativity of these interactions in protein structures. Combining four-body non-sequential, four-body sequential and pairwise short-range potentials with optimized weights for each term, our coarse-grained potential improved recognition of native structure among misfolded decoys, outperforming all other contact potentials on CASP8 decoy sets and performing comparably to the fully atomic empirical DFIRE potentials. By combining statistical contact potentials with entropies from elastic network models of the same structures, we can compute free energy changes and improve coarse-grained modeling of protein structure and dynamics. The consideration of protein flexibility and dynamics should improve protein structure prediction and refinement of computational models. This work is the first to combine coarse-grained multibody potentials with an entropic model that takes into account contributions of the entire structure, investigating native-like decoy selection.
|
125
|
Ghasemi JB, Salahinejad M, Rofouei MK. Review of the quantitative structure–activity relationship modelling methods on estimation of formation constants of macrocyclic compounds with different guest molecules. Supramol Chem 2011. [DOI: 10.1080/10610278.2011.581281] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 10/18/2022]
Affiliation(s)
- J. B. Ghasemi
- Chemistry Department, Faculty of Sciences, K. N. Toosi University of Technology, Tehran, Iran
- M. Salahinejad
- Faculty of Chemistry, Tarbiat Moalem University, Tehran, Iran
- M. K. Rofouei
- Faculty of Chemistry, Tarbiat Moalem University, Tehran, Iran
|
126
|
Bernauer J, Huang X, Sim AYL, Levitt M. Fully differentiable coarse-grained and all-atom knowledge-based potentials for RNA structure evaluation. RNA 2011; 17:1066-1075. [PMID: 21521828 PMCID: PMC3096039 DOI: 10.1261/rna.2543711] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Received: 11/15/2010] [Accepted: 03/01/2011] [Indexed: 05/27/2023]
Abstract
RNA molecules play integral roles in gene regulation, and understanding their structures gives us important insights into their biological functions. Despite recent developments in template-based and parameterized energy functions, the structure of RNA, in particular the nonhelical regions, is still difficult to predict. Knowledge-based potentials have proven efficient in protein structure prediction. In this work, we describe two differentiable knowledge-based potentials derived from a curated data set of RNA structures, with all-atom or coarse-grained representation, respectively. We focus on one aspect of the prediction problem: the identification of native-like RNA conformations from a set of near-native models. Using a variety of near-native RNA models generated from three independent methods, we show that our potential is able to distinguish the native structure and identify native-like conformations, even at the coarse-grained level. The all-atom version of our knowledge-based potential performs better and appears to be more effective at discriminating near-native RNA conformations than one of the most highly regarded parameterized potentials. The fully differentiable form of our potentials will additionally likely be useful for structure refinement and/or molecular dynamics simulations.
Affiliation(s)
- Julie Bernauer
- INRIA AMIB Bioinformatique, Laboratoire d'Informatique (LIX), Ecole Polytechnique, 91128 Palaiseau, France.
|
127
|
Tian L, Wu A, Cao Y, Dong X, Hu Y, Jiang T. NCACO-score: an effective main-chain dependent scoring function for structure modeling. BMC Bioinformatics 2011; 12:208. [PMID: 21612673 PMCID: PMC3123610 DOI: 10.1186/1471-2105-12-208] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/20/2011] [Accepted: 05/26/2011] [Indexed: 11/10/2022] Open
Abstract
Background Development of effective scoring functions is a critical component to the success of protein structure modeling. Previously, many efforts have been dedicated to the development of scoring functions. Despite these efforts, development of an effective scoring function that can achieve both good accuracy and fast speed still presents a grand challenge. Results Based on a coarse-grained representation of a protein structure by using only four main-chain atoms: N, Cα, C and O, we develop a knowledge-based scoring function, called NCACO-score, that integrates different structural information to rapidly model protein structure from sequence. In testing on the Decoys'R'Us sets, we found that NCACO-score can effectively recognize native conformers from their decoys. Furthermore, we demonstrate that NCACO-score can effectively guide fragment assembly for protein structure prediction, which has achieved a good performance in building the structure models for hard targets from CASP8 in terms of both accuracy and speed. Conclusions Although NCACO-score is developed based on a coarse-grained model, it is able to discriminate native conformers from decoy conformers with high accuracy. NCACO is a very effective scoring function for structure modeling.
Affiliation(s)
- Liqing Tian
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
|
128
|
Cunningham ML, Horst JA, Rieder MJ, Hing AV, Stanaway IB, Park SS, Samudrala R, Speltz ML. IGF1R variants associated with isolated single suture craniosynostosis. Am J Med Genet A 2011; 155A:91-7. [PMID: 21204214 DOI: 10.1002/ajmg.a.33781] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 11/11/2022]
Abstract
The genetic contribution to the pathogenesis of isolated single suture craniosynostosis is poorly understood. The role of mutations in genes known to be associated with syndromic synostosis appears to be limited. We present our findings of a candidate gene resequencing approach to identify rare variants associated with the most common forms of isolated craniosynostosis. Resequencing of the coding regions, splice junction sites, and 5' and 3' untranslated regions of 27 candidate genes in 186 cases of isolated non-syndromic single suture synostosis revealed three novel and two rare sequence variants (R406H, R595H, N857S, P190S, M446V) in insulin-like growth factor I receptor (IGF1R) that are enriched relative to control samples. Mapping the resultant amino acid changes to the modeled homodimer protein structure suggests a structural basis for segregation between these and other disease-associated mutations found in IGF1R. These data suggest that IGF1R mutations may contribute to the risk and in some cases cause single suture craniosynostosis.
Affiliation(s)
- Michael L Cunningham
- Seattle Children's Hospital Craniofacial Center, University of Washington, 98195, USA.
|
129
|
Liang S, Zhang C, Standley DM. Protein loop selection using orientation-dependent force fields derived by parameter optimization. Proteins 2011; 79:2260-7. [DOI: 10.1002/prot.23051] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/27/2011] [Revised: 03/21/2011] [Accepted: 03/31/2011] [Indexed: 12/25/2022]
|
130
|
Sun W, He J. From isotropic to anisotropic side chain representations: comparison of three models for residue contact estimation. PLoS One 2011; 6:e19238. [PMID: 21552527 PMCID: PMC3084275 DOI: 10.1371/journal.pone.0019238] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/10/2010] [Accepted: 03/29/2011] [Indexed: 11/19/2022] Open
Abstract
The criterion used to determine residue contact is a fundamental problem in deriving knowledge-based mean-force potential energy calculations for protein structures. A frequently used criterion is to require the side chain center-to-center distance or the Cα-to-Cα atom distance to be within a pre-determined cutoff distance. However, the spatially anisotropic nature of the side chain makes it challenging to identify the contact pairs. This study compares three side chain contact models: the Atom Distance Criteria (ADC) model, the Isotropic Sphere Side chain (ISS) model and the Anisotropic Ellipsoid Side chain (AES) model, using 424 high-resolution protein structures in the Protein Data Bank. The results indicate that the ADC model is the most accurate and ISS is the worst. The AES model eliminates about 95% of the incorrectly counted contact pairs in the ISS model. Algorithm analysis shows that the AES model is the most computationally intensive, while the ADC model has moderate computational cost. We derived a dataset of the contact pairs mis-estimated by the AES model. The most misjudged pairs are Arg-Glu, Arg-Asp and Arg-Tyr. Such a dataset can be useful for developing an improved AES model by incorporating pair-specific information for the cutoff distance.
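The simplest criterion described in this abstract, a center-to-center distance within a pre-determined cutoff, can be sketched as below. The function name, the cutoff value and the coordinates are assumptions chosen for illustration; the sketch corresponds to an isotropic distance test only, not to the ellipsoid-based AES model.

```python
import math
from itertools import combinations

def contacts_by_cutoff(centers, cutoff):
    """Return index pairs (i, j), i < j, whose centres lie within `cutoff`.

    `centers` is a list of (x, y, z) points, one per residue, for example
    side chain centroids or C-alpha positions.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(centers), 2):
        if math.dist(a, b) <= cutoff:
            pairs.append((i, j))
    return pairs

# Toy coordinates in angstroms and a hypothetical 6.5 angstrom cutoff.
centers = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(contacts_by_cutoff(centers, cutoff=6.5))
```

A single distance threshold treats every side chain as a sphere, which is the source of the over-counted contacts that an anisotropic representation aims to remove.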
Affiliation(s)
- Weitao Sun
- Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing, China.
|
131
|
Gniewek P, Leelananda SP, Kolinski A, Jernigan RL, Kloczkowski A. Multibody coarse-grained potentials for native structure recognition and quality assessment of protein models. Proteins 2011; 79:1923-9. [PMID: 21560165 DOI: 10.1002/prot.23015] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 09/27/2010] [Revised: 01/07/2011] [Accepted: 01/28/2011] [Indexed: 01/02/2023]
Abstract
Multibody potentials have been of much interest recently because they take into account three-dimensional interactions related to residue packing and capture the cooperativity of these interactions in protein structures. Our goal was to combine long-range multibody potentials and short-range potentials to improve recognition of native structure among misfolded decoys. We optimized the weights for four-body nonsequential, four-body sequential, and short-range potentials to obtain optimal model ranking results for threading and have compared these data against results obtained with other potentials (26 different coarse-grained potentials from the Potentials 'R' Us web server were used). Our optimized multibody potentials outperform all other contact potentials in the recognition of the native structure among decoys, both for models from homology template-based modeling and from template-free modeling in CASP8 decoy sets. We have compared the results obtained for these optimized coarse-grained potentials, where each residue is represented by a single point, with results obtained by using the DFIRE potential, which takes into account atomic-level information of proteins. We found that for all proteins larger than 80 amino acids, our optimized coarse-grained potentials yield results comparable to those obtained with the atomic DFIRE potential.
Affiliation(s)
- Pawel Gniewek
- Faculty of Chemistry, University of Warsaw, Warsaw, Poland
|
132
|
Shirota M, Ishida T, Kinoshita K. Absolute quality evaluation of protein model structures using statistical potentials with respect to the native and reference states. Proteins 2011; 79:1550-63. [PMID: 21365682 DOI: 10.1002/prot.22982] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/15/2010] [Revised: 11/19/2010] [Accepted: 12/19/2010] [Indexed: 11/06/2022]
Abstract
In protein structure prediction, it is crucial to evaluate the degree of native-likeness of given model structures. Statistical potentials extracted from protein structure data sets are widely used for such quality assessment problems, but they are only applicable for comparing different models of the same protein. Although various other methods, such as machine learning approaches, were developed to predict the absolute similarity of model structures to the native ones, they required a set of decoy structures in addition to the model structures. In this paper, we tried to reformulate the statistical potentials as absolute quality scores, without using the information from decoy structures. For this purpose, we regarded the native state and the reference state, which are necessary components of statistical potentials, as the good and bad standard states, respectively, and first showed that the statistical potentials can be regarded as the state functions, which relate a model structure to the native and reference states. Then, we proposed a standardized measure of protein structure, called native-likeness, by interpolating the score of a model structure between the native and reference state scores defined for each protein. The native-likeness correlated with the similarity to the native structures and discriminated the native structures from the models, with better accuracy than the raw score. Our results show that statistical potentials can quantify the native-like properties of protein structures, if they fully utilize the statistical information obtained from the data set.
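The interpolation idea in this abstract, placing a model's score between the native-state and reference-state scores, can be sketched as a simple linear rescaling. The linear form, the function name and the example scores are assumptions made for illustration; the abstract does not give the exact definition of the native-likeness measure.

```python
def native_likeness(model_score, native_score, reference_score):
    """Rescale a raw score so that 1.0 means native-like and 0.0 reference-like.

    Assumes a linear interpolation between the two standard states; the
    native and reference scores are taken to be defined per protein.
    """
    span = reference_score - native_score
    if span == 0:
        raise ValueError("native and reference scores must differ")
    return (reference_score - model_score) / span

# Made-up scores for one protein, where lower raw scores are better.
print(native_likeness(model_score=-50.0, native_score=-100.0, reference_score=0.0))
```

Because the result is normalised by protein-specific end points, values for models of different proteins become comparable, which raw statistical-potential scores are not.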
Affiliation(s)
- Matsuyuki Shirota
- Department of Applied Information Sciences, Graduate School of Information Science, Tohoku University, 6-3-09, Aoba, Aramaki, Aoba-Ku, Sendai, Miyagi 980-8579, Japan
|
133
|
Dong Q, Zhou S. Novel nonlinear knowledge-based mean force potentials based on machine learning. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:476-486. [PMID: 20820079 DOI: 10.1109/tcbb.2010.86] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Current knowledge-based potentials are almost all in the form of a weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of relative frequencies of interaction pairs against a reference state or linear classifiers. The support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force Boltzmann-based or linear potentials are introduced and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys 'R' Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures.
Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
Affiliation(s)
- Qiwen Dong
- Shanghai Key Lab of Intelligent Information Processing and the School of Computer Science, Fudan University, Old Yifu Building, Room 202-5, 220 Handan Road, Shanghai 200433, China.
|
134
|
Adamczak R, Pillardy J, Vallat BK, Meller J. Fast geometric consensus approach for protein model quality assessment. J Comput Biol 2011; 18:1807-18. [PMID: 21244273 DOI: 10.1089/cmb.2010.0170] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Model quality assessment (MQA) is an integral part of protein structure prediction methods that typically generate multiple candidate models. The challenge lies in ranking and selecting the best models using a variety of physical, knowledge-based, and geometric consensus (GC)-based scoring functions. In particular, 3D-Jury and related GC methods assume that well-predicted (sub-)structures are more likely to occur frequently in a population of candidate models, compared to incorrectly folded fragments. While this approach is very successful in the context of diversified sets of models, identifying similar substructures is computationally expensive since all pairs of models need to be superimposed using MaxSub or related heuristics for structure-to-structure alignment. Here, we consider a fast alternative, in which structural similarity is assessed using 1D profiles, e.g., consisting of relative solvent accessibilities and secondary structures of equivalent amino acid residues in the respective models. We show that the new approach, dubbed 1D-Jury, allows N models to be implicitly compared and ranked in O(N) time, as opposed to the quadratic complexity of 3D-Jury and related clustering-based methods. In addition, 1D-Jury avoids computationally expensive 3D superposition of pairs of models. At the same time, structural similarity scores based on 1D profiles are shown to correlate strongly with those obtained using MaxSub. In terms of the ability to select the best models as top candidates, 1D-Jury performs on par with other GC methods. Other potential applications of the new approach, including fast clustering of large numbers of intermediate structures generated by folding simulations, are discussed as well.
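One way a consensus score over 1D profiles can be obtained in time linear in the number of models is sketched below. It is a simplified illustration, not the published 1D-Jury scoring: each model is reduced to a string of per-residue states (for example secondary structure letters), similarity is the fraction of positions with identical states, and the function name and sample profiles are hypothetical. The trick is to count states per position once and then look each model up in those counts, instead of comparing every pair of models.

```python
from collections import Counter

def consensus_scores(profiles):
    """Average similarity of each profile to all profiles (itself included).

    Runs in O(N * L) for N profiles of length L, because per-position state
    counts replace the O(N^2) pairwise comparisons.
    """
    if not profiles:
        return []
    n = len(profiles)
    length = len(profiles[0])
    if length == 0 or any(len(p) != length for p in profiles):
        raise ValueError("profiles must be non-empty and of equal length")
    # One pass over all models: how often each state occurs at each position.
    column_counts = [Counter(p[i] for p in profiles) for i in range(length)]
    # Second pass: a model agrees with column_counts[i][state] models at position i.
    return [
        sum(column_counts[i][state] for i, state in enumerate(p)) / (n * length)
        for p in profiles
    ]

# Toy secondary structure strings (H = helix, E = strand, C = coil).
profiles = ["HHEC", "HHEC", "HCEC", "EEEE"]
scores = consensus_scores(profiles)
print(scores.index(max(scores)), scores)
```

The result is identical to averaging the pairwise identity over all model pairs, so the ranking is the same as a quadratic-time consensus under this similarity measure.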
Affiliation(s)
- Rafal Adamczak
- Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, Ohio 45226, USA
|
135
|
Abstract
Loop modeling is crucial for high-quality homology model construction outside conserved secondary structure elements. Dozens of loop modeling protocols involving a range of database and ab initio search algorithms and a variety of scoring functions have been proposed. Knowledge-based loop modeling methods are very fast and some can successfully and reliably predict loops up to about eight residues long. Several recent ab initio loop simulation methods can be used to construct accurate models of loops up to 12-13 residues long, albeit at a substantial computational cost. Major current challenges are the simulations of loops longer than 12-13 residues, the modeling of multiple interacting flexible loops, and the sensitivity of the loop predictions to the accuracy of the loop environment.
|
136
|
Zhou Y, Duan Y, Yang Y, Faraggi E, Lei H. Trends in template/fragment-free protein structure prediction. Theor Chem Acc 2011; 128:3-16. [PMID: 21423322 PMCID: PMC3030773 DOI: 10.1007/s00214-010-0799-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 05/17/2010] [Accepted: 08/15/2010] [Indexed: 12/13/2022]
Abstract
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. Lacking progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.
Affiliation(s)
- Yaoqi Zhou
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Yong Duan
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- College of Physics, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074 Wuhan, China
- Yuedong Yang
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Eshel Faraggi
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Hongxing Lei
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
|
137
|
Benkert P, Biasini M, Schwede T. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 2010; 27:343-50. [PMID: 21134891 PMCID: PMC3031035 DOI: 10.1093/bioinformatics/btq662] [Citation(s) in RCA: 1517] [Impact Index Per Article: 108.4] [Indexed: 11/21/2022]
Abstract
Motivation: Quality assessment of protein structures is an important part of experimental structure validation and plays a crucial role in protein structure prediction, where the predicted models may contain substantial errors. Most current scoring functions are primarily designed to rank alternative models of the same sequence supporting model selection, whereas the prediction of the absolute quality of an individual protein model has received little attention in the field. However, reliable absolute quality estimates are crucial to assess the suitability of a model for specific biomedical applications. Results: In this work, we present a new absolute measure for the quality of protein models, which provides an estimate of the ‘degree of nativeness’ of the structural features observed in a model and describes the likelihood that a given model is of comparable quality to experimental structures. Model quality estimates based on the QMEAN scoring function were normalized with respect to the number of interactions. The resulting scoring function is independent of the size of the protein and may therefore be used to assess both monomers and entire oligomeric assemblies. Model quality scores for individual models are then expressed as ‘Z-scores’ in comparison to scores obtained for high-resolution crystal structures. We demonstrate the ability of the newly introduced QMEAN Z-score to detect experimentally solved protein structures containing significant errors, as well as to evaluate theoretical protein models. In a comprehensive QMEAN Z-score analysis of all experimental structures in the PDB, membrane proteins accumulate on one side of the score spectrum and thermostable proteins on the other. Proteins from the thermophilic organism Thermotoga maritima received significantly higher QMEAN Z-scores in a pairwise comparison with their homologous mesophilic counterparts, underlining the significance of the QMEAN Z-score as an estimate of protein stability.
Availability: The Z-score calculation has been integrated in the QMEAN server available at: http://swissmodel.expasy.org/qmean. Contact: torsten.schwede@unibas.ch. Supplementary information: Supplementary data are available at Bioinformatics online.
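The Z-score step described in the abstract amounts to placing one model's normalized score on the scale of a reference distribution. The sketch below assumes the reference scores (for high-resolution crystal structures) are already available as a plain list; the function name `z_score` is hypothetical, and the QMEAN normalization itself is not reproduced.

```python
from statistics import mean, stdev


def z_score(model_score, reference_scores):
    """Express a model's score in standard deviations from the reference mean.

    reference_scores: scores of reference structures (hypothetical input;
    in the abstract these come from high-resolution crystal structures).
    """
    if len(reference_scores) < 2:
        raise ValueError("need at least two reference scores")
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference scores have zero spread")
    return (model_score - mu) / sigma
```

A value near zero would indicate a model that scores like a typical reference structure; strongly negative values would flag models that score much worse than the reference set, assuming higher scores are better.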
|
138
|
Hamelryck T, Borg M, Paluszewski M, Paulsen J, Frellsen J, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J. Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS One 2010; 5:e13714. [PMID: 21103041 PMCID: PMC2978081 DOI: 10.1371/journal.pone.0013714] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 07/07/2010] [Accepted: 10/04/2010] [Indexed: 11/26/2022] Open
Abstract
Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances – so-called “potentials of mean force” (PMFs) – have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state – a necessary component of these potentials – is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities “reference ratio distributions” deriving from the application of the “reference ratio method.” This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
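The combination rule behind the reference ratio method can be written compactly. The notation here is an assumption introduced for illustration, not taken from the abstract: x is a detailed structure, y = f(x) is a coarse-grained feature such as the radius of gyration, q(x) is a model of local structure, q(y) is the distribution of the feature implied by q, and p(y) is the desired distribution of the feature.

```latex
% Illustrative notation: x = full structure, y = f(x) = coarse-grained feature.
p(x) = \frac{p(y)}{q(y)}\, q(x), \qquad y = f(x)

% Taking the negative logarithm gives a PMF-like term in which q(y)
% plays the role of the reference state:
-\ln p(x) = -\ln \frac{p(y)}{q(y)} - \ln q(x)
```

Read this way, the reference state is fixed by the model q rather than chosen from external physical arguments, which matches the abstract's statement that it is uniquely defined.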
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (TH); (JFB)
- Mikael Borg
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Martin Paluszewski
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jonas Paulsen
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jes Frellsen
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Christian Andreetta
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Sandro Bottaro
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Jesper Ferkinghoff-Borg
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- * E-mail: (TH); (JFB)
|
139
|
Zhang J, Zhang Y. A novel side-chain orientation dependent potential derived from random-walk reference state for protein fold selection and structure prediction. PLoS One 2010; 5:e15386. [PMID: 21060880 PMCID: PMC2965178 DOI: 10.1371/journal.pone.0015386] [Citation(s) in RCA: 171] [Impact Index Per Article: 12.2] [Received: 08/05/2010] [Accepted: 09/01/2010] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND An accurate potential function is essential to attack protein folding and structure prediction problems. The key to developing efficient knowledge-based potential functions is to design reference states that can appropriately counteract generic interactions. However, the reference states of many knowledge-based distance-dependent atomic potential functions were derived from non-interacting particles such as an ideal gas, which ignores the inherent sequence connectivity and entropic elasticity of proteins. METHODOLOGY We developed a new pair-wise distance-dependent, atomic statistical potential function (RW), using an ideal random-walk chain as reference state, which was optimized on CASP models and then benchmarked on nine structural decoy sets. Second, we incorporated a new side-chain orientation-dependent energy term into RW (RWplus) and found that the side-chain packing orientation specificity can further improve the decoy recognition ability of the statistical potential. SIGNIFICANCE RW and RWplus demonstrate a significantly better ability than the best performing pair-wise distance-dependent atomic potential functions in both native and near-native model selections. They have higher energy-RMSD and energy-TM-score correlations compared with other potentials of the same type in real-life structure assembly decoys. When benchmarked with a comprehensive list of publicly available potentials, RW and RWplus show comparable performance to the state-of-the-art scoring functions, including those combining terms from multiple resources. These data demonstrate the usefulness of the random-walk chain as a reference state, which correctly accounts for the sequence connectivity and entropic elasticity of proteins. They show potential usefulness in structure recognition and protein folding simulations. The RW and RWplus potentials, as well as the newly generated I-TASSER decoys, are freely available at http://zhanglab.ccmb.med.umich.edu/RW.
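The role of the reference state can be made concrete with a generic inverse-Boltzmann sketch. This is not the RW potential: the expected counts are a caller-supplied placeholder, the random-walk reference itself is not reproduced, and the function name, `kT` default, and pseudocount are assumptions made for illustration.

```python
import math


def statistical_potential(observed, expected, kT=1.0, pseudocount=1e-6):
    """Per-bin energies from the inverse-Boltzmann relation E = -kT * ln(obs / exp).

    observed: pair counts per distance bin taken from known structures.
    expected: counts per bin under the chosen reference state (hypothetical
              input; the random-walk reference of RW is not reproduced here).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    total_obs = sum(observed)
    total_exp = sum(expected)
    if total_obs <= 0 or total_exp <= 0:
        raise ValueError("counts must sum to a positive value")

    energies = []
    for obs, exp in zip(observed, expected):
        # Normalize counts to frequencies; the pseudocount avoids log(0).
        p_obs = obs / total_obs + pseudocount
        p_exp = exp / total_exp + pseudocount
        energies.append(-kT * math.log(p_obs / p_exp))
    return energies
```

Bins that occur more often than the reference predicts receive negative (favorable) energies, so changing the reference state changes every energy in the table, which is why its choice matters.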
Affiliation(s)
- Jian Zhang
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Yang Zhang
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
|
140
|
|
141
|
Tian Y, Deutsch C, Krishnamoorthy B. Scoring function to predict solubility mutagenesis. Algorithms Mol Biol 2010; 5:33. [PMID: 20929563 PMCID: PMC2958853 DOI: 10.1186/1748-7188-5-33] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 06/01/2010] [Accepted: 10/07/2010] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable to make the choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention. RESULTS We use concepts from computational geometry to define a three body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross validations (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%. AVAILABILITY Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.
Affiliation(s)
- Ye Tian
- Department of Mathematics, Washington State University, Pullman, WA 99164, USA
- Bala Krishnamoorthy
- Department of Mathematics, Washington State University, Pullman, WA 99164, USA
|
142
|
Zhou W, Yan H. A discriminatory function for prediction of protein-DNA interactions based on alpha shape modeling. Bioinformatics 2010; 26:2541-8. [DOI: 10.1093/bioinformatics/btq478] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022] Open
|
143
|
Dibble CF, Horst JA, Malone MH, Park K, Temple B, Cheeseman H, Barbaro JR, Johnson GL, Bencharit S. Defining the functional domain of programmed cell death 10 through its interactions with phosphatidylinositol-3,4,5-trisphosphate. PLoS One 2010; 5:e11740. [PMID: 20668527 PMCID: PMC2909203 DOI: 10.1371/journal.pone.0011740] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 04/30/2010] [Accepted: 07/01/2010] [Indexed: 11/25/2022] Open
Abstract
Cerebral cavernous malformations (CCM) are vascular abnormalities of the central nervous system predisposing blood vessels to leakage, leading to hemorrhagic stroke. Three genes, Krit1 (CCM1), OSM (CCM2), and PDCD10 (CCM3) are involved in CCM development. PDCD10 binds specifically to PtdIns(3,4,5)P3 and OSM. Using threading analysis and multi-template modeling, we constructed a three-dimensional model of PDCD10. PDCD10 appears to be a six-helical-bundle protein formed by two heptad-repeat-hairpin structures (α1–3 and α4–6) sharing the closest 3D homology with the bacterial phosphate transporter, PhoU. We identified a stretch of five lysines forming an amphipathic helix, a potential PtdIns(3,4,5)P3 binding site, in the α5 helix. We generated a recombinant wild-type (WT) and three PDCD10 mutants that have two (Δ2KA), three (Δ3KA), and five (Δ5KA) K to A mutations. Δ2KA and Δ3KA mutants hypothetically lack binding residues to PtdIns(3,4,5)P3 at the beginning and the end of predicted helix, while Δ5KA completely lacks all predicted binding residues. The WT, Δ2KA, and Δ3KA mutants maintain their binding to PtdIns(3,4,5)P3. Only the Δ5KA abolishes binding to PtdIns(3,4,5)P3. Both Δ5KA and WT show similar secondary and tertiary structures; however, Δ5KA does not bind to OSM. When WT and Δ5KA are co-expressed with membrane-bound constitutively-active PI3 kinase (p110-CAAX), the majority of the WT is co-localized with p110-CAAX at the plasma membrane where PtdIns(3,4,5)P3 is presumably abundant. In contrast, the Δ5KA remains in the cytoplasm and is not present in the plasma membrane. Combining computational modeling and biological data, we propose that the CCM protein complex functions in the PI3K signaling pathway through the interaction between PDCD10 and PtdIns(3,4,5)P3.
Affiliation(s)
- Christopher F. Dibble
- Department of Pharmacology, School of Medicine, and the Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Jeremy A. Horst
- Department of Microbiology, School of Medicine, and Department of Oral Biology, School of Dentistry, University of Washington, Seattle, Washington, United States of America
- Michael H. Malone
- Department of Pharmacology, School of Medicine, and the Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Kun Park
- Department of Prosthodontics and the Dental Research Center, School of Dentistry, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Brenda Temple
- Department of Pharmacology, School of Medicine, and the Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Holly Cheeseman
- Department of Prosthodontics and the Dental Research Center, School of Dentistry, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Justin R. Barbaro
- Department of Prosthodontics and the Dental Research Center, School of Dentistry, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Gary L. Johnson
- Department of Pharmacology, School of Medicine, and the Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Sompop Bencharit
- Department of Pharmacology, School of Medicine, and the Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Prosthodontics and the Dental Research Center, School of Dentistry, University of North Carolina, Chapel Hill, North Carolina, United States of America
|
144
|
Potapov V, Cohen M, Inbar Y, Schreiber G. Protein structure modelling and evaluation based on a 4-distance description of side-chain interactions. BMC Bioinformatics 2010; 11:374. [PMID: 20624289 PMCID: PMC2912888 DOI: 10.1186/1471-2105-11-374] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 08/21/2009] [Accepted: 07/12/2010] [Indexed: 11/11/2022] Open
Abstract
Background Accurate evaluation and modelling of residue-residue interactions within and between proteins is a key aspect of computational structure prediction including homology modelling, protein-protein docking, refinement of low-resolution structures, and computational protein design. Results Here we introduce a method for accurate protein structure modelling and evaluation based on a novel 4-distance description of residue-residue interaction geometry. Statistical 4-distance preferences were extracted from high-resolution protein structures and were used as a basis for a knowledge-based potential, called Hunter. We demonstrate that the 4-distance description of side chain interactions can be used reliably to discriminate the native structure from a set of decoys. Hunter ranked the native structure as the top one in 217 out of 220 high-resolution decoy sets, in 25 out of 28 "Decoys 'R' Us" decoy sets and in 24 out of 27 high-resolution CASP7/8 decoy sets. The same concept was applied to side chain modelling in protein structures. On a set of very high-resolution protein structures the average RMSD was 1.47 Å for all residues and 0.73 Å for buried residues, which is in the range of attainable accuracy for a model. Finally, we show that Hunter performs as well as or better than other top methods in homology modelling based on results from the CASP7 experiment. The supporting web site http://bioinfo.weizmann.ac.il/hunter/ was developed to enable the use of Hunter and for visualization and interactive exploration of 4-distance distributions. Conclusions Our results suggest that Hunter can be used as a tool for evaluation and for accurate modelling of residue-residue interactions in protein structures. The same methodology is applicable to other areas involving high-resolution modelling of biomolecules.
Affiliation(s)
- Vladimir Potapov
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
|
145
|
Costin JM, Jenwitheesuk E, Lok SM, Hunsperger E, Conrads KA, Fontaine KA, Rees CR, Rossmann MG, Isern S, Samudrala R, Michael SF. Structural optimization and de novo design of dengue virus entry inhibitory peptides. PLoS Negl Trop Dis 2010; 4:e721. [PMID: 20582308 PMCID: PMC2889824 DOI: 10.1371/journal.pntd.0000721] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Received: 09/11/2009] [Accepted: 04/29/2010] [Indexed: 01/15/2023] Open
Abstract
Viral fusogenic envelope proteins are important targets for the development of inhibitors of viral entry. We report an approach for the computational design of peptide inhibitors of the dengue 2 virus (DENV-2) envelope (E) protein using high-resolution structural data from a pre-entry dimeric form of the protein. By using predictive strategies together with computational optimization of binding "pseudoenergies", we were able to design multiple peptide sequences that showed low micromolar viral entry inhibitory activity. The two most active peptides, DN57opt and 1OAN1, were designed to displace regions in the domain II hinge, and the first domain I/domain II beta sheet connection, respectively, and show fifty percent inhibitory concentrations of 8 and 7 μM, respectively, in a focus forming unit assay. The antiviral peptides were shown to interfere with virus:cell binding, interact directly with the E proteins and also cause changes to the viral surface using biolayer interferometry and cryo-electron microscopy, respectively. These peptides may be useful for characterization of intermediate states in the membrane fusion process, investigation of DENV receptor molecules, and as lead compounds for drug discovery.
Affiliation(s)
- Joshua M. Costin
- Department of Biological Sciences, Florida Gulf Coast University, Fort Myers, Florida, United States of America
- Ekachai Jenwitheesuk
- Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- Shee-Mei Lok
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Elizabeth Hunsperger
- Dengue Branch, Division of Vector-Borne Infectious Diseases, Centers for Disease Control and Prevention, San Juan, Puerto Rico
- Kelly A. Conrads
- FortéBio, Incorporated, Menlo Park, California, United States of America
- Krystal A. Fontaine
- Department of Biological Sciences, Florida Gulf Coast University, Fort Myers, Florida, United States of America
- Craig R. Rees
- Department of Biological Sciences, Florida Gulf Coast University, Fort Myers, Florida, United States of America
- Michael G. Rossmann
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Sharon Isern
- Department of Biological Sciences, Florida Gulf Coast University, Fort Myers, Florida, United States of America
- Ram Samudrala
- Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- Scott F. Michael
- Department of Biological Sciences, Florida Gulf Coast University, Fort Myers, Florida, United States of America
|
146
|
Choi Y, Deane CM. FREAD revisited: Accurate loop structure prediction using a database search algorithm. Proteins 2010; 78:1431-40. [PMID: 20034110 DOI: 10.1002/prot.22658] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.6] [Indexed: 12/25/2022]
Abstract
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity as quantified by environment-specific substitution scores can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these; it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole structure quality did not affect the quality of loop predictions.
Affiliation(s)
- Yoonjoo Choi
- Department of Statistics, Oxford University, United Kingdom.
|
147
|
Distance-dependent statistical potentials for discriminating thermophilic and mesophilic proteins. Biochem Biophys Res Commun 2010; 396:736-41. [PMID: 20451495 DOI: 10.1016/j.bbrc.2010.05.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/01/2010] [Accepted: 05/02/2010] [Indexed: 11/22/2022]
Abstract
Identification of the characteristic structural patterns responsible for protein thermostability is theoretically important and practically useful but largely remains an open problem. These patterns may be revealed through comparative study of thermophilic and mesophilic proteins that have distinct thermostability. In this study, we constructed several distance-dependent potentials from thermophilic and mesophilic proteins. These potentials were then used to evaluate the structural difference between thermophilic and mesophilic proteins. We found that using the subtraction or division of the potentials derived from thermophilic and mesophilic proteins can dramatically increase the discriminatory ability. This approach revealed that the ability to distinguish the subtle structural features responsible for protein thermostability may be effectively enhanced through rationally designed comparative study.
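The subtraction variant mentioned in the abstract can be sketched as scoring a structure with the difference of two per-bin energy tables. Everything concrete here is an assumption for illustration: the function name, the representation of a structure as a list of distance-bin indices, and the sign convention are hypothetical, and the study's actual potentials are not reproduced.

```python
def difference_score(pair_bins, e_thermo, e_meso):
    """Sum a difference potential over the distance bins seen in one structure.

    pair_bins: distance-bin index for each residue pair in the structure.
    e_thermo, e_meso: per-bin energies derived from thermophilic and mesophilic
                      proteins (hypothetical tables supplied by the caller).
    """
    if len(e_thermo) != len(e_meso):
        raise ValueError("both potentials must use the same binning")
    # Terms shared by both potentials cancel; only the contrast remains.
    delta = [t - m for t, m in zip(e_thermo, e_meso)]
    return sum(delta[b] for b in pair_bins)
```

Under this convention a negative total means the structure's contacts are scored more favorably by the thermophile-derived table than by the mesophile-derived one.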
|
148
|
Zhang J, Wang Q, Barz B, He Z, Kosztin I, Shang Y, Xu D. MUFOLD: A new solution for protein 3D structure prediction. Proteins 2010; 78:1137-52. [PMID: 19927325 DOI: 10.1002/prot.22634] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 11/10/2022]
Abstract
There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from the Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse-grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using interresidue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full-atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root-mean-square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than many other tools, such as Rosetta. MUFOLD demonstrated some success in the 2008 community-wide experiment for protein structure prediction CASP8.
Affiliation(s)
- Jingfen Zhang
- Department of Computer Science, University of Missouri, Columbia, Missouri 65211, USA
|
149
|
Bordner AJ. Orientation-dependent backbone-only residue pair scoring functions for fixed backbone protein design. BMC Bioinformatics 2010; 11:192. [PMID: 20398384 PMCID: PMC2874805 DOI: 10.1186/1471-2105-11-192] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/30/2009] [Accepted: 04/16/2010] [Indexed: 11/24/2022] Open
Abstract
Background: Empirical scoring functions have proven useful in protein structure modeling. Most such scoring functions depend on protein side chain conformations. However, backbone-only scoring functions do not require computationally intensive structure optimization and so are well suited to protein design, which requires fast score evaluation. Furthermore, scoring functions that account for the distinctive relative position and orientation preferences of residue pairs are expected to be more accurate than those that depend only on the separation distance. Results: Residue pair scoring functions for fixed backbone protein design were derived using only backbone geometry. Unlike previous studies that used spherical harmonics to fit 2D angular distributions, Gaussian Mixture Models were used to fit the full 3D (position only) and 6D (position and orientation) distributions of residue pairs. The performances of the 1D (residue separation only), 3D, and 6D scoring functions were compared by their ability to identify correct threading solutions for a non-redundant benchmark set of protein backbone structures. The threading accuracy was found to steadily increase with increasing dimension, with the 6D scoring function achieving the highest accuracy. Furthermore, the 3D and 6D scoring functions were shown to outperform side chain-dependent empirical potentials from three other studies. Next, two computational methods that take advantage of the speed and pairwise form of these new backbone-only scoring functions were investigated. The first is a procedure that exploits available sequence data by averaging scores over threading solutions for homologs. This was evaluated by applying it to the challenging problem of identifying interacting transmembrane alpha-helices and was found to further improve prediction accuracy.
The second is a protein design method for determining the optimal sequence for a backbone structure by applying Belief Propagation optimization using the 6D scoring functions. The sensitivity of this method to backbone structure perturbations was compared with that of fixed-backbone all-atom modeling by determining the similarities between optimal sequences for two different backbone structures within the same protein family. The results showed that the design method using 6D scoring functions was more robust to small variations in backbone structure than the all-atom design method. Conclusions: Backbone-only residue pair scoring functions that account for all six relative degrees of freedom are the most accurate, and including the scores of homologs further improves the accuracy in threading applications. The 6D scoring function outperformed several side chain-dependent potentials while avoiding time-consuming and error-prone side chain structure prediction. These scoring functions are particularly useful as an initial filter in protein design problems before applying all-atom modeling.
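The distinction between 1D, 3D, and 6D residue-pair descriptors can be made concrete with a small sketch that derives all three from backbone atoms alone. The local frame convention used here (x along CA→C, z normal to the N-CA-C plane) and the function names are assumptions chosen for illustration; the abstract does not specify the coordinate system used in the paper, and the Gaussian Mixture Model fitting itself is not shown.

```python
import numpy as np

def backbone_frame(n, ca, c):
    """Right-handed orthonormal frame from one residue's N, CA, C atoms.
    Rows are the x, y, z axes expressed in global coordinates."""
    x = (c - ca) / np.linalg.norm(c - ca)
    z = np.cross(x, n - ca)
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    return np.stack([x, y, z])

def relative_pose(res_i, res_j):
    """Position and orientation of residue j as seen from residue i's frame.
    Each residue is a (3, 3) array whose rows are the N, CA, C coordinates."""
    frame_i = backbone_frame(*res_i)
    frame_j = backbone_frame(*res_j)
    position = frame_i @ (res_j[1] - res_i[1])  # 3 translational degrees of freedom
    rotation = frame_i @ frame_j.T              # rotation matrix: 3 more degrees of freedom
    return position, rotation

# Hypothetical backbone coordinates (in Å) for two residues.
res_i = np.array([[-0.5, 1.3, 0.0], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
res_j = np.array([[4.0, 3.2, 1.0], [5.0, 2.5, 1.8], [6.2, 3.3, 1.5]])

position, rotation = relative_pose(res_i, res_j)
separation = np.linalg.norm(position)  # the 1D descriptor: CA-CA distance only
```

Here `separation` corresponds to the 1D case, `position` to the 3D case, and `position` together with `rotation` to the 6D case. Both quantities are unchanged when the whole structure is rotated or translated, which is what makes them usable as inputs to a pair distribution.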
|
150
|
Rykunov D, Fiser A. New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics 2010; 11:128. [PMID: 20226048 PMCID: PMC2853469 DOI: 10.1186/1471-2105-11-128] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 10/06/2009] [Accepted: 03/12/2010] [Indexed: 11/30/2022] Open
Abstract
Background: Scoring functions, such as molecular mechanics force fields and statistical potentials, are fundamentally important tools in protein structure modeling and quality assessment. Results: The performances of a number of publicly available scoring functions are compared with statistical rigor, with an emphasis on knowledge-based potentials. We explored the effect on accuracy of alternative choices for representing interaction center types and other features of scoring functions, such as using information on solvent accessibility and torsion angles, and accounting for secondary structure preferences and side chain orientation. Partially based on the observations made, we present a novel residue-based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. The atom- and residue-level statistical potentials proposed in this work, along with Linux executables to calculate the energy of a given protein, can be downloaded from http://www.fiserlab.org/potentials. Conclusions: Among the most influential terms, we observed a critical role of a proper reference state definition and the benefits of including information about the microenvironment of interaction centers. Molecular mechanics potentials were also tested and found to be over-sensitive to small local imperfections in a structure, requiring unfeasibly long energy relaxation before energy scores started to correlate with model quality.
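To show why the reference state matters in a knowledge-based potential, the sketch below computes a contact energy as the negative log ratio of observed to expected pair counts, with the expected counts obtained by shuffling residue identities over fixed positions. This is one plausible reading of a "shuffled reference state"; the exact definition, the distance and orientation terms, and the parameters used in the paper are not given in the abstract, so every name and number here is a hypothetical illustration.

```python
import math
import random
from collections import Counter

def pair_counts(sequence, contacts):
    """Count contacts per unordered residue-type pair."""
    counts = Counter()
    for i, j in contacts:
        counts[tuple(sorted((sequence[i], sequence[j])))] += 1
    return counts

def shuffled_reference_potential(sequence, contacts, n_shuffles=1000,
                                 seed=0, pseudocount=1e-6):
    """Energy (in units of kT) = -ln(observed / expected), where the expected
    counts come from shuffling residue identities while keeping contacts fixed."""
    rng = random.Random(seed)
    observed = pair_counts(sequence, contacts)
    expected = Counter()
    residues = list(sequence)
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        expected.update(pair_counts(residues, contacts))
    energy = {}
    for pair, total in expected.items():
        mean_expected = total / n_shuffles
        energy[pair] = -math.log((observed[pair] + pseudocount)
                                 / (mean_expected + pseudocount))
    return energy

# Hypothetical toy protein: four leucines packed together, four lysines mostly apart.
sequence = "LLLLKKKK"
contacts = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5), (3, 4)]
energy = shuffled_reference_potential(sequence, contacts)
```

In this toy case the L-L pair is observed more often than the shuffled expectation and so receives a favorable (negative) energy, while the K-L pair is under-represented and is penalized. A different reference state would shift these values, which is consistent with the abstract's remark about the influence of the reference state definition.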
Affiliation(s)
- Dmitry Rykunov
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY 10461, USA
|