1
Ceres N, Lavery R. Coarse-grain Protein Models. Innovations in Biomolecular Modeling and Simulations 2012. [DOI: 10.1039/9781849735049-00219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/30/2023]
Abstract
Coarse-graining is a powerful approach for modeling biomolecules that, over the last few decades, has been extensively applied to proteins. Coarse-grain models offer access to large systems and to slow processes without becoming computationally unmanageable. In addition, they are very versatile, enabling both the protein representation and the energy function to be adapted to the biological problem in hand. This review concentrates on modeling soluble proteins and their assemblies. It presents an overview of the coarse-grain representations, of the associated interaction potentials, and of the optimization procedures used to define them. It then shows how coarse-grain models have been used to understand processes involving proteins, from their initial folding to their functional properties, their binary interactions, and the assembly of large complexes.
Affiliation(s)
- N. Ceres
  - Bases Moléculaires et Structurales des Systèmes Infectieux, Université Lyon1/CNRS UMR 5086, IBCP, 7 Passage du Vercors, 69367 Lyon, France
- R. Lavery
  - Bases Moléculaires et Structurales des Systèmes Infectieux, Université Lyon1/CNRS UMR 5086, IBCP, 7 Passage du Vercors, 69367 Lyon, France
2
Zhou Y, Duan Y, Yang Y, Faraggi E, Lei H. Trends in template/fragment-free protein structure prediction. Theor Chem Acc 2011; 128:3-16. [PMID: 21423322 PMCID: PMC3030773 DOI: 10.1007/s00214-010-0799-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 05/17/2010] [Accepted: 08/15/2010] [Indexed: 12/13/2022]
Abstract
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. The lack of progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.
Affiliation(s)
- Yaoqi Zhou
  - School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Yong Duan
  - UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
  - College of Physics, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074 Wuhan, China
- Yuedong Yang
  - School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Eshel Faraggi
  - School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Hongxing Lei
  - UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
  - Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
3
Hamelryck T, Borg M, Paluszewski M, Paulsen J, Frellsen J, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J. Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS One 2010; 5:e13714. [PMID: 21103041 PMCID: PMC2978081 DOI: 10.1371/journal.pone.0013714] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 07/07/2010] [Accepted: 10/04/2010] [Indexed: 11/26/2022] Open
Abstract
Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances – so-called “potentials of mean force” (PMFs) – have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state – a necessary component of these potentials – is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities “reference ratio distributions” deriving from the application of the “reference ratio method.” This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
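As a rough illustration of the pairwise-distance potentials discussed in this abstract, the sketch below applies the textbook inverse-Boltzmann form, E(r) = -kT ln(P_obs(r) / P_ref(r)), to histogram counts. It is not the authors' reference ratio method; the function name `pmf_from_counts`, the pseudocount, and the example counts are hypothetical and chosen only for illustration.

```python
import math


def pmf_from_counts(observed, reference, kT=1.0, pseudocount=1e-6):
    """Per-bin energy -kT * ln(P_obs / P_ref) from two distance histograms.

    `observed` and `reference` are counts per distance bin (hypothetical data).
    The pseudocount is an assumption that avoids log(0) for empty bins.
    """
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    obs_total = sum(observed)
    ref_total = sum(reference)
    energies = []
    for o, r in zip(observed, reference):
        p_obs = o / obs_total + pseudocount
        p_ref = r / ref_total + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Hypothetical counts: four distance bins, uniform reference state.
observed = [5, 40, 30, 25]
reference = [25, 25, 25, 25]
energies = pmf_from_counts(observed, reference)
for i, e in enumerate(energies):
    print(f"bin {i}: {e:+.3f} kT")
```

Bins that are populated more often than the reference predicts receive negative (favorable) energies, and under-populated bins receive positive ones. The choice of reference state, here an arbitrary uniform histogram, is exactly the component the abstract identifies as contested.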
Affiliation(s)
- Thomas Hamelryck
  - Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
  - * E-mail: (TH); (JFB)
- Mikael Borg
  - Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Martin Paluszewski
  - Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jonas Paulsen
  - Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jes Frellsen
  - Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Christian Andreetta
  - Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
  - Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
  - Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Sandro Bottaro
  - Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Jesper Ferkinghoff-Borg
  - Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
  - * E-mail: (TH); (JFB)
4
Sodt AJ, Head-Gordon T. Driving forces for transmembrane alpha-helix oligomerization. Biophys J 2010; 99:227-37. [PMID: 20655851 DOI: 10.1016/j.bpj.2010.03.071] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/05/2009] [Revised: 03/24/2010] [Accepted: 03/29/2010] [Indexed: 11/25/2022] Open
Abstract
We present what we believe to be a novel statistical contact potential based on solved structures of transmembrane (TM) alpha-helical bundles, and we use this contact potential to investigate the amino acid likelihood of stabilizing helix-helix interfaces. To increase statistical significance, we have reduced the full contact energy matrix to a four-flavor alphabet of amino acids, automatically determined by our methodology, in which we find that polarity is a more dominant factor of group identity than is size, with charged or polar groups most often occupying the same face, whereas polar/apolar residue pairs tend to occupy opposite faces. We found that the most polar residues strongly influence interhelical contact formation, although they occur rarely in TM helical bundles. Two-body contact energies in the reduced letter code are capable of determining native structure from a large decoy set for a majority of test TM proteins, at the same time illustrating that certain higher-order sequence correlations are necessary for more accurate structure predictions.
Affiliation(s)
- Alex J Sodt
  - Department of Bioengineering, University of California, Berkeley, California, USA.
5
Zhang J, Zhang Y. A novel side-chain orientation dependent potential derived from random-walk reference state for protein fold selection and structure prediction. PLoS One 2010; 5:e15386. [PMID: 21060880 PMCID: PMC2965178 DOI: 10.1371/journal.pone.0015386] [Citation(s) in RCA: 171] [Impact Index Per Article: 12.2] [Received: 08/05/2010] [Accepted: 09/01/2010] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: An accurate potential function is essential to attack protein folding and structure prediction problems. The key to developing efficient knowledge-based potential functions is to design reference states that can appropriately counteract generic interactions. The reference states of many knowledge-based distance-dependent atomic potential functions, however, were derived from non-interacting particles such as an ideal gas, which ignores the inherent sequence connectivity and entropic elasticity of proteins. METHODOLOGY: We developed a new pair-wise distance-dependent, atomic statistical potential function (RW), using an ideal random-walk chain as reference state, which was optimized on CASP models and then benchmarked on nine structural decoy sets. We then incorporated a new side-chain orientation-dependent energy term into RW (RWplus) and found that the side-chain packing orientation specificity can further improve the decoy recognition ability of the statistical potential. SIGNIFICANCE: RW and RWplus demonstrate a significantly better ability than the best performing pair-wise distance-dependent atomic potential functions in both native and near-native model selections. They have higher energy-RMSD and energy-TM-score correlations compared with other potentials of the same type in real-life structure assembly decoys. When benchmarked with a comprehensive list of publicly available potentials, RW and RWplus show comparable performance to the state-of-the-art scoring functions, including those combining terms from multiple resources. These data demonstrate the usefulness of a random-walk chain as a reference state that correctly accounts for sequence connectivity and entropic elasticity of proteins. They show potential usefulness in structure recognition and protein folding simulations. The RW and RWplus potentials, as well as the newly generated I-TASSER decoys, are freely available at http://zhanglab.ccmb.med.umich.edu/RW.
Affiliation(s)
- Jian Zhang
  - Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Yang Zhang
  - Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
6
Cohen M, Potapov V, Schreiber G. Four distances between pairs of amino acids provide a precise description of their interaction. PLoS Comput Biol 2009; 5:e1000470. [PMID: 19680437 PMCID: PMC2715887 DOI: 10.1371/journal.pcbi.1000470] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 05/06/2009] [Accepted: 07/15/2009] [Indexed: 11/18/2022] Open
Abstract
The three-dimensional structures of proteins are stabilized by the interactions between amino acid residues. Here we report a method where four distances are calculated between any two side chains to provide an exact spatial definition of their bonds. The data were binned into a four-dimensional grid and compared to a random model, from which the preference for specific four-distances was calculated. A clear relation between the quality of the experimental data and the tightness of the distance distribution was observed, with crystal structure data providing far tighter distance distributions than NMR data. Since the four-distance data have higher information content than classical bond descriptions, we were able to identify many unique inter-residue features not found previously in proteins. For example, we found that the side chains of Arg, Glu, Val and Leu are not symmetrical with respect to the interactions of their head groups. The described method may be developed into a function that accurately models protein structures computationally.
Affiliation(s)
- Mati Cohen
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Vladimir Potapov
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Gideon Schreiber
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
7
Gu J, Li H, Jiang H, Wang X. A simple Calpha-SC potential with higher accuracy for protein fold recognition. Biochem Biophys Res Commun 2009; 379:610-5. [PMID: 19121621 DOI: 10.1016/j.bbrc.2008.12.131] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/15/2008] [Accepted: 12/20/2008] [Indexed: 11/18/2022]
Abstract
In this paper, an improved Cα-SC energy potential designed for protein fold recognition is reported. It consists of three extremely simple interaction terms that are supposed to be the dominant interactions in protein folding: residue-residue contact, hydrophobicity and pseudodihedral potentials. The potential function contains only 210 contact parameters, one hydrophobic parameter and one torsion parameter, which have been optimized using an interior point algorithm of linear programming. Tests of the derived potential function on commonly used decoy sets illustrate that it outperforms most of the existing coarse-grained potentials in terms of its capabilities in recognizing native structures and consistency in achieving high Z-scores across decoy sets, and it has almost equivalent performance to the potentials that consider complex intra-molecular interactions. The results show that our scoring function is a generally promising potential for protein structure prediction and modeling with regard to its recognition and computational efficacy.
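The Z-score mentioned in this abstract is commonly computed by comparing the native structure's energy with the distribution of decoy energies. The following is a minimal sketch under that common definition, not the authors' code; the function name `native_z_score` and the energy values are hypothetical, and sign conventions vary between papers (here a more negative value means better separation of the native structure).

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """Z = (E_native - mean(E_decoys)) / std(E_decoys).

    Uses the population standard deviation of the decoy energies.
    """
    mean = statistics.mean(decoy_energies)
    std = statistics.pstdev(decoy_energies)
    if std == 0:
        raise ValueError("decoy energies have zero variance")
    return (native_energy - mean) / std


# Hypothetical energies in arbitrary units.
decoys = [-10.0, -12.0, -8.0, -11.0, -9.0]
z = native_z_score(-16.0, decoys)
print(f"native Z-score: {z:.2f}")

# The native structure is "recognized" if its energy is below every decoy's.
native_is_lowest = -16.0 < min(decoys)
```

Reporting both the Z-score and whether the native structure ranks first is useful because a potential can rank the native first while separating it only weakly from the decoys.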
Affiliation(s)
- Junfeng Gu
  - State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, Dalian University of Technology, Dalian 116024, China
8
Yang Y, Zhou Y. Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions. Protein Sci 2008; 17:1212-9. [PMID: 18469178 DOI: 10.1110/ps.033480.107] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Indexed: 10/22/2022]
Abstract
One of the common methods for assessing energy functions of proteins is selection of native or near-native structures from decoys. This is an efficient but indirect test of the energy functions because decoy structures are typically generated either by sampling procedures or by a separate energy function. As a result, these decoys may not contain the global minimum structure that reflects the true folding accuracy of the energy functions. This paper proposes to assess energy functions by ab initio refolding of fully unfolded terminal segments with secondary structures while keeping the rest of the proteins fixed in their native conformations. Global energy minimization of these short unfolded segments, a challenging yet tractable problem, is a direct test of the energy functions. As an illustrative example, refolding terminal segments is employed to assess two closely related all-atom statistical energy functions, DFIRE (distance-scaled, finite, ideal-gas reference state) and DOPE (discrete optimized protein energy). We found that a simple sequence-position dependence contained in the DOPE energy function leads to an intrinsic bias toward the formation of helical structures. Meanwhile, a finer statistical treatment of short-range interactions yields a significant improvement in the accuracy of segment refolding by DFIRE. The updated DFIRE energy function yields success rates of 100% and 67%, respectively, for its ability to sample and fold fully unfolded terminal segments of 15 proteins to within 3.5 Å global root-mean-squared distance from the corresponding native structures. The updated DFIRE energy function is available as DFIRE 2.0 upon request.
Affiliation(s)
- Yuedong Yang
  - Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA