1
González-Delgado J, Bernadó P, Neuvial P, Cortés J. Statistical proofs of the interdependence between nearest neighbor effects on polypeptide backbone conformations. J Struct Biol 2022; 214:107907. [PMID: 36272694 DOI: 10.1016/j.jsb.2022.107907] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/30/2022] [Revised: 10/03/2022] [Accepted: 10/09/2022] [Indexed: 11/06/2022]
Abstract
Backbone dihedral angles ϕ and ψ are the main structural descriptors of proteins and peptides. The distribution of these angles has been investigated for decades as they are essential for the validation and refinement of experimental measurements, as well as for structure prediction and design methods. The dependence of these distributions, not only on the nature of each amino acid but also on that of the closest neighbors, has been the subject of numerous studies. Although neighbor-dependent distributions are nowadays generally accepted as a good model, there is still some controversy about the combined effects of left and right neighbors. We have investigated this question using rigorous methods based on recently developed statistical techniques. Our results unambiguously demonstrate that the influence of left and right neighbors cannot be considered independently. Consequently, three-residue fragments should be considered as the minimal building blocks to investigate polypeptide sequence-structure relationships.
Affiliation(s)
- Javier González-Delgado
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France; Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, France
- Pau Bernadó
  - Centre de Biologie Structurale, Université de Montpellier, INSERM, CNRS, France
- Pierre Neuvial
  - Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, France
- Juan Cortés
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
2
Barozet A, Bianciotto M, Vaisset M, Siméon T, Minoux H, Cortés J. Protein loops with multiple meta-stable conformations: A challenge for sampling and scoring methods. Proteins 2020; 89:218-231. [PMID: 32920900 DOI: 10.1002/prot.26008] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/29/2019] [Revised: 08/10/2020] [Accepted: 08/25/2020] [Indexed: 12/25/2022]
Abstract
Flexible regions in proteins, such as loops, cannot be represented by a single conformation. Instead, conformational ensembles are needed to provide a more global picture. In this context, identifying statistically meaningful conformations within an ensemble generated by loop sampling techniques remains an open problem. The difficulty is primarily related to the lack of structural data about these flexible regions. With the majority of structural data coming from X-ray crystallography and ignoring plasticity, the conception and evaluation of loop scoring methods is challenging. In this work, we compare the performance of various scoring methods on a set of eight protein loops that are known to be flexible. The ability of each method to identify and select all of the known conformations is assessed, and the underlying energy landscapes are produced and projected to visualize the qualitative differences obtained when using the methods. Statistical potentials are found to provide considerable reliability despite being designed to trade off accuracy for lower computational cost. On a large pool of loop models, they are capable of filtering out statistically improbable states while retaining those that resemble known (and thus likely) conformations. However, computationally expensive methods are still required for more precise assessment and structural refinement. The results also highlight the importance of employing several scaffolds for the protein, because small structural rearrangements in the rest of the protein strongly influence the modeled energy landscape for the loop.
Affiliation(s)
- Amélie Barozet
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
  - Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Marc Bianciotto
  - Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Marc Vaisset
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Thierry Siméon
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Hervé Minoux
  - Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Juan Cortés
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
3
Kundert K, Kortemme T. Computational design of structured loops for new protein functions. Biol Chem 2019; 400:275-288. [PMID: 30676995 PMCID: PMC6530579 DOI: 10.1515/hsz-2018-0348] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/16/2018] [Accepted: 12/18/2018] [Indexed: 12/20/2022]
Abstract
The ability to engineer the precise geometries, fine-tuned energetics and subtle dynamics that are characteristic of functional proteins is a major unsolved challenge in the field of computational protein design. In natural proteins, functional sites exhibiting these properties often feature structured loops. However, unlike the elements of secondary structures that comprise idealized protein folds, structured loops have been difficult to design computationally. Addressing this shortcoming in a general way is a necessary first step towards the routine design of protein function. In this perspective, we will describe the progress that has been made on this problem and discuss how recent advances in the field of loop structure prediction can be harnessed and applied to the inverse problem of computational loop design.
Affiliation(s)
- Kale Kundert
  - Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Tanja Kortemme
  - Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158, USA
4
Jiang F, Wu HN, Kang W, Wu YD. Developments and Applications of Coil-Library-Based Residue-Specific Force Fields for Molecular Dynamics Simulations of Peptides and Proteins. J Chem Theory Comput 2019; 15:2761-2773. [DOI: 10.1021/acs.jctc.8b00794] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Affiliation(s)
- Fan Jiang
  - Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Hao-Nan Wu
  - Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Wei Kang
  - Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yun-Dong Wu
  - Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
5
Bansal N, Zheng Z, Song LF, Pei J, Merz KM. The Role of the Active Site Flap in Streptavidin/Biotin Complex Formation. J Am Chem Soc 2018; 140:5434-5446. [PMID: 29607642 DOI: 10.1021/jacs.8b00743] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/20/2022]
Abstract
Obtaining a detailed description of how active site flap motion affects substrate or ligand binding will advance structure-based drug design (SBDD) efforts on systems including the kinases, HSP90, HIV protease, ureases, etc. Through this understanding, we will be able to design better inhibitors and better proteins that have desired functions. Herein we address this issue by generating the relevant configurational states of a protein flap on the molecular energy landscape using an approach we call MTFlex-b and then following this with a procedure to estimate the free energy associated with the motion of the flap region. To illustrate our overall workflow, we explored the free energy changes in the streptavidin/biotin system upon introducing conformational flexibility in loop3-4 in the biotin unbound (apo) and bound (holo) state. The free energy surfaces were created using the Movable Type free energy method, and for further validation, we compared them to potential of mean force (PMF) generated free energy surfaces using MD simulations employing the FF99SBILDN and FF14SB force fields. We also estimated the free energy thermodynamic cycle using an ensemble of closed-like and open-like end states for the ligand unbound and bound states and estimated the binding free energy to be approximately -16.2 kcal/mol (experimental -18.3 kcal/mol). The good agreement between MTFlex-b in combination with the MT method with experiment and MD simulations supports the effectiveness of our strategy in obtaining unique insights into the motions in proteins that can then be used in a range of biological and biomedical applications.
Affiliation(s)
- Nupur Bansal
  - Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Zheng Zheng
  - Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Lin Frank Song
  - Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Jun Pei
  - Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz
  - Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
  - Institute for Cyber Enabled Research, Michigan State University, 567 Wilson Road, East Lansing, Michigan 48824, United States
6
Elhefnawy W, Chen L, Han Y, Li Y. ICOSA: A Distance-Dependent, Orientation-Specific Coarse-Grained Contact Potential for Protein Structure Modeling. J Mol Biol 2015; 427:2562-2576. [DOI: 10.1016/j.jmb.2015.05.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/10/2015] [Accepted: 05/21/2015] [Indexed: 11/16/2022]
7
Yaseen A, Li Y. Template-based C8-SCORPION: a protein 8-state secondary structure prediction method using structural information and context-based features. BMC Bioinformatics 2014; 15 Suppl 8:S3. [PMID: 25080939 PMCID: PMC4120151 DOI: 10.1186/1471-2105-15-s8-s3] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Secondary structure prediction of proteins is important to many protein structure modeling applications. Correct prediction of secondary structures can significantly reduce the degrees of freedom in protein tertiary structure modeling and therefore reduces the difficulty of obtaining high resolution 3D models. Methods: In this work, we investigate a template-based approach to enhance 8-state secondary structure prediction accuracy. We construct structural templates from known protein structures with certain sequence similarity. The structural templates are then incorporated as features with sequence and evolutionary information to train two-stage neural networks. In the absence of structural templates, heuristic structural information is incorporated instead. Results: After applying the template-based 8-state secondary structure prediction method, the 7-fold cross-validated Q8 accuracy is 78.85%. Even templates from structures with only 20%-30% sequence similarity can help improve the 8-state prediction accuracy. More importantly, when good templates are available, the prediction accuracy of less frequent secondary structures, such as 3-10 helices, turns, and bends, is greatly improved, which is useful for practical applications. Conclusions: Our computational results show that the templates containing structural information are effective features to enhance 8-state secondary structure predictions. Our prediction algorithm is implemented on a web server named "C8-SCORPION" available at: http://hpcr.cs.odu.edu/c8scorpion.
8
Solis AD. Deriving high-resolution protein backbone structure propensities from all crystal data using the information maximization device. PLoS One 2014; 9:e94334. [PMID: 24896099 PMCID: PMC4045576 DOI: 10.1371/journal.pone.0094334] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/22/2014] [Accepted: 03/12/2014] [Indexed: 11/28/2022] [Open Access]
Abstract
The most informative probability distribution functions (PDFs) describing the Ramachandran phi-psi dihedral angle pair, a fundamental descriptor of backbone conformation of protein molecules, are derived from high-resolution X-ray crystal structures using an information-theoretic approach. The Information Maximization Device (IMD) is established, based on fundamental information-theoretic concepts, and then applied specifically to derive highly resolved phi-psi maps for all 20 single amino acid and all 8000 triplet sequences at an optimal resolution determined by the volume of current data. The paper shows that utilizing the latent information contained in all viable high-resolution crystal structures found in the Protein Data Bank (PDB), totaling more than 77,000 chains, permits the derivation of a large number of optimized sequence-dependent PDFs. This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it automatically optimizes PDFs, IMD results in improved performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision. The high-resolution phi-psi maps derived in this work are available for download.
Affiliation(s)
- Armando D. Solis
  - Biological Sciences Department, New York City College of Technology, The City University of New York, Brooklyn, New York, United States of America
9
Yaseen A, Li Y. Context-based features enhance protein secondary structure prediction accuracy. J Chem Inf Model 2014; 54:992-1002. [PMID: 24571803 DOI: 10.1021/ci400647u] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 11/28/2022]
Abstract
We report a new approach that uses statistical context-based scores as encoded features to train neural networks and thereby improve secondary structure prediction accuracy. The context-based scores are pseudo-potentials derived by evaluating statistical, high-order inter-residue interactions, which estimate the favorability of a residue adopting a certain secondary structure conformation within its amino acid environment. Encoding these context-based scores as important training and prediction features provides a way to address a long-standing difficulty in neural network-based secondary structure predictions: taking the interdependency among secondary structures of neighboring residues into account. Our computational results have shown that the context-based scores are effective features to enhance the accuracy of secondary structure predictions. An overall 7-fold cross-validated Q3 accuracy of 82.74% and Segment Overlap (SOV) accuracy of 86.25% are achieved on a set of more than 7987 protein chains with, at most, 25% sequence identity. The Q3 prediction accuracy on benchmarks of CB513, Manesh215, Carugo338, as well as CASP9 protein chains is higher than that of widely used secondary structure prediction servers, including Psipred, Profphd, Jpred, Porter (ab initio), and Netsurf. More significant improvement is observed in the SOV accuracy, where more than 4% enhancement is observed, compared to the server with the best SOV accuracy. A Q8 accuracy of >70% (71.5%) is also found in eight-state secondary structure prediction. The majority of the Q3 accuracy improvement comes from correctly identifying β-sheets and α-helices. When the context-based scores are incorporated, there are 15.5% more residues predicted with >90% confidence. These high-confidence predictions usually have a rather high accuracy (~95% on average). The three- and eight-state prediction servers (SCORPION) implementing our methods are available online.
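For readers unfamiliar with the Q3 metric quoted in this abstract, the following minimal sketch computes it as the fraction of residues whose predicted three-state label matches the observed one. The function name `q3_accuracy`, the H/E/C label alphabet, and the example strings are illustrative assumptions and are not taken from the SCORPION servers.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose predicted state equals the observed state.

    Both arguments are per-residue label strings of equal length
    (assumed alphabet: H = helix, E = strand, C = coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 8-residue example in which 6 positions agree.
example = q3_accuracy("HHHECCCC", "HHHHCCEC")
```

The same function gives a Q8-style score if both strings use an eight-state alphabet, since it only counts per-residue agreement.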
Affiliation(s)
- Ashraf Yaseen
  - Department of Computer Science, Old Dominion University, Norfolk, Virginia 23529, United States
10
Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A. Optimized atomic statistical potentials: assessment of protein interfaces and loops. Bioinformatics 2013; 29:3158-66. [PMID: 24078704 PMCID: PMC3842762 DOI: 10.1093/bioinformatics/btt560] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Received: 06/19/2013] [Revised: 08/13/2013] [Accepted: 09/22/2013] [Indexed: 01/16/2023] [Open Access]
Abstract
MOTIVATION: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state. RESULTS: We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven 'recovery' functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein-protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures. AVAILABILITY AND IMPLEMENTATION: SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
Affiliation(s)
- Guang Qiang Dong
  - Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry and California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, CA 94158, USA
11
Yaseen A, Li Y. Dinosolve: a protein disulfide bonding prediction server using context-based features to enhance prediction accuracy. BMC Bioinformatics 2013; 14 Suppl 13:S9. [PMID: 24267383 PMCID: PMC3849605 DOI: 10.1186/1471-2105-14-s13-s9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Disulfide bonds play an important role in protein folding and structure stability. Accurately predicting disulfide bonds from protein sequences is important for modeling the structural and functional characteristics of many proteins. Methods: In this work, we introduce an approach of enhancing disulfide bonding prediction accuracy by taking advantage of context-based features. We first derive the first-order and second-order mean-force potentials according to the amino acid environment around the cysteine residues from a large number of cysteine samples. The mean-force potentials are integrated as context-based scores to estimate the favorability of a cysteine residue in disulfide bonding state as well as a cysteine pair in disulfide bond connectivity. These context-based scores are then incorporated as features together with other sequence and evolutionary information to train neural networks for disulfide bonding state prediction and connectivity prediction. Results: The 10-fold cross-validated accuracy is 90.8% at residue-level and 85.6% at protein-level in classifying an individual cysteine residue as bonded or free, an improvement of around 2% in accuracy. The average accuracy for disulfide bonding connectivity prediction is also improved, which yields overall sensitivity of 73.42% and specificity of 91.61%. Conclusions: Our computational results have shown that the context-based scores are effective features to enhance the prediction accuracies of both disulfide bonding state prediction and connectivity prediction. Our disulfide prediction algorithm is implemented on a web server named "Dinosolve" available at: http://hpcr.cs.odu.edu/dinosolve.
12
Jiang F, Han W, Wu YD. The intrinsic conformational features of amino acids from a protein coil library and their applications in force field development. Phys Chem Chem Phys 2013; 15:3413-28. [PMID: 23385383 DOI: 10.1039/c2cp43633g] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 12/31/2022]
Abstract
The local conformational (φ, ψ, χ) preferences of amino acid residues, which are important for the development of protein force fields, remain an active research area. In this perspective article, we first summarize spectroscopic studies of alanine-based short peptides in aqueous solution. While most studies indicate a preference for the PII conformation in the unfolded state over α and β conformations, significant variations are also observed. A statistical analysis from various coil libraries of high-resolution protein structures is then summarized, which gives a more coherent view of the local conformational features. The φ, ψ, χ distributions of the 20 amino acids have been obtained from a protein coil library, considering both backbone and side-chain conformational preferences. The intrinsic side-chain χ₁ rotamer preference and χ₁-dependent Ramachandran plot can be generally understood by combining the interaction of the side-chain Cγ/Oγ atom with two neighboring backbone peptide groups. Current all-atom force fields such as AMBER ff99sb-ILDN, ff03 and OPLS-AA/L do not reproduce these distributions well. A method has been developed by combining the φ, ψ plot of alanine with the influence of side-chain χ₁ rotamers to derive the local conformational features of various amino acids. It has been further applied to improve the OPLS-AA force field. The modified force field (OPLS-AA/C) reproduces experimental ³J coupling constants for various short peptides quite well. It also better reproduces the temperature-dependence of the helix-coil transition for alanine-based peptides. The new force field can fold a series of peptides and proteins with various secondary structures to their experimental structures. MD simulations of several globular proteins using the improved force field give significantly less deviation (RMSD) from experimental structures. The results indicate that the local conformational features from coil libraries are valuable for the development of balanced protein force fields.
Affiliation(s)
- Fan Jiang
  - Laboratory of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
13
Chys P, Chacón P. Random Coordinate Descent with Spinor-matrices and Geometric Filters for Efficient Loop Closure. J Chem Theory Comput 2013; 9:1821-9. [PMID: 26587638 DOI: 10.1021/ct300977f] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 12/21/2022]
Abstract
Protein loop closure constitutes a critical step in loop and protein modeling whereby geometrically feasible loops must be found between two given anchor residues. Here, a new analytic/iterative algorithm denoted random coordinate descent (RCD) to perform protein loop closure is described. The algorithm solves loop closure through minimization as in cyclic coordinate descent but selects bonds for optimization randomly, updates loop conformations by spinor-matrices, performs loop closure in both chain directions, and uses a set of geometric filters to yield efficient conformational sampling. Geometric filters allow one to detect clashes and constrain dihedral angles on the fly. The RCD algorithm is at least comparable to state-of-the-art loop closure algorithms due to an excellent balance between efficiency and intrinsic sampling capability. Furthermore, its efficiency allows one to improve conformational sampling by increasing the sampling number without much penalty. Overall, RCD turns out to be accurate, fast, robust, and applicable over a wide range of loop lengths. Because of the versatility of RCD, it is a solid alternative for integration with current loop modeling strategies.
Affiliation(s)
- Pieter Chys
  - Structural Bioinformatics Group, Biological Chemical Physics Department, Institute of Physical Chemistry Rocasolano (IQFR), Consejo Superior de Investigaciones Científicas (CSIC), Calle de Serrano 119, Madrid 28006, Spain
- Pablo Chacón
  - Structural Bioinformatics Group, Biological Chemical Physics Department, Institute of Physical Chemistry Rocasolano (IQFR), Consejo Superior de Investigaciones Científicas (CSIC), Calle de Serrano 119, Madrid 28006, Spain
14
Li Y. Conformational sampling in template-free protein loop structure modeling: an overview. Comput Struct Biotechnol J 2013; 5:e201302003. [PMID: 24688696 PMCID: PMC3962101 DOI: 10.5936/csbj.201302003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/19/2012] [Revised: 01/23/2013] [Accepted: 01/28/2013] [Indexed: 01/04/2023] [Open Access]
Abstract
Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a "mini protein folding problem" under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling owing to advances in computational methods as well as the steadily increasing number of known structures available in the PDB. This mini review provides an overview of recent computational approaches for loop structure modeling. In particular, we focus on approaches for sampling the loop conformation space, which is a critical step to obtain high resolution models in template-free methods. We review the potential energy functions for loop modeling, loop buildup mechanisms to satisfy geometric constraints, and loop conformation sampling algorithms. The recent loop modeling results are also summarized.
Affiliation(s)
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
15
Liang S, Zhang C, Sarmiento J, Standley DM. Protein Loop Modeling with Optimized Backbone Potential Functions. J Chem Theory Comput 2012; 8:1820-7. [PMID: 26593673 DOI: 10.1021/ct300131p] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
We represented the protein backbone potential as a Fourier series. The parameters of the backbone dihedral potential were initialized to random values and optimized by Monte Carlo simulations so that generated native-like loop decoys had a lower energy than non-native decoys. The low energy regions of the optimized backbone potential were consistent with observed Ramachandran plots derived from crystal structures. The backbone potential was then used for the prediction of loop conformations (OSCAR-loop) in combination with the previously described OSCAR force field, which has been shown to be very accurate in side chain modeling. As a result, the accuracy of OSCAR-loop was improved by local energy minimization based on the complete force field. The average accuracies were 0.40, 0.70, 1.10, 2.08, and 3.58 Å for 4, 6, 8, 10, and 12-residue loops, respectively, with each size being represented by 325 to 2809 targets. The accuracy was better than that of other loop modeling algorithms for short loops (<10 residues). For longer loops, the prediction accuracy was improved by concurrently sampling with a fragment-based method, Spanner. OSCAR-loop is available for download at http://sysimm.ifrec.osaka-u.ac.jp/OSCAR/.
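To make the idea of a Fourier-series backbone potential concrete, here is a minimal sketch that evaluates a truncated two-dimensional Fourier series in (φ, ψ) with randomly initialized coefficients. The abstract does not state the functional form, truncation order, or coefficient layout used in OSCAR-loop, so the expression below, the names `fourier_potential` and `random_coeffs`, and the order of 2 are all assumptions chosen for illustration only.

```python
import math
import random


def fourier_potential(phi, psi, coeffs):
    """Evaluate E(phi, psi) = sum over (m, n) of a*cos(m*phi + n*psi) + b*sin(m*phi + n*psi).

    coeffs maps integer frequency pairs (m, n) to amplitude pairs (a, b).
    Angles are in radians. This functional form is an assumed example.
    """
    energy = 0.0
    for (m, n), (a, b) in coeffs.items():
        arg = m * phi + n * psi
        energy += a * math.cos(arg) + b * math.sin(arg)
    return energy


def random_coeffs(order, seed=0):
    """Draw random amplitudes in [-1, 1] for all frequencies up to `order` (hypothetical helper)."""
    rng = random.Random(seed)
    return {
        (m, n): (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        for m in range(0, order + 1)
        for n in range(-order, order + 1)
    }


coeffs = random_coeffs(order=2)
phi, psi = math.radians(-60.0), math.radians(-45.0)
e0 = fourier_potential(phi, psi, coeffs)
# Integer frequencies make the potential periodic in both angles.
e1 = fourier_potential(phi + 2 * math.pi, psi - 2 * math.pi, coeffs)
```

Because the frequencies are integers, the energy is unchanged when either angle is shifted by a full turn, which is the property that makes a Fourier basis a natural fit for dihedral angles. In an optimization of the kind the abstract describes, the amplitudes would be the parameters being adjusted.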
Affiliation(s)
- Shide Liang
  - Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka, 565-0871, Japan
- Chi Zhang
  - School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, Nebraska 68588, United States
- Jamica Sarmiento
  - Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka, 565-0871, Japan
- Daron M Standley
  - Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka, 565-0871, Japan
16
|
Koppole S, Schaefer M. A discriminative Ramachandran potential of mean force aimed at minimizing secondary structure bias. J Comput Chem 2012; 33:791-9. [DOI: 10.1002/jcc.22908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2011] [Revised: 10/24/2011] [Accepted: 11/20/2011] [Indexed: 11/12/2022]
|
17
|
Shiu JH, Chen CY, Chen YC, Chang YT, Chang YS, Huang CH, Chuang WJ. Effect of P to A mutation of the N-terminal residue adjacent to the Rgd motif on rhodostomin: importance of dynamics in integrin recognition. PLoS One 2012; 7:e28833. [PMID: 22238583 PMCID: PMC3251565 DOI: 10.1371/journal.pone.0028833] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/05/2011] [Accepted: 11/15/2011] [Indexed: 12/04/2022]
Abstract
Rhodostomin (Rho) is an RGD protein that specifically inhibits integrins. We found that Rho mutants with the P48A mutation inhibited integrin α5β1 4.4-11.5 times more actively. Structural analysis showed that they have a similar 3D conformation for the RGD loop. Docking analysis also showed no difference between their interactions with integrin α5β1. However, the backbone dynamics of RGD residues were different. The values of the R₂ relaxation parameter for Rho residues R49 and D51 were 39% and 54% higher than those of the P48A mutant, which caused differences in S², Rₑₓ, and τₑ. The S² values of the P48A mutant residues R49, G50, and D51 were 29%, 14%, and 28% lower than those of Rho. The Rₑₓ values of Rho residues R49 and D51 were 0.91 s⁻¹ and 1.42 s⁻¹; however, no Rₑₓ was found for those of the P48A mutant. The τₑ values of Rho residues R49 and D51 were 9.5 and 5.1 times lower than those of the P48A mutant. A mutational study showed that integrin α5β1 prefers its ligands to contain (G/A)RGD but not PRGD sequences for binding. These results demonstrate that the N-terminal proline residue adjacent to the RGD motif affects its function and dynamics, which suggests that the dynamic properties of the RGD motif may be important in Rho's interaction with integrin α5β1.
Affiliation(s)
- Jia-Hau Shiu
  - Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, National Cheng Kung University College of Medicine, Tainan, Taiwan
- Chiu-Yueh Chen
  - Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, National Cheng Kung University College of Medicine, Tainan, Taiwan
- Yi-Chun Chen
  - Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, National Cheng Kung University College of Medicine, Tainan, Taiwan
- Yao-Tsung Chang
  - Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, National Cheng Kung University College of Medicine, Tainan, Taiwan
- Yung-Sheng Chang
  - Institute of Biopharmaceutical Sciences, National Cheng Kung University College of Medicine, Tainan, Taiwan
- Chun-Hao Huang
  - Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, National Cheng Kung University College of Medicine, Tainan, Taiwan
- Woei-Jer Chuang
  - Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, National Cheng Kung University College of Medicine, Tainan, Taiwan
  - Institute of Biopharmaceutical Sciences, National Cheng Kung University College of Medicine, Tainan, Taiwan
|
18
|
Cruz VL, Ramos J, Martinez-Salazar J. Assessment of the intrinsic conformational preferences of dipeptide amino acids in aqueous solution by combined umbrella sampling/MBAR statistics. A comparison with experimental results. J Phys Chem B 2011; 116:469-75. [PMID: 22136632 DOI: 10.1021/jp206757j] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023]
Abstract
The propensities of 19 amino acid dipeptides have been calculated by a distributed umbrella sampling molecular dynamics simulation procedure using the OPLS-AA force field. The potential of mean force maps were estimated with the multiple Bennett acceptance ratio statistics. The resulting propensities compare satisfactorily with very recently published experimental data on equivalent systems. In particular, α-conformation probabilities for all of the dipeptides remain much lower than either β or PII propensities. This result is in agreement with most experimental data for dipeptides. However, it is also in contrast with most simulation studies performed so far with other force fields, where α conformations turn out to be even more probable than PII or β ones. We discuss the behavior of the OPLS-AA force field, which can be useful for the improvement of this model in reproducing the recent experimental observations on amino acid dipeptides.
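One common way to turn a potential of mean force into conformational propensities is Boltzmann weighting: each grid bin contributes exp(-F/kT), and the weights are summed per basin and normalized. The sketch below illustrates that idea only; the bin labels, free-energy values, temperature, and the function name `basin_propensities` are hypothetical and do not come from the paper.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def basin_propensities(bins, temperature=300.0):
    """Sum Boltzmann weights of PMF bins per basin and normalize.

    `bins` is a list of (basin_label, free_energy_kcal_per_mol) pairs,
    one per grid bin of the (phi, psi) map.
    """
    beta = 1.0 / (KB_KCAL * temperature)
    f_min = min(f for _, f in bins)  # shift energies for numerical stability
    totals = {}
    for label, f in bins:
        totals[label] = totals.get(label, 0.0) + math.exp(-beta * (f - f_min))
    z = sum(totals.values())
    return {label: w / z for label, w in totals.items()}

# Hypothetical bins: (basin, free energy in kcal/mol).
example_bins = [("alpha", 1.2), ("beta", 0.0), ("beta", 0.4),
                ("PII", 0.1), ("PII", 0.3)]
props = basin_propensities(example_bins)
```

With these made-up values the α basin ends up with the smallest population, simply because its only bin has the highest free energy.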
Affiliation(s)
- Victor L Cruz
  - BIOPHYM, Department of Macromolecular Physics, Instituto de Estructura de la Materia, CSIC Serrano 113-bis, Madrid, Spain.
|
19
|
Li Y, Rata I, Jakobsson E. Sampling multiple scoring functions can improve protein loop structure prediction accuracy. J Chem Inf Model 2011; 51:1656-66. [PMID: 21702492 PMCID: PMC3211142 DOI: 10.1021/ci200143u] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Abstract
Accurately predicting loop structures is important for understanding the functions of many proteins. To obtain loop models with high accuracy, efficiently sampling the loop conformation space to discover reasonable structures is a critical step. In loop conformation sampling, coarse-grained energy (scoring) functions coupled with reduced protein representations are often used to reduce the number of degrees of freedom as well as the computational time of sampling. However, because reduced representations account for many factors only implicitly, coarse-grained scoring functions may be insensitive and inaccurate, which can mislead the sampling process and cause important loop conformations to be missed. In this paper, we present a new computational sampling approach to obtain reasonable loop backbone models, the so-called Pareto optimal sampling (POS) method. The rationale of the POS method is to sample the function space of multiple, carefully selected scoring functions to discover an ensemble of diversified structures yielding Pareto optimality to all sampled conformations. The POS method can efficiently tolerate insensitivity and inaccuracy in individual scoring functions and thereby lead to significant accuracy improvement in loop structure prediction. We apply the POS method to a set of 4-12-residue loop targets using a function space composed of backbone-only Rosetta, distance-scaled finite ideal-gas reference (DFIRE), and a triplet backbone dihedral potential developed in our lab. Our computational results show that in 501 out of 502 targets, the model sets generated by POS contain structure models that are within subangstrom resolution. Moreover, the top-ranked models have a root mean square deviation (rmsd) of less than 1 Å in 96.8, 84.1, and 72.2% of the short (4-6 residues), medium (7-9 residues), and long (10-12 residues) targets, respectively, when the all-atom models are generated by local optimization from the backbone models and are ranked by our recently developed Pareto optimal consensus (POC) method. Similar sampling effectiveness can also be found in a set of 13-residue loop targets.
Affiliation(s)
- Yaohang Li
  - Department of Computer Science, Old Dominion University
- Ionel Rata
  - Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign
- Eric Jakobsson
  - Department of Molecular and Integrative Physiology, Beckman Institute, and National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
|
20
|
Liang S, Zhou Y, Grishin N, Standley DM. Protein side chain modeling with orientation-dependent atomic force fields derived by series expansions. J Comput Chem 2011; 32:1680-6. [PMID: 21374632 PMCID: PMC3072444 DOI: 10.1002/jcc.21747] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 10/28/2010] [Revised: 12/10/2010] [Accepted: 12/11/2010] [Indexed: 11/09/2022]
Abstract
We describe the development of new force fields for protein side chain modeling called optimized side chain atomic energy (OSCAR). The distance-dependent energy functions (OSCAR-d) and side-chain dihedral angle potential energy functions were represented as power and Fourier series, respectively. The resulting 802 adjustable parameters were optimized by discriminating the native side chain conformations from non-native conformations, using a training set of 12,000 side chains for each residue type. In the course of optimization, for every residue, its side chain was replaced by varying rotamers, whereas conformations for all other residues were kept as they appeared in the crystal structure. Then, the OSCAR-d were multiplied by an orientation-dependent function to yield OSCAR-o. A total of 1087 parameters of the orientation-dependent energy functions (OSCAR-o) were optimized by maximizing the energy gap between the native conformation and subrotamers calculated as low energy by OSCAR-d. When OSCAR-o with optimized parameters were used to model side chain conformations simultaneously for 218 recently released protein structures, the prediction accuracies were 88.8% for χ₁, 79.7% for χ₁₊₂, 1.24 Å overall root mean square deviation (RMSD), and 0.62 Å RMSD for core residues, respectively, compared with the next-best performing side-chain modeling program which achieved 86.6% for χ₁, 75.7% for χ₁₊₂, 1.40 Å overall RMSD, and 0.86 Å RMSD for core residues, respectively. The continuous energy functions obtained in this study are suitable for gradient-based optimization techniques for protein structure refinement. A program with built-in OSCAR for protein side chain prediction is available for download at http://sysimm.ifrec.osaka-u.ac.jp/OSCAR/.
Affiliation(s)
- Shide Liang
  - Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka 565-0871, Japan.
|
21
|
Arnautova YA, Abagyan RA, Totrov M. Development of a new physics-based internal coordinate mechanics force field and its application to protein loop modeling. Proteins 2011; 79:477-98. [PMID: 21069716 PMCID: PMC3057902 DOI: 10.1002/prot.22896] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 12/25/2022]
Abstract
We report the development of the internal coordinate mechanics force field (ICMFF), a new force field parameterized using a combination of experimental data for crystals of small molecules and quantum mechanics calculations. The main features of ICMFF include: (a) parameterization for the dielectric constant relevant to the condensed state (ε = 2) instead of vacuum, (b) an improved description of hydrogen-bond interactions using duplicate sets of van der Waals parameters for heavy atom-hydrogen interactions, and (c) improved backbone covalent geometry and energetics achieved using novel backbone torsional potentials and inclusion of the bond angles at the Cα atoms into the internal variable set. The performance of ICMFF was evaluated through loop modeling simulations for 4-13 residue loops. ICMFF was combined with a solvent-accessible surface area solvation model optimized using a large set of loop decoys. Conformational sampling was carried out using the biased probability Monte Carlo method. Average/median backbone root-mean-square deviations of the lowest energy conformations from the native structures were 0.25/0.21 Å for four-residue loops, 0.84/0.46 Å for eight-residue loops, and 1.16/0.73 Å for 12-residue loops. To our knowledge, these results are significantly better than or comparable with those reported to date for any loop modeling method that does not take crystal packing into account. Moreover, the accuracy of our method is on par with the best previously reported results obtained considering the crystal environment. We attribute this success to the high accuracy of the new ICM force field achieved by meticulous parameterization, to the optimized solvent model, and to the efficiency of the search method.
Affiliation(s)
- Yelena A Arnautova
  - Molsoft LLC, 3366 North Torrey Pines Court, Suite 300, La Jolla, California 92037, USA
|
22
|
Abstract
Loop modeling is crucial for high-quality homology model construction outside conserved secondary structure elements. Dozens of loop modeling protocols involving a range of database and ab initio search algorithms and a variety of scoring functions have been proposed. Knowledge-based loop modeling methods are very fast and some can successfully and reliably predict loops up to about eight residues long. Several recent ab initio loop simulation methods can be used to construct accurate models of loops up to 12-13 residues long, albeit at a substantial computational cost. Major current challenges are the simulations of loops longer than 12-13 residues, the modeling of multiple interacting flexible loops, and the sensitivity of the loop predictions to the accuracy of the loop environment.
|
23
|
Lee J, Lee D, Park H, Coutsias EA, Seok C. Protein loop modeling by using fragment assembly and analytical loop closure. Proteins 2010; 78:3428-36. [PMID: 20872556 PMCID: PMC2976774 DOI: 10.1002/prot.22849] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 05/19/2010] [Revised: 07/16/2010] [Accepted: 07/31/2010] [Indexed: 12/27/2022]
Abstract
Protein loops are often involved in important biological functions such as molecular recognition, signal transduction, or enzymatic action. The three-dimensional structures of loops can provide essential information for understanding the molecular mechanisms behind protein functions. In this article, we develop a novel method for protein loop modeling, where the loop conformations are generated by fragment assembly and analytical loop closure. The fragment assembly method reduces the conformational space drastically, and the analytical loop closure method finds the geometrically consistent loop conformations efficiently. We also derive an analytic formula for the gradient of any analytical function of dihedral angles in the space of closed loops. The gradient can be used to optimize various restraints derived from experiments or databases, for example restraints for preferential interactions between specific residues or for preferred backbone angles. We demonstrate that the current loop modeling method outperforms previous methods that employ residue-based torsion angle maps or different loop closure strategies when tested on two sets of loop targets of lengths ranging from 4 to 12 residues.
Affiliation(s)
- Julian Lee
  - Department of Bioinformatics and Life Science, Soongsil University, Seoul 156-743, Korea
- Dongseon Lee
  - Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Hahnbeom Park
  - Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Evangelos A. Coutsias
  - Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA
- Chaok Seok
  - Department of Chemistry, Seoul National University, Seoul 151-747, Korea
|
24
|
Li Y, Rata I, Chiu SW, Jakobsson E. Improving predicted protein loop structure ranking using a Pareto-optimality consensus method. BMC Struct Biol 2010; 10:22. [PMID: 20642859 PMCID: PMC2914074 DOI: 10.1186/1472-6807-10-22] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 10/27/2009] [Accepted: 07/20/2010] [Indexed: 11/10/2022]
Abstract
Background: Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. Results: We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of ~20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods. Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.
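The first step of the procedure described in this abstract, finding the models on the Pareto-optimal front with respect to several scoring functions, can be sketched as follows. This is a minimal illustration that assumes lower scores are better for every function; the score tuples and the names `dominates` and `pareto_front` are hypothetical, and the fuzzy-dominance ranking of step 2 is not implemented here.

```python
def dominates(a, b):
    """True if score vector `a` is no worse than `b` on every scoring
    function and strictly better on at least one (lower is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(scores):
    """Return indices of models that no other model dominates."""
    front = []
    for i, s in enumerate(scores):
        dominated = any(dominates(t, s)
                        for j, t in enumerate(scores) if j != i)
        if not dominated:
            front.append(i)
    return front

# Hypothetical decoys scored by two scoring functions.
decoy_scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(decoy_scores)
```

In this toy set the third decoy is dropped because the second one scores better on both functions, while the remaining three each win on at least one function against every competitor.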
Affiliation(s)
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
|