1
Badaczewska-Dawid AE, Kolinski A, Kmiecik S. Computational reconstruction of atomistic protein structures from coarse-grained models. Comput Struct Biotechnol J 2019; 18:162-176. [PMID: 31969975 PMCID: PMC6961067 DOI: 10.1016/j.csbj.2019.12.007] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 12/10/2019] [Accepted: 12/10/2019] [Indexed: 01/02/2023]
Abstract
Three-dimensional protein structures, whether determined experimentally or theoretically, are often too low resolution. In this mini-review, we outline the computational methods for protein structure reconstruction from incomplete coarse-grained to all atomistic models. Typical reconstruction schemes can be divided into four major steps. Usually, the first step is reconstruction of the protein backbone chain starting from the C-alpha trace. This is followed by side-chains rebuilding based on protein backbone geometry. Subsequently, hydrogen atoms can be reconstructed. Finally, the resulting all-atom models may require structure optimization. Many methods are available to perform each of these tasks. We discuss the available tools and their potential applications in integrative modeling pipelines that can transfer coarse-grained information from computational predictions, or experiment, to all atomistic structures.
Affiliation(s)
- Sebastian Kmiecik
- Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
2
Lima AH, dos Santos AM, Alves CN, Lameira J. Computed insight into a peptide inhibitor preventing the induced fit mechanism of MurA enzyme from Pseudomonas aeruginosa. Chem Biol Drug Des 2016; 89:599-607. [DOI: 10.1111/cbdd.12882] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/19/2016] [Revised: 08/16/2016] [Accepted: 09/29/2016] [Indexed: 11/28/2022]
Affiliation(s)
- Anderson H. Lima
- Laboratório de Planejamento e Desenvolvimento de Fármacos; Instituto de Ciências Exatas e Naturais; Universidade Federal do Pará; Belém PA Brasil
- Alberto M. dos Santos
- Laboratório de Planejamento e Desenvolvimento de Fármacos; Instituto de Ciências Exatas e Naturais; Universidade Federal do Pará; Belém PA Brasil
- Cláudio Nahum Alves
- Laboratório de Planejamento e Desenvolvimento de Fármacos; Instituto de Ciências Exatas e Naturais; Universidade Federal do Pará; Belém PA Brasil
- Jerônimo Lameira
- Laboratório de Planejamento e Desenvolvimento de Fármacos; Instituto de Ciências Exatas e Naturais; Universidade Federal do Pará; Belém PA Brasil
3
Bhattacharya D, Adhikari B, Li J, Cheng J. FRAGSION: ultra-fast protein fragment library generation by IOHMM sampling. Bioinformatics 2016; 32:2059-61. [PMID: 27153697 DOI: 10.1093/bioinformatics/btw067] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/07/2015] [Accepted: 01/30/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Speed, accuracy and robustness of building protein fragment libraries have important implications in de novo protein structure prediction, since fragment-based methods are among the most successful approaches in template-free modeling (FM). The majority of existing fragment detection methods rely on database-driven search strategies to identify candidate fragments, which are inherently time-consuming and often hinder the possibility of locating longer fragments due to the limited sizes of databases. Also, it is difficult to alleviate the effect of noisy sequence-based predicted features, such as secondary structures, on fragment quality. RESULTS: Here, we present FRAGSION, a database-free method to efficiently generate protein fragment libraries by sampling from an Input-Output Hidden Markov Model. FRAGSION offers some unique features compared to existing approaches in that it (i) is lightning-fast, consuming only a few seconds of CPU time to generate a fragment library for a protein of typical length (300 residues); (ii) can generate dynamic-size fragments of any length (even for the whole protein sequence); and (iii) offers ways to handle noise in predicted secondary structure during fragment sampling. On an FM dataset from the most recent Critical Assessment of Structure Prediction, we demonstrate that FRAGSION provides advantages over the state-of-the-art fragment picking protocol of the ROSETTA suite by speeding up computation by several orders of magnitude while achieving comparable performance in fragment quality. AVAILABILITY AND IMPLEMENTATION: Source code and executable versions of FRAGSION for Linux and MacOS are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/FRAGSION/. It is bundled with a manual and example data. CONTACT: chengji@missouri.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, Informatics Institute and C. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
4
Yeh HYC, Lindsey A, Wu CP, Thomas S, Amato NM. Decoy Database Improvement for Protein Folding. J Comput Biol 2015; 22:823-36. [PMID: 26258648 DOI: 10.1089/cmb.2015.0116] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
Predicting protein structures and simulating protein folding are two of the most important problems in computational biology today. Simulation methods rely on a scoring function to distinguish the native structure (the most energetically stable) from non-native structures. Decoy databases are collections of non-native structures used to test and verify these functions. We present a method to evaluate and improve the quality of decoy databases by adding novel structures and removing redundant structures. We test our approach on 20 different decoy databases of varying size and type and show significant improvement across a variety of metrics. We also test our improved databases on two popular modern scoring functions and show that for most cases they contain a greater or equal number of native-like structures than the original databases, thereby producing a more rigorous database for testing scoring functions.
Affiliation(s)
- Hsin-Yi Cindy Yeh
- Parasol Lab, Department of Computer Science & Engineering, Texas A&M University, College Station, Texas
- Aaron Lindsey
- Parasol Lab, Department of Computer Science & Engineering, Texas A&M University, College Station, Texas
- Chih-Peng Wu
- Parasol Lab, Department of Computer Science & Engineering, Texas A&M University, College Station, Texas
- Shawna Thomas
- Parasol Lab, Department of Computer Science & Engineering, Texas A&M University, College Station, Texas
- Nancy M Amato
- Parasol Lab, Department of Computer Science & Engineering, Texas A&M University, College Station, Texas
5
Dhingra P, Jayaram B. A homology/ab initio hybrid algorithm for sampling near-native protein conformations. J Comput Chem 2013; 34:1925-36. [PMID: 23728619 DOI: 10.1002/jcc.23339] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 11/05/2012] [Revised: 03/09/2013] [Accepted: 04/21/2013] [Indexed: 12/19/2022]
Abstract
One of the major challenges for protein tertiary structure prediction strategies is the quality of conformational sampling algorithms, which can effectively and readily search the protein fold space to generate near-native conformations. In an effort to advance the field by making the best use of available homology as well as fold recognition approaches along with ab initio folding methods, we have developed Bhageerath-H Strgen, a homology/ab initio hybrid algorithm for protein conformational sampling. The methodology is tested on the benchmark CASP9 dataset of 116 targets. In 93% of the cases, a structure with TM-score ≥ 0.5 is generated in the pool of decoys. Further, the performance of Bhageerath-H Strgen was seen to be efficient in comparison with different decoy generation methods. The algorithm is web enabled as Bhageerath-H Strgen web tool which is made freely accessible for protein decoy generation (http://www.scfbio-iitd.res.in/software/Bhageerath-HStrgen1.jsp).
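The TM-score threshold quoted in this abstract can be made concrete with a short calculation. The following is a minimal sketch, assuming NumPy and two C-alpha coordinate arrays that are already residue-matched and superimposed; the function name `tm_score` and the 0.5 Å lower clamp on the distance scale are illustrative assumptions, and the full TM-score additionally maximizes over superpositions, which this sketch does not do.

```python
import numpy as np

def tm_score(model_ca, native_ca):
    """TM-score of one fixed superposition of two (N, 3) C-alpha arrays.

    Assumes residue i of the model corresponds to residue i of the native
    structure and that the arrays are already superimposed.
    """
    model_ca = np.asarray(model_ca, dtype=float)
    native_ca = np.asarray(native_ca, dtype=float)
    n = len(native_ca)  # normalise by the native (target) length
    # Length-dependent distance scale in angstroms; the clamp for very
    # short chains is an assumption of this sketch.
    d0 = max(1.24 * np.cbrt(n - 15) - 1.8, 0.5)
    d = np.linalg.norm(model_ca - native_ca, axis=1)
    return float(np.mean(1.0 / (1.0 + (d / d0) ** 2)))
```

Identical structures score 1.0, and a model whose every residue sits exactly d0 away from its native position scores 0.5, the threshold used in the abstract.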
Affiliation(s)
- Priyanka Dhingra
- Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi, 110016, India
6
7
Dal Palú A, Spyrakis F, Cozzini P. A new approach for investigating protein flexibility based on Constraint Logic Programming. The first application in the case of the estrogen receptor. Eur J Med Chem 2012; 49:127-40. [PMID: 22277571 DOI: 10.1016/j.ejmech.2012.01.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/02/2011] [Revised: 01/05/2012] [Accepted: 01/05/2012] [Indexed: 12/01/2022]
Abstract
We describe the potential of a novel method, based on Constraint Logic Programming (CLP), developed for an exhaustive sampling of protein conformational space. The CLP framework proposed here has been tested and applied to the estrogen receptor, whose activity and function are strictly related to its intrinsic, and well known, dynamics. We have investigated in particular the flexibility of H12, focusing on the pathways followed by the helix when moving from one stable crystallographic conformation to the others. Millions of geometrically feasible conformations were generated and selected, and the traces connecting the different forms were determined by using a shortest path algorithm. The preliminary analyses showed a marked agreement between the crystallographic agonist-like, antagonist-like and hypothetical apo forms, and the corresponding conformations identified by the CLP framework. These promising results, together with the short computational time required to perform the analyses, make this constraint-based approach a valuable tool for the study of protein folding prediction. The CLP framework enables one to consider various structural and energetic scenarios, without changing the core algorithm. To show the feasibility of the method, we intentionally chose a pure geometric setting, neglecting the energetic evaluation of the poses, in order to be independent from a specific force field and to provide the possibility of comparing different behaviours associated with various energy models.
8
Kifer I, Nussinov R, Wolfson HJ. Protein structure prediction using a docking-based hierarchical folding scheme. Proteins 2011; 79:1759-73. [PMID: 21445943 PMCID: PMC3092838 DOI: 10.1002/prot.22999] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/25/2010] [Revised: 01/02/2011] [Accepted: 01/18/2011] [Indexed: 12/13/2022]
Abstract
The pathways by which proteins fold into their specific native structure are still an unsolved mystery. Currently, many methods for protein structure prediction are available, and most of them tackle the problem by relying on the vast amounts of data collected from known protein structures. These methods are often not concerned with the route the protein follows to reach its final fold. This work is based on the premise that proteins fold in a hierarchical manner. We present FOBIA, an automated method for predicting a protein structure. FOBIA consists of two main stages: the first finds matches between parts of the target sequence and independently folding structural units using profile-profile comparison. The second assembles these units into a 3D structure by searching and ranking their possible orientations toward each other using a docking-based approach. We have previously reported an application of an initial version of this strategy to homology based targets. Since then we have considerably enhanced our method's abilities to allow it to address the more difficult template-based target category. This allows us to now apply FOBIA to the template-based targets of CASP8 and to show that it is both very efficient and promising. Our method can provide an alternative for template-based structure prediction, and in particular, the docking-based ranking technique presented here can be incorporated into any profile-profile comparison based method.
Affiliation(s)
- Ilona Kifer
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
9
Lin KL, Lin CT, Pal NR. Incremental Mountain Clustering Method to Find Building Blocks for Constructing Structures of Proteins. IEEE Trans Nanobioscience 2010; 9:278-88. [DOI: 10.1109/tnb.2010.2095467] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
10
Vanhee P, Verschueren E, Baeten L, Stricher F, Serrano L, Rousseau F, Schymkowitz J. BriX: a database of protein building blocks for structural analysis, modeling and design. Nucleic Acids Res 2010; 39:D435-42. [PMID: 20972210 PMCID: PMC3013806 DOI: 10.1093/nar/gkq972] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
High-resolution structures of proteins remain the most valuable source for understanding their function in the cell and provide leads for drug design. Since the availability of sufficient protein structures to tackle complex problems such as modeling backbone moves or docking remains a problem, alternative approaches using small, recurrent protein fragments have been employed. Here we present two databases that provide a vast resource for implementing such fragment-based strategies. The BriX database contains fragments from over 7000 non-homologous proteins from the Astral collection, segmented in lengths from 4 to 14 residues and clustered according to structural similarity, summing up to a content of 2 million fragments per length. To overcome the lack of loops classified in BriX, we constructed the Loop BriX database of non-regular structure elements, clustered according to end-to-end distance between the regular residues flanking the loop. Both databases are available online (http://brix.crg.es) and can be accessed through a user-friendly web-interface. For high-throughput queries a web-based API is provided, as well as full database downloads. In addition, two exciting applications are provided as online services: (i) user-submitted structures can be covered on the fly with BriX classes, representing putative structural variation throughout the protein and (ii) gaps or low-confidence regions in these structures can be bridged with matching fragments.
Affiliation(s)
- Peter Vanhee
- VIB SWITCH Laboratory, Flanders Institute of Biotechnology, Free University of Brussels, Pleinlaan 2, 1050 Brussels, Belgium
11
Glembo TJ, Ozkan SB. Union of geometric constraint-based simulations with molecular dynamics for protein structure prediction. Biophys J 2010; 98:1046-54. [PMID: 20303862 PMCID: PMC2849074 DOI: 10.1016/j.bpj.2009.11.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/09/2009] [Revised: 11/05/2009] [Accepted: 11/17/2009] [Indexed: 10/19/2022]
Abstract
Although proteins are a fundamental unit in biology, the mechanism by which proteins fold into their native state is not well understood. In this work, we explore the assembly of secondary structure units via geometric constraint-based simulations and the effect of refinement of assembled structures using reservoir replica exchange molecular dynamics. Our approach uses two crucial features of these methods: (i) geometric simulations speed up the search for nativelike topologies as there are no energy barriers to overcome; and (ii) molecular dynamics identifies the low free energy structures and further refines these structures toward the actual native conformation. We use eight alpha-, beta-, and alpha/beta-proteins to test our method. The geometric simulations of our test set result in an average RMSD from native of 3.7 Å and this further reduces to 2.7 Å after refinement. We also explore the question of robustness of assembly for inaccurate (shifted and shortened) secondary structure. We find that the RMSD from native is highly dependent on the accuracy of secondary structure input, and even slightly shifting the location of secondary structure along the amino acid sequence can lead to a rapid decrease in RMSD to native due to incorrect packing.
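The RMSD-from-native figures quoted in this abstract presuppose an optimal superposition of model and native coordinates. A minimal sketch of one common way to compute such a value, the Kabsch algorithm, is given here, assuming NumPy and residue-matched coordinate arrays; the function name `kabsch_rmsd` is illustrative, and the abstract does not say which superposition routine the authors used.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between (N, 3) arrays P and Q after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T  # rotate P onto Q
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

Because reflections are excluded, a structure and its mirror image do not superimpose to zero RMSD, which matters when comparing chiral molecules such as proteins.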
Key Words
- CASP, critical assessment of techniques for protein structure prediction
- FRODA, framework rigidity optimized dynamics algorithm
- MD, molecular dynamics
- REMD, replica exchange molecular dynamics
- RMSD, root mean-square deviation
- R-REMD, reservoir replica exchange molecular dynamics
- ZAM, zipping and assembly method
- ZAMF, ZAM with FRODA
- 3-D, three-dimensional
- 1-D, one-dimensional
Affiliation(s)
- S. Banu Ozkan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona
12
Pandini A, Fornili A, Kleinjung J. Structural alphabets derived from attractors in conformational space. BMC Bioinformatics 2010; 11:97. [PMID: 20170534 PMCID: PMC2838871 DOI: 10.1186/1471-2105-11-97] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 08/24/2009] [Accepted: 02/20/2010] [Indexed: 11/20/2022]
Abstract
Background The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. Results A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. Conclusions The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics.
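The statement that a four-residue fragment is defined by three independent angles can be illustrated numerically. The sketch below assumes NumPy and assumes that the three angles are the two pseudo-bond angles and the one pseudo-torsion angle of four consecutive C-alpha atoms, which are the internal degrees of freedom left once the virtual bond lengths are treated as fixed; the abstract does not spell out this choice, and the function name `fragment_angles` is hypothetical.

```python
import numpy as np

def fragment_angles(ca):
    """Return (theta1, theta2, tau) in radians for a (4, 3) C-alpha array."""
    ca = np.asarray(ca, dtype=float)
    b1, b2, b3 = ca[1] - ca[0], ca[2] - ca[1], ca[3] - ca[2]

    def angle(u, v):
        cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(cosang, -1.0, 1.0))

    theta1 = angle(-b1, b2)  # pseudo-bond angle at atom 1
    theta2 = angle(-b2, b3)  # pseudo-bond angle at atom 2
    # Pseudo-torsion about the central virtual bond, signed so that a
    # clockwise turn seen looking from atom 1 towards atom 2 is positive.
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    m1 = np.cross(b2 / np.linalg.norm(b2), n1)
    tau = np.arctan2(np.dot(m1, n2), np.dot(n1, n2))
    return theta1, theta2, tau
```

A fragment encoded this way is invariant to rotation and translation, which is what allows fragments from different proteins to be clustered in a common three-dimensional angle space.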
Affiliation(s)
- Alessandro Pandini
- Division of Mathematical Biology, MRC National Institute for Medical Research, London, UK
13
Hvidsten TR, Kryshtafovych A, Fidelis K. Local descriptors of protein structure: a systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions. Proteins 2009; 75:870-84. [PMID: 19025980 DOI: 10.1002/prot.22296] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Local protein structure representations that incorporate long-range contacts between residues are often considered in protein structure comparison but have found relatively little use in structure prediction where assembly from single backbone fragments dominates. Here, we introduce the concept of local descriptors of protein structure to characterize local neighborhoods of amino acids including short- and long-range interactions. We build a library of recurring local descriptors and show that this library is general enough to allow assembly of unseen protein structures. The library could on average re-assemble 83% of 119 unseen structures, and showed little or no performance decrease between homologous targets and targets with folds not represented among domains used to build it. We then systematically evaluate the descriptor library to establish the level of the sequence signal in sets of protein fragments of similar geometrical conformation. In particular, we test whether that signal is strong enough to facilitate correct assignment and alignment of these local geometries to new sequences. We use the signal to assign descriptors to a test set of 479 sequences with less than 40% sequence identity to any domain used to build the library, and show that on average more than 50% of the backbone fragments constituting descriptors can be correctly aligned. We also use the assigned descriptors to infer SCOP folds, and show that correct predictions can be made in many of the 151 cases where PSI-BLAST was unable to detect significant sequence similarity to proteins in the library. Although the combinatorial problem of simultaneously aligning several fragments to sequence is a major bottleneck compared with single fragment methods, the advantage of the current approach is that correct alignments imply correct long range distance constraints. 
The lack of these constraints is most likely the major reason why structure prediction methods fail to consistently produce adequate models when good templates are unavailable or undetectable. Thus, we believe that the current study offers new and valuable insight into the prediction of sequence-structure relationships in proteins.
14
Vanhee P, Stricher F, Baeten L, Verschueren E, Lenaerts T, Serrano L, Rousseau F, Schymkowitz J. Protein-Peptide Interactions Adopt the Same Structural Motifs as Monomeric Protein Folds. Structure 2009; 17:1128-36. [DOI: 10.1016/j.str.2009.06.013] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/09/2009] [Revised: 06/15/2009] [Accepted: 06/16/2009] [Indexed: 01/24/2023]
15
16
Kifer I, Nussinov R, Wolfson HJ. Constructing templates for protein structure prediction by simulation of protein folding pathways. Proteins 2009; 73:380-94. [PMID: 18433063 DOI: 10.1002/prot.22073] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
How a one-dimensional protein sequence folds into a specific 3D structure remains a difficult challenge in structural biology. Many computational methods have been developed in an attempt to predict the tertiary structure of the protein; most of these employ approaches that are based on the accumulated knowledge of solved protein structures. Here we introduce a novel and fully automated approach for predicting the 3D structure of a protein that is based on the well accepted notion that protein folding is a hierarchical process. Our algorithm follows the hierarchical model by employing two stages: the first aims to find a match between the sequences of short independently-folding structural entities and parts of the target sequence and assigns the respective structures. The second assembles these local structural parts into a complete 3D structure, allowing for long-range interactions between them. We present the results of applying our method to a subset of the targets from CASP6 and CASP7. Our results indicate that for targets with a significant sequence similarity to known structures we are often able to provide predictions that are better than those achieved by two leading servers, and that the most significant improvements in comparison with these methods occur in regions of a gapped structural alignment between the native structure and the closest available structural template. We conclude that in addition to performing well for targets with known homologous structures, our method shows great promise for addressing the more general category of comparative modeling targets, which is our next goal.
Affiliation(s)
- Ilona Kifer
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
17
Le Q, Pollastri G, Koehl P. Structural alphabets for protein structure classification: a comparison study. J Mol Biol 2008; 387:431-50. [PMID: 19135454 DOI: 10.1016/j.jmb.2008.12.044] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 07/21/2008] [Revised: 12/16/2008] [Accepted: 12/17/2008] [Indexed: 11/26/2022]
Abstract
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all alpha-proteins, while relatively poorer for all beta-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Without surprise, the primary sequence alone performs poorly as a structure classifier.
We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.
Affiliation(s)
- Quan Le
- Complex and Adaptive Systems Laboratory, School of Computer Science and Informatics, University College Dublin, Dublin, Ireland.
18
Mardia KV, Nyirongo VB. Simulating virtual protein Calpha traces with applications. J Comput Biol 2008; 15:1209-20. [PMID: 18973436 DOI: 10.1089/cmb.2007.0092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
We propose a simple procedure for generating virtual protein C(alpha) traces. One of the key ingredients of our method, to build a three-dimensional structure from a random sequence of amino acids, is to work directly on torsional angles of the chain which we sample from a von Mises distribution. With simple modeling of the hydrophobic effect in protein folding, the procedure produces compact and globular structures. Some characteristics of real proteins (i.e., compactness and globularity) are well mimicked by this procedure. These virtual traces are used to assess algorithms for matching protein structures or functional sites.
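The core idea in this abstract, building a chain directly from torsion angles drawn from a von Mises distribution, can be sketched in a few lines. The example assumes NumPy; the fixed 3.8 Å virtual bond length, the fixed 110° pseudo-bond angle, and the `mu` and `kappa` values are placeholder assumptions rather than the authors' parameters, the function names are hypothetical, and the hydrophobic-effect term that makes the published traces compact is not modelled.

```python
import numpy as np

def place_atom(a, b, c, bond, angle, torsion):
    """Place atom d with |cd| = bond, angle(b, c, d) = angle and
    dihedral(a, b, c, d) = torsion (angles in radians)."""
    bc = c - b
    bc = bc / np.linalg.norm(bc)
    n = np.cross(b - a, bc)
    n = n / np.linalg.norm(n)
    m = np.cross(n, bc)
    # Coordinates of d in the local frame (bc, m, n) centred on c.
    x = -bond * np.cos(angle)
    y = bond * np.sin(angle) * np.cos(torsion)
    z = bond * np.sin(angle) * np.sin(torsion)
    return c + x * bc + y * m + z * n

def virtual_ca_trace(n_res, mu=np.deg2rad(50.0), kappa=2.0,
                     bond=3.8, angle=np.deg2rad(110.0), seed=0):
    """Generate an (n_res, 3) virtual C-alpha trace (n_res >= 3)."""
    rng = np.random.default_rng(seed)
    coords = np.zeros((n_res, 3))
    coords[1] = [bond, 0.0, 0.0]
    coords[2] = coords[1] + bond * np.array([-np.cos(angle), np.sin(angle), 0.0])
    # One pseudo-torsion per additional residue, sampled from von Mises.
    torsions = rng.vonmises(mu, kappa, size=n_res - 3)
    for i in range(3, n_res):
        coords[i] = place_atom(coords[i - 3], coords[i - 2], coords[i - 1],
                               bond, angle, torsions[i - 3])
    return coords
```

Working in internal coordinates guarantees that every consecutive pair of C-alpha atoms sits exactly one virtual bond length apart, so only the torsion distribution controls the overall shape of the chain.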
Affiliation(s)
- Kanti V Mardia
- Department of Statistics, University of Leeds, Leeds, United Kingdom.
19
Paluszewski M, Winter P. Protein Decoy Generation Using Branch and Bound with Efficient Bounding. Lecture Notes in Computer Science 2008. [DOI: 10.1007/978-3-540-87361-7_32] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022]
20
Wu GA, Coutsias EA, Dill KA. Iterative assembly of helical proteins by optimal hydrophobic packing. Structure 2008; 16:1257-66. [PMID: 18682227 PMCID: PMC2629734 DOI: 10.1016/j.str.2008.04.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/14/2007] [Revised: 03/26/2008] [Accepted: 04/06/2008] [Indexed: 11/21/2022]
Abstract
We present a method for the computer-based iterative assembly of native-like tertiary structures of helical proteins from alpha-helical fragments. For any pair of helices, our method, called MATCHSTIX, first generates an ensemble of possible relative orientations of the helices with various ways to form hydrophobic contacts between them. Those conformations having steric clashes, or a large radius of gyration of hydrophobic residues, or with helices too far separated to be connected by the intervening linking region, are discarded. Then, we attempt to connect the two helical fragments by using a robotics-based loop-closure algorithm. When loop closure is feasible, the algorithm generates an ensemble of viable interconnecting loops. After energy minimization and clustering, we use a representative set of conformations for further assembly with the remaining helices, adding one helix at a time. To efficiently sample the conformational space, the order of assembly generally proceeds from the pair of helices connected by the shortest loop, followed by joining one of its adjacent helices, always proceeding with the shorter connecting loop. We tested MATCHSTIX on 28 helical proteins each containing up to 5 helices and found it to heavily sample native-like conformations. The average RMSD of the best conformations for the 17 helix-bundle proteins that have 2 or 3 helices is less than 2 Å; errors increase somewhat for proteins containing more helices. Native-like states are even more densely sampled when disulfide bonds are known and imposed as restraints. We conclude that, at least for helical proteins, if the secondary structures are known, this rapid rigid-body maximization of hydrophobic interactions can lead to small ensembles of highly native-like structures. It may be useful for protein structure prediction.
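One of the filters named in this abstract, discarding conformations whose hydrophobic residues have a large radius of gyration, is easy to state in code. The following is a minimal sketch assuming NumPy, one coordinate per residue, and an unweighted radius of gyration; the residue set in `HYDROPHOBIC`, the function names, and the `max_rg` cut-off are hypothetical choices for illustration and are not taken from MATCHSTIX.

```python
import numpy as np

# Assumed hydrophobic residue set (one-letter codes); not from the paper.
HYDROPHOBIC = set("AVLIMFWYC")

def hydrophobic_rg(sequence, coords):
    """Unweighted radius of gyration of the hydrophobic residues only."""
    coords = np.asarray(coords, dtype=float)
    mask = np.array([aa in HYDROPHOBIC for aa in sequence])
    pts = coords[mask]
    centre = pts.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum((pts - centre) ** 2, axis=1))))

def keep_compact(sequence, conformations, max_rg):
    """Keep only conformations whose hydrophobic core is compact enough."""
    return [c for c in conformations if hydrophobic_rg(sequence, c) <= max_rg]
```

A small hydrophobic radius of gyration indicates that the nonpolar residues are packed together, which is the property the abstract's rigid-body search tries to maximize.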
Affiliation(s)
- G. Albert Wu
- Department of Pharmaceutical Chemistry, University of California in San Francisco, San Francisco, California 94143-2240
- Evangelos A. Coutsias
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, New Mexico 87131
- Ken A. Dill
- Department of Pharmaceutical Chemistry, University of California in San Francisco, San Francisco, California 94143-2240
|
21
|
Reconstruction of protein backbones from the BriX collection of canonical protein fragments. PLoS Comput Biol 2008; 4:e1000083. [PMID: 18483555 PMCID: PMC2367438 DOI: 10.1371/journal.pcbi.1000083] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 09/21/2007] [Accepted: 04/07/2008] [Indexed: 12/23/2022] Open
Abstract
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone similar to the successful rotamer approach in side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was realized through identification of recurrent backbone fragments from a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements, between 4 and 8 residues long. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
Large-scale DNA sequencing efforts produce large amounts of protein sequence data. However, in order to understand the function of a protein, its tertiary three-dimensional structure is required. Despite worldwide efforts in structural biology, experimental protein structures are determined at a significantly slower pace. As a result, computational methods for protein structure prediction receive significant attention.
A large part of the structure prediction problem lies in the enormous size of the problem: proteins seem to occur in an infinite variety of shapes. Here, we propose that this huge complexity may be overcome by identifying recurrent protein fragments, which are frequently reused as building blocks to construct proteins that were hitherto thought to be unrelated. The BriX database is the outcome of identifying about 2,000 canonical shapes among 1,261 protein structures. We show any given protein can be reconstructed from this library of building blocks at a very high resolution, suggesting that the modelling of protein backbones may be greatly aided by our database.
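The coverage figure quoted in this abstract (the share of protein fragments that lie within 1 Angstrom RMSD of some library fragment) can be made concrete with a small sketch. It is not the BriX implementation. The names `rmsd` and `coverage` are hypothetical, and the sketch assumes the fragments have already been optimally superposed, so only the coordinate RMSD itself is computed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between two equally long lists of (x, y, z).

    Assumption: the two fragments are already superposed; no fitting is done.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("fragments must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def coverage(fragments, library, threshold=1.0):
    """Fraction of fragments within `threshold` RMSD of any library fragment
    of the same length."""
    if not fragments:
        raise ValueError("no fragments given")
    covered = sum(
        1
        for frag in fragments
        if any(
            len(ref) == len(frag) and rmsd(frag, ref) <= threshold
            for ref in library
        )
    )
    return covered / len(fragments)


if __name__ == "__main__":
    library = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]]
    fragments = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # close to the library entry
        [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],  # far from it
    ]
    print(coverage(fragments, library))
```

A real comparison would first superpose each fragment pair (for example with a least-squares fit) before measuring the RMSD.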
|
22
|
Hermoso A, Espadaler J, Querol E, Aviles FX, Sternberg MJ, Oliva B, Fernandez-Fuentes N. Including Functional Annotations and Extending the Collection of Structural Classifications of Protein Loops (ArchDB). Bioinform Biol Insights 2008. [DOI: 10.1177/117793220700100004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] Open
Abstract
Loops represent an important part of protein structures. The study of loops is critical for two main reasons. First, loops are often involved in protein function, stability and folding. Second, despite improvements in experimental and computational structure prediction methods, modeling the conformation of loops remains problematic. Here, we present a structural classification of loops, ArchDB, a mine of information with application in both fields mentioned: loop structure prediction and function prediction. ArchDB ( http://sbi.imim.es/archdb ) is a database of classified protein loop motifs. The current database provides four different classification sets tailored for different purposes. ArchDB-40 is a loop classification derived from SCOP40, well suited for modeling common loop motifs. Since features relevant to loop structure or function can be more easily determined on well-populated clusters, we have developed ArchDB-95, a loop classification derived from SCOP95. This new classification set shows a ~40% increase in the number of subclasses, and a large 7-fold increase in the number of putative structure/function-related subclasses. We also present ArchDB-EC, a classification of loop motifs from enzymes, and ArchDB-KI, a manually annotated classification of loop motifs from kinases. Information about ligand contacts and PDB sites has been included in all classification sets. Improvements in our classification scheme are described, as well as several new database features, such as the ability to query by conserved annotations, sequence similarity, or uploading 3D coordinates of a protein. The lengths of classified loops range from 0 to 36 residues. ArchDB offers an exhaustive sampling of loop structures. Functional information about loops and links with related biological databases are also provided.
All this information, together with the possibility of browsing and querying the database through a web server, makes ArchDB a useful tool for the comparative study of loops, the analysis of loops involved in protein function, and the retrieval of templates for loop modeling.
Affiliation(s)
- Antoni Hermoso
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Jordi Espadaler
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Laboratori de Bioinformàtica Estructural (GRIB), Universitat Pompeu Fabra/IMIM, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Catalonia, Spain
- Enrique Querol
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Francesc X. Aviles
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Michael J.E. Sternberg
- Structural Bioinformatics Group, Department of Biological Sciences, Imperial College, London SW7 2AZ, U.K.
- Baldomero Oliva
- Laboratori de Bioinformàtica Estructural (GRIB), Universitat Pompeu Fabra/IMIM, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Catalonia, Spain
- Narcis Fernandez-Fuentes
- Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, St. James University Hospital, Leeds LS7 9TF, U.K.
|
23
|
Friedberg I, Harder T, Kolodny R, Sitbon E, Li Z, Godzik A. Using an alignment of fragment strings for comparing protein structures. Bioinformatics 2007; 23:e219-24. [PMID: 17237095 DOI: 10.1093/bioinformatics/btl310] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Most methods that are used to compare protein structures use three-dimensional (3D) structural information. At the same time, it has been shown that a 1D string representation of local protein structure retains a degree of structural information. This type of representation can be a powerful tool for protein structure comparison and classification, given the arsenal of sequence comparison tools developed by computational biology. However, in order to do so, there is a need to first understand how much information is contained in various possible 1D representations of protein structure. RESULTS: Here we describe the use of a particular structure fragment library, denoted here as KL-strings, for the 1D representation of protein structure. Using KL-strings, we develop an infrastructure for comparing protein structures with a 1D representation. This study focuses on the added value gained from such a description. We show the new local structure language adds resolution to the traditional three-state (helix, strand and coil) secondary structure description, and provides a high degree of accuracy in recognizing structural similarities when used with a pairwise alignment benchmark. The results of this study have immediate applications towards fast structure recognition, and for fold prediction and classification.
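Once a structure is written as a 1D string of fragment letters, standard sequence alignment applies directly. The sketch below computes a global (Needleman-Wunsch) alignment score between two such strings. It is illustrative only: the function name `global_alignment_score`, the example strings, and the flat match/mismatch/gap scores are assumptions, and the paper's actual KL-string scoring scheme is not reproduced here.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two fragment strings.

    Assumption: a flat match/mismatch score stands in for whatever
    substitution scores a real fragment alphabet would use.
    """
    # Row 0: aligning an empty prefix of s against prefixes of t costs gaps.
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # prefix of s against the empty prefix of t
        for j in range(1, len(t) + 1):
            diagonal = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap        # gap in t
            left = curr[j - 1] + gap  # gap in s
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical fragment strings; each letter stands for one local shape.
    print(global_alignment_score("ABCD", "ABD"))
```

The row-by-row formulation keeps memory linear in the length of the second string, which is enough when only the score, not the alignment itself, is needed.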
Affiliation(s)
- Iddo Friedberg
- Program in Bioinformatics and Systems Biology, Burnham Institute for Medical Research, La Jolla, CA, USA.
|
24
|
Hamelryck T, Kent JT, Krogh A. Sampling realistic protein conformations using local structural bias. PLoS Comput Biol 2006; 2:e131. [PMID: 17002495 PMCID: PMC1570370 DOI: 10.1371/journal.pcbi.0020131] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Received: 04/20/2006] [Accepted: 08/21/2006] [Indexed: 11/19/2022] Open
Abstract
The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications for protein structure prediction, determination, simulation, and design.
Protein structure prediction is one of the main unsolved problems in computational biology today. A common way to tackle the problem is to generate plausible protein conformations using a fairly inaccurate but fast method, and to evaluate the conformations using an accurate but slow method. The main bottleneck lies in the first step, that is, efficiently exploring protein conformational space. Currently, the best way to do this is to construct plausible structures by stringing together fragments from experimentally determined protein structures, a method called fragment assembly. Hamelryck, Kent, and Krogh present a new method that can efficiently generate protein conformations that are compatible with a given protein sequence. Unlike for existing methods, the generated conformations cover a continuous range and come with an associated probability. The method shows great promise for use in protein structure prediction, determination, simulation, and design.
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Institute of Molecular Biology and Physiology, University of Copenhagen, Copenhagen, Denmark.
|
25
|
Tuffery P, Derreumaux P. Dependency between consecutive local conformations helps assemble protein structures from secondary structures using Go potential and greedy algorithm. Proteins 2006; 61:732-40. [PMID: 16231300 DOI: 10.1002/prot.20698] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
Discretization of protein conformational space and fragment assembly methods simplify the search of native structures. These methods, mostly of Monte Carlo and genetic-type, do not exploit, however, the fact that short fragments describing consecutive parts of proteins are conformation-dependent. Yet, this information should be useful in improving ab initio and comparative protein structure modeling. In a preliminary study, we have assessed the possibility of using greedy algorithms for protein structure reconstruction based on the assembly of fragments of four-residue length. Greedy algorithms differ from Monte Carlo and genetic approaches in that they grow a polypeptide chain one fragment after another. Here, we move one step further in complexity, and provide strong evidence that the dependence between consecutive local conformations during assembly makes possible the reconstruction of protein structures from their secondary structures using a Go potential. Overall our procedure can reproduce 20 protein structures of 50-164 amino acids within 2.7 to 6.5 Å RMSD and is able to identify native topologies for all proteins, although some targets are stabilized by very long-range interactions.
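A Go potential rewards only those residue contacts that occur in the native structure, and a greedy step keeps whichever candidate scores best at the current stage. The sketch below shows both ideas in their simplest form. It is not the authors' procedure: the names `go_energy` and `pick_lowest_energy`, the 8 Å contact cutoff, and the unit well depth are all assumptions made for illustration.

```python
import math


def go_energy(coords, native_contacts, cutoff=8.0, well_depth=1.0):
    """Simplest Go-type energy: -well_depth for each native contact formed.

    coords          list of (x, y, z) positions, one per residue
    native_contacts iterable of (i, j) residue index pairs in contact natively
    cutoff          contact distance in Angstrom (assumed value)
    """
    energy = 0.0
    for i, j in native_contacts:
        if math.dist(coords[i], coords[j]) <= cutoff:
            energy -= well_depth
    return energy


def pick_lowest_energy(candidates, native_contacts, cutoff=8.0):
    """Greedy choice: keep the candidate conformation with the lowest energy."""
    return min(candidates, key=lambda c: go_energy(c, native_contacts, cutoff))


if __name__ == "__main__":
    contacts = [(0, 2), (0, 3)]
    extended = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (30.0, 0, 0)]
    compact = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (5.0, 3.0, 0)]
    print(go_energy(extended, contacts), go_energy(compact, contacts))
```

In a chain-growth setting the candidates would be partial chains extended by one more fragment, and only contacts among residues built so far would be scored.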
Affiliation(s)
- Pierre Tuffery
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Paris, France.
|
26
|
Abstract
Energy functions are crucial ingredients of protein tertiary structure prediction methods. Assessing the quality of energy functions is therefore of prime importance. It requires the elaboration of a standard evaluation scheme, whose key elements are: (i) sets that contain the native and several non-native structures of proteins (decoys), used to test whether the energy functions display the expected quality features, and (ii) measures to evaluate the reliability of energy functions. We present here a survey of the recent advances in these two related fields. In the first part, we analyze and review the large number of decoy sets that are available on the web, and we summarize the characteristics of a challenging decoy set. We then discuss how to define the quality of energy functions and review the measures related to it.
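Two measures often used to judge an energy function on a decoy set are the rank of the native structure and its Z-score relative to the decoys. The abstract does not say which measures the survey covers, so the choice of these two, along with the names `native_rank` and `native_z_score`, is an assumption made for illustration.

```python
import statistics


def native_rank(native_energy, decoy_energies):
    """Rank of the native structure; 1 means no decoy has a lower energy."""
    return 1 + sum(1 for e in decoy_energies if e < native_energy)


def native_z_score(native_energy, decoy_energies):
    """Distance of the native energy from the decoy mean, in units of the
    decoy standard deviation. Strongly negative values indicate a native
    structure that is well separated from the decoys."""
    mean = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population standard deviation
    if spread == 0:
        raise ValueError("decoy energies are all identical")
    return (native_energy - mean) / spread


if __name__ == "__main__":
    decoys = [-2.0, 0.0, 2.0]
    print(native_rank(-10.0, decoys), native_z_score(-10.0, decoys))
```

Using the population rather than the sample standard deviation is a design choice here; either convention works as long as it is applied consistently across the energy functions being compared.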
Affiliation(s)
- D Gilis
- Center of Applied Molecular Engineering, Institute of Chemistry and Biochemistry, University of Salzburg, Jakob Haringerstraße 3, A-5020 Salzburg, Austria.
|
27
|
|