1
Outeiral C, Strahm M, Shi J, Morris GM, Benjamin SC, Deane CM. The prospects of quantum computing in computational molecular biology. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1481] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8]
Affiliation(s)
- Carlos Outeiral
- Department of Statistics University of Oxford Oxford UK
- Department of Materials University of Oxford Oxford UK
- Martin Strahm
- Pharma Research and Early Development F. Hoffmann‐La Roche Basel Switzerland
- Jiye Shi
- Computer‐Aided Drug Design UCB Pharma Slough UK
2
3
Hensen U, Meyer T, Haas J, Rex R, Vriend G, Grubmüller H. Exploring protein dynamics space: the dynasome as the missing link between protein structure and function. PLoS One 2012; 7:e33931. [PMID: 22606222 PMCID: PMC3350514 DOI: 10.1371/journal.pone.0033931] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Received: 09/23/2011] [Accepted: 02/20/2012]
Abstract
Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also carry out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space whose axes, the dynasome descriptors, characterize different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins, but our approach is, in principle, not restricted to any specific source of protein dynamics data. We conclude that (1) the dynasome consists of a continuum of proteins rather than well-separated classes; (2) for the majority of proteins, we observe strong correlations between structure and dynamics; and (3) proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics.
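The idea of comparing proteins through their dynamics fingerprints can be illustrated with a minimal Python sketch. It assumes each protein has already been reduced to a fixed-length vector of dynasome descriptors (the protein names and descriptor values below are hypothetical), z-scores each descriptor across the dataset so that axes with different units contribute comparably, and uses the Euclidean distance between standardized vectors as the dissimilarity. The standardization and the choice of Euclidean distance are assumptions of this sketch, not necessarily the scheme used by Hensen et al.

```python
import math
from statistics import mean, pstdev

def standardize(fingerprints):
    """Z-score every descriptor (column) across all proteins."""
    names = list(fingerprints)
    n_desc = len(next(iter(fingerprints.values())))
    cols = [[fingerprints[p][i] for p in names] for i in range(n_desc)]
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard against zero spread
    return {
        p: [(fingerprints[p][i] - mu) / sd for i, (mu, sd) in enumerate(stats)]
        for p in names
    }

def dynamics_distance(u, v):
    """Euclidean distance between two standardized fingerprint vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical fingerprints: three proteins, three descriptors each.
fingerprints = {
    "protA": [0.12, 0.45, 3.1],
    "protB": [0.13, 0.40, 3.0],
    "protC": [0.30, 0.80, 1.2],
}
z = standardize(fingerprints)
d_ab = dynamics_distance(z["protA"], z["protB"])
d_ac = dynamics_distance(z["protA"], z["protC"])
print(f"A-B: {d_ab:.2f}, A-C: {d_ac:.2f}")  # protA is closer to protB than to protC
```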
Affiliation(s)
- Ulf Hensen
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
- Tim Meyer
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
- Jürgen Haas
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
- René Rex
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
- Gert Vriend
- CMBI, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Helmut Grubmüller
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
4
Alexander N, Woetzel N, Meiler J. bcl::Cluster : A method for clustering biological molecules coupled with visualization in the Pymol Molecular Graphics System. IEEE ... INTERNATIONAL CONFERENCE ON COMPUTATIONAL ADVANCES IN BIO AND MEDICAL SCIENCES : [PROCEEDINGS] 2011; 2011:13-18. [PMID: 27818847 PMCID: PMC5091839 DOI: 10.1109/iccabs.2011.5729867] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.2]
Abstract
Clustering algorithms are used as data analysis tools in a wide variety of applications in biology. Clustering has become especially important in protein structure prediction and virtual high-throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high-throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is composed and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster uses the PyMOL Molecular Graphics System to depict dendrograms graphically in three dimensions. This allows simultaneous display of the relevant biological molecules as well as additional information about the clusters and their members.
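The hierarchical agglomerative step that produces a dendrogram can be sketched in a few lines of Python. This is a generic average-linkage implementation over a precomputed distance matrix (for example, pairwise RMSDs between protein models); the linkage criterion and the toy distances are assumptions for illustration, and this is not the bcl::Cluster code. Each merge record corresponds to one internal node of the dendrogram.

```python
def average_linkage(dist):
    """Agglomerative clustering with average linkage.

    dist: symmetric matrix (list of lists) of pairwise distances.
    Returns a list of merges (members_a, members_b, height), one per dendrogram node.
    """
    clusters = {i: [i] for i in range(len(dist))}  # cluster id -> member indices
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # Average distance over all cross-cluster member pairs.
                d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, ca, cb = best
        merges.append((sorted(clusters[ca]), sorted(clusters[cb]), d))
        clusters[next_id] = clusters.pop(ca) + clusters.pop(cb)
        next_id += 1
    return merges

# Toy distance matrix for four models: 0 and 1 are similar, 2 and 3 are similar.
D = [
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 5.5, 6.5],
    [5.0, 5.5, 0.0, 2.0],
    [6.0, 6.5, 2.0, 0.0],
]
for a, b, h in average_linkage(D):
    print(a, b, h)
```

This naive version recomputes every inter-cluster distance at each step, so it only suits small inputs; thousands of models or millions of molecules call for an optimized library implementation.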
5
DeBartolo J, Hocky G, Wilde M, Xu J, Freed KF, Sosnick TR. Protein structure prediction enhanced with evolutionary diversity: SPEED. Protein Sci 2010; 19:520-34. [PMID: 20066664 DOI: 10.1002/pro.330] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4]
Abstract
For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) are often used in template-based modeling of protein structure and have been incorporated into fragment-based assembly methods. Our previous homology-free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Each move in the Monte Carlo procedure changes only a single pair of backbone dihedral angles (φ, ψ), drawn from a Protein Data Bank-based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, this substantially improves the accuracy of the predicted secondary and tertiary structures, and a global and position-resolved measure of confidence in the accuracy of the predictions is introduced. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.
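A single move of this kind of Monte Carlo search can be sketched as follows: pick one residue, redraw its (φ, ψ) pair from a residue-specific table, and accept or reject with the Metropolis criterion. The sampling table, the toy energy function and the temperature are hypothetical placeholders (the method described above conditions the distribution on the flanking residues and, in SPEED, on MSA information); the sketch only shows the move-and-accept structure.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: always accept downhill moves, uphill with exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def mc_step(phi_psi, sequence, sample_table, energy, kT, rng):
    """Resample the backbone dihedral pair of one residue and apply Metropolis.

    phi_psi: list of (phi, psi) tuples in degrees, one per residue.
    sample_table: hypothetical dict residue type -> list of candidate (phi, psi) pairs.
    energy: hypothetical callable mapping a phi_psi list to an energy.
    """
    i = rng.randrange(len(sequence))
    trial = list(phi_psi)
    trial[i] = rng.choice(sample_table[sequence[i]])
    delta = energy(trial) - energy(phi_psi)
    if metropolis_accept(delta, kT, rng):
        return trial, True
    return phi_psi, False

# Toy setup: this hypothetical "energy" simply rewards helical (-60, -45) angles.
table = {"A": [(-60.0, -45.0), (-120.0, 130.0)], "G": [(80.0, 10.0), (-60.0, -45.0)]}

def toy_energy(angles):
    return sum(abs(phi + 60.0) + abs(psi + 45.0) for phi, psi in angles) / 100.0

rng = random.Random(0)
state = [(-120.0, 130.0), (80.0, 10.0), (-120.0, 130.0)]
seq = "AGA"
for _ in range(200):
    state, _accepted = mc_step(state, seq, table, toy_energy, kT=0.1, rng=rng)
print(state, toy_energy(state))
```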
Affiliation(s)
- Joe DeBartolo
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, USA
6
Jha AN, Vishveshwara S. Inter-helical interactions in membrane proteins: analysis based on the local backbone geometry and the side chain interactions. J Biomol Struct Dyn 2009; 26:719-29. [PMID: 19385700 DOI: 10.1080/07391102.2009.10507284] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3]
Abstract
The availability of a significant number of helical membrane protein structures has prompted us to investigate the mode of helix-helix packing. In the present study, we have considered a dataset of alpha-helical membrane proteins representing structures solved from all the known superfamilies. We have described the geometry of all the helical residues in terms of local coordinate axes at the backbone level. Significant inter-helical interactions have been considered as contacts by weighting the number of atom-atom contacts, including all side-chain atoms. Such a definition of the local axis and the contact criterion has allowed us to investigate inter-helical interactions in a systematic and quantitative manner. We show that a single parameter (designated alpha), derived from the parameters representing the mutual orientation of the local axes, accurately captures the details of helix-helix interaction. The analysis has been carried out by dividing the dataset into parallel, anti-parallel, and perpendicular orientations of helices. The study indicates that a specific range of alpha values is preferred for interactions among anti-parallel helices. Such a preference is also seen among interacting residues of parallel helices, although to a lesser extent. No such preference is seen for perpendicular helices, whose contacts arise mainly from the interaction of surface helices with the ends of the trans-membrane helices. The study supports the prevailing view that anti-parallel helices are well packed. However, the interactions between helices of parallel orientation are non-trivial. The packing in alpha-helical membrane proteins, which is systematically and rigorously investigated in this study, may prove useful in modeling helical membrane proteins.
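Two ingredients of this kind of analysis, the relative orientation of helix axes and an atom-atom contact count, can be sketched in Python. The 4.5 Å contact cutoff and the angle thresholds used to label a pair as parallel, anti-parallel or perpendicular are illustrative assumptions, and the plain interaxial angle here is only a stand-in: it is not the authors' alpha parameter, whose definition is given in the paper.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two 3D axis vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos_t = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding errors
    return math.degrees(math.acos(cos_t))

def orientation(axis1, axis2, tol=45.0):
    """Label a helix pair by the angle between their N-to-C axes (illustrative thresholds)."""
    theta = angle_between(axis1, axis2)
    if theta < tol:
        return "parallel"
    if theta > 180.0 - tol:
        return "anti-parallel"
    return "perpendicular"

def count_contacts(atoms1, atoms2, cutoff=4.5):
    """Number of atom pairs (one from each residue) closer than cutoff (in Å)."""
    return sum(1 for a in atoms1 for b in atoms2 if math.dist(a, b) < cutoff)

print(orientation((0, 0, 1), (0, 0, -1)))
print(count_contacts([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (10, 0, 0)]))
```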
Affiliation(s)
- Anupam Nath Jha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
7
Miklós I, Novák Á, Satija R, Lyngsø R, Hein J. Stochastic models of sequence evolution including insertion-deletion events. Stat Methods Med Res 2009; 18:453-85. [DOI: 10.1177/0962280208099500] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3]
Abstract
Comparison of sequences that have descended from a common ancestor, based on an explicit stochastic model of substitutions, insertions and deletions, has risen to prominence in the last decade. Making statements about the positions of insertions and deletions (indels) is central in sequence and genome analysis and is called alignment. This statistical approach is harder, conceptually and computationally, than competing approaches that choose an alignment according to some optimality criterion, but it has major practical advantages for testing evolutionary hypotheses and estimating parameters. Basic dynamic programming approaches allow the analysis of up to 4-5 sequences; MCMC techniques can extend this to about 10-15 sequences. Beyond this, different or heuristic approaches must be used. Besides the computational challenges, increasing the realism of the underlying models is presently being addressed. A recent development that has been especially fruitful is combining statistical alignment with sequence annotation, i.e. making statements about the function of each nucleotide or amino acid. So far, gene finding, protein secondary structure prediction and regulatory signal detection have been tackled within this framework. Much progress can be reported, but clearly major challenges remain if this approach is to become central in the analysis of large incoming sequence data sets.
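Statistical alignment sums over alignments instead of committing to a single optimal one. The simplest instance of that idea is the forward algorithm for a three-state pair hidden Markov model (match, x against a gap, y against a gap), sketched below in Python following the standard textbook recursion. It is a deliberately simplified stand-in for evolutionary indel models such as TKF91: the emission and transition probabilities are arbitrary illustrative values, not estimated parameters.

```python
def match_emission(a, b, p_same=0.2):
    """Joint probability of emitting the aligned pair (a, b) over a 4-letter alphabet."""
    # The 4 identical pairs share 4 * p_same; the rest is spread over the 12 mismatches.
    return p_same if a == b else (1.0 - 4 * p_same) / 12

def pair_hmm_forward(x, y, delta=0.1, eps=0.3, tau=0.05, q=0.25):
    """Total probability P(x, y) summed over all alignments (three-state pair HMM).

    delta: gap-open probability, eps: gap-extend probability,
    tau: end probability, q: emission probability of a residue against a gap.
    """
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]  # last column aligns x_i with y_j
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]  # last column is x_i against a gap
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]  # last column is y_j against a gap
    fM[0][0] = 1.0  # the begin state behaves like the match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = match_emission(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - eps - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1])
                )
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return tau * (fM[n][m] + fX[n][m] + fY[n][m])

print(pair_hmm_forward("ACGT", "ACGT"), pair_hmm_forward("ACGT", "TGCA"))
```

For longer sequences the recursion should run in log space or with scaling to avoid underflow, and posterior probabilities for individual alignment columns follow from combining this forward pass with a matching backward pass.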
Affiliation(s)
- István Miklós
- Bioinformatics Group, Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences, 1053 Budapest, Reáltanoda u. 13-15, Hungary; Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK; Data Mining and Search Research Group, Computer and Automation Institute, Hungarian Academy of Sciences, 1111 Budapest, Lágymányosi u. 11., Hungary
- Ádám Novák
- Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK
- Rahul Satija
- Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK
- Rune Lyngsø
- Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK
- Jotun Hein
- Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK
8
Miklós I, Novák A, Dombai B, Hein J. How reliably can we predict the reliability of protein structure predictions? BMC Bioinformatics 2008; 9:137. [PMID: 18315874 PMCID: PMC2324098 DOI: 10.1186/1471-2105-9-137] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 12/06/2007] [Accepted: 03/03/2008]
Abstract
Background: Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has gone into improving the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences. Results: Stochastic sequence alignment methods define a posterior distribution of possible multiple alignments. They can highlight the most likely alignment and, beyond that, give posterior probabilities for each alignment column. We carried out a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation differs between protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function was obtained when predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between the Cα atoms of aligned amino acids in superimposed tertiary structures. Conclusion: Alignment posterior probabilities can be used to detect errors in comparative models a priori, at the level of the sequence alignment.
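Asking whether predicted reliability follows the identity function against observed accuracy amounts to a calibration (reliability) check. The Python sketch below bins per-position predictions by their posterior probability, reports the observed fraction correct in each bin, and summarizes the deviation from the identity line as a count-weighted mean absolute gap. The bin count, the summary statistic and the toy data are assumptions for illustration, not the protocol of the study.

```python
def calibration_table(posteriors, correct, n_bins=5):
    """Group predictions into equal-width posterior bins.

    posteriors: probabilities in [0, 1] attached to each prediction.
    correct: booleans, True where the prediction matched the reference structure.
    Returns a list of (bin_center, mean_posterior, fraction_correct, count).
    """
    bins = [[] for _ in range(n_bins)]
    for p, ok in zip(posteriors, correct):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[k].append((p, ok))
    table = []
    for k, members in enumerate(bins):
        if not members:
            continue
        center = (k + 0.5) / n_bins
        mean_p = sum(p for p, _ in members) / len(members)
        frac = sum(1 for _, ok in members if ok) / len(members)
        table.append((center, mean_p, frac, len(members)))
    return table

def calibration_error(table):
    """Count-weighted mean |mean posterior - observed accuracy|; 0 means on the identity line."""
    total = sum(row[3] for row in table)
    return sum(n * abs(mean_p - frac) for _, mean_p, frac, n in table) / total

# Toy data: confident predictions are all right, unconfident ones all wrong.
posteriors = [0.95] * 4 + [0.15] * 4
correct = [True] * 4 + [False] * 4
table = calibration_table(posteriors, correct)
print(table, calibration_error(table))
```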
Affiliation(s)
- István Miklós
- Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK.
9
A historical perspective of template-based protein structure prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2008; 413:3-42. [PMID: 18075160 DOI: 10.1007/978-1-59745-574-9_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6]
Abstract
This chapter presents a broad, historical overview of the problem of protein structure prediction. Different structure prediction methods, including homology modeling, fold recognition (FR)/protein threading, ab initio/de novo approaches, and hybrid techniques combining multiple types of approaches, are introduced in a historical context. The progress of the field as a whole, especially in the threading/FR area, as reflected by the CASP/CAFASP contests, is reviewed. At the end of the chapter, we discuss the challenging issues ahead in the field of protein structure prediction.
10
Abstract
I illustrate the use of the replica exchange molecular dynamics (REMD) algorithm to study the folding of protein A, a small (57 amino acids) protein that folds into a three-helix bundle. REMD is a trivially parallel method that uses multiple copies of the system of interest to study equilibrium properties of the canonical ensemble. Each replica represents a different thermodynamic state, usually a different temperature. This method enhances the configurational sampling of proteins and allows us to study folding in simulations that are much shorter than the folding timescale of the system at ambient temperature. I show that, using REMD and the Amber force field, I can obtain stable configurations of protein A whose backbone root mean square deviation (RMSD) is within 0.17 nm of the nuclear magnetic resonance (NMR)-determined structure, without biasing the system toward the folded structure. The simulations are done in explicit solvent, starting from nearly extended configurations. This calculation shows that currently available force fields and enhanced sampling methods perform reasonably well in describing the folded structure of small proteins.
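The exchange step at the heart of REMD is short. The Python sketch below builds a geometrically spaced temperature ladder (a common choice, assumed here rather than taken from the chapter) and applies the standard Metropolis criterion for swapping configurations between neighboring replicas, accepting with probability min(1, exp[(β_i - β_j)(E_i - E_j)]). Energies are in kJ/mol; the molecular dynamics between exchange attempts is omitted.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def geometric_ladder(t_min, t_max, n):
    """n temperatures (K) spaced geometrically between t_min and t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** k for k in range(n)]

def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance for exchanging configurations between replicas i and j."""
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    arg = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if arg >= 0 else math.exp(arg)

def attempt_neighbor_swaps(energies, temps, rng, offset=0):
    """Try swaps between pairs (k, k+1) starting at offset; returns the new order.

    energies[k] is the potential energy of the configuration currently at temps[k];
    order[k] in the result is the index of the configuration now at temps[k].
    """
    order = list(range(len(temps)))
    e = list(energies)
    for k in range(offset, len(temps) - 1, 2):
        if rng.random() < swap_probability(e[k], e[k + 1], temps[k], temps[k + 1]):
            order[k], order[k + 1] = order[k + 1], order[k]
            e[k], e[k + 1] = e[k + 1], e[k]
    return order

print(geometric_ladder(300.0, 450.0, 4))
```

Alternating `offset` between 0 and 1 on successive attempts lets configurations diffuse along the whole ladder.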
Affiliation(s)
- Angel E Garcia
- Department of Physics, Applied Physics and Astronomy, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
11
Wang F, Stuart SJ, Latour RA. Calculation of adsorption free energy for solute-surface interactions using biased replica-exchange molecular dynamics. Biointerphases 2008; 3:9-18. [PMID: 19768127 PMCID: PMC2746080 DOI: 10.1116/1.2840054] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0]
Abstract
The adsorption behavior of a biomolecule, such as a peptide or protein, to a functionalized surface is of fundamental importance for a broad range of applications in biotechnology. The adsorption free energy for these types of interactions can be determined from a molecular dynamics simulation using the partitioning between adsorbed and nonadsorbed states, provided that sufficient sampling of both states is obtained. However, if interactions between the solute and the surface are strong, the solute will tend to be trapped near the surface during the simulation, thus preventing the adsorption free energy from being calculated by this method. This situation occurs even when using an advanced sampling algorithm such as replica-exchange molecular dynamics (REMD). In this paper, the authors demonstrate the fundamental basis of this problem using a model system consisting of one sodium ion (Na+) as the solute positioned over a surface functionalized with one negatively charged group (COO−) in explicit water. With this simple system, the authors show that sufficient sampling in the coordinate normal to the surface cannot be obtained by conventional REMD alone. The authors then present a method to overcome this problem through the use of an adaptive windowed-umbrella sampling technique to develop a biased-energy function that is combined with REMD. This approach provides an effective method for the calculation of adsorption free energy for solute-surface interactions.
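The partitioning idea can be written as ΔG_ads = -kT ln(P_ads / P_bulk), where P_ads and P_bulk are the probabilities of finding the solute in the adsorbed and nonadsorbed regions. When those probabilities come from a simulation run with a bias potential W(z) along the surface-normal coordinate z, each sample must be reweighted by exp(+W(z)/kT) to recover unbiased populations. The Python sketch below assumes a simple two-region split at a hypothetical cutoff `z_ads` and ignores region-volume corrections; it illustrates the reweighting step and is not the authors' exact free-energy definition.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def adsorption_free_energy(z_samples, bias, z_ads, temperature):
    """Two-state adsorption free energy (kJ/mol) from possibly biased samples of z.

    z_samples: solute height above the surface for each saved frame (nm).
    bias: callable W(z) in kJ/mol that was added during sampling (return 0.0 for none).
    z_ads: hypothetical cutoff; frames with z < z_ads count as adsorbed.
    """
    kT = K_B * temperature
    w_ads = w_bulk = 0.0
    for z in z_samples:
        # Biased sampling gives p_b(z) ∝ p(z) exp(-W/kT), so undo it with exp(+W/kT).
        weight = math.exp(bias(z) / kT)
        if z < z_ads:
            w_ads += weight
        else:
            w_bulk += weight
    return -kT * math.log(w_ads / w_bulk)

print(adsorption_free_energy([0.3, 0.3, 0.3, 1.5], lambda z: 0.0, 0.6, 300.0))
```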
Affiliation(s)
- Feng Wang
- Department of Bioengineering, Clemson University, Clemson, South Carolina 29634
- Steven J. Stuart
- Department of Chemistry, Clemson University, Clemson, South Carolina 29634
- Robert A. Latour
- Department of Bioengineering, Clemson University, Clemson, South Carolina 29634
12
Shmygelska A, Hoos HH. An adaptive bin framework search method for a beta-sheet protein homopolymer model. BMC Bioinformatics 2007; 8:136. [PMID: 17451609 PMCID: PMC1894818 DOI: 10.1186/1471-2105-8-136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/12/2007] [Accepted: 04/24/2007]
Abstract
Background: The problem of protein structure prediction consists of predicting the functional or native structure of a protein given its linear sequence of amino acids. This problem has played a prominent role in the fields of biomolecular physics and algorithm design for over 50 years. Additionally, its importance increases continually as a result of the exponential growth in the number of known protein sequences, in contrast to the linear increase in the number of determined structures. Our work focuses on searching an exponentially large space of possible conformations as efficiently as possible, with the goal of finding a global optimum with respect to a given energy function. This problem plays an important role in the analysis of systems with complex search landscapes, particularly in the context of ab initio protein structure prediction. Results: In this work, we introduce a novel approach for solving this conformation search problem based on a bin framework for adaptively storing and retrieving promising locally optimal solutions. Our approach provides a rich and general framework within which a broad range of adaptive or reactive search strategies can be realized. Here, we introduce adaptive mechanisms for choosing which conformations should be stored, based on the set of conformations already stored in memory, and for biasing choices when retrieving conformations from memory in order to overcome search stagnation. Conclusion: We show that our bin framework combined with a widely used optimization method, Monte Carlo search, achieves significantly better performance than state-of-the-art generalized ensemble methods for a well-known protein-like homopolymer model on the face-centered cubic lattice.
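The general idea of a memory that adaptively stores and retrieves locally optimal conformations can be illustrated with a small Python class. Everything here is a hypothetical simplification, not the authors' framework: conformations are grouped into bins by energy, each bin keeps at most `capacity` entries (replacing its worst entry when a better one arrives), and retrieval is biased toward low-energy bins that have been used less often, which is one conceivable way to restart a stagnating search from a different region.

```python
import math
import random
from collections import defaultdict

class BinMemory:
    """Hypothetical memory of locally optimal conformations, grouped into energy bins."""

    def __init__(self, bin_width=1.0, capacity=5, rng=None):
        self.bin_width = bin_width
        self.capacity = capacity
        self.bins = defaultdict(list)       # bin index -> list of (energy, conformation)
        self.retrievals = defaultdict(int)  # bin index -> number of times retrieved
        self.rng = rng or random.Random()

    def _key(self, energy):
        return math.floor(energy / self.bin_width)

    def store(self, energy, conformation):
        """Store a local optimum; returns True if the memory changed."""
        bucket = self.bins[self._key(energy)]
        if any(c == conformation for _, c in bucket):
            return False  # already remembered
        if len(bucket) < self.capacity:
            bucket.append((energy, conformation))
            return True
        worst = max(range(len(bucket)), key=lambda i: bucket[i][0])
        if energy < bucket[worst][0]:  # bin full: replace its worst entry if improved
            bucket[worst] = (energy, conformation)
            return True
        return False

    def retrieve(self):
        """Pick a stored (energy, conformation), favoring low-energy, rarely used bins."""
        keys = [k for k, b in self.bins.items() if b]
        if not keys:
            return None
        lowest = min(keys)
        weights = [math.exp(-(k - lowest)) / (1 + self.retrievals[k]) for k in keys]
        key = self.rng.choices(keys, weights=weights, k=1)[0]
        self.retrievals[key] += 1
        return self.rng.choice(self.bins[key])
```

In a search loop, each Monte Carlo run would call `store` on the local optima it reaches and `retrieve` to restart once no improvement has been seen for a while; those triggers are left out of the sketch.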
Affiliation(s)
- Alena Shmygelska
- Department of Structural Biology, Stanford University, 299 W. Campus Dr., Stanford, CA 94305, USA
- Holger H Hoos
- Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC V6T 1Z4, Canada
13
Abstract
This review presents advances in protein structure prediction from the perspective of computational methods. The approaches are classified into four major categories: comparative modeling, fold recognition, first-principles methods that employ database information, and first-principles methods without database information. Important advances, along with current limitations and challenges, are presented.
Affiliation(s)
- C A Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA.
14
Kurgan L, Kedarisetti KD. Sequence representation and prediction of protein secondary structure for structural motifs in twilight zone proteins. Protein J 2007; 25:463-74. [PMID: 17115254 DOI: 10.1007/s10930-006-9029-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1]
Abstract
Characterizing and classifying regularities in protein structure is an important element in uncovering the mechanisms that regulate protein structure, function and evolution. Recent research concentrates on the analysis of structural motifs that can be used to describe larger, fold-sized structures based on homologous primary sequences. At the same time, the accuracy of secondary structure prediction based on multiple sequence alignment drops significantly when low-homology (twilight zone) sequences are considered. To this end, this paper addresses the problem of providing an alternative sequence representation that would improve the ability to distinguish secondary structures for twilight zone sequences without using alignment. We consider a novel classification problem in which structural motifs, referred to as structural fragments (SFs), are defined as uniform strand, helix and coil fragments. Classification of SFs allows us to design novel sequence representations and to investigate which other factors and prediction algorithms may improve discrimination. Comprehensive experimental results show that a statistically significant improvement in classification accuracy can be achieved by (1) improving sequence representations and (2) removing possible noise on the terminal residues of the SFs. Combining these two approaches reduces the error rate by 15% on average when compared to classification using the standard representation and noisy information on the terminal residues, bringing the classification accuracy to over 70%. Finally, we show that certain prediction algorithms, such as neural networks and boosted decision trees, are superior to other algorithms.
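The two ingredients credited with the improvement, an alignment-free sequence representation and the removal of possibly noisy terminal residues, can be illustrated with a small Python sketch. The representation used here (amino acid composition of the trimmed fragment plus its original length) and the number of trimmed residues are assumptions chosen for illustration; the paper's actual representations and classifiers are not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def trim_terminals(fragment, k=1):
    """Drop k residues from each end, unless that would leave nothing."""
    if len(fragment) <= 2 * k:
        return fragment
    return fragment[k:len(fragment) - k]

def composition_features(fragment, k=1):
    """20 amino acid frequencies of the trimmed fragment, followed by its original length."""
    core = trim_terminals(fragment, k)
    counts = [core.count(aa) for aa in AMINO_ACIDS]
    return [c / len(core) for c in counts] + [len(fragment)]

print(composition_features("MAAAK"))
```

A feature vector like this could then be passed to any standard classifier, such as the neural networks or boosted decision trees mentioned in the abstract.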
Affiliation(s)
- Lukasz Kurgan
- Electrical and Computer Engineering Department, University of Alberta, Edmonton, Alberta, Canada, T6G 2V4.
15
McAllister SR, Mickus BE, Klepeis JL, Floudas CA. Novel approach for alpha-helical topology prediction in globular proteins: generation of interhelical restraints. Proteins 2007; 65:930-52. [PMID: 17029234 DOI: 10.1002/prot.21095] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9]
Abstract
The protein folding problem represents one of the most challenging problems in computational biology. Distance constraints and topology predictions can be highly useful for the folding problem because they reduce the conformational space that must be searched by deterministic algorithms to find a protein structure of minimum conformational energy. We present a novel optimization framework for predicting topological contacts and generating interhelical distance restraints between hydrophobic residues in alpha-helical globular proteins. It should be emphasized that since the model does not make assumptions about the form of the helices, it is applicable to all alpha-helical proteins, including helices with kinks and irregular helices. This model aims at enhancing the ASTRO-FOLD protein folding approach of Klepeis and Floudas (Journal of Computational Chemistry 2003;24:191-208), which finds the structure of global minimum conformational energy via a constrained nonlinear optimization problem. The proposed topology prediction model was evaluated on 26 alpha-helical proteins ranging from 2 to 8 helices and 35 to 159 residues, and the best identified average interhelical distances corresponding to the predicted contacts fell below 11 Å in all 26 of these systems. Given the positive results of applying the model to several protein systems, the importance of interhelical hydrophobic-to-hydrophobic contacts in determining the folding of alpha-helical globular proteins is highlighted.
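The evaluation criterion quoted above, the average interhelical distance over predicted contacts, is straightforward to compute once coordinates are available. The Python sketch below averages Cα-Cα distances over a list of predicted residue pairs and compares the result with the 11 Å threshold; measuring between Cα atoms and the toy coordinates are assumptions, and the optimization model that produces the contacts is not reproduced.

```python
import math

def mean_contact_distance(ca_coords, contacts):
    """Average Cα-Cα distance (Å) over predicted residue pairs (i, j)."""
    dists = [math.dist(ca_coords[i], ca_coords[j]) for i, j in contacts]
    return sum(dists) / len(dists)

# Hypothetical Cα coordinates (Å) keyed by residue number, and two predicted contacts.
ca = {3: (0.0, 0.0, 0.0), 17: (8.0, 0.0, 0.0), 10: (0.0, 0.0, 1.5), 24: (0.0, 10.0, 1.5)}
predicted = [(3, 17), (10, 24)]
avg = mean_contact_distance(ca, predicted)
print(f"{avg:.1f} Å", "below 11 Å" if avg < 11.0 else "not below 11 Å")
```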
Affiliation(s)
- S R McAllister
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
16
Moult J. Rigorous performance evaluation in protein structure modelling and implications for computational biology. Philos Trans R Soc Lond B Biol Sci 2006; 361:453-8. [PMID: 16524833 PMCID: PMC1609338 DOI: 10.1098/rstb.2005.1810] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5]
Abstract
In principle, given the amino acid sequence of a protein, it is possible to compute the corresponding three-dimensional structure. Methods for modelling structure based on this premise have been under development for more than 40 years. For the past decade, a series of community-wide experiments (Critical Assessment of Structure Prediction, CASP) have assessed the state of the art, providing a detailed picture of what has been achieved in the field, where progress is being made, and what major problems remain. The rigorous evaluation procedures of CASP have been accompanied by substantial progress. Lessons from this area of computational biology suggest a set of principles for increasing rigor in the field as a whole.
Affiliation(s)
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, USA.
17
Fang Q, Shortle D. Enhanced sampling near the native conformation using statistical potentials for local side-chain and backbone interactions. Proteins 2006; 60:97-102. [PMID: 15852306 DOI: 10.1002/prot.20483] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4]
Abstract
In the preceding article in this issue of Proteins, an empirical energy function consisting of 4 statistical potentials that quantify local side-chain-backbone and side-chain-side-chain interactions was shown to successfully identify the native conformations of short sequence fragments and the native structure within large sets of high-quality decoys. Because this energy function consists entirely of interactions between residues separated by fewer than 5 positions, it can be used at the earliest stage of ab initio structure prediction to enhance the efficiency of conformational search. In this article, protein fragments are generated de novo by recombining very short segments of protein structures (2, 4, or 6 residues), either selected at random or optimized with respect to this local energy function. When local energy is optimized in selected fragments, more efficient sampling of conformational space near the native conformation is consistently observed for 450 randomly selected single-turn fragments, with turn lengths varying from 3 to 12 residues and all 4 combinations of flanking secondary structure. These results further demonstrate the energetic significance of local interactions in protein conformations. When used in combination with longer-range energy functions, these potentials should lead to more accurate prediction of protein structure.
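The contrast between recombining short segments at random and choosing them to optimize a local energy can be sketched in Python. The segment library (2-residue strings of hypothetical backbone-state labels "A", "B" and "L"), the toy pairwise energy, which scores only residues fewer than 5 positions apart to echo the locality described above, and the greedy segment choice are all hypothetical simplifications; the statistical potentials themselves are not reproduced.

```python
import random

def local_energy(states, window=5):
    """Toy local energy over pairs fewer than `window` positions apart; like states attract."""
    e = 0.0
    for i in range(len(states)):
        for j in range(i + 1, min(i + window, len(states))):
            e += -1.0 if states[i] == states[j] else 1.0
    return e

def assemble(library, n_segments, rng=None):
    """Concatenate segments: random choice if rng is given, else greedy local-energy minimization."""
    states = ""
    for _ in range(n_segments):
        if rng is not None:
            states += rng.choice(library)
        else:
            states += min(library, key=lambda seg: local_energy(states + seg))
    return states

lib = ["AA", "BB", "AB", "LL"]
greedy = assemble(lib, 3)
rand = assemble(lib, 3, random.Random(0))
print(greedy, local_energy(greedy), rand, local_energy(rand))
```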
Affiliation(s)
- Qiaojun Fang
- Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
18
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7]
19
Garcia AE, Herce H, Paschek D. Chapter 5 Simulations of Temperature and Pressure Unfolding of Peptides and Proteins with Replica Exchange Molecular Dynamics. ANNUAL REPORTS IN COMPUTATIONAL CHEMISTRY 2006. [DOI: 10.1016/s1574-1400(06)02005-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1]
20
Moult J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 2005; 15:285-9. [PMID: 15939584 DOI: 10.1016/j.sbi.2005.05.011] [Citation(s) in RCA: 302] [Impact Index Per Article: 15.9] [Received: 04/25/2005] [Revised: 04/29/2005] [Accepted: 05/09/2005]
Abstract
For the past ten years, CASP (Critical Assessment of Structure Prediction) has monitored the state of the art in modeling protein structure from sequence. During this period, there has been substantial progress in both comparative modeling of structure (using information from an evolutionarily related structural template) and template-free modeling. The quality of comparative models depends on the closeness of the evolutionary relationship on which they are based. Template-free modeling, although still very approximate, now produces topologically near-correct models for some small proteins. Current major challenges are refining comparative models so that they match experimental accuracy, obtaining accurate sequence alignments for models based on remote evolutionary relationships, and extending template-free modeling methods so that they produce more accurate models, handle parts of comparative models not available from a template, and deal with larger structures.
Affiliation(s)
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, USA
21
Wu Y, Chen M, Lu M, Wang Q, Ma J. Determining Protein Topology from Skeletons of Secondary Structures. J Mol Biol 2005; 350:571-86. [PMID: 15961102 DOI: 10.1016/j.jmb.2005.04.064] [Received: 11/24/2004] [Revised: 04/24/2005] [Accepted: 04/27/2005]
Abstract
We report a novel computational procedure for determining protein native topology, or fold, by defining loop connectivity based on skeletons of secondary structures that can usually be obtained from low to intermediate-resolution density maps. The procedure primarily involves a knowledge-based geometry filter followed by an energetics-based evaluation. It was tested on a large set of skeletons covering a wide range of protein architecture, including one modeled from an experimentally determined 7.6 Å cryo-electron microscopy (cryo-EM) density map. The results showed that the new procedure could effectively deduce protein folds without high-resolution structural data, a feature that could also be used to recognize native fold in structure prediction and to interpret data in fields like structure genomics. Most importantly, in the energetics-based evaluation, it was revealed that, despite the inevitable errors in the artificially constructed structures and limited accuracy of knowledge-based potential functions, the average energy of an ensemble of structures with slightly different configurations around the native skeleton is a much more robust parameter for marking native topology than the energy of individual structures in the ensemble. This result implies that, among all the possible topology candidates for a given skeleton, evolution has selected the native topology as the one that can accommodate the largest structural variations, not the one rigidly trapped in a deep, but narrow, conformational energy well.
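A toy illustration, not the authors' procedure, of why an ensemble-averaged energy can rank candidates differently from the energy of a single structure. The one-dimensional landscape `energy`, the Gaussian perturbations, and the `spread` value are hypothetical stand-ins for perturbed skeleton configurations scored by a knowledge-based potential.

```python
import math
import random


def energy(x: float) -> float:
    # Hypothetical 1D landscape: a deep but narrow well at x = 0
    # and a shallower but broad well at x = 5.
    narrow = -10.0 * math.exp(-(x / 0.1) ** 2)
    broad = -6.0 * math.exp(-((x - 5.0) / 2.0) ** 2)
    return narrow + broad


def ensemble_average_energy(center: float, spread: float = 0.5,
                            n: int = 2000, seed: int = 0) -> float:
    # Average energy over configurations randomly perturbed around one candidate.
    rng = random.Random(seed)
    total = sum(energy(center + rng.gauss(0.0, spread)) for _ in range(n))
    return total / n


candidates = {"narrow_well": 0.0, "broad_well": 5.0}
best_single = min(candidates, key=lambda k: energy(candidates[k]))
best_ensemble = min(candidates, key=lambda k: ensemble_average_energy(candidates[k]))
print(best_single, best_ensemble)  # narrow_well broad_well
```

The single-structure criterion prefers the deep, narrow well, while the ensemble average prefers the broad well that tolerates structural variation, which is the qualitative effect the abstract describes.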
Affiliation(s)
- Yinghao Wu
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
22
Lee J, Kim SY, Lee J. Protein structure prediction based on fragment assembly and parameter optimization. Biophys Chem 2005; 115:209-14. [PMID: 15752606 DOI: 10.1016/j.bpc.2004.12.046] [Received: 06/11/2004] [Revised: 11/09/2004] [Accepted: 12/10/2004]
Abstract
We propose a novel method for ab-initio prediction of protein tertiary structures based on the fragment assembly and global optimization. Fifteen residue long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. Then in order to enhance the performance of the prediction method, we optimize the linear parameters of the energy function, so that the native-like conformations become energetically more favorable than the non-native ones for proteins with known structures. We test the feasibility of the parameter optimization procedure by applying it to the training set consisting of three proteins: the 10-55 residue fragment of staphylococcal protein A (PDB ID 1bdd), a designed protein betanova, and 1fsd.
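The parameter-optimization step asks for linear energy weights under which native conformations score lower than non-native ones. A minimal sketch of that idea, assuming the energy is a weighted sum of precomputed, unweighted energy terms and using a simple perceptron-style margin update as a stand-in (the abstract does not say which optimizer the authors used); `optimize_weights` and the toy term vectors are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def optimize_weights(proteins: List[Tuple[Vector, List[Vector]]], w: Vector,
                     lr: float = 0.1, margin: float = 1.0,
                     epochs: int = 100) -> Vector:
    """proteins: list of (native_terms, [decoy_terms, ...]).

    Each terms vector holds the unweighted energy components of one
    conformation, so the total energy E = w . terms is linear in w.
    """
    w = list(w)
    for _ in range(epochs):
        violations = 0
        for native, decoys in proteins:
            for decoy in decoys:
                # Require E(native) <= E(decoy) - margin.
                if dot(w, native) > dot(w, decoy) - margin:
                    violations += 1
                    # d/dw [E(native) - E(decoy)] = native - decoy; step against it.
                    w = [wi - lr * (n - d) for wi, n, d in zip(w, native, decoy)]
        if violations == 0:
            break
    return w


proteins = [([-3.0, 1.0], [[-1.0, -2.0], [-2.0, 0.0]])]
w = optimize_weights(proteins, [1.0, 1.0])
```

A linear-programming or gradient-based formulation would serve the same goal; the perceptron update is used only because it is short and stops once every native is below every decoy by the margin.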
Affiliation(s)
- Julian Lee
- Department of Bioinformatics and Life Science, Computer Aided Molecular Design Research Center, Bioinformatics and Molecular Design Technology Innovation Center, Soongsil University, Seoul 156-743, South Korea.
23
Floudas CA. Research challenges, opportunities and synergism in systems engineering and computational biology. AIChE J 2005. [DOI: 10.1002/aic.10620]
24
Wen EZ, Hsieh MJ, Kollman PA, Luo R. Enhanced ab initio protein folding simulations in Poisson-Boltzmann molecular dynamics with self-guiding forces. J Mol Graph Model 2004; 22:415-24. [PMID: 15099837 DOI: 10.1016/j.jmgm.2003.12.008]
Abstract
We have investigated the sampling efficiency in molecular dynamics with the PB implicit solvent when self-guiding forces are added. Compared with a high-temperature dynamics simulation, the use of self-guiding forces in room-temperature dynamics is found to be rather efficient as measured by potential energy fluctuation, gyration radius fluctuation, backbone RMSD fluctuation, number of unique clusters, and distribution of low RMSD structures over simulation time. Based on the enhanced sampling method, we have performed ab initio folding simulations of two small proteins, betabetaalpha1 and villin headpiece. The preliminary data for the folding simulations is presented. It is found that betabetaalpha1 folding proceeds by initiation of the turn and the helix. The hydrophobic collapse seems to be lagging behind or at most concurrent with the formation of the helix. The hairpin stability is weaker than the helix in our simulations. Its role in the early folding events seems to be less important than the more stable helix. In contrast, villin headpiece folding proceeds first by hydrophobic collapse. The formation of helices is later than the collapse phase, different from the betabetaalpha1 folding.
Affiliation(s)
- Edward Z Wen
- Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697-3900, USA
25
Pei J, Grishin NV. Combining evolutionary and structural information for local protein structure prediction. Proteins 2004; 56:782-94. [PMID: 15281130 DOI: 10.1002/prot.20158]
Abstract
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. 
This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. A mixture of predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.
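The abstract contrasts combining evolutionary and structural frequencies at the frequency level with combining them at the log-odds score level. A minimal sketch of the two options, assuming a plain linear mixture with a hypothetical `weight` (the paper's pseudocount mixture formula is not reproduced here) and a toy three-letter alphabet with a uniform background.

```python
import math
from typing import Dict

Freqs = Dict[str, float]


def mix_frequencies(evo: Freqs, struct: Freqs, weight: float) -> Freqs:
    # Linear combination at the frequency level; `weight` is the share
    # given to structural frequencies.
    return {aa: (1.0 - weight) * evo[aa] + weight * struct[aa] for aa in evo}


def log_odds(freqs: Freqs, background: Freqs) -> Dict[str, float]:
    return {aa: math.log(freqs[aa] / background[aa]) for aa in freqs}


def mix_log_odds(evo: Freqs, struct: Freqs, background: Freqs,
                 weight: float) -> Dict[str, float]:
    # Linear combination at the log-odds score level, for comparison.
    le, ls = log_odds(evo, background), log_odds(struct, background)
    return {aa: (1.0 - weight) * le[aa] + weight * ls[aa] for aa in evo}


background = {"A": 1 / 3, "L": 1 / 3, "G": 1 / 3}
evo = {"A": 0.8, "L": 0.1, "G": 0.1}
struct = {"A": 0.2, "L": 0.7, "G": 0.1}
freq_level = log_odds(mix_frequencies(evo, struct, 0.5), background)
score_level = mix_log_odds(evo, struct, background, 0.5)
```

With these toy numbers, leucine stays favorable after frequency-level mixing (0.4 against a background of 1/3) but becomes unfavorable after log-odds mixing, because the logarithm of the low evolutionary frequency dominates the average.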
Affiliation(s)
- Jimin Pei
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
26
Lee J, Kim SY, Joo K, Kim I, Lee J. Prediction of protein tertiary structure using PROFESY, a novel method based on fragment assembly and conformational space annealing. Proteins 2004; 56:704-14. [PMID: 15281124 DOI: 10.1002/prot.20150]
Abstract
A novel method for ab initio prediction of protein tertiary structures, PROFESY (PROFile Enumerating SYstem), is proposed. This method utilizes the secondary structure prediction information of a query sequence and the fragment assembly procedure based on global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. We apply PROFESY for benchmark tests to proteins with known structures to demonstrate its feasibility. In addition, we participated in CASP5 and applied PROFESY to four new-fold targets for blind prediction. The results are quite promising, despite the fact that PROFESY was in its early stages of development. In particular, PROFESY successfully provided us the best model-one structure for the target T0161.
Affiliation(s)
- Julian Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
27
Klepeis JL, Wei Y, Hecht MH, Floudas CA. Ab initio prediction of the three-dimensional structure of a de novo designed protein: A double-blind case study. Proteins 2004; 58:560-70. [PMID: 15609306 DOI: 10.1002/prot.20338]
Abstract
Ab initio structure prediction and de novo protein design are two problems at the forefront of research in the fields of structural biology and chemistry. The goal of ab initio structure prediction of proteins is to correctly characterize the 3D structure of a protein using only the amino acid sequence as input. De novo protein design involves the production of novel protein sequences that adopt a desired fold. In this work, the results of a double-blind study are presented in which a new ab initio method was successfully used to predict the 3D structure of a protein designed through an experimental approach using binary patterned combinatorial libraries of de novo sequences. The predicted structure, which was produced before the experimental structure was known and without consideration of the design goals, and the final NMR analysis both characterize this protein as a 4-helix bundle. The similarity of these structures is evidenced by both small RMSD values between the coordinates of the two structures and a detailed analysis of the helical packing.
Affiliation(s)
- John L Klepeis
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
28
Boniecki M, Rotkiewicz P, Skolnick J, Kolinski A. Protein fragment reconstruction using various modeling techniques. J Comput Aided Mol Des 2004; 17:725-38. [PMID: 15072433 DOI: 10.1023/b:jcam.0000017486.83645.a0]
Abstract
Recently developed reduced models of proteins with knowledge-based force fields have been applied to a specific case of comparative modeling. From twenty high resolution protein structures of various structural classes, significant fragments of their chains have been removed and treated as unknown. The remaining portions of the structures were treated as fixed - i.e., as templates with an exact alignment. Then, the missed fragments were reconstructed using several modeling tools. These included three reduced types of protein models: the lattice SICHO (Side Chain Only) model, the lattice CABS (Cα + Cβ + Side group) model and an off-lattice model similar to the CABS model and called REFINER. The obtained reduced models were compared with more standard comparative modeling tools such as MODELLER and the SWISS-MODEL server. The reduced model results are qualitatively better for the higher resolution lattice models, clearly suggesting that these are now mature, competitive and complementary (in the range of sparse alignments) to the classical tools of comparative modeling. Comparison between the various reduced models strongly suggests that the essential ingredient for the successful and accurate modeling of protein structures is not the representation of conformational space (lattice, off-lattice, all-atom) but, rather, the specificity of the force fields used and, perhaps, the sampling techniques employed. These conclusions are encouraging for the future application of the fast reduced models in comparative modeling on a genomic scale.
Affiliation(s)
- Michal Boniecki
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, Warsaw University, Pasteura 1, 02-093 Warsaw, Poland
29
Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Progress in Biophysics and Molecular Biology 2004; 86:235-77. [PMID: 15288760 DOI: 10.1016/j.pbiomolbio.2003.09.003]
Abstract
During the process of protein folding, the amino acid residues along the polypeptide chain interact with each other in a cooperative manner to form the stable native structure. The knowledge about inter-residue interactions in protein structures is very helpful to understand the mechanism of protein folding and stability. In this review, we introduce the classification of inter-residue interactions into short, medium and long range based on a simple geometric approach. The features of these interactions in different structural classes of globular and membrane proteins, and in various folds have been delineated. The development of contact potentials and the application of inter-residue contacts for predicting the structural class and secondary structures of globular proteins, solvent accessibility, fold recognition and ab initio tertiary structure prediction have been evaluated. Further, the relationship between inter-residue contacts and protein-folding rates has been highlighted. Moreover, the importance of inter-residue interactions in protein-folding kinetics and for understanding the stability of proteins has been discussed. In essence, the information gained from the studies on inter-residue interactions provides valuable insights for understanding protein folding and de novo protein design.
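The review classifies inter-residue interactions as short, medium or long range with a simple geometric approach. A minimal sketch of such a classification, assuming one Cα bead per residue, an 8 Å distance cutoff, and sequence-separation thresholds of |i-j| ≤ 2 (short), 3 to 4 (medium) and > 4 (long); these thresholds and the toy hairpin coordinates are illustrative assumptions, so check them against the review before reuse.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def contact_range(separation: int) -> str:
    # Assumed sequence-separation thresholds (illustrative parameters).
    if separation <= 2:
        return "short"
    if separation <= 4:
        return "medium"
    return "long"


def classify_contacts(ca: List[Coord], cutoff: float = 8.0) -> Dict[str, int]:
    # Count residue pairs whose C-alpha atoms lie within `cutoff` angstroms,
    # grouped by how far apart they are along the chain.
    counts = {"short": 0, "medium": 0, "long": 0}
    for i in range(len(ca)):
        for j in range(i + 1, len(ca)):
            if math.dist(ca[i], ca[j]) <= cutoff:
                counts[contact_range(j - i)] += 1
    return counts


# Hypothetical six-residue hairpin in the xy-plane.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
print(classify_contacts(hairpin))  # {'short': 9, 'medium': 3, 'long': 1}
```

Only the contact between the two chain termini counts as long range here; in a real structure, the balance of long-range contacts is the quantity the review relates to folding rates.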
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Aomi Frontier Building 17F, 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan.
30
Kong Y, Zhang X, Baker TS, Ma J. A Structural-informatics approach for tracing beta-sheets: building pseudo-C(alpha) traces for beta-strands in intermediate-resolution density maps. J Mol Biol 2004; 339:117-30. [PMID: 15123425 PMCID: PMC4148645 DOI: 10.1016/j.jmb.2004.03.038] [Received: 12/01/2003] [Revised: 02/03/2004] [Accepted: 03/09/2004]
Abstract
We report the development of two computational methods to assist density map interpretation at intermediate resolutions: sheettracer for building pseudo-C(alpha) models of beta-sheets, and a deconvolution method for enhancing features attributed to major secondary structural elements. Sheettracer is tightly coupled with sheetminer, which was developed to locate sheet densities in intermediate-resolution density maps. The results from sheetminer are used as inputs to sheettracer, which employs a multi-step ad hoc morphological analysis of sheet densities to trace individual strands of beta-sheets. The methods were tested on simulated density maps from 12 protein crystal structures that represent a reasonably complete sampling of sheet morphology. The sheet-tracing results were quantitatively assessed in terms of sensitivity, specificity and rms deviations. Furthermore, sheettracer and the deconvolution method were rigorously tested on experimental maps of the lambda2 protein of reovirus at resolutions of 7.6 Å and 11.8 Å. Our results clearly demonstrate the capability of sheettracer in building pseudo-C(alpha) models of beta-sheets in intermediate-resolution density maps and the power of the deconvolution method in enhancing the performance of sheettracer. These computational methods, along with other related ones, should facilitate recognition and analysis of folding motifs from experimental data at intermediate resolutions.
Affiliation(s)
- Yifei Kong
- Graduate Program of Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, One Baylor Plaza Houston, TX 77030, USA
- Xing Zhang
- Department of Biological Sciences, Purdue University West Lafayette, IN 47907, USA
- Timothy S. Baker
- Department of Biological Sciences, Purdue University West Lafayette, IN 47907, USA
- Jianpeng Ma
- Graduate Program of Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, One Baylor Plaza Houston, TX 77030, USA
- Department of Bioengineering Rice University, Houston, TX 77005, USA
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
31
Klepeis JL, Floudas CA. ASTRO-FOLD: a combinatorial and global optimization framework for Ab initio prediction of three-dimensional structures of proteins from the amino acid sequence. Biophys J 2004; 85:2119-46. [PMID: 14507680 PMCID: PMC1303441 DOI: 10.1016/s0006-3495(03)74640-2]
Abstract
The field of computational biology has been revolutionized by recent advances in genomics. The completion of a number of genome projects, including that of the human genome, has paved the way toward a variety of challenges and opportunities in bioinformatics and biological systems engineering. One of the first challenges has been the determination of the structures of proteins encoded by the individual genes. This problem, which represents the progression from sequence to structure (genomics to structural genomics), has been widely known as the structure-prediction-in-protein-folding problem. We present the development and application of ASTRO-FOLD, a novel and complete approach for the ab initio prediction of protein structures given only the amino acid sequences of the proteins. The approach exhibits many novel components and the merits of its application are examined for a suite of protein systems, including a number of targets from several critical-assessment-of-structure-prediction experiments.
Affiliation(s)
- J L Klepeis
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 10036, USA.
32
Reinhardt A, Eisenberg D. DPANN: Improved sequence to structure alignments following fold recognition. Proteins 2004; 56:528-38. [PMID: 15229885 DOI: 10.1002/prot.20144]
Abstract
In fold recognition (FR) a protein sequence of unknown structure is assigned to the closest known three-dimensional (3D) fold. Although FR programs can often identify among all possible folds the one a sequence adopts, they frequently fail to align the sequence to the equivalent residue positions in that fold. Such failures frustrate the next step in structure prediction, protein model building. Hence it is desirable to improve the quality of the alignments between the sequence and the identified structure. We have used artificial neural networks (ANN) to derive a substitution matrix to create alignments between a protein sequence and a protein structure through dynamic programming (DPANN: Dynamic Programming meets Artificial Neural Networks). The matrix is based on the amino acid type and the secondary structure state of each residue. In a database of protein pairs that have the same fold but lack sequence similarity, DPANN aligns over 30% of all sequences to the paired structure, resembling closely the structural superposition of the pair. In over half of these cases the DPANN alignment is close to the structural superposition, although the initial alignment from the step of fold recognition is not close. Conversely, the alignment created during fold recognition outperforms DPANN in only 10% of all cases. Thus application of DPANN after fold recognition leads to substantial improvements in alignment accuracy, which in turn provides more useful templates for the modeling of protein structures. In the artificial case of using actual instead of predicted secondary structures for the probe protein, over 50% of the alignments are successful.
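DPANN feeds a substitution score that depends on amino acid type and secondary-structure state into dynamic programming. A minimal sketch of that alignment step, assuming global Needleman-Wunsch alignment with a linear gap penalty; `toy_score` is a hypothetical hand-written stand-in for the neural-network-derived matrix, not the published one.

```python
from typing import List, Optional, Tuple

Residue = Tuple[str, str]  # (one-letter amino acid, secondary structure: H/E/C)


def toy_score(a: Residue, b: Residue) -> int:
    # Hypothetical score: reward identical residues and matching SS states.
    return (2 if a[0] == b[0] else -1) + (1 if a[1] == b[1] else -1)


def align(seq: List[Residue], struct: List[Residue], gap: int = -2):
    n, m = len(seq), len(struct)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + toy_score(seq[i - 1], struct[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Traceback into (sequence index, structure index) pairs; None marks a gap.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + toy_score(seq[i - 1], struct[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return F[n][m], pairs[::-1]


template = [("A", "H"), ("L", "H"), ("G", "C")]
print(align([("A", "H"), ("G", "C")], template))  # (4, [(0, 0), (None, 1), (1, 2)])
```

Because the score combines residue identity with secondary-structure agreement, a mismatched residue in a matching helix still scores better than the same mismatch across different states, which is the kind of signal DPANN exploits.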
33
Abstract
A protein model that is simple enough to be used in protein-folding simulations but accurate enough to identify a protein native fold is described. Its geometry consists of describing the residues by one, two, or three pseudoatoms, depending on the residue size. Its energy is given by a pairwise, knowledge-based potential obtained for all the pseudoatoms as a function of their relative distance. The pseudoatomic potential is also a function of the primary chain separation and residue order. The model is tested by gapless threading on a large, representative set of known protein and decoy structures obtained from the "Decoys 'R' Us" database. It is also tested by threading on gapped decoys generated for proteins with many homologs. The gapless threading tests show near 98% native-structure recognition as the lowest energy structure and almost 100% as one of the three lowest energy structures for over 2200 test proteins. In decoy threading tests, the model recognized the majority of the native structures. It is also able to recognize native structures among gapped decoys, in spite of close structural similarities. The results indicate that the pseudoatomic model has native recognition ability similar to comparable atomic-based models but much better than equivalent residue-based models.
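Gapless threading scores a sequence on every contiguous window of each candidate structure and checks whether the native window has the lowest energy. A minimal sketch of that test, assuming one bead per residue and a hypothetical two-letter hydrophobic/polar contact potential; the paper's model instead uses one to three pseudoatoms per residue with a distance-dependent knowledge-based potential.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical HP contact potential: only hydrophobic-hydrophobic contacts score.
POTENTIAL: Dict[Tuple[str, str], float] = {
    ("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0,
}


def threading_energy(seq: str, coords: Sequence[Coord],
                     cutoff: float = 6.5, min_sep: int = 3) -> float:
    # Sum pair energies for residues in spatial contact and at least
    # `min_sep` apart along the chain.
    energy = 0.0
    for i in range(len(seq)):
        for j in range(i + min_sep, len(seq)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                energy += POTENTIAL[(seq[i], seq[j])]
    return energy


def gapless_thread(seq: str, templates: Dict[str, List[Coord]]):
    # Slide `seq` along every template without gaps; return sorted
    # (energy, template name, start index) tuples, lowest energy first.
    hits = []
    for name, coords in templates.items():
        for start in range(len(coords) - len(seq) + 1):
            window = coords[start:start + len(seq)]
            hits.append((threading_energy(seq, window), name, start))
    return sorted(hits)


templates = {
    "native": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)],
    "decoy": [(3.8 * k, 0.0, 0.0) for k in range(6)],
}
hits = gapless_thread("HPPH", templates)
print(hits[0])  # (-1.0, 'native', 0)
```

Native recognition in this sense simply means the native window appears first (or among the top three) in the sorted list.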
Affiliation(s)
- Marcos R Betancourt
- University at Buffalo Center of Excellence in Bioinformatics, Buffalo, New York 14203, USA
34
35
De Sancho D, Prieto L, Rubio AM, Rey A. Evolutionary method for the assembly of rigid protein fragments. J Comput Chem 2004; 26:131-41. [PMID: 15584079 DOI: 10.1002/jcc.20150]
Abstract
Genetic algorithms constitute a powerful optimization method that has already been used in the study of the protein folding problem. However, they often suffer from a lack of convergence in a reasonably short time for complex fitness functions. Here, we propose an evolutionary strategy that can reproducibly find structures close to the minimum of a potential function for a simplified protein model in an efficient way. The model reduces the number of degrees of freedom of the system by treating the protein structure as composed of rigid fragments. The search incorporates a double encoding procedure and a merging operation from subpopulations that evolve independently of one another, both contributing to the good performance of the full algorithm. We have tested it with protein structures of different degrees of complexity, and present our conclusions related to its possible application as an efficient tool for the analysis of folding potentials.
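The search described above evolves subpopulations independently and periodically merges them. A minimal island-model sketch of that structure, assuming real-valued genomes, Gaussian mutation with elitist selection, and a pool-and-redistribute merge; the double encoding and the rigid-fragment protein model are not reproduced, and `sphere` is a hypothetical toy objective standing in for a folding potential.

```python
import random
from typing import Callable, List

Genome = List[float]


def evolve_islands(energy: Callable[[Genome], float], dim: int,
                   n_islands: int = 4, pop_size: int = 10,
                   generations: int = 200, merge_every: int = 25,
                   sigma: float = 0.3, seed: int = 1):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    initial_best = min(energy(g) for pop in islands for g in pop)
    for gen in range(1, generations + 1):
        for k, pop in enumerate(islands):
            # Each island evolves on its own: Gaussian mutation of every parent,
            # then keep the best pop_size of parents + children (elitist).
            children = [[x + rng.gauss(0.0, sigma) for x in g] for g in pop]
            islands[k] = sorted(pop + children, key=energy)[:pop_size]
        if gen % merge_every == 0:
            # Merge step: pool all islands and deal the ranked genomes back out.
            pooled = sorted((g for pop in islands for g in pop), key=energy)
            islands = [pooled[i::n_islands][:pop_size] for i in range(n_islands)]
    best = min((g for pop in islands for g in pop), key=energy)
    return initial_best, best


def sphere(g: Genome) -> float:
    # Toy objective with its minimum (0) at the origin.
    return sum(x * x for x in g)


initial, best = evolve_islands(sphere, dim=4)
print(initial, sphere(best))
```

Elitist selection guarantees the best energy never worsens; the periodic merge lets good solutions found on one island seed the others, which is the role the abstract attributes to merging subpopulations.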
Affiliation(s)
- David De Sancho
- Departamento de Química Física, Facultad de Ciencias Químicas, Universidad Complutense, E-28040 Madrid, Spain
36
Skolnick J, Zhang Y, Arakaki AK, Kolinski A, Boniecki M, Szilágyi A, Kihara D. TOUCHSTONE: A unified approach to protein structure prediction. Proteins 2003; 53 Suppl 6:469-79. [PMID: 14579335 DOI: 10.1002/prot.10551]
Abstract
We have applied the TOUCHSTONE structure prediction algorithm that spans the range from homology modeling to ab initio folding to all protein targets in CASP5. Using our threading algorithm PROSPECTOR that does not utilize input from metaservers, one threads against a representative set of PDB templates. If a template is significantly hit, Generalized Comparative Modeling designed to span the range from closely to distantly related proteins from the template is done. This involves freezing the aligned regions and relaxing the remaining structure to accommodate insertions or deletions with respect to the template. For all targets, consensus predicted side chain contacts from at least weakly threading templates are pooled and incorporated into ab initio folding. Often, TOUCHSTONE performs well in the CM to FR categories, with PROSPECTOR showing significant ability to identify analogous templates. When ab initio folding is done, frequently the best models are closer to the native state than the initial template. Among the particularly good predictions are T0130 in the CM/FR category, T0138 in the FR(H) category, T0135 in the FR(A) category, T0170 in the FR/NF category and T0181 in the NF category. Improvements in the approach are needed in the FR/NF and NF categories. Nevertheless, TOUCHSTONE was one of the best performing algorithms over all categories in CASP5.
Affiliation(s)
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York 14203, USA.
37
Kong Y, Ma J. A structural-informatics approach for mining beta-sheets: locating sheets in intermediate-resolution density maps. J Mol Biol 2003; 332:399-413. [PMID: 12948490 DOI: 10.1016/s0022-2836(03)00859-3]
Abstract
Here, we report a new computational method, called sheetminer, for mining beta-sheets in the density maps at intermediate resolutions of 6 to 10 Å. The method employs a multi-step ad hoc morphological analysis of density maps to identify the unique characteristics of beta-sheets. It was tested on density maps from 12 protein crystal structures that were artificially blurred to intermediate resolutions. There are a total of 35 independent beta-sheets with a wide distribution of morphology. The method successfully located 34 of them and missed only one. The method was also applied to an experimental 9 Å electron cryomicroscopic structure and an 8 Å X-ray density map. In both cases, the sheet-searching results were found to agree very well with known high-resolution crystal structures. Collectively, these results demonstrate clearly the robustness of sheetminer in locating the regions belonging to beta-sheets in the intermediate-resolution density maps. Furthermore, sheetminer is completely complementary to all other existing computational methods, including helixhunter and threading algorithms. Their combined usage has the potential to significantly enhance the computational modeling capacity for a much more complete interpretation of structural data at intermediate resolutions, from which extraction of functional information would be more effective. This is particularly important in the field of structural genomics, in which the fast screening approach may not always yield crystals that diffract to atomic resolution. An exciting future application of sheetminer is as a valuable tool for revealing the structures of amyloid fibrils that are rich in beta-motifs.
Affiliation(s)
- Yifei Kong
- Graduate Program of Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
38
Abstract
We describe a novel procedure for generating and optimizing pattern descriptors that can be used to find structural motifs in DNA or RNA sequences. This combines a pattern-description language (based primarily on secondary structure alignment and conservation of some key nucleotides) with a scoring function that relies heavily on estimated folding free energies for the secondary structure of interest. For the cloverleaf secondary structure characteristic of tRNA, we show that a fairly simple pattern descriptor can find almost all known tRNA genes in both bacterial and eukaryotic genomes, and that false positives (sequences that match the pattern but that are probably not tRNAs) can be recognized by their high estimated folding free energies. A general procedure for optimizing descriptors (and hence for finding new structural motifs) is also described. For six bacterial, four eukaryotic, and four archaea genome sequences, our results compare favorably with those of the more complex and specialized tRNAscan-SE algorithm. Prospects for using this general approach to find other RNA structural motifs are discussed.
Affiliation(s)
- Vickie Tsui
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
39
Haspel N, Tsai CJ, Wolfson H, Nussinov R. Hierarchical protein folding pathways: a computational study of protein fragments. Proteins 2003; 51:203-15. [PMID: 12660989 DOI: 10.1002/prot.10294] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/11/2022]
Abstract
We have previously presented a building block folding model. The model postulates that protein folding is a hierarchical top-down process. The basic unit from which a fold is constructed, referred to as a hydrophobic folding unit, is the outcome of combinatorial assembly of a set of "building blocks." Results obtained by the computational cutting procedure yield fragments that are in agreement with those obtained experimentally by limited proteolysis. Here we show that, as expected, proteins from the same family give very similar building blocks. However, different proteins can also give building blocks that are similar in structure. In such cases the building blocks differ in sequence, stability, contacts with other building blocks, and in their 3D locations in the protein structure. This result, which we have observed repeatedly, leads us to conclude that although a building block is influenced by its environment, it can nevertheless be viewed as a stand-alone unit. For small building blocks existing in multiple conformations, interactions with sister building blocks in the protein will increase the population time of the native conformer. With this conclusion in hand, it is possible to develop an algorithm that predicts the building block assignment of a protein sequence whose structure is unknown. Toward this goal, we have created sequentially nonredundant databases of building block sequences. A protein sequence can be aligned against these to be matched to a set of potential building blocks.
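The last step, aligning a query sequence against a database of building-block sequences, can be sketched with a plain Smith-Waterman local alignment. The abstract does not say which alignment method or scoring scheme the authors used; the identity-based scores below (match +2, mismatch -1, linear gap -2) and the toy sequences are assumptions for illustration only. A real implementation would use a substitution matrix such as BLOSUM62 and affine gaps.

```python
from typing import Dict, List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local-alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def rank_building_blocks(query: str, blocks: Dict[str, str]) -> List[Tuple[str, int]]:
    """Score the query against each building-block sequence, best match first."""
    scores = [(name, smith_waterman(query, seq)) for name, seq in blocks.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy database, not real building-block sequences.
toy_blocks = {"bb1": "ACDEFGHIK", "bb2": "LMNPQRST"}
print(rank_building_blocks("WWACDEFGHIKWW", toy_blocks))
```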
Affiliation(s)
- Nurit Haspel
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
40
Abstract
Central issues concerning protein structure prediction have been highlighted by the recently published summary of the fourth community-wide protein structure prediction experiment (CASP4). Although sequence/structure alignment remains the bottleneck in comparative modeling, there has been substantial progress in fully automated remote homolog detection and in de novo structure prediction. Significant further progress will probably require improvements in high-resolution modeling.
Affiliation(s)
- Jack Schonbrun
- Howard Hughes Medical Institute and Department of Biochemistry, Box 357350, University of Washington, Seattle, Washington 98165, USA
41
Kihara D, Zhang Y, Lu H, Kolinski A, Skolnick J. Ab initio protein structure prediction on a genomic scale: application to the Mycoplasma genitalium genome. Proc Natl Acad Sci U S A 2002; 99:5993-8. [PMID: 11959918 PMCID: PMC122890 DOI: 10.1073/pnas.092135699] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
An ab initio protein structure prediction procedure, TOUCHSTONE, was applied to all 85 small proteins of the Mycoplasma genitalium genome. TOUCHSTONE is based on a Monte Carlo refinement of a lattice model of proteins, which uses threading-based tertiary restraints. Such restraints are derived by extracting consensus contacts and local secondary structure from at least weakly scoring structures that, in some cases, can lack any global similarity to the sequence of interest. The native fold was selected using the convergence of simulations from two different conformational search schemes and the lowest-energy structure under a knowledge-based, atomic-detail potential. Among the 85 proteins, for the 34 proteins with significant threading hits, the template structures were reasonably well reproduced. Of the remaining 51 proteins, 29 converged to five or fewer clusters. In the test set, 84.8% of the proteins that converged to five or fewer clusters had a correct fold among the clusters. If this statistic is simply applied, 24 proteins (84.8% of the 29 proteins) may have correct folds. Thus, the topology of a total of 58 proteins has probably been correctly predicted. Based on these results, ab initio protein structure prediction is becoming a practical approach.
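The estimate of 58 correctly predicted topologies follows from a short calculation, reproduced below using only numbers stated in the abstract. Note that 84.8% of 29 is about 24.6, so the abstract's figure of 24 corresponds to rounding down (a conservative choice); that rounding interpretation is an inference, not something the abstract states.

```python
import math

threading_hits = 34        # proteins with significant threading hits
converged = 29             # of the remaining 51, converged to <= 5 clusters
test_set_success = 0.848   # test-set fraction with a correct fold among the clusters

expected = test_set_success * converged   # about 24.6
expected_correct = math.floor(expected)   # rounding down matches the abstract's 24
total = threading_hits + expected_correct

print(expected_correct, total)  # 24 58
```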
Affiliation(s)
- Daisuke Kihara
- Laboratory of Computational Genomics, Donald Danforth Plant Science Center, 975 North Warson Road, St. Louis, MO 63132, USA
42
Abstract
Steady progress has been made in the field of ab initio protein folding. A variety of methods now allow the prediction of low-resolution structures of small proteins or protein fragments up to approximately 100 amino acid residues in length. Such low-resolution structures may be sufficient for the functional annotation of protein sequences on a genome-wide scale. Although no consistently reliable algorithm is currently available, the essential challenges to developing a general theory or approach to protein structure prediction are better understood. The energy landscapes resulting from structure prediction algorithms are only partially funneled toward the native state of the protein. This review focuses on two areas of recent advances in ab initio structure prediction: improvements in the energy functions, and strategies to search the caldera region of the energy landscapes.
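The review's specific search strategies are not described in the abstract. As a generic illustration of how a stochastic search explores a partially funneled landscape, the sketch below runs Metropolis Monte Carlo with simulated annealing on a one-dimensional toy energy. The toy function, cooling schedule, and step size are all hypothetical choices, and a real folding search would move in a high-dimensional conformational space.

```python
import math
import random
from typing import Callable, Tuple


def metropolis_accept(delta_e: float, temperature: float, u: float) -> bool:
    """Metropolis criterion: accept downhill moves; uphill with prob exp(-dE/T)."""
    return delta_e <= 0.0 or u < math.exp(-delta_e / temperature)


def anneal(energy: Callable[[float], float], x0: float, steps: int = 5000,
           t_start: float = 2.0, t_end: float = 0.01, step_size: float = 0.5,
           seed: int = 0) -> Tuple[float, float]:
    """Simulated annealing with a geometric cooling schedule; returns best (x, E)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, t, rng.random()):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x: float) -> float:
    """Hypothetical rugged funnel: a quadratic funnel plus small ripples."""
    return (x - 1.0) ** 2 + 0.3 * math.sin(8.0 * x)


print(anneal(toy_energy, x0=4.0))
```

High temperatures early on let the walker climb out of the ripples; as the temperature falls, uphill moves become rare and the search settles near the bottom of the funnel.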
Affiliation(s)
- Corey Hardin
- Center for Biophysics and Computational Biology, University of Illinois, 600 South Mathews Avenue, Urbana, Illinois 61801, USA
43