1
Kumar AV, Ali RFM, Cao Y, Krishnan VV. Application of data mining tools for classification of protein structural class from residue based averaged NMR chemical shifts. Biochim Biophys Acta 2015; 1854:1545-52. [PMID: 25758094 DOI: 10.1016/j.bbapap.2015.02.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/18/2014] [Accepted: 02/25/2015] [Indexed: 10/23/2022]
Abstract
The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools that extract protein structural information directly related to function. Nuclear magnetic resonance (NMR) provides a powerful means to determine three-dimensional structures of proteins in the solution state. However, translating NMR spectral parameters into even low-resolution structural information such as protein class requires multiple time-consuming steps. In this paper, we present an unorthodox method that uses machine learning algorithms to predict the protein structural class directly from the residues' averaged chemical shifts (ACS). Experimental chemical shift information from 1491 proteins obtained from the Biological Magnetic Resonance Bank (BMRB) and their respective protein structural classes derived from the Structural Classification of Proteins (SCOP) were used to construct a data set with 119 attributes and 5 different classes. Twenty-four different classification schemes were evaluated using several performance measures. Overall, the residue-based ACS values can predict the protein structural classes with 80% accuracy as measured by the Matthews correlation coefficient. Specifically, protein classes defined by mixed αβ or small proteins are classified with >90% correlation. Our results indicate that this NMR-based method can be used as a low-resolution tool for protein structural class identification without any prior chemical shift assignments.
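The abstract reports performance as a Matthews correlation coefficient (MCC). As a generic illustration only, not the authors' evaluation code, the sketch below computes a one-vs-rest MCC for a single structural class from true and predicted labels, using the standard formula MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The class labels in the example are hypothetical placeholders, and reporting 0 for an undefined MCC is a convention we assume here.

```python
import math


def per_class_mcc(y_true, y_pred, positive):
    """One-vs-rest Matthews correlation coefficient for a single class label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted this class, and it is this class
            else:
                fp += 1  # predicted this class, but it is another class
        elif t == positive:
            fn += 1      # missed a member of this class
        else:
            tn += 1      # correctly predicted "not this class"
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is undefined when any marginal is zero; report 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels standing in for structural classes.
y_true = ["a", "a", "b", "b", "small", "small"]
y_pred = ["a", "b", "b", "b", "small", "a"]
print(round(per_class_mcc(y_true, y_pred, "b"), 4))  # 0.7071
```

Treating each class one-vs-rest yields one MCC per class, which matches how the abstract quotes per-class figures; a single multiclass MCC is a different (generalized) statistic.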
Affiliation(s)
- Arun V Kumar
- Department of Computer Science, California State University, Fresno, CA 93740, United States
- Rehana F M Ali
- Department of Computer Science, California State University, Fresno, CA 93740, United States
- Yu Cao
- Department of Computer Science, California State University, Fresno, CA 93740, United States
- V V Krishnan
- Department of Chemistry, California State University, Fresno, CA 93740, United States; Department of Pathology and Laboratory Medicine, School of Medicine, University of California, Davis, CA 95616, United States.
2
Producing high-accuracy lattice models from protein atomic coordinates including side chains. Adv Bioinformatics 2012; 2012:148045. [PMID: 22934109 PMCID: PMC3426164 DOI: 10.1155/2012/148045] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 04/04/2012] [Accepted: 06/18/2012] [Indexed: 02/08/2023]
Abstract
Lattice models are a common abstraction used in the study of protein structure, folding, and refinement. They are advantageous because the discretisation of space can make extensive protein evaluations computationally feasible. Various approaches to the protein chain lattice fitting problem have been suggested but only a single backbone-only tool is available currently. We introduce LatFit, a new tool to produce high-accuracy lattice protein models. It generates both backbone-only and backbone-side-chain models in any user defined lattice. LatFit implements a new distance RMSD-optimisation fitting procedure in addition to the known coordinate RMSD method. We tested LatFit's accuracy and speed using a large nonredundant set of high resolution proteins (SCOP database) on three commonly used lattices: 3D cubic, face-centred cubic, and knight's walk. Fitting speed compared favourably to other methods and both backbone-only and backbone-side-chain models show low deviation from the original data (~1.5 Å RMSD in the FCC lattice). To our knowledge this represents the first comprehensive study of lattice quality for on-lattice protein models including side chains while LatFit is the only available tool for such models.
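LatFit offers a distance RMSD (dRMSD) fitting criterion alongside coordinate RMSD. The sketch below shows only how dRMSD itself can be computed between two conformations with the same atom order; it is not LatFit's fitting procedure. Because dRMSD compares intramolecular distances, it needs no prior superposition, unlike coordinate RMSD. The coordinates are made-up values, and `math.dist` requires Python 3.8 or later.

```python
import itertools
import math


def drmsd(coords_a, coords_b):
    """Distance RMSD between two conformations with matched atom order.

    Compares every intramolecular pair distance d_ij in A with the
    corresponding distance in B, so rigid rotations/translations do not matter.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally sized sets of at least two points")
    sq_sum = 0.0
    n_pairs = 0
    for i, j in itertools.combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        sq_sum += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(sq_sum / n_pairs)


# Hypothetical C-alpha trace with 3.8 Å spacing.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
# Same shape after a 90-degree rotation about z plus a translation.
moved = [(1.0 - y, 2.0 + x, 3.0 + z) for x, y, z in native]
print(drmsd(native, moved))  # prints a value close to 0.0
```

The all-pairs loop is O(n²) in the number of atoms, which is fine for a sketch; a lattice-fitting tool would evaluate this repeatedly inside a search and would likely update distances incrementally.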
3
Zhou Y, Duan Y, Yang Y, Faraggi E, Lei H. Trends in template/fragment-free protein structure prediction. Theor Chem Acc 2011; 128:3-16. [PMID: 21423322 PMCID: PMC3030773 DOI: 10.1007/s00214-010-0799-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 05/17/2010] [Accepted: 08/15/2010] [Indexed: 12/13/2022]
Abstract
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. The lack of progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies have demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.
Affiliation(s)
- Yaoqi Zhou
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Yong Duan
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- College of Physics, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074 Wuhan, China
- Yuedong Yang
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Eshel Faraggi
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Hongxing Lei
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
4
Faraggi E, Yang Y, Zhang S, Zhou Y. Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction. Structure 2009; 17:1515-27. [PMID: 19913486 DOI: 10.1016/j.str.2009.09.006] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Received: 04/15/2009] [Revised: 09/01/2009] [Accepted: 09/03/2009] [Indexed: 11/30/2022]
Abstract
Local structures predicted from protein sequences are used extensively in every aspect of modeling and prediction of protein structure and function. For more than 50 years, they have been predicted at a low-resolution, coarse-grained level (e.g., three-state secondary structure). Here, we combine a two-state classifier with a real-value predictor to predict local structure in a continuous representation given by backbone torsion angles. The accuracy of the angles predicted by this approach is close to that of angles derived from NMR chemical shifts. Their substitution for predicted secondary structure as restraints for ab initio structure prediction doubles the success rate. This result demonstrates the potential of predicted local structure for fragment-free tertiary-structure prediction. It further implies potentially significant benefits from using predicted real-valued torsion angles as a replacement for or supplement to the secondary-structure prediction tools used almost exclusively in many computational methods ranging from sequence alignment to function prediction.
Affiliation(s)
- Eshel Faraggi
- Indiana University School of Informatics, Indiana University-Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
5
6
Malkov SN, Zivković MV, Beljanski MV, Hall MB, Zarić SD. A reexamination of the propensities of amino acids towards a particular secondary structure: classification of amino acids based on their chemical structure. J Mol Model 2008; 14:769-75. [PMID: 18504624 DOI: 10.1007/s00894-008-0313-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Received: 12/07/2007] [Accepted: 04/08/2008] [Indexed: 10/22/2022]
Abstract
The correlation between the primary and secondary structures of proteins was analysed using a large data set from the Protein Data Bank. Clear preferences of amino acids towards certain secondary structures classify amino acids into four groups: alpha-helix preferrers, strand preferrers, turn and bend preferrers, and His and Cys (the latter two amino acids show no clear preference for any secondary structure). Amino acids in the same group have similar structural characteristics at their Cbeta and Cgamma atoms that predict their preference for a particular secondary structure. All alpha-helix preferrers have neither polar heteroatoms on the Cbeta and Cgamma atoms nor a branching or aromatic group on the Cbeta atom. All strand preferrers have aromatic or branching groups on the Cbeta atom. All turn and bend preferrers have a polar heteroatom on the Cbeta or Cgamma atom or lack a Cbeta atom altogether. These new rules could be helpful in making predictions about non-natural amino acids.
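Secondary-structure preferences like those above are often quantified as a propensity ratio, P(ss | aa) / P(ss): the fraction of a residue type found in a given state divided by the fraction of all residues in that state. The sketch below assumes this common Chou-Fasman-style definition, which may differ from the exact statistic used by Malkov et al.; the toy residue/state pairs are invented for illustration.

```python
from collections import Counter


def propensities(pairs):
    """Return {(aa, ss): P(ss | aa) / P(ss)} from (amino_acid, state) pairs.

    Values > 1 mean the residue type is over-represented in that state
    relative to the background; values < 1 mean it is under-represented.
    """
    pairs = list(pairs)  # accept any iterable; it is traversed several times
    pair_counts = Counter(pairs)
    aa_counts = Counter(aa for aa, _ in pairs)
    ss_counts = Counter(ss for _, ss in pairs)
    total = len(pairs)
    return {
        (aa, ss): (n / aa_counts[aa]) / (ss_counts[ss] / total)
        for (aa, ss), n in pair_counts.items()
    }


# Toy data: H = helix, E = strand (hypothetical counts, not from the paper).
toy = [("A", "H")] * 3 + [("A", "E")] + [("V", "E")] * 3 + [("V", "H")]
props = propensities(toy)
print(props[("A", "H")], props[("V", "E")])  # 1.5 1.5
```

Grouping residues into "preferrer" classes would then amount to checking which state gives each residue type its largest ratio above 1.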
Affiliation(s)
- Sasa N Malkov
- Department of Mathematics, University of Belgrade, Studentski trg 16, 11000, Belgrade, Serbia
7
Abstract
This review presents the advances in protein structure prediction from the computational methods perspective. The approaches are classified into four major categories: comparative modeling, fold recognition, first principles methods that employ database information, and first principles methods without database information. Important advances along with current limitations and challenges are presented.
Affiliation(s)
- C A Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA.
8
Lwin TZ, Luo R. Overcoming entropic barrier with coupled sampling at dual resolutions. J Chem Phys 2005; 123:194904. [PMID: 16321110 DOI: 10.1063/1.2102871] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
An enhanced sampling method is proposed for ab initio protein folding simulations. The new method couples a high-resolution model for accuracy with a low-resolution model for efficiency. It aims to overcome the entropic barrier found in the exponentially large protein conformational space when a high-resolution model, such as an all-atom molecular mechanics force field, is used. The proposed method is designed to satisfy the detailed balance condition so that the Boltzmann distribution can be generated in all sampling trajectories at both high and low resolution. The method was tested on model analytical energy functions and on ab initio folding simulations of a beta-hairpin peptide. It was found to be more efficient than the replica-exchange method that it uses as a building block. Analysis with the analytical energy functions shows that the new method requires far fewer energy calculations to find global minima and to converge mean potential energies. An ergodic measure shows that the new method explores the conformational space more rapidly. We also studied imperfect low-resolution energy models and found that introducing errors into the low-resolution models does decrease the sampling efficiency. However, a reasonable increase in efficiency is still observed when the global minima of the low-resolution models are in the vicinity of the global minimum basin of the high-resolution model. Finally, our ab initio folding simulation of the tested peptide shows that the new method is able to fold the peptide in a very short simulation time. The structural distribution generated by the new method in the equilibrium portion of the trajectory resembles that of an equilibrium simulation started from the crystal structure.
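The dual-resolution method is described as building on replica exchange and as preserving detailed balance. For orientation only, the sketch below shows the standard temperature replica-exchange swap test between two replicas, which accepts a swap with probability min(1, exp[(β_i − β_j)(E_i − E_j)]); it does not implement the coupling between high- and low-resolution models described in the paper. The `rng` parameter is an illustrative hook we add so the test can be made deterministic.

```python
import math
import random


def accept_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Metropolis test for exchanging configurations between two replicas.

    Replica i runs at inverse temperature beta_i and currently holds a
    configuration with potential energy energy_i (likewise for j). The ratio
    of Boltzmann weights after/before the swap is
    exp((beta_i - beta_j) * (energy_i - energy_j)), so accepting with
    min(1, ratio) preserves detailed balance for the joint distribution.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0:
        return True  # swap never lowers the joint weight: always accept
    return rng.random() < math.exp(delta)


# The colder replica (larger beta) holds the higher-energy configuration,
# so handing it the lower-energy one is always accepted.
print(accept_swap(2.0, 1.0, 5.0, 1.0))  # True
```

Any object with a `random()` method returning a float in [0, 1) can serve as `rng`; the default is Python's module-level generator.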
Affiliation(s)
- Thur Zar Lwin
- Chemical and Material Physics Graduate Program, University of California, Irvine, CA 92697-3900, USA
9
Yang L, Tan CH, Hsieh MJ, Wang J, Duan Y, Cieplak P, Caldwell J, Kollman PA, Luo R. New-generation amber united-atom force field. J Phys Chem B 2006; 110:13166-76. [PMID: 16805629 DOI: 10.1021/jp060163v] [Citation(s) in RCA: 135] [Impact Index Per Article: 7.9] [Indexed: 11/29/2022]
Abstract
We have developed a new-generation Amber united-atom force field for simulations involving highly demanding conformational sampling such as protein folding and protein-protein binding. In the new united-atom force field, all hydrogens on aliphatic carbons in all amino acids are united with their carbons, except those on Calpha. Our choice of an explicit representation of all protein backbone atoms aims to minimize perturbation of protein backbone conformational distributions and to simplify development of backbone torsion terms. Tests with dipeptides and solvated proteins show that this goal is achieved quite successfully. The new united-atom force field uses the same new RESP charging scheme, based on B3LYP/cc-pVTZ//HF/6-31g** quantum mechanical calculations in the PCM continuum solvent, as the Duan et al. force field. van der Waals parameters are empirically refitted, starting from published values, against experimental solvation free energies of amino acid side-chain analogues. The suitability of mixing the new point charges and van der Waals parameters with existing Amber covalent terms is tested on alanine dipeptide and is found to be reasonable. Parameters for all new torsion terms are refitted based on the new point charges and van der Waals parameters. Molecular dynamics simulations of three small globular proteins in explicit TIP3P solvent are performed to test the overall stability and accuracy of the new united-atom force field. Good agreement between the united-atom force field and the Duan et al. all-atom force field is observed for both backbone and side-chain conformations. In addition, the per-step efficiency of the new united-atom force field is demonstrated for simulations in implicit generalized Born solvent. A speedup of about two over the Duan et al. all-atom force field is observed for the three tested small proteins.
Finally, the efficiency gain of the new united-atom force field in conformational sampling is further demonstrated with a well-known toy protein folding system, an 18 residue polyalanine in distance-dependent dielectric. The new united-atom force field is at least a factor of 200 more efficient than the Duan et al. all-atom force field for ab initio folding of the tested peptide.
Affiliation(s)
- Lijiang Yang
- Department of Molecular Biology and Biochemistry, University of California, Irvine, California 92697, USA
10
Protein structure prediction by all-atom free-energy refinement. BMC Struct Biol 2007; 7:12. [PMID: 17371594 PMCID: PMC1832197 DOI: 10.1186/1472-6807-7-12] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 08/23/2006] [Accepted: 03/19/2007] [Indexed: 11/18/2022]
Abstract
Background: The reliable prediction of protein tertiary structure from the amino acid sequence remains challenging even for small proteins. We have developed an all-atom free-energy protein forcefield (PFF01) that we could use to fold several small proteins from completely extended conformations. Because the computational cost of de novo folding studies rises steeply with system size, this approach is unsuitable for structure prediction purposes. We therefore investigate here a low-cost free-energy relaxation protocol for protein structure prediction that combines heuristic methods for model generation with all-atom free-energy relaxation in PFF01.
Results: We use PFF01 to rank and cluster the conformations generated by ROSETTA for 32 proteins. For the 22 high-quality and 10 low-quality decoy sets, we select near-native conformations with an average Cα root mean square deviation of 3.03 Å and 6.04 Å, respectively. The protocol incorporates an inherent reliability indicator that succeeds for 78% of the decoy sets. In over 90% of these cases, near-native conformations are selected from the decoy set. This success rate is rationalized by the quality of the decoys and the selectivity of the PFF01 forcefield, which ranks near-native conformations by an average of 3.06 standard deviations below the relaxed decoys (Z-score).
Conclusion: All-atom free-energy relaxation with PFF01 emerges as a powerful low-cost approach toward generic de novo protein structure prediction. The approach can be applied to large all-atom decoy sets of any origin and requires no preexisting structural information to identify the native conformation. The study provides evidence that a large class of proteins may be foldable by PFF01.
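The ranking criterion quoted above is a Z-score: how many standard deviations a conformation's energy lies below the mean energy of the decoys. A minimal sketch, assuming the population standard deviation of the decoy energies; the paper may normalize differently, and the energies in the example are made-up numbers.

```python
import statistics


def z_score(candidate_energy, decoy_energies):
    """Z-score of one energy relative to a set of decoy energies.

    Negative values mean the candidate scores lower (better) than the
    average decoy; -3 means three standard deviations below the mean.
    """
    mean = statistics.fmean(decoy_energies)
    sd = statistics.pstdev(decoy_energies)  # assumption: population SD
    if sd == 0:
        raise ValueError("decoy energies have zero spread; Z-score undefined")
    return (candidate_energy - mean) / sd


# Hypothetical energies (arbitrary units).
print(z_score(-1.0, [1.0, 3.0, 1.0, 3.0]))  # -3.0
```

An average Z-score across proteins, as reported in the abstract, would be obtained by computing this per decoy set and averaging the results.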
11
McAllister SR, Mickus BE, Klepeis JL, Floudas CA. Novel approach for alpha-helical topology prediction in globular proteins: generation of interhelical restraints. Proteins 2006; 65:930-52. [PMID: 17029234 DOI: 10.1002/prot.21095] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
The protein folding problem represents one of the most challenging problems in computational biology. Distance constraints and topology predictions can be highly useful for the folding problem in reducing the conformational space that must be searched by deterministic algorithms to find a protein structure of minimum conformational energy. We present a novel optimization framework for predicting topological contacts and generating interhelical distance restraints between hydrophobic residues in alpha-helical globular proteins. It should be emphasized that since the model does not make assumptions about the form of the helices, it is applicable to all alpha-helical proteins, including helices with kinks and irregular helices. This model aims at enhancing the ASTRO-FOLD protein folding approach of Klepeis and Floudas (Journal of Computational Chemistry 2003;24:191-208), which finds the structure of global minimum conformational energy via a constrained nonlinear optimization problem. The proposed topology prediction model was evaluated on 26 alpha-helical proteins ranging from 2 to 8 helices and 35 to 159 residues, and the best identified average interhelical distances corresponding to the predicted contacts fell below 11 Å in all 26 of these systems. Given the positive results of applying the model to several protein systems, the importance of interhelical hydrophobic-to-hydrophobic contacts in determining the folding of alpha-helical globular proteins is highlighted.
Affiliation(s)
- S R McAllister
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
12
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
13
Cheng H, Sen TZ, Kloczkowski A, Margaritis D, Jernigan RL. Prediction of protein secondary structure by mining structural fragment database. Polymer 2005; 46:4314-4321. [PMID: 19081746 DOI: 10.1016/j.polymer.2005.02.040] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022]
Abstract
A new method for predicting protein secondary structure from amino acid sequence has been developed. The method is based on multiple sequence alignment of the query sequence with all other sequences with known structure from the Protein Data Bank (PDB) by using BLAST. The fragments of the alignments belonging to proteins from the PDB are then used for further analysis. We have studied various schemes of assigning weights for matching segments and calculated normalized scores to predict one of the three secondary structures: α-helix, β-sheet, or coil. We applied several artificial intelligence techniques, namely decision trees (DT), neural networks (NN) and support vector machines (SVM), to improve the accuracy of predictions and found that SVM gave the best performance. Preliminary data show that combining the fragment mining approach with GOR V (Kloczkowski et al., Proteins 49 (2002) 154-166) for regions of low sequence similarity improves the prediction accuracy.
Affiliation(s)
- Haitao Cheng
- Department of Biochemistry, Biophysics and Molecular Biology, L. H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, 112 Office and Laboratory Building, Ames, IA 50011-3020, USA
14
Floudas CA. Research challenges, opportunities and synergism in systems engineering and computational biology. AIChE J 2005. [DOI: 10.1002/aic.10620] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
15
Zhang C, Liu S, Zhou H, Zhou Y. An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci 2004; 13:400-11. [PMID: 14739325 PMCID: PMC2286718 DOI: 10.1110/ps.03348304] [Citation(s) in RCA: 116] [Impact Index Per Article: 5.8] [Indexed: 10/26/2022]
Abstract
Structure prediction on a genomic scale requires a simplified energy function that can efficiently sample the conformational space of polypeptide chains. A good energy function should, at a minimum, discriminate native structures from decoys. Here, we show that a recently developed, residue-specific, all-atom knowledge-based potential (167 atomic types) based on the distance-scaled, finite ideal-gas reference state (DFIRE-all-atom) can be substantially simplified to 20 residue types located at the side-chain center of mass (DFIRE-SCM) without a significant change in its capability of structure discrimination. Using 96 standard multiple decoy sets, we show that there is only a small reduction (from 80% to 78%) in the success rate of ranking native structures as the top 1. The success rate is higher than those of two previously developed, all-atom distance-dependent statistical pair potentials. Applied without modification to structure selection in 21 docking decoy sets, the DFIRE-SCM potential is 29% more successful in recognizing native complex structures than an all-atom statistical potential trained on a database of dimeric interfaces. The potential also achieves 92% accuracy in distinguishing true dimeric interfaces from artificial crystal interfaces. In addition, the DFIRE potential with the C(alpha) positions as the interaction centers recognizes 123 native structures out of a comprehensive 125-protein TOUCHSTONE decoy set in which each protein has 24,000 decoys with only C(alpha) positions. Furthermore, the performance of DFIRE-SCM on the newly established 25 monomeric and 31 docking Rosetta-decoy sets is comparable to (or, for the monomeric decoy sets, better than) that of a recently developed, all-atom Rosetta energy function enhanced with an orientation-dependent hydrogen bonding potential.
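DFIRE converts observed residue-pair distance counts into an energy by comparing them with a "distance-scaled, finite ideal-gas" reference count that grows as r^α rather than as the r² of an ideal gas. The sketch below evaluates that comparison for a single distance bin, in units of RT. Assumptions, to be checked against the original DFIRE papers: the form of the reference count and the default α = 1.61 follow our reading of that work; any overall scaling prefactor of the published potential is omitted; and empty bins are rejected rather than smoothed with pseudo-counts. The counts in the example are hypothetical.

```python
import math


def dfire_bin_energy(n_obs_r, r, dr, n_obs_cut, r_cut, dr_cut, alpha=1.61):
    """Reference-state energy (units of RT) for one residue-pair distance bin.

    n_obs_r   -- observed count of pair type (i, j) in the bin centred at r
    dr        -- width of that bin
    n_obs_cut -- observed count of (i, j) in the bin at the cutoff r_cut
    dr_cut    -- width of the cutoff bin
    Expected (reference) count: (r / r_cut) ** alpha * (dr / dr_cut) * n_obs_cut.
    """
    if r >= r_cut:
        return 0.0  # pairs at or beyond the cutoff contribute nothing
    if n_obs_r <= 0 or n_obs_cut <= 0:
        raise ValueError("counts must be positive (no pseudo-count smoothing here)")
    n_ref = (r / r_cut) ** alpha * (dr / dr_cut) * n_obs_cut
    # More observations than the reference predicts -> negative (favourable) energy.
    return -math.log(n_obs_r / n_ref)


# Hypothetical counts; alpha=1.0 chosen only to make the arithmetic easy to follow.
print(dfire_bin_energy(100, 5.0, 1.0, 100, 10.0, 1.0, alpha=1.0))  # about -0.693, i.e. -ln 2
```

The same function applies whether the interaction centres are all atoms, side-chain centres of mass (as in DFIRE-SCM), or C(alpha) positions; only the pair types and counts change.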
Affiliation(s)
- Chi Zhang
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, SUNY Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
16
Fleishman SJ, Harrington S, Friesner RA, Honig B, Ben-Tal N. An automatic method for predicting transmembrane protein structures using cryo-EM and evolutionary data. Biophys J 2004; 87:3448-59. [PMID: 15339802 PMCID: PMC1304811 DOI: 10.1529/biophysj.104.046417] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Indexed: 11/18/2022]
Abstract
The transmembrane (TM) domains of many integral membrane proteins are composed of alpha-helix bundles. Structure determination of TM domains at high resolution (<4 Å) is still exceedingly difficult experimentally. Hence, some TM-protein structures have only been solved at intermediate (5-10 Å) or low (>10 Å) resolutions using, for example, cryo-electron microscopy (cryo-EM). These structures reveal the packing arrangement of the TM domain, but cannot be used to determine the positions of individual amino acids. The observation that the lipid-exposed faces of TM proteins are typically evolutionarily more variable and less charged than their core provides a simple rule for orienting their constituent helices. Based on this rule, we developed score functions and automated methods for orienting TM helices whose locations and tilt angles have been determined using, e.g., cryo-EM data. The method was parameterized with the aim of retrieving the native structure of bacteriorhodopsin among near- and far-from-native templates. It was then tested on proteins that differ from bacteriorhodopsin in their sequences, architectures, and functions, such as the acetylcholine receptor and rhodopsin. The predicted structures were within 1.5-3.5 Å of the native state in all cases. We conclude that the computational method can be used in conjunction with cryo-EM data to obtain approximate model structures of TM domains of proteins for which a sufficiently heterogeneous set of homologs is available. We also show that in those proteins in which relatively short loops connect neighboring helices, the scoring functions can discriminate between near- and far-from-native conformations even without the constraints imposed on helix locations and tilt angles that are derived from cryo-EM.
Affiliation(s)
- Sarel J Fleishman
- Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat-Aviv 69978, Israel
17
Klepeis JL, Floudas CA. ASTRO-FOLD: a combinatorial and global optimization framework for Ab initio prediction of three-dimensional structures of proteins from the amino acid sequence. Biophys J 2003; 85:2119-46. [PMID: 14507680 PMCID: PMC1303441 DOI: 10.1016/s0006-3495(03)74640-2] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022]
Abstract
The field of computational biology has been revolutionized by recent advances in genomics. The completion of a number of genome projects, including that of the human genome, has paved the way toward a variety of challenges and opportunities in bioinformatics and biological systems engineering. One of the first challenges has been the determination of the structures of proteins encoded by the individual genes. This problem, which represents the progression from sequence to structure (genomics to structural genomics), has been widely known as the structure-prediction-in-protein-folding problem. We present the development and application of ASTRO-FOLD, a novel and complete approach for the ab initio prediction of protein structures given only the amino acid sequences of the proteins. The approach exhibits many novel components and the merits of its application are examined for a suite of protein systems, including a number of targets from several critical-assessment-of-structure-prediction experiments.
Affiliation(s)
- J L Klepeis
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 10036, USA.
18
Berglund A, Head RD, Welsh EA, Marshall GR. ProVal: a protein-scoring function for the selection of native and near-native folds. Proteins 2004; 54:289-302. [PMID: 14696191 DOI: 10.1002/prot.10523] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
A low-resolution scoring function for the selection of native and near-native structures from a set of predicted structures for a given protein sequence has been developed. The scoring function, ProVal (Protein Validate), used several variables that describe an aspect of protein structure for which the proximity to the native structure can be assessed quantitatively. Among the parameters included are a packing estimate, surface areas, and the contact order. A partial least squares for latent variables (PLS) model was built for each candidate set of the 28 decoy sets of structures generated for 22 different proteins using the described parameters as independent variables. The C(alpha) RMS of the candidate structures versus the experimental structure was used as the dependent variable. The final generalized scoring function was an average of all models derived, ensuring that the function was not optimized for specific fold classes or method of structure generation of the candidate folds. The results show that the crystal structure was scored best in 64% of the 28 test sets and was clearly separated from the decoys in many examples. In all the other cases in which the crystal structure did not rank first, it ranked within the top 10%. Thus, although ProVal could not distinguish between predicted structures that were similar overall in fold quality due to its inherently low resolution, it can clearly be used as a primary filter to eliminate approximately 90% of fold candidates generated by current prediction methods from all-atom modeling and further evaluation. The correlation between the predicted and actual C(alpha) RMS values varies considerably between the candidate fold sets.
Affiliation(s)
- Anders Berglund
- Center for Computational Biology, Washington University Medical School, St. Louis, Missouri 63110, USA
19
Lu H, Skolnick J. Application of statistical potentials to protein structure refinement from low resolution ab initio models. Biopolymers 2004; 70:575-84. [PMID: 14648767 DOI: 10.1002/bip.10537] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 12/26/2022]
Abstract
Recently ab initio protein structure prediction methods have advanced sufficiently so that they often assemble the correct low resolution structure of the protein. To enhance the speed of conformational search, many ab initio prediction programs adopt a reduced protein representation. However, for drug design purposes, better quality structures are probably needed. To achieve this refinement, it is natural to use a more detailed heavy atom representation. Here, as opposed to costly implicit or explicit solvent molecular dynamics simulations, knowledge-based heavy atom pair potentials were employed. By way of illustration, we tried to improve the quality of the predicted structures obtained from the ab initio prediction program TOUCHSTONE by three methods: local constraint refinement, reduced predicted tertiary contact refinement, and statistical pair potential guided molecular dynamics. Sixty-seven predicted structures from 30 small proteins (less than 150 residues in length) representing different structural classes (alpha, beta, alpha/beta) were examined. In 33 cases, the root mean square deviation (RMSD) from native structures improved by more than 0.3 Å; in 19 cases, the improvement was more than 0.5 Å, and sometimes as large as 1 Å. In only seven (four) cases did the refinement procedure increase the RMSD by more than 0.3 (0.5) Å. For the remaining structures, the refinement procedures changed the structures by less than 0.3 Å. While modest, the performance of the current refinement methods is better than the published refinement results obtained using standard molecular dynamics.
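Knowledge-based pair potentials of this kind are usually derived by inverting the Boltzmann relation: a pair observed more often than a reference state predicts gets a favourable (negative) energy. The sketch below is a minimal illustration assuming simple count tables, kT = 1 in arbitrary units and a caller-supplied reference state; the abstract does not specify the reference state or distance binning the authors used.

```python
import math


def pair_potential(observed, expected, kT=1.0, pseudocount=1e-6):
    """Inverse-Boltzmann energies E(a, b) = -kT * ln(N_obs / N_exp).

    observed, expected: dicts mapping a pair key (e.g. atom types) to counts.
    The reference-state counts in `expected` are an assumption of this sketch.
    A small pseudocount avoids log(0) for pairs never observed.
    """
    energies = {}
    for pair, n_exp in expected.items():
        n_obs = observed.get(pair, 0.0)
        energies[pair] = -kT * math.log((n_obs + pseudocount) / (n_exp + pseudocount))
    return energies
```

Pairs seen exactly as often as expected score zero; over-represented pairs score below zero.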
Affiliation(s)
- Hui Lu
- Laboratory of Computational Genomics, Donald Danforth Plant Science Center, 975 N Warson St., St. Louis, MO 63132, USA
20
Tsai J, Bonneau R, Morozov AV, Kuhlman B, Rohl CA, Baker D. An improved protein decoy set for testing energy functions for protein structure prediction. Proteins 2003; 53:76-87. [PMID: 12945051 DOI: 10.1002/prot.10454] [Citation(s) in RCA: 139] [Impact Index Per Article: 6.6] [Indexed: 11/12/2022]
Abstract
We have improved the original Rosetta centroid/backbone decoy set by increasing the number of proteins and frequency of near native models and by building on sidechains and minimizing clashes. The new set consists of 1,400 model structures for 78 different and diverse protein targets and provides a challenging set for the testing and evaluation of scoring functions. We evaluated the extent to which a variety of all-atom energy functions could identify the native and close-to-native structures in the new decoy sets. Of various implicit solvent models, we found that a solvent-accessible surface area-based solvation provided the best enrichment and discrimination of close-to-native decoys. The combination of this solvation treatment with Lennard Jones terms and the original Rosetta energy provided better enrichment and discrimination than any of the individual terms. The results also highlight the differences in accuracy of NMR and X-ray crystal structures: a large energy gap was observed between native and non-native conformations for X-ray structures but not for NMR structures.
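Enrichment of close-to-native decoys can be quantified in several ways. One common form, used here as an assumption rather than the paper's exact definition, counts decoys that are in both the lowest-energy fraction and the lowest-RMSD fraction, divided by the overlap expected at random. The 10% cutoff is a hypothetical parameter.

```python
def enrichment(energies, rmsds, frac=0.10):
    """Enrichment of near-native decoys among the lowest-energy decoys.

    Overlap between the lowest-`frac` decoys by energy and by RMSD, divided
    by the overlap expected for a random energy function (k * k / n).
    Values above 1 mean the energy function favours near-native models.
    """
    n = len(energies)
    k = max(1, int(round(frac * n)))
    by_energy = set(sorted(range(n), key=lambda i: energies[i])[:k])
    by_rmsd = set(sorted(range(n), key=lambda i: rmsds[i])[:k])
    expected = k * k / n
    return len(by_energy & by_rmsd) / expected
```

A perfectly correlated energy function reaches the maximum of 1 / frac; an anti-correlated one scores 0.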
Affiliation(s)
- Jerry Tsai
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas 77843, USA.
21
Abstract
Protein residues that are critical for structure and function are expected to be conserved throughout evolution. Here, we investigate the extent to which these conserved residues are clustered in three-dimensional protein structures. In 92% of the proteins in a data set of 79 proteins, the most conserved positions in multiple sequence alignments are significantly more clustered than randomly selected sets of positions. The comparison to random subsets is not necessarily appropriate, however, because the signal could be the result of differences in the amino acid composition of sets of conserved residues compared to random subsets (hydrophobic residues tend to be close together in the protein core), or differences in sequence separation of the residues in the different sets. In order to overcome these limits, we compare the degree of clustering of the conserved positions on the native structure and on alternative conformations generated by the de novo structure prediction method Rosetta. For 65% of the 79 proteins, the conserved residues are significantly more clustered in the native structure than in the alternative conformations, indicating that the clustering of conserved residues in protein structures goes beyond that expected purely from sequence locality and composition effects. The differences in the spatial distribution of conserved residues can be utilized in de novo protein structure prediction: We find that for 79% of the proteins, selection of the Rosetta generated conformations with the greatest clustering of the conserved residues significantly enriches the fraction of close-to-native structures.
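The first comparison described above, conserved positions against randomly selected positions, can be sketched as a one-sided permutation test. As an assumption, clustering is scored as the mean pairwise distance between the chosen positions (smaller means more clustered); the abstract does not give the authors' clustering statistic. This sketch shares the caveat the abstract raises: random subsets do not control for composition or sequence separation.

```python
import itertools
import math
import random


def mean_pairwise_distance(coords, positions):
    """Average Euclidean distance over all pairs of the given residue positions."""
    pairs = list(itertools.combinations(positions, 2))
    return sum(math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


def clustering_p_value(coords, conserved, n_trials=1000, seed=0):
    """Fraction of random same-size subsets at least as clustered as `conserved`.

    A small value suggests the conserved positions are more tightly clustered
    than randomly chosen positions.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, conserved)
    k = len(conserved)
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(range(len(coords)), k)
        if mean_pairwise_distance(coords, sample) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_trials + 1)
```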
Affiliation(s)
- Ora Schueler-Furman
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
22
Nanias M, Chinchio M, Pillardy J, Ripoll DR, Scheraga HA. Packing helices in proteins by global optimization of a potential energy function. Proc Natl Acad Sci U S A 2003; 100:1706-10. [PMID: 12571353 PMCID: PMC149897 DOI: 10.1073/pnas.252760199] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
An efficient method has been developed for packing alpha-helices in proteins. It treats alpha-helices as rigid bodies and uses a simplified Lennard-Jones potential with Miyazawa-Jernigan contact-energy parameters to describe the interactions between the alpha-helical elements in this coarse-grained system. Global conformational searches to generate packing arrangements rapidly are carried out with a Monte Carlo-with-minimization type of approach. The results for 42 proteins show that the approach reproduces native-like folds of alpha-helical proteins as low-energy local minima of this highly simplified potential function.
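The Monte Carlo-with-minimization (MCM) idea can be illustrated on a toy one-dimensional energy: perturb the current state, locally minimize, then apply the Metropolis criterion to the minimized energies. This is a hedged sketch of the general scheme only, not the authors' rigid-helix implementation; the energy function, move size and temperature are hypothetical.

```python
import math
import random


def local_minimize(f, x, step=0.1, tol=1e-6, max_iter=1000):
    """Descent with a numerical gradient; a step is kept only if it lowers f."""
    h = 1e-6
    fx = f(x)
    for _ in range(max_iter):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x_new = x - step * grad
        f_new = f(x_new)
        if f_new < fx:
            x, fx = x_new, f_new
        else:
            step /= 2  # back off when the step overshoots
            if step < tol:
                break
    return x, fx


def mcm(f, x0, n_steps=200, max_move=2.0, kT=1.0, seed=0):
    """Monte Carlo-with-minimization: perturb, minimize, Metropolis-accept."""
    rng = random.Random(seed)
    x, fx = local_minimize(f, x0)
    best_x, best_f = x, fx
    for _ in range(n_steps):
        trial, f_trial = local_minimize(f, x + rng.uniform(-max_move, max_move))
        if f_trial <= fx or rng.random() < math.exp(-(f_trial - fx) / kT):
            x, fx = trial, f_trial
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f
```

Because acceptance is judged on minimized energies, the walk moves between local minima rather than across the raw, rugged surface.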
Affiliation(s)
- Marian Nanias
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
23
Standley DM, Eyrich VA, An Y, Pincus DL, Gunn JR, Friesner RA. Protein structure prediction using a combination of sequence-based alignment, constrained energy minimization, and structural alignment. Proteins 2002; Suppl 5:133-9. [PMID: 11835490 DOI: 10.1002/prot.10005] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
We present a novel approach to protein structure prediction in which fold recognition techniques are combined with ab initio folding methods. Based on the predicted secondary structure, one of two different protocols is followed. For mostly alpha proteins, global optimization and sampling of a statistical energy function is used to generate many low-energy structures; these structures are then screened against a fold library. Any structural matches are then selected for further refinement. For proteins predicted to have significant beta-content, sequence and secondary structure-based alignment is used to identify candidate templates; spatial constraints are then extracted from these templates and used, along with the statistical energy function, in the global sampling and optimization program. Successes and failures of both protocols are discussed.
24
de la Cruz X, Hutchinson EG, Shepherd A, Thornton JM. Toward predicting protein topology: an approach to identifying beta hairpins. Proc Natl Acad Sci U S A 2002; 99:11157-62. [PMID: 12177429 PMCID: PMC123226 DOI: 10.1073/pnas.162376199] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022]
Abstract
Although secondary structure prediction methods have recently improved, progress from secondary to tertiary structure prediction has been limited. A promising but largely unexplored route to this goal is to predict structure motifs from secondary structure knowledge. Here we present a novel method for the recognition of beta hairpins that combines secondary structure predictions and threading methods by using a database search and a neural network approach. The method successfully predicts 48% and 77%, respectively, of all hairpin and nonhairpin beta-coil-beta motifs in a protein database. We find that the main contributors to motif recognition are predicted accessibility and turn propensities.
Affiliation(s)
- Xavier de la Cruz
- Institut Català per la Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys, 23, 08018 Barcelona, Spain.
25
Felts AK, Gallicchio E, Wallqvist A, Levy RM. Distinguishing native conformations of proteins from decoys with an effective free energy estimator based on the OPLS all-atom force field and the Surface Generalized Born solvent model. Proteins 2002; 48:404-22. [PMID: 12112706 DOI: 10.1002/prot.10171] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.1] [Indexed: 11/10/2022]
Abstract
Protein decoy data sets provide a benchmark for testing scoring functions designed for fold recognition and protein homology modeling problems. It is commonly believed that statistical potentials based on reduced atomic models are better able to discriminate native-like from misfolded decoys than scoring functions based on more detailed molecular mechanics models. Recent benchmark tests on small data sets, however, suggest otherwise. In this work, we report the results of extensive decoy detection tests using an effective free energy function based on the OPLS all-atom (OPLS-AA) force field and the Surface Generalized Born (SGB) model for the solvent electrostatic effects. The OPLS-AA/SGB effective free energy is used as a scoring function to detect native protein folds among a total of 48,832 decoys for 32 different proteins from Park and Levitt's 4-state-reduced, Levitt's local-minima, Baker's ROSETTA all-atom, and Skolnick's decoy sets. Solvent electrostatic effects are included through the Surface Generalized Born (SGB) model. All structures are locally minimized without restraints. From an analysis of the individual energy components of the OPLS-AA/SGB energy function for the native and the best-ranked decoy, it is determined that a balance of the terms of the potential is responsible for the minimized energies that most successfully distinguish the native from the misfolded conformations. Different combinations of individual energy terms provide less discrimination than the total energy. The results are consistent with observations that all-atom molecular potentials coupled with intermediate level solvent dielectric models are competitive with knowledge-based potentials for decoy detection and protein modeling problems such as fold recognition and homology modeling.
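Decoy-detection results are often summarised by the native's rank among its decoys and by its Z-score relative to the decoy energy distribution. A minimal sketch, assuming lower energy is better and a population standard deviation; the abstract does not state which summary statistics were reported.

```python
import statistics


def native_rank_and_z(native_energy, decoy_energies):
    """Rank of the native among decoys (1 = lowest energy) and its Z-score.

    Z = (E_native - mean(E_decoy)) / stdev(E_decoy), using the population
    standard deviation; a strongly negative Z means the native sits well
    below the decoy energy distribution.
    """
    rank = 1 + sum(1 for e in decoy_energies if e < native_energy)
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return rank, (native_energy - mu) / sigma
```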
Affiliation(s)
- Anthony K Felts
- Department of Chemistry and Chemical Biology, Rutgers University, Wright-Rieman Laboratories, Piscataway, New Jersey 08854-8087, USA.
26
An Y, Friesner RA. A novel fold recognition method using composite predicted secondary structures. Proteins 2002; 48:352-66. [PMID: 12112702 DOI: 10.1002/prot.10145] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. An automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS, is developed. This program then takes two steps in choosing plausible homologues: (i) a SSS-based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue-based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates left in the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). The program successfully locates homologues with high Z-score and low root-mean-square deviation within the top 30-50 predictions in the overwhelming majority of cases.
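Step (i) compares targets and templates at the level of secondary structure segments (SSS). A minimal sketch of turning one three-state prediction string into segments and a compact segment pattern, assuming the common H/E/C alphabet; the minimum segment length is a hypothetical parameter, and the paper's combination and clustering of alternative predictions is not reproduced.

```python
from itertools import groupby


def sss_segments(ss, min_len=3):
    """Return (state, start, end) for each helix (H) or strand (E) run.

    ss: predicted secondary structure string over H, E and C (coil).
    end is exclusive; runs shorter than min_len are ignored.
    """
    segments = []
    pos = 0
    for state, run in groupby(ss):
        length = len(list(run))
        if state in "HE" and length >= min_len:
            segments.append((state, pos, pos + length))
        pos += length
    return segments


def sss_pattern(ss, min_len=3):
    """Compact segment pattern, such as 'HEH', for coarse target/template comparison."""
    return "".join(state for state, _, _ in sss_segments(ss, min_len))
```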
Affiliation(s)
- Yuling An
- Department of Chemistry and Center for Biomolecular Simulation, Columbia University, New York, New York 10027, USA
27
Ruczinski I, Kooperberg C, Bonneau R, Baker D. Distributions of beta sheets in proteins with application to structure prediction. Proteins 2002; 48:85-97. [PMID: 12012340 DOI: 10.1002/prot.10123] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
We recently developed the Rosetta algorithm for ab initio protein structure prediction, which generates protein structures from fragment libraries using simulated annealing. The scoring function in this algorithm favors the assembly of strands into sheets. However, it does not discriminate between different sheet motifs. After generating many structures using Rosetta, we found that the folding algorithm predominantly generates very local structures. We surveyed the distribution of beta-sheet motifs with two edge strands (open sheets) in a large set of non-homologous proteins. We investigated how much of that distribution can be accounted for by rules previously published in the literature, and developed a filter and a scoring method that enables us to improve protein structure prediction for beta-sheet proteins.
Affiliation(s)
- Ingo Ruczinski
- Department of Biochemistry, University of Washington, Seattle, Washington, USA.
28
Abstract
Steady progress has been made in the field of ab initio protein folding. A variety of methods now allow the prediction of low-resolution structures of small proteins or protein fragments up to approximately 100 amino acid residues in length. Such low-resolution structures may be sufficient for the functional annotation of protein sequences on a genome-wide scale. Although no consistently reliable algorithm is currently available, the essential challenges to developing a general theory or approach to protein structure prediction are better understood. The energy landscapes resulting from the structure prediction algorithms are only partially funneled to the native state of the protein. This review focuses on two areas of recent advances in ab initio structure prediction: improvements in the energy functions and strategies to search the caldera region of the energy landscapes.
Affiliation(s)
- Corey Hardin
- Center for Biophysics and Computational Biology, University of Illinois, 600 South Mathews Avenue, Urbana, Illinois 61801, USA
29
Abstract
Predicting protein structures from their amino acid sequences is a problem of global optimization. Global optima (native structures) are often sought using stochastic sampling methods such as Monte Carlo or molecular dynamics, but these methods are slow. In contrast, there are fast deterministic methods that find near-optimal solutions of well-known global optimization problems such as the traveling salesman problem (TSP). But fast TSP strategies have yet to be applied to protein folding, because of fundamental differences in the two types of problems. Here, we show how protein folding can be framed in terms of the TSP, to which we apply a variation of the Durbin-Willshaw elastic net optimization strategy. We illustrate using a simple model of proteins with database-derived statistical potentials and predicted secondary structure restraints. This optimization strategy can be applied to many different models and potential functions, and can readily incorporate experimental restraint information. It is also fast; with the simple model used here, the method finds structures that are within 5-6 Å all-Cα-atom RMSD of the known native structures for 40-mers in about 8 s on a PC; 100-mers take about 20 s. The computer time τ scales as τ ~ n, where n is the number of amino acids. This method may prove to be useful for structure refinement and prediction.
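For readers unfamiliar with the Durbin-Willshaw elastic net, the sketch below applies it to an ordinary planar TSP, not to the protein formulation described above. A closed ring of nodes is pulled toward the cities while a tension term keeps neighbouring nodes together, and the length scale K is gradually reduced. The parameter values (alpha = 0.2, beta = 2.0, K annealed from 0.2 by 1% per iteration, 2.5 nodes per city) are assumptions chosen for illustration.

```python
import math
import random


def elastic_net_tsp(cities, nodes_per_city=2.5, alpha=0.2, beta=2.0,
                    k_start=0.2, k_end=0.01, decay=0.99, seed=0):
    """Durbin-Willshaw elastic net for a planar TSP.

    cities: list of (x, y) points, assumed scaled to roughly the unit square.
    Returns city indices in the order in which the final ring visits them.
    """
    rng = random.Random(seed)
    n = len(cities)
    m = max(3, int(nodes_per_city * n))
    cx = sum(x for x, _ in cities) / n
    cy = sum(y for _, y in cities) / n
    # Start from a small, slightly noisy circle around the centroid.
    ring = [[cx + 0.1 * math.cos(2 * math.pi * j / m) + rng.uniform(-0.01, 0.01),
             cy + 0.1 * math.sin(2 * math.pi * j / m) + rng.uniform(-0.01, 0.01)]
            for j in range(m)]
    k = k_start
    while k > k_end:
        deltas = [[0.0, 0.0] for _ in range(m)]
        for x, y in cities:
            # Normalised Gaussian weights of this city over all ring nodes.
            phis = [math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * k * k))
                    for px, py in ring]
            total = sum(phis) or 1.0
            for j, phi in enumerate(phis):
                w = phi / total
                deltas[j][0] += alpha * w * (x - ring[j][0])
                deltas[j][1] += alpha * w * (y - ring[j][1])
        for j in range(m):
            prev, nxt = ring[j - 1], ring[(j + 1) % m]
            # Tension: discrete Laplacian along the ring, scaled by beta * K.
            deltas[j][0] += beta * k * (prev[0] - 2 * ring[j][0] + nxt[0])
            deltas[j][1] += beta * k * (prev[1] - 2 * ring[j][1] + nxt[1])
        for j in range(m):
            ring[j][0] += deltas[j][0]
            ring[j][1] += deltas[j][1]
        k *= decay

    def nearest_node(city):
        return min(range(m), key=lambda j: math.hypot(city[0] - ring[j][0],
                                                      city[1] - ring[j][1]))

    # Read the tour off the ring: order cities by the index of their nearest node.
    return sorted(range(n), key=lambda i: nearest_node(cities[i]))
```

The method is deterministic apart from the seeded initial noise, and its cost per iteration grows with the number of cities times the number of ring nodes.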
Affiliation(s)
- Keith D Ball
- Department of Pharmaceutical Chemistry, University of California at San Francisco, 94118, USA.
30
de la Cruz X, Sillitoe I, Orengo C. Use of structure comparison methods for the refinement of protein structure predictions. I. Identifying the structural family of a protein from low-resolution models. Proteins 2002; 46:72-84. [PMID: 11746704 DOI: 10.1002/prot.10002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Predicting the three-dimensional structure of proteins is still one of the most challenging problems in molecular biology. Despite its difficulty, several investigators have started to produce consistently low-resolution predictions for small proteins. However, in most of these cases, the prediction accuracy is still too low to make them useful. In the present article, we address the problem of obtaining better-quality predictions, starting from low-resolution models. To this end, we have devised a new procedure that uses these models, together with structure comparison methods, to identify the structural family of the target protein. In a second step, not described in the present work, this would allow the predictions to be refined using conserved features of the identified family. In our approach, the structure database is investigated using predictions, at different accuracy levels, for a given protein. As query structures, we used both low-resolution versions of the native structures and different sets of low accuracy predictions. In general, we found that for predictions with a resolution of ≥5-7 Å, structure comparison methods were able to identify the fold of a protein in the top positions.
Affiliation(s)
- Xavier de la Cruz
- Departmento de Bioquímica y Biología Molecular Facultad de Químicas; Universidad de Barcelona, Barcelona, Spain.
31
32
Bonneau R, Baker D. Ab initio protein structure prediction: progress and prospects. ANNUAL REVIEW OF BIOPHYSICS AND BIOMOLECULAR STRUCTURE 2001; 30:173-89. [PMID: 11340057 DOI: 10.1146/annurev.biophys.30.1.173] [Citation(s) in RCA: 226] [Impact Index Per Article: 9.8] [Indexed: 11/09/2022]
Abstract
Considerable recent progress has been made in the field of ab initio protein structure prediction, as witnessed by the third Critical Assessment of Structure Prediction (CASP3). In spite of this progress, much work remains, for the field has yet to produce consistently reliable ab initio structure prediction protocols. In this work, we review the features of current ab initio protocols in an attempt to highlight the foundations of recent progress in the field and suggest promising directions for future work.
Affiliation(s)
- R Bonneau
- Department of Biochemistry, University of Washington, Seattle, Washington, Box 357350, 98195, USA.
33
Abstract
Methods predicting protein secondary structure improved substantially in the 1990s through the use of evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than four percentage points to its current height of around 76% of all residues predicted correctly in one of the three states, helix, strand, and other. The past year also brought successful new concepts to the field. These new methods may be particularly interesting in light of the improvements achieved through simple combining of existing methods. Divergent evolutionary profiles contain enough information not only to substantially improve prediction accuracy, but also to correctly predict long stretches of identical residues observed in alternative secondary structure states depending on nonlocal conditions. An example is a method automatically identifying structural switches and thus finding a remarkable connection between predicted secondary structure and aspects of function. Secondary structure predictions are increasingly becoming the workhorse for numerous methods aimed at predicting protein structure and function. Is the recent increase in accuracy significant enough to make predictions even more useful? Because the recent improvement yields a better prediction of segments, and in particular of beta strands, I believe the answer is affirmative. What is the limit of prediction accuracy? We shall see.
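The 76% figure is a three-state per-residue accuracy, usually called Q3. A minimal sketch, assuming one common reduction of DSSP's eight states to three (H, G, I to helix; E, B to strand; everything else to other); other reduction schemes exist and shift the reported numbers.

```python
# One common (assumed) reduction of DSSP's eight states to three.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp):
    """Map a DSSP string to H (helix), E (strand) or C (other)."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in dssp)


def q3(predicted, observed):
    """Percentage of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must be the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```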
Affiliation(s)
- B Rost
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, New York 10032, USA
34
Bonneau R, Strauss CE, Baker D. Improving the performance of Rosetta using multiple sequence alignment information and global measures of hydrophobic core formation. Proteins 2001; 43:1-11. [PMID: 11170209 DOI: 10.1002/1097-0134(20010401)43:1<1::aid-prot1012>3.0.co;2-a] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.9] [Indexed: 11/06/2022]
Abstract
This study explores the use of multiple sequence alignment (MSA) information and global measures of hydrophobic core formation for improving the Rosetta ab initio protein structure prediction method. The most effective use of the MSA information is achieved by carrying out independent folding simulations for a subset of the homologous sequences in the MSA and then identifying the free energy minima common to all folded sequences via simultaneous clustering of the independent folding runs. Global measures of hydrophobic core formation, using ellipsoidal rather than spherical representations of the hydrophobic core, are found to be useful in removing non-native conformations before cluster analysis. Through this combination of MSA information and global measures of protein core formation, we significantly increase the performance of Rosetta on a challenging test set.
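Cluster analysis of folding runs is often done with a greedy neighbour-count scheme: the conformation with the most neighbours within an RMSD cutoff becomes a cluster centre, it and its neighbours are removed, and the process repeats. The sketch below assumes that scheme, a precomputed pairwise RMSD matrix and a hypothetical 3 Å cutoff; it does not reproduce the paper's simultaneous clustering of runs from several homologous sequences.

```python
def neighbours(dist, i, pool, cutoff):
    """Members of `pool` within `cutoff` of structure i (including i itself)."""
    return [j for j in pool if dist[i][j] <= cutoff]


def greedy_cluster(dist, cutoff=3.0):
    """Greedy neighbour-count clustering over a symmetric RMSD matrix.

    Returns a list of (centre, members), largest clusters first.
    Ties in neighbour count are broken by the lowest index.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        centre = max(sorted(remaining),
                     key=lambda i: len(neighbours(dist, i, remaining, cutoff)))
        members = neighbours(dist, centre, remaining, cutoff)
        clusters.append((centre, sorted(members)))
        remaining -= set(members)
    return clusters
```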
Affiliation(s)
- R Bonneau
- Department of Biochemistry, Box 357350, University of Washington, Seattle, Washington, USA
35
Abstract
We present the results of a large-scale testing of the ROSETTA method for ab initio protein structure prediction. Models were generated for two independently generated lists of small proteins (up to 150 amino acid residues), and the results were evaluated using traditional rmsd based measures and a novel measure based on the structure-based comparison of the models to the structures in the PDB using DALI. For 111 of 136 all alpha and alpha/beta proteins 50 to 150 residues in length, the method produced at least one model within 7 Å rmsd of the native structure in 1000 attempts. For 60 of these proteins, the closest structure match in the PDB to at least one of the ten most frequently generated conformations was found to be structurally related (four standard deviations above background) to the native protein. These results suggest that ab initio structure prediction approaches may soon be useful for generating low resolution models and identifying distantly related proteins with similar structures and perhaps functions for these classes of proteins on the genome scale.
Affiliation(s)
- K T Simons
- University of Washington, Seattle, WA 98195, USA
36
Abstract
We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible three-dimensional arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case, our method finds significant numbers of conformations close to the native structure. In addition, we assign coordinates to all atoms for four of the 25 proteins and show that this has a small effect on the number of near-native conformations. In the context of database driven exhaustive enumeration our method performs extremely well, yielding significant percentages of conformations (between 0.02% and 82%) within 6 Å of the native structure. The method's speed and efficiency make it a valuable tool for predicting protein structure.
Affiliation(s)
- B Fain
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
37
Carney JR, Zwier TS. The Infrared and Ultraviolet Spectra of Individual Conformational Isomers of Biomolecules: Tryptamine. J Phys Chem A 2000. [DOI: 10.1021/jp001433r] [Citation(s) in RCA: 127] [Impact Index Per Article: 5.3] [Indexed: 11/30/2022]
Affiliation(s)
- Joel R. Carney
- Department of Chemistry, Purdue University, West Lafayette, Indiana 47907-1393
- Timothy S. Zwier
- Department of Chemistry, Purdue University, West Lafayette, Indiana 47907-1393
38
Xia Y, Huang ES, Levitt M, Samudrala R. Ab initio construction of protein tertiary structures using a hierarchical approach. J Mol Biol 2000; 300:171-85. [PMID: 10864507 DOI: 10.1006/jmbi.2000.3835] [Citation(s) in RCA: 141] [Impact Index Per Article: 5.9] [Indexed: 11/22/2022]
Abstract
We present a hierarchical method to predict protein tertiary structure models from sequence. We start with complete enumeration of conformations using a simple tetrahedral lattice model. We then build conformations with increasing detail, and at each step select a subset of conformations using empirical energy functions with increasing complexity. After enumeration on lattice, we select a subset of low energy conformations using a statistical residue-residue contact energy function, and generate all-atom models using predicted secondary structure. A combined knowledge-based atomic level energy function is then used to select subsets of the all-atom models. The final predictions are generated using a consensus distance geometry procedure. We test the feasibility of the procedure on a set of 12 small proteins covering a wide range of protein topologies. A rigorous double-blind test of our method was made under the auspices of the CASP3 experiment, where we did ab initio structure predictions for 12 proteins using this approach. The performance of our methodology at CASP3 is reasonably good and completely consistent with our initial tests.
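The first stage enumerates chain conformations exhaustively on a lattice. The paper uses a tetrahedral lattice; purely for illustration, the sketch below swaps in the simpler cubic lattice and counts self-avoiding walks by depth-first enumeration. The rapid growth of the counts with chain length is why each later stage keeps only a low-energy subset.

```python
def count_saws(n_steps):
    """Count self-avoiding walks of n_steps on the simple cubic lattice (DFS)."""
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    visited = {(0, 0, 0)}

    def extend(pos, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy, dz in moves:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt not in visited:
                # Occupy the site, count all continuations, then backtrack.
                visited.add(nxt)
                total += extend(nxt, remaining - 1)
                visited.remove(nxt)
        return total

    return extend((0, 0, 0), n_steps)
```

For 1 to 4 steps this yields 6, 30, 150 and 726 walks; the fourth count is the first to fall below 6 × 5³ = 750, because closing a square is forbidden.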
Affiliation(s)
- Y Xia
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, 94305, USA
39
Lemak AS, Gunn JR. Rotamer-Specific Potentials of Mean Force for Residue Pair Interactions. J Phys Chem B 2000. [DOI: 10.1021/jp9919157] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Affiliation(s)
- Alexandre S. Lemak
- Département de Chimie, Centre de Recherche en Calcul Appliqué, and Protein Engineering Network of Centers of Excellence, Université de Montréal, C.P. 6128, Succ. Centre-ville, Montréal, Québec H3C 3J7, Canada
- John R. Gunn
- Département de Chimie, Centre de Recherche en Calcul Appliqué, and Protein Engineering Network of Centers of Excellence, Université de Montréal, C.P. 6128, Succ. Centre-ville, Montréal, Québec H3C 3J7, Canada