1
|
Bakare OO, Keyster M, Pretorius A. Identification of biomarkers for the accurate and sensitive diagnosis of three bacterial pneumonia pathogens using in silico approaches. BMC Mol Cell Biol 2020; 21:82. [PMID: 33218302 PMCID: PMC7678116 DOI: 10.1186/s12860-020-00328-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/27/2020] [Accepted: 11/09/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Pneumonia is one of the leading infectious causes of death among children under 5 years of age, killing 2500 a day; recent research has also shown that mortality is higher in the elderly. The few biomarkers identified so far for its diagnosis lack specificity, as they fail to distinguish pneumonia from related diseases such as pulmonary tuberculosis and Human Immunodeficiency Virus (HIV) infection. There is broad global consensus on the need to better understand new biomarkers produced in response to pneumonia infection, so that precise identification can overcome these limitations. Antimicrobial peptides (AMPs) have been shown to be promising therapeutic agents against numerous diseases. This work sought to identify AMPs as biomarkers for three bacterial pneumonia pathogens, Streptococcus pneumoniae, Klebsiella pneumoniae, and Acinetobacter baumannii, using in silico methods. Hidden Markov models (HMMER) were used to identify putative anti-bacterial pneumonia AMPs against the identified receptor proteins of Streptococcus pneumoniae, Klebsiella pneumoniae, and Acinetobacter baumannii. The physicochemical parameters of these putative AMPs were computed and their 3-D structures were predicted using I-TASSER. These AMPs were subsequently subjected to docking interaction analysis against the identified bacterial pneumonia pathogen proteins using PATCHDOCK. Results: The in silico analysis identified 18 antibacterial AMPs, ranked by their E values, with physicochemical parameters in conformity with those of known experimentally validated AMPs. The AMPs also bound the pneumonia receptors of their respective pathogens sensitively at the extracellular regions.
Conclusions: The propensity of these AMPs to bind the proteins of the pneumonia pathogens indicates that they are potential candidate biomarkers for the detection of these bacterial pathogens in point-of-care (POC) pneumonia diagnostics. The high sensitivity, accuracy, and specificity of the AMPs likewise support the use of HMMER in the design and discovery of AMPs for disease diagnostics and therapeutics.
Affiliation(s)
- Olalekan Olanrewaju Bakare
  - Bioinformatics Research Group, Biotechnology Department, University of the Western Cape, Cape Town, 7535, South Africa
  - Environmental Biotechnology Laboratory, Biotechnology Department, University of the Western Cape, Cape Town, 7535, South Africa
- Marshall Keyster
  - Environmental Biotechnology Laboratory, Biotechnology Department, University of the Western Cape, Cape Town, 7535, South Africa
- Ashley Pretorius
  - Bioinformatics Research Group, Biotechnology Department, University of the Western Cape, Cape Town, 7535, South Africa
|
2
|
dos Santos RN, Ferrari AJR, de Jesus HCR, Gozzo FC, Morcos F, Martínez L. Enhancing protein fold determination by exploring the complementary information of chemical cross-linking and coevolutionary signals. Bioinformatics 2018; 34:2201-2208. [DOI: 10.1093/bioinformatics/bty074] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/19/2017] [Accepted: 02/10/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ricardo N dos Santos
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
  - Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
- Fábio C Gozzo
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, USA
- Leandro Martínez
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
  - Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
|
3
|
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Michaela Fooksa
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
  - Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
- Sten Heinze
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
|
4
|
Deng H, Jia Y, Zhang Y. 3DRobot: automated generation of diverse and well-packed protein structure decoys. Bioinformatics 2015; 32:378-87. [PMID: 26471454 DOI: 10.1093/bioinformatics/btv601] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 07/28/2015] [Accepted: 10/10/2015] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Computationally generated non-native protein structure conformations (or decoys) are often used for designing protein folding simulation methods and force fields. However, almost all the decoy sets currently used in the literature suffer from an uneven root mean square deviation (RMSD) distribution with a bias toward non-protein-like hydrogen-bonding and compactness patterns. Meanwhile, most protein decoy sets are pre-calculated and there is a lack of methods for automated generation of high-quality decoys for any target protein. RESULTS: We developed a new algorithm, 3DRobot, to create protein structure decoys by free fragment assembly with enhanced hydrogen-bonding and compactness interactions. The method was benchmarked with three widely used decoy sets from ab initio folding and comparative modeling simulations. The decoys generated by 3DRobot are shown to have significantly enhanced diversity and evenness with a continuous distribution in the RMSD space. The new energy terms introduced in 3DRobot improve the hydrogen-bonding network and compactness of decoys, which eliminates the possibility of native structure recognition by trivial potentials. Algorithms that can automatically create such diverse and well-packed non-native conformations from any protein structure should have a broad impact on the development of advanced protein force field and folding simulation methods. AVAILABILITY AND IMPLEMENTATION: http://zhanglab.ccmb.med.umich.edu/3DRobot/ CONTACT: jiay@phy.ccnu.edu.cn; zhng@umich.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Haiyou Deng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 45108, USA
  - Department of Physics and Institute of Biophysics, Central China Normal University, Wuhan 430079, China
- Ya Jia
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 45108, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 45108, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 45108, USA
|
5
|
Zhang J, Barz B, Zhang J, Xu D, Kosztin I. Selective refinement and selection of near-native models in protein structure prediction. Proteins 2015; 83:1823-35. [PMID: 26214389 PMCID: PMC4700123 DOI: 10.1002/prot.24866] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/08/2015] [Revised: 06/22/2015] [Accepted: 07/21/2015] [Indexed: 11/07/2022]
Abstract
In recent years, in silico protein structure prediction has reached a level where fully automated servers can generate large pools of near-native structures. However, the identification and further refinement of the best structures from the pool of models remain problematic. To address these issues, we have developed (i) a target-specific selective refinement (SR) protocol; and (ii) a molecular dynamics (MD) simulation based ranking (SMDR) method. In SR, the all-atom refinement of structures is accomplished via the Rosetta Relax protocol, subject to specific constraints determined by the size and complexity of the target. The best-refined models are selected with SMDR by testing their relative stability against gradual heating through all-atom MD simulations. Through extensive testing we have found that Mufold-MD, our fully automated protein structure prediction server updated with the SR and SMDR modules, consistently outperformed its previous versions.
Affiliation(s)
- Jiong Zhang
  - Department of Physics and Astronomy, University of Missouri, Columbia, Missouri 65211
- Bagdan Barz
  - Department of Physics and Astronomy, University of Missouri, Columbia, Missouri 65211
- Jingfen Zhang
  - Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211
- Dong Xu
  - Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211
- Ioan Kosztin
  - Department of Physics and Astronomy, University of Missouri, Columbia, Missouri 65211
|
6
|
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
|
7
|
Abstract
Empirical protein folding potential functions should have a global minimum near the native conformation of globular proteins that fold stably, and they should give the correct free energy of folding. We demonstrate that otherwise very successful potentials fail to have even a local minimum anywhere near the native conformation, and a seemingly well validated method of estimating the thermodynamic stability of the native state is extremely sensitive to small perturbations in atomic coordinates. These are both indicative of fitting a great deal of irrelevant detail. Here we show how to devise a robust potential function that succeeds very well at both tasks, at least for a limited set of proteins, and this involves developing a novel representation of the denatured state. Predicted free energies of unfolding for 25 mutants of barnase are in close agreement with the experimental values, while for 17 mutants there are substantial discrepancies.
Affiliation(s)
- M Chhajer
  - Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599 U.S.A
|
8
|
Fan H, Schneidman-Duhovny D, Irwin JJ, Dong G, Shoichet BK, Sali A. Statistical potential for modeling and ranking of protein-ligand interactions. J Chem Inf Model 2011; 51:3078-92. [PMID: 22014038 DOI: 10.1021/ci200377u] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/13/2023]
Abstract
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
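The pose-recognition criterion in the abstract above (a pose counts as correct when its all-atom RMS error from the crystal structure is at most 2 Å, and a complex counts as a success when the best-scored pose is correct) can be sketched as follows. This is a minimal illustration, not the PoseScore implementation: the function names and the data layout are hypothetical, and it assumes that a lower score means a better pose.

```python
from typing import List, Tuple

# Each pose is a (score, rmsd_to_crystal_in_angstrom) pair.
# Assumption: lower score = better pose (not stated in the abstract).
Pose = Tuple[float, float]


def top_pose_is_correct(poses: List[Pose], rmsd_cutoff: float = 2.0) -> bool:
    """Return True if the best-scored pose lies within rmsd_cutoff of the crystal pose."""
    if not poses:
        raise ValueError("need at least one pose")
    _, best_rmsd = min(poses, key=lambda pose: pose[0])
    return best_rmsd <= rmsd_cutoff


def success_rate(benchmark: List[List[Pose]], rmsd_cutoff: float = 2.0) -> float:
    """Fraction of complexes whose best-scored pose is a correct binding pose."""
    if not benchmark:
        raise ValueError("need at least one complex")
    hits = sum(top_pose_is_correct(poses, rmsd_cutoff) for poses in benchmark)
    return hits / len(benchmark)
```

Applied to a benchmark with one pose list per protein-ligand complex, `success_rate` yields the kind of percentage the abstract reports for its benchmark of 100 complexes.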
Affiliation(s)
- Hao Fan
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA
|
9
|
Liu T, Horst JA, Samudrala R. A novel method for predicting and using distance constraints of high accuracy for refining protein structure prediction. Proteins 2009; 77:220-34. [PMID: 19422061 DOI: 10.1002/prot.22434] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
The principal bottleneck in protein structure prediction is the refinement of models from lower accuracies to the resolution observed by experiment. We developed a novel constraints-based refinement method that identifies a high number of accurate input constraints from initial models and rebuilds them using restrained torsion angle dynamics (rTAD). We previously created a Bayesian statistics-based residue-specific all-atom probability discriminatory function (RAPDF) to discriminate native-like models by measuring the probability of accuracy for atom type distances within a given model. Here, we exploit RAPDF to score (i.e., filter) constraints from initial predictions that may or may not be close to a native-like state, obtain consensus of top scoring constraints amongst five initial models, and compile sets with no redundant residue pair constraints. We find that this method consistently produces a large and highly accurate set of distance constraints from which to build refinement models. We further optimize the balance between accuracy and coverage of constraints by producing multiple structure sets using different constraint distance cutoffs, and note that the cutoff governs spatially near versus distant effects in model generation. This complete procedure of deriving distance constraints for rTAD simulations improves the quality of initial predictions significantly in all cases evaluated by us. Our procedure represents a significant step in solving the protein structure prediction and refinement problem, by enabling the use of consensus constraints, RAPDF, and rTAD for protein structure modeling and refinement.
Affiliation(s)
- Tianyun Liu
  - Department of Genetics, Stanford University, Stanford, California, USA
|
10
|
Wang Z, Tegge AN, Cheng J. Evaluating the absolute quality of a single protein model using structural features and support vector machines. Proteins 2009; 75:638-47. [PMID: 19004001 DOI: 10.1002/prot.22275] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Indexed: 11/12/2022]
Abstract
Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e. GDT-TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross-validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template-based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top-ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking.
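The two evaluation measures quoted in the abstract above, the correlation between predicted and true scores and the "loss" between the best model and the top-ranked model, can be sketched for a single target as follows. This is an illustrative sketch only: the function names are hypothetical, Pearson correlation is assumed (the abstract does not name the coefficient), and higher scores are taken to mean better models, as with GDT-TS.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and true quality scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def ranking_loss(predicted: Sequence[float], true: Sequence[float]) -> float:
    """True score of the best model minus true score of the model ranked first by prediction."""
    if len(predicted) != len(true) or not predicted:
        raise ValueError("need two equally long, non-empty sequences")
    top_ranked = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top_ranked]
```

Averaging `ranking_loss` over all targets in a dataset gives a figure comparable to the average loss the abstract reports for CASP7.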
Affiliation(s)
- Zheng Wang
  - Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
|
11
|
Gao X, Xu J, Li SC, Li M. Predicting local quality of a sequence-structure alignment. J Bioinform Comput Biol 2009; 7:789-810. [PMID: 19785046 DOI: 10.1142/s0219720009004345] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/16/2009] [Revised: 04/06/2009] [Accepted: 04/07/2009] [Indexed: 11/18/2022]
Abstract
Although protein structure prediction has made great progress in recent years, a protein model derived from automated prediction methods is subject to various errors. As methods for structure prediction develop, a continuing problem is how to evaluate the quality of a protein model, especially to identify well-predicted regions of the model, so that the structural biology community can benefit from automated structure prediction. It is also important to identify badly predicted regions in a model so that refinement measures can be applied to it. We present two complementary techniques, FragQA and PosQA, to accurately predict the local quality of a sequence-structure (i.e. sequence-template) alignment generated by comparative modeling (i.e. homology modeling and threading). FragQA and PosQA predict local quality from two different perspectives. Unlike existing methods, FragQA directly predicts the cRMSD between a continuously aligned fragment determined by an alignment and the corresponding fragment in the native structure, while PosQA predicts the quality of an individual aligned position. Both FragQA and PosQA use an SVM (Support Vector Machine) regression method to perform prediction using similar information extracted from a single given alignment. Experimental results demonstrate that FragQA performs well on predicting local fragment quality, and PosQA outperforms two top-notch methods, ProQres and ProQprof. Our results indicate that (1) local quality can be predicted well; (2) local sequence evolutionary information (i.e. sequence similarity) is the major factor in predicting local quality; and (3) structural information such as solvent accessibility and secondary structure helps to improve the prediction performance.
Affiliation(s)
- Xin Gao
  - David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada
|
12
|
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed either to satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduces the true energies for high-temperature Boltzmann-distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann-distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity.
Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
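For orientation, the Boltzmann assumption tested in the abstract above is commonly written as an inverse-Boltzmann relation between observed and reference contact counts. The notation below is illustrative and not taken from the paper, and the composition-based reference state shown is one common random-mixing choice that may differ from the "ideal reference state" the author uses.

```latex
% e_{ij}: effective contact energy between residue types i and j
% n_{ij}^{obs}: observed number of i-j contacts (unordered pairs) in the database
% n_{ij}^{ref}: number of i-j contacts expected in the reference state
% x_i: mole fraction of residue type i, N_c: total number of contacts
\[
  e_{ij} = -k_B T \,\ln \frac{n_{ij}^{\mathrm{obs}}}{n_{ij}^{\mathrm{ref}}},
  \qquad
  n_{ij}^{\mathrm{ref}} = (2 - \delta_{ij})\, x_i\, x_j\, N_c
\]
```

Under this form, a contact type observed more often than random mixing predicts receives a negative (attractive) energy, and one observed less often receives a positive (repulsive) energy.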
Affiliation(s)
- Marcos R Betancourt
  - Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA
|
13
|
Zhao F, Li S, Sterner BW, Xu J. Discriminative learning for protein conformation sampling. Proteins 2009; 73:228-40. [PMID: 18412258 DOI: 10.1002/prot.22057] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022]
Abstract
Protein structure prediction without using templates (i.e., ab initio folding) is one of the most challenging problems in structural biology. In particular, conformation sampling poses a major bottleneck for ab initio folding. This article presents CRFSampler, an extensible protein conformation sampler built on a probabilistic graphical model, Conditional Random Fields (CRFs). Using a discriminative learning method, CRFSampler can automatically learn more than ten thousand parameters quantifying the relationship among primary sequence, secondary structure, and (pseudo) backbone angles. Using only compactness and self-avoiding constraints, CRFSampler can efficiently generate protein-like conformations from primary sequence and predicted secondary structure. CRFSampler is also very flexible in that a variety of model topologies and feature sets can be defined to model the sequence-structure relationship without worrying about parameter estimation. Our experimental results demonstrate that, using a simple set of features, CRFSampler can generate decoys of much higher quality than the most recent HMM model.
Affiliation(s)
- Feng Zhao
  - Toyota Technological Institute at Chicago, Chicago, Illinois, USA
|
14
|
Vassura M, Margara L, Fariselli P, Casadio R. A graph theoretic approach to protein structure selection. Artif Intell Med 2009; 45:229-37. [DOI: 10.1016/j.artmed.2008.07.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/19/2007] [Revised: 07/25/2008] [Accepted: 07/26/2008] [Indexed: 11/28/2022]
|
15
|
Eramian D, Eswar N, Shen MY, Sali A. How well can the accuracy of comparative protein structure models be predicted? Protein Sci 2008; 17:1881-93. [PMID: 18832340 DOI: 10.1110/ps.036061.108] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.1] [Indexed: 10/21/2022]
Abstract
Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Calpha root-mean-squared deviation (RMSD) and native overlap (NO3.5A) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model-specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5A errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71.
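The two error measures that the protocol above predicts can be computed directly when the native structure is known. The sketch below is illustrative only: the function name is hypothetical, the model is assumed to be already rigidly superposed on the native structure, and native overlap is taken to be the fraction of Cα atoms lying within 3.5 Å of their native positions, which is how NO3.5A is read here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd_and_native_overlap(
    model: Sequence[Coord], native: Sequence[Coord], cutoff: float = 3.5
) -> Tuple[float, float]:
    """Return (C-alpha RMSD, native overlap) for pre-superposed, residue-matched coordinates."""
    if len(model) != len(native) or not model:
        raise ValueError("need two equally long, non-empty coordinate lists")
    # Per-residue distance between each model atom and its native counterpart.
    dists = [math.dist(m, n) for m, n in zip(model, native)]
    rmsd = math.sqrt(sum(d * d for d in dists) / len(dists))
    # Fraction of residues whose C-alpha lies within the cutoff of the native position.
    overlap = sum(d <= cutoff for d in dists) / len(dists)
    return rmsd, overlap
```

A full evaluation would first find the optimal superposition (for example with the Kabsch algorithm), which is omitted here to keep the sketch dependency-free.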
Affiliation(s)
- David Eramian
  - Graduate Group in Biophysics, University of California at San Francisco, California 94158, USA
|
16
|
Halperin I, Glazer DS, Wu S, Altman RB. The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications. BMC Genomics 2008; 9 Suppl 2:S2. [PMID: 18831785 PMCID: PMC2559884 DOI: 10.1186/1471-2164-9-s2-s2] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 12/03/2022] Open
Abstract
Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognition of functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework, and describe the recent developments in its use. These include automating training sets selection to increase functional coverage, coupling FEATURE to structural diversity generating methods such as molecular dynamics simulations and loop modeling methods to improve performance, and using FEATURE in large-scale modeling and structure determination efforts.
Affiliation(s)
- Inbal Halperin
  - Department of Genetics, 318 Campus Drive, Clark Center S240, Stanford, CA 94305, USA
|
17
|
Ngan SC, Hung LH, Liu T, Samudrala R. Scoring functions for de novo protein structure prediction revisited. Methods Mol Biol 2008; 413:243-81. [PMID: 18075169 DOI: 10.1007/978-1-59745-574-9_10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 04/08/2023]
Abstract
De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures are generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. We discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction in this chapter.
Affiliation(s)
- Shing-Chung Ngan
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, USA
|
18
|
Liu T, Guerquin M, Samudrala R. Improving the accuracy of template-based predictions by mixing and matching between initial models. BMC Structural Biology 2008; 8:24. [PMID: 18457597 PMCID: PMC2424052 DOI: 10.1186/1472-6807-8-24] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 06/20/2007] [Accepted: 05/05/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Comparative modeling is a technique to predict the three-dimensional structure of a given protein sequence based primarily on its alignment to one or more proteins with experimentally determined structures. A major bottleneck of current comparative modeling methods is the lack of methods to accurately refine a starting initial model so that it approaches the resolution of the corresponding experimental structure. We investigate the effectiveness of a graph-theoretic clique-finding approach to solve this problem. RESULTS Our method takes into account the information presented in multiple templates/alignments at the three-dimensional level by mixing and matching regions between different initial comparative models. This method enables us to obtain an optimized conformation ensemble representing the best combination of secondary structures, resulting in refined models of higher quality. In addition, the process of mixing and matching accumulates near-native conformations, which allows the native-like conformation to be discriminated more effectively. In the seventh Critical Assessment of Structure Prediction (CASP7) experiment, the refined models produced are more accurate than the starting initial models. CONCLUSION This novel approach can be applied without any manual intervention to improve the quality of comparative predictions where multiple template/alignment combinations are available for modeling, producing conformational models of higher quality than the starting initial predictions.
Affiliation(s)
- Tianyun Liu
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
- Michal Guerquin
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
- Ram Samudrala
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
|
19
|
Felts AK, Gallicchio E, Chekmarev D, Paris KA, Friesner RA, Levy RM. Prediction of Protein Loop Conformations using the AGBNP Implicit Solvent Model and Torsion Angle Sampling. J Chem Theory Comput 2008; 4:855-868. [PMID: 18787648 DOI: 10.1021/ct800051k] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]
Abstract
The OPLS-AA all-atom force field and the Analytical Generalized Born plus Non-Polar (AGBNP) implicit solvent model, in conjunction with torsion angle conformational search protocols based on the Protein Local Optimization Program (PLOP), are shown to be effective in predicting the native conformations of 57 nine-residue and 35 thirteen-residue loops of a diverse series of proteins with low sequence identity. The novel nonpolar solvation free energy estimator implemented in AGBNP, augmented by correction terms aimed at reducing the occurrence of ion pairing, is important for achieving the best prediction accuracy. Extended versions of the previously developed PLOP-based conformational search schemes based on calculations in the crystal environment are reported that are suitable for application to loop homology modeling without the crystal environment. Our results suggest that in general the loop backbone conformation is not strongly influenced by crystal packing. The application of the temperature Replica Exchange Molecular Dynamics (T-REMD) sampling method to a few examples where PLOP sampling is insufficient is also reported. The results reported indicate that the OPLS-AA/AGBNP effective potential is suitable for high-resolution modeling of proteins in the final stages of homology modeling and/or protein crystallographic refinement.
Affiliation(s)
- Anthony K Felts
- Department of Chemistry and Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey 08854
|
20
|
Wendel C, Gohlke H. Predicting transmembrane helix pair configurations with knowledge-based distance-dependent pair potentials. Proteins 2008; 70:984-99. [PMID: 17847096 DOI: 10.1002/prot.21574] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
As a first step toward a novel de novo structure prediction approach for alpha-helical membrane proteins, we developed coarse-grained knowledge-based potentials to score the mutual configuration of transmembrane (TM) helices. Using a comprehensive database of 71 known membrane protein structures, pairwise potentials depending solely on amino acid types and distances between Cα atoms were derived. To evaluate the potentials, they were used as an objective function for the rigid docking of 442 TM helix pairs. This is by far the largest test data set reported to date for that purpose. After clustering 500 docking runs for each pair and considering the largest cluster, we found solutions with a root mean squared (RMS) deviation <2 Å for about 30% of all helix pairs. Encouragingly, if only clusters that contain at least 20% of all decoys are considered, a success rate >71% (with an RMS deviation <2 Å) is obtained. The cluster size thus serves as a measure of significance to identify good docking solutions. In a leave-one-protein-family-out cross-validation study, more than 2/3 of the helix pairs were still predicted with an RMS deviation <2.5 Å (if only clusters that contain at least 20% of all decoys are considered). This demonstrates the predictive power of the potentials in general, although it is advisable to further extend the knowledge base to derive more robust potentials in the future. Our cross-validated potentials perform comparably to the scoring function of Fleishman and Ben-Tal. Finally, well-predicted "anchor helix pairs" can be reliably identified for most of the proteins of the test data set. This is important for an extension of the approach towards TM helix bundles because these anchor pairs will act as "nucleation sites" to which more helices will be added subsequently, which alleviates the sampling problem.
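To illustrate how a distance-dependent pair potential of this general kind can be derived, the sketch below applies the generic inverse-Boltzmann recipe to binned Cα-Cα distances. It is not the authors' implementation: the function name `pair_potential`, the bin width, the distance cutoff, the pseudocount, and the reference state (distances pooled over all residue-type pairs) are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def pair_potential(observations, bin_width=1.0, max_dist=12.0, pseudocount=1.0):
    """Derive a binned pair potential (in kT units) from observed residue pairs.

    observations: iterable of (residue_type_a, residue_type_b, ca_distance).
    Returns {((type_a, type_b), bin_index): energy}.
    All parameter defaults are illustrative assumptions.
    """
    n_bins = int(math.ceil(max_dist / bin_width))
    pair_bin = Counter()    # counts per (pair type, distance bin)
    pair_total = Counter()  # counts per pair type
    bin_total = Counter()   # counts per distance bin, all pair types pooled
    total = 0

    for res_a, res_b, dist in observations:
        if not 0.0 <= dist < max_dist:
            continue  # ignore pairs outside the cutoff
        pair = tuple(sorted((res_a, res_b)))  # A-B and B-A are the same pair
        k = min(int(dist / bin_width), n_bins - 1)
        pair_bin[(pair, k)] += 1
        pair_total[pair] += 1
        bin_total[k] += 1
        total += 1

    energies = {}
    for pair, n_pair in pair_total.items():
        for k in range(n_bins):
            # Pseudocounts keep sparsely populated bins finite.
            p_obs = (pair_bin[(pair, k)] + pseudocount) / (n_pair + pseudocount * n_bins)
            p_ref = (bin_total[k] + pseudocount) / (total + pseudocount * n_bins)
            # Inverse Boltzmann: enriched distances get negative (favorable) energy.
            energies[(pair, k)] = -math.log(p_obs / p_ref)
    return energies
```

A docking pose could then be scored by summing `energies[(pair, bin)]` over all inter-helix residue pairs; lower totals would indicate configurations that resemble the training structures more closely.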
Affiliation(s)
- Christina Wendel
- Department of Biological Sciences, Molecular Bioinformatics Group, J. W. Goethe-University, Frankfurt, Germany
|
21
|
Panjkovich A, Melo F, Marti-Renom MA. Evolutionary potentials: structure specific knowledge-based potentials exploiting the evolutionary record of sequence homologs. Genome Biol 2008; 9:R68. [PMID: 18397517 PMCID: PMC2643939 DOI: 10.1186/gb-2008-9-4-r68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/13/2008] [Revised: 04/02/2008] [Accepted: 04/08/2008] [Indexed: 11/10/2022] Open
Abstract
We introduce a new type of knowledge-based potential for protein structure prediction, called 'evolutionary potentials', which are derived using a single experimental protein structure and all three-dimensional models of its homologous sequences. The new potentials have been benchmarked against other knowledge-based potentials, resulting in a significant increase in accuracy for model assessment. In contrast to standard knowledge-based potentials, we propose that evolutionary potentials capture key determinants of thermodynamic stability and specific sequence constraints required for fast folding.
Affiliation(s)
- Alejandro Panjkovich
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
|
22
|
Benkert P, Tosatto SCE, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008; 71:261-77. [PMID: 17932912 DOI: 10.1002/prot.21715] [Citation(s) in RCA: 745] [Impact Index Per Article: 46.6] [Indexed: 11/06/2022]
Abstract
In protein structure prediction, a considerable number of alternative models are usually produced, from which the final model subsequently has to be selected. Thus, a scoring function for the identification of the best model within an ensemble of alternative models is a key component of most protein structure prediction pipelines. QMEAN, which stands for Qualitative Model Energy ANalysis, is a composite scoring function describing the major geometrical aspects of protein structures. Five different structural descriptors are used. The local geometry is analyzed by a new kind of torsion angle potential over three consecutive amino acids. A secondary structure-specific distance-dependent pairwise residue-level potential is used to assess long-range interactions. A solvation potential describes the burial status of the residues. Two simple terms describing the agreement of predicted and calculated secondary structure and solvent accessibility, respectively, are also included. A variety of different implementations are investigated and several approaches to combine and optimize them are discussed. QMEAN was tested on several standard decoy sets, including a molecular dynamics simulation decoy set, as well as on a comprehensive data set of a total of 22,420 models from server predictions for the 95 targets of CASP7. In a comparison to five well-established model quality assessment programs, QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models. The three-residue torsion angle potential turned out to be very effective in recognizing the native fold.
Affiliation(s)
- Pascal Benkert
- Institute for Biochemistry, University of Cologne, 50674 Cologne, Germany
|
23
|
Tan CW, Jones DT. Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction. BMC Bioinformatics 2008; 9:94. [PMID: 18267018 PMCID: PMC2267779 DOI: 10.1186/1471-2105-9-94] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/01/2007] [Accepted: 02/11/2008] [Indexed: 11/13/2022] Open
Abstract
Background We present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks. Results The best-performing neural network is the one whose input information comprises PSI-BLAST [1] profiles of residue pairs, pairwise distances, and the relative solvent accessibilities of the residues. This neural network is the best among all methods tested in discriminating the native structure from a set of decoys for all decoy datasets tested. Conclusion This method is demonstrated to be viable, and furthermore evolutionary information is successfully used in the neural networks to improve decoy discrimination.
Affiliation(s)
- Ching-Wai Tan
- Department of Computer Science, University College London, London, UK.
|
24
|
Wallner B, Elofsson A. Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins 2008; 69 Suppl 8:184-93. [PMID: 17894353 DOI: 10.1002/prot.21774] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Indexed: 11/06/2022]
Abstract
The ability to rank and select the best model is important in protein structure prediction. Model Quality Assessment Programs (MQAPs) are programs developed to perform this task. They can be divided into three categories based on the information they use. Consensus-based methods use the similarity to other models, structure-based methods use features calculated from the structure, and evolutionary-based methods use the sequence similarity between a model and a template. These methods can be trained to predict the overall global quality of a model, that is, how much a model is likely to differ from the native structure. The methods can also be trained to pinpoint which local regions in a model are likely to be incorrect. In CASP7, we participated with three predictors of global and four of local quality using information from the three categories described above. The results show that the MQAP using consensus, Pcons, was significantly better at predicting both global and local quality compared with MQAPs using only structure- or sequence-based information.
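The consensus idea described above can be sketched in a few lines: each model is scored by its average similarity to every other model in the set, so models that agree with the majority rank highest. This is only a schematic illustration, not Pcons itself; the function name `consensus_scores` is hypothetical, and the pairwise similarity matrix is assumed to have been computed beforehand by some structural comparison method.

```python
def consensus_scores(similarity):
    """Return, for each model, its mean similarity to all other models.

    similarity: square symmetric matrix (list of lists) where
    similarity[i][j] is a precomputed structural similarity between
    models i and j (assumed here to lie in [0, 1]).
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models to form a consensus")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Toy example: model 1 agrees best with the rest of the set.
sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
scores = consensus_scores(sim)
best_model = max(range(len(scores)), key=lambda i: scores[i])
```

The self-similarity on the diagonal is excluded so that a model cannot raise its own score.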
Affiliation(s)
- Björn Wallner
- Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden.
|
25
|
Qiu J, Sheffler W, Baker D, Noble WS. Ranking predicted protein structures with support vector regression. Proteins 2007; 71:1175-82. [PMID: 18004754 DOI: 10.1002/prot.21809] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Indexed: 11/05/2022]
Affiliation(s)
- Jian Qiu
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
|
26
|
Abstract
Accurate and automated assessment of both geometrical errors and incompleteness of comparative protein structure models is necessary for an adequate use of the models. Here, we describe a composite score for discriminating between models with the correct and incorrect fold. To find an accurate composite score, we designed and applied a genetic algorithm method that searched for a most informative subset of 21 input model features as well as their optimized nonlinear transformation into the composite score. The 21 input features included various statistical potential scores, stereochemistry quality descriptors, sequence alignment scores, geometrical descriptors, and measures of protein packing. The optimized composite score was found to depend on (1) a statistical potential z-score for residue accessibilities and distances, (2) model compactness, and (3) percentage sequence identity of the alignment used to build the model. The accuracy of the composite score was compared with the accuracy of assessment by single and combined features as well as by other commonly used assessment methods. The testing set was representative of models produced by automated comparative modeling on a genomic scale. The composite score performed better than any other tested score in terms of the maximum correct classification rate (i.e., 3.3% false positives and 2.5% false negatives) as well as the sensitivity and specificity across the whole range of thresholds. The composite score was implemented in our program MODELLER-8 and was used to assess models in the MODBASE database that contains comparative models for domains in approximately 1.3 million protein sequences.
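The "maximum correct classification rate" mentioned above can be illustrated with a simple threshold sweep over a score. The sketch below is an assumption-laden example rather than the procedure used in the study: the function name `best_threshold` is hypothetical, higher scores are assumed to indicate a correct fold, and the false positive and false negative rates are defined here relative to the number of incorrect and correct models, respectively.

```python
def best_threshold(scores, is_correct_fold):
    """Find the score threshold that maximizes correct classification.

    A model is predicted to have the correct fold when score >= threshold.
    Returns a dict with the threshold, the accuracy, and the error rates.
    """
    n = len(scores)
    n_pos = sum(1 for y in is_correct_fold if y)
    n_neg = n - n_pos
    if n == 0 or n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one correct and one incorrect model")

    best = None
    for t in sorted(set(scores)):
        # False positive: incorrect fold accepted; false negative: correct fold rejected.
        fp = sum(1 for s, y in zip(scores, is_correct_fold) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, is_correct_fold) if s < t and y)
        accuracy = 1.0 - (fp + fn) / n
        if best is None or accuracy > best["accuracy"]:
            best = {
                "threshold": t,
                "accuracy": accuracy,
                "fp_rate": fp / n_neg,
                "fn_rate": fn / n_pos,
            }
    return best
```

Sweeping every observed score value as a candidate threshold also yields the sensitivity and specificity across the whole range of thresholds, which is how a score can be compared with others independently of any single cutoff.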
Affiliation(s)
- Francisco Melo
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile.
|
27
|
Ferrada E, Melo F. Nonbonded terms extrapolated from nonlocal knowledge-based energy functions improve error detection in near-native protein structure models. Protein Sci 2007; 16:1410-21. [PMID: 17586774 PMCID: PMC2206707 DOI: 10.1110/ps.062735907] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
Abstract
The accurate assessment of structural errors plays a key role in protein structure prediction, constitutes the first step of protein structure refinement, and has a major impact on subsequent functional inference from structural data. In this study, we assess and compare the ability of different full atom knowledge-based potentials to detect small and localized errors in comparative protein structure models of known accuracy. We have evaluated the effect of incorporating close nonbonded pairwise atom terms on the task of classifying residue modeling accuracy. Since the direct and unbiased derivation of close nonbonded terms from current experimental data is not possible, we extrapolated those terms from the corresponding pseudo-energy functions of a nonlocal knowledge-based potential. It is shown that this methodology clearly improves the detection of errors in protein models, suggesting that a proper description of close nonbonded terms is important to achieve a more complete and accurate description of native protein conformations. The use of close nonbonded terms directly derived from experimental data exhibited a poor performance, demonstrating that these terms cannot be accurately obtained by using the current data and methodology. Some external knowledge-based energy functions that are widely used in model assessment also performed poorly, which suggests that the benchmark of models and the specific error detection task tested in this study constituted a difficult challenge. The methodology presented here could be useful to detect localized structural errors not only in high-quality protein models, but also in experimental protein structures.
Affiliation(s)
- Evandro Ferrada
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
|
28
|
Towards the high-resolution protein structure prediction. Fast refinement of reduced models with all-atom force field. BMC Structural Biology 2007; 7:43. [PMID: 17603876 PMCID: PMC1933428 DOI: 10.1186/1472-6807-7-43] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 03/30/2007] [Accepted: 06/29/2007] [Indexed: 12/03/2022]
Abstract
Background Although experimental methods for determining protein structure are providing high-resolution structures, they cannot keep pace with the rate at which amino acid sequences are resolved on the scale of entire genomes. For a considerable fraction of proteins whose structures will not be determined experimentally, computational methods can provide valuable information. The value of structural models in biological research depends critically on their quality. Development of high-accuracy computational methods that reliably generate near-experimental quality structural models is an important, unsolved problem in protein structure modeling. Results Large sets of structural decoys were generated using the reduced conformational space protein modeling tool CABS. Subsequently, the reduced models were subjected to all-atom reconstruction. Then, the resulting detailed models were energy-minimized using a state-of-the-art all-atom force field, assuming fixed positions of the alpha carbons. It was shown that a very short minimization leads to the proper ranking of the quality of the models (distance from the native structure) when the all-atom energy is used as the ranking criterion. Additionally, we performed a test on medium- and low-accuracy decoys built via classical methods of comparative modeling. The test placed our model evaluation procedure among the state-of-the-art protein model assessment methods. Conclusion These test computations show that large-scale, high-resolution protein structure prediction is possible, not only for small but also for large protein domains, and that it should be based on a hierarchical approach to the modeling protocol. We employed Molecular Mechanics with fixed alpha carbons to rank-order the all-atom models built on the scaffolds of the reduced models. Our tests show that a physics-based approach, usually considered computationally too demanding for large-scale applications, can be effectively used in such studies.
|
29
|
Fasnacht M, Zhu J, Honig B. Local quality assessment in homology models using statistical potentials and support vector machines. Protein Sci 2007; 16:1557-68. [PMID: 17600147 PMCID: PMC2203356 DOI: 10.1110/ps.072856307] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 10/23/2022]
Abstract
In this study, we address the problem of local quality assessment in homology models. As a prerequisite for the evaluation of methods for predicting local model quality, we first examine the problem of measuring local structural similarities between a model and the corresponding native structure. Several local geometric similarity measures are evaluated. Two methods based on structural superposition are found to best reproduce local model quality assessments by human experts. We then examine the performance of state-of-the-art statistical potentials in predicting local model quality on three qualitatively distinct data sets. The best statistical potential, DFIRE, is shown to perform on par with the best current structure-based method in the literature, ProQres. A combination of different statistical potentials and structural features using support vector machines is shown to provide somewhat improved performance over published methods.
Affiliation(s)
- Marc Fasnacht
- Howard Hughes Medical Institute at Columbia University, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, New York, New York 10032, USA
|
30
|
Wollacott AM, Merz KM. Assessment of Semiempirical Quantum Mechanical Methods for the Evaluation of Protein Structures. J Chem Theory Comput 2007; 3:1609-1619. [PMID: 18728758 DOI: 10.1021/ct600325q] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
The ability to discriminate native structures from computer-generated misfolded ones is key to predicting the three-dimensional structure of a protein from its amino acid sequence. Here we describe an assessment of semiempirical methods for discriminating native protein structures from decoy models. The discrimination of decoys entails an analysis of a large number of protein structures, and provides a large-scale validation of quantum mechanical methods and their ability to accurately model proteins. We combine our analysis of semiempirical methods with a comparison of an AMBER force field to discriminate decoys in conjunction with a continuum solvent model. Protein decoys provide a rigorous and reliable benchmark for the evaluation of scoring functions, not only in their ability to accurately identify native structures but also to be computationally tractable to sample a large set of non-native models.
|
31
|
Protein structure prediction by all-atom free-energy refinement. BMC Structural Biology 2007; 7:12. [PMID: 17371594 PMCID: PMC1832197 DOI: 10.1186/1472-6807-7-12] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 08/23/2006] [Accepted: 03/19/2007] [Indexed: 11/18/2022]
Abstract
Background The reliable prediction of protein tertiary structure from the amino acid sequence remains challenging even for small proteins. We have developed an all-atom free-energy protein forcefield (PFF01) that we could use to fold several small proteins from completely extended conformations. Because the computational cost of de novo folding studies rises steeply with system size, this approach is unsuitable for structure prediction purposes. We therefore investigate here a low-cost free-energy relaxation protocol for protein structure prediction that combines heuristic methods for model generation with all-atom free-energy relaxation in PFF01. Results We use PFF01 to rank and cluster the conformations for 32 proteins generated by ROSETTA. For 22/10 high-quality/low-quality decoy sets we select near-native conformations with an average Cα root mean square deviation of 3.03 Å/6.04 Å. The protocol incorporates an inherent reliability indicator that succeeds for 78% of the decoy sets. In over 90% of these cases near-native conformations are selected from the decoy set. This success rate is rationalized by the quality of the decoys and the selectivity of the PFF01 forcefield, which ranks near-native conformations on average 3.06 standard deviations below the mean energy of the relaxed decoys (Z-score). Conclusion All-atom free-energy relaxation with PFF01 emerges as a powerful low-cost approach toward generic de novo protein structure prediction. The approach can be applied to large all-atom decoy sets of any origin and requires no preexisting structural information to identify the native conformation. The study provides evidence that a large class of proteins may be foldable by PFF01.
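The Z-score quoted above measures how far a conformation's energy lies from the mean of the decoy energies, in units of their standard deviation. A minimal sketch follows; the function name `energy_z_score` is hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not state which estimator was used.

```python
import statistics


def energy_z_score(target_energy, decoy_energies):
    """Z-score of one conformation's energy relative to a decoy set.

    Negative values mean the conformation is lower in energy (more
    favorable) than the average decoy.
    """
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population SD (assumption)
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (target_energy - mean) / spread


# Toy example with made-up energies in arbitrary units.
z = energy_z_score(-106.0, [-98.0, -102.0])
```

A value near -3, as in this toy example, means the conformation sits about three standard deviations below the decoy mean.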
|
32
|
Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2006; 15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1792] [Impact Index Per Article: 105.4] [Indexed: 10/24/2022]
Abstract
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
Affiliation(s)
- Min-Yi Shen
- Department of Biopharmaceutical Sciences, Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158, USA.
|
33
|
Duan MJ, Zhou YH. A contact energy function considering residue hydrophobic environment and its application in protein fold recognition. Genomics Proteomics & Bioinformatics 2006; 3:218-24. [PMID: 16689689 PMCID: PMC5172539 DOI: 10.1016/s1672-0229(05)03030-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 12/04/2022]
Abstract
The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that can better represent residue-residue and residue-solvent interactions is a crucial way to improve the prediction accuracy. The widely used contact energy functions mostly only consider the contact frequency between different types of residues; however, we find that the contact frequency also relates to the residue hydrophobic environment. Accordingly, we present an improved contact energy function to integrate the two factors, which can reflect the influence of hydrophobic interaction on the stabilization of protein 3D structure more effectively. Furthermore, a fold recognition (threading) approach based on this energy function is developed. The testing results obtained with 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function can improve the accuracy of the fold template prediction from 20% to 50%, and can also improve the accuracy of the sequence-template alignment from 35% to 65%.
|
34
|
de Sancho D, Rey A. Assessment of protein folding potentials with an evolutionary method. J Chem Phys 2006; 125:014904. [PMID: 16863330 DOI: 10.1063/1.2210931] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
Abstract
Many different protein folding potentials have been developed in the last decades, based upon knowledge of experimentally determined protein structures. Decoy-based techniques are frequently used to assess these force fields, but other methods can explore different features in the performance of the interaction schemes, thus helping in their evaluation. Here, we propose an evolutionary strategy to efficiently assess folding potentials. We apply it to three potentials with different characteristics, taken from the literature. A search for minimum energy protein topologies, treated as arrangements of rigid protein fragments, is performed. The method, applied to a set of helix bundle proteins, shows the different behavior of the studied potentials, providing a reasonably fast tool to evaluate their advantages and limitations.
Affiliation(s)
- David de Sancho
- Departamento de Química Física I, Facultad de Ciencias Químicas, Universidad Complutense, E-28040 Madrid, Spain
|
35
|
Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score for predicting errors in protein structure models. Protein Sci 2006; 15:1653-66. [PMID: 16751606 PMCID: PMC2242555 DOI: 10.1110/ps.062095806] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Indexed: 10/24/2022]
Abstract
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r = 0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
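The two evaluation measures named in this abstract can be sketched in a few lines. The snippet below is only an illustration, not the SVMod code: the function names, the convention that a lower score is better, and the toy numbers are all assumptions made for the example. It computes the RMSD difference between the model a score ranks best and the truly best model (DeltaRMSD, ΔRMSD), plus a plain Pearson correlation between scores and RMSD.

```python
def delta_rmsd(scores, rmsds, lower_is_better=True):
    """RMSD of the model the score ranks best, minus the lowest RMSD in the set."""
    if not scores or len(scores) != len(rmsds):
        raise ValueError("scores and rmsds must be non-empty and of equal length")
    pick = min if lower_is_better else max
    # Index of the model that the scoring function would select.
    chosen = pick(range(len(scores)), key=lambda i: scores[i])
    return rmsds[chosen] - min(rmsds)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores (lower = better) and RMSDs in angstroms for four models.
example_scores = [-120.0, -150.0, -90.0, -140.0]
example_rmsds = [3.1, 2.4, 5.0, 2.0]
print(delta_rmsd(example_scores, example_rmsds))
print(pearson_r(example_scores, example_rmsds))
```

Averaging `delta_rmsd` over all targets would give a per-score summary of the kind the abstract reports; how the original study handled ties or score orientation is not stated here.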
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, Department of Biopharmaceutical Sciences, University of California at San Francisco 94158, USA
|
36
|
Qiu J, Elber R. Atomically detailed potentials to recognize native and approximate protein structures. Proteins 2006; 61:44-55. [PMID: 16080157 DOI: 10.1002/prot.20585] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022]
Abstract
Atomically detailed potentials for recognition of protein folds are presented. The potentials consist of pair interactions between atoms. One or three distance steps are used to describe the range of interactions between a pair. Training is carried out with the mathematical programming approach on the decoy sets of Baker, Levitt, and some of our own design. Recognition is required not only for decoy-native structural pairs but also for pairs of decoy and homologous structures. Performance is tested on the targets of CASP5 using templates from the Protein Data Bank, on two test ab initio decoy sets from Skolnick's laboratory, and on decoy sets from Moult's laboratory. We conclude that the newly derived potentials have significant recognition capacity, comparable to the best models derived from other techniques. The new potentials require a significantly smaller number of parameters. The enhanced recognition capacity extends primarily to the identification of structures generated by ab initio simulation and less to the recognition of approximate shapes created by homology.
Affiliation(s)
- Jian Qiu
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
|
37
|
Steinbach PJ. Exploring peptide energy landscapes: a test of force fields and implicit solvent models. Proteins 2006; 57:665-77. [PMID: 15390266 DOI: 10.1002/prot.20247] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Abstract
A biased Monte Carlo-minimization/annealing conformational search was used to characterize five descriptions of the energy landscape for each of three model systems: the 20-residue "trp-cage" miniprotein, the 20-residue "BS1" peptide, and the 17-residue "U(1-17)T9D" peptide. The EEF1 and SASA energy landscapes were studied as well as those defined by using the GB/ACE implicit water model with one of three protein force fields: CHARMM19, CHARMM22, and CHARMM22/CMAP. The lowest-energy structures of the trp-cage and BS1 peptides found for the EEF1 landscape have main-chain root-mean-square deviations (rmsds) from the respective NMR structures of less than 2 Å; for U(1-17)T9D, the deviation is less than 3 Å using EEF1. The main-chain rmsd of the minimum-energy trp-cage conformation obtained for the GB/ACE/CHARMM22/CMAP landscape is less than 1 Å. However, this energy function strongly favored helical structures for the two peptides shown by NMR to form beta-sheet structures. Brief annealing of the system following main-chain conformational changes was found to enhance the exploration of low-energy states. The thousands of simulations reported here suggest that the prediction of protein structure might be improved by the simultaneous use of a CMAP-like description of the main chain and an EEF1-like description of the solvent.
Affiliation(s)
- Peter J Steinbach
- Center for Molecular Modeling, National Institutes of Health, DHHS, Bethesda, Maryland 20892-5624, USA.
|
38
|
Mayewski S. A multibody, whole-residue potential for protein structures, with testing by Monte Carlo simulated annealing. Proteins 2006; 59:152-69. [PMID: 15723360 DOI: 10.1002/prot.20397] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
A new multibody, whole-residue potential for protein tertiary structure is described. The potential is based on the local environment surrounding each main-chain alpha carbon (CA), defined as the set of all residues whose CA coordinates lie within a spherical volume of set radius in 3-dimensional (3D) space surrounding that position. It is shown that the relative positions of the CAs in these local environments belong to a set of preferred templates. The templates are derived by cluster analysis of the presently available database of over 3000 protein chains (750,000 residues) having not more than 30% sequence similarity. For each template, a set of residue propensities is also derived for each topological position in the template. Using lookup tables of these derived templates, it is then possible to calculate an energy for any conformation of a given protein sequence. The application of the potential to ab initio protein tertiary structure prediction is evaluated by performing Monte Carlo simulated annealing on test protein sequences.
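The "local environment" used by this potential is simply a distance cutoff around each CA atom, which can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the exclusion of the central residue itself, and the 8.0 radius in the example are choices made here, since the abstract does not give the radius or these details.

```python
import math


def local_environment(ca_coords, center_index, radius):
    """Indices of residues whose CA lies within `radius` of the chosen CA.

    The central residue itself is left out (an assumption of this sketch).
    """
    center = ca_coords[center_index]
    return [
        j
        for j, coord in enumerate(ca_coords)
        if j != center_index and math.dist(center, coord) <= radius
    ]


# Hypothetical CA coordinates (angstroms) for a four-residue fragment.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(local_environment(toy_coords, 0, radius=8.0))
```

In the described method, each such neighbor set would then be matched against the clustered templates; that matching step is not sketched here because the abstract does not specify it.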
Affiliation(s)
- Stefan Mayewski
- Max-Planck-Institut für Biochemie, 82152 Martinsried, Germany.
|
39
|
Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci 2006; 15:900-13. [PMID: 16522791 PMCID: PMC2242478 DOI: 10.1110/ps.051799606] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.8] [Indexed: 10/24/2022]
Abstract
In this study we present two methods to predict the local quality of a protein model: ProQres and ProQprof. ProQres is based on structural features that can be calculated from a model, while ProQprof uses alignment information and can only be used if the model is created from an alignment. In addition, we also propose a simple approach based on local consensus, Pcons-local. We show that all these methods perform better than state-of-the-art methodologies and that, when applicable, the consensus approach is by far the best approach to predict local structure quality. It was also found that ProQprof performed better than other methods for models based on distant relationships, while ProQres performed best for models based on closer relationships, i.e., a model has to be reasonably good to make a structural evaluation useful. Finally, we show that a combination of ProQprof and ProQres (ProQlocal) performed better than any other nonconsensus method for both high- and low-quality models. Additional information and Web servers are available at: http://www.sbc.su.se/~bjorn/ProQ/.
Affiliation(s)
- Björn Wallner
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden.
|
40
|
Inferring ideal amino acid interaction forms from statistical protein contact potentials. Proteins 2006; 59:49-57. [PMID: 15688450 DOI: 10.1002/prot.20380] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 11/06/2022]
Abstract
We have analyzed 29 different published matrices of protein pairwise contact potentials (CPs) between amino acids derived from different sets of proteins, either crystallographic structures taken from the Protein Data Bank (PDB) or computer-generated decoys. Each of the CPs is similar to 1 of the 2 matrices derived in the work of Miyazawa and Jernigan (Proteins 1999;34:49-68). The CP matrices of the first class can be approximated with a correlation of order 0.9 by the formula e(ij) = h(i) + h(j), 1 ≤ i, j ≤ 20, where the residue-type dependent factor h is highly correlated with the frequency of occurrence of a given amino acid type inside proteins. Electrostatic interactions for the potentials of this class are almost negligible. In the potentials belonging to this class, the major contribution to the potentials is the one-body transfer energy of the amino acid from water to the protein environment. Potentials belonging to the second class can be approximated with a correlation of 0.9 by the formula e(ij) = c(0) - h(i)h(j) + q(i)q(j), where c(0) is a constant, h is highly correlated with the Kyte-Doolittle hydrophobicity scale, and a new, less dominant, residue-type dependent factor q is correlated (approximately 0.9) with amino acid isoelectric points pI. Including electrostatic interactions significantly improves the approximation for this class of potentials. While the high correlation between potentials of the first class and the hydrophobic transfer energies is well known, the fact that this approximation can work well also for the second class of potentials is a new finding. We interpret potentials of this class as representing energies of contact of amino acid pairs within an average protein environment.
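To make the two functional forms in this abstract concrete, the sketch below builds a matrix from each formula and recovers the one-body factors h for the additive form. It is an illustration only: the least-squares fit over the full symmetric matrix, the function names, and the three-type toy values are assumptions of this example, and the abstract does not say how the authors performed their fits.

```python
def additive_matrix(h):
    """First-class form: e(ij) = h(i) + h(j)."""
    n = len(h)
    return [[h[i] + h[j] for j in range(n)] for i in range(n)]


def second_class_matrix(c0, h, q):
    """Second-class form: e(ij) = c(0) - h(i)h(j) + q(i)q(j)."""
    n = len(h)
    return [[c0 - h[i] * h[j] + q[i] * q[j] for j in range(n)] for i in range(n)]


def fit_additive(e):
    """Least-squares h for e(ij) ~ h(i) + h(j) on a full symmetric matrix.

    Minimizing the squared error over all (i, j) gives the closed form
    h(i) = row_mean(i) - grand_mean / 2.
    """
    n = len(e)
    row_means = [sum(row) / n for row in e]
    grand_mean = sum(row_means) / n
    return [rm - grand_mean / 2 for rm in row_means]


# Hypothetical one-body factors for three residue types (not real data).
h_true = [0.5, -1.0, 2.0]
h_fit = fit_additive(additive_matrix(h_true))
print(h_fit)
```

Applied to a published 20 × 20 contact-potential matrix, the residual between the matrix and `additive_matrix(fit_additive(e))` would indicate how well the first-class approximation holds.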
|
41
|
Abstract
In this article, we explore the information content of molecular force-field calculations. We make use of exhaustive lattice models of molecular conformations and reduced alphabet sequences to determine the relative resolving power of pairwise interaction-based force fields. We find that sequence-specific interactions that operate over longer distances offer greater amounts of information than nearest-neighbor or non-sequence-specific interactions. In a companion article in this issue, we explored the information content of sequence alignment procedures and the calculation of gap penalties. Both articles have implications for protein and nucleic-acid computations.
Affiliation(s)
- Tiba Aynechi
- Graduate Group in Biophysics, and Department of Pharmaceutical Chemistry, University of California-San Francisco, San Francisco, CA 94143, USA
|
42
|
Buchete NV, Straub JE, Thirumalai D. Development of novel statistical potentials for protein fold recognition. Curr Opin Struct Biol 2005; 14:225-32. [PMID: 15093838 DOI: 10.1016/j.sbi.2004.03.002] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Indexed: 10/26/2022]
Abstract
The need to perform large-scale studies of protein fold recognition, structure prediction and protein-protein interactions has led to novel developments of residue-level minimal models of proteins. A minimum requirement for useful protein force-fields is that they be successful in the recognition of native conformations. The balance between the level of detail in describing the specific interactions within proteins and the accuracy obtained using minimal protein models is the focus of many current protein studies. Recent results suggest that the introduction of explicit orientation dependence in a coarse-grained, residue-level model improves the ability of inter-residue potentials to recognize the native state. New statistical and optimization computational algorithms can be used to obtain accurate residue-dependent potentials for use in protein fold recognition and, more importantly, structure prediction.
Affiliation(s)
- N-V Buchete
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
|
43
|
Abstract
Cluster distance geometry is a recent generalization of distance geometry whereby protein structures can be described at even lower levels of detail than one point per residue. With improvements in the clustering technique, protein conformations can be summarized in terms of alternative contact patterns between clusters, where each cluster contains four sequentially adjacent amino acid residues. A very simple potential function involving 210 adjustable parameters can be determined that favors the native contacts of 31 small, monomeric proteins over their respective sets of nonnative contacts. This potential then favors the native contacts for 174 small, monomeric proteins that have low sequence identity with any of the training set. A broader search finds 698 small protein chains from the Protein Data Bank where the native contacts are preferred over all alternatives, even though they have low sequence identity with the training set. This amounts to a highly predictive method for ab initio protein folding at low spatial resolution.
Affiliation(s)
- Gordon M Crippen
- College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109-1065, USA.
|
44
|
Li X, Liang J. Geometric cooperativity and anticooperativity of three-body interactions in native proteins. Proteins 2005; 60:46-65. [PMID: 15849756 DOI: 10.1002/prot.20438] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
Characterizing multibody interactions of hydrophobic, polar, and ionizable residues in proteins is important for understanding the stability of protein structures. We introduce a geometric model for quantifying 3-body interactions in native proteins. With this model, empirical propensity values for many types of 3-body interactions can be reliably estimated from a database of native protein structures, despite the overwhelming presence of pairwise contacts. In addition, we define a nonadditive coefficient that characterizes cooperativity and anticooperativity of residue interactions in native proteins by measuring the deviation of 3-body interactions from 3 independent pairwise interactions. It compares the 3-body propensity value with what would be expected if only pairwise interactions were considered, and highlights the distinction between propensity and cooperativity of 3-body interactions. Based on the geometric model, and what can be inferred from statistical analysis of such a model, we find that hydrophobic interactions and hydrogen-bonding interactions make nonadditive contributions to protein stability, but the nonadditive nature depends on whether such interactions are located in the protein interior or on the protein surface. When located in the interior, many hydrophobic interactions such as those involving alkyl residues are anticooperative. Salt-bridge and regular hydrogen-bonding interactions, such as those involving ionizable residues and polar residues, are cooperative. When located on the protein surface, these salt-bridge and regular hydrogen-bonding interactions are anticooperative, and hydrophobic interactions involving alkyl residues become cooperative. We show with examples that incorporating 3-body interactions improves discrimination of protein native structures against decoy conformations. In addition, analysis of cooperative 3-body interactions may reveal spatial motifs that can suggest specific protein functions.
Affiliation(s)
- Xiang Li
- Department of Bioengineering, SEO, MC-063, University of Illinois at Chicago, Chicago, Illinois 60607-7052, USA
|
45
|
Abstract
Energy functions are crucial ingredients of protein tertiary structure prediction methods. Assessing the quality of energy functions is therefore of prime importance. It requires the elaboration of a standard evaluation scheme, whose key elements are: (i) sets that contain the native and several non-native structures of proteins (decoys) in order to test whether the energy functions display the expected quality features and (ii) measures to evaluate the reliability of energy functions. We present here a survey of the recent advances in these two related fields. In the first part, we analyze and review the large number of decoy sets that are available on the web, and we summarize the characteristics of a challenging decoy set. We then discuss how to define the quality of energy functions and review the measures related to it.
Affiliation(s)
- D Gilis
- Center of Applied Molecular Engineering, Institute of Chemistry and Biochemistry, University of Salzburg, Jakob Haringerstraße 3, A-5020 Salzburg, Austria.
|
46
|
Shacham S, Marantz Y, Bar-Haim S, Kalid O, Warshaviak D, Avisar N, Inbal B, Heifetz A, Fichman M, Topf M, Naor Z, Noiman S, Becker OM. PREDICT modeling and in-silico screening for G-protein coupled receptors. Proteins 2005; 57:51-86. [PMID: 15326594 DOI: 10.1002/prot.20195] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.7] [Indexed: 11/11/2022]
Abstract
G-protein coupled receptors (GPCRs) are a major group of drug targets for which only one X-ray structure is known (the nondruggable rhodopsin), limiting the application of structure-based drug discovery to GPCRs. In this paper we present the details of PREDICT, a new algorithmic approach for modeling the 3D structure of GPCRs without relying on homology to rhodopsin. PREDICT, which focuses on the transmembrane domain of GPCRs, starts from the primary sequence of the receptor, simultaneously optimizing multiple 'decoy' conformations of the protein in order to find its most stable structure, culminating in a virtual receptor-ligand complex. In this paper we present a comprehensive analysis of three PREDICT models for the dopamine D2, neurokinin NK1, and neuropeptide Y Y1 receptors. A shorter discussion of the CCR3 receptor model is also included. All models were found to be in good agreement with a large body of experimental data. The quality of the PREDICT models, at least for drug discovery purposes, was evaluated by their successful utilization in in-silico screening. Virtual screening using all three PREDICT models yielded enrichment factors 9-fold to 44-fold better than random screening. That is, the PREDICT models can be used to identify active small-molecule ligands embedded in large compound libraries with an efficiency comparable to that obtained using crystal structures for non-GPCR targets.
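The enrichment factors quoted in this abstract compare the hit rate of a screened subset with the hit rate expected from random selection. A minimal sketch of that common definition follows; the function name and the example counts are hypothetical, and the abstract does not state which exact variant of the enrichment factor the authors used.

```python
def enrichment_factor(hits_in_selection, selection_size, actives_in_library, library_size):
    """Hit rate in the selected subset divided by the hit rate of the whole library."""
    if min(selection_size, actives_in_library, library_size) <= 0:
        raise ValueError("sizes and active counts must be positive")
    # Equivalent to (hits / selection) / (actives / library), written to
    # keep the arithmetic in integers until the final division.
    return (hits_in_selection * library_size) / (selection_size * actives_in_library)


# Hypothetical screen: 10 actives among the top 100 compounds,
# from a 10,000-compound library that contains 100 actives overall.
print(enrichment_factor(10, 100, 100, 10_000))
```

A value of 1 corresponds to random selection, so the 9-fold to 44-fold figures reported above would mean the top-ranked compounds contained 9 to 44 times more actives than an equally sized random sample.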
|
47
|
Buchete NV, Straub JE, Thirumalai D. Continuous anisotropic representation of coarse-grained potentials for proteins by spherical harmonics synthesis. J Mol Graph Model 2004; 22:441-50. [PMID: 15099839 DOI: 10.1016/j.jmgm.2003.12.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
A new method is presented for extracting statistical potentials dependent on the relative side chain and backbone orientations in proteins. Coarse-grained, anisotropic potentials are constructed for short-, medium-, and long-range interactions using the Boltzmann method and a database of non-homologous protein structures. The new orientation-dependent potentials are analyzed using a spherical harmonics decomposition method with real eigenfunctions. This method permits a more realistic, continuous angular representation of the coarse-grained potentials. Results of tests for discriminating the native protein conformations from large sets of decoy proteins show that the new continuous distance- and orientation-dependent potentials present significantly improved performance. Novel graphical representations are developed and used to depict the orientational dependence of the interaction potentials. These new continuous anisotropic statistical potentials could be instrumental in developing new computational methods for structure prediction, threading and coarse-grained simulations.
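The "Boltzmann method" mentioned here generally means converting observed frequencies into energies through the inverse Boltzmann relation, E = -kT ln(P_observed / P_reference). The sketch below shows that conversion for a one-dimensional histogram. It is a generic illustration, not the authors' procedure: the binning, the reference counts, the pseudocount, and kT = 1 are all assumptions introduced for the example.

```python
import math


def boltzmann_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Per-bin energies from -kT * ln(P_obs / P_ref).

    A pseudocount is added to every bin so that empty bins do not
    produce log(0); this smoothing is an assumption of the sketch.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Hypothetical contact counts in three distance bins versus a flat-ish reference.
print(boltzmann_potential([30, 50, 20], [33, 33, 34]))
```

Bins that are over-represented relative to the reference come out with negative (favorable) energy. An orientation-dependent potential of the kind described would use bins over distance and angles rather than distance alone.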
Affiliation(s)
- N-V Buchete
- Department of Chemistry, Boston University, Boston, MA 02215, USA
|
48
|
Hsieh MJ, Luo R. Physical scoring function based on AMBER force field and Poisson-Boltzmann implicit solvent for protein structure prediction. Proteins 2004; 56:475-86. [PMID: 15229881 DOI: 10.1002/prot.20133] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Indexed: 11/07/2022]
Abstract
A well-behaved physics-based all-atom scoring function for protein structure prediction is analyzed with several widely used all-atom decoy sets. The scoring function, termed AMBER/Poisson-Boltzmann (PB), is based on a refined AMBER force field for intramolecular interactions and an efficient PB model for solvation interactions. Testing on the chosen decoy sets shows that the scoring function, which is designed to consider detailed chemical environments, is able to consistently discriminate all 62 native crystal structures after considering the heteroatom groups, disulfide bonds, and crystal packing effects that are not included in the decoy structures. When NMR structures are considered in the testing, the scoring function is able to discriminate 8 out of 10 targets. In the more challenging test of selecting near-native structures, the scoring function also performs very well: for the majority of the targets studied, the scoring function is able to select decoys that are close to the corresponding native structures as evaluated by ranking numbers and backbone Cα root mean square deviations. Various important components of the scoring function are also studied to understand their discriminative contributions toward the rankings of native and near-native structures. It is found that neither the nonpolar solvation energy as modeled by the surface area model nor a higher protein dielectric constant improves its discriminative power. The terms remaining to be improved are related to 1-4 interactions. The most troublesome term is found to be the large and highly fluctuating 1-4 electrostatics term, not the dihedral-angle term. These data support ongoing efforts in the community to develop protein structure prediction methods with physics-based potentials that are competitive with knowledge-based potentials.
Affiliation(s)
- Meng-Juei Hsieh
- Department of Molecular Biology and Biochemistry, University of California, Irvine, California 92697-3900, USA
|
49
|
Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Prog Biophys Mol Biol 2004; 86:235-77. [PMID: 15288760 DOI: 10.1016/j.pbiomolbio.2003.09.003] [Citation(s) in RCA: 207] [Impact Index Per Article: 10.4] [Indexed: 12/01/2022]
Abstract
During the process of protein folding, the amino acid residues along the polypeptide chain interact with each other in a cooperative manner to form the stable native structure. Knowledge of inter-residue interactions in protein structures is very helpful for understanding the mechanism of protein folding and stability. In this review, we introduce the classification of inter-residue interactions into short, medium and long range based on a simple geometric approach. The features of these interactions in different structural classes of globular and membrane proteins, and in various folds have been delineated. The development of contact potentials and the application of inter-residue contacts for predicting the structural class and secondary structures of globular proteins, solvent accessibility, fold recognition and ab initio tertiary structure prediction have been evaluated. Further, the relationship between inter-residue contacts and protein-folding rates has been highlighted. Moreover, the importance of inter-residue interactions in protein-folding kinetics and for understanding the stability of proteins has been discussed. In essence, the information gained from the studies on inter-residue interactions provides valuable insights for understanding protein folding and de novo protein design.
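The short/medium/long-range classification mentioned in this abstract is usually expressed in terms of sequence separation between residues that are in spatial contact. A small sketch is given below. The cutoffs used (separation of at most 2 for short range, 3 to 4 for medium range, more than 4 for long range) are assumptions of this example, since the abstract does not state them, and the spatial-contact test is left out entirely.

```python
def interaction_range(i, j, short_max=2, medium_max=4):
    """Classify a residue pair by sequence separation |i - j|.

    The default thresholds are illustrative assumptions, not values
    taken from the abstract above.
    """
    separation = abs(i - j)
    if separation == 0:
        raise ValueError("a residue does not form a pair with itself")
    if separation <= short_max:
        return "short"
    if separation <= medium_max:
        return "medium"
    return "long"


# Hypothetical residue index pairs already known to be in contact.
for pair in [(10, 11), (10, 13), (10, 40)]:
    print(pair, interaction_range(*pair))
```

Counting the pairs in each class over a structure gives the kind of per-protein contact profile that the review relates to structural class and folding rate.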
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Aomi Frontier Building 17F, 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan.
|
50
|
Buchete NV, Straub JE, Thirumalai D. Orientational potentials extracted from protein structures improve native fold recognition. Protein Sci 2004; 13:862-74. [PMID: 15044723 PMCID: PMC2280067 DOI: 10.1110/ps.03488704] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.1] [Indexed: 10/26/2022]
Abstract
We develop coarse-grained, distance- and orientation-dependent statistical potentials from the growing protein structural databases. For protein structural classes (alpha, beta, and alpha/beta), a substantial number of backbone-backbone and backbone-side-chain contacts stabilize the native folds. By taking into account the importance of backbone interactions with a virtual backbone interaction center as the 21st anisotropic site, we construct a 21 × 21 interaction scheme. The new potentials are studied using spherical harmonics analysis (SHA) and a smooth, continuous version is constructed using spherical harmonic synthesis (SHS). Our approach has the following advantages: (1) the smooth, continuous form of the resulting potentials is more realistic and presents significant advantages for computational simulations, and (2) with SHS, the potential values can be computed efficiently for arbitrary coordinates, requiring only the knowledge of a few spherical harmonic coefficients. The performance of the new orientation-dependent potentials was tested using a standard database of decoy structures. The results show that the ability of the new orientation-dependent potentials to recognize native protein folds from a set of decoy structures is strongly enhanced by the inclusion of anisotropic backbone interaction centers. The anisotropic potentials can be used to develop realistic coarse-grained simulations of proteins, with direct applications to protein design, folding, and aggregation.
|