151
Shrestha A, Dhamwichukorn S, Jenwitheesuk E. Modeling of pyruvate decarboxylases from ethanol producing bacteria. Bioinformation 2010; 4:378-84. [PMID: 20975902 PMCID: PMC2951667 DOI: 10.6026/97320630004378] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/25/2009] [Revised: 12/19/2009] [Accepted: 02/10/2010] [Indexed: 11/25/2022]
Abstract
Pyruvate decarboxylase (PDC) is a key enzyme in the homoethanol fermentation process; it decarboxylates the 2-keto acid pyruvate into acetaldehyde and carbon dioxide. PDC enzymes from potential ethanol-producing bacteria such as Zymomonas mobilis, Zymobacter palmae and Sarcina ventriculi have different K(m) and k(cat) values for the substrate pyruvate at their respective optimum pH. In this study, putative three-dimensional structures of the PDC dimers of Z. palmae and S. ventriculi were generated based on the X-ray crystal structures of Z. mobilis PDC, Saccharomyces cerevisiae PDC form-A and Enterobacter cloacae indolepyruvate decarboxylase, in order to compare the quaternary structures of these bacterial PDCs with respect to enzyme-substrate interactions and subunit-subunit interfaces that might be related to the different biochemical characteristics. The PROCHECK scores for both models were within recommended intervals. The generated models are similar to the X-ray crystal structure of Z. mobilis PDC in terms of the binding modes of the cofactor, the position of Mg(2+), and the amino acids that form the active sites. However, subunit-subunit interface analysis showed lower H-bonding in both models compared with the X-ray crystal structure of Z. mobilis PDC, suggesting a smaller interface area and the possibility of conformational change upon substrate binding in both models. Both models predicted lower affinity towards branched and aromatic 2-keto acids, which correlated with the molecular volumes of the ligands. The models provide valuable information for further improvement of PDC enzymes for industrial production of ethanol and other products.
Affiliation(s)
- Anjala Shrestha
- Joint Graduate School of Energy and Environment, King Mongkut’s University of Technology Thonburi, Prachautid Road, Toongkru, Bangkok 10140, Thailand
152
Buck PM, Bystroff C. Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field. Proteins 2010; 76:331-42. [PMID: 19137613 DOI: 10.1002/prot.22348] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). The carbon-alpha force field (CALF) builds sequence-specific statistical potentials based on database frequencies for alpha-carbon virtual bond opening and dihedral angles, pair-wise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as alpha-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 µs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full-length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways.
Affiliation(s)
- Patrick M Buck
- Department of Biology, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York, USA
153
Thomas A, Joris B, Brasseur R. Standardized evaluation of protein stability. Biochimica et Biophysica Acta - Proteins and Proteomics 2010; 1804:1265-71. [PMID: 20176144 DOI: 10.1016/j.bbapap.2010.02.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/07/2009] [Revised: 01/24/2010] [Accepted: 02/10/2010] [Indexed: 11/25/2022]
Abstract
We compare mean force potential (MFP) values of a large series of PDB models of proteins and peptides and find that, either as monomers or polymers, proteins longer than 200-250 residues have equivalent MFP values that average to -65 ± 3 kcal/aa. This value is named the standard or stability value. The standard value is reached irrespective of sequences and 3D folds. Peptides are too short to follow the rule and frequently exist as populations of conformers; one exception is peptides in amyloid fibrils. Fibrils surpass the standard value in accordance with their uppermost stability. In parallel, we calculate median MFP values of amino acids in stably folded PDB models of proteins: median values vary from -25 for Gly to -115 kcal/aa for Trp. These median values are used to score primary sequences of proteins: all sequences converge to a mean value of -63.5 ± 2.5 kcal/aa, i.e., only 1.5 kcal less than the folded model standard. Sequences from unfolded proteins have lower values. This supports the conclusion that sequences carry an important message and, more specifically, that diversity of amino acids in sequences is mandatory for stability. We also use the median amino acid MFP to score residue stability in 3D folds. This demonstrates that 3D folds are compromises between fragments of high and fragments of low scores and that functional residues are often but not always in the extreme score values. The approach opens possibilities for evaluating any 3D model and detecting functional residues, and should help in conducting mutation assays.
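The sequence-scoring step described in this abstract can be illustrated with a minimal sketch. It assumes that a sequence score is the arithmetic mean of per-residue median MFP values, which is one reading of the abstract rather than a confirmed procedure. Only the two medians quoted in the abstract (Gly and Trp) are included; the function name and the table layout are hypothetical.

```python
# Median MFP values (kcal/aa) quoted in the abstract. The other 18 amino
# acids are left out on purpose because the abstract does not give them.
MEDIAN_MFP = {"G": -25.0, "W": -115.0}


def mean_sequence_score(sequence, table=MEDIAN_MFP):
    """Average the per-residue median values over a sequence (assumed rule)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    missing = sorted(set(sequence) - set(table))
    if missing:
        raise KeyError(f"no median value for residues: {missing}")
    return sum(table[res] for res in sequence) / len(sequence)


# Toy sequence, not a real protein: two Gly and two Trp residues.
toy_score = mean_sequence_score("GWGW")
```

With a complete 20-residue table, the same function could be applied to full-length sequences to see whether their means fall near the value reported in the abstract.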
Affiliation(s)
- Annick Thomas
- CBMN, Gembloux AgroBiotech, ULg, 5030 Gembloux, Belgium.
154
Limitations of Ab initio predictions of peptide binding to MHC class II molecules. PLoS One 2010; 5:e9272. [PMID: 20174654 PMCID: PMC2822856 DOI: 10.1371/journal.pone.0009272] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 10/02/2009] [Accepted: 01/21/2010] [Indexed: 11/19/2022]
Abstract
Successful predictions of peptide MHC binding typically require a large set of binding data for the specific MHC molecule that is examined. Structure-based prediction methods promise to circumvent this requirement by evaluating the physical contacts a peptide can make with an MHC molecule based on the highly conserved 3D structure of peptide:MHC complexes. While several such methods have been described before, most are not publicly available and have not been independently tested for their performance. We here implemented and evaluated three prediction methods for MHC class II molecules: statistical potentials derived from the analysis of known protein structures; energetic evaluation of different peptide snapshots in a molecular dynamics simulation; and direct analysis of contacts made in known 3D structures of peptide:MHC complexes. These methods are ab initio in that they require structural data of the MHC molecule examined, but no specific peptide:MHC binding data. Moreover, these methods retain the ability to make predictions in a sufficiently short time scale to be useful in a real-world application, such as screening a whole proteome for candidate binding peptides. A rigorous evaluation of each method's prediction performance showed that these methods perform significantly better than random, but still substantially worse than the best-performing sequence-based class II prediction methods available. While the approaches presented here were developed independently, we have chosen to present our results together in order to support the notion that generating structure-based predictions of peptide:MHC binding without using binding data is unlikely to give satisfactory results.
155
Borlee BR, Goldman AD, Murakami K, Samudrala R, Wozniak DJ, Parsek MR. Pseudomonas aeruginosa uses a cyclic-di-GMP-regulated adhesin to reinforce the biofilm extracellular matrix. Mol Microbiol 2010; 75:827-42. [PMID: 20088866 PMCID: PMC2847200 DOI: 10.1111/j.1365-2958.2009.06991.x] [Citation(s) in RCA: 350] [Impact Index Per Article: 25.0] [Indexed: 11/30/2022]
Abstract
Pseudomonas aeruginosa, the principal pathogen of cystic fibrosis patients, forms antibiotic-resistant biofilms promoting chronic colonization of the airways. The extracellular (EPS) matrix is a crucial component of biofilms that provides the community multiple benefits. Recent work suggests that the secondary messenger, cyclic-di-GMP, promotes biofilm formation. An analysis of factors specifically expressed in P. aeruginosa under conditions of elevated c-di-GMP revealed functions involved in the production and maintenance of the biofilm extracellular matrix. We have characterized one of these components, encoded by the PA4625 gene, as a putative adhesin and designated it cdrA. CdrA shares structural similarities with extracellular adhesins that belong to two-partner secretion systems. The cdrA gene is in a two-gene operon that also encodes a putative outer membrane transporter, CdrB. The cdrA gene encodes a 220 kDa protein that is predicted to be a rod-shaped protein harbouring a β-helix structural motif. Western analysis indicates that CdrA is produced as a 220 kDa proprotein and processed to 150 kDa before secretion into the extracellular medium. We demonstrated that cdrAB expression is minimal in liquid culture, but is elevated in biofilm cultures. CdrAB expression was found to promote biofilm formation and auto-aggregation in liquid culture. Aggregation mediated by CdrA is dependent on the Psl polysaccharide and can be disrupted by adding mannose, a key structural component of Psl. Immunoprecipitation of Psl present in culture supernatants resulted in co-immunoprecipitation of CdrA, providing additional evidence that CdrA directly binds to Psl. A mutation in cdrA caused a decrease in biofilm biomass and resulted in the formation of biofilms exhibiting decreased structural integrity. Psl-specific lectin staining suggests that CdrA either cross-links Psl polysaccharide polymers and/or tethers Psl to the cells, resulting in increased biofilm structural stability. Thus, this study identifies a key protein structural component of the P. aeruginosa EPS matrix.
Affiliation(s)
- Bradley R Borlee
- Department of Microbiology, University of Washington, Box 357242, Seattle, WA 98195-7242, USA
156
Bahadur RP, Chakrabarti P. Discriminating the native structure from decoys using scoring functions based on the residue packing in globular proteins. BMC Structural Biology 2009; 9:76. [PMID: 20038291 PMCID: PMC2809062 DOI: 10.1186/1472-6807-9-76] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/11/2009] [Accepted: 12/28/2009] [Indexed: 11/14/2022]
Abstract
BACKGROUND Setting the rules for the identification of a stable conformation of a protein is of utmost importance for the efficient generation of structures in computer simulation. For structure prediction, a considerable number of possible models are generated from which the best model has to be selected. RESULTS Two scoring functions, Rs and Rp, based on the consideration of packing of residues, which indicate if the conformation of an amino acid sequence is native-like, are presented. These are defined using the solvent accessible surface area (ASA) and the partner number (PN) (other residues that are within 4.5 Å) of a particular residue. The two functions evaluate the deviation from the average packing properties (ASA or PN) of all residues in a polypeptide chain corresponding to a model of its three-dimensional structure. While simple in concept and computationally less intensive, both functions are at least as efficient as any other energy functions in discriminating the native structure from decoys in a large number of standard decoy sets, as well as on models submitted for the targets of CASP7. Rs appears to be slightly more effective than Rp, as determined by the number of times the native structure possesses the minimum value for the function and its separation from the average value for the decoys. CONCLUSION Two parameters, Rs and Rp, are discussed that can very efficiently recognize the native fold for a sequence from an ensemble of decoy structures. Unlike many other algorithms that rely on the use of a composite scoring function, these are based on a single parameter, viz., the accessible surface area (or the number of residues in contact), but are still able to capture the essential attribute of the native fold.
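As a rough illustration of the partner number (PN) defined in this abstract, the sketch below counts, for each residue, how many other residues lie within 4.5 Å. Treating two residues as partners when any pair of their atoms is within the cutoff is an assumption, since the abstract does not say which atoms are compared. The Rs and Rp functions themselves are not reproduced here, and the function name and input layout are hypothetical.

```python
import math

CUTOFF = 4.5  # Å, the distance quoted in the abstract


def partner_numbers(residues, cutoff=CUTOFF):
    """Return, for each residue, the number of other residues within `cutoff`.

    `residues` is a list of residues, each a list of (x, y, z) atom coordinates.
    """
    counts = [0] * len(residues)
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            # Assumed contact rule: any atom pair within the cutoff.
            in_contact = any(
                math.dist(a, b) <= cutoff
                for a in residues[i]
                for b in residues[j]
            )
            if in_contact:
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy coordinates: three single-atom residues placed on a line.
toy = [[(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
toy_pn = partner_numbers(toy)
```

The pairwise loop is quadratic in the number of residues, which is acceptable for a single chain; a spatial grid or k-d tree would be the usual choice for large decoy sets.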
Affiliation(s)
- Ranjit Prasad Bahadur
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
- Current address: Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, West Bengal, India
- Pinak Chakrabarti
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
157
Prediction of calcium-binding sites by combining loop-modeling with machine learning. BMC Structural Biology 2009; 9:72. [PMID: 20003365 PMCID: PMC2808310 DOI: 10.1186/1472-6807-9-72] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 06/16/2009] [Accepted: 12/11/2009] [Indexed: 01/23/2023]
Abstract
Background Protein ligand-binding sites in the apo state exhibit structural flexibility. This flexibility often frustrates methods for structure-based recognition of these sites because it leads to the absence of electron density for these critical regions, particularly when they are in surface loops. Methods for recognizing functional sites in these missing loops would be useful for recovering additional functional information. Results We report a hybrid approach for recognizing calcium-binding sites in disordered regions. Our approach combines loop modeling with a machine learning method (FEATURE) for structure-based site recognition. For validation, we compared the performance of our method on known calcium-binding sites for which there are both holo and apo structures. When loops in the apo structures are rebuilt using modeling methods, FEATURE identifies 14 out of 20 crystallographically proven calcium-binding sites. It recognizes only 7 out of 20 calcium-binding sites in the initial apo crystal structures. We applied our method to unstructured loops in proteins from SCOP families known to bind calcium in order to discover potential cryptic calcium-binding sites. We built 2745 missing loops and evaluated them for potential calcium binding. We made 102 predictions of calcium-binding sites. Ten predictions are consistent with independent experimental verifications. We found indirect experimental evidence for 14 other predictions. The remaining 78 predictions are novel, some with intriguing potential biological significance. In particular, we see an enrichment of beta-sheet folds with predicted calcium-binding sites in the connecting loops on the surface that may be important for calcium-mediated function switches. Conclusion Protein crystal structures are a potentially rich source of functional information. When loops are missing in these structures, we may be losing important information about binding sites and active sites. We have shown that limited loop modeling (e.g. loops shorter than 17 residues) combined with pattern matching algorithms can recover functions and propose putative conformations associated with these functions.
158
Shirota M, Ishida T, Kinoshita K. Analyses on hydrophobicity and attractiveness of all-atom distance-dependent potentials. Protein Sci 2009; 18:1906-15. [PMID: 19588493 DOI: 10.1002/pro.201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
Accurate model evaluation is a crucial step in protein structure prediction. For this purpose, statistical potentials, which evaluate a model structure based on the observed atomic distance frequencies in comparison with those in reference states, have been widely used. The reference state is a virtual state where all of the atomic interactions are turned off, and it provides a standard against which to measure the observed frequencies. In this study, we examined seven all-atom distance-dependent potentials with different reference states. As a result, we observed that the variations of atom pair composition and those of distance distributions in the reference states produced systematic changes in the hydrophobic and attractive characteristics of the potentials. The performance evaluations with the CASP7 structures indicated that the preference for hydrophobic interactions improved the correlation between the energy and the GDT-TS score, but decreased the Z-score of the native structure. The attractiveness of the potential improved both the correlation and the Z-score for template-based modeling targets, but the benefit was smaller in free modeling targets. These results indicated that the performances of the potentials were more strongly influenced by their characteristics than by the accuracy of the definitions of the reference states.
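Statistical potentials of this kind are commonly written in the inverse-Boltzmann form E(r) = -kT ln(f_obs(r) / f_ref(r)), where f_obs and f_ref are the observed and reference-state frequencies in a distance bin. The sketch below applies that general form to toy counts. It is not any of the seven potentials examined in the study, and the function name, the bin layout, the zero-count convention, and the value of kT are assumptions chosen for illustration.

```python
import math

KT = 0.593  # kcal/mol at roughly 298 K (assumed temperature)


def statistical_potential(observed_counts, reference_counts, kt=KT):
    """Convert per-bin counts into energies via E = -kT * ln(f_obs / f_ref)."""
    if len(observed_counts) != len(reference_counts):
        raise ValueError("count lists must have the same number of bins")
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            # Assumed convention: a bin without data expresses no preference.
            energies.append(0.0)
            continue
        f_obs = obs / obs_total
        f_ref = ref / ref_total
        energies.append(-kt * math.log(f_obs / f_ref))
    return energies


# Toy distance bins: the first is over-represented relative to the reference,
# the last is under-represented.
toy_energies = statistical_potential([30, 10, 10], [10, 10, 30])
```

Changing the reference counts while keeping the observed counts fixed shifts every energy, which is the sense in which the choice of reference state shapes how attractive a potential appears.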
Affiliation(s)
- Matsuyuki Shirota
- Human Genome Center, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
159
Aloy P, Oliva B. Splitting statistical potentials into meaningful scoring functions: testing the prediction of near-native structures from decoy conformations. BMC Structural Biology 2009; 9:71. [PMID: 19917096 PMCID: PMC2783033 DOI: 10.1186/1472-6807-9-71] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/24/2009] [Accepted: 11/16/2009] [Indexed: 11/20/2022]
Abstract
Background Recent advances in high-throughput technologies have produced a vast number of protein sequences, while the number of high-resolution structures has seen a limited increase. This has prompted the development of many strategies to build protein structures from their sequences, generating a considerable number of alternative models. The selection of the closest model to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials and combinations of both. Results Here, we present and demonstrate a theory to split the knowledge-based potentials into biologically meaningful scoring terms and to combine them in new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple and linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. Moreover, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors. Conclusion We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-the-art methods and have been successful in identifying wrongly modeled regions of many near-native conformations.
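A Z-score expresses how far one model's score lies from the mean of a set of scores, in units of their standard deviation. The sketch below computes plain Z-scores and sums them into a composite value. The unweighted sum, the function names, and the toy numbers are assumptions for illustration; they are not the composite score defined in the paper.

```python
import statistics


def z_score(value, population):
    """Z-score of `value` relative to `population` (population standard deviation)."""
    mean = statistics.fmean(population)
    sd = statistics.pstdev(population)
    if sd == 0:
        raise ValueError("population has zero spread")
    return (value - mean) / sd


def composite_z(term_values, term_populations):
    """Sum the Z-scores of several scoring terms (assumed unweighted combination)."""
    return sum(z_score(v, p) for v, p in zip(term_values, term_populations))


# Toy decoy energies for one scoring term (mean 5, standard deviation 2).
decoy_energies = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
toy_z = z_score(1.0, decoy_energies)
```

A strongly negative Z-score for a candidate model means its energy is well below the decoy average, which is the usual signal that a structure is native-like under that term.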
Affiliation(s)
- Patrick Aloy
- Institut de Recerca Biomèdica and Barcelona Supercomputing Center, 10-12 08028 Barcelona, Catalonia, Spain.
160
Liu T, Horst JA, Samudrala R. A novel method for predicting and using distance constraints of high accuracy for refining protein structure prediction. Proteins 2009; 77:220-34. [PMID: 19422061 DOI: 10.1002/prot.22434] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
The principal bottleneck in protein structure prediction is the refinement of models from lower accuracies to the resolution observed by experiment. We developed a novel constraints-based refinement method that identifies a high number of accurate input constraints from initial models and rebuilds them using restrained torsion angle dynamics (rTAD). We previously created a Bayesian statistics-based residue-specific all-atom probability discriminatory function (RAPDF) to discriminate native-like models by measuring the probability of accuracy for atom type distances within a given model. Here, we exploit RAPDF to score (i.e., filter) constraints from initial predictions that may or may not be close to a native-like state, obtain consensus of top scoring constraints amongst five initial models, and compile sets with no redundant residue pair constraints. We find that this method consistently produces a large and highly accurate set of distance constraints from which to build refinement models. We further optimize the balance between accuracy and coverage of constraints by producing multiple structure sets using different constraint distance cutoffs, and note that the cutoff governs spatially near versus distant effects in model generation. This complete procedure of deriving distance constraints for rTAD simulations improves the quality of initial predictions significantly in all cases evaluated by us. Our procedure represents a significant step in solving the protein structure prediction and refinement problem, by enabling the use of consensus constraints, RAPDF, and rTAD for protein structure modeling and refinement.
Affiliation(s)
- Tianyun Liu
- Department of Genetics, Stanford University, Stanford, California, USA
161
Gao X, Xu J, Li SC, Li M. Predicting local quality of a sequence-structure alignment. J Bioinform Comput Biol 2009; 7:789-810. [PMID: 19785046 DOI: 10.1142/s0219720009004345] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/16/2009] [Revised: 04/06/2009] [Accepted: 04/07/2009] [Indexed: 11/18/2022]
Abstract
Although protein structure prediction has made great progress in recent years, a protein model derived from automated prediction methods is subject to various errors. As methods for structure prediction develop, a continuing problem is how to evaluate the quality of a protein model, especially to identify well-predicted regions of the model, so that the structural biology community can benefit from automated structure prediction. It is also important to identify badly predicted regions in a model so that refinement measures can be applied to it. We present two complementary techniques, FragQA and PosQA, to accurately predict the local quality of a sequence-structure (i.e. sequence-template) alignment generated by comparative modeling (i.e. homology modeling and threading). FragQA and PosQA predict local quality from two different perspectives. Unlike existing methods, FragQA directly predicts cRMSD between a continuously aligned fragment determined by an alignment and the corresponding fragment in the native structure, while PosQA predicts the quality of an individual aligned position. Both FragQA and PosQA use an SVM (Support Vector Machine) regression method to perform prediction using similar information extracted from a single given alignment. Experimental results demonstrate that FragQA performs well on predicting local fragment quality, and PosQA outperforms two top-notch methods, ProQres and ProQprof. Our results indicate that (1) local quality can be predicted well; (2) local sequence evolutionary information (i.e. sequence similarity) is the major factor in predicting local quality; and (3) structural information such as solvent accessibility and secondary structure helps to improve the prediction performance.
Affiliation(s)
- Xin Gao
- David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada.
162
Abstract
Empirical or knowledge-based potentials have many applications in structural biology such as the prediction of protein structure, protein-protein, and protein-ligand interactions and in the evaluation of stability for mutant proteins, the assessment of errors in experimentally solved structures, and the design of new proteins. Here, we describe a simple procedure to derive and use pairwise distance-dependent potentials that rely on the definition of effective atomic interactions, which attempt to capture interactions that are more likely to be physically relevant. Based on a difficult benchmark test composed of proteins with different secondary structure composition and representing many different folds, we show that the use of effective atomic interactions significantly improves the performance of potentials at discriminating between native and near-native conformations. We also found that, in agreement with previous reports, the potentials derived from the observed effective atomic interactions in native protein structures contain a larger amount of mutual information. A detailed analysis of the effective energy functions shows that atom connectivity effects, which mostly arise when deriving the potential by the incorporation of those indirect atomic interactions occurring beyond the first atomic shell, are clearly filtered out. The shape of the energy functions for direct atomic interactions representing hydrogen bonding and disulfide and salt bridge formation is almost unaffected when effective interactions are taken into account. In contrast, the shape of the energy functions for indirect atom interactions (i.e., those describing the interaction between two atoms bound to a direct interacting pair) is clearly different when effective interactions are considered. Effective energy functions for indirect interacting atom pairs are not influenced by the shape or the energy minimum observed for the corresponding direct interacting atom pair. Our results suggest that the dependency between the signals in different energy functions is a key aspect that needs to be addressed when empirical energy functions are derived and used, and also highlight the importance of additivity assumptions in the use of potential energy functions.
Affiliation(s)
- Evandro Ferrada
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
163
Cendron L, Trovato A, Seno F, Folli C, Alfieri B, Zanotti G, Berni R. Amyloidogenic potential of transthyretin variants: insights from structural and computational analyses. J Biol Chem 2009; 284:25832-41. [PMID: 19602727 PMCID: PMC2757985 DOI: 10.1074/jbc.m109.017657] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 05/06/2009] [Revised: 06/17/2009] [Indexed: 11/06/2022]
Abstract
Human transthyretin (TTR) is an amyloidogenic protein whose mild amyloidogenicity is enhanced by many point mutations affecting considerably the amyloid disease phenotype. To ascertain whether the high amyloidogenic potential of TTR variants may be explained on the basis of the conformational change hypothesis, an aim of this work was to determine structural alterations for five amyloidogenic TTR variants crystallized under native and/or destabilizing (moderately acidic pH) conditions. While at acidic pH structural changes may be more significant because of a higher local protein flexibility, only limited alterations, possibly representing early events associated with protein destabilization, are generally induced by mutations. This study was also aimed at establishing to what extent wild-type TTR and its amyloidogenic variants are intrinsically prone to beta-aggregation. We report the results of a computational analysis predicting that wild-type TTR possesses a very high intrinsic beta-aggregation propensity which is on average not enhanced by amyloidogenic mutations. However, when located in beta-strands, most of these mutations are predicted to destabilize the native beta-structure. The analysis also shows that rat and murine TTR have a lower intrinsic beta-aggregation propensity and a similar native beta-structure stability compared with human TTR. This result is consistent with the lack of in vitro amyloidogenicity found for both murine and rat TTR. Collectively, the results of this study support the notion that the high amyloidogenic potential of human pathogenic TTR variants is determined by the destabilization of their native structures, rather than by a higher intrinsic beta-aggregation propensity.
Affiliation(s)
- Laura Cendron
- From the Department of Biological Chemistry, University of Padua, and Istituto di Chimica Biomolecolare, Section of Padua, Viale G. Colombo 3, 35121 Padua
- the Venetian Institute of Molecular Medicine, Via Orus 2, 35129 Padua, Italy
- Antonio Trovato
- the Department of Physics “G. Galilei” and Consorzio Nazionale Interuniversitario per le Scienze Fisiche della Materia, University of Padua, Via Marzolo 8, 35131 Padua
- Flavio Seno
- the Department of Physics “G. Galilei” and Consorzio Nazionale Interuniversitario per le Scienze Fisiche della Materia, University of Padua, Via Marzolo 8, 35131 Padua
- Claudia Folli
- the Department of Biochemistry and Molecular Biology, University of Parma, Via G.P. Usberti 23/A, 43100 Parma, and
- Beatrice Alfieri
- the Department of Biochemistry and Molecular Biology, University of Parma, Via G.P. Usberti 23/A, 43100 Parma, and
- Giuseppe Zanotti
- From the Department of Biological Chemistry, University of Padua, and Istituto di Chimica Biomolecolare, Section of Padua, Viale G. Colombo 3, 35121 Padua
- the Venetian Institute of Molecular Medicine, Via Orus 2, 35129 Padua, Italy
| | - Rodolfo Berni
- the Department of Biochemistry and Molecular Biology, University of Parma, Via G.P. Usberti 23/A, 43100 Parma, and
| |
Collapse
|
164
|
Xu B, Yang Y, Liang H, Zhou Y. An all-atom knowledge-based energy function for protein-DNA threading, docking decoy discrimination, and prediction of transcription-factor binding profiles. Proteins 2009; 76:718-30. [PMID: 19274740 DOI: 10.1002/prot.22384] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 12/26/2022]
Abstract
How to represent protein-DNA interactions accurately with an energy function is a long-standing unsolved problem in structural biology. Here, we modified a statistical potential based on the distance-scaled, finite ideal-gas reference state so that it is optimized for protein-DNA interactions. The changes include a volume-fraction correction to account for unmixable atom types in proteins and DNA, in addition to a low-count correction, residue/base-specific atom types, and a shorter cutoff distance for protein-DNA interactions. The new statistical energy functions are tested in threading and docking decoy discrimination and in the prediction of protein-DNA binding affinities and transcription-factor binding profiles. The results indicate that the newly proposed energy functions are among the best of the existing energy functions for protein-DNA interactions. The new energy functions are available as a web server called DDNA 2.0 at http://sparks.informatics.iupui.edu. The server version was trained on the entire set of 212 protein-DNA complexes.
Affiliation(s)
- Beisi Xu
- Department of Polymer Science and Engineering, University of Science and Technology of China, Hefei, Anhui, China
|
165
|
Goldman AD, Leigh JA, Samudrala R. Comprehensive computational analysis of Hmd enzymes and paralogs in methanogenic Archaea. BMC Evol Biol 2009; 9:199. [PMID: 19671178 PMCID: PMC2739858 DOI: 10.1186/1471-2148-9-199] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 11/24/2008] [Accepted: 08/11/2009] [Indexed: 11/29/2022] Open
Abstract
Background Methanogenesis is the sole means of energy production in methanogenic Archaea. H2-forming methylenetetrahydromethanopterin dehydrogenase (Hmd) catalyzes a step in the hydrogenotrophic methanogenesis pathway in class I methanogens. At least one hmd paralog has been identified in nine of the eleven complete genome sequences of class I hydrogenotrophic methanogens. The products of these paralog genes have thus far eluded any detailed functional characterization. Results Here we present a thorough computational analysis of Hmd enzymes and paralogs that includes state-of-the-art phylogenetic inference, structure prediction, and functional site prediction techniques. We determine that the Hmd enzymes are phylogenetically distinct from Hmd paralogs but share a common overall structure. We predict that the active site of the Hmd enzyme is conserved as a functional site in Hmd paralogs and use this observation to propose possible molecular functions of the paralog that are consistent with previous experimental evidence. We also identify an uncharacterized site in the N-terminal domains of both proteins that is predicted by our methods to directly impart function. Conclusion This study contributes to our understanding of the evolutionary history, structural conservation, and functional roles of the Hmd enzymes and paralogs. The results of our phylogenetic and structural analysis constitute datasets that will aid in the future study of the Hmd protein family. Our functional site predictions generate several testable hypotheses that will guide further experimental characterization of the Hmd paralog. This work also represents a novel approach to protein function prediction in which multiple computational methods are integrated to achieve a detailed characterization of proteins that are not well understood.
Affiliation(s)
- Aaron D Goldman
- Department of Microbiology, University of Washington, Seattle, WA, USA.
|
166
|
Bernard B, Samudrala R. A generalized knowledge-based discriminatory function for biomolecular interactions. Proteins 2009; 76:115-28. [PMID: 19127590 DOI: 10.1002/prot.22323] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
Several novel and established knowledge-based discriminatory function formulations and reference state derivations have been evaluated to identify parameter sets capable of distinguishing native and near-native biomolecular interactions from incorrect ones. We developed the r.m.r function, a novel atomic-level radial distribution function with a mean reference state that averages over all pairwise atom types from a reduced atom type composition, using experimentally determined intermolecular complexes in the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB) as the information sources. We demonstrate that r.m.r had the best discriminatory accuracy and power for protein-small molecule and protein-DNA interactions, regardless of whether the native complex was included in or excluded from the test set. The superior performance of the r.m.r discriminatory function compared with seventeen alternative functions evaluated on publicly available test sets for protein-small molecule and protein-DNA interactions indicated that the function was not over-optimized through back testing on a single class of biomolecular interactions. The initial success of the reduced composition and the superior performance with the CSD as the distribution set over the PDB imply that further improvements and generality of the function are possible by deriving probabilities from subsets of the CSD, using structures that consist of only the atom types to be considered for given biomolecular interactions. The method is available as a web server module at http://protinfo.compbio.washington.edu.
Affiliation(s)
- Brady Bernard
- Department of Bioengineering, University of Washington, Seattle, WA 98195, USA
|
167
|
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduces the true energies for high-temperature Boltzmann-distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann-distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity.
Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
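The quasi-chemical approximation described in this abstract can be illustrated with a minimal sketch. It assumes the common Boltzmann-inversion form, energy = -kT ln(observed / expected), with a random-mixing reference state built from the contact composition itself; the paper's exact reference state and lattice model are not given here, so the function name, the `kT` parameter, and the two-letter H/P toy database are hypothetical.

```python
import math
from collections import Counter


def quasi_chemical_energies(contacts, kT=1.0):
    """Estimate contact energies from observed residue-residue contacts.

    contacts: iterable of (type_a, type_b) pairs, one per observed contact.
    Returns {(type_a, type_b): energy}, each key sorted alphabetically.
    """
    pair_counts = Counter()
    type_counts = Counter()
    for a, b in contacts:
        pair_counts[tuple(sorted((a, b)))] += 1
        type_counts[a] += 1
        type_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_ends = sum(type_counts.values())  # two contact "ends" per pair

    energies = {}
    for (a, b), observed in pair_counts.items():
        frac_a = type_counts[a] / total_ends
        frac_b = type_counts[b] / total_ends
        # Random-mixing expectation (assumed reference state);
        # unlike pairs can form in two orders, hence the factor of 2.
        multiplicity = 1 if a == b else 2
        expected = total_pairs * frac_a * frac_b * multiplicity
        energies[(a, b)] = -kT * math.log(observed / expected)
    return energies


# Hypothetical two-letter (H/P) contact database, for illustration only.
toy_contacts = [("H", "H")] * 4 + [("H", "P")] * 2 + [("P", "P")] * 2
toy_energies = quasi_chemical_energies(toy_contacts)
```

In this toy database like pairs are over-represented relative to random mixing, so they receive negative (attractive) energies, while the under-represented H-P pair receives a positive one. Pairs that are never observed get no entry; a real derivation would need a low-count correction for them.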
Affiliation(s)
- Marcos R Betancourt
- Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
|
168
|
Verma A, Wenzel W. A free-energy approach for all-atom protein simulation. Biophys J 2009; 96:3483-94. [PMID: 19413955 DOI: 10.1016/j.bpj.2008.12.3921] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 03/12/2008] [Revised: 11/24/2008] [Accepted: 12/01/2008] [Indexed: 11/29/2022] Open
Abstract
All-atom free-energy methods offer a promising alternative to kinetic molecular mechanics simulations of protein folding and association. Here we report an accurate, transferable all-atom biophysical force field (PFF02) that stabilizes the native conformation of a wide range of proteins as the global optimum of the free-energy landscape. For 32 proteins of the ROSETTA decoy set and six proteins that we have previously folded with PFF01, we find near-native conformations with an average backbone RMSD of 2.14 Å to the native conformation and an average Z-score of -3.46 to the corresponding decoy set. We used nonequilibrium sampling techniques starting from completely extended conformations to exhaustively sample the energy surface of three nonhomologous hairpin-peptides, a three-stranded beta-sheet, the all-helical 40 amino-acid HIV accessory protein, and a zinc-finger beta beta alpha motif, and find near-native conformations for the minimal energy for each protein. Using a massively parallel evolutionary algorithm, we also obtain a near-native low-energy conformation for the 54 amino-acid engrailed homeodomain. Our force field thus stabilized near-native conformations for a total of 20 proteins of all structure classes with an average RMSD of only 3.06 Å to their respective experimental conformations.
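The Z-score quoted in this abstract measures how far the energy of a near-native conformation lies below the decoy energy distribution. A minimal sketch follows, assuming the usual definition (energy minus decoy mean, divided by the decoy standard deviation); whether the authors used the population or sample standard deviation is not stated, so the population form here is an assumption, and the energies are made-up values.

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """Z-score of one conformation's energy against a decoy set.

    Strongly negative values mean the conformation is well separated
    (lower in energy) from the bulk of the decoys.
    """
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population std: an assumption
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean) / spread


# Made-up energies in arbitrary units, for illustration only.
z = native_z_score(-16.0, [-8.0, -10.0, -12.0])
```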
Affiliation(s)
- Abhinav Verma
- Institute of Scientific Computing, Forschungszentrum Karlsruhe, Karlsruhe, Germany
|
169
|
Benkert P, Schwede T, Tosatto SC. QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information. BMC Struct Biol 2009; 9:35. [PMID: 19457232 PMCID: PMC2709111 DOI: 10.1186/1472-6807-9-35] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Received: 10/21/2008] [Accepted: 05/20/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus. RESULTS Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function, called QMEANclust, achieves a correlation coefficient between predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and performs significantly better in selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins, each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations.
We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach. CONCLUSION Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
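The pre-filtering idea in this abstract (rank models with a single-model score, then compute the structural consensus only against the top-ranked subset) can be sketched as follows. This is not the QMEANclust implementation: the function name, the `keep_fraction` parameter, the plain average over the reference subset, and the toy similarity values are all assumptions made for illustration, and the published method uses a weighted comparison.

```python
import math


def prefiltered_consensus(single_scores, similarity, keep_fraction=0.5):
    """Consensus score of each model against a pre-filtered reference subset.

    single_scores: per-model quality estimates (higher is better).
    similarity:    square matrix, similarity[i][j] in [0, 1] (e.g. GDT-like).
    keep_fraction: share of top-ranked models used as the reference subset.
    """
    n = len(single_scores)
    if n < 2:
        raise ValueError("need at least two models")
    # Keep at least two reference models so every model has a partner.
    n_keep = min(n, max(2, math.ceil(n * keep_fraction)))
    ranked = sorted(range(n), key=lambda i: single_scores[i], reverse=True)
    reference = ranked[:n_keep]

    consensus = []
    for i in range(n):
        partners = [j for j in reference if j != i]
        consensus.append(sum(similarity[i][j] for j in partners) / len(partners))
    return consensus


# Toy ensemble: models 0 and 1 score well individually; models 2 and 3
# form a tight but low-scoring cluster.
scores = [0.9, 0.8, 0.3, 0.2]
sim = [
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.30, 0.20],
    [0.20, 0.30, 1.00, 0.95],
    [0.10, 0.20, 0.95, 1.00],
]
filtered = prefiltered_consensus(scores, sim, keep_fraction=0.5)
unfiltered = prefiltered_consensus(scores, sim, keep_fraction=1.0)
```

With filtering, the low-scoring cluster no longer reinforces itself: model 2 drops from about 0.48 (all models as reference) to 0.25 (top half as reference), which is the effect the abstract attributes to combining the two approaches.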
Affiliation(s)
- Pascal Benkert
- Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland.
|
170
|
Benkert P, Künzli M, Schwede T. QMEAN server for protein model quality estimation. Nucleic Acids Res 2009; 37:W510-4. [PMID: 19429685 DOI: 10.1093/nar/gkp322] [Citation(s) in RCA: 593] [Impact Index Per Article: 39.5] [Indexed: 11/12/2022] Open
Abstract
Model quality estimation is an essential component of protein structure prediction, since ultimately the accuracy of a model determines its usefulness for specific applications. Usually, in the course of protein structure prediction a set of alternative models is produced, from which subsequently the most accurate model has to be selected. The QMEAN server provides access to two scoring functions successfully tested at the eighth round of the community-wide blind test experiment CASP. The user can choose between the composite scoring function QMEAN, which derives a quality estimate on the basis of the geometrical analysis of single models, and the clustering-based scoring function QMEANclust which calculates a global and local quality estimate based on a weighted all-against-all comparison of the models from the ensemble provided by the user. The web server performs a ranking of the input models and highlights potentially problematic regions for each model. The QMEAN server is available at http://swissmodel.expasy.org/qmean.
|
171
|
Kittichotirat W, Guerquin M, Bumgarner RE, Samudrala R. Protinfo PPC: a web server for atomic level prediction of protein complexes. Nucleic Acids Res 2009; 37:W519-25. [PMID: 19420059 PMCID: PMC2703994 DOI: 10.1093/nar/gkp306] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Open
Abstract
‘Protinfo PPC’ (Prediction of Protein Complex) is a web server that predicts atomic level structures of interacting proteins from their amino-acid sequences. It uses the interolog method to search for experimental protein complex structures that are homologous to the input sequences submitted by a user. These structures are then used as starting templates to generate protein complex models, which are returned to the user in Protein Data Bank format via email. The server supports modeling of both homo and hetero multimers and generally produces full atomic level models (including insertion/deletion regions) of protein complexes as long as at least one putative homologous template for the query sequences is found. The modeling pipeline behind Protinfo PPC has been rigorously benchmarked and proven to produce highly accurate protein complex models. The fully automated all atom comparative modeling service for protein complexes provided by Protinfo PPC server offers wide capabilities ranging from prediction of protein complex interactions to identification of possible interaction sites, which will be useful for researchers studying these topics. The Protinfo PPC web server is available at http://protinfo.compbio.washington.edu/ppc/
|
172
|
Vadivel K, Namasivayam G. An estimate of the numbers and density of low-energy structures (or decoys) in the conformational landscape of proteins. PLoS One 2009; 4:e5148. [PMID: 19357778 PMCID: PMC2663821 DOI: 10.1371/journal.pone.0005148] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/11/2008] [Accepted: 03/02/2009] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The conformational energy landscape of a protein, as calculated by known potential energy functions, has several minima, and one of these corresponds to its native structure. It is however difficult to comprehensively estimate the actual numbers of low energy structures (or decoys), the relationships between them, and how the numbers scale with the size of the protein. METHODOLOGY We have developed an algorithm to rapidly and efficiently identify the low energy conformers of oligo peptides by using mutually orthogonal Latin squares to sample the potential energy hyper surface. Using this algorithm, and the ECEPP/3 potential function, we have made an exhaustive enumeration of the low-energy structures of peptides of different lengths, and have extrapolated these results to larger polypeptides. CONCLUSIONS AND SIGNIFICANCE We show that the number of native-like structures for a polypeptide is, in general, an exponential function of its sequence length. The density of these structures in conformational space remains more or less constant and all the increase appears to come from an expansion in the volume of the space. These results are consistent with earlier reports that were based on other models and techniques.
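The extrapolation step in this abstract rests on the number of low-energy structures growing exponentially with sequence length, N(L) = a·exp(b·L). One simple way to estimate such a relationship is a least-squares line through ln N against L, sketched below. The fitting procedure the authors actually used is not stated, so this is an assumed approach, and the counts are synthetic values generated from a known curve rather than data from the paper.

```python
import math


def fit_exponential(lengths, counts):
    """Fit counts ~ a * exp(b * length) by linear regression on log(counts)."""
    if len(lengths) != len(counts) or len(lengths) < 2:
        raise ValueError("need at least two (length, count) pairs")
    logs = [math.log(c) for c in counts]
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic counts generated from a = 3, b = 0.8 (not data from the paper).
peptide_lengths = [2, 3, 4, 5]
synthetic_counts = [3.0 * math.exp(0.8 * n) for n in peptide_lengths]
a_fit, b_fit = fit_exponential(peptide_lengths, synthetic_counts)

# Extrapolate to a longer chain, as the abstract describes doing.
predicted_at_20 = a_fit * math.exp(b_fit * 20)
```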
Affiliation(s)
- Kanagasabai Vadivel
- Centre of Advanced Study in Crystallography & Biophysics, University of Madras, Tamilnadu, India
- Gautham Namasivayam
- Centre of Advanced Study in Crystallography & Biophysics, University of Madras, Tamilnadu, India
|
173
|
Gao M, Skolnick J. From nonspecific DNA-protein encounter complexes to the prediction of DNA-protein interactions. PLoS Comput Biol 2009; 5:e1000341. [PMID: 19343221 PMCID: PMC2659451 DOI: 10.1371/journal.pcbi.1000341] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 12/09/2008] [Accepted: 02/26/2009] [Indexed: 11/19/2022] Open
Abstract
DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing the protein's specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests, it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that our method performs satisfactorily on protein models whose root-mean-square Cα deviation from their native structures is up to 5 Å. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in search of its specific DNA targets by a DNA-binding protein.
Many essential biological activities require interactions between DNA and proteins. These proteins usually use certain amino acids, called DNA-binding sites, to recognize their specific DNA targets. To facilitate the search of its specific DNA targets, a DNA-binding protein often associates with nonspecific DNA and then diffuses along the DNA. Due to the weak interactions between nonspecific DNA and the protein, structural characterization of nonspecific DNA–protein complexes is experimentally challenging. This paper describes a computational modeling study on nonspecific DNA–protein complexes and comparative analysis with respect to specific DNA–protein complexes. The study found that the specific DNA-binding sites on a protein are typically favorable for nonspecific DNA and that nonspecific and specific DNA–protein interaction modes are quite similar. This similarity may reflect an important sampling step in the search for the specific DNA target sequence by a DNA-binding protein. On the basis of these observations, a novel method was proposed for predicting DNA-binding sites and binding modes of a DNA-binding protein without knowing its specific DNA target sequence. Ultimately, the combination of this method and protein structure prediction may lead the way to high throughput modeling of DNA–protein interactions.
Affiliation(s)
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
|
174
|
Zhao F, Li S, Sterner BW, Xu J. Discriminative learning for protein conformation sampling. Proteins 2009; 73:228-40. [PMID: 18412258 DOI: 10.1002/prot.22057] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022]
Abstract
Protein structure prediction without using templates (i.e., ab initio folding) is one of the most challenging problems in structural biology. In particular, conformation sampling poses a major bottleneck in ab initio folding. This article presents CRFSampler, an extensible protein conformation sampler built on a probabilistic graphical model, Conditional Random Fields (CRFs). Using a discriminative learning method, CRFSampler can automatically learn more than ten thousand parameters quantifying the relationship among primary sequence, secondary structure, and (pseudo) backbone angles. Using only compactness and self-avoiding constraints, CRFSampler can efficiently generate protein-like conformations from primary sequence and predicted secondary structure. CRFSampler is also very flexible in that a variety of model topologies and feature sets can be defined to model the sequence-structure relationship without worrying about parameter estimation. Our experimental results demonstrate that, using a simple set of features, CRFSampler can generate decoys of much higher quality than the most recent HMM model.
Affiliation(s)
- Feng Zhao
- Toyota Technological Institute at Chicago, Chicago, Illinois, USA
|
175
|
da Silveira CH, Pires DEV, Minardi RC, Ribeiro C, Veloso CJM, Lopes JCD, Meira W, Neshich G, Ramos CHI, Habesch R, Santoro MM. Protein cutoff scanning: A comparative analysis of cutoff dependent and cutoff free methods for prospecting contacts in proteins. Proteins 2009; 74:727-43. [PMID: 18704933 DOI: 10.1002/prot.22187] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Indexed: 11/11/2022]
Affiliation(s)
- Carlos H da Silveira
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, UFMG, Brazil.
|
176
|
Hartmann C, Antes I, Lengauer T. Docking and scoring with alternative side-chain conformations. Proteins 2009; 74:712-26. [PMID: 18704939 DOI: 10.1002/prot.22189] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
We describe a scoring and modeling procedure for docking ligands into protein models that have either modeled or flexible side-chain conformations. Our methodical contribution comprises a procedure for generating new potentials of mean force for the ROTA scoring function which we have introduced previously for optimizing side-chain conformations with the tool IRECS. The ROTA potentials are specially trained to tolerate small-scale positional errors of atoms that are characteristic of (i) side-chain conformations that are modeled using a sparse rotamer library and (ii) ligand conformations that are generated using a docking program. We generated both rigid and flexible protein models with our side-chain prediction tool IRECS and docked ligands to proteins using the scoring function ROTA and the docking programs FlexX (for rigid side chains) and FlexE (for flexible side chains). We validated our approach on the forty screening targets of the DUD database. The validation shows that the ROTA potentials are especially well suited for estimating the binding affinity of ligands to proteins. The results also show that our procedure can compensate for the performance decrease in screening that occurs when using protein models with side chains modeled with a rotamer library instead of using X-ray structures. The average runtime per ligand of our method is 168 seconds on an Opteron V20z, which is fast enough to allow virtual screening of compound libraries for drug candidates.
Affiliation(s)
- Christoph Hartmann
- Department of Computational Biology and Applied Algorithmics, Max-Planck-Institut für Informatik, Saarbrücken, Germany
|
177
|
Zhu J, Fan H, Periole X, Honig B, Mark AE. Refining homology models by combining replica-exchange molecular dynamics and statistical potentials. Proteins 2009; 72:1171-88. [PMID: 18338384 DOI: 10.1002/prot.22005] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 12/25/2022]
Abstract
A protocol is presented for the global refinement of homology models of proteins. It combines the advantages of temperature-based replica-exchange molecular dynamics (REMD) for conformational sampling and the use of statistical potentials for model selection. The protocol was tested using 21 models. Of these, 14 were models of 10 small proteins for which high-resolution crystal structures were available; the remainder were targets of the recent CASPR exercise. It was found that REMD in combination with currently available force fields could sample near-native conformational states starting from high-quality homology models. Conformations in which the backbone RMSD of secondary structure elements (SSE-RMSD) was lower than the starting value by 0.5-1.0 Å were found for 15 out of the 21 cases (average 0.82 Å). Furthermore, when a simple scoring function consisting of two statistical potentials was used to rank the structures, one or more structures with an SSE-RMSD at least 0.2 Å lower than the starting value were found among the five best-ranked structures in 11 out of the 21 cases. The average improvement in SSE-RMSD for the best models was 0.42 Å. However, none of the scoring functions tested identified the structures with the lowest SSE-RMSD as the best models, although all identified the native conformation as the one with the lowest energy. This suggests that while the proposed protocol proved effective for the refinement of high-quality models of small proteins, scoring functions remain one of the major limiting factors in structure refinement. This and other aspects by which the methodology could be further improved are discussed.
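Temperature-based replica exchange, as used in this protocol, periodically attempts to swap configurations between replicas at neighbouring temperatures. A minimal sketch of the standard Metropolis swap criterion follows, p = min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B·T). The abstract does not give the authors' temperatures, units, or exchange schedule, so the Boltzmann constant in kcal/(mol·K) and the example energies are assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=K_B):
    """Metropolis probability of exchanging replicas i and j.

    energy_i, energy_j: potential energies of the current configurations.
    temp_i, temp_j:     replica temperatures in kelvin.
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# Made-up energies (kcal/mol) for replicas at 300 K and 310 K.
# The colder replica holds the higher energy, so the swap is always accepted.
p_favourable = swap_probability(-100.0, -110.0, 300.0, 310.0)
# The colder replica already holds the lower energy, so the swap is accepted
# only with a probability below one.
p_unfavourable = swap_probability(-110.0, -100.0, 300.0, 310.0)
```

In a full simulation the swap would be accepted when a uniform random number falls below this probability; the sampling and ranking with statistical potentials described in the abstract sit on top of that step.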
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute and Columbia University, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, USA
|
178
|
Vassura M, Margara L, Fariselli P, Casadio R. A graph theoretic approach to protein structure selection. Artif Intell Med 2009; 45:229-37. [DOI: 10.1016/j.artmed.2008.07.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/19/2007] [Revised: 07/25/2008] [Accepted: 07/26/2008] [Indexed: 11/28/2022]
|
179
|
Varadwaj PK, Lahiri T. Functional group based Ligand binding affinity scoring function at atomic environmental level. Bioinformation 2009; 3:268-74. [PMID: 19255647 PMCID: PMC2646862 DOI: 10.6026/97320630003268] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/28/2008] [Accepted: 12/06/2008] [Indexed: 11/23/2022] Open
Abstract
Use of knowledge based scoring function (KBSF) for virtual screening and molecular docking has become an established method for drug discovery. Lack of a precise and reliable free energy function that describes several interactions including water-mediated atomic interaction between amino-acid residues and ligand makes distance based statistical measure as the only alternative. Till now all the distance based scoring functions in KBSF arena use atom singularity concept, which neglects the environmental effect of the atom under consideration. We have developed a novel knowledge-based statistical energy function for protein-ligand complexes which takes atomic environment in to account hence functional group as a singular entity. The proposed knowledge based scoring function is fast, simple to construct, easy to use and moreover it tackle the existing problem of handling molecular orientation in active site pocket. We have designed and used Functional group based Ligand retrieval (FBLR) system which can identify and detect the orientation of functional groups in ligand. This decoy searching was used to build the above KBSF to quantify the activity and affinity of high resolution protein-ligand complexes. We have proposed the probable use of these decoys in molecular build-up as a de-novo drug designing approach. We have also discussed the possible use of the said KSBF in pharmacophore fragment detection and pseudo center based fragment alignment procedure.
|
180
|
Kamisetty H, Xing EP, Langmead CJ. Free energy estimates of all-atom protein structures using generalized belief propagation. J Comput Biol 2008; 15:755-66. [PMID: 18662103 DOI: 10.1089/cmb.2007.0131] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
Abstract
We present a technique for approximating the free energy of protein structures using generalized belief propagation (GBP). The accuracy and utility of these estimates are then demonstrated in two different application domains. First, we show that the entropy component of our free energy estimates can be useful in distinguishing native protein structures from decoys, that is, structures with similar internal energy to that of the native structure but otherwise incorrect. Our method is able to correctly identify the native fold from among a set of decoys with 87.5% accuracy over a total of 48 different immunoglobulin folds. The remaining 12.5% of native structures are ranked among the top four of all structures. Second, we show that our estimates of ΔΔG upon mutation for three different data sets have linear correlations of 0.63-0.70 with experimental measurements and statistically significant p-values. Together, these results suggest that GBP is an effective means for computing free energy in all-atom models of protein structures. GBP is also efficient, taking a few minutes to run on a typical sized protein, further suggesting that GBP may be an attractive alternative to more costly molecular dynamics simulations for some tasks.
Affiliation(s)
- Hetunandan Kamisetty
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
|
181
|
Cui M, Mezei M, Osman R. Prediction of protein loop structures using a local move Monte Carlo approach and a grid-based force field. Protein Eng Des Sel 2008; 21:729-35. [PMID: 18957407 PMCID: PMC2597363 DOI: 10.1093/protein/gzn056] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 05/14/2008] [Revised: 09/18/2008] [Accepted: 09/23/2008] [Indexed: 11/14/2022]
Abstract
We have developed an improved local move Monte Carlo (LMMC) loop sampling approach for loop predictions. The method generates loop conformations based on simple moves of the torsion angles of side chains and local moves of the loop backbone. To reduce the computational cost of energy evaluations, we developed a grid-based force field to represent the protein environment and solvation effect. Simulated annealing has been used to enhance the efficiency of the LMMC loop sampling and to identify low-energy loop conformations. The prediction quality is evaluated on a set of protein loops with known crystal structures that has previously been used by others to test different loop prediction methods. The results show that this approach can reproduce the experimental results with a root mean square deviation within 1.8 Å for all the test cases. The LMMC loop prediction approach developed here could be useful for improving the quality of the loop regions in homology models and in flexible protein-ligand and protein-protein docking studies.
Affiliation(s)
- Meng Cui
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, NYU, Box 1218, New York, NY 10029
- Department of Physiology and Biophysics, Virginia Commonwealth University, 1101 East Marshall Street, PO Box 980551, Richmond, VA 23298, USA
- Mihaly Mezei
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, NYU, Box 1218, New York, NY 10029
- Roman Osman
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, NYU, Box 1218, New York, NY 10029
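The sampling idea summarized in the Cui et al. abstract (random local moves of torsion angles, with simulated annealing to find low-energy conformations) can be illustrated with a minimal sketch. This is not the authors' LMMC code or their grid-based force field: the Metropolis acceptance rule, the geometric cooling schedule, the toy cosine energy, and all function and parameter names below are assumptions chosen only for illustration.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept downhill moves always; accept uphill moves with Boltzmann probability."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def toy_energy(angles, targets=(1.0, -0.5, 2.0)):
    """Hypothetical stand-in energy: zero when every torsion equals its target."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, targets))


def perturb_one_torsion(angles, rng, step=0.3):
    """Local move: copy the state and nudge one randomly chosen torsion angle."""
    new_angles = list(angles)
    index = rng.randrange(len(new_angles))
    new_angles[index] += rng.gauss(0.0, step)
    return new_angles


def anneal(energy, propose, state, t_start=2.0, t_end=0.1, steps=2000, seed=0):
    """Simulated annealing with a geometric cooling schedule (an assumed choice)."""
    rng = random.Random(seed)
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    for step in range(steps):
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, temperature, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    best, e_best = anneal(toy_energy, perturb_one_torsion, start)
    print("start energy:", toy_energy(start), "best energy:", e_best)
```

In a real loop predictor the energy call dominates the cost, which is presumably why the abstract emphasizes a precomputed grid for the protein environment; here the toy energy is cheap enough that no such caching is needed.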
|
182
|
Makino Y, Itoh N. A knowledge-based structure-discriminating function that requires only main-chain atom coordinates. BMC Structural Biology 2008; 8:46. [PMID: 18957132 PMCID: PMC2600639 DOI: 10.1186/1472-6807-8-46] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/26/2007] [Accepted: 10/29/2008] [Indexed: 11/23/2022]
Abstract
Background: The use of knowledge-based potential functions is a powerful method for protein structure evaluation. A variety of formulations that evaluate single or multiple structural features of proteins have been developed and studied. The performance of functions is often evaluated by discrimination ability using decoy structures of target proteins. A function that can evaluate coarse-grained structures is advantageous in many respects, such as relatively easy generation and manipulation of model structures; however, the reduction of structural representation is often accompanied by degradation of the structure discrimination performance.
Results: We developed a knowledge-based pseudo-energy calculating function for protein structure discrimination. The function (Discriminating Function using Main-chain Atom Coordinates, DFMAC) consists of six pseudo-energy calculation components that deal with different structural features. Only the main-chain atom coordinates of N, Cα, and C atoms for the respective amino acid residues are required as input data for structure evaluation. The 231 target structures in 12 different types of decoy sets were separated into 154 and 77 targets, and function training and the subsequent performance test were performed using the respective target sets. Fifty-nine (76.6%) native and 68 (88.3%) near-native (< 2.0 Å Cα RMSD) targets in the test set were successfully identified. The average Cα RMSD of the test set was 1.174 Å with the tuned parameters. The major part of the discrimination performance was supported by the orientation-dependent component.
Conclusion: Despite the reduced representation of input structures, DFMAC showed considerable structure discrimination ability. The function can be applied to the identification of near-native structures in structure prediction experiments.
Affiliation(s)
- Yoshihide Makino
- Department of Biotechnology, Faculty of Engineering, Toyama Prefectural University, 5180 Kurokawa, Imizu-shi, Toyama 939-0398, Japan.
|
183
|
Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information. PLoS Comput Biol 2008; 4:e1000181. [PMID: 18818722 PMCID: PMC2526173 DOI: 10.1371/journal.pcbi.1000181] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 04/15/2008] [Accepted: 08/07/2008] [Indexed: 11/19/2022]
Abstract
Protein function is mediated by different amino acid residues, both their positions and types, in a protein sequence. Some amino acids are responsible for the stability or overall shape of the protein, playing an indirect role in protein function. Others play a functionally important role as part of active or binding sites of the protein. For a given protein sequence, the residues and their degree of functional importance can be thought of as a signature representing the function of the protein. We have developed a combination of knowledge- and biophysics-based function prediction approaches to elucidate the relationships between the structural and the functional roles of individual residues and positions. Such a meta-functional signature (MFS), which is a collection of continuous values representing the functional significance of each residue in a protein, may be used to study proteins of known function in greater detail and to aid in experimental characterization of proteins of unknown function. We demonstrate the superior performance of MFS in predicting protein functional sites and also present four real-world examples to apply MFS in a wide range of settings to elucidate protein sequence-structure-function relationships. Our results indicate that the MFS approach, which can combine multiple sources of information and also give biological interpretation to each component, greatly facilitates the understanding and characterization of protein function.
|
184
|
Solis AD, Rackovsky S. Information and discrimination in pairwise contact potentials. Proteins 2008; 71:1071-87. [PMID: 18004788 DOI: 10.1002/prot.21733] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/05/2022]
Abstract
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information and, if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., the choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence on their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have a high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials.
Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
Affiliation(s)
- Armando D Solis
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York 10029, USA
|
185
|
Hoang TX, Seno F, Trovato A, Banavar JR, Maritan A. Inference of the solvation energy parameters of amino acids using maximum entropy approach. J Chem Phys 2008; 129:035102. [PMID: 18647046 DOI: 10.1063/1.2953691] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
We present a novel technique, based on the principle of maximum entropy, for deriving the solvation energy parameters of amino acids from the knowledge of the solvent accessible areas in experimentally determined native state structures as well as high quality decoys of proteins. We present the results of detailed studies and analyze the correlations of the solvation energy parameters with the standard hydrophobic scale. We study the ability of the inferred parameters to discriminate between the native state structures of proteins and their decoy conformations.
Affiliation(s)
- Trinh X Hoang
- Physics Department, Penn State University, 104 Davey Lab, University Park, Pennsylvania 16801, USA
|
186
|
Ngan SC, Hung LH, Liu T, Samudrala R. Scoring functions for de novo protein structure prediction revisited. Methods in Molecular Biology (Clifton, N.J.) 2008; 413:243-81. [PMID: 18075169 DOI: 10.1007/978-1-59745-574-9_10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 04/08/2023]
Abstract
De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures are generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. We discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction in this chapter.
Affiliation(s)
- Shing-Chung Ngan
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, USA
|
187
|
Liu T, Guerquin M, Samudrala R. Improving the accuracy of template-based predictions by mixing and matching between initial models. BMC Structural Biology 2008; 8:24. [PMID: 18457597 PMCID: PMC2424052 DOI: 10.1186/1472-6807-8-24] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 06/20/2007] [Accepted: 05/05/2008] [Indexed: 11/10/2022]
Abstract
Background: Comparative modeling is a technique to predict the three-dimensional structure of a given protein sequence based primarily on its alignment to one or more proteins with experimentally determined structures. A major bottleneck of current comparative modeling methods is the lack of methods to accurately refine an initial model so that it approaches the resolution of the corresponding experimental structure. We investigate the effectiveness of a graph-theoretic clique-finding approach to solve this problem.
Results: Our method takes into account the information presented in multiple templates/alignments at the three-dimensional level by mixing and matching regions between different initial comparative models. This method enables us to obtain an optimized conformation ensemble representing the best combination of secondary structures, resulting in refined models of higher quality. In addition, the process of mixing and matching accumulates near-native conformations, so that the native-like conformation is discriminated more effectively. In the seventh Critical Assessment of Structure Prediction (CASP7) experiment, the refined models produced are more accurate than the initial models.
Conclusion: This novel approach can be applied without any manual intervention to improve the quality of comparative predictions where multiple template/alignment combinations are available for modeling, producing conformational models of higher quality than the initial predictions.
Affiliation(s)
- Tianyun Liu
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
- Michal Guerquin
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
- Ram Samudrala
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
|
188
|
Abstract
We describe a fast and accurate protocol, LoopBuilder, for the prediction of loop conformations in proteins. The procedure includes extensive sampling of backbone conformations, side chain addition, the use of a statistical potential to select a subset of these conformations, and, finally, an energy minimization and ranking with an all-atom force field. We find that the Direct Tweak algorithm used in the previously developed LOOPY program is successful in generating an ensemble of conformations that on average are closer to the native conformation than those generated by other methods. An important feature of Direct Tweak is that it checks for interactions between the loop and the rest of the protein during the loop closure process. DFIRE is found to be a particularly effective statistical potential that can bias conformation space toward conformations that are close to the native structure. Its application as a filter prior to a full molecular mechanics energy minimization both improves prediction accuracy and offers a significant savings in computer time. Final scoring is based on the OPLS/SBG-NP force field implemented in the PLOP program. The approach is also shown to be quite successful in predicting loop conformations for cases where the native side chain conformations are assumed to be unknown, suggesting that it will prove effective in real homology modeling applications. Proteins 2008. © 2007 Wiley-Liss, Inc.
Affiliation(s)
- Cinque S Soto
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
|
189
|
Rajgaria R, McAllister SR, Floudas CA. Distance dependent centroid to centroid force fields using high resolution decoys. Proteins 2008; 70:950-70. [PMID: 17847088 DOI: 10.1002/prot.21561] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022]
Abstract
Simplified force fields play an important role in protein structure prediction and de novo protein design by requiring less computational effort than detailed atomistic potentials. A side chain centroid based, distance-dependent pairwise interaction potential has been developed. A linear programming based formulation was used in which non-native "decoy" conformers are forced to take a higher energy than the corresponding native structure. This model was trained on an enhanced and diverse protein set. High-quality decoy structures were generated for approximately 1400 nonhomologous proteins using torsion angle dynamics along with restricted variations of the hydrophobic cores of the native structure. The resulting decoy set was used to train the model, yielding two different side chain centroid based force fields that differ in the way distance dependence has been used to calculate energy parameters. These force fields were tested on an independent set of 148 test proteins with 500 decoy structures for each protein. The side chain centroid force fields correctly identified approximately 86% of the native structures. The Z-scores produced by the proposed centroid-centroid distance-dependent force fields improved compared with other distance-dependent Cα-Cα or side chain based force fields.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
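Several abstracts in this list, including Rajgaria et al., judge a scoring function by whether the native structure receives a lower energy than its decoys and by the native Z-score. A minimal sketch of both quantities follows, assuming the common convention that the Z-score is the native energy minus the mean decoy energy, divided by the decoy standard deviation (so more negative is better). The function names and the example energies are hypothetical and do not come from any of the cited papers.

```python
from statistics import mean, pstdev


def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energy distribution."""
    spread = pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero spread; Z-score is undefined")
    return (native_energy - mean(decoy_energies)) / spread


def native_rank(native_energy, decoy_energies):
    """Rank of the native structure by energy; 1 means no decoy scores lower."""
    return 1 + sum(1 for energy in decoy_energies if energy < native_energy)


if __name__ == "__main__":
    decoys = [-4.0, -2.0, 0.0, 2.0, 4.0]  # made-up energies for illustration
    print(native_z_score(-10.0, decoys))
    print(native_rank(-10.0, decoys))
```

A native structure counts as correctly identified when its rank is 1; the Z-score additionally reflects how far it sits below the bulk of the decoys.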
|
190
|
Panjkovich A, Melo F, Marti-Renom MA. Evolutionary potentials: structure specific knowledge-based potentials exploiting the evolutionary record of sequence homologs. Genome Biol 2008; 9:R68. [PMID: 18397517 PMCID: PMC2643939 DOI: 10.1186/gb-2008-9-4-r68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/13/2008] [Revised: 04/02/2008] [Accepted: 04/08/2008] [Indexed: 11/10/2022]
Abstract
We introduce a new type of knowledge-based potential for protein structure prediction, called 'evolutionary potentials', which are derived using a single experimental protein structure and all three-dimensional models of its homologous sequences. The new potentials have been benchmarked against other knowledge-based potentials, resulting in a significant increase in accuracy for model assessment. In contrast to standard knowledge-based potentials, we propose that evolutionary potentials capture key determinants of thermodynamic stability and specific sequence constraints required for fast folding.
Affiliation(s)
- Alejandro Panjkovich
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
|
191
|
Benkert P, Tosatto SCE, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008; 71:261-77. [PMID: 17932912 DOI: 10.1002/prot.21715] [Citation(s) in RCA: 733] [Impact Index Per Article: 45.8] [Indexed: 11/06/2022]
Abstract
In protein structure prediction, a considerable number of alternative models are usually produced, from which the final model subsequently has to be selected. Thus, a scoring function for the identification of the best model within an ensemble of alternative models is a key component of most protein structure prediction pipelines. QMEAN, which stands for Qualitative Model Energy ANalysis, is a composite scoring function describing the major geometrical aspects of protein structures. Five different structural descriptors are used. The local geometry is analyzed by a new kind of torsion angle potential over three consecutive amino acids. A secondary structure-specific distance-dependent pairwise residue-level potential is used to assess long-range interactions. A solvation potential describes the burial status of the residues. Two simple terms describing the agreement of predicted and calculated secondary structure and solvent accessibility, respectively, are also included. A variety of different implementations are investigated and several approaches to combine and optimize them are discussed. QMEAN was tested on several standard decoy sets, including a molecular dynamics simulation decoy set, as well as on a comprehensive data set of a total of 22,420 models from server predictions for the 95 targets of CASP7. In a comparison to five well-established model quality assessment programs, QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models. The three-residue torsion angle potential turned out to be very effective in recognizing the native fold.
Affiliation(s)
- Pascal Benkert
- Institute for Biochemistry, University of Cologne, 50674 Cologne, Germany
|
192
|
Saccenti E, Rosato A. The war of tools: how can NMR spectroscopists detect errors in their structures? Journal of Biomolecular NMR 2008; 40:251-261. [PMID: 18320330 DOI: 10.1007/s10858-008-9228-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/01/2007] [Revised: 02/08/2008] [Accepted: 02/13/2008] [Indexed: 05/26/2023]
Abstract
Protein structure determination by NMR methods started in the mid-eighties and has been growing steadily since then. Ca. 14% of the protein structures deposited in the PDB have been solved by NMR. The evaluation of the quality of NMR structures, however, still lacks a well-established practice. In this work, we examined various tools for the assessment of structural quality to ascertain the extent to which these tools could be applied to detect flaws in NMR structures. In particular, we investigated the variation in the scores assigned by these programs as a function of the deviation of the structures induced by errors in assignments or in the upper distance limits used. These perturbations did not radically distort the protein fold, but resulted in backbone RMS deviations up to 3 Å, which is in line with errors highlighted in the available literature. We found that it is quite difficult to discriminate the structures perturbed because of misassignments from the original ones, in part because the spread in score over the conformers of the original bundle is relatively large. The φ-ψ distributions and normality scores related to the backbone conformation and to the distribution of side-chain dihedral angles are the most sensitive indicators of flaws.
Affiliation(s)
- Edoardo Saccenti
- Magnetic Resonance Center, University of Florence, Via L. Sacconi 6, 50019, Sesto Fiorentino, Italy
|
193
|
An improved method of potential of mean force for protein-protein interactions. Sci Bull (Beijing) 2008. [DOI: 10.1007/s11434-008-0036-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
194
|
de Sancho D, Rey A. Energy minimizations with a combination of two knowledge-based potentials for protein folding. J Comput Chem 2008; 29:1684-92. [DOI: 10.1002/jcc.20924] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
|
195
|
Yang Y, Zhou Y. Specific interactions for ab initio folding of protein terminal regions with secondary structures. Proteins 2008; 72:793-803. [PMID: 18260109 DOI: 10.1002/prot.21968] [Citation(s) in RCA: 186] [Impact Index Per Article: 11.6] [Indexed: 12/26/2022]
Affiliation(s)
- Yuedong Yang
- Indiana University School of Informatics, Indianapolis, Indiana 46202, USA
|
196
|
Li YC, Zeng ZH. Interfacial atom pair analysis. Biochemistry. Biokhimiia 2008; 73:231-233. [PMID: 18298380 DOI: 10.1134/s0006297908020156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The relations of the binding free energies in a dataset of 69 protein complexes with the numbers of interfacial atom pairs, as well as with the atomic distances of the pairs, are analyzed. It is found that the interfacial main-chain atom pairs contribute more to the correlation than the interfacial side-chain atom pairs do, and the polar atom pairs contribute more than the non-polar atom pairs do. Interfacial atom pairs with atomic distances in the range of 6-12 Å are the most important for explaining the differences in binding free energies in the dataset.
Affiliation(s)
- Yong-Chao Li
- Institute of Biophysics, Chinese Academy of Sciences, Chaoyang District, Beijing, China
|
197
|
Structural and functional characterization of an organic hydroperoxide resistance protein from Mycoplasma gallisepticum. J Bacteriol 2008; 190:2206-16. [PMID: 18192392 DOI: 10.1128/jb.01685-07] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022]
Abstract
As obligate parasites, Mycoplasma species are continuously exposed to oxidative damage due to host-generated peroxides and reactive oxygen species (ROS). In addition, the production of endogenous oxidants is believed to be a primary virulence mechanism of several Mollicute species, indicating that oxidative stress resistance is crucial to survival of these bacteria in the host milieu. Despite the abundance of oxidants at the site of infection, enzymes responsible for the detoxification of ROS have never been characterized in mycoplasmas. Here we characterize a homolog of the ohr (organic hydroperoxide resistance) family from Mycoplasma gallisepticum (encoding MGA1142). Unlike previously characterized ohr genes, the mga1142 gene is not upregulated in response to oxidative stress but displays a novel pattern of expression. Both organic and inorganic peroxides can act as substrates for MGA1142, but they are degraded with various efficiencies. Furthermore, cumene hydroperoxide, an aromatic peroxide metabolized with high efficiency by other Ohr proteins, was shown to rapidly inactivate MGA1142, accounting for the sensitivity of M. gallisepticum cells to this compound. Comparative modeling of the MGA1142 quaternary structure revealed that the active site of this molecule has a relatively wide conformation. These data indicate that the natural substrate for MGA1142 differs from that for previously characterized Ohr proteins. Triton X-114 partitioning demonstrated that MGA1142 is located in both cytosol and membrane fractions, suggesting that in vivo this molecule plays a role in the detoxification of both endogenous and exogenous peroxides. A model describing how MGA1142 is likely to be oriented in the cell membrane is presented.
|
198
|
A historical perspective of template-based protein structure prediction. Methods in Molecular Biology (Clifton, N.J.) 2008; 413:3-42. [PMID: 18075160 DOI: 10.1007/978-1-59745-574-9_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
This chapter presents a broad, historical overview of the problem of protein structure prediction. Different structure prediction methods, including homology modeling, fold recognition (FR)/protein threading, ab initio/de novo approaches, and hybrid techniques involving multiple types of approaches, are introduced in a historical context. The progress of the field as a whole, especially in the threading/FR area, as reflected by the CASP/CAFASP contests, is reviewed. At the end of the chapter, we discuss the challenging issues ahead in the field of protein structure prediction.
|
199
|
Abstract
We perform a systematic examination of the ability of several different high-resolution, atomic-detail scoring functions to discriminate native conformations of loops in membrane proteins from non-native but physically reasonable, or "decoy," conformations. Decoys constructed from changing a loop conformation while keeping the remainder of the protein fixed are a challenging test of energy function accuracy. Nevertheless, the best of the energy functions we examined recognized the native structure as lowest in energy around half the time, and consistently chose it as a low-energy structure. This suggests that the best of present energy functions, even without a representation of the lipid bilayer, are of sufficient accuracy to give reasonable confidence in predictions of membrane protein structure. We also constructed homology models for each structure, using other known structures in the same protein family as templates. Homology models were constructed using several scoring functions and modeling programs, but with a comparable sampling effort for each procedure. Our results indicate that the quality of sequence alignment is probably the most important factor in model accuracy for sequence identity from 20-40%; one can expect a reasonably accurate model for membrane proteins when sequence identity is greater than 30%, in agreement with previous studies. Most errors are localized in loop regions, which tend to be found outside the lipid bilayer. For the most discriminative energy functions, it appears that errors are most likely due to lack of sufficient sampling, although it should be stressed that present energy functions are still far from perfectly reliable.
Affiliation(s)
- Cen Gao
- Department of Chemistry, University of Rochester, Rochester, New York, USA
|
200
|
Qiu J, Sheffler W, Baker D, Noble WS. Ranking predicted protein structures with support vector regression. Proteins 2007; 71:1175-82. [PMID: 18004754 DOI: 10.1002/prot.21809] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Indexed: 11/05/2022]
Affiliation(s)
- Jian Qiu
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
|