1
Akhter N, Kabir KL, Chennupati G, Vangara R, Alexandrov BS, Djidjev H, Shehu A. Improved Protein Decoy Selection via Non-Negative Matrix Factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1670-1682. [PMID: 33400654 DOI: 10.1109/tcbb.2020.3049088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
A central challenge in protein modeling research and protein structure prediction in particular is known as decoy selection. The problem refers to selecting biologically-active/native tertiary structures among a multitude of physically-realistic structures generated by template-free protein structure prediction methods. Research on decoy selection is active. Clustering-based methods are popular, but they fail to identify good/near-native decoys on datasets where near-native decoys are severely under-sampled by a protein structure prediction method. Reasonable progress is reported by methods that additionally take into account the internal energy of a structure and employ it to identify basins in the energy landscape organizing the multitude of decoys. These methods, however, incur significant time costs for extracting basins from the landscape. In this paper, we propose a novel decoy selection method based on non-negative matrix factorization. We demonstrate that our method outperforms energy landscape-based methods. In particular, the proposed method addresses both the time cost issue and the challenge of identifying good decoys in a sparse dataset, successfully recognizing near-native decoys for both easy and hard protein targets.
2
Akhter N, Chennupati G, Kabir KL, Djidjev H, Shehu A. Unsupervised and Supervised Learning over the Energy Landscape for Protein Decoy Selection. Biomolecules 2019; 9:E607. [PMID: 31615116 PMCID: PMC6843838 DOI: 10.3390/biom9100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/05/2019] [Revised: 10/03/2019] [Accepted: 10/04/2019] [Indexed: 11/17/2022]
Abstract
The energy landscape that organizes microstates of a molecular system and governs the underlying molecular dynamics exposes the relationship between molecular form/structure, changes to form, and biological activity or function in the cell. However, several challenges stand in the way of leveraging energy landscapes for relating structure and structural dynamics to function. Energy landscapes are high-dimensional, multi-modal, and often overly-rugged. Deep wells or basins in them do not always correspond to stable structural states but are instead the result of inherent inaccuracies in semi-empirical molecular energy functions. Due to these challenges, energetics is typically ignored in computational approaches addressing long-standing central questions in computational biology, such as protein decoy selection. In the latter, the goal is to determine over a possibly large number of computationally-generated three-dimensional structures of a protein those structures that are biologically-active/native. In recent work, we have recast our attention on the protein energy landscape and its role in helping us to advance decoy selection. Here, we summarize some of our successes so far in this direction via unsupervised learning. More importantly, we further advance the argument that the energy landscape holds valuable information to aid and advance the state of protein decoy selection via novel machine learning methodologies that leverage supervised learning. Our focus in this article is on decoy selection for the purpose of a rigorous, quantitative evaluation of how leveraging protein energy landscapes advances an important problem in protein modeling. However, the ideas and concepts presented here are generally useful to make discoveries in studies aiming to relate molecular structure and structural dynamics to function.
Affiliation(s)
- Nasrin Akhter
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Gopinath Chennupati
  - Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Kazi Lutful Kabir
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Hristo Djidjev
  - Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Amarda Shehu
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
  - Center for Adaptive Human-Machine Partnership, George Mason University, Fairfax, VA 22030, USA.
  - Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA.
  - School of Systems Biology, George Mason University, Fairfax, VA 22030, USA.
3
Chopra G, Samudrala R. Exploring Polypharmacology in Drug Discovery and Repurposing Using the CANDO Platform. Curr Pharm Des 2017; 22:3109-23. [PMID: 27013226 DOI: 10.2174/1381612822666160325121943] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 01/06/2016] [Accepted: 03/01/2015] [Indexed: 01/05/2023]
Abstract
BACKGROUND Traditional drug discovery approaches focus on a limited set of target molecules for treatment against specific indications/diseases. However, drug absorption, dispersion, metabolism, and excretion (ADME) involve interactions with multiple protein systems. Drugs approved for particular indication(s) may be repurposed as novel therapeutics for others. The severely declining rate of discovery and increasing costs of new drugs illustrate the limitations of the traditional reductionist paradigm in drug discovery. METHODS We developed the Computational Analysis of Novel Drug Opportunities (CANDO) platform based on a hypothesis that drugs function by interacting with multiple protein targets to create a molecular interaction signature that can be exploited for therapeutic repurposing and discovery. We compiled a library of compounds that are human ingestible with minimal side effects, followed by an 'all-compounds' vs 'all-proteins' fragment-based multitarget docking with dynamics screen to construct compound-proteome interaction matrices that were then analyzed to determine similarity of drug behavior. The proteomic signature similarity of drugs is then ranked to make putative drug predictions for all indications in a shotgun manner. RESULTS We have previously applied this platform with success in both retrospective benchmarking and prospective validation, and to understand the effect of druggable protein classes on repurposing accuracy. Here we use the CANDO platform to analyze and determine the contribution of multitargeting (polypharmacology) to drug repurposing benchmarking accuracy. Taken together with the previous work, our results indicate that a large number of protein structures with diverse fold space and a specific polypharmacological interactome is necessary for accurate drug predictions using our proteomic and evolutionary drug discovery and repurposing platform. 
CONCLUSION These results have implications for future drug development and repurposing in the context of polypharmacology.
Affiliation(s)
- Gaurav Chopra
  - Department of Chemistry, Purdue University, West Lafayette, IN, USA.
- Ram Samudrala
  - Department of Biomedical Informatics, SUNY, Buffalo, NY, USA.
4
Feig M. Computational protein structure refinement: Almost there, yet still so far to go. Wiley Interdisciplinary Reviews: Computational Molecular Science 2017; 7:e1307. [PMID: 30613211 PMCID: PMC6319934 DOI: 10.1002/wcms.1307] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 12/21/2022]
Abstract
Protein structures are essential in modern biology yet experimental methods are far from being able to catch up with the rapid increase in available genomic data. Computational protein structure prediction methods aim to fill the gap while the role of protein structure refinement is to take approximate initial template-based models and bring them closer to the true native structure. Current methods for computational structure refinement rely on molecular dynamics simulations, related sampling methods, or iterative structure optimization protocols. The best methods are able to achieve moderate degrees of refinement but consistent refinement that can reach near-experimental accuracy remains elusive. Key issues revolve around the accuracy of the energy function, the inability to reliably rank multiple models, and the use of restraints that keep sampling close to the native state but also limit the degree of possible refinement. A different aspect is the question of what exactly the target of high-resolution refinement should be as experimental structures are affected by experimental conditions and different biological questions require varying levels of accuracy. While improvement of the global protein structure is a difficult problem, high-resolution refinement methods that improve local structural quality such as favorable stereochemistry and the avoidance of atomic clashes are much more successful.
Affiliation(s)
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, 603 Wilson Rd., Room 218 BCH, East Lansing, MI, USA; 517-432-7439
5
Serra F, Romualdi C, Fogolari F. Similarity Measures Based on the Overlap of Ranked Genes Are Effective for Comparison and Classification of Microarray Data. J Comput Biol 2016; 23:603-14. [PMID: 27104372 DOI: 10.1089/cmb.2015.0057] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Similarity (or conversely distance) measures are at the heart of most bioinformatic applications. When the similarity involves only a small subset of features out of many, global similarity measures may be significantly affected by noise. Selecting only a subset of (putatively relevant) features for comparison is a widespread solution to the problem albeit affected by arbitrariness and manual intervention. The problem is becoming more and more important due to the increasing amount of experimental data available. In recent years measures based on ranking similarities between two datasets have been proposed. Here, we use one of the proposed rank similarity measures, sharing some aspects with the fraction enrichment score used for protein structure prediction and the gene set enrichment analysis, and test its performance in classifying experiments. The discrimination ability of the similarity measures based on the overlap of ranked genes tested here compares well or better with standard measures of similarity. This conclusion supports the use of rank-based proximity measures to gain further insight in dataset comparisons, particularly on expression data obtained by different technologies (e.g., RNA-seq and microarrays).
Affiliation(s)
- Fabrizio Serra
  - Department of Biomedical Sciences and Technologies, University of Udine, Udine, Italy
- Chiara Romualdi
  - Department of Biology, University of Padova, Padova, Italy
- Federico Fogolari
  - Department of Biomedical Sciences and Technologies, University of Udine, Udine, Italy
  - Istituto Nazionale Biostrutture e Biosistemi, Roma, Italy
6
Lee MS, Olson MA. Assessment of Detection and Refinement Strategies for de novo Protein Structures Using Force Field and Statistical Potentials. J Chem Theory Comput 2015; 3:312-24. [PMID: 26627174 DOI: 10.1021/ct600195f] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
De novo predictions of protein structures at high resolution are plagued by the problem of detecting the native conformation from false energy minima. In this work, we provide an assessment of various detection and refinement protocols on a small subset of the second-generation all-atom Rosetta decoy set (Tsai et al. Proteins 2003, 53, 76-87) using two potentials: the all-atom CHARMM PARAM22 force field combined with generalized Born/surface-area (GB-SA) implicit solvation and the DFIRE-AA statistical potential. Detection schemes included DFIRE-AA conformational scoring and energy minimization followed by scoring with both GB-SA and DFIRE-AA potentials. Refinement methods included short-time (1-ps) molecular dynamics simulations, temperature-based replica exchange molecular dynamics, and a new computational unfold/refold procedure. Our results indicate that simple detection with only minimization is the best protocol for finding the most nativelike structures in the decoy set. The refinement techniques that we tested are generally unsuccessful in improving detection; however, they provide marginal improvements to some of the decoy structures. Future directions in the development of refinement techniques are discussed in the context of the limitations of the protocols evaluated in this study.
Affiliation(s)
- Michael S Lee
  - Computational and Information Sciences Directorate, U.S. Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, and Department of Cell Biology and Biochemistry, U.S. Army Medical Research Institute of Infectious Diseases, Frederick, Maryland 21702
- Mark A Olson
  - Computational and Information Sciences Directorate, U.S. Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, and Department of Cell Biology and Biochemistry, U.S. Army Medical Research Institute of Infectious Diseases, Frederick, Maryland 21702
7
Kim H, Kihara D. Detecting local residue environment similarity for recognizing near-native structure models. Proteins 2014; 82:3255-72. [PMID: 25132526 PMCID: PMC4237674 DOI: 10.1002/prot.24658] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 01/15/2014] [Revised: 06/10/2014] [Accepted: 07/21/2014] [Indexed: 12/14/2022]
Abstract
We developed a new representation of local amino acid environments in protein structures called the Side-chain Depth Environment (SDE). An SDE defines a local structural environment of a residue considering the coordinates and the depth of amino acids that locate in the vicinity of the side-chain centroid of the residue. SDEs are general enough that similar SDEs are found in protein structures with globally different folds. Using SDEs, we developed a procedure called PRESCO (Protein Residue Environment SCOre) for selecting native or near-native models from a pool of computational models. The procedure searches similar residue environments observed in a query model against a set of representative native protein structures to quantify how native-like SDEs in the model are. When benchmarked on commonly used computational model datasets, our PRESCO compared favorably with the other existing scoring functions in selecting native and near-native models.
Affiliation(s)
- Hyungrae Kim
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47906, USA
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47906, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
8
Ruiz-Blanco YB, Marrero-Ponce Y, García Y, Puris A, Bello R, Green J, Sotomayor-Torres CM. A physics-based scoring function for protein structural decoys: Dynamic testing on targets of CASP-ROLL. Chem Phys Lett 2014. [DOI: 10.1016/j.cplett.2014.07.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
9
How good are simplified models for protein structure prediction? Adv Bioinformatics 2014; 2014:867179. [PMID: 24876837 PMCID: PMC4022063 DOI: 10.1155/2014/867179] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/31/2013] [Revised: 01/22/2014] [Accepted: 01/23/2014] [Indexed: 11/18/2022]
Abstract
Protein structure prediction (PSP) has been one of the most challenging problems in computational biology for several decades. The challenge is largely due to the complexity of the all-atomic details and the unknown nature of the energy function. Researchers have therefore used simplified energy models that consider interaction potentials only between the amino acid monomers in contact on discrete lattices. The restricted nature of the lattices and the energy models poses a twofold concern regarding the assessment of the models. Can a native or a very close structure be obtained when structures are mapped to lattices? Can the contact based energy models on discrete lattices guide the search towards the native structures? In this paper, we use the protein chain lattice fitting (PCLF) problem to address the first concern; we developed a constraint-based local search algorithm for the PCLF problem for cubic and face-centered cubic lattices and found very close lattice fits for the native structures. For the second concern, we use a number of techniques to sample the conformation space and find correlations between energy functions and root mean square deviation (RMSD) distance of the lattice-based structures with the native structures. Our analysis reveals weakness of several contact based energy models used that are popular in PSP.
10
Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A. Optimized atomic statistical potentials: assessment of protein interfaces and loops. Bioinformatics 2013; 29:3158-66. [PMID: 24078704 PMCID: PMC3842762 DOI: 10.1093/bioinformatics/btt560] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Received: 06/19/2013] [Revised: 08/13/2013] [Accepted: 09/22/2013] [Indexed: 01/16/2023]
Abstract
MOTIVATION Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state. RESULTS We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven 'recovery' functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein-protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures. AVAILABILITY AND IMPLEMENTATION SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
Affiliation(s)
- Guang Qiang Dong
  - Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry and California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, CA 94158, USA
11
Nugent T, Jones DT. Membrane protein orientation and refinement using a knowledge-based statistical potential. BMC Bioinformatics 2013; 14:276. [PMID: 24047460 PMCID: PMC3852961 DOI: 10.1186/1471-2105-14-276] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 05/03/2013] [Accepted: 09/05/2013] [Indexed: 01/09/2023]
Abstract
BACKGROUND Recent increases in the number of deposited membrane protein crystal structures necessitate the use of automated computational tools to position them within the lipid bilayer. Identifying the correct orientation allows us to study the complex relationship between sequence, structure and the lipid environment, which is otherwise challenging to investigate using experimental techniques due to the difficulty in crystallising membrane proteins embedded within intact membranes. RESULTS We have developed a knowledge-based membrane potential, calculated by the statistical analysis of transmembrane protein structures, coupled with a combination of genetic and direct search algorithms, and demonstrate its use in positioning proteins in membranes, refinement of membrane protein models and in decoy discrimination. CONCLUSIONS Our method is able to quickly and accurately orientate both alpha-helical and beta-barrel membrane proteins within the lipid bilayer, showing closer agreement with experimentally determined values than existing approaches. We also demonstrate both consistent and significant refinement of membrane protein models and the effective discrimination between native and decoy structures. Source code is available under an open source license from http://bioinf.cs.ucl.ac.uk/downloads/memembed/.
Affiliation(s)
- Timothy Nugent
  - Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
- David T Jones
  - Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
12
Fan H, Schneidman-Duhovny D, Irwin JJ, Dong G, Shoichet BK, Sali A. Statistical potential for modeling and ranking of protein-ligand interactions. J Chem Inf Model 2011; 51:3078-92. [PMID: 22014038 DOI: 10.1021/ci200377u] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/13/2023]
Abstract
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
Affiliation(s)
- Hao Fan
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA
13
Wang Q, Shang Y, Xu D. Improving a consensus approach for protein structure selection by removing redundancy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1708-15. [PMID: 21519117 DOI: 10.1109/tcbb.2011.75] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/30/2023]
Abstract
In protein tertiary structure prediction, a crucial step is to select near-native structures from a large number of predicted structural models. Over the years, extensive research has been conducted for the protein structure selection problem with most approaches focusing on developing more accurate energy or scoring functions. Despite significant advances in this area, the discerning power of current approaches is still unsatisfactory. In this paper, we propose a novel consensus-based algorithm for the selection of predicted protein structures. Given a set of predicted models, our method first removes redundant structures to derive a subset of reference models. Then, a structure is ranked based on its average pairwise similarity to the reference models. Using the CASP8 data set containing a large collection of predicted models for 122 targets, we compared our method with the best CASP8 quality assessment (QA) servers, which are all consensus based, and showed that our QA scores correlate better with the GDT-TSs than those of the CASP8 QA servers. We also compared our method with the state-of-the-art scoring functions and showed its improved performance for near-native model selection. The GDT-TSs of the top models picked by our method are on average more than 8 percent better than the ones selected by the best performing scoring function.
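The two steps named in this abstract (filter redundant models into a reference set, then rank each model by its average pairwise similarity to that set) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the greedy filter, the `redundancy_cutoff` parameter, the exclusion of a model from its own reference set, and the toy similarity function are all assumptions made for illustration. In practice the similarity would be a structural measure such as GDT-TS between two models.

```python
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")
Similarity = Callable[[Model, Model], float]


def select_references(
    models: Sequence[Model], similarity: Similarity, redundancy_cutoff: float
) -> List[int]:
    """Greedy redundancy filter (an assumed scheme, not taken from the paper).

    A model becomes a reference only if its similarity to every reference
    already kept is below redundancy_cutoff. Returns indices into models.
    """
    refs: List[int] = []
    for i, model in enumerate(models):
        if all(similarity(model, models[j]) < redundancy_cutoff for j in refs):
            refs.append(i)
    return refs


def consensus_scores(
    models: Sequence[Model], similarity: Similarity, redundancy_cutoff: float
) -> List[float]:
    """Score each model by its mean similarity to the reference models.

    A reference model is not compared with itself (assumption), so that
    self-similarity does not inflate its score.
    """
    refs = select_references(models, similarity, redundancy_cutoff)
    scores: List[float] = []
    for i, model in enumerate(models):
        others = [j for j in refs if j != i]
        if not others:
            scores.append(0.0)
            continue
        total = sum(similarity(model, models[j]) for j in others)
        scores.append(total / len(others))
    return scores


if __name__ == "__main__":
    # Toy stand-in: each "model" is a single number and similarity decays
    # with distance. Real use would plug in a structural similarity score.
    toy_models = [0.0, 0.0, 0.0, 1.0, 5.0]
    toy_similarity = lambda a, b: 1.0 / (1.0 + abs(a - b))
    print(select_references(toy_models, toy_similarity, 0.9))
    print(consensus_scores(toy_models, toy_similarity, 0.9))
```

In the toy run, the three identical models contribute a single reference, so a densely sampled but redundant region no longer dominates the average. That is the motivation the abstract gives for removing redundancy before computing consensus.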
Affiliation(s)
- Qingguo Wang
  - Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO 65211, USA.
14
Dong Q, Zhou S. Novel nonlinear knowledge-based mean force potentials based on machine learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:476-486. [PMID: 20820079 DOI: 10.1109/tcbb.2010.86] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Current knowledge-based potentials are almost in the form of weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of relative frequencies of interaction pairs against a reference state or linear classifiers. The support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force Boltzmann-based or linear potentials are introduced and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R”Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures. 
Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
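The two evaluation metrics mentioned in this abstract, the energy Z score and native-versus-decoy discrimination, can be illustrated with a short Python sketch. It assumes the common convention Z = (E_native - mean(E_decoys)) / std(E_decoys), where a more negative value means the potential separates the native structure more clearly. The abstract does not state the exact formula used in the paper, and the function names and energy values below are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def energy_z_score(native_energy: float, decoy_energies: Sequence[float]) -> float:
    """Z score of the native energy relative to the decoy energy distribution.

    Uses the population standard deviation of the decoy energies. Lower
    energy is taken to be better, so a strongly negative Z score means the
    native structure lies well below the decoys.
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread; Z score is undefined")
    return (native_energy - mean(decoy_energies)) / sigma


def native_rank(native_energy: float, decoy_energies: Sequence[float]) -> int:
    """Rank of the native structure by energy (1 = lowest energy of the set)."""
    return 1 + sum(1 for energy in decoy_energies if energy < native_energy)


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units, for illustration only.
    decoys = [-8.0, -12.0, -8.0, -12.0]
    print(energy_z_score(-16.0, decoys))
    print(native_rank(-16.0, decoys))
```

A potential discriminates the native structure on a given decoy set when `native_rank` returns 1. The Z score additionally reports how wide the energy gap is relative to the spread of the decoys.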
Affiliation(s)
- Qiwen Dong
  - Shanghai Key Lab of Intelligent Information Processing and the School of Computer Science, Fudan University, Old Yifu Building, Room 202-5, 220 Handan Road, Shanghai 200433, China.
15
Benkert P, Tosatto SCE, Schwede T. Global and local model quality estimation at CASP8 using the scoring functions QMEAN and QMEANclust. Proteins 2010; 77 Suppl 9:173-80. [PMID: 19705484 DOI: 10.1002/prot.22532] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 11/10/2022]
Abstract
Identifying the best candidate model among an ensemble of alternatives is crucial in protein structure prediction. For this purpose, scoring functions have been developed which either calculate a quality estimate on the basis of a single model or derive a score from the information contained in the ensemble of models generated for a given sequence (i.e., consensus methods). At CASP7, consensus methods have performed considerably better than scoring functions operating on single models. However, consensus methods tend to fail if the best models are far from the center of the dominant structural cluster. At CASP8, we investigated whether our hybrid method QMEANclust may overcome this limitation by combining the QMEAN composite scoring function operating on single models with consensus information. We participated with four different scoring functions in the quality assessment category. The QMEANclust consensus scoring function turned out to be a successful method both for the ranking of entire models but especially for the estimation of the per-residue model quality. In this article, we briefly describe the two scoring functions QMEAN and QMEANclust and discuss their performance in the context of what went right and wrong at CASP8. Both scoring functions are publicly available at http://swissmodel.expasy.org/qmean/.
Affiliation(s)
- Pascal Benkert
- Biozentrum, University of Basel, Basel 4056, Switzerland
|
16
|
Quintillá A, Hennrich F, Lebedkin S, Kappes MM, Wenzel W. Influence of endohedral water on diameter sorting of single-walled carbon nanotubes by density gradient centrifugation. Phys Chem Chem Phys 2009; 12:902-8. [PMID: 20066375 DOI: 10.1039/b912847f] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/21/2022]
Abstract
Separation of single-walled carbon nanotubes (SWNT) by diameter is an important prerequisite for controlled experimental studies and efficient application of these systems. By comparing experimental data with molecular dynamics (MD) simulations, we demonstrate that water filling has a significant, tube-diameter dependent effect on the effective mass density of individual single-walled carbon nanotubes suspended in aqueous surfactant suspensions. We present a model for the effective density of the nanotube-surfactant complex in aqueous solution that permits a comprehensive description of its density across the entire, experimentally relevant range of SWNT diameters. Parameters for this model can be obtained from molecular dynamics simulations and/or experiment and help explain the subtle interplay of surfactant coverage and endohedral water in the separation of a particular diameter species of SWNT by gradient centrifugation.
Affiliation(s)
- A Quintillá
- Forschungszentrum Karlsruhe, Institut für Nanotechnologie, D-76021 Karlsruhe, Germany
|
17
|
Gopal SM, Klenin K, Wenzel W. Template-free protein structure prediction and quality assessment with an all-atom free-energy model. Proteins 2009; 77:330-41. [PMID: 19422063 DOI: 10.1002/prot.22438] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Abstract
Biophysical forcefields have contributed less than originally anticipated to recent progress in protein structure prediction. Here, we have investigated the selectivity of a recently developed all-atom free-energy forcefield for protein structure prediction and quality assessment (QA). Using a heuristic method, but excluding homology, we generated decoy-sets for all targets of the CASP7 protein structure prediction assessment with <150 amino acids. The decoys in each set were then ranked by energy in short relaxation simulations and the best low-energy cluster was submitted as a prediction. For four of nine template-free targets, this approach generated high-ranking predictions within the top 10 models submitted in CASP7 for the respective targets. For these targets, our de-novo predictions had an average GDT_S score of 42.81, significantly above the average of all groups. The refinement protocol has difficulty with oligomeric targets and when no near-native decoys are generated in the decoy library. For targets with high-quality decoy sets the refinement approach was highly selective. Motivated by this observation, we rescored all server submissions up to 200 amino acids using a similar refinement protocol, but using no clustering, in a QA exercise. We found an excellent correlation between the best server models and those with the lowest energy in the forcefield. The free-energy refinement protocol may thus be an efficient tool for relative QA and protein structure prediction.
Affiliation(s)
- Srinivasa Murthy Gopal
- Forschungszentrum Karlsruhe, Institute for Nanotechnology, PO Box 3640, 76021 Karlsruhe, Germany
|
18
|
Mirzaie M, Eslahchi C, Pezeshk H, Sadeghi M. A distance-dependent atomic knowledge-based potential and force for discrimination of native structures from decoys. Proteins 2009; 77:454-63. [PMID: 19452553 DOI: 10.1002/prot.22457] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
The purpose of this article is to introduce a novel model for discriminating correctly folded proteins from well-designed decoy structures using mechanical interatomic forces. In our model, we consider a protein as a collection of springs and the force imposed on each atom is calculated. A potential function is obtained from statistical contact preferences within known protein structures. Combining this function with the spring equation, the interatomic forces are calculated. Finally, we consider a structure and define a score function on the 3D structure of a protein. We compare the force imposed on each atom of a protein with the corresponding atom in the other structures. We then assign larger scores to those atoms with lower forces. The total score is the sum of partial scores of atoms. The optimal structure is assumed to be the one with the highest score in the data set. To evaluate the performance of our model, we apply it to several decoy sets.
Affiliation(s)
- Mehdi Mirzaie
- Department of Mathematical Sciences, Shahid Beheshti University, Post Code 1983963113, Tehran, Iran
|
19
|
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduces the true energies for high temperature Boltzmann distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity. Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
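As a rough illustration of the quasi-chemical approximation mentioned in this abstract, the sketch below inverts observed contact counts against an ideal-mixing reference state. It is a generic textbook-style Boltzmann inversion written under stated assumptions, not the procedure from the paper: the function name `quasi_chemical_energies`, the two-letter H/P alphabet, the supplied mole fractions, and the toy counts are all hypothetical.

```python
import math

def quasi_chemical_energies(contact_counts, fractions, kT=1.0):
    """Boltzmann inversion of contact counts against an ideal-mixing reference.

    contact_counts: {(a, b): observed contacts}, each unordered pair listed once.
    fractions: {a: mole fraction of residue type a} (assumed to be supplied).
    Returns {(a, b): contact energy in units of kT}.
    Assumes every observed count is strictly positive.
    """
    total = sum(contact_counts.values())
    energies = {}
    for (a, b), observed in contact_counts.items():
        # An unordered pair of distinct types can form in two ways (a-b, b-a).
        multiplicity = 1 if a == b else 2
        expected = total * multiplicity * fractions[a] * fractions[b]
        energies[(a, b)] = -kT * math.log(observed / expected)
    return energies

# Toy data: like contacts are over-represented, unlike contacts under-represented.
counts = {("H", "H"): 40, ("P", "P"): 40, ("H", "P"): 20}
energies = quasi_chemical_energies(counts, {"H": 0.5, "P": 0.5})
```

With these toy counts the like pairs come out attractive (negative energy) and the unlike pair repulsive, because the reference state expects 25, 25, and 50 contacts respectively.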
Affiliation(s)
- Marcos R Betancourt
- Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
|
20
|
Benkert P, Schwede T, Tosatto SC. QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information. BMC STRUCTURAL BIOLOGY 2009; 9:35. [PMID: 19457232 PMCID: PMC2709111 DOI: 10.1186/1472-6807-9-35] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Received: 10/21/2008] [Accepted: 05/20/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus. RESULTS Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function called QMEANclust achieves a correlation coefficient of predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and performs significantly better in selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach. CONCLUSION Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
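The pre-filtering idea described in this abstract (rank models with a single-model score, then compute the consensus only against the top-ranked subset) can be sketched as follows. This is a minimal illustration, not the QMEANclust implementation: the function name `prefiltered_consensus`, the `keep_fraction` parameter, the pairwise similarity matrix, and the toy values are all assumptions made for the example.

```python
def prefiltered_consensus(single_scores, similarity, keep_fraction=0.5):
    """Score each model by its mean similarity to a pre-filtered reference set.

    single_scores: per-model scores from a single-model function (higher = better).
    similarity: square matrix with similarity[i][j] in [0, 1] for models i and j.
    keep_fraction: share of top-ranked models kept as references (assumed value).
    """
    n = len(single_scores)
    keep = max(1, int(n * keep_fraction))
    ranked = sorted(range(n), key=lambda i: single_scores[i], reverse=True)
    references = ranked[:keep]
    scores = []
    for i in range(n):
        # Never compare a model with itself.
        others = [j for j in references if j != i]
        if others:
            scores.append(sum(similarity[i][j] for j in others) / len(others))
        else:
            scores.append(0.0)  # the model is the only reference
    return scores

# Toy ensemble: models 0 and 1 score well individually and resemble each other.
single_scores = [0.9, 0.8, 0.3, 0.2]
similarity = [
    [1.0, 0.8, 0.2, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.9],
    [0.2, 0.1, 0.9, 1.0],
]
scores = prefiltered_consensus(single_scores, similarity)
```

In this toy ensemble models 2 and 3 are very similar to each other, but because neither passes the pre-filter their mutual agreement does not raise their consensus scores.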
Affiliation(s)
- Pascal Benkert
- Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland.
|
21
|
Handl J, Knowles J, Lovell SC. Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction. Bioinformatics 2009; 25:1271-9. [PMID: 19297350 PMCID: PMC2677743 DOI: 10.1093/bioinformatics/btp150] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 10/29/2008] [Revised: 03/06/2009] [Accepted: 03/14/2009] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies. RESULTS We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to either over- or under-estimation of the quality of scoring terms, functions or methods. AVAILABILITY Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets.
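The two evaluation measures named in this abstract, the z-score of the native structure and the correlation between score and RMSD, can be computed as in the sketch below. The function names and the toy numbers are hypothetical, and the z-score here uses the population standard deviation of the decoy scores, which is one common convention among several.

```python
import math

def native_z_score(native_score, decoy_scores):
    """Z-score of the native's score relative to the decoy score distribution."""
    mean = sum(decoy_scores) / len(decoy_scores)
    variance = sum((s - mean) ** 2 for s in decoy_scores) / len(decoy_scores)
    return (native_score - mean) / math.sqrt(variance)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy decoy set: lower score is better, and score tracks RMSD exactly.
decoy_scores = [-10.0, -12.0, -14.0, -16.0]
decoy_rmsds = [8.0, 6.0, 4.0, 2.0]
z = native_z_score(-20.0, decoy_scores)
r = pearson(decoy_scores, decoy_rmsds)
```

A strongly negative `z` says only that the native scores below the decoys as a group. As the abstract argues, that can be trivially achieved on a biased decoy set, so it is worth reporting alongside the score-RMSD correlation rather than on its own.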
Affiliation(s)
- Julia Handl
- Faculty of Life Sciences, University of Manchester, Manchester, UK
|
22
|
Makino Y, Itoh N. A knowledge-based structure-discriminating function that requires only main-chain atom coordinates. BMC STRUCTURAL BIOLOGY 2008; 8:46. [PMID: 18957132 PMCID: PMC2600639 DOI: 10.1186/1472-6807-8-46] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/26/2007] [Accepted: 10/29/2008] [Indexed: 11/23/2022]
Abstract
Background The use of knowledge-based potential functions is a powerful method for protein structure evaluation. A variety of formulations that evaluate single or multiple structural features of proteins have been developed and studied. The performance of functions is often evaluated by discrimination ability using decoy structures of target proteins. A function that can evaluate coarse-grained structures is advantageous in many respects, such as relatively easy generation and manipulation of model structures; however, the reduction of structural representation is often accompanied by degradation of the structure discrimination performance. Results We developed a knowledge-based pseudo-energy calculating function for protein structure discrimination. The function (Discriminating Function using Main-chain Atom Coordinates, DFMAC) consists of six pseudo-energy calculation components that deal with different structural features. Only the main-chain atom coordinates of N, Cα, and C atoms for the respective amino acid residues are required as input data for structure evaluation. The 231 target structures in 12 different types of decoy sets were separated into 154 and 77 targets, and function training and the subsequent performance test were performed using the respective target sets. Fifty-nine (76.6%) native and 68 (88.3%) near-native (< 2.0 Å Cα RMSD) targets in the test set were successfully identified. The average Cα RMSD of the test set was 1.174 Å with the tuned parameters. The major part of the discrimination performance was supported by the orientation-dependent component. Conclusion Despite the reduced representation of input structures, DFMAC showed considerable structure discrimination ability. The function can be applied to the identification of near-native structures in structure prediction experiments.
Affiliation(s)
- Yoshihide Makino
- Department of Biotechnology, Faculty of Engineering, Toyama Prefectural University, 5180 Kurokawa, Imizu-shi, Toyama 939-0398, Japan.
|
23
|
Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information. PLoS Comput Biol 2008; 4:e1000181. [PMID: 18818722 PMCID: PMC2526173 DOI: 10.1371/journal.pcbi.1000181] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 04/15/2008] [Accepted: 08/07/2008] [Indexed: 11/19/2022] Open
Abstract
Protein function is mediated by different amino acid residues, both their positions and types, in a protein sequence. Some amino acids are responsible for the stability or overall shape of the protein, playing an indirect role in protein function. Others play a functionally important role as part of active or binding sites of the protein. For a given protein sequence, the residues and their degree of functional importance can be thought of as a signature representing the function of the protein. We have developed a combination of knowledge- and biophysics-based function prediction approaches to elucidate the relationships between the structural and the functional roles of individual residues and positions. Such a meta-functional signature (MFS), which is a collection of continuous values representing the functional significance of each residue in a protein, may be used to study proteins of known function in greater detail and to aid in experimental characterization of proteins of unknown function. We demonstrate the superior performance of MFS in predicting protein functional sites and also present four real-world examples to apply MFS in a wide range of settings to elucidate protein sequence-structure-function relationships. Our results indicate that the MFS approach, which can combine multiple sources of information and also give biological interpretation to each component, greatly facilitates the understanding and characterization of protein function.
|
24
|
Solis AD, Rackovsky S. Information and discrimination in pairwise contact potentials. Proteins 2008; 71:1071-87. [PMID: 18004788 DOI: 10.1002/prot.21733] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/05/2022]
Abstract
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information, and if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence on their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have a high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials. Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
Affiliation(s)
- Armando D Solis
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York 10029, USA
|
25
|
Ngan SC, Hung LH, Liu T, Samudrala R. Scoring functions for de novo protein structure prediction revisited. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2008; 413:243-81. [PMID: 18075169 DOI: 10.1007/978-1-59745-574-9_10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 04/08/2023]
Abstract
De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures are generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. We discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction in this chapter.
Affiliation(s)
- Shing-Chung Ngan
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, USA
|
26
|
Felts AK, Gallicchio E, Chekmarev D, Paris KA, Friesner RA, Levy RM. Prediction of Protein Loop Conformations using the AGBNP Implicit Solvent Model and Torsion Angle Sampling. J Chem Theory Comput 2008; 4:855-868. [PMID: 18787648 DOI: 10.1021/ct800051k] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]
Abstract
The OPLS-AA all-atom force field and the Analytical Generalized Born plus Non-Polar (AGBNP) implicit solvent model, in conjunction with torsion angle conformational search protocols based on the Protein Local Optimization Program (PLOP), are shown to be effective in predicting the native conformations of 57 9-residue and 35 13-residue loops of a diverse series of proteins with low sequence identity. The novel nonpolar solvation free energy estimator implemented in AGBNP, augmented by correction terms aimed at reducing the occurrence of ion pairing, is important for achieving the best prediction accuracy. Extended versions of the previously developed PLOP-based conformational search schemes based on calculations in the crystal environment are reported that are suitable for application to loop homology modeling without the crystal environment. Our results suggest that in general the loop backbone conformation is not strongly influenced by crystal packing. The application of the temperature Replica Exchange Molecular Dynamics (T-REMD) sampling method to a few examples where PLOP sampling is insufficient is also reported. The results reported indicate that the OPLS-AA/AGBNP effective potential is suitable for high-resolution modeling of proteins in the final stages of homology modeling and/or protein crystallographic refinement.
Affiliation(s)
- Anthony K Felts
- Department of Chemistry and Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey 08854
|
27
|
Rajgaria R, McAllister SR, Floudas CA. Distance dependent centroid to centroid force fields using high resolution decoys. Proteins 2008; 70:950-70. [PMID: 17847088 DOI: 10.1002/prot.21561] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022]
Abstract
Simplified force fields play an important role in protein structure prediction and de novo protein design by requiring less computational effort than detailed atomistic potentials. A side chain centroid based, distance dependent pairwise interaction potential has been developed. A linear programming based formulation was used in which non-native "decoy" conformers are forced to take a higher energy compared with the corresponding native structure. This model was trained on an enhanced and diverse protein set. High quality decoy structures were generated for approximately 1400 nonhomologous proteins using torsion angle dynamics along with restricted variations of the hydrophobic cores of the native structure. The resulting decoy set was used to train the model yielding two different side chain centroid based force fields that differ in the way distance dependence has been used to calculate energy parameters. These force fields were tested on an independent set of 148 test proteins with 500 decoy structures for each protein. The side chain centroid force fields were successful in correctly identifying approximately 86% of native structures. The Z-scores produced by the proposed centroid-centroid distance dependent force fields improved compared with other distance dependent C(alpha)-C(alpha) or side chain based force fields.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
|
28
|
Benkert P, Tosatto SCE, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008; 71:261-77. [PMID: 17932912 DOI: 10.1002/prot.21715] [Citation(s) in RCA: 733] [Impact Index Per Article: 45.8] [Indexed: 11/06/2022]
Abstract
In protein structure prediction, a considerable number of alternative models are usually produced from which subsequently the final model has to be selected. Thus, a scoring function for the identification of the best model within an ensemble of alternative models is a key component of most protein structure prediction pipelines. QMEAN, which stands for Qualitative Model Energy ANalysis, is a composite scoring function describing the major geometrical aspects of protein structures. Five different structural descriptors are used. The local geometry is analyzed by a new kind of torsion angle potential over three consecutive amino acids. A secondary structure-specific distance-dependent pairwise residue-level potential is used to assess long-range interactions. A solvation potential describes the burial status of the residues. Two simple terms describing the agreement of predicted and calculated secondary structure and solvent accessibility, respectively, are also included. A variety of different implementations are investigated and several approaches to combine and optimize them are discussed. QMEAN was tested on several standard decoy sets including a molecular dynamics simulation decoy set as well as on a comprehensive data set of a total of 22,420 models from server predictions for the 95 targets of CASP7. In a comparison to five well-established model quality assessment programs, QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models. The three-residue torsion angle potential turned out to be very effective in recognizing the native fold.
Affiliation(s)
- Pascal Benkert
- Institute for Biochemistry, University of Cologne, 50674 Cologne, Germany
|
29
|
Zheng S, Robertson TA, Varani G. A knowledge-based potential function predicts the specificity and relative binding energy of RNA-binding proteins. FEBS J 2007; 274:6378-91. [PMID: 18005254 DOI: 10.1111/j.1742-4658.2007.06155.x] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract
RNA-protein interactions are fundamental to gene expression. Thus, the molecular basis for the sequence dependence of protein-RNA recognition has been extensively studied experimentally. However, there have been very few computational studies of this problem, and no sustained attempt has been made towards using computational methods to predict or alter the sequence-specificity of these proteins. In the present study, we provide a distance-dependent statistical potential function derived from our previous work on protein-DNA interactions. This potential function discriminates native structures from decoys, successfully predicts the native sequences recognized by sequence-specific RNA-binding proteins, and recapitulates experimentally determined relative changes in binding energy due to mutations of individual amino acids at protein-RNA interfaces. Thus, this work demonstrates that statistical models allow the quantitative analysis of protein-RNA recognition based on their structure and can be applied to modeling protein-RNA interfaces for prediction and design purposes.
Affiliation(s)
- Suxin Zheng
- Department of Chemistry, University of Washington, Seattle, WA 98195, USA
|
30
|
Abstract
Accurate and automated assessment of both geometrical errors and incompleteness of comparative protein structure models is necessary for an adequate use of the models. Here, we describe a composite score for discriminating between models with the correct and incorrect fold. To find an accurate composite score, we designed and applied a genetic algorithm method that searched for a most informative subset of 21 input model features as well as their optimized nonlinear transformation into the composite score. The 21 input features included various statistical potential scores, stereochemistry quality descriptors, sequence alignment scores, geometrical descriptors, and measures of protein packing. The optimized composite score was found to depend on (1) a statistical potential z-score for residue accessibilities and distances, (2) model compactness, and (3) percentage sequence identity of the alignment used to build the model. The accuracy of the composite score was compared with the accuracy of assessment by single and combined features as well as by other commonly used assessment methods. The testing set was representative of models produced by automated comparative modeling on a genomic scale. The composite score performed better than any other tested score in terms of the maximum correct classification rate (i.e., 3.3% false positives and 2.5% false negatives) as well as the sensitivity and specificity across the whole range of thresholds. The composite score was implemented in our program MODELLER-8 and was used to assess models in the MODBASE database that contains comparative models for domains in approximately 1.3 million protein sequences.
Affiliation(s)
- Francisco Melo
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile.
|
31
|
Lin MS, Fawzi NL, Head-Gordon T. Hydrophobic potential of mean force as a solvation function for protein structure prediction. Structure 2007; 15:727-40. [PMID: 17562319 DOI: 10.1016/j.str.2007.05.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 12/16/2006] [Revised: 05/04/2007] [Accepted: 05/07/2007] [Indexed: 10/23/2022]
Abstract
We have developed a solvation function that combines a Generalized Born model for polarization of protein charge by the high dielectric solvent, with a hydrophobic potential of mean force (HPMF) as a model for hydrophobic interaction, to aid in the discrimination of native structures from other misfolded states in protein structure prediction. We find that our energy function outperforms other reported scoring functions in terms of correct native ranking for 91% of proteins and low Z scores for a variety of decoy sets, including the challenging Rosetta decoys. This work shows that the stabilizing effect of hydrophobic exposure to aqueous solvent that defines the HPMF hydration physics is an apparent improvement over solvent-accessible surface area models that penalize hydrophobic exposure. Decoys generated by thermal sampling around the native-state basin reveal a potentially important role for side-chain entropy in the future development of even more accurate free energy surfaces.
Affiliation(s)
- Matthew S Lin
- UCSF/UCB Joint Graduate Group in Bioengineering, University of California-Berkeley, Berkeley, CA 94720, USA
|
32
|
Ferrada E, Melo F. Nonbonded terms extrapolated from nonlocal knowledge-based energy functions improve error detection in near-native protein structure models. Protein Sci 2007; 16:1410-21. [PMID: 17586774 PMCID: PMC2206707 DOI: 10.1110/ps.062735907] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
Abstract
The accurate assessment of structural errors plays a key role in protein structure prediction, constitutes the first step of protein structure refinement, and has a major impact on subsequent functional inference from structural data. In this study, we assess and compare the ability of different full atom knowledge-based potentials to detect small and localized errors in comparative protein structure models of known accuracy. We have evaluated the effect of incorporating close nonbonded pairwise atom terms on the task of classifying residue modeling accuracy. Since the direct and unbiased derivation of close nonbonded terms from current experimental data is not possible, we extrapolated those terms from the corresponding pseudo-energy functions of a nonlocal knowledge-based potential. It is shown that this methodology clearly improves the detection of errors in protein models, suggesting that a proper description of close nonbonded terms is important to achieve a more complete and accurate description of native protein conformations. The use of close nonbonded terms directly derived from experimental data exhibited a poor performance, demonstrating that these terms cannot be accurately obtained by using the current data and methodology. Some external knowledge-based energy functions that are widely used in model assessment also performed poorly, which suggests that the benchmark of models and the specific error detection task tested in this study constituted a difficult challenge. The methodology presented here could be useful to detect localized structural errors not only in high-quality protein models, but also in experimental protein structures.
Affiliation(s)
- Evandro Ferrada
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
|
33
|
Ruvinsky AM. Role of binding entropy in the refinement of protein-ligand docking predictions: analysis based on the use of 11 scoring functions. J Comput Chem 2007; 28:1364-72. [PMID: 17342720 DOI: 10.1002/jcc.20580] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022]
Abstract
We present results of testing the ability of eleven popular scoring functions to predict native docked positions using a recently developed method (Ruvinsky and Kozintsev, J Comput Chem 2005, 26, 1089) for estimating the entropy contributions of relative motions to protein-ligand binding affinity. The method is based on the integration of the configurational integral over clusters obtained from multiple docked positions. We use a test set of 100 PDB protein-ligand complexes and ensembles of 101 docked positions generated by (Wang et al. J Med Chem 2003, 46, 2287) for each ligand in the test set. To test the suggested method we compared the averaged root-mean-square deviations (RMSD) of the top-scored ligand docked positions, accounting and not accounting for entropy contributions, relative to the experimentally determined positions. We demonstrate that the method increases docking accuracy by 10-21% when used in conjunction with the AutoDock scoring function, by 2-25% with G-Score, by 7-41% with D-Score, by 0-8% with LigScore, by 1-6% with PLP, by 0-12% with LUDI, by 2-8% with F-Score, by 7-29% with ChemScore, by 0-9% with X-Score, by 2-19% with PMF, and by 1-7% with DrugScore. We also compared the performance of the suggested method with the method based on ranking by cluster occupancy only. We analyze how the choice of a clustering-RMSD and a low bound of dense clusters affects the docking accuracy of the scoring methods. We derive optimal intervals of the clustering-RMSD for 11 scoring functions.
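The accuracy measure named in this abstract, the RMSD of a top-scored docked position relative to the experimental position, averaged over complexes, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `rmsd` and `mean_top_rmsd` are hypothetical, and it assumes both poses list the same ligand atoms in the same order and in the same receptor frame, so no superposition is performed.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms.

    Assumption: both poses are already in the same coordinate frame,
    so no optimal superposition is applied.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def mean_top_rmsd(complexes):
    """Average RMSD over (top_scored_pose, experimental_pose) pairs, one pair per complex."""
    if not complexes:
        raise ValueError("at least one complex is required")
    return sum(rmsd(top, ref) for top, ref in complexes) / len(complexes)
```

Comparing `mean_top_rmsd` for rankings with and without an entropy term would give one relative-accuracy number of the kind quoted above, under the stated assumptions.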
Affiliation(s)
- Anatoly M Ruvinsky
- Center for Bioinformatics, The University of Kansas, 2030 Becker Drive, Lawrence, Kansas 66047, USA.
|
34
|
Küçükural A, Sezerman OU. Discrimination of proteins using graph theoretic properties. BMC SYSTEMS BIOLOGY 2007. [DOI: 10.1186/1752-0509-1-s1-p49] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
|
35
|
Bhattacharyay A, Trovato A, Seno F. Simple solvation potential for coarse-grained models of proteins. Proteins 2007; 67:285-92. [PMID: 17286285 DOI: 10.1002/prot.21291] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
We formulate a simple solvation potential based on a coarse-grained representation of amino acids with two spheres modeling the C(alpha) atom and an effective side-chain centroid. The potential relies on a new method for estimating the buried area of residues, based on counting the effective number of burying neighbors in a suitable way. This latter quantity shows a good correlation with the buried area of residues computed from all atom crystallographic structures. We check the discriminatory power of the solvation potential alone to identify the native fold of a protein from a set of decoys and show the potential to be considerably selective.
Affiliation(s)
- A Bhattacharyay
- Dipartimento di Fisica G. Galilei, Università degli Studi di Padova, via F. Marzolo 8, 35131 Padova, Italy.
|
36
|
Fogolari F, Pieri L, Dovier A, Bortolussi L, Giugliarelli G, Corazza A, Esposito G, Viglino P. Scoring predictive models using a reduced representation of proteins: model and energy definition. BMC STRUCTURAL BIOLOGY 2007; 7:15. [PMID: 17378941 PMCID: PMC1854906 DOI: 10.1186/1472-6807-7-15] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 09/28/2006] [Accepted: 03/23/2007] [Indexed: 11/25/2022]
Abstract
Background Reduced representations of proteins have been playing a key role in the study of protein folding. Many such models are available, with different representation detail. Although the usefulness of many such models for structural bioinformatics applications has been demonstrated in recent years, there are few intermediate resolution models endowed with an energy model capable, for instance, of detecting native or native-like structures among decoy sets. The aim of the present work is to provide a discrete empirical potential for a reduced protein model termed here PC2CA, because it employs a PseudoCovalent structure with only 2 Centers of interactions per Amino acid, suitable for protein model quality assessment. Results All protein structures in the set top500H have been converted into reduced form. The distributions of pseudobonds, pseudoangles, pseudodihedrals and distances between centers of interactions have been converted into potentials of mean force. A suitable reference distribution has been defined for non-bonded interactions which takes into account excluded volume effects and protein finite size. The correlation between adjacent main chain pseudodihedrals has been converted into an additional energetic term which is able to account for cooperative effects in secondary structure elements. Local energy surface exploration is performed in order to increase the robustness of the energy function. Conclusion The model and the energy definition proposed have been tested on all the multiple decoys' sets in the Decoys'R'us database. The energetic model is able to recognize, for almost all sets, native-like structures (RMSD less than 2.0 Å). These results and those obtained in the blind CASP7 quality assessment experiment suggest that the model compares well with scoring potentials with finer granularity and could be useful for fast exploration of conformational space. Parameters are available at the url: .
Affiliation(s)
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
| | - Lidia Pieri
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- INAF – Astronomical Observatory of Padova Vicolo dell'Osservatorio 5, I-35122 Padova, Italy
| | - Agostino Dovier
- Dipartimento di Matematica e Informatica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
| | - Luca Bortolussi
- Dipartimento di Matematica e Informatica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
| | - Gilberto Giugliarelli
- Dipartimento di Fisica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
| | - Alessandra Corazza
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
| | - Gennaro Esposito
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
| | - Paolo Viglino
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
|
37
|
Protein structure prediction by all-atom free-energy refinement. BMC STRUCTURAL BIOLOGY 2007; 7:12. [PMID: 17371594 PMCID: PMC1832197 DOI: 10.1186/1472-6807-7-12] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 08/23/2006] [Accepted: 03/19/2007] [Indexed: 11/18/2022]
Abstract
Background The reliable prediction of protein tertiary structure from the amino acid sequence remains challenging even for small proteins. We have developed an all-atom free-energy protein forcefield (PFF01) that we could use to fold several small proteins from completely extended conformations. Because the computational cost of de-novo folding studies rises steeply with system size, this approach is unsuitable for structure prediction purposes. We therefore investigate here a low-cost free-energy relaxation protocol for protein structure prediction that combines heuristic methods for model generation with all-atom free-energy relaxation in PFF01. Results We use PFF01 to rank and cluster the conformations for 32 proteins generated by ROSETTA. For 22/10 high-quality/low quality decoy sets we select near-native conformations with an average Cα root mean square deviation of 3.03 Å/6.04 Å. The protocol incorporates an inherent reliability indicator that succeeds for 78% of the decoy sets. In over 90% of these cases near-native conformations are selected from the decoy set. This success rate is rationalized by the quality of the decoys and the selectivity of the PFF01 forcefield, which ranks near-native conformations an average 3.06 standard deviations below that of the relaxed decoys (Z-score). Conclusion All-atom free-energy relaxation with PFF01 emerges as a powerful low-cost approach toward generic de-novo protein structure prediction. The approach can be applied to large all-atom decoy sets of any origin and requires no preexisting structural information to identify the native conformation. The study provides evidence that a large class of proteins may be foldable by PFF01.
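The Z-score quoted in this abstract measures how many standard deviations a conformation's energy lies from the mean energy of the decoy set. A minimal sketch follows, assuming lower energy is better and using the population standard deviation; the function name `energy_z_score` and the choice of population rather than sample deviation are assumptions, not details taken from the paper.

```python
import statistics

def energy_z_score(energy, decoy_energies):
    """Number of standard deviations by which `energy` differs from the decoy mean.

    A negative value means the conformation scores below (better than)
    the average decoy, assuming lower energy is more favorable.
    """
    mean = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population standard deviation (assumption)
    if spread == 0:
        raise ValueError("decoy energies have zero variance; Z-score is undefined")
    return (energy - mean) / spread
```

Under this sign convention, an energy "3.06 standard deviations below" the decoys corresponds to a Z-score of about -3.06.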
|
38
|
Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2007; 15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1765] [Impact Index Per Article: 103.8] [Indexed: 10/24/2022]
Abstract
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using the probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
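The general idea behind a distance-dependent statistical potential can be sketched with the generic inverse-Boltzmann form, E(r) = -kT ln(P_obs(r) / P_ref(r)), evaluated per distance bin. The code below is only that generic form: it does not implement DOPE's reference state (noninteracting atoms in a homogeneous sphere), and the function name, the `kT` default, and the `pseudocount` are hypothetical choices. The reference counts are supplied by the caller.

```python
import math

def statistical_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1e-6):
    """Per-bin pseudo-energy from observed and reference distance histograms.

    Bins where a distance is observed more often than the reference
    predicts receive negative (favorable) energies.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must have the same number of bins")
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    if obs_total == 0 or ref_total == 0:
        raise ValueError("histograms must not be empty")
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = obs / obs_total + pseudocount  # pseudocount avoids log(0) in empty bins
        p_ref = ref / ref_total + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Summing the per-bin energies over all atom pairs in a model would give a total score of the kind used to rank decoys, again under these simplifying assumptions.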
Affiliation(s)
- Min-Yi Shen
- Department of Biopharmaceutical Sciences, Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158, USA.
|
39
|
|
40
|
Dong Q, Wang X, Lin L. Novel knowledge-based mean force potential at the profile level. BMC Bioinformatics 2006; 7:324. [PMID: 16803615 PMCID: PMC1534065 DOI: 10.1186/1471-2105-7-324] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 03/19/2006] [Accepted: 06/27/2006] [Indexed: 11/10/2022] Open
Abstract
Background The development and testing of functions for the modeling of protein energetics is an important part of current research aimed at understanding protein structure and function. Knowledge-based mean force potentials are derived from statistical analyses of interacting groups in experimentally determined protein structures. Current knowledge-based mean force potentials are developed at the atom or amino acid level. The evolutionary information contained in the profiles is not investigated. Based on these observations, a class of novel knowledge-based mean force potentials at the profile level has been presented, which uses the evolutionary information of profiles for developing more powerful statistical potentials. Results The frequency profiles are directly calculated from the multiple sequence alignments output by PSI-BLAST and converted into binary profiles with a probability threshold. As a result, the protein sequences are represented as sequences of binary profiles rather than sequences of amino acids. Similar to the knowledge-based potentials at the residue level, a class of novel potentials at the profile level is introduced. We develop four types of profile-level statistical potentials including distance-dependent, contact, Φ/Ψ dihedral angle and accessible surface statistical potentials. These potentials are first evaluated by the fold assessment between the correct and incorrect models generated by comparative modeling from our own and other groups. They are then used to recognize the native structures from well-constructed decoy sets. Experimental results show that all the knowledge-based mean force potentials at the profile level outperform those at the residue level. Significant improvements are obtained for the distance-dependent and accessible surface potentials (5–6%). The contact and Φ/Ψ dihedral angle potentials show only a slight improvement (1–2%). Decoy set evaluation results show that the distance-dependent profile-level potentials even outperform other atom-level potentials. We also demonstrate that profile-level statistical potentials can improve the performance of threading. Conclusion The knowledge-based mean force potentials at the profile level can provide better discriminatory ability than those at the residue level, so they will be useful for protein structure prediction and model refinement.
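The conversion of a frequency profile into a binary profile with a probability threshold can be illustrated as below. This is a sketch under stated assumptions: the input layout (one dictionary of amino-acid frequencies per sequence position), the function name `to_binary_profile`, the strict `>` comparison, and the default threshold value are all hypothetical, since the abstract does not give them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order

def to_binary_profile(frequency_profile, threshold=0.15):
    """Turn per-position amino-acid frequencies into 20-element 0/1 tuples.

    A residue type is marked 1 at a position when its frequency exceeds
    `threshold` (hypothetical default), otherwise 0.
    """
    binary = []
    for column in frequency_profile:
        binary.append(
            tuple(1 if column.get(aa, 0.0) > threshold else 0 for aa in AMINO_ACIDS)
        )
    return binary
```

Each resulting tuple can then serve as the "letter" at that position, so a protein becomes a sequence of binary profiles rather than a sequence of amino acids.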
Affiliation(s)
- Qiwen Dong
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, PR China
| | - Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, PR China
| | - Lei Lin
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, PR China
|
41
|
Abstract
Scoring functions are widely used in the final step of model selection in protein structure prediction. This is of interest both for comparative modeling targets, where it is important to select the best model among a set of many good, "correct" ones, as well as for other (fold recognition or novel fold) targets, where the set may contain many incorrect models. A novel combination of four knowledge-based potentials recognizing different features of native protein structures is introduced and tested. The pairwise, solvation, hydrogen bond, and torsion angle potentials contain largely orthogonal information. Of these, the torsion angle potential is found to show the strongest correlation with model quality. Combining these features with a linear weighting function, it was possible to construct a robust energy function capable of discriminating native-like structures on several benchmarking sets. In a recent blind test (CAFASP-4 MQAP), the scoring function ranked consistently well and was able to reliably distinguish the correct template from an ensemble of high quality decoys in 52 of 70 cases (33 of 34 for comparative modeling). An executable version of the Victor/FRST function for Linux PCs is available for download from the URL http://protein.cribi.unipd.it/frst/.
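Combining several potentials with a linear weighting function amounts to a weighted sum of the individual terms, followed by ranking models on the total. The sketch below assumes lower scores are better; the term names, the weights, and the functions `combined_score` and `rank_models` are illustrative placeholders, not the published FRST parameters or interface.

```python
def combined_score(terms, weights):
    """Weighted sum of named energy terms for one model."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"model is missing energy terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)

def rank_models(models, weights):
    """Return model names sorted from best (lowest combined score) to worst."""
    return sorted(models, key=lambda name: combined_score(models[name], weights))

# Hypothetical weights for the four term types named in the abstract.
EXAMPLE_WEIGHTS = {"pairwise": 1.0, "solvation": 0.5, "hydrogen_bond": 0.5, "torsion": 2.0}
```

In practice the weights would be fitted on benchmark sets; here they only show how the four terms enter a single score.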
|
42
|
Skolnick J. In quest of an empirical potential for protein structure prediction. Curr Opin Struct Biol 2006; 16:166-71. [PMID: 16524716 DOI: 10.1016/j.sbi.2006.02.004] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Received: 12/01/2005] [Revised: 02/10/2006] [Accepted: 02/23/2006] [Indexed: 11/19/2022]
Abstract
Key to successful protein structure prediction is a potential that recognizes the native state from misfolded structures. Recent advances in empirical potentials based on known protein structures include improved reference states for assessing random interactions, sidechain-orientation-dependent pair potentials, potentials for describing secondary or supersecondary structural preferences and, most importantly, optimization protocols that sculpt the energy landscape to enhance the correlation between native-like features and the energy. Improved clustering algorithms that select native-like structures on the basis of cluster density also resulted in greater prediction accuracy. For template-based modeling, these advances allowed improvement in predicted structures relative to their initial template alignments over a wide range of target-template homology. This represents significant progress and suggests applications to proteome-scale structure prediction.
Affiliation(s)
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington Street, Buffalo, NY 14203, USA.
|
43
|
Fogolari F, Tosatto SCE, Colombo G. A decoy set for the thermostable subdomain from chicken villin headpiece, comparison of different free energy estimators. BMC Bioinformatics 2005; 6:301. [PMID: 16354298 PMCID: PMC1351271 DOI: 10.1186/1471-2105-6-301] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 06/16/2005] [Accepted: 12/14/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Estimators of free energies are routinely used to judge the quality of protein structural models. As these estimators still present inaccuracies, they are frequently evaluated by discriminating native or native-like conformations from large ensembles of so-called decoy structures. RESULTS A decoy set is obtained from snapshots taken from 5 long (100 ns) molecular dynamics (MD) simulations of the thermostable subdomain from chicken villin headpiece. An evaluation of the energy of the decoys is given using: i) a residue based contact potential supplemented by a term for the quality of dihedral angles; ii) a recently introduced combination of four statistical scoring functions for model quality estimation (FRST); iii) molecular mechanics with solvation energy estimated either according to the generalized Born surface area (GBSA) or iv) the Poisson-Boltzmann surface area (PBSA) method. CONCLUSION The decoy set presented here has the following features which make it attractive for testing energy scoring functions: 1) it covers a broad range of RMSD values (from less than 2.0 Å to more than 12 Å); 2) it has been obtained from molecular dynamics trajectories, starting from different non-native-like conformations which have diverse behaviour, with secondary structure elements correctly or incorrectly formed, and in one case folding to a native-like structure. This allows not only for scoring of static structures, but also for studying, using free energy estimators, the kinetics of folding; 3) all structures have been obtained from accurate MD simulations in explicit solvent and after molecular mechanics (MM) energy minimization using an implicit solvent method. The quality of the covalent structure therefore does not suffer from steric or covalent problems. The statistical and physical effective energy functions tested on the set behave differently when native simulation snapshots are included or not in the set and when averaging over the trajectory is performed.
Affiliation(s)
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
| | - Silvio CE Tosatto
- Dipartimento di Biologia and CRIBI Biotech Centre, Università di Padova, Viale G. Colombo 3, 35131 Padova, Italy
| | - Giorgio Colombo
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Via Mario Bianco 9, 20131 Milano, Italy
|
44
|
Hung LH, Ngan SC, Liu T, Samudrala R. PROTINFO: new algorithms for enhanced protein structure predictions. Nucleic Acids Res 2005; 33:W77-80. [PMID: 15980581 PMCID: PMC1160164 DOI: 10.1093/nar/gki403] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 11/19/2022] Open
Abstract
We describe new algorithms and modules for protein structure prediction available as part of the PROTINFO web server. The modules, comparative and de novo modelling, have significantly improved back-end algorithms that were rigorously evaluated at the sixth meeting on the Critical Assessment of Protein Structure Prediction methods. We were one of four server groups invited to make an oral presentation (only the best performing groups are asked to do so). These two modules allow a user to submit a protein sequence and return atomic coordinates representing the tertiary structure of that protein. The PROTINFO server is available at .
Affiliation(s)
| | | | | | - Ram Samudrala
- To whom correspondence should be addressed. Tel: +1 206 732 6122; Fax: +1 206 732 6055;
|
45
|
|