101
|
Dutagaci B, Wittayanarakul K, Mori T, Feig M. Discrimination of Native-like States of Membrane Proteins with Implicit Membrane-based Scoring Functions. J Chem Theory Comput 2017; 13:3049-3059. [PMID: 28475346 DOI: 10.1021/acs.jctc.7b00254] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/17/2022]
Abstract
A scoring protocol based on implicit membrane-based scoring functions and a new protocol for optimizing the positioning of proteins inside the membrane was evaluated for its capacity to discriminate native-like states from misfolded decoys. A decoy set previously established by the Baker lab (Proteins: Struct., Funct., Genet. 2006, 62, 1010-1025) was used along with a second set that was generated to cover higher resolution models. The Implicit Membrane Model 1 (IMM1), IMM1 model with CHARMM 36 parameters (IMM1-p36), generalized Born with simple switching (GBSW), and heterogeneous dielectric generalized Born versions 2 (HDGBv2) and 3 (HDGBv3) were tested along with the new HDGB van der Waals (HDGBvdW) model that adds implicit van der Waals contributions to the solvation free energy. For comparison, scores were also calculated with the distance-scaled finite ideal-gas reference (DFIRE) scoring function. Z-scores for native state discrimination, energy vs root-mean-square deviation (RMSD) correlations, and the ability to select the most native-like structures as top-scoring decoys were evaluated to assess the performance of the scoring functions. Ranking of the decoys in the Baker set that were relatively far from the native state was challenging and dominated largely by packing interactions that were captured best by DFIRE with less benefit of the implicit membrane-based models. Accounting for the membrane environment was much more important in the second decoy set where especially the HDGB-based scoring functions performed very well in ranking decoys and providing significant correlations between scores and RMSD, which shows promise for improving membrane protein structure prediction and refinement applications. The new membrane structure scoring protocol was implemented in the MEMScore web server ( http://feiglab.org/memscore ).
Affiliation(s)
- Bercem Dutagaci
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States
- Kitiyaporn Wittayanarakul
  - Department of Natural Resource and Environmental Management, Faculty of Applied Science and Engineering, Khon Kaen University, Nong Khai Campus, Nong Khai 43000, Thailand
- Takaharu Mori
  - Theoretical Molecular Science Laboratory, RIKEN, Wako-shi, Japan
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States
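The Dutagaci et al. abstract above names two of its evaluation metrics: Z-scores for native state discrimination and energy vs RMSD correlations. The sketch below shows one common way such metrics are computed. It is an illustration only: the exact definitions used in the paper are not given in the abstract, and the function names and toy numbers are hypothetical.

```python
import math
import statistics


def native_z_score(native_score, decoy_scores):
    """Z-score of the native score relative to the decoy score distribution.

    Assumption: lower scores are better, so a strongly negative Z-score
    means the native state is well separated from the decoys.
    """
    mean = statistics.fmean(decoy_scores)
    std = statistics.pstdev(decoy_scores)  # population standard deviation
    return (native_score - mean) / std


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical toy data: five decoys with scores and RMSD values (in Å).
decoy_scores = [-90.0, -85.0, -80.0, -75.0, -70.0]
decoy_rmsd = [1.0, 2.0, 3.0, 4.0, 5.0]
native_score = -100.0

z = native_z_score(native_score, decoy_scores)
r = pearson(decoy_scores, decoy_rmsd)
print(f"native Z-score: {z:.3f}, score-RMSD correlation: {r:.3f}")
```

In this toy set the scores rise linearly with RMSD, so the correlation is exactly 1; real decoy sets are far noisier.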
|
102
|
Feig M. Computational protein structure refinement: Almost there, yet still so far to go. Wiley Interdisciplinary Reviews: Computational Molecular Science 2017; 7:e1307. [PMID: 30613211 PMCID: PMC6319934 DOI: 10.1002/wcms.1307] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 12/21/2022]
Abstract
Protein structures are essential in modern biology, yet experimental methods are far from being able to catch up with the rapid increase in available genomic data. Computational protein structure prediction methods aim to fill the gap, while the role of protein structure refinement is to take approximate initial template-based models and bring them closer to the true native structure. Current methods for computational structure refinement rely on molecular dynamics simulations, related sampling methods, or iterative structure optimization protocols. The best methods are able to achieve moderate degrees of refinement, but consistent refinement that can reach near-experimental accuracy remains elusive. Key issues revolve around the accuracy of the energy function, the inability to reliably rank multiple models, and the use of restraints that keep sampling close to the native state but also limit the degree of possible refinement. A different aspect is the question of what exactly the target of high-resolution refinement should be, as experimental structures are affected by experimental conditions and different biological questions require varying levels of accuracy. While improvement of the global protein structure is a difficult problem, high-resolution refinement methods that improve local structural quality, such as favorable stereochemistry and the avoidance of atomic clashes, are much more successful.
Affiliation(s)
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, 603 Wilson Rd., Room 218 BCH, East Lansing, MI, USA; 517-432-7439
|
103
|
Olechnovič K, Venclovas Č. VoroMQA: Assessment of protein structure quality using interatomic contact areas. Proteins 2017; 85:1131-1145. [DOI: 10.1002/prot.25278] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Received: 11/18/2016] [Revised: 01/13/2017] [Accepted: 02/21/2017] [Indexed: 12/14/2022]
Affiliation(s)
- Kliment Olechnovič
  - Institute of Biotechnology, Vilnius University, Saulėtekio 7, LT-10257 Vilnius, Lithuania
  - Faculty of Mathematics and Informatics, Vilnius University, Naugarduko 24, LT-03225 Vilnius, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Vilnius University, Saulėtekio 7, LT-10257 Vilnius, Lithuania
|
104
|
Khoury GA, Smadbeck J, Kieslich CA, Koskosidis AJ, Guzman YA, Tamamis P, Floudas CA. Princeton_TIGRESS 2.0: High refinement consistency and net gains through support vector machines and molecular dynamics in double-blind predictions during the CASP11 experiment. Proteins 2017; 85:1078-1098. [PMID: 28241391 DOI: 10.1002/prot.25274] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 07/27/2016] [Revised: 02/01/2017] [Accepted: 02/14/2017] [Indexed: 12/28/2022]
Abstract
Protein structure refinement is the challenging problem of operating on any protein structure prediction to improve its accuracy with respect to the native structure in a blind fashion. Although many approaches have been developed and tested during the last four CASP experiments, a majority of the methods continue to degrade models rather than improve them. Princeton_TIGRESS (Khoury et al., Proteins 2014;82:794-814) was developed previously and utilizes separate sampling and selection stages involving Monte Carlo and molecular dynamics simulations and classification using an SVM predictor. The initial implementation was shown to consistently refine protein structures 76% of the time in our own internal benchmarking on CASP 7-10 targets. In this work, we improved the sampling and selection stages and tested the method in blind predictions during CASP11. We added a decomposition of physics-based and hybrid energy functions, as well as a coordinate-free representation of the protein structure through distance-binning Cα-Cα distances to capture fine-grained movements. We performed parameter estimation to optimize the adjustable SVM parameters to maximize precision while balancing sensitivity and specificity across all cross-validated data sets, finding enrichment in our ability to select models from the populations of similar decoys generated for targets in CASPs 7-10. The MD stage was enhanced such that larger structures could be further refined. Among refinement methods that are currently implemented as web-servers, Princeton_TIGRESS 2.0 demonstrated the most consistent and most substantial net refinement in blind predictions during CASP11. The enhanced refinement protocol Princeton_TIGRESS 2.0 is freely available as a web server at http://atlas.engr.tamu.edu/refinement/. Proteins 2017; 85:1078-1098. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- George A Khoury
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey
- James Smadbeck
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey
- Chris A Kieslich
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas
- Alexandra J Koskosidis
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas
- Yannis A Guzman
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas
- Phanourios Tamamis
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas
- Christodoulos A Floudas
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas
|
105
|
Knowledge-based entropies improve the identification of native protein structures. Proc Natl Acad Sci U S A 2017; 114:2928-2933. [PMID: 28265078 DOI: 10.1073/pnas.1613331114] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/19/2022] Open
Abstract
Evaluating protein structures requires reliable free energies with good estimates of both potential energies and entropies. Although there are many demonstrated successes from using knowledge-based potential energies, computing entropies of proteins has lagged far behind. Here we take an entirely different approach and evaluate knowledge-based conformational entropies of proteins based on the observed frequencies of contact changes between amino acids in a set of 167 diverse proteins, each of which has two alternative structures. The results show that charged and polar interactions break more often than hydrophobic pairs. This pattern correlates strongly with the average solvent exposure of amino acids in globular proteins, as well as with polarity indices and the sizes of the amino acids. Knowledge-based entropies are derived by using the inverse Boltzmann relationship, in a manner analogous to the way that knowledge-based potentials have been extracted. Including these new knowledge-based entropies almost doubles the performance of knowledge-based potentials in selecting the native protein structures from decoy sets. Beyond the overall energy-entropy compensation, a similar compensation is seen for individual pairs of interacting amino acids. The entropies in this report have immediate applications for 3D structure prediction, protein model assessment, and protein engineering and design.
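The abstract above states that the entropies are derived with the inverse Boltzmann relationship, as is done for knowledge-based potentials. The sketch below illustrates only that generic relationship, score = -kT ln(observed / expected), applied to hypothetical contact counts. It is not the paper's entropy formula; the pair names, counts, and reference state are invented for illustration.

```python
import math


def inverse_boltzmann(observed, expected, kT=1.0):
    """Generic inverse Boltzmann score: -kT * ln(observed / expected).

    Events seen more often than the reference state predicts get a
    negative (favorable) score; rarer events get a positive score.
    """
    return -kT * math.log(observed / expected)


# Hypothetical contact counts and a hypothetical uniform reference state.
observed_counts = {"LEU-LEU": 400, "LYS-GLU": 100}
expected_counts = {"LEU-LEU": 200, "LYS-GLU": 200}

scores = {
    pair: inverse_boltzmann(observed_counts[pair], expected_counts[pair])
    for pair in observed_counts
}

for pair, score in scores.items():
    print(f"{pair}: {score:+.3f} kT")
```

With these toy counts the over-represented pair scores -ln 2 and the under-represented pair scores +ln 2, in units of kT.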
|
106
|
Pfeiffenberger E, Chaleil RA, Moal IH, Bates PA. A machine learning approach for ranking clusters of docked protein-protein complexes by pairwise cluster comparison. Proteins 2017; 85:528-543. [PMID: 27935158 PMCID: PMC5396268 DOI: 10.1002/prot.25218] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/25/2016] [Revised: 11/14/2016] [Accepted: 11/21/2016] [Indexed: 01/28/2023]
Abstract
Reliable identification of near-native poses of docked protein-protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein-protein interactions is challenging for traditional biophysical or knowledge based potentials and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach of cluster ranking based not only on one molecular descriptor (e.g., an energy function) but also employing a large number of descriptors that are integrated in a machine learning model, whereby, an extremely randomized tree classifier based on 109 molecular descriptors is trained. The protocol is based on first locally enriching clusters with additional poses, the clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near-native from incorrect clusters. The results show that our approach is able to identify clusters containing near-native protein-protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near native from incorrect clusters and how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528-543. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Iain H. Moal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul A. Bates
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, London NW1 1AT, UK
|
107
|
Gao P, Wang S, Lv J, Wang Y, Ma Y. A database assisted protein structure prediction method via a swarm intelligence algorithm. RSC Adv 2017. [DOI: 10.1039/c7ra07461a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022] Open
Abstract
A swarm-intelligence-based protein structure prediction method holds promise for narrowing the sequence-structure gap of proteins.
Affiliation(s)
- Pengyue Gao
  - State Key Laboratory of Superhard Materials, Jilin University, Changchun 130012, China
- Sheng Wang
  - State Key Laboratory of Superhard Materials, Jilin University, Changchun 130012, China
- Jian Lv
  - College of Materials Science and Engineering, Jilin University, Changchun 130012, China
- Yanchao Wang
  - State Key Laboratory of Superhard Materials, Jilin University, Changchun 130012, China
- Yanming Ma
  - State Key Laboratory of Superhard Materials, Jilin University, Changchun 130012, China
|
108
|
Li H, Lyu Q, Cheng J. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning. J Proteomics Bioinform 2016; 9:306-313. [PMID: 29081613 PMCID: PMC5658031 DOI: 10.4172/jpb.1000419] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/04/2022]
Abstract
Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
Affiliation(s)
- Haiou Li
  - Department of Computer Science and Technology, Soochow University, Suzhou, 215006, China
  - Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Qiang Lyu
  - Department of Computer Science and Technology, Soochow University, Suzhou, 215006, China
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
|
109
|
Cao R, Bhattacharya D, Hou J, Cheng J. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics 2016; 17:495. [PMID: 27919220 PMCID: PMC5139030 DOI: 10.1186/s12859-016-1405-y] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Received: 08/11/2016] [Accepted: 12/01/2016] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Protein quality assessment (QA) useful for ranking and selecting protein models has long been viewed as one of the major challenges for protein tertiary structure prediction. Especially, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. RESULTS We introduce a novel single-model quality assessment method DeepQA based on deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physio-chemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that deep belief network has better performance compared to Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves the state-of-the-art performance on CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. CONCLUSION DeepQA is a useful deep learning tool for protein single model quality assessment and protein structure prediction. The source code, executable, document and training/test datasets of DeepQA for Linux is freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .
Affiliation(s)
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, 98447, USA
- Debswapna Bhattacharya
  - Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS, 67260, USA
- Jie Hou
  - Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
  - Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
|
110
|
Jing X, Wang K, Lu R, Dong Q. Sorting protein decoys by machine-learning-to-rank. Sci Rep 2016; 6:31571. [PMID: 27530967 PMCID: PMC4987638 DOI: 10.1038/srep31571] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/21/2016] [Accepted: 07/26/2016] [Indexed: 11/18/2022] Open
Abstract
Much progress has been made in protein structure prediction during the last few decades. As the predicted models can span a broad range of accuracy, the accuracy of quality estimation becomes one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods can be roughly divided into three categories: single-model methods, clustering-based methods and quasi single-model methods. In this study, we first develop a single-model method, MQAPRank, based on the learning-to-rank algorithm, and then implement a quasi single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. The five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms the other methods whose outputs are taken as its features, and the quasi single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, the Quasi-MQAPRank method achieves considerable performance on the CASP11 Best150 dataset.
Affiliation(s)
- Xiaoyang Jing
  - School of Computer Science, Fudan University, Shanghai 200433, People’s Republic of China
- Kai Wang
  - College of Animal Science and Technology, Jilin Agricultural University, Changchun 130118, People’s Republic of China
- Ruqian Lu
  - School of Computer Science, Fudan University, Shanghai 200433, People’s Republic of China
- Qiwen Dong
  - Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, People’s Republic of China
|
111
|
Topham CM, Barbe S, André I. An Atomistic Statistically Effective Energy Function for Computational Protein Design. J Chem Theory Comput 2016; 12:4146-68. [PMID: 27341125 DOI: 10.1021/acs.jctc.6b00090] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
Abstract
Shortcomings in the definition of effective free-energy surfaces of proteins are recognized to be a major contributory factor responsible for the low success rates of existing automated methods for computational protein design (CPD). The formulation of an atomistic statistically effective energy function (SEEF) suitable for a wide range of CPD applications and its derivation from structural data extracted from protein domains and protein-ligand complexes are described here. The proposed energy function comprises nonlocal atom-based and local residue-based SEEFs, which are coupled using a novel atom connectivity number factor to scale short-range, pairwise, nonbonded atomic interaction energies and a surface-area-dependent cavity energy term. This energy function was used to derive additional SEEFs describing the unfolded-state ensemble of any given residue sequence based on computed average energies for partially or fully solvent-exposed fragments in regions of irregular structure in native proteins. Relative thermal stabilities of 97 T4 bacteriophage lysozyme mutants were predicted from calculated energy differences for folded and unfolded states with an average unsigned error (AUE) of 0.84 kcal mol⁻¹ when compared to experiment. To demonstrate the utility of the energy function for CPD, further validation was carried out in tests of its capacity to recover cognate protein sequences and to discriminate native and near-native protein folds, loop conformers, and small-molecule ligand binding poses from non-native benchmark decoys. Experimental ligand binding free energies for a diverse set of 80 protein complexes could be predicted with an AUE of 2.4 kcal mol⁻¹ using an additional energy term to account for the loss in ligand configurational entropy upon binding. The atomistic SEEF is expected to improve the accuracy of residue-based coarse-grained SEEFs currently used in CPD and to extend the range of applications of extant atom-based protein statistical potentials.
Affiliation(s)
- Christopher M Topham
  - Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
  - CNRS, UMR5504, F-31400 Toulouse, France
  - INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- Sophie Barbe
  - Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
  - CNRS, UMR5504, F-31400 Toulouse, France
  - INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- Isabelle André
  - Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
  - CNRS, UMR5504, F-31400 Toulouse, France
  - INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
|
112
|
Quan L, Lv Q, Zhang Y. STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics 2016; 32:2936-46. [PMID: 27318206 DOI: 10.1093/bioinformatics/btw361] [Citation(s) in RCA: 236] [Impact Index Per Article: 29.5] [Received: 11/30/2015] [Accepted: 06/06/2016] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Mutations in the human genome arise mainly through single nucleotide polymorphisms, some of which can affect the stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability, but most require features from experimental structures. Given the fast progress in protein structure prediction, this work explores the possibility of improving mutation-induced stability change prediction using low-resolution structure modeling. RESULTS We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error of 1.2 kcal/mol in the mutation-based cross-validations. The PCC decreases when training and test mutations are separated so that they come from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. These data demonstrate the feasibility of using low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. AVAILABILITY AND IMPLEMENTATION http://zhanglab.ccmb.med.umich.edu/STRUM/ CONTACT qiang@suda.edu.cn and zhng@umich.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lijun Quan
  - School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  - Department of Computational Medicine and Bioinformatics, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Qiang Lv
  - School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
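The STRUM abstract above contrasts mutation-based cross-validation with a split in which training and test mutations come from different proteins. The sketch below illustrates that difference in fold assignment only. It is a simplified assumption: it groups by protein identifier and does not perform the homology filtering the abstract refers to, and the function names and toy protein labels are hypothetical.

```python
from collections import defaultdict


def mutation_level_folds(n_mutations, k=5):
    """Assign each mutation to a fold without regard to its protein."""
    return [i % k for i in range(n_mutations)]


def protein_level_folds(protein_ids, k=5):
    """Assign folds so that all mutations of one protein share a fold."""
    fold_of_protein = {}
    for pid in protein_ids:
        if pid not in fold_of_protein:
            fold_of_protein[pid] = len(fold_of_protein) % k
    return [fold_of_protein[pid] for pid in protein_ids]


def folds_per_protein(protein_ids, folds):
    """Map each protein to the set of folds its mutations landed in."""
    seen = defaultdict(set)
    for pid, fold in zip(protein_ids, folds):
        seen[pid].add(fold)
    return dict(seen)


# Hypothetical toy data: six mutations from three proteins, two folds.
protein_ids = ["P1", "P1", "P1", "P2", "P2", "P3"]
by_mutation = mutation_level_folds(len(protein_ids), k=2)
by_protein = protein_level_folds(protein_ids, k=2)

print(folds_per_protein(protein_ids, by_mutation))
print(folds_per_protein(protein_ids, by_protein))
```

With the mutation-level split, mutations of the same protein appear in both folds, so a model can be tested on a protein it has already seen; the protein-level split prevents that.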
|
113
|
Zheng Z, Wang T, Li P, Merz KM. KECSA-Movable Type Implicit Solvation Model (KMTISM). J Chem Theory Comput 2016; 11:667-82. [PMID: 25691832 PMCID: PMC4325602 DOI: 10.1021/ct5007828] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 08/27/2014] [Indexed: 11/30/2022]
Abstract
Computation of the solvation free energy for chemical and biological processes has long been of significant interest. The key challenges to effective solvation modeling center on the choice of potential function and configurational sampling. Herein, an energy sampling approach termed the “Movable Type” (MT) method, and a statistical energy function for solvation modeling, “Knowledge-based and Empirical Combined Scoring Algorithm” (KECSA) are developed and utilized to create an implicit solvation model: KECSA-Movable Type Implicit Solvation Model (KMTISM) suitable for the study of chemical and biological systems. KMTISM is an implicit solvation model, but the MT method performs energy sampling at the atom pairwise level. For a specific molecular system, the MT method collects energies from prebuilt databases for the requisite atom pairs at all relevant distance ranges, which by its very construction encodes all possible molecular configurations simultaneously. Unlike traditional statistical energy functions, KECSA converts structural statistical information into categorized atom pairwise interaction energies as a function of the radial distance instead of a mean force energy function. Within the implicit solvent model approximation, aqueous solvation free energies are then obtained from the NVT ensemble partition function generated by the MT method. Validation is performed against several subsets selected from the Minnesota Solvation Database v2012. Results are compared with several solvation free energy calculation methods, including a one-to-one comparison against two commonly used classical implicit solvation models: MM-GBSA and MM-PBSA. Comparison against a quantum mechanics based polarizable continuum model is also discussed (Cramer and Truhlar’s Solvation Model 12).
Affiliation(s)
- Zheng Zheng
  - Institute for Cyber Enabled Research, Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824-1322, United States
|
114
|
Discriminate protein decoys from native by using a scoring function based on ubiquitous Phi and Psi angles computed for all atom. J Theor Biol 2016; 398:112-21. [DOI: 10.1016/j.jtbi.2016.03.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/26/2015] [Revised: 02/26/2016] [Accepted: 03/17/2016] [Indexed: 12/20/2022]
|
115
|
Urquiza-Carvalho GA, Fragoso WD, Rocha GB. Assessment of semiempirical enthalpy of formation in solution as an effective energy function to discriminate native-like structures in protein decoy sets. J Comput Chem 2016; 37:1962-72. [DOI: 10.1002/jcc.24415] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/22/2016] [Revised: 04/29/2016] [Accepted: 05/11/2016] [Indexed: 11/09/2022]
Affiliation(s)
- Gabriel Aires Urquiza-Carvalho
- Departamento de Química, CCEN, Universidade Federal da Paraíba, João Pessoa/PB, Caixa Postal 5093, CEP 58051-970, Brazil
- Wallace Duarte Fragoso
- Departamento de Química, CCEN, Universidade Federal da Paraíba, João Pessoa/PB, Caixa Postal 5093, CEP 58051-970, Brazil
- Gerd Bruno Rocha
- Departamento de Química, CCEN, Universidade Federal da Paraíba, João Pessoa/PB, Caixa Postal 5093, CEP 58051-970, Brazil
|
116
|
Chen L, He J. A distance- and orientation-dependent energy function of amino acid key blocks. Biopolymers 2016; 101:681-92. [PMID: 24222511 DOI: 10.1002/bip.22440] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/17/2013] [Revised: 10/31/2013] [Accepted: 11/01/2013] [Indexed: 01/03/2023]
Abstract
Blocks are the selected portions of amino acids. They have been used effectively to represent amino acids in distinguishing the native conformation from the decoys. Although many statistical energy functions exist, most of them rely on the distances between two or more amino acids. In this study, the authors have developed a pairwise energy function "DOKB" that is both distance and orientation dependent, and it is based on the key blocks that bias the distal ends of side chains. The results suggest that both the distance and the orientation are needed to distinguish the fine details of the packing geometry. DOKB appears to perform well in recognizing native conformations when compared with six other energy functions. Highly packed clusters play important roles in stabilizing the structure. The investigation about the highly packed clusters at the residue level suggests that certain residue pairs in a low-energy region have lower probability to appear in the highly packed clusters than in the entire protein. The cluster energy term appears to significantly improve the recognition of the native conformations in ig_structal decoy set, in which more highly packed clusters are contained than in other decoy sets.
Affiliation(s)
- Lin Chen
- Department of Computer Science, Old Dominion University, Norfolk, Virginia
|
117
|
Li J, Cheng J. A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling. Sci Rep 2016; 6:25687. [PMID: 27161489 PMCID: PMC4861977 DOI: 10.1038/srep25687] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/05/2015] [Accepted: 04/21/2016] [Indexed: 12/04/2022]
Abstract
Generating tertiary structural models for a target protein from the known structure of its homologous template proteins and their pairwise sequence alignment is a key step in protein comparative modeling. Here, we developed a new stochastic point cloud sampling method, called MTMG, for multi-template protein model generation. The method first superposes the backbones of template structures, and the Cα atoms of the superposed templates form a point cloud for each position of a target protein, which are represented by a three-dimensional multivariate normal distribution. MTMG stochastically resamples the positions for Cα atoms of the residues whose positions are uncertain from the distribution, and accepts or rejects new position according to a simulated annealing protocol, which effectively removes atomic clashes commonly encountered in multi-template comparative modeling. We benchmarked MTMG on 1,033 sequence alignments generated for CASP9, CASP10 and CASP11 targets, respectively. Using multiple templates with MTMG improves the GDT-TS score and TM-score of structural models by 2.96–6.37% and 2.42–5.19% on the three datasets over using single templates. MTMG’s performance was comparable to Modeller in terms of GDT-TS score, TM-score, and GDT-HA score, while the average RMSD was improved by a new sampling approach. The MTMG software is freely available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/mtmg.html.
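The resample-then-accept-or-reject loop described in this abstract can be illustrated with a minimal Python sketch. It is not the MTMG implementation: the diagonal-covariance Gaussian, the `clash_energy` penalty with its 3.0 Å cutoff, and the geometric cooling schedule are all assumptions chosen only to show how a Metropolis criterion under simulated annealing removes clashes.

```python
import math
import random


def sample_position(mean, std, rng):
    """Draw a 3D point from a Gaussian with diagonal covariance (assumption)."""
    return tuple(rng.gauss(m, s) for m, s in zip(mean, std))


def clash_energy(coords, min_dist=3.0):
    """Hypothetical penalty: number of point pairs closer than min_dist."""
    penalty = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < min_dist:
                penalty += 1
    return penalty


def metropolis_accept(old_e, new_e, temperature, rng):
    """Standard Metropolis criterion: always accept downhill moves."""
    if new_e <= old_e:
        return True
    return rng.random() < math.exp(-(new_e - old_e) / temperature)


def anneal(coords, means, stds, uncertain, steps=2000,
           t_start=1.0, t_end=0.01, seed=0):
    """Resample the uncertain positions while the temperature is lowered."""
    rng = random.Random(seed)
    coords = list(coords)
    energy = clash_energy(coords)
    for step in range(steps):
        # Geometric cooling from t_start to t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.choice(uncertain)
        trial = list(coords)
        trial[i] = sample_position(means[i], stds[i], rng)
        trial_e = clash_energy(trial)
        if metropolis_accept(energy, trial_e, t, rng):
            coords, energy = trial, trial_e
    return coords, energy
```

Calling `anneal` with a list of Cα coordinates, per-position means and standard deviations, and the indices of the uncertain residues returns the final coordinates together with the remaining clash count.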
Affiliation(s)
- Jilong Li
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA; Informatics Institute, University of Missouri, Columbia, MO 65211, USA
|
118
|
Bhattacharya D, Nowotny J, Cao R, Cheng J. 3Drefine: an interactive web server for efficient protein structure refinement. Nucleic Acids Res 2016; 44:W406-9. [PMID: 27131371 PMCID: PMC4987902 DOI: 10.1093/nar/gkw336] [Citation(s) in RCA: 300] [Impact Index Per Article: 37.5] [Received: 01/30/2016] [Accepted: 04/15/2016] [Indexed: 11/14/2022]
Abstract
3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of hydrogen bonding network combined with atomic-level energy minimization on the optimized model using a composite physics and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated on blind CASP experiments as well as on large-scale and diverse benchmark datasets and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through a text or file input submission, email notification, provided example submission and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet that is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/.
Affiliation(s)
- Jackson Nowotny
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Renzhi Cao
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA; Informatics Institute, University of Missouri, Columbia, MO 65211, USA; C. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
|
119
|
Protein single-model quality assessment by feature-based probability density functions. Sci Rep 2016; 6:23990. [PMID: 27041353 PMCID: PMC4819172 DOI: 10.1038/srep23990] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 12/03/2015] [Accepted: 03/17/2016] [Indexed: 11/11/2022]
Abstract
Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method–Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob has been blindly tested on the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.
|
120
|
Hoque MT, Yang Y, Mishra A, Zhou Y. sDFIRE: Sequence-specific statistical energy function for protein structure prediction by decoy selections. J Comput Chem 2016; 37:1119-24. [DOI: 10.1002/jcc.24298] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/17/2015] [Revised: 12/06/2015] [Accepted: 12/13/2015] [Indexed: 12/15/2022]
Affiliation(s)
- Md Tamjidul Hoque
- Computer Science, University of New Orleans, New Orleans, Louisiana 70148
- Yuedong Yang
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Queensland 4222, Australia
- Avdesh Mishra
- Computer Science, University of New Orleans, New Orleans, Louisiana 70148
- Yaoqi Zhou
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Queensland 4222, Australia
|
121
|
|
122
|
Cao R, Bhattacharya D, Adhikari B, Li J, Cheng J. Large-scale model quality assessment for improving protein tertiary structure prediction. Bioinformatics 2015; 31:i116-23. [PMID: 26072473 PMCID: PMC4553833 DOI: 10.1093/bioinformatics/btv235] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022]
Abstract
Motivation: Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models, which cannot consistently select relatively better models and rank a large number of models well. Results: Here, we develop a novel large-scale model QA method in conjunction with model clustering to rank and select protein structural models. It unprecedentedly applied 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiment demonstrates that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains and second according to the total scores of the best of the five models predicted for these domains. MULTICOM’s outstanding performance in the extremely competitive 2014 CASP11 experiment proves that our large-scale QA approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling. Availability and implementation: The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/. Contact: chengji@missouri.edu
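The idea of turning many QA scores into one consensus ranking can be sketched as follows. The abstract does not state how the 14 rankings were combined, so mean-rank aggregation, the function names, and the toy scores here are assumptions for illustration only.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Map model name -> rank (1 = best), assuming higher score is better."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: rank for rank, model in enumerate(ordered, start=1)}


def consensus_ranking(score_tables):
    """Order models by their mean rank across all QA methods."""
    per_method = [ranks_from_scores(table) for table in score_tables]
    models = list(per_method[0])
    avg_rank = {m: mean(r[m] for r in per_method) for m in models}
    return sorted(models, key=lambda m: avg_rank[m])


# Toy scores from three hypothetical QA methods for three models.
qa_scores = [
    {"model_a": 0.71, "model_b": 0.65, "model_c": 0.40},
    {"model_a": 0.60, "model_b": 0.68, "model_c": 0.35},
    {"model_a": 0.80, "model_b": 0.55, "model_c": 0.58},
]
print(consensus_ranking(qa_scores))  # ['model_a', 'model_b', 'model_c']
```

Averaging ranks rather than raw scores sidesteps the fact that different QA methods report values on different scales, which is one plausible reason to prefer it in a sketch like this.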
Affiliation(s)
- Renzhi Cao
- Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
- Debswapna Bhattacharya
- Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
- Badri Adhikari
- Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
- Jilong Li
- Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
- Jianlin Cheng
- Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
|
123
|
Berjanskii M, Arndt D, Liang Y, Wishart DS. A robust algorithm for optimizing protein structures with NMR chemical shifts. J Biomol NMR 2015; 63:255-264. [PMID: 26345175 DOI: 10.1007/s10858-015-9982-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/01/2015] [Accepted: 08/27/2015] [Indexed: 06/05/2023]
Abstract
Over the past decade, a number of methods have been developed to determine the approximate structure of proteins using minimal NMR experimental information such as chemical shifts alone, sparse NOEs alone or a combination of comparative modeling data and chemical shifts. However, there have been relatively few methods that allow these approximate models to be substantively refined or improved using the available NMR chemical shift data. Here, we present a novel method, called Chemical Shift driven Genetic Algorithm for biased Molecular Dynamics (CS-GAMDy), for the robust optimization of protein structures using experimental NMR chemical shifts. The method incorporates knowledge-based scoring functions and structural information derived from NMR chemical shifts via a unique combination of multi-objective MD biasing, a genetic algorithm, and the widely used XPLOR molecular modelling language. Using this approach, we demonstrate that CS-GAMDy is able to refine and/or fold models that are as much as 10 Å (RMSD) away from the correct structure using only NMR chemical shift data. CS-GAMDy is also able to refine a wide range of approximate or mildly erroneous protein structures to more closely match the known/correct structure and the known/correct chemical shifts. We believe CS-GAMDy will allow protein models generated by sparse restraint or chemical-shift-only methods to achieve sufficiently high quality to be considered fully refined and "PDB worthy". The CS-GAMDy algorithm is explained in detail and its performance is compared over a range of refinement scenarios with several commonly used protein structure refinement protocols. The program has been designed to be easily installed and easily used and is available at http://www.gamdy.ca.
Affiliation(s)
- Mark Berjanskii
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- David Arndt
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- Yongjie Liang
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada
- National Research Council, National Institute for Nanotechnology (NINT), Edmonton, AB, T6G 2M9, Canada
|
124
|
A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment in CASP11. BMC Bioinformatics 2015; 16:337. [PMID: 26493701 PMCID: PMC4619059 DOI: 10.1186/s12859-015-0775-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/29/2015] [Accepted: 10/14/2015] [Indexed: 11/10/2022]
Abstract
Background: With more and more protein sequences produced in the genomic era, predicting protein structures from sequences becomes very important for elucidating the molecular details and functions of these proteins for biomedical research. Traditional template-based protein structure prediction methods tend to focus on identifying the best templates, generating the best alignments, and applying the best energy function to rank models, which often cannot achieve the best performance because of the difficulty of obtaining best templates, alignments, and models. Methods: We developed a large-scale conformation sampling and evaluation method and its servers to improve the reliability and robustness of protein structure prediction. In the first step, our method used a variety of alignment methods to sample relevant and complementary templates and to generate alternative and diverse target-template alignments, used a template and alignment combination protocol to combine alignments, and used template-based and template-free modeling methods to generate a pool of conformations for a target protein. In the second step, it used a large number of protein model quality assessment methods to evaluate and rank the models in the protein model pool, in conjunction with an exception handling strategy to deal with any additional failure in model ranking. Results: The method was implemented as two protein structure prediction servers: MULTICOM-CONSTRUCT and MULTICOM-CLUSTER that participated in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) in 2014. The two servers were ranked among the best 10 server predictors. Conclusions: The good performance of our servers in CASP11 demonstrates the effectiveness and robustness of the large-scale conformation sampling and evaluation. The MULTICOM server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/.
|
125
|
Deng H, Jia Y, Zhang Y. 3DRobot: automated generation of diverse and well-packed protein structure decoys. Bioinformatics 2015; 32:378-87. [PMID: 26471454 DOI: 10.1093/bioinformatics/btv601] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 07/28/2015] [Accepted: 10/10/2015] [Indexed: 11/12/2022]
Abstract
Motivation: Computationally generated non-native protein structure conformations (or decoys) are often used for designing protein folding simulation methods and force fields. However, almost all the decoy sets currently used in literature suffer from uneven root mean square deviation (RMSD) distribution with bias to non-protein like hydrogen-bonding and compactness patterns. Meanwhile, most protein decoy sets are pre-calculated and there is a lack of methods for automated generation of high-quality decoys for any target proteins. Results: We developed a new algorithm, 3DRobot, to create protein structure decoys by free fragment assembly with enhanced hydrogen-bonding and compactness interactions. The method was benchmarked with three widely used decoy sets from ab initio folding and comparative modeling simulations. The decoys generated by 3DRobot are shown to have significantly enhanced diversity and evenness with a continuous distribution in the RMSD space. The new energy terms introduced in 3DRobot improve the hydrogen-bonding network and compactness of decoys, which eliminates the possibility of native structure recognition by trivial potentials. Algorithms that can automatically create such diverse and well-packed non-native conformations from any protein structure should have a broad impact on the development of advanced protein force field and folding simulation methods. Availability and implementation: http://zhanglab.ccmb.med.umich.edu/3DRobot/ Contact: jiay@phy.ccnu.edu.cn; zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Haiyou Deng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 45108, USA; Department of Physics and Institute of Biophysics, Central China Normal University, Wuhan 430079, China
- Ya Jia
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 45108, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 45108, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 45108, USA
|
126
|
Cao R, Bhattacharya D, Adhikari B, Li J, Cheng J. Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11. Proteins 2015; 84 Suppl 1:247-59. [PMID: 26369671 DOI: 10.1002/prot.24924] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 05/14/2015] [Revised: 08/21/2015] [Accepted: 09/10/2015] [Indexed: 12/28/2022]
Abstract
Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Renzhi Cao
- Department of Computer Science, University of Missouri, Columbia, Missouri, 65211
- Badri Adhikari
- Department of Computer Science, University of Missouri, Columbia, Missouri, 65211
- Jilong Li
- Department of Computer Science, University of Missouri, Columbia, Missouri, 65211
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, Missouri, 65211; Informatics Institute, University of Missouri, Columbia, Missouri, 65211
|
127
|
Zhang W, Yang J, He B, Walker SE, Zhang H, Govindarajoo B, Virtanen J, Xue Z, Shen HB, Zhang Y. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11. Proteins 2015; 84 Suppl 1:76-86. [PMID: 26370505 DOI: 10.1002/prot.24930] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 05/28/2015] [Revised: 08/26/2015] [Accepted: 09/10/2015] [Indexed: 11/12/2022]
Abstract
We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc.
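Because this abstract judges success by a TM-score threshold of 0.4, a short sketch of the TM-score sum may help. It assumes the standard definition with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 and evaluates it for one fixed superposition; the real TM-score is the maximum over superpositions, which this sketch does not search for. The 0.5 Å floor on d0 for short chains and the function name are assumptions.

```python
def tm_score(distances, target_length):
    """TM-score sum for one fixed superposition.

    distances: Calpha-Calpha distances (in angstroms) of the aligned
    residue pairs; target_length: number of residues in the native
    (target) structure used for normalization.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumed floor for very short chains
    d0 = max(d0, 0.5)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

A model that aligns perfectly over half of a 100-residue target and is missing the rest gives `tm_score([0.0] * 50, 100) == 0.5`, which shows how coverage enters through the normalization by target length.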
Affiliation(s)
- Wenxuan Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Sara Elizabeth Walker
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
|
128
|
Kim H, Kihara D. Protein structure prediction using residue- and fragment-environment potentials in CASP11. Proteins 2015; 84 Suppl 1:105-17. [PMID: 26344195 DOI: 10.1002/prot.24920] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 05/27/2015] [Revised: 08/03/2015] [Accepted: 08/31/2015] [Indexed: 11/08/2022]
Abstract
An accurate scoring function that can select near-native structure models from a pool of alternative models is key for successful protein structure prediction. For the critical assessment of techniques for protein structure prediction (CASP) 11, we have built a protocol of protein structure prediction that has novel coarse-grained scoring functions for selecting decoys as the heart of its pipeline. The score named PRESCO (Protein Residue Environment SCOre) developed recently by our group evaluates the native-likeness of local structural environment of residues in a structure decoy considering positions and the depth of side-chains of spatially neighboring residues. We also introduced a helix interaction potential as an additional scoring function for selecting decoys. The best models selected by PRESCO and the helix interaction potential underwent structure refinement, which includes side-chain modeling and relaxation with a short molecular dynamics simulation. Our protocol was successful, achieving the top rank in the free modeling category with a significant margin of the accumulated Z-score to the subsequent groups when the top 1 models were considered. Proteins 2016; 84(Suppl 1):105-117. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Hyungrae Kim
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47906
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47906; Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907
|
129
|
Yang J, Zhang W, He B, Walker SE, Zhang H, Govindarajoo B, Virtanen J, Xue Z, Shen HB, Zhang Y. Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade. Proteins 2015; 84 Suppl 1:233-46. [PMID: 26343917 DOI: 10.1002/prot.24918] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 05/27/2015] [Revised: 08/13/2015] [Accepted: 08/31/2015] [Indexed: 01/26/2023]
Abstract
We report the structure prediction results of a new composite pipeline for template-based modeling (TBM) in the 11th CASP experiment. Starting from multiple structure templates identified by LOMETS based meta-threading programs, the QUARK ab initio folding program is extended to generate initial full-length models under strong constraints from template alignments. The final atomic models are then constructed by I-TASSER based fragment reassembly simulations, followed by the fragment-guided molecular dynamic simulation and the MQAP-based model selection. It was found that the inclusion of QUARK-TBM simulations as an intermediate modeling step could help improve the quality of the I-TASSER models for both Easy and Hard TBM targets. Overall, the average TM-score of the first I-TASSER model is 12% higher than that of the best LOMETS templates, with the RMSD in the same threading-aligned regions reduced from 5.8 to 4.7 Å. Nevertheless, there are nearly 18% of TBM domains with the templates deteriorated by the structure assembly pipeline, which may be attributed to the errors of secondary structure and domain orientation predictions that propagate through and degrade the procedures of template identification and final model selections. To examine the record of progress, we made a retrospective report of the I-TASSER pipeline in the last five CASP experiments (CASP7-11). The data show no clear progress of the LOMETS threading programs over PSI-BLAST; but obvious progress on structural improvement relative to threading templates was witnessed in recent CASP experiments, which is probably attributed to the integration of the extended ab initio folding simulation with the threading assembly pipeline and the introduction of atomic-level structure refinements following the reduced modeling simulations. Proteins 2016; 84(Suppl 1):233-246. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Wenxuan Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Sara Elizabeth Walker
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109.
|
130
|
Olson MA, Zabetakis D, Legler PM, Turner KB, Anderson GP, Goldman ER. Can template-based protein models guide the design of sequence fitness for enhanced thermal stability of single domain antibodies? Protein Eng Des Sel 2015; 28:395-402. [PMID: 26374895 DOI: 10.1093/protein/gzv047] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/27/2015] [Accepted: 08/14/2015] [Indexed: 12/18/2022] Open
Abstract
We investigate the practical use of comparative (template-based) protein models in replica-exchange simulations of single-domain antibody (sdAb) chains to evaluate whether the models can correctly predict, in rank order, the thermal susceptibility to unfold relative to experimental melting temperatures. The baseline model system is the recently determined crystallographic structure of a llama sdAb (denoted as A3), which exhibits an unusually high thermal stability. An evaluation of the simulation results for the A3 comparative model and crystal structure shows that, despite the overall low Cα root-mean-square deviation between the two structures, the model contains misfolded regions that yield a thermal profile of unraveling at a lower temperature. Yet comparison of the simulations of four different comparative models for sdAb A3, C8, A3C8 and E9, where A3C8 is a design that swaps the sequence of the complementarity determining regions of C8 onto the A3 framework, discriminated among the sequences to detect the highest and lowest experimental melting transition temperatures. Further structural analysis of A3 for selected alanine substitutions by a combined computational and experimental study found unexpectedly that the comparative model performed admirably in recognizing substitution 'hot spots' when using a support-vector machine algorithm.
Affiliation(s)
- Mark A Olson
- Department of Cell Biology and Biochemistry, Molecular and Translational Sciences Division, USAMRIID, Frederick, MD, USA
| | - Dan Zabetakis
- Center for Bio/Molecular Science and Engineering, Naval Research Laboratory, 4555 Overlook Avenue SW, Washington, DC, USA
| | - Patricia M Legler
- Center for Bio/Molecular Science and Engineering, Naval Research Laboratory, 4555 Overlook Avenue SW, Washington, DC, USA
| | - Kendrick B Turner
- Center for Bio/Molecular Science and Engineering, Naval Research Laboratory, 4555 Overlook Avenue SW, Washington, DC, USA
| | - George P Anderson
- Center for Bio/Molecular Science and Engineering, Naval Research Laboratory, 4555 Overlook Avenue SW, Washington, DC, USA
| | - Ellen R Goldman
- Center for Bio/Molecular Science and Engineering, Naval Research Laboratory, 4555 Overlook Avenue SW, Washington, DC, USA
|
131
|
Feig M, Mirjalili V. Protein structure refinement via molecular-dynamics simulations: What works and what does not? Proteins 2015; 84 Suppl 1:282-92. [PMID: 26234208 DOI: 10.1002/prot.24871] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 05/19/2015] [Revised: 07/15/2015] [Accepted: 07/29/2015] [Indexed: 12/26/2022]
Abstract
Protein structure refinement during CASP11 by the Feig group was described. Molecular dynamics simulations were used in combination with an improved selection and averaging protocol. On average, modest refinement was achieved with some targets improved significantly. Analysis of the CASP submission from our group focused on refinement success versus amount of sampling, refinement of different secondary structure elements and whether refinement varied as a function of which group provided initial models. The refinement of local stereochemical features was examined via the MolProbity score and an updated protocol was developed that can generate high-quality structures with very low MolProbity scores for most starting structures with modest computational effort. Proteins 2016; 84(Suppl 1):282-292. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, 48824; Department of Chemistry, Michigan State University, East Lansing, Michigan, 48824
| | - Vahid Mirjalili
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, 48824; Department of Mechanical Engineering, Michigan State University, East Lansing, Michigan, 48824
|
132
|
Zhang J, Yang J, Jang R, Zhang Y. GPCR-I-TASSER: A Hybrid Approach to G Protein-Coupled Receptor Structure Modeling and the Application to the Human Genome. Structure 2015; 23:1538-1549. [PMID: 26190572 DOI: 10.1016/j.str.2015.06.007] [Citation(s) in RCA: 133] [Impact Index Per Article: 14.8] [Received: 03/01/2015] [Revised: 06/03/2015] [Accepted: 06/10/2015] [Indexed: 12/31/2022]
Abstract
Experimental structure determination remains difficult for G protein-coupled receptors (GPCRs). We propose a new hybrid protocol to construct GPCR structure models that integrates experimental mutagenesis data with ab initio transmembrane (TM) helix assembly simulations. The method was tested on 24 known GPCRs, where the ab initio TM-helix assembly procedure constructed the correct fold for 20 cases. When combined with weak homology and sparse mutagenesis restraints, the method generated correct folds for all the tested cases with an average Cα root-mean-square deviation of 2.4 Å in the TM regions. The new hybrid protocol was applied to model all 1,026 GPCRs in the human genome, of which 923 have a high confidence score and are expected to have correct folds; these contain many pharmaceutically important families with no previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin, and Neuropeptide Y receptors. The results demonstrate new progress on genome-wide structure modeling of TM proteins.
Affiliation(s)
- Jian Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Richard Jang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA.
|
133
|
Elhefnawy W, Chen L, Han Y, Li Y. ICOSA: A Distance-Dependent, Orientation-Specific Coarse-Grained Contact Potential for Protein Structure Modeling. J Mol Biol 2015; 427:2562-2576. [DOI: 10.1016/j.jmb.2015.05.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/10/2015] [Accepted: 05/21/2015] [Indexed: 11/16/2022]
|
134
|
Zheng F, Zhang J, Grigoryan G. Tertiary Structural Propensities Reveal Fundamental Sequence/Structure Relationships. Structure 2015; 23:961-971. [DOI: 10.1016/j.str.2015.03.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 12/09/2014] [Revised: 03/02/2015] [Accepted: 03/22/2015] [Indexed: 02/08/2023]
|
135
|
Xu Y, Zhou X, Huang M. StaRProtein, a web server for prediction of the stability of repeat proteins. PLoS One 2015; 10:e0119417. [PMID: 25807112 PMCID: PMC4373711 DOI: 10.1371/journal.pone.0119417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2014] [Accepted: 01/13/2015] [Indexed: 11/25/2022] Open
Abstract
Repeat proteins have become increasingly important due to their capability to bind to almost any protein and their potential as an alternative therapy to monoclonal antibodies. In the past decade repeat proteins have been designed to mediate specific protein-protein interactions. The tetratricopeptide and ankyrin repeat proteins are two classes of helical repeat proteins that form different binding pockets to accommodate various partners. It is important to understand the factors that define the folding and stability of repeat proteins in order to prioritize the most stable designed repeat proteins and further explore their potential binding affinities. Here we developed distance-dependent statistical potentials using two classes of alpha-helical repeat proteins, tetratricopeptide and ankyrin repeat proteins respectively, and evaluated their efficiency in predicting the stability of repeat proteins. We demonstrated that the repeat-specific statistical potentials based on these two classes of repeat proteins showed higher accuracy than non-specific statistical potentials in 1) discriminating correct vs. incorrect models and 2) ranking the stability of designed repeat proteins. In particular, the statistical scores correlate closely with the equilibrium unfolding free energies of repeat proteins and therefore would serve as a novel tool for quickly prioritizing designed repeat proteins with high stability. The StaRProtein web server was developed for predicting the stability of repeat proteins.
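As a rough illustration of what a distance-dependent statistical potential computes, the sketch below applies the inverse Boltzmann relation E(r) = -kT ln(P_obs(r) / P_ref(r)) to binned distances. It is not the StaRProtein implementation: the function names, bin width, cutoff, pseudocount, and the choice of reference distribution are all assumptions made for illustration.

```python
import math
from collections import Counter


def distance_potential(observed, reference, bin_width=1.0, max_dist=15.0,
                       kT=1.0, pseudo=1.0):
    """Per-bin energies E(r) = -kT * ln(P_obs(r) / P_ref(r)).

    `observed` holds distances from native structures, `reference` holds
    distances from a reference state (hypothetical inputs). A pseudocount
    keeps empty bins from producing log(0).
    """
    n_bins = int(max_dist / bin_width)

    def histogram(distances):
        counts = Counter(int(d / bin_width) for d in distances
                         if 0 <= d < max_dist)
        total = sum(counts.values()) + pseudo * n_bins
        return [(counts.get(b, 0) + pseudo) / total for b in range(n_bins)]

    p_obs = histogram(observed)
    p_ref = histogram(reference)
    return [-kT * math.log(po / pr) for po, pr in zip(p_obs, p_ref)]


def score(distances, energies, bin_width=1.0):
    """Sum the bin energies over all distances that fall inside the cutoff."""
    total = 0.0
    for d in distances:
        b = int(d / bin_width)
        if d >= 0 and b < len(energies):
            total += energies[b]
    return total
```

Bins that are populated more often in native structures than in the reference receive negative (favorable) energies, so a lower total score indicates a more native-like set of distances under these assumptions.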
Affiliation(s)
- Yongtao Xu
- School of Chemistry and Chemical Engineering, Queen's University Belfast, David Keir Building, Stranmillis Road, Belfast, Northern Ireland, United Kingdom
| | - Xu Zhou
- School of Chemistry and Chemical Engineering, Queen's University Belfast, David Keir Building, Stranmillis Road, Belfast, Northern Ireland, United Kingdom
| | - Meilan Huang
- School of Chemistry and Chemical Engineering, Queen's University Belfast, David Keir Building, Stranmillis Road, Belfast, Northern Ireland, United Kingdom
|
136
|
Chae MH, Krull F, Knapp EW. Optimized distance-dependent atom-pair-based potential DOOP for protein structure prediction. Proteins 2015; 83:881-90. [PMID: 25693513 DOI: 10.1002/prot.24782] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/21/2014] [Revised: 02/06/2015] [Accepted: 02/10/2015] [Indexed: 12/20/2022]
Abstract
The DOcking decoy-based Optimized Potential (DOOP) energy function for protein structure prediction is based on empirical distance-dependent atom-pair interactions. To optimize the atom-pair interactions, native protein structures are decomposed into polypeptide chain segments that correspond to structural motifs involving complete secondary structure elements. They constitute near-native ligand-receptor systems (or just pairs). Thus, a total of 8609 ligand-receptor systems were prepared from 954 selected proteins. For each of these hypothetical ligand-receptor systems, 1000 evenly sampled docking decoys with 0-10 Å interface root-mean-square-deviation (iRMSD) were generated with a method used before for protein-protein docking. A neural network-based optimization method was applied to derive the optimized energy parameters using these decoys so that the energy function mimics the funnel-like energy landscape for the interaction between these hypothetical ligand-receptor systems. Thus, our method hierarchically models the overall funnel-like energy landscape of native protein structures. The resulting energy function was tested on several commonly used decoy sets for native protein structure recognition and compared with other statistical potentials. In combination with a torsion potential term that describes the local conformational preference, the atom-pair-based potential outperforms other reported statistical energy functions in correct ranking of native protein structures for a variety of decoy sets. This is especially the case for the most challenging ROSETTA decoy set, although it does not take into account side chain orientation-dependence explicitly. The DOOP energy function for protein structure prediction, the underlying database of protein structures with hypothetical ligand-receptor systems and their decoys are freely available at http://agknapp.chemie.fu-berlin.de/doop/.
Affiliation(s)
- Myong-Ho Chae
- Department of Biology, University of Science, Unjong-District, Pyongyang, DPR Korea
|
137
|
He Z, Ma W, Zhang J, Xu D. A New Hidden Markov Model for Protein Quality Assessment Using Compatibility Between Protein Sequence and Structure. TSINGHUA SCIENCE AND TECHNOLOGY 2015; 19:559-567. [PMID: 26221066 PMCID: PMC4515432 DOI: 10.1109/tst.2014.6961026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Protein structure Quality Assessment (QA) is an essential component in protein structure prediction and analysis. The relationship between protein sequence and structure often serves as a basis for protein structure QA. In this work, we developed a new Hidden Markov Model (HMM) to assess the compatibility of protein sequence and structure for capturing their complex relationship. More specifically, the emission of the HMM consists of protein local structures in angular space, secondary structures, and sequence profiles. This model has two capabilities: (1) encoding the local structure of each position by jointly considering sequence and structure information, and (2) assigning a global score to estimate the overall quality of a predicted structure, as well as local scores to assess the quality of specific regions of a structure, which provides useful guidance for targeted structure refinement. We compared the HMM model to the state-of-the-art single-structure quality assessment methods OPUSCA, DFIRE, GOAP, and RW in protein structure selection. Computational results showed that our new score, HMM.Z, can achieve better overall selection performance on the benchmark datasets.
Affiliation(s)
- Zhiquan He
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, MO 65211, USA.
| | - Wenji Ma
- Christopher S. Bond Life Sciences Center, University of Missouri, MO 65211, USA and Department of Computer Science, City University of Hong Kong, Hong Kong, China.
| | - Jingfen Zhang
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, MO 65211, USA.
| | - Dong Xu
- To whom correspondence should be addressed:
|
138
|
Faraggi E, Kloczkowski A. GENN: a GEneral Neural Network for learning tabulated data with examples from protein structure prediction. Methods Mol Biol 2015; 1260:165-78. [PMID: 25502381 PMCID: PMC6930076 DOI: 10.1007/978-1-4939-2239-0_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/18/2023]
Abstract
We present a GEneral Neural Network (GENN) for learning trends from existing data and making predictions of unknown information. The main novelty of GENN is in its generality, simplicity of use, and its specific handling of windowed input/output. Its main strength is its efficient handling of the input data, enabling learning from large datasets. GENN is built on a two-layered neural network and has the option to use separate input-output pairs or window-based data, using data structures to efficiently represent input-output pairs. The program was tested on predicting the accessible surface area of globular proteins, scoring proteins according to similarity to native, and predicting protein disorder, and it performed remarkably well. In this paper we describe the program and its use. Specifically, we give as an example the construction of a similarity-to-native protein scoring function using GENN. The source code and Linux executables for GENN are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org. Bugs and problems with the GENN program should be reported to EF.
Affiliation(s)
- Eshel Faraggi
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA; Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, Ohio 43215, USA; and Physics Division, Research and Information Systems, LLC, Carmel, Indiana, 46032, USA, phone: 317-332-0368
| | - Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, Ohio 43215, USA; and Department of Pediatrics, The Ohio State University, Columbus, Ohio 43215, USA
|
139
|
Kim H, Kihara D. Detecting local residue environment similarity for recognizing near-native structure models. Proteins 2014; 82:3255-72. [PMID: 25132526 PMCID: PMC4237674 DOI: 10.1002/prot.24658] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 01/15/2014] [Revised: 06/10/2014] [Accepted: 07/21/2014] [Indexed: 12/14/2022]
Abstract
We developed a new representation of local amino acid environments in protein structures called the Side-chain Depth Environment (SDE). An SDE defines the local structural environment of a residue by considering the coordinates and the depth of amino acids that are located in the vicinity of the side-chain centroid of the residue. SDEs are general enough that similar SDEs are found in protein structures with globally different folds. Using SDEs, we developed a procedure called PRESCO (Protein Residue Environment SCOre) for selecting native or near-native models from a pool of computational models. The procedure searches for residue environments similar to those observed in a query model against a set of representative native protein structures to quantify how native-like the SDEs in the model are. When benchmarked on commonly used computational model datasets, PRESCO compared favorably with other existing scoring functions in selecting native and near-native models.
Affiliation(s)
- Hyungrae Kim
- Department of Biological Sciences, Purdue University, West Lafayette IN, 47906, USA
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette IN, 47906, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
|
140
|
Thompson JJ, Tabatabaei Ghomi H, Lill MA. Application of information theory to a three-body coarse-grained representation of proteins in the PDB: insights into the structural and evolutionary roles of residues in protein structure. Proteins 2014; 82:3450-65. [PMID: 25269778 DOI: 10.1002/prot.24698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/23/2014] [Revised: 09/09/2014] [Accepted: 09/19/2014] [Indexed: 01/03/2023]
Abstract
Knowledge-based methods for analyzing protein structures, such as statistical potentials, primarily consider the distances between pairs of bodies (atoms or groups of atoms). Considerations of several bodies simultaneously are generally used to characterize bonded structural elements or those in close contact with each other, but historically do not consider atoms that are not in direct contact with each other. In this report, we introduce an information-theoretic method for detecting and quantifying distance-dependent through-space multibody relationships between the sidechains of three residues. The technique introduced is capable of producing convergent and consistent results when applied to a sufficiently large database of randomly chosen, experimentally solved protein structures. The results of our study can be shown to reproduce established physico-chemical properties of residues as well as more recently discovered properties and interactions. These results offer insight into the numerous roles that residues play in protein structure, as well as relationships between residue function, protein structure, and evolution. The techniques and insights presented in this work should be useful in the future development of novel knowledge-based tools for the evaluation of protein structure.
Affiliation(s)
- Jared J Thompson
- Department of Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, West Lafayette, Indiana
|
141
|
Carlsen M, Koehl P, Røgen P. On the importance of the distance measures used to train and test knowledge-based potentials for proteins. PLoS One 2014; 9:e109335. [PMID: 25411785 PMCID: PMC4239004 DOI: 10.1371/journal.pone.0109335] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/18/2014] [Accepted: 08/31/2014] [Indexed: 12/15/2022] Open
Abstract
Knowledge-based potentials are energy functions derived from the analysis of databases of protein structures and sequences. They can be divided into two classes. Potentials from the first class are based on a direct conversion of the distributions of some geometric properties observed in native protein structures into energy values, while potentials from the second class are trained to mimic quantitatively the geometric differences between incorrectly folded models and native structures. In this paper, we focus on the relationship between energy and geometry when training the second class of knowledge-based potentials. We assume that the difference in energy between a decoy structure and the corresponding native structure is linearly related to the distance between the two structures. We trained two distance-based knowledge-based potentials accordingly, one based on all inter-residue distances (PPD), while the other had the set of all distances filtered to reflect consistency in an ensemble of decoys (PPE). We tested four types of metric to characterize the distance between the decoy and the native structure, two based on extrinsic geometry (RMSD and GDT-TS*), and two based on intrinsic geometry (Q* and MT). The corresponding eight potentials were tested on a large collection of decoy sets. We found that it is usually better to train a potential using an intrinsic distance measure. We also found that PPE outperforms PPD, emphasizing the benefits of capturing consistent information in an ensemble. The relevance of these results for the design of knowledge-based potentials is discussed.
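The linearity assumption in this abstract (the energy gap between a decoy and its native structure is proportional to their distance) can be made concrete with a small least-squares sketch. Everything below is hypothetical and is not the PPD or PPE training procedure: the potential is assumed to be a weighted sum of features, `feature_gaps` holds decoy-minus-native feature differences, and plain gradient descent with an arbitrary learning rate is used for the fit.

```python
def train_linear_potential(feature_gaps, distances, lr=0.05, steps=2000):
    """Fit weights w so that w . (f(decoy) - f(native)) ~ d(decoy, native).

    feature_gaps: list of feature-difference vectors, one per decoy.
    distances:    list of decoy-to-native distances (any chosen metric).
    Minimizes the mean squared error by gradient descent.
    """
    n_features = len(feature_gaps[0])
    n = len(feature_gaps)
    weights = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for gap, dist in zip(feature_gaps, distances):
            # Predicted energy gap minus the target distance.
            error = sum(w * g for w, g in zip(weights, gap)) - dist
            for k in range(n_features):
                grad[k] += 2.0 * error * gap[k] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def energy_gap(weights, gap):
    """Energy difference between a decoy and its native under the fit."""
    return sum(w * g for w, g in zip(weights, gap))
```

Swapping the distance metric passed in as `distances` (for example RMSD versus an intrinsic measure) changes the training target while leaving the fitting code unchanged, which is the kind of comparison the abstract describes.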
Affiliation(s)
- Martin Carlsen
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California Davis, Davis, CA, United States of America
| | - Peter Røgen
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
|
142
|
Park J, Saitou K. ROTAS: a rotamer-dependent, atomic statistical potential for assessment and prediction of protein structures. BMC Bioinformatics 2014; 15:307. [PMID: 25236673 PMCID: PMC4262145 DOI: 10.1186/1471-2105-15-307] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 03/02/2014] [Accepted: 09/09/2014] [Indexed: 12/31/2022] Open
Abstract
Background: Multibody potentials accounting for cooperative effects of molecular interactions have shown better accuracy than typical pairwise potentials. The main challenge in the development of such potentials is to find relevant structural features that characterize tightly folded proteins. Also, the side-chains of residues adopt several specific, staggered conformations, known as rotamers, within protein structures. Different molecular conformations result in different dipole moments and induce charge reorientations. However, until now modeling of the rotameric state of residues had not been incorporated into the development of multibody potentials for modeling non-bonded interactions in protein structures.
Results: In this study, we develop a new multibody statistical potential that can account for the influence of rotameric states on the specificity of atomic interactions. In this potential, named “rotamer-dependent atomic statistical potential” (ROTAS), the interaction between two atoms is specified not only by the distance and relative orientation but also by two state parameters concerning the rotameric state of the residues to which the interacting atoms belong. It was clearly found that the rotameric state is correlated with the specificity of atomic interactions. Such rotamer dependencies are not limited to a specific type or a certain range of interactions. The performance of ROTAS was tested using 13 sets of decoys and was compared to that of existing atomic-level statistical potentials that incorporate orientation-dependent energy terms. The results show that ROTAS performs better than other competing potentials not only in native structure recognition, but also in best model selection and correlation coefficients between energy and model quality.
Conclusions: A new multibody statistical potential, ROTAS, accounting for the influence of rotameric states on the specificity of atomic interactions, was developed and tested on decoy sets. The results show that ROTAS has an improved ability to recognize the native structure among decoy models compared to other potentials. The effectiveness of ROTAS may provide insightful information for the development of many applications that require accurate side-chain modeling, such as protein design, mutation analysis, and docking simulation.
Affiliation(s)
| | - Kazuhiro Saitou
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA.
|
143
|
Manavalan B, Lee J, Lee J. Random forest-based protein model quality assessment (RFMQA) using structural features and potential energy terms. PLoS One 2014; 9:e106542. [PMID: 25222008 PMCID: PMC4164442 DOI: 10.1371/journal.pone.0106542] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 04/30/2014] [Accepted: 08/06/2014] [Indexed: 01/28/2023] Open
Abstract
Recently, prediction of a protein's three-dimensional (3D) structure from its sequence information has made significant progress due to advances in computational techniques and the growth of experimental structures. However, selecting good models from a structural model pool is an important and challenging task in protein structure prediction. In this study, we present the first application of random forest based model quality assessment (RFMQA) to rank protein models using their structural features and knowledge-based potential energy terms. The method predicts a relative score of a model by using its secondary structure, solvent accessibility and knowledge-based potential energy terms. We trained and tested the RFMQA method on CASP8 and CASP9 targets using 5-fold cross-validation. The correlation coefficient between the TM-score of the model selected by RFMQA (TMRF) and the best server model (TMbest) is 0.945. We benchmarked our method on recent CASP10 targets by using CASP8 and 9 server models as a training set. The correlation coefficient and average difference between TMRF and TMbest over 95 CASP10 targets are 0.984 and 0.0385, respectively. The test results show that our method works better in selecting top models when compared with other top performing methods. RFMQA is available for download from http://lee.kias.re.kr/RFMQA/RFMQA_eval.tar.gz.
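The two summary statistics quoted in this abstract (correlation coefficient and average difference between TMRF and TMbest) can be computed as in the sketch below. It assumes that the correlation is a Pearson coefficient and that the difference is taken as TMbest minus TMRF per target; neither detail is stated in the abstract, and the function and variable names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def selection_summary(tm_selected, tm_best):
    """Return (correlation, mean of tm_best - tm_selected) over targets.

    tm_selected: TM-score of the model picked by the QA method, per target.
    tm_best:     TM-score of the best available model, per target.
    """
    corr = pearson(tm_selected, tm_best)
    mean_loss = sum(b - s for s, b in zip(tm_selected, tm_best)) / len(tm_best)
    return corr, mean_loss
```

A high correlation alone does not show that the selected models are close to the best ones, which is why the mean difference is reported alongside it.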
Affiliation(s)
- Balachandran Manavalan
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
| | - Juyong Lee
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
|
144
|
Moal IH, Jiménez-García B, Fernández-Recio J. CCharPPI web server: computational characterization of protein-protein interactions from structure. Bioinformatics 2014; 31:123-5. [PMID: 25183488 DOI: 10.1093/bioinformatics/btu594] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 02/03/2023] Open
Abstract
SUMMARY: The atomic structures of protein-protein interactions are central to understanding their role in biological systems, and a wide variety of biophysical functions and potentials have been developed for their characterization and the construction of predictive models. These tools are scattered across a multitude of stand-alone programs, and are often available only as model parameters requiring reimplementation. This acts as a significant barrier to their widespread adoption. CCharPPI integrates many of these tools into a single web server. It calculates up to 108 parameters, including models of electrostatics, desolvation and hydrogen bonding, as well as interface packing and complementarity scores, empirical potentials at various resolutions, docking potentials and composite scoring functions. AVAILABILITY AND IMPLEMENTATION: The server does not require registration by the user and is freely available for non-commercial academic use at http://life.bsc.es/pid/ccharppi.
Affiliation(s)
- Iain H Moal
- Joint BSC-IRB Research Programme in Computational Biology, Department of Life Sciences, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Brian Jiménez-García
- Joint BSC-IRB Research Programme in Computational Biology, Department of Life Sciences, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Juan Fernández-Recio
- Joint BSC-IRB Research Programme in Computational Biology, Department of Life Sciences, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
|
145
|
Nguyen SP, Shang Y, Xu D. DL-PRO: A Novel Deep Learning Method for Protein Model Quality Assessment. PROCEEDINGS OF ... INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS. INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2014; 2014:2071-2078. [PMID: 25392745 PMCID: PMC4226404 DOI: 10.1109/ijcnn.2014.6889891] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 06/01/2023]
Abstract
Computational protein structure prediction is very important for many applications in bioinformatics. In the process of predicting protein structures, it is essential to accurately assess the quality of generated models. Although many single-model quality assessment (QA) methods have been developed, their accuracy is not high enough for most real applications. In this paper, a new approach based on the C-α atom distance matrix and machine learning methods is proposed for single-model QA and the identification of native-like models. Different from existing energy/scoring functions and consensus approaches, this new approach is purely geometry based. Furthermore, a novel algorithm based on deep learning techniques, called DL-Pro, is proposed. For a protein model, DL-Pro uses its distance matrix, which contains the pairwise distances between the C-α atoms of every two residues in the model and is sometimes also called a contact map, as an orientation-independent representation. From training examples of distance matrices corresponding to good and bad models, DL-Pro learns a stacked autoencoder network as a classifier. In experiments on selected targets from the Critical Assessment of Structure Prediction (CASP) competition, DL-Pro obtained promising results, outperforming state-of-the-art energy/scoring functions, including OPUS-CA, DOPE, DFIRE, and RW.
Affiliation(s)
- Son P. Nguyen
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Yi Shang
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Dong Xu
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA. Christopher S. Bond Life Science Center, University of Missouri at Columbia
|
146
|
Chen Y, Shang Y, Xu D. Multi-Dimensional Scaling and MODELLER-Based Evolutionary Algorithms for Protein Model Refinement. PROCEEDINGS OF THE ... CONGRESS ON EVOLUTIONARY COMPUTATION. CONGRESS ON EVOLUTIONARY COMPUTATION 2014; 2014:1038-1045. [PMID: 25844403 PMCID: PMC4380876 DOI: 10.1109/cec.2014.6900443] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
Protein structure prediction, i.e., computationally predicting the three-dimensional structure of a protein from its primary sequence, is one of the most important and challenging problems in bioinformatics. Model refinement is a key step in the prediction process, where improved structures are constructed based on a pool of initially generated models. Since the refinement category was added to the biennial Critical Assessment of Structure Prediction (CASP) in 2008, CASP results show that it is a challenge for existing model refinement methods to improve model quality consistently. This paper presents three evolutionary algorithms for protein model refinement, in which multidimensional scaling (MDS), the MODELLER software, and a hybrid of both are used as crossover operators, respectively. The MDS-based method takes a purely geometrical approach and generates a child model by combining the contact maps of multiple parents. The MODELLER-based method takes a statistical and energy minimization approach, and uses the remodeling module in the MODELLER program to generate new models from multiple parents. The hybrid method first generates models using the MDS-based method and then runs them through the MODELLER-based method, aiming to combine the strengths of both. Promising results have been obtained in experiments using CASP datasets. The MDS-based method improved the best of a pool of predicted models in terms of the global distance test score (GDT-TS) in 9 out of 16 test targets.
Affiliation(s)
- Yan Chen
- Yan Chen, Yi Shang, and Dong Xu are with the Department of Computer Science, University of Missouri, Columbia, MO 65211 USA. Dong Xu is also with the Christopher S. Bond Life Science Center, University of Missouri.
- Yi Shang
- Yan Chen, Yi Shang, and Dong Xu are with the Department of Computer Science, University of Missouri, Columbia, MO 65211 USA. Dong Xu is also with the Christopher S. Bond Life Science Center, University of Missouri.
- Dong Xu
- Yan Chen, Yi Shang, and Dong Xu are with the Department of Computer Science, University of Missouri, Columbia, MO 65211 USA. Dong Xu is also with the Christopher S. Bond Life Science Center, University of Missouri.
|
147
|
Liu Y, Zeng J, Gong H. Improving the orientation-dependent statistical potential using a reference state. Proteins 2014; 82:2383-93. [DOI: 10.1002/prot.24600] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 02/26/2014] [Revised: 04/30/2014] [Accepted: 05/05/2014] [Indexed: 12/23/2022]
Affiliation(s)
- Yufeng Liu
- MOE Key Laboratory of Bioinformatics; School of Life Sciences, Tsinghua University; Beijing 100084 China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University; Beijing 100084 China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics; School of Life Sciences, Tsinghua University; Beijing 100084 China
|
148
|
Olson MA, Lee MS. Evaluation of unrestrained replica-exchange simulations using dynamic walkers in temperature space for protein structure refinement. PLoS One 2014; 9:e96638. [PMID: 24848767 PMCID: PMC4029997 DOI: 10.1371/journal.pone.0096638] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 02/06/2014] [Accepted: 04/09/2014] [Indexed: 01/09/2023] Open
Abstract
A central problem of computational structural biology is the refinement of modeled protein structures taken from either comparative modeling or knowledge-based methods. Simulations are commonly used to achieve higher resolution of the structures at the all-atom level, yet methodologies that consistently yield accurate results remain elusive. In this work, we provide an assessment of an adaptive temperature-based replica exchange simulation method where the temperature clients dynamically walk in temperature space to enrich their population and exchanges near steep energetic barriers. This approach is compared to earlier work of applying the conventional method of static temperature clients to refine a dataset of conformational decoys. Our results show that, while an adaptive method has many theoretical advantages over a static distribution of client temperatures, only limited improvement was gained from this strategy in excursions of the downhill refinement regime leading to an increase in the fraction of native contacts. To illustrate the sampling differences between the two simulation methods, energy landscapes are presented along with their temperature client profiles.
Affiliation(s)
- Mark A. Olson
- Department of Cell Biology and Biochemistry, Molecular and Translational Sciences, USAMRIID, Frederick, Maryland, United States of America
- Advanced Academic Programs, Zanvyl Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, Maryland, United States of America
- Michael S. Lee
- Computational Sciences Division, U.S. Army Research Laboratory, Aberdeen Proving Ground, Maryland, United States of America
|
149
|
Improvement in low-homology template-based modeling by employing a model evaluation method with focus on topology. PLoS One 2014; 9:e89935. [PMID: 24587135 PMCID: PMC3935967 DOI: 10.1371/journal.pone.0089935] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 10/22/2013] [Accepted: 01/24/2014] [Indexed: 01/22/2023] Open
Abstract
Many template-based modeling (TBM) methods have been developed in recent years that allow for protein structure prediction and for the study of structure-function relationships in proteins. One major problem all TBM algorithms face, however, is their unsatisfactory performance when the proteins under consideration are low-homology targets. To improve the performance of TBM methods for such targets, a novel model evaluation method, named MEFTop, was developed here. The method focuses on evaluating topology by using two novel groups of features: secondary structure element (SSE) contact information and 3-dimensional topology information. By combining the MEFTop algorithm with FR-t5, a threading program developed by our group, we found that the modified TBM program, named FR-t5-M, exhibited significant improvements in predictive ability for low-homology protein targets. We further showed that MEFTop could be a generalized method to improve threading programs for low-homology protein targets. The software (FR-t5-M and MEFTop) is available to non-commercial users at our website: http://jianglab.ibp.ac.cn/lims/FRt5M/FRt5M.html.
|
150
|
Ghosh S, Vishveshwara S. Ranking the quality of protein structure models using sidechain based network properties. F1000Res 2014; 3:17. [PMID: 25580218 PMCID: PMC4038323 DOI: 10.12688/f1000research.3-17.v1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Accepted: 01/20/2014] [Indexed: 01/31/2023] Open
Abstract
Determining the correct structure of a protein given its sequence remains an arduous task, with many researchers working towards this goal. Most structure prediction methodologies generate a large number of probable candidates, with the final challenge being to select the best amongst these. In this work, we have used Protein Structure Networks of native and modeled proteins in combination with Support Vector Machines to estimate the quality of a protein structure model and finally to provide ranks for these models. Model ranking is performed using regression analysis and helps in model selection from a group of many similar and good quality structures. Our results show that structures with a rank greater than 16 exhibit native protein-like properties while those below 10 are non-native like. The tool is also made available as a web server (http://vishgraph.mbu.iisc.ernet.in/GraProStr/native_non_native_ranking.html), where 5 modelled structures can be evaluated at a given time.
Affiliation(s)
- Soma Ghosh
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India; I.I.Sc. Mathematics Initiative, Indian Institute of Science, Bangalore, 560012, India
|