1151
|
Abstract
Motivation: One of the major bottlenecks in ab initio protein folding is the lack of an effective conformation sampling algorithm that can generate native-like conformations quickly. The popular fragment assembly method generates conformations by restricting the local conformations of a protein to short structural fragments in the PDB. This method may limit conformations to a subspace to which the native fold does not belong because (i) a protein with a truly new fold may contain some structural fragments not in the PDB and (ii) the discrete nature of fragments may prevent them from building a native-like fold. Previously we developed a conditional random fields (CRF) method for fragment-free protein folding that can sample conformations in a continuous space, and demonstrated that this CRF method compares favorably to the popular fragment assembly method. However, the CRF method is still limited in its ability to generate conformations compatible with a sequence. Results: We present a new fragment-free approach to protein folding using a recently invented probabilistic graphical model, conditional neural fields (CNF). This new CNF method is much more powerful than CRF in modeling the sophisticated protein sequence-structure relationship and thus enables us to generate native-like conformations more easily. We show that when coupled with a simple energy function and replica exchange Monte Carlo simulation, our CNF method can generate much better decoys than CRF on a variety of test proteins, including the CASP8 free-modeling targets. In particular, our CNF method can predict a correct fold for T0496_D1, one of the two CASP8 targets with a truly new fold. Our predicted model for T0496 is significantly better than all the CASP8 models. Contact: jinboxu@gmail.com
Affiliation(s)
- Feng Zhao
- Toyota Technological Institute, Chicago, IL 60637, USA
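The abstract above couples the sampler with replica exchange Monte Carlo simulation but does not give details. As a rough illustration only, the standard Metropolis criterion for swapping two replicas at different temperatures can be sketched as follows. The function names, the unit Boltzmann constant and the example energies are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Standard replica-exchange acceptance probability.

    Replica i is at temperature temp_i with energy energy_i, replica j
    likewise. k_b = 1.0 assumes reduced units (an illustrative choice).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    # P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)])
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the two replicas should exchange conformations."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)


if __name__ == "__main__":
    # Cold replica (T=1) is stuck at a higher energy than the hot one (T=2):
    # the swap is always accepted.
    print(swap_probability(10.0, 5.0, 1.0, 2.0))
    # The reverse situation is accepted only with probability exp(-2.5).
    print(swap_probability(5.0, 10.0, 1.0, 2.0))
```

In a full simulation each replica would run ordinary Monte Carlo moves at its own temperature between swap attempts; that part is omitted here.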
|
1152
|
Abstract
Motivation: The challenge of template-based modeling lies in the recognition of correct templates and the generation of accurate sequence-template alignments. Homologous information has proved to be very powerful in detecting remote homologs, as demonstrated by the state-of-the-art profile-based method HHpred. However, HHpred does not fare well when the proteins under consideration are low-homology. A protein is low-homology if we cannot obtain a sufficient amount of homologous information for it from existing protein sequence databases. Results: We present a profile-entropy dependent scoring function for low-homology protein threading. This method models the correlation among various protein features and determines their relative importance according to the amount of homologous information available. When the proteins under consideration are low-homology, our method relies more on structure information; otherwise, it relies more on homologous information. Experimental results indicate that our threading method greatly outperforms the best profile-based method, HHpred, and all the top CASP8 servers on low-homology proteins. Tested on the CASP8 hard targets, our threading method is also better than all the top CASP8 servers but slightly worse than Zhang-Server. This is significant considering that Zhang-Server and other top CASP8 servers use a combination of multiple structure-prediction techniques, including consensus methods, multiple-template modeling, template-free modeling and model refinement, while our method is a classical single-template-based threading method without any post-threading refinement. Contact: jinboxu@gmail.com
Affiliation(s)
- Jian Peng
- Toyota Technological Institute at Chicago, IL 60637, USA
|
1153
|
Wu S, Zhang Y. Recognizing protein substructure similarity using segmental threading. Structure 2010; 18:858-67. [PMID: 20637422 DOI: 10.1016/j.str.2010.04.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 02/09/2010] [Revised: 04/02/2010] [Accepted: 04/03/2010] [Indexed: 11/15/2022]
Abstract
Protein template identification is essential to protein structure and function predictions. However, conventional whole-chain threading approaches often fail to recognize conserved substructure motifs when the target and templates do not share the same fold. We developed a new approach, SEGMER, for identifying protein substructure similarities by segmental threading. The target sequence is split into segments of two to four consecutive or nonconsecutive secondary structural elements, which are then threaded through the PDB to identify appropriate substructure motifs. SEGMER is tested on 144 nonredundant hard proteins. When SEGMER is combined with whole-chain threading, the TM-score of alignments and the accuracy of spatial restraints increase by 16% and 25%, respectively, compared with whole-chain threading alone. When tested on 12 free modeling targets from CASP8, SEGMER increases the TM-score and contact accuracy by 28% and 48%, respectively. This significant improvement should have an important impact on protein structure modeling and functional inference.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA
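The TM-score used as the evaluation measure in the abstract above is not defined there. A minimal sketch of the commonly used formula, evaluated for one fixed superposition, is given below. The function name and the input format (a list of distances in Å between aligned residue pairs) are assumptions for illustration, and the lower bound of 0.5 Å on the distance scale is a guard for short chains rather than something stated in the abstract. The full TM-score also maximizes over superpositions, which is not done here.

```python
def tm_score(distances, target_length):
    """TM-score contribution for one given superposition.

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the target protein.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # guard for short chains (illustrative assumption)
    # Unaligned target residues contribute zero, hence division by L.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # A perfect alignment covering the whole target scores 1.0.
    print(tm_score([0.0] * 100, 100))
    # A perfect alignment covering half of the target scores 0.5.
    print(tm_score([0.0] * 50, 100))
```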
|
1154
|
|
1155
|
Multiple templates-based homology modeling enhances structure quality of AT1 receptor: validation by molecular dynamics and antagonist docking. J Mol Model 2010; 17:1565-77. [DOI: 10.1007/s00894-010-0860-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 06/05/2010] [Accepted: 09/24/2010] [Indexed: 10/19/2022]
|
1156
|
Lin MS, Head-Gordon T. Reliable protein structure refinement using a physical energy function. J Comput Chem 2010; 32:709-17. [DOI: 10.1002/jcc.21664] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 02/28/2010] [Revised: 08/02/2010] [Accepted: 08/07/2010] [Indexed: 11/10/2022]
|
1157
|
Han L, Monné M, Okumura H, Schwend T, Cherry AL, Flot D, Matsuda T, Jovine L. Insights into Egg Coat Assembly and Egg-Sperm Interaction from the X-Ray Structure of Full-Length ZP3. Cell 2010; 143:404-15. [DOI: 10.1016/j.cell.2010.09.041] [Citation(s) in RCA: 118] [Impact Index Per Article: 8.4] [Received: 06/23/2010] [Revised: 08/11/2010] [Accepted: 08/24/2010] [Indexed: 11/15/2022]
|
1158
|
Abstract
Bioinformaticians are tackling increasingly computation-intensive tasks. In the meantime, workstations are shifting towards multi-core architectures, and massively multi-core machines may soon be the norm. Bag-of-Tasks (BoT) applications are commonly encountered in bioinformatics. They consist of a large number of independent computation-intensive tasks. This note introduces PAR, a scalable, dynamic, parallel and distributed execution engine for Bag-of-Tasks. PAR is aimed at multi-core architectures and small clusters. Speedups obtained with PAR on two different applications are shown. AVAILABILITY: PAR is released under the GNU General Public License version 3 and can be freely downloaded (http://download.savannah.gnu.org/releases/par/par.tgz).
Affiliation(s)
- Francois Berenger
- Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
|
1159
|
Brylinski M, Lee SY, Zhou H, Skolnick J. The utility of geometrical and chemical restraint information extracted from predicted ligand-binding sites in protein structure refinement. J Struct Biol 2010; 173:558-69. [PMID: 20850544 DOI: 10.1016/j.jsb.2010.09.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/13/2010] [Revised: 09/08/2010] [Accepted: 09/10/2010] [Indexed: 01/01/2023]
Abstract
Exhaustive exploration of molecular interactions at the level of complete proteomes requires efficient and reliable computational approaches to protein function inference. Ligand docking and ranking techniques show considerable promise in their ability to quantify the interactions between proteins and small molecules. Despite advances in the development of docking approaches and scoring functions, the genome-wide application of many ligand docking/screening algorithms is limited by the quality of the binding sites in theoretical receptor models constructed by protein structure prediction. In this study, we describe a new template-based method for the local refinement of ligand-binding regions in protein models using remotely related templates identified by threading. We designed a Support Vector Regression (SVR) model that selects correct binding site geometries in a large ensemble of multiple receptor conformations. The SVR model employs several scoring functions that impose geometrical restraints on the Cα positions, account for the specific chemical environment within a binding site and optimize the interactions with putative ligands. The SVR score is well correlated with the RMSD from the native structure; in 47% (70%) of the cases, the Pearson's correlation coefficient is >0.5 (>0.3). When applied to weakly homologous models, the average heavy-atom local RMSD from the native structure of the top-ranked (best of top five) binding site geometries is 3.1 Å (2.9 Å) for roughly half of the targets; this represents a 0.1 (0.3) Å average improvement over the original predicted structure. Focusing on the subset of strongly conserved residues, the average heavy-atom RMSD is 2.6 Å (2.3 Å). Furthermore, we estimate the upper bound of template-based binding site refinement using only weakly related proteins to be ~2.6 Å RMSD. This value also corresponds to the plasticity of the ligand-binding regions in distant homologues.
The Binding Site Refinement (BSR) approach is available to the scientific community as a web server that can be accessed at http://cssb.biology.gatech.edu/bsr/.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, GA 30318, USA
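The abstract above reports how well a score correlates with RMSD using Pearson's correlation coefficient. For readers unfamiliar with the statistic, a minimal sketch of its computation follows. The function name and the toy numbers are illustrative assumptions; they are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances up to a common factor of 1/n, which cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical scores and RMSD values (angstroms) for five models.
    scores = [0.9, 1.4, 2.1, 2.8, 3.5]
    rmsds = [1.8, 2.2, 2.9, 3.1, 4.0]
    print(round(pearson(scores, rmsds), 3))
```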
|
1160
|
Martin AJM, Walsh I, Tosatto SCE. MOBI: a web server to define and visualize structural mobility in NMR protein ensembles. Bioinformatics 2010; 26:2916-7. [PMID: 20861031 DOI: 10.1093/bioinformatics/btq537] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: MOBI is a web server for the identification of structurally mobile regions in NMR protein ensembles. It provides a binary mobility definition that is analogous to the commonly used definition of intrinsic disorder in X-ray crystallographic structures. At least three different use cases can be envisaged: (i) visualization of NMR mobility for structural analysis; (ii) definition of regions for reliable comparative modelling in protein structure prediction; and (iii) definition of mobility in analogy to intrinsic disorder. MOBI uses structural superposition and local conformational differences to derive a robust binary mobility definition that is in excellent agreement with the manually curated definition used in the CASP8 experiment for intrinsic disorder in NMR structures. The output includes mobility-coloured PDB files, mobility plots and a FASTA-formatted sequence file summarizing the mobility results. AVAILABILITY: The MOBI server and supplementary methods are available for non-commercial use at http://protein.bio.unipd.it/mobi/.
Affiliation(s)
- Alberto J M Martin
- Department of Biology, University of Padova, viale G. Colombo 3, 35131 Padova, Italy
|
1161
|
Csaba G, Zimmer R. Vorescore--fold recognition improved by rescoring of protein structure models. Bioinformatics 2010; 26:i474-81. [PMID: 20823310 PMCID: PMC2935407 DOI: 10.1093/bioinformatics/btq369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Summary: The identification of good protein structure models and their appropriate ranking is a crucial problem in structure prediction and fold recognition. For many alignment methods, rescoring of alignment-induced models using structural information can improve the separation of useful and less useful models as compared with the alignment score. Vorescore, a template-based protein structure model rescoring system, is introduced. The method uses Vorolign to score the model structure against the template used for the modeling. The method works on models from different alignment methods and incorporates knowledge from both the prediction method and the rescoring. Results: The performance of Vorescore is evaluated in a large-scale and difficult protein structure prediction context. We use different threading methods to create models for 410 targets, in three scenarios: (i) family members are contained in the template set; (ii) superfamily members (but no family members); and (iii) only fold members (but no family or superfamily members). In all cases Vorescore significantly improves model quality (e.g. by 40% on both Gotoh and HHalign at the fold level), and clearly outperforms the state-of-the-art physics-based model scoring system Rosetta. Moreover, Vorescore improves on other successful rescoring approaches such as Pcons and ProQ. In an additional experiment we add high-quality models based on structural alignments to the set, which allows Vorescore to improve the fold recognition rate by another 50%. Availability: All models of the test set (about 2 million, 44 GB gzipped) are available upon request. Contact: csaba@bio.ifi.lmu.de; ralf.zimmer@ifi.lmu.de
Affiliation(s)
- Gergely Csaba
- Department of Informatics, Ludwig-Maximilians-Universität München, München, Germany.
|
1162
|
Karmazinova M, Beyl S, Stary-Weinzinger A, Suwattanasophon C, Klugbauer N, Hering S, Lacinova L. Cysteines in the loop between IS5 and the pore helix of CaV3.1 are essential for channel gating. Pflugers Arch 2010; 460:1015-28. [DOI: 10.1007/s00424-010-0874-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 06/03/2010] [Revised: 08/09/2010] [Accepted: 08/17/2010] [Indexed: 11/28/2022]
|
1163
|
Kotchoni SO, Jimenez-Lopez JC, Gao D, Edwards V, Gachomo EW, Margam VM, Seufferheld MJ. Modeling-dependent protein characterization of the rice aldehyde dehydrogenase (ALDH) superfamily reveals distinct functional and structural features. PLoS One 2010; 5:e11516. [PMID: 20634950 PMCID: PMC2902511 DOI: 10.1371/journal.pone.0011516] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 04/07/2010] [Accepted: 06/16/2010] [Indexed: 12/04/2022] Open
Abstract
The completion of the rice genome sequence has made it possible to identify and characterize new genes and to perform comparative genomics studies across taxa. The aldehyde dehydrogenase (ALDH) gene superfamily, which encodes NAD(P)(+)-dependent enzymes, is found in all major plant and animal taxa. However, the characterization of plant ALDHs has lagged behind that of their animal and prokaryotic homologs. In plants, ALDHs are involved in abiotic stress tolerance, male sterility restoration, embryo development, and seed viability and maturation. However, there is still no structural property-dependent functional characterization of the ALDH protein superfamily in plants. In this paper, we identify members of the rice ALDH gene superfamily and use the evolutionary nesting events of retrotransposons and protein-modeling-based structural reconstitution to report the genetic, molecular and structural features of each member of the rice ALDH superfamily in abiotic/biotic stress responses and developmental processes. Our results indicate that rice ALDHs are the most expanded plant ALDHs ever characterized. This work represents the first report of specific structural features mediating the functionality of the whole set of ALDH families in a single organism.
Affiliation(s)
- Simeon O Kotchoni
- Department of Agronomy, Purdue University, West Lafayette, Indiana, United States of America.
|
1164
|
Gao M, Skolnick J. iAlign: a method for the structural comparison of protein-protein interfaces. Bioinformatics 2010; 26:2259-65. [PMID: 20624782 DOI: 10.1093/bioinformatics/btq404] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Protein-protein interactions play an essential role in many cellular processes. The rapid accumulation of protein-protein complex structures provides an unprecedented opportunity for comparative studies of protein-protein interactions. To facilitate such studies, it is necessary to develop an accurate and efficient computational algorithm for the comparison of protein-protein interaction modes. While many structural comparison approaches have been developed for individual proteins, very few methods are available for protein-protein complexes. RESULTS: We present a novel interface alignment method, iAlign, for the structural alignment of protein-protein interfaces. New scoring schemes for measuring interface similarity are introduced, and an iterative dynamic programming algorithm is implemented. We find that the similarity scores follow extreme value distributions. Using statistical models, we empirically estimate their statistical significance, which is in good agreement with manual classifications by human experts. Large-scale tests of iAlign were conducted on both artificial docking models and experimental structures. In a benchmark test on 1517 dimers, iAlign successfully detects biologically related, structurally similar protein-protein interfaces at a coverage of 90% and an error per query of 0.05. When compared against previously published methods, iAlign is substantially more accurate and efficient. AVAILABILITY: The iAlign software package is freely available at http://cssb.biology.gatech.edu/iAlign.
Affiliation(s)
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
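The abstract above states that the similarity scores follow extreme value distributions and that significance is estimated from fitted statistical models, without giving the form used. As an illustration only, the sketch below assumes the Gumbel (type I) form and computes the probability of a score at least as large as the one observed. The location and scale parameters in the example are hypothetical placeholders, not fitted values from the paper.

```python
import math


def gumbel_p_value(score, mu, sigma):
    """P(S >= score) for a Gumbel distribution with location mu, scale sigma.

    The Gumbel form is an assumption made for this sketch.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (score - mu) / sigma
    try:
        # 1 - exp(-exp(-z)), written with expm1 for accuracy at small p.
        return -math.expm1(-math.exp(-z))
    except OverflowError:
        # exp(-z) overflows only for scores far below mu, where p is 1.
        return 1.0


if __name__ == "__main__":
    mu, sigma = 0.30, 0.05  # hypothetical fitted parameters
    for s in (0.30, 0.50, 0.70):
        print(s, gumbel_p_value(s, mu, sigma))
```

In practice the parameters would be fitted on scores from unrelated interface pairs before any p-value is reported.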
|
1165
|
Choi Y, Deane CM. FREAD revisited: Accurate loop structure prediction using a database search algorithm. Proteins 2010; 78:1431-40. [PMID: 20034110 DOI: 10.1002/prot.22658] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.6] [Indexed: 12/25/2022]
Abstract
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity, as quantified by environment-specific substitution scores, can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of the shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these; it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole-structure quality did not affect the quality of loop predictions.
Affiliation(s)
- Yoonjoo Choi
- Department of Statistics, Oxford University, United Kingdom.
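Loop accuracy in the abstract above is reported as RMSD in Å. A minimal sketch of the calculation for two coordinate sets that are already in the same frame is shown below. The function name and the example coordinates are illustrative assumptions, and no superposition step is performed, so this is not necessarily the exact protocol used in the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z).

    Assumes both structures are already expressed in the same frame;
    no optimal superposition is performed in this sketch.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Two hypothetical two-atom fragments offset by 2 angstroms along z.
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    print(rmsd(predicted, native))
```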
|
1166
|
Rajgaria R, Wei Y, Floudas CA. Contact prediction for beta and alpha-beta proteins using integer linear optimization and its impact on the first principles 3D structure prediction method ASTRO-FOLD. Proteins 2010; 78:1825-46. [PMID: 20225257 PMCID: PMC2858251 DOI: 10.1002/prot.22696] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
An integer linear optimization model is presented to predict residue contacts in beta, alpha + beta, and alpha/beta proteins. The total energy of a protein is expressed as the sum of a Cα-Cα distance-dependent contact energy contribution and a hydrophobic contribution. The model selects the contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the beta-sheet alignments. These beta-sheet alignments are used as constraints for contacts between residues of beta-sheets. This model was tested on three independent protein test sets and on CASP8 test proteins consisting of beta, alpha + beta, and alpha/beta proteins, and it was found to perform very well. The average accuracy of the predictions (for contacts separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets; they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate protein tertiary structure prediction. The proposed residue contact prediction model is incorporated into the first-principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins.
Affiliation(s)
- R. Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, USA
- Y. Wei
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, USA
- C. A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, USA
|
1167
|
Zhang J, Wang Q, Barz B, He Z, Kosztin I, Shang Y, Xu D. MUFOLD: A new solution for protein 3D structure prediction. Proteins 2010; 78:1137-52. [PMID: 19927325 DOI: 10.1002/prot.22634] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 11/10/2022]
Abstract
There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from the Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse-grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using inter-residue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full-atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root-mean-square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than that of many other tools, such as Rosetta. MUFOLD demonstrated some success in CASP8, the 2008 community-wide experiment for protein structure prediction.
Affiliation(s)
- Jingfen Zhang
- Department of Computer Science, University of Missouri, Columbia, Missouri 65211, USA
|
1168
|
Abstract
The success of ligand docking calculations typically depends on the quality of the receptor structure. Given improvements in protein structure prediction approaches, approximate protein models can now be routinely obtained for the majority of gene products in a given proteome. Structure-based virtual screening of large combinatorial libraries of lead candidates against theoretically modeled receptor structures requires fast and reliable docking techniques capable of dealing with structural inaccuracies in protein models. Here, we present Q-Dock(LHM), a method for low-resolution refinement of binding poses provided by FINDSITE(LHM), a ligand homology modeling approach. We compare its performance to that of classical ligand docking approaches in ligand docking against a representative set of experimental (both holo and apo) as well as theoretically modeled receptor structures. Docking benchmarks reveal that, unlike all-atom docking, Q-Dock(LHM) exhibits the desired tolerance to deformation of the receptor structure. Our results suggest that the use of an evolution-based approach to ligand homology modeling followed by fast low-resolution refinement is capable of achieving satisfactory performance in ligand-binding pose prediction, with promising applicability to proteome-scale applications.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
|
1169
|
Roy A, Kucukural A, Zhang Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc 2010; 5:725-38. [PMID: 20360767 PMCID: PMC2849174 DOI: 10.1038/nprot.2010.5] [Citation(s) in RCA: 4739] [Impact Index Per Article: 338.5] [Indexed: 11/09/2022]
Abstract
The iterative threading assembly refinement (I-TASSER) server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm. Starting from an amino acid sequence, I-TASSER first generates three-dimensional (3D) atomic models from multiple threading alignments and iterative structural assembly simulations. The function of the protein is then inferred by structurally matching the 3D models with other known proteins. The output from a typical server run contains full-length secondary and tertiary structure predictions, and functional annotations on ligand-binding sites, Enzyme Commission numbers and Gene Ontology terms. An estimate of the accuracy of the predictions is provided based on the confidence score of the modeling. This protocol provides new insights and guidelines for designing online server systems for state-of-the-art protein structure and function prediction. The server is available at http://zhanglab.ccmb.med.umich.edu/I-TASSER.
Affiliation(s)
- Ambrish Roy
- Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Ave, Ann Arbor, MI 48109, USA
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
- Alper Kucukural
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
- Yang Zhang
- Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Ave, Ann Arbor, MI 48109, USA
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
|
1170
|
Wang Z, Eickholt J, Cheng J. MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8. Bioinformatics 2010; 26:882-8. [PMID: 20150411 PMCID: PMC2844995 DOI: 10.1093/bioinformatics/btq058] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 10/30/2009] [Revised: 02/02/2010] [Accepted: 02/08/2010] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Protein structure prediction is one of the most important problems in structural bioinformatics. Here we describe MULTICOM, a multi-level combination approach to improve the various steps in protein structure prediction. In contrast to methods that look for the best templates, alignments and models, our approach tries to combine complementary and alternative templates, alignments and models to achieve better accuracy on average. RESULTS: The multi-level combination approach was implemented via five automated protein structure prediction servers and one human predictor, which participated in the eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8) in 2008. The MULTICOM servers and human predictor were consistently ranked among the top predictors on the CASP8 benchmark. The methods can predict moderate- to high-resolution models for most template-based targets and low-resolution models for some template-free targets. The results show that the multi-level combination of complementary templates, alternative alignments and similar models, aided by model quality assessment, can systematically improve both template-based and template-free protein modeling. AVAILABILITY: The MULTICOM server is freely available at http://casp.rnet.missouri.edu/multicom_3d.html.
Affiliation(s)
- Zheng Wang
- Department of Computer Science, Informatics Institute and C. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
|
1171
|
Girgis HZ, Corso JJ, Fischer D. On-line hierarchy of general linear models for selecting and ranking the best predicted protein structures. Annu Int Conf IEEE Eng Med Biol Soc 2010; 2009:4949-53. [PMID: 19963875 DOI: 10.1109/iembs.2009.5332706] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
To predict the three-dimensional structure of proteins, many computational methods sample the conformational space, generating a large number of candidate structures. Subsequently, such methods rank the generated structures using a variety of model quality assessment programs in order to obtain a small set of structures that are most likely to resemble the unknown experimentally determined structure. Model quality assessment programs suffer from two main limitations: (i) the rank-one structure is not always the best predicted structure; in other words, the best predicted structure could be ranked, for example, 10th; and (ii) no single assessment method can correctly rank the predicted structures for all target proteins. However, because at least some of the methods often achieve a good ranking, a model quality assessment method that is based on a consensus of several model quality assessment methods is likely to perform better. We have devised the STPdata algorithm, a consensus method based on five model quality assessment programs. We have applied it to build an on-line "custom-trained" hierarchy of general linear models to select and rank the best predicted structures. By "custom-trained", we mean that for each target protein the STPdata algorithm trains a unique model on data related to the input target protein. To evaluate our method, we participated in CASP8 as human predictors. In CASP8, the STPdata algorithm trained 128 hierarchical models, one for each of the 128 target proteins. Based on the official results of CASP8, our method outperformed the best server by 6% and placed fourth among human predictors. Our CASP results are based purely on computational methods without any human intervention.
Affiliation(s)
- Hani Zakaria Girgis
- Computer Science Department, The Johns Hopkins University, Baltimore, MD, USA.
|
1172
|
Pascual-García A, Abia D, Méndez R, Nido GS, Bastolla U. Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins 2010; 78:181-96. [PMID: 19830831 DOI: 10.1002/prot.22616] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Abstract
The molecular clock hypothesis, stating that protein sequences diverge in evolution by accumulating amino acid substitutions at an almost constant rate, played a major role in the development of molecular evolution and boosted quantitative theories of evolutionary change. These studies were extended to protein structures by the seminal paper by Chothia and Lesk, which established the approximate proportionality between structure and sequence divergence. Here we analyse how function influences the relationship between sequence and structure divergence, studying four large superfamilies of evolutionarily related proteins: globins, aldolases, P-loop and NADP-binding. We introduce the contact divergence, which is more consistent with sequence divergence than previously used structure divergence measures. Our main findings are: (1) Small structure and sequence divergences are proportional, consistent with the molecular clock. Approximate validity of the clock is also supported by the analysis of the clustering coefficient of structure similarity networks. (2) Functional constraints strongly limit the structure divergence of proteins performing the same function and may allow the identification of incomplete or wrong functional annotations. (3) The rate of structure versus sequence divergence is larger for proteins performing different functions than for proteins performing the same function. We conjecture that this acceleration is due to positive selection for new functions. Accelerations in structure divergence are also suggested by the analysis of the clustering coefficient. (4) For low sequence identity, structural diversity explodes. We conjecture that this explosion is related to functional diversification. (5) Large indels are almost always associated with function changes.
|
1173
|
Zhou F, Chen H, Xu Y. GASdb: a large-scale and comparative exploration database of glycosyl hydrolysis systems. BMC Microbiol 2010; 10:69. [PMID: 20202206 PMCID: PMC2838879 DOI: 10.1186/1471-2180-10-69] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/04/2009] [Accepted: 03/04/2010] [Indexed: 11/29/2022] Open
Abstract
Background The genomes of numerous cellulolytic organisms have recently been sequenced or are in the pipeline to be sequenced. Analyses of these genomes as well as the recently sequenced metagenomes in a systematic manner could possibly lead to discoveries of novel biomass-degradation systems in nature. Description We have identified 4,679 and 49,099 free acting glycosyl hydrolases with or without carbohydrate binding domains, respectively, by scanning through all the proteins in the UniProt Knowledgebase and the JGI Metagenome database. Cellulosome components were observed only in bacterial genomes, and 166 cellulosome-dependent glycosyl hydrolases were identified. We observed, from our analysis data, unexpectedly wide distributions of two less well-studied bacterial glycosyl hydrolysis systems in which glycosyl hydrolases may bind to the cell surface directly rather than through linking to surface anchoring proteins, or cellulosome complexes may bind to the cell surface by novel mechanisms other than the commonly used SLH domains. In addition, we found that animal-gut metagenomes are substantially enriched with novel glycosyl hydrolases. Conclusions The biomass degradation systems identified through our large-scale search are organized into an easy-to-use database, GASdb, at http://csbl.bmb.uga.edu/~ffzhou/GASdb/, which should be useful to both experimental and computational biofuel researchers.
Affiliation(s)
- Fengfeng Zhou
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
|
1174
|
Faraggi E, Yang Y, Zhang S, Zhou Y. Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction. Structure 2010; 17:1515-27. [PMID: 19913486 DOI: 10.1016/j.str.2009.09.006] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Received: 04/15/2009] [Revised: 09/01/2009] [Accepted: 09/03/2009] [Indexed: 11/30/2022]
Abstract
Local structures predicted from protein sequences are used extensively in every aspect of modeling and prediction of protein structure and function. For more than 50 years, they have been predicted at a low-resolution coarse-grained level (e.g., three-state secondary structure). Here, we combine a two-state classifier with a real-value predictor to predict local structure in a continuous representation by backbone torsion angles. The accuracy of the angles predicted by this approach is close to that derived from NMR chemical shifts. Their substitution for predicted secondary structure as restraints for ab initio structure prediction doubles the success rate. This result demonstrates the potential of predicted local structure for fragment-free tertiary-structure prediction. It further implies potentially significant benefits from using predicted real-valued torsion angles as a replacement for or supplement to the secondary-structure prediction tools used almost exclusively in many computational methods ranging from sequence alignment to function prediction.
Affiliation(s)
- Eshel Faraggi
- Indiana University School of Informatics, Indiana University-Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
1175
|
Buck PM, Bystroff C. Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field. Proteins 2010; 76:331-42. [PMID: 19137613 DOI: 10.1002/prot.22348] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for alpha-carbon virtual bond opening and dihedral angles, pair-wise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as alpha-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways.
Affiliation(s)
- Patrick M Buck
- Department of Biology, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York, USA
|
1176
|
Margelevicius M, Venclovas C. Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison. BMC Bioinformatics 2010; 11:89. [PMID: 20158924 PMCID: PMC2837030 DOI: 10.1186/1471-2105-11-89] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 08/28/2009] [Accepted: 02/17/2010] [Indexed: 01/31/2023] Open
Abstract
Background Detection of common evolutionary origin (homology) is a primary means of inferring protein structure and function. At present, comparison of protein families represented as sequence profiles is arguably the most effective homology detection strategy. However, finding the best way to represent evolutionary information of a protein sequence family in the profile, to compare profiles and to estimate the biological significance of such comparisons, remains an active area of research. Results Here, we present a new homology detection method based on sequence profile-profile comparison. The method has a number of new features including position-dependent gap penalties and a global score system. Position-dependent gap penalties provide a more biologically relevant way to represent and align protein families as sequence profiles. The global score system enables an analytical solution of the statistical parameters needed to estimate the statistical significance of profile-profile similarities. The new method, together with other state-of-the-art profile-based methods (HHsearch, COMPASS and PSI-BLAST), is benchmarked in all-against-all comparison of a challenging set of SCOP domains that share at most 20% sequence identity. For benchmarking, we use a reference ("gold standard") free model-based evaluation framework. Evaluation results show that at the level of protein domains our method compares favorably to all other tested methods. We also provide examples of the new method outperforming structure-based similarity detection and alignment. The implementation of the new method both as a standalone software package and as a web server is available at http://www.ibt.lt/bioinformatics/coma. Conclusion Due to a number of developments, the new profile-profile comparison method shows an improved ability to match distantly related protein domains. Therefore, the method should be useful for annotation and homology modeling of uncharacterized proteins.
|
1177
|
Abstract
MOTIVATION Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves cannot provide information on how significant the structural similarity is. Also, a quantitative relation between the scores and conventional fold classifications is lacking. This article aims to answer two questions: (i) What is the statistical significance of TM-score? (ii) What is the probability of two proteins having the same fold given a specific TM-score? RESULTS We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. For a TM-score of 0.5, for instance, the P-value is 5.5 × 10^-7, which means we need to consider at least 1.8 million random protein pairs to acquire a TM-score of no less than 0.5. Second, we examine the posterior probability of proteins having the same fold in three datasets: SCOP, CATH and the consensus of SCOP and CATH. It is found that the posterior probability from different datasets has a similar rapid phase transition around TM-score=0.5. This finding indicates that TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score >0.5 are mostly in the same fold while those with a TM-score <0.5 are mainly not in the same fold.
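The arithmetic behind the figures quoted in this abstract can be checked with a short sketch. The P-value of 5.5 × 10^-7 and the 0.5 cutoff are taken from the abstract; the function names are hypothetical, and the same-fold test is only the approximate rule of thumb the abstract describes, not the authors' code.

```python
# Illustrative sketch only; all names are hypothetical, not from the paper.

P_VALUE_AT_TM_0_5 = 5.5e-7  # P-value quoted in the abstract for TM-score = 0.5
SAME_FOLD_CUTOFF = 0.5      # approximate fold threshold quoted in the abstract


def expected_pairs_for_one_hit(p_value: float) -> float:
    """Expected number of random pairs to examine before one reaches the score."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return 1.0 / p_value


def likely_same_fold(tm_score: float, cutoff: float = SAME_FOLD_CUTOFF) -> bool:
    """Rule of thumb: a TM-score above the cutoff suggests the same fold."""
    return tm_score > cutoff


if __name__ == "__main__":
    pairs = expected_pairs_for_one_hit(P_VALUE_AT_TM_0_5)
    print(f"about {pairs / 1e6:.1f} million random pairs")
    print(likely_same_fold(0.62), likely_same_fold(0.31))
```

The reciprocal of the P-value gives roughly 1.8 million pairs, which matches the figure stated in the abstract.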
Affiliation(s)
- Jinrui Xu
- Department of Medical School, Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
|
1178
|
Jefferys BR, Kelley LA, Sternberg MJE. Protein folding requires crowd control in a simulated cell. J Mol Biol 2010; 397:1329-38. [PMID: 20149797 PMCID: PMC2891488 DOI: 10.1016/j.jmb.2010.01.074] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Received: 09/25/2009] [Revised: 12/30/2009] [Accepted: 01/02/2010] [Indexed: 11/09/2022]
Abstract
Macromolecular crowding has a profound effect upon biochemical processes in the cell. We have computationally studied the effect of crowding upon protein folding for 12 small domains in a simulated cell using a coarse-grained protein model, which is based upon Langevin dynamics, designed to unify the often disjoint goals of protein folding simulation and structure prediction. The model can make predictions of native conformation with accuracy comparable with that of the best current template-free models. It is fast enough to enable a more extensive analysis of crowding than previously attempted, studying several proteins at many crowding levels and further random repetitions designed to more closely approximate the ensemble of conformations. We found that when crowding approaches 40% excluded volume, the maximum level found in the cell, proteins fold to fewer native-like states. Notably, when crowding is increased beyond this level, there is a sudden failure of protein folding: proteins fix upon a structure more quickly and become trapped in extended conformations. These results suggest that the ability of small protein domains to fold without the help of chaperones may be an important factor in limiting the degree of macromolecular crowding in the cell. Here, we discuss the possible implications regarding the relationship between protein expression level, protein size, chaperone activity and aggregation.
Affiliation(s)
- Benjamin R Jefferys
- Division of Molecular Biosciences, Biochemistry Building, Imperial College London, South Kensington, London SW7 2AZ, UK.
|
1179
|
Wolff K, Vendruscolo M, Porto M. Efficient identification of near-native conformations in ab initio protein structure prediction using structural profiles. Proteins 2010; 78:249-58. [PMID: 19701942 DOI: 10.1002/prot.22533] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
One of the major bottlenecks in many ab initio protein structure prediction methods is currently the selection of a small number of candidate structures for high-resolution refinement from large sets of low-resolution decoys. This step often includes a scoring by low-resolution energy functions and a clustering of conformations by their pairwise root mean square deviations (RMSDs). As an efficient selection is crucial to reduce the overall computational cost of the predictions, any improvement in this direction can increase the overall performance of the predictions and the range of protein structures that can be predicted. We show here that the use of structural profiles, which can be predicted with good accuracy from the amino acid sequences of proteins, provides an efficient means to identify good candidate structures.
Affiliation(s)
- Katrin Wolff
- Institut für Festkörperphysik, Technische Universität Darmstadt, 64289 Darmstadt, Germany
|
1180
|
Evans P, Sacan A, Ungar L, Tozeren A. Sequence alignment reveals possible MAPK docking motifs on HIV proteins. PLoS One 2010; 5:e8942. [PMID: 20126615 PMCID: PMC2812490 DOI: 10.1371/journal.pone.0008942] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/08/2009] [Accepted: 01/11/2010] [Indexed: 01/18/2023] Open
Abstract
Over the course of HIV infection, virus replication is facilitated by the phosphorylation of HIV proteins by human ERK1 and ERK2 mitogen-activated protein kinases (MAPKs). MAPKs are known to phosphorylate their substrates by first binding with them at a docking site. Docking site interactions could be viable drug targets because the sequences guiding them are more specific than phosphorylation consensus sites. In this study we use multiple bioinformatics tools to discover candidate MAPK docking site motifs on HIV proteins known to be phosphorylated by MAPKs, and we discuss the possibility of targeting docking sites with drugs. Using sequence alignments of HIV proteins of different subtypes, we show that MAPK docking patterns previously described for human proteins appear on the HIV matrix, Tat, and Vif proteins in a strain-dependent manner, are absent from HIV Rev, and appear on all HIV Nef strains. We revise the regular expressions of previously annotated MAPK docking patterns in order to provide a subtype-independent motif that annotates all HIV proteins. One revision is based on a documented human variant of one of the substrate docking motifs, and the other reduces the number of required basic amino acids in the standard docking motifs from two to one. The proposed patterns are shown to be consistent with in silico docking between ERK1 and the HIV matrix protein. The motif usage on HIV proteins is sufficiently different from human proteins in amino acid sequence similarity to allow for HIV-specific targeting using small-molecule drugs.
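The idea of relaxing a docking motif from two required basic residues to one can be illustrated with regular expressions. The abstract does not give the actual expressions, so the two patterns below are simplified, hypothetical stand-ins (basic residues, a short spacer, then hydrophobic-x-hydrophobic), and the peptide strings are made up for the example; none of this reproduces the paper's motifs or data.

```python
import re

# Hypothetical stand-in patterns, NOT the expressions from the paper.
HYDROPHOBIC = "[LIVM]"
# "Standard" form in this sketch: two basic residues before the spacer.
STRICT = re.compile(rf"[KR]{{2}}.{{2,6}}{HYDROPHOBIC}.{HYDROPHOBIC}")
# Relaxed form in this sketch: a single basic residue is enough.
RELAXED = re.compile(rf"[KR].{{2,6}}{HYDROPHOBIC}.{HYDROPHOBIC}")


def find_motifs(pattern: re.Pattern, sequence: str) -> list:
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Made-up toy peptides: one with two basic residues, one with only one.
    for seq in ("AKRSPALTLPG", "AKSSPALTLPG"):
        print(seq, find_motifs(STRICT, seq), find_motifs(RELAXED, seq))
```

In this toy case the relaxed pattern matches both peptides while the strict one matches only the first, which is the kind of broadened coverage the abstract attributes to its revised motifs.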
Affiliation(s)
- Perry Evans
- Genomics and Computational Biology and Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Ahmet Sacan
- Center for Integrated Bioinformatics, School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, Pennsylvania, United States of America
- Lyle Ungar
- Genomics and Computational Biology and Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Aydin Tozeren
- Center for Integrated Bioinformatics, School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, Pennsylvania, United States of America
|
1181
|
Remmert M, Biegert A, Linke D, Lupas AN, Söding J. Evolution of outer membrane beta-barrels from an ancestral beta beta hairpin. Mol Biol Evol 2010; 27:1348-58. [PMID: 20106904 DOI: 10.1093/molbev/msq017] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Indexed: 11/14/2022] Open
Abstract
Outer membrane beta-barrels (OMBBs) are the major class of outer membrane proteins from Gram-negative bacteria, mitochondria, and plastids. Their transmembrane domains consist of 8-24 beta-strands forming a closed, barrel-shaped beta-sheet around a central pore. Despite their obvious structural regularity, evidence for an origin by duplication or for a common ancestry has not been found. We use three complementary approaches to show that all OMBBs from Gram-negative bacteria evolved from a single, ancestral beta beta hairpin. First, we link almost all families of known single-chain bacterial OMBBs with each other through transitive profile searches. Second, we identify a clear repeat signature in the sequences of many OMBBs in which the repeating sequence unit coincides with the structural beta beta hairpin repeat. Third, we show that the observed sequence similarity between OMBB hairpins cannot be explained by structural or membrane constraints on their sequences. The third approach addresses a longstanding problem in protein evolution: how to distinguish between a very remotely homologous relationship and the opposing scenario of "sequence convergence." The origin of a diverse group of proteins from a single hairpin module supports the hypothesis that, around the time of transition from the RNA to the protein world, proteins arose by amplification and recombination of short peptide modules that had previously evolved as cofactors of RNAs.
Affiliation(s)
- M Remmert
- Department of Biochemistry, Gene Center Munich and Center for Integrated Protein Science (CIPSM), Ludwig-Maximilians-Universität München, Munich, Germany
|
1182
|
Hildebrand A, Remmert M, Biegert A, Söding J. Fast and accurate automatic structure prediction with HHpred. Proteins 2010; 77 Suppl 9:128-32. [PMID: 19626712 DOI: 10.1002/prot.22499] [Citation(s) in RCA: 345] [Impact Index Per Article: 24.6] [Indexed: 11/07/2022]
Abstract
Automated protein structure prediction is becoming a mainstream tool for biological research. This has been fueled by steady improvements of publicly available automated servers over the last decade, in particular their ability to build good homology models for an increasing number of targets by reliably detecting and aligning more and more remotely homologous templates. Here, we describe the three fully automated versions of the HHpred server that participated in the community-wide blind protein structure prediction competition CASP8. What makes HHpred unique is the combination of usability, short response times (typically under 15 min) and a model accuracy that is competitive with those of the best servers in CASP8.
Affiliation(s)
- Andrea Hildebrand
- Gene Center and Center for Integrated Protein Science (Munich), Ludwig-Maximilians-University Munich, 81377 Munich, Germany
|
1183
|
Zhou H, Pandit SB, Skolnick J. Performance of the Pro-sp3-TASSER server in CASP8. Proteins 2010; 77 Suppl 9:123-7. [PMID: 19639638 DOI: 10.1002/prot.22501] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 12/29/2022]
Abstract
The performance of the protein structure prediction server pro-sp3-TASSER in CASP8 is described. Compared to CASP7, the major improvement in prediction is in the quality of input models to TASSER. These improvements are due to the PRO-SP(3) threading method, the improved quality of contact predictions provided by TASSER_2.0, multiple short TASSER simulations for building the full-length model, and the accuracy of model selection using the TASSER-QA quality assessment method. Finally, we analyze the overall performance and highlight some successful predictions of the pro-sp3-TASSER server.
Affiliation(s)
- Hongyi Zhou
- Center for Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|
1184
|
Larsson P, Skwark MJ, Wallner B, Elofsson A. Assessment of global and local model quality in CASP8 using Pcons and ProQ. Proteins 2010; 77 Suppl 9:167-72. [PMID: 19544566 DOI: 10.1002/prot.22476] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Indexed: 11/09/2022]
Abstract
Model Quality Assessment Programs (MQAPs) are programs developed to rank protein models. These methods can be trained to predict the overall global quality of a model or which local regions in a model are likely to be incorrect. In CASP8, we participated with two predictors that predict both global and local quality using either consensus information, Pcons, or purely structural information, ProQ. Consistent with results in previous CASPs, the best performance in CASP8 was obtained using the Pcons method. Furthermore, the results show that the modification introduced into Pcons for CASP8 improved the predictions against GDT_TS and now a correlation coefficient above 0.9 is achieved, whereas the correlation for ProQ is about 0.7. The correlation is better for the easier than for the harder targets, but it is not below 0.5 for a single target and below 0.7 only for three targets. The correlation coefficient for the best local quality MQAP is 0.68, showing that there is still clear room for improvement within this area. We also find that Pcons is still not always able to identify the best model. However, we show that using a linear combination of Pcons and ProQ it is possible to select models that are better than the models from the best single server. In particular, the average quality over the hard targets increases by about 6% compared with using Pcons alone.
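Selecting a model with a linear combination of a consensus score and a structure-based score can be sketched as follows. The abstract does not report the coefficients used, so the weight here is an arbitrary assumption, and the function names and toy scores are hypothetical illustrations rather than actual Pcons or ProQ output.

```python
from typing import Dict, Tuple

# Hypothetical weight on the consensus score; the abstract gives no coefficients.
DEFAULT_WEIGHT = 0.8


def combined_score(consensus: float, structural: float,
                   weight: float = DEFAULT_WEIGHT) -> float:
    """Linear combination of a consensus score and a single-model score."""
    return weight * consensus + (1.0 - weight) * structural


def select_model(scores: Dict[str, Tuple[float, float]],
                 weight: float = DEFAULT_WEIGHT) -> str:
    """Return the model name with the highest combined score."""
    return max(scores, key=lambda name: combined_score(*scores[name], weight))


if __name__ == "__main__":
    # Made-up (consensus, structural) scores for three candidate models.
    toy = {
        "model_A": (0.70, 0.40),
        "model_B": (0.68, 0.75),
        "model_C": (0.50, 0.60),
    }
    print(select_model(toy, weight=1.0))  # consensus score alone
    print(select_model(toy))              # combined score
```

With these toy numbers the consensus score alone picks model_A, while the combination picks model_B, showing how the structural term can change the selection when consensus scores are close.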
Affiliation(s)
- Per Larsson
- Department of Biochemistry and Biophysics, Center for Biomembrane Research, Stockholm Bioinformatics Center, Stockholm University, SE-10691 Stockholm, Sweden
|
1185
|
Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A. Evaluation of template-based models in CASP8 with standard measures. Proteins 2010; 77 Suppl 9:18-28. [PMID: 19731382 DOI: 10.1002/prot.22561] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.7] [Indexed: 12/27/2022]
Abstract
The strategy for evaluating template-based models submitted to CASP has continuously evolved from CASP1 to CASP5, leading to a standard procedure that has been used in all subsequent editions. The established approach includes methods for calculating the quality of each individual model, for assigning scores based on the distribution of the results for each target and for computing the statistical significance of the differences in scores between prediction methods. These data are made available to the assessor of the template-based modeling category, who uses them as a starting point for further evaluations and analyses. This article describes the detailed workflow of the procedure, provides justifications for a number of choices that are customarily made for CASP data evaluation, and reports the results of the analysis of template-based predictions at CASP8.
Affiliation(s)
- Domenico Cozzetto
- Department of Biochemical Sciences, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland, Rockville, Maryland 20850
- Burkhard Rost
- Department of Biochemistry and Molecular Biophysics, Columbia University, Northeast Structural Genomics Consortium (NESG) and New York Consortium on Membrane Proteins (NYCOMPS), Columbia University, New York, New York 10032
- Anna Tramontano
- Department of Biochemical Sciences, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy; Istituto Pasteur-Fondazione Cenci Bolognetti, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy
|
1186
|
Benkert P, Tosatto SCE, Schwede T. Global and local model quality estimation at CASP8 using the scoring functions QMEAN and QMEANclust. Proteins 2010; 77 Suppl 9:173-80. [PMID: 19705484 DOI: 10.1002/prot.22532] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 11/10/2022]
Abstract
Identifying the best candidate model among an ensemble of alternatives is crucial in protein structure prediction. For this purpose, scoring functions have been developed which either calculate a quality estimate on the basis of a single model or derive a score from the information contained in the ensemble of models generated for a given sequence (i.e., consensus methods). At CASP7, consensus methods performed considerably better than scoring functions operating on single models. However, consensus methods tend to fail if the best models are far from the center of the dominant structural cluster. At CASP8, we investigated whether our hybrid method QMEANclust may overcome this limitation by combining the QMEAN composite scoring function operating on single models with consensus information. We participated with four different scoring functions in the quality assessment category. The QMEANclust consensus scoring function turned out to be a successful method both for ranking entire models and, especially, for estimating per-residue model quality. In this article, we briefly describe the two scoring functions QMEAN and QMEANclust and discuss their performance in the context of what went right and wrong at CASP8. Both scoring functions are publicly available at http://swissmodel.expasy.org/qmean/.
Affiliation(s)
- Pascal Benkert
- Biozentrum, University of Basel, Basel 4056, Switzerland
|
1187
|
Abstract
The I-TASSER algorithm for 3D protein structure prediction was tested in CASP8, with the procedure fully automated in both the Server and Human sections. The quality of the server models is close to that of the human ones, but the human predictions incorporate more diverse templates from other servers, which improves the human predictions for some of the distant homology targets. For the first time, the sequence-based contact predictions from machine learning techniques are found helpful for both template-based modeling (TBM) and template-free modeling (FM). In TBM, although the accuracy of the sequence-based contact predictions is on average lower than that of template-based ones, the novel contacts in the sequence-based predictions, which are complementary to the threading templates in the weakly aligned or unaligned regions, are important for improving the global and local packing in these regions. Moreover, the newly developed atomic structural refinement algorithm was tested in CASP8 and found to improve the hydrogen-bonding networks and the overall TM-score, which is mainly due to its ability to remove steric clashes so that the models can be generated from cluster centroids. Nevertheless, one of the major issues of the I-TASSER pipeline is model selection: the best models could not be appropriately recognized when the correct templates were detected by only a minority of the threading algorithms. There are also problems related to domain splitting and mirror image recognition, which mainly influence the performance of I-TASSER modeling in the FM-based structure predictions.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, Kansas 66047, USA.
|
1188
|
Computational and single-molecule force studies of a macro domain protein reveal a key molecular determinant for mechanical stability. Proc Natl Acad Sci U S A 2010; 107:1989-94. [PMID: 20080695 DOI: 10.1073/pnas.0905796107] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 01/05/2023]
Abstract
Resolving molecular determinants of the mechanical stability of proteins is crucial in the rational design of advanced biomaterials for use in biomedical and nanotechnological applications. Here we present an interdisciplinary study combining bioinformatics screening, steered molecular dynamics simulations, protein engineering, and single-molecule force spectroscopy that explores the mechanical properties of a macro domain protein with mixed alpha + beta topology. The unique architecture is defined by a single seven-stranded beta-sheet in the core of the protein flanked by five alpha-helices. Unlike mechanically stable proteins studied thus far, the macro domain provides the distinct advantage of having the key load-bearing hydrogen bonds (H bonds) buried in the hydrophobic core, protected from water attack. This feature allows direct measurement of the force required to break apart the load-bearing H bonds under locally hydrophobic conditions. Steered molecular dynamics simulations using constant-velocity and constant-force methods predicted extremely high mechanical stability of the macro domain. Single-molecule force spectroscopy experiments confirmed the exceptional mechanical strength of the macro domain, measuring a rupture force as high as 570 pN. Furthermore, through selective deletion of shielding peptide segments, we examined the same key H bonds in hydrophilic environments in which the beta-strands are exposed to solvent, and verified that the high mechanical stability of the macro domain results from excellent shielding of the load-bearing H bonds from competing water. Our study reveals that shielding the load-bearing strands from water is a critical molecular determinant for enhancing the mechanical stability of proteins.
|
1189
|
Brylinski M, Skolnick J. Comparison of structure-based and threading-based approaches to protein functional annotation. Proteins 2010; 78:118-34. [PMID: 19731377 PMCID: PMC2804779 DOI: 10.1002/prot.22566] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
Abstract
To exploit the vast amount of sequence information provided by the genomic revolution, the biological function of these sequences must be identified. As a practical matter, this is often accomplished by functional inference. Purely sequence-based approaches, particularly in the "twilight zone" of low sequence similarity, are complicated by many factors. For proteins, structure-based techniques aim to overcome these problems; however, most require high-quality crystal structures and suffer from complex and equivocal relations between protein fold and function. In this study, through extensive benchmarking, we consider a number of aspects of structure-based functional annotation: binding pocket detection, molecular function assignment and ligand-based virtual screening. We demonstrate that protein threading driven by a strong sequence profile component greatly improves the quality of purely structure-based functional annotation in the "twilight zone." By detecting evolutionarily related proteins, it considerably reduces the high false positive rate of function inference based on global structure similarity alone. Combined evolution/structure-based function assignment emerges as a powerful technique that can make a significant contribution to comprehensive proteome annotation.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
|
1190
|
Prediction of calcium-binding sites by combining loop-modeling with machine learning. BMC Structural Biology 2009; 9:72. [PMID: 20003365 PMCID: PMC2808310 DOI: 10.1186/1472-6807-9-72] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 06/16/2009] [Accepted: 12/11/2009] [Indexed: 01/23/2023]
Abstract
Background: Protein ligand-binding sites in the apo state exhibit structural flexibility. This flexibility often frustrates methods for structure-based recognition of these sites because it leads to the absence of electron density for these critical regions, particularly when they are in surface loops. Methods for recognizing functional sites in these missing loops would be useful for recovering additional functional information. Results: We report a hybrid approach for recognizing calcium-binding sites in disordered regions. Our approach combines loop modeling with a machine learning method (FEATURE) for structure-based site recognition. For validation, we compared the performance of our method on known calcium-binding sites for which there are both holo and apo structures. When loops in the apo structures are rebuilt using modeling methods, FEATURE identifies 14 out of 20 crystallographically proven calcium-binding sites. It recognizes only 7 out of 20 calcium-binding sites in the initial apo crystal structures. We applied our method to unstructured loops in proteins from SCOP families known to bind calcium in order to discover potential cryptic calcium-binding sites. We built 2745 missing loops and evaluated them for potential calcium binding. We made 102 predictions of calcium-binding sites. Ten predictions are consistent with independent experimental verifications. We found indirect experimental evidence for 14 other predictions. The remaining 78 predictions are novel, some with intriguing potential biological significance. In particular, we see an enrichment of beta-sheet folds with predicted calcium-binding sites in the connecting loops on the surface that may be important for calcium-mediated function switches. Conclusion: Protein crystal structures are a potentially rich source of functional information. When loops are missing in these structures, we may be losing important information about binding sites and active sites. We have shown that limited loop modeling (e.g. loops of fewer than 17 residues) combined with pattern matching algorithms can recover functions and propose putative conformations associated with these functions.
|
1191
|
Expression and structural characterization of peripherin/RDS, a membrane protein implicated in photoreceptor outer segment morphology. European Biophysics Journal 2009; 39:679-88. [DOI: 10.1007/s00249-009-0553-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/11/2009] [Revised: 10/01/2009] [Accepted: 10/09/2009] [Indexed: 10/20/2022]
|
1192
|
Májek P, Elber R. A coarse-grained potential for fold recognition and molecular dynamics simulations of proteins. Proteins 2009; 76:822-36. [PMID: 19291741 DOI: 10.1002/prot.22388] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/11/2022]
Abstract
A coarse-grained potential for protein simulations and fold ranking is presented. The potential is based on a two-point model of individual amino acids and a specific implementation of hydrogen bonding. Parameters are determined for distance-dependent pair interactions, pseudo bonds, angles, and torsions. A scaling factor for a hydrogen bonding term is also determined. Iterative sampling for 4867 proteins reproduces distributions of internal coordinates and distances observed in the Protein Data Bank. The adjustment of the potential and resampling are in the spirit of the generalized ensemble approach. No native structure information (e.g., secondary structure) is used in the calculation of the potential or in the simulation of a particular protein. The potential is subjected to two tests: (i) simulations of 956 globular proteins in the neighborhood of their native folds (these proteins were not used in the training set) and (ii) discrimination between native and decoy structures for 2470 proteins with 305,000 decoys and the "Decoys 'R' Us" dataset. In the first test, 58% of tested proteins stay within 5 Å of the native fold in molecular dynamics simulations of more than 20 nanoseconds using the new potential. The potential is also useful in differentiating between correct and approximate folds, providing a significant signal for structure prediction algorithms. Sampling with the potential consistently regenerates the distribution of distances and internal coordinates it learned. Nevertheless, during molecular dynamics simulations, structures are found that reproduce the learned distributions but are far from the native fold.
Affiliation(s)
- Peter Májek
- Department of Computer Science, Upson Hall 4130, Cornell University, Ithaca, New York 14853-7501, USA
|
1193
|
McGuffin LJ, Roche DB. Rapid model quality assessment for protein structure predictions using the comparison of multiple models without structural alignments. Bioinformatics 2009; 26:182-8. [PMID: 19897565 DOI: 10.1093/bioinformatics/btp629] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022]
Abstract
Motivation: The accurate prediction of the quality of 3D models is a key component of successful protein tertiary structure prediction methods. Currently, clustering- or consensus-based Model Quality Assessment Programs (MQAPs) are the most accurate methods for predicting 3D model quality; however, they are often CPU intensive as they carry out multiple structural alignments in order to compare numerous models. In this study, we describe ModFOLDclustQ, a novel MQAP that compares 3D models of proteins without the need for CPU-intensive structural alignments by utilizing the Q measure for model comparisons. The ModFOLDclustQ method is benchmarked against the top established methods in terms of both accuracy and speed. In addition, the ModFOLDclustQ scores are combined with those from our older ModFOLDclust method to form a new method, ModFOLDclust2, that aims to provide increased prediction accuracy with negligible computational overhead. Results: The ModFOLDclustQ method is competitive with leading clustering-based MQAPs for the prediction of global model quality, yet it is up to 150 times faster than the previous version of the ModFOLDclust method at comparing models of small proteins (<60 residues) and over five times faster at comparing models of large proteins (>800 residues). Furthermore, a significant improvement in accuracy can be gained over the previous clustering-based MQAPs by combining the scores from ModFOLDclustQ and ModFOLDclust to form the new ModFOLDclust2 method, with little impact on the overall time taken for each prediction. Availability: The ModFOLDclustQ and ModFOLDclust2 methods are available to download from http://www.reading.ac.uk/bioinf/downloads/.
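An alignment-free comparison of this kind works on internal distances, which do not change when a model is rotated or translated. The sketch below shows one common Gaussian form of a Q-like score; the abstract does not give the exact formula used by ModFOLDclustQ, so the functional form, the `sigma` exponent of 0.15, the `min_separation` parameter, and the function name are assumptions for illustration only.

```python
import math
from itertools import combinations


def q_score(coords_a, coords_b, min_separation=2):
    """Q-like similarity between two models of the same sequence.

    Compares intra-model residue-residue distances, so no superposition
    or structural alignment is needed. Returns a value in (0, 1]; 1 means
    all compared distances agree.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("models must contain the same number of residues")
    total = 0.0
    pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        if j - i < min_separation:
            continue  # skip adjacent residues, whose distance is nearly fixed
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        sigma = (j - i) ** 0.15  # assumed tolerance, growing with sequence separation
        total += math.exp(-((d_a - d_b) ** 2) / (2.0 * sigma ** 2))
        pairs += 1
    return total / pairs if pairs else 0.0


# Toy C-alpha traces: a straight chain, a translated copy, and a distorted copy.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in chain]
distorted = chain[:3] + [(20.0, 0.0, 0.0)]

q_same = q_score(chain, shifted)         # translation leaves distances unchanged
q_distorted = q_score(chain, distorted)  # changed distances lower the score
```

Because each comparison is a single pass over residue pairs with no optimization step, the cost per model pair is predictable, which is consistent with the speed advantage the abstract reports over alignment-based comparison.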
Affiliation(s)
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK.
|
1194
|
|
1195
|
Abstract
Undertaker is a program designed to help predict protein structure using alignments to proteins of known structure and fragment assembly. The program generates conformations and uses cost functions to select the best structures from among the generated conformations. This paper describes the use of Undertaker's cost functions for model quality assessment. We achieve an accuracy that is similar to other methods, without using consensus-based techniques. Adding consensus-based features further improves our approach substantially. We report several correlation measures, including a new weighted version of Kendall's tau (tau(3)) and show model quality assessment results superior to previously published results on all correlation measures when using only models with no missing atoms.
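Rank correlation between predicted and observed model quality can be sketched as follows. The abstract does not define the weighted tau(3) variant, so this example shows only the ordinary unweighted Kendall's tau (tau-a); the function name and the sample numbers are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau(predicted, observed):
    """Unweighted Kendall's tau-a between two equally long score lists.

    Counts concordant minus discordant pairs over all pairs; tied pairs
    contribute zero. Returns a value in [-1, 1].
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two lists of equal length with at least two items")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (predicted[i] - predicted[j]) * (observed[i] - observed[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical predicted quality scores versus measured quality for four models.
predicted_quality = [0.2, 0.4, 0.6, 0.8]
measured_quality = [0.1, 0.3, 0.9, 0.7]  # the last two models are swapped in rank
tau = kendall_tau(predicted_quality, measured_quality)
```

A weighted variant would typically give more influence to pairs involving the best models, since ranking errors among poor models matter less for model selection.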
Affiliation(s)
- John Archie
- University of California at Santa Cruz, Biomolecular Engineering, Santa Cruz, CA, USA
|
1196
|
Soriano-Ursúa MA, Trujillo-Ferrara JG, Correa-Basurto J. Homology modeling and flex-ligand docking studies on the guinea pig beta(2) adrenoceptor: structural and experimental similarities/differences with the human beta(2). J Mol Model 2009; 15:1203-11. [PMID: 19263094 DOI: 10.1007/s00894-009-0480-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 12/29/2008] [Accepted: 01/24/2009] [Indexed: 02/07/2023]
Abstract
The guinea pig trachea is widely used in drug development assays focused on the treatment of pulmonary diseases. Some of these drugs relax the airways by binding to the guinea pig beta(2)-adrenoceptor (Gbeta(2)AR). In this work, the amino acid sequence of the Gbeta(2)AR was retrieved to carry out homology modeling, using the Swiss-Model server, with the human beta(2)AR as the parent template. The Gbeta(2)AR 3-D structure was structurally and energetically optimized in vacuo using the NAMD 2.6 program. The refined 3-D model obtained was used for further study. Molecular docking simulations were performed by testing a set of well-known beta(2)AR ligands using the AutoDock 3.0.5 program. The results show that the homology model of Gbeta(2)AR has a 3-D structure very similar to the recently determined crystal structure of the human beta(2)AR. This was corroborated by sequence identity (94.23%), the Ramachandran map, and docking results. The theoretical simulation showed that the ligands bind at sites similar to those reported for the human beta(2)AR. The R-enantiomer ligands showed correlation with in vitro data. We have obtained a Gbeta(2)AR 3-D model that can be used for computational screening as a complementary tool during drug design and experimental tests in guinea pig models.
|
1197
|
Gao X, Xu J, Li SC, Li M. Predicting local quality of a sequence-structure alignment. J Bioinform Comput Biol 2009; 7:789-810. [PMID: 19785046 DOI: 10.1142/s0219720009004345] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/16/2009] [Revised: 04/06/2009] [Accepted: 04/07/2009] [Indexed: 11/18/2022]
Abstract
Although protein structure prediction has made great progress in recent years, a protein model derived from automated prediction methods is subject to various errors. As methods for structure prediction develop, a continuing problem is how to evaluate the quality of a protein model, especially how to identify well-predicted regions of the model, so that the structural biology community can benefit from automated structure prediction. It is also important to identify badly predicted regions in a model so that refinement measures can be applied to them. We present two complementary techniques, FragQA and PosQA, to accurately predict the local quality of a sequence-structure (i.e. sequence-template) alignment generated by comparative modeling (i.e. homology modeling and threading). FragQA and PosQA predict local quality from two different perspectives. Unlike existing methods, FragQA directly predicts the cRMSD between a continuously aligned fragment determined by an alignment and the corresponding fragment in the native structure, while PosQA predicts the quality of an individual aligned position. Both FragQA and PosQA use an SVM (Support Vector Machine) regression method to perform prediction using similar information extracted from a single given alignment. Experimental results demonstrate that FragQA performs well in predicting local fragment quality, and PosQA outperforms two top-notch methods, ProQres and ProQprof. Our results indicate that (1) local quality can be predicted well; (2) local sequence evolutionary information (i.e. sequence similarity) is the major factor in predicting local quality; and (3) structural information such as solvent accessibility and secondary structure helps to improve the prediction performance.
Affiliation(s)
- Xin Gao
- David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada.
|
1198
|
Joo K, Lee J, Seo JH, Lee K, Kim BG, Lee J. All-atom chain-building by optimizing MODELLER energy function using conformational space annealing. Proteins 2009; 75:1010-23. [PMID: 19089941 DOI: 10.1002/prot.22312] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022]
Abstract
We have investigated the effect of rigorous optimization of the MODELLER energy function for possible improvement in protein all-atom chain-building. For this we applied the global optimization method called conformational space annealing (CSA) to the standard MODELLER procedure to achieve better energy optimization than MODELLER itself provides. The method, which we call MODELLERCSA, is tested on two benchmark sets. The first consists of 298 proteins taken from the HOMSTRAD multiple alignment set. By simply optimizing the MODELLER energy function, we observe significant improvement in side-chain modeling, where MODELLERCSA provides about 10.7% (14.5%) improvement in chi(1) (chi(1) + chi(2)) accuracy compared to standard MODELLER modeling. The improvement in backbone accuracy by MODELLERCSA is less prominent, and a similar improvement can be achieved by simply generating many standard MODELLER models and selecting the lowest-energy models. However, the level of side-chain modeling accuracy achieved by MODELLERCSA could not be matched by extensive MODELLER strategies, by side-chain remodeling with SCWRL3, or by copying unmutated rotamers. The identical procedure was successfully applied to 100 CASP7 template-based modeling domains during the prediction season in a blind fashion, and the results are included here for comparison. From this study, we observe a good correlation between the MODELLER energy and side-chain accuracy. Our findings indicate that, when a good alignment between a target protein and its templates is provided, thorough optimization of the MODELLER energy function leads to accurate all-atom models.
Affiliation(s)
- Keehyoung Joo
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
|
1199
|
Seo JH, Lee GS, Kim J, Cho BK, Joo K, Lee J, Kim BG. Automatic protein structure prediction system enabling rapid and accurate model building for enzyme screening. Enzyme Microb Technol 2009. [DOI: 10.1016/j.enzmictec.2009.05.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
|
1200
|
Li Y, Zhang Y. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins 2009; 76:665-76. [PMID: 19274737 PMCID: PMC2771173 DOI: 10.1002/prot.22380] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Indexed: 11/07/2022]
Abstract
Protein structure prediction approaches usually perform modeling simulations based on reduced representations of protein structures. For biological applications, constructing full atomic models from the reduced structure decoys is an important step. Most current full atomic model reconstruction procedures have defects: they either cannot completely remove steric clashes among backbone atoms or generate final atomic models whose topology is less similar to the native structure than that of the reduced models. In this work, we develop a new protocol, called REMO, to generate full atomic protein models by optimizing the hydrogen-bonding network with basic fragments matched from a newly constructed backbone isomer library of solved protein structures. The algorithm is benchmarked on 230 nonhomologous proteins with reduced structure decoys generated by I-TASSER simulations. The results show that REMO removes steric clashes effectively while retaining the good topology of the reduced model. The hydrogen-bonding network of the final models is dramatically improved during the procedure. The REMO algorithm was used in the recent CASP8 experiment, which demonstrated significant improvements of the I-TASSER models in both atomic-level structural refinement and hydrogen-bonding network construction.
Affiliation(s)
- Yunqi Li
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
|