1
Machat M, Langenfeld F, Craciun D, Sirugue L, Labib T, Lagarde N, Maria M, Montes M. Comparative evaluation of shape retrieval methods on macromolecular surfaces: an application of computer vision methods in structural bioinformatics. Bioinformatics 2021; 37:4375-4382. [PMID: 34247232 PMCID: PMC8652110 DOI: 10.1093/bioinformatics/btab511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 05/18/2021] [Accepted: 07/08/2021] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION The investigation of the structure of biological systems at the molecular level gives insights about their functions and dynamics. Shape and surface of biomolecules are fundamental to molecular recognition events. Characterizing their geometry can lead to more adequate predictions of their interactions. In the present work, we assess the performance of reference shape retrieval methods from the computer vision community on protein shapes. RESULTS Shape retrieval methods are efficient in identifying orthologous proteins and tracking large conformational changes. This work illustrates the interest for the protein surface shape as a higher-level representation of the protein structure that (i) abstracts the underlying protein sequence, structure or fold, (ii) allows the use of shape retrieval methods to screen large databases of protein structures to identify surficial homologs and possible interacting partners and (iii) opens an extension of the protein structure-function paradigm toward a protein structure-surface(s)-function paradigm. AVAILABILITY AND IMPLEMENTATION All data are available online at http://datasetmachat.drugdesign.fr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mohamed Machat
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Florent Langenfeld
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Daniela Craciun
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Léa Sirugue
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Taoufik Labib
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Nathalie Lagarde
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Maxime Maria
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Laboratoire XLIM, UMR CNRS 7252, Université de Limoges, Limoges 87000, France
- Matthieu Montes
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
2
Ruiz-Serra V, Pontes C, Milanetti E, Kryshtafovych A, Lepore R, Valencia A. Assessing the accuracy of contact and distance predictions in CASP14. Proteins 2021; 89:1888-1900. [PMID: 34595772 DOI: 10.1002/prot.26248] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/11/2021] [Revised: 09/06/2021] [Accepted: 09/21/2021] [Indexed: 12/26/2022]
Abstract
We present the results of the assessment of the intramolecular residue-residue contact and distance predictions from groups participating in the 14th round of the CASP experiment. The performance of contact prediction methods was evaluated with the measures used in previous CASPs, while distance predictions were assessed based on a new protocol, which considers individual distance pairs as well as the whole predicted distance matrix, using a graph-based framework. The results of the evaluation indicate that predictions by the tFold framework, TripletRes and DeepPotential were the most accurate in both categories. With regards to progress in method performance, the results of the assessment in contact prediction did not reveal any discernible difference when compared to CASP13. Arguably, this could be due to CASP14 FM targets being more challenging than ever before.
Affiliation(s)
- Camila Pontes
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Edoardo Milanetti
- Department of Physics, Sapienza Università di Roma, Rome, Italy; Center for Life Nano- & Neuro-Science, Fondazione Istituto Italiano di Tecnologia (IIT), Rome, Italy
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; ICREA, Pg. Lluís Companys, Barcelona, Spain
3
Abstract
For two decades, Rosetta has consistently been at the forefront of protein structure
prediction. While it has become a very large package comprising programs, scripts, and tools, for
different types of macromolecular modelling such as ligand docking, protein-protein docking,
protein design, and loop modelling, it started as the implementation of an algorithm for ab initio
protein structure prediction. The term ’Rosetta’ appeared for the first time twenty years ago in the
literature to describe that algorithm and its contribution to the third edition of the community wide
Critical Assessment of techniques for protein Structure Prediction (CASP3). Similar to the Rosetta
stone that allowed deciphering the ancient Egyptian civilisation, David Baker and his co-workers
have been contributing to deciphering ’the second half of the genetic code’. Although the focus of
Baker’s team has expanded to de novo protein design in the past few years, Rosetta’s ‘fame’ is
associated with its fragment-assembly protein structure prediction approach. Following a
presentation of the main concepts underpinning its foundation, especially sequence-structure
correlation and usage of fragments, we review the main stages of its developments and highlight
the milestones it has achieved in terms of protein structure prediction, particularly in CASP.
Affiliation(s)
- Jad Abbass
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, United Kingdom
4
Shrestha R, Fajardo E, Gil N, Fidelis K, Kryshtafovych A, Monastyrskyy B, Fiser A. Assessing the accuracy of contact predictions in CASP13. Proteins 2019; 87:1058-1068. [PMID: 31587357 DOI: 10.1002/prot.25819] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 04/25/2019] [Revised: 09/17/2019] [Accepted: 09/17/2019] [Indexed: 01/07/2023]
Abstract
The accuracy of sequence-based tertiary contact predictions was assessed in a blind prediction experiment at the CASP13 meeting. After 4 years of significant improvements in prediction accuracy, another dramatic advance has taken place since CASP12 was held 2 years ago. The precision of predicting the top L/5 contacts in the free modeling category, where L is the corresponding length of the protein in residues, has exceeded 70%. As a comparison, the best-performing group at CASP12 with a 47% precision would have finished below the top 1/3 of the CASP13 groups. Extensively trained deep neural network approaches dominate the top performing algorithms, which appear to efficiently integrate information on coevolving residues and interacting fragments or possibly utilize memories of sequence similarities and sometimes can deliver accurate results even in the absence of virtually any target specific evolutionary information. If the current performance is evaluated by F-score on L contacts, it stands around 24% right now, which, despite the tremendous impact and advance in improving its utility for structure modeling, also suggests that there is much room left for further improvement.
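The two measures quoted in this abstract (precision on the top L/5 contacts and F-score on L contacts) can be illustrated with a minimal Python sketch. The function names, the `(i, j, score)` tuple layout and the toy data are assumptions made for illustration, not the assessors' code; restricting the evaluation to long-range contacts, as CASP assessments typically do, is left out here.

```python
def top_k_precision_recall(predicted, true_contacts, k):
    """Precision and recall of the k highest-scoring predicted contacts.

    predicted: list of (i, j, score) tuples (hypothetical layout).
    true_contacts: set of (i, j) residue pairs with i < j.
    """
    # Keep only the k most confident predictions.
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)[:k]
    # A prediction counts as a hit if the pair is a true contact.
    hits = sum(1 for i, j, _ in ranked if (min(i, j), max(i, j)) in true_contacts)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(true_contacts) if true_contacts else 0.0
    return precision, recall


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: a protein of length L = 10 with three true contacts.
L = 10
true_contacts = {(1, 8), (2, 9), (3, 10)}
predicted = [(1, 8, 0.9), (2, 9, 0.8), (4, 10, 0.7), (3, 10, 0.6)]

p_top, r_top = top_k_precision_recall(predicted, true_contacts, L // 5)
p_all, r_all = top_k_precision_recall(predicted, true_contacts, L)
print("top L/5 precision:", p_top)
print("F-score on L contacts:", f_score(p_all, r_all))
```

Evaluating only the top L/5 predictions rewards a method for ranking its most confident contacts well, whereas the F-score on L contacts also penalizes true contacts that are missed, which is consistent with the much lower figure the abstract reports for it.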
Affiliation(s)
- Rojan Shrestha
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
- Eduardo Fajardo
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
- Nelson Gil
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
- Andras Fiser
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
5
Liu T, Wang Z. SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity. Source Code for Biology and Medicine 2018; 13:1. [PMID: 29713370 PMCID: PMC5909207 DOI: 10.1186/s13029-018-0068-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/30/2016] [Accepted: 04/02/2018] [Indexed: 11/22/2022]
Abstract
Background The segment overlap score (SOV) has been used to evaluate the predicted protein secondary structures, a sequence composed of helix (H), strand (E), and coil (C), by comparing it with the native or reference secondary structures, another sequence of H, E, and C. SOV’s advantage is that it can consider the size of continuous overlapping segments and assign extra allowance to longer continuous overlapping segments instead of only judging from the percentage of overlapping individual positions as Q3 score does. However, we have found a drawback from its previous definition, that is, it cannot ensure increasing allowance assignment when more residues in a segment are further predicted accurately. Results A new way of assigning allowance has been designed, which keeps all the advantages of the previous SOV score definitions and ensures that the amount of allowance assigned is incremental when more elements in a segment are predicted accurately. Furthermore, our improved SOV has achieved a higher correlation with the quality of protein models measured by GDT-TS score and TM-score, indicating its better abilities to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found the threshold values for distinguishing two protein structures (SOV_refine > 0.19) and indicating whether two proteins are under the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures respectively). We provided another two example applications, which are when used as a machine learning feature for protein model quality assessment and comparing different definitions of topologically associating domains. We proved that our newly defined SOV score resulted in better performance. Conclusions The SOV score can be widely used in bioinformatics research and other fields that need to compare two sequences of letters in which continuous segments have important meanings. 
We also generalized the previous SOV definitions so that it can work for sequences composed of more than three states (e.g., it can work for the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/.
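The contrast drawn above between Q3 and a segment-based score can be seen in a small Python sketch. It computes only Q3 (the fraction of positions whose predicted state matches the reference); the function name and the toy strings are assumptions for illustration, and no attempt is made to reproduce the SOV or SOV_refine definitions.

```python
def q3(predicted, reference):
    """Fraction of positions where the predicted state equals the reference state."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


reference = "CCHHHHHHCC"
trimmed_helix = "CCHHHHHCCC"  # one error at the end of the helix
split_helix = "CCHHCHHHCC"    # one error in the middle, breaking the helix in two

print(q3(trimmed_helix, reference))
print(q3(split_helix, reference))
```

Both predictions contain a single wrong position and therefore receive the same Q3, although the second one breaks one helix into two segments. A per-position score cannot separate these cases, which is the motivation the abstract gives for a segment-aware measure.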
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124 USA
- Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124 USA
6
Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 2018; 86 Suppl 1:51-66. [PMID: 29071738 PMCID: PMC5820169 DOI: 10.1002/prot.25407] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Received: 07/14/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]
Abstract
Following up on the encouraging results of residue-residue contact prediction in the CASP11 experiment, we present the analysis of predictions submitted for CASP12. The submissions include predictions of 34 groups for 38 domains classified as free modeling targets which are not accessible to homology-based modeling due to a lack of structural templates. CASP11 saw a rise of coevolution-based methods outperforming other approaches. The improvement of these methods coupled to machine learning and sequence database growth are most likely the main driver for a significant improvement in average precision from 27% in CASP11 to 47% in CASP12. In more than half of the targets, especially those with many homologous sequences accessible, precisions above 90% were achieved with the best predictors reaching a precision of 100% in some cases. We furthermore tested the impact of using these contacts as restraints in ab initio modeling of 14 single-domain free modeling targets using Rosetta. Adding contacts to the Rosetta calculations resulted in improvements of up to 26% in GDT_TS within the top five structures.
Affiliation(s)
- Joerg Schaarschmidt
- Faculty of Science ‐ Chemistry, Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
- Alexandre M.J.J. Bonvin
- Faculty of Science ‐ Chemistry, Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
7
Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. New encouraging developments in contact prediction: Assessment of the CASP11 results. Proteins 2016; 84 Suppl 1:131-44. [PMID: 26474083 PMCID: PMC4834069 DOI: 10.1002/prot.24943] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 05/18/2015] [Revised: 09/15/2015] [Accepted: 10/11/2015] [Indexed: 12/27/2022]
Abstract
This article provides a report on the state-of-the-art in the prediction of intra-molecular residue-residue contacts in proteins based on the assessment of the predictions submitted to the CASP11 experiment. The assessment emphasis is placed on the accuracy in predicting long-range contacts. Twenty-nine groups participated in contact prediction in CASP11. At least eight of them used the recently developed evolutionary coupling techniques, with the top group (CONSIP2) reaching precision of 27% on target proteins that could not be modeled by homology. This result indicates a breakthrough in the development of methods based on the correlated mutation approach. Successful prediction of contacts was shown to be practically helpful in modeling three-dimensional structures; in particular target T0806 was modeled exceedingly well with accuracy not yet seen for ab initio targets of this size (>250 residues). Proteins 2016; 84(Suppl 1):131-144. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Daniel D'Andrea
- Department of Physics, Sapienza-University of Rome, Rome, 00185, Italy
- Anna Tramontano
- Department of Physics, Sapienza-University of Rome, Rome, 00185, Italy
- Istituto Pasteur-Fondazione Cenci Bolognetti-University of Rome, Rome, 00185, Italy
8
Modi V, Xu Q, Adhikari S, Dunbrack RL. Assessment of template-based modeling of protein structure in CASP11. Proteins 2016; 84 Suppl 1:200-20. [PMID: 27081927 DOI: 10.1002/prot.25049] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 01/24/2016] [Revised: 04/04/2016] [Accepted: 04/11/2016] [Indexed: 12/27/2022]
Abstract
We present the assessment of predictions submitted in the template-based modeling (TBM) category of CASP11 (Critical Assessment of Protein Structure Prediction). Model quality was judged on the basis of global and local measures of accuracy on all atoms including side chains. The top groups on 39 human-server targets based on model 1 predictions were LEER, Zhang, LEE, MULTICOM, and Zhang-Server. The top groups on 81 targets by server groups based on model 1 predictions were Zhang-Server, nns, BAKER-ROSETTASERVER, QUARK, and myprotein-me. In CASP11, the best models for most targets were equal to or better than the best template available in the Protein Data Bank, even for targets with poor templates. The overall performance in CASP11 is similar to the performance of predictors in CASP10 with slightly better performance on the hardest targets. For most targets, assessment measures exhibited bimodal probability density distributions. Multi-dimensional scaling of an RMSD matrix for each target typically revealed a single cluster with models similar to the target structure, with a mode in the GDT-TS density between 40 and 90, and a wide distribution of models highly divergent from each other and from the experimental structure, with density mode at a GDT-TS value of ∼20. The models in this peak in the density were either compact models with entirely the wrong fold, or highly non-compact models. The results argue for a density-driven approach in future CASP TBM assessments that accounts for the bimodal nature of these distributions instead of Z scores, which assume a unimodal, Gaussian distribution. Proteins 2016; 84(Suppl 1):200-220. © 2016 Wiley Periodicals, Inc.
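For readers unfamiliar with the GDT-TS values quoted above, the following Python sketch shows the usual form of the score: the average, over the 1, 2, 4 and 8 Å cutoffs, of the percentage of residues lying within that distance of their position in the reference structure. It is a simplified illustration and not the CASP implementation: it assumes the per-residue distances come from one fixed superposition supplied by the caller, whereas the actual GDT procedure searches over many superpositions. The function name and toy distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS from per-residue model-to-reference distances (in Å).

    Assumes the distances were measured after a single, fixed superposition.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    # Count, for each cutoff, the residues that fall within it.
    hits = sum(sum(d <= c for d in distances) for c in cutoffs)
    # Average the per-cutoff fractions and express the result on a 0-100 scale.
    return 100.0 * hits / (len(distances) * len(cutoffs))


# Toy model: five residues at increasing distance from the reference.
distances = [0.5, 1.5, 3.0, 6.0, 10.0]
print(gdt_ts(distances))
```

On this scale a model with the wrong fold keeps only a small fraction of residues within even the loosest cutoff, which fits the low-scoring mode near 20 described in the abstract.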
Affiliation(s)
- Vivek Modi
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
- Qifang Xu
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
- Sam Adhikari
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
- Roland L Dunbrack
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
9
Abstract
Background Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together. Results We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted. Availability MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/
10
Olechnovič K, Kulberkytė E, Venclovas C. CAD-score: a new contact area difference-based function for evaluation of protein structural models. Proteins 2012; 81:149-62. [PMID: 22933340 DOI: 10.1002/prot.24172] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.1] [Received: 04/05/2012] [Revised: 08/09/2012] [Accepted: 08/25/2012] [Indexed: 12/17/2022]
Abstract
Evaluation of protein models against the native structure is essential for the development and benchmarking of protein structure prediction methods. Although a number of evaluation scores have been proposed to date, many aspects of model assessment still lack desired robustness. In this study we present CAD-score, a new evaluation function quantifying differences between physical contacts in a model and the reference structure. The new score uses the concept of residue-residue contact area difference (CAD) introduced by Abagyan and Totrov (J Mol Biol 1997; 268:678-685). Contact areas, the underlying basis of the score, are derived using the Voronoi tessellation of protein structure. The newly introduced CAD-score is a continuous function, confined within fixed limits, free of any arbitrary thresholds or parameters. The built-in logic for treatment of missing residues allows consistent ranking of models of any degree of completeness. We tested CAD-score on a large set of diverse models and compared it to GDT-TS, a widely accepted measure of model accuracy. Similarly to GDT-TS, CAD-score showed a robust performance on single-domain proteins, but displayed a stronger preference for physically more realistic models. Unlike GDT-TS, the new score revealed a balanced assessment of domain rearrangement, removing the necessity for different treatment of single-domain, multi-domain, and multi-subunit structures. Moreover, CAD-score makes it possible to assess the accuracy of inter-domain or inter-subunit interfaces directly. In addition, the approach offers an alternative to the superposition-based model clustering. The CAD-score implementation is available both as a web server and a standalone software package at http://www.ibt.lt/bioinformatics/cad-score/.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Vilnius University, Graičiūno 8, LT-02241 Vilnius, Lithuania
11
Abstract
We introduce a theoretical framework that exploits the ever-increasing genomic sequence information for protein structure prediction. Structure-based models are modified to incorporate constraints by a large number of non-local contacts estimated from direct coupling analysis (DCA) of co-evolving genomic sequences. A simple hybrid method, called DCA-fold, integrating DCA contacts with an accurate knowledge of local information (e.g., the local secondary structure) is sufficient to fold proteins in the range of 1-3 Å resolution.
12
Cong Q, Kinch LN, Pei J, Shi S, Grishin VN, Li W, Grishin NV. An automatic method for CASP9 free modeling structure prediction assessment. Bioinformatics 2011; 27:3371-8. [PMID: 21994223 DOI: 10.1093/bioinformatics/btr572] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
MOTIVATION Manual inspection has been applied to and is well accepted for assessing critical assessment of protein structure prediction (CASP) free modeling (FM) category predictions over the years. Such manual assessment requires expertise and significant time investment, yet has the problems of being subjective and unable to differentiate models of similar quality. It is beneficial to incorporate the ideas behind manual inspection to an automatic score system, which could provide objective and reproducible assessment of structure models. RESULTS Inspired by our experience in CASP9 FM category assessment, we developed an automatic superimposition independent method named Quality Control Score (QCS) for structure prediction assessment. QCS captures both global and local structural features, with emphasis on global topology. We applied this method to all FM targets from CASP9, and overall the results showed the best agreement with Manual Inspection Scores among automatic prediction assessment methods previously applied in CASPs, such as Global Distance Test Total Score (GDT_TS) and Contact Score (CS). As one of the important components to guide our assessment of CASP9 FM category predictions, this method correlates well with other scoring methods and yet is able to reveal good-quality models that are missed by GDT_TS. AVAILABILITY The script for QCS calculation is available at http://prodata.swmed.edu/QCS/. CONTACT grishin@chop.swmed.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qian Cong
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA
13
Lee J, Lee D, Park H, Coutsias EA, Seok C. Protein loop modeling by using fragment assembly and analytical loop closure. Proteins 2010; 78:3428-36. [PMID: 20872556 PMCID: PMC2976774 DOI: 10.1002/prot.22849] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 05/19/2010] [Revised: 07/16/2010] [Accepted: 07/31/2010] [Indexed: 12/27/2022]
Abstract
Protein loops are often involved in important biological functions such as molecular recognition, signal transduction, or enzymatic action. The three dimensional structures of loops can provide essential information for understanding molecular mechanisms behind protein functions. In this article, we develop a novel method for protein loop modeling, where the loop conformations are generated by fragment assembly and analytical loop closure. The fragment assembly method reduces the conformational space drastically, and the analytical loop closure method finds the geometrically consistent loop conformations efficiently. We also derive an analytic formula for the gradient of any analytical function of dihedral angles in the space of closed loops. The gradient can be used to optimize various restraints derived from experiments or databases, for example restraints for preferential interactions between specific residues or for preferred backbone angles. We demonstrate that the current loop modeling method outperforms previous methods that employ residue-based torsion angle maps or different loop closure strategies when tested on two sets of loop targets of lengths ranging from 4 to 12.
Affiliation(s)
- Julian Lee
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 156-743, Korea
- Dongseon Lee
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Hahnbeom Park
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Evangelos A. Coutsias
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
14
Tress ML, Valencia A. Predicted residue-residue contacts can help the scoring of 3D models. Proteins 2010; 78:1980-91. [PMID: 20408174 DOI: 10.1002/prot.22714] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/06/2022]
Abstract
During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue-residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close-to-native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue-residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue-residue contacts would probably have helped in the selection of 3D models in the free modeling regime and over the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT-TS scores than all but the best 3D prediction groups. Despite the well-known low accuracy of residue-residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies.
Affiliation(s)
- Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
15
Madera M, Calmus R, Thiltgen G, Karplus K, Gough J. Improving protein secondary structure prediction using a simple k-mer model. Bioinformatics 2010; 26:596-602. [PMID: 20130034 PMCID: PMC2828123 DOI: 10.1093/bioinformatics/btq020] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 12/04/2022] Open
Abstract
Motivation: Some first order methods for protein sequence analysis inherently treat each position as independent. We develop a general framework for introducing longer range interactions. We then demonstrate the power of our approach by applying it to secondary structure prediction; under the independence assumption, sequences produced by existing methods can produce features that are not protein like, an extreme example being a helix of length 1. Our goal was to make the predictions from state of the art methods more realistic, without loss of performance by other measures. Results: Our framework for longer range interactions is described as a k-mer order model. We succeeded in applying our model to the specific problem of secondary structure prediction, to be used as an additional layer on top of existing methods. We achieved our goal of making the predictions more realistic and protein like, and remarkably this also improved the overall performance. We improve the Segment OVerlap (SOV) score by 1.8%, but more importantly we radically improve the probability of the real sequence given a prediction from an average of 0.271 per residue to 0.385. Crucially, this improvement is obtained using no additional information. Availability: http://supfam.cs.bris.ac.uk/kmer Contact: gough@cs.bris.ac.uk
Affiliation(s)
- Martin Madera
- Department of Computer Science, University of Bristol, Woodland Road, Bristol BS8 1UB, UK
16
Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y. Assessment of CASP8 structure predictions for template free targets. Proteins 2010; 77 Suppl 9:50-65. [PMID: 19774550 DOI: 10.1002/prot.22591] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Indexed: 11/10/2022]
Abstract
The biennial CASP experiment is a crucial way to evaluate, in an unbiased way, the progress in predicting novel 3D protein structures. In this article, we assess the quality of prediction of template free models, that is, ab initio prediction of 3D structures of proteins based solely on the amino acid sequences, that is, proteins that did not have significant sequence identity to any protein in the Protein Data Bank. There were 13 targets in this category and 102 groups submitted predictions. Analysis was based on the GDT_TS analysis, which has been used in previous CASP experiments, together with a newly developed method, the OK_Rank, as well as by visual inspection. There is no doubt that in recent years many obstacles have been removed on the long and elusive way to deciphering the protein-folding problem. Out of the 13 targets, six were predicted well by a number of groups. On the other hand, it must be stressed that for four targets, none of the models were judged to be satisfactory. Thus, for template free model prediction, as evaluated in this CASP, successes have been achieved for most targets; however, a great deal of research is still required, both in improving the existing methods and in development of new approaches.
Affiliation(s)
- Moshe Ben-David
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
17
|
Aloy P, Oliva B. Splitting statistical potentials into meaningful scoring functions: testing the prediction of near-native structures from decoy conformations. BMC Structural Biology 2009; 9:71. [PMID: 19917096 PMCID: PMC2783033 DOI: 10.1186/1472-6807-9-71] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/24/2009] [Accepted: 11/16/2009] [Indexed: 11/20/2022]
Abstract
Background Recent advances in high-throughput technologies have produced a vast amount of protein sequences, while the number of high-resolution structures has seen a limited increase. This has impelled the production of many strategies to build protein structures from their sequences, generating a considerable amount of alternative models. The selection of the closest model to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials and combinations of both. Results Here, we present and demonstrate a theory to split the knowledge-based potentials into biologically meaningful scoring terms and to combine them in new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple and linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. Besides, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors. Conclusion We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-the-art methods and have been successful in identifying wrongly modeled regions of many near-native conformations.
Affiliation(s)
- Patrick Aloy
- Institut de Recerca Biomèdica and Barcelona Supercomputing Center, 10-12 08028 Barcelona, Catalonia, Spain.
|
18
|
Shi S, Pei J, Sadreyev RI, Kinch LN, Majumdar I, Tong J, Cheng H, Kim BH, Grishin NV. Analysis of CASP8 targets, predictions and assessment methods. Database: The Journal of Biological Databases and Curation 2009; 2009:bap003. [PMID: 20157476 PMCID: PMC2794793 DOI: 10.1093/database/bap003] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 01/27/2009] [Accepted: 02/21/2009] [Indexed: 11/17/2022]
Abstract
Results of the recent Critical Assessment of Techniques for Protein Structure Prediction, CASP8, present several valuable sources of information. First, CASP targets comprise a realistic sample of currently solved protein structures and exemplify the corresponding challenges for predictors. Second, the plethora of predictions by all possible methods provides an unusually rich material for evolutionary analysis of target proteins. Third, CASP results show the current state of the field and highlight specific problems in both predicting and assessing. Finally, these data can serve as grounds to develop and analyze methods for assessing prediction quality. Here we present results of our analysis in these areas. Our objective is not to duplicate CASP assessment, but to use our unique experience as former CASP5 assessors and CASP8 predictors to (i) offer more insights into CASP targets and predictions based on expert analysis, including invaluable analysis prior to target structure release; and (ii) develop an assessment methodology tailored towards current challenges in the field. Specifically, we discuss preparing target structures for assessment, parsing protein domains, balancing evaluations based on domains and on whole chains, dividing targets into categories and developing new evaluation scores. We also present evolutionary analysis of the most interesting and challenging targets. Database URL: Our results are available as a comprehensive database of targets and predictions at http://prodata.swmed.edu/CASP8.
Affiliation(s)
- Shuoyong Shi
- Howard Hughes Medical Institute and Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
|
19
|
Sadreyev RI, Shi S, Baker D, Grishin NV. Structure similarity measure with penalty for close non-equivalent residues. Bioinformatics 2009; 25:1259-63. [PMID: 19321733 PMCID: PMC2677741 DOI: 10.1093/bioinformatics/btp148] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022] Open
Abstract
Motivation: Recent improvement in homology-based structure modeling emphasizes the importance of sensitive evaluation measures that help identify and correct modest distortions in models compared with the target structures. Global Distance Test Total Score (GDT_TS), otherwise a very powerful and effective measure for model evaluation, is still insensitive to and can even reward such distortions, as observed for remote homology modeling in the latest CASP8 (Comparative Assessment of Structure Prediction). Results: We develop a new measure that balances GDT_TS reward for the closeness of equivalent model and target residues (‘attraction’ term) with the penalty for the closeness of non-equivalent residues (‘repulsion’ term). Compared with GDT_TS, the resulting score, TR (total score with repulsion), is much more sensitive to structure compression both in real remote homologs and in CASP models. TR is correlated yet different from other measures of structure similarity. The largest difference from GDT_TS is observed in models of mid-range quality based on remote homology modeling. Availability: The script for TR calculation is included in Supplementary Material. TR scores for all server models in CASP8 are available at http://prodata.swmed.edu/CASP8. Contact: grishin@chop.swmed.edu Supplementary information: All scripts and numerical data are available for download at ftp://iole.swmed.edu/pub/tr_score/
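The abstract refers to GDT_TS without defining it. As a rough illustration only, the sketch below uses the commonly cited definition: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of model residues lying within that cutoff of their equivalent target residues. It assumes the per-residue distances come from a single, already chosen superposition, whereas the full score searches over superpositions for each cutoff. The function name `gdt_ts_fixed_superposition` is hypothetical, and the TR repulsion term is not implemented because the abstract does not specify it.

```python
def gdt_ts_fixed_superposition(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score in percent.

    distances: per-residue distances (in Angstroms) between equivalent
    model and target residues after ONE fixed superposition (assumption;
    the full measure maximizes each term over many superpositions).
    """
    n = len(distances)
    if n == 0:
        raise ValueError("at least one residue distance is required")
    # Fraction of residues within each cutoff.
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    # Average the fractions over cutoffs and express as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    print(gdt_ts_fixed_superposition([0.5, 1.5, 3.0, 10.0]))
```

Because only the closeness of equivalent residues enters this average, a compressed model that pulls residues toward the center is not penalized here, which is the insensitivity the abstract describes.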
Affiliation(s)
- Ruslan I Sadreyev
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
|
20
|
Batyanovskii AV, Vlasov PK. Short protein segments with prevalent conformation. Biophysics (Nagoya-shi) 2009. [DOI: 10.1134/s0006350908040040] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022] Open
|
21
|
Espadaler J, Eswar N, Querol E, Avilés FX, Sali A, Marti-Renom MA, Oliva B. Prediction of enzyme function by combining sequence similarity and protein interactions. BMC Bioinformatics 2008; 9:249. [PMID: 18505562 PMCID: PMC2430716 DOI: 10.1186/1471-2105-9-249] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 07/31/2007] [Accepted: 05/27/2008] [Indexed: 11/18/2022] Open
Abstract
Background A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. Results The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences from which interaction data was available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. Conclusion Our method can also be used in proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.
Affiliation(s)
- Jordi Espadaler
- Laboratori de Bioinformàtica Estructural (GRIB), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra-IMIM, 08003-Barcelona, Catalonia, Spain.
|
22
|
Pereira de Araújo AF, Gomes ALC, Bursztyn AA, Shakhnovich EI. Native atomic burials, supplemented by physically motivated hydrogen bond constraints, contain sufficient information to determine the tertiary structure of small globular proteins. Proteins 2008; 70:971-83. [PMID: 17847091 DOI: 10.1002/prot.21571] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
We investigate the possibility that atomic burials, as measured by their distances from the structural geometrical center, contain sufficient information to determine the tertiary structure of globular proteins. We report Monte Carlo simulated annealing results of all-atom hard-sphere models in continuous space for four small proteins: the all-beta WW-domain 1E0L, the alpha/beta protein-G 1IGD, the all-alpha engrailed homeo-domain 1ENH, and the alpha + beta engineered monomeric form of the Cro protein 1ORC. We used as energy function the sum over all atoms, labeled by i, of |R_i - R_i*|, where R_i is the atomic distance from the center of coordinates, or central distance, and R_i* is the "ideal" central distance obtained from the native structure. Hydrogen bonds were taken into consideration by the assignment of two ideal distances for backbone atoms forming hydrogen bonds in the native structure depending on the formation of a geometrically defined bond, independently of bond partner. Lowest energy final conformations turned out to be very similar to the native structure for the four proteins under investigation and a strong correlation was observed between energy and distance root mean square deviation (DRMS) from the native in the case of all-beta 1E0L and alpha/beta 1IGD. For all alpha 1ENH and alpha + beta 1ORC the overall correlation between energy and DRMS among final conformations was not as high because some trajectories resulted in high DRMS but low energy final conformations in which alpha-helices adopted a non-native mutual orientation. Comparison between central distances and actual accessible surface areas corroborated the implicit assumption of correlation between these two quantities.
The Z-score obtained with this native-centric potential in the discrimination of native 1ORC from a set of random compact structures confirmed that it contains a much smaller amount of native information when compared to a traditional contact Go potential but indicated that simple sequence-dependent burial potentials still need some improvement in order to attain a similar discriminability. Taken together, our results suggest that central distances, in conjunction to physically motivated hydrogen bond constraints, contain sufficient information to determine the native conformation of these small proteins and that a solution to the folding problem for globular proteins could arise from sufficiently accurate burial predictions from sequence followed by minimization of a burial-dependent energy function.
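The energy function described in this abstract, a sum over atoms of the absolute deviation between each atom's central distance and its ideal central distance, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `burial_energy` is hypothetical, the center of coordinates is taken as the unweighted mean of the atomic positions, and the hydrogen-bond refinement (two ideal distances for backbone atoms) is left out.

```python
import math


def burial_energy(coords, ideal_central_distances):
    """Sum over atoms i of |R_i - R_i*|.

    coords: list of (x, y, z) atomic positions for the current conformation.
    ideal_central_distances: list of R_i* values, one per atom, assumed to
    have been taken from the native structure beforehand.
    """
    n = len(coords)
    if n == 0 or n != len(ideal_central_distances):
        raise ValueError("need one ideal central distance per atom")
    # Center of coordinates: unweighted mean position (assumption).
    center = tuple(sum(c[k] for c in coords) / n for k in range(3))
    energy = 0.0
    for position, r_star in zip(coords, ideal_central_distances):
        r = math.dist(position, center)  # central distance R_i
        energy += abs(r - r_star)
    return energy


if __name__ == "__main__":
    atoms = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    print(burial_energy(atoms, [0.5, 2.0]))
```

In a Monte Carlo simulated annealing run of the kind the abstract mentions, a function like this would be evaluated on each trial conformation and the move accepted or rejected based on the change in energy.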
Affiliation(s)
- Antônio F Pereira de Araújo
- Laboratório de Biologia Teórica, Departamento de Biologia Celular, Universidade de Brasília, Brasília-DF 70910-900, Brazil.
|
23
|
Abstract
Currently, one of the most serious problems in protein-folding simulations for de novo structure prediction is conformational sampling of medium-to-large proteins. In vivo, folding of these proteins is mediated by molecular chaperones. Inspired by the functions of chaperonins, we designed a simple chaperonin-like simulation protocol within the framework of the standard fragment assembly method: in our protocol, the strength of the hydrophobic interaction is periodically modulated to help the protein escape from misfolded structures. We tested this protocol for 38 proteins and found that, using a certain defined criterion of success, our method could successfully predict the native structures of 14 targets, whereas only those of 10 targets were successfully predicted using the standard protocol. In particular, for non-alpha-helical proteins, our method yielded significantly better predictions than the standard approach. This chaperonin-inspired protocol that enhanced de novo structure prediction using folding simulations may, in turn, provide new insights into the working principles underlying the chaperonin system.
|
24
|
Wu S, Zhang Y. A comprehensive assessment of sequence-based and template-based methods for protein contact prediction. Bioinformatics 2008; 24:924-31. [PMID: 18296462 DOI: 10.1093/bioinformatics/btn069] [Citation(s) in RCA: 151] [Impact Index Per Article: 9.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Pair-wise residue-residue contacts in proteins can be predicted from both threading templates and sequence-based machine learning. However, most structure modeling approaches only use the template-based contact predictions in guiding the simulations; this is partly because the sequence-based contact predictions are usually considered to be less accurate than that by threading. With the rapid progress in sequence databases and machine-learning techniques, it is necessary to have a detailed and comprehensive assessment of the contact-prediction methods in different template conditions. RESULTS We develop two methods for protein-contact predictions: SVM-SEQ is a sequence-based machine learning approach which trains a variety of sequence-derived features on contact maps; SVM-LOMETS collects consensus contact predictions from multiple threading templates. We test both methods on the same set of 554 proteins which are categorized into 'Easy', 'Medium', 'Hard' and 'Very Hard' targets based on the evolutionary and structural distance between templates and targets. For the Easy and Medium targets, SVM-LOMETS obviously outperforms SVM-SEQ; but for the Hard and Very Hard targets, the accuracy of the SVM-SEQ predictions is higher than that of SVM-LOMETS by 12-25%. If we combine the SVM-SEQ and SVM-LOMETS predictions together, the total number of correctly predicted contacts in the Hard proteins will increase by more than 60% (or 70% for the long-range contact with a sequence separation ≥24), compared with SVM-LOMETS alone. The advantage of SVM-SEQ is also shown in the CASP7 free modeling targets where the SVM-SEQ is around four times more accurate than SVM-LOMETS in the long-range contact prediction. These data demonstrate that the state-of-the-art sequence-based contact prediction has reached a level which may be helpful in assisting tertiary structure modeling for the targets which do not have close structure templates.
The maximum yield should be obtained by the combination of both sequence- and template-based predictions.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
|
25
|
Mereghetti P, Ganadu ML, Papaleo E, Fantucci P, De Gioia L. Validation of protein models by a neural network approach. BMC Bioinformatics 2008; 9:66. [PMID: 18230168 PMCID: PMC2276493 DOI: 10.1186/1471-2105-9-66] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 04/24/2007] [Accepted: 01/29/2008] [Indexed: 11/30/2022] Open
Abstract
Background The development and improvement of reliable computational methods designed to evaluate the quality of protein models is relevant in the context of protein structure refinement, which has been recently identified as one of the bottlenecks limiting the quality and usefulness of protein structure prediction. Results In this contribution, we present a computational method (Artificial Intelligence Decoys Evaluator: AIDE) which is able to consistently discriminate between correct and incorrect protein models. In particular, the method is based on neural networks that use as input 15 structural parameters, which include energy, solvent accessible surface, hydrophobic contacts and secondary structure content. The results obtained with AIDE on a set of decoy structures were evaluated using statistical indicators such as Pearson correlation coefficients, Znat, fraction enrichment, as well as ROC plots. It turned out that AIDE performances are comparable and often complementary to available state-of-the-art learning-based methods. Conclusion In light of the results obtained with AIDE, as well as its comparison with available learning-based methods, it can be concluded that AIDE can be successfully used to evaluate the quality of protein structures. The use of AIDE in combination with other evaluation tools is expected to further enhance protein refinement efforts.
Affiliation(s)
- Paolo Mereghetti
- Department of Chemistry, University of Sassari, Via Vienna 2, 07100, Sassari, Italy.
|
26
|
Jauch R, Yeo HC, Kolatkar PR, Clarke ND. Assessment of CASP7 structure predictions for template free targets. Proteins 2008; 69 Suppl 8:57-67. [PMID: 17894330 DOI: 10.1002/prot.21771] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Indexed: 11/09/2022]
Abstract
In CASP7, protein structure prediction targets that lacked substantial similarity to a protein in the PDB at the time of assessment were considered to be free modeling targets (FM). We assessed predictions for 14 FM targets as well as four other targets that were deemed to be on the borderline between FM targets and template based modeling targets (TBM/FM). GDT_TS was used as one measure of model quality. Model quality was also assessed by visual inspection. Visual inspection was performed by three independent assessors who were blinded to GDT_TS scores and other quantitative measures of model quality. The best models by visual inspection tended to rank among the top few percent by GDT_TS, but were typically not the highest scoring models. Thus, visual inspection remains an essential component of assessment for FM targets. Overall, group TS020 (Baker) performed best, but success on individual targets was widely distributed among many groups. Among these other groups, TS024 and TS025 (Zhang and Zhang server) performed notably well without exceptionally large computing resources. This should be considered encouraging for future CASPs. There was a sense of progress in template FM relative to CASP6, but we were unable to demonstrate this progress objectively.
Affiliation(s)
- Ralf Jauch
- Computational and Systems Biology, Genome Institute of Singapore, Singapore
|
27
|
|
28
|
Shestopalov BV. The code-based physics of formation of alpha-helices and beta-hairpins in water-soluble proteins. Dokl Biochem Biophys 2007; 416:245-7. [PMID: 18064823 DOI: 10.1134/s1607672907050055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Affiliation(s)
- B V Shestopalov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretskii pr. 4, St. Petersburg, 194064, Russia
|
29
|
Shmygelska A, Hoos HH. An adaptive bin framework search method for a beta-sheet protein homopolymer model. BMC Bioinformatics 2007; 8:136. [PMID: 17451609 PMCID: PMC1894818 DOI: 10.1186/1471-2105-8-136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/12/2007] [Accepted: 04/24/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The problem of protein structure prediction consists of predicting the functional or native structure of a protein given its linear sequence of amino acids. This problem has played a prominent role in the fields of biomolecular physics and algorithm design for over 50 years. Additionally, its importance increases continually as a result of an exponential growth over time in the number of known protein sequences in contrast to a linear increase in the number of determined structures. Our work focuses on the problem of searching an exponentially large space of possible conformations as efficiently as possible, with the goal of finding a global optimum with respect to a given energy function. This problem plays an important role in the analysis of systems with complex search landscapes, and particularly in the context of ab initio protein structure prediction. RESULTS In this work, we introduce a novel approach for solving this conformation search problem based on the use of a bin framework for adaptively storing and retrieving promising locally optimal solutions. Our approach provides a rich and general framework within which a broad range of adaptive or reactive search strategies can be realized. Here, we introduce adaptive mechanisms for choosing which conformations should be stored, based on the set of conformations already stored in memory, and for biasing choices when retrieving conformations from memory in order to overcome search stagnation. CONCLUSION We show that our bin framework combined with a widely used optimization method, Monte Carlo search, achieves significantly better performance than state-of-the-art generalized ensemble methods for a well-known protein-like homopolymer model on the face-centered cubic lattice.
Affiliation(s)
- Alena Shmygelska
- Department of Structural Biology, Stanford University, 299 W. Campus Dr., Stanford, CA 94305, USA
- Holger H Hoos
- Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC V6T 1Z4, Canada
|
30
|
Ling Z, Tran KC, Arnold JJ, Teng MN. Purification and characterization of recombinant human respiratory syncytial virus nonstructural protein NS1. Protein Expr Purif 2007; 57:261-70. [PMID: 17997327 DOI: 10.1016/j.pep.2007.09.017] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 08/15/2007] [Revised: 09/18/2007] [Accepted: 09/19/2007] [Indexed: 10/22/2022]
Abstract
We report here the first biochemical and structural characterization of the respiratory syncytial virus (RSV) NS1 protein. We have used a pET-ubiquitin expression system to produce respiratory syncytial virus (RSV) NS1 protein in E. coli that contains a hexahistidine-tag on either the amino- or carboxyl-terminus (His(6)-NS1 and NS1-His(6), respectively). We have been able to isolate milligram quantities of highly purified His(6)-NS1 and NS1-His(6) by nickel affinity chromatography. Generation of recombinant RSV indicated that addition of the hexahistidine tag to the C-terminus of NS1 slightly decreased viral replication competence whereas addition of the tag to the N-terminus had no observable effect. Therefore, we performed a comprehensive biochemical and biophysical characterization on His(6)-NS1. His(6)-NS1 is monodisperse in solution as determined by dynamic light scattering analysis. Both gel filtration and analytical ultracentrifugation showed that His(6)-NS1 is predominantly a monomer. In agreement with theoretical predictions, circular dichroism spectroscopy showed that His(6)-NS1 contains 21% alpha-helices, 34% beta-sheets, and 45% undefined structure. Immunization with purified His(6)-NS1 generated an antiserum that specifically recognizes NS1 by immunoprecipitation from HEp-2 cells infected by RSV, indicating that His(6)-NS1 resembles native NS1. The availability of purified RSV NS1 will permit biochemical and structural investigations providing insight into the function of NS1 in viral replication and interferon antagonism.
Affiliation(s)
- Zhenhua Ling
- Graduate Program in Biochemistry, Microbiology, and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
|
31
|
Drabikowski M, Nowakowski S, Tiuryn J. Library of local descriptors models the core of proteins accurately. Proteins 2007; 69:499-510. [PMID: 17623841 DOI: 10.1002/prot.21393] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
In this article, we present a novel approach to describing proteins based on multifragment structure motifs called local descriptors. We collect structurally similar descriptors in groups to construct a compact library of groups of descriptors. To demonstrate its feasibility for a wide spectrum of applications, ranging from structure comparison and analysis to structure prediction, it is critical to show the ability of groups from our library to reproduce proteins accurately. We show that this library describes all local 3D structure patterns occurring in the core of proteins and present an algorithm for reconstruction of accurate global 3D structures. Moreover, we show that the sequence of motifs used in such a construction correlates significantly with the amino acid sequence of the considered protein. Finally, we present how our library may be successfully used for predicting protein sequence based on the structure.
Affiliation(s)
- Michał Drabikowski
- Institute of Informatics, Warsaw University, Banacha 2, 02-097 Warszawa, Poland.
|
32
|
Wang Z, Smith CE, Atchley WR. Application of complex demodulation on bZIP and bHLH-PAS protein domains. Math Biosci 2007; 207:204-18. [PMID: 17374384 DOI: 10.1016/j.mbs.2007.01.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/21/2006] [Revised: 12/29/2006] [Accepted: 01/10/2007] [Indexed: 11/22/2022]
Abstract
Proteins are built with molecular modular building blocks such as an alpha-helix, beta-sheet, loop region and other structures. This is an economical way of constructing complex molecules. Periodicity analysis of protein sequences has allowed us to obtain meaningful information concerning their structure, function and evolution. In this work, complex demodulation (CDM) is introduced to detect functional regions in protein sequences data. More specifically, we analyzed bZIP and bHLH-PAS protein domains. Complex demodulation provided insightful information about changing amplitudes of periodic components in protein sequences. Furthermore, it was found that the local amplitude minimum or local amplitude maximum of the 3.6-aa periodic component is associated with protein structural or functional information due to the observation that the extrema are mainly located in the boundary area of two structural or functional regions.
Affiliation(s)
- Zhi Wang
- Graduate Program in Biomathematics, North Carolina State University, Raleigh, NC 27695-8203, USA
|
33
|
Eyal E, Frenkel-Morgenstern M, Sobolev V, Pietrokovski S. A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction. Proteins 2007; 67:142-53. [PMID: 17243158 DOI: 10.1002/prot.21223] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large amount of integrated high quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Predicting protein residue-residue contacts from sequence information alone, by our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets of (300-2000) protein decoys. On the basis of evolutionary information alone our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of sets the native structure is ranked in the top 10% of the decoys. The method can, thus, be used to assist filtering wrong models, complementing traditional scoring functions.
Affiliation(s)
- Eran Eyal
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot 76100, Israel.
|
34
|
A new approach to the assessment of the quality of predictions of transcription factor binding sites. J Biomed Inform 2007; 40:139-49. [DOI: 10.1016/j.jbi.2006.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/20/2006] [Revised: 06/23/2006] [Accepted: 07/13/2006] [Indexed: 11/22/2022]
|
35
|
Bumbaca D, Littlejohn JE, Nayakanti H, Lucas AH, Rigden DJ, Galperin MY, Jedrzejas MJ. Genome-based identification and characterization of a putative mucin-binding protein from the surface of Streptococcus pneumoniae. Proteins 2006; 66:547-58. [PMID: 17115425 DOI: 10.1002/prot.21205] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Streptococcus pneumoniae open reading frame SP1492 encodes a surface protein that contains a novel conserved domain similar to the repeated fragments of mucin-binding proteins from lactobacilli and lactococci. To investigate the functional role(s) of this protein and its potential adhesive properties, the surface-exposed region of SP1492 was expressed in Escherichia coli, purified to homogeneity, and partially characterized by biophysical and immunological methods. Circular dichroism and sedimentation measurements confirmed that SP1492 is an all-beta protein that exists in solution as a monomer. The SP1492 protein has been shown to be expressed by S. pneumoniae and was experimentally localized to its surface. The protein functional domain binds to mucins II and III from porcine stomach and to purified submaxillary bovine gland mucin. It appears to be one of the very few unambiguous pneumococcal adhesin molecules known to date. A hypothetical model constructed by ab initio techniques predicts a novel beta-sandwich protein structure.
Affiliation(s)
- Daniela Bumbaca
- Center for Immunobiology and Vaccine Development, Children's Hospital Oakland Research Institute, Oakland, California 94609, USA
|
36
|
Elcock AH. Molecular simulations of cotranslational protein folding: fragment stabilities, folding cooperativity, and trapping in the ribosome. PLoS Comput Biol 2006; 2:e98. [PMID: 16789821 PMCID: PMC1523309 DOI: 10.1371/journal.pcbi.0020098.eor] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2006] [Accepted: 06/14/2006] [Indexed: 11/19/2022] Open
Abstract
Although molecular simulation methods have yielded valuable insights into mechanistic aspects of protein refolding in vitro, they have up to now not been used to model the folding of proteins as they are actually synthesized by the ribosome. To address this issue, we report here simulation studies of three model proteins: chymotrypsin inhibitor 2 (CI2), barnase, and Semliki forest virus protein (SFVP), and directly compare their folding during ribosome-mediated synthesis with their refolding from random, denatured conformations. To calibrate the methodology, simulations are first compared with in vitro data on the folding stabilities of N-terminal fragments of CI2 and barnase; the simulations reproduce the fact that both the stability and thermal folding cooperativity increase as fragments increase in length. Coupled simulations of synthesis and folding for the same two proteins are then described, showing that both fold essentially post-translationally, with mechanisms effectively identical to those for refolding. In both cases, confinement of the nascent polypeptide chain within the ribosome tunnel does not appear to promote significant formation of native structure during synthesis; there are however clear indications that the formation of structure within the nascent chain is sensitive to location within the ribosome tunnel, being subject to both gain and loss as the chain lengthens. Interestingly, simulations in which CI2 is artificially stabilized show a pronounced tendency to become trapped within the tunnel in partially folded conformations: non-cooperative folding, therefore, appears in the simulations to exert a detrimental effect on the rate at which fully folded conformations are formed. Finally, simulations of the two-domain protease module of SFVP, which experimentally folds cotranslationally, indicate that for multi-domain proteins, ribosome-mediated folding may follow different pathways from those taken during refolding. 
Taken together, these studies provide a first step toward developing more realistic methods for simulating protein folding as it occurs in vivo.
Affiliation(s)
- Adrian H Elcock
- Department of Biochemistry, University of Iowa, Iowa City, Iowa, USA.
|
37
|
Elcock AH. Molecular simulations of cotranslational protein folding: fragment stabilities, folding cooperativity, and trapping in the ribosome. PLoS Comput Biol 2006. [PMID: 16789821 PMCID: PMC1523309 DOI: 10.1371/journal.pcbi.0020098] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Affiliation(s)
- Adrian H Elcock
- Department of Biochemistry, University of Iowa, Iowa City, Iowa, USA.
|
38
|
Stumpff-Kane AW, Feig M. A correlation-based method for the enhancement of scoring functions on funnel-shaped energy landscapes. Proteins 2006; 63:155-64. [PMID: 16397892 DOI: 10.1002/prot.20853] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
A correlation-based approach is introduced for enhancing the ability of structure-scoring methods to identify and distinguish native-like conformations. The proposed method relies on a funnel-shaped scoring function that decreases steadily toward the native state. It takes advantage of the idea that the structure from a given ensemble that is closest to the native basin leads to the highest correlation coefficient between a given score and distance to that structure as an approximation of the native state for the entire ensemble. The method is applied successfully to a number of different test cases that demonstrate substantial improvements in the correlation of the score with the distance from the true native state but also result in the selection of more native-like structures compared to the original score.
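The selection idea in this abstract can be sketched in a few lines. The snippet below is an illustrative reading, not the authors' implementation: the function names, the toy one-dimensional "conformations", and the choice to leave each candidate reference out of its own correlation are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def pick_reference(scores, distances):
    """Return (index, r) of the ensemble member whose distances to all other
    members correlate best with their scores (hypothetical helper)."""
    n = len(scores)
    best_index, best_r = None, -math.inf
    for ref in range(n):
        others = [j for j in range(n) if j != ref]  # assumption: exclude the reference itself
        r = pearson([distances[ref][j] for j in others],
                    [scores[j] for j in others])
        if r > best_r:
            best_index, best_r = ref, r
    return best_index, best_r


# Toy ensemble: conformations are points on a line, and the score is a
# funnel that decreases steadily toward position 0 (lower score is better).
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
scores = [0.0, 1.0, 2.0, 3.0, 4.0]
distances = [[abs(a - b) for b in positions] for a in positions]
best_index, best_r = pick_reference(scores, distances)
```

In this toy case the member at the bottom of the funnel is selected, because its distances to the other members track their scores exactly. With real ensembles, `distances` would hold a structural distance such as RMSD between every pair of conformations.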
Affiliation(s)
- Andrew W Stumpff-Kane
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824-1319, USA
|
39
|
Takamoto K, Chance MR. Radiolytic protein footprinting with mass spectrometry to probe the structure of macromolecular complexes. Annu Rev Biophys Biomol Struct 2006; 35:251-76. [PMID: 16689636 DOI: 10.1146/annurev.biophys.35.040405.102050] [Citation(s) in RCA: 197] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Structural proteomics approaches using mass spectrometry are increasingly used in biology to examine the composition and structure of macromolecules. Hydroxyl radical-mediated protein footprinting using mass spectrometry has recently been developed to define structure, assembly, and conformational changes of macromolecules in solution based on measurements of reactivity of amino acid side chain groups with covalent modification reagents. Accurate measurements of side chain reactivity are achieved using quantitative liquid-chromatography-coupled mass spectrometry, whereas the side chain modification sites are identified using tandem mass spectrometry. In addition, the use of footprinting data in conjunction with computational modeling approaches is a powerful new method for testing and refining structural models of macromolecules and their complexes. In this review, we discuss the basic chemistry of hydroxyl radical reactions with peptides and proteins, highlight various approaches to map protein structure using radical oxidation methods, and describe state-of-the-art approaches to combine computational and footprinting data.
Affiliation(s)
- Keiji Takamoto
- Case Center for Proteomics, Case Western Reserve University, Cleveland, Ohio 44106, USA
|
40
|
Graña O, Baker D, MacCallum RM, Meiler J, Punta M, Rost B, Tress ML, Valencia A. CASP6 assessment of contact prediction. Proteins 2006; 61 Suppl 7:214-224. [PMID: 16187364 DOI: 10.1002/prot.20739] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Here we present the evaluation results of the Critical Assessment of Protein Structure Prediction (CASP6) contact prediction category. Contact prediction was assessed with standard measures well known in the field and the performance of specialist groups was evaluated alongside groups that submitted models with 3D coordinates. The evaluation was mainly focused on long range contact predictions for the set of new fold targets, although we analyzed predictions for all targets. Three groups with similar levels of accuracy and coverage performed a little better than the others. Comparisons of the predictions of the three best methods with those of CASP5/CAFASP3 suggested some improvement, although there were not enough targets in the comparisons to make this statistically significant.
Affiliation(s)
- Osvaldo Graña
- Protein Design Group, Centro Nacional de Biotecnologia (CNB-CSIC), C/Darwin 3, Cantoblanco, Madrid, Spain
|
41
|
Vincent JJ, Tai CH, Sathyanarayana BK, Lee B. Assessment of CASP6 predictions for new and nearly new fold targets. Proteins 2006; 61 Suppl 7:67-83. [PMID: 16187347 DOI: 10.1002/prot.20722] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
This is a report of the assessment of the predictions made for the CASP6 protein structure prediction experiment conducted in 2004 in the New Fold (NF) category. There were nine protein domains that were judged to have new folds (NF) and 16 for which a similar structure was known but the sequence similarity was judged to be too low for them to be easily recognized (FR/A). We selected all NF targets and eight of the 16 FR/A targets judged to be at the borderline between NF and FR/A for evaluation in the NF category. A total of 165 prediction groups submitted over 7400 structural models for these targets. The quality of these models was evaluated using the GDT_TS scores of the structural similarity detection program LGA and by visual inspection of the top-scoring models. The best models submitted bore an overall similarity to the target structure for three or four of the nine NF targets and for all but one of the FR/A targets. High-scoring models for the NF targets were submitted by several different groups. When both the NF and FR/A targets were considered, Baker group dominated by submitting best models for seven of the 17 targets, but 14 other groups also managed to submit best models for one or more targets.
Affiliation(s)
- James J Vincent
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
|
42
|
Chivian D, Kim DE, Malmström L, Schonbrun J, Rohl CA, Baker D. Prediction of CASP6 structures using automated Robetta protocols. Proteins 2006; 61 Suppl 7:157-166. [PMID: 16187358 DOI: 10.1002/prot.20733] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
The Robetta server and revised automatic protocols were used to predict structures for CASP6 targets. Robetta is a publicly available protein structure prediction server (http://robetta.bakerlab.org/) that uses the Rosetta de novo and homology modeling structure prediction methods. We incorporated some of the lessons learned in the CASP5 experiment into the server prior to participating in CASP6. We additionally tested new ideas that were amenable to full automation with an eye toward improving the server. We find that the Robetta server shows the greatest promise for the more challenging targets. The most significant finding from CASP5, that automated protocols can be roughly comparable in ability with the better human-intervention predictors, is repeated here in CASP6.
Affiliation(s)
- Dylan Chivian
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|
43
|
Marsden RL, Ranea JAG, Sillero A, Redfern O, Yeats C, Maibaum M, Lee D, Addou S, Reeves GA, Dallman TJ, Orengo CA. Exploiting protein structure data to explore the evolution of protein function and biological complexity. Philos Trans R Soc Lond B Biol Sci 2006; 361:425-40. [PMID: 16524831 PMCID: PMC1609337 DOI: 10.1098/rstb.2005.1801] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry, University College London Gower Street, London WC1E 6BT, UK.
|
44
|
Moult J. Rigorous performance evaluation in protein structure modelling and implications for computational biology. Philos Trans R Soc Lond B Biol Sci 2006; 361:453-8. [PMID: 16524833 PMCID: PMC1609338 DOI: 10.1098/rstb.2005.1810] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
In principle, given the amino acid sequence of a protein, it is possible to compute the corresponding three-dimensional structure. Methods for modelling structure based on this premise have been under development for more than 40 years. For the past decade, a series of community wide experiments (termed Critical Assessment of Structure Prediction (CASP)) have assessed the state of the art, providing a detailed picture of what has been achieved in the field, where we are making progress, and what major problems remain. The rigorous evaluation procedures of CASP have been accompanied by substantial progress. Lessons from this area of computational biology suggest a set of principles for increasing rigor in the field as a whole.
Affiliation(s)
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, USA.
|
45
|
Gianese G, Pascarella S. A consensus procedure improving solvent accessibility prediction. J Comput Chem 2006; 27:621-6. [PMID: 16470666 DOI: 10.1002/jcc.20370] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Prediction methods of structural features in 1D represent a useful tool for the understanding of folding, classification, and function of proteins, and, in particular, for 3D structure prediction. Among the structural aspects characterizing a protein, solvent accessibility has received great attention in recent years. The available methods proposed for predicting accessibility have never considered the combination of the results deriving from different methods to construct a consensus prediction able to provide more reliable results. A consensus approach that increases prediction accuracy using three high-performance methods is described. The results of our method for three different protein data sets show that up to 3.0% improvement in prediction accuracy of solvent accessibility may be obtained by a consensus approach. The improvement also extends to the correlation coefficient. Application of our consensus approach to the accessibility prediction using only three prediction methods gives results better than single methods combined for consensus formation. Currently, the scarce availability of predictors with similar parameters defining solvent accessibility hinders the testing of other methods in our consensus procedure.
Affiliation(s)
- Giulio Gianese
- Dipartimento di Scienze Biochimiche A. Rossi Fanelli, Università La Sapienza, 00185 Roma, Italy
|
46
|
Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2004; 57:702-10. [PMID: 15476259 DOI: 10.1002/prot.20264] [Citation(s) in RCA: 1296] [Impact Index Per Article: 72.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We have developed a new scoring function, the template modeling score (TM-score), to assess the quality of protein structure templates and predicted full-length models by extending the approaches used in Global Distance Test (GDT) and MaxSub. First, a protein size-dependent scale is exploited to eliminate the inherent protein size dependence of the previous scores and appropriately account for random protein structure pairs. Second, rather than setting specific distance cutoffs and calculating only the fractions with errors below the cutoff, all residue pairs in alignment/modeling are evaluated in the proposed score. For comparison of various scoring functions, we have constructed a large-scale benchmark set of structure templates for 1489 small to medium size proteins using the threading program PROSPECTOR_3 and built the full-length models using MODELLER and TASSER. The TM-score of the initial threading alignments, compared to the GDT and MaxSub scoring functions, shows a much stronger correlation to the quality of the final full-length models. The TM-score is further exploited as an assessment of all 'new fold' targets in the recent CASP5 experiment and shows a close coincidence with the results of human-expert visual assessment. These data suggest that the TM-score is a useful complement to the fully automated assessment of protein structure predictions. The executable program of TM-score is freely downloadable at http://bioinformatics.buffalo.edu/TM-score.
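To make the two design points of the abstract concrete (a size-dependent distance scale, and a contribution from every aligned residue pair rather than a cutoff), here is a minimal sketch of the score for one fixed superposition. It uses the commonly quoted TM-score form with d0(L) = 1.24 (L - 15)^(1/3) - 1.8; the function names and the 0.5 Å floor on d0 for very short chains are assumptions of this example. The search over superpositions that maximizes the score is deliberately left out.

```python
def tm_d0(target_length):
    """Size-dependent distance scale d0 in angstroms.

    The 0.5 floor is an assumption of this sketch; it keeps d0 positive
    for short chains, where the cube-root expression becomes tiny or
    undefined for real numbers.
    """
    if target_length <= 15:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed_superposition(distances, target_length):
    """TM-score contribution of one given superposition.

    distances: per-residue distances (angstroms) between aligned residue
    pairs after superposition. Unaligned target residues contribute zero,
    which is why the sum is divided by the full target length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect model scores 1.0; a model whose every residue is off by
# exactly d0 scores 0.5, independent of protein length.
perfect = tm_score_fixed_superposition([0.0] * 100, 100)
half = tm_score_fixed_superposition([tm_d0(100)] * 100, 100)
```

Because each term decays smoothly with distance, no single cutoff decides whether a residue counts, which is the contrast with GDT and MaxSub drawn in the abstract.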
Affiliation(s)
- Yang Zhang
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York 14203, USA
|
47
|
Ho BK, Dill KA. Folding very short peptides using molecular dynamics. PLoS Comput Biol 2006; 2:e27. [PMID: 16617376 PMCID: PMC1435986 DOI: 10.1371/journal.pcbi.0020027] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2005] [Accepted: 02/20/2005] [Indexed: 11/29/2022] Open
Abstract
Peptides often have conformational preferences. We simulated 133 peptide 8-mer fragments from six different proteins, sampled by replica-exchange molecular dynamics using Amber7 with a GB/SA (generalized-Born/solvent-accessible electrostatic approximation to water) implicit solvent. We found that 85 of the peptides have no preferred structure, while 48 of them converge to a preferred structure. In 85% of the converged cases (41 peptides), the structures found by the simulations bear some resemblance to their native structures, based on a coarse-grained backbone description. In particular, all seven of the β hairpins in the native structures contain a fragment in the turn that is highly structured. In the eight cases where the bioinformatics-based I-sites library picks out native-like structures, the present simulations are largely in agreement. Such physics-based modeling may be useful for identifying early nuclei in folding kinetics and for assisting in protein-structure prediction methods that utilize the assembly of peptide fragments.
To carry out specific biochemical reactions, proteins must adopt precise three-dimensional conformations. During the folding of a protein, the protein picks out the right conformation out of billions of other conformations. It is not yet possible to do this computationally. Picking out the native conformation using physics-based atomically detailed models, sampled by molecular dynamics, is presently beyond the reach of computer methods. How can we speed up computational protein-structure prediction? One idea is that proteins start folding at specific parts of a chain that kink up early in the folding process. If we can identify these kinks, we should be able to speed up protein-structure prediction. Previous studies have identified likely kinks through bioinformatic analysis of existing protein structures. The goal of the authors here is to identify these putative folding initiation sites with a physical model instead. In this study, Ho and Dill show that, by chopping a protein chain into peptide pieces, then simulating the pieces in molecular dynamics, they can identify those peptide fragments that have conformational biases. These peptides identify the kinks in the protein chain.
Affiliation(s)
- Bosco K Ho
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA.
|
48
|
Fujitsuka Y, Chikenji G, Takada S. SimFold energy function for de novo protein structure prediction: consensus with Rosetta. Proteins 2006; 62:381-98. [PMID: 16294329 DOI: 10.1002/prot.20748] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse-grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent-induced effects on the basis of physicochemical consideration. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 Å for 12 of 38 proteins; this success rate was the same as that by the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence-specific terms have weak but positive roles. In the benchmark, well-predicted proteins by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than SimFold or Rosetta separately. For each of 38 proteins, structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping sampled structural space onto two dimensions. For proteins for which one of the two methods succeeded and the other failed in prediction, the successful method had a less scattered ensemble located around the native structure. For proteins for which both methods succeeded in prediction, the two ensembles were often intermixed.
Affiliation(s)
- Yoshimi Fujitsuka
- Graduate School of Natural Science and Technology, Kobe University, Kobe, Japan
|
49
|
Benros C, de Brevern AG, Etchebest C, Hazout S. Assessing a novel approach for predicting local 3D protein structures from sequence. Proteins 2006; 62:865-80. [PMID: 16385557 DOI: 10.1002/prot.20815] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a series of seven PBs and used HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (mean fragments from each cluster), with good three-dimensional local approximation, i.e., a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is intended to optimize the exploitation of the sequence-structure relations deduced from this library of long protein fragments. This was achieved by setting up a system of 120 experts, each defined by logistic regression to optimize the discrimination from sequence of a given prototype relative to the others. For a target sequence window, the experts computed probabilities of sequence-structure compatibility for the prototypes and ranked them, proposing the top scorers as structural candidates. Predictions were defined as successful when a prototype <2.5 Å from the true local structure was found among those proposed. Our strategy yielded a prediction rate of 51.2% for an average of 4.2 candidates per sequence window. We also proposed a confidence index to estimate prediction quality. Our approach predicts from sequence alone and will thus provide valuable information for proteins without structural homologs. Candidates will also contribute to global structure prediction by fragment assembly.
Affiliation(s)
- Cristina Benros
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris 7, Paris, France.
|
50
|
Damm KL, Carlson HA. Gaussian-weighted RMSD superposition of proteins: a structural comparison for flexible proteins and predicted protein structures. Biophys J 2006; 90:4558-73. [PMID: 16565070 PMCID: PMC1471868 DOI: 10.1529/biophysj.105.066654] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Many proteins contain flexible structures such as loops and hinged domains. A simple root mean square deviation (RMSD) alignment of two different conformations of the same protein can be skewed by the difference between the mobile regions. To overcome this problem, we have developed a novel method to overlay two protein conformations by their atomic coordinates using a Gaussian-weighted RMSD (wRMSD) fit. The algorithm is based on the Kabsch least-squares method and determines an optimal transformation between two molecules by calculating the minimal weighted deviation between the two coordinate sets. Unlike other techniques that choose subsets of residues to overlay, all atoms are included in the wRMSD overlay. Atoms that barely move between the two conformations will have a greater weighting than those that have a large displacement. Our superposition tool has produced successful alignments when applied to proteins for which two conformations are known. The transformation calculation is heavily weighted by the coordinates of the static region of the two conformations, highlighting the range of flexibility in the overlaid structures. Lastly, we show how wRMSD fits can be used to evaluate predicted protein structures. Comparing a predicted fold to its experimentally determined target structure is another case of comparing two protein conformations of the same sequence, and the degree of alignment directly reflects the quality of the prediction.
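The weighting idea in this abstract can be illustrated without the full fitting procedure. The sketch below covers only the weighting step for two coordinate sets that are already superposed: each atom receives a Gaussian weight that falls off with its squared displacement, and the deviation is averaged with those weights. The weight form `exp(-d^2 / c)`, the value `c = 5.0` Å², and the function names are assumptions of this example, and the Kabsch-based transformation that the authors' algorithm determines is not implemented here.

```python
import math


def squared_displacements(coords_a, coords_b):
    """Squared distance between corresponding atoms of two conformations."""
    return [sum((p - q) ** 2 for p, q in zip(a, b))
            for a, b in zip(coords_a, coords_b)]


def gaussian_weights(coords_a, coords_b, c=5.0):
    """Weight near 1 for atoms that barely move, near 0 for mobile atoms.

    c (in squared angstroms) is a hypothetical scaling constant.
    """
    return [math.exp(-d2 / c)
            for d2 in squared_displacements(coords_a, coords_b)]


def weighted_rmsd(coords_a, coords_b, weights):
    """Root of the weighted mean squared displacement over all atoms."""
    d2s = squared_displacements(coords_a, coords_b)
    return math.sqrt(sum(w * d2 for w, d2 in zip(weights, d2s)) / sum(weights))


# Three static atoms and one "loop" atom that moves by 10 angstroms.
conf_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 10.0, 0.0)]

weights = gaussian_weights(conf_a, conf_b)
wrmsd = weighted_rmsd(conf_a, conf_b, weights)
plain_rmsd = weighted_rmsd(conf_a, conf_b, [1.0] * len(conf_a))
```

Here the mobile atom dominates the plain RMSD but contributes almost nothing to the weighted value, which is the behavior the abstract describes: all atoms are included, yet the static region drives the result.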
Affiliation(s)
- Kelly L Damm
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1065, USA
|