1
Pellequer JL. Perspectives Toward an Integrative Structural Biology Pipeline With Atomic Force Microscopy Topographic Images. J Mol Recognit 2024:e3102. [PMID: 39329418 DOI: 10.1002/jmr.3102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 08/21/2024] [Accepted: 09/03/2024] [Indexed: 09/28/2024]
Abstract
After the two recent revolutions in structural biology, namely direct detectors for cryo-electron microscopy, which markedly improved the attainable resolution of large macromolecular structures, and AlphaFold, which enables near-accurate prediction of virtually any protein structure, the field of structural biology is now pursuing more ambitious targets, including assemblies of several MDa. But complex target systems cannot be tackled with a single biophysical technique. Integrative structural biology has emerged as a general solution: it integrates data from multiple complementary techniques to produce a final three-dimensional model that no single technique could provide. The absence of atomic force microscopy (AFM) data from integrative structural biology platforms is not necessarily due to its nm resolution, as opposed to the Å resolution of X-ray crystallography, nuclear magnetic resonance, or electron microscopy. Rather, a significant issue was that AFM topographic data lacked interpretability. Fortunately, with the introduction of the AFM-Assembly pipeline and other similar tools, it is now possible to integrate AFM topographic data into integrative modeling platforms. The advantages of single-molecule techniques such as AFM include the ability to experimentally confirm assembled molecular models or to produce alternative conformations that mimic the inherent flexibility of large proteins or complexes. The review begins with a brief overview of the historical development of AFM data in structural biology, followed by an examination of the strengths and limitations of AFM imaging that have hindered its integration into modern modeling platforms. It then discusses the correction and improvement of AFM topographic images, as well as the principles behind the AFM-Assembly pipeline. Finally, it presents and discusses a series of challenges that need to be addressed to improve the incorporation of AFM data into integrative modeling platforms.
Affiliation(s)
- Jean-Luc Pellequer
- Univ. Grenoble Alpes, CEA, CNRS, Institut de Biologie Structurale (IBS), Grenoble, France
2
Abstract
For two decades, Rosetta has consistently been at the forefront of protein structure prediction. Although it has grown into a very large package of programs, scripts, and tools for different types of macromolecular modelling, such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term 'Rosetta' first appeared in the literature twenty years ago to describe that algorithm and its contribution to the third edition of the community-wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Just as the Rosetta Stone allowed ancient Egyptian hieroglyphs to be deciphered, David Baker and his co-workers have been contributing to deciphering 'the second half of the genetic code'. Although the focus of Baker's team has expanded to de novo protein design in the past few years, Rosetta's 'fame' is associated with its fragment-assembly protein structure prediction approach. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and the use of fragments, we review the main stages of its development and highlight the milestones it has achieved in protein structure prediction, particularly in CASP.
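The fragment-assembly idea described above can be illustrated with a toy Metropolis Monte Carlo loop. This is a minimal sketch under stated assumptions, not Rosetta's implementation: the chain is reduced to a list of (phi, psi) torsion pairs, `fragments` is a hypothetical library mapping a start position to candidate torsion windows, and `score` is any user-supplied energy where lower is better.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Torsion = Tuple[float, float]  # (phi, psi) in degrees


def fragment_assembly(
    torsions: Sequence[Torsion],
    fragments: Dict[int, List[Sequence[Torsion]]],
    score: Callable[[List[Torsion]], float],
    n_steps: int = 1000,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Torsion], float]:
    """Toy fragment-insertion Monte Carlo; returns the best conformation seen and its score."""
    rng = random.Random(seed)
    current = list(torsions)
    current_score = score(current)
    best, best_score = list(current), current_score
    positions = sorted(fragments)
    for _ in range(n_steps):
        start = rng.choice(positions)
        frag = rng.choice(fragments[start])
        if start + len(frag) > len(current):
            continue  # skip fragments that would run past the chain end
        # Replace the torsion window beginning at `start` with the fragment's torsions.
        trial = current[:start] + list(frag) + current[start + len(frag):]
        trial_score = score(trial)
        delta = trial_score - current_score
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score
```

With a toy score that rewards helical torsions, repeated insertions of helical 3-residue windows drive a 9-residue chain to an all-helix conformation with score 0.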
Affiliation(s)
- Jad Abbass
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, United Kingdom
3
Hou J, Adhikari B, Tanner JJ, Cheng J. SAXSDom: Modeling multidomain protein structures using small-angle X-ray scattering data. Proteins 2020; 88:775-787. [PMID: 31860156 PMCID: PMC7230021 DOI: 10.1002/prot.25865] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/15/2019] [Revised: 11/18/2019] [Accepted: 12/14/2019] [Indexed: 12/27/2022]
Abstract
Many proteins are composed of several domains that pack together into a complex tertiary structure. Multidomain proteins can be challenging for protein structure modeling, particularly those for which templates can be found for individual domains but not for the entire sequence. In such cases, homology modeling can generate high-quality models of the domains but not of the orientations between them. Small-angle X-ray scattering (SAXS) reports the structural properties of entire proteins and has the potential to guide homology modeling of multidomain proteins. In this article, we describe a novel multidomain protein assembly modeling method, SAXSDom, that integrates experimental knowledge from SAXS with a probabilistic input-output hidden Markov model to assemble the structures of individual domains. Four SAXS-based scoring functions were developed and tested, and the method was evaluated on multidomain proteins from two public datasets. Incorporating SAXS information improved the accuracy of domain assembly for 40 of 46 multidomain targets from the Critical Assessment of protein Structure Prediction (CASP) and 45 of 73 multidomain targets from the ab initio domain assembly dataset. The results demonstrate that SAXS data can provide useful information to improve the accuracy of domain-domain assembly. The source code and tool packages are available at https://github.com/jianlin-cheng/SAXSDom.
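SAXSDom's four scoring functions are not spelled out in this abstract. As a hedged illustration of how a model can be scored against SAXS data at all, the sketch below computes a common, generic goodness-of-fit: a reduced chi between an experimental intensity profile and one computed from a model, after fitting a least-squares scale factor. It assumes the computed profile `i_calc` is already sampled at the same q values as the experiment; it is not claimed to be one of SAXSDom's functions.

```python
import math
from typing import Sequence, Tuple


def saxs_chi(
    i_exp: Sequence[float], sigma: Sequence[float], i_calc: Sequence[float]
) -> Tuple[float, float]:
    """Return (chi, c): reduced chi after scaling i_calc by the least-squares factor c."""
    if not (len(i_exp) == len(sigma) == len(i_calc)) or not i_exp:
        raise ValueError("profiles must be non-empty and of equal length")
    # The c minimizing sum(((I_exp - c * I_calc) / sigma)^2) has a closed form.
    num = sum(e * m / s**2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m**2 / s**2 for m, s in zip(i_calc, sigma))
    c = num / den
    chi2 = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    # Normalized by the number of points here; some programs divide by N - 1 instead.
    return math.sqrt(chi2 / len(i_exp)), c
```

A lower chi means the model's computed profile, once scaled, better matches the measured curve; ranking alternative domain arrangements by such a value is one simple way SAXS can guide assembly.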
Affiliation(s)
- Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
- Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, Saint Louis, MO 63121, USA
- John J. Tanner
- Departments of Biochemistry and Chemistry, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
4
Wang Y, Virtanen J, Xue Z, Tesmer JJG, Zhang Y. Using iterative fragment assembly and progressive sequence truncation to facilitate phasing and crystal structure determination of distantly related proteins. Acta Crystallogr D Struct Biol 2016; 72:616-28. [PMID: 27139625 PMCID: PMC4931812 DOI: 10.1107/s2059798316003016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/15/2015] [Accepted: 02/19/2016] [Indexed: 04/15/2023]
Abstract
Molecular replacement (MR) often requires templates with high homology to solve the phase problem in X-ray crystallography. I-TASSER-MR has been developed to test whether the success rate for structure determination of distant-homology proteins could be improved by combining iterative fragment structure-assembly simulations with progressive sequence truncation designed to trim regions with high variation. The pipeline was tested on two independent protein sets consisting of 61 proteins from CASP8 and 100 high-resolution proteins from the PDB. After excluding homologous templates, I-TASSER generated full-length models with an average TM-score of 0.773, which is 12% higher than that of the best threading templates. Using these as search models, I-TASSER-MR found correct MR solutions for 95 of 161 targets, as judged by a TFZ score > 8 or by the final structure being closer to the native than the initial search model. The success rate was 16% higher than when using the best threading templates. I-TASSER-MR was also applied to 14 protein targets from structural genomics centers, seven of which were successfully solved. These results confirm that advanced structure assembly and progressive structural editing can significantly improve the success rate of MR for targets with distant homology to proteins of known structure.
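The TM-score quoted above can be written down directly: TM-score = (1/L) Σ 1/(1 + (d_i/d0)²), with d0 = 1.24·(L − 15)^(1/3) − 1.8 and L the target length. The sketch assumes the model has already been superimposed on the native structure and that `distances` holds the C-alpha distances (in Å) of the aligned residue pairs; the real TM-score program additionally searches for the superposition that maximizes the score, so a single fixed superposition gives a lower bound. The 0.5 Å floor on d0 follows the standard TM-score program.

```python
from typing import Sequence


def d0_for_length(target_length: int) -> float:
    """Length-dependent distance scale d0 (Å) used by TM-score, floored at 0.5 Å."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: C-alpha distances (Å) of aligned residue pairs. Unaligned residues
    contribute nothing to the sum but still count in the normalization by target_length.
    """
    d0 = d0_for_length(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because the sum is divided by the full target length, a model that covers only half the residues perfectly scores at most 0.5, which is why full-length models matter for the comparison with threading templates.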
Affiliation(s)
- Yan Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People’s Republic of China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People’s Republic of China
- John J. G. Tesmer
- Departments of Pharmacology and Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
5
Márquez-Chamorro AE, Asencio-Cortés G, Santiesteban-Toca CE, Aguilar-Ruiz JS. Soft computing methods for the prediction of protein tertiary structures: A survey. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.06.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/01/2023]
6
Huang YJ, Mao B, Aramini JM, Montelione GT. Assessment of template-based protein structure predictions in CASP10. Proteins 2014; 82 Suppl 2:43-56. [PMID: 24323734 DOI: 10.1002/prot.24488] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 07/30/2013] [Revised: 11/10/2013] [Accepted: 11/19/2013] [Indexed: 12/27/2022]
Abstract
Template-based modeling (TBM) is a major component of the critical assessment of protein structure prediction (CASP). In CASP10, some 41,740 predicted models submitted by 150 predictor groups were assessed as TBM predictions. The accuracy of protein structure prediction was assessed by geometric comparison with experimental X-ray crystal and NMR structures using a composite score that included both global alignment metrics and distance-matrix-based metrics. These included GDT-HA and GDC-all global alignment scores, and the superimposition-independent LDDT distance-matrix-based score. In addition, a superimposition-independent RPF metric, similar to that described previously for comparing protein models against experimental NMR data, was used for comparing predicted protein structure models against experimental protein structures. To score well on all four of these metrics, models must feature accurate predictions of both backbone and side-chain conformations. Performance rankings were determined independently for server and the combined server plus human-curated predictor groups. Final rankings were made using paired head-to-head Student's t-test analysis of raw metric scores among the top 25 performing groups in each category.
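Of the metrics listed, LDDT is the one that needs no superposition, which is easy to see in code. The following is a simplified, C-alpha-only sketch of the usual lDDT recipe (15 Å inclusion radius on the reference, tolerance thresholds of 0.5, 1, 2 and 4 Å); the published score works on all atoms and includes side-chain symmetry handling and stereochemistry checks, which are omitted here. Coordinates are assumed to be residue-aligned lists of (x, y, z) tuples.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def lddt_ca(
    reference: Sequence[Point],
    model: Sequence[Point],
    inclusion_radius: float = 15.0,
    thresholds: Tuple[float, ...] = (0.5, 1.0, 2.0, 4.0),
) -> float:
    """Superposition-free C-alpha lDDT in [0, 1] for residue-aligned coordinate lists."""
    if len(reference) != len(model):
        raise ValueError("reference and model must be residue-aligned")
    # Only distances that are short in the reference structure are scored.
    pairs = []
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref < inclusion_radius:
            pairs.append((d_ref, math.dist(model[i], model[j])))
    if not pairs:
        raise ValueError("no residue pairs within the inclusion radius")
    # Average, over thresholds, the fraction of reference distances preserved in the model.
    fractions = [
        sum(abs(d_mod - d_ref) < t for d_ref, d_mod in pairs) / len(pairs)
        for t in thresholds
    ]
    return sum(fractions) / len(fractions)
```

Since only internal distances are compared, a rigidly translated or rotated model scores exactly as well as the original, which is what "superimposition-independent" means in practice.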
Affiliation(s)
- Yuanpeng J Huang
- Center for Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey, 08854; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey, 08854; Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, New Jersey, 08854
7
Larsen A, Wagner JR, Jain A, Vaidehi N. Protein structure refinement of CASP target proteins using GNEIMO torsional dynamics method. J Chem Inf Model 2014; 54:508-17. [PMID: 24397429 PMCID: PMC3985798 DOI: 10.1021/ci400484c] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 08/16/2013] [Indexed: 11/30/2022]
Abstract
A longstanding challenge in using computational methods for protein structure prediction is the refinement of low-resolution structural models derived from comparative modeling methods into highly accurate atomistic models useful for detailed structural studies. Previously, we developed and demonstrated the utility of the internal coordinate molecular dynamics (MD) technique, generalized Newton-Euler inverse mass operator (GNEIMO), for the refinement of small proteins. In GNEIMO, the high-frequency degrees of freedom are frozen and the protein is modeled as a collection of rigid clusters connected by torsional hinges. This physical model allows larger integration time steps and focuses the conformational search on the low-frequency torsional degrees of freedom. Here, we have applied GNEIMO with temperature replica exchange to refine low-resolution models of 30 proteins taken from the Critical Assessment of protein Structure Prediction (CASP) competition. We have shown that the GNEIMO torsional MD method refines these models by up to 1.3 Å in coordinate root-mean-square deviation for the 30 CASP target proteins without using any experimental data as restraints in the GNEIMO simulations. This contrasts with unconstrained all-atom Cartesian MD performed under the same conditions, where refinement requires restraints during the simulations.
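The abstract pairs GNEIMO with temperature replica exchange. The GNEIMO integrator itself is beyond a short example, but the standard replica-exchange swap test is compact: two replicas at inverse temperatures beta_i and beta_j with potential energies E_i and E_j exchange configurations with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]). The sketch assumes energies in kcal/mol and temperatures in kelvin; it is a generic illustration, not code from the GNEIMO implementation.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K), assuming energies in kcal/mol


def swap_probability(e_i: float, t_i: float, e_j: float, t_j: float) -> float:
    """Metropolis acceptance probability for exchanging replicas at temperatures t_i and t_j."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def attempt_swap(e_i: float, t_i: float, e_j: float, t_j: float, rng: random.Random) -> bool:
    """Return True if the two replicas should exchange configurations."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)
```

When the higher-temperature replica holds the lower-energy configuration, the swap is always accepted, so conformations found by the fast-moving hot replicas can flow down to the low-temperature ensemble.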
Affiliation(s)
- Adrien B. Larsen
- Division of Immunology, Beckman Research Institute of the City of Hope, 1500 E. Duarte Road, Duarte, California 91010, United States
- Jeffrey R. Wagner
- Division of Immunology, Beckman Research Institute of the City of Hope, 1500 E. Duarte Road, Duarte, California 91010, United States
- Abhinandan Jain
- Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California 91109, United States
- Nagarajan Vaidehi
- Division of Immunology, Beckman Research Institute of the City of Hope, 1500 E. Duarte Road, Duarte, California 91010, United States
8
Tai CH, Bai H, Taylor TJ, Lee B. Assessment of template-free modeling in CASP10 and ROLL. Proteins 2013; 82 Suppl 2:57-83. [PMID: 24343678 DOI: 10.1002/prot.24470] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Received: 07/01/2013] [Revised: 10/23/2013] [Accepted: 10/29/2013] [Indexed: 12/27/2022]
Abstract
We present the assessment of predictions for Template-Free Modeling in CASP10 and a report on the first ROLL experiment, wherein predictions are collected year-round for review at the regular CASP season. Models were first clustered so that duplicated or very similar ones were grouped together and represented by one model in the cluster. The representatives were then compared with targets using GDT_TS, QCS, and three additional superposition-independent score functions newly developed for CASP10. For each target, the top 15 representatives by each score were pooled to form the Top15Union set. All models in this set were visually inspected by four of us independently using the new plugin, EvalScore, which we developed with the UCSF Chimera group. The best models were selected for each target after extensive debate among the four examiners. Groups were ranked by the number of targets (hits) for which a group's model was selected as one of the best models. The Keasar group had the most hits in both categories, with four of 19 FM and eight of 36 ROLL targets. The most successful prediction servers were QUARK from Zhang's group for the FM category, with three hits, and Zhang-server for the ROLL category, with seven hits. As observed in CASP9, many successful groups were not true "template-free" modelers but used remote templates and/or server models to obtain their winning models. The results of the first ROLL experiment were broadly similar to those of the CASP10 FM exercise.
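The pooling step described above (top 15 cluster representatives per score, merged into the Top15Union set) amounts to a union of per-metric top-k lists. A minimal sketch, assuming every score is oriented so that higher is better and that models have already been reduced to cluster representatives; the model and metric names used in the usage note are hypothetical.

```python
from typing import Dict, Set


def top_k_union(scores_by_metric: Dict[str, Dict[str, float]], k: int = 15) -> Set[str]:
    """Union over metrics of the k best-scoring representatives (higher score = better)."""
    pooled: Set[str] = set()
    for scores in scores_by_metric.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        pooled.update(ranked[:k])
    return pooled
```

For example, with `{"GDT_TS": {"m1": 0.9, "m2": 0.5, "m3": 0.1}, "QCS": {"m1": 0.2, "m2": 0.3, "m3": 0.8}}` and `k=1`, the pooled set is `{"m1", "m3"}`: each metric contributes its own favorite, so a model that only one score likes still reaches visual inspection.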
Affiliation(s)
- Chin-Hsien Tai
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, 20892
9
Taylor TJ, Tai CH, Huang YJ, Block J, Bai H, Kryshtafovych A, Montelione GT, Lee B. Definition and classification of evaluation units for CASP10. Proteins 2013; 82 Suppl 2:14-25. [PMID: 24123179 DOI: 10.1002/prot.24434] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 06/11/2013] [Revised: 08/23/2013] [Accepted: 09/19/2013] [Indexed: 11/10/2022]
Abstract
For the 10th Critical Assessment of techniques for protein Structure Prediction (CASP) experiment, the prediction target proteins were broken into independent evaluation units (EUs), which were then classified into template-based modeling (TBM) or free modeling (FM) categories. We describe here how the EUs were defined and classified, what issues arose in the process, and how we resolved them. EUs are frequently not whole target proteins but their constituent structural domains. However, from CASP7 onward, the assessors combined more than one domain into one EU for some targets, which implied that the assessment also evaluated the prediction of the relative position and orientation of these domains. In CASP10, we followed and expanded this notion by defining multidomain EUs for a number of targets. These included three EUs, each made of two domains of familiar fold arranged in a novel manner, for which the focus of evaluation was the interdomain arrangement. An EU was classified as TBM if a template could be found by sequence similarity searches and as FM if no structural template could be found by structural similarity searches. EUs that did not fall cleanly into either case were classified case-by-case, often taking into account the overall quality and characteristics of the predictions.
Affiliation(s)
- Todd J Taylor
- Laboratory of Molecular Biology, Center for Cancer Research National Cancer Institute National Institutes of Health, Bethesda, Maryland, 20892-4264
10
Affiliation(s)
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa 31905, Israel
- Leonid Pereyaslavets
- Department of Structural Biology, Stanford University, Stanford, California 94305
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, California 94305
11
Eickholt J, Cheng J. DNdisorder: predicting protein disorder using boosting and deep networks. BMC Bioinformatics 2013; 14:88. [PMID: 23497251 PMCID: PMC3599628 DOI: 10.1186/1471-2105-14-88] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 07/26/2012] [Accepted: 02/28/2013] [Indexed: 11/23/2022]
Abstract
Background: A number of proteins contain regions that do not adopt a stable tertiary structure in their native state. Such regions, known as disordered regions, have been shown to participate in many vital cell functions and are increasingly being examined as drug targets. Results: This work presents a new sequence-based approach for the prediction of protein disorder. The method uses boosted ensembles of deep networks to make predictions and participated in the CASP10 experiment. In a 10-fold cross-validation procedure on a dataset of 723 proteins, the method achieved an average balanced accuracy of 0.82 and an area under the ROC curve of 0.90. These results are achieved in part by a boosting procedure that steadily increases balanced accuracy and the area under the ROC curve over several rounds. The method also compared competitively with a number of state-of-the-art disorder predictors on CASP9 and CASP10 benchmark datasets. Conclusions: DNdisorder is available as a web service at http://iris.rnet.missouri.edu/dndisorder/.
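The two figures of merit reported above can be computed as follows for per-residue disorder labels (1 = disordered, 0 = ordered). This is a generic sketch, not DNdisorder's evaluation code: balanced accuracy is the mean of sensitivity and specificity, and the ROC AUC is computed through its rank interpretation, the probability that a randomly chosen disordered residue scores higher than a randomly chosen ordered one (ties count one half).

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores (O(n_pos * n_neg))."""
    pos = [s for s, t in zip(scores, y_true) if t]
    neg = [s for s, t in zip(scores, y_true) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Balanced accuracy is the natural choice here because disordered residues are usually a small minority: a predictor that calls everything ordered would have high plain accuracy but a balanced accuracy of only 0.5.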
Affiliation(s)
- Jesse Eickholt
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
12
Kaufmann KW, Meiler J. Using RosettaLigand for small molecule docking into comparative models. PLoS One 2012; 7:e50769. [PMID: 23239984 PMCID: PMC3519832 DOI: 10.1371/journal.pone.0050769] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 07/30/2012] [Accepted: 10/24/2012] [Indexed: 11/18/2022]
Abstract
Computational small molecule docking into comparative models of proteins is widely used to query protein function and in the development of small molecule therapeutics. We benchmark RosettaLigand docking into comparative models for nine proteins built during CASP8 that contain ligands. We supplement the study with 21 additional protein/ligand complexes to cover a wider space of chemotypes. In 21 of the 30 cases, a full RosettaLigand docking run found a native-like binding mode among the top ten scoring binding modes. From the benchmark cases, we find that careful template selection based on ligand occupancy provides the best chance of success, while overall sequence identity between template and target does not appear to improve results. We also find that binding energy normalized by atom number is often less than -0.4 in native-like binding modes.
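The last observation suggests a simple sanity check on docked poses: divide each pose's binding energy by its atom count and flag poses whose normalized value falls below -0.4. The abstract gives neither the energy units nor exactly which atoms are counted, so the sketch below treats both as caller-supplied inputs; the pose names are hypothetical, and the cutoff is a heuristic ("often less than -0.4"), not a hard rule.

```python
from typing import List, Sequence, Tuple

Pose = Tuple[str, float, int]  # (pose name, binding energy, number of atoms counted)


def rank_by_normalized_energy(
    poses: Sequence[Pose], cutoff: float = -0.4
) -> List[Tuple[str, float, bool]]:
    """Sort poses by energy per atom (most favorable first) and flag those below cutoff."""
    ranked = []
    for name, energy, n_atoms in poses:
        if n_atoms <= 0:
            raise ValueError(f"pose {name!r} has no atoms")
        per_atom = energy / n_atoms
        ranked.append((name, per_atom, per_atom < cutoff))
    return sorted(ranked, key=lambda item: item[1])
```

Normalizing by atom count keeps large ligands from winning simply because they make more contacts, which is why a per-atom threshold can transfer across chemotypes better than a raw energy cutoff.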
Affiliation(s)
- Kristian W. Kaufmann
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Institute of Chemical Biology, Vanderbilt University, Nashville, Tennessee, United States of America
13
Cheng J, Eickholt J, Wang Z, Deng X. Recursive protein modeling: a divide and conquer strategy for Protein Structure Prediction and its case study in CASP9. J Bioinform Comput Biol 2012; 10:1242003. [PMID: 22809379 DOI: 10.1142/s0219720012420036] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of structural modeling, two types of modeling techniques--template-based modeling and template-free modeling--have been developed. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated together to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e. certain) and template-free (i.e. uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can significantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction.
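The divide-and-conquer idea can be sketched as a recursion over sequence intervals: search for a template covering part of the current interval, assign that part to template-based modeling, and recurse on the uncovered flanks, falling back to template-free modeling when no template is found or a segment is too short to search. `find_template` and `min_len` are hypothetical stand-ins; the abstract does not describe the authors' template search or stopping rule.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]  # inclusive residue indices
Plan = List[Tuple[str, int, int]]


def recursive_plan(
    start: int,
    end: int,
    find_template: Callable[[int, int], Optional[Interval]],
    min_len: int = 10,
    plan: Optional[Plan] = None,
) -> Plan:
    """Return (method, start, end) segments in sequence order for residues start..end."""
    if plan is None:
        plan = []
    if end < start:
        return plan  # empty interval, nothing to model
    hit = find_template(start, end) if end - start + 1 >= min_len else None
    if hit is None:
        plan.append(("template-free", start, end))
        return plan
    s, e = hit  # assumed to satisfy start <= s <= e <= end
    recursive_plan(start, s - 1, find_template, min_len, plan)  # left flank
    plan.append(("template-based", s, e))
    recursive_plan(e + 1, end, find_template, min_len, plan)  # right flank
    return plan
```

Each recursive call works on a shorter, template-free subsequence, which is where the reduction in modeling complexity comes from: the expensive template-free search only ever runs on the uncertain regions.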
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA.
14
Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T. Assessment of template based protein structure predictions in CASP9. Proteins 2011; 79 Suppl 10:37-58. [PMID: 22002823 DOI: 10.1002/prot.23177] [Citation(s) in RCA: 132] [Impact Index Per Article: 10.2] [Received: 06/24/2011] [Revised: 09/01/2011] [Accepted: 09/04/2011] [Indexed: 12/29/2022]
Abstract
In the Ninth Edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP9), 61,665 models submitted by 176 groups were assessed for their accuracy in the template based modeling category. The models were evaluated numerically in comparison to their experimental control structures using two global measures (GDT and GDC), and a novel local score evaluating the correct modeling of local interactions (lDDT). Overall, the state of the art of template based modeling in CASP9 is high, with many groups performing well. Among the methods registered as prediction "servers", six independent groups are performing on average better than the rest. The submissions by "human" groups are dominated by meta-predictors, with one group performing noticeably better than the others. Most of the participating groups failed to assign realistic confidence estimates to their predictions, and only a very small fraction of the assessed methods have provided highly accurate models and realistic error estimates at the same time. Also, the accuracy of predictions for homo-oligomeric assemblies was overall poor, and only one group performed better than a naïve control predictor. Here, we present the results of our assessment of the CASP9 predictions in the category of template based modeling, documenting the state of the art and highlighting areas for future developments.
Affiliation(s)
- Valerio Mariani
- Biozentrum University of Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
15
Kryshtafovych A, Fidelis K, Moult J. CASP9 results compared to those of previous CASP experiments. Proteins 2011; 79 Suppl 10:196-207. [PMID: 21997643 PMCID: PMC4180080 DOI: 10.1002/prot.23182] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Received: 05/01/2011] [Revised: 07/13/2011] [Accepted: 08/13/2011] [Indexed: 01/07/2023]
Abstract
The quality of structure models submitted to CASP9 is analyzed in the context of previous CASPs. Comparison methods are similar to those used in previous articles in this series, with the addition of new methods looking at model quality in regions not covered by a single best structural template, alignment accuracy, and progress for template-free models. Progress in this CASP was again modest and statistically hard to validate. Nevertheless, there are several positive trends. There is an indication of improvement in overall model quality for the midrange of template-based modeling difficulty, methods for identifying the best model from a generated set have improved, and there are strong indications of progress in the quality of template-free models of short proteins. In addition, the new examination of model quality in regions not covered by the best available template reveals better performance than had previously been apparent.
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California-Davis, 451 Health Sciences Drive, Davis, CA 95616, USA.
16
Wang Q, Vantasin K, Xu D, Shang Y. MUFOLD-WQA: A new selective consensus method for quality assessment in protein structure prediction. Proteins 2011; 79 Suppl 10:185-95. [PMID: 21997748 DOI: 10.1002/prot.23185] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 02/02/2011] [Revised: 08/25/2011] [Accepted: 08/27/2011] [Indexed: 11/07/2022]
Abstract
Assessing the quality of predicted models is essential in protein tertiary structure prediction. In past Critical Assessment of techniques for protein Structure Prediction (CASP) experiments, consensus quality assessment (QA) methods have been shown to be very effective, outperforming single-model methods and other competing approaches by a large margin. In the consensus QA approach, the quality score of a model is typically estimated from its pairwise structural similarity to a set of reference models. In CASP8, the differences among the top QA servers lay mostly in the selection of the reference models. In this article, we present a new consensus method, "SelCon", based on two key ideas: (1) adaptively selecting appropriate reference models based on the attributes of the whole set of predicted models and (2) weighting different reference models differently, in particular not using models that are too similar to or too different from the candidate model as its references. We have developed several reference selection functions in SelCon and obtained improved QA results over existing QA methods in experiments using CASP7 and CASP8 data. In CASP9, completed in 2010, the new method was implemented in our MUFOLD-WQA server. Both the official CASP9 assessment and our in-house evaluation showed that MUFOLD-WQA performed very well and achieved top performance in both the global structure QA and top-model selection categories in CASP9.
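The two SelCon ideas in the abstract, adaptive choice of reference models and not using references that are too similar or too different, can be illustrated with a banded consensus score. This sketch hard-codes the band with hypothetical `low`/`high` cutoffs and equal weights inside it; SelCon's actual reference-selection and weighting functions are not given here. `similarity` stands for any pairwise structural similarity in [0, 1], such as GDT-TS or TM-score.

```python
from typing import Callable, Hashable, Sequence


def selective_consensus(
    candidate: Hashable,
    models: Sequence[Hashable],
    similarity: Callable[[Hashable, Hashable], float],
    low: float = 0.2,
    high: float = 0.9,
) -> float:
    """Mean similarity of `candidate` to references whose similarity lies in [low, high]."""
    sims = [similarity(candidate, m) for m in models if m != candidate]
    if not sims:
        raise ValueError("need at least one other model")
    in_band = [s for s in sims if low <= s <= high]
    # Fall back to plain consensus (all references) if the band excludes everything.
    chosen = in_band if in_band else sims
    return sum(chosen) / len(chosen)
```

Dropping near-duplicates keeps a cluster of almost identical server models from inflating each other's scores, while dropping outliers keeps badly wrong models from dragging good ones down.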
Affiliation(s)
- Qingguo Wang
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
17
Eickholt J, Wang Z, Cheng J. A conformation ensemble approach to protein residue-residue contact. BMC Struct Biol 2011; 11:38. [PMID: 21989082 PMCID: PMC3200154 DOI: 10.1186/1472-6807-11-38] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/30/2011] [Accepted: 10/12/2011] [Indexed: 11/20/2022]
Abstract
Background: Protein residue-residue contact prediction is important for protein model generation and model evaluation. Here we develop a conformation ensemble approach to improve residue-residue contact prediction. We collect a number of structural models stemming from a variety of methods and implementations. The various models capture slightly different conformations and contain complementary information, which can be pooled together to capture recurrent, and therefore more likely, residue-residue contacts. Results: We applied our conformation ensemble approach to free modeling targets from both CASP8 and CASP9. Given a diverse ensemble of models, the method achieves accuracies of 0.48 for the top L/5 medium-range contacts and 0.36 for the top L/5 long-range contacts on CASP8 targets (L being the target domain length). On CASP9 targets, the accuracies of the top L/5 medium- and long-range contact predictions were 0.34 and 0.30, respectively. Conclusions: When operating on a moderately diverse ensemble of models, the conformation ensemble approach is an effective means of identifying medium- and long-range residue-residue contacts. An immediate benefit of the method is that, when combined with a scoring scheme, it can be used to successfully rank models.
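Pooling contacts across an ensemble and scoring the top L/5 predictions can be sketched as follows. Each model is assumed to be reduced to a set of contacting residue pairs (i, j) with i < j; the sequence-separation ranges (medium 12 to 23, long 24 or more) follow the usual CASP convention and are an assumption here, as the abstract does not restate them. Contacts seen in more models rank higher, which mirrors the "recurrent, and therefore more likely" reasoning in the abstract.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Contact = Tuple[int, int]  # residue indices with i < j


def pool_contacts(ensemble: Iterable[Set[Contact]]) -> Counter:
    """Count in how many models each contact occurs."""
    counts: Counter = Counter()
    for contacts in ensemble:
        counts.update(contacts)
    return counts


def top_l5_accuracy(
    counts: Counter,
    true_contacts: Set[Contact],
    length: int,
    min_sep: int,
    max_sep: Optional[int] = None,
) -> float:
    """Fraction of the top L/5 pooled contacts in a separation range that are true contacts."""
    in_range = [
        (pair, n)
        for pair, n in counts.items()
        if pair[1] - pair[0] >= min_sep and (max_sep is None or pair[1] - pair[0] <= max_sep)
    ]
    # Most frequent first; ties broken by residue indices so the ranking is deterministic.
    in_range.sort(key=lambda item: (-item[1], item[0]))
    top = [pair for pair, _ in in_range[: max(1, length // 5)]]
    if not top:
        return 0.0
    return sum(pair in true_contacts for pair in top) / len(top)
```

Medium-range accuracy would use `min_sep=12, max_sep=23` and long-range accuracy `min_sep=24`; the frequency counts can also serve as a per-model score, one way to read the abstract's remark about ranking models.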
Affiliation(s)
- Jesse Eickholt
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
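The pooling step can be sketched as counting, over all models in the ensemble, how often each residue pair is in contact and then ranking pairs by that frequency. This is a minimal illustration under stated assumptions, not the authors' implementation: a contact is defined here as a Cα-Cα distance below 8 Å (CASP uses Cβ atoms), and sequence-separation bounds are passed in as `min_sep`/`max_sep`, assuming the usual CASP ranges (medium range 12 to 23 residues, long range 24 or more). The helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

CONTACT_CUTOFF = 8.0  # Angstrom; assumed cutoff, applied to Calpha atoms for brevity


def contacts(coords, min_sep, max_sep=None):
    """Return the set of residue pairs (i, j), i < j, that are in contact in one model."""
    pairs = set()
    for i, j in combinations(range(len(coords)), 2):
        sep = j - i
        if sep < min_sep or (max_sep is not None and sep > max_sep):
            continue
        if math.dist(coords[i], coords[j]) < CONTACT_CUTOFF:
            pairs.add((i, j))
    return pairs


def ensemble_contacts(models, min_sep, max_sep=None):
    """Rank residue pairs by how many models in the ensemble place them in contact."""
    counts = Counter()
    for coords in models:
        counts.update(contacts(coords, min_sep, max_sep))
    # Most frequent pairs first; ties broken by residue indices for a deterministic order.
    return sorted(counts, key=lambda pair: (-counts[pair], pair))


def top_l5_accuracy(ranked, native_contacts, length):
    """Fraction of the top L/5 ranked pairs that are true contacts in the native structure."""
    top = ranked[: max(1, length // 5)]
    if not top:
        return 0.0
    return sum(pair in native_contacts for pair in top) / len(top)
```

For medium-range contacts one would call `ensemble_contacts(models, 12, 23)`, and for long-range contacts `ensemble_contacts(models, 24)`; in this sketch, the top L/5 accuracy is simply the fraction of the first L/5 ranked pairs that also occur in the native structure.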
18
Lee J, Lee J, Sasaki TN, Sasai M, Seok C, Lee J. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing. Proteins 2011; 79:2403-17. [DOI: 10.1002/prot.23059] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 12/17/2010] [Revised: 03/24/2011] [Accepted: 04/12/2011] [Indexed: 12/25/2022]
20
Hu Y, Dong X, Wu A, Cao Y, Tian L, Jiang T. Incorporation of local structural preference potential improves fold recognition. PLoS One 2011; 6:e17215. [PMID: 21365008 PMCID: PMC3041821 DOI: 10.1371/journal.pone.0017215] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 11/01/2010] [Accepted: 01/25/2011] [Indexed: 11/19/2022]
Abstract
Fold recognition, or threading, is a popular protein structure modeling approach that uses known structure templates to build structures for proteins whose structures are unknown. The key to the success of fold recognition methods lies in the proper integration of sequence, physicochemical and structural information. Here we introduce another type of information, local structural preference potentials of 3-residue and 9-residue fragments, for fold recognition. By combining the two local structural preference potentials with the widely used sequence profile, secondary structure information and hydrophobic score, we have developed a new threading method called FR-t5 (fold recognition by use of 5 terms). In benchmark tests, we found that considering local structural preference potentials in FR-t5 not only greatly enhances alignment accuracy and recognition sensitivity, but also significantly improves the quality of the prediction models.
Affiliation(s)
- Yun Hu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Xiaoxi Dong
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Aiping Wu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Yang Cao
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Liqing Tian
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Taijiao Jiang
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
21
Proteome evolution and the metabolic origins of translation and cellular life. J Mol Evol 2010; 72:14-33. [PMID: 21082171 DOI: 10.1007/s00239-010-9400-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 03/03/2010] [Accepted: 10/25/2010] [Indexed: 12/27/2022]
Abstract
The origin of life has puzzled molecular scientists for over half a century. Yet fundamental questions remain unanswered, including which came first, the metabolic machinery or the encoding nucleic acids. In this study we take a protein-centric view and explore the ancestral origins of proteins. Protein domain structures in proteomes are highly conserved and embody molecular functions and interactions that are needed for cellular and organismal processes. Here we use domain structure to study the evolution of molecular function in the protein world. Timelines describing the age and function of protein domains at the fold, fold superfamily, and fold family levels of structural complexity were derived from a structural phylogenomic census of hundreds of fully sequenced genomes. These timelines reveal congruent hourglass patterns in the rates of appearance of domain structures and functions, functional diversity, and hierarchical complexity, as well as a gradual build-up of protein repertoires associated with metabolism, translation and DNA, in that order. The most ancient domain architectures were hydrolase enzymes, and the first translation domains had catalytic functions for the aminoacylation and the molecular switch-driven transport of RNA. Remarkably, the most ancient domains had metabolic roles, did not interact with RNA, and preceded the gradual build-up of translation. In fact, the first translation domains also had a metabolic origin and were only later followed by specialized translation machinery. Our results explain how the generation of structure in the protein world and the concurrent crystallization of translation and diversified cellular life created further opportunities for proteomic diversification.
22
Unmet challenges of structural genomics. Curr Opin Struct Biol 2010; 20:587-97. [PMID: 20810277 DOI: 10.1016/j.sbi.2010.08.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 06/05/2010] [Revised: 07/30/2010] [Accepted: 08/03/2010] [Indexed: 11/22/2022]
Abstract
Over the last decade, structural genomics (SG) programs have developed many novel methodologies for faster and more accurate structure determination. These new tools and approaches have led to the determination of thousands of protein structures. The generation of enormous amounts of experimental data has resulted in significant improvements in the understanding of many biological processes at the molecular level. However, the amount of data collected so far is so large that traditional analysis methods limit the rate at which biological and biochemical information can be extracted from 3D models. This situation has prompted us to review the challenges that remain unmet by SG, as well as the areas in which the potential impact of SG could exceed what has been achieved so far.
23
Tress ML, Valencia A. Predicted residue-residue contacts can help the scoring of 3D models. Proteins 2010; 78:1980-91. [PMID: 20408174 DOI: 10.1002/prot.22714] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/06/2022]
Abstract
During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue-residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close-to-native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue-residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue-residue contacts would probably have helped in the selection of 3D models in the free modeling regime and over the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT-TS scores than all but the best 3D prediction groups. Despite the well-known low accuracy of residue-residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies.
Affiliation(s)
- Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.
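A simple way to use predicted contacts for model selection, in the spirit of this abstract, is to score each model by the fraction of predicted contacts it satisfies. The sketch below assumes that predicted contacts arrive as a dictionary mapping residue pairs to confidence weights and that each model is a list of Cα coordinates; the weighting scheme, the 8 Å cutoff and the function names are illustrative assumptions, not the procedure used by Tress and Valencia.

```python
import math


def contact_score(coords, predicted, cutoff=8.0):
    """Confidence-weighted fraction of predicted contacts satisfied by one model.

    coords: list of (x, y, z) Calpha coordinates, one per residue.
    predicted: dict mapping (i, j) residue pairs to confidence weights (assumed layout).
    """
    total = sum(predicted.values())
    if total == 0:
        return 0.0
    satisfied = sum(
        weight
        for (i, j), weight in predicted.items()
        if math.dist(coords[i], coords[j]) < cutoff
    )
    return satisfied / total


def select_model(models, predicted, cutoff=8.0):
    """Return the index of the model that agrees best with the predicted contacts."""
    return max(range(len(models)), key=lambda k: contact_score(models[k], predicted, cutoff))
```

Weighting by confidence lets a few reliable predictions outweigh many noisy ones; setting every weight to 1 gives the plain fraction of satisfied contacts.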
24
Kryshtafovych A, Krysko O, Daniluk P, Dmytriv Z, Fidelis K. Protein structure prediction center in CASP8. Proteins 2010; 77 Suppl 9:5-9. [PMID: 19722263 DOI: 10.1002/prot.22517] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/30/2022]
Abstract
We present an outline of the Critical Assessment of Protein Structure Prediction (CASP) infrastructure implemented at the University of California, Davis, Protein Structure Prediction Center. The infrastructure supports selection and validation of prediction targets, collection of predictions, standard evaluation of submitted predictions, and presentation of results. The Center also supports information exchange relating to CASP experiments and structure prediction in general. Technical aspects of conducting the CASP8 experiment and relevant statistics are also provided.
25
Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A. Evaluation of template-based models in CASP8 with standard measures. Proteins 2010; 77 Suppl 9:18-28. [PMID: 19731382 DOI: 10.1002/prot.22561] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.7] [Indexed: 12/27/2022]
Abstract
The strategy for evaluating template-based models submitted to CASP has continuously evolved from CASP1 to CASP5, leading to a standard procedure that has been used in all subsequent editions. The established approach includes methods for calculating the quality of each individual model, for assigning scores based on the distribution of the results for each target and for computing the statistical significance of the differences in scores between prediction methods. These data are made available to the assessor of the template-based modeling category, who uses them as a starting point for further evaluations and analyses. This article describes the detailed workflow of the procedure, provides justifications for a number of choices that are customarily made for CASP data evaluation, and reports the results of the analysis of template-based predictions at CASP8.
Affiliation(s)
- Domenico Cozzetto
- Department of Biochemical Sciences, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland, Rockville, Maryland 20850
- Burkhard Rost
- Department of Biochemistry and Molecular Biophysics, Columbia University, Northeast Structural Genomics Consortium (NESG) and New York Consortium on Membrane Proteins (NYCOMPS), Columbia University, New York, New York 10032
- Anna Tramontano
- Department of Biochemical Sciences, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy; Istituto Pasteur-Fondazione Cenci Bolognetti, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy
26
Keedy DA, Williams CJ, Headd JJ, Arendall WB, Chen VB, Kapral GJ, Gillespie RA, Block JN, Zemla A, Richardson DC, Richardson JS. The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models. Proteins 2010; 77 Suppl 9:29-49. [PMID: 19731372 DOI: 10.1002/prot.22551] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Indexed: 12/27/2022]
Abstract
For template-based modeling in the CASP8 Critical Assessment of Techniques for Protein Structure Prediction, this work develops and applies six new full-model metrics. They are designed to complement and add value to the traditional template-based assessment by the global distance test (GDT) and related scores (based on multiple superpositions of Calpha atoms between target structure and predictions labeled "Model 1"). The new metrics evaluate each predictor group on each target, using all atoms of their best model with above-average GDT. Two metrics evaluate how "protein-like" the predicted model is: the MolProbity score used for validating experimental structures, and a mainchain reality score using all-atom steric clashes, bond length and angle outliers, and backbone dihedrals. Four other new metrics evaluate match of model to target for mainchain and sidechain hydrogen bonds, sidechain end positioning, and sidechain rotamers. Group-average Z-score across the six full-model measures is averaged with group-average GDT Z-score to produce the overall ranking for full-model, high-accuracy performance. Separate assessments are reported for specific aspects of predictor-group performance, such as robustness of approximately correct template or fold identification, and self-scoring ability at identifying the best of their models. Fold identification is distinct from but correlated with group-average GDT Z-score if target difficulty is taken into account, whereas self-scoring is done best by servers and is uncorrelated with GDT performance. Outstanding individual models on specific targets are identified and discussed. Predictor groups excelled at different aspects, highlighting the diversity of current methodologies. However, good full-model scores correlate robustly with high Calpha accuracy.
Affiliation(s)
- Daniel A Keedy
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina 27710, USA
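The ranking scheme in this abstract (group-average Z-score over the full-model measures, averaged with the group-average GDT Z-score) can be sketched as follows. The sketch assumes every metric is already oriented so that higher is better (a lower-is-better score such as the MolProbity score would need its sign flipped first), uses the population standard deviation per target, and omits any outlier handling the assessors may apply; the data layout and function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score each group's value against all groups' values on one target."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def group_average_z(per_target_values):
    """per_target_values: list over targets of per-group scores (same group order).

    Returns each group's Z-score averaged over all targets.
    """
    per_target_z = [z_scores(vals) for vals in per_target_values]
    n_groups = len(per_target_values[0])
    return [mean(z[g] for z in per_target_z) for g in range(n_groups)]


def overall_scores(full_model_metrics, gdt):
    """Combine full-model metrics with GDT into one score per group.

    full_model_metrics: dict metric name -> per-target lists of per-group scores.
    gdt: per-target lists of per-group GDT scores.
    """
    metric_z = [group_average_z(values) for values in full_model_metrics.values()]
    gdt_z = group_average_z(gdt)
    n_groups = len(gdt_z)
    # Average over the full-model metrics first, then average that with the GDT Z-score.
    full_z = [mean(mz[g] for mz in metric_z) for g in range(n_groups)]
    return [(full_z[g] + gdt_z[g]) / 2 for g in range(n_groups)]
```

Because the full-model metrics are averaged together before being combined with GDT, GDT carries half of the final weight regardless of how many full-model metrics are used.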
27
Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A. Critical assessment of methods of protein structure prediction-Round VIII. Proteins 2009; 77 Suppl 9:1-4. [DOI: 10.1002/prot.22589] [Citation(s) in RCA: 156] [Impact Index Per Article: 10.4] [Indexed: 11/06/2022]