51
|
Johansson MU, Zoete V, Michielin O, Guex N. Defining and searching for structural motifs using DeepView/Swiss-PdbViewer. BMC Bioinformatics 2012; 13:173. [PMID: 22823337 PMCID: PMC3436773 DOI: 10.1186/1471-2105-13-173] [Citation(s) in RCA: 217] [Impact Index Per Article: 18.1] [Received: 09/15/2011] [Accepted: 07/06/2012] [Indexed: 11/10/2022]
Abstract
Background Today, recognition and classification of sequence motifs and protein folds is a mature field, thanks to the availability of numerous comprehensive and easy to use software packages and web-based services. Recognition of structural motifs, by comparison, is less well developed and much less frequently used, possibly due to a lack of easily accessible and easy to use software. Results In this paper, we describe an extension of DeepView/Swiss-PdbViewer through which structural motifs may be defined and searched for in large protein structure databases, and we show that common structural motifs involved in stabilizing protein folds are present in evolutionarily and structurally unrelated proteins, also in deeply buried locations which are not obviously related to protein function. Conclusions The possibility to define custom motifs and search for their occurrence in other proteins permits the identification of recurrent arrangements of residues that could have structural implications. The possibility to do so without having to maintain a complex software/hardware installation on site brings this technology to experts and non-experts alike.
Affiliation(s)
- Maria U Johansson
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
52
|
Bogacheva EN, Dolgov AA, Chulichkov AL, Shishkov AV, Ksenofontov AL, Fedorova NV, Baratova LA. [Differences in spatial structures of the influenza virus M1 protein in crystal, solution and virion]. Russian Journal of Bioorganic Chemistry 2012; 38:70-7. [PMID: 22792708 DOI: 10.1134/s1068162012010037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
The spatial structure of the influenza virus A/Puerto Rico/8/34 (PR8, subtype H1N1) M1 protein in solution and within the virion was studied by the tritium planigraphy technique. A special algorithm for modeling the spatial structure was used to simulate the experiment, together with a set of algorithms predicting secondary structure and disordered regions in proteins. Tertiary structures were refined using the program Rosetta. To compare the structures in solution and in the virion, X-ray diffraction data for the NM-domain were also used. The main difference between the protein structure in solution and in the crystal is observed in the contact region of the N- and M-domains, which are more densely packed in the crystalline state. The locations of maximum label incorporation almost coincide with the unstructured regions of the protein predicted by bioinformatics analysis. These regions are concentrated in the C-domain and in the loop regions between the M-, N-, and C-domains. Analytical centrifugation and dynamic laser light scattering confirm the tritium planigraphy data. An anomalous hydrodynamic size and a low degree of structure of the M1 protein in solution were found. The multifunctionality of the protein in the cell appears to be associated with its plastic tertiary structure, in which unstructured regions provide contacts with various partner molecules.
|
53
|
Salon JA, Lodowski DT, Palczewski K. The significance of G protein-coupled receptor crystallography for drug discovery. Pharmacol Rev 2012; 63:901-37. [PMID: 21969326 DOI: 10.1124/pr.110.003350] [Citation(s) in RCA: 160] [Impact Index Per Article: 13.3] [Indexed: 12/20/2022]
Abstract
Crucial as molecular sensors for many vital physiological processes, seven-transmembrane domain G protein-coupled receptors (GPCRs) comprise the largest family of proteins targeted by drug discovery. Together with structures of the prototypical GPCR rhodopsin, solved structures of other liganded GPCRs promise to provide insights into the structural basis of the superfamily's biochemical functions and assist in the development of new therapeutic modalities and drugs. One of the greatest technical and theoretical challenges to elucidating and exploiting structure-function relationships in these systems is the emerging concept of GPCR conformational flexibility and its cause-effect relationship for receptor-receptor and receptor-effector interactions. Such conformational changes can be subtle and triggered by relatively small binding energy effects, leading to full or partial efficacy in the activation or inactivation of the receptor system at large. Pharmacological dogma generally dictates that these changes manifest themselves through kinetic modulation of the receptor's G protein partners. Atomic resolution information derived from increasingly available receptor structures provides an entrée to the understanding of these events and practically applying it to drug design. Supported by structure-activity relationship information arising from empirical screening, a unified structural model of GPCR activation/inactivation promises to both accelerate drug discovery in this field and improve our fundamental understanding of structure-based drug design in general. This review discusses fundamental problems that persist in drug design and GPCR structural determination.
Affiliation(s)
- John A Salon
- Department of Molecular Structure, Amgen Incorporated, Thousand Oaks, California, USA
|
54
|
Bogacheva EN, Dolgov AA, Chulichkov AL, Shishkov AV. Tritium planigraphy as a tool for studying the structural organization nanobiocomplexes. Russian Journal of Physical Chemistry B 2012. [DOI: 10.1134/s1990793112080039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
55
|
Kasprzak JM, Czerwoniec A, Bujnicki JM. Molecular evolution of dihydrouridine synthases. BMC Bioinformatics 2012; 13:153. [PMID: 22741570 PMCID: PMC3674756 DOI: 10.1186/1471-2105-13-153] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 06/17/2011] [Accepted: 05/24/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Dihydrouridine (D) is a modified base found in conserved positions in the D-loop of tRNA in Bacteria, Eukaryota, and some Archaea. Despite the abundant occurrence of D, little is known about its biochemical roles in mediating tRNA function. It is assumed that D may destabilize the structure of tRNA and thus enhance its conformational flexibility. D is generated post-transcriptionally by the reduction of the 5,6-double bond of a uridine residue in RNA transcripts. The reaction is carried out by dihydrouridine synthases (DUS). DUS constitute a conserved family of enzymes encoded by the orthologous gene family COG0042. In protein sequence databases, members of COG0042 are typically annotated as "predicted TIM-barrel enzymes, possibly dehydrogenases, nifR3 family". RESULTS To elucidate sequence-structure-function relationships in the DUS family, a comprehensive bioinformatic analysis was carried out. We performed extensive database searches to identify all members of the currently known DUS family, followed by clustering analysis to subdivide it into subfamilies of closely related sequences. We analyzed phylogenetic distributions of all members of the DUS family and inferred the evolutionary tree, which suggested a scenario for the evolutionary origin of dihydrouridine-forming enzymes. For a human representative of the DUS family, the hDus2 protein suggested as a potential drug target in cancer, we generated a homology model. While this article was under review, a crystal structure of a DUS representative has been published, giving us an opportunity to validate the model. CONCLUSIONS We compared sequences and phylogenetic distributions of all members of the DUS family and inferred the phylogenetic tree, which provides a framework to study the functional differences among these proteins and suggests a scenario for the evolutionary origin of dihydrouridine formation. 
Our evolutionary and structural classification of the DUS family provides a background to study functional differences among these proteins that will guide experimental analyses.
Affiliation(s)
- Joanna M Kasprzak
- Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Umultowska 89, PL-61-614 Poznan, Poland
|
56
|
Basu S, Bhattacharyya D, Banerjee R. Self-complementarity within proteins: bridging the gap between binding and folding. Biophys J 2012; 102:2605-14. [PMID: 22713576 DOI: 10.1016/j.bpj.2012.04.029] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 12/08/2011] [Revised: 03/30/2012] [Accepted: 04/17/2012] [Indexed: 01/09/2023]
Abstract
Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable if not better than most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors.
Affiliation(s)
- Sankar Basu
- Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, Kolkata, India
|
57
|
Abstract
Accurate all-atom energy functions are crucial for successful high-resolution protein structure prediction. In this chapter, we review both physics-based force fields and knowledge-based potentials used in protein modeling. Because it is important to calculate the energy as accurately as possible given the limitations imposed by sampling convergence, different components of the energy, and force fields representing them to varying degrees of detail and complexity are discussed. Force fields using Cartesian as well as torsion angle representations of protein geometry are covered. Since solvent is important for protein energetics, different aqueous and membrane solvation models for protein simulations are also described. Finally, we summarize recent progress in protein structure refinement using new force fields.
|
58
|
Wu S, Szilagyi A, Zhang Y. Improving protein structure prediction using multiple sequence-based contact predictions. Structure 2011; 19:1182-91. [PMID: 21827953 DOI: 10.1016/j.str.2011.05.004] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 02/05/2011] [Revised: 04/13/2011] [Accepted: 05/12/2011] [Indexed: 11/25/2022]
Abstract
Although residue-residue contact maps dictate the topology of proteins, sequence-based ab initio contact predictions have found little use in actual structure prediction because of their low accuracy. We developed a composite set of nine SVM-based contact predictors that are used in I-TASSER simulations in combination with sparse template contact restraints. When testing the strategy on 273 nonhomologous targets, remarkable improvements of I-TASSER models were observed for both easy and hard targets, with p values by Student's t test < 0.00001 and 0.001, respectively. In several cases, the template modeling score increases by >30%, which essentially converts "nonfoldable" targets into "foldable" ones. In CASP9, I-TASSER employed ab initio contact predictions and generated models for 26 FM targets with a GDT-score 16% and 44% higher than the second and third best servers from other groups, respectively. These findings demonstrate a new avenue to improve the accuracy of protein structure prediction, especially for free-modeling targets.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, KS 66047, USA
|
59
|
Lammert H, Wolynes PG, Onuchic JN. The role of atomic level steric effects and attractive forces in protein folding. Proteins 2011; 80:362-73. [PMID: 22081451 DOI: 10.1002/prot.23187] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 07/22/2011] [Revised: 09/05/2011] [Accepted: 09/07/2011] [Indexed: 12/14/2022]
Abstract
Protein folding into tertiary structures is controlled by an interplay of attractive contact interactions and steric effects. We investigate the balance between these contributions using structure-based models that combine an all-atom representation of the structure with a coarse-grained contact potential. Tertiary contact interactions between atoms are collected into a single broad attractive well between the Cβ atoms of each residue pair in a native contact. Through the width of these contact potentials we control their tolerance for deviations from the ideal structure and the spatial range of attractive interactions. In the compact native state, dominant packing constraints limit the effects of a coarse-grained contact potential. During folding, however, the broad attractive potentials allow an early collapse that starts before the native local structure is completely adopted. As a consequence, the folding transition is broadened and the free energy barrier is decreased. Eventually, two-state folding behavior is lost completely for systems with very broad attractive potentials. The stabilization of native-like residue interactions in non-perfect geometries early in the folding process frequently leads to structural traps. Global mirror images are a notable example. These traps are penalized by the details of the repulsive interactions only after further collapse. Successful folding to the native state requires simultaneous guidance from both attractive and repulsive interactions.
Affiliation(s)
- Heiko Lammert
- Center for Theoretical Biological Physics, University of California, San Diego, La Jolla, California 92093
|
60
|
Schulz C, Lytovchenko O, Melin J, Chacinska A, Guiard B, Neumann P, Ficner R, Jahn O, Schmidt B, Rehling P. Tim50's presequence receptor domain is essential for signal driven transport across the TIM23 complex. J Cell Biol 2011; 195:643-56. [PMID: 22065641 PMCID: PMC3257539 DOI: 10.1083/jcb.201105098] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Indexed: 12/15/2022]
Abstract
N-terminal targeting signals (presequences) direct proteins across the TOM complex in the outer mitochondrial membrane and the TIM23 complex in the inner mitochondrial membrane. Presequences provide directionality to the transport process and regulate the transport machineries during translocation. However, surprisingly little is known about how presequence receptors interact with the signals and what role these interactions play during preprotein transport. Here, we identify signal-binding sites of presequence receptors through photo-affinity labeling. Using engineered presequence probes, photo cross-linking sites on mitochondrial proteins were mapped mass spectrometrically, thereby defining a presequence-binding domain of Tim50, a core subunit of the TIM23 complex that is essential for mitochondrial protein import. Our results establish Tim50 as the primary presequence receptor at the inner membrane and show that targeting signals and Tim50 regulate the Tim23 channel in an antagonistic manner.
Affiliation(s)
- Christian Schulz
- Abteilung für Biochemie II, Universität Göttingen, D-37073 Göttingen, Germany
|
61
|
Shishkov A, Bogacheva E, Fedorova N, Ksenofontov A, Badun G, Radyukhin V, Lukashina E, Serebryakova M, Dolgov A, Chulichkov A, Dobrov E, Baratova L. Spatial structure peculiarities of influenza A virus matrix M1 protein in an acidic solution that simulates the internal lysosomal medium. FEBS J 2011; 278:4905-16. [PMID: 21985378 DOI: 10.1111/j.1742-4658.2011.08392.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 12/01/2022]
Abstract
The structure of the C-terminal domain of the influenza virus A matrix M1 protein, for which X-ray diffraction data were still missing, was studied in acidic solution. Matrix M1 protein was bombarded with thermally-activated tritium atoms, and the resulting intramolecular distribution of the tritium label was analyzed to assess the steric accessibility of the amino acid residues in this protein. This technique revealed that interdomain loops and the C-terminal domain of the protein are the most accessible to labeling with tritium atoms. A model of the spatial arrangement of the C-terminal domain of matrix M1 protein was generated using rosetta software adjusted to the data obtained by tritium planigraphy experiments. This model suggests that the C-terminal domain is an almost flat layer with a three-α-helical structure. To explain the high level of tritium label incorporation into the C-terminal domain of the M1 protein in an acidic solution, we also used independent experimental approaches (CD spectroscopy, limited proteolysis and MALDI-TOF MS analysis of the proteolysis products, dynamic light scattering and analytical ultracentrifugation), as well as multiple computational algorithms, to analyse the intrinsic protein disorder. Taken together, the results obtained in the present study indicate that the C-terminal domain is weakly structured. We hypothesize that the specific 3D structural peculiarities of the M1 protein revealed in acidic pH solution allow the protein greater structural flexibility and enable it to interact effectively with the components of the host cell.
Affiliation(s)
- Alexander Shishkov
- N N Semenov Institute of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
|
62
|
Eickholt J, Wang Z, Cheng J. A conformation ensemble approach to protein residue-residue contact. BMC Structural Biology 2011; 11:38. [PMID: 21989082 PMCID: PMC3200154 DOI: 10.1186/1472-6807-11-38] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/30/2011] [Accepted: 10/12/2011] [Indexed: 11/20/2022]
Abstract
Background Protein residue-residue contact prediction is important for protein model generation and model evaluation. Here we develop a conformation ensemble approach to improve residue-residue contact prediction. We collect a number of structural models stemming from a variety of methods and implementations. The various models capture slightly different conformations and contain complementary information which can be pooled together to capture recurrent, and therefore more likely, residue-residue contacts. Results We applied our conformation ensemble approach to free modeling targets from both CASP8 and CASP9. Given a diverse ensemble of models, the method is able to achieve accuracies of 0.48 for the top L/5 medium range contacts and 0.36 for the top L/5 long range contacts for CASP8 targets (L being the target domain length). When applied to targets from CASP9, the accuracies of the top L/5 medium and long range contact predictions were 0.34 and 0.30, respectively. Conclusions When operating on a moderately diverse ensemble of models, the conformation ensemble approach is an effective means to identify medium and long range residue-residue contacts. An immediate benefit of the method is that when tied with a scoring scheme, it can be used to successfully rank models.
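The pooling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 8 Å distance cutoff on one representative atom per residue, and the default minimum sequence separation of 12 are assumptions chosen for illustration. Contacts are counted across all models in the ensemble and the L/5 most recurrent pairs are returned.

```python
from collections import Counter
from itertools import combinations
import math


def contacts(coords, cutoff=8.0, min_sep=12):
    """Return residue pairs (i, j), i < j, closer than `cutoff`.

    `coords` holds one (x, y, z) point per residue. The cutoff and the
    minimum sequence separation are illustrative assumptions.
    """
    pairs = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
            pairs.add((i, j))
    return pairs


def consensus_contacts(models, divisor=5, **kwargs):
    """Rank pairs by how many models contain them; keep the top L/divisor."""
    counts = Counter()
    for model in models:
        counts.update(contacts(model, **kwargs))
    length = len(models[0])  # L: number of residues in the target
    # Most frequent pairs first; ties broken by residue indices.
    ranked = sorted(counts, key=lambda pair: (-counts[pair], pair))
    return ranked[: max(1, length // divisor)]
```

In a toy ensemble of two five-residue models, a pair that is in contact in both models outranks a pair seen in only one, which is the "recurrent, and therefore more likely" criterion described above.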
Affiliation(s)
- Jesse Eickholt
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
|
63
|
Monastyrskyy B, Fidelis K, Tramontano A, Kryshtafovych A. Evaluation of residue-residue contact predictions in CASP9. Proteins 2011; 79 Suppl 10:119-25. [PMID: 21928322 DOI: 10.1002/prot.23160] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 05/06/2011] [Revised: 06/25/2011] [Accepted: 07/27/2011] [Indexed: 01/03/2023]
Abstract
This work presents the results of the assessment of the intramolecular residue-residue contact predictions submitted to CASP9. The methodology for the assessment does not differ from that used in previous CASPs, with two basic evaluation measures being the precision in recognizing contacts and the difference between the distribution of distances in the subset of predicted contact pairs versus all pairs of residues in the structure. The emphasis is placed on the prediction of long-range contacts (i.e., contacts between residues separated by at least 24 residues along sequence) in target proteins that cannot be easily modeled by homology. Although there is considerable activity in the field, the current analysis reports no discernable progress since CASP8.
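The first evaluation measure named here, precision on long-range contacts, can be sketched as follows. This is an illustrative reading of the abstract, not the assessors' code: the function name, the input layout, and the choice to score only the top L/5 predictions are assumptions, while the sequence separation of at least 24 residues comes from the abstract itself.

```python
def long_range_precision(predicted, native, length, min_sep=24, divisor=5):
    """Fraction of the top-ranked long-range predictions that are native contacts.

    predicted: iterable of (i, j, score) tuples, higher score = more confident.
    native:    set of (i, j) pairs with i < j observed in the structure.
    length:    number of residues L in the target.
    """
    # Keep only pairs separated by at least `min_sep` residues in sequence.
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predicted
        if abs(i - j) >= min_sep
    ]
    # Most confident predictions first, then truncate to L/divisor entries.
    long_range.sort(key=lambda item: -item[2])
    top = long_range[: max(1, length // divisor)]
    if not top:
        return 0.0
    hits = sum((i, j) in native for i, j, _ in top)
    return hits / len(top)
```

For a ten-residue-long toy target the top two long-range predictions are scored, so one correct pair out of two gives a precision of 0.5; short-range pairs are ignored regardless of their score.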
Affiliation(s)
- Bohdan Monastyrskyy
- Genome Center, University of California-Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
|
64
|
MacCallum JL, Pérez A, Schnieders MJ, Hua L, Jacobson MP, Dill KA. Assessment of protein structure refinement in CASP9. Proteins 2011; 79 Suppl 10:74-90. [PMID: 22069034 DOI: 10.1002/prot.23131] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 03/31/2011] [Revised: 06/15/2011] [Accepted: 07/03/2011] [Indexed: 11/06/2022]
Abstract
We assess performance in the structure refinement category in CASP9. Two years after CASP8, the performance of the best groups has not improved. There are few groups that improve any of our assessment scores with statistical significance. Some predictors, however, are able to consistently improve the physicality of the models. Although we cannot identify any clear bottleneck in improving refinement, several points arise: (1) The refinement portion of CASP has too few targets to make many statistically meaningful conclusions. (2) Predictors are usually very conservative, limiting the possibility of large improvements in models. (3) No group is actually able to correctly rank their five submissions, indicating that potentially better models may be discarded. (4) Different sampling strategies work better for different refinement problems; there is no single strategy that works on all targets. In general, conservative strategies do better, while the greatest improvements come from more adventurous sampling, at the cost of consistency. Comparison with experimental data reveals aspects not captured by comparison to a single structure. In particular, we show that improvement in backbone geometry does not always mean better agreement with experimental data. Finally, we demonstrate that even given the current challenges facing refinement, the refined models are useful for solving the crystallographic phase problem through molecular replacement.
Affiliation(s)
- Justin L MacCallum
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA.
|
65
|
Petrella RJ. A versatile method for systematic conformational searches: application to CheY. J Comput Chem 2011; 32:2369-85. [PMID: 21557263 PMCID: PMC3298744 DOI: 10.1002/jcc.21817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/15/2010] [Revised: 03/01/2011] [Accepted: 03/20/2011] [Indexed: 12/27/2022]
Abstract
A novel molecular structure prediction method, the Z Method, is described. It provides a versatile platform for the development and use of systematic, grid-based conformational search protocols, in which statistical information (i.e., rotamers) can also be included. The Z Method generates trial structures by applying many changes of the same type to a single starting structure, thereby sampling the conformation space in an unbiased way. The method, implemented in the CHARMM program as the Z Module, is applied here to an illustrative model problem in which rigid, systematic searches are performed in a 36-dimensional conformational space that describes the relative positions of the 10 secondary structural elements of the protein CheY. A polar hydrogen representation with an implicit solvation term (EEF1) is used to evaluate successively larger fragments of the protein generated in a hierarchical build-up procedure. After a final refinement stage, and a total computational time of about two-and-a-half CPU days on AMD Opteron processors, the prediction is within 1.56 Å of the native structure. The errors in the predicted backbone dihedral angles are found to approximately cancel. Monte Carlo and simulated annealing trials on the same or smaller versions of the problem, using the same atomic model and energy terms, are shown to result in less accurate predictions. Although the problem solved here is a limited one, the findings illustrate the utility of systematic searches with atom-based models for macromolecular structure prediction and the importance of unbiased sampling in structure prediction methods.
Affiliation(s)
- Robert J Petrella
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA.
|
66
|
|
67
|
The draft genome of the parasitic nematode Trichinella spiralis. Nat Genet 2011; 43:228-35. [PMID: 21336279 PMCID: PMC3057868 DOI: 10.1038/ng.769] [Citation(s) in RCA: 241] [Impact Index Per Article: 18.5] [Received: 04/16/2010] [Accepted: 01/21/2011] [Indexed: 12/02/2022]
Abstract
Genome-based studies of metazoan evolution are most informative when phylogenetically diverse species are incorporated in the analysis. As such, evolutionary trends within and outside the phylum Nematoda have been less revealing by focusing only on comparisons involving Caenorhabditis elegans. Herein, we present a draft of the 64 megabase nuclear genome of Trichinella spiralis, containing 15,808 protein coding genes. This parasitic nematode is an extant member of a clade that diverged early in the evolution of the phylum enabling identification of archetypical genes and molecular signatures exclusive to nematodes. Comparative analyses support intrachromosomal rearrangements across the phylum, disproportionate numbers of protein family deaths over births in parasitic vs. a non-parasitic nematode, and a preponderance of gene loss and gain events in nematodes relative to Drosophila melanogaster. This sequence and the panphylum characteristics identified herein will advance evolutionary studies and strategies to combat global parasites of humans, food animals and crops.
|
68
|
di Luccio E, Koehl P. A quality metric for homology modeling: the H-factor. BMC Bioinformatics 2011; 12:48. [PMID: 21291572 PMCID: PMC3213331 DOI: 10.1186/1471-2105-12-48] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/03/2010] [Accepted: 02/04/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND The analysis of protein structures provides fundamental insight into most biochemical functions and consequently into the cause and possible treatment of diseases. As the structures of most known proteins cannot be solved experimentally because of technical or sometimes simply time constraints, in silico protein structure prediction is expected to step in and generate a more complete picture of the protein structure universe. Molecular modeling of protein structures is a fast-growing field, and a tremendous amount of work has been done since the publication of the very first model. The growth of modeling techniques, and more specifically of those that rely on the existing experimental knowledge of protein structures, is intimately linked to the development of high-resolution experimental techniques such as NMR, X-ray crystallography and electron microscopy. This strong connection between experimental and in silico methods is, however, not devoid of criticisms and concerns among modelers as well as among experimentalists. RESULTS In this paper, we focus on homology modeling and, more specifically, we review how it is perceived by the structural biology community and what can be done to impress on experimentalists that it can be a valuable resource to them. We review the common practices and provide a set of guidelines for building better models. For that purpose, we introduce the H-factor, a new indicator for assessing the quality of homology models, mimicking the R-factor in X-ray crystallography. The method for computing the H-factor is fully described and validated on a series of test cases. CONCLUSIONS We have developed a web service for computing the H-factor for models of a protein structure. This service is freely accessible at http://koehllab.genomecenter.ucdavis.edu/toolkit/h-factor.
Affiliation(s)
- Eric di Luccio
- Computer Science Department, Room 4337, Genome Center, GBSF University of California Davis 451 East Health Sciences Drive Davis, CA 95616, USA.
|
69
|
Zhao F, Peng J, Debartolo J, Freed KF, Sosnick TR, Xu J. A probabilistic and continuous model of protein conformational space for template-free modeling. J Comput Biol 2011; 17:783-98. [PMID: 20583926 DOI: 10.1089/cmb.2009.0235] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
One of the major challenges with protein template-free modeling is an efficient sampling algorithm that can explore a huge conformation space quickly. The popular fragment assembly method constructs a conformation by stringing together short fragments extracted from the Protein Data Bank (PDB). The discrete nature of this method may limit generated conformations to a subspace to which the native fold does not belong. Another worry is that a protein with a truly new fold may contain some fragments not in the PDB. This article presents a probabilistic model of protein conformational space to overcome the above two limitations. This probabilistic model employs directional statistics to model the distribution of backbone angles and second-order Conditional Random Fields (CRFs) to describe the sequence-angle relationship. Using this probabilistic model, we can sample protein conformations in a continuous space, as opposed to the widely used fragment assembly and lattice model methods that work in a discrete space. We show that when coupled with a simple energy function, this probabilistic method compares favorably with the fragment assembly method in the blind CASP8 evaluation, especially on alpha or small beta proteins. To our knowledge, this is the first probabilistic method that can search conformations in a continuous space and achieves favorable performance. Our method also generated three-dimensional (3D) models better than template-based methods for a couple of CASP8 hard targets. The method described in this article can also be applied to protein loop modeling, model refinement, and even RNA tertiary structure prediction.
Affiliation(s)
- Feng Zhao
- Toyota Technological Institute at Chicago, Chicago, Illinois 60637, USA
|
70
|
Dolan MA, Noah JW, Hurt D. Comparison of common homology modeling algorithms: application of user-defined alignments. Methods Mol Biol 2011; 857:399-414. [PMID: 22323232 DOI: 10.1007/978-1-61779-588-6_18] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
The number of known protein sequences is orders of magnitude higher than the number of known three-dimensional protein structures. This is a result of an increase in large-scale genomic sequencing projects, the inability of proteins to crystallize or crystals to diffract well, or a simple lack of resources. An alternative is to use one of a variety of available homology modeling programs to produce a computational model of a protein. Protein models are produced using information from known protein structures found to be similar. Here, we compare the ability of a number of popular homology modeling programs to produce quality models from user-defined target-template sequence alignments over a range of circumstances including low sequence identity, variable sequence length, and when interfaced with a protein or small molecule. Programs evaluated include Prime, SWISS-MODEL, MOE, MODELLER, ROSETTA, Composer, ORCHESTRAR, and I-TASSER. Proteins to be modeled were chosen to test a range of sequence identities, sequence lengths, and protein motifs, and all are of scientific importance. These include HIV-1 protease, kinases, dihydrofolate reductase, a viral capsid protein, and factor Xa, among others. For the most part, the programs produce results that are similar. For example, all programs are able to produce reasonable models when sequence identities are >30%, and all programs have difficulties producing complete models when sequence identities are lower. However, certain programs fare slightly better than others in certain situations, and we attempt to provide insight on this topic.
Affiliation(s)
- Michael A Dolan
- Bioinformatics and Computational Biosciences Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA.
|
71
|
Zhou H, Skolnick J. Improving threading algorithms for remote homology modeling by combining fragment and template comparisons. Proteins 2010; 78:2041-8. [PMID: 20455261 DOI: 10.1002/prot.22717] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Abstract
In this work, we develop a method called fragment comparison and template comparison (FTCOM) for assessing the global quality of protein structural models for targets of medium and hard difficulty (remote homology) produced by structure prediction approaches such as threading or ab initio structure prediction. FTCOM requires the Cα coordinates of full-length models and assesses model quality based on fragment comparison and a score derived from comparison of the model to top threading templates. On a set of 361 medium/hard targets, FTCOM was applied to, and assessed for its ability to improve on, the results from the SP³, SPARKS, PROSPECTOR_3, and PRO-SP³-TASSER threading algorithms. The average TM-score improves by 5-10% for the first selected model by the new method over models obtained by the original selection procedure in the respective threading methods. Moreover, the number of foldable targets (TM-score ≥ 0.4) increases from at least 7.6% for SP³ to 54% for SPARKS. Thus, FTCOM is a promising approach to template selection. Proteins 2010. © 2010 Wiley-Liss, Inc.
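For readers unfamiliar with the TM-score threshold quoted in this abstract, the following is a minimal sketch of the standard TM-score formula evaluated for one fixed superposition. It is not FTCOM itself. The function names, the 0.5 Å floor on the distance scale, and the helper `is_foldable` are assumptions made for illustration, and the full TM-score additionally maximizes over superpositions, which is omitted here.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) used by the TM-score."""
    if target_length <= 15:
        return 0.5  # assumption: small floor for very short chains
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumption: same floor applied to short chains


def tm_score(distances, target_length):
    """TM-score for one fixed superposition (no search over superpositions).

    distances: distances in Å between aligned Cα pairs (model vs. native).
    target_length: number of residues in the target protein; unaligned
    residues contribute zero, so the sum is normalized by the target length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def is_foldable(score, threshold=0.4):
    """Foldability criterion quoted in the abstract (TM-score >= 0.4)."""
    return score >= threshold
```

A perfect model of a 100-residue target scores 1.0, and a model whose aligned residues all sit exactly d0 away from their native positions scores 0.5.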
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|
72
|
Abstract
Motivation: One of the major bottlenecks with ab initio protein folding is an effective conformation sampling algorithm that can generate native-like conformations quickly. The popular fragment assembly method generates conformations by restricting the local conformations of a protein to short structural fragments in the PDB. This method may limit conformations to a subspace to which the native fold does not belong because (i) a protein with a truly new fold may contain some structural fragments not in the PDB and (ii) the discrete nature of fragments may prevent them from building a native-like fold. Previously we have developed a conditional random fields (CRF) method for fragment-free protein folding that can sample conformations in a continuous space and demonstrated that this CRF method compares favorably to the popular fragment assembly method. However, the CRF method is still limited by its capability of generating conformations compatible with a sequence. Results: We present a new fragment-free approach to protein folding using a recently invented probabilistic graphical model, conditional neural fields (CNF). This new CNF method is much more powerful than CRF in modeling the sophisticated protein sequence-structure relationship and thus enables us to generate native-like conformations more easily. We show that when coupled with a simple energy function and replica exchange Monte Carlo simulation, our CNF method can generate decoys much better than CRF on a variety of test proteins including the CASP8 free-modeling targets. In particular, our CNF method can predict a correct fold for T0496_D1, one of the two CASP8 targets with a truly new fold. Our predicted model for T0496 is significantly better than all the CASP8 models. Contact: jinboxu@gmail.com
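The replica exchange Monte Carlo simulation mentioned in this abstract relies on a standard Metropolis criterion for swapping conformations between neighboring temperatures. The sketch below shows only that generic swap step, not the authors' CNF sampler or energy function. The function names, the reduced units (`k_b=1.0`), and the single adjacent-pair sweep are assumptions for illustration.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Metropolis probability of exchanging the conformations of two replicas.

    Uses the standard criterion min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temperatures, rng):
    """Try one exchange per adjacent temperature pair; return the permuted energies.

    energies[k] is the energy of the conformation currently held at temperatures[k].
    """
    energies = list(energies)
    for k in range(len(energies) - 1):
        p = swap_probability(energies[k], energies[k + 1],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


# Example: the cold replica holds the higher-energy conformation, so the swap is always accepted.
swapped = attempt_swaps([-5.0, -10.0], [1.0, 2.0], random.Random(0))
```

A swap that moves a lower-energy conformation to the colder replica is always accepted; the reverse move is accepted with an exponentially small probability, which is what lets high-temperature replicas feed new conformations to the low-temperature ones.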
Affiliation(s)
- Feng Zhao
- Toyota Technological Institute, Chicago, IL 60637, USA
|
73
|
Wu S, Zhang Y. Recognizing protein substructure similarity using segmental threading. Structure 2010; 18:858-67. [PMID: 20637422 DOI: 10.1016/j.str.2010.04.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2010] [Revised: 04/02/2010] [Accepted: 04/03/2010] [Indexed: 11/15/2022]
Abstract
Protein template identification is essential to protein structure and function predictions. However, conventional whole-chain threading approaches often fail to recognize conserved substructure motifs when the target and templates do not share the same fold. We developed a new approach, SEGMER, for identifying protein substructure similarities by segmental threading. The target sequence is split into segments of two to four consecutive or nonconsecutive secondary structural elements, which are then threaded through the PDB to identify appropriate substructure motifs. SEGMER is tested on 144 nonredundant hard proteins. When SEGMER is combined with whole-chain threading, the TM-score of alignments and the accuracy of spatial restraints increase by 16% and 25%, respectively, compared with whole-chain threading alone. When tested on 12 free modeling targets from CASP8, SEGMER increases the TM-score and contact accuracy by 28% and 48%, respectively. This significant improvement should have an important impact on protein structure modeling and functional inference.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA
|
74
|
Zelter A, Hoopmann MR, Vernon R, Baker D, MacCoss MJ, Davis TN. Isotope signatures allow identification of chemically cross-linked peptides by mass spectrometry: a novel method to determine interresidue distances in protein structures through cross-linking. J Proteome Res 2010; 9:3583-9. [PMID: 20476776 DOI: 10.1021/pr1001115] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Knowledge of protein structures and protein-protein interactions is essential for understanding of biological processes. Recent advances in protein cross-linking and mass spectrometry (MS) have shown significant potential to contribute to this area. Here we report a novel method to rapidly and accurately identify cross-linked peptides based on their unique isotope signature when digested in the presence of H₂¹⁸O. This method overcomes the need for specially synthesized cross-linkers and/or multiple MS runs required by other techniques. We validated our method by performing a "blind" analysis of 5 proteins/complexes of known structure. Side chain repacking calculations using Rosetta show that 17 of our 20 positively identified cross-links fit the published atomic structures. The remaining 3 cross-links are likely due to protein aggregation. The accuracy and rapid throughput of our workflow will advance the use of protein cross-linking in structural biology.
Affiliation(s)
- Alex Zelter
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|
75
|
Lin MS, Head-Gordon T. Reliable protein structure refinement using a physical energy function. J Comput Chem 2010; 32:709-17. [DOI: 10.1002/jcc.21664] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2010] [Revised: 08/02/2010] [Accepted: 08/07/2010] [Indexed: 11/10/2022]
|
76
|
Chopra G, Kalisman N, Levitt M. Consistent refinement of submitted models at CASP using a knowledge-based potential. Proteins 2010; 78:2668-78. [PMID: 20589633 PMCID: PMC2911515 DOI: 10.1002/prot.22781] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Protein structure refinement is an important but unsolved problem; it must be solved if we are to predict biological function that is very sensitive to structural details. Specifically, critical assessment of techniques for protein structure prediction (CASP) shows that the accuracy of predictions in the comparative modeling category is often worse than that of the template on which the homology model is based. Here we describe a refinement protocol that is able to consistently refine submitted predictions for all categories at CASP7. The protocol uses direct energy minimization of the knowledge-based potential of mean force that is based on the interaction statistics of 167 atom types (Summa and Levitt, Proc Natl Acad Sci USA 2007; 104:3177-3182). Our protocol is thus computationally very efficient; it only takes a few minutes of CPU time to run typical protein models (300 residues). We observe an average structural improvement of 1% in GDT_TS, for predictions that have low and medium homology to known PDB structures (Global Distance Test score or GDT_TS between 50 and 80%). We also observe a marked improvement in the stereochemistry of the models. The level of improvement varies amongst the various participants at CASP, but we see large improvements (>10% increase in GDT_TS) even for models predicted by the best performing groups at CASP7. In addition, our protocol consistently improved the best predicted models in the refinement category at CASP7 and CASP8. These improvements in structure and stereochemistry prove the usefulness of our computationally inexpensive, powerful and automatic refinement protocol.
Affiliation(s)
- Gaurav Chopra
- Department of Structural Biology, Stanford University, Stanford, CA 94305, USA.
|
77
|
Application of biasing-potential replica-exchange simulations for loop modeling and refinement of proteins in explicit solvent. Proteins 2010; 78:2809-19. [DOI: 10.1002/prot.22796] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
|
78
|
Lasker K, Phillips JL, Russel D, Velázquez-Muriel J, Schneidman-Duhovny D, Tjioe E, Webb B, Schlessinger A, Sali A. Integrative structure modeling of macromolecular assemblies from proteomics data. Mol Cell Proteomics 2010; 9:1689-702. [PMID: 20507923 DOI: 10.1074/mcp.r110.000067] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
Proteomics techniques have been used to generate comprehensive lists of protein interactions in a number of species. However, relatively little is known about how these interactions result in functional multiprotein complexes. This gap can be bridged by combining data from proteomics experiments with data from established structure determination techniques. Correspondingly, integrative computational methods are being developed to provide descriptions of protein complexes at varying levels of accuracy and resolution, ranging from complex compositions to detailed atomic structures.
Affiliation(s)
- Keren Lasker
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California 94158, USA.
|
79
|
Kaufmann KW, Lemmon GH, Deluca SL, Sheehan JH, Meiler J. Practically useful: what the Rosetta protein modeling suite can do for you. Biochemistry 2010; 49:2987-98. [PMID: 20235548 PMCID: PMC2850155 DOI: 10.1021/bi902153g] [Citation(s) in RCA: 291] [Impact Index Per Article: 20.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
The objective of this review is to enable researchers to use the software package Rosetta for biochemical and biomedicinal studies. We provide a brief review of the six most frequent research problems tackled with Rosetta. For each of these six tasks, we provide a tutorial that illustrates a basic Rosetta protocol. The Rosetta method was originally developed for de novo protein structure prediction and is regularly one of the best performers in the community-wide biennial Critical Assessment of Structure Prediction. Predictions for protein domains with fewer than 125 amino acids regularly have a backbone root-mean-square deviation of better than 5.0 Å. More impressively, there are several cases in which Rosetta has been used to predict structures with atomic level accuracy better than 2.5 Å. In addition to de novo structure prediction, Rosetta also has methods for molecular docking, homology modeling, determining protein structures from sparse experimental NMR or EPR data, and protein design. Rosetta has been used to accurately design a novel protein structure, predict the structure of protein−protein complexes, design altered specificity protein−protein and protein−DNA interactions, and stabilize proteins and protein complexes. Most recently, Rosetta has been used to solve the X-ray crystallographic phase problem.
Affiliation(s)
- Kristian W Kaufmann
- Department of Chemistry, Vanderbilt University, 7330 Stevenson Center, Station B 351822, Nashville, Tennessee 37235, USA
|
80
|
Venselaar H, Joosten RP, Vroling B, Baakman CAB, Hekkelman ML, Krieger E, Vriend G. Homology modelling and spectroscopy, a never-ending love story. EUROPEAN BIOPHYSICS JOURNAL : EBJ 2010; 39:551-63. [PMID: 19718498 PMCID: PMC2841279 DOI: 10.1007/s00249-009-0531-0] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/11/2009] [Revised: 07/29/2009] [Accepted: 08/04/2009] [Indexed: 01/29/2023]
Abstract
Homology modelling is normally the technique of choice when experimental structure data are not available but three-dimensional coordinates are needed, for example, to aid with detailed interpretation of results of spectroscopic studies. Herein, the state of the art of homology modelling will be described in the light of a series of recent developments, and an overview will be given of the problems and opportunities encountered in this field. The major topic, the accuracy and precision of homology models, will be discussed extensively due to its influence on the reliability of conclusions drawn from the combination of homology models and spectroscopic data. Three real-world examples will illustrate how both homology modelling and spectroscopy can be beneficial for (bio)medical research.
Affiliation(s)
- Hanka Venselaar
- Centre for Molecular and Biomolecular Informatics, CMBI, NCMLS 260, Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands.
|
81
|
Levy R, Edelman M, Sobolev V. Prediction of 3D metal binding sites from translated gene sequences based on remote-homology templates. Proteins 2010; 76:365-74. [PMID: 19173310 DOI: 10.1002/prot.22352] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
Database-scale analysis was performed to determine whether structural models, based on remote homologues, are effective in predicting 3D transition metal binding sites in proteins directly from translated gene sequences. The extent to which side chain modeling alone reduces sensitivity and selectivity is shown to be <10%. Surprisingly, selectivity was not dependent on the level of sequence homology between template and target, or on the presence of a metal ion in the structural template. Applying a modification of the CHED algorithm (Babor et al., Proteins 2008;70:208-217) and machine learning filters, a selectivity of approximately 90% was achieved for protein sequences using unrelated structural templates over a sequence identity range of 18-100%. Below approximately 18% identity, the number of analyzable target-template pairs and the predictability of metal binding sites fall off sharply. A full third of structural templates were found to have target partners only in the remote homology range of 18-30%. In this range, nonmetal-binding templates are calculated to be the majority and serve to predict with 50% sensitivity at the geometric level. Overall, sensitivity at the geometric level for targets having templates in the 18-30% sequence identity range is 73%, with an average of one false positive site per true site. Protein sequences described as "unknown" in the UniProt database and composed largely of unidentified genome project sequences were studied and metal binding sites predicted. A web server for prediction of metal binding sites from protein sequence is provided.
Affiliation(s)
- Ronen Levy
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot, Israel
|
82
|
Liang S, Wang G, Zhou Y. Refining near-native protein-protein docking decoys by local resampling and energy minimization. Proteins 2010; 76:309-16. [PMID: 19156819 DOI: 10.1002/prot.22343] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
How to refine a near-native structure to make it closer to its native conformation is an unsolved problem in protein-structure and protein-protein complex-structure prediction. In this article, we first test several scoring functions for selecting locally resampled near-native protein-protein docking conformations and then propose a computationally efficient protocol for structure refinement via local resampling and energy minimization. The proposed method employs a statistical energy function based on a Distance-scaled Ideal-gas REference state (DFIRE) as an initial filter and an empirical energy function EMPIRE (EMpirical Protein-InteRaction Energy) for optimization and re-ranking. Significant improvement of final top-1 ranked structures over initial near-native structures is observed in the ZDOCK 2.3 decoy set for Benchmark 1.0 (global RMSD reduced by 0.5 Å or more for 74% and increased by 0.5 Å or more for only 7%). Less significant improvement is observed for Benchmark 2.0 (38% versus 33%). Possible reasons are discussed.
Affiliation(s)
- Shide Liang
- Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, 46202, USA
|
83
|
Davis JH, Aperlo C, Li Y, Kurosawa E, Lan Y, Lo KM, Huston JS. SEEDbodies: fusion proteins based on strand-exchange engineered domain (SEED) CH3 heterodimers in an Fc analogue platform for asymmetric binders or immunofusions and bispecific antibodies. Protein Eng Des Sel 2010; 23:195-202. [PMID: 20299542 DOI: 10.1093/protein/gzp094] [Citation(s) in RCA: 143] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Bispecific antibodies and asymmetric Fc fusion proteins offer opportunities for important advances in therapeutics. Bivalent IgG depends upon in vivo dimerization of its heavy chains, mediated by homodimeric association of its CH3 domains. We have developed a heterodimeric Fc platform that supports the design of bispecific and asymmetric fusion proteins by devising strand-exchange engineered domain (SEED) CH3 heterodimers. These derivatives of human IgG and IgA CH3 domains create complementary human SEED CH3 heterodimers that are composed of alternating segments of human IgA and IgG CH3 sequences. The resulting pair of SEED CH3 domains preferentially associates to form heterodimers when expressed in mammalian cells. SEEDbody (Sb) fusion proteins consist of [IgG1 hinge]-CH2-[SEED CH3], that may be genetically linked to one or more fusion partners. This investigation reports on the generation of mono-Fab-Sb and Sb-IL2 monocytokine as models. They were expressed at high levels in NS/0 cells, purified on recombinant protein A resin and were well-behaved in solution. When administered intravenously to mice, Sb pharmacokinetics exhibited the long serum half-life extensions typical of comparable Fc-containing immunofusion and IgG1 controls.
|
84
|
MacCallum JL, Hua L, Schnieders MJ, Pande VS, Jacobson MP, Dill KA. Assessment of the protein-structure refinement category in CASP8. Proteins 2010; 77 Suppl 9:66-80. [PMID: 19714776 DOI: 10.1002/prot.22538] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
Here, we summarize the assessment of protein structure refinement in CASP8. Twenty-four groups refined a total of 12 target proteins. Averaging over all groups and all proteins, there was no net improvement over the original starting models. However, there are now some individual research groups who consistently do improve protein structures relative to a starting model. We compare various measures of quality assessment, including (i) standard backbone-based methods, (ii) new methods from the Richardson group, and (iii) ensemble-based methods for comparing experimental structures, such as NMR NOE violations and the suitability of the predicted models to serve as templates for molecular replacement. On the whole, there is a general correlation among various measures. However, there are interesting differences. Sometimes a structure that is in better agreement with the experimental data is judged to be slightly worse by GDT-TS. This suggests that for comparing protein structures that are already quite close to the native, it may be preferable to use ensemble-based experimentally derived measures of quality, in addition to single-structure-based methods such as GDT-TS.
Affiliation(s)
- Justin L MacCallum
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94158, USA
|
85
|
Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, Tyka M, Baker D, Karplus K. Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8. Proteins 2010; 77 Suppl 9:114-22. [PMID: 19768677 DOI: 10.1002/prot.22570] [Citation(s) in RCA: 1011] [Impact Index Per Article: 72.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Abstract
A correct alignment is an essential requirement in homology modeling. Yet in order to bridge the structural gap between template and target, which may not only involve loop rearrangements, but also shifts of secondary structure elements and repacking of core residues, high-resolution refinement methods with full atomic details are needed. Here, we describe four approaches that address this "last mile of the protein folding problem" and have performed well during CASP8, yielding physically realistic models: YASARA, which runs molecular dynamics simulations of models in explicit solvent, using a new partly knowledge-based all atom force field derived from Amber, whose parameters have been optimized to minimize the damage done to protein crystal structures. The LEE-SERVER, which makes extensive use of conformational space annealing to create alignments, to help Modeller build physically realistic models while satisfying input restraints from templates and CHARMM stereochemistry, and to remodel the side-chains. ROSETTA, whose high resolution refinement protocol combines a physically realistic all atom force field with Monte Carlo minimization to allow the large conformational space to be sampled quickly. And finally UNDERTAKER, which creates a pool of candidate models from various templates and then optimizes them with an adaptive genetic algorithm, using a primarily empirical cost function that does not include bond angle, bond length, or other physics-like terms.
Affiliation(s)
- Elmar Krieger
- Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, The Netherlands
|
86
|
Arab S, Sadeghi M, Eslahchi C, Pezeshk H, Sheari A. A pairwise residue contact area-based mean force potential for discrimination of native protein structure. BMC Bioinformatics 2010; 11:16. [PMID: 20064218 PMCID: PMC2821318 DOI: 10.1186/1471-2105-11-16] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2009] [Accepted: 01/09/2010] [Indexed: 11/21/2022] Open
Abstract
Background An energy function that can distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation-dependent contact area, a new type of knowledge-based mean force potential has been developed. Results We developed a new approach to calculate a knowledge-based potential of mean force, using pairwise residue contact area. To test the performance of our approach, we applied it to several decoy sets to measure its ability to discriminate native structures from decoys. This potential has been able to distinguish native structures from the decoys in most cases. Further, the calculated Z-scores were quite high for all protein datasets. Conclusions This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield
Affiliation(s)
- Shahriar Arab
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
|
87
|
Abstract
The tertiary structure of proteins can reveal information that is hard to detect in a linear sequence. Knowing the tertiary structure is valuable when generating hypotheses and interpreting data. Unfortunately, the gap between the number of known protein sequences and their associated structures is widening. One way to bridge this gap is to use computer-generated structure models of proteins. Here we present concepts and online resources that can be used to identify structural domains in proteins and to create structure models of those domains.
Affiliation(s)
- Lars Malmström
- Institute for Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
|
88
|
Lindert S, Staritzbichler R, Wötzel N, Karakaş M, Stewart PL, Meiler J. EM-fold: De novo folding of alpha-helical proteins guided by intermediate-resolution electron microscopy density maps. Structure 2009; 17:990-1003. [PMID: 19604479 DOI: 10.1016/j.str.2009.06.001] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2009] [Revised: 05/31/2009] [Accepted: 06/02/2009] [Indexed: 01/22/2023]
Abstract
In medium-resolution (7-10 Å) cryo-electron microscopy (cryo-EM) density maps, alpha helices can be identified as density rods whereas beta-strand or loop regions are not as easily discerned. We are proposing a computational protein structure prediction algorithm "EM-Fold" that resolves the density rod connectivity ambiguity by placing predicted alpha helices into the density rods and adding missing backbone coordinates in loop regions. In a benchmark of 11 mainly alpha-helical proteins of known structure a native-like model is identified in eight cases (rmsd 3.9-7.9 Å). The three failures can be attributed to inaccuracies in the secondary structure prediction step that precedes EM-Fold. EM-Fold has been applied to the approximately 6 Å resolution cryo-EM density map of protein IIIa from human adenovirus. We report the first topological model for the alpha-helical 400 residue N-terminal region of protein IIIa. EM-Fold also has the potential to interpret medium-resolution density maps in X-ray crystallography.
Affiliation(s)
- Steffen Lindert
- Department of Chemistry, Vanderbilt University, Nashville, TN 37212, USA
|
89
|
Mirzaie M, Eslahchi C, Pezeshk H, Sadeghi M. A distance-dependent atomic knowledge-based potential and force for discrimination of native structures from decoys. Proteins 2009; 77:454-63. [PMID: 19452553 DOI: 10.1002/prot.22457] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
The purpose of this article is to introduce a novel model for discriminating correctly folded proteins from well-designed decoy structures using mechanical interatomic forces. In our model, we consider a protein as a collection of springs and calculate the force imposed on each atom. A potential function is obtained from statistical contact preferences within known protein structures. Combining this function with the spring equation, the interatomic forces are calculated. Finally, we consider a structure and define a score function on the 3D structure of a protein. We compare the force imposed on each atom of a protein with that on the corresponding atom in the other structures. We then assign larger scores to those atoms with lower forces. The total score is the sum of partial scores of atoms. The optimal structure is assumed to be the one with the highest score in the data set. To evaluate the performance of our model, we apply it to several decoy sets.
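The abstract does not say how the potential function is derived from statistical contact preferences. One common approach, shown here only as a hedged sketch, is an inverse-Boltzmann estimate that compares observed contact counts with the counts expected if atom types paired at random. The function name, the pseudocount, and the random-pairing reference state are assumptions for this example and are not the authors' stated procedure; the spring-force step is not covered.

```python
import math
from collections import Counter


def contact_potential(observed_pairs, kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann contact potential from observed type-type contacts.

    observed_pairs: iterable of (type_a, type_b) tuples, one per contact.
    Returns {(type_a, type_b): energy} with the key types sorted.
    Negative energy means the pair occurs more often than expected.
    """
    pair_counts = Counter(tuple(sorted(p)) for p in observed_pairs)
    type_counts = Counter()
    for (a, b), n in pair_counts.items():
        type_counts[a] += n
        type_counts[b] += n
    total_pairs = sum(pair_counts.values())
    total_ends = sum(type_counts.values())  # equals 2 * total_pairs

    types = sorted(type_counts)
    potential = {}
    for i, a in enumerate(types):
        for b in types[i:]:
            fa = type_counts[a] / total_ends
            fb = type_counts[b] / total_ends
            # Random-pairing reference: unordered mixed pairs count twice.
            expected = (fa * fb if a == b else 2 * fa * fb) * total_pairs
            observed = pair_counts.get((a, b), 0)
            ratio = (observed + pseudocount) / (expected + pseudocount)
            potential[(a, b)] = -kT * math.log(ratio)
    return potential


# Hypothetical contact list with two atom types, H and P.
contacts = [("H", "H")] * 6 + [("H", "P")] * 2 + [("P", "P")] * 2
energies = contact_potential(contacts)
```

In this toy data set H-H contacts are over-represented and receive a negative energy, while H-P contacts are under-represented and receive a positive one.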
Affiliation(s)
- Mehdi Mirzaie
- Department of Mathematical Sciences, Shahid Beheshti University, Post Code 1983963113, Tehran, Iran
|
90
|
Michino M, Brooks CL. Predicting structurally conserved contacts for homologous proteins using sequence conservation filters. Proteins 2009; 77:448-53. [PMID: 19475704 DOI: 10.1002/prot.22456] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
The prediction of intramolecular contacts has a useful application in predicting the three-dimensional structures of proteins. The accuracy of the template-based contact prediction methods depends on the quality of the template structures. To reduce the false positive predictions associated with using the entire set of template-derived contacts, we develop selection filters that use sequence conservation information to predict subsets of contacts more likely to be structurally conserved between the template and the target. The method is developed specifically for protein families with few available templates such as the G protein-coupled receptor (GPCR) family. It is validated on a test set of 342 template-target pairs from three protein families, and applied to one template-target pair from the GPCR family. We find that the filter selection method increases the accuracy of contact prediction with sufficient coverage for structure prediction.
Affiliation(s)
- Mayako Michino
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
91
|
Aloy P, Oliva B. Splitting statistical potentials into meaningful scoring functions: testing the prediction of near-native structures from decoy conformations. BMC STRUCTURAL BIOLOGY 2009; 9:71. [PMID: 19917096 PMCID: PMC2783033 DOI: 10.1186/1472-6807-9-71] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/24/2009] [Accepted: 11/16/2009] [Indexed: 11/20/2022]
Abstract
Background Recent advances in high-throughput technologies have produced a vast number of protein sequences, while the number of high-resolution structures has seen a limited increase. This has prompted many strategies to build protein structures from their sequences, generating a considerable number of alternative models. The selection of the model closest to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials and combinations of both. Results Here, we present and demonstrate a theory to split the knowledge-based potentials into biologically meaningful scoring terms and to combine them in new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple and linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. Besides, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors. Conclusion We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-the-art methods and have been successful in identifying wrongly modeled regions of many near-native conformations.
Affiliation(s)
- Patrick Aloy
- Institut de Recerca Biomèdica and Barcelona Supercomputing Center, 10-12 08028 Barcelona, Catalonia, Spain.
|
92
|
Finding of residues crucial for supersecondary structure formation. Proc Natl Acad Sci U S A 2009; 106:18996-9000. [PMID: 19855006 DOI: 10.1073/pnas.0909714106] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
This work evaluates the hypothesis that proteins with an identical supersecondary structure (SSS) share a unique set of residues--SSS-determining residues--even though they may belong to different protein families and have very low sequence similarities. This hypothesis was tested on two groups of sandwich-like proteins (SPs). Proteins in each group have an identical SSS, but their sequence similarity is below the "twilight zone." To find the SSS-determining residues specific to each group, a unique structure-based algorithm for multiple sequence alignment was developed. The units of alignment are individual strands and loops rather than whole sequences. The algorithm is based on the alignment of residues that form hydrogen bonds between corresponding strands. Structure-based alignment revealed that 30-35% of the positions in the sequences in each group of proteins are "conserved positions" occupied either by hydrophobic-only or hydrophilic-only residues. Moreover, each group of SPs is characterized by a unique set of SSS-determining residues found at the conserved positions. The set of SSS-determining residues has very high sensitivity and specificity for identifying proteins with a corresponding SSS: It is an "amino acid tag" that brands a sequence as having a particular SSS. Thus, the sets of SSS-determining residues can be used to classify proteins and to predict the SSS of a query amino acid sequence.
|
93
|
Kim DE, Blum B, Bradley P, Baker D. Sampling bottlenecks in de novo protein structure prediction. J Mol Biol 2009; 393:249-60. [PMID: 19646450 DOI: 10.1016/j.jmb.2009.07.063] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Received: 02/25/2009] [Revised: 07/21/2009] [Accepted: 07/22/2009] [Indexed: 11/25/2022]
Abstract
The primary obstacle to de novo protein structure prediction is conformational sampling: the native state generally has lower free energy than nonnative structures but is exceedingly difficult to locate. Structure predictions with atomic level accuracy have been made for small proteins using the Rosetta structure prediction method, but for larger and more complex proteins, the native state is virtually never sampled, and it has been unclear how much of an increase in computing power would be required to successfully predict the structures of such proteins. In this paper, we develop an approach to determining how much computer power is required to accurately predict the structure of a protein, based on a reformulation of the conformational search problem as a combinatorial sampling problem in a discrete feature space. We find that conformational sampling for many proteins is limited by critical "linchpin" features, often the backbone torsion angles of individual residues, which are sampled very rarely in unbiased trajectories and, when constrained, dramatically increase the sampling of the native state. These critical features frequently occur in less regular and likely strained regions of proteins that contribute to protein function. In a number of proteins, the linchpin features are in regions found experimentally to form late in folding, suggesting a correspondence between folding in silico and in reality.
Affiliation(s)
- David E Kim
- Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
94
|
Maupetit J, Tuffery P, Derreumaux P. A coarse-grained protein force field for folding and structure prediction. Proteins 2009; 69:394-408. [PMID: 17600832 DOI: 10.1002/prot.21505] [Citation(s) in RCA: 164] [Impact Index Per Article: 10.9] [Indexed: 12/22/2022]
Abstract
We have revisited the protein coarse-grained optimized potential for efficient structure prediction (OPEP). The training and validation sets consist of 13 and 16 protein targets. Because optimization depends on details of how the ensemble of decoys is sampled, trial conformations are generated by molecular dynamics, threading, greedy, and Monte Carlo simulations, or taken from publicly available databases. The OPEP parameters are varied by a genetic algorithm using a scoring function which requires that the native structure has the lowest energy, and the native-like structures have energy higher than the native structure but lower than the remote conformations. Overall, we find that OPEP correctly identifies 24 native or native-like states for 29 targets and has very similar capability to the all-atom discrete optimized protein energy model (DOPE), found recently to outperform five currently used energy models.
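The ordering requirement stated in this abstract (native lowest, native-like in between, remote conformations highest) can be expressed as a simple count of violated inequalities. The sketch below is only an illustration of that criterion; the function name and the equal weighting of each violation are assumptions, and it is not the scoring function used with the genetic algorithm in the paper.

```python
def ordering_violations(native_energy, native_like_energies, remote_energies):
    """Count violations of: native < every native-like < every remote.

    Each native-like or remote conformation at or below the native energy
    counts once, and each (native-like, remote) pair in the wrong order
    counts once. Zero means the energy ordering is fully satisfied.
    """
    violations = sum(1 for e in native_like_energies if e <= native_energy)
    violations += sum(1 for e in remote_energies if e <= native_energy)
    violations += sum(
        1
        for nl in native_like_energies
        for r in remote_energies
        if nl >= r
    )
    return violations


# Hypothetical energies for one target.
perfect = ordering_violations(-10, [-8, -7], [-5, -3])
flawed = ordering_violations(-10, [-8, -4], [-5, -11])
```

A parameter search could minimize such a count summed over all training targets, though the paper's actual objective may differ.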
Affiliation(s)
- Julien Maupetit
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM E0346, Université Paris 7, Tour 53-54, 2 place Jussieu, 75251 Paris, Cedex 05, France
|
95
|
Peterson ME, Chen F, Saven JG, Roos DS, Babbitt PC, Sali A. Evolutionary constraints on structural similarity in orthologs and paralogs. Protein Sci 2009; 18:1306-15. [PMID: 19472362 PMCID: PMC2774440 DOI: 10.1002/pro.143] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 12/09/2008] [Revised: 03/29/2009] [Accepted: 03/30/2009] [Indexed: 11/10/2022]
Abstract
Although a quantitative relationship between sequence similarity and structural similarity has long been established, little is known about the impact of orthology on the relationship between protein sequence and structure. Among homologs, orthologs (derived by speciation) more frequently have similar functions than paralogs (derived by duplication). Here, we hypothesize that an orthologous pair will tend to exhibit greater structural similarity than a paralogous pair at the same level of sequence similarity. To test this hypothesis, we used 284,459 pairwise structure-based alignments of 12,634 unique domains from SCOP as well as orthology and paralogy assignments from OrthoMCL DB. We divided the comparisons by sequence identity and determined whether the sequence-structure relationship differed between the orthologs and paralogs. We found that at levels of sequence identity between 30 and 70%, orthologous domain pairs indeed tend to be significantly more structurally similar than paralogous pairs at the same level of sequence identity. An even larger difference is found when comparing ligand binding residues instead of whole domains. These differences between orthologs and paralogs are expected to be useful for selecting template structures in comparative modeling and target proteins in structural genomics.
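The comparison described in this abstract, dividing domain pairs by sequence identity and comparing structural similarity between orthologs and paralogs within each bin, can be sketched as follows. The tuple layout, the 10% bin width, and the use of a plain mean are assumptions for illustration; the abstract does not specify the similarity measure or the statistical test used.

```python
from collections import defaultdict
from statistics import mean


def mean_similarity_by_identity_bin(pairs, bin_width=10):
    """Average structural similarity per sequence-identity bin and relation.

    pairs: iterable of (identity_percent, structural_similarity, relation),
           where relation is a label such as "ortholog" or "paralog".
    Returns {bin_lower_bound: {relation: mean_similarity}}.
    """
    bins = defaultdict(lambda: defaultdict(list))
    for identity, similarity, relation in pairs:
        # 100% identity is folded into the top bin.
        lower = min(int(identity // bin_width) * bin_width, 100 - bin_width)
        bins[lower][relation].append(similarity)
    return {
        lower: {rel: mean(vals) for rel, vals in by_relation.items()}
        for lower, by_relation in sorted(bins.items())
    }


# Hypothetical domain pairs, for illustration only.
pairs = [
    (35, 0.80, "ortholog"),
    (38, 0.70, "paralog"),
    (32, 0.90, "ortholog"),
    (65, 0.95, "ortholog"),
    (61, 0.85, "paralog"),
]
summary = mean_similarity_by_identity_bin(pairs)
```

Comparing the two means inside each bin holds sequence identity roughly constant, which is the point of the binned design.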
Affiliation(s)
- Mark E Peterson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158
- California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, California 94158
| | - Feng Chen
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104
- Department of Biology and Penn Genomics Institute, University of Pennsylvania, Philadelphia, PA 19104
| | - Jeffery G Saven
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104
| | - David S Roos
- Department of Biology and Penn Genomics Institute, University of Pennsylvania, Philadelphia, PA 19104
| | - Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158
- California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, California 94158
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158
- California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, California 94158
|
96
|
Zhou H, Skolnick J. Protein structure prediction by pro-Sp3-TASSER. Biophys J 2009; 96:2119-27. [PMID: 19289038 DOI: 10.1016/j.bpj.2008.12.3898] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 09/12/2008] [Revised: 11/12/2008] [Accepted: 12/03/2008] [Indexed: 12/29/2022]
Abstract
An automated protein structure prediction algorithm, pro-sp3-Threading/ASSEmbly/Refinement (TASSER), is described and benchmarked. Structural templates are identified using five different scoring functions derived from the previously developed threading methods PROSPECTOR_3 and SP(3). Top templates identified by each scoring function are combined to derive contact and distance restraints for subsequent model refinement by short TASSER simulations. For Medium/Hard targets (those with moderate to poor quality templates and/or alignments), alternative template alignments are also generated by parametric alignment and the top models selected by TASSER-QA are included in the contact and distance restraint derivation. Then, multiple short TASSER simulations are used to generate an ensemble of full-length models. Subsequently, the top models are selected from the ensemble by TASSER-QA and used to derive TASSER contacts and distance restraints for another round of full TASSER refinement. The final models are selected from both rounds of TASSER simulations by TASSER-QA. We compare pro-sp3-TASSER with our previously developed MetaTASSER method (enhanced with chunk-TASSER for Medium/Hard targets) on a representative test data set of 723 proteins <250 residues in length. For the 348 proteins classified as easy targets (those templates with good alignments and global structure similarity to the target), the cumulative TM-score of the best of top five models by pro-sp3-TASSER shows a 2.1% improvement over MetaTASSER. For the 155/220 medium/hard targets, the improvements in TM-score are 2.8% and 2.2%, respectively. All improvements are statistically significant. More importantly, the number of foldable targets (those having models whose TM-score to native >0.4 in the top five clusters) increases from 472 to 497 for all targets, and the relative increases for medium and hard targets are 10% and 15%, respectively.
A server that implements the above algorithm is available at http://cssb.biology.gatech.edu/skolnick/webservice/pro-sp3-TASSER/. The source code is also available upon request.
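The abstract above uses a TM-score above 0.4 among the top five models as its definition of a foldable target. The sketch below evaluates the standard TM-score sum for one fixed superposition, given per-residue distances between aligned model and native residues. The full TM-score also maximizes over superpositions, which is omitted here; the lower clamp of 0.5 Å on d0 for short targets and the function names are assumptions made for this example.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Assumed lower clamp so very short targets keep a positive scale.
    return max(d0, 0.5)


def tm_score(distances, target_length):
    """TM-score sum for one fixed superposition.

    distances: per-residue distances (angstroms) for the aligned residues.
    target_length: number of residues in the native (target) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def is_foldable(top_model_scores, threshold=0.4):
    """True if any of the top models exceeds the TM-score threshold."""
    return any(score > threshold for score in top_model_scores)


# A perfect 100-residue model scores 1.0; hypothetical top-five scores follow.
perfect = tm_score([0.0] * 100, 100)
foldable = is_foldable([0.31, 0.42, 0.28, 0.35, 0.30])
```

Because the sum is normalized by the target length rather than the number of aligned residues, unaligned regions lower the score.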
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA
|
97
|
Gao X, Bu D, Xu J, Li M. Improving consensus contact prediction via server correlation reduction. BMC STRUCTURAL BIOLOGY 2009; 9:28. [PMID: 19419562 PMCID: PMC2689239 DOI: 10.1186/1472-6807-9-28] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 11/25/2008] [Accepted: 05/06/2009] [Indexed: 11/10/2022]
Abstract
Background Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them. Results In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method, which assumes that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated, where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively. Conclusion Reducing server correlation and optimally combining independent latent servers show a significant improvement over the traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction.
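The top-L/5 accuracy quoted in this abstract can be computed as sketched below: rank the predicted contacts by score, keep the best L/5, and report the fraction that are true contacts. The input layout and function name are assumptions for illustration, and the sketch covers only the evaluation metric, not the integer linear programming method itself.

```python
def top_l_over_k_accuracy(predicted, true_contacts, length, k=5):
    """Accuracy of the top L/k predicted contacts.

    predicted: list of ((i, j), score) with higher scores more confident.
    true_contacts: iterable of (i, j) residue pairs in contact in the native.
    length: protein size L in residues.
    Residue pairs are treated as unordered.
    """
    n_top = max(1, length // k)
    ranked = sorted(predicted, key=lambda item: item[1], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    true_set = {tuple(sorted(pair)) for pair in true_contacts}
    hits = sum(1 for pair, _ in ranked if tuple(sorted(pair)) in true_set)
    return hits / len(ranked)


# Hypothetical 10-residue example: L/5 = 2 contacts are evaluated.
predicted = [((1, 8), 0.9), ((2, 9), 0.8), ((3, 7), 0.1)]
native = [(8, 1), (3, 7)]
accuracy = top_l_over_k_accuracy(predicted, native, length=10)
```

Only the two highest-scoring predictions are evaluated here, so the correct but low-scoring pair (3, 7) does not raise the accuracy.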
Affiliation(s)
- Xin Gao
- David R. Cheriton School of Computer Science, University of Waterloo, N2L3G1, Canada.
|
98
|
Ramelot TA, Raman S, Kuzin AP, Xiao R, Ma LC, Acton TB, Hunt JF, Montelione GT, Baker D, Kennedy MA. Improving NMR protein structure quality by Rosetta refinement: a molecular replacement study. Proteins 2009; 75:147-67. [PMID: 18816799 PMCID: PMC2878636 DOI: 10.1002/prot.22229] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 02/01/2023]
Abstract
The structure of human protein HSPC034 has been determined by both solution nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography. Refinement of the NMR structure ensemble, using a Rosetta protocol in the absence of NMR restraints, resulted in significant improvements not only in structure quality, but also in molecular replacement (MR) performance with the raw X-ray diffraction data using MOLREP and Phaser. This method has recently been shown to be generally applicable with improved MR performance demonstrated for eight NMR structures refined using Rosetta (Qian et al., Nature 2007;450:259-264). Additionally, NMR structures of HSPC034 calculated by standard methods that include NMR restraints have improvements in the RMSD to the crystal structure and MR performance in the order DYANA, CYANA, XPLOR-NIH, and CNS with explicit water refinement (CNSw). Further Rosetta refinement of the CNSw structures, perhaps due to more thorough conformational sampling and/or a superior force field, was capable of finding alternative low energy protein conformations that were equally consistent with the NMR data according to the Recall, Precision, and F-measure (RPF) scores. On further examination, the additional MR-performance shortfall for NMR-refined structures as compared with the X-ray structure was attributed, in part, to crystal-packing effects, real structural differences, and inferior hydrogen bonding in the NMR structures. A good correlation between a decrease in the number of buried unsatisfied hydrogen-bond donors and improved MR performance demonstrates the importance of hydrogen-bond terms in the force field for improving NMR structures. The superior hydrogen-bond network in Rosetta-refined structures demonstrates that correct identification of hydrogen bonds should be a critical goal of NMR structure refinement.
Inclusion of nonbivalent hydrogen bonds identified from Rosetta structures as additional restraints in the structure calculation results in NMR structures with improved MR performance.
Affiliation(s)
- Theresa A. Ramelot
- Department of Chemistry and Biochemistry and Northeast Structural Genomics Consortium, Miami University, Oxford, Ohio
| | - Srivatsan Raman
- Department of Biochemistry, University of Washington, and Howard Hughes Medical Institute, Seattle, Washington
| | - Alexandre P. Kuzin
- Department of Biological Sciences and Northeast Structural Genomics Consortium, Columbia University, New York, New York
| | - Rong Xiao
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers University, Piscataway, New Jersey
| | - Li-Chung Ma
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers University, Piscataway, New Jersey
| | - Thomas B. Acton
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers University, Piscataway, New Jersey
| | - John F. Hunt
- Department of Biological Sciences and Northeast Structural Genomics Consortium, Columbia University, New York, New York
| | - Gaetano T. Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers University, Piscataway, New Jersey
| | - David Baker
- Department of Biochemistry, University of Washington, and Howard Hughes Medical Institute, Seattle, Washington
| | - Michael A. Kennedy
- Department of Chemistry and Biochemistry and Northeast Structural Genomics Consortium, Miami University, Oxford, Ohio
|
99
|
Zhang Y. Protein structure prediction: when is it useful? Curr Opin Struct Biol 2009; 19:145-55. [PMID: 19327982 PMCID: PMC2673339 DOI: 10.1016/j.sbi.2009.02.005] [Citation(s) in RCA: 191] [Impact Index Per Article: 12.7] [Received: 11/14/2008] [Revised: 02/18/2009] [Accepted: 02/19/2009] [Indexed: 10/21/2022]
Abstract
Computationally predicted three-dimensional structures of protein molecules have proven useful in many areas of biomedicine, ranging from approximate family assignments to precise drug screening. For nearly 40 years, however, the accuracy of the predicted models has been dictated by the availability of close structural templates. Progress has recently been achieved in refining low-resolution models closer to the native ones; this has been made possible by combining knowledge-based information from multiple sources of structural templates as well as by improving the energy funnel of physics-based force fields. Unfortunately, there has been no essential progress in the development of techniques for detecting remotely homologous templates and for predicting novel protein structures.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA.
|
100
|
Lindert S, Stewart PL, Meiler J. Hybrid approaches: applying computational methods in cryo-electron microscopy. Curr Opin Struct Biol 2009; 19:218-25. [PMID: 19339173 DOI: 10.1016/j.sbi.2009.02.010] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 12/09/2008] [Accepted: 02/26/2009] [Indexed: 12/20/2022]
Abstract
Recent advances in cryo-electron microscopy have led to an increasing number of high (3-5 Å) to medium (5-10 Å) resolution cryoEM density maps. These density maps contain valuable information about the protein structure but frequently require computational algorithms to aid their structural interpretation. It is these hybrid approaches between cryoEM and computational protein structure prediction algorithms that will shape protein structure elucidation from density maps.
Affiliation(s)
- Steffen Lindert
- Department of Chemistry, Vanderbilt University, Nashville, TN 37212, USA
|