1
Li J, Sawhney A, Lee JY, Liao L. Improving Inter-Helix Contact Prediction With Local 2D Topological Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3001-3012. [PMID: 37155404 DOI: 10.1109/tcbb.2023.3274361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
Inter-helix contact prediction aims to identify residue contacts across different helices in α-helical integral membrane proteins. Despite the progress made by various computational methods, contact prediction remains a challenging task, and to our knowledge no method directly taps into the contact map in an alignment-free manner. We build 2D contact models from an independent dataset to capture the topological patterns in the neighborhood of a residue pair, depending on whether it is a contact or not, and apply the models to the state-of-the-art method's predictions to extract features reflecting 2D inter-helix contact patterns. A secondary classifier is trained on these features. Realizing that the achievable improvement intrinsically hinges on the quality of the original predictions, we devise a mechanism to deal with the issue by introducing 1) partial discretization of the original prediction scores to leverage useful information more effectively and 2) a fuzzy score that assesses the quality of the original prediction to help select the residue pairs where improvement is more achievable. The cross-validation results show that our method outperforms other methods, including the state-of-the-art method (DeepHelicon), by a notable degree even without the refinement selection scheme. With the refinement selection scheme applied, our method significantly outperforms the state-of-the-art method on the selected sequences.
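To make the idea of a "neighborhood of a residue pair" concrete, the following minimal sketch extracts a square window around a pair (i, j) from a predicted score map. It is only an illustration under stated assumptions: the function name `local_patch`, the `half_width` parameter, and zero padding at the map edges are hypothetical choices and are not taken from the paper.

```python
import numpy as np

def local_patch(score_map, i, j, half_width=2):
    """Return the (2*half_width + 1)-square window centred on pair (i, j).

    Cells that fall outside the map are filled with 0.0. The padding value
    is an assumption made for this sketch, not the authors' choice.
    """
    padded = np.pad(score_map, half_width, mode="constant", constant_values=0.0)
    # After padding, original index (i, j) sits at (i + half_width, j + half_width),
    # so the window starts at (i, j) in padded coordinates.
    size = 2 * half_width + 1
    return padded[i:i + size, j:j + size]

# Toy 5x5 "prediction score" map, purely for illustration.
scores = np.arange(25, dtype=float).reshape(5, 5)
patch = local_patch(scores, 0, 4, half_width=1)
print(patch)
```

The centre cell of the returned window is always the score of the pair itself, so a downstream feature extractor could compare the surrounding cells against contact and non-contact templates.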
2
Structure-based protein function prediction using graph convolutional networks. Nat Commun 2021; 12:3168. [PMID: 34039967 PMCID: PMC8155034 DOI: 10.1038/s41467-021-23303-9] [Citation(s) in RCA: 256] [Impact Index Per Article: 85.3] [Received: 09/21/2020] [Accepted: 04/22/2021] [Indexed: 02/04/2023] Open
Abstract
The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping allows function predictions at an unprecedented resolution, allowing site-specific annotations at the residue-level in an automated manner. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver at https://beta.deepfri.flatironinstitute.org/ .
3
Ferruz N, Noske J, Höcker B. Protlego: A Python package for the analysis and design of chimeric proteins. Bioinformatics 2021; 37:3182-3189. [PMID: 33901273 PMCID: PMC8504633 DOI: 10.1093/bioinformatics/btab253] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/25/2020] [Revised: 03/05/2021] [Accepted: 04/19/2021] [Indexed: 01/03/2023] Open
Abstract
Motivation: Duplication and recombination of protein fragments have led to the highly diverse protein space that we observe today. By mimicking this natural process, the design of protein chimeras via fragment recombination has proven experimentally successful and has opened a new era for the design of customizable proteins. The in silico building of structural models for these chimeric proteins, however, remains a manual task that requires a considerable degree of expertise and is not amenable to high-throughput studies. Energetic and structural analysis of the designed proteins often requires the use of several tools, each with its own technical difficulties and available in different programming languages or web servers.
Results: We implemented a Python package that enables automated, high-throughput design of chimeras and their structural analysis. First, it fetches evolutionarily conserved fragments from a built-in database (also available at fuzzle.uni-bayreuth.de). These relationships can then be represented via networks or further selected for chimera construction via recombination. Designed chimeras or natural proteins are then scored and minimized with the Charmm and Amber forcefields, and their diverse structural features can be analyzed with ease. Here, we showcase Protlego’s pipeline by exploring the relationships between the P-loop and Rossmann superfolds, building and characterizing their offspring chimeras. We believe that Protlego provides a powerful new tool for the protein design community.
Availability and implementation: Protlego runs on the Linux platform and is freely available at https://hoecker-lab.github.io/protlego/ with tutorials and documentation.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Noelia Ferruz
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Jakob Noske
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
4
Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol 2021; 17:e1008865. [PMID: 33770072 PMCID: PMC8026059 DOI: 10.1371/journal.pcbi.1008865] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 09/21/2020] [Revised: 04/07/2021] [Accepted: 03/10/2021] [Indexed: 12/24/2022] Open
Abstract
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library. Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress on ab initio structure prediction, which was mainly powered by the boosting of contact-map prediction as the latter can be used as constraints to guide ab initio folding simulations. 
In this work, we proposed a new open-source deep-learning architecture, TripletRes, built on the residual convolutional neural networks for high-accuracy contact prediction. The large-scale benchmark and blind test results demonstrate competitive performance of the proposed methods to other top approaches in predicting medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in the unique protocol to fuse multiple evolutionary feature matrices which are directly extracted from whole-genome and metagenome databases and therefore minimize the information loss during the contact model training.
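The top-L and top-L/5 long-range precision figures quoted above follow a common contact-prediction evaluation convention, which can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: the function name `top_l_precision` and the toy data are hypothetical, and the sequence-separation cutoff of 24 residues for "long-range" is the usual CASP convention, assumed here.

```python
import numpy as np

def top_l_precision(pred, true, k=5, min_sep=24):
    """Precision of the top L/k scored pairs with sequence separation >= min_sep.

    pred: (L, L) array of predicted contact scores.
    true: (L, L) array of 0/1 native contacts.
    min_sep=24 is the usual long-range cutoff (an assumption of this sketch).
    """
    L = pred.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep.
    i_idx, j_idx = np.triu_indices(L, k=min_sep)
    if i_idx.size == 0:
        raise ValueError("sequence too short for the requested separation")
    n_top = max(1, L // k)
    # Indices of the n_top highest-scoring candidate pairs.
    order = np.argsort(pred[i_idx, j_idx])[::-1][:n_top]
    return float(true[i_idx[order], j_idx[order]].mean())

# Toy example with L = 30, so top-L/5 keeps the 6 best long-range pairs.
L = 30
pred = np.zeros((L, L))
true = np.zeros((L, L))
toy = [((0, 29), 0.9, 1), ((1, 28), 0.8, 0), ((2, 27), 0.7, 1),
       ((0, 25), 0.6, 1), ((3, 29), 0.5, 0), ((4, 29), 0.4, 1)]
for (i, j), score, label in toy:
    pred[i, j], true[i, j] = score, label
# A confident short-range pair is ignored by the long-range filter.
pred[5, 6], true[5, 6] = 1.0, 1

print(top_l_precision(pred, true, k=5))   # 4 of the 6 selected pairs are native
print(top_l_precision(pred, true, k=10))  # 2 of the 3 selected pairs are native
```

Because only the top-ranked pairs are scored, the metric rewards a predictor for ranking true contacts first rather than for calibrating every score in the map.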
5
Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020; 18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022] Open
Abstract
Homology modeling is a method for building protein 3D structures from the protein primary sequence, using prior knowledge gained from structural similarities with other proteins. The homology modeling process proceeds in sequential steps: the sequence/structure alignment is optimized, then a backbone is built, and later side-chains are added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. In the past three decades, a few collective and collaborative initiatives allowed for continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives where the community shares computational resources, whereas RosettaCommons is an example of an initiative where a community shares a codebase for the development of computational algorithms. Foldit is another initiative where participants compete with each other in a protein folding video game to predict 3D structure. In the past few years, deep machine learning of contact maps was introduced to the 3D structure prediction process, adding more information and increasing the accuracy of models significantly. In this review, we take the reader on a journey of exploration from the beginnings to the most recent turnabouts, which have revolutionized the field of homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
Affiliation(s)
- Tareq Hameduh
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Yazan Haddad
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Vojtech Adam
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Zbynek Heger
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
6
Škrbić T, Hoang TX, Maritan A, Banavar JR, Giacometti A. Local symmetry determines the phases of linear chains: a simple model for the self-assembly of peptides. Soft Matter 2019; 15:5596-5613. [PMID: 31259346 DOI: 10.1039/c9sm00851a] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 06/09/2023]
Abstract
We discuss the relation between the emergence of new phases with broken symmetry within the framework of simple models of biopolymers. We start with a classic model for a chain molecule of spherical beads tethered together, with the steric constraint that non-consecutive beads cannot overlap, and with a pairwise attractive square well potential accounting for the hydrophobic effect and promoting compaction. We then discuss the consequences of the successive breaking of spurious symmetries. First, we allow the partial interpenetration of consecutive beads. In addition to the standard high temperature coil phase and the low temperature collapsed phase, this results in a new class of marginally compact ground states comprising conformations reminiscent of α-helices and β-sheets, the building blocks of the native states of globular proteins. We then discuss the effect of a further symmetry breaking of the cylindrical symmetry on attaching a side-sphere to the backbone beads along the negative normal of the chain, to mimic the presence of side chains in real proteins. This leads to the emergence of a novel phase within the previously obtained marginally compact phase, with the appearance of more complex secondary structure assemblies. The potential importance of this new phase in the de novo design of self-assembled peptides is highlighted.
Affiliation(s)
- Tatjana Škrbić
  - Department of Physics and Institute for Theoretical Science, 1274 University of Oregon, Eugene, OR 97403-1274, USA, and Dipartimento di Scienze Molecolari e Nanosistemi, Università Ca' Foscari di Venezia, Campus Scientifico, Edificio Alfa, via Torino 155, 30170 Venezia Mestre, Italy.
- Trinh Xuan Hoang
  - Center for Computational Physics Institute of Physics, Vietnam Academy of Science and Technology, 10 Dao Tan St., Hanoi, Vietnam.
- Amos Maritan
  - Dipartimento di Fisica e Astronomia, Università di Padova, and INFN, via Marzolo 8, I-35131 Padova, Italy.
- Jayanth R Banavar
  - Department of Physics and Institute for Theoretical Science, 1274 University of Oregon, Eugene, OR 97403-1274, USA.
- Achille Giacometti
  - Dipartimento di Scienze Molecolari e Nanosistemi, Università Ca' Foscari di Venezia, Campus Scientifico, Edificio Alfa, via Torino 155, 30170 Venezia Mestre, Italy.
7
Lubecka EA, Liwo A. Introduction of a bounded penalty function in contact-assisted simulations of protein structures to omit false restraints. J Comput Chem 2019; 40:2164-2178. [PMID: 31037754 DOI: 10.1002/jcc.25847] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 02/14/2019] [Revised: 03/29/2019] [Accepted: 04/14/2019] [Indexed: 12/26/2022]
Abstract
Contact-assisted simulations, the contacts being predicted or determined experimentally, have become very important in the determination of the structures of proteins and other biological macromolecules. In this work, the effect of contact-distance restraints on the simulated structures was investigated with the use of multiplexed replica exchange simulations with the coarse-grained UNRES force field. A modified bounded flat-bottom restraint function that does not generate a gradient when a restraint cannot be satisfied was implemented. Calculations were run with (i) a set of four small proteins, with contact restraints derived from experimental structures, and (ii) selected CASP11 and CASP12 targets, with restraints as used at prediction time. The bounded penalty function largely omitted false contacts, which were usually inconsistent. It was found that at least 20% of correct contacts must be present in the restraint set to improve model quality with respect to unrestrained simulations. © 2019 Wiley Periodicals, Inc.
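The key property described above, a flat-bottom restraint whose penalty is bounded so that it exerts no force when a restraint is grossly violated, can be illustrated with a short sketch. The abstract does not give the functional form, so the saturating (Lorentzian-like) expression below, the function name `bounded_flat_bottom`, and the parameters `depth` and `sigma` are assumptions chosen only to show the behaviour; this is not the function implemented in UNRES.

```python
import numpy as np

def bounded_flat_bottom(d, d_lo, d_up, depth=1.0, sigma=1.0):
    """Illustrative bounded flat-bottom penalty (assumed form, not the published one).

    The penalty is zero for d inside [d_lo, d_up]. Outside that interval it
    rises smoothly and saturates at `depth`, so its gradient tends to zero
    for restraints that are far from being satisfied.
    """
    d = np.asarray(d, dtype=float)
    # Violation: distance by which d lies outside the flat-bottom interval.
    x = np.maximum(d_lo - d, 0.0) + np.maximum(d - d_up, 0.0)
    return depth * x**2 / (x**2 + sigma**2)

# Restraint satisfied, mildly violated, and grossly violated (distances in arbitrary units).
for dist in (6.0, 9.0, 108.0):
    print(dist, float(bounded_flat_bottom(dist, d_lo=4.0, d_up=8.0)))
```

In contrast, an unbounded harmonic penalty keeps growing with the violation, so a false contact would pull the structure ever harder; with a bounded form, inconsistent restraints stop contributing once they are clearly unsatisfiable.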
Affiliation(s)
- Emilia A Lubecka
  - Institute of Informatics, Faculty of Mathematics, Physics and Informatics, University of Gdańsk, Wita Stwosza 57, 80-308 Gdańsk, Poland
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Adam Liwo
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
8
De Oliveira CCS, Pereira GRC, De Alcantara JYS, Antunes D, Caffarena ER, De Mesquita JF. In silico analysis of the V66M variant of human BDNF in psychiatric disorders: An approach to precision medicine. PLoS One 2019; 14:e0215508. [PMID: 30998730 PMCID: PMC6472887 DOI: 10.1371/journal.pone.0215508] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/17/2018] [Accepted: 04/04/2019] [Indexed: 11/19/2022] Open
Abstract
Brain-derived neurotrophic factor (BDNF) plays an important role in neurogenesis and synapse formation. V66M is the most prevalent BDNF mutation in humans and impairs the function and distribution of BDNF. This mutation is related to several psychiatric disorders. The pro-region of BDNF, particularly position 66 and its adjacent residues, is determinant for the intracellular sorting and activity-dependent secretion of BDNF. However, it has not yet been fully elucidated. The present study aims to analyze the effects of the V66M mutation on BDNF structure and function. Here, we applied nine algorithms, including SIFT and PolyPhen-2, for functional and stability prediction of the V66M mutation. The complete theoretical model of BDNF was generated by Rosetta and validated by the PROCHECK, RAMPAGE, ProSa, QMEAN and Verify-3D algorithms. Structural alignment was performed using TM-align. Phylogenetic analysis was performed using the ConSurf server. Molecular dynamics (MD) simulations were performed and analyzed using the GROMACS 2018.2 package. The V66M mutation was predicted as deleterious by PolyPhen-2 and SIFT, in addition to being predicted as destabilizing by I-Mutant. According to SNPeffect, the V66M mutation does not affect protein aggregation, amyloid propensity, or chaperone binding. The complete theoretical structure of BDNF proved to be a reliable model. Phylogenetic analysis indicated that the V66M mutation of BDNF occurs at a non-conserved position of the protein. MD analyses indicated that the V66M mutation does not affect BDNF flexibility or surface-to-volume ratio, but does affect BDNF essential motions, hydrogen bonding and secondary structure, particularly at its pre- and pro-domains, which are crucial for its activity and distribution. Thus, considering that these parameters are determinant for protein interactions and, consequently, protein function, the alterations observed throughout the MD analyses may be related to the functional impairment of BDNF upon V66M mutation, as well as to its involvement in psychiatric disorders.
Affiliation(s)
- Clara Carolina Silva De Oliveira
  - Department of Genetics and Molecular Biology, Bioinformatics and Computational Biology Laboratory, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Gabriel Rodrigues Coutinho Pereira
  - Department of Genetics and Molecular Biology, Bioinformatics and Computational Biology Laboratory, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Jamile Yvis Santos De Alcantara
  - Department of Genetics and Molecular Biology, Bioinformatics and Computational Biology Laboratory, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Deborah Antunes
  - Computational Biophysics and Molecular Modeling Group, Scientific Computing Program (PROCC), Fundação Oswaldo Cruz, Manguinhos, Rio de Janeiro, Brazil
- Ernesto Raul Caffarena
  - Computational Biophysics and Molecular Modeling Group, Scientific Computing Program (PROCC), Fundação Oswaldo Cruz, Manguinhos, Rio de Janeiro, Brazil
- Joelma Freire De Mesquita
  - Department of Genetics and Molecular Biology, Bioinformatics and Computational Biology Laboratory, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
9
Dodd PM, Damasceno PF, Glotzer SC. Universal folding pathways of polyhedron nets. Proc Natl Acad Sci U S A 2018; 115:E6690-E6696. [PMID: 29970420 PMCID: PMC6055160 DOI: 10.1073/pnas.1722681115] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open
Abstract
Low-dimensional objects such as molecular strands, ladders, and sheets have intrinsic features that affect their propensity to fold into 3D objects. Understanding this relationship remains a challenge for de novo design of functional structures. Using molecular dynamics simulations, we investigate the refolding of the 24 possible 2D unfoldings ("nets") of the three simplest Platonic shapes and demonstrate that attributes of a net's topology (net compactness and leaves on the cutting graph) correlate with thermodynamic folding propensity. To explain these correlations we exhaustively enumerate the pathways followed by nets during folding and identify a crossover temperature [Formula: see text] below which nets fold via nonnative contacts (bonds must break before the net can fold completely) and above which nets fold via native contacts (newly formed bonds are also present in the folded structure). Folding above [Formula: see text] shows a universal balance between reduction of entropy via the elimination of internal degrees of freedom when bonds are formed and gain in potential energy via local, cooperative edge binding. Exploiting this universality, we devised a numerical method to efficiently compute all high-temperature folding pathways for any net, allowing us to predict, among the combined 86,760 nets for the remaining Platonic solids, those with highest folding propensity. Our results provide a general heuristic for the design of 2D objects to stochastically fold into target 3D geometries and suggest a mechanism by which geometry and folding propensity are related above [Formula: see text], where native bonds dominate folding.
Affiliation(s)
- Paul M Dodd
  - Chemical Engineering Department, University of Michigan, Ann Arbor, MI 48109
- Pablo F Damasceno
  - Applied Physics Program, University of Michigan, Ann Arbor, MI 48109
- Sharon C Glotzer
  - Chemical Engineering Department, University of Michigan, Ann Arbor, MI 48109
  - Applied Physics Program, University of Michigan, Ann Arbor, MI 48109
  - Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109
  - Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109
10
Kim SS, Seffernick JT, Lindert S. Accurately Predicting Disordered Regions of Proteins Using Rosetta ResidueDisorder Application. J Phys Chem B 2018; 122:3920-3930. [PMID: 29595057 DOI: 10.1021/acs.jpcb.8b01763] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/01/2023]
Abstract
Although many proteins necessitate well-folded structures to properly instigate their biological functions, a large fraction of functioning proteins contain regions, known as intrinsically disordered protein regions, where stable structures are not likely to form. Notable functional roles of intrinsically disordered proteins are in transcriptional regulation, translation, and cellular signal transduction. Moreover, intrinsically disordered protein regions are highly abundant in many proteins associated with various human diseases; therefore, these segments have become attractive drug targets for potential therapeutics. Over the past decades, numerous computational methods have been developed to accurately predict disordered regions of proteins. Here we introduce a user-friendly and reliable approach for the prediction of disordered protein regions using the structure prediction software Rosetta. Using 245 proteins from a benchmark data set (16 DisProt database proteins) and a test data set (229 proteins with NMR data), we use Rosetta to predict the global protein structures and then show that there is a statistically significant difference between Rosetta scores in disordered and ordered regions, with scores being less favorable in disordered regions. Furthermore, the difference in scores between ordered and disordered protein regions is sufficient to accurately identify disordered protein regions. As a result, our Rosetta ResidueDisorder method (benchmark data set prediction accuracy of 71.77% and independent test data set prediction accuracy of 65.37%) outperformed other established disorder prediction tools and did not exhibit a biased prediction toward either ordered or disordered regions. To facilitate usage, a Rosetta application has been developed for the Rosetta ResidueDisorder method.
Affiliation(s)
- Stephanie S Kim
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
- Justin T Seffernick
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
- Steffen Lindert
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
11
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Michaela Fooksa
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
  - Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
- Sten Heinze
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
12
Faure G, Ogurtsov AY, Shabalina SA, Koonin EV. Adaptation of mRNA structure to control protein folding. RNA Biol 2017; 14:1649-1654. [PMID: 28722509 DOI: 10.1080/15476286.2017.1349047] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 10/19/2022] Open
Abstract
Comparison of mRNA and protein structures shows that highly structured mRNAs typically encode compact protein domains, suggesting that mRNA structure controls protein folding. This function is apparently performed by distinct structural elements in the mRNA, which implies 'fine tuning' of mRNA structure under selection for optimal protein folding. We find that, during evolution, changes in the mRNA folding energy follow amino acid replacements, reinforcing the notion of an intimate connection between the structures of an mRNA and the protein it encodes, and the double encoding of protein sequence and folding in the mRNA.
Affiliation(s)
- Guilhem Faure
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Aleksey Y Ogurtsov
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Svetlana A Shabalina
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
13
Stahl K, Schneider M, Brock O. EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinformatics 2017; 18:303. [PMID: 28623886 PMCID: PMC5474060 DOI: 10.1186/s12859-017-1713-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 09/27/2016] [Accepted: 05/30/2017] [Indexed: 01/12/2023] Open
Abstract
Background: Accurately predicted contacts make it possible to compute the 3D structure of a protein. Since the solution space of native residue-residue contact pairs is very large, it is necessary to leverage information to identify relevant regions of the solution space, i.e. correct contacts. Every additional source of information can contribute to narrowing down candidate regions. Therefore, recent methods combined evolutionary and sequence-based information as well as evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems resulting from the increased dimensionality and complexity of the learning problem are combated with a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks.
Results: On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for top-L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. The improvement on 1.5L is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The size of the refined feature set decreased by 75%, enabling a significant increase in training data for machine learning, contributing significantly to the observed improvements.
Conclusions: Exploiting as much and as diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources. EPSILON-CP is available as a webservice: http://compbio.robotics.tu-berlin.de/epsilon/.
Affiliation(s)
- Kolja Stahl
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Michael Schneider
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Oliver Brock
  - Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
|
14
|
Molecular design and downstream processing of turoctocog alfa (NovoEight), a B-domain truncated factor VIII molecule. Blood Coagul Fibrinolysis 2017; 27:568-75. [PMID: 26761578 PMCID: PMC4935534 DOI: 10.1097/mbc.0000000000000477] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/26/2022]
Abstract
Turoctocog alfa (NovoEight) is a third-generation recombinant factor VIII (rFVIII) with a truncated B-domain that is manufactured in Chinese hamster ovary cells. No human or animal-derived materials are used in the process. The aim of this study is to describe the molecular design and purification process for turoctocog alfa. A five-step purification process is applied to turoctocog alfa: protein capture on mixed-mode resin; immunoaffinity chromatography using a unique, recombinantly produced anti-FVIII mAb; anion exchange chromatography; nanofiltration and size exclusion chromatography. This process enabled reduction of impurities such as host cell proteins (HCPs) and high molecular weight proteins (HMWPs) to a very low level. The immunoaffinity step is very important for the removal of FVIII-related degradation products. Manufacturing scale data shown in this article confirmed the robustness of the purification process and a reliable and consistent reduction of the impurities. The contribution of each step to the final product purity is described and shown for three manufacturing batches. Turoctocog alfa, a third-generation B-domain truncated rFVIII product is manufactured in Chinese hamster ovary cells without the use of animal or human-derived proteins. The five-step purification process results in a homogenous, highly purified rFVIII product.
|
15
|
Abstract
Globular proteins typically fold into tightly packed arrays of regular secondary structures. We developed a model to approximate the compact parallel and antiparallel arrangement of α-helices and β-strands, enumerated all possible topologies formed by up to five secondary structural elements (SSEs), searched for their occurrence in spatial structures of proteins, and documented their frequencies of occurrence in the PDB. The enumeration model grows larger super-secondary structure patterns (SSPs) by combining pairs of smaller patterns, a process that approximates a potential path of protein fold evolution. The most prevalent SSPs are typically present in superfolds such as the Rossmann-like fold, the ferredoxin-like fold, and the Greek key motif, whereas the less frequent SSPs often possess uncommon structure features such as split β-sheets, left-handed connections, and crossing loops. This complete SSP enumeration model, for the first time, allows us to investigate which theoretically possible SSPs are not observed in available protein structures. All SSPs with up to four SSEs occurred in proteins. However, among the SSPs with five SSEs, approximately 20% (218) are absent from existing folds. Of these unobserved SSPs, 80% contain two or more uncommon structure features. To facilitate future efforts in protein structure classification, engineering, and design, we provide the resulting patterns and their frequency of occurrence in proteins at: http://prodata.swmed.edu/ssps/.
|
16
|
Fischer AW, Heinze S, Putnam DK, Li B, Pino JC, Xia Y, Lopez CF, Meiler J. CASP11--An Evaluation of a Modular BCL::Fold-Based Protein Structure Prediction Pipeline. PLoS One 2016; 11:e0152517. [PMID: 27046050 PMCID: PMC4821492 DOI: 10.1371/journal.pone.0152517] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/13/2015] [Accepted: 03/15/2016] [Indexed: 11/18/2022] Open
Abstract
In silico prediction of a protein's tertiary structure remains an unsolved problem. The community-wide Critical Assessment of Protein Structure Prediction (CASP) experiment provides a double-blind study to evaluate improvements in protein structure prediction algorithms. We developed a protein structure prediction pipeline employing a three-stage approach, consisting of low-resolution topology search, high-resolution refinement, and molecular dynamics simulation to predict the tertiary structure of proteins from the primary structure alone or including distance restraints either from predicted residue-residue contacts, nuclear magnetic resonance (NMR) nuclear Overhauser effect (NOE) experiments, or mass spectroscopy (MS) cross-linking (XL) data. The protein structure prediction pipeline was evaluated in the CASP11 experiment on twenty regular protein targets as well as thirty-three 'assisted' protein targets, which also had distance restraints available. Although the low-resolution topology search module was able to sample models with a global distance test total score (GDT_TS) value greater than 30% for twelve out of twenty proteins, it was frequently not possible to select the most accurate models for refinement, resulting in a general decay of model quality over the course of the prediction pipeline. In this study, we provide a detailed overall analysis, study one target protein in more detail as it travels through the protein structure prediction pipeline, and evaluate the impact of limited experimental data.
Affiliation(s)
- Axel W. Fischer
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- Sten Heinze
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- Daniel K. Putnam
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, 37232, United States of America
- Bian Li
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- James C. Pino
  - Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, 37232, United States of America
- Yan Xia
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
- Carlos F. Lopez
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
  - Department of Cancer Biology and Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, 37232, United States of America
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, United States of America
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, 37232, United States of America
|
17
|
Heinze S, Putnam DK, Fischer AW, Kohlmann T, Weiner BE, Meiler J. CASP10-BCL::Fold efficiently samples topologies of large proteins. Proteins 2015; 83:547-63. [PMID: 25581562 DOI: 10.1002/prot.24733] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/13/2014] [Revised: 10/15/2014] [Accepted: 11/03/2014] [Indexed: 12/26/2022]
Abstract
During CASP10 in summer 2012, we tested BCL::Fold for prediction of free modeling (FM) and template-based modeling (TBM) targets. BCL::Fold assembles the tertiary structure of a protein from predicted secondary structure elements (SSEs), omitting more flexible loop regions early on. This approach enables the sampling of conformational space for larger proteins with more complex topologies. In preparation for CASP11, we analyzed the quality of CASP10 models throughout the prediction pipeline to understand BCL::Fold's ability to sample the native topology and identify native-like models by scoring and/or clustering approaches, and our ability to add loop regions and side chains to initial SSE-only models. The standout observation is that BCL::Fold sampled topologies with a GDT_TS score > 33% for 12 of 18 and with a topology score > 0.8 for 11 of 18 test cases de novo. Despite the sampling success of BCL::Fold, significant challenges still exist in the clustering and loop generation stages of the pipeline. The clustering approach employed for model selection often failed to identify the most native-like assembly of SSEs for further refinement and submission. It was also observed that for some β-strand proteins, model refinement failed because β-strands were not properly aligned to form hydrogen bonds, removing otherwise accurate models from the pool. Further, BCL::Fold frequently samples non-natural topologies that require loop regions to pass through the center of the protein.
Affiliation(s)
- Sten Heinze
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee, 37240
|
18
|
Putnam DK, Weiner BE, Woetzel N, Lowe EW, Meiler J. BCL::SAXS: GPU accelerated Debye method for computation of small angle X-ray scattering profiles. Proteins 2015; 83:1500-12. [PMID: 26018949 PMCID: PMC4797635 DOI: 10.1002/prot.24838] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/13/2015] [Revised: 05/08/2015] [Accepted: 05/19/2015] [Indexed: 12/25/2022]
Abstract
Small angle X-ray scattering (SAXS) is an experimental technique used for structural characterization of macromolecules in solution. Here, we introduce BCL::SAXS--an algorithm designed to replicate SAXS profiles from rigid protein models at different levels of detail. We first show our derivation of BCL::SAXS and compare our results with the experimental scattering profile of hen egg white lysozyme. Using this protein we show how to generate SAXS profiles representing: (1) complete models, (2) models with approximated side chain coordinates, and (3) models with approximated side chain and loop region coordinates. We evaluated the ability of SAXS profiles to identify a correct protein topology from a non-redundant benchmark set of proteins. We find that complete SAXS profiles can be used to identify the correct protein by receiver operating characteristic (ROC) analysis with an area under the curve (AUC) > 99%. We show how our approximation of loop coordinates between secondary structure elements improves protein recognition by SAXS for protein models without loop regions and side chains. Agreement with SAXS data is a necessary but not sufficient condition for structure determination. We conclude that experimental SAXS data can be used as a filter to exclude protein models with large structural differences from the native.
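For orientation, the Debye formula named in the title of this entry sums pairwise terms, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij) / (q r_ij). The following is a minimal CPU sketch that assumes constant, q-independent form factors. It is not the BCL::SAXS implementation, and the function and parameter names are hypothetical.

```python
import math


def debye_intensity(coords, form_factors, q):
    """Scattering intensity I(q) from the Debye formula.

    coords: list of (x, y, z) positions in angstroms.
    form_factors: one scattering factor per position (assumed constant in q).
    q: momentum transfer in inverse angstroms.
    """
    total = 0.0
    for i, (ri, fi) in enumerate(zip(coords, form_factors)):
        for j, (rj, fj) in enumerate(zip(coords, form_factors)):
            if i == j:
                total += fi * fj  # sin(x)/x tends to 1 as x tends to 0
                continue
            x = q * math.dist(ri, rj)
            # Guard the x -> 0 limit (q = 0 or coincident points).
            total += fi * fj * (math.sin(x) / x if x > 1e-12 else 1.0)
    return total
```

The double loop costs O(N²) per q value, which is the kind of workload that benefits from parallel evaluation on a GPU.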
Affiliation(s)
- Daniel K. Putnam
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37235, USA
- Brian E. Weiner
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Nils Woetzel
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Edward W. Lowe
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Jens Meiler
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37235, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
|
19
|
Hofmann T, Fischer AW, Meiler J, Kalkhof S. Protein structure prediction guided by crosslinking restraints--A systematic evaluation of the impact of the crosslinking spacer length. Methods 2015; 89:79-90. [PMID: 25986934 DOI: 10.1016/j.ymeth.2015.05.014] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 02/15/2015] [Revised: 04/21/2015] [Accepted: 05/12/2015] [Indexed: 11/15/2022] Open
Abstract
Recent development of high-resolution mass spectrometry (MS) instruments enables chemical crosslinking (XL) to become a high-throughput method for obtaining structural information about proteins. Restraints derived from XL-MS experiments have been used successfully for structure refinement and protein-protein docking. However, one formidable question is under which circumstances XL-MS data might be sufficient to determine a protein's tertiary structure de novo. Answering this question will not only include understanding the impact of XL-MS data on sampling and scoring within a de novo protein structure prediction algorithm; it must also determine an optimal crosslinker type and length for protein structure determination. While a longer crosslinker will yield more restraints, the value of each restraint for protein structure prediction decreases as the restraint is consistent with a larger conformational space. In this study, the number of crosslinks and their discriminative power were systematically analyzed in silico on a set of 2055 non-redundant protein folds considering Lys-Lys, Lys-Asp, Lys-Glu, Cys-Cys, and Arg-Arg reactive crosslinkers between 1 and 60 Å. A heuristic was developed that determines the optimal crosslinker length depending on the protein size. Next, simulated restraints of variable length were used to de novo predict the tertiary structure of fifteen proteins using the BCL::Fold algorithm. The results demonstrate that a distinct crosslinker length exists for which information content for de novo protein structure prediction is maximized. The sampling accuracy improves on average by 1.0 Å and up to 2.2 Å in the most prominent example. XL-MS restraints consistently enable an improved selection of native-like models with an average enrichment of 2.1.
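The trade-off described in this abstract, where longer crosslinkers yield more but less informative restraints, can be made concrete with a toy count. The sketch below uses the straight-line distance between residue positions as a stand-in for crosslinker reach, which is a simplifying assumption that ignores paths around the protein surface. The function name and coordinates are hypothetical and do not come from the study.

```python
import math
from itertools import combinations


def count_feasible_crosslinks(positions, max_length):
    """Count residue pairs whose straight-line distance fits the crosslinker.

    positions: dict mapping residue index to an (x, y, z) coordinate in
               angstroms (for example lysine side-chain atoms; an assumption).
    max_length: maximum distance the crosslinker can span, in angstroms.
    """
    return sum(1 for (_, a), (_, b) in combinations(sorted(positions.items()), 2)
               if math.dist(a, b) <= max_length)


# Hypothetical lysine positions; longer spacers admit more candidate restraints.
lysines = {5: (0.0, 0.0, 0.0), 20: (8.0, 0.0, 0.0),
           41: (0.0, 15.0, 0.0), 77: (0.0, 0.0, 30.0)}
counts = {length: count_feasible_crosslinks(lysines, length)
          for length in (10.0, 20.0, 40.0)}
```

At 40 Å every pair in the toy set qualifies, so the restraints no longer discriminate between arrangements, which mirrors the loss of information per restraint noted in the abstract.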
Affiliation(s)
- Tommy Hofmann
  - Department of Proteomics, Helmholtz-Centre for Environmental Research - UFZ, Leipzig D-04318, Germany
- Axel W Fischer
  - Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
- Jens Meiler
  - Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
- Stefan Kalkhof
  - Department of Proteomics, Helmholtz-Centre for Environmental Research - UFZ, Leipzig D-04318, Germany
  - Department of Bioanalytics, University of Applied Sciences and Arts of Coburg, D-96450 Coburg, Germany
|
20
|
Maurice KJ. SSThread: Template-free protein structure prediction by threading pairs of contacting secondary structures followed by assembly of overlapping pairs. J Comput Chem 2014; 35:644-56. [PMID: 24523210 DOI: 10.1002/jcc.23543] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/02/2013] [Revised: 11/15/2013] [Accepted: 01/05/2014] [Indexed: 11/12/2022]
Abstract
Acquiring the three-dimensional structure of a protein from its amino acid sequence alone, despite a great deal of work and significant progress on the subject, is still an unsolved problem. SSThread, a new template-free algorithm, is described here. It makes several predictions of contacting pairs of α-helices and β-strands derived from a database of experimental structures using a knowledge-based potential, secondary structure prediction, and contact map prediction, and then assembles overlapping pair predictions to create an ensemble of core structure predictions whose loops are then predicted. In a set of seven CASP10 targets, SSThread outperformed the two leading methods for two targets each. The targets were all β-strand-containing structures, and most of them have a high relative contact order, which demonstrates the advantages of SSThread. The primary bottlenecks, based on sets of 74 and 21 test cases, are the pair prediction and loop prediction stages.
|
21
|
Wu Xue, Fu Ting, Xiu Zhilong, Yin Liu, Wang Jinguang, Li Guohui. Comparing folding mechanisms of different prion proteins by Gō model. JOURNAL OF THEORETICAL & COMPUTATIONAL CHEMISTRY 2013. [DOI: 10.1142/s0219633613410046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Prions are associated with neurodegenerative diseases induced by transmissible spongiform encephalopathies. The infectious scrapie form, referred to as PrPSc, arises through a conformational change from the normal prion, which is predominantly α-helical, to the abnormal PrPSc, which is rich in β-sheet content. Neurodegenerative diseases have been found in both human and bovine sources, but there are no reports of infection by transmissible spongiform encephalopathies in rabbit, canine, or horse sources. Here we used a coarse-grained Gō model to compare the human, bovine, rabbit, canine, and horse normal (cellular) prion proteins. Because the denatured state of the normal prion is related to the conversion from normal to abnormal prion protein, we used an all-atom Gō model to investigate the folding pathway and energy landscape of the human prion protein. Using the coarse-grained Gō model, the cooperativity of the five prion proteins was characterized in terms of the calorimetric criterion, sigmoidal transition, and free-energy profile. The rabbit and horse prion proteins have a higher folding free-energy barrier and cooperativity, and the canine prion protein has a slightly higher folding free-energy barrier, compared with the human and bovine prion proteins. The results from the all-atom Gō model confirmed the validity of the Cα-Gō model. The correlations of our results with previous experimental and theoretical studies are discussed.
Affiliation(s)
- Xue Wu
  - School of Life Science and Biotechnology, Dalian University of Technology, Linggong Road 2, Dalian 116024, P. R. China
  - Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Science, 457 Zhongshan Road, Dalian, Liaoning, P. R. China
  - University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Ting Fu
  - School of Life Science and Biotechnology, Dalian University of Technology, Linggong Road 2, Dalian 116024, P. R. China
  - Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Science, 457 Zhongshan Road, Dalian, Liaoning, P. R. China
  - University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Zhi-Long Xiu
  - School of Life Science and Biotechnology, Dalian University of Technology, Linggong Road 2, Dalian 116024, P. R. China
- Liu Yin
  - Oncology Department in the 1st Affiliated Hospital of Dalian Medical University, 222 Zhongshan Road, Liaoning Province, Dalian 116011, P. R. China
- Jin-Guang Wang
  - Thoracic Surgery Department in the 1st Affiliated Hospital of Dalian Medical University, 222 Zhongshan Road, Liaoning Province, Dalian 116011, P. R. China
- Guo-Hui Li
  - Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Science, 457 Zhongshan Road, Dalian, Liaoning, P. R. China
|
22
|
Mono and dual cofactor dependence of human cystathionine β-synthase enzyme variants in vivo and in vitro. G3-GENES GENOMES GENETICS 2013; 3:1619-28. [PMID: 23934999 PMCID: PMC3789787 DOI: 10.1534/g3.113.006916] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/02/2022]
Abstract
Any two individuals differ from each other by an average of 3 million single-nucleotide polymorphisms. Some polymorphisms have a functional impact on cofactor-using enzymes and therefore represent points of possible therapeutic intervention through elevated-cofactor remediation. Because most known disease-causing mutations affect protein stability, we evaluated how the in vivo impact caused by single amino acid substitutions in a prototypical enzyme of this type compared with physical characteristics of the variant enzymes in vitro. We focused on cystathionine β-synthase (CBS) because of its clinical relevance in homocysteine metabolism and because some variants of the enzyme are clinically responsive to increased levels of its B6 cofactor. Single amino-acid substitutions throughout the CBS protein caused reduced function in vivo, and a subset of these altered sensitivity to limiting B6-cofactor. Some of these B6-sensitive substitutions also had altered sensitivity to limiting heme, another CBS cofactor. Limiting heme resulted in reduced incorporation of heme into these variants, and subsequently increased protease sensitivity of the enzyme in vitro. We hypothesize that these alleles caused a modest, yet significant, destabilization of the native state of the protein, and that the functional impact of the amino acid substitutions caused by these alleles can be influenced by cofactor(s) even when the affected amino acid is distant from the cofactor binding site.
|
23
|
Weiner BE, Woetzel N, Karakas M, Alexander N, Meiler J. BCL::MP-fold: folding membrane proteins through assembly of transmembrane helices. Structure 2013; 21:1107-17. [PMID: 23727232 PMCID: PMC3738745 DOI: 10.1016/j.str.2013.04.022] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 02/01/2013] [Revised: 04/10/2013] [Accepted: 04/25/2013] [Indexed: 12/01/2022]
Abstract
Membrane protein structure determination remains a challenging endeavor. Computational methods that predict membrane protein structure from sequence can potentially aid structure determination for such difficult target proteins. The de novo protein structure prediction method BCL::Fold rapidly assembles secondary structure elements into three-dimensional models. Here, we describe modifications to the algorithm, named BCL::MP-Fold, in order to simulate membrane protein folding. Models are built into a static membrane object and are evaluated using a knowledge-based energy potential, which has been modified to account for the membrane environment. Additionally, a symmetry folding mode allows for the prediction of obligate homomultimers, a common property among membrane proteins. In a benchmark test of 40 proteins of known structure, the method sampled the correct topology in 34 cases. This demonstrates that the algorithm can accurately predict protein topology without the need for large multiple sequence alignments, homologous template structures, or experimental restraints.
Affiliation(s)
- Brian E. Weiner
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville TN, 37232, USA
- Nils Woetzel
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville TN, 37232, USA
- Mert Karakas
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville TN, 37232, USA
- Nathan Alexander
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville TN, 37232, USA
- Jens Meiler
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville TN, 37232, USA
|
24
|
Menon V, Vallat BK, Dybas JM, Fiser A. Modeling proteins using a super-secondary structure library and NMR chemical shift information. Structure 2013; 21:891-9. [PMID: 23685209 DOI: 10.1016/j.str.2013.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 12/17/2012] [Revised: 04/02/2013] [Accepted: 04/13/2013] [Indexed: 11/29/2022]
Abstract
A remaining challenge in protein modeling is to predict structures for sequences with no sequence similarity to any experimentally solved structure. Based on earlier observations, the library of protein backbone supersecondary structure motifs (Smotifs) saturated about a decade ago. Therefore, it should be possible to build any structure from a combination of existing Smotifs with the help of limited experimental data that are sufficient to relate the backbone conformations of Smotifs between target proteins and known structures. Here, we present a hybrid modeling algorithm that relies on an exhaustive Smotif library and on nuclear magnetic resonance chemical shift patterns without any input of primary sequence information. In a test of 102 proteins, the algorithm delivered 90 homology-model-quality models, among them 24 high-quality ones, and a topologically correct solution for almost all cases. The current approach opens an avenue to address the modeling of larger protein structures for which chemical shifts are available.
Affiliation(s)
- Vilas Menon
  - Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
|
25
|
Liu M, He H, Su J. Is it possible to stabilize a thermophilic protein further using sequences and structures of mesophilic proteins: a theoretical case study concerning DgAS. Theor Biol Med Model 2013; 10:26. [PMID: 23575217 PMCID: PMC3639903 DOI: 10.1186/1742-4682-10-26] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/28/2013] [Accepted: 03/29/2013] [Indexed: 11/13/2022] Open
Abstract
Incorporating structural elements of thermostable homologs can greatly improve the thermostability of a mesophilic protein. Despite the effectiveness of this method, applying it is often hampered. First, it requires alignment of the target mesophilic protein sequence with those of thermophilic homologs, but not every mesophilic protein has a thermophilic homolog. Second, not all favorable features of a thermophilic protein can be incorporated into the structure of a mesophilic protein. Furthermore, even the most stable native protein is not sufficiently stable for industrial applications. Therefore, creating an industrially applicable protein on the basis of the thermophilic protein could prove advantageous. Amylosucrase (AS) can catalyze the synthesis of an amylose-like polysaccharide composed of only α-1,4-linkages using sucrose as the lone energy source. However, industrial development of AS has been hampered owing to its low thermostability. To facilitate potential industrial applications, the aim of the current study was to improve the thermostability of Deinococcus geothermalis amylosucrase (DgAS) further; this is the most stable AS discovered to date. By integrating ideas from mesophilic AS with well-established protein design protocols, three useful design protocols are proposed, and several promising substitutions were identified using these protocols. The successful application of this hybrid design method indicates that it is possible to stabilize a thermostable protein further by incorporating structural elements of less-stable homologs.
Affiliation(s)
- Ming Liu
  - Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
|
26
|
Principles for designing ideal protein structures. Nature 2013; 491:222-7. [PMID: 23135467 DOI: 10.1038/nature11600] [Citation(s) in RCA: 410] [Impact Index Per Article: 37.3] [Received: 05/17/2012] [Accepted: 09/19/2012] [Indexed: 02/03/2023]
Abstract
Unlike random heteropolymers, natural proteins fold into unique ordered structures. Understanding how these are encoded in amino-acid sequences is complicated by energetically unfavourable non-ideal features--for example kinked α-helices, bulged β-strands, strained loops and buried polar groups--that arise in proteins from evolutionary selection for biological function or from neutral drift. Here we describe an approach to designing ideal protein structures stabilized by completely consistent local and non-local interactions. The approach is based on a set of rules relating secondary structure patterns to protein tertiary motifs, which make possible the design of funnel-shaped protein folding energy landscapes leading into the target folded state. Guided by these rules, we designed sequences predicted to fold into ideal protein structures consisting of α-helices, β-strands and minimal loops. Designs for five different topologies were found to be monomeric and very stable and to adopt structures in solution nearly identical to the computational models. These results illuminate how the folding funnels of natural proteins arise and provide the foundation for engineering a new generation of functional proteins free from natural evolution.
|
27
|
Karakaş M, Woetzel N, Staritzbichler R, Alexander N, Weiner BE, Meiler J. BCL::Fold--de novo prediction of complex and large protein topologies by assembly of secondary structure elements. PLoS One 2012; 7:e49240. [PMID: 23173050 PMCID: PMC3500284 DOI: 10.1371/journal.pone.0049240] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 03/27/2012] [Accepted: 10/07/2012] [Indexed: 01/10/2023] Open
Abstract
Computational de novo protein structure prediction is limited to small proteins of simple topology. The present work explores an approach to extend beyond the current limitations through assembling protein topologies from idealized α-helices and β-strands. The algorithm performs a Monte Carlo Metropolis simulated annealing folding simulation. It optimizes a knowledge-based potential that analyzes radius of gyration, β-strand pairing, secondary structure element (SSE) packing, amino acid pair distance, amino acid environment, contact order, secondary structure prediction agreement and loop closure. Discontinuation of the protein chain favors sampling of non-local contacts and thereby creation of complex protein topologies. The folding simulation is accelerated through exclusion of flexible loop regions further reducing the size of the conformational search space. The algorithm is benchmarked on 66 proteins with lengths between 83 and 293 amino acids. For 61 out of these proteins, the best SSE-only models obtained have an RMSD100 below 8.0 Å and recover more than 20% of the native contacts. The algorithm assembles protein topologies with up to 215 residues and a relative contact order of 0.46. The method is tailored to be used in conjunction with low-resolution or sparse experimental data sets which often provide restraints for regions of defined secondary structure.
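The Monte Carlo Metropolis simulated annealing scheme named in this abstract can be sketched generically. The example below accepts a move when the score improves and otherwise with probability exp(-ΔE/T), while the temperature is lowered geometrically. The cooling schedule, the one-dimensional toy energy, and all function names are assumptions made for illustration and do not reproduce BCL::Fold's moves or scoring terms.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept improvements, else exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def simulated_annealing(energy, propose, state, t_start=1.0, t_end=0.01,
                        steps=5000, seed=0):
    """Minimize `energy` with a geometric cooling schedule (an assumed choice)."""
    rng = random.Random(seed)
    cooling = (t_end / t_start) ** (1.0 / max(1, steps - 1))
    temperature = t_start
    current_e = energy(state)
    best, best_e = state, current_e
    for _ in range(steps):
        candidate = propose(state, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, temperature, rng):
            state, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = state, current_e
        temperature *= cooling
    return best, best_e


# Toy usage: a one-dimensional quadratic stands in for a knowledge-based score.
best_x, best_score = simulated_annealing(
    energy=lambda x: (x - 3.0) ** 2,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    state=0.0,
)
```

Accepting some uphill moves at high temperature lets the search escape local minima early on, and the shrinking temperature makes it increasingly greedy toward the end of the run.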
Affiliation(s)
- Mert Karakaş
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Nils Woetzel
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Rene Staritzbichler
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Nathan Alexander
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Brian E. Weiner
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Jens Meiler
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
|
28
|
Lindert S, Alexander N, Wötzel N, Karakaş M, Stewart PL, Meiler J. EM-fold: de novo atomic-detail protein structure determination from medium-resolution density maps. Structure 2012; 20:464-78. [PMID: 22405005 DOI: 10.1016/j.str.2012.01.023] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2011] [Revised: 01/23/2012] [Accepted: 01/26/2012] [Indexed: 11/17/2022]
Abstract
Electron density maps of membrane proteins or large macromolecular complexes are frequently only determined at medium resolution between 4 Å and 10 Å, either by cryo-electron microscopy or X-ray crystallography. In these density maps, the general arrangement of secondary structure elements (SSEs) is revealed, whereas their directionality and connectivity remain elusive. We demonstrate that the topology of proteins with up to 250 amino acids can be determined from such density maps when combined with a computational protein folding protocol. Furthermore, we accurately reconstruct atomic detail in loop regions and amino acid side chains not visible in the experimental data. The EM-Fold algorithm assembles the SSEs de novo before atomic detail is added using Rosetta. In a benchmark of 27 proteins, the protocol consistently and reproducibly achieves models with root mean square deviation values <3 Å.
Affiliation(s)
- Steffen Lindert
- Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN 37212, USA
|
29
|
Lange OF, Baker D. Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins 2012; 80:884-95. [PMID: 22423358 PMCID: PMC3310173 DOI: 10.1002/prot.23245] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
Abstract
Recent work has shown that NMR structures can be determined by integrating sparse NMR data with structure prediction methods such as Rosetta. The experimental data serve to guide the search for the lowest energy state towards the deep minimum at the native state which is frequently missed in Rosetta de novo structure calculations. However, as the protein size increases, sampling again becomes limiting; for example, the standard Rosetta protocol involving Monte Carlo fragment insertion starting from an extended chain fails to converge for proteins over 150 amino acids even with guidance from chemical shifts (CS-Rosetta) and other NMR data. The primary limitation of this protocol—that every folding trajectory is completely independent of every other—was recently overcome with the development of a new approach involving resolution-adapted structural recombination (RASREC). Here we describe the RASREC approach in detail and compare it to standard CS-Rosetta. We show that the improved sampling of RASREC is essential in obtaining accurate structures over a benchmark set of 11 proteins in the 15-25 kDa size range using chemical shifts, backbone RDCs and HN-HN NOE data; in a number of cases the improved sampling methodology makes a larger contribution than incorporation of additional experimental data. Experimental data are invaluable for guiding sampling to the vicinity of the global energy minimum, but for larger proteins, the standard Rosetta fold-from-extended-chain protocol does not converge on the native minimum even with experimental data and the more powerful RASREC approach is necessary to converge to accurate solutions.
Affiliation(s)
- Oliver F Lange
- Department Chemie, Biomolecular NMR and Munich Center for Integrated Protein Science, Technische Universität München, Garching, Germany.
|
30
|
Lindert S, Hofmann T, Wötzel N, Karakaş M, Stewart PL, Meiler J. Ab initio protein modeling into CryoEM density maps using EM-Fold. Biopolymers 2012; 97:669-77. [PMID: 22302372 DOI: 10.1002/bip.22027] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2011] [Revised: 12/26/2011] [Accepted: 01/09/2012] [Indexed: 11/08/2022]
Abstract
EM-Fold was used to build models for nine proteins in the maps of GroEL (7.7 Å resolution) and ribosome (6.4 Å resolution) in the ab initio modeling category of the 2010 cryo-electron microscopy modeling challenge. EM-Fold assembles predicted secondary structure elements (SSEs) into regions of the density map that were identified to correspond to either α-helices or β-strands. The assembly uses a Monte Carlo algorithm where loop closure, density-SSE length agreement, and strength of connecting density between SSEs are evaluated. Top-scoring models are refined by translating, rotating, and bending SSEs to yield better agreement with the density map. EM-Fold produces models that contain backbone atoms within SSEs only. The RMSD values of the models with respect to native range from 2.4 to 3.5 Å for six of the nine proteins. EM-Fold failed to predict the correct topology in three cases. Subsequently, Rosetta was used to build loops and side chains for the very best scoring models after EM-Fold refinement. The refinement within Rosetta's force field is driven by a density agreement score that calculates a cross-correlation between a density map simulated from the model and the experimental density map. All-atom RMSDs as low as 3.4 Å are achieved in favorable cases. Values above 10.0 Å are observed for two proteins with low overall content of secondary structure and hence particularly complex loop modeling problems. RMSDs over residues in secondary structure elements range from 2.5 to 4.8 Å.
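The density agreement score mentioned in this abstract is described only as a cross-correlation between a simulated and an experimental density map. A minimal sketch of one common form, the mean-subtracted and normalised (Pearson) correlation over flattened voxel values, is shown here. The function name `density_cross_correlation` and the choice of normalisation are assumptions; this is not the Rosetta scoring code.

```python
import math


def density_cross_correlation(simulated, experimental):
    """Pearson cross-correlation between two equally sized lists of voxel values."""
    if len(simulated) != len(experimental) or not simulated:
        raise ValueError("maps must be non-empty and have the same number of voxels")
    n = len(simulated)
    mean_s = sum(simulated) / n
    mean_e = sum(experimental) / n
    # Covariance and variances are left unnormalised; the factor 1/n cancels.
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(simulated, experimental))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_s == 0 or var_e == 0:
        return 0.0  # a constant map carries no shape information
    return cov / math.sqrt(var_s * var_e)
```

A value near 1 indicates that the model-derived map rises and falls with the experimental map; because the score is normalised, it does not depend on the absolute density scale of either map.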
Affiliation(s)
- Steffen Lindert
- Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN 37212, USA
|
31
|
Duclert-Savatier N, Martínez L, Nilges M, Malliavin TE. The redundancy of NMR restraints can be used to accelerate the unfolding behavior of an SH3 domain during molecular dynamics simulations. BMC STRUCTURAL BIOLOGY 2011; 11:46. [PMID: 22115427 PMCID: PMC3274457 DOI: 10.1186/1472-6807-11-46] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/13/2011] [Accepted: 11/24/2011] [Indexed: 11/29/2022]
Affiliation(s)
- Nathalie Duclert-Savatier
- Institut Pasteur, CNRS URA 2185, Unité de Bioinformatique Structurale, 25-28 rue du Dr Roux, F-75724 Paris Cedex 15, France
|
32
|
Esque J, Oguey C, de Brevern AG. Comparative Analysis of Threshold and Tessellation Methods for Determining Protein Contacts. J Chem Inf Model 2011; 51:493-507. [DOI: 10.1021/ci100195t] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Jeremy Esque
- LPTM, CNRS UMR 8089, Université de Cergy Pontoise, 2 av. Adolphe Chauvin, 95302 Cergy-Pontoise, France
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot, Paris 7, INTS, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France
- Christophe Oguey
- LPTM, CNRS UMR 8089, Université de Cergy Pontoise, 2 av. Adolphe Chauvin, 95302 Cergy-Pontoise, France
- Alexandre G. de Brevern
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot, Paris 7, INTS, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France
|
33
|
Andreeva A, Murzin AG. Structural classification of proteins and structural genomics: new insights into protein folding and evolution. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1190-7. [PMID: 20944210 PMCID: PMC2954204 DOI: 10.1107/s1744309110007177] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2010] [Accepted: 02/24/2010] [Indexed: 11/10/2022]
Abstract
During the past decade, the Protein Structure Initiative (PSI) centres have become major contributors of new families, superfamilies and folds to the Structural Classification of Proteins (SCOP) database. The PSI results have increased the diversity of protein structural space and accelerated our understanding of it. This review article surveys a selection of protein structures determined by the Joint Center for Structural Genomics (JCSG). It presents previously undescribed β-sheet architectures such as the double barrel and spiral β-roll and discusses new examples of unusual topologies and peculiar structural features observed in proteins characterized by the JCSG and other Structural Genomics centres.
Affiliation(s)
- Antonina Andreeva
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, England
- Alexey G. Murzin
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, England
|
34
|
Rajgaria R, Wei Y, Floudas CA. Contact prediction for beta and alpha-beta proteins using integer linear optimization and its impact on the first principles 3D structure prediction method ASTRO-FOLD. Proteins 2010; 78:1825-46. [PMID: 20225257 PMCID: PMC2858251 DOI: 10.1002/prot.22696] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
An integer linear optimization model is presented to predict residue contacts in beta, alpha + beta, and alpha/beta proteins. The total energy of a protein is expressed as the sum of a Cα-Cα distance-dependent contact energy contribution and a hydrophobic contribution. The model selects the contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the beta-sheet alignments. These beta-sheet alignments are used as constraints for contacts between residues of beta-sheets. The model was tested on three independent protein test sets and on CASP8 test proteins consisting of beta, alpha + beta, and alpha/beta proteins, and it was found to perform very well. The average accuracy of the predictions (for residues separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets; they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be used directly to facilitate protein tertiary structure prediction. The proposed residue contact prediction model is incorporated into the first-principles protein tertiary structure prediction approach ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins.
Affiliation(s)
- R. Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- Y. Wei
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
|
35
|
Karakaş M, Woetzel N, Meiler J. BCL::Contact--low confidence fold recognition hits boost protein contact prediction and de novo structure determination. J Comput Biol 2010; 17:153-68. [PMID: 19772383 DOI: 10.1089/cmb.2009.0030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Knowledge of all residue-residue contacts within a protein allows determination of the protein fold. Accurate prediction of even a subset of long-range contacts (contacts between amino acids far apart in sequence) can be instrumental for determining tertiary structure. Here we present BCL::Contact, a novel contact prediction method that utilizes artificial neural networks (ANNs) and specializes in the prediction of medium to long-range contacts. BCL::Contact comes in two modes: sequence-based and structure-based. The sequence-based mode uses only sequence information and has individual ANNs specialized for helix-helix, helix-strand, strand-helix, strand-strand, and sheet-sheet contacts. The structure-based mode combines results from 32 fold recognition methods with sequence information into a consensus prediction. The two methods were presented in the 6th and 7th Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments. The present work focuses on elucidating the impact of fold recognition results on contact prediction via a direct comparison of both methods on a joined benchmark set of proteins. The sequence-based mode predicted contacts with 42% accuracy (7% false positive rate), while the structure-based mode achieved 45% accuracy (2% false positive rate). Predictions by both modes of BCL::Contact were supplied as input to the protein tertiary structure prediction program Rosetta for a benchmark of 17 proteins with no close sequence homologs in the Protein Data Bank (PDB). Rosetta created higher accuracy models, signified by an improvement of 1.3 Å on average root mean square deviation (RMSD), when driven by the predicted contacts. Further, filtering Rosetta models by agreement with the predicted contacts enriches for native-like fold topologies.
Affiliation(s)
- Mert Karakaş
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, USA
|
36
|
Max N, Hu C, Kreylos O, Crivelli S. BuildBeta--a system for automatically constructing beta sheets. Proteins 2010; 78:559-74. [PMID: 19768785 DOI: 10.1002/prot.22578] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
We describe a method that can thoroughly sample a protein conformational space given the protein primary sequence of amino acids and secondary structure predictions. Specifically, we target proteins with beta-sheets, which are particularly challenging for ab initio protein structure prediction because of the complexity of sampling long-range strand pairings. Using some basic packing principles, inverse kinematics (IK), and beta-pairing scores, this method creates all possible beta-sheet arrangements, including those that have the correct packing of beta-strands. It uses the IK algorithms of ProteinShop to move alpha-helices and beta-strands as rigid bodies by rotating the dihedral angles in the coil regions. Our results show that our approach produces structures that are within 4-6 Å RMSD of the native one regardless of the protein size and beta-sheet topology, although this number may increase if the protein has long loops or complex alpha-helical regions.
Affiliation(s)
- Nelson Max
- Department of Computer Science, University of California, Davis, California 95616, USA
|
37
|
Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O, Kinch L, Sheffler W, Kim BH, Das R, Grishin NV, Baker D. Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins 2010; 77 Suppl 9:89-99. [PMID: 19701941 DOI: 10.1002/prot.22540] [Citation(s) in RCA: 367] [Impact Index Per Article: 26.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
We describe predictions made using the Rosetta structure prediction methodology for the Eighth Critical Assessment of Techniques for Protein Structure Prediction. Aggressive sampling and all-atom refinement were carried out for nearly all targets. A combination of alignment methodologies was used to generate starting models from a range of templates, and the models were then subjected to Rosetta all atom refinement. For the 64 domains with readily identified templates, the best submitted model was better than the best alignment to the best template in the Protein Data Bank for 24 cases, and improved over the best starting model for 43 cases. For 13 targets where only very distant sequence relationships to proteins of known structure were detected, models were generated using the Rosetta de novo structure prediction methodology followed by all-atom refinement; in several cases the submitted models were better than those based on the available templates. Of the 12 refinement challenges, the best submitted model improved on the starting model in seven cases. These improvements over the starting template-based models and refinement tests demonstrate the power of Rosetta structure refinement in improving model accuracy.
Affiliation(s)
- Srivatsan Raman
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|
38
|
Lindert S, Staritzbichler R, Wötzel N, Karakaş M, Stewart PL, Meiler J. EM-fold: De novo folding of alpha-helical proteins guided by intermediate-resolution electron microscopy density maps. Structure 2009; 17:990-1003. [PMID: 19604479 DOI: 10.1016/j.str.2009.06.001] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2009] [Revised: 05/31/2009] [Accepted: 06/02/2009] [Indexed: 01/22/2023]
Abstract
In medium-resolution (7-10 Å) cryo-electron microscopy (cryo-EM) density maps, alpha helices can be identified as density rods whereas beta-strand or loop regions are not as easily discerned. We are proposing a computational protein structure prediction algorithm "EM-Fold" that resolves the density rod connectivity ambiguity by placing predicted alpha helices into the density rods and adding missing backbone coordinates in loop regions. In a benchmark of 11 mainly alpha-helical proteins of known structure a native-like model is identified in eight cases (RMSD 3.9-7.9 Å). The three failures can be attributed to inaccuracies in the secondary structure prediction step that precedes EM-Fold. EM-Fold has been applied to the approximately 6 Å resolution cryo-EM density map of protein IIIa from human adenovirus. We report the first topological model for the alpha-helical 400 residue N-terminal region of protein IIIa. EM-Fold also has the potential to interpret medium-resolution density maps in X-ray crystallography.
Affiliation(s)
- Steffen Lindert
- Department of Chemistry, Vanderbilt University, Nashville, TN 37212, USA
|
39
|
Sathyapriya R, Duarte JM, Stehr H, Filippis I, Lappe M. Defining an essence of structure determining residue contacts in proteins. PLoS Comput Biol 2009; 5:e1000584. [PMID: 19997489 PMCID: PMC2778133 DOI: 10.1371/journal.pcbi.1000584] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2009] [Accepted: 10/30/2009] [Indexed: 11/18/2022] Open
Abstract
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for the design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is also very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, a series of random subsets (ranging from 10% to 90% of native contacts) is generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling”, that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as few as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy.
This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking. A protein structure can be visualized as a network of non-covalent contacts existing between amino acids. But not all such contacts are important structural determinants of a protein. We have attempted to identify a subset of amino acid contacts that are essential for reconstructing protein structures. Initially, we followed random sampling of contacts and tested their efficacy to successfully represent the three-dimensional structure. Further, we also developed an algorithm that selects a subset of amino acid contacts from proteins based on the sequence and network properties. The subsets picked by our algorithm represent protein three-dimensional structure better than random subsets, thereby offering direct evidence for the existence of a structural essence in protein structures. The identification of such structure-defining subsets finds application in experimental and computational protein structure determination.
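The random reference sets described in this abstract (random subsets covering 10% to 90% of the native contacts) can be sketched in a few lines of Python. The function name `random_contact_subsets`, the particular fractions, and the fixed seed are assumptions for illustration; the abstract does not state the step size or the number of repeats per fraction.

```python
import random


def random_contact_subsets(native_contacts, fractions=(0.1, 0.3, 0.5, 0.7, 0.9), seed=0):
    """Map each fraction to a random subset of the native contacts of that size."""
    rng = random.Random(seed)
    ordered = sorted(native_contacts)  # a fixed order keeps the sampling reproducible
    subsets = {}
    for fraction in fractions:
        # Keep at least one contact whenever any native contacts exist.
        size = max(1, round(fraction * len(ordered))) if ordered else 0
        subsets[fraction] = set(rng.sample(ordered, size))
    return subsets
```

Each subset would then be passed to a reconstruction step and scored against the native structure, which gives the baseline that a selection strategy has to beat.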
Affiliation(s)
- R. Sathyapriya
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Jose M. Duarte
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Henning Stehr
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Ioannis Filippis
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Michael Lappe
- Structural Genomics/Bioinformatics Group, Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
40
|
Wu L, Li WF, Liu F, Zhang J, Wang J, Wang W. Understanding protein folding cooperativity based on topological consideration. J Chem Phys 2009; 131:065105. [PMID: 19691415 DOI: 10.1063/1.3200952] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Folding cooperativity is an important issue in protein folding dynamics. Since the native topology plays a significant role in determining the folding behavior of proteins, we believe that it is also closely related to folding cooperativity. In the present work, we perform simulations of the proteins Naf-BBL, QNND-BBL, CI2, and SH3 with the Gō model and compare their folding behaviors. By analyzing the weakly cooperative folding of Naf-BBL in detail, we found that it shows relatively weak thermodynamic coupling between residues, and that this weak coupling occurs mainly between the nonlocal native contacts. This finding complements our understanding of the source of the barrierless folding of Naf-BBL and prompts us to analyze the topological origins of its poor thermodynamic coupling. We then extend our analysis to other two-state and multistate proteins. Based on considerations of thermodynamic and kinetic coupling, we conclude that the fraction of scattered native contacts, the difference in loop entropy of contacts, and the long-range relative contact order are the major topological factors that influence folding cooperativity. The combination of these three tertiary structural features shows significant correlations with the folding types of proteins. Moreover, we also discuss the topological factors related to downhill folding. Finally, the generic role of tertiary structure in determining folding cooperativity is summarized.
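One of the topological factors named in this abstract is a relative contact order. A minimal sketch of the commonly used definition, the average sequence separation of contacting residue pairs divided by the chain length, is given here. The function name `relative_contact_order` and the `min_separation` cutoff are assumptions; the abstract does not define its long-range variant, so the cutoff is only one hypothetical way to restrict the sum to long-range contacts.

```python
def relative_contact_order(contacts, length, min_separation=1):
    """Average sequence separation of contacts, divided by the chain length.

    contacts: iterable of (i, j) residue index pairs that are in contact.
    length: number of residues in the chain.
    min_separation: hypothetical cutoff; pairs closer in sequence are ignored.
    """
    separations = [abs(i - j) for i, j in contacts if abs(i - j) >= min_separation]
    if not separations or length <= 0:
        return 0.0
    return sum(separations) / (len(separations) * length)
```

For example, two contacts with separations 4 and 8 in a 20-residue chain give (4 + 8) / (2 × 20) = 0.3; raising `min_separation` to 6 keeps only the second contact and gives 8 / 20 = 0.4.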
Affiliation(s)
- L Wu
- Department of Physics and National Laboratory of Solid State Microstructure, Nanjing University, Nanjing 210093, China
|
41
|
Xue B, Faraggi E, Zhou Y. Predicting residue-residue contact maps by a two-layer, integrated neural-network method. Proteins 2009; 76:176-83. [PMID: 19137600 DOI: 10.1002/prot.22329] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
A neural network method (SPINE-2D) is introduced to provide a sequence-based prediction of residue-residue contact maps. This method is built on the success of SPINE in predicting secondary structure, residue solvent accessibility, and backbone torsion angles via large-scale training with overfit protection and a two-layer neural network. SPINE-2D achieved a 10-fold cross-validated accuracy of 47% (±2%) for top L/5 predicted contacts between two residues with sequence separation of six or more and an accuracy of 24% (±1%) for nonlocal contacts with sequence separation of 24 residues or more. The accuracies of 23% and 26% for nonlocal contact predictions are achieved for two independent datasets of 500 proteins and 82 CASP 7 targets, respectively. A comparison with other methods indicates that SPINE-2D is among the most accurate methods for contact-map prediction. SPINE-2D is available as a webserver at http://sparks.informatics.iupui.edu.
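The "top L/5" accuracy quoted in this abstract is a standard way to score contact predictions: rank the candidate residue pairs by predicted score, keep the best L/5 (L being the sequence length), and report the fraction that are true contacts. A minimal sketch follows. The function name, the dictionary input format, and the rounding of L/5 by integer division are assumptions, not details taken from SPINE-2D.

```python
def top_l_over_k_accuracy(scores, native_contacts, length, k=5, min_separation=6):
    """Fraction of the top L/k scored residue pairs that are native contacts.

    scores: dict mapping (i, j) residue index pairs to predicted contact scores.
    native_contacts: set of (i, j) pairs with i < j that are in contact.
    length: sequence length L.
    """
    # Keep only pairs that satisfy the sequence-separation filter.
    eligible = [
        (score, (min(i, j), max(i, j)))
        for (i, j), score in scores.items()
        if abs(i - j) >= min_separation
    ]
    eligible.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, length // k)
    top_pairs = [pair for _, pair in eligible[:n_top]]
    if not top_pairs:
        return 0.0
    hits = sum(1 for pair in top_pairs if pair in native_contacts)
    return hits / len(top_pairs)
```

Setting `min_separation=24` would give the nonlocal-contact variant of the measure, under the same assumptions.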
Affiliation(s)
- Bin Xue
- Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
|
42
|
Abstract
Various topologies for representing 3D protein structures have been advanced for purposes ranging from prediction of folding rates to ab initio structure prediction. Examples include relative contact order, Delaunay tessellations, and backbone torsion angle distributions. Here, we introduce a new topology based on a novel means for operationalizing 3D proximities with respect to the underlying chain. The measure involves first interpreting a rank-based representation of the nearest neighbors of each residue as a permutation, then determining how perturbed this permutation is relative to an unfolded chain. We show that the resultant topology provides improved association with folding and unfolding rates determined for a set of two-state proteins under standardized conditions. Furthermore, unlike existing topologies, the proposed geometry exhibits fine scale structure with respect to sequence position along the chain, potentially providing insights into folding initiation and/or nucleation sites.
Affiliation(s)
- Mark R Segal
- Division of Biostatistics, University of California, San Francisco, California 94107, USA.
|
43
|
Tegge AN, Wang Z, Eickholt J, Cheng J. NNcon: improved protein contact map prediction using 2D-recursive neural networks. Nucleic Acids Res 2009; 37:W515-8. [PMID: 19420062 PMCID: PMC2703959 DOI: 10.1093/nar/gkp305] [Citation(s) in RCA: 110] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2009] [Revised: 04/13/2009] [Accepted: 04/16/2009] [Indexed: 11/13/2022] Open
Abstract
Protein contact map prediction is useful for protein folding rate prediction, model selection and 3D structure prediction. Here we describe NNcon, a fast and reliable contact map prediction server and software. NNcon was ranked among the most accurate residue contact predictors in the Eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8), 2008. Both NNcon server and software are available at http://casp.rnet.missouri.edu/nncon.html.
Affiliation(s)
- Jianlin Cheng
- Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65213, USA
|
44
|
Rajgaria R, McAllister SR, Floudas CA. Towards accurate residue-residue hydrophobic contact prediction for alpha helical proteins via integer linear optimization. Proteins 2009; 74:929-47. [PMID: 18767158 DOI: 10.1002/prot.22202] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
A new optimization-based method is presented to predict the hydrophobic residue contacts in alpha-helical proteins. The proposed approach uses a high resolution distance dependent force field to calculate the interaction energy between different residues of a protein. The formulation predicts the hydrophobic contacts by minimizing the sum of these contact energies. These residue contacts are highly useful in narrowing down the conformational space searched by protein structure prediction algorithms. The proposed algorithm also offers the algorithmic advantage of producing a rank ordered list of the best contact sets. This model was tested on four independent alpha-helical protein test sets and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) obtained using the presented method was approximately 66% for single domain proteins. The average true positive and false positive distances were also calculated for each protein test set and they are 8.87 and 14.67 A, respectively.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
|
45
|
Ngan SC, Hung LH, Liu T, Samudrala R. Scoring functions for de novo protein structure prediction revisited. Methods in Molecular Biology (Clifton, N.J.) 2008; 413:243-81. [PMID: 18075169 DOI: 10.1007/978-1-59745-574-9_10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 04/08/2023]
Abstract
De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures are generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. We discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction in this chapter.
Affiliation(s)
- Shing-Chung Ngan
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, USA
|
46
|
Shi Y, Zhou J, Arndt D, Wishart DS, Lin G. Protein contact order prediction from primary sequences. BMC Bioinformatics 2008; 9:255. [PMID: 18513429 PMCID: PMC2440764 DOI: 10.1186/1471-2105-9-255] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 08/20/2007] [Accepted: 05/30/2008] [Indexed: 11/11/2022] Open
Abstract
Background: Contact order is a topological descriptor that has been shown to be correlated with several interesting protein properties such as protein folding rates and protein transition state placements. Contact order has also been used to select viable protein folds from ab initio protein structure prediction programs. For proteins of known three-dimensional structure, contact order can be calculated directly. However, for proteins with unknown three-dimensional structure, there is no effective prediction method currently available. Results: In this paper, we propose several simple yet very effective methods to predict contact order from the amino acid sequence only. One set of methods is based on a weighted linear combination of predicted secondary structure content and amino acid composition. Depending on the number of components used in these equations, it is possible to achieve a correlation coefficient of 0.857–0.870 between the observed and predicted contact order. A second method, based on sequence similarity to known three-dimensional structures, is able to achieve a correlation coefficient of 0.977. We have also developed a much more robust implementation for calculating contact order directly from PDB coordinates that works for >99% of PDB files. All of these contact order predictors and calculators have been implemented as a web server (see Availability and requirements section for URL). Conclusion: Protein contact order can be effectively predicted from the primary sequence, in the absence of three-dimensional structure. Three factors, percentage of residues in alpha helices, percentage of residues in beta strands, and sequence length, appear to be strongly correlated with the absolute contact order.
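For readers unfamiliar with the descriptor, the direct calculation from coordinates mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one representative point per residue (for example a Cα atom) and a 6 Å cutoff, whereas published definitions typically count contacts between any pair of heavy atoms; the function name `contact_order` and its parameters are hypothetical.

```python
import math

def contact_order(coords, cutoff=6.0, min_sep=1):
    """Return (absolute, relative) contact order for a chain.

    coords  : list of (x, y, z) tuples, one representative point per residue
              (an assumption of this sketch; all-atom contacts are more usual).
    cutoff  : distance in angstroms below which two residues are in contact.
    min_sep : minimum sequence separation for a pair to be considered.
    """
    n = len(coords)
    total_sep = 0   # sum of sequence separations over all contacts
    n_contacts = 0  # number of contacting residue pairs
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_sep += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0, 0.0
    absolute = total_sep / n_contacts  # mean sequence separation of contacts
    relative = absolute / n            # normalized by chain length
    return absolute, relative
```

Absolute contact order here is the mean sequence separation of contacting pairs, and relative contact order divides that value by the chain length, which is why sequence length appears as a correlated factor in the conclusion above.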
Affiliation(s)
- Yi Shi
- Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada.
|
47
|
Alexander N, Bortolus M, Al-Mestarihi A, Mchaourab H, Meiler J. De novo high-resolution protein structure determination from sparse spin-labeling EPR data. Structure 2008; 16:181-95. [PMID: 18275810 PMCID: PMC2390841 DOI: 10.1016/j.str.2007.11.015] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Received: 07/16/2007] [Revised: 10/22/2007] [Accepted: 11/25/2007] [Indexed: 11/18/2022]
Abstract
As many key proteins evade crystallization and remain too large for nuclear magnetic resonance spectroscopy, electron paramagnetic resonance (EPR) spectroscopy combined with site-directed spin labeling offers an alternative approach for obtaining structural information. Such information must be translated into geometric restraints to be used in computer simulations. Here, distances between spin labels are converted into distance ranges between beta carbons by using a "motion-on-a-cone" model, and a linear-correlation model links spin-label accessibility to the number of neighboring residues. This approach was tested on T4-lysozyme and αA-crystallin with the de novo structure prediction algorithm Rosetta. The results demonstrate the feasibility of obtaining highly accurate, atomic-detail models from EPR data by yielding 1.0 Å and 2.6 Å full-atom models, respectively. Distance restraints between amino acids far apart in sequence but close in space are most valuable for structure determination. The approach can be extended to other experimental techniques such as fluorescence spectroscopy, substituted cysteine accessibility method, or mutational studies.
Affiliation(s)
- Nathan Alexander
- Department of Chemistry, Vanderbilt University, Nashville, TN 37212
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37212
- Marco Bortolus
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37212
- Hassane Mchaourab
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37212
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37212
- Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN 37212
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37212
|
48
|
|
49
|
Malmström L, Riffle M, Strauss CEM, Chivian D, Davis TN, Bonneau R, Baker D. Superfamily assignments for the yeast proteome through integration of structure prediction with the gene ontology. PLoS Biol 2007; 5:e76. [PMID: 17373854 PMCID: PMC1828141 DOI: 10.1371/journal.pbio.0050076] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 05/04/2006] [Accepted: 01/12/2007] [Indexed: 11/18/2022] Open
Abstract
Saccharomyces cerevisiae is one of the best-studied model organisms, yet the three-dimensional structure and molecular function of many yeast proteins remain unknown. Yeast proteins were parsed into 14,934 domains, and those lacking sequence similarity to proteins of known structure were folded using the Rosetta de novo structure prediction method on the World Community Grid. This structural data was integrated with process, component, and function annotations from the Saccharomyces Genome Database to assign yeast protein domains to SCOP superfamilies using a simple Bayesian approach. We have predicted the structure of 3,338 putative domains and assigned SCOP superfamily annotations to 581 of them. We have also assigned structural annotations to 7,094 predicted domains based on fold recognition and homology modeling methods. The domain predictions and structural information are available in an online database at http://rd.plos.org/10.1371_journal.pbio.0050076_01.
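The abstract does not spell out the "simple Bayesian approach", so the following is only a generic sketch of how a prior derived from annotations might be combined with a structure-based likelihood through Bayes' rule. The function name `superfamily_posterior`, the superfamily labels, and all numbers are hypothetical and are not taken from the paper.

```python
def superfamily_posterior(prior, likelihood):
    """Combine a prior and a likelihood over candidate superfamilies.

    prior      : dict mapping superfamily id -> P(superfamily | annotations)
    likelihood : dict mapping superfamily id -> P(structure match | superfamily)
    Returns a dict of normalized posterior probabilities.
    """
    # Unnormalized posterior: prior times likelihood for each candidate.
    unnormalized = {sf: p * likelihood.get(sf, 0.0) for sf, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("no candidate superfamily has non-zero support")
    return {sf: value / total for sf, value in unnormalized.items()}


# Hypothetical example: two candidate superfamilies with equal prior weight.
example = superfamily_posterior(
    prior={"sf_A": 0.5, "sf_B": 0.5},
    likelihood={"sf_A": 0.9, "sf_B": 0.1},
)
```

In this toy case the structure-based evidence alone decides the assignment; an informative annotation-derived prior would shift the posterior toward superfamilies consistent with the known process, component, and function terms.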
Affiliation(s)
- Lars Malmström
- Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- Michael Riffle
- Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- Charlie E. M. Strauss
- Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Dylan Chivian
- Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- Trisha N Davis
- Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- Richard Bonneau
- Department of Biology, Department of Computer Science, and Center for Comparative Functional Genomics, New York University, New York, New York, United States of America
- David Baker
- Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, United States of America
|
50
|
Abstract
Accurate and automated assessment of both geometrical errors and incompleteness of comparative protein structure models is necessary for an adequate use of the models. Here, we describe a composite score for discriminating between models with the correct and incorrect fold. To find an accurate composite score, we designed and applied a genetic algorithm method that searched for a most informative subset of 21 input model features as well as their optimized nonlinear transformation into the composite score. The 21 input features included various statistical potential scores, stereochemistry quality descriptors, sequence alignment scores, geometrical descriptors, and measures of protein packing. The optimized composite score was found to depend on (1) a statistical potential z-score for residue accessibilities and distances, (2) model compactness, and (3) percentage sequence identity of the alignment used to build the model. The accuracy of the composite score was compared with the accuracy of assessment by single and combined features as well as by other commonly used assessment methods. The testing set was representative of models produced by automated comparative modeling on a genomic scale. The composite score performed better than any other tested score in terms of the maximum correct classification rate (i.e., 3.3% false positives and 2.5% false negatives) as well as the sensitivity and specificity across the whole range of thresholds. The composite score was implemented in our program MODELLER-8 and was used to assess models in the MODBASE database that contains comparative models for domains in approximately 1.3 million protein sequences.
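The evaluation described in this abstract (correct classification rate, false positives, and false negatives across a range of thresholds) can be illustrated with a small sketch. It assumes that a higher score means "more likely correct fold" and that rates are computed per class; the abstract does not state either convention, and the function names are hypothetical.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (false_positive_rate, false_negative_rate) at one threshold.

    scores : list of composite scores, one per model
    labels : list of bools, True if the model has the correct fold
    A model is called "correct" when its score is >= threshold.
    """
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    n_neg = sum(1 for y in labels if not y)
    n_pos = sum(1 for y in labels if y)
    fpr = false_pos / n_neg if n_neg else 0.0
    fnr = false_neg / n_pos if n_pos else 0.0
    return fpr, fnr


def best_threshold(scores, labels):
    """Scan observed scores as thresholds and return the one that maximizes
    the fraction of correctly classified models, as
    (threshold, correct_rate, fpr, fnr)."""
    best = None
    for t in sorted(set(scores)):
        correct = sum(1 for s, y in zip(scores, labels) if (s >= t) == y)
        correct_rate = correct / len(labels)
        if best is None or correct_rate > best[1]:
            fpr, fnr = rates_at_threshold(scores, labels, t)
            best = (t, correct_rate, fpr, fnr)
    return best
```

Sweeping the threshold in this way yields both the single best operating point and the sensitivity and specificity trade-off over the whole range.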
Affiliation(s)
- Francisco Melo
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile.
|