1
Porter LL. Fluid protein fold space and its implications. Bioessays 2023; 45:e2300057. [PMID: 37431685 PMCID: PMC10529699 DOI: 10.1002/bies.202300057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/21/2023] [Accepted: 06/23/2023] [Indexed: 07/12/2023]
Abstract
Fold-switching proteins, which remodel their secondary and tertiary structures in response to cellular stimuli, suggest a new view of protein fold space. For decades, experimental evidence has indicated that protein fold space is discrete: dissimilar folds are encoded by dissimilar amino acid sequences. Challenging this assumption, fold-switching proteins interconnect discrete groups of dissimilar protein folds, making protein fold space fluid. Three recent observations support the concept of fluid fold space: (1) some amino acid sequences interconvert between folds with distinct secondary structures, (2) some naturally occurring sequences have switched folds by stepwise mutation, and (3) fold switching is evolutionarily selected and likely confers advantage. These observations indicate that minor amino acid sequence modifications can transform protein structure and function. Consequently, proteomic structural and functional diversity may be expanded by alternative splicing, small nucleotide polymorphisms, post-translational modifications, and modified translation rates.
Affiliation(s)
- Lauren L. Porter
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
2
Schaeffer RD, Zhang J, Kinch LN, Pei J, Cong Q, Grishin NV. Classification of domains in predicted structures of the human proteome. Proc Natl Acad Sci U S A 2023; 120:e2214069120. [PMID: 36917664 PMCID: PMC10041065 DOI: 10.1073/pnas.2214069120] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 08/16/2022] [Accepted: 02/06/2023] [Indexed: 03/16/2023] [Open access]
Abstract
Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).
Affiliation(s)
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jimin Pei
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390
3
Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023; 98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]
Abstract
Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus, facilitate improved protein structure design and prediction.
Affiliation(s)
- Janan Sykes
  - School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Barbara R Holland
  - School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Michael A Charleston
  - School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
4
Abstract
Does reductionism, in the era of machine learning and now interpretable AI, facilitate or hinder scientific insight? The protein ribbon diagram, as a means of visual reductionism, is a case in point.
Affiliation(s)
- Philip E. Bourne
  - School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- Eli J. Draizen
  - School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- Cameron Mura
  - School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
5
Gullotto D. Fine tuned exploration of evolutionary relationships within the protein universe. Stat Appl Genet Mol Biol 2021; 20:17-36. [PMID: 33594839 DOI: 10.1515/sagmb-2019-0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2019] [Accepted: 01/12/2021] [Indexed: 11/15/2022]
Abstract
In the regime of domain classifications, the protein universe unveils a discrete set of folds connected by hierarchical relationships. Instead, at sub-domain-size resolution and because of physical constraints not necessarily requiring evolution to shape polypeptide chains, networks of protein motifs depict a continuous view that lies beyond the extent of hierarchical classification schemes. A number of studies, however, suggest that universal sub-sequences could be the descendants of peptides that emerged in an ancient pre-biotic world. Should this be the case, evolutionary signals retained by structurally conserved motifs, along with hierarchical features of ancient domains, could sew relationships among folds that diverged beyond the point where homology is discernible. In view of the aforementioned, this paper provides a rationale where a network with hierarchical and continuous levels of the protein space, together with sequence profiles that probe the extent of sequence similarity and contacting residues that capture the transition from pre-biotic to domain world, has been used to explore relationships between ancient folds. Statistics of detected signals have been reported. As a result, an example of an emergent sub-network that makes sense from an evolutionary perspective, where conserved signals retrieved from the assessed protein space have been co-opted, has been discussed.
Affiliation(s)
- Danilo Gullotto
  - Advanced Computational Biostructural Research Collaboratory, I-95019, Zafferana Etnea, Italy
6
Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021; 68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]
Abstract
Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.
7
Mura C, Veretnik S, Bourne PE. The Urfold: Structural similarity just above the superfold level? Protein Sci 2019; 28:2119-2126. [PMID: 31599042 PMCID: PMC6863707 DOI: 10.1002/pro.3742] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/06/2019] [Revised: 09/30/2019] [Accepted: 10/01/2019] [Indexed: 01/16/2023]
Abstract
We suspect that there is a level of granularity of protein structure intermediate between the classical levels of "architecture" and "topology," as reflected in such phenomena as extensive three-dimensional structural similarity above the level of (super)folds. Here, we examine this notion of architectural identity despite topological variability, starting with a concept that we call the "Urfold." We believe that this model could offer a new conceptual approach for protein structural analysis and classification: indeed, the Urfold concept may help reconcile various phenomena that have been frequently recognized or debated for years, such as the precise meaning of "significant" structural overlap and the degree of continuity of fold space. More broadly, the role of structural similarity in sequence↔structure↔function evolution has been studied via many models over the years; by addressing a conceptual gap that we believe exists between the architecture and topology levels of structural classification schemes, the Urfold eventually may help synthesize these models into a generalized, consistent framework. Here, we begin by qualitatively introducing the concept.
Affiliation(s)
- Cameron Mura
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia
- Stella Veretnik
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia
- Philip E Bourne
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia
  - School of Data Science, University of Virginia, Charlottesville, Virginia
8
Verma R, Pandit SB. Unraveling the structural landscape of intra-chain domain interfaces: Implication in the evolution of domain-domain interactions. PLoS One 2019; 14:e0220336. [PMID: 31374091 PMCID: PMC6677297 DOI: 10.1371/journal.pone.0220336] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/12/2019] [Accepted: 07/12/2019] [Indexed: 12/22/2022] [Open access]
Abstract
Intra-chain domain interactions are known to play a significant role in the function and stability of multidomain proteins. These interactions are mediated through physical interactions at domain-domain interfaces (DDIs). To understand the evolution of interfaces, we have investigated similarities among DDIs. Even though interfaces of protein-protein interactions (PPIs) have been previously studied by structurally aligning interfaces, similar analyses have not yet been performed on DDIs of either multidomain proteins or PPIs. To study the structural landscape of DDIs, we have used iAlign to structurally align intra-chain domain interfaces. The interface alignment of spatially constrained domains (due to inter-domain linkers) showed that ~88% of these could identify a structurally matching interface having similar C-alpha geometry and contact pattern, even though the aligned domain pairs are not structurally related. Moreover, the mean interface similarity score (IS-score) is 0.307, which is higher than the average random IS-score (0.207), suggesting that domain interfaces are not random. The structural space of DDIs is highly connected, as ~84% of all possible directed edges among interfaces are found to have a path length of at most 8 when the IS-score threshold is 0.26. At this threshold, ~83% of interfaces form the largest strongly connected component. This suggests that the structural space of intra-chain domain interfaces is degenerate and highly connected, as has been found for PPI interfaces. Interestingly, searching for structural neighbors of inter-chain interfaces among intra-chain interfaces showed that ~86% could find a statistically significant match to an intra-chain interface, with a mean IS-score of 0.311. This implies that domain interfaces are degenerate whether formed within a protein or between proteins. The interface degeneracy is most likely due to the limited number of possible ways of packing secondary structures. In principle, interface similarities can be exploited to accurately model domain interfaces in structure prediction of multidomain proteins.
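The connectivity figures quoted in this abstract (interfaces linked when their IS-score passes a threshold, then the share of interfaces that fall in the largest strongly connected component) can be illustrated with a minimal sketch. This is not the authors' code and it does not call iAlign; the `scores` dictionary, the function name `largest_scc_fraction`, and the toy values are assumptions made for illustration, and Kosaraju's two-pass search is used here simply as one standard way to find strongly connected components.

```python
from collections import defaultdict


def largest_scc_fraction(scores, threshold=0.26):
    """Fraction of nodes in the largest strongly connected component.

    scores: dict mapping (query, target) -> similarity score (hypothetical
    stand-in for pairwise IS-scores). A directed edge query -> target is
    kept when the score is at least `threshold`.
    """
    nodes = set()
    fwd, rev = defaultdict(list), defaultdict(list)
    for (a, b), s in scores.items():
        nodes.update((a, b))
        if a != b and s >= threshold:
            fwd[a].append(b)
            rev[b].append(a)

    # Pass 1: iterative depth-first search on the forward graph,
    # recording nodes in order of completion.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in decreasing completion order;
    # each search tree is one strongly connected component.
    seen.clear()
    best = 0
    for start in reversed(order):
        if start in seen:
            continue
        seen.add(start)
        size, stack = 0, [start]
        while stack:
            node = stack.pop()
            size += 1
            for nxt in rev[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best / len(nodes) if nodes else 0.0


# Toy scores between four made-up interfaces.
toy = {
    ("A", "B"): 0.30, ("B", "A"): 0.30,
    ("B", "C"): 0.30, ("C", "B"): 0.10,
    ("C", "D"): 0.50, ("D", "C"): 0.40,
}
print(largest_scc_fraction(toy, threshold=0.26))
```

Lowering the threshold in this toy example admits the weak C to B edge and merges the two components, which mirrors why the reported connectivity depends on the chosen IS-score cutoff.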
Affiliation(s)
- Rivi Verma
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
- Shashi Bhushan Pandit
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
9
Herman JL. Enhancing Statistical Multiple Sequence Alignment and Tree Inference Using Structural Information. Methods Mol Biol 2019; 1851:183-214. [PMID: 30298398 DOI: 10.1007/978-1-4939-8736-8_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
For highly divergent sequences, there is often insufficient information to reliably construct alignments and phylogenetic trees. Since protein structure may be strongly conserved despite large divergences in sequence, structural information can be used to help identify homology in such cases. While there exist well-studied models of sequence evolution, structurally informed alignment methods have typically made use of geometric measures of deviation that do not take into account the underlying mutational processes. In order to integrate structural information into sequence-based evolutionary models, we recently developed a stochastic model of structural evolution on a phylogenetic tree and implemented this as the StructAlign plugin for the StatAlign statistical alignment package. In this chapter, we will outline the types of analyses that can be carried out using StructAlign, illustrating how the inclusion of structural information can be used to inform joint estimation of alignments and trees. StructAlign can also be used to infer branch-specific rates of structural evolution, and analysis of an example globin dataset highlights strong variation in the inferred rate across the tree. While structure is more highly conserved within clades, the rate of structural divergence as a function of sequence variation is larger between functionally divergent proteins. Allowing for the rate of structural divergence to vary over the tree results in an improved fit to the empirically observed pairwise RMSD values.
Affiliation(s)
- Joseph L Herman
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
10
Kumirov VK, Dykstra EM, Hall BM, Anderson WJ, Szyszka TN, Cordes MHJ. Multistep mutational transformation of a protein fold through structural intermediates. Protein Sci 2018; 27:1767-1779. [PMID: 30051937 DOI: 10.1002/pro.3488] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/27/2018] [Revised: 07/24/2018] [Accepted: 07/25/2018] [Indexed: 12/24/2022]
Abstract
New protein folds may evolve from existing folds through metamorphic evolution involving a dramatic switch in structure. To mimic pathways by which amino acid sequence changes could induce a change in fold, we designed two folded hybrids of Xfaso 1 and Pfl 6, a pair of homologous Cro protein sequences with ~40% identity but different folds (all-α vs. α + β, respectively). Each hybrid, XPH1 or XPH2, is 85% identical in sequence to its parent, Xfaso 1 or Pfl 6, respectively; 55% identical to its noncognate parent; and ~70% identical to the other hybrid. XPH1 and XPH2 also feature a designed hybrid chameleon sequence corresponding to the C-terminal region, which switched from α-helical to β-sheet structure during Cro evolution. We report solution nuclear magnetic resonance (NMR) structures of XPH1 and XPH2 at 0.3 Å and 0.5 Å backbone root mean square deviation (RMSD), respectively. XPH1 retains a global fold generally similar to Xfaso 1, and XPH2 retains a fold similar to Pfl 6, as measured by TM-align scores (~0.7), DALI Z-scores (7-9), and backbone RMSD (2-3 Å RMSD for the most ordered regions). However, these scores also indicate significant deviations in structure. Most notably, XPH1 and XPH2 have different, and intermediate, secondary structure content relative to Xfaso 1 and Pfl 6. The multistep progression in sequence, from Xfaso 1 to XPH1 to XPH2 to Pfl 6, thus involves both abrupt and gradual changes in folding pattern. The plasticity of some protein folds may allow for "polymetamorphic" evolution through intermediate structures.
Affiliation(s)
- Vlad K Kumirov
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, 85721-0088
- Emily M Dykstra
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, 85721-0088
- Branwen M Hall
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, 85721-0088
- William J Anderson
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, 85721-0088
- Taylor N Szyszka
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, 85721-0088
- Matthew H J Cordes
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, 85721-0088
11
Jaña GA, Mendoza F, Osorio MI, Alderete JB, Fernandes PA, Ramos MJ, Jiménez VA. A QM/MM approach on the structural and stereoelectronic factors governing glycosylation by GTF-SI from Streptococcus mutans. Org Biomol Chem 2018; 16:2438-2447. [DOI: 10.1039/c8ob00284c] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
This manuscript contains novel insights into the reaction mechanism catalyzed by GTF-SI. Structural and electronic features of the system are revealed, such as the strong hydrogen bond depicted above.
Affiliation(s)
- Gonzalo A. Jaña
  - Departamento de Ciencias Químicas, Facultad de Ciencias Exactas, Universidad Andres Bello, Sede Concepción, Talcahuano
- Fernanda Mendoza
  - Departamento de Ciencias Químicas, Facultad de Ciencias Exactas, Universidad Andres Bello, Sede Concepción, Talcahuano
- Manuel I. Osorio
  - Departamento de Ciencias Químicas, Facultad de Ciencias Exactas, Universidad Andres Bello, Sede Concepción, Talcahuano
- Joel B. Alderete
  - Departamento de Química Orgánica, Facultad de Ciencias Químicas, Universidad de Concepción, Concepción, Chile
- Pedro A. Fernandes
  - UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto
- Maria J. Ramos
  - UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto
- Verónica A. Jiménez
  - Departamento de Ciencias Químicas, Facultad de Ciencias Exactas, Universidad Andres Bello, Sede Concepción, Talcahuano
12
Dybas JM, Fiser A. Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins 2016; 84:1859-1874. [PMID: 27671894 PMCID: PMC5118133 DOI: 10.1002/prot.25169] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/31/2016] [Revised: 08/17/2016] [Accepted: 08/25/2016] [Indexed: 11/09/2022]
Abstract
Structure conservation, functional similarities, and homologous relationships that exist across diverse protein topologies suggest that some regions of the protein fold universe are continuous. However, the current structure classification systems are based on hierarchical organizations, which cannot accommodate structural relationships that span fold definitions. Here, we describe a novel, super-secondary-structure motif-based, topology-independent structure comparison method (SmotifCOMP) that is able to quantitatively identify structural relationships between disparate topologies. The basis of SmotifCOMP is a systematically defined super-secondary-structure motif library whose representative geometries are shown to be saturated in the Protein Data Bank and exhibit a unique distribution within the known folds. SmotifCOMP offers a robust and quantitative technique to compare domains that adopt different topologies since the method does not rely on a global superposition. SmotifCOMP is used to perform an exhaustive comparison of the known folds and the identified relationships are used to produce a nonhierarchical representation of the fold space that reflects the notion of a continuous and connected fold universe. The current work offers insight into previously hypothesized evolutionary relationships between disparate folds and provides a resource for exploring novel ones.
Affiliation(s)
- Joseph M. Dybas
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
  - Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Andras Fiser
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
  - Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
13
Margelevičius M. Bayesian nonparametrics in protein remote homology search. Bioinformatics 2016; 32:2744-52. [DOI: 10.1093/bioinformatics/btw213] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/11/2015] [Accepted: 04/14/2016] [Indexed: 11/14/2022] [Open access]
14
Das S, Dawson NL, Orengo CA. Diversity in protein domain superfamilies. Curr Opin Genet Dev 2015; 35:40-9. [PMID: 26451979 PMCID: PMC4686048 DOI: 10.1016/j.gde.2015.09.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 07/09/2015] [Revised: 09/07/2015] [Accepted: 09/08/2015] [Indexed: 01/25/2023]
Abstract
Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfaces, which expands the repertoire of molecules they interact with and the actions performed on them. To what extent can we identify a core function for these superfamilies which would allow us to develop a ‘domain grammar of function’ whereby a protein's biological role can be proposed from its constituent domains? Clearly the first step is to understand the extent to which these components vary and how changes in their molecular make-up modify function.
Affiliation(s)
- Sayoni Das
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
- Natalie L Dawson
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
15
Machine Learnable Fold Space Representation based on Residue Cluster Classes. Comput Biol Chem 2015; 59 Pt A:1-7. [PMID: 26366526 DOI: 10.1016/j.compbiolchem.2015.07.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/12/2014] [Revised: 07/17/2015] [Accepted: 07/25/2015] [Indexed: 11/21/2022]
Abstract
MOTIVATION: Protein fold space is a conceptual framework where all possible protein folds exist and ideas about protein structure, function and evolution may be analyzed. Classification of protein folds in this space is commonly achieved by using similarity indexes and/or machine learning approaches, each with different limitations. RESULTS: We propose a method for constructing a compact vector space model of protein fold space by representing each protein structure by its residues' local contacts. We developed an efficient method to statistically test for the separability of points in a space and showed that our protein fold space representation is learnable by any machine-learning algorithm. AVAILABILITY: An API is freely available at https://code.google.com/p/pyrcc/.
16
Edwards H, Deane CM. Structural Bridges through Fold Space. PLoS Comput Biol 2015; 11:e1004466. [PMID: 26372166 PMCID: PMC4570669 DOI: 10.1371/journal.pcbi.1004466] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/20/2015] [Accepted: 07/12/2015] [Indexed: 12/05/2022] [Open access]
Abstract
Several protein structure classification schemes exist that partition the protein universe into structural units called folds. Yet these schemes do not discuss how these units sit relative to each other in a global structure space. In this paper we construct networks that describe such global relationships between folds in the form of structural bridges. We generate these networks using four different structural alignment methods across multiple score thresholds. The networks constructed using the different methods remain a similar distance apart regardless of the probability threshold defining a structural bridge. This suggests that at least some structural bridges are method specific and that any attempt to build a picture of structural space should not be reliant on a single structural superposition method. Despite these differences all representations agree on an organisation of fold space into five principal community structures: all-α, all-β sandwiches, all-β barrels, α/β and α + β. We project estimated fold ages onto the networks and find that not only are the pairings of unconnected folds associated with higher age differences than bridged folds, but this difference increases with the number of networks displaying an edge. We also examine different centrality measures for folds within the networks and how these relate to fold age. While these measures interpret the central core of fold space in varied ways they all identify the disposition of ancestral folds to fall within this core and that of the more recently evolved structures to provide the peripheral landscape. These findings suggest that evolutionary information is encoded along these structural bridges. Finally, we identify four highly central pivotal folds representing dominant topological features which act as key attractors within our landscapes.
Folds are considered to be the structural units which make up the protein universe. Structural classification schemes focus on the assignment and organisation of protein domains into folds. However, they do not suggest how different folds might relate to one another in a global way. We introduce the concept of bridges through fold space: significant similarities between these units. We consider four alignment methods and a dynamic approach to placing these bridges. A greater consensus between these methods cannot be achieved by simply increasing the stringency with which edges are assigned. Instead, we emphasise the importance of considering consensus maps and only report results where there is agreement across all networks. It is possible that a study of the bridges may reveal evolutionary relationships. Based on a phylogenetic analysis of structures, we find that bridges consistently fall between folds which evolved at similar times. Moreover, the landscapes all consist of a core of older folds, with younger structures more often seen at the periphery. Finally we identify four pivotal folds in the landscapes. They contain topological motifs which unite disparate regions of fold space.
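The idea of a consensus map, where a bridge is reported only if every alignment method's network contains it, can be sketched in a few lines. This is a hedged illustration and not the authors' pipeline: the fold labels, the example edge lists, and the function names `edge_support` and `consensus_bridges` are all hypothetical, and each network is assumed to be given as a list of undirected fold pairs.

```python
from collections import Counter


def edge_support(networks):
    """Count how many networks contain each undirected edge.

    networks: list of edge lists, one per alignment method (assumed input
    format). Edges are stored as frozensets so (a, b) and (b, a) match.
    """
    counts = Counter()
    for edges in networks:
        # Deduplicate within a network so each method votes at most once.
        counts.update({frozenset(e) for e in edges if e[0] != e[1]})
    return counts


def consensus_bridges(networks):
    """Edges present in every network (the strict consensus)."""
    support = edge_support(networks)
    return {edge for edge, n in support.items() if n == len(networks)}


# Made-up fold labels and edges from three hypothetical methods.
method_networks = [
    [("fold_a", "fold_b"), ("fold_b", "fold_c")],
    [("fold_b", "fold_a"), ("fold_c", "fold_d")],
    [("fold_a", "fold_b")],
]
print(consensus_bridges(method_networks))
```

Keeping the per-edge support count, rather than only the strict intersection, also allows the kind of analysis mentioned in the abstract where a property is compared against the number of networks displaying an edge.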
Affiliation(s)
- Hannah Edwards
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Charlotte M. Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
|
17
|
Rackovsky S. Nonlinearities in protein space limit the utility of informatics in protein biophysics. Proteins 2015; 83:1923-8. [PMID: 26315852 DOI: 10.1002/prot.24916] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/26/2015] [Revised: 08/12/2015] [Accepted: 08/20/2015] [Indexed: 11/08/2022]
Abstract
We examine the utility of informatic-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.
Affiliation(s)
- S Rackovsky
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York, 14853; Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, 10029
|
18
|
Holzgräfe C, Wallin S. Smooth functional transition along a mutational pathway with an abrupt protein fold switch. Biophys J 2014; 107:1217-1225. [PMID: 25185557 DOI: 10.1016/j.bpj.2014.07.020] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/15/2014] [Revised: 06/25/2014] [Accepted: 07/01/2014] [Indexed: 10/24/2022] Open
Abstract
Recent protein design experiments have demonstrated that proteins can migrate between folds through the accumulation of substitution mutations without visiting disordered or nonfunctional points in sequence space. To explore the biophysical mechanism underlying such transitions we use a three-letter continuous protein model with seven atoms per amino acid to provide realistic sequence-structure and sequence-function mappings through explicit simulation of the folding and interaction of model sequences. We start from two 16-amino-acid sequences folding into an α-helix and a β-hairpin, respectively, each of which has a preferred binding partner with 35 amino acids. We identify a mutational pathway between the two folds, which features a sharp fold switch. By contrast, we find that the transition in function is smooth. Moreover, the switch in preferred binding partner does not coincide with the fold switch. Discovery of new folds in evolution might therefore be facilitated by following fitness slopes in sequence space underpinned by binding-induced conformational switching.
Affiliation(s)
- Christian Holzgräfe
- Department of Astronomy and Theoretical Physics, Computational Biology and Biological Physics, Lund University, Lund, Sweden
- Stefan Wallin
- Department of Astronomy and Theoretical Physics, Computational Biology and Biological Physics, Lund University, Lund, Sweden.
|
19
|
De novo protein design: how do we expand into the universe of possible protein structures? Curr Opin Struct Biol 2015; 33:16-26. [DOI: 10.1016/j.sbi.2015.05.009] [Citation(s) in RCA: 128] [Impact Index Per Article: 14.2] [Received: 03/25/2015] [Revised: 05/15/2015] [Accepted: 05/25/2015] [Indexed: 01/08/2023]
|
20
|
Lhota J, Hauptman R, Hart T, Ng C, Xie L. A new method to improve network topological similarity search: applied to fold recognition. Bioinformatics 2015; 31:2106-14. [PMID: 25717198 DOI: 10.1093/bioinformatics/btv125] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/08/2014] [Accepted: 02/21/2015] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large scale similarity searches in bioinformatics. RESULTS: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. AVAILABILITY AND IMPLEMENTATION: Source code freely available upon request. CONTACT: lxie@iscb.org.
Affiliation(s)
- John Lhota
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Ruth Hauptman
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Thomas Hart
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Clara Ng
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Lei Xie
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
|
21
|
Minami S, Sawada K, Chikenji G. How a spatial arrangement of secondary structure elements is dispersed in the universe of protein folds. PLoS One 2014; 9:e107959. [PMID: 25243952 PMCID: PMC4171485 DOI: 10.1371/journal.pone.0107959] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 08/12/2014] [Accepted: 08/18/2014] [Indexed: 11/18/2022] Open
Abstract
It has been known that topologically different proteins of the same class sometimes share the same spatial arrangement of secondary structure elements (SSEs). However, the frequency by which topologically different structures share the same spatial arrangement of SSEs is unclear. It is important to estimate this frequency because it provides both a deeper understanding of the geometry of protein folds and a valuable suggestion for predicting protein structures with novel folds. Here we clarified the frequency with which protein folds share the same SSE packing arrangement with other folds, the types of spatial arrangement of SSEs that are frequently observed across different folds, and the diversity of protein folds that share the same spatial arrangement of SSEs with a given fold, using a protein structure alignment program MICAN, which we have been developing. By performing comprehensive structural comparison of SCOP fold representatives, we found that approximately 80% of protein folds share the same spatial arrangement of SSEs with other folds. We also observed that many protein pairs that share the same spatial arrangement of SSEs belong to the different classes, often with an opposing N- to C-terminal direction of the polypeptide chain. The most frequently observed spatial arrangement of SSEs was the 2-layer α/β packing arrangement and it was dispersed among as many as 27% of SCOP fold representatives. These results suggest that the same spatial arrangements of SSEs are adopted by a wide variety of different folds and that the spatial arrangement of SSEs is highly robust against the N- to C-terminal direction of the polypeptide chain.
Affiliation(s)
- Shintaro Minami
- Department of Complex Systems Science, Nagoya University, Nagoya, Aichi, Japan
- Kengo Sawada
- Department of Applied Physics, Nagoya University, Nagoya, Aichi, Japan
- George Chikenji
- Department of Computational Science and Engineering, Nagoya University, Nagoya, Aichi, Japan
|
22
|
Abstract
To explore protein space from a global perspective, we consider 9,710 SCOP (Structural Classification of Proteins) domains with up to 70% sequence identity and present all similarities among them as networks: In the "domain network," nodes represent domains, and edges connect domains that share "motifs," i.e., significantly sized segments of similar sequence and structure. We explore the dependence of the network on the thresholds that define the evolutionary relatedness of the domains. At excessively strict thresholds the network falls apart completely; for very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network constitutes two regions that can be described as "continuous" versus "discrete." The continuous region comprises a large connected component, dominated by domains with alternating alpha and beta elements, and the discrete region includes the rest of the domains in isolated islands, each generally corresponding to a fold. We also construct the "motif network," in which nodes represent recurring motifs, and edges connect motifs that appear in the same domain. This network also features a large and highly connected component of motifs that originate from domains with alternating alpha/beta elements (and some all-alpha domains), and smaller isolated islands. Indeed, the motif network suggests that nature reuses such motifs extensively. The networks suggest evolutionary paths between domains and give hints about protein evolution and the underlying biophysics. They provide natural means of organizing protein space, and could be useful for the development of strategies for protein search and design.
|
23
|
Tóth-Petróczy A, Tawfik DS. The robustness and innovability of protein folds. Curr Opin Struct Biol 2014; 26:131-8. [PMID: 25038399 DOI: 10.1016/j.sbi.2014.06.007] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Received: 11/26/2013] [Revised: 06/26/2014] [Accepted: 06/26/2014] [Indexed: 11/30/2022]
Abstract
Assignment of protein folds to functions indicates that >60% of folds carry out one or two enzymatic functions, while few folds, for example, the TIM-barrel and Rossmann folds, exhibit hundreds. Are there structural features that make a fold amenable to functional innovation (innovability)? Do these features relate to robustness--the ability to readily accumulate sequence changes? We discuss several hypotheses regarding the relationship between the architecture of a protein and its evolutionary potential. We describe how, in a seemingly paradoxical manner, opposite properties, such as high stability and rigidity versus conformational plasticity and structural order versus disorder, promote robustness and/or innovability. We hypothesize that polarity--differentiation and low connectivity between a protein's scaffold and its active-site--is a key prerequisite for innovability.
Affiliation(s)
- Agnes Tóth-Petróczy
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Dan S Tawfik
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel.
|
24
|
Myers-Turnbull D, Bliven SE, Rose PW, Aziz ZK, Youkharibache P, Bourne PE, Prlić A. Systematic detection of internal symmetry in proteins using CE-Symm. J Mol Biol 2014; 426:2255-68. [PMID: 24681267 DOI: 10.1016/j.jmb.2014.03.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 10/30/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 11/26/2022]
Abstract
Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence have been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. This process maintains structural similarity and is further supported by this study. To further investigate the question of how internal symmetry evolved, how symmetry and function are related, and the overall frequency of internal symmetry, we developed an algorithm, CE-Symm, to detect pseudo-symmetry within the tertiary structure of protein chains. Using a large manually curated benchmark of 1007 protein domains, we show that CE-Symm performs significantly better than previous approaches. We use CE-Symm to build a census of symmetry among domain superfamilies in SCOP and note that 18% of all superfamilies are pseudo-symmetric. Our results indicate that more domains are pseudo-symmetric than previously estimated. We establish a number of recurring types of symmetry-function relationships and describe several characteristic cases in detail. With the use of the Enzyme Commission classification, symmetry was found to be enriched in some enzyme classes but depleted in others. CE-Symm thus provides a methodology for a more complete and detailed study of the role of symmetry in tertiary protein structure [availability: CE-Symm can be run from the Web at http://source.rcsb.org/jfatcatserver/symmetry.jsp. Source code and software binaries are also available under the GNU Lesser General Public License (version 2.1) at https://github.com/rcsb/symmetry. An interactive census of domains identified as symmetric by CE-Symm is available from http://source.rcsb.org/jfatcatserver/scopResults.jsp].
Affiliation(s)
- Douglas Myers-Turnbull
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Spencer E Bliven
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Peter W Rose
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Zaid K Aziz
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093, USA
- Philip E Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA.
- Andreas Prlić
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA.
|
25
|
Gao M, Skolnick J. A comprehensive survey of small-molecule binding pockets in proteins. PLoS Comput Biol 2013; 9:e1003302. [PMID: 24204237 PMCID: PMC3812058 DOI: 10.1371/journal.pcbi.1003302] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Received: 05/10/2013] [Accepted: 09/11/2013] [Indexed: 11/19/2022] Open
Abstract
Many biological activities originate from interactions between small-molecule ligands and their protein targets. A detailed structural and physico-chemical characterization of these interactions could significantly deepen our understanding of protein function and facilitate drug design. Here, we present a large-scale study on a non-redundant set of about 20,000 known ligand-binding sites, or pockets, of proteins. We find that the structural space of protein pockets is crowded, likely complete, and may be represented by about 1,000 pocket shapes. Correspondingly, the growth rate of novel pockets deposited in the Protein Data Bank has been decreasing steadily over the recent years. Moreover, many protein pockets are promiscuous and interact with ligands of diverse scaffolds. Conversely, many ligands are promiscuous and interact with structurally different pockets. Through a physico-chemical and structural analysis, we provide insights into understanding both pocket promiscuity and ligand promiscuity. Finally, we discuss the implications of our study for the prediction of protein-ligand interactions based on pocket comparison.
The life of a living cell relies on many distinct proteins to carry out their functions. Most of these functions are rooted in interactions between the proteins and metabolites, small-molecules essential for life. By targeting specific proteins relevant to a disease, drug molecules may provide a cure. A deep understanding of the nature of interactions between proteins and small-molecules (or ligands) through analyzing their structures may help predict protein function or improve drug design. In this contribution, we present a large-scale analysis of a non-redundant set of over 20,000 experimental protein-ligand complex structures available in the current Protein Data Bank. We seek answers to several fundamental questions: How many representative pockets are there that serve as ligand-binding sites in proteins? To what extent can we infer a similar protein-ligand interaction by matching the structures of protein pockets? How different are the ligands found in the same pocket? For a promiscuous protein pocket, how does a pocket maintain favorable interactions with very different ligands? Conversely, how different are those pockets that interact with the same ligand? We find the structural space of protein pocket is small and that both protein promiscuity and ligand promiscuity are very common in Nature.
Affiliation(s)
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
|
26
|
Hadzipasic O, Wrabl JO, Hilser VJ. A horizontal alignment tool for numerical trend discovery in sequence data: application to protein hydropathy. PLoS Comput Biol 2013; 9:e1003247. [PMID: 24130469 PMCID: PMC3794901 DOI: 10.1371/journal.pcbi.1003247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2013] [Accepted: 07/10/2013] [Indexed: 11/19/2022] Open
Abstract
An algorithm is presented that returns the optimal pairwise gapped alignment of two sets of signed numerical sequence values. One distinguishing feature of this algorithm is a flexible comparison engine (based on both relative shape and absolute similarity measures) that does not rely on explicit gap penalties. Additionally, an empirical probability model is developed to estimate the significance of the returned alignment with respect to randomized data. The algorithm's utility for biological hypothesis formulation is demonstrated with test cases including database search and pairwise alignment of protein hydropathy. However, the algorithm and probability model could possibly be extended to accommodate other diverse types of protein or nucleic acid data, including positional thermodynamic stability and mRNA translation efficiency. The algorithm requires only numerical values as input and will readily compare data other than protein hydropathy. The tool is therefore expected to complement, rather than replace, existing sequence and structure based tools and may inform medical discovery, as exemplified by proposed similarity between a chlamydial ORFan protein and bacterial colicin pore-forming domain. The source code, documentation, and a basic web-server application are available.
Affiliation(s)
- Omar Hadzipasic
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- James O. Wrabl
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Vincent J. Hilser
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, United States of America
|
27
|
Ingles-Prieto A, Ibarra-Molero B, Delgado-Delgado A, Perez-Jimenez R, Fernandez JM, Gaucher EA, Sanchez-Ruiz JM, Gavira JA. Conservation of protein structure over four billion years. Structure 2013; 21:1690-7. [PMID: 23932589 PMCID: PMC3774310 DOI: 10.1016/j.str.2013.06.020] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 04/10/2013] [Revised: 06/07/2013] [Accepted: 06/26/2013] [Indexed: 01/07/2023]
Abstract
Little is known about the evolution of protein structures and the degree of protein structure conservation over planetary time scales. Here, we report the X-ray crystal structures of seven laboratory resurrections of Precambrian thioredoxins dating up to approximately four billion years ago. Despite considerable sequence differences compared with extant enzymes, the ancestral proteins display the canonical thioredoxin fold, whereas only small structural changes have occurred over four billion years. This remarkable degree of structure conservation since a time near the last common ancestor of life supports a punctuated-equilibrium model of structure evolution in which the generation of new folds occurs over comparatively short periods and is followed by long periods of structural stasis.
Affiliation(s)
- Alvaro Ingles-Prieto
- Facultad de Ciencias, Departamento de Química Física, Universidad de Granada, Granada, 18071, Spain
- Beatriz Ibarra-Molero
- Facultad de Ciencias, Departamento de Química Física, Universidad de Granada, Granada, 18071, Spain
- Asuncion Delgado-Delgado
- Facultad de Ciencias, Departamento de Química Física, Universidad de Granada, Granada, 18071, Spain
- Raul Perez-Jimenez
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Julio M. Fernandez
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Eric A. Gaucher
- Georgia Institute of Technology, School of Biology, School of Chemistry and Biochemistry, and Parker H. Petit Institute for Bioengineering and Biosciences, Atlanta, Georgia, 30332, USA
- Jose M. Sanchez-Ruiz
- Facultad de Ciencias, Departamento de Química Física, Universidad de Granada, Granada, 18071, Spain. To whom correspondence should be addressed: Jose M. Sanchez-Ruiz, TEL: 34-958243189, FAX: 34-958272879
- Jose A. Gavira
- Laboratorio de Estudios Cristalográficos, Instituto Andaluz de Ciencias de la Tierra (Consejo Superior de Investigaciones Científicas – Universidad de Granada), Avenida de las Palmeras 4, Armilla, Granada, 18100, Spain. To whom correspondence should be addressed: Jose M. Sanchez-Ruiz, TEL: 34-958243189, FAX: 34-958272879
|
28
|
Topham CM, Rouquier M, Tarrat N, André I. Adaptive Smith-Waterman residue match seeding for protein structural alignment. Proteins 2013; 81:1823-39. [DOI: 10.1002/prot.24327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/23/2013] [Revised: 04/22/2013] [Accepted: 05/15/2013] [Indexed: 12/30/2022]
Affiliation(s)
- Christopher M. Topham
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Mickaël Rouquier
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Nathalie Tarrat
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Isabelle André
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
|
29
|
Lenart A, Dudkiewicz M, Grynberg M, Pawłowski K. CLCAs - a family of metalloproteases of intriguing phylogenetic distribution and with cases of substituted catalytic sites. PLoS One 2013; 8:e62272. [PMID: 23671590 PMCID: PMC3650047 DOI: 10.1371/journal.pone.0062272] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 11/20/2012] [Accepted: 03/19/2013] [Indexed: 01/08/2023] Open
Abstract
The zinc-dependent metalloproteases with His-Glu-x-x-His (HExxH) active site motif, zincins, are a broad group of proteins involved in many metabolic and regulatory functions, and found in all forms of life. The human genome contains more than 100 genes encoding proteins with known zincin-like domains. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database). Domain families with a majority of members possessing a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases. Furthermore, several HExxH-containing protein domains thus identified can be confidently predicted to be putative peptidases of zincin fold. Thus, we predict a zincin-like fold for eight uncharacterised Pfam families. Besides the domains with the HExxH motif strictly conserved, and those with sporadic occurrences, intermediate families are identified that contain some members with a conserved HExxH motif, but also many homologues with substitutions at the conserved positions. Such substitutions can be evolutionarily conserved and non-random, yet the functional roles of these inactive zincins are not known. The CLCAs are a novel zincin-like protease family with many cases of substituted active sites. We show that this allegedly metazoan family has a number of bacterial and archaeal members. An extremely patchy phylogenetic distribution of CLCAs in prokaryotes and their conserved protein domain composition strongly suggests an evolutionary scenario of horizontal gene transfer (HGT) from multicellular eukaryotes to bacteria, providing an example of eukaryote-derived xenologues in bacterial genomes. Additionally, in a protein family identified here as closely homologous to CLCA, the CLCA_X (CLCA-like) family, a number of proteins are found in phages and plasmids, supporting the HGT scenario.
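As a rough illustration of the motif named in the abstract above, the His-Glu-x-x-His (HExxH) pattern can be written as a regular expression and scanned for in a protein sequence. This is only a minimal sketch, not the authors' survey pipeline: the function name `find_hexxh` and the toy sequence are hypothetical, and a real survey would additionally map hits onto Pfam domains.

```python
import re

# HExxH = His, Glu, any residue, any residue, His.
# The lookahead wrapper is an illustrative choice so that overlapping
# occurrences are also reported.
HEXXH = re.compile(r"(?=(HE..H))")

def find_hexxh(sequence):
    """Return (1-based start position, matched residues) for each HExxH hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(seq)]

# Toy sequence invented for this example (not a real protein).
toy_sequence = "MKTAHEAGHALGLLHEHSHE"
for position, motif in find_hexxh(toy_sequence):
    print(position, motif)
```

Positions are reported 1-based, following the usual convention for residue numbering.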
Affiliation(s)
- Anna Lenart
- Department of Cellular and Molecular Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Małgorzata Dudkiewicz
- Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
- Marcin Grynberg
- Department of Genetics, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Krzysztof Pawłowski
- Department of Cellular and Molecular Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
|
30
|
Affiliation(s)
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa 31905, Israel
- Leonid Pereyaslavets
- Department of Structural Biology, Stanford University, Stanford, California 94305
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, California 94305
|
31
|
Wu Y, Punta M, Xiao R, Acton TB, Sathyamoorthy B, Dey F, Fischer M, Skerra A, Rost B, Montelione GT, Szyperski T. NMR structure of lipoprotein YxeF from Bacillus subtilis reveals a calycin fold and distant homology with the lipocalin Blc from Escherichia coli. PLoS One 2012; 7:e37404. [PMID: 22693626 PMCID: PMC3367933 DOI: 10.1371/journal.pone.0037404] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/13/2012] [Accepted: 04/19/2012] [Indexed: 11/18/2022] Open
Abstract
The soluble monomeric domain of lipoprotein YxeF from the Gram-positive bacterium B. subtilis was selected by the Northeast Structural Genomics Consortium (NESG) as a target of a biomedical theme project focusing on the structure determination of the soluble domains of bacterial lipoproteins. The solution NMR structure of YxeF reveals a calycin fold and distant homology with the lipocalin Blc from the Gram-negative bacterium E. coli. In particular, the characteristic β-barrel, which is open to the solvent at one end, is extremely well conserved in YxeF with respect to Blc. The identification of YxeF as the first lipocalin homologue occurring in a Gram-positive bacterium suggests that lipocalins emerged before the evolutionary divergence of Gram-positive and Gram-negative bacteria. Since YxeF is devoid of the α-helix that packs in all lipocalins with known structure against the β-barrel to form a second hydrophobic core, we propose to introduce a new lipocalin sub-family named ‘slim lipocalins’, with YxeF and the other members of Pfam family PF11631 to which YxeF belongs constituting the first representatives. The results presented here exemplify the impact of structural genomics to enhance our understanding of biology and to generate new biological hypotheses.
Affiliation(s)
- Yibing Wu
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Marco Punta
- Department of Computer Science and Institute for Advanced Study, Technical University of Munich, Munich, Germany
- Northeast Structural Genomics Consortium
| | - Rong Xiao
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Thomas B. Acton
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Bharathwaj Sathyamoorthy
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
| | - Fabian Dey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Markus Fischer
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Arne Skerra
- Munich Center for Integrated Protein Science, CIPS-M, and Lehrstuhl für Biologische Chemie, Technische Universität München, Freising-Weihenstephan, Germany
| | - Burkhard Rost
- Department of Computer Science and Institute for Advanced Study, Technical University of Munich, Munich, Germany
- Northeast Structural Genomics Consortium
| | - Gaetano T. Montelione
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Thomas Szyperski
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
- Northeast Structural Genomics Consortium
|
32
|
Hensen U, Meyer T, Haas J, Rex R, Vriend G, Grubmüller H. Exploring protein dynamics space: the dynasome as the missing link between protein structure and function. PLoS One 2012; 7:e33931. [PMID: 22606222 PMCID: PMC3350514 DOI: 10.1371/journal.pone.0033931] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2011] [Accepted: 02/20/2012] [Indexed: 12/25/2022] Open
Abstract
Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: 1. the dynasome consists of a continuum of proteins, rather than well separated classes. 2. For the majority of proteins we observe strong correlations between structure and dynamics. 3. Proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics.
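The fingerprint comparison described in this abstract can be pictured with a minimal sketch. It assumes a plain Euclidean distance between descriptor vectors, which is an illustrative choice and not necessarily the metric used by the authors; the protein names, descriptor values, and the helpers `dynamics_distance` and `nearest_neighbour` are hypothetical.

```python
import math

# Hypothetical dynasome descriptors for three proteins.
# The values are invented for illustration only.
fingerprints = {
    "protein_A": [0.82, 1.40, 0.15],
    "protein_B": [0.80, 1.35, 0.20],
    "protein_C": [0.10, 3.90, 0.95],
}


def dynamics_distance(u, v):
    """Euclidean distance between two fingerprint vectors (assumed metric)."""
    if len(u) != len(v):
        raise ValueError("fingerprints must share the same descriptors")
    return math.dist(u, v)


def nearest_neighbour(name, table):
    """Return the other protein whose fingerprint lies closest to `name`."""
    others = (k for k in table if k != name)
    return min(others, key=lambda k: dynamics_distance(table[name], table[k]))
```

In practice the descriptors would probably need to be put on a common scale before distances are compared; that step is left out of the sketch.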
Affiliation(s)
- Ulf Hensen
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| | - Tim Meyer
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| | - Jürgen Haas
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| | - René Rex
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
| | - Gert Vriend
- CMBI, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
| | - Helmut Grubmüller
- Theoretische und computergestützte Biophysik, Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany
|
33
|
Skolnick J, Zhou H, Brylinski M. Further evidence for the likely completeness of the library of solved single domain protein structures. J Phys Chem B 2012; 116:6654-64. [PMID: 22272723 DOI: 10.1021/jp211052j] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Recent studies questioned whether the Protein Data Bank (PDB) contains all compact, single domain protein structures. Here, we show that all quasi-spherical, QS, random protein structures devoid of secondary structure are in the PDB and are excellent templates for all native PDB proteins up to 250 residues. Because QS templates have a similar global contour as native, TASSER can refine 98% (90%) of those whose TM-score is 0.4 (0.35) to structures greater than or equal to the 0.5 TM-score threshold (0.74 (0.64) mean TM-score) for CATH/SCOP assignment. On the basis of this and the fact that, at a TM-score of 0.4, 83% (90%) of all (internal) core secondary structure elements are recovered, a 0.40 TM-score is an appropriate fold similarity assignment threshold. Despite the claims of Taylor, Trovato, and Zhou that many of their structures lack a PDB counterpart, using fr-TM-align, at a 0.45 (0.5) TM-score threshold, essentially all (most) are found in the PDB. Thus, the conclusion that the PDB is likely complete is further supported.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, Georgia 30318, USA.
|
34
|
Daniels NM, Kumar A, Cowen LJ, Menke M. Touring protein space with Matt. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:286-93. [PMID: 21464511 PMCID: PMC3355523 DOI: 10.1109/tcbb.2011.70] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely structural, geometric, distance-based measures of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it was an open question as to whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative, by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level, and demonstrates qualitative differences in performance between Matt and DaliLite. Implications for the debate over the organization of protein fold space are discussed. Based on our clustering of protein space, we introduce the Mattbench benchmark set, a new collection of structural alignments useful for testing sequence aligners on more distantly homologous proteins.
Affiliation(s)
- Noah M. Daniels
- The authors are with the Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
| | - Anoop Kumar
- The authors are with the Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
| | - Lenore J. Cowen
- The authors are with the Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
| | - Matt Menke
- The authors are with the Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
|
35
|
Brylinski M, Gao M, Skolnick J. Why not consider a spherical protein? Implications of backbone hydrogen bonding for protein structure and function. Phys Chem Chem Phys 2011; 13:17044-55. [PMID: 21655593 DOI: 10.1039/c1cp21140d] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
The intrinsic ability of protein structures to exhibit the geometric features required for molecular function in the absence of evolution is examined in the context of three systems: the reference set of real, single domain protein structures, a library of computationally generated, compact homopolypeptides, artificial structures with protein-like secondary structural elements, and quasi-spherical random proteins packed at the same density as proteins but lacking backbone secondary structure and hydrogen bonding. Without any evolutionary selection, the library of artificial structures has similar backbone hydrogen bonding, global shape, surface to volume ratio and statistically significant structural matches to real protein global structures. Moreover, these artificial structures have native-like ligand binding cavities, and a tiny subset has interfacial geometries consistent with native-like protein-protein interactions and DNA binding. In contrast, the quasi-spherical random proteins, being devoid of secondary structure, have a lower surface to volume ratio and lack ligand binding pockets and intermolecular interaction interfaces. Surprisingly, these quasi-spherical random proteins exhibit protein-like distributions of virtual bond angles and almost all have a statistically significant structural match to real protein structures. This implies that it is local chain stiffness, even without backbone hydrogen bonding, and compactness that give rise to the likely completeness of the library of solved single domain protein structures. These studies also suggest that the packing of secondary structural elements generates the requisite geometry for intermolecular binding. Thus, backbone hydrogen bonding plays an important role not only in protein structure but also in protein function.
Such ability to bind biological molecules is an inherent feature of protein structure; if combined with appropriate protein sequences, it could provide the non-zero background probability for low-level function that evolution requires for selection to occur.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, Georgia Institute of Technology, 250 14th St NW, Atlanta, GA 30076, USA
|
36
|
Abroi A, Gough J. Are viruses a source of new protein folds for organisms? - Virosphere structure space and evolution. Bioessays 2011; 33:626-35. [DOI: 10.1002/bies.201000126] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
|
37
|
Hollup SM, Sadowski MI, Jonassen I, Taylor WR. Exploring the limits of fold discrimination by structural alignment: a large scale benchmark using decoys of known fold. Comput Biol Chem 2011; 35:174-88. [PMID: 21704264 PMCID: PMC3145973 DOI: 10.1016/j.compbiolchem.2011.04.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2011] [Accepted: 04/23/2011] [Indexed: 11/10/2022]
Abstract
Protein structure comparison by pairwise alignment is commonly used to identify highly similar substructures in pairs of proteins and provide a measure of structural similarity based on the size and geometric similarity of the match. These scores are routinely applied in analyses of protein fold space under the assumption that high statistical significance is equivalent to a meaningful relationship; however, the truth of this assumption has previously been difficult to test since there is a lack of automated methods which do not rely on the same underlying principles. As a resolution to this we present a method based on the use of topological descriptions of global protein structure, providing an independent means to assess the ability of structural alignment to maintain meaningful structural correspondences on a large scale. Using a large set of decoys of specified global fold we benchmark three widely used methods for structure comparison, SAP, TM-align and DALI, and test the degree to which this assumption is justified for these methods. Application of a topological edit distance measure to provide a scale of the degree of fold change shows that while there is a broad correlation between high structural alignment scores and low edit distances there remain many pairs of highly significant score which differ by core strand swaps and therefore are structurally different on a global level. Possible causes of this problem and its meaning for present assessments of protein fold space are discussed.
|
38
|
Dai L, Zhou Y. Characterizing the existing and potential structural space of proteins by large-scale multiple loop permutations. J Mol Biol 2011; 408:585-95. [PMID: 21376059 DOI: 10.1016/j.jmb.2011.02.056] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2010] [Revised: 02/22/2011] [Accepted: 02/24/2011] [Indexed: 10/18/2022]
Abstract
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space are limited to simple model proteins and led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel potentially stable clusters for medium-size or large-size single-domain proteins, in particular, >100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed.
Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.
Affiliation(s)
- Liang Dai
- School of Informatics, Indiana University Purdue University Indianapolis, and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Avenue, Walker Plaza Building Suite 319, Indianapolis, IN 46202, USA
|
39
|
Schaeffer RD, Daggett V. Protein folds and protein folding. Protein Eng Des Sel 2011; 24:11-9. [PMID: 21051320 PMCID: PMC3003448 DOI: 10.1093/protein/gzq096] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2010] [Revised: 10/09/2010] [Accepted: 10/11/2010] [Indexed: 01/07/2023] Open
Abstract
The classification of protein folds is necessarily based on the structural elements that distinguish domains. Classification of protein domains consists of two problems: the partition of structures into domains and the classification of domains into sets of similar structures (or folds). Although similar topologies may arise by convergent evolution, the similarity of their respective folding pathways is unknown. The discovery and the characterization of the majority of protein folds will be followed by a similar enumeration of available protein folding pathways. Consequently, understanding the intricacies of structural domains is necessary to understanding their collective folding pathways. We review the current state of the art in the field of protein domain classification and discuss methods for the systematic and comprehensive study of protein folding across protein fold space via atomistic molecular dynamics simulation. Finally, we discuss our large-scale Dynameomics project, which includes simulations of representatives of all autonomous protein folds.
Affiliation(s)
| | - Valerie Daggett
- Department of Bioengineering, University of Washington, Seattle, WA 98195-5013, USA
|
40
|
Structural space of protein-protein interfaces is degenerate, close to complete, and highly connected. Proc Natl Acad Sci U S A 2010; 107:22517-22. [PMID: 21149688 DOI: 10.1073/pnas.1012820107] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
At the heart of protein-protein interactions are protein-protein interfaces where the direct physical interactions occur. By developing and applying an efficient structural alignment method, we study the structural similarity of representative protein-protein interfaces involving interactions between dimers. Even without structural similarity between individual monomers that form dimeric complexes, ∼90% of native interfaces have a close structural neighbor with similar backbone Cα geometry and interfacial contact pattern. About 80% of the interfaces form a dense network, where any two interfaces are structurally related using a transitive set of at most seven intermediate interfaces. The degeneracy of interface space is largely due to the packing of compact, hydrogen-bonded secondary structure elements. This packing generates relatively flat interacting surfaces whose geometries are highly degenerate. Comparative study of artificial and native interfaces argues that the library of protein interfaces is close to complete and comprised of roughly 1,000 distinct interface types. In contrast, the number of possible quaternary structures of dimers is estimated to be about 10^4 times larger; thus, an experimentally determined database of all representative quaternary structures is not likely in the near future. Nevertheless, one could in principle exploit the completeness of protein interfaces to predict most dimeric quaternary structures. Finally, our results provide a structural explanation for the prevalence of promiscuous protein interactions. By side-chain packing adjustments, we illustrate how multiprotein specificity can be attained at a promiscuous interface.
|
41
|
Vuong TV, Wilson DB. Glycoside hydrolases: catalytic base/nucleophile diversity. Biotechnol Bioeng 2010; 107:195-205. [PMID: 20552664 DOI: 10.1002/bit.22838] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Recent studies have shown that a number of glycoside hydrolase families do not follow the classical catalytic mechanisms, as they lack a typical catalytic base/nucleophile. A variety of mechanisms are used to replace this function, including substrate-assisted catalysis, a network of several residues, and the use of non-carboxylate residues or exogenous nucleophiles. Removal of the catalytic base/nucleophile by mutation can have a profound impact on substrate specificity, producing enzymes with completely new functions.
Affiliation(s)
- Thu V Vuong
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14850, USA
|
42
|
Wu S, Zhang Y. Recognizing protein substructure similarity using segmental threading. Structure 2010; 18:858-67. [PMID: 20637422 DOI: 10.1016/j.str.2010.04.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2010] [Revised: 04/02/2010] [Accepted: 04/03/2010] [Indexed: 11/15/2022]
Abstract
Protein template identification is essential to protein structure and function predictions. However, conventional whole-chain threading approaches often fail to recognize conserved substructure motifs when the target and templates do not share the same fold. We developed a new approach, SEGMER, for identifying protein substructure similarities by segmental threading. The target sequence is split into segments of two to four consecutive or nonconsecutive secondary structural elements, which are then threaded through PDB to identify appropriate substructure motifs. SEGMER is tested on 144 nonredundant hard proteins. When combined with whole-chain threading, the TM-score of alignments and accuracy of spatial restraints of SEGMER increase by 16% and 25%, respectively, compared with that by the whole-chain threading methods only. When tested on 12 free modeling targets from CASP8, SEGMER increases the TM-score and contact accuracy by 28% and 48%, respectively. This significant improvement should have important impact on protein structure modeling and functional inference.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA
|
43
|
Chubb D, Jefferys BR, Sternberg MJE, Kelley LA. Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. Bioinformatics 2010; 26:2664-71. [PMID: 20843957 DOI: 10.1093/bioinformatics/btq527] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe. RESULTS: We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.
Affiliation(s)
- Daniel Chubb
- Department of Life Science, Imperial College London, London, UK.
|
44
|
On the evolutionary origins of "Fold Space Continuity": a study of topological convergence and divergence in mixed alpha-beta domains. J Struct Biol 2010; 172:244-52. [PMID: 20691788 DOI: 10.1016/j.jsb.2010.07.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2010] [Revised: 06/25/2010] [Accepted: 07/31/2010] [Indexed: 11/21/2022]
Abstract
Existing protein structure classifications group proteins by overall structural similarity at the highest level and by evolutionary relationships at the lowest level, deriving higher-level groups by pairwise structure comparison. For this to be successful requires that large changes in structure are relatively rare in evolution and that proteins with no detectable evolutionary relationship do not converge on similar global chain conformations since this creates conflicts between structural and evolutionary consistency. Analysis of global structural changes using core topological descriptions for 4261 domains from classes C and D of the SCOP database and new measures of topological distance and consistency of classification showed that the topological consistency of SCOP folds is highly variable with some folds having no consistent description and significant overlaps between groups including some members of separate folds with identical topological descriptions. Topological clustering shows that including sufficient indels to allow family members to be joined would also require joining several distinct folds. We conclude that evolutionary changes in the global topology of protein domains are the root cause of many difficulties for present approaches to structure classification using pairwise comparison. As a resolution we propose that a purely structural classification should be created using an approach similar to that adopted by the Gene Ontology in which proteins are assigned labels describing structure.
|
45
|
Di Lena P, Fariselli P, Margara L, Vassura M, Casadio R. Fast overlapping of protein contact maps by alignment of eigenvectors. Bioinformatics 2010; 26:2250-8. [PMID: 20610612 DOI: 10.1093/bioinformatics/btq402] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Searching for structural similarity is a key issue of protein functional annotation. The maximum contact map overlap (CMO) is one of the possible measures of protein structure similarity. Exact and approximate methods known to optimize the CMO are computationally expensive and this hampers their applicability to large-scale comparison of protein structures. RESULTS: In this article, we describe a heuristic algorithm (Al-Eigen) for finding a solution to the CMO problem. Our approach relies on the approximation of contact maps by eigendecomposition. We obtain good overlaps of two contact maps by computing the optimal global alignment of few principal eigenvectors. Our algorithm is simple, fast and its running time is independent of the number of contacts in the map. Experimental testing indicates that the algorithm is comparable to exact CMO methods in terms of the overlap quality, to structural alignment methods in terms of structure similarity detection and it is fast enough to be suited for large-scale comparison of protein structures. Furthermore, our preliminary tests indicate that it is quite robust to noise, which makes it suitable for structural similarity detection also for noisy and incomplete contact maps. AVAILABILITY: Available at http://bioinformatics.cs.unibo.it/Al-Eigen.
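The quantity that CMO methods try to maximise can be illustrated with a minimal sketch. It only scores one fixed alignment and is not the Al-Eigen heuristic itself, which searches for a good alignment through eigenvector alignment. The contact sets, the alignments, and the helper names `normalise` and `contact_overlap` are hypothetical.

```python
def normalise(pair):
    """Order a residue-index pair so that (i, j) and (j, i) compare equal."""
    i, j = pair
    return (i, j) if i < j else (j, i)


def contact_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of map A that map onto contacts of map B.

    contacts_a, contacts_b: iterables of (i, j) residue-index pairs.
    alignment: dict sending residue indices of A to residue indices of B;
    residues missing from the dict are treated as unaligned.
    """
    b = {normalise(p) for p in contacts_b}
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            if normalise((alignment[i], alignment[j])) in b:
                shared += 1
    return shared


# Toy contact maps, invented for illustration.
map_a = {(0, 2), (1, 3), (2, 4)}
map_b = {(0, 2), (1, 3), (3, 5)}
identity = {k: k for k in range(5)}   # residue k of A aligned to residue k of B
shifted = {2: 3, 4: 5}                # partial alignment of two residues
```

A full CMO solver would search over alignments for the one that maximises this count; the sketch stops at evaluating a given alignment.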
Affiliation(s)
- Pietro Di Lena
- Department of Computer Science, University of Bologna, Bologna, Italy.
|
46
|
Omelchenko MV, Galperin MY, Wolf YI, Koonin EV. Non-homologous isofunctional enzymes: a systematic analysis of alternative solutions in enzyme evolution. Biol Direct 2010; 5:31. [PMID: 20433725 PMCID: PMC2876114 DOI: 10.1186/1745-6150-5-31] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2010] [Accepted: 04/30/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins. RESULTS: We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress. CONCLUSIONS: These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
Affiliation(s)
- Marina V Omelchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
47
|
Abstract
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
Affiliation(s)
- Vikram Alva
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen 72076, Germany
|
48
|
Wrabl JO, Hilser VJ. Investigating homology between proteins using energetic profiles. PLoS Comput Biol 2010; 6:e1000722. [PMID: 20361049 PMCID: PMC2845653 DOI: 10.1371/journal.pcbi.1000722] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2009] [Accepted: 02/25/2010] [Indexed: 11/19/2022] Open
Abstract
Accumulated experimental observations demonstrate that protein stability is often preserved upon conservative point mutation. In contrast, less is known about the effects of large sequence or structure changes on the stability of a particular fold. Almost completely unknown is the degree to which stability of different regions of a protein is generally preserved throughout evolution. In this work, these questions are addressed through thermodynamic analysis of a large representative sample of protein fold space based on remote, yet accepted, homology. More than 3,000 proteins were computationally analyzed using the structural-thermodynamic algorithm COREX/BEST. Estimated position-specific stability (i.e., local Gibbs free energy of folding) and its component enthalpy and entropy were quantitatively compared between all proteins in the sample according to all-vs.-all pairwise structural alignment. It was discovered that the local stabilities of homologous pairs were significantly more correlated than those of non-homologous pairs, indicating that local stability was indeed generally conserved throughout evolution. However, the position-specific enthalpy and entropy underlying stability were less correlated, suggesting that the overall regional stability of a protein was more important than the thermodynamic mechanism utilized to achieve that stability. Finally, two different types of statistically exceptional evolutionary structure-thermodynamic relationships were noted. First, many homologous proteins contained regions of similar thermodynamics despite localized structure change, suggesting a thermodynamic mechanism enabling evolutionary fold change. Second, some homologous proteins with extremely similar structures nonetheless exhibited different local stabilities, a phenomenon previously observed experimentally in this laboratory. These two observations, in conjunction with the principal conclusion that homologous proteins generally conserved local stability, may provide guidance for a future thermodynamically informed classification of protein homology.
Protein structure and function are fundamentally determined by thermodynamics. However, for technical as well as historical reasons, current evolutionary classification schemes and bioinformatics tools do not fully utilize thermodynamic information to describe or analyze proteins. In this work, we address this deficiency by computationally estimating the position-specific thermodynamic quantities of stability (ΔG), enthalpy (ΔH), and entropy (TΔS) for a large and diverse representative sample of protein structures. The sample was drawn from an expertly curated database, such that accepted evolutionary relationships existed for all protein pairs. Importantly, trivial relationships between pairs highly similar in amino acid sequence were explicitly excluded. We found that all position-specific thermodynamic quantities ΔG, ΔH, and TΔS were more similar between proteins that were evolutionarily related (i.e., homologous), and were less similar between proteins that were not evolutionarily related (i.e., non-homologous), with stability being particularly similar between homologous proteins. However, interesting statistically significant exceptions to these trends were observed, exceptions that could indicate novel processes of functional adaptation or evolutionary fold change, mediated by thermodynamics, for the proteins involved. Taken together, these results expand our understanding of the role of thermodynamics in protein evolution and suggest an organizational framework for a future thermodynamically-informed classification of protein homology.
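The comparison of position-specific stabilities between aligned proteins can be pictured with a small correlation sketch. The abstract does not say which correlation measure was used, so Pearson correlation is an assumption here, and the ΔG values below are made-up numbers for illustration, not COREX/BEST output.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical local stabilities at structurally aligned positions
# of two proteins (arbitrary units, invented for this example).
dg_protein_a = [5.1, 6.3, 2.2, 7.8, 4.0]
dg_protein_b = [4.8, 6.9, 2.5, 7.1, 3.6]
r = pearson(dg_protein_a, dg_protein_b)
```

A value of `r` near 1 would correspond to the conserved local stability that the abstract reports for homologous pairs; lower values would be expected for non-homologous pairs.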
Affiliation(s)
- James O. Wrabl
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch at Galveston, Galveston, Texas, United States of America
- Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch at Galveston, Galveston, Texas, United States of America
- Vincent J. Hilser
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch at Galveston, Galveston, Texas, United States of America
- Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch at Galveston, Galveston, Texas, United States of America
|
49
|
Abstract
MOTIVATION Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves cannot provide information on how significant the structural similarity is. Also, a quantitative relation between the scores and conventional fold classifications is lacking. This article aims to answer two questions: (i) What is the statistical significance of TM-score? (ii) What is the probability of two proteins having the same fold given a specific TM-score? RESULTS We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. With a TM-score at 0.5, for instance, the P-value is 5.5 × 10⁻⁷, which means we need to consider at least 1.8 million random protein pairs to acquire a TM-score of no less than 0.5. Second, we examine the posterior probability that two proteins share the same fold, using three datasets: SCOP, CATH and the consensus of SCOP and CATH. The posterior probabilities from the different datasets show a similar rapid phase transition around TM-score = 0.5. This finding indicates that TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score >0.5 are mostly in the same fold while those with a TM-score <0.5 are mainly not in the same fold.
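The "1.8 million random protein pairs" figure in the abstract above follows from taking the reciprocal of the quoted P-value. A minimal sketch of that arithmetic; the function name is hypothetical and only the P-value comes from the abstract.

```python
def expected_pairs_per_hit(p_value: float) -> float:
    """Average number of random pairs examined per pair that reaches the score (1 / P)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return 1.0 / p_value


P_AT_TM_0_5 = 5.5e-7  # P-value quoted in the abstract for TM-score = 0.5
pairs = expected_pairs_per_hit(P_AT_TM_0_5)
print(f"{pairs / 1e6:.1f} million pairs")  # prints "1.8 million pairs"
```

The reciprocal is an average: about one pair in every 1.8 million random pairs is expected to score at least 0.5, which is why such a score is unlikely to arise by chance.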
Affiliation(s)
- Jinrui Xu
- Department of Medical School, Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
|
50
|
Sadowski MI, Taylor WR. Protein structures, folds and fold spaces. J Phys Condens Matter 2010; 22:033103. [PMID: 21386276 DOI: 10.1088/0953-8984/22/3/033103] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
There has been considerable progress towards the goal of understanding the space of possible tertiary structures adopted by proteins. Despite a greatly increased rate of structure determination and a deliberate strategy of sequencing proteins expected to be very different from those already known, it is now rare to see a genuinely new fold, leading to the conclusion that we have seen the majority of natural structural types. The increase in knowledge has also led to a critical examination of traditional fold-based classifications and their meaning for evolution and protein structures. We review these issues and discuss possible solutions.
Affiliation(s)
- Michael I Sadowski
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
|