1
Razban RM, Gilson AI, Durfee N, Strobelt H, Dinkla K, Choi JM, Pfister H, Shakhnovich EI. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes. Bioinformatics 2018; 34:3557-3565. [PMID: 29741573 PMCID: PMC6184454 DOI: 10.1093/bioinformatics/bty370] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/01/2017] [Revised: 03/27/2018] [Accepted: 05/03/2018] [Indexed: 01/27/2023] Open
Abstract
Motivation: Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (P-value < 10^-10) and -0.46 (P-value < 10^-10) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant. Availability and implementation: ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
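The Spearman correlation quoted in this abstract is a Pearson correlation computed on ranks. A minimal sketch follows, assuming tied values receive the average of their rank positions; the function names and the toy abundance and ER values are hypothetical illustrations, not ProteomeVis code or data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values: higher abundance paired with lower evolutionary rate.
abundance = [10.0, 200.0, 35.0, 1200.0, 80.0]
evolutionary_rate = [0.9, 0.3, 0.7, 0.1, 0.5]
rho = spearman(abundance, evolutionary_rate)
print(rho)
```

Because only ranks enter the calculation, the result is unchanged by any monotone rescaling of abundance, which is one reason a rank correlation suits quantities spanning several orders of magnitude.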
Affiliation(s)
- Rostam M Razban
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Amy I Gilson
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Niamh Durfee
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hendrik Strobelt
  - School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Kasper Dinkla
  - School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Jeong-Mo Choi
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hanspeter Pfister
  - School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Eugene I Shakhnovich
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
2
Franklin MW, Nepomnyachiy S, Feehan R, Ben-Tal N, Kolodny R, Slusky JSG. Efflux Pumps Represent Possible Evolutionary Convergence onto the β-Barrel Fold. Structure 2018; 26:1266-1274.e2. [PMID: 30057025 PMCID: PMC6125174 DOI: 10.1016/j.str.2018.06.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/19/2018] [Revised: 05/17/2018] [Accepted: 06/20/2018] [Indexed: 11/22/2022]
Abstract
There are around 100 varieties of outer membrane proteins in each Gram-negative bacterium. All of these proteins have the same fold: an up-down β-barrel. It has been suggested that all membrane β-barrels excluding lysins are homologous. Here we suggest that β-barrels of efflux pumps have converged on this fold as well. By grouping structurally solved outer membrane β-barrels (OMBBs) by sequence we find that the membrane environment may have led to convergent evolution of the barrel fold. Specifically, the lack of sequence linkage to other barrels, coupled with distinctive structural differences such as differences in strand tilt and barrel radius, suggests that the outer membrane factor of efflux pumps evolutionarily converged on the barrel. Rather than being related to other OMBBs, sequence and structural similarity in the periplasmic region of the outer membrane factor of efflux pumps suggests an evolutionary link to the periplasmic subunit of the same pump complex.
Affiliation(s)
- Sergey Nepomnyachiy
  - Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel; Department of Computer Science, University of Haifa, Mount Carmel, Haifa 3498838, Israel
- Ryan Feehan
  - Center for Computational Biology, University of Kansas, Lawrence, KS 66045, USA
- Nir Ben-Tal
  - Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Rachel Kolodny
  - Department of Computer Science, University of Haifa, Mount Carmel, Haifa 3498838, Israel
- Joanna S G Slusky
  - Center for Computational Biology, University of Kansas, Lawrence, KS 66045, USA; Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045, USA
3
Sequence and structure space model of protein divergence driven by point mutations. J Theor Biol 2013; 330:1-8. [DOI: 10.1016/j.jtbi.2013.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/27/2012] [Revised: 03/07/2013] [Accepted: 03/18/2013] [Indexed: 12/11/2022]
4
Mannige RV. Two modes of protein sequence evolution and their compositional dependencies. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2013; 87:062714. [PMID: 23848722 DOI: 10.1103/physreve.87.062714] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/14/2013] [Revised: 05/10/2013] [Indexed: 06/02/2023]
Abstract
Protein sequence evolution has resulted in a vast repertoire of molecular functionality crucial to life. Despite the central importance of sequence evolution to biology, our fundamental understanding of how sequence composition affects evolution is incomplete. This report describes the utilization of lattice model simulations of directed evolution, which indicate that, on average, peptide and protein evolvability is strongly dependent on initial sequence composition. The report also discusses two distinct regimes of sequence evolution by point mutation: (a) the "classical" mode where sequences "crawl" over free energy barriers towards acquiring a target fold, and (b) the "quantum" mode where sequences appear to "tunnel" through large energy barriers generally insurmountable by means of a crawl. Finally, the simulations indicate that oily and charged peptides are the most efficient substrates for evolution at the "classical" and "quantum" regimes, respectively, and that their respective response to temperature is commensurate with analogies made to barrier crossing in classical and quantum systems. On the whole, these results show that sequence composition can tune both the evolvability and the optimal mode of evolution of peptides and proteins.
Affiliation(s)
- Ranjan V Mannige
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA
5
Deeds EJ, Shakhnovich EI. A structure-centric view of protein evolution, design, and adaptation. Advances in Enzymology and Related Areas of Molecular Biology 2010; 75:133-91, xi-xii. [PMID: 17124867 DOI: 10.1002/9780471224464.ch2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Proteins, by virtue of their central role in most biological processes, represent one of the key subjects of the study of molecular evolution. Inherent in the indispensability of proteins for living cells is the fact that a given protein can adopt a specific three-dimensional shape that is specified solely by the protein's sequence of amino acids. Over the past several decades, structural biologists have demonstrated that the array of structures that proteins may adopt is quite astounding, and this has led to a strong interest in understanding how protein structures change and evolve over time. In this review we consider a large body of recent work that attempts to illuminate this structure-centric picture of protein evolution. Much of this work has focused on the question of how completely new protein structures (i.e., new folds or topologies) are discovered by protein sequences as they evolve. Pursuant to this question of structural innovation has been a desire to describe and understand the observation that certain types of protein structures are far more abundant than others and how this uneven distribution of proteins bears on the process through which new shapes are discovered. We consider a number of theoretical models that have been successful at explaining this heterogeneity in protein populations and discuss the increasing amount of evidence that indicates that the process of structural evolution involves the divergence of protein sequences and structures from one another. We also consider the topic of protein designability, which concerns itself with understanding how a protein's structure influences the number of sequences that can fold successfully into that structure.
Understanding and quantifying the relationship between the physical feature of a structure and its designability has been a long-standing goal of the study of protein structure and evolution, and we discuss a number of recent advances that have yielded a promising answer to this question. Finally, we review the relatively new field of protein structural phylogeny, an area of study in which information about the distribution of protein structures among different organisms is used to reconstruct the evolutionary relationships between them. Taken together, the work that we review presents an increasingly coherent picture of how these unique polymers have evolved over the course of life on Earth.
Affiliation(s)
- Eric J Deeds
  - Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA
6
Pascual-García A, Abia D, Ortiz ÁR, Bastolla U. Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures. PLoS Comput Biol 2009; 5:e1000331. [PMID: 19325884 PMCID: PMC2654728 DOI: 10.1371/journal.pcbi.1000331] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 08/20/2008] [Accepted: 02/11/2009] [Indexed: 11/19/2022] Open
Abstract
Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find out that such violations present a well defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and it should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found out that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. 
As a domain set, we have selected a consensus set of 2,890 domains decomposed very similarly in SCOP and CATH. As an alignment algorithm, we used a global version of MAMMOTH developed in our group, which is both rapid and accurate. As a similarity measure, we used the size-normalized contact overlap, and as a clustering algorithm, we used average linkage. The resulting automatic classification at the cross-over point was more consistent than expert ones with respect to the structure similarity measure, with 86% of the clusters corresponding to subsets of either SCOP or CATH superfamilies and fewer than 5% containing domains in distinct folds according to both SCOP and CATH. Almost 15% of SCOP superfamilies and 10% of CATH superfamilies were split, consistent with the notion of fold change in protein evolution. These results were qualitatively robust for all choices that we tested, although we did not try to use alignment algorithms developed by other groups. Folds defined in SCOP and CATH would be completely joined in the regime of large transitivity violations where clustering is more arbitrary. Consistently, the agreement between SCOP and CATH at fold level was lower than their agreement with the automatic classification obtained using as a clustering algorithm, respectively, average linkage (for SCOP) or single linkage (for CATH). The networks representing significant evolutionary and structural relationships between clusters beyond the cross-over point may allow us to perform evolutionary, structural, or functional analyses beyond the limits of classification schemes. These networks and the underlying clusters are available at http://ub.cbm.uam.es/research/ProtNet.php

Making order of the fast-growing information on proteins is essential for gaining evolutionary and functional knowledge. The most successful approaches to this task are based on classifications of protein structures, such as SCOP and CATH, which assume a discrete view of the protein structure space as a collection of separated equivalence classes (folds). However, several authors proposed that protein domains should be regarded as assemblies of polypeptide fragments, which implies that the protein structure space is continuous. Here, we assess these views of domain space through the concept of transitivity; i.e., we test whether structure similarity of A with B and B with C implies that A and C are similar, as required for consistent classification. We find that the domain space is approximately transitive and discrete at high similarity and continuous at low similarity, where transitivity is severely violated. Comparing our classification at the cross-over similarity with CATH and SCOP, we find that they join proteins at low similarity where classification is inconsistent. Part of this discrepancy is due to structural divergence of homologous domains, which are forced to be in a single cluster in CATH and SCOP. Structural and evolutionary relationships between consistent clusters are represented as a network in our approach, going beyond current protein classification schemes. We conjecture that our results are related to a change of evolutionary regime, from uniparental divergent evolution for highly related domains to assembly of large fragments for which the classical tree representation is unsuitable.
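The transitivity test described in this abstract (similarity of A with B and of B with C should imply similarity of A with C) can be illustrated with a small counting routine. This is a simplified sketch, not the violation measure defined in the paper: it assumes a symmetric similarity matrix and a single hypothetical threshold, and it counts triples in which exactly two of the three pairs are linked.

```python
from itertools import combinations


def transitivity_violations(sim, threshold):
    """Return (violations, connected_triples) for a symmetric similarity matrix.

    A triple counts as connected when at least two of its three pairs reach
    the threshold; it is a violation when exactly two do, i.e. A~B and B~C
    hold but A~C does not.
    """
    def linked(a, b):
        return sim[a][b] >= threshold

    violations = 0
    connected = 0
    for a, b, c in combinations(range(len(sim)), 3):
        links = linked(a, b) + linked(b, c) + linked(a, c)
        if links >= 2:
            connected += 1
            if links == 2:
                violations += 1
    return violations, connected


# Hypothetical similarity scores for four domains (values invented for illustration).
sim = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.8, 0.1],
    [0.4, 0.8, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
strict = transitivity_violations(sim, 0.7)  # domains 0 and 2 are not linked
loose = transitivity_violations(sim, 0.3)   # the triangle 0-1-2 closes
print(strict, loose)
```

In this toy matrix the outcome depends on the chosen threshold: at 0.7 the triple 0-1-2 is open, while at 0.3 it closes, so scanning the threshold shows where a given similarity matrix stops behaving like an equivalence relation.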
Affiliation(s)
- David Abia
  - Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
- Ángel R. Ortiz
  - Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
- Ugo Bastolla
  - Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
7
Itoh M, Nacher JC, Kuma KI, Goto S, Kanehisa M. Evolutionary history and functional implications of protein domains and their combinations in eukaryotes. Genome Biol 2008; 8:R121. [PMID: 17588271 PMCID: PMC2394772 DOI: 10.1186/gb-2007-8-6-r121] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Received: 02/09/2007] [Revised: 05/10/2007] [Accepted: 06/25/2007] [Indexed: 01/08/2023] Open
Abstract
A rapid emergence of animal-specific domains was observed in animals, contributing to specific domain combinations and functional diversification, but no similar trends were observed in other clades of eukaryotes. Background In higher multicellular eukaryotes, complex protein domain combinations contribute to various cellular functions such as regulation of intercellular or intracellular signaling and interactions. To elucidate the characteristics and evolutionary mechanisms that underlie such domain combinations, it is essential to examine the different types of domains and their combinations among different groups of eukaryotes. Results We observed a large number of group-specific domain combinations in animals, especially in vertebrates. Examples include animal-specific combinations in tyrosine phosphorylation systems and vertebrate-specific combinations in complement and coagulation cascades. These systems apparently underwent extensive evolution in the ancestors of these groups. In extant animals, especially in vertebrates, animal-specific domains have greater connectivity than do other domains on average, and contribute to the varying number of combinations in each animal subgroup. In other groups, the connectivities of older domains were greater on average. To observe the global behavior of domain combinations during evolution, we traced the changes in domain combinations among animals and fungi in a network analysis. Our results indicate that there is a correlation between the differences in domain combinations among different phylogenetic groups and different global behaviors. Conclusion Rapid emergence of animal-specific domains was observed in animals, contributing to specific domain combinations and functional diversification, but no such trends were observed in other clades of eukaryotes. We therefore suggest that the strategy for achieving complex multicellular systems in animals differs from that of other eukaryotes.
Affiliation(s)
- Masumi Itoh
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
- Jose C Nacher
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
- Kei-ichi Kuma
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
- Susumu Goto
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
- Minoru Kanehisa
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
8
Abstract
We develop models of the divergent evolution of genomes; the elementary object of sequence dynamics is the protein structural domain. To identify patterns of organization that reflect mechanisms of evolution, we consider the individual genomes of many procaryote species, studying the arrangement of protein structural domains in the space of all polypeptide structures. We view the network of structural similarities as a graph, called the organismal Protein Domain Universe Graph (oPDUG); vertices represent types of structural domains and edges represent strong structural similarity. As observed before, each oPDUG is a highly nonrandom graph, as evidenced in the vertex degree distribution, which resembles a Pareto law (which has a power-law asymptotic). To explain this and other peculiar properties of the oPDUGs, we construct an evolving-graph model for the long-timescale evolutionary dynamics of oPDUGs, containing only divergent mechanisms of domain discovery. The model generates degree distributions (resembling Pareto laws) and clustering-coefficient distributions that are characteristic of the oPDUGs. In the infinite-graph limit, we analytically compute the exponent for specific biological parameters, as well as the complete phase diagram of the model, finding two distinct regimes of domain innovation dynamics. Thus, divergent evolutionary dynamics quantitatively explains the nonrandom organization of oPDUGs.
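To make the idea of an evolving graph with only divergent growth concrete, the sketch below grows a graph by duplication: each new node picks a random existing parent, links to it, and inherits each of the parent's edges with some probability. This is a generic duplication-divergence toy, not the model analysed in the abstract; the function names, the `p_inherit` parameter, and the seed are assumptions made for illustration.

```python
import random


def grow_divergent_graph(n_nodes, p_inherit, seed=0):
    """Grow an undirected graph by copying a random parent's neighbourhood.

    Each new node links to its parent and keeps every one of the parent's
    existing edges independently with probability p_inherit.
    """
    rng = random.Random(seed)
    neighbors = {0: {1}, 1: {0}}  # start from a single edge
    for new in range(2, n_nodes):
        parent = rng.randrange(new)
        inherited = {nb for nb in neighbors[parent] if rng.random() < p_inherit}
        neighbors[new] = inherited | {parent}
        for nb in neighbors[new]:
            neighbors[nb].add(new)  # keep the adjacency symmetric
    return neighbors


def degree_histogram(neighbors):
    """Map each vertex degree to the number of vertices having it."""
    hist = {}
    for nbs in neighbors.values():
        hist[len(nbs)] = hist.get(len(nbs), 0) + 1
    return dict(sorted(hist.items()))


graph = grow_divergent_graph(n_nodes=500, p_inherit=0.3, seed=1)
histogram = degree_histogram(graph)
print(histogram)
```

Plotting the histogram on log-log axes is the usual first check for a heavy-tailed degree distribution; whether this toy reproduces the exponents reported for real oPDUGs is not something the sketch establishes.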
Affiliation(s)
- C Brian Roland
  - Chemical Physics Program, Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA
9
Panchenko AR, Wolf YI, Panchenko LA, Madej T. Evolutionary plasticity of protein families: coupling between sequence and structure variation. Proteins 2006; 61:535-44. [PMID: 16184609 PMCID: PMC1941674 DOI: 10.1002/prot.20644] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 11/07/2022]
Abstract
In this work we examine how protein structural changes are coupled with sequence variation in the course of evolution of a family of homologs. The sequence-structure correlation analysis performed on 81 homologous protein families shows that the majority of them exhibit statistically significant linear correlation between the measures of sequence and structural similarity. We observed, however, that there are cases where structural variability cannot be mainly explained by sequence variation, such as protein families with a number of disulfide bonds. To understand whether structures from different families and/or folds evolve in the same manner, we compared the degrees of structural change per unit of sequence change ("the evolutionary plasticity of structure") between those families with a significant linear correlation. Using rigorous statistical procedures we find that, with a few exceptions, evolutionary plasticity does not show a statistically significant difference between protein families. Similar sequence-structure analysis performed for protein loop regions shows that evolutionary plasticity of loop regions is greater than for the protein core.
Affiliation(s)
- Anna R Panchenko
  - Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA
10
Abstract
We review fold usage on completed genomes to explore protein structure evolution. The patterns of presence or absence of folds on genomes give us insights into the relationships between folds, the age of different folds and how we have arrived at the set of folds we see today. We examine the relationships between different measures which describe protein fold usage, such as the number of copies of a fold per genome, the number of families per fold, and the number of genomes a fold occurs on. We obtained these measures of fold usage by searching for the structural domains on 157 completed genome sequences from all three kingdoms of life. In our comparisons of these measures we found that bacteria have relatively more distinct folds on their genomes than archaea. Eukaryotes were found to have many more copies of a fold on their genomes. If we separate out the different fold classes, the alpha/beta class has relatively fewer distinct folds on large genomes, more copies of a fold on bacteria and more folds occurring in all three kingdoms simultaneously. These results possibly indicate that most alpha/beta folds originated earlier than other folds. The expected power law distribution is observed for copies of a fold per genome and we found a similar distribution for the number of families per fold. However, a more complicated distribution appears for fold occurrence across genomes, which strongly depends on fold class and kingdom. We also show that there is not a clear relationship between the three measures of fold usage. A fold which occurs on many genomes does not necessarily have many copies on each genome. Similarly, folds with many copies do not necessarily have many families or vice versa.
Affiliation(s)
- Sanne Abeln
  - Department of Statistics, University of Oxford, United Kingdom
11
Fliri AF, Loging WT, Thadeio PF, Volkmann RA. Biospectra Analysis: Model Proteome Characterizations for Linking Molecular Structure and Biological Response. J Med Chem 2005; 48:6918-25. [PMID: 16250650 DOI: 10.1021/jm050494g] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.9] [Indexed: 11/28/2022]
Abstract
Establishing quantitative relationships between molecular structure and broad biological effects has been a long-standing goal in drug discovery. Evaluation of the capacity of molecules to modulate protein functions is a prerequisite for understanding the relationship between molecular structure and in vivo biological response. A particular challenge in these investigations is to derive quantitative measurements of a molecule's functional activity pattern across different proteins. Herein we describe an operationally simple probabilistic structure-activity relationship (SAR) approach, termed biospectra analysis, for identifying agonist and antagonist effect profiles of medicinal agents by using pattern similarity between biological activity spectra (biospectra) of molecules as the determinant. Accordingly, in vitro binding data (percent inhibition values of molecules determined at single high drug concentration in a battery of assays representing a cross section of the proteome) are useful for identifying functional effect profile similarity between medicinal agents. To illustrate this finding, the relationship between biospectra similarity of 24 molecules, identified by hierarchical clustering of a 1567 molecule dataset as being most closely aligned with the neurotransmitter dopamine, and their agonist or antagonist properties was probed. Distinguishing the results described in this study from those obtained with affinity-based methods, the observed association between biospectra and biological response profile similarity remains intact even upon removal of putative drug targets from the dataset (four dopaminergic [D1/D2/D3/D4] and two adrenergic [alpha1 and alpha2] receptors). These findings indicate that biospectra analysis provides an unbiased new tool for forecasting structure-response relationships and for translating broad biological effect information into chemical structure design.
Affiliation(s)
- Anton F Fliri
  - Pfizer Global Research and Development, Groton, CT 06340, USA
12
Deeds EJ, Shakhnovich EI. The emergence of scaling in sequence-based physical models of protein evolution. Biophys J 2005; 88:3905-11. [PMID: 15805176 PMCID: PMC1305622 DOI: 10.1529/biophysj.104.051433] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
It has recently been discovered that many biological systems, when represented as graphs, exhibit a scale-free topology. One such system is the set of structural relationships among protein domains. The scale-free nature of this and other systems has previously been explained using network growth models that, although motivated by biological processes, do not explicitly consider the underlying physics or biology. In this work we explore a sequence-based model for the evolution of protein structures and demonstrate that this model is able to recapitulate the scale-free nature observed in graphs of real protein structures. We find that this model also reproduces other statistical features of the protein domain graph. This represents, to our knowledge, the first such microscopic, physics-based evolutionary model for a scale-free network of biological importance and as such has strong implications for our understanding of the evolution of protein structures and of other biological networks.
Affiliation(s)
- Eric J Deeds
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
13
Deeds EJ, Hennessey H, Shakhnovich EI. Prokaryotic phylogenies inferred from protein structural domains. Genome Res 2005; 15:393-402. [PMID: 15741510 PMCID: PMC551566 DOI: 10.1101/gr.3033805] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
Abstract
The determination of the phylogenetic relationships among microorganisms has long relied primarily on gene sequence information. Given that prokaryotic organisms often lack morphological characteristics amenable to phylogenetic analysis, prokaryotic phylogenies, in particular, are often based on sequence data. In this work, we explore a new source of phylogenetic information, the distribution of protein structural domains within fully sequenced prokaryotic genomes. The evolution of the structural domains we use has been studied extensively, allowing us to base our phylogenetic methods on testable theoretical models of structural evolution. We find that the methods that produce reasonable phylogenetic relationships are indeed the methods that are most consistent with theoretical evolutionary models. This work represents, to our knowledge, the first such theoretically motivated phylogeny, as well as the first application of structural information to phylogeny on this scale. Our results have strong implications for the phylogenetic relationships among prokaryotic organisms and for the understanding of protein evolution as a whole.
Affiliation(s)
- Eric J Deeds
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
14
Abstract
A simple classification scheme that uses only the presence or absence of a protein domain architecture has been used to determine the phylogeny of 174 complete genomes. The method correctly divides the 174 taxa into Archaea, Bacteria, and Eukarya and satisfactorily sorts most of the major groups within these superkingdoms. The most challenging problem involved 119 Bacteria, many of which have reduced genomes. When a weighting factor was used that takes account of difference in genome size (number of considered folds), small-genome taxa were mostly grouped with their full-sized counterparts. Although not every organism appears exactly at its classical phylogenetic position in these trees, the agreement appears comparable with the efforts of others by using sophisticated sequence analysis and/or combinations of gene content and gene order. During the course of the study, it emerged that there is a core set of approximately 50 folds that is found in all 174 genomes and a single fold diagnostic of all Archaea.
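A presence/absence comparison of genomes can be sketched with a set-based distance. The example below uses the Jaccard distance as a stand-in; the abstract does not give its distance formula or its genome-size weighting factor, so neither is reproduced here, and the genome names and fold labels are hypothetical.

```python
def jaccard_distance(folds_a, folds_b):
    """1 minus the fraction of folds shared among all folds seen in either genome."""
    union = folds_a | folds_b
    if not union:
        return 0.0  # two empty genomes are treated as identical
    return 1.0 - len(folds_a & folds_b) / len(union)


# Hypothetical fold content per genome (presence only; copy numbers are ignored).
genomes = {
    "genome_A": {"fold1", "fold2", "fold3", "fold4"},
    "genome_B": {"fold1", "fold2", "fold3"},
    "genome_C": {"fold1", "fold5"},
}

names = sorted(genomes)
distances = {
    (a, b): jaccard_distance(genomes[a], genomes[b])
    for a in names
    for b in names
    if a < b
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A pairwise distance table of this kind is the usual input to a distance-based tree-building method. Note that an unweighted set distance places a reduced genome far from a full-sized relative simply because it has fewer folds, which is the effect the abstract's weighting factor is described as addressing.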
Affiliation(s)
- Song Yang
  - Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA