1
|
Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
The principle of continuity demands the existence of prior molecular states and common ancestors responsible for extant macromolecular structure. Here, we focus on the emergence and evolution of loop prototypes - the elemental architects of protein domain structure. Phylogenomic reconstruction spanning superkingdoms and viruses generated an evolutionary chronology of prototypes with six distinct evolutionary phases defining a most parsimonious evolutionary progression of cellular life. Each phase was marked by strategic prototype accumulation shaping the structures and functions of common ancestors. The last universal common ancestor (LUCA) of cells and viruses and the last universal cellular ancestor (LUCellA) defined stem lines that were structurally and functionally complex. The evolutionary saga highlighted transformative forces. LUCA lacked biosynthetic ribosomal machinery, while the pivotal LUCellA lacked essential DNA biosynthesis and modern transcription. Early proteins therefore relied on RNA for genetic information storage but appeared initially decoupled from it, hinting at transformative shifts of genetic processing. Urancestral loop types suggest advanced folding designs were present at an early evolutionary stage. An exploration of loop geometric properties revealed gradual replacement of prototypes with α-helix and β-strand bracing structures over time, paving the way for the dominance of other loop types. AlphFold2-generated atomic models of prototype accretion described patterns of fold emergence. Our findings favor a ‛processual' model of evolving stem lines aligned with Woese's vision of a communal world. This model prompts discussing the 'problem of ancestors' and the challenges that lie ahead for research in taxonomy, evolution and complexity.
Collapse
Affiliation(s)
- Kelsey Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Callout Biotech, Albuquerque, NM, 87112, USA
| | - M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
| |
Collapse
|
2
|
Caetano-Anollés G, Claverie JM, Nasir A. A critical analysis of the current state of virus taxonomy. Front Microbiol 2023; 14:1240993. [PMID: 37601376 PMCID: PMC10435761 DOI: 10.3389/fmicb.2023.1240993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2023] [Accepted: 07/20/2023] [Indexed: 08/22/2023] Open
Abstract
Taxonomical classification has preceded evolutionary understanding. For that reason, taxonomy has become a battleground fueled by knowledge gaps, technical limitations, and a priorism. Here we assess the current state of the challenging field, focusing on fallacies that are common in viral classification. We emphasize that viruses are crucial contributors to the genomic and functional makeup of holobionts, organismal communities that behave as units of biological organization. Consequently, viruses cannot be considered taxonomic units because they challenge crucial concepts of organismality and individuality. Instead, they should be considered processes that integrate virions and their hosts into life cycles. Viruses harbor phylogenetic signatures of genetic transfer that compromise monophyly and the validity of deep taxonomic ranks. A focus on building phylogenetic networks using alignment-free methodologies and molecular structure can help mitigate the impasse, at least in part. Finally, structural phylogenomic analysis challenges the polyphyletic scenario of multiple viral origins adopted by virus taxonomy, defeating a polyphyletic origin and supporting instead an ancient cellular origin of viruses. We therefore, prompt abandoning deep ranks and urgently reevaluating the validity of taxonomic units and principles of virus classification.
Collapse
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and C.R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| | - Jean-Michel Claverie
- Structural and Genomic Information Laboratory (UMR7256), Mediterranean Institute of Microbiology (FR3479), IM2B, IOM, Aix Marseille University, CNRS, Marseille, France
| | | |
Collapse
|
3
|
Nasir A, Romero-Severson E, Claverie JM. Investigating the Concept and Origin of Viruses. Trends Microbiol 2020; 28:959-967. [PMID: 33158732 PMCID: PMC7609044 DOI: 10.1016/j.tim.2020.08.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2020] [Revised: 08/25/2020] [Accepted: 08/27/2020] [Indexed: 12/21/2022]
Abstract
The ongoing COVID-19 pandemic has piqued public interest in the properties, evolution, and emergence of viruses. Here, we discuss how these basic questions have surprisingly remained disputed despite being increasingly within the reach of scientific analysis. We review recent data-driven efforts that shed light into the origin and evolution of viruses and explain factors that resist the widespread acceptance of new views and insights. We propose a new definition of viruses that is not restricted to the presence or absence of any genetic or physical feature, detail a scenario for how viruses likely originated from ancient cells, and explain technical and conceptual biases that limit our understanding of virus evolution. We note that the philosophical aspects of virus evolution also impact the way we might prepare for future outbreaks.
Collapse
Affiliation(s)
- Arshan Nasir
- Theoretical Biology and Biophysics (T-6), Los Alamos National Laboratory, Los Alamos, NM, USA.
| | - Ethan Romero-Severson
- Theoretical Biology and Biophysics (T-6), Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Jean-Michel Claverie
- Aix Marseille University, CNRS, IGS, Structural and Genomic Information Laboratory (UMR7256), Mediterranean Institute of Microbiology (FR3479), Marseille, France
| |
Collapse
|
4
|
Mughal F, Nasir A, Caetano-Anollés G. The origin and evolution of viruses inferred from fold family structure. Arch Virol 2020; 165:2177-2191. [PMID: 32748179 PMCID: PMC7398281 DOI: 10.1007/s00705-020-04724-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2020] [Accepted: 05/30/2020] [Indexed: 12/16/2022]
Abstract
The canonical frameworks of viral evolution describe viruses as cellular predecessors, reduced forms of cells, or entities that escaped cellular control. The discovery of giant viruses has changed these standard paradigms. Their genetic, proteomic and structural complexities resemble those of cells, prompting a redefinition and reclassification of viruses. In a previous genome-wide analysis of the evolution of structural domains in proteomes, with domains defined at the fold superfamily level, we found the origins of viruses intertwined with those of ancient cells. Here, we extend these data-driven analyses to the study of fold families confirming the co-evolution of viruses and ancient cells and the genetic ability of viruses to foster molecular innovation. The results support our suggestion that viruses arose by genomic reduction from ancient cells and validate a co-evolutionary ‘symbiogenic’ model of viral origins.
Collapse
Affiliation(s)
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Arshan Nasir
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
| |
Collapse
|
5
|
Bokhari RH, Amirjan N, Jeong H, Kim KM, Caetano-Anollés G, Nasir A. Bacterial Origin and Reductive Evolution of the CPR Group. Genome Biol Evol 2020; 12:103-121. [PMID: 32031619 PMCID: PMC7093835 DOI: 10.1093/gbe/evaa024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/31/2020] [Indexed: 12/24/2022] Open
Abstract
The candidate phyla radiation (CPR) is a proposed subdivision within the bacterial domain comprising several candidate phyla. CPR organisms are united by small genome and physical sizes, lack several metabolic enzymes, and populate deep branches within the bacterial subtree of life. These features raise intriguing questions regarding their origin and mode of evolution. In this study, we performed a comparative and phylogenomic analysis to investigate CPR origin and evolution. Unlike previous gene/protein sequence-based reports of CPR evolution, we used protein domain superfamilies classified by protein structure databases to resolve the evolutionary relationships of CPR with non-CPR bacteria, Archaea, Eukarya, and viruses. Across all supergroups, CPR shared maximum superfamilies with non-CPR bacteria and were placed as deep branching bacteria in most phylogenomic trees. CPR contributed 1.22% of new superfamilies to bacteria including the ribosomal protein L19e and encoded four core superfamilies that are likely involved in cell-to-cell interaction and establishing episymbiotic lifestyles. Although CPR and non-CPR bacterial proteomes gained common superfamilies over the course of evolution, CPR and Archaea had more common losses. These losses mostly involved metabolic superfamilies. In fact, phylogenies built from only metabolic protein superfamilies separated CPR and non-CPR bacteria. These findings indicate that CPR are bacterial organisms that have probably evolved in an Archaea-like manner via the early loss of metabolic functions. We also discovered that phylogenies built from metabolic and informational superfamilies gave contrasting views of the groupings among Archaea, Bacteria, and Eukarya, which add to the current debate on the evolutionary relationships among superkingdoms.
Collapse
Affiliation(s)
| | - Nooreen Amirjan
- Department of Biosciences, COMSATS University Islamabad, Pakistan
| | - Hyeonsoo Jeong
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA
| | - Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana
| | - Arshan Nasir
- Department of Biosciences, COMSATS University Islamabad, Pakistan
- Theoretical Biology & Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico
| |
Collapse
|
6
|
Mughal F, Caetano-Anollés G. MANET 3.0: Hierarchy and modularity in evolving metabolic networks. PLoS One 2019; 14:e0224201. [PMID: 31648227 PMCID: PMC6812854 DOI: 10.1371/journal.pone.0224201] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2019] [Accepted: 10/08/2019] [Indexed: 11/30/2022] Open
Abstract
Enzyme recruitment is a fundamental evolutionary driver of modern metabolism. We see evidence of recruitment at work in the metabolic Molecular Ancestry Networks (MANET) database, an online resource that integrates data from KEGG, SCOP and structural phylogenomic reconstruction. The database, which was introduced in 2006, traces the deep history of the structural domains of enzymes in metabolic pathways. Here we release version 3.0 of MANET, which updates data from KEGG and SCOP, links enzyme and PDB information with PDBsum, and traces evolutionary information of domains defined at fold family level of SCOP classification in metabolic subnetwork diagrams. Compared to SCOP folds used in the previous versions, fold families are cohesive units of functional similarity that are highly conserved at sequence level and offer a 10-fold increase of data entries. We surveyed enzymatic, functional and catalytic site distributions among superkingdoms showing that ancient enzymatic innovations followed a biphasic temporal pattern of diversification typical of module innovation. We grouped enzymatic activities of MANET into a hierarchical system of subnetworks and mesonetworks matching KEGG classification. The evolutionary growth of these modules of metabolic activity was studied using bipartite networks and their one-mode projections at enzyme, subnetwork and mesonetwork levels of organization. Evolving metabolic networks revealed patterns of enzyme sharing that transcended mesonetwork boundaries and supported the patchwork model of metabolic evolution. We also explored the scale-freeness, randomness and small-world properties of evolving networks as possible organizing principles of network growth and diversification. The network structure shows an increase in hierarchical modularity and scale-free behavior as metabolic networks unfold in evolutionary time. Remarkably, this evolutionary constraint on structure was stronger at lower levels of metabolic organization. Evolving metabolic structure reveals a 'principle of granularity', an evolutionary increase of the cohesiveness of lower-level parts of a hierarchical system. MANET is available at http://manet.illinois.edu.
Collapse
Affiliation(s)
- Fizza Mughal
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Gustavo Caetano-Anollés
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| |
Collapse
|
7
|
Caetano-Anollés D, Nasir A, Kim KM, Caetano-Anollés G. Testing Empirical Support for Evolutionary Models that Root the Tree of Life. J Mol Evol 2019; 87:131-142. [PMID: 30887086 PMCID: PMC6443624 DOI: 10.1007/s00239-019-09891-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2018] [Accepted: 03/06/2019] [Indexed: 12/12/2022]
Abstract
Trees of life (ToLs) can only be rooted with direct methods that seek optimization of character state information in ingroup taxa. This involves optimizing phylogenetic tree, model and data in an exercise of reciprocal illumination. Rooted ToLs have been built from a census of protein structural domains in proteomes using two kinds of models. Fully-reversible models use standard-ordered (additive) characters and Wagner parsimony to generate unrooted trees of proteomes that are then rooted with Weston's generality criterion. Non-reversible models directly build rooted trees with unordered characters and asymmetric stepmatrices of transformation costs that penalize gain over loss of domains. Here, we test the empirical support for the evolutionary models with character state reconstruction methods using two published proteomic datasets. We show that the reversible models match reconstructed frequencies of character change and are faithful to the distribution of serial homologies in trees. In contrast, the non-reversible models go counter to trends in the data they must explain, attracting organisms with large proteomes to the base of the rooted trees while violating the triangle inequality of distances. This can lead to serious reconstruction inconsistencies that show model inadequacy. Our study highlights the aprioristic perils of disposing of countering evidence in natural history reconstruction.
Collapse
Affiliation(s)
- Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany.
| | - Arshan Nasir
- Department of Biosciences, COMSATS University, Islamabad, 45550, Pakistan
| | - Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| |
Collapse
|
8
|
Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022] Open
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom begs the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, rooting phylogenies, especially the "tree of life," generally follow narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Collapse
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
| | - Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
| | - Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
| |
Collapse
|
9
|
McCafferty CL, Sergeev YV. Global computational mutagenesis provides a critical stability framework in protein structures. PLoS One 2017; 12:e0189064. [PMID: 29216252 PMCID: PMC5720693 DOI: 10.1371/journal.pone.0189064] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2017] [Accepted: 11/17/2017] [Indexed: 11/20/2022] Open
Abstract
A protein’s amino acid sequence dictates the folds and final structure the macromolecule will form. We propose that by identifying critical residues in a protein’s atomic structure, we can select a critical stability framework within the protein structure essential to proper protein folding. We use global computational mutagenesis based on the unfolding mutation screen to test the effect of every possible missense mutation on the protein structure to identify the residues that cannot tolerate a substitution without causing protein misfolding. This method was tested using molecular dynamics to simulate the stability effects of mutating critical residues in proteins involved in inherited disease, such as myoglobin, p53, and the 15th sushi domain of complement factor H. In addition we prove that when the critical residues are in place, other residues may be changed within the structure without a stability loss. We validate that critical residues are conserved using myoglobin to show that critical residues are the same for crystal structures of 6 different species and comparing conservation indices to critical residues in 9 eye disease-related proteins. Our studies demonstrate that by using a selection of critical elements in a protein structure we can identify a critical protein stability framework. The frame of critical residues can be used in genetic engineering to improve small molecule binding for drug studies, identify loss-of-function disease-causing missense mutations in genetics studies, and aide in identifying templates for homology modeling.
Collapse
Affiliation(s)
- Caitlyn L. McCafferty
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Yuri V. Sergeev
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail:
| |
Collapse
|
10
|
Malik SS, Azem-E-Zahra S, Kim KM, Caetano-Anollés G, Nasir A. Do Viruses Exchange Genes across Superkingdoms of Life? Front Microbiol 2017; 8:2110. [PMID: 29163404 PMCID: PMC5671483 DOI: 10.3389/fmicb.2017.02110] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2017] [Accepted: 10/16/2017] [Indexed: 12/13/2022] Open
Abstract
Viruses can be classified into archaeoviruses, bacterioviruses, and eukaryoviruses according to the taxonomy of the infected host. The host-constrained perception of viruses implies preference of genetic exchange between viruses and cellular organisms of their host superkingdoms and viral origins from host cells either via escape or reduction. However, viruses frequently establish non-lytic interactions with organisms and endogenize into the genomes of bacterial endosymbionts that reside in eukaryotic cells. Such interactions create opportunities for genetic exchange between viruses and organisms of non-host superkingdoms. Here, we take an atypical approach to revisit virus-cell interactions by first identifying protein fold structures in the proteomes of archaeoviruses, bacterioviruses, and eukaryoviruses and second by tracing their spread in the proteomes of superkingdoms Archaea, Bacteria, and Eukarya. The exercise quantified protein structural homologies between viruses and organisms of their host and non-host superkingdoms and revealed likely candidates for virus-to-cell and cell-to-virus gene transfers. Unexpected lifestyle-driven genetic affiliations between bacterioviruses and Eukarya and eukaryoviruses and Bacteria were also predicted in addition to a large cohort of protein folds that were universally shared by viral and cellular proteomes and virus-specific protein folds not detected in cellular proteomes. These protein folds provide unique insights into viral origins and evolution that are generally difficult to recover with traditional sequence alignment-dependent evolutionary analyses owing to the fast mutation rates of viral gene sequences.
Collapse
Affiliation(s)
- Shahana S Malik
- Department of Biosciences, COMSATS Institute of Information Technology, Islamabad, Pakistan
| | - Syeda Azem-E-Zahra
- Department of Biosciences, COMSATS Institute of Information Technology, Islamabad, Pakistan
| | - Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, South Korea
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| | - Arshan Nasir
- Department of Biosciences, COMSATS Institute of Information Technology, Islamabad, Pakistan.,Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| |
Collapse
|
11
|
Nasir A, Kim KM, Caetano-Anollés G. Phylogenetic Tracings of Proteome Size Support the Gradual Accretion of Protein Structural Domains and the Early Origin of Viruses from Primordial Cells. Front Microbiol 2017; 8:1178. [PMID: 28690608 PMCID: PMC5481351 DOI: 10.3389/fmicb.2017.01178] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Accepted: 06/09/2017] [Indexed: 01/05/2023] Open
Abstract
Untangling the origin and evolution of viruses remains a challenging proposition. We recently studied the global distribution of protein domain structures in thousands of completely sequenced viral and cellular proteomes with comparative genomics, phylogenomics, and multidimensional scaling methods. A tree of life describing the evolution of proteomes revealed viruses emerging from the base of the tree as a fourth supergroup of life. A tree of domains indicated an early origin of modern viral lineages from ancient cells that co-existed with the cellular ancestors. However, it was recently argued that the rooting of our trees and the basal placement of viruses was artifactually induced by small genome (proteome) size. Here we show that these claims arise from misunderstanding and misinterpretations of cladistic methodology. Trees are reconstructed unrooted, and thus, their topologies cannot be distorted a posteriori by the rooting methodology. Tracing proteome size in trees and multidimensional views of evolutionary relationships as well as tests of leaf stability and exclusion/inclusion of taxa demonstrated that the smallest proteomes were neither attracted toward the root nor caused any topological distortions of the trees. Simulations confirmed that taxa clustering patterns were independent of proteome size and were determined by the presence of known evolutionary relatives in data matrices, highlighting the need for broader taxon sampling in phylogeny reconstruction. Instead, phylogenetic tracings of proteome size revealed a slowdown in innovation of the structural domain vocabulary and four regimes of allometric scaling that reflected a Heaps law. These regimes explained increasing economies of scale in the evolutionary growth and accretion of kernel proteome repertoires of viruses and cellular organisms that resemble growth of human languages with limited vocabulary sizes. Results reconcile dynamic and static views of domain frequency distributions that are consistent with the axiom of spatiotemporal continuity that is tenet of evolutionary thinking.
Collapse
Affiliation(s)
- Arshan Nasir
- Department of Biosciences, COMSATS Institute of Information TechnologyIslamabad, Pakistan
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-ChampaignUrbana, IL, United States
| | - Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research InstituteIncheon, South Korea
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-ChampaignUrbana, IL, United States
| |
Collapse
|
12
|
Koç I, Caetano-Anollés G. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data. PLoS One 2017; 12:e0176129. [PMID: 28467492 PMCID: PMC5414959 DOI: 10.1371/journal.pone.0176129] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022] Open
Abstract
The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biomcomplexity and life.
Collapse
Affiliation(s)
- Ibrahim Koç
- Molecular Biology and Genetics, Gebze Technical University, Kocaeli, Turkey
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
| |
Collapse
|
13
|
Nasir A, Caetano-Anollés G. Identification of Capsid/Coat Related Protein Folds and Their Utility for Virus Classification. Front Microbiol 2017; 8:380. [PMID: 28344575 PMCID: PMC5344890 DOI: 10.3389/fmicb.2017.00380] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2016] [Accepted: 02/23/2017] [Indexed: 12/31/2022] Open
Abstract
The viral supergroup includes the entire collection of known and unknown viruses that roam our planet and infect life forms. The supergroup is remarkably diverse both in its genetics and morphology and has historically remained difficult to study and classify. The accumulation of protein structure data in the past few years now provides an excellent opportunity to re-examine the classification and evolution of viruses. Here we scan completely sequenced viral proteomes from all genome types and identify protein folds involved in the formation of viral capsids and virion architectures. Viruses encoding similar capsid/coat related folds were pooled into lineages, after benchmarking against published literature. Remarkably, the in silico exercise reproduced all previously described members of known structure-based viral lineages, along with several proposals for new additions, suggesting it could be a useful supplement to experimental approaches and to aid qualitative assessment of viral diversity in metagenome samples.
Collapse
Affiliation(s)
- Arshan Nasir
- Department of Crop Sciences, Evolutionary Bioinformatics Laboratory, University of Illinois at Urbana-ChampaignUrbana, IL, USA; Department of Biosciences, COMSATS Institute of Information TechnologyIslamabad, Pakistan
| | - Gustavo Caetano-Anollés
- Department of Crop Sciences, Evolutionary Bioinformatics Laboratory, University of Illinois at Urbana-Champaign Urbana, IL, USA
| |
Collapse
|
14
|
Arguments Reinforcing the Three-Domain View of Diversified Cellular Life. ARCHAEA-AN INTERNATIONAL MICROBIOLOGICAL JOURNAL 2016; 2016:1851865. [PMID: 28050162 PMCID: PMC5165138 DOI: 10.1155/2016/1851865] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/16/2016] [Revised: 10/18/2016] [Accepted: 11/03/2016] [Indexed: 11/18/2022]
Abstract
The archaeal ancestor scenario (AAS) for the origin of eukaryotes implies the emergence of a new kind of organism from the fusion of ancestral archaeal and bacterial cells. Equipped with this “chimeric” molecular arsenal, the resulting cell would gradually accumulate unique genes and develop the complex molecular machineries and cellular compartments that are hallmarks of modern eukaryotes. In this regard, proteins related to phagocytosis and cell movement should be present in the archaeal ancestor, thus identifying the recently described candidate archaeal phylum “Lokiarchaeota” as resembling a possible candidate ancestor of eukaryotes. Despite its appeal, AAS seems incompatible with the genomic, molecular, and biochemical differences that exist between Archaea and Eukarya. In particular, the distribution of conserved protein domain structures in the proteomes of cellular organisms and viruses appears hard to reconcile with the AAS. In addition, concerns related to taxon and character sampling, presupposing bacterial outgroups in phylogenies, and nonuniform effects of protein domain structure rearrangement and gain/loss in concatenated alignments of protein sequences cast further doubt on AAS-supporting phylogenies. Here, we evaluate AAS against the traditional “three-domain” world of cellular organisms and propose that the discovery of Lokiarchaeota could be better reconciled under the latter view, especially in light of several additional biological and technical considerations.
Collapse
|
15
|
Caetano-Anollés D, Caetano-Anollés G. Piecemeal Buildup of the Genetic Code, Ribosomes, and Genomes from Primordial tRNA Building Blocks. Life (Basel) 2016; 6:life6040043. [PMID: 27918435 PMCID: PMC5198078 DOI: 10.3390/life6040043] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2016] [Revised: 11/21/2016] [Accepted: 11/29/2016] [Indexed: 01/10/2023] Open
Abstract
The origin of biomolecular machinery likely centered around an ancient and central molecule capable of interacting with emergent macromolecular complexity. tRNA is the oldest and most central nucleic acid molecule of the cell. Its co-evolutionary interactions with aminoacyl-tRNA synthetase protein enzymes define the specificities of the genetic code and those with the ribosome their accurate biosynthetic interpretation. Phylogenetic approaches that focus on molecular structure allow reconstruction of evolutionary timelines that describe the history of RNA and protein structural domains. Here we review phylogenomic analyses that reconstruct the early history of the synthetase enzymes and the ribosome, their interactions with RNA, and the inception of amino acid charging and codon specificities in tRNA that are responsible for the genetic code. We also trace the age of domains and tRNA onto ancient tRNA homologies that were recently identified in rRNA. Our findings reveal a timeline of recruitment of tRNA building blocks for the formation of a functional ribosome, which holds both the biocatalytic functions of protein biosynthesis and the ability to store genetic memory in primordial RNA genomic templates.
Collapse
Affiliation(s)
- Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, 24306 Plön, Germany.
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| |
Collapse
|
16
|
Nasir A, Caetano-Anollés G. A phylogenomic data-driven exploration of viral origins and evolution. SCIENCE ADVANCES 2015; 1:e1500527. [PMID: 26601271 PMCID: PMC4643759 DOI: 10.1126/sciadv.1500527] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/27/2015] [Accepted: 06/30/2015] [Indexed: 05/05/2023]
Abstract
The origin of viruses remains mysterious because of their diverse and patchy molecular and functional makeup. Although numerous hypotheses have attempted to explain viral origins, none is backed by substantive data. We take full advantage of the wealth of available protein structural and functional data to explore the evolution of the proteomic makeup of thousands of cells and viruses. Despite the extremely reduced nature of viral proteomes, we established an ancient origin of the "viral supergroup" and the existence of widespread episodes of horizontal transfer of genetic information. Viruses harboring different replicon types and infecting distantly related hosts shared many metabolic and informational protein structural domains of ancient origin that were also widespread in cellular proteomes. Phylogenomic analysis uncovered a universal tree of life and revealed that modern viruses reduced from multiple ancient cells that harbored segmented RNA genomes and coexisted with the ancestors of modern cells. The model for the origin and evolution of viruses and cells is backed by strong genomic and structural evidence and can be reconciled with existing models of viral evolution if one considers viruses to have originated from ancient cells and not from modern counterparts.
Collapse
|
17
|
Nasir A, Kim KM, Caetano-Anollés G. Viral evolution: Primordial cellular origins and late adaptation to parasitism. Mob Genet Elements 2014; 2:247-252. [PMID: 23550145 PMCID: PMC3575434 DOI: 10.4161/mge.22797] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Explaining the origin of viruses remains an important challenge for evolutionary biology. Previous explanatory frameworks described viruses as founders of cellular life, as parasitic reductive products of ancient cellular organisms or as escapees of modern genomes. Each of these frameworks endow viruses with distinct molecular, cellular, dynamic and emergent properties that carry broad and important implications for many disciplines, including biology, ecology and epidemiology. In a recent genome-wide structural phylogenomic analysis, we have shown that large-to-medium-sized viruses coevolved with cellular ancestors and have chosen the evolutionary reductive route. Here we interpret these results and provide a parsimonious hypothesis for the origin of viruses that is supported by molecular data and objective evolutionary bioinformatic approaches. Results suggest two important phases in the evolution of viruses: (1) origin from primordial cells and coexistence with cellular ancestors, and (2) prolonged pressure of genome reduction and relatively late adaptation to the parasitic lifestyle once virions and diversified cellular life took over the planet. Under this evolutionary model, new viral lineages can evolve from existing cellular parasites and enhance the diversity of the world’s virosphere.
Collapse
Affiliation(s)
- Arshan Nasir
- Department of Crop Science; University of Illinois at Urbana-Champaign; Urbana, IL USA ; Illinois Informatics Institute; University of Illinois at Urbana-Champaign; Urbana, IL USA
| | | | | |
Collapse
|
18
|
A phylogenomic census of molecular functions identifies modern thermophilic archaea as the most ancient form of cellular life. ARCHAEA-AN INTERNATIONAL MICROBIOLOGICAL JOURNAL 2014; 2014:706468. [PMID: 25249790 PMCID: PMC4164138 DOI: 10.1155/2014/706468] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/06/2013] [Revised: 11/20/2013] [Accepted: 01/17/2014] [Indexed: 12/30/2022]
Abstract
The origins of diversified life remain mysterious despite considerable efforts devoted to untangling the roots of the universal tree of life. Here we reconstructed phylogenies that described the evolution of molecular functions and the evolution of species directly from a genomic census of gene ontology (GO) definitions. We sampled 249 free-living genomes spanning organisms in the three superkingdoms of life, Archaea, Bacteria, and Eukarya, and used the abundance of GO terms as molecular characters to produce rooted phylogenetic trees. Results revealed an early thermophilic origin of Archaea that was followed by genome reduction events in microbial superkingdoms. Eukaryal genomes displayed extraordinary functional diversity and were enriched with hundreds of novel molecular activities not detected in the akaryotic microbial cells. Remarkably, the majority of these novel functions appeared quite late in evolution, synchronized with the diversification of the eukaryal superkingdom. The distribution of GO terms in superkingdoms confirms that Archaea appears to be the simplest and most ancient form of cellular life, while Eukarya is the most diverse and recent.
Collapse
|
19
|
Kim KM, Nasir A, Hwang K, Caetano-Anollés G. A tree of cellular life inferred from a genomic census of molecular functions. J Mol Evol 2014; 79:240-62. [PMID: 25128982 DOI: 10.1007/s00239-014-9637-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2014] [Accepted: 08/05/2014] [Indexed: 10/24/2022]
Abstract
Phylogenomics aims to describe evolutionary relatedness between organisms by analyzing genomic data. The common practice is to produce phylogenomic trees from molecular information in the sequence, order, and content of genes in genomes. These phylogenies describe the evolution of life and become valuable tools for taxonomy. The recent availability of structural and functional data for hundreds of genomes now offers the opportunity to study evolution using more deep, conserved, and reliable sets of molecular features. Here, we reconstruct trees of life from the functions of proteins. We start by inferring rooted phylogenomic trees and networks of organisms directly from Gene Ontology annotations. Phylogenies and networks yield novel insights into the emergence and evolution of cellular life. The ancestor of Archaea originated earlier than the ancestors of Bacteria and Eukarya and was thermophilic. In contrast, basal bacterial lineages were non-thermophilic. A close relationship between Plants and Metazoa was also identified that disagrees with the traditional Fungi-Metazoa grouping. While measures of evolutionary reticulation were minimum in Eukarya and maximum in Bacteria, the massive role of horizontal gene transfer in microbes did not materialize in phylogenomic networks. Phylogenies and networks also showed that the best reconstructions were recovered when problematic taxa (i.e., parasitic/symbiotic organisms) and horizontally transferred characters were excluded from analysis. Our results indicate that functionomic data represent a useful addition to the set of molecular characters used for tree reconstruction and that trees of cellular life carry in deep branches considerable predictive power to explain the evolution of living organisms.
Collapse
Affiliation(s)
- Kyung Mo Kim
- Microbial Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, 305-806, Korea
| | | | | | | |
Collapse
|
20
|
Caetano-Anollés G, Nasir A, Zhou K, Caetano-Anollés D, Mittenthal JE, Sun FJ, Kim KM. Archaea: the first domain of diversified life. ARCHAEA (VANCOUVER, B.C.) 2014; 2014:590214. [PMID: 24987307 PMCID: PMC4060292 DOI: 10.1155/2014/590214] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2013] [Revised: 02/15/2014] [Accepted: 03/25/2014] [Indexed: 01/23/2023]
Abstract
The study of the origin of diversified life has been plagued by technical and conceptual difficulties, controversy, and apriorism. It is now popularly accepted that the universal tree of life is rooted in the akaryotes and that Archaea and Eukarya are sister groups to each other. However, evolutionary studies have overwhelmingly focused on nucleic acid and protein sequences, which partially fulfill only two of the three main steps of phylogenetic analysis, formulation of realistic evolutionary models, and optimization of tree reconstruction. In the absence of character polarization, that is, the ability to identify ancestral and derived character states, any statement about the rooting of the tree of life should be considered suspect. Here we show that macromolecular structure and a new phylogenetic framework of analysis that focuses on the parts of biological systems instead of the whole provide both deep and reliable phylogenetic signal and enable us to put forth hypotheses of origin. We review over a decade of phylogenomic studies, which mine information in a genomic census of millions of encoded proteins and RNAs. We show how the use of process models of molecular accumulation that comply with Weston's generality criterion supports a consistent phylogenomic scenario in which the origin of diversified life can be traced back to the early history of Archaea.
Collapse
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Kaiyue Zhou
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Jay E. Mittenthal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Feng-Jie Sun
- School of Science and Technology, Georgia Gwinnett College, Lawrenceville, GA 30043, USA
| | - Kyung Mo Kim
- Microbial Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Republic of Korea
| |
Collapse
|
21
|
Nath N, Mitchell JBO, Caetano-Anollés G. The natural history of biocatalytic mechanisms. PLoS Comput Biol 2014; 10:e1003642. [PMID: 24874434 PMCID: PMC4038463 DOI: 10.1371/journal.pcbi.1003642] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2013] [Accepted: 04/09/2014] [Indexed: 11/29/2022] Open
Abstract
Phylogenomic analysis of the occurrence and abundance of protein domains in proteomes has recently showed that the α/β architecture is probably the oldest fold design. This holds important implications for the origins of biochemistry. Here we explore structure-function relationships addressing the use of chemical mechanisms by ancestral enzymes. We test the hypothesis that the oldest folds used the most mechanisms. We start by tracing biocatalytic mechanisms operating in metabolic enzymes along a phylogenetic timeline of the first appearance of homologous superfamilies of protein domain structures from CATH. A total of 335 enzyme reactions were retrieved from MACiE and were mapped over fold age. We define a mechanistic step type as one of the 51 mechanistic annotations given in MACiE, and each step of each of the 335 mechanisms was described using one or more of these annotations. We find that the first two folds, the P-loop containing nucleotide triphosphate hydrolase and the NAD(P)-binding Rossmann-like homologous superfamilies, were α/β architectures responsible for introducing 35% (18/51) of the known mechanistic step types. We find that these two oldest structures in the phylogenomic analysis of protein domains introduced many mechanistic step types that were later combinatorially spread in catalytic history. The most common mechanistic step types included fundamental building blocks of enzyme chemistry: "Proton transfer," "Bimolecular nucleophilic addition," "Bimolecular nucleophilic substitution," and "Unimolecular elimination by the conjugate base." They were associated with the most ancestral fold structure typical of P-loop containing nucleotide triphosphate hydrolases. Over half of the mechanistic step types were introduced in the evolutionary timeline before the appearance of structures specific to diversified organisms, during a period of architectural diversification. The other half unfolded gradually after organismal diversification and during a period that spanned ∼2 billion years of evolutionary history.
Collapse
Affiliation(s)
- Neetika Nath
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, University of St. Andrews, North Haugh, St. Andrews, Scotland, United Kingdom
| | - John B. O. Mitchell
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, University of St. Andrews, North Haugh, St. Andrews, Scotland, United Kingdom
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
| |
Collapse
|
22
|
Global patterns of protein domain gain and loss in superkingdoms. PLoS Comput Biol 2014; 10:e1003452. [PMID: 24499935 PMCID: PMC3907288 DOI: 10.1371/journal.pcbi.1003452] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2013] [Accepted: 12/03/2013] [Indexed: 12/21/2022] Open
Abstract
Domains are modules within proteins that can fold and function independently and are evolutionarily conserved. Here we compared the usage and distribution of protein domain families in the free-living proteomes of Archaea, Bacteria and Eukarya and reconstructed species phylogenies while tracing the history of domain emergence and loss in proteomes. We show that both gains and losses of domains occurred frequently during proteome evolution. The rate of domain discovery increased approximately linearly in evolutionary time. Remarkably, gains generally outnumbered losses and the gain-to-loss ratios were much higher in akaryotes compared to eukaryotes. Functional annotations of domain families revealed that both Archaea and Bacteria gained and lost metabolic capabilities during the course of evolution while Eukarya acquired a number of diverse molecular functions including those involved in extracellular processes, immunological mechanisms, and cell regulation. Results also highlighted significant contemporary sharing of informational enzymes between Archaea and Eukarya and metabolic enzymes between Bacteria and Eukarya. Finally, the analysis provided useful insights into the evolution of species. The archaeal superkingdom appeared first in evolution by gradual loss of ancestral domains, bacterial lineages were the first to gain superkingdom-specific domains, and eukaryotes (likely) originated when an expanding proto-eukaryotic stem lineage gained organelles through endosymbiosis of already diversified bacterial lineages. The evolutionary dynamics of domain families in proteomes and the increasing number of domain gains is predicted to redefine the persistence strategies of organisms in superkingdoms, influence the make up of molecular functions, and enhance organismal complexity by the generation of new domain architectures. This dynamics highlights ongoing secondary evolutionary adaptations in akaryotic microbes, especially Archaea.
Collapse
|
23
|
Comparative analysis of proteomes and functionomes provides insights into origins of cellular diversification. ARCHAEA-AN INTERNATIONAL MICROBIOLOGICAL JOURNAL 2013; 2013:648746. [PMID: 24492748 PMCID: PMC3892558 DOI: 10.1155/2013/648746] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2013] [Revised: 11/22/2013] [Accepted: 11/25/2013] [Indexed: 11/22/2022]
Abstract
Reconstructing the evolutionary history of modern species is a difficult problem complicated by the conceptual and technical limitations of phylogenetic tree building methods. Here, we propose a comparative proteomic and functionomic inferential framework for genome evolution that allows resolving the tripartite division of cells and sketching their history. Evolutionary inferences were derived from the spread of conserved molecular features, such as molecular structures and functions, in the proteomes and functionomes of contemporary organisms. Patterns of use and reuse of these traits yielded significant insights into the origins of cellular diversification. Results uncovered an unprecedented strong evolutionary association between Bacteria and Eukarya while revealing marked evolutionary reductive tendencies in the archaeal genomic repertoires. The effects of nonvertical evolutionary processes (e.g., HGT, convergent evolution) were found to be limited while reductive evolution and molecular innovation appeared to be prevalent during the evolution of cells. Our study revealed a strong vertical trace in the history of proteins and associated molecular functions, which was reliably recovered using the comparative genomics approach. The trace supported the existence of a stem line of descent and the very early appearance of Archaea as a diversified superkingdom, but failed to uncover a hidden canonical pattern in which Bacteria was the first superkingdom to deploy superkingdom-specific structures and functions.
Collapse
|
24
|
Caetano-Anollés G, Wang M, Caetano-Anollés D. Structural phylogenomics retrodicts the origin of the genetic code and uncovers the evolutionary impact of protein flexibility. PLoS One 2013; 8:e72225. [PMID: 23991065 PMCID: PMC3749098 DOI: 10.1371/journal.pone.0072225] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2013] [Accepted: 07/07/2013] [Indexed: 11/18/2022] Open
Abstract
The genetic code shapes the genetic repository. Its origin has puzzled molecular scientists for over half a century and remains a long-standing mystery. Here we show that the origin of the genetic code is tightly coupled to the history of aminoacyl-tRNA synthetase enzymes and their interactions with tRNA. A timeline of evolutionary appearance of protein domain families derived from a structural census in hundreds of genomes reveals the early emergence of the 'operational' RNA code and the late implementation of the standard genetic code. The emergence of codon specificities and amino acid charging involved tight coevolution of aminoacyl-tRNA synthetases and tRNA structures as well as episodes of structural recruitment. Remarkably, amino acid and dipeptide compositions of single-domain proteins appearing before the standard code suggest archaic synthetases with structures homologous to catalytic domains of tyrosyl-tRNA and seryl-tRNA synthetases were capable of peptide bond formation and aminoacylation. Results reveal that genetics arose through coevolutionary interactions between polypeptides and nucleic acid cofactors as an exacting mechanism that favored flexibility and folding of the emergent proteins. These enhancements of phenotypic robustness were likely internalized into the emerging genetic system with the early rise of modern protein structure.
Collapse
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- * E-mail:
| | - Minglei Wang
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
| | - Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
| |
Collapse
|
25
|
Bukhari SA, Caetano-Anollés G. Origin and evolution of protein fold designs inferred from phylogenomic analysis of CATH domain structures in proteomes. PLoS Comput Biol 2013; 9:e1003009. [PMID: 23555236 PMCID: PMC3610613 DOI: 10.1371/journal.pcbi.1003009] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2012] [Accepted: 02/13/2013] [Indexed: 12/22/2022] Open
Abstract
The spatial arrangements of secondary structures in proteins, irrespective of their connectivity, depict the overall shape and organization of protein domains. These features have been used in the CATH and SCOP classifications to hierarchically partition fold space and define the architectural make up of proteins. Here we use phylogenomic methods and a census of CATH structures in hundreds of genomes to study the origin and diversification of protein architectures (A) and their associated topologies (T) and superfamilies (H). Phylogenies that describe the evolution of domain structures and proteomes were reconstructed from the structural census and used to generate timelines of domain discovery. Phylogenies of CATH domains at T and H levels of structural abstraction and associated chronologies revealed patterns of reductive evolution, the early rise of Archaea, three epochs in the evolution of the protein world, and patterns of structural sharing between superkingdoms. Phylogenies of proteomes confirmed the early appearance of Archaea. While these findings are in agreement with previous phylogenomic studies based on the SCOP classification, phylogenies unveiled sharing patterns between Archaea and Eukarya that are recent and can explain the canonical bacterial rooting typically recovered from sequence analysis. Phylogenies of CATH domains at A level uncovered general patterns of architectural origin and diversification. The tree of A structures showed that ancient structural designs such as the 3-layer (αβα) sandwich (3.40) or the orthogonal bundle (1.10) are comparatively simpler in their makeup and are involved in basic cellular functions. In contrast, modern structural designs such as prisms, propellers, 2-solenoid, super-roll, clam, trefoil and box are not widely distributed and were probably adopted to perform specialized functions. Our timelines therefore uncover a universal tendency towards protein structural complexity that is remarkable.
Collapse
Affiliation(s)
- Syed Abbas Bukhari
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
| |
Collapse
|
26
|
Caetano-Anollés K, Caetano-Anollés G. Structural phylogenomics reveals gradual evolutionary replacement of abiotic chemistries by protein enzymes in purine metabolism. PLoS One 2013; 8:e59300. [PMID: 23516625 PMCID: PMC3596326 DOI: 10.1371/journal.pone.0059300] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2012] [Accepted: 02/13/2013] [Indexed: 11/30/2022] Open
Abstract
The origin of metabolism has been linked to abiotic chemistries that existed in our planet at the beginning of life. While plausible chemical pathways have been proposed, including the synthesis of nucleobases, ribose and ribonucleotides, the cooption of these reactions by modern enzymes remains shrouded in mystery. Here we study the emergence of purine metabolism. The ages of protein domains derived from a census of fold family structure in hundreds of genomes were mapped onto enzymes in metabolic diagrams. We find that the origin of the nucleotide interconversion pathway benefited most parsimoniously from the prebiotic formation of adenine nucleosides. In turn, pathways of nucleotide biosynthesis, catabolism and salvage originated ∼300 million years later by concerted enzymatic recruitments and gradual replacement of abiotic chemistries. Remarkably, this process led to the emergence of the fully enzymatic biosynthetic pathway ∼3 billion years ago, concurrently with the appearance of a functional ribosome. The simultaneous appearance of purine biosynthesis and the ribosome probably fulfilled the expanding matter-energy and processing needs of genomic information.
Collapse
Affiliation(s)
- Kelsey Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Chicago School of Professional Psychology, Chicago, Illinois, United States of America
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- * E-mail:
| |
Collapse
|