1
|
Mughal F, Nasir A, Caetano-Anollés G. The origin and evolution of viruses inferred from fold family structure. Arch Virol 2020; 165:2177-2191. [PMID: 32748179 PMCID: PMC7398281 DOI: 10.1007/s00705-020-04724-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2020] [Accepted: 05/30/2020] [Indexed: 12/16/2022]
Abstract
The canonical frameworks of viral evolution describe viruses as cellular predecessors, reduced forms of cells, or entities that escaped cellular control. The discovery of giant viruses has changed these standard paradigms. Their genetic, proteomic and structural complexities resemble those of cells, prompting a redefinition and reclassification of viruses. In a previous genome-wide analysis of the evolution of structural domains in proteomes, with domains defined at the fold superfamily level, we found the origins of viruses intertwined with those of ancient cells. Here, we extend these data-driven analyses to the study of fold families confirming the co-evolution of viruses and ancient cells and the genetic ability of viruses to foster molecular innovation. The results support our suggestion that viruses arose by genomic reduction from ancient cells and validate a co-evolutionary ‘symbiogenic’ model of viral origins.
Collapse
Affiliation(s)
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Arshan Nasir
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
| |
Collapse
|
2
|
Bokhari RH, Amirjan N, Jeong H, Kim KM, Caetano-Anollés G, Nasir A. Bacterial Origin and Reductive Evolution of the CPR Group. Genome Biol Evol 2020; 12:103-121. [PMID: 32031619 PMCID: PMC7093835 DOI: 10.1093/gbe/evaa024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/31/2020] [Indexed: 12/24/2022] Open
Abstract
The candidate phyla radiation (CPR) is a proposed subdivision within the bacterial domain comprising several candidate phyla. CPR organisms are united by small genome and physical sizes, lack several metabolic enzymes, and populate deep branches within the bacterial subtree of life. These features raise intriguing questions regarding their origin and mode of evolution. In this study, we performed a comparative and phylogenomic analysis to investigate CPR origin and evolution. Unlike previous gene/protein sequence-based reports of CPR evolution, we used protein domain superfamilies classified by protein structure databases to resolve the evolutionary relationships of CPR with non-CPR bacteria, Archaea, Eukarya, and viruses. Across all supergroups, CPR shared maximum superfamilies with non-CPR bacteria and were placed as deep branching bacteria in most phylogenomic trees. CPR contributed 1.22% of new superfamilies to bacteria including the ribosomal protein L19e and encoded four core superfamilies that are likely involved in cell-to-cell interaction and establishing episymbiotic lifestyles. Although CPR and non-CPR bacterial proteomes gained common superfamilies over the course of evolution, CPR and Archaea had more common losses. These losses mostly involved metabolic superfamilies. In fact, phylogenies built from only metabolic protein superfamilies separated CPR and non-CPR bacteria. These findings indicate that CPR are bacterial organisms that have probably evolved in an Archaea-like manner via the early loss of metabolic functions. We also discovered that phylogenies built from metabolic and informational superfamilies gave contrasting views of the groupings among Archaea, Bacteria, and Eukarya, which add to the current debate on the evolutionary relationships among superkingdoms.
Collapse
Affiliation(s)
| | - Nooreen Amirjan
- Department of Biosciences, COMSATS University Islamabad, Pakistan
| | - Hyeonsoo Jeong
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA
| | - Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana
| | - Arshan Nasir
- Department of Biosciences, COMSATS University Islamabad, Pakistan
- Theoretical Biology & Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico
| |
Collapse
|
3
|
Mughal F, Caetano-Anollés G. MANET 3.0: Hierarchy and modularity in evolving metabolic networks. PLoS One 2019; 14:e0224201. [PMID: 31648227 PMCID: PMC6812854 DOI: 10.1371/journal.pone.0224201] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2019] [Accepted: 10/08/2019] [Indexed: 11/30/2022] Open
Abstract
Enzyme recruitment is a fundamental evolutionary driver of modern metabolism. We see evidence of recruitment at work in the metabolic Molecular Ancestry Networks (MANET) database, an online resource that integrates data from KEGG, SCOP and structural phylogenomic reconstruction. The database, which was introduced in 2006, traces the deep history of the structural domains of enzymes in metabolic pathways. Here we release version 3.0 of MANET, which updates data from KEGG and SCOP, links enzyme and PDB information with PDBsum, and traces evolutionary information of domains defined at fold family level of SCOP classification in metabolic subnetwork diagrams. Compared to SCOP folds used in the previous versions, fold families are cohesive units of functional similarity that are highly conserved at sequence level and offer a 10-fold increase of data entries. We surveyed enzymatic, functional and catalytic site distributions among superkingdoms showing that ancient enzymatic innovations followed a biphasic temporal pattern of diversification typical of module innovation. We grouped enzymatic activities of MANET into a hierarchical system of subnetworks and mesonetworks matching KEGG classification. The evolutionary growth of these modules of metabolic activity was studied using bipartite networks and their one-mode projections at enzyme, subnetwork and mesonetwork levels of organization. Evolving metabolic networks revealed patterns of enzyme sharing that transcended mesonetwork boundaries and supported the patchwork model of metabolic evolution. We also explored the scale-freeness, randomness and small-world properties of evolving networks as possible organizing principles of network growth and diversification. The network structure shows an increase in hierarchical modularity and scale-free behavior as metabolic networks unfold in evolutionary time. Remarkably, this evolutionary constraint on structure was stronger at lower levels of metabolic organization. Evolving metabolic structure reveals a 'principle of granularity', an evolutionary increase of the cohesiveness of lower-level parts of a hierarchical system. MANET is available at http://manet.illinois.edu.
Collapse
Affiliation(s)
- Fizza Mughal
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Gustavo Caetano-Anollés
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| |
Collapse
|
4
|
Nasir A, Kim KM, Caetano-Anollés G. Phylogenetic Tracings of Proteome Size Support the Gradual Accretion of Protein Structural Domains and the Early Origin of Viruses from Primordial Cells. Front Microbiol 2017; 8:1178. [PMID: 28690608 PMCID: PMC5481351 DOI: 10.3389/fmicb.2017.01178] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Accepted: 06/09/2017] [Indexed: 01/05/2023] Open
Abstract
Untangling the origin and evolution of viruses remains a challenging proposition. We recently studied the global distribution of protein domain structures in thousands of completely sequenced viral and cellular proteomes with comparative genomics, phylogenomics, and multidimensional scaling methods. A tree of life describing the evolution of proteomes revealed viruses emerging from the base of the tree as a fourth supergroup of life. A tree of domains indicated an early origin of modern viral lineages from ancient cells that co-existed with the cellular ancestors. However, it was recently argued that the rooting of our trees and the basal placement of viruses was artifactually induced by small genome (proteome) size. Here we show that these claims arise from misunderstanding and misinterpretations of cladistic methodology. Trees are reconstructed unrooted, and thus, their topologies cannot be distorted a posteriori by the rooting methodology. Tracing proteome size in trees and multidimensional views of evolutionary relationships as well as tests of leaf stability and exclusion/inclusion of taxa demonstrated that the smallest proteomes were neither attracted toward the root nor caused any topological distortions of the trees. Simulations confirmed that taxa clustering patterns were independent of proteome size and were determined by the presence of known evolutionary relatives in data matrices, highlighting the need for broader taxon sampling in phylogeny reconstruction. Instead, phylogenetic tracings of proteome size revealed a slowdown in innovation of the structural domain vocabulary and four regimes of allometric scaling that reflected a Heaps law. These regimes explained increasing economies of scale in the evolutionary growth and accretion of kernel proteome repertoires of viruses and cellular organisms that resemble growth of human languages with limited vocabulary sizes. Results reconcile dynamic and static views of domain frequency distributions that are consistent with the axiom of spatiotemporal continuity that is tenet of evolutionary thinking.
Collapse
Affiliation(s)
- Arshan Nasir
- Department of Biosciences, COMSATS Institute of Information TechnologyIslamabad, Pakistan
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-ChampaignUrbana, IL, United States
| | - Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research InstituteIncheon, South Korea
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-ChampaignUrbana, IL, United States
| |
Collapse
|
5
|
Koç I, Caetano-Anollés G. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data. PLoS One 2017; 12:e0176129. [PMID: 28467492 PMCID: PMC5414959 DOI: 10.1371/journal.pone.0176129] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022] Open
Abstract
The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biomcomplexity and life.
Collapse
Affiliation(s)
- Ibrahim Koç
- Molecular Biology and Genetics, Gebze Technical University, Kocaeli, Turkey
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
| |
Collapse
|
6
|
A Dynamic Model for the Evolution of Protein Structure. J Mol Evol 2016; 82:230-43. [PMID: 27146880 DOI: 10.1007/s00239-016-9740-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2015] [Accepted: 04/12/2016] [Indexed: 10/21/2022]
Abstract
Domains are folded structures and evolutionary building blocks of protein molecules. Their three-dimensional atomic conformations, which define biological functions, can be coarse-grained into levels of a hierarchy. Here we build global dynamical models for the evolution of domains at fold and fold superfamily (FSF) levels. We fit the models with data from phylogenomic trees of domain structures and evaluate the distributions of the resulting parameters and their implications. The trees were inferred from a census of domain structures in hundreds of genomes from all three superkingdoms of life. The models used birth-death differential equations with the global abundances of structures as state variables, with one set of equations for folds and another for FSFs. Only the transitions present in the tree are assumed possible. Each fold or FSF diversifies in variants, eventually producing a new fold or FSF. The parameters specify rates of generation of variants and of new folds or FSFs. The equations were solved for the parameters by simplifying the trees to a comb-like topology, treating branches as emerging directly from a trunk. We found that the rate constants for folds and FSFs evolved similarly. These parameters showed a sharp transient change at about 1.5 Gyrs ago. This time coincides with a period in which domains massively combined in proteins and their arrangements distributed in novel lineages during the rise of organismal diversification. Our simulations suggest that exploration of protein structure space occurs through coarse-grained discoveries that undergo fine-grained elaboration.
Collapse
|
7
|
Jeong H, Sung S, Kwon T, Seo M, Caetano-Anollés K, Choi SH, Cho S, Nasir A, Kim H. HGTree: database of horizontally transferred genes determined by tree reconciliation. Nucleic Acids Res 2015; 44:D610-9. [PMID: 26578597 PMCID: PMC4702880 DOI: 10.1093/nar/gkv1245] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2015] [Accepted: 11/01/2015] [Indexed: 01/13/2023] Open
Abstract
The HGTree database provides putative genome-wide horizontal gene transfer (HGT) information for 2472 completely sequenced prokaryotic genomes. This task is accomplished by reconstructing approximate maximum likelihood phylogenetic trees for each orthologous gene and corresponding 16S rRNA reference species sets and then reconciling the two trees under parsimony framework. The tree reconciliation method is generally considered to be a reliable way to detect HGT events but its practical use has remained limited because the method is computationally intensive and conceptually challenging. In this regard, HGTree (http://hgtree.snu.ac.kr) represents a useful addition to the biological community and enables quick and easy retrieval of information for HGT-acquired genes to better understand microbial taxonomy and evolution. The database is freely available and can be easily scaled and updated to keep pace with the rapid rise in genomic information.
Collapse
Affiliation(s)
- Hyeonsoo Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Kwan-ak St. 599, Kwan-ak Gu, Seoul, 151-741, Republic of Korea Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA
| | - Samsun Sung
- C&K genomics, Main Bldg. #514, SNU Research Park, Seoul 151-919, Republic of Korea
| | - Taehyung Kwon
- Department of Agricultural Biotechnology, Seoul National University, Seoul 151-742, Republic of Korea
| | - Minseok Seo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Kwan-ak St. 599, Kwan-ak Gu, Seoul, 151-741, Republic of Korea
| | | | - Sang Ho Choi
- National Research Laboratory of Molecular Microbiology and Toxicology, Department of Agricultural Biotechnology, Center for Food Safety and Toxicology, Seoul National University, Seoul 151-921, Republic of Korea
| | - Seoae Cho
- C&K genomics, Main Bldg. #514, SNU Research Park, Seoul 151-919, Republic of Korea
| | - Arshan Nasir
- Department of Biosciences, COMSATS Institute of Information Technology, Park Road, Chak Shahzad, Islamabad 45550, Pakistan
| | - Heebal Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Kwan-ak St. 599, Kwan-ak Gu, Seoul, 151-741, Republic of Korea Department of Agricultural Biotechnology, Seoul National University, Seoul 151-742, Republic of Korea
| |
Collapse
|
8
|
Nasir A, Caetano-Anollés G. A phylogenomic data-driven exploration of viral origins and evolution. SCIENCE ADVANCES 2015; 1:e1500527. [PMID: 26601271 PMCID: PMC4643759 DOI: 10.1126/sciadv.1500527] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/27/2015] [Accepted: 06/30/2015] [Indexed: 05/05/2023]
Abstract
The origin of viruses remains mysterious because of their diverse and patchy molecular and functional makeup. Although numerous hypotheses have attempted to explain viral origins, none is backed by substantive data. We take full advantage of the wealth of available protein structural and functional data to explore the evolution of the proteomic makeup of thousands of cells and viruses. Despite the extremely reduced nature of viral proteomes, we established an ancient origin of the "viral supergroup" and the existence of widespread episodes of horizontal transfer of genetic information. Viruses harboring different replicon types and infecting distantly related hosts shared many metabolic and informational protein structural domains of ancient origin that were also widespread in cellular proteomes. Phylogenomic analysis uncovered a universal tree of life and revealed that modern viruses reduced from multiple ancient cells that harbored segmented RNA genomes and coexisted with the ancestors of modern cells. The model for the origin and evolution of viruses and cells is backed by strong genomic and structural evidence and can be reconciled with existing models of viral evolution if one considers viruses to have originated from ancient cells and not from modern counterparts.
Collapse
|
9
|
Shahzad K, Mittenthal JE, Caetano-Anollés G. The organization of domains in proteins obeys Menzerath-Altmann's law of language. BMC SYSTEMS BIOLOGY 2015; 9:44. [PMID: 26260760 PMCID: PMC4531524 DOI: 10.1186/s12918-015-0192-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/25/2015] [Accepted: 07/30/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The combination of domains in multidomain proteins enhances their function and structure but lengthens the molecules and increases their cost at cellular level. METHODS The dependence of domain length on the number of domains a protein holds was surveyed for a set of 60 proteomes representing free-living organisms from all kingdoms of life. Distributions were fitted using non-linear functions and fitted parameters interpreted with a formulation of decreasing returns. RESULTS We find that domain length decreases with increasing number of domains in proteins, following the Menzerath-Altmann (MA) law of language. Highly significant negative correlations exist for the set of proteomes examined. Mathematically, the MA law expresses as a power law relationship that unfolds when molecular persistence P is a function of domain accretion. P holds two terms, one reflecting the matter-energy cost of adding domains and extending their length, the other reflecting how domain length and number impinges on information and biophysics. The pattern of diminishing returns can therefore be explained as a frustrated interplay between the strategies of economy, flexibility and robustness, matching previously observed trade-offs in the domain makeup of proteomes. Proteomes of Archaea, Fungi and to a lesser degree Plants show the largest push towards molecular economy, each at their own economic stratum. Fungi increase domain size in single domain proteins while reinforcing the pattern of diminishing returns. In contrast, Metazoa, and to lesser degrees Protista and Bacteria, relax economy. Metazoa achieves maximum flexibility and robustness by harboring compact molecules and complex domain organization, offering a new functional vocabulary for molecular biology. CONCLUSIONS The tendency of parts to decrease their size when systems enlarge is universal for language and music, and now for parts of macromolecules, extending the MA law to natural systems.
Collapse
Affiliation(s)
| | - Jay E Mittenthal
- Department of Cell and Developmental Biology, Urbana, IL, 61801, USA.
| | - Gustavo Caetano-Anollés
- Illinois Informatics Institute, Urbana, IL, 61801, USA. .,Department of Crop Sciences, Evolutionary Bioinformatics Laboratory, University of Illinois, 332 NSRC, Urbana, IL, 61801, USA.
| |
Collapse
|
10
|
Pinard D, Mizrachi E, Hefer CA, Kersting AR, Joubert F, Douglas CJ, Mansfield SD, Myburg AA. Comparative analysis of plant carbohydrate active enZymes and their role in xylogenesis. BMC Genomics 2015; 16:402. [PMID: 25994181 PMCID: PMC4440533 DOI: 10.1186/s12864-015-1571-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2014] [Accepted: 04/23/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Carbohydrate metabolism is a key feature of vascular plant architecture, and is of particular importance in large woody species, where lignocellulosic biomass is responsible for bearing the bulk of the stem and crown. Since Carbohydrate Active enZymes (CAZymes) in plants are responsible for the synthesis, modification and degradation of carbohydrate biopolymers, the differences in gene copy number and regulation between woody and herbaceous species have been highlighted previously. There are still many unanswered questions about the role of CAZymes in land plant evolution and the formation of wood, a strong carbohydrate sink. RESULTS Here, twenty-two publically available plant genomes were used to characterize the frequency, diversity and complexity of CAZymes in plants. We find that a conserved suite of CAZymes is a feature of land plant evolution, with similar diversity and complexity regardless of growth habit and form. In addition, we compared the diversity and levels of CAZyme gene expression during wood formation in trees using mRNA-seq data from two distantly related angiosperm tree species Eucalyptus grandis and Populus trichocarpa, highlighting the major CAZyme classes involved in xylogenesis and lignocellulosic biomass production. CONCLUSIONS CAZyme domain ratio across embryophytes is maintained, and the diversity of CAZyme domains is similar in all land plants, regardless of woody habit. The stoichiometric conservation of gene expression in woody and non-woody tissues of Eucalyptus and Populus are indicative of gene balance preservation.
Collapse
Affiliation(s)
- Desre Pinard
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private bag X20 Hatfield, Pretoria, 0028, South Africa.
| | - Eshchar Mizrachi
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private bag X20 Hatfield, Pretoria, 0028, South Africa.
| | - Charles A Hefer
- Centre for Bioinformatics and Computational Biology, Genomics Research Institute (GRI), University of Pretoria, Private bag X20 Hatfield, Pretoria, 0028, South Africa.
| | - Anna R Kersting
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, Hufferstr. 1, Munster, D48149, Germany.
| | - Fourie Joubert
- Centre for Bioinformatics and Computational Biology, Genomics Research Institute (GRI), University of Pretoria, Private bag X20 Hatfield, Pretoria, 0028, South Africa.
| | - Carl J Douglas
- Department of Botany, University of British Columbia, 6270 University Blvd, Vancouver, BC, V6T 1Z4, Canada.
| | - Shawn D Mansfield
- Department of Wood Science, University of British Columbia, 2424 Main Mall, Vancouver, BC, V6T 1Z4, Canada.
| | - Alexander A Myburg
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private bag X20 Hatfield, Pretoria, 0028, South Africa.
| |
Collapse
|
11
|
Nasir A, Kim KM, Caetano-Anollés G. Viral evolution: Primordial cellular origins and late adaptation to parasitism. Mob Genet Elements 2014; 2:247-252. [PMID: 23550145 PMCID: PMC3575434 DOI: 10.4161/mge.22797] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Explaining the origin of viruses remains an important challenge for evolutionary biology. Previous explanatory frameworks described viruses as founders of cellular life, as parasitic reductive products of ancient cellular organisms or as escapees of modern genomes. Each of these frameworks endow viruses with distinct molecular, cellular, dynamic and emergent properties that carry broad and important implications for many disciplines, including biology, ecology and epidemiology. In a recent genome-wide structural phylogenomic analysis, we have shown that large-to-medium-sized viruses coevolved with cellular ancestors and have chosen the evolutionary reductive route. Here we interpret these results and provide a parsimonious hypothesis for the origin of viruses that is supported by molecular data and objective evolutionary bioinformatic approaches. Results suggest two important phases in the evolution of viruses: (1) origin from primordial cells and coexistence with cellular ancestors, and (2) prolonged pressure of genome reduction and relatively late adaptation to the parasitic lifestyle once virions and diversified cellular life took over the planet. Under this evolutionary model, new viral lineages can evolve from existing cellular parasites and enhance the diversity of the world’s virosphere.
Collapse
Affiliation(s)
- Arshan Nasir
- Department of Crop Science; University of Illinois at Urbana-Champaign; Urbana, IL USA ; Illinois Informatics Institute; University of Illinois at Urbana-Champaign; Urbana, IL USA
| | | | | |
Collapse
|
12
|
A phylogenomic census of molecular functions identifies modern thermophilic archaea as the most ancient form of cellular life. ARCHAEA-AN INTERNATIONAL MICROBIOLOGICAL JOURNAL 2014; 2014:706468. [PMID: 25249790 PMCID: PMC4164138 DOI: 10.1155/2014/706468] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/06/2013] [Revised: 11/20/2013] [Accepted: 01/17/2014] [Indexed: 12/30/2022]
Abstract
The origins of diversified life remain mysterious despite considerable efforts devoted to untangling the roots of the universal tree of life. Here we reconstructed phylogenies that described the evolution of molecular functions and the evolution of species directly from a genomic census of gene ontology (GO) definitions. We sampled 249 free-living genomes spanning organisms in the three superkingdoms of life, Archaea, Bacteria, and Eukarya, and used the abundance of GO terms as molecular characters to produce rooted phylogenetic trees. Results revealed an early thermophilic origin of Archaea that was followed by genome reduction events in microbial superkingdoms. Eukaryal genomes displayed extraordinary functional diversity and were enriched with hundreds of novel molecular activities not detected in the akaryotic microbial cells. Remarkably, the majority of these novel functions appeared quite late in evolution, synchronized with the diversification of the eukaryal superkingdom. The distribution of GO terms in superkingdoms confirms that Archaea appears to be the simplest and most ancient form of cellular life, while Eukarya is the most diverse and recent.
Collapse
|
13
|
Kim KM, Nasir A, Hwang K, Caetano-Anollés G. A tree of cellular life inferred from a genomic census of molecular functions. J Mol Evol 2014; 79:240-62. [PMID: 25128982 DOI: 10.1007/s00239-014-9637-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2014] [Accepted: 08/05/2014] [Indexed: 10/24/2022]
Abstract
Phylogenomics aims to describe evolutionary relatedness between organisms by analyzing genomic data. The common practice is to produce phylogenomic trees from molecular information in the sequence, order, and content of genes in genomes. These phylogenies describe the evolution of life and become valuable tools for taxonomy. The recent availability of structural and functional data for hundreds of genomes now offers the opportunity to study evolution using more deep, conserved, and reliable sets of molecular features. Here, we reconstruct trees of life from the functions of proteins. We start by inferring rooted phylogenomic trees and networks of organisms directly from Gene Ontology annotations. Phylogenies and networks yield novel insights into the emergence and evolution of cellular life. The ancestor of Archaea originated earlier than the ancestors of Bacteria and Eukarya and was thermophilic. In contrast, basal bacterial lineages were non-thermophilic. A close relationship between Plants and Metazoa was also identified that disagrees with the traditional Fungi-Metazoa grouping. While measures of evolutionary reticulation were minimum in Eukarya and maximum in Bacteria, the massive role of horizontal gene transfer in microbes did not materialize in phylogenomic networks. Phylogenies and networks also showed that the best reconstructions were recovered when problematic taxa (i.e., parasitic/symbiotic organisms) and horizontally transferred characters were excluded from analysis. Our results indicate that functionomic data represent a useful addition to the set of molecular characters used for tree reconstruction and that trees of cellular life carry in deep branches considerable predictive power to explain the evolution of living organisms.
Collapse
Affiliation(s)
- Kyung Mo Kim
- Microbial Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, 305-806, Korea
| | | | | | | |
Collapse
|
14
|
Kim KM, Nasir A, Caetano-Anollés G. The importance of using realistic evolutionary models for retrodicting proteomes. Biochimie 2014; 99:129-37. [DOI: 10.1016/j.biochi.2013.11.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2013] [Accepted: 11/22/2013] [Indexed: 01/16/2023]
|
15
|
Global patterns of protein domain gain and loss in superkingdoms. PLoS Comput Biol 2014; 10:e1003452. [PMID: 24499935 PMCID: PMC3907288 DOI: 10.1371/journal.pcbi.1003452] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2013] [Accepted: 12/03/2013] [Indexed: 12/21/2022] Open
Abstract
Domains are modules within proteins that can fold and function independently and are evolutionarily conserved. Here we compared the usage and distribution of protein domain families in the free-living proteomes of Archaea, Bacteria and Eukarya and reconstructed species phylogenies while tracing the history of domain emergence and loss in proteomes. We show that both gains and losses of domains occurred frequently during proteome evolution. The rate of domain discovery increased approximately linearly in evolutionary time. Remarkably, gains generally outnumbered losses and the gain-to-loss ratios were much higher in akaryotes compared to eukaryotes. Functional annotations of domain families revealed that both Archaea and Bacteria gained and lost metabolic capabilities during the course of evolution while Eukarya acquired a number of diverse molecular functions including those involved in extracellular processes, immunological mechanisms, and cell regulation. Results also highlighted significant contemporary sharing of informational enzymes between Archaea and Eukarya and metabolic enzymes between Bacteria and Eukarya. Finally, the analysis provided useful insights into the evolution of species. The archaeal superkingdom appeared first in evolution by gradual loss of ancestral domains, bacterial lineages were the first to gain superkingdom-specific domains, and eukaryotes (likely) originated when an expanding proto-eukaryotic stem lineage gained organelles through endosymbiosis of already diversified bacterial lineages. The evolutionary dynamics of domain families in proteomes and the increasing number of domain gains is predicted to redefine the persistence strategies of organisms in superkingdoms, influence the make up of molecular functions, and enhance organismal complexity by the generation of new domain architectures. This dynamics highlights ongoing secondary evolutionary adaptations in akaryotic microbes, especially Archaea.
Collapse
|
16
|
rpoB gene as a novel molecular marker to infer phylogeny in Planctomycetales. Antonie van Leeuwenhoek 2013; 104:477-88. [PMID: 23904187 DOI: 10.1007/s10482-013-9980-7] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/02/2013] [Accepted: 07/20/2013] [Indexed: 10/26/2022]
Abstract
The 16S rRNA gene has been used in the last decades as a gold standard for determining the phylogenetic position of bacteria and their taxonomy. It is a well conserved gene, with some variations, present in all bacteria and allows the reconstruction of genealogies of microorganisms. Nevertheless, this gene has its limitations when inferring phylogenetic relationships between closely related isolates. To overcome this problem, DNA-DNA hybridization appeared as a solution to clarify interspecies relationships when the sequence similarity of the 16S rRNA gene is above 97 %. However, this technique is time consuming, expensive and laborious and so, researchers developed other molecular markers such as sequencing of housekeeping or functional genes for accurate determination of bacterial phylogeny. One of these genes that have been used successfully, particularly in clinical microbiology, codes for the beta subunit of the RNA polymerase (rpoB). The rpoB gene is sufficiently conserved to be used as a molecular clock, it is present in all bacteria and it is a mono-copy gene. In this study, rpoB gene sequencing was applied to the phylum Planctomycetes. Based on the genomes of 19 planctomycetes it was possible to determine the correlation between the rpoB gene sequence and the phylogenetic position of the organisms at a 95-96 % sequence similarity threshold for a novel species. A 1200-bp fragment of the rpoB gene was amplified from several new planctomycetal isolates and their intra and inter-species relationships to other members of this group were determined based on a 96.3 % species border and 98.2 % for intraspecies resolution.
Collapse
|
17
|
Caetano-Anollés K, Caetano-Anollés G. Structural phylogenomics reveals gradual evolutionary replacement of abiotic chemistries by protein enzymes in purine metabolism. PLoS One 2013; 8:e59300. [PMID: 23516625 PMCID: PMC3596326 DOI: 10.1371/journal.pone.0059300] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2012] [Accepted: 02/13/2013] [Indexed: 11/30/2022] Open
Abstract
The origin of metabolism has been linked to abiotic chemistries that existed in our planet at the beginning of life. While plausible chemical pathways have been proposed, including the synthesis of nucleobases, ribose and ribonucleotides, the cooption of these reactions by modern enzymes remains shrouded in mystery. Here we study the emergence of purine metabolism. The ages of protein domains derived from a census of fold family structure in hundreds of genomes were mapped onto enzymes in metabolic diagrams. We find that the origin of the nucleotide interconversion pathway benefited most parsimoniously from the prebiotic formation of adenine nucleosides. In turn, pathways of nucleotide biosynthesis, catabolism and salvage originated ∼300 million years later by concerted enzymatic recruitments and gradual replacement of abiotic chemistries. Remarkably, this process led to the emergence of the fully enzymatic biosynthetic pathway ∼3 billion years ago, concurrently with the appearance of a functional ribosome. The simultaneous appearance of purine biosynthesis and the ribosome probably fulfilled the expanding matter-energy and processing needs of genomic information.
Collapse
Affiliation(s)
- Kelsey Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Chicago School of Professional Psychology, Chicago, Illinois, United States of America
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- * E-mail:
| |
Collapse
|
18
|
Yafremava LS, Wielgos M, Thomas S, Nasir A, Wang M, Mittenthal JE, Caetano-Anollés G. A general framework of persistence strategies for biological systems helps explain domains of life. Front Genet 2013; 4:16. [PMID: 23443991 PMCID: PMC3580334 DOI: 10.3389/fgene.2013.00016] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2012] [Accepted: 01/28/2013] [Indexed: 11/13/2022] Open
Abstract
The nature and cause of the division of organisms in superkingdoms is not fully understood. Assuming that environment shapes physiology, here we construct a novel theoretical framework that helps identify general patterns of organism persistence. This framework is based on Jacob von Uexküll's organism-centric view of the environment and James G. Miller's view of organisms as matter-energy-information processing molecular machines. Three concepts describe an organism's environmental niche: scope, umwelt, and gap. Scope denotes the entirety of environmental events and conditions to which the organism is exposed during its lifetime. Umwelt encompasses an organism's perception of these events. The gap is the organism's blind spot, the scope that is not covered by umwelt. These concepts bring organisms of different complexity to a common ecological denominator. Ecological and physiological data suggest organisms persist using three strategies: flexibility, robustness, and economy. All organisms use umwelt information to flexibly adapt to environmental change. They implement robustness against environmental perturbations within the gap generally through redundancy and reliability of internal constituents. Both flexibility and robustness improve survival. However, they also incur metabolic matter-energy processing costs, which otherwise could have been used for growth and reproduction. Lineages evolve unique tradeoff solutions among strategies in the space of what we call "a persistence triangle." Protein domain architecture and other evidence support the preferential use of flexibility and robustness properties. Archaea and Bacteria gravitate toward the triangle's economy vertex, with Archaea biased toward robustness. Eukarya trade economy for survivability. Protista occupy a saddle manifold separating akaryotes from multicellular organisms. Plants and the more flexible Fungi share an economic stratum, and Metazoa are locked in a positive feedback loop toward flexibility.
Collapse
Affiliation(s)
- Liudmila S Yafremava
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana, IL, USA
| | | | | | | | | | | | | |
Collapse
|
19
|
Caetano-Anollés G, Nasir A. Benefits of using molecular structure and abundance in phylogenomic analysis. Front Genet 2012; 3:172. [PMID: 22973296 PMCID: PMC3434437 DOI: 10.3389/fgene.2012.00172] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2012] [Accepted: 08/18/2012] [Indexed: 12/25/2022] Open
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, IL, USA
| | | |
Collapse
|
20
|
Nasir A, Kim KM, Caetano-Anolles G. Giant viruses coexisted with the cellular ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria and Eukarya. BMC Evol Biol 2012; 12:156. [PMID: 22920653 PMCID: PMC3570343 DOI: 10.1186/1471-2148-12-156] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2012] [Accepted: 08/22/2012] [Indexed: 11/17/2022] Open
Abstract
Background The discovery of giant viruses with genome and physical size comparable to cellular organisms, remnants of protein translation machinery and virus-specific parasites (virophages) have raised intriguing questions about their origin. Evidence advocates for their inclusion into global phylogenomic studies and their consideration as a distinct and ancient form of life. Results Here we reconstruct phylogenies describing the evolution of proteomes and protein domain structures of cellular organisms and double-stranded DNA viruses with medium-to-very-large proteomes (giant viruses). Trees of proteomes define viruses as a ‘fourth supergroup’ along with superkingdoms Archaea, Bacteria, and Eukarya. Trees of domains indicate they have evolved via massive and primordial reductive evolutionary processes. The distribution of domain structures suggests giant viruses harbor a significant number of protein domains including those with no cellular representation. The genomic and structural diversity embedded in the viral proteomes is comparable to the cellular proteomes of organisms with parasitic lifestyles. Since viral domains are widespread among cellular species, we propose that viruses mediate gene transfer between cells and crucially enhance biodiversity. Conclusions Results call for a change in the way viruses are perceived. They likely represent a distinct form of life that either predated or coexisted with the last universal common ancestor (LUCA) and constitute a very crucial part of our planet’s biosphere.
Collapse
Affiliation(s)
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Science, University of Illinois, Urbana, IL 61801, USA
| | | | | |
Collapse
|
21
|
Jiang YY, Kong DX, Qin T, Li X, Caetano-Anollés G, Zhang HY. The impact of oxygen on metabolic evolution: a chemoinformatic investigation. PLoS Comput Biol 2012; 8:e1002426. [PMID: 22438800 PMCID: PMC3305344 DOI: 10.1371/journal.pcbi.1002426] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2011] [Accepted: 01/27/2012] [Indexed: 11/28/2022] Open
Abstract
The appearance of planetary oxygen likely transformed the chemical and biochemical makeup of life and probably triggered episodes of organismal diversification. Here we use chemoinformatic methods to explore the impact of the rise of oxygen on metabolic evolution. We undertake a comprehensive comparative analysis of structures, chemical properties and chemical reactions of anaerobic and aerobic metabolites. The results indicate that aerobic metabolism has expanded the structural and chemical space of metabolites considerably, including the appearance of 130 novel molecular scaffolds. The molecular functions of these metabolites are mainly associated with derived aspects of cellular life, such as signal transfer, defense against biotic factors, and protection of organisms from oxidation. Moreover, aerobic metabolites are more hydrophobic and rigid than anaerobic compounds, suggesting they are better fit to modulate membrane functions and to serve as transmembrane signaling factors. Since higher organisms depend largely on sophisticated membrane-enabled functions and intercellular signaling systems, the metabolic developments brought about by oxygen benefit the diversity of cellular makeup and the complexity of cellular organization as well. These findings enhance our understanding of the molecular link between oxygen and evolution. They also show the significance of chemoinformatics in addressing basic biological questions. Elucidating the link between the rise of planetary oxygen and biological evolution is a challenging topic in evolutionary biology. Previous studies in this area were dominated by biological investigations. The recent simulations of metabolic networks under anaerobic or aerobic conditions revealed that aerobic metabolism gave rise to 1,000+ new reactions. Since metabolites are small molecules and metabolic reactions are basically chemical reactions, we think that the impact of oxygen on metabolic evolution can be well studied by chemoinformatics. In this paper, we use chemoinformatic methods to perform a comprehensive comparative analysis of the chemical structures, properties and reactions of anaerobic and aerobic metabolites. It was found that aerobic metabolism has considerably expanded the structural space of metabolites by inventing 130 novel molecular scaffolds. Moreover, aerobic metabolism also helped organisms to explore a new chemical space by increasing the hydrophobicity and rigidity of metabolites. Since hydrophobic metabolites are fitting to modulate membrane functions and to serve as transmembrane signaling factors, these metabolic innovations definitely benefit the establishment of complex cellular organizations. The present findings not only help to understand the molecular link between oxygen and evolution but also suggest that chemoinformatics is of special value in addressing some basic biological questions.
Collapse
Affiliation(s)
- Ying-Ying Jiang
- National Key Laboratory of Crop Genetic Improvement, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Center for Bioinformatics, Huazhong Agricultural University, Wuhan, China
| | - De-Xin Kong
- Center for Bioinformatics, Huazhong Agricultural University, Wuhan, China
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
| | - Tao Qin
- School of Life Sciences, Shandong University of Technology, Zibo, China
| | - Xiao Li
- School of Life Sciences, Shandong University of Technology, Zibo, China
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
| | - Hong-Yu Zhang
- National Key Laboratory of Crop Genetic Improvement, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Center for Bioinformatics, Huazhong Agricultural University, Wuhan, China
- * E-mail:
| |
Collapse
|