1
|
Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022] Open
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom begs the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, rooting phylogenies, especially the "tree of life," generally follows narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in the Weston rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Arshan Nasir
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Kyung Mo Kim
  - Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Derek Caetano-Anollés
  - Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
|
2
|
Chen BS, Wu WS. Underlying Principles of Natural Selection in Network Evolution: Systems Biology Approach. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/23/2022] Open
Abstract
Systems biology is a rapidly expanding field that integrates diverse areas of science such as physics, engineering, computer science, mathematics, and biology toward the goal of elucidating the underlying principles of hierarchical metabolic and regulatory systems in the cell, and ultimately leading to predictive understanding of cellular response to perturbations. Because post-genomics research is taking place throughout the tree of life, comparative approaches offer a way for combining data from many organisms to shed light on the evolution and function of biological networks from the gene to the organismal level. Therefore, systems biology can build on decades of theoretical work in evolutionary biology, and at the same time evolutionary biology can use the systems biology approach to go in new uncharted directions. In this study, we present a review of how the post-genomics era is adopting comparative approaches and dynamic system methods to understand the underlying design principles of network evolution and to shape the nascent field of evolutionary systems biology. Finally, the application of evolutionary systems biology to robust biological network designs is also discussed from the synthetic biology perspective.
Affiliation(s)
- Bor-Sen Chen
  - Lab of Control and Systems Biology, National Tsing Hua University, Hsinchu, 300, Taiwan
- Wei-Sheng Wu
  - Lab of Control and Systems Biology, National Tsing Hua University, Hsinchu, 300, Taiwan
|
3
|
Andersen SB, Ghoul M, Griffin AS, Petersen B, Johansen HK, Molin S. Diversity, Prevalence, and Longitudinal Occurrence of Type II Toxin-Antitoxin Systems of Pseudomonas aeruginosa Infecting Cystic Fibrosis Lungs. Front Microbiol 2017; 8:1180. [PMID: 28690609 PMCID: PMC5481352 DOI: 10.3389/fmicb.2017.01180] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 01/15/2017] [Accepted: 06/09/2017] [Indexed: 12/15/2022] Open
Abstract
Type II toxin-antitoxin (TA) systems are most commonly composed of two genes encoding a stable toxin, which harms the cell, and an unstable antitoxin that can inactivate it. TA systems were initially characterized as selfish elements, but have recently gained attention for regulating general stress responses responsible for pathogen virulence, formation of drug-tolerant persister cells and biofilms, all implicated in causing recalcitrant chronic infections. We use a bioinformatics approach to explore the distribution and evolution of type II TA loci of the opportunistic pathogen, Pseudomonas aeruginosa, across longitudinally sampled isolates from cystic fibrosis lungs. We identify their location in the genome, mutations, and gain/loss during infection to elucidate their function(s) in stabilizing selfish elements and pathogenesis. We found (1) 26 distinct TA systems, where all isolates harbor four in their core genome and a variable number of the remaining 22 on genomic islands; (2) limited mutations in core genome TA loci, suggesting they are not under negative selection; (3) no evidence for horizontal transmission of elements with TA systems between clone types within patients, despite their ability to mobilize; (4) no gain and limited loss of TA-bearing genomic islands, and of those elements partially lost, the remnant regions carry the TA systems supporting their role in genomic stabilization; (5) no significant correlation between frequency of TA systems and strain ability to establish a chronic infection, but those with a particular TA are more successful in establishing a chronic infection.
Affiliation(s)
- Sandra B Andersen
  - Department of Zoology, University of Oxford, Oxford, United Kingdom; The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
- Melanie Ghoul
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
- Bent Petersen
  - Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark
- Helle K Johansen
  - Department of Clinical Microbiology, Rigshospitalet, Copenhagen, Denmark
- Søren Molin
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
|
4
|
Koç I, Caetano-Anollés G. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data. PLoS One 2017; 12:e0176129. [PMID: 28467492 PMCID: PMC5414959 DOI: 10.1371/journal.pone.0176129] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/02/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022] Open
Abstract
The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biocomplexity and life.
Affiliation(s)
- Ibrahim Koç
  - Molecular Biology and Genetics, Gebze Technical University, Kocaeli, Turkey
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
|
5
|
Sun B, Li T, Xiao J, Liu L, Zhang P, Murphy RW, He S, Huang D. Contribution of Multiple Inter-Kingdom Horizontal Gene Transfers to Evolution and Adaptation of Amphibian-Killing Chytrid, Batrachochytrium dendrobatidis. Front Microbiol 2016; 7:1360. [PMID: 27630622 PMCID: PMC5005798 DOI: 10.3389/fmicb.2016.01360] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/24/2016] [Accepted: 08/17/2016] [Indexed: 01/30/2023] Open
Abstract
Amphibian populations are experiencing catastrophic declines driven by the fungal pathogen Batrachochytrium dendrobatidis (Bd). Although horizontal gene transfer (HGT) facilitates the evolution and adaptation in many fungi by conferring novel function genes to the recipient fungi, inter-kingdom HGT in Bd remains largely unexplored. In this study, our investigation detects 19 bacterial genes transferred to Bd, including metallo-beta-lactamase and arsenate reductase that play important roles in the resistance to antibiotics and arsenates. Moreover, three probable HGT gene families in Bd are from plants and one gene family coding the ankyrin repeat-containing protein appears to come from oomycetes. The observed multi-copy gene families associated with HGT are probably due to the independent transfer events or gene duplications. Five HGT genes with extracellular locations may relate to infection, and some other genes may participate in a variety of metabolic pathways, and in doing so add important metabolic traits to the recipient. The evolutionary analysis indicates that all the transferred genes evolved under purifying selection, suggesting that their functions in Bd are similar to those of the donors. Collectively, our results indicate that HGT from diverse donors may be an important evolutionary driver of Bd, and improve its adaptations for infecting and colonizing host amphibians.
Affiliation(s)
- Baofa Sun
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Tong Li
  - Key Laboratory of Crop Pests Control of Henan Province, Institute of Plant Protection, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Jinhua Xiao
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Li Liu
  - Network & Information Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Peng Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Robert W Murphy
  - Department of Natural History, Royal Ontario Museum, Toronto, ON, Canada
- Shunmin He
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Dawei Huang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Shandong Provincial Key Laboratory for Biology of Vegetable Diseases and Insect Pests, College of Plant Protection, Shandong Agricultural University, Tai'an, China
|
6
|
Kurland CG, Harish A. The phylogenomics of protein structures: The backstory. Biochimie 2015; 119:284-302. [DOI: 10.1016/j.biochi.2015.07.027] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/11/2015] [Accepted: 07/28/2015] [Indexed: 12/11/2022]
|
7
|
Abstract
All life on earth can be naturally classified into cellular life forms and virus-like selfish elements, the latter being fully dependent on the former for their reproduction. Cells are reproducers that not only replicate their genome but also reproduce the cellular organization that depends on semipermeable, energy-transforming membranes and cannot be recovered from the genome alone, under the famous dictum of Rudolf Virchow, Omnis cellula e cellula. In contrast, simple selfish elements are replicators that can complete their life cycles within the host cell starting from genomic RNA or DNA alone. The origin of the cellular organization is the central and perhaps the hardest problem of evolutionary biology. I argue that the origin of cells can be understood only in conjunction with the origin and evolution of selfish genetic elements. A scenario of precellular evolution is presented that involves cohesion of the genomes of the emerging cellular life forms from primordial pools of small genetic elements that eventually segregated into hosts and parasites. I further present a model of the coevolution of primordial membranes and membrane proteins, discuss protocellular and non-cellular models of early evolution, and examine the habitats on the primordial earth that could have been conducive to precellular evolution and the origin of cells.
Affiliation(s)
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
8
|
Kim KM, Nasir A, Caetano-Anollés G. The importance of using realistic evolutionary models for retrodicting proteomes. Biochimie 2014; 99:129-37. [DOI: 10.1016/j.biochi.2013.11.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 08/26/2013] [Accepted: 11/22/2013] [Indexed: 01/16/2023]
|
9
|
Abstract
The traditional bacterial rooting of the three superkingdoms in sequence-based gene trees is inconsistent with new phylogenetic reconstructions based on genome content of compact protein domains. We find that protein domains at the level of the SCOP superfamily (SF) from sequenced genomes implement with maximum parsimony fully resolved rooted trees. Such genome content trees identify archaea and bacteria (akaryotes) as sister clades that diverge from an akaryote common ancestor, LACA. Several eukaryote sister clades diverge from a eukaryote common ancestor, LECA. LACA and LECA descend in parallel from the most recent universal common ancestor (MRUCA), which is not a bacterium. Rather, MRUCA presents 75% of the unique SFs encoded by extant genomes of the three superkingdoms, each encoding a proteome that partially overlaps all others. This alone implies that the common ancestor to the superkingdoms was very complex. Such ancestral complexity is confirmed by phylogenetic reconstructions. In addition, the divergence of proteomes from the complex ancestor in each superkingdom is both reductive in numbers of unique SFs as well as cumulative in the abundance of surviving SFs. These data suggest that the common ancestor was not the first cell lineage and that modern global phylogeny is the crown of a "recently" re-rooted tree. We suggest that a bottlenecked survivor of an environmental collapse, which preceded the flourishing of the modern crown, seeded the current phylogenetic tree.
|
10
|
Sun BF, Xiao JH, He S, Liu L, Murphy RW, Huang DW. Multiple interkingdom horizontal gene transfers in Pyrenophora and closely related species and their contributions to phytopathogenic lifestyles. PLoS One 2013; 8:e60029. [PMID: 23555871 PMCID: PMC3612039 DOI: 10.1371/journal.pone.0060029] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 10/10/2012] [Accepted: 02/20/2013] [Indexed: 12/13/2022] Open
Abstract
Many studies have reported horizontal gene transfer (HGT) events from eukaryotes, especially fungi. However, only a few investigations summarized multiple interkingdom HGTs involving important phytopathogenic species of Pyrenophora and few have investigated the genetic contributions of HGTs to fungi. We investigated HGT events in P. teres and P. tritici-repentis and discovered that both species harbored 14 HGT genes derived from bacteria and plants, including 12 HGT genes that occurred in both species. One gene coding a leucine-rich repeat protein was present in both species of Pyrenophora and it may have been transferred from a host plant. The transfer of genes from a host plant to pathogenic fungi has been reported rarely and we discovered the first evidence for this transfer in phytopathogenic Pyrenophora. Two HGTs in Pyrenophora underwent subsequent duplications. Some HGT genes had homologs in a few other fungi, indicating relatively ancient transfer events. Functional analyses indicated that half of the HGT genes encoded extracellular proteins and these may have facilitated the infection of plants by Pyrenophora via interference with plant defense-response and the degradation of plant cell walls. Some other HGT genes appeared to participate in carbohydrate metabolism. Together, these functions implied that HGTs may have led to highly efficient mechanisms of infection as well as the utilization of host carbohydrates. Evolutionary analyses indicated that HGT genes experienced amelioration, purifying selection, and accelerated evolution. These appeared to constitute adaptations to the background genome of the recipient. The discovery of multiple interkingdom HGTs in Pyrenophora, their significance to infection, and their adaptive evolution, provided valuable insights into the evolutionary significance of interkingdom HGTs from multiple donors.
Affiliation(s)
- Bao-Fa Sun
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of the Chinese Academy of Sciences, Beijing, China
- Jin-Hua Xiao
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Shunmin He
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Li Liu
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Robert W. Murphy
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Centre for Biodiversity and Conservation Biology, Royal Ontario Museum, Toronto, Canada
- Da-Wei Huang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - College of Plant Protection, Shandong Agricultural University, Tai'an, Shandong, China
|
11
|
Sun BF, Xiao JH, He SM, Liu L, Murphy RW, Huang DW. Multiple ancient horizontal gene transfers and duplications in lepidopteran species. Insect Mol Biol 2013; 22:72-87. [PMID: 23211014 DOI: 10.1111/imb.12004] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 06/01/2023]
Abstract
Eukaryotic horizontal gene transfer (HGT) events are increasingly being discovered yet few reports have summarized multiple occurrences in a wide range of species. We systematically investigated HGT events in the order Lepidoptera by employing a series of filters. Bombyx mori, Danaus plexippus and Heliconius melpomene had 13, 12 and 12 HGTs, respectively, from bacteria and fungi. These HGTs contributed a total of 64 predicted genes: 22 to B. mori, 22 to D. plexippus and 20 to H. melpomene. Several new genes were generated by post-transfer duplications. Post-transfer duplication of a suite of functional HGTs has rarely been reported in higher organisms. The distributional patterns of paralogues for certain genes differed in the three species, indicating potential independent duplication or loss events. All of these HGTs had homologues expressed in some other lepidopterans, indicating ancient transfer events. Most HGTs were involved in the metabolism of sugar and amino acids. These HGTs appeared to have experienced amelioration, purifying selection and accelerated evolution to adapt to the background genome of the recipient. The discovery of ancient, massive HGTs and duplications in lepidopterans and their adaptive evolution provides further insights into the evolutionary significance of the events from donors to multicellular host recipients.
Affiliation(s)
- B F Sun
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
|
12
|
Kim KM, Caetano-Anollés G. The evolutionary history of protein fold families and proteomes confirms that the archaeal ancestor is more ancient than the ancestors of other superkingdoms. BMC Evol Biol 2012; 12:13. [PMID: 22284070 PMCID: PMC3306197 DOI: 10.1186/1471-2148-12-13] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 10/23/2011] [Accepted: 01/27/2012] [Indexed: 11/23/2022] Open
Abstract
Background: The entire evolutionary history of life can be studied using myriad sequences generated by genomic research. This includes the appearance of the first cells and of superkingdoms Archaea, Bacteria, and Eukarya. However, the use of molecular sequence information for deep phylogenetic analyses is limited by mutational saturation, differential evolutionary rates, lack of sequence site independence, and other biological and technical constraints. In contrast, protein structures are evolutionary modules that are highly conserved and diverse enough to enable deep historical exploration.
Results: Here we build phylogenies that describe the evolution of proteins and proteomes. These phylogenetic trees are derived from a genomic census of protein domains defined at the fold family (FF) level of structural classification. Phylogenomic trees of FF structures were reconstructed from genomic abundance levels of 2,397 FFs in 420 proteomes of free-living organisms. These trees defined timelines of domain appearance, with time spanning from the origin of proteins to the present. Timelines are divided into five different evolutionary phases according to patterns of sharing of FFs among superkingdoms: (1) a primordial protein world, (2) reductive evolution and the rise of Archaea, (3) the rise of Bacteria from the common ancestor of Bacteria and Eukarya and early development of the three superkingdoms, (4) the rise of Eukarya and widespread organismal diversification, and (5) eukaryal diversification. The relative ancestry of the FFs shows that reductive evolution by domain loss is dominant in the first three phases and is responsible for both the diversification of life from a universal cellular ancestor and the appearance of superkingdoms. On the other hand, domain gains are predominant in the last two phases and are responsible for organismal diversification, especially in Bacteria and Eukarya.
Conclusions: The evolution of functions that are associated with corresponding FFs along the timeline reveals that primordial metabolic domains evolved earlier than informational domains involved in translation and transcription, supporting the metabolism-first hypothesis rather than the RNA world scenario. In addition, phylogenomic trees of proteomes reconstructed from FFs appearing in each of the five phases of the protein world show that trees reconstructed from ancient domain structures were consistently rooted in archaeal lineages, supporting the proposal that the archaeal ancestor is more ancient than the ancestors of other superkingdoms.
Affiliation(s)
- Kyung Mo Kim
  - Evolutionary Bioinformatics Laboratory, Department of Crop Science, University of Illinois, Urbana, IL 61801, USA
|
13
|
Chen BS, Lin YP. On the Interplay between the Evolvability and Network Robustness in an Evolutionary Biological Network: A Systems Biology Approach. Evol Bioinform Online 2011; 7:201-33. [PMID: 22084563 PMCID: PMC3210637 DOI: 10.4137/ebo.s8123] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022] Open
Abstract
In the evolutionary process, the random transmission and mutation of genes provide biological diversities for natural selection. In order to preserve functional phenotypes between generations, gene networks need to evolve robustly under the influence of random perturbations. Therefore, the robustness of the phenotype, in the evolutionary process, exerts a selection force on gene networks to keep network functions. However, gene networks need to adjust, by variations in genetic content, to generate phenotypes for new challenges in the network's evolution, i.e., the evolvability. Hence, there should be some interplay between the evolvability and network robustness in evolutionary gene networks. In this study, the interplay between the evolvability and network robustness of a gene network and a biochemical network is discussed from a nonlinear stochastic system point of view. It was found that if the genetic robustness plus environmental robustness is less than the network robustness, the phenotype of the biological network is robust in evolution. The tradeoff between the genetic robustness and environmental robustness in evolution is discussed from the stochastic stability robustness and sensitivity of the nonlinear stochastic biological network, which may be relevant to the statistical tradeoff between bias and variance, the so-called bias/variance dilemma. Further, the tradeoff could be considered as an antagonistic pleiotropic action of a gene network and discussed from the systems biology perspective.
Affiliation(s)
- Bor-Sen Chen
  - Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 30013
- Ying-Po Lin
  - Lab of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 30013
|
14
|
Proteome evolution and the metabolic origins of translation and cellular life. J Mol Evol 2010; 72:14-33. [PMID: 21082171 DOI: 10.1007/s00239-010-9400-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Received: 03/03/2010] [Accepted: 10/25/2010] [Indexed: 12/27/2022]
Abstract
The origin of life has puzzled molecular scientists for over half a century. Yet fundamental questions remain unanswered, including which came first, the metabolic machinery or the encoding nucleic acids. In this study we take a protein-centric view and explore the ancestral origins of proteins. Protein domain structures in proteomes are highly conserved and embody molecular functions and interactions that are needed for cellular and organismal processes. Here we use domain structure to study the evolution of molecular function in the protein world. Timelines describing the age and function of protein domains at fold, fold superfamily, and fold family levels of structural complexity were derived from a structural phylogenomic census in hundreds of fully sequenced genomes. These timelines unfold congruent hourglass patterns in rates of appearance of domain structures and functions, functional diversity, and hierarchical complexity, and reveal a gradual build-up of protein repertoires associated with metabolism, translation and DNA, in that order. The most ancient domain architectures were hydrolase enzymes and the first translation domains had catalytic functions for the aminoacylation and the molecular switch-driven transport of RNA. Remarkably, the most ancient domains had metabolic roles, did not interact with RNA, and preceded the gradual build-up of translation. In fact, the first translation domains also had a metabolic origin and were only later followed by specialized translation machinery. Our results explain how the generation of structure in the protein world and the concurrent crystallization of translation and diversified cellular life created further opportunities for proteomic diversification.
|
15
|
Wang M, Jiang YY, Kim KM, Qu G, Ji HF, Mittenthal JE, Zhang HY, Caetano-Anollés G. A universal molecular clock of protein folds and its power in tracing the early history of aerobic metabolism and planet oxygenation. Mol Biol Evol 2010; 28:567-82. [PMID: 20805191 DOI: 10.1093/molbev/msq232] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Indexed: 01/02/2023] Open
Abstract
The standard molecular clock describes a constant rate of molecular evolution and provides a powerful framework for evolutionary timescales. Here, we describe the existence and implications of a molecular clock of folds, a universal recurrence in the discovery of new structures in the world of proteins. Using a phylogenomic structural census in hundreds of proteomes, we build phylogenies and time lines of domains at fold and fold superfamily levels of structural complexity. These time lines correlate approximately linearly with geological timescales and were here used to date two crucial events in life history, planet oxygenation and organism diversification. We first dissected the structures and functions of enzymes in simulated metabolic networks. The placement of anaerobic and aerobic enzymes in the time line revealed that aerobic metabolism emerged about 2.9 billion years (giga-annum; Ga) ago and expanded during a period of about 400 My, reaching what is known as the Great Oxidation Event. During this period, enzymes recruited old and new folds for oxygen-mediated enzymatic activities. Remarkably, the first fold lost by a superkingdom disappeared in Archaea 2.6 Ga ago, within the span of oxygen rise, suggesting that oxygen also triggered diversification of life. The implications of a molecular clock of folds are many and important for the neutral theory of molecular evolution and for understanding the growth and diversity of the protein world. The clock also extends the standard concept that was specific to molecules and their timescales and turns it into a universal timescale-generating tool.
Affiliation(s)
- Minglei Wang
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana-Champaign, USA
16
Flamm C, Ullrich A, Ekker H, Mann M, Högerl D, Rohrschneider M, Sauer S, Scheuermann G, Klemm K, Hofacker IL, Stadler PF. Evolution of metabolic networks: a computational frame-work. ACTA ACUST UNITED AC 2010. [DOI: 10.1186/1759-2208-1-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Background
The metabolic architectures of extant organisms share many key pathways such as the citric acid cycle, glycolysis, or the biosynthesis of most amino acids. Several competing hypotheses for the evolutionary mechanisms that shape metabolic networks have been discussed in the literature, each of which finds support from comparative analysis of extant genomes. Alternatively, the principles of metabolic evolution can be studied by direct computer simulation. This requires, however, an explicit implementation of all pertinent components: a universe of chemical reactions upon which the metabolism is built, an explicit representation of the enzymes that implement the metabolism, a genetic system that encodes these enzymes, and a fitness function that can be selected for.
Results
We describe here a simulation environment that implements all these components in a simplified way so that large-scale evolutionary studies are feasible. We employ an artificial chemistry that views chemical reactions as graph rewriting operations and utilizes a toy-version of quantum chemistry to derive thermodynamic parameters. Minimalist organisms with simple string-encoded genomes produce model ribozymes whose catalytic activity is determined by an ad hoc mapping between their secondary structure and the transition state graphs that they stabilize. Fitness is computed utilizing the ideas of metabolic flux analysis. We present an implementation of the complete system and first simulation results.
Conclusions
The simulation system presented here allows coherent investigations into the evolutionary mechanisms of the first steps of metabolic evolution using a self-consistent toy universe.
17
Kim KM, Caetano-Anollés G. Emergence and evolution of modern molecular functions inferred from phylogenomic analysis of ontological data. Mol Biol Evol 2010; 27:1710-33. [PMID: 20418223 DOI: 10.1093/molbev/msq106] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
The biological processes that characterize the phenotypes of a living system are embodied in the function of molecules and hold the key to evolutionary history, delimiting natural selection and change. These processes and functions provide direct insight into the emergence, development, and organization of cellular life. However, detailed molecular functions make up a network-like hierarchy of relationships that tells little of evolutionary links between structure and function in biology. For example, Gene Ontology terms represent widely-used vocabularies of processes and functions with evolutionary relationships that are implicit but not defined. Here, we uncover patterns of global evolutionary history in ontological terms associated with the sequence of 38 genomes. These patterns unfold the metabolic origins of modern molecular functions and major biological transitions in evolution toward complex life. Phylogenies reveal the primordial appearance of hydrolases and transferases, with ATPase, GTPase, and helicase activities being the most ancient. This indicates that ancient catalysts were crucial for binding and transport, the emergence of nucleic acids and protein biopolymers, and the communication of primordial cells with the environment. Finally, the history of biological processes showed that cellular biopolymer metabolic processes preceded biopolymer biosynthesis and essential processes related to macromolecular formation, directly challenging the existence of an RNA world. Phylogenomic systematization of biological function takes the structure and function paradigm to a completely new level of abstraction, demonstrating a "metabolic first" origin of life. The approach uncovers patterns in the morphing of function that are unprecedented and necessary for systematic views in biology.
Affiliation(s)
- Kyung Mo Kim
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, IL, USA
18
Abstract
It is proposed that the precellular stage of biological evolution unraveled within networks of inorganic compartments that harbored a diverse mix of virus‐like genetic elements. This stage of evolution might make up the Last Universal Cellular Ancestor (LUCA), which could more appropriately be denoted the Last Universal Cellular Ancestral State (LUCAS). Such a scenario recapitulates the ideas of J. B. S. Haldane sketched in his classic 1928 essay. However, unlike in Haldane's day, considerable support for this scenario exists today: lack of homology between core DNA replication system components in archaea and bacteria, distinct membrane chemistries and enzymes of lipid biosynthesis in archaea and bacteria, spread of several viral hallmark genes among diverse groups of viruses, and the apparent shaping of extant archaeal and bacterial chromosomes by accretion of diverse, smaller replicons. Under the viral model of precellular evolution, the key components of cells originated as components of virus‐like entities. The two surviving types of cellular life forms, archaea and bacteria, might have emerged from the LUCAS independently, along with, probably, numerous forms now extinct.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
19
Sun FJ, Caetano-Anollés G. The evolutionary history of the structure of 5S ribosomal RNA. J Mol Evol 2009; 69:430-43. [PMID: 19639237 DOI: 10.1007/s00239-009-9264-z] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 05/17/2009] [Accepted: 07/03/2009] [Indexed: 02/05/2023]
Abstract
5S rRNA is the smallest nucleic acid component of the large ribosomal subunit, contributing to ribosomal assembly, stability, and function. Despite being a model for the study of RNA structure and RNA-protein interactions, the evolution of this universally conserved molecule remains unclear. Here, we explore the history of the three-domain structure of 5S rRNA using phylogenetic trees that are reconstructed directly from molecular structure. A total of 46 structural characters describing the geometry of 666 5S rRNAs were used to derive intrinsically rooted trees of molecules and molecular substructures. Trees of molecules revealed the tripartite nature of life. In these trees, superkingdom Archaea formed a paraphyletic basal group, while Bacteria and Eukarya were monophyletic and derived. Trees of molecular substructures supported an origin of the molecule in a segment that is homologous to helix I (alpha domain), its initial enhancement with helix III (beta domain), and the early formation of the three-domain structure typical of modern 5S rRNA in Archaea. The delayed formation of the branched structure in Bacteria and Eukarya lends further support to the archaeal rooting of the tree of life. Remarkably, the evolution of molecular interactions between 5S rRNA and associated ribosomal proteins inferred from a census of domain structure in hundreds of genomes established a tight relationship between the age of 5S rRNA helices and the age of ribosomal proteins. Results suggest 5S rRNA originated relatively quickly but quite late in evolution, at a time when primordial metabolic enzymes and translation machinery were already in place. The molecule therefore represents a late evolutionary addition to the ribosomal ensemble that occurred prior to the early diversification of Archaea.
Affiliation(s)
- Feng-Jie Sun
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, 332 National Soybean Research Center, 1101 West Peabody Drive, Urbana, IL 61801, USA
20
Valas RE, Yang S, Bourne PE. Nothing about protein structure classification makes sense except in the light of evolution. Curr Opin Struct Biol 2009; 19:329-34. [PMID: 19394812 DOI: 10.1016/j.sbi.2009.03.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 12/16/2008] [Revised: 02/19/2009] [Accepted: 03/16/2009] [Indexed: 12/27/2022]
Abstract
In this, the 200th anniversary of Charles Darwin's birth and the 150th anniversary of the publication of the Origin of Species, it is fitting to revisit the classification of protein structures from an evolutionary perspective. Existing classifications use homologous sequence relationships, but knowing that structure is much more conserved than sequence creates an iterative loop from which structures can be further classified beyond that of the domain, thereby teasing out distant evolutionary relationships. The desired classification scheme is then one in which a fold is merely semantics and structure can be classified as either ancestral or derived.
Affiliation(s)
- Ruben E Valas
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093-0743, USA
21
Abstract
Bacterial toxin–antitoxin (TA) systems are diverse and widespread in the prokaryotic kingdom. They are composed of closely linked genes encoding a stable toxin that can harm the host cell and its cognate labile antitoxin, which protects the host from the toxin's deleterious effect. TA systems are thought to invade bacterial genomes through horizontal gene transfer. Some TA systems might behave as selfish elements and favour their own maintenance at the expense of their host. As a consequence, they may contribute to the maintenance of plasmids or genomic islands, such as super-integrons, by post-segregational killing of the cell that loses these genes and so suffers the stable toxin's destructive effect. The function of the chromosomally encoded TA systems is less clear and still open to debate. This Review discusses current hypotheses regarding the biological roles of these evolutionarily successful small operons. We consider the various selective forces that could drive the maintenance of TA systems in bacterial genomes.
Affiliation(s)
- Laurence Van Melderen
- Laboratoire de Génétique et Physiologie Bactérienne, IBMM, Faculté des Sciences, Université Libre de Bruxelles, Gosselies, Belgium
- Manuel Saavedra De Bast
- Laboratoire de Génétique et Physiologie Bactérienne, IBMM, Faculté des Sciences, Université Libre de Bruxelles, Gosselies, Belgium
22
Abstract
Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. Whereas sequences change at a considerable pace, higher-order structures are constrained by the energetic landscape of protein folding, the exploration of sequence and structure space, and complex interactions mediated by the proteostasis and proteolytic machineries of the cell. The survey of architectures in the living world that was fuelled by recent structural genomic initiatives has been summarized in protein classification schemes, and the overall structure of fold space explored with novel bioinformatic approaches. However, metrics of general structural comparison have not yet unified architectural complexity using the 'shared and derived' tenet of evolutionary analysis. In contrast, a shift of focus from molecules to proteomes and a census of protein structure in fully sequenced genomes were able to uncover global evolutionary patterns in the structure of proteins. Timelines of discovery of architectures and functions unfolded episodes of specialization, reductive evolutionary tendencies of architectural repertoires in proteomes and the rise of modularity in the protein world. They revealed a biologically complex ancestral proteome and the early origin of the archaeal lineage. Studies also identified an origin of the protein world in enzymes of nucleotide metabolism harbouring the P-loop-containing triphosphate hydrolase fold and the explosive discovery of metabolic functions that recapitulated well-defined prebiotic shells and involved the recruitment of structures and functions. These observations have important implications for origins of modern biochemistry and diversification of life.
23
Kanduc D, Stufano A, Lucchese G, Kusalik A. Massive peptide sharing between viral and human proteomes. Peptides 2008; 29:1755-66. [PMID: 18582510 PMCID: PMC7115663 DOI: 10.1016/j.peptides.2008.05.022] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Received: 02/29/2008] [Revised: 05/28/2008] [Accepted: 05/30/2008] [Indexed: 11/10/2022]
Abstract
Thirty viral proteomes were examined for amino acid sequence similarity to the human proteome, and, in parallel, a control of 30 sets of human proteins was analyzed for internal human overlapping. We find that all of the analyzed 30 viral proteomes, independently of their structural or pathogenic characteristics, present a high number of pentapeptide overlaps to the human proteome. Among the examined viruses, human T-lymphotropic virus 1, Rubella virus, and hepatitis C virus present the highest number of viral overlaps to the human proteome. The widespread and ample distribution of viral amino acid sequences through the human proteome indicates that viral and human proteins are formed of common peptide backbone units and suggests a fluid compositional chimerism in phylogenetic entities canonically classified distantly as viruses and Homo sapiens. Importantly, the massive viral to human peptide overlapping calls into question the possibility of a direct causal association between virus-host sharing of amino acid sequences and incitement to autoimmune reactions through molecular recognition of common motifs.
Affiliation(s)
- Darja Kanduc
- Department of Biochemistry and Molecular Biology, University of Bari, Bari 70126, Italy.
24
Abstract
A wide variety of peptidases associate with vital biological pathways, but the origin and evolution of their tremendous diversity are poorly defined. Application of the MEROPS classification to a comprehensive set of genomes yields a simple pattern of peptidase distribution and provides insight into the organization of proteolysis in all forms of life. Unexpectedly, a near ubiquitous core set of peptidases is shown to contain more types than those unique to higher multicellular organisms. From this core group, an array of eukaryote-specific peptidases evolved to yield well known intracellular and extracellular processes. The paucity of peptidase families unique to higher metazoa suggests gains in proteolytic network complexity required a limited number of biochemical inventions. These findings provide a framework for deeper investigation into the evolutionary forces that shaped each peptidase family and a roadmap to develop a timeline for their expansion as an interconnected system.
Affiliation(s)
- Michael J Page
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
25
Odronitz F, Kollmar M. Drawing the tree of eukaryotic life based on the analysis of 2,269 manually annotated myosins from 328 species. Genome Biol 2008; 8:R196. [PMID: 17877792 PMCID: PMC2375034 DOI: 10.1186/gb-2007-8-9-r196] [Citation(s) in RCA: 283] [Impact Index Per Article: 16.6] [Received: 03/06/2007] [Revised: 09/17/2007] [Accepted: 09/18/2007] [Indexed: 01/03/2023]
Abstract
The tree of eukaryotic life was reconstructed based on the analysis of 2,269 myosin motor domains from 328 organisms, confirming some accepted relationships of major taxa and resolving disputed and preliminary classifications.
Background
The evolutionary history of organisms is expressed in phylogenetic trees. The most widely used phylogenetic trees describing the evolution of all organisms have been constructed based on single-gene phylogenies that, however, often produce conflicting results. Incongruence between phylogenetic trees can result from the violation of the orthology assumption and stochastic and systematic errors.
Results
Here, we have reconstructed the tree of eukaryotic life based on the analysis of 2,269 myosin motor domains from 328 organisms. All sequences were manually annotated and verified, and were grouped into 35 myosin classes, of which 16 have not been proposed previously. The resultant phylogenetic tree confirms some accepted relationships of major taxa and resolves disputed and preliminary classifications. We place the Viridiplantae after the separation of Euglenozoa, Alveolata, and Stramenopiles, we suggest a monophyletic origin of Entamoebidae, Acanthamoebidae, and Dictyosteliida, and provide evidence for the asynchronous evolution of the Mammalia and Fungi.
Conclusion
Our analysis of the myosins allowed combining phylogenetic information derived from class-specific trees with the information of myosin class evolution and distribution. This approach is expected to result in superior accuracy compared to single-gene or phylogenomic analyses because the orthology problem is resolved and a strong determinant not depending on any technical uncertainties is incorporated, the class distribution. Combining our analysis of the myosins with high quality analyses of other protein families, for example, that of the kinesins, could help in resolving still questionable dependencies at the origin of eukaryotic life.
Affiliation(s)
- Florian Odronitz
- Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg, 37077 Goettingen, Germany
- Martin Kollmar
- Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg, 37077 Goettingen, Germany
26
Levasseur A, Pontarotti P, Poch O, Thompson JD. Strategies for reliable exploitation of evolutionary concepts in high throughput biology. Evol Bioinform Online 2008; 4:121-37. [PMID: 19204813 PMCID: PMC2614184 DOI: 10.4137/ebo.s597] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
Abstract
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.
Affiliation(s)
- Anthony Levasseur
- Phylogenomics Laboratory, EA 3781 Evolution Biologique, Université de Provence, 13331 Marseille, France
27
Schuster P. Modeling in biological chemistry. From biochemical kinetics to systems biology. MONATSHEFTE FUR CHEMIE 2008. [DOI: 10.1007/s00706-008-0892-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
28
Sun FJ, Caetano-Anollés G. Evolutionary patterns in the sequence and structure of transfer RNA: early origins of archaea and viruses. PLoS Comput Biol 2008; 4:e1000018. [PMID: 18369418 PMCID: PMC2265525 DOI: 10.1371/journal.pcbi.1000018] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 11/19/2007] [Accepted: 02/01/2008] [Indexed: 02/06/2023]
Abstract
Transfer RNAs (tRNAs) are ancient molecules that are central to translation. Since they probably carry evolutionary signatures that were left behind when the living world diversified, we reconstructed phylogenies directly from the sequence and structure of tRNA using well-established phylogenetic methods. The trees placed tRNAs with long variable arms charging Sec, Tyr, Ser, and Leu consistently at the base of the rooted phylogenies, but failed to reveal groupings that would indicate clear evolutionary links to organismal origin or molecular functions. In order to uncover evolutionary patterns in the trees, we forced tRNAs into monophyletic groups using constraint analyses to generate timelines of organismal diversification and test competing evolutionary hypotheses. Remarkably, organismal timelines showed Archaea was the most ancestral superkingdom, followed by viruses, then superkingdoms Eukarya and Bacteria, in that order, supporting conclusions from recent phylogenomic studies of protein architecture. Strikingly, constraint analyses showed that the origin of viruses was not only ancient, but was linked to Archaea. Our findings have important implications. They support the notion that the archaeal lineage was very ancient, resulted in the first organismal divide, and predated diversification of tRNA function and specificity. Results are also consistent with the concept that viruses contributed to the development of the DNA replication machinery during the early diversification of the living world.
Affiliation(s)
- Feng-Jie Sun
- Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
29
Abstract
The evolution of the transfer RNA (tRNA) molecule is controversial but embeds the history of protein biosynthesis, the genetic code, and the origins of diversified life. A new phylogenetic method based on RNA structure that we developed provides new lines of evidence to support the genome tag hypothesis and confirms that the 'top half' of tRNA is more ancient than the 'bottom half'. Timelines of amino acid charging function generated from constraint analyses showed that selenocysteine, tyrosine, serine, and leucine specificities were ancient, while those related to asparagine, methionine, and arginine were more recent. The timelines also uncovered an early role of the second and then first codon bases, identified codons for alanine and proline as the most ancient, and revealed important evolutionary take-overs related to the loss of the long variable arm of tRNA. Furthermore, organismal timelines showed Archaea was the oldest superkingdom, followed by viruses, and then superkingdoms Eukarya and Bacteria, in that order, supporting conclusions from recent phylogenomic studies of protein architecture. Strikingly, results showed that the origin of viruses was not only ancient but was linked to Archaea, supporting the notion that the archaeal lineage is the most ancient on earth and its origin predated diversification of tRNA function and specificity.
Affiliation(s)
- Feng-Jie Sun
- Department of Crop Sciences at the University of Illinois at Urbana-Champaign, 61801, USA
- Gustavo Caetano-Anollés
- Department of Crop Sciences, University of Illinois, 332 NSRC, 1101 West Peabody Drive, Urbana, Illinois, 61801, USA
30
Evlampiev K, Isambert H. Modeling protein network evolution under genome duplication and domain shuffling. BMC SYSTEMS BIOLOGY 2007; 1:49. [PMID: 17999763 PMCID: PMC2245809 DOI: 10.1186/1752-0509-1-49] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 03/23/2007] [Accepted: 11/13/2007] [Indexed: 12/26/2022]
Abstract
Background
Successive whole genome duplications have recently been firmly established in all major eukaryote kingdoms. Such exponential evolutionary processes must have largely contributed to shape the topology of protein-protein interaction (PPI) networks by outweighing, in particular, all time-linear network growths modeled so far.
Results
We propose and solve a mathematical model of PPI network evolution under successive genome duplications. This demonstrates, from first principles, that evolutionary conservation and scale-free topology are intrinsically linked properties of PPI networks and emerge from i) prevailing exponential network dynamics under duplication and ii) asymmetric divergence of gene duplicates. While required, we argue that this asymmetric divergence arises, in fact, spontaneously at the level of protein-binding sites. This supports a refined model of PPI network evolution in terms of protein domains under exponential and asymmetric duplication/divergence dynamics, with multidomain proteins underlying the combinatorial formation of protein complexes. Genome duplication then provides a powerful source of PPI network innovation by promoting local rearrangements of multidomain proteins on a genome wide scale. Yet, we show that the overall conservation and topology of PPI networks are robust to extensive domain shuffling of multidomain proteins as well as to finer details of protein interaction and evolution. Finally, large scale features of direct and indirect PPI networks of S. cerevisiae are well reproduced numerically with only two adjusted parameters of clear biological significance (i.e. network effective growth rate and average number of protein-binding domains per protein).
Conclusion
This study demonstrates the statistical consequences of genome duplication and domain shuffling on the conservation and topology of PPI networks over a broad evolutionary scale across eukaryote kingdoms. In particular, scale-free topologies of PPI networks, which are found to be robust to extensive shuffling of protein domains, appear to be a simple consequence of the conservation of protein-binding domains under asymmetric duplication/divergence dynamics in the course of evolution.
Affiliation(s)
- Kirill Evlampiev
- RNA dynamics and Biomolecular Systems Lab, CNRS UMR168, Institut Curie, Section de Recherche, 11 rue P. & M. Curie, 75005 Paris, France
- Hervé Isambert
- RNA dynamics and Biomolecular Systems Lab, CNRS UMR168, Institut Curie, Section de Recherche, 11 rue P. & M. Curie, 75005 Paris, France
31
Wang M, Yafremava LS, Caetano-Anollés D, Mittenthal JE, Caetano-Anollés G. Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world. Genome Res 2007; 17:1572-85. [PMID: 17908824 PMCID: PMC2045140 DOI: 10.1101/gr.6454307] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Received: 03/01/2007] [Accepted: 08/23/2007] [Indexed: 11/25/2022]
Abstract
The repertoire of protein architectures in proteomes is evolutionarily conserved and capable of preserving an accurate record of genomic history. Here we use a census of protein architecture in 185 genomes that have been fully sequenced to generate genome-based phylogenies that describe the evolution of the protein world at fold (F) and fold superfamily (FSF) levels. The patterns of representation of F and FSF architectures over evolutionary history suggest three epochs in the evolution of the protein world: (1) architectural diversification, where members of an architecturally rich ancestral community diversified their protein repertoire; (2) superkingdom specification, where superkingdoms Archaea, Bacteria, and Eukarya were specified; and (3) organismal diversification, where F and FSF specific to relatively small sets of organisms appeared as the result of diversification of organismal lineages. Functional annotation of FSF along these architectural chronologies revealed patterns of discovery of biological function. Most importantly, the analysis identified an early and extensive differential loss of architectures occurring primarily in Archaea that segregates the archaeal lineage from the ancient community of organisms and establishes the first organismal divide. Reconstruction of phylogenomic trees of proteomes reflects the timeline of architectural diversification in the emerging lineages. Thus, Archaea undertook a minimalist strategy using only a small subset of the full architectural repertoire and then crystallized into a diversified superkingdom late in evolution. Our analysis also suggests a communal ancestor to all life that was molecularly complex and adopted genomic strategies currently present in Eukarya.
Affiliation(s)
- Minglei Wang
- Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Liudmila S. Yafremava
- Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Derek Caetano-Anollés
- Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Jay E. Mittenthal
- Department of Cell and Developmental Biology, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Gustavo Caetano-Anollés
- Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
32
Konstantinidis K, Tebbe A, Klein C, Scheffer B, Aivaliotis M, Bisle B, Falb M, Pfeiffer F, Siedler F, Oesterhelt D. Genome-wide proteomics of Natronomonas pharaonis. J Proteome Res 2007; 6:185-93. [PMID: 17203963 DOI: 10.1021/pr060352q] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
The aerobic, haloalkaliphilic archaeon Natronomonas pharaonis is able to survive in salt-saturated lakes of pH 11. According to genome analysis, the theoretical proteome consists of 2843 proteins. To reach further conclusions about its cellular physiology, the cytosolic protein inventory of Nmn. pharaonis has been analyzed using MS/MS on an ESI-Q-TOF mass spectrometer coupled on-line with a nanoLC system. The efficiency of this shotgun approach is illustrated by the identification of 929 proteins of which 886 are soluble proteins representing 41% of the cytosolic proteome. Cell lysis under denaturing conditions in water with subsequent separation by SDS-PAGE prior to nanoLC-MS/MS resulted in identification of 700 proteins. The same number (but a different subset) of proteins was identified upon cell lysis under native conditions followed by size fractionation (retaining protein complexes) prior to SDS-PAGE. Additional size fractionation reduced sample complexity and increased identification reliability. The set of identified proteins covers about 60% of the cytosolic proteins involved in metabolism and genetic information processing. Many of the identified proteins illustrate the high genetic variability among the halophilic archaea.
Affiliation(s)
- Kosta Konstantinidis
- Department of Membrane Biochemistry, Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
|
33
|
Bao L, Gu H, Dunn KA, Bielawski JP. Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data. BMC Evol Biol 2007; 7 Suppl 1:S5. [PMID: 17288578 PMCID: PMC1796614 DOI: 10.1186/1471-2148-7-s1-s5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Models of codon evolution have proven useful for investigating the strength and direction of natural selection. In some cases, a priori biological knowledge has been used successfully to model heterogeneous evolutionary dynamics among codon sites. These are called fixed-effect models, and they require that all codon sites are assigned to one of several partitions which are permitted to have independent parameters for selection pressure, evolutionary rate, transition to transversion ratio or codon frequencies. For single gene analysis, partitions might be defined according to protein tertiary structure, and for multiple gene analysis partitions might be defined according to a gene's functional category. Given a set of related fixed-effect models, the task of selecting the model that best fits the data is not trivial. RESULTS In this study, we implement a set of fixed-effect codon models which allow for different levels of heterogeneity among partitions in the substitution process. We describe strategies for selecting among these models by a backward elimination procedure, Akaike information criterion (AIC) or a corrected Akaike information criterion (AICc). We evaluate the performance of these model selection methods via a simulation study, and make several recommendations for real data analysis. Our simulation study indicates that the backward elimination procedure can provide a reliable method for model selection in this setting. We also demonstrate the utility of these models by application to a single-gene dataset partitioned according to tertiary structure (abalone sperm lysin), and a multi-gene dataset partitioned according to the functional category of the gene (flagellar-related proteins of Listeria). CONCLUSION Fixed-effect models have advantages and disadvantages. Fixed-effect models are desirable when data partitions are known to exhibit significant heterogeneity or when a statistical test of such heterogeneity is desired. They have the disadvantage of requiring a priori knowledge for partitioning sites. We recommend: (i) selection of models by using backward elimination rather than AIC or AICc, (ii) use a stringent cut-off, e.g., p = 0.0001, and (iii) conduct sensitivity analysis of results. With thoughtful application, fixed-effect codon models should provide a useful tool for large scale multi-gene analyses.
Affiliation(s)
- Le Bao
- Department of Mathematics and Statistics, Dalhousie University Halifax Nova Scotia, Canada
- Hong Gu
- Department of Mathematics and Statistics, Dalhousie University Halifax Nova Scotia, Canada
- Katherine A Dunn
- Department of Biology, Dalhousie University, Halifax Nova Scotia, Canada
- Joseph P Bielawski
- Department of Mathematics and Statistics, Dalhousie University Halifax Nova Scotia, Canada
- Department of Biology, Dalhousie University, Halifax Nova Scotia, Canada
|
34
|
von Mering C, Hugenholtz P, Raes J, Tringe SG, Doerks T, Jensen LJ, Ward N, Bork P. Quantitative phylogenetic assessment of microbial communities in diverse environments. Science 2007; 315:1126-30. [PMID: 17272687 DOI: 10.1126/science.1133420] [Citation(s) in RCA: 217] [Impact Index Per Article: 12.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
The taxonomic composition of environmental communities is an important indicator of their ecology and function. We used a set of protein-coding marker genes, extracted from large-scale environmental shotgun sequencing data, to provide a more direct, quantitative, and accurate picture of community composition than that provided by traditional ribosomal RNA-based approaches depending on the polymerase chain reaction. Mapping marker genes from four diverse environmental data sets onto a reference species phylogeny shows that certain communities evolve faster than others. The method also enables determination of preferred habitats for entire microbial clades and provides evidence that such habitat preferences are often remarkably stable over time.
Affiliation(s)
- C von Mering
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
|
35
|
Chen J, Blackwell TW, Fermin D, Menon R, Chen Y, Gao J, Lee AW, States DJ. Evolutionary-conserved gene expression response profiles across mammalian tissues. OMICS 2007; 11:96-115. [PMID: 17411398 DOI: 10.1089/omi.2006.0007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
Gene expression responses are complex and frequently involve the actions of many genes to effect coordinated patterns. We hypothesized these coordinated responses are evolutionarily conserved and used a comparison of human and mouse gene expression profiles to identify the most prominent conserved features across a set of normal mammalian tissues. Based on data from multiple studies across multiple tissues in human and mouse, 13 gene expression modes across multiple tissues were identified in each of these species using principal component analysis. Strikingly, 1-to-1 pairing of human and mouse modes was observed in 12 out of 13 modes obtained from the two species independently. These paired modes define evolutionarily conserved gene expression response modes (CGEMs). Notably, in this study we were able to extract biological responses that are not overwhelmed by laboratory-to-laboratory or species-to-species variation. Of the variation in our gene expression dataset, 84% can be explained using these CGEMs. Functional annotation was performed using Gene Ontology, pathway, and transcription factor binding site over representation. Our conclusion is that we found an unbiased way of obtaining conserved gene response modes that accounts for a considerable portion of gene expression variation in a given dataset, as well as validates the conservation of major gene expression response modes across the mammals.
Affiliation(s)
- Ji Chen
- Bioinformatics Program, University of Michigan, Ann Arbor, Michigan 48109, USA
|
36
|
Abstract
Research into the origins of introns is at a critical juncture in the resolution of theories on the evolution of early life (which came first, RNA or DNA?), the identity of LUCA (the last universal common ancestor, was it prokaryotic- or eukaryotic-like?), and the significance of noncoding nucleotide variation. One early notion was that introns would have evolved as a component of an efficient mechanism for the origin of genes. But alternative theories emerged as well. From the debate between the "introns-early" and "introns-late" theories came the proposal that introns arose before the origin of genetically encoded proteins and DNA, and the more recent "introns-first" theory, which postulates the presence of introns at that early evolutionary stage from a reconstruction of the "RNA world." Here we review seminal and recent ideas about intron origins. Recent discoveries about the patterns and causes of intron evolution make this one of the most hotly debated and exciting topics in molecular evolutionary biology today.
Affiliation(s)
- Francisco Rodríguez-Trelles
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525, USA.
|
37
|
Abstract
It is proposed that the pre-cellular stage of biological evolution, including the Last Universal Common Ancestor (LUCA) of modern cellular life forms, occurred within networks of inorganic compartments that hosted a diverse mix of virus-like genetic elements. This viral model of cellular origin recapitulates the early ideas of J.B.S. Haldane, sketched in his 1928 essay on the origin of life. However, unlike in Haldane's day, there is substantial empirical support for this scenario from three major lines of evidence provided by comparative genomics: (i) the lack of homology among the core components of the DNA replication systems between the two primary lines of descent of cellular life forms, archaea and bacteria, (ii) the similar lack of homology between the enzymes of lipid biosynthesis in conjunction with distinct membrane chemistries in archaea and bacteria, and (iii) the spread of several viral hallmark genes, which encode proteins with key functions in viral replication and morphogenesis, among numerous and extremely diverse groups of viruses, in contrast to their absence in cellular life forms. Under the viral model of pre-cellular evolution, the key elements of cells including the replication apparatus, membranes, molecular complexes involved in membrane transport and translocation, and others originated as components of virus-like entities. This model alleviates, at least in part, the challenge of the emergence of the immensely complex organization of modern cells.
Affiliation(s)
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda
|
38
|
Fischer D. Servers for protein structure prediction. Curr Opin Struct Biol 2006; 16:178-82. [PMID: 16546376 DOI: 10.1016/j.sbi.2006.03.004] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2006] [Revised: 02/14/2006] [Accepted: 03/07/2006] [Indexed: 11/18/2022]
Abstract
The 1990s cultivated a generation of protein structure human predictors. As a result of structural genomics and genome sequencing projects, and significant improvements in the performance of protein structure prediction methods, a generation of automated servers has evolved in the past few years. Servers for close and distant homology modeling are now routinely used by many biologists, and have already been applied to the experimental structure determination process itself, and to the interpretation and annotation of genome sequences. Because dozens of servers are currently available, it is hard for a biologist to know which server(s) to use; however, the state of the art of these methods is now assessed through the LiveBench and CAFASP experiments. Meta-servers--servers that use the results of other autonomous servers to produce a consensus prediction--have proven to be the best performers, and are already challenging all but a handful of expert human predictors. The difference in performance of the top ten autonomous (non-meta) servers is small and hard to assess using relatively small test sets. Recent experiments suggest that servers will soon free humans from most of the burden of protein structure prediction.
Affiliation(s)
- Daniel Fischer
- Buffalo Center of Excellence in Bioinformatics, and Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA.
|
39
|
Duggin IG, Bell SD. The chromosome replication machinery of the archaeon Sulfolobus solfataricus. J Biol Chem 2006; 281:15029-32. [PMID: 16467299 DOI: 10.1074/jbc.r500029200] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
In the three domains of life, the archaea, bacteria, and eukarya, there are two general lineages of DNA replication proteins: the bacterial and the eukaryal/archaeal lineages. The hyperthermophilic archaeon Sulfolobus solfataricus provides an attractive model for biochemical study of DNA replication. Its relative simplicity in both genomic and biochemical contexts, together with high protein thermostability, has already provided insight into the function of the more complex yet homologous molecules of the eukaryotic domain. Here, we provide an overview of recent insights into the functioning of the chromosome replication machinery of S. solfataricus, focusing on some of the relatively well characterized core components that act at the DNA replication fork.
Affiliation(s)
- Iain G Duggin
- MRC Cancer Cell Unit, Hutchison/Medical Research Council Research Centre, Hills Road, Cambridge CB2 2XZ, United Kingdom
|