1
Sevillya G, Snir S. Synteny footprints provide clearer phylogenetic signal than sequence data for prokaryotic classification. Mol Phylogenet Evol 2019; 136:128-137. [DOI: 10.1016/j.ympev.2019.03.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/14/2018] [Revised: 03/07/2019] [Accepted: 03/17/2019] [Indexed: 01/22/2023]
2
Comparative analyses of whole-genome protein sequences from multiple organisms. Sci Rep 2018; 8:6800. [PMID: 29717164 PMCID: PMC5931523 DOI: 10.1038/s41598-018-25090-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/07/2017] [Accepted: 04/16/2018] [Indexed: 12/02/2022]
Abstract
Phylogenies based on entire genomes are a powerful tool for reconstructing the Tree of Life. Several methods have been proposed, most of which employ an alignment-free strategy. Average sequence similarity methods are different than most other whole-genome methods, because they are based on local alignments. However, previous average similarity methods fail to reconstruct a correct phylogeny when compared against other whole-genome trees. In this study, we developed a novel average sequence similarity method. Our method correctly reconstructs the phylogenetic tree of in silico evolved E. coli proteomes. We applied the method to reconstruct a whole-proteome phylogeny of 1,087 species from all three domains of life, Bacteria, Archaea, and Eucarya. Our tree was automatically reconstructed without any human decisions, such as the selection of organisms. The tree exhibits a concentric circle-like structure, indicating that all the organisms have similar total branch lengths from their common ancestor. Branching patterns of the members of each phylum of Bacteria and Archaea are largely consistent with previous reports. The topologies are largely consistent with those reconstructed by other methods. These results strongly suggest that this approach has sufficient taxonomic resolution and reliability to infer phylogeny, from phylum to strain, of a wide range of organisms.
3
Paquola ACM, Asif H, Pereira CADB, Feltes BC, Bonatto D, Lima WC, Menck CFM. Horizontal Gene Transfer Building Prokaryote Genomes: Genes Related to Exchange Between Cell and Environment are Frequently Transferred. J Mol Evol 2018; 86:190-203. [DOI: 10.1007/s00239-018-9836-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 12/14/2017] [Accepted: 03/15/2018] [Indexed: 10/17/2022]
4
Abstract
The patchy distribution of genes across the prokaryotes may be caused by multiple gene losses or lateral transfer. Probabilistic models of gene gain and loss are needed to distinguish between these possibilities. Existing models allow only single genes to be gained and lost, despite the empirical evidence for multi-gene events. We compare birth-death models (currently the only widely-used models, in which only one gene can be gained or lost at a time) to blocks models (allowing gain and loss of multiple genes within a family). We analyze two pairs of genomes: two E. coli strains, and the distantly-related Archaeoglobus fulgidus (archaea) and Bacillus subtilis (gram positive bacteria). Blocks models describe the data much better than birth-death models. Our models suggest that lateral transfers of multiple genes from the same family are rare (although transfers of single genes are probably common). For both pairs, the estimated median time that a gene will remain in the genome is not much greater than the time separating the common ancestors of the archaea and bacteria. Deep phylogenetic reconstruction from sequence data will therefore depend on choosing genes likely to remain in the genome for a long time. Phylogenies based on the blocks model are more biologically plausible than phylogenies based on the birth-death model.
Affiliation(s)
- Matthew Spencer
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
5
Snir S. Ordered orthology as a tool in prokaryotic evolutionary inference. Mob Genet Elements 2017; 6:e1120576. [PMID: 28090377 DOI: 10.1080/2159256x.2015.1120576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2015] [Revised: 10/27/2015] [Accepted: 11/10/2015] [Indexed: 10/22/2022]
Abstract
Molecular data is accumulating at an exponentially increasing pace. This deluge of information should have brought us closer to resolving one of the most fundamental issues in biology: deciphering the history of life on Earth. So far, however, this abundance of data only seems to blur our understanding of the problem. This is largely due to horizontal gene transfer (HGT), the transfer of genetic material between evolutionarily unrelated organisms, which transforms the prokaryotic tree into a network of relationships. Recently, we developed a method to infer evolutionary relationships among closely related species where the conventional evolutionary markers do not provide a strong enough signal. The method relies on the loss of synteny (gene order conservation among species), which provides a stronger signal, sufficient to classify even strains of a given species. Here we elaborate on this method and suggest further uses of it in the context of detecting HGT events and genome architecture.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
6
No evidence of inhibition of horizontal gene transfer by CRISPR-Cas on evolutionary timescales. ISME J 2015; 9:2021-7. [PMID: 25710183 PMCID: PMC4542034 DOI: 10.1038/ismej.2015.20] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 08/14/2014] [Revised: 11/24/2014] [Accepted: 01/08/2015] [Indexed: 11/23/2022]
Abstract
The CRISPR (clustered, regularly, interspaced, short, palindromic repeats)–Cas (CRISPR-associated genes) systems of archaea and bacteria provide adaptive immunity against viruses and other selfish elements and are believed to curtail horizontal gene transfer (HGT). Limiting acquisition of new genetic material could be one of the sources of the fitness cost of CRISPR–Cas maintenance and one of the causes of the patchy distribution of CRISPR–Cas among bacteria, and across environments. We sought to test the hypothesis that the activity of CRISPR–Cas in microbes is negatively correlated with the extent of recent HGT. Using three independent measures of HGT, we found no significant dependence between the length of CRISPR arrays, which reflects the activity of the immune system, and the estimated number of recent HGT events. In contrast, we observed a significant negative dependence between the estimated extent of HGT and growth temperature of microbes, which could be explained by the lower genetic diversity in hotter environments. We hypothesize that the relevant events in the evolution of resistance to mobile elements and proclivity for HGT, to which CRISPR–Cas systems seem to substantially contribute, occur on the population scale rather than on the timescale of species evolution.
7
Shifman A, Ninyo N, Gophna U, Snir S. Phylo SI: a new genome-wide approach for prokaryotic phylogeny. Nucleic Acids Res 2013; 42:2391-404. [PMID: 24243847 PMCID: PMC3936750 DOI: 10.1093/nar/gkt1138] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
The evolutionary history of all life forms is usually represented as a vertical tree-like process. In prokaryotes, however, the vertical signal is partly obscured by the massive influence of horizontal gene transfer (HGT). The HGT creates widespread discordance between evolutionary histories of different genes as genomes become mosaics of gene histories. Thus, the Tree of Life (TOL) has been questioned as an appropriate representation of the evolution of prokaryotes. Nevertheless a common hypothesis is that prokaryotic evolution is primarily tree-like, and a routine effort is made to place new isolates in their appropriate location in the TOL. Moreover, it appears desirable to exploit non–tree-like evolutionary processes for the task of microbial classification. In this work, we present a novel technique that builds on the straightforward observation that gene order conservation (‘synteny’) decreases in time as a result of gene mobility. This is particularly true in prokaryotes, mainly due to HGT. Using a ‘synteny index’ (SI) that measures the average synteny between a pair of genomes, we developed the phylogenetic reconstruction tool ‘Phylo SI’. Phylo SI offers several attractive properties such as easy bootstrapping, high sensitivity in cases where phylogenetic signal is weak and computational efficiency. Phylo SI was tested both on simulated data and on two bacterial data sets and compared with two well-established phylogenetic methods. Phylo SI is particularly efficient on short evolutionary distances where synteny footprints remain detectable, whereas the nucleotide substitution signal is too weak for reliable sequence-based phylogenetic reconstruction. The method is publicly available at http://research.haifa.ac.il/ssagi/software/PhyloSI.zip.
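The idea of a synteny index can be illustrated with a minimal Python sketch. The abstract does not spell out the definition, so the sketch assumes that each genome is an ordered list of unique gene-family identifiers on a linear chromosome, that a gene's neighborhood is the k genes on either side, and that the index is the mean fraction of shared neighbors over genes present in both genomes. The names `neighborhood`, `synteny_index`, and the parameter `k` are illustrative and are not taken from Phylo SI.

```python
def neighborhood(genome, index, k):
    """Return the set of genes within k positions of genome[index].

    Assumption: the chromosome is treated as linear, so genes near the
    ends simply have fewer neighbors.
    """
    lo = max(0, index - k)
    hi = min(len(genome), index + k + 1)
    return set(genome[lo:index]) | set(genome[index + 1:hi])


def synteny_index(genome_a, genome_b, k=2):
    """Mean fraction of shared neighbors over genes found in both genomes.

    Assumption: each gene identifier occurs at most once per genome.
    """
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    shared = pos_a.keys() & pos_b.keys()
    if not shared:
        return 0.0
    total = 0.0
    for gene in shared:
        near_a = neighborhood(genome_a, pos_a[gene], k)
        near_b = neighborhood(genome_b, pos_b[gene], k)
        # 2 * k is the largest possible neighborhood size.
        total += len(near_a & near_b) / (2 * k)
    return total / len(shared)
```

Under these assumptions, a pairwise distance for tree building could be taken as `1 - synteny_index(a, b)`, so that genomes whose gene order has been shuffled more heavily end up farther apart.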
Affiliation(s)
- Anton Shifman
- Department of Evolutionary & Environmental Biology, University of Haifa, Haifa 31905 Israel, Department of Molecular Microbiology and Biotechnology Tel Aviv University, Tel Aviv 69978, Israel and National Evolutionary Synthesis Center, 2024 W. Main Street A200, Durham, NC 27705, USA
8
Satoh S, Mimuro M, Tanaka A. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences. PLoS One 2013; 8:e70290. [PMID: 23922968 PMCID: PMC3724816 DOI: 10.1371/journal.pone.0070290] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/25/2012] [Accepted: 06/18/2013] [Indexed: 12/03/2022]
Abstract
Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.
Affiliation(s)
- Soichirou Satoh
- Graduate School of Life and Environmental Science, Kyoto Prefectural University, Kyoto, Japan
- Institute of Low Temperature Science, Hokkaido University, Sapporo, Japan
- Mamoru Mimuro
- Graduate School of Human and Environmental Studies, Kyoto University, Kyoto, Japan
- Ayumi Tanaka
- Institute of Low Temperature Science, Hokkaido University, Sapporo, Japan
- CREST, Japan Science and Technology Agency, Sapporo, Japan
9
Meinel T, Krause A. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling. Evol Bioinform Online 2012; 8:489-525. [PMID: 22915837 PMCID: PMC3422217 DOI: 10.4137/ebo.s9642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes have led to a huge diversity in published phylogenies. Comparison of those and, moreover, identification of the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on in total 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.
Affiliation(s)
- Thomas Meinel
- Charité-University Medicine Berlin, Institute for Physiology, Structural Bioinformatics Group, Thielallee 71, 14195 Berlin, Germany
10
Williams D, Fournier GP, Lapierre P, Swithers KS, Green AG, Andam CP, Gogarten JP. A rooted net of life. Biol Direct 2011; 6:45. [PMID: 21936906 PMCID: PMC3189188 DOI: 10.1186/1745-6150-6-45] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 02/07/2011] [Accepted: 09/21/2011] [Indexed: 01/29/2023]
Abstract
Phylogenetic reconstruction using DNA and protein sequences has allowed the reconstruction of evolutionary histories encompassing all life. We present and discuss a means to incorporate much of this rich narrative into a single model that acknowledges the discrete evolutionary units that constitute the organism. Briefly, this Rooted Net of Life genome phylogeny is constructed around an initial, well resolved and rooted tree scaffold inferred from a supermatrix of combined ribosomal genes. Extant sampled ribosomes form the leaves of the tree scaffold. These leaves, but not necessarily the deeper parts of the scaffold, can be considered to represent a genome or pan-genome, and to be associated with members of other gene families within that sequenced (pan)genome. Unrooted phylogenies of gene families containing four or more members are reconstructed and superimposed over the scaffold. Initially, reticulations are formed where incongruities between topologies exist. Given sufficient evidence, edges may then be differentiated as those representing vertical lines of inheritance within lineages and those representing horizontal genetic transfers or endosymbioses between lineages.
REVIEWERS: W. Ford Doolittle, Eric Bapteste and Robert Beiko.
Affiliation(s)
- David Williams
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA.
11
Andam CP, Fournier GP, Gogarten JP. Multilevel populations and the evolution of antibiotic resistance through horizontal gene transfer. FEMS Microbiol Rev 2011; 35:756-67. [DOI: 10.1111/j.1574-6976.2011.00274.x] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Indexed: 11/28/2022]
12
Abstract
BACKGROUND: Genome sequencing has revolutionized our view of the relationships among genomes, particularly in revealing the confounding effects of lateral genetic transfer (LGT). Phylogenomic techniques have been used to construct purported trees of microbial life. Although such trees are easily interpreted and allow the use of a subset of genomes as "proxies" for the full set, LGT and other phenomena impact the positioning of different groups in genome trees, confounding and potentially invalidating attempts to construct a phylogeny-based taxonomy of microorganisms. Network and graph approaches can reveal complex sets of relationships, but applying these techniques to large data sets is a significant challenge. Notwithstanding the question of what exactly it might represent, generating and interpreting a Tree or Network of All Genomes will only be feasible if current algorithms can be improved upon.
RESULTS: Complex relationships among even the most-similar genomes demonstrate that proxy-based approaches to simplifying large sets of genomes are not alone sufficient to solve the analysis problem. A phylogenomic analysis of 1173 sequenced bacterial and archaeal genomes generated phylogenetic trees for 159,905 distinct homologous gene sets. The relationships inferred from this set can be heavily dependent on the inclusion of other taxa: for example, phyla such as Spirochaetes, Proteobacteria and Firmicutes are recovered as cohesive groups or split depending on the presence of other specific lineages. Furthermore, named groups such as Acidithiobacillus, Coprothermobacter and Brachyspira show a multitude of affiliations that are more consistent with their ecology than with small subunit ribosomal DNA-based taxonomy. Network and graph representations can illustrate the multitude of conflicting affinities, but all methods impose constraints on the input data and create challenges of construction and interpretation.
CONCLUSIONS: These complex relationships highlight the need for an inclusive approach to genomic data, and current methods with minor alterations will likely scale to allow the analysis of data sets with 10,000 or more genomes. The main challenges lie in the visualization and interpretation of genomic relationships, and the redefinition of microbial taxonomy when subsets of genomic data are so evidently in conflict with one another, and with the "canonical" molecular taxonomy.
Affiliation(s)
- Robert G Beiko
- Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 1W5 Canada.
13
Stone AC, Wilbur AK, Buikstra JE, Roberts CA. Tuberculosis and leprosy in perspective. Am J Phys Anthropol 2010; 140 Suppl 49:66-94. [PMID: 19890861 DOI: 10.1002/ajpa.21185] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022]
Abstract
Two of humankind's most socially and psychologically devastating diseases, tuberculosis and leprosy, have been the subject of intensive paleopathological research due to their antiquity, a presumed association with human settlement and subsistence patterns, and their propensity to leave characteristic lesions on skeletal and mummified remains. Despite a long history of medical research and the development of effective chemotherapy, these diseases remain global health threats even in the 21st century, and as such, their causative agents Mycobacterium tuberculosis and M. leprae, respectively, have recently been the subject of molecular genetics research. The new genome-level data for several mycobacterial species have informed extensive phylogenetic analyses that call into question previously accepted theories concerning the origins and antiquity of these diseases. Of special note is the fact that all new models are in broad agreement that human TB predated that in other animals, including cattle and other domesticates, and that this disease originated at least 35,000 years ago and probably closer to 2.6 million years ago. In this work, we review current phylogenetic and biogeographic models derived from molecular biology and explore their implications for the global development of TB and leprosy, past and present. In so doing, we also briefly review the skeletal evidence for TB and leprosy, explore the current status of these pathogens, critically consider current methods for identifying ancient mycobacterial DNA, and evaluate coevolutionary models.
Affiliation(s)
- Anne C Stone
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ 85287, USA.
14
Abstract
We present the pan-genome tree as a tool for visualizing similarities and differences between closely related microbial genomes within a species or genus. Distance between genomes is computed as a weighted relative Manhattan distance based on gene family presence/absence. The weights can be chosen with emphasis on groups of gene families conserved to various degrees inside the pan-genome. The software is available for free as an R-package.
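A weighted Manhattan distance on presence/absence profiles can be sketched as follows. This is a minimal Python illustration, not the R package itself: it assumes each genome is a list of 0/1 values over the same ordered gene families, and it interprets "relative" as dividing by the total weight, which is an assumption not stated in the abstract. The function name and arguments are hypothetical.

```python
def weighted_relative_manhattan(profile_a, profile_b, weights):
    """Weighted Manhattan distance between two presence/absence profiles.

    Assumption: "relative" means normalizing by the sum of the weights,
    so the result lies between 0 and 1 for 0/1 profiles.
    """
    if not (len(profile_a) == len(profile_b) == len(weights)):
        raise ValueError("profiles and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_diff = sum(
        w * abs(a - b) for a, b, w in zip(profile_a, profile_b, weights)
    )
    return weighted_diff / total_weight
```

Raising the weight of a gene family makes a disagreement in that family count for more, which is how emphasis could be shifted toward families conserved to a particular degree within the pan-genome.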
15
Bapteste E, O'Malley MA, Beiko RG, Ereshefsky M, Gogarten JP, Franklin-Hall L, Lapointe FJ, Dupré J, Dagan T, Boucher Y, Martin W. Prokaryotic evolution and the tree of life are two different things. Biol Direct 2009; 4:34. [PMID: 19788731 PMCID: PMC2761302 DOI: 10.1186/1745-6150-4-34] [Citation(s) in RCA: 128] [Impact Index Per Article: 8.5] [Received: 07/16/2009] [Accepted: 09/29/2009] [Indexed: 01/04/2023]
Abstract
BACKGROUND: The concept of a tree of life is prevalent in the evolutionary literature. It stems from attempting to obtain a grand unified natural system that reflects a recurrent process of species and lineage splittings for all forms of life. Traditionally, the discipline of systematics operates in a similar hierarchy of bifurcating (sometimes multifurcating) categories. The assumption of a universal tree of life hinges upon the process of evolution being tree-like throughout all forms of life and all of biological time. In multicellular eukaryotes, the molecular mechanisms and species-level population genetics of variation do indeed mainly cause a tree-like structure over time. In prokaryotes, they do not. Prokaryotic evolution and the tree of life are two different things, and we need to treat them as such, rather than extrapolating from macroscopic life to prokaryotes. In the following we will consider this circumstance from philosophical, scientific, and epistemological perspectives, surmising that phylogeny opted for a single model as a holdover from the Modern Synthesis of evolution.
RESULTS: It was far easier to envision and defend the concept of a universal tree of life before we had data from genomes. But the belief that prokaryotes are related by such a tree has now become stronger than the data to support it. The monistic concept of a single universal tree of life appears, in the face of genome data, increasingly obsolete. This traditional model to describe evolution is no longer the most scientifically productive position to hold, because of the plurality of evolutionary patterns and mechanisms involved. Forcing a single bifurcating scheme onto prokaryotic evolution disregards the non-tree-like nature of natural variation among prokaryotes and accounts for only a minority of observations from genomes.
CONCLUSION: Prokaryotic evolution and the tree of life are two different things. Hence we will briefly set out alternative models to the tree of life to study their evolution. Ultimately, the plurality of evolutionary patterns and mechanisms involved, such as the discontinuity of the process of evolution across the prokaryote-eukaryote divide, summons forth a pluralistic approach to studying evolution.
REVIEWERS: This article was reviewed by Ford Doolittle, John Logsdon and Nicolas Galtier.
16
Abstract
A universal Tree of Life has been a longstanding goal of the biosciences. The most common Tree of Life, based on the small subunit rRNA gene, may or may not represent the phylogenetic history of microorganisms. The horizontal transfer of genes from one taxon to another provides a means by which each gene may tell of an independent history. When complete genomes became available, the extent to which horizontal gene transfer (HGT) has occurred became more evident. When using genomic data to study the Tree of Life, one can use any of the four broad approaches: (i) build lots of individual gene trees ("phylogenomics"), (ii) concatenate genes together for an analysis yielding one "supergene" tree, (iii) form a single tree based on the "gene content" within genomes using either orthologs or homologs, or (iv) investigate the order of genes within genomes to discern some aspects of microbial evolution. The application of whole genome tree building has suggested that there is a core tree, that such a core tree can be investigated using these varied methods, and that the results are largely similar to those of the rRNA universal Tree of Life. Some of the most interesting features of the rRNA tree, such as early diverging hyperthermophilic lineages are still uncertain, but remain a possibility. Genomic trees and geologic evidence together suggest that the vertical descent of genes and the horizontal transfer of genes between genetically similar lineages ultimately results in a core Tree of Life with at least some lineages that have phenotypic characteristics recognizable for billions of years.
Affiliation(s)
- Christopher H House
- Department of Geosciences and Pennsylvania State Astrobiology Research Center, Pennsylvania State University, University Park, PA, USA
17
Beiko RG, Doolittle WF, Charlebois RL. The Impact of Reticulate Evolution on Genome Phylogeny. Syst Biol 2008; 57:844-56. [DOI: 10.1080/10635150802559265] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 10/21/2022]
Affiliation(s)
- Robert G. Beiko
- Faculty of Computer Science, Dalhousie University, and Institute for Molecular Bioscience/ARC Centre for Bioinformatics, Brisbane, Australia
- W. Ford Doolittle
- Genome Atlantic, Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Robert L. Charlebois
- Genome Atlantic, Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
18
Koonin EV, Wolf YI. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res 2008; 36:6688-719. [PMID: 18948295 PMCID: PMC2588523 DOI: 10.1093/nar/gkn668] [Citation(s) in RCA: 534] [Impact Index Per Article: 33.4] [Indexed: 01/04/2023]
Abstract
The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often, distant organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution.
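The doubling times quoted above translate into growth factors through the standard exponential relation N(t) = N0 * 2^(t / T). The Python sketch below only illustrates that arithmetic; the function name and its arguments are hypothetical, and it assumes the doubling time stays constant over the whole period.

```python
def growth_factor(months, doubling_time_months):
    """Fold increase after `months`, assuming a constant doubling time."""
    if doubling_time_months <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (months / doubling_time_months)


# Fold increase over ten years for the two doubling times in the abstract.
bacteria_fold = growth_factor(120, 20)  # 2 ** 6
archaea_fold = growth_factor(120, 34)   # 2 ** (120 / 34)
```

With a 20-month doubling time the count of sequenced bacterial genomes would grow 64-fold in ten years, while a 34-month doubling time gives roughly an 11.5-fold increase for archaea over the same period.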
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
19
Peregrín-Alvarez JM, Parkinson J. The global landscape of sequence diversity. Genome Biol 2008; 8:R238. [PMID: 17996061 PMCID: PMC2258180 DOI: 10.1186/gb-2007-8-11-r238] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/25/2007] [Revised: 10/18/2007] [Accepted: 11/08/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Systematic comparisons between genomic sequence datasets have revealed a wide spectrum of sequence specificity from sequences that are highly conserved to those that are specific to individual species. Due to the limited number of fully sequenced eukaryotic genomes, analyses of this spectrum have largely focused on prokaryotes. Combining existing genomic datasets with the partial genomes of 193 eukaryotes derived from collections of expressed sequence tags, we performed a quantitative analysis of the sequence specificity spectrum to provide a global view of the origins and extent of sequence diversity across the three domains of life.
RESULTS: Comparisons with prokaryotic datasets reveal a greater genetic diversity within eukaryotes that may be related to differences in modes of genetic inheritance. Mapping this diversity within a phylogenetic framework revealed that the majority of sequences are either highly conserved or specific to the species or taxon from which they derive. Between these two extremes, several evolutionary landmarks consisting of large numbers of sequences conserved within specific taxonomic groups were identified. For example, 8% of sequences derived from metazoan species are specific and conserved within the metazoan lineage. Many of these sequences likely mediate metazoan specific functions, such as cell-cell communication and differentiation.
CONCLUSION: Through the use of partial genome datasets, this study provides a unique perspective of sequence conservation across the three domains of life. The provision of taxon restricted sequences should prove valuable for future computational and biochemical analyses aimed at understanding evolutionary and functional relationships.
Affiliation(s)
- José Manuel Peregrín-Alvarez
- Molecular Structure and Function, Hospital for Sick Children, 555 University Avenue, Toronto, ON M5G 1X8, Canada.
|
20
|
Wellner A, Gophna U. Neutrality of foreign complex subunits in an experimental model of lateral gene transfer. Mol Biol Evol 2008; 25:1835-40. [PMID: 18550618 DOI: 10.1093/molbev/msn131] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023] Open
Abstract
Lateral gene transfer (LGT) is a powerful force in microbial evolution. However, the barriers that restrict this evolutionary phenomenon are not fully understood. It has long been observed that genes that encode subunits of complexes exhibit relatively compatible phylogenies, implying mostly vertical evolution. This may be explained by the failure of a new gene product to effectively interact with preexisting protein subunits, making its acquisition neutral--a theory termed the "complexity hypothesis." On the other hand, such genes may reduce the fitness of the host by disturbing the stoichiometric balance between complex subunits, resulting in purifying selection against gene retention. To examine these 2 alternative scenarios, we designed an experimental system that mimics the transfer of genes encoding homologs of essential complex subunits into the model bacterium Escherichia coli. In addition, we overexpressed the native E. coli gene in order to examine the contribution of gene dosage effects. We show that accumulation of native or foreign complex subunits in the cell does not result in loss of fitness, except for a minor fitness reduction observed for a single foreign homolog. Indeed, a series of genetic and biochemical assays failed to detect any interaction between the foreign subunits and the native polypeptides of the complex, implying an inability of such transfer events to generate positive selection for gene retention. We conclude that LGT of complex subunits may be mostly neutral and that forces operating against gene retention appear to be moderate.
Affiliation(s)
- Alon Wellner
- Department of Molecular Microbiology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
|
21
|
Wellner A, Lurie MN, Gophna U. Complexity, connectivity, and duplicability as barriers to lateral gene transfer. Genome Biol 2008; 8:R156. [PMID: 17678544 PMCID: PMC2374987 DOI: 10.1186/gb-2007-8-8-r156] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 04/16/2007] [Revised: 07/10/2007] [Accepted: 08/02/2007] [Indexed: 11/25/2022] Open
Abstract
Laterally transferred genes are shown to be less involved in protein-protein interactions, and essential genes that exhibit low duplicability and high connectivity do exhibit mostly vertical descent. Background Lateral gene transfer is a major force in microbial evolution and a great source of genetic innovation in prokaryotes. Protein complexity has been claimed to be a barrier for gene transfer, due to either the inability of a new gene's encoded protein to become a subunit of an existing complex (lack of positive selection), or from a harmful effect exerted by the newcomer on native protein assemblages (negative selection). Results We tested these scenarios using data from the model prokaryote Escherichia coli. Surprisingly, the data did not support an inverse link between membership in protein complexes and gene transfer. As the complexity hypothesis, in its strictest sense, seemed valid only to essential complexes, we broadened its scope to include connectivity in general. Transferred genes are found to be less involved in protein-protein interactions, outside stable complexes, and this is especially true for genes recently transferred to the E. coli genome. Thus, subsequent to transfer, new genes probably integrate slowly into existing protein-interaction networks. We show that a low duplicability of a gene is linked to a lower chance of being horizontally transferred. Notably, many essential genes in E. coli are conserved as singletons across multiple related genomes, have high connectivity and a highly vertical phylogenetic signal. Conclusion High complexity and connectivity generally do not impede gene transfer. However, essential genes that exhibit low duplicability and high connectivity do exhibit mostly vertical descent.
Affiliation(s)
- Alon Wellner
- Department of Molecular Microbiology and Biotechnology, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel, 69978
- Mor N Lurie
- Department of Molecular Microbiology and Biotechnology, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel, 69978
- Uri Gophna
- Department of Molecular Microbiology and Biotechnology, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel, 69978
|
22
|
Konstantinidis KT, Tiedje JM. Prokaryotic taxonomy and phylogeny in the genomic era: advancements and challenges ahead. Curr Opin Microbiol 2007; 10:504-9. [PMID: 17923431 DOI: 10.1016/j.mib.2007.08.006] [Citation(s) in RCA: 290] [Impact Index Per Article: 17.1] [Received: 06/01/2007] [Revised: 08/27/2007] [Accepted: 08/29/2007] [Indexed: 10/22/2022]
Abstract
Advancing prokaryotic taxonomy constitutes a contemporary academic challenge as well as practical necessity. Genome sequencing has greatly facilitated the evaluation of the current taxonomic system and the development of simpler, more portable and accurate, sequence-based alternatives to substitute for the traditional cumbersome methods. Studies based on the former genome-enabled methods reveal that existing taxonomic designations, including the species level, correspond frequently to a continuum of genetic diversity as opposed to natural groupings (e.g. biological species). Improving such artificial and often ambiguous taxonomic designations, however, will require larger genomic datasets and more carefully designed sampling of natural populations. Only then can the promise for a superior genome-based taxonomy materialize.
|
23
|
Spencer M, Bryant D, Susko E. Conditioned genome reconstruction: how to avoid choosing the conditioning genome. Syst Biol 2007; 56:25-43. [PMID: 17366135 DOI: 10.1080/10635150601156313] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022] Open
Abstract
Genome phylogenies can be inferred from data on the presence and absence of genes across taxa. Logdet distances may be a good method, because they allow expected genome size to vary across the tree. Recently, Lake and Rivera proposed conditioned genome reconstruction (calculation of logdet distances using only those genes present in a conditioning genome) to deal with unobservable genes that are absent from every taxon of interest. We prove that their method can consistently estimate the topology for almost any choice of conditioning genome. Nevertheless, the choice of conditioning genome is important for small samples. For real bacterial genome data, different choices of conditioning genome can result in strong bootstrap support for different tree topologies. To overcome this problem, we developed supertree methods that combine information from all choices of conditioning genome. One of these methods, based on the BIONJ algorithm, performs well on simulated data and may have applications to other supertree problems. However, an analysis of 40 bacterial genomes using this method supports an incorrect clade of parasites. This is a common feature of model-based gene content methods and is due to parallel gene loss.
Affiliation(s)
- Matthew Spencer
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 3J5, Canada.
|
24
|
The power of phylogenetic approaches to detect horizontally transferred genes. BMC Evol Biol 2007; 7:45. [PMID: 17376230 PMCID: PMC1847511 DOI: 10.1186/1471-2148-7-45] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 10/02/2006] [Accepted: 03/21/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Horizontal gene transfer plays an important role in evolution because it sometimes allows recipient lineages to adapt to new ecological niches. High gene transfer frequencies were inferred for prokaryotic and early eukaryotic evolution. Does horizontal gene transfer also impact phylogenetic reconstruction of the evolutionary history of genomes and organisms? The answer to this question depends at least in part on the actual gene transfer frequencies and on the ability to weed out transferred genes from further analyses. Are the detected transfers mainly false positives, or are they the tip of an iceberg of many transfer events most of which go undetected by current methods? RESULTS Phylogenetic detection methods appear to be the method of choice to infer gene transfers, especially for ancient transfers and those followed by orthologous replacement. Here we explore how well some of these methods perform using in silico transfers between the terminal branches of a gamma proteobacterial, genome based phylogeny. For the experiments performed here on average the AU test at a 5% significance level detects 90.3% of the transfers and 91% of the exchanges as significant. Using the Robinson-Foulds distance only 57.7% of the exchanges and 60% of the donations were identified as significant. Analyses using bipartition spectra appeared most successful in our test case. The power of detection was on average 97% using a 70% cut-off and 94.2% with 90% cut-off for identifying conflicting bipartitions, while the rate of false positives was below 4.2% and 2.1% for the two cut-offs, respectively. For all methods the detection rates improved when more intervening branches separated donor and recipient. CONCLUSION Rates of detected transfers should not be mistaken for the actual transfer rates; most analyses of gene transfers remain anecdotal. The method and significance level to identify potential gene transfer events represent a trade-off between the frequency of erroneous identification (false positives) and the power to detect actual transfer events.
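The Robinson-Foulds distance mentioned in this abstract counts the bipartitions (splits) present in one tree but not in the other. The following is only a minimal Python sketch of that idea, assuming both trees are written as nested tuples of taxon names over the same taxon set; the function names and the tuple encoding are illustrative assumptions and are not taken from the paper.

```python
def clades(tree, out):
    """Return the leaf set below `tree`; record each internal node's leaf set in `out`."""
    if isinstance(tree, str):          # a leaf is a taxon name
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial splits of an unrooted tree, each stored as the side lacking a reference taxon."""
    out = set()
    all_leaves = clades(tree, out)
    ref = min(all_leaves)              # arbitrary but fixed reference taxon
    splits = set()
    for clade in out:
        side = clade if ref not in clade else all_leaves - clade
        # keep only splits with at least two taxa on each side
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon example: the trees differ in one internal grouping.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))
```

Storing each split as the side that lacks a fixed reference taxon makes the comparison independent of where either tree happens to be rooted.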
|
25
|
Comas I, Moya A, González-Candelas F. Phylogenetic signal and functional categories in Proteobacteria genomes. BMC Evol Biol 2007; 7 Suppl 1:S7. [PMID: 17288580 PMCID: PMC1796616 DOI: 10.1186/1471-2148-7-s1-s7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND A comprehensive evolutionary analysis of bacterial genomes implies to identify the hallmark of vertical and non-vertical signals and to discriminate them from the presence of mere phylogenetic noise. In this report we have addressed the impact of factors like the universal distribution of the genes, their essentiality or their functional role in the cell on the inference of vertical signal through phylogenomic methods. RESULTS We have established that supermatrices derived from data sets composed mainly by genes suspected to be essential for bacterial cellular life perform better on the recovery of vertical signal than those composed by widely distributed genes. In addition, we show that the "Transcription" category of genes seems to harbor a better vertical signal than other functional categories. Moreover, the "Poorly characterized" category performs better than other categories related with metabolism or cellular processes. CONCLUSION From these results we conclude that different data sets allow addressing different questions in phylogenomic analyses. The vertical signal seems to be more present in essential genes although these also include a significant degree of incongruence. From a functional perspective, as expected, informational genes perform better than operational ones but we have also shown the surprising behavior of poorly annotated genes, which points to their importance in the genome evolution of bacteria.
Affiliation(s)
- Iñaki Comas
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva. Universidad de Valencia. Apartado Oficial 22085, Valencia E-46071, Spain
- Andrés Moya
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva. Universidad de Valencia. Apartado Oficial 22085, Valencia E-46071, Spain
- Fernando González-Candelas
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva. Universidad de Valencia. Apartado Oficial 22085, Valencia E-46071, Spain
|
26
|
Spencer M, Susko E, Roger AJ. Modelling prokaryote gene content. Evol Bioinform Online 2007; 2:157-78. [PMID: 19455209 PMCID: PMC2674660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
The patchy distribution of genes across the prokaryotes may be caused by multiple gene losses or lateral transfer. Probabilistic models of gene gain and loss are needed to distinguish between these possibilities. Existing models allow only single genes to be gained and lost, despite the empirical evidence for multi-gene events. We compare birth-death models (currently the only widely-used models, in which only one gene can be gained or lost at a time) to blocks models (allowing gain and loss of multiple genes within a family). We analyze two pairs of genomes: two E. coli strains, and the distantly-related Archaeoglobus fulgidus (archaea) and Bacillus subtilis (gram positive bacteria). Blocks models describe the data much better than birth-death models. Our models suggest that lateral transfers of multiple genes from the same family are rare (although transfers of single genes are probably common). For both pairs, the estimated median time that a gene will remain in the genome is not much greater than the time separating the common ancestors of the archaea and bacteria. Deep phylogenetic reconstruction from sequence data will therefore depend on choosing genes likely to remain in the genome for a long time. Phylogenies based on the blocks model are more biologically plausible than phylogenies based on the birth-death model.
Affiliation(s)
- Matthew Spencer
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada. Correspondence: Matthew Spencer, School of Biological Sciences, University of Liverpool, Liverpool, L69 7ZB, UK.
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
|
27
|
Gupta RS, Sneath PHA. Application of the character compatibility approach to generalized molecular sequence data: branching order of the proteobacterial subdivisions. J Mol Evol 2006; 64:90-100. [PMID: 17160641 DOI: 10.1007/s00239-006-0082-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 04/05/2006] [Accepted: 08/28/2006] [Indexed: 10/23/2022]
Abstract
The character compatibility approach, which removes all homoplasic characters and involves finding the largest clique of compatible characters in a dataset, in principle, provides a powerful means for obtaining correct topology in difficult to resolve cases. However, the usefulness of this approach to generalized molecular sequence data for phylogeny determination has not been studied in the past. We have used this approach to determine the topology of 23 proteobacterial species (6 each of alpha-, beta- and gamma-, 3 delta-, and 2 epsilon-proteobacteria) using sequence data for 10 conserved proteins (Hsp60, Hsp70, EF-Tu, EF-G, alanyl-tRNA synthetase, RecA, GyrA, GyrB, RpoB and RpoC). All sites in the sequence alignments of these proteins where only two amino acids were found, with each amino acid present in at least two species, were selected. Mutual compatibility determination on these binary state sites was carried out by two means. In one case, all of these sites were combined into a large dataset (Set A; 957 characters) prior to compatibility analysis. In the second case, compatibility analysis was carried out on characters from individual proteins and all compatible sites were combined into a large dataset (Set B; 398 characters) for further studies. Upon compatibility analyses, the largest cliques that were obtained from Sets A and B consisted of 337 and 323 compatible characters, respectively. In these cliques, all proteobacterial subgroups were clearly distinguished and branching orders of most of the species were also resolved. The epsilon-proteobacteria exhibited the earliest branching, whereas the beta- and gamma-subgroups were found to have emerged last. The relative placement of the alpha- and delta-subgroups, however, was not resolved. The topology of these species was also determined based on 16S rRNA sequences and a concatenated dataset of sequences for all 10 proteins by means of neighbor-joining, maximum likelihood, and maximum parsimony methods. 
In the protein trees, all proteobacterial groups were reliably resolved and they branched in the following order: (epsilon(delta(alpha(beta,gamma)))). However, in the rRNA trees, the gamma- and beta-subgroups exhibited polyphyletic branching and many internal nodes were not resolved. These results indicate that the character compatibility analysis using generalized molecular sequence data provides a powerful means for evolutionary studies. Based on molecular sequences, it should be possible to obtain very large datasets of compatible characters that should prove very helpful in clarifying difficult to resolve phylogenetic relationships.
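The compatibility analysis described in this abstract can be illustrated with a small Python sketch. It assumes each binary character is a string of "0"/"1" states, one per species, and treats two characters as compatible when fewer than four distinct state pairs occur (the standard pairwise test for two-state characters). The brute-force clique search and all function names are illustrative assumptions; the search is exponential and is only practical for a handful of characters, unlike the hundreds analysed in the paper.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Two binary characters are compatible unless all four state pairs (00, 01, 10, 11) occur."""
    return len(set(zip(char_a, char_b))) < 4


def largest_compatible_clique(characters):
    """Indices of a largest set of mutually compatible characters (brute force)."""
    n = len(characters)
    for size in range(n, 0, -1):                 # try the largest subsets first
        for subset in combinations(range(n), size):
            if all(compatible(characters[i], characters[j])
                   for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# Hypothetical characters scored across five species.
chars = ["11000", "11100", "10100"]
print(largest_compatible_clique(chars))
```

In this toy data the first and third characters conflict with each other, so no clique can contain all three.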
Affiliation(s)
- Radhey S Gupta
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Canada L8N 3Z5.
|
28
|
van Passel MWJ, Kuramae EE, Luyf ACM, Bart A, Boekhout T. The reach of the genome signature in prokaryotes. BMC Evol Biol 2006; 6:84. [PMID: 17040564 PMCID: PMC1621082 DOI: 10.1186/1471-2148-6-84] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Received: 07/19/2006] [Accepted: 10/13/2006] [Indexed: 01/31/2023] Open
Abstract
Background With the increased availability of sequenced genomes there have been several initiatives to infer evolutionary relationships by whole genome characteristics. One of these studies suggested good congruence between genome synteny, shared gene content, 16S ribosomal DNA identity, codon usage and the genome signature in prokaryotes. Here we rigorously test the phylogenetic signal of the genome signature, which consists of the genome-specific relative frequencies of dinucleotides, on 334 sequenced prokaryotic genome sequences. Results Intrageneric comparisons show that in general the genomic dissimilarity scores are higher than in intraspecific comparisons, in accordance with the suggested phylogenetic signal of the genome signature. Exceptions to this trend (Bartonella spp., Bordetella spp., Salmonella spp. and Yersinia spp.), which have low average intrageneric genomic dissimilarity scores, suggest that members of these genera might be considered the same species. On the other hand, high genomic dissimilarity values for intraspecific analyses suggest that in some cases (e.g. Prochlorococcus marinus, Pseudomonas fluorescens, Buchnera aphidicola and Rhodopseudomonas palustris) different strains from the same species may actually represent different species. Comparing 16S rDNA identity with genomic dissimilarity values corroborates the previously suggested trend in phylogenetic signal, albeit that the dissimilarity values only provide low resolution. Conclusion The genome signature has a distinct phylogenetic signal, independent of individual genetic marker genes. A reliable phylogenetic clustering cannot be based on dissimilarity values alone, as bootstrapping is not possible for this parameter. It can however be used to support or refute a given phylogeny and resulting taxonomy.
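As a rough illustration of a dinucleotide genome signature and a dissimilarity score between two sequences, the Python sketch below uses the commonly cited relative-abundance formulation: each dinucleotide frequency is divided by the product of its two mononucleotide frequencies, and the dissimilarity is the mean absolute difference over the 16 dinucleotides. This is an assumption about the general technique, not a reproduction of the paper's pipeline; in particular it scores a single strand only and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def signature(seq):
    """Relative abundance f_xy / (f_x * f_y) for each of the 16 dinucleotides."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = {base: seq.count(base) / len(seq) for base in BASES}
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]   # overlapping dinucleotides
    sig = {}
    for dinuc in DINUCLEOTIDES:
        observed = pairs.count(dinuc) / len(pairs)
        expected = mono[dinuc[0]] * mono[dinuc[1]]
        sig[dinuc] = observed / expected if expected > 0 else 0.0
    return sig


def dissimilarity(seq_a, seq_b):
    """Mean absolute difference between two signatures."""
    sig_a, sig_b = signature(seq_a), signature(seq_b)
    return sum(abs(sig_a[d] - sig_b[d]) for d in DINUCLEOTIDES) / len(DINUCLEOTIDES)


# Toy sequences standing in for genomes.
print(dissimilarity("ACGTACGTTTGACA", "GGGCCCGCGCGGCA"))
```

Real genomes would be read from sequence files and are typically scored in sliding windows or as whole replicons; those choices are left out here.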
Affiliation(s)
- Mark WJ van Passel
- Centraalbureau voor Schimmelcultures (CBS), Uppsalalaan 8, Utrecht, The Netherlands
- Center for Infection and Immunity Amsterdam (CINIMA), Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, the Netherlands
- Department of Biochemistry and Molecular Biophysics, University of Arizona, POBox 210088, Tucson, Arizona, USA
- Eiko E Kuramae
- Centraalbureau voor Schimmelcultures (CBS), Uppsalalaan 8, Utrecht, The Netherlands
- Angela CM Luyf
- Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, the Netherlands
- Aldert Bart
- Center for Infection and Immunity Amsterdam (CINIMA), Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, the Netherlands
- Teun Boekhout
- Centraalbureau voor Schimmelcultures (CBS), Uppsalalaan 8, Utrecht, The Netherlands
|
29
|
Gribaldo S, Brochier-Armanet C. The origin and evolution of Archaea: a state of the art. Philos Trans R Soc Lond B Biol Sci 2006; 361:1007-22. [PMID: 16754611 PMCID: PMC1578729 DOI: 10.1098/rstb.2006.1841] [Citation(s) in RCA: 189] [Impact Index Per Article: 10.5] [Indexed: 01/24/2023] Open
Abstract
Environmental surveys indicate that the Archaea are diverse and abundant not only in extreme environments, but also in soil, oceans and freshwater, where they may fulfil a key role in the biogeochemical cycles of the planet. Archaea display unique capacities, such as methanogenesis and survival at temperatures higher than 90 degrees C, that make them crucial for understanding the nature of the biota of early Earth. Molecular, genomics and phylogenetics data strengthen Woese's definition of Archaea as a third domain of life in addition to Bacteria and Eukarya. Phylogenomics analyses of the components of different molecular systems are highlighting a core of mainly vertically inherited genes in Archaea. This allows recovering a globally well-resolved picture of archaeal evolution, as opposed to what is observed for Bacteria and Eukarya. This may be due to the fact that no rapid divergence occurred at the emergence of present-day archaeal lineages. This phylogeny supports a hyperthermophilic and non-methanogenic ancestor to present-day archaeal lineages, and a profound divergence between two major phyla, the Crenarchaeota and the Euryarchaeota, that may not have an equivalent in the other two domains of life. Nanoarchaea may not represent a third and ancestral archaeal phylum, but a fast-evolving euryarchaeal lineage. Methanogenesis seems to have appeared only once and early in the evolution of Euryarchaeota. Filling up this picture of archaeal evolution by adding presently uncultivated species, and placing it back in geological time remain two essential goals for the future.
Affiliation(s)
- Simonetta Gribaldo
- Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extremophiles, 25 rue du Dr Roux, 75724 Paris Cedex 15, France.
|
30
|
Lienau EK, DeSalle R, Rosenfeld JA, Planet PJ. Reciprocal illumination in the gene content tree of life. Syst Biol 2006; 55:441-53. [PMID: 16861208 DOI: 10.1080/10635150600697416] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 10/24/2022] Open
Abstract
Phylogenies based on gene content rely on statements of primary homology to characterize gene presence or absence. These statements (hypotheses) are usually determined by techniques based on threshold similarity or distance measurements between genes. This fundamental but problematic step can be examined by evaluating each homology hypothesis by the extent to which it is corroborated by the rest of the data. Here we test the effects of varying the stringency for making primary homology statements using a range of similarity (e-value) cutoffs in 166 fully sequenced and annotated genomes spanning the tree of life. By evaluating each resulting data set with tree-based measurements of character consistency and information content, we find a set of homology statements that optimizes overall corroboration. The resulting data set produces well-resolved and well-supported trees of life and greatly ameliorates previously noted inconsistencies such as the misclassification of small genomes. The method presented here, which can be used to test any technique for recognizing primary homology, provides an objective framework for evaluating phylogenetic hypotheses and data sets for the tree of life. It also can serve as a technique for identifying well-corroborated sets of homologous genes for functional genomic applications.
Affiliation(s)
- E Kurt Lienau
- American Museum of Natural History, Molecular Laboratories, Central Park West at 79th Street, (P.J.P.), New York, New York 10024, USA
|
31
|
Glasner ME, Fayazmanesh N, Chiang RA, Sakai A, Jacobson MP, Gerlt JA, Babbitt PC. Evolution of structure and function in the o-succinylbenzoate synthase/N-acylamino acid racemase family of the enolase superfamily. J Mol Biol 2006; 360:228-50. [PMID: 16740275 DOI: 10.1016/j.jmb.2006.04.055] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.4] [Received: 02/28/2006] [Revised: 04/22/2006] [Accepted: 04/25/2006] [Indexed: 11/30/2022]
Abstract
Understanding how proteins evolve to provide both exquisite specificity and proficient activity is a fundamental problem in biology that has implications for protein function prediction and protein engineering. To study this problem, we analyzed the evolution of structure and function in the o-succinylbenzoate synthase/N-acylamino acid racemase (OSBS/NAAAR) family, part of the mechanistically diverse enolase superfamily. Although all characterized members of the family catalyze the OSBS reaction, this family is extraordinarily divergent, with some members sharing <15% identity. In addition, a member of this family, Amycolatopsis OSBS/NAAAR, is promiscuous, catalyzing both dehydration and racemization. Although the OSBS/NAAAR family appears to have a single evolutionary origin, no sequence or structural motifs unique to this family could be identified; all residues conserved in the family are also found in enolase superfamily members that have different functions. Based on their species distribution, several uncharacterized proteins similar to Amycolatopsis OSBS/NAAAR appear to have been transmitted by lateral gene transfer. Like Amycolatopsis OSBS/NAAAR, these might have additional or alternative functions to OSBS because many are from organisms lacking the pathway in which OSBS is an intermediate. In addition to functional differences, the OSBS/NAAAR family exhibits surprising structural variations, including large differences in orientation between the two domains. These results offer several insights into protein evolution. First, orthologous proteins can exhibit significant structural variation, and specificity can be maintained with little conservation of ligand-contacting residues. Second, the discovery of a set of proteins similar to Amycolatopsis OSBS/NAAAR supports the hypothesis that new protein functions evolve through promiscuous intermediates. 
Finally, a combination of evolutionary, structural, and sequence analyses identified characteristics that might prime proteins, such as Amycolatopsis OSBS/NAAAR, for the evolution of new activities.
Affiliation(s)
- Margaret E Glasner
- Department of Biopharmaceutical Sciences, University of California, San Francisco, CA 94143, USA
|
32
|
Gophna U, Charlebois RL, Doolittle WF. Ancient lateral gene transfer in the evolution of Bdellovibrio bacteriovorus. Trends Microbiol 2006; 14:64-9. [PMID: 16413191 DOI: 10.1016/j.tim.2005.12.008] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 08/04/2005] [Revised: 11/21/2005] [Accepted: 12/21/2005] [Indexed: 10/25/2022]
Abstract
The recently sequenced genome of the predatory delta-proteobacterium Bdellovibrio bacteriovorus provides many insights into its metabolism and evolution. Because its genes are reasonably uniform in G+C content, it was suggested that B. bacteriovorus actively resists recombination with foreign DNA and horizontal transfer of DNA from other bacteria. To investigate this further, we carried out a variety of phylogenetic and comparative genomics analyses using data from >200 microbial genomes, including several published delta-proteobacteria. Although there might be little evidence for the extensive recent transfer of genes, we demonstrate that ancient lateral gene acquisition has shaped the B. bacteriovorus genome to a great extent.
Affiliation(s)
- Uri Gophna
- Department of Molecular Microbiology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel.
|
33
|
van Passel MWJ, Bart A, van der Ende A. Default taxonomy and the genomics era. MICROBIOLOGY-SGM 2005; 151:2818-2820. [PMID: 16151194 DOI: 10.1099/mic.0.28249-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Affiliation(s)
- M W J van Passel
- Academic Medical Center, Department of Medical Microbiology, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- A Bart
- Academic Medical Center, Department of Medical Microbiology, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- A van der Ende
- Academic Medical Center, Department of Medical Microbiology, PO Box 22700, 1100 DE Amsterdam, The Netherlands
|
34
|
Mongodin EF, Nelson KE, Daugherty S, Deboy RT, Wister J, Khouri H, Weidman J, Walsh DA, Papke RT, Sanchez Perez G, Sharma AK, Nesbø CL, MacLeod D, Bapteste E, Doolittle WF, Charlebois RL, Legault B, Rodriguez-Valera F. The genome of Salinibacter ruber: convergence and gene exchange among hyperhalophilic bacteria and archaea. Proc Natl Acad Sci U S A 2005; 102:18147-52. [PMID: 16330755 PMCID: PMC1312414 DOI: 10.1073/pnas.0509073102] [Citation(s) in RCA: 252] [Impact Index Per Article: 13.3] [Indexed: 11/18/2022] Open
Abstract
Saturated thalassic brines are among the most physically demanding habitats on Earth: few microbes survive in them. Salinibacter ruber is among these organisms and has been found repeatedly in significant numbers in climax saltern crystallizer communities. The phenotype of this bacterium is remarkably similar to that of the hyperhalophilic Archaea (Haloarchaea). The genome sequence suggests that this resemblance has arisen through convergence at the physiological level (different genes producing similar overall phenotype) and the molecular level (independent mutations yielding similar sequences or structures). Several genes and gene clusters also derive by lateral transfer from (or may have been laterally transferred to) haloarchaea. S. ruber encodes four rhodopsins. One resembles bacterial proteorhodopsins and three are of the haloarchaeal type, previously uncharacterized in a bacterial genome. The impact of these modular adaptive elements on the cell biology and ecology of S. ruber is substantial, affecting salt adaptation, bioenergetics, and photobiology.
Affiliation(s)
- E F Mongodin
- The Institute for Genomic Research, Rockville, MD 20850, USA
|
35
|
Abstract
The ranks higher than the species in the prokaryotic taxonomy are primarily designated based on phylogenetic analysis of the 16S rRNA gene sequences, but no definite standards exist for the absolute relatedness (measured by 16S rRNA or other means) between the ranks. Accordingly, it remains unknown how comparable the ranks are between different organisms. To gain insights into this question, we studied the relationship between shared gene content and genetic relatedness for 175 fully sequenced strains, using as a robust measure of relatedness the average amino acid identity (AAI) of the shared genes. Our results reveal that adjacent ranks (e.g., phylum versus class) frequently show extensive overlap in terms of genetic and gene content relatedness of the grouped organisms, and hence, the current system is of limited predictive power in this respect. The overlap between nonadjacent ranks (e.g., phylum versus family) is generally limited and attributable to clear inconsistencies of the taxonomy. In addition to providing means for standardizing taxonomy, our AAI-based approach provides a means to evaluate the robustness of alternative genetic markers for phylogenetic purposes. For instance, the 23S rRNA gene was found to be as good a marker as the 16S rRNA gene, while several of the widely distributed protein-coding genes, such as the RNA polymerase and gyrase subunits, show a strong phylogenetic signal, albeit less strong than the rRNA genes (0.78 > R² > 0.69 for the protein-coding genes versus R² = 0.84 for the rRNA genes). The AAI approach outlined here could contribute significantly to a genome-based taxonomy for all microbial organisms.
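The AAI measure named in this abstract can be illustrated with a minimal sketch. The abstract does not say how shared genes are detected or aligned, so the code below assumes the shared protein pairs have already been identified and pairwise aligned (equal-length strings with `-` for gaps). The function names and example sequences are hypothetical and are not taken from the study.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over gap-free columns of one alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap (assumed convention).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def average_amino_acid_identity(aligned_pairs):
    """Mean percent identity across all shared (pre-aligned) protein pairs."""
    identities = [percent_identity(a, b) for a, b in aligned_pairs]
    if not identities:
        raise ValueError("no shared proteins to compare")
    return sum(identities) / len(identities)


# Hypothetical toy data: two shared proteins between two genomes.
example_pairs = [
    ("MKTAYIAK", "MKTAYIAK"),  # identical pair
    ("MKV-LSA", "MKVALTA"),    # one gap column, one mismatch
]
aai = average_amino_acid_identity(example_pairs)
print(f"AAI = {aai:.1f}%")
```

Averaging per-protein identities weights every shared gene equally; weighting by alignment length would be an equally plausible choice that the abstract does not settle.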
|
36
|
Abstract
To what extent is the tree of life the best representation of the evolutionary history of microorganisms? Recent work has shown that, among sets of prokaryotic genomes in which most homologous genes show extremely low sequence divergence, gene content can vary enormously, implying that those genes that are variably present or absent are frequently horizontally transferred. Traditionally, successful horizontal gene transfer was assumed to provide a selective advantage to either the host or the gene itself, but could horizontally transferred genes be neutral or nearly neutral? We suggest that for many prokaryotes, the boundaries between species are fuzzy, and therefore the principles of population genetics must be broadened so that they can be applied to higher taxonomic categories.
Affiliation(s)
- J Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut 06269-3125, USA.
|
37
|
Bern M, Goldberg D. Automatic selection of representative proteins for bacterial phylogeny. BMC Evol Biol 2005; 5:34. [PMID: 15927057 PMCID: PMC1175084 DOI: 10.1186/1471-2148-5-34] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2004] [Accepted: 05/31/2005] [Indexed: 11/22/2022] Open
Abstract
Background: Although there are now about 200 complete bacterial genomes in GenBank, deep bacterial phylogeny remains a difficult problem, due to confounding horizontal gene transfers and other phylogenetic "noise". Previous methods have relied primarily upon biological intuition or manual curation for choosing genomic sequences unlikely to be horizontally transferred, and have given inconsistent phylogenies with poor bootstrap confidence. Results: We describe an algorithm that automatically picks "representative" protein families from entire genomes for use as phylogenetic characters. A representative protein family is one that, taken alone, gives an organismal distance matrix in good agreement with a distance matrix computed from all sufficiently conserved proteins. We then use maximum-likelihood methods to compute phylogenetic trees from a concatenation of representative sequences. We validate the use of representative proteins on a number of small phylogenetic questions with accepted answers. We then use our methodology to compute a robust and well-resolved phylogenetic tree for a diverse set of sequenced bacteria. The tree agrees closely with a recently published tree computed using manually curated proteins, and supports two proposed high-level clades: one containing Actinobacteria, Deinococcus, and Cyanobacteria ("Terrabacteria"), and another containing Planctomycetes and Chlamydiales. Conclusion: Representative proteins provide an effective solution to the problem of selecting phylogenetic characters.
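The definition of a representative protein family given in this abstract (a family whose own distance matrix agrees well with the matrix from all conserved proteins) can be sketched as a ranking problem. The abstract does not state how agreement is measured. The code below assumes Pearson correlation over the upper triangle of each matrix, which is a stand-in and not necessarily the authors' criterion. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def rank_families(family_matrices, reference_matrix):
    """Sort families by agreement with the reference matrix, best first."""
    ref = upper_triangle(reference_matrix)
    scores = {name: pearson(upper_triangle(m), ref)
              for name, m in family_matrices.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical distances among three organisms.
reference = [[0.0, 0.1, 0.5],
             [0.1, 0.0, 0.6],
             [0.5, 0.6, 0.0]]
families = {
    "famA": [[0.0, 0.2, 1.0], [0.2, 0.0, 1.2], [1.0, 1.2, 0.0]],  # scaled copy
    "famB": [[0.0, 0.6, 0.5], [0.6, 0.0, 0.1], [0.5, 0.1, 0.0]],  # conflicting
}
ranked = rank_families(families, reference)
print(ranked)
```

In this toy case `famA` ranks first because its distances are a scaled copy of the reference, while `famB` correlates negatively, the kind of pattern a horizontally transferred family might show.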
Affiliation(s)
- Marshall Bern
- Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
- David Goldberg
- Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
|
38
|
Bapteste E, Susko E, Leigh J, MacLeod D, Charlebois RL, Doolittle WF. Do orthologous gene phylogenies really support tree-thinking? BMC Evol Biol 2005; 5:33. [PMID: 15913459 PMCID: PMC1156881 DOI: 10.1186/1471-2148-5-33] [Citation(s) in RCA: 148] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2005] [Accepted: 05/24/2005] [Indexed: 11/17/2022] Open
Abstract
Background: Since Darwin's Origin of Species, reconstructing the Tree of Life has been a goal of evolutionists, and tree-thinking has become a major concept of evolutionary biology. Practically, building the Tree of Life has proven to be tedious. Too few morphological characters are useful for conducting conclusive phylogenetic analyses at the highest taxonomic level. Consequently, molecular sequences (genes, proteins, and genomes) likely constitute the only useful characters for constructing a phylogeny of all life. For this reason, tree-makers expect a lot from gene comparisons. The simultaneous study of the largest number of molecular markers possible is sometimes considered to be one of the best solutions in reconstructing the genealogy of organisms. This conclusion is a direct consequence of tree-thinking: if gene inheritance conforms to a tree-like model of evolution, sampling more of these molecules will provide enough phylogenetic signal to build the Tree of Life. The selection of congruent markers is thus a fundamental step in simultaneous analysis of many genes. Results: Heat map analyses were used to investigate the congruence of orthologues in four datasets (archaeal, bacterial, eukaryotic and alpha-proteobacterial). We conclude that we simply cannot determine if a large portion of the genes have a common history. In addition, none of these datasets can be considered free of lateral gene transfer. Conclusion: Our phylogenetic analyses do not support tree-thinking. These results have important conceptual and practical implications. We argue that representations other than a tree should be investigated in this case because a non-critical concatenation of markers could be highly misleading.
Affiliation(s)
- E Bapteste
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
- E Susko
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Mathematics and Statistics, Halifax, Nova Scotia, Canada
- J Leigh
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
- D MacLeod
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
- RL Charlebois
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
- WF Doolittle
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
|
39
|
MacLeod D, Charlebois RL, Doolittle F, Bapteste E. Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement. BMC Evol Biol 2005; 5:27. [PMID: 15819979 PMCID: PMC1087482 DOI: 10.1186/1471-2148-5-27] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2004] [Accepted: 04/08/2005] [Indexed: 11/10/2022] Open
Abstract
Background: When organismal phylogenies based on sequences of single marker genes are poorly resolved, a logical approach is to add more markers, on the assumption that weak but congruent phylogenetic signal will be reinforced in such multigene trees. Such approaches are valid only when the several markers indeed have identical phylogenies, an issue which many multigene methods (such as the use of concatenated gene sequences or the assembly of supertrees) do not directly address. Indeed, even when the true history is a mixture of vertical descent for some genes and lateral gene transfer (LGT) for others, such methods produce unique topologies. Results: We have developed software that aims to extract evidence for vertical and lateral inheritance from a set of gene trees compared against an arbitrary reference tree. This evidence is then displayed as a synthesis showing support over the tree for vertical inheritance, overlaid with explicit lateral gene transfer (LGT) events inferred to have occurred over the history of the tree. Like splits-tree methods, one can thus identify nodes at which conflict occurs. Additionally one can make reasonable inferences about vertical and lateral signal, assigning putative donors and recipients. Conclusion: A tool such as ours can serve to explore the reticulated dimensionality of molecular evolution, by dissecting vertical and lateral inheritance at high resolution. By this, we mean that individual nodes can be examined not only for congruence, but also for coherence in light of LGT. We assert that our tools will facilitate the comparison of phylogenetic trees, and the interpretation of conflicting data.
Affiliation(s)
- Dave MacLeod
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Department of Biochemistry & Molecular Biology, Dalhousie University, 5850 College St., Halifax, NS, B3H 1X5, Canada
- Robert L Charlebois
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Department of Biochemistry & Molecular Biology, Dalhousie University, 5850 College St., Halifax, NS, B3H 1X5, Canada
- Ford Doolittle
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Department of Biochemistry & Molecular Biology, Dalhousie University, 5850 College St., Halifax, NS, B3H 1X5, Canada
- Eric Bapteste
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Department of Biochemistry & Molecular Biology, Dalhousie University, 5850 College St., Halifax, NS, B3H 1X5, Canada
|
40
|
Charlebois RL, Doolittle WF. Computing prokaryotic gene ubiquity: rescuing the core from extinction. Genome Res 2004; 14:2469-77. [PMID: 15574825 PMCID: PMC534671 DOI: 10.1101/gr.3024704] [Citation(s) in RCA: 137] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
The genomic core concept has found several uses in comparative and evolutionary genomics. Defined as the set of all genes common to (ubiquitous among) all genomes in a phylogenetically coherent group, core size decreases as the number and phylogenetic diversity of the relevant group increases. Here, we focus on methods for defining the size and composition of the core of all genes shared by sequenced genomes of prokaryotes (Bacteria and Archaea). There are few (almost certainly less than 50) genes shared by all of the 147 genomes compared, surely insufficient to conduct all essential functions. Sequencing and annotation errors are responsible for the apparent absence of some genes, while very limited but genuine disappearances (from just one or a few genomes) can account for several others. Core size will continue to decrease as more genome sequences appear, unless the requirement for ubiquity is relaxed. Such relaxation seems consistent with any reasonable biological purpose for seeking a core, but it renders the problem of definition more problematic. We propose an alternative approach (the phylogenetically balanced core), which preserves some of the biological utility of the core concept. Cores, however delimited, preferentially contain informational rather than operational genes; we present a new hypothesis for why this might be so.
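The effect of relaxing the ubiquity requirement described in this abstract can be shown with a small sketch. It computes a strict core (genes present in every genome) and a relaxed core (genes present in at least a chosen fraction of genomes). The `min_fraction` threshold is an illustrative assumption and is not the authors' phylogenetically balanced core, whose details the abstract does not give. The genome and gene names are hypothetical.

```python
from collections import Counter


def core_genes(genomes, min_fraction=1.0):
    """Return genes present in at least min_fraction of the genomes.

    genomes maps a genome name to the set of gene families it contains.
    min_fraction=1.0 gives the strict core (ubiquitous genes only).
    """
    if not genomes:
        return set()
    # Count, for each gene family, how many genomes contain it.
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    needed = min_fraction * len(genomes)
    return {gene for gene, count in counts.items() if count >= needed}


# Hypothetical gene content for four genomes.
genomes = {
    "genome_A": {"rpoB", "gyrA", "recA", "lacZ"},
    "genome_B": {"rpoB", "gyrA", "recA"},
    "genome_C": {"rpoB", "gyrA", "nifH"},
    "genome_D": {"rpoB", "recA", "nifH"},
}
strict_core = core_genes(genomes)                       # present in all four
relaxed_core = core_genes(genomes, min_fraction=0.75)   # present in >= 3 of 4
print(sorted(strict_core), sorted(relaxed_core))
```

Here a single missing annotation of `gyrA` or `recA` removes the gene from the strict core, which mirrors the abstract's point that sequencing and annotation errors shrink the core as genomes are added.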
Affiliation(s)
- Robert L Charlebois
- Genome Atlantic, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 1X5, Canada
|
41
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
|