1
Garcia PS, D'Angelo F, Ollagnier de Choudens S, Dussouchaud M, Bouveret E, Gribaldo S, Barras F. An early origin of iron-sulfur cluster biosynthesis machineries before Earth oxygenation. Nat Ecol Evol 2022; 6:1564-1572. [PMID: 36109654 DOI: 10.1038/s41559-022-01857-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 06/17/2022] [Accepted: 07/22/2022] [Indexed: 11/09/2022]
Abstract
Iron-sulfur (Fe-S) clusters are ubiquitous cofactors essential for life. It is largely thought that the emergence of oxygenic photosynthesis and progressive oxygenation of the atmosphere led to the origin of multiprotein machineries (ISC, NIF and SUF) assisting Fe-S cluster synthesis in the presence of oxidative stress and shortage of bioavailable iron. However, previous analyses have left unclear the origin and evolution of these systems. Here, we combine exhaustive homology searches with genomic context analysis and phylogeny to precisely identify Fe-S cluster biogenesis systems in over 10,000 archaeal and bacterial genomes. We highlight the existence of two additional and clearly distinct 'minimal' Fe-S cluster assembly machineries, MIS (minimal iron-sulfur) and SMS (SUF-like minimal system), which we infer in the last universal common ancestor (LUCA) and we experimentally validate SMS as a bona fide Fe-S cluster biogenesis system. These ancestral systems were kept in archaea whereas they went through stepwise complexification in bacteria to incorporate additional functions for higher Fe-S cluster synthesis efficiency leading to SUF, ISC and NIF. Horizontal gene transfers and losses then shaped the current distribution of these systems, driving ecological adaptations such as the emergence of aerobic lifestyles in archaea. Our results show that dedicated machineries were in place early in evolution to assist Fe-S cluster biogenesis and that their origin is not directly linked to Earth oxygenation.
Affiliation(s)
- Pierre Simon Garcia
  - Department of Microbiology, Unit Stress Adaptation and Metabolism in Enterobacteria, Institut Pasteur, Université Paris Cité, UMR CNRS 6047, Paris, France
  - Department of Microbiology, Unit Evolutionary Biology of the Microbial Cell, Institut Pasteur, Université Paris Cité, UMR CNRS 6047, Paris, France
- Francesca D'Angelo
  - Department of Microbiology, Unit Stress Adaptation and Metabolism in Enterobacteria, Institut Pasteur, Université Paris Cité, UMR CNRS 6047, Paris, France
- Macha Dussouchaud
  - Department of Microbiology, Unit Stress Adaptation and Metabolism in Enterobacteria, Institut Pasteur, Université Paris Cité, UMR CNRS 6047, Paris, France
- Emmanuelle Bouveret
  - Department of Microbiology, Unit Stress Adaptation and Metabolism in Enterobacteria, Institut Pasteur, Université Paris Cité, UMR CNRS 6047, Paris, France
- Simonetta Gribaldo
  - Department of Microbiology, Unit Evolutionary Biology of the Microbial Cell, Institut Pasteur, Université Paris Cité, UMR CNRS 6047, Paris, France
- Frédéric Barras
  - Department of Microbiology, Unit Stress Adaptation and Metabolism in Enterobacteria, Institut Pasteur, Université Paris Cité, UMR CNRS 6047, Paris, France
2
Al Jewari C, Baldauf SL. Conflict over the eukaryote root resides in strong outliers, mosaics and missing data sensitivity of site-specific (CAT) mixture models. Syst Biol 2022; 72:1-16. [PMID: 35412616 DOI: 10.1093/sysbio/syac029] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/15/2021] [Accepted: 04/07/2022] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic reconstruction using concatenated loci ("phylogenomics" or "supermatrix phylogeny") is a powerful tool for solving evolutionary splits that are poorly resolved in single gene/protein trees (SGTs). However, recent phylogenomic attempts to resolve the eukaryote root have yielded conflicting results, along with claims of various artefacts hidden in the data. We have investigated these conflicts using two new methods for assessing phylogenetic conflict. ConJak uses whole marker (gene or protein) jackknifing to assess deviation from a central mean for each individual sequence, while ConWin uses a sliding window to screen for incongruent protein fragments (mosaics). Both methods allow selective masking of individual sequences or sequence fragments in order to minimize missing data, an important consideration for resolving deep splits with limited data. Analyses focused on a set of 76 eukaryotic proteins of bacterial-ancestry previously used in various combinations to assess the branching order among the three major divisions of eukaryotes: Amorphea (mainly animals, fungi and Amoebozoa), Diaphoretickes (most other well-known eukaryotes and nearly all algae) and Excavata, represented here by Discoba (Jakobida, Heterolobosea, and Euglenozoa). ConJak analyses found strong outliers to be concentrated in under-sampled lineages, while ConWin analyses of Discoba, the most under-sampled of the major lineages, detected potentially incongruent fragments scattered throughout. Phylogenetic analyses of the full data using an LG-gamma model support a Discoba sister scenario (neozoan-excavate root), which rises to 99-100% bootstrap support with data masked according to either protocol. However, analyses with two site-specific (CAT) mixture models yielded widely inconsistent results and a striking sensitivity to missing data. 
The neozoan-excavate root places Amorphea and Diaphoretickes as more closely related to each other than either is to Discoba, a fundamental relationship that should remain unaffected by additional taxa.
Affiliation(s)
- Caesar Al Jewari
  - Program in Systematic Biology, Department of Organismal Biology, Uppsala University, Uppsala, Sweden 75236
- Sandra L Baldauf
  - Program in Systematic Biology, Department of Organismal Biology, Uppsala University, Uppsala, Sweden 75236
3
Superson A, Battistuzzi F. Exclusion of fast evolving genes or fast evolving sites produces different archaean phylogenies. Mol Phylogenet Evol 2022; 170:107438. [DOI: 10.1016/j.ympev.2022.107438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2019] [Revised: 01/07/2022] [Accepted: 02/03/2022] [Indexed: 11/26/2022]
4
Smith MR. Robust analysis of phylogenetic tree space. Syst Biol 2021; 71:1255-1270. [PMID: 34963003 PMCID: PMC9366458 DOI: 10.1093/sysbio/syab100] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/02/2020] [Revised: 12/03/2021] [Accepted: 12/23/2021] [Indexed: 11/13/2022] Open
Abstract
Phylogenetic analyses often produce large numbers of trees. Mapping trees' distribution in 'tree space' can illuminate the behaviour and performance of search strategies, reveal distinct clusters of optimal trees, and expose differences between different data sources or phylogenetic methods - but the high-dimensional spaces defined by metric distances are necessarily distorted when represented in fewer dimensions. Here, I explore the consequences of this transformation in phylogenetic search results from 128 morphological datasets, using stratigraphic congruence - a complementary aspect of tree similarity - to evaluate the utility of low-dimensional mappings. I find that phylogenetic similarities between cladograms are most accurately depicted in tree spaces derived from information-theoretic tree distances or the quartet distance. Robinson-Foulds tree spaces exhibit prominent distortions and often fail to group trees according to phylogenetic similarity, whereas the strong influence of tree shape on the Kendall-Colijn distance makes its tree space unsuitable for many purposes. Distances mapped into two or even three dimensions often display little correspondence with true distances, which can lead to profound misrepresentation of clustering structure. Without explicit testing, one cannot be confident that a tree space mapping faithfully represents the true distribution of trees, nor that visually evident structure is valid. My recommendations for tree space validation and visualization are implemented in a new graphical user interface in the 'TreeDist' R package.
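For readers unfamiliar with the Robinson-Foulds distance mentioned in this abstract, the following is a minimal illustrative sketch, not the 'TreeDist' implementation. It assumes trees are given as nested Python tuples of leaf names (a hypothetical representation chosen for brevity) and counts the non-trivial bipartitions found in one unrooted tree but not the other.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits (bipartitions) induced by internal edges."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed leaf used to pick one canonical side per split
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Represent each split by the side that does not contain the anchor leaf.
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon trees used only for illustration.
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
print(rf_distance(t1, t2))
```

Because the distance only counts splits that are present or absent, a single misplaced leaf can change many splits at once, which is one reason a coarse measure of this kind may map poorly into low-dimensional tree spaces.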
Affiliation(s)
- Martin R Smith
  - Department of Earth Sciences, Durham University, Lower Mountjoy, Durham, DH1 3LE, UK
5
Debray K, Marie-Magdelaine J, Ruttink T, Clotault J, Foucher F, Malécot V. Identification and assessment of variable single-copy orthologous (SCO) nuclear loci for low-level phylogenomics: a case study in the genus Rosa (Rosaceae). BMC Evol Biol 2019; 19:152. [PMID: 31340752 PMCID: PMC6657147 DOI: 10.1186/s12862-019-1479-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/16/2019] [Accepted: 07/16/2019] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND With an ever-growing number of published genomes, many low levels of the Tree of Life now contain several species with enough molecular data to perform shallow-scale phylogenomic studies. Moving away from using just a few universal phylogenetic markers, we can now target thousands of other loci to decipher taxa relationships. Making the best possible selection of informative sequences regarding the taxa studied has emerged as a new issue. Here, we developed a general procedure to mine genomic data, looking for orthologous single-copy loci capable of deciphering phylogenetic relationships below the generic rank. To develop our strategy, we chose the genus Rosa, a rapid-evolving lineage of the Rosaceae family in which several species genomes have recently been sequenced. We also compared our loci to conventional plastid markers, commonly used for phylogenetic inference in this genus. RESULTS We generated 1856 sequence tags in putative single-copy orthologous nuclear loci. Associated in silico primer pairs can potentially amplify fragments able to resolve a wide range of speciation events within the genus Rosa. Analysis of parsimony-informative site content showed the value of non-coding genomic regions to obtain variable sequences despite the fact that they may be more difficult to target in less related species. Dozens of nuclear loci outperform the conventional plastid phylogenetic markers in terms of phylogenetic informativeness, for both recent and ancient evolutionary divergences. However, conflicting phylogenetic signals were found between nuclear gene tree topologies and the species-tree topology, shedding light on the many patterns of hybridization and/or incomplete lineage sorting that occur in the genus Rosa. CONCLUSIONS With recently published genome sequence data, we developed a set of single-copy orthologous nuclear loci to resolve species-level phylogenomics in the genus Rosa. 
This genome-wide dataset contains hundreds of highly variable loci whose phylogenetic interest was assessed in terms of phylogenetic informativeness and topological conflict. Our target identification procedure can easily be reproduced to identify new highly informative loci for other taxonomic groups and ranks.
Affiliation(s)
- Kevin Debray
  - IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France
- Tom Ruttink
  - ILVO, Flanders Research Institute for Agriculture, Fisheries and Food, Plant Sciences Unit, Melle, Belgium
- Jérémy Clotault
  - IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France
- Fabrice Foucher
  - IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France
- Valéry Malécot
  - IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France
6
Abstract
Understanding how an animal organism and its gut microbes form an integrated biological organization, known as a holobiont, is becoming a central issue in biological studies. Such an organization inevitably involves a complex web of transmission processes that occur on different scales in time and space, across microbes and hosts. Network-based models are introduced in this chapter to tackle aspects of this complexity and to better take into account vertical and horizontal dimensions of transmission. Two types of network-based models are presented, sequence similarity networks and bipartite graphs. One interest of these networks is that they can consider a rich diversity of important players in microbial evolution that are usually excluded from evolutionary studies, like plasmids and viruses. These methods bring forward the notion of "gene externalization," which is defined as the presence of redundant copies of prokaryotic genes on mobile genetic elements (MGEs), and therefore emphasizes a related although distinct process from lateral gene transfer between microbial cells. This chapter introduces guidelines to the construction of these networks, reviews their analysis, and illustrates their possible biological interpretations and uses. The application to human gut microbiomes shows that sequences present in a higher diversity of MGEs have both biased functions and a broader microbial and human host range. These results suggest that an "externalized gut metagenome" is partly common to humans and benefits the gut microbial community. We conclude that testing relationships between microbial genes, microbes, and their animal hosts, using network-based methods, could help to unravel additional mechanisms of transmission in holobionts.
7
Corel E, Pathmanathan JS, Watson AK, Karkar S, Lopez P, Bapteste E. MultiTwin: A Software Suite to Analyze Evolution at Multiple Levels of Organization Using Multipartite Graphs. Genome Biol Evol 2018; 10:2777-2784. [PMID: 30247672 PMCID: PMC6199892 DOI: 10.1093/gbe/evy209] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Accepted: 09/19/2018] [Indexed: 01/08/2023] Open
Abstract
The inclusion of introgressive processes in evolutionary studies induces a less constrained view of evolution. Network-based methods (such as large-scale similarity networks) make it possible to include in comparative genomics all extrachromosomal carriers (such as viruses, the most abundant biological entities on the planet) together with their cellular hosts. The integration of several levels of biological organization (genes, genomes, communities, environments) enables more comprehensive analyses of gene sharing and improved sequence-based classifications. However, the algorithmic tools for analyzing such networks are usually accessible only to people with advanced programming skills. We present an integrated suite of software tools named MultiTwin, aimed at the construction, structuring, and analysis of multipartite graphs for evolutionary biology. Typically, this kind of graph is useful for the comparative analysis of the gene content of genomes in microbial communities from the environment and for exploring patterns of gene sharing, for example between distantly related cellular genomes, pangenomes, or between cellular genomes and their mobile genetic elements. We illustrate the use of this tool with an application of the bipartite approach (using gene family-genome graphs) for the analysis of pathogenicity traits in prokaryotes.
Affiliation(s)
- Eduardo Corel
  - Unité Mixte de Recherche, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Sorbonne Université, Paris, France
- Jananan S Pathmanathan
  - Unité Mixte de Recherche, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Sorbonne Université, Paris, France
- Andrew K Watson
  - Unité Mixte de Recherche, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Sorbonne Université, Paris, France
- Slim Karkar
  - Unité Mixte de Recherche, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Sorbonne Université, Paris, France
- Philippe Lopez
  - Unité Mixte de Recherche, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Sorbonne Université, Paris, France
- Eric Bapteste
  - Unité Mixte de Recherche, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Sorbonne Université, Paris, France
8
Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets. Gigascience 2016; 5:44. [PMID: 27776538 PMCID: PMC5078944 DOI: 10.1186/s13742-016-0152-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/17/2015] [Accepted: 10/12/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons. FINDINGS In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. CONCLUSIONS Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data.
9
Planet PJ, Narechania A, Chen L, Mathema B, Boundy S, Archer G, Kreiswirth B. Architecture of a Species: Phylogenomics of Staphylococcus aureus. Trends Microbiol 2016; 25:153-166. [PMID: 27751626 DOI: 10.1016/j.tim.2016.09.009] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 05/27/2016] [Revised: 09/07/2016] [Accepted: 09/22/2016] [Indexed: 12/11/2022]
Abstract
A deluge of whole-genome sequencing has begun to give insights into the patterns and processes of microbial evolution, but genome sequences have accrued in a haphazard manner, with biased sampling of natural variation that is driven largely by medical and epidemiological priorities. For instance, there is a strong bias for sequencing epidemic lineages of methicillin-resistant Staphylococcus aureus (MRSA) over sensitive isolates (methicillin-sensitive S. aureus: MSSA). As more diverse genomes are sequenced the emerging picture is of a highly subdivided species with a handful of relatively clonal groups (complexes) that, at any given moment, dominate in particular geographical regions. The establishment of hegemony of particular clones appears to be a dynamic process of successive waves of replacement of the previously dominant clone. Here we review the phylogenomic structure of a diverse range of S. aureus, including both MRSA and MSSA. We consider the utility of the concept of the 'core' genome and the impact of recombination and horizontal transfer. We argue that whole-genome surveillance of S. aureus populations could lead to better forecasting of antibiotic resistance and virulence of emerging clones, and a better understanding of the elusive biological factors that determine repeated strain replacement.
Affiliation(s)
- Paul J Planet
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, USA; Department of Pediatrics, Division of Pediatric Infectious Diseases, Children's Hospital of Philadelphia & University of Pennsylvania, Philadelphia, PA, USA
- Apurva Narechania
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, USA
- Liang Chen
  - Public Health Research Institute Center, New Jersey Medical School, Rutgers, Newark, NJ, USA
- Barun Mathema
  - Public Health Research Institute Center, New Jersey Medical School, Rutgers, Newark, NJ, USA; Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, USA
- Sam Boundy
  - Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Gordon Archer
  - Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Barry Kreiswirth
  - Public Health Research Institute Center, New Jersey Medical School, Rutgers, Newark, NJ, USA
10
Uyeda JC, Harmon LJ, Blank CE. A Comprehensive Study of Cyanobacterial Morphological and Ecological Evolutionary Dynamics through Deep Geologic Time. PLoS One 2016; 11:e0162539. [PMID: 27649395 PMCID: PMC5029880 DOI: 10.1371/journal.pone.0162539] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 04/01/2016] [Accepted: 08/24/2016] [Indexed: 01/01/2023] Open
Abstract
Cyanobacteria have exerted a profound influence on the progressive oxygenation of Earth. As a complementary approach to examining the geologic record, phylogenomic and trait evolutionary analyses of extant species can lead to new insights. We constructed new phylogenomic trees and analyzed phenotypic trait data using novel phylogenetic comparative methods. We elucidated the dynamics of trait evolution in Cyanobacteria over billion-year timescales, and provide evidence that major geologic events in early Earth’s history have shaped, and been shaped by, evolution in Cyanobacteria. We identified a robust core cyanobacterial phylogeny and a smaller set of taxa that exhibit long-branch attraction artifacts. We estimated the age of nodes and reconstructed the ancestral character states of 43 phenotypic characters. We find high levels of phylogenetic signal for nearly all traits, indicating the phylogeny carries substantial predictive power. The earliest cyanobacterial lineages likely lived in freshwater habitats, had small cell diameters, were benthic or sessile, and possibly epilithic/endolithic with a sheath. We jointly analyzed a subset of 25 binary traits to determine whether rates of trait evolution have shifted over time in conjunction with major geologic events. Phylogenetic comparative analyses reveal an overriding signal of decreasing rates of trait evolution through time. Furthermore, the data suggest two major rate shifts in trait evolution associated with bursts of evolutionary innovation. The first rate shift occurs in the aftermath of the Great Oxidation Event and “Snowball Earth” glaciations and is associated with a decrease in evolutionary rates around 1.8–1.6 Ga. This rate shift seems to indicate the end of a major diversification of cyanobacterial phenotypes, particularly related to traits associated with filamentous morphology, heterocysts and motility in freshwater ecosystems.
Another burst appears around the time of the Neoproterozoic Oxidation Event, and is associated with the acquisition of traits involved in planktonic growth in marine habitats. Our results demonstrate how uniting genomic and phenotypic datasets in extant bacterial species can shed light on billion-year-old events in Earth’s history.
Affiliation(s)
- Josef C. Uyeda
  - University of Idaho, Dept. Biological Sciences, Moscow, ID, United States of America
- Luke J. Harmon
  - University of Idaho, Dept. Biological Sciences, Moscow, ID, United States of America
- Carrine E. Blank
  - University of Montana, Dept. Geosciences, Missoula, MT, United States of America
11
Abstract
Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent, due to events such as incomplete lineage sorting or horizontal gene transfer, it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such "process-agnostic" approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward's method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl).
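As background for the silhouette criterion that this abstract uses as a baseline, here is a minimal sketch of how that heuristic scores a clustering from a pairwise distance matrix. It is not taken from treeCl; the function name, the distance values and the labels are all hypothetical, and the distances stand in for whatever tree-to-tree distance is being used.

```python
def mean_silhouette(dist, labels):
    """Mean silhouette width for a clustering, given a square distance matrix."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster: silhouette is 0 by convention
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scale = max(a, b)
        if scale > 0:
            total += (b - a) / scale
    return total / len(labels)


# Hypothetical distances between four gene trees (two tight pairs).
dist = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
]
print(mean_silhouette(dist, [0, 0, 1, 1]))  # grouping that matches the pairs
print(mean_silhouette(dist, [0, 1, 0, 1]))  # grouping that splits the pairs
```

In a typical use one would compute this score for several candidate numbers of clusters and keep the clustering with the highest value.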
Affiliation(s)
- Kevin Gori
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom
- Tomasz Suchan
  - Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland
- Nadir Alvarez
  - Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom
- Christophe Dessimoz
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom; Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland; Department of Genetics, Evolution & Environment, University College London, London, United Kingdom; Department of Computer Science, University College London, London, United Kingdom; Centre for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Biophore, Lausanne, Switzerland
12
Mengual-Chuliá B, Bedhomme S, Lafforgue G, Elena SF, Bravo IG. Assessing parallel gene histories in viral genomes. BMC Evol Biol 2016; 16:32. [PMID: 26847371 PMCID: PMC4743424 DOI: 10.1186/s12862-016-0605-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/17/2015] [Accepted: 01/29/2016] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND The increasing abundance of sequence data has exacerbated a long-known problem: gene trees and species trees for the same terminal taxa are often incongruent. Indeed, genes within a genome have not all followed the same evolutionary path due to events such as incomplete lineage sorting, horizontal gene transfer, gene duplication and deletion, or recombination. Considering conflicts between gene trees as an obstacle, numerous methods have been developed to deal with these incongruences and to reconstruct consensus evolutionary histories of species despite the heterogeneity in the history of their genes. However, inconsistencies can also be seen as a source of information about the specific evolutionary processes that have shaped genomes. RESULTS The goal of the approach proposed here is to exploit this conflicting information: we have compiled eleven variables describing phylogenetic relationships and evolutionary pressures and submitted them to dimensionality reduction techniques to identify genes with similar evolutionary histories. To illustrate the applicability of the method, we have chosen two viral datasets, namely papillomaviruses and Turnip mosaic virus (TuMV) isolates, largely dissimilar in genome, evolutionary distance and biology. Our method pinpoints viral genes with common evolutionary patterns. In the case of papillomaviruses, gene clusters match our knowledge of viral biology and life cycle well, illustrating the potential of our approach. For the less well-known TuMV, our results suggest new hypotheses about viral evolution and gene interaction. CONCLUSIONS The approach presented here turns phylogenetic inconsistencies into evolutionary information by detecting gene assemblies with similar histories, and could be a powerful tool for comparative pathogenomics.
Affiliation(s)
- Beatriz Mengual-Chuliá
  - Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; Bellvitge Institute of Biomedical Research (IDIBELL), Barcelona, Spain
- Stéphanie Bedhomme
  - Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; Bellvitge Institute of Biomedical Research (IDIBELL), Barcelona, Spain; Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, Montpellier, France
- Guillaume Lafforgue
  - Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, Montpellier, France; Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, València, Spain
- Santiago F Elena
  - Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, València, Spain; I2SysBio, Consejo Superior de Investigaciones Científicas-Universitat de València, València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Ignacio G Bravo
  - Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; MIVEGEC (UMR CNRS 5290, IRD 224, UM), National Center for Scientific Research (CNRS), Montpellier, France; National Center for Scientific Research (CNRS), Maladies Infectieuses et Vecteurs: Ecologie, Génétique, Evolution et Contrôle (MIVEGEC), UMR CNRS 5290, IRD 224, UM, 911 Avenue Agropolis, BP 64501, 34394, Montpellier, Cedex 5, France
13
Schierwater B, Holland PWH, Miller DJ, Stadler PF, Wiegmann BM, Wörheide G, Wray GA, DeSalle R. Never Ending Analysis of a Century Old Evolutionary Debate: “Unringing” the Urmetazoon Bell. Front Ecol Evol 2016. [DOI: 10.3389/fevo.2016.00005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

14
Simmons MP, Sloan DB, Gatesy J. The effects of subsampling gene trees on coalescent methods applied to ancient divergences. Mol Phylogenet Evol 2016; 97:76-89. [PMID: 26768112 DOI: 10.1016/j.ympev.2015.12.013] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 10/09/2015] [Revised: 12/03/2015] [Accepted: 12/20/2015] [Indexed: 10/22/2022]
Abstract
Gene-tree-estimation error is a major concern for coalescent methods of phylogenetic inference. We sampled eight empirical studies of ancient lineages with diverse numbers of taxa and genes for which the original authors applied one or more coalescent methods. We found that the average pairwise congruence among gene trees varied greatly both between studies and also often within a study. We recommend that presenting plots of pairwise congruence among gene trees in a dataset be treated as a standard practice for empirical coalescent studies so that readers can readily assess the extent and distribution of incongruence among gene trees. ASTRAL-based coalescent analyses generally outperformed MP-EST and STAR with respect to both internal consistency (congruence between analyses of subsamples of genes with the complete dataset of all genes) and congruence with the concatenation-based topology. We evaluated the approach of subsampling gene trees that are, on average, more congruent with other gene trees as a method to reduce artifacts caused by gene-tree-estimation errors on coalescent analyses. We suggest that this method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination. We provide scripts that automate the process of calculating pairwise gene-tree incongruence and subsampling trees while accounting for differential taxon sampling among genes. Finally, we assert that multiple tree-search replicates should be implemented as a standard practice for empirical coalescent studies that apply MP-EST.
Affiliation(s)
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Daniel B Sloan
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- John Gatesy
  - Department of Biology, University of California, Riverside, CA 92521, USA

15
Oton EV, Quince C, Nicol GW, Prosser JI, Gubry-Rangin C. Phylogenetic congruence and ecological coherence in terrestrial Thaumarchaeota. ISME J 2015; 10:85-96. [PMID: 26140533 PMCID: PMC4604658 DOI: 10.1038/ismej.2015.101] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 01/27/2015] [Revised: 04/13/2015] [Accepted: 05/08/2015] [Indexed: 11/09/2022]
Abstract
Thaumarchaeota form a ubiquitously distributed archaeal phylum, comprising both the ammonia-oxidising archaea (AOA) and other archaeal groups in which ammonia oxidation has not been demonstrated (including Group 1.1c and Group 1.3). The ecology of AOA in terrestrial environments has been extensively studied using either a functional gene, encoding ammonia monooxygenase subunit A (amoA) or 16S ribosomal RNA (rRNA) genes, which show phylogenetic coherence with respect to soil pH. To test phylogenetic congruence between these two markers and to determine ecological coherence in all Thaumarchaeota, we performed high-throughput sequencing of 16S rRNA and amoA genes in 46 UK soils presenting 29 available contextual soil characteristics. Adaptation to pH and organic matter content reflected strong ecological coherence at various levels of taxonomic resolution for Thaumarchaeota (AOA and non-AOA), whereas nitrogen, total mineralisable nitrogen and zinc concentration were also important factors associated with AOA thaumarchaeotal community distribution. Other significant associations with environmental factors were also detected for amoA and 16S rRNA genes, reflecting different diversity characteristics between these two markers. Nonetheless, there was significant statistical congruence between the markers at fine phylogenetic resolution, supporting the hypothesis of low horizontal gene transfer between Thaumarchaeota. Group 1.1c Thaumarchaeota were also widely distributed, with two clusters predominating, particularly in environments with higher moisture content and organic matter, whereas a similar ecological pattern was observed for Group 1.3 Thaumarchaeota. The ecological and phylogenetic congruence identified is fundamental to understand better the life strategies, evolutionary history and ecosystem function of the Thaumarchaeota.
Affiliation(s)
- Eduard Vico Oton
  - Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, Aberdeen, UK; School of Life Sciences, Centre for Biomolecular Sciences, University of Nottingham, Nottingham, UK
- Graeme W Nicol
  - Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, Aberdeen, UK; Laboratoire Ampère UMR CNRS 5005, École Centrale de Lyon, Université de Lyon, Ecully CEDEX, France
- James I Prosser
  - Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, Aberdeen, UK
- Cécile Gubry-Rangin
  - Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, Aberdeen, UK

16
Abstract
The large phylogenetic distance separating eukaryotic genes and their archaeal orthologs has prevented identification of the position of the eukaryotic root in phylogenomic studies. Recently, an innovative approach has been proposed to circumvent this issue: the use as phylogenetic markers of proteins that have been transferred from bacterial donor sources to eukaryotes, after their emergence from Archaea. Using this approach, two recent independent studies have built phylogenomic datasets based on bacterial sequences, leading to different predictions of the eukaryotic root. Taking advantage of additional genome sequences from the jakobid Andalucia godoyi and the two known malawimonad species (Malawimonas jakobiformis and Malawimonas californiana), we reanalyzed these two phylogenomic datasets. We show that both datasets pinpoint the same phylogenetic position of the eukaryotic root that is between "Unikonta" and "Bikonta," with malawimonad and collodictyonid lineages on the Unikonta side of the root. Our results firmly indicate that (i) the supergroup Excavata is not monophyletic and (ii) the last common ancestor of eukaryotes was a biflagellate organism. Based on our results, we propose to rename the two major eukaryotic groups Unikonta and Bikonta as Opimoda and Diphoda, respectively.

17
Abstract
The human microbiome is the ensemble of genes in the microbes that live inside and on the surface of humans. Because microbial sequencing information is now much easier to come by than phenotypic information, there has been an explosion of sequencing and genetic analysis of microbiome samples. Much of the analytical work for these sequences involves phylogenetics, at least indirectly, but methodology has developed in a somewhat different direction than for other applications of phylogenetics. In this article, I review the field and its methods from the perspective of a phylogeneticist, as well as describing current challenges for phylogenetics coming from this type of work.
Affiliation(s)
- Frederick A Matsen
  - Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 91802, USA

18
McInerney J, Cummins C, Haggerty L. Goods-thinking vs. tree-thinking: Finding a place for mobile genetic elements. Mob Genet Elements 2014; 1:304-308. [PMID: 22545244 PMCID: PMC3337142 DOI: 10.4161/mge.19153] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/31/2023]
Abstract
While it has become increasingly clear that the Tree of Life hypothesis has limitations in its ability to describe the evolution of all evolving entities on the planet, there has been a marked reluctance to move away from the tree-based language. Ironically, while modifying the idea of the Tree of Life to the extent that it is only very distantly related to its original descriptions, there has been a very careful attempt to retain the language of tree-thinking. The recent movement away from a tree-thinking language toward a goods-thinking language and perspective is a significant improvement. In this commentary, we describe how goods-thinking can provide better descriptions of evolution, can integrate evolution with environment more closely and can offer an equal place for Mobile Genetic Elements and chromosomal elements in discussions of evolutionary history.
Affiliation(s)
- James McInerney
  - Bioinformatics and Molecular Evolution Unit; Department of Biology; National University of Ireland Maynooth, Co.; Kildare, Ireland

19
Schrödl M, Stöger I. A review on deep molluscan phylogeny: old markers, integrative approaches, persistent problems. J Nat Hist 2014. [DOI: 10.1080/00222933.2014.963184] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 10/24/2022]

20
Andrade SCS, Montenegro H, Strand M, Schwartz ML, Kajihara H, Norenburg JL, Turbeville JM, Sundberg P, Giribet G. A Transcriptomic Approach to Ribbon Worm Systematics (Nemertea): Resolving the Pilidiophora Problem. Mol Biol Evol 2014; 31:3206-15. [DOI: 10.1093/molbev/msu253] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 12/17/2022]

21
Fernández R, Laumer CE, Vahtera V, Libro S, Kaluziak S, Sharma PP, Pérez-Porro AR, Edgecombe GD, Giribet G. Evaluating topological conflict in centipede phylogeny using transcriptomic data sets. Mol Biol Evol 2014; 31:1500-13. [PMID: 24674821 DOI: 10.1093/molbev/msu108] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Indexed: 12/11/2022]
Abstract
Relationships between the five extant orders of centipedes have been considered solved based on morphology. Phylogenies based on samples of up to a few dozen genes have largely been congruent with the morphological tree apart from an alternative placement of one order, the relictual Craterostigmomorpha, consisting of two species in Tasmania and New Zealand. To address this incongruence, novel transcriptomic data were generated to sample all five orders of centipedes and also used as a test case for studying gene-tree incongruence. Maximum likelihood and Bayesian mixture model analyses of a data set composed of 1,934 orthologs with 45% missing data, as well as the 389 orthologs in the least saturated, stationary quartile, retrieve strong support for a sister-group relationship between Craterostigmomorpha and all other pleurostigmophoran centipedes, of which the latter group is newly named Amalpighiata. The Amalpighiata hypothesis, which shows little gene-tree incongruence and is robust to the influence of among-taxon compositional heterogeneity, implies convergent evolution in several morphological and behavioral characters traditionally used in centipede phylogenetics, such as maternal brood care, but accords with patterns of first appearances in the fossil record.
Affiliation(s)
- Rosa Fernández
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
- Christopher E Laumer
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
- Varpu Vahtera
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA; Zoological Museum, Department of Biology, University of Turku, Turku, Finland
- Silvia Libro
  - Marine Science Center, Northeastern University, Nahant, MA
- Prashant P Sharma
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, NY
- Alicia R Pérez-Porro
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA; Centre d'Estudis Avançats de Blanes (CEAB-CSIC), Catalonia, Spain
- Gregory D Edgecombe
  - Department of Earth Sciences, The Natural History Museum, London, United Kingdom
- Gonzalo Giribet
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA

22
Ramulu HG, Groussin M, Talla E, Planel R, Daubin V, Brochier-Armanet C. Ribosomal proteins: toward a next generation standard for prokaryotic systematics? Mol Phylogenet Evol 2014; 75:103-17. [PMID: 24583288 DOI: 10.1016/j.ympev.2014.02.013] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/15/2013] [Revised: 01/23/2014] [Accepted: 02/17/2014] [Indexed: 10/25/2022]
Abstract
The seminal work of Carl Woese and co-workers has contributed to promote the RNA component of the small subunit of the ribosome (SSU rRNA) as a "gold standard" of modern prokaryotic taxonomy and systematics, and an essential tool to explore microbial diversity. Yet, this marker has a limited resolving power, especially at deep phylogenetic depth and can lead to strongly biased trees. The ever-larger number of available complete genomes now calls for a novel standard dataset of robust protein markers that may complement SSU rRNA. In this respect, concatenation of ribosomal proteins (r-proteins) is being growingly used to reconstruct large-scale prokaryotic phylogenies, but their suitability for systematic and/or taxonomic purposes has not been specifically addressed. Using Proteobacteria as a case study, we show that amino acid and nucleic acid r-protein sequences contain a reliable phylogenetic signal at a wide range of taxonomic depths, which has not been totally blurred by mutational saturation or horizontal gene transfer. The use of accurate evolutionary models and reconstruction methods allows overcoming most tree reconstruction artefacts resulting from compositional biases and/or fast evolutionary rates. The inferred phylogenies allow clarifying the relationships among most proteobacterial orders and families, along with the position of several unclassified lineages, suggesting some possible revisions of the current classification. In addition, we investigate the root of the Proteobacteria by considering the time-variation of nucleic acid composition of r-protein sequences and the information carried by horizontal gene transfers, two approaches that do not require the use of an outgroup and limit tree reconstruction artefacts. Altogether, our analyses indicate that r-proteins may represent a promising standard for prokaryotic taxonomy and systematics.
Affiliation(s)
- Hemalatha Golaconda Ramulu
  - Aix-Marseille Université, CNRS, UMR 7283, Laboratoire de Chimie Bactérienne, IMM, 31 chemin Joseph Aiguier, F-13402 Marseille, France
- Mathieu Groussin
  - Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, F-69622 Villeurbanne, France
- Emmanuel Talla
  - Aix-Marseille Université, CNRS, UMR 7283, Laboratoire de Chimie Bactérienne, IMM, 31 chemin Joseph Aiguier, F-13402 Marseille, France
- Remi Planel
  - Aix-Marseille Université, CNRS, UMR 7283, Laboratoire de Chimie Bactérienne, IMM, 31 chemin Joseph Aiguier, F-13402 Marseille, France
- Vincent Daubin
  - Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, F-69622 Villeurbanne, France
- Céline Brochier-Armanet
  - Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, F-69622 Villeurbanne, France

23
An alternative root for the eukaryote tree of life. Curr Biol 2014; 24:465-70. [PMID: 24508168 DOI: 10.1016/j.cub.2014.01.036] [Citation(s) in RCA: 147] [Impact Index Per Article: 14.7] [Received: 11/04/2013] [Revised: 12/12/2013] [Accepted: 01/16/2014] [Indexed: 01/02/2023]
Abstract
The root of the eukaryote tree of life defines some of the most fundamental relationships among species. It is also critical for defining the last eukaryote common ancestor (LECA), the shared heritage of all extant species. The unikont-bikont root has been the reigning paradigm for eukaryotes for more than 10 years but is becoming increasingly controversial. We developed a carefully vetted data set, consisting of 37 nuclear-encoded proteins of close bacterial ancestry (euBacs) and their closest bacterial relatives, augmented by deep sequencing of the Acrasis kona (Heterolobosea, Discoba) transcriptome. Phylogenetic analysis of these data produces a highly robust, fully resolved global phylogeny of eukaryotes. The tree sorts all examined eukaryotes into three megagroups and identifies the Discoba, and potentially its parent taxon Excavata, as the sister group to the bulk of known eukaryote diversity, the proposed Neozoa (Amorphea + Stramenopila+Alveolata+Rhizaria+Plantae [SARP]). All major alternative hypotheses are rejected with as little as ∼50% of the data, and this resolution is unaffected by the presence of fast-evolving alignment positions or distant outgroup sequences. This "neozoan-excavate" root revises hypotheses of early eukaryote evolution and highlights the importance of the poorly studied Discoba for understanding the evolution of eukaryotic diversity and basic cellular processes.

24
Chung Y, Perna NT, Ané C. Computing the joint distribution of tree shape and tree distance for gene tree inference and recombination detection. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1263-1274. [PMID: 24384712 DOI: 10.1109/tcbb.2013.109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Ancestral recombination events can cause the underlying genealogy of a site to vary along the genome. We consider Bayesian models to simultaneously detect recombination breakpoints in very long sequence alignments and estimate the phylogenetic tree of each block between breakpoints. The models we consider use a dissimilarity measure between trees in their prior distribution to favor similar trees at neighboring loci. We show empirical evidence in Enterobacteria that neighboring genomic regions have similar trees. The main hurdle in using such models is the need to properly calculate the normalizing function for the prior probabilities on trees. In this work, we quantify the impact of approximating this normalizing function as done in biomc2, a hierarchical Bayesian method to detect recombination based on distance between tree topologies. We then derive an algorithm to calculate the normalizing function exactly, for a Gibbs distribution based on the Robinson-Foulds (RF) distance between gene trees at neighboring loci. At the core is the calculation of the joint distribution of the shape of a random tree and its RF distance to a fixed tree. We also propose fast approximations to the normalizing function, which are shown to be very accurate with little impact on the Bayesian inference.
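To make the Robinson-Foulds (RF) distance named in this abstract concrete, the following is a minimal Python sketch. It is an illustration only and not the biomc2 implementation: it assumes rooted trees written as nested tuples of leaf names (a hypothetical representation chosen for brevity) and uses the rooted, clade-based variant of RF, which counts the clades found in one tree but not in the other.

```python
def clades(tree):
    """Return (leaf set, set of non-trivial clades) for a nested-tuple tree."""
    found = set()

    def leaves(node):
        # A leaf is a plain string; an internal node is a tuple of children.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return all_leaves, found


def rf_distance(tree_a, tree_b):
    """Rooted RF distance: size of the symmetric difference of clade sets."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must have the same leaf set")
    return len(clades_a ^ clades_b)


# Hypothetical gene trees at two neighboring loci.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(tree_1, tree_2))
```

The two example trees differ only in which pair is grouped inside the A/B/C clade, so each tree contributes one unshared clade. A prior that favors similar trees at neighboring loci, as described in the abstract, would penalize larger values of this distance.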

25
Cornils A, Blanco-Bercial L. Phylogeny of the Paracalanidae Giesbrecht, 1888 (Crustacea: Copepoda: Calanoida). Mol Phylogenet Evol 2013; 69:861-72. [PMID: 23831457 DOI: 10.1016/j.ympev.2013.06.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/29/2013] [Revised: 06/21/2013] [Accepted: 06/22/2013] [Indexed: 11/16/2022]
Abstract
The Paracalanidae are ecologically-important marine planktonic copepods that occur in the epipelagic zone in temperate and tropical waters. They are often the dominant taxon - in terms of biomass and abundance - in continental shelf regions. As primary consumers, they form a vital link in the pelagic food web between primary producers and higher trophic levels. Despite the ecological importance of the taxon, evolutionary and systematic relationships within the family remain largely unknown. A multigene phylogeny including 24 species, including representatives for all seven genera, was determined based on two nuclear genes, small-subunit (18S) ribosomal RNA and Histone 3 (H3) and one mitochondrial gene, cytochrome c oxidase subunit I (COI). The molecular phylogeny was well supported by Maximum likelihood and Bayesian inference analysis; all genera were found to be monophyletic, except for Paracalanus, which was separated into two distinct clades: the Paracalanus aculeatus group and Paracalanus parvus group. The molecular phylogeny also confirmed previous findings that Mecynocera and Calocalanus are genera of the family Paracalanidae. For comparison, a morphological phylogeny was created for 35 paracalanid species based on 54 morphological characters derived from published descriptions. The morphological phylogeny did not resolve all genera as monophyletic and bootstrap support was not strong. Molecular and morphological phylogenies were not congruent in the positioning of Bestiolina and the Paracalanus species groups, possibly due to the lack of sufficient phylogenetically-informative morphological characters.
Affiliation(s)
- Astrid Cornils
  - Alfred Wegener Institute for Polar and Marine Research, Am alten Hafen 26, 27568 Bremerhaven, Germany

26
Lasek-Nesselquist E, Gogarten JP. The effects of model choice and mitigating bias on the ribosomal tree of life. Mol Phylogenet Evol 2013; 69:17-38. [PMID: 23707703 DOI: 10.1016/j.ympev.2013.05.006] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 01/25/2013] [Revised: 04/26/2013] [Accepted: 05/08/2013] [Indexed: 01/03/2023]
Abstract
Deep-level relationships within Bacteria, Archaea, and Eukarya as well as the relationships of these three domains to each other require resolution. The ribosomal machinery, universal to all cellular life, represents a protein repertoire resistant to horizontal gene transfer, which provides a largely congruent signal necessary for reconstructing a tree suitable as a backbone for life's reticulate history. Here, we generate a ribosomal tree of life from a robust taxonomic sampling of Bacteria, Archaea, and Eukarya to elucidate deep-level intra-domain and inter-domain relationships. Lack of phylogenetic information and systematic errors caused by inadequate models (that cannot account for substitution rate or compositional heterogeneities) or improper model selection compound conflicting phylogenetic signals from HGT and/or paralogy. Thus, we tested several models of varying sophistication on three different datasets, performed removal of fast-evolving or long-branched Archaea and Eukarya, and employed three different strategies to remove compositional heterogeneity to examine their effects on the topological outcome. Our results support a two-domain topology for the tree of life, where Eukarya emerges from within Archaea as sister to a Korarchaeota/Thaumarchaeota (KT) or Crenarchaeota/KT clade for all models under all or at least one of the strategies employed. Taxonomic manipulation allows single-matrix and certain mixture models to vacillate between two-domain and three-domain phylogenies. We find that models vary in their ability to resolve different areas of the tree of life, which does not necessarily correlate with model complexity. For example, both single-matrix and some mixture models recover monophyletic Crenarchaeota and Euryarchaeota archaeal phyla. In contrast, the most sophisticated model recovers a paraphyletic Euryarchaeota but detects two large clades that comprise the Bacteria, which were recovered separately but never together in the other models. 
Overall, models recovered consistent topologies despite dataset modifications due to the removal of compositional bias, which reflects either ineffective bias reduction or robust datasets that allow models to overcome reconstruction artifacts. We recommend a comparative approach for evolutionary models to identify model weaknesses as well as consensus relationships.

27
Lang JM, Darling AE, Eisen JA. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One 2013; 8:e62510. [PMID: 23638103 PMCID: PMC3636077 DOI: 10.1371/journal.pone.0062510] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Received: 03/09/2012] [Accepted: 03/26/2013] [Indexed: 11/29/2022]
Abstract
Over 3000 microbial (bacterial and archaeal) genomes have been made publicly available to date, providing an unprecedented opportunity to examine evolutionary genomic trends and offering valuable reference data for a variety of other studies such as metagenomics. The utility of these genome sequences is greatly enhanced when we have an understanding of how they are phylogenetically related to each other. Therefore, we here describe our efforts to reconstruct the phylogeny of all available bacterial and archaeal genomes. We identified 24 single-copy, ubiquitous genes suitable for this phylogenetic analysis. We used two approaches to combine the data for the 24 genes. First, we concatenated alignments of all genes into a single alignment from which a Maximum Likelihood (ML) tree was inferred using RAxML. Second, we used a relatively new approach to combining gene data, Bayesian Concordance Analysis (BCA), as implemented in the BUCKy software, in which the results of 24 single-gene phylogenetic analyses are used to generate a "primary concordance" tree. A comparison of the concatenated ML tree and the primary concordance (BUCKy) tree reveals that the two approaches give similar results, relative to a phylogenetic tree inferred from the 16S rRNA gene. After comparing the results and the methods used, we conclude that the current best approach for generating a single phylogenetic tree, suitable for use as a reference phylogeny for comparative analyses, is to perform a maximum likelihood analysis of a concatenated alignment of conserved, single-copy genes.
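The concatenation step described in this abstract can be sketched in a few lines of Python. This is a hedged illustration, not the authors' pipeline: the dictionary-based alignment format, the taxon names and the choice to pad a taxon missing from a gene with gap characters are all assumptions made for the example. Tree inference itself (RAxML in the abstract) is not shown.

```python
def concatenate(alignments):
    """Join per-gene alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence,
    one dict per gene. Taxa absent from a gene are padded with gaps.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    supermatrix = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a missing gene is represented as an all-gap block.
            supermatrix[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


# Two toy gene alignments with hypothetical taxon labels.
gene_1 = {"E_coli": "MKT", "B_subtilis": "MRT"}
gene_2 = {"E_coli": "AG-L", "M_jannaschii": "AGVL"}
matrix = concatenate([gene_1, gene_2])
for taxon, sequence in matrix.items():
    print(taxon, sequence)
```

Every row of the resulting matrix has the same length (the sum of the gene alignment widths), which is the property a supermatrix needs before it is passed to a tree-inference program.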
Affiliation(s)
- Jenna Morgan Lang
  - Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
  - Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
- Aaron E. Darling
  - Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Jonathan A. Eisen
  - Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
  - Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America

28
Grünewald S, Spillner A, Bastkowski S, Bögershausen A, Moulton V. SuperQ: computing supernetworks from quartets. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:151-60. [PMID: 23702551 DOI: 10.1109/tcbb.2013.8] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 05/10/2023]
Abstract
Supertrees are a commonly used tool in phylogenetics to summarize collections of partial phylogenetic trees. As a generalization of supertrees, phylogenetic supernetworks allow, in addition, the visual representation of conflict between the trees that is not possible to observe with a single tree. Here, we introduce SuperQ, a new method for constructing such supernetworks (SuperQ is freely available at www.uea.ac.uk/computing/superq). It works by first breaking the input trees into quartet trees, and then stitching these together to form a special kind of phylogenetic network, called a split network. This stitching process is performed using an adaptation of the QNet method for split network reconstruction employing a novel approach to use the branch lengths from the input trees to estimate the branch lengths in the resulting network. Compared with previous supernetwork methods, SuperQ has the advantage of producing a planar network. We compare the performance of SuperQ to the Z-closure and Q-imputation supernetwork methods, and also present an analysis of some published data sets as an illustration of its applicability.
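The first step named in this abstract, breaking an input tree into quartet trees, can be illustrated with a small Python sketch. It is not SuperQ's code: the nested-tuple tree representation is an assumption made for the example, branch lengths are ignored, and the subsequent stitching into a split network (the QNet adaptation) is not shown. The idea is that four taxa are resolved as ab|cd whenever some edge of the tree separates a and b from c and d.

```python
from itertools import combinations


def leaf_sets(tree):
    """Return all leaves and the leaf set below every internal node."""
    below = []

    def visit(node):
        # A leaf is a plain string; an internal node is a tuple of children.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        below.append(members)
        return members

    return visit(tree), below


def quartets(tree):
    """Return the resolved quartets induced by the tree.

    Each quartet is a frozenset holding its two pairs, so that
    ab|cd and cd|ab compare equal.
    """
    all_leaves, below = leaf_sets(tree)
    result = set()
    for four in combinations(sorted(all_leaves), 4):
        four = frozenset(four)
        for side in below:
            inside = four & side
            if len(inside) == 2:  # this edge splits the four taxa 2 versus 2
                result.add(frozenset([inside, four - inside]))
                break
    return result


# Hypothetical input tree on five taxa.
tree = ((("A", "B"), "C"), ("D", "E"))
for quartet in sorted(quartets(tree), key=lambda q: sorted(map(sorted, q))):
    print(" | ".join("".join(sorted(pair)) for pair in sorted(quartet, key=sorted)))
```

Because the edges of a single tree are mutually compatible, any edge that splits four taxa two against two yields the same resolution, which is why the inner loop can stop at the first match. Sets of four taxa that no edge splits evenly (possible in trees with multifurcations) are left unresolved and are simply omitted.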
Affiliation(s)
- Stefan Grünewald
  - CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China

29
Williams TA, Foster PG, Nye TMW, Cox CJ, Embley TM. A congruent phylogenomic signal places eukaryotes within the Archaea. Proc Biol Sci 2012; 279:4870-9. [PMID: 23097517 PMCID: PMC3497233 DOI: 10.1098/rspb.2012.1795] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.8] [Indexed: 12/24/2022]
Abstract
Determining the relationships among the major groups of cellular life is important for understanding the evolution of biological diversity, but is difficult given the enormous time spans involved. In the textbook ‘three domains’ tree based on informational genes, eukaryotes and Archaea share a common ancestor to the exclusion of Bacteria. However, some phylogenetic analyses of the same data have placed eukaryotes within the Archaea, as the nearest relatives of different archaeal lineages. We compared the support for these competing hypotheses using sophisticated phylogenetic methods and an improved sampling of archaeal biodiversity. We also employed both new and existing tests of phylogenetic congruence to explore the level of uncertainty and conflict in the data. Our analyses suggested that much of the observed incongruence is weakly supported or associated with poorly fitting evolutionary models. All of our phylogenetic analyses, whether on small subunit and large subunit ribosomal RNA or concatenated protein-coding genes, recovered a monophyletic group containing eukaryotes and the TACK archaeal superphylum comprising the Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota. Hence, while our results provide no support for the iconic three-domain tree of life, they are consistent with an extended eocyte hypothesis whereby vital components of the eukaryotic nuclear lineage originated from within the archaeal radiation.
Affiliation(s)
- Tom A Williams
- Institute for Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne NE2 4HH, UK
|
30
|
de Vienne DM, Ollier S, Aguileta G. Phylo-MCOA: a fast and efficient method to detect outlier genes and species in phylogenomics using multiple co-inertia analysis. Mol Biol Evol 2012; 29:1587-98. [PMID: 22319162 DOI: 10.1093/molbev/msr317] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 11/12/2022]
Abstract
Full genome data sets are currently being explored on a regular basis to infer phylogenetic trees, but there are often discordances among the trees produced by different genes. An important goal in phylogenomics is to identify which individual genes and species produce the same phylogenetic tree and are thus likely to share the same evolutionary history. On the other hand, it is also essential to identify which genes and species produce discordant topologies and therefore evolve in a different way or represent noise in the data. The latter are outlier genes or species and they can provide a wealth of information on potentially interesting biological processes, such as incomplete lineage sorting, hybridization, and horizontal gene transfers. Here, we propose a new method to explore the genomic tree space and detect outlier genes and species based on multiple co-inertia analysis (MCOA), which efficiently captures and compares the similarities in the phylogenetic topologies produced by individual genes. Our method allows the rapid identification of outlier genes and species by extracting the similarities and discrepancies, in terms of the pairwise distances, between all the species in all the trees, simultaneously. This is achieved by using MCOA, which finds successive decomposition axes from individual ordinations (i.e., derived from distance matrices) that maximize a covariance function. The method is freely available as a set of R functions. The source code and tutorial can be found online at http://phylomcoa.cgenomics.org.
Affiliation(s)
- Damien M de Vienne
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain.
|
31
|
Tamminen M, Virta M, Fani R, Fondi M. Large-scale analysis of plasmid relationships through gene-sharing networks. Mol Biol Evol 2011; 29:1225-40. [PMID: 22130968 DOI: 10.1093/molbev/msr292] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Indexed: 12/19/2022]
Abstract
Plasmids are vessels of genetic exchange in microbial communities. They are known to transfer between different host organisms and acquire diverse genetic elements from chromosomes and/or other plasmids. Therefore, they constitute an important element in microbial evolution by rapidly disseminating various genetic properties among different communities. A paradigmatic example of this is the dissemination of antibiotic resistance (AR) genes that has resulted in the emergence of multiresistant pathogenic bacterial strains. To globally analyze the evolutionary dynamics of plasmids, we built a large graph in which 2,343 plasmids (nodes) are connected according to the proteins shared by each other. The analysis of this gene-sharing network revealed an overall coherence between network clustering and the phylogenetic classes of the corresponding microorganisms, likely resulting from genetic barriers to horizontal gene transfer between distant phylogenetic groups. Habitat was not a crucial factor in clustering as plasmids from organisms inhabiting different environments were often found embedded in the same cluster. Analyses of network metrics revealed a statistically significant correlation between plasmid mobility and their centrality within the network, providing support to the observation that mobile plasmids are particularly important in spreading genes in microbial communities. Finally, our study reveals an extensive (and previously undescribed) sharing of AR genes between Actinobacteria and Gammaproteobacteria, suggesting that the former might represent an important reservoir of AR genes for the latter.
Affiliation(s)
- Manu Tamminen
- Department of Food and Environmental Sciences, University of Helsinki, Helsinki, Finland
|
32
|
McInerney JO, Pisani D, Bapteste E, O'Connell MJ. The Public Goods Hypothesis for the evolution of life on Earth. Biol Direct 2011; 6:41. [PMID: 21861918 PMCID: PMC3179745 DOI: 10.1186/1745-6150-6-41] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 04/17/2011] [Accepted: 08/23/2011] [Indexed: 02/01/2023]
Abstract
It is becoming increasingly difficult to reconcile the observed extent of horizontal gene transfers with the central metaphor of a great tree uniting all evolving entities on the planet. In this manuscript we describe the Public Goods Hypothesis and show that it is appropriate in order to describe biological evolution on the planet. According to this hypothesis, nucleotide sequences (genes, promoters, exons, etc.) are simply seen as goods, passed from organism to organism through both vertical and horizontal transfer. Public goods sequences are defined by having the properties of being largely non-excludable (no organism can be effectively prevented from accessing these sequences) and non-rival (while such a sequence is being used by one organism it is also available for use by another organism). The universal nature of genetic systems ensures that such non-excludable sequences exist and non-excludability explains why we see a myriad of genes in different combinations in sequenced genomes. There are three features of the public goods hypothesis. Firstly, segments of DNA are seen as public goods, available for all organisms to integrate into their genomes. Secondly, we expect the evolution of mechanisms for DNA sharing and of defense mechanisms against DNA intrusion in genomes. Thirdly, we expect that we do not see a global tree-like pattern. Instead, we expect local tree-like patterns to emerge from the combination of a commonage of genes and vertical inheritance of genomes by cell division. Indeed, while genes are theoretically public goods, in reality, some genes are excludable, particularly, though not only, when they have variant genetic codes or behave as coalition or club goods, available for all organisms of a coalition to integrate into their genomes, and non-rival within the club. 
We view the Tree of Life hypothesis as a regionalized instance of the Public Goods hypothesis, just as classical mechanics and Euclidean geometry are seen as regionalized instances of quantum mechanics and Riemannian geometry respectively. We argue for this change using an axiomatic approach that shows that the Public Goods hypothesis is a better accommodation of the observed data than the Tree of Life hypothesis.
Affiliation(s)
- James O McInerney
- Molecular Evolution and Bioinformatics Unit, Department of Biology, National University of Ireland Maynooth, County Kildare, Ireland.
|
33
|
Abstract
Life is a chemical reaction. Three major transitions in early evolution are considered without recourse to a tree of life. The origin of prokaryotes required a steady supply of energy and electrons, probably in the form of molecular hydrogen stemming from serpentinization. Microbial genome evolution is not a treelike process because of lateral gene transfer and the endosymbiotic origins of organelles. The lack of true intermediates in the prokaryote-to-eukaryote transition has a bioenergetic cause. This article was reviewed by Dan Graur, W. Ford Doolittle, Eugene V. Koonin and Christophe Malaterre.
Affiliation(s)
- William F Martin
- Institute of Botany III, University of Düsseldorf, 40225 Düsseldorf, Germany.
|
34
|
Leigh JW, Lapointe FJ, Lopez P, Bapteste E. Evaluating phylogenetic congruence in the post-genomic era. Genome Biol Evol 2011; 3:571-87. [PMID: 21712432 PMCID: PMC3156567 DOI: 10.1093/gbe/evr050] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Accepted: 05/27/2011] [Indexed: 12/04/2022]
Abstract
Congruence is a broadly applied notion in evolutionary biology used to justify multigene phylogeny or phylogenomics, as well as in studies of coevolution, lateral gene transfer, and as evidence for common descent. Existing methods for identifying incongruence or heterogeneity using character data were designed for data sets that are both small and expected to be rarely incongruent. At the same time, methods that assess incongruence using comparison of trees test a null hypothesis of uncorrelated tree structures, which may be inappropriate for phylogenomic studies. As such, they are ill-suited for the growing number of available genome sequences, most of which are from prokaryotes and viruses, either for phylogenomic analysis or for studies of the evolutionary forces and events that have shaped these genomes. Specifically, many existing methods scale poorly with large numbers of genes, cannot accommodate high levels of incongruence, and do not adequately model patterns of missing taxa for different markers. We propose the development of novel incongruence assessment methods suitable for the analysis of the molecular evolution of the vast majority of life and support the investigation of homogeneity of evolutionary process in cases where markers do not share identical tree structures.
Affiliation(s)
- Jessica W Leigh
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand.
|