1
Gamblin J, Lambert A, Blanquart F. Persistent, Private, and Mobile Genes: A Model for Gene Dynamics in Evolving Pangenomes. Mol Biol Evol 2025; 42:msaf001. [PMID: 39812022 PMCID: PMC11781223 DOI: 10.1093/molbev/msaf001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 11/22/2024] [Accepted: 12/17/2024] [Indexed: 01/16/2025]
Abstract
The pangenome of a species is the set of all genes carried by at least one member of the species. In bacteria, pangenomes can be much larger than the set of genes carried by a single organism. Many questions remain unanswered regarding the evolutionary forces shaping the patterns of the presence/absence of genes in pangenomes of a given species. We introduce a new model for bacterial pangenome evolution along a species phylogeny that explicitly describes the timing of appearance of each gene in the species and accounts for three generic types of gene evolutionary dynamics: persistent genes that are present in the ancestral genome, private genes that are specific to a given clade, and mobile genes that are imported once into the gene pool and then undergo frequent horizontal gene transfers. We call this model the Persistent-Private-Mobile (PPM) model. We develop an algorithm fitting the PPM model and apply it to a dataset of 902 Salmonella enterica genomes. We show that the best fitting model is able to reproduce the global pattern of some multivariate statistics like the gene frequency spectrum and the parsimony vs. frequency plot. Moreover, the gene classification induced by the PPM model allows us to study the position of accessory genes on the chromosome depending on their category, as well as the gene functions that are most present in each category. This work paves the way for a mechanistic understanding of pangenome evolution, and the PPM model developed here could be used for dynamics-aware gene classification.
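One of the summary statistics named in this abstract, the gene frequency spectrum, can be illustrated with a minimal sketch. The function name `gene_frequency_spectrum` and the toy presence/absence data are assumptions made for illustration; this is not the authors' fitting algorithm, only the counting step that defines the statistic.

```python
from collections import Counter


def gene_frequency_spectrum(presence, n_genomes):
    """Return a list whose entry k-1 is the number of genes found in exactly k genomes.

    `presence` maps a gene name to the set of genome identifiers carrying it
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(len(genomes) for genomes in presence.values())
    return [counts.get(k, 0) for k in range(1, n_genomes + 1)]


# Toy pangenome of three genomes (illustrative data only).
presence = {
    "geneA": {"g1", "g2", "g3"},  # carried by every genome
    "geneB": {"g1"},              # carried by a single genome
    "geneC": {"g2", "g3"},
    "geneD": {"g3"},
}

spectrum = gene_frequency_spectrum(presence, n_genomes=3)
print(spectrum)
```

The first entry counts genes seen in a single genome and the last entry counts genes shared by all genomes.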
Affiliation(s)
- Jasmine Gamblin
  - Center for Interdisciplinary Research in Biology (CIRB), College de France, CNRS, INSERM, Université PSL, Paris, France
- Amaury Lambert
  - Center for Interdisciplinary Research in Biology (CIRB), College de France, CNRS, INSERM, Université PSL, Paris, France
  - Institut de Biologie de l’ENS (IBENS), École Normale Supérieure (ENS), CNRS, INSERM, Université PSL, Paris, France
- François Blanquart
  - Center for Interdisciplinary Research in Biology (CIRB), College de France, CNRS, INSERM, Université PSL, Paris, France
2
Maestri R, Perez-Lamarque B, Zhukova A, Morlon H. Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses. eLife 2024; 13:RP91745. [PMID: 39196812 PMCID: PMC11357359 DOI: 10.7554/elife.91745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/30/2024]
Abstract
Several coronaviruses infect humans, with three, including the SARS-CoV2, causing diseases. While coronaviruses are especially prone to induce pandemics, we know little about their evolutionary history, host-to-host transmissions, and biogeography. One of the difficulties lies in dating the origination of the family, a particularly challenging task for RNA viruses in general. Previous cophylogenetic tests of virus-host associations, including in the Coronaviridae family, have suggested a virus-host codiversification history stretching many millions of years. Here, we establish a framework for robustly testing scenarios of ancient origination and codiversification versus recent origination and diversification by host switches. Applied to coronaviruses and their mammalian hosts, our results support a scenario of recent origination of coronaviruses in bats and diversification by host switches, with preferential host switches within mammalian orders. Hotspots of coronavirus diversity, concentrated in East Asia and Europe, are consistent with this scenario of relatively recent origination and localized host switches. Spillovers from bats to other species are rare, but have the highest probability to be towards humans than to any other mammal species, implicating humans as the evolutionary intermediate host. The high host-switching rates within orders, as well as between humans, domesticated mammals, and non-flying wild mammals, indicates the potential for rapid additional spreading of coronaviruses across the world. Our results suggest that the evolutionary history of extant mammalian coronaviruses is recent, and that cases of long-term virus-host codiversification have been largely over-estimated.
Affiliation(s)
- Renan Maestri
  - Institut de Biologie de l'École Normale Supérieure (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
  - Departamento de Ecologia, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Benoît Perez-Lamarque
  - Institut de Biologie de l'École Normale Supérieure (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d’histoire naturelle, CNRS, Sorbonne Université, EPHE, UA, Paris, France
- Anna Zhukova
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, France
- Hélène Morlon
  - Institut de Biologie de l'École Normale Supérieure (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
3
Gàlvez-Morante A, Guéguen L, Natsidis P, Telford MJ, Richter DJ. Dollo Parsimony Overestimates Ancestral Gene Content Reconstructions. Genome Biol Evol 2024; 16:evae062. [PMID: 38518756 PMCID: PMC10995720 DOI: 10.1093/gbe/evae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 03/15/2024] [Accepted: 03/19/2024] [Indexed: 03/24/2024]
Abstract
Ancestral reconstruction is a widely used technique that has been applied to understand the evolutionary history of gain and loss of gene families. Ancestral gene content can be reconstructed via different phylogenetic methods, but many current and previous studies employ Dollo parsimony. We hypothesize that Dollo parsimony is not appropriate for ancestral gene content reconstruction inferences based on sequence homology, as Dollo parsimony is derived from the assumption that a complex character cannot be regained. This premise does not accurately model molecular sequence evolution, in which false orthology can result from sequence convergence or lateral gene transfer. The aim of this study is to test Dollo parsimony's suitability for ancestral gene content reconstruction and to compare its inferences with a maximum likelihood-based approach that allows a gene family to be gained more than once within a tree. We first compared the performance of the two approaches on a series of artificial data sets each of 5,000 genes that were simulated according to a spectrum of evolutionary rates without gene gain or loss, so that inferred deviations from the true gene count would arise only from errors in orthology inference and ancestral reconstruction. Next, we reconstructed protein domain evolution on a phylogeny representing known eukaryotic diversity. We observed that Dollo parsimony produced numerous ancestral gene content overestimations, especially at nodes closer to the root of the tree. These observations led us to the conclusion that, confirming our hypothesis, Dollo parsimony is not an appropriate method for ancestral reconstruction studies based on sequence homology.
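The Dollo assumption described in this abstract (a gene family is gained at most once and can only be lost afterwards) can be made concrete with a small sketch. The tree, the node names, and the function `dollo_presence` are hypothetical and chosen for illustration; this is not the authors' pipeline. Under the assumption, the single gain is placed at the most recent common ancestor (MRCA) of all carriers, so a family shared by two distant leaves, for instance after a lateral transfer, is inferred as present all the way back to their common ancestor.

```python
def ancestors(node, parent):
    """Return the path from `node` up to the root, starting with `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def dollo_presence(carriers, parent):
    """Dollo reconstruction for one gene family.

    The single gain is placed at the MRCA of all carrier leaves, and the family
    is inferred present on every node between that MRCA and each carrier.
    """
    paths = [ancestors(leaf, parent) for leaf in carriers]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking up from any leaf, the first shared node is the MRCA.
    mrca = next(node for node in paths[0] if node in common)
    present = set()
    for path in paths:
        present.update(path[: path.index(mrca) + 1])
    return mrca, present


# Hypothetical four-leaf tree: root -> (X -> A, B) and (Y -> C, D).
parent = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "root", "Y": "root"}

# Family found in two sister leaves: the gain is placed at X.
mrca_sisters, present_sisters = dollo_presence(["A", "B"], parent)

# Family found in two distant leaves: the gain is pushed back to the root.
mrca_distant, present_distant = dollo_presence(["A", "D"], parent)
print(mrca_sisters, mrca_distant)
```

In the second case the family is counted at the root and at both internal nodes even though only two of four leaves carry it, which is the kind of deep-node inflation the abstract reports.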
Affiliation(s)
- Alex Gàlvez-Morante
  - Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
- Laurent Guéguen
  - LBBE, UMR 5558, CNRS, Université Claude Bernard Lyon 1, Villeurbanne 69622, France
- Paschalis Natsidis
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Maximilian J Telford
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Daniel J Richter
  - Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
4
Cribbie EP, Doerr D, Chauve C. AGO, a Framework for the Reconstruction of Ancestral Syntenies and Gene Orders. Methods Mol Biol 2024; 2802:247-265. [PMID: 38819563 DOI: 10.1007/978-1-0716-3838-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Reconstructing ancestral gene orders from the genome data of extant species is an important problem in comparative and evolutionary genomics. In a phylogenomics setting that accounts for gene family evolution through gene duplication and gene loss, the reconstruction of ancestral gene orders involves several steps, including multiple sequence alignment, the inference of reconciled gene trees, and the inference of ancestral syntenies and gene adjacencies. For each of the steps of such a process, several methods can be used and implemented using a growing corpus of, often parameterized, tools; in practice, interfacing such tools into an ancestral gene order reconstruction pipeline is far from trivial. This chapter introduces AGO, a Python-based framework aimed at creating ancestral gene order reconstruction pipelines allowing to interface and parameterize different bioinformatics tools. The authors illustrate the features of AGO by reconstructing ancestral gene orders for the X chromosome of three ancestral Anopheles species using three different pipelines. AGO is freely available at https://github.com/cchauve/AGO-pipeline .
Affiliation(s)
- Evan P Cribbie
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Daniel Doerr
  - Department for Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, German Diabetes Center (DDZ), Leibniz Institute for Diabetes Research, and Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
5
Katriel G, Mahanaymi U, Brezner S, Kezel N, Koutschan C, Zeilberger D, Steel M, Snir S. Gene Transfer-Based Phylogenetics: Analytical Expressions and Additivity via Birth-Death Theory. Syst Biol 2023; 72:1403-1417. [PMID: 37862116 DOI: 10.1093/sysbio/syad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 09/01/2023] [Accepted: 10/05/2023] [Indexed: 10/22/2023]
Abstract
The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to a unique data structure that we constructed, the ordered orthology DB, based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows us to confirm and contrast findings based on other phylogenetic approaches, as we show.
Affiliation(s)
- Guy Katriel
  - Department of Mathematics, Braude College of Engineering, Karmiel, Israel
- Udi Mahanaymi
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Shelly Brezner
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Noor Kezel
  - Department of Mathematics, University of Haifa, Haifa, Israel
- Doron Zeilberger
  - Department of Mathematics, Rutgers University, New Brunswick, NJ, USA
- Mike Steel
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Sagi Snir
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
6
Perez-Lamarque B, Morlon H. Comparing different computational approaches for detecting long-term vertical transmission in host-associated microbiota. Mol Ecol 2023; 32:6671-6685. [PMID: 36065594 DOI: 10.1111/mec.16681] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/08/2022] [Revised: 08/29/2022] [Accepted: 09/01/2022] [Indexed: 11/30/2022]
Abstract
Long-term vertical transmissions of gut bacteria are thought to be frequent and functionally important in mammals. Several phylogenetic-based approaches have been proposed to detect, among species-rich microbiota, the bacteria that have been vertically transmitted during a host clade radiation. Applied to mammal microbiota, these methods have sometimes led to conflicting results; in addition, how they cope with the slow evolution of markers typically used to characterize bacterial microbiota remains unclear. Here, we use simulations to test the statistical performances of two widely-used global-fit approaches (ParaFit and PACo) and two event-based approaches (ALE and HOME). We find that these approaches have different strengths and weaknesses depending on the amount of variation in the bacterial DNA sequences and are therefore complementary. In particular, we show that ALE performs better when there is a lot of variation in the bacterial DNA sequences, whereas HOME performs better when there is not. Global-fit approaches (ParaFit and PACo) have higher type I error rates (false positives) but have the advantage to be very fast to run. We apply these methods to the gut microbiota of primates and our results suggest that only a small fraction of their gut bacteria is vertically transmitted.
Affiliation(s)
- Benoît Perez-Lamarque
  - Institut de Biologie de l'École Normale Supérieure (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'histoire Naturelle, CNRS, Sorbonne Université, EPHE, UA, Paris, France
- Hélène Morlon
  - Institut de Biologie de l'École Normale Supérieure (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
7
Askelson KK, Spellman GM, Irwin D. Genomic divergence and introgression between cryptic species of a widespread North American songbird. Mol Ecol 2023; 32:6839-6853. [PMID: 37916530 DOI: 10.1111/mec.17169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 09/20/2023] [Indexed: 11/03/2023]
Abstract
Analysis of genomic variation among related populations can sometimes reveal distinct species that were previously undescribed due to similar morphological appearances, and close examination of such cases can provide much insight regarding speciation. Genomic data can also reveal the role of reticulate evolution in differentiation and speciation. White-breasted nuthatches (Sitta carolinensis) are widely distributed North American songbirds that are currently classified as a single species but have been suspected to represent a case of cryptic speciation. Previous genetic analyses suggested four divergent groups, but it was unclear whether these represented multiple reproductively isolated species. Using extensive genomic sampling of over 350 white-breasted nuthatches from across North America and a new chromosome-level reference genome, we asked if white-breasted nuthatches are comprised of multiple species and whether introgression has occurred between divergent populations. Genomic variation of over 300,000 loci revealed four highly differentiated populations (Pacific, n = 45; Eastern, n = 23; Rocky Mountains North, n = 138; and Rocky Mountains South, n = 150) with geographic ranges that are adjacent. We observed a moderate degree of admixture between Rocky Mountain populations but only a small number of hybrids between the Rockies and the Eastern population. The rarity of hybrids together with high levels of differentiation between populations is supportive of populations having some level of reproductive isolation. Between populations, we show evidence for introgression from a divergent ghost lineage of white-breasted nuthatches into the Rocky Mountains South population, which is otherwise closely related to Rocky Mountains North. We conclude that white-breasted nuthatches are best considered at least three species and that ghost lineage introgression has contributed to differentiation between the two Rocky Mountain populations. 
White-breasted nuthatches provide a dramatic case of morphological similarity despite high genomic differentiation, and the varying levels of reproductive isolation among the four groups provide an example of the speciation continuum.
Affiliation(s)
- Kenneth K Askelson
  - Biodiversity Research Centre and Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
- Garth M Spellman
  - Department of Zoology, Denver Museum of Nature & Science, Denver, Colorado, USA
- Darren Irwin
  - Biodiversity Research Centre and Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
8
Zaman S, Sledzieski S, Berger B, Wu YC, Bansal MS. virDTL: Viral Recombination Analysis Through Phylogenetic Reconciliation and Its Application to Sarbecoviruses and SARS-CoV-2. J Comput Biol 2023; 30:3-20. [PMID: 36125448 PMCID: PMC10081712 DOI: 10.1089/cmb.2021.0507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
An accurate understanding of the evolutionary history of rapidly-evolving viruses like SARS-CoV-2, responsible for the COVID-19 pandemic, is crucial to tracking and preventing the spread of emerging pathogens. However, viruses undergo frequent recombination, which makes it difficult to trace their evolutionary history using traditional phylogenetic methods. In this study, we present a phylogenetic workflow, virDTL, for analyzing viral evolution in the presence of recombination. Our approach leverages reconciliation methods developed for inferring horizontal gene transfer in prokaryotes and, compared to existing tools, is uniquely able to identify ancestral recombinations while accounting for several sources of inference uncertainty, including in the construction of a strain tree, estimation and rooting of gene family trees, and reconciliation itself. We apply this workflow to the Sarbecovirus subgenus and demonstrate how a principled analysis of predicted recombination gives insight into the evolution of SARS-CoV-2. In addition to providing confirming evidence for the horseshoe bat as its zoonotic origin, we identify several ancestral recombination events that merit further study.
Affiliation(s)
- Sumaira Zaman
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA
- Samuel Sledzieski
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Yi-Chieh Wu
  - Department of Computer Science, Harvey Mudd College, Claremont, California, USA
- Mukul S Bansal
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA
  - The Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, USA
9
Affiliation(s)
- Hugo Menet
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Vincent Daubin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - * E-mail: (VD); (ET)
- Eric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - Inria, centre de recherche de Lyon, Villeurbanne, France
  - * E-mail: (VD); (ET)
10
Harris BJ, Clark JW, Schrempf D, Szöllősi GJ, Donoghue PCJ, Hetherington AM, Williams TA. Divergent evolutionary trajectories of bryophytes and tracheophytes from a complex common ancestor of land plants. Nat Ecol Evol 2022; 6:1634-1643. [PMID: 36175544 PMCID: PMC9630106 DOI: 10.1038/s41559-022-01885-x] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 11/08/2021] [Accepted: 08/12/2022] [Indexed: 11/16/2022]
Abstract
The origin of plants and their colonization of land fundamentally transformed the terrestrial environment. Here we elucidate the basis of this formative episode in Earth history through patterns of lineage, gene and genome evolution. We use new fossil calibrations, a relative clade age calibration (informed by horizontal gene transfer) and new phylogenomic methods for mapping gene family origins. Distinct rooting strategies resolve tracheophytes (vascular plants) and bryophytes (non-vascular plants) as monophyletic sister groups that diverged during the Cambrian, 515-494 million years ago. The embryophyte stem is characterized by a burst of gene innovation, while bryophytes subsequently experienced an equally dramatic episode of reductive genome evolution in which they lost genes associated with the elaboration of vasculature and the stomatal complex. Overall, our analyses reveal that extant tracheophytes and bryophytes are both highly derived from a more complex ancestral land plant. Understanding the origin of land plants requires tracing character evolution across a diversity of modern lineages.
Affiliation(s)
- Brogan J Harris
  - School of Biological Sciences, University of Bristol, Bristol, UK
- James W Clark
  - School of Biological Sciences, University of Bristol, Bristol, UK
  - Bristol Palaeobiology Group, School of Earth Sciences, University of Bristol, Bristol, UK
- Dominik Schrempf
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Gergely J Szöllősi
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
  - MTA-ELTE 'Lendület' Evolutionary Genomics Research Group, Budapest, Hungary
  - Institute of Evolution, Centre for Ecological Research, Budapest, Hungary
- Philip C J Donoghue
  - Bristol Palaeobiology Group, School of Earth Sciences, University of Bristol, Bristol, UK
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol, UK
  - Bristol Palaeobiology Group, School of Earth Sciences, University of Bristol, Bristol, UK
11
Mulvey LPA, Warnock RCM, De Baets K. Where traditional extinction estimates fall flat: using novel cophylogenetic methods to estimate extinction risk in platyhelminths. Proc Biol Sci 2022; 289:20220432. [PMID: 36043279 PMCID: PMC9428535 DOI: 10.1098/rspb.2022.0432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 08/08/2022] [Indexed: 11/12/2022]
Abstract
Today parasites comprise a huge proportion of living biodiversity and play a major role in shaping community structure. Given their ecological significance, parasite extinctions could result in massive cascading effects across ecosystems. It is therefore crucial that we have a way of estimating their extinction risk. Attempts to do this have often relied on information about host extinction risk, without explicitly incorporating information about the parasites. However, assuming an identical risk may be misleading. Here, we apply a novel metric to estimate the cophylogenetic extinction rate, Ec, of parasites with their hosts. This metric incorporates information about the evolutionary history of parasites and hosts that can be estimated using event-based cophylogenetic methods. To explore this metric, we investigated the use of different cophylogenetic methods to inform the Ec rate, based on the analysis of polystome parasites and their anuran hosts. We show using both parsimony- and model-based approaches that different methods can have a large effect on extinction risk estimation. Further, we demonstrate that model-based approaches offer greater potential to provide insights into cophylogenetic history and extinction risk.
Affiliation(s)
- Laura P. A. Mulvey
  - GeoZentrum Nordbayern, Department of Geography and Geosciences, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen 91054, Germany
- Rachel C. M. Warnock
  - GeoZentrum Nordbayern, Department of Geography and Geosciences, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen 91054, Germany
- Kenneth De Baets
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 00-927 Warszawa, Poland
12
Tricou T, Tannier E, de Vienne DM. Ghost Lineages Highly Influence the Interpretation of Introgression Tests. Syst Biol 2022; 71:1147-1158. [PMID: 35169846 PMCID: PMC9366450 DOI: 10.1093/sysbio/syac011] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 04/01/2021] [Revised: 02/01/2021] [Accepted: 02/08/2022] [Indexed: 11/29/2022]
Abstract
Most species are extinct, those that are not are often unknown. Sequenced and sampled species are often a minority of known ones. Past evolutionary events involving horizontal gene flow, such as horizontal gene transfer, hybridization, introgression, and admixture, are therefore likely to involve "ghosts," that is extinct, unknown, or unsampled lineages. The existence of these ghost lineages is widely acknowledged, but their possible impact on the detection of gene flow and on the identification of the species involved is largely overlooked. It is generally considered as a possible source of error that, with reasonable approximation, can be ignored. We explore the possible influence of absent species on an evolutionary study by quantifying the effect of ghost lineages on introgression as detected by the popular D-statistic method. We show from simulated data that under certain frequently encountered conditions, the donors and recipients of horizontal gene flow can be wrongly identified if ghost lineages are not taken into account. In particular, having a distant outgroup, which is usually recommended, leads to an increase in the error probability and to false interpretations in most cases. We conclude that introgression from ghost lineages should be systematically considered as an alternative possible, even probable, scenario. [ABBA-BABA; D-statistic; gene flow; ghost lineage; introgression; simulation.].
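For readers unfamiliar with the D-statistic (ABBA-BABA test) that this abstract examines, the following minimal sketch computes it from four aligned sequences. The function name `d_statistic` and the toy sequences are assumptions made for illustration; real analyses work on genome-wide data and add significance testing, which is omitted here. With the outgroup allele written A and the derived allele B, the statistic is D = (nABBA - nBABA) / (nABBA + nBABA).

```python
def d_statistic(p1, p2, p3, outgroup):
    """Compute Patterson's D from four aligned sequences of equal length.

    The outgroup allele is treated as ancestral (A). A site is ABBA when P2 and
    P3 share the derived allele, and BABA when P1 and P3 share it.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if c == o:
            continue  # P3 must carry the derived allele for the site to count
        if a == o and b == c:
            abba += 1
        elif b == o and a == c:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba), abba, baba


# Toy alignment (illustrative only): two ABBA sites and one BABA site.
d, n_abba, n_baba = d_statistic(
    p1="AATTAT",
    p2="TTTAAA",
    p3="TTTTAA",
    outgroup="AAAAAA",
)
print(d, n_abba, n_baba)
```

A positive D is conventionally read as gene flow between P2 and P3. The abstract's point is that the same excess can arise from an unsampled ghost lineage, so the statistic alone does not identify the donor.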
Affiliation(s)
- Théo Tricou
  - Laboratoire de Biométrie et Biologie Évolutive UMR5558, Univ Lyon, Université Lyon 1, CNRS, F-69622 Villeurbanne, France
- Eric Tannier
  - Laboratoire de Biométrie et Biologie Évolutive UMR5558, Univ Lyon, Université Lyon 1, CNRS, F-69622 Villeurbanne, France
  - Inria, Centre de Recherche de Lyon, F-69603 Villeurbanne, France
- Damien M de Vienne
  - Laboratoire de Biométrie et Biologie Évolutive UMR5558, Univ Lyon, Université Lyon 1, CNRS, F-69622 Villeurbanne, France
13
Cerón-Romero MA, Fonseca MM, de Oliveira Martins L, Posada D, Katz LA. Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages. Genome Biol Evol 2022; 14:evac119. [PMID: 35880421 PMCID: PMC9366629 DOI: 10.1093/gbe/evac119] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 07/11/2022] [Indexed: 12/02/2022]
Abstract
Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically-informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but only including 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes; but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
Affiliation(s)
- Mario A Cerón-Romero
  - Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
  - Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, USA
- Miguel M Fonseca
  - CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- Leonardo de Oliveira Martins
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
  - Quadram Institute Bioscience, Norwich, United Kingdom
- David Posada
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Laura A Katz
  - Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
  - Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
|
14
|
Schön ME, Martijn J, Vosseberg J, Köstlbacher S, Ettema TJG. The evolutionary origin of host association in the Rickettsiales. Nat Microbiol 2022; 7:1189-1199. [PMID: 35798888 PMCID: PMC9352585 DOI: 10.1038/s41564-022-01169-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/19/2022] [Accepted: 05/30/2022] [Indexed: 12/14/2022]
Abstract
The evolution of obligate host-association of bacterial symbionts and pathogens remains poorly understood. The Rickettsiales are an alphaproteobacterial order of obligate endosymbionts and parasites that infect a wide variety of eukaryotic hosts, including humans, livestock, insects and protists. Induced by their host-associated lifestyle, Rickettsiales genomes have undergone reductive evolution, leading to small, AT-rich genomes with limited metabolic capacities. Here we uncover eleven deep-branching alphaproteobacterial metagenome assembled genomes from aquatic environments, including data from the Tara Oceans initiative and other publicly available datasets, distributed over three previously undescribed Rickettsiales-related clades. Phylogenomic analyses reveal that two of these clades, Mitibacteraceae and Athabascaceae, branch sister to all previously sampled Rickettsiales. The third clade, Gamibacteraceae, branches sister to the recently identified ectosymbiotic ‘Candidatus Deianiraea vastatrix’. Comparative analyses indicate that the gene complement of Mitibacteraceae and Athabascaceae is reminiscent of that of free-living and biofilm-associated bacteria. Ancestral genome content reconstruction across the Rickettsiales species tree further suggests that the evolution of host association in Rickettsiales was a gradual process that may have involved the repurposing of a type IV secretion system. Phylogenomic analyses reveal novel environmental clades of Rickettsiales, providing insights into their evolution from a free-living to a host-associated lifestyle.
Affiliation(s)
- Max E Schön
  - Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Joran Martijn
  - Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
- Julian Vosseberg
  - Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
  - Theoretical Biology and Bioinformatics, Department of Biology, Utrecht University, Utrecht, The Netherlands
- Stephan Köstlbacher
  - Laboratory of Microbiology, Wageningen University and Research, Wageningen, The Netherlands
- Thijs J G Ettema
  - Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
  - Laboratory of Microbiology, Wageningen University and Research, Wageningen, The Netherlands
|
15
|
Tree Reconciliation Methods for Host-Symbiont Cophylogenetic Analyses. Life (Basel) 2022; 12:life12030443. [PMID: 35330194 PMCID: PMC8951107 DOI: 10.3390/life12030443] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/11/2022] [Revised: 03/05/2022] [Accepted: 03/10/2022] [Indexed: 12/16/2022] Open
Abstract
Phylogenetic reconciliation is a fundamental method in the study of pairs of coevolving species. This paper provides an overview of the underlying theory of reconciliation in the context of host-symbiont cophylogenetics, identifying some of the major challenges to users of these methods, such as selecting event costs and selecting representative reconciliations. Next, recent advances to address these challenges are discussed followed by a discussion of several established and recent software tools.
|
16
|
Harris BJ, Sheridan PO, Davín AA, Gubry-Rangin C, Szöllősi GJ, Williams TA. Rooting Species Trees Using Gene Tree-Species Tree Reconciliation. Methods Mol Biol 2022; 2569:189-211. [PMID: 36083449 DOI: 10.1007/978-1-0716-2691-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Interpreting phylogenetic trees requires a root, which provides the direction of evolution and polarizes ancestor-descendant relationships. But inferring the root using genetic data is difficult, particularly in cases where the closest available outgroup is only distantly related, which are common for microbes. In this chapter, we present a workflow for estimating rooted species trees and the evolutionary history of the gene families that evolve within them using probabilistic gene tree-species tree reconciliation. We illustrate the pipeline using a small dataset of prokaryotic genomes, for which the example scripts can be run using modest computer resources. We describe the rooting method used in this work in the context of other rooting strategies and discuss some of the limitations and opportunities presented by probabilistic gene tree-species tree reconciliation methods.
Affiliation(s)
- Brogan J Harris
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Paul O Sheridan
  - School of Biological Sciences, University of Bristol, Bristol, UK
  - School of Biological Sciences, University of Aberdeen, Aberdeen, UK
- Adrián A Davín
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Gergely J Szöllősi
  - Dept. of Biological Physics, Eötvös Loránd University, Budapest, Hungary
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary
  - Institute of Evolution, Centre for Ecological Research, Budapest, Hungary
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol, UK
|
17
|
Bansal MS. Deciphering Microbial Gene Family Evolution Using Duplication-Transfer-Loss Reconciliation and RANGER-DTL. Methods Mol Biol 2022; 2569:233-252. [PMID: 36083451 DOI: 10.1007/978-1-0716-2691-7_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Phylogenetic reconciliation has emerged as a principled, highly effective technique for investigating the origin, spread, and evolutionary history of microbial gene families. Proper application of phylogenetic reconciliation requires a clear understanding of potential pitfalls and sources of error, and knowledge of the most effective reconciliation-based tools and protocols to use to maximize accuracy. In this book chapter, we provide a brief overview of Duplication-Transfer-Loss (DTL) reconciliation, the standard reconciliation model used to study microbial gene families, and provide a step-by-step computational protocol to maximize the accuracy of DTL reconciliation and minimize false-positive evolutionary inferences.
Affiliation(s)
- Mukul S Bansal
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA
|
18
|
Davín AA, Schrempf D, Williams TA, Hugenholtz P, Szöllősi GJ. Relative Time Inference Using Lateral Gene Transfers. Methods Mol Biol 2022; 2569:75-94. [PMID: 36083444 DOI: 10.1007/978-1-0716-2691-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Many organisms are able to incorporate exogenous DNA into their genomes. This process, called lateral gene transfer (LGT), has the potential to benefit the recipient organism by providing useful coding sequences, such as antibiotic resistance genes or enzymes that expand the organism's metabolic niche. For evolutionary biologists, LGTs have often been considered a nuisance because they complicate the reconstruction of the underlying species tree that many analyses aim to recover. However, LGT events between distinct organisms harbor information on the relative divergence time of the donor and recipient lineages. As a result, transfers provide a novel and as yet mostly unexplored source of information to determine the order of divergence of clades, with the potential for absolute dating if linked to the fossil record.
Affiliation(s)
- Adrián A Davín
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Dominik Schrempf
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Philip Hugenholtz
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Gergely J Szöllősi
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary
  - Institute of Evolution, Centre for Ecological Research, Budapest, Hungary
|
19
|
Wickell D, Kuo LY, Yang HP, Dhabalia Ashok A, Irisarri I, Dadras A, de Vries S, de Vries J, Huang YM, Li Z, Barker MS, Hartwick NT, Michael TP, Li FW. Underwater CAM photosynthesis elucidated by Isoetes genome. Nat Commun 2021; 12:6348. [PMID: 34732722 PMCID: PMC8566536 DOI: 10.1038/s41467-021-26644-7] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 06/22/2021] [Accepted: 10/12/2021] [Indexed: 12/13/2022] Open
Abstract
To conserve water in arid environments, numerous plant lineages have independently evolved Crassulacean Acid Metabolism (CAM). Interestingly, Isoetes, an aquatic lycophyte, can also perform CAM as an adaptation to low CO2 availability underwater. However, little is known about the evolution of CAM in aquatic plants and the lack of genomic data has hindered comparison between aquatic and terrestrial CAM. Here, we investigate underwater CAM in Isoetes taiwanensis by generating a high-quality genome assembly and RNA-seq time course. Despite broad similarities between CAM in Isoetes and terrestrial angiosperms, we identify several key differences. Notably, Isoetes may have recruited the lesser-known 'bacterial-type' PEPC, along with the 'plant-type' exclusively used in other CAM and C4 plants for carboxylation of PEP. Furthermore, we find that circadian control of key CAM pathway genes has diverged considerably in Isoetes relative to flowering plants. This suggests the existence of more evolutionary paths to CAM than previously recognized.
Affiliation(s)
- David Wickell
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
  - Boyce Thompson Institute, Ithaca, NY, USA
- Li-Yaung Kuo
  - Institute of Molecular & Cellular Biology, National Tsing Hua University, Hsinchu, Taiwan
- Amra Dhabalia Ashok
  - Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
- Iker Irisarri
  - Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
  - Campus Institute Data Science, University of Goettingen, Goettingen, Germany
- Armin Dadras
  - Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
- Sophie de Vries
  - Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
- Jan de Vries
  - Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
  - Campus Institute Data Science, University of Goettingen, Goettingen, Germany
  - Department of Applied Bioinformatics, Goettingen Center for Molecular Biosciences, University of Goettingen, Goettingen, Germany
- Zheng Li
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- Michael S Barker
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Nolan T Hartwick
  - The Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Todd P Michael
  - The Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Fay-Wei Li
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
  - Boyce Thompson Institute, Ithaca, NY, USA
|
20
|
Improved Duplication-Transfer-Loss Reconciliation with Extinct and Unsampled Lineages. ALGORITHMS 2021. [DOI: 10.3390/a14080231] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
Duplication-Transfer-Loss (DTL) reconciliation is a widely used computational technique for understanding gene family evolution and inferring horizontal gene transfer (transfer for short) in microbes. However, most existing models and implementations of DTL reconciliation cannot account for the effect of unsampled or extinct species lineages on the evolution of gene families, likely affecting their accuracy. Accounting for the presence and possible impact of any unsampled species lineages, including those that are extinct, is especially important for inferring and studying horizontal transfer since many genes in the species lineages represented in the reconciliation analysis are likely to have been acquired through horizontal transfer from unsampled lineages. While models of DTL reconciliation that account for transfer from unsampled lineages have already been proposed, they use a relatively simple framework for transfer from unsampled lineages and cannot explicitly infer the location on the species tree of each unsampled or extinct lineage associated with an identified transfer event. Furthermore, there do not yet exist any systematic studies to assess the impact of accounting for unsampled lineages on the accuracy of DTL reconciliation.
In this work, we address these deficiencies by (i) introducing an extended DTL reconciliation model, called the DTLx reconciliation model, that accounts for unsampled and extinct species lineages in a new, more functional manner compared to existing models, (ii) showing that optimal reconciliations under the new DTLx reconciliation model can be computed just as efficiently as under the fastest DTL reconciliation model, (iii) providing an efficient algorithm for sampling optimal DTLx reconciliations uniformly at random, (iv) performing the first systematic simulation study to assess the impact of accounting for unsampled lineages on the accuracy of DTL reconciliation, and (v) comparing the accuracies of inferring transfers from unsampled lineages under our new model and the only other previously proposed parsimony-based model for this problem.
|
21
|
Liu J, Mawhorter R, Liu N, Santichaivekin S, Bush E, Libeskind-Hadas R. Maximum parsimony reconciliation in the DTLOR model. BMC Bioinformatics 2021; 22:394. [PMID: 34348661 PMCID: PMC8340394 DOI: 10.1186/s12859-021-04290-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/13/2021] [Accepted: 07/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Analyses of microbial evolution often use reconciliation methods. However, the standard duplication-transfer-loss (DTL) model does not account for the fact that species trees are often not fully sampled and thus, from the perspective of reconciliation, a gene family may enter the species tree from the outside. Moreover, within the genome, genes are often rearranged, causing them to move to new syntenic regions. RESULTS: We extend the DTL model to account for two events that commonly arise in the evolution of microbes: origin of a gene from outside the sampled species tree and rearrangement of gene syntenic regions. We describe an efficient algorithm for maximum parsimony reconciliation in this new DTLOR model and then show how it can be extended to account for non-binary gene trees to handle uncertainty in gene tree topologies. Finally, we describe preliminary experimental results from the integration of our algorithm into the existing xenoGI tool for reconstructing the histories of genomic islands in closely related bacteria. CONCLUSIONS: Reconciliation in the DTLOR model can offer new insights into the evolution of microbes that are not currently possible under the DTL model.
Affiliation(s)
- Jingyi Liu
  - Department of Computer Science, Harvey Mudd College, Claremont, CA, USA
- Ross Mawhorter
  - Department of Computer Science and Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Nuo Liu
  - Department of Computer Science, Harvey Mudd College, Claremont, CA, USA
  - Department of Biology, Harvey Mudd College, Claremont, CA, USA
- Eliot Bush
  - Department of Biology, Harvey Mudd College, Claremont, CA, USA
|
22
|
Dismukes W, Heath TA. treeducken: An R package for simulating cophylogenetic systems. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Wade Dismukes
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
- Tracy A. Heath
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
|
23
|
Coleman GA, Davín AA, Mahendrarajah TA, Szánthó LL, Spang A, Hugenholtz P, Szöllősi GJ, Williams TA. A rooted phylogeny resolves early bacterial evolution. Science 2021; 372:eabe0511. [PMID: 33958449 DOI: 10.1126/science.abe0511] [Citation(s) in RCA: 122] [Impact Index Per Article: 30.5] [Received: 07/28/2020] [Revised: 11/05/2020] [Accepted: 04/01/2021] [Indexed: 12/17/2022]
Abstract
A rooted bacterial tree is necessary to understand early evolution, but the position of the root is contested. Here, we model the evolution of 11,272 gene families to identify the root, extent of horizontal gene transfer (HGT), and the nature of the last bacterial common ancestor (LBCA). Our analyses root the tree between the major clades Terrabacteria and Gracilicutes and suggest that LBCA was a free-living flagellated, rod-shaped double-membraned organism. Contrary to recent proposals, our analyses reject a basal placement of the Candidate Phyla Radiation, which instead branches sister to Chloroflexota within Terrabacteria. While most gene families (92%) have evidence of HGT, overall, two-thirds of gene transmissions have been vertical, suggesting that a rooted tree provides a meaningful frame of reference for interpreting bacterial evolution.
Affiliation(s)
- Gareth A Coleman
  - School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, UK
- Adrián A Davín
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland 4072, Australia
- Tara A Mahendrarajah
  - Department of Marine Microbiology and Biogeochemistry, NIOZ, Royal Netherlands Institute for Sea Research, 1790 AB Den Burg, Netherlands
- Lénárd L Szánthó
  - Department of Biological Physics, Eötvös Loránd University, 1117 Budapest, Hungary
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, 1117 Budapest, Hungary
- Anja Spang
  - Department of Marine Microbiology and Biogeochemistry, NIOZ, Royal Netherlands Institute for Sea Research, 1790 AB Den Burg, Netherlands
  - Department of Cell- and Molecular Biology, Uppsala University, SE-75123 Uppsala, Sweden
- Philip Hugenholtz
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland 4072, Australia
- Gergely J Szöllősi
  - Department of Biological Physics, Eötvös Loránd University, 1117 Budapest, Hungary
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, 1117 Budapest, Hungary
  - Institute of Evolution, Centre for Ecological Research, 1121 Budapest, Hungary
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, UK
|
24
|
Blanquart S, Groussin M, Le Roy A, Szöllosi GJ, Girard E, Franzetti B, Gouy M, Madern D. Resurrection of Ancestral Malate Dehydrogenases Reveals the Evolutionary History of Halobacterial Proteins: Deciphering Gene Trajectories and Changes in Biochemical Properties. Mol Biol Evol 2021; 38:3754-3774. [PMID: 33974066 PMCID: PMC8382911 DOI: 10.1093/molbev/msab146] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022] Open
Abstract
Extreme halophilic Archaea thrive in high salt, where, through proteomic adaptation, they cope with the strong osmolarity and extreme ionic conditions of their environment. In spite of wide fundamental interest, however, studies providing insights into this adaptation are scarce, because of practical difficulties inherent to the purification and characterization of halophilic enzymes. In this work, we describe the evolutionary history of malate dehydrogenases (MalDH) within Halobacteria (a class of the Euryarchaeota phylum). We resurrected nine ancestors along the inferred halobacterial MalDH phylogeny, including the Last Common Ancestral MalDH of Halobacteria (LCAHa), and compared their biochemical properties with those of five modern halobacterial MalDHs. We monitored the stability of these various MalDHs, their oligomeric states and enzymatic properties, as a function of concentration for different salts in the solvent. We found that a variety of evolutionary processes such as amino acid replacement, gene duplication, loss of MalDH gene and replacement owing to horizontal transfer resulted in significant differences in solubility, stability and catalytic properties between these enzymes in the three Halobacteriales, Haloferacales and Natrialbales orders since the LCAHa MalDH. We also showed how a stability trade-off might favor the emergence of new properties during adaptation to diverse environmental conditions. Altogether, our results suggest a new view of halophilic protein adaptation in Archaea.
Affiliation(s)
- Mathieu Groussin
  - Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, Villeurbanne, F-69622, France
  - Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, USA
- Aline Le Roy
  - Univ Grenoble Alpes, CNRS, CEA, IBS, Grenoble, F-38000, France
- Gergely J Szöllosi
  - Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, Villeurbanne, F-69622, France
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, Budapest, H-1117, Hungary
- Eric Girard
  - Univ Grenoble Alpes, CNRS, CEA, IBS, Grenoble, F-38000, France
- Bruno Franzetti
  - Univ Grenoble Alpes, CNRS, CEA, IBS, Grenoble, F-38000, France
- Manolo Gouy
  - Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, Villeurbanne, F-69622, France
|
25
|
Morel B, Kozlov AM, Stamatakis A, Szöllősi GJ. GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss. Mol Biol Evol 2021; 37:2763-2774. [PMID: 32502238 PMCID: PMC8312565 DOI: 10.1093/molbev/msaa141] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Indexed: 12/15/2022] Open
Abstract
Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
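The relative Robinson–Foulds distance mentioned above compares two trees through their non-trivial bipartitions (splits). The following is a minimal sketch, not GeneRax code: the nested-tuple tree encoding and the function names `bipartitions` and `relative_rf` are assumptions for illustration, both trees are assumed to share the same leaf set, and the distance is normalized by the total number of splits in the two trees.

```python
def bipartitions(tree):
    """Return the non-trivial bipartitions of a nested-tuple tree.

    Each bipartition is stored as the side that does not contain the
    alphabetically smallest leaf, so the result ignores root placement.
    """
    leaves = set()
    below = []

    def walk(node):
        if isinstance(node, str):
            leaves.add(node)
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        below.append(clade)
        return clade

    walk(tree)
    all_leaves = frozenset(leaves)
    anchor = min(all_leaves)
    splits = set()
    for clade in below:
        side = clade if anchor not in clade else all_leaves - clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def relative_rf(tree_a, tree_b):
    """Symmetric-difference split count divided by the total split count."""
    splits_a, splits_b = bipartitions(tree_a), bipartitions(tree_b)
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    return len(splits_a ^ splits_b) / total


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(relative_rf(t1, t1))  # 0.0
print(relative_rf(t1, t2))  # 0.5
```

A value of 0 means the two topologies agree on every split, and 1 means they share none.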
Affiliation(s)
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexey M Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Gergely J Szöllősi
  - ELTE-MTA "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös University, Budapest, Hungary
  - Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany, Hungary
|
26
|
Phylogenomics reveals the basis of adaptation of Pseudorhizobium species to extreme environments and supports a taxonomic revision of the genus. Syst Appl Microbiol 2020; 44:126165. [PMID: 33360413 DOI: 10.1016/j.syapm.2020.126165] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/30/2019] [Revised: 11/10/2020] [Accepted: 11/11/2020] [Indexed: 11/21/2022]
Abstract
The family Rhizobiaceae includes many genera of soil bacteria, often isolated for their association with plants. Herein, we investigate the genomic diversity of a group of Rhizobium species and unclassified strains isolated from atypical environments, including seawater, rock matrix or polluted soil. Based on whole-genome similarity and core genome phylogeny, we show that this group corresponds to the genus Pseudorhizobium. We thus reclassify Rhizobium halotolerans, R. marinum, R. flavum and R. endolithicum as P. halotolerans sp. nov., P. marinum comb. nov., P. flavum comb. nov. and P. endolithicum comb. nov., respectively, and show that P. pelagicum is a synonym of P. marinum. We also delineate a new chemolithoautotroph species, P. banfieldiae sp. nov., whose type strain is NT-26T (=DSM 106348T=CFBP 8663T). This genome-based classification was supported by a chemotaxonomic comparison, with increasing taxonomic resolution provided by fatty acid, protein and metabolic profiles. In addition, we used a phylogenetic approach to infer scenarios of duplication, horizontal transfer and loss for all genes in the Pseudorhizobium pangenome. We thus identify the key functions associated with the diversification of each species and higher clades, shedding light on the mechanisms of adaptation to their respective ecological niches. Respiratory proteins acquired at the origin of Pseudorhizobium were combined with clade-specific genes to enable different strategies for detoxification and nutrition in harsh, nutrient-poor environments.
|
27
|
Li Q, Scornavacca C, Galtier N, Chan YB. The Multilocus Multispecies Coalescent: A Flexible New Model of Gene Family Evolution. Syst Biol 2020; 70:822-837. [PMID: 33169795 DOI: 10.1093/sysbio/syaa084] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/12/2020] [Revised: 05/07/2020] [Accepted: 10/19/2020] [Indexed: 02/06/2023] Open
Abstract
Incomplete lineage sorting (ILS), the interaction between coalescence and speciation, can generate incongruence between gene trees and species trees, as can gene duplication (D), transfer (T), and loss (L). These processes are usually modeled independently, but in reality, ILS can affect gene copy number polymorphism, that is, interfere with DTL. This has been previously recognized, but not treated in a satisfactory way, mainly because DTL events are naturally modeled forward-in-time, while ILS is naturally modeled backward-in-time with the coalescent. Here, we consider the joint action of ILS and DTL on the gene tree/species tree problem in all its complexity. In particular, we show that the interaction between ILS and duplications/transfers (without losses) can result in patterns usually interpreted as resulting from gene loss, and that the realized rate of D, T, and L becomes nonhomogeneous in time when ILS is taken into account. We introduce algorithmic solutions to these problems. Our new model, the multilocus multispecies coalescent, which also accounts for any level of linkage between loci, generalizes the multispecies coalescent (MSC) model and offers a versatile, powerful framework for proper simulation, and inference of gene family evolution. [Gene duplication; gene loss; horizontal gene transfer; incomplete lineage sorting; multispecies coalescent; hemiplasy; recombination.].
Affiliation(s)
- Qiuyi Li
  - School of Mathematics and Statistics / Melbourne Integrative Genomics, The University of Melbourne, Melbourne 3010, Australia
- Celine Scornavacca
  - Institut des Sciences de l'Evolution, Université Montpellier, CNRS, IRD, EPHE, Montpellier, 34095, France
- Nicolas Galtier
  - Institut des Sciences de l'Evolution, Université Montpellier, CNRS, IRD, EPHE, Montpellier, 34095, France
- Yao-Ban Chan
  - School of Mathematics and Statistics / Melbourne Integrative Genomics, The University of Melbourne, Melbourne 3010, Australia
28
Davín AA, Tricou T, Tannier E, de Vienne DM, Szöllősi GJ. Zombi: a phylogenetic simulator of trees, genomes and sequences that accounts for dead linages. Bioinformatics 2020; 36:1286-1288. [PMID: 31566657 PMCID: PMC7031779 DOI: 10.1093/bioinformatics/btz710] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/11/2019] [Revised: 09/09/2019] [Accepted: 09/26/2019] [Indexed: 11/14/2022]
Abstract
Summary: Here we present Zombi, a tool to simulate the evolution of species, genomes and sequences in silico that considers, for the first time, the evolution of genomes in extinct lineages. It also incorporates various features that have not to date been combined in a single simulator, such as the possibility of generating species trees with a pre-defined variation of speciation and extinction rates through time, explicitly simulating intergenic sequences of variable length, and outputting gene tree/species tree reconciliations. Availability and implementation: Source code and manual are freely available at https://github.com/AADavin/ZOMBI/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Adrián A Davín
  - MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Théo Tricou
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne F-69622, France
- Eric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne F-69622, France
  - INRIA Grenoble Rhône-Alpes, Montbonnot-Saint-Martin F-38334, France
- Damien M de Vienne
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne F-69622, France
- Gergely J Szöllősi
  - MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
  - Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany H-8237, Hungary
29
Sevillya G, Doerr D, Lerner Y, Stoye J, Steel M, Snir S. Horizontal Gene Transfer Phylogenetics: A Random Walk Approach. Mol Biol Evol 2020; 37:1470-1479. [PMID: 31845962 DOI: 10.1093/molbev/msz302] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
The dramatic decrease in time and cost for generating genetic sequence data has opened up vast opportunities in molecular systematics, one of which is the ability to decipher the evolutionary history of strains of a species. Under this fine systematic resolution, the standard markers are too crude to provide a phylogenetic signal. Nevertheless, among prokaryotes, genome dynamics in the form of horizontal gene transfer (HGT) between organisms and gene loss seem to provide far richer information by affecting both gene order and gene content. The "synteny index" (SI) between a pair of genomes combines these latter two factors, allowing comparison of genomes with unequal gene content, together with order considerations of their common genes. Although this approach is useful for classifying close relatives, no rigorous statistical modeling for it has been suggested. Such modeling is valuable, as it allows observed measures to be transformed into estimates of time periods during evolution, yielding the "additivity" of the measure. To the best of our knowledge, there is no other additivity proof for other gene order/content measures under HGT. Here, we provide a first statistical model and analysis for the SI measure. We model the "gene neighborhood" as a "birth-death-immigration" process affected by the HGT activity over the genome, and analytically relate the HGT rate and time to the expected SI. This model is asymptotic and thus provides accurate results, assuming infinite size genomes. Therefore, we also developed a heuristic model following an "exponential decay" function, accounting for biologically realistic values, which performed well in simulations. Applying this model to 1,133 prokaryotes partitioned to 39 clusters by the rank of genus yields that the average number of genome dynamics events per gene in the phylogenetic depth of genus is around half with significant variability between genera. 
This result extends and confirms similar results obtained for individual genera in different manners.
Affiliation(s)
- Gur Sevillya
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Daniel Doerr
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Yael Lerner
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Jens Stoye
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Mike Steel
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Sagi Snir
  - Department of Evolutionary Biology, University of Haifa, Haifa, Israel
30
The complex phylogenetic relationships of a 4mC/6mA DNA methyltransferase in prokaryotes. Mol Phylogenet Evol 2020; 149:106837. [PMID: 32304827 DOI: 10.1016/j.ympev.2020.106837] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/11/2019] [Revised: 01/30/2020] [Accepted: 04/09/2020] [Indexed: 01/04/2023]
Abstract
DNA methyltransferases are proteins that modify DNA via attachment of methyl groups to nucleobases and are ubiquitous across the bacterial, archaeal, and eukaryotic domains of life. Here, we investigated the complex evolutionary history of the large and consequential 4mC/6mA DNA methyltransferase protein family using phylogenetic reconstruction of amino acid sequences. We present a well-supported phylogeny of this family based on systematic sampling of taxa across superphyla of bacteria and archaea. We compared the phylogeny to a current representation of the species tree of life and found that the 4mC/6mA methyltransferase family has a strikingly complex evolutionary history that likely began sometime after the last universal common ancestor of life diverged into the bacterial and archaeal lineages and probably involved many horizontal gene transfers within and between domains. Despite the complexity of its evolutionary history, we inferred that only one significant shift in molecular evolutionary rate characterizes the diversification of this protein family.
31
Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models. J Math Biol 2020; 80:1353-1388. [PMID: 32060618 PMCID: PMC7052048 DOI: 10.1007/s00285-019-01465-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/16/2019] [Revised: 11/18/2019] [Indexed: 10/28/2022]
Abstract
Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation (S), gene duplication (D), gene loss (L), and horizontal gene transfer (T). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model that considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the DLT-model). In this work, we introduce the notion of evolutionary histories defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the DLT-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the DLT-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
32
TreeSolve: Rapid Error-Correction of Microbial Gene Trees. Algorithms for Computational Biology 2020. [PMCID: PMC7197061 DOI: 10.1007/978-3-030-42266-0_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
Gene tree reconstruction is an important problem in phylogenetics. However, gene sequences often lack sufficient information to confidently distinguish between competing gene tree topologies. To overcome this limitation, the best gene tree reconstruction methods use a known species tree topology to guide the reconstruction of the gene tree. While such species-tree-aware gene tree reconstruction methods have been repeatedly shown to result in vastly more accurate gene trees, the most accurate of these methods often have prohibitively high computational costs. In this work, we introduce a highly computationally efficient and robust species-tree-aware method, named TreeSolve, for microbial gene tree reconstruction. TreeSolve works by collapsing weakly supported edges of the input gene tree, resulting in a non-binary gene tree, and then using new algorithms and techniques to optimally resolve the non-binary gene trees with respect to the given species tree in an appropriately and dynamically constrained search space. Using thousands of real and simulated gene trees, we demonstrate that TreeSolve significantly outperforms the best existing species-tree-aware methods for microbes in terms of accuracy, speed, or both. Crucially, TreeSolve also implicitly keeps track of multiple optimal gene tree reconstructions and can compute either a single best estimate of the gene tree or multiple distinct estimates. As we demonstrate, aggregating over multiple gene tree candidates helps distinguish between correct and incorrect parts of an error-corrected gene tree. Thus, TreeSolve not only enables rapid gene tree error-correction for large gene trees without compromising on accuracy, but also enables accounting of inference uncertainty.
33
Duchemin W, Gence G, Arigon Chifolleau AM, Arvestad L, Bansal MS, Berry V, Boussau B, Chevenet F, Comte N, Davín AA, Dessimoz C, Dylus D, Hasic D, Mallo D, Planel R, Posada D, Scornavacca C, Szöllosi G, Zhang L, Tannier É, Daubin V. RecPhyloXML: a format for reconciled gene trees. Bioinformatics 2019; 34:3646-3652. [PMID: 29762653 PMCID: PMC6198865 DOI: 10.1093/bioinformatics/bty389] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/20/2017] [Accepted: 05/09/2018] [Indexed: 12/21/2022]
Abstract
Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, and loss), along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but often in different formats that vary in the types of events considered and in whether the species tree is dated. This complicates comparison and communication between programs. Results: Here, we gather a consortium of software developers in gene tree/species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.
Affiliation(s)
- Wandrille Duchemin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
  - INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
  - MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Guillaume Gence
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
- Anne-Muriel Arigon Chifolleau
  - LIRMM, Université de Montpellier, CNRS, Montpellier, France
  - Institut de Biologie Computationnelle (IBC), Montpellier, France
- Lars Arvestad
  - Department of Mathematics, Stockholm University, Stockholm, Sweden
  - Swedish e-Science Research Centre (SeRC), Stockholm, Sweden
- Mukul S Bansal
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Vincent Berry
  - LIRMM, Université de Montpellier, CNRS, Montpellier, France
  - Institut de Biologie Computationnelle (IBC), Montpellier, France
  - ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
- Bastien Boussau
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
- François Chevenet
  - LIRMM, Université de Montpellier, CNRS, Montpellier, France
  - MIVEGEC, CNRS 5290, IRD 224, Université de Montpellier, Montpellier, France
- Nicolas Comte
  - INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
- Adrián A Davín
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
  - MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Genetics, Evolution and Environment, University College London, London, UK
  - Department of Computer Science, University College London, London, UK
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- David Dylus
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Damir Hasic
  - Department of Mathematics, Faculty of Science, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Diego Mallo
  - Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Rémi Planel
  - Laboratoire d'Analyse Bio-informatique en Génomique et Métabolisme CNRS-UMR 8030, Commissariat à l'Énergie Atomique (CEA), Institut de Génomique, Genoscope, Evry, France
- David Posada
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Celine Scornavacca
  - Institut de Biologie Computationnelle (IBC), Montpellier, France
  - ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
- Gergely Szöllosi
  - MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
- Louxin Zhang
  - Department of Mathematics, National University of Singapore, Singapore, Singapore
- Éric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
  - INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
- Vincent Daubin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
34
Perez‐Lamarque B, Morlon H. Characterizing symbiont inheritance during host–microbiota evolution: Application to the great apes gut microbiota. Mol Ecol Resour 2019; 19:1659-1671. [DOI: 10.1111/1755-0998.13063] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/25/2019] [Revised: 06/26/2019] [Accepted: 06/28/2019] [Indexed: 01/19/2023]
Affiliation(s)
- Benoît Perez‐Lamarque
  - Institut de Biologie de l'ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, PSL University, Paris, France
  - Muséum national d'Histoire naturelle, UMR 7205 CNRS‐MNHN‐UPMC‐EPHE "Institut de Systématique, Evolution, Biodiversité – ISYEB", Herbier National, 16 rue Buffon, Paris, France
- Hélène Morlon
  - Institut de Biologie de l'ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, PSL University, Paris, France
35
Satler JD, Herre EA, Jandér KC, Eaton DAR, Machado CA, Heath TA, Nason JD. Inferring processes of coevolutionary diversification in a community of Panamanian strangler figs and associated pollinating wasps. Evolution 2019; 73:2295-2311. [PMID: 31339553 DOI: 10.1111/evo.13809] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/19/2018] [Accepted: 07/18/2019] [Indexed: 12/27/2022]
Abstract
The fig and pollinator wasp obligate mutualism is diverse (∼750 described species), ecologically important, and ancient (∼80 Ma). Once thought to be an example of strict one-to-one cospeciation, current thinking suggests genera of pollinator wasps codiversify with corresponding sections of figs, but the degree to which cospeciation or other processes contribute to the association at finer scales is unclear. Here, we use genome-wide sequence data from a community of Panamanian strangler figs and associated wasp pollinators to estimate the relative contributions of four evolutionary processes generating cophylogenetic patterns in this mutualism: cospeciation, host switching, pollinator speciation, and pollinator extinction. Using a model-based approach adapted from the study of gene family evolution, our results demonstrate the importance of host switching of pollinator wasps at this fine phylogenetic and regional scale. Although we estimate a modest amount of cospeciation, simulations reveal the number of putative cospeciation events to be consistent with what would be expected by chance. Additionally, model selection tests identify host switching as a critical parameter for explaining cophylogenetic patterns in this system. Our study demonstrates a promising approach through which the history of evolutionary association between interacting lineages can be rigorously modeled and tested in a probabilistic phylogenetic framework.
Affiliation(s)
- Jordan D Satler
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011
- Edward Allen Herre
  - Smithsonian Tropical Research Institute, Unit 9100, P.O. Box 0498, Diplomatic Post Office, Armed Forces America 34002-9998
- K Charlotte Jandér
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 752 36, Uppsala, Sweden
- Deren A R Eaton
  - Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, 10027
- Carlos A Machado
  - Department of Biology, University of Maryland, College Park, Maryland, 20742
- Tracy A Heath
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011
- John D Nason
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011
36
Zwaenepoel A, Van de Peer Y. Inference of Ancient Whole-Genome Duplications and the Evolution of Gene Duplication and Loss Rates. Mol Biol Evol 2019; 36:1384-1404. [PMID: 31004147 DOI: 10.1093/molbev/msz088] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Indexed: 12/12/2022]
Abstract
Gene tree-species tree reconciliation methods have been employed for studying ancient whole-genome duplication (WGD) events across the eukaryotic tree of life. Most approaches have relied on using maximum likelihood trees and the maximum parsimony reconciliation thereof to count duplication events on specific branches of interest in a reference species tree. Such approaches do not account for uncertainty in the gene tree and reconciliation, or do so only heuristically. The effects of these simplifications on the inference of ancient WGDs are unclear. In particular, the effects of variation in gene duplication and loss rates across the species tree have not been considered. Here, we developed a full probabilistic approach for phylogenomic reconciliation-based WGD inference, accounting for both gene tree and reconciliation uncertainty using a method based on the principle of amalgamated likelihood estimation. The model and methods are implemented in a maximum likelihood and Bayesian setting and account for variation of duplication and loss rates across the species tree, using methods inspired by phylogenetic divergence time estimation. We applied our newly developed framework to ancient WGDs in land plants and investigated the effects of duplication and loss rate variation on reconciliation and gene count based assessment of these earlier proposed WGDs.
Affiliation(s)
- Arthur Zwaenepoel
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, VIB, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent, Belgium
- Yves Van de Peer
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, VIB, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent, Belgium
  - Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
37
Gene tree species tree reconciliation with gene conversion. J Math Biol 2019; 78:1981-2014. [PMID: 30767052 DOI: 10.1007/s00285-019-01331-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/04/2017] [Revised: 10/03/2018] [Indexed: 01/19/2023]
Abstract
Gene tree/species tree reconciliation is a recent and decisive advance in phylogenetic methods, accounting for the possible differences between gene histories and species histories. Reconciliation consists of explaining these differences by gene-scale events such as duplication, loss, and transfer, which translates mathematically into a mapping between gene tree nodes and species tree nodes or branches. Gene conversion is a frequent and important evolutionary event, which results in the replacement of a gene by a copy of another from the same species and in the same gene tree. Including this event in reconciliation models has never been attempted because it introduces a dependency between lineages, and standard algorithms based on dynamic programming become ineffective. We propose here a novel mathematical framework including gene conversion as an evolutionary event in gene tree/species tree reconciliation. We describe a randomized algorithm that finds, in polynomial running time, a reconciliation minimizing the number of duplications, losses and conversions in the case when their weights are equal. We show that the space of optimal reconciliations includes an analog of the last common ancestor reconciliation, but is not limited to it. Our algorithm outputs any optimal reconciliation with nonzero probability. We argue that this study opens a research avenue on including gene conversion in reconciliation, and discuss its possible importance in biology.
38
Zhang C, Ogilvie HA, Drummond AJ, Stadler T. Bayesian Inference of Species Networks from Multilocus Sequence Data. Mol Biol Evol 2019; 35:504-517. [PMID: 29220490 PMCID: PMC5850812 DOI: 10.1093/molbev/msx307] [Citation(s) in RCA: 103] [Impact Index Per Article: 17.2] [Indexed: 01/12/2023]
Abstract
Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large data sets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.
Affiliation(s)
- Chi Zhang
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
  - Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- Huw A Ogilvie
  - Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
- Alexei J Drummond
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
  - Department of Computer Science, University of Auckland, Auckland, New Zealand
- Tanja Stadler
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
39
Mykowiecka A, Szczesny P, Gorecki P. Inferring Gene-Species Assignments in the Presence of Horizontal Gene Transfer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1571-1578. [PMID: 28541905 DOI: 10.1109/tcbb.2017.2707083] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Background: Microbial communities from environmental samples show great diversity, as bacteria respond quickly to changes in their ecosystems. To assess the actual changes, metagenomics experiments aimed at sequencing genomic DNA from such samples are performed. The newly obtained sequences, together with already known ones, are used to infer phylogenetic trees for assessing the taxonomic groups to which the species carrying these genes belong. Here, we propose the first approach to the gene-species assignment problem that uses reconciliation with horizontal gene transfer. Results: We propose efficient algorithms that search for optimal gene-species mappings, taking into account gene duplication, loss and transfer events under two tractable models of HGT reconciliation. Conclusions: We calculate both the optimal cost and all possible optimal scenarios. Furthermore, as the number of optimal reconstructions can be large, we use a Monte Carlo method to infer approximate distributions of gene-species assignments. We demonstrate the applicability of the approach on empirical and simulated datasets.
40
Abstract
Biodiversity has always been predominantly microbial, and the scarcity of fossils from bacteria, archaea and microbial eukaryotes has prevented a comprehensive dating of the tree of life. Here, we show that patterns of lateral gene transfer deduced from an analysis of modern genomes encode a novel and abundant source of information about the temporal coexistence of lineages throughout the history of life. We use state-of-the-art species tree-aware phylogenetic methods to reconstruct the history of thousands of gene families and demonstrate that dates implied by gene transfers are consistent with estimates from relaxed molecular clocks in Bacteria, Archaea and Eukarya. We present the order of speciations according to lateral gene transfer data calibrated to geological time for three datasets comprising 40 genomes for Cyanobacteria, 60 genomes for Archaea and 60 genomes for Fungi. An inspection of discrepancies between transfers and clocks and a comparison with mammalian fossils show that gene transfer in microbes is potentially as informative for dating the tree of life as the geological record in macroorganisms.
41
Horizontal gene transfer constrains the timing of methanogen evolution. Nat Ecol Evol 2018; 2:897-903. [DOI: 10.1038/s41559-018-0513-7] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Received: 08/10/2017] [Accepted: 02/20/2018] [Indexed: 11/08/2022]
42
Duchemin W, Anselmetti Y, Patterson M, Ponty Y, Bérard S, Chauve C, Scornavacca C, Daubin V, Tannier E. DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies. Genome Biol Evol 2018; 9:1312-1319. [PMID: 28402423 PMCID: PMC5441342 DOI: 10.1093/gbe/evx069] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Accepted: 04/07/2017] [Indexed: 12/15/2022]
Abstract
DeCoSTAR is a software that aims at reconstructing the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionary-induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann–Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo, and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria. Availability: http://pbil.univ-lyon1.fr/software/DeCoSTAR (Last accessed April 24, 2017).
Affiliation(s)
- Wandrille Duchemin: Inria Grenoble Rhône-Alpes, Montbonnot, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Yoann Anselmetti: Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France; Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Murray Patterson: Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France; Experimental Algorithmics Lab (AlgoLab), Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, Viale Sarca, Milano, Italy
- Yann Ponty: CNRS, Ecole Polytechnique, LIX UMR7161, Palaiseau, France; Inria Saclay, EP AMIB, Palaiseau, France
- Sèverine Bérard: Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France; LIRMM, Université de Montpellier, CNRS, Montpellier, France
- Cedric Chauve: Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- Celine Scornavacca: Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Vincent Daubin: Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Eric Tannier: Inria Grenoble Rhône-Alpes, Montbonnot, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
|
43
|
Cariou M, Duret L, Charlat S. The global impact of Wolbachia on mitochondrial diversity and evolution. J Evol Biol 2017; 30:2204-2210. [DOI: 10.1111/jeb.13186] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 07/12/2017] [Revised: 09/01/2017] [Accepted: 09/25/2017] [Indexed: 02/06/2023]
Affiliation(s)
- M. Cariou: Université de Lyon; Université Lyon 1; CNRS; UMR 5558; Laboratoire de Biométrie et Biologie Evolutive; Villeurbanne, France
- L. Duret: Université de Lyon; Université Lyon 1; CNRS; UMR 5558; Laboratoire de Biométrie et Biologie Evolutive; Villeurbanne, France
- S. Charlat: Université de Lyon; Université Lyon 1; CNRS; UMR 5558; Laboratoire de Biométrie et Biologie Evolutive; Villeurbanne, France
|
44
|
Song J, Zheng S, Nguyen N, Wang Y, Zhou Y, Lin K. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family. BMC Bioinformatics 2017; 18:439. [PMID: 28974198 PMCID: PMC5627428 DOI: 10.1186/s12859-017-1850-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/17/2017] [Accepted: 09/26/2017] [Indexed: 11/28/2022] Open
Abstract
Background: Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. Results: On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. Conclusions: By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.
Affiliation(s)
- Jia Song: MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Sisi Zheng: Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Nhung Nguyen: Center for Translational Cancer Research, Institute of Biosciences and Technology, Department of Medical Physiology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Youjun Wang: Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Yubin Zhou: Center for Translational Cancer Research, Institute of Biosciences and Technology, Department of Medical Physiology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Kui Lin: MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
|
45
|
The growing tree of Archaea: new perspectives on their diversity, evolution and ecology. ISME J 2017; 11:2407-2425. [PMID: 28777382 DOI: 10.1038/ismej.2017.122] [Citation(s) in RCA: 231] [Impact Index Per Article: 28.9] [Received: 12/02/2016] [Revised: 04/07/2017] [Accepted: 06/07/2017] [Indexed: 01/19/2023]
Abstract
The Archaea occupy a key position in the Tree of Life, and are a major fraction of microbial diversity. Abundant in soils, ocean sediments and the water column, they have crucial roles in processes mediating global carbon and nutrient fluxes. Moreover, they represent an important component of the human microbiome, where their role in health and disease is still unclear. The development of culture-independent sequencing techniques has provided unprecedented access to genomic data from a large number of so far inaccessible archaeal lineages. This is revolutionizing our view of the diversity and metabolic potential of the Archaea in a wide variety of environments, an important step toward understanding their ecological role. The archaeal tree is being rapidly filled up with new branches constituting phyla, classes and orders, generating novel challenges for high-rank systematics, and providing key information for dissecting the origin of this domain, the evolutionary trajectories that have shaped its current diversity, and its relationships with Bacteria and Eukarya. The present picture is that of a huge diversity of the Archaea, which we are only starting to explore.
|
46
|
|
47
|
Groussin M, Mazel F, Sanders JG, Smillie CS, Lavergne S, Thuiller W, Alm EJ. Unraveling the processes shaping mammalian gut microbiomes over evolutionary time. Nat Commun 2017; 8:14319. [PMID: 28230052 PMCID: PMC5331214 DOI: 10.1038/ncomms14319] [Citation(s) in RCA: 275] [Impact Index Per Article: 34.4] [Received: 07/26/2016] [Accepted: 12/08/2016] [Indexed: 12/28/2022] Open
Abstract
Whether mammal–microbiome interactions are persistent and specific over evolutionary time is controversial. Here we show that host phylogeny and major dietary shifts have affected the distribution of different gut bacterial lineages and did so on vastly different bacterial phylogenetic resolutions. Diet mostly influences the acquisition of ancient and large microbial lineages. Conversely, correlation with host phylogeny is mostly seen among more recently diverged bacterial lineages, consistent with processes operating at similar timescales to host evolution. Considering microbiomes at appropriate phylogenetic scales allows us to model their evolution along the mammalian tree and to infer ancient diets from the predicted microbiomes of mammalian ancestors. Phylogenetic analyses support co-speciation as having a significant role in the evolution of mammalian gut microbiome compositions. Highly co-speciating bacterial genera are also associated with immune diseases in humans, laying a path for future studies that probe these co-speciating bacteria for signs of co-evolution. Both host diet and phylogeny have been argued to shape mammalian microbiome communities. Here, the authors show that diet predicts the presence of ancient bacterial lineages in the microbiome, but that co-speciation between more recent bacterial lineages and their hosts may drive associations between microbiome composition and phylogeny.
Affiliation(s)
- Mathieu Groussin: Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Florent Mazel: Laboratoire d'Ecologie Alpine, CNRS, University of Grenoble Alpes, FR-38041, Grenoble Cedex 9, France
- Jon G Sanders: Organismic and Evolutionary Biology, Harvard University, 26 Oxford St, Cambridge, Massachusetts 02138, USA
- Chris S Smillie: Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
- Sébastien Lavergne: Laboratoire d'Ecologie Alpine, CNRS, University of Grenoble Alpes, FR-38041, Grenoble Cedex 9, France
- Wilfried Thuiller: Laboratoire d'Ecologie Alpine, CNRS, University of Grenoble Alpes, FR-38041, Grenoble Cedex 9, France
- Eric J Alm: Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
|
48
|
Bailly-Bechet M, Martins-Simões P, Szöllősi GJ, Mialdea G, Sagot MF, Charlat S. How Long Does Wolbachia Remain on Board? Mol Biol Evol 2017; 34:1183-1193. [DOI: 10.1093/molbev/msx073] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Indexed: 11/14/2022] Open
|
49
|
Merkl R, Sterner R. Ancestral protein reconstruction: techniques and applications. Biol Chem 2016; 397:1-21. [PMID: 26351909 DOI: 10.1515/hsz-2015-0158] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Received: 04/10/2015] [Accepted: 07/30/2015] [Indexed: 11/15/2022]
Abstract
Ancestral sequence reconstruction (ASR) is the calculation of ancient protein sequences on the basis of extant ones. It is most powerful in combination with the experimental characterization of the corresponding proteins. Such analyses allow for the study of problems that are otherwise intractable. For example, ASR has been used to characterize ancestral enzymes dating back to the Paleoarchean era and to deduce properties of the corresponding habitats. In addition, the historical approach underlying ASR enables the identification of amino acid residues key to protein function, which is often not possible by only comparing extant proteins. Along these lines, residues responsible for the spectroscopic properties of protein pigments were identified as well as residues determining the binding specificity of steroid receptors. Further applications are studies related to the longevity of mutations, the contribution of gene duplications to enzyme functionalization, and the evolution of protein complexes. For these applications of ASR, we discuss recent examples; moreover, we introduce the basic principles of the underlying algorithms and present state-of-the-art protocols.
|
50
|
Szöllősi GJ, Davín AA, Tannier E, Daubin V, Boussau B. Genome-scale phylogenetic analysis finds extensive gene transfer among fungi. Philos Trans R Soc Lond B Biol Sci 2016; 370:20140335. [PMID: 26323765 PMCID: PMC4571573 DOI: 10.1098/rstb.2014.0335] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 11/12/2022] Open
Abstract
Although the role of lateral gene transfer is well recognized in the evolution of bacteria, it is generally assumed that it has had less influence among eukaryotes. To explore this hypothesis, we compare the dynamics of genome evolution in two groups of organisms: cyanobacteria and fungi. Ancestral genomes are inferred in both clades using two types of methods: first, Count, a gene tree unaware method that models gene duplications, gains and losses to explain the observed numbers of genes present in a genome; second, ALE, a more recent gene tree-aware method that reconciles gene trees with a species tree using a model of gene duplication, loss and transfer. We compare their merits and their ability to quantify the role of transfers, and assess the impact of taxonomic sampling on their inferences. We present what we believe is compelling evidence that gene transfer plays a significant role in the evolution of fungi.
Affiliation(s)
- Gergely J Szöllősi: ELTE-MTA 'Lendület' Biophysics Research Group, Pázmány P. stny. 1A, 1117 Budapest, Hungary
- Adrián Arellano Davín: Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France
- Eric Tannier: Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France; Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, 38334 Montbonnot, France
- Vincent Daubin: Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France; Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, 69622 Villeurbanne, France
- Bastien Boussau: Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69000 Lyon, France; Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, 69622 Villeurbanne, France
|