1
Parey E, Roest Crollius H, Berthelot C. SCORPiOs, a Novel Method to Reconstruct Gene Phylogenies in the Context of a Known WGD Event. Methods Mol Biol 2023; 2545:155-173. [PMID: 36720812 DOI: 10.1007/978-1-0716-2561-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Phylogenetic gene trees recapitulate the evolutionary history of genes across species, forming an essential framework for comparative genomic studies. In particular, within the context of whole-genome duplications (WGDs), they serve as a basis to investigate patterns of duplicate gene retention and loss, timing of genome rediploidization, and, more generally, to explore the functional consequences of the duplication in descending species. Yet, despite ever more sophisticated models to describe the evolution of gene sequences, building accurate gene trees remains a challenge in ancient polyploid taxa. WGDs generate complex gene families with many duplicated copies and recurrent gene losses, which complicate this task even more. Here, we describe how to use SCORPiOs, a novel method that leverages synteny conservation to provide more accurate phylogenies in the presence of a known WGD event.
Affiliation(s)
- Elise Parey
  - Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
  - INRAE, LPGP, Rennes, France
- Hugues Roest Crollius
  - Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
- Camille Berthelot
  - Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
2
Parey E, Louis A, Montfort J, Guiguen Y, Crollius HR, Berthelot C. An atlas of fish genome evolution reveals delayed rediploidization following the teleost whole-genome duplication. Genome Res 2022; 32:1685-1697. [PMID: 35961774 PMCID: PMC9528989 DOI: 10.1101/gr.276953.122] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 05/20/2022] [Accepted: 08/09/2022] [Indexed: 11/25/2022]
Abstract
Teleost fishes are ancient tetraploids descended from an ancestral whole-genome duplication that may have contributed to the impressive diversification of this clade. Whole-genome duplications can occur via self-doubling (autopolyploidy) or via hybridization between different species (allopolyploidy). The mode of tetraploidization conditions evolutionary processes by which duplicated genomes return to diploid meiotic pairing, and subsequent genetic divergence of duplicated genes (cytological and genetic rediploidization). How teleosts became tetraploid remains unresolved, leaving a fundamental gap in the interpretation of their functional evolution. As a result of the whole-genome duplication, identifying orthologous and paralogous genomic regions across teleosts is challenging, hindering genome-wide investigations into their polyploid history. Here, we combine tailored gene phylogeny methodology together with a state-of-the-art ancestral karyotype reconstruction to establish the first high-resolution comparative atlas of paleopolyploid regions across 74 teleost genomes. We then leverage this atlas to investigate how rediploidization occurred in teleosts at the genome-wide level. We uncover that some duplicated regions maintained tetraploidy for more than 60 million years, with three chromosome pairs diverging genetically only after the separation of major teleost families. This evidence suggests that the teleost ancestor was an autopolyploid. Further, we find evidence for biased gene retention along several duplicated chromosomes, contradicting current paradigms that asymmetrical evolution is specific to allopolyploids. Altogether, our results offer novel insights into genome evolutionary dynamics following ancient polyploidizations in vertebrates.
Affiliation(s)
- Elise Parey
  - Institut de Biologie de l'Ecole normale supérieure (IBENS), Département de Biologie, Ecole normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
  - INRAE, LPGP, 35000, Rennes, France
- Alexandra Louis
  - Institut de Biologie de l'Ecole normale supérieure (IBENS), Département de Biologie, Ecole normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Hugues Roest Crollius
  - Institut de Biologie de l'Ecole normale supérieure (IBENS), Département de Biologie, Ecole normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Camille Berthelot
  - Institut de Biologie de l'Ecole normale supérieure (IBENS), Département de Biologie, Ecole normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
3
Abstract
Syntenies are genomic segments of consecutive genes identified by a certain conservation in gene content and order. The notion of conservation may vary from one definition to another, the more constrained requiring identical gene contents and gene orders, while more relaxed definitions just require a certain similarity in gene content, and not necessarily in the same order. Regardless of the way they are identified, the goal is to characterize homologous genomic regions, i.e., regions deriving from a common ancestral region, reflecting a certain gene co-evolution that can shed light on important functional properties. In addition to identifying them, it is also necessary to infer the evolutionary history that has led from the ancestral segment to the extant ones. In this field, most algorithmic studies address the problem of inferring rearrangement scenarios explaining the disruption in gene order between segments with the same gene content, some of them extending the evolutionary model to gene insertion and deletion. However, syntenies also evolve through other events modifying their gene content, such as duplications, losses or horizontal gene transfers, i.e., the movement of genes from one species to another. Although the reconciliation approach between a gene tree and a species tree addresses the problem of inferring such events for single-gene families, little effort has been dedicated to the generalization to segmental events and to syntenies. This paper reviews some of the main algorithmic methods for inferring ancestral syntenies and focuses on those integrating both gene orders and gene trees.
4
Morel B, Kozlov AM, Stamatakis A, Szöllősi GJ. GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss. Mol Biol Evol 2021; 37:2763-2774. [PMID: 32502238 PMCID: PMC8312565 DOI: 10.1093/molbev/msaa141] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Indexed: 12/15/2022]
Abstract
Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
Affiliation(s)
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexey M Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Gergely J Szöllősi
  - ELTE-MTA "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös University, Budapest, Hungary
  - Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany, Hungary
5
Comte N, Morel B, Hasić D, Guéguen L, Boussau B, Daubin V, Penel S, Scornavacca C, Gouy M, Stamatakis A, Tannier E, Parsons DP. Treerecs: an integrated phylogenetic tool, from sequences to reconciliations. Bioinformatics 2021; 36:4822-4824. [PMID: 33085745 DOI: 10.1093/bioinformatics/btaa615] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/25/2019] [Revised: 06/22/2020] [Accepted: 07/09/2020] [Indexed: 11/15/2022]
Abstract
MOTIVATION Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated in standard phylogenetic software and they either lack performance on certain functions, or usability for biologists. RESULTS We present Treerecs, a phylogenetic software based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments like PLL and Seaview. AVAILABILITY AND IMPLEMENTATION Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.
Affiliation(s)
- Nicolas Comte
  - Inria Grenoble Rhône-Alpes, 38334 Montbonnot, France
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Damir Hasić
  - Department of Mathematics, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Laurent Guéguen
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Bastien Boussau
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Vincent Daubin
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Simon Penel
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Celine Scornavacca
  - ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier 34000, France
- Manolo Gouy
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Eric Tannier
  - Inria Grenoble Rhône-Alpes, 38334 Montbonnot, France
  - Université de Lyon, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR5558, F-69622 Villeurbanne, France
6
Saclier N, Chardon P, Malard F, Konecny-Dupré L, Eme D, Bellec A, Breton V, Duret L, Lefebure T, Douady CJ. Bedrock radioactivity influences the rate and spectrum of mutation. eLife 2020; 9:56830. [PMID: 33252037 PMCID: PMC7723406 DOI: 10.7554/elife.56830] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/11/2020] [Accepted: 11/30/2020] [Indexed: 12/24/2022]
Abstract
All organisms on Earth are exposed to low doses of natural radioactivity but some habitats are more radioactive than others. Yet, documenting the influence of natural radioactivity on the evolution of biodiversity is challenging. Here, we addressed whether organisms living in naturally more radioactive habitats accumulate more mutations across generations using 14 species of waterlice living in subterranean habitats with contrasted levels of radioactivity. We found that the mitochondrial and nuclear mutation rates across a waterlouse species’ genome increased on average by 60% and 30%, respectively, when radioactivity increased by a factor of three. We also found a positive correlation between the level of radioactivity and the probability of G to T (and complementary C to A) mutations, a hallmark of oxidative stress. We conclude that even low doses of natural bedrock radioactivity influence the mutation rate possibly through the accumulation of oxidative damage, in particular in the mitochondrial genome.
Affiliation(s)
- Nathanaëlle Saclier
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5023, ENTPE, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, Villeurbanne, France
- Patrick Chardon
  - LPC, Université Clermont Auvergne, CNRS/IN2P3 UMR6533, Clermont-Ferrand, France
- Florian Malard
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5023, ENTPE, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, Villeurbanne, France
- Lara Konecny-Dupré
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5023, ENTPE, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, Villeurbanne, France
- David Eme
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5023, ENTPE, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, Villeurbanne, France
- Arnaud Bellec
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5023, ENTPE, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, Villeurbanne, France
  - Univ Lyon, Université Jean Moulin Lyon 3, CNRS UMR 5600 Environnement Ville Société, Lyon, France
- Vincent Breton
  - LPC, Université Clermont Auvergne, CNRS/IN2P3 UMR6533, Clermont-Ferrand, France
- Laurent Duret
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
- Tristan Lefebure
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5023, ENTPE, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, Villeurbanne, France
- Christophe J Douady
  - Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5023, ENTPE, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, Villeurbanne, France
  - Institut Universitaire de France, Paris, France
7
Zhang C, Scornavacca C, Molloy EK, Mirarab S. ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy. Mol Biol Evol 2020; 37:3292-3307. [PMID: 32886770 PMCID: PMC7751180 DOI: 10.1093/molbev/msaa139] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Indexed: 12/18/2022]
Abstract
Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programing. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.
Affiliation(s)
- Chao Zhang
  - Bioinformatics and Systems Biology, University of California San Diego, San Diego, CA
- Erin K Molloy
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA
8
Parey E, Louis A, Cabau C, Guiguen Y, Roest Crollius H, Berthelot C. Synteny-Guided Resolution of Gene Trees Clarifies the Functional Impact of Whole-Genome Duplications. Mol Biol Evol 2020; 37:3324-3337. [DOI: 10.1093/molbev/msaa149] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
Whole-genome duplications (WGDs) have major impacts on the evolution of species, as they produce new gene copies contributing substantially to adaptation, isolation, phenotypic robustness, and evolvability. They result in large, complex gene families with recurrent gene losses in descendant species that sequence-based phylogenetic methods fail to reconstruct accurately. As a result, orthologs and paralogs are difficult to identify reliably in WGD-descended species, which hinders the exploration of functional consequences of WGDs. Here, we present Synteny-guided CORrection of Paralogies and Orthologies (SCORPiOs), a novel method to reconstruct gene phylogenies in the context of a known WGD event. WGDs generate large duplicated syntenic regions, which SCORPiOs systematically leverages as a complement to sequence evolution to infer the evolutionary history of genes. We applied SCORPiOs to the 320-My-old WGD at the origin of teleost fish. We find that almost one in four teleost gene phylogenies in the Ensembl database (3,394) are inconsistent with their syntenic contexts. For 70% of these gene families (2,387), we were able to propose an improved phylogenetic tree consistent with both the molecular substitution distances and the local syntenic information. We show that these synteny-guided phylogenies are more congruent with the species tree, with sequence evolution and with expected expression conservation patterns than those produced by state-of-the-art methods. Finally, we show that synteny-guided gene trees emphasize contributions of WGD paralogs to evolutionary innovations in the teleost clade.
Affiliation(s)
- Elise Parey
  - Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
- Alexandra Louis
  - Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
- Cédric Cabau
  - SIGENAE, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
- Hugues Roest Crollius
  - Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
- Camille Berthelot
  - Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
9
Nagy LG, Merényi Z, Hegedüs B, Bálint B. Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing. Nucleic Acids Res 2020; 48:2209-2219. [PMID: 31943056 PMCID: PMC7049691 DOI: 10.1093/nar/gkz1241] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/23/2019] [Revised: 12/15/2019] [Accepted: 12/31/2019] [Indexed: 12/21/2022]
Abstract
Ongoing large-scale genome sequencing projects are forecasting a data deluge that will almost certainly overwhelm current analytical capabilities of evolutionary genomics. In contrast to population genomics, there are no standardized methods in evolutionary genomics for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this aspect and point out that many genomic datasets are under-utilized due to the lack of powerful methodologies. As a result, many current analyses emphasize gene families for which some functional data is already available, resulting in a growing gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the 'dark side' of genomes, a problem that will not be mitigated by sequencing more and more genomes, unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations based on comparative data, then argue that realizing the full potential of whole genome datasets requires the integration of phylogenetic comparative methods into genomics, a rich but underutilized toolbox for looking into the past.
Affiliation(s)
- László G Nagy
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
- Zsolt Merényi
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
- Botond Hegedüs
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
- Balázs Bálint
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
10
A Screen for Gene Paralogies Delineating Evolutionary Branching Order of Early Metazoa. G3-GENES GENOMES GENETICS 2020; 10:811-826. [PMID: 31879283 PMCID: PMC7003098 DOI: 10.1534/g3.119.400951] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/01/2023]
Abstract
The evolutionary diversification of animals is one of Earth’s greatest marvels, yet its earliest steps are shrouded in mystery. Animals, the monophyletic clade known as Metazoa, evolved wildly divergent multicellular life strategies featuring ciliated sensory epithelia. In many lineages epithelial sensoria became coupled to increasingly complex nervous systems. Currently, different phylogenetic analyses of single-copy genes support mutually-exclusive possibilities that either Porifera or Ctenophora is sister to all other animals. Resolving this dilemma would advance the ecological and evolutionary understanding of the first animals and the evolution of nervous systems. Here we describe a comparative phylogenetic approach based on gene duplications. We computationally identify and analyze gene families with early metazoan duplications using an approach that mitigates apparent gene loss resulting from the miscalling of paralogs. In the transmembrane channel-like (TMC) family of mechano-transducing channels, we find ancient duplications that define separate clades for Eumetazoa (Placozoa + Cnidaria + Bilateria) vs. Ctenophora, and one duplication that is shared only by Eumetazoa and Porifera. In the Max-like protein X (MLX and MLXIP) family of bHLH-ZIP regulators of metabolism, we find that all major lineages from Eumetazoa and Porifera (sponges) share a duplicated gene pair that is sister to the single-copy gene maintained in Ctenophora. These results suggest a new avenue for deducing deep phylogeny by choosing rather than avoiding ancient gene paralogies.
11
Christensen S, Molloy EK, Vachaspati P, Yammanuru A, Warnow T. Non-parametric correction of estimated gene trees using TRACTION. Algorithms Mol Biol 2020; 15:1. [PMID: 31911812 PMCID: PMC6942343 DOI: 10.1186/s13015-019-0161-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/09/2019] [Accepted: 12/18/2019] [Indexed: 11/16/2022]
Abstract
Motivation Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree by using computational techniques along with auxiliary information, such as a reference species tree or sequencing data. However, gene trees and species trees can differ as a result of gene duplication and loss (GDL), incomplete lineage sorting (ILS), and other biological processes. Thus gene tree correction methods need to take estimation error as well as gene tree heterogeneity into account. Many prior gene tree correction methods have been developed for the case where GDL is present. Results Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to ILS and/or HGT. We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-optimal tree refinement and completion (RF-OTRC) Problem, which seeks a refinement and completion of a singly-labeled gene tree with respect to a given singly-labeled species tree so as to minimize the Robinson−Foulds (RF) distance. Our extensive simulation study on 68,000 estimated gene trees shows that TRACTION matches or improves on the accuracy of well-established methods from the GDL literature when HGT and ILS are both present, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. We also show that a naive generalization of the RF-OTRC problem to multi-labeled trees is possible, but can produce misleading results where gene tree heterogeneity is due to GDL.
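The Robinson−Foulds (RF) distance that the abstract above minimizes can be illustrated with a small sketch. This is not TRACTION itself, and it does not perform refinement or completion; it only counts the bipartitions present in one tree but not the other. The nested-tuple tree encoding and the names `leaf_set`, `bipartitions`, and `rf_distance` are assumptions made for this illustration.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal edges of a tree.

    The tree is treated as unrooted: each split is stored as the side that
    does not contain an arbitrary anchor leaf, so both orientations of the
    same edge collapse to one entry.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        # Skip trivial splits (a single leaf on either side) and the root.
        if 1 < len(clade) < len(all_leaves) - 1:
            side = clade if anchor not in clade else all_leaves - clade
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must have the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-leaf trees that disagree on the placement of B and C.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

For these two trees the shared split is {D, E} versus the rest, while {A, B} and {A, C} each appear in only one tree, so the distance is 2. Dedicated phylogenetics libraries provide tested implementations that should be preferred for real analyses.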
12
Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, da Silva AS, Wang Y, Sonnhammer E, Thomas PD. Advances and Applications in the Quest for Orthologs. Mol Biol Evol 2020; 36:2157-2164. [PMID: 31241141 PMCID: PMC6759064 DOI: 10.1093/molbev/msz150] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Indexed: 01/02/2023]
Abstract
Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.
Collapse
Affiliation(s)
- Natasha Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution & Environment, University College London, London, United Kingdom.,Department of Computer Science, University College London, London, United Kingdom
| | - Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany.,Senckenberg Biodiversity and Climate Research Centre (BIK-F), Frankfurt, Germany.,LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
| | - Sofia K Forslund
- Experimental and Clinical Research Center, A Cooperation of Charité-Universitätsmedizin Berlin and Max Delbruck Center for Molecular Medicine, Berlin, Germany.,Max Delbruck Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany.,Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität u Berlin, Berlin, Germany.,Berlin Institute of Health (BIH), Berlin, Germany.,Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Toni Gabaldón
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain.,ICREA, Barcelona, Spain
| | - Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Cécile Pereira
- Eura Nova, Marseille, France.,Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL
| | - Alan Sousa da Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Yan Wang
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California-Riverside, Riverside, CA
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA
| |
|
13
|
Duchemin W, Gence G, Arigon Chifolleau AM, Arvestad L, Bansal MS, Berry V, Boussau B, Chevenet F, Comte N, Davín AA, Dessimoz C, Dylus D, Hasic D, Mallo D, Planel R, Posada D, Scornavacca C, Szöllosi G, Zhang L, Tannier É, Daubin V. RecPhyloXML: a format for reconciled gene trees. Bioinformatics 2019; 34:3646-3652. [PMID: 29762653 PMCID: PMC6198865 DOI: 10.1093/bioinformatics/bty389] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/20/2017] [Accepted: 05/09/2018] [Indexed: 12/21/2022] Open
Abstract
Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.), along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but they often use different reconciliation formats, which differ in the type of events considered or in whether the species tree is dated. This complicates the comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.
Affiliation(s)
- Wandrille Duchemin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France.,MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Guillaume Gence
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - Anne-Muriel Arigon Chifolleau
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France
| | - Lars Arvestad
- Department of Mathematics, Stockholm University, Stockholm, Sweden.,Swedish e-Science Research Centre (SeRC), Stockholm, Sweden
| | - Mukul S Bansal
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.,Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | - Vincent Berry
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France.,ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
| | - Bastien Boussau
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - François Chevenet
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,MIVEGEC, CNRS 5290, IRD 224, Université de Montpellier, Montpellier, France
| | - Nicolas Comte
- INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Adrián A Davín
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution and Environment, University College London, London, UK.,Department of Computer Science, University College London, London, UK.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - David Dylus
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Damir Hasic
- Department of Mathematics, Faculty of Science, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Diego Mallo
- Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Rémi Planel
- Laboratoire d'Analyse Bio-informatique en Génomique et Métabolisme CNRS-UMR 8030, Commissariat à l'Énergie Atomique (CEA), Institut de Génomique, Genoscope, Evry, France
| | - David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Celine Scornavacca
- Institut de Biologie Computationnelle (IBC), Montpellier, France.,ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
| | - Gergely Szöllosi
- MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Louxin Zhang
- Department of Mathematics, National University of Singapore, Singapore, Singapore
| | - Éric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Vincent Daubin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| |
|
14
|
Savinova OS, Moiseenko KV, Vavilova EA, Chulkin AM, Fedorova TV, Tyazhelova TV, Vasina DV. Evolutionary Relationships Between the Laccase Genes of Polyporales: Orthology-Based Classification of Laccase Isozymes and Functional Insight From Trametes hirsuta. Front Microbiol 2019; 10:152. [PMID: 30792703 PMCID: PMC6374638 DOI: 10.3389/fmicb.2019.00152] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/30/2018] [Accepted: 01/22/2019] [Indexed: 01/06/2023] Open
Abstract
Laccase is one of the oldest known and intensively studied fungal enzymes capable of oxidizing recalcitrant lignin-resembling phenolic compounds. It is currently well established that fungal genomes almost always contain several non-allelic copies of laccase genes (laccase multigene families); nevertheless, many aspects of laccase multigenicity, for example, their precise biological functions or evolutionary relationships, are mostly unknown. Here, we present a detailed evolutionary analysis of the sensu stricto laccase genes (CAZy - AA1_1) from fungi of the Polyporales order. The conducted analysis provides a better understanding of the Polyporales laccase multigenicity and allows for the systemization of the individual features of different laccase isozymes. In addition, we provide a comparison of the biochemical and catalytic properties of the four laccase isozymes from Trametes hirsuta and suggest their functional diversification within the multigene family.
Affiliation(s)
- Olga S Savinova
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Konstantin V Moiseenko
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Ekaterina A Vavilova
- Laboratory of Gene Expression Optimization, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Andrey M Chulkin
- Laboratory of Gene Expression Optimization, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Tatiana V Fedorova
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Tatiana V Tyazhelova
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Daria V Vasina
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| |
|
15
|
Lafond M, Chauve C, El-Mabrouk N, Ouangraoua A. Gene Tree Construction and Correction Using SuperTree and Reconciliation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1560-1570. [PMID: 28678712 DOI: 10.1109/tcbb.2017.2720581] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/07/2023]
Abstract
The supertree problem, which asks for a tree displaying a set of consistent input trees, has mostly been considered for the reconstruction of species trees. Here, we instead explore this framework for reconstructing a gene tree from a set of input gene trees on partial data. In this perspective, the phylogenetic tree for the species containing the genes of interest can be used to choose among the many possible compatible "supergenetrees", the most natural criterion being to minimize a reconciliation cost. We develop a variety of algorithmic solutions for the construction and correction of gene trees using the supertree framework. A dynamic programming supertree algorithm for constructing or correcting gene trees, exponential in the number of input trees, is first developed for the less constrained version of the problem. It is then adapted to gene trees with nodes labeled as duplication or speciation, the additional constraint being to preserve the orthology and paralogy relations between genes. Then, a quadratic time algorithm is developed for efficiently correcting an initial gene tree while preserving a set of "trusted" subtrees, as well as the relative phylogenetic distance between them, in both cases of labeled or unlabeled input trees. By applying these algorithms to the set of Ensembl gene trees, we show that this new correction framework is particularly useful to correct weakly-supported duplication nodes. The C++ source code for the algorithms and simulations described in the paper is available at https://github.com/UdeM-LBIT/SuGeT.
|
16
|
Little A, Schwerdt JG, Shirley NJ, Khor SF, Neumann K, O'Donovan LA, Lahnstein J, Collins HM, Henderson M, Fincher GB, Burton RA. Revised Phylogeny of the Cellulose Synthase Gene Superfamily: Insights into Cell Wall Evolution. Plant Physiology 2018; 177:1124-1141. [PMID: 29780036 PMCID: PMC6052982 DOI: 10.1104/pp.17.01718] [Citation(s) in RCA: 98] [Impact Index Per Article: 16.3] [Received: 11/30/2017] [Accepted: 05/10/2018] [Indexed: 05/18/2023]
Abstract
Cell walls are crucial for the integrity and function of all land plants and are of central importance in human health, livestock production, and as a source of renewable bioenergy. Many enzymes that mediate the biosynthesis of cell wall polysaccharides are encoded by members of the large cellulose synthase (CesA) gene superfamily. Here, we analyzed 29 sequenced genomes and 17 transcriptomes to revise the phylogeny of the CesA gene superfamily in angiosperms. Our results identify ancestral gene clusters that predate the monocot-eudicot divergence and reveal several novel evolutionary observations, including the expansion of the Poaceae-specific cellulose synthase-like CslF family to the graminids and restiids and the characterization of a previously unreported eudicot lineage, CslM, that forms a reciprocally monophyletic eudicot-monocot grouping with the CslJ clade. The CslM lineage is widely distributed in eudicots, and the CslJ clade, which was thought previously to be restricted to the Poales, is widely distributed in monocots. Our analyses show that some members of the CslJ lineage, but not the newly identified CslM genes, are capable of directing (1,3;1,4)-β-glucan biosynthesis, which, contrary to current dogma, is not restricted to Poaceae.
Affiliation(s)
- Alan Little
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Julian G Schwerdt
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Neil J Shirley
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Shi F Khor
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Kylie Neumann
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Lisa A O'Donovan
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Jelle Lahnstein
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Helen M Collins
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Marilyn Henderson
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Geoffrey B Fincher
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| | - Rachel A Burton
- Australian Research Council Centre of Excellence in Plant Cell Walls, School of Agriculture, Food, and Wine, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
| |
|
17
|
Anselmetti Y, Duchemin W, Tannier E, Chauve C, Bérard S. Phylogenetic signal from rearrangements in 18 Anopheles species by joint scaffolding extant and ancestral genomes. BMC Genomics 2018; 19:96. [PMID: 29764366 PMCID: PMC5954271 DOI: 10.1186/s12864-018-4466-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/02/2022] Open
Abstract
Background: Genome rearrangements carry valuable information for phylogenetic inference or the elucidation of molecular mechanisms of adaptation. However, the detection of genome rearrangements is often hampered by current deficiencies in data and methods: genomes obtained from short sequence reads generally have very fragmented assemblies, and comparing multiple gene orders generally leads to computationally intractable algorithmic questions. Results: We present a computational method, ADseq, which, by combining ancestral gene order reconstruction, comparative scaffolding and de novo scaffolding methods, overcomes these two caveats. ADseq simultaneously provides improved assemblies and ancestral genomes, with statistical supports on all local features. Compared to previous comparative methods, it runs in polynomial time, it samples solutions in a probabilistic space, and it can handle a significantly larger gene complement from the considered extant genomes, with complex histories including gene duplications and losses. We use ADseq to provide improved assemblies and a genome history made of duplications, losses, gene translocations and rearrangements of 18 complete Anopheles genomes, including several important malaria vectors. We also provide additional support for a differentiated mode of evolution of the sex chromosome and of the autosomes in these mosquito genomes. Conclusions: We demonstrate the method’s ability to improve extant assemblies accurately through a procedure simulating realistic assembly fragmentation. We study a debated issue regarding the phylogeny of the Gambiae complex group of Anopheles genomes in the light of the evolution of chromosomal rearrangements, suggesting that the phylogenetic signal they carry can differ from the phylogenetic signal carried by gene sequences, which are more prone to introgression.
Electronic supplementary material: The online version of this article (10.1186/s12864-018-4466-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yoann Anselmetti
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.,Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France
| | - Wandrille Duchemin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France.,INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France.,INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A1S6, BC, Canada
| | - Sèverine Bérard
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.
| |
|
18
|
Abstract
Background: One of the fundamental issues in evolutionary molecular biology is to discover genomic duplication events and their correspondence to the species tree. Such events can be reconstructed by clustering single gene duplications inferred by reconciling a set of gene trees with a species tree. Results: Here we propose the first solutions to the genomic duplication problem in which every reconciliation with the minimal number of single gene duplications is allowed and clustering follows the minimum episodes method, under the assumption that the input gene trees are unrooted. Conclusions: We show new theoretical properties of unrooted reconciliation for the duplication cost and apply them to design several exact and heuristic algorithms for solving the problem. Our evaluation on an empirical dataset confirmed several genomic duplication events from the literature and demonstrated that the algorithms can be successfully applied.
Affiliation(s)
- Jarosław Paszek
- Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, 02-097, Poland.
| | - Paweł Górecki
- Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, 02-097, Poland
| |
|
19
|
GATC: a genetic algorithm for gene tree construction under the Duplication-Transfer-Loss model of evolution. BMC Genomics 2018; 19:102. [PMID: 29764363 PMCID: PMC5954287 DOI: 10.1186/s12864-018-4455-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
|
20
|
Christensen S, Molloy EK, Vachaspati P, Warnow T. OCTAL: Optimal Completion of gene trees in polynomial time. Algorithms Mol Biol 2018; 13:6. [PMID: 29568323 PMCID: PMC5853121 DOI: 10.1186/s13015-018-0124-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/25/2017] [Accepted: 03/06/2018] [Indexed: 12/16/2022] Open
Abstract
Background: For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable. Results: We introduce the Optimal Tree Completion problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. We present OCTAL, an algorithm that finds an optimal solution to this problem when the distance between trees is defined using the Robinson–Foulds (RF) distance, and we prove that OCTAL runs in $O(n^2)$ time, where $n$ is the total number of species. We report on a simulation study in which gene trees can differ from the species tree due to incomplete lineage sorting, and estimated gene trees are completed using OCTAL with a reference tree based on a species tree estimated from the multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach in ASTRAL-II, but the accuracy of a completed gene tree computed by OCTAL depends on how topologically similar the reference tree (typically an estimated species tree) is to the true gene tree. Conclusions: OCTAL is a useful technique for adding missing taxa to incomplete gene trees and provides good accuracy under a wide range of model conditions. However, results show that OCTAL’s accuracy can be reduced when incomplete lineage sorting is high, as the reference tree can be far from the true gene tree. Hence, this study suggests that OCTAL would benefit from using other types of reference trees instead of species trees when there are large topological distances between true gene trees and species trees. Electronic supplementary material: The online version of this article (10.1186/s13015-018-0124-5) contains supplementary material, which is available to authorized users.
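The Robinson–Foulds distance named in this abstract can be illustrated with a short sketch. This is not the OCTAL algorithm and not code from the paper; it only computes the RF distance between two trees on the same leaf set, assuming (for illustration) that trees are written as nested Python tuples with string leaf labels. The function names `leaf_set`, `bipartitions` and `rf_distance` are hypothetical.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits of the leaf set induced by internal edges."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # A split is non-trivial when both sides hold at least two leaves.
        if len(below) >= 2 and len(other) >= 2:
            # Store the split as an unordered pair so that the position
            # of the (arbitrary) root does not matter.
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("RF distance requires identical leaf sets")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-species example: the trees disagree on one split each.
reference = ((("A", "B"), "C"), ("D", "E"))
candidate = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(reference, candidate))
```

In the tree completion setting described above, a distance of this kind would be evaluated between the completed gene tree and the reference tree; how OCTAL chooses where to attach the missing leaves is not shown here.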
|
21
|
Bayzid MS, Warnow T. Gene tree parsimony for incomplete gene trees: addressing true biological loss. Algorithms Mol Biol 2018; 13:1. [PMID: 29387142 PMCID: PMC5774205 DOI: 10.1186/s13015-017-0120-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/28/2017] [Accepted: 12/27/2017] [Indexed: 11/10/2022] Open
Abstract
Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and “gene tree parsimony” (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). Results: We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the “standard” calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.
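To make the duplication count in the GTP objective concrete, here is a minimal sketch of the classical last-common-ancestor (LCA) mapping for a single gene tree and a fixed species tree. It is not the DynaDup software and it does not search over species trees or count losses. It assumes, for illustration only, that both trees are nested tuples and that each gene-tree leaf is labelled with the name of the species it comes from; `clusters`, `lca` and `count_duplications` are hypothetical names.

```python
def clusters(species_tree):
    """Return the leaf set (cluster) of every node of the species tree."""
    found = []

    def visit(node):
        if isinstance(node, str):
            cluster = frozenset([node])
        else:
            cluster = frozenset().union(*(visit(child) for child in node))
        found.append(cluster)
        return cluster

    visit(species_tree)
    return found


def lca(species_clusters, species):
    """Smallest species-tree cluster that contains all the given species."""
    return min((c for c in species_clusters if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes mapped to the same species node as a child."""
    species_clusters = clusters(species_tree)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(species_clusters, frozenset([node]))
        child_maps = [visit(child) for child in node]
        here = lca(species_clusters, frozenset().union(*child_maps))
        # Under LCA mapping, a node is a duplication when it maps to the
        # same species-tree node as at least one of its children.
        if here in child_maps:
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


# Hypothetical example: two copies of a gene in species "a".
species_tree = (("a", "b"), "c")
gene_tree = (("a", "b"), ("a", "c"))
print(count_duplications(gene_tree, species_tree))
```

A gene tree that matches the species tree exactly yields zero duplications under this mapping, which is the baseline against which the parsimony score is measured.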
|
22
|
Duchemin W, Anselmetti Y, Patterson M, Ponty Y, Bérard S, Chauve C, Scornavacca C, Daubin V, Tannier E. DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies. Genome Biol Evol 2018; 9:1312-1319. [PMID: 28402423 PMCID: PMC5441342 DOI: 10.1093/gbe/evx069] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Accepted: 04/07/2017] [Indexed: 12/15/2022] Open
Abstract
DeCoSTAR is software that aims to reconstruct the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionary-induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann–Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo, and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria. Availability: http://pbil.univ-lyon1.fr/software/DeCoSTAR (Last accessed April 24, 2017).
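The idea of a Boltzmann–Gibbs distribution centered on parsimonious solutions can be sketched in a few lines. This is not DeCoSTAR's implementation: it assumes a small, explicitly enumerated list of hypothetical scenario costs and a hypothetical `temperature` parameter, and it only shows how costs translate into sampling probabilities.

```python
import math


def boltzmann_probabilities(costs, temperature=1.0):
    """Turn scenario costs into probabilities proportional to exp(-cost / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = min(costs)
    # Subtracting the best cost leaves the distribution unchanged and
    # avoids underflow when costs are large.
    weights = [math.exp(-(cost - best) / temperature) for cost in costs]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical costs of four candidate adjacency scenarios.
scenario_costs = [3.0, 4.0, 4.0, 6.0]
for cost, prob in zip(scenario_costs, boltzmann_probabilities(scenario_costs)):
    print(f"cost={cost:.1f}  probability={prob:.3f}")
```

Lower temperatures concentrate the probability mass on the most parsimonious scenario, while higher temperatures give near-optimal scenarios more weight; the support of a feature can then be read as the total probability of the scenarios that contain it.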
Affiliation(s)
- Wandrille Duchemin
- Inria Grenoble Rhône-Alpes, Montbonnot, France.,Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
| | - Yoann Anselmetti
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France.,Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Murray Patterson
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France.,Experimental Algorithmics Lab (AlgoLab), Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, Viale Sarca, Milano, Italy
| | - Yann Ponty
- CNRS, Ecole Polytechnique, LIX UMR7161, Palaiseau, France.,Inria Saclay, EP AMIB, Palaiseau, France
| | - Sèverine Bérard
- Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.,LIRMM, Université de Montpellier, CNRS, Montpellier, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Celine Scornavacca
- Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Vincent Daubin
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
| | - Eric Tannier
- Inria Grenoble Rhône-Alpes, Montbonnot, France.,Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
| |
|
23
|
Mykowiecka A, Gorecki P. Credibility of Evolutionary Events in Gene Trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 16:713-726. [PMID: 29990287 DOI: 10.1109/tcbb.2017.2788888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Based on the classical non-parametric bootstrapping for phylogenetic trees, we propose a novel bootstrap method to define support for gene duplication and speciation events. By comparing bootstrap gene trees to the original gene tree, we calculate support for evolutionary events. While this approach can be used to annotate orthology and paralogy, we show how it can be used to verify the reliability of tree reconciliation. We propose a linear time algorithm for the computation of bootstrap values, and we show the correspondence of our method with the classical non-parametric bootstrapping. Finally, we present two computational experiments. In the first one, based on simulated data and nine yeast genomes, we show a comparative study of several tree rooting methods and an evaluation of their performance using our bootstrapping method. In the second experiment, using data from the TreeFam database, we tested how the reliability of the gene trees influences the inferred supertree. We found that species trees inferred from gene trees having highly supported events are more biologically consistent.
|
24
|
Kuitche E, Lafond M, Ouangraoua A. Reconstructing protein and gene phylogenies using reconciliation and soft-clustering. J Bioinform Comput Biol 2017; 15:1740007. [DOI: 10.1142/s0219720017400078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
The architecture of eukaryotic coding genes allows the production of several different protein isoforms by genes. Current gene phylogeny reconstruction methods make use of a single protein product per gene, ignoring information on alternative protein isoforms. These methods often lead to inaccurate gene tree reconstructions that require to be corrected before phylogenetic analyses. Here, we propose a new approach for the reconstruction of gene trees and protein trees accounting for alternative protein isoforms. We extend the concept of reconciliation to protein trees, and we define a new reconciliation problem called MinDRGT that consists in finding a gene tree that minimizes a double reconciliation cost with a given protein tree and a given species tree. We define a second problem called MinDRPGT that consists in finding a protein supertree and a gene tree minimizing a double reconciliation cost, given a species tree and a set of protein subtrees. We propose a shift from the traditional view of protein ortholog groups as hard-clusters to soft-clusters and we study the MinDRPGT problem under this assumption. We provide algorithmic exact and heuristic solutions for versions of the problems, and we present the results of applications on protein and gene trees from the Ensembl database. The implementations of the methods are available at https://github.com/UdeS-CoBIUS/Protein2GeneTree and https://github.com/UdeS-CoBIUS/SuperProteinTree .
Affiliation(s)
- Esaie Kuitche
  - Department of Computer Science, Université de Sherbrooke, Sherbrooke, QC J1K2R1, Canada
- Manuel Lafond
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N6N5, Canada
- Aïda Ouangraoua
  - Department of Computer Science, Université de Sherbrooke, Sherbrooke, QC J1K2R1, Canada
|
25
|
Lucas JMEX, Roest Crollius H. High precision detection of conserved segments from synteny blocks. PLoS One 2017; 12:e0180198. [PMID: 28671949 PMCID: PMC5495381 DOI: 10.1371/journal.pone.0180198] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/29/2016] [Accepted: 06/12/2017] [Indexed: 11/19/2022] Open
Abstract
A conserved segment, i.e. a segment of chromosome unbroken during evolution, is an important operational concept in comparative genomics. Until now, algorithms designed to identify conserved segments have often returned synteny blocks that overlap, that include micro-rearrangements, or that are erroneously short. Here we present definitions of conserved segments and synteny blocks independent of any heuristic method and we describe four new post-processing strategies to refine synteny blocks into accurate conserved segments. The first strategy identifies micro-rearrangements, the second identifies mono-genic conserved segments, the third returns non-overlapping segments and the fourth repairs incorrect ruptures of synteny. All these refinements are implemented in a new version of PhylDiag that has been benchmarked against i-ADHoRe 3.0 and Cyntenator, based on a realistic simulated evolution and true simulated conserved segments.
Affiliation(s)
- Joseph MEX Lucas
  - IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, Paris, France
- Hugues Roest Crollius
  - IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, Paris, France
|
26
|
Jacox E, Weller M, Tannier E, Scornavacca C. Resolution and reconciliation of non-binary gene trees with transfers, duplications and losses. Bioinformatics 2017; 33:980-987. [PMID: 28073758 DOI: 10.1093/bioinformatics/btw778] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/29/2016] [Accepted: 12/02/2016] [Indexed: 11/14/2022] Open
Abstract
Summary: Gene trees reconstructed from sequence alignments contain poorly supported branches when the phylogenetic signal in the sequences is insufficient to determine them all. When a species tree is available, the signal of gains and losses of genes can be used to correctly resolve the unsupported parts of the gene history. However, finding a most parsimonious binary resolution of a non-binary tree obtained by contracting the unsupported branches is NP-hard if transfer events are considered as possible gene scale events, in addition to gene origination, duplication and loss. We propose an exact, parameterized algorithm to solve this problem in single-exponential time, where the parameter is the number of connected branches of the gene tree that show low support from the sequence alignment or, equivalently, the maximum number of children of any node of the gene tree once the low-support branches have been collapsed. This improves on the best known algorithm by an exponential factor. We propose a way to choose among optimal solutions based on the available information. We show the usability of this principle on several simulated and biological datasets. The results are comparable in quality to several other tested methods having similar goals, but our approach provides a lower running time and a guarantee that the produced solution is optimal. Availability and Implementation: Our algorithm has been integrated into the ecceTERA phylogeny package, available at http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and which can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera . Contact: celine.scornavacca@umontpellier.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Edwin Jacox
  - ISE-M, Université Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Mathias Weller
  - Institut de Biologie Computationnelle (IBC), Montpellier, France
  - LIRMM, Université Montpellier, CNRS, Montpellier, France
- Eric Tannier
  - INRIA Rhône-Alpes, LBBE, Université Lyon 1, Lyon, France
- Celine Scornavacca
  - ISE-M, Université Montpellier, CNRS, IRD, EPHE, Montpellier, France
  - Institut de Biologie Computationnelle (IBC), Montpellier, France
|