1
|
López Sánchez A, Lafond M. Predicting horizontal gene transfers with perfect transfer networks. Algorithms Mol Biol 2024; 19:6. [PMID: 38321476 PMCID: PMC10848447 DOI: 10.1186/s13015-023-00242-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Accepted: 10/25/2023] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. OUR CONTRIBUTIONS We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births, and that once a character is gained it is rarely lost. Examples of such traits include transposable elements, biochemical markers and emergence of organelles, just to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goals are to initiate a study on the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa. We finally provide lower and upper bounds on the number of transfers required to explain a set of taxa, in the worst case.
Collapse
Affiliation(s)
| | - Manuel Lafond
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
| |
Collapse
|
2
|
van Iersel L, Janssen R, Jones M, Murakami Y. Orchard Networks are Trees with Additional Horizontal Arcs. Bull Math Biol 2022; 84:76. [PMID: 35727410 PMCID: PMC9213324 DOI: 10.1007/s11538-022-01037-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2021] [Accepted: 05/30/2022] [Indexed: 11/30/2022]
Abstract
Phylogenetic networks are used in biology to represent evolutionary histories. The class of orchard phylogenetic networks was recently introduced for their computational benefits, without any biological justification. Here, we show that orchard networks can be interpreted as trees with additional horizontal arcs. Therefore, they are closely related to tree-based networks, where the difference is that in tree-based networks the additional arcs do not need to be horizontal. Then, we use this new characterization to show that the space of orchard networks on n leaves with k reticulations is connected under the rNNI rearrangement move with diameter \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$O(kn+n\log (n))$$\end{document}O(kn+nlog(n)).
Collapse
Affiliation(s)
- Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
| | - Remie Janssen
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
| | - Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
| | - Yukihiro Murakami
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands.
| |
Collapse
|
3
|
Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol 2022; 84:47. [PMID: 35503141 DOI: 10.1007/s00285-022-01746-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2021] [Revised: 01/18/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Collapse
Affiliation(s)
- Sungsik Kong
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
| | - Laura Kubatko
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA.,Department of Statistics, The Ohio State University, Columbus, OH, USA
| | - Kristina Wicke
- Department of Mathematics, The Ohio State University, Columbus, OH, USA.
| |
Collapse
|
4
|
Pons JC, Scornavacca C, Cardona G. Generation of Level- k LGT Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:158-164. [PMID: 30703035 DOI: 10.1109/tcbb.2019.2895344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Phylogenetic networks provide a mathematical model to represent the evolution of a set of species where, apart from speciation, reticulate evolutionary events have to be taken into account. Among these events, lateral gene transfers need special consideration due to the asymmetry in the roles of the species involved in such an event. To take into account this asymmetry, LGT networks were introduced. Contrarily to the case of phylogenetic trees, the combinatorial structure of phylogenetic networks is much less known and difficult to describe. One of the approaches in the literature is to classify them according to their level and find generators of the given level that can be used to recursively generate all networks. In this paper, we adapt the concept of generators to the case of LGT networks. We show how these generators, classified by their level, give rise to simple LGT networks of the specified level, and how any LGT network can be obtained from these simple networks, that act as building blocks of the generic structure. The stochastic models of evolution of phylogenetic networks are also much less studied than those for phylogenetic trees. In this setting, we introduce a novel two-parameter model that generates LGT networks. Finally, we present some computer simulations using this model in order to investigate the complexity of the generated networks, depending on the parameters of the model.
Collapse
|
5
|
Cardona G, Pons JC, Scornavacca C. Generation of Binary Tree-Child phylogenetic networks. PLoS Comput Biol 2019; 15:e1007347. [PMID: 31509525 PMCID: PMC6756559 DOI: 10.1371/journal.pcbi.1007347] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2019] [Revised: 09/23/2019] [Accepted: 08/20/2019] [Indexed: 11/18/2022] Open
Abstract
Phylogenetic networks generalize phylogenetic trees by allowing the modelization of events of reticulate evolution. Among the different kinds of phylogenetic networks that have been proposed in the literature, the subclass of binary tree-child networks is one of the most studied ones. However, very little is known about the combinatorial structure of these networks. In this paper we address the problem of generating all possible binary tree-child (BTC) networks with a given number of leaves in an efficient way via reduction/augmentation operations that extend and generalize analogous operations for phylogenetic trees, and are biologically relevant. Since our solution is recursive, this also provides us with a recurrence relation giving an upper bound on the number of such networks. We also show how the operations introduced in this paper can be employed to extend the evolutive history of a set of sequences, represented by a BTC network, to include a new sequence. An implementation in python of the algorithms described in this paper, along with some computational experiments, can be downloaded from https://github.com/bielcardona/TCGenerators.
Collapse
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, Ctra. de Valldemossa Ctra. de Valldemossa km. 7.5. 07122 - Palma, Spain
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Ctra. de Valldemossa Ctra. de Valldemossa km. 7.5. 07122 - Palma, Spain
| | - Celine Scornavacca
- Institut des Sciences de l’Evolution (ISE-M), Université de Montpellier, CNRS, IRD, EPHE, 34095 Montpellier Cedex 5, France
| |
Collapse
|
6
|
Van Iersel L, Jones M, Scornavacca C. Improved Maximum Parsimony Models for Phylogenetic Networks. Syst Biol 2018; 67:518-542. [PMID: 29272537 DOI: 10.1093/sysbio/syx094] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2017] [Accepted: 12/11/2017] [Indexed: 11/13/2022] Open
Abstract
Phylogenetic networks are well suited to represent evolutionary histories comprising reticulate evolution. Several methods aiming at reconstructing explicit phylogenetic networks have been developed in the last two decades. In this article, we propose a new definition of maximum parsimony for phylogenetic networks that permits to model biological scenarios that cannot be modeled by the definitions currently present in the literature (namely, the "hardwired" and "softwired" parsimony). Building on this new definition, we provide several algorithmic results that lay the foundations for new parsimony-based methods for phylogenetic network reconstruction.
Collapse
Affiliation(s)
- Leo Van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, P.O. Box 5, 2600 AA Delft, the Netherlands
| | - Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, P.O. Box 5, 2600 AA Delft, the Netherlands
| | - Celine Scornavacca
- Institut des Sciences de l'Évolution Université de Montpellier, CNRS, IRD, EPHE CC 064, Place Eugène Bataillon 34095 Montpellier Cedex 05, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France
| |
Collapse
|
7
|
Cardona G, Pons JC. Reconstruction of LGT networks from tri-LGT-nets. J Math Biol 2017; 75:1669-1692. [PMID: 28451760 DOI: 10.1007/s00285-017-1131-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2016] [Revised: 02/16/2017] [Indexed: 01/06/2023]
Abstract
Phylogenetic networks have gained attention from the scientific community due to the evidence of the existence of evolutionary events that cannot be represented using trees. A variant of phylogenetic networks, called LGT networks, models specifically lateral gene transfer events, which cannot be properly represented with generic phylogenetic networks. In this paper we treat the problem of the reconstruction of LGT networks from substructures induced by three leaves, which we call tri-LGT-nets. We first restrict ourselves to a class of LGT networks that are both mathematically treatable and biologically significant, called BAN-LGT networks. Then, we study the decomposition of such networks in subnetworks with three leaves and ask whether or not this decomposition determines the network. The answer to this question is negative, but if we further impose time-consistency (species involved in a later gene transfer must coexist) the answer is affirmative, up to some redundancy that can never be recovered but is fully characterized.
Collapse
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, 07122, Palma, Spain.
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, 07122, Palma, Spain
| |
Collapse
|
8
|
Bordewich M, Linz S, Semple C. Lost in space? Generalising subtree prune and regraft to spaces of phylogenetic networks. J Theor Biol 2017; 423:1-12. [PMID: 28414085 DOI: 10.1016/j.jtbi.2017.03.032] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2016] [Revised: 03/18/2017] [Accepted: 03/20/2017] [Indexed: 10/19/2022]
Abstract
Over the last fifteen years, phylogenetic networks have become a popular tool to analyse relationships between species whose past includes reticulation events such as hybridisation or horizontal gene transfer. However, the space of phylogenetic networks is significantly larger than that of phylogenetic trees, and how to analyse and search this enlarged space remains a poorly understood problem. Inspired by the widely-used rooted subtree prune and regraft (rSPR) operation on rooted phylogenetic trees, we propose a new operation-called subnet prune and regraft (SNPR)-that induces a metric on the space of all rooted phylogenetic networks on a fixed set of leaves. We show that the spaces of several popular classes of rooted phylogenetic networks (e.g. tree child, reticulation visible, and tree based) are connected under SNPR and that connectedness remains for the subclasses of these networks with a fixed number of reticulations. Lastly, we bound the distance between two rooted phylogenetic networks under the SNPR operation, show that it is computationally hard to compute this distance exactly, and analyse how the SNPR-distance between two such networks relates to the rSPR-distance between rooted phylogenetic trees that are embedded in these networks.
Collapse
Affiliation(s)
- Magnus Bordewich
- School of Engineering and Computing Sciences, Durham University, Durham DH1 3LE, United Kingdom.
| | - Simone Linz
- Department of Computer Science, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand.
| | - Charles Semple
- School of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand.
| |
Collapse
|
9
|
Scornavacca C, Mayol JCP, Cardona G. Fast algorithm for the reconciliation of gene trees and LGT networks. J Theor Biol 2017; 418:129-137. [PMID: 28111320 DOI: 10.1016/j.jtbi.2017.01.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2016] [Revised: 09/27/2016] [Accepted: 01/16/2017] [Indexed: 10/20/2022]
Abstract
In phylogenomics, reconciliations aim at explaining the discrepancies between the evolutionary histories of genes and species. Several reconciliation models are available when the evolution of the species of interest is modelled via phylogenetic trees; the most commonly used are the DL model, accounting for duplications and losses in gene evolution and yielding polynomially-solvable problems, and the DTL model, which also accounts for gene transfers and implies NP-hard problems. However, when dealing with non-tree-like evolutionary events such as hybridisations, phylogenetic networks - and not phylogenetic trees - should be used to model species evolution. Reconciliation models involving phylogenetic networks are still at their early days. In this paper, we propose a new reconciliation model in which the evolution of species is modelled by a special kind of phylogenetic networks - the LGT networks. Our model considers duplications, losses and transfers of genes, but restricts transfers to happen through some specific arcs of the network, called secondary arcs. Moreover, we provide a polynomial algorithm to compute the most parsimonious reconciliation between a gene tree and an LGT network under this model. Our method, when combined with quartet decomposition methods to detect putative "highways" of transfers, permits to refine their analyses by allowing to examine the two possible directions of a highway and even consider combinations of highways.
Collapse
Affiliation(s)
- Celine Scornavacca
- Institut des Sciences de l'Evolution, Université de Montpellier, CNRS, IRD, EPHE 34095 Montpellier Cedex 5, France.
| | - Joan Carles Pons Mayol
- Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma, Spain.
| | - Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma, Spain.
| |
Collapse
|