1
|
Yan Z, Cao Z, Nakhleh L. Polyphest: fast polyploid phylogeny estimation. Bioinformatics 2024; 40:ii20-ii28. [PMID: 39230710 PMCID: PMC11373313 DOI: 10.1093/bioinformatics/btae390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION Despite the widespread occurrence of polyploids across the Tree of Life, especially in the plant kingdom, very few computational methods have been developed to handle the specific complexities introduced by polyploids in phylogeny estimation. Furthermore, methods that are designed to account for polyploidy often disregard incomplete lineage sorting (ILS), a major source of heterogeneous gene histories, or are computationally very demanding. Therefore, there is a great need for efficient and robust methods to accurately reconstruct polyploid phylogenies. RESULTS We introduce Polyphest (POLYploid PHylogeny ESTimation), a new method for efficiently and accurately inferring species phylogenies in the presence of both polyploidy and ILS. Polyphest bypasses the need for extensive network space searches by first generating a multilabeled tree based on gene trees, which is then converted into a (uniquely labeled) species phylogeny. We compare the performance of Polyphest to that of two polyploid phylogeny estimation methods, one of which does not account for ILS, namely PADRE, and another that accounts for ILS, namely MPAllopp. Polyphest is more accurate than PADRE and achieves comparable accuracy to MPAllopp, while being significantly faster. We also demonstrate the application of Polyphest to empirical data from the hexaploid bread wheat and confirm the allopolyploid origin of bread wheat along with the closest relatives for each of its subgenomes. AVAILABILITY AND IMPLEMENTATION Polyphest is available at https://github.com/NakhlehLab/Polyphest.
Collapse
Affiliation(s)
- Zhi Yan
- Department of Computer Science, Rice University, Houston, TX 77005, United States
| | - Zhen Cao
- Department of Computer Science, Rice University, Houston, TX 77005, United States
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX 77005, United States
- Department of BioSciences, Rice University, Houston, TX 77005, United States
| |
Collapse
|
2
|
Döcker J, Linz S, Wicke K. Bounding the Softwired Parsimony Score of a Phylogenetic Network. Bull Math Biol 2024; 86:121. [PMID: 39174812 PMCID: PMC11341636 DOI: 10.1007/s11538-024-01350-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2024] [Accepted: 08/16/2024] [Indexed: 08/24/2024]
Abstract
In comparison to phylogenetic trees, phylogenetic networks are more suitable to represent complex evolutionary histories of species whose past includes reticulation such as hybridisation or lateral gene transfer. However, the reconstruction of phylogenetic networks remains challenging and computationally expensive due to their intricate structural properties. For example, the small parsimony problem that is solvable in polynomial time for phylogenetic trees, becomes NP-hard on phylogenetic networks under softwired and parental parsimony, even for a single binary character and structurally constrained networks. To calculate the parsimony score of a phylogenetic network N, these two parsimony notions consider different exponential-size sets of phylogenetic trees that can be extracted from N and infer the minimum parsimony score over all trees in the set. In this paper, we ask: What is the maximum difference between the parsimony score of any phylogenetic tree that is contained in the set of considered trees and a phylogenetic tree whose parsimony score equates to the parsimony score of N? Given a gap-free sequence alignment of multi-state characters and a rooted binary level-k phylogenetic network, we use the novel concept of an informative blob to show that this difference is bounded by k + 1 times the softwired parsimony score of N. In particular, the difference is independent of the alignment length and the number of character states. We show that an analogous bound can be obtained for the softwired parsimony score of semi-directed networks, while under parental parsimony on the other hand, such a bound does not hold.
Collapse
Affiliation(s)
- Janosch Döcker
- School of Computer Science, University of Auckland, Auckland, New Zealand
| | - Simone Linz
- School of Computer Science, University of Auckland, Auckland, New Zealand.
| | - Kristina Wicke
- Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, USA
| |
Collapse
|
3
|
Ané C, Fogg J, Allman ES, Baños H, Rhodes JA. Anomalous networks under the multispecies coalescent: theory and prevalence. J Math Biol 2024; 88:29. [PMID: 38372830 DOI: 10.1007/s00285-024-02050-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2023] [Revised: 01/18/2024] [Accepted: 01/21/2024] [Indexed: 02/20/2024]
Abstract
Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of [Formula: see text]-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
Collapse
Affiliation(s)
- Cécile Ané
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA.
- Department of Botany, University of Wisconsin - Madison, Madison, WI, 53706, USA.
| | - John Fogg
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
| | - Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775-6660, USA
| | - Hector Baños
- Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
| | - John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775-6660, USA
| |
Collapse
|
4
|
Francis A, Steel M. Labellable Phylogenetic Networks. Bull Math Biol 2023; 85:46. [PMID: 37097343 PMCID: PMC10129972 DOI: 10.1007/s11538-023-01157-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Accepted: 04/03/2023] [Indexed: 04/26/2023]
Abstract
Phylogenetic networks are mathematical representations of evolutionary history that are able to capture both tree-like evolutionary processes (speciations) and non-tree-like 'reticulate' processes such as hybridization or horizontal gene transfer. The additional complexity that comes with this capacity, however, makes networks harder to infer from data, and more complicated to work with as mathematical objects. In this paper, we define a new, large class of phylogenetic networks, that we call labellable, and show that they are in bijection with the set of 'expanding covers' of finite sets. This correspondence is a generalisation of the encoding of phylogenetic forests by partitions of finite sets. Labellable networks can be characterised by a simple combinatorial condition, and we describe the relationship between this large class and other commonly studied classes. Furthermore, we show that all phylogenetic networks have a quotient network that is labellable.
Collapse
Affiliation(s)
- Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Penrith, Australia.
| | - Mike Steel
- Biomathematics Research Centre, Canterbury University, Christchurch, New Zealand
| |
Collapse
|
5
|
van Iersel L, Moulton V, Murakami Y. Polynomial invariants for cactuses. INFORM PROCESS LETT 2023. [DOI: 10.1016/j.ipl.2023.106394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/03/2023]
|
6
|
Huber KT, Maher LJ. The hybrid number of a ploidy profile. J Math Biol 2022; 85:30. [PMID: 36114394 PMCID: PMC9481518 DOI: 10.1007/s00285-022-01792-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Revised: 08/09/2022] [Accepted: 08/18/2022] [Indexed: 11/26/2022]
Abstract
Polyploidization, whereby an organism inherits multiple copies of the genome of their parents, is an important evolutionary event that has been observed in plants and animals. One way to study such events is in terms of the ploidy number of the species that make up a dataset of interest. It is therefore natural to ask: How much information about the evolutionary past of the set of species that form a dataset can be gleaned from the ploidy numbers of the species? To help answer this question, we introduce and study the novel concept of a ploidy profile which allows us to formalize it in terms of a multiplicity vector indexed by the species the dataset is comprised of. Using the framework of a phylogenetic network, we present a closed formula for computing the hybrid number (i.e. the minimal number of polyploidization events required to explain a ploidy profile) of a large class of ploidy profiles. This formula relies on the construction of a certain phylogenetic network from the simplification sequence of a ploidy profile and the hybrid number of the ploidy profile with which this construction is initialized. Both of them can be computed easily in case the ploidy numbers that make up the ploidy profile are not too large. To help illustrate the applicability of our approach, we apply it to a simplified version of a publicly available Viola dataset.
Collapse
|
7
|
Pons JC, Coronado TM, Hendriksen M, Francis A. A polynomial invariant for a new class of phylogenetic networks. PLoS One 2022; 17:e0268181. [PMID: 35594308 PMCID: PMC9122212 DOI: 10.1371/journal.pone.0268181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2022] [Accepted: 04/24/2022] [Indexed: 11/30/2022] Open
Abstract
Invariants for complicated objects such as those arising in phylogenetics, whether they are invariants as matrices, polynomials, or other mathematical structures, are important tools for distinguishing and working with such objects. In this paper, we generalize a complete polynomial invariant on trees to a class of phylogenetic networks called separable networks, which will include orchard networks. Networks are becoming increasingly important for their ability to represent reticulation events, such as hybridization, in evolutionary history. We provide a function from the space of internally multi-labelled phylogenetic networks, a more generic graph structure than phylogenetic networks where the reticulations are also labelled, to a polynomial ring. We prove that the separability condition allows us to characterize, via the polynomial, the phylogenetic networks with the same number of leaves and same number of reticulations by considering their internally labelled versions. While the invariant for trees is a polynomial in Z[x1,…,xn,y] where n is the number of leaves, the invariant for internally multi-labelled phylogenetic networks is an element of Z[x1,…,xn,λ1,…,λr,y], where r is the number of reticulations in the network. When the networks are considered without leaf labels the number of variables reduces to r + 2.
Collapse
Affiliation(s)
- Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, Spain
- * E-mail:
| | - Tomás M. Coronado
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, Spain
| | - Michael Hendriksen
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
| | - Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Parramatta, Australia
| |
Collapse
|
8
|
Wawerka M, Dąbkowski D, Rutecka N, Mykowiecka A, Górecki P. Embedding gene trees into phylogenetic networks by conflict resolution algorithms. Algorithms Mol Biol 2022; 17:11. [PMID: 35590416 PMCID: PMC9119282 DOI: 10.1186/s13015-022-00218-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 03/22/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Phylogenetic networks are mathematical models of evolutionary processes involving reticulate events such as hybridization, recombination, or horizontal gene transfer. One of the crucial notions in phylogenetic network modelling is displayed tree, which is obtained from a network by removing a set of reticulation edges. Displayed trees may represent an evolutionary history of a gene family if the evolution is shaped by reticulation events. RESULTS We address the problem of inferring an optimal tree displayed by a network, given a gene tree G and a tree-child network N, under the deep coalescence and duplication costs. We propose an O(mn)-time dynamic programming algorithm (DP) to compute a lower bound of the optimal displayed tree cost, where m and n are the sizes of G and N, respectively. In addition, our algorithm can verify whether the solution is exact. Moreover, it provides a set of reticulation edges corresponding to the obtained cost. If the cost is exact, the set induces an optimal displayed tree. Otherwise, the set contains pairs of conflicting edges, i.e., edges sharing a reticulation node. Next, we show a conflict resolution algorithm that requires [Formula: see text] invocations of DP in the worst case, where r is the number of reticulations. We propose a similar [Formula: see text]-time algorithm for level-k tree-child networks and a branch and bound solution to compute lower and upper bounds of optimal costs. We also extend the algorithms to a broader class of phylogenetic networks. Based on simulated data, the average runtime is [Formula: see text] under the deep-coalescence cost and [Formula: see text] under the duplication cost. CONCLUSIONS Despite exponential complexity in the worst case, our algorithms perform significantly well on empirical and simulated datasets, due to the strategy of resolving internal dissimilarities between gene trees and networks. Therefore, the algorithms are efficient alternatives to enumeration strategies commonly proposed in the literature and enable analyses of complex networks with dozens of reticulations.
Collapse
|
9
|
Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol 2022; 84:47. [PMID: 35503141 DOI: 10.1007/s00285-022-01746-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2021] [Revised: 01/18/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Collapse
Affiliation(s)
- Sungsik Kong
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
| | - Laura Kubatko
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA.,Department of Statistics, The Ohio State University, Columbus, OH, USA
| | - Kristina Wicke
- Department of Mathematics, The Ohio State University, Columbus, OH, USA.
| |
Collapse
|
10
|
Yan Z, Cao Z, Liu Y, Ogilvie HA, Nakhleh L. Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes. Syst Biol 2021; 71:706-720. [PMID: 34605924 PMCID: PMC9017653 DOI: 10.1093/sysbio/syab081] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2020] [Revised: 09/26/2021] [Accepted: 09/29/2021] [Indexed: 12/18/2022] Open
Abstract
Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate
evolutionary histories. While polyploidy has been shown to be prevalent not only in plants
but also in other groups of eukaryotic species, most work done thus far on phylogenetic
network inference assumes diploid hybridization. These inference methods have been
applied, with varying degrees of success, to data sets with polyploid species, even though
polyploidy violates the mathematical assumptions underlying these methods. Statistical
methods were developed recently for handling specific types of polyploids and so were
parsimony methods that could handle polyploidy more generally yet while excluding
processes such as incomplete lineage sorting. In this article, we introduce a new method
for inferring most parsimonious phylogenetic networks on data that include polyploid
species. Taking gene tree topologies as input, the method seeks a phylogenetic network
that minimizes deep coalescences while accounting for polyploidy. We demonstrate the
performance of the method on both simulated and biological data. The inference method as
well as a method for evaluating evolutionary hypotheses in the form of phylogenetic
networks are implemented and publicly available in the PhyloNet software package.
[Incomplete lineage sorting; minimizing deep coalescences; multilabeled trees;
multispecies network coalescent; phylogenetic networks; polyploidy.]
Collapse
Affiliation(s)
- Zhi Yan
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| | - Zhen Cao
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| | - Yushu Liu
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| | - Huw A Ogilvie
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
- Department of Biosciences, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| |
Collapse
|
11
|
Huber KT, Linz S, Moulton V. The rigid hybrid number for two phylogenetic trees. J Math Biol 2021; 82:40. [PMID: 33770290 PMCID: PMC7997861 DOI: 10.1007/s00285-021-01594-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2019] [Revised: 01/25/2021] [Accepted: 03/13/2021] [Indexed: 11/29/2022]
Abstract
Recently there has been considerable interest in the problem of finding a phylogenetic network with a minimum number of reticulation vertices which displays a given set of phylogenetic trees, that is, a network with minimum hybrid number. Such networks are useful for representing the evolution of species whose genomes have undergone processes such as lateral gene transfer and recombination that cannot be represented appropriately by a phylogenetic tree. Even so, as was recently pointed out in the literature, insisting that a network displays the set of trees can be an overly restrictive assumption when modeling certain evolutionary phenomena such as incomplete lineage sorting. In this paper, we thus consider the less restrictive notion of rigidly displaying which we introduce and study here. More specifically, we characterize when two trees can be rigidly displayed by a certain type of phylogenetic network called a temporal tree-child network in terms of fork-picking sequences. These are sequences of special subconfigurations of the two trees related to the well-studied cherry-picking sequences. We also show that, in case it exists, the rigid hybrid number for two phylogenetic trees is given by a minimum weight fork-picking sequence for the trees. Finally, we consider the relationship between the rigid hybrid number and three closely related numbers; the weak, beaded, and temporal hybrid numbers. In particular, we show that these numbers can all be different even for a fixed pair of trees, and also present an infinite family of pairs of trees which demonstrates that the difference between the rigid hybrid number and the temporal-hybrid number for two phylogenetic trees on the same set of n leaves can grow at least linearly with n.
Collapse
Affiliation(s)
| | - Simone Linz
- School of Computer Science, University of Auckland, Auckland, New Zealand
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, UK
| |
Collapse
|
12
|
Pons JC, Scornavacca C, Cardona G. Generation of Level- k LGT Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:158-164. [PMID: 30703035 DOI: 10.1109/tcbb.2019.2895344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Phylogenetic networks provide a mathematical model to represent the evolution of a set of species where, apart from speciation, reticulate evolutionary events have to be taken into account. Among these events, lateral gene transfers need special consideration due to the asymmetry in the roles of the species involved in such an event. To take into account this asymmetry, LGT networks were introduced. Contrarily to the case of phylogenetic trees, the combinatorial structure of phylogenetic networks is much less known and difficult to describe. One of the approaches in the literature is to classify them according to their level and find generators of the given level that can be used to recursively generate all networks. In this paper, we adapt the concept of generators to the case of LGT networks. We show how these generators, classified by their level, give rise to simple LGT networks of the specified level, and how any LGT network can be obtained from these simple networks, that act as building blocks of the generic structure. The stochastic models of evolution of phylogenetic networks are also much less studied than those for phylogenetic trees. In this setting, we introduce a novel two-parameter model that generates LGT networks. Finally, we present some computer simulations using this model in order to investigate the complexity of the generated networks, depending on the parameters of the model.
Collapse
|
13
|
Hellmuth M, Huber KT, Moulton V. Reconciling event-labeled gene trees with MUL-trees and species networks. J Math Biol 2019; 79:1885-1925. [PMID: 31410552 DOI: 10.1007/s00285-019-01414-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2018] [Revised: 05/08/2019] [Indexed: 11/30/2022]
Abstract
Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be "folded up" to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy.
Collapse
Affiliation(s)
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany. .,Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
| | - Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, UK
| |
Collapse
|
14
|
Van Iersel L, Janssen R, Jones M, Murakami Y, Zeh N. Polynomial-Time Algorithms for Phylogenetic Inference Problems involving duplication and reticulation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 17:14-26. [PMID: 31425045 DOI: 10.1109/tcbb.2019.2934957] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
A common problem in phylogenetics is to try to infer a species phylogeny from gene trees. We consider different variants of this problem. The first variant, called Unrestricted Minimal Episodes Inference, aims at inferring a species tree based on a model with speciation and duplication where duplications are clustered in duplication episodes. The goal is to minimize the number of such episodes. The second variant, Parental Hybridization, aims at inferring a species network based on a model with speciation and reticulation. The goal is to minimize the number of reticulation events. It is a variant of the well-studied Hybridization Number problem with a more generous view on which gene trees are consistent with a given species network. We show that these seemingly different problems are in fact closely related and can, surprisingly, both be solved in polynomial time, using a structure we call "beaded trees". However, we also show that methods based on these problems have to be used with care because the optimal species phylogenies always have a restricted form. To mitigate this problem, we introduce a new variant of Unrestricted Minimal Episodes Inference that minimizes the duplication episode depth. We prove that this new variant of the problem can also be solved in polynomial time.
Collapse
|
15
|
Van Iersel L, Jones M, Scornavacca C. Improved Maximum Parsimony Models for Phylogenetic Networks. Syst Biol 2018; 67:518-542. [PMID: 29272537 DOI: 10.1093/sysbio/syx094] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2017] [Accepted: 12/11/2017] [Indexed: 11/13/2022] Open
Abstract
Phylogenetic networks are well suited to represent evolutionary histories comprising reticulate evolution. Several methods aiming at reconstructing explicit phylogenetic networks have been developed in the last two decades. In this article, we propose a new definition of maximum parsimony for phylogenetic networks that permits to model biological scenarios that cannot be modeled by the definitions currently present in the literature (namely, the "hardwired" and "softwired" parsimony). Building on this new definition, we provide several algorithmic results that lay the foundations for new parsimony-based methods for phylogenetic network reconstruction.
Collapse
Affiliation(s)
- Leo Van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, P.O. Box 5, 2600 AA Delft, the Netherlands
| | - Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, P.O. Box 5, 2600 AA Delft, the Netherlands
| | - Celine Scornavacca
- Institut des Sciences de l'Évolution Université de Montpellier, CNRS, IRD, EPHE CC 064, Place Eugène Bataillon 34095 Montpellier Cedex 05, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France
| |
Collapse
|
16
|
Abstract
The need for structures capable of accommodating complex evolutionary signals such as those found in, for example, wheat has fueled research into phylogenetic networks. Such structures generalize the standard model of a phylogenetic tree by also allowing for cycles and have been introduced in rooted and unrooted form. In contrast to phylogenetic trees or their unrooted versions, rooted phylogenetic networks are notoriously difficult to understand. To help alleviate this, recent work on them has also centered on their "uprooted" versions. By focusing on such graphs and the combinatorial concept of a split system which underpins an unrooted phylogenetic network, we show that not only can a so-called (uprooted) 1-nested network N be obtained from the Buneman graph (sometimes also called a median network) associated with the split system [Formula: see text] induced on the set of leaves of N but also that that graph is, in a well-defined sense, optimal. Along the way, we establish the 1-nested analogue of the fundamental "splits equivalence theorem" for phylogenetic trees and characterize maximal circular split systems.
Collapse
Affiliation(s)
- P. Gambette
- LIGM (UMR 8049), UPEM, CNRS, ESIEE, ENPC, Université Paris-Est, 77454 Marne-la-Vallée, France
| | - K. T. Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK
| | - G. E. Scholz
- School of Computing Sciences, University of East Anglia, Norwich, UK
| |
Collapse
|
17
|
Schliep K, Potts AJ, Morrison DA, Grimm GW. Intertwining phylogenetic trees and networks. Methods Ecol Evol 2017. [DOI: 10.1111/2041-210x.12760] [Citation(s) in RCA: 119] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
18
|
Abstract
BACKGROUND Phylogenetic networks model reticulate evolutionary histories. The last two decades have seen an increased interest in establishing mathematical results and developing computational methods for inferring and analyzing these networks. A salient concept underlying a great majority of these developments has been the notion that a network displays a set of trees and those trees can be used to infer, analyze, and study the network. RESULTS In this paper, we show that in the presence of coalescence effects, the set of displayed trees is not sufficient to capture the network. We formally define the set of parental trees of a network and make three contributions based on this definition. First, we extend the notion of anomaly zone to phylogenetic networks and report on anomaly results for different networks. Second, we demonstrate how coalescence events could negatively affect the ability to infer a species tree that could be augmented into the correct network. Third, we demonstrate how a phylogenetic network can be viewed as a mixture model that lends itself to a novel inference approach via gene tree clustering. CONCLUSIONS Our results demonstrate the limitations of focusing on the set of trees displayed by a network when analyzing and inferring the network. Our findings can form the basis for achieving higher accuracy when inferring phylogenetic networks and open up new venues for research in this area, including new problem formulations based on the notion of a network's parental trees.
Collapse
Affiliation(s)
- Jiafan Zhu
- Department of Computer Science, Rice University, Houston, 77005 Texas USA
| | - Yun Yu
- Department of Computer Science, Rice University, Houston, 77005 Texas USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, 77005 Texas USA
- Department of BioSciences, Rice University, Houston, 77005 Texas USA
| |
Collapse
|