1
|
Bernardini G, van Iersel L, Julien E, Stougie L. Inferring phylogenetic networks from multifurcating trees via cherry picking and machine learning. Mol Phylogenet Evol 2024; 199:108137. [PMID: 39029549 DOI: 10.1016/j.ympev.2024.108137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2023] [Revised: 02/19/2024] [Accepted: 06/29/2024] [Indexed: 07/21/2024]
Abstract
The Hybridization problem asks to reconcile a set of conflicting phylogenetic trees into a single phylogenetic network with the smallest possible number of reticulation nodes. This problem is computationally hard and previous solutions are limited to small and/or severely restricted data sets, for example, a set of binary trees with the same taxon set or only two non-binary trees with non-equal taxon sets. Building on our previous work on binary trees, we present FHyNCH, the first algorithmic framework to heuristically solve the Hybridization problem for large sets of multifurcating trees whose sets of taxa may differ. Our heuristics combine the cherry-picking technique, recently proposed to solve the same problem for binary trees, with two carefully designed machine-learning models. We demonstrate that our methods are practical and produce qualitatively good solutions through experiments on both synthetic and real data sets.
Collapse
Affiliation(s)
| | - Leo van Iersel
- Delft Institute of Applied Mathematics, Delft, The Netherlands
| | - Esther Julien
- Delft Institute of Applied Mathematics, Delft, The Netherlands.
| | - Leen Stougie
- CWI, Amsterdam, the Netherlands; Vrije Universiteit, Amsterdam, The Netherlands; INRIA-Erable, France
| |
Collapse
|
2
|
Lutteropp S, Scornavacca C, Kozlov AM, Morel B, Stamatakis A. NetRAX: accurate and fast maximum likelihood phylogenetic network inference. BIOINFORMATICS (OXFORD, ENGLAND) 2022; 38:3725-3733. [PMID: 35713506 DOI: 10.1101/2021.08.30.458194] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 05/11/2022] [Accepted: 06/14/2022] [Indexed: 05/26/2023]
Abstract
MOTIVATION Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity and current tools can only analyze small datasets. RESULTS We present NetRAX, a tool for maximum likelihood (ML) inference of phylogenetic networks in the absence of ILS. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of 'displayed trees'. NetRAX can infer ML phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in Bayesian Information Criterion (BIC) score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8000 sites, 30 taxa and 3 reticulations completes within a few minutes on a standard laptop. AVAILABILITY AND IMPLEMENTATION Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Sarah Lutteropp
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
| | - Céline Scornavacca
- Institut des Sciences de l'Évolution Université de Montpellier, CNRS, IRD, EPHE Place Eugène Bataillon, 34095 Montpellier Cedex 05, France
| | - Alexey M Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
| | - Benoit Morel
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
| |
Collapse
|
3
|
Karimi N, Grover CE, Gallagher JP, Wendel JF, Ané C, Baum DA. Reticulate Evolution Helps Explain Apparent Homoplasy in Floral Biology and Pollination in Baobabs (Adansonia; Bombacoideae; Malvaceae). Syst Biol 2019; 69:462-478. [DOI: 10.1093/sysbio/syz073] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2019] [Revised: 10/24/2019] [Accepted: 10/26/2019] [Indexed: 12/17/2022] Open
Abstract
Abstract
Baobabs (Adansonia) are a cohesive group of tropical trees with a disjunct distribution in Australia, Madagascar, and continental Africa, and diverse flowers associated with two pollination modes. We used custom-targeted sequence capture in conjunction with new and existing phylogenetic comparative methods to explore the evolution of floral traits and pollination systems while allowing for reticulate evolution. Our analyses suggest that relationships in Adansonia are confounded by reticulation, with network inference methods supporting at least one reticulation event. The best supported hypothesis involves introgression between Adansonia rubrostipa and core Longitubae, both of which are hawkmoth pollinated with yellow/red flowers, but there is also some support for introgression between the African lineage and Malagasy Brevitubae, which are both mammal-pollinated with white flowers. New comparative methods for phylogenetic networks were developed that allow maximum-likelihood inference of ancestral states and were applied to study the apparent homoplasy in floral biology and pollination mode seen in Adansonia. This analysis supports a role for introgressive hybridization in morphological evolution even in a clade with highly divergent and geographically widespread species. Our new comparative methods for discrete traits on species networks are implemented in the software PhyloNetworks. [Comparative methods; Hyb-Seq; introgression; network inference; population trees; reticulate evolution; species tree inference; targeted sequence capture.]
Collapse
Affiliation(s)
- Nisa Karimi
- Department of Botany, University of Wisconsin – Madison, 430 Lincoln Drive, Madison, WI 53706, USA
| | - Corrinne E Grover
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, 2200 Osborn Drive, Ames, IA 50011, USA
| | - Joseph P Gallagher
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, 2200 Osborn Drive, Ames, IA 50011, USA
- Department of Biology, University of Massachusetts, 611 North Pleasant Street, Amherst, MA 01003, USA
| | - Jonathan F Wendel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, 2200 Osborn Drive, Ames, IA 50011, USA
| | - Cécile Ané
- Department of Botany, University of Wisconsin – Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Department of Statistics, University of Wisconsin – Madison, 1300 University Ave, WI, 53706, USA
| | - David A Baum
- Department of Botany, University of Wisconsin – Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Wisconsin Institute for Discovery, 330 N Orchard Street, Madison, 430 Lincoln Drive, Madison, WI 53706, USA
| |
Collapse
|
4
|
Zhu S, Degnan JH. Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent. Syst Biol 2018; 66:283-298. [PMID: 27780899 DOI: 10.1093/sysbio/syw097] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2015] [Accepted: 03/08/2016] [Indexed: 11/13/2022] Open
Abstract
Recent work in estimating species relationships from gene trees has included inferring networks assuming that past hybridization has occurred between species. Probabilistic models using the multispecies coalescent can be used in this framework for likelihood-based inference of both network topologies and parameters, including branch lengths and hybridization parameters. A difficulty for such methods is that it is not always clear whether, or to what extent, networks are identifiable-that is whether there could be two distinct networks that lead to the same distribution of gene trees. For cases in which incomplete lineage sorting occurs in addition to hybridization, we demonstrate a new representation of the species network likelihood that expresses the probability distribution of the gene tree topologies as a linear combination of gene tree distributions given a set of species trees. This representation makes it clear that in some cases in which two distinct networks give the same distribution of gene trees when sampling one allele per species, the two networks can be distinguished theoretically when multiple individuals are sampled per species. This result means that network identifiability is not only a function of the trees displayed by the networks but also depends on allele sampling within species. We additionally give an example in which two networks that display exactly the same trees can be distinguished from their gene trees even when there is only one lineage sampled per species. [gene tree, hybridization, identifiability, maximum likelihood, species tree, phylogeny.].
Collapse
Affiliation(s)
- Sha Zhu
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
| | - James H Degnan
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87110, USA
| |
Collapse
|
5
|
Abstract
Phylogeography, and its extensions into comparative phylogeography, have their roots in the layering of gene trees across geography, a paradigm that was greatly facilitated by the nonrecombining, fast evolution provided by animal mtDNA. As phylogeography moves into the era of next-generation sequencing, the specter of reticulation at several levels-within loci and genomes in the form of recombination and across populations and species in the form of introgression-has raised its head with a prominence even greater than glimpsed during the nuclear gene PCR era. Here we explore the theme of reticulation in comparative phylogeography, speciation analysis, and phylogenomics, and ask how the centrality of gene trees has fared in the next-generation era. To frame these issues, we first provide a snapshot of multilocus phylogeographic studies across the Carpentarian Barrier, a prominent biogeographic barrier dividing faunas spanning the monsoon tropics in northern Australia. We find that divergence across this barrier is evident in most species, but is heterogeneous in time and demographic history, often reflecting the taxonomic distinctness of lineages spanning it. We then discuss a variety of forces generating reticulate patterns in phylogeography, including introgression, contact zones, and the potential selection-driven outliers on next-generation molecular markers. We emphasize the continued need for demographic models incorporating reticulation at the level of genomes and populations, and conclude that gene trees, whether explicit or implicit, should continue to play a role in the future of phylogeography.
Collapse
|
6
|
Gambette P, van Iersel L, Jones M, Lafond M, Pardi F, Scornavacca C. Rearrangement moves on rooted phylogenetic networks. PLoS Comput Biol 2017; 13:e1005611. [PMID: 28763439 PMCID: PMC5557604 DOI: 10.1371/journal.pcbi.1005611] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2017] [Revised: 08/15/2017] [Accepted: 05/27/2017] [Indexed: 12/05/2022] Open
Abstract
Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic network—that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose “horizontal” moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and “vertical” moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves—named rNNI and rSPR—reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results—separating the contributions of horizontal and vertical moves—we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a solid basis for practical phylogenetic network reconstruction. Phylogenetic networks are used to represent reticulate evolution, that is, cases in which the tree-of-life metaphor for evolution breaks down, because some of its branches have merged at one or several points in the past. This may occur, for example, when some organisms in the phylogeny are hybrids. In this paper, we deal with an elementary question for the reconstruction of phylogenetic networks: how to explore the space of all possible networks. The fundamental component for this is the set of operations that should be employed to generate alternative hypotheses for what happened in the past—which serve as basic blocks for optimization techniques such as hill-climbing. Although these approaches have a long tradition in classic tree-based phylogenetics, their application to networks that explicitly represent reticulate evolution is relatively unexplored. This paper provides the fundamental definitions and theoretical results for subsequent work in practical methods for phylogenetic network reconstruction: we subdivide networks into layers, according to a generally-accepted measure of their complexity, and provide operations that allow both to fully explore each layer, and to move across different layers. These operations constitute natural generalizations of well-known operations for the exploration of the space of phylogenetic trees, the lowest layer in the hierarchy described above.
Collapse
Affiliation(s)
- Philippe Gambette
- Laboratoire d’Informatique Gaspard-Monge (LIGM), Université Paris-Est, CNRS, ENPC, ESIEE Paris, UPEM, F-77454, Marne-la-Vallée, France
| | - Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Postbus 5031, 2628 CD Delft, The Netherlands
| | - Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, Postbus 5031, 2628 CD Delft, The Netherlands
| | - Manuel Lafond
- Department of Mathematics and Statistics, University of Ottawa, K1N 6N5 Ottawa, Canada
| | - Fabio Pardi
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, 34095 Montpellier Cedex 5, France
- Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France
- * E-mail:
| | - Celine Scornavacca
- Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France
- Institut des Sciences de l’Evolution (ISE-M), Université de Montpellier, CNRS, IRD, EPHE, 34095 Montpellier Cedex 5, France
| |
Collapse
|
7
|
Implementing and testing the multispecies coalescent model: A valuable paradigm for phylogenomics. Mol Phylogenet Evol 2016; 94:447-62. [DOI: 10.1016/j.ympev.2015.10.027] [Citation(s) in RCA: 265] [Impact Index Per Article: 33.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
|
8
|
|
9
|
Pardi F, Scornavacca C. Reconstructible phylogenetic networks: do not distinguish the indistinguishable. PLoS Comput Biol 2015; 11:e1004135. [PMID: 25849429 PMCID: PMC4388854 DOI: 10.1371/journal.pcbi.1004135] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2014] [Accepted: 01/19/2015] [Indexed: 12/21/2022] Open
Abstract
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.
Collapse
Affiliation(s)
- Fabio Pardi
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506) CNRS, Université de Montpellier, France
- Institut de Biologie Computationnelle, Montpellier, France
| | - Celine Scornavacca
- Institut des Sciences de l’Evolution de Montpellier (ISE-M, UMR 5554) CNRS, IRD, Université de Montpellier, France
- Institut de Biologie Computationnelle, Montpellier, France
| |
Collapse
|