1
|
Bienvenu F. Mathematically tractable models of random phylogenetic networks: an overview of some recent developments. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230301. [PMID: 39976400 DOI: 10.1098/rstb.2023.0301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2024] [Revised: 10/17/2024] [Accepted: 10/18/2024] [Indexed: 02/21/2025] Open
Abstract
Models of random phylogenetic networks have been used since the inception of the field, but the introduction and rigorous study of mathematically tractable models is a much more recent topic that has gained momentum in the last 5 years. This manuscript discusses some recent developments in the field through a selection of examples. The emphasis is on the techniques rather than on the results themselves, and on probabilistic tools rather than on combinatorial ones.This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Collapse
|
2
|
Teo B, Bastide P, Ané C. Leveraging graphical model techniques to study evolution on phylogenetic networks. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230310. [PMID: 39976402 DOI: 10.1098/rstb.2023.0310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2024] [Revised: 08/27/2024] [Accepted: 09/16/2024] [Indexed: 02/21/2025] Open
Abstract
The evolution of molecular and phenotypic traits is commonly modelled using Markov processes along a phylogeny. This phylogeny can be a tree, or a network if it includes reticulations, representing events such as hybridization or admixture. Computing the likelihood of data observed at the leaves is costly as the size and complexity of the phylogeny grows. Efficient algorithms exist for trees, but cannot be applied to networks. We show that a vast array of models for trait evolution along phylogenetic networks can be reformulated as graphical models, for which efficient belief propagation algorithms exist. We provide a brief review of belief propagation on general graphical models, then focus on linear Gaussian models for continuous traits. We show how belief propagation techniques can be applied for exact or approximate (but more scalable) likelihood and gradient calculations, and prove novel results for efficient parameter inference of some models. We highlight the possible fruitful interactions between graphical models and phylogenetic methods. For example, approximate likelihood approaches have the potential to greatly reduce computational costs for phylogenies with reticulations.This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Collapse
Affiliation(s)
- Benjamin Teo
- Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
| | - Paul Bastide
- IMAG, Université de Montpellier, CNRS, Montpellier, France
| | - Cécile Ané
- Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
- Department of Botany, University of Wisconsin-Madison, Madison, WI, USA
| |
Collapse
|
3
|
Kong S, Swofford DL, Kubatko LS. Inference of Phylogenetic Networks From Sequence Data Using Composite Likelihood. Syst Biol 2025; 74:53-69. [PMID: 39387633 DOI: 10.1093/sysbio/syae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2022] [Revised: 09/13/2024] [Accepted: 10/08/2024] [Indexed: 10/12/2024] Open
Abstract
While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between 2 species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing 2 branches to merge into 1, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than 2 existing composite likelihood summary methods (SNaQand PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
Collapse
Affiliation(s)
- Sungsik Kong
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
| | - David L Swofford
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
| | - Laura S Kubatko
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
- Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
| |
Collapse
|
4
|
Heiss J, Huson DH, Steel M. Transformations to Simplify Phylogenetic Networks. Bull Math Biol 2025; 87:20. [PMID: 39752084 DOI: 10.1007/s11538-024-01398-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2024] [Accepted: 12/06/2024] [Indexed: 01/04/2025]
Abstract
The evolutionary relationships between species are typically represented in the biological literature by rooted phylogenetic trees. However, a tree fails to capture ancestral reticulate processes, such as the formation of hybrid species or lateral gene transfer events between lineages, and so the history of life is more accurately described by a rooted phylogenetic network. Nevertheless, phylogenetic networks may be complex and difficult to interpret, so biologists sometimes prefer a tree that summarises the central tree-like trend of evolution. In this paper, we formally investigate methods for transforming an arbitrary phylogenetic network into a tree (on the same set of leaves) and ask which ones (if any) satisfy a simple consistency condition. This consistency condition states that if we add additional species into a phylogenetic network (without otherwise changing this original network) then transforming this enlarged network into a rooted phylogenetic tree induces the same tree on the original set of species as transforming the original network. We show that the LSA (lowest stable ancestor) tree method satisfies this consistency property, whereas several other commonly used methods (and a new one we introduce) do not. We also briefly consider transformations that convert arbitrary phylogenetic networks to another simpler class, namely normal networks.
Collapse
Affiliation(s)
- Johanna Heiss
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
| | - Daniel H Huson
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
| | - Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
| |
Collapse
|
5
|
Coronado TM, Pons JC, Riera G. Counting Cherry Reduction Sequences in Phylogenetic Tree-Child Networks is Counting Linear Extensions. Bull Math Biol 2024; 86:146. [PMID: 39520517 PMCID: PMC11550256 DOI: 10.1007/s11538-024-01374-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2024] [Accepted: 10/18/2024] [Indexed: 11/16/2024]
Abstract
Orchard and tree-child networks share an important property with phylogenetic trees: they can be completely reduced to a single node by iteratively deleting cherries and reticulated cherries. As it is the case with phylogenetic trees, the number of ways in which this can be done gives information about the topology of the network. Here, we show that the problem of computing this number in tree-child networks is akin to that of finding the number of linear extensions of the poset induced by each network, and give an algorithm based on this reduction whose complexity is bounded in terms of the level of the network.
Collapse
Affiliation(s)
- Tomás M Coronado
- Department of Mathematics and Computer Science, Universitat de les Illes Balears, Ctra. de Valldemossa, km 7,5, 07122, Palma, Illes Balears, Spain
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, Universitat de les Illes Balears, Ctra. de Valldemossa, km 7,5, 07122, Palma, Illes Balears, Spain
| | - Gabriel Riera
- Department of Mathematics and Computer Science, Universitat de les Illes Balears, Ctra. de Valldemossa, km 7,5, 07122, Palma, Illes Balears, Spain.
| |
Collapse
|
6
|
Allman ES, Baños H, Mitchell JD, Rhodes JA. TINNiK: inference of the tree of blobs of a species network under the coalescent model. Algorithms Mol Biol 2024; 19:23. [PMID: 39501362 PMCID: PMC11539473 DOI: 10.1186/s13015-024-00266-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2024] [Accepted: 08/22/2024] [Indexed: 11/08/2024] Open
Abstract
The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transfer of genetic information occur. By isolating such regions of a network, inference of the tree of blobs can serve as a starting point for a more detailed investigation, or indicate the limit of what may be inferrable without additional assumptions. Building on our theoretical work on the identifiability of the tree of blobs from gene quartet distributions under the Network Multispecies Coalescent model, we develop an algorithm, TINNiK, for statistically consistent tree of blobs inference. We provide examples of its application to both simulated and empirical datasets, utilizing an implementation in the MSCquartets 2.0 R package.
Collapse
Affiliation(s)
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA.
| | - Hector Baños
- Department of Mathematics, California State University San Bernadino, San Bernadino, CA, USA
| | - Jonathan D Mitchell
- School of Natural Sciences (Mathematics), University of Tasmania, Hobart, TAS, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, TAS, Australia
| | - John A Rhodes
- Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA
| |
Collapse
|
7
|
Chang YS, Fuchs M. Counting Phylogenetic Networks with Few Reticulation Vertices: Galled and Reticulation-Visible Networks. Bull Math Biol 2024; 86:76. [PMID: 38762579 DOI: 10.1007/s11538-024-01309-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Accepted: 05/02/2024] [Indexed: 05/20/2024]
Abstract
We give exact and asymptotic counting results for the number of galled networks and reticulation-visible networks with few reticulation vertices. Our results are obtained with the component graph method, which was introduced by L. Zhang and his coauthors, and generating function techniques. For galled networks, we in addition use analytic combinatorics. Moreover, in an appendix, we consider maximally reticulated reticulation-visible networks and derive their number, too.
Collapse
Affiliation(s)
- Yu-Sheng Chang
- Department of Mathematical Sciences, National Chengchi University, Taipei, 116, Taiwan
| | - Michael Fuchs
- Department of Mathematical Sciences, National Chengchi University, Taipei, 116, Taiwan.
| |
Collapse
|
8
|
Allman ES, Baños H, Mitchell JD, Rhodes JA. TINNiK: Inference of the Tree of Blobs of a Species Network Under the Coalescent. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.20.590418. [PMID: 38712257 PMCID: PMC11071406 DOI: 10.1101/2024.04.20.590418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2024]
Abstract
The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transfer of genetic information occur. By isolating such regions of a network, inference of the tree of blobs can serve as a starting point for a more detailed investigation, or indicate the limit of what may be inferrable without additional assumptions. Building on our theoretical work on the identifiability of the tree of blobs from gene quartet distributions under the Network Multispecies Coalescent model, we develop an algorithm, TINNiK, for statistically consistent tree of blobs inference. We provide examples of its application to both simulated and empirical datasets, utilizing an implementation in the MSCquartets 2.0 R package.
Collapse
Affiliation(s)
- Elizabeth S. Allman
- Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA
| | - Hector Baños
- Department of Mathematics, California State University San Bernadino, San Bernadino, CA, USA
| | - Jonathan D. Mitchell
- School of Natural Sciences (Mathematics), University of Tasmania, Hobart, TAS, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, TAS, Australia
| | - John A. Rhodes
- Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA
| |
Collapse
|
9
|
Francis A, Marchei D, Steel M. Phylogenetic network classes through the lens of expanding covers. J Math Biol 2024; 88:58. [PMID: 38584237 PMCID: PMC10999392 DOI: 10.1007/s00285-024-02075-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2023] [Revised: 11/18/2023] [Accepted: 03/05/2024] [Indexed: 04/09/2024]
Abstract
It was recently shown that a large class of phylogenetic networks, the 'labellable' networks, is in bijection with the set of 'expanding' covers of finite sets. In this paper, we show how several prominent classes of phylogenetic networks can be characterised purely in terms of properties of their associated covers. These classes include the tree-based, tree-child, orchard, tree-sibling, and normal networks. In the opposite direction, we give an example of how a restriction on the set of expanding covers can define a new class of networks, which we call 'spinal' phylogenetic networks.
Collapse
Affiliation(s)
- Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, Australia.
- School of Mathematics and Statistics, University of New South Wales, Sydney, Australia.
| | - Daniele Marchei
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, Australia
- Computer Science, University of Camerino, Camerino, Italy
| | - Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
| |
Collapse
|
10
|
Wagle S, Markin A, Górecki P, Anderson TK, Eulenstein O. Asymmetric Cluster-Based Measures for Comparative Phylogenetics. J Comput Biol 2024; 31:312-327. [PMID: 38634854 PMCID: PMC11057527 DOI: 10.1089/cmb.2023.0338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/19/2024] Open
Abstract
Phylogenetic inference and reconstruction methods generate hypotheses on evolutionary history. Competing inference methods are frequently used, and the evaluation of the generated hypotheses is achieved using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost to compare the topology of two trees, but this cost is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance called the Cluster Affinity (CA) distance was introduced. However, CA distances are symmetric and cannot compare different types of trees. These asymmetric comparisons occur when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original Affinity distance to compare heterogeneous trees called the asymmetric CA cost. We also develop a biologically interpretable cost, the Cluster Support cost that normalizes by cluster size across gene trees. The characteristics of these costs are similar to the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use these to standardize the cost to be applicable in practice. These costs provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.
Collapse
Affiliation(s)
- Sanket Wagle
- Department of Computer Science, Iowa State University, Ames, Iowa, USA
| | - Alexey Markin
- National Animal Disease Center, USDA-ARS, Ames, Iowa, USA
| | - Paweł Górecki
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | | | - Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, Iowa, USA
| |
Collapse
|
11
|
Landry K, Tremblay-Savard O, Lafond M. A Fixed-Parameter Tractable Algorithm for Finding Agreement Cherry-Reduced Subnetworks in Level-1 Orchard Networks. J Comput Biol 2024; 31:360-379. [PMID: 38117611 DOI: 10.1089/cmb.2023.0317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2023] Open
Abstract
Phylogenetic networks are increasingly being considered better suited to represent the complexity of the evolutionary relationships between species. One class of phylogenetic networks that have received a lot of attention recently is the class of orchard networks, which is composed of networks that can be reduced to a single leaf using cherry reductions. Cherry reductions, also called cherry-picking operations, remove either a leaf of a simple cherry (sibling leaves sharing a parent) or a reticulate edge of a reticulate cherry (two leaves whose parents are connected by a reticulate edge). In this article, we present a fixed-parameter tractable algorithm to solve the problem of finding a maximum agreement cherry-reduced subnetwork (MACRS) between two rooted binary level-1 networks. This is the first exact algorithm proposed to solve the MACRS problem. As proven in an earlier work, there is a direct relationship between finding an MACRS and calculating a distance based on cherry operations. As a result, the proposed algorithm also provides a distance that can be used for the comparison of level-1 networks.
Collapse
Affiliation(s)
- Kaari Landry
- Department of Computer Science, University of Manitoba, Winnipeg, Canada
| | | | - Manuel Lafond
- Département d'informatique, Université de Sherbrooke, Sherbrooke, Canada
| |
Collapse
|
12
|
Agranat-Tamir L, Mathur S, Rosenberg NA. Enumeration of Rooted Binary Unlabeled Galled Trees. Bull Math Biol 2024; 86:45. [PMID: 38519704 PMCID: PMC10959814 DOI: 10.1007/s11538-024-01270-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Accepted: 02/15/2024] [Indexed: 03/25/2024]
Abstract
Rooted binary galled trees generalize rooted binary trees to allow a restricted class of cycles, known as galls. We build upon the Wedderburn-Etherington enumeration of rooted binary unlabeled trees with n leaves to enumerate rooted binary unlabeled galled trees with n leaves, also enumerating rooted binary unlabeled galled trees with n leaves and g galls, 0 ⩽ g ⩽ ⌊ n - 1 2 ⌋ . The enumerations rely on a recursive decomposition that considers subtrees descended from the nodes of a gall, adopting a restriction on galls that amounts to considering only the rooted binary normal unlabeled galled trees in our enumeration. We write an implicit expression for the generating function encoding the numbers of trees for all n. We show that the number of rooted binary unlabeled galled trees grows with 0.0779 ( 4 . 8230 n ) n - 3 2 , exceeding the growth 0.3188 ( 2 . 4833 n ) n - 3 2 of the number of rooted binary unlabeled trees without galls. However, the growth of the number of galled trees with only one gall has the same exponential order 2.4833 as the number with no galls, exceeding it only in the subexponential term, 0.3910 n 1 2 compared to 0.3188 n - 3 2 . For a fixed number of leaves n, the number of galls g that produces the largest number of rooted binary unlabeled galled trees lies intermediate between the minimum of g = 0 and the maximum of g = ⌊ n - 1 2 ⌋ . We discuss implications in mathematical phylogenetics.
Collapse
Affiliation(s)
| | - Shaili Mathur
- Department of Biology, Stanford University, Stanford, CA, 94305, USA
| | - Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, CA, 94305, USA
| |
Collapse
|
13
|
Wu Z, Solís-Lemus C. Ultrafast learning of four-node hybridization cycles in phylogenetic networks using algebraic invariants. BIOINFORMATICS ADVANCES 2024; 4:vbae014. [PMID: 38384862 PMCID: PMC10879748 DOI: 10.1093/bioadv/vbae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 12/23/2023] [Accepted: 02/06/2024] [Indexed: 02/23/2024]
Abstract
Motivation The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented with a fully bifurcating process which cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet not scalable for big data, because they need to perform a heuristic search in the space of networks as well as numerical optimization that can be NP-hard. Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of four-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. Results Our novel hybrid detection methodology is optimization-free as it only requires the evaluation of polynomial equations, and as such, it bypasses the traversal of network space, yielding a computational speed at least 10 times faster than the fastest-to-date network methods. We illustrate our method's performance on simulated and real data from the genus Canis. Availability and implementation We present an open-source publicly available Julia package PhyloDiamond.jl available at https://github.com/solislemuslab/PhyloDiamond.jl with broad applicability within the evolutionary community.
Collapse
Affiliation(s)
- Zhaoxing Wu
- Department of Statistics, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53706, United States
| | - Claudia Solís-Lemus
- Department of Plant Pathology, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53706, United States
| |
Collapse
|
14
|
Cardona G, Ribas G, Pons JC. Generation of Orchard and Tree-Child Networks. Bull Math Biol 2023; 86:10. [PMID: 38117376 PMCID: PMC10733240 DOI: 10.1007/s11538-023-01239-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2023] [Accepted: 11/22/2023] [Indexed: 12/21/2023]
Abstract
Phylogenetic networks are an extension of phylogenetic trees that allow for the representation of reticulate evolution events. One of the classes of networks that has gained the attention of the scientific community over the last years is the class of orchard networks, that generalizes tree-child networks, one of the most studied classes of networks. In this paper we focus on the combinatorial and algorithmic problem of the generation of binary orchard networks, and also of binary tree-child networks. To this end, we use that these networks are defined as those that can be recovered by reversing a certain reduction process. Then, we show how to choose a "minimum" reduction process among all that can be applied to a network, and hence we get a unique representation of the network that, in fact, can be given in terms of sequences of pairs of integers, whose length is related to the number of leaves and reticulations of the network. Therefore, the generation of networks is reduced to the generation of such sequences of pairs. Our main result is a recursive method for the efficient generation of all minimum sequences, and hence of all orchard (or tree-child) networks with a given number of leaves and reticulations. An implementation in C of the algorithms described in this paper, along with some computational experiments, can be downloaded from the public repository https://github.com/gerardet46/OrchardGenerator . Using this implementation, we have computed the number of binary orchard networks with at most 6 leaves and 8 reticulations.
Collapse
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, Ctra. Valldemossa, km. 7.5, 07122, Palma, Spain
| | - Gerard Ribas
- Higher Polytechnic School, University of the Balearic Islands, Ctra. Valldemossa, km. 7.5, 07122, Palma, Spain
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Ctra. Valldemossa, km. 7.5, 07122, Palma, Spain.
| |
Collapse
|
15
|
Hellmuth M, Schaller D, Stadler PF. Clustering systems of phylogenetic networks. Theory Biosci 2023; 142:301-358. [PMID: 37573261 PMCID: PMC10564800 DOI: 10.1007/s12064-023-00398-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Accepted: 06/25/2023] [Indexed: 08/14/2023]
Abstract
Rooted acyclic graphs appear naturally when the phylogenetic relationship of a set X of taxa involves not only speciations but also recombination, horizontal transfer, or hybridization that cannot be captured by trees. A variety of classes of such networks have been discussed in the literature, including phylogenetic, level-1, tree-child, tree-based, galled tree, regular, or normal networks as models of different types of evolutionary processes. Clusters arise in models of phylogeny as the sets [Formula: see text] of descendant taxa of a vertex v. The clustering system [Formula: see text] comprising the clusters of a network N conveys key information on N itself. In the special case of rooted phylogenetic trees, T is uniquely determined by its clustering system [Formula: see text]. Although this is no longer true for networks in general, it is of interest to relate properties of N and [Formula: see text]. Here, we systematically investigate the relationships of several well-studied classes of networks and their clustering systems. The main results are correspondences of classes of networks and clustering systems of the following form: If N is a network of type [Formula: see text], then [Formula: see text] satisfies [Formula: see text], and conversely if [Formula: see text] is a clustering system satisfying [Formula: see text] then there is network N of type [Formula: see text] such that [Formula: see text].This, in turn, allows us to investigate the mutual dependencies between the distinct types of networks in much detail.
Collapse
Affiliation(s)
- Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, Albanovägen 28, 10691 Stockholm, Sweden
| | - David Schaller
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Vienna, Austria
- Facultad de Ciencias, Universidad National de Colombia, Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501 USA
| |
Collapse
|
16
|
Francis A, Steel M. Labellable Phylogenetic Networks. Bull Math Biol 2023; 85:46. [PMID: 37097343 PMCID: PMC10129972 DOI: 10.1007/s11538-023-01157-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Accepted: 04/03/2023] [Indexed: 04/26/2023]
Abstract
Phylogenetic networks are mathematical representations of evolutionary history that are able to capture both tree-like evolutionary processes (speciations) and non-tree-like 'reticulate' processes such as hybridization or horizontal gene transfer. The additional complexity that comes with this capacity, however, makes networks harder to infer from data, and more complicated to work with as mathematical objects. In this paper, we define a new, large class of phylogenetic networks, that we call labellable, and show that they are in bijection with the set of 'expanding covers' of finite sets. This correspondence is a generalisation of the encoding of phylogenetic forests by partitions of finite sets. Labellable networks can be characterised by a simple combinatorial condition, and we describe the relationship between this large class and other commonly studied classes. Furthermore, we show that all phylogenetic networks have a quotient network that is labellable.
Collapse
Affiliation(s)
- Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Penrith, Australia.
| | - Mike Steel
- Biomathematics Research Centre, Canterbury University, Christchurch, New Zealand
| |
Collapse
|
17
|
Zaharias P, Warnow T. Recent progress on methods for estimating and updating large phylogenies. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210244. [PMID: 35989607 PMCID: PMC9393559 DOI: 10.1098/rstb.2021.0244] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Accepted: 01/07/2022] [Indexed: 12/20/2022] Open
Abstract
With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Collapse
Affiliation(s)
- Paul Zaharias
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
| |
Collapse
|
18
|
Kong S. Digest: Frequent hybridization in Darevskia rarely leads to the evolution of asexuality. Evolution 2022; 76:2216-2217. [PMID: 35909234 PMCID: PMC9546131 DOI: 10.1111/evo.14587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Accepted: 07/22/2022] [Indexed: 01/22/2023]
Abstract
Asexuality in vertebrates is often generated via hybridization, but is it a rare product of pervasive hybridization or a common product of rare hybridization? Freitas et al. show that hybridization is frequent among the sexual species of Darevskia, although the crossings between parents of the asexual hybrids are undetected. This study illustrates that hybridization is not extraordinary in nature, and thus scalable phylogenetic network inference methods, rather than phylogenetic trees, are needed to accurately represent the true evolutionary history.
Collapse
Affiliation(s)
- Sungsik Kong
- Department of EvolutionEcology, and Organismal Biology, The Ohio State UniversityColumbusOhioUSA
| |
Collapse
|