1
|
Ané C, Fogg J, Allman ES, Baños H, Rhodes JA. Anomalous networks under the multispecies coalescent: theory and prevalence. J Math Biol 2024; 88:29. [PMID: 38372830 DOI: 10.1007/s00285-024-02050-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2023] [Revised: 01/18/2024] [Accepted: 01/21/2024] [Indexed: 02/20/2024]
Abstract
Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of [Formula: see text]-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
Collapse
Affiliation(s)
- Cécile Ané
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA.
- Department of Botany, University of Wisconsin - Madison, Madison, WI, 53706, USA.
| | - John Fogg
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
| | - Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775-6660, USA
| | - Hector Baños
- Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
| | - John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775-6660, USA
| |
Collapse
|
2
|
Frankel LE, Ané C. Summary Tests of Introgression Are Highly Sensitive to Rate Variation Across Lineages. Syst Biol 2023; 72:1357-1369. [PMID: 37698548 DOI: 10.1093/sysbio/syad056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Revised: 07/07/2023] [Accepted: 08/29/2023] [Indexed: 09/13/2023] Open
Abstract
The evolutionary implications and frequency of hybridization and introgression are increasingly being recognized across the tree of life. To detect hybridization from multi-locus and genome-wide sequence data, a popular class of methods are based on summary statistics from subsets of 3 or 4 taxa. However, these methods often carry the assumption of a constant substitution rate across lineages and genes, which is commonly violated in many groups. In this work, we quantify the effects of rate variation on the D test (also known as ABBA-BABA test), the D3 test, and HyDe. All 3 tests are used widely across a range of taxonomic groups, in part because they are very fast to compute. We consider rate variation across species lineages, across genes, their lineage-by-gene interaction, and rate variation across gene-tree edges. We simulated species networks according to a birth-death-hybridization process, so as to capture a range of realistic species phylogenies. For all 3 methods tested, we found a marked increase in the false discovery of reticulation (type-1 error rate) when there is rate variation across species lineages. The D3 test was the most sensitive, with around 80% type-1 error, such that D3 appears to more sensitive to a departure from the clock than to the presence of reticulation. For all 3 tests, the power to detect hybridization events decreased as the number of hybridization events increased, indicating that multiple hybridization events can obscure one another if they occur within a small subset of taxa. Our study highlights the need to consider rate variation when using site-based summary statistics, and points to the advantages of methods that do not require assumptions on evolutionary rates across lineages or across genes.
Collapse
Affiliation(s)
- Lauren E Frankel
- Department of Botany, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Cécile Ané
- Department of Botany, University of Wisconsin-Madison, Madison, WI 53706, USA
- Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
| |
Collapse
|
3
|
Linz S, Wicke K. Exploring spaces of semi-directed level-1 networks. J Math Biol 2023; 87:70. [PMID: 37831304 PMCID: PMC10575830 DOI: 10.1007/s00285-023-02004-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 09/14/2023] [Accepted: 09/20/2023] [Indexed: 10/14/2023]
Abstract
Semi-directed phylogenetic networks have recently emerged as a class of phylogenetic networks sitting between rooted (directed) and unrooted (undirected) phylogenetic networks as they contain both directed as well as undirected edges. While various spaces of rooted phylogenetic networks and unrooted phylogenetic networks have been analyzed in recent years and several rearrangement moves to traverse these spaces have been introduced, little is known about spaces of semi-directed phylogenetic networks. Here, we propose a simple rearrangement move for semi-directed phylogenetic networks, called cut edge transfer (CET), and show that the space of semi-directed level-1 networks with precisely k reticulations is connected under CET. This level-1 space is currently the predominantly used search space for most algorithms that reconstruct semi-directed phylogenetic networks. Our results imply that every semi-directed level-1 network with a fixed number of reticulations and leaf set can be reached from any other such network by a sequence of CETs. By introducing two additional moves, R[Formula: see text] and R[Formula: see text], that allow for the addition and deletion, respectively, of a reticulation, we then establish connectedness for the space of all semi-directed level-1 networks on a fixed leaf set. As a byproduct of our results for semi-directed phylogenetic networks, we also show that the space of rooted level-1 networks with a fixed number of reticulations and leaf set is connected under CET, when translated into the rooted setting.
Collapse
Affiliation(s)
- Simone Linz
- School of Computer Science, University of Auckland, Auckland, New Zealand.
| | - Kristina Wicke
- Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ, USA
| |
Collapse
|
4
|
Ané C, Fogg J, Allman ES, Baños H, Rhodes JA. ANOMALOUS NETWORKS UNDER THE MULTISPECIES COALESCENT: THEORY AND PREVALENCE. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.18.553582. [PMID: 37662314 PMCID: PMC10473666 DOI: 10.1101/2023.08.18.553582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2023]
Abstract
Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of 32-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
Collapse
Affiliation(s)
- Cécile Ané
- Department of Statistics, University of Wisconsin - Madison, WI, 53706, USA
- Department of Botany, University of Wisconsin - Madison, WI, 53706, USA
| | - John Fogg
- Department of Statistics, University of Wisconsin - Madison, WI, 53706, USA
| | - Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska - Fairbanks, AK, 99775-6660, USA
| | - Hector Baños
- Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - John A Rhodes
- Department of Mathematics and Statistics, University of Alaska - Fairbanks, AK, 99775-6660, USA
| |
Collapse
|
5
|
Allman ES, Baños H, Mitchell JD, Rhodes JA. The tree of blobs of a species network: identifiability under the coalescent. J Math Biol 2022; 86:10. [PMID: 36472708 PMCID: PMC10062380 DOI: 10.1007/s00285-022-01838-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2022] [Revised: 08/31/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022]
Abstract
Inference of species networks from genomic data under the Network Multispecies Coalescent Model is currently severely limited by heavy computational demands. It also remains unclear how complicated networks can be for consistent inference to be possible. As a step toward inferring a general species network, this work considers its tree of blobs, in which non-cut edges are contracted to nodes, so only tree-like relationships between the taxa are shown. An identifiability theorem, that most features of the unrooted tree of blobs can be determined from the distribution of gene quartet topologies, is established. This depends upon an analysis of gene quartet concordance factors under the model, together with a new combinatorial inference rule. The arguments for this theoretical result suggest a practical algorithm for tree of blobs inference, to be fully developed in a subsequent work.
Collapse
Affiliation(s)
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
| | - Hector Baños
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
- Department of Mathematics and Statistics, Faculty of Science, Dalhousie University, Halifax, NS, Canada
| | - Jonathan D Mitchell
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
- School of Natural Sciences (Mathematics), University of Tasmania, Hobart, TAS, 7001, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, TAS, 7001, Australia
| | - John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA.
| |
Collapse
|