1
Ané C, Fogg J, Allman ES, Baños H, Rhodes JA. Anomalous networks under the multispecies coalescent: theory and prevalence. J Math Biol 2024; 88:29. [PMID: 38372830 DOI: 10.1007/s00285-024-02050-7] [Received: 07/15/2023] [Revised: 01/18/2024] [Accepted: 01/21/2024] [Indexed: 02/20/2024]
Abstract
Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of $3_2$-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
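The definition of an anomalous network given in this abstract can be sketched as a simple check on three quartet concordance factors. This is a minimal illustration, not the algorithm from the paper: the function names `tree_quartet_cfs` and `is_anomalous`, the topology labels, and the numeric values in the usage note are all hypothetical. The tree formula is the standard multispecies coalescent result for a 4-taxon species tree and is included only for contrast; it does not compute concordance factors on networks.

```python
import math


def tree_quartet_cfs(t):
    """Quartet concordance factors on a 4-taxon species TREE ab|cd.

    Assumes the multispecies coalescent with internal branch length t
    in coalescent units (standard tree result, not the network algorithm).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    minor = math.exp(-t) / 3.0
    return {"ab|cd": 1.0 - 2.0 * minor, "ac|bd": minor, "ad|bc": minor}


def is_anomalous(cfs, displayed):
    """Return True if some displayed topology is less frequent in gene
    trees than some topology that is not displayed.

    cfs: dict mapping topology label -> concordance factor.
    displayed: set of topology labels displayed in the network.
    """
    shown = [p for q, p in cfs.items() if q in displayed]
    hidden = [p for q, p in cfs.items() if q not in displayed]
    if not shown or not hidden:
        return False  # nothing to compare
    # A displayed/non-displayed pair with cf(displayed) < cf(non-displayed)
    # exists exactly when the smallest displayed value is below the
    # largest non-displayed value.
    return min(shown) < max(hidden)
```

As a usage example, `is_anomalous(tree_quartet_cfs(1.0), {"ab|cd"})` is false, since the displayed topology is the most frequent on a tree. With made-up values such as `{"ab|cd": 0.30, "ac|bd": 0.36, "ad|bc": 0.34}` and displayed set `{"ab|cd", "ac|bd"}`, the check returns true because the displayed `ab|cd` is rarer than the non-displayed `ad|bc`.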
Affiliation(s)
- Cécile Ané
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Department of Botany, University of Wisconsin - Madison, Madison, WI, 53706, USA
- John Fogg
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
- Elizabeth S Allman
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775-6660, USA
- Hector Baños
  - Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, NS, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- John A Rhodes
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775-6660, USA
2
Allman ES, Baños H, Garrote-Lopez M, Rhodes JA. Identifiability of level-1 species networks from gene tree quartets. ArXiv 2024:arXiv:2401.06290v1. [PMID: 38259350 PMCID: PMC10802673] [Indexed: 01/24/2024]
Abstract
When hybridization or other forms of lateral gene transfer have occurred, evolutionary relationships of species are better represented by phylogenetic networks than by trees. While inference of such networks remains challenging, several recently proposed methods are based on quartet concordance factors - the probabilities that a tree relating a gene sampled from the species displays the possible 4-taxon relationships. Building on earlier results, we investigate what level-1 network features are identifiable from concordance factors under the network multispecies coalescent model. We obtain results on both topological features of the network, and numerical parameters, uncovering a number of failures of identifiability related to 3-cycles in the network.
3
Boisseau M, Mach N, Basiaga M, Kuzmina T, Laugier C, Sallé G. Patterns of variation in equine strongyle community structure across age groups and gut compartments. Parasit Vectors 2023; 16:64. [PMID: 36765420 PMCID: PMC9921056 DOI: 10.1186/s13071-022-05645-5] [Received: 07/06/2022] [Accepted: 12/28/2022] [Indexed: 02/12/2023]
Abstract
BACKGROUND Equine strongyles encompass more than 64 species of nematode worms that are responsible for growth retardation and the death of animals. The factors underpinning variation in the structure of the equine strongyle community remain unknown.
METHODS Using horse-based strongyle community data collected after horse deworming (48 horses in Poland, 197 horses in Ukraine), we regressed species richness and the Gini-Simpson index upon the horse's age, faecal egg count, sex and operation of origin. Using the Ukrainian observations, we applied a hierarchical diversity partitioning framework to estimate how communities were remodelled across operations, age groups and horses. Lastly, strongyle species counts collected after necropsy (46 horses in France, 150 in Australia) were considered for analysis of their co-occurrences across intestinal compartments using a joint species distribution modelling approach.
RESULTS First, inter-operation variation accounted for > 45% of the variance in species richness or the Gini-Simpson index (which relates to species dominance in communities). Species richness decreased with horse's age (P = 0.01) and showed a mild increase with parasite egg excretion (P < 0.1), but the Gini-Simpson index was neither associated with parasite egg excretion (P = 0.8) nor with horse age (P = 0.37). Second, within-host diversity represented half of the overall diversity across Ukrainian operations. While this is expected to erase species diversity across communities, community dissimilarity between horse age classes was the second most important contributor to overall diversity (25.8%). Third, analysis of species abundance data quantified at necropsy defined a network of positive co-occurrences between the four most prevalent strongyle genera. This pattern was common to necropsies performed in France and Australia.
CONCLUSIONS Taken together, these results show a pattern of β-diversity maintenance across age classes combined with positive co-occurrences that might be grounded by priority effects between the major species.
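The two response variables named in this abstract, species richness and the Gini-Simpson index, can be computed from per-host species counts as sketched below. This is a minimal sketch assuming the usual definitions (richness as the number of species present, Gini-Simpson as one minus the sum of squared relative abundances); the function names and the example counts are hypothetical and are not taken from the study.

```python
def species_richness(counts):
    """Number of species with at least one individual in the sample."""
    return sum(1 for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index: 1 - sum(p_i ** 2), with p_i the relative
    abundance of species i. Values near 0 indicate that one species
    dominates; higher values indicate a more even community.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one individual is required")
    return 1.0 - sum((c / total) ** 2 for c in counts)
```

For instance, a hypothetical host carrying counts `[10, 10]` for two species has richness 2 and a Gini-Simpson index of 0.5, whereas a host with counts `[5, 0, 0]` has richness 1 and an index of 0.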
Affiliation(s)
- Michel Boisseau
  - INRAE, ISP, Université de Tours, Nouzilly, France
  - IHAP, INRAE, ENVT, Université de Toulouse, Toulouse, France
- Núria Mach
  - IHAP, INRAE, ENVT, Université de Toulouse, Toulouse, France
- Marta Basiaga
  - Department of Zoology and Animal Welfare, Faculty of Animal Science, University of Agriculture in Kraków, 24/28 Mickiewicza Av., 30-059 Cracow, Poland
- Tetiana Kuzmina
  - Department of Parasitology, I.I. Schmalhausen Institute of Zoology, National Academy of Sciences (NAS) of Ukraine, Kiev, Ukraine
  - Institute of Parasitology, Slovak Academy of Sciences, Hlinkova 3, 040 01 Kosice, Slovak Republic
- Claire Laugier
  - Conseil Général de l’Alimentation, de l’Agriculture et Des Espaces Ruraux, Ministère de l’Agriculture et de l’Alimentation, Paris, France
4
Xu J, Ané C. Identifiability of local and global features of phylogenetic networks from average distances. J Math Biol 2022; 86:12. [PMID: 36481927 DOI: 10.1007/s00285-022-01847-8] [Received: 10/20/2021] [Revised: 11/17/2022] [Accepted: 11/22/2022] [Indexed: 12/12/2022]
Abstract
Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could offer faster alternatives. We study here the information that average pairwise distances contain on the underlying phylogenetic network, by characterizing local and global features that can or cannot be identified. For general networks, we clarify that the root and edge lengths adjacent to reticulations are not identifiable, and then focus on the class of zipped-up semidirected networks. We provide a criterion to swap subgraphs locally, such as 3-cycles, resulting in indistinguishable networks. We propose the "distance split tree", which can be constructed from pairwise distances, and prove that it is a refinement of the network's tree of blobs, capturing the tree-like features of the network. For level-1 networks, this distance split tree is equal to the tree of blobs refined to separate polytomies from blobs, and we prove that the mixed representation of the network is identifiable. The information loss is localized around 4-cycles, for which the placement of the reticulation is unidentifiable. The mixed representation combines split edges for 4-cycles, regular tree and hybrid edges from the semidirected network, and edge parameters that encode all information identifiable from average pairwise distances.
Affiliation(s)
- Jingcheng Xu
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
- Cécile Ané
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Department of Botany, University of Wisconsin - Madison, Madison, WI, 53706, USA
5
Allman ES, Baños H, Mitchell JD, Rhodes JA. The tree of blobs of a species network: identifiability under the coalescent. J Math Biol 2022; 86:10. [PMID: 36472708 PMCID: PMC10062380 DOI: 10.1007/s00285-022-01838-9] [Received: 05/06/2022] [Revised: 08/31/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022]
Abstract
Inference of species networks from genomic data under the Network Multispecies Coalescent Model is currently severely limited by heavy computational demands. It also remains unclear how complicated networks can be for consistent inference to be possible. As a step toward inferring a general species network, this work considers its tree of blobs, in which non-cut edges are contracted to nodes, so only tree-like relationships between the taxa are shown. An identifiability theorem, that most features of the unrooted tree of blobs can be determined from the distribution of gene quartet topologies, is established. This depends upon an analysis of gene quartet concordance factors under the model, together with a new combinatorial inference rule. The arguments for this theoretical result suggest a practical algorithm for tree of blobs inference, to be fully developed in a subsequent work.
Affiliation(s)
- Elizabeth S Allman
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
- Hector Baños
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
  - Department of Mathematics and Statistics, Faculty of Science, Dalhousie University, Halifax, NS, Canada
- Jonathan D Mitchell
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
  - School of Natural Sciences (Mathematics), University of Tasmania, Hobart, TAS, 7001, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, TAS, 7001, Australia
- John A Rhodes
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
6
Allman ES, Baños H, Rhodes JA. Identifiability of species network topologies from genomic sequences using the logDet distance. J Math Biol 2022; 84:35. [PMID: 35385988 DOI: 10.1007/s00285-022-01734-2] [Received: 08/03/2021] [Revised: 01/12/2022] [Accepted: 03/02/2022] [Indexed: 10/18/2022]
Abstract
Inference of network-like evolutionary relationships between species from genomic data must address the interwoven signals from both gene flow and incomplete lineage sorting. The heavy computational demands of standard approaches to this problem severely limit the size of datasets that may be analyzed, in both the number of species and the number of genetic loci. Here we provide a theoretical pointer to more efficient methods, by showing that logDet distances computed from genomic-scale sequences retain sufficient information to recover network relationships in the level-1 ultrametric case. This result is obtained under the Network Multispecies Coalescent model combined with a mixture of General Time-Reversible sequence evolution models across individual gene trees. It applies to both unlinked site data, such as for SNPs, and to sequence data in which many contiguous sites may have evolved on a common tree, such as concatenated gene sequences. Thus under standard stochastic models statistically justifiable inference of network relationships from sequences can be accomplished without consideration of individual genes or gene trees.
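For readers unfamiliar with the logDet distance mentioned in this abstract, the sketch below computes it for one pair of aligned DNA sequences. It assumes one common normalization, d = -(1/4)(ln|det F| - (1/2) ln(g_x g_y)), where F is the 4×4 matrix of joint site-pattern frequencies and g_x, g_y are the products of the marginal base frequencies of each sequence. The function names and the toy sequences are hypothetical, and the abstract itself does not specify this exact form.

```python
import math


def det(m):
    """Determinant by Laplace expansion along the first row.
    Adequate for the small 4x4 matrices used here."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total


def logdet_distance(seq_x, seq_y, alphabet="ACGT"):
    """logDet distance between two aligned sequences (assumed normalization).
    Sites with a character outside `alphabet` in either sequence are skipped."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    idx = {s: i for i, s in enumerate(alphabet)}
    k = len(alphabet)
    counts = [[0.0] * k for _ in range(k)]
    n = 0
    for a, b in zip(seq_x, seq_y):
        if a in idx and b in idx:
            counts[idx[a]][idx[b]] += 1.0
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    # Joint frequency matrix F and its row/column marginals.
    F = [[v / n for v in row] for row in counts]
    fx = [sum(row) for row in F]
    fy = [sum(F[i][j] for i in range(k)) for j in range(k)]
    d = abs(det(F))
    g = math.prod(fx) * math.prod(fy)
    if d == 0.0 or g == 0.0:
        raise ValueError("logDet distance undefined (singular F or missing base)")
    return -(math.log(d) - 0.5 * math.log(g)) / k
```

With this normalization, two identical sequences that contain all four bases are at distance zero (up to floating-point error), and the value grows as the joint frequency matrix moves away from diagonal. The distance is undefined when F is singular, which the sketch reports as an error rather than returning a value.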