1
Pang XX, Zhang DY. Detection of Ghost Introgression Requires Exploiting Topological and Branch Length Information. Syst Biol 2024; 73:207-222. [PMID: 38224495 PMCID: PMC11129598 DOI: 10.1093/sysbio/syad077] [Received: 05/01/2023] [Revised: 12/17/2023] [Accepted: 12/27/2023] [Indexed: 01/17/2024]
Abstract
In recent years, the study of hybridization and introgression has made significant progress, with ghost introgression (the transfer of genetic material from extinct or unsampled lineages to extant species) emerging as a key area for research. Accurately identifying ghost introgression, however, presents a challenge. To address this issue, we focused on simple cases involving 3 species with a known phylogenetic tree. Using mathematical analyses and simulations, we evaluated the performance of popular phylogenetic methods, including HyDe and PhyloNet/MPL, and of the full-likelihood method Bayesian Phylogenetics and Phylogeography (BPP), in detecting ghost introgression. Our findings suggest that heuristic approaches relying on site-pattern counts or gene-tree topologies struggle to differentiate ghost introgression from introgression between sampled non-sister species, frequently leading to incorrect identification of donor and recipient species. By contrast, the full-likelihood method BPP, which uses multilocus sequence alignments directly and hence takes into account both gene-tree topologies and branch lengths, is capable of detecting ghost introgression in phylogenomic datasets. We analyzed a real-world phylogenomic dataset of 14 species of Jaltomata (Solanaceae) to showcase the potential of full-likelihood methods for accurate inference of introgression.
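To make "site-pattern counts" concrete, the sketch below counts ABBA and BABA sites for taxa ordered (P1, P2, P3, outgroup) and computes Patterson's D. This is an illustrative example of a site-pattern heuristic chosen by the editor; it is not HyDe's estimator, which combines site-pattern frequencies differently. Function names are hypothetical.

```python
# Sketch: ABBA-BABA site-pattern counts and Patterson's D for four aligned
# sequences ordered (P1, P2, P3, outgroup). Illustrative only; not HyDe.

VALID = set("ACGT")


def site_pattern_counts(p1: str, p2: str, p3: str, out: str) -> tuple[int, int]:
    """Count ABBA and BABA sites across four aligned sequences."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, out):
        bases = {a, b, c, o}
        # Keep only biallelic sites made of unambiguous nucleotides.
        if len(bases) != 2 or not bases <= VALID:
            continue
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the non-outgroup allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the non-outgroup allele
    return abba, baba


def patterson_d(abba: int, baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); 0.0 if no informative sites."""
    total = abba + baba
    return (abba - baba) / total if total else 0.0


if __name__ == "__main__":
    counts = site_pattern_counts("AAAAG", "GGAAA", "GGGAG", "AAAAA")
    print(counts, patterson_d(*counts))
```

Because a statistic like this sees only how often each pattern occurs, not how old the shared alleles are, it offers one way to picture why count-based heuristics can confuse different introgression scenarios that the abstract contrasts with branch-length-aware likelihood methods.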
Affiliation(s)
- Xiao-Xu Pang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Da-Yong Zhang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
2
Allman ES, Baños H, Mitchell JD, Rhodes JA. TINNiK: Inference of the Tree of Blobs of a Species Network Under the Coalescent. bioRxiv 2024:2024.04.20.590418. [PMID: 38712257 PMCID: PMC11071406 DOI: 10.1101/2024.04.20.590418] [Indexed: 05/08/2024]
Abstract
The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transfer of genetic information occur. By isolating such regions of a network, inference of the tree of blobs can serve as a starting point for a more detailed investigation, or indicate the limit of what may be inferrable without additional assumptions. Building on our theoretical work on the identifiability of the tree of blobs from gene quartet distributions under the Network Multispecies Coalescent model, we develop an algorithm, TINNiK, for statistically consistent tree of blobs inference. We provide examples of its application to both simulated and empirical datasets, utilizing an implementation in the MSCquartets 2.0 R package. MSC Classification 92D15, 92D20.
3
Ané C, Fogg J, Allman ES, Baños H, Rhodes JA. Anomalous networks under the multispecies coalescent: theory and prevalence. J Math Biol 2024; 88:29. [PMID: 38372830 DOI: 10.1007/s00285-024-02050-7] [Received: 07/15/2023] [Revised: 01/18/2024] [Accepted: 01/21/2024] [Indexed: 02/20/2024]
Abstract
Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of $3_2$-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
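As a small illustration of the definitions above, the sketch below uses the standard multispecies-coalescent result for a 4-taxon tree with internal branch length t (coalescent units): the displayed quartet has concordance factor 1 - (2/3)e^(-t) and each alternative has (1/3)e^(-t). Under common inheritance, it assumes the network's CF vector is a mixture of displayed-tree CFs weighted by the probability of each displayed tree. The helper names and the input format are hypothetical; this is not the paper's algorithm, which handles networks of any level.

```python
import math

# Sketch: quartet concordance factors (CFs) on a 4-taxon tree under the MSC,
# and their mixture when each gene tree evolves inside one displayed tree
# (common inheritance). Unrooted quartet topologies are indexed 0, 1, 2.


def tree_quartet_cf(internal_length: float, topology: int) -> list[float]:
    """CF vector for a tree with the given topology and internal branch
    length t: major = 1 - (2/3)e^-t, each minor = (1/3)e^-t."""
    minor = math.exp(-internal_length) / 3.0
    cf = [minor, minor, minor]
    cf[topology] = 1.0 - 2.0 * minor
    return cf


def common_inheritance_cf(displayed_trees, weights) -> list[float]:
    """Mix displayed-tree CFs; `displayed_trees` holds (length, topology)
    pairs and `weights` their probabilities (assumed to sum to 1)."""
    cf = [0.0, 0.0, 0.0]
    for (length, topology), w in zip(displayed_trees, weights):
        for k, value in enumerate(tree_quartet_cf(length, topology)):
            cf[k] += w * value
    return cf


def is_anomalous(cf, displayed_topologies) -> bool:
    """True if some displayed topology is less frequent than some
    topology that is not displayed."""
    shown = [cf[k] for k in displayed_topologies]
    hidden = [cf[k] for k in range(3) if k not in displayed_topologies]
    return bool(hidden) and min(shown) < max(hidden)


if __name__ == "__main__":
    cf = common_inheritance_cf([(0.5, 0), (1.0, 1)], [0.3, 0.7])
    print(cf, is_anomalous(cf, {0, 1}))
```

In this two-tree mixture the gap between a displayed topology and the non-displayed one equals the weight times (1 - e^(-t)), which is never negative; this is consistent with, though far narrower than, the abstract's result that 4-taxon networks are never anomalous under common inheritance.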
Affiliation(s)
- Cécile Ané
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA.
- Department of Botany, University of Wisconsin - Madison, Madison, WI, 53706, USA.
- John Fogg
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775-6660, USA
- Hector Baños
- Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775-6660, USA
4
Russo CAM, Eyre-Walker A, Katz LA, Gaut BS. Forty Years of Inferential Methods in the Journals of the Society for Molecular Biology and Evolution. Mol Biol Evol 2024; 41:msad264. [PMID: 38197288 PMCID: PMC10763999 DOI: 10.1093/molbev/msad264] [Received: 11/08/2023] [Accepted: 11/27/2023] [Indexed: 01/11/2024]
Abstract
We are launching a series to celebrate the 40th anniversary of the first issue of Molecular Biology and Evolution. In 2024, we will publish virtual issues containing selected papers published in the Society for Molecular Biology and Evolution journals, Molecular Biology and Evolution and Genome Biology and Evolution. Each virtual issue will be accompanied by a perspective that highlights the historic and contemporary contributions of our journals to a specific topic in molecular evolution. This perspective, the first in the series, presents an account of the broad array of methods that have been published in the Society for Molecular Biology and Evolution journals, including methods to infer phylogenies, to test hypotheses in a phylogenetic framework, and to infer population genetic processes. We also mention many of the software implementations that make methods tractable for empiricists. In short, the Society for Molecular Biology and Evolution community has much to celebrate after four decades of publishing high-quality science including numerous important inferential methods.
Affiliation(s)
- Claudia A M Russo
- Departamento de Genética, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Laura A Katz
- Department of Biological Sciences, Smith College, Northampton, MA, USA
- Brandon S Gaut
- School of Biological Sciences, University of California, Irvine, CA, USA
5
Thawornwattana Y, Seixas F, Yang Z, Mallet J. Major patterns in the introgression history of Heliconius butterflies. eLife 2023; 12:RP90656. [PMID: 38108819 PMCID: PMC10727504 DOI: 10.7554/elife.90656] [Indexed: 12/19/2023]
Abstract
Gene flow between species, although usually deleterious, is an important evolutionary process that can facilitate adaptation and lead to species diversification. It also makes estimation of species relationships difficult. Here, we use the full-likelihood multispecies coalescent (MSC) approach to estimate species phylogeny and major introgression events in Heliconius butterflies from whole-genome sequence data. We obtain a robust estimate of species branching order among major clades in the genus, including the 'melpomene-silvaniform' group, which shows extensive historical and ongoing gene flow. We obtain chromosome-level estimates of key parameters in the species phylogeny, including species divergence times, present-day and ancestral population sizes, as well as the direction, timing, and intensity of gene flow. Our analysis leads to a phylogeny with introgression events that differ from those obtained in previous studies. We find that Heliconius aoede most likely represents the earliest-branching lineage of the genus and that 'silvaniform' species are paraphyletic within the melpomene-silvaniform group. Our phylogeny provides new, parsimonious histories for the origins of key traits in Heliconius, including pollen feeding and an inversion involved in wing pattern mimicry. Our results demonstrate the power and feasibility of the full-likelihood MSC approach for estimating species phylogeny and key population parameters despite extensive gene flow. The methods used here should be useful for analysis of other difficult species groups with high rates of introgression.
Affiliation(s)
- Fernando Seixas
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- James Mallet
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
6
Fogg J, Allman ES, Ané C. PhyloCoalSimulations: A Simulator for Network Multispecies Coalescent Models, Including a New Extension for the Inheritance of Gene Flow. Syst Biol 2023; 72:1171-1179. [PMID: 37254872 DOI: 10.1093/sysbio/syad030] [Received: 01/11/2023] [Revised: 05/03/2023] [Accepted: 05/15/2023] [Indexed: 06/01/2023]
Abstract
We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance. At each reticulation, multiple lineages of a given locus are inherited from parental populations chosen at random, either independently across lineages or with positive correlation according to a Dirichlet process. This process may account for locus-specific probabilities of inheritance, for example. We implemented the simulation of gene trees under these network coalescent models in the Julia package PhyloCoalSimulations, which depends on PhyloNetworks and its powerful network manipulation tools. Input species phylogenies can be read in extended Newick format, either in numbers of generations or in coalescent units. Simulated gene trees can be written in Newick format, and in a way that preserves information about their embedding within the species network. This embedding can be used for downstream purposes, such as to simulate species-specific processes like rate variation across species, or for other scenarios as illustrated in this note. This package should be useful for simulation studies and simulation-based inference methods. The software is available open source with documentation and a tutorial at https://github.com/cecileane/PhyloCoalSimulations.jl.
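The correlated-inheritance idea can be pictured with the Pólya-urn (Blackwell-MacQueen) representation of a Dirichlet process: each lineage either draws a fresh parent from a base distribution that picks parent 1 with probability gamma, or copies the parent of an earlier lineage. The sketch below is an editor's illustration in Python; PhyloCoalSimulations is a Julia package, and the concentration parameter `alpha` and the function name here are hypothetical, not that package's parameterization.

```python
import random

# Sketch: assign each lineage entering a reticulation to parent 1 or 2 using
# a Polya urn for a Dirichlet process with base distribution P(parent 1) = gamma.
# Hypothetical concentration parameter alpha:
#   alpha = 0     -> every lineage copies the first (common inheritance)
#   alpha -> inf  -> lineages choose independently (independent inheritance)


def assign_parents(n_lineages: int, gamma: float, alpha: float,
                   rng: random.Random) -> list[int]:
    if not 0.0 <= gamma <= 1.0 or alpha < 0.0:
        raise ValueError("need 0 <= gamma <= 1 and alpha >= 0")
    parents: list[int] = []
    for i in range(n_lineages):
        # With i lineages already placed, draw a fresh parent from the base
        # distribution with probability alpha / (alpha + i) ...
        if i == 0 or rng.random() < alpha / (alpha + i):
            parents.append(1 if rng.random() < gamma else 2)
        else:
            # ... otherwise copy the parent of a uniformly chosen earlier lineage.
            parents.append(rng.choice(parents))
    return parents


if __name__ == "__main__":
    print(assign_parents(8, gamma=0.3, alpha=1.0, rng=random.Random(42)))
```

Whatever alpha is, each single lineage still goes to parent 1 with marginal probability gamma; alpha only controls how strongly lineages at the same locus tend to follow one another, which is the positive correlation the abstract describes.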
Affiliation(s)
- John Fogg
- Department of Statistics, University of Wisconsin - Madison, WI, 53706, USA
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska - Fairbanks, AK, 99775, USA
- Cécile Ané
- Department of Statistics, University of Wisconsin - Madison, WI, 53706, USA
- Department of Botany, University of Wisconsin - Madison, WI, 53706, USA
7
Flouri T, Jiao X, Huang J, Rannala B, Yang Z. Efficient Bayesian inference under the multispecies coalescent with migration. Proc Natl Acad Sci U S A 2023; 120:e2310708120. [PMID: 37871206 PMCID: PMC10622872 DOI: 10.1073/pnas.2310708120] [Received: 06/25/2023] [Accepted: 08/15/2023] [Indexed: 10/25/2023]
Abstract
Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.
Affiliation(s)
- Tomáš Flouri
- Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, United Kingdom
- Xiyun Jiao
- Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen 518055, China
- Jun Huang
- Department of Intelligent Medical Engineering, School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Bruce Rannala
- Department of Evolution and Ecology, University of California, Davis, CA 95616
- Ziheng Yang
- Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, United Kingdom
8
Bernardini G, van Iersel L, Julien E, Stougie L. Constructing phylogenetic networks via cherry picking and machine learning. Algorithms Mol Biol 2023; 18:13. [PMID: 37717003 PMCID: PMC10505335 DOI: 10.1186/s13015-023-00233-3] [Received: 03/31/2023] [Accepted: 06/10/2023] [Indexed: 09/18/2023]
Abstract
BACKGROUND: Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. Existing methods are computationally expensive and either can handle only small numbers of phylogenetic trees or are limited to severely restricted classes of networks.
RESULTS: In this paper, we apply the recently introduced theoretical framework of cherry picking to design a class of efficient heuristics that are guaranteed to produce a network containing each of the input trees, for practical-size datasets consisting of binary trees. Some of the heuristics in this framework are based on the design and training of a machine learning model that captures essential information on the structure of the input trees and guides the algorithms towards better solutions. We also propose simple and fast randomised heuristics that prove to be very effective when run multiple times.
CONCLUSIONS: Unlike the existing exact methods, our heuristics are applicable to datasets of practical size, and the experimental study we conducted on both simulated and real data shows that these solutions are qualitatively good, always within some small constant factor from the optimum. Moreover, our machine-learned heuristics are one of the first applications of machine learning to phylogenetics and show its promise.
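The basic move behind cherry picking on trees is: for a chosen pair (x, y), delete leaf x from every tree in which x and y are sibling leaves (a cherry), suppressing the resulting degree-2 node. The sketch below shows only that step, plus a count of how many trees contain each cherry as one conceivable input for deciding which cherry to pick next. It represents rooted binary trees as nested 2-tuples; all names are hypothetical, and the paper's heuristics (machine-learned or randomised) decide the picking order in their own ways.

```python
from collections import Counter

# Sketch: rooted binary trees as nested 2-tuples of leaf names,
# e.g. (("a", "b"), "c"). Hypothetical helpers for one cherry-picking step.


def cherries(tree) -> set:
    """All cherries (pairs of sibling leaves) of a rooted binary tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        left, right = node
        if isinstance(left, str) and isinstance(right, str):
            found.add(frozenset((left, right)))
        else:
            walk(left)
            walk(right)

    walk(tree)
    return found


def remove_leaf(tree, x):
    """Delete leaf x and suppress its parent (its sibling takes its place)."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == x:
        return right
    if right == x:
        return left
    return (remove_leaf(left, x), remove_leaf(right, x))


def pick_cherry(trees, x, y):
    """Step (x, y): remove x from every tree in which x and y form a
    cherry; other trees are returned unchanged."""
    pair = frozenset((x, y))
    return [remove_leaf(t, x) if pair in cherries(t) else t for t in trees]


def cherry_counts(trees) -> Counter:
    """Number of input trees that contain each cherry."""
    return Counter(c for t in trees for c in cherries(t))


if __name__ == "__main__":
    trees = [(("a", "b"), "c"), (("a", "c"), "b")]
    print(cherry_counts(trees))
    print(pick_cherry(trees, "a", "b"))
```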
Affiliation(s)
- Leo van Iersel
- Delft Institute of Applied Mathematics, Delft, The Netherlands
- Esther Julien
- Delft Institute of Applied Mathematics, Delft, The Netherlands
- Leen Stougie
- CWI, Amsterdam, The Netherlands.
- Vrije Universiteit, Amsterdam, The Netherlands.
- INRIA-ERABLE, Lyon, France.
9
Ané C, Fogg J, Allman ES, Baños H, Rhodes JA. Anomalous networks under the multispecies coalescent: theory and prevalence. bioRxiv 2023:2023.08.18.553582. [PMID: 37662314 PMCID: PMC10473666 DOI: 10.1101/2023.08.18.553582] [Indexed: 09/05/2023]
Abstract
Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of $3_2$-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
Affiliation(s)
- Cécile Ané
- Department of Statistics, University of Wisconsin - Madison, WI, 53706, USA
- Department of Botany, University of Wisconsin - Madison, WI, 53706, USA
- John Fogg
- Department of Statistics, University of Wisconsin - Madison, WI, 53706, USA
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska - Fairbanks, AK, 99775-6660, USA
- Hector Baños
- Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska - Fairbanks, AK, 99775-6660, USA
10
Tiley GP, Flouri T, Jiao X, Poelstra JW, Xu B, Zhu T, Rannala B, Yoder AD, Yang Z. Estimation of species divergence times in presence of cross-species gene flow. Syst Biol 2023; 72:820-836. [PMID: 36961245 PMCID: PMC10405360 DOI: 10.1093/sysbio/syad015] [Received: 06/10/2021] [Accepted: 03/22/2023] [Indexed: 03/25/2023]
Abstract
Cross-species introgression can have significant impacts on phylogenomic reconstruction of species divergence events. Here, we used simulations to show how the presence of even a small amount of introgression can bias divergence time estimates when gene flow is ignored in the analysis. Using advances in analytical methods under the multispecies coalescent (MSC) model, we demonstrate that by accounting for incomplete lineage sorting and introgression using large phylogenomic data sets this problem can be avoided. The multispecies-coalescent-with-introgression (MSci) model is capable of accurately estimating both divergence times and ancestral effective population sizes, even when only a single diploid individual per species is sampled. We characterize some general expectations for biases in divergence time estimation under three different scenarios: 1) introgression between sister species, 2) introgression between non-sister species, and 3) introgression from an unsampled (i.e., ghost) outgroup lineage. We also conducted simulations under the isolation-with-migration (IM) model and found that the MSci model assuming episodic gene flow was able to accurately estimate species divergence times despite high levels of continuous gene flow. We estimated divergence times under the MSC and MSci models from two published empirical datasets with previous evidence of introgression, one of 372 target-enrichment loci from baobabs (Adansonia), and another of 1000 transcriptome loci from 14 species of the tomato relative, Jaltomata. The empirical analyses not only confirm our findings from simulations, demonstrating that the MSci model can reliably estimate divergence times, but also show that divergence time estimation under the MSC can be robust to the presence of small amounts of introgression in empirical datasets with extensive taxon sampling. [divergence time; gene flow; hybridization; introgression; MSci model; multispecies coalescent].
Affiliation(s)
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London, UK
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, London, UK
- Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen, Guangdong, China
- Bo Xu
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Tianqi Zhu
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Bruce Rannala
- Department of Evolution and Ecology, University of California, Davis, Davis, CA, USA
- Anne D Yoder
- Department of Biology, Duke University, Durham, NC, USA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, UK
11
Thawornwattana Y, Huang J, Flouri T, Mallet J, Yang Z. Inferring the Direction of Introgression Using Genomic Sequence Data. Mol Biol Evol 2023; 40:msad178. [PMID: 37552932 PMCID: PMC10439365 DOI: 10.1093/molbev/msad178] [Received: 02/14/2023] [Revised: 08/01/2023] [Accepted: 08/02/2023] [Indexed: 08/10/2023]
Abstract
Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here, we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.
Affiliation(s)
- Jun Huang
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, P.R. China
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- James Mallet
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
12
Ji J, Jackson DJ, Leaché AD, Yang Z. Power of Bayesian and Heuristic Tests to Detect Cross-Species Introgression with Reference to Gene Flow in the Tamias quadrivittatus Group of North American Chipmunks. Syst Biol 2023; 72:446-465. [PMID: 36504374 PMCID: PMC10275556 DOI: 10.1093/sysbio/syac077] [Received: 12/07/2021] [Revised: 11/15/2022] [Accepted: 12/01/2022] [Indexed: 10/25/2023]
Abstract
In the past two decades, genomic data have been widely used to detect historical gene flow between species in a variety of plants and animals. The Tamias quadrivittatus group of North American chipmunks, which originated through a series of rapid speciation events, is known to have undergone massive amounts of mitochondrial introgression. Yet in a recent analysis of targeted nuclear loci from the group, no evidence for cross-species introgression was detected, indicating widespread cytonuclear discordance. That study used the heuristic method HYDE, which may suffer from low power, to detect gene flow. Here we use the Bayesian method implemented in the program BPP to re-analyze these data. We develop a Bayesian test of introgression, calculating the Bayes factor via the Savage-Dickey density ratio using the Markov chain Monte Carlo (MCMC) sample under the model of introgression. We take a stepwise approach to constructing an introgression model by adding introgression events onto a well-supported binary species tree. The analysis detected robust evidence for multiple ancient introgression events affecting the nuclear genome, with introgression probabilities reaching 63%. We estimate population parameters and highlight the fact that species divergence times may be seriously underestimated if ancient cross-species gene flow is ignored in the analysis. We examine the assumptions and performance of HYDE and demonstrate that it lacks power if gene flow occurs between sister lineages or if the mode of gene flow does not match the assumed hybrid-speciation model with symmetrical population sizes. Our analyses highlight the power of likelihood-based inference of cross-species gene flow using genomic sequence data. [Bayesian test; BPP; chipmunks; introgression; MSci; multispecies coalescent; Savage-Dickey density ratio].
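The Savage-Dickey density ratio gives the Bayes factor for the nested null hypothesis phi = 0 (no introgression) as the ratio of prior to posterior density at phi = 0, so B10 = prior(0) / posterior(0 | data). Because densities at a boundary are awkward to estimate from MCMC output, one common approximation (assumed here; the paper's exact procedure may differ) replaces both densities with the probability mass in a small interval [0, eps). The sketch assumes a Beta(a, b) prior on phi and a list of posterior samples of phi; the function name and defaults are hypothetical, and this is not BPP's code.

```python
import math
import random

# Sketch: Bayes factor B10 for introgression (phi > 0) against no
# introgression (phi = 0), approximating the Savage-Dickey density ratio by
# prior and posterior probability mass in [0, eps).
# Assumptions: Beta(a, b) prior on phi; MCMC samples of phi under the
# introgression model.


def bayes_factor_10(posterior_phi, a=1.0, b=1.0, eps=0.01,
                    n_prior=200_000, seed=1) -> float:
    rng = random.Random(seed)
    # Prior mass near phi = 0, estimated by sampling the Beta prior.
    prior_mass = sum(rng.betavariate(a, b) < eps for _ in range(n_prior)) / n_prior
    # Posterior mass near phi = 0, estimated from the MCMC sample.
    post_mass = sum(phi < eps for phi in posterior_phi) / len(posterior_phi)
    if post_mass == 0.0:
        # No posterior sample fell below eps: the estimate is unbounded.
        return math.inf
    return prior_mass / post_mass
```

A large B10 means the posterior has moved probability away from phi near 0 relative to the prior; the result depends on the choice of eps and on how many MCMC samples are available near the boundary.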
Affiliation(s)
- Jiayi Ji
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Donavan J Jackson
- Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
- Adam D Leaché
- Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
13
Leal JL, Milesi P, Salojärvi J, Lascoux M. Phylogenetic Analysis of Allotetraploid Species Using Polarized Genomic Sequences. Syst Biol 2023; 72:372-390. [PMID: 36932679 PMCID: PMC10275558 DOI: 10.1093/sysbio/syad009] [Received: 01/05/2022] [Revised: 10/14/2022] [Accepted: 03/10/2023] [Indexed: 03/19/2023]
Abstract
Phylogenetic analysis of polyploid hybrid species has long posed a formidable challenge, as it requires the ability to distinguish between alleles of different ancestral origins in order to disentangle their individual evolutionary history. This problem has been previously addressed by conceiving phylogenies as reticulate networks, using a two-step phasing strategy that first identifies and segregates homoeologous loci and then, during a second phasing step, assigns each gene copy to one of the subgenomes of an allopolyploid species. Here, we propose an alternative approach, one that preserves the core idea behind phasing (to produce separate nucleotide sequences that capture the reticulate evolutionary history of a polyploid) while vastly simplifying its implementation by reducing a complex multistage procedure to a single phasing step. While most current methods used for phylogenetic reconstruction of polyploid species require sequencing reads to be pre-phased using experimental or computational methods (usually an expensive, complex, and/or time-consuming endeavor), phasing with our algorithm is performed directly on the multiple-sequence alignment (MSA), a key change that allows for the simultaneous segregation and sorting of gene copies. We introduce the concept of genomic polarization that, when applied to an allopolyploid species, produces nucleotide sequences that capture the fraction of a polyploid genome that deviates from that of a reference sequence, usually one of the other species present in the MSA. We show that if the reference sequence is one of the parental species, the polarized polyploid sequence has a close resemblance (high pairwise sequence identity) to the second parental species.
This knowledge is harnessed to build a new heuristic algorithm where, by replacing the allopolyploid genomic sequence in the MSA by its polarized version, it is possible to identify the phylogenetic position of the polyploid's ancestral parents in an iterative process. The proposed methodology can be used with long-read and short-read high-throughput sequencing data and requires only one representative individual for each species to be included in the phylogenetic analysis. In its current form, it can be used in the analysis of phylogenies containing tetraploid and diploid species. We test the newly developed method extensively using simulated data in order to evaluate its accuracy. We show empirically that the use of polarized genomic sequences allows for the correct identification of both parental species of an allotetraploid with up to 97% certainty in phylogenies with moderate levels of incomplete lineage sorting (ILS) and 87% in phylogenies containing high levels of ILS. We then apply the polarization protocol to reconstruct the reticulate histories of Arabidopsis kamchatica and Arabidopsis suecica, two allopolyploids whose ancestry has been well documented. [Allopolyploidy; Arabidopsis; genomic polarization; homoeologs; incomplete lineage sorting; phasing; polyploid phylogenetics; reticulate evolution.].
Affiliation(s)
- J Luis Leal
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Pascal Milesi
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala University, 75237 Uppsala, Sweden
- Jarkko Salojärvi
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, and Viikki Plant Science Centre, University of Helsinki, P.O. Box 65 (Viikinkaari 1), 00014 Helsinki, Finland
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Martin Lascoux
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala University, 75237 Uppsala, Sweden

14
Bai J, Du C, Lu Y, Wang R, Su X, Yu K, Qin Q, Chen Y, Wei Z, Huang W, Ouyang K. Phylogenetic and Spatiotemporal Analyses of Porcine Epidemic Diarrhea Virus in Guangxi, China during 2017–2022. Animals (Basel) 2023; 13:ani13071215. [PMID: 37048471 PMCID: PMC10093014 DOI: 10.3390/ani13071215] [Received: 03/02/2023] [Revised: 03/24/2023] [Accepted: 03/26/2023] [Indexed: 04/03/2023]
Abstract
Since 2010, porcine epidemic diarrhea virus (PEDV) has swept across China and spread throughout the country, causing huge economic losses. In this study, 673 diarrhea samples from 143 pig farms in Guangxi were collected during 2017–2022 and tested for PEDV. Ninety-eight strains were selected for S1 gene analyses, and these strains were classified into four subgroups (G1b, G2a, G2b and G2c), accounting for 1.02% (1/98), 75.51% (74/98), 16.33% (16/98) and 7.14% (7/98) of the total, respectively. Importantly, the number of strains in the G2c subgroup increased from 2019 onwards. Bayesian analysis revealed that Guigang may have been the epicenter of PEDVs in Guangxi. In addition, Guigang was identified as the primary hub from which PEDVs spread via two routes, namely Guigang–Wuzhou and Guigang–Laibin. Moreover, several coinfections of novel PEDV variants bearing large deletions in the partial S1 protein with PEDVs possessing an intact partial S1 protein were found in pigs. Further recombination analyses indicated that two of the strains, 18-GXNN-6 and 19-GXBH-2, originated from intra-genogroup recombination. Together, our data reveal a new profile of PEDV in Guangxi, China, which enhances our understanding of the distribution, genetic characteristics and evolutionary profile of the circulating PEDV strains in China.
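The subgroup percentages quoted in this abstract follow directly from the counts out of 98 sequenced strains. A short Python check, using only the counts given in the abstract (the helper name `percentages` is illustrative and not taken from the study):

```python
# Subgroup counts reported in the abstract (98 S1 sequences in total).
counts = {"G1b": 1, "G2a": 74, "G2b": 16, "G2c": 7}


def percentages(counts, decimals=2):
    """Return each subgroup's share of the total, as a rounded percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total, decimals) for name, n in counts.items()}


shares = percentages(counts)
print(sum(counts.values()), shares)
```

Rounded to two decimals, the shares come out as 1.02, 75.51, 16.33 and 7.14, matching the reported figures, and the four counts sum to 98.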
Affiliation(s)
- Jiaguo Bai
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Chen Du
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Ying Lu
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Ruomu Wang
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Xueli Su
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Kechen Yu
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Qiuying Qin
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Ying Chen
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Guangxi Zhuang Autonomous Region Engineering Research Center of Veterinary Biologics, Nanning 530005, China
- Guangxi Key Laboratory of Animal Breeding, Disease Control and Prevention, Nanning 530005, China
- Key Laboratory of Prevention and Control for Animal Disease, Guangxi University, Nanning 530005, China
- Zuzhang Wei
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Guangxi Zhuang Autonomous Region Engineering Research Center of Veterinary Biologics, Nanning 530005, China
- Guangxi Key Laboratory of Animal Breeding, Disease Control and Prevention, Nanning 530005, China
- Key Laboratory of Prevention and Control for Animal Disease, Guangxi University, Nanning 530005, China
- Weijian Huang
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Guangxi Zhuang Autonomous Region Engineering Research Center of Veterinary Biologics, Nanning 530005, China
- Guangxi Key Laboratory of Animal Breeding, Disease Control and Prevention, Nanning 530005, China
- Key Laboratory of Prevention and Control for Animal Disease, Guangxi University, Nanning 530005, China
- Kang Ouyang
- Laboratory of Animal Infectious Diseases and Molecular Immunology, College of Animal Science and Technology, Guangxi University, Nanning 530005, China
- Guangxi Zhuang Autonomous Region Engineering Research Center of Veterinary Biologics, Nanning 530005, China
- Guangxi Key Laboratory of Animal Breeding, Disease Control and Prevention, Nanning 530005, China
- Key Laboratory of Prevention and Control for Animal Disease, Guangxi University, Nanning 530005, China

15
Nielsen SV, Vaughn AH, Leppälä K, Landis MJ, Mailund T, Nielsen R. Bayesian inference of admixture graphs on Native American and Arctic populations. PLoS Genet 2023; 19:e1010410. [PMID: 36780565 PMCID: PMC9956672 DOI: 10.1371/journal.pgen.1010410] [Received: 09/06/2022] [Revised: 02/24/2023] [Accepted: 01/23/2023] [Indexed: 02/15/2023]
Abstract
Admixture graphs are mathematical structures that describe the ancestry of populations in terms of divergence and merging (admixing) of ancestral populations as a graph. An admixture graph consists of a graph topology, branch lengths, and admixture proportions. The branch lengths and admixture proportions can be estimated using numerous numerical optimization methods, but inferring the topology involves a combinatorial search for which no polynomial algorithm is known. In this paper, we present a reversible jump MCMC algorithm for sampling high-probability admixture graphs and show that this approach works well both as a heuristic search for a single best-fitting graph and for summarizing shared features extracted from posterior samples of graphs. We apply the method to 11 Native American and Siberian populations and exploit the shared structure of high-probability graphs to characterize the relationship between Saqqaq, Inuit, Koryaks, and Athabascans. Our analyses show that the Saqqaq is not a good proxy for the previously identified gene flow from Arctic people into the Na-Dene speaking Athabascans.
Affiliation(s)
- Svend V. Nielsen
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Andrew H. Vaughn
- Center for Computational Biology, University of California Berkeley, Berkeley, California, United States of America
- Kalle Leppälä
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
- Michael J. Landis
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Rasmus Nielsen
- Departments of Integrative Biology and Statistics, University of California Berkeley, Berkeley, California, United States of America
- Center for GeoGenetics, University of Copenhagen, Copenhagen, Denmark

16
Duan Y, Fu S, Ye Z, Bu W. Phylogeny of Urostylididae (Heteroptera: Pentatomoidea) reveals rapid radiation and challenges traditional classification. Zool Scr 2023. [DOI: 10.1111/zsc.12582] [Indexed: 01/31/2023]
Affiliation(s)
- Yujie Duan
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin, China
- Siying Fu
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin, China
- Zhen Ye
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin, China
- Wenjun Bu
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin, China

17
Babarinde IA, Adeola AC, Djagoun CAMS, Nneji LM, Okeyoyin AO, Niba G, Wanzie NK, Oladipo OC, Adebambo AO, Bello SF, Ng'ang'a SI, Olaniyi WA, Okoro VMO, Adedeji BE, Olatunde O, Ayoola AO, Matouke MM, Wang YY, Sanke OJ, Oseni SO, Nwani CD, Murphy RW. Population structure and evolutionary history of the greater cane rat (Thryonomys swinderianus) from the Guinean Forests of West Africa. Front Genet 2023; 14:1041103. [PMID: 36923796 PMCID: PMC10010571 DOI: 10.3389/fgene.2023.1041103] [Received: 09/10/2022] [Accepted: 02/07/2023] [Indexed: 03/02/2023]
Abstract
The grasscutter (Thryonomys swinderianus) is a large-bodied Old World rodent found in sub-Saharan Africa. Its body size and the distinctive taste of its meat have made this major crop pest a target of intense hunting and a potential candidate for micro-livestock. However, knowledge of the genetic diversity of its populations across the African Guinean forests is insufficient. Herein, we investigated the genetic diversity, population structure and evolutionary history of seven Nigerian wild grasscutter populations, together with individuals from Cameroon, the Republic of Benin, and Ghana, using five mitochondrial fragments, including the D-loop and cytochrome b (CYTB). D-loop haplotype diversity ranged from 0.571 (± 0.149) in the Republic of Benin to 0.921 (± 0.013) in Ghana. Within Nigeria, haplotype diversity ranged from 0.659 (± 0.059) in the Cross River subpopulation to 0.837 (± 0.075) in the Ondo subpopulation. The fixation index (FST), haplotype frequency distribution and analysis of molecular variance revealed varying levels of population structure across populations. No significant signature of population contraction was detected in the grasscutter populations. Evolutionary analyses of CYTB suggest that the South African population might have diverged from other populations about 6.1 (2.6–10.18, 95% CI) MYA. Taken together, this study reveals the population status and evolutionary history of grasscutter populations in the region.
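Haplotype diversity values like those reported here are commonly computed with Nei's unbiased estimator, h = n/(n − 1) · (1 − Σ p_i²), where n is the number of sequences and p_i the frequency of haplotype i. The abstract does not say which estimator or software was used, so the Python sketch below assumes Nei's formula; the haplotype labels are hypothetical, and the sampling variance behind the ± values is not computed.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).

    `haplotypes` holds one haplotype label per sampled sequence.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Hypothetical sample of four sequences carrying three haplotypes.
sample = ["H1", "H1", "H2", "H3"]
print(round(haplotype_diversity(sample), 3))  # prints 0.833
```

The estimator is 0 when every sequence shares one haplotype and 1 when every sequence is distinct; the n/(n − 1) factor corrects the small-sample bias of the plain 1 − Σ p_i² form.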
Affiliation(s)
- Isaac A Babarinde
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Adeniyi C Adeola
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Kunming, China
- Centre for Biotechnology Research, Bayero University, Kano, Nigeria
- Chabi A M S Djagoun
- Laboratory of Applied Ecology, Faculty of Agronomic Sciences, University of Abomey-Calavi, Cotonou, Benin
- Lotanna M Nneji
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ, United States
- Agboola O Okeyoyin
- National Park Service Headquarters, Federal Capital Territory, Abuja, Nigeria
- George Niba
- National Centre for Animal Husbandry and Veterinary Training, Jakiri, North West Region, Cameroon
- Ndifor K Wanzie
- Department of Zoology, University of Douala, Douala, Cameroon
- Department of Zoology, Faculty of Life Sciences, University of Ilorin, Ilorin, Kwara State, Nigeria
- Ayotunde O Adebambo
- Animal Genetics & Biotechnology, Federal University of Agriculture, Abeokuta, Nigeria
- Semiu F Bello
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, China
- Said I Ng'ang'a
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Wasiu A Olaniyi
- Department of Animal Science, Faculty of Agriculture, Adekunle Ajasin University, Akungba-Akoko, Ondo State, Nigeria
- Victor M O Okoro
- Department of Animal Science and Technology, School of Agriculture and Agricultural Technology, Federal University of Technology, Owerri, Nigeria
- Omotoso Olatunde
- Department of Zoology, University of Ibadan, Ibadan, Oyo State, Nigeria
- Adeola O Ayoola
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Kunming, China
- Moise M Matouke
- Department of Fisheries and Aquatic Resources Management, University of Buea, Buea, Cameroon
- Oscar J Sanke
- Taraba State Ministry of Agriculture and Natural Resources, Jalingo, Nigeria
- Saidu O Oseni
- Department of Animal Sciences, Faculty of Agriculture, Obafemi Awolowo University, Ile-Ife, Nigeria
- Christopher D Nwani
- Department of Zoology and Environmental Biology, Faculty of Biological Sciences, University of Nigeria, Nsukka, Nigeria
- Robert W Murphy
- Centre for Biodiversity and Conservation Biology, Royal Ontario Museum, Toronto, ON, Canada

18
Allman ES, Baños H, Mitchell JD, Rhodes JA. The tree of blobs of a species network: identifiability under the coalescent. J Math Biol 2022; 86:10. [PMID: 36472708 PMCID: PMC10062380 DOI: 10.1007/s00285-022-01838-9] [Received: 05/06/2022] [Revised: 08/31/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022]
Abstract
Inference of species networks from genomic data under the Network Multispecies Coalescent Model is currently severely limited by heavy computational demands. It also remains unclear how complicated networks can be for consistent inference to be possible. As a step toward inferring a general species network, this work considers its tree of blobs, in which non-cut edges are contracted to nodes, so only tree-like relationships between the taxa are shown. An identifiability theorem, that most features of the unrooted tree of blobs can be determined from the distribution of gene quartet topologies, is established. This depends upon an analysis of gene quartet concordance factors under the model, together with a new combinatorial inference rule. The arguments for this theoretical result suggest a practical algorithm for tree of blobs inference, to be fully developed in a subsequent work.
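Given a network whose topology is already known, the tree of blobs described above can be obtained mechanically: find the cut edges (bridges), merge the endpoints of every other edge, and keep the bridges as the edges of the resulting tree. The Python sketch below does this for an unrooted network treated as a simple undirected graph. It uses Tarjan's bridge-finding method and union-find, standard graph techniques chosen here for illustration; it does not implement the paper's inference of the tree of blobs from gene quartet concordance factors. The example network and its node names are hypothetical.

```python
from collections import defaultdict


def find_bridges(adj):
    """Return the cut edges (bridges) of an undirected simple graph.

    `adj` maps each node to a list of neighbours. Tarjan's low-link method,
    written with an explicit stack so deep graphs avoid the recursion limit.
    """
    disc, low, bridges = {}, {}, set()
    timer = 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        stack = [(root, None, iter(adj[root]))]
        while stack:
            node, parent, neighbours = stack[-1]
            descended = False
            for nxt in neighbours:
                if nxt == parent:
                    continue  # skip the tree edge back to the parent
                if nxt in disc:
                    low[node] = min(low[node], disc[nxt])  # back edge
                else:
                    disc[nxt] = low[nxt] = timer
                    timer += 1
                    stack.append((nxt, node, iter(adj[nxt])))
                    descended = True
                    break
            if not descended:
                stack.pop()
                if parent is not None:
                    low[parent] = min(low[parent], low[node])
                    if low[node] > disc[parent]:  # no back edge bypasses this edge
                        bridges.add(frozenset((parent, node)))
    return bridges


def tree_of_blobs(edges):
    """Contract every non-cut edge; return (node -> blob id, set of tree edges)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    bridges = find_bridges(adj)

    rep = {n: n for n in adj}  # union-find representatives

    def find(x):
        while rep[x] != x:
            rep[x] = rep[rep[x]]  # path halving
            x = rep[x]
        return x

    for u, v in edges:
        if frozenset((u, v)) not in bridges:
            rep[find(u)] = find(v)  # merge the ends of a non-cut edge

    blob_of = {n: find(n) for n in adj}
    tree_edges = {frozenset((blob_of[u], blob_of[v]))
                  for u, v in edges if frozenset((u, v)) in bridges}
    return blob_of, tree_edges


# Hypothetical unrooted network: leaves a-e, one 4-cycle x1-x2-x3-x4.
edges = [("a", "x1"), ("b", "x2"), ("c", "x3"),
         ("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x1"),
         ("x4", "y"), ("y", "d"), ("y", "e")]
blob_of, tree_edges = tree_of_blobs(edges)
print(len(set(blob_of.values())), len(tree_edges))
```

In the example the cycle x1-x2-x3-x4 collapses to a single blob node, leaving a tree with 7 nodes and 6 edges; only the tree-like relationships among a to e remain visible.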
Affiliation(s)
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
- Hector Baños
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
- Department of Mathematics and Statistics, Faculty of Science, Dalhousie University, Halifax, NS, Canada
- Jonathan D Mitchell
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
- School of Natural Sciences (Mathematics), University of Tasmania, Hobart, TAS, 7001, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, TAS, 7001, Australia
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA

19
Huang J, Thawornwattana Y, Flouri T, Mallet J, Yang Z. Inference of Gene Flow between Species under Misspecified Models. Mol Biol Evol 2022; 39:6783212. [PMID: 36317198 PMCID: PMC9729068 DOI: 10.1093/molbev/msac237] [Indexed: 11/05/2022]
Abstract
Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in the analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but in the analysis is assumed to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning between-species gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular, from hybridizing species.
Affiliation(s)
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom
- James Mallet
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138

20
LeMay M, Libeskind-Hadas R, Wu YC. A Polynomial-Time Algorithm for Minimizing the Deep Coalescence Cost for Level-1 Species Networks. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2642-2653. [PMID: 34406946 DOI: 10.1109/tcbb.2021.3105922] [Indexed: 06/13/2023]
Abstract
Phylogenetic analyses commonly assume that the species history can be represented as a tree. However, in the presence of hybridization, the species history is more accurately captured as a network. Despite several advances in modeling phylogenetic networks, there is no known polynomial-time algorithm for parsimoniously reconciling gene trees with species networks while accounting for incomplete lineage sorting. To address this issue, we present a polynomial-time algorithm for the case of level-1 networks, in which no hybrid species is the direct ancestor of another hybrid species. This work enables more efficient reconciliation of gene trees with species networks, which, in turn, enables more efficient reconstruction of species networks.

21
Lutteropp S, Scornavacca C, Kozlov AM, Morel B, Stamatakis A. NetRAX: accurate and fast maximum likelihood phylogenetic network inference. Bioinformatics 2022; 38:3725-3733. [PMID: 35713506 DOI: 10.1101/2021.08.30.458194] [Received: 12/16/2021] [Revised: 05/11/2022] [Accepted: 06/14/2022] [Indexed: 05/26/2023]
Abstract
Motivation: Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity and current tools can only analyze small datasets.
Results: We present NetRAX, a tool for maximum likelihood (ML) inference of phylogenetic networks in the absence of ILS. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of 'displayed trees'. NetRAX can infer ML phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in Bayesian Information Criterion (BIC) score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8000 sites, 30 taxa and 3 reticulations completes within a few minutes on a standard laptop.
Availability and implementation: Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sarah Lutteropp
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Céline Scornavacca
- Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Place Eugène Bataillon, 34095 Montpellier Cedex 05, France
- Alexey M Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Benoit Morel
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany

22
Pang XX, Zhang DY. Impact of Ghost Introgression on Coalescent-based Species Tree Inference and Estimation of Divergence Time. Syst Biol 2022; 72:35-49. [PMID: 35799362 DOI: 10.1093/sysbio/syac047] [Received: 01/17/2022] [Revised: 06/25/2022] [Accepted: 07/05/2022] [Indexed: 11/15/2022]
Abstract
The species studied in any evolutionary investigation generally constitute a small proportion of all species, whether extant or extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimations of species trees has rarely been studied and is poorly understood. Here, we use mathematical analysis and simulations to examine the robustness of species tree methods based on the multispecies coalescent model to introgression from a ghost or extant lineage. We found that many results originally obtained for introgression between extant species can easily be extended to ghost introgression, such as the strongly interactive effects of incomplete lineage sorting (ILS) and introgression on the occurrence of anomalous gene trees (AGTs). The relative performance of the summary species tree method (ASTRAL) and the full-likelihood method (*BEAST) varies under different introgression scenarios, with the former being more robust to gene flow between non-sister species and the latter performing better under certain conditions of ghost introgression. When an outgroup ghost (defined as a lineage that diverged before the most basal species under investigation) acts as the donor of the introgressed genes, the time of root divergence among the investigated species was generally overestimated, whereas ingroup introgression, as commonly perceived, can only lead to underestimation. In many cases of ingroup introgression that may or may not involve ghost lineages, the stronger the ILS, the higher the accuracy achieved in estimating the time of root divergence, although the topology of the species tree is more prone to be biased by the effect of introgression.
Affiliation(s)
- Xiao-Xu Pang
- State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Da-Yong Zhang
- State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China

23
Lutteropp S, Scornavacca C, Kozlov AM, Morel B, Stamatakis A. NetRAX: Accurate and Fast Maximum Likelihood Phylogenetic Network Inference. Bioinformatics 2022; 38:3725-3733. [PMID: 35713506 PMCID: PMC9344847 DOI: 10.1093/bioinformatics/btac396] [Received: 12/16/2021] [Revised: 05/11/2022] [Accepted: 06/14/2022] [Indexed: 12/03/2022]
Abstract
Motivation: Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity and current tools can only analyze small datasets.
Results: We present NetRAX, a tool for maximum likelihood (ML) inference of phylogenetic networks in the absence of ILS. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of ‘displayed trees’. NetRAX can infer ML phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in Bayesian Information Criterion (BIC) score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8000 sites, 30 taxa and 3 reticulations completes within a few minutes on a standard laptop.
Availability and implementation: Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sarah Lutteropp
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, 69118, Germany
- Céline Scornavacca
- Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Place Eugène Bataillon, 34095 Montpellier Cedex 05, France
- Alexey M Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, 69118, Germany
- Benoit Morel
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, 76128, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, 76128, Germany

24
Young MK, Smith R, Pilgrim KL, Isaak DJ, McKelvey KS, Parkes S, Egge J, Schwartz MK. A Molecular Taxonomy of Cottus in western North America. Western North American Naturalist 2022. [DOI: 10.3398/064.082.0208] [Indexed: 11/20/2022]
Affiliation(s)
- Michael K. Young
- USDA Forest Service, National Genomics Center for Wildlife and Fish Conservation, Rocky Mountain Research Station, 800 E. Beckwith Avenue, Missoula, MT 59802
| | - Rebecca Smith
- USDA Forest Service, National Genomics Center for Wildlife and Fish Conservation, Rocky Mountain Research Station, 800 E. Beckwith Avenue, Missoula, MT 59802
| | - Kristine L. Pilgrim
- USDA Forest Service, National Genomics Center for Wildlife and Fish Conservation, Rocky Mountain Research Station, 800 E. Beckwith Avenue, Missoula, MT 59802
| | - Daniel J. Isaak
- USDA Forest Service, Rocky Mountain Research Station, 322 East Front Street Suite 401, Boise, ID 83702
| | - Kevin S. McKelvey
- USDA Forest Service, National Genomics Center for Wildlife and Fish Conservation, Rocky Mountain Research Station, 800 E. Beckwith Avenue, Missoula, MT 59802
| | - Sharon Parkes
- USDA Forest Service, Rocky Mountain Research Station, 322 East Front Street Suite 401, Boise, ID 83702
| | - Jacob Egge
- Department of Biology, Pacific Lutheran University, Tacoma, WA 98447
| | - Michael K. Schwartz
- USDA Forest Service, National Genomics Center for Wildlife and Fish Conservation, Rocky Mountain Research Station, 800 E. Beckwith Avenue, Missoula, MT 59802
25
Interpreting phylogenetic conflict: Hybridization in the most speciose genus of lichen-forming fungi. Mol Phylogenet Evol 2022; 174:107543. [PMID: 35690378 DOI: 10.1016/j.ympev.2022.107543] [Received: 06/11/2021] [Revised: 02/06/2022] [Accepted: 05/13/2022]
Abstract
While advances in sequencing technologies have been invaluable for understanding evolutionary relationships, increasingly large genomic data sets may result in conflicting evolutionary signals that are often caused by biological processes, including hybridization. Hybridization has been detected in a variety of organisms, influencing evolutionary processes such as generating reproductive barriers and mixing standing genetic variation. Here, we investigate the potential role of hybridization in the diversification of the most speciose genus of lichen-forming fungi, Xanthoparmelia. As Xanthoparmelia is thought to have undergone recent, rapid diversification, this genus is particularly suitable for investigating and interpreting the origins of phylogenomic conflict. Focusing on a clade of Xanthoparmelia largely restricted to the Holarctic region, we used a genome skimming approach to generate 962 single-copy gene regions representing over 2 Mbp of the mycobiont genome. From this genome-scale dataset, we inferred evolutionary relationships using both concatenation and coalescent-based species tree approaches. We also used three independent tests for hybridization. Although different species tree reconstruction methods recovered largely consistent and well-supported trees, there was widespread incongruence among individual gene trees. Despite challenges in differentiating hybridization from incomplete lineage sorting (ILS) in situations of recent rapid radiations, our genome-wide analyses detected multiple potential hybridization events in the Holarctic clade, suggesting one possible source of trait variability in this hyperdiverse genus. This study highlights the value of using a pluralistic approach for characterizing genome-scale conflict, even in groups with well-resolved phylogenies, while underscoring current challenges in detecting the specific impacts of hybridization.
26
Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol 2022; 84:47. [PMID: 35503141 DOI: 10.1007/s00285-022-01746-y] [Received: 09/21/2021] [Revised: 01/18/2022] [Accepted: 03/31/2022]
Abstract
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes and, where possible, a biological interpretation of the structural constraints underlying them. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Affiliation(s)
- Sungsik Kong
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
- Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
- Laura Kubatko
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA; Department of Statistics, The Ohio State University, Columbus, OH, USA
- Kristina Wicke
- Department of Mathematics, The Ohio State University, Columbus, OH, USA
27
Chen LY, Lu B, Morales-Briones DF, Moody ML, Liu F, Hu GW, Huang CH, Chen JM, Wang QF. Phylogenomic Analyses of Alismatales Shed Light into Adaptations to Aquatic Environments. Mol Biol Evol 2022; 39:6570642. [PMID: 35438770 PMCID: PMC9070837 DOI: 10.1093/molbev/msac079]
Abstract
Land plants first evolved from freshwater algae, and flowering plants returned to water as early as the Cretaceous and multiple times subsequently. Alismatales is the largest clade of aquatic angiosperms including all marine angiosperms, as well as terrestrial plants. We used Alismatales to explore plant adaptations to aquatic environments by analyzing a data set that included 95 samples (89 Alismatales species) covering four genomes and 91 transcriptomes (59 generated in this study). To provide a basis for investigating adaptations, we assessed phylogenetic conflict and whole-genome duplication (WGD) events in Alismatales. We recovered a relationship for the three main clades in Alismatales as (Tofieldiaceae, Araceae) + core Alismatids. We also found phylogenetic conflict among the three main clades that was best explained by incomplete lineage sorting and introgression. Overall, we identified 18 putative WGD events across Alismatales. One of them occurred at the most recent common ancestor of core Alismatids, and three occurred at seagrass lineages. We also found that lineage and life-form were both important for different evolutionary patterns for the genes related to freshwater and marine adaptation. For example, several light- or ethylene-related genes were lost in the seagrass Zosteraceae, but are present in other seagrasses and freshwater species. Stomata-related genes were lost in both submersed freshwater species and seagrasses. Nicotianamine synthase genes, which are important in iron intake, expanded in both submersed freshwater species and seagrasses. Our results advance the understanding of the adaptation to aquatic environments and WGDs using phylogenomics.
Affiliation(s)
- Ling-Yun Chen
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden/Core Botanical Garden, Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China; Department of Resources Science of Traditional Chinese Medicines, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Bei Lu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden/Core Botanical Garden, Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China; College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
- Diego F Morales-Briones
- Department of Plant and Microbial Biology, University of Minnesota, 140 Gortner Laboratory, 1479 Gortner Avenue, Saint Paul, MN 55108, USA; Systematics, Biodiversity and Evolution of Plants, Ludwig-Maximilians-Universität München, Menzinger Str. 67, 80638 Munich, Germany
- Michael L Moody
- Department of Biological Sciences, University of Texas at El Paso, 500 West University Ave, El Paso, TX 79968, USA
- Fan Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden/Core Botanical Garden, Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China
- Guang-Wan Hu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden/Core Botanical Garden, Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China
- Chien-Hsun Huang
- State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Biodiversity Sciences and Ecological Engineering, Institute of Plant Biology, Institute of Biodiversity Sciences, School of Life Sciences, Fudan University, Shanghai 200433, China
- Jin-Ming Chen
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden/Core Botanical Garden, Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China
- Qing-Feng Wang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden/Core Botanical Garden, Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China
28
Yang Z, Flouri T. Estimation of Cross-Species Introgression Rates using Genomic Data Despite Model Unidentifiability. Mol Biol Evol 2022; 39:6568285. [PMID: 35417543 PMCID: PMC9087891 DOI: 10.1093/molbev/msac083]
Abstract
Full-likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies of unidentifiability have focused on heuristic methods based on gene trees and do not make efficient use of the information in the data. Here we study the unidentifiability of MSci models under the full-likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. We derive simple rules for arbitrary BDI models, which create unidentifiability of the label-switching type. In general, an MSci model with k BDI events has 2^k unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between nonsister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo samples to remove label-switching problems and implement them in the bpp program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.
Affiliation(s)
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E6BT, UK
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E6BT, UK
29
Markin A, Wagle S, Anderson TK, Eulenstein O. RF-Net 2: fast inference of virus reassortment and hybridization networks. Bioinformatics 2022; 38:2144-2152. [PMID: 35150239 PMCID: PMC9004648 DOI: 10.1093/bioinformatics/btac075] [Received: 05/05/2021] [Revised: 01/26/2022] [Accepted: 02/07/2022]
Abstract
MOTIVATION A phylogenetic network is a powerful model to represent entangled evolutionary histories with both divergent (speciation) and convergent (e.g. hybridization, reassortment, recombination) evolution. The standard approach to inference of hybridization networks is to (i) reconstruct rooted gene trees and (ii) leverage gene tree discordance for network inference. Recently, we introduced a method called RF-Net for accurate inference of virus reassortment and hybridization networks from input gene trees in the presence of errors commonly found in phylogenetic trees. While RF-Net demonstrated the ability to accurately infer networks with up to four reticulations from erroneous input gene trees, its application was limited by the number of reticulations it could handle in a reasonable amount of time. This limitation is particularly restrictive in the inference of the evolutionary history of segmented RNA viruses such as influenza A virus (IAV), where reassortment is one of the major mechanisms shaping the evolution of these pathogens. RESULTS Here, we expand the functionality of RF-Net that makes it significantly more applicable in practice. Crucially, we introduce a fast extension to RF-Net, called Fast-RF-Net, that can handle large numbers of reticulations without sacrificing accuracy. In addition, we develop automatic stopping criteria to select the appropriate number of reticulations heuristically and implement a feature for RF-Net to output error-corrected input gene trees. We then conduct a comprehensive study of the original method and its novel extensions and confirm their efficacy in practice using extensive simulation and empirical IAV evolutionary analyses. AVAILABILITY AND IMPLEMENTATION RF-Net 2 is available at https://github.com/flu-crew/rf-net-2. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexey Markin
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA 50010, USA
- Sanket Wagle
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Tavis K Anderson
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA 50010, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
30
Identifiability of species network topologies from genomic sequences using the logDet distance. J Math Biol 2022; 84:35. [PMID: 35385988 DOI: 10.1007/s00285-022-01734-2] [Received: 08/03/2021] [Revised: 01/12/2022] [Accepted: 03/02/2022]
Abstract
Inference of network-like evolutionary relationships between species from genomic data must address the interwoven signals from both gene flow and incomplete lineage sorting. The heavy computational demands of standard approaches to this problem severely limit the size of datasets that may be analyzed, in both the number of species and the number of genetic loci. Here we provide a theoretical pointer to more efficient methods, by showing that logDet distances computed from genomic-scale sequences retain sufficient information to recover network relationships in the level-1 ultrametric case. This result is obtained under the Network Multispecies Coalescent model combined with a mixture of General Time-Reversible sequence evolution models across individual gene trees. It applies to both unlinked site data, such as for SNPs, and to sequence data in which many contiguous sites may have evolved on a common tree, such as concatenated gene sequences. Thus under standard stochastic models statistically justifiable inference of network relationships from sequences can be accomplished without consideration of individual genes or gene trees.
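The quantity at the centre of this abstract is the logDet distance between two sequences. As a hedged illustration only (not the authors' code, and not their network-inference procedure), the Python sketch below computes one common form of the pairwise logDet (paralinear) distance, d = -1/4 [ln det F - 1/2 (ln prod f_x + ln prod f_y)], where F is the 4x4 matrix of joint base-pair frequencies and f_x, f_y are the base compositions of the two sequences. The function names are hypothetical, and the sketch assumes that only unambiguous A/C/G/T sites are compared.

```python
import math

BASES = "ACGT"


def _determinant(matrix: list[list[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def logdet_distance(x: str, y: str) -> float:
    """Hypothetical helper: logDet (paralinear) distance of two aligned DNA sequences.

    d = -1/4 * [ln det F - 1/2 * (ln prod(f_x) + ln prod(f_y))]
    F: joint base-pair frequencies; f_x, f_y: its row and column sums.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    index = {b: i for i, b in enumerate(BASES)}
    counts = [[0.0] * 4 for _ in range(4)]
    n = 0
    for bx, by in zip(x.upper(), y.upper()):
        if bx in index and by in index:  # assumption: skip gaps and ambiguity codes
            counts[index[bx]][index[by]] += 1
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    F = [[c / n for c in row] for row in counts]
    f_x = [sum(row) for row in F]                              # composition of x
    f_y = [sum(F[i][j] for i in range(4)) for j in range(4)]   # composition of y
    det_f = _determinant(F)
    if det_f <= 0.0 or min(f_x) == 0.0 or min(f_y) == 0.0:
        # Undefined here, e.g. a base is absent or divergence is saturated.
        raise ValueError("logDet distance undefined for these sequences")
    log_fx = sum(math.log(p) for p in f_x)
    log_fy = sum(math.log(p) for p in f_y)
    return -0.25 * (math.log(det_f) - 0.5 * (log_fx + log_fy))
```

Identical sequences give a distance of 0, and for x = ACGTACGT and y = ACGTACGA the sketch returns ln(3)/8 ≈ 0.137. The result described in the abstract concerns such distances computed across many taxa under the Network Multispecies Coalescent; this sketch shows only the pairwise computation.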
31
Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022; 31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022]
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
32
Sanderson MJ, Búrquez A, Copetti D, McMahon MM, Zeng Y, Wojciechowski MF. Origin and diversification of the saguaro cactus (Carnegiea gigantea): a within-species phylogenomic analysis. Syst Biol 2022; 71:1178-1194. [PMID: 35244183 DOI: 10.1093/sysbio/syac017] [Received: 03/25/2020] [Revised: 02/18/2022] [Accepted: 02/25/2022]
Abstract
Reconstructing accurate historical relationships within a species poses numerous challenges, not least in many plant groups in which gene flow is high enough to extend well beyond species boundaries. Nonetheless, the extent of tree-like history within a species is an empirical question on which it is now possible to bring large amounts of genome sequence to bear. We assess phylogenetic structure across the geographic range of the saguaro cactus, an emblematic member of Cactaceae, a clade known for extensive hybridization and porous species boundaries. Using 200 Gb of whole genome resequencing data from 20 individuals sampled from 10 localities, we assembled two data sets comprising 150,000 biallelic single nucleotide polymorphisms (SNPs) from protein coding sequences. From these we inferred within-species trees and evaluated their significance and robustness using five qualitatively different inference methods. Despite the low sequence diversity, large census population sizes, and presence of wide-ranging pollen and seed dispersal agents, phylogenetic trees were well resolved and highly consistent across both data sets and all methods. We inferred that the most likely root, based on marginal likelihood comparisons, is to the east and south of the region of highest genetic diversity, which lies along the coast of the Gulf of California in Sonora, Mexico. Together with striking decreases in marginal likelihood found to the north, this supports hypotheses that saguaro's current range reflects post-glacial expansion from the refugia in the south of its range. We conclude with observations about practical and theoretical issues raised by phylogenomic data sets within species, in which SNP-based methods must be used rather than gene tree methods that are widely used when sequence divergence is higher. These include computational scalability, inference of gene flow, and proper assessment of statistical support in the presence of linkage effects.
Affiliation(s)
- Michael J Sanderson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Alberto Búrquez
- Instituto de Ecología, Unidad Hermosillo, Universidad Nacional Autónoma de México, Hermosillo, Sonora, Mexico
- Dario Copetti
- Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Yichao Zeng
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
33
Giaretta A, Murphy B, Maurin O, Mazine FF, Sano P, Lucas E. Phylogenetic Relationships Within the Hyper-Diverse Genus Eugenia (Myrtaceae: Myrteae) Based on Target Enrichment Sequencing. FRONTIERS IN PLANT SCIENCE 2022; 12:759460. [PMID: 35185945 PMCID: PMC8855041 DOI: 10.3389/fpls.2021.759460] [Received: 08/16/2021] [Accepted: 11/29/2021]
Abstract
Eugenia is one of the most taxonomically challenging lineages of flowering plants, and its morphological delimitation has changed over the last few years as a result of recent phylogenetic studies based on molecular data. Efforts, until now, have been limited to Sanger sequencing of mostly plastid markers. These phylogenetic studies indicate 11 clades formalized as infrageneric groups. However, relationships among these clades are poorly supported at key nodes and inconsistent between studies, particularly along the backbone and within Eugenia sect. Umbellatae, which encompasses ca. 700 species. To resolve and better understand systematic discordance, 54 Eugenia taxa were subjected to phylogenomic Hyb-Seq using 353 low-copy nuclear genes. Twenty species trees based on coding and non-coding loci of nuclear and plastid datasets were recovered using coalescent and concatenated approaches. Concordant and conflicting topologies were assessed by comparing tree landscapes, topology tests, and gene and site concordance factors. The topologies are similar except between nuclear and plastid datasets. The coalescent trees better accommodate disparity in the intron dataset, which contains more parsimony-informative sites, while concatenated trees recover more conservative topologies, as they have a narrower distribution in the tree landscape. This suggests that highly supported phylogenetic relationships determined in previous studies do not necessarily indicate overwhelming concordant signal. Congruence must be interpreted carefully, especially in concatenated datasets. Despite this, the congruence between the multi-species coalescent (MSC) approach and concatenated tree topologies found here is notable. Our analysis does not support Eugenia subg. Pseudeugenia or sect. Pilothecium as currently circumscribed, suggesting that taxonomic reassessment is necessary. Five clades within Eugenia sect. Umbellatae are further discussed as a step toward dividing the section into workable clades.
While targeted sequencing provides a massive quantity of data that improves phylogenetic resolution in Eugenia, uncertainty still remains in Eugenia sect. Umbellatae. The general pattern of higher site concordance factor (sCF) than gene concordance factor (gCF) in the backbone of Eugenia suggests stochastic error from limited signal. Tree landscapes in combination with concordance factor scores, as implemented here, provide a comprehensive approach that incorporates several phylogenetic hypotheses. We believe the protocols employed here will be of use for future investigations of the evolutionary history of Myrtaceae.
Affiliation(s)
- Augusto Giaretta
- Faculdade de Ciências Biológicas e Ambientais, Universidade Federal da Grande Dourados, Unidade II, Dourados, Brazil
- Laboratório de Sistemática Vegetal, Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Bruce Murphy
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Surrey, United Kingdom
- Department of Life Sciences, Imperial College, London, United Kingdom
- Olivier Maurin
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Surrey, United Kingdom
- Fiorella F. Mazine
- Centro de Ciências e Tecnologias para a Sustentabilidade, Universidade Federal de São Carlos, Campus Sorocaba, Sorocaba, Brazil
- Paulo Sano
- Laboratório de Sistemática Vegetal, Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Eve Lucas
- Herbarium, Royal Botanic Gardens, Kew, Surrey, United Kingdom
34
Jiao X, Flouri T, Yang Z. Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. Natl Sci Rev 2022; 8:nwab127. [PMID: 34987842 PMCID: PMC8692950 DOI: 10.1093/nsr/nwab127] [Received: 06/02/2021] [Revised: 07/10/2021] [Accepted: 07/11/2021]
Abstract
The multispecies coalescent (MSC) is the extension of the single-population coalescent model to multiple species. It integrates the phylogenetic process of species divergence and the population genetic process of coalescence, and provides a powerful framework for a number of inference problems using genomic sequence data from multiple species, including estimation of species divergence times and population sizes, estimation of species trees accommodating discordant gene trees, inference of cross-species gene flow and species delimitation. In this review, we introduce the major features of the MSC model, discuss full-likelihood and heuristic methods of species tree estimation and summarize recent methodological advances in inference of cross-species gene flow. We discuss the statistical and computational challenges in the field and research directions where breakthroughs may be likely in the next few years.
Affiliation(s)
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
35
Thawornwattana Y, Seixas FA, Yang Z, Mallet J. Full-likelihood genomic analysis clarifies a complex history of species divergence and introgression: the example of the erato-sara group of Heliconius butterflies. Syst Biol 2022; 71:1159-1177. [PMID: 35169847 PMCID: PMC9366460 DOI: 10.1093/sysbio/syac009] [Received: 07/07/2021] [Revised: 02/01/2022] [Accepted: 02/08/2022]
Abstract
Introgressive hybridization plays a key role in adaptive evolution and species diversification in many groups of species. However, frequent hybridization and gene flow between species make estimation of the species phylogeny and key population parameters challenging. Here, we show that by accounting for phasing and using full-likelihood methods, introgression histories and population parameters can be estimated reliably from whole-genome sequence data. We employ the multispecies coalescent (MSC) model with and without gene flow to infer the species phylogeny and cross-species introgression events using genomic data from six members of the erato-sara clade of Heliconius butterflies. The methods naturally accommodate random fluctuations in genealogical history across the genome due to deep coalescence. To avoid heterozygote phasing errors in haploid sequences commonly produced by genome assembly methods, we process and compile unphased diploid sequence alignments and use analytical methods to average over uncertainties in heterozygote phase resolution. There is robust evidence for introgression across the genome, both among distantly related species deep in the phylogeny and between sister species in shallow parts of the tree. We obtain chromosome-specific estimates of key population parameters such as introgression directions, times and probabilities, as well as species divergence times and population sizes for modern and ancestral species. We confirm ancestral gene flow between the sara clade and an ancestral population of Heliconius telesiphe, a likely hybrid speciation origin for Heliconius hecalesia, and gene flow between the sister species Heliconius erato and Heliconius himera. Inferred introgression among ancestral species also explains the history of two chromosomal inversions deep in the phylogeny of the group. 
This study illustrates how a full-likelihood approach based on the MSC makes it possible to extract rich historical information about species divergence and gene flow from genomic data. [3s; bpp; gene flow; Heliconius; hybrid speciation; introgression; inversion; multispecies coalescent]
Affiliation(s)
- Yuttapong Thawornwattana
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Fernando A Seixas
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- James Mallet
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
36
Abstract
Phylogenetic networks represent evolutionary history of species and can record natural reticulate evolutionary processes such as horizontal gene transfer and gene recombination. This makes phylogenetic networks a more comprehensive representation of evolutionary history compared to phylogenetic trees. Stochastic processes for generating random trees or networks are important tools in evolutionary analysis, especially in phylogeny reconstruction where they can be utilized for validation or serve as priors for Bayesian methods. However, as more network generators are developed, there is a lack of discussion or comparison for different generators. To bridge this gap, we compare a set of phylogenetic network generators by profiling topological summary statistics of the generated networks over the number of reticulations and comparing the topological profiles.
Affiliation(s)
- Remie Janssen
- Delft University of Technology, Delft Institute of Applied Mathematics, Mekelweg 4, 2628 CD, Delft, The Netherlands
- Pengyu Liu
- Simon Fraser University, Department of Mathematics, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
37
Hibbins MS, Hahn MW. Phylogenomic approaches to detecting and characterizing introgression. Genetics 2021; 220:6425633. [PMID: 34788444 PMCID: PMC9208645 DOI: 10.1093/genetics/iyab173] [Received: 07/16/2021] [Accepted: 10/02/2021]
Abstract
Phylogenomics has revealed the remarkable frequency with which introgression occurs across the tree of life. These discoveries have been enabled by the rapid growth of methods designed to detect and characterize introgression from whole-genome sequencing data. A large class of phylogenomic methods makes use of data across species to infer and characterize introgression based on expectations from the multispecies coalescent. These methods range from simple tests, such as the D-statistic, to model-based approaches for inferring phylogenetic networks. Here, we provide a detailed overview of the various signals that different modes of introgression are expected to leave in the genome, and how current methods are designed to detect them. We discuss the strengths and pitfalls of these approaches and identify areas for future development, highlighting the different signals of introgression and the power of each method to detect them. We conclude with a discussion of current challenges in inferring introgression and how they could potentially be addressed.
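The D-statistic named in this abstract is simple enough to sketch. As an illustration only (not code from the review), the Python function below computes Patterson's D = (ABBA - BABA) / (ABBA + BABA) from four aligned sequences on the rooted tree (((P1, P2), P3), O). The function name is hypothetical, and the sketch assumes that the outgroup carries the ancestral allele at every site and that sequences contain only unambiguous bases or "-" gaps; significance testing (usually a block jackknife) is not included.

```python
def d_statistic(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Hypothetical helper: Patterson's D on the tree (((P1, P2), P3), O).

    Assumes the outgroup allele is ancestral ("A") and counts only
    biallelic, gap-free sites where P3 carries the derived allele ("B"):
      ABBA: P1 ancestral, P2 derived  -> P2 and P3 share the derived allele
      BABA: P1 derived,  P2 ancestral -> P1 and P3 share the derived allele
    """
    if not len(p1) == len(p2) == len(p3) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        site = (a1, a2, a3, ao)
        if "-" in site or len(set(site)) != 2:
            continue  # skip gapped, invariant, and multi-allelic sites
        if a3 == ao:
            continue  # P3 must carry the derived allele
        if a1 == ao and a2 == a3:
            abba += 1
        elif a2 == ao and a1 == a3:
            baba += 1
    total = abba + baba
    # With incomplete lineage sorting alone, ABBA and BABA are expected
    # to be equally frequent, so D should be close to 0.
    return 0.0 if total == 0 else (abba - baba) / total
```

For the toy alignment P1 = AACCA, P2 = CCACA, P3 = CCCCA, O = AAAAA there are two ABBA sites and one BABA site, so D = 1/3; a positive D indicates excess derived-allele sharing between P2 and P3. Note that, as the Pang and Zhang study above argues for site-pattern methods generally, such counts alone may not distinguish introgression between sampled species from introgression involving an unsampled (ghost) lineage.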
Affiliation(s)
- Mark S Hibbins
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
38
Myers EA, Mulcahy DG, Falk B, Johnson K, Carbi M, de Queiroz K. Interspecific Gene Flow and Mitochondrial Genome Capture During the Radiation of Jamaican Anolis Lizards (Squamata; Iguanidae). Syst Biol 2021; 71:501-511. [PMID: 34735007 DOI: 10.1093/sysbio/syab089] [Received: 11/20/2020] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 11/12/2022]
Abstract
Gene flow and reticulation are increasingly recognized as important processes in the diversification of many taxonomic groups. With the increasing ease of collecting genomic data and the development of multispecies coalescent network approaches, such reticulations can be accounted for when inferring phylogeny and diversification. Caribbean Anolis lizards are a classic example of an adaptive radiation in which species have independently radiated on the islands of the Greater Antilles into the same ecomorph classes. Within the Jamaican radiation at least one species, A. opalinus, has been documented to be polyphyletic in its mitochondrial DNA, which could be the result of an ancient reticulation event or incomplete lineage sorting. Here we generate mtDNA and genotyping-by-sequencing (GBS) data and implement gene-tree, species-tree, and multispecies coalescent network methods to infer the diversification of this group. Our mtDNA gene tree recovers the same relationships previously inferred for this group, which are strikingly different from the species tree inferred from our GBS data. Posterior predictive simulations suggest that our genomic data violate commonly adopted assumptions of the multispecies coalescent model, so we use network approaches to infer phylogenetic relationships. The inferred network topology contains a reticulation event but does not explain the mtDNA polyphyly observed in this group; however, coalescent simulations suggest that the observed mtDNA topology is likely the result of past introgression. How common a signature of gene flow and reticulation is across the radiation of Anolis is unknown; however, the reticulation events that we demonstrate here may have allowed for adaptive evolution, as has been suggested in other, more recent adaptive radiations.
Affiliation(s)
- Edward A Myers
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Department of Herpetology, The American Museum of Natural History, New York, NY, USA
- Daniel G Mulcahy
- Global Genome Initiative, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Bryan Falk
- Division of Invertebrate Zoology, Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, USA
- Kiyomi Johnson
- Science Research Mentoring Program, American Museum of Natural History, Central Park West and 79th St., NY, NY 10024, USA
- Marina Carbi
- Science Research Mentoring Program, American Museum of Natural History, Central Park West and 79th St., NY, NY 10024, USA
- Kevin de Queiroz
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
39
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. Annu Rev Ecol Evol Syst 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
40
Myers EA. Genome-wide data reveal extensive gene flow during the diversification of the western rattlesnakes (Viperidae: Crotalinae: Crotalus). Mol Phylogenet Evol 2021; 165:107313. [PMID: 34537323 DOI: 10.1016/j.ympev.2021.107313] [Received: 12/30/2020] [Revised: 08/28/2021] [Accepted: 09/14/2021] [Indexed: 11/15/2022]
Abstract
Hybridization and introgression are important but often overlooked processes when inferring phylogenies. When these processes are not accounted for and a strictly diverging phylogenetic model is applied to groups with a history of hybridization, phylogenetic inference and parameter estimation can be inaccurate. Recent developments in phylogenetic network approaches coupled with the increasing availability of genomic data allow inferences of reticulate evolutionary histories across the tree of life. The western rattlesnake species group (C. viridis species complex, C. mitchellii species complex, C. scutulatus, and C. tigris) is an iconic snake lineage that is widespread across western North America. This group is composed of several species complexes with unclear species limits, likely the result of ongoing gene flow among nascent lineages. Here I generate reduced representation genomic data and test for a history of reticulation within this group. I demonstrate that all species have undergone hybridization with at least one other lineage, suggesting introgression is widespread in this group. Topologies differ between phylogenies estimated under the multispecies coalescent and multispecies network coalescent methods, indicating that gene flow has obscured phylogenetic relationships within this group. These past introgression events are predominantly restricted to species that co-occur geographically. However, within species that have a history of introgression, this signature is detected regardless of specimen sampling across geography. Overall, my results suggest the accumulation of reproductive isolating barriers occurs slowly in rattlesnakes, which likely leads to the difficulty in delimiting species; furthermore, the results of this study have implications for trait evolution in this group.
Affiliation(s)
- Edward A Myers
- Department of Herpetology, American Museum of Natural History, New York, NY, USA
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
41
Rabier CE, Berry V, Stoltz M, Santos JD, Wang W, Glaszmann JC, Pardi F, Scornavacca C. On the inference of complex phylogenetic networks by Markov Chain Monte-Carlo. PLoS Comput Biol 2021; 17:e1008380. [PMID: 34478440 PMCID: PMC8445492 DOI: 10.1371/journal.pcbi.1008380] [Received: 10/04/2020] [Revised: 09/16/2021] [Accepted: 07/13/2021] [Indexed: 11/19/2022]
Abstract
For various species, high-quality sequences and complete genomes are nowadays available for many individuals. This makes data analysis challenging, as methods need to be not only accurate but also time-efficient given the tremendous amount of data to process. In this article, we introduce an efficient method to infer the evolutionary history of individuals under the multispecies coalescent model in networks (MSNC). Phylogenetic networks are an extension of phylogenetic trees that can contain reticulate nodes, which allow the modeling of complex biological events such as horizontal gene transfer, hybridization and introgression. We present a novel way to compute the likelihood of biallelic markers sampled along genomes whose evolution involved such events. This likelihood computation is at the heart of a Bayesian network inference method called SnappNet, as it extends the Snapp method, which infers evolutionary trees under the multispecies coalescent model, to networks. SnappNet is available as a package of the well-known BEAST 2 software. Recently, the MCMC_BiMarkers method, implemented in PhyloNet, also extended Snapp to networks. Both methods take biallelic markers as input, rely on the same model of evolution and sample networks in a Bayesian framework, though using different methods for computing priors. However, SnappNet relies on algorithms that are exponentially more time-efficient on non-trivial networks. Using simulations, we compare the performance of SnappNet and MCMC_BiMarkers. We show that both methods enjoy similar abilities to recover simple networks, but SnappNet is more accurate than MCMC_BiMarkers on more complex network scenarios. Also, on complex networks, SnappNet is found to be much faster than MCMC_BiMarkers in terms of the time required for the likelihood computation. Finally, we illustrate SnappNet's performance on a rice data set. SnappNet infers a scenario that is consistent with previous results and provides additional understanding of rice evolution.
Affiliation(s)
- Charles-Elie Rabier
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Institut Montpelliérain Alexander Grothendieck (IMAG), Université de Montpellier, CNRS, Montpellier, France
- Vincent Berry
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Marnus Stoltz
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
- João D. Santos
- CIRAD, UMR AGAP, Montpellier, France
- Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP), Université de Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Wensheng Wang
- Institute of Crop Sciences (ICS), Chinese Academy of Agricultural Sciences, Beijing, China
- Jean-Christophe Glaszmann
- CIRAD, UMR AGAP, Montpellier, France
- Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP), Université de Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Fabio Pardi
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Celine Scornavacca
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
42
Rhodes JA, Baños H, Mitchell JD, Allman ES. MSCquartets 1.0: quartet methods for species trees and networks under the multispecies coalescent model in R. Bioinformatics 2021; 37:1766-1768. [PMID: 33031510 DOI: 10.1093/bioinformatics/btaa868] [Received: 05/01/2020] [Revised: 09/17/2020] [Accepted: 09/23/2020] [Indexed: 12/29/2022]
Abstract
SUMMARY MSCquartets is an R package for species tree hypothesis testing, inference of species trees and inference of species networks under the Multispecies Coalescent model of incomplete lineage sorting and its network analog. Input for these analyses consists of collections of metric or topological locus trees, which are then summarized by the quartets displayed on them. Results of hypothesis tests at user-supplied levels are displayed in a simplex plot by color-coded points. The package implements the QDC and WQDC algorithms for topological and metric species tree inference, and the NANUQ algorithm for level-1 topological species network inference, all of which give statistically consistent estimators under the model. AVAILABILITY AND IMPLEMENTATION MSCquartets is available through the Comprehensive R Archive Network: https://CRAN.R-project.org/package=MSCquartets.
Affiliation(s)
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA
- Hector Baños
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, USA
- Jonathan D Mitchell
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA
43
Guinand B, Oral M, Tougard C. Brown trout phylogenetics: A persistent mirage towards (too) many species. J Fish Biol 2021; 99:298-307. [PMID: 33483952 DOI: 10.1111/jfb.14686] [Received: 10/20/2020] [Revised: 12/28/2020] [Accepted: 01/19/2021] [Indexed: 06/12/2023]
Affiliation(s)
- Bruno Guinand
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Münevver Oral
- Faculty of Fisheries and Aquatic Science, Recep Tayyip Erdogan University, Rize, Turkey
44
Rhodes JA, Baños H, Mitchell JD, Allman ES. MSCquartets 1.0: quartet methods for species trees and networks under the multispecies coalescent model in R. Bioinformatics 2021; 37:1766-1768. [PMID: 33031510 DOI: 10.1101/2020.05.01.073361] [Received: 05/01/2020] [Revised: 09/17/2020] [Accepted: 09/23/2020] [Indexed: 05/26/2023]
Abstract
SUMMARY MSCquartets is an R package for species tree hypothesis testing, inference of species trees and inference of species networks under the Multispecies Coalescent model of incomplete lineage sorting and its network analog. Input for these analyses consists of collections of metric or topological locus trees, which are then summarized by the quartets displayed on them. Results of hypothesis tests at user-supplied levels are displayed in a simplex plot by color-coded points. The package implements the QDC and WQDC algorithms for topological and metric species tree inference, and the NANUQ algorithm for level-1 topological species network inference, all of which give statistically consistent estimators under the model. AVAILABILITY AND IMPLEMENTATION MSCquartets is available through the Comprehensive R Archive Network: https://CRAN.R-project.org/package=MSCquartets.
Affiliation(s)
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA
- Hector Baños
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, USA
- Jonathan D Mitchell
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA
45
Gene flow in phylogenomics: Sequence capture resolves species limits and biogeography of Afromontane forest endemic frogs from the Cameroon Highlands. Mol Phylogenet Evol 2021; 163:107258. [PMID: 34252546 DOI: 10.1016/j.ympev.2021.107258] [Received: 10/08/2020] [Revised: 06/28/2021] [Accepted: 07/07/2021] [Indexed: 11/21/2022]
Abstract
Puddle frogs of the Phrynobatrachus steindachneri species complex are a useful group for investigating speciation and phylogeography in Afromontane forests of the Cameroon Volcanic Line, western Central Africa. The species complex is represented by six morphologically relatively cryptic mitochondrial DNA lineages, only two of which are distinguished at the species level: southern P. jimzimkusi and Lake Oku endemic P. njiomock, leaving the remaining four lineages identified as 'P. steindachneri'. In this study, the six mtDNA lineages are subjected to genomic sequence capture analyses and morphological examination to delimit species and to study biogeography. The nuclear DNA data (387 loci; 571,936 aligned base pairs) distinguished all six mtDNA lineages, but the topological pattern and divergence depths supported only four main clades: P. jimzimkusi, P. njiomock, and only two divergent evolutionary lineages within the four 'P. steindachneri' mtDNA lineages. One of the two lineages is herein described as a new species, P. amieti sp. nov. Reticulate evolution (hybridization) was detected within the species complex, with morphologically intermediate hybrid individuals placed between the parental species in phylogenomic analyses, forming a ladder-like phylogenetic pattern. The presence of hybrids is undesirable in standard phylogenetic analyses but is essential and beneficial in the network multispecies coalescent. This latter approach provided insight into the reticulate evolutionary history of these endemic frogs. Introgressions likely occurred during the Middle and Late Pleistocene climatic oscillations, due to the cyclic connections (likely dominating during cold glacials) and separations (during warm interglacials) of montane forests. The genomic phylogeographic pattern supports the separation of the southern (Mt. Manengouba to Mt. Oku) and northern mountains at the onset of the Pleistocene.
Further subdivisions occurred in the Early Pleistocene, separating populations from the northernmost (Tchabal Mbabo, Gotel Mts.) and middle mountains (Mt. Mbam, Mt. Oku, Mambilla Plateau), as well as the microendemic lineage restricted to Lake Oku (Mt. Oku). This unique model system is highly threatened, as all the species within the complex have exhibited severe population declines in the past decade, placing them on the brink of extinction. In addition, Mount Oku is identified as being of particular conservation importance because it harbors three species of this complex. We therefore urge conservation action in the Cameroon Highlands to preserve their diversity before it is too late.
46
Kozak KM, Joron M, McMillan WO, Jiggins CD. Rampant Genome-Wide Admixture across the Heliconius Radiation. Genome Biol Evol 2021; 13:evab099. [PMID: 33944917 PMCID: PMC8283734 DOI: 10.1093/gbe/evab099] [Accepted: 04/30/2021] [Indexed: 12/12/2022]
Abstract
How frequent is gene flow between species? The pattern of evolution is typically portrayed as a phylogenetic tree, yet gene flow between good species may be an important mechanism in diversification, spreading adaptive traits and leading to a complex pattern of phylogenetic incongruence. This process has thus far been studied mainly among a few closely related species, or in geographically restricted areas such as islands, but not on the scale of a continental radiation. Using a genomic representation of 40 out of 47 species in the genus, we demonstrate that admixture has played a role throughout the evolution of the charismatic Neotropical butterflies Heliconius. Modeling of phylogenetic networks based on the exome uncovers up to 13 instances of interspecific gene flow. Admixture is detected among the relatives of Heliconius erato, as well as between the ancient lineages leading to modern clades. Interspecific gene flow played a role throughout the evolution of the genus, although the process has been most frequent in the clade of Heliconius melpomene and relatives. We identify Heliconius hecalesia and relatives as putative hybrids, including new evidence for introgression at the loci controlling the mimetic wing patterns. Models accounting for interspecific gene flow yield a more complete picture of the radiation as a network, which will improve our ability to study trait evolution in a realistic comparative framework.
Affiliation(s)
- Krzysztof M Kozak
- Smithsonian Tropical Research Institute, Panamá, Panamá
- Department of Zoology, University of Cambridge, United Kingdom
- Mathieu Joron
- Centre d’Ecologie Fonctionnelle et Evolutive (CEFE), CNRS, Université de Montpellier, Université Paul Valéry Montpellier 3, EPHE, IRD, France
- Chris D Jiggins
- Smithsonian Tropical Research Institute, Panamá, Panamá
- Department of Zoology, University of Cambridge, United Kingdom
47
Ogilvie HA, Mendes FK, Vaughan TG, Matzke NJ, Stadler T, Welch D, Drummond AJ. Novel Integrative Modeling of Molecules and Morphology across Evolutionary Timescales. Syst Biol 2021; 71:208-220. [PMID: 34228807 PMCID: PMC8677526 DOI: 10.1093/sysbio/syab054] [Received: 03/19/2021] [Revised: 06/23/2021] [Accepted: 06/29/2021] [Indexed: 11/13/2022]
Abstract
Evolutionary models account for either population- or species-level processes but usually not both. We introduce a new model, the FBD-MSC, which makes it possible for the first time to integrate both the genealogical and fossilization phenomena, by means of the multispecies coalescent (MSC) and the fossilized birth–death (FBD) processes. Using this model, we reconstruct the phylogeny representing all extant and many fossil Caninae, recovering both the relative and absolute time of speciation events. We quantify known inaccuracy issues with divergence time estimates using the popular strategy of concatenating molecular alignments and show that the FBD-MSC solves them. Our new integrative method and empirical results advance the paradigm and practice of probabilistic total evidence analyses in evolutionary biology. [Caninae; fossilized birth–death; molecular clock; multispecies coalescent; phylogenetics; species trees.]
Affiliation(s)
- Huw A Ogilvie
- Department of Computer Science, Rice University, Houston TX, 77005, USA
- Fábio K Mendes
- Centre for Computational Evolution, The University of Auckland, Auckland, 1010, New Zealand
- School of Biological Sciences, The University of Auckland, Auckland, 1010, New Zealand
- Timothy G Vaughan
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, 4058, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Nicholas J Matzke
- Centre for Computational Evolution, The University of Auckland, Auckland, 1010, New Zealand
- School of Biological Sciences, The University of Auckland, Auckland, 1010, New Zealand
- Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, 4058, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- David Welch
- Centre for Computational Evolution, The University of Auckland, Auckland, 1010, New Zealand
- School of Computer Science, The University of Auckland, Auckland, 1010, New Zealand
- Alexei J Drummond
- Centre for Computational Evolution, The University of Auckland, Auckland, 1010, New Zealand
- School of Computer Science, The University of Auckland, Auckland, 1010, New Zealand
- School of Biological Sciences, The University of Auckland, Auckland, 1010, New Zealand
48
Huang J, Bennett J, Flouri T, Leaché AD, Yang Z. Phase Resolution of Heterozygous Sites in Diploid Genomes is Important to Phylogenomic Analysis under the Multispecies Coalescent Model. Syst Biol 2021; 71:334-352. [PMID: 34143216 PMCID: PMC8977997 DOI: 10.1093/sysbio/syab047] [Received: 10/28/2020] [Revised: 06/03/2021] [Accepted: 06/21/2021] [Indexed: 01/01/2023]
Abstract
Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes, which are effectively chimeric sequences with the phase at heterozygous sites resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution, the diploid analytical integration algorithm which averages over all phase resolutions, computational phase resolution using the program PHASE, and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences are much older than average coalescent times but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model with and without introgression is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower with recent divergences. Use of phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountain chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate, and avoid haploid consensus sequences, which have heterozygous sites phased at random. In case the analytical integration algorithm is computationally infeasible, computational phasing prior to population genomic analyses is an acceptable alternative. [BPP; introgression; multispecies coalescent; phase; species tree.]
Affiliation(s)
- Jun Huang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Department of Mathematics, Beijing Jiaotong University, Beijing, 100044, P.R. China
- Jeremy Bennett
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269-3043, USA
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Adam D Leaché
- Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, WA 98195-1800, USA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
49
Cai R, Ané C. Assessing the fit of the multi-species network coalescent to multi-locus data. Bioinformatics 2021; 37:634-641. [PMID: 33027508 DOI: 10.1093/bioinformatics/btaa863] [Received: 05/26/2020] [Revised: 09/14/2020] [Accepted: 09/22/2020] [Indexed: 01/25/2023]
Abstract
MOTIVATION With growing genome-wide molecular datasets from next-generation sequencing, phylogenetic networks can be estimated using a variety of approaches. These phylogenetic networks explicitly include events like hybridization, gene flow or horizontal gene transfer. However, the most accurate network inference methods are computationally heavy. Methods that scale to larger datasets do not calculate a full likelihood, so traditional likelihood-based tools for model selection cannot be applied to decide how many past hybridization events best fit the data. We propose here a goodness-of-fit test to quantify the fit between data observed from genome-wide multi-locus data, and patterns expected under the multi-species coalescent model on a candidate phylogenetic network. RESULTS We identified weaknesses in the previously proposed TICR test and proposed corrections. The performance of our new test was validated by simulations on real-world phylogenetic networks. Our test provides one of the first rigorous tools for model selection, to select the network complexity adequate for the data at hand. The test can also be used to identify poorly inferred areas on a network. AVAILABILITY AND IMPLEMENTATION Software for the goodness-of-fit test is available as a Julia package at https://github.com/cecileane/QuartetNetworkGoodnessFit.jl. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruoyi Cai
- Department of Statistics, University of Wisconsin - Madison, Madison, WI 53706, USA
- Cécile Ané
- Department of Statistics, University of Wisconsin - Madison, Madison, WI 53706, USA
- Department of Botany, University of Wisconsin - Madison, Madison, WI 53706, USA
50
Introgression is widespread in the radiation of carnivorous Nepenthes pitcher plants. Mol Phylogenet Evol 2021; 163:107214. [PMID: 34052438 DOI: 10.1016/j.ympev.2021.107214] [Received: 01/24/2021] [Revised: 05/14/2021] [Accepted: 05/25/2021] [Indexed: 11/23/2022]
Abstract
Introgression and hybridization are important processes in plant evolution, but they are difficult to study from a phylogenetic perspective, because they conflict with the bifurcating evolutionary history typically depicted in phylogenetic models. The role of hybridization in plant evolution is best documented in the form of allopolyploidization. In contrast, homoploid hybridization and introgression are less explored, although they may be crucial in adaptive radiations. Here we employ genome-wide data (ddRAD-seq, transcriptomes) to investigate the evolutionary history of Nepenthes, a radiation of c. 160 species of iconic carnivorous plants mainly from tropical Asia. Our data indicate that the main radiation is only c. 5 million years old, and confirm previous bifurcating phylogenies. However, due to a greatly expanded number of loci, we were able to test for the first time the long-standing hypotheses of introgression and historical hybridization. The genus presents one very clear case of organellar capture between two distantly related but sympatric groups. Furthermore, all Nepenthes species show introgression signals in their nuclear genomes, as uncovered by a general survey of ABBA-BABA-like statistics. The ancestor of the rapid main radiation shows ancestry from two deeply diverged lineages, as indicated by phylogenetic network analyses. All major clades of the main radiation show further introgression both within and between each other, as suggested by admixture graphs. Our study supports the hypothesis that rapid adaptive radiations are hotspots of introgression in the tree of life, and highlights the need to consider non-treelike processes in evolutionary studies of Nepenthes in particular.