1
Sashittal P, Chen V, Pasarkar A, Raphael BJ. Joint inference of cell lineage and mitochondrial evolution from single-cell sequencing data. Bioinformatics 2024; 40:i218-i227. [PMID: 38940122 PMCID: PMC11211840 DOI: 10.1093/bioinformatics/btae231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: Eukaryotic cells contain organelles called mitochondria that have their own genome. Most cells contain thousands of mitochondria, which replicate, even in nondividing cells, by means of a relatively error-prone process resulting in somatic mutations in their genome. Because of the higher mutation rate compared to the nuclear genome, mitochondrial mutations have been used to track cellular lineage, particularly using single-cell sequencing that measures mitochondrial mutations in individual cells. However, existing methods to infer the cell lineage tree from mitochondrial mutations do not model "heteroplasmy," which is the presence of multiple mitochondrial clones with distinct sets of mutations in an individual cell. Single-cell sequencing data thus provide a mixture of the mitochondrial clones in individual cells, with the ancestral relationships between these clones described by a mitochondrial clone tree. While deconvolution of somatic mutations from a mixture of evolutionarily related genomes has been extensively studied in the context of bulk sequencing of cancer tumor samples, the problem of mitochondrial deconvolution has the additional constraint that the mitochondrial clone tree must be concordant with the cell lineage tree.
RESULTS: We formalize the problem of inferring a concordant pair of a mitochondrial clone tree and a cell lineage tree from single-cell sequencing data as the Nested Perfect Phylogeny Mixture (NPPM) problem. We derive a combinatorial characterization of the solutions to the NPPM problem, and formulate an algorithm, MERLIN, to solve this problem exactly using a mixed integer linear program. We show on simulated data that MERLIN outperforms existing methods that model neither mitochondrial heteroplasmy nor the concordance between the mitochondrial clone tree and the cell lineage tree. We use MERLIN to analyze single-cell whole-genome sequencing data of 5220 cells of a gastric cancer cell line and show that MERLIN infers a more biologically plausible cell lineage tree and mitochondrial clone tree compared to existing methods.
AVAILABILITY AND IMPLEMENTATION: https://github.com/raphael-group/MERLIN.
Affiliation(s)
- Palash Sashittal
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Viola Chen
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Amey Pasarkar
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, United States
2
Pang XX, Zhang DY. Detection of Ghost Introgression Requires Exploiting Topological and Branch Length Information. Syst Biol 2024; 73:207-222. [PMID: 38224495 PMCID: PMC11129598 DOI: 10.1093/sysbio/syad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 12/17/2023] [Accepted: 12/27/2023] [Indexed: 01/17/2024] Open
Abstract
In recent years, the study of hybridization and introgression has made significant progress, with ghost introgression, the transfer of genetic material from extinct or unsampled lineages to extant species, emerging as a key area for research. Accurately identifying ghost introgression, however, presents a challenge. To address this issue, we focused on simple cases involving 3 species with a known phylogenetic tree. Using mathematical analyses and simulations, we evaluated the performance of popular phylogenetic methods, including HyDe and PhyloNet/MPL, and the full-likelihood method, Bayesian Phylogenetics and Phylogeography (BPP), in detecting ghost introgression. Our findings suggest that heuristic approaches relying on site-pattern counts or gene-tree topologies struggle to differentiate ghost introgression from introgression between sampled non-sister species, frequently leading to incorrect identification of donor and recipient species. By contrast, the full-likelihood method BPP, which uses multilocus sequence alignments directly and hence takes into account both gene-tree topologies and branch lengths, is capable of detecting ghost introgression in phylogenomic datasets. We analyzed a real-world phylogenomic dataset of 14 species of Jaltomata (Solanaceae) to showcase the potential of full-likelihood methods for accurate inference of introgression.
Affiliation(s)
- Xiao-Xu Pang
  - Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Da-Yong Zhang
  - Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
3
Sánchez KI, Recknagel H, Elmer KR, Avila LJ, Morando M. Tracing evolutionary trajectories in the presence of gene flow in South American temperate lizards (Squamata: Liolaemus kingii group). Evolution 2024; 78:716-733. [PMID: 38262697 DOI: 10.1093/evolut/qpae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 12/20/2023] [Accepted: 01/17/2024] [Indexed: 01/25/2024]
Abstract
Evolutionary processes behind lineage divergence often involve multidimensional differentiation. However, in the context of recent divergences, the signals exhibited by each dimension may not converge. In such scenarios, incomplete lineage sorting, gene flow, and scarce phenotypic differentiation are pervasive. Here, we integrated genomic (RAD loci of 90 individuals), phenotypic (linear and geometric traits of 823 and 411 individuals, respectively), spatial, and climatic data to reconstruct the evolutionary history of a speciation continuum of liolaemid lizards (Liolaemus kingii group). Specifically, we (a) inferred the population structure of the group and contrasted it with the phenotypic variability; (b) assessed the role of postdivergence gene flow in shaping phylogeographic and phenotypic patterns; and (c) explored ecogeographic drivers of diversification across time and space. We inferred eight genomic clusters exhibiting leaky genetic borders coincident with geographic transitions. We also found evidence of postdivergence gene flow resulting in transgressive phenotypic evolution in one species. Predicted ancestral niches unveiled suitable areas in southern and eastern Patagonia during glacial and interglacial periods. Our study underscores integrating different data and model-based approaches to determine the underlying causes of diversification, a challenge faced in the study of recently diverged groups. We also highlight Liolaemus as a model system for phylogeographic and broader evolutionary studies.
Affiliation(s)
- Kevin I Sánchez
  - Instituto Patagónico para el Estudio de los Ecosistemas Continentales, Consejo Nacional de Investigaciones Científicas y Técnicas (IPEEC-CONICET), Puerto Madryn, Chubut, Argentina
- Hans Recknagel
  - Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Kathryn R Elmer
  - School of Biodiversity, One Health and Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
- Luciano J Avila
  - Instituto Patagónico para el Estudio de los Ecosistemas Continentales, Consejo Nacional de Investigaciones Científicas y Técnicas (IPEEC-CONICET), Puerto Madryn, Chubut, Argentina
- Mariana Morando
  - Instituto Patagónico para el Estudio de los Ecosistemas Continentales, Consejo Nacional de Investigaciones Científicas y Técnicas (IPEEC-CONICET), Puerto Madryn, Chubut, Argentina
  - Departamento de Biología y Ambiente, Universidad Nacional de la Patagonia San Juan Bosco, Sede Puerto Madryn, Puerto Madryn, Chubut, Argentina
4
Leaché AD, Davis HR, Feldman CR, Fujita MK, Singhal S. Repeated patterns of reptile diversification in Western North America supported by the Northern Alligator Lizard (Elgaria coerulea). J Hered 2024; 115:57-71. [PMID: 37982433 PMCID: PMC10838131 DOI: 10.1093/jhered/esad073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 11/09/2023] [Indexed: 11/21/2023] Open
Abstract
Understanding the processes that shape genetic diversity by either promoting or preventing population divergence can help identify geographic areas that either facilitate or limit gene flow. Furthermore, broadly distributed species allow us to understand how biogeographic and ecogeographic transitions affect gene flow. We investigated these processes using genomic data in the Northern Alligator Lizard (Elgaria coerulea), which is widely distributed in Western North America across diverse ecoregions (California Floristic Province and Pacific Northwest) and mountain ranges (Sierra Nevada, Coastal Ranges, and Cascades). We collected single-nucleotide polymorphism data from 120 samples of E. coerulea. Biogeographic analyses of squamate reptiles with similar distributions have identified several shared diversification patterns that provide testable predictions for E. coerulea, including deep genetic divisions in the Sierra Nevada, demographic stability of southern populations, and recent post-Pleistocene expansion into the Pacific Northwest. We use genomic data to test these predictions by estimating the structure, connectivity, and phylogenetic history of populations. At least 10 distinct populations are supported, with mixed-ancestry individuals situated at most population boundaries. A species tree analysis provides strong support for the early divergence of populations in the Sierra Nevada Mountains and recent diversification into the Pacific Northwest. Admixture and migration analyses detect gene flow among populations in the Lower Cascades and Northern California, and a spatial analysis of gene flow identified significant barriers to gene flow across both the Sierra Nevada and Coast Ranges. The distribution of genetic diversity in E. coerulea is uneven, patchy, and interconnected at population boundaries. The biogeographic patterns seen in E. coerulea are consistent with predictions from co-distributed species.
Affiliation(s)
- Adam D Leaché
  - Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, WA, United States
- Hayden R Davis
  - Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, WA, United States
- Chris R Feldman
  - Department of Biology and Program in Ecology, Evolution and Conservation Biology, University of Nevada, Reno, NV, United States
- Matthew K Fujita
  - Department of Biology, The University of Texas at Arlington, Arlington, TX, United States
- Sonal Singhal
  - Department of Biology, California State University - Dominguez Hills, Carson, CA, United States
5
Thawornwattana Y, Seixas F, Yang Z, Mallet J. Major patterns in the introgression history of Heliconius butterflies. eLife 2023; 12:RP90656. [PMID: 38108819 PMCID: PMC10727504 DOI: 10.7554/elife.90656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023] Open
Abstract
Gene flow between species, although usually deleterious, is an important evolutionary process that can facilitate adaptation and lead to species diversification. It also makes estimation of species relationships difficult. Here, we use the full-likelihood multispecies coalescent (MSC) approach to estimate species phylogeny and major introgression events in Heliconius butterflies from whole-genome sequence data. We obtain a robust estimate of species branching order among major clades in the genus, including the 'melpomene-silvaniform' group, which shows extensive historical and ongoing gene flow. We obtain chromosome-level estimates of key parameters in the species phylogeny, including species divergence times, present-day and ancestral population sizes, as well as the direction, timing, and intensity of gene flow. Our analysis leads to a phylogeny with introgression events that differ from those obtained in previous studies. We find that Heliconius aoede most likely represents the earliest-branching lineage of the genus and that 'silvaniform' species are paraphyletic within the melpomene-silvaniform group. Our phylogeny provides new, parsimonious histories for the origins of key traits in Heliconius, including pollen feeding and an inversion involved in wing pattern mimicry. Our results demonstrate the power and feasibility of the full-likelihood MSC approach for estimating species phylogeny and key population parameters despite extensive gene flow. The methods used here should be useful for analysis of other difficult species groups with high rates of introgression.
Affiliation(s)
- Fernando Seixas
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- James Mallet
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
6
Pereira DS, Hilário S, Gonçalves MFM, Phillips AJL. Diaporthe Species on Palms: Molecular Re-Assessment and Species Boundaries Delimitation in the D. arecae Species Complex. Microorganisms 2023; 11:2717. [PMID: 38004729 PMCID: PMC10673533 DOI: 10.3390/microorganisms11112717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 10/25/2023] [Accepted: 11/03/2023] [Indexed: 11/26/2023] Open
Abstract
Due to cryptic diversification, phenotypic plasticity and host associations, multilocus phylogenetic analyses have become the most important tool in accurately identifying and circumscribing species in the Diaporthe genus. However, the application of the genealogical concordance criterion has often been overlooked, ultimately leading to an exponential increase in novel Diaporthe spp. Due to the large number of species, many lineages remain poorly understood under the so-called species complexes. For this reason, a robust delimitation of the species boundaries in Diaporthe is still an ongoing challenge. Therefore, the present study aimed to resolve the species boundaries of the Diaporthe arecae species complex (DASC) by implementing an integrative taxonomic approach. The Genealogical Concordance Phylogenetic Species Recognition (GCPSR) principle revealed incongruences between the individual gene genealogies. Moreover, the Poisson Tree Processes' (PTPs) coalescent-based species delimitation models identified three well-delimited subclades represented by the species D. arecae, D. chiangmaiensis and D. smilacicola. These results provide evidence that all species previously described in the D. arecae subclade are conspecific, which is coherent with the morphological indistinctiveness observed and the absence of reproductive isolation and barriers to gene flow. Thus, 52 Diaporthe spp. are reduced to synonymy under D. arecae. Recent population expansion and the possibility of incomplete lineage sorting suggested that the D. arecae subclade may be considered as ongoing evolving lineages under active divergence and speciation. Hence, the genetic diversity and intraspecific variability of D. arecae in the context of current global climate change and the role of D. arecae as a pathogen on palm trees and other hosts are also discussed. This study illustrates that species in Diaporthe are highly overestimated, and highlights the relevance of applying an integrative taxonomic approach to accurately circumscribe the species boundaries in the genus Diaporthe.
Affiliation(s)
- Diana S. Pereira
  - Faculdade de Ciências, Biosystems and Integrative Sciences Institute (BioISI), Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
- Sandra Hilário
  - Interdisciplinary Centre of Marine and Environmental Research (CIIMAR), Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, 4450-208 Porto, Portugal
  - Faculty of Sciences, Biology Department, University of Porto, Rua do Campo Alegre, Edifício FC4, 4169-007 Porto, Portugal
- Micael F. M. Gonçalves
  - Faculty of Sciences, Biology Department, University of Porto, Rua do Campo Alegre, Edifício FC4, 4169-007 Porto, Portugal
  - Centre for Environmental and Marine Studies, Department of Biology, Campus Universitário de Santiago, University of Aveiro, 3810-193 Aveiro, Portugal
- Alan J. L. Phillips
  - Faculdade de Ciências, Biosystems and Integrative Sciences Institute (BioISI), Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
7
Flouri T, Jiao X, Huang J, Rannala B, Yang Z. Efficient Bayesian inference under the multispecies coalescent with migration. Proc Natl Acad Sci U S A 2023; 120:e2310708120. [PMID: 37871206 PMCID: PMC10622872 DOI: 10.1073/pnas.2310708120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/25/2023] [Accepted: 08/15/2023] [Indexed: 10/25/2023] Open
Abstract
Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.
Affiliation(s)
- Tomáš Flouri
  - Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, United Kingdom
- Xiyun Jiao
  - Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen 518055, China
- Jun Huang
  - Department of Intelligent Medical Engineering, School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Bruce Rannala
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Ziheng Yang
  - Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, United Kingdom
8
Laetsch DR, Bisschop G, Martin SH, Aeschbacher S, Setter D, Lohse K. Demographically explicit scans for barriers to gene flow using gIMble. PLoS Genet 2023; 19:e1010999. [PMID: 37816069 PMCID: PMC10610087 DOI: 10.1371/journal.pgen.1010999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/27/2023] [Accepted: 09/25/2023] [Indexed: 10/12/2023] Open
Abstract
Identifying regions of the genome that act as barriers to gene flow between recently diverged taxa has remained challenging given the many evolutionary forces that generate variation in genetic diversity and divergence along the genome, and the stochastic nature of this variation. Progress has been impeded by a conceptual and methodological divide between analyses that infer the demographic history of speciation and genome scans aimed at identifying locally maladaptive alleles, i.e., genomic barriers to gene flow. Here we implement genomewide IM blockwise likelihood estimation (gIMble), a composite likelihood approach for the quantification of barriers that bridges this divide. This analytic framework captures background selection and selection against barriers in a model of isolation with migration (IM) as heterogeneity in effective population size (Ne) and effective migration rate (me), respectively. Variation in both effective demographic parameters is estimated in sliding windows via pre-computed likelihood grids. gIMble includes modules for pre-processing/filtering of genomic data and performing parametric bootstraps using coalescent simulations. To demonstrate the new approach, we analyse data from a well-studied pair of sister species of tropical butterflies with a known history of post-divergence gene flow: Heliconius melpomene and H. cydno. Our analyses uncover both large-effect barrier loci (including well-known wing-pattern genes) and a genome-wide signal of a polygenic barrier architecture.
Affiliation(s)
- Dominik R. Laetsch
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Gertjan Bisschop
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Simon H. Martin
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Simon Aeschbacher
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Derek Setter
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Konrad Lohse
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
9
Crossman CA, Fontaine MC, Frasier TR. A comparison of genomic diversity and demographic history of the North Atlantic and Southwest Atlantic southern right whales. Mol Ecol 2023. [PMID: 37577945 DOI: 10.1111/mec.17099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Revised: 07/25/2023] [Accepted: 07/31/2023] [Indexed: 08/15/2023]
Abstract
Right whales (genus Eubalaena) were among the first, and most extensively pursued, targets of commercial whaling. However, understanding the impacts of this persecution requires knowledge of the demographic histories of these species prior to exploitation. We used deep whole genome sequencing (~40×) of 12 North Atlantic (E. glacialis) and 10 Southwest Atlantic southern (E. australis) right whales to quantify contemporary levels of genetic diversity and infer their demographic histories over time. Using coalescent- and identity-by-descent-based modelling to estimate ancestral effective population sizes from genomic data, we demonstrate that North Atlantic right whales have lived with smaller effective population sizes (Ne) than southern right whales in the Southwest Atlantic since their divergence and describe the decline in both populations around the time of whaling. North Atlantic right whales exhibit reduced genetic diversity and longer runs of homozygosity leading to higher inbreeding coefficients compared to the sampled population of southern right whales. This study represents the first comprehensive assessment of genome-wide diversity of right whales in the western Atlantic and underscores the benefits of high coverage, genome-wide datasets to help resolve long-standing questions about how historical changes in effective population size over different time scales shape contemporary diversity estimates. This knowledge is crucial to improve our understanding of the right whales' history and inform our approaches to address contemporary conservation issues. Understanding and quantifying the cumulative impact of long-term small Ne, low levels of diversity and recent inbreeding on North Atlantic right whale recovery will be important next steps.
Affiliation(s)
- Carla A Crossman
  - Biology Department, Saint Mary's University, Halifax, Nova Scotia, Canada
- Michael C Fontaine
  - Laboratoire MIVEGEC (Université de Montpellier, CNRS 5290, IRD 224), Montpellier, France
  - Groningen Institute for Evolutionary Life Sciences (GELIFES), University of Groningen, Groningen, The Netherlands
- Timothy R Frasier
  - Biology Department, Saint Mary's University, Halifax, Nova Scotia, Canada
10
McGuire JA, Huang X, Reilly SB, Iskandar DT, Wang-Claypool CY, Werning S, Chong RA, Lawalata SZS, Stubbs AL, Frederick JH, Brown RM, Evans BJ, Arifin U, Riyanto A, Hamidy A, Arida E, Koo MS, Supriatna J, Andayani N, Hall R. Species Delimitation, Phylogenomics, and Biogeography of Sulawesi Flying Lizards: A Diversification History Complicated by Ancient Hybridization, Cryptic Species, and Arrested Speciation. Syst Biol 2023; 72:885-911. [PMID: 37074804 PMCID: PMC10405571 DOI: 10.1093/sysbio/syad020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Revised: 03/14/2023] [Accepted: 04/13/2023] [Indexed: 04/20/2023] Open
Abstract
The biota of Sulawesi is noted for its high degree of endemism and for its substantial levels of in situ biological diversification. While the island's long period of isolation and dynamic tectonic history have been implicated as drivers of the regional diversification, this has rarely been tested in the context of an explicit geological framework. Here, we provide a tectonically informed biogeographical framework that we use to explore the diversification history of Sulawesi flying lizards (the Draco lineatus Group), a radiation that is endemic to Sulawesi and its surrounding islands. We employ a framework for inferring cryptic speciation that involves phylogeographic and genetic clustering analyses as a means of identifying potential species followed by population demographic assessment of divergence-timing and rates of bi-directional migration as means of confirming lineage independence (and thus species status). Using this approach, phylogenetic and population genetic analyses of mitochondrial sequence data obtained for 613 samples, a 50-SNP data set for 370 samples, and a 1249-locus exon-capture data set for 106 samples indicate that the current taxonomy substantially understates the true number of Sulawesi Draco species, that both cryptic and arrested speciations have taken place, and that ancient hybridization confounds phylogenetic analyses that do not explicitly account for reticulation. The Draco lineatus Group appears to comprise 15 species: 9 on Sulawesi proper and 6 on peripheral islands. The common ancestor of this group colonized Sulawesi ~11 Ma when proto-Sulawesi was likely composed of two ancestral islands, and began to radiate ~6 Ma as new islands formed and were colonized via overwater dispersal. The enlargement and amalgamation of many of these proto-islands into modern Sulawesi, especially during the past 3 Ma, set in motion dynamic species interactions as once-isolated lineages came into secondary contact, some of which resulted in lineage merger, and others surviving to the present. [Genomics; Indonesia; introgression; mitochondria; phylogenetics; phylogeography; population genetics; reptiles.]
Collapse
Affiliation(s)
- Jimmy A Mcguire
- Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
| | - Xiaoting Huang
- College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Qindao, Shandong, 266003, PR China
| | - Sean B Reilly
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95060, USA
| | - Djoko T Iskandar
- School of Life Sciences and Technology, Institut Teknologi Bandung, Bandung, Indonesia
| | - Cynthia Y Wang-Claypool
- Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
| | - Sarah Werning
- Department of Anatomy, Des Moines University, 3200 Grand Avenue, Des Moines, IA 50312-4198, USA
| | - Rebecca A Chong
- Department of Biology, University of Hawaii at Manoa, Honolulu, HI 96822, USA
| | - Shobi Z S Lawalata
- Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
- United in Diversity Foundation, Jalan Hayam Wuruk, Jakarta, Indonesia
| | - Alexander L Stubbs
- Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
| | - Jeffrey H Frederick
- Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
| | - Rafe M Brown
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, 1345 Jayhawk Blvd., University of Kansas, Lawrence, KS 66045, USA
| | - Ben J Evans
- Biology Department, McMaster University, Hamilton, Ontario, Canada
| | - Umilaela Arifin
- Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
- School of Life Sciences and Technology, Institut Teknologi Bandung, Bandung, Indonesia
- Center for Taxonomy and Morphology, Zoologisches Museum Hamburg, Leibniz Institute for the Analysis of Biodiversity Change, Martin-Luther-King-Platz 3, R230 20146 Hamburg, Germany
| | - Awal Riyanto
- Laboratory of Herpetology, Museum Zoologicum Bogoriense, Research Center for Biosystematics and Evolution, National Research and Innovation Agency of Indonesia (BRIN), Cibinong 16911, Indonesia
| | - Amir Hamidy
- Laboratory of Herpetology, Museum Zoologicum Bogoriense, Research Center for Biosystematics and Evolution, National Research and Innovation Agency of Indonesia (BRIN), Cibinong 16911, Indonesia
| | - Evy Arida
- Research Center for Applied Zoology, National Research and Innovation Agency of Indonesia (BRIN), Cibinong 16911, Indonesia
| | - Michelle S Koo
- Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
| | - Jatna Supriatna
- Department of Biology, Institute for Sustainable Earth and Resources (I-SER), Gedung Laboratorium Multidisiplin, and Research Center for Climate Change (RCCC-UI), Gedung Laboratorium Multidisiplin, Faculty of Mathematics and Natural Sciences, Universitas Indonesia, Depok 16424, Indonesia
| | - Noviar Andayani
- Department of Biology, Institute for Sustainable Earth and Resources (I-SER), Gedung Laboratorium Multidisiplin, and Research Center for Climate Change (RCCC-UI), Gedung Laboratorium Multidisiplin, Faculty of Mathematics and Natural Sciences, Universitas Indonesia, Depok 16424, Indonesia
| | - Robert Hall
- SE Asia Research Group (SEARG), Department of Earth Sciences, Royal Holloway University of London, Egham, Surrey TW20 0EX, UK
|
11
|
Thawornwattana Y, Huang J, Flouri T, Mallet J, Yang Z. Inferring the Direction of Introgression Using Genomic Sequence Data. Mol Biol Evol 2023; 40:msad178. [PMID: 37552932 PMCID: PMC10439365 DOI: 10.1093/molbev/msad178] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 08/01/2023] [Accepted: 08/02/2023] [Indexed: 08/10/2023] Open
Abstract
Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here, we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.
Affiliation(s)
| | - Jun Huang
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, P.R. China
| | - Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - James Mallet
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
|
12
|
Shi CM, Zhang XS, Liu L, Ji YJ, Zhang DX. Phylogeography of the desert scorpion illuminates a route out of Central Asia. Curr Zool 2023; 69:442-455. [PMID: 37614924 PMCID: PMC10443618 DOI: 10.1093/cz/zoac061] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2022] [Accepted: 07/27/2022] [Indexed: 08/25/2023] Open
Abstract
A comprehensive understanding of phylogeography requires the integration of knowledge across different organisms, ecosystems, and geographic regions. However, a critical knowledge gap exists in the arid biota of the vast Asian drylands. To narrow this gap, here we test an "out-of-Central Asia" hypothesis for the desert scorpion Mesobuthus mongolicus by combining Bayesian phylogeographic reconstruction and ecological niche modeling. Phylogenetic analyses of one mitochondrial and three nuclear loci and molecular dating revealed that M. mongolicus represents a coherent lineage that diverged from its most closely related lineage in Central Asia about 1.36 Ma and underwent radiation ever since. Bayesian phylogeographic reconstruction indicated that the ancestral population dispersed from Central Asia gradually eastward to the Gobi region via the Junggar Basin, suggesting that the Junggar Basin has served as a corridor for Quaternary faunal exchange between Central Asia and East Asia. Two major dispersal events occurred probably during interglacial periods (around 0.8 and 0.4 Ma, respectively) when climatic conditions were analogous to present-day status, under which the scorpion achieved its maximum distributional range. M. mongolicus underwent demographic expansion during the Last Glacial Maximum, although the predicted distributional areas were smaller than those at present and during the Last Interglacial. Development of desert ecosystems in northwest China incurred by intensified aridification might have opened up empty habitats that sustained population expansion. Our results extend the spatiotemporal dimensions of trans-Eurasia faunal exchange and suggest that species' adaptation is an important determinant of their phylogeographic and demographic responses to climate changes.
Affiliation(s)
- Cheng-Min Shi
- State Key Laboratory of North China Crop Improvement and Regulation, College of Plant Protection, Hebei Agricultural University, Baoding 071001, China
| | - Xue-Shu Zhang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Lin Liu
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Ya-Jie Ji
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - De-Xing Zhang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing 100049, China
|
13
|
Ji J, Jackson DJ, Leaché AD, Yang Z. Power of Bayesian and Heuristic Tests to Detect Cross-Species Introgression with Reference to Gene Flow in the Tamias quadrivittatus Group of North American Chipmunks. Syst Biol 2023; 72:446-465. [PMID: 36504374 PMCID: PMC10275556 DOI: 10.1093/sysbio/syac077] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Revised: 11/15/2022] [Accepted: 12/01/2022] [Indexed: 10/25/2023] Open
Abstract
In the past two decades, genomic data have been widely used to detect historical gene flow between species in a variety of plants and animals. The Tamias quadrivittatus group of North American chipmunks, which originated through a series of rapid speciation events, are known to undergo massive amounts of mitochondrial introgression. Yet in a recent analysis of targeted nuclear loci from the group, no evidence for cross-species introgression was detected, indicating widespread cytonuclear discordance. The study used the heuristic method HYDE to detect gene flow, which may suffer from low power. Here we use the Bayesian method implemented in the program BPP to re-analyze these data. We develop a Bayesian test of introgression, calculating the Bayes factor via the Savage-Dickey density ratio using the Markov chain Monte Carlo (MCMC) sample under the model of introgression. We take a stepwise approach to constructing an introgression model by adding introgression events onto a well-supported binary species tree. The analysis detected robust evidence for multiple ancient introgression events affecting the nuclear genome, with introgression probabilities reaching 63%. We estimate population parameters and highlight the fact that species divergence times may be seriously underestimated if ancient cross-species gene flow is ignored in the analysis. We examine the assumptions and performance of HYDE and demonstrate that it lacks power if gene flow occurs between sister lineages or if the mode of gene flow does not match the assumed hybrid-speciation model with symmetrical population sizes. Our analyses highlight the power of likelihood-based inference of cross-species gene flow using genomic sequence data. [Bayesian test; BPP; chipmunks; introgression; MSci; multispecies coalescent; Savage-Dickey density ratio.].
Affiliation(s)
- Jiayi Ji
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Donavan J Jackson
- Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
| | - Adam D Leaché
- Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
|
14
|
Sun QH, Morales-Briones DF, Wang HX, Landis JB, Wen J, Wang HF. Target sequence capture data shed light on the deeper evolutionary relationships of subgenus Chamaecerasus in Lonicera (Caprifoliaceae). Mol Phylogenet Evol 2023; 184:107808. [PMID: 37156329 DOI: 10.1016/j.ympev.2023.107808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Revised: 04/29/2023] [Accepted: 05/01/2023] [Indexed: 05/10/2023]
Abstract
The genus Lonicera L. is widely distributed in the north temperate zone and is well-known for its high species richness and morphological diversity. Previous studies have suggested that many sections of Lonicera are not monophyletic and phylogenetic relationships within the genus are still poorly resolved. In this study, we sampled 37 accessions of Lonicera, covering four sections of subgenus Chamaecerasus plus six outgroup taxa, to recover the main clades of Lonicera based on sequences of nuclear loci generated by target enrichment and cpDNA from genome skimming. We found extensive cytonuclear discordance across the subgenus. Both nuclear and plastid phylogenetic analyses supported subgenus Chamaecerasus sister to subgenus Lonicera. Within subgenus Chamaecerasus, sections Isika and Nintooa were each polyphyletic. Based on the nuclear and chloroplast phylogenies, we propose to merge Lonicera korolkowii into section Coeloxylosteum and Lonicera caerulea into section Nintooa. In addition, Lonicera is estimated to have originated in the mid Oligocene (26.45 Ma). The stem age of section Nintooa was estimated to be 17.09 Ma (95% HPD: 13.30-24.45). The stem age of subgenus Lonicera was estimated to be 16.35 Ma (95% HPD: 14.12-23.66). Ancestral area reconstruction analyses indicate that subgenus Chamaecerasus originated in East Asia and Central Asia. In addition, sections Coeloxylosteum and Nintooa originated in East Asia, with subsequent dispersals into other areas. The aridification of the Asian interior likely promoted the rapid radiation of sections Coeloxylosteum and Nintooa within this region. Moreover, our biogeographic analysis fully supports the Bering and the North Atlantic Land Bridge hypotheses for the intercontinental migrations in the Northern Hemisphere. Overall, this study provides new insights into the taxonomically complex lineages of subgenus Chamaecerasus and the process of speciation.
Affiliation(s)
- Qing-Hui Sun
- Sanya Nanfan Research Institute of Hainan University, Hainan Yazhou Bay Seed Laboratory, Sanya 572025, China; School of Tropical Medicine, Hainan Medical University, Haikou, Hainan, 571199, China
| | - Diego F Morales-Briones
- Department of Plant and Microbial Biology, College of Biological Sciences, University of Minnesota, 140 Gortner Laboratory, 1479 Gortner Avenue, Saint Paul, MN 55108, USA; Systematics, Biodiversity and Evolution of Plants, Department of Biology I, Ludwig-Maximilians-Universität München, Menzinger Str. 67, 80638, Munich, Germany
| | - Hong-Xin Wang
- Sanya Nanfan Research Institute of Hainan University, Hainan Yazhou Bay Seed Laboratory, Sanya 572025, China; Zhai Mingguo Academician Work Station, Sanya University, Sanya 572022, China
| | - Jacob B Landis
- School of Integrative Plant Science, Section of Plant Biology and the L.H. Bailey Hortorium, Cornell University, Ithaca, NY 14850, USA; BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, NY 14853, USA
| | - Jun Wen
- Department of Botany, National Museum of Natural History, MRC-166, Smithsonian Institution, PO Box 37012, Washington, DC 20013-7012, USA
| | - Hua-Feng Wang
- Sanya Nanfan Research Institute of Hainan University, Hainan Yazhou Bay Seed Laboratory, Sanya 572025, China; Key Laboratory of Tropical Biological Resources of Ministry of Education, College of Tropical Crops, Hainan University, Haikou 570228, China.
|
15
|
Huang J, Thawornwattana Y, Flouri T, Mallet J, Yang Z. Inference of Gene Flow between Species under Misspecified Models. Mol Biol Evol 2022; 39:6783212. [PMID: 36317198 PMCID: PMC9729068 DOI: 10.1093/molbev/msac237] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Open
Abstract
Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but in the analysis is assumed to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning between-specific gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular, from hybridizing species.
Affiliation(s)
| | | | - Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom
| | - James Mallet
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
|
16
|
Flouri T, Huang J, Jiao X, Kapli P, Rannala B, Yang Z. Bayesian phylogenetic inference using relaxed-clocks and the multispecies coalescent. Mol Biol Evol 2022; 39:6652437. [PMID: 35907248 PMCID: PMC9366188 DOI: 10.1093/molbev/msac161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescent and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes–Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
Affiliation(s)
- Tomáš Flouri
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
| | - Jun Huang
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK.,School of Biomedical Engineering, Capital Medical University, Beijing, 100069, China
| | - Xiyun Jiao
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK.,Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | - Paschalia Kapli
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
| | - Bruce Rannala
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | - Ziheng Yang
- Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
|
17
|
Pang XX, Zhang DY. Impact of Ghost Introgression on Coalescent-based Species Tree Inference and Estimation of Divergence Time. Syst Biol 2022; 72:35-49. [PMID: 35799362 DOI: 10.1093/sysbio/syac047] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 06/25/2022] [Accepted: 07/05/2022] [Indexed: 11/15/2022] Open
Abstract
The species studied in any evolutionary investigation generally constitute a small proportion of all the species currently existing or that have gone extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimations of species trees has rarely been studied and is poorly understood. Here, we use mathematical analysis and simulations to examine the robustness of species tree methods based on the multispecies coalescent model to introgression from a ghost or extant lineage. We found that many results originally obtained for introgression between extant species can easily be extended to ghost introgression, such as the strongly interactive effects of incomplete lineage sorting (ILS) and introgression on the occurrence of anomalous gene trees (AGTs). The relative performance of the summary species tree method (ASTRAL) and the full-likelihood method (*BEAST) varies under different introgression scenarios, with the former being more robust to gene flow between non-sister species whereas the latter performing better under certain conditions of ghost introgression. When an outgroup ghost (defined as a lineage that diverged before the most basal species under investigation) acts as the donor of the introgressed genes, the time of root divergence among the investigated species generally was overestimated, whereas ingroup introgression, as commonly perceived, can only lead to underestimation. In many cases of ingroup introgression that may or may not involve ghost lineages, the stronger the ILS, the higher the accuracy achieved in estimating the time of root divergence, although the topology of the species tree is more prone to be biased by the effect of introgression.
Affiliation(s)
- Xiao-Xu Pang
- State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
| | - Da-Yong Zhang
- State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
|
18
|
Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol 2022; 84:47. [PMID: 35503141 DOI: 10.1007/s00285-022-01746-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2021] [Revised: 01/18/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Affiliation(s)
- Sungsik Kong
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
| | - Laura Kubatko
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA.,Department of Statistics, The Ohio State University, Columbus, OH, USA
| | - Kristina Wicke
- Department of Mathematics, The Ohio State University, Columbus, OH, USA.
|
19
|
Yang Z, Flouri T. Estimation of Cross-Species Introgression Rates using Genomic Data Despite Model Unidentifiability. Mol Biol Evol 2022; 39:6568285. [PMID: 35417543 PMCID: PMC9087891 DOI: 10.1093/molbev/msac083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Full-likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies of unidentifiability have focused on heuristic methods based on gene trees and do not make an efficient use of the information in the data. Here we study the unidentifiability of MSci models under the full-likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. We derive simple rules for arbitrary BDI models, which create unidentifiability of the label-switching type. In general, an MSci model with k BDI events has 2^k unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between nonsister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo samples to remove label-switching problems and implement them in the bpp program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.
Affiliation(s)
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E6BT, UK
| | - Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E6BT, UK
|
20
|
Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022; 31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics Academy of Mathematics and Systems Science Chinese Academy of Sciences Beijing 100190 China
- Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences Beijing 100190 China
| | - Tomáš Flouri
- Department of Genetics, Evolution and Environment University College London London WC1E 6BT UK
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment University College London London WC1E 6BT UK
|
21
|
Thawornwattana Y, Seixas FA, Yang Z, Mallet J. OUP accepted manuscript. Syst Biol 2022; 71:1159-1177. [PMID: 35169847 PMCID: PMC9366460 DOI: 10.1093/sysbio/syac009] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Revised: 02/01/2022] [Accepted: 02/08/2022] [Indexed: 11/21/2022] Open
Abstract
Introgressive hybridization plays a key role in adaptive evolution and species diversification in many groups of species. However, frequent hybridization and gene flow between species make estimation of the species phylogeny and key population parameters challenging. Here, we show that by accounting for phasing and using full-likelihood methods, introgression histories and population parameters can be estimated reliably from whole-genome sequence data. We employ the multispecies coalescent (MSC) model with and without gene flow to infer the species phylogeny and cross-species introgression events using genomic data from six members of the erato-sara clade of Heliconius butterflies. The methods naturally accommodate random fluctuations in genealogical history across the genome due to deep coalescence. To avoid heterozygote phasing errors in haploid sequences commonly produced by genome assembly methods, we process and compile unphased diploid sequence alignments and use analytical methods to average over uncertainties in heterozygote phase resolution. There is robust evidence for introgression across the genome, both among distantly related species deep in the phylogeny and between sister species in shallow parts of the tree. We obtain chromosome-specific estimates of key population parameters such as introgression directions, times and probabilities, as well as species divergence times and population sizes for modern and ancestral species. We confirm ancestral gene flow between the sara clade and an ancestral population of Heliconius telesiphe, a likely hybrid speciation origin for Heliconius hecalesia, and gene flow between the sister species Heliconius erato and Heliconius himera. Inferred introgression among ancestral species also explains the history of two chromosomal inversions deep in the phylogeny of the group. 
This study illustrates how a full-likelihood approach based on the MSC makes it possible to extract rich historical information of species divergence and gene flow from genomic data. [3s; bpp; gene flow; Heliconius; hybrid speciation; introgression; inversion; multispecies coalescent]
Affiliation(s)
- Yuttapong Thawornwattana
- Correspondence to be sent to: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA; E-mail: ; (Y.T. and J.M.); Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK; E-mail: (Z.Y.)
| | - Fernando A Seixas
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Ziheng Yang
- Correspondence to be sent to: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA; E-mail: ; (Y.T. and J.M.); Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK; E-mail: (Z.Y.)
| | - James Mallet
- Correspondence to be sent to: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA; E-mail: ; (Y.T. and J.M.); Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK; E-mail: (Z.Y.)
|
22
|
Hibbins MS, Hahn MW. Phylogenomic approaches to detecting and characterizing introgression. Genetics 2021; 220:6425633. [PMID: 34788444 PMCID: PMC9208645 DOI: 10.1093/genetics/iyab173] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Accepted: 10/02/2021] [Indexed: 12/26/2022] Open
Abstract
Phylogenomics has revealed the remarkable frequency with which introgression occurs across the tree of life. These discoveries have been enabled by the rapid growth of methods designed to detect and characterize introgression from whole-genome sequencing data. A large class of phylogenomic methods makes use of data across species to infer and characterize introgression based on expectations from the multispecies coalescent. These methods range from simple tests, such as the D-statistic, to model-based approaches for inferring phylogenetic networks. Here, we provide a detailed overview of the various signals that different modes of introgression are expected to leave in the genome, and how current methods are designed to detect them. We discuss the strengths and pitfalls of these approaches and identify areas for future development, highlighting the different signals of introgression, and the power of each method to detect them. We conclude with a discussion of current challenges in inferring introgression and how they could potentially be addressed.
Affiliation(s)
- Mark S Hibbins
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA.,Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
|